Best Ways to Fine-Tune Your SQL Queries

This handy guide will teach you how to get the most out of your SQL queries by fine-tuning them, and shows how to keep your queries low-cost and efficient.

If you’re running without a data warehouse or a separate analytical database for reporting, the live production database is likely your only source of the latest, up-to-date data. When querying a production database, optimization is key. An inefficient query will drain the production database’s resources and can cause slow performance or even loss of service for other users. It’s vital that you optimize your queries for minimum impact on database performance.

SQL Query Performance Tuning Tips:

1. Define business requirements first

Make sure you apply the following practices when optimizing SQL queries:

  • Identify relevant stakeholders. Make sure all necessary parties are in the discussion to develop your query. When querying production databases, make sure the DBA team is included.
  • Focus on business outcomes. Give the query a definite and unique purpose. Taxing the production database for exploratory or duplicative reports is an unnecessary risk.
  • Frame the discussion for optimal requirements. Define the function and scope of the report by identifying its intended audience. This will focus the query on the tables with the correct level of detail.
  • Ask great questions. Follow the 5 W’s: Who? What? Where? When? Why?
  • Write very specific requirements and confirm them with stakeholders. The performance of the production database is too critical to have unclear or ambiguous requirements. Make sure the requirements are as specific as possible and confirm the requirements with all stakeholders before running the query.

2. SELECT fields instead of using SELECT *

When running exploratory queries, many SQL developers use SELECT * (read as “select all”) as a shorthand to query all available data from a table. However, if a table has many fields and many rows, this taxes database resources by querying a lot of unnecessary data.

Using SELECT with specific field names tells the database to query only the data you need to meet the business requirements. Here’s an example where the business requirements ask for customers’ mailing addresses.

Inefficient:

SELECT *
FROM Customers

This query may pull in other data also stored in the customer table, such as phone numbers, activity dates, and notes from sales and customer service.

Efficient:

SELECT FirstName, LastName, Address, City, State, Zip
FROM Customers

This query is much cleaner and only pulls the required information for mailing addresses.

To keep an index of all tables and field names, run a query against a system catalog such as INFORMATION_SCHEMA (ANSI-standard, supported by most DBMSs) or ALL_TAB_COLUMNS (Oracle).
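
For example, on a DBMS that exposes the INFORMATION_SCHEMA views, a query like the following lists every table, column, and data type; on Oracle, the ALL_TAB_COLUMNS view plays the same role:

SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_NAME, ORDINAL_POSITION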

3. Avoid SELECT DISTINCT

SELECT DISTINCT is a handy way to remove duplicates from a query. It works by grouping all selected fields to produce distinct results, which requires a large amount of processing power. Additionally, the data may be grouped to the point of being inaccurate. To avoid SELECT DISTINCT, select more fields so the results are naturally unique.

Inefficient and inaccurate:

SELECT DISTINCT FirstName, LastName, State
FROM Customers

This query doesn’t account for multiple people in the same state having the same first and last name. Customers who share popular names such as Syrus or Rider will be grouped together, producing an inaccurate record count. In larger databases, the many customers named Syrus or Rider will also cause this query to run slowly.

Efficient and accurate:

SELECT FirstName, LastName, Address, City, State, Zip
FROM Customers

By adding more fields, the query returns unduplicated records without using SELECT DISTINCT. The database does not have to group any fields, and the record count is accurate.

4. Create joins with INNER JOIN (not WHERE)

Some SQL developers prefer to make joins with WHERE clauses, such as the following:

SELECT Customers.CustomerID, Customers.Name, Sales.LastSaleDate
FROM Customers, Sales
WHERE Customers.CustomerID = Sales.CustomerID

This type of join creates a Cartesian Join, also called a Cartesian Product or CROSS JOIN.

In a Cartesian Join, all possible combinations of the rows are created. In this example, if we had 1,000 customers with 1,000 total sales, the query would first generate 1,000,000 results, then filter for the 1,000 records where CustomerID is correctly joined. This is an inefficient use of database resources, as the database has done 1,000x more work than required. Cartesian Joins are especially problematic in large-scale databases because a Cartesian Join of two large tables could create billions or trillions of results.

To prevent creating a Cartesian Join, use INNER JOIN instead:

SELECT Customers.CustomerID, Customers.Name, Sales.LastSaleDate
FROM Customers
INNER JOIN Sales
ON Customers.CustomerID = Sales.CustomerID

The database would only generate the 1,000 desired records where CustomerID is equal.

Some DBMS systems are able to recognize WHERE joins and automatically run them as INNER JOINs instead. In those DBMS systems, there will be no difference in performance between a WHERE join and INNER JOIN. However, INNER JOIN is recognized by all DBMS systems.

5. Use WHERE instead of HAVING to define filters

The goal of an efficient query is to pull only the required records from the database. Per the SQL Order of Operations, HAVING statements are calculated after WHERE statements. If the intent is to filter a query based on conditions, a WHERE statement is more efficient.

For example, let’s assume 200 sales have been made in the year 2021, and we want to query for the number of sales per customer in 2021.

SELECT Customers.CustomerID, Customers.Name, Count(Sales.SalesID)
FROM Customers
INNER JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.CustomerID, Customers.Name
HAVING Sales.LastSaleDate BETWEEN #1/1/2021# AND #12/31/2021#

This query would pull all 1,000 sales records from the Sales table, then filter for the 200 records generated in 2021, and finally count the records in the dataset. (Many DBMSs will reject this form outright, because LastSaleDate is neither grouped nor aggregated.)

In comparison, WHERE clauses limit the number of records pulled:

SELECT Customers.CustomerID, Customers.Name, Count(Sales.SalesID)
FROM Customers
INNER JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
WHERE Sales.LastSaleDate BETWEEN #1/1/2021# AND #12/31/2021#
GROUP BY Customers.CustomerID, Customers.Name

This query pulls only the 200 records from 2021 and then counts the records in the dataset. The wasteful first step of the HAVING version has been completely eliminated.

HAVING should only be used when filtering on an aggregated field. In the query above, we could additionally filter for customers with greater than 5 sales using a HAVING statement.

SELECT Customers.CustomerID, Customers.Name, Count(Sales.SalesID)
FROM Customers
INNER JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
WHERE Sales.LastSaleDate BETWEEN #1/1/2021# AND #12/31/2021#
GROUP BY Customers.CustomerID, Customers.Name
HAVING Count(Sales.SalesID) > 5

6. Use wildcards at the end of a phrase only

When searching plaintext data, such as cities or names, wildcards create the widest search possible. However, the widest search is also the most inefficient search.

When a leading wildcard is used, especially in combination with an ending wildcard, the database is tasked with searching all records for a match anywhere within the selected field.

Consider this query to pull cities beginning with ‘Char’:

SELECT City FROM Customers
WHERE City LIKE '%Char%'

This query will pull the expected results of Charleston, Charlotte, and Charlton. However, it will also pull unexpected results, such as Cape Charles, Crab Orchard, and Richardson.

A more efficient query would be:

SELECT City FROM Customers
WHERE City LIKE 'Char%'

This query will pull only the expected results of Charleston, Charlotte, and Charlton.

7. Use LIMIT to sample query results

Before running a query for the first time, ensure the results will be desirable and meaningful by using a LIMIT statement. (In some DBMS systems, the word TOP is used interchangeably with LIMIT.) The LIMIT statement returns only the number of records specified. Using a LIMIT statement prevents taxing the production database with a large query, only to find out the query needs editing or refinement.

In the 2021 sales query from above, we will examine a limit of 10 records:

SELECT Customers.CustomerID, Customers.Name, Count(Sales.SalesID)
FROM Customers
INNER JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
WHERE Sales.LastSaleDate BETWEEN #1/1/2021# AND #12/31/2021#
GROUP BY Customers.CustomerID, Customers.Name
LIMIT 10

From this sample we can see whether we have a usable data set or not.
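
In a DBMS that uses TOP instead of LIMIT, such as SQL Server, the same sampling idea applies, with TOP placed directly after SELECT. The sketch below assumes the same tables as above and uses standard quoted date literals rather than the #...# style:

SELECT TOP 10 Customers.CustomerID, Customers.Name, Count(Sales.SalesID)
FROM Customers
INNER JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
WHERE Sales.LastSaleDate BETWEEN '2021-01-01' AND '2021-12-31'
GROUP BY Customers.CustomerID, Customers.Name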

8. Run your query during off-peak hours

In order to minimize the impact of your analytical queries on the production database, talk to a DBA about scheduling the query to run at an off-peak time. The query should run when concurrent users are at their lowest number, which is typically the middle of the night (3 – 5 a.m.).

The more of the following criteria your query meets, the better a candidate it is to run at night:

  • Selecting from large tables (>1,000,000 records)
  • Cartesian Joins or CROSS JOINs
  • Looping statements
  • SELECT DISTINCT statements
  • Nested subqueries
  • Wildcard searches in long text or memo fields
  • Multiple schema queries

9. Indexing

Indexing in SQL Server helps retrieve data from a table more quickly, giving a tremendous boost to SQL query performance. Make effective use of clustered and non-clustered indexes: understand the query’s intent and choose the right type of index for your scenario.

Use a covering index to reduce the execution time of commonly used statements. Keep in mind that indexes occupy disk space: the more indexes you have, the more space they use. In SQL Server, a clustered index requires no additional disk space because it defines how the table itself is stored, but every non-clustered index needs additional disk space, as it is stored separately from the table.
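
As a minimal sketch, assuming the Customers and Sales tables from the earlier examples and SQL Server syntax, a non-clustered index on the join and filter columns might look like this (the index name is illustrative). The INCLUDE clause makes it a covering index for the 2021 sales query above:

CREATE NONCLUSTERED INDEX IX_Sales_CustomerID_LastSaleDate
ON Sales (CustomerID, LastSaleDate)
INCLUDE (SalesID)

With an index like this in place, the Sales side of that query can be answered from the index alone instead of scanning the entire Sales table.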

10. Don’t run queries in a loop

Running SQL queries in a loop slows down the entire sequence. Instead of writing a query and executing it once per loop iteration, use a bulk insert or update, depending on the situation.

Inefficient:

for ($i = 0; $i < 10; $i++) {
    // build and execute one INSERT per iteration (execution call omitted)
    $query = "INSERT INTO Business (X, Y, Z) VALUES . . . .";
    printf("New record has been inserted");
}


Efficient:

INSERT INTO Business (X, Y, Z) VALUES (1, 2, 3), (4, 5, 6), . . . .
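
A bulk update can follow the same idea. The sketch below sticks with the hypothetical Business table and uses a CASE expression to change many rows in one statement instead of issuing one UPDATE per loop iteration (the values are illustrative):

UPDATE Business
SET Y = CASE X
          WHEN 1 THEN 20
          WHEN 4 THEN 50
        END
WHERE X IN (1, 4)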
