The following is a list of good practices I follow when querying large sets of data.
SELECT *
Avoid selecting all columns from a table. Queries should explicitly list each column they require:
- Changing column definitions on the underlying tables can have an adverse effect on dependencies such as views, or can introduce name clashes
- Avoids reading and returning unnecessary data
- Can help avoid table scans
- Explicitly selecting columns allows the optimizer to better tailor the query plan
- Selecting only the required columns can allow the optimised query to be satisfied from an index alone, without cross-referencing the entire table (see the example below)
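For illustration (the `dbo.Customer` table, its columns, and the covering index are hypothetical):

```sql
-- Avoid: returns every column and is more likely to require a full scan
SELECT *
FROM dbo.Customer;

-- Prefer: list only the columns required; if a nonclustered index covers
-- LastName, FirstName and Email, the optimizer can answer the query from
-- the index alone, without cross-referencing the base table
SELECT CustomerId, FirstName, LastName, Email
FROM dbo.Customer
WHERE LastName = N'Smith';
```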
Temp tables
Avoid `SELECT INTO`:
- It creates the table as a heap, with no clustered index
- It is not supported on Azure SQL Database
Use `CREATE TABLE` instead:
- Create a temp table with a clustered index
- Insert the identifying keys for all the required rows, along with any other columns available from the tables used to build the initial set of keys
- Index the table for use by subsequent queries (see the sketch below)
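A sketch of the two approaches, using a hypothetical `dbo.[Order]` table:

```sql
-- Avoid: SELECT INTO creates the temp table as a heap
-- SELECT o.OrderId, o.CustomerId, o.OrderDate
-- INTO   #Orders
-- FROM   dbo.[Order] AS o;

-- Prefer: define the temp table explicitly with a clustered primary key
CREATE TABLE #Orders
(
    OrderId    int  NOT NULL PRIMARY KEY CLUSTERED,
    CustomerId int  NOT NULL,
    OrderDate  date NOT NULL
);

-- Insert the identifying keys plus other columns from the same tables
INSERT INTO #Orders (OrderId, CustomerId, OrderDate)
SELECT o.OrderId, o.CustomerId, o.OrderDate
FROM dbo.[Order] AS o
WHERE o.OrderDate >= '2024-01-01';

-- Index the table for use by subsequent queries
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON #Orders (CustomerId);
```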
Wild cards
Minimise the use of `%` at the beginning of search strings; use more specific filters such as `_` and `[a-e]%` where possible. A `%` at the start of the search string will result in a complete index or table scan, as shown below.
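For example, against a hypothetical `dbo.Customer` table with an index on `LastName`:

```sql
-- Avoid: a leading % defeats the index, so every row must be examined
SELECT CustomerId, LastName
FROM dbo.Customer
WHERE LastName LIKE '%son';

-- Prefer: an anchored pattern lets the optimizer seek on the index
SELECT CustomerId, LastName
FROM dbo.Customer
WHERE LastName LIKE '[a-e]%';
```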
Scalar functions
Avoid scalar functions in `JOIN` and `WHERE` clauses. Calling a function against a column can result in a full table scan, because the function must be applied to the whole data set before the clause can be evaluated.
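A common case is wrapping a date column in a function. Assuming a hypothetical `dbo.[Order]` table with an index on `OrderDate`:

```sql
-- Avoid: YEAR() must be evaluated for every row, so the index on
-- OrderDate cannot be used for a seek
SELECT OrderId
FROM dbo.[Order]
WHERE YEAR(OrderDate) = 2024;

-- Prefer: an equivalent range predicate can seek on the index
SELECT OrderId
FROM dbo.[Order]
WHERE OrderDate >= '2024-01-01'
  AND OrderDate <  '2025-01-01';
```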
A simple set of steps for writing performant queries
When working with large data sets, the preferred approach to writing performant queries is to minimise the use of `OUTER` joins; queries that use `OUTER` joins are often much slower than `INNER` joins. By separating the query as detailed below you can minimise `OUTER` joins and create efficient, performant queries that are also much easier for others to understand.
- Define a temp table to hold all of the columns required in the results; define a clustered primary key for the table based on the natural unique identifier of the data being selected
- Separate the requirements into `INNER` and `OUTER` joins - write the initial `SELECT` combining as many `INNER` joins as possible and filtering the result set to the minimum possible rows; insert the results into the temp table; exclude tables that do not have a useful index for this operation
- `CREATE INDEX`es on field combinations that support subsequent join operations (this may need to be done between the different `UPDATE` queries, dependent on having the data for the index)
- `INNER` join the temp table to each of the remaining tables and `UPDATE` the temp table with the new bits of data
- `SELECT` the final data from the temp table (a worked sketch follows this list)
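All table and column names below (`dbo.[Order]`, `dbo.OrderStatus`, `dbo.Customer`, `dbo.Coupon`) are hypothetical; treat this as a sketch of the pattern rather than a definitive implementation:

```sql
-- 1. Temp table holding all result columns, clustered on the natural key
CREATE TABLE #Result
(
    OrderId      int           NOT NULL PRIMARY KEY CLUSTERED,
    CustomerId   int           NOT NULL,
    OrderDate    date          NOT NULL,
    CustomerName nvarchar(100) NULL,  -- populated later
    CouponCode   nvarchar(20)  NULL   -- populated later; may be absent
);

-- 2. Initial SELECT: combine the INNER joins and filter to the minimum rows
INSERT INTO #Result (OrderId, CustomerId, OrderDate)
SELECT o.OrderId, o.CustomerId, o.OrderDate
FROM dbo.[Order] AS o
INNER JOIN dbo.OrderStatus AS s ON s.OrderStatusId = o.OrderStatusId
WHERE s.StatusCode = 'SHIPPED'
  AND o.OrderDate >= '2024-01-01';

-- 3. Index the field combinations used by the subsequent joins
CREATE NONCLUSTERED INDEX IX_Result_CustomerId ON #Result (CustomerId);

-- 4. INNER join the temp table to each remaining table and UPDATE it
UPDATE r
SET    r.CustomerName = c.CustomerName
FROM   #Result AS r
INNER JOIN dbo.Customer AS c ON c.CustomerId = r.CustomerId;

UPDATE r
SET    r.CouponCode = cp.CouponCode
FROM   #Result AS r
INNER JOIN dbo.Coupon AS cp ON cp.OrderId = r.OrderId;

-- 5. Final SELECT from the temp table
SELECT OrderId, CustomerId, OrderDate, CustomerName, CouponCode
FROM   #Result;
```

Note how the second `UPDATE` stands in for what would otherwise be a `LEFT OUTER JOIN` to `dbo.Coupon`: orders without a coupon are simply left with a `NULL` `CouponCode`.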