Tips for Sql query performance

The following are a list of good practices I follow when querying large sets of data.

SELECT *

Avoid selecting all columns from a table. Queries should explicitly list each column it requires

  • Changing column definitions on the underlying tables can have an adverse effect on dependencies such as views, name clashes etc
  • Avoid reading and returning unnecessary data
  • Can help avoid table scans
  • Explicitly selecting columns allows the optimizer to better tailor the query plan
  • Selecting columns could enable the optimised query to rely on an index and not need to cross reference the entire table

Temp tables

Avoid SELECT INTO

  • creates a heap
  • no clustered index
  • not supported on Azure

Use CREATE TABLE

  1. Create a temp table with a clustered index
  2. Insert the identifying keys for all the data, include other columns from the same tables used to build the initial set of keys
  3. Index the table for use by subsequent queries

Wild cards

Minimise the use of % at the beginning of search strings, use more specific filters where possible such as _ and [a-e]% where possible. % at the start of the search string will result in a complete index or table scan

Scalar functions

Avoid scalar functions in JOIN and WHERE clauses. Calling a function can result in a full table scan as the function needs to be applied to the whole data set before the clause can be evaluated.

A simple set of steps for writing performant queries

When working with large data sets the preferred approach to writing performant queries is to minimise the use of OUTER join. Queries that use OUTER joins are often much slower than INNER joins. By separating the query as detailed below you can minimise the use of OUTER join and create efficient and performant queries that are also much easier for others to understand.

  • Define a temp table to hold all of the columns required in the results; define a clustered primary key for the table based on the natural unique identifier of the data being selected
  • Separate the requirements into INNER and OUTER joins
  • write the initial SELECT combining as many INNER joins as possible and filtering the result set to the minimum possible rows; insert the results into the temp table; exclude tables that do not a useful index for this operation
  • CREATE INDEXes on field combinations that support subsequent join operations (this may need to be done between the different UPDATE queries dependent on having the data for the index)
  • INNER join the temp table to each of the remaining tables and UPDATE the temp table with the new bits of data
  • SELECT the final data from the temp table

Leave a Reply