
Wednesday, 14 February 2018

Clustered Index Scan

The Clustered Index Scan operator scans the clustered index specified in the Argument column of the query execution plan. When an optional WHERE predicate is present, only those rows that satisfy the predicate are returned. If the Argument column contains the ORDERED clause, the query processor has requested that the rows be returned in the order in which the clustered index sorts them. If the ORDERED clause is not present, the storage engine scans the index in the optimal way, without necessarily sorting the output.

Clustered Index Scan is a logical and physical operator.


Graphical execution plan icon:





Consider the following simple query against the Person.Contact table in the AdventureWorks database:

SELECT *
FROM Person.Contact


Following is the actual execution plan:



We can see that a clustered index scan operation is performed to retrieve the required data. If you place the mouse pointer over the Clustered Index Scan icon to bring up the ToolTip window, you will see that the clustered index used was PK_Contact_ContactID and that the estimated number of rows involved in the operation was 19972.

Indexes in SQL Server are stored as a B-tree (a balanced tree structure in which each node points down to its child nodes). A clustered index not only stores the key structure, like a regular index, but also sorts and stores the actual data at its leaf level, which is the main reason why there can be only one clustered index per table.
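Because the clustered index physically orders the table's data, trying to add a second one fails. A minimal sketch, using a hypothetical demo table:

```sql
-- Hypothetical demo table
CREATE TABLE dbo.Demo (ID INT, Name VARCHAR(50));

-- The first clustered index succeeds: the rows are now stored in ID order.
CREATE CLUSTERED INDEX CIX_Demo_ID ON dbo.Demo (ID);

-- This second statement fails, because the data can only be
-- physically sorted one way:
-- "Cannot create more than one clustered index on table 'dbo.Demo'..."
CREATE CLUSTERED INDEX CIX_Demo_Name ON dbo.Demo (Name);
```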

As such, a clustered index scan is almost the same in concept as a table scan. The entire index, or a large percentage of it, is traversed, row by row, in order to identify the data needed by the query.

An index scan often occurs, as in this case, when an index exists but the optimizer determines that so many rows need to be returned that it is quicker to simply scan all the values in the index rather than use the keys provided by that index.

An obvious question to ask if you see an index scan in your execution plan is whether you are returning more rows than is necessary. If the number of rows returned is higher than you expect, that's a strong indication that you need to fine-tune the WHERE clause of your query so that only those rows that are actually needed are returned. Returning unnecessary rows wastes SQL Server resources and hurts overall performance.
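For example, a selective filter on the clustered key lets the optimizer perform a Clustered Index Seek instead of a scan. A sketch against the same table (the range values are hypothetical):

```sql
-- Filtering on ContactID, the clustered key behind PK_Contact_ContactID,
-- allows a seek that touches only the qualifying rows
-- instead of scanning all 19972 of them.
SELECT *
FROM Person.Contact
WHERE ContactID BETWEEN 100 AND 199
```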

For each index scan in a query, you will need to work through the guidelines below, trying to turn the scan into an index seek. Let's use the following questions to analyze the predicate of each index scan.

Is the predicate a ‘sarg’?
‘Sarg’ is an abbreviation of ‘search argument’, a term popularized by Kalen Delaney for a predicate that allows the search to use an index seek. Not all predicate expressions are sargs, and there are a few rules that sargs need to follow. Grant Fritchey explained two of the rules of a sarg in his great article ‘The seven sins against TSQL Performance‘.

The predicate can appear in the WHERE clause, as a filter over a field of the table that is suffering the scan, or in a JOIN, in which case the join expression is the predicate.

Making queries sargable, first rule: you can’t apply a function to the left side (the field side) of the predicate. This is very well demonstrated in Grant’s article.

The following query disobeys this rule:

         SELECT TransactionID, ProductID, TransactionDate, Quantity, ActualCost
         FROM bigTransactionHistory
         WHERE MONTH(TransactionDate) = 6 AND YEAR(TransactionDate) = 2008

We can improve this query by changing the predicate. Simply rewriting the same filter without wrapping the column in a function solves this problem:

         SELECT TransactionID, ProductID, TransactionDate, Quantity, ActualCost
         FROM bigTransactionHistory
         -- A half-open range avoids missing datetime values later on 30 June
         WHERE TransactionDate >= '20080601' AND TransactionDate < '20080701'

As you can see in the new query plan below, this change alone hasn’t solved the problem yet, but we are on the right path.


An interesting point is the fact that the scan is an index scan, not a clustered index scan. This happens because the non-clustered index is a covering index for this query.

A covering index is a non-clustered index that has all the fields the query needs in its leaf level pages. When using a regular non-clustered index that doesn’t cover the query, the database engine would need to do an additional operation called ‘key lookup’. This means that, after the data is found in the leaf level of the index, SQL Server needs to retrieve from the clustered index any additional fields that are not present in the non-clustered index.

We call an index ‘covering’ for a query when it has, in its leaf level, all the fields needed by the query. A field becomes part of the leaf level when it is part of the index key, part of the clustered key, or added with the INCLUDE keyword when the index is created. A covering index doesn’t require key lookups, which is better for performance.
What’s important to notice in the above query plan is the Index Scan operator without a Key Lookup operator, which denotes a covering index.

In fact, the index IX_ProductID_TransactionDate is a composite index with ProductID and TransactionDate as the keys and Quantity and ActualCost included. Those are all the fields in the query, so the index becomes a covering index for this query, and no key lookups are needed.
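As a sketch, such a covering index could be declared as follows. Treat this as an illustration of the shape, not the exact definition used in the sample database:

```sql
-- ProductID and TransactionDate form the index key;
-- Quantity and ActualCost are stored only in the leaf level,
-- so seeks and scans on this index never need a key lookup
-- for the example query.
CREATE NONCLUSTERED INDEX IX_ProductID_TransactionDate
ON bigTransactionHistory (ProductID, TransactionDate)
INCLUDE (Quantity, ActualCost)
```

Columns added with INCLUDE don’t count toward the key, so they widen only the leaf level of the index, not the intermediate levels.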

We will see some examples later in this article, because using a covering index is one of the solutions for the scan problem. Joe Webb also discussed covering indexes in his great article ‘Using Covering Indexes to Improve Query Performance’.


Second Rule: The operands need to be of the same data type.
 Grant also covered this subject in his article, but what could actually cause this kind of mistake?
The answer: ORMs. I’ve experienced cases where ORMs such as NHibernate were generating WHERE clauses that compared VARCHAR fields with NVARCHAR values, causing exactly this problem. ORMs are therefore a potential source of performance problems. I’m not saying not to use ORMs; I’m saying be careful: check the queries that are being generated and configure the ORM correctly to avoid this mistake.
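A minimal sketch of the mismatch, with a hypothetical table: the N'...' literal is NVARCHAR, and since NVARCHAR has higher data type precedence than VARCHAR, SQL Server implicitly converts the column rather than the literal, which can turn a seek into a scan.

```sql
-- Hypothetical table with a VARCHAR key:
-- CREATE TABLE dbo.Customers (CustomerCode VARCHAR(20) PRIMARY KEY, ...)

-- Non-sargable: CustomerCode is implicitly converted to NVARCHAR
-- to match the N'' literal, defeating the index seek.
SELECT * FROM dbo.Customers WHERE CustomerCode = N'ABC123'

-- Sargable: the literal matches the column's type,
-- so the index on CustomerCode can be seeked.
SELECT * FROM dbo.Customers WHERE CustomerCode = 'ABC123'
```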
