Clustered Index Scan:
The Clustered
Index Scan operator scans the clustered index specified in the
Argument column of the query execution plan. When an optional WHERE:()
predicate is present, only those rows that satisfy the predicate are returned.
If the Argument column contains the ORDERED clause, the query processor has
requested that the output of the rows be returned in the order in which the
clustered index has sorted it. If the ORDERED clause is not present, the
storage engine scans the index in the optimal way, without necessarily sorting
the output.
Clustered
Index Scan is a logical and physical operator.
Graphical
execution plan icon:
Consider the following simple query against the Person.Contact table in the AdventureWorks database:
SELECT *
FROM Person.Contact
Following is the actual execution plan:
We can see that a clustered index scan operation is performed to retrieve the
required data. If you place the mouse pointer over the Clustered Index Scan
icon, to bring up the ToolTip window, you will see that the clustered index
used was PK_Contact_ContactID and that the estimated number of rows involved in
the operation was 19972.
Indexes in SQL Server are stored in a B-tree (a series
of nodes that point to a parent). A clustered index not only stores the key
structure, like a regular index, but also sorts and stores the data, which is
the main reason why there can be only one clustered index per table.
As such, a clustered index scan is almost the same in
concept as a table scan. The entire index, or a large percentage of it, is
being traversed,row-by-row, in order to identify the data needed by the query.
An index scan often occurs, as in this case, when an
index exists but the optimizer determines that so many rows need to be returned
that it is quicker to simply scan all the values in the index rather than use
the keys provided by that index.
An obvious question to ask if you see an index scan in
your execution plan is whether you are returning more rows than is necessary.
If the number of rows returned is higher than you expect, that's a strong
indication that you need to fine-tune the WHERE clause of your query so that
only those rows that are actually needed are returned. Returning unnecessary
rows wastes SQL Server resources and hurts overall performance.
You will need to analyze the guidelines below for each
index scan in the query, trying to turn the scan into an index seek.
Let’s use the questions bellow to analyze the
predicate of the index scans in each query.
Is the predicate a ‘sarg’?
‘Sarg’ is an abbreviation for ‘Search Argument’, it’s a word created by Kalen Delaney to mean an argument that can use an index seek for the search. Not all predicate expressions are ‘sarg’s, and there are a few rules that sargs need to follow. Grant Fritchey explained two of the rules of a ‘sarg’ in his great article ‘The seven sins against TSQL Performance‘.
The predicate can appear in the WHERE clause, a filter over a field of the table that’s suffering the scan, or a join, in this case the JOIN expression is the predicate.
Making queries Sargable: First Rule: You can’t use a function in the left side (the field side) of the predicate. This is very well demonstrated in Grant’s article.
The following query disobeys this rule:
select TransactionID,ProductID,TransactionDate,Quantity,ActualCost
from bigtransactionhistory
where month(transactionDate)=6 and year(transactiondate)=2008
We can make this query better doing some changes to the predicate.
Just writing the same predicate in a new way we will solve this problem:
select TransactionID,ProductID,
TransactionDate,Quantity,ActualCost
from bigTransactionHistory
where transactionDate between '2008/06/01' and '2008/06/30'
As you can see in the new query plan below, the problem wasn’t
solved with this query yet, but we are in the right path.
Is the predicate a ‘sarg’?
‘Sarg’ is an abbreviation for ‘Search Argument’, it’s a word created by Kalen Delaney to mean an argument that can use an index seek for the search. Not all predicate expressions are ‘sarg’s, and there are a few rules that sargs need to follow. Grant Fritchey explained two of the rules of a ‘sarg’ in his great article ‘The seven sins against TSQL Performance‘.
The predicate can appear in the WHERE clause, a filter over a field of the table that’s suffering the scan, or a join, in this case the JOIN expression is the predicate.
Making queries Sargable: First Rule: You can’t use a function in the left side (the field side) of the predicate. This is very well demonstrated in Grant’s article.
The following query disobeys this rule:
select TransactionID,ProductID,TransactionDate,Quantity,ActualCost
from bigtransactionhistory
where month(transactionDate)=6 and year(transactiondate)=2008
select TransactionID,ProductID,
TransactionDate,Quantity,ActualCost
from bigTransactionHistory
where transactionDate between '2008/06/01' and '2008/06/30'
An interesting point is the fact that the scan is an
index scan, not a clustered index scan. This happens because the non-clustered
index is a covering index for this query.
A covering index is a non-clustered index that has all
the fields the query needs in its leaf level pages. When using a regular
non-clustered index that doesn’t cover the query, the database engine would
need to do an additional operation called ‘key lookup’. This means that, after
the data is found in the leaf level of the index, SQL Server needs to retrieve
from the clustered index any additional fields that are not present in the
non-clustered index.
We call an index ‘covering’ for a query when it has,
in the leaf level, all the fields needed for the query. The fields became part
of the index when they are part of the index key, they are either part of the
clustered key or they are included with ‘include’ keyword when the index was
created. A covering index doesn’t require key lookups and that’s better for
performance.
What’s important to notice in the above query plan is
the Index Scan element without a Key Lookup element, which denotes a covering
index.
In fact the index IX_ProductID_TransactionDate is
a composite index with ProductID and TransactionDate as the
keys and Quantity and ActualCost included. That’s all the
fields in the query, so the index become a covering index for this query, there
is no need of key lookups in the query.
We will see some examples later in this article
because the use of covering index is one of the solutions for the scan problem.
Joe Webb also talked about covering index in his great article ‘Using Covering
Indexes to improve query performance’.
Second Rule: The operators need to be of the same
type
Grant also covered this subject in his article, but
what could actually cause this kind of mistake?
The answer: ORMs. I’ve experienced cases where ORMs such as NHibernate
were creating ‘WHERE’ clauses that compared VARCHAR fields with NVARCHAR
values, resulting in the same problem. Therefore, the ORMs are potential point
of performance problems. I’m not saying to not use ORMs, I’m saying to be
careful, check the queries that are being generated and configure the ORM
correctly to avoid this mistake.

No comments:
Post a Comment