Pages

Tuesday, 2 April 2013

SCAN Count From MSDN Blog

SCAN COUNT meaning in SET STATISTICS IO output

As per Books Online, Scan Count is: Number of index or table scans performed.

During the course of this post, I shall point out examples and some thumb rules which will help you understand the Scan Count value better. I looked into the code and found that the Scan Count is calculated based on the number of scans started for fetching the resultant data set.

Thumb rules
1.       Scan count is 0 if the index that you are using is a unique index or clustered index on a primary key and you are seeking for only one value. Eg. WHERE Primary_Key_Column = <value>

2.       Scant count is 1 when you are searching for one value using a non-unique clustered index which is defined on a non-primary key column. This is done to check for duplicate values for the key value that you are searching for. Eg. WHERE Clustered_Index_Key_Column = <value>

3.       Scan count is N when N is the number of different seek/scan started towards the left or right side at the leaf level after locating a key value using the index key.

Script used to populate the data in the tables used in the examples below:

create table tbl2 (a int, b int)
declare @var int
set @var = 1
while @var <= 1000
begin
       insert into tbl2 values (@var,@var+1)
       set @var = @var + 1
end

create clustered index cidx_a on tbl2(a)
create index ncidx_b on tbl2(b)


create table tbl3 (a int primary key, b int)
declare @var int
set @var = 1
while @var <= 1000
begin
       insert into tbl3 values (@var,@var+1)
       set @var = @var + 1
end

create index ncidx_b on tbl3(b)

  
create table tbl4 (a int, b int)
declare @var int
set @var = 1
while @var <= 1000
begin
       insert into tbl4 values (1,@var+1)
       set @var = @var + 1
end

create clustered index cidx_a on tbl4(a)
create index ncidx_b on tbl4(b)


create table tbl5 (a int, b int)
declare @var int
set @var = 1
while @var <= 1000
begin
        insert into tbl5 values (@var,@var+1)
        set @var = @var + 1
end

create unique index cidx_a on tbl5(a)
create index ncidx_b on tbl5(b)


Tbl2 has two columns a, b with no duplicate values with a clustered index on a and a non-clustered index on b.

Tbl3 has two columns a, b with a clustered index on primary key column a and a non-clustered index on b.

Tbl4 has two columns a, b with duplicate values for column a with a clustered index on a and a non-clustered index on b.

Tbl5 has two columns a, b with no duplicate values for column a with a unique non-clustered index on a and a non-clustered index on b.

Scan count is 0 if the index that you are using is a unique index or clustered index on a primary key and you are seeking for only one value

The following queries show a scan count as 0:
1.       select * from tbl5 where a = 1
2.       select * from tbl3 where a = 1

Output: Table 'tbl3'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The reason for this is that you have a unique index defined on column a on tbl5 which tells the engine that there is only one value in the table that satisfies the criteria a=1. Similar logic applies, when you are using the primary key column to search for a value.

Scant count is 1 when you are searching for one value using a non-unique clustered index which is defined on a non-primary key column. This is done to check for duplicate values for the key value that you are searching for

The following queries show a scan count as 1:

1.       select * from tbl2 where a between 1 and 10

2.       select * from tbl3 where b between 1 and 10

3.       select * from tbl4 where a between 1 and 10

4.       select * from tbl3 with (index (ncidx_b)) where a between 1 and 10


Output: Table 'tbl4'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Since we are searching for values using the clustered index column and performing a scan/seek in one direction only after location the first value i.e. a = 1. This will be obvious from the query plan output of the above queries.

Scan count is N when N is the number of different seek/scan started towards the left or right side at the leaf level after locating a key value using the index key

If you look at the query below, you will find that the Scan Count is 2.

select * from tbl2 where a = 1 or a = 2

Output: Table 'tbl2'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

For all three tables, the same query above shows a Scan Count = 2. The reason can be found in the plan as it is an Ordered Forward Seek with two different key values. The more the number of OR clause entries, you will see the Scan Count increasing proportionately.

|--Clustered Index Seek(OBJECT:([AdventureWorks].[dbo].[tbl2].[cidx_a]), SEEK:([AdventureWorks].[dbo].[tbl2].[a]=(1) OR [AdventureWorks].[dbo].[tbl2].[a]=(2)) ORDERED FORWARD)

I hope that the above examples help clarify some misconceptions around Scan Count. The Scan Count is not necessarily the deterministic factor on deciding whether a query plan being used is bad or good from an IO standpoint. You should be looking at the Physical/Logical read values first when comparing the STATISTICS IO outputs.

NOTE: I have not yet covered all the different permutations and combinations for non-clustered indexes in this post. In case you have a specific question, please feel free to comment and I shall respond accordingly.

SCAN COUNT For Joins:
 
This time I am going to use the MSDB database and illustrate the Scan Count output for Inner Joins.

The query that I used was:

select * from msdb.dbo.backupfile a
inner join msdb.dbo.backupset b
on a.backup_set_id = b.backup_set_id
where a.backup_set_id = 1

The query plan snippet for the above is:

|--Clustered Index Seek(OBJECT:([msdb].[dbo].[backupset].[PK__backupse__21F79AAB0E391C95] AS [b]), SEEK:([b].[backup_set_id]=(1)) ORDERED FORWARD)

|--Clustered Index Seek(OBJECT:([msdb].[dbo].[backupfile].[PK__backupfi__57D1800A17C286CF] AS [a]), SEEK:([a].[backup_set_id]=(1)) ORDERED FORWARD)  

The scan count output is as follows:

Table 'backupfile'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'backupset'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The scan count on the table “backupset” is Zero because the ID used to search is 1. The backup_set_id column has a primary key defined on it. So using thumb rule #1, we can explain this. The scan count on the table “backupfile” is 1 because the primary key used for the seek is defined on two columns: backup_set_id and file_number. In this case, SQL Server needs to go beyond the one value that it found to ensure that there are no other rows with the seek value i.e. backup_set_id = 1. This behavior can be explained with thumb rule #2.


Now, I shall remove the filter and see what kind of surprises we get for Scan Count numbers! J
The snippet of the query plan is shown below. The Index Scan/Seek operations feed into a Nested Loop operation as before but the difference here is that there is no SEEK filter applied on the table “backupset”.

Query:
select * from msdb.dbo.backupfile a
inner join msdb.dbo.backupset b
on a.backup_set_id = b.backup_set_id

|--Clustered Index Scan(OBJECT:([msdb].[dbo].[backupset].[PK__backupse__21F79AAB0E391C95] AS [b]))                                                                                                               
|--Clustered Index

Seek(OBJECT:([msdb].[dbo].[backupfile].[PK__backupfi__57D1800A17C286CF] AS [a]), SEEK:([a].[backup_set_id]=[msdb].[dbo].[backupset].[backup_set_id] as [b].[backup_set_id]) ORDERED FORWARD)  

The Statistics IO output for the query is as follows:
Table 'backupfile'. Scan count 20, logical reads 42, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'backupset'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The Scan Count of One for the table backupset is self-explanatory because it started a scan from the first value stored in the clustered index page and went on scanning till the end to retrieve, in my case, 20 rows. This can very well be explained by thumb rule #2.

Now the surprising part is the Scan Count of 20 for the table “backupfile”. If you look at the SEEK condition, you will see that the JOIN criteria is used as the SEEK condition here. This means that for each row returned from the Clustered Index Scan operation on the “backupset” table, it will use that value to Seek for a matching row in the “backupfile” table. Since, the table has a primary key defined on two columns and we are joining only one of the primary key columns, each seek on a value returned from the “backupfile” table would result in a Scan Count 1 as SQL Server would need to determine if there are any other rows with the same value. Since I have 20 rows returned and the join condition needs to be evaluated for all these rows, the scan count is 20*1=20.

If I change the query to the following:

select * from msdb.dbo.backupfile a
inner join msdb.dbo.backupset b
on a.backup_set_id = b.backup_set_id
where a.file_number = 1

The Statistics IO output changes to the following:
Table 'backupfile'. Scan count 0, logical reads 40, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'backupset'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The difference in the query plan now is that SEEK condition for the clustered index seek on the “backupfile” table changes to:

SEEK:([a].[backup_set_id]=[msdb].[dbo].[backupset].[backup_set_id] as [b].[backup_set_id] AND [a].[file_number]=(1.))

Even though we are evaluating the condition for all the 20 rows returned by the “backupset” table, we find that there is an additional filter in the SEEK condition i.e. the file_number column, for the data fetched from the “backupfile” table. This changes the scan count to Zero. The reason for this is that the SQL instance now knows that it has to fetch only one row for the value given in the filter as both columns present in the composite primary key are being used during the SEEK. Now SQL knows prior to the fetch that there cannot be duplicate rows, so no extra scan/seek is required.
This is a gotcha that we might overlook when a clustered index seek is being used! Since, we are dealing with a small set of rows, the plan changes may not be significant but for larger tables, such a difference in evaluation may have a significant impact on the query plan and ultimately the performance of the query.


No comments:

Post a Comment