Pages

Tuesday, 17 July 2012

GROUP BY VS GROUP BY ALL IN MS SQL SERVER

By using the GROUP BY ALL option, you can create groups even if they're empty.

In other words,your query returns groups of zero rows even when the WHERE clause filters out all the group's rows.

We will look into this more elobarately,Initially we are creating an temporary table.

create table #group_all_test
(empno tinyint ,
empname  varchar(10)
);


Inserting data into the temporary table.

insert into #group_all_test values (1,'grant')
insert into #group_all_test values (2,'grant')
insert into #group_all_test values (3,'fritchey')
insert into #group_all_test values (4,'fritchey')
insert into #group_all_test values (5,'fritchey')
insert into #group_all_test values (6,'itzik')
insert into #group_all_test values (7,'ben-gan')

Querying the temporary table.

select * from #group_all_test

empno empname   
----- ----------
1     grant
2     grant
3     fritchey
4     fritchey
5     fritchey
6     itzik
7     ben-gan

(7 row(s) affected)

The following query group the employees by empname and return the empname,count of total number of employees having the same name and order by the count.

select empname,count(*) cnt
from #group_all_test
group by empname
order by cnt

Result set:

empname    cnt        
---------- -----------
ben-gan     1
itzik           1
grant         2
fritchey      3
(4 row(s) affected)

Now we just include the filtering condition in our above query

The condition is we should return the the empname,count of total number of employees having the same name order by the count only for empno between 1 and 4.

we just modify the above query as

select empname,count(*) cnt
from #group_all_test
where empno >=1 and empno <= 4
group by empname
order by cnt

Result set:
empname    cnt        
---------- -----------
fritchey   2
grant      2
(2 row(s) affected)

If we observe the above result set the empname records "Itzik"  and "ben-gan" are missed out because these records have empno 6 and 7,hence these records are filtered out by where condition.and then group by clause applied to group the condition satisfied records.

The GROUP BY clause operates only on rows that remain after you apply the WHERE filter.

However, by adding the ALL option to the GROUP BY clause, you can generate empty groups—or groups of zero rows.

Now let's add ALL to the group by clause

select empname,count(*) cnt
from #group_all_test
where empno >=1 and empno <= 4
group by ALL empname
order by cnt

Result set:
empname    cnt        
---------- -----------
ben-gan    0
itzik          0
fritchey    2
grant       2

(4 row(s) affected)

Now the above result set contain the empname records "Itzik"  and "ben-gan" with count of "0"

The group by All clause generate groups for all the column values mentioned in the group by all clause.even it may be the zero count ot not..

Performance Considerations:

OPTIMAL GROUP BY CLAUSES

The GROUP BY performs better if you keep the number of grouping
Columns small. One way you can do so is to avoid grouping redundant columns, as in this

Example:
SELECT secondary_key_column, primary_key_column, COUNT (*)
FROM Table1
GROUP BY secondary_key_column, primary_key_column

Because primary key columns are unique and may not contain NULL by definition, the mention of
secondary_key_column in this example is redundant.

The problem is that if you take secondary_key_column out of the GROUP BY clause, you'll get an error message. All DBMSs except MySQL and Sybase will tell you that you can't have secondary_key_column in the select list if it's not also in the GROUP BY list.

This is how to write a query that's legal and that's faster:

SELECT MIN (secondary_key_column), primary_key_column, COUNT(*)
FROM Table1
GROUP BY primary_key_column

GAIN: 4/7

WARNING
Don't do this for Ingres; it shows a loss. The gain shown is for only seven DBMSs.


REDUCE BEFORE YOU EXPAND

GROUP BY tends to reduce row counts and JOIN tends to expand row counts.

Because a DBMS must evaluate FROM and WHERE clauses before GROUP BY clauses, this tip is not easy to put into practice, but there is a way.

You can make a join happen late by replacing it with a set operator. (The SQL Standard set operators are UNION, EXCEPT, and INTERSECT.)

For example, replace Statement #1 with Statement #2:

Statement #1:

SELECT SUM (Table1.column2), SUM (Table2.column2)
FROM Table1 INNER JOIN Table2
ON Table1.column1 = Table2.column1
GROUP BY Table1.column1

Statement #2:
SELECT column1, SUM (column2), 0
FROM Table1
GROUP BY column1

INTERSECT
SELECT column1, 0, SUM (column2)
FROM Table2
GROUP BY column1

GAIN: 2/2

Portability
Informix, Ingres, InterBase, Microsoft, MySQL, and Sybase don't support INTERSECT.

The gain shown is for only two DBMSs.

HAVING

Most DBMSs do not merge WHERE and HAVING clauses.

This means the following statements are logically the same but won't run at the same speed:

Query with WHERE and HAVING:

SELECT column1 FROM Table1
WHERE column2 = 5
GROUP BY column1
HAVING column1 > 6

Query with WHERE only:

SELECT column1 FROM Table1
WHERE column2 = 5
AND column1 > 6
GROUP BY column1
GAIN: 3/8

The "Query with WHERE only" runs faster on three of the Big Eight.

You should use this type of query except in the rare cases where you need to defer the filtering implied by column1 > 6—for example, if the comparison is hard to evaluate.

ALTERNATIVES TO GROUP BY

If you're writing a query that doesn't involve set functions, you can use DISTINCT as an alternative to GROUP BY.

DISTINCT has three advantages:

It's simpler,
It is legal to use in expressions,
And—with some DBMSs—it's faster.

So instead of using Query #1, use the alternative Query #2:

Query #1:
SELECT column1
FROM Table1
GROUP BY column1

Query #2:
SELECT DISTINCT column1
FROM Table1
GAIN: 4/8

No comments:

Post a Comment