Pages

Monday, 3 September 2012

All About AVG Aggregate Function.

AVG:

Returns the average of the values in a group. Null values are ignored.

Syntax:

AVG ( [ ALL | DISTINCT ] expression )

Arguments
----------

ALL

Applies the aggregate function to all values. ALL is the default.NULL values are ignored.


DISTINCT

Specifies that AVG be performed only on each unique instance of a value, regardless of how many times the value occurs.

Expression

Is an expression of the exact numeric or approximate numeric data type category,except for the bit data type.

Aggregate functions and subqueries are not permitted.

Unlike MIN and MAX, AVG can only accept a numeric expression as a parameter.

Return Types
----------------
The return type is determined by the type of the evaluated result of expression.

Expression result                         Return type

tinyint                                                   int

smallint                                                 int

int                                                         int

bigint                                                     bigint

decimal category (p, s)                          decimal(38, s) divided by decimal(10, 0)

money and smallmoney category             money

float and real category                            float


Remarks
----------

If the data type of expression is an alias data type, the return type is also of the alias data type.

However, if the base data type of the alias data type is promoted,
for example from tinyint to int, the return value is of the promoted data type and not the alias data type.

AVG () computes the average of a set of values by dividing the sum of those values by the count of nonnull values. If the sum exceeds the maximum value for the data type of the return value an error will be returned.

Points to remember:

1.Approxmate data types float or real does't return exact numeric values,to return exact numeric value use the exact numeric data types.

2.The Return type depends on the argument data type,If we have int as a data type of an argument and
we need exact value then we should cast the individual argument values.

3.The average aggregate function can not take argument as a bit Data type column.

Test1:
ALL
Applies the aggregate function to all values. ALL is the default.NULL values are ignored.

select * from #msora

id          name                          
----------- ------------------------------
1           ms avg 1
2           ms avg 2
3           ms avg 3
4           ms avg 4
NULL        ms ora null test


select avg(cast(id as numeric(5,2))) as average from #msora

average                                 
----------------------------------------
2.500000

From the above query it's clear the AVG function ignore the NULL values.

Test 2:

DISTINCT

Specifies that AVG be performed only on each unique instance of a value, regardless of how many times the value occurs.

insert into #msora values(1,'ms ora dup test');

command executed successfully.

select * from #msora;

id          name                          
----------- ------------------------------
1           ms avg 1
2           ms avg 2
3           ms avg 3
4           ms avg 4
NULL        ms ora null test
1           ms ora dup test


select avg(distinct cast(id as numeric(5,2))) as average from #msora

average                                 
----------------------------------------
2.500000

From the above result set it's clear distinct clause performed only on each unique instance of a value.

Test3:
Expression

Is an expression of the exact numeric or approximate numeric data type category,except for the bit data type.
Aggregate functions and subqueries are not permitted.

Example:

select avg(sum(id)) average from #msora

Server: Msg 130, Level 15, State 1, Line 1
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.


select avg((select id from #msora)) average from #msora

Server: Msg 130, Level 15, State 1, Line 1
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.

We need to write subqueries as a derived table.

select avg(id)
from
(select id,name from #msora)

Server: Msg 170, Level 15, State 1, Line 3
Line 3: Incorrect syntax near ')'.

Here the problem is purely a lack of a name for your derived table / nested subquery.

We need to write sub queries as a derived table with the alias name.

select avg(id) as average
from
(select id,name from #msora)  a

average           
-----------
2

Test4:

Expression

Is an expression of the exact numeric or approximate numeric data type category,except for the bit data type.

Bit Data type Description:
SQL Server 2000
Integer data type 1, 0, or NULL.

Remarks
Microsoft® SQL Server™ optimizes the storage used for bit columns.

If there are 8 or fewer bit columns in a table, the columns are stored as 1 byte.
If there are from 9 through 16 bit columns, they are stored as 2 bytes, and so on.

create table #msora
(
id bit null,
name varchar(30) null
)

if we insert any value other than 0 or NUll automatically 1 will insert into that BIT data type column.

insert into #msora values(0,'ms ora apprx data type test');
insert into #msora values(5,'ms ora apprx data type test');
insert into #msora values(-65,'ms ora apprx data type test');
insert into #msora values(-65,'ms ora apprx data type test');


select * from #msora;

id   name                          
---- ------------------------------
1    ms ora apprx data type test
1    ms ora apprx data type test
1    ms ora apprx data type test
0    ms ora apprx data type test


select avg(id) as average from #msora

Server: Msg 409, Level 16, State 2, Line 1
The average aggregate operation cannot take a bit data type as an argument.

Test5:

The return type is determined by the type of the evaluated result of expression.

create table #msora
(
id tinyint null,
name varchar(30) null
)

insert into #msora values(0,'ms ora apprx data type test');
insert into #msora values(255,'ms ora apprx data type test');
insert into #msora values(50,'ms ora apprx data type test');

select * from #msora;

id   name                          
---- ------------------------------
0    ms ora apprx data type test
255  ms ora apprx data type test
50   ms ora apprx data type test

Smaple code:

set showplan_text on;

declare
@var tinyint;
select @var=avg(id)  from #msora;
print @var;


StmtText                                                   
-----------------------------------------------------------
declare
@var tinyint;
select @var=avg(id)  from #msora;

(1 row(s) affected)

StmtText                                                                                                                                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  |--Compute Scalar(DEFINE:([Expr1003]=Convert([Expr1002])))
       |--Compute Scalar(DEFINE:([Expr1002]=If ([Expr1004]=0) then NULL else ([Expr1005]/Convert([Expr1004]))))
            |--Stream Aggregate(DEFINE:([Expr1004]=COUNT_BIG([#msora].[id]), [Expr1005]=SUM([#msora].[id])))
                 |--Table Scan(OBJECT:([tempdb].[dbo].[#msora___________________________________________________________________________________00000003F05E]))

(4 row(s) affected)

StmtText         
-----------------

print @var;

(1 row(s) affected)


if you look at the final step "Compute Scalar(DEFINE:([Expr1003]=Convert([Expr1002])))" this repragents convert from one data type to another

we declared variable as tinyint.

The argument of an AVG function is also tinyint.

Then why data type conversion was done here?

Because if AVG function argument has data type tinyint then it return result set in INT data type.

we can see same for smallint data type also.

if our varible has INT data type then the final step won't come,because there is no need of data type conversion.

declare
@var int;
select @var=avg(id)  from #msora;
print @var;

The table column ID may be any one of the data type(tinyint,smallint,int)

StmtText                                               
-------------------------------------------------------
declare
@var int;
select @var=avg(id)  from #msora;

(1 row(s) affected)

StmtText                                                                                                                                                                              
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  |--Compute Scalar(DEFINE:([Expr1002]=If ([Expr1003]=0) then NULL else ([Expr1004]/Convert([Expr1003]))))
       |--Stream Aggregate(DEFINE:([Expr1003]=COUNT_BIG([#msora].[id]), [Expr1004]=SUM([#msora].[id])))
            |--Table Scan(OBJECT:([tempdb].[dbo].[#msora______________________________________________________________________________________________________________00000003F05E]))

(3 row(s) affected)

StmtText           
-------------------

print @var;

(1 row(s) affected)

Now there is no data type conversion step for variable assignment.

Test 6:

Unlike MIN and MAX, AVG can only accept a numeric expression as a parameter.

select * from #msora;

id   name                          
---- ------------------------------
0    ms ora apprx data type test
255  ms ora apprx data type test
50   ms ora apprx data type test


select avg(name) as average from #msora

Server: Msg 409, Level 16, State 2, Line 1
The average aggregate operation cannot take a varchar data type as an argument.

Test 7:

AVG () computes the average of a set of values by dividing the sum of those values by the count of nonnull values.

If the sum exceeds the maximum value for the data type of the return value an error will be returned.

create table #msora
(
id int null,
name varchar(30) null
)


insert into #msora values(2147483647,'ms ora range test');
insert into #msora values(10,'ms ora range test');

Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647).

select avg(id) as average from #msora

average    
-----------
NULL

Arithmetic overflow occurred.


Case Study1:(The return type of an function depends on argument expression data type)

Let's consider a table:

create table #msora
(
id int null,
name varchar(30) null
)

The command(s) completed successfully.

We are inserting sample data into the table..

set nocount on
declare
@var int
set @var=1
while(@var<5)
begin
insert into #msora values(@var,'ms avg '+cast(@var as varchar(10)))
set @var=@var+1
end

select * from #msora

id          name                          
----------- ------------------------------
1           ms avg 1
2           ms avg 2
3           ms avg 3
4           ms avg 4


We need to determine the average of an column ID acurately with two decimals.

select avg(id) as average from #msora

average    
-----------
2

If we caluclate manually the average of an column ID is 10/4 =2.5

But from the above query we are getting 2.

Because return type of an AVG function is Based on data type of an argument expression.

Here the argument column is ID with Int data type,So the result set is also an Int data type,So we are getting 2 instead of 2.5

Solution is:

we can CAST the data type of an argument column (ID) to the Exact numeric data type NUMERIC(5,2) or decimal then we will get exact value with two decimals
as shown below...

select avg(cast(id as numeric(5,2))) as average from #msora

average                                 
----------------------------------------
2.500000


Another solution is we can multiply with (one dot)implicit data type conversion.

select avg(1.*id) as average from #msora

average                                 
----------------------------------------
2.500000

Note that although the AVG() function returns a value of the same data type as its argument family, the value might not be expressed
with the same degree of accuracy as the argument.

For example, the earlier explicit and implicit conversions return a result to 6 decimal places.

You can cast the result of the AVG() function to 2 decimal places by explicitly using the CAST function as follows:

select cast(avg(1.*id) as numeric(5,2)) as average from #msora

average
-------
2.50

Another solution is if we have data type of column ID is either Numeric(p,s) or decimal then we will get exact value as result set.

But the first and second approach is better than the changes in data model.

This attempt is not usefull because we are trying to cast result set after caluclating the average of an column values,Hence we got 2.00 as an result set.

select cast(avg(id) as numeric(5,2)) as average from #msora

average
-------
2.00


Case Study 2:(Approxmate data type does't return exact numeric values)

Let's consider an test table with two columns,first column ID having approxmate data type FLOAT.

create table #msora
(
id float null,
name varchar(30) null
)

The command(s) completed successfully.

Inserting data into table.

set nocount on
declare
@var int
set @var=1
while(@var<5)
begin
insert into #msora values(@var,'ms avg '+cast(@var as varchar(10)))
set @var=@var+1
end

insert into #msora values(3,'ms ora apprx data type test');

select * from #msora

id                                                    name                          
----------------------------------------------------- ------------------------------
1.0                                                   ms avg 1
2.0                                                   ms avg 2
3.0                                                   ms avg 3
4.0                                                   ms avg 4
3.0                                                   ms ora apprx data type test

select avg(id) as average from #msora

average                                              
-----------------------------------------------------
2.6000000000000001

If we caluclate the average manually 13.0/5=2.6 should come but here we are getting 2.6000000000000001,

Because of the approximate nature of the float and real data types.

Do not use these data types when exact numeric behavior is required, such as in financial applications, in operations involving rounding,
or in equality checks. Instead, use the integer, decimal, money, or smallmoney data types.


Thanks...

No comments:

Post a Comment