Pages

Thursday, 30 August 2012

ON Clause vs Where Clause in OUTER Joins

Source:Inside Ms SQL server 2008 Tsql querying Book

Let's consider a #customer table with two columns.

CREATE TABLE #Customers
(
customerid CHAR(5) NOT NULL PRIMARY KEY,
country VARCHAR(10) NOT NULL
);

The command(s) completed successfully.

Let's consider another #orders table with two columns.

CREATE TABLE #Orders
(
orderid INT NOT NULL PRIMARY KEY,
customerid CHAR(5) NULL REFERENCES Customers(customerid)
);
Result set:
Skipping FOREIGN KEY constraint '#Orders' definition for temporary table.

we can't create foreign key for temporary table.Eventhough we don't have foreign key on #orders table,we are inserting valid data[the valid set of data values present in the primary key column of table #customer(customerid)] into the foreign key column.

Data inserted into #customers table.

INSERT INTO #Customers(customerid, country) VALUES('cust1', 'INDIA');
INSERT INTO #Customers(customerid, country) VALUES('cust2', 'UK');
INSERT INTO #Customers(customerid, country) VALUES('cust3', 'USA');
INSERT INTO #Customers(customerid, country) VALUES('cust4', 'CANADA');
INSERT INTO #Customers(customerid, country) VALUES('cust5', 'RUS');

Data inserted into #orders table.

INSERT INTO #Orders(orderid, customerid) VALUES(1, 'cust1');
INSERT INTO #Orders(orderid, customerid) VALUES(2, 'cust2');
INSERT INTO #Orders(orderid, customerid) VALUES(3, 'cust3');
INSERT INTO #Orders(orderid, customerid) VALUES(4, 'cust4');
INSERT INTO #Orders(orderid, customerid) VALUES(5, 'cust1');
INSERT INTO #Orders(orderid, customerid) VALUES(6, 'cust2');
INSERT INTO #Orders(orderid, customerid) VALUES(7, NULL);

select * from #customers;
customerid country   
---------- ----------
cust1      INDIA
cust2      UK
cust3      USA
cust4      CANADA
cust5      RUS
(5 row(s) affected)

select * from #orders;
orderid     customerid
----------- ----------
1           cust1
2           cust2
3           cust3
4           cust4
5           cust1
6           cust2
7           NULL
(7 row(s) affected)

Now Simply we are checking which list of customers have orders and who dosn't have order.

select c.customerid,c.country,o.orderid,o.customerid
from #customers c left outer join #orders o on c.customerid=o.customerid
customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust1      INDIA           1           cust1
cust1      INDIA           5           cust1
cust2      UK                2           cust2
cust2      UK                6           cust2
cust3      USA              3           cust3
cust4      CANADA     4           cust4
cust5      RUS            NULL        NULL
(7 row(s) affected)

Cust1,cust2,cust3,cust4 have orders but only the cust5 doesn't have orders.

For the list of customers who had order:

select c.customerid,c.country,o.orderid,o.customerid
from #customers c Inner join #orders o on c.customerid=o.customerid
customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust1      INDIA      1           cust1
cust2      UK         2           cust2
cust3      USA        3           cust3
cust4      CANADA     4           cust4
cust1      INDIA      5           cust1
cust2      UK         6           cust2
(6 row(s) affected)

For the list of customers who doesn't have order:

select c.customerid,c.country,o.orderid,o.customerid
from #customers c left outer join #orders o on c.customerid=o.customerid
where orderid is null

customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust5      RUS        NULL        NULL
(1 row(s) affected)

Now let's see the difference between ON clause and WHERE clause

My Requirement here is: we need to select customerid,country and orderid details having country "INDIA" only

My first attempt is:

select c.customerid,c.country,o.orderid,o.customerid
from #customers c left outer join #orders o on c.customerid=o.customerid and c.country='INDIA'

Result set:
customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust1      INDIA      1           cust1
cust1      INDIA      5           cust1
cust2      UK         NULL        NULL
cust3      USA        NULL        NULL
cust4      CANADA     NULL        NULL
cust5      RUS        NULL        NULL
(6 row(s) affected)

A confusing aspect of queries containing an OUTER JOIN clause is whether to specify a logical
expression in the ON fi lter or in the WHERE fi lter.

The main difference between the two is that ON is applied before adding outer rows, while WHERE is applied afterwards.

An elimination of a row from the preserved table by the ON fi lter is not final because the rows from preserved table again will add it back as an outer rows;

An elimination of a row by the WHERE fi lter, by contrast, is fi nal. Bearing this in mind should help you make the right choice.

what's happening in above query.

step1:cross join is done between two tables.
Result set is:
customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust1      INDIA      1           cust1
cust1      INDIA      2           cust2
cust1      INDIA      3           cust3
cust1      INDIA      4           cust4
cust1      INDIA      5           cust1
cust1      INDIA      6           cust2
cust1      INDIA      7           NULL
cust2      UK         1           cust1
cust2      UK         2           cust2
cust2      UK         3           cust3
cust2      UK         4           cust4
cust2      UK         5           cust1
cust2      UK         6           cust2
cust2      UK         7           NULL
cust3      USA        1           cust1
cust3      USA        2           cust2
cust3      USA        3           cust3
cust3      USA        4           cust4
cust3      USA        5           cust1
cust3      USA        6           cust2
cust3      USA        7           NULL
cust4      CANADA     1           cust1
cust4      CANADA     2           cust2
cust4      CANADA     3           cust3
cust4      CANADA     4           cust4
cust4      CANADA     5           cust1
cust4      CANADA     6           cust2
cust4      CANADA     7           NULL
cust5      RUS        1           cust1
cust5      RUS        2           cust2
cust5      RUS        3           cust3
cust5      RUS        4           cust4
cust5      RUS        5           cust1
cust5      RUS        6           cust2
cust5      RUS        7           NULL
(35 row(s) affected)

Step2:ON Clause is applied on the resultset coming from previous step.(step1)

Here the logical expression specified in ON clause is on c.customerid=o.customerid and c.country='INDIA'

This condition will check on the resultset came from cross join.
customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust1      INDIA      1           cust1
cust1      INDIA      5           cust1

So only two records satisfy two conditions[c.customerid=o.customerid and c.country='INDIA'] put in the
logical expression from the resultset came from cross join.

Step3:Outer rows will add
In this step all the rows from the preserved table will add and if condition is satisfy coresponding column values will add from non preserved table else( for unmatched rows) NULL values will add from the NON preserved table.

customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust1      INDIA      1           cust1
cust1      INDIA      5           cust1
cust2      UK         NULL        NULL
cust3      USA        NULL        NULL
cust4      CANADA     NULL        NULL
cust5      RUS        NULL        NULL
(6 row(s) affected)

If we specify the filter condition in the IN clause OUTER rows will add after ON filter applied.

So this is not our expected result,we need only records with country name "INDIA" but with this query we are getting other country records also.

Second attempt:

select c.customerid,c.country,o.orderid,o.customerid
from #customers c left outer join #orders o on c.customerid=o.customerid
where c.country='INDIA'

customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust1      INDIA      1           cust1
cust1      INDIA      5           cust1
(2 row(s) affected)

step1:cross join is done between two tables.
we can see the result set from previous first attempt step1.

Step2:ON Clause is applied on the resultset coming from previous step.(step1)
Here the logical expression specified in ON clause is on c.customerid=o.customerid
This condition will check on the resultset came from cross join.
customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust1      INDIA      1           cust1
cust1      INDIA      5           cust1
cust2      UK         2           cust2
cust2      UK         6           cust2
cust3      USA        3           cust3
cust4      CANADA     4           cust4
cust5      RUS        NULL        NULL
(7 row(s) affected)

Except one record with customer_id cust5 all are satisfy this join condition and the result set is as shown above.

Step3:Outer rows will add
In this step all records from the preserved table will add and if condition is satisfy coresponding column values will add from non preserved table else( for unmatched rows) NULL values will add from the NON preserved table.

customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust1      INDIA      1           cust1
cust1      INDIA      5           cust1
cust2      UK         2           cust2
cust2      UK         6           cust2
cust3      USA        3           cust3
cust4      CANADA     4           cust4

Step4:
Now the where clause filter condition will apply on the result set came from previous step.
where country='INDIA'

only two records will come

customerid country    orderid     customerid
---------- ---------- ----------- ----------
cust1      INDIA      1           cust1
cust1      INDIA      5           cust1

Yes this is our expected result.

The point we need to remember is
If we specify the logical expression in the IN clause OUTER rows will add after ON filter applied.
where as if we specify in where clause filter will apply after adding outer rows.

If your intention is to get all rows from preserved table and matched and unmatched records(with NULL) from non preserver table,you should keep filter condition in ON clause.

Thank you...

No comments:

Post a Comment