Pages

Wednesday, 21 February 2018

Set Operators


Set Operators (Transact-SQL)

SQL Server provides the following set operators. Set operators combine results from two or more queries into a single result set.

UNION
Combines the results of two or more queries into a single result set that includes all the rows that belong to all queries in the union. The UNION operation is different from using joins that combine columns from two tables.

The following are basic rules for combining the result sets of two queries by using UNION:
·         Each query must have the same number of columns
·         Each column must have compatible data types
·         Column names for the final result set are taken from the first query
·         ORDER BY and COMPUTE clauses can only be issued for the overall result set and not within each individual result set
·         GROUP BY and HAVING clauses can only be issued for each individual result set and not for the overall result set
·         SELECT INTO clause can only be issued for first result set not for the overall result set.

Syntax:
-- Syntax for SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse
 
    {<query_specification> | ( <query_expression> ) }  
    UNION [ ALL ]  
    <query_specification | ( <query_expression> )  
    [ UNION [ ALL ] <query_specification> | ( <query_expression> )  
    [ ...n ] ]  

<query_specification> | (<query_expression>)
Is a query specification or query expression that returns data to be combined with the data from another query specification or query expression. The definitions of the columns that are part of a UNION operation do not have to be the same, but they must be compatible through implicit conversion. When data types differ, the resulting data type is determined based on the rules for 
data type precedence. When the types are the same but differ in precision, scale, or length, the result is determined based on the same rules for combining expressions.

Columns of the xml data type must be equivalent. All columns must be either typed to an XML schema or untyped. If typed, they must be typed to the same XML schema collection.

UNION
specifies that multiple result sets are to be combined and returned as a single result set.

ALL
incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.

Using SELECT INTO with UNION

SELECT statement specifies that the table named ProductResults holds the final result set of the union of the designated columns of the ProductModel and Gloves tables.
-- Uses AdventureWorks    
SELECT ProductModelID, Name  
INTO dbo.ProductResults  
FROM Production.ProductModel  
WHERE ProductModelID NOT IN (3, 4)  
UNION  
SELECT ProductModelID, Name  
FROM dbo.Gloves;  
GO  

Using UNION of two SELECT statements with ORDER BY

/* INCORRECT */  
-- Uses AdventureWorks  
  
SELECT ProductModelID, Name  
FROM Production.ProductModel  
WHERE ProductModelID NOT IN (3, 4)  
ORDER BY Name  
UNION  
SELECT ProductModelID, Name  
FROM dbo.Gloves;  
GO  
  
/* CORRECT */  
-- Uses AdventureWorks  
  
SELECT ProductModelID, Name  
FROM Production.ProductModel  
WHERE ProductModelID NOT IN (3, 4)  
UNION  
SELECT ProductModelID, Name  
FROM dbo.Gloves  
ORDER BY Name;  
GO  


EXCEPT and INTERSECT
Returns distinct rows by comparing the results of two queries.

EXCEPT returns distinct rows from the left input query that aren’t output by the right input query.

INTERSECT returns distinct rows that are output by both the left and right input queries operator.

The basic rules for combining the result sets of two queries that use EXCEPT or INTERSECT are the following:
  • The number and the order of the columns must be the same in all queries.
  • The data types must be compatible.

-- Syntax for SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse  
  
{ <query_specification> | ( <query_expression> ) }   
{ EXCEPT | INTERSECT }  
{ <query_specification> | ( <query_expression> ) }  

Arguments

<query_specification> | ( <query_expression> )
Is a query specification or query expression that returns data to be compared with the data from another query specification or query expression. The definitions of the columns that are part of an EXCEPT or INTERSECT operation do not have to be the same, but they must be comparable through implicit conversion. When data types differ, the type that is used to perform the comparison and return results is determined based on the rules for
data type precedence.

When the types are the same but differ in precision, scale, or length, the result is determined based on the same rules for combining expressions.

The query specification or expression cannot return xml, text, ntext, image, or non binary CLR user-defined type columns because these data types are not comparable.

EXCEPT
Returns any distinct values from the query to the left of the EXCEPT operator that are not also returned from the right query.

INTERSECT
Returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operator.

Remarks

When the data types of comparable columns that are returned by the queries to the left and right of the EXCEPT or INTERSECT operators are character data types with different collations, the required comparison is performed according to the rules of collation precedence. If this conversion cannot be performed, the SQL Server Database Engine returns an error.

When comparing column values for determining DISTINCT rows, two NULL values are considered equal.

The column names of the result set that are returned by EXCEPT or INTERSECT are the same names as those returned by the query on the left side of the operator.

Column names or aliases in ORDER BY clauses must reference column names returned by the left-side query.

The nullability of any column in the result set returned by EXCEPT or INTERSECT is the same as the nullability of the
corresponding column that is returned by the query on the left side of the operator.

If EXCEPT or INTERSECT is used together with other operators in an expression, it is evaluated in the context of the following precedence:
1.      Expressions in parentheses
2.      The INTERSECT operator
3.      EXCEPT and UNION evaluated from left to right based on their position in the expression

If EXCEPT or INTERSECT is used to compare more than two sets of queries, data type conversion is determined by comparing two queries at a time, and following the previously mentioned rules of expression evaluation.

EXCEPT and INTERSECT cannot be used in distributed partitioned view definitions, query notifications.

EXCEPT and INTERSECT may be used in distributed queries, but are only executed on the local server and not pushed to the linked server. Therefore, using EXCEPT and INTERSECT in distributed queries may affect performance.
Fast forward-only and static cursors are fully supported in the result set when they are used with an EXCEPT or INTERSECT operation. If a keyset-driven or dynamic cursor is used together with an EXCEPT or INTERSECT operation, the cursor of the result set of the operation is converted to a static cursor.

When an EXCEPT operation is displayed by using the Graphical Showplan feature in SQL Server Management Studio, the operation appears as a left anti semi join, and an INTERSECT operation appears as a left semi join.


Tip
If you don't have the exact same columns in all queries use a default value or a NULL value such as:

SELECT firstName, lastName, company FROM businessContacts
UNION
SELECT firstName, lastName, NULL FROM nonBusinessContacts

or

SELECT firstName, lastName, createDate FROM businessContacts
UNION ALL
SELECT firstName, lastName, getdate() FROM nonBusinessContacts


Performance Considerations:
UNION vs. UNION ALL Examples

Let's take a look at a few simple examples of how these commands work and how they differ. As you will see the final result sets will differ, but there is some interesting info on how SQL Server actually completes the process.

UNION ALL
In this first example we are using the UNION ALL operator against the Employee table from the AdventureWorks database. This is probably not something you would do, but this helps illustrate the differences of these two operators.

There are 290 rows in table dbo. Employee.

SELECT * FROM HumanResources.Employee
UNION ALL
SELECT * FROM HumanResources.Employee
UNION ALL
SELECT * FROM HumanResources.Employee

When this query is run the result set has 870 rows. This is the 290 rows returned 3 times. The data is just put together one dataset on top of the other dataset.

Here is the execution plan for this query. We can see that the table was queried 3 times and SQL Server did a Concatenation step to concatenate all of the data.


UNION
In this next example we are using the UNION operator against the Employee table again from the AdventureWorks database.

SELECT * FROM HumanResources.Employee
UNION
SELECT * FROM HumanResources.Employee
UNION
SELECT * FROM HumanResources.Employee

When this query is run the result set has 290 rows. Even though we combined the data three times the UNION operator removed the duplicate records and therefore returns just the 290 unique rows. 

Here is the execution plan for this query. We can see that SQL Server first queried 2 of the tables, then did a Merge Join operation to combine the first two tables and then it did another Merge Join along with querying the third table in the query. So we can see there was much more worked that had to be performed to get this result set compared to the UNION ALL.



UNION vs. UNION ALL Examples With Sort on Clustered Index Column

If we take this a step further and do a SORT of the data using the Clustered Index column we get these execution plans. From this we can see that the execution plan that SQL Server is using is identical for each of these operations even though the final result sets will still contain 870 rows for the UNION ALL and 290 rows for the UNION. 

UNION ALL

UNION


UNION vs. UNION ALL Examples With Sort on Non-indexed Column

Here is another example doing the same thing, but this time doing a SORT on a non indexed column. As you can see the execution plans are again identical for these two queries, but this time instead of using a MERGE JOIN, a CONCATENATION and SORT operations are used. 

UNION ALL

UNION


No comments:

Post a Comment