Pages

Tuesday, 30 August 2016

subquery


A subquery is an embedded SQL statement within an outer query. The subquery provides an answer to the outer query in the form of a scalar value, a list of values, or a data set, and may be substituted for an expression, list, or table, respectively, within the outer query. The matrix of subquery types and SELECT statement usage is shown in Table 11-1. Traditionally, a subquery may only contain a SELECT query and not a data-modification query, which explains why subqueries are sometimes referred to as subselects.

Five basic forms are possible when building a subquery, depending on the data needs and your favored syntax:

Simple subquery: The simple subquery can be a stand-alone query and can run by itself. It is executed once, with the result passed to the outer query. Simple subqueries are constructed as normal SELECT queries and placed within parentheses.

Common table expression (CTE): CTEs are a syntactical variation of the simple subquery, similar to a view, which defines the subquery at the beginning of the query using the WITH command. The CTE can then be accessed multiple times within the main query as if it were a view or derived table.

Correlated subquery: This is similar to a simple subquery except that it references at least one column in the outer query, so it cannot run separately by itself. Conceptually, the outer query runs first and the correlated subquery runs once for every row in the outer query. Physically, the Query Optimizer is free to generate an efficient query execution plan.

Row constructor: A VALUES clause or the FROM clause that supplies hard-coded values as a subquery

Composable SQL: The ability to pass data from an INSERT, UPDATE, or DELETE statement’s output clause to an outer query



Expression
Subquery returns a scalar
value
List
Subquery returns
a list of values
Data Set
Subquery returns
a multi-column
data source
Any expression
e.g., SELECT list,
HAVING clause,
GROUP BY, JOIN ON, etc.
The subquery result is
used as an expression
supplying the value for
the column. If the result
is empty, NULL is used
instead.
X
X
Derived Table
FROM (data source) AS ALIAS
or
WITH CTE
This is the only location where a
subquery can use a table alias
The subquery’s data set
is accepted as a (one
row, one column)
derived table source
within the outer query. If
the result is empty, an
empty derived table
source is used.
The subquery’s
data set is
accepted as a
(one row)
derived table
source within
the outer query.
The subquery’s
data set is
accepted as a
derived table
source within
the outer query.
WHERE x
{=,<>,!=,>,>=,!>,<,<=,!<}
(subquery)
The WHERE clause is true
if the test value compares
true with the subquery’s
scalar value. If the
subquery returns no
result, the WHERE clause
is not true.
X
X
WHERE x
{=,<>,!=,>,>=,!>,<,<=,!<}
ALL
(subquery)
The WHERE condition is
true if the test value
meets the condition for
the scalar value returned
by the subquery. If the
subquery returns no
result, the WHERE
condition is not true.
The WHERE
condition is
true if the test
value meets the
condition for
every value
returned by the
subquery.
X
WHERE x
{=,<>,!=,>,>=,!>,<,<=,!<}
SOME|ANY
(subquery)
The WHERE condition is
true if the test value
meets the condition for
the scalar value returned
by the subquery. If the
subquery returns no
result, the where
condition is not true.
The WHERE
condition is
true if the test
value meets the
condition for
any value
returned by the
subquery.
X
WHERE x
IN | = ANY
(subquery)
The WHERE condition is
true if the test value is
equal to the scalar value
returned by the
subquery. If the subquery
returns no result, the
WHERE condition is not
true.
The WHERE
condition is
true if the test
value is found
within the list
of values
returned by the
subquery.
X
WHERE EXISTS (Subquery)
The WHERE condition is
true if the subquery
returns a value.
The WHERE
condition is
true if the
subquery
returns at least
one value.
The WHERE
condition is
true if the
subquery
returns at least
one row.




Simple subqueries:
Simple subqueries are executed in the following order:
1. The simple subquery is executed once.
2. The results are passed to the outer query.
3. The outer query is executed once.

The most basic simple subquery returns a single (scalar) value, which is then used as an expression in the outer query, as follows:
SELECT (SELECT 3) AS SubqueryValue;
Result:
SubqueryValue
--------------
3

The subquery (SELECT 3) returns a single value of 3, which is passed to the outer SELECT statement.

The outer SELECT statement is then executed as if it were the following:

SELECT 3 AS SubqueryValue;

Of course, a subquery with only hard-coded values is of little use. A useful subquery fetches data from a table, for example:

USE OBXKites;
SELECT ProductName
FROM dbo.Product
WHERE ProductCategoryID  =
(Select ProductCategoryID
FROM dbo.ProductCategory
Where ProductCategoryName = ‘Kite’);

To execute this query, SQL Server first evaluates the subquery and returns a value to the outer query (your unique identifier will be different from the one in this query):
Select ProductCategoryID
FROM dbo.ProductCategory
Where ProductCategoryName = ‘Kite’;
Result:
ProductCategoryID
------------------------------------
c38D8113-2BED-4E2B-9ABF-A589E0818069

The outer query then executes as if it were the following:
SELECT ProductName
FROM dbo.Product
WHERE ProductCategoryID
= ‘c38D8113-2BED-4E2B-9ABF-A589E0818069’;
Result:
ProductName
--------------------------------------------------
Basic Box Kite 21 inch
Dragon Flight
Sky Dancer
Rocket Kite
...

If you think subqueries seem similar to joins, you’re right. Both are a means of referencing multiple data
sources within a single query, and many queries that use joins may be rewritten as queries using subqueries.

Best Practice
Use a join to pull data from two data sources that can be filtered or manipulated as a whole after the join. If the data must be manipulated prior to the join, then use a derived table subquery.

Using scalar subqueries
If the subquery returns a single value it may then be used anywhere inside the SQL SELECT statement where an expression might be used, including column expressions, JOIN conditions, WHERE conditions, or HAVING conditions.

Normal operators (+, =, between, and so on) will work with single values returned from a subquery; data-type conversion using the CAST () or CONVERT () functions may be required, however.
Scalar sub-query in SELECT statement
Select name, deptid,Salary,
(select max(salary) from employee me where me.deptid = e.deptid) as Department_MAX_Salary
from employee e
order by deptid, salary

Scalar sub-query in WHERE Clause

Select a.name,b.name as Department, a.salary from employee a, Department b
where a.deptid=b.deptid and
a.salary < (select max(salary) from employee)

Scalar sub-query in ORDER BY Clause

Select deptid, name from employee e
order by (
select name from department d where e.deptid=d.deptid)






Using subqueries as lists:
Subqueries begin to shine when used as lists. A single value, commonly a column, in the outer query is compared with the subquery’s list by means of the IN operator. The subquery must return only a single column; multiple columns will fail.

The IN operator returns a value of true if the column value is found anywhere in the list supplied by the subquery, in the same way that WHERE . . . IN returns a value of true when used with a hard-coded list:

SELECT FirstName, LastName
FROM dbo.Contact
WHERE HomeRegion IN (’NC’, ‘SC’, ‘GA’, ‘AL’, ‘VA’);

A list subquery serves as a dynamic means of generating the WHERE . . . IN condition list:

SELECT FirstName, LastName
FROM dbo.Contact
WHERE Region IN (Subquery that returns a list of states);



Using subqueries as tables:
In the same way that a view may be used in the place of a table within the FROM clause of a SELECT statement, a subquery in the form of a derived table can replace any table, provided the subquery has an alias. This technique is very powerful and is often used to break a difficult query problem down into smaller bite-size chunks.

Using a subquery as a derived table is an excellent solution to the aggregate-function problem. When you are building an aggregate query, every column must participate in the aggregate function in some way, either as a GROUP BY column or as an aggregate function (sum(), avg(), count(), max(), or min()). This stipulation makes returning additional descriptive information difficult. However, performing the aggregate functions in a subquery and passing the rows found to the outer query as a derived table enables the outer query to then return any columns desired.

For more information about aggregate functions and the group by keyword, see Chapter 12, ‘‘Aggregating Data.’’

The question ‘‘How many of each product have been sold?’’ is easy to answer if only one column from
the Product table is included in the result:

SELECT P.Code, SUM(Quantity) AS QuantitySold
FROM dbo.OrderDetail AS OD
JOIN dbo.Product AS P
ON OD.ProductID = P.ProductID
GROUP BY P.Code
ORDER BY P.Code;

Result:
Code QuantitySold
--------------- ----------------------------------------
1002 47.00
1003 5.00
1004 2.00
1012 5.00
...

The result includes P.Code, but not the name or description. Of course, it’s possible to simply group by every column to be returned, but that’s sloppy. The following query performs the aggregate summation in a subquery that is then joined with the Product table so that every column is available without additional work:

SELECT P.Code,
P.ProductName,
Sales.QuantitySold
FROM dbo.Product AS P
JOIN (SELECT ProductID, SUM(Quantity) AS QuantitySold
FROM dbo.OrderDetail
GROUP BY ProductID) AS Sales
ON P.ProductID = Sales.ProductID
ORDER BY P.Code;


The query is fast and efficient, it provides the required aggregate data, and all the product columns can
be added to the output columns. The result is as follows:

Code ProductName QuantitySold
----------- -------------------- ------------------
1002 Dragon Flight 47.00
1003 Sky Dancer 5.00
1004 Rocket Kite 2.00
1012 Falcon F-16 5.00
...




All, some, and any
Though not as popular as IN, three other options are worth examining when using a subquery in a WHERE clause. Each provides a twist on how items in the subquery are matched with the WHERE clause’s test value.

ALL must be true for every value.

SOME and ANY, which are equivalent keywords, must be true for some of the values in the subquery.

The next query demonstrates a simple ALL subquery. In this case, select returns true if 1 is less than every value in the subquery:

SELECT ‘True’ as ‘AllTest’
WHERE 1 < ALL
(SELECT a
FROM
(VALUES
(2),
(3),
(5),
(7),
(9)
) AS ValuesTable (a)
);
Result:
AllTest
--------------
True

Be very careful with the ALL condition if the subquery might return a null. A null value in the subquery results will force the ALL to return a false, because it’s impossible to prove that the test is true for every value in the subquery if one of those values is unknown.

In this query, the last value is changed from a 9 to null and the query no longer returns true:

SELECT ‘True’ AS ‘AllTest’
WHERE 1 < ALL
(SELECT a
FROM
(VALUES
(2),
(3),
(5),
(7),
(null)
) AS ValuesTable(a)
);
Result (empty result set):

AllTest
       --------------

The SOME and ANY conditional tests return true if the condition is met for any values in the subquery result set.

For example:
SELECT ‘True’ as ‘SomeTest’
WHERE 5 = SOME
(SELECT a
FROM
(VALUES
(2),
(3),
(5),
(7),
(9)
) AS MyTable(a)
);

Result:
SomeTest
--------------
True

The ANY and SOME conditions are similar to the in condition. In fact = ANY and = SOME are exactly like IN. ANY and SOME conditions have the extra functionality to testing for other conditional tests such as <, <=, >, =>, and <>.





Row constructors:

New for SQL Server 2008, row constructors provide a convenient way to supply hard-coded values directly in a subquery. The VALUES clause is wrapped in parentheses, as is every hard-coded row. It requires an alias and a column alias list, also in parentheses.

Row constructors can be used in the FROM clause and joined just like any other type of data source.

The next query creates a row constructors data source called MyRowConstructor with two columns, a and b:
SELECT a, b
FROM
(VALUES
(1, 2),
(3, 4),
(5, 6),
(7, 8),
(9, 10)
) AS MyRowConstructor (a, b)

Result:
a b
----------- -----------
1 2
3 4
5 6
7 8
9 10



Common table expressions
The common table expression (CTE) defines what could be considered a temporary view, which can be referenced just like a view in the same query. Because CTEs may be used in the same ways that simple subqueries are used and they compile exactly like a simple subquery, I’ve included them in the simple subquery heading and will show example code CTEs alongside simple subqueries.

The CTE uses the WITH clause, which defines the CTE. Inside the WITH clause is the name, column aliases, and SQL code for the CTE subquery. The main query can then reference the CTE as a data source:

WITH CTEName (Column aliases)
AS (Simple Subquery)
SELECT. . .
FROM CTEName;

The WITH keyword not only begins a CTE, it also adds a hint to a table reference. This is why the statement before a CTE must be terminated with a semicolon — just one more reason to always terminate every statement with a semicolon.

The following example is the exact same query as the preceding subquery, only in CTE format. The name of the CTE is CTEQuery. It returns the ProductionCategoryID column and uses the exact same SQL Select statement as the preceding simple subquery:

WITH CTEQuery (ProductCategoryID)
AS (Select ProductCategoryID
from dbo.ProductCategory
Where ProductCategoryName = ‘Kite’)

(Note that a CTE by itself is an incomplete SQL statement. If you try to run the preceding code, you will get a syntax error.)

Once the CTE has been defined in the WITH clause, the main portion of the query can reference the CTE using its name as if the CTE were any other table source, such as a table or a view. Here’s the complete example, including the CTE and the main query:
WITH CTEQuery (ProductCategoryID)
AS (Select ProductCategoryID
from dbo.ProductCategory
Where ProductCategoryName = ‘Kite’)
SELECT ProductName
FROM dbo.Product
WHERE ProductCategoryID
= (SELECT ProductCategoryID FROM CTEQuery);

To include multiple CTEs within the same query, define the CTEs in sequence prior to the main query:
WITH
CTE1Name (column names)
AS (Simple Subquery),
CTE2Name (column names)
AS (Simple Subquery)
SELECT. . .
FROM CTE1Name
INNER JOIN CTE2Name
ON . . .

Although CTEs may include complex queries, they come with two key restrictions:
unlike subqueries, CTEs may not be nested. A CTE may not include another CTE.
CTEs may not reference the main query. Like simple subqueries, they must be self-contained.

However, a CTE may reference any of the CTEs defined before it, or even itself (see below).




Best Practice
Although the CTE syntax may initially appear alien, for very complex queries that reference the same subquery in multiple locations, using a CTE may reduce the amount of code and improve readability.

A CTE is really just a different syntax for a simple subquery used as a derived table, with one key exception: CTEs can recursively refer to the same table using a union, and this works great for searching an adjacency pairs pattern hierarchy.



Correlated Subqueries:
Correlated subqueries sound impressive, and they are. They are used in the same ways that simple subqueries are used, the difference being that correlated subqueries reference columns in the outer query. They do this by referencing the name or alias of a table in the outer query, to reference the outer query. This capability to limit the subquery by the outer query makes these queries powerful and flexible. Because correlated subqueries can reference the outer query, they are especially useful for complex WHERE conditions.

Correlating in the where clause
The capability to reference the outer query also means that correlated subqueries won’t run by themselves because the reference to the outer query would cause the query to fail. The logical execution order is as follows:

1. The outer query is executed once.
2. The subquery is executed once for every row in the outer query, substituting the values from the outer query into each execution of the subquery.
3. The subquery’s results are integrated into the result set.

If the outer query returns 100 rows, then SQL Server will execute the logical equivalent of 101 queries — one for the outer query, and one subquery for every row returned by the outer query.

In practice, the SQL Server Query Optimizer will likely figure out a way to perform the correlated subquery without actually performing the 101 queries. In fact, I’ve sometimes seen correlated subqueries outperform other query plans. If they solve your problem, then don’t avoid them for performance reasons.

Example:
You can use a subquery anywhere in a SQL Statement where an expression is allowed. In the following example, I use a CASE expression and subquery to check whether line item sales totals in the SalesOrderDetail table equals the sales subtotal listed in the SalesOrderHeader table:

SELECT
  SalesOrderNumber,
  SubTotal,
  OrderDate,
  CASE WHEN
    (
      SELECT SUM(LineTotal)
      FROM Sales.SalesOrderDetail
      WHERE SalesOrderID = 43659
    ) =  SubTotal THEN 'balanced'
             ELSE 'not balanced'
  END AS LineTotals
FROM
  Sales.SalesOrderHeader
WHERE
  SalesOrderID = 43659;
     
I’ve included the CASE expression as part of the fourth column expression. The CASE expression uses the subquery to total the line item sales in the SalesOrderDetail table for order 43659. Notice that, as in the preceding example, the subquery is enclosed in parentheses and uses the SUM aggregate function to return a single value. I then use an equal (=) operator to compare the subquery’s result to the SubTotal column in the SalesOrderHeader table. If the amounts are equal, the CASE expression returns a value of balanced. It the values are not equal, CASE returns not balanced.

As you can see, the line item sales total in the SalesOrderDetail table does not match the subtotal in the SalesOrderHeader table, at least not for sale 43659. However, suppose you want to verify all the sales listed in the two tables to see whether the totals balance. To do so, you must modify both the subquery and the outer query in order to create the condition necessary to support a correlated subquery. A correlated subquery, also known as a repeating subquery, is one that depends on the outer query for specific values. This is particularly important if your outer query returns multiple rows.



The best way to understand how correlated subqueries work is to look at an example. In the following SELECT statement, I include a CASE expression as one of the column expressions, as you saw in the preceding example:

SELECT
  SalesOrderNumber,
  SubTotal,
  OrderDate,
  CASE WHEN
    (
      SELECT SUM(LineTotal)
      FROM Sales.SalesOrderDetail d
      WHERE d.SalesOrderID = h.SalesOrderID
    ) =  h.SubTotal THEN 'balanced'
    ELSE 'not balanced'
  END AS LineTotals
FROM
  Sales.SalesOrderHeader h;

As before, the CASE expression includes a subquery that returns the total amount for line item sales. However, notice that the subquery’s WHERE clause is different from the previous example. Instead of specifying an order ID, the WHERE clause references the SalesOrderID column from the outer query. I do this by using table aliases to distinguish the two columns-hfor SalesOrderHeader and d for SalesOrderDetail-and then specifying that the column values must be equal for the WHERE condition to evaluate to true. That means that, for each row in the SalesOrderHeader table returned by the outer query, the SalesOrderID value associated with that row is plugged into the subquery and compared with the SalesOrderID value of the SalesOrderDetail table. As a result, the subquery is executed for each row returned by the outer query.

The value returned by the subquery is then compared to the SubTotal column of the SalesOrderHeader table and a value for the LineTotals column is provided, a process repeated for each row. 
     



Correlating a derived table using apply:

A subquery used as a derived table can reference the outer query, which makes it a correlated subquery. This technique leverages the previous correlated subquery method by using the correlation in the WHERE clause of the derived table subquery.

Subqueries used as derived tables aren’t allowed to reference the outer query if they are included in the outer query with a JOIN. However, the CROSS APPLY or OUTER APPLY method of including the subquery allows passing data to the subquery.

First, to set up some sample data:
USE tempdb;
CREATE TABLE TableA (ID INT);
INSERT INTO TableA VALUES (1);
INSERT INTO TableA VALUES (2);

CREATE TABLE TableB (ID INT);
INSERT INTO TableB VALUES (1);
INSERT INTO TableB VALUES (3);

The following query uses a CROSS APPLY to pass every row from the outer query to the derived table subquery. The subquery then filters its rows to those that match IDs. The CROSS APPLY returns every row from the outer query that had a match in the subquery. Functionally, it’s the equivalent to an inner join between TableA and TableB:

SELECT B.ID AS Bid, A.ID AS Aid
FROM TableB AS B
CROSS APPLY
(Select ID from TableA
where TableA.ID = B.ID) AS A;
Result:
Bid Aid
----------- -----------
1 1

The next query uses the same correlated derived table subquery, but changes to an OUTER APPLY to include all rows from the outer query. This query is the same as a left outer join between TableA and TableB:

SELECT B.ID AS Bid, A.ID AS Aid
FROM TableB AS B
OUTER APPLY
(Select ID from TableA
where TableA.ID = B.ID) AS A;
Result:
ID ID
----------- -----------
1 1
3 NULL



Composable SQL
Composable SQL, also called select from output or DML table source (in SQL Server BOL), is the ability to pass data from an insert, update, or delete’s output clause to an outer query. This is a very powerful new way to build subqueries, and it can significantly reduce the amount of code and improve the performance of code that needs to write to one table, and then, based on that write, write to another table.

To track the evolution of composable SQL (illustrated in Figure 11-3), SQL Server has always had DML triggers, which include the inserted and deleted virtual tables. Essentially, these are a view to the DML modification that fired the triggers. The deleted table holds the before image of the data, and the inserted table holds the after image.

Since SQL Server 2005, any DML statement that modifies data (INSERT, UPDATE, DELETE, MERGE) can have an optional OUTPUT clause that can SELECT from the virtual inserted and deleted table. The OUTPUT clause can pass the data to the client or insert it directly into a table.

The inserted and deleted virtual tables are covered in Chapter 26, ‘‘Creating DML Triggers,’’ and the output clause is detailed in Chapter 15, ‘‘Modifying Data.’’ In SQL Server 2008, composable SQL can place the DML statements and its OUTPUT clause in a subquery and then select from that subquery. The primary benefit of composable SQL, as opposed to just using the OUTPUT clause to insert into a table, is that OUTPUT clause data may be further filtered and manipulated by the outer query.

The following script first creates a table and then has a composable SQL query. The subquery has an UPDATE command with an OUTPUT clause. The OUTPUT clause passes the oldvalue and newvalue columns to the outer query. The outer query filters out TestData and then inserts it into the CompSQL table:

CREATE TABLE CompSQL (oldvalue varchar(50), newvalue varchar(50));

INSERT INTO CompSQL (oldvalue, newvalue )
SELECT oldvalue, newvalue
FROM
(UPDATE HumanResources.Department
SET GroupName = ‘Composable SQL Test’
OUTPUT Deleted.GroupName as ‘oldvalue’,
Inserted.GroupName as ‘newvalue’
WHERE Name = ‘Sales’);

SELECT oldvalue, newvalue
FROM CompSQL
WHERE newvalue <> ‘TestData’;

Result:
oldvalue newvalue
----------------------- ------------------------------
Sales and Marketing Composable SQL Test

Note several restrictions on composable SQL:

The update DML in the subquery must modify a local table and cannot be a partitioned view.

The composable SQL query cannot include nested composable SQL, aggregate function, subquery, ranking function, full-text features, user-defined functions that perform data access, or the textptr function.

The target table must be a local base table with no triggers, no foreign keys, no merge replication, or updatable subscriptions for transactional replication.
     


No comments:

Post a Comment