A subquery is an embedded SQL
statement within an outer query. The subquery provides an answer to the outer
query in the form of a scalar value, a list of values, or a data set, and may
be substituted for an expression, list, or table, respectively, within the
outer query. The matrix of subquery types and SELECT statement usage is shown
in Table 11-1. Traditionally, a subquery may only contain a SELECT query and
not a data-modification query, which explains why subqueries are sometimes
referred to as subselects.
Five basic forms are possible when building
a subquery, depending on the data needs and your favored syntax:
Simple subquery: The
simple subquery can be a stand-alone query and can run by itself. It is
executed once, with the result passed to the outer query. Simple subqueries are
constructed as normal SELECT queries and placed within parentheses.
Common table expression (CTE): CTEs
are a syntactical variation of the simple subquery, similar to a view, which
defines the subquery at the beginning of the query using the WITH command. The
CTE can then be accessed multiple times within the main query as if it were a view
or derived table.
Correlated subquery: This is
similar to a simple subquery except that it references at least one column in
the outer query, so it cannot run separately by itself. Conceptually, the outer
query runs first and the correlated subquery runs once for every row in the
outer query. Physically, the Query Optimizer is free to generate an efficient
query execution plan.
Row constructor: A VALUES
clause or the FROM clause that supplies hard-coded values as a subquery
Composable SQL: The
ability to pass data from an INSERT, UPDATE, or DELETE statement’s output
clause to an outer query
|
|
Expression
Subquery returns a scalar
value
|
List
Subquery returns
a list of values
|
Data Set
Subquery returns
a multi-column
data source
|
|
Any expression
e.g., SELECT list,
HAVING clause,
GROUP BY, JOIN ON, etc.
|
The subquery result is
used as an expression
supplying the value for
the column. If the result
is empty, NULL is used
instead.
|
X
|
X
|
|
Derived Table
FROM (data source) AS ALIAS
or
WITH CTE
This is the only location where a
subquery can use a table alias
|
The subquery’s data set
is accepted as a (one
row, one column)
derived table source
within the outer query. If
the result is empty, an
empty derived table
source is used.
|
The subquery’s
data set is
accepted as a
(one row)
derived table
source within
the outer query.
|
The subquery’s
data set is
accepted as a
derived table
source within
the outer query.
|
|
WHERE x
{=,<>,!=,>,>=,!>,<,<=,!<}
(subquery)
|
The WHERE clause is true
if the test value compares
true with the subquery’s
scalar value. If the
subquery returns no
result, the WHERE clause
is not true.
|
X
|
X
|
|
WHERE x
{=,<>,!=,>,>=,!>,<,<=,!<}
ALL
(subquery)
|
The WHERE condition is
true if the test value
meets the condition for
the scalar value returned
by the subquery. If the
subquery returns no
result, the WHERE
condition is not true.
|
The WHERE
condition is
true if the test
value meets the
condition for
every value
returned by the
subquery.
|
X
|
|
WHERE x
{=,<>,!=,>,>=,!>,<,<=,!<}
SOME|ANY
(subquery)
|
The WHERE condition is
true if the test value
meets the condition for
the scalar value returned
by the subquery. If the
subquery returns no
result, the where
condition is not true.
|
The WHERE
condition is
true if the test
value meets the
condition for
any value
returned by the
subquery.
|
X
|
|
WHERE x
IN | = ANY
(subquery)
|
The WHERE condition is
true if the test value is
equal to the scalar value
returned by the
subquery. If the subquery
returns no result, the
WHERE condition is not
true.
|
The WHERE
condition is
true if the test
value is found
within the list
of values
returned by the
subquery.
|
X
|
|
WHERE EXISTS (Subquery)
|
The WHERE condition is
true if the subquery
returns a value.
|
The WHERE
condition is
true if the
subquery
returns at least
one value.
|
The WHERE
condition is
true if the
subquery
returns at least
one row.
|
Simple subqueries:
Simple subqueries are executed in the
following order:
1. The simple subquery is executed once.
2. The results are passed to the outer
query.
3. The outer query is executed once.
The most basic simple subquery returns a
single (scalar) value, which is then used as an expression in the outer query,
as follows:
SELECT (SELECT 3) AS SubqueryValue;
Result:
SubqueryValue
--------------
3
The subquery (SELECT 3) returns a single
value of 3, which is passed to the outer SELECT statement.
The outer SELECT statement is then executed
as if it were the following:
SELECT 3 AS SubqueryValue;
Of course, a subquery with only hard-coded
values is of little use. A useful subquery fetches data from a table, for
example:
USE OBXKites;
SELECT ProductName
FROM dbo.Product
WHERE ProductCategoryID =
(Select ProductCategoryID
FROM dbo.ProductCategory
Where ProductCategoryName = ‘Kite’);
To execute this query, SQL Server first
evaluates the subquery and returns a value to the outer query (your unique
identifier will be different from the one in this query):
Select ProductCategoryID
FROM dbo.ProductCategory
Where ProductCategoryName = ‘Kite’;
Result:
ProductCategoryID
------------------------------------
c38D8113-2BED-4E2B-9ABF-A589E0818069
The outer query then executes as if it were
the following:
SELECT ProductName
FROM dbo.Product
WHERE ProductCategoryID
= ‘c38D8113-2BED-4E2B-9ABF-A589E0818069’;
Result:
ProductName
--------------------------------------------------
Basic Box Kite 21 inch
Dragon Flight
Sky Dancer
Rocket Kite
...
If you think subqueries seem similar to joins, you’re right. Both are a means of
referencing multiple data
sources within a single query, and many
queries that use joins may be rewritten as queries using subqueries.
Best Practice
Use a join to pull data from two data
sources that can be filtered or manipulated as a whole after the join. If the
data must be manipulated prior to the join, then use a derived table subquery.
Using scalar subqueries
If the subquery returns a single value it
may then be used anywhere inside the SQL SELECT statement where an expression
might be used, including column expressions, JOIN conditions, WHERE conditions,
or HAVING conditions.
Normal operators (+, =, between, and so on)
will work with single values returned from a subquery; data-type conversion
using the CAST () or CONVERT () functions may be required, however.
Scalar
sub-query in SELECT statement
Select name, deptid,Salary,
(select max(salary) from employee me where
me.deptid = e.deptid) as Department_MAX_Salary
from employee e
order by deptid, salary
Scalar sub-query in WHERE Clause
Select a.name,b.name as Department,
a.salary from employee a, Department b
where a.deptid=b.deptid and
a.salary < (select max(salary) from
employee)
Scalar sub-query in ORDER BY Clause
Select deptid, name from employee e
order by (
select name from department d where
e.deptid=d.deptid)
Using subqueries as lists:
Subqueries begin to shine when used as
lists. A single value, commonly a column, in the outer query is compared with
the subquery’s list by means of the IN operator. The subquery must return only
a single column; multiple columns will fail.
The IN operator returns a value of true if
the column value is found anywhere in the list supplied by the subquery, in the
same way that WHERE . . . IN returns a value of true when used with a hard-coded
list:
SELECT FirstName, LastName
FROM dbo.Contact
WHERE HomeRegion IN (’NC’, ‘SC’, ‘GA’, ‘AL’,
‘VA’);
A list subquery serves as a dynamic means
of generating the WHERE . . . IN condition list:
SELECT FirstName, LastName
FROM dbo.Contact
WHERE
Region IN (Subquery that returns a list of states);
Using subqueries as tables:
In the same way that a view may be used in
the place of a table within the FROM clause of a SELECT statement, a subquery
in the form of a derived table can replace any table, provided the subquery has
an alias. This technique is very powerful and is often used to break a
difficult query problem down into smaller bite-size chunks.
Using a subquery as a derived table is an
excellent solution to the aggregate-function problem. When you are building an
aggregate query, every column must participate in the aggregate function in
some way, either as a GROUP BY column or as an aggregate function (sum(), avg(),
count(), max(), or min()). This stipulation makes returning additional
descriptive information difficult. However, performing the aggregate functions
in a subquery and passing the rows found to the outer query as a derived table
enables the outer query to then return any columns desired.
For more information about aggregate
functions and the group by keyword, see Chapter 12, ‘‘Aggregating Data.’’
The question ‘‘How many of each product
have been sold?’’ is easy to answer if only one column from
the Product table is included in the
result:
SELECT P.Code, SUM(Quantity) AS
QuantitySold
FROM dbo.OrderDetail AS OD
JOIN dbo.Product AS P
ON OD.ProductID = P.ProductID
GROUP BY P.Code
ORDER BY P.Code;
Result:
Code QuantitySold
---------------
----------------------------------------
1002 47.00
1003 5.00
1004 2.00
1012 5.00
...
The result includes P.Code, but not the
name or description. Of course, it’s possible to simply group by every column
to be returned, but that’s sloppy. The following query performs the aggregate
summation in a subquery that is then joined with the Product table so that
every column is available without additional work:
SELECT P.Code,
P.ProductName,
Sales.QuantitySold
FROM dbo.Product AS P
JOIN (SELECT ProductID, SUM(Quantity) AS
QuantitySold
FROM dbo.OrderDetail
GROUP BY ProductID) AS Sales
ON P.ProductID = Sales.ProductID
ORDER BY P.Code;
The query is fast and efficient, it
provides the required aggregate data, and all the product columns can
be added to the output columns. The result
is as follows:
Code ProductName QuantitySold
----------- --------------------
------------------
1002 Dragon Flight 47.00
1003 Sky Dancer 5.00
1004 Rocket Kite 2.00
1012 Falcon F-16 5.00
...
All, some, and any
Though not as popular as IN, three other
options are worth examining when using a subquery in a WHERE clause. Each
provides a twist on how items in the subquery are matched with the WHERE clause’s
test value.
ALL must be true for every value.
SOME and ANY, which are equivalent
keywords, must be true for some of the values in the subquery.
The next query demonstrates a simple ALL subquery.
In this case, select returns true if 1 is less than every value in the
subquery:
SELECT ‘True’ as ‘AllTest’
WHERE 1 < ALL
(SELECT a
FROM
(VALUES
(2),
(3),
(5),
(7),
(9)
) AS ValuesTable (a)
);
Result:
AllTest
--------------
True
Be very careful with the ALL condition if
the subquery might return a null. A null value in the subquery results will
force the ALL to return a false, because it’s impossible to prove that the test
is true for every value in the subquery if one of those values is unknown.
In this query, the last value is changed
from a 9 to null and the query no longer returns true:
SELECT ‘True’ AS ‘AllTest’
WHERE 1 < ALL
(SELECT a
FROM
(VALUES
(2),
(3),
(5),
(7),
(null)
) AS ValuesTable(a)
);
Result (empty result set):
AllTest
--------------
The SOME and ANY conditional tests return true
if the condition is met for any values in the subquery result set.
For example:
SELECT ‘True’ as ‘SomeTest’
WHERE 5 = SOME
(SELECT a
FROM
(VALUES
(2),
(3),
(5),
(7),
(9)
) AS MyTable(a)
);
Result:
SomeTest
--------------
True
The ANY and SOME conditions are similar to
the in condition. In fact = ANY and = SOME are exactly like IN. ANY and SOME conditions
have the extra functionality to testing for other conditional tests such as <,
<=, >, =>, and <>.
Row constructors:
New for SQL Server 2008, row constructors
provide a convenient way to supply hard-coded values directly in a subquery.
The VALUES clause is wrapped in parentheses, as is every hard-coded row. It requires
an alias and a column alias list, also in parentheses.
Row constructors can be used in the FROM clause
and joined just like any other type of data source.
The next query creates a row constructors
data source called MyRowConstructor with two columns, a and b:
SELECT a, b
FROM
(VALUES
(1, 2),
(3, 4),
(5, 6),
(7, 8),
(9, 10)
) AS MyRowConstructor (a, b)
Result:
a b
----------- -----------
1 2
3 4
5 6
7 8
9 10
Common table expressions
The common table expression (CTE) defines
what could be considered a temporary view, which can be referenced just like a
view in the same query. Because CTEs may be used in the same ways that simple subqueries
are used and they compile exactly like a simple subquery, I’ve included them in
the simple subquery heading and will show example code CTEs alongside simple
subqueries.
The CTE uses the WITH clause, which defines
the CTE. Inside the WITH clause is the name, column aliases, and SQL code for
the CTE subquery. The main query can then reference the CTE as a data source:
WITH CTEName (Column aliases)
AS (Simple Subquery)
SELECT. . .
FROM CTEName;
The WITH keyword not only begins a CTE, it
also adds a hint to a table reference. This is
why the statement before a CTE must be terminated with a semicolon — just one
more reason to always terminate every statement with a semicolon.
The following example is the exact same
query as the preceding subquery, only in CTE format. The name of the CTE is CTEQuery.
It returns the ProductionCategoryID column and uses the exact same SQL Select statement
as the preceding simple subquery:
WITH CTEQuery (ProductCategoryID)
AS (Select ProductCategoryID
from dbo.ProductCategory
Where ProductCategoryName = ‘Kite’)
(Note that a CTE by itself is an incomplete
SQL statement. If you try to run the preceding code, you will get a syntax
error.)
Once the CTE has been defined in the WITH clause,
the main portion of the query can reference the CTE using its name as if the
CTE were any other table source, such as a table or a view. Here’s the complete
example, including the CTE and the main query:
WITH CTEQuery (ProductCategoryID)
AS (Select ProductCategoryID
from dbo.ProductCategory
Where ProductCategoryName = ‘Kite’)
SELECT ProductName
FROM dbo.Product
WHERE ProductCategoryID
= (SELECT ProductCategoryID FROM CTEQuery);
To include multiple CTEs within the same
query, define the CTEs in sequence prior to the
main query:
WITH
CTE1Name (column names)
AS (Simple Subquery),
CTE2Name (column names)
AS (Simple Subquery)
SELECT. . .
FROM CTE1Name
INNER JOIN CTE2Name
ON . . .
Although CTEs may include complex queries,
they come with two key restrictions:
■ unlike subqueries, CTEs may not be nested. A CTE may not
include another CTE.
■ CTEs may not reference the main query. Like simple subqueries,
they must be self-contained.
However, a CTE may reference any of the
CTEs defined before it, or even itself (see below).
Best Practice
Although the CTE syntax may initially
appear alien, for very complex queries that reference the same subquery in
multiple locations, using a CTE may reduce the amount of code and improve
readability.
A CTE is really just a different syntax for
a simple subquery used as a derived table, with one key exception: CTEs can
recursively refer to the same table using a union, and this works great for
searching an adjacency pairs pattern hierarchy.
Correlated Subqueries:
Correlated subqueries sound impressive, and
they are. They are used in the same ways that simple subqueries are used, the
difference being that correlated subqueries reference columns in the outer query.
They do this by referencing the name or alias of a table in the outer query, to
reference the outer query. This capability to limit the subquery by the outer
query makes these queries powerful and flexible. Because correlated subqueries
can reference the outer query, they are especially useful for complex WHERE conditions.
Correlating in the where clause
The capability to reference the outer query
also means that correlated subqueries won’t run by themselves because the
reference to the outer query would cause the query to fail. The logical
execution order is as follows:
1. The outer query is executed once.
2. The subquery is executed once for every
row in the outer query, substituting the values from the outer query into each
execution of the subquery.
3. The subquery’s results are integrated
into the result set.
If the outer query returns 100 rows, then
SQL Server will execute the logical equivalent of 101 queries — one for the
outer query, and one subquery for every row returned by the outer query.
In practice, the SQL Server Query Optimizer
will likely figure out a way to perform the correlated subquery without
actually performing the 101 queries. In fact, I’ve sometimes seen correlated
subqueries outperform other query plans. If they solve your problem, then don’t
avoid them for performance reasons.
Example:
You can use a subquery anywhere in a SQL
Statement where an expression is allowed. In the following example, I use a
CASE expression and subquery to check whether line item sales totals in the
SalesOrderDetail table equals the sales subtotal listed in the SalesOrderHeader
table:
SELECT
SalesOrderNumber,
SubTotal,
OrderDate,
CASE WHEN
(
SELECT SUM(LineTotal)
FROM Sales.SalesOrderDetail
WHERE SalesOrderID =
43659
) = SubTotal THEN
'balanced'
ELSE 'not balanced'
END AS LineTotals
FROM
Sales.SalesOrderHeader
WHERE
SalesOrderID = 43659;
I’ve included the CASE expression as part
of the fourth column expression. The CASE expression uses the subquery to total
the line item sales in the SalesOrderDetail table for order 43659. Notice that,
as in the preceding example, the subquery is enclosed in parentheses and uses
the SUM aggregate function to return a single value. I then use an equal (=)
operator to compare the subquery’s result to the SubTotal column in the
SalesOrderHeader table. If the amounts are equal, the CASE expression returns a
value of balanced. It the values are not equal, CASE returns not
balanced.
As you can see, the line item sales total
in the SalesOrderDetail table does not match the subtotal in the
SalesOrderHeader table, at least not for sale 43659. However, suppose you want
to verify all the sales listed in the two tables to see whether the totals
balance. To do so, you must modify both the subquery and the outer query in
order to create the condition necessary to support a correlated subquery.
A correlated subquery, also known as a repeating subquery, is one that
depends on the outer query for specific values. This is particularly important
if your outer query returns multiple rows.
The best way to understand how correlated
subqueries work is to look at an example. In the following SELECT statement, I
include a CASE expression as one of the column expressions, as you saw in the
preceding example:
SELECT
SalesOrderNumber,
SubTotal,
OrderDate,
CASE WHEN
(
SELECT SUM(LineTotal)
FROM Sales.SalesOrderDetail d
WHERE d.SalesOrderID = h.SalesOrderID
)
= h.SubTotal THEN 'balanced'
ELSE 'not balanced'
END
AS LineTotals
FROM
Sales.SalesOrderHeader h;
As before, the CASE expression includes a
subquery that returns the total amount for line item sales. However, notice
that the subquery’s WHERE clause is different from the previous example.
Instead of specifying an order ID, the WHERE clause references the SalesOrderID
column from the outer query. I do this by using table aliases to distinguish
the two columns-hfor SalesOrderHeader and d for SalesOrderDetail-and
then specifying that the column values must be equal for the WHERE condition to
evaluate to true. That means that, for each row in the SalesOrderHeader table
returned by the outer query, the SalesOrderID value associated with that row is
plugged into the subquery and compared with the SalesOrderID value of the
SalesOrderDetail table. As a result, the subquery is executed for each row
returned by the outer query.
The value returned by the subquery is then
compared to the SubTotal column of the SalesOrderHeader table and a value for
the LineTotals column is provided, a process repeated for each row.
Correlating a derived table using apply:
A subquery used as a derived table can
reference the outer query, which makes it a correlated subquery. This technique
leverages the previous correlated subquery method by using the correlation in
the WHERE clause of the derived table subquery.
Subqueries used as derived tables aren’t
allowed to reference the outer query if they are included in the outer query
with a JOIN. However, the CROSS APPLY or OUTER APPLY method of including the
subquery allows passing data to the subquery.
First, to set up some sample data:
USE tempdb;
CREATE TABLE TableA (ID INT);
INSERT INTO TableA VALUES (1);
INSERT INTO TableA VALUES (2);
CREATE TABLE TableB (ID INT);
INSERT INTO TableB VALUES (1);
INSERT INTO TableB VALUES (3);
The following query uses a CROSS APPLY to pass every row from the
outer query to the derived table subquery. The subquery then filters its rows
to those that match IDs. The CROSS APPLY returns every row from the outer query
that had a match in the subquery. Functionally, it’s the equivalent to an inner
join between TableA and TableB:
SELECT B.ID AS Bid, A.ID AS Aid
FROM TableB AS B
CROSS APPLY
(Select ID from TableA
where TableA.ID = B.ID) AS A;
Result:
Bid Aid
----------- -----------
1 1
The next query uses the same correlated
derived table subquery, but changes to an OUTER APPLY to include all rows from
the outer query. This query is the same as a left outer join between TableA and
TableB:
SELECT B.ID AS Bid, A.ID AS Aid
FROM TableB AS B
OUTER APPLY
(Select ID from TableA
where TableA.ID = B.ID) AS A;
Result:
ID ID
----------- -----------
1 1
3 NULL
Composable SQL
Composable SQL, also called select from
output or DML table source (in SQL Server BOL), is the ability to pass data
from an insert, update, or delete’s output clause to an outer query. This is a
very powerful new way to build subqueries, and it can significantly reduce the
amount of code and improve the performance of code that needs to write to one
table, and then, based on that write, write to another table.
To track the evolution of composable SQL
(illustrated in Figure 11-3), SQL Server has always had DML triggers, which
include the inserted and deleted virtual tables. Essentially, these are a view
to the DML modification that fired the triggers. The deleted table holds the
before image of the data, and the inserted table holds the after image.
Since SQL Server 2005, any DML statement
that modifies data (INSERT, UPDATE, DELETE, MERGE) can have an optional OUTPUT clause
that can SELECT from the virtual inserted and deleted table. The OUTPUT clause
can pass the data to the client or insert it directly into a table.
The inserted and deleted virtual tables are
covered in Chapter 26, ‘‘Creating DML Triggers,’’ and the output clause is
detailed in Chapter 15, ‘‘Modifying Data.’’ In SQL Server 2008, composable SQL
can place the DML statements and its OUTPUT clause in a subquery and then
select from that subquery. The primary benefit of composable SQL, as opposed to
just using the OUTPUT clause to insert into a table, is that OUTPUT clause data
may be further filtered and manipulated by the outer query.
The following script first creates a table
and then has a composable SQL query. The subquery has an UPDATE command with an
OUTPUT clause. The OUTPUT clause passes the oldvalue and newvalue columns to
the outer query. The outer query filters out TestData and then inserts it into
the CompSQL table:
CREATE TABLE CompSQL (oldvalue varchar(50),
newvalue varchar(50));
INSERT INTO CompSQL (oldvalue, newvalue )
SELECT oldvalue, newvalue
FROM
(UPDATE HumanResources.Department
SET GroupName = ‘Composable SQL Test’
OUTPUT Deleted.GroupName as ‘oldvalue’,
Inserted.GroupName as ‘newvalue’
WHERE Name = ‘Sales’);
SELECT oldvalue, newvalue
FROM CompSQL
WHERE newvalue <> ‘TestData’;
Result:
oldvalue newvalue
-----------------------
------------------------------
Sales and Marketing Composable SQL Test
Note several restrictions on composable
SQL:
■ The update DML in the subquery must modify a local table and
cannot be a partitioned view.
■ The composable SQL query cannot include nested composable SQL,
aggregate function, subquery, ranking function, full-text features,
user-defined functions that perform data access, or the textptr function.
■ The target table must be a local base table with no triggers,
no foreign keys, no merge replication, or updatable subscriptions for
transactional replication.
No comments:
Post a Comment