My Sql Note: subquery

A subquery is an embedded SQL statement within an outer query. The subquery provides an answer to the outer query in the form of a scalar value, a list of values, or a data set, and may be substituted for an expression, list, or table, respectively, within the outer query. The matrix of subquery types and SELECT statement usage is shown in Table 11-1. Traditionally, a subquery may only contain a SELECT query and not a data-modification query, which explains why subqueries are sometimes referred to as subselects.

Five basic forms are possible when building a subquery, depending on the data needs and your favored syntax:

Simple subquery: The simple subquery can be a stand-alone query and can run by itself. It is executed once, with the result passed to the outer query. Simple subqueries are constructed as normal SELECT queries and placed within parentheses.

Common table expression (CTE): CTEs are a syntactical variation of the simple subquery, similar to a view, which defines the subquery at the beginning of the query using the WITH command. The CTE can then be accessed multiple times within the main query as if it were a view or derived table.

Correlated subquery: This is similar to a simple subquery except that it references at least one column in the outer query, so it cannot run separately by itself. Conceptually, the outer query runs first and the correlated subquery runs once for every row in the outer query. Physically, the Query Optimizer is free to generate an efficient query execution plan.

Row constructor: A VALUES clause or the FROM clause that supplies hard-coded values as a subquery

Composable SQL: The ability to pass data from an INSERT, UPDATE, or DELETE statement’s output clause to an outer query

	Expression Subquery returns a scalar value	List Subquery returns a list of values	Data Set Subquery returns a multi-column data source
Any expression e.g., SELECT list, HAVING clause, GROUP BY, JOIN ON, etc.	The subquery result is used as an expression supplying the value for the column. If the result is empty, NULL is used instead.	X	X
Derived Table FROM (data source) AS ALIAS or WITH CTE This is the only location where a subquery can use a table alias	The subquery’s data set is accepted as a (one row, one column) derived table source within the outer query. If the result is empty, an empty derived table source is used.	The subquery’s data set is accepted as a (one row) derived table source within the outer query.	The subquery’s data set is accepted as a derived table source within the outer query.
WHERE x {=,<>,!=,>,>=,!>,<,<=,!<} (subquery)	The WHERE clause is true if the test value compares true with the subquery’s scalar value. If the subquery returns no result, the WHERE clause is not true.	X	X
WHERE x {=,<>,!=,>,>=,!>,<,<=,!<} ALL (subquery)	The WHERE condition is true if the test value meets the condition for the scalar value returned by the subquery. If the subquery returns no result, the WHERE condition is not true.	The WHERE condition is true if the test value meets the condition for every value returned by the subquery.	X
WHERE x {=,<>,!=,>,>=,!>,<,<=,!<} SOME\|ANY (subquery)	The WHERE condition is true if the test value meets the condition for the scalar value returned by the subquery. If the subquery returns no result, the where condition is not true.	The WHERE condition is true if the test value meets the condition for any value returned by the subquery.	X
WHERE x IN \| = ANY (subquery)	The WHERE condition is true if the test value is equal to the scalar value returned by the subquery. If the subquery returns no result, the WHERE condition is not true.	The WHERE condition is true if the test value is found within the list of values returned by the subquery.	X
WHERE EXISTS (Subquery)	The WHERE condition is true if the subquery returns a value.	The WHERE condition is true if the subquery returns at least one value.	The WHERE condition is true if the subquery returns at least one row.

Simple subqueries:

Simple subqueries are executed in the following order:

1. The simple subquery is executed once.

2. The results are passed to the outer query.

3. The outer query is executed once.

The most basic simple subquery returns a single (scalar) value, which is then used as an expression in the outer query, as follows:

SELECT (SELECT 3) AS SubqueryValue;

Result:

SubqueryValue

--------------

The subquery (SELECT 3) returns a single value of 3, which is passed to the outer SELECT statement.

The outer SELECT statement is then executed as if it were the following:

SELECT 3 AS SubqueryValue;

Of course, a subquery with only hard-coded values is of little use. A useful subquery fetches data from a table, for example:

USE OBXKites;

SELECT ProductName

FROM dbo.Product

WHERE ProductCategoryID =

(Select ProductCategoryID

FROM dbo.ProductCategory

Where ProductCategoryName = ‘Kite’);

To execute this query, SQL Server first evaluates the subquery and returns a value to the outer query (your unique identifier will be different from the one in this query):

Select ProductCategoryID

FROM dbo.ProductCategory

Where ProductCategoryName = ‘Kite’;

Result:

ProductCategoryID

------------------------------------

c38D8113-2BED-4E2B-9ABF-A589E0818069

The outer query then executes as if it were the following:

SELECT ProductName

FROM dbo.Product

WHERE ProductCategoryID

= ‘c38D8113-2BED-4E2B-9ABF-A589E0818069’;

Result:

ProductName

--------------------------------------------------

Basic Box Kite 21 inch

Dragon Flight

Sky Dancer

Rocket Kite

...

If you think subqueries seem similar to joins, you’re right. Both are a means of referencing multiple data

sources within a single query, and many queries that use joins may be rewritten as queries using subqueries.

Best Practice

Use a join to pull data from two data sources that can be filtered or manipulated as a whole after the join. If the data must be manipulated prior to the join, then use a derived table subquery.

Using scalar subqueries

If the subquery returns a single value it may then be used anywhere inside the SQL SELECT statement where an expression might be used, including column expressions, JOIN conditions, WHERE conditions, or HAVING conditions.

Normal operators (+, =, between, and so on) will work with single values returned from a subquery; data-type conversion using the CAST () or CONVERT () functions may be required, however.

Scalar sub-query in SELECT statement

Select name, deptid,Salary,

(select max(salary) from employee me where me.deptid = e.deptid) as Department_MAX_Salary

from employee e

order by deptid, salary

Scalar sub-query in WHERE Clause

Select a.name,b.name as Department, a.salary from employee a, Department b

where a.deptid=b.deptid and

a.salary < (select max(salary) from employee)

Scalar sub-query in ORDER BY Clause

Select deptid, name from employee e

order by (

select name from department d where e.deptid=d.deptid)

Using subqueries as lists:

Subqueries begin to shine when used as lists. A single value, commonly a column, in the outer query is compared with the subquery’s list by means of the IN operator. The subquery must return only a single column; multiple columns will fail.

The IN operator returns a value of true if the column value is found anywhere in the list supplied by the subquery, in the same way that WHERE . . . IN returns a value of true when used with a hard-coded list:

SELECT FirstName, LastName

FROM dbo.Contact

WHERE HomeRegion IN (’NC’, ‘SC’, ‘GA’, ‘AL’, ‘VA’);

A list subquery serves as a dynamic means of generating the WHERE . . . IN condition list:

SELECT FirstName, LastName

FROM dbo.Contact

WHERE Region IN (Subquery that returns a list of states);

Using subqueries as tables:

In the same way that a view may be used in the place of a table within the FROM clause of a SELECT statement, a subquery in the form of a derived table can replace any table, provided the subquery has an alias. This technique is very powerful and is often used to break a difficult query problem down into smaller bite-size chunks.

Using a subquery as a derived table is an excellent solution to the aggregate-function problem. When you are building an aggregate query, every column must participate in the aggregate function in some way, either as a GROUP BY column or as an aggregate function (sum(), avg(), count(), max(), or min()). This stipulation makes returning additional descriptive information difficult. However, performing the aggregate functions in a subquery and passing the rows found to the outer query as a derived table enables the outer query to then return any columns desired.

For more information about aggregate functions and the group by keyword, see Chapter 12, ‘‘Aggregating Data.’’

The question ‘‘How many of each product have been sold?’’ is easy to answer if only one column from

the Product table is included in the result:

SELECT P.Code, SUM(Quantity) AS QuantitySold

FROM dbo.OrderDetail AS OD

JOIN dbo.Product AS P

ON OD.ProductID = P.ProductID

GROUP BY P.Code

ORDER BY P.Code;

Result:

Code QuantitySold

--------------- ----------------------------------------

1002 47.00

1003 5.00

1004 2.00

1012 5.00

...

The result includes P.Code, but not the name or description. Of course, it’s possible to simply group by every column to be returned, but that’s sloppy. The following query performs the aggregate summation in a subquery that is then joined with the Product table so that every column is available without additional work:

SELECT P.Code,

P.ProductName,

Sales.QuantitySold

FROM dbo.Product AS P

JOIN (SELECT ProductID, SUM(Quantity) AS QuantitySold

FROM dbo.OrderDetail

GROUP BY ProductID) AS Sales

ON P.ProductID = Sales.ProductID

ORDER BY P.Code;

The query is fast and efficient, it provides the required aggregate data, and all the product columns can

be added to the output columns. The result is as follows:

Code ProductName QuantitySold

----------- -------------------- ------------------

1002 Dragon Flight 47.00

1003 Sky Dancer 5.00

1004 Rocket Kite 2.00

1012 Falcon F-16 5.00

...

All, some, and any

Though not as popular as IN, three other options are worth examining when using a subquery in a WHERE clause. Each provides a twist on how items in the subquery are matched with the WHERE clause’s test value.

ALL must be true for every value.

SOME and ANY, which are equivalent keywords, must be true for some of the values in the subquery.

The next query demonstrates a simple ALL subquery. In this case, select returns true if 1 is less than every value in the subquery:

SELECT ‘True’ as ‘AllTest’

WHERE 1 < ALL

(SELECT a

FROM

(VALUES

(2),

(3),

(5),

(7),

(9)

) AS ValuesTable (a)

);

Result:

AllTest

--------------

True

Be very careful with the ALL condition if the subquery might return a null. A null value in the subquery results will force the ALL to return a false, because it’s impossible to prove that the test is true for every value in the subquery if one of those values is unknown.

In this query, the last value is changed from a 9 to null and the query no longer returns true:

SELECT ‘True’ AS ‘AllTest’

WHERE 1 < ALL

(SELECT a

FROM

(VALUES

(2),

(3),

(5),

(7),

(null)

) AS ValuesTable(a)

);

Result (empty result set):

AllTest

--------------

The SOME and ANY conditional tests return true if the condition is met for any values in the subquery result set.

For example:

SELECT ‘True’ as ‘SomeTest’

WHERE 5 = SOME

(SELECT a

FROM

(VALUES

(2),

(3),

(5),

(7),

(9)

) AS MyTable(a)

);

Result:

SomeTest

--------------

True

The ANY and SOME conditions are similar to the in condition. In fact = ANY and = SOME are exactly like IN. ANY and SOME conditions have the extra functionality to testing for other conditional tests such as <, <=, >, =>, and <>.

Row constructors:

New for SQL Server 2008, row constructors provide a convenient way to supply hard-coded values directly in a subquery. The VALUES clause is wrapped in parentheses, as is every hard-coded row. It requires an alias and a column alias list, also in parentheses.

Row constructors can be used in the FROM clause and joined just like any other type of data source.

The next query creates a row constructors data source called MyRowConstructor with two columns, a and b:

SELECT a, b

FROM

(VALUES

(1, 2),

(3, 4),

(5, 6),

(7, 8),

(9, 10)

) AS MyRowConstructor (a, b)

Result:

a b

----------- -----------

1 2

3 4

5 6

7 8

9 10

Common table expressions

The common table expression (CTE) defines what could be considered a temporary view, which can be referenced just like a view in the same query. Because CTEs may be used in the same ways that simple subqueries are used and they compile exactly like a simple subquery, I’ve included them in the simple subquery heading and will show example code CTEs alongside simple subqueries.

The CTE uses the WITH clause, which defines the CTE. Inside the WITH clause is the name, column aliases, and SQL code for the CTE subquery. The main query can then reference the CTE as a data source:

WITH CTEName (Column aliases)

AS (Simple Subquery)

SELECT. . .

FROM CTEName;

The WITH keyword not only begins a CTE, it also adds a hint to a table reference. This is why the statement before a CTE must be terminated with a semicolon — just one more reason to always terminate every statement with a semicolon.

The following example is the exact same query as the preceding subquery, only in CTE format. The name of the CTE is CTEQuery. It returns the ProductionCategoryID column and uses the exact same SQL Select statement as the preceding simple subquery:

WITH CTEQuery (ProductCategoryID)

AS (Select ProductCategoryID

from dbo.ProductCategory

Where ProductCategoryName = ‘Kite’)

(Note that a CTE by itself is an incomplete SQL statement. If you try to run the preceding code, you will get a syntax error.)

Once the CTE has been defined in the WITH clause, the main portion of the query can reference the CTE using its name as if the CTE were any other table source, such as a table or a view. Here’s the complete example, including the CTE and the main query:

WITH CTEQuery (ProductCategoryID)

AS (Select ProductCategoryID

from dbo.ProductCategory

Where ProductCategoryName = ‘Kite’)

SELECT ProductName

FROM dbo.Product

WHERE ProductCategoryID

= (SELECT ProductCategoryID FROM CTEQuery);

To include multiple CTEs within the same query, define the CTEs in sequence prior to the main query:

WITH

CTE1Name (column names)

AS (Simple Subquery),

CTE2Name (column names)

AS (Simple Subquery)

SELECT. . .

FROM CTE1Name

INNER JOIN CTE2Name

ON . . .

Although CTEs may include complex queries, they come with two key restrictions:

■ unlike subqueries, CTEs may not be nested. A CTE may not include another CTE.

■ CTEs may not reference the main query. Like simple subqueries, they must be self-contained.

However, a CTE may reference any of the CTEs defined before it, or even itself (see below).

Best Practice

Although the CTE syntax may initially appear alien, for very complex queries that reference the same subquery in multiple locations, using a CTE may reduce the amount of code and improve readability.

A CTE is really just a different syntax for a simple subquery used as a derived table, with one key exception: CTEs can recursively refer to the same table using a union, and this works great for searching an adjacency pairs pattern hierarchy.

Correlated Subqueries:

Correlated subqueries sound impressive, and they are. They are used in the same ways that simple subqueries are used, the difference being that correlated subqueries reference columns in the outer query. They do this by referencing the name or alias of a table in the outer query, to reference the outer query. This capability to limit the subquery by the outer query makes these queries powerful and flexible. Because correlated subqueries can reference the outer query, they are especially useful for complex WHERE conditions.

Correlating in the where clause

The capability to reference the outer query also means that correlated subqueries won’t run by themselves because the reference to the outer query would cause the query to fail. The logical execution order is as follows:

1. The outer query is executed once.

2. The subquery is executed once for every row in the outer query, substituting the values from the outer query into each execution of the subquery.

3. The subquery’s results are integrated into the result set.

If the outer query returns 100 rows, then SQL Server will execute the logical equivalent of 101 queries — one for the outer query, and one subquery for every row returned by the outer query.

In practice, the SQL Server Query Optimizer will likely figure out a way to perform the correlated subquery without actually performing the 101 queries. In fact, I’ve sometimes seen correlated subqueries outperform other query plans. If they solve your problem, then don’t avoid them for performance reasons.

Example:

You can use a subquery anywhere in a SQL Statement where an expression is allowed. In the following example, I use a CASE expression and subquery to check whether line item sales totals in the SalesOrderDetail table equals the sales subtotal listed in the SalesOrderHeader table:

SELECT

SalesOrderNumber,

SubTotal,

OrderDate,

CASE WHEN

(

SELECT SUM(LineTotal)

FROM Sales.SalesOrderDetail

WHERE SalesOrderID = 43659

) = SubTotal THEN 'balanced'

ELSE 'not balanced'

END AS LineTotals

FROM

Sales.SalesOrderHeader

WHERE

SalesOrderID = 43659;

I’ve included the CASE expression as part of the fourth column expression. The CASE expression uses the subquery to total the line item sales in the SalesOrderDetail table for order 43659. Notice that, as in the preceding example, the subquery is enclosed in parentheses and uses the SUM aggregate function to return a single value. I then use an equal (=) operator to compare the subquery’s result to the SubTotal column in the SalesOrderHeader table. If the amounts are equal, the CASE expression returns a value of balanced. It the values are not equal, CASE returns not balanced.

As you can see, the line item sales total in the SalesOrderDetail table does not match the subtotal in the SalesOrderHeader table, at least not for sale 43659. However, suppose you want to verify all the sales listed in the two tables to see whether the totals balance. To do so, you must modify both the subquery and the outer query in order to create the condition necessary to support a correlated subquery. A correlated subquery, also known as a repeating subquery, is one that depends on the outer query for specific values. This is particularly important if your outer query returns multiple rows.

The best way to understand how correlated subqueries work is to look at an example. In the following SELECT statement, I include a CASE expression as one of the column expressions, as you saw in the preceding example:

SELECT

SalesOrderNumber,

SubTotal,

OrderDate,

CASE WHEN

(

SELECT SUM(LineTotal)

FROM Sales.SalesOrderDetail d

WHERE d.SalesOrderID = h.SalesOrderID

) = h.SubTotal THEN 'balanced'

ELSE 'not balanced'

END AS LineTotals

FROM

Sales.SalesOrderHeader h;

As before, the CASE expression includes a subquery that returns the total amount for line item sales. However, notice that the subquery’s WHERE clause is different from the previous example. Instead of specifying an order ID, the WHERE clause references the SalesOrderID column from the outer query. I do this by using table aliases to distinguish the two columns-hfor SalesOrderHeader and d for SalesOrderDetail-and then specifying that the column values must be equal for the WHERE condition to evaluate to true. That means that, for each row in the SalesOrderHeader table returned by the outer query, the SalesOrderID value associated with that row is plugged into the subquery and compared with the SalesOrderID value of the SalesOrderDetail table. As a result, the subquery is executed for each row returned by the outer query.

The value returned by the subquery is then compared to the SubTotal column of the SalesOrderHeader table and a value for the LineTotals column is provided, a process repeated for each row.

Correlating a derived table using apply:

A subquery used as a derived table can reference the outer query, which makes it a correlated subquery. This technique leverages the previous correlated subquery method by using the correlation in the WHERE clause of the derived table subquery.

Subqueries used as derived tables aren’t allowed to reference the outer query if they are included in the outer query with a JOIN. However, the CROSS APPLY or OUTER APPLY method of including the subquery allows passing data to the subquery.

First, to set up some sample data:

USE tempdb;

CREATE TABLE TableA (ID INT);

INSERT INTO TableA VALUES (1);

INSERT INTO TableA VALUES (2);

CREATE TABLE TableB (ID INT);

INSERT INTO TableB VALUES (1);

INSERT INTO TableB VALUES (3);

The following query uses a CROSS APPLY to pass every row from the outer query to the derived table subquery. The subquery then filters its rows to those that match IDs. The CROSS APPLY returns every row from the outer query that had a match in the subquery. Functionally, it’s the equivalent to an inner join between TableA and TableB:

SELECT B.ID AS Bid, A.ID AS Aid

FROM TableB AS B

CROSS APPLY

(Select ID from TableA

where TableA.ID = B.ID) AS A;

Result:

Bid Aid

----------- -----------

1 1

The next query uses the same correlated derived table subquery, but changes to an OUTER APPLY to include all rows from the outer query. This query is the same as a left outer join between TableA and TableB:

SELECT B.ID AS Bid, A.ID AS Aid

FROM TableB AS B

OUTER APPLY

(Select ID from TableA

where TableA.ID = B.ID) AS A;

Result:

ID ID

----------- -----------

1 1

3 NULL

Composable SQL

Composable SQL, also called select from output or DML table source (in SQL Server BOL), is the ability to pass data from an insert, update, or delete’s output clause to an outer query. This is a very powerful new way to build subqueries, and it can significantly reduce the amount of code and improve the performance of code that needs to write to one table, and then, based on that write, write to another table.

To track the evolution of composable SQL (illustrated in Figure 11-3), SQL Server has always had DML triggers, which include the inserted and deleted virtual tables. Essentially, these are a view to the DML modification that fired the triggers. The deleted table holds the before image of the data, and the inserted table holds the after image.

Since SQL Server 2005, any DML statement that modifies data (INSERT, UPDATE, DELETE, MERGE) can have an optional OUTPUT clause that can SELECT from the virtual inserted and deleted table. The OUTPUT clause can pass the data to the client or insert it directly into a table.

The inserted and deleted virtual tables are covered in Chapter 26, ‘‘Creating DML Triggers,’’ and the output clause is detailed in Chapter 15, ‘‘Modifying Data.’’ In SQL Server 2008, composable SQL can place the DML statements and its OUTPUT clause in a subquery and then select from that subquery. The primary benefit of composable SQL, as opposed to just using the OUTPUT clause to insert into a table, is that OUTPUT clause data may be further filtered and manipulated by the outer query.

The following script first creates a table and then has a composable SQL query. The subquery has an UPDATE command with an OUTPUT clause. The OUTPUT clause passes the oldvalue and newvalue columns to the outer query. The outer query filters out TestData and then inserts it into the CompSQL table:

CREATE TABLE CompSQL (oldvalue varchar(50), newvalue varchar(50));

INSERT INTO CompSQL (oldvalue, newvalue )

SELECT oldvalue, newvalue

FROM

(UPDATE HumanResources.Department

SET GroupName = ‘Composable SQL Test’

OUTPUT Deleted.GroupName as ‘oldvalue’,

Inserted.GroupName as ‘newvalue’

WHERE Name = ‘Sales’);

SELECT oldvalue, newvalue

FROM CompSQL

WHERE newvalue <> ‘TestData’;

Result:

oldvalue newvalue

----------------------- ------------------------------

Sales and Marketing Composable SQL Test

Note several restrictions on composable SQL:

■ The update DML in the subquery must modify a local table and cannot be a partitioned view.

■ The composable SQL query cannot include nested composable SQL, aggregate function, subquery, ranking function, full-text features, user-defined functions that perform data access, or the textptr function.

■ The target table must be a local base table with no triggers, no foreign keys, no merge replication, or updatable subscriptions for transactional replication.

My Sql Note

Pages

Tuesday, 30 August 2016

subquery

Scalar sub-query in WHERE Clause

Scalar sub-query in ORDER BY Clause

No comments:

Post a Comment