Proxy > Gmail Facebook Yahoo!

Writing Joins in T-SQL



Occasionally someone will ask for my help with a query and say that both a right outer join and a left outer join was tried, and still the expected results were not achieved. That made me realize that some developers do not completely understand outer joins and that an article explaining how to use them might help.

Inner Join Review

The most commonly used join is an INNER JOIN. This type of join combines rows from two tables only when they match on the joining condition. Usually the primary key from one table matches a foreign key on another table, but join conditions can be more complex than that.
(Note: Most of the information in this article can be applied to views as well as tables. For simplicity, the word "table" will be used to mean table or view unless stated otherwise. Keys are not defined on views, but the underlying table’s key column or columns are often included in the view. To keep things simple, let's assume that is the case.)
INNER JOIN will retrieve a results row only where there is a perfect match between the two tables in the join condition. You will also often see one row from one of the tables matching multiple rows in the other table. For example, one customer can have many orders. One order can have many order details. The data on the one side will be repeated for each row on the many side. The following query is an example showing how the information from the Sales.SalesOrderHeader is repeated on each matching row:

SELECT s.SalesOrderID, OrderDate,ProductID FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS d ON s.SalesOrderID = d.SalesOrderID
ORDER BY s.SalesOrderID, ProductID


Outer Join Introduction
OUTER JOIN is used to join two tables even if there is not a match. An OUTER JOIN can be used to return a list of all the customers and the orders even if no orders have been placed for some of the customers. A keyword, RIGHT or LEFT, is used to specify which side of the join returns all possible rows. I like using LEFT because it makes sense to me to list the most important table first. Except for one example demonstrating RIGHT OUTER JOIN, this article will use left joins. Just a note: the keywords INNER and OUTER are optional.
The next example returns a list of all the customers and the SalesOrderID for the orders that have been placed, if any.
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
It uses the LEFT keyword because the Sales.Customer table is located on
the left side and we want all rows returned from that table even if
there is no match in the Sales.SalesOrderHeader table. This is an
important point. Notice also that the CustomerID column is the primary
key of the Sales.Customer table and a foreign key in the
Sales.SalesOrderHeader table. This means that there must be a valid
customer for every order placed. Writing a query that returns all
orders and the customers if they match doesn’t make sense. The LEFT
table should always be the primary key table when performing a LEFT
OUTER JOIN.
If the location of the tables in the query are switched, the RIGHT keyword is used and the same results are returned:
SELECT c.CustomerID, s.SalesOrderID FROM Sales.SalesOrderHeader s
RIGHT OUTER JOIN Sales.Customer c ON c.CustomerID = s.CustomerID
Notice that I didn’t change the join condition at all. It doesn’t
matter which side of the equal sign the columns are listed; only where
the tables are named is it important.
If I have a LEFT OUTER JOIN, what is returned from the table on the right side of the join where there is not a match? Each column from the right side will return a NULL. Try this query which lists the non-matching rows first:
SELECT c.CustomerID, s.SalesOrderID FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
ORDER BY s.SalesOrderID

By adding a WHERE clause to check for a NULL SalesOrderID, you can find all the customers who have not placed an order. My copy of AdventureWorks returns 66 customers with no orders:

SELECT c.CustomerID, s.SalesOrderID FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
WHERE s.SalesOrderID IS NULL
Occasionally, you will need to be more specific. How can you find all the customers who have not placed an order in 2002? There are several ways to solve this problem. You could create a view of all the orders placed in 2002 and join the view on the Sales.Customer table. Another option is to create a CTE, or Common Table Expression, of the orders placed in 2002. This example shows how to use a CTE to get the required results:
WITH s AS
(   SELECT SalesOrderID, customerID
    FROM Sales.SalesOrderHeader
    WHERE OrderDate between '1/1/2002' and '12/31/2002'
)
SELECT c.CustomerID, s.SalesOrderID FROM Sales.Customer c 
LEFT OUTER JOIN s ON c.customerID = s.customerID WHERE s.SalesOrderID IS NULL
My favorite technique to solve this problem is much simpler. Additional criteria, in this case filtering on the OrderDate, can be added to the join condition. The query joins all customers to the orders placed in 2002. Then the results are restricted to those where there is no match. This query will return exactly the same results as the previous, more complicated query:
SELECT c.CustomerID, s.SalesOrderID FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID 
                              and s.OrderDate between '1/1/2002' and '12/31/2002' WHERE s.SalesOrderID IS NULL

Using Aggregates with Outer Joins

Aggregate queries introduce another pitfall watch out for. The following example is an attempt to list all the customers and the count of the orders that have been placed. Can you spot the problem?
SELECT c.CustomerID, count(*) OrderCountFROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID 
GROUP BY c.CustomerIDORDER BY OrderCount
Now the customers with no orders look like they have placed one order. That is because this query is counting the rows returned. To solve this problem, count the SalesOrderID column. NULL values are eliminated from the count.
SELECT c.CustomerID, count(SalesOrderID) OrderCountFROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader sON c.CustomerID = s.CustomerID 
GROUP BY c.CustomerIDORDER BY OrderCount

Multiple Joins

Once more than two tables are involved in the query, things get a bit more complicated. When a table is joined to the RIGHT table, a LEFT OUTER JOIN must be used. That is because the NULL rows from the RIGHT table will not match any rows on the new table. An INNER JOIN causes the non-matching rows to be eliminated from the results. If the Sales.SalesOrderDetail table is joined to the Sales.SalesOrderHeader table and an INNER JOIN is used, none of the customers without orders will show up. NULL cannot be joined to any value, not even NULL.
To illustrate this point, when I add the Sales.SalesOrderDetail table to one of the previous queries that checked for customers without orders, I get back no rows at all.
SELECT c.CustomerID, s.SalesOrderID, d.SalesOrderDetailID FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID 
INNER JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.SalesOrderID IS NULL
To get correct results, change the INNER JOIN to a LEFT JOIN.
SELECT c.CustomerID, s.SalesOrderID, d.SalesOrderDetailID FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID 
LEFT OUTER JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.SalesOrderID IS NULL


What about additional tables joined to Sales.Customer, the table on the left? Must outer joins be used? If it is possible that there are some rows without matches, it must be an outer join to guarantee that no results are lost. The Sales.Customer table has a foreign key pointing to the Sales.SalesTerritory table. Every customer’s territory ID must match a valid value in Sales.SalesTerritory. This query returns 66 rows as expected because it is impossible to eliminate any customers by joining to Sales.SalesTerritory:
SELECT c.CustomerID, s.SalesOrderID, t.Name FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID 
INNER JOIN Sales.SalesTerritory t ON c.TerritoryID = t.TerritoryID
WHERE SalesOrderID IS NULL
Sales.SalesTerritory is the primary key table; every customer must match a valid territory. If you wanted to write a query that listed all territories, even those that had no customers, an outer join will be used. This time, Sales.Customers is on the right side of the join.
SELECT t.Name, CustomerID FROM Sales.SalesTerritory t 
LEFT OUTER JOIN Sales.Customer c ON t.TerritoryID = c.TerritoryID

Conclusion

Queries with outer joins can be tricky to write. Extra time and care must be spent making sure the results are correct. Think about the relationship between the tables and make sure that the outer join is continued down the path.  This article covered almost everything you need to know about outer joins.


Responses

0 Respones to "Writing Joins in T-SQL"


Send mail to your Friends.  

Expert Feed

 
Return to top of page Copyright © 2011 | My Code Logic Designed by Suneel Kumar