T-SQL Tuesday 37: Hidden Joins
T-SQL Tuesday is turning four! (Or is it three in human years? Is there some sort of conversion for blogging parties?) December 2012′s topic is a Month of Joins, and it is being hosted by Sebastian Meine. Let’s jump in.
As a novice in writing T-SQL, one saying you may overhear is that “Joins slow down queries.” Then you start to think about it, and the evidence starts to pile up. For example, think about those fancy-pants data warehouses. Those denormailze their data so they don’t use as many joins! And they can be blazingly fast, so it must be true!
You decide to write queries using the minimal amount of joins possible. But did you know that you can’t always avoid joins? Or that sometimes avoiding them can be a hassle?
Subqueries are not the answer
Consider the following two queries. The first one uses a subquery to return the required set. The second uses an INNER JOIN. If you were to subscribe to the logic of “Joins Suck,” then you would expect the first to perform better than the second. Let’s test them out.
-- WARNING: -- DO NOT RUN THIS ON PRODUCTION! -- Turn on Include Actual Execution Plan (Ctrl+M) DBCC FREEPROCCACHE SET STATISTICS IO ON -- Query 1: Results via subquery SELECT p.ProductID, p.Name FROM Production.Product p WHERE p.ProductSubcategoryID IN ( SELECT psc.ProductSubcategoryID FROM Production.ProductSubcategory psc WHERE psc.Name LIKE '%Bikes%' ) -- Query 2: Same results via join SELECT p.ProductID, p.Name FROM Production.Product p INNER JOIN Production.ProductSubcategory psc ON p.ProductSubcategoryID = psc.ProductSubcategoryID WHERE psc.Name LIKE '%Bikes%'
Start by turning on Include Actual Execution Plan by clicking its button or typing Ctrl+M. Once that’s done, you can run the very top to free up the cache so we have a clean slate to work against, and then turn STATISTICS IO on so we can see the activity going on behind the scenes. Then run both queries together.
They chose the same plan! But what’s more, the first query is still using a join, even though we didn’t code it that way. Because we turned on STATISTICS IO, we can see that SQL Server is completing the same read operations as well. Despite the catchy title of this blog post, there’s no such thing as a HIDDEN JOIN. But you should keep in mind that whenever you are including more than one table in a query, SQL Server is likely to use a join in execution.
I really don’t Want a Join…
As I mentioned, there is a way to return the information without the use of a join, but it’s a bit of a pain. What’s happening here is that you are doing research and playing a larger role in the execution of the query by taking some of the work away from SQL Server. Here’s what I mean:
-- Hrm, which SubCategories match up with '%Bikes%'? SELECT psc.SubCategoryID, psc.Name FROM Production.ProductSubCategoryID psc WHERE psc.Names LIKE '%Bikes%'
-- Oh! I see I need IDs 1, 2, and 3. Let me fill those in... SELECT p.ProductID, p.Name FROM Production.Product p WHERE p.ProductSubcategoryID IN (1,2,3)
You can certainly work around joins if you’d like, but once you start writing large, complicated queries, the hassle won’t be worth it.




Pingback: T-SQL Tuesday #37 - Roundup - sqlity.net | sqlity.net