T-SQL Tuesday 37: Hidden Joins

Logo for T-SQL Tuesday

T-SQL Tuesday

T-SQL Tuesday is turning four! (Or is it three in human years? Is there some sort of conversion for blogging parties?) December 2012′s topic is a Month of Joins, and it is being hosted by Sebastian Meine. Let’s jump in.

As a novice in writing T-SQL, one saying you may overhear is that “Joins slow down queries.” Then you start to think about it, and the evidence starts to pile up. For example, think about those fancy-pants data warehouses. Those denormailze their data so they don’t use as many joins! And they can be blazingly fast, so it must be true!

You decide to write queries using the minimal amount of joins possible. But did you know that you can’t always avoid joins? Or that sometimes avoiding them can be a hassle?

Subqueries are not the answer

Consider the following two queries. The first one uses a subquery to return the required set. The second uses an INNER JOIN. If you were to subscribe to the logic of “Joins Suck,” then you would expect the first to perform better than the second. Let’s test them out.

-- WARNING:
-- DO NOT RUN THIS ON PRODUCTION!
-- Turn on Include Actual Execution Plan (Ctrl+M)
DBCC FREEPROCCACHE
SET STATISTICS IO ON

-- Query 1: Results via subquery
SELECT p.ProductID, p.Name
FROM Production.Product p
WHERE p.ProductSubcategoryID IN (
	SELECT psc.ProductSubcategoryID
	FROM Production.ProductSubcategory psc
	WHERE psc.Name LIKE '%Bikes%'
)

-- Query 2: Same results via join
SELECT p.ProductID, p.Name
FROM Production.Product p
INNER JOIN Production.ProductSubcategory psc
	ON p.ProductSubcategoryID = psc.ProductSubcategoryID
WHERE psc.Name LIKE '%Bikes%'

Start by turning on Include Actual Execution Plan by clicking its button or typing Ctrl+M. Once that’s done, you can run the very top to free up the cache so we have a clean slate to work against, and then turn STATISTICS IO on so we can see the activity going on behind the scenes. Then run both queries together.

Mirror, mirror...

Mirror, mirror…

They chose the same plan! But what’s more, the first query is still using a join, even though we didn’t code it that way. Because we turned on STATISTICS IO, we can see that SQL Server is completing the same read operations as well. Despite the catchy title of this blog post, there’s no such thing as a HIDDEN JOIN. But you should keep in mind that whenever you are including more than one table in a query, SQL Server is likely to use a join in execution.

I really don’t Want a Join…

As I mentioned, there is a way to return the information without the use of a join, but it’s a bit of a pain. What’s happening here is that you are doing research and playing a larger role in the execution of the query by taking some of the work away from SQL Server. Here’s what I mean:

-- Hrm, which SubCategories match up with '%Bikes%'?
SELECT psc.SubCategoryID, psc.Name
FROM Production.ProductSubCategoryID psc
WHERE psc.Names LIKE '%Bikes%'
Find those IDs!

Find those IDs!

-- Oh! I see I need IDs 1, 2, and 3. Let me fill those in...
SELECT p.ProductID, p.Name
FROM Production.Product p
WHERE p.ProductSubcategoryID IN (1,2,3)
Was it worth the extra work?

Was it worth the extra work?

You can certainly work around joins if you’d like, but once you start writing large, complicated queries, the hassle won’t be worth it.

Tagged on: , ,

One thought on “T-SQL Tuesday 37: Hidden Joins

  1. Pingback: T-SQL Tuesday #37 - Roundup - sqlity.net | sqlity.net