SQL Server

Myth of the Inclusion Filter

When we think about sets, we tend to think in terms of inclusion. “This is my set of donors who have given in 2012.” “Please show me all the customers that have bought at least $10,000 worth of merchandise.”

This is fine, but I think it’s given rise to a fallacy: that of the Inclusion Filter.” It’s a fallacy because no such thing exists.

From a linguistic vantagepoint, the words inclusion and filter don’t jive together. They have two different meanings.

inclusion: the action or state of being taken in or in comprising a part of a whole or group

filter: something that has the effect of a filter (as by holding back elements or modifying the appearance of something)

In terms of SQL Server, filters are applied via the WHERE clause. Often termed the predicate, it’s used to reduce the scope of the query. In other words, it takes a larger set – perhaps all donors or all customers – and excludes the records that don’t fit the criteria defined by the clause. This is the basic building blocks of query writing.

SELECT FullName, CustomerID, etc.
FROM dbo.Customers
WHERE PurchaseDate >= '20120101'
  AND SaleType = 'Closeout'

In the above query, we’re querying the whole of the dbo.Customers table, but we’re only returning records that have made purchases since the beginning of this year and those sales have been closeouts. Our WHERE clause filters out the records that don’t match the criteria so they don’t show up in our results set. The WHERE clause isn’t “including” those customers: if we took it away, we’d return the whole table, a portion of which would be the data we were actually looking for.

A Set of Customers

The WHERE clause filters out the Customers table except for the white portion.

What’s the Point?

Over the past few weeks, I’ve heard over and over again about “inclusion filters.” That we’ll be able to “include this set of people” and “include that set of people,” as though with the toggle of a switch. While it’s easy enough to see what users mean by this sort of talk – they usually mean to UNION two different sets into a larger whole – I think it does users a disservice to not explain the difference. The query below more typically resembles users’ requests for multiple inclusion sets.

SELECT FullName, CustomerID, etc.
FROM dbo.Customers
WHERE PurchaseDate >= '20120101'
  AND SaleType = 'Closeout'
SELECT FullName, CustomerID, etc.
FROM dbo.PotentialCustomers
WHERE InterestDate >= '20120101'

As in the first example, we would return just the customers who had made closeout purchase this year. But in addition, we’d also include potential customers who had shown interest in our products this year as well. What might the plain-English version of this request sound like? “Show me all the people who made purchases from closeout this year, and include all the people who signed up for our mailing list this year too.”

A Pair of Unioned Sets

Including two different result sets via UNION.

Despite the word “inclusion,” there is still no such thing as an inclusion filter.

I don’t ever expect an end user to fully understand how result sets are built and delivered by DBAs and developers via the back-end, but do I advocate for explaining the process from a 30,000-foot point-of-view. It can help clear up issues when the users ask for something that isn’t actually stored in the database. It can help them learn more about the data they do have. And perhaps they’ll even see that you’re on their side, helping them learn and grow. It’s a great way to establish trust.