Home MVC Storefront

Creating IN Queries With Linq To Sql

Props on this one go to Scott Hanselman who pulled me back from the edge of the cliff last night. I was particularly distraught in getting a MIX demo together where I had to do some queries using LINQ, and I couldn't for the life of me figure out how to fashion an IN query! With Scott's help (and patience) I figured it out, and thought I should blog for my own reference, at least.

 

It Depends On What Your Definition of "IN" Is...
An IN query will pull back a set of results from SQL that is within a given range. This range can be set manually, or can itself be a query. So if you have an eCommerce application and you want to know what products you have in a given user's cart, you could do this (using AdventureWorks):

SELECT * FROM Production.Product WHERE ProductID IN (SELECT ProductID FROM Sales.ShoppingCartItem WHERE ShoppingCartID='RobsCart')

This will return all the Product records that are in my cart. This is a fundamental query structure and up until today I thought, for sure, that Linq To Sql doesn't support it. I was sort of right - but not really.

 

LINQ Is People To (Or It's Made From People...)
It's important to remember that the people that made LINQ were trying to approximate a "SQL within Code" sort of thing - this means that they built LINQ to query just about anything, and also built a SQL Translator called Linq To Sql. They ran into limitations with trying to contort a static language structure (VB or C#) into SQL, but for the most part if you think long enough (or Skype Hanselman) you can figure it out.

The key here is to think "Top Down" (please, no flames...to alleviate the "Top Down" reference, I'll use LOL Cats to describe the problem statement):

  1. WTF GIMMEH CART
  2. CART CAN HAZ PRODUCTS?
  3. GIMMEH PRODUCTS (NOM NOM NOM)

If you break it down this way (and not in SQL terms as above), you can begin to see how LINQ might make a query out of this:

Start with the Cart (pretend my cart is ID=75144):

AdventureWorks.DB db=new DB();

var itemQuery = from cartItems in db.SalesOrderDetails
              where cartItems.SalesOrderID == 75144
              select cartItems.ProductID;

Next we need to get the products, but only those that are in the cart. We do this by using our first query, inside the second:

var myProducts = from p in db.Products
                where itemQuery.Contains(p.ProductID)
                select p;

Here is the key to this weirdness:

Linq To Sql only constructs the query when the Enumerator is tripped.

So as whacky as this structure may look, know that what you're doing here is creating a set of Expressions that Linq To Sql is going to parse into a SQL Statement, and it will only execute that statement when you enumerate over the results, or ask it to actually do something with the result set (like Count(), ToList(), etc). So despite how it looks - only one query is being executed.

It might take you 10 different LINQ statements to get what you want - but know that you can nest all of them and only call the database once.

If you've looked over the "101 LINQ Examples" site, you may know this - but I found it really groovy that you can embed anything IQueryable inside of another IQueryable statement (IQueryable is what your "var" is when you do the above query).

Here's the generated SQL for the above query:

SELECT [t0].[ProductID], [t0].[Name], [t0].[ProductNumber], [t0].[MakeFlag], [t0].[FinishedGoodsFlag], 
[t0].[Color], [t0].[SafetyStockLevel], [t0].[ReorderPoint], [t0].[StandardCost], [t0].[ListPrice], 
[t0].[Size], [t0].[SizeUnitMeasureCode], [t0].[WeightUnitMeasureCode], [t0].[Weight], 
[t0].[DaysToManufacture], [t0].[ProductLine], [t0].[Class], [t0].[Style], [t0].[ProductSubcategoryID], 
[t0].[ProductModelID], [t0].[SellStartDate], [t0].[SellEndDate], [t0].[DiscontinuedDate], 
[t0].[rowguid], [t0].[ModifiedDate]
FROM [Production].[Product] AS [t0]
WHERE EXISTS(
    SELECT NULL AS [EMPTY]
    FROM [Sales].[SalesOrderDetail] AS [t1]
    WHERE ([t1].[ProductID] = [t0].[ProductID]) AND ([t1].[SalesOrderID] = @p0)
)

Notice that rather than an "IN" statement, we get a "WHERE EXISTS" - which is just about synonymous with the IN statement. I had a bit of a gag reflex when I saw the "SELECT NULL AS [EMPTY]" but that's simply an empty return set - the SELECT lookup is not interested in returning the record - only that it EXISTS. So in terms of efficiency, this is about as good as it gets.

 

What If IN Didn't EXIST?
I didn't really generate an IN statement - but this guy did and he tipped me off to nesting the query bits. Notice that, in his case, he didn't need to create an IQueryable - he just used an Array. This is where the fun starts with these queries - LINQ is a whole mess of extensions (at it's core) that hang off of IEnumerable. Linq To Sql will (in most cases) parse these expressions out and allow you to work with them in the context of a query.

In other words, I could have written the LINQ query above, like this:

int[] productList = new int[] { 1, 2, 3, 4 };

var myProducts = from p in db.Products
                 where productList.Contains(p.ProductID)
                select p;

And the generated SQL would be:

SELECT [t0].[ProductID], [t0].[Name], [t0].[ProductNumber], [t0].[MakeFlag], [t0].[FinishedGoodsFlag], 
[t0].[Color], [t0].[SafetyStockLevel], [t0].[ReorderPoint], [t0].[StandardCost], [t0].[ListPrice], 
[t0].[Size], [t0].[SizeUnitMeasureCode], [t0].[WeightUnitMeasureCode], [t0].[Weight], [t0].[DaysToManufacture], 
[t0].[ProductLine], [t0].[Class], [t0].[Style], [t0].[ProductSubcategoryID], [t0].[ProductModelID], 
[t0].[SellStartDate], [t0].[SellEndDate], [t0].[DiscontinuedDate], [t0].[rowguid], [t0].[ModifiedDate]
FROM [Production].[Product] AS [t0]
WHERE [t0].[ProductID] IN (@p0, @p1, @p2, @p3)

Hey! Look at that! Something I didn't think was possible actually is!

I hope you're starting to see the pattern here - and that the IN statement is built on the reverse thinking of a SQL statement. In other words you're not saying "confine this result set to this range", it's more of a "use this range to confine the result set" - which is just the kind of thing a programmer might thing, and fits right in with the rest of the LINQ syntax. Sounds subtle - but it's very important when you doing this type of querying to remember that LINQ is a programmatic construct - NOT a SQL construct.

Steve avatar
Steve says:
Wednesday, February 27, 2008
Either those linq guys did this to specifically target this type of 'IN' scenerio and therefore the sql that is produced has a kind of IF developer_wants_to_use_IN then output this SQL.... Or these guys better lock their doors as Sara Conner might be paying them a visit that is some pretty cool compiler AI right there folks. I have to wonder if we need DBA's anymore, that is some nice SQL hmm not that a DBA has ever helped me with any SQL before usually they just change my indexes and cause my nightly batch runs to run 6 hours over.

Richard Bushnell avatar
Richard Bushnell says:
Wednesday, February 27, 2008
Great write-up, Rob. I find the assumption that LINQ is like SQL to be a handicap too sometimes. When I tried to use "group by", for example, I couldn't get my head around it. In case you're interested, I wrote a post up about it at http://richardbushnell.net/index.php/2008/02/08/how-to-use-grouping-in-c-linq-syntax/ I'm sure there are lots of surprises with LINQ-to-SQL for me yet.

Carlos Eduardo Appel Klein - DigitalDesk avatar
Carlos Eduardo Appel Klein - DigitalDesk says:
Wednesday, February 27, 2008
Excellent post Rob. After reading, i think i realy need spend some time to study link.

Marcus McConnell avatar
Marcus McConnell says:
Wednesday, February 27, 2008
"Linq To Sql only constructs the query when the Enumerator is tripped." is such an important point about LINQ. Once I learned this building dynamic queries was easy and made a lot more sense!

Ayende Rahien avatar
Ayende Rahien says:
Wednesday, February 27, 2008
Rob, Now let us try doing something like this: select * from Order where CustomerId in (select Id from Customer where City == "London")

Randal Beckhorn avatar
Randal Beckhorn says:
Wednesday, February 27, 2008
I have found this free tool to be an awesome help with grasping LINQ... http://www.linqpad.net/ Check it out.

Steven Harman avatar
Steven Harman says:
Wednesday, February 27, 2008
As I think Ayende is hinting... the IN only works when the right-side of the expression is another LINQ-to-SQL statement (that will be turned into SQL) or an IQueryable that is sufficiently small. And the real issue is with the second point, you need a sufficiently small collection of in memory items. Why? Because LINQ-to-SQL will create a SQL parameter for each value in the IQueryable (be it an Array, List, etc...) and pass that big list of parameters on to SQL Server as part of the prepared SQL statement. Thats a HUGE problem when that IQueryable starts to get _big_... where big isn't really all that big... around 2100 items is enough. Why 2100? Because its just another magic number, seriously. SQL Server has a limit of 2098 parameters for a prepared statement and if you exceed that limit, SQL Server will raise an out of memory exception and terminate the connection. Meaning... no in-memory collections with more than ~2100 items or your query is gonna puke! For the record, the NHibernate folks ran into the same issue http://forum.hibernate.org/viewtopic.php?p=2369318

Jacob avatar
Jacob says:
Wednesday, February 27, 2008
Nice catch, Rob! I just got done wrestling with this problem and my solution was nowhere near as simple. I'll keep this in mind for the next time I go refactoring.

Michael avatar
Michael says:
Wednesday, February 27, 2008
Rob, very timely for me I was trying to do something with a not in query once I saw this I was able to reverse it to a not contains IQueryable queryAll=from invoiceAll in context.vw_all_invoices select invoiceAll; var results= from i in invoices where !queryAll.Contains(new vw_all_invoice(i.Invoice_Number)) select i; Invoices is a table of invoices to be submitted, this filters out the ones laready submitted, it works great Thanks.

Rob Conery avatar
Rob Conery says:
Wednesday, February 27, 2008
@Ayende: Here ya go: AdventureWorks.DB db=new DB(); var userQry = from c in db.Customers join ca in db.CustomerAddresses on c.CustomerID equals ca.CustomerID join a in db.Addresses on ca.AddressID equals a.AddressID where a.City == "London" select c.CustomerID; var orderQry = from o in db.SalesOrderHeaders where userQry.Contains(o.CustomerID) select o; int orderCount = orderQry.Count(); Assert.IsTrue(orderCount == 756); This works a treat and doesn't nest ORs - it uses the same WHERE EXISTS clause as above. This is AdWorks so note that I had to run a join in the first query. @Steven: Yep - totally dig it. If you pass an int array[] then yes, it can scale in a nasty way. But I would imagine you'd have that problem with or without LINQ right? In other words - if you need to pass values in statically (rather than do an IN (SELECT....) you run the chance, especially in code, of really screwing up your query.

Roger Jennings avatar
Roger Jennings says:
Wednesday, February 27, 2008
Rob, I believe Jim Wooley was the first to suggest the Contains trick in his "Use the new LINQ "Contains" extension method for the SQL "IN" clause" (http://www.devauthority.com/blogs/jwooley/archive/2007/08/06/69922.aspx) of August 06, 2007. At least that's the first instance I've seen. Cheers, --rj

John Walker avatar
John Walker says:
Tuesday, March 25, 2008
I just found out about the IN statement stuff as well. It's still confusing to me. Using your example: int[] productList = new int[] { 1, 2, 3, 4 }; var myProducts = from p in db.Products where productList.Contains(p.ProductID) select p; My brain says that below should be the correct syntax: int[] productList = new int[] { 1, 2, 3, 4 }; var myProducts = from p in db.Products where p.ProductID.Contains(productList) select p; Of course, that doesn't make sense considering that ProductID is an int, but it's just the way my brain has been configured to work until now. ;) Good stuff


Search Me
Index Of MVC Screencasts

You can watch all of the MVC Screencasts up at ASP.NET, and even leave comments if you like.

Subscribe

Popular Posts
 
My Tweets
  • Isn't the Rails/Asshole thing dead? http://tinyurl.com/57dmvx
  • Pushups last night: 17, 13, 9, 7, 3
  • @kevindente my wife (and me) consider the Roomba to be on par with Tivo in terms of generation-defining technology
  • @kevindente I'll hold you and we can cry together. Maybe you can ... even ... blog about it.
  • Writing tests for InventoryService - talk about a slipper-slope process! Is there such a thing as Cart Concurrency? I dunno! Maybe?
  About Me



Hi! My name is Rob Conery and I work at Microsoft on the ASP.NET team. I am the Creator of SubSonic and was the Chief Architect of the Commerce Starter Kit (a free, Open Source eCommerce platform for .NET)

I live in Kauai, HI with my family, and when my clients aren't looking, I sometimes write things on my blog (giving away secrets of incalculable value).