Hanalei, Hawaii Tuesday, February 09, 2010

Creating IN Queries With Linq To Sql

Props on this one go to Scott Hanselman who pulled me back from the edge of the cliff last night. I was particularly distraught in getting a MIX demo together where I had to do some queries using LINQ, and I couldn't for the life of me figure out how to fashion an IN query!

Props on this one go to Scott Hanselman who pulled me back from the edge of the cliff last night. I was particularly distraught in getting a MIX demo together where I had to do some queries using LINQ, and I couldn't for the life of me figure out how to fashion an IN query! With Scott's help (and patience) I figured it out, and thought I should blog for my own reference, at least.

 

It Depends On What Your Definition of "IN" Is...
An IN query will pull back a set of results from SQL that is within a given range. This range can be set manually, or can itself be a query. So if you have an eCommerce application and you want to know what products you have in a given user's cart, you could do this (using AdventureWorks):

SELECT * FROM Production.Product WHERE ProductID IN (SELECT ProductID FROM Sales.ShoppingCartItem WHERE ShoppingCartID='RobsCart')

This will return all the Product records that are in my cart. This is a fundamental query structure and up until today I thought, for sure, that Linq To Sql doesn't support it. I was sort of right - but not really.

 

LINQ Is People To (Or It's Made From People...)
It's important to remember that the people that made LINQ were trying to approximate a "SQL within Code" sort of thing - this means that they built LINQ to query just about anything, and also built a SQL Translator called Linq To Sql. They ran into limitations with trying to contort a static language structure (VB or C#) into SQL, but for the most part if you think long enough (or Skype Hanselman) you can figure it out.

The key here is to think "Top Down" (please, no flames...to alleviate the "Top Down" reference, I'll use LOL Cats to describe the problem statement):

  1. WTF GIMMEH CART
  2. CART CAN HAZ PRODUCTS?
  3. GIMMEH PRODUCTS (NOM NOM NOM)

If you break it down this way (and not in SQL terms as above), you can begin to see how LINQ might make a query out of this:

Start with the Cart (pretend my cart is ID=75144):

AdventureWorks.DB db=new DB();

var itemQuery = from cartItems in db.SalesOrderDetails
              where cartItems.SalesOrderID == 75144
              select cartItems.ProductID;

Next we need to get the products, but only those that are in the cart. We do this by using our first query, inside the second:

var myProducts = from p in db.Products
                where itemQuery.Contains(p.ProductID)
                select p;

Here is the key to this weirdness:

Linq To Sql only constructs the query when the Enumerator is tripped.

So as whacky as this structure may look, know that what you're doing here is creating a set of Expressions that Linq To Sql is going to parse into a SQL Statement, and it will only execute that statement when you enumerate over the results, or ask it to actually do something with the result set (like Count(), ToList(), etc). So despite how it looks - only one query is being executed.

It might take you 10 different LINQ statements to get what you want - but know that you can nest all of them and only call the database once.

If you've looked over the "101 LINQ Examples" site, you may know this - but I found it really groovy that you can embed anything IQueryable inside of another IQueryable statement (IQueryable is what your "var" is when you do the above query).

Here's the generated SQL for the above query:

SELECT [t0].[ProductID], [t0].[Name], [t0].[ProductNumber], [t0].[MakeFlag], [t0].[FinishedGoodsFlag], 
[t0].[Color], [t0].[SafetyStockLevel], [t0].[ReorderPoint], [t0].[StandardCost], [t0].[ListPrice], 
[t0].[Size], [t0].[SizeUnitMeasureCode], [t0].[WeightUnitMeasureCode], [t0].[Weight], 
[t0].[DaysToManufacture], [t0].[ProductLine], [t0].[Class], [t0].[Style], [t0].[ProductSubcategoryID], 
[t0].[ProductModelID], [t0].[SellStartDate], [t0].[SellEndDate], [t0].[DiscontinuedDate], 
[t0].[rowguid], [t0].[ModifiedDate]
FROM [Production].[Product] AS [t0]
WHERE EXISTS(
    SELECT NULL AS [EMPTY]
    FROM [Sales].[SalesOrderDetail] AS [t1]
    WHERE ([t1].[ProductID] = [t0].[ProductID]) AND ([t1].[SalesOrderID] = @p0)
)

Notice that rather than an "IN" statement, we get a "WHERE EXISTS" - which is just about synonymous with the IN statement. I had a bit of a gag reflex when I saw the "SELECT NULL AS [EMPTY]" but that's simply an empty return set - the SELECT lookup is not interested in returning the record - only that it EXISTS. So in terms of efficiency, this is about as good as it gets.

 

What If IN Didn't EXIST?
I didn't really generate an IN statement - but this guy did and he tipped me off to nesting the query bits. Notice that, in his case, he didn't need to create an IQueryable - he just used an Array. This is where the fun starts with these queries - LINQ is a whole mess of extensions (at it's core) that hang off of IEnumerable. Linq To Sql will (in most cases) parse these expressions out and allow you to work with them in the context of a query.

In other words, I could have written the LINQ query above, like this:

int[] productList = new int[] { 1, 2, 3, 4 };

var myProducts = from p in db.Products
                 where productList.Contains(p.ProductID)
                select p;

And the generated SQL would be:

SELECT [t0].[ProductID], [t0].[Name], [t0].[ProductNumber], [t0].[MakeFlag], [t0].[FinishedGoodsFlag], 
[t0].[Color], [t0].[SafetyStockLevel], [t0].[ReorderPoint], [t0].[StandardCost], [t0].[ListPrice], 
[t0].[Size], [t0].[SizeUnitMeasureCode], [t0].[WeightUnitMeasureCode], [t0].[Weight], [t0].[DaysToManufacture], 
[t0].[ProductLine], [t0].[Class], [t0].[Style], [t0].[ProductSubcategoryID], [t0].[ProductModelID], 
[t0].[SellStartDate], [t0].[SellEndDate], [t0].[DiscontinuedDate], [t0].[rowguid], [t0].[ModifiedDate]
FROM [Production].[Product] AS [t0]
WHERE [t0].[ProductID] IN (@p0, @p1, @p2, @p3)

Hey! Look at that! Something I didn't think was possible actually is!

I hope you're starting to see the pattern here - and that the IN statement is built on the reverse thinking of a SQL statement. In other words you're not saying "confine this result set to this range", it's more of a "use this range to confine the result set" - which is just the kind of thing a programmer might thing, and fits right in with the rest of the LINQ syntax. Sounds subtle - but it's very important when you doing this type of querying to remember that LINQ is a programmatic construct - NOT a SQL construct.


Steve - February 27, 2008 - Either those linq guys did this to specifically target this type of 'IN' scenerio and therefore the sql that is produced has a kind of IF developer_wants_to_use_IN then output this SQL....
Or these guys better lock their doors as Sara Conner might be paying them a visit that is some pretty cool compiler AI right there folks. I have to wonder if we need DBA's anymore, that is some nice SQL hmm not that a DBA has ever helped me with any SQL before usually they just change my indexes and cause my nightly batch runs to run 6 hours over.
Richard Bushnell - February 27, 2008 - Great write-up, Rob. I find the assumption that LINQ is like SQL to be a handicap too sometimes. When I tried to use "group by", for example, I couldn't get my head around it. In case you're interested, I wrote a post up about it at http://richardbushnell.net/index.php/2008/02/08/how-to-use-grouping-in-c-linq-syntax/ I'm sure there are lots of surprises with LINQ-to-SQL for me yet.
Carlos Eduardo Appel Klein - DigitalDesk - February 27, 2008 - Excellent post Rob. After reading, i think i realy need spend some time to study link.
Marcus McConnell - February 27, 2008 - "Linq To Sql only constructs the query when the Enumerator is tripped." is such an important point about LINQ. Once I learned this building dynamic queries was easy and made a lot more sense!
Ayende Rahien - February 27, 2008 - Rob,
Now let us try doing something like this:

select * from Order
where CustomerId in (select Id from Customer where City == "London")
Randal Beckhorn - February 27, 2008 - I have found this free tool to be an awesome help with grasping LINQ...

http://www.linqpad.net/

Check it out.
Jacob - February 27, 2008 - Nice catch, Rob! I just got done wrestling with this problem and my solution was nowhere near as simple. I'll keep this in mind for the next time I go refactoring.
Michael - February 27, 2008 - Rob, very timely for me I was trying to do something with a not in query once I saw this I was able to reverse it to a not contains IQueryable queryAll=from invoiceAll in context.vw_all_invoices select invoiceAll; var results= from i in invoices where !queryAll.Contains(new vw_all_invoice(i.Invoice_Number)) select i; Invoices is a table of invoices to be submitted, this filters out the ones laready submitted, it works great Thanks.
Rob Conery - February 27, 2008 - @Ayende: Here ya go: AdventureWorks.DB db=new DB(); var userQry = from c in db.Customers join ca in db.CustomerAddresses on c.CustomerID equals ca.CustomerID join a in db.Addresses on ca.AddressID equals a.AddressID where a.City == "London" select c.CustomerID; var orderQry = from o in db.SalesOrderHeaders where userQry.Contains(o.CustomerID) select o; int orderCount = orderQry.Count(); Assert.IsTrue(orderCount == 756); This works a treat and doesn't nest ORs - it uses the same WHERE EXISTS clause as above. This is AdWorks so note that I had to run a join in the first query. @Steven: Yep - totally dig it. If you pass an int array[] then yes, it can scale in a nasty way. But I would imagine you'd have that problem with or without LINQ right? In other words - if you need to pass values in statically (rather than do an IN (SELECT....) you run the chance, especially in code, of really screwing up your query.
Roger Jennings - February 27, 2008 - Rob,

I believe Jim Wooley was the first to suggest the Contains trick in his "Use the new LINQ "Contains" extension method for the SQL "IN" clause" (http://www.devauthority.com/blogs/jwooley/archive/2007/08/06/69922.aspx) of August 06, 2007.

At least that's the first instance I've seen.

Cheers,

--rj
John Walker - March 25, 2008 - I just found out about the IN statement stuff as well. It's still confusing to me. Using your example: int[] productList = new int[] { 1, 2, 3, 4 }; var myProducts = from p in db.Products where productList.Contains(p.ProductID) select p; My brain says that below should be the correct syntax: int[] productList = new int[] { 1, 2, 3, 4 }; var myProducts = from p in db.Products where p.ProductID.Contains(productList) select p; Of course, that doesn't make sense considering that ProductID is an int, but it's just the way my brain has been configured to work until now. ;) Good stuff
ollie - May 14, 2009 - Thanks, You've met by daily fix of linq.
Peter Kellner - June 2, 2009 - And, don't forget that you can't pass a list of items into a compiled query, and we know how badly we want to compile everything! http://peterkellner.net/2009/05/06/linq-to-sql-sl...
DoYouKnow.IN - August 20, 2009 - Hi All

I want to search Element from List of string in any part of string from DB field... any one has idea.

string[] productList = new string[] { "aaa","bbb" , "ccc", "dddd" };

var myProducts = from p in db.Products
where p.ProductName ****(Any part of product name having any word from productlist)****
select p;

Any Idea???
amarax - September 7, 2009 - Unfortunately this syntax in LINQ to Entities doesn't work
var q1 = from n in context.Phone
where phones.Keys.Contains(n.Number) &&
n.Person.PersonId != PersonId
select n;
Execution error:
LINQ to Entities does not recognize the method 'Boolean Contains[String](System.Collections.Generic.IEnumerable`1[System.String], System.String)' method, and this method cannot be translated into a store expression.
Rob Conery - September 7, 2009 - Yep - won't work on strings.
arro - October 18, 2009 - This does not work for me, I'm pretty new to LINQ and actually starting to have doubts.

var entusrs = from eu in db.EntityUsers
where eu.User == SessionHandler.Handler.User.ID
select eu.Entity;

var ents = from e in db.Entities
where (entusrs.Contains(e.ID))
select e;

Also tried this...

var entusrs = from eu in db.EntityUsers
where eu.User == SessionHandler.Handler.User.ID
select new { ID = ue.Entity.Value };

var ents = from e in db.Entities
where (entusrs.Contains(e.ID))
select e;

both Entity and ID are of type int

Any ideas ?
arro - October 20, 2009 - Hi, I had to do a ToList() for it to work

var myProducts = from p in db.Products
where itemQuery.ToList().Contains(p.ProductID)
select p;
Gecko