Hanalei, Hawaii 9/2/2010
438 Posts and Counting

LINQ: Understanding LINQ, Vars, and Lambdas

Friday, August 24, 2007 - I've been working lately with LINQ and LinqToSql in an effort to see how SubSonic can "play nice" with the emerging framework. In my previous posts I discussed the DataContext in LinqToSql and what it means from an architectural point of view. I also discussed ideas for abstracting the framework a bit. In this post I want to go a little deeper with LINQ as a language, and see what's under the hood of the technology, and get to know the players a bit. Dramatis Personae There are 3 major elements to understanding a LINQ query, and it pays to take a second to get to the know them VERY well, since understanding them will allow you to work with the language much easier (because LINQ is very much a language in and of itself). These guys are:

  1. The new var keyword
  2. System.Data.Linq.Table<> and
  3. The Lambda Expression
Consider this LINQ query as an example:
            Northwind.NorthwindContext db = new NorthwindContext(myConnectionString);
            var result =
                  from products in db.Products
                  select products;
I already dived into the DataContext - this is the thing that works up a connection to your DB, and then keeps track of all the changes you make during your context "session". Let's take a look at the other stuff though... Fashionably Late Typing With VAR This is a new construct for C# and you can think of this a "Late-typed". A lot of people (self included) thought of this as "variant" - sort of a "bucket type" - but that's not accurate. All that var does is let you defer typing until you know what it is you're working with. It's not the same as the Object - it's meant to be worked with as a typed value. Consider this:
    var myDateVar=DateTime.Now;

    object myDateOb=DateTime.Now;
These are both valid assignments, but if you try to work with myDateOb as an object...
    myDate=myDateOb.AddDays(4);
... you'll get a compiler error because it's an object, not a DateTime object. If you do the same with myDateVar:
    myDate=myDateVar.AddDays(4);
Everything works out fine. Why do this? Because of LINQ! There's a gap when working with a query language like LINQ - what do I do with my results? Using something like a Reader is nice - but that doesn't allow you to use LINQ on non-DB queries (like querying over an array of strings let's say). DataTables are nice too but they don't have any kind of type-referencing either. By using a var, you can return a result set and work directly with it as a newly-typed object. So if we rewrite the query above, we can iterate over the result set using the same var keyword: var_query There are all kinds of great applications of late-typing that allow you to work in a "duck-typing" way with a degree of freedom from compiler pain. Duck-typing (working with compiler-inferred types) is more a style of programming than it is a technology. I won't go into that here, but there should be some recognition that introducing this "loosey goosey" way of programming into a pretty strict language set (C#) can cause some headaches. This, more than anything, is a good reason to step out and learn a new language on the weekend (like Ruby!) that does this kind of thing so you can see for yourself the goods and bads from a learned perspective (which is always important). System.Data.Linq.Table<> When working with a LINQ statement, you'll be running queries against a representative "table" that derives from the type System.Data.Linq.Table<>. The type that fits itself in that Generic Bikini <> is defined by your from clause (and you can have more than one). So in our example query, this would be DataContext.Products. Since we used LinqToSql to query the database, LINQ knows that when we say "from db.Products" we're talking about a System.Data.Linq.Table<Product>. In fact
the var result set of a LINQ query is a System.Data.Linq.Table<whatever your select statement is>. This is KEY to understanding the way LINQ works.
There are some key methods in this class, and they are:
  • Single() - returns a single record according to the passed-in Lambda expression (more below)
  • Where() - returns a subset of records according to a Where()
  • Count() - returns the count of the rows in the System.Data.Linq.Table<>
  • ToList<>() - sends the results to a typed list
  • ToArray<>() - sends the results to a typed array
What's even more important with this class is that it's full of query methods that allow you to do things like
  • Aggregate()
  • Sum()
  • GroupBy()
  • Any()
  • Average()
  • Contains()
and so on. This might seem confusing - why in the world would you want to run a query then Aggregate (or GroupBy etc) on the result set? The answer is that System.Data.Linq.Table<> not only holds the result, it also constructs a LINQ query! So you can use these methods easily with LINQ:
        Northwind.NorthwindContext db = new NorthwindContext();
        decimal? result =db.Products.Sum(p => p.UnitPrice);
Notice here that I didn't need to use a var (though I could have if I wanted). Since db.Products is a System.Data.Linq.Table<Product> - I can just query it and it goes and gets the data. Many of these methods return IEnumerable or IQueryable objects (which you can think of as System.Data.Linq.Table<> constructs), which you can append onto your query as needed. So if you wanted to use Where() instead of the build in where clause, you could:
        Northwind.NorthwindContext db = new NorthwindContext();
        var result =
              from products in db.Products.Where(p => p.CategoryID==5)
              select products;
I'll go more into these aggregate and advanced queries down below. First we need to figure out just what this Lambda thing is we're working with. Method Shorthand With Lambda Let's jump right to it - Lambda expressions are methods in and of themselves, and the whole idea to them is that they don't need to be typed - they infer their types. A lot of people call them Anonymous Methods which is partially true - they're actually anonymous method degate shorthand (huh?).
The syntax for a Lambda expression is: parameter(s) => function.
No types are needed - they are inferred from the parameters (Left Side) of the Lambda by their use in the Right Side To fully get this, let's write a simple method to return a square of a number, and then translate that to a Lambda:
        decimal GetSquare(decimal number){
            decimal result=number*number;
            return result;
        }
That's not that much code is it? It could be written nicely in Lambda form however like this: x => x * x; In this Lambda we're saying "take this parameter, x, and shove it through this function x * x". This statement is key to understanding why Lambdas are a big deal to LINQ:
Lambdas can be parsed into a Left/Right expression (Binary) for use in LINQ Expression Trees
Understanding this concept, you can see just how powerful the LINQ query language can be by embedding methods and expressions withing methods and expressions. Another note on Lambdas - you might see the term "Func" all over the intellisence prompter when running up queries - this is an "Anonymous Delegate" declaration - basically a definition holder for a Lambda expression. So to fully write out the Lambda expression above in 3.5 code, we'd use:
    Func<decimal, decimal> = x => x*x;
Then to use it, you'd write:
    decimal root=f1(4);
Now we're getting to the meat of it all. When you're working with a LINQ query and you specify a "Where" (for example), you're asked to pass in a "Func<Type,bool>" - this is an Anonymous Delegate that gets defined at runtime by you. In other words, you get to specify to a pretty detailed level what you want to see. So consider this:
    Northwind.NorthwindContext db = new NorthwindContext();
    List<Northwind.Data.Product> result=db.Products    .Where(p => p.ProductID >20 && p.CategoryID ==5 && p.Deleted ==false).ToList<Product>();
In this expression we're passing in the parameter Product, and then a boolean expression - which is precisely the "Func" signature you see in intellisense: "Func<Product, boolean>". You can see how this type of "anonymous expression generation" can come in very handy! This expression can be shorthanded by using the where clause:
    var result =
    from products in db.Products
    where products.ProductID > 20 && products.CategoryID == 5 && products.Deleted == false
    select products;
In fact you can now see that LINQ is essentially two things: a massive expression parser and a set of Extension Methods. The where clause above is simply an extension method for creating the same Lambda statement above it. Bugs In The Butter In working up all the examples for this post I decided I was going to do some cool rollups and joins. Turns out that it *seems* to be broken. I've communicated what I found to the MS team and they're looking into it. To illustrate, let's take a look at a simple join statement on a rollup query:
    var result =
          from products in db.Products
          join orderItems in db.Order_Details on products.ProductID equals orderItems.ProductID into joined
          select new {products.ProductName, OrderTotal=joined.Sum(t => t.UnitPrice*t.Quantity)};
In this query, I'm joining the Order Details table (shows renamed properly with an underscore :p ) into a simple table called "joined". Simple enough so far. Next I'm returning a result set with the name of the product and a total for all orders on it. Notice how Sum() takes a Lambda? Remembering from discussion above you can dissect this Lambda by now (I hope) and see that it can be transalated to "Func(Type, decimal)". Our left side (the parameter) is the type Product, and the right side is our decimal: UnitPrice * Quantity. The important thing here is the right side: it's type "decimal", not nullable type "decimal?". That should be OK right? Well it's not unfortunately. Running up the debugger you can see the SQL that's created for this query:
SELECT [t0].[ProductName], (
    SELECT SUM([t1].[UnitPrice] * (CONVERT(Decimal(29,4),[t1].[Quantity])))
    FROM [dbo].[Order Details] AS [t1]
    WHERE [t0].[ProductID] = [t1].[ProductID]
    ) AS [value]
FROM [dbo].[Products] AS [t0]
A nested SELECT? Dude where's my JOIN? Before I get my crank on - note the intelligent CONVERT. Since Quantity is an int, the multiplication with UnitPrice in SQL Server would have resulted in an int result - not what we want at all. LINQ is smart enough to convert this for us so we don't need to stress on it :). But it's this conversion that's also causing us a problem - but in the code this time. Since LINQ is using a nested SELECT here without a JOIN, it's effectively doing a LEFT OUTER - returning ALL products and then rolling up stats. Shouldn't "join" be an INNER by default since that's what it is in SQL Server? What if I have some products that don't have orders? The answer: Runtime Crash.

To see this more clearly, add a record to your Products table and then run the above LINQ query - the error you get is

"The null value cannot be assigned to a member with type System.Decimal which is a non-nullable value type."

Hmmm - two things are wrong with this. Why isn't it a nullable type? There could be situations where I'm Summing and I have null data (like right now)!

Also - why is LINQ doing a nested SELECT? It might not be a bug - but it's something to be aware of.

Sum()ming It All Up
Hopefully understanding the major players with LINQ will help you to see how you can use LINQ more effectively, and also when NOT to use LINQ. In the tests I've done,  I've found that writing up more advanced summing/statistical queries is much easier in SQL though I'm sure that there might be some exceptions.

Also, the error I received reminds me that LINQ, above all, is a translation tool and at some point my mind isn't going to work like those in Redmond (for better or for worse, til death do us part). Sometimes it's better to just use SQL (view or SP), which is something I say a lot with SubSonic :).

Would love to hear your thoughts, and if you're an MS/LINQ person, I'd love to hear your take on the exception above.

Related


Gravatar
Michael Giagnocavo - Friday, August 31, 2007 - LINQ is *so* much more than just a SQL helping tool. In fact, you could remove SQL from the picture entirely and still be left with a very useful set of tools. LINQ is just the limited inclusion of functional concepts in C#. C#'s starting to mature. Other languages wouldn't have to add features to represent the LINQ feature set. C# had to go add them (in a very limited manner): - More type inference - Expressions - Function composition operators (Extension methods are just a terribly limited form) - A sort of tuples (anon types) Once people get those concepts, LINQ to SQL naturally flows out. Additionally, you can apply these concepts to all sorts of C# programming, not just data access. (See my site; I have several examples).
Gravatar
Zack Owens - Saturday, August 25, 2007 - step out and learn a new language on the weekend (like Ruby!) Like VB :)
Gravatar
Rob Conery - Saturday, August 25, 2007 - Homey I'm tryin to FORGET VB and all the years I sweated over it's ridiculous syntax! :)
Gravatar
6 Links Today (2007-08-25) - Saturday, August 25, 2007 - [...] LINQ: Understanding LINQ, Vars, and Lambdas [...]
Gravatar
Rick Strahl - Saturday, August 25, 2007 - I guess the issue of SQL Parsing and possible inability to express some SQL as LINQ is just the thing that scares me about LINQ to SQL: It's a black box and it has a tremendously complex task of creating SQL from essentially structured language statement. This is a very complex task vastly more complex than parsing straight SQL statements as a database engine has to especially because it's not a 1 - 1 map of functionality. I think it's actually pretty cool how well it does work from what I've seen - it appears pretty damn smart, but I suspect there are going to be some scenarios where the parser missteps or - probably more common - SQL queries that can simply not express using LINQ syntax (or are hideously complex to express as LINQ queries - there seem to be lots of examples of this out there even in the LINQ to SQL documentation). The question is what happens in that scenario? Your choices are (thinking out loud): Create a view or stored procedure that feeds the data and add to the model (ok if you can do that) or bypass LINQ to SQL and run the query with straight ADO.NET and basically break the model. Neither scenario sounds great especially the latter where you end up with inconsistent data access mechanisms. It feels like oh so close, but yet oh so far...
Gravatar
Bill M. - Sunday, August 26, 2007 - step out and learn a new language on the weekend (like Ruby!) I would be interested in reading anything you had to say about F#, assuming you had the chance to look at it. Ugly syntax -- I think -- but interesting functionality (there's a pun there somewhere). Of course not sure people are going to use it.
Gravatar
John S. - Sunday, August 26, 2007 - Whoa! All my comments just went through from earlier tonight...sorry for the dupes.
Gravatar
Scott Williams - Monday, August 27, 2007 - This is all very interesting, but, six months or a year from now where does this leave SubSonic? Will they be compatible partners or jealous rivals? I have made a commitment to SS and I hope it doesn't just get discarded like yesterday's leftovers when something new comes along!
Gravatar
ScottGu - Monday, August 27, 2007 - To address Rick's question above - you can actually drop down and use raw SQL expressions with LINQ to SQL. I just blogged about how to-do this here: http://weblogs.asp.net/scottgu/archive/2007/08/27/linq-to-sql-part-8-executing-custom-sql-expressions.aspx Hope this helps, Scott
Gravatar
Rob Conery - Monday, August 27, 2007 - @Scott Williams- The first sentence says it all, doesn't it? SubSonic's not going anywhere - and my main goal is to figure out how to make it work WITH the new technologies in .NET 3.x. We're not in a competitive space here, nor do I want to be. I don't mind offering alternatives for those who want something simpler, but I also want to make sure we embrace what's there.
Gravatar
Rob Conery - Monday, September 03, 2007 - @Michael: C#'s starting to mature? C# is a superset of C - not sure what you mean here. In terms of "Type Inference" - there's not such thing in C# - it's a typed language and always will be. Expressions are LINQ and tuples are just sugar for on the fly class declaration. In terms of "all sorts" - examples?
Gravatar
Michael Giagnocavo - Tuesday, September 04, 2007 - Oh, as a quick example of how C# 3 goes beyond LINQ, check out lambda function blocks (something VB doesn't have). Demonstrate how you can use a lambda function block with a LINQ to SQL query.
Gravatar
Rob Conery » My Personal Lambda Crusade - Monday, March 17, 2008 - [...] I wrote a bunch about them here [...]
Gravatar
Mischa Kroon - Tuesday, March 18, 2008 - I'm seeing a broken image link ?
url: http://spook.wekeroad.com/files/2007/08/var-query.gif