Recently I have been asked by a client to use LINQ for some interesting stuff (which I'll blog about later), and one of the things I needed to do was to use it to query a REST service. Step 1 was to query Amazon (since there is an existing sample out there on how to do this), and I just wrapped up that part of it and thought I would share with you how I dove nose-first into refactoring the existing sample to work with .NET 3.5, and also some of the neat stuff I learned along the way. Many thanks to Fabrice Marguerie, whose code and work this is based on. Check out his forthcoming book, LINQ in Action.
This is a really long post, but I hope it's worth it. LINQ is a very cool technology, and I'll get this out of the way now: I have no plans with SubSonic and LINQ :).
What's The Deal With LINQ?
Simply put, LINQ (Language INtegrated Query) is a set of language extensions to C# and VB.NET that allow you to build a set of expressions aimed at "querying" a set of data. This can be data from SQL Server, Flickr, Amazon, Are You Hot Or Not?, or Atwood's stack of Blog Comments. In short, it can be anything. You define the parameters, and then you handle the actual build of the expression during execution - so it's all within your control.
Once you grasp how the thing works, you can leverage it in LOTS of nice ways. It's very extensible since the LINQ call itself is unhooked completely from the source, so all you need to update is your Query Class (which we'll get to in a minute) and you can unhinge your existing data source and move it completely without touching any lines of client code. Ayende would be stoked.
Seeing The Forest Through The Expression Trees
The core concept behind LINQ is the ExpressionTree. Everything about LINQ, in one way or another, is an Expression: Binary, Unary, MethodCall, Lambda, AndAlso... they're all expressions and they can be parsed and negotiated and traversed. It's not terribly simple to do this (as of yet) and requires some recursive programming savvy, but for fairly simple matters you can walk the trees pretty well.
So the best way to imagine what an expression looks like is to take a look at this example (from Fabrice's post):
Expression.Lambda<Func<Book, Boolean>>(
Expression.AndAlso(
Expression.AndAlso(
Expression.AndAlso(
Nests within nests within Generic calls... well it's a bit daunting. So to try and clear it up, I'll show you what I did to refactor the original LinqToAmazon code (I had to do this to get it to work with the March 2007 CTP of Orcas), which I'll write more about in a forthcoming post.
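For a smaller view of the same nesting, here's a minimal sketch that lets the compiler build a tree from a lambda and then prints the node types, one per line, indented by depth. The Book class and the Describe helper are made up for this illustration (they aren't part of the LinqToAmazon code), and the syntax assumes a newer C# compiler than the Orcas CTP.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the item being queried.
public class Book
{
    public string Title { get; set; }
    public decimal Price { get; set; }
}

public static class TreeDemo
{
    // Renders one line per node, indented two spaces per level.
    // Only lambdas and binary nodes are descended into; everything
    // else (member access, constants) is treated as a leaf.
    public static string Describe(Expression node, int depth = 0)
    {
        string line = new string(' ', depth * 2) + node.NodeType + Environment.NewLine;

        if (node is LambdaExpression lambda)
            return line + Describe(lambda.Body, depth + 1);

        if (node is BinaryExpression binary)
            return line
                + Describe(binary.Left, depth + 1)
                + Describe(binary.Right, depth + 1);

        return line;
    }

    public static void Main()
    {
        // Assigning a lambda to Expression<...> makes the compiler emit
        // a tree of Expression objects instead of executable code.
        Expression<Func<Book, bool>> filter =
            book => book.Title == "Hyperion" && book.Price < 15;

        Console.Write(Describe(filter));
    }
}
```

The root comes out as a Lambda, its body is an AndAlso, and each side of the AndAlso is a comparison with a member on the left - which is the same shape as the nested Expression.AndAlso calls above, just drawn as an outline.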
First: What Are We Doing?
We're going to define an object model, use LINQ to query that object model, and have the object model load itself from an Amazon Web Service call according to the LINQ query parameters. We're NOT going to load up a ton of XML from Amazon and then query over it - that would be silly. In this case we're actually going to build a REST call to Amazon using LINQ and our AmazonItem object, and then deserialize the returned XML into a List<> for consumption.
The Cast
Here are the classes and what we'll be using:
- AmazonItem: The class that defines the object we're querying. In our case it's a book (see below)
- AmazonServiceCall: A queryable class that interfaces directly with LINQ (and our client app), that helps to build out our Query. Think of this as the BLL or Facade.
- AmazonQuery: The class that handles the parsing of the LINQ expression and handles the call to the data source. This is essentially your DAL.
I've followed the way Fabrice started this project, and there might be room for improvement here but I found this to be a good structure for what I needed to do.
Step 1) Creating the object model.
This is a simple class for representing an item from Amazon. In keeping with the original demo, I'm going to keep it to books - but part of the refactoring is to allow querying of all types of products at Amazon. Here's our object (using the special sugary notation for property declarations that's new with C# 3.0):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;

namespace Amazon.ServiceQuery
{
    public class AmazonItem
    {
        public string Title { get; set; }
        public string ASIN { get; set; }
        public string ISBN { get; set; }
        public string Publisher { get; set; }
        public List<string> EditorialReviews { get; set; }
        public string Author { get; set; }
        public List<string> Images { get; set; }
        public int SalesRank { get; set; }
        public string DetailUrl { get; set; }
        public decimal Price { get; set; }
        public DateTime DatePublished { get; set; }
        public BookCondition Condition { get; set; }
    }
}
I should note here that Amazon returns a TON of information with its service calls - this is just a small subset of the data but it's really all we need for this example.
Step 2) Define the Query Builder (AmazonServiceCall): Hello IQueryable<>
This is the service class that's going to build out our actual Query class (AmazonQuery, that in turn is going to talk to Amazon for us). This class is the one that LINQ talks to when we want our data.
It's at this point you need to get acquainted with IQueryable<>. This interface essentially tells LINQ (and .NET) that it's a class that you can... well... query against using LINQ. It will ask you to define some members and many of these you can actually choose NOT to implement. I'm going to focus on the two main methods you'll need to define for now (see the source for the full explanation): CreateQuery<>() and Execute<>().
CreateQuery() is called when you actually write your LINQ Query:
// Here is a simpler query that returns something like ten books
var query =
    from book in new AmazonServiceCall()
    where book.Title == "Hyperion"
       && book.Condition == BookCondition.All
       && book.Price < 15
    select book;
In this syntax, notice the "var" keyword, which looks like an old friend from VBScript days. You might be tempted to think of it as a general bucket declaration like "variant" - but that's not what it is, and it isn't duck typing or late binding either. Var declares an implicitly typed local variable: you, as the programmer, are telling the compiler that you don't want to spell out the type - it needs to work it out. The compiler takes a look at the expression on the right-hand side of the assignment, infers its type at compile time, and from then on the variable is as strongly typed as if you had written the type out yourself.
Anyway - using this allows us to work with the query without having to write out its return type, which is handy for this operation since we don't really care about what type the Query is - we just want its result set. More on that later.
When this section of code runs, the where clause (which the compiler has already turned into one big expression made up of tons of little ones) gets handed to CreateQuery(). Here's the code for CreateQuery<>():
public IQueryable<S> CreateQuery<S>(System.Linq.Expressions.Expression expression)
{
    if (typeof(S) != typeof(AmazonItem))
        throw new Exception("Only " + typeof(AmazonItem).FullName + " objects are supported.");

    return (IQueryable<S>)new AmazonQuery(expression);
}
If you haven't noticed yet, there's a lot of generics at work here. In this case it's a great use of them, since in the LINQ call above, LINQ didn't know what it was dealing with in terms of a Type. The CreateQuery method does, however, and with a quick validation of the type to make sure we're using AmazonItem, we pass the call off to the meat of the application here: AmazonQuery() and also pass in the LINQ expression to the constructor (more on this class below).
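To watch that hand-off happen outside of the Amazon code, here's a rough sketch of a do-nothing queryable that just records whatever expression it is given. One big assumption to flag: this uses the shape of the released .NET 3.5 interfaces, where CreateQuery<>() and Execute<>() sit on a separate IQueryProvider rather than on IQueryable<> itself as in the CTP used in this post. Book, BookQuery and RecordingProvider are all names invented for the sketch.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Book
{
    public string Title { get; set; }
}

// Hypothetical provider: remembers the last expression LINQ handed to it.
public class RecordingProvider : IQueryProvider
{
    public Expression LastExpression { get; private set; }

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        // Same type check as in the post, against our stand-in class.
        if (typeof(TElement) != typeof(Book))
            throw new NotSupportedException("Only Book queries are supported.");

        LastExpression = expression;
        return (IQueryable<TElement>)new BookQuery(this, expression);
    }

    public IQueryable CreateQuery(Expression expression)
    {
        return CreateQuery<Book>(expression);
    }

    // Execution is out of scope for this sketch.
    public TResult Execute<TResult>(Expression expression)
    {
        throw new NotSupportedException("Execute is not part of this sketch.");
    }

    public object Execute(Expression expression)
    {
        throw new NotSupportedException("Execute is not part of this sketch.");
    }
}

// Hypothetical queryable: holds an expression and returns no results.
public class BookQuery : IQueryable<Book>
{
    public BookQuery(RecordingProvider provider, Expression expression = null)
    {
        Provider = provider;
        // With no expression supplied, the query is just "this source".
        Expression = expression ?? System.Linq.Expressions.Expression.Constant(this);
    }

    public Type ElementType { get { return typeof(Book); } }
    public Expression Expression { get; private set; }
    public IQueryProvider Provider { get; private set; }

    public IEnumerator<Book> GetEnumerator()
    {
        return Enumerable.Empty<Book>().GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class ProviderDemo
{
    public static void Main()
    {
        var provider = new RecordingProvider();
        var query = from book in new BookQuery(provider)
                    where book.Title == "Hyperion"
                    select book;

        // Writing the query was enough to trigger CreateQuery<Book>().
        Console.WriteLine(provider.LastExpression.NodeType);
    }
}
```

Nothing is enumerated here; building the query is all it takes for the provider to receive the expression, which is the point being made above.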
You're Losing Me...
Think of it this way: it's Friday and you're about to go out with your friends to a local bar. Your buddies are good friends that you go out with a lot, and they know what you like to drink on what occasions. So you get to the bar and you tell your buddy Linq to get you something to drink that's cold and tasty. He says "sure - be right back".
The thing is, Linq knows what you want to drink because he's known you for a long time, and you've basically told him before you like a Makers Rocks for the first round (with a water back) and Mirror Pond Pales thereafter.
So back he comes, drinks in hand, and you're off to another hangover.
The thing about it is that it doesn't matter what bar you go to - your buddy Linq will always know what you want to drink, at what point. You're a creature of habit - like a compiler - predictable and sometimes fun to hang with.
In this same way, we've defined what types we are going to use in the AmazonServiceCall() class in the very declaration:
public class AmazonServiceCall: IQueryable<AmazonItem> {
There it is! This class is IQueryable for AmazonItems! Now when our buddy Linq has to go fetch us some books, he'll know (when we ask for them) that we want books!
Get On With It
The other thing that CreateQuery() does is to pass in the expression you've defined - in this case it's
where book.Title == "Hyperion" && book.Condition ==
BookCondition.All && book.Price < 15
This where expression is parsed into a LambdaExpression. If you're not up on Lambda Expressions - they are, in summary form, shorthand for defining functions. There's a great write up here on the history of them and why they're used in functional programming. I suggest you read it if you're not up on Lambdas. Here's what that expression comes out to when parsed:
book => (((book.Title = "Hyperion") && (Convert(book.Condition) = 0)) && (book.Price < 15))
This Lambda is, basically, a little function: hand it a "book" and it tells you whether that book matches - the title being "Hyperion" (my favorite book of all time), the condition being "all" (defined as 0 in the enum), and the price being less than 15.
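As a quick sanity check on reading the Lambda that way, here's a small sketch that compiles a similar expression into a delegate and runs it over a couple of in-memory objects. This is only to show what the Lambda means - the Amazon code never compiles the expression, it picks it apart instead. The Book class is a stand-in and I've left the Condition test out to keep it short.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for AmazonItem.
public class Book
{
    public string Title { get; set; }
    public decimal Price { get; set; }
}

public static class PredicateDemo
{
    // The same kind of filter as in the query, held as an expression tree.
    public static readonly Expression<Func<Book, bool>> Filter =
        book => book.Title == "Hyperion" && book.Price < 15;

    public static void Main()
    {
        // Compile() turns the tree into an ordinary delegate.
        Func<Book, bool> matches = Filter.Compile();

        var cheap = new Book { Title = "Hyperion", Price = 8m };
        var pricey = new Book { Title = "Hyperion", Price = 40m };

        Console.WriteLine(matches(cheap));   // True
        Console.WriteLine(matches(pricey));  // False
    }
}
```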
Parsing The Lambdas
Now we have our Query object defined, and we've used our AmazonItem class to help us write out the query, which is nice and clean. Now we need to execute it and get something back. That's done easily using this line of code:
AmazonItem[] books = query.ToArray();
First, remember that when we called CreateQuery() above the return was the type AmazonQuery(). This is important as now the execution is handed to this Query class, which again is of type IQueryable<AmazonItem>.
When we call "ToArray()" on our query object, we're asking the class to enumerate itself. When this class was created, we passed the expression in as part of the constructor and stored it in the _Expression property so we could parse it later. This is really important, since this expression holds all the where information for us.
Now, all we need to do is to fill in the body of GetEnumerator for the AmazonQuery class, test the expressions, and we can form up our query.
Back Into The Trees
This part's not going to be easy - we're going to have to work our way through the expression tree set contained in the _Expression variable, and right now it's not very straightforward in Beta 1. I'll do my best to explain this stuff and hopefully something will stick in your mind...
When we call "ToArray()" on AmazonQuery, the internal method "GetEnumerator()" is kicked off. This is your cue to jump in and make something happen. In our case we want to get at that Lambda Expression and parse it so we know what to tell Amazon. The first thing we need to do is walk the ExpressionTree path.
The first Expression in the tree is always (well, now at least) a MethodCallExpression. That's because a where clause is really just a call to the Where() method, so the root of the tree is that method call. There are 2 arguments to this MethodCallExpression:
- Argument[0] - this is the source of the query (the thing Where was called on), and we don't need to access this
- Argument[1] - this is where our Where is defined, and we want to access this because it has our Lambdas
Remember - EVERYTHING IS AN EXPRESSION so as you hop from one to the other (in every direction), you have to cast to the appropriate expression in order to get anything out. In our case, we need to cast Argument[1] to a UnaryExpression (single-sided), since the Lambda is handed to Where wrapped up in a "quote" node. We then need to access the Operand (the thing the UnaryExpression wraps) and cast that to type LambdaExpression. Here's the code:
/// <summary>
/// The main execution method
/// </summary>
/// <returns></returns>
IEnumerator IEnumerable.GetEnumerator()
{
    MethodCallExpression methodCall;
    methodCall = _Expression as MethodCallExpression;
    UnaryExpression xp = (UnaryExpression)methodCall.Arguments[1];
    LambdaExpression lamby = (LambdaExpression)xp.Operand;
    ProcessExpression(lamby);
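If you want to poke at that same unwrapping without any Amazon code in the way, here's a minimal sketch against a plain in-memory list wrapped with AsQueryable(). It assumes the released System.Linq API; Book and ExtractPredicate are names I made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Book
{
    public string Title { get; set; }
}

public static class QuoteDemo
{
    // Hypothetical helper: digs the Where lambda out of a query expression,
    // following the same three casts as GetEnumerator().
    public static LambdaExpression ExtractPredicate(Expression queryExpression)
    {
        var call = (MethodCallExpression)queryExpression;   // the call to Where(...)
        var quote = (UnaryExpression)call.Arguments[1];     // the wrapper around the lambda
        return (LambdaExpression)quote.Operand;             // the lambda itself
    }

    public static void Main()
    {
        IQueryable<Book> query = new List<Book>()
            .AsQueryable()
            .Where(b => b.Title == "Hyperion");

        var call = (MethodCallExpression)query.Expression;
        Console.WriteLine(call.Method.Name);            // Where
        Console.WriteLine(call.Arguments[1].NodeType);  // Quote

        LambdaExpression predicate = ExtractPredicate(query.Expression);
        Console.WriteLine(predicate.Parameters[0].Name); // b
    }
}
```

The first argument (call.Arguments[0]) is the list we started from, which is why the walk skips straight to the second one.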
Now that we have our Where Lambdas, we can parse them out and structure up our query. I'll speed it up at this point - I know this post is getting long. Next, we'll dive into the line at the end there that says "ProcessExpression" and see what it does...
Recursive Expression Parsing
This is where knowledge of recursive programming will help out. A Lambda can be nested to umpteen levels (which is why they're so nice for LINQ). In this case I'm going to use one of Fabrice's original methods (which I modified slightly) to recursively parse the Lambda set: ProcessExpression():
/// <summary>
/// Process the passed-in LINQ expression
/// </summary>
/// <param name="expression"></param>
private void ProcessExpression(Expression expression)
{
    if (expression.NodeType == ExpressionType.AndAlso)
    {
        ProcessAndAlso((BinaryExpression)expression);
    }
    else if (expression.NodeType == ExpressionType.Equal)
    {
        ProcessEQ((BinaryExpression)expression);
    }
    else if (expression.NodeType == ExpressionType.LessThan)
    {
        ProcessLessThan((BinaryExpression)expression);
    }
    else if (expression.NodeType == ExpressionType.LessThanOrEqual)
    {
        //handled the same way as LessThan
        ProcessLessThan((BinaryExpression)expression);
    }
    else if (expression is MethodCallExpression)
    {
        ProcessMethodCall((MethodCallExpression)expression);
    }
    else if (expression is LambdaExpression)
    {
        ProcessExpression(((LambdaExpression)expression).Body);
    }
}
In this simple conditional tree, we're taking a look at what type of expression we're dealing with (or its "NodeType" - in other words what it does) and then handing it off to a handler method.
Note that in the last call, we're recursively calling ProcessExpression if we're dealing with a Lambda. This will walk the Lambda Expression until we get down to a simple Binary Expression (two-sided value set). When we have a Binary expression, we then pass it off according to its "NodeType" - or the operational value in the middle ("=", "<", ">" etc).
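Here's a stripped-down sketch of the same recursive idea that you can run on its own. Instead of setting fields on a query class it drops each comparison into a dictionary, keyed by the property name and the operator. CriteriaWalker and Book are hypothetical names for this example only, and the AndAlso branch (recurse into both sides) is my guess at what ProcessAndAlso does, since it isn't shown in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public class Book
{
    public string Title { get; set; }
    public decimal Price { get; set; }
}

public static class CriteriaWalker
{
    // Recursively walks the tree and records "Member Operator" -> value
    // for each comparison of the form <property> <op> <constant>.
    public static void Walk(Expression node, IDictionary<string, object> found)
    {
        switch (node.NodeType)
        {
            case ExpressionType.Lambda:
                // Unwrap the lambda and keep going with its body.
                Walk(((LambdaExpression)node).Body, found);
                break;

            case ExpressionType.AndAlso:
                // Assumed behavior: an "&&" just means "walk both sides".
                var and = (BinaryExpression)node;
                Walk(and.Left, found);
                Walk(and.Right, found);
                break;

            case ExpressionType.Equal:
            case ExpressionType.LessThan:
                var comparison = (BinaryExpression)node;
                if (comparison.Left is MemberExpression member &&
                    comparison.Right is ConstantExpression constant)
                {
                    found[member.Member.Name + " " + node.NodeType] = constant.Value;
                }
                break;

            default:
                throw new NotSupportedException(
                    "This sketch does not handle " + node.NodeType + " nodes.");
        }
    }

    public static void Main()
    {
        Expression<Func<Book, bool>> filter =
            book => book.Title == "Hyperion" && book.Price < 15;

        var found = new Dictionary<string, object>();
        Walk(filter, found);

        foreach (var pair in found)
            Console.WriteLine(pair.Key + " -> " + pair.Value);
    }
}
```

Throwing on anything unrecognized is a design choice for the sketch: it makes unsupported query shapes fail loudly rather than being silently ignored.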
Enough Theory and Background - The Useful Stuff
It's here that you'll need to implement your logic - in the "ProcessXXXX" methods. Wheres generate a bunch of Lambdas, which are themselves a bunch of Binaries, which themselves are expressions made up of MemberExpressions (the property of the AmazonItem used in the expression) and ConstantExpressions (a literal, like "Hyperion").
For the Amazon demo, we're using a subset of the total power of the LINQ query engine - Equals and LessThan. In other words we're allowing expressions like "Title=Hyperion" and "Price<100". If we allowed other types of expressions, we'd need to handle them explicitly in ProcessExpression.
For Equality Expressions (NodeType==Equal), we hand off the expression to ProcessEQ() which accepts a BinaryExpression as an argument:
/// <summary>
/// Handle the "=" operator and set the properties of this query accordingly
/// </summary>
/// <param name="expression"></param>
private void ProcessEQ(BinaryExpression expression)
{
    //make sure the left side is a Member, and the right is a constant value
    //if this isn't the case, it's Unary and is an enum setting
    if (expression.Left is MemberExpression && expression.Right is ConstantExpression)
    {
        //the member - "Title", "Publisher", etc
        MemberExpression memb = expression.Left as MemberExpression;

        //the setting
        ConstantExpression val = expression.Right as ConstantExpression;

        //see what we need to set here
        switch (memb.Member.Name.ToLower())
        {
            case "title":
                _Title = val.Value.ToString();
                break;
            case "publisher":
                _Publisher = val.Value.ToString();
                break;
            case "price":
                decimal dPrice = 0;
                decimal.TryParse(val.Value.ToString(), out dPrice);
                _MaximumPrice = dPrice;
                break;
        }
    }
    else if ((expression.Left is UnaryExpression) &&
             (((UnaryExpression)expression.Left).Operand.Type == typeof(BookCondition)))
    {
        //this is an enum setting
        if (expression.Right.NodeType == ExpressionType.Constant)
            _Condition = (BookCondition)((ConstantExpression)expression.Right).Value;
    }
}
In this bit of code, we examine the values of the expressions and set some class-scope variables (class fields) we're going to use when we execute the call to Amazon later.
Here's the code for Less Than:
/// <summary>
/// Process the LessThan operator - this is for Price
/// </summary>
/// <param name="expression"></param>
void ProcessLessThan(BinaryExpression expression)
{
    MemberExpression memb = expression.Left as MemberExpression;
    ConstantExpression val = expression.Right as ConstantExpression;

    //bail out if this isn't a simple "member < constant" comparison
    if (memb == null || val == null)
        return;

    if (memb.Member.Name.ToLower() == "price")
    {
        decimal dPrice = 0;
        decimal.TryParse(val.Value.ToString(), out dPrice);
        _MaximumPrice = dPrice;
    }
}
With these methods, it's easy to see how you can shape them to meet your own needs with your own data set.
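The enum branch in ProcessEQ is the least obvious of these, so here's a small standalone sketch of just that case. The compiler turns a comparison against an enum into a Convert node on the left and a plain number on the right, which is why the left side is a UnaryExpression. ReadCondition and Book are hypothetical, and the New and Used members of the enum are invented for the example - the post only tells us that All is 0.

```csharp
using System;
using System.Linq.Expressions;

// Only "All = 0" comes from the post; the other members are assumptions.
public enum BookCondition { All = 0, New = 1, Used = 2 }

public class Book
{
    public BookCondition Condition { get; set; }
}

public static class EnumEqualityDemo
{
    // Returns the enum value being compared against, or null when the
    // expression is not of the form Convert(<BookCondition member>) == <constant>.
    public static BookCondition? ReadCondition(BinaryExpression equality)
    {
        if (equality.Left is UnaryExpression convert &&
            convert.NodeType == ExpressionType.Convert &&
            convert.Operand.Type == typeof(BookCondition) &&
            equality.Right is ConstantExpression constant)
        {
            // The constant arrives as the underlying number, so map it back.
            return (BookCondition)Enum.ToObject(typeof(BookCondition), constant.Value);
        }

        return null;
    }

    public static void Main()
    {
        Expression<Func<Book, bool>> filter =
            book => book.Condition == BookCondition.Used;

        Console.WriteLine(ReadCondition((BinaryExpression)filter.Body));
    }
}
```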
Putting It All Together
Once again, let's look at the main execution point, called when you use query.ToArray():
/// <summary>
/// The main execution method
/// </summary>
/// <returns></returns>
IEnumerator IEnumerable.GetEnumerator()
{
    MethodCallExpression methodCall;
    methodCall = _Expression as MethodCallExpression;
    UnaryExpression xp = (UnaryExpression)methodCall.Arguments[1];

    //the "Wheres" in the query
    LambdaExpression lamby = (LambdaExpression)xp.Operand;
    ProcessExpression(lamby);

    //get the return set, a List<AmazonItem>
    var items = PerformWebQuery();

    //return their enumerator
    return items.GetEnumerator();
}
The first steps are to parse up the expressions using ProcessExpression() which sets some class-scope variables:
- _Title
- _Publisher
- _Condition
- _MaximumPrice
Now that these values are set, we can build the Amazon query. This is done in the PerformWebQuery() method:
/// <summary>
/// Give Amazon a call, and get the bits back.
/// </summary>
/// <returns></returns>
private IEnumerable<AmazonItem> PerformWebQuery()
{
    //your access key
    string AmazonKey = "";
    try
    {
        AmazonKey = ConfigurationSettings.AppSettings["AmazonKey"].ToString();
    }
    catch
    {
        throw new Exception("Need to define your Amazon Access key in the " +
            "App.Config as 'AmazonKey' - can't find it currently");
    }

    string AmazonServiceUrl = "http://webservices.amazon.com/onca/xml?" +
        "Service=AWSECommerceService&AWSAccessKeyId=" + AmazonKey +
        "&Operation=ItemSearch&SearchIndex=Books&ResponseGroup=Medium";
    var amazonQuery = AmazonServiceUrl;

    // Generate URL
    if (!String.IsNullOrEmpty(_Title))
        amazonQuery += "&Title=" + HttpUtility.UrlEncode(_Title);
    if (!String.IsNullOrEmpty(_Publisher))
        amazonQuery += "&Publisher=" + HttpUtility.UrlEncode(_Publisher);
    if (_Condition.HasValue)
        amazonQuery += "&Condition=" + HttpUtility.UrlEncode(_Condition.ToString());
    if (_MaximumPrice.HasValue)
        amazonQuery += "&MaximumPrice=" +
            HttpUtility.UrlEncode((_MaximumPrice * 100).Value.ToString(CultureInfo.InvariantCulture));

    //call the AmazonService
    var items = AmazonQuery.LoadFromAmazonUrl(amazonQuery);
    return items;
}
Again, notice the use of var here: the compiler works out the types for us, which keeps things nimble since all we care about in the end is the enumerator.
You'll notice that there is a method there called "LoadFromAmazonUrl" - I'll get more into that with a followup post which gets more into why I'm doing all this in the first place.
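In the meantime, if you want to see what kind of URL comes out of the build step without an access key or a web call, here's a small sketch of just the string-building part. BuildItemSearchUrl is a hypothetical helper, not a method from the project; it swaps HttpUtility.UrlEncode for Uri.EscapeDataString so it runs without System.Web (which means spaces come out as %20 rather than +), and it assumes the "times 100" in the original means the price is sent as whole cents.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class AmazonUrlSketch
{
    // Hypothetical helper mirroring the URL-building half of PerformWebQuery().
    public static string BuildItemSearchUrl(
        string accessKey, string title, string publisher, decimal? maximumPrice)
    {
        var url = new StringBuilder(
            "http://webservices.amazon.com/onca/xml?Service=AWSECommerceService");
        url.Append("&AWSAccessKeyId=").Append(Uri.EscapeDataString(accessKey));
        url.Append("&Operation=ItemSearch&SearchIndex=Books&ResponseGroup=Medium");

        // Only criteria that were actually set end up in the query string.
        if (!string.IsNullOrEmpty(title))
            url.Append("&Title=").Append(Uri.EscapeDataString(title));

        if (!string.IsNullOrEmpty(publisher))
            url.Append("&Publisher=").Append(Uri.EscapeDataString(publisher));

        if (maximumPrice.HasValue)
        {
            // Assumption: price goes over the wire as whole cents.
            long cents = (long)decimal.Truncate(maximumPrice.Value * 100);
            url.Append("&MaximumPrice=")
               .Append(cents.ToString(CultureInfo.InvariantCulture));
        }

        return url.ToString();
    }

    public static void Main()
    {
        // "MY-KEY" is a placeholder, not a real access key.
        Console.WriteLine(
            BuildItemSearchUrl("MY-KEY", "The Fall of Hyperion", null, 15m));
    }
}
```

Formatting the cents as a whole number keeps stray decimals (like "1500.00") out of the query string no matter how the price was written in the query.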
All of this code will be available at the end of the week, when the second part of this post is ready to go.
