Hanalei, Hawaii 9/2/2010
438 Posts and Counting

LINQ and Geocoding

Thursday, August 30, 2007 -

There's a lot you can do with LINQ and the new C# goodies that are coming up in .NET 3.0. In previous posts I've talked about LinqToSql, vars and Lambdas, and using LINQ to query things other than a DB. Today I want to try to do something a little more applicable and business-oriented to show just how far you can push LINQ and LinqToSQL. I'm putting on my geek hat today, cowboy size...

Geocoding Basics
Mapping applications are everywhere, and having "locational awareness" is paramount when running customer-related applications. Gathering stats about a user's IP can tell you a lot about their physical location, which in turn can tell you a lot about the market you're serving.

The core of geocoding is setting an address or location to a point on the globe using Latitude and Longitude. This is pretty easily done using some free services out there like geocoder.us and Google (you need a free Google Maps API key to use this API). Between the two, geocoder.us is much more accurate in terms of finding addresses, but Google is about 100 times faster if you have a lot of data.

Geocoder.us is really easy to use:

        public static Coordinates GetLatAndLong(string address) {
            Coordinates result = null;
            //using RPC.Geocoder - this is a free service

            //do the web request and get the reply
            string reply = SubSonic.Sugar.Web.ReadWebPage("http://rpc.geocoder.us/service/csv?address=" + address);

            //the reply is in CSV Format, like this:
            //38.898748,-77.037684,1600 Pennsylvania Ave NW,Washington,DC,20502
            if(!String.IsNullOrEmpty(reply)) {
                try
                {
                    result = new Coordinates();
                    string[] resultCSV = reply.Split(',');
                    if (resultCSV.Length > 0)
                    {
                        result.Latitude = resultCSV[0];
                        result.Longitude = resultCSV[1];
                        result.Address = resultCSV[2];
                        result.City = resultCSV[3];
                        result.State = resultCSV[4];
                        result.Zip = resultCSV[5];
                    }
                }
                catch
                {

                }
            }
            return result;

        }

I'm using SubSonic to grab the web page, but the code's really simple and if you need it, leave me a comment and I'll post it.

Preparing Your Data
So I have an old database of about 3000 US hotels here on my box that I'm going to use for a demo. They don't have any Latitude/Longitude data - so I'll need to add that in. The table is really basic - just name and address info. To this I'm going to add two fields of NULLABLE REAL, called Latitude and Longitude.

Next, I kick up a LinqToSql class and drag my Hotels table on. I want to make a minor adjustment to the model - I don't like working with Sytem.Float, so instead I set the model to use System.Double for lat/long instead:

dbcontext 

Now I just need to run a loop on all the hotels in the table (where longitude is null), and then set the values accordingly:

        static void GeoCodeData()
        {
            //create the context
            Geo.DataContext db = new Geo.DataContext();

            //load up the hotels that we don't have any geo data for
            var result = from hotels in db.Hotels
                         where hotels.Longitude==null
                         select hotels;

            //counters for output
            int totalHotels = result.Count();
            int currentHotel = 1;


            foreach (var hotel in result)
            {
                //our result object
                Geo.Coordinates coord;

                //build a usable address from the record
                string fullAddress=hotel.Street+", "+hotel.City+","+hotel.State+" "+hotel.Zip;

                //notify what's going on
                Console.WriteLine("(" + currentHotel.ToString() + " of " + totalHotels.ToString() + " - " + fullAddress);

                //look it up
                coord = Geo.GeoCodeService.GetLatAndLong(fullAddress);
                if (coord != null)
                {
                    //set the values
                    hotel.Latitude = float.Parse(coord.Latitude);
                    hotel.Longitude =float.Parse(coord.Longitude);
                    Console.WriteLine("Found it! **************************");

                    //save the data as we go - want to save each record
                    //in case of trouble, like network outage or something
                    db.SubmitChanges();
                }
                else
                {
                    //can't find the address
                    Console.WriteLine("XXXXXXXXXXX No luck XXXXXXXXXXXXXXXX");
                }
                currentHotel++;
            }

            //save it
            Console.WriteLine("Done!");
            Console.ReadLine();
        }

This is going to take a while since I have 3000 records. If you have a lot - use Google's API, it's a lot faster. I have the code that you can download here to use with Google if you like.

Assuming that all worked out, let's GEEK OUT!

Running a Proximity Query
If you liked math in school, you're going to love this next part. Let's create a business case so we can write some code for our newly-geocoded data:

The application needs to locate hotels for people within a specified distance of a given location. For instance, if we're traveling to La Mesa, CA to learn more about Jon Galaxy and Submarines, we might need a place to stay. The application should return all hotels within x miles of La Mesa.

There's a lot of math involved with translating distances between two sets of coordinates on a sphere. The X and Y of it all gets skewed by the fact that the earth is round, and you have to run up some very in-depth trigonometry to get this right.

There is a formula you can use, and it's called the Haversine Formula:

Now I forgot long ago how to read this notation, but a quick Google or two led me to this post on (appropriately named) "Ask Dr. Math" (note the initial "presumption" - I love math geeks):

Presuming a spherical Earth with radius R (see below), and that the
locations of the two points in spherical coordinates (longitude and
latitude) are lon1,lat1 and lon2,lat2, then the Haversine Formula
(from R. W. Sinnott, "Virtues of the Haversine," Sky and Telescope,
vol. 68, no. 2, 1984, p. 159):

  dlon = lon2 - lon1
  dlat = lat2 - lat1
  a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
  c = 2 * atan2(sqrt(a), sqrt(1-a))
  d = R * c

...

[where R (the Earth's Radius)] is 360 degrees times 60 minutes/degree times 1.852 km/minute = 40003.2 km. The implied radius is the circumference divided by 2 pi: R = 6367 km = 3956 mi

Bleh. Well, after some tinkering I was able to write up a nice method that ended up being about 180 lines long - for one equation! All to calculate the distance between two points on a sphere!

I decided to try and flex Lambda expressions for what they're good for, and was able to write the above (with some minor tweaks to avoid a multi-step function) into this (the Lambda is at the bottom):

        const Double EARTH_RADIUS_IN_MILES = 3956.0;
        //helper method to make reading the lambda a bit easier
        static double ToRadian(double val)
        {
            return val * (Math.PI / 180);
        }
        //helper method for converting Radians, making the lamda easier to read
        static double DiffRadian(double val1, double val2)
        {
            return ToRadian(val2) - ToRadian(val1);
        }

        /// <summary>
        /// Function to evaluate the distance between two points on the Earth
        /// </summary>
        Func<double, double, double, double, double> CalcDistance = (lat1, lon1, lat2, lon2) =>
        EARTH_RADIUS_IN_MILES * 2 *
        (
            Math.Asin(
                Math.Min(1,
                    Math.Sqrt(
                        (
                            Math.Pow(Math.Sin((DiffRadian(lat1, lat2)) / 2.0), 2.0) +
                            Math.Cos(ToRadian(lat1)) * Math.Cos(ToRadian(lat2)) *
                            Math.Pow(Math.Sin((DiffRadian(lon1, lon2)) / 2.0), 2.0)
                        )
                   )
               )
           )
         )
        ;

This bit of code illustrates three things about Lambda expressions:

  1. Their roots. They come from computational languages and they're all about Math, and it shows here!
  2. They are pretty damn powerful - reducing my 180 lines of code to 14 - and I could have made it one if I wanted to
  3. They're ugly. I wouldn't want anyone to try and understand this. At the same time - that's precisely what made the "regular" function so long - I tried to be as clear as I could with all the math and ended up creating a monster. I spose if you're into math, reading a Lambda expression is pretty easy - so maybe this is preferred?

Now the good news here is that you can put this into a Scalar function if you want to put it into your DB. It's up to you and your business to determine maintainability - there is a way to do this (I'll show you below) without stuffing logic into your DB - heck you can even port the solution to MySQL if you want!

Anyway - here's the SQL Function (which I grabbed from here):

CREATE FUNCTION [dbo].[DistanceBetween] (@Lat1 as real,
                @Long1 as real, @Lat2 as real, @Long2 as real)
RETURNS real
AS
BEGIN

DECLARE @dLat1InRad as float(53);
SET @dLat1InRad = @Lat1 * (PI()/180.0);
DECLARE @dLong1InRad as float(53);
SET @dLong1InRad = @Long1 * (PI()/180.0);
DECLARE @dLat2InRad as float(53);
SET @dLat2InRad = @Lat2 * (PI()/180.0);
DECLARE @dLong2InRad as float(53);
SET @dLong2InRad = @Long2 * (PI()/180.0);

DECLARE @dLongitude as float(53);
SET @dLongitude = @dLong2InRad - @dLong1InRad;
DECLARE @dLatitude as float(53);
SET @dLatitude = @dLat2InRad - @dLat1InRad;
/* Intermediate result a. */
DECLARE @a as float(53);
SET @a = SQUARE (SIN (@dLatitude / 2.0)) + COS (@dLat1InRad)
                 * COS (@dLat2InRad)
                 * SQUARE(SIN (@dLongitude / 2.0));
/* Intermediate result c (great circle distance in Radians). */
DECLARE @c as real;
SET @c = 2.0 * ATN2 (SQRT (@a), SQRT (1.0 - @a));
DECLARE @kEarthRadius as real;
/* SET kEarthRadius = 3956.0 miles */
SET @kEarthRadius = 6376.5;        /* kms */

DECLARE @dDistance as real;
SET @dDistance = @kEarthRadius * @c;
return (@dDistance);
END

If you visit the site with the link above, please note that the code he has there is incorrect (the C# stuff). I don't have the math chops to know why, but I do know that I had to translate Dr. Math's equation into a Lambda to make it work properly.

Putting It Together
OK so now we have the code - let's use it! The first thing I can do, if my boss is a DBA and I'm in a SQL Server shop is to crank up an SP that uses the function above:

CREATE PROCEDURE FindNearestHotel
(
    @sourceLat real,
    @sourceLong real,
    @mileage int
)
AS
BEGIN
        SELECT     Name, Address, Latitude, Longitude,
        dbo.DistanceBetween(
            @sourceLat,
            @sourceLong,
            Hotels.Latitude,
            Hotels.Longitude) as distance
        FROM         Hotels
        WHERE dbo.DistanceBetween(
            @sourceLat,
            @sourceLong,
            Hotels.Latitude,
            Hotels.Longitude)<@mileage
END

If I run this (using a test Lat/Long) it works nicely:

 

sptest

Exploding LINQ
When I wrote up that Lambda I remember giggling to myself, thinking "there's just no way LINQ is going to know what to do with this!". Was I right? Let's see...

The first thing I did was to create a partial class and extend the Hotel class to have a method called "CalculateDistanceFrom()" - doing it this way will allow me to use the method as a condition in a LINQ where statement. So I added in this to my partial, referencing the Lambda function (and helpers) above:

        public double CalculateDistanceFrom(double fromLat, double fromLon)
        {
            return CalcDistance(this.Latitude,this.Longitude, fromLat, fromLon);
        }

Next, I worked up the following LINQ statement:

 

        Geo.DataContext db = new Geo.DataContext();
        var result = from hotel in db.Hotels
                    where hotel.CalculateDistanceFrom(zip.Latitude, zip.Longitude) < 100
                    select hotel;

It reads nice doesn't it? And the survey says....

KABOOOM!

LINQ didn't like this at all - but I have to applaud it for trying! The error I got was "Cannot translate this to SQL" and that makes perfect sense, since there's some whacky gymnastics in there. Hmmm - looks like I've maxxed out LINQ! Or did I...

I remember seeing some great querying examples from Anders and others, so I decided to do some caching of the data, and querying the cache instead of LinqToSql (db.Hotels). This is a nice alternative because it decouples my query from my DB! The List<Hotel> could have come from any DB provider (even, *gasp*, MySQL).

To make this work, I created a static List<Hotel> to hold ALL of the hotels in the DB in memory, and I then created a "singleton" access point to get this data:

        //cached data set of hotels
        static List<Geo.Data.Hotel> hotels;

        //singleton instance... sort of
        static List<Geo.Data.Hotel> GetHotels()
        {
            if (hotels == null)
            {
                Geo.DataContext db = new Geo.DataContext();
                hotels = db.Hotels.Where(hotel => hotel.Longitude != null).ToList<Geo.Data.Hotel>();
            }
            return hotels;
        }

Now, I can rewrite the query pretty easily:

        var result = from hotel in GetHotels()
                     where hotel.CalculateDistanceFrom(zip.Latitude, zip.Longitude) < 100
                     select hotel;

        foreach (var h in result)
        {
            Console.WriteLine(h.Name);
        }
        Console.ReadLine();

And it worked perfectly :), which really isn't a surprise. What is nice about this is that, once again, we see how nice the syntax is to read, rather than looping over foreach... loops.

Summary
So once again we're reminded that there is translation going on here between you, LINQ, and your DB. I'd say I stretched my comfort zone trying to write that Lambda up - but it was pretty dang fun - especially seeing it work!

The question here is - does it belong in code, or the DB? What do you think? It is a data operation after all... isn't it? Or is this more business logic? You tell me!

Related


Gravatar
Rob Conery - Friday, August 31, 2007 - @shawn - you raise a good point - the Haversine equation actually falls apart for distances under 20 miles - because the curvature of the earth becomes pretty much a non factor so accounting for it blows the equation.

The guy who did the formula above corrected for that. I understand that he needed to put the disclaimer in - but it sounded like flat vs round earth :) and not a contorted sphere.
Gravatar
Michael Giagnocavo - Friday, August 31, 2007 - If you wanted to use CalculateDistanceFrom in a LINQ to SQL query, you just need to create it as an Expression. Then the provider can turn it into a SQL statement. Depending on the expression, this might allow the use of SQL indexes. TomAsp talks about this: http://tomasp.net/articles/dynamic-linq-queries.aspx
Gravatar
Scott Peterson - Friday, August 31, 2007 - I think I missed something: where did the "Coordinates" type come from? Is this a .NET 3.0 thing? What about the Geo namespace?
Gravatar
Jason Monroe - Thursday, August 30, 2007 - My first blush tells me that this kind of thing is supposed to be in the business layer.. However, As I started to write "Dude, it goes in the BLL and you know it!" My PM and architect hat fell off the shelf and landed squarely on my head and I had a "whoa there trigger!" moment. Thinking about it logically, the Math for the formula isn't going to change. That's the thing about math.. once it's written once, it always seems to work out that way. I'm a firm believer that a database should only be used to store and retrieve data. However, in this case, the only way to get the data that you want, is to keep the math logic in the DB. If you don't, then you have to load the entire data table into your business object and do your processing in the middle(ish) tier. I think that the question is a little to complex just to give a concrete yes or no answer.. I'd have to see some performance numbers, bandwidth utilization, caching schemes.. etc
Gravatar
shawn - Thursday, August 30, 2007 - Just as an FYI, the reason that Dr. Math mentions the presumption that the earth is a sphere is because it is in fact not a sphere. Treating the earth as a regular sphere is simply "good enough" for average purposes.
Gravatar
9 Links Today (2007-08-30) - Thursday, August 30, 2007 - [...] LINQ and Geocoding to calculate the distance between two points on a sphere [...]
Gravatar
Ralf Brandenburg - Thursday, August 30, 2007 - Hi Rob, thanks for this post. In the moment I am working on a Geoproject and asking me some of your questions. I would believe that the code should be placed in the business logic. Cause these calculations are expensive it would be easier to distribute the layers on different machines to achieve load balancing. Best Regards Ralf
Gravatar
Rydal - Thursday, August 30, 2007 - I have nothing to say but excellent article! I've been looking for a way to reduce/improve my geo calculation code and I think you've made some good points that may be useful to me.
Gravatar
Dirk - Friday, August 31, 2007 - And people say our jobs are boring, Ha! This is fun and getting paid for it just makes me ecstatic :-> I implemented a power prediction service for wind turbines based on weather forecasting about a year ago (including probability intervals). Looping in .Net was just way too slow so I also moved it to SQL server, cranking out 15K records at a time with no loops at all. I must say I felt a bit guilty about putting what are obviously business rules in SPs and UDFs, but the performance gain was too great to ignore...
Gravatar
Rob Conery - Monday, September 03, 2007 - @Scott - yep sorry that's my bad ;). I left out the helper class "Coordinates" and it's only a data-carrier. @Michael - not sure if you read the whole thing but that's precisely what I tried to do. LINQ can't parse the Haversine Lambda - it doesn't know how to. Make sense?
Gravatar
Keith Farmer - Tuesday, September 04, 2007 - Of course a client-side function would not be translatable to SQL, but if you created a UDF in SQL Server instead, you can wire that up in the data context and use that in your queries.

See http://weblogs.asp.net/scottgu/archive/2007/08/16/linq-to-sql-part-6-retrieving-data-using-stored-procedures.aspx, near the end of the article (past the sproc stuff).
Gravatar
Rob Conery - Tuesday, September 04, 2007 - @Keith - agree that technically it can be done that way, but if you stare into the mirror and speak aloud the solution: "I'll make a UDF which contains some deep math so I can wire it into my ORM solution and expose it via LINQ". ScottGu mentioned the same thing to me (that it would work this way) but if I'm going to the trouble of making a UDF - I should probably just expose it in an SP and keep it simple :). In terms of the expression itself - I picked a hard one but in general, most Lambdas you throw at LINQ it can decipher into SQL. This operation is pretty complex so it stumbled a bit.
Gravatar
Rick Strahl - Tuesday, September 04, 2007 - "I'll make a UDF which contains some deep math so I can wire it into my ORM solution and expose it via LINQ".

Ding, Ding, Ding!!!

I think that's a key takeaway in ORM in general. In some situations OR/M makes things HARDER than they would be in some other way and it might simply be easier to work around the issue by not using the OR/M interface to the database.

Luckily though I think that for the most part LINQ to SQL is indeed pretty good converting things into SQL. The most common use cases are definitely coverered and if that doesn't work there's always ExecuteQuery for string based queries. That would have helped here but it will at least let you get around more traditional issues.
Gravatar
Michael Giagnocavo - Tuesday, September 04, 2007 - Actually, you CAN translate the client side function; as long as you can describe it as an Expression. This is where the new C# compiler really shines as it lets you do this stuff that other O/R mappers can't even approach (since they have no way of inspecting your compiled code). It might not be very easy or a good idea in this particular case, but in general just remember that's an option.
Gravatar
Jon Galloway - Tuesday, September 04, 2007 - We should be virtualizing Applications, not Machines... One of the benefits of my new job at Vertigo Software is that I have more frequent opportunities to talk...
Gravatar
Rob Conery - Tuesday, September 04, 2007 - @Micheal: The Lambda is the Expression - it doesn't work in LINQ (unless I'm missing something). Would love to see an example otherwise if you had the time.
Gravatar
[Giagnocavo]Michael::Write() - Wednesday, September 05, 2007 - Calling custom methods in LINQ-to-SQL... ...
Gravatar
Michael Giagnocavo - Wednesday, September 05, 2007 - Hi Rob!

I posted an article that I think shows how you can approach having "complicated" C# logic get translated into SQL:

http://www.atrevido.net/blog/2007/09/05/Calling Custom Methods In LINQtoSQL.aspx

The key takeaway is that you must use Expression Trees, not just anonymous methods via lambdas. (Lambdas aren't actually even needed, in truth, although they can simplify the code required.)

Let me know if this helps.
Gravatar
[Giagnocavo]Michael::Write() - Thursday, September 06, 2007 - Complicated functions in LINQ to SQL... ...
Gravatar
Rob Conery » LINQ and GeoCode, Part 2 - Thursday, September 06, 2007 - [...] background here has to do with my original post on LINQ and Geocoding. I translated a C# method into a Lambda for fun in order to see how far I could push LinqToSql - [...]
Gravatar
Roger Jennings - Friday, September 07, 2007 - Pingback from http://oakleafblog.blogspot.com/2007/09/linq-and-entity-framework-updates-for.html.
Gravatar
links for 2007-09-13 « dstelow notes… - Thursday, September 13, 2007 - [...] Rob Conery ยป LINQ and Geocoding Mapping applications are everywhere, and having "locational awareness" is paramount when running customer-related applications. Gathering stats about a user's IP can tell you a lot about their physical location, which in turn can tell you a lot abou (tags: dev dotnet geocoding geo linq) [...]
Gravatar
Troy DeMonbreun - Friday, September 14, 2007 - For any of your readers that might be interested, I posted a similar SQL Function back in February on my blog for calculating distances between 2 lat/lon points:

http://blog.troyd.net/PermaLink,guid,847b0f1f-498c-43d4-80de-d29902fbd2eb.aspx

Of note, I found the constant 6366.707019 (earth radius nautical miles) returned more accurate results than 6376.5. The resulting EARTH_RADIUS_IN_MILES would then be 3956.0871 (rounded). This may seem picky, but over significant distances, it definitely adds up.
Gravatar
Rich Gibson - Friday, September 28, 2007 - Hi Rob, I run geocoder.us. Thanks so much for this article, and for the code. A user just emailed me wanting to use geocoder.us from ASP. Since I'm a Perl/Ruby sort of guy I went to google :-) The free geocoder.us service is throttled to one request every 15 seconds, but paid clients are unthrottled and so Google's 100 times faster measure is probably not as true for the pay service. The code is open source, and the data is free. It is a tad complicated to get set up, but you can run your own local free version which is 'pretty damned fast.' I wrote two blog posts on doing proximity searches and calculating distances...I imagine posting them in here may cause this message to be marked spam...but here goes :-) http://geocoder.us/blog/2006/10/31/calculating-proximity-find-your-closest-pizza-place/ http://geocoder.us/blog/2006/04/21/calculating-distances/ Regards, Rich Gibson
Gravatar
Rafal - Friday, February 15, 2008 - Great article!

Could you please provide a link to a zip file with sample working project we could download? Some classes are missing here (Coordinates for example)

Also would you be willing to provide the sample database of 3000 hotels, so we can test it? Where to get data like this?? I was thinking to test it with > 100K records to see how performace is...

Thanks
Gravatar
Rob Conery » LINQ Gymnastics: Creating A Predictive Query With LINQ - Wednesday, February 27, 2008 - [...] been… interesting (the Adventure Works part) but it’s great for working out those weak LINQ muscles that tend to atrophy after a while. On the heals of last night’s nested query discovery, I [...]
Gravatar
SQL Server 2008 Spacial Data « QuantuMatrix’s Weblog - Tuesday, April 14, 2009 - [...] Finding the distance between two points given their lat/long [...]