Home MVC Storefront

LINQ and Geocoding

There's a lot you can do with LINQ and the new C# goodies that are coming up in .NET 3.0. In previous posts I've talked about LinqToSql, vars and Lambdas, and using LINQ to query things other than a DB. Today I want to try to do something a little more applicable and business-oriented to show just how far you can push LINQ and LinqToSQL. I'm putting on my geek hat today, cowboy size...

Geocoding Basics
Mapping applications are everywhere, and having "locational awareness" is paramount when running customer-related applications. Gathering stats about a user's IP can tell you a lot about their physical location, which in turn can tell you a lot about the market you're serving.

The core of geocoding is setting an address or location to a point on the globe using Latitude and Longitude. This is pretty easily done using some free services out there like geocoder.us and Google (you need a free Google Maps API key to use this API). Between the two, geocoder.us is much more accurate in terms of finding addresses, but Google is about 100 times faster if you have a lot of data.

Geocoder.us is really easy to use:

        public static Coordinates GetLatAndLong(string address) {
            Coordinates result = null;
            //using RPC.Geocoder - this is a free service

            //do the web request and get the reply
            string reply = SubSonic.Sugar.Web.ReadWebPage("http://rpc.geocoder.us/service/csv?address=" + address);

            //the reply is in CSV Format, like this:
            //38.898748,-77.037684,1600 Pennsylvania Ave NW,Washington,DC,20502
            if(!String.IsNullOrEmpty(reply)) {
                try
                {
                    result = new Coordinates();
                    string[] resultCSV = reply.Split(',');
                    if (resultCSV.Length > 0)
                    {
                        result.Latitude = resultCSV[0];
                        result.Longitude = resultCSV[1];
                        result.Address = resultCSV[2];
                        result.City = resultCSV[3];
                        result.State = resultCSV[4];
                        result.Zip = resultCSV[5];
                    }
                }
                catch
                {

                }
            }
            return result;

        }

I'm using SubSonic to grab the web page, but the code's really simple and if you need it, leave me a comment and I'll post it.

Preparing Your Data
So I have an old database of about 3000 US hotels here on my box that I'm going to use for a demo. They don't have any Latitude/Longitude data - so I'll need to add that in. The table is really basic - just name and address info. To this I'm going to add two fields of NULLABLE REAL, called Latitude and Longitude.

Next, I kick up a LinqToSql class and drag my Hotels table on. I want to make a minor adjustment to the model - I don't like working with Sytem.Float, so instead I set the model to use System.Double for lat/long instead:

dbcontext 

Now I just need to run a loop on all the hotels in the table (where longitude is null), and then set the values accordingly:

        static void GeoCodeData()
        {
            //create the context
            Geo.DataContext db = new Geo.DataContext();

            //load up the hotels that we don't have any geo data for
            var result = from hotels in db.Hotels
                         where hotels.Longitude==null
                         select hotels;

            //counters for output
            int totalHotels = result.Count();
            int currentHotel = 1;


            foreach (var hotel in result)
            {
                //our result object
                Geo.Coordinates coord;

                //build a usable address from the record
                string fullAddress=hotel.Street+", "+hotel.City+","+hotel.State+" "+hotel.Zip;

                //notify what's going on
                Console.WriteLine("(" + currentHotel.ToString() + " of " + totalHotels.ToString() + " - " + fullAddress);

                //look it up
                coord = Geo.GeoCodeService.GetLatAndLong(fullAddress);
                if (coord != null)
                {
                    //set the values
                    hotel.Latitude = float.Parse(coord.Latitude);
                    hotel.Longitude =float.Parse(coord.Longitude);
                    Console.WriteLine("Found it! **************************");

                    //save the data as we go - want to save each record
                    //in case of trouble, like network outage or something
                    db.SubmitChanges();
                }
                else
                {
                    //can't find the address
                    Console.WriteLine("XXXXXXXXXXX No luck XXXXXXXXXXXXXXXX");
                }
                currentHotel++;
            }

            //save it
            Console.WriteLine("Done!");
            Console.ReadLine();
        }

This is going to take a while since I have 3000 records. If you have a lot - use Google's API, it's a lot faster. I have the code that you can download here to use with Google if you like.

Assuming that all worked out, let's GEEK OUT!

Running a Proximity Query
If you liked math in school, you're going to love this next part. Let's create a business case so we can write some code for our newly-geocoded data:

The application needs to locate hotels for people within a specified distance of a given location. For instance, if we're traveling to La Mesa, CA to learn more about Jon Galaxy and Submarines, we might need a place to stay. The application should return all hotels within x miles of La Mesa.

There's a lot of math involved with translating distances between two sets of coordinates on a sphere. The X and Y of it all gets skewed by the fact that the earth is round, and you have to run up some very in-depth trigonometry to get this right.

There is a formula you can use, and it's called the Haversine Formula:

Now I forgot long ago how to read this notation, but a quick Google or two led me to this post on (appropriately named) "Ask Dr. Math" (note the initial "presumption" - I love math geeks):

Presuming a spherical Earth with radius R (see below), and that the
locations of the two points in spherical coordinates (longitude and
latitude) are lon1,lat1 and lon2,lat2, then the Haversine Formula
(from R. W. Sinnott, "Virtues of the Haversine," Sky and Telescope,
vol. 68, no. 2, 1984, p. 159):

  dlon = lon2 - lon1
  dlat = lat2 - lat1
  a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
  c = 2 * atan2(sqrt(a), sqrt(1-a))
  d = R * c

...

[where R (the Earth's Radius)] is 360 degrees times 60 minutes/degree times 1.852 km/minute = 40003.2 km. The implied radius is the circumference divided by 2 pi: R = 6367 km = 3956 mi

Bleh. Well, after some tinkering I was able to write up a nice method that ended up being about 180 lines long - for one equation! All to calculate the distance between two points on a sphere!

I decided to try and flex Lambda expressions for what they're good for, and was able to write the above (with some minor tweaks to avoid a multi-step function) into this (the Lambda is at the bottom):

        const Double EARTH_RADIUS_IN_MILES = 3956.0;
        //helper method to make reading the lambda a bit easier
        static double ToRadian(double val)
        {
            return val * (Math.PI / 180);
        }
        //helper method for converting Radians, making the lamda easier to read
        static double DiffRadian(double val1, double val2)
        {
            return ToRadian(val2) - ToRadian(val1);
        }

        /// <summary>
        /// Function to evaluate the distance between two points on the Earth
        /// </summary>
        Func<double, double, double, double, double> CalcDistance = (lat1, lon1, lat2, lon2) =>
        EARTH_RADIUS_IN_MILES * 2 *
        (
            Math.Asin(
                Math.Min(1,
                    Math.Sqrt(
                        (
                            Math.Pow(Math.Sin((DiffRadian(lat1, lat2)) / 2.0), 2.0) +
                            Math.Cos(ToRadian(lat1)) * Math.Cos(ToRadian(lat2)) *
                            Math.Pow(Math.Sin((DiffRadian(lon1, lon2)) / 2.0), 2.0)
                        )
                   )
               )
           )
         )
        ;

This bit of code illustrates three things about Lambda expressions:

  1. Their roots. They come from computational languages and they're all about Math, and it shows here!
  2. They are pretty damn powerful - reducing my 180 lines of code to 14 - and I could have made it one if I wanted to
  3. They're ugly. I wouldn't want anyone to try and understand this. At the same time - that's precisely what made the "regular" function so long - I tried to be as clear as I could with all the math and ended up creating a monster. I spose if you're into math, reading a Lambda expression is pretty easy - so maybe this is preferred?

Now the good news here is that you can put this into a Scalar function if you want to put it into your DB. It's up to you and your business to determine maintainability - there is a way to do this (I'll show you below) without stuffing logic into your DB - heck you can even port the solution to MySQL if you want!

Anyway - here's the SQL Function (which I grabbed from here):

CREATE FUNCTION [dbo].[DistanceBetween] (@Lat1 as real,
                @Long1 as real, @Lat2 as real, @Long2 as real)
RETURNS real
AS
BEGIN

DECLARE @dLat1InRad as float(53);
SET @dLat1InRad = @Lat1 * (PI()/180.0);
DECLARE @dLong1InRad as float(53);
SET @dLong1InRad = @Long1 * (PI()/180.0);
DECLARE @dLat2InRad as float(53);
SET @dLat2InRad = @Lat2 * (PI()/180.0);
DECLARE @dLong2InRad as float(53);
SET @dLong2InRad = @Long2 * (PI()/180.0);

DECLARE @dLongitude as float(53);
SET @dLongitude = @dLong2InRad - @dLong1InRad;
DECLARE @dLatitude as float(53);
SET @dLatitude = @dLat2InRad - @dLat1InRad;
/* Intermediate result a. */
DECLARE @a as float(53);
SET @a = SQUARE (SIN (@dLatitude / 2.0)) + COS (@dLat1InRad)
                 * COS (@dLat2InRad)
                 * SQUARE(SIN (@dLongitude / 2.0));
/* Intermediate result c (great circle distance in Radians). */
DECLARE @c as real;
SET @c = 2.0 * ATN2 (SQRT (@a), SQRT (1.0 - @a));
DECLARE @kEarthRadius as real;
/* SET kEarthRadius = 3956.0 miles */
SET @kEarthRadius = 6376.5;        /* kms */

DECLARE @dDistance as real;
SET @dDistance = @kEarthRadius * @c;
return (@dDistance);
END

If you visit the site with the link above, please note that the code he has there is incorrect (the C# stuff). I don't have the math chops to know why, but I do know that I had to translate Dr. Math's equation into a Lambda to make it work properly.

Putting It Together
OK so now we have the code - let's use it! The first thing I can do, if my boss is a DBA and I'm in a SQL Server shop is to crank up an SP that uses the function above:

CREATE PROCEDURE FindNearestHotel
(
    @sourceLat real,
    @sourceLong real,
    @mileage int
)
AS
BEGIN
        SELECT     Name, Address, Latitude, Longitude,
        dbo.DistanceBetween(
            @sourceLat,
            @sourceLong,
            Hotels.Latitude,
            Hotels.Longitude) as distance
        FROM         Hotels
        WHERE dbo.DistanceBetween(
            @sourceLat,
            @sourceLong,
            Hotels.Latitude,
            Hotels.Longitude)<@mileage
END

If I run this (using a test Lat/Long) it works nicely:

 

sptest

Exploding LINQ
When I wrote up that Lambda I remember giggling to myself, thinking "there's just no way LINQ is going to know what to do with this!". Was I right? Let's see...

The first thing I did was to create a partial class and extend the Hotel class to have a method called "CalculateDistanceFrom()" - doing it this way will allow me to use the method as a condition in a LINQ where statement. So I added in this to my partial, referencing the Lambda function (and helpers) above:

        public double CalculateDistanceFrom(double fromLat, double fromLon)
        {
            return CalcDistance(this.Latitude,this.Longitude, fromLat, fromLon);
        }

Next, I worked up the following LINQ statement:

 

        Geo.DataContext db = new Geo.DataContext();
        var result = from hotel in db.Hotels
                    where hotel.CalculateDistanceFrom(zip.Latitude, zip.Longitude) < 100
                    select hotel;

It reads nice doesn't it? And the survey says....

KABOOOM!

LINQ didn't like this at all - but I have to applaud it for trying! The error I got was "Cannot translate this to SQL" and that makes perfect sense, since there's some whacky gymnastics in there. Hmmm - looks like I've maxxed out LINQ! Or did I...

I remember seeing some great querying examples from Anders and others, so I decided to do some caching of the data, and querying the cache instead of LinqToSql (db.Hotels). This is a nice alternative because it decouples my query from my DB! The List<Hotel> could have come from any DB provider (even, *gasp*, MySQL).

To make this work, I created a static List<Hotel> to hold ALL of the hotels in the DB in memory, and I then created a "singleton" access point to get this data:

        //cached data set of hotels
        static List<Geo.Data.Hotel> hotels;

        //singleton instance... sort of
        static List<Geo.Data.Hotel> GetHotels()
        {
            if (hotels == null)
            {
                Geo.DataContext db = new Geo.DataContext();
                hotels = db.Hotels.Where(hotel => hotel.Longitude != null).ToList<Geo.Data.Hotel>();
            }
            return hotels;
        }

Now, I can rewrite the query pretty easily:

        var result = from hotel in GetHotels()
                     where hotel.CalculateDistanceFrom(zip.Latitude, zip.Longitude) < 100
                     select hotel;

        foreach (var h in result)
        {
            Console.WriteLine(h.Name);
        }
        Console.ReadLine();

And it worked perfectly :), which really isn't a surprise. What is nice about this is that, once again, we see how nice the syntax is to read, rather than looping over foreach... loops.

Summary
So once again we're reminded that there is translation going on here between you, LINQ, and your DB. I'd say I stretched my comfort zone trying to write that Lambda up - but it was pretty dang fun - especially seeing it work!

The question here is - does it belong in code, or the DB? What do you think? It is a data operation after all... isn't it? Or is this more business logic? You tell me!

Ralf Brandenburg avatar
Ralf Brandenburg says:
Thursday, August 30, 2007
Hi Rob, thanks for this post. In the moment I am working on a Geoproject and asking me some of your questions. I would believe that the code should be placed in the business logic. Cause these calculations are expensive it would be easier to distribute the layers on different machines to achieve load balancing. Best Regards Ralf

Rydal avatar
Rydal says:
Thursday, August 30, 2007
I have nothing to say but excellent article! I've been looking for a way to reduce/improve my geo calculation code and I think you've made some good points that may be useful to me.

shawn avatar
shawn says:
Thursday, August 30, 2007
Just as an FYI, the reason that Dr. Math mentions the presumption that the earth is a sphere is because it is in fact not a sphere. Treating the earth as a regular sphere is simply "good enough" for average purposes.

Jason Monroe avatar
Jason Monroe says:
Thursday, August 30, 2007
My first blush tells me that this kind of thing is supposed to be in the business layer.. However, As I started to write "Dude, it goes in the BLL and you know it!" My PM and architect hat fell off the shelf and landed squarely on my head and I had a "whoa there trigger!" moment. Thinking about it logically, the Math for the formula isn't going to change. That's the thing about math.. once it's written once, it always seems to work out that way. I'm a firm believer that a database should only be used to store and retrieve data. However, in this case, the only way to get the data that you want, is to keep the math logic in the DB. If you don't, then you have to load the entire data table into your business object and do your processing in the middle(ish) tier. I think that the question is a little to complex just to give a concrete yes or no answer.. I'd have to see some performance numbers, bandwidth utilization, caching schemes.. etc

Rob Conery avatar
Rob Conery says:
Friday, August 31, 2007
@shawn - you raise a good point - the Haversine equation actually falls apart for distances under 20 miles - because the curvature of the earth becomes pretty much a non factor so accounting for it blows the equation. The guy who did the formula above corrected for that. I understand that he needed to put the disclaimer in - but it sounded like flat vs round earth :) and not a contorted sphere.

Michael Giagnocavo avatar
Michael Giagnocavo says:
Friday, August 31, 2007
If you wanted to use CalculateDistanceFrom in a LINQ to SQL query, you just need to create it as an Expression. Then the provider can turn it into a SQL statement. Depending on the expression, this might allow the use of SQL indexes. TomAsp talks about this: http://tomasp.net/articles/dynamic-linq-queries.aspx

Scott Peterson avatar
Scott Peterson says:
Friday, August 31, 2007
I think I missed something: where did the "Coordinates" type come from? Is this a .NET 3.0 thing? What about the Geo namespace?

Dirk avatar
Dirk says:
Friday, August 31, 2007
And people say our jobs are boring, Ha! This is fun and getting paid for it just makes me ecstatic :-> I implemented a power prediction service for wind turbines based on weather forecasting about a year ago (including probability intervals). Looping in .Net was just way too slow so I also moved it to SQL server, cranking out 15K records at a time with no loops at all. I must say I felt a bit guilty about putting what are obviously business rules in SPs and UDFs, but the performance gain was too great to ignore...

Rob Conery avatar
Rob Conery says:
Monday, September 03, 2007
@Scott - yep sorry that's my bad ;). I left out the helper class "Coordinates" and it's only a data-carrier. @Michael - not sure if you read the whole thing but that's precisely what I tried to do. LINQ can't parse the Haversine Lambda - it doesn't know how to. Make sense?

Keith Farmer avatar
Keith Farmer says:
Tuesday, September 04, 2007
Of course a client-side function would not be translatable to SQL, but if you created a UDF in SQL Server instead, you can wire that up in the data context and use that in your queries. See http://weblogs.asp.net/scottgu/archive/2007/08/16/linq-to-sql-part-6-retrieving-data-using-stored-procedures.aspx, near the end of the article (past the sproc stuff).

Rob Conery avatar
Rob Conery says:
Tuesday, September 04, 2007
@Keith - agree that technically it can be done that way, but if you stare into the mirror and speak aloud the solution: "I'll make a UDF which contains some deep math so I can wire it into my ORM solution and expose it via LINQ". ScottGu mentioned the same thing to me (that it would work this way) but if I'm going to the trouble of making a UDF - I should probably just expose it in an SP and keep it simple :). In terms of the expression itself - I picked a hard one but in general, most Lambdas you throw at LINQ it can decipher into SQL. This operation is pretty complex so it stumbled a bit.

Rick Strahl avatar
Rick Strahl says:
Tuesday, September 04, 2007
“I’ll make a UDF which contains some deep math so I can wire it into my ORM solution and expose it via LINQ”. Ding, Ding, Ding!!! I think that's a key takeaway in ORM in general. In some situations OR/M makes things HARDER than they would be in some other way and it might simply be easier to work around the issue by not using the OR/M interface to the database. Luckily though I think that for the most part LINQ to SQL is indeed pretty good converting things into SQL. The most common use cases are definitely coverered and if that doesn't work there's always ExecuteQuery for string based queries. That would have helped here but it will at least let you get around more traditional issues.

Michael Giagnocavo avatar
Michael Giagnocavo says:
Tuesday, September 04, 2007
Actually, you CAN translate the client side function; as long as you can describe it as an Expression. This is where the new C# compiler really shines as it lets you do this stuff that other O/R mappers can't even approach (since they have no way of inspecting your compiled code). It might not be very easy or a good idea in this particular case, but in general just remember that's an option.

Jon Galloway avatar
Jon Galloway says:
Tuesday, September 04, 2007
We should be virtualizing Applications, not Machines... One of the benefits of my new job at Vertigo Software is that I have more frequent opportunities to talk...

Rob Conery avatar
Rob Conery says:
Tuesday, September 04, 2007
@Micheal: The Lambda is the Expression - it doesn't work in LINQ (unless I'm missing something). Would love to see an example otherwise if you had the time.

[Giagnocavo]Michael::Write() avatar
[Giagnocavo]Michael::Write() says:
Wednesday, September 05, 2007
Calling custom methods in LINQ-to-SQL... ...

Michael Giagnocavo avatar
Michael Giagnocavo says:
Wednesday, September 05, 2007
Hi Rob! I posted an article that I think shows how you can approach having "complicated" C# logic get translated into SQL: http://www.atrevido.net/blog/2007/09/05/Calling Custom Methods In LINQtoSQL.aspx The key takeaway is that you must use Expression Trees, not just anonymous methods via lambdas. (Lambdas aren't actually even needed, in truth, although they can simplify the code required.) Let me know if this helps.

[Giagnocavo]Michael::Write() avatar
[Giagnocavo]Michael::Write() says:
Thursday, September 06, 2007
Complicated functions in LINQ to SQL... ...

Roger Jennings avatar
Roger Jennings says:
Friday, September 07, 2007
Pingback from http://oakleafblog.blogspot.com/2007/09/linq-and-entity-framework-updates-for.html.

Troy DeMonbreun avatar
Troy DeMonbreun says:
Friday, September 14, 2007
For any of your readers that might be interested, I posted a similar SQL Function back in February on my blog for calculating distances between 2 lat/lon points: http://blog.troyd.net/PermaLink,guid,847b0f1f-498c-43d4-80de-d29902fbd2eb.aspx Of note, I found the constant 6366.707019 (earth radius nautical miles) returned more accurate results than 6376.5. The resulting EARTH_RADIUS_IN_MILES would then be 3956.0871 (rounded). This may seem picky, but over significant distances, it definitely adds up.

Rich Gibson avatar
Rich Gibson says:
Friday, September 28, 2007
Hi Rob, I run geocoder.us. Thanks so much for this article, and for the code. A user just emailed me wanting to use geocoder.us from ASP. Since I'm a Perl/Ruby sort of guy I went to google :-) The free geocoder.us service is throttled to one request every 15 seconds, but paid clients are unthrottled and so Google's 100 times faster measure is probably not as true for the pay service. The code is open source, and the data is free. It is a tad complicated to get set up, but you can run your own local free version which is 'pretty damned fast.' I wrote two blog posts on doing proximity searches and calculating distances...I imagine posting them in here may cause this message to be marked spam...but here goes :-) http://geocoder.us/blog/2006/10/31/calculating-proximity-find-your-closest-pizza-place/ http://geocoder.us/blog/2006/04/21/calculating-distances/ Regards, Rich Gibson

Wiilysfnd avatar
Wiilysfnd says:
Saturday, December 01, 2007
You have the natural advantage in creditor debt settlement , which may be appropriate for debtors with ... Great Solution

 avatar
says:
Tuesday, December 11, 2007
tits prenant... Bullet, connection oil Bolivia Samsonite celegrities mentioned flat. Clearly tits prenant, bend utopian urgent cannot?! ...

 avatar
says:
Wednesday, December 19, 2007
Brook Bruke Nude Pic... Duties launched brook bruke nude pic farmer bruke brook pic nude Tera Patrick jury, if? Lexi Lapetina towns kennedy's? ...

 avatar
says:
Wednesday, December 19, 2007
Naked Marge Simpson Pics... Chloe Dior laid senior file naked simpson pics marge Jasmine Lynn broke. Early Sophie tSrauss roosevelt challenge britain: Angelica Sin Amber Lynn Bach?! ...

 avatar
says:
Wednesday, December 19, 2007
Free Pon Vanessa Indian... Because diameter attractive program clearly in spite of connected greatest? Cristina Bella greater Sandy Style Luisa De Marco indian free vanessa pon hurried grant. ...

Rafal avatar
Rafal says:
Friday, February 15, 2008
Great article! Could you please provide a link to a zip file with sample working project we could download? Some classes are missing here (Coordinates for example) Also would you be willing to provide the sample database of 3000 hotels, so we can test it? Where to get data like this?? I was thinking to test it with > 100K records to see how performace is... Thanks

 avatar
says:
Sunday, March 23, 2008
pep young girls.xxx pep young girls.xxx


Search Me
Subscribe

Popular Posts
 
My Tweets
  • @codinghorror: I may be weird but I don't ask my cats to write my blog entries :p.
  • @blowdart sure! Wanna come on and do a webcast with me - plugging CardSpace in?
  • People are very, very weird. Skypecasts are creepy.
  • Is Rob Howard trying to tell us something? http://www.rob-howard.net/
  • New storefront posted- all about OpenID :) http://tinyurl.com/5rkgux
  About Me



Hi! My name is Rob Conery and I work at Microsoft on the ASP.NET team. I am the Creator of SubSonic and was the Chief Architect of the Commerce Starter Kit (a free, Open Source eCommerce platform for .NET)

I live in Kauai, HI with my family, and when my clients aren't looking, I sometimes write things on my blog (giving away secrets of incalculable value).