LINQ and Geocoding

There's a lot you can do with LINQ and the new C# goodies that are coming up in .NET 3.

There’s a lot you can do with LINQ and the new C# goodies that are coming up in .NET 3.0. In previous posts I’ve talked about LinqToSql, vars and Lambdas, and using LINQ to query things other than a DB. Today I want to try to do something a little more applicable and business-oriented to show just how far you can push LINQ and LinqToSQL. I’m putting on my geek hat today, cowboy size…

Geocoding Basics
Mapping applications are everywhere, and having “locational awareness” is paramount when running customer-related applications. Gathering stats about a user’s IP can tell you a lot about their physical location, which in turn can tell you a lot about the market you’re serving.

The core of geocoding is setting an address or location to a point on the globe using Latitude and Longitude. This is pretty easily done using some free services out there like geocoder.us and Google (you need a free Google Maps API key to use this API). Between the two, geocoder.us is much more accurate in terms of finding addresses, but Google is about 100 times faster if you have a lot of data.

Geocoder.us is really easy to use:

        public static Coordinates GetLatAndLong(string address) {            Coordinates result = null;            //using RPC.Geocoder - this is a free service            //do the web request and get the reply            string reply = SubSonic.Sugar.Web.ReadWebPage("http://rpc.geocoder.us/service/csv?address=" + address);            //the reply is in CSV Format, like this:            //38.898748,-77.037684,1600 Pennsylvania Ave NW,Washington,DC,20502            if(!String.IsNullOrEmpty(reply)) {                try                {                    result = new Coordinates();                    string[] resultCSV = reply.Split(',');                    if (resultCSV.Length > 0)                    {                        result.Latitude = resultCSV[0];                        result.Longitude = resultCSV[1];                        result.Address = resultCSV[2];                        result.City = resultCSV[3];                        result.State = resultCSV[4];                        result.Zip = resultCSV[5];                    }                }                catch                {                }            }            return result;        }

I’m using SubSonic to grab the web page, but the code’s really simple and if you need it, leave me a comment and I’ll post it.

Preparing Your Data
So I have an old database of about 3000 US hotels here on my box that I’m going to use for a demo. They don’t have any Latitude/Longitude data – so I’ll need to add that in. The table is really basic – just name and address info. To this I’m going to add two fields of NULLABLE REAL, called Latitude and Longitude.

Next, I kick up a LinqToSql class and drag my Hotels table on. I want to make a minor adjustment to the model – I don’t like working with Sytem.Float, so instead I set the model to use System.Double for lat/long instead:

dbcontext 

Now I just need to run a loop on all the hotels in the table (where longitude is null), and then set the values accordingly:

        static void GeoCodeData()        {            //create the context            Geo.DataContext db = new Geo.DataContext();            //load up the hotels that we don't have any geo data for            var result = from hotels in db.Hotels                         where hotels.Longitude==null                         select hotels;            //counters for output            int totalHotels = result.Count();            int currentHotel = 1;            foreach (var hotel in result)            {                //our result object                Geo.Coordinates coord;                //build a usable address from the record                string fullAddress=hotel.Street+", "+hotel.City+","+hotel.State+" "+hotel.Zip;                //notify what's going on                Console.WriteLine("(" + currentHotel.ToString() + " of " + totalHotels.ToString() + " - " + fullAddress);                //look it up                coord = Geo.GeoCodeService.GetLatAndLong(fullAddress);                if (coord != null)                {                    //set the values                    hotel.Latitude = float.Parse(coord.Latitude);                    hotel.Longitude =float.Parse(coord.Longitude);                    Console.WriteLine("Found it! **************************");                    //save the data as we go - want to save each record                    //in case of trouble, like network outage or something                    db.SubmitChanges();                }                else                {                    //can't find the address                    Console.WriteLine("XXXXXXXXXXX No luck XXXXXXXXXXXXXXXX");                }                currentHotel++;            }            //save it            Console.WriteLine("Done!");            Console.ReadLine();        }

This is going to take a while since I have 3000 records. If you have a lot – use Google’s API, it’s a lot faster. I have the code that you can download here to use with Google if you like.

Assuming that all worked out, let’s GEEK OUT!

Running a Proximity Query
If you liked math in school, you’re going to love this next part. Let’s create a business case so we can write some code for our newly-geocoded data:

The application needs to locate hotels for people within a specified distance of a given location. For instance, if we’re traveling to La Mesa, CA to learn more about Jon Galaxy and Submarines, we might need a place to stay. The application should return all hotels within x miles of La Mesa.

There’s a lot of math involved with translating distances between two sets of coordinates on a sphere. The X and Y of it all gets skewed by the fact that the earth is round, and you have to run up some very in-depth trigonometry to get this right.

There is a formula you can use, and it’s called the Haversine Formula:

Now I forgot long ago how to read this notation, but a quick Google or two led me to this post on (appropriately named) “Ask Dr. Math” (note the initial “presumption” – I love math geeks):

Presuming a spherical Earth with radius R (see below), and that the
locations of the two points in spherical coordinates (longitude and
latitude) are lon1,lat1 and lon2,lat2, then the Haversine Formula
(from R. W. Sinnott, “Virtues of the Haversine,” Sky and Telescope,
vol. 68, no. 2, 1984, p. 159):

  dlon = lon2 – lon1
  dlat = lat2 – lat1
  a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
  c = 2 * atan2(sqrt(a), sqrt(1-a))
  d = R * c

[where R (the Earth's Radius)] is 360 degrees times 60 minutes/degree times 1.852 km/minute = 40003.2 km. The implied radius is the circumference divided by 2 pi: R = 6367 km = 3956 mi

Bleh. Well, after some tinkering I was able to write up a nice method that ended up being about 180 lines long – for one equation! All to calculate the distance between two points on a sphere!

I decided to try and flex Lambda expressions for what they’re good for, and was able to write the above (with some minor tweaks to avoid a multi-step function) into this (the Lambda is at the bottom):

        const Double EARTH_RADIUS_IN_MILES = 3956.0;        //helper method to make reading the lambda a bit easier        static double ToRadian(double val)        {            return val * (Math.PI / 180);        }        //helper method for converting Radians, making the lamda easier to read        static double DiffRadian(double val1, double val2)        {            return ToRadian(val2) - ToRadian(val1);        }        /// <summary>        /// Function to evaluate the distance between two points on the Earth        /// </summary>        Func<double, double, double, double, double> CalcDistance = (lat1, lon1, lat2, lon2) =>        EARTH_RADIUS_IN_MILES * 2 *        (            Math.Asin(                Math.Min(1,                    Math.Sqrt(                        (                            Math.Pow(Math.Sin((DiffRadian(lat1, lat2)) / 2.0), 2.0) +                            Math.Cos(ToRadian(lat1)) * Math.Cos(ToRadian(lat2)) *                            Math.Pow(Math.Sin((DiffRadian(lon1, lon2)) / 2.0), 2.0)                        )                   )               )           )         )        ;

This bit of code illustrates three things about Lambda expressions:

  1. Their roots. They come from computational languages and they’re all about Math, and it shows here!
  2. They are pretty damn powerful – reducing my 180 lines of code to 14 – and I could have made it one if I wanted to
  3. They’re ugly. I wouldn’t want anyone to try and understand this. At the same time – that’s precisely what made the “regular” function so long – I tried to be as clear as I could with all the math and ended up creating a monster. I spose if you’re into math, reading a Lambda expression is pretty easy – so maybe this is preferred?

Now the good news here is that you can put this into a Scalar function if you want to put it into your DB. It’s up to you and your business to determine maintainability – there is a way to do this (I’ll show you below) without stuffing logic into your DB – heck you can even port the solution to MySQL if you want!

Anyway – here’s the SQL Function (which I grabbed from here):

CREATE FUNCTION [dbo].[DistanceBetween] (@Lat1 as real,                @Long1 as real, @Lat2 as real, @Long2 as real)RETURNS realASBEGINDECLARE @dLat1InRad as float(53);SET @dLat1InRad = @Lat1 * (PI()/180.0);DECLARE @dLong1InRad as float(53);SET @dLong1InRad = @Long1 * (PI()/180.0);DECLARE @dLat2InRad as float(53);SET @dLat2InRad = @Lat2 * (PI()/180.0);DECLARE @dLong2InRad as float(53);SET @dLong2InRad = @Long2 * (PI()/180.0);DECLARE @dLongitude as float(53);SET @dLongitude = @dLong2InRad - @dLong1InRad;DECLARE @dLatitude as float(53);SET @dLatitude = @dLat2InRad - @dLat1InRad;/* Intermediate result a. */DECLARE @a as float(53);SET @a = SQUARE (SIN (@dLatitude / 2.0)) + COS (@dLat1InRad)                 * COS (@dLat2InRad)                 * SQUARE(SIN (@dLongitude / 2.0));/* Intermediate result c (great circle distance in Radians). */DECLARE @c as real;SET @c = 2.0 * ATN2 (SQRT (@a), SQRT (1.0 - @a));DECLARE @kEarthRadius as real;/* SET kEarthRadius = 3956.0 miles */SET @kEarthRadius = 6376.5;        /* kms */DECLARE @dDistance as real;SET @dDistance = @kEarthRadius * @c;return (@dDistance);END

If you visit the site with the link above, please note that the code he has there is incorrect (the C# stuff). I don’t have the math chops to know why, but I do know that I had to translate Dr. Math’s equation into a Lambda to make it work properly.

Putting It Together
OK so now we have the code – let’s use it! The first thing I can do, if my boss is a DBA and I’m in a SQL Server shop is to crank up an SP that uses the function above:

CREATE PROCEDURE FindNearestHotel(    @sourceLat real,    @sourceLong real,    @mileage int)ASBEGIN        SELECT     Name, Address, Latitude, Longitude,        dbo.DistanceBetween(            @sourceLat,            @sourceLong,            Hotels.Latitude,            Hotels.Longitude) as distance        FROM         Hotels        WHERE dbo.DistanceBetween(            @sourceLat,            @sourceLong,            Hotels.Latitude,            Hotels.Longitude)<@mileageEND

If I run this (using a test Lat/Long) it works nicely:

 

sptest

Exploding LINQ
When I wrote up that Lambda I remember giggling to myself, thinking “there’s just no way LINQ is going to know what to do with this!”. Was I right? Let’s see…

The first thing I did was to create a partial class and extend the Hotel class to have a method called “CalculateDistanceFrom()” – doing it this way will allow me to use the method as a condition in a LINQ where statement. So I added in this to my partial, referencing the Lambda function (and helpers) above:

        public double CalculateDistanceFrom(double fromLat, double fromLon)        {            return CalcDistance(this.Latitude,this.Longitude, fromLat, fromLon);        }

Next, I worked up the following LINQ statement:

 

        Geo.DataContext db = new Geo.DataContext();        var result = from hotel in db.Hotels                    where hotel.CalculateDistanceFrom(zip.Latitude, zip.Longitude) < 100                    select hotel;

It reads nice doesn’t it? And the survey says….

KABOOOM!

LINQ didn’t like this at all – but I have to applaud it for trying! The error I got was “Cannot translate this to SQL” and that makes perfect sense, since there’s some whacky gymnastics in there. Hmmm – looks like I’ve maxxed out LINQ! Or did I…

I remember seeing some great querying examples from Anders and others, so I decided to do some caching of the data, and querying the cache instead of LinqToSql (db.Hotels). This is a nice alternative because it decouples my query from my DB! The List<Hotel> could have come from any DB provider (even, *gasp*, MySQL).

To make this work, I created a static List<Hotel> to hold ALL of the hotels in the DB in memory, and I then created a “singleton” access point to get this data:

        //cached data set of hotels        static List<Geo.Data.Hotel> hotels;        //singleton instance... sort of        static List<Geo.Data.Hotel> GetHotels()        {            if (hotels == null)            {                Geo.DataContext db = new Geo.DataContext();                hotels = db.Hotels.Where(hotel => hotel.Longitude != null).ToList<Geo.Data.Hotel>();            }            return hotels;        }

Now, I can rewrite the query pretty easily:

        var result = from hotel in GetHotels()                     where hotel.CalculateDistanceFrom(zip.Latitude, zip.Longitude) < 100                     select hotel;        foreach (var h in result)        {            Console.WriteLine(h.Name);        }        Console.ReadLine();

And it worked perfectly :) , which really isn’t a surprise. What is nice about this is that, once again, we see how nice the syntax is to read, rather than looping over foreach… loops.

Summary
So once again we’re reminded that there is translation going on here between you, LINQ, and your DB. I’d say I stretched my comfort zone trying to write that Lambda up – but it was pretty dang fun – especially seeing it work!

The question here is – does it belong in code, or the DB? What do you think? It is a data operation after all… isn’t it? Or is this more business logic? You tell me!