NoSQL – A Practical Approach, Part 1

In my last post I took a look at possible approaches to using NoSQL and Reporting, and many people wanted to see what this might look like in practice. In part one, below, I’ll show you how to work with a NoSQL solution (in this case DB4O) in ways that you will find pretty familiar. I’ll also show you the freedom you can have as a developer when you stop thinking relationally.


The Problem

We’re building an application with [Favorite ORM] and so far it’s worked out pretty well, but we’ve had to make some compromises that we don’t particularly like and we’re finding that as our application matures we’re having to think way too much about how the data is being handled. Our “maintenance scaling” is not optimal. If you’re not one of these people, then you’re very lucky. I’ve written my own ORM and it still gets in my way, making me very cranky sometimes.

Step 1: Shift Your Thinking

We work in a pseudo-OO world, straddling a weird purgatory between relational and object-oriented programming. We’ve been lying to ourselves, people – we can’t “go full OO” because our ORMs won’t like it.

ORMs are high-maintenance girl/boyfriends from hell that creep their little co-dependent weirdness deep into your application. Don’t believe me? What namespaces have you had to include in your object model lately? That ISession thing – yah that lives right in your FRICKIN GLOBAL ASAX.

Buddy, you moved your application into that apartment, got your own drawer, and now you’re having to deal with a high-maintenance partner from hell, who looked all sweet and honey-pottish when you first met but now is demanding foot rubs and chocolate. You get cranky – maybe say some things you didn’t mean and then…

There your ORM sits, pouting quietly.

I know what it is. You think I’m fat. That’s the problem – you don’t love me anymore.

You want to tell the truth:

Yes, well, sweetie 11 dlls – 800,000 lines of code (and counting) which all weighs in at 4Mb – all to query a database… yah YOU’RE A PIG

… but you don’t. You’re nicer than that and so you sit quietly and console your ORM. “I don’t mind working up another IUserType sweetie – it’s no big thing. This time I’ll be sure to override *all* the methods so you don’t fail completely when a weird value comes out of the database. It’s my fault… as always…”

There is an alternative and it requires something of you – open your mind and shift into OO. Many of the questions people have about OO/DocumentDBs can be answered by leaning on what you learned from school or books.

OODBs just serialize and persist your object to disk – keep this in mind, always. They dehydrate/rehydrate your objects, and if you change a type on your object – well, the hydrator will do what it can to help out. If you add a property, its value will be null on every object you’ve already stored. If you change a property name you’ll have to tell your persistence layer somehow.

Lots of questions – so let’s do this. I want to offer as much real code as I can to get your juices flowing, so here we go…

Step 2: You’ll Need A Helper

One thing that many people love about Rails is having so much support from the command line. You can kick up the console and work directly with your model interactively. We don’t have that with .NET and instead rely on our tooling – Visual Studio – which tells us most of what we want to know.

I like working with a console and I include one in every project. It helps me to work directly with the domain – adding in data from another system, spiking things, whatever. You’ll need one if you’re going to work with an OODB and I know that right there I’ve probably lost many of you. I know I can’t talk you into it – so go ahead and take off if you like :). I think it’s your loss.

…Because things become a whole lot easier when you get the hell out of the database (where you shouldn’t be anyway). Working in two different systems at once is not only disorienting – it’s also silly. If you need to rename a property and reset its data you can do so quickly with a method. I know it’s not as fast as rename column/regen codebase – but honestly it’s incredibly fast.

But wait. Wait just a damn minute. How often have you *needed* to do this? Yes yes I know that changing property names during development is something that happens – but how often have you had data that you needed to take with it? If you say “all the time” and you’re talking development – you’re doing it wrong.

If you’re talking about a hot-swap on a live system, well OK. In the 24 years that I’ve been doing this, I’ve changed a column name on a DB precisely twice – and I paid the damn price for it. But I know that people need to do this and I’m happy to say it’s actually *easier* with an object system because you don’t have to worry about blowing up Views or other stored queries.

To do this change you:

  1. Create your new property, leaving the old one
  2. Create a console task (or use Albacore) to copy, intelligently, the old property to the new one (perhaps there are rules? Lookups?)
  3. Delete your old property and refactor your code
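
As a rough sketch of step 2, here is what such a console task might look like. Everything in it is hypothetical: the Product class is a cut-down stand-in, "Name" plays the old property and "Title" the new one, and the save callback stands in for whatever your session exposes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model caught mid-rename: Name is the old property, Title the new one.
public class Product {
    public string SKU { get; set; }
    public string Name { get; set; }   // old property, deleted in step 3
    public string Title { get; set; }  // new property, added in step 1
}

public static class RenameNameToTitle {
    // Step 2: copy the old value to the new property, applying whatever rules you need.
    // Returns how many objects were changed so the console can report it.
    public static int Run(IEnumerable<Product> products, Action<Product> save) {
        int migrated = 0;
        foreach (var p in products) {
            // skip objects that are already migrated or have nothing to copy
            if (p.Title != null || p.Name == null) continue;
            p.Title = p.Name.Trim();   // the "intelligent" part lives here
            save(p);
            migrated++;
        }
        return migrated;
    }
}
```

With a session like the one used later in this post, the call would be along the lines of RenameNameToTitle.Run(s.All<Product>(), p => s.Save(p)) followed by s.CommitChanges(). The task is safe to run twice because objects that already have a Title are skipped.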

I don’t think that it’s that difficult – but I will acknowledge that it’s a downside. I’ll take that downside for the other improvements, however.

Step 3: You’ll Need The DLLs

For the code you’re about to see I’m going to use DB4O because it’s the most current, mature, and cheapest solution. It’s not free for a commercial project, but it is for developers. If you’re using it in Open Source they have a BSD-friendly license that you can use for distribution if you like.

You can download it from here.

Step 4: Switching Your Model Over

This is probably the simplest thing you can do – believe it or not. You don’t have to add any attributes, no weird base classes. You don’t have to mark your properties as virtual or implement some kind of interface – DB4O will just take your object, serialize it to disk, and let you go on to your business.

You can (GASP) even remove parameterless constructors. How many ORMs let you do that?

Bask in that for a second. You can use factories and patterns that help you write less code and you can do it completely without thinking about what your co-dependent database thinks. It’s so freeing that it’s actually a bit scary – just let it roll for a bit – we’ll come back to this.

Here’s my model – it’s the one I’m using with Oren for Kona, our app we’re building for the NHibernate series on Tekpub. I’ve copied the project on my hard drive and haven’t touched a thing, except to:

  • Remove the NHibernate namespacing that lives on a number of the model classes
  • Remove the mapping files (there were 8 of them)
  • Remove the IUserType stuff we added for episode 3

I left everything else the same.

Step 5: Setup Unit of Work

I really like the way NHibernate (and Linq to SQL) allow you to use Unit of Work. It’s a very natural way of persisting the data – so let’s set that up. First I’ll set up the interface:
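
A minimal sketch of what that interface might look like. The shape is an assumption, inferred only from the calls that show up in the tests later in this post (Save, DeleteAll, CommitChanges, Single, All, and the using block), so treat the exact signatures as a guess.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Assumed shape: every member here is inferred from how the session is used
// in the tests further down, not copied from the real project.
public interface ISession : IDisposable {
    // queue an object (and whatever hangs off it) for storage
    void Save<T>(T item);

    // remove every stored object of type T
    void DeleteAll<T>();

    // flush the pending work to disk: this is the transaction boundary
    void CommitChanges();

    // fetch one object matching the expression
    T Single<T>(Expression<Func<T, bool>> expression);

    // everything of type T, queryable with LINQ
    IQueryable<T> All<T>();
}
```

Returning IQueryable<T> from All<T>() is what lets a typed session expose properties like Products and Orders later on without any extra plumbing.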

Next I want to set up the connection to DB4O. Normally this is done by opening up a file but that won’t work on the web (it’ll work just fine for desktop apps) so I want to make sure to use the client/server model here which DB4O implements nicely:

Notice that I’m putting the data file into App_Data – this is because it’s a protected directory by ASP.NET and also allows me to terse up the connection string so I don’t need to put in a file reference.

The thing to note here is that you want to be sure the server stays a singleton – which you can manage with your favorite IoC container if you like (it implements IObjectServer). There’s probably a better pattern here – happy to take recommendations.
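
The App_Data handling described above can be sketched on its own, without DB4O or HttpContext in the picture. This is an illustration only: the ObjectStorePath class and the "kona.yap" file name are made up, and the caller is assumed to pass in the already-mapped App_Data folder.

```csharp
using System;
using System.IO;

public static class ObjectStorePath {
    const string Token = "|DataDirectory|";

    // Turns a terse connection string such as "|DataDirectory|kona.yap" into a full
    // file path. appDataDir is supplied by the caller (in a web app, the mapped
    // App_Data folder), which keeps this helper easy to test.
    public static string Resolve(string connectionString, string appDataDir) {
        if (connectionString == null) throw new ArgumentNullException("connectionString");

        // no token: treat it as a hard file reference and leave it alone
        if (!connectionString.Contains(Token)) return connectionString;

        // strip the token plus any leading slash, otherwise Path.Combine
        // would treat the file name as rooted and ignore appDataDir
        string fileName = connectionString.Replace(Token, "").TrimStart('\\', '/');
        return Path.Combine(appDataDir, fileName);
    }
}
```

Keeping the path logic separate from the server singleton means a test project can pass a plain file path while the web app passes the token form.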

Finally – let’s create our Unit of Work, which will implement our interface above:
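
To show the Unit of Work shape without pulling in the DB4O DLLs, here is an in-memory stand-in. It is not the DB4O-backed class: the ISession shape is inferred from the tests in this post, InMemorySession is a made-up name, it does not peel nested objects apart, and unlike DB4O it hides pending work from queries until you commit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Assumed interface shape, inferred from the calls used in the tests in this post.
public interface ISession : IDisposable {
    void Save<T>(T item);
    void DeleteAll<T>();
    void CommitChanges();
    T Single<T>(Expression<Func<T, bool>> expression);
    IQueryable<T> All<T>();
}

// In-memory stand-in: work is queued up and only applied on CommitChanges().
public class InMemorySession : ISession {
    readonly List<object> _committed = new List<object>();
    readonly List<Action<List<object>>> _pending = new List<Action<List<object>>>();

    public void Save<T>(T item) {
        // Contains() uses Equals(), so an equal object is not stored twice
        _pending.Add(store => { if (!store.Contains(item)) store.Add(item); });
    }

    public void DeleteAll<T>() {
        _pending.Add(store => store.RemoveAll(o => o is T));
    }

    public void CommitChanges() {
        // apply the queued operations in the order they were requested
        foreach (var op in _pending) op(_committed);
        _pending.Clear();
    }

    // throw away anything queued since the last commit
    public void Rollback() { _pending.Clear(); }

    public IQueryable<T> All<T>() {
        return _committed.OfType<T>().AsQueryable();
    }

    // returns the default value (null for classes) when nothing matches; an assumption
    public T Single<T>(Expression<Func<T, bool>> expression) {
        return All<T>().SingleOrDefault(expression);
    }

    public void Dispose() { _pending.Clear(); }
}
```

A class like this is also handy as a test double: anything written against ISession can run against it without a data file.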

Step 6: Taking it For a Spin

There’s more we’re going to do here – but for those of you feeling antsy, crank up a test project and add an App.config with a hard file reference for your “ObjectStore” connection string (you can’t use HttpContext in a test project unless you mock it – just hard code a file ref, it’s easier. Call it whatever you want but the extension should be “.yap”).

Here’s some code that will work, assuming you have a Category and Product object:

    [Fact]
    public void DB_Should_Save_Multiple_Nested_Objects() {
        using (var s = SessionFactory.Current) {
            s.DeleteAll<Category>();
            s.DeleteAll<Product>();

            var c = new Category("test1");
            var c2 = new Category("test2");

            var p = new Product();
            p.SKU = "1234";
            p.Name = "Jurge";
            p.Categories.Add(c);
            p.Categories.Add(c2);

            s.Save(p);
            s.CommitChanges();

            var cat = s.Single<Category>(x => x.Name == "test2");
            Assert.NotNull(cat);
            Assert.Equal(2, s.All<Category>().Count());
            Assert.Equal(1, s.All<Product>().Count());
        }
    }

Something to notice here – when I save the Product object (with a child IEnumerable<Category>) the objects are peeled apart and joined by pointers. Take that in for a moment: DB4O knows that these are complex objects and will separate them for you so you can query them individually (as opposed to a document db, which just embeds the objects).

Something else to notice is that to store these things you need to call “CommitChanges()”, which commits the files to disk. If you don’t do this the objects will still be “stored” – but only in RAM. This makes DB4O an ideal testing mechanism – you don’t have to mock out your repository (if you don’t want to) and you’re also working directly with the storage medium so you’ll know if there are any weirdnesses.

One question that comes up is:

How does DB4O know to add a new object, or update an old one?

This is a great question because the answer is so simple that many times it’s a cold splash of water to shake you out of your Relational Malaise. The answer is “Equals() and GetHashCode()” – DB4O just checks an internal index to see if there is an existing hash for the object you pass in – reset GetHashCode to return the hash of your identifier and you’re all set.

In my case I overrode all of that (like you should anyway in an OO world) and set my Product.Equals()/GetHashCode() to evaluate the SKU.
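
A minimal sketch of that override, using a cut-down Product with only the two properties needed here (the real class also carries Categories and more). The ordinal comparison and the null handling are my assumptions, not something lifted from the project.

```csharp
using System;

// Cut-down stand-in for the Product model: identity is the SKU and nothing else.
public class Product {
    public string SKU { get; set; }
    public string Name { get; set; }

    public override bool Equals(object obj) {
        var other = obj as Product;
        // two products are the same product when their SKUs match exactly
        return other != null && string.Equals(SKU, other.SKU, StringComparison.Ordinal);
    }

    public override int GetHashCode() {
        // must agree with Equals: equal SKUs always give equal hashes
        return SKU == null ? 0 : SKU.GetHashCode();
    }
}
```

One thing to watch with this approach: because the hash comes from a settable property, changing the SKU of an object that is already sitting in a hash-based collection will make it hard to find again, so treat the identifier as fixed once it is assigned.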

Finally – for those wondering about transactions – that’s exactly what CommitChanges() does (which calls the DB4O Commit() method). You can Rollback() after Save() if you like – it’s all very intuitive and there’s a lot more info on their website including a 200 page PDF with all kinds of great examples.

Step 7: Making it Pretty

This all seems pretty good – but it would be fun to get past the method noise with all the generic typing – like Linq to SQL and SubSonic do with built-in properties for objects.

Great! Let’s do it:

    public class KonaSession : Db4oSession {
        internal KonaSession(IObjectServer server) : base(server) { }

        public IQueryable<Product> Products {
            get {
                return All<Product>();
            }
        }

        public IQueryable<Order> Orders {
            get {
                return All<Order>();
            }
        }

        public IQueryable<Customer> Customers {
            get {
                return All<Customer>();
            }
        }
    }

To make this work I had to change the SessionFactory to return KonaSession instead of Db4oSession (2 lines of code – once again the focus is on OO, not relational weirdness):

    using System.IO;
    using System.Web;
    using Db4objects.Db4o;

    public class SessionFactory {
        static ISession _current;

        //this needs to stay static - can't have more than
        //one server on the file
        static IObjectServer _server;

        public static ISession CreateSession() {
            if (_server == null) {
                string _dbPath = System.Configuration.ConfigurationManager
                    .ConnectionStrings["ObjectStore"].ConnectionString;

                //check to see if this is pointing to data directory
                //change as you need btw
                if (_dbPath.Contains("|DataDirectory|")) {
                    //we know, then, that this is a web project
                    //and HttpContext is hopefully not null...
                    _dbPath = _dbPath.Replace("|DataDirectory|", "");
                    string appDir = HttpContext.Current.Server.MapPath("~/App_Data/");
                    _dbPath = Path.Combine(appDir, _dbPath);
                }

                //port 0 runs the server in-process (no network port)
                _server = Db4oFactory.OpenServer(_dbPath, 0);
            }
            return new KonaSession(_server);
        }

        public static KonaSession Current {
            get {
                if (_current == null)
                    _current = CreateSession();
                return (KonaSession)_current;
            }
        }
    }
 

Now you can query happily in the way you’ve been used to:

    [Fact]
    public void I_Dont_Want_To_Have_My_Cheese_Moved_Too_Far_Away() {
        using (var db = SessionFactory.Current) {
            var prods = db.Products;
            Assert.Equal(1, prods.Count());
        }
    }

 

Summary

All of a sudden you begin to feel a ton of freedom that’s afforded by staying close to OO principles. In this case I can go further and seal up the DB4O-specific stuff – which will force people to only query through the KonaSession class above. This is nice because, ho ho ho, these are your AggregateRoots and you really only want people to access/work with the child objects (such as Category) through the parent objects – just like DDD says you should.

In Part 2 I’ll do a deeper dive on ways to approach reporting.