Monday, November 03, 2008 -
Let's get this out of the way: I know you're going to think I'm nuts as you read this. You may "pfft" to what you're about to read - know that I know you're "pfft"-ing me. All I ask is that you consider what I'm about to suggest...
Development Friction
The term "friction" gets thrown around a lot in terms of development. It's what it sounds like: something you do or a process you undertake that slows you down as you crank out code. If you think on this for a second - when you're building an application what's the number one thing that slows you down (technically speaking. Running out of Red Bull doesn't count)?
For me it's the database "stuff". It's why I made SubSonic - I was tired of thinking about it and I wanted something faster and easier.
Tangent
This whole post (and thing I'm about to dive into) was dreamt up during a bike ride to the store. I thought a fun post idea was to "send a post back in time" and entitle it "Greetings from the Year 2012", and I would laugh at all the silly stuff I did in 2008. The top of that list, for me, is the continuing struggle we have with persisting data "properly". After 10+ years, the marriage of web and database is still arguing over the same old stuff. Can't we move on? Shouldn't this be easier by now?
You might have other sources of friction, or you might be saying "dude [MY ORM] roolz! LOLZ at U". I'll bet it does - but you still need to work with it (and your database) as you build out your site. Even if it takes you 1 minute to "update and regen" - you still have to mess with the DB and mess with mapping (if you do that).
I've talked with a lot of people over the last few weeks about this and asked them the same question:
In 5 years what do you think will finally be changed?
The answer, every single time, is a variation on "ORM's will finally work properly". What if I told you that you don't need to wait for 5 years for this? What if I told you to ditch your ORM and your database and focus on what's the most important thing: your application?
How Do You Do What You Do?
There are generally two camps of developers: those who work from the database and those who work from tests.
If you're a TDD fan (or want to be) you might be interested in Domain-Driven Design (DDD). Yes, it's another one of those buzzwords, but you might actually be a DDD person right now and not even know it. Check it out:
"What it's all about is creating as simple a model as possible, one that still captures what's important for the domain of the application. During development the process could really be described as knowledge-crunching by the developers and domain experts together. The knowledge that is gained is put into the model."
(Nilsson, Applying Domain-driven Design and Patterns)
Yes, I'm quoting a DDD book. I've been absorbed :). The point here is that most of us have always worked this way - working closely with clients to understand their business, and making sure what we build is focused.
The split comes in with what system you work from: the database or the tests? Which is more appropriate in terms of building out an application domain? I'm hoping to convince you, today, to toss your database as you don't need it. Not yet anyway. Focus on your tests and your domain and I think you'll see that you can move a lot faster.
Database YAGNI (or the Database as a Feature)
YAGNI is the principle of "You Ain't Gonna Need It", which essentially means "don't add it unless requirements/testing make you add it" and is one of the main drivers of TDD. It keeps code light and manageable, and keeps cruft out of your application.
I think this should apply to architectures as well - why implement, design, and build a database (with data access code) when the application doesn't need it? And yes, this is where the Crazy Talk kicks in.
<CrazyTalk>
Databases are pretty heavy information organization and retrieval systems. Even the lighter-weight ones are capable of powering some pretty high-level querying at very rapid speeds.
Do you need this when you're developing? Do you need to play "ORM-Catch up"? Probably not.
What if, in 10 years, the platform could just translate your model for you and save it properly without you thinking about it?
What if I told you that you can do that now? Well you can - and this is where you're gonna say "PFFT" and tell me I'm crazy. But as I said - I know you're going to say it... so just pretend I can hear you...
</CrazyTalk>
Off The Reservation with OODBs
Object Databases have been around for a long, long time. If you've never heard of these things, well they basically crumple your object up into binary (by serialization) and save it to disk for you to access later. If you'd like to read more, this is a great post on OODBs, what they are, and how they work. To summarize:
An OODBMS is the result of combining object oriented programming principles with database management principles. Object oriented programming concepts such as encapsulation, polymorphism and inheritance are enforced as well as database management concepts such as the ACID properties (Atomicity, Consistency, Isolation and Durability) which lead to system integrity, support for an ad hoc query language and secondary storage management systems which allow for managing very large amounts of data. The Object Oriented Database Manifesto [Atk 89] specifically lists the following features as mandatory for a system to support before it can be called an OODBMS; Complex objects, Object identity, Encapsulation, Types and Classes, Class or Type Hierarchies, Overriding, overloading and late binding, Computational completeness, Extensibility, Persistence, Secondary storage management, Concurrency, Recovery and an Ad Hoc Query Facility.
Here's my thought for you: What if you used an OODB for development ONLY and implemented SQL Server later, when you know what you need to create.
You can't ditch your RDBMS entirely - never. However I will suggest that working on two systems at once, when you don't need to, is silly. You can do a lot more in terms of RAD development right now - and port to SQL later.
Doesn't it make more sense to create the database when, and only when, you need it? Hold that thought - I'm coming back to it.
DB4O - A Free OSS Object Database
I've been using DB4O a lot over the last few months and I really like it. The tutorials are very easy and getting up to speed is no problem at all. First, however, I'd like to go over some things you might be wondering about.
Let's say I have three objects in my model:
    public class Product
    {
        public string Name { get; set; }
        public Supplier Supplier { get; set; }
        public IList<Review> Reviews { get; set; }
    }

    public class Review
    {
        public string AuthorEmail { get; set; }
        public string Body { get; set; }
    }

    public class Supplier
    {
        public string Name { get; set; }
    }
You might be wondering about how these objects are persisted, and how integrity might be enforced. It's actually required that they enforce many of the same concepts as an RDBMS so that you don't make a mess out of your model storage.
For instance, if I create a Supplier for a Product, it will be stored as an independent Supplier that I can then assign to another Product. If I change that Supplier's name, it will get changed for both. It's a single object, and the OODB works with the idea of "pointers" in the same way that a regular database does. The thing here, however, is that there is no joining - the relationships are implicit and understood. In this way, OODBs can (and often do) outperform their RDBMS counterparts.
But I'm not here to talk about perf and scaling - it doesn't matter for what I'm suggesting :).
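The shared-Supplier behavior described above can be sketched in plain C#, without DB4O in the picture at all. This is only an in-memory illustration of object identity; the `SharedReferenceDemo` class and its `Build` method are names made up for this sketch, and the claim that DB4O preserves the same identity on disk comes from the text above, not from this code.

```csharp
using System;
using System.Collections.Generic;

public class Supplier
{
    public string Name { get; set; }
}

public class Product
{
    public string Name { get; set; }
    public Supplier Supplier { get; set; }
}

// Hypothetical helper, used only for this illustration.
public static class SharedReferenceDemo
{
    // Builds two products that point at the same Supplier instance,
    // renames that supplier once, and returns both products.
    public static List<Product> Build()
    {
        var supplier = new Supplier { Name = "Test supplier" };
        var first = new Product { Name = "first", Supplier = supplier };
        var second = new Product { Name = "second", Supplier = supplier };

        // One change, made through the single shared object.
        supplier.Name = "Renamed supplier";

        return new List<Product> { first, second };
    }

    public static void Main()
    {
        foreach (var product in Build())
        {
            Console.WriteLine(product.Name + " -> " + product.Supplier.Name);
        }
    }
}
```

Both products report the new supplier name because there is only one Supplier object; no join or key lookup is involved.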
If you're curious and want to play along at home, go and download DB4O from their website and install it. To use it you have to make a few references in your application (specifically Db4objects.Db4o.dll). You then need to write some wrapper code - but I've got that covered for you:
    using Db4objects.Db4o;
    using System.Web;
    using System.IO;

    public class DB4O
    {
        // guards creation of the shared container
        static readonly object padlock = new object();

        // static object container variable
        static IObjectContainer _db = null;

        static string _dbPath =
            System.Configuration.ConfigurationManager.ConnectionStrings["ObjectStore"].ConnectionString;

        public static string DBPath
        {
            get { return _dbPath; }
            set { _dbPath = value; }
        }

        public static IObjectContainer Container
        {
            get
            {
                lock (padlock)
                {
                    if (_db == null)
                    {
                        // check to see if this is pointing to data directory
                        // change as you need btw
                        if (_dbPath.Contains("|DataDirectory|"))
                        {
                            // we know, then, that this is a web project
                            // and HttpContext is hopefully not null...
                            _dbPath = _dbPath.Replace("|DataDirectory|", "");
                            string appDir = HttpContext.Current.Server.MapPath("~/App_Data/");
                            _dbPath = Path.Combine(appDir, _dbPath);
                        }
                        _db = Db4oFactory.OpenFile(_dbPath);
                    }
                    return _db;
                }
            }
        }

        public static void CloseContainer()
        {
            if (_db != null)
            {
                _db.Close();
            }
            _db = null;
        }
    }
This code assumes that you have an App_Data directory and a connection string in your Web.config like this one:
<add name="ObjectStore" connectionString="|DataDirectory|ObjectStore.yap"/>
Yes, that's a singleton you see there, and yes I know it's probably making you cringe. The reason we need to use a singleton is that the IObjectContainer locks the binary file where the data is kept. File locking and singletons might not sit well with you right now, but the way I have this set up is for a single user - ME - because I'm developing against it. If this were a live app I would be able to set a bunch of settings to make this thread-safe etc.
But it's not production - it's development only (have I mentioned this yet?) so you don't need to worry about singletons, perf, and scaling.
Great, so now that we have our container, let's store an object. This isn't really a test, but you've read a lot more crazy stuff so far, so this won't surprise you much. Consider this a spike please - or maybe pretend I'm asserting something:
    [TestMethod]
    public void ObjectRepo_Should_Store_Product()
    {
        Product p = new Product();
        p.Name = "test product";
        // Supplier has no constructor taking a name, so use an initializer
        p.Supplier = new Supplier { Name = "Test supplier" };
        DB4O.Container.Store(p);
    }
Yes, it's that easy. But that's not the best part. I can add a DLL to the project that DB4O just released, called "Db4objects.Db4o.Linq", and it does what you might imagine, which is extremely cool:
    [TestMethod]
    public void ObjectRepo_Should_Return_Product()
    {
        var result = from Product p in DB4O.Container
                     where p.Name == "test product"
                     select p;
        Assert.AreEqual(1, result.Count());
    }
Yes, that would be LINQ, working with an OODB container. We can also query a bit deeper, with no problems:
    [TestMethod]
    public void ObjectRepo_Should_Return_Product_By_Supplier()
    {
        var result = from Product p in DB4O.Container
                     where p.Supplier.Name == "Test supplier"
                     select p;
        Assert.AreEqual(1, result.Count());
    }
And to illustrate my point above about object independence and integrity, I can also get the Supplier, independent of the Product:
    [TestMethod]
    public void ObjectRepo_Should_Return_Supplier()
    {
        var result = from Supplier s in DB4O.Container
                     select s;
        Assert.AreEqual(1, result.Count());
    }
This, literally, is the tip of the iceberg. DB4O has support for transactions, indexing, and many different ways of querying to improve performance and usage. But I'm not talking about perf here :) - it doesn't matter, this is only for development.
Also it's worth noting that if I change the properties of Product around, it won't break. The changed property just won't get loaded - but everything else will. So you can change and alter as required and nothing breaks!
Real-World Application
What I'm suggesting is that you can create an IRepository<T> and then implement a nice ObjectRepository<T> to work with in your application. It's very simple - and yes, here's some more code for you:
    using System;
    using System.Collections;
    using System.Linq;
    using System.Linq.Expressions;

    public interface IRepository<T>
    {
        IQueryable<T> GetAll();
        PagedList<T> GetPaged(int pageIndex, int pageSize);
        IQueryable<T> Find(Expression<Func<T, bool>> expression);
        void Save(T item);
        void Delete(T item);
    }
Now, you can implement this quite nicely with DB4O:
    using System;
    using System.Collections;
    using System.Linq;
    using Db4objects.Db4o.Linq;

    public class ObjectRepository<T> : IRepository<T> where T : class
    {
        /// <summary>
        /// Returns all T records in the repository
        /// </summary>
        public IQueryable<T> GetAll()
        {
            return (from T items in DB4O.Container
                    select items).AsQueryable();
        }

        /// <summary>
        /// Returns a PagedList of items
        /// </summary>
        /// <param name="pageIndex">zero-based index to be used for lookup</param>
        /// <param name="pageSize">the size of the paged items</param>
        /// <returns></returns>
        public PagedList<T> GetPaged(int pageIndex, int pageSize)
        {
            var query = (from T items in DB4O.Container
                         select items).AsQueryable();
            return new PagedList<T>(query, pageIndex, pageSize);
        }

        /// <summary>
        /// Finds an item using a passed-in expression lambda
        /// </summary>
        public IQueryable<T> Find(System.Linq.Expressions.Expression<Func<T, bool>> expression)
        {
            return GetAll().Where(expression);
        }

        /// <summary>
        /// Saves an item to the database
        /// </summary>
        /// <param name="item"></param>
        public void Save(T item)
        {
            DB4O.Container.Store(item);
        }

        /// <summary>
        /// Deletes an item from the database
        /// </summary>
        /// <param name="item"></param>
        public void Delete(T item)
        {
            DB4O.Container.Delete(item);
        }
    }
If you're wondering what a "PagedList" is - you can find out more here.
ORM Selection Made Easy
Suppose you didn't need to worry about your database as you build your application. Better yet, suppose you didn't need to worry about your ORM! This latter thought is actually a critical, critical item. When I suggested that you could wait until you're about to launch to actually implement a database, a friend of mine asked me "well if you do that, you might back yourself into such a complex model that your ORM won't handle it. What do you do then?".
And I said "BINGO".
Put another way, what you're getting by not worrying about your ORM (until you need it) is the freedom to develop your app without influence from your database. It is true that you can make a model that's too complex for your favorite ORM - but doesn't that mean your favorite ORM probably wasn't up to the task anyway? Isn't it much nicer to find that out the easy way?
In most cases you can probably just jump over to SQL very simply, just by replacing the reference to ObjectRepository with something like SqlRepository (this code is using LINQ to SQL - but you can change this out with EF in the future - I'll try to update this later):
    using System;
    using System.Collections;
    using System.Data.Linq;
    using System.Linq;

    public class SqlRepository<T> : IRepository<T> where T : class
    {
        NorthwindDB.DB _db = null;

        public SqlRepository()
        {
            _db = new NorthwindDB.DB();
        }

        /// <summary>
        /// Gets the table provided by the type T and returns for querying
        /// </summary>
        private Table<T> Table
        {
            get { return _db.GetTable<T>(); }
        }

        /// <summary>
        /// Returns all T records in the repository
        /// </summary>
        public IQueryable<T> GetAll()
        {
            return Table;
        }

        /// <summary>
        /// Returns a PagedList of items
        /// </summary>
        /// <param name="pageIndex">zero-based index to be used for lookup</param>
        /// <param name="pageSize">the size of the paged items</param>
        /// <returns></returns>
        public PagedList<T> GetPaged(int pageIndex, int pageSize)
        {
            return new PagedList<T>(Table, pageIndex, pageSize);
        }

        /// <summary>
        /// Finds an item using a passed-in expression lambda
        /// </summary>
        public IQueryable<T> Find(System.Linq.Expressions.Expression<Func<T, bool>> expression)
        {
            return Table.Where(expression);
        }

        /// <summary>
        /// Saves an item to the database
        /// </summary>
        /// <param name="item"></param>
        public void Save(T item)
        {
            if (!Table.Contains(item))
            {
                Table.InsertOnSubmit(item);
            }
            _db.SubmitChanges();
        }

        /// <summary>
        /// Deletes an item from the database
        /// </summary>
        /// <param name="item"></param>
        public void Delete(T item)
        {
            Table.DeleteOnSubmit(item);
            _db.SubmitChanges();
        }
    }
Since we're wrapping everything in IRepository<T>, you can swap parts as needed. I'll bet you were wondering why you might want to develop this way (I know I was a long time ago - who really ever swaps components anyway?) - but this is a good example of why you might want to decouple your system as much as possible.
Swapping out data stores like this is equivalent to turning your minivan into a Ferrari if you want to drive the Autobahn, and back again when you need to get the kids from soccer.
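To make the swap concrete, here is a minimal sketch of application code that depends only on a repository interface, so the store behind it can change without touching the caller. Several things here are assumptions made for the sketch and are not from the code above: `ISimpleRepository<T>` is a trimmed-down stand-in for IRepository<T> (GetPaged is left out so there is no dependency on PagedList<T>), and `InMemoryRepository<T>` and `CatalogService` are hypothetical names. A plain list stands in for DB4O or SQL Server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical, trimmed-down version of the post's IRepository<T>
// (GetPaged is omitted so the sketch does not need PagedList<T>).
public interface ISimpleRepository<T>
{
    IQueryable<T> GetAll();
    IQueryable<T> Find(Expression<Func<T, bool>> expression);
    void Save(T item);
    void Delete(T item);
}

// In-memory stand-in for ObjectRepository<T> or SqlRepository<T>.
public class InMemoryRepository<T> : ISimpleRepository<T> where T : class
{
    private readonly List<T> _items = new List<T>();

    public IQueryable<T> GetAll()
    {
        return _items.AsQueryable();
    }

    public IQueryable<T> Find(Expression<Func<T, bool>> expression)
    {
        return GetAll().Where(expression);
    }

    public void Save(T item)
    {
        // Keep each instance once; saving it again changes nothing here.
        if (!_items.Contains(item))
        {
            _items.Add(item);
        }
    }

    public void Delete(T item)
    {
        _items.Remove(item);
    }
}

public class Product
{
    public string Name { get; set; }
}

// Application code sees only the interface, never the concrete store.
public class CatalogService
{
    private readonly ISimpleRepository<Product> _products;

    public CatalogService(ISimpleRepository<Product> products)
    {
        _products = products;
    }

    public void AddProduct(string name)
    {
        _products.Save(new Product { Name = name });
    }

    public int CountByName(string name)
    {
        return _products.Find(p => p.Name == name).Count();
    }
}

public static class Program
{
    public static void Main()
    {
        // Swapping stores means changing only this constructor argument.
        var service = new CatalogService(new InMemoryRepository<Product>());
        service.AddProduct("test product");
        Console.WriteLine(service.CountByName("test product"));
    }
}
```

The only line that knows which store is in use is the one that constructs the repository, which is the property the minivan-to-Ferrari swap relies on.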
Am I nuts? If you think I am - please give me some details as I think this would make for an interesting discussion. Just don't tell the Alt.NET guys yet :).
As for the idea. I like the fact that it reduces friction up-front.
I don't like the fact that it creates a debt to be paid off downstream. That is, when the time comes to switch to RDBMS. Or, if you try to make repositories interchangeable to ease that pain, it then feels like you're *adding* friction up-front, just to prevent having to do more work later.
I feel like the happy ending here would simply be not to spend time switching to an RDBMS. Why not just use db4o in production? If someone wants an RDBMS in the picture, write ETL scripts to export to a warehouse :) Their dRS replication system seems designed for just that task.
I recently began a project at work and am doing my best to stick to TDD as inspired by your MVC Storefront stuff. However, I keep finding myself getting pulled back to the database. At the very least what you're proposing here would remove the temptation of the database, at least initially, and allow me to focus more strictly on tests, implementing the model and the UI until I have something that is ready.
Not sure I'm being very clear (English is not my first language). In other words, I would like to write a ProductRepository that takes an injected SqlRepository, but the generics prevent me from doing it. In that case, I don't see what this new method changes compared to what you do in the Storefront, where you also have nice interfaces (like ICatalogRepository) that already decouple your code from the persistence layer.
Thanks
What are your thoughts on using a double mapping model...where you make simple mappings from your database to your O/RM and then map from that to your Domain Model? In this case your Repository could be over the entity model and it return Domain Model objects.
One big problem is that for most applications, trying to change OODB to RDBMS would not work without a LOT of work.
Ayende, you make a nice broad negative statement there, without anything to back it up. Why does moving from OODB to RDBMS necessarily involve any more work than designing the schema and mapping it to your object model, which is the exact same work you would do if you started with the RDBMS?
[)amien
Rob, can you demonstrate how to implement IRepository
Still, most businesses I work with are fanatically committed to relational databases. Looks like they'll still be the default in 10 years. There's something about databases that inspires people to be conservative and to accept change only on a geological timescale.
Lemme work on this a minute - I have some ideas...
What's the level of maturity of such project?
Why do you keep saying not to think about performance/scaling? What are the issues with perf/scaling when using OODBs?
Could this be used in a RL scenario? For example StoreFront, would it be up to the task?
the only thing is that you don't get the benefit of testing the rdbms as you develop.
Thanks for sharing your thoughts.
Like you said, it sounds crazy but some of us just plain don't care about storage whatsoever. Users don't care about storage and as a result neither do I.
By the way, this is a particularly great point of view from someone who knows so much about ORM. Keep pushing this.
Also this brings up another point...what if you have to support an existing DB schema? If you've been working "untethered" with your own data model that makes sense just within your app's domain, you may have a LOT of work ahead of you to re-work your data model to perform well with your legacy database.
Version one of the system I am working on will be taking db4o into production, assuming it performance tests well. We are keeping the repository implementation even simpler than yours (only fetching aggregates by identifier) and using Lucene for complicated queries.
It's also a detail of the repository implementation, so it definitely doesn't need to be addressed if Rob is using db4o as a prototyping tool.
I think a good example of this might be to see how hard it would be to apply IRepository
The fact of the matter is that OODBs actually hold the record for the largest, fastest database in history. They have very distinct advantages for simple data retrieval (no joins is one of em). There are issues - like any system. But mostly OODBs are more than up for the task.
In terms of maturity - DB4O is on version 7 and has been around 8 years. Other products (like Matisse and Versant) have been around longer and are big time industrial scale.
Matisse, for example, comes with a query manager that lets you write SQL :).
"Visual Studio Syndrome" is when people bang out a project quickly in visual studio, and then find it takes a few months to get the project working in a production environment. It's encouraged by a number of Microsoft tools that let you hit "F5" to run your project in a quirky development environment that has only a superficial resemblance to the environment that your product will run in.
Effective developers work in an environment that is as close to identical as the production environment as possible. It ought to be possible to build a copy of the development system for a new developer by following a checklist, and to build a new production system and move the data to it quickly. It ought to be possible to build a staging server that has a complete copy (or a big chunk) of the production system data so you can acceptance test changes before putting them into production.
You also need to consider the long-term evolution of the production system. There's going to be a day when you need to add a few new columns or a few tables to your database. It may be a few days, a few months or a few years after it goes into production, but it will happen if the system is successful. Your approach doesn't address the data migration issues, but sweeps them under the rug -- rather than solving the "two artifact" problem which is so deadly to ORM, it intensifies it.
I honestly don't see what the tooling has to do with anything here. What I'm discussing is a development pattern that relates to architecture of the application. I'm trying to encourage TDD/DDD (as opposed to data-first, which is usually VS/RAD's focus).
>>Effective developers work in an environment that is as close to identical as the production environment as possible
Sure - I agree. Code has nothing to do with instrumentation does it?
>>>It ought to be possible to build a staging server that has a complete copy (or a big chunk) of the production system data so you can acceptance test changes before putting them into production
Agree. Are you assuming that you can't have test data in the OODB? If so, why not?
>>>There's going to be a day when you need to add a few new columns or a few tables to your database. It may be a few days, a few months or a few years after it goes into production, but it will happen if the system is successful. Your approach doesn't address the data migration issues, but sweeps them under the rug
How so? If you use an OODB to get yourself to Production, let's say, then you load your data into a relational structure in an RDBMS then won't you be at the exact same point? What's swept under the rug? You lost me here.
As a developer I don't care about storage, why should I, it should just work. How hard can it be?
I would really like to see more discussion about this, keep up the good work.
Player Table
PlayerId non-null
Name non-null
CurrentGameId non-null
Game
GameId non-null
If I deleted a Game it would blow up deleting the player because the only way it seems to work is by setting CurrentGameId = null first and then doing a delete. This of course blows up.
When I first started to deal with Domain Model, I always wanted to jump right in and crank out the schema so that I could have a solid 'foundation' to work off of. After a few projects, it became rather apparent that if you build out a schema first - you are tying your hands and really limiting your flexibility. You become bound to a schema that you designed before you even started coding!
On the other hand, if you ditch the schema until the very end, you are free to run rampant breaking and changing stuff as you please, which is more of my style anyways. The only downside I've encountered with postponing the schema until the end was having persistent data, and you just cured that, THANKS!
Quick side question: what blogging platform is this running on (assuming that it's not custom built)?
Thanks,
Chance
During development, managing these things is painful, especially if each developer is developing against a local copy of the database. You can even end up with two incompatible change scripts to the database.
My only concern with the approach that you outlined is that it doesn't immediately allow for including business rules in the database which is (according to some) where these types of things belong (for a variety of reasons that go beyond this comment).
On the whole, very sane and thought out argument. Even as a database developer I found myself agreeing with you by the end. It's important to have a stable product before attempting to create a well thought out and properly optimized database.
In the meantime, I'm still having trouble getting my head out of the RDBMS. Complex joins always tend to break down my model. For example - say given the above you want to get the Suppliers who have products that were reviewed by a given email:
IRepository
var suppliers = rep.Find((s) => ???);
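One possible shape for the query asked about above, sketched against plain in-memory collections. Because Supplier in the post's model has no collection of Products, this sketch starts from products and walks to their suppliers rather than starting from a Supplier repository. `SupplierQueries.ReviewedBy` is a hypothetical helper, and nothing here says how DB4O's LINQ provider or any ORM would execute the same expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Supplier
{
    public string Name { get; set; }
}

public class Review
{
    public string AuthorEmail { get; set; }
    public string Body { get; set; }
}

public class Product
{
    public string Name { get; set; }
    public Supplier Supplier { get; set; }
    public IList<Review> Reviews { get; set; }
}

// Hypothetical helper, used only for this illustration.
public static class SupplierQueries
{
    // Returns each supplier that has at least one product
    // reviewed by the given email address.
    public static List<Supplier> ReviewedBy(IEnumerable<Product> products, string email)
    {
        return products
            .Where(p => p.Reviews != null
                        && p.Reviews.Any(r => r.AuthorEmail == email))
            .Select(p => p.Supplier)
            .Distinct() // reference equality: one entry per Supplier instance
            .ToList();
    }

    public static void Main()
    {
        var acme = new Supplier { Name = "Acme" };
        var other = new Supplier { Name = "Other" };
        var products = new List<Product>
        {
            new Product
            {
                Name = "widget",
                Supplier = acme,
                Reviews = new List<Review> { new Review { AuthorEmail = "a@example.com" } }
            },
            new Product
            {
                Name = "gadget",
                Supplier = other,
                Reviews = new List<Review> { new Review { AuthorEmail = "b@example.com" } }
            }
        };

        foreach (var supplier in ReviewedBy(products, "a@example.com"))
        {
            Console.WriteLine(supplier.Name);
        }
    }
}
```

The navigation through `p.Reviews` and `p.Supplier` replaces what would be two joins in SQL.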
But that doesn't matter during development :).
I took a short look at OODBMSs about 10 years ago. At that time they weren't really mature enough for larger applications and "real world" use. Has anyone looked at systems other than db4o lately?
The IRepository interface can serve a large part of your persistence needs but as they say there will be many scenarios in a large system in terms of both domain complexity and volume of data. Two examples of queries that aren't a good fit for Linq or any other ORM are SQL that uses case expressions or nested table expressions or relational OLAP constructs, but such queries (at least in my experience) are a small part of the overall code base and can be isolated in your persistence interface. Perhaps the IVeryNastyQueriesRepository ;o) That way the databasephobics can ignore that part of the system or refer the problem to the UserInterfacephobics.
Ignoring the database implementation until the very end - again in a large system - is like ignoring any other key piece of integration your system may require. It poses a significant risk and as such should be addressed as early in the project as possible. There are a number of database tools out there that work well with the process of refactoring your application - http://www.liquibase.org/ is currently my favourite tool for this. Such a tool is easy enough to incorporate into your build process so that rather than following a set of instructions a developer can check out the code, run the build script and be up and running with everything they need including test data.
Being blond, I got lost in a few spots towards the end .. so I'll need to read it a dozen more times. That said, I really love the idea. Right now, I'm basing my current projects off your StoreFront code. So for me, I've got TestRepositories which have hard coded stuff. My latest project has various 'pages' and is all working funky-dory .. and I haven't even created (nor thought about) generating my SqlRepository just yet. That's the _very very last_ thing on my mind and scheduled in.
Having to NOT WORRY about a Sql repository is a blessing because we can worry about getting the application RIGHT. Persistence is a requirement, but it's so not a worry to initially implement, that it can be dealt with at the end of the development cycle IMO. Getting the business logic right and communicating to the user (ie. UX) the requirements should be dealt with first. That's what's initially important.
Would it be a long shot to maybe ask for the StoreFront to get an update .. to have DB4O magic? At least part of the project? *beg and grovel?*
-PK-
>>>Ignoring the database implementation until the very end - again in a large system - is like ignoring any other key piece of integration your system may require
It's a fair point - but honestly won't you need to address the very same issue anyway? If you have a complex data interaction - it will happen at some point even if you ignore it using OODB. Which would you prefer: to have spent a bunch of time developing an ORM only to find out you now have to change in mid-course because of this complexity, or to let the dev process roll out and know precisely how to work with the problem at hand?
I don't think focusing on good OO architecture will be bad for your DB. And if it is - which is more important? Ahhh the million dollar question :).
Only if I've never used an ORM before on a large system - otherwise there'd be some serious prototyping to do, but I take your point. My take is I don't see this as an either or situation - more of a nice approach to consider and use for as long as it works. I think focusing on good OO architecture, good back end relational design and prudent risk management are all important. What is most important depends on the project requirements.
http://seesharper.wordpress.com/2008/11/05/the-great-data-context-in-the-cloud/
http://seesharper.wordpress.com/2008/11/06/the-great-data-context-in-the-cloud-part-ii-synchronisation/
Also, the MTG card brought back some memories!
2- And I am currently taking over a project done in this fashion (but with an in-memory Repository with fillers to make the necessary data available), and the Model for the most complex part is just made in a way that you cannot translate it to an RDBMS mapping.
My first point is my biggest concern; the second is mainly due to inexperience developing with this approach, but it would for sure be a pitfall to watch.
Great post!
Also I think it's weird that you can't map a model to an RDBMS.
select * so no worries there. Also delayed execution applied the filter first, not after.
For the Model I could not map to an RDBMS: persisting data to an OODB is so simple that they persisted a part that does not belong in persistence and should be computed on demand, otherwise you end up with redundant data across your database. But now switching repositories involves a lot of work to be able to reconstruct complex objects from other entities. This should have been spotted sooner.
In my experience, there's always a bit of give-and-take between the DB design/ORM/Object model (we're not all Ayende - and can't all convince our customers that running off of open source trunk code is a good idea), and you're likely to encounter a case where something that you can do with objects can't be done with DB4O, or you can do it with DB4O, but not your ORM, etc...
Also, your object design might have to be changed to get the performance you need. I'd hope this is a really edge case, but I had a certain project where a Many-to-One relationship ended up being stored by passing XML to SQL, and then having a trigger on an XML column. Nasty, nasty stuff, but it was sooo much quicker than the ORM's implementation of storing a Many-to-one relationship.
There are always cases where how you do something is influenced by which tools you're using to do it (compare ASP.NET with ASP.NET MVC, and compare ASP.NET MVC with Monorail), and not using the target technologies to develop against means you're going to hit problems later.
And, yes, performance is a development concern. No matter what you think.
I've always been a perf nut, but only over the last 2.5 years have I been swayed to address perf when needed. And believe it or not - 9 times out of 10 it's not that big a deal. More often perceived slowdowns are from too much client code (js libraries, etc); almost never a client call.
Perf shouldn't really be handled until you need to (or until a test IDs a problem). ADO is really fast these days - and attributing bottlenecks to DB calls is really something from 4-5 years ago.
Attributing performance to DB Queries is still a big issue, especially when you're dealing with millions of rows.
By doing this it should eventually drive the tools to catch up with development methods. We might get to object persistence nirvana a little bit sooner than we otherwise would.
There are a couple links that look like feature requests on the Db4o site
http://tracker.db4o.com/browse/COR-1376
http://tracker.db4o.com/browse/COR-1143
related to TransactionScope.
Below is the unit test I would like to figure out how to make pass.
[Test]
public void CanRollbackSavedObjectInRepository()
{
    var newObject = new TestObject()
    {
        SomeInteger = 1,
        SomeString = "string here",
    };
    using (TransactionScope scope = new TransactionScope())
    {
        _repository.Save(newObject);
        // Inside the scope the saved object should be visible.
        if (_repository.GetAll().Count() != 1)
            Assert.Fail("Single item was not saved to the object database. itemCount = " + _repository.GetAll().Count().ToString());
        // Complete() is deliberately left out so that disposing the scope rolls back.
        //scope.Complete();
    }
    // After the rollback the repository should be empty again.
    if (_repository.GetAll().Count() > 0)
        Assert.Fail("TransactionScope Failed");
}
Any ideas?
I found a project at http://code.google.com/p/uniframework/
In this project there's an implementation of IEnlistmentNotification called Db4oEnlist and in my repository I can do something like this...
public void Save(T item)
{
    Db4oEnlist enlist = new Db4oEnlist(this.Container, item);
    bool inTransaction = Enlist(enlist);
    this.Container.Store(item);
    if (!inTransaction)
        this.Container.Commit();
}
the Enlist method looks like...
private bool Enlist(Db4oEnlist enlist)
{
    System.Transactions.Transaction currentTx = System.Transactions.Transaction.Current;
    if (currentTx != null)
    {
        currentTx.EnlistVolatile(enlist, EnlistmentOptions.None);
        return true;
    }
    return false;
}
Unless there's a better way out there to do this using TransactionScope... I was able to get my unit test above to pass...
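For anyone who wants to see the enlistment idea without pulling in db4o or uniframework, here is a minimal sketch of the same pattern. Everything in it is an assumption made for illustration: InMemoryStore is a hypothetical stand-in for the object container (it is not db4o's API), and StoreEnlistment is my own guess at roughly what a class like Db4oEnlist has to do, not its actual source. The only real API used is System.Transactions (IEnlistmentNotification, Transaction.Current, EnlistVolatile).

```csharp
using System;
using System.Collections.Generic;
using System.Transactions;

// Hypothetical in-memory stand-in for an object container (NOT db4o's API).
// Stored items stay "pending" until Commit() or Rollback() is called.
public class InMemoryStore
{
    private readonly List<object> _committed = new List<object>();
    private readonly List<object> _pending = new List<object>();

    public void Store(object item) { _pending.Add(item); }
    public void Commit() { _committed.AddRange(_pending); _pending.Clear(); }
    public void Rollback() { _pending.Clear(); }
    public int CommittedCount { get { return _committed.Count; } }
    public int PendingCount { get { return _pending.Count; } }
}

// Volatile resource manager: forwards the ambient transaction's outcome to the store.
public class StoreEnlistment : IEnlistmentNotification
{
    private readonly InMemoryStore _store;
    public StoreEnlistment(InMemoryStore store) { _store = store; }

    // Phase 1: vote to commit.
    public void Prepare(PreparingEnlistment preparingEnlistment) { preparingEnlistment.Prepared(); }
    // Phase 2: the scope completed, so make the pending work permanent.
    public void Commit(Enlistment enlistment) { _store.Commit(); enlistment.Done(); }
    // The scope was disposed without Complete(): throw the pending work away.
    public void Rollback(Enlistment enlistment) { _store.Rollback(); enlistment.Done(); }
    // Outcome unknown: this sketch chooses to discard pending work.
    public void InDoubt(Enlistment enlistment) { _store.Rollback(); enlistment.Done(); }
}

public static class Demo
{
    // Mirrors the Save/Enlist pair above: enlist if there is an ambient
    // transaction, otherwise commit immediately.
    public static void Save(InMemoryStore store, object item)
    {
        Transaction current = Transaction.Current;
        if (current != null)
            current.EnlistVolatile(new StoreEnlistment(store), EnlistmentOptions.None);
        store.Store(item);
        if (current == null)
            store.Commit();
    }

    public static void Main()
    {
        var store = new InMemoryStore();

        using (var scope = new TransactionScope())
        {
            Save(store, "never completed");
            // No scope.Complete(), so the enlistment's Rollback should run on dispose.
        }
        Console.WriteLine("committed after abandoned scope: " + store.CommittedCount);

        using (var scope = new TransactionScope())
        {
            Save(store, "completed");
            scope.Complete();
        }
        Console.WriteLine("committed after completed scope: " + store.CommittedCount);
    }
}
```

Like the snippet above, this enlists once per Save call, so several saves in one scope produce several enlistments that each commit or roll back the whole pending set. That is harmless in this sketch but probably worth tightening up (enlist once per transaction) in anything real.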
mean :). Are you saying that wrapping DB4O calls in a transaction scope
won't work?
That said, I know DB4O will support them as well.
not sure what you mean by "I know DB4O will support them as well."
from Product p in DB.Container
instead of this
from Product p in context.Products
So, the products are just in the container, no indirection needed.
I will test it out for sure. It might be a good option for a single-user application instead of SQLite+NHibernate; like you said, stop worrying about all that.
1. Db4o does not take kindly to multiple web sites sharing a database file. This is per design but is really a show stopper since my application consists of two distinct websites operating on the same data. Also Db4o seems to get confused saving/loading complex hierarchies of references. Loading a list (100 elements) that references a set of elements which each reference each other through a tree took 10 seconds to load.
2. It could be that I am too used to SQL, but all my objects have one or more IDs that are either their own ID or a reference. These IDs are normally assigned by the database, but Db4o does not assign IDs. So to use both SQL and Db4o I need to either make Db4o assign IDs like SQL and thus know about which field is an ID or I have to hide the IDs. Neither solution is attractive - making Db4o assign IDs seems the safest.
3. The next problem I encountered was that my Db4o repository only needs to work on one type while my Oracle repository needs to work on two types (remember I have 2 tables). This is because Db4o stores a whole object, including its references and its references' references. This is a cool feature, but SQL works differently and really provides a model that is very different.
My conclusion is that your idea is sound, especially for a quick spike which needs some persistence. Db4o might also be useful for the initial stages of a website, but once you decide to switch to SQL then you will need to make several changes and you can't go back. I would not suggest trying to run a SQLRepository side by side with a Db4oRepository. The models are too different and the result feels wrong.
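On point 2, one way to "make the repository assign IDs like SQL" might look like the sketch below. It is an illustration only: IHasId and IdAssigningRepository are hypothetical names, the list is an in-memory stand-in for the object container, and a real version would have to persist the counter and cope with concurrent writers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical marker interface telling the repository which property is "the ID".
public interface IHasId
{
    int Id { get; set; }
}

public class Player : IHasId
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// In-memory stand-in for an object-database repository that hands out IDs
// the way an identity column would. Not thread-safe; a sketch only.
public class IdAssigningRepository<T> where T : class, IHasId
{
    private readonly List<T> _items = new List<T>();
    private int _nextId = 1;

    public void Save(T item)
    {
        if (item.Id == 0)
            item.Id = _nextId++;      // new object: assign the next ID
        if (!_items.Contains(item))
            _items.Add(item);         // already-stored objects are left in place
    }

    public T GetById(int id)
    {
        return _items.FirstOrDefault(i => i.Id == id);
    }
}

public static class Demo
{
    public static void Main()
    {
        var repo = new IdAssigningRepository<Player>();
        var first = new Player { Name = "first" };
        var second = new Player { Name = "second" };
        repo.Save(first);
        repo.Save(second);
        repo.Save(first);             // saving again keeps the existing ID
        Console.WriteLine(first.Id + ", " + second.Id);
    }
}
```

The cost is the one described in point 2: the repository now has to know which member is the ID, which is exactly the kind of storage knowledge the object database was supposed to remove.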
What are your thoughts on using a double mapping model...where you make simple mappings from your database to your O/RM and then map from that to your Domain Model? In this case your Repository could be over the entity model and it return Domain Model objects.
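A rough sketch of what that double mapping could look like follows. All of it is assumed for illustration: the IRepository<T> shape here (GetAll and Save) is inferred from the method names that appear in this thread and is not the interface from the post, ProductRow stands in for an ORM-generated class, and a plain list stands in for the ORM's table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed minimal repository shape; the post's real interface may differ.
public interface IRepository<T>
{
    IQueryable<T> GetAll();
    void Save(T item);
}

// Hypothetical ORM-generated entity: flat, mirrors a table row.
public class ProductRow
{
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public decimal UnitPrice { get; set; }
}

// Domain model class the application actually works with.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

// Repository over the entity model that returns domain objects.
public class MappingRepository<TEntity, TDomain> : IRepository<TDomain>
{
    private readonly IList<TEntity> _entities;            // stand-in for the ORM table
    private readonly Func<TEntity, TDomain> _toDomain;    // second mapping: entity -> domain
    private readonly Func<TDomain, TEntity> _toEntity;    // and back again for saves

    public MappingRepository(IList<TEntity> entities,
                             Func<TEntity, TDomain> toDomain,
                             Func<TDomain, TEntity> toEntity)
    {
        _entities = entities;
        _toDomain = toDomain;
        _toEntity = toEntity;
    }

    public IQueryable<TDomain> GetAll()
    {
        // Mapping happens in memory here, so later filters run on objects, not in SQL.
        return _entities.Select(_toDomain).AsQueryable();
    }

    public void Save(TDomain item)
    {
        _entities.Add(_toEntity(item));
    }
}

public static class Demo
{
    public static void Main()
    {
        var table = new List<ProductRow>();
        IRepository<Product> repo = new MappingRepository<ProductRow, Product>(
            table,
            row => new Product { Id = row.ProductID, Name = row.ProductName, Price = row.UnitPrice },
            p => new ProductRow { ProductID = p.Id, ProductName = p.Name, UnitPrice = p.Price });

        repo.Save(new Product { Id = 1, Name = "Widget", Price = 9.99m });
        Console.WriteLine(repo.GetAll().Single(p => p.Id == 1).Name);
    }
}
```

The trade-off to watch is the one noted in the comment inside GetAll: because the mapping is done with compiled delegates, any Where clause applied to the result is evaluated after the rows are loaded. Getting the filter pushed down to the database would need the mapping expressed as an expression the ORM can translate.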
One big problem is that for most applications, trying to change OODB to RDBMS would not work without a LOT of work.
Ayende, you make a nice broad negative statement there, without anything to back it up. Why does moving from OODB to RDBMS necessarily involve any more work than designing the schema and mapping it to your object model, which is the exact same work you would do if you started with the RDBMS?
[)amien
Still, most businesses I work with are fanatically committed to relational databases. Looks like they'll still be the default in 10 years. There's something about databases that inspires people to be conservative and to accept change only on a geological timescale.
What's the level of maturity of such a project?
Why did you keep mentioning not to think about performance/scaling? What are the issues with perf/scaling using OODBs?
Could this be used in a RL scenario? For example StoreFront, would it be up to the task?
The fact of the matter is that OODBs actually hold the record for the largest, fastest database in history. They have very distinct advantages for simple data retrieval (no joins is one of em). There are issues - like any system. But mostly OODBs are more than up for the task.
In terms of maturity - DB4O is on version 7 and has been around 8 years. Other products (like Matisse and Versant) have been around longer and are big time industrial scale.
Matisse, for example, comes with a query manager that lets you write SQL :).
the only thing is that you don't get the benefit of testing the rdbms as you develop.
Thanks for sharing your thoughts.
Like you said, it sounds crazy but some of us just plain don't care about storage whatsoever. Users don't care about storage and as a result neither do I.
By the way, this is a particularly great point of view from someone who knows so much about ORM. Keep pushing this.
Also this brings up another point...what if you have to support an existing DB schema? If you've been working "untethered" with your own data model that makes sense just within your app's domain, you may have a LOT of work ahead of you to re-work your data model to perform well with your legacy database.
I think a good example of this might be to see how hard it would be to apply IRepository<T> to something like AdventureWorks. If you use SqlRepository<T> and then inherit from it to something like SqlCatalogRepository, you should be able to map things quite easily. All in all, when I'm doing this with Linq To Sql and the Storefront, it doesn't take me too long (initially). What DOES take time is doing it over, and over, and over :).
"Visual Studio Syndrome" is when people bang out a project quickly in visual studio, and then find it takes a few months to get the project working in a production environment. It's encouraged by a number of Microsoft tools that let you hit "F5" to run your project in a quirky development environment that has only a superficial resemblance to the environment that your product will run in.
Effective developers work in an environment that is as close to identical to the production environment as possible. It ought to be possible to build a copy of the development system for a new developer by following a checklist, and to build a new production system and move the data to it quickly. It ought to be possible to build a staging server that has a complete copy (or a big chunk) of the production system data so you can acceptance test changes before putting them into production.
You also need to consider the long-term evolution of the production system. There's going to be a day when you need to add a few new columns or a few tables to your database. It may be a few days, a few months or a few years after it goes into production, but it will happen if the system is successful. Your approach doesn't address the data migration issues, but sweeps them under the rug -- rather than solving the "two artifact" problem which is so deadly to ORM, it intensifies it.
I honestly don't see what the tooling has to do with anything here. What I'm discussing is a development pattern that relates to architecture of the application. I'm trying to encourage TDD/DDD (as opposed to data-first, which is usually VS/RAD's focus).
>>Effective developers work in an environment that is as close to identical as the production environment as possible
Sure - I agree. Code has nothing to do with instrumentation does it?
>>>It ought to be possible to build a staging server that has a complete copy (or a big chunk) of the production system data so you can acceptance test changes before putting them into production
Agree. Are you assuming that you can't have test data in the OODB? If so, why not?
>>>There's going to be a day when you need to add a few new columns or a few tables to your database. It may be a few days, a few months or a few years after it goes into production, but it will happen if the system is successful. Your approach doesn't address the data migration issues, but sweeps them under the rug
How so? If you use an OODB to get yourself to Production, let's say, then you load your data into a relational structure in an RDBMS then won't you be at the exact same point? What's swept under the rug? You lost me here.
As a developer I don't care about storage, why should I, it should just work. How hard can it be?
I would really like to see more discussion about this, keep up the good work.
In the meantime, I'm still having trouble getting my head out of the RDBMS. Complex joins always tend to break down my model. For example - say given the above you want to get the Suppliers who have products that were reviewed by a given email:
IRepository<Supplier> rep = ...;
var suppliers = rep.Find((s) => ???);
But that doesn't matter during development :).
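For what it's worth, if the object model exposes the relationships as collections, one possible answer to the "???" is a pair of nested Any calls. The sketch below is only that: the Supplier, Product and Review shapes are guesses (the classes from the post are not reproduced here), and it runs against an in-memory sequence rather than through Find, whose signature is not shown in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model shapes, assumed for the example.
public class Review
{
    public string Email { get; set; }
}

public class Product
{
    public string Name { get; set; }
    public List<Review> Reviews = new List<Review>();
}

public class Supplier
{
    public string Name { get; set; }
    public List<Product> Products = new List<Product>();
}

public static class SupplierQueries
{
    // Suppliers having at least one product with at least one review from the given email.
    public static IEnumerable<Supplier> ReviewedBy(IEnumerable<Supplier> suppliers, string email)
    {
        return suppliers.Where(s => s.Products.Any(p => p.Reviews.Any(r => r.Email == email)));
    }
}

public static class Demo
{
    public static void Main()
    {
        var acme = new Supplier { Name = "Acme" };
        var widget = new Product { Name = "Widget" };
        widget.Reviews.Add(new Review { Email = "someone@example.com" });
        acme.Products.Add(widget);

        var other = new Supplier { Name = "Other" };
        other.Products.Add(new Product { Name = "Unreviewed" });

        foreach (var s in SupplierQueries.ReviewedBy(new[] { acme, other }, "someone@example.com"))
            Console.WriteLine(s.Name);
    }
}
```

Whether a given ORM or object database can translate that lambda into an efficient query is a separate question; in memory it simply walks the object graph, which is the point being made about not needing joins during development.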
I took a short look at oodbms about 10 years ago. At that time they weren't really mature enough for larger applications and "real world" use. Has anyone looked at others systems than db4o lately?
The IRepository interface can serve a large part of your persistence needs but as they say there will be many scenarios in a large system in terms of both domain complexity and volume of data. Two examples of queries that aren't a good fit for Linq or any other ORM are SQL that uses case expressions or nested table expressions or relational OLAP constructs, but such queries (at least in my experience) are a small part of the overall code base and can be isolated in your persistence interface. Perhaps the IVeryNastyQueriesRepository ;o) That way the databasephobics can ignore that part of the system or refer the problem to the UserInterfacephobics.
Ignoring the database implementation until the very end - again in a large system - is like ignoring any other key piece of integration your system may require. It poses a significant risk and as such should be addressed as early in the project as possible. There are a number of database tools out there that work well with the process of refactoring your application - http://www.liquibase.org/ is currently my favourite tool for this. Such a tool is easy enough to incorporate into your build process so that rather than following a set of instructions a developer can check out the code, run the build script and be up and running with everything they need including test data.
>>>Ignoring the database implementation until the very end - again in a large system - is like ignoring any other key piece of integration your system may require
It's a fair point - but honestly won't you need to address the very same issue anyway? If you have a complex data interaction - it will happen at some point even if you ignore it using OODB. Which would you prefer: to have spent a bunch of time developing an ORM only to find out you now have to change in mid-course because of this complexity, or to let the dev process roll out and know precisely how to work with the problem at hand?
I don't think focusing on good OO architecture will be bad for your DB. And if it is - which is more important? Ahhh the million dollar question :).
Only if I've never used an ORM before on a large system - otherwise there'd be some serious prototyping to do, but I take your point. My take is I don't see this as an either or situation - more of a nice approach to consider and use for as long as it works. I think focusing on good OO architecture, good back end relational design and prudent risk management are all important. What is most important depends on the project requirements.
Being blond, I got lost in a few spots towards the end .. so I'll need to read it a dozen more times. That said, I really love the idea. Right now, I'm basing my current projects off your StoreFront code. So for me, I've got TestRepositories which have hard-coded stuff. My latest project has various 'pages' and is all working funky-dory .. and I haven't even created (nor thought about) generating my SqlRepository just yet. That's the _very very last_ thing on my mind and scheduled in.
Having to NOT WORRY about a Sql repository is a blessing because we can worry about getting the application RIGHT. Persistence is a requirement, but it's so not a worry to initially implement, that it can be dealt with at the end of the development cycle IMO. Getting the business logic right and communicating to the user (ie. UX) the requirements should be dealt with first. That's what's initially important.
Would it be a long shot to maybe ask for the StoreFront to get an update .. to have DB4O magic? At least part of the project? *beg and grovel?*
-PK-
I recently began a project at work and am doing my best to stick to TDD as inspired by your MVC Storefront stuff. However, I keep finding myself getting pulled back to the database. At the very least what you're proposing here would remove the temptation of the database, at least initially, and allow me to focus more strictly on tests, implementing the model and the UI until I have something that is ready.
Not sure I'm being very clear (English is not my first language). In other words, I would like to write a ProductRepository that takes an injected SqlRepository, but genericity prevents me from doing it. In that case, I don't see what this new method changes compared to what you do in the Storefront, where you also have nice interfaces (like ICatalogRepository) that already decouple your code from the persistence layer.
Thanks
Rob, can you demonstrate how to implement IRepository<T> with the double-mapping pattern? For example, could you demonstrate how to implement IRepository<T> with the data access approach in the MVC storefront? It seems like T comes from the Model and the Repository only understands the ORM generated "version" of the class... is there some magic mapping that can happen between the types?
Lemme work on this a minute - I have some ideas...
Version one of the system I am working on will be taking db4o into production, assuming it performance tests well. We are keeping the repository implementation even simpler than yours (only fetching aggregates by identifier) and using Lucene for complicated queries.
It's also a detail of the repository implementation, so it definitely doesn't need to be addressed if Rob is using db4o as a prototyping tool.
Player Table
PlayerId non-null
Name non-null
CurrentGameId non-null
Game
GameId non-null
If I deleted a Game it would blow up deleting the player because the only way it seems to work is by setting CurrentGameId = null first and then doing a delete. This of course blows up.
When I first started to deal with Domain Model, I always wanted to jump right in and crank out the schema so that I could have a solid 'foundation' to work off of. After a few projects, it became rather apparent that if you build out a schema first - you are tying your hands and really limiting your flexibility. You become bound to a schema that you designed before you even started coding!
On the other hand, if you ditch the schema until the very end, you are free to run rampant breaking and changing stuff as you please, which is more of my style anyways. The only downside I've encountered with postponing the schema until the end was having persistent data, and you just cured that, THANKS!
Quick side question: what blogging platform is this running on (assuming that its not custom built)?
Thanks,
Chance
Also, the MTG card brought back some memories!
During development, managing these things is painful, especially if each developer is developing against a local copy of the database. You can even end up with two incompatible change scripts to the database.
My only concern with the approach that you outlined is that it doesn't immediately allow for including business rules in the database which is (according to some) where these types of things belong (for a variety of reasons that go beyond this comment).
On the whole, very sane and thought out argument. Even as a database developer I found myself agreeing with you by the end. It's important to have a stable product before attempting to create a well thought out and properly optimized database.
http://seesharper.wordpress.com/2008/11/05/the-...
http://seesharper.wordpress.com/2008/11/06/the-...
As for the idea. I like the fact that it reduces friction up-front.
I don't like the fact that it creates a debt to be paid off downstream. That is, when the time comes to switch to RDBMS. Or, if you try to make repositories interchangeable to ease that pain, it then feels like you're *adding* friction up-front, just to prevent having to do more work later.
I feel like the happy ending here would simply be not to spend time switching to an RDBMS. Why not just use db4o in production? If someone wants an RDBMS in the picture, write ETL scripts to export to a warehouse :) Their dRS replication system seems designed for just that task.
Wanted to give you a thumbs up.