Prolog
Think back to your high school/early college days. It's Friday night and you call your buddy Jimmer wondering what's happening.
"I hear Worm's gotta a keg and these Italian exchange students will be there! We're headin over there in 30 - you in?"
So you get your stuff together, trying to wear something that makes you look cool and most importantly that you didn't try to look cool. You hop in your mom's old Nissan and head over to Jimmer's. Two hours later after everyone's lagged hard, you end up at Worm's and there's no one there but him and 2 other guys, and they're asleep.
"Yah I couldn't get the keg - this dick at the liquor store nabbed my ID and told me I'd get arrested so I took off. What a #$%#in @!#hole... but I hear Jonah's havin some people over and these Phi Delts from State are comin too. I hear they're bringin a keg - and man those State dudes don't drink swill so you know the keg's gonna be good - Coors Light or somthin... and the Italian's are headed over there too."
Out the door you go, in pursuit, wondering what a Phi Delt is and why they don't drink swill. Two hours later, when both keg and Italians fail to materialize, you realize the night's a bust and you should have just stayed at home, playing Warcraft or watching the Colbert report.
WTF Are You Doing?
Being a developer is like trying to find a party on Saturday night - sometimes you have the time of your life, other times it's a complete waste of time and you end up fried, wondering WTF you were doing with all your time. The ultimate kegger-carrot has to be the shiney new data access approach- the "new way" of doing things that brings you that much closer to your data, proves a bunch of other people wrong, and sheds that much time from your day.
Despite the direction this post is taking - no - I'm not trying to pimp SubSonic. I got to thinking about all the approaches to data access tonite over dinner and I was sort of struck by all the different paths I've taken over the years and just how hard it is (in terms of LOC) to pull data from a DB. Ya'd think that by NOW, 10 years into this whole data-driven web thing we'd have figured it out. Perhaps settled the debate and come up with some common way of sending data from DB to page.
For some reason this most fundamental of needs has been ridiculously inflamed, mused over, conjured upon, and obfoobarscated to the point of religiosity. And as I sit there with my Moa Sandwich in front of me, I had a big WTF moment.
So let's set the stage. There are 3 main camps of data access: The ORM Nuts, The Generators, and the Out of The Boxers. Lets take a look at each shall we?
ORM Sucks
ORM - or Object Relational Mapping is the concept that a table in your Database is akin to an object representation in Object-oriented programming. I don't like ORM cause I've seen what it does to people. I've lost friends to ORM, and I've had my brushes with it too. It's not easy to walk away from...
ORM Makes sense for simple cases - a Product persists itself to a Product table. Same with an Order and a User. Generally these tables correspond almost completely to the object they persist.
The problem comes in when you consider that a database is, well, a database and usually complies with (or should comply with) rules of normalization at some level. As your database grows and becomes more complicated (which happens as you build out your project right?), this model breaks down completely. If you're an ORM person, ask yourself how much code you need to write to support a many to many relationship...
To blow this completely up let's reverse the logic here and see if we can apply OOP design principles to a database. The first thing we need is a table called "Object", with a field for an ID and a Description. Then we need to throw Relational Theory out the window, and proceed to build one of the ugliest databases imaginable...
Rails guys go so far as to say "I don't want any logic in my database", meaning that all things (even DRI) belong in code- and theoretically that's fine. It's when people actually begin to use your application that your theory decides to crash and you end up with a Twitter mess :). Am I saying this model won't scale? Of course not - you can get anything to scale as long as you have a big enough server right?
Kill Your Code Generator (or Do You Need Stored Procedure Intervention?)
I worked on a project for a very long time where we took a set of existing Code Smith templates and reworked them for our project needs. When run, these templates would reach into our DB, pull out the table structure, create a set of SPs to wrap each table (CRUD, Find, etc), and then create the code to use each SP.
OK, you know who you are and I bet you're thinking to yourself "Oh shit - yah I had an application like that and it was really a PITA". It's OK... we're here for you. Anyway back to my boring story...
As the system grew we ended up with 283 stored procedures, 269 of which did nothing but simple SELECT, UPDATE, INSERT, and DELETE. I asked the lead architect once why we use SPs, and he said to me "I've always used SPs". I love working for guys like that.
When I asked him why we use so many damned SPs he gave me the standard answer: "SPs are faster and more secure. They protect against injection and I don't want people writing inline SQL in my application". Bah.
I think that's the standard answer for most people who have an Stored Procedure Addiction. Stored Procedures can kill if you don't get control of them. They can actually crawl out of the DB and crawl into your dreams. I once had... nevermind.
Here's some sobering news on these beasts:
I love Code Smith like I love good beer. Actually that's not true - but I am in some serious like with Code Smith so don't get me wrong there. It has been abused, however. Seriously abused (this goes for MyGeneration too).
Code that you generate is STILL CODE. You need to know it and support it or it's no good.
Out Of The Box: Just Gimme The Damn Koolaid
Finally we get to the folks who just use what MS gives them: SQL Datasource with a DataSets. This is fine and they work OK, plus hey, it's really visual :). There really is no way better to get your application to talk to your DB faster than to drag the table onto a blank ASP Web Form and watch Visual Studio do it's magic...
But consider the result:
Encapsulation and abstraction are at the core of what we do. The Rails DRY mantra (Don't Repeat Yourself) is perfect here because this model of talking to your database is redundantly redundant and repetitive.
Imagine each little SQL Datasource as if it's a zit, blemishing the pages of your application. Take a second and squeeze out each one, and clear up that messy complexion.
OK Smart Guy, What's Your Answer Then?
Use your noggin, and follow the path of least resistance. In the extremes lie the danger (or as Obi Wan says - "only the Sith deal in absolutes"). I like to read the Tao a lot - it keeps my feet on the ground and helps me understand Phil Jackson's Triangle Offense. One quote I think says it all:
The mark of a moderate man
is freedom from his own ideas.
It's easy to toss quotes around and get all philosophical (this is a far stretch from where this post started with kegs and Italian exchange students but hey, it's a ride we took together), so let's put some rubber on the road shall we?
So to wrap this up - each approach is valid, but no approach is final. It's too tempting to fall into a DAL religion and think "it's the only way". Never is there only one way in anything code-related. Open your approach to use SQL Server in every way - even check out Functions! They can save you so much time it's unreal - and that's what this all boils down to: saving time, creating less code, and offering value.
Nice post - very pragmatic. :)
I find that there's kind of a middle road that seems often overlooked in the ORM vs straight data debate. People get so enarmoured with ORM as the end all solution to data access that they forget there's a middle road of business objects that's supposed to really provide the front end of the application.
At that level more abstraction of the data can and should occur. I want to write my query - however complex or relational once in a business object (or a stored proc with the bus object calling the stored proc) and be done with it.
The big problem is that data representation if you run a complex query that returns arbitrary data how do you represent that? LINQ comes close, Microsoft chose to lock off dynamic results to local method scope, so the dynamic results can't be passed out of of the bussiness layer making the dynamic aspect pretty useless in a business layer...
Today in .NET the best way to address this is with a hybrid approach that mixes both ORM and dynamic ADO.NET structures (DataReaders and in some cases yes even DataSets).
All these years - all these approaches and still we can't really say we've ever solved this problem. It seems we just keep making the same mistakes over and over again...
I do take issue with a couple other statements though..
"Stop trying to fight it - use the DB for what it's good for (DRI and data handling) and let it do the heavy lifting. Your goal should always be performance - don't sacrifice extra connections just to satisfy the ORM design model."
I do agree to let the database do what it is good at. However, letting the database do all your heavy lifting is a very dangerous path to head down..This won't bite you, however, until you have to scale..then you are forced to scale at the database level..contrast that against an application which can scale without necessarily scaling the database (think caching, partitioning, etc)..You are left with much more flexibility than just "use replication"..Obviously if you have the need to update 50k records in your database, that needs to be done in the DB, but putting business logic into the database will come back to bite you in the end..there's a balance there..
I would also argue against having performance being your driving goal. This is another dangerous path..I think the hardcore SP guys are a good example of why this is faulty thinking..Again, use your head, and let the application do what it's good at, while letting the Db do what it's good at..
Oh, and I'd leave out the line about "ORM nuts" next time..I can see how that might be construed as offensive..
The database will execute data routines an order of magnitude faster than your code will. I don't know if you're speaking from experience or theory, but generally most architectural models put one DB server behind a scaled out application server cluster. I don't think anyone's gonna worry about burning up a DB behind today's applications. A web server - sure.
Explain how putting some level business logic in the dB will bite me. I have never had this bite me and I have 4 URLs up at PayPal and I did just this in the name of performance - I had to! They get hit Millions of times a day! So what would you do if your client told you that their site would get a million hits an hour? Would you change anything? If so, why would you do it any other way?
And with all due respect - if performance isn't your goal, what is? I agree that in the short term you need to deliver a product (increase dev cycles over perf)- but in the long run if your goal isn't perf than you're not serving your client.
Finally - it's a little late to "leave out" the ORM nut thing, but why? Developers who adhere strictly to ORM (or any other model for that matter) are nuts.
Thanks for the thoughts Evan - do let me know more though as I'm curious about your suppositions. Let's hash this out and don't be shy about your opinions. Just be sure to back them up :)
Good post, Rob. But I have to ask, do you really think using NHibernate's associations for many to many relationships constitute too much code? Or is it in the queries that you feel the complexity exists?
Been lurking your blog awhile since I discovered CSK and I felt compelled to reply to this post.
In my real life job rather than the tiny bit of consulting I do on the side I write applications that support municipalities. Usually relating to SCADA or AMR implementations in the city of choice. It is typical that either one of these applications send a *read* ranging from every 15 seconds to every 15 minutes. When you get on the scale that I am working with if you do not put at least some business logic in the database you are doing yourself and the city a great disservice.
A typical example for what I do is each househould in a city has at least 2 meters, water and electric, and we'll assume it is a moderate size city not a metropolis so 50,000 people. Not even counting all the commercial buildings in the city we have 100,000 meters submitting data every 15 seconds. Particular to water meters I have to trend that data for that particular time entry over the last 2+ years to recognize leaks. I now do and always will trend that data within the database and give a bit back to the code to throw an alert whether or not the meter is reporting higher than normal usage. If this higher than normal usage is recognized for a period of 15 minutes so 60 reads then an alert is sent.
Trending that level of data is a significant task and is best handled by the DB. There are so many other examples I could give to support your argument, but yes anyone who thinks otherwise just has not had the opportunity to work on a large scale or they just like pain, either one...
Thanks for a great blog,
Josh
Been lurking your blog awhile since I discovered CSK and I felt compelled to reply to this post.
In my real life job rather than the tiny bit of consulting I do on the side I write applications that support municipalities. Usually relating to SCADA or AMR implementations in the city of choice. It is typical that either one of these applications send a *read* ranging from every 15 seconds to every 15 minutes. When you get on the scale that I am working with if you do not put at least some business logic in the database you are doing yourself and the city a great disservice.
A typical example for what I do is each househould in a city has at least 2 meters, water and electric, and we'll assume it is a moderate size city not a metropolis so 50,000 people. Not even counting all the commercial buildings in the city we have 100,000 meters submitting data every 15 seconds. Particular to water meters I have to trend that data for that particular time entry over the last 2+ years to recognize leaks. I now do and always will trend that data within the database and give a bit back to the code to throw an alert whether or not the meter is reporting higher than normal usage. If this higher than normal usage is recognized for a period of 15 minutes so 60 reads then an alert is sent.
Trending that level of data is a significant task and is best handled by the DB. There are so many other examples I could give to support your argument, but yes anyone who thinks otherwise just has not had the opportunity to work on a large scale or they just like pain, either one...
Thanks for a great blog,
Josh
@Josh I think your comment here is BANG ON. Many projects like yours grow from "neat idea" to the core of a business unit. If you hadn't put your performance hat on, this neat idea might never have come to be a solid product like it is today.
I still prefer stored procedures... going to mysql where it's all inline sql... yurg!
Interesting article though
What you are talking about "one table one class" is much more a data structure issue then it has to do with ORM. ORM is about mapping Relational Data to Object Models. That means that you can have multiple tables for one class, a table can be represented by nultiple objects, etc etc.
In Rails your statement is true. But not for NHibernate.
Another quick point. The database is usually one of the harder parts of a system to scale. That is why E-Bay (as an example) don't even use joins, they pull up all their data to memory and join it there. Just to make sura that the DB has enough power to run the site.
What you are talking about "one table one class" is much more a data structure issue then it has to do with ORM. ORM is about mapping Relational Data to Object Models. That means that you can have multiple tables for one class, a table can be represented by nultiple objects, etc etc.
In Rails your statement is true. But not for NHibernate.
Another quick point. The database is usually one of the harder parts of a system to scale. That is why E-Bay (as an example) don't even use joins, they pull up all their data to memory and join it there. Just to make sura that the DB has enough power to run the site.
Most ORM tools map a table to a class since a table usually represents a logical element, as does a class. Most ORM tools do support compositing these things too, as you mention. The thing I just can't get my head around though is WHY. It's code for code's sake if you ask me. Call me old fashioned but why do I need to work up all these constructs, shove them through layers of abstraction just to get some dang data out of my database?
So yes, your comment is spot-on in terms of "for long" as I have a low gag-reflex to this type of coding.
Your eBay example is mis-stated. Yes they do the joins/sorting in the app layer, but this is because they can scale the app layer easily because they scale OUT. Their DBs are Oracle and they rewrote their own flavor of J2EE to streamline the DB access. So in terms of cost, yes it's much safer and simpler to scale out your app layer then to kick down with another Oracle license.
And I completely agree with it - it makes great sense! And has absolutely nothing to do with ORM...
Most ORM tools map a table to a class since a table usually represents a logical element, as does a class. Most ORM tools do support compositing these things too, as you mention. The thing I just can't get my head around though is WHY. It's code for code's sake if you ask me. Call me old fashioned but why do I need to work up all these constructs, shove them through layers of abstraction just to get some dang data out of my database?
So yes, your comment is spot-on in terms of "for long" as I have a low gag-reflex to this type of coding.
Your eBay example is mis-stated. Yes they do the joins/sorting in the app layer, but this is because they can scale the app layer easily because they scale OUT. Their DBs are Oracle and they rewrote their own flavor of J2EE to streamline the DB access. So in terms of cost, yes it's much safer and simpler to scale out your app layer then to kick down with another Oracle license.
And I completely agree with it - it makes great sense! And has absolutely nothing to do with ORM...
Scaling databases is a nightmare - they are the least scalable aspect of any system. By choice you should always scale anything but the database.
Cost isn't an issue for anyone running Oracle (or really shouldn't be), it is just correct to scale the app layer long before the persistence layer.
I want to work where you do where you Oracle licenses are that easy!
When you are considering writing an app for a client part of your contract will usually contain one of two provisions. Either you will support the application for a period of time or you will have to provide in depth documentation for them to support it themselves. Either way the less code the better and ORM is NEVER less code.
Just my two cents,
Josh
Maybe you were trying to be funny but the humor failed to come across. What you seem to be saying that I do agree with is that one should pick the correct tool that fits the task at hand. Your product looks interesting and I look forward to trying it out.
DRI in the application layer is a path that will usually require duplication of logic. Unless your app is small, and stays that way, other applications or tools will need to utilize the same data. Having DRI on the DB eliminates the need to duplicate work, and ensures that consistency is maintained?
... but you just said SPs *are* code... :(
Agreed that the arguments for SPs you cite are bad. I'm only recently a convert to the idea of using SPs, but my reasoning is that it better abstracts the database details away from your other code. Let the db developers who designed the database figure out how best to query it. In the PHP world they've been trying for years to get programmers away from embedding HTML all over the place (presentation logic should be cleanly separated)... and yet, embedding SQL (hence not cleanly separating relational logic) is a good thing?
Embedding SQL is something all platforms *still* struggle with. The SQLDataSource control is a big offender here.
This seems a strange line of reasoning - it reads along the lines of "why bother securing your database at the SP level, because you're server might be completely compromised anyway" - presumably the chances are that it isn't, and SP-based security provides an excellent second line of defence against things like any SQL injection vulnerabilities that you don't know about.
I know what you're saying, and that works well for your customers who have one web site they want now. But I work for a company that is working on a very, very large project. That will involve more people over time, and more and more complexity over several years. My goal is not performance in the way you describe it. It is MAINTENANCE.
Describe to me what you mean by your goal is "maintenance" versus "performance". If you document your stuff, why can't you maintain a performant site? Where exactly is the tradeoff here and how are these two things not aligned?
If you're trying to justify using an extra connection or two to the DB so your code is more "maintanable" - explain that scenario and why it's better than going the performance route (one connection, multiple results).
In terms of large projects - does it involve 110 people from 4 different cost centers within a Fortune 50 company? Will the code review be done by Microsoft? I have perspective here and I'm not just pulling this outta my behind :)
>> If somebody's gotten into your server <<
The point is precisely that they might not necessarily have done this. Of course, if you stick a server on the network with a weak sa password, you get all you deserve, and you have got bigger problems than using SPs for security could address, but I start from the premise that we've not done that. What I'm suggesting is more along the lines of you've put deadlocks on all the external doors - now we're going to fix a security grille behind it.
Your argument seems to boil down to because more severe possible attack vectors exist, it's not worth thinking about the less severe ones - it may be precisely that it is via one of the less severe attack vectors that the system is compromised.
You miss the point of SPs and call someone else silly for teaching you about security.
And the fact that you should not even know the names of your tables if you are dealing with databases from code perspective.
Therefore prefer SPs at all times, dba pros do, random coders criticize them.
You miss the point of SPs and call someone else silly for teaching you about security.
And the fact that you should not even know the names of your tables if you are dealing with databases from code perspective.
Therefore prefer SPs at all times, dba pros do, random coders criticize them.
Here, let me fix that for you. ;)
you realize the night's a bust and you should have just stayed at home, playing Space Invaders or watching the Dukes of Hazzard.
Quick comment: I think developers working on a business application (where business is the focus) and CRUD is secondary love ORM. Developers working on a CRUD application where they are focusing on just getting "some dang data out of my database" do not. They can argue all they want, but they are in different situations.
Am I missing something? It is 4:00 am in the morning, so maybe I am.
In some programming languages this gets messy since the SQL statements obfuscate the code.. so store the SQL queries in a separate source file where they are easy to maintain (in nice little sub procedures tucked away), and reduce concatenation obfuscation by formating some {$macrovars} in the sql query strings with a format function, if needed.
The bottom line is when it comes to debugging, you have to use SQL.. all your tools know SQL, your shell account allows SQL, and you can dump SQL, you can send SQL over the network (but you can't send an effing object instance over the network) and you can write to stderr or the web browser any SQL that is causing a problem with a debugln call, and you can copy and paste the problem SQL into other SQL tools... SQL REUSE! So why beat around the bush?
Reuse SQL, and save thousands of hours on decision making.. just use effing SQL!
It is as if no one uses SQL anymore? Do I have that right? How can you possibly debug the database without having SQL handy, and how can you possibly debug the database without being able to reuse SQL snippets from window to window? I think someone reinvented the wheel, a square one! Without SQL, and with all this beating around the bush, we have to map our brains to SQL through all these SQL abstracitons! Somehow, our brains have to map to SQL, so why not just effing use SQL? There are disadvantages to SQL, yes.... but for creeps sake, there are disadvantages to all hte others.
"Lots of architects have learned, painfully, that you usually can't magick relational rows away behind object/class abstractions. The right way to think about a database is as a set of normalized tables that are designed to be addressed with SQL strings." --Tim Bray