CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Rod Paddock


The Reality of Stored Procedures

 

OK I’ve decided to jump into this discussion of using stored procedures for data access and manipulation.

 

First I will start out with some background. I have been involved in building multi tier applications for nearly 10 years. In this time we have adopted a rule for our projects:

 

Stored Procedures are the ONLY mechanism for accessing data.

 

In this post I will discuss some of the points made by Jermey Miller in his post:

 

http://codebetter.com/blogs/jeremy.miller/archive/2006/05/25/145450.aspx

 

One thing I will compliment Jeremy on is his disclosure of the types of applications he builds.  Jeremy discusses using the database as mechanism for object persistence.  Jermemy states his apps are “are primarily about business rules, not reporting”.  Fair enough.

 

Generally speaking the applications I work on deal with large amounts of data and an equally large number of users. The largest application I work on has over a Terabyte of live OLTP data and upwards of 1,000 concurrent users. This application started with 70GB of data in 1999 and had grown to that size in 7 years. This application was created with 100% stored procedures.

 

In the interest of disclosure: I am not a TDD expert, nor do we implement a TDD style of development here. However, we do use concepts that have been adopted by the TDD world. Further down this blog post I will discuss TDD concepts so keep this in mind.

 

So here are some of our realities:

 

Stored Procedures are a Requirement for Batch Processing

 

In systems of any size there are always batch processes: Credit card applications need to be scored, letter files need to be created for export to mail houses, customer statements need to be created. How much of this data will you transfer to a middle tier server? If your data is of any significant size the answer should be none or very little. These types of processes need to be run on the server. Transferring data back and forth to a middle tier is not a wise idea on a number of levels the primary one being network performance. These types of operations are best suited to running in a stored procedure.

 

Stored Procedures as API

 

Another benefit of stored procedures is that they are an API. Stored procedures separate implementation from interface.  Stored procedures are a contract between the developer layer and the database layer.

 

The physical storage mechanism for data may differ greatly from  the presented view of the data. DBA’s must be able to physically change the underlying data without the permission of the developers. There job is to conform to the contract they provided the developer.  How they implement that contract is up to them.

 

Also… stored procedures can be testing in an automated way. You can incorporate stored procedure calls into your unit testing framework.

 

Stored Procedures and TDD

 

 One of Jeremy’s criticisms about stored procedures is that they: “make TDD a slower, less productive process.”  He goes on to state that referential integrity (or other database constraints) requires you to set up a lot of other data in order to run your tests.  From his words he never says it’s not possible, just more difficult.  It seems to me that having ordered sets of tests or branched sets of tests might solve this issue.

 

 

DBA’s Do Work Separately from Developers

 

In a lot of companies this is the reality: DBA’s work separately from developers. A lot of organizations separate operational and   developmental (R&D) duties.  DBA’s responsibility is to keep a database operating within acceptable performance parameters.  The function of DBA is not just making sure there’s enough disk space and that the data is backed up.

 

DBA’s do lots of query analysis and tuning. One thing that may not be obvious in a database is that a query that runs fine today might not run the same way tomorrow. Why not ? Well as data is added to a database the statistics surrounding that data may change. This means queries that ran fine last night might not run the same way today. It is the job of the DBA to fix these problems ASAP when they are found.

 

So if the queries are all being fired from a client, what is a DBA to do? Are they going to ask a developer to make a change to production code and slip stream a version to 1000 desktops during production? Yah right! What they will do in a production environment is: change the offending stored procedure, test it and put it into production on demand. This is a much less intrusive mechanism for making a change during production.  Is this recommended? Not necessarily but it is a reality in a production world.

 

Another reality is that in a lot of cases DBA’s own their databases. Developers are required to live in the world of the DBAs and they dictate just how data is access from there database and who can access that data.  In a lot of cases DBAs will provide data to developers in the form of stored procedures, no questions asked.  If this is the case what do you do with your TDD process? As Clint Eastwood said in Heartbreak Ridge: “You adapt, you overcome”. Just because it’s hard doesn’t mean it can be done. TDD developers need to adapt there processes to this reality.

 

Without saying: Let me know whatcha think!

 

Rodman



Technorati Tags: , , ,

 



Comments

Jake Good said:

Though I'm happy that no argument was posed that stored procedures are useful as they are compiled/in-memory and run faster... I still want to comment on a few items that you posed (without reading the original story).

*Stored Procedures are a Requirement for Batch Processing*
Though I think we all agree that your problem is definitely one that a lot of systems can fall to... I don't think that stored procedures are the only answer. Intelligent software in code can accomplish similar tasks. This assumes that no sort of business logic or validation needs to take place.

Which brings up my next point:
*Stored Procedures as API*
Duplicating code or having business logic / validation in stored procedures breaks many accepted development practices. If you're using stored procedures to dump data, you better be doing some validation... and I don't really like the idea of having that logic in stored procedures. APIs means that there's encapsulation... which means some sort of logic that people can't see unless they know how to read / edit stored procs, another bad idea.

In the case you presented about credit card processing... it's a business rule that the credit card numbers have to be entered in correctly... this logic clearly needs to go in the middle layer AND/OR in the presentation layer. Otherwise we have lots of duplicate logic and a potential for multiple points of failure.

*Stored Procedures and TDD*
I would agree with your point that if your tests are setup properly, then there's no excuse. Test Scaffoldings work out really well. If your data access code is written well.. and your middle tiers are written well, then your tests should have an easy time proving it.

*DBA’s Do Work Separately from Developers*
I see the role of a DBA diminishing and being folded into development all the time. Our team of people working on a project continually does it's own DBA style stuff. Granted when you're in a high-demand... large data scenario, you'll want someone to optimize the data storage, but for a lot of applications... it's not the first thing you should be looking at.

Of course this is all rambling and my opinion... but you asked! :)
# June 21, 2006 1:26 PM

johnwood said:

>> Transferring data back and forth to a middle tier is not a wise idea on a number of levels the primary one being network performance.  <<
I"m not sure I buy the performance argument. If it's really an issue you can run the process on the same machine as the database and you'll just end up with IPC and nothing on the wire, and IPC can be pretty fast. Saying that I haven't done any actual performance comparisons between using cursors in an SP and cursors via IPC. The slight drawback in performance, IMO, is eclipsed by the advantages in being able to write your business logic against abstract business entities rather than rows of data.
# June 21, 2006 1:39 PM

Rob said:

RE: Jake
>If you're using stored procedures to dump data, you better be doing
>some validation... and I don't really like the idea of having that logic in
>stored procedures.

I keep seeing this assumption over and over - the assumption that all code will go through the middle tier you are creating and that all projects are new. This is very far from the reality of many IT environments. They already have existing DBs with existing SPs that contain business logic. Additionally, other legacy systems may already be using the database. Perhaps they need incomaptible (with the middle tier) systems to also work with the database. This ideal world where all the logic can exist purely as .NET code with inline SQL generated by some template is something that isn't always feasible.

Also, in shops where the DBA's are responsible for data integrity, pushing that job onto the .NET developers will never be allowed. And while you may find that perhaps it should be ideally, you still have to deal with systems that do not fit that model. The cost-benefit analysis will prevent those systems from being changed just because some TDD people prefer to put all the logic in *their* middle tier. Obviously, this approach is better suited to new projects, and won't work everywhere.

Again, the ideological and pure (and sometimes dogmatically religious) approach needs to be tempered with practicality and possibilty. That's where we need to be careful and separate the arguments for "should be", "can be", and "must be". I still think that people are viewing everything in the context of their tools/methodologies (in this case, TDD with templates/nHibernate), and that's coloring their reality. The danger then is that the methodologies and tools become more important than the projects and environments, which is a little self-defeating. If all you have is a hammer, everything looks like a nail.
# June 21, 2006 2:05 PM

Jake Good said:

I would agree that my arguments were tailored towards new projects...

I would also agree that its dangerous to view entirely from tools/methodologies...

The interesting part of the discussion is that... if there really were hard concrete, quantifiable facts that prove one way or the other (using stored procs)... we'd all be in the same boat. Maybe this topic will always be left up to an opinion! :)
# June 21, 2006 2:33 PM

mb said:

Check out iBATIS (http://ibatis.apache.org/) - gives you that clean break between the role of DBA and developer - map existing procs / queries to objects via an external definition file - very handy.

I agree that procs definitely have their place (batch processing, reporting, etc...) but really those should be external running / scheduled processes IMO
# June 21, 2006 3:48 PM

Sachin Rao said:

Tired old discussion.  I'm not sure if anything new came out of this blog.
# June 21, 2006 4:23 PM

Rob said:

Jake, absolutely, although rather than pure opinion, I'd like to say that it should probably depend on the realities and possibilities of the project and environment as well.
# June 21, 2006 6:14 PM

Joshua Flanagan said:

Rod - it seems like your argument is heavily based on "if stored procedures are being forced on you, then stored procedures are a viable solution."  I don't think anyone would argue that.
When people discuss preferences over whether to use stored procedures or not, there is an assumption that they have a choice. It is "what would I do, when given the chance to make a design decision."
So with that in mind, of course most of the arguments are going to make assumptions that projects are new, and all code can go through the middle tier (since you are designing it). That should not be surprising.
# June 22, 2006 8:51 AM

Eric Wise said:

In our environment things like credit cards get scrubbed on the UI layer and once again in the stored procedure.

Repeat business logic?  Certainly.  Safety for both in process and out of process calls? Definitely.
# June 22, 2006 10:53 AM

Adam Machanic said:

Stored procedures as API.  Exactly!

http://www.simple-talk.com/2006/06/06/to-sp-or-not-to-sp-in-sql-server-an-argument-for-stored-procedures/

To those who say otherwise, ask yourself a simple question:  When designing an object-oriented system, would you mark all of your internal properties as public?  By not using stored procedures, that's EXACTLY what you're doing with the database.

# June 22, 2006 10:55 AM

Adam Machanic said:

MB:

"I agree that procs definitely have their place (batch processing, reporting, etc...) but really those should be external running / scheduled processes "

You're saying that instead of using built-in scheduling tools, which most DBMSs ship with, that the wheel should be reinvented and developers should write their own offline processes?  That doesn't seem like an efficient use of time to me.  What benefit do you see?
# June 22, 2006 10:57 AM

Aaron Erickson said:

A nice dose of reality.  Thanks!

The bottom line is that the DBAs are usually seen as the most "agile" people in the entire IT shop much of the time.  A DBA can produce a report for the CFO that is not defined by one of the applications laying around in an hour (yes, it might be rough, no UI, etc).  In fact, ask any DBA about Ad-Hoc requests coming in from various departments, and you might get an idea why the DBA side of the house is considered to have "more" agility much of the time than does the dev house.

In no way, shape, or form, do I believe this is right or good.  But it is perception.  And it is the genesis of why the middle tier dev folks will never be the place that the DBAs are folded into for most organizations.
# June 22, 2006 12:04 PM

mb said:

Adam Machanic:

"You're saying that instead of using built-in scheduling tools, which most DBMSs ship with, that the wheel should be reinvented and developers should write their own offline processes?  That doesn't seem like an efficient use of time to me.  What benefit do you see?"

No - I believe the intent of this flurry of discussion is centered around the use of procs within applications - business object persistence, implications on TDD, etc... If I need to have a scheduled import to my database yes I'm going to use SQL Server's built-in scheduling and DTS services. If management wants to have XYZ reports then yes, I'm going to use SQL Server Reporting Services - no way in hell am I going to recreate the wheel if it exists for me already. So no, that's not what I was implying with my comment.

What I *was* implying was that there are certain aspects of most applications that are prime candidates for external tools to handle. Batch  credit card processing in the off hours, processing canned reports and emailing them off to the appropriate parties, hitting an FTP server for the latest remote updates, etc... These activities have nothing to do with my business logic and how my objects are supposed to behave and interact with one another, which is the driving functional area behind these latest blog posts.

I have seen the "I can update a stored procedure while an application is running and save the world!" notion fall on it's face more times than I care to remember. What if you forget to put your change back into source control? (assuming you're keeping your database scripts in source control in the first place) What if the change is localized to the site and doesn't apply to the common code base? What happens when you need to update that customer with a new release? Having one or two of these special cases to track might not be such a big deal but once one or two people do it, it catches on real quick - you end up with a mess in no time.

In the end I want the applications I create to be as loosely coupled and as easy to test as possible, and in my experience using an ORM (or similar) for the application components (not aux processes as noted above) has been the best way to achieve that. (again, my experience, YMMV)  In those situations where stored procedures cannot be avoided in an application component, such as working with a legacy database, I'll employ iBATIS to achieve my end goals.
# June 22, 2006 5:06 PM

Neil said:

What about shrink-wrapped applications that are sent to thousands of users without DBAs? I avoid stored procs because they are hard to update as compared with queries embedded in the source code. We do not want users touching the queries anyway, so stored procs make no sense for shrink-wrapped apps.
# June 25, 2006 8:21 AM

Adarsh Bhat said:

The lesser logic you have in the lower layers, the more flexible your overall application is. Stored procedures certainly do help when you know exaclty what is to be done. In the case of a application that expects heavy customization, this may not be such a great choice.
# June 29, 2006 1:07 PM

Patrick Wellink said:

Well, if you build your app and people start inserting records you have the possibillity to tune performance of a sproc. Usually users enter a lot more records than the testers test with. And so the performance of the DB will be different as well.

To have Stored procedures only is a very usefull feature cause I as a DBA can focus on peformance and tweak SP's to perform as best as they can.

Using INDEX hints and WITH NOLOCK options can greatly affect performance without touching the logic in any way. If all those queries are hard coded you cannot tune anything and you are left witha an untunable system.




# July 20, 2006 2:05 AM

RVBoy said:

The fact is that sometimes SP are the best and obvious choice and sometimes they are not. To start out saying "We always use SP" is Junk Science - the process of deciding the desired outcome then assembling convenient "evidence" to support it.

SP for everything, all the time? How can that be? Come on, Rod, tell us it ain't so.

# November 23, 2006 3:11 PM

mike bayer said:

if im working in a place where the DBAs are actively involved in the workings of an application, are pushing as hard as the development team to get things done and add features, then the stored procedure approach can be a dream....ive been on one project where this was the case.  it is a great model when you have the right people.

on the other hand, if you work in a place where the DBA's handle about 50 different databases for 50 different applications which they have no clue what theyre even used for, much less how to even log in (nor do they particularly care...they really do just want to do "backups and checking disk space") declaring that "all database access shall be via SP" will lead to almost instant immobility.  most places I've worked have this kind of DBA.  they want to give you your tables (but question every single new column add without bothering to understand the application), and then just have you go away.

so i think the question of "where do you put your business/application/persistence" logic, i.e. SP level or application, depends on largely on organizational factors and not technical ones (not to mention if youre in a shop that only uses MySQL)...

# January 5, 2007 6:29 PM

PohEe.com said:

I 100% agreed on using Stored Procedures. For my xp in Monster.com, we have a very dedicated role when designing Stored Proc. It save us alot of times esp on fine tuning our stored proc because all this job is done by our DB Engineer and not DBA

# January 12, 2007 10:36 AM

Leave a Comment

(required)  
(optional)
(required)  

Enter the numbers above:
Add
Check out Devlicio.us!

Our Sponsors

Free Tech Publications