CodeBetter.Com

Jeremy D. Miller -- The Shade Tree Developer

Under the hood and working with .Net, TDD, Software Design, and Agile Stuff

The Importance of Being Explicit

I've been burned several times lately by little defects that were indirectly caused by implicit, "read the tea leaves" style programming.  Here's an example of what I mean, taken from my work last week.

<Rule DayLimit="5" />

It's just an attribute in an XML configuration file that specifies configurable business rules, no big deal.  The problem was that a value of "0" in the DayLimit attribute was interpreted as a completely different business rule than a positive value.  Fortunately, an automated regression test picked up the forgotten requirement.  I think it would have eliminated some confusion if there had been a separate attribute for the "0" case, like AllowPostDating="False", to be more explicit.
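A minimal sketch of what reading the more explicit version might look like. The RuleSettings class, its Parse method, and the decision to treat a missing AllowPostDating attribute as "allowed" are all assumptions made up for illustration; the real configuration reader isn't shown in this post.

```csharp
using System;
using System.Xml;

// Hypothetical settings class; the names are illustrative, not the real system's
public class RuleSettings
{
    public int DayLimit;
    public bool AllowPostDating;

    public static RuleSettings Parse(string xml)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);
        XmlElement rule = document.DocumentElement;

        RuleSettings settings = new RuleSettings();
        settings.DayLimit = int.Parse(rule.GetAttribute("DayLimit"));

        // Assumption: when the attribute is absent, post dating is allowed
        settings.AllowPostDating = !rule.HasAttribute("AllowPostDating")
            || bool.Parse(rule.GetAttribute("AllowPostDating"));

        return settings;
    }
}
```

With the two concepts split apart, DayLimit is only ever a number of days, and nobody has to remember that one particular value switches on a different rule.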

I caused an edge case bug with a little bit of sloppy programming.  I was pulling data from some validation tables in the database into an array of objects that would be consumed by business logic classes.  In this case it was perfectly legal to have "child" rows even if the "header" row didn't exist.  For orphan records, I just created the header object and assigned a zero value to its Rate property.  In some places during business validation I would test whether Rate != 0 to determine if the business entity existed at the header level.  This was confusing but workable until some automated tests failed with false validation errors when existing header records had a rate of zero.  Once I understood the issue, I changed the object structure to make the existence test explicit.  The Implicit classes below show the original approach and the Explicit classes show the fix.

	public class ImplicitRecord
	{
		private decimal _rate;

		public decimal Rate
		{
			get { return _rate; }
			set { _rate = value; }
		}
	}

	public class ImplicitBusinessClass
	{
		public void Process(ImplicitRecord record)
		{
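			// Implicit check: a zero Rate is assumed to mean "no header row exists",
			// which wrongly flags real header records whose rate is legitimately zero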
			if (record.Rate == 0)
			{
				// create an error message
			}
		}
	}

	public class ExplicitRecord
	{
		private decimal _rate;
		private bool _hasRate;

		public decimal Rate
		{
			get { return _rate; }
			set { _rate = value; }
		}

		public bool HasRate
		{
			get { return _hasRate; }
			set { _hasRate = value; }
		}
	}

	public class ExplicitBusinessClass
	{
		public void Process(ExplicitRecord record)
		{
			// Explicit check: HasRate says whether the header row exists,
			// regardless of the value stored in Rate
			if (!record.HasRate)
			{
				// create an error message
			}
		}
	}
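An alternative sketch, not the fix described above: nullable value types in C# 2.0 let the property itself carry the "missing" state, so there is no separate flag to keep in sync. NullableRecord and NullableBusinessClass are hypothetical names, and returning a bool from Process is a simplification so the result is easy to see.

```csharp
using System;

// Hypothetical variant of ExplicitRecord: a null Rate means "no header row"
public class NullableRecord
{
    private decimal? _rate;

    public decimal? Rate
    {
        get { return _rate; }
        set { _rate = value; }
    }
}

public class NullableBusinessClass
{
    // Returns true when the record passes the existence check
    public bool Process(NullableRecord record)
    {
        // A rate of zero is a real value; only a missing rate fails the check
        return record.Rate.HasValue;
    }
}
```

The trade-off is that every consumer now has to deal with a nullable Rate, which may or may not be more pleasant than a separate HasRate property.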

One of the scariest, most error prone idioms in all of software development is passing around a Hashtable or ArrayList full of other Hashtable or ArrayList objects.  How many pernicious bugs have been caused by fouling up the key values to the Request, Session, and QueryString collections?  Assuming you have a choice in the matter, which class below would you rather consume based on its public API?  The answer is ExplicitActionClass unless you're being perverse just to spite me.  Remember that other developers will follow behind you, so code to reduce the probability of their mistakes.  Make the public API easy to use and intention revealing.

	// Hashtable and ArrayList live in System.Collections
	using System.Collections;

	public class HashtableActionClass
	{
		public ArrayList Execute(Hashtable arguments)
		{
			// unwrap things in the arguments hashtable and perform work
			string userName = (string) arguments["USER_NAME"];
			decimal purchaseAmount = (decimal) arguments["PURCHASE_AMOUNT"];

			// perform work and return an answer

			return new ArrayList();
		}
	}

	public class ReturnClass
	{
	}

	public class ExplicitActionClass
	{
		public ReturnClass Execute(string userName, decimal purchaseAmount)
		{
			ReturnClass returnValue = new ReturnClass();

			// perform work and log results to the returnValue variable

			return returnValue;
		} 
	}
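To show how the Hashtable version goes wrong, here is a small sketch of a caller that misspells a key. CallSiteDemo and its methods are hypothetical names; only the "PURCHASE_AMOUNT" key comes from the code above.

```csharp
using System;
using System.Collections;

public static class CallSiteDemo
{
    // Mirrors the lookup in HashtableActionClass: the key is just a string
    public static decimal ReadAmount(Hashtable arguments)
    {
        return (decimal) arguments["PURCHASE_AMOUNT"];
    }

    // Returns true if the misspelled key blows up at runtime
    public static bool TypoFailsAtRuntime()
    {
        Hashtable arguments = new Hashtable();
        arguments["PURCHASE_AMT"] = 19.95m; // misspelled key, compiles without complaint

        try
        {
            ReadAmount(arguments);
            return false;
        }
        catch (NullReferenceException)
        {
            // The Hashtable returns null for a missing key,
            // and unboxing null to a decimal throws
            return true;
        }
    }
}
```

The same mistake against ExplicitActionClass.Execute(string userName, decimal purchaseAmount) can't happen, because there is no key to misspell and a missing argument is a compiler error.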

I've never coded in C++, but I'm guessing that passing around pointer structures between objects led to some truly wicked bugs.

Evil Databases

Ambiguous database designs can cause even more damage.  I've often run across database tables whose columns represent very different conceptual things depending upon the values in another column.  I know there is a little bit of database inefficiency in having a bunch of columns with lots of null data, but I'd still much rather have separate columns for separate logical concepts and pieces of information.
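As a rough illustration, here is how the two table designs might surface in code that reads them. The table shape, the "R" and "F" type codes, and every class name below are invented for this sketch and don't come from any real schema.

```csharp
using System;

// Hypothetical row where Amount means different things depending on TypeCode
public class OverloadedRow
{
    public string TypeCode;   // "R" = Amount is a rate, "F" = Amount is a flat fee
    public decimal Amount;
}

// Hypothetical row with one nullable column per logical concept
public class ExplicitRow
{
    public decimal? Rate;
    public decimal? FlatFee;
}

public static class RowReader
{
    // Every consumer of OverloadedRow has to repeat this interpretation correctly
    public static decimal? RateOf(OverloadedRow row)
    {
        return row.TypeCode == "R" ? row.Amount : (decimal?) null;
    }

    // With separate columns there is nothing to interpret
    public static decimal? RateOf(ExplicitRow row)
    {
        return row.Rate;
    }
}
```

The first RateOf is exactly the kind of "interpretation" logic that ends up copied into every system that reads the table.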

I've been in several situations where the only way to integrate two or more systems was to directly peek into another system's underlying database.  This is fraught with so much risk that it's borderline insane.  It's risky because you're duplicating a lot of the logic required to "interpret" the business meaning in the underlying data.  On one hand you have to correctly reproduce the interpretation, and on the other hand you must keep the duplicated logic synchronized through later changes.  The synchronization just isn't going to happen because the different codebases are probably being built and tested separately.  Any change in the database schema becomes risky unless you have coordinated, large-scale automated testing on every downstream system or some form of compile-time check keeping the applications synchronized with the database.

At a previous employer we had an absolutely humongous Operational Data Store (ODS) that contained answers to every question.  Many applications in the enterprise touched this monster.  A much greener, more idealistic version of myself foolishly tried to give the ODS team a suggestion to optimize a terribly sluggish view I needed to access.  I basically had my ear chewed off and was told that a change like that would take 6 months of regression testing.  I ended up creating a polling mechanism to cache the very important, extremely volatile data every 15 minutes just so our application could function with any kind of decent performance.  The biggest usage of our user interface turned out to be a report on the cached data that we'd thrown in at the last minute as a "nice to do" feature request.  We almost had to add another web server to the farm just because of the demand for this data that was too difficult to pull out of the ODS.

I think that people are dangerously overexuberant about SOA in general, but prudent usage of SOA should eliminate the need for duplicating the fragile database sharing crap, and that sounds pretty darn good to me (using web services as a thin pass-through data layer instead of raw ADO.NET or JDBC seems rather foolish to me though). 

I feel that coding explicitly is orthogonal to the Static versus Dynamic Typing debate.  "Duck Typing" is one thing, but hidden meaning in code is just plain bad coding.


Published Aug 24 2005, 01:00 AM by Jeremy D. Miller
Comments

BlackTigerX said:

I actually just saw this a few minutes ago. In one database there is this field "BilledCorrect", and I was told that 1 means correct and 2 means incorrect

=o|
# August 23, 2005 6:02 PM

Sahil Malik said:

See, and when I'm explicit everyone bitches at me.
# August 23, 2005 6:07 PM

Jeremy D. Miller said:

Gary,

Reflection abuse vs. Hashtable of Hashtables? I'm not sure which one wins the "who's more evil" smackdown. I actually was trying *not* to rant and say something useful for once here on CodeAdequately.com.

I, of course, have already ranted a little bit about abusing reflection here--> http://codebetter.com/blogs/jeremy.miller/archive/2005/06/29/130090.aspx

I don't remember what system I was looking at that day, but I bet you can make a good guess.
# August 23, 2005 7:39 PM

Joshua Flanagan said:

You caught my attention with the paragraph in passing about fragile database sharing crap. It is an issue very dear to me right now as I try to figure out an integration strategy for an application that contains a lot of data other people want. I agree SOAP would be overkill, but I wonder what you mean by "raw ADO.NET"? Does that mean giving other users/apps a database username/password to use via ADO.NET? Isn't that fragile database sharing? I'm really trying to avoid opening up the DB to be used whenever and however by people that get an account.
Right now, I'm leaning towards flat file extractions being dumped to a network share, for all to consume as they see fit. It seems so ancient and low-tech, but somehow elegant and "future proof".
# August 23, 2005 8:53 PM

darrell said:

If you have lots of null columns, you don't really have a normalized database. Those should be put into a foreign key table with a "type" column.

Duck typing is something made up by the Pragmatic Programmers. It is what the REAL intention behind typing is, not the watered down "this is how we implemented it in [a given statically typed language]." But I would never pass in a Hashtable so that my object messages were one parameter. I'd pass in an object on which I could call known methods, dynamic or static doesn't matter.
# August 24, 2005 9:27 AM

Jeremy D. Miller said:

Darrell,

There are always exception cases to depart from a normalized database structure. In a logic-intensive application the database should be designed around object persistence. I was thinking specifically about the design of a database table to store an inheritance hierarchy of classes. Check out Fowler's Single Table Inheritance pattern from the PEAA book. The way you're suggesting to design tables to store the inheritance relationship is only one option out of many. If a denormalized database makes persistence easier without hurting performance, then normalization is *not* important.


# August 24, 2005 10:29 AM

Jeremy D. Miller said:

Josh,

I just meant that I would never put an unnecessary SOAP layer between an application and its database. A couple years ago some of the clowns on your "Architecture" team thought it was a good idea to make every application access data through SOAP web services no matter what the situation. Thank God we had no real power then. I wouldn't *ever* allow any other application to connect directly to a production transactional database. I'm not wild about the file extracts because of file locking concerns, but I think you're on the right path. DTS to push the data extracts into a reporting DB maybe? I also don't like the integration by polling against a reporting DB strategy. That sucked royally in production.
# August 28, 2005 11:00 PM

About Jeremy D. Miller

Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy previously worked as a systems architect building mission critical supply chain software for a Fortune 100 company and learned agile development practices as a .Net consultant at ThoughtWorks, one of the pioneers of agile development. Jeremy is the author of the open source StructureMap (http://structuremap.sourceforge.net) tool for Dependency Injection with .Net and the forthcoming StoryTeller (http://storyteller.tigris.org) tool for supercharged FIT testing in .Net. Jeremy's thoughts on just about everything software related can be found on his weblog "The Shade Tree Developer" at http://codebetter.com/blogs/jeremy.miller, part of the popular CodeBetter site. Jeremy is a Microsoft MVP for C#.
