CodeBetter.Com

Jeremy D. Miller -- The Shade Tree Developer

Under the hood and working with .Net, TDD, Software Design, and Agile Stuff

December 2006 - Posts

  • Jeremy in Java Land

    I'm coding in Java this week and next!  ...and I'm a bit uncomfortable.  The problem isn't really Java per se, it's mostly unfamiliarity with the existing architecture and the different toolset.  Here's what I think so far:

    • The language difference is...    ...not worth talking about.  I haven't coded in Java since 2001 and I'm not having any particular problem with the language because...
    • Jumping from ReSharper to IntelliJ and vice versa goes a long ways towards smoothing the transition.  ReSharper shortcuts don't entirely map 1 to 1 with IntelliJ, but close. 
    • I spoke too soon.  I'm definitely slower in Java.
    • IntelliJ is the Cadillac/Lexus/Mercedes of IDEs.  This pretty well follows for any JetBrains product, but IntelliJ is the flagship.  There's a heckuva lot of stuff packed in there out of the box that you have to piece together with VS.  The Visual Studio team has heard this before, but one more time -- "just make it like IntelliJ."  I know VS's feature set is primarily driven by the Mort persona, but I bet Mort would want IntelliJ stuff if he/she were familiar with IntelliJ's feature set.
      • More accurately, I don't care about a lot of the stuff that VS has that IntelliJ is missing, but the reverse is not true.
    • I think Visual Studio looks better though.  Considering the fact that we stare at those things all day, that's not trivial.
    • JUnit 3 sucks in comparison with NUnit 2+.  Not even getting into the expanded stuff in MbUnit, just the basic mechanics are clumsy to me.  I've heard that the newest version of JUnit brings them into parity with NUnit.
    • Unfortunately, the architecture of the code is not conducive to using TDD.  I'm actually driving the Java service through .Net FitNesse tests running from NUnit that go through the .Net gateway classes on the client.  Ouch.  It's easy to get the Java service up in debugging mode, but this isn't an ideal way to develop.  The green bars are fewer and farther between.  I'm getting uncomfortably familiar with the IntelliJ debugger.  Yes Virginia, TDD is faster -- even in the short run.
    • I don't particularly like CVS.  Not the out and out disdain I feel towards Visual UnSafe, but I still want Subversion.  More accurately put, CVS doesn't like me.
    • We're doing some interoperability between .Net and Java by sending xml messages (don't ask).  Xml serialization from Data Transfer Objects on our (.Net) end and JAXB on the Java server side both working off shared XSD's.  I'm gonna call this one a draw.  JAXB seems to give us more control over the XSD compliance, but it's clumsy to use. 
    • To the guys who built JAXB, would you please go read up on the Law of Demeter?
    • My client's naming convention calls for private members to start with "_" like "_doSomething(arg1, arg2)."  My VB background pokes out here as I sometimes struggle to stop capitalizing everything in code.  I use upper case for publics and lower case for private and protected members in C# (with underscores for class fields).  No big deal.
      • I've always thought that you can look at C# code and tell whether the developer came from a VB or Java background pretty easily by their capitalization in the code.  Of course, you can spot a COBOL (insanely long monolithic methods) or a Fortran (didn't get the memo that output parameters are evil) guy in any language.  You don't have to spot Smalltalkers because they'll always let you know on a near daily basis.
    • The declared exception thing is annoying.  I really don't see it doing anything but cluttering up the code personally.  It's what you're used to I guess, because my client's Java guys think that C# not having checked exceptions is a shortcoming.  I suspect that declared exceptions lead to using exceptions as a flow control device and I think that's mildly evil.
    • Could we please get an industry standard for the shortcut keys for "step into", "step over", and "step outside"?  I swear that every IDE built starts by saying, let's take the debugging shortcut keys from product X and move them all over one key to the left.
    • Every time I see GregorianCalendar or JulianCalendar in Java I want to start chanting.  And what's up with having a dozen different classes for dates and times?  One class too simple?
    • I'm going to ask for a raise now that I'm multilingual;)

     

    End Result:

    I'm happy to stick to .Net & C#, but it's been fun just to do something different and see how the other half lives.  Now, if I could just get on a Ruby or Python or Squeak or Haskell or Erlang project...

  • Are Code Statistics Useful?

    My team had a conversation about code statistics at lunch the other day, specifically about Cyclomatic Complexity numbers.  The gist of the conversation was whether or not it was important or useful to regularly run these metrics on your code.  My coworker's point, which I *mostly* agree with, is that he would refactor a class well before it got to a high CC number by simple inspection.  As in, he doesn't need a tool to "know" when a class is getting too big or taking on too many responsibilities. 

    I would agree completely (and I think you should be able to spot bad code visually anyway), except for the old parable about putting a frog in boiling water.  If you drop a frog into boiling water he jumps right out.  If you put a frog in cool water, then gradually heat the water to boiling, he won't jump out.  Classes can be the same way as they slowly accrete yet one more function.  A CC number is an awfully quick warning that you're going out of bounds.  Running NDepend against StructureMap pretty well confirmed to me some weak spots in the class structure and spurred me to make some refactorings that cleaned up the internal structure.

    Frank Kelly has a post that pretty well expresses my opinions on metrics.  But in brief, even though I routinely forget to run NDepend, I think:

    • Metrics should only inform, never take the place of constant attention to code structure and visual awareness of that structure
    • Metrics are very useful when you inherit someone else's codebase
    • By the way, this post and conversation was spawned by a codebase I was looking at (no names) that had some monster classes.  When I ran some CC numbers I came up with a half dozen classes over 200 in Cyclomatic Complexity.  20 is the rule of thumb for the upper threshold of a class, just in case you're not familiar with the CC number.  200 screams for refactoring.
     
  • Downcasting is a Code Smell

    Before you go any farther, a Code Smell simply implies that there *might* be something wrong with your code and that you *definitely* need to evaluate your code. 

     

    Downcasting, in my opinion, is a code smell.  Specifically, I think code like this below is smelly: 

            public void Filter(IFilter filter, object[] targets)
            {
                if (filter is StringValueFilter)
                {
                    StringValueFilter stringValueFilter = (StringValueFilter) filter;
                    // call a specific member on the StringValueFilter
                }
            }

    There are a couple of different problems that the downcasting smell might be exposing:

    • Your abstraction is leaky and needs to be refactored or eliminated.  In the case above, everything should be accomplished through the IFilter interface.
    • You've violated Tell, Don't Ask, which is another way of saying that you're not properly encapsulating functionality.  Encapsulation is important for a couple reasons.  First, encapsulation makes code easier to use because there is less stuff necessary to know or do to make it work.  Secondly, poor encapsulation can greatly increase duplication throughout your code base.

    To avoid downcasting, you might take a look at:

    • The Information Expert pattern from Craig Larman, or
    • Use Double Dispatch.  Double Dispatch used to feel really odd to me, but I've used it in a couple places in the last year where it has streamlined quite a bit of hacky if/then logic.
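
    As a rough sketch of the Double Dispatch option, here is one way the type check and cast could be traded for two ordinary virtual calls.  Everything other than the IFilter and StringValueFilter names is made up for illustration (RangeFilter, IFilterVisitor, FilterDescriber, and all of the members), so treat it as a shape to adapt rather than a recipe.

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration only.
public interface IFilterVisitor
{
    // Second dispatch: the overload is picked by the concrete filter type.
    void Visit(StringValueFilter filter);
    void Visit(RangeFilter filter);
}

public interface IFilter
{
    // First dispatch: callers only ever see IFilter.
    void Accept(IFilterVisitor visitor);
}

public class StringValueFilter : IFilter
{
    public readonly string Value;
    public StringValueFilter(string value) { Value = value; }

    // "this" is statically a StringValueFilter here, so no cast is needed.
    public void Accept(IFilterVisitor visitor) { visitor.Visit(this); }
}

public class RangeFilter : IFilter
{
    public readonly int Low;
    public readonly int High;
    public RangeFilter(int low, int high) { Low = low; High = high; }

    public void Accept(IFilterVisitor visitor) { visitor.Visit(this); }
}

// One consumer of the filters; it never asks "what type are you?".
public class FilterDescriber : IFilterVisitor
{
    public readonly List<string> Lines = new List<string>();

    public void Visit(StringValueFilter filter)
    {
        Lines.Add("value = '" + filter.Value + "'");
    }

    public void Visit(RangeFilter filter)
    {
        Lines.Add("between " + filter.Low + " and " + filter.High);
    }
}
```

    The calling code just loops over IFilter instances and calls Accept() with a visitor.  The price is that adding a new filter type means touching the visitor interface, which is exactly the compile-time nudge the if/then chain never gave you.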
  • You know it's a bad day when...

    ...merely changing the exception being thrown is a minor victory

    I do so enjoy working with someone else's very elaborate code for the first time. 

  • Want Testable Code? Be Careful with Static State

    This post isn't really a rant, more of a warning.  I really, really think that testability should be a first class consideration for doing software design.  From experience, retrofitting an existing codebase with automated testing is problematic.  One of the worst culprits has consistently been dealing with application state.

    Yeah, I know that I've written this post before, but one more time. Static state in your application can potentially cripple your ability to do automated testing.  The only way to create reliable automated tests is to completely control the inputs coming into the test.  Cached data really needs to be controlled as well to keep tests from being skewed by previous tests.  In order to be reliable and maintainable, automated tests should be isolated from one another and independent of the order in which the tests are executed.

    Yes, you most certainly have legitimate needs for static members, singletons, and cross request state.  Just remember to leave yourself an easy way to flush that state or replace those singletons on demand inside automated tests so that you can reliably establish a known starting state with known inputs. It might mean a little more code to write, and maybe even additional complexity.  A little more coding in exchange for easier testing can lead to shipping that same code earlier.  It's an awfully good idea to keep your eyes on the bigger picture.
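
    To make the "leave yourself a hook" advice concrete, here is a minimal sketch of a static cache that can be flushed and re-pointed from a test.  RateCache, IRateSource, and StubRateSource are invented names, and the sketch ignores thread safety entirely.

```csharp
using System.Collections.Generic;

// Hypothetical example types; not from any real codebase.
public interface IRateSource
{
    decimal LookupRate(string currency);
}

public static class RateCache
{
    private static readonly Dictionary<string, decimal> _rates =
        new Dictionary<string, decimal>();
    private static IRateSource _source;

    // Application startup (or a test) plugs in the source it wants.
    public static void UseSource(IRateSource source)
    {
        _source = source;
        Reset();
    }

    // The hook: flush cached state so the next caller starts clean.
    public static void Reset()
    {
        _rates.Clear();
    }

    public static decimal GetRate(string currency)
    {
        decimal rate;
        if (!_rates.TryGetValue(currency, out rate))
        {
            rate = _source.LookupRate(currency);
            _rates[currency] = rate;
        }
        return rate;
    }
}

// A stub a test might substitute for the real source.
public class StubRateSource : IRateSource
{
    public decimal Rate;
    public int Calls;

    public StubRateSource(decimal rate) { Rate = rate; }

    public decimal LookupRate(string currency)
    {
        Calls++;
        return Rate;
    }
}
```

    A test fixture would call RateCache.UseSource(...) or RateCache.Reset() in its SetUp so that no test ever sees a value cached by the test before it.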

    O/R mappers can be particularly problematic because they often implement some sort of stateful cache beneath the covers.  We were getting burned today because we were resetting the database state of some domain objects directly to the database in the SetUp of a FitNesse suite.  The service had a cached version from the previous test and confusing errors ensued.  In my previous job we had a wrapper around our NHibernate access that had a "ResetSession()" method that we could call between tests to tear down the stateful NHibernate caches to guarantee a clean slate for each test.  We left ourselves a grammar in a FitNesse fixture that we could use to flush all of the cached data between test runs.  I would strongly recommend leaving yourself some sort of hook to do exactly that.

    I've never employed this tactic much, but going to randomized data might be a good idea inside of test data.  In our case, I think we could make our problems largely go away by using a bit fancier FitNesse machinery to grab new surrogate key values inside our tests to avoid test data collision.  The test mechanics get a little bit harder, but I'll gladly take a bit more coding in return for tests that can run back to back.

    More:

    Chill out on the Singleton Fetish

    TDD Design Starter Kit - Static Methods and Singletons May Be Harmful

    *A* way to get around the Singleton problem without trashing your testability and creating strong coupling:  http://structuremap.sourceforge.net/SingletonInjection.htm
     

  • Introduction to using the StoryTeller Alpha #1

    The first ship is away!  The first ship is away!   Download the StoryTeller v0.50 at http://storyteller.tigris.org/servlets/ProjectDocumentList?folderID=0.

    At Scott Bellware's request I'm making an alpha release of StoryTeller today along with a little introduction to the capabilities that are built into StoryTeller so far.  If you're willing to forgo the Wiki portion of FitNesse and edit HTML in an editor, StoryTeller is ready to go.  For those of us with an investment in FitNesse, there are also some tools in StoryTeller to supplement the usage of FitNesse to smooth out some of the kinks of FitNesse.  Just to be clear, StoryTeller still uses the exact same .Net version of the FIT engine used in FitNesse (http://fitnessedotnet.sourceforge.net).   The only caveat is that StoryTeller will be .Net 2.0+ from the outset and FitnesseDotNet is not yet ready to make that leap.  StoryTeller might temporarily come with a forked version of the FIT engine that we use on my current project that has added support for .Net 2.0 specific features (generic types, nullable types, and enums).  Here's a summary of the functionality in this release:

    • Organize FIT tests for an application by a nested Suite hierarchy
    • SetUp & TearDown support
    • Run tests inside of Visual Studio.Net
      • Pull the test definition from a FitNesse (or any other web) url
      • Embed Wiki text inside your code
      • Define the test and table structure programmatically
    • A drop in replacement for FitNesse's TestRunner.exe
    • Run tests from NAnt
      • by tag
      • from FitNesse files
    • Run tests in a clean AppDomain
    • The UserInterface is included, but all it does at this point is run a single test at one time.  If you want to play with the client, I suggest that you do it from the VS.Net solution to avoid changing configuration.

     

    I am assuming a basic familiarity with FIT testing for the rest of the post.  See FitNesse.org for more background on FIT concepts.

     

    Domain Terms

    StoryTeller's core domain model is a composite pattern consisting of classes implementing the ILeaf interface.  Many aggregate operations take advantage of the Composite & Visitor Pattern combination (more on that later).  The individual classes/terms are:

    • Table - Stores and structures the information for a single html table within a FIT test
    • Row - A row of cells inside a Table
    • Cell - A single cell within a Table row
    • Comment - A piece of freeform text within a Test.  At this point, StoryTeller does not support any kind of markup inside of Comments.
    • Test - Represents a logical Test (duh).  Contains a mixture of Comment's, Table's, and Include's
    • Fragment - A subclass of Test.  Represents a reusable "fragment" of FIT html that can be included inside Test's.  StoryTeller follows the same convention as FitNesse for "SetUp" and "TearDown" fragments
    • Include - a reference inside a Test to a Fragment object
    • Suite - A related group of Test's.  Can also include other child Suite's in an n-deep composite structure
    • SystemUnderTest - The logical "System" being tested.  Instances of the ISystemUnderTest interface know how to persist and retrieve their test data, and execute tests. 
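
    For readers who haven't used the Composite and Visitor combination, here is a generic sketch of the idea.  It is not StoryTeller's code: INode, SuiteNode, TestNode, INodeVisitor, and TestCounter are stand-in names I've made up, and the real ILeaf interface may look nothing like this.

```csharp
using System.Collections.Generic;

// Stand-in types for illustration; not the actual StoryTeller classes.
public interface INodeVisitor
{
    void VisitSuite(SuiteNode suite);
    void VisitTest(TestNode test);
}

public interface INode
{
    void Accept(INodeVisitor visitor);
}

// A leaf in the composite.
public class TestNode : INode
{
    public readonly string Key;
    public TestNode(string key) { Key = key; }

    public void Accept(INodeVisitor visitor) { visitor.VisitTest(this); }
}

// A composite: holds tests and child suites to any depth.
public class SuiteNode : INode
{
    public readonly string Key;
    private readonly List<INode> _children = new List<INode>();

    public SuiteNode(string key) { Key = key; }

    public SuiteNode Add(INode child)
    {
        _children.Add(child);
        return this;
    }

    // Visit this suite, then walk every child recursively.
    public void Accept(INodeVisitor visitor)
    {
        visitor.VisitSuite(this);
        foreach (INode child in _children)
        {
            child.Accept(visitor);
        }
    }
}

// One aggregate operation written as a visitor.
public class TestCounter : INodeVisitor
{
    public int Suites;
    public int Tests;

    public void VisitSuite(SuiteNode suite) { Suites++; }
    public void VisitTest(TestNode test) { Tests++; }
}
```

    The appeal is that each new aggregate operation (counting, filtering by tag, collecting results) becomes one more visitor class, and the tree-walking code is written once.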

     

    File Structure

    In its default mode, StoryTeller structures tests into a hierarchical file structure.  The top folder is the "SystemUnderTest" folder.  Inside this folder are folders for the top level Suite's and a FixtureLibrary file.  Roughly, a system folder is going to look like this:

    • Root
      • FixtureLibrary.xml
      • Suite #1
      • Suite #2

    Inside the source tree is a folder called \TestData\SimpleTestApplication that can be used as an example.

    FixtureLibrary

    One of the changes from FitNesse to StoryTeller is the manner in which Fixture classes are aliased inside the FIT tables.  My current concept is to have an object that represents the entire aliasing of Fixture classes as well as information related to editing Fixture tables by Fixture type.  At this point, all the FixtureLibrary class supports is registering all of the concrete Fixture types within a single assembly into the FIT engine.  The point being that if you have a Fixture class called "MyAssembly.Namespace1.Namespace2.Namespace3.MyFixture," you can reference it in FIT tables as simply "MyFixture" instead of the full name that might scare off your testers.  Create a file called "FixtureLibrary.xml" in the root directory of the system to list the assemblies that contain FIT Fixture classes like this:

    <?xml version="1.0"?>
    <FixtureLibrary xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <Assemblies>
        <Assembly>StoryTeller.TestFixtures1</Assembly>
        <Assembly>StoryTeller.TestFixtures2</Assembly>
      </Assemblies>
    </FixtureLibrary>

    It's simply an Xml-serialized memento of the FixtureLibrary class.

     

    Suite

    At this point StoryTeller requires a file named "Suite.xml" in each Suite folder that contains the Suite "Key" (same as the folder name).  That file looks like this:

    <?xml version="1.0"?>
    <Suite>
      <Key>Grandchild</Key>
      <Tags />
    </Suite>

    I'll explain the tagging later.

    The Suite folder might look like this:

    • Parent Suite
      • Child Suite #1
      • Child Suite #2
      • Test1.htm
      • Test2.htm
      • Setup.fragment
      • TearDown.fragment
      • DataSetup.fragment 

     

     

    Test 

    I do think I'll create a mode later that lets you save tests in the FitNesse Wiki text format, but for right now the tests are stored in minimal XHTML (StoryTeller parses the file as XML!) files with an "htm" extension like this one:

    <html>
        <head>
            <!-- Test "Key" should be in the title node -->
            <title>The First Test</title>
        </head>
        <body>

            <h2>The First Test</h2>

            <!-- The tags are read from any <li> node with text of form {key} = {value} -->
            <h4>Tags</h4>
            <ol>
                <li>category = regression</li>
                <li>category1 = regression1</li>
            </ol>

            <!-- Comments need to be nested inside <p> tags -->
            <p>Some comment</p>

            <!-- Tables are in <table> elements (surprise), all formatting will be ignored -->
            <Table border="1">
               <tbody>
                   <TR>
                       <td>A</td>
                       <td>B</td>
                   </TR>
                   <tr>
                       <TD>C</TD>
                       <td>D</td>
                   </tr>
               </tbody>
            </Table>

            <p>Another comment</p>
        </body>
    </html>
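
    The "{key} = {value}" convention for tag list items noted in the sample's comments is simple enough to sketch.  This is only my guess at a reasonable parsing rule (split on the first "=", trim both sides); TagParser is a hypothetical helper, not a class in StoryTeller.

```csharp
// Hypothetical helper for illustration; not part of StoryTeller.
public static class TagParser
{
    // Returns true and the trimmed key/value when the text looks like
    // "key = value"; returns false (with nulls) for anything else.
    public static bool TryParse(string text, out string key, out string value)
    {
        key = null;
        value = null;
        if (text == null) return false;

        // Split on the first '=' only, so values may contain '='.
        int index = text.IndexOf('=');
        if (index <= 0) return false;

        string candidateKey = text.Substring(0, index).Trim();
        string candidateValue = text.Substring(index + 1).Trim();
        if (candidateKey.Length == 0 || candidateValue.Length == 0) return false;

        key = candidateKey;
        value = candidateValue;
        return true;
    }
}
```

    List items that don't fit the pattern are simply skipped by the caller, which keeps ordinary bulleted text in a test from being misread as a tag.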

     

    Fragment

    Fragments are formatted exactly like Test's, but saved to a ".fragment" extension.  In the course of writing this post I realized that I don't yet support reading and writing Include references to the test html.  I think that this will be added soon.

     

    Tagging

    You can add tags to both Suites and Tests to organize and find Tests by category.  Tests and child Suites inherit tags from their parent Suites if not explicitly defined in the Test.  Some usages for tagging might be:

    • The FIT workflow that I try to use is to define FIT tests that are "in flight" as Acceptance tests, and FIT tests from previously completed stories as Regression tests.  The primary reason being that failures in Regression tests will cause a FIT run to fail, while Acceptance tests will not.
    • Maybe you group Tests into Suites by component or feature.  You might use tags to further categorize the tests by user story or task.
    • Assigning Tests to individuals.  You might tag Tests to indicate to the Testers that the tests are incorrect, or the Testers might tag a Test to let developers know that the Test is not complete and safe to ignore.
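
    The inheritance rule above (a Test or child Suite picks up its parent's tags unless it defines its own) can be sketched as a walk up the parent chain.  TaggedNode and its members are hypothetical names for illustration and are not StoryTeller's actual API.

```csharp
using System.Collections.Generic;

// Hypothetical type for illustration; not the StoryTeller implementation.
public class TaggedNode
{
    private readonly TaggedNode _parent;
    private readonly Dictionary<string, string> _tags =
        new Dictionary<string, string>();

    // Pass null for a top level Suite.
    public TaggedNode(TaggedNode parent)
    {
        _parent = parent;
    }

    public void SetTag(string key, string value)
    {
        _tags[key] = value;
    }

    // Returns the nearest explicitly defined value, starting at this node
    // and walking up through the parents; null if nobody defines the tag.
    public string GetTag(string key)
    {
        for (TaggedNode node = this; node != null; node = node._parent)
        {
            string value;
            if (node._tags.TryGetValue(key, out value))
            {
                return value;
            }
        }
        return null;
    }
}
```

    With this shape, tagging a whole Suite as "category = regression" covers every Test beneath it, and a single in-flight Test can still override the value locally.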

     

    NAnt Tasks

    I've built a couple of custom NAnt tasks as entry points to the StoryTeller engine.  If there's any demand I'll add MSBuild equivalents as well.  To use any of these tasks, StoryTeller.dll and StoryTeller.Tasks.dll must be reachable by NAnt.  Make it easier on yourself and just copy the dll's into the same folder as the NAnt executable.  Right off the bat, if you're using FitNesse, here's an easy way to run FitNesse tests from a NAnt build inside of CruiseControl.Net.

        <storyteller.runfitnesse
          failonerror="false"
          systemName="Fitnesse Test Run"
          description="Regression Tests"
          fixtureLibraryFile="${fixture.library.file}"
          binaryFolder="${test.app.dir}"
          testFolder="${fitnesse.dir}"
          outputFile="${results.dir}\RegressionTestsOutput.htm"
          />

    "testFolder" is the directory containing the FitNesse tests you want to run.  "binaryFolder" is the directory that contains the binary assemblies and configuration files.  The task works by first importing the FitNesse tests into the StoryTeller internal structure, then executing the tests in a new AppDomain with the ApplicationBase pointing to the "binaryFolder."  StoryTeller.dll must be deployed to the "binaryFolder."

    Import an existing FitNesse test folder to StoryTeller with:

        <storyteller.importfitnesse
          fitnesseFolder="${fitnesse.dir}"
          destinationFolder="${imported.test.dir}"
          />

      

    Finally, you can run actual StoryTeller test structures with:

        <storyteller.testbatch
          description="All Tests"
          outputFile="${results.dir}\RunAllTests.results.htm"
          binaryFolder="${test.app.dir}"
          testFolder="${sample.test.dir}"
          />

        <storyteller.testbatch
          description="All Tests"
          outputFile="${results.dir}\RunWithTags.results.htm"
          binaryFolder="${test.app.dir}"
          testFolder="${sample.test.dir}">
            <tag category="Story" value="Story1"/>
        </storyteller.testbatch>

     

     

    In writing this up I noticed that I haven't added support for running a batch of tests by Suite. That will be added in the next alpha.

    Running FitNesse Tests from NUnit

    Finally, the one feature, as simple as it is, that will make even the most diehard FitNesse fan use StoryTeller -- running FitNesse tests from within NUnit tests to debug or just to run the FitNesse tests locally without deploying the code or importing the FitNesse tests.

    Run a FitNesse Test Locally 

            [Test]
            public void RunFromWebPage()
            {
                TestRunner runner = new TestRunner();
                runner.ReferenceAssembly(Assembly.GetExecutingAssembly().FullName);

                TestResult result = runner.RunFromWebPage("http://MyServer:8080/SuiteAcceptanceTests/Test001");
                result.AssertSuccess();

                // if you need to,
                result.OpenResultsInBrowser();
            }

    Build the FitNesse Test in Code

                TestRunner runner = new TestRunner();
                runner.ReferenceAssembly(Assembly.GetExecutingAssembly().FullName);

                Test test = new Test();
                test.AddTable(typeof (ArithmeticFixture));
                test.AddRow("X,Y,Add()");
                test.AddRow("2,2,4");
                test.AddRow("A,4,6");

                TestResult result = runner.ExecuteTest(test);
                result.AssertSuccess();

    Lastly, with Wiki Text

            [Test]
            public void ExecuteWikiText()
            {
                TestRunner runner = new TestRunner();
                runner.ReferenceFixtureType(typeof (ArithmeticFixture));

                TestResult result =
                    runner.ExecuteWikiText(
                        @"
                    !|ArithmeticFixture|
                    |X|Y|Add()|
                    |2|2|4|
                    |2|3|5|
                    |2|3|6|
                            ");

                Assert.AreEqual(2, result.Counts.Right);
                Assert.AreEqual(1, result.Counts.Wrong);
            }

  • Relearning Lessons

    It's never fun relearning something you already knew...

    • Last year I worked a lot against a database that was missing a lot of referential integrity and even a few uniqueness checks.  You know how the old timers say that's a really bad thing, but you've never actually seen it happen because you don't ever leave off the integrity?  Orphan records and duplicate records and other things that go bump in the night that you've heard campfire stories about?  Bad, bad stuff.  You can fix bugs in your code, but invalid data in the production database can be awfully dicey to clean up.
    • The doozy for me this week.   When you have performance problems, you profile the code because the performance problem may not be where you would expect it.  We have a performance problem in some new code inside a brand spanking new architecture.  We did a few (minor, it wasn't that bad) changes for performance that didn't help that much.  Then one of my coworkers used the JetBrains profiler on it and found that a 3rd party component in the mix was throwing and swallowing an exception on every single data record going through that component.  In one fell swoop we identified our performance problem.  A quick support request and we've got a patched version of the component to correct the problem.
    • Oh yeah, you know how they tell you that throwing and catching exceptions is expensive in CPU time?  Believe you me, that one's true.
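
    To illustrate the throw-and-swallow pattern that bit us, here is a small sketch contrasting an exception-per-bad-record loop with a check that never throws.  RecordParser is an invented class, the records are assumed to be non-null strings, and I'm deliberately not quoting any timings; wrap each call in a System.Diagnostics.Stopwatch against your own data if you want numbers.

```csharp
using System;

// Hypothetical example; not the third party component from the story.
public static class RecordParser
{
    // Exception-driven: every bad record pays for a throw and a catch,
    // and the swallowed exception hides the cost from the caller.
    public static int CountValidWithExceptions(string[] records)
    {
        int valid = 0;
        foreach (string record in records)
        {
            try
            {
                int.Parse(record);
                valid++;
            }
            catch (FormatException)
            {
                // swallowed
            }
            catch (OverflowException)
            {
                // swallowed
            }
        }
        return valid;
    }

    // Same answer without exceptions on the bad-record path.
    public static int CountValidWithTryParse(string[] records)
    {
        int valid = 0;
        int ignored;
        foreach (string record in records)
        {
            if (int.TryParse(record, out ignored))
            {
                valid++;
            }
        }
        return valid;
    }
}
```

    Both methods return the same count, which is the point: nothing in the results tells you the first one is slow, so the profiler is the only way you find out.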
  • Indelible proof of a healthy team

    ... or irrefutable evidence of an unhealthy team.

     

    Can your team do an iteration retrospective with a minimum of venom and squirming?  Is everybody unafraid to speak up in a retrospective?  Can you calmly talk about what's not going well and what is without retribution?

    If you can answer these questions in the affirmative, you just might be a healthy team.  If you answer no, or if you have, let's say, regularly scheduled mandatory team building meetings, you might need a checkup.

  • Ditto on Ayende's Microsoft OSS Post

     

    I want to comment a bit on The Problem of Open Source in the Microsoft World from Ayende.  Please go read his post and add your voice somewhere.  I actually do believe that Microsoft listens to us, we just have to start asking for the right things. 

    I strongly believe that a friendlier attitude from Microsoft towards OSS tools in .Net can only benefit us -- and Microsoft as well I would think.  Recently, one of the senior developers at my client asked my opinion in regards to .Net versus Java.  That's not an interesting conversation in terms of pure technology anymore (.Net/Java versus Ruby/Python/LAMP is far more interesting).  In terms of development community though, I think there is a vast gulf between Java and .Net, at least at the upper end.  I openly admire the vibrant OSS community in Java (I'm downright jealous of the community around Rails) and the wealth of innovation that they have sparked.  The .Net development community seems to either:

    1. Wait for Microsoft's new tools and use them without any critical evaluation.  Are you sure that Enterprise Library is the best choice?  Could it be better?  Are there existing alternatives that are better?  Could we write something much simpler than CAB and use that instead? 
    2. Or port something from Java and now Ruby

    I think the lack of innovation from the .Net community is extremely disappointing.  We hurt ourselves by limiting .Net innovation to Redmond and Java leftovers.  I'd really just like to see more of an attitude of "we can do it ourselves" rather than having tunnel vision on Redmond.  And the old excuse that innovation is lacking because .Net is much newer than Java?  It's been 5-6 years now.  That excuse is tired.

    Microsoft developer tools are primarily geared around RAD development  (I've banned the "M" word from my blog, but you know that I'm thinking about "M's" here).  Doing Domain Driven development with Agile practices might not even be possible or efficient without the OSS tools that have historically filled the gaps in Microsoft's tooling.  Those tools have been, and continue to be, important.  Microsoft *still* does not have a full-blown Inversion of Control tool, a released O/R mapper, a Continuous Integration tool, or a mock object library.  All OSS tools that I depend on almost daily.

    If nothing else, OSS tools, especially developer tools, can be driven by developer needs faster than Microsoft can possibly move -- and Microsoft can't possibly anticipate every need.

    By the way, my money is where my mouth is.  I'm not as prolific as Ayende (nobody is), but my OSS resume is:

    • StructureMap - Dependency Injection tool (first release in June 2004)
    • StoryTeller - .Net tooling for FitNesse testing (shooting for the first alpha in January 2007)

    and I will be contributing some enhancements to NUnitForms and FitnesseDotNet (shows its heritage as a Java port way too much).

    I would definitely recommend being involved in an OSS project.  My OSS work has positively contributed to my career path.  I wouldn't say that it's brought me any particular fame, but it's been a great learning experience.  I started StructureMap as a way to learn .Net when I was stuck in a non-coding architect role.  Besides being useful in and of itself, it gave me a toolbench to try out TDD and design patterns that I've used on paying projects since.

     

    Does anybody know where JetBrains stands in regard to a full .Net IDE ala IntelliJ?  I'd concur with Ayende and the commenters that I think our best hope is JetBrains.  I have this ridiculous vision of a bunch of developers with bad hair saying "help me JetBrains, you're my only hope."

    I still dislike the GPL license by the way.  We're using a GPL licensed tool in our code base that has a specific provision to allow you to redistribute the binary as is in your own product just like NAnt.  Very early on I predicted that the company's inhouse lawyer would have a conniption over the license.  I'm apparently psychic because my prediction unfolded exactly the way I said it would.  I get the utopian ideal behind the GPL, but the benefits of being an OSS author are indirect.  The specific gains are reputation and often a chance to learn from working on projects that are quite different or more challenging than your day job.  The gains are primarily derived from somebody else using your OSS tool.  By slapping the GPL on it you're effectively dooming the fruits of your labor to the dust bin (or academia).

     

    P.S. -- I partially blame lawyers for this because I know that part of MS's attitude is due to a fear of legal proceedings over intellectual property rights. 

  • On Writing Maintainable Code

    A QUICK NOTE:  This was supposed to be a single treatise on the coding and design principles that *I* think are most important for writing maintainable code.  A draft of this has been on my hard drive for a long, long time and it's turning into my own great white whale.  Just to get it out, I'm breaking it into pieces that will follow shortly, depending on ongoing bouts with writer's block.  I'm going to intermix quick discussions of these design concepts with some case studies of systems I've built or worked with that illustrate both the positive outcome of following the principles and the pain incurred by failing to apply the principles.  Some day, I'll gather the pieces back up into a single coherent article.

     

    Enable Change or Else!

    Change is a constant in an enterprise software system.  A system that is expensive or risky to change is an opportunity cost to your business.  Especially if you're a small company, poor code quality in your flagship product will jeopardize your company's future. 

    A couple of months ago I talked about my vision for creating a Maintainable Software Ecosystem in which I claimed that the single most important quality for an enterprise software system is maintainability, i.e. the ease with which a system can be modified or extended.  I spent a lot of screen space talking about topics like source control, test automation, and build automation.  There are a lot of supporting practices that can greatly aid in shipping working code, but I purposely put off the single most important factor -- the Code!  Maintainability will not happen without a commitment to quality code, now, and throughout the lifecycle of a system.  I can always throw code quality to the wind and code faster now, but that slop will catch up to me or the next team in the future, and the future has a funny way of happening sooner than we expect.  Besides, I've been the "next" team, and it wasn't pretty.

    I spent much of the past two years at my previous job extending, restructuring, or flat out re-writing legacy code.  I frequently saw my team's efforts hindered because of existing code that was poorly factored or just flat out hard to understand.  We were consistently faster when we were working with the newer code that we wrote and designed inside of an Agile process than we were working with the older legacy code.  Some of the disparity in productivity between new and legacy code was our familiarity with the newer code and the better build and test infrastructure of the newer code, but I'd still place much of the blame on the structural flaws of the legacy code.  Ironically, and certainly not for the first time, I thought some of the biggest impediments to extending our system were directly attributable to well meaning attempts at creating extensibility in the existing code. 

    Extensibility yes, but how?

    One solution for system maintainability is to build in "extensibility" points or use metadata-driven design approaches.  It's great if the extensibility points match up well with the actual direction of the later change, but the wrong extensibility points can cause a lot of harm by making a design harder to understand or awkward to extend.  David Hayden recently posted some frustrations with this style of design. 

    If it's true that most code spends much more time being maintained (changed) than the initial write, then it certainly behooves us to create code that can be changed.  Extensibility points can certainly help, but the wrong extensibility points can also do plenty of harm, so they're not the whole answer.  In My Programming Manifesto, I expressed a strong preference for creating maintainable code throughout the codebase rather than concentrating on specific extensibility points for future needs.  I've always thought that the Simplest Thing That Could Possibly Work is largely true because the percentages say that you simply will not be able to anticipate a great deal of the future change with consistency. Overall, odds are that the best chance for successful maintainability is in creating "malleable" code that can be easily and safely changed in unforeseen ways.  Instead of focusing on future proofing code in a few "strategic" spots, concentrate on making your code easy to change as a simple matter of course.

    Coder to Craftsman

    When people first learn how to write code they necessarily focus on just making the code work, with little or no thought for style or structure, and certainly no thought for the future.  Any particular piece of code tends to land wherever the coder happened to be working when they realized that they needed that piece of code -- or wherever the RAD wizards felt like dropping the code. 

    I think there is an inflection point where a coder mindlessly spewing out code transforms into a thoughtful software craftsman capable of creating maintainable code.  That inflection point happens the day a coder first stops, lifts his/her nose out of the coding window, and says to him/herself "where should this code go?"  That might also lead to questions like "how can I do this with less code?" or "how can I write this to make it easier to understand?" or even "how can I solve one problem at a time?"  The rest of a developer's career is spent pursuing better and better answers to the question "where should this code go?" 

    My first "enterprise-y" system was an ASP Classic web application on top of Access to track project auditing for my engineering team.  If you opened up any of the early ASP scripts from that application you'd see SQL statement construction, form post handling, data access, and HTML creation all intermixed.  Business logic happened willy-nilly at various points in the ASP page, wherever I was coding away when I realized I needed some logic.  A lot of functionality was duplicated because each ASP page was self-contained, which meant more effort to write and then change the application.  The pages themselves became difficult to understand because all the code was dumped into one bucket with no rhyme or reason.  Troubleshooting business logic meant sifting through a lot of unrelated http handling code. Finding data access problems meant reading through quite a bit of the html templating along the way.  The code stunk and I knew it, even as a coding newbie working solo.  There had to be better ways to build the application.  I improved things just by creating a set of common utility subroutines that could be used to reduce the amount of duplicated code. It wasn't much, but it was a start.

    The Maintainable Code Checklist

    So exactly how do I know "where this code should go" for maintainability?  To guide your coding and design for maintainability, I've come up with a checklist of the half dozen questions in the table below that I think should be answered in the positive.  There are three major themes running through the checklist -- intention revealing code, getting rapid feedback from the code, and being able to do one thing at a time.  Assuming that maintainability is important to you, it's also time to talk about the design principles that provide guidance to answering this Maintainable Code Checklist.  In the table below I've tried to tie the maintainability questions to some of the design principles that will guide the thoughtful developer to answers.  This certainly isn't a comprehensive list, but it's a start. 

    Question | Yes comes from...
    Can I find the code related to the problem or the requested change? | Good Naming, High Cohesion, Single Responsibility Principle, Separation of Concerns
    Can I understand the code? | Good Naming, Composed Method, The Principle of Least Surprise, Can You See The Flow of the Code?, Encapsulation
    Is it easy to change the code? | Don't Repeat Yourself, Wormhole Anti-Pattern, Open Closed Principle
    Can I quickly verify my changes in isolation? | Loose Coupling, High Cohesion, Testability
    Can I make the change with a low risk of breaking existing features? | Loose Coupling, High Cohesion, Open Closed Principle, Testability
    Will I know if and why something is broken? | Fail Fast

    You can't help but notice that many of the principles are closely related.  I think you could say that many of the principles are simply looking at the exact same problems from a different angle.  Because of this, I'm going to first do a survey of the principles, then I'll try to illustrate the principles in code with some real life examples from my career.

     

    It All Starts with Separation of Concerns

    Separation of Concerns is the Alpha and Omega of design principles.  No other design principle that I'm going to discuss -- be it loose coupling, high cohesion, encapsulation, minimal duplication, or orthogonality -- is possible without Separation of Concerns.  Simply put, strive to do one thing at a time in your code.  Layering.  Divide and conquer.  A lot of the other principles are about enabling a system to change with minimal effort and risk.  Before you can even think about that, you need to be able to build the system, and then understand that code.  The human mind and eye can only handle so much complexity at any one time.  At a minimum, separating the traditional concerns of user interface, business logic, and data access into "layers" of the application can help minimize the complexity of any single piece of the code.  The system is still as complex as it has to be, but you stand a much better chance of understanding the business rules or data access mapping in isolation than you could if it was all mixed together.

    Back to my original ASP application.  Like almost all ASP code circa 1998 there was no separation of concerns.  Business logic, user interface presentation, and data access details were hopelessly intermingled in a single code file.  It was difficult to "see" the business logic flow because it was obscured by all the intermingled html markup and ADO database manipulation.  My first exposure to separation of concerns was the 3- or n-tier architectures for Windows DNA applications on Wrox's old ASPToday website (forget about the physical deployment options and let's just talk about logical layering here).  At a bare minimum, layered systems should be easier to understand because you can look at presentation logic or business logic in isolation.

    A couple of months ago at my previous job we were discussing the technical tasks necessary to localize the application with foreign language support.  The company had essentially run out of growth room in the United States and was looking at the European market with great hope.  Localizing the application, and supporting Unicode encoding for that matter, was a major opportunity for the company.  One of the developers was repeating the typical opinion that localization is easy to do upfront and always much more difficult to retrofit.  Sure, but in this particular case the flagship product was going to be extremely tedious to localize because strings destined for the screen were being built very deep within big stored procedures as well as in the C# middle tier code.  For that matter, code that created HTML text with string concatenation was intermixed with business logic.  In that particular case, localization after the fact could have been made much less costly simply by a better separation of concerns.  Instead of mucking around with every single area of the code, a good separation of concerns would have enabled us to focus strictly on the presentation layers to make the localization changes.
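
    To make that a little more concrete, here's a minimal sketch of what I mean, not the actual system.  The invoice rule, the status names, and the in-memory dictionary of messages are all hypothetical stand-ins (a real .Net application would more likely lean on resource files).  The point is only that the business code returns a status and the presentation code alone decides what words go on the screen.

```csharp
using System;
using System.Collections.Generic;

// Business layer: decides what happened, but never builds display text.
public enum InvoiceStatus { Approved, OverCreditLimit }

public static class InvoiceRules
{
    // Hypothetical rule, for illustration only.
    public static InvoiceStatus Check(decimal total, decimal creditLimit)
    {
        return total > creditLimit ? InvoiceStatus.OverCreditLimit : InvoiceStatus.Approved;
    }
}

// Presentation layer: the only place that knows about languages and wording.
// The dictionary is an assumed stand-in for whatever resource lookup you use.
public class StatusMessages
{
    private readonly Dictionary<string, Dictionary<InvoiceStatus, string>> _messages =
        new Dictionary<string, Dictionary<InvoiceStatus, string>>
        {
            { "en", new Dictionary<InvoiceStatus, string>
                {
                    { InvoiceStatus.Approved, "Invoice approved" },
                    { InvoiceStatus.OverCreditLimit, "Invoice exceeds the credit limit" }
                }
            },
            { "de", new Dictionary<InvoiceStatus, string>
                {
                    { InvoiceStatus.Approved, "Rechnung genehmigt" },
                    { InvoiceStatus.OverCreditLimit, "Rechnung überschreitet das Kreditlimit" }
                }
            }
        };

    public string For(InvoiceStatus status, string language)
    {
        return _messages[language][status];
    }
}

public static class LocalizationDemo
{
    public static void Main()
    {
        InvoiceStatus status = InvoiceRules.Check(1500m, 1000m);
        StatusMessages messages = new StatusMessages();

        // Same business result, two different presentations.
        Console.WriteLine(messages.For(status, "en"));
        Console.WriteLine(messages.For(status, "de"));
    }
}
```

    Adding a new language in this arrangement touches StatusMessages and nothing else, which is exactly the property we were missing.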

    When you say layering, most of us probably come up with a knee-jerk list:

    • User interface, presentation, controller logic
    • Business logic, rules, domain model
    • Database and data persistence
    • Service layer
    • Logging/auditing
    • Security

    That's an awfully good start, but I think the layering metaphor of higher layers talking to lower layers might not fit perfectly anymore.  "Concern" or "Layer" might be interpreted too coarsely.  Even within a single traditional "layer," you may have finer grained areas of concern that should be separated as well.  I'll discuss this more in the section about the Single Responsibility Principle.

    Orthogonality

    Orthogonality is more a goal than a principle.  In geometry, two or more axes of movement are said to be orthogonal if a change in position on one axis does not affect the position on the other axis.  If I walk north for a mile I've changed my latitude, but not my east-west longitude.  The Pragmatic Programmers applied the term Orthogonality to software as:

    The basic idea of orthogonality is that things that are not related conceptually should not be related in the system. Parts of the architecture that really have nothing to do with the other, such as the database and the UI, should not need to be changed together. A change to one should not cause a change to the other. Unfortunately, we've seen systems throughout our careers where that's not the case.

    In essence, Orthogonality in software design is the ability to change one thing at a time.  An orthogonal codebase allows different concerns to be changed independently.  Another way to think about Orthogonality is that it's the ability to work in isolation with only one aspect of a system at a time. 

    A classic, positive example is the invention of Cascading Style Sheets (CSS) in the late 90's.  When I first learned how to create html websites (in FrontPage 97!), I embedded all of the style properties directly into the html.  Mixing the content and the structure of the content with the formatting of the content often made large scale websites hard to maintain.  Moving presentation rules into a CSS stylesheet greatly simplified the creation of the html content, while also allowing the style of the web page to be more readily changed than before.  CSS started to make the formatting and content of an html page orthogonal.

    In a negative example, I inherited a codebase that had very poor orthogonality between the user interface and the business logic.  In this case we had a significantly complex piece of code that worked through business logic to build an html response inside code.  One of the significant problems with this code was that the business logic could only be tested and verified through an examination of the resulting html.  Every time we had to change the user interface, we broke all of the automated tests that were intended to test the business logic.  We weren't able to change business logic easily because we had html concatenation cluttering up the business logic.  The very first mistake was a failure to separate the business logic and user interface concerns (<sarcasm>another tip, when you're rewriting a bad existing system, you might not want to reproduce the existing bad design in the new programming language</sarcasm>). 
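
    For contrast, here's a rough sketch of the shape I wish that code had taken.  The discount rule and both class names are invented for illustration and have nothing to do with the real system.  The business logic hands back a plain value, the view turns that value into markup, and a check on the rule never has to look at html.

```csharp
using System;
using System.Globalization;

// Business logic: returns plain data and knows nothing about html.
public class DiscountCalculator
{
    // Hypothetical rule: 10% off orders of 100 or more.
    public decimal DiscountFor(decimal orderTotal)
    {
        return orderTotal >= 100m ? orderTotal * 0.10m : 0m;
    }
}

// User interface: turns the already-computed value into markup.
public class DiscountView
{
    public string Render(decimal discount)
    {
        string amount = discount.ToString("0.00", CultureInfo.InvariantCulture);
        return "<span class=\"discount\">" + amount + "</span>";
    }
}

public static class OrthogonalityDemo
{
    public static void Main()
    {
        DiscountCalculator calculator = new DiscountCalculator();
        decimal discount = calculator.DiscountFor(200m);

        // The business rule is checked directly against the value...
        if (discount != 20m) throw new Exception("Unexpected discount: " + discount);

        // ...so the markup can change without touching that check.
        Console.WriteLine(new DiscountView().Render(discount));
    }
}
```

    If somebody swaps the span for a table cell tomorrow, the check on DiscountFor() doesn't care.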

     

    So how do you make Orthogonality happen?  Tune in next time -- assuming that I can break through my case of writer's block.

  • Noah, I want you to build an Ark

    Think about this for a minute, say you're Noah, walking down the street, minding your own business, and a big voice booms out "Noah, I want you to build an Ark.  When can you ship?"  Here's the proper response -- "Right, what's an Ark?"  When the big voice says "it shall be 200 cubits by 40 cubits by 60 cubits" you respond with "Right, what's a cubit?"  And so on and so forth.  The point being that you cannot give an accurate estimate without a detailed understanding of the tasks involved to carry out the development work.  I don't think you can efficiently develop a "complete" list of developer tasks upfront, but you've got to get deeper than "build an ark" in release planning.  "Build the rudder" and "create the keel" and "attach the ramp" are much more actionable user stories.

    My project has run into some difficulty in project and story management (in no small part because *I'm* sharing a bit of the pm hat).  Specifically, we, as an entire organization, missed a lot of complexity and fine grained detail in going from a traditional functional and technical specification document to detailed user stories and tasks.  The warning sign that I (and everyone else) missed was vague, amorphous story titles like "Create a New Invoice" and estimating that story in release level planning.  That simple "Create New Invoice" CRUD screen has turned into a myriad of user defaults, cascading dropdown lists, about a dozen new fields, and a fair amount of branching logic because there are at least four different logical types of "Invoices."

    I made a conscious choice early on in the project that we would move into coding as quickly as possible because I felt that it was more important to build up momentum early than run into analysis paralysis by trying to get the story list perfect upfront.  I knew that our user stories were vague and that there would be plenty of new stories spilling out as we got an iteration or two into the project. 

    What to Do?

    • Assume that your first estimate sucks and revisit it as you go.  Revise estimates early on as you learn more about the problem domain in the first iterations.  You always discover new stories as a result of early development.  Update the story backlog and communicate those changes to your management and customer as early as possible.
    • Every development task or need should be represented by a story card that is visible to the team and management.  My current client depends on developers "knowing" how to interpret requirements and apply them to their user interface.  That domain knowledge by their developers is a great thing, but those implicitly understood tasks have to be expressed explicitly in story backlogs so that the project can be accurately sized and controlled.  My team is all new to their organization and the dependency on implicit tasks has got us into trouble with our initial release planning.  In retrospect, we should have pushed harder upfront to understand the details.
    • Get a "Domain Expert" on, or very readily available to, the development team.  Specifically, a domain expert who has a vested interest in your project succeeding.  I think this rule applies to any type of process or team.  The only difference compared to waterfall techniques is that iterative or Agile projects admit that fact.  The most successful project that I've ever been a part of had the business sponsor and team engaged on a daily basis (it wasn't an Agile project, it was the 20-something Jeremy works lots of hours process).
    • In release planning, get the Business Analyst(s) to walk through candidate stories, out loud, in detail (basically, do a lightweight JAD).  Throwing a specification document over the wall to the downstream group and running away has to be the dumbest aspect of waterfall programming.  Face to face communication conveys much, much more information than spec documents.  I remember now how utterly moronic waterfall thinking can be.  Your project stands a much better chance of success if the developers, requirements folks, and testers interact on a daily basis and pull together for a common goal.
    • Write acceptance tests early in iterations.  We've found a lot of requirement needs by writing detailed acceptance tests with FitNesse because it flushes out a lot of technical detail that isn't apparent at the "spec" level. 

    The same problem exists for software designs, especially upfront design specifications.  I've seen too many teams create design specifications with no more detail than a single UML component diagram with a box for "the Invoice component" and an arrow pointing to the "shipping system."  I'm extremely dubious about the value of doing a *lot* of UML modeling upfront before coding, but it's even more ludicrous to say "we've got a design" in a waterfall shop just to check off a process step when you really don't know much of anything.

    Geek points for whoever tells me what the title of this post is from.  Hint, he used to be really, really funny before the goofy rainbow colored sweater era.

  • Composed Method Pattern

    This was supposed to be a part of a much longer post on writing maintainable code, but I'm having trouble finishing the bigger post and I wanted to see an actual code-centric post before the new year.

    I've talked a lot about Object Oriented concepts in the past, but there's always procedural code lurking inside each and every class.  I feel very strongly that long methods and big classes are a veritable breeding ground for bugs.  Small methods (and classes) are easier to troubleshoot by inspection, and hence, less likely to have bugs than a long method.  To keep methods small and easily understandable, I like the Composed Method pattern first described by Kent Beck.  The Composed Method pattern states that:

    1. Divide your code into methods that only perform a single, identifiable task. 
    2. Keep all of the operations in a single method at the same level of abstraction.  Think of a method that iterates through a collection and performs an operation with each child object.  If we apply the Composed Method pattern we would move the operation on the child object into a separate method that is called from the main method.
    3. Following the Composed Method pattern will lead to the kind of small methods that make troubleshooting easier.

    In other words, using the Composed Method means assiduously avoiding the ArrowHead Anti-Pattern (deep if/then or looping hierarchies in a single method).
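
    As a quick illustration of the second rule, here's a small sketch of the "iterate a collection and do something with each child" case.  The LineItem and OrderTotaler classes and the 8% tax rate are made up for this example.  The looping method only talks about adding up line totals, and the per-item arithmetic is pushed down into its own methods.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical child object for the example.
public class LineItem
{
    public readonly decimal UnitPrice;
    public readonly int Quantity;
    public readonly bool Taxable;

    public LineItem(decimal unitPrice, int quantity, bool taxable)
    {
        UnitPrice = unitPrice;
        Quantity = quantity;
        Taxable = taxable;
    }
}

public class OrderTotaler
{
    private const decimal TaxRate = 0.08m; // assumed rate, illustration only

    // Stays at one level of abstraction: "add up the line totals."
    public decimal Total(IEnumerable<LineItem> items)
    {
        decimal total = 0m;
        foreach (LineItem item in items)
        {
            total += LineTotal(item);
        }

        return total;
    }

    // The per-item detail lives in its own small method.
    private decimal LineTotal(LineItem item)
    {
        decimal subtotal = item.UnitPrice * item.Quantity;
        return item.Taxable ? subtotal + Tax(subtotal) : subtotal;
    }

    private decimal Tax(decimal amount)
    {
        return amount * TaxRate;
    }
}

public static class ComposedMethodDemo
{
    public static void Main()
    {
        List<LineItem> items = new List<LineItem>();
        items.Add(new LineItem(10m, 2, false));
        items.Add(new LineItem(5m, 1, true));

        Console.WriteLine(new OrderTotaler().Total(items));
    }
}
```

    Had the tax branch been written inline inside the foreach, Total() would be mixing "walk the collection" with "how is one line priced," which is the first step toward the ArrowHead shape.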

    Here's an example that I lifted from Joshua Kerievsky's Refactoring to Patterns book that illustrates the application of Composed Method.  Pretending that the ArrayList class in .Net doesn't exist, create a resizable collection class that can be locked into a read only mode.  The internal storage is just an array.  As necessary the class will create a new, larger internal array.  If we use pure brute force coding to build the Add() method in one big method we might get this code below.

        public class MyExpandableList
        {
            private object[] _elements = new object[10];
            private bool _readOnly;
            private int _size = 0;

            public void Add(object child)
            {
                if (!_readOnly)
                {
                    // Grow only when the next element would not fit.
                    int newSize = _size + 1;
                    if (newSize > _elements.Length)
                    {
                        object[] newElements = new object[_elements.Length + 10];
                        for (int i = 0; i < _size; i++)
                        {
                            newElements[i] = _elements[i];
                        }

                        _elements = newElements;
                    }

                    _elements[_size] = child;
                    _size++;
                }
            }

            public bool ReadOnly
            {
                get { return _readOnly; }
                set { _readOnly = value; }
            }
        }

    It's not the biggest, hairiest method in the world, but it could definitely be better.  Let's clean up the Add() method by doing these 6 refactorings.

    1. Invert the "readonly" check to a Guard Clause
    2. Extract Method - void addElement(object)
    3. Introduce Explaining Variable - shouldGrow
    4. Decompose Conditional - bool atCapacity()
    5. Inline Variable - shouldGrow --> atCapacity()
    6. Extract Method - void grow()

    Applying that series of refactorings to the Add() method leads to this second version of the expandable list class.

        public class MyExpandableListRefactored
        {
            private object[] _elements = new object[10];
            private bool _readOnly;
            private int _size = 0;

            public void Add(object child)
            {
                if (_readOnly)
                {
                    return;
                }

                if (atCapacity())
                {
                    grow();
                }

                addElement(child);
            }

            // Copy the existing elements into a larger internal array.
            private void grow()
            {
                object[] newElements = new object[_elements.Length + 10];
                for (int i = 0; i < _size; i++)
                {
                    newElements[i] = _elements[i];
                }

                _elements = newElements;
            }

            // True when one more element would not fit in the current array.
            private bool atCapacity()
            {
                int newSize = _size + 1;
                return newSize > _elements.Length;
            }

            private void addElement(object child)
            {
                _elements[_size] = child;
                _size++;
            }

            public bool ReadOnly
            {
                get { return _readOnly; }
                set { _readOnly = value; }
            }
        }

    I ended up adding several new methods, but the methods are very simple, and the Add() method is far more readable.  I'm sure that the obvious objection to Composed Method is that the longer stack traces and call stacks make the code harder to debug.  My simple answer is that a combination of using Test Driven Development and well factored code should minimize the need for the debugger.
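
    To show the kind of checks I have in mind, here's a rough sketch.  In real life I'd write these as NUnit tests, but to keep the sample free of dependencies it uses a hand-rolled Check() helper.  The ExpandableList class is a compacted copy of the refactored list, and the Count property is my own addition (it isn't in the listing above) so that the checks have something to observe.

```csharp
using System;

// Compact copy of the refactored list, plus a hypothetical Count property
// so the checks below have something to observe.
public class ExpandableList
{
    private object[] _elements = new object[10];
    private bool _readOnly;
    private int _size = 0;

    public int Count { get { return _size; } }

    public bool ReadOnly
    {
        get { return _readOnly; }
        set { _readOnly = value; }
    }

    public void Add(object child)
    {
        if (_readOnly) return;
        if (AtCapacity()) Grow();
        _elements[_size++] = child;
    }

    private bool AtCapacity()
    {
        return _size + 1 > _elements.Length;
    }

    private void Grow()
    {
        object[] newElements = new object[_elements.Length + 10];
        Array.Copy(_elements, newElements, _size);
        _elements = newElements;
    }
}

public static class ExpandableListChecks
{
    public static void Main()
    {
        ExpandableList list = new ExpandableList();
        for (int i = 0; i < 25; i++) list.Add(i);
        Check(list.Count == 25, "grows past the initial capacity of 10");

        list.ReadOnly = true;
        list.Add("ignored");
        Check(list.Count == 25, "ignores Add() while read only");

        Console.WriteLine("All checks passed");
    }

    // Stand-in for a test framework assertion.
    private static void Check(bool condition, string description)
    {
        if (!condition) throw new Exception("Failed: " + description);
    }
}
```

    When one of these checks fails, the description points at a method that is only a few lines long, so there's usually not much left for the debugger to tell me.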
