CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Karl Seguin

.NET From Ottawa, Ontario - http://twitter.com/karlseguin/

January 2008 - Posts

  • Foundations of Programming - PDF

    UPDATED: There's an official free ebook now available here.

     

    Tim Barcz was kind enough to compile the the foundation series into a single PDF, for your sharing/printer pleasure.

    You can grab it here 

    I'll be taking a short break from blogging for the next couple weeks, so have fun playing with the new toys.
     

    Posted Jan 02 2008, 08:12 PM by karl with 16 comment(s)
    Filed under:
  • Foundations of Programming - Part 6 - NHibernate

    In part 3 we took our first stab at bridging the data and object world by hand-writing our own data access layer and mapper. The approach turned out to be rather limited and required quite a bit of repetitive code (although it was useful in demonstrating the basics). Adding more object and more functionality would bloat our DAL into an enormously unmaintainable violation of DRY (don't repeat yourself). In this section we'll look at an actual O/R Mapping framework to do all the heavy lifting for us. Specifically, we'll look at the popular open-source NHibernate framework (http://www.hibernate.org/343.html).

    The single greatest barrier preventing people from adopting domain driven design is the issue of persistence. My own adoption of O/R mappers came with great trepidation and doubt. You'll essentially be asked to trade in your knowledge of a tried and true method for something that seems a little too magical. A leap of faith may be required.

    The first thing to come to terms with is that O/R mappers generate your SQL for you. I know, it sounds like it's going to be slow, insecure and inflexible, especially since you probably figured that it'll have to use inline SQL. But if you can push those fears out of your mind for a second, you have to admit that it could save you a lot of time and result in a lot less bugs. Remember, we want to focus on building behavior, not worry about plumbing (and if it makes you feel any better, a good O/R mapper will provide simple ways for you to circumvent the automated code generation and execute your own SQL or stored procedures).

    Infamous Inline SQL vs Stored Procedure Debate
    Over the years, there's been some debate between inline SQL and stored procedures. This debate has been very poorly worded, because when people hear inline SQL, they think of badly written code like:

    public int GetUserIdByCredentials(string userName, string password)
    {
       string sql = @"SELECT UserId FROM Users 
                      WHERE UserName = '" + userName + "' AND Password = '" + password + "'";
       using (SqlCommand command = new SqlCommand(sql))
       {
          //todo
          return 0;
       }         
    }
    
    If you stop and think about it though, and compares apples to apples, I think you'll agree that neither is particularly better than the other. Let me help you out.

    Stored Procedures are more Secure
    Inline SQL should be written using parameterized queries just like you do with stored procedures. For example, the correct way to write the above code in order to eliminate the possibility of an SQL injection attack is:

    public int GetUserIdByCredentials(string userName, string password)
    {
       string sql = @"SELECT UserId FROM Users 
                      WHERE UserName = @UserName AND Password = @Password";
       using (SqlCommand command = new SqlCommand(sql))
       {
          command.Parameters.Add("@UserName", SqlDbType.VarChar, 64).Value = userName;
          command.Parameters.Add("@Password", SqlDbType.VarChar, 64).Value = password;
          //todo
          return 0;
       }         
    }
    

    Stored procedures provide an abstraction to the underlying schema
    Whether you're using inline SQL or stored procedures, what little abstraction you can put in a SELECT statement is the same. If any substantial changes are made, your stored procedures are going to break and there's a good chance you'll need to change the calling code to deal with the issue. O/R Mappers on the other side, generally provide much better abstraction.

    If I make a change, I don't have to recompile the code
    Somewhere, somehow, people got it in their head that code compilations should be avoided at all cost (maybe this comes from the days where projects could take hours to compile). If you change a stored procedure, you still have to re-run your unit and integration tests and deploy a change to production. It genuinely scares and puzzles me that developers consider a change to a stored procedure or XML trivial compared to a similar change in code.

    Stored Procedures reduce network traffic
    Who cares? In most cases your database is sitting on a GigE connection with your servers and you aren't paying for that bandwidth. You're literally talking fractions of milliseconds. On top of that, a well configured O/R mapper can save round-trips via identify map implementations, caching and lazy loading.

    Stored procedures are faster
    This is the excuse I held onto the longest. Write a reasonable/common SQL statement inline and then write the same thing in a stored procedure and time them. Go ahead. In most cases there's little or no difference. In some cases, stored procedures will be slower because a cached execution plan will not be efficient given a certain parameter. Jeff Atwood called using stored procedures for the sake of better performance a fairly extreme case of premature optimization. He's right. The proper approach is to take the simplest possible approach (let a tool generate your SQL for you), and optimize specific queries when/if bottlenecks are identified.

    It took a while, but after a couple years, I realized that the debate between inline and stored procedures was as trivial and meaningless as the one about C# and VB.NET. Of course, since the differences are practically non-existing; why not just use stored procedures? If you aren't willing to adopt an O/R mapper, that's certainly what I would suggest – there's no sense in dynamically creating your own inline SQL. However, O/R mappers, which rely on inline SQL, provide three very important benefits (there are more, but with respect to maintainability, I think these are the most important:

    1. You end up writing a lot less code – which obviously results in a more maintainable system,
    2. You gain a true level of abstraction from the underlying data source – both because you're querying the O/R mapper for your data directly (and it converts that into the appropriate SQL), and because you're providing mapping information between your table schemas and domain objects,
    3. If your impedance mismatch is low, they save you from having to write a lot of repetitive code; however, if your impedance mismatch is high, you'll be able to design your database the way it should be, and your domain layer the way it should be, without having to create an uncomfortable compromise – the O/R mapper will handle the mismatch for you.

    In the end, this really comes down to building the simplest solution upfront. After a few iterations, you can spend time profiling your code, and only if you detect an actual problem do you have to address that specific case. It might not sound so much simpler because you have to learn a fairly complex framework upfront, but that's the reality of our profession.

    Remember, our goal is to widen our knowledge base by looking at different ways to build systems in order to provide our clients with greater value. While we may be specifically talking about NHibernate, the goal is really to introduce to concept of O/R mappers, and try to correct the blind faith .NET developers have put into stored procedures and ADO.NET.

    NHibernate
    Of the frameworks and tools we've looked at so far, NHibernate is the most complex. This complexity is certainly something you should take into account when deciding on a persistence solution, but once you do find a project that allows for some R&D time, the payoff will be well worth it in future projects. The nicest thing about NHibernate, and a major design goal of the framework, is that it's completely transparent – your domain objects aren't forced to inherit a specific base class and you don't have to use a bunch of decorator attributes. This makes unit testing your domain layer possible – if you're using a different persistent mechanism, say typed datasets, the tight coupling between domain and data makes it hard/impossible to properly unit test.

    At a very high level, you configure NHibernate by telling it how your database (tables and columns) map to your domain objects, use the NHibernate API and NHibernate Query Language to talk to your database, and let it do the low level ADO.NET and SQL work.

    In previous parts we focused on a system for a car dealership – specifically focusing on cars and upgrades. In this part we'll change perspective slightly and look at car sales (sales, models and sales people).The domain model is simple – a SalesPerson has zero or more Sales which reference a specific Model.

    I've also included a VS.NET solution that contains sample code and annotations – you can find a link at the end of this article. All you need to do to get it running is create a new database, execute the provide SQL script (a handful of create tables), and configure the connection string. The sample, along with the rest of this article, is meant to help you get started with NHibernate – a topic too often overlooked.

    You might also be interested in the excellent NHibernate Reference Manual as well as Manning's NHibernate in Action book.

    Configuration
    The secret to NHibernate's amazing flexibility lies in its configurability. Initially it can be rather daunting to set it up, but after a coupe project it becomes rather natural. The first step is to configure the NHibernate itself. The simplest such configuration, which must be added to your app.config or web.config, looks like:

    <?xml version="1.0" encoding="utf-8" ?> 
    <configuration>
      <configSections>
        <section name="hibernate-configuration" type="NHibernate.Cfg.ConfigurationSectionHandler, NHibernate" /> 
      </configSections>
      <hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
        <session-factory>
          <property name="hibernate.dialect">NHibernate.Dialect.MsSql2005Dialect</property> 
          <property name="hibernate.connection.provider">NHibernate.Connection.DriverConnectionProvider</property> 
          <property name="hibernate.connection.connection_string">Server=SERVER;Initial Catalog=DATABASE;User Id=USER;Password=PASSWORD;</property> 
          <mapping assembly="CodeBetter.Foundations" /> 
        </session-factory>
      </hibernate-configuration>
    </configuration>
    

    Of the four values, dialect is the most interesting. This tells NHibernate what specific language our database speaks. If, later on, we ask NHibernate to return a paged result of Cars and our dialect is set to SQL Server 2005, NHibernate will issue an SQL SELECT utilizing the ROW_NUMBER() ranking function. However, if the dialect is set to MySQL, NHibernate will issue a SELECT with a LIMIT. In most cases, you'll set this once and forget about it, but it does provide some insight into the capabilities provide by a layer that generates all of your data access code.

    In our configuration, we also told NHibernate that our mapping files were located in the CodeBeter.Foundations assembly. Mapping files are embedded XML files which tell NHibernate how each class is persisted. With this information, NHibernate is capable of returning a Car object when you ask for one, as well as saving it. The general convention is to have a mapping file per domain object, and for them to be placed inside a Mappings folder. The mapping file for our Model object, name Model.hbm.xml, looks like:

    <hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" assembly="CodeBetter.Foundations" namespace="CodeBetter.Foundations">
      <class name="Model" table="Models" lazy="true" proxy="Model">
        <id name="Id" column="Id" type="int" access="field.lowercase-underscore">
          <generator class="native" /> 
        </id>
        <property name="Name" column="Name" type="string" not-null="true" length="64" /> 
        <property name="Description" column="Description" type="string" not-null="true" /> 
        <property name="Price" column="Price" type="double" not-null="true" /> 
      </class>
    </hibernate-mapping>
    

    (it's important to make sure the "Build Action" for all mapping files is set to "Embedded Resources")

    This file tells NHibernate that the Model class maps to rows in the Models table, and that the 4 properties Id, Name, Description and Price map to the Id, Name, Description and Price columns. The extra information around the Id property specifies that the value is generated by the database (as opposed to NHibernate itself (for clustered solutions), or our own algorithm) and that there's no setter, so it should be accessed by the field with the specified naming convention (we supplied Id as the name, and lowercase-underscore as the naming strategy, so it'll use a field named _id).

    With the mapping file set up, we can start interacting with the database:

    //Let's add a new car model
    Model model = new Model();
    model.Name = "Hummbee";
    model.Description = "Great handling, built-in GPS to always find your way back home, Hummbee2Hummbe(tm) communication";
    model.Price = 50000.00;         
    ISession session = _sessionFactory.OpenSession();
    session.Save(model);
    
    
    //Let's discount the x149
    Model model = session.CreateQuery("from Model model where model.Name = ?").SetString(0, "X149").UniqueResult<Model>();
    model.Price -= 5000;
    ISession session = _sessionFactory.OpenSession();
    session.Update(model); 
    

    The above example shows how easy it is to persist new objects to the database, retrieve them and update them – all without any ADO.NET or SQL.

    You may be wondering where the _sessionFactory object comes from, and exactly what an ISession is. The _sessionFactory (of type ISessionFactory) is a global thread-safe object that you'd likely create on application start. You'll typically need one per database your application is using (which means you'll typically only need one), and its job, like most factories, is to create a preconfigured object: an ISession. The ISession has no ADO.NET equivalent, but it does map loosely to a database connection. However, creating an ISession doesn't necessarily open up a connection. Instead, ISessions smartly manage connections and command objects for you. Unlike connections which should be opened late and closed early, you needn't worry about having ISessions stick around for a while (although they aren't thread-safe). If you're building an ASP.NET application, you could safely open an ISession on BeginRequest and close it on EndRequest (or better yet, lazy-load it in case the specific request doesn't require an ISession).

    ITransaction is another piece of the puzzle which is created by calling BeginTransaction on an ISession. It's common for .NET developers to ignore the need for transactions within their applications. This is unfortunate because it can lead to unstable and even unrecoverable states in the data. An ITransaction is used to keep track of the unit of work – tracking what's changed, been added or deleted, figuring out what and how to commit to the database, and providing the capability to rollback should an individual step fail.

    Relationships
    In our system, it's important that we track sales – specifically with respect to sales people, so that we can provide some basic reports. We're told that a sale can only ever belong to a single sales person, and thus set up a one to many relationship – that is, a sales person can have multiple sales, and a sales can only belong to a single sales person. In our database, this relationship is represented as a SalesPersonId column in the Sales table (a foreign key). In our domain, the SalesPerson class has a Sales collection and the Sales class has a SalesPerson property (references).

    Both ends of the relationship needs to be setup in the appropriate mapping file. On the Sales end, which maps a single property, we use a glorified property element called many-to-one:

    ...
    <many-to-one name="SalesPerson" class="SalesPerson" column="SalesPersonId" not-null="true"/>
    ...
    

    We're specifying the name of the property, the type/class, and the foreign key column name. We're also specifying an extra constraint, that is, when we add a new Sales object, the SalesPerson property can't be null.

    The other side of the relationship, the collection of sales a sales person has, is slightly more complicated – namely because NHibernate's terminology isn't standard .NET lingo. To set up a collection we use a set, list, map, bag or array element. Your first inclination might be to use list, but NHibernate requires that you have a column that specifies the index. In other words, the NHibernate team sees a list as a collection where the index is important, and thus must be specified. What most .NET developers think of as a list, NHibernate calls a bag. Confusingly, whether you use a list or a bag element, your domain type must be an IList (or its generic IList equivalent). This is because .NET doesn't have an IBag object. In short, for your every day collection, you use the bag element and make your property type an IList.

    The other interesting collection option is the set. A set is a collection that cannot contain duplicates – a common scenario for enterprise application (although it is rarely explicitly stated). Oddly, .NET doesn't have a set collection, so NHibernate uses the Iesi.Collection.ISet interface. There are four specific implementations, the ListSet which is really fast for very small collections (10 or less items), the SortedSet which can be sorted, the HashedSet which is fast for larger collections and the HybridSet which initially uses a ListSet and automatically switches itself to a HashedSet as your collection grows.

    For our system we'll use a bag (even though we can't have duplicate sales, it's just a little more straightforward right now), so we declare our Sales collection as an IList:

    private IList<Sale> _sales;
    public IList<Sale> Sales
    {
       get { return _sales;}
    }
    
    And add our element to the SalesPerson mapping file:
    ...
    <bag name="Sales" access="field.lowercase-underscore" table="Sales" inverse="true" cascade="all">
       <key column="SalesPersonId" />
       <one-to-many class="Sale" />
    </bag>
    ...
    
    Again, if you look at each element/attribute, it isn't as complicated as it first might seem. We identify the name of our property, specify the access strategy (we don' t have a setter, so tell it to use the field with our naming convention), the table and column holding the foreign key, and the type/class of the items in the collection.

    We've also set the cascade attribute to all which means that when we call Update on a sales person, any changes made to his or her sales collection (additions, removals, changes to existing sales) will automatically be persisted. Cascading can be a real time saver as your system grows in complexity.

    Querying
    NHibernate supports two different querying approaches: Hibernate Query Language (HQL) and Criteria Queries (you can also query in actual SQL, but lose portability when doing so). HQL is the easier of two as it looks a lot like SQL – you use from, where, aggregates, order by, group by, etc. However, rather than querying against your tables, you write queries against your domain – which means HQL supports OO principles like inheritance and polymorphism. Either query methods are abstractions on top of SQL, which means you get total portability – all you need to do to target a different database is change your dialect configuration.

    HQL works off of the IQuery interface, which is created by calling CreateQuery on your session. With the IQuery you can return individual entities, collections, substitute parameters and more. Here are some example:

    string lastName = "allen";
    ISession session = _sessionFactory.OpenSession();
    
    //retrieve a salesperson by last name
    IQuery query = session.CreateQuery("from SalesPerson p where p.LastName = 'allen'");
    SalesPerson p = query.UniqueResult<SalesPerson>();
    
    //same as above but in 1 line, and with the last name as a variable
    SalesPerson p = session.CreateQuery("from SalesPerson p where p.LastName = ?").SetString(0, lastName).UniqueResult<SalesPerson>();
    
    //people with few sales         
    IList<SalesPerson> slackers = session.CreateQuery("from SalesPerson person where size(person.Sales) < 5").List<SalesPerson>();
    
    This is just a subset of what can be accomplished with HQL (the downloadable sample has slightly more complicated examples).

    Lazy Loading
    When we load a sales person, say by doing: SalesPerson person = session.Get(1); the Sales collection won't be loaded. That's because, by default, collections are lazily loaded. That is, we won't hit the database until the information is specifically requested (i.e., we access the Sales property). We can override the behavior by setting lazy="false" on the bag element.

    The other, more interesting, lazy load strategy implemented by NHibernate is on entities themselves. You'll often want to add a reference to an object without having to load the actual object from the database. For example, when we add a sales to a sales person, we need to specify the model, but don't want to load all the model information – all we really want to do is get the Id so we can store it in the ModelId column of the Sales table. When you use session.Load(id) NHibernate will load a proxy of the actual object (unless you specify lazy="false" in the class element). As far as you're concerned, the proxy behaves exactly like the actual object, but none of the data will be retrieved from the database until the first time you ask for it. This makes it possible to write the following code:

    Sale sale = new Sale(session.Load<Model>(1), DateTime.Now, 46000.00);
    salesPerson.AddSales(sale);
    session.SaveOrUpdate(salesPerson);
    
    without ever having to actually hit the database to load the model.

    Download
    I've included a download which'll hopefully provide a base for you to start playing with NHibernate. The code is well documented - take special care to read the annotations withint he mapping files. To get it running:

    1. Create a new database and run the CREATE TABLE commands located in CREATE_TABLES.sql,
    2. Modify the hibernate.connection.connection_string property within the app.config so that it can connect to your newly created database
    Once configured, take a look at the Run method within Sample.cs and walk through each call one at a time.

    Download Project

    Conclusion
    We've only touched the tip of what you can do with NHibernate. We haven't looked at its Criteria Queries (which is a query API tied even closer to your domain than HQL), its caching capabilities, filtering of collections, performance optimizations, logging, or native SQL abilities. Beyond NHibernate the tool, hopefully you've learnt more about object relational mapping, and alternative solutions to the limited toolset baked into .NET. It is hard to let go of hand written SQL statement, but looking beyond the bias of what's comfortable, it's impossible to rationalize doing all that work upfront.

More Posts

Our Sponsors

Proudly Partnered With