CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Jeffrey Palermo [MVP]

Software management consultant and CTO, Headspring Systems

Use guid.comb in your database if you need guid keys but don't want to take a big performance hit - level 300

Once again, I'm behind the times.  COMB Guids have been around since 2002, and I'm just now finding out about them.  In short, WinNT used to use the MAC address of a network card to help generate GUIDs, and they were pseudo-sorted.  Since then, Guid have become more random and not dependent on anything.  Because of this, SQL Server 2000+ incurs a significant performance degradation when you use Guids as primary keys.  The degradation can be up to a factor of 10.  The reason is the random nature of the Guid.

We must remember that a Guid is not a string even thought that's how we see them.  A Guid is a 16-byte data structure.  An int is a 4-byte data structure, so we can simply think of it as 4 times as large.  The kicker is that integer keys are sorted and incremental.  Guids are random.  A clustered index can keep them in order for you, but every insert incurs the job of finding where in the list to insert the new value.  Integer keys just go at the end of the list.

I wish I could use plain integers for surrogate keys all the time, but there are a few times where I'm forced to use a Guid.  For those rare times, I'm now using the COMB Guid.

If you are using Guids (and NHibernate), check out the guid.comb key generator.  Read Jimmy Nilsson's article on COMB Guids to understand just why they perform better than normal Guids, and then go through the NHibernate documentation to see how you can use them in your mappings.  Ok, I'll just give you a sample:

<class name="MyClass" table="MyTable" dynamic-update="true">
<id column="my_key" type="Guid" name="Id">
<generator class="guid.comb" />
</id>

<property name="Something" type="Int16" column="something" not-null="true" />
<property name="SomethingElse" type="Byte" column="something_else" not-null="true" />

</class>

Technorati Tags: , , , , , , , ,


Comments

Jon Galloway said:

Another option (in Sql Server 2005) is NEWSEQUENTIALID(), which generates a GUID which is greater than any previous GUID on that computer. If you set NEWSEQUENTIALID() as the default on a PK GUID column, you'll avoid the fragmentation problem.

http://msdn2.microsoft.com/en-us/library/ms189786.aspx
# August 19, 2006 9:48 PM

Melvin Lee said:

Hi Jeffrey,

Can you tell me for what situations have you used (comb) guids? I had a situation at a customer where the customer have several applications that share part of the database schema, but not the database instance. One application will aggregate through all the database instances and display the data as it is from one database. My solution was to use guids so that I don't have to assign ranges to applications. This is the only situation I can think of why I should use guids.
# August 20, 2006 7:11 AM

C-J Berg said:

Melvin Lee: Yes, you would want to use it to ease replication scenarios.

As Jon wrote, there's a new function in SQL Server 2005 that does exactly this, NewSequentialID. Just be sure to read the following notes on the subject:

http://blogs.msdn.com/sqlprogrammability/archive/2006/03/23/559061.aspx
# August 20, 2006 11:12 AM

Melvin Lee said:

C-J,

I understand why you should use comb guids. But what I'm actually looking for are scenarios where I should use guids instead of plain integers for surrogate keys. Other than the scenario I have described I can't think of another situation where I would use guids as keys
# August 21, 2006 3:30 PM

ScottBellware said:

I typically prefer GUIDs for ID's in place of integers if the integer ID's have to be generated by a database server.

I've never really been comfortable with the idea that an abstraction like a  business object - something that can be used for more than persistence to the application's database  - must get it's ID from an external, and very likely, distributed system.  Since a GUID's uniqueness is ensured algorithmically, and since it can be done in the same local memory space as the object that it identifies, it seems a bit off to me that we'd defer to a remote dependency for something that is easily had locally, and that can be kept close to the object that is its primary client, and that is most concerned with it.

Since SQL Server 2005 has addressed the index fragmentation issue with GUID-based indexes, and since the COMB GUID generation code is easily had, GUIDs are a much easier decision to make.

That said, there are still compelling reasons to use integer ID's, but the vast majority of database apps aren't of the ilk that require integer ID's specifically.
# August 22, 2006 12:15 AM

About Jeffrey Palermo

Jeffrey Palermo is a software management consultant and the CTO of Headspring Systems in Austin, TX. Jeffrey specializes in Agile coaching and helps companies double the productivity of software teams. Jeffrey is an MCSD.Net , Microsoft MVP, Certified Scrummaster, Austin .Net User Group leader, AgileAustin board member, INETA speaker, INETA Membership Mentor, Christian, husband, father, motorcyclist, Eagle Scout, U.S. Army Veteran, and Texas A&M University graduate. Check out Devlicio.us!

This Blog

Syndication