CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Greg Young [MVP]


For loop follow up (Disagreeing with Richter?!)

Ok, I have received about a dozen emails in regard to my last post An in depth look at for loops basically telling me that “I am wrong because Richter says so”.  These emails have used two points of reference by Jeffrey Richter. The first is the CLR internals presentation available as a web cast in particular around minute 12 during the foreach discussion where some points are made about for and foreach. The second reference I have been getting a lot of is the pages around 305 in CLR via C# 2.

 

To Jeffrey Richter if you happen to read this, let me start by saying I have the highest respect for you, I have thoroughly enjoyed nearly everything you have ever written; it is not only factual but a pleasure to read. I would personally not know most of the information I do today if it were not for books like yours. The most beneficial parts of your content are not these hidden gems but the teaching of the process to obtain them yourself. You taught me how to fish :)

 

That said let me address some of these. I will start with CLR via C#.

"The second thing to notice about the code above is that the JIT compiler knows that the for loop is accessing array elements 0 through Length – 1. So the JIT compiler produces code that, at runtime, tests that all array accesses will be within the array’s valid range. Specifically the JIT compiler produces code to check if ((Length – 1) <= a.GetUpperBound(0)). This check occurs just before the loop. If the check is good, the JIT compiler will not generate code inside the loop to verify that each access is within the valid range. This allows array access within loops to be very fast." Page 306 paragraph 2 CLR via C#

 

I fear that Jeffrey Richter has been caught by the same confusion as Brad Abrams. The code in question is

        static int RichtersCode() {

            Int32[] a = new Int32[5];

            int total = 0;

            for(Int32 index=0;index<a.Length;index++) {

                total += a[index];

            }

            return total;

        }

        static int RichtersCode() {

            Int32[] a = new Int32[5];

00000000  push        esi 

00000001  mov         edx,5

00000006  mov         ecx,7915982Ah

0000000b  call        FFB21F70

00000010  mov         ecx,eax

            int total = 0;

00000012  xor         esi,esi

            for(Int32 index=0;index<a.Length;index++) {

00000014  xor         edx,edx

00000016  mov         eax,dword ptr [ecx+4]

00000019  test        eax,eax

0000001b  jle         00000028

                total += a[index];

0000001d  add         esi,dword ptr [ecx+edx*4+8]

            for(Int32 index=0;index<a.Length;index++) {

00000021  add         edx,1

00000024  cmp         eax,edx

00000026  jg          0000001D

            }

            return total;

00000028  mov         eax,esi

0000002a  pop         esi 

0000002b  ret             

Listing 1: Jeffrey Richter’s code

 

By looking at this we should quickly see that it is in fact not calling GetUpperBound(0) prior to the loop. It has inlined the Length property and it never does any comparison. The JIT is realizing that since the array was created locally (on the stack??? and it knows the length that it was created with) that there is no need to make any comparisons. This cannot possibly throw an exception as written in native code. This optimization is in fact much better than the bounds hoisting we were hoping for, I believe this is being done for Constrained Execution Regions but I had already planned on discussing this theory in more depth for another post, you’ll just have to wait J

We can see that this is the same “confusing” optimization we looked at in the article because if we create the array outside of the function the optimization will not occur as can be seen here.

        static Int32[] b = new Int32[5];

        static int RichtersCode() {

            int total = 0;

            for(Int32 index=0;index<b.Length;index++) {

                total += b[index];

            }

            return total;

        }

        static int RichtersCode() {

            int total = 0;

00000000  push        edi 

00000001  push        esi 

00000002  xor         ecx,ecx

            for(Int32 index=0;index<b.Length;index++) {

00000004  xor         edx,edx

00000006  mov         esi,dword ptr ds:[022B1EC8h]

0000000c  mov         eax,dword ptr [esi+4]

0000000f  test        eax,eax

00000011  jle         00000025

00000013  mov         edi,dword ptr [esi+4]

                total += b[index];

00000016  cmp         edx,edi

00000018  jae         0000002A

0000001a  add         ecx,dword ptr [esi+edx*4+8]

            for(Int32 index=0;index<b.Length;index++) {

0000001e  add         edx,1

00000021  cmp         eax,edx

00000023  jg          00000016

            }

            return total;

00000025  mov         eax,ecx

00000027  pop         esi 

00000028  pop         edi 

00000029  ret             

0000002a  call        792B44A9

0000002f  int         3   

Listing 2: Jeffrey Richter’s code modified to use external array

 

The specific lines that show the bounds are still there are right before the assignment (in the blue area)

00000016  cmp         edx,edi

00000018  jae         0000002A

 

This is comparing the current counter against edx (which is where we have our length). It will then jump to throw the exception.

I feel I have done a fairly good job illustrating that there is in fact no array bounds hoisting occurring; we are looking at the second optimization discussed in the article. My guess is that we have errata in the book, a late change for CER, or a JIT bug in that it does not hoist at other times (I actually think it may be a combination of the three J).

 

The second bit that I have been receiving emails about deals with foreach vs for and comments that were made in the CLR internals presentation (which let me add is an amazing presentation and I highly recommend it).  There are actually two bits in the presentation that have been brought up, the first is that foreach on a string (or other array operation) is much slower than using a for loop. The second is that for and an index is a better usage overall than a foreach.

For the first case we have to remember that foreach does in fact have special handling in both VB.NET and C# for dealing with arrays. They will in fact generate identical code to a normal for (but strings are a special special case). Let’s take a look at the code in question.

        static void TestForeach() {

            string test = "thisisatestingstring";

            foreach (char c in test) {

                Console.WriteLine(c);

            }

        }

 

        static void TestForLoop() {

            string test = "thisisatestingstring";

            for(int i=0;i<test.Length;i++) {

                Console.WriteLine(testIdea [I]);

            }

        }

 

Listing 3: Two methods to iterate through strings

 

When we disassemble this to IL we receive the following

.method private hidebysig static void TestForeach() cil managed
{
      .maxstack 2
      .locals init (
            [0] string text1,
            [1] char ch1,
            [2] string text2,
            [3] int32 num1)
      L_0000: ldstr "thisisatestingstring"
      L_0005: stloc.0 
      L_0006: ldloc.0 
      L_0007: stloc.2 
      L_0008: ldc.i4.0 
      L_0009: stloc.3 
      L_000a: br.s L_001e
      L_000c: ldloc.2 
      L_000d: ldloc.3 
      L_000e: callvirt instance char string::get_Chars(int32)
      L_0013: stloc.1 
      L_0014: ldloc.1 
      L_0015: stsfld char PerformanceTestHarness.Program::dummy2
      L_001a: ldloc.3 
      L_001b: ldc.i4.1 
      L_001c: add 
      L_001d: stloc.3 
      L_001e: ldloc.3 
      L_001f: ldloc.2 
      L_0020: callvirt instance int32 string::get_Length()
      L_0025: blt.s L_000c
      L_0027: ret 
}

Listing 4: Foreach over string IL

 

.method private hidebysig static void TestForLoop() cil managed
{
      .maxstack 2
      .locals init (
            [0] string text1,
            [1] int32 num1)
      L_0000: ldstr "thisisatestingstring"
      L_0005: stloc.0 
      L_0006: ldc.i4.0 
      L_0007: stloc.1 
      L_0008: br.s L_001a
      L_000a: ldloc.0 
      L_000b: ldloc.1 
      L_000c: callvirt instance char string::get_Chars(int32)
      L_0011: stsfld char PerformanceTestHarness.Program::dummy2
      L_0016: ldloc.1 
      L_0017: ldc.i4.1 
      L_0018: add 
      L_0019: stloc.1 
      L_001a: ldloc.1 
      L_001b: ldloc.0 
      L_001c: callvirt instance int32 string::get_Length()
      L_0021: blt.s L_000a
      L_0023: ret 
}

Listing 5: for loop IL

 

They are in fact nearly identical. There is no performance gain to be reached here by using for. It is very important to remember that foreach has dual definitions. Admittedly I always say that looking at IL is bad so here is the disassembled native code.

        static void TestForeach() {

            string test = "thisisatestingstring";

00000000  push        edi 

00000001  push        esi 

00000002  mov         esi,dword ptr ds:[02314AF8h]

            foreach (char c in test) {

00000008  xor         edx,edx

0000000a  mov         edi,dword ptr [esi+8]

0000000d  test        edi,edi

0000000f  jle         00000024

00000011  movzx       ecx,word ptr [esi+edx*2+0Ch]

                dummy2 = c;

00000016  mov         word ptr ds:[00912FE8h],cx

0000001d  add         edx,1

            foreach (char c in test) {

00000020  cmp         edi,edx

00000022  jg          00000011

00000024  pop         esi 

            }

        }

00000025  pop         edi 

00000026  ret           

  

        static void TestForLoop() {

            string test = "thisisatestingstring";

00000000  push        edi 

00000001  push        esi 

00000002  mov         esi,dword ptr ds:[02314AF8h]

            for(int i=0;i<test.Length;i++) {

00000008  xor         ecx,ecx

0000000a  mov         edi,dword ptr [esi+8]

0000000d  test        edi,edi

0000000f  jle         00000024

                dummy2 = testIdea [I];

00000011  movzx       eax,word ptr [esi+ecx*2+0Ch]

00000016  mov         word ptr ds:[00912FE8h],ax

            for(int i=0;i<test.Length;i++) {

0000001d  add         ecx,1

00000020  cmp         edi,ecx

00000022  jg          00000011

00000024  pop         esi 

            }

        }

00000025  pop         edi 

00000026  ret             

Listing 6: Disassembled code

 

 

The second bit from the presentation that people have emailed me about is some comments made about for being better than foreach in general. I believe many people are taking these comments out of context as Jeffrey does in fact qualify these comments in the context of collections, where they do hold true as the enumerator simply does an indexed lookup on the collection. I gave what I thought was a good example of where it would not be faster, a linked list, in my original post that I will go into a bit more detail with.

Let’s consider for a minute how a linked list works. I store in my class a pointer (reference) to the first item in my list, if you want the 5th item I will iterate through my list until I hit the 5th node and return it to you, in this process I touch five nodes of the list. The same holds true for nine, thirty four, or seven thousand; we can therefore say that an indexed lookup into our linked list is O(n) (Big O n), we can actually make a stronger statements but this will suffice. If we were to create an enumerator for our linked list, its MoveNext()  function would simply return CurrentNode.Next this operation occurs in O(1). We can therefore say that our n O(1) operations will be much faster than n O(n) operations .. we could in fact even summarize this by saying that the enumerator will produce n touches of nodes where as the for with indexes will produce n touches of nodes. This is not that big of a deal on small lists but on a big list (say 100 items) the difference would be 100 touches with the enumerator vs 4950 touches with the index.

This holds true with other objects as well although not usually to this scale, a tree for instance would be much heavier to hit with indexes as opposed to an enumeration. Most items such as this will not offer an index based lookup so it’s not that big of a deal ..

 

Hope this clears up any confusion.


Published Jun 12 2006, 08:20 PM by Greg
Filed under:

Comments

Jason Haley said:

# June 13, 2006 9:38 AM

Regula Guy said:

Fascinating Mr Spock

# May 21, 2007 7:58 PM

Leave a Comment

(required)  
(optional)
(required)  

Enter the numbers above:
Add
Check out Devlicio.us!