CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Greg Young [MVP]

August 2006 - Posts

  • Must Read Book Released

    Serge Lidin has released an updated copy of his Masterring IL Assembler, the previous version of this book was amazing so I have very high expectations for the new edition. The new edition published by APress covers all of the additions made in 2.0 including generics.

    If you use Reflection.Emit or spend time looking at underlying IL this book is a must read. If you don’t you will still learn a lot about how things work under the covers.

    I will be providing a full review as soon as I can.

     

  • Recommendations for a tablet pc?

    Mainly for the write on screen capabilities with powerpoint and for "portable surfing" / document management.

    I would prefer to not spend too much in the process.

    Anyone with recommendations?

  • Framework Design Guidelines: Property or Method

    I have a proposed addition to “Framework Design Guidelines” based upon a situation that came up a few days ago. I believe the situation is already covered implicitly but there seems to be some contention on that point so perhaps some clarification would help us write better code.

    The situation has to deal with the choice between a method and a property. The rules currently state

    CONSIDER using a property if the member represents a logical attribute of the type

    DO use a method rather than a property in the following situations

    ·         The operation is orders of magnitude slower than a field access would be.

    Given in the framework design guidelines the order of magnitude rule is referring more to items such as those that access network resources.

    The case which needs to be clarified is when you are dealing with a property that may also be used as a looping termination condition. If it is done as a property but is not a simple field access the client has no way to know whether or not they need to manually hoist the loop. As an example consider the following.

            static void Example() {

                for (int i = 0; i < SomeObject.Size; i++) {

                    //dosomething

                }

            }

     

     

    As we learned in An in depth look at for loops if Size is a simple field access that gets inlined the access will automatically be hoisted. If however it is not a simple field access that gets inlined it will be called on every iteration of the loop.

    From a client developer point of view we have no way of knowing how we should structure our loop without knowing the internal implementation of SomeObject, particularly how the Size property is implemented. If the property is not a simple field access that will be inlined we need to know to hoist it manually as in the following example.

            static void Example() {

                int count = SomeObject.Size;

                for (int i = 0; i < count; i++) {

                    //dosomething

                }

            }

     

    Or more appropriately..

            static void Example() {

                for (int i = 0, count=SomeObject.Size; i < count; i++) {

                    //dosomething

                }

            }

     

    Given one could make the argument to always use these forms in order to work around the question but this code certainly seems to be a bit more cryptic than the first version. I would also wager it quite likely that a Jr. developer could fall into the mistake of undoing this code (or writing it as the first version without realizing the performance impact). There are also numerous resources telling you specifically not to hoist property accesses due to the optimization that occurs when the property can be inlined.

    Of course the real question is where does this actually happen … I personally got caught by the Image object in the framework. Height and Width will in turn pinvoke to native code eliminating the possibility of the JIT automatically hoisting the call. As such looping with height or width as termination conditions will cause a pinvoke on every iteration of your loop.

    In order to help alleviate this I propose to add a point of guidance specific to looping stop conditions. If it is a simple field access, make it a property. If it is not a simple field access make it a method. It is much clearer from a client perspective that a method incurs the further overhead and should therefore be hoisted manually.

  • Performance Contest #2: Triangle Solitaire

    The previous contest went pretty well so here is the second incarnation of the performance contest. Hopefully I can keep from wasting hours of time by running all the results with JIT optimizations disabled!

    I am not providing a naïve solution to this problem, I wrote one and it is around 100 lines of code. The reason I am not providing a naïve solution is two-fold. This problem is a good brain challenge if you have never tackled a problem like this before, by giving a naïve solution I would be taking away the experience from people who have not dealt with this type of problem. Secondly the ability to test behavior in this problem is extraordinarily easy (you can quite easily generate positions, just remove one peg)

    The Problem

    Have you ever eaten at Cracker Barrel? I have … While waiting for my heart attack on a plate breakfast the other morning I picked up a little game they have sitting on their tables. The game is a small logic problem that falls into the category of peg solitaires.

     

    The board is setup in a triangle with n holes (a row has previous rows + 1 holes in it). Into these holes pegs are placed, to being a game all holes are filled except for one. To move a piece you must jump over another piece (like in checkers) into an empty square. You can only jump diagonally or horizontally and you must end up in an empty hole. When you jump a piece that piece is removed from the board. The goal is to only have a single piece remaining.

    You can identify the holes on a board as is done in the picture above. This is how the initial board will be given to your solver (an array of integers with 1 representing a peg and 0 representing no peg).

    Your challenge should you choose to accept it is to create a program that can solve an arbitrary position. You will be given a board represented as a series of integers, a zero means that the hole at that position is empty, a non-zero value means that there is a peg in that hole. From this board your job is to return whether or not a board is solvable.

    Submission Process

            public interface ITrianglePegSolitaireSolver {

                bool IsSolvable(int[] _Board);

            }

    Interface for the solver

    Your submission must meet the ITrianglePegSolitaireSolver interface and must arrive in my email before midnight (GMT – 5) on Friday September 8th in order to be run through the tests.

    Submissions are to be EMAILED to me at Gregory<insertmylastname>1@gmail.com. Do not post your submission in comments or I will be forced to delete it. Also it is not your best interest to do this as someone could come through and optimize one line of your code to beat you :P

    I will the Sunday after the contest ends create a post including performance results for all code. I will also post a zip file of all submissions (including my performance harness & unit tests so others can view it).

    Scoring

    Thousands of problems will be run through your solver some on big boards, some on little boards. Some problems will be very simple, some will be very complex. Any incorrect answer will result in a disqualification. Entries who have all items correct will be scored based upon the time they spent coming up with their answer.

    The Prizes

    CodeBetter.com has sponsored this performance contest and supplied the prizes! Everyone give a big thanks to

    First Place:

    “Zen of Code Optimization: The Ultimate Guide to Writing Software That pushes PC’s to the Limit” by Michael Abrash

    Abrash is an optimization god … This book is considered a religious text in art of writing efficient code.

    Second Place:

    “How to Solve It” by George Polya … A classic work on how to solve problems (you can solve it next time!)

    Best Safe Entry:

    A copy of “Algorithms” by Robert Sedgewick (this book was printed in 1988 and 95% of it still applies!)

     

    If you are unfamiliar with any of these books I highly recommend reading every one of them at least five times.  Every winning submission will also receive a real world copy of the game we will be modeling.

    I will do a submission but I cannot win a prize.

    Two of these books are in used condition as I could not possibly buy new ones

    Additional Rules

    1)      Only one submission per person to be evaluated, you can submit multiples for discussion if you come across an interesting tidbit that you think is worth sharing

    2)      Items will be run on the MS 2.0.50727 Commercial CLR (there can be no tricks such as adding IL instructions to mono)

    3)      You may not call out to unmanaged code!

    4)      You CAN use unsafe code

    5)      No you can’t know what the data is beforehand.

    6)      You must pass all tests in order to win (what good is fast if it doesn’t work?)

    If you have any questions, need clarification, or have thoughts on the overall idea of these competitions please post them here.

    Posted Aug 18 2006, 02:01 AM by Greg with 25 comment(s)
    Filed under:
  • Performance Contest #1 Results and Analysis

     

    Sorry for the delay in getting the results out. There is a really funny story behind it with me running 7 hours of tests in debug mode then aggregating results and getting it all ready before I realized what I had done :-/

    You can find the original contest here http://codebetter.com/blogs/gregyoung/archive/2006/07/27/147736.aspx.

    Before I announce the winners I want to look at various optimizations for the problem, some which were used and some which were not used.  Many seemed to have realized that the suggestion towards threading in this case was a red herring (well sort of, further information below). Large parts of this problem are spent in the creation of objects, the hash function and tokenizing the string. Let's look at some strategies to coming up with a good solution.

    Order is Important!

    One of the key optimizations for the tokenizer is realizing that the order of processing is extremely important. If we know any information about the domain our code will run in we can often make domain specific optimizations by changing the ordering of our statements.

    In the English language for instance characters appear on an order of magnitude more frequently than punctuation. In order to optimize for this case we should check our most often circumstances first. This is better shown with a simple example. The following two functions are both doing the exact same thing but the second example is much faster than the first.

            static count CountWrongOrder(string s) {

                count ret = new count();

                foreach (char c in s) {

                    if (c == ' ' || c == '\t' || c == '\n') {

                        ret.spaces++;

                    } else if (char.IsDigit(c)) {

                        ret.numbers++;

                    } else if (char.IsUpper(c)) {

                        ret.uppercase++;

                    } else if (char.IsLower(c)) {

                        ret.lowercase++;

                    }

                }

                return ret;

            }

            static count CountRightOrder(string s) {

                count ret = new count();

                foreach (char c in s) {

                    if (char.IsLower(c)) {

                        ret.lowercase++;

                    } else if (c == ' ' || c =='\t' || c=='\n') {

                        ret.spaces++;

                    } else if (char.IsUpper(c)) {

                        ret.uppercase++;

                    } else if (char.IsDigit(c)) {

                        ret.numbers++;

                    }

                }

                return ret;

            }

    Right and wrong order

     

    Run across the string "This is some normal english text. Occasionally you will also get a number such as 2" (concat’ed 10000) times the second version is almost twice the speed of the first. In our most often case in the first example (a lower case letter) we have to fail through all of the other conditions where as in the second example it ius our first condition. The goal should be to check the mutually exclusive conditions in an order that is correct when you statistically analyze the data.

    ToCharArray() ?!

    I saw many submissions calling Data.ToCharArray() to get a character array representing the data so they could read it. There is another method on the string object that does a similar job, which although hidden to you is much faster than ToCharArray(). Consider the following test.

            static int ORCharsNewArray(string s) {

                char[] tmp = s.ToCharArray();

                int ret = 0;

                foreach (char c in tmp) {

                    ret |= c;

                }

                return ret;

            }

            static int ORCharsSameArray(string s) {

                int ret = 0;

                foreach (char c in s) {

                    ret |= c;

                }

                return ret;

            }

            static void Main(string[] args) {

                string foo = "123456789012345678901234567890";

                Stopwatch s = new Stopwatch();

                s.Start();

                for (int i = 0; i < 100000; i++) {

                    int bar = ORCharsNewArray(foo);

                }

                s.Stop();

                Console.WriteLine(s.ElapsedTicks);

                s.Reset();

                s.Start();

                for (int i = 0; i < 100000; i++) {

                    int bar = ORCharsSameArray(foo);

                }

                s.Stop();

                Console.WriteLine(s.ElapsedTicks);

            }

    ToCharArray vs get_Chars

     

    In debug mode they have nearly identical performance but in release the ORCharsSameArray function is about 3 times faster. When you use a foreach on a string (or if you use an index) a special method get_Chars is called (you can’t call this on your own). We can see this occurring by looking at the IL in question.

    .method private hidebysig static int32 ORCharsSameArray(string s) cil managed

    {

          .maxstack 2

          .locals init (

                [0] int32 num1,

                [1] char ch1,

                [2] string text1,

                [3] int32 num2)

          L_0000: ldc.i4.0

          L_0001: stloc.0

          L_0002: ldarg.0

          L_0003: stloc.2

          L_0004: ldc.i4.0

          L_0005: stloc.3

          L_0006: br.s L_0018

          L_0008: ldloc.2

          L_0009: ldloc.3

          L_000a: callvirt instance char string::get_Chars(int32)

          L_000f: stloc.1

          L_0010: ldloc.0

          L_0011: ldloc.1

          L_0012: or

          L_0013: stloc.0

          L_0014: ldloc.3

          L_0015: ldc.i4.1

          L_0016: add

          L_0017: stloc.3

          L_0018: ldloc.3

          L_0019: ldloc.2

          L_001a: callvirt instance int32 string::get_Length()

          L_001f: blt.s L_0008

          L_0021: ldloc.0

          L_0022: ret

    }

    IL of ORCharsSameArray

     

    This method lets you view the data inside of the string as if it were an array of chars.  Since the data is only being read there is no need to generate an array. The reason that they are roughly the same speed in debug is that the method does not get inlined, in release mode the method is inlined and it is nearly as efficient as unsafe code.

    Threading (The Red Herring?)

    A few tried threaded solutions only one did it efficiently. Garmon got the algorithm right.

    The problem with threading is locking … In order to maintain a hash properly between two threads one needs to setup critical sections around the hash. The big problem comes in that on every probe to the hash one has to worry about a resize of the hash by the other thread. In order to get around this one has to create a hash per thread, since each thread has its own hash they can operate on their own hash without locking as the other thread will not touch the hash.

    Once the two threads are complete one then must merge the two hashes. This will always be a O(n) operation at best generally O(hashsize) which is why threading was not much of an issue here unless you get absolutely huge data sets with lots of repetition. It takes a lot to make up for the O(n) operation on the copy, the creation of the second hash, the delay to actually start the thread, and the additional memory overhead caused by having two hashes.

    The key to threads being successful is a high rate of repetition in the data!

    Use a Map

    All of the submissions used if conditionals to tokenize the data.  There were cases such as

                if ((c >= 'a') && (c <= 'z')) {

                    builder.Append(char.ToUpper(value));

                }

    Conditional Tokenizing

     

    Could we pre-generate this information into a table and simply do a table lookup?

            enum Operations {

                EndWord = 0x0100,

                MoveNext = 0x0200

            }

            static string FormatEntry(int value) {

                return string.Format("0x{0:x4}", value);

            }

     

            static void MainToRun(string[] args) {

                Console.WriteLine("UInt16 [] map = {");

                for (int i = 0; i < 255; i++) {

                    char c = (char)i;

                    if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {

                        Console.Write(FormatEntry(c | (int)Operations.MoveNext));

                    } else if (c >= 'A' && c <= 'Z') {

                        Console.Write(FormatEntry(char.ToLower(c) | (int)Operations.MoveNext));

                    } else if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {

                        Console.Write(FormatEntry(0 | (int)Operations.EndWord));

                    } else {

                        Console.Write(FormatEntry(0));

                    }

                    if (i < 254) {

                        Console.Write(", ");

                    }

                    if ((i + 1) % 8 == 0) {

                        Console.Write("\n");

                    }

                }

                Console.Write("\n}\n");

            }

    Code to generate map

     

    Note that this code is doing any transformations that we may want and is also storing some additional information in the high bits of the int16(we only use 8 bits for the char). In particular it is storing a bit that tells us if what was read is a word terminator or not and it tells us whether or not we should add to the current position of the output buffer. This code will produce output similar to the following

            static readonly UInt16 [] map = {

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0100, 0x0100, 0x0000, 0x0000, 0x0100, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0100, 0x0000, 0x0000, 0x0000, 0x0224, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x022d, 0x0000, 0x0000,

    0x0230, 0x0231, 0x0232, 0x0233, 0x0234, 0x0235, 0x0236, 0x0237,

    0x0238, 0x0239, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0261, 0x0262, 0x0263, 0x0264, 0x0265, 0x0266, 0x0267,

    0x0268, 0x0269, 0x026a, 0x026b, 0x026c, 0x026d, 0x026e, 0x026f,

    0x0270, 0x0271, 0x0272, 0x0273, 0x0274, 0x0275, 0x0276, 0x0277,

    0x0278, 0x0279, 0x027a, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0261, 0x0262, 0x0263, 0x0264, 0x0265, 0x0266, 0x0267,

    0x0268, 0x0269, 0x026a, 0x026b, 0x026c, 0x026d, 0x026e, 0x026f,

    0x0270, 0x0271, 0x0272, 0x0273, 0x0274, 0x0275, 0x0276, 0x0277,

    0x0278, 0x0279, 0x027a, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,

    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000

    Map outputted by example

     

    Now that we have this map saved off we can write an extremely elegant tokenizer that is also faster than our prior incarnations which used conditionals.

            public unsafe WordEntry[] CountWords(string _Text) {

                table.Clear();

                int loc = 0;

                int lastloc = 0;

                byte [] buf = new byte[_Text.Length * 2];

                fixed (byte* buffer = buf) {

                    fixed (char* c = _Text) {

                        char* current = c;

                        char* stop = c + _Text.Length;

                        while (current < stop) {

                            UInt16 val = map[*current];

                            int add = (val >> 9);

                            if (add > 0) {

                                buffer[loc] = (byte) (val & 0xFF);

                                loc++;

                            } else if (loc != lastloc && ((val >> 8) & 1) == 1) {

                                table.Increment(buffer + lastloc, loc - lastloc );

                                loc += 4;

                                lastloc = loc;

                            }

                            current++;

                        }

                        if (loc != lastloc) {

                            table.Increment(buffer + lastloc, loc - lastloc);

                        }

                        return table.ToArray();

                    }

                }

            }

    Map Based Parser

     

    Clean, compact, and efficient … what more can you ask for?

    A Side Challenge

    I gave myself a side challenge yesterday, could I tokenize the string two chars at a time with only a single branch in my loop? It took me a little while to come up with an answer to this problem but it did exist. Here is the code.

                        while (current < stop) {

                            Int32 chars = *current;

                            chars = (chars ) >> 16 | ((chars & 0xFF) << 8);

                            Int32 processed = parsermap[chars];

                            Int16* tmp2 = (Int16*)tmp;

                            *tmp2 = (Int16)(processed & 0xFFFF);

                            int add = (processed >> 19); //number to add to output pointer

                            tmp += add;

                            tmp2 = (Int16*)current;

                            tmp2 += (processed >> 17) & 0x03; //number to add to buffer pointer (1 or 2)

                            current = (Int32*)tmp2;

                            if ((processed & 0x010000) > 0) {

                                Int32* t = (Int32*) tmp;

                                *t=0;

                                table.Increment(last, (int) (tmp - last));

                                tmp += 4;

                                last = tmp;

                            }

                        }

    Parse 2 chars at a time with only one if statement?

     

    In order to really look at this code you will also need to look at the code that generates a 64k map that it references (no I will not post the 64k map hereJ). The complete cod