Landman Code: C# SuperFastHash and MurmurHash2 implementations

C# SuperFastHash and MurmurHash2 implementations

25 February 2009

I’ve been emailed about a SuperFastHash C# implementation, and I felt like doing low level stuff, since I’m knees deep in DDD at the moment. So I looked at my Pascal and BASM implementation of SuperFastHash and figured, I could totally make this in C#. Searching around if nobody else had done it already (then I could just send a link to that site as reply), I saw some articles analyzing SuperFastHash and breaking it. During that search I also found another Hasher called MurmurHash2 which passes some test very nicely and is a lot faster than SuperFastHash. But it seems that this hash is too simple and does have some vurnabilities as well, so I’ve implemented both in c# and left you with the choice. Important to know, these hashes are for hashtables and not meant for verifying or cryptology.

I'll discuss each hash separately and discuss my different implementation, in the end I’ll post the final implementation and let you chose yourself.

SuperFastHash

This one I already knew, so I implemented this one very fast to the following simple implementation. This implementation is very standard .NET code which just a few optimizations.

    public class SuperFastHashSimple : IHashAlgorithm
    {
        public UInt32 Hash(Byte[] dataToHash)
        {
            Int32 dataLength = dataToHash.Length;
            if (dataLength == 0)
                return 0;
            UInt32 hash = Convert.ToUInt32(dataLength);
            Int32 remainingBytes = dataLength & 3; // mod 4
            Int32 numberOfLoops = dataLength >> 2; // div 4
            Int32 currentIndex = 0;
            while (numberOfLoops > 0)
            {
                hash += BitConverter.ToUInt16(dataToHash, currentIndex);
                UInt32 tmp = (UInt32)(BitConverter.ToUInt16(dataToHash, currentIndex + 2) << 11) ^ hash;
                hash = (hash << 16) ^ tmp;
                hash += hash >> 11;
                currentIndex += 4;
                numberOfLoops--;
            }

            switch (remainingBytes)
            {
                case 3: hash += BitConverter.ToUInt16(dataToHash, currentIndex);
                    hash ^= hash << 16;
                    hash ^= ((UInt32)dataToHash[currentIndex + 2]) << 18;
                    hash += hash >> 11;
                    break;
                case 2: hash += BitConverter.ToUInt16(dataToHash, currentIndex);
                    hash ^= hash << 11;
                    hash += hash >> 17;
                    break;
                case 1: hash += dataToHash[currentIndex];
                    hash ^= hash << 10;
                    hash += hash >> 1;
                    break;
                default:
                    break;
            }

            /* Force "avalanching" of final 127 bits */
            hash ^= hash << 3;
            hash += hash >> 5;
            hash ^= hash << 4;
            hash += hash >> 17;
            hash ^= hash << 25;
            hash += hash >> 6;

            return hash;
        }
    }

The most performance was lost with the BitConverter.ToUInt16 which is implemented as a unsafe pointer cast, but does some checking before the cast, the next step was to inline the byte to UInt16 conversion.

    public class SuperFastHashInlineBitConverter : IHashAlgorithm
    {
        public UInt32 Hash(Byte[] dataToHash)
        {
            Int32 dataLength = dataToHash.Length;
            if (dataLength == 0)
                return 0;
            UInt32 hash = (UInt32)dataLength;
            Int32 remainingBytes = dataLength & 3; // mod 4
            Int32 numberOfLoops = dataLength >> 2; // div 4
            Int32 currentIndex = 0;
            while (numberOfLoops > 0)
            {
                hash += (UInt16)(dataToHash[currentIndex++] | dataToHash[currentIndex++] << 8);
                UInt32 tmp = (UInt32)((UInt32)(dataToHash[currentIndex++] | dataToHash[currentIndex++] << 8) << 11) ^ hash;
                hash = (hash << 16) ^ tmp;
                hash += hash >> 11;
                numberOfLoops--;
            }

            switch (remainingBytes)
            {
                case 3:
                    hash += (UInt16)(dataToHash[currentIndex++] | dataToHash[currentIndex++] << 8);
                    hash ^= hash << 16;
                    hash ^= ((UInt32)dataToHash[currentIndex]) << 18;
                    hash += hash >> 11;
                    break;
                case 2:
                    hash += (UInt16)(dataToHash[currentIndex++] | dataToHash[currentIndex] << 8);
                    hash ^= hash << 11;
                    hash += hash >> 17;
                    break;
                case 1: 
                    hash += dataToHash[currentIndex];
                    hash ^= hash << 10;
                    hash += hash >> 1;
                    break;
                default:
                    break;
            }

            /* Force "avalanching" of final 127 bits */
            hash ^= hash << 3;
            hash += hash >> 5;
            hash ^= hash << 4;
            hash += hash >> 17;
            hash ^= hash << 25;
            hash += hash >> 6;

            return hash;
        }
    }

This is the fastest no-hacks managed implementation of the SuperFastHash, but the result will change if the Endianness changes. I’ve been looking for a way to cast the byte array to a int array without have to do Marshal.Copy, I found a dirty hack using FieldOffset(0), I have no idea if this hack is going to be supported on new versions of the CLR so using this is a risk. The only strange part is that the UInts.Length is the length of the bytes array, but the index is of the UInt16 array, so the max index for that array is UInts.Length / sizeof(UInt16). Below is the implementation of this hack.

    public class SuperFastHashUInt16Hack : IHashAlgorithm
    {
        [StructLayout(LayoutKind.Explicit)]
        // no guarantee this will remain working
        struct BytetoUInt16Converter
        {
            [FieldOffset(0)]
            public Byte[] Bytes;

            [FieldOffset(0)]
            public UInt16[] UInts;
        }

        public UInt32 Hash(Byte[] dataToHash)
        {
            Int32 dataLength = dataToHash.Length;
            if (dataLength == 0)
                return 0;
            UInt32 hash = (UInt32)dataLength;
            Int32 remainingBytes = dataLength & 3; // mod 4
            Int32 numberOfLoops = dataLength >> 2; // div 4
            Int32 currentIndex = 0;
            UInt16[] arrayHack = new BytetoUInt16Converter { Bytes = dataToHash }.UInts;
            while (numberOfLoops > 0)
            {
                hash += arrayHack[currentIndex++];
                UInt32 tmp = (UInt32)(arrayHack[currentIndex++] << 11) ^ hash;
                hash = (hash << 16) ^ tmp;
                hash += hash >> 11;
                numberOfLoops--;
            }
            currentIndex *= 2; // fix the length
            switch (remainingBytes)
            {
                case 3:
                    hash += (UInt16)(dataToHash[currentIndex++] | dataToHash[currentIndex++] << 8);
                    hash ^= hash << 16;
                    hash ^= ((UInt32)dataToHash[currentIndex]) << 18;
                    hash += hash >> 11;
                    break;
                case 2:
                    hash += (UInt16)(dataToHash[currentIndex++] | dataToHash[currentIndex] << 8);
                    hash ^= hash << 11;
                    hash += hash >> 17;
                    break;
                case 1:
                    hash += dataToHash[currentIndex];
                    hash ^= hash << 10;
                    hash += hash >> 1;
                    break;
                default:
                    break;
            }

            /* Force "avalanching" of final 127 bits */
            hash ^= hash << 3;
            hash += hash >> 5;
            hash ^= hash << 4;
            hash += hash >> 17;
            hash ^= hash << 25;
            hash += hash >> 6;

            return hash;
        }
    }

The last stap was to go to unsafe land and do real pointer stuff.. This implementation looks a lot like the original c implementation

    public class SuperFastHashUnsafe : IHashAlgorithm
    {
        public unsafe UInt32 Hash(Byte[] dataToHash)
        {
            Int32 dataLength = dataToHash.Length;
            if (dataLength == 0)
                return 0;
            UInt32 hash = (UInt32)dataLength;
            Int32 remainingBytes = dataLength & 3; // mod 4
            Int32 numberOfLoops = dataLength >> 2; // div 4

            fixed (byte* firstByte = &(dataToHash[0]))
            {
                /* Main loop */
                UInt16* data = (UInt16*)firstByte;
                for (; numberOfLoops > 0; numberOfLoops--)
                {
                    hash += *data;
                    UInt32 tmp = (UInt32)(*(data + 1) << 11) ^ hash;
                    hash = (hash << 16) ^ tmp;
                    data += 2;
                    hash += hash >> 11;
                }
                switch (remainingBytes)
                {
                    case 3: hash += *data;
                        hash ^= hash << 16;
                        hash ^= ((UInt32)(*(((Byte*)(data))+2))) << 18;
                        hash += hash >> 11;
                        break;
                    case 2: hash += *data;
                        hash ^= hash << 11;
                        hash += hash >> 17;
                        break;
                    case 1: 
                        hash += *((Byte*)data);
                        hash ^= hash << 10;
                        hash += hash >> 1;
                        break;
                    default:
                        break;
                }
            }

            /* Force "avalanching" of final 127 bits */
            hash ^= hash << 3;
            hash += hash >> 5;
            hash ^= hash << 4;
            hash += hash >> 17;
            hash ^= hash << 25;
            hash += hash >> 6;

            return hash;
        }
    }

MurmurHash2

The next algorithm was MurmurHash2, the code is very simple, the only dificult part was the fall-trough case which luckely isn't supported in c#, below is the first implementation.

    public class MurmurHash2Simple : IHashAlgorithm
    {
        public UInt32 Hash(Byte[] data)
        {
            return Hash(data, 0xc58f1a7b);
        }
        const UInt32 m = 0x5bd1e995;
        const Int32 r = 24;

        public UInt32 Hash(Byte[] data, UInt32 seed)
        {
            Int32 length = data.Length;
            if (length == 0)
                return 0;
            UInt32 h = seed ^ (UInt32)length;
            Int32 currentIndex = 0;
            while (length >= 4)
            {
                UInt32 k = BitConverter.ToUInt32(data, currentIndex);
                k *= m;
                k ^= k >> r;
                k *= m;

                h *= m;
                h ^= k;
                currentIndex += 4;
                length -= 4;
            }
            switch (length)
            {
                case 3:
                    h ^= BitConverter.ToUInt16(data, currentIndex);
                    h ^= (UInt32)data[currentIndex + 2] << 16;
                    h *= m;
                    break;
                case 2:
                    h ^= BitConverter.ToUInt16(data, currentIndex);
                    h *= m;
                    break;
                case 1:
                    h ^= data[currentIndex];
                    h *= m;
                    break;
                default:
                    break;
            }
            
            // Do a few final mixes of the hash to ensure the last few
            // bytes are well-incorporated.

            h ^= h >> 13;
            h *= m;
            h ^= h >> 15;

            return h;
        }
    }

I've applied the same optimalizations as discussed with SuperFastHash so here is the fastest no-hacks managed implementation.

    public class MurmurHash2InlineBitConverter : IHashAlgorithm
    {

        public UInt32 Hash(Byte[] data)
        {
            return Hash(data, 0xc58f1a7b);
        }
        const UInt32 m = 0x5bd1e995;
        const Int32 r = 24;

        public UInt32 Hash(Byte[] data, UInt32 seed)
        {
            Int32 length = data.Length;
            if (length == 0)
                return 0;
            UInt32 h = seed ^ (UInt32)length;
            Int32 currentIndex = 0;
            while (length >= 4)
            {
                UInt32 k = (UInt32)(data[currentIndex++] | data[currentIndex++] << 8 | data[currentIndex++] << 16 | data[currentIndex++] << 24);
                k *= m;
                k ^= k >> r;
                k *= m;

                h *= m;
                h ^= k;
                length -= 4;
            }
            switch (length)
            {
                case 3:
                    h ^= (UInt16)(data[currentIndex++] | data[currentIndex++] << 8);
                    h ^= (UInt32)(data[currentIndex] << 16);
                    h *= m;
                    break;
                case 2:
                    h ^= (UInt16)(data[currentIndex++] | data[currentIndex] << 8);
                    h *= m;
                    break;
                case 1:
                    h ^= data[currentIndex];
                    h *= m;
                    break;
                default:
                    break;
            }

            // Do a few final mixes of the hash to ensure the last few
            // bytes are well-incorporated.

            h ^= h >> 13;
            h *= m;
            h ^= h >> 15;

            return h;
        }
    }

The dirty hack which I’ve already discussed worked miracles here as well. I've also added a different looping logic (stolen from SuperFastHash), this added a nice speed increase for the unsafe version.

    public class MurmurHash2Unsafe : IHashAlgorithm
    {
        public UInt32 Hash(Byte[] data)
        {
            return Hash(data, 0xc58f1a7b);
        }
        const UInt32 m = 0x5bd1e995;
        const Int32 r = 24;

        public unsafe UInt32 Hash(Byte[] data, UInt32 seed)
        {
            Int32 length = data.Length;
            if (length == 0)
                return 0;
            UInt32 h = seed ^ (UInt32)length;
            Int32 remainingBytes = length & 3; // mod 4
            Int32 numberOfLoops = length >> 2; // div 4
            fixed (byte* firstByte = &(data[0]))
            {
                UInt32* realData = (UInt32*)firstByte;
                while (numberOfLoops != 0)
                {
                    UInt32 k = *realData;
                    k *= m;
                    k ^= k >> r;
                    k *= m;

                    h *= m;
                    h ^= k;
                    numberOfLoops--;
                    realData++;
                }
                switch (remainingBytes)
                {
                    case 3:
                        h ^= (UInt16)(*realData);
                        h ^= ((UInt32)(*(((Byte*)(realData)) + 2))) << 16;
                        h *= m;
                        break;
                    case 2:
                        h ^= (UInt16)(*realData);
                        h *= m;
                        break;
                    case 1:
                        h ^= *((Byte*)realData);
                        h *= m;
                        break;
                    default:
                        break;
                }
            }

            // Do a few final mixes of the hash to ensure the last few
            // bytes are well-incorporated.

            h ^= h >> 13;
            h *= m;
            h ^= h >> 15;

            return h;
        }
    }

Measurements

I've implemented the same test case for the native version of SuperFastHash and MurmurHash2, the measured speed will be used as reference speed. SuperFastHash was clocked at a rate of 1611 MB/s and MurmurHash2 was clocked at 2312 MB/s.

I've based the tests on the speed test used on the MurmurHash2 page, I'm not sure they test every property of the hash function, but it's a nice way to compare the performance.

            var data = new Byte[256 * 1024];
            new Random().NextBytes(data);
            Thread.CurrentThread.Priority = ThreadPriority.Highest;
            Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;
            if (Environment.ProcessorCount > 1)
            {
                Process.GetCurrentProcess().ProcessorAffinity = 
                      new IntPtr(1 << (Environment.ProcessorCount - 1));
            }
            foreach (var testSubject in toTest)
            {
                Stopwatch timer = Stopwatch.StartNew();
                for (int i = 0; i < 9999; i++)
                {
                    testSubject.Value.Hash(data);
                }
                timer.Stop();
                Console.WriteLine("{0}:\t\t{1:F2} MB/s ({2})", testSubject.Key, 
                    (data.Length * (1000.0 / (timer.ElapsedMilliseconds / 9999.0)))
                        / (1024.0 * 1024.0),
                    timer.ElapsedMilliseconds);
            }

The test is pretty simple, test 256k random data 9999 times and extract the MB/s from it. (A test with 1k random data 99999 times showed the same speeds, so my implementations are stable enough). Below are the results for the different c# implementations.

Function	Speed	Relative to native
SuperFastHashSimple	281 MB/s	0.17x
SuperFastHashInlineBitConverter	780 MB/s	0.48x
SuperFastHashUInt16Hack	1204 MB/s	0.75x
SuperFastHashUnsafe	1308 MB/s	0.82x
MurmurHash2Simple	486 MB/s	0.21x
MurmurHash2InlineBitConverter	759 MB/s	0.32x
MurmurHash2UInt32Hack	1430 MB/s	0.62x
MurmurHash2Unsafe	2196 MB/s	0.95x

In conclusion the managed SuperFastHash implementation is only 25% slower than the unmanaged implementation, and if you really like speed you can get the Unsafe implementation at only 18% slower than unmanged. The MurmurHash2 managed implementation is 38% slower than the unmanaged (but still as fast as the unmanaged SuperFastHash) and the unsafe implementation is only 5% slower than the unmanaged version, I call that a nice result!

Final code

Here is the complete unit with all the above functions, as always you can also download it directly.

IHashingAlgorithm.cs

using System;

namespace HashTableHashing
{
    public interface IHashAlgorithm
    {
        UInt32 Hash(Byte[] data);
    }
    public interface ISeededHashAlgorithm : IHashAlgorithm
    {
        UInt32 Hash(Byte[] data, UInt32 seed);
    }
}

SuperFastHash.cs

/***** BEGIN LICENSE BLOCK *****
 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
 *
 * The contents of this file are subject to the Mozilla Public License Version
 * 1.1 (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 * http://www.mozilla.org/MPL/
 *
 * Software distributed under the License is distributed on an "AS IS" basis,
 * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
 * for the specific language governing rights and limitations under the
 * License.
 *
 * The Original Code is HashTableHashing.SuperFastHash.
 *
 * The Initial Developer of the Original Code is
 * Davy Landman.
 * Portions created by the Initial Developer are Copyright (C) 2009
 * the Initial Developer. All Rights Reserved.
 *
 * Contributor(s):
 *
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either the GNU General Public License Version 2 or later (the "GPL"), or
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the MPL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the MPL, the GPL or the LGPL.
 *
 * ***** END LICENSE BLOCK ***** */
using System;
using System.Runtime.InteropServices;

namespace HashTableHashing
{
    public class SuperFastHashSimple : IHashAlgorithm
    {
        public UInt32 Hash(Byte[] dataToHash)
        {
            Int32 dataLength = dataToHash.Length;
            if (dataLength == 0)
                return 0;
            UInt32 hash = Convert.ToUInt32(dataLength);
            Int32 remainingBytes = dataLength & 3; // mod 4
            Int32 numberOfLoops = dataLength >> 2; // div 4
            Int32 currentIndex = 0;
            while (numberOfLoops > 0)
            {
                hash += BitConverter.ToUInt16(dataToHash, currentIndex);
                UInt32 tmp = (UInt32)(BitConverter.ToUInt16(dataToHash, currentIndex + 2) << 11) ^ hash;
                hash = (hash << 16) ^ tmp;
                hash += hash >> 11;
                currentIndex += 4;
                numberOfLoops--;
            }

            switch (remainingBytes)
            {
                case 3: hash += BitConverter.ToUInt16(dataToHash, currentIndex);
                    hash ^= hash << 16;
                    hash ^= ((UInt32)dataToHash[currentIndex + 2]) << 18;
                    hash += hash >> 11;
                    break;
                case 2: hash += BitConverter.ToUInt16(dataToHash, currentIndex);
                    hash ^= hash << 11;
                    hash += hash >> 17;
                    break;
                case 1: hash += dataToHash[currentIndex];
                    hash ^= hash << 10;
                    hash += hash >> 1;
                    break;
                default:
                    break;
            }

            /* Force "avalanching" of final 127 bits */
            hash ^= hash << 3;
            hash += hash >> 5;
            hash ^= hash << 4;
            hash += hash >> 17;
            hash ^= hash << 25;
            hash += hash >> 6;

            return hash;
        }
    }

    public class SuperFastHashInlineBitConverter : IHashAlgorithm
    {
        public UInt32 Hash(Byte[] dataToHash)
        {
            Int32 dataLength = dataToHash.Length;
            if (dataLength == 0)
                return 0;
            UInt32 hash = (UInt32)dataLength;
            Int32 remainingBytes = dataLength & 3; // mod 4
            Int32 numberOfLoops = dataLength >> 2; // div 4
            Int32 currentIndex = 0;
            while (numberOfLoops > 0)
            {
                hash += (UInt16)(dataToHash[currentIndex++] | dataToHash[currentIndex++] << 8);
                UInt32 tmp = (UInt32)((UInt32)(dataToHash[currentIndex++] | dataToHash[currentIndex++] << 8) << 11) ^ hash;
                hash = (hash << 16) ^ tmp;
                hash += hash >> 11;
                numberOfLoops--;
            }

            switch (remainingBytes)
            {
                case 3:
                    hash += (UInt16)(dataToHash[currentIndex++] | dataToHash[currentIndex++] << 8);
                    hash ^= hash << 16;
                    hash ^= ((UInt32)dataToHash[currentIndex]) << 18;
                    hash += hash >> 11;
                    break;
                case 2:
                    hash += (UInt16)(dataToHash[currentIndex++] | dataToHash[currentIndex] << 8);
                    hash ^= hash << 11;
                    hash += hash >> 17;
                    break;
                case 1: 
                    hash += dataToHash[currentIndex];
                    hash ^= hash << 10;
                    hash += hash >> 1;
                    break;
                default:
                    break;
            }

            /* Force "avalanching" of final 127 bits */
            hash ^= hash << 3;
            hash += hash >> 5;
            hash ^= hash << 4;
            hash += hash >> 17;
            hash ^= hash << 25;
            hash += hash >> 6;

            return hash;
        }
    }

    public class SuperFastHashUInt16Hack : IHashAlgorithm
    {
        [StructLayout(LayoutKind.Explicit)]
        // no guarantee this will remain working
        struct BytetoUInt16Converter
        {
            [FieldOffset(0)]
            public Byte[] Bytes;

            [FieldOffset(0)]
            public UInt16[] UInts;
        }

        public UInt32 Hash(Byte[] dataToHash)
        {
            Int32 dataLength = dataToHash.Length;
            if (dataLength == 0)
                return 0;
            UInt32 hash = (UInt32)dataLength;
            Int32 remainingBytes = dataLength & 3; // mod 4
            Int32 numberOfLoops = dataLength >> 2; // div 4
            Int32 currentIndex = 0;
            UInt16[] arrayHack = new BytetoUInt16Converter { Bytes = dataToHash }.UInts;
            while (numberOfLoops > 0)
            {
                hash += arrayHack[currentIndex++];
                UInt32 tmp = (UInt32)(arrayHack[currentIndex++] << 11) ^ hash;
                hash = (hash << 16) ^ tmp;
                hash += hash >> 11;
                numberOfLoops--;
            }
            currentIndex *= 2; // fix the length
            switch (remainingBytes)
            {
                case 3:
                    hash += (UInt16)(dataToHash[currentIndex++] | dataToHash[currentIndex++] << 8);
                    hash ^= hash << 16;
                    hash ^= ((UInt32)dataToHash[currentIndex]) << 18;
                    hash += hash >> 11;
                    break;
                case 2:
                    hash += (UInt16)(dataToHash[currentIndex++] | dataToHash[currentIndex] << 8);
                    hash ^= hash << 11;
                    hash += hash >> 17;
                    break;
                case 1:
                    hash += dataToHash[currentIndex];
                    hash ^= hash << 10;
                    hash += hash >> 1;
                    break;
                default:
                    break;
            }

            /* Force "avalanching" of final 127 bits */
            hash ^= hash << 3;
            hash += hash >> 5;
            hash ^= hash << 4;
            hash += hash >> 17;
            hash ^= hash << 25;
            hash += hash >> 6;

            return hash;
        }
    }

    public class SuperFastHashUnsafe : IHashAlgorithm
    {
        public unsafe UInt32 Hash(Byte[] dataToHash)
        {
            Int32 dataLength = dataToHash.Length;
            if (dataLength == 0)
                return 0;
            UInt32 hash = (UInt32)dataLength;
            Int32 remainingBytes = dataLength & 3; // mod 4
            Int32 numberOfLoops = dataLength >> 2; // div 4

            fixed (byte* firstByte = &(dataToHash[0]))
            {
                /* Main loop */
                UInt16* data = (UInt16*)firstByte;
                for (; numberOfLoops > 0; numberOfLoops--)
                {
                    hash += *data;
                    UInt32 tmp = (UInt32)(*(data + 1) << 11) ^ hash;
                    hash = (hash << 16) ^ tmp;
                    data += 2;
                    hash += hash >> 11;
                }
                switch (remainingBytes)
                {
                    case 3: hash += *data;
                        hash ^= hash << 16;
                        hash ^= ((UInt32)(*(((Byte*)(data))+2))) << 18;
                        hash += hash >> 11;
                        break;
                    case 2: hash += *data;
                        hash ^= hash << 11;
                        hash += hash >> 17;
                        break;
                    case 1: 
                        hash += *((Byte*)data);
                        hash ^= hash << 10;
                        hash += hash >> 1;
                        break;
                    default:
                        break;
                }
            }

            /* Force "avalanching" of final 127 bits */
            hash ^= hash << 3;
            hash += hash >> 5;
            hash ^= hash << 4;
            hash += hash >> 17;
            hash ^= hash << 25;
            hash += hash >> 6;

            return hash;
        }
    }
}

MurmurHash2.cs

/***** BEGIN LICENSE BLOCK *****
 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
 *
 * The contents of this file are subject to the Mozilla Public License Version
 * 1.1 (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 * http://www.mozilla.org/MPL/
 *
 * Software distributed under the License is distributed on an "AS IS" basis,
 * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
 * for the specific language governing rights and limitations under the
 * License.
 *
 * The Original Code is HashTableHashing.MurmurHash2.
 *
 * The Initial Developer of the Original Code is
 * Davy Landman.
 * Portions created by the Initial Developer are Copyright (C) 2009
 * the Initial Developer. All Rights Reserved.
 *
 * Contributor(s):
 *
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either the GNU General Public License Version 2 or later (the "GPL"), or
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the MPL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the MPL, the GPL or the LGPL.
 *
 * ***** END LICENSE BLOCK ***** */
using System;
using System.Runtime.InteropServices;

namespace HashTableHashing
{
    public class MurmurHash2Simple : ISeededHashAlgorithm
    {
        public UInt32 Hash(Byte[] data)
        {
            return Hash(data, 0xc58f1a7b);
        }
        const UInt32 m = 0x5bd1e995;
        const Int32 r = 24;

        public UInt32 Hash(Byte[] data, UInt32 seed)
        {
            Int32 length = data.Length;
            if (length == 0)
                return 0;
            UInt32 h = seed ^ (UInt32)length;
            Int32 currentIndex = 0;
            while (length >= 4)
            {
                UInt32 k = BitConverter.ToUInt32(data, currentIndex);
                k *= m;
                k ^= k >> r;
                k *= m;

                h *= m;
                h ^= k;
                currentIndex += 4;
                length -= 4;
            }
            switch (length)
            {
                case 3:
                    h ^= BitConverter.ToUInt16(data, currentIndex);
                    h ^= (UInt32)data[currentIndex + 2] << 16;
                    h *= m;
                    break;
                case 2:
                    h ^= BitConverter.ToUInt16(data, currentIndex);
                    h *= m;
                    break;
                case 1:
                    h ^= data[currentIndex];
                    h *= m;
                    break;
                default:
                    break;
            }
            
            // Do a few final mixes of the hash to ensure the last few
            // bytes are well-incorporated.

            h ^= h >> 13;
            h *= m;
            h ^= h >> 15;

            return h;
        }
    }

    public class MurmurHash2InlineBitConverter : ISeededHashAlgorithm
    {

        public UInt32 Hash(Byte[] data)
        {
            return Hash(data, 0xc58f1a7b);
        }
        const UInt32 m = 0x5bd1e995;
        const Int32 r = 24;

        public UInt32 Hash(Byte[] data, UInt32 seed)
        {
            Int32 length = data.Length;
            if (length == 0)
                return 0;
            UInt32 h = seed ^ (UInt32)length;
            Int32 currentIndex = 0;
            while (length >= 4)
            {
                UInt32 k = (UInt32)(data[currentIndex++] | data[currentIndex++] << 8 | data[currentIndex++] << 16 | data[currentIndex++] << 24);
                k *= m;
                k ^= k >> r;
                k *= m;

                h *= m;
                h ^= k;
                length -= 4;
            }
            switch (length)
            {
                case 3:
                    h ^= (UInt16)(data[currentIndex++] | data[currentIndex++] << 8);
                    h ^= (UInt32)(data[currentIndex] << 16);
                    h *= m;
                    break;
                case 2:
                    h ^= (UInt16)(data[currentIndex++] | data[currentIndex] << 8);
                    h *= m;
                    break;
                case 1:
                    h ^= data[currentIndex];
                    h *= m;
                    break;
                default:
                    break;
            }

            // Do a few final mixes of the hash to ensure the last few
            // bytes are well-incorporated.

            h ^= h >> 13;
            h *= m;
            h ^= h >> 15;

            return h;
        }
    }

    public class MurmurHash2UInt32Hack : ISeededHashAlgorithm
    {
        public UInt32 Hash(Byte[] data)
        {
            return Hash(data, 0xc58f1a7b);
        }
        const UInt32 m = 0x5bd1e995;
        const Int32 r = 24;

        [StructLayout(LayoutKind.Explicit)]
        struct BytetoUInt32Converter
        {
            [FieldOffset(0)]
            public Byte[] Bytes;

            [FieldOffset(0)]
            public UInt32[] UInts;
        }

        public UInt32 Hash(Byte[] data, UInt32 seed)
        {
            Int32 length = data.Length;
            if (length == 0)
                return 0;
            UInt32 h = seed ^ (UInt32)length;
            Int32 currentIndex = 0;
            // array will be length of Bytes but contains Uints
            // therefore the currentIndex will jump with +1 while length will jump with +4
            UInt32[] hackArray = new BytetoUInt32Converter { Bytes = data }.UInts;
            while (length >= 4)
            {
                UInt32 k = hackArray[currentIndex++];
                k *= m;
                k ^= k >> r;
                k *= m;

                h *= m;
                h ^= k;
                length -= 4;
            }
            currentIndex *= 4; // fix the length
            switch (length)
            {
                case 3:
                    h ^= (UInt16)(data[currentIndex++] | data[currentIndex++] << 8);
                    h ^= (UInt32)data[currentIndex] << 16;
                    h *= m;
                    break;
                case 2:
                    h ^= (UInt16)(data[currentIndex++] | data[currentIndex] << 8);
                    h *= m;
                    break;
                case 1:
                    h ^= data[currentIndex];
                    h *= m;
                    break;
                default:
                    break;
            }

            // Do a few final mixes of the hash to ensure the last few
            // bytes are well-incorporated.

            h ^= h >> 13;
            h *= m;
            h ^= h >> 15;

            return h;
        }
    }

    public class MurmurHash2Unsafe : ISeededHashAlgorithm
    {
        public UInt32 Hash(Byte[] data)
        {
            return Hash(data, 0xc58f1a7b);
        }
        const UInt32 m = 0x5bd1e995;
        const Int32 r = 24;

        public unsafe UInt32 Hash(Byte[] data, UInt32 seed)
        {
            Int32 length = data.Length;
            if (length == 0)
                return 0;
            UInt32 h = seed ^ (UInt32)length;
            Int32 remainingBytes = length & 3; // mod 4
            Int32 numberOfLoops = length >> 2; // div 4
            fixed (byte* firstByte = &(data[0]))
            {
                UInt32* realData = (UInt32*)firstByte;
                while (numberOfLoops != 0)
                {
                    UInt32 k = *realData;
                    k *= m;
                    k ^= k >> r;
                    k *= m;

                    h *= m;
                    h ^= k;
                    numberOfLoops--;
                    realData++;
                }
                switch (remainingBytes)
                {
                    case 3:
                        h ^= (UInt16)(*realData);
                        h ^= ((UInt32)(*(((Byte*)(realData)) + 2))) << 16;
                        h *= m;
                        break;
                    case 2:
                        h ^= (UInt16)(*realData);
                        h *= m;
                        break;
                    case 1:
                        h ^= *((Byte*)realData);
                        h *= m;
                        break;
                    default:
                        break;
                }
            }

            // Do a few final mixes of the hash to ensure the last few
            // bytes are well-incorporated.

            h ^= h >> 13;
            h *= m;
            h ^= h >> 15;

            return h;
        }
    }
}

Tags: c#, optimization

18 comments:

Unknown said:

Hello,

Any change to get Delphi version of MurMurHash2?
at 17 March, 2009 14:00
Davy Landman said:

Hello Tommi Prami,

The MurMurHash2 is pretty simple, so why don't you try too implement it as a challenge? Using my Delphi implementation of SuperFastHash you could implement MurMurHash2 quickly.

If you run into troubles you're free to contact me for some tips/help.

Cheers,
Davy
at 17 March, 2009 15:09
Unknown said:

Hello,

I was just hoping to get Lightning fast ASM-optimized version also ;)

I'll try to port it, I am not too good with shifts etc tough. Never done much of an bit Hacks...

I'll take our challenge, lets se am I worth it :)

-TP-
at 18 March, 2009 06:08
Unknown said:

Hello,

Check .bas newsgroup of my try on this...

-TP-
at 18 March, 2009 08:30
brander said:
This comment has been removed by the author. at 07 February, 2011 21:34
brander said:

curious if you've made a MurMurHash3 implementation now that the algorithm has been updated?
at 07 February, 2011 21:35
Davy Landman said:

Hi Brander,

I have not created a new version, your welcome to give it a go. You'll learn a lot from it.

Let me know if you implemented it, and I'll link to your post.

Cheers,
Davy Landman
at 08 February, 2011 10:49
Ian said:

Great article, thanks!

You can do fall-through in a switch statement in C# as follows:

switch(x)
{
case 0:
// do something
goto case 1; // fall-through
case 1:
// do something
break; // exit
}
at 03 June, 2013 12:03
Ian said:
This comment has been removed by the author. at 03 June, 2013 12:03
Ian said:
This comment has been removed by the author. at 03 June, 2013 12:04
Ian said:
This comment has been removed by the author. at 03 June, 2013 12:06
Ian said:
This comment has been removed by the author. at 03 June, 2013 12:06
Ian said:
This comment has been removed by the author. at 03 June, 2013 12:06
Unknown said:

Otarov Asylbek from Kazakhstan

Hello , how are you ?
Thank you very much for your code.

I searched this murmurhash2 algorithm everywhere. If I found , this code was very difficult to understand .But your murmurhash2 code is very understandably for me. May I will use your code in my Diploma Project. I will wait your message , my email address otarov.1992@gmail.com. I will glad if you send me message.

Best regards
Otarov Asylbek from Kazakhsatn.
at 18 May, 2014 09:29
Unknown said:

Otarov Asylbek from Kazakhstan

Hello , how are you ?
Thank you very much for your code.

I searched this murmurhash2 algorithm everywhere. If I found , this code was very difficult to understand .But your murmurhash2 code is very understandably for me. May I will use your code in my Diploma Project. I will wait your message , my email address otarov.1992@gmail.com. I will glad if you send me message.

Best regards
Otarov Asylbek from Kazakhsatn.
at 18 May, 2014 09:30
Davy Landman said:

Sure, license is liberal, just keep the license headers and you should be fine.

Perhaps ask your supervisor about open source code being used?
at 18 May, 2014 09:45
flav3r said:

hey hello, i cannot understand where is IHashAlgorithm, can you help, thanks!
at 20 January, 2021 10:09
flav3r said:

ok i got it, ignore that.
at 20 January, 2021 10:17

C# SuperFastHash and MurmurHash2 implementations

SuperFastHash

MurmurHash2

Measurements

Final code

IHashingAlgorithm.cs

SuperFastHash.cs

MurmurHash2.cs

18 comments:

Unknown said:

Davy Landman said:

Unknown said:

Unknown said:

brander said:

brander said:

Davy Landman said:

Ian said:

Ian said:

Ian said:

Ian said:

Ian said:

Ian said:

Unknown said:

Unknown said:

Davy Landman said:

flav3r said:

flav3r said:

About me

My github projects

My stackoverflow valuable flair

Blog Archive

License