Landman Code: SuperFastHash from Paul Hsieh Translated to Delphi and Borland Assembler (BASM)

SuperFastHash from Paul Hsieh Translated to Delphi and Borland Assembler (BASM)

13 July 2008

I've been a member of the FastCode community for a little while. The FastCode community tries to optimize core functions from Delphi using mostly BASM. After they have a very intensive benchmark en validation process the winner is suggested to Borland/CodeGear through a QC report, and is likely included in the next version of Delphi. I even have a few functions which are winners in these benchmarks and have a QC report with my function as a suggestion.

How it started

Why this introduction? Well, the community lives in the Borland Newsgroup in the group borland.public.delphi.language.basm and it's also a general discussion place about 'hardcore' optimalizations for a Delphi function. More than a year ago Juhani Suhonen asked for a fast hash to use for his hashtable. I suggested the old but nicely performing elf-hash, but also noted a much better hash function I recently found. It was called SuperFastHash (SFH) and was created by Paul Hsieh to overcome his 'problems' with the hash functions from Bob Jenkins. Juhani asked if somebody could write the SFH function in basm. A few people worked on a basm implementation and posted it.

In this post I will roughly describe how I as a fresh assembler optimizer created my version of the SFH function.

The Delphi implementation

I started with a regular Delphi implementation of the algorithm, using the c code from Paul's article as reference. After I got my Delphi implementation compiled and working I looked at the BASM code using the CPU debug view and tried to find points where I could help the Delphi compiler generating better assembly. Below is my initial pascal version. (beware I will post my final version later on in this post).

function SuperFastHash(AData:pointer; ADataLength: integer):longword;
// Pascal translation of the SuperFastHash function by Paul Hsieh
// more info: http://www.azillionmonkeys.com/qed/hash.html
// Translation by: Davy Landman
// No warranties, but have fun :)
var
  TempPart: longword;
  RemainingBytes: integer;
begin
  if not Assigned(AData) or (ADataLength <= 0) then
  begin
    Result := 0;
    Exit;
  end;
  Result := ADataLength;
  RemainingBytes := ADataLength and 3;
  ADataLength := ADataLength shr 2; // div 4, so var name is not correct anymore..
  // main loop
  while ADataLength > 0 do
  begin
    inc(Result, PWord(AData)^);
    TempPart := (PWord(Pointer(Cardinal(AData)+2))^ shl 11) xor Result;
    Result := (Result shl 16) xor TempPart;
    AData := Pointer(Cardinal(AData) + 4);
    inc(Result, Result shr 11);
    dec(ADataLength);
  end;
  // end case
  if RemainingBytes = 3 then
  begin
    inc(Result, PWord(AData)^);
    Result := Result xor (Result shl 16);
    Result := Result xor (PByte(Pointer(Cardinal(AData)+2))^ shl 18);
    inc(Result, Result shr 11);
  end
  else if RemainingBytes = 2 then
  begin
    inc(Result, PWord(AData)^);
    Result := Result xor (Result shl 11);
    inc(Result, Result shr 17);
  end
  else if RemainingBytes = 1 then
  begin
    inc(Result, PByte(AData)^);
    Result := Result xor (Result shl 10);
    inc(Result, Result shr 1);
  end;
  // avalance
  Result := Result xor (Result shl 3);
  inc(Result, Result shr 5);
  Result := Result xor (Result shl 4);
  inc(Result, Result shr 17);
  Result := Result xor (Result shl 25);
  inc(Result, Result shr 6);
end;

Warning this is not final code!

The BASM implementation

I posted this and later posted an assembly version of SFH, I used the CPU view and the code suggestions from Bob Gonder to create this BASM code. Using BASM you can control the exact usage of registers and you're able to fine tune the assembly much better (and easier) than writing Delphi code.

My enthusiasm got the better of me and I spent a whole lot of evenings optimizing the Delphi and assembly versions of the SFH function (while I should have been working on my graduation internship paper). The discussion on the newsgroup turned a few interesting corners about the different kind of definitions of a hashing function. Therefor I must note that this function should only be used for a hashtable! It's tuned for giving a good distribution but not for cryptographically or verification usage.

The final unit

I think I've posted my final functions in the discussion thread, but recently I created a separate function file including some new parts. For instance, I added an functions for larger data which is the same algorithm, but uses less jumps because I unrolled the function. Below is my final unit of the SuperFastHash function.

unit unSuperFastHash;
(*
  A Delphi and assembly translation of the SuperFastHash function by
  Paul Hsieh (http://www.azillionmonkeys.com/qed/hash.html).

  I got the idea for translating it due to borland.public.delphi.language.basm.
  See the full discussion at:
  http://groups.google.com/group/borland.public.delphi.language.basm/
  browse_thread/thread/96482ba4d1d5a016/7745466ab714c3b3

 ***** BEGIN LICENSE BLOCK *****
 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
 *
 * The contents of this file are subject to the Mozilla Public License Version
 * 1.1 (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 * http://www.mozilla.org/MPL/
 *
 * Software distributed under the License is distributed on an "AS IS" basis,
 * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
 * for the specific language governing rights and limitations under the
 * License.
 *
 * The Original Code is SuperFastHash Delphi and BASM translation.
 *
 * The Initial Developer of the Original Code is
 * Davy Landman.
 * Portions created by the Initial Developer are Copyright (C) 2007
 * the Initial Developer. All Rights Reserved.
 *
 * Contributor(s):
 *
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either the GNU General Public License Version 2 or later (the "GPL"), or
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the MPL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the MPL, the GPL or the LGPL.
 *
 * ***** END LICENSE BLOCK ***** *)

interface

{.$define ASMVersion}

function SuperFastHash(AData: pointer; ADataLength: Integer): Cardinal;
function SuperFastHashLargeData(AData: pointer; ADataLength: Integer): Cardinal;

implementation

// Pascal translation of the SuperFastHash function by Paul Hsieh
// more info: http://www.azillionmonkeys.com/qed/hash.html
function SuperFastHash(AData: pointer; ADataLength: Integer): Cardinal;
{$ifndef ASMVersion}
var
  TempPart: Cardinal;
  RemainingBytes: Integer;
  RemainingDWords: Integer;
begin
  if not Assigned(AData) or (ADataLength <= 0) then
  begin
    Result := 0;
    Exit;
  end;
  Result := ADataLength;
  RemainingBytes := ADataLength and 3; // mod 4
  RemainingDWords := ADataLength shr 2; // div 4

  // main loop
  while RemainingDWords > 0 do
  begin
    Result := Result + PWord(AData)^;
    // splitting the pointer math keeps the amount of registers pushed at 2
    AData  := Pointer(Cardinal(AData) + SizeOf(Word));
    TempPart := (PWord(AData)^ shl 11) xor Result;
    Result := (Result shl 16) xor TempPart;
    AData  := Pointer(Cardinal(AData) + SizeOf(Word));
    Result := Result + (Result shr 11);
    dec(RemainingDWords);
  end;
  // Handle end cases
  if RemainingBytes = 3 then
  begin
    Result := Result +    PWord(AData)^;
    Result := Result xor (Result shl 16);
    AData  := Pointer(Cardinal(AData) + SizeOf(Word));   // skip to the last byte
    Result := Result xor ((PByte(AData)^ shl 18));
    Result := Result +   (Result shr 11);
  end
  else if RemainingBytes = 2 then
  begin
    Result := Result +    PWord(AData)^;
    Result := Result xor (Result shl 11);
    Result := Result +   (Result shr 17);
  end
  else if RemainingBytes = 1 then
  begin
    Result := Result +    PByte(AData)^;
    Result := Result xor (Result shl 10);
    Result := Result +   (Result shr 1);
  end;
  // Force "avalanching" of final 127 bits
  Result := Result xor (Result shl 3);
  Result := Result +   (Result shr 5);
  Result := Result xor (Result shl 4);
  Result := Result +   (Result shr 17);
  Result := Result xor (Result shl 25);
  Result := Result +   (Result shr 6);
{$else}
asm
    push  esi
    push  edi
    test  eax, eax // data
    jz    @Ret // eax is result
    xchg  edx, eax // swith data and length
    test  eax, eax // length, and hash
    jle    @Ret
@Start:
    mov   edi, eax
    mov   esi, eax
    and   edi, 3    // last few bytes
    shr   esi, 2    // number of DWORD loops
    jz    @Last3
@Loop:
    movzx ecx, word ptr [edx]
    add   eax, ecx
    movzx ecx, word ptr [edx + 2]

    shl   ecx, 11
    xor   ecx, eax
    shl   eax, 16

    xor   eax, ecx
    mov   ecx, eax

    shr   eax, 11
    add   eax, ecx
    add   edx, 4
    dec   esi
    jnz   @Loop
@Last3:
    test  edi, edi
    jz    @Done
    dec   edi
    jz    @OneLeft
    dec   edi
    jz    @TwoLeft

    movzx ecx, word ptr [edx]
    add   eax, ecx
    mov   ecx, eax
    shl   eax, 16
    xor   eax, ecx
    movsx ecx, byte ptr [edx + 2]
    shl   ecx, 18
    xor   eax, ecx
    mov   ecx, eax
    shr   ecx, 11
    add   eax, ecx
    jmp   @Done
@TwoLeft:
    movzx ecx, word ptr [edx]
    add   eax, ecx
    mov   ecx, eax
    shl   eax, 11
    xor   eax, ecx
    mov   ecx, eax
    shr   eax, 17
    add   eax, ecx
    jmp   @Done
@OneLeft:
    movsx ecx, byte ptr [edx]
    add   eax, ecx
    mov   ecx, eax
    shl   eax, 10
    xor   eax, ecx
    mov   ecx, eax
    shr   eax, 1
    add   eax, ecx
@Done:
    // avalanche
    mov   ecx, eax
    shl   eax, 3
    xor   eax, ecx

    mov   ecx, eax
    shr   eax, 5
    add   eax, ecx

    mov   ecx, eax
    shl   eax, 4
    xor   eax, ecx

    mov   ecx, eax
    shr   eax, 17
    add   eax, ecx

    mov   ecx, eax
    shl   eax, 25
    xor   eax, ecx

    mov   ecx, eax
    shr   eax, 6
    add   eax, ecx
@Ret:
    pop   edi
    pop   esi
    ret
{$endif}
end;

function SuperFastHashLargeData(AData: pointer; ADataLength: Integer): Cardinal;
{$ifndef ASMVersion}
type
  TWordArray = array[0..(MaxInt div SizeOf(Word)) - 1] of Word;
  PWordArray = ^TWordArray;
var
  TempPart: Cardinal;
  RemainingBytes: Integer;
  RemainingDWords: Integer;
begin
  if not Assigned(AData) or (ADataLength <= 0) then
  begin
    Result := 0;
    Exit;
  end;
  Result := ADataLength;
  RemainingBytes := ADataLength and 3;
  RemainingDWords := ADataLength shr 2; // div 4
  // large loop
  while RemainingDWords >= 4 do
  begin
    Result := Result + PWord(AData)^;
    TempPart := (PWordArray(AData)^[1] shl 11) xor Result;
    Result := (Result shl 16) xor TempPart;
    Result := Result + (Result shr 11);

    Result := Result + PWordArray(AData)^[2];
    TempPart := (PWordArray(AData)^[3] shl 11) xor Result;
    Result := (Result shl 16) xor TempPart;
    Result := Result + (Result shr 11);

    Result := Result + PWordArray(AData)^[4];
    TempPart := (PWordArray(AData)^[5] shl 11) xor Result;
    Result := (Result shl 16) xor TempPart;
    Result := Result + (Result shr 11);

    Result := Result + PWordArray(AData)^[6];
    TempPart := (PWordArray(AData)^[7] shl 11) xor Result;
    Result := (Result shl 16) xor TempPart;
    Result := Result + (Result shr 11);

    // update the pointer and the counter
    AData  := Pointer(Cardinal(AData) + (8 * SizeOf(Word)));
    RemainingDWords := RemainingDWords - 4;
  end;
  // small loop
  while RemainingDWords > 0 do
  begin
    Result := Result + PWord(AData)^;
    AData  := Pointer(Cardinal(AData) + SizeOf(Word));
    TempPart := (PWord(AData)^ shl 11) xor Result;
    Result := (Result shl 16) xor TempPart;
    AData  := Pointer(Cardinal(AData) + SizeOf(Word));
    Result := Result + (Result shr 11);
    dec(RemainingDWords);
  end;
  // Handle end cases
  if RemainingBytes = 3 then
  begin
    Result := Result +    PWord(AData)^;
    Result := Result xor (Result shl 16);
    AData  := Pointer(Cardinal(AData) + SizeOf(Word));   // skip to the last byte
    Result := Result xor ((PByte(AData)^ shl 18));
    Result := Result +   (Result shr 11);
  end
  else if RemainingBytes = 2 then
  begin
    Result := Result +    PWord(AData)^;
    Result := Result xor (Result shl 11);
    Result := Result +   (Result shr 17);
  end
  else if RemainingBytes = 1 then
  begin
    Result := Result +    PByte(AData)^;
    Result := Result xor (Result shl 10);
    Result := Result +   (Result shr 1);
  end;
    // Force "avalanching" of final 127 bits
  Result := Result xor (Result shl 3);
  Result := Result +   (Result shr 5);
  Result := Result xor (Result shl 4);
  Result := Result +   (Result shr 17);
  Result := Result xor (Result shl 25);
  Result := Result +   (Result shr 6);
{$else}
 asm
    push  esi
    push  edi
    test  eax, eax // test for nil pointer
    jz    @Ret     // eax is also result, so save ret here
    xchg  edx, eax // swith data and length
    test  eax, eax // length, and hash
    jle    @Ret
@Start:
    mov   edi, eax
    mov   esi, eax
    and   edi, 3    // last few bytes
    shr   esi, 2    // number of DWORD loops
    jz    @Last3
@LargeLoop:
    cmp esi,$04
    jl @Loop
    // first DWORD
    movzx ecx, word ptr [edx]
    add   eax, ecx
    movzx ecx, word ptr [edx + 2]

    shl   ecx, 11
    xor   ecx, eax
    shl   eax, 16

    xor   eax, ecx
    mov   ecx, eax

    shr   eax, 11
    add   eax, ecx
    // second DWORD
    movzx ecx, word ptr [edx + 4]
    add   eax, ecx
    movzx ecx, word ptr [edx + 6]

    shl   ecx, 11
    xor   ecx, eax
    shl   eax, 16

    xor   eax, ecx
    mov   ecx, eax

    shr   eax, 11
    add   eax, ecx

    // third DWORD
    movzx ecx, word ptr [edx + 8]
    add   eax, ecx
    movzx ecx, word ptr [edx + 10]

    shl   ecx, 11
    xor   ecx, eax
    shl   eax, 16

    xor   eax, ecx
    mov   ecx, eax

    shr   eax, 11
    add   eax, ecx

    // fourth DWORD
    movzx ecx, word ptr [edx + 12]
    add   eax, ecx
    movzx ecx, word ptr [edx + 14]

    shl   ecx, 11
    xor   ecx, eax
    shl   eax, 16

    xor   eax, ecx
    mov   ecx, eax

    shr   eax, 11
    add   eax, ecx

    add   edx, 16
    sub   esi, 4
    jz    @Last3
    jmp   @LargeLoop
@Loop:
    movzx ecx, word ptr [edx]
    add   eax, ecx
    movzx ecx, word ptr [edx + 2]

    shl   ecx, 11
    xor   ecx, eax
    shl   eax, 16

    xor   eax, ecx
    mov   ecx, eax

    shr   eax, 11
    add   eax, ecx
    add   edx, 4
    dec   esi
    jnz   @Loop
@Last3:
    test  edi, edi
    jz    @Done
    dec   edi
    jz    @OneLeft
    dec   edi
    jz    @TwoLeft

    movzx ecx, word ptr [edx]
    add   eax, ecx
    mov   ecx, eax
    shl   eax, 16
    xor   eax, ecx
    movsx ecx, byte ptr [edx + 2]
    shl   ecx, 18
    xor   eax, ecx
    mov   ecx, eax
    shr   ecx, 11
    add   eax, ecx
    jmp   @Done
@TwoLeft:
    movzx ecx, word ptr [edx]
    add   eax, ecx
    mov   ecx, eax
    shl   eax, 11
    xor   eax, ecx
    mov   ecx, eax
    shr   eax, 17
    add   eax, ecx
    jmp   @Done
@OneLeft:
    movsx ecx, byte ptr [edx]
    add   eax, ecx
    mov   ecx, eax
    shl   eax, 10
    xor   eax, ecx
    mov   ecx, eax
    shr   eax, 1
    add   eax, ecx
@Done:
    // avalanche
    mov   ecx, eax
    shl   eax, 3
    xor   eax, ecx

    mov   ecx, eax
    shr   eax, 5
    add   eax, ecx

    mov   ecx, eax
    shl   eax, 4
    xor   eax, ecx

    mov   ecx, eax
    shr   eax, 17
    add   eax, ecx

    mov   ecx, eax
    shl   eax, 25
    xor   eax, ecx

    mov   ecx, eax
    shr   eax, 6
    add   eax, ecx
@Ret:
    pop   edi
    pop   esi
    ret
{$endif}
end;

end.

To use this code you'll only have to download the unit and choose if you want to use the Pascal versions or the Assembly versions. You can switch between this using the {$define ASMVersion} preprocessor.

Difference between my version and Paul Hsieh's

The only difference between my Pascal code and Paul's c code is the pointer mathematics in the core loop. Below is the core loop c code.

for (;len > 0; len--) {
    hash  += get16bits (data);
    tmp    = (get16bits (data+2) << 11) ^ hash;
    hash   = (hash << 16) ^ tmp;
    data  += 2*sizeof (uint16_t);
    hash  += hash >> 11;
}

And this is my Pascal version of this code.

while RemainingDWords > 0 do
begin
  Result := Result + PWord(AData)^;
  // splitting the pointer math keeps the amount of registers pushed at 2
  AData  := Pointer(Cardinal(AData) + SizeOf(Word));
  TempPart := (PWord(AData)^ shl 11) xor Result;
  Result := (Result shl 16) xor TempPart;
  AData  := Pointer(Cardinal(AData) + SizeOf(Word));
  Result := Result + (Result shr 11);
  dec(RemainingDWords);
end;

Paul's code increments the pointer at the end data += 2*sizeof (uint16_t);, and uses get16bits (data+2) to get to the second Word while mine increases the data pointer in between, because the Delphi Compiler will not compile PWord(Pointer(Cardinal(AData)+2))^ into movzx ecx, word ptr [edx + 2] but will store the intermediate pointer in a extra register which causes performance loss.

The funny thing is, although I tried a lot of different ways to get a better performance I ended up with an implementation almost identical to Paul's. I hadn't found the assembly version he created until after I wrote my own, and I think it's nice to see that apart from the initialization and the case statement they are the same.

Next time I will post my benchmarking code.

Tags: algorithm, basm, delphi, speedup

11 comments:

Unknown said:

Hi,

I recently came across your blog and have been reading along. I thought I would leave my first comment. I don't know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

-
Delphi software development services
at 18 October, 2010 21:30
JeanAlysson said:

Hi Davy,

I am a brazilian developer and would like to know how many digits are generated with SuperFastHash Pascal ?
I need a algorithm that generate a number with 10 digits based on a sequencial number (1,2,3...), one at a time, by hash, is it possible ?
Do you have a example ?

Thanks a lot !
Jean Alysson
at 21 June, 2011 19:08
Davy Landman said:

Hi Jean,

Well, the SuperFastHash generates a "random" number between 0 and 4294967295 (unsigned 32bit integer).

I don't understand your "on a sequencial number" part, do you want a unique identifier? Than either let a database take care of that, or use a GUID.

Hashes are not unique, so I would suggest you read about hashes on wikipedia or try to explain your question more clearly.

Cheers,
Davy
at 21 June, 2011 20:03
JeanAlysson said:

Hello Davy,

thanks by explanation, I need generate a unique identifier with 10 digits to use as a bar code.
I have used a generator database as a base to hash (your doubt about sequencial number), but as you said, is not unique, then how could I generate this unique number with 10 digits ?
PS: 10 random digits to dificult clones.

Cheers,Jean Alysson
at 22 June, 2011 15:24
Davy Landman said:

well, the only real way to avoid duplicate identifiers is to store which identifiers you've already used.
at 22 June, 2011 15:25
Jussi said:

Yes, by definition hash is deterministic procedure which should change when source changes (thus it can, but generally should not) produce same hashes from very different sources.

You might use (or build your own) hash table function that has check against duplicates (ie. when adding entities, if same hash exists, perform additional check, or even use 2 simultaneus hash functions)...

-Juhani Suhonen
at 25 August, 2011 23:00
Anonymous said:

5Hello, thanks for the unit.
I works perfectly on 32 -bit version (D2007) but when I port the same code under XE3, it throws Access Violation at code

TempPart := (PWord(AData)^ shl 11) xor Result

I have no idea why.
at 27 June, 2014 12:39
Anonymous said:

Hello,

Your Pure Pascal code has a bug, does not work on 64-bit Compiler.

The reason is you are casting 64-bit pointer using Cardinal(AData) which will truncate the pointer.

The solution is to replace Cardinal(AData) by NativeUInt(AData)

Cheers.
at 27 June, 2014 15:43
Unknown said:

http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
Ctrl+F 18742
at 16 June, 2015 12:39
Anonymous said:

The Pascal and Assembly version produces different results for the same input.
at 10 September, 2015 20:28
Davy Landman said:

This code was originally written 7 years ago, for Delphi 7. I'm not surprised it doesn't work for the newer compilers, especially in 64bit mode.

Moreover, I would try to find a better hashing algorithm, such as the MurmurHash family, or xxHash, they are faster and create less collisions.
at 11 September, 2015 14:22

SuperFastHash from Paul Hsieh Translated to Delphi and Borland Assembler (BASM)

How it started

The Delphi implementation

The BASM implementation

The final unit

Difference between my version and Paul Hsieh's

11 comments:

Unknown said:

JeanAlysson said:

Davy Landman said:

JeanAlysson said:

Davy Landman said:

Jussi said:

Anonymous said:

Anonymous said:

Unknown said:

Anonymous said:

Davy Landman said:

About me

My github projects

My stackoverflow valuable flair

Blog Archive

License