Test Results Analysis
I will further analyze the performance of PHP's built-in hash functions, in addition to the analysis made earlier here by Michael [1] (see above), because this topic is quite interesting and the results are unexpected.
The results are not obvious and are even surprising: a simple algorithm, CRC32, is slower than a complex one, MD5. It seems that modern processors do not favor certain old algorithms and execute them relatively slowly. The CRC32 CCITT/ITU algorithm was fast and efficient in the good old days of 300 bps dial-up modems. Modern algorithms are designed with modern hardware in mind and can run much faster on the same hardware than old algorithms, which are inherently unsuited to it; even if you try to optimize the old algorithms, they remain slow. For example, in algorithms where processing each byte depends on the result for the previous one, you cannot easily take advantage of 64-bit registers to process many bits in parallel.
Benchmarks of other cryptographic libraries confirm what we see in PHP: CRC32 has almost the same maximum speed as MD5. Here is a link to the results of another library, Crypto++: https://www.cryptopp.com/benchmarks.html
OpenSSL shows similar results. At first glance this may seem irrational, because the CRC32 algorithm is much simpler than MD5, but reality shows the opposite.
I just want to show how simple the CRC32 function is.
Here is the code that updates the CRC32 value with the next incoming byte (Delphi):
```pascal
// Returns an updated CRC32
function UpdateCrc32(CurByte: Byte; CurCrc: Cardinal): Cardinal; inline;
begin
  UpdateCrc32 := Crc32Table[Byte(CurCrc xor CurByte)] xor (CurCrc shr 8);
end;
```
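The Delphi routine relies on a 256-entry `Crc32Table` that is not shown. As a sketch, assuming the table is built for the reflected CRC-32 polynomial 0xEDB88320 (the one used by zlib and by PHP's `crc32b`), the same per-byte step looks like this in C. The names `crc32_init_table`, `crc32_update_byte` and `crc32_buf` are illustrative, not part of the original code:

```c
#include <stddef.h>
#include <stdint.h>

/* Lookup table for the reflected CRC-32 polynomial 0xEDB88320 (assumed;
   the original Crc32Table is not shown). */
static uint32_t crc32_table[256];

/* Builds the table: entry n is byte n pushed through eight bit-at-a-time steps. */
static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc32_table[n] = c;
    }
}

/* One step, as in UpdateCrc32: table lookup, shift, XOR. */
static uint32_t crc32_update_byte(uint32_t crc, uint8_t byte)
{
    return crc32_table[(uint8_t)(crc ^ byte)] ^ (crc >> 8);
}

/* Whole buffer, with the usual 0xFFFFFFFF start value and final inversion. */
static uint32_t crc32_buf(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32_update_byte(crc, p[i]);
    return ~crc;
}
```

After `crc32_init_table()`, `crc32_buf("123456789", 9)` yields 0xCBF43926, the standard check value for this CRC. Each byte costs one table lookup, one shift and two XORs, but every step needs the CRC produced by the previous one.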
Here is the assembly code:
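```asm
; edx = CRC, esi -> data, edi -> 256-entry table, ecx = byte count;
; the upper 24 bits of eax must be zero before the loop
@calc_crc32:
  xor  dl,[esi]                    ; low byte of CRC xor next data byte
  mov  al,dl                       ; table index
  shr  edx,8                       ; CRC shr 8
  xor  edx,dword ptr [edi+eax*4]   ; xor with the table entry
  inc  esi                         ; next data byte
  loop @calc_crc32                 ; dec ecx, repeat while ecx <> 0
```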
You can also unroll this code so that you get only 5 processor instructions per byte:
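```asm
  xor  dl,bl                       ; next data byte is the low byte of rbx
  shr  rbx,8                       ; move the following byte into bl
  mov  al,dl                       ; table index
  shr  edx,8
  xor  edx,dword ptr [r8+rax*4]    ; r8 -> CRC table
```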
You just need to load the rbx register with the next 8 bytes of data and repeat this code 8 times; then load the next 8 bytes into the 64-bit rbx register, and so on.
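The unrolled idea can be sketched in C, assuming the same 0xEDB88320 table as in the Delphi code: load 8 bytes into one 64-bit variable, then consume its low byte 8 times, mirroring `xor dl,bl` / `shr rbx,8`. It saves memory loads but not the per-byte work, because each step still waits for the previous CRC value. The helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_table[256];   /* reflected polynomial 0xEDB88320 (assumed) */

static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc32_table[n] = c;
    }
}

/* Little-endian 64-bit load: p[0] ends up in the low byte. Compilers usually
   turn this into a single load on x86-64. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t w = 0;
    for (int k = 7; k >= 0; k--)
        w = (w << 8) | p[k];
    return w;
}

/* CRC-32 that fetches 8 bytes at a time into one 64-bit word (like rbx) and
   consumes its low byte 8 times (like xor dl,bl / shr rbx,8). */
static uint32_t crc32_buf_u64(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len >= 8) {
        uint64_t word = load_le64(p);
        for (int k = 0; k < 8; k++) {
            crc = crc32_table[(uint8_t)(crc ^ (uint32_t)word)] ^ (crc >> 8);
            word >>= 8;
        }
        p += 8;
        len -= 8;
    }
    while (len > 0) {                 /* remaining 0..7 bytes, one at a time */
        crc = crc32_table[(uint8_t)(crc ^ *p)] ^ (crc >> 8);
        p++;
        len--;
    }
    return ~crc;
}
```

The result is identical to the byte-by-byte version; only the way the input is fetched changes.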
Here is the calling routine, which calculates the CRC32 of an entire buffer:
```pascal
function CalcCRC32(const B; Size: NativeUInt;
  const InitialValue: Cardinal = CRC32_INIT): Cardinal;
var
  C: Cardinal;
  P: PAnsiChar;
  i: NativeUInt;
begin
  C := InitialValue;
  if Size > 0 then
  begin
    P := @B;
    for i := 0 to Size - 1 do
      C := UpdateCrc32(Byte(P[i]), C);
  end;
  Result := C;
end;
```
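Because `CalcCRC32` takes the running value as `InitialValue` and does not invert its result, a large buffer can be processed in pieces by passing each result into the next call. A minimal C sketch, assuming `CRC32_INIT` is 0xFFFFFFFF and that the caller applies the final inversion (neither is shown in the source); `calc_crc32` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_INIT 0xFFFFFFFFu   /* assumed value of the Delphi CRC32_INIT */

static uint32_t crc32_table[256];   /* reflected polynomial 0xEDB88320 (assumed) */

static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc32_table[n] = c;
    }
}

/* C counterpart of CalcCRC32: continues from `crc` and returns the raw
   (non-inverted) register value, so calls can be chained. */
static uint32_t calc_crc32(const void *data, size_t size, uint32_t crc)
{
    const uint8_t *p = (const uint8_t *)data;
    for (size_t i = 0; i < size; i++)
        crc = crc32_table[(uint8_t)(crc ^ p[i])] ^ (crc >> 8);
    return crc;
}
```

For example, `calc_crc32(buf + 10, n - 10, calc_crc32(buf, 10, CRC32_INIT))` equals `calc_crc32(buf, n, CRC32_INIT)`; inverting that value gives the standard CRC-32.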
And here is how Delphi compiles it into machine code. It is not very optimal, but rather simple: only 11 instructions per byte, which, surprisingly, runs on the Intel Core i5-6600 a little faster than the assembler code above, even after the loop has been unrolled. As you can see, all the instructions of this CRC32 CCITT/ITU implementation are straightforward, without inner loops or comparisons; there is only one comparison at the end of each byte. This is the compiled Delphi code as shown in the debugger, not hand-written assembly:
```
CRC32.pas.78: begin
  push esi
  push edi
CRC32.pas.80: if Size > 0 then
  test edx,edx
  jbe $00500601
CRC32.pas.82: P := @B;
  mov edi,eax
CRC32.pas.83: for i := 0 to Size - 1 do
  mov eax,edx
  dec eax
  test eax,eax
  jb $00500601
  inc eax
  xor esi,esi
CRC32.pas.84: C := UpdateCrc32(Byte(P[i]), C);
  movzx edx,[edi+esi]
  xor dl,cl
  movzx edx,dl
  mov edx,[edx*4+$517dec]
  shr ecx,$08
  xor edx,ecx
  mov ecx,edx
  inc esi
CRC32.pas.83: for i := 0 to Size - 1 do
  dec eax
  jnz $005005e6
CRC32.pas.86: Result := C;
  mov eax,ecx
CRC32.pas.87: end;
  pop edi
  pop esi
  ret
```
Here is another version of the CRC32 code: there are only 5 processor instructions per byte, not 11, but it is essentially the same as the assembler code above. It just uses different registers and avoids the LOOP instruction, because on the i5-6600 two separate instructions (dec and jnz) are faster. The complete code is the CRC32 assembler function below, which is called from a C console application:
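```asm
.586
.model flat, stdcall
.xmm
.data
; crc32_table (256 DWORD entries) belongs here; not shown
.code
CRC32 proc sizeOfFile:DWORD, file:DWORD
    push esi
    push ecx
    push edx
    mov  esi, file
    xor  edx, edx                      ; upper bits of edx stay zero for indexing
    or   eax, -1                       ; initial CRC = 0FFFFFFFFh
    mov  ecx, sizeOfFile
    test ecx, ecx                      ; empty input: skip the loop
    jz   CRC32_done
CRC32_loop:
    mov  dl, byte ptr [esi]
    xor  dl, al
    shr  eax, 8
    xor  eax, dword ptr [crc32_table + 4*edx]
    inc  esi
    dec  ecx
    jnz  CRC32_loop
CRC32_done:
    not  eax                           ; final inversion
    pop  edx
    pop  ecx
    pop  esi
    ret
CRC32 endp
end
```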
Now compare it with MD5, using this highly optimized assembler code by Peter Sawatzki:
```asm
; MD5_386.Asm  -  386 optimized helper routine for calculating
;                 MD Message-Digest values
; written 2/2/94 by
;
; Peter Sawatzki
; Buchenhof 3
; D58091 Hagen, Germany Fed Rep
;
; EMail: Peter@Sawatzki.de
; EMail: 100031.3002@compuserve.com
; WWW: http:
```
The code above processes one 64-byte block of input data per call. It is called from the main procedure, which performs the preparatory steps:
```pascal
procedure CiphersMD5Update(var Context: TMD5Ctx; const ChkBuf; len: UInt32);
var
  BufPtr: ^Byte;
  Left: UInt32;
begin
  // Update the 64-bit message length in bits (Count[1]:Count[0])
  if Context.Count[0] + UInt32(len) shl 3 < Context.Count[0] then
    Inc(Context.Count[1]);
  Inc(Context.Count[0], UInt32(len) shl 3);
  Inc(Context.Count[1], UInt32(len) shr 29);

  BufPtr := @ChkBuf;
  // First complete a partial block left over from a previous call
  if Context.BLen > 0 then
  begin
    Left := 64 - Context.BLen;
    if Left > len then
      Left := len;
    Move(BufPtr^, Context.Buffer[Context.BLen], Left);
    Inc(Context.BLen, Left);
    Inc(BufPtr, Left);
    if Context.BLen < 64 then
      Exit;
    Transform(Context.State, @Context.Buffer);
    Context.BLen := 0;
    Dec(len, Left)
  end;
  // Process whole 64-byte blocks directly from the input
  while len >= 64 do
  begin
    Transform(Context.State, BufPtr);
    Inc(BufPtr, 64);
    Dec(len, 64)
  end;
  // Keep the remaining tail for the next call or for finalization
  if len > 0 then
  begin
    Context.BLen := len;
    Move(BufPtr^, Context.Buffer[0], Context.BLen)
  end
end;
```
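The buffering in `CiphersMD5Update` does not depend on MD5 itself: it tops up a partial block, feeds whole 64-byte blocks straight from the input, and saves the tail. Here is a sketch of just that logic in C, with a hypothetical `transform_stub` that only counts blocks in place of the real `Transform`, and a single 64-bit bit counter in place of `Count[0]`/`Count[1]`:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context mirroring the bookkeeping fields of TMD5Ctx. */
typedef struct {
    uint64_t bit_count;   /* total message length in bits */
    uint8_t  buffer[64];  /* partial block carried between calls */
    size_t   blen;        /* bytes currently held in buffer */
    size_t   blocks;      /* stand-in for Transform(): blocks processed */
} block_ctx;

/* Placeholder for the MD5 compression function: only counts calls. */
static void transform_stub(block_ctx *ctx, const uint8_t *block)
{
    (void)block;
    ctx->blocks++;
}

static void block_update(block_ctx *ctx, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    ctx->bit_count += (uint64_t)len * 8;
    if (ctx->blen > 0) {                   /* top up the partial block */
        size_t left = 64 - ctx->blen;
        if (left > len)
            left = len;
        memcpy(ctx->buffer + ctx->blen, p, left);
        ctx->blen += left;
        p += left;
        len -= left;
        if (ctx->blen < 64)
            return;
        transform_stub(ctx, ctx->buffer);
        ctx->blen = 0;
    }
    while (len >= 64) {                    /* whole blocks straight from input */
        transform_stub(ctx, p);
        p += 64;
        len -= 64;
    }
    if (len > 0) {                         /* keep the tail for the next call */
        memcpy(ctx->buffer, p, len);
        ctx->blen = len;
    }
}
```

Feeding 200 bytes in one call or as 5 + 100 + 95 bytes gives the same state: three blocks transformed and 8 bytes left in the buffer for finalization.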
And if your processor supports the SSE 4.2 CRC32 instruction, you can calculate checksums about 10 times faster using the code below. Note that this instruction computes CRC-32C (Castagnoli polynomial), not the CRC32 CCITT/ITU variant used above, so the two checksums differ:
```pascal
function crc32csse42(crc: cardinal; buf: Pointer; len: NativeUInt): cardinal;
asm // ecx=crc, rdx=buf, r8=len
  .NOFRAME
  mov eax,ecx
  not eax
  test r8,r8; jz @0
  test rdx,rdx; jz @0
@7:
  test rdx,7; jz @8 // align to 8 bytes boundary
  crc32 dword ptr eax,byte ptr [rdx]
  inc rdx
  dec r8; jz @0
  test rdx,7; jnz @7
@8:
  mov rcx,r8
  shr r8,3
  jz @2
@1:
  crc32 dword ptr eax,dword ptr [rdx]
  crc32 dword ptr eax,dword ptr [rdx+4]
  dec r8
  lea rdx,[rdx+8]
  jnz @1
@2:
  and rcx,7; jz @0
  cmp rcx,4; jb @4
  crc32 dword ptr eax,dword ptr [rdx]
  sub rcx,4
  lea rdx,[rdx+4]
  jz @0
@4:
  crc32 dword ptr eax,byte ptr [rdx]
  dec rcx; jz @0
  crc32 dword ptr eax,byte ptr [rdx+1]
  dec rcx; jz @0
  crc32 dword ptr eax,byte ptr [rdx+2]
@0:
  not eax
end;
```
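Because the SSE 4.2 `crc32` instruction computes CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78), a slow but portable bitwise version is handy for checking a hardware routine such as `crc32csse42`. This sketch follows the same convention as that function, inverting the value on entry and on exit, so results can be chained; `crc32c_sw` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
   Inverts on entry and on exit, like crc32csse42 (not eax ... not eax). */
static uint32_t crc32c_sw(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = (const uint8_t *)buf;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)          /* one bit per iteration */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

`crc32c_sw(0, "123456789", 9)` returns 0xE3069283, the standard CRC-32C check value, which is what a correct SSE 4.2 implementation must produce for the same input and initial value.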
Please note that in my example I use a buffer of only 5 KB, so that it fits in the processor cache and slow RAM does not affect the speed of the digest calculation.
In PHP, even in version 7, there seems to be no support for hardware-accelerated CRC32, although the CRC32 instruction has long been supported by Intel and AMD processors. Intel has supported it since November 2008 (Nehalem microarchitecture), and AMD seems to have supported it since 2013.
My own tests confirming Michael's results
I tested various PHP hash functions on two configurations: (1) an AMD FX-8320 (released in 2012) under Ubuntu with PHP 5 and (2) an Intel Core i5-6600 (released in 2015) under Windows with PHP 7. I also ran the OpenSSL speed test on this Intel Core i5-6600. In addition, I ran tests of the cryptographic routines that we use in our software, "The Bat!", which is written in Delphi. Although the main program is written in Delphi, the cryptographic routines are written in assembly language for Intel processors (32-bit or 64-bit) or in C.
I found that our Delphi code shows very large differences in speed between different hash functions and data sizes. This contrasts with PHP, where, with rare exceptions, all hash functions from the simplest CRC32 to the cryptographic MD5 run at almost the same speed.
So, here are the measurements I made on the AMD FX-8320 with PHP5 under Ubuntu. I ran two tests. First, I ran 5000 iterations of hashing a message of only 5 bytes. With such a small message, I wanted to measure the duration of the initialization and finalization steps of the various algorithms and how they affect overall performance. Some algorithms, such as CRC32, have practically no finalization step: apart from an optional final bit inversion, the digest is ready after each byte. Cryptographic hash functions, such as SHA1, MD5 and others, have a finalization step that pads the message and runs the compression function once more before producing the digest. Second, I ran 5000 iterations of hashing a 5000-byte message. Both messages were filled in advance with pseudo-random bytes, only once, when the program was launched, not before each iteration.
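One reason the 5-byte test is dominated by fixed costs: MD5 and SHA-1 append a 0x80 byte, zero padding and an 8-byte length, and always run their compression function on whole 64-byte blocks. A small C helper (hypothetical name) that counts those blocks for an n-byte message:

```c
#include <stdint.h>

/* Number of 64-byte blocks MD5 or SHA-1 compresses for an n-byte message:
   ceil((n + 1 + 8) / 64), i.e. the message, the 0x80 byte and the 8-byte
   length, rounded up to whole blocks. */
static uint64_t md5_sha1_blocks(uint64_t n)
{
    return (n + 8) / 64 + 1;
}
```

It gives 1 block for the 5-byte message and 79 blocks for the 5000-byte message, so even a 5-byte input pays for one full compression, whereas CRC32 only processes the 5 bytes themselves.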
Results of my PHP hash speed test
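I ran the PHP script (it works with both PHP5 and PHP7) on the AMD FX-8320 with PHP5 under Ubuntu: 5000 iterations on a 5-byte message and 5000 iterations on a 5000-byte message. Here are the results: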
```
Legend:
(1) 5b x 5000, AMD FX-8320, PHP5
(2) 5000b x 5000, AMD FX-8320, PHP5

PHP hash      (1)              (2)
----------    ------------     ------------
md2           0.021267 sec     2.602651 sec
md4           0.002684 sec     0.035243 sec
md5           0.002570 sec     0.055548 sec
sha1          0.003346 sec     0.106432 sec
sha224        0.004945 sec     0.210954 sec
sha256        0.004735 sec     0.238030 sec
sha384        0.005848 sec     0.144015 sec
sha512        0.006085 sec     0.142884 sec
ripemd128     0.003385 sec     0.120959 sec
ripemd160     0.004164 sec     0.174045 sec
ripemd256     0.003487 sec     0.121477 sec
ripemd320     0.004206 sec     0.177473 sec
whirlpool     0.009713 sec     0.509682 sec
tiger128,3    0.003414 sec     0.059028 sec
tiger160,3    0.004354 sec     0.059335 sec
tiger192,3    0.003379 sec     0.058891 sec
tiger128,4    0.003514 sec     0.073468 sec
tiger160,4    0.003602 sec     0.072329 sec
tiger192,4    0.003507 sec     0.071856 sec
snefru        0.022101 sec     1.190888 sec
snefru256     0.021972 sec     1.217704 sec
gost          0.013961 sec     0.653600 sec
adler32       0.001459 sec     0.038849 sec
crc32         0.001429 sec     0.068742 sec
crc32b        0.001553 sec     0.063308 sec
fnv132        0.001431 sec     0.038256 sec
fnv164        0.001586 sec     0.060622 sec
joaat         0.001569 sec     0.062947 sec
haval128,3    0.006747 sec     0.174759 sec
haval160,3    0.005810 sec     0.166154 sec
haval192,3    0.006129 sec     0.168382 sec
haval224,3    0.005918 sec     0.166792 sec
haval256,3    0.006119 sec     0.173360 sec
haval128,4    0.007364 sec     0.233829 sec
haval160,4    0.007917 sec     0.240273 sec
haval192,4    0.007676 sec     0.245864 sec
haval224,4    0.007580 sec     0.245249 sec
haval256,4    0.007442 sec     0.241091 sec
haval128,5    0.008651 sec     0.281248 sec
haval160,5    0.009304 sec     0.278619 sec
haval192,5    0.008972 sec     0.281235 sec
haval224,5    0.008917 sec     0.274923 sec
haval256,5    0.008853 sec     0.282171 sec
```
Then I ran the same PHP script on the Intel Core i5-6600 with 64-bit PHP7 under Windows 10. Here are the results:
```
Legend:
(1) 5b x 5000, Intel Core i5-6600, PHP7
(2) 5000b x 5000, Intel Core i5-6600, PHP7

PHP hash      (1)              (2)
----------    ------------     ------------
md2           0.016131 sec     2.308100 sec
md4           0.001218 sec     0.040803 sec
md5           0.001284 sec     0.046208 sec
sha1          0.001499 sec     0.050259 sec
sha224        0.002683 sec     0.120510 sec
sha256        0.002297 sec     0.119602 sec
sha384        0.002792 sec     0.080670 sec
ripemd128     0.001984 sec     0.094280 sec
ripemd160     0.002514 sec     0.128295 sec
ripemd256     0.002015 sec     0.093887 sec
ripemd320     0.002748 sec     0.128955 sec
whirlpool     0.003402 sec     0.271102 sec
tiger128,3    0.001282 sec     0.038638 sec
tiger160,3    0.001305 sec     0.037155 sec
tiger192,3    0.001309 sec     0.037684 sec
tiger128,4    0.001618 sec     0.050690 sec
tiger160,4    0.001571 sec     0.049656 sec
tiger192,4    0.001711 sec     0.050682 sec
snefru        0.010949 sec     0.865108 sec
snefru256     0.011587 sec     0.867685 sec
gost          0.008968 sec     0.449647 sec
adler32       0.000588 sec     0.014345 sec
crc32         0.000609 sec     0.079202 sec
crc32b        0.000636 sec     0.074408 sec
fnv132        0.000570 sec     0.028157 sec
fnv164        0.000566 sec     0.028776 sec
joaat         0.000623 sec     0.042127 sec
haval128,3    0.002972 sec     0.084010 sec
haval160,3    0.002968 sec     0.083213 sec
haval192,3    0.002943 sec     0.082217 sec
haval224,3    0.002798 sec     0.084726 sec
haval256,3    0.002995 sec     0.082568 sec
haval128,4    0.003659 sec     0.112680 sec
haval160,4    0.003858 sec     0.111462 sec
haval192,4    0.003526 sec     0.112510 sec
haval224,4    0.003671 sec     0.111656 sec
haval256,4    0.003636 sec     0.111236 sec
haval128,5    0.004488 sec     0.140130 sec
haval160,5    0.005095 sec     0.137777 sec
haval192,5    0.004117 sec     0.140711 sec
haval224,5    0.004311 sec     0.139564 sec
haval256,5    0.004382 sec     0.138345 sec
```
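As you can see, CRC32 in PHP is not faster than MD5. Moreover, for 5000 iterations on a 5000-byte message on the Intel Core i5-6600 with PHP7, CRC32 (0.079202 sec) is even slower than MD5 (0.046208 sec) (!).
Also, in PHP, MD5 and SHA1 run at almost the same speed, except on Ubuntu with PHP5, where for 5000 iterations on a 5000-byte message SHA1 (0.106432 sec) is about twice as slow as MD5 (0.055548 sec).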
OpenSSL
I also ran the OpenSSL speed test on the Intel Core i5-6600. OpenSSL runs each test for 3 seconds. Here are the results:
```
Legend:
(1) OpenSSL 1.1.0 on Intel Core i5-6600, 16-byte blocks
(2) OpenSSL 1.1.0 on Intel Core i5-6600, 8192-byte blocks
Each test ran for 3 seconds; the figures are thousands of bytes
processed per second, as reported by "openssl speed".

Algorithm   (1)           (2)
---------   ----------    ----------
md4         50390.16k     817875.48k
md5         115875.35k    680700.59k
sha1        118158.30k    995986.09k
ripemd160   30308.79k     213224.11k
whirlpool   39605.02k     182072.66k
```
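As you can see, in OpenSSL md5 is not faster than sha1: on 16-byte blocks they run at almost the same speed, and on 8192-byte blocks SHA-1 is even faster than MD5.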
Delphi test results
I ran the Delphi tests on the Intel Core i5-6600 under 64-bit Windows 10; the test program is a 32-bit Win32 application. Here are the results:
```
Legend:
(1) Delphi, 5b x 5000 iterations
(2) Delphi, 5000b x 5000 iterations

Algorithm           (1)                (2)
---------------     ---------------    ---------------
md2                 0.0381010 secs     5.8495807 secs
md5                 0.0005015 secs     0.0376252 secs
sha1                0.0050118 secs     0.1830871 secs
crc32               >0.0000001 secs    0.0581535 secs
crc32c (intel hw)   >0.0000001 secs    0.0055349 secs
```
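As you can see, MD2 is very slow both in PHP and in Delphi. In PHP, MD5 and SHA-1 run at almost the same speed, but in our Delphi code they differ considerably. PHP7 takes 0.001284 sec for 5000 iterations on a 5-byte message with MD5 and 0.001499 sec with SHA1. For 5000 iterations on a 5000-byte message, PHP7 takes 0.046208 sec for MD5 and 0.050259 sec for SHA-1.
In Delphi, 5000 iterations on a 5-byte message take 0.0005015 sec with MD5 and 0.0050118 sec with SHA1. For 5000 iterations on a 5000-byte message, Delphi takes 0.0376252 secs for MD5 and 0.1830871 secs for SHA-1. So our MD5 in Delphi is faster than PHP's, while our SHA-1 is slower. Within Delphi, MD5 is about 10 times faster than SHA-1 on 5-byte messages and about 5 times faster on 5000-byte messages.
As for CRC32 and CRC32C: in Delphi, the hardware-accelerated CRC32C is about 10 times faster than the table-driven CRC32 (0.0055349 vs 0.0581535 secs on the 5000-byte message), and on 5-byte messages both are more than 1000 times faster than in PHP.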
Conclusion
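In PHP, the differences in speed between hash functions are much smaller than one would expect from the complexity of the algorithms, so in most cases the choice of hash function has little effect on performance. There are exceptions, though: MD2, for example, is very slow, about 50 times slower than MD5 in PHP on 5000-byte messages (2.308100 vs 0.046208 sec with PHP7), so avoid MD2. MD5, which was used, for example, in PGP (RFC 1991), is no longer considered cryptographically secure, but it is still suitable for non-cryptographic purposes such as checksums or ETags. Since CRC32 in PHP is not faster than MD5 (PHP does not use hardware CRC32 acceleration), MD5 is a reasonable choice for such purposes. Here is the PHP script I used for the tests: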
```php
<?php
define('TRAILING_ZEROS', 6);
$strlens = array(5, 30, 90, 1000, 5000);
$hashes = hash_algos();

// Returns the message bytes and the name of the function that produced them
function generate_bytes($len)
{
    if (function_exists('random_bytes')) {                       // PHP 7+
        $fn = 'random_bytes';
        $str = random_bytes($len);
    } elseif (function_exists('openssl_random_pseudo_bytes')) {  // PHP 5 with OpenSSL
        $fn = 'openssl_random_pseudo_bytes';
        $str = openssl_random_pseudo_bytes($len);
    } else {                                                     // fallback: phpinfo() text
        flush();
        ob_start();
        phpinfo();
        $str = str_pad(substr(ob_get_contents(), 0, $len), $len);
        ob_end_clean();
        $fn = 'phpinfo';
    }
    return array(0 => $str, 1 => $fn);
}

foreach ($strlens as $strlen) {
    $loops = 5000;
    echo "<h1>$loops iterations on $strlen bytes message</h1>" . PHP_EOL;
    echo '<p>';
    $r = generate_bytes($strlen);
    $str = $r[0];
    $gotlen = strlen($str);
    while ($gotlen < $strlen) {
        // for some undocumented reason, openssl_random_pseudo_bytes() may return fewer bytes than requested
        $left = $strlen - $gotlen;
        echo "The " . $r[1] . "() function returned $left bytes less, trying again to get these remaining bytes only<br>";
        $r = generate_bytes($left);
        $str .= $r[0];
        $gotlen = strlen($str);
    }
    echo "Got the whole string of " . strlen($str) . " bytes!";
    echo '</p>' . PHP_EOL;
    echo "<pre>";
    foreach ($hashes as $hash) {
        $tss = microtime(true);
        for ($i = 0; $i < $loops; $i++) {
            $x = hash($hash, $str, true);   // raw binary digest
        }
        $tse = microtime(true);
        echo "\n" . str_pad($hash, 15, ' ') . "\t"
            . str_pad(round($tse - $tss, TRAILING_ZEROS), TRAILING_ZEROS + 2, '0')
            . " sec \t" . bin2hex($x);
    }
    echo PHP_EOL . "</pre>" . PHP_EOL;
    flush();
}
?>
```
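For comparison with compiled code, the measuring loop of the PHP script can be sketched in C. This is an illustration only, assuming `clock()` (processor time) is precise enough for the run; FNV-1 is used as the example hash because it is short and appears in PHP's list as `fnv132`. The names `fnv1_32` and `time_hash` are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hash function type used by the timing loop. */
typedef uint32_t (*hash_fn)(const void *data, size_t len);

/* FNV-1, 32-bit (the algorithm behind PHP's "fnv132"): multiply, then XOR. */
static uint32_t fnv1_32(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t h = 0x811C9DC5u;            /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h *= 0x01000193u;                /* FNV prime */
        h ^= p[i];
    }
    return h;
}

/* Calls fn `loops` times on the same buffer, like the PHP loop around hash(),
   and returns the processor time used, in seconds. Each digest is written to
   a volatile variable so the calls are not simply discarded; the last one is
   returned through *last. */
static double time_hash(hash_fn fn, const void *data, size_t len,
                        int loops, uint32_t *last)
{
    volatile uint32_t sink = 0;
    clock_t start = clock();
    for (int i = 0; i < loops; i++)
        sink = fn(data, len);
    clock_t end = clock();
    *last = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For example, `time_hash(fnv1_32, buf, 5000, 5000, &digest)` corresponds to the "5000 iterations on a 5000-byte message" case of the PHP script, with each iteration being a direct C function call.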