ThomasWaldmann changed the title from "speed improvement using intel crc32 cpu instruction?" to "speed improvement using SSE4 crc32 cpu instruction?" on May 21, 2016
Hmm, this is architecture-specific and only works for one specific polynomial (0x1EDC6F41). I think it's unlikely to be implemented in pycrc any time soon.
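For reference, the polynomial 0x1EDC6F41 is the one commonly called CRC-32C (Castagnoli). Below is a minimal bit-by-bit software sketch of it in Python, assuming the usual CRC-32C parameters (reflected input and output, initial value and final XOR of 0xFFFFFFFF). The names `crc32c` and `POLY_REFLECTED` are made up for this illustration and are not part of pycrc; this is a slow reference for checking results, not a stand-in for the hardware instruction.

```python
# Polynomial quoted in the thread, in normal (MSB-first) notation.
POLY = 0x1EDC6F41

# The reflected (LSB-first) algorithm needs the bit-reversed polynomial.
POLY_REFLECTED = int(f"{POLY:032b}"[::-1], 2)  # 0x82F63B78


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bit-by-bit CRC-32C; pass a previous result as `crc` to continue."""
    crc ^= 0xFFFFFFFF  # apply initial value (or undo the previous final XOR)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; if it was set, fold in the polynomial.
            crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # final XOR


if __name__ == "__main__":
    # Standard check input for CRC catalogues.
    print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Any accelerated implementation of the same CRC should produce the same values as this sketch for the same input.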
Considering that Intel/AMD CPUs less than five years old are quite common, and that many people just need some CRC (not a specific one), I can imagine a lot of people could use this.
I ran test/performance.sh, and the maximum I got from that was 0.806 GB/s (crc32, table-driven sb4) on a Core i5-4200U.
Intel/AMD CPUs have had special support for CRC computation for quite some years:
http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411
https://en.wikipedia.org/wiki/SSE4#Supporting_CPUs
The drdobbs article says that this yields performance of about 1.17 cycles per 64-bit word (measured with a loop that repeatedly computes over a small amount of data, so I guess one can assume the data sits in the CPU's L1 or L2 cache).
At 2.4 GHz, this could mean up to about 16 GB/s (8 bytes every 1.17 cycles), or whatever your RAM bandwidth limits it to.
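The estimate is simple arithmetic: words per second (clock rate divided by cycles per word) times 8 bytes per word. A small Python sketch, using the figures quoted above; the function name `throughput_gb_per_s` is just for illustration, and the result is an in-cache upper bound, not a measurement:

```python
def throughput_gb_per_s(cycles_per_word: float, clock_hz: float,
                        bytes_per_word: int = 8) -> float:
    """Peak throughput in GB/s (1 GB = 1e9 bytes) for a given cost per word."""
    words_per_second = clock_hz / cycles_per_word
    return words_per_second * bytes_per_word / 1e9


if __name__ == "__main__":
    # 1.17 cycles per 64-bit word at 2.4 GHz
    print(f"{throughput_gb_per_s(1.17, 2.4e9):.2f} GB/s")  # 16.41 GB/s
```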