SLUG Mailing List Archives
Re: [SLUG] fast hashing/checksumming tool
- To: Amos Shapira <amos.shapira@xxxxxxxxx>
- Subject: Re: [SLUG] fast hashing/checksumming tool
- From: Jake Anderson <yahoo@xxxxxxxxxxxxxxx>
- Date: Tue, 24 Feb 2009 22:51:19 +1100
- Cc: slug <slug@xxxxxxxxxxx>
- User-agent: Thunderbird 22.214.171.124 (X11/20090105)
Amos Shapira wrote:
2009/2/24 Dave Kempe <dave@xxxxxxxxxxxxxxxxxxxxx>:
I need to recursively checksum a lot of data, and store the checksums in a
database. I can do most of it via a shell script, but was wondering if anyone
could recommend the fastest checksumming tool.
I know about md5sum, sha1sum, cfv (not recursive enough). I want to be able
to produce a checksum of many files (2.1TB worth) for verification against
other copies of the files in various locations. I need the fastest available
algorithm, not necessarily the most secure etc.
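For illustration, the recursive walk described above could be sketched in Python roughly like this. It is only a sketch: the function names file_digest and walk_digests are made up for the example, md5 is just a placeholder algorithm, and storing the results in a database is left out.

```python
import hashlib
import os


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash one file in fixed-size chunks so large files are never fully loaded."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def walk_digests(root, algorithm="md5"):
    """Yield (relative_path, hex_digest) for every regular file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is repeatable
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):  # skip sockets, broken symlinks, etc.
                yield os.path.relpath(full, root), file_digest(full, algorithm)
```

Because the paths are relative to the root and the order is sorted, running walk_digests on each copy and comparing the resulting lists should show which files differ between locations.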
I'm no expert but I think md4 is considered very weak but also faster
than other hash algorithms. It is therefore used where security is
less of a concern (e.g. to checksum data which is already signed by
stronger algorithms). According to its Wikipedia article, rsync uses it.
openssl comes with md4, so you can do, for instance, "openssl md4 /etc/passwd".
Try comparing the relative performance by replacing "md4" by "md5".
On my system, I ran it multiple times on an 892MB file, and once the
file was fully cached in memory md4 consistently took 2.06 seconds of
elapsed time while md5 settled at 3.2 seconds. That's about 35% less
time than md5 (roughly 1.55x the throughput).
Maybe there are faster algorithms around.
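A comparison like the one above can also be run from Python's hashlib, which wraps the OpenSSL digests. This is a rough sketch rather than a proper benchmark: time_hash and compare are hypothetical helper names, the 64MB zero-filled buffer is an arbitrary choice, and md4 may be unavailable on builds where OpenSSL disables legacy digests, so unsupported names are skipped.

```python
import hashlib
import time


def time_hash(algorithm, data, repeats=3):
    """Return the best elapsed time in seconds for hashing data once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return best


def compare(algorithms, size_bytes=64 * 1024 * 1024):
    """Time each supported algorithm on an in-memory buffer of size_bytes."""
    data = bytes(size_bytes)  # zero-filled; held in memory so disk speed is excluded
    results = {}
    for name in algorithms:
        try:
            hashlib.new(name)  # probe: raises ValueError if this build lacks it
        except ValueError:
            continue
        results[name] = time_hash(name, data)
    return results
```

Calling compare(["md4", "md5", "sha1"]) returns a dict of seconds per algorithm; the absolute numbers will depend on the CPU and the OpenSSL build.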
That's the thing, though: the OP has 2.1TB worth of data.
Most hash algorithms on standard hardware will be disk I/O bound rather
than CPU bound.
In other words, the choice of algorithm doesn't really matter much.
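A back-of-the-envelope version of that argument, using the timings quoted earlier in the thread. The 100 MB/s sustained disk read rate is purely an assumed figure for illustration, as is reading "892Mb" as 892 decimal megabytes; plug in your own numbers.

```python
MB = 10**6
TB = 10**12


def hours_to_process(total_bytes, bytes_per_second):
    """Hours needed to push total_bytes through a stage at the given rate."""
    return total_bytes / bytes_per_second / 3600


total = 2.1 * TB                  # data set size from the original question
md4_rate = 892 * MB / 2.06        # in-memory md4 throughput from the quoted timing
md5_rate = 892 * MB / 3.2         # in-memory md5 throughput from the quoted timing
disk_rate = 100 * MB              # ASSUMPTION: sustained disk read rate

for label, cpu_rate in (("md4", md4_rate), ("md5", md5_rate)):
    # The slower of the two stages (disk read, hashing) sets the overall pace.
    effective = min(cpu_rate, disk_rate)
    print(f"{label}: CPU alone {hours_to_process(total, cpu_rate):.2f} h, "
          f"with disk {hours_to_process(total, effective):.2f} h")
```

Under that assumed disk rate both algorithms come out at about 5.8 hours, because the disk is slower than either hash.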
I think I have heard of hashing algorithms being implemented on video
cards (GPGPU and CUDA),
so if you really wanted some high-speed hashing, that would be the way to
go ;-> Getting enough data to hash at that rate is left as an exercise
for the reader.