- To: Dave Kempe <dave@xxxxxxxxxxxxxxxxxxxxx>
- Subject: Re: [SLUG] fast hashing/checksumming tool
- From: Amos Shapira <amos.shapira@xxxxxxxxx>
- Date: Tue, 24 Feb 2009 22:32:41 +1100
- Cc: slug <slug@xxxxxxxxxxx>
2009/2/24 Dave Kempe <dave@xxxxxxxxxxxxxxxxxxxxx>:
> Hi,
> I need to recursively checksum a lot of data, and store the checksums in a
> database. I can do most via a shell script, but was wondering if anyone
> could recommend a checksumming tool that was the fastest.
> I know about md5sum, sha1sum, cfv (not recursive enough). I want to be able
> to produce a checksum of many files (2.1TB worth) for verification against
> other copies of the files in various locations. I need the fastest available
> algorithm, not necessarily the most secure etc.
> Any suggestions?
I'm no expert, but I believe md4 is considered very weak yet faster
than other hash algorithms. It is therefore used where security is
less of a concern (e.g. to checksum data which is already signed by
a stronger algorithm). According to its Wikipedia article, rsync uses
it.
openssl comes with md4, so you can run, for instance, "openssl md4 /etc/passwd".
To compare relative performance, replace "md4" with "md5" and time both.
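If you would rather compare the algorithms without disk I/O getting in the way, here is a rough sketch of the same test in Python. It is not what I ran myself: the helper names (time_digest, compare) are made up for illustration, and it assumes your Python's hashlib is linked against an OpenSSL build that still provides md4, which many recent builds do not, so unsupported algorithms are reported rather than treated as errors.

```python
import hashlib
import time


def time_digest(name, data, repeats=3):
    """Return the best wall-clock time (seconds) to hash `data` with `name`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).hexdigest()
        best = min(best, time.perf_counter() - start)
    return best


def compare(names, data):
    """Time each algorithm; map unsupported ones to None."""
    results = {}
    for name in names:
        try:
            results[name] = time_digest(name, data)
        except ValueError:
            # hashlib raises ValueError for algorithms this build lacks
            # (assumption: md4 may be among them on newer systems).
            results[name] = None
    return results


if __name__ == "__main__":
    payload = bytes(16 * 1024 * 1024)  # 16 MiB of zeros, held in memory
    for name, seconds in compare(["md4", "md5", "sha1"], payload).items():
        print(name, "unavailable" if seconds is None else f"{seconds:.3f}s")
```

Taking the best of a few runs is just one way to smooth out noise; absolute numbers will differ from machine to machine.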
On my system, I ran it multiple times on an 892 MB file, and once the
file was fully cached in memory md4 consistently took 2.06 seconds of
elapsed time while md5 settled at 3.2 seconds. That's about 35% less
elapsed time than md5, or roughly 1.5 times the throughput.
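To spell out the arithmetic behind that comparison, here is a small sketch using the timings above. The function names are my own, and it assumes the file size is 892 megabytes.

```python
def time_saved_fraction(fast_seconds, slow_seconds):
    """Fraction of elapsed time saved by the faster run."""
    return (slow_seconds - fast_seconds) / slow_seconds


def throughput_ratio(fast_seconds, slow_seconds):
    """How many times more data per second the faster run processes."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    md4_seconds, md5_seconds = 2.06, 3.2  # timings quoted above
    size_mb = 892                         # assumed to be megabytes
    print(f"time saved: {time_saved_fraction(md4_seconds, md5_seconds):.1%}")
    print(f"throughput ratio: {throughput_ratio(md4_seconds, md5_seconds):.2f}x")
    print(f"md4: {size_mb / md4_seconds:.0f} MB/s, md5: {size_mb / md5_seconds:.0f} MB/s")
```

The two figures describe the same measurement: saving about a third of the time is the same as getting through about one and a half times as much data per second.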
Maybe there are faster algorithms around.
--Amos