Raymii.org
Generate hashes of files with rhash for archival storage
Published: 11-12-2015 | Author: Remy van Elst
Recently I had to move a large number of files to archival storage. To save space and reduce the number of files, I decided to bundle them into archives with tar. The files will be stored on tapes and DVDs and will always be restored in full, so random access times are not an issue; hence the choice of tar.gz.
I do want to make sure that the files are still correct when they need to be restored. I first dabbled with some long shell commands to create and verify checksums, but then I found the rhash tool in the repositories. It lets you create checksums of files and folders, recursively, with all sorts of algorithms, like CRC, MD5, SHA1 and many more. It also makes bulk validation very simple.
This small article shows you how to create an archive file with the checksums included and shows you how to validate these checksums later on.
The data in question are archived tapes, disk copies, source code and documentation for the PDP8 mainframe. We also have these for the PDP11 and a few VAX machines. The archives contain about 5 million files and are about 700 GB in size. The company decided to phase out the on-line storage and place this data on tapes and DVDs, since it is not accessed more than once or twice a month.
Creating the hashes
The first archive contains PDP8 files located in the folder pdp8. This command creates the MD5SUMS file, which we place in the same folder:
rhash --recursive --md5 --output=pdp8/MD5SUMS pdp8/
The archive is then created with a simple tar -czf pdp8.tar.gz pdp8.
Verifying the hashes
Extract the archive to a folder and use the following command to verify all files:
rhash --skip-ok --check pdp8/MD5SUMS
If all files match the output looks like this:
--( Verifying pdp8/MD5SUMS )----------------------------------------------------
--------------------------------------------------------------------------------
Everything OK
If a file does not match the hash, the output will include it:
--( Verifying pdp8/MD5SUMS )----------------------------------------------------
pdp8/pdp8/readme.txt ERR
--------------------------------------------------------------------------------
Errors Occurred: Errors:1 Miss:0 Success:3323 Total:3324
If you leave out the --skip-ok option, all checked files will be shown, which might result in long output.
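When the checksum list is in the md5sum format (hash, two spaces, path), GNU md5sum can do a comparable bulk check: --check reads the list and --quiet hides the OK lines, much like --skip-ok does for rhash. The sketch below combines extraction and verification. The function name restore_and_verify and its three arguments are hypothetical, and it assumes GNU tar and md5sum, plus recorded paths that are relative to the parent of the archived folder.

```shell
#!/bin/sh
# Hypothetical helper: unpack an archive into a target directory and check
# every file against the MD5SUMS list stored inside it.
# Assumes GNU tar and md5sum, and a list in md5sum format.
restore_and_verify() {
    archive=$1   # e.g. pdp8.tar.gz
    name=$2      # top-level folder inside the archive, e.g. pdp8
    target=$3    # existing directory to extract into

    tar -xzf "$archive" -C "$target" || return 1

    # Run the check from the directory the recorded paths are relative to.
    # --quiet prints only failing files; the exit status is non-zero if
    # any file does not match.
    ( cd "$target" && md5sum --check --quiet "$name/MD5SUMS" )
}
```

A call such as restore_and_verify pdp8.tar.gz pdp8 /tmp/restore prints nothing and returns 0 when every file matches, which makes it easy to use in a script.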
To manually verify one file, first get the checksum:
$ grep 'pdp8/readme.txt' pdp8/MD5SUMS
53a1aca1631d55de3feece9e1c4d900a pdp8/pdp8/readme.txt
Then manually execute the correct checksum command to verify the match:
$ md5sum pdp8/pdp8/readme.txt
53a1aca1631d55de3feece9e1c4d900a pdp8/pdp8/readme.txt
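The grep and md5sum steps above can be wrapped in a small function so the two hashes are compared by the shell instead of by eye. This is a sketch under a few assumptions: the name check_one is hypothetical, each line of the list is a lowercase hash followed by whitespace and a path, and paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: compare the recorded MD5 of one file with a fresh one.
# Assumes list lines look like "<hash><whitespace><path>" and that paths
# contain no spaces. Run it from the directory the recorded paths are
# relative to.
check_one() {
    sums=$1   # checksum list, e.g. pdp8/MD5SUMS
    file=$2   # path exactly as recorded in the list

    # Look up the stored hash by exact match on the path column.
    expected=$(awk -v f="$file" '$2 == f { print $1; exit }' "$sums")
    if [ -z "$expected" ]; then
        echo "$file: not listed in $sums" >&2
        return 2
    fi

    # md5sum prints "<hash>  <path>"; keep only the hash.
    actual=$(md5sum "$file" | awk '{ print $1 }')

    if [ "$expected" = "$actual" ]; then
        echo "$file: OK"
    else
        echo "$file: MISMATCH (expected $expected, got $actual)" >&2
        return 1
    fi
}
```

For example, check_one pdp8/MD5SUMS pdp8/pdp8/readme.txt returns 0 on a match, 1 on a mismatch and 2 when the path is not in the list.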
Tags: archive, bash, blog, gzip, md5sum, rhash, tar