Generating File Manifests and Then Checking Them

This came up whilst I was migrating a positively huge number of files and needed to check the integrity of the transfer.

Build the manifest

find /path/to/folder -type f -print0 | xargs --null md5sum > /path/to/manifest
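  • -type f: tells find to return only regular files (no directories)
  • -print0: tells find to terminate each filename with a null byte instead of a newline, so filenames containing spaces are passed through intact
  • --null (short form -0): tells xargs to expect null-terminated input
  • NOTE: PUT THE MANIFEST OUTSIDE THE FOLDER YOU ARE INDEXING! Otherwise find can pick up the half-written manifest itself, and that entry will always fail the check.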
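Because the command above is given an absolute path, every line of the manifest records an absolute path, and md5sum --check looks for exactly those paths. If the files end up under a different path after the migration, every check fails. One possible workaround, as a minimal sketch assuming GNU find, xargs and md5sum, is to run find from inside the folder so the manifest stores paths relative to it. The function name build_manifest is hypothetical.

```shell
# Hypothetical helper: build_manifest SOURCE_DIR MANIFEST
# Writes a manifest whose paths are relative to SOURCE_DIR (./sub/file),
# so it can be checked against a copy of the tree at a different location.
# Assumes GNU find, xargs (-0, -r) and md5sum.
build_manifest() {
    src=$1
    manifest=$2
    # The redirection is opened in the current directory before the subshell
    # changes into SOURCE_DIR, so a relative MANIFEST path still works.
    # -r stops xargs running md5sum at all if no files are found.
    # Keep MANIFEST outside SOURCE_DIR.
    ( cd "$src" && find . -type f -print0 | xargs -0 -r md5sum ) > "$manifest"
}
```

For example, build_manifest /path/to/folder /path/to/manifest produces lines of the form "<hash>  ./sub/file".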

Checking the manifest

md5sum --check /path/to/manifest | grep FAILED

The above prints every failed check. If you just want a count (for automated reporting, say), append | wc -l.
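md5sum --check resolves the paths in the manifest against the current directory, so for a manifest containing relative paths (for example one built by running find . from inside the folder) it has to be run from the root of the copied tree. The sketch below assumes GNU md5sum; count_failures is a hypothetical name. It uses the --quiet option, which suppresses the OK lines, so a file whose name merely contains the word FAILED cannot inflate the count.

```shell
# Hypothetical helper: count_failures MANIFEST DEST_DIR
# Prints how many manifest entries fail verification under DEST_DIR.
# Both checksum mismatches ("FAILED") and missing or unreadable files
# ("FAILED open or read") are counted.
count_failures() {
    manifest=$1
    dest=$2
    if [ ! -d "$dest" ]; then
        echo "no such directory: $dest" >&2
        return 2
    fi
    # The manifest is fed on stdin (opened before the cd), so a relative
    # MANIFEST path still works. --quiet prints only failures; the summary
    # warnings md5sum writes to stderr are discarded. grep -c prints 0 and
    # exits 1 when nothing fails, hence the "|| true".
    ( cd "$dest" && md5sum --check --quiet ) < "$manifest" 2>/dev/null |
        grep -c FAILED || true
}
```

For example, count_failures /path/to/manifest /new/path/to/folder prints 0 when the transfer is intact.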

FAQ

How big is the manifest?

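This depends entirely on the length of your file paths. Each manifest line consists of the 32-character hex MD5 hash, two separator characters (a space, then either a second space or a * marking binary mode), the file path and a newline. The hash and separators are plain ASCII, one byte each; the path is stored as raw bytes, so in UTF-8 an ASCII character takes 1 byte and other characters take 2 to 4 bytes. Because path lengths vary, there is no fixed size per file.

However, each line is always 32 + 2 + len(path) + 1 bytes, with len(path) measured in bytes, and the manifest is the sum of that over every file. For example, the path /data/photos/a.jpg (18 bytes) produces a 53-byte line. GNU md5sum escapes filenames containing a backslash or newline, which adds a few bytes to those lines.

The deeper your directory structure, the longer each path, and so the larger the manifest.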
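The formula can be applied before building anything. A rough sketch, assuming GNU find (for -printf) and filenames without backslashes or newlines; estimate_manifest_size is a hypothetical name. It measures paths with wc -c, so lengths are counted in bytes rather than characters, which matters for non-ASCII names.

```shell
# Hypothetical helper: estimate_manifest_size DIR
# Predicts the size in bytes of the manifest produced by
#   find DIR -type f -print0 | xargs --null md5sum
# using 32 (hash) + 2 (separator) + len(path) + 1 (newline) per file.
estimate_manifest_size() {
    dir=$1
    # One byte per regular file gives the file count.
    files=$(find "$dir" -type f -printf 'x' | wc -c)
    # Each path as find prints it, plus a newline: the sum of len(path) + 1.
    path_bytes=$(find "$dir" -type f -printf '%p\n' | wc -c)
    # Add the remaining 34 bytes (hash + separator) per file.
    echo $(( path_bytes + 34 * files ))
}
```

For example, estimate_manifest_size /path/to/folder prints the expected manifest size in bytes.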

How long does the manifest take to build?

This depends on the number of files you have to index and their total size, since md5sum has to read every byte of every file, along with other factors such as network shares. In test runs, 2819 files were indexed in 1.493 seconds.
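To get a comparable figure for your own data, a minimal sketch that times a build, assuming GNU date (the %N nanoseconds format is a GNU extension) plus GNU find, xargs and md5sum; time_manifest_build is a hypothetical name. Timing a representative sample folder first gives a rough rate to extrapolate from before the real run.

```shell
# Hypothetical helper: time_manifest_build DIR MANIFEST
# Builds the manifest with the find | xargs md5sum pipeline (plus -r, so
# md5sum is not run on an empty file list) and reports the file count
# and elapsed wall-clock time.
time_manifest_build() {
    dir=$1
    manifest=$2
    start=$(date +%s%N)    # nanoseconds since the epoch (GNU date)
    find "$dir" -type f -print0 | xargs --null -r md5sum > "$manifest"
    end=$(date +%s%N)
    # One manifest line per file (md5sum escapes awkward names onto one line).
    files=$(wc -l < "$manifest")
    echo "$files files indexed in $(( (end - start) / 1000000 )) ms"
}
```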
