Problem

This recipe demonstrates how to set up a hash reference framework to
resolve MD5 hashes using the HashDig tool suite.

Motivation

The motivation for this recipe was to establish a flexible and
efficient framework for building and maintaining reference hash sets.

Requirements

Cooking with this recipe requires the current HashDig tool suite,
which is part of The FTimes Project. You can download FTimes from the
following URL:

    http://ftimes.sourceforge.net/FTimes/index.shtml

Note, however, that this recipe actually requires HashDig tools that
are more recent than the latest FTimes release (i.e., version 3.3.0).
You can always obtain the most up-to-date HashDig tools by checking
them out from the project's CVS repository. See the following URL for
more details:

    http://sourceforge.net/cvs/?group_id=41134

Configure FTimes using the following command, then make and install
FTimes.

    $ ./configure --with-hashdig-tools
    $ make
    $ make install

This recipe also requires knowledge of how to use the HashDig tools.
An overview describing how to use the HashDig tool suite can be found
in tools/hashdig/README or on the HashDig page that is located here:

    http://ftimes.sourceforge.net/FTimes/HashDig.shtml

Each HashDig tool contains its own man page. You may review these
pages with perldoc or by looking at the source directly.

The commands presented throughout this recipe were designed to be
executed within a Bourne shell (i.e., sh or bash).

Time to Implement

Implementation time for this recipe should be less than two hours,
but this can vary widely due to several factors such as the number and
size of raw hash files, download speed, and the IO/CPU speed of the
computer building the various reference databases.

Solution

The solution is to set up a reference directory tree, populate it with
reference files, and build one or more reference databases. The
following steps describe how to implement this solution.

1.
Set the ROOT_DIR, HASHDIG_DIR, and REFERENCES_DIR variables in your
shell. Next, create the HASHDIG_DIR and REFERENCES_DIR directories.
Note that ROOT_DIR should have enough free space to hold at least five
times the size of the raw data.

    $ ROOT_DIR="/usr"
    $ HASHDIG_DIR="${ROOT_DIR}/hashdig"
    $ REFERENCES_DIR="${HASHDIG_DIR}/references"
    $ mkdir -p ${REFERENCES_DIR}

2. Create one or more group directories under REFERENCES_DIR. A group
directory name must begin with the prefix 'group_', and each group
should be used to partition hashes into related sets. For example, you
may wish to create a single group directory named group_good that
contains all hash sets that are known to be good. On the other hand,
you may wish to create multiple groups such as: group_knowngood,
group_knownbad, group_backups, group_media, and so on. When you have
decided on the groups you will create, make the directories as shown
in the example below:

    $ DIR_LIST="group_knowngood group_knownbad group_backups group_media"
    $ for DIR in ${DIR_LIST} ; do mkdir -p ${REFERENCES_DIR}/${DIR} ; done

3. Extract the Makefile from this recipe and place it in
REFERENCES_DIR using the command below. The Makefile is used to build
and maintain reference database files.

    $ sed -e '1,/^--- Makefile ---$/d; /^--- Makefile ---$/,$d' hashdig-make-references.txt > ${REFERENCES_DIR}/Makefile

4. After creating one or more group directories, create the directory
tree under each group using the commands below.

    $ cd ${REFERENCES_DIR}
    $ make setup

The result will be a directory structure similar to the example shown
below.
    - hashdig
      |
      - references
        |
        - group_1
        | |
        | - db.by_file
        | - db.by_type
        | - db.unified
        | - hd.ftimes
        | - hd.ftk
        | - hd.generic
        | - hd.hashkeeper
        | - hd.knowngoods
        | - hd.md5
        | - hd.md5deep
        | - hd.md5sum
        | - hd.nsrl1
        | - hd.nsrl2
        | - hd.openssl
        | - hd.plain
        | - hd.rpm
        |
        - group_2
        | |
        | - db.by_file
        | - db.by_type
        | - db.unified
        | - hd.ftimes
        | - hd.ftk
        | - hd.generic
        | - hd.hashkeeper
        | - hd.knowngoods
        | - hd.md5
        | - hd.md5deep
        | - hd.md5sum
        | - hd.nsrl1
        | - hd.nsrl2
        | - hd.openssl
        | - hd.plain
        | - hd.rpm
        |
        - group_ ...

Each group directory contains two kinds of sub-directories: db and hd.
There is one hd directory for each hash type supported by HashDig. You
will populate these directories with raw hash files of the
corresponding type. For example, raw NSRL version 2 hash files are
placed in the hd.nsrl2 sub-directory. Currently, the hd directories
created by the Makefile cover the hash types listed here:

    ftimes
    ftk
    generic
    hashkeeper
    knowngoods
    md5
    md5deep
    md5sum
    nsrl1
    nsrl2
    openssl
    plain
    rpm

Eventually, the db directories will all contain the same data -- the
difference is in how that data is partitioned. The db.by_file
directory contains one db file for each hd file. The db.by_type
directory contains one db file for each hash type (e.g., ftimes,
knowngoods, rpm, etc.). Finally, the db.unified directory contains a
single db file that contains all hashes for a given group.

5. Populate the appropriate hd directories with raw, uncompressed hash
files -- i.e., the raw data. The example below shows group_knowngood
populated with NSRL version 2 hash files and Red Hat 7.2 and 8.0 RPM
hash files.

    - hashdig
      |
      - references
        |
        - group_knowngood
          |
          - db.unified
          - db.by_file
          - db.by_type
          - hd.ftimes
          - hd.ftk
          - hd.generic
          - hd.hashkeeper
          - hd.knowngoods
          - hd.md5
          - hd.md5deep
          - hd.md5sum
          - hd.nsrl1
          - hd.nsrl2
          | |
          | - rds_2.3_a.raw
          | - rds_2.3_b.raw
          | - rds_2.3_c.raw
          | - rds_2.3_d.raw
          |
          - hd.openssl
          - hd.plain
          - hd.rpm
            |
            - redhat_7.2.txt
            - redhat_8.0.txt

6. Make the reference db files.
You have the choice of making all reference dbs, one unified db, one
db per type, or one db for each hd file.

Use the command below to make all possible reference dbs.

    $ make all

Use the command below to make one unified db per group.

    $ make unified

Use the command below to make one db for each type in a group.

    $ make by_type

Use the command below to make one db for each file in a group.

    $ make by_file

Use the command below to display various db statistics.

    $ make stats

7. After building your databases, you can bash subject hashes. The
following examples show how this might be done:

To bash against all by_file databases:

    for i in `find ${REFERENCES_DIR}/group_*/db.by_file -name "*.db"`; do
      hashdig-bash.pl -r $i -s subject.db
    done

To bash against all by_type databases:

    for i in `find ${REFERENCES_DIR}/group_*/db.by_type -name "*.db"`; do
      hashdig-bash.pl -r $i -s subject.db
    done

To bash against all unified databases:

    for i in `find ${REFERENCES_DIR}/group_*/db.unified -name "*.db"`; do
      hashdig-bash.pl -r $i -s subject.db
    done

To bash against all databases:

    for i in `find ${REFERENCES_DIR}/group_*/ -name "*.db"`; do
      hashdig-bash.pl -r $i -s subject.db
    done

Closing Remarks

All supported targets are documented in the Makefile.

Credits

This recipe was brought to you by Andy Bair and Klayton Monroe,
February 2004.

Appendix 1

The following Makefile is used to build and maintain hash reference
files.

--- Makefile ---
########################################################################
#
# $Id$
#
########################################################################
#
# Copyright 2004-2004 The WebJob Project, All Rights Reserved.
#
########################################################################
#
# Purpose: Build and maintain HashDig reference databases.
#
########################################################################
#
# Targets:
#
#   all
#     Make unified, by_type, and by_file reference databases.
#
#   clean
#     Remove all auto-generated hashdig files (i.e.
#     *.{db,hd}).
#
#   clean-hd
#     Remove all auto-generated .hd files.
#
#   clean-db
#     Remove all auto-generated .db files.
#
#   by_file
#     Make a reference database per reference file.
#
#   by_type
#     Make a reference database per reference type.
#
#   setup
#     Create the required directory structure for each hash group.
#
#   stats
#     Calculate and display statistics of all reference databases.
#
#   unified
#     Make a unified reference database.
#
########################################################################

db_unified= db.unified
db_by_file= db.by_file
db_by_type= db.by_type
db_dirs= ${db_unified} \
         ${db_by_file} \
         ${db_by_type}

#FIXME Add a mode to hashdig-harvest.pl that enumerates all supported
#      types. Replace this list with that output.
type_dirs= hd.ftimes \
           hd.ftk \
           hd.generic \
           hd.hashkeeper \
           hd.knowngoods \
           hd.md5 \
           hd.md5deep \
           hd.md5sum \
           hd.nsrl1 \
           hd.nsrl2 \
           hd.openssl \
           hd.plain \
           hd.rpm

tmp_dir?= /tmp

.if defined(enable_timer)
TIME= time
.else
TIME=
.endif

#FIXME Add an option to make the hashdig-*.pl tools quiet.

########################################################################
#
# Default target.
#
########################################################################

all: unified by_type by_file

########################################################################
#
# Clean targets.
#
########################################################################

clean: clean-hd clean-db

clean-hd:
	@find . -name "*.hd" | xargs rm -f

clean-db:
	@find . -name "*.db" | xargs rm -f

########################################################################
#
# Unified database targets.
#
########################################################################

unified:
	@for group_dir in `find . -name "group_*"` ; do \
	  cd $${group_dir} ; \
	  for type_dir in ${type_dirs} ; do \
	    echo '===>' $${group_dir}/$${type_dir} ; \
	    cd $${type_dir} ; \
	    type=`echo $${type_dir} | sed 's/^hd\.//'` ; \
	    for ref_file in `find . \
	      -type f | grep -v "\.hd$$"` ; do \
	      make -f ../../Makefile ${db_unified} type=$${type} file="$${ref_file}" ; \
	    done ; \
	    cd .. ; \
	  done ; \
	  cd .. ; \
	done

${db_unified}: ${file}.hd
	@${TIME} hashdig-make.pl -i -d ../$@/unified.db $?

${file}.hd: ${file}
	@${TIME} hashdig-harvest.pl -T ${tmp_dir} -c k -t ${type} -o $@ $?

########################################################################
#
# By-Type database targets.
#
########################################################################

by_type:
	@for group_dir in `find . -name "group_*"` ; do \
	  cd $${group_dir} ; \
	  for type_dir in ${type_dirs} ; do \
	    echo '===>' $${group_dir}/$${type_dir} ; \
	    cd $${type_dir} ; \
	    type=`echo $${type_dir} | sed 's/^hd\.//'` ; \
	    for ref_file in `find . -type f | grep -v "\.hd$$"` ; do \
	      make -f ../../Makefile ${db_by_type} type=$${type} file="$${ref_file}" ; \
	    done ; \
	    cd .. ; \
	  done ; \
	  cd .. ; \
	done

${db_by_type}: ${file}.hd
	@${TIME} hashdig-make.pl -i -d ../$@/${type}.db $?

${file}.hd: ${file}
	@${TIME} hashdig-harvest.pl -T ${tmp_dir} -c k -t ${type} -o $@ $?

########################################################################
#
# By-File database targets.
#
########################################################################

by_file:
	@for group_dir in `find . -name "group_*"` ; do \
	  cd $${group_dir} ; \
	  for type_dir in ${type_dirs} ; do \
	    echo '===>' $${group_dir}/$${type_dir} ; \
	    cd $${type_dir} ; \
	    type=`echo $${type_dir} | sed 's/^hd\.//'` ; \
	    for ref_file in `find . -type f | grep -v "\.hd$$"` ; do \
	      make -f ../../Makefile ${db_by_file} type=$${type} file="$${ref_file}" ; \
	    done ; \
	    cd .. ; \
	  done ; \
	  cd .. ; \
	done

${db_by_file}: ${file}.hd
	@${TIME} hashdig-make.pl -i -d ../$@/${file}.db $?

${file}.hd: ${file}
	@${TIME} hashdig-harvest.pl -T ${tmp_dir} -c k -t ${type} -o $@ $?

########################################################################
#
# Analysis targets.
#
########################################################################

stats:
	@for group_dir in `find . -name "group_*"` ; do \
	  echo '===>' $${group_dir} ; \
	  find $${group_dir}/${db_unified} -name "*.db" | xargs ${TIME} hashdig-stat.pl -t db ; \
	  echo '------------------------------------------------' ; \
	  find $${group_dir}/${db_by_file} -name "*.db" | xargs ${TIME} hashdig-stat.pl -t db ; \
	  echo '------------------------------------------------' ; \
	  find $${group_dir}/${db_by_type} -name "*.db" | xargs ${TIME} hashdig-stat.pl -t db ; \
	  echo '------------------------------------------------' ; \
	done

########################################################################
#
# Bootstrap targets.
#
########################################################################

setup:
	@for group_dir in `find . -name "group_*"` ; do \
	  cd $${group_dir} ; \
	  for sub_dir in ${db_dirs} ${type_dirs} ; do \
	    echo "$${sub_dir}" ; \
	    mkdir -p $${sub_dir} ; \
	  done ; \
	  cd .. ; \
	done

--- Makefile ---
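
Appendix 2

Steps 1 through 3 of the Solution can be rehearsed in a throwaway
directory before committing to a real ROOT_DIR. The sketch below is
not part of the HashDig tool suite. It assumes a scratch directory
created with mktemp and a small stand-in file named recipe-sample.txt
(both names are illustrative only), and it shows how the sed expression
from step 3 keeps just the text between the two marker lines.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-3. SCRATCH and recipe-sample.txt are
# hypothetical names used only for this rehearsal.
set -eu

SCRATCH=$(mktemp -d)
trap 'rm -rf "${SCRATCH}"' EXIT   # remove the scratch tree when the shell exits

# Step 1: same variable layout as the recipe, rooted in the scratch directory.
ROOT_DIR="${SCRATCH}"
HASHDIG_DIR="${ROOT_DIR}/hashdig"
REFERENCES_DIR="${HASHDIG_DIR}/references"
mkdir -p "${REFERENCES_DIR}"

# Step 2: one directory per group, each carrying the required 'group_' prefix.
DIR_LIST="group_knowngood group_knownbad"
for DIR in ${DIR_LIST} ; do
  mkdir -p "${REFERENCES_DIR}/${DIR}"
done

# Step 3: a stand-in recipe file with its payload between two marker lines.
cat > "${SCRATCH}/recipe-sample.txt" <<'EOF'
Recipe text before the appendix.
--- Makefile ---
all: unified by_type by_file
--- Makefile ---
Trailing text after the appendix.
EOF

# Same sed expression as step 3: delete everything up to and including
# the first marker, then everything from the second marker to the end.
sed -e '1,/^--- Makefile ---$/d; /^--- Makefile ---$/,$d' \
  "${SCRATCH}/recipe-sample.txt" > "${REFERENCES_DIR}/Makefile"

ls "${REFERENCES_DIR}"
cat "${REFERENCES_DIR}/Makefile"
```

Pointing ROOT_DIR at the real location and substituting
hashdig-make-references.txt for the sample file gives the commands
from steps 1 through 3.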