aff - a new format for storing hard drive images

Upload: pop-ion

Post on 04-Jun-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 AFF - A New Format for Storing Hard Drive Images

    1/3COMMUNICATIONS OF THE ACM February 2006/Vol. 49, No. 2 85

    ost forensic practi-tioners work withjust one or a fewdisks at a time. Awife might bring inher husbands lap-top for imaging afew days before she

    files for divorce. Police might raid a drugdealers apartment and seize a PC thats usedfor contacting suppliers. In these cases it is

    common practice to copy the drives con-tents sector-for-sector into a single file,referred to here as raw copy. Some practi-tioners bypass the file entirely, and just makea raw copy to a second disk drive that is thesame size (or larger) than the original.

    Raw images are widely used because theywork with practically every disk forensics toolavailable today. But raw images are not com-pressed, and can be quite large, even if thedrive itself contains very little data.

    The obvious way to solve the data storageproblem is with a file compressor such as gzipor bzip2. But neither supports random accesswithin a compressed file. Because a forensictool requires random access in the same man-ner that a file system requires random accessto a physical disk, disk images compressedwith a file compressor must be decompressedbefore they can be used.

    A second problem with raw images is therecording of metadata. Because a raw image is

    a sector-for-sector copy of a target drive, thereis no place to store additional information inthe file. As a result, information such as the ser-ial number of the target drive, the name of theinvestigator who performed the acquisition,and even the date the disk was imaged must bestored elsewherefor example, in a database.But if metadata is not stored in the image fileitself, there is a chance it will become separatedfrom the image file and lost, or even confusedwith the metadata of another drive.

    STORING HARD DRIVE IMAGES

    M

    By SIMSON L. GARFINKEL

    AFF: A New Format for

  • 8/13/2019 AFF - A New Format for Storing Hard Drive Images

    2/386 February 2006/Vol. 49, No. 2 COMMUNICATIONS OF THE ACM

    The file format used by the popularEnCase forensic tool overcomes

    many of the problems inherentwith raw images. EnCase stores a disk image as aseries of unique compressed pages. Each page can beindividually retrieved and decompressed in the com-puters memory as needed, allowing random access tothe contents of the image file. The EnCase formatalso has the ability to store metadata such as a casenumber and an investigator. Unfortunately, theEnCase format is proprietary; although a few vendorshave tried to reverse engineer the format to providefor some compatibility, such support is necessarilyincomplete.

    Faced with this situation, we designed a new fileformat for our forensic work. Called the AdvancedForensics Format (AFF), this format is both open andextensible. Like the EnCase format, AFF stores theimaged disk as a series of pages or segments, allowingthe image to be compressed for significant savings.Unlike EnCase, AFF allows metadata to be storedeither inside the image file or in a separate, compan-ion file. Although AFF was specifically designed foruse in projects involving hundreds or thousands ofdisk images, it works equally well for practitioners

    who work with just one or two images. And in theevent the disk image is corrupted, AFF internal con-sistency checks are designed to allow the recovery ofas much image data as possible.

    The AFF format is unencumbered by any patentsor trade secrets, and the open source implementationis distributed under a license that allows the code tobe freely integrated into either open source or propri-ety programs. We hope that AFF will be adopted byother tool vendors and become a standard format forstoring disk images.

    AN OPEN, EXTENSIBLE FORMATIn order to create a format that will allow for bothforward and backward compatibility over anextended time, AFF is partitioned into two layers.AFFs lower data storage layerdescribes how a seriesof name/value pairs are stored in one or more diskfiles in a manner that is both operating system andbyte-order independent. AFFs upper disk represen-tation layer defines a series of name/value pairs usedfor storing disk images and associated metadata.

    The original plan for AFFs lower layer was to usean open source b-tree implementation such as Berke-

    ley DB (BDB), the GNU Database Manager

    (GDBM), or a similar system. But BDB and GDBMare distributed under the GNU Public License, whichis unacceptable to most commercial software devel-opers. (A licensed version of BDB with fewer restric-tions can be obtained from SleepyCat software, butnot for free.) The Apache SDBM database has no

    such restrictions, but it creates sparse files that cannotbe efficiently copied between machines. Another con-cern was the layout of information in a b-tree filemight be too complex to explain in court, should theneed arise.

    Instead, we designed a new, simple system basedon a repeatable, variable-length structure called anAFF segment. Each AFF segment consists of aheader, a variable-length segment name, a 32-bit flag,a variable-length data payload area, and a segmentfooter. The segments length is stored in both theheader and footer, allowing for rapid seeking througha file (only the headers and the footers must be read.)The AFF file begins with a file header and ends witha directory that lists all of the segments in the file andtheir byte offsets from the files start. If no directory ispresent, one can be readily constructed by scanningthe file from beginning to end, a process that typicallytakes just a few seconds.

    AFFs disk representation layer defines specific seg-ment names used for representing disk informationand metadata. For example, device_sn is the name ofthe segment used to hold the disks serial number, while

    date_acquired holds the time the disk image wasacquired. Two special segments are accession_gid,which holds a 128-bit globally unique identifier that isdifferent each time the disk imaging program is run,and badflag, which holds a 512B block of data thatidentifies blocks that cannot be read in the disk image.The metadata can be stored in the same AFF file as theimage or in a separate file. Indeed, the schema couldeven be stored in an XML file.

    Tagging bad blocks with a specific pattern is supe-rior to the more common technique of filling badblocks with ASCII NUL characters because it allows

    sectors that are unreadable to be distinguished fromthose that have been manually cleared. On the otherhand, we thought the complexity of having a separatemap of bad blocks was not warranted.

    The image data itself is split into one or more datasegments. All data segments must be the same size;this size is determined when the image file is created.Segments are given sequential namesfor exampleseg0, seg1, seg2and continuing to segn, where n is aslarge as is necessary to store the disk image. The seg-ment size is stored in a segment called segsize.

    AFF data segments can be compressed with the

    open source zlib or left uncompressed; whether or not

  • 8/13/2019 AFF - A New Format for Storing Hard Drive Images

    3/3

    a data segment is compressed or is encoded in the datasegments 32-bit flag. Compressed AFF files consumeless space, but are slower to create and slower to access.The decision to compress or not to compress can bemade at runtime, and uncompressed AFF files caneasily be compressed or vice versa.

    AFFLIB AND AFF TOOLS

    AFFLIB is the open source library that implementsthe underlying AFF system. Rather than forcing theprogrammer to understand segments, data segments,compression and so on, AFFLIB implements a sim-ple abstraction that makes the AFF image file appearas a persistent name/value database and as a standardfile that can be opened, read, and seeked with theaf_open(), af_read() and af_seek() library calls. (In fact,the AFFLIB also supports an af_write() call to make iteasier to write disk-imaging programs.) If af_open() isused to open a non-AFF file, the library defaults to apass-through mode of operation, allowing AFF-awareprograms to work equally well with raw files.

    The AFF source code comes with a set of toolsincluding a disk-imaging program (aimage), a pro-gram for converting AFF metadata into XML(afxml), a program for converting raw images to AFFimages and back (afconvert).

    We have also modified Brian Carriers open sourceSleuth Kit to work with AFF-formatted files. Themodifications are relatively minor and are likely to be

    incorporated into the Sleuth Kits public release.When run interactively, no performance degradationis noticeablenot even on compressed AFF files.

    We have been able to store more than one terabyteof disk images in less than 200GB using AFF. We arenow working to improve the AFF Tools and the per-formance of AFFLIB. More information about AFF,including the source code, can be found atwww.afflib.org.

    Simson L. Garfinkel([email protected]) is a fellow at theCenter for Research on Computation and Society at Harvard

    University, Cambridge, MA. He is also a founder of SandstormEnterprises, a computer security firm that develops advanced computerforensic tools used by businesses and governments to audit theirsystems.

    Permission to make digital or hard copies of all or part of this work for personal or class-room use is granted without fee provided that copies are not made or distributed forprofit or commercial advantage and that copies bear this notice and the full citation onthe first page. To copy otherwise, to republish, to post on servers or to redistribute tolists, requires prior specific permission and/or a fee.

    2006 ACM 0001-0782/06/0200 $5.00

    c

    COMMUNICATIONS OF THE ACM February 2006/Vol. 49, No. 2 87

    THEAFF FORMAT ISUNENCUMBERED BY ANY PATENTS OR

    TRADE SECRETS, AND THE OPEN

    SOURCE IMPLEMENTATION IS

    DISTRIBUTED UNDER A LICENSE THAT

    ALLOWS THE CODE TO BE FREELY

    INTEGRATED INTO EITHER OPEN SOURCE

    OR PROPRIETY PROGRAMS. WE HOPE

    THATAFF WILL BE ADOPTED BY

    OTHER TOOL VENDORS AND BECOME A

    STANDARD FORMAT FOR STORING

    DISK IMAGES.