ARCHISAFE
Legally compliant long-term preservation of electronic documents

A German e-Government Project initiated and advised by CSC experts

By: Dr. Siegfried Hackel & Tobias Schaefer (PTB) and Dr. Wolf Zimmer (CSC)

“The paperless office is just as far into the future as is the paperless rest room”, the former Siemens boss Heinrich von Pierer is often quoted as saying. However, one cannot overlook the fact that in the modern offices of public administration, employees are carrying fewer and fewer documents around with them and are having fewer copies initialed. Instead, administrative employees are more and more often sending out links to electronic documents, opening messages for processing in the virtual mail basket, or having a software system credit and send invoices automatically. Today, it is hardly possible to imagine public administration without electronic documents. The PC, and with it the processing, delivery and storage of digitized information, has long since conquered the offices of the public service.

With the broad-scale introduction of modern IT-supported file processing, the paper file and the office as the organisational base unit of classic administration are increasingly losing their original importance. Electronic processes are replacing the ring binder and the interoffice slip. More and more information is produced digitally, and the paper printout is very often only one possible representation of the electronic document - and often only a haptic habit. “E-Government”, yesterday still a buzzword, is today already the perfect example of a counter-model to the classic, strictly function-oriented bureaucracy described by Max Weber. What is visible today is, at most, the tip of the iceberg of future electronic administration, which has not even begun to utilize all the possibilities of electronic communication and administrative structures.

But it is already clear that highly sophisticated electronic front-office innovations based on modern Internet technologies are of no use at all if, in the long run, ring binders, fax machines and file cards still prevail in the background (in the so-called "back office"). This "back office" is the central production site of the administration, the organization of the "working state", and its main product is the file. Hence, it is the file that will be most affected by the introduction of modern, electronically based administrative structures, because it has been chosen as the means which shall in future enable and support the unimpeded and fast exchange of data and information between the different units of the administration, for the benefit of citizens and of the economy.

30-03-2007_ArchiSafe_White_Paper_V12.doc Page 1 of 18

CHANCES AND RISKS

Electronic documents have, of course, great advantages over conventional documents on paper. They can be copied without problems as often as desired and need little storage space. They can be transported very quickly and economically even over long distances, and they can be processed by several persons at the same time or even automatically by computers. Compared with the time paper files need when sent by interoffice mail, this type of documentation drastically increases the processing speed and the quality of information transmission within the public service. E-government processes based on electronic documents and files are primarily characterised by lean processes which ultimately result in higher citizen satisfaction, easier work for administrative employees and, last but not least, lower costs.

The decisive advantage of electronic documents, namely that they are immediately machine-readable and can be transported in seconds as simple bits and bytes even over great distances, also proves to be their decisive weakness. Digital information is not only volatile because it is virtual per se; it can also be manipulated easily and without being noticed. In other words, because electronic documents consist of nothing but bits and bytes, they cannot be perceived directly and their history cannot be traced back. Therefore, they can be altered easily and without leaving a trace. This applies to the contents of electronic documents just as much as to the time of their creation or the identity of their author. If further progress in e-government on the basis of electronic files is desired, it is necessary to consider already today how to ensure the authenticity and integrity of digital information, at least for the periods legally prescribed for the preservation of files.

This task is far more comprehensive than simply considering the type and size of an electronic storage system, its accessibility and its connection to processing systems. First and foremost, we have to solve the following problem: how much legal security and auditability do certain administrative tasks require, and for how many years? This problem exists, of course, not only in public administration and in e-government. The technical difficulty resulting from the digitalisation of information exists in all branches of industry - from financial services and the aviation industry to the pharmaceutical industry. Information technology is obliged here to find new solutions to a problem which it has itself caused through digitalisation. And this was precisely the aim of the ArchiSafe project carried out at the PTB, the National Metrology Institute of Germany.

MELODI – METROLOGICAL SERVICES ONLINE

PTB's "ArchiSafe" project was directly triggered by the establishment of the Internet service "MELODI - metrological services online (testing, approvals, accreditations)", which was started within the scope of the E-Government Initiative "BundOnline 2005" in Germany.

Fig. 1: Schematic representation of PTB tasks

By means of MELODI, parts of the processes taking place within PTB (e.g. submission of applications, exchange of documentation, status of processing, certificate notification, timesheets, accounting, etc.) are made accessible to the external customers via the Internet.

For the documents compiled within the scope of this service - e.g. certificates, invoices, test results, official notifications and other sovereign documents - very different retention periods are prescribed. National type approvals, for example, have unlimited validity according to the German Verification Act. Some types of measuring devices for electricity, e.g. household electricity meters, have already been in use for more than 40 years. For these certificates, a record retention obligation of at least 30 years exists. Invoices and budget-relevant documents must be retained for at least 10 years.

For the electronic filing of such documents, which are legally valid and provide the basis for the issuing of an invoice, an electronic archiving solution has to be created which, apart from securely storing the electronic documents, also fulfils the requirements of the German Signature Law and of the German Signature Ordinance, particularly for digitally signed documents. The name of the project, "ArchiSafe", thus also reveals its purpose.

INITIAL SITUATION - LEGAL AND TECHNICAL FRAMEWORK CONDITIONS

The central principle of public administrative processes is the "Requirement for the Keeping of Files in Public Administration", which says that it must at all times be possible to gather from the files the complete and true status of an administrative process (case). This principle requires that any communication essential for an administrative process be laid down in writing, so that everybody - and not only the responsible official - can reconstruct and trace back the history of any administrative case. This central requirement undoubtedly applies to the electronic file, too. In electronic form as well, the official documents of public authorities have to fulfil the criteria of completeness, integrity and authenticity, collective retention of task-related papers which belong together, and traceability and legitimacy of the administrative process. Like their predecessors in paper form, electronic files have to fulfil an evidentiary function. From the point of view of electronically supported processing of files and cases, the "value" of electronic documents and records results primarily from their immediate legal or economic relevance, in the sense of implementing and preserving the legal rights which are connected with the electronic documents and transactions.

For legally binding electronic administrative practice, it is also important that electronic documents do not initially have the legal quality of a document. This would be a legal obstacle to the distribution of legally valid services via the Internet or other electronic carriers. The German legislator has solved this problem with amendments which primarily describe the legal requirements electronic documents have to fulfil to become legally valid. Furthermore, these amendments define the requirements which have to be fulfilled to guarantee the integrity and authenticity (the evidentiary value) of electronic documents. The core pieces of the new legal regulations are, besides the Signature Law (SigG) of May 22, 2001 and the Signature Ordinance (SigV) of October 24, 2001, the amendments (supplements) to the form requirements of the Civil Code (BGB), the facilitation of proof within the scope of the Code of Civil Procedure (ZPO) and the adaptation of the Administrative Procedure Law of the German Federal Administration (VwVfG).

Whilst the Signature Law and the Signature Ordinance stipulate the requirements for electronic signatures in general, the Form Customization Law regulates in which cases the digital signature can replace the handwritten signature. The value of electronic documents has until now been impaired by the lack of legal validity. This situation has changed fundamentally with the legal equivalence of the digital and the handwritten signature. Today, the person-related digital signature, in connection with the digital time-stamp, provides evidence of the authenticity and integrity of electronic documents and thus closes the gap for complete and legally secure electronic administrative transactions. The digital signature vouches for the "authenticity" of electronic documents in the world of virtual (because electronic) documents and files.

The transition from paper-based documents to electronic media, and from the paper file to an electronic document infrastructure with electronically signed documents, remains incomplete, however, without an adequate electronic "archive" which is capable of ensuring the long-term preservation and availability of the electronic information. Due to the continuing technological development in the computer and software industry, the lifetime (i.e. the availability) of electronic documents and the security of digital signatures are limited in time. The exponential growth in the processing capacity of modern computer systems weakens, of all things, the cryptographic methods which underlie digital signatures, and in this way also the evidentiary value for the authenticity of an electronically signed document. The ciphers of the legendary ENIGMA machine of the Second World War, for example, can be cracked today with ordinary PCs from the supermarket.

Furthermore, the availability and preservation of electronic information depend heavily on the possibilities and limits of the technology used for storing and presenting the data. This applies both to the hardware, i.e. the storage media used, and to the software. The increased dependence on technology becomes particularly clear in the so-called "presentation problem". Data available in electronic form have to be interpreted and then presented by corresponding software. If the software is configured only slightly differently, this frequently results in different interpretations of the same data. Without a legally secure archiving system, a "devaluation" of electronic documents would therefore inevitably result. In other words, because electronic documents normally remain with the authority until the expiry of the retention period, the requirements for the safe storage of electronic documents cannot be seen only from the point of view of fast and reliable access for the purpose of processing. We also have to take into account the long-term safeguarding of the readability and the loss-free reproducibility of the electronic document. This is not simply about storing digital objects, but rather about the long-term electronic filing of digital documents - whose purpose it is to be
able to use digital documents also at some later point in time. In addition, the responsible authority has to prevent - by means of suitable measures - any subsequent falsification, manipulation or deletion of the stored digital documents, at least until the documents are delivered to the responsible final archive.

NO SOLUTION IN SIGHT?

As far as the long-term availability of electronic documents is concerned, it seems to make sense - at first sight - to continue to use the original hardware and software, or some closely related software that can read the documents. The problem is, of course, that such application software becomes obsolete. Furthermore, even if we save obsolete software and the operating environment in which it runs, operating that software requires specific computer hardware, which becomes obsolete just as quickly as the software. This is the crux of the technical problem of preserving digital information. The failure of even a single component that can no longer be replaced causes this strategy to fail.

A further possibility discussed by the experts is so-called emulation, i.e. the "mimicking" of obsolete hardware in software. However, emulation - creating software that performs the functions of obsolete hardware and other software - has also been rejected by many digital preservation experts as unlikely to be cost-effective in view of the current plethora of hardware and software (not to mention the hardware and software which has not been developed yet).

Another strategy currently favoured is continuous format migration, i.e. the periodic transfer of digital data from one hardware/software configuration to another or from one computer technology to the next. Media refreshing is part of migration, but migration involves the transfer of the entire digital environment, not just the physical storage medium. Migration is necessary each time the operating environment, including the hardware and software, changes. In addition, migration strategies vary with the type of data being migrated. In a simple case this means, for example, that a document formerly generated under an old Microsoft operating system with a program such as "MS WORD 2.0" has to be transformed into a current document format. Such a transformation requires exact and deep knowledge of the available format specifications - a vision which certainly fascinates the Gates company - and even then data losses cannot be ruled out. Because of the legally stipulated evidentiary function of electronic documents, such a prospect is not satisfactory for public administration, not to mention the costs incurred at every migration.

Moreover, the long-term availability of electronic documents is also burdened by permanent technical innovation. Who can still read a once customary 5.25-inch floppy disk on their PC today, for example? Optical media such as CD-ROM or DVD are credited with a lifetime of up to 100 years; data security, however, can only be guaranteed for 20 years. The forecasts of the longevity of optical media are based on the models of the Swedish scientist Svante August Arrhenius and refer to a 50% failure rate at a temperature of 25°C and a relative humidity of 50%. Scratches and damage, which occur frequently in everyday life, are not taken into account in such approaches.

Imagine, for example, that the original Gutenberg Bible had been printed between 1452 and 1454 on data carriers as "perishable" as the ones used today. In that case, we would have had to migrate the Bible at least 5-6 times by now. Apart from this culturally horrifying idea, could we be sure that these “migrated” copies would still be the well-known Gutenberg Bibles? How many transformation errors would have occurred? Not only in the text, but also - and not difficult to discover - in the lovely and elaborate illustrations?

In other words, if we decide - to please technical progress - to follow the path into the beautiful new digital world, are we then also at the mercy of "digital oblivion"? Fortunately, the answer is "No". But we have to learn to equip electronic documents with all the attributes and data that are needed for the sustainable preservation and availability of electronic data and information. This applies both to the data content and to the information on context and representation. Such a strategy makes it possible at least to reduce the dependence on technological innovations and to support the long-term preservation of digital information sustainably, above all through the loss-free safeguarding of the data and information.

In view of this, some requirements arise for the construction of electronic file cabinets and archives to be filled with digital documents.

USE STABLE AND STANDARDIZED DOCUMENT FORMATS ONLY

A sustainable electronic document infrastructure must at least support the long-term retention and usability of electronic documents. This means that access to electronic documents has to be possible with a tolerable effort in cost and time, even over longer periods. For the permanent storage of electronic documents, therefore, only a few electronic file formats should be used. Using many different data types within the scope of long-term storage increases
the risk that, even in the near future, it will no longer be possible to reproduce certain data types true to the original. This would result in the loss of the authenticity of the filed documents. ARCHISAFE therefore recommends - depending on the content and type of the data to be archived - the following document formats for long-term storage (for further details, see http://www.archisafe.de, ArchiSafe Specification, Long-Term Document Formats):

• TXT (ASCII 7-bit) for simple text documents, metadata and master data of business applications (as in ERP systems, for example)

• PDF format (preferably PDF/A) for so-called coded information (CI). This document format can be used platform-independently and is capable of saving the graphic information as well as the text information, so that a full-text search remains possible. The PDF format furthermore offers useful functions, such as the embedding of digital signatures or XML metadata. In addition, PDF/A has been published as a standard and is now generally accepted for the long-term storage of electronic documents (see ISO 19005-1, "Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1)").

• TIFF- and/or PDF format for documents in the so-called non-coded information formats (NCI) and

• XML as a tagging language for metadata and data sets to be archived.

METADATA STRUCTURES AND INTERFACES

Fig. 2: ArchiSafe diagram of metadata (graphical)

For the permanent retention of electronic documents we need, of course - besides the actual documents - a considerable amount of additional information which reveals and describes, beyond all doubt, the reason why a document was created, the administrative authority responsible for processing it, and which documents of an individual administrative proceeding (case) or administrative decision belong together. Such so-called "administrative metadata" are indispensable for the later reconstruction and traceability of the administrative action and therefore have to be retained permanently and reliably. And of course, it is advisable to choose, for this type of information too, a data format which supports the long-term retention and usability of the archived documents. The "SAGA - Standards and Architectures for E-Government Applications" published by the German Federal Ministry of the Interior recommend describing and realizing metadata and data interfaces to third-party systems, as a matter of principle, via XML and corresponding schema definitions. For the communication between the administrative process and the electronic archive, ArchiSafe therefore also uses XML as a descriptive language for complex archive objects which
are self-explanatory via an XML schema and thus contain all the necessary and sufficient information which will be needed for access at a later date (see Fig. 2). The description of the archive objects by means of a valid XML schema promises the following advantages:

• The archive object can be evaluated for syntactic correctness before it is delivered to the electronic long-term storage system.

• The metadata of a specific administrative process (case) or authority can be extended with little effort by expanding or including XML schemas.

In the simplest case, such an archive object contains the version number and the assigned XML schema as well as a block which contains the data regarding the content (object block) and, possibly, one or several signature blocks. The object block itself can, in turn, contain one or several documents which are embedded in XML. Each block begins with metadata in which, for example, the document ID and the description of the document context and its origin can be stored. Optionally, the ArchiSafe container provides a block for metadata giving the information necessary for transmission to the Federal Archive. The document itself is in general archived as a standard PDF/A, which has to be converted into a text encoding (MIME Base64) before it can be embedded in XML.

By storing all metadata together with the record content in one encapsulated object, it is ensured that the metadata are always stored and transported together with the record. Furthermore, this simplifies long-term management and ensures that the retrieved record (archived object) remains physically self-explanatory. As far as memory-intensive binary data are concerned, it is recommended - not least for reasons of performance in the case of frequent access to the long-term storage - that the binary data be provided as a referenced appendix (attachment) to the XML data stream. In this case, the object block must contain an unambiguous reference to the binary file which is then archived additionally. Moreover, according to the ArchiSafe concept, the data contents (document contents) can also be filed in several different document formats inside or outside the XML package.
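The container described above can be illustrated with a minimal sketch. It is not the ARS schema: all element and attribute names (`archiveObject`, `objectBlock`, `metadata`, `content`, `signatureBlock`) and the schema file name are hypothetical and chosen only to mirror the blocks named in the text. The sketch uses the Python standard library to embed a Base64-encoded document next to its metadata and to read it back.

```python
import base64
import xml.etree.ElementTree as ET


def build_archive_object(doc_id: str, context: str, pdf_bytes: bytes) -> bytes:
    """Wrap one document and its metadata in a single XML container."""
    # All element and attribute names are hypothetical, not the ARS schema.
    root = ET.Element("archiveObject",
                      {"version": "1.0", "schema": "example-schema.xsd"})

    object_block = ET.SubElement(root, "objectBlock")
    meta = ET.SubElement(object_block, "metadata")
    ET.SubElement(meta, "documentId").text = doc_id
    ET.SubElement(meta, "context").text = context

    # Binary content has to be text-encoded before it can be embedded in XML.
    content = ET.SubElement(object_block, "content",
                            {"format": "application/pdf", "encoding": "base64"})
    content.text = base64.b64encode(pdf_bytes).decode("ascii")

    # Empty placeholder for one or several signature blocks.
    ET.SubElement(root, "signatureBlock")
    return ET.tostring(root, encoding="utf-8")


def extract_document(archive_xml: bytes) -> bytes:
    """Parse the container (raises if it is not well-formed) and decode the payload."""
    root = ET.fromstring(archive_xml)
    content = root.find("./objectBlock/content")
    return base64.b64decode(content.text)
```

Because metadata and content travel in one object, a later reader needs only the container itself to recover both. Checking the object against a real XML schema would need a validating parser, which this sketch leaves out; it checks well-formedness only.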

Fig. 3: ARCHISAFE Metadata "tree"

The detailed specifications for such an ArchiSafe-compliant XML schema have been published under the title "ARS Meta Data Structure and ARS XML Scheme" on the ArchiSafe homepage http://www.archisafe.de (ARS stands for ArchiSafe Record-Keeping Strategy).

PRESERVATION OF THE EVIDENTIARY VALUE OF ELECTRONIC DOCUMENTS

An electronic document infrastructure must guarantee legal security - especially the authenticity and integrity of the electronic documents - permanently until the end of the retention period which is relevant for administrative processes. Due to the high risk of falsification, the evidentiary value of electronic documents and electronically transmitted documents is per se very low. Electronic documents are in general not suitable as evidence, i.e. they do not offer any legal security. Safeguarding measures are therefore required which are suitable for guaranteeing the integrity, authenticity and confidentiality of electronic documents.

The conventional means - signature and envelope - are naturally not useable; they must rather be mapped electronically. Cryptographic methods in form of the digital signature, the digital time-stamp and the digital encoding offer the corresponding possibilities. The currently used signature methods are based on an asymmetrical cryptography. Although they cannot prevent manipulations of electronic documents, they, however, reliably prevent traceless changes and thus increase the proof suitability of the digitally signed documents. Moreover, there are specific alleviations of the burden of proof in the Judicial Code (ZPO) particularly for electronic documents which are digitally signed with so-called “qualified” digital signatures, i.e. person-related digital signatures certified by the certification authority (CA). "ARCHISAFE" USES "ARCHISIG" Digital signatures in principle offer the possibility of ensuring the integrity and authenticity of digital data. In contrast to paper-based documents, the proof suitability of digitally signed documents can, however, decrease in the course of time. This is caused particularly by the fact that the applied cryptographic algorithms and cryptographic keys lose their security-eligibility in the course of time. Beside this, the directories and certificates necessary for checking the origin and assignment of the keys are not available (not retained) normally over time periods of more than 30 years. In a former project named ArchiSig (http://www.archisig.de) - promoted by the Federal Ministry of Economics and Technology - existing storage concepts and technologies were taken up and extended for supporting the safe long-term preservation (for 30 years or more) of digitally created and digitally signed data. Within this project, system architectures based on new technical components and organisational concepts were developed for ensuring the secure and economic renewal of digital signatures. 
The applicability of the concepts and of the technology was demonstrated by a prototype implementation at the University Clinic of Heidelberg and other health administration facilities, by comparative investigations in the public administration and judiciary of the federal state of Lower Saxony, and by evidence assessments based on fictitious court cases.

The concept underlying ArchiSig is based on the use of so-called hash trees (Fig. 4). How does such a tree work? The software application which generates and signs the electronic documents stores each document in the archive system, including the signature and all further information. The archive system calculates the hash value (or message digest) of every archive object (in the case of ArchiSafe, the entire XML record); this value is the digital "fingerprint" of the document. The individual hash values are then used to build a hash tree (Fig. 4), whose top is "sealed" with a digital time-stamp.
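The construction described above can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and a simple binary tree in which neighbouring hash values are concatenated and hashed again; the function names and the sample records are hypothetical, and the actual ArchiSig tree layout may differ. The time-stamp "seal" is not modelled; in this sketch the root value is simply what would be submitted to a time-stamp service.

```python
import hashlib


def leaf_hash(archive_object: bytes) -> bytes:
    """Digital 'fingerprint' of one archive object (e.g. an XML record)."""
    return hashlib.sha256(archive_object).digest()


def build_hash_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree: leaves first, root level last."""
    if not leaves:
        raise ValueError("at least one archive object is required")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        parents = []
        # Hash neighbouring nodes pairwise; a trailing single node is hashed alone.
        for i in range(0, len(current), 2):
            parents.append(hashlib.sha256(b"".join(current[i:i + 2])).digest())
        levels.append(parents)
    return levels


# Hypothetical archive objects, for illustration only.
records = [b"<record>A</record>", b"<record>B</record>", b"<record>C</record>"]
tree = build_hash_tree([leaf_hash(r) for r in records])
root = tree[-1][0]  # the value that would receive the time-stamp "seal"
```

Because every parent depends on its children, a change to any single record changes the root, so one time-stamp on the root covers all records in the tree.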


Such hash values have, above all, two properties which are important for this task:

1. A hash value is practically unambiguous, i.e. any change in the data from which the hash value is calculated leads, with overwhelming probability, to a different hash value. In other words, manipulations of the file do not go unnoticed.

2. The calculation of a hash value is a one-way function, i.e. the contents of the document or file cannot be inferred from the hash value. This matters primarily if, for legal reasons, individual documents have to be deleted. Individual documents can be deleted from the archive without diminishing the evidentiary value of the tree. As long as the corresponding hash value remains in the tree, the seal is still valid for all "branches" and "leaves" of the hash tree and thus guarantees the integrity of the remaining documents in the archive. Conversely, the contents of the deleted document cannot be inferred from the existence of its hash value.
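The deletion property in point 2 can be made concrete with a small sketch. It assumes SHA-256 and a simple pairwise tree; the document identifiers, contents and the `verify` helper are hypothetical. The point shown is only that the sealed root is computed from the leaf hash values, so it can still be reproduced after a document's contents have been removed.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def tree_root(leaf_hashes) -> bytes:
    """Fold the leaf hashes pairwise until a single root value remains."""
    level = list(leaf_hashes)
    while len(level) > 1:
        level = [sha256(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical archive content, for illustration only.
documents = {"doc-1": b"contract", "doc-2": b"invoice", "doc-3": b"medical report"}
leaf_hashes = {doc_id: sha256(content) for doc_id, content in documents.items()}
sealed_root = tree_root(leaf_hashes.values())  # value covered by the time-stamp

# Delete one document for legal reasons; its hash value stays in the tree.
del documents["doc-2"]


def verify(doc_id: str) -> bool:
    """A remaining document matches its leaf, and the tree matches the seal."""
    return (sha256(documents[doc_id]) == leaf_hashes[doc_id]
            and tree_root(leaf_hashes.values()) == sealed_root)
```

After the deletion, `verify("doc-1")` and `verify("doc-3")` still succeed, while the retained hash of "doc-2" reveals nothing about the deleted contents.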

Fig. 4: Project ArchiSig

But what about the fact that the hash values and "seals" are themselves based on "aging" algorithms? The advantage is that signature renewal, and with it the security risk for a large number of documents, is reduced to an easily manageable number of "seals" which lock the individual hash trees. As long as these "tree seals" are not broken, the individual hash values and thus also the documents are safe. If a signature renewal must be carried out according to Section 17 of the Signature Ordinance (SigV) because the security of the signature algorithms is decreasing, only the "seals" have to be replaced. The hash values in the "trees" are not affected, and an unnoticed manipulation of single documents can still be ruled out. The risk of having to renew a complete hash tree can be reduced by building two redundant trees with different hash functions. Even if one hash algorithm becomes unsafe, the second tree is still valid and there is ample time to recalculate the hash values of all documents in the first tree.

ARCHISAFE - TECHNICAL OVERVIEW

Viewed technically, the ArchiSafe concept and the solution realised at PTB in 2005 are based on a service-oriented, multi-tier and multi-client-capable software architecture (Fig. 5).

Fig. 5: ARCHISAFE architectural overview


A specialised software application (e.g. a document management system or a process-handling system) serves as the platform and leading system for document administration and case creation, and as the interface to the long-term storage system. The software application initiates the request for filing electronic documents in the electronic archive and manages an individual document identifier (the “wardrobe brand”, comparable to a cloakroom ticket) which the archive system generates for every document to be filed in the long-term storage system. After successful storage, the software application links the document identifier to the remaining case and process data used for the operative processing of a case.

The software application communicates with the actual long-term filing system through a single, uniform interface (the archive service) for submitting the objects to be archived (documents, records, files) to the long-term storage unit. This archive service, which acts as an archive hub, constitutes a uniform entry point for all data packages which are to be stored in a long-term storage unit. At the same time, it constitutes a uniform anchor point for the archive objects held in the long-term storage system. To retrieve an archived document, the leading software application simply submits the document identifier (“wardrobe brand”) in a request to the long-term storage system.

For the ArchiSafe concept it therefore does not matter whether the document management system is a product of OpenText, Fabasoft, SER or SAP, or whether the long-term storage unit is delivered by companies such as IBM, HP or EMC. The logical decoupling of the software application and the filing system by the archive service, together with the agreement on uniform and standardized archive data formats, makes it much easier both to connect new procedures (software applications) and to change the archive hardware.
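The decoupling described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names (`StorageBackend`, `InMemoryStorage`, `ArchiveService`, `submit`, `retrieve`) are hypothetical and not taken from the ArchiSafe specification, and an in-memory dictionary stands in for a real long-term storage unit.

```python
import uuid
from typing import Protocol


class StorageBackend(Protocol):
    """Anything that can hold archive objects; the vendor is irrelevant here."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Stand-in for a long-term storage unit (assumption for this sketch)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class ArchiveService:
    """Uniform entry point between software applications and storage."""

    def __init__(self, storage: StorageBackend) -> None:
        self._storage = storage

    def submit(self, xml_record: bytes) -> str:
        # The archive side generates the identifier ("wardrobe brand")
        # and hands it back to the leading software application.
        document_id = uuid.uuid4().hex
        self._storage.put(document_id, xml_record)
        return document_id

    def retrieve(self, document_id: str) -> bytes:
        return self._storage.get(document_id)


service = ArchiveService(InMemoryStorage())
ticket = service.submit(b"<record>case file</record>")
restored = service.retrieve(ticket)
```

Because the application only ever sees `submit` and `retrieve`, replacing `InMemoryStorage` with another backend would not require changes on the application side, which is the point of the archive hub.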
The generation of the standardized archive objects (XML records) and the communication with the archive service are carried out in technical service interfaces (service adapters/service facades). The archive service itself, which imports an archive package into the long-term filing system, is realized as a system-independent middleware component (Fig. 5) and acts, in effect, as an archive hub. The archive service examines and processes the archive objects which are submitted by the software application and which are to be stored in the long-term filing unit. These archive objects are based on standardized data types and data formats as well as on syntactical and semantic specifications for the data structures of the objects to be filed in long-term storage, all defined in a valid XML schema. In addition, the archive service may, if necessary, request cryptographic functions such as digital signatures, evaluations of digital certificates and digital time-stamps.

The archive service mainly consists of an XML processor with defined interfaces (i.e. communication channels) to both the software application and the long-term storage unit. Moreover, the archive service is intended to allow the connection of additional services such as signature services (for producing and evaluating digital signatures) and a digital time-stamp service. At the request of the leading software application, these cryptographic services sign the documents to be filed in long-term storage or seal them with a digital time-stamp. They can also verify the signatures and signature certificates of signed documents, again at the request of the software application. After obtaining the verification results, the archive service embeds the verification data in a standardized format in the archive objects for later use as proof. Within the scope of the ArchiSafe project, the cryptographic services are primarily realized by the core system of a basic cryptographic component of the German E-Government Initiative.

The final storage in the electronic long-term filing system can optionally be combined with a digital time-stamp for the documents or archive records. In addition, ArchiSafe uses the time-stamp service for renewing digital signatures (Section 17 SigV) according to the rules and results of the ArchiSig project, which are classified as legally compliant and approved (http://www.archisig.de). The actual long-term storage system is situated in the back-end and saves the documents together with the accompanying administrative and technical metadata, packaged in a valid XML record. In parallel, copies of the documents and metadata may be held for further processing by the leading software application.
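To give a rough idea of what such an archive object could look like, the following sketch packages content, metadata and verification results into one XML record using Python's standard library. All element and attribute names (`archiveRecord`, `metadata`, `content`, `verificationData`) and the sample values are assumptions made for illustration; the real ArchiSafe XML schema is not reproduced here, and the sketch only produces well-formed XML without validating it against a schema.

```python
import base64
import xml.etree.ElementTree as ET


def build_archive_record(content: bytes,
                         metadata: dict[str, str],
                         verification: dict[str, str]) -> bytes:
    """Package content, metadata and verification data into one XML record."""
    record = ET.Element("archiveRecord")

    meta = ET.SubElement(record, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, "entry", name=name).text = value

    # Binary content is base64-encoded so that it can be carried inside XML.
    payload = ET.SubElement(record, "content", encoding="base64")
    payload.text = base64.b64encode(content).decode("ascii")

    # Results returned by the signature service, kept for later proof.
    proof = ET.SubElement(record, "verificationData")
    for name, value in verification.items():
        ET.SubElement(proof, "result", name=name).text = value

    return ET.tostring(record, encoding="utf-8")


xml_record = build_archive_record(
    b"%PDF-1.4 sample bytes",
    {"title": "Sample decision", "creator": "Department X"},
    {"signature": "valid", "certificate": "valid"},
)
parsed = ET.fromstring(xml_record)
```

The hash value used in the hash tree would then be calculated over the complete record, so that content, metadata and verification data are protected together.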
By using a unique document identifier (document ID or so-called "wardrobe brand"), the long-term storage system ensures that the stored "original" documents can be retrieved at any time by the processing software application. This procedure allows secure and legally compliant storage of digital documents without burdening the long-term storage system with any process-specific logic. To realize a multi-client-capable solution, the "wardrobe brand" can additionally be linked to a unique identifier of the leading software application. In this way, the authorisation concept of the archive service middleware can reliably prevent inadmissible access to the archived data. For searching and presenting the archived data independently of the leading software application, a search and presentation service can additionally be implemented in the archive hub on request, especially if a multi-client-capable solution is the goal. Such a service would allow the archived data to be reconstructed even if the leading software system fails.
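One simple way to picture the link between the "wardrobe brand" and the identifier of the leading software application is sketched below. The class `MultiClientArchive`, its methods and the client names are hypothetical, and the actual ArchiSafe authorisation concept is certainly more elaborate; the sketch only shows that a document identifier is honoured solely in combination with the client that filed the document.

```python
import uuid


class MultiClientArchive:
    """Archive hub that binds every document ID to the submitting client."""

    def __init__(self) -> None:
        # Key: (client identifier, document identifier) -> stored XML record
        self._records: dict[tuple[str, str], bytes] = {}

    def submit(self, client_id: str, xml_record: bytes) -> str:
        document_id = uuid.uuid4().hex
        self._records[(client_id, document_id)] = xml_record
        return document_id

    def retrieve(self, client_id: str, document_id: str) -> bytes:
        try:
            return self._records[(client_id, document_id)]
        except KeyError:
            # The same answer for "unknown ID" and "foreign ID" avoids
            # revealing which identifiers exist for other clients.
            raise PermissionError("no such document for this client") from None


archive = MultiClientArchive()
doc_id = archive.submit("dms-a", b"<record>file of client A</record>")
own_copy = archive.retrieve("dms-a", doc_id)
```

A request such as `archive.retrieve("dms-b", doc_id)` raises `PermissionError` in this sketch, because the identifier was issued to a different application.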


RESULTS AND OUTLOOK

The ArchiSafe project of the PTB (National Metrology Institute of Germany) supports and promotes the introduction of national standards for the legally compliant long-term storage of electronic documents. ArchiSafe has created a standardized data interchange format for the long-term storage of user data (content data, metadata and signature data) in a well-formed and valid XML record. Within the scope of the ArchiSafe project, representatives of more than 20 different federal and regional authorities, together with representatives of the BSI (Federal Office for Information Security), the Federal Network Agency and the KBSt (Coordinating and Advisory Agency of the Federal Government for Information Technology in Public Administration), have discussed, evaluated and published requirements and solutions for implementing a legally compliant long-term storage solution for electronic documents which also fulfils the requirements of the German Signature Law (for further details, see http://www.archisafe.de).

In summary, it is worth noting that ArchiSafe is not just a paper concept: in December 2005 it proved its functionality through the implementation and execution of the published software reference architecture. At present, the ArchiSafe concept is attracting increasing interest from the federal states and from German industry, and will in future also be supported by reputable companies that are major players in information technology. Finally, the ArchiSafe project was initiated and provided with professional consulting by CSC experts within the framework of the German E-Government Initiative BundOnline 2005.


CONTACT PERSONS FOR THE PROJECT

Dir. u. Prof. Dr. Siegfried Hackel
Physikalisch-Technische Bundesanstalt (PTB)
Department Q.4 Information Technology
Bundesallee 100
38116 Braunschweig
Germany
Tel.: +49 (531) 592-8400
Fax.: +49 (531) 592-8406
Mailto: [email protected]

Dipl. Wirt.-Inform. Tobias Schaefer
ArchiSafe project leader
Working Group Q.43 Databases
Bundesallee 100
38116 Braunschweig
Germany
Tel.: +49 (531) 592-2456
Fax.: +49 (531) 592-692456
Mailto: [email protected]
