
Computers and the Humanities 33: 39–47, 1999. © 1999 Kluwer Academic Publishers. Printed in the Netherlands.


Putting Our Headers Together: A Report on the TEI Header Meeting 12 September 1997¹

LOU BURNARD and MICHAEL POPHAM

Computing Services, Oxford University, 13 Banbury Road, Oxford OX2 6NN, UK

Abstract. The TEI Header plays a vital role in the documentation and interchange of TEI conformant electronic texts. Moreover, this role is becoming increasingly important as more people follow the recommendations set out in TEI P3, and libraries, archives, and electronic text centres seek to share their holdings of electronic texts. However, the fact that TEI P3 allows for flexibility in the structure and content of TEI Headers has meant that divergent practices have begun to emerge within the numerous projects and initiatives creating TEI texts. With this in mind, the Oxford Text Archive hosted a one-day colloquium of leading TEI exponents, at which invited participants were encouraged to share their views and expertise on creating TEI Headers, and work together to develop some recommendations towards good practice.

Key words: TEI, TEI Headers, metadata, encoding standards, cataloguing, good practice

1. Introduction

An increasing number of electronic text centres, libraries, and archives from around the world are deciding to follow the principles and practices outlined in TEI P3 (Sperberg-McQueen and Burnard, 1994), and hence seeking to adopt the effective use of the TEI Header as a means of describing and documenting electronic textual resources. Metadata of the kind described by the Header has a vital part to play in information management and retrieval – yet with respect to the format and use of the Header, variant practices abound.

For existing and potential users, the flexibility offered by TEI P3 is one of its most attractive features. The Guidelines allow for widely divergent approaches to the basic issues of encoding electronic texts, and providing metadata in the form of TEI Headers. This is entirely appropriate for a general purpose scheme, and for individual scholars seeking only a scheme capable of expressing their (often complex) analytic needs. However, for implementers working within a common framework and with similar objectives, this generality and expressiveness impose an additional burden. Such implementers must identify a mutually acceptable code of practice for the application of the scheme to their needs, or compromise one of the very purposes for which the TEI scheme was designed, namely the mutual interchangeability of texts and their associated Headers.


This paper presents the results of an attempt to address this problem at source, by bringing together an initial core of expert TEI Header creators with the explicit aim of sharing their expertise and, if possible, co-ordinating their practice.

2. Background to the Meeting

Many of us who work in the expanding community of electronic text providers are well aware of the potential usefulness of TEI Headers. However, in order to integrate electronic text collections into other resources (for example, catalogues of conventional paper-based library holdings), it is often necessary to map selected information from TEI Headers onto some other well-established resource cataloguing standard (such as MARC), or to an emerging de facto standard such as the Dublin Core element set.

Although standards such as MARC are more familiar to existing library cataloguers, they lack the extensibility and flexibility that creators of TEI Headers regularly use to describe and document an electronic document, its source, and the process of its creation and revision. Similarly, although the Dublin Core may prove to offer a reasonable mechanism for describing many of the resources available on the internet, it lacks the formalism and descriptive power that is available to creators of TEI Headers. However, it was never the intention of this meeting to concern itself with the relative merits (or otherwise) of standards such as MARC and Dublin Core, except perhaps with regard to the ease with which it would be possible to record and extract the data they require from within a TEI Header. Where such metadata standards clearly do converge with the process of creating TEI Headers is in the area of the data themselves, e.g. the checking of an author’s name against a suitable authority file so that the data, when extracted from the relevant element of a TEI Header, would satisfy the requirements of a typical MARC record.

For many projects, a translation-based approach – dependent upon generating MARC records or Dublin Core metadata by (automatically) extracting appropriate information from TEI Headers – appears to be merely an interim solution, until it becomes clear how to interface directly, say, a database of TEI-conformant texts, with a Z39.50 compliant client/server system. In addition, there are also a growing number of cases where the goal is to integrate a number of TEI-conformant text collections, but this work is hampered by the fact that different practices have been adopted when creating valid TEI Headers.
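The translation-based approach can be illustrated with a minimal sketch. It assumes that the Header is available as well-formed XML (TEI P3 itself is an SGML application), and the sample Header, the `TEI_TO_DC` mapping table, and the function name `header_to_dublin_core` are all hypothetical choices made for illustration rather than an agreed crosswalk.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal TEI Header written as well-formed XML for illustration.
SAMPLE_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Beowulf: a machine-readable transcription</title>
      <author>Anonymous</author>
    </titleStmt>
    <publicationStmt>
      <distributor>Oxford Text Archive</distributor>
    </publicationStmt>
    <sourceDesc>
      <bibl>An imaginary printed edition</bibl>
    </sourceDesc>
  </fileDesc>
</teiHeader>
"""

# Assumed mapping from Header element paths to Dublin Core element names.
TEI_TO_DC = [
    ("fileDesc/titleStmt/title", "title"),
    ("fileDesc/titleStmt/author", "creator"),
    ("fileDesc/publicationStmt/publisher", "publisher"),
    ("fileDesc/publicationStmt/distributor", "publisher"),
    ("fileDesc/sourceDesc/bibl", "source"),
]


def header_to_dublin_core(header_xml):
    """Return a dict mapping Dublin Core names to lists of extracted values."""
    root = ET.fromstring(header_xml.strip())
    record = {}
    for path, dc_name in TEI_TO_DC:
        for element in root.findall(path):
            # Collapse internal whitespace in the element's text content.
            text = " ".join("".join(element.itertext()).split())
            if text:
                record.setdefault(dc_name, []).append(text)
    return record


if __name__ == "__main__":
    for name, values in header_to_dublin_core(SAMPLE_HEADER).items():
        for value in values:
            print(f"DC.{name}: {value}")
```

A table-driven mapping of this kind only works when the projects concerned place the same information in the same Header elements, which is precisely the difficulty that divergent practice creates.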

The Oxford Text Archive (OTA), established at Oxford University Computing Services in 1976, has one of the world’s largest collections of TEI-conformant electronic texts. Over the long life of the OTA, five different flavours of TEI-conformant Headers have been used to document electronic resources – to say nothing of the many texts in its holdings for which very little or no metadata of any kind is available. Following its recent appointment as a Service Provider for the UK-based national Arts and Humanities Data Service (AHDS), the OTA is now required to standardize its practice relating to the creation of TEI Headers as a means of integrating the OTA’s holdings with those of the four other AHDS Service Providers. (More detailed information on this topic is provided in the paper by Alan Morrison and Jakob Fix, which was also given at the TEI 10 conference and is included in this issue of Computers and the Humanities.) The Oxford Text Archive is also keen to strengthen its associations with other electronic text centres worldwide, not least to provide a better service for users of the AHDS, by agreeing to the mutual exchange of TEI-conformant texts for integration into our respective collections.

In light of the fact that the Oxford Text Archive was in the process of reviewing its own policies and practices with regard to the creation of TEI Headers, and with a mind to the forthcoming TEI Tenth Anniversary User Conference, we decided to invite representatives from a number of electronic text centres and text creation projects to a dedicated TEI Header meeting. The attendees were: Nick Finke (CETL, University of Cincinnati College of Law), Julia Flanders (WWP, Brown), Peter Flynn (CELT, University College Cork), Richard Gartner (Bodleian Library, Oxford), John Price-Wilkin (Humanities Text Initiative, Michigan), Laurent Romary (Silfide, Loria-CNRS), David Seaman (Electronic Text Center, Virginia), Michael Sperberg-McQueen (TEI, UIC), and Perry Willett (Indiana Library Electronic Text Resources).

3. Objectives

The main objective of the meeting was to make some progress towards facilitating the interchange of TEI Headers (as a minimum), between some of the major producers and distributors of scholarly, electronic, TEI-conformant texts. Prior to the meeting a number of steps by which such progress could be achieved were identified, and these are listed below:

• Agreement on what should constitute a minimal TEI Header which would be acceptable to all of the participants at the meeting.

• Agreement on what should constitute an optimal TEI Header, which would be the goal of all the participants at the meeting for the purposes of interchange and Header sharing (although clearly projects may wish to have much more sophisticated and varied Headers for their own internal use).

• Documenting the above agreements, possibly in the form of a Guide to Good Practice for creating TEI Headers which could be widely distributed throughout the academic community (via the Web, and possibly as a printed document produced under the auspices of the AHDS).

This paper presents a summary of the findings of the meeting, reporting on the degree of consensus achieved, and any major problem areas identified. Each of the participants was asked to provide the following:


• A sample TEI Header, taken from a typical example of the type of document created or used by the project concerned. This would enable us to get an impression of the range of TEI Headers favoured by the participants².

• A valid TEI Header for a sample document supplied by the convenors of the meeting (an imaginary electronic edition of Beowulf, containing digitized page images, edited texts from Klaeber, Mitchell, and Wrenn, and a modern diplomatic transcription by Prof. George Lapping). This would enable a more explicit comparison of the different approaches to TEI Header creation adopted by the participants.

• A completed feedback form relating to a comprehensive TEI Header, in which each element could be rated as “required”, “recommended”, “optional”, or “deprecated”, together with any more detailed recommendations as to the nature or scope of its data content and use.

We hoped that this approach would prove to be a simple but effective mechanism for gathering feedback from the participants regarding the relative merits and usefulness of the various elements available in the TEI Header. It would also provide a rudimentary indication of whether or not any consensus existed amongst the participants, and help to identify any elements in the TEI Header for which usage is widely divergent.
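As a rough illustration of how such feedback forms might be collated, the following sketch tallies the ratings given to each element and reports a consensus only where a large majority agree. The three forms, the 75% threshold, and the function names `tally_ratings` and `consensus` are invented for this example; they are not the ratings actually returned by participants nor the method used at the meeting.

```python
from collections import Counter

RATINGS = ("required", "recommended", "optional", "deprecated")

# Invented placeholder forms: one dict per (hypothetical) project,
# mapping a Header element name to the rating that project gave it.
EXAMPLE_FORMS = [
    {"titleStmt": "required", "notesStmt": "optional"},
    {"titleStmt": "required", "notesStmt": "deprecated"},
    {"titleStmt": "required", "notesStmt": "recommended"},
]


def tally_ratings(forms):
    """Count how many forms gave each rating to each element."""
    tallies = {}
    for form in forms:
        for element, rating in form.items():
            if rating not in RATINGS:
                raise ValueError(f"unknown rating {rating!r} for {element}")
            tallies.setdefault(element, Counter())[rating] += 1
    return tallies


def consensus(tally, threshold=0.75):
    """Return the majority rating if its share meets the threshold, else None."""
    rating, count = tally.most_common(1)[0]
    return rating if count / sum(tally.values()) >= threshold else None


if __name__ == "__main__":
    for element, tally in tally_ratings(EXAMPLE_FORMS).items():
        print(element, dict(tally), "->", consensus(tally) or "no consensus")
```

Elements for which `consensus` returns `None` would be the ones to flag for discussion, since they are the places where usage diverges.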

The one thing that this meeting did not set out to do was attempt a review of the TEI Header as a whole. Whilst there is every hope that the outcome of the meeting might prove useful to any future review of TEI P3, and to a reconsideration of the TEI Header in particular, the constitution of this group of invitees was not sufficiently broad to regard this as a possible objective.

4. Outcomes

In some respects, the worst possible outcome of this meeting would have been for no consensus to emerge. The expressive power and flexibility of the TEI Header to enable the encoding of an immense range of metadata within a controlled framework, is simultaneously both its greatest strength and greatest weakness – in so far as it provides users with an extremely powerful gun with which to shoot themselves in the foot. So it was pleasantly surprising to discover just how much consensus there was amongst the participants, with much of the discussion focusing on best practice with regard to the possible data content of particular Header elements, rather than on the use of those elements per se.

If the range and complexity of TEI Headers produced at the large (and growing) number of electronic text creation centres continues to expand, then this perhaps represents an inhibiting factor to the simple interchange of Header information. The ability of the TEI Header to encode metadata is beyond question, but the value of this information is somewhat diminished if it generally has to be thrown away in order to facilitate the interchange and sharing of electronic texts. For example, if the participants at the meeting had only identified a small set of common mandatory elements, this would facilitate the interchange of electronic texts at the expense of recipients losing a great deal of potentially useful metadata. Moreover, if the participants at the meeting were unable to agree upon sets of required, recommended, and optional TEI Header elements, this would appear to suggest that some sort of intervention is always likely to be required if one wishes to integrate two or more collections of electronic texts produced by different projects. Whilst this is not, of itself, a barrier to the interchange of electronic texts, it might be felt to be an additional discouragement because of the cost implications involved.

Fortunately, the participants at the meeting were able to reach a consensus, which should constitute valuable information for the rest of the TEI user community. A draft report on the meeting has already been circulated to all the attendees, and is available from the URL cited below (see note 3). It may be followed, if approved, by the proposed Guide to Good Practice. Of more long-lasting value, the community will have all the advantages made possible by the greater interchange of TEI Header information between electronic text creators and providers.

5. Summary of Recommendations³

There follows a summary of the recommendations produced by the meeting. For the most part, they should be regarded as refinements of, or extensions to, the recommendations already available in TEI P3 (see Chapter 5 of the Guidelines). Readers unfamiliar with creating TEI Headers should consult an authoritative source (such as TEI P3) for further information.

• File Description: Title Statement

Recommendations:

A phrase such as “a machine-readable transcription” should be supplied as a discrete <title type=gmd> where titles are analysed⁴. Where uniform titles (as found in standard library catalogues) are supplied, they should be identified as such – and if no such identification has been used, then no one should assume that what has been used in the header is the uniform title. If the available title information has not been analysed or amplified, then just use <title>.

• File Description: Responsibilities

Recommendations:

For known authors, the same content should appear in both the title statement and source description. As with titles, where a uniform naming system is available (e.g. an authority file), this should be used to provide content for such elements, and the names should be identified as coming from it. A closed but extensible set of “relator codes” (cf. USMARC) should be defined by a working party, and used to identify other kinds of responsibility (e.g. enc = encoder).


• File Description: Extent

Recommendations:

Do not duplicate information available from a file management system. An attempt should be made to list the metrics that have been found useful for particular application areas (e.g. file size, keystrokes, number of sentences etc.).

• File Description: Publication Statement

Recommendations:

Publication means “making available by any means”. There was a general feeling that free text (<p>) should appear in the publication statement only in the case of unpublished materials. In other cases, use <publisher>, <distributor>, or <authority>, and the relevant sub-elements, as appropriate.

• File Description: Series and Notes Statements

Recommendations:

When appropriate, <seriesStmt> should be used to indicate when the text is part of a specific series. The <notesStmt> should be used with caution, as it is not intended as a catch-all replacement for other tags.

• File Description: Source Description

Recommendations:

There was general agreement that the use of either <biblStruct> or <biblFull> is preferable to <bibl>, but they should not be mixed. Copy-specific bibliographic information should for the present be recorded using the <idNo> element, with a type attribute of “shelfmark” to identify the particular copy used directly or indirectly as the source, although the collection or repository name alone may be used in the case where that is the only information available.

• Encoding Description: Project Description

Recommendations:

This element is widely used, but without any consistency. Therefore, include any boilerplate text about the project, the purpose for which the text was created, and the process by which this was done.

• Encoding Description: Sampling Declaration

Recommendations:

There were no recommendations. For some representatives at the meeting the term “sampling” implied some kind of statistical procedure, whereas the decision as to which parts of a text should be included or excluded from an electronic transcription seemed more like editorial policy. In practice, this element has been used for documenting principles of both statistical and editorial sampling. It may be necessary to revise the TEI Header, such that <samplingDecl> becomes a sub-element of <editorialDecl>.

• Encoding Description: Editorial Declaration

Recommendations:

There were no recommendations, other than that this element should be used if at all possible, given the importance of the information it might potentially contain. TEI P3 offers a choice between structured (sub-)elements and free text (<p> tags). Free text is useful for preserving user-supplied documentation or for holding boilerplate text, but structured elements offer greater searchability.

• Encoding Description: Tagging Declaration

Recommendations:

When counting occurrences of tags, header elements should be excluded. <tagUsage> allows for simple integrity checking, and can assist information retrieval (e.g. by improving the performance of searches for specific types of element). Of the projects represented, only the WWP at Brown currently uses the <rendition> element to define the default processing of elements.

• Encoding Description: Reference Declaration

Recommendations:

There were no recommendations, as usage varied between the projects represented at the meeting. This element needs better documentation and software support in order to be useful.

• Encoding Description: Classification Declaration

Recommendations:

There were no recommendations. Many different classification schemes appear to be in use, and most of these are externally defined. Using the option to define a <taxonomy> in terms of an embedded <bibl> was generally preferred to using <catDesc>.

• Profile Description: Creation

Recommendations:

This should only be used to give the date and place of the creation of the intellectual content of the (source) work. For example, in the case of a book written between 1954 and 1956, but not published until 1973, the creation date should be recorded as 1956.

• Profile Description: Language Usage

Recommendations:

Within <langUsage>, a <language> element must be supplied for each value specified by a lang attribute on any element in the text. Its usage attribute “specifies the approximate percentage (by volume) of the text which uses this language”, and must be expressed as a whole number between 1 and 100. There is thus no way, short of a DTD modification, for any project to indicate simply which is the main language of a multi-lingual text, without quantification. Unless and until such a modification is made, the practice or conventions used should be clearly stated (e.g. languages listed in priority order).

• Profile Description: Writing System Declarations

Recommendations:

WSD attributes should be supplied on each <language> element where appropriate, and the WSDs indicated should be delivered together with the Header. None of the projects had (yet) found it necessary to use the WSD attribute, although it was generally believed to be A Good Thing. Relevant WSDs are not widely available, and this situation should be corrected. There are also widely-perceived usability problems with WSDs, and the TEI should consider how best to address this situation.

• Profile Description: Text Classification

Recommendations:

There were no recommendations. Of the various options available for text classification, <keywords> was frequently used to supply terms from pre-existing genre taxonomies (e.g. Library of Congress) – and in combination with <creation> date, use of this element makes it much easier to retrieve groups of text by genre and period. <particDesc> was also widely used, and the meeting decided that this element could serve to record detailed demographic information about the author of a text as distinct from the participants within it.

• Revision Description: (various elements)

Recommendations:

The use of <change> is preferred to <list> as a means of structuring the <revisionDesc>, and <change> elements should be listed in reverse chronological order (as specified in TEI P3). <resp> should be used to indicate the role of the person making the change (using the pre-defined but extensible set of relator codes mentioned above), and <item> should be used to indicate exactly what has been changed. There was no consensus on when to record a <change> in the <revisionDesc>, as each project had adopted its own understanding of what constituted a meaningful change. It was therefore decided that changes should be recorded “when it feels right to do so”.
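Several of the recommendations listed above lend themselves to simple mechanical checking. The sketch below is one possible approach, not a tool used by any of the projects: it assumes a document held as well-formed XML (rather than the SGML of TEI P3), ISO-style dates in every <change>, and a made-up sample document. The function names are likewise hypothetical. It counts tags outside the header, looks for lang values with no matching <language> element, checks that usage values are whole numbers between 1 and 100, and checks that <change> elements appear newest first.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document, written as well-formed XML for illustration only.
SAMPLE = """
<TEI.2>
  <teiHeader>
    <fileDesc><titleStmt><title>Sample text</title></titleStmt></fileDesc>
    <profileDesc>
      <langUsage>
        <language id="ang" usage="90">Old English</language>
        <language id="lat" usage="10">Latin</language>
      </langUsage>
    </profileDesc>
    <revisionDesc>
      <change><date>1997-09-12</date>
        <respStmt><name>A. N. Encoder</name><resp>enc</resp></respStmt>
        <item>Header revised</item></change>
      <change><date>1997-06-01</date>
        <respStmt><name>A. N. Encoder</name><resp>enc</resp></respStmt>
        <item>File created</item></change>
    </revisionDesc>
  </teiHeader>
  <text lang="ang">
    <body><p>Sample <foreign lang="lat">words</foreign> only.</p></body>
  </text>
</TEI.2>
"""


def count_text_tags(root):
    """Count element occurrences, excluding everything inside the header."""
    counts = Counter()
    for child in root:
        if child.tag != "teiHeader":
            counts.update(element.tag for element in child.iter())
    return counts


def undeclared_languages(root):
    """Return lang values that have no corresponding <language> element."""
    declared = {language.get("id") for language in root.iter("language")}
    used = {el.get("lang") for el in root.iter() if el.get("lang")}
    return used - declared


def bad_usage_values(root):
    """Return ids of <language> elements whose usage is not a whole number 1-100."""
    bad = []
    for language in root.iter("language"):
        usage = language.get("usage")
        if usage is not None and not (usage.isdigit() and 1 <= int(usage) <= 100):
            bad.append(language.get("id"))
    return bad


def changes_newest_first(root):
    """True if <change> dates (assumed ISO format) are in reverse chronological order."""
    dates = [change.findtext("date") for change in root.iter("change")]
    return dates == sorted(dates, reverse=True)


if __name__ == "__main__":
    document = ET.fromstring(SAMPLE.strip())
    print("tag counts:", dict(count_text_tags(document)))
    print("undeclared languages:", undeclared_languages(document))
    print("bad usage values:", bad_usage_values(document))
    print("changes newest first:", changes_newest_first(document))
```

Checks of this kind say nothing about whether a <change> was worth recording in the first place, which the meeting left to the judgement of each project.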


6. Future Plans

At the time of writing, any future work plan remains somewhat speculative. It is easy to envisage how an emerging consensus with regard to the creation and content of TEI Headers would facilitate the training of users in this hitherto rather difficult and specialized area (with the proviso, of course, that the TEI Header should not be seen to be limited to whatever was agreed at this meeting). Similarly, if a model interchange TEI Header can be agreed upon, one might reasonably expect to see the rapid development of simple SGML transformation tools or scripts to map between the interchange form, and the TEI Header structure favoured locally by any particular project. The definition of such a minimal Header (a Header-Lite?) would also greatly speed and simplify the creation of effective metadata packages for use by the next generation of XML-aware web browsers. In the immediate short term, we would expect also to see the rapid creation and take up of simple tools effecting translations between the minimal TEI Header and other metadata schemes (e.g. MARC and Dublin Core).

Notes

1 This report describes a one day meeting that took place at St Anne’s College, Oxford, immediately prior to the DRH’97 Conference. The recommendations given here are still under discussion, and may need to be revised in light of the contributors’ attempts to implement them within their own projects. A dedicated mailing list has been established for all those who attended the meeting, but interested readers are welcome to contact the authors if they wish to actively participate in the further discussion of any of the issues raised in this report. To contact the authors, email [email protected]

2 Unfortunately, within the confines of this article it is not possible to reproduce all the sample Headers supplied by participants at the meeting. However, the Header provided by the OTA can be found at http://ota.ahds.ac.uk/ota/header_meeting/samples/ota.html

3 A more complete report on the meeting, which includes an account of the discussions which preceded each recommendation, can be found under the “Publications, Presentations, and Workshops” section of the OTA’s new website (located at http://ota.ahds.ac.uk). This page also includes a link to the slides used for the presentation at the TEI 10 Conference.

4 The attribute value gmd stands for “general material designation”, a concept familiar to users of the Anglo-American Cataloguing Rules and used to describe a term indicating the broad class of material to which an item belongs (in this case, machine-readable transcriptions).

Reference

Sperberg-McQueen, C. M., and L. Burnard, Eds., Guidelines for Electronic Text Encoding and Interchange (TEI P3), Chicago, Oxford: Text Encoding Initiative, 1994.
