an exercise in preservation and applied technology making an electronic text


Upload: cameron-lang

Post on 13-Dec-2015


An exercise in preservation and applied technology

Making an Electronic Text

Published in 1871, with only 456 copies printed, this book is a collection of broadsides, ballads, and popular stories from Dickensian London.

Charles Hindley’s Curiosities of Street Literature

Using high-quality scanned images and OCR software, we have created text documents from the page images. Using XML, we are then able to mark up the documents for display on the web. We are following a defined standard for electronic texts: the TEI, or Text Encoding Initiative.

What we are doing

This standard was defined by the University of Oxford, Brown University, the University of Bergen, and the University of Virginia. The TEI Consortium formulated its guidelines to facilitate interchange between individuals and groups using different programs and computer systems over a broad range of applications.

Text Encoding Initiative

To make TEI-defined documents as accessible as possible, a cross-platform mark-up language was chosen. A mark-up language can be as simple as HTML (HyperText Mark-up Language), as complex as LaTeX, or as user-definable as XML (eXtensible Mark-up Language).

eXtensible Mark-up Language. Chosen by the TEI for its cross-platform, multi-application capabilities. The user defines the mark-up in XML: documents can be given custom tags and then searched based on those tags.

XML: Why it’s good for you
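
To illustrate searching by custom tags, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The tag names (collection, ballad, broadside, title, line) and the sample text are hypothetical, invented for illustration; they are not taken from the project's files or from the TEI guidelines.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the tags below are made up for illustration.
SAMPLE = """<collection>
  <ballad><title>The Example Ballad</title><line>First line of verse</line></ballad>
  <broadside><title>An Example Broadside</title><line>Opening line</line></broadside>
  <ballad><title>Another Ballad</title><line>Another first line</line></ballad>
</collection>"""


def titles_for(xml_text, tag):
    """Return the <title> text of every element with the given tag name."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree and yields only elements named `tag`.
    return [element.findtext("title") for element in root.iter(tag)]


if __name__ == "__main__":
    print(titles_for(SAMPLE, "ballad"))
    print(titles_for(SAMPLE, "broadside"))
```

Because the tags describe what each piece of text is, the same document can be queried by kind of item (all ballads, all broadsides) rather than by searching for words alone.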

Each scanned image is saved as a 40-megabyte uncompressed TIFF. Using OCR (optical character recognition) software, we are able to preserve the text.

The Images
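
As a rough guide to why uncompressed scans reach this size, the following Python sketch computes the raw pixel-data size of an image. The page dimensions, resolution, and colour depth used in the example are assumptions chosen for illustration; the presentation does not state the actual scan settings.

```python
def uncompressed_bytes(width_px, height_px, channels=3, bits_per_channel=8):
    """Raw pixel-data size of an uncompressed image, ignoring file headers."""
    return width_px * height_px * channels * bits_per_channel // 8


if __name__ == "__main__":
    # Assumed settings (not from the source): an 8.5 x 11 inch page
    # scanned at 400 dpi in 24-bit colour.
    width = int(8.5 * 400)
    height = 11 * 400
    size = uncompressed_bytes(width, height)
    print(f"{size} bytes, about {size / 2**20:.1f} MiB")
```

Under those assumed settings the result is in the same range as the 40-megabyte figure quoted above, which is why the recognised text is far more practical to store and search than the images themselves.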

Once the image has been OCR’ed, a text document is created. These text documents can then be marked up in XML. Mark-up can be done in software or manually.

The Text
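
The software route for mark-up can be sketched as follows. This is a minimal illustration, not the project's actual workflow: it assumes that blank lines in the OCR output separate paragraphs, and it uses the TEI element names TEI, text, body, head, and p while omitting the header and namespace that a valid TEI document requires. The function name ocr_text_to_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def ocr_text_to_xml(title, ocr_text):
    """Wrap plain OCR output in a small TEI-style tree.

    Assumption: blank lines in the OCR output mark paragraph breaks.
    The TEI header and namespace are left out to keep the sketch short.
    """
    tei = ET.Element("TEI")
    text = ET.SubElement(tei, "text")
    body = ET.SubElement(text, "body")
    head = ET.SubElement(body, "head")
    head.text = title
    for block in ocr_text.split("\n\n"):
        # Collapse line breaks and runs of spaces left over from the scan.
        paragraph = " ".join(block.split())
        if paragraph:
            p = ET.SubElement(body, "p")
            p.text = paragraph
    # ElementTree escapes characters such as & and < when serialising.
    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    sample = "line one\nline two\n\nsecond paragraph"
    print(ocr_text_to_xml("Sample page", sample))
```

Automatic mark-up of this kind only captures structure that can be inferred from layout; finer distinctions, such as identifying a ballad's verses or a printer's imprint, would still need manual tagging.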