
M-III: Manuscript Metadata Magic
MIMS Final Project • May 6th, 2016

Jordan Shedlock

Advisor: Robert Glushko


Executive Summary

The goal of the M-III project was to enhance the visibility and identity of the Robbins Collection, a “hidden collection” of works in legal scholarship at the Berkeley Law School, by creating an online interface that enables scholars to discover and explore its manuscript holdings; in other words, by reorganizing its description resources for new interactions. This was accomplished by processing the descriptions to increase granularity, designing a data model to represent them, creating a database to store the information, and building a Web application to view and explore the descriptions and relationships of the manuscripts. Although the existing records include rich descriptions of the manuscripts as physical and textual artifacts, they have been organized either as disparate catalogue records in the Law Library catalogue or as a linear collection of text-only records on the Robbins Collection website, which makes it difficult to browse them, investigate relationships and commonalities between them, or get a sense of the overall context of the collection. Moreover, the library catalogue records are largely narrative in form; to house them effectively in a database and increase discoverability, it was necessary to give them additional structure and move them further towards the transactional end of the Document Type Spectrum.

I accomplished this by studying the information components contained in the records and using regular expressions to extract them. I also identified information that was implicitly linked to the entities represented in the records, such as watermarks classified by C.-M. Briquet and geographic coordinates of cities where manuscripts originated, and made these relationships explicit by including links to external sources. During this process, I also considered how to model the manuscripts as bibliographic entities and as physical and textual items, and how to represent the other entities (people, places, organizations) in the records and their relationships with the manuscripts. This laid the foundation for a relational database that served as the back end of the final Web application.

This database, implemented in SQLite3, served as the storage tier of the application. The presentation tier consists of HTML pages created with Jinja2 templates, each one realizing a certain view of the collection (e.g. a single entity or a collection of entities sharing an attribute). The relationships between entities are shown through a graph visualization (undirected links between different types of entities). The logic tier consists of an application written in Python using the Flask framework that links those views to specific paths, makes specific database queries in response to requests from the user, and populates the templates accordingly.


I. Background

The Robbins Collection is a special research library at the UC Berkeley School of Law with a rich collection in religious and civil law, legal history, and comparative law, formed on the basis of a bequest by Lloyd M. Robbins, an alumnus who had a strong personal and professional interest in these topics. In addition to its extensive reference collection, its holdings include over a thousand early printed books and over 300 manuscripts, mostly early modern, as well as some from the medieval period. The Robbins Collection used to serve JD students studying civil and canon law; however, these courses are no longer a required part of the curriculum. Consequently, the Robbins Collection currently serves a dedicated but small audience of specialized legal scholars who have the requisite domain knowledge, language skills, and awareness of the collection.

This project began with meetings with the Robbins Collection staff in spring of 2015. During the summer of 2015, I worked at the Robbins Collection, familiarizing myself with its holdings and the existing catalogue records, and studying possible directions for the project. I also devoted substantial effort to the project in other courses: during the fall 2015 semester, I worked with the records in the I School’s Applied Natural Language Processing course, and created a “first draft” of the interface in the Web Architecture course. These courses introduced me to several of the tools used in the project and gave me practice with them, including regular expressions, the Flask framework, and the Google Maps and GeoNames application programming interfaces. In the spring 2016 semester, the Information Organization Laboratory course, which teaches full-stack development in the context of information organization concepts, was an excellent opportunity to learn and practice the web development and database modeling techniques used in this project. Although I School projects are typically undertaken by teams rather than individuals, I was able to execute a project of this scale by using the project components of these three courses to work on it.

II. Stakeholder Discussions and Scoping

In spring of 2015, I began discussions with the Robbins Collection about how I might put the skills and abilities taught at the School of Information to use in a project that would be beneficial to them. The Robbins staff felt that scholars and graduate students in other fields – medievalists, historians, art historians, linguists, artists, or anyone whose “dissertation has taken a left turn into law” – could find value in the collection’s rare books, but that these researchers were unaware of it, or did not realize that its holdings had relevance to fields other than law. They expressed a desire for a tool that would raise the profile of the Robbins Collection and give them something to show researchers who might be interested in using its holdings. We decided to focus on the manuscripts, since they are an attractive resource and (by definition) unique to the collection.

The Robbins staff identified access to information about materials as a major challenge to its goals. While access to the original manuscripts is necessarily restricted, scholars and librarians at Robbins have created very rich descriptions of the manuscripts as textual and physical objects. These descriptions currently reside in LawCat, the Law Library’s general catalogue, and in a text catalogue on the Robbins website. However, this
situation is less than ideal. In LawCat, the items are not differentiated in any way from the rest of the Law Library’s holdings, and although their affiliation and location at the Robbins Collection are specified, this does not promote the identity of Robbins as a distinct entity. Moreover, since many of them are not catalogued by subject, the few hundred Robbins manuscripts tend to vanish amid the hundreds of thousands of regular circulating items recorded in LawCat, except to those who are already aware of the collection and have an idea of the specific text that they want to find. Meanwhile, the text catalogue on the Robbins website affords only linear reading of manuscript descriptions; they are presented as undifferentiated text across several pages, with little semantic markup or search capability. In addition, the Robbins Collection website is itself somewhat dated, and the staff do not feel that it is an asset to the organization’s image (it is slated to be updated in the near future).

An historian of medieval art with whom I spoke about the project echoed the Robbins staff’s opinion that finding information about manuscripts is a major challenge for scholars. She emphasized that researchers studying manuscripts are interested in details that might seem trivial, but play a valuable role in understanding the manuscript in its physical and historical context. These include such details as script (scripts are closely identified with locations, time periods, and cultural and institutional contexts)1, support (the material used for the leaves of a manuscript, usually paper or parchment), and physical arrangement (how the manuscript is assembled from multiple sheets of material, which can shed light on a manuscript’s production and use), as well as features related to the composition of the text and images on the page, such as the number of lines the text is written on or the method used to create them.

Thus the goal of the project came to be the creation of an online application that would serve as a destination to which staff could point researchers interested in working with the collection; that would be clearly and uniquely affiliated with the organization; that would take advantage of Robbins’ existing resource descriptions while making them more granular; and that would enable new interactions and more effective discovery and exploration of manuscripts.

III. Starting Point

The focus of the project became the existing resource descriptions: just over 200 catalogue records, reflecting the majority of the Robbins Collection’s approximately 300 manuscripts (not all of the manuscripts have been catalogued yet). I spent a good portion of the summer learning about the format of the resource descriptions as they existed in LawCat, and eventually scraping them from the website (although a Berkeley Law Library administrator provided me with a database extract of all of the Robbins catalogue records, they were in a proprietary format, and scraping them gave me a better understanding of the data structure and greater confidence in their integrity). The records follow two cataloguing conventions. First, they are stored in MARC 21 format,2 a standard machine-readable library catalogue format that separates information about manuscripts into fields and delineates them with specific subfields and indicators. Second, the descriptions follow the guidelines laid out in Descriptive Cataloging of Ancient, Medieval, Renaissance, and Early Modern Manuscripts (AMREMM)3, which specifies the information to be included in descriptions of manuscripts. In addition to prescribing the form and content of manuscript descriptions, it also provides crosswalks for using MARC and other standards (intended for more uniformly produced, contemporary printed books) to reflect the unique physical and textual attributes of pre-modern or early modern manuscripts.

There was still a good distance between the format dictated by MARC and AMREMM and the consistent, highly granular information I would need to build a database for a web application. Although MARC records are by definition “machine readable,” with numbered fields corresponding to certain types of information, the amount of internal structure varies widely. Glushko and McGrath describe degree of internal structure in terms of the Document Type Spectrum, ranging from un- or loosely-structured “narrative” documents (e.g. novels or other natural language documents) to highly structured and specified “transactional” documents that can be easily processed by a computer or database.4

1 Michelle Brown. The British Library Guide to Writing and Scripts. London: British Library, 1998, pp. 78-87.
2 “MARC Standards.” http://www.loc.gov/marc/, accessed 5/4/2016.

The Document Type Spectrum (Glushko and McGrath).

Only one MARC field that I used, 008, is entirely transactional: it encodes date, geographic, format, language, and other information components in fixed-width fields using predefined codes. Most others fall around the middle right of the spectrum: fields are defined by numeric codes with pre-defined content, which is divided into coded subfields. However, the format of the subfields is not defined as strictly as the content, leaving some room for variation. For example, field 100, which denotes the person mainly responsible for the work represented in the record, may include subfields for dates, titles, numeration, and name variants. However, not all of these subfields are mandatory, and there is some flexibility in their use: for example, representation of dates is not standardized (they may be years of birth and death, or of professional activity, and may be approximate or absent), and there is some variation in use of punctuation (partly from the inevitable diversity of human names across time and cultures). Other fields are even less structured. AMREMM makes extensive use of multiple 500 (“general note”) fields to describe manuscripts, and these generally fall in the center, or even towards the narrative end, of the spectrum: they are written in natural language, although they follow a highly conventionalized structure and vocabulary. This made it relatively straightforward to identify discrete information components within the MARC fields and subfields and determine which ones to extract.

3 Gregory A. Pass, Descriptive Cataloging of Ancient, Medieval, Renaissance, and Early Modern Manuscripts (Chicago: Association of College and Research Libraries, 2003) (hereafter “AMREMM”). Available online in full at http://www.ala.org/acrl/sites/ala.org.acrl/files/content/publications/booksanddigitalresources/digital/AMREMM_full.pdf
4 Robert J. Glushko and Tim McGrath. 2005. Document Engineering: Analyzing and Designing Documents for Business Informatics and Web Services. Cambridge, MA: The MIT Press. Section 1.3.1.

Raw MARC 21 record for a Robbins Collection manuscript. Field 008 is transactional, fields 090-300 fall near the center-right of the Document Type Spectrum, while the 500 fields are in the center, edging towards the narrative end of the Spectrum.
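To illustrate why a fixed-width field such as 008 counts as fully transactional, the following minimal sketch slices an 008 string by character position. The positions follow the general MARC 21 bibliographic layout for the elements shared by all material types; the function name `parse_008` and the sample value are hypothetical and are not taken from a Robbins record or from the project code.

```python
def parse_008(field: str) -> dict:
    """Slice a 40-character MARC 21 field 008 into a few named components."""
    if len(field) != 40:
        raise ValueError(f"field 008 must be 40 characters, got {len(field)}")
    return {
        "date_entered": field[0:6],      # positions 00-05: date entered on file (yymmdd)
        "date_type": field[6],           # position 06: type of date
        "date1": field[7:11],            # positions 07-10: first date
        "date2": field[11:15],           # positions 11-14: second date
        "place": field[15:18].strip(),   # positions 15-17: place code
        "language": field[35:38],        # positions 35-37: language code
    }


# Hypothetical sample value, assembled piece by piece so the widths are visible.
sample = "060815" + "q" + "1475" + "1525" + "it " + " " * 17 + "lat" + " " + "d"
parsed = parse_008(sample)
print(parsed)
```

Because every component sits at a known offset, no pattern matching is needed here, in contrast to the 500 fields discussed above.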

IV. Data Modeling

The overarching concern at this point in the project was the representation of the manuscripts in the final application: what were the resources being organized? Which information components were properties of others, and which ones were best treated as entities in their own right? How could I preserve the rich detail of the original catalogue records in the new representation – which fields needed to be preserved, and which could be ignored? Moreover, my choice of a relational database (implemented in SQLite3) as my storage tier forced me to explicitly define the entities and relationships in the system.

The question of “manuscript as entity” illustrates this challenge. Each manuscript in the collection is represented by a single bibliographic record. However, four of the currently described manuscripts consist of two bound volumes which are treated as a single entity, but whose extents, arrangements, dimensions, and other properties may differ. Moreover, unlike contemporary books, the texts bound into a single manuscript are not necessarily a unified work:5 for example, Robbins MS 272 is assembled from three manuscripts of the same work – the Mishneh Torah – ranging over the 15th and 16th centuries, with some elements possibly dating back to the 14th century, and some as late as the 20th century.6 Because an attribute of an entity in a relational database may only have a single value, treating the physical properties of a manuscript as attributes of the manuscript entity in the database would exclude manuscripts with multiple volumes. For this reason, I decided to separate the manuscripts into multiple entities: manuscript, which contains the dates, language, identifier, place of publication, and other attributes common to the manuscript as a bibliographic entity; volume, which encompasses physical characteristics (such as material, dimensions, arrangement, and extent) that can vary between bound volumes of a manuscript; and content_item, which reflects a single textual item listed in the “Contents” field (505), relating it to the manuscript, the volume in which it appears, and the specific folia (or pages) that it occupies.

5 Karl Christ, The Handbook of Medieval Library History. Metuchen, N.J. and London: The Scarecrow Press, 1984, p. 14.
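The split into manuscript, volume, and content_item can be sketched as three tables joined by foreign keys. This is only an illustration under stated assumptions: the column names, the sample rows, and the use of Python's built-in sqlite3 module (rather than the project's own schema script) are hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema: one manuscript has one or more volumes,
# and each content item points to its manuscript and (optionally) its volume.
SCHEMA = """
CREATE TABLE manuscript (
    id        INTEGER PRIMARY KEY,
    shelfmark TEXT NOT NULL UNIQUE,
    language  TEXT,
    date1     INTEGER,
    date2     INTEGER
);
CREATE TABLE volume (
    id         INTEGER PRIMARY KEY,
    ms_id      INTEGER NOT NULL REFERENCES manuscript(id),
    numeration INTEGER NOT NULL,
    support    TEXT,
    extent     TEXT
);
CREATE TABLE content_item (
    id        INTEGER PRIMARY KEY,
    ms_id     INTEGER NOT NULL REFERENCES manuscript(id),
    volume_id INTEGER REFERENCES volume(id),
    item_text TEXT NOT NULL,
    fol_start TEXT,
    fol_end   TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Invented example data: a single manuscript bound in two volumes
# whose physical properties differ.
conn.execute(
    "INSERT INTO manuscript (id, shelfmark, language) VALUES (1, 'Example MS 1', 'lat')"
)
conn.executemany(
    "INSERT INTO volume (ms_id, numeration, support, extent) VALUES (?, ?, ?, ?)",
    [(1, 1, "paper", "fol. ii + 100 + ii"), (1, 2, "parchment", "fol. i + 80 + i")],
)

volumes = conn.execute(
    "SELECT numeration, support FROM volume WHERE ms_id = 1 ORDER BY numeration"
).fetchall()
print(volumes)
```

Keeping support and extent on the volume row is what lets a two-volume manuscript carry two different values for the same physical property.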

The title of a manuscript is another example: while we typically think of
contemporary books as having a single, immutable title, manuscripts often have several, including the title of the main or sole work they contain, spelling variations, a uniform title that is more widely recognized, or no title (in which case the cataloguer typically assigns an appropriately descriptive title statement). Descriptions often include the “secundo folio,” the opening words of the second leaf, which were used to distinguish between manuscripts of a single work (since the opening words, or “incipit,” would be identical).7 To capture this information, it was necessary to include a separate title entity to reflect all of a manuscript’s titles and their relationship to the manuscript.

In the interest of representing as much information as possible about the manuscript as a physical artifact, one of the attributes that I drew out as a separate entity is the watermark – a symbol imprinted in paper by the manufacturer to serve as a trademark. Since watermarks can play a major role in establishing the date or authenticity of a document, they are included in manuscript descriptions. The Robbins Collection records identify watermarks, wherever possible, using the typology of Charles-Moïse Briquet’s Les filigranes,8 a comprehensive four-volume reference that identifies over 16,000 known watermarks, classifying them by their motifs and referring to known manuscripts where they have been observed. Since Briquet’s work has been made available in relatively structured form by the Austrian Academy of Sciences and the Paris Laboratory for Western Medieval Studies, I was able to link to their Briquet Online resource9 to make an explicit link between the watermark references in the Robbins Collection records and images of the watermarks. As watermarks are a useful and informative feature, and a single watermark can occur in many documents, I included a discrete watermark entity to store them in the database.

6 Saul Friedman and Jennifer K. Nelson. “[Mishneh Torah (ha-yad ha-hazaqah)].” Catalogue description in Berkeley Law Library catalogue, August 2006. http://lawcat.berkeley.edu/record=b511689 , retrieved 2 May 2016.
7 AMREMM, pp. 60, 148.
8 Charles-Moïse Briquet. Les filigranes : dictionnaire historique des marques du papier dès leur apparition vers 1282 jusqu'en 1600. Paris : A. Picard, 1907.
9 Briquet Online. Vienna-Paris, Österreichischen Akademie der Wissenschaften, Kommission für Schrift- und Buchwesen des Mittelalters and Laboratoire de Médiévistique Occidentale de Paris. http://www.ksbm.oeaw.ac.at/_scripts/php/BR.php , retrieved 2 May 2016.

Two watermarks classified by Briquet that appear in Robbins Collection manuscripts: 2752 “Boeuf” (appears in Robbins MS 301) and 12250 “Oiseau” (seen in Robbins MSS 153, 163, 164, 183, and 184). Images from Briquet Online.

Other entities were drawn from those reflected in the MARC records, and are fairly intuitive: people, places, and organizations. In some cases, these encompass multiple MARC fields: although 100 (main entry – personal name), 600 (subject added entry – personal name), and 700 (added entry – personal name) all represent a person and follow the same format, they represent different relationships between that person and the manuscript described. Since the relational database format already requires representing this relationship as a separate entity, storing the nature of the relationship as an attribute of this person_ms_assoc entity seemed justified.

The following figure shows the entity-relationship diagram of the database, which went through several iterations during the project. The diagram makes it clear that the manuscript entity is the central one: it is directly connected to all other entities (except those with which it shares an associative entity, such as has_place or has_wm), while connections between other entities are sparse. This clearly reflects its derivation from the MARC records, which in turn trace their lineage back to card catalogues. These records include very detailed assertions about the relationships between the resources they describe and other entities: for example, a relationship between a person and a bibliographic item can be characterized by any of 271 codes, ranging from author and publisher to binder, engineer, or censor.10 There are also extensive codes for contemporary and obsolete country and language names. However, the records do not form an ontology: they do not represent all relevant relationships between all entities.11

10 “MARC Code List for Relators: Term Sequence.” http://www.loc.gov/marc/relators/relaterm.html , accessed 5/5/2016.
11 Robert J. Glushko, ed. The Discipline of Organizing (3rd ed.). Sebastopol, CA: O’Reilly Media, 2015. 4.2.3, “Frameworks for Resource Descriptions,” 5.3.3, “Ontologies.”


So, while Thomas Wolsey, for example, was a highly influential figure in 16th-century England, the graph in this project representing his relationships is quite plain, his only connection being to Robbins MS 20, a collection of copies of diplomatic letters. His connections to other people (such as King Henry VIII and Thomas Boleyn) are only represented through a shared association with this manuscript. Although the hypertext approach of this project exposes some of these relationships, a fuller representation of the relationships would require deeper analysis of the records and, quite probably, the incorporation of information from other sources.

Graph visualizations from the M-III project for Thomas Wolsey (left) and Robbins MS 20 (right)


Preceding page: Entity-relationship diagram for the application database. The centrality of the manuscript entity is evident.

V. Data Extraction

Having determined the structure of the database, it was necessary to extract the information to populate it. Despite the aforementioned narrative tendency in the catalogue record fields, the conventional language of the descriptions made them good candidates for information extraction using regular expressions. A regular expression defines a pattern of characters in a text string, making it possible to find recurring patterns in text without having to match specific characters. This is invaluable for extracting structured data, and over the course of the project, I wrote over thirty regular expressions (using the Python re module) to find specific patterns in the various fields, including number of lines, ruling, script, arrangement, register of quires, and other characteristic properties. The table below shows some common patterns from the records, and how regular expressions can identify them:

Info component: Watermark (500 Collation)
RegEx: Briquet,? ([\'\"]?[\S ]*?[,\'\"]*) ?([0-9-]+)[;:).]*
Examples: “Paper (similar but not identical to Briquet, "Oiseau" 12224)” ; “(watermark similar, but not identical to, Briquet "Lettre B," 8060-8063)”
Output: ('"Oiseau"', '12224') ; ('"Lettre B,"', '8060-8063')

Info component: Register of quires (500 Collation)
RegEx: ([0-9-]+ ?[⁶⁷⁺⁸¹²⁰³⁵⁻⁴⁹]+)+ ?
Examples: Paper, fol. 26 ; unbound quires, 1-6⁴ 7².
Output: 1-6⁴ 7²

Info component: Script (500 Script)
RegEx: in a?n? ?([\S ]+?)(script|hand)[s,.;]?
Examples: Written in a gothic libraria script by one hand;
Output: gothic libraria

Info component: Physical arrangement (500 Collation)
RegEx: [Ff]ol. ([0-9xvi])* ?(\(.*?\) )?\+? ?[0-9]+ ?(\(.*?\) ?)?\+? ?([0-9xvi])* ?(\(.*?\))?
Examples: Paper, fol. iii + 212 + i ;
Output: fol. iii + 212 + i

Info component: Dimensions of written area (300, subfield $c)
RegEx: \( ?([0-9]*)[x ]+([0-9]* ?)\) ([A-Za-z]{2})
Examples: |c263 x 195 (208 x 129) mm bound to 266 x 205 mm
Output: ('208', '129', 'mm')
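As a minimal sketch of how two of these patterns behave with the Python re module, the snippet below applies the watermark and written-area expressions to the example strings from the table. The function names are hypothetical, and the written-area pattern is written with escaped parentheses, which is an assumption about the intended pattern rather than a confirmed transcription of the project code.

```python
import re

# Watermark reference in a 500 (Collation) note: motif name and Briquet number.
WATERMARK_RE = re.compile(r"""Briquet,? ([\'\"]?[\S ]*?[,\'\"]*) ?([0-9-]+)[;:).]*""")

# Written area in field 300 $c: the parenthesized "height x width" and its unit.
# The escaped parentheses are an assumption about the intended pattern.
WRITTEN_AREA_RE = re.compile(r"\( ?([0-9]*)[x ]+([0-9]* ?)\) ([A-Za-z]{2})")


def extract_watermark(note):
    """Return (motif, number) for the first Briquet reference, or None."""
    match = WATERMARK_RE.search(note)
    return match.groups() if match else None


def extract_written_area(subfield_c):
    """Return (height, width, unit) of the written area, or None."""
    match = WRITTEN_AREA_RE.search(subfield_c)
    return match.groups() if match else None


print(extract_watermark('Paper (similar but not identical to Briquet, "Oiseau" 12224)'))
print(extract_watermark('(watermark similar, but not identical to, Briquet "Lettre B," 8060-8063)'))
print(extract_written_area("|c263 x 195 (208 x 129) mm bound to 266 x 205 mm"))
```

Returning None for a non-matching note, instead of raising, keeps the caller free to decide whether a missing component is an error or simply absent from that record.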

After determining that I could extract the desired information and fine-tuning the regular expressions to do so, I needed to assemble a pipeline to run each catalogue record through the regular expressions and create a data structure, from which I could load the data into the database. This turned out to be a relatively complex task.

The first challenge was the diversity of structures and elements within the records and individual fields. I had to determine which fields were “children” of the manuscript as a whole and which ones were destined for the volume entity, in order to store them in the appropriate structure; this also required writing code to identify markers of volume association within certain fields. Some records, in addition to describing the original manuscript, also included an additional field 300 with descriptions of microfilm copies; since this necessitated an additional “materials specified” subfield, which changed the location of the “support” element within the field, it introduced additional complexity into the function designed to extract the materials. In order to account for as many variations as possible, the code resembled less a pipeline than a tree with branches forking out in every direction, and much trial and error was required to make sure that the necessary data was extracted from every record, regardless of the path it took through the program. It was also necessary to iterate over all of the MARC fields in order, to ensure that prerequisite structures and data were in place when they were called for.

Data integrity challenges became apparent on the other end as well, when I took the output dictionary produced by the pipeline and attempted to put it into the database. There were many errors due to missing or improperly-typed data, and it was necessary to go back to the pipeline, identify errors, and implement validation code to ensure that there were no attempts to place missing variables into the database. As the project progressed and I added more entities and fields to the database, I found it easier to focus on implementing a single entity across the system – first creating the entity in the database schema, then writing the code in the pipeline, and finally writing the code to load it into the database – rather than making numerous additions to the pipeline, then trying to “catch” them all on the other end.

Another function of the pipeline was to retrieve information from additional online sources in order to enhance the presentation and enable further interactions. First, upon encountering a reference to a Briquet-catalogued watermark, the pipeline script queries the aforementioned Briquet Online website to retrieve the name of the motif (which is not always reliably captured by the regular expressions) and a URL for the Briquet Online page for that watermark. Second, after extracting a place name from a record, the script retrieves the associated latitude and longitude from the GeoNames Web service, storing it in order to display that location on a map.
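The validation step mentioned above, which checks the pipeline's output dictionary before anything is written to the database, might look roughly like the following. This is a sketch under assumptions: the field names, the required types, and the function name `validate_record` are hypothetical and do not come from the project code.

```python
# Hypothetical required fields and the Python type each one should have.
REQUIRED_FIELDS = {"shelfmark": str, "language": str, "date1": int}


def validate_record(record):
    """Return a list of problems; an empty list means the record can be loaded."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        value = record.get(name)
        if value is None or value == "":
            problems.append(f"missing {name}")
        elif not isinstance(value, expected_type):
            problems.append(
                f"{name} should be {expected_type.__name__}, got {type(value).__name__}"
            )
    return problems


# Invented example records: one complete, one with a gap and a mistyped date.
good = {"shelfmark": "Example MS 1", "language": "lat", "date1": 1475}
bad = {"shelfmark": "Example MS 2", "date1": "1475"}

print(validate_record(good))
print(validate_record(bad))
```

Collecting every problem for a record, rather than stopping at the first, makes it easier to trace a failed load back to the specific extraction function that produced the gap.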


Architectural diagram of the project

VI. Database and Application Implementation – Storage and Logic Tiers

The application was also built primarily with Python-language tools, which allowed me to integrate the regular expressions and pipeline with relative ease and no need to switch languages. The main application was written in Flask, a web development framework that handles server functions. Flask allows the designer to specify “routes” that map a URL to a function in the application. A route generally corresponds to a specific page, featuring a certain subset of data. For example, the list_mss view retrieves data on all manuscripts (from the manuscript table of the database) and sends them to a page for display, while the ms_view view retrieves information on one manuscript, as well as its associated volumes, places, people, content items, and other entities. The exception is a route which returns a JSON object instead of an HTML page; this is used to populate the d3.js visualizations on the application’s front end. This information retrieval takes place through SQLAlchemy, an object-relational mapper that bridges the gap between the object-oriented Python language and the relational logic of the database by abstracting communication between them and allowing database entities and queries to be treated as Python objects. The database schema is initially defined using a Python script, and SQLAlchemy creates the database binary file and handles subsequent communication with it.

VII. Interface and Information Architecture – Presentation Tier

The end goal of the data extraction and organization is, of course, to enable new interactions with the resources, allowing users to explore the manuscripts and gain a greater understanding of them in their social, geographical, and collection-level context. To this end, the application presents the entities represented in the collection on a website with a variety of views, organized by specific features.

The site is intended to function as a subsite of the existing Robbins Collection website, affording the user the ability to go from the Robbins site to active exploration of the manuscript records. Upon visiting the project, the user sees a home page containing a map (focused on the Mediterranean) with circles indicating the countries of origin of the Robbins Collection manuscripts. The size of each circle is proportional to the number of manuscripts originating in that country. This map illustrates the breadth of the collection: while most manuscripts do originate in Italy, France, or Spain, the map shows others originating in Central Europe and the Middle East. Users can click on the circles to view a list of manuscripts originating in that country.

Moving ahead, each manuscript has its own page, which largely duplicates the content of the original catalogue records. However, these pages also include hyperlinks to people, locations, organizations, and watermarks, which now have their own pages as well. By clicking on any of these entities, the user can not only view more information about them (such as a map showing a location, the dates of a person involved with the manuscript, or an illustration of a watermark) but also follow a path of hyperlinks to other manuscripts, people, or locations associated with them. The user can also view list pages for each type of entity, focused on a particular facet (e.g. manuscripts from a particular country, or those associated with a person or organization).

The graph visualization included on individual entity pages simultaneously reinforces the sense of the Robbins Collection’s manuscripts as an interconnected network, and allows further exploration. It shows the subject entity as the central node of a graph, connected to other entities associated with it by edges.12

12 The use of graph visualizations to represent knowledge networks is well established in the digital humanities; see, e.g., INKE’s Glass Cast visualization tool: http://inke.ca/projects/tools-‐and-‐prototypes/
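The graph data behind such a visualization can be sketched as a small function that turns association rows into a node-and-link structure of the kind commonly fed to d3.js force layouts. Everything here is illustrative: the table contents, the association types, the `graph_for_manuscript` name, and the exact JSON keys are assumptions, and the sketch uses Python's built-in sqlite3 module rather than the Flask and SQLAlchemy stack the application itself uses.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manuscript (id INTEGER PRIMARY KEY, shelfmark TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person_ms_assoc (
    person_id  INTEGER REFERENCES person(id),
    ms_id      INTEGER REFERENCES manuscript(id),
    assoc_type TEXT
);
-- Invented example rows, not taken from the Robbins records.
INSERT INTO manuscript VALUES (1, 'Example MS 1');
INSERT INTO person VALUES (1, 'Person A'), (2, 'Person B');
INSERT INTO person_ms_assoc VALUES (1, 1, 'author'), (2, 1, 'subject');
""")


def graph_for_manuscript(conn, ms_id):
    """Build a nodes/links dictionary with the manuscript as the central node."""
    shelfmark = conn.execute(
        "SELECT shelfmark FROM manuscript WHERE id = ?", (ms_id,)
    ).fetchone()[0]
    center = f"ms-{ms_id}"
    nodes = [{"id": center, "label": shelfmark, "type": "manuscript"}]
    links = []
    rows = conn.execute(
        "SELECT p.id, p.name, a.assoc_type "
        "FROM person p JOIN person_ms_assoc a ON a.person_id = p.id "
        "WHERE a.ms_id = ? ORDER BY p.id",
        (ms_id,),
    )
    for person_id, name, assoc_type in rows:
        node_id = f"person-{person_id}"
        nodes.append({"id": node_id, "label": name, "type": "person"})
        links.append({"source": center, "target": node_id, "relation": assoc_type})
    return {"nodes": nodes, "links": links}


graph = graph_for_manuscript(conn, 1)
print(json.dumps(graph, indent=2))
```

Because the two people are linked only through the manuscript node, the sketch also reproduces the limitation noted earlier: person-to-person relationships appear only as a shared association.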


VIII. Conclusion

I began this project with a set of static, linear resource descriptions for a hidden collection of manuscripts that were embedded in a larger library catalogue, analyzed their structure and information components, and extracted them. On their basis, I built a new organizing system – in the form of a Web application – to store them in a database and allow users to access and explore them through a free-standing Web site that emphasizes their uniqueness and the identity of the collection, allows easy exploration of the relationships between them, and makes it easier to comprehend their broader context.

However, this project, in more ways than one, is a starting point. First, the entities, features, and relationships represented in this project only scratch the surface of those reflected in the Robbins Collection; future directions might include not only using natural language processing to discover and make explicit further relationships, but also digitization, transcription, and annotation of the manuscripts. Second, while I hope that this project serves as a tool to bring new attention to the Robbins Collection, it can never replace the study of the manuscripts themselves as physical and textual items, nor would it have been possible without the painstaking efforts of the librarians and scholars who described the manuscripts.

Acknowledgments

My sincere thanks go to Andrea Quinn and Jennifer Nelson for welcoming me to the Robbins Collection and providing me with an exciting project opportunity and invaluable help and support over the past year. I would also like to express my gratitude to Robbins Collection Director Laurent Mayali, Law Library Director Kathleen Vanden Heuvel, and Dean Sujit Choudhry for being open to this project and providing the generous support that made it possible. This project never could have happened without the energy and unflagging optimism of Robert Glushko, who identified this opportunity, removed numerous obstacles, and provided guidance and encouragement along the way. It’s been a privilege and a pleasure to work with my classmates, who make the I School a friendly and fascinating place. Finally, I’d like to thank my wife, Chloë, without whom I’d never have come to the I School in the first place, much less made it through.


Works Cited

Briquet, Charles-Moïse. Les filigranes : dictionnaire historique des marques du papier dès leur apparition vers 1282 jusqu'en 1600. Paris : A. Picard, 1907.
Briquet Online. Vienna-Paris, Österreichischen Akademie der Wissenschaften, Kommission für Schrift- und Buchwesen des Mittelalters and Laboratoire de Médiévistique Occidentale de Paris. 7/14/2009. http://www.ksbm.oeaw.ac.at/_scripts/php/BR.php , retrieved 2 May 2016.
Brown, Michelle. The British Library Guide to Writing and Scripts. London: British Library, 1998.
Christ, Karl. The Handbook of Medieval Library History. Metuchen, N.J. and London: The Scarecrow Press, 1984.
Glushko, Robert J., and Tim McGrath. Document Engineering: Analyzing and Designing Documents for Business Informatics and Web Services. Cambridge, MA: The MIT Press, 2005.
Glushko, Robert J., ed. The Discipline of Organizing (3rd ed.). Sebastopol, CA: O’Reilly Media, 2015.
Library of Congress. “MARC Code List for Relators: Term Sequence.” 10/21/2014. http://www.loc.gov/marc/relators/relaterm.html , accessed 5/5/2016.
Library of Congress. “MARC Standards.” 3/8/2016. http://www.loc.gov/marc/, accessed 5/4/2016.
Nelson, Brent, and Melissa Terras. Digitizing Medieval and Early Modern Material Culture. Tempe, Arizona : ACMRS (Arizona Center for Medieval and Renaissance Studies), 2012.
Pass, Gregory A. Descriptive Cataloging of Ancient, Medieval, Renaissance, and Early Modern Manuscripts. Chicago: Association of College and Research Libraries, 2003.