Transforming Library Metadata into Linked Library Data

Introduction and Review of Linked Data for the Library Community, 2003–2011

By Virginia Schilling

Introduction

Libraries and other cultural institutions are experiencing a time of huge, tumultuous change. Standards that have been in use for decades have come under increasing pressure to either adapt to new circumstances or to give way entirely to different standards. While it is clear that change is happening, what is less clear is where that change is taking us. If MARC and AACR2 no longer serve us, then what standards will serve? How can we adapt to fundamental differences in how our data is used without rendering decades of legacy data completely worthless? We stand in a moment of uncomfortable chaos. We must forge a new path, but where that path might lead, or even what it looks like, is still unclear.

While some do not believe, perhaps, that any sort of change is necessary at all, Coyle points out that library data, despite being saved and accessed via computers, is designed for the use and consumption of humans (2010b, 6). The 2008 report of The Library of Congress Working Group on the Future of Bibliographic Control observes that “people are not the only users of the data we produce in the name of bibliographic control... so too are machine applications that interact with those data in a variety of ways” (2). Unfortunately, as stated in the Library Linked Data Incubator Group Final Report, library data is not integrated with the Web, much of it is encoded in natural language rather than as data, library standards serve only the library community and no other, and changes in library technology are often completely dependent on the expertise of vendors (Baker, et al. 2011, sec. 3.1). One possible path forward is provided by the standards established by the World Wide Web Consortium (W3C) to build the Semantic Web. Linked data, in particular, is an implementation of these standards that seems to fit well with the legacy metadata produced and maintained by libraries and other cultural institutions.

This essay attempts to provide an overview of the current state of the discussion about linked data as well as provide a solid introduction for practitioners who wish to get involved in the conversation themselves. It draws from a range of sources published between 2003 and 2011 and is organized in five sections: An Overview of Library Metadata, An Overview of Linked Data, The Convergence of Linked Data and Library Metadata, Problems with Linked Data, and Looking to the Future of Linked Library Data.

An Overview of Library Metadata

This discussion starts with a look at our existing library standards and the types of metadata currently in use by libraries and other cultural institutions. Zeng and Qin (2008, 15) define four kinds of metadata standards used in the library profession. Standards for data:

structures like the Dublin Core Metadata Element Set (DCMES)
content like the Anglo-American Cataloging Rules, Second Edition (AACR2)
values like the Library of Congress Subject Headings (LCSH)
exchange like the MARC 21 Format for Bibliographic Data (MARC 21).

Caplan (2003) notes that data structure standards “will normally specify the metadata elements that are included in the scheme by giving each of them a name and a definition” (6). In addition to DCMES, Zeng and Qin discuss other commonly used data structure standards such as VRA Core (39–41), Categories for the Description of Works of Art (CDWA) (32–36), and Encoded Archival Description (EAD) (53–59).

Content standards “specify how values for metadata elements are selected and represented” (Caplan 2003, 6). Caplan traces a brief historical overview of library cataloging standards, starting with Panizzi and finishing with AACR2 and the International Standard Bibliographic Description (ISBD) specification developed by the International Federation of Library Associations and Institutions (IFLA) (54–55). Additionally, Zeng and Qin discuss Describing Archives: A Content Standard (DACS) (59) and Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images (CCO) (36).

Zeng and Qin note that data value standards “include controlled term lists, classification schemes, thesauri, authority files, and lists of subject headings” (15). In addition to LCSH, Caplan touches on the Art & Architecture Thesaurus (AAT) (25), and Zeng and Qin discuss the Thesaurus for Graphic Materials (TGM) and Iconclass (42), as well as the Getty Thesaurus of Geographic Names (TGN) (107–8).

Finally, the data exchange standards are the linchpin of modern library cataloging standards. These allow libraries to exchange metadata reliably and coherently. Standards like MARC 21 mean that a library sending data can be sure that the title element will always contain title information for the library receiving the data. Furrie (2009) describes how MARC accomplishes this. “The information from a catalog card cannot simply be typed into a computer to produce an automated catalog. The computer needs a means of interpreting the information found on a cataloging record. The MARC record contains a guide to its data, or little ‘signposts,’ before each piece of bibliographic information” (under Part II).
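The “signpost” idea can be illustrated with a short sketch. The Python below assumes a simplified, in-memory stand-in for a MARC record (a list of tagged fields with coded subfields); it is not the actual MARC 21 exchange encoding, and the sample data and the helper name subfield_values are hypothetical. The tags follow MARC 21 usage, where 100 is the personal-name main entry, 245 the title statement, and 650 a topical subject heading.

```python
# A simplified stand-in for a MARC record: each field carries a numeric tag
# (the "signpost") and a list of (subfield code, value) pairs.
# The record content is invented for illustration.
record = [
    ("100", [("a", "Author, Example.")]),
    ("245", [("a", "An example title /"), ("c", "Example Author.")]),
    ("650", [("a", "Metadata.")]),
]


def subfield_values(record, tag, code):
    """Return every value stored under the given tag and subfield code."""
    return [
        value
        for field_tag, subfields in record
        if field_tag == tag
        for sub_code, value in subfields
        if sub_code == code
    ]


# Because the tag identifies the kind of data, a receiving system can
# pull the title without guessing where it sits in the record.
title = subfield_values(record, "245", "a")
author = subfield_values(record, "100", "a")
```

A receiving library that trusts the tags can extract the title from any record built this way, which is the reliability that an exchange standard is meant to provide.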

“MARC... was developed by the Library of Congress (LC) in the mid-1960s, primarily to enable the computer production of catalog cards that could subsequently be distributed through the Cataloging Distribution Service” (Caplan 2003, 12). Zeng and Qin describe one difficulty of continuing to use MARC: integrating MARC and non-MARC data. “To integrate [Dublin Core] records into a MARC-based database, it is necessary to convert [Dublin Core] records into MARC records, and they must be stored, indexed, and exchanged with this format” (25). Coyle (2010b) discusses the use of MARC to exchange bibliographic records created using Resource Description and Access (RDA). “The MARC21 community is investigating to what extent RDA can be expressed in that existing format, but it seems clear that the full flexibility and extensibility of RDA goes beyond what can be done in a record format that is already experiencing difficulties in keeping up with needed changes” (32).

An alternative exchange format is Extensible Markup Language (XML). Caplan describes XML as a “subset” of Standard Generalized Markup Language (SGML) that is “easier to process” (19). SGML “was designed to handle variable-length textual data gracefully. An unlimited number of elements... can be defined, and their names can be descriptive of their contents” (18). She continues, “SGML is inherently hierarchical and can enforce the rules of hierarchy, making it a perfect medium for expressing the types of hierarchical relationships found within collections and among works, expressions, manifestations, and items” (18). Zeng and Qin note MARC’s “inflexible output process” as one of several of its limitations and state that “in contrast, XML-based processing can easily produce different output forms” (25). They also observe that EAD (53–54), Guidelines for ONline Information eXchange (ONIX) (69), and CDWA Lite (36–38) all use XML to exchange data between systems.
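The two properties attributed to XML here, descriptive element names with nested hierarchy and easy production of different output forms, can be sketched with Python's standard xml.etree.ElementTree module. The element names and the record content are invented for illustration and do not follow EAD, ONIX, or any other real schema; the function names brief_display and contents_display are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; element names are illustrative, not a real schema.
SOURCE = """
<record>
  <title>An Example Title</title>
  <creator>Author, Example</creator>
  <part>
    <title>Chapter One</title>
  </part>
  <part>
    <title>Chapter Two</title>
  </part>
</record>
"""

root = ET.fromstring(SOURCE)


def brief_display(rec):
    """One-line output: creator followed by the top-level title."""
    return f"{rec.findtext('creator')}: {rec.findtext('title')}"


def contents_display(rec):
    """Hierarchical output: the title, then each nested part indented."""
    lines = [rec.findtext("title")]
    lines += ["  " + part.findtext("title") for part in rec.findall("part")]
    return "\n".join(lines)
```

Both functions read the same source record; only the presentation logic differs, which is the flexibility Zeng and Qin contrast with MARC's output process.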

In addition to metadata standards, the metadata itself falls generally into three categories: descriptive, administrative and structural (NISO 2004, 1). “Traditional library cataloging viewed as metadata is primarily descriptive” (Caplan, 4). Zeng and Qin, however, note that digital resources are more complex and require more than traditional description.

The purposes of pre-Internet cataloging were twofold: (1) to provide rich bibliographic descriptions and relationships between and among data of heterogeneous items and (2) to facilitate sharing these bibliographic data across library boundaries. While AACR2 and MARC have done a meritorious job in accomplishing those purposes, they fall short on several important fronts in Internet-based resource descriptions, e.g., management of digital rights, preservation of digital objects, and evaluation of resources based on authenticity, user profile, and grade level. (6)

NISO defines administrative metadata as “information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it” (1). Administrative metadata can also be subdivided into technical metadata (file characteristics), rights management metadata (intellectual property rights), and preservation metadata (preservation of the resource) (Caplan, 5). Structural metadata “indicates how compound objects are put together” (NISO, 1). Caplan characterizes structural metadata as “required to record the relationships between physical files and pages, between pages and chapters, and between chapters and the book as a whole” (5).

These standards and metadata types work together, sometimes inextricably, to form the basis for modern library cataloging and the description of cultural objects. Hillmann and Westbrooks (2004) profile a variety of real-world implementations of metadata standards and types. In an introductory essay for a special issue of Cataloging and Classification Quarterly, Smiraglia (2005) provides a general introduction to metadata and, finally, Wolfe and Lubas (2004) offer a basic slideshow introduction to metadata.

An Overview of Linked Data

Understanding linked data starts with a basic review of the applicable W3C Semantic Web standards. Most importantly, the Resource Description Framework (RDF) provides a theoretical model for understanding relationships between “things.” Manola and Miller (2004) state,

the Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource. However, by generalizing the concept of a “Web resource,” RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web. (sec. 1)

While RDF underpins the whole Semantic Web ecosystem, a host of other technologies also contribute. Riley (2010) notes that while “libraries have ‘records’”, “RDF has ‘statements’ and ‘graphs’” (slides 4–5). “SPARQL [an RDF Query Language] can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware” (Prud'hommeaux and Seaborne 2008, under “Abstract”). RDF Schema (RDFS) “is a semantic extension... of RDF. It provides mechanisms for describing groups of related resources and the relationships between these resources” (Brickley and Guha 2004, sec. 1).
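The contrast between “records” and “statements” can be made concrete with a toy sketch. The Python below models each RDF statement as a (subject, predicate, object) tuple and a graph as a set of such tuples, then answers a question by pattern matching, loosely in the spirit of a single SPARQL triple pattern. It is not an RDF or SPARQL implementation. The resource URIs under example.org and the function name match are assumptions made for illustration; the predicate URIs are taken from the Dublin Core terms and FOAF vocabularies.

```python
# Each RDF statement is a (subject, predicate, object) triple; a set of
# statements forms a graph. URIs under example.org are hypothetical.
EX = "http://example.org/"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

graph = {
    (EX + "book/1", DCT + "title", "An Example Title"),
    (EX + "book/1", DCT + "creator", EX + "person/1"),
    (EX + "book/2", DCT + "creator", EX + "person/1"),
    (EX + "person/1", FOAF + "name", "Example Author"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern, sorted for stable output.

    None acts as a wildcard, loosely like a variable in a SPARQL
    triple pattern.
    """
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Which resources name person/1 as their creator?
works = [s for s, _, _ in match(graph, p=DCT + "creator", o=EX + "person/1")]
```

Unlike a record, no statement here depends on its neighbors: any triple can be added, removed, or merged with triples from another source without restructuring the rest.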

Web Ontology Language (OWL) “is designed to facilitate ontology development and sharing via the Web, with the ultimate goal of making Web content more accessible to machines” (W3C OWL Working Group 2009, sec. 1). Finally, Simple Knowledge Organization System (SKOS) “aims to provide a bridge between different communities of practice within the library and information sciences involved in the design and application of knowledge organization systems. In addition, SKOS aims to provide a bridge between these communities and the Semantic Web, by transferring existing models of knowledge organization to the Semantic Web technology context, and by providing a low-cost migration path for porting existing knowledge organization systems to RDF” (Miles and Bechhofer 2009, sec. 1.1).

Linked data is simply one practical application of these technologies to real-world data. Berners-Lee (2009) defines his four principles of linked data.

Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
Include links to other URIs so that they can discover more things. (under “Introduction”)

Heath and Bizer (2011) examine the principles of linked data in detail. Beginning with the first principle, they note that it advocates using URIs to identify not only Web pages, but also objects in the real world, as well as abstract concepts (sec. 2.1). The HTTP URIs required by the second principle allow anyone with a domain name to create URI references (sec. 2.2). The third principle indicates that when someone looks up an existing URI, it should retrieve a description of the resource to which it has been assigned (sec. 2.3). Finally, they discuss the importance of Berners-Lee’s fourth principle of linking to other URIs. They define three types of links (relationship links, identity links and vocabulary links) and state that “such external RDF links are fundamental for the Web of Data as they are the glue that connects data islands into a global, interconnected data space and as they enable applications to discover additional data sources in a follow-your-nose fashion” (sec. 2.5).
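The “follow-your-nose” behavior that the third and fourth principles make possible can be sketched without any network access. In the Python below, a dictionary stands in for the Web: looking up a URI returns a description, and any URIs inside that description are looked up in turn. All URIs, property names, and the function name follow_your_nose are hypothetical; a real client would dereference HTTP URIs and parse the RDF that comes back.

```python
# A dictionary stands in for the Web: looking up a URI "returns" a
# description as a list of (property, value) pairs. All data is invented.
WEB = {
    "http://example.org/book/1": [
        ("title", "An Example Title"),
        ("creator", "http://example.org/person/1"),
    ],
    "http://example.org/person/1": [
        ("name", "Example Author"),
        ("sameAs", "http://example.net/authority/42"),
    ],
    "http://example.net/authority/42": [
        ("label", "Author, Example"),
    ],
}


def follow_your_nose(start, lookup=WEB.get):
    """Look up a URI, then every URI found in its description, breadth first.

    Returns the URIs in the order they were visited.
    """
    seen, queue = [], [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        # An unknown URI yields no description, so nothing more to follow.
        for _, value in lookup(uri) or []:
            if value.startswith("http://") and value not in seen:
                queue.append(value)
    return seen


discovered = follow_your_nose("http://example.org/book/1")
```

Starting from a single book URI, the traversal reaches a description held on a different (hypothetical) domain, which is the sense in which links connect “data islands.”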

Heath and Bizer also present an argument for the adoption of linked data for data presentation on the Web. They state, “Linked data provides a more generic, more flexible publishing paradigm which makes it easier for data consumers to discover and integrate data from large numbers of data sources” and furthermore, they note specifically that linked data provides a unifying data model (RDF), a standardized data access mechanism (HTTP), hyperlink-based data discovery (URIs) and self-descriptive data (vocabulary links) (sec. 2.6).

Dodds and Davis (2011) provide “a pattern catalogue that covers a range of different areas from the design of web scale identifiers through to application development patterns” (under “Introduction”). In a series of five slideshow presentations, Sequeda offers an introduction to linked data (2011a), primers on creating (2011b), publishing (2011c) and consuming (2011d) linked data and finishes by reiterating that “Linked data is a set of best practices for publishing data on the Web” (2011e, slide 2). Raimond and Smethurst (2009) give a more technical introduction to linked data, noting that “everything that's good about the web comes from links” (slide 17). Finally, Linked Data: Connect distributed data across the Web (n.d.), which is on the linkeddata.org website, and the LinkedData wiki (2011) also provide links to additional materials relating to linked data and the Semantic Web.

The Convergence of Linked Data and Library Metadata

Heery (2004) discusses the similarities between linked data and traditional library metadata.

What is perhaps the most striking aspect of the Semantic Web for the library community is the commonality between traditional information management and library interests (constructing vocabularies, describing properties of resources, identifying resources, exchanging and aggregating metadata) and the concerns that are driving the development of Semantic Web technologies. (270)

The difference between them lies in making the implicit relationships found in traditional library metadata, which are obvious to humans, explicit for machine understanding as well. Baker et al. note that the first step is to take our human-understandable, controlled strings of text and link them to unique, persistent identifiers. “Library data cannot be used in a linked data environment without having Uniform Resource Identifiers (URIs) both for specific resources and for library-standard concepts” (sec. 4.3.2). The Library of Congress has begun this process with the development of RDF presentations of its various authorities and vocabularies. Subject headings, names, genre terms, country codes, languages and more have been given URIs (Authorities and Vocabularies n.d., under “Search Authorities & Vocabularies”). As discussed by Coyle (2010b, 26–36), work on defining the data structure elements and various value vocabularies described by RDA in the Open Metadata Registry (formerly the NSDL Registry) has also begun.
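The first step Baker et al. describe, linking controlled strings to persistent identifiers, might look something like the following sketch. The lookup table, the URIs under example.org, and the function names normalize and link_heading are all hypothetical; the sketch does not reproduce any actual Library of Congress identifier or service, and real reconciliation involves far more than string normalization.

```python
# Hypothetical authority table: normalized heading -> URI.
# The URIs are placeholders, not real authority identifiers.
AUTHORITY = {
    "lincoln, abraham, 1809-1865": "http://example.org/names/n0001",
    "metadata": "http://example.org/subjects/s0001",
}


def normalize(heading):
    """Lowercase, collapse whitespace, and drop trailing periods so that
    trivially different forms of a heading compare equal."""
    return " ".join(heading.lower().split()).rstrip(".")


def link_heading(heading):
    """Return the URI for a heading, or None when no match is found."""
    return AUTHORITY.get(normalize(heading))


uri = link_heading("Lincoln, Abraham, 1809-1865.")
```

Headings that return None are the ones that would need human review, which hints at why converting legacy data is described later in this essay as labor intensive.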

Currently the metadata that is generated by the library profession exists in databases that “are made available on the Web only as dynamically formatted pages in response to search requests” (Caplan, 48). Unfortunately “search engines are increasingly becoming the first gate that users approach when searching for any information” (Zeng and Qin, 233) and search engines do not index these “dynamically formatted” pages. Coyle (2010b) notes that “this means that library data cannot participate in the highly linked and linkable information environment on the Web, and this limits the visibility of libraries to Web users” (9). In other words, library metadata simply does not show up in the search results presented to users by search engines.

Baker et al. note, on the other hand, that “Linked data is... about enhancing the Web through the addition of structured data. This structured data... plays a role in the crawling and relevancy algorithms of search engines and social networks, and will provide a way for libraries to enhance their visibility through search engine optimization (SEO)” (sec. 2.1). Exposing library metadata as linked data would mean it could be crawled by the search engine bots and included in the search results presented to users along with articles from Wikipedia and the Internet Movie Database (IMDB).

Beyond the basic search results available from Web browser searches, Bourg (June 24, 2010) muses in a blog post that linked data has the potential to allow serendipity in online searching that does not currently exist. Users cannot search for unknown information or resources. “The scholarly value of serendipity usually comes up in discussions of browsing, and in calls for maintaining large, open, physical collections of library materials.” Coyle notes that “offline, we rely on a web of human connections to help us find information. Online, that web consists of links between resources and rich social interaction.... The library catalog, however, offers little beyond search and retrieve” (2010b, 9). With linked data, searching for the known can become browsing for serendipitous discovery of the unknown simply by following links from one data set to the next.

Linked data builds on the defining feature of the Web: browsable links (URIs) spanning a seamless information space. Just as the totality of Web pages and websites is available as a whole to users and applications, the totality of datasets using RDF and URIs presents itself as a global information graph that users and applications can seamlessly browse by resolving trails of URI links…. The value of linked data for library users derives from these basic navigation principles. Links between libraries and non-library services such as Wikipedia, Geonames, and Musicbrainz will connect local collections into the larger universe of information on the Web. (Baker et al., sec. 2.1)

Coyle (2010a) further explores the idea that current data in Web resources are implicitly linked by human understanding of them, but are not explicitly linked for machine understanding. “You can't … move easily from a statement in an essay about Abraham Lincoln to a list of books about Lincoln, much less a list of relevant books in your local library (let alone a list of resources that are on the shelf and currently available)” (12). By publishing such information as linked data, the explicit connections necessary for the computer will have been added.

The intersection of library metadata and linked data has also resulted in the emergence of domain-applicable tools and real-world implementation of library linked data. The Library Linked Data Community Wiki (2011) provides links to some resources including Linked Library Data events and glossaries of library and Semantic Web terminologies. Byrne and Goddard (2010) include an appendix of Semantic Web tools for libraries, including tools to convert various metadata schemes into RDF, tools for publishing RDF (not necessarily domain-specific), and several projects actively using linked data. Isaac et al. (2011) present an extensive description of library linked data resources divided into three categories: published datasets, value vocabularies (such as LCSH) and metadata element sets (such as RDA or Dublin Core). Vila Suero (2011) details use cases and case studies of library linked data clustered by bibliographic data, authority data, vocabulary alignment, archives and heterogeneous data, citations, digital objects, collections, and social and new uses.

Problems with Linked Data

The problems with implementing linked data at this point in time are myriad. Baker et al. note that the practical focus thus far has been on the generation of linked data ontologies and value vocabularies. “Many metadata element sets and value vocabularies have been published as linked data over the past few years” but “relatively few bibliographic datasets have been made available” (sec. 3.2). Bowen (2010) discusses the need for fundamental changes in how bibliographic data is stored in order to realize the true potential of linked data.

Libraries are tied to MARC-based systems that do not yet facilitate the creation of linked data. Without a body of library data converted to linked data, software developers have little incentive to create new applications that require it. And without a significant number of applications that take advantage of linked data, vendors of current systems have little incentive to implement linked data in a legacy environment. (56)

Bowen also touches on the need to develop tools just for the process of transitioning our existing legacy data to linked data. While describing some of the issues encountered in transitioning MARC data into the eXtensible Catalog (XC) schema, she concludes that “converting legacy metadata to linked data will require a team of experts, including MARC-based catalogers, specialists in other metadata schemas, software developers, and Semantic Web experts to design and test normalization/conversion algorithms, develop new schemas, and prepare individual records for automated conversion” (57). Finally, Guenther (2004) and Zeng and Qin (2008) both present arguments for using XML to transition out of MARC and into other standards. Zeng and Qin advocate the usage of MARCXML to transition MARC records into XML (24–27). Guenther notes that “XML allows for an easy path for converting existing records and flexibility in display and further transformations” (slide 38).
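The XML transition path can be sketched in a few lines. The Python below serializes a simplified, in-memory record into MARCXML-style markup using the standard xml.etree.ElementTree module. The element and attribute names (record, datafield, subfield, tag, ind1, ind2, code) follow MARCXML conventions, but the namespace declaration, leader, and control fields are omitted for brevity, so the output is an illustration rather than schema-valid MARCXML. The sample record and the function name to_marcxml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified record: (tag, indicator 1, indicator 2, [(code, value), ...]).
# The content is invented for illustration.
record = [
    ("100", "1", " ", [("a", "Author, Example.")]),
    ("245", "1", "0", [("a", "An example title /"), ("c", "Example Author.")]),
]


def to_marcxml(fields):
    """Build a MARCXML-style <record> element from tagged fields."""
    rec = ET.Element("record")
    for tag, ind1, ind2, subfields in fields:
        datafield = ET.SubElement(
            rec, "datafield", {"tag": tag, "ind1": ind1, "ind2": ind2}
        )
        for code, value in subfields:
            ET.SubElement(datafield, "subfield", {"code": code}).text = value
    return rec


xml_text = ET.tostring(to_marcxml(record), encoding="unicode")
```

Once the data is in XML, generic tools such as XSLT processors or the ElementTree calls used here can reshape it, which is the “flexibility in display and further transformations” that Guenther points to.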

Moving past the practical concerns, other conceptual problems include establishing the reliability of linked data, copyright and intellectual property issues pertaining to the data being published, and privacy considerations. Hannemann and Kett (2010) ask “is the data correct and do processes exist that guarantee a high data quality? Who is responsible for it? Of the same importance is reliability in time: Is a resource stable enough to be citable, or will it be gone at some point?” (2). Closely related to establishing the reliability of others’ linked data is the problem of providing information about the authenticity and accuracy of one’s own data. “Provenance of data provides useful information such as timeliness and authorship of data. It can be used as a ground basis for various applications and use cases such as identifying trust values for pages or pages fragments.... Moreover, providing provenance meta-data as RDF and making it available on the Web of Data, offers more interchange possibilities and transparency” (Orlandi and Passant 2011, 149).

Publishing linked data also raises questions about ownership and copyright. Byrne and Goddard point to one problem by noting that libraries license much of the content they provide to users and that “a mix of licensed and free content in a linked data environment would be extremely difficult to manage” (under “What are the major obstacles for libraries?”). Baker et al., on the other hand, note that ownership of the metadata itself can be extremely complicated. “Records are frequently copied and the copies are modified or enhanced for use by local catalogers. These records may be subsequently re-aggregated into the catalogs of regional, national, and international consortia” (sec. 3.3.1). They add that “larger agencies are likely to treat records as assets in their business plans and may be reluctant to publish them as Linked Open Data” (sec. 3.3.2).

Also at issue is privacy. “Threats to personal privacy will also increase as boundaries blur between personal information published intentionally, that published conditionally (for example, to specific social networking sites for a specific audience) and information over which the subject has no control” (O'Hara and Shadbolt 2010, 39). Byrne and Goddard agree, noting that “librarians, with their long tradition of protecting the privacy of patrons, will have to take an active role in linked data development to ensure rights are protected” (under “What are the major obstacles for libraries?”).

Finally, Ding et al. (2011) consider the problems with provenance in detail. Hartig and Langegger (2010) approach linked data and its challenges from the perspective of the database community, while Grant (August 14, 2011) in a blog post touches on the challenges for vendors from the vendors’ perspective. Auer and Lehmann (2010) also examine the challenges linked data needs to overcome, including lowering the barriers to entry, and a blog post by Bradley (February 14, 2011) reflects on the concepts of proof and trust in the context of the Semantic Web. Sauermann and Cyganiak (2008) discuss the problems inherent in trying to represent both real-world objects and web pages using URIs.

Looking to the Future of Linked Library Data

Byrne and Goddard state that while linked data “has a long way to go before it is seen as a standard foundation for library data... the majority of issues are non-technical in nature. The technology is ready; it is now a matter of getting libraries and librarians ready as well” (under “What are the major obstacles for libraries?”). Coyle (2010b) advocates for libraries and other cultural institutions as a whole to continue to model and build ontologies with a focus on the structural metadata. “One of the first steps that needs to be taken is to tease out the many components that are encompassed by the RDA text” (9). Sequeda (2011e) offers specific topics for further research: search and ranking, interlinking algorithms, provenance, trust and privacy, dataset dynamics, user interfaces, distributed queries, and evaluation (slide 12).

Byrne and Goddard offer some specific ideas for libraries to get involved in the development of linked data. Quoting Miller (2004, slide 26), they define four roles for libraries: exposing collections via Semantic Web technologies, webifying thesaurus/mappings/services, sharing lessons learned, and advocating for change (under “How can libraries get involved?”). First, Byrne and Goddard note that there is no reason not to experiment with publishing linked data for “small standalone collections” to develop “expertise and technologies within libraries” (under “Exposing collections”). Secondly, they encourage libraries to become involved in current efforts to develop library-related ontologies and value vocabularies. “Supporting and contributing to the efforts underway such as those of the DCMI, as well as web'ifying locally maintained controlled vocabulary is a natural fit for the profession” (under “Web'ifying thesaurus/mappings/services”). Thirdly, they call on librarians to become involved with the linked data community outside of libraries. “Sharing is something that comes naturally to the library community, but... librarians must [also] engage with, and contribute to, wider linked-data-community efforts. The semantic web is about breaking down silos, not building better ones” (under “Sharing Lessons Learned”). Lastly, they urge librarians to advocate for linked data with their own vendors. “Librarians must demand that vendors develop their own data semantically” (under “Persistence”).

Baker et al. recommend additionally that libraries play a key role in the preservation of linked data, both the vocabularies and the published data sets. They note that “Linked data will remain usable twenty years from now only if its URIs persist and can resolve to documentation of their meaning” (sec. 4.4.1). Because of libraries' experience with and commitment to data quality and long-term data maintenance, Baker et al. see an opportunity for libraries to take on the role of curating linked data as an extension of their current functions (sec. 4.4.2).

Conclusion

This has been a simple discussion of a complex topic and barely touches on issues that are debated extensively on blogs, electronic mailing lists, and other Web venues. This is also an area where the status quo can change on a daily basis as new ideas are presented and considered, new tools deployed, and previous predictions disproved.

At this time, efforts to implement Semantic Web and linked data technologies in libraries and other cultural institutions are still in their infancy. We are still establishing the building blocks such as ontologies and value vocabularies, defining the requirements and the constraints for their use and, most importantly, have only just begun building the tools themselves that will become fundamental for libraries to both consume and produce linked data. The barriers to implementation for the average library are still very high. Byrne and Goddard acknowledge this, stating that, “particularly when compared to web 2.0 applications, linked data can seem rather inaccessible; anyone can create a Twitter account or promote user tagging or even contribute to mashups, but the world of linked data, for the moment, remains firmly in the hands of the experts” (under “What are the major obstacles for libraries?”). It is for this reason that Baker et al. recommend that libraries “take an incremental approach to making data available for use on the Web” by starting with “high-priority, low-effort linked data projects” such as authority files and controlled term lists (sec. 4.1.1).

Finally, despite the complexity, frustration, and general chaos involved in transitioning to a newer technology like linked data, it should be recognized that there really may be no choice in the matter. “The participation of digital resource creators and that of the general public in producing and organizing digital information objects has effectively ended the era of librarians' or information professionals' dominance in this field. Metadata creation has become more distributed, participatory, and diversified in methods, practices, and tools” (Zeng and Qin, 297). Libraries can either participate in the larger metadata community via technologies like linked data and the Semantic Web or they can be pushed aside and ignored. This larger community will continue to move forward and farther away from “traditional” library technologies and practices regardless of whether the library profession decides to participate in the process. This then is the challenge currently presented to us: Evolving along with the larger community brings with it all of the hazards inherent in change. Ignoring the transformation taking place risks rendering the data we have assembled so carefully and laboriously over so many years unusable for anyone outside of the library profession.

References

Auer, S. and J. Lehmann. 2010. Making the Web a Data Washing Machine—Creating Knowledge Out of Interlinked Data. Semantic Web Journal, accessed September 18, 2012, www.semantic-web-journal.net/content/new-submission-making-web-data-wash....

Library of Congress Linked Data Service. Authorities and Vocabularies. n.d., accessed September 18, 2012, http://id.loc.gov/.

Baker, T., E. Bermès, K. Coyle, G. Dunsire, A. Isaac, P. Murray, M. Panzer, J. Schneider, R. Singer, E. Summers, W. Waites, J. Young and M. Zeng. 2011. Library Linked Data Incubator Group Final Report. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/2005/Incubator/lld/XGR-lld-20111025.

Berners-Lee, T. 2009. Linked Data. In Design Issues. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/DesignIssues/LinkedData.html.

Bourg, C. 2010. “Linked Data = Rationalized Serendipity.” Feral Librarian (blog), accessed September 18, 2012, http://chrisbourg.wordpress.com/2010/06/24/linked-data-rationalized-sere....

Bowen, J. 2010. “Moving Library Metadata Toward Linked Data: Opportunities Provided by the eXtensible Catalog.” International Conference on Dublin Core and Metadata Applications, DC-2010--Pittsburgh Proceedings, accessed September 18, 2012, http://dcpapers.dublincore.org/ojs/pubs/article/view/1010.

Bradley, A. 2011. “Open Linked Data Discovery, Proof and Trust.” SEO Skeptic (blog), accessed September 18, 2012, www.seoskeptic.com/open-linked-data-discovery-proof-and-trust.

Brickley, D. and R. V. Guha. 2004. RDF Vocabulary Description Language 1.0: RDF Schema. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/TR/rdf-schema/.

Byrne, G. and L. Goddard. 2010. “The Strongest Link: Libraries and Linked Data.” D-Lib Magazine, 16(11/12), accessed September 18, 2012, www.dlib.org/dlib/november10/byrne/11byrne.

Caplan, P. 2003. Metadata Fundamentals for All Librarians. Chicago: American Library Association.

Coyle, K. 2010a. “Understanding the Semantic Web: Bibliographic Data and Metadata.” Library Technology Reports 46(1): 1–31.

Coyle, K. 2010b. “RDA Vocabularies for a Twenty-First-Century Data Environment.” Library Technology Reports 46(2): 1–39.

Ding, L., J. Michaelis, J. McCusker, and D. L. McGuinness. 2011. Linked Provenance Data: A Semantic Web-based Approach to Interoperable Workflow Traces. Future Generation Computer Systems 27(6): 797–805.

Dodds, L. and I. Davis. 2011. Linked Data Patterns: A Pattern Catalogue for Modelling, Publishing, and Consuming Linked Data, accessed September 18, 2012, http://patterns.dataincubator.org/book/.

Furrie, B. 2009. Understanding MARC Bibliographic: Machine-readable Cataloging, accessed September 18, 2012, www.loc.gov/marc/umb/.

Grant, C. 2011. “The Library Linked Data Model: From a Librarian/Vendor Point of View.” Commentary from Carl Grant (blog), accessed September 18, 2012, http://thoughts.care-affiliates.com/2011/08/linked-data-model-from-libra....

Guenther, R. 2004. “New and Traditional Descriptive Formats in the Library Environment” (presentation). International Conference on Dublin Core and Metadata Applications, 2004, Shanghai, China, accessed September 18, 2012, http://dc2004.library.sh.cn/english/prog/ppt/ifla.

Hannemann, J. and J. Kett. 2010. Linked Data for Libraries. World Library and Information Congress: 76th IFLA General Conference and Assembly, Gothenburg, Sweden, accessed September 18, 2012, www.ifla.org/files/hq/papers/ifla76/149-hannemann-en.pdf.

Hartig, O. and A. Langegger. 2010. A Database Perspective on Consuming Linked Data on the Web (preprint). Datenbank-Spektrum 10(2): 57–66, accessed September 18, 2012, http://www2.informatik.hu-berlin.de/~hartig/files/Hartig_QueryingLD_DBSpektrum_Preprint.pdf.

Heath, T. and C. Bizer. 2011. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology 1(1). Morgan & Claypool, accessed September 18, 2012, http://linkeddatabook.com/editions/1.0/.

Heery, R. 2004. “Metadata Futures: Steps Toward Semantic Interoperability.” In Metadata in Practice, edited by D. I. Hillmann and E. L. Westbrooks, 257–71. Chicago: American Library Association.

Hillmann, D. I. and E. L. Westbrooks, eds. 2004. Metadata in Practice. Chicago: American Library Association.

Isaac, A., W. Waites, J. Young, and M. Zeng. 2011. Library Linked Data Incubator Group: Datasets, Value Vocabularies, and Metadata Element Sets. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset-20111025/.

Library Linked Data Community Wiki. Last modified December 19, 2011. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/2001/sw/wiki/LLD.

Linked Data: Connect Distributed Data across the Web. n.d., accessed September 18, 2012, http://linkeddata.org/home.

LinkedData (wiki). Last modified July 7, 2012. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/wiki/LinkedData.

Manola, F. and E. Miller, eds. 2004. RDF Primer. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/TR/rdf-primer/.

Miles, A. and S. Bechhofer, eds. 2009. SKOS Simple Knowledge Organization System Reference. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/TR/skos-reference/.

Miller, E. 2004. The Semantic Web and Digital Libraries (presentation). World Wide Web Consortium, accessed September 18, 2012, www.w3.org/2004/Talks/1013-semweb-em/talk.

NISO. 2004. Understanding Metadata, accessed September 18, 2012, www.niso.org/publications/press/UnderstandingMetadata.pdf.

O’Hara, K. and N. Shadbolt. 2010. “Privacy on the Data Web.” Communications of the ACM 53(3): 39–41.

Library of Congress. On the Record: Report of The Library of Congress Working Group on the Future of Bibliographic Control. 2008, accessed September 18, 2012, www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf.

Orlandi, F. and A. Passant. 2011. “Modelling Provenance of DBpedia Resources Using Wikipedia Contributions.” Web Semantics: Science, Services and Agents on the World Wide Web 9(2): 149–64.

Prud'hommeaux, E. and A. Seaborne, eds. 2008. SPARQL query language for RDF. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/TR/rdf-sparql-query/.

Raimond, Y. and M. Smethurst. 2009. A Skim-read Introduction to Linked Data (presentation). BBC Radio Labs blog, accessed September 18, 2012, www.bbc.co.uk/blogs/radiolabs/2009/09/a_skimread_introduction_to_lin.shtml.

Riley, J. 2010. RDF for Librarians (presentation). Indiana University Digital Library Program, accessed September 18, 2012, www.dlib.indiana.edu/education/brownbags/fall2010/rdf/rdf.pdf.

Sauermann, L. and R. Cyganiak. 2008. Cool URIs for the Semantic Web. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/TR/cooluris/.

Sequeda, J. 2011a. Introduction to Linked Data (presentation 1/5). Semantic Technology Conference, accessed September 18, 2012, www.slideshare.net/juansequeda/introduction-to-linked-data-8223364.

Sequeda, J. 2011b. Creating Linked Data (presentation 2/5). Semantic Technology Conference, accessed September 18, 2012, www.slideshare.net/juansequeda/creating-linked-data.

Sequeda, J. 2011c. Publishing Linked Data (presentation 3/5). Semantic Technology Conference, accessed September 18, 2012, www.slideshare.net/juansequeda/publishing-linked-data.

Sequeda, J. 2011d. Consuming Linked Data (presentation 4/5). Semantic Technology Conference, accessed September 18, 2012, www.slideshare.net/juansequeda/consuming-linked-data.

Sequeda, J. 2011e. Linked Data: Summary and Outlook (presentation 5/5). Semantic Technology Conference, accessed September 18, 2012, www.slideshare.net/juansequeda/conclusions-linked-data.

Smiraglia, R. P. 2005. Introducing Metadata. Cataloging & Classification Quarterly 40(3–4): 1–15.

Vila Suero, D., ed. 2011. Library Linked Data Incubator Group: Use Cases. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/2005/Incubator/lld/XGR-lld-usecase-20111025/.

Wolfe, R. H. W. and R. H. Lubas. 2004. Metadata: An Introduction Presented by MIT Libraries Metadata Services (presentation). Metadata Services, MIT Libraries, accessed September 18, 2012, http://libraries.mit.edu/metadata/presentations/IAPMetadata.ppt.

W3C OWL Working Group, ed. 2009. OWL 2 Web Ontology Language Document Overview. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/TR/owl2-overview/.

Zeng, M. L. and J. Qin. 2008. Metadata. New York: Neal-Schuman.

Further Reading

Anglo-American Cataloguing Rules (website). 2006, accessed September 18, 2012, www.aacr2.org/index.html.

Art & Architecture Thesaurus Online. 2000. The Getty Research Institute, accessed September 18, 2012, www.getty.edu/research/tools/vocabularies/aat/index.html.

Baca, M. and P. Harpring, eds. 2009. Categories for the Description of Works of Art, accessed September 18, 2012, www.getty.edu/research/publications/electronic_publications/cdwa/index.html.

Bray, T., J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau, eds. 2008. Extensible Markup Language (XML). World Wide Web Consortium, accessed September 18, 2012, www.w3.org/TR/xml/.

CCO Commons (website). 2006. Visual Resources Association, accessed September 18, 2012, http://cco.vrafoundation.org/index.php/.

CDWA Lite. 2011. In Categories for the Description of Works of Art, edited by M. Baca and P. Harpring. The Getty Research Institute, accessed September 18, 2012, www.getty.edu/research/publications/electronic_publications/cdwa/cdwalit....

Describing Archives: A Content Standard (DACS). n.d. The Society of American Archivists, accessed September 18, 2012, www.archivists.org/governance/standards/dacs.asp.

Dublin Core Metadata Element Set, Version 1.1. 2010. Dublin Core Metadata Initiative, accessed September 18, 2012, http://dublincore.org/documents/dces/.

<ead> Encoded Archival Description version 2002 official site. Last modified September 14, 2011. Library of Congress, accessed September 18, 2012, www.loc.gov/ead/.

Getty Thesaurus of Geographic Names Online. 2000. The Getty Research Institute, accessed September 18, 2012, www.getty.edu/research/tools/vocabularies/tgn/index.html.

Iconclass. 2011. Rijksbureau voor Kunsthistorische Documentatie, accessed September 18, 2012, www.iconclass.nl/.

IFLA Cataloguing Section & ISBD Review Group. 2011. International Standard Bibliographic Description, accessed September 18, 2012, www.ifla.org/en/publications/international-standard-bibliographic-descri....

The International Federation of Library Associations and Institutions (IFLA) (website). n.d., accessed September 18, 2012, www.ifla.org/.

MARC 21 Format for Bibliographic Data. 2011. Library of Congress, accessed September 18, 2012, www.loc.gov/marc/.

ONIX. 2009. EDItEUR, accessed September 18, 2012, www.editeur.org/8/ONIX/.

Overview of SGML Resources. 2004. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/MarkUp/SGML/.

RDA: Resource Description and Access. 2010. Joint Steering Committee for the Development of RDA, accessed September 18, 2012, www.rda-jsc.org/rda.html.

Thesaurus for Graphic Materials. n.d. Library of Congress, accessed September 18, 2012, http://id.loc.gov/vocabulary/graphicMaterials.html.

VRA Core. n.d. Visual Resources Association, accessed September 18, 2012, www.vraweb.org/projects/vracore4/.

Semantic Web (website). 2010. World Wide Web Consortium, accessed September 18, 2012, www.w3.org/standards/semanticweb/.