RDA Vocabularies Briefing

The Case for RDA Vocabularies. Diane Hillmann, Jon Phipps, Metadata Management Associates. Big Heads briefing, 1/15/2010.

Upload: diane-i-hillmann

Post on 06-May-2015

DESCRIPTION

Presented Jan. 15, 2010 for the Technical Services 'Big Heads', as an introduction to the RDA Vocabularies and the opportunities provided by this different approach to data.

TRANSCRIPT

Page 1: RDA Vocabularies Briefing

The Case for RDA Vocabularies

Diane Hillmann, Jon Phipps

Metadata Management Associates

1/15/2010 Big Heads briefing

Page 2: RDA Vocabularies Briefing

RDA in Two Parts

• You’ve heard about the guidance text
• The vocabularies have been developed in parallel
• Agreement made in Apr./May 2007 for a Task Group to work on this
• Vocabularies are up-to-date as of latest JSC changes

Page 3: RDA Vocabularies Briefing

They’re Here

Please explore!

http://metadataregistry.org/rdabrowse.htm

Page 4: RDA Vocabularies Briefing

Why are the Vocabularies Important?

• They provide a way for libraries to move from a limited, bespoke “format” and elderly encoding to a more modern approach to data creation, management, and sharing
• They are open and usable by others, making re-use of non-library data easier for libraries to accomplish

Page 8: RDA Vocabularies Briefing

Why Not Just “Improve” MARC?

• MARC is optimized for records, and although some improvement is possible (and is happening), a complete overhaul is not feasible
• Too much change (i.e., “improvement”) is likely to make a transition more difficult
• We need to re-think our approach to creating, managing, sharing metadata, not apply bandaids to 45-year-old standards

Page 9: RDA Vocabularies Briefing

What’s Going On Outside Libraries?

• Many more sources of good data becoming available
• Much of it is freely available, with links to even more sources
• NY Times is one of the newer entrants in this field, building links to enrich their own data in a similar environment of retrenchment

Page 12: RDA Vocabularies Briefing

We’re Not In This Picture

• With MARC, we’re currently “delivering”:
  • Primarily textual information, with few or no links to follow
  • Information almost exclusively created and maintained by [expensive] human agents
• Currently, as we look at financial retrenchment, we are focusing on how to make our data less expensive by doing less of it
  • Isn’t this a strategy designed to put us at the margins?

Page 15: RDA Vocabularies Briefing

Moving Beyond Records

• Linked open data: enables conversations with the rest of the data world
• This data is independent of format, syntax and "records" (although it can be aggregated for various uses)
• May include “crowd-sourced” data (DBpedia or Freebase) or data re-used from other sources

Page 16: RDA Vocabularies Briefing

... To Statements

• The one book=one record world of MARC is a serious limitation
• Making use of FRBR also requires a new view of data management
• An RDF approach, based on statements rather than records, gives us a means to incorporate other sources of data and to do so using cheaper machine-based strategies
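To make the record-versus-statement contrast concrete, here is a minimal sketch in plain Python. It is not taken from the slides: the record layout, the `record_to_statements` helper and the `ex:` property names are illustrative assumptions, not RDA or MARC definitions. The URIs are the ones quoted elsewhere in this briefing, with the Chronicling America HTML address standing in as an identifier.

```python
# Hypothetical sketch: one flat "record" versus the same facts expressed as
# subject-predicate-object statements (the shape RDF uses).
# Property names such as "ex:title" are invented for illustration only.

record = {
    "id": "http://chroniclingamerica.loc.gov/lccn/sn93063916/",
    "ex:title": "The Daytona Daily News",
    "ex:coverage": "http://dbpedia.org/resource/Daytona_Beach%2C_Florida",
}


def record_to_statements(rec):
    """Split a record into independent (subject, predicate, object) triples."""
    subject = rec["id"]
    return [(subject, key, value) for key, value in rec.items() if key != "id"]


statements = record_to_statements(record)

# A statement from another source can sit alongside ours without anyone
# having to edit the original record.
statements.append((
    "http://dbpedia.org/resource/Daytona_Beach%2C_Florida",
    "owl:sameAs",
    "http://sws.geonames.org/4152872/",
))

for subject, predicate, obj in statements:
    print(subject, predicate, obj)
```

Each statement stands on its own, so data from outside the library can be merged in by machine rather than by re-keying a record.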

Page 17: RDA Vocabularies Briefing

Why Invest in Change?

• We know our current way of creating and managing data is:
  • Unsustainable in an environment of limited resources
  • Based on a notion of standard data that does not meet the needs of our users
  • Reliant on expensive human effort

Page 18: RDA Vocabularies Briefing

The Vocabularies ...

• Built according to RDF Vocabulary standards, can be used in a variety of data environments
• Based on library data experience
• Intended to be attractive to the data world outside libraries, in hopes that they will use our vocabularies for their bibliographic description
  • This would make re-use easier for us

Page 19: RDA Vocabularies Briefing

Richer, Cheaper Data?

• Data that is more easily manipulated and maintained by machine
• Data that is created and maintained by someone else, but “good enough” to provide important functionality
  • Ex.: Geographic data, to support mapping applications
  • Ex.: Data to better support faceted searching and browsing

Page 20: RDA Vocabularies Briefing

Real Example

LC Chronicling America Project

Building georeferencing into library data

Page 21: RDA Vocabularies Briefing

About this Newspaper: The Daytona Daily News

• HTML: http://chroniclingamerica.loc.gov/lccn/sn93063916/

• RDF: http://chroniclingamerica.loc.gov/lccn/sn93063916.rdf

• MARC (HTML): http://chroniclingamerica.loc.gov/lccn/sn93063916/marc/

• MARC (XML): http://chroniclingamerica.loc.gov/lccn/sn93063916/marc.xml

• WorldCat (HTML only?): http://www.worldcat.org/oclc/1631353

Page 24: RDA Vocabularies Briefing

Un-Linked Data

• MARC21 has a naming convention for place names…
  752 $a United States $b Florida $c Volusia $d Daytona Beach
• Wikipedia also has a naming convention for place names…
  http://en.wikipedia.org/wiki/Daytona_Beach,_Florida
• LC staffer created a little script to use the 752 hierarchy to build a Wikipedia URL and see if it would resolve as a URI from DBpedia…
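The LC script itself is not shown in the slides. The following is only a rough sketch of the idea in Python, under stated assumptions: the function names are hypothetical, the "City, State" label is built from subfields $d and $b, and the step that checks whether the URI actually resolves is left out so the example needs no network access.

```python
import re
from urllib.parse import quote


def parse_subfields(field):
    """Return {code: value} for a MARC field written as '752 $a ... $b ...'."""
    pairs = re.findall(r"\$(\w)\s*([^$]+)", field)
    return {code: value.strip() for code, value in pairs}


def place_label(subfields):
    """Build a 'City, State' label from $d (city) and $b (state)."""
    return f"{subfields['d']}, {subfields['b']}"


def wikipedia_url(label):
    # Wikipedia titles use underscores for spaces; the comma stays literal.
    title = label.replace(" ", "_")
    return "http://en.wikipedia.org/wiki/" + quote(title, safe=",")


def dbpedia_uri(label):
    # The DBpedia URI quoted in this briefing percent-encodes the comma (%2C).
    title = label.replace(" ", "_")
    return "http://dbpedia.org/resource/" + quote(title, safe="")


field = "752 $a United States $b Florida $c Volusia $d Daytona Beach"
label = place_label(parse_subfields(field))

print(wikipedia_url(label))
print(dbpedia_uri(label))
```

A real version would also need a fallback for places whose Wikipedia title does not follow the "City, State" pattern, which this sketch does not attempt.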

Page 25: RDA Vocabularies Briefing

Linked Data

DBpedia:

<dcterms:coverage rdf:resource="http://dbpedia.org/resource/Daytona_Beach%2C_Florida" />

GeoNames:

<dcterms:coverage rdf:resource="http://sws.geonames.org/4152872/" />
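As an illustration of how a machine might pick up these links, the sketch below reads the two coverage statements with Python's standard XML parser. Only the two `dcterms:coverage` elements come from the slide. The surrounding `rdf:RDF` and `rdf:Description` wrapper, the `rdf:about` value and the `coverage_links` helper are assumptions added to make the fragment parseable, and the real Chronicling America RDF may be shaped differently.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Assumed wrapper; only the two dcterms:coverage lines are from the slide.
rdf_xml = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dcterms="{DCTERMS_NS}">
  <rdf:Description rdf:about="http://chroniclingamerica.loc.gov/lccn/sn93063916/">
    <dcterms:coverage rdf:resource="http://dbpedia.org/resource/Daytona_Beach%2C_Florida" />
    <dcterms:coverage rdf:resource="http://sws.geonames.org/4152872/" />
  </rdf:Description>
</rdf:RDF>
""".strip()


def coverage_links(xml_text):
    """Return the rdf:resource target of every dcterms:coverage element."""
    root = ET.fromstring(xml_text)
    return [
        element.get(f"{{{RDF_NS}}}resource")
        for element in root.iter(f"{{{DCTERMS_NS}}}coverage")
    ]


links = coverage_links(rdf_xml)
for link in links:
    print(link)
```

A dedicated RDF library would be the more robust choice for real data; the standard-library parser is used here only to keep the sketch self-contained.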

Page 26: RDA Vocabularies Briefing

DBpedia

DBpedia is “a community effort to extract structured information from Wikipedia and to make this information available on the Web.”

Page 29: RDA Vocabularies Briefing

DBpedia

“The DBpedia knowledge base currently describes more than 2.6 million things, including at least…

• 213,000 persons
• 328,000 places
• 57,000 music albums
• 36,000 films
• 20,000 companies.”

Page 30: RDA Vocabularies Briefing

DBpedia (even more data)

owl:sameAs

• Rdfabout: The 2000 U.S. Census: http://www.rdfabout.com/rdf/usgov/geo/us/fl/counties/volusia_county/daytona_beach
• GeoNames: http://sws.geonames.org/4152872/
• Freebase: fbase:Daytona Beach, Florida
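Because owl:sameAs is symmetric and transitive, a handful of such links is enough to treat several identifiers as one place. The sketch below shows one possible way to follow them in Python. The `equivalents` helper is hypothetical, the link list contains only the full URIs quoted in this briefing, and the Freebase entry is omitted because the slide does not give it as a full URI.

```python
from collections import defaultdict

# owl:sameAs links for Daytona Beach, as (left, right) pairs.
same_as = [
    ("http://dbpedia.org/resource/Daytona_Beach%2C_Florida",
     "http://sws.geonames.org/4152872/"),
    ("http://dbpedia.org/resource/Daytona_Beach%2C_Florida",
     "http://www.rdfabout.com/rdf/usgov/geo/us/fl/counties/volusia_county/daytona_beach"),
]


def equivalents(start, links):
    """Return every identifier reachable from start via sameAs, in either direction."""
    neighbours = defaultdict(set)
    for left, right in links:
        neighbours[left].add(right)
        neighbours[right].add(left)

    seen, todo = {start}, [start]
    while todo:
        for candidate in neighbours[todo.pop()]:
            if candidate not in seen:
                seen.add(candidate)
                todo.append(candidate)
    return seen - {start}


# Starting from the GeoNames URI, we reach DBpedia directly and the
# census URI through DBpedia.
geonames_view = equivalents("http://sws.geonames.org/4152872/", same_as)
for uri in sorted(geonames_view):
    print(uri)
```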

Page 31: RDA Vocabularies Briefing

Interesting Questions

There are hundreds, if not thousands, of people tracking down place names in Wikipedia and making sure they are normalized and geo-referenced.

Is this crowd-sourced, Wikipedia data ‘authoritative’?

Is it ‘good enough’?

How different is this from the strategy that’s used for NACO?

Page 32: RDA Vocabularies Briefing

More Questions

Chronicling America’s data for the Daytona Beach Daily News references DBpedia, but there’s no corresponding reference to Chronicling America data in DBpedia, even though there’s a ‘place’ where it could be referenced.

How do we make sure that happens?

Where’s the library data anyway?

Page 33: RDA Vocabularies Briefing

Even More Questions

DBpedia uses its own vocabulary for many statements, and chooses to use skos:subject instead of dc:subject, foaf:name instead of dc:title.

Was there a specific reason for this choice?

Would there be value for us if they used more RDA properties instead?

Page 34: RDA Vocabularies Briefing

How Do We Get From Here to There?

Work with vendors to shift from MARC to RDA; from records to statements

Focus community effort on solid innovation rather than incremental shifts

Worry less about the costs of moving forward, and more about the costs of stasis

Support open sharing of library data!

Page 35: RDA Vocabularies Briefing

The Elephants in the Room

Record “ownership” as OCLC is attempting to enforce will not help libraries as they attempt to move forward

OCLC’s membership must reinforce an open model of record use and re-use, lest necessary innovation be stifled

LC’s R2 report recommends a backward-facing strategy

Given LC’s well-known (and well-respected) record for innovation, why is cataloging data exempt from consideration?

Page 36: RDA Vocabularies Briefing

What RDA Vocabularies Bring to the Table

Readiness for participation in the open data world

Potential for automating more data capture to enrich library data without using expensive human resources, and sharing without artificial boundaries

Improved "marketing" of our collections (particularly digital and special collections) beyond the library world

Page 37: RDA Vocabularies Briefing

Thank You!

Questions? Comments?

Contact: [email protected]@gmail.com