Making Your Data Work Harder Than You Do


Uploaded by susan-jane-williams on 18 December 2014


DESCRIPTION

From Shapeshifting: A Practical Look at Metadata Interoperability. Covers the use of planning documents, data dictionaries and XSLT to transform data, particularly for image delivery systems. Also addresses the uses of the VRA Core 4 display and index fields.

TRANSCRIPT

Page 1: Making your data work harder than you do

Images of the Babbage Engine taken from the Computer History Museum website, http://www.computerhistory.org

Page 2: Making your data work harder than you do

Learning to Make Your Data Work Harder Than You Do

Page 3: Making your data work harder than you do

Do it once, do it right

It's about transforming and repurposing the same data for different tools and users in the most automated way possible, not recreating it.

Page 4: Making your data work harder than you do

Three Words:

USE PLANNING DOCUMENTS

Page 5: Making your data work harder than you do

In the age of Google, why have fielded data?

• More efficient for both data entry and for systems to search, retrieve and ingest

• Parsed, discretely fielded data can be recombined mechanically for a variety of outputs and uses, including XML

Page 6: Making your data work harder than you do

Data flow and use (diagram): a cataloging utility (relational database) exports XML, which XSLT stylesheets transform for each destination:

• Institutional digital repository

• Web 2.0 tools: RSS feeds, websites, XMP (in images)

• Users: flat Excel, PDF, XMP (in images)

• Delivery systems: ARTstor, MDID, CONTENTdm, LUNA Insight, etc.

Page 7: Making your data work harder than you do
Page 8: Making your data work harder than you do

Data flow and use (diagram repeated from page 6)

Page 9: Making your data work harder than you do

A pithy answer to “why relational?” (for cataloging)

• Message from Jan Eklund to VRA-L, Feb 20, 2008, subject: Re: CONTENTdm and metadata (now posted on the VRA website under Resources)

– Complexity: “complexity cannot be captured efficiently in a flat data model because basically you have to leave space in every record to accommodate the most complex object you will ever encounter. This adds up to a lot of wasted space, and wasted space means more money…”

– Consistency: “all the descriptive data about the work is entered once, and every image that shows this work inherits the same information”
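The consistency point can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming a hypothetical two-table layout (`work`, `image`) that is not VCat's actual schema. The work is described once, each image stores only the work's numeric key, and a join hands the work data to every image.

```python
import sqlite3

# Sketch only: table and column names are hypothetical, not VCat's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work (
    work_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE image (
    image_id INTEGER PRIMARY KEY,
    work_id  INTEGER NOT NULL REFERENCES work(work_id),
    view     TEXT NOT NULL
);
""")

# The work is described once...
conn.execute("INSERT INTO work VALUES (1, 'Work A')")
# ...and each image stores only the numeric key of the work it shows.
conn.executemany(
    "INSERT INTO image VALUES (?, ?, ?)",
    [(10, 1, "overall view"), (11, 1, "detail")],
)

def images_with_work():
    """One row per image, with the work's title inherited through the join."""
    return conn.execute(
        """SELECT image.image_id, image.view, work.title
           FROM image JOIN work ON image.work_id = work.work_id
           ORDER BY image.image_id"""
    ).fetchall()

def retitle_work(work_id, new_title):
    """Correct the work record once; every image that shows it follows."""
    conn.execute("UPDATE work SET title = ? WHERE work_id = ?", (new_title, work_id))
```

After `retitle_work(1, "Work A (revised)")`, both image rows report the new title. Nothing had to be edited per image.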

Page 10: Making your data work harder than you do

Excel sample (“flat file” output)

Notice that each row represents an image file and conflates the work and image records (repeats the information about the work for each image).

Each repeating value (like Artist) must have a column reserved for possible use.
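The flat-file tradeoff can be shown in a short sketch. The record structure, file names and agent names below are invented for illustration. Each image becomes one row, the work data repeats on every row, and enough Artist columns are reserved for the work with the most artists.

```python
import csv
import io

# Hypothetical work records; each may have any number of artists and images.
works = [
    {"title": "Work A", "artists": ["Agent A"], "images": ["a1.jpg", "a2.jpg"]},
    {"title": "Work B", "artists": ["Agent B", "Agent C", "Agent D"], "images": ["b1.jpg"]},
]

def flatten(works):
    """One row per image file; work data is repeated on every row, and
    Artist columns are reserved for the work with the most artists."""
    max_artists = max(len(w["artists"]) for w in works)
    header = ["Image File", "Title"] + [f"Artist {i}" for i in range(1, max_artists + 1)]
    rows = [header]
    for w in works:
        # Pad with empty cells so every row has the same number of columns.
        artist_cells = w["artists"] + [""] * (max_artists - len(w["artists"]))
        for image in w["images"]:
            rows.append([image, w["title"]] + artist_cells)
    return rows

def to_tsv(rows):
    """Serialize rows as tab-delimited text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

"Work A" ends up in two rows, and its single artist still carries two empty Artist cells. That empty space is the "wasted space" the Eklund quote describes.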

Page 11: Making your data work harder than you do

XML sample—more like a flexible accordion—expands as needed
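The "accordion" behavior can be sketched with `xml.etree.ElementTree`. The element names loosely echo VRA Core 4 (`titleSet`, `agentSet`, `agent`, `name`), but this is a simplified assumption and not a schema-valid Core 4 record: it has no namespace, no IDs and no required sets. The point is that a work with one agent gets one `agent` element, and a work with three gets three. No empty slots are reserved.

```python
import xml.etree.ElementTree as ET

def work_to_xml(title, agents):
    """Build a simplified work element whose agent list grows with the data.
    Not schema-valid VRA Core 4; element names are only loosely modeled on it."""
    work = ET.Element("work")
    title_set = ET.SubElement(work, "titleSet")
    ET.SubElement(title_set, "title").text = title
    agent_set = ET.SubElement(work, "agentSet")
    for name in agents:
        # One element per actual value: the structure expands only as needed.
        agent = ET.SubElement(agent_set, "agent")
        ET.SubElement(agent, "name").text = name
    return work
```

For example, `ET.tostring(work_to_xml("Work A", ["Agent A"]), encoding="unicode")` yields a single compact `<work>` element that contains exactly one `<agent>`.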

Page 12: Making your data work harder than you do

ER Diagrams show related tables

Page 13: Making your data work harder than you do

Authority record

Numeric key

All the information about the agent is supplied from this file on the basis of the numeric key
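A minimal sketch of the numeric-key lookup follows. The authority entry, its fields and the key `101` are hypothetical. Work records hold only the key, so the agent data is fetched from the authority each time it is needed.

```python
# Hypothetical agent authority file, keyed by numeric ID.
agent_authority = {
    101: {"name": "Agent A", "nationality": "Italian", "life_dates": "1452-1519"},
}

# Work records carry only the numeric key, never a copy of the agent data.
work_records = [
    {"work_id": 1, "title": "Work A", "agent_id": 101},
    {"work_id": 2, "title": "Work B", "agent_id": 101},
]

def resolve_agent(work):
    """Return the work merged with the agent data looked up by its numeric key."""
    agent = agent_authority[work["agent_id"]]
    return {**work, **agent}
```

Correcting one authority entry, for instance changing the nationality of agent `101`, changes what every linked work reports the next time it is resolved.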

Page 14: Making your data work harder than you do

A note field is possible for every Core 4 element

Repeating values are supported for each element (using portals or subforms)

Numeric key

“indexed” value (in this case the sort name)

“display” value formatted according to CCO recommendations. Note that the agent nationality is supplied automatically here through the link (numeric key) to the agent authority
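The index/display split can be sketched as follows. This assumes a CCO-style creator display of the form "Name (Nationality role, dates)". The exact punctuation rules should be checked against CCO itself. The authority key `500` and the field names are hypothetical. The index value is the inverted sort name, while the display value is assembled from authority data, including the nationality.

```python
# Hypothetical authority entry; field names and the key 500 are illustrative.
AGENT_AUTHORITY = {
    500: {"sort_name": "Cassatt, Mary", "display_name": "Mary Cassatt",
          "nationality": "American", "life_dates": "1844-1926"},
}

def agent_values(agent_id, role):
    """Return (index_value, display_value) for one agent.

    index_value: the inverted sort name used for indexing and sorting.
    display_value: an assumed CCO-style string, 'Name (Nationality role, dates)',
    with the nationality pulled from the authority rather than typed in."""
    entry = AGENT_AUTHORITY[agent_id]
    index_value = entry["sort_name"]
    display_value = f'{entry["display_name"]} ({entry["nationality"]} {role}, {entry["life_dates"]})'
    return index_value, display_value
```

`agent_values(500, "painter")` returns `("Cassatt, Mary", "Mary Cassatt (American painter, 1844-1926)")`.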

Page 15: Making your data work harder than you do

Creation of an XSLT

• XSLT stands for Extensible Stylesheet Language Transformations. XSLT is XML-based. You can use a stylesheet to take an XML document and turn it into plain text, web pages, PDF documents (by way of XSL-FO), or fielded data for import into other applications.

• In this sample, the stylesheet creates a tab-delimited file whose header row supplies the field names when the file is opened in Excel (an extra step is needed to preserve diacritics with Unicode)
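The stylesheet itself is XSLT, and Python's standard library has no XSLT processor. The sketch below is therefore a Python analogue of what such a stylesheet does, not the stylesheet from the talk. It reads a simplified, hypothetical XML export, writes a header row plus one tab-delimited row per record, and encodes the result as UTF-8 with a byte-order mark (`utf-8-sig`). That encoding is one common way to help Excel recognize the file as Unicode so diacritics survive.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: a simplified XML export, not the actual VCat schema.
SAMPLE_XML = """<records>
  <record><title>Nymphéas</title><agent>Monet, Claude</agent></record>
  <record><title>Der Kuss</title><agent>Klimt, Gustav</agent></record>
</records>"""

FIELDS = ["title", "agent"]    # child elements to pull, in column order
HEADERS = ["Title", "Agent"]   # field names written to the header row

def xml_to_tsv_bytes(xml_text):
    """Flatten each <record> into one tab-delimited row under a header row,
    encoded as UTF-8 with a BOM so diacritics are recognized."""
    root = ET.fromstring(xml_text)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t")
    writer.writerow(HEADERS)
    for record in root.iter("record"):
        # A missing child element becomes an empty cell.
        writer.writerow([record.findtext(f, default="") for f in FIELDS])
    return buf.getvalue().encode("utf-8-sig")
```

Tabs are used as the delimiter so that commas inside values (such as inverted names) need no quoting.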

Page 16: Making your data work harder than you do

VCat

• Begun in 2004 with the goal of being fully relational, VRA Core 4 and CCO compliant, and capable of Core 4 XML output; that goal was met

• Reality in 2010: flattened Excel is still the lingua franca. The XML export stylesheet was used as the basis for a flattened Excel export

• So there are now 2 exports (XSLTs), one providing XML and one providing Excel

Page 17: Making your data work harder than you do

VCat folder

You can create as many stylesheets as you like for specific purposes

.xsd=schema; .xsl=stylesheet; .xml=document

Page 18: Making your data work harder than you do
Page 19: Making your data work harder than you do

Data flow and use (diagram repeated from page 6)

Page 20: Making your data work harder than you do

Sample XML output (small clip)

Page 21: Making your data work harder than you do

Data flow and use (diagram repeated from page 6)

Page 22: Making your data work harder than you do

Creation of a mapping document to a standard

• Flattened Core 4

Page 23: Making your data work harder than you do

• Flattening repeating fields
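A mapping document and a flattening rule can be expressed together in a small sketch. Both the local field names and the flattened Core 4 column names below are invented labels, not an official crosswalk. The `"; "` separator for repeating values is also an assumption. This is an alternative to reserving one column per possible value: repeats are collapsed into a single delimited cell.

```python
# Hypothetical mapping document: local field name -> flattened Core 4 column.
MAPPING = {
    "WorkTitle": "Title",
    "AgentDisplay": "Agent Display",
    "MaterialsDisplay": "Material Display",
}

REPEAT_SEPARATOR = "; "   # assumed delimiter for repeating values

def flatten_record(local_record):
    """Rename local fields per MAPPING and collapse repeating values
    (lists) into one delimited cell; unmapped or missing fields become ''."""
    flat = {}
    for local_name, core_name in MAPPING.items():
        value = local_record.get(local_name, "")
        if isinstance(value, list):
            value = REPEAT_SEPARATOR.join(value)
        flat[core_name] = value
    return flat
```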

Page 24: Making your data work harder than you do

Sample Excel output (a small clip)

Page 25: Making your data work harder than you do

Data flow and use (diagram repeated from page 6, with a Data Dictionary added)

Page 26: Making your data work harder than you do

Creation of a Data Dictionary for each tool

• Data dictionaries set how the data the patron sees is displayed. This can be customized, and it is where the Core 4 “index” and “display” values become crucial

• They also set what the patron does not see: under-the-surface search parameters, such as using the earliest and latest dates (index fields) for “fuzzy” date searching
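Fuzzy date searching with index fields can be sketched as a range-overlap test. The records below are invented, and the numeric ranges assigned to phrases like "ca. 1500" are assumptions. A record matches when its [earliest, latest] span overlaps the patron's search span, even though the patron only ever sees the free-text display date.

```python
# Hypothetical records: a display date for the patron, plus two numeric
# index dates used behind the scenes. The ranges are assumed for illustration.
RECORDS = [
    {"title": "Work A", "display_date": "ca. 1500", "earliest": 1490, "latest": 1510},
    {"title": "Work B", "display_date": "early 17th century", "earliest": 1600, "latest": 1633},
    {"title": "Work C", "display_date": "1889", "earliest": 1889, "latest": 1889},
]

def search_dates(records, start, end):
    """Return records whose [earliest, latest] range overlaps [start, end]."""
    return [r for r in records if r["earliest"] <= end and r["latest"] >= start]
```

A search for 1505-1605 finds both "ca. 1500" and "early 17th century". A plain text match on the display dates would find neither.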

Page 27: Making your data work harder than you do

Display data is like publishing: it arranges data attractively for the user

Page 28: Making your data work harder than you do

Difference between user display and cataloger mode

We are used to seeing this in OPACs

Page 29: Making your data work harder than you do
Page 30: Making your data work harder than you do

Sample MDID Data Dictionary

Sets import, field labels, thumb captions, sorting, searching, keyword searching, DC mapping for cross-collection searching, advanced search pop-down lists, etc.
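To make the idea concrete, here is a sketch of a data dictionary as a plain configuration table. The field names, labels, flags and Dublin Core mappings are hypothetical, and this is not MDID's actual data dictionary format. One table drives three behaviors: which labeled fields the patron sees, the Dublin Core crosswalk for cross-collection searching, and which fields feed keyword search, including hidden ones such as the index name.

```python
# Hypothetical data dictionary; not MDID's real configuration format.
DATA_DICTIONARY = {
    "agent_display": {"label": "Creator", "dc": "creator", "display": True, "keyword": True},
    "agent_index": {"label": "Creator (sort)", "dc": None, "display": False, "keyword": True},
    "title": {"label": "Title", "dc": "title", "display": True, "keyword": True},
    "earliest_date": {"label": "Earliest Date", "dc": None, "display": False, "keyword": False},
    "date_display": {"label": "Date", "dc": "date", "display": True, "keyword": False},
}

def patron_view(record):
    """Labeled fields the patron sees, in dictionary order."""
    return [(cfg["label"], record[f]) for f, cfg in DATA_DICTIONARY.items()
            if cfg["display"] and f in record]

def dublin_core(record):
    """Crosswalk to Dublin Core elements for cross-collection searching."""
    return {cfg["dc"]: record[f] for f, cfg in DATA_DICTIONARY.items()
            if cfg["dc"] and f in record}

def keyword_text(record):
    """Text fed to keyword search, including fields the patron never sees."""
    return " ".join(str(record[f]) for f, cfg in DATA_DICTIONARY.items()
                    if cfg["keyword"] and f in record)
```

Changing a label or a mapping here changes the output of every record, and the underlying data is left untouched.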

Page 31: Making your data work harder than you do
Page 32: Making your data work harder than you do

VCat ARTstor

Page 33: Making your data work harder than you do

VCat-ARTstor Data Dictionary

Concatenate fields; “prepend” global information or labels

Set display order of grouped fields
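The three behaviors named here (concatenating fields, prepending global text or labels, and ordering grouped fields) can be mimicked in a few lines. This is only an illustration under assumed settings. The group names, the "Image courtesy of " prefix and the field names are hypothetical, and none of this reflects ARTstor's actual configuration format.

```python
# Hypothetical display settings; not ARTstor's real configuration format.
GLOBAL_PREFIX = {"rights": "Image courtesy of "}   # text prepended to a field

DISPLAY_GROUPS = [                                 # list order = display order
    ("Creator", ["agent_display"]),
    ("Work", ["title", "date_display"]),           # clustered: title, then date
    ("Credit", ["rights"]),
]

def build_display(record, separator=", "):
    """Concatenate each group's non-empty fields, prepending any global text,
    and emit the groups in the configured order."""
    out = []
    for label, fields in DISPLAY_GROUPS:
        parts = [GLOBAL_PREFIX.get(f, "") + record[f] for f in fields if record.get(f)]
        if parts:
            out.append(f"{label}: {separator.join(parts)}")
    return out
```

A group whose fields are all empty is simply omitted, so sparse records do not leave blank labels on screen.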

Page 34: Making your data work harder than you do

Clustered (grouped) fields; ability to concatenate information or “prepend” information

Data Dictionary settings in action

Page 35: Making your data work harder than you do

Setting thumb captions

In this case, ARTstor has a floating information window; in other tools this would be a place to use the INDEX value of the name (which is the sort value instead of the display)

Also allows the user to change the thumb sort order
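A short sketch shows why the index value suits thumbnail captions and sorting. The names, file names and dates are invented. Sorting by display name ("Anna Berg", "Carl Adler", ...) orders by first name, while sorting by the index value ("Adler, Carl", ...) orders by surname. A `sort_key` parameter stands in for the user changing the thumb sort.

```python
# Hypothetical thumbnail records with display and index (sort) names.
THUMBS = [
    {"file": "t1.jpg", "agent_display": "Anna Berg", "agent_index": "Berg, Anna", "earliest": 1899},
    {"file": "t2.jpg", "agent_display": "Carl Adler", "agent_index": "Adler, Carl", "earliest": 1889},
    {"file": "t3.jpg", "agent_display": "Eva Cole", "agent_index": "Cole, Eva", "earliest": 1880},
]

def thumb_captions(thumbs, sort_key="agent_index", caption_field="agent_index"):
    """Order thumbnails by the chosen field and caption each with the index name."""
    return [(t["file"], t[caption_field]) for t in sorted(thumbs, key=lambda t: t[sort_key])]
```

`thumb_captions(THUMBS)` lists Adler, Berg, Cole. `thumb_captions(THUMBS, sort_key="earliest")` re-sorts the same captions by date.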

Page 36: Making your data work harder than you do

Users use keywords

Page 37: Making your data work harder than you do
Page 38: Making your data work harder than you do
Page 39: Making your data work harder than you do

Same data, different tools and users

• The following 3 slides show the same data prepared for our stock and royalty-free publishing site hosted on SmugMug. The educational data is reduced and compressed into an IPTC-like caption and keywords only, and written into the image header. This means it can also be seen using Cooliris (which is fun).
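The reduction step can be sketched as a function that compresses a full record into one caption string and a de-duplicated keyword list. The field names and the ten-keyword cap are assumptions. Writing the result into the image's IPTC/XMP header is a separate step, typically done with an external tool such as ExifTool, and is not shown.

```python
def to_caption_and_keywords(record, max_keywords=10):
    """Compress a record into an IPTC-like caption plus keyword list.
    Field names and the keyword cap are hypothetical."""
    caption = ", ".join(
        v for v in (record.get("agent_display"), record.get("title"), record.get("date_display")) if v
    )
    keywords = []
    for term in record.get("subjects", []) + record.get("styles", []):
        if term not in keywords:   # de-duplicate while preserving order
            keywords.append(term)
    return caption, keywords[:max_keywords]
```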

Page 40: Making your data work harder than you do
Page 41: Making your data work harder than you do

The Cooliris “wall” of images with captions

Page 42: Making your data work harder than you do
Page 43: Making your data work harder than you do

In general, adapt the XSLT stylesheet and the data dictionary as needed rather than changing the data you produce centrally. The central data should remain consistent with a standard, and you should be able to express it in a standard XML schema as well as through any other stylesheets. Hopefully future tools will ingest data directly from the standard schema.

Page 44: Making your data work harder than you do

Thinking about how to present grouped or complex objects

• Think about this up front so that your cataloging facilitates groupings through the use of data values

• Also think about which data needs to be consistently fielded (including local field structure) to help order and sequence manuscripts and time-based works

• These will require local fields and data dictionary mapping and settings
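Here is a sketch of how consistently fielded grouping and sequence values pay off. The `group_id` and `sequence` fields stand in for hypothetical local fields, and the file names are invented. Images are grouped under their parent record and then put in order by sequence number.

```python
from collections import defaultdict

# Hypothetical image records for manuscript pages and frames of a time-based
# work; "group_id" and "sequence" stand in for local fields.
ITEMS = [
    {"file": "ms_f2r.jpg", "group_id": "MS-1", "sequence": 3},
    {"file": "ms_f1r.jpg", "group_id": "MS-1", "sequence": 1},
    {"file": "film_02.jpg", "group_id": "FILM-7", "sequence": 2},
    {"file": "ms_f1v.jpg", "group_id": "MS-1", "sequence": 2},
    {"file": "film_01.jpg", "group_id": "FILM-7", "sequence": 1},
]

def group_and_order(items):
    """Group items by parent record and sort each group by sequence number."""
    groups = defaultdict(list)
    for item in items:
        groups[item["group_id"]].append(item)
    return {gid: [i["file"] for i in sorted(members, key=lambda i: i["sequence"])]
            for gid, members in groups.items()}
```

Storing the sequence as a number rather than text avoids "10" sorting before "2". This is the kind of decision worth making before cataloging begins.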

Page 45: Making your data work harder than you do

Pragmatic, phased approaches

• Being able to find older records and update them easily and consistently to full Core 4 when it is better supported in tools

• Supplying the data now in some useful form

Page 46: Making your data work harder than you do
Page 47: Making your data work harder than you do

“Collection” record