class 5-introto dl

Digital Libraries Lillian N. Cassel

Upload: madhuvardhan

Post on 01-Nov-2014


Page 1: Class 5-introto dl

Digital Libraries

Lillian N. Cassel

Page 2: Class 5-introto dl

A digital library

• An informal definition of a digital library is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network.

– Wm Arms, Digital Libraries, 1999

• A focused collection of digital objects, including text, video, and audio, along with methods for access and retrieval, and for selection, organization, and maintenance of the collection.

– Witten and Bainbridge, How to Build a Digital Library, 2003

Page 3: Class 5-introto dl

What is a library?

• An active exercise to explore what we know about, and think about, traditional libraries.

• How do we translate these characteristics to the digital world?

– Is that the right model? Are we unnecessarily constraining the digital environment? Are there things that do not translate?

Page 4: Class 5-introto dl

Vannevar Bush

• “As We May Think” (http://www.theatlantic.com/doc/194507/bush)

• Reflecting after WWII

– The value of collaboration

– The sad use of scientific expertise to invent the atomic bomb

– The need for organization and access to information

Page 5: Class 5-introto dl

memex

• Vannevar Bush’s vision

Image sources: kelty.rice.edu/375/images/memex/camera.jpg and http://www.knowledgesearch.org/presentations/etcon/images/memex.gif

Page 6: Class 5-introto dl

MyLifeBits

• Gordon Bell and Microsoft

• http://www.guardian.co.uk/science/story/0,3605,1674359,00.html

“Gordon Bell doesn't need to remember, but has no chance of forgetting. At the age of 71, he is recording as much of his life as modern technology will allow, storing it all on a vast database: a digital facsimile of a life lived.

If he goes for a walk, a miniature camera that dangles from his neck snaps pictures every minute or so, immediately committing the scene to a memory built not of neurons but ones and noughts. If he wanders into a cafe, sensors note the change in light, the shift of temperature and squirrel the information away. Conversations are recorded and steps logged thanks to a GPS receiver carried with him.”

Page 7: Class 5-introto dl

Related work

• Walden’s Path

– http://www.csdl.tamu.edu/walden/

– System used by itself or as a service within a digital library

– Allows a user to make a path through a set of related resources and save the path for reuse at a later time.

• Used to allow a teacher to “blaze a trail” through a collection of materials to help students find their way from a starting point to a goal.

• Also for recording personal trips through a collection of material to be revisited.

How does that compare to a set of bookmarks?

Page 8: Class 5-introto dl

Moving Forward

• Looked at what a library is

• Now

– How do we translate that to a digital entity?

• Information resources, including digital libraries, are very complex systems.

– A formal model helps to capture the essence of the system and give special attention to specific areas

– The model also allows developers of digital libraries to have a checklist of areas to consider and develop well.

Page 9: Class 5-introto dl

The 5S model

• Streams

– The flow of information in various formats

• Structures

– Organizational aspects of the DL

• Spaces

– Views of components; real or abstract images

• Scenarios

– Services and behaviors

• Societies

– Communities and relationships among them

Page 10: Class 5-introto dl

5S summary

Model | Primitives | Formalisms | Objectives
Stream | Text; video; audio; software program | Sequences; types | Describes properties of the DL content, encoding and textual material or particular forms of multimedia data
Structure | Collection; catalog; hypertext; document; metadata; organizational tools | Graphs; nodes; links; labels; hierarchies | Specifies organizational aspects of the DL content
Space | User interface; index; retrieval model | Sets; operations; vector space; measure space; probability space | Defines logical and presentational views of several DL components
Scenarios | Service; event; condition; action | Sequence diagrams; collaboration diagrams | Details the behavior of DL services
Societies | Community; managers; actors; classes; relationships; attributes; operators | Object-oriented modeling constructs; design patterns | Defines managers responsible for running DL services, actors that use those services, and relationships among them

Source: http://www.dlib.vt.edu/projects/5S-Model/

Page 11: Class 5-introto dl

Etana - A DL for archeology

Page 12: Class 5-introto dl

An example application of 5S - Etana: A DL for an archeological site

[Figure: diagram of the five models applied to Etana. Stream model: text, video, audio, drawing, photo, 3D. Structure model: site, partition, sub-partition, container, artifact, locus, region; taxonomies; metadata. Space model: geographic space, metric space, user interface. Scenario model: services (information satisfaction, value added, repository building, domain specific). Society model: archaeologist, general public, service manager. Other labels in the diagram: temporal, spatial, artifact-specific.]

Source: E. A. Fox http://feathers.dlib.vt.edu/

Page 13: Class 5-introto dl

Applying the model, informally

Personal Photos; Movie, TV, media

• Stream - What types of data? GIF, JPG, AVI?

• Structure - How are the elements organized? Is there a hierarchy? Are there multiple structures?

• Spaces - How would you index the items? How would you divide them into related groups?

• Scenarios - What services would you provide? What information do we need to provide those services?

• Societies - Who is the library intended to serve? Remember to include agents and other processes as well as users.

In your group, choose one or the other (photos or movie/TV/media).

Start with stream, scenarios, societies.

Page 14: Class 5-introto dl

More formally: Definitions

• Definition: A stream is a sequence whose codomain is a nonempty set.

• Definition: A structure is a tuple (G, L, F) where G = (V,E) is a directed graph with vertex set V and edge set E, L is a set of label values, and F is a labeling function.
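The two definitions above can be made concrete with a small sketch. This is only an illustration, not part of the 5S formalism: the class name `Structure`, its field names, the `is_valid` check, and the sample document hierarchy are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# A stream: a finite sequence whose values come from a nonempty set
# (here the codomain is the byte values 0..255).
text_stream = list("digital library".encode("utf-8"))


@dataclass
class Structure:
    """A structure (G, L, F): directed graph G = (V, E), label set L, labeling function F."""
    vertices: set
    edges: set                      # set of (source, target) pairs
    labels: set
    labeling: dict = field(default_factory=dict)  # F: vertex or edge -> label

    def is_valid(self) -> bool:
        # Every edge must connect known vertices.
        edges_ok = all(u in self.vertices and v in self.vertices for u, v in self.edges)
        # F may only label things that are in the graph, using labels from L.
        domain_ok = all(k in self.vertices or k in self.edges for k in self.labeling)
        labels_ok = all(lab in self.labels for lab in self.labeling.values())
        return edges_ok and domain_ok and labels_ok


# Hypothetical example: a book that contains two chapters.
doc = Structure(
    vertices={"book", "ch1", "ch2"},
    edges={("book", "ch1"), ("book", "ch2")},
    labels={"document", "chapter", "contains"},
    labeling={
        "book": "document", "ch1": "chapter", "ch2": "chapter",
        ("book", "ch1"): "contains", ("book", "ch2"): "contains",
    },
)
```

The same shape would cover a catalog hierarchy or a hypertext link graph; only the vertices, edges, and labels change.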

Page 15: Class 5-introto dl

Definitions, cont’d

• Definition: A space is a measurable space, measure space, probability space, vector space, topological space, or metric space.

– A vector space is a representation for the set of elements in a collection. The vector representing each element is a set of characteristics held by that element; it both connects that element to others that are similar and distinguishes it from those that are different.

– We will do an exercise to illustrate
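One possible way to try the vector space idea on a computer is sketched below. It assumes that the "characteristics" of an item are simply the words in a short description and that similarity is measured by the cosine of the angle between vectors; both choices, and the sample descriptions, are assumptions for illustration rather than anything prescribed by the model.

```python
import math
from collections import Counter


def to_vector(text: str) -> Counter:
    """Represent an item by its characteristics: here, lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors: 1.0 for the same direction, 0.0 if nothing is shared."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical item descriptions from a small collection.
map_a = to_vector("map of new netherland and new england")
map_b = to_vector("map of new england coast")
poem = to_vector("a poem about autumn")

# The two maps share characteristics; the poem shares none with either map.
print(cosine(map_a, map_b) > cosine(map_a, poem))
```

Items with a high cosine value are the ones the vector representation "connects"; values near zero mark the ones it distinguishes.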

Page 16: Class 5-introto dl

Definitions - 3

• Definition: A scenario is a sequence of related transition events $(e_1, e_2, \ldots, e_n)$ on state set $S$ such that $e_k = (s_k, s_{k+1})$ for $1 \le k \le n$.

– More easily visualized, a scenario is a path in a directed graph, $G = (S, \Sigma_e)$, where vertices correspond to states in the state set $S$ and directed edges are equivalent to events in a set of events, $\Sigma_e$, and correspond to transitions between states.

– Scenarios must be implemented to make a working system.
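The path view of a scenario can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: the function name `is_scenario`, the state names, and the allowed transitions describe a made-up search service and do not come from the 5S papers.

```python
def is_scenario(events, states, transitions):
    """Return True if events form a path in the graph (states, transitions).

    Each event is a pair (s_k, s_{k+1}); consecutive events must chain, so the
    end state of one event is the start state of the next.
    """
    if not events:
        return False
    for src, dst in events:
        if src not in states or dst not in states or (src, dst) not in transitions:
            return False
    return all(events[k][1] == events[k + 1][0] for k in range(len(events) - 1))


# Hypothetical search service: states and the events allowed between them.
states = {"idle", "query_entered", "results_shown", "item_delivered"}
transitions = {
    ("idle", "query_entered"),
    ("query_entered", "results_shown"),
    ("results_shown", "item_delivered"),
    ("results_shown", "query_entered"),   # user refines the query
}

search = [("idle", "query_entered"), ("query_entered", "results_shown"),
          ("results_shown", "item_delivered")]
broken = [("idle", "query_entered"), ("results_shown", "item_delivered")]

print(is_scenario(search, states, transitions), is_scenario(broken, states, transitions))
```

The second sequence uses only allowed transitions but skips a state, so it is not a path and therefore not a scenario.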

Page 17: Class 5-introto dl

Definitions - 4

• Definition: A society is a tuple $(C, R)$ where

– $C = \{c_1, c_2, \ldots, c_n\}$ is a set of conceptual communities, each community referring to a set of individuals of the same class or type (e.g. actors, activities, components, hardware, software, data);

– $R = \{r_1, r_2, \ldots, r_m\}$ is a set of relationships, each relationship being a tuple $r_j = (e_j, i_j)$ where $e_j$ is a Cartesian product $c_{k_1} \times c_{k_2} \times \cdots \times c_{k_{n_j}}$, $1 \le k_1 < k_2 < \cdots < k_{n_j} \le n$, which specifies the communities involved in the relationship, and $i_j$ is an activity.
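A small sketch may help in reading the society definition. The community names, members, activities, and the helper `participants` below are hypothetical; the only part taken from the definition is that a relationship pairs a Cartesian product of communities with an activity.

```python
from itertools import product

# C: communities, each a set of individuals of the same type (hypothetical data).
communities = {
    "patrons": {"alice", "bob"},
    "librarians": {"carol"},
    "services": {"search", "browse"},
}

# R: each relationship is (communities involved, activity).
relationships = [
    (("patrons", "services"), "uses"),
    (("librarians", "services"), "manages"),
]


def participants(relationship, communities):
    """Enumerate e_j, the Cartesian product of the communities in the relationship."""
    names, _activity = relationship
    return set(product(*(sorted(communities[name]) for name in names)))


# All (patron, service) pairs that the "uses" relationship ranges over.
print(sorted(participants(relationships[0], communities)))
```

Note that services appear as a community here, which matches the reminder elsewhere in these slides to include agents and processes as well as human users.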

Page 18: Class 5-introto dl

The Digital Library Content

• Essential elements for a digital library

– Users

– Content

– Services

Page 19: Class 5-introto dl

Content - requirements

• Store

– Organize

– Describe

• Find

• Deliver

Page 20: Class 5-introto dl

Describing the content

• How to describe content

– Metadata

• Machine-readable description of anything

• What description

– Machine readable requires standard descriptive elements

• Dublin Core (http://dublincore.org/)

– International standard

– “a standard for cross-domain information resource description.”

– 15 descriptive elements

• Other metadata schemes

– IEEE-LOM

Page 21: Class 5-introto dl

Metadata

• What does metadata look like?

• Metadata is data about data

– Information about a resource, encoded in the resource or associated with the resource.

• The language of metadata: XML

– eXtensible Markup Language
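To give a feel for what metadata in XML might look like, here is a minimal sketch that builds a small record with Python's standard `xml.etree.ElementTree` module. The Dublin Core namespace URI is the published one; the wrapper element name `metadata`, the function name `dc_record`, and the sample values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)           # serialize with the conventional "dc" prefix


def dc_record(fields: dict) -> str:
    """Build an XML record with one dc:<element> child per (element, value) pair."""
    root = ET.Element("metadata")            # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values for a scanned map.
record = dc_record({"title": "Sample map", "language": "la", "format": "image/gif"})
print(record)
```

Because the element names come from a shared namespace, any program that knows Dublin Core can read the record without knowing anything else about the collection.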

Page 22: Class 5-introto dl

Google Books Project

• Michael A. Keller, Closing Keynote

– Ida M. Green University Librarian at Stanford,

– Director of Academic Information Resources,

– Publisher of HighWire Press, and

– Publisher of the Stanford University Press:

• "One good turn deserves another; how the Google Book Search project is benefiting everyone".

Page 23: Class 5-introto dl

Google Books demo

• Full text - Life of Miguel de Cervantes

• Limited Preview - The Life of Miguel de Cervantes Saavedra

• Snippet View - "Discreción" in the Works of Cervantes: A Semantic Study

Page 24: Class 5-introto dl

What has been accomplished

• As of September 2006

• Nearly 30,000 Stanford books digitized

– ~1M books from all partner libraries

• Over 4,000 books identified as needing preservation treatment (& so not digitized)

• A great debate about copyright has started

– Orphan works

– What can an archive do to provide access

– Defense of fair use underway

This slide is taken from the presentation by Michael A. Keller at ECDL 2006

Page 25: Class 5-introto dl

Original Principles

• If legally possible, digitize every book (9M volumes) in the Stanford libraries

– Now digitizing with imprint dates up to 1963

• Partner libraries (*added recently)

– University of Michigan (similar to Stanford)

– Harvard (public domain (?), maybe > 1M)

– NYPL (public domain, unusual collections)

– Oxford - Bodleian (earlier than 1885, ~ 1M titles)

– University of California (similar to Stanford >6M)

– (more to follow)

This slide is taken from the presentation by Michael A. Keller at ECDL 2006

Page 26: Class 5-introto dl

Purposes

• Digital preservation

– Virtual Bookshelves under construction as part of the Stanford Digital Repository

– For Stanford use only

• Other searching and research functions

– Subtle searching (as in Socrates & HighWire)

– Taxonomic (LCSH & HighWire) & Associative Searching (Takano)

– Citation linking (HighWire & “InforTools” (Ebrary))

– Better navigation (through visualization ?) (Grokker)

• Digitized books from all sources as test bed for new research; combine with articles, datasets, etc. for data mining & other transformative uses.

This slide is taken from the presentation by Michael A. Keller at ECDL 2006

Page 27: Class 5-introto dl

Some Conclusions

• Google Book Search

– Is an indexing, not a publishing project

– Offers substantial increases in access to contents of books in library collections by keyword searching

– Offers publishers global marketing of their publications

– Offers several useful services to readers

• Offers participating libraries

– Digital copies of books on their shelves for preservation

– New possibilities for services to local readers

– New possibilities for research for local faculty & students

This slide is taken from the presentation by Michael A. Keller at ECDL 2006

Page 28: Class 5-introto dl

Google statement

• “Many of the books in Google Book Search come from authors and publishers who participate in our Partner Program. For these books, our partners decide how much of the book is browsable -- anywhere from a few sample pages to the whole book.

• For books that enter Book Search through the Library Project, what you see depends on the book's copyright status. We respect copyright law and the tremendous creative effort authors put into their work. If the book is in the public domain and therefore out of copyright, you can page through the entire book and even download it and read it offline. But if the book is under copyright, and the publisher or author is not part of the Partner Program, we only show basic information about the book, similar to a card catalog, and, in some cases, a few snippets -- sentences of your search terms in context. The aim of Google Book Search is to help you discover books and learn where to buy or borrow them, not read them online from start to finish. It's like going to a bookstore and browsing - with a Google twist.”

http://books.google.com/support/bin/answer.py?answer=43729&topic=9259&hl=en

Page 29: Class 5-introto dl

Other projects

• Open Content Alliance (Yahoo and the Internet Archive)

• The Internet Archive www.archive.org

• The European Digital Library (Growing number of countries)

• others

Comments? Discussion?

Page 30: Class 5-introto dl

A DL example

• Library of Congress American Memory project

– http://memory.loc.gov/ammem/index.html

– “American Memory provides free and open access through the Internet to written and spoken words, sound recordings, still and moving images, prints, maps, and sheet music that document the American experience. It is a digital record of American history and creativity. These materials, from the collections of the Library of Congress and other institutions, chronicle historical events, people, places, and ideas that continue to shape America, serving the public as a resource for education and lifelong learning.”

Page 31: Class 5-introto dl

Dublin Core for a map

• Map found in the LOC American Memory collection

– Map at http://memory.loc.gov/ammem/gmdhtml/gmdhome.html

• Dublin Core metadata illustration found at http://webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

– Part of a DL course at U. of Alabama

Page 32: Class 5-introto dl

Go to web site to explore what is there -- including copyright information, title, history, etc.

Page 33: Class 5-introto dl

Dublin Core: Title

• Name given, usually by the creator or publisher

<META name="DC.Title"
      content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata"
      lang="la">

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
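Tags like the one above are tedious to write by hand, and a stray quotation mark breaks them. The following is a rough sketch of generating them programmatically; the function name `dc_meta` and its parameters are assumptions for illustration, and `html.escape` from the standard library handles quoting of attribute values.

```python
from html import escape


def dc_meta(element: str, content: str, scheme: str = "", lang: str = "") -> str:
    """Render one Dublin Core <META> tag, escaping every attribute value."""
    attrs = [("name", f"DC.{element}"), ("content", content)]
    if scheme:
        attrs.append(("scheme", scheme))
    if lang:
        attrs.append(("lang", lang))
    rendered = " ".join(f'{key}="{escape(value, quote=True)}"' for key, value in attrs)
    return f"<META {rendered}>"


tag = dc_meta("Title", "Novi Belgii Novæque Angliæ", lang="la")
print(tag)
```

The optional `scheme` argument carries the name of the vocabulary or format the value follows, in the way the Subject and Date examples in these slides use it.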

Page 34: Class 5-introto dl

Dublin Core: Subject

• What the work is about, possibly keywords, terms from classification scheme if available.

<META name="DC.Subject"
      content="Middle Atlantic States - Maps - Early works to 1800 - Facsimiles"
      scheme="LCSH">

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

LCSH = Library of Congress Subject Headings

Page 35: Class 5-introto dl

Dublin Core: Description

• Free text description, abstract, etc.

<META name="DC.Description"
      content="An (sic) historical map showing the coast of New Jersey as perceived in the seventeenth century">

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

Page 36: Class 5-introto dl

Dublin Core: Source

• Is this object derived from another? Is this map a part of a larger map? Is this text a variation or revision of another piece of text?

<META name="DC.Source" content="G3715 1685 .V5 1969" scheme="LCCN">

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

LCCN = Library of Congress Call Number

Page 37: Class 5-introto dl

Dublin Core: Language

• Language of the content of the resource

• For the map, there is no language content

<META name="DC.Language" content="nl">

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

Page 38: Class 5-introto dl

Dublin Core: Relation

• To what other object(s) or collection is this object related? Does it also exist in another collection? Is it derived from another document or image? How is it related?

<META name="DC.Relation"
      content="isPartOf http://lcweb2.loc.gov/cgi-bin/query/r?ammem/gmd:@filreq(@field(NUMBER+@band(g3715+ct000001))+@field(COLLID+dsxpmap))">

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

Page 39: Class 5-introto dl

Dublin Core: Creator

• Person or organization responsible for the Intellectual Content of this object

<META name="DC.Creator" content="Nicolaum Visscher">

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

Page 40: Class 5-introto dl

Dublin Core: Publisher

• Entity responsible for making the resource available in its present form

• Not shown in the example, but should be something like this:

<META name="DC.Publisher" content="Library of Congress American Memory Project">

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

Page 41: Class 5-introto dl

Dublin Core: Contributor

• Any entity making a contribution to this object.

• Example: someone who added some information to the original document or image

• No entry for this map.

Page 42: Class 5-introto dl

Dublin Core: Rights

• A pointer to a copyright notice, a rights management statement, or a rights server.

<META name="DC.Rights"
      content="http://lcweb2.loc.gov/cgi-bin/ammemrr.pl?title=%3ca%20href%3d%22%2fammem%2fgmdhtml%2fdsxphome.html%22%3eDiscovery%20and%20Exploration%3c%2fa%3e&coll=gmd&div=&agg=g3715&default=ammem&dir=ammem">

Page 43: Class 5-introto dl

Dublin Core: Date

• Date on which this object was made available in its present form, possibly the date it was entered into this digital collection.

<META name="DC.Date" content="1996-04-17" scheme="ISO 8601">

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

Specify the date format so that others can interpret it correctly
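A short sketch shows why naming the date format matters. It uses only the standard `datetime` module; the function name `parse_dc_date` and the ambiguous sample string are assumptions made for the example.

```python
from datetime import date, datetime


def parse_dc_date(content: str, scheme: str) -> date:
    """Parse a DC.Date value, accepting only the scheme this sketch understands."""
    if scheme != "ISO 8601":
        raise ValueError(f"unsupported date scheme: {scheme!r}")
    return date.fromisoformat(content)       # expects YYYY-MM-DD


entered = parse_dc_date("1996-04-17", "ISO 8601")

# Without a declared scheme the same string can be read two ways.
ambiguous = "04/05/1996"
as_month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()   # 5 April 1996
as_day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()     # 4 May 1996
print(entered, as_month_first, as_day_first)
```

With the scheme stated, a harvester in any country reads 1996-04-17 as 17 April 1996; without it, the reader has to guess.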

Page 44: Class 5-introto dl

Dublin Core: Type or Category

• What sort of thing is this? Some examples: home page, novel, poem, working paper, technical report, essay, dictionary, …

• Type should be selected from a controlled list. For example, see the DCMI Type Vocabulary:

• http://dublincore.org/documents/2006/08/28/dcmi-type-vocabulary/

Why is this recommended as a controlled vocabulary field?

Page 45: Class 5-introto dl

DCMI Type Vocabulary

• Collection

• Dataset

• Event

• Image

• InteractiveResource

• MovingImage

• PhysicalObject

• Service

• Software

• Sound

• StillImage

• Text

See the official page for explanations of the categories. Note that Image is a broad category and MovingImage and StillImage are more restricted subcategories.
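One practical benefit of a controlled list is that software can check values against it. The sketch below hard-codes the twelve terms listed above; the function names `check_type` and `is_image` are assumptions for illustration and are not part of any DCMI tooling.

```python
# The twelve terms of the DCMI Type Vocabulary as listed on this slide.
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource", "MovingImage",
    "PhysicalObject", "Service", "Software", "Sound", "StillImage", "Text",
})

# Narrower terms under the broad Image category, as noted on the slide.
IMAGE_SUBTYPES = frozenset({"MovingImage", "StillImage"})


def check_type(value: str) -> str:
    """Return the value unchanged if it is a vocabulary term, otherwise raise ValueError."""
    if value not in DCMI_TYPES:
        raise ValueError(f"{value!r} is not in the DCMI Type Vocabulary")
    return value


def is_image(value: str) -> bool:
    """True for Image itself and for its narrower terms."""
    return check_type(value) == "Image" or value in IMAGE_SUBTYPES


print(is_image("StillImage"), is_image("Text"))
```

A free-text value such as "Photo" or "picture" would be rejected here, which is exactly the kind of variation a controlled vocabulary is meant to remove before records are shared or searched.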

Page 46: Class 5-introto dl

Dublin Core: Type

• Category of this resource

<META name="DC.Type" content="image.photograph">

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

Page 47: Class 5-introto dl

Dublin Core: Format

• The way the content is encoded. This tells what resource is needed to access this content.

<META name="DC.Format" content="image/gif" scheme="IMT">

Internet MIME Types: http://www.ltsw.se/knbase/internet/mime.htp

See also Internet Media Type: http://www.graphcomp.com/info/specs/mime.html
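The format value does not have to be typed by hand. As a minimal sketch, Python's standard `mimetypes` module can guess the Internet media type from a file name; the function name `dc_format` and the file names are assumptions for the example.

```python
import mimetypes


def dc_format(filename: str) -> str:
    """Guess the Internet media type for a file name, for use as a DC.Format value."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        raise ValueError(f"cannot determine media type for {filename!r}")
    return mime


print(dc_format("g3715_ct000001.gif"))
```

The guess is based on the file extension only, so a production system would likely also inspect the file contents.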

Page 48: Class 5-introto dl

Dublin Core: Unique ID

• The key for this object in the collection.

• I cannot find one for the map we are looking at, but the ID for the map of which it is a part is g3715 ct000001

• The metadata specification for that would be

<META name="DC.Id" content="g3715 ct000001">

Source: http://memory.loc.gov/cgi-bin/query/r?ammem/gmd:@filreq(@field(NUMBER+ @band(g3715+ct000001))+@field(COLLID+dsxpmap))

Page 49: Class 5-introto dl

Dublin Core: Coverage

• The time, space or other measurement of the scope or completeness of the object.

• No coverage entry specified, but might be this:

<META name="DC.Coverage"
      content="North America, Eastern lands and coast, as viewed in late seventeenth century">

The example does not use a controlled vocabulary. Why would a controlled vocabulary be better?

Page 50: Class 5-introto dl

International Consensus

• Recognition of International Scope of Resource Discovery on Web

• 17 Countries Currently Involved in DC Working Groups

• 50+ Implementation Projects in 10 Countries

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

Page 51: Class 5-introto dl

Guide to Good Practice

• The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials

• http://www.nyu.edu/its/humanities/ninchguide/index.html

Page 52: Class 5-introto dl

Legal and Technical Issues

• Legal: When is a resource available to digitize and make available? What requirements exist for controlling access?

• Technical: How do we control access to a resource that is stored online?

– Policies

– Encoding

– Distribution limitations

Page 53: Class 5-introto dl

Date of work | Protected from | Term
Created 1-1-78 or after | When work is fixed in tangible medium of expression | Life + 70 years (or, if work of corporate authorship, the shorter of 95 years from publication or 120 years from creation)
Published before 1923 | In public domain | None
Published 1923 - 63 | When published with notice | 28 years + could be renewed for 47 years, now extended by 20 years for a total renewal of 67 years. If not so renewed, now in public domain
Published from 1964 - 77 | When published with notice | 28 years for first term; now automatic extension of 67 years for second term
Created before 1-1-78 but not published | 1-1-78, the effective date of the 1976 Act which eliminated common law copyright | Life + 70 years or 12-31-2002, whichever is greater
Created before 1-1-78 but published between then and 12-31-2002 | 1-1-78, the effective date of the 1976 Act which eliminated common law copyright | Life + 70 years or 12-31-2047, whichever is greater

Chart created by Lolly Gasaway. Updates at

http://www.unc.edu/~unclng/public-d.htm

Page 54: Class 5-introto dl

Works for hire

• Usual case -- works created by faculty are not the property of the university.

– Faculty surrender copyright to publishers of journals and books

– Some publishers allow faculty to retain copyright, giving the publisher specific limited rights to reproduce and distribute the work.

Page 55: Class 5-introto dl

Fair use

• No clear, easy answers.

• Checksheet provided in the article is a good guide to the issues.

• Link to the checksheet: http://www.copyright.iupui.edu/checklist.htm

Page 56: Class 5-introto dl

Moral rights

• Fair to the creator

– Keep the identity of the creator of the work

– Do not cut the work

– Generally, be considerate of the person (or institution) that created the work.

Page 57: Class 5-introto dl

Getting Permission

• With the best will in the world, getting the appropriate permissions is not always easy.

– Identify who holds the rights

– Get in touch with the rights holder

– Get a suitable agreement to cover the needs of your use.

• Useful links: http://www.loc.gov/copyright/ and http://www.utsystem.edu/OGC/IntellectualProperty/PERMISSN.HTM

– Connections to various ways to discover and contact the rights holder of a work.

Page 58: Class 5-introto dl

Checking copyright status

Source: NINCH Guide to Good Practice. Chapter 4: Rights Management

Page 59: Class 5-introto dl

Considering people depicted in the work

Source: NINCH Guide to Good Practice. Chapter 4: Rights Management. Copyright: Lauryn G. Grant

Page 60: Class 5-introto dl

Technical issues

• Link the resource to the copyright statements

• Maintain that link when the resource is copied or used

• Approaches:

– Steganography

– Encryption

– Digital Wrappers

– Digital Watermarks

Page 61: Class 5-introto dl

Issues in Encryption

• General cases for protection of controlled content: Concern for passive listening, active interference.

– Listening: intruder gains information, may not be detected. Effects indirect.

– Active interference

• Intruder may prevent delivery of the message to the intended recipient.

• Intruder may substitute a fake message for the intended one

• Effects are direct and immediate

• Less likely in the case of digital library content
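As a rough illustration of how substitution of a fake message can at least be detected, the sketch below uses a keyed message authentication code (HMAC) from Python's standard library. HMAC is not named on the slide; it is one common technique chosen here as an assumption, and the key, function names, and sample content are hypothetical. It does nothing against passive listening or blocked delivery.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag that the sender ships along with the content."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time; False means altered or substituted."""
    return hmac.compare_digest(sign(message, key), tag)


# Hypothetical shared secret and content; a real key would be generated and stored securely.
key = b"shared-secret-between-library-and-reader"
original = b"Page 1 of the requested document"
tag = sign(original, key)

print(verify(original, tag, key))                      # genuine content
print(verify(b"Substituted fake page", tag, key))      # substituted content
```

An intruder who does not hold the key cannot produce a matching tag for a substituted message, so the recipient can reject it.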