ARTstor A digital library of online collections for Education, research and scholarship

Digital Art History Workshop Malaga, Spain

September 24, 2011

Dr. William Ying, CIO and VP of Technology

ARTstor

Willem van de Velde I: Calm Sea, Alte Pinakothek (Munich, Germany); Scala/Art Resource

Application of a KOS in ARTstor and Shared Shelf: A Digital Library and a Networked Image Cataloguing and Management Solution

Essential to the successful implementation and use of any digital library is the organization of that library by one or more knowledge organization systems (KOS). KOS include classification and categorization schemes that organize materials at a general level, subject headings that provide more detailed access, and authority files that control variant versions of key information. KOS also include highly structured vocabularies, such as thesauri, and less traditional schemes, such as semantic networks and ontologies. This presentation will explore how the ARTstor Digital Library has, over the years, improved its usefulness to the scholarly community by applying increasingly sophisticated knowledge organization systems.

ARTstor Digital Library Today

• Founded by The Andrew W. Mellon Foundation in 2001

• Independent nonprofit organization since 2003

• Launched on July 1, 2004

• 1.3 million images and growing

• 200+ museum & special collections

• Museums, archives, libraries, artists, artists’ estates, scholars, photographers

• Password-protected database restricted to educational and scholarly users

ARTstor is a repository of aggregated collections

ARTstor is a network of educational and scholarly users

ARTstor serves 1,350+ subscribing educational institutions and museums in 45 countries

ARTstor is a workspace for research and teaching

Research is not only the work of scientists. It is the work of any thoughtful person who exercises any profession….The Andrew W. Mellon Foundation will continue to support research in the humanities and the arts in the belief that research as an activity is what we mean when we say that every mind should steadily be engaged in making “shape, order, meaning, purpose, where there was none, or none discernible.”

-Don M. Randel, President, The Andrew W. Mellon Foundation, March 2011

Commitment to research

Curated collections for scholarship

ARTstor keyword search – “Hercules”

Differentiating general interest content from scholarly content

Google images search – “Hercules”

ARTstor collection types

• ARTstor collection - 200+ museum & special collections

• Hosted Institutions collections – collections maintained and cataloged locally by participating institutions and ingested and hosted in ARTstor

• Personal collection – images uploaded and metadata maintained in ARTstor by individual users

Yearly ARTstor repository growth by content type

Content type                        2004       2005       2006       2007       2008       2009       2010   2011-Aug
ARTstor Collection               239,148    447,120    491,807    699,762    839,747  1,087,149  1,274,013  1,334,258
Hosted Institution Collection     80,963    136,270    448,491  1,046,191  1,631,812  2,287,324  2,702,551  2,793,883
Instructor Only IC                     -          -     27,748     37,269     43,817     53,291     63,255     65,611
Personal Collection                1,100      8,592     66,487    181,107    291,531    398,393    486,134    529,310
Total                            321,211    591,982  1,034,533  1,965,159  2,807,737  3,826,987  4,526,777  4,723,886

How can we help users find what they want?

• Challenges:
– Diversified collections coming to us without uniform metadata
– Metadata completeness differs tremendously between collection types
– No standard way to name an artwork or creator

Using Title to find records in ADL

• ARTstor now uses “exact matches” for search
– Jesus Carrying the Cross (The Way to Calvary)*
– The Procession to Calvary
– The Hunters in the Snow, also known as The Return of the Hunters*
– Actually, we can use either title above to find the painting because we have both images, with different titles, in a cluster.

• Would “fuzzier” search help? Match 3 out of 4 words, etc.?
– Un Enterrement à Ornans
– Funeral at Ornans
– A work registry with foreign titles would help!

• Disambiguate simple search results
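
The “match 3 out of 4 words” idea can be sketched as a word-overlap test. This is only an illustration, not how ARTstor search works: the function names, the accent stripping, and the 0.75 threshold are assumptions made for the example.

```python
import re
import unicodedata


def tokens(title: str) -> set:
    """Lowercase a title, strip accents, and split it into a set of words."""
    decomposed = unicodedata.normalize("NFKD", title)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return set(re.findall(r"[a-z0-9]+", unaccented.lower()))


def fuzzy_match(query: str, title: str, threshold: float = 0.75) -> bool:
    """True when at least `threshold` of the query words occur in the title."""
    query_words = tokens(query)
    if not query_words:
        return False
    shared = query_words & tokens(title)
    return len(shared) / len(query_words) >= threshold


# 3 of the 4 query words ("the", "to", "calvary") occur in the other title.
print(fuzzy_match("The Procession to Calvary",
                  "Jesus Carrying the Cross (The Way to Calvary)"))  # True
# Only "ornans" is shared, so word overlap cannot bridge a translated title.
print(fuzzy_match("Funeral at Ornans", "Un Enterrement à Ornans"))  # False
```

The second call shows why a work registry with foreign titles is still needed: word overlap does nothing for translations.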

Best kept secret of ARTstor

• The more you type in the search box, the less likely you are to find the artwork you want!

Knowledge Organization for Digital Libraries

Essential to the successful implementation and use of any digital library is the organization of that library, either directly or indirectly, by one or more knowledge organization systems (KOS).

Knowledge organization systems include

1. classification and categorization schemes that organize materials at a general level,

2. subject headings that provide more detailed access, and

3. authority files that control variant versions of key information such as geographic names and personal names.

How is ARTstor using Knowledge Organization Systems?

1. Simple search
2. Advanced search – Title, Creator, Date, Classification, Geography
3. Faceted search – Classification, Date, Geography
4. Browse – Geography, Classification
5. Topics – ARTstor created, User created
6. Cluster and Associated Images

In the beginning

• ARTstor started with 400,000 images

• One image – one metadata record

• NO KOS!!!

• Plain old simple keyword search (POSKS)

We even tried “advanced search” and it was a disaster!

• We even allowed our users to do “advanced search” on many of the fields in the metadata record, such as “material” and “date”, which made no sense since the data were not normalized. The material could be O/C, oil on canvas, or 20 other combinations!

• It was so bad that we had to remove the “Advanced search” feature!
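
To make the normalization problem concrete, here is a minimal sketch of mapping free-text material strings onto one controlled term. The variant table is hypothetical and deliberately tiny; the slides do not describe how (or whether) ARTstor normalized this field.

```python
from typing import Optional

# Hypothetical variant table; a real one would be curated by cataloguers.
MATERIAL_VARIANTS = {
    "o/c": "oil on canvas",
    "oil/canvas": "oil on canvas",
    "oil on canvas": "oil on canvas",
}


def normalize_material(raw: str) -> Optional[str]:
    """Map a free-text material string to a controlled term, or None if unknown."""
    # Lowercase, collapse runs of whitespace, and drop a trailing period.
    key = " ".join(raw.lower().split()).rstrip(".")
    return MATERIAL_VARIANTS.get(key)


print(normalize_material("O/C"))                # oil on canvas
print(normalize_material("  Oil  on Canvas."))  # oil on canvas
print(normalize_material("tempera"))            # None
```

Searching on the raw field only works if every record happens to use the same spelling, which is why an advanced search over unnormalized data failed.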

Next, the beginning of KOS: we started to “organize” our metadata and add “knowledge”

• Added 16 classifications: While no one standard served as the basis for this classification scheme, the Metadata team consulted the Getty Art & Architecture Thesaurus (AAT), among other vocabularies, when formulating these categories.

• Architecture and City Planning
• Decorative Arts, Utilitarian Objects and Interior Design
• Drawings and Watercolors
• Fashion, Costume and Jewelry
• Film, Audio, Video and Digital Art
• Garden and Landscape
• Graphic Design and Illustration
• Humanities and Social Sciences
• Manuscripts and Manuscript Illuminations
• Maps, Charts and Graphs
• Paintings
• Performing Arts (including Performance Art)
• Photographs
• Prints
• Science, Technology and Industry
• Sculpture and Installations

Date and Geography

• Creation date in an ARTstor record is free text

• A special algorithm and a manual process are used to create numeric earliest and latest dates

• Standardized geographic terms have been applied to the descriptive data for ARTstor images, according to a controlled list of country names based on the Getty Thesaurus of Geographic Names (TGN). Geographic terms were assigned according to two different criteria: For site-specific works (architecture, mural painting, public monuments, etc.), the country term was assigned based on the location of the work. For objects now in repositories, country terms were assigned on the basis of the nationality of the creator. In cases where these two criteria overlapped (e.g. an American architect’s preparatory drawing for a building built in Spain), the records were assigned two country terms.
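
A rough sketch of turning a free-text creation date into numeric earliest and latest years follows. The slides do not describe the actual algorithm; the patterns handled, the function name, and the ten-year margin for “circa” dates are all assumptions, and anything unrecognized is left for the manual process.

```python
import re
from typing import Optional, Tuple

CIRCA_MARGIN = 10  # assumed uncertainty, in years, for "ca." dates


def date_range(text: str) -> Optional[Tuple[int, int]]:
    """Return (earliest, latest) years for a few common free-text patterns."""
    t = text.strip().lower()

    # "16th century" -> (1501, 1600)
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th) century", t)
    if m:
        century = int(m.group(1))
        return (century - 1) * 100 + 1, century * 100

    # "1503-1506" (hyphen or en dash)
    m = re.fullmatch(r"(\d{3,4})\s*[-–]\s*(\d{3,4})", t)
    if m:
        return int(m.group(1)), int(m.group(2))

    # "ca. 1500", "c. 1500", "circa 1500"
    m = re.fullmatch(r"(?:ca\.?|c\.|circa)\s*(\d{3,4})", t)
    if m:
        year = int(m.group(1))
        return year - CIRCA_MARGIN, year + CIRCA_MARGIN

    # A plain year such as "1849"
    if re.fullmatch(r"\d{3,4}", t):
        return int(t), int(t)

    return None  # not recognized: leave for manual review


print(date_range("1503-1506"))     # (1503, 1506)
print(date_range("ca. 1500"))      # (1490, 1510)
print(date_range("16th century"))  # (1501, 1600)
```

BCE dates, seasons, and qualifiers such as “early” or “late” are not handled here.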

Collection ranking

• Every image record in a collection is assigned a collection ranking

• Images that have high ranking will show up first in the search result page

Finally, the beginning of Meaningful Concept Display

• Sorting in search result page: Date, Relevancy (collection ranking)

• Advanced search
– Title and/or creator
– Date range
– Geography
– Classification

• Dynamic Filtered search
– Classification
– Geography
– Date
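
Once classification, country, and numeric dates exist on every record, filtering and sorting become simple. The sketch below assumes hypothetical record fields (`classification`, `countries`, `earliest`, `latest`, `rank`) and assumes that a larger `rank` means a higher collection ranking; neither is confirmed by the slides.

```python
from typing import Dict, List, Optional


def filter_records(
    records: List[Dict],
    classification: Optional[str] = None,
    country: Optional[str] = None,
    date_from: Optional[int] = None,
    date_to: Optional[int] = None,
) -> List[Dict]:
    """Apply the chosen facets, then sort by collection ranking and date."""
    kept = []
    for r in records:
        if classification is not None and r["classification"] != classification:
            continue
        if country is not None and country not in r["countries"]:
            continue
        # Keep a record when its [earliest, latest] span overlaps the range.
        if date_from is not None and r["latest"] < date_from:
            continue
        if date_to is not None and r["earliest"] > date_to:
            continue
        kept.append(r)
    # Higher ranking first; ties broken by earliest date.
    return sorted(kept, key=lambda r: (-r["rank"], r["earliest"]))


sample = [
    {"id": 1, "classification": "Paintings", "countries": ["Italy"],
     "earliest": 1503, "latest": 1506, "rank": 2},
    {"id": 2, "classification": "Paintings", "countries": ["France"],
     "earliest": 1849, "latest": 1850, "rank": 5},
    {"id": 3, "classification": "Sculpture and Installations", "countries": ["Italy"],
     "earliest": 1501, "latest": 1504, "rank": 9},
]
hits = filter_records(sample, country="Italy", date_from=1500, date_to=1505)
print([r["id"] for r in hits])  # [3, 1]
```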

Knowledge from Image Groups created by user

• Simple IG – active sharing by browsing
Public folder, Institution folder, password-protected folder, etc.

• Describe IG – active and searchable inside an institution

• Describe IG – searchable and shared across institutions (future)

Browsing

• We can also browse by the newfound “knowledge”
1. Geography
2. Classification

Browse by Featured Groups – knowledge created by ARTstor and Scholars

• In each Sample Topic group, you will find iconic images mixed with other selections that are meant to trigger new ideas and provoke deeper research. Each Sample Topic is intended to be an inclusive, rather than comprehensive, introduction to a particular subject or discipline. We encourage you to use these Sample Topics to search and browse for more images related to these and other subjects in ARTstor.

• The Travel Award winning groups are made available as excellent, user-contributed examples of integrating ARTstor images into teaching and research. We hope these groups will illustrate and inspire the cross-disciplinary application of ARTstor images.

Clustered images

• Through a lot of BST (blood, sweat and tears), images of the same work, whether duplicates or details, have been clustered behind a preferred image.

Associated images (group knowledge)

• We decided to follow Amazon’s approach: item-to-item collaborative filtering.

• We use implicit data collection: instead of external data, we use data captured in image groups created by instructor-grade users. We assume that the images in each individual image group are related to each other.

• We attempt to help our users find images related to the one they are interested in.

How did we do it?

• First, we pick all images that appear in at least X Image Groups created by instructor-grade users.

• Next, for each of these images, we form a cluster that includes all images that appear in at least Y Image Groups with this image.

• Next, we combine all images that belong to the same duplicate cluster together. Different users may use different versions of the Mona Lisa.

• Next, we rank all images inside a cluster by how many times each image appears with the master image.

• Next, we exclude all clusters that have fewer than Z members.

• We come up with all these collaborative filtering clusters.
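
The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not ARTstor's implementation: the function and parameter names are invented, duplicates are collapsed to their preferred image before counting (the slide lists this as a later step), and the master image itself is not counted toward the Z members.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Set


def associated_images(
    groups: Iterable[Set[str]],
    duplicate_of: Dict[str, str],
    x: int,
    y: int,
    z: int,
) -> Dict[str, List[str]]:
    """Map each qualifying master image to its associated images, best first."""
    # Collapse duplicates so different versions of one work count as one image.
    canon = [{duplicate_of.get(img, img) for img in g} for g in groups]

    appearances = Counter(img for g in canon for img in g)
    co = defaultdict(Counter)  # co[a][b] = number of groups containing both
    for g in canon:
        for a, b in permutations(g, 2):
            co[a][b] += 1

    clusters = {}
    for master, n in appearances.items():
        if n < x:  # the master must appear in at least X groups
            continue
        # Keep images sharing at least Y groups with the master, ranked by
        # co-occurrence count (ties broken by image id).
        ranked = sorted(
            ((cnt, other) for other, cnt in co[master].items() if cnt >= y),
            key=lambda pair: (-pair[0], pair[1]),
        )
        if len(ranked) >= z:  # drop clusters with fewer than Z members
            clusters[master] = [other for _, other in ranked]
    return clusters


image_groups = [
    {"mona_a", "last_supper", "david"},
    {"mona_b", "last_supper"},          # mona_b is a duplicate of mona_a
    {"mona_a", "last_supper", "pieta"},
    {"david", "pieta"},
]
print(associated_images(image_groups, {"mona_b": "mona_a"}, x=3, y=2, z=1))
# {'mona_a': ['last_supper'], 'last_supper': ['mona_a']}
```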

Name authority

• We have created a system that uses ULAN (the Getty Union List of Artist Names) to normalize “creator” in the 1.4 million records in ARTstor. We are only 30% finished, and hundreds of thousands of new records are added to ARTstor every year.

• The problem is that we cannot use ULAN as a major search facet in ARTstor, since most of the records would not be discoverable this way.

• Instead, we have added variant names from ULAN to ARTstor records that have been matched.

Michelangelo and his variant names from ULAN

• Buonarroti, Michelangelo (preferred,V,index)
• Michelangelo Buonarroti (V,display)
• Michelangelo Buonarotti; Michelangelo; Michelagnolo di Lodovico Buonarroti Simoni; Michelagniolo di Lodovico de Lionardo di Buonarroto Simoni; Michelagniolo di Lodovico di Lionardo di Buonarroto Simoni; Buonarroti, Michel Angelo; Buonarroti, Michelagniolo; Bonarroti, Michelangelo; Bonorotti, Michelangelo; Buonarota, Michelangelo; Michael Angelo Buonaroti; Michael Angelo Buonarotti; Michelagniolo Buonarroti; Michelagnolo Buonarotti; Michelagnolo Buonarruoti; Michelange Bonaroti; Michelang. o Bonarota; Michel Angel de Bonarrotta; Michelangelo Bonarota; Michelangelo Bonaroti; Michelangelo Buonarota; Michelangelo Buonaroti; Michelangelo Buonarrota; Michelangelo Buonnaroti; Michelangiolo Buonaroti; Micheleangelo Buonarota; Michel Angelo; Michel'Angelo; Michael Angelo; Michel Ange; Michel-Ange; Michel Aniol; Mighelagnolo; Miguel Angelo; Mikelandzhelo; Mikel-Andzhelo; Mikilanjilu
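
A small sketch of why adding variant names to matched records helps: every name, original or variant, points at the record, while records not yet matched to ULAN stay findable under their own creator string. The record layout and field names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical records; "ulan_variants" holds names copied from a matched ULAN entry.
records = [
    {"id": 1, "creator": "Michelangelo Buonarroti",
     "ulan_variants": ["Buonarroti, Michelangelo", "Michel-Ange", "Miguel Angelo"]},
    {"id": 2, "creator": "Michel Angelo", "ulan_variants": []},  # not yet matched
]


def build_index(recs: List[Dict]) -> Dict[str, Set[int]]:
    """Index each record under its creator string and every added variant."""
    index: Dict[str, Set[int]] = {}
    for r in recs:
        for name in [r["creator"], *r["ulan_variants"]]:
            index.setdefault(name.casefold(), set()).add(r["id"])
    return index


def search_creator(index: Dict[str, Set[int]], name: str) -> Set[int]:
    """Return ids of records whose creator or variant names equal `name`."""
    return index.get(name.casefold(), set())


creator_index = build_index(records)
print(search_creator(creator_index, "michel-ange"))    # {1}
print(search_creator(creator_index, "Michel Angelo"))  # {2}
```

Record 2 shows the limit of the approach: until it is matched to ULAN, it is found only by the exact spelling its cataloguer used.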

Using other external ontology

• From Freebase's ontology, we can further extract:
– Period name, start and end dates
– Creator names, birth and death dates, belonging to a period

External data source

• Once the two data sources were worked out, Freebase's creator data was mapped to ULAN creator data.

• Thus, creators were grouped by period.
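
The mapping can be pictured as a join on creator names followed by a group-by on period. The data below are small illustrative stand-ins, not actual Freebase or ULAN exports, and joining on case-folded names is an assumption; the slides do not say how the two sources were matched.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative stand-ins for the two sources; field names are assumptions.
freebase_creators = [
    {"name": "Michelangelo", "born": 1475, "died": 1564, "period": "High Renaissance"},
    {"name": "Gustave Courbet", "born": 1819, "died": 1877, "period": "Realism"},
]
ulan_names = {  # ULAN preferred name -> variant names
    "Buonarroti, Michelangelo": ["Michelangelo", "Michel-Ange"],
    "Courbet, Gustave": ["Gustave Courbet"],
}


def group_by_period(fb: List[Dict], ulan: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map Freebase creators to ULAN preferred names, grouped by period."""
    preferred_for = {
        name.casefold(): preferred
        for preferred, variants in ulan.items()
        for name in [preferred, *variants]
    }
    by_period = defaultdict(list)
    for creator in fb:
        preferred = preferred_for.get(creator["name"].casefold())
        if preferred is not None:  # unmatched creators are skipped
            by_period[creator["period"]].append(preferred)
    return dict(by_period)


print(group_by_period(freebase_creators, ulan_names))
# {'High Renaissance': ['Buonarroti, Michelangelo'], 'Realism': ['Courbet, Gustave']}
```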

Freebase, ULAN and ARTstor relationship

[Diagram: Freebase (Period and Creator), ULAN (Creator), ARTstor records; links labeled “By computer” and “By hand”]