Discovering and Describing DPLA Collections
Students: Madhura Parikh, Zhang Zhang
Karen Wickett, Unmil P. Karadkar
School of Information
The University of Texas at Austin
in the portal
in the data
So what?
• Collection-level entities and collection descriptions can support a range of functions:
  – representing data providers
  – providing context for items
  – managing and presenting search results
  – assessing relevance and accessibility
  – supporting the contribution of collections by users
Modeling Cultural Collections for Digital Aggregation and Exchange Environments. CIRSS Technical Report 201310-1, University of Illinois at Urbana-Champaign.
Approach
• Based on collection/item propagation rules
  – link item-level attribute/value pairs to collection-level attribute/value pairs
    • collection attributes from a collection-level schema
    • item attributes from DPLA's Metadata Application Profile
  – in general, allow reasoning in either direction
  – we are experimenting with building descriptions of collections, using:
    • descriptions of items
    • collection membership
    • a guiding propagation rule
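A minimal sketch of what "reasoning in either direction" could look like, assuming items and collections are plain Python dicts. Both rules are hypothetical illustrations, not the project's rule set: an upward rule derives a collection's begin and end dates from integer item years, and a downward rule lets items with no `rights` key inherit the collection's rights statement. The keys `date`, `rights`, `date.begin`, and `date.end` are assumed names.

```python
from typing import Any

Record = dict[str, Any]


def propagate_dates_up(collection: Record, items: list[Record]) -> Record:
    """Hypothetical upward rule: item years -> collection begin/end years."""
    years = [item["date"] for item in items if isinstance(item.get("date"), int)]
    if years:
        collection["date.begin"] = min(years)
        collection["date.end"] = max(years)
    return collection


def propagate_rights_down(collection: Record, items: list[Record]) -> list[Record]:
    """Hypothetical downward rule: items with no 'rights' key inherit the collection's."""
    rights = collection.get("rights")
    if rights:
        for item in items:
            item.setdefault("rights", rights)  # existing item-level rights are kept
    return items
```

Keeping each rule as a small function over one attribute makes it easy to add or swap rules field by field.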
Collection-level properties
• Collection title
• Collection description
• Begin date
• End date
• Geographic boundary
• Places
• Subjects
• Formats
• Languages
• Genres
• Rights
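One way to hold these properties in code is a simple record type. The sketch below is an assumption, not the project's schema: the Python field names are invented, dates are reduced to years, and the geographic boundary is stored as a list of `[longitude, latitude]` pairs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CollectionRecord:
    """Hypothetical container for the collection-level properties listed above."""
    title: str
    description: str = ""
    begin_date: Optional[int] = None  # year only, for simplicity
    end_date: Optional[int] = None
    geographic_boundary: list[list[float]] = field(default_factory=list)  # [lon, lat] pairs
    places: list[str] = field(default_factory=list)
    subjects: list[str] = field(default_factory=list)
    formats: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)
    genres: list[str] = field(default_factory=list)
    rights: list[str] = field(default_factory=list)


rec = CollectionRecord(title="Minnesota Newspapers Collection", begin_date=1867, end_date=2009)
```

Using `default_factory` gives every record its own empty lists instead of one list shared by all instances.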
Approach
• Take data from the DPLA based on collection membership
  – e.g. all items in the "Minnesota Newspapers Collection"
• Pick a target collection-level field
  – e.g. dc:subject
• Identify source data fields in item records
  – e.g. dc:subject and dc:description
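These first steps amount to filtering items by collection membership and pulling values from the chosen source fields. This sketch assumes simplified, flat item dicts with hypothetical keys `collections`, `subject`, and `description`; real DPLA item records are nested more deeply, so the lookups would need adjusting.

```python
from typing import Any

Item = dict[str, Any]


def items_in_collection(items: list[Item], collection_title: str) -> list[Item]:
    """Select items whose (hypothetical) 'collections' key lists the given title."""
    return [item for item in items if collection_title in item.get("collections", [])]


def source_values(items: list[Item], source_fields: list[str]) -> list[str]:
    """Collect string values from the chosen source fields, flattening lists."""
    values: list[str] = []
    for item in items:
        for name in source_fields:
            value = item.get(name)
            if isinstance(value, str):
                values.append(value)
            elif isinstance(value, list):
                values.extend(v for v in value if isinstance(v, str))
    return values
```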
Approach (cont'd)
• Aggregate item data from across the collection
  – e.g. all unique subject strings, along with frequency counts
• Derive collection-level values for the selected attribute
  – e.g. five subjects for the Minnesota Newspapers Collection
• Add attribute/value pair to collection record
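The aggregation and derivation steps can be sketched with `collections.Counter`. Picking the *k* most frequent values (ties broken alphabetically) is an assumed derivation rule, used here only to mirror the "five subjects" example; `add_attribute` is a hypothetical helper that treats the collection record as a dict.

```python
from collections import Counter
from typing import Any, Iterable


def aggregate(values: Iterable[str]) -> Counter:
    """Count each unique (whitespace-trimmed, non-empty) value across the collection."""
    return Counter(v.strip() for v in values if v.strip())


def derive_top_k(counts: Counter, k: int = 5) -> list[str]:
    """Assumed rule: keep the k most frequent values, breaking ties alphabetically."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [value for value, _ in ranked[:k]]


def add_attribute(record: dict[str, Any], attribute: str, value: Any) -> dict[str, Any]:
    """Hypothetical helper: attach the derived attribute/value pair to the record."""
    record[attribute] = value
    return record
```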
WARNING: The following presentation contains strong, graphic imagery.
Collection-level Metadata Generation
Support for: Portal users, Humanities scholars
Architecture
(Diagram. Stages: Extract, Aggregate, Derive, Populate, Enrich. Components: a collection (C) and its items (I); aggregated subject, date, and spatial values; a Date Deriver, a Subject Deriver, and a Spatial Deriver; collection date and subject values; the Collection Description.)
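Read as a pipeline, the architecture places one deriver per attribute between aggregation and population. The sketch below wires that up under several assumptions: items are dicts whose attribute values are lists of strings, the two derivers (`date_deriver`, `subject_deriver`) are hypothetical stand-ins, and the Spatial Deriver and the Enrich stage are left out.

```python
from collections import Counter
from typing import Any, Callable, Optional

Deriver = Callable[[Counter], Optional[Any]]


def date_deriver(counts: Counter) -> Optional[dict[str, int]]:
    """Hypothetical: begin/end from values that are exactly four decimal digits."""
    years = [int(v) for v in counts if v.isdecimal() and len(v) == 4]
    if not years:
        return None
    return {"begin": min(years), "end": max(years)}


def subject_deriver(counts: Counter) -> list[str]:
    """Hypothetical: five most frequent subject strings, ties broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [value for value, _ in ranked[:5]]


DERIVERS: dict[str, Deriver] = {"date": date_deriver, "subject": subject_deriver}


def run_pipeline(items: list[dict[str, list[str]]], derivers: dict[str, Deriver] = DERIVERS) -> dict[str, Any]:
    description: dict[str, Any] = {}
    for attribute, deriver in derivers.items():
        extracted = [v for item in items for v in item.get(attribute, [])]  # Extract
        aggregated = Counter(extracted)  # Aggregate
        value = deriver(aggregated)  # Derive
        if value is not None:
            description[attribute] = value  # Populate
    return description
```

Registering derivers in a dict keeps the pipeline loop unchanged when a new attribute (for example, a spatial deriver) is added.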
Date Format Variations (ArtStor)
Date Processing
Begin and end dates
imperfect but consistent
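Because item dates arrive in many formats, one simple strategy is to map every value to a (begin, end) year range and then take the extremes across the collection. This is a sketch under assumptions: the regular expressions cover only a few formats (bare years, year ranges, decades such as "1950s", and dates like "3/30/2015"), and anything else is skipped.

```python
import re
from typing import Iterable, Optional

YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")  # a standalone 4-digit year
DECADE = re.compile(r"(?<!\d)(\d{3})0s\b")  # e.g. "1950s"


def parse_year_range(value: str) -> Optional[tuple[int, int]]:
    """Map one date string to a (begin, end) year range, or None if unrecognized."""
    decade = DECADE.search(value)
    if decade:
        start = int(decade.group(1)) * 10
        return start, start + 9
    years = [int(y) for y in YEAR.findall(value)]
    if not years:
        return None
    return min(years), max(years)


def collection_date_range(values: Iterable[str]) -> Optional[tuple[int, int]]:
    """Earliest begin and latest end across all recognized item dates."""
    ranges = [r for r in map(parse_year_range, values) if r is not None]
    if not ranges:
        return None
    return min(b for b, _ in ranges), max(e for _, e in ranges)
```

Results are imperfect (a decade is widened to ten years) but consistent, since every recognized value goes through the same rules.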
Inside the Date Deriver
(Diagram. Input: aggregated date values. Components: a Parser Factory; years with known formats; a Rule Factory with begin-year, end-year, and additional rules. Output: Collection Date values.)
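One plausible reading of the two factories is an ordered list of format-specific parsers, tried until one recognizes a value, and a table of rules applied to the recognized years. The sketch below follows that reading; the parser functions, the formats they accept, and the `RULES` table are assumptions, and any "additional rules" would be extra entries in that table.

```python
import re
from typing import Callable, Iterable, Optional

Parser = Callable[[str], Optional[int]]


def iso_date(value: str) -> Optional[int]:
    """Hypothetical parser for e.g. '1867-05-01'."""
    m = re.fullmatch(r"(\d{4})-\d{2}-\d{2}", value.strip())
    return int(m.group(1)) if m else None


def us_date(value: str) -> Optional[int]:
    """Hypothetical parser for e.g. '3/30/2015'."""
    m = re.fullmatch(r"\d{1,2}/\d{1,2}/(\d{4})", value.strip())
    return int(m.group(1)) if m else None


def bare_year(value: str) -> Optional[int]:
    """Hypothetical parser for '1867', 'c. 1867', or 'ca. 1867'."""
    m = re.fullmatch(r"(?:ca?\.\s*)?(\d{4})", value.strip())
    return int(m.group(1)) if m else None


PARSERS: list[Parser] = [iso_date, us_date, bare_year]  # tried in order
RULES: dict[str, Callable[[list[int]], int]] = {"begin": min, "end": max}


def parse_known_year(value: str, parsers: list[Parser] = PARSERS) -> Optional[int]:
    for parser in parsers:
        year = parser(value)
        if year is not None:
            return year
    return None  # unknown format: skipped rather than guessed


def derive_collection_dates(values: Iterable[str], rules=RULES) -> dict[str, int]:
    years = [y for y in map(parse_known_year, values) if y is not None]
    if not years:
        return {}
    return {name: rule(years) for name, rule in rules.items()}
```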
Subject - Phrases
Commonalities and Differences
Thresholded Boundaries
Variants: Ojibwe-Ojibway; GLBT-LGBT
Hierarchies: Labor Unions, Minnesota, Minneapolis, Newspapers; Labor Unions, Organizing
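Variants like these could be grouped before thresholding. This hypothetical sketch uses the standard library's `difflib.SequenceMatcher` similarity ratio plus a same-letters check for reordered acronyms such as GLBT/LGBT; the 0.75 threshold is an arbitrary assumption that would need tuning against real subject data. It does not attempt to recover hierarchies.

```python
from difflib import SequenceMatcher


def are_variants(a: str, b: str, threshold: float = 0.75) -> bool:
    """Assumed heuristic: same letters reordered, or high string similarity."""
    x, y = a.lower(), b.lower()
    if sorted(x) == sorted(y):  # e.g. GLBT / LGBT
        return True
    return SequenceMatcher(None, x, y).ratio() >= threshold


def cluster_variants(terms: list[str], threshold: float = 0.75) -> list[list[str]]:
    """Greedy clustering: compare each term with the first member of each cluster."""
    clusters: list[list[str]] = []
    for term in terms:
        for cluster in clusters:
            if are_variants(term, cluster[0], threshold):
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters
```

Comparing each term only with a cluster's first member keeps the sketch simple, but it makes the result depend on input order.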
Automatic? Descriptions
Inside the Subject Deriver
(Diagram. Inputs: aggregated subject, title, and description values. Components: a Parser Factory with tokenizers; a Rule Factory with a threshold detector, a cluster generator, a WordNet analyzer, and other rules. Output: Collection Subject values.)
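A minimal sketch of the tokenizer and threshold-detector stages, assuming each item contributes a list of strings drawn from its subject, title, and description fields. Tokens are counted once per item and kept if they appear in at least `min_fraction` of items; the stopword list and the default threshold are assumptions, and clustering and WordNet analysis are not shown.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real deriver would need a fuller one.
STOPWORDS = {"and", "the", "of", "in", "a"}


def tokenize(value: str) -> list[str]:
    """Split a subject, title, or description string into ASCII word tokens."""
    return [t for t in re.findall(r"[A-Za-z]+", value) if t.lower() not in STOPWORDS]


def derive_subjects(item_values: list[list[str]], min_fraction: float = 0.05) -> list[str]:
    """Keep tokens that occur in at least min_fraction of the items."""
    if not item_values:
        return []
    doc_freq: Counter = Counter()
    for values in item_values:
        # Count each token once per item (document frequency, not raw frequency).
        doc_freq.update({tok for v in values for tok in tokenize(v)})
    n = len(item_values)
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, count in ranked if count / n >= min_fraction]
```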
Current Description
id: 49b09ce719c5184f166920a1a7c1e8cd
Title: Minnesota Newspapers Collection
Description: The Minnesota Digital Library is now providing access to some of Minnesota's historical newspapers. We are focusing our attention on titles, volumes and issues that were never microfilmed, and where the originals are frail and not frequently available to the public
.collectionResource.<property>
dateCreated: 3/30/2015
itemCount: 3528
date.begin: 1867
date.end: 2009
subjects: [Helpers, lockouts, Drivers, Indian, Indians, American, Sauk, Minnesota, Minneapolis, Gay, GLBT, Homosexuality, missions, Mission, Community, Ojibwa, Ojibway, Ojibwe, Pine, River, County, Strikes, Petroleum, Union]
Enhanced Description
spatial.boundary: [[153.06667, -27.28333], [-99.8111038208, 41.5272712708], [-94.8796463013, 47.4731407166], [132.270004272, -14.4532003403], [153.06667, -27.28333]]
formats: newspapers
languages: English, Dakota
dataProviders: [“Bemidji State University”, “Center for Human Resources and Labor Studies”, “Heritage Group North”, “Morrison County Historical Society”, “Morrison County Historical Society”, “Quatrefoil Library”, “Sauk Centre Area Historical Society”, “Synod of Lakes and Prairies”]
rights:
Enhanced Description
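The `spatial.boundary` value is a closed ring of `[longitude, latitude]` pairs (the last point repeats the first). One common way to derive such a ring from item coordinates is a convex hull; the sketch below uses Andrew's monotone chain algorithm on planar coordinates. Whether the project computed its boundary this way is an assumption.

```python
def cross(o, a, b) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def boundary_ring(points: list[list[float]]) -> list[list[float]]:
    """Convex hull of [lon, lat] points as a closed, counter-clockwise ring."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return [list(p) for p in pts]  # too few distinct points for a polygon
    lower: list = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # drop duplicated endpoints
    hull.append(hull[0])  # close the ring
    return [list(p) for p in hull]
```

A hull is sensitive to stray coordinates: two of the four vertices above have negative latitudes, placing them in the Southern Hemisphere, far from Minnesota, so filtering outliers before the hull step may be worth considering.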
Visual Assessment
Collection Profiles
Support for: DPLA, Hub, Data provider staff
(Diagram: DPLA with hub nodes labeled S and C and data-provider nodes labeled D.)
Approach
• Numeric characterization
  – (for now) ignore semantic assessment
• Assess consistency
  – enhance automation, computation
• Assess compliance with MAP (3.1)
  – required, recommended fields
• Support visual analysis
  – (early stage)
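Compliance with required and recommended fields can be characterized numerically as the share of items that populate each field. The field lists below are placeholders, not the actual MAP 3.1 requirements, and treating `None`, empty strings, and empty lists as missing is an assumed rule.

```python
from typing import Any

# Placeholder field lists (assumption): consult MAP 3.1 for the real ones.
REQUIRED = ["title", "rights"]
RECOMMENDED = ["date", "subject", "spatial", "format", "language"]

Item = dict[str, Any]


def has_value(item: Item, name: str) -> bool:
    """Assumed rule: None, empty strings, and empty lists count as missing."""
    value = item.get(name)
    return value is not None and value != "" and value != []


def coverage(items: list[Item], fields: list[str]) -> dict[str, float]:
    """Fraction of items that populate each field."""
    n = len(items)
    return {name: sum(has_value(item, name) for item in items) / n for name in fields}


def compliance_report(items: list[Item]) -> dict[str, Any]:
    if not items:
        return {}
    fully = sum(all(has_value(item, name) for name in REQUIRED) for item in items)
    return {
        "required": coverage(items, REQUIRED),
        "recommended": coverage(items, RECOMMENDED),
        "fully_compliant": fully / len(items),
    }
```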
Collection Profile
DPLA Collection data
Administrative data
Collection and item details
Collection Details
Item Details
Visual Analysis
id: 49b09ce719c5184f166920a1a7c1e8cd
Title: Minnesota Newspapers Collection
Item titles, Item rights
Other Fields
publisher, format, coordinates, names, spatial, subjects
Subjects - Assessment
Subjects - Analysis
Correlations
coordinates, names
Correlations
coordinates, names, subjects
Implications and Ongoing Work
• Collection description dashboard
• Evaluation of developed algorithms and metrics
Contact
Unmil P. Karadkar
Karen Wickett
Temple Teaching Fellowship, School of Information, UT Austin
Acknowledgements
Mark Matienzo, Tom Johnson, Gretchen Gueguen, and the DPLA staff
Student programmers: Jiexian Li, Zheyuan Zhu, Nan Guo, Ruoying Li, Jeremy Tzou, Julia Link, Andrew Florance, Joshua Sheehy, Meghanath Reddy, Robert Flores, Sowmya Sadhasivam
Discussion
• Collection description dashboard
  – which features?
  – which fields?
• Evaluation of developed algorithms and metrics