Preservation, Publishing, and People: A SEAD View
Slides from SEAD's workshop on the Virtual Archive, held on June 30, 2014, in Bloomington, IN.
Beth Plale
Director, Data To Insight Center
Indiana University
IU Scholarworks
Publishable results of computationally based science rarely take the form of a single data file or a homogeneous collection.
More often they form a bundle: primary results, metadata describing the generated data, the software used, the configuration parameters used with the software, input data sources, …
We call these bundles Research Objects.
Bechhofer, S., Buchan, I., De Roure, D., Missier, P., Ainsworth, J., Bhagat, J., … & Goble, C. (2011). Why linked data is not enough for scientists. Future Generation Computer Systems, 29(2), 599–611.
Data lifecycle
• Research occurs over months to years. Example: Praveen Kumar's study of the Mississippi River Basin flood of late April and early May 2011.
• Arrange funding, define objectives (2011)
• Data gathering: sample the flood plain at designated locations, take pictures, obtain satellite data, contract with an independent organization to fly over the area with Lidar
• Data cleaning and analysis
• Publish 2-3 papers (2014)
• Decide what data to package for publishing alongside the publications
• Publish the datasets
• Each published package we call a Research Object
Publish-reuse window
• We focus on one window in time in the lifecycle of research data: it starts when a researcher is ready to make data publicly available and runs through the data's first use by an unrelated party (reuse). We call this the “publish-reuse window”.
• Why this window?
• Repository services have to be self-documenting to achieve reproducibility. Deriving a new object from an object in SEAD VA and revising an object in SEAD VA are different actions, by different people, with different implications.
Publish-reuse window and important actors
• Data in shared file system or other project space: the researcher brings together, organizes, cleans, and analyzes data
• Package up into a Research Object: the researcher organizes and preps data for publishing, then initiates submission for deposit
• Data curator examines the object, augments it, and approves it; SEAD VA unpacks the RO and processes it for deposit to the IR
• Download RO, new object created: a data scientist uses the published RO in his/her research
Actors: Data creator, Curator, Data scientist
Research Object: what the RO is
• The Research Object (RO) is an aggregation of resources that can be transferred, produced, and consumed by common services across organizational boundaries. The RO encapsulates digital knowledge and provides a mechanism for sharing and discovering reusable research.
• An RO is a bundle of primary results, metadata describing the generated data, the software used, the configuration parameters used with the software, input data sources, …
• An RO can, and likely will, have multiple manifestations.
• The Research Object is the publishable object.
Why is the Research Object view important?
• Addresses weaknesses in existing solutions: the hierarchical “belongs to” organization of information is inadequate for all but the simplest cases.
• Facilitates reproducibility: we can no longer look at data products alone; software is critical for reproducibility (even if repeatability is not the goal).
• Allows for uniform handling: each Research Object is placed in a BagIt bag (1 bag = 1 RO). SEAD VA accepts bags of all colors, but all are bags. The lifecycle of each RO is tracked in SEAD VA.
• Just makes sense: when is the result of a scientific dissertation a uniform collection of files with a fixed directory structure? <answer: never>
Components of a Research Object
• Identity: unique ID
• Entities: the core data or software objects themselves
• Properties: Aggregation: “belongs to” relationship, used to aggregate within the Research Object
• Properties: Relationships: “related to” relationship
• Properties: Descriptive/Annotative: metadata
• Properties: Provenance: “derived from” and “versioned from” relationships, as well as others
• Properties: Agents: data creator (author list), curator, data scientist
• State: external to the RO
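A minimal Python sketch of these components as a data structure, assuming entities are keyed by string IDs and every relationship is stored as a (source, kind, target) triple. The class names, field names, and `Relation` encoding are hypothetical illustrations, not SEAD VA's internal model; following the last bullet, state is kept outside the object.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    """Relationship kinds named in the component list (hypothetical encoding)."""
    BELONGS_TO = "belongs to"          # aggregation within the RO
    RELATED_TO = "related to"          # general relationship
    DERIVED_FROM = "derived from"      # provenance
    VERSIONED_FROM = "versioned from"  # provenance


@dataclass
class Entity:
    entity_id: str
    kind: str  # e.g. "data" or "software"


@dataclass
class ResearchObject:
    identity: str                                               # unique ID
    entities: dict[str, Entity] = field(default_factory=dict)   # core objects
    relations: list[tuple[str, Relation, str]] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)      # descriptive/annotative
    agents: dict[str, list[str]] = field(default_factory=dict)  # role -> people
    # No state field: state is kept external to the RO.

    def add(self, entity: Entity) -> None:
        """Add an entity and record that it belongs to this RO."""
        self.entities[entity.entity_id] = entity
        self.relations.append((entity.entity_id, Relation.BELONGS_TO, self.identity))

    def relate(self, source: str, kind: Relation, target: str) -> None:
        """Record a relationship whose source is an entity of this RO."""
        if source not in self.entities:
            raise KeyError(f"unknown entity {source!r}")
        self.relations.append((source, kind, target))

    def members(self) -> list[str]:
        """IDs of the entities aggregated ("belongs to") in this RO."""
        return [s for s, k, t in self.relations
                if k is Relation.BELONGS_TO and t == self.identity]
```

Targets are not required to be entities of the same RO, since a “derived from” link usually points at an object outside it.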
Research Object
Research Object State Transition
An RO is in one of three states, LO, CO, or PO, as follows:
• Live object (LO): a work in progress. The data creator is assembling content for publication.
• Curation object (CO): an object after the creator has signaled the intention to publish. The curator works on the curation object; changes are selective.
• Publication object (PO): a final version ready to be disseminated widely. Published objects (POs) are mutable only under certain conditions.
The RO is described by the model:
RO = {s, dm, c}
where s is the state of the RO at any point in time, dm is its descriptive metadata, and c is the entities (core content) and the relationships among entities.
RO = {s, dm, c}
where s is the state of the RO at any point in time, dm is its descriptive metadata, and c is the resources and the relationships among resources.
State transition graph
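The three states and the model RO = {s, dm, c} can be sketched as a small state machine. This is a minimal Python sketch that encodes only the two forward transitions the text describes (LO to CO when the creator signals the intent to publish, CO to PO when the curator publishes); the event names `submit` and `publish` are hypothetical, and the conditions under which a PO may change are not modeled.

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class State(Enum):
    LO = "live object"         # work in progress
    CO = "curation object"     # creator has signaled intent to publish
    PO = "publication object"  # final version for wide dissemination


# Hypothetical event names mapped to the forward transitions in the text.
TRANSITIONS = {
    (State.LO, "submit"): State.CO,   # data creator initiates submission
    (State.CO, "publish"): State.PO,  # curator approves and publishes
}


@dataclass(frozen=True)
class RO:
    s: State                                # state at this point in time
    dm: dict = field(default_factory=dict)  # descriptive metadata
    c: dict = field(default_factory=dict)   # entities and their relationships


def apply(ro: RO, event: str) -> RO:
    """Return a copy of ro in its next state, or raise if the event is not allowed."""
    try:
        nxt = TRANSITIONS[(ro.s, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {ro.s.name}") from None
    return replace(ro, s=nxt)
```

Returning a new object rather than mutating the old one keeps every earlier state available, which fits the provenance tracking described later.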
Architectural implications of the RO model
SEAD Virtual Archive components:
• User Interface (GWT web application)
• Ingest workflow (Data Conservancy)
• Komadu Provenance System
• VA Registry
• RO Subsystem
• Matchmaker
RO model implemented in SEAD VA
Extended the ingest workflow (Data Conservancy) to seamlessly:
- Extract the RO from the BagIt bag
- Transition from the RO model to the Data Conservancy SIP (Submission Information Package) model
Extended SEAD VA with the registry and provenance tracking to implement the RO lifecycle. The functionality is modular (built outside Data Conservancy for portability).
People: Data Creator, Curator, and Data Scientist
The Data Creator, Curator, and Data Scientist are related to one another through the Research Objects that they create, work on, and use.
This relationship information exists in the form of provenance in SEAD VA. Future work is to capture these nuanced relationships in the SEAD Research Network as well.
And on to the SEAD VA Workshop Agenda and Resources
http://bit.ly/sead-va-workshop063014
Data Creator in SEAD VA
Inna Kouper
Overview
The Data Creator collects data and, once done with a study, gathers the materials that support the study and submits them for publication and preservation in institutional repositories.
Example: a dissertation that is based on:
• images from USGS
• spreadsheets with numbers and calculations
• computing scripts
• videos of experiments
In VA a data creator can:
• Upload research objects (ROs)
• Preview, review, and download ROs
• Check the status of ROs in the queue to the IR
Background: SEAD Services
• SEAD Research Network
• Project Spaces
• Packaging and Mapping
Research Network
• Network of data creators, curators, and reuse scientists across disciplines
• Rich ontology to support links to data, projects, and publications
• Visualizations of co-authorship and co-citation
ORCiD / SEAD Research Network Integration
• Create empty profile in VIVO
• Execute harvester
• Ingest data
Project Spaces
• 15 project spaces (incl. an open demo space and an internal testing space)
• Thousands of collections in active curation
• Once a collection is marked for publication, it can be ingested into the Virtual Archive
Project Space = Active Content Repository (ACR)
Packaging and Mapping (BagIt / ORE)
• BagIt format
  • standardized “envelopes” (bags)
  • no requirement to “know” the internal semantics
  • 3 elements: a bag declaration (bagit.txt), a manifest file (manifest-<algorithm>.txt), and a folder with the content (data)
• Tools available for bagging
  • SEAD BagIt service
  • LOC Bagger tool (http://sourceforge.net/projects/loc-xferutils/files/loc-bagger/2.1.2/)
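The three bag elements can be illustrated with a minimal Python sketch that writes and checks a bag using only the standard library. It assumes SHA-256 as the manifest algorithm and BagIt version 0.97 in the declaration; it is not the SEAD BagIt service or the LOC Bagger, and it omits optional tag files such as bag-info.txt. The function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

BAG_DECLARATION = "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"


def make_bag(bag_dir, payload, algorithm="sha256"):
    """Write payload (relative path -> bytes) into a minimal BagIt bag."""
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True, exist_ok=True)
    lines = []
    for rel, content in sorted(payload.items()):
        target = bag / "data" / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.new(algorithm, content).hexdigest()
        lines.append(f"{digest}  data/{rel}\n")  # "<checksum> <path>" per file
    (bag / "bagit.txt").write_text(BAG_DECLARATION, encoding="utf-8")
    (bag / f"manifest-{algorithm}.txt").write_text("".join(lines), encoding="utf-8")
    return bag


def verify_bag(bag_dir, algorithm="sha256"):
    """True if every payload file is listed in the manifest with a matching checksum."""
    bag = Path(bag_dir)
    if not (bag / "bagit.txt").is_file():
        return False
    listed = set()
    manifest = bag / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        digest, rel = line.split(maxsplit=1)
        path = bag / rel
        if not path.is_file() or hashlib.new(algorithm, path.read_bytes()).hexdigest() != digest:
            return False
        listed.add(rel)
    # Completeness: no payload file may be missing from the manifest.
    on_disk = {p.relative_to(bag).as_posix() for p in (bag / "data").rglob("*") if p.is_file()}
    return listed == on_disk


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_bag(tmp, {"_readme.txt": b"Flood study", "images/site1.jpg": b"\xff\xd8"})
        print(verify_bag(tmp))  # True
```

Because the manifest only records paths and checksums, the bag stays agnostic about what the files mean, which is the “no internal semantics” property listed above.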
Resource Maps
• OAI-ORE standard
• Exposes rich content
• Captures the semantics of relationships among RO items
• Identifies aggregations
• SEAD VA OAI-ORE relationship classes:
  • Aggregation
  • Description
  • Authorship
  • Copyright / rights
  • Modification
  • Derivation
  • Citation
  • Processing (calculation, computation, etc.)
OAI-ORE Example
[Figure: OAI-ORE example graph with nodes Resource Map, Aggregation, _readme, spreadsheet, image, Image 2.0, spreadsheet 1.1, and Aggregation 2.0, linked by describes, aggregates, wasDerivedFrom, and wasModifiedFrom relationships]
OAI/ORE Map Example
<rdf:RDF …>
  <rdf:Description rdf:about="URI"> <!-- data item -->
    <ore:isAggregatedBy>ID</ore:isAggregatedBy>
    <dcterms:identifier rdf:datatype="URI">ID</dcterms:identifier>
    <dcterms:title rdf:datatype="URI">Vortex_Mining.xlsx</dcterms:title>
    <!-- A related resource from which the described resource is derived. -->
    <dcterms:source rdf:datatype="URI">test_bag/data/Vortex_Mining.xlsx</dcterms:source>
  </rdf:Description>
  …
</rdf:RDF>
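A fragment like the one above can be generated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, assuming the standard RDF, ORE, and DCMI Terms namespace URIs; the `rdf:datatype` attributes are left out because their URIs are elided on the slide, and `ore:isAggregatedBy` carries text content as on the slide, whereas a strict ORE resource map would normally reference the aggregation with `rdf:resource`. The function name and example values are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
for prefix, uri in (("rdf", RDF), ("ore", ORE), ("dcterms", DCTERMS)):
    ET.register_namespace(prefix, uri)  # serialize with readable prefixes


def describe_item(item_uri, aggregation_id, item_id, title, source):
    """Build an rdf:RDF document with one rdf:Description for a data item."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": item_uri})
    # Mirrors the slide; a strict ORE map would usually use rdf:resource here.
    ET.SubElement(desc, f"{{{ORE}}}isAggregatedBy").text = aggregation_id
    ET.SubElement(desc, f"{{{DCTERMS}}}identifier").text = item_id
    ET.SubElement(desc, f"{{{DCTERMS}}}title").text = title
    # dcterms:source: a related resource from which the described resource is derived
    ET.SubElement(desc, f"{{{DCTERMS}}}source").text = source
    return ET.tostring(root, encoding="unicode")


print(describe_item("http://example.org/item/1", "agg-1", "item-1",
                    "Vortex_Mining.xlsx", "test_bag/data/Vortex_Mining.xlsx"))
```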
Demo / Hands On [Data creator role]
Download Test Research Objects
Or go to https://iu.box.com/sead-va-test-bags
Register / Sign In
• Go to http://seadva-test.d2i.indiana.edu:5672/sead-access/
• Click LOG IN and fill in your login information (or click Sign Up below)
Upload Research Object
• On the Upload Data tab, click “Choose File”
• Select a test dataset in the dialog window
• Click Upload
Upload Data Tab
Review Research Object
• Check that the object is correct
• Change the project name and description
• Agree to the license terms
• Click “Submit Dataset for Review”
Status and Success Messages
Trace Activity
• Go to the activity tab
• See all actions performed by you
• Click on the dataset name to see details
Activity tab
View Research Object Details
Receive Notification
• After the next part of the tutorial, check your inbox for an email from SEAD VA
Curator in SEAD VA
Kavitha Chandrasekar
Overview
The Curator works on Research Objects created and submitted by Data Creators: reviews the submission, modifies metadata, and takes action to move the submission to their Institutional Repository.
In VA a curator can:
• Select an item for review from the curation queue
• Enhance metadata
• Deposit to an Institutional Repository
“Under the Hood”: IR Recommendation and IR Description
Automatic IR Recommendation (SEAD VA Matchmaker)
• Matches ROs to compatible Institutional Repositories
• Recommends the best Institutional Repository match for an RO
• Facilitates transfer and deposit of heterogeneous ROs
IR Recommendation Flow
• Submit: user-initiated
• Deposit: RO received by SEAD VA
• Stage: for decision making
• Execute Rules: rules engine
• Send to Curator queue: workflow-initiated
• IR Matchmaker: add to the IR queue based on the match found, e.g., IU Scholarworks or Ideals
IR-SEAD VA “contract”: the Service Level Agreement
• A Service Level Agreement (SLA) is a contract of sorts between SEAD VA and an Institutional Repository. It captures:
  • Repository requirements and privileges
  • Repository services
• The IR Recommendation system uses excerpts from an IR's SLA to identify compatible pairs of datasets and repositories during RO deposit.
Service Level Agreement: Requirements and Privileges (summary)
• RO properties: requirements
  • Data contributor institutional affiliation
  • Scientific domain
  • Data organization (e.g., BagIt or SWORD)
  • Size
  • Versioning
  • Minimal metadata
  • Licensing (e.g., open, embargoed)
• Repository privileges
  • The repository is free to redistribute the RO received from SEAD VA, except in case of embargo.
  • The repository can migrate the RO into other formats and redistribute migrated ROs.
  • Repository curators can annotate data collections to comply with standards or upgrades in our policies.
SLA: Repository Service Guarantees
• Long-term preservation
• Format Migration
• Archival support
• Embargo
• Access
• DOI generation
• Technical guarantees:
  • Limited Downtime
  • Data Ingest Time
  • Backup
  • Integrity checks
Excerpt from the SLA for IU Scholarworks
• Institutional Affiliation
  • At least one author, at the time of deposit, belongs to the same institution as our repository.
• RO Size
  • 150 MB for items uploaded directly to IUScholarWorks, 10 GB total
  • 5 TB for items hosted on the SDA
• Versioning
  • Only the final PO is accepted; subsequent versions will replace the version of record.
• Scientific Domain (curator review might be needed)
  • ROs are associated with research in the domains of ANY (identify specific domains or put “sustainability science” for a broader match)
The IR Recommender's use of an SLA
• The IR Recommender implements an IR's SLA as a set of executable rules in the Matchmaker. The rules are executed with a rules engine called “Drools”.
• Rules can be added on the fly, meaning a new IR can be added just by specifying an SLA.
• Modifications to an SLA are incorporated into the rules at runtime.
• Clean mapping of SLA terms to Drools rules
Mapping SLA to Drools rules
// Rule declaration
rule "IU Scholarworks Affiliation rule"
dialect "mvel"
salience 20
when
    // Condition
    SeadDeliverableUnit( title != null )    //Per IU SLA collection should have title
    SeadDeliverableUnit( $contributors : dataContributors )
    eval( $contributors.size > 0 )    //Creators should not be empty per IU SLA
    $seadPerson : SeadPerson( idType == "vivo" && getEmail(id) == "Indiana University" ) from $contributors    //Affiliated data contributor found
    $seadDu : SeadDeliverableUnit( sizeBytes < 10000000000 )    //Total collection size less than 10 GB approximately
    SeadDeliverableUnit( fileNo < 1000 )    //Total file count less than 1000
    SeadDeliverableUnit( "CC" in (rights) )    //Open access data
then
    // Execution
    addRepository("iu", 2);    //Adding IU repository to the queue of matched repositories with priority 2
end
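To make the rule's logic explicit outside Drools, here is a Python sketch of the same conditions plus a matchmaker that accepts new rules at runtime, mirroring how a new IR is added by specifying its SLA. Everything here is an assumption for illustration: the class and field names, the `affiliation` field standing in for the rule's `getEmail(id)` lookup, the substring test for "CC", and the convention that a lower priority number ranks first. Drools salience (20 in the rule) controls firing order and is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Person:
    person_id: str
    id_type: str       # e.g. "vivo"
    affiliation: str   # hypothetical stand-in for the rule's getEmail(id) lookup


@dataclass
class DeliverableUnit:
    title: Optional[str]
    contributors: List[Person]
    size_bytes: int
    file_count: int
    rights: str        # license text, e.g. "CC-BY 4.0" (assumption)


def iu_affiliation_rule(du: DeliverableUnit) -> bool:
    """Conditions of the IU rule, translated to a Python predicate."""
    return (
        du.title is not None                                  # collection has a title
        and len(du.contributors) > 0                          # creators not empty
        and any(p.id_type == "vivo" and p.affiliation == "Indiana University"
                for p in du.contributors)                     # affiliated contributor
        and du.size_bytes < 10_000_000_000                    # under ~10 GB
        and du.file_count < 1000                              # fewer than 1000 files
        and "CC" in du.rights                                 # Creative Commons license
    )


@dataclass
class Matchmaker:
    rules: list = field(default_factory=list)  # (name, predicate, repository, priority)

    def add_rule(self, name: str, predicate: Callable[[DeliverableUnit], bool],
                 repository: str, priority: int) -> None:
        """Register a rule at runtime, e.g. when a new IR's SLA arrives."""
        self.rules.append((name, predicate, repository, priority))

    def match(self, du: DeliverableUnit) -> List[str]:
        """Repositories whose rules fire, lowest priority number first (assumption)."""
        matched = [(priority, repo) for _name, predicate, repo, priority in self.rules
                   if predicate(du)]
        return [repo for _priority, repo in sorted(matched)]
```

Usage: register the IU rule with `mm.add_rule("IU Scholarworks Affiliation rule", iu_affiliation_rule, "iu", 2)`, then call `mm.match(unit)` on each staged RO; adding a rule for another repository later requires no change to existing rules.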
Demo / Hands On [Curator role]
Select Item from Curation Queue
Matched Institutional Repository
Click on the Curate tab
Assign the RO to yourself for review by clicking “Assign to me”
Download ReadMe file for Dataset under edit
Unzip Bag
Open data/_readme.txt
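The readme can also be read straight from the downloaded zip without unpacking the whole bag. A minimal Python sketch, assuming the bag is a zip archive, possibly nested under a top-level folder (as in `test_bag/data/...`), and that `_readme.txt` is UTF-8 encoded; the function name is hypothetical.

```python
import io
import zipfile


def read_bag_readme(zip_source, name="data/_readme.txt"):
    """Return the text of data/_readme.txt from a zipped bag without extracting it.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        for member in zf.namelist():
            # Zipped bags often nest everything under a top-level bag folder.
            if member == name or member.endswith("/" + name):
                return zf.read(member).decode("utf-8")
    raise FileNotFoundError(name)


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("test_bag/data/_readme.txt", "Title: Flood study\n")
    print(read_bag_readme(buf))
```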
Enhance Metadata
Click on ‘Edit’ button
View Research Object in Edit mode
To edit, click on entities in the bottom pane
Populate metadata from ReadMe file
To save changes, click on ‘Save Changes’ button
Save Final Curation changes
Finally click on ‘Save Changes’ below
After changes are saved, click on ‘Back’ to go back to Curation queue
Approve and Publish to Institutional Repository
Publish
Trace Activity
• Go to the activity tab
• See all actions performed by you
• Click on the Research Object name to see details
Activity tab
View in Institutional Repository
Data Reuse Scientist in SEAD VA
Isuru Suriarachchi
Overview: The Data Scientist
The Data Scientist uses research objects that were created by someone else for his/her own purposes and creates new research objects by modifying existing objects.
Super simple example: putting the images in RO 3 into a single presentation and creating a new RO.
A data scientist can:
• Search
• Download (bags)
• Modify
• Re-upload
“Under the Hood”: Provenance, Component Interaction
Provenance
• What is provenance?
  • Provenance is information about the entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability, or trustworthiness.
• Also called “lineage” or “pedigree”
• Advantages of provenance for preservation:
  • Derive ownership
  • Assess quality and trustworthiness
  • Reproducibility
  • Validation
  • Failure tracing
Not used in Preservation Provenance
Provenance in Repositories
• The provenance that matters here is the provenance of a Research Object.
• Why is it important?
  • For data scientists in “Search”: to check ownership of an RO and to assess its quality and trustworthiness
  • For curators: to check curation history
• Provenance's role in the “Publish-Reuse window”:
  • Published Object (PO) provenance
  • Curation Object (CO) provenance
Provenance Capture in SEAD VA
• Uses the Komadu provenance system
• Captures activity in real time and assembles new activity into an internal representation as provenance graphs
• Compliant with the W3C PROV specification
• Terminology
  • Activity: some processing event in SEAD VA
  • Entity: a Research Object (in CO or PO state)
  • Agent: Data Creator, Curator, Data Scientist
Provenance among Published Objects
[Figure: Create RO → Publish RO → Download RO → Upload RO’ → Publish RO’, with agents Data Creator, Curator, and Data Scientist]
Provenance is captured between these two published ROs (RO and RO’). The provenance relationship is:
• Derivation: if the Data Creator is not the Data Scientist
• Revision: if the Data Creator is the same person as the Data Scientist
Maintaining Provenance among Published ROs
• Two identifiers are maintained: a DOI and an internal identifier.
• Why two identifiers?
  • DOI: each RO has a unique DOI.
  • Internal identifier: lineage is maintained through the internal identifier, which records the relationship between the original object and the derived object.
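The Derivation/Revision decision and the use of the internal identifier for lineage can be sketched in Python as follows. The `PublishedRO` fields, the comparison of creators by account ID, and the mapping of Revision to `prov:wasRevisionOf` and Derivation to `prov:wasDerivedFrom` are assumptions; W3C PROV does define a revision as a kind of derivation, but this sketch is not SEAD VA's documented encoding. The DOIs and IDs are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PublishedRO:
    doi: str          # unique per published RO
    internal_id: str  # carries the lineage between original and derived objects
    creator: str      # agent who uploaded this RO, e.g. an account ID (assumption)


def lineage_link(original: PublishedRO, new: PublishedRO) -> Tuple[str, str, str]:
    """Classify the link from new to original and express it as a triple."""
    if new.creator == original.creator:
        relation = "prov:wasRevisionOf"    # same person: Revision
    else:
        relation = "prov:wasDerivedFrom"   # different person: Derivation
    # The link is expressed with internal identifiers, not DOIs.
    return (new.internal_id, relation, original.internal_id)
```

Keeping the link on internal identifiers means every RO can still receive its own DOI while the chain of derivations stays in one identifier space.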
Provenance among Published Objects
• At the first publish of an RO, a DOI and an internal identifier are added to oaiore.xml
• At re-upload
Provenance among Published ROs
• Provenance relationships captured in Komadu:
  • Entity-Entity (derivation): when the second publish is done
• This RO provenance capture continues for any number of publish:download:re-upload cycles
• At the second publish (RO’), a “wasDerivedFrom” element is added to oaiore.xml referring to the original internal identifier
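The re-upload step can be sketched with Python's `xml.etree.ElementTree`: find the description of the new RO in its oaiore.xml and append a `wasDerivedFrom` element holding the original internal identifier. The slide does not say which vocabulary SEAD uses for `wasDerivedFrom`; the PROV-O namespace here is an assumption, as are the function name and the sample document.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"  # assumption: namespace used for wasDerivedFrom
ET.register_namespace("rdf", RDF)
ET.register_namespace("prov", PROV)

# Hypothetical minimal oaiore.xml for the re-uploaded RO'.
SAMPLE = (
    f'<rdf:RDF xmlns:rdf="{RDF}">'
    '<rdf:Description rdf:about="urn:example:ro-prime"/>'
    "</rdf:RDF>"
)


def add_was_derived_from(oaiore_xml: str, about: str, original_internal_id: str) -> str:
    """Append wasDerivedFrom (pointing at the original RO's internal identifier)
    to the rdf:Description whose rdf:about equals `about`."""
    root = ET.fromstring(oaiore_xml)
    for desc in root.iter(f"{{{RDF}}}Description"):
        if desc.get(f"{{{RDF}}}about") == about:
            ET.SubElement(desc, f"{{{PROV}}}wasDerivedFrom").text = original_internal_id
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no rdf:Description about {about!r}")
```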
Usage of Published Object Provenance
• A data scientist can see the lineage graph of her new RO’. This helps her assess the collection and is useful if the original object changes (forward provenance).
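Backward lineage and forward provenance are both reachability questions over the wasDerivedFrom links. A minimal Python sketch using breadth-first search, assuming the links are available as (derived ID, original ID) pairs; in SEAD VA the graph itself is kept by Komadu, and the function names and IDs here are hypothetical.

```python
from collections import defaultdict, deque


def index_edges(edges):
    """edges: iterable of (derived_id, original_id) wasDerivedFrom pairs."""
    parents, children = defaultdict(set), defaultdict(set)
    for derived, original in edges:
        parents[derived].add(original)    # backward: what was this derived from?
        children[original].add(derived)   # forward: what was derived from this?
    return parents, children


def reachable(start, graph):
    """Breadth-first search: all nodes reachable from start (start excluded in a DAG)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

`reachable(ro_id, parents)` gives the lineage shown to the data scientist; `reachable(ro_id, children)` lists every descendant that may need attention if the original object changes.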
Curation Time Provenance Capture
[Figure: Create RO → Publish RO, with agents Creator and Curator; provenance within curation]
Curation Time Provenance Capture
• Curation activities:
  • Curation-Edit-Event
  • Publish-Event
• Provenance relationships captured in Komadu:
  • Agent-Activity: when some agent triggers one of the above activities
  • Activity-Entity: when an activity generates (updates) a Research Object
• Example scenario: Curator X edits metadata on research object Y
  • Agent-Activity relationship (association) between X and Curation-Edit-Event
  • Activity-Entity relationship (generation) between Curation-Edit-Event and Y
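The example scenario can be written out as W3C PROV statements. This Python sketch emits a PROV-N document for “Curator X edits metadata on research object Y”; the `ex:` prefix, the identifiers, the `prov:type` label, and the function name are illustrative assumptions and do not reproduce Komadu's actual output format.

```python
def curation_edit_prov_n(curator: str, ro: str, activity: str) -> str:
    """PROV-N document for 'curator edits metadata on ro' (names use the ex: prefix)."""
    statements = [
        f"agent({curator})",
        f'activity({activity}, [prov:type="Curation-Edit-Event"])',
        f"entity({ro})",
        f"wasAssociatedWith({activity}, {curator}, -)",  # Agent-Activity: association
        f"wasGeneratedBy({ro}, {activity}, -)",          # Activity-Entity: generation
    ]
    body = "\n".join("  " + s for s in statements)
    return f"document\n  prefix ex <http://example.org/>\n{body}\nendDocument"


print(curation_edit_prov_n("ex:curatorX", "ex:roY", "ex:edit1"))
```

The `-` markers leave the optional plan and time arguments unspecified, which PROV-N allows.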
Usage of Provenance at Curation Time
• The curator can see all actions he/she performed on a particular Research Object
Component Interaction: SEAD VA Workflow
[Figure: SEAD VA workflow steps Local ID Generation, Persist RO, DOI Generation, and Publish to IR, interacting with the RO Subsystem API, the SEAD VA Registry, the Komadu Provenance Server, the Metadata/Provenance Processor, REST and WS APIs, the SEAD VA UI (Upload Bag/Publish RO, Curate/Provenance), and the MatchMaker]
Demo / Hands On [as a data scientist]
Register / Sign In
• Go to http://seadva-test.d2i.indiana.edu:5672/sead-access/
• Click LOG IN and fill in your login information (or click Sign Up below)
Search for Data
Find data
Filter
Browse data collection
Request Data Download
Receive data download email
Download Data
Modify Data
Re‐Upload data
Access Curation Queue
Approve and Publish
Publish
Check Activity
• Go to the activity tab
• See activities performed (curation time provenance)
• Click on the Research Object name to see details
Activity tab
Check Provenance Graph
Provenance between 2 published objects (derivation)
Thank You