data curation vanderbilt
Data Curation: Why should we care?
Yasmin AlNoamany, Old Dominion University
Web Science and Digital Libraries Group, ws-dl.cs.odu.edu, @yasmina_anwar @WebSciDL
Presented for CLIR Postdoctoral Fellow for Data Curation at Vanderbilt University
About me
Academic degrees
• Bachelor's degree in Computer Science
• Master's degree in Computer Science
• Doctor of Philosophy in Computer Science
Old Dominion University (2011-2016)
• Research Assistant: integrating the past with the present, “Storytelling for Summarizing Collections in Web Archives”
• Teaching Assistant
Archived collections
Storytelling services
Archived enriched stories
Internet Archive (summer 2014 - fall 2014)
• Log analysis
• Tools for managing seed URIs
0.11.160.135 [02/Feb/2012:00:01:03] "GET http://web.archive.org/web/20070519015308im_/http://www.jcdl.org/images/jcdl2007-edie.jpg HTTP/1.1" 200 2137 "-" "Mozilla/5.0"
0.11.160.135 [02/Feb/2012:00:01:03] "GET http://staticweb.archive.org/images/toolbar/wayback-toolbar-logo.png HTTP/1.1" 200 3700 "-" "Mozilla/5.0"
0.151.147.108 [02/Feb/2012:00:01:03] "GET http://web.archive.org/web/20100102003557/about:blank HTTP/1.1" 302 0 "www.xx.com" "Mozilla/4.0"
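The log analysis mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch: the field layout (client IP, timestamp, request line, status, size, referrer, user agent) is inferred from the three sample lines on this slide, and the names `LOG_PATTERN` and `parse_line` are hypothetical, not part of any Internet Archive tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, inferred from the sample lines on the slide:
# ip [timestamp] "METHOD uri PROTOCOL" status size "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S")
    return record

# The three sample lines shown on the slide.
SAMPLE = [
    '0.11.160.135 [02/Feb/2012:00:01:03] "GET http://web.archive.org/web/20070519015308im_/http://www.jcdl.org/images/jcdl2007-edie.jpg HTTP/1.1" 200 2137 "-" "Mozilla/5.0"',
    '0.11.160.135 [02/Feb/2012:00:01:03] "GET http://staticweb.archive.org/images/toolbar/wayback-toolbar-logo.png HTTP/1.1" 200 3700 "-" "Mozilla/5.0"',
    '0.151.147.108 [02/Feb/2012:00:01:03] "GET http://web.archive.org/web/20100102003557/about:blank HTTP/1.1" 302 0 "www.xx.com" "Mozilla/4.0"',
]

records = [r for r in map(parse_line, SAMPLE) if r is not None]

# Count responses per HTTP status code.
status_counts = Counter(r["status"] for r in records)
print(status_counts)
```

Lines that do not fit the assumed layout are skipped rather than raising an error, which is usually the safer choice when scanning large, messy logs.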
Personal
• Women in Tech communities: @anitaborg, @systers, @arabwic
• Photography
• A mom to this adorable 7-year-old
Awards and publications
• Best Teaching Award
• Best Student Paper Award
• 9 papers, 3 of which are journal articles.
Data Curation
Why should we care?
Data Management
How I got the logs from the IA
Source: http://www.tamr.com/real-data-scientists-enterprise/
Even if we save the data, how will it be shared and re-used?
Metadata is important
The call for a revolution in Egypt
• It all started on Facebook
Multiple initiatives for documenting the Egyptian Revolution
Several studies and books about the Egyptian Revolution
These studies and books cited these sites
They do not exist anymore!
Data preservation is important for posterity
• A year after the Egyptian Revolution, 11% of the social media documentation is gone.
Source: http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
Data Curation is important for scholarly research
• Managing your research data saves time
• Universities and other research organizations invest very large sums of money into research activities
• Digital data is inherently prone to loss
• Future access to valuable digital assets depends upon curation/preservation actions taken today
• Funding agency requirements
• Research data should be shared and publicly accessible:
  • Increase the impact of your research
  • Make attribution easy
  • Call for accountability and transparency
  • Permit others to replicate the findings of a study
• Scholarly communication chain: connecting data to publication
What is Data Curation?
• “Data curation is the active and ongoing management of research data through its lifecycle of interest and usefulness to scholarship, science, and education.” – Carole Palmer, UIUC GSLIS
• Data management
• Adding value to data
• Data preservation for later re-use
DCC Curation lifecycle
Source: http://www.dcc.ac.uk/resources/curation-lifecycle-model
CONCEPTUALIZE
Step-by-step instructions and templates for creating, publishing and sharing data management plans that satisfy funding agency mandates
CREATE OR RECEIVE
A collaborative working space and data-sharing platform
APPRAISE & SELECT
Identification, Validation, Characterization
INGEST
• Handle a wide variety of transfer processes
• Assure the availability of the research data across institutions and publishers and keep it discoverable
PRESERVATION ACTION
• Extract metadata in XML format
• Create a checksum, or hash, for the data objects
• Facilitate data discovery and re-use
• Raise interest in your research
• Facilitate preservation
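The first two preservation actions, extracting metadata as XML and creating a checksum, can be sketched as follows. This is a minimal illustration using only the Python standard library; the element names (`dataObject`, `fileName`, `sizeBytes`, `checksum`) and the helper names are assumptions made for the example, not a metadata standard or the output of any particular curation tool.

```python
import hashlib
import os
import tempfile
import xml.etree.ElementTree as ET

def sha256_of_file(path, chunk_size=65536):
    """Compute a SHA-256 checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def describe_file(path):
    """Build a small XML metadata record (hypothetical schema) for one data object."""
    root = ET.Element("dataObject")
    ET.SubElement(root, "fileName").text = os.path.basename(path)
    ET.SubElement(root, "sizeBytes").text = str(os.path.getsize(path))
    checksum = ET.SubElement(root, "checksum", algorithm="SHA-256")
    checksum.text = sha256_of_file(path)
    return ET.tostring(root, encoding="unicode")

# Demo on a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"id,value\n1,42\n")
    demo_path = tmp.name

metadata_xml = describe_file(demo_path)
os.remove(demo_path)
print(metadata_xml)
```

Recomputing the checksum later and comparing it with the stored value is a simple way to detect whether a data object has been corrupted or altered.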
STORE
• Get credit for your data and build your reputation.
• Your data is discoverable and can be attributed to you.
• Other researchers can find data associated with a publication and explore new ways to use it.
ACCESS, USE & REUSE
TRANSFORM
Migrating the data and putting it into another format
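As a small illustration of the TRANSFORM step, the sketch below migrates tabular data from CSV to JSON. The formats, the sample rows, and the function name `csv_to_json` are assumptions chosen for the example; real migrations depend on the formats a repository actually needs to support.

```python
import csv
import io
import json

def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)

# Made-up sample data, used only to demonstrate the conversion.
SAMPLE_CSV = "site,status\nexample.org,archived\nexample.com,missing\n"

migrated = csv_to_json(SAMPLE_CSV)
print(migrated)
```

Note that every value comes out as a string, because CSV carries no type information; a real migration would also need to record such decisions in the metadata.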
Summary
• What is Data Curation?
  • Annotation
  • Management
  • Validation
  • Preservation
  • Sharing
  • Access and Re-use
  • Authentication
• Why do we need Data Curation?
  • Long-term access
  • Re-use
  • Interoperability
  • Reproducibility
  • Cost-effective
  • Time-saving
  • Creditability
  • Accountability
“Data curation systems should be integrated with the active research phase”
Yasmin AlNoamany, Old Dominion University
Web Science and Digital Libraries Group, ws-dl.cs.odu.edu
http://www.cs.odu.edu/~yasmin/
https://www.linkedin.com/in/yasminalnoamany
https://github.com/yasmina85/
@yasmina_anwar @WebSciDL
Backup Slides
Data & Complexity
• Research problems are increasingly interdisciplinary and complex
• Collaboration requires open sharing of data
• Data are highly heterogeneous and largely incompatible in their native forms
• The semantics and contexts within which data are gathered and interpreted are important to preserve
(Comic from The Official Dilbert Store)
http://www.christianitytoday.com/edstetzer/2015/february/3-ways-social-media-benefits-church-leaders.html