data curation vanderbilt

Data Curation Why should we care? Yasmin AlNoamany Old Dominion University Web Science and Digital Libraries Group ws-dl.cs.odu.edu @yasmina_anwar @WebSciDL 1 Presented for CLIR Postdoctoral Fellow for Data Curation at Vanderbilt University

Upload: yasmin-alnoamany-phd

Post on 22-Jan-2018

TRANSCRIPT

Page 1: Data curation vanderbilt

Data Curation
Why should we care?

Yasmin AlNoamany
Old Dominion University

Web Science and Digital Libraries Group
ws-dl.cs.odu.edu
@yasmina_anwar @WebSciDL

Presented for CLIR Postdoctoral Fellow for Data Curation at Vanderbilt University

Page 2: Data curation vanderbilt

About me

Page 3: Data curation vanderbilt

Academic degrees

[email protected]

• Bachelor's degree in Computer Science
• Master's degree in Computer Science
• Doctor of Philosophy in Computer Science

Page 4: Data curation vanderbilt

Old Dominion University (2011-2016)

• Research Assistant: integrating the past with the present, “Storytelling for Summarizing Collections in Web Archives”
• Teaching Assistant

Archived collections
Storytelling services
Archived enriched stories

Page 5: Data curation vanderbilt

Internet Archive (summer 2014 - fall 2014)

• Log analysis
• Tools for managing seed URIs

0.11.160.135 [02/Feb/2012:00:01:03] "GET http://web.archive.org/web/20070519015308im_/http://www.jcdl.org/images/jcdl2007-edie.jpg HTTP/1.1" 200 2137 "-" "Mozilla/5.0"

0.11.160.135 [02/Feb/2012:00:01:03] "GET http://staticweb.archive.org/images/toolbar/wayback-toolbar-logo.png HTTP/1.1" 200 3700 "-" "Mozilla/5.0"

0.151.147.108 [02/Feb/2012:00:01:03] "GET http://web.archive.org/web/20100102003557/about:blank HTTP/1.1" 302 0 "www.xx.com" "Mozilla/4.0"
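Log analysis on entries like the three above could start with a small parser. The following is a minimal sketch, not the tooling used at the Internet Archive: the field layout is inferred only from the sample lines on this slide, and the field names (`ip`, `time`, `uri`, `referrer`, `agent`, `archived_at`, `original_uri`) are labels chosen here for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Layout assumed from the sample lines:
# ip [timestamp] "METHOD uri PROTOCOL" status size "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Archived-page requests carry a 14-digit datetime after /web/, optionally
# followed by a two-letter modifier such as "im_" (as in the first sample).
MEMENTO_PATTERN = re.compile(
    r'/web/(?P<datetime>\d{14})(?:[a-z]{2}_)?/(?P<original>.+)$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S")
    entry["status"] = int(entry["status"])
    entry["size"] = int(entry["size"])
    memento = MEMENTO_PATTERN.search(entry["uri"])
    entry["archived_at"] = memento.group("datetime") if memento else None
    entry["original_uri"] = memento.group("original") if memento else None
    return entry


SAMPLE_LINES = [
    '0.11.160.135 [02/Feb/2012:00:01:03] "GET http://web.archive.org/web/20070519015308im_/http://www.jcdl.org/images/jcdl2007-edie.jpg HTTP/1.1" 200 2137 "-" "Mozilla/5.0"',
    '0.11.160.135 [02/Feb/2012:00:01:03] "GET http://staticweb.archive.org/images/toolbar/wayback-toolbar-logo.png HTTP/1.1" 200 3700 "-" "Mozilla/5.0"',
    '0.151.147.108 [02/Feb/2012:00:01:03] "GET http://web.archive.org/web/20100102003557/about:blank HTTP/1.1" 302 0 "www.xx.com" "Mozilla/4.0"',
]

entries = [parse_line(line) for line in SAMPLE_LINES]
# Count responses per HTTP status code across the parsed entries.
status_counts = Counter(e["status"] for e in entries if e is not None)
```

Separating the archived datetime from the original URI makes it possible to group requests by the page that was archived rather than by the full archive URI.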

Page 6: Data curation vanderbilt

Personal

• Women in Tech communities: @anitaborg, @systers, @arabwic
• Photography
• A mom to this adorable 7-year-old

Page 7: Data curation vanderbilt

Awards and publications

• Best Teaching Award
• Best Student Paper Award
• 9 papers, 3 of which are journal articles.

Page 8: Data curation vanderbilt

Data Curation

Why should we care?

Page 9: Data curation vanderbilt

Page 10: Data curation vanderbilt

Data Management

Page 11: Data curation vanderbilt

Data Management

Page 12: Data curation vanderbilt

How I got the logs from the IA

Source: http://www.tamr.com/real-data-scientists-enterprise/

Page 13: Data curation vanderbilt

Even if we save the data, how will it be shared and re-used?

Page 14: Data curation vanderbilt

Metadata is important

Page 15: Data curation vanderbilt

The call for a revolution in Egypt

• It all started on Facebook

Page 16: Data curation vanderbilt

Multiple initiatives for documenting the Egyptian Revolution

Page 17: Data curation vanderbilt

Several studies and books about the Egyptian Revolution

Page 18: Data curation vanderbilt

These studies and books cited these sites

Page 19: Data curation vanderbilt

They do not exist anymore!

Page 20: Data curation vanderbilt

Data preservation is important for posterity

• A year after the Egyptian Revolution, 11% of the social media documentation is gone.

Source: http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html

Page 21: Data curation vanderbilt

Data Curation is important for scholarly research

• Managing your research data saves time
• Universities and other research organizations invest very large sums of money into research activities
• Digital data is inherently prone to loss
• Future access to valuable digital assets depends upon curation/preservation actions taken today
• Funding agency requirements
• Research data should be shared and publicly accessible:
    • Increase the impact of your research
    • Make attribution easy
    • Call for accountability and transparency
    • Permits others to replicate the findings of a study
• Scholarly communication chain: connecting data to publication

Page 22: Data curation vanderbilt

What is Data Curation?

• “Data curation is the active and ongoing management of research data through its lifecycle of interest and usefulness to scholarship, science, and education.” – Carole Palmer, UIUC GSLIS
• Data management
• Adding value to data
• Data preservation for later re-use

Page 23: Data curation vanderbilt

DCC Curation lifecycle

Source: http://www.dcc.ac.uk/resources/curation-lifecycle-model

Page 24: Data curation vanderbilt

CONCEPTUALIZE

Step-by-step instruction and templates for creating, publishing and sharing data management plans that satisfy funding agency mandates

Page 25: Data curation vanderbilt

CREATE OR RECEIVE

A collaborative working space and data-sharing platform

Page 26: Data curation vanderbilt

APPRAISE & SELECT

Identification, Validation, Characterization
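One way to picture these three steps for a single file is sketched below. It is an illustration only, not a tool named in the slides: identification is done by comparing leading bytes against a small, hand-picked signature table, validation is reduced to checking that the detected format agrees with the file extension, and characterization is limited to name and size. The tables `SIGNATURES` and `EXTENSIONS` and the function names are assumptions made for this example.

```python
from pathlib import Path
from typing import Optional

# A few well-known file signatures ("magic numbers"); far from exhaustive.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"PK\x03\x04": "application/zip",
}

# Format each extension claims to be (illustrative mapping).
EXTENSIONS = {
    ".png": "image/png",
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".zip": "application/zip",
}


def identify(header: bytes) -> Optional[str]:
    """Identification: guess the format from the first bytes of the file."""
    for magic, mime in SIGNATURES.items():
        if header.startswith(magic):
            return mime
    return None


def appraise(path: Path) -> dict:
    """Identify, validate, and characterize one file."""
    with path.open("rb") as handle:
        header = handle.read(16)
    detected = identify(header)
    declared = EXTENSIONS.get(path.suffix.lower())
    return {
        # Characterization: basic technical properties.
        "name": path.name,
        "size_bytes": path.stat().st_size,
        # Identification: what the content says it is.
        "detected_format": detected,
        "declared_format": declared,
        # Validation (simplified): content and extension must agree.
        "valid": detected is not None and detected == declared,
    }
```

A file named `report.pdf` that actually starts with PNG bytes would come back with `valid` set to `False`, which is the kind of mismatch worth catching before ingest.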

Page 27: Data curation vanderbilt

INGEST

• Handle a wide variety of transfer processes
• Ensure the availability of the research data across institutions and publishers and keep it discoverable

Page 28: Data curation vanderbilt

PRESERVATION ACTION

• Extract metadata in XML format
• Create a checksum, or hash, for the data objects
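The two preservation actions above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: SHA-256 is chosen here as the checksum algorithm (the slide does not name one), and the XML element names (`dataObject`, `fileName`, `sizeBytes`, `checksum`) are made up for the example rather than taken from any metadata standard.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def metadata_xml(path):
    """Build a small XML metadata record for one data object."""
    path = Path(path)
    root = ET.Element("dataObject")  # hypothetical element names throughout
    ET.SubElement(root, "fileName").text = path.name
    ET.SubElement(root, "sizeBytes").text = str(path.stat().st_size)
    checksum = ET.SubElement(root, "checksum", algorithm="SHA-256")
    checksum.text = sha256_of(path)
    return ET.tostring(root, encoding="unicode")


def is_unchanged(path, recorded_xml):
    """Fixity check: does the file still match the checksum in its record?"""
    recorded = ET.fromstring(recorded_xml).findtext("checksum")
    return recorded == sha256_of(path)
```

Storing the record next to the data object lets a later `is_unchanged` call detect silent corruption or accidental edits.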

Page 29: Data curation vanderbilt

STORE

• Facilitate data discovery and re-use
• Raise interest in your research
• Facilitate preservation

Page 30: Data curation vanderbilt

ACCESS, USE & REUSE

• Get credit for your data and build your reputation.
• Your data is discoverable and can be attributed to you.
• Other researchers can find data associated with a publication and explore new ways to use it.

Page 31: Data curation vanderbilt

TRANSFORM

Migrating the data and putting them into another format
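As a small illustration of format migration, the sketch below converts tabular CSV data into JSON and checks that no records were lost along the way. The choice of CSV and JSON, the function names, and the sample data are assumptions for this example; the slides do not prescribe particular formats.

```python
import csv
import io
import json


def migrate_csv_to_json(csv_text: str) -> str:
    """Convert CSV text (first row is the header) to a JSON array of objects."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    # Values stay strings; type conversion would be a separate, explicit step.
    return json.dumps(records, indent=2, ensure_ascii=False)


def same_record_count(csv_text: str, json_text: str) -> bool:
    """Basic sanity check after migration: the number of records must match."""
    source_rows = list(csv.DictReader(io.StringIO(csv_text)))
    return len(source_rows) == len(json.loads(json_text))


# Hypothetical sample data, not taken from the slides.
SAMPLE_CSV = "uri,status\nhttp://example.org/a,200\nhttp://example.org/b,404\n"
migrated = migrate_csv_to_json(SAMPLE_CSV)
```

Checking the migrated copy against the source before retiring the old format keeps the transformation from becoming a source of data loss itself.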

Page 32: Data curation vanderbilt

Summary

• What is Data Curation?
    • Annotation
    • Management
    • Validation
    • Preservation
    • Sharing
    • Access and Re-use
    • Authentication

• Why do we need Data Curation?
    • Long-term access
    • Re-use
    • Interoperability
    • Reproducibility
    • Cost-effective
    • Time-saving
    • Creditability
    • Accountability

Page 33: Data curation vanderbilt

“Data curation systems should be integrated with the active research phase”

Yasmin AlNoamany
Old Dominion University

Web Science and Digital Libraries Group
ws-dl.cs.odu.edu

http://www.cs.odu.edu/~yasmin/
https://www.linkedin.com/in/yasminalnoamany
https://github.com/yasmina85/
@yasmina_anwar @WebSciDL

Page 34: Data curation vanderbilt

Backup Slides

Page 35: Data curation vanderbilt

Data & Complexity

• Research problems are increasingly interdisciplinary and complex
• Collaboration requires open sharing of data
• Data are highly heterogeneous and largely incompatible in their native forms
• The semantics and contexts within which data are gathered and interpreted are important to preserve

Page 36: Data curation vanderbilt

(Comic from The Official Dilbert Store)

Page 37: Data curation vanderbilt

http://www.christianitytoday.com/edstetzer/2015/february/3-ways-social-media-benefits-church-leaders.html