SPARC 2013 Data Management Presentation
![Page 1: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/1.jpg)
Data management.
Nicole Vasilevsky, NCNM, OHSU
Jackie Wirz, OHSU
Melissa Haendel, OHSU
![Page 2: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/2.jpg)
![Page 3: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/3.jpg)
Outline
• Introduction
• Why do we need good data management?
• Good data management
• Databases and tools
• Sharing your data
![Page 4: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/4.jpg)
Who are we?
• Nicole Vasilevsky, PhD
– Assistant Professor, Helfgott Research Institute, NCNM
– Project Manager, Ontology Development Group, OHSU
• Jackie Wirz, PhD
– Assistant Professor, Bioinformation Specialist, OHSU Library
• Melissa Haendel, PhD
– Assistant Professor, Department Head, Ontology Development Group, OHSU
![Page 5: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/5.jpg)
What does data mean to you?
![Page 6: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/6.jpg)
Do you have any training in data management?
![Page 7: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/7.jpg)
Do you know what metadata is?
a. Philosophy
b. describes data
c. dating site
d. data
![Page 8: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/8.jpg)
What is data?
• Clinical data
• Experimental data
• School related data
• Personal data
• Social data
![Page 9: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/9.jpg)
So much data
![Page 10: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/10.jpg)
Why?
Personal organization
Credit where credit is due
Reproducibility of science and medicine
Accelerates scientific and clinical discovery
Efficiency
![Page 11: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/11.jpg)
Do you get frustrated with any of the following in your personal or professional life?
a. Storing data
b. Backing up data
c. Analyzing/manipulating data
d. Finding data produced by other researchers/clinicians
e. Ensuring data are secure
f. Making data accessible to other researchers
g. Controlling access to data
h. Tracking updates to data (i.e., versioning)
i. Creating metadata (i.e., describing the data so they are more useful at a later time or to others)
j. Protecting intellectual property rights
k. Ensuring appropriate professional credit/citation is given to data sets generated
![Page 12: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/12.jpg)
http://davidmichaelangelosilva.wordpress.com/2012/01/29/organize-your-messy-desktop-with-fences/
Messy Desktop?
![Page 13: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/13.jpg)
Which of the following do you do?
a. Save copies of data on a disk, USB drive, tape, or computer hard drive
b. Save copies of data on a local server
c. Save copies of data on a central campus server
d. Save copies of data on a web based or cloud server
e. Store data in a repository or archives
f. Automatically backup files
g. Manually generate backup
h. Restrict access to files
![Page 14: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/14.jpg)
Credit where credit is due
Data collection & Analysis
Authoring
Storage, Archiving, & Preservation
Publication & Dissemination
The scholarly communication cycle
![Page 15: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/15.jpg)
Reproducibility of science
• Lack of information makes it difficult to reproduce experiments
• Retraction rates are on the rise
• Difficulty identifying resources in the published literature
Cokol et al. EMBO reports (2008) 9, 2
![Page 16: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/16.jpg)
Sharing can be advantageous
http://www.flickr.com/photos/eltonl/107582334/sizes/l/in/photostream/
![Page 17: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/17.jpg)
Why share your data?
• Data sharing mandates
– NIH public access policy
– NIH/NSF data sharing plan for new applications
• Further science and medicine
• Build collaborations
• Enable new discoveries with your data
• Can be required at time of publication
![Page 18: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/18.jpg)
Efficiency
http://hbr.org/2012/10/big-data-the-management-revolution
https://upload.wikimedia.org/wikipedia/commons/b/ba/HMS_Surprise_at_sunset_with_airplane.jpg
![Page 19: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/19.jpg)
How?
• File naming and data storage
• Metadata
• Controlled vocabularies and ontologies
• Databases and Tools
• Data accessibility
![Page 20: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/20.jpg)
File naming
![Page 21: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/21.jpg)
Informative file names
Will I remember what this file is a month from now?
![Page 22: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/22.jpg)
Naming conventions
Project_instrument_location_YYYYMMDDhhmmss_extra.ext
Index/grant conditions Leading zero!
s/n, variable Retain order
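The naming pattern above can be sketched in Python. This is only an illustration: the function name `build_filename` and the example values are hypothetical, and only the pattern itself comes from the slide. The timestamp is zero-padded (the "Leading zero!" note) and the fields always appear in the same order (the "Retain order" note), so the names sort chronologically within a project.

```python
from datetime import datetime


def build_filename(project, instrument, location, when, extra, ext):
    """Build a name of the form Project_instrument_location_YYYYMMDDhhmmss_extra.ext."""
    # strftime zero-pads every field, so alphabetical order matches time order.
    stamp = when.strftime("%Y%m%d%H%M%S")
    parts = [project, instrument, location, stamp, extra]
    # Underscores separate fields, so spaces and underscores inside a field become hyphens.
    cleaned = [p.strip().replace(" ", "-").replace("_", "-") for p in parts]
    return "_".join(cleaned) + "." + ext.lstrip(".")


# Hypothetical example values, not taken from the presentation.
name = build_filename("SPARC", "flowcyt", "OHSU", datetime(2013, 5, 3, 9, 7, 5), "run01", "csv")
print(name)  # SPARC_flowcyt_OHSU_20130503090705_run01.csv
```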
![Page 23: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/23.jpg)
Directory Structure
Sticking with a directory structure can be hard
Files:
• SPARC presentation
• CTSAconnect presentation
• Monarch presentation
Presentations
– SPARC
– CTSAconnect
– Monarch
![Page 24: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/24.jpg)
Versioning
DataManagement_SPARC_050313_final_NV
• Save a copy of every version of a data file
• Follow a file naming convention
• Version control software
– Dropbox
– Google docs
– Git
– SmartSVN
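For files that are not under version control software, the first two bullets ("save a copy of every version" and "follow a file naming convention") could be combined in a small helper. The sketch below is an assumption, not something shown in the presentation: the function name `save_version` and the `_vNN` suffix are hypothetical choices.

```python
import re
import shutil
from pathlib import Path


def save_version(path):
    """Copy `path` to the next free name of the form stem_vNN.ext and return the copy."""
    path = Path(path)
    # Match earlier copies such as data_v01.csv, data_v02.csv in the same folder.
    pattern = re.compile(re.escape(path.stem) + r"_v(\d+)" + re.escape(path.suffix) + "$")
    existing = [
        int(match.group(1))
        for candidate in path.parent.iterdir()
        if (match := pattern.match(candidate.name))
    ]
    next_number = max(existing, default=0) + 1
    # Two-digit, zero-padded version numbers keep the copies in order when sorted by name.
    target = path.with_name(f"{path.stem}_v{next_number:02d}{path.suffix}")
    shutil.copy2(path, target)
    return target
```

Calling `save_version("results.csv")` before each round of edits would leave `results_v01.csv`, `results_v02.csv`, and so on next to the working file.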
![Page 25: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/25.jpg)
Dropbox
www.dropbox.com
![Page 26: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/26.jpg)
Google docs
![Page 27: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/27.jpg)
Remember to back up your data!
• Keeping three copies is recommended!
– 1 on your local workstation
– 1 local/removable, such as an external hard drive
– 1 remote, such as on a cloud server*
*Depending on the type of data, as cloud servers are not always secure
http://libraries.mit.edu/guides/subjects/data-management/Managing%20Research%20Data%20101.pdf
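A backup is only useful if the copy is intact. A minimal sketch, assuming the extra copies live in directories you can write to (for example a mounted external drive and a synced cloud folder), copies a file and confirms each copy with a SHA-256 checksum. The function names and any paths you pass in are hypothetical, and the sketch does not address the security caveat for sensitive data.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy `source` into each destination directory; return the copies that verified."""
    source = Path(source)
    expected = sha256_of(source)
    verified = []
    for directory in destinations:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        copy = directory / source.name
        shutil.copy2(source, copy)
        # Keep only copies whose checksum matches the original.
        if sha256_of(copy) == expected:
            verified.append(copy)
    return verified
```

With two destinations, the original plus the two verified copies make up the three recommended copies.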
![Page 28: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/28.jpg)
Organizing your IRB application
Created by Heather Schiffke
See: http://libguides.ohsu.edu/data
![Page 29: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/29.jpg)
File renaming applications
• Bulk Rename Utility (Windows)
• Renamer (Mac)
• PSRenamer
![Page 30: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/30.jpg)
Metadata
![Page 31: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/31.jpg)
What is Metadata?
• Title
• Author
• Call number
• Publisher
• ISBN
![Page 32: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/32.jpg)
![Page 33: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/33.jpg)
![Page 34: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/34.jpg)
File name
File type
Who created the data
Title
Date created
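The fields listed on this slide can be captured in a small, machine-readable record stored next to the data file. The sketch below is one possible layout, not a standard: the key names, the function `describe_file`, and the example values are all hypothetical.

```python
import json
from datetime import date
from pathlib import Path


def describe_file(path, title, creator, created):
    """Return a metadata record with the fields from the slide."""
    path = Path(path)
    return {
        "file_name": path.name,
        # File type is taken from the extension, without the dot.
        "file_type": path.suffix.lstrip(".").lower(),
        "title": title,
        "creator": creator,
        # ISO 8601 dates (YYYY-MM-DD) are unambiguous and sort correctly.
        "date_created": created.isoformat(),
    }


# Hypothetical example values.
record = describe_file("survey_results.csv", "Data management survey", "A. Researcher", date(2013, 5, 3))
print(json.dumps(record, indent=2))
```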
![Page 35: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/35.jpg)
![Page 36: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/36.jpg)
![Page 37: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/37.jpg)
Using structured phenotype data to identify genetic basis of disease
![Page 38: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/38.jpg)
Metadata standards: Controlled vocabularies and ontologies
![Page 39: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/39.jpg)
Controlled vocabularies
![Page 40: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/40.jpg)
MeSH
![Page 41: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/41.jpg)
MeSH
acetaminophen
![Page 42: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/42.jpg)
What is an Ontology?
1. Hierarchical terms are defined textually and logically
2. Relationships between the terms are defined
3. Expressed in a language that can be reasoned across by computers
4. Data can be reused and can be easily linked together
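A toy example can make points 1 to 3 concrete. The sketch below is an assumption for illustration only: the terms, definitions, and the `ancestors` function are hypothetical and are not drawn from any published ontology, which would normally be expressed in a language such as OWL rather than a Python dictionary. It shows terms with text definitions, explicit `is_a` relationships, and a simple inference a computer can make across them.

```python
# Hypothetical mini-ontology: each term has a text definition and "is_a" parents.
TERMS = {
    "cell": {"definition": "The basic structural unit of an organism.", "is_a": []},
    "blood cell": {"definition": "A cell found in blood.", "is_a": ["cell"]},
    "leukocyte": {"definition": "A white blood cell.", "is_a": ["blood cell"]},
    "lymphocyte": {"definition": "A leukocyte of the lymphoid lineage.", "is_a": ["leukocyte"]},
}


def ancestors(term, relation="is_a"):
    """Return every term reachable from `term` by following `relation` links."""
    found = []
    stack = list(TERMS[term].get(relation, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(TERMS[current].get(relation, []))
    return found


# Nothing states directly that a lymphocyte is a cell; it follows from the chain of links.
print(ancestors("lymphocyte"))  # ['leukocyte', 'blood cell', 'cell']
```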
![Page 43: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/43.jpg)
Commonly Used Ontologies
• Gene Ontology
• Linnaean Taxonomy
• SNOMED
![Page 44: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/44.jpg)
Why are CVs and Ontologies useful?
• Can be used to structure your metadata
• Are often used to structure information in databases
![Page 45: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/45.jpg)
Structured data helps with searching
Craigslist search: Chaise
Craigslist matches on strings only
Craigslist search: Fainting couch
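The difference between matching on strings and searching with a controlled vocabulary can be sketched as follows. The listings and the synonym table are hypothetical, and treating "chaise" and "fainting couch" as synonyms of one preferred term is an assumption made for this illustration.

```python
# Hypothetical listings, standing in for classified ads.
LISTINGS = [
    "Antique chaise, good condition",
    "Victorian fainting couch",
    "Oak dining table",
]

# Hypothetical controlled vocabulary: one preferred term with its synonyms.
VOCABULARY = {"chaise longue": ["chaise", "fainting couch", "chaise lounge"]}


def string_search(query, listings):
    """Return listings that contain the query text exactly as typed."""
    return [item for item in listings if query.lower() in item.lower()]


def vocabulary_search(query, listings):
    """Expand the query to all names of the same concept, then search."""
    query = query.lower()
    for preferred, synonyms in VOCABULARY.items():
        names = [preferred] + synonyms
        if query in names:
            return [item for item in listings if any(n in item.lower() for n in names)]
    # Unknown terms fall back to plain string matching.
    return string_search(query, listings)


print(string_search("chaise", LISTINGS))      # only the chaise listing
print(vocabulary_search("chaise", LISTINGS))  # the chaise and the fainting couch
```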
![Page 46: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/46.jpg)
Structured data helps with searching
PubMed indexes articles with MeSH Terms
![Page 47: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/47.jpg)
In Summary: Structured Metadata = good
How can I create structured metadata?
http://www.flickr.com/photos/san_drino/1454922072/
![Page 48: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/48.jpg)
Database(s) and Tools… (to make your life easier)
http://farm4.static.flickr.com/3560/3332644561_c9d5041d02.jpg
![Page 49: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/49.jpg)
Data Management tools and repositories
• Purpose: Software where you can organize, store and/or share data
• Often contain metadata to assist with data entry and create structured data
![Page 50: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/50.jpg)
Tools for data management
![Page 51: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/51.jpg)
Data Sharing Repositories
http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
![Page 52: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/52.jpg)
Repositories use Unique IDs
• Digital Object Identifier (DOI)
• Example: DOIs for publications
– doi: 10.1371/journal.pbio.1001339
• Uniform Resource Identifier (URI)
• A URI will resolve to a single location on the web
• URIs for people
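A DOI can be turned into a resolvable web address by appending it to the `https://doi.org/` resolver. The sketch below uses the DOI from the slide; the function name `doi_to_url` and the list of prefixes it strips are assumptions made for this illustration.

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Turn a DOI written in a common form into a resolver URL."""
    doi = doi.strip()
    # Strip the prefixes people commonly write in front of a DOI.
    for prefix in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
    if not doi.startswith("10."):
        raise ValueError("DOIs start with the directory indicator '10.'")
    # Percent-encode unusual characters but keep the slash between prefix and suffix.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
# https://doi.org/10.1371/journal.pbio.1001339
```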
![Page 53: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/53.jpg)
People Disambiguation
![Page 54: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/54.jpg)
• Example:
– John L Campbell, Research Ecologist, Oregon State University, Corvallis, OR
– John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
![Page 55: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/55.jpg)
![Page 56: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/56.jpg)
Tools for personal data management
• Google Drive
• Dropbox
• Evernote
• TaskPaper
• Diigo: bookmarking websites
• Mendeley, EndNote, Zotero: citation managers
• SoundGecko
http://blogs.scientificamerican.com/information-culture/2012/12/10/managing-personal-knowledge-data-and-information/
![Page 57: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/57.jpg)
URLs to resources
Go to:
http://libguides.ohsu.edu/data
![Page 58: SPARC 2013 Data Management Presentation](https://reader036.vdocuments.net/reader036/viewer/2022062705/556353e7d8b42aed538b4faa/html5/thumbnails/58.jpg)
Data Sharing and Management Snafu in 3 short acts