SciDataCon: How to increase accessibility and reuse for clinical and personal genomic data, Fiona Nielsen – September 12th 2016

Upload: fiona-nielsen

Post on 11-Feb-2017


Category: Science



TRANSCRIPT

Page 1: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

SciDataCon: How to increase accessibility and reuse for clinical and personal genomic data

Fiona Nielsen – September 12th 2016

Page 2: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

We are always looking for data

Genetics, cancer, rare disease research

We need access to the right data at the right time

DNA interpretation requires lots of data

Page 3: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

How much data do you need to publish a paper?

2001: 1 human genome

2012: 1000 Genomes (1092 genomes, since increased to ~2500)

2015: UK10K, Icelandic population (2,636 + 100k imputed), The Cancer Genome Atlas (~11,000 genomes)

2016: ExAC consortium, 65,000 exomes

2020: ?

Page 4: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Data is not easy to find and access

FRAGMENTED: Poor visibility of available genomic data

ADMIN BURDEN: Huge overhead to manage data access

BAD CULTURE: Lack of data sharing habits in research culture

Page 5: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Finding and accessing data can take months

Chart: time spent data scouting per project. Categories: < 1 week, 1-3 months, +6 months. Values (in the order extracted): 40%, 48%, 11%.

Page 6: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Why the barrier?

Barriers

• Difficult to find data, let alone find the RIGHT data

• Time-consuming and difficult to apply for access to data

• Complicated and laborious to submit data to public repositories

http://blog.repositive.io/tag/data-access/

http://blog.repositive.io/tag/data-sharing/

Page 7: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Data access applications for sensitive data

• Benefits: strict governance, review of consent, applicant signs for full responsibility for governance

Page 8: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Data access applications for sensitive data

• Benefits: strict governance, review of consent, applicant signs for full responsibility for governance

• Disadvantages: No control of data once access is given, high barrier for access – too high?

Page 9: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Alternative process – castle and moat

• Vetted users are allowed into the system where they can investigate and analyse data.

• No raw data exports are allowed and results for export are manually reviewed

• Example: Genomics England

Page 10: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Alternative process – controlled disclosure

• Allow vetted users access to privacy-preserving or manually curated exports from the data

• Example: Browsing UK census data – available for all
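One way to picture a privacy-preserving export is an aggregate table in which small groups are withheld, similar in spirit to published census tables. The sketch below is only an illustration under stated assumptions: the function name `aggregate_export`, the threshold `MIN_CELL_SIZE`, and the records are all hypothetical, and the slides do not say which technique any particular resource uses. Small-count suppression on its own is not a complete privacy guarantee.

```python
from collections import Counter

# Hypothetical threshold: groups with fewer individuals than this are withheld.
MIN_CELL_SIZE = 5


def aggregate_export(records, key, min_cell_size=MIN_CELL_SIZE):
    """Return per-group counts, suppressing groups smaller than min_cell_size.

    Suppressed groups are reported as None, so a reader can see that the
    group exists without learning its exact (small) count.
    """
    counts = Counter(record[key] for record in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


# Made-up records for illustration only; not real data.
records = (
    [{"cohort": "A"}] * 12
    + [{"cohort": "B"}] * 7
    + [{"cohort": "C"}] * 2
)

export = aggregate_export(records, key="cohort")
print(export)  # {'A': 12, 'B': 7, 'C': None}
```

Only the aggregated dictionary would leave the system in this sketch; the individual records stay behind the access boundary.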

Read about our pre-competitive PDX data resource in collaboration with AstraZeneca http://repositive.io/pdx

Page 11: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

But where in the world is the data?


Page 12: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Building upon best practices

MAKE DATA DISCOVERABLE

SIMPLIFY WORKFLOWS

CONTRIBUTE TO COMMUNITY

DNAdigest and Repositive – Connecting the world of genomic data: http://www.tinyurl.com/plos-biology-repositive

Page 13: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

How to make data easy to discover?

Page 14: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Although there are hundreds of data sources… they aren’t easy to find!

Chart: x-axis Jan-15, Mar-15, Jun-15, Sep-15, Dec-15, Mar-16; y-axis 0 to 200; data labels 10, 25, 33, 35, 102, 163.

First 30 data sources listed here: http://dx.doi.org/10.1371/journal.pbio.1002418

Page 15: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Sequenced ethnicities

Aboriginals

African Americans

Africans

Australians

Chinese

Malays

Indians

Danish

Dutch

Estonian

Russian

European Ancestry

Finnish

Icelandic

Japanese

Korean

Latin Americans

Saudi

Swedish

Page 16: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Machines & Data sources

Map figure: the numeric labels did not survive extraction in a recoverable form; one label reads "23 International".

Interesting site to look at: http://omicsmaps.com/stats

Page 17: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Main Repository funders

BGI = 4

EBI = 9

NIH = 10

NCBI = 9

The Broad = 8

Wellcome = 4

EBI total 104 services, 19 repositories http://www.ebi.ac.uk/services/all

NCBI total 67 databases http://www.ncbi.nlm.nih.gov/guide/all/#databases_

Page 18: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

We have identified hundreds of data sources

Universities – or repositories affiliated with a university.

Projects/Consortia – have a specific purpose or aim, often focussed on a specific research question or disease.

Public repositories – allow download and upload of data from multiple institutions.

Companies – for-profit organisations making data available for free or as a service.

Biobanks – many have sequence data for their biological samples.

Researchers know on average 4-5 data sources.

More data sources appear every day; to date we have identified 270+.

Page 19: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Simpler workflow for data access

And indexed them on the Repositive platform

Discover and access

Efficient Search, see related results

Find colleagues & their data interests

Co-annotate data & community feedback

Free to use: http://discover.repositive.io

Page 20: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Benefit for both sides of data collaborations

Data consumers

• Find relevant data faster

• Feedback from other users through ratings and comments to evaluate data quality

• Find collaborators with data

Data producers

• Make your data visible

• Build credibility as a trusted provider of quality data

• Find collaborators to analyse your data

Page 21: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

More efficient data access

• Supporting the whole research workflow

• Faster, more efficient data discovery

• Streamlining data access applications

• Developing technology for efficient data access

• Setting up pre-competitive data sharing agreements

• Running workshops and training programmes

Read about our pre-competitive PDX data resource in collaboration with AstraZeneca http://repositive.io/pdx

Page 22: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Recap: Still a lot of work to do

Barriers

• Difficult to find data, let alone find the RIGHT data

• Time-consuming and difficult to apply for access to data

• Complicated and laborious to submit data to public repositories

http://blog.repositive.io/tag/data-access/

http://blog.repositive.io/tag/data-sharing/

Page 23: SciDataCon - How to increase accessibility and reuse for clinical and personal genomic data

Connecting the world of genomic data

Visit us at http://repositive.io or tweet us @repositiveio