icg-11 - genomic data projects around the world - nov 5 2016

ICG-11: Genomic Data Projects around the World - How to find data for your research Fiona Nielsen – November 4th 2016

Upload: fiona-nielsen

Post on 11-Feb-2017


TRANSCRIPT

Page 1: ICG-11 - genomic data projects around the world - nov 5 2016

ICG-11: Genomic Data Projects around the World - How to find data for your research

Fiona Nielsen – November 4th 2016

Page 2: ICG-11 - genomic data projects around the world - nov 5 2016

We are always looking for data

Genetics, Cancer, Rare disease research

We need access to the right data at the right time

DNA interpretation requires lots of data

Page 3: ICG-11 - genomic data projects around the world - nov 5 2016

How much data do you need to publish a paper?

2001: 1 human genome

2012: 1000 Genomes (1092 genomes, since increased to ~2500)

2015: UK10K, Icelandic population (2,636 + 100k imputed), The Cancer Genome Atlas ~11,000 genomes

2016: ExAC consortium 65,000 exomes; gnomAD ~126,000 exomes

2020: ?

Page 4: ICG-11 - genomic data projects around the world - nov 5 2016

Data is not easy to find and access

FRAGMENTED: Poor visibility of available genomic data

ADMIN BURDEN: Huge overhead to manage data access

BAD CULTURE: Lack of data sharing habits in research culture

Page 5: ICG-11 - genomic data projects around the world - nov 5 2016

Finding and accessing data can take months

[Chart: Time spent data scouting per project. Categories: < 1 week, 1-3 months, 6+ months. Values shown: 40%, 48%, 11%.]

Page 6: ICG-11 - genomic data projects around the world - nov 5 2016

Why the barrier?

Barriers

• Difficult to find data, let alone find the RIGHT data

• Time-consuming and difficult to apply for access to data

• Complicated and laborious to submit data to public repositories

http://blog.repositive.io/tag/data-access/

http://blog.repositive.io/tag/data-sharing/

Page 7: ICG-11 - genomic data projects around the world - nov 5 2016

But where in the world is the data?

?

Page 8: ICG-11 - genomic data projects around the world - nov 5 2016

DATA is fragmented

Page 9: ICG-11 - genomic data projects around the world - nov 5 2016

How to make data easy to discover?

Page 10: ICG-11 - genomic data projects around the world - nov 5 2016

We have identified hundreds of data sources

Universities – Or repositories affiliated with a university.

Projects/Consortia – Have a specific purpose or aim. Often focussed on a specific research question or disease.

Public repositories – Allow download and upload of data from multiple institutions.

Companies – For-profit organisations making data available for free or as a service.

Biobanks – Many have sequence data for their biological samples.

Researchers know on average 4-5 data sources

More data sources appear every day; to date we have identified 350+

Page 11: ICG-11 - genomic data projects around the world - nov 5 2016

Simpler workflow for data access

And indexed them on the Repositive platform

Discover and access

Efficient Search, see related results

Find colleagues & their data interests

Co-annotate data & community feedback

Free to use: http://discover.repositive.io

Page 12: ICG-11 - genomic data projects around the world - nov 5 2016

Platform launched in Sept 2016

Discover and access

Efficient Search, see related results

Find colleagues & their data interests

Co-annotate data & community feedback

1 Million+ Human genomic datasets indexed

Free to use: http://discover.repositive.io

Page 13: ICG-11 - genomic data projects around the world - nov 5 2016

Platform launched in Sept 2016

Discover and access

Efficient Search, see related results

Find colleagues & their data interests

Co-annotate data & community feedback

1 Million+ datasets indexed

Simpler workflow for data access

177k Whole Exomes

213k Whole Genomes

2400 23andMe samples

Free to use: http://discover.repositive.io

Page 14: ICG-11 - genomic data projects around the world - nov 5 2016

Platform launched in Sept 2016

Discover and access

Efficient Search, see related results

Find colleagues & their data interests

Co-annotate data & community feedback

1 Million+ datasets indexed

Simpler workflow for data access

61+ Countries

426+ Research organisations

Using Repositive

PDX Consortium with AstraZeneca

Free to use: http://discover.repositive.io

Page 15: ICG-11 - genomic data projects around the world - nov 5 2016

Data sources across the globe

Geolocation of 278 data sources analysed.

Found by tracking IP address of the source.

These include:

Public Repositories

Universities

Companies

BioBanks

Research consortiums

Page 16: ICG-11 - genomic data projects around the world - nov 5 2016

Data source content

Assay Types

Dedicated to…

Page 17: ICG-11 - genomic data projects around the world - nov 5 2016

Sequenced ethnicities

Aboriginals

African Americans

Africans

Australians

Chinese

Malays

Indians

Danish

Dutch

Estonian

Russian

European Ancestry

Finnish

Icelandic

Japanese

Korean

Latin Americans

Saudi

Swedish

Page 18: ICG-11 - genomic data projects around the world - nov 5 2016

Machines & Data sources

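[Figure: counts of machines and data sources; the numeric labels could not be matched to their categories in this transcript. One legible label: "23 International".]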

Interesting site to look at: http://omicsmaps.com/stats

Page 19: ICG-11 - genomic data projects around the world - nov 5 2016

• Repositive is supporting the whole research workflow

• Faster, more efficient data discovery

• Streamlining data access applications

• Developing technology for efficient data access

• Setting up pre-competitive data sharing agreements

• Running workshops and training programmes

More efficient data access

Read about our pre-competitive PDX data resource in collaboration with AstraZeneca http://repositive.io/pdx

Page 20: ICG-11 - genomic data projects around the world - nov 5 2016

Building upon best practices

MAKE DATA DISCOVERABLE

SIMPLIFY WORKFLOWS

CONTRIBUTE TO COMMUNITY

DNAdigest and Repositive – Connecting the world of genomic data: http://www.tinyurl.com/plos-biology-repositive

First 30 data sources listed here:

Page 21: ICG-11 - genomic data projects around the world - nov 5 2016

Connecting the world of genomic data

Visit us at: http://repositive.io

Or tweet us @repositiveio

Free to use: http://discover.repositive.io

Fiona Nielsen, CEO

Email us: [email protected]