genome sharing projects around the world nijmegen oct 29 - 2015


Upload: fiona-nielsen

Post on 16-Apr-2017


Page 1

Genome sharing projects around the world

– and how you find data for your research

Fiona Nielsen, October 2015

Find me on Twitter: @glyn_dk

Page 2

• In case my talk turns out to be boring…

First, the take-home messages…

Page 3

Do not forget: by 2025, genome research will produce as much data as Twitter or YouTube

You do not have enough statistical power to interpret your data

But you can improve your study design

And you can access more data from public genome data repositories

Page 4

As you all know…

Page 5

Data output is going up

(Chart: Genomes Sequenced, 2006 to 2015; estimated number of human genomes sequenced in 2015: 400K)

The output of human genome sequencing data is growing at exponential rates

Page 6

Population-scale genome sequencing projects

Population-scale genome sequencing projects have been launched all over the world

Soon every research lab and every genetic clinic will have a DNA sequencer

Page 7

How much data do you need to publish a paper?

2001: 1 human genome

2012: 1000 Genomes (1092 genomes, since increased to ~2500)

2015: UK10K; Icelandic population (2,636 sequenced + 100k imputed); The Cancer Genome Atlas ~11,000 genomes; ExAC consortium 65,000 exomes

?

Page 8

Statistically speaking, you still need tens of thousands of samples for validation

The more severe the phenotype and the more complete the penetrance, the easier it will be for you to find your variant, but

“As the genetic complexity of the disease increases (for example, reduced penetrance and increased locus heterogeneity), issues of statistical power quickly become paramount.” http://www.nature.com/nrg/journal/v15/n5/full/nrg3706.html

But I am just looking at this one disease…
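To see why the numbers get so large, here is a rough sample-size sketch using the textbook normal-approximation formula for comparing two proportions (a two-sided two-proportion z-test). The scenario is hypothetical: a variant carried by 2% of cases and 1% of controls, 80% power, and the commonly used genome-wide significance threshold of 5 × 10⁻⁸. Real rare-variant studies use more specialised tests, so treat the result only as an order-of-magnitude estimate.

```python
import math
from statistics import NormalDist

def samples_per_group(p_cases, p_controls, alpha=5e-8, power=0.8):
    """Approximate number of cases (and the same number of controls) needed
    to detect a difference in carrier frequency with a two-sided
    two-proportion z-test (normal approximation, no continuity correction)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    p_bar = (p_cases + p_controls) / 2           # average frequency under H0
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p_cases * (1 - p_cases)
                                   + p_controls * (1 - p_controls)))
    return math.ceil(spread ** 2 / (p_cases - p_controls) ** 2)

# Hypothetical scenario: variant carried by 2% of cases and 1% of controls.
n_genome_wide = samples_per_group(0.02, 0.01)              # alpha = 5e-8
n_single_test = samples_per_group(0.02, 0.01, alpha=0.05)  # one pre-specified test
print(n_genome_wide, n_single_test)
```

Under these assumptions the genome-wide threshold calls for more than 10,000 cases and as many controls, about five times what a single pre-specified test at α = 0.05 would need.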

Page 9

What can I do?

PRO TIP: involve a statistician early on in your study design!

Page 10

How can I determine significance?

“One potentially powerful approach is to assess conservation across and within multiple species as whole-genome sequence data become more abundant.”

Look at extreme phenotypes: “Sampling cases or controls from the extremes of an appropriate quantitative distribution can often increase power”

Look at non-SNP variants: they are more likely to have functional effects

- “how to account for the technical features of sequencing, such as incomplete sequencing and biased coverage over the genome?”
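A small simulation can illustrate why sampling from the extremes helps. It is a sketch under hypothetical assumptions (a single variant carried by 10% of people that shifts a normally distributed trait up by 0.3 standard deviations): it compares the difference in carrier frequency between the top and bottom 10% of the trait distribution with the difference between the top and bottom halves. A larger difference per sequenced person is what translates into more power when the number of genomes you can afford to sequence is fixed.

```python
import random

def simulate_cohort(n=50_000, carrier_freq=0.1, effect_sd=0.3, seed=1):
    """Simulate (is_carrier, trait) pairs, sorted by trait value.

    Non-carriers have trait ~ N(0, 1); carriers are shifted up by effect_sd.
    All parameter values are hypothetical illustration values.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    cohort = []
    for _ in range(n):
        carrier = rng.random() < carrier_freq
        trait = rng.gauss(0.0, 1.0) + (effect_sd if carrier else 0.0)
        cohort.append((carrier, trait))
    cohort.sort(key=lambda person: person[1])
    return cohort

def carrier_fraction(group):
    """Fraction of people in the group who carry the variant."""
    return sum(1 for carrier, _ in group if carrier) / len(group)

cohort = simulate_cohort()
n = len(cohort)
tail = n // 10  # bottom and top 10% of the trait distribution
half = n // 2   # simple split at the median

extreme_diff = carrier_fraction(cohort[-tail:]) - carrier_fraction(cohort[:tail])
median_diff = carrier_fraction(cohort[half:]) - carrier_fraction(cohort[:half])
print(f"top vs bottom 10%: {extreme_diff:.3f}")
print(f"top vs bottom 50%: {median_diff:.3f}")
```

The carrier-frequency gap between the tails comes out clearly larger than the gap between the two halves, which is the enrichment the quoted advice relies on.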

Page 11

Think of how you can provide evidence that your result is not just a local technical variation or sampling bias

e.g. data from the same cell type, the same sequencing technology, the same alignment…

How to account for bias?

PRO TIP: include more reference data in your analysis

Page 12

• Know what data is available in your lab, your department, your organisation

• A survey by Qiagen showed that one of the main reasons researchers collaborate is to get access to data!

How can I access more data for my research?

Page 13

How can I find collaborators?

PRO TIP: Search for collaborators who have the data you need

PRO TIP: Tell your colleagues and peers what type of data you have in your lab

Page 14

Where can I access data?

Public repositories:

• some require you to apply for access, especially if the data contains clinical information or whole-genome PID

• some are open access: GEO, SRA, PGP, OpenSNP, GigaDB, …

• some are consented for general research use, some have specific consent

Page 15

It may be confusing

Page 16

And it takes time

Bottlenecks:
• Finding relevant and usable data
• Getting authorisation to access data
• Formatting data
• Storing and moving data

We studied the problem through qualitative interviews followed by a survey of researchers in human genetics

Page 17

And it takes time

T. A. van Schaik et al., “The need to redefine genomic data sharing: a focus on data accessibility”, Applied & Translational Genomics, 2014, doi:10.1016/j.atg.2014.09.013

Researchers spend months finding and accessing genomic data, and often choose not to access data at all

Page 18

Barriers to access

Page 19

Barriers to access

dbGaP Application Process (flowchart):

• NIH / eRA Commons login? Yes: write research proposal. No: is your organisation registered with eRA?
• Organisation registered with eRA? No: does your organisation have a DUNS number?
• Each missing registration step adds time: + 2-3 days, + 1 week, + 1-2 weeks
• Write and submit research proposal: + days to weeks
• Access granted: variable, from weeks to months
• Find/Download/Decrypt data: + 1-2 days
• Science…

Page 20

Why the barrier?

• Benefits: strict governance, review of consent, applicant signs to take full responsibility for governance

• Disadvantages: No control of data once access is given, high barrier for access – too high?

Page 21

• Start planning your data needs early in your project
• When you find the data you need, start the application
• Use Open Access data

How can I save time?

PRO TIP: If you use human genomic data, apply for the GRU (General Research Use) datasets in dbGaP: one application gives access to all the GRU datasets

Page 22

• Some data is Open Access (this requires specific consent)

• OpenSNP.org (Bastian)
• Personal Genome Projects
• Individuals who put their genomes online, e.g. Manuel Corpas and his family, “the Corpasome”

• http://manuelcorpas.com/about/

Not all data is restricted


Page 24

Personal Genome Project | PGP Harvard | PGP Canada | PGP UK | Genom Austria
Host institution | Harvard Medical School, Boston | SickKids, Toronto | University College London | CeMM Research Center for Molecular Medicine
Principal Investigator | George Church | Steven Scherer | Stephan Beck | Christoph Bock & Giulio Superti-Furga
Launch year | 2005 | 2012 | 2013 | 2014
Geographic scope | USA, mainly Boston | Canada | United Kingdom | Mainly Austria
Enrollment eligibility | At least 18 years old, able to make an informed decision, perfect score in the PGP enrollment exam, certain vulnerable groups excluded
Data generated | Whole genome sequencing, upload of additional data possible | Mainly whole genome sequencing | Whole genome sequencing, DNA methylome sequencing, RNA transcriptome sequencing | Mainly whole genome sequencing
Number of genomes | 100s | 10s | 10s | 10s
Data access | http://personalgenomes.org/harvard/data; http://genomaustria.at/unser-genom/#genome-der-pionierinnen
Project funding | Discretionary funds and corporate sponsoring | Institutional startup funds | Discretionary funds and corporate sponsoring | Institutional startup funds
Areas of emphasis | Integration with phenotypic data, collaboration with other personal omics initiatives; Genome donations, synergy with massive-scale clinical genome sequencing projects; Genomes and society, genetic literacy, school projects, education
Website | http://personalgenomes.org/harvard/ | http://personalgenomes.org/canada/ | http://personalgenomes.org/uk/ | http://genomaustria.at/

Page 25

Summary of data access barriers

Data is uploaded to a repository

Data is discovered by potential user

Data is accessed by potential user

Page 26

Where is the data?

(Chart, 2006 to 2015: ≈ 5K genomes available vs. 400K genomes sequenced)

Only a fraction of the data is findable or available through public repositories
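Taking the two estimates on the chart at face value, the share of sequenced genomes that is publicly available works out to roughly:

```latex
\frac{\text{genomes available}}{\text{genomes sequenced}} \approx \frac{5\,000}{400\,000} = 0.0125 = 1.25\,\%
```

That is, about one sequenced genome in eighty can be found in a public repository.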

Page 27

• “even when researchers are authorised to share data they report reluctance to do so because of the amount of effort required” http://www.sciencedirect.com/science/article/pii/S2212066114000386

• “Clinical geneticists cited a lack of time because their main priority is diagnosing patients. Industrial researchers cited a lack of time because of the pressure to meet the deadlines in their job. Researchers in academia cited both a concern about the potential loss of future publications once unpublished data is shared, and the lack of time and incentive to share data as this does not contribute to their publication record. Researchers from all categories felt that they lacked sufficient resources to make their data available.”

The barrier of making data available

But I do not want to share my data

Page 28

• If you expect data to be available to you – you have to make your data available too!

• Encourage collaborations: power by numbers

1. Get credit – publish and make your data available
2. Give credit – cite data sources
3. Understand consent – for all uses of clinical data

Best practices

Page 29

• Use all available tools to make your life easier:

• Data publications for visibility and citations of your data, e.g. GigaScience

• Figshare, Zenodo, Dryad for sharing open access data

• PhenomeCentral, Matchmaker Exchange for rare disease research

• Repositive for finding data across repositories and making your own data discoverable

Best practices: use the tools

Page 30

Does #OpenScience matter at proposal evaluation?

Based on: Winning Horizon 2020 with Open Science, http://dx.doi.org/10.5281/zenodo.12247

Page 31

“Weakness: Involvement of non-academic beneficiaries is limited”

“Weakness: highly focused on academic activities, and lacks an advanced communication strategy”

“Weakness: limited exposure to non-academic partners & infrastructures”

Excellence

Impact

Implementation

“data accessibility is unclear!”

“data storage & access not considered”

Page 32

“Strengths: extensive dissemination of data to the scientific community (open access, databases)”

“outreach activities to a broad audience”

“research software is freely available”

Impact:

Page 33

Page 34

Make the (research) world a better place by sharing in return

Best practices

Page 35

• Digital consent: towards automatic processing of applications

• Dynamic consent and power to the patient, e.g. PatientsKnowBest

• Privacy-preserving access to datasets: preserving control and governance with data custodian, lower barrier for access

What the future holds

Page 36

In the meantime: It is a jungle out there!

What if finding data was as easy as finding a book on Amazon or booking a hotel on Expedia?

Page 37

The Repositive vision

Enabling efficient data access

Incentivising best practices

Trusted broker for data exchange

Page 38

Repositive is a web platform

Discover new data sources

We are indexing all the public sources of data, so users have an easy portal for searching through data descriptions.

EASY SEARCH
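How Repositive indexes sources internally is not described in this talk. As a generic illustration of what "indexing data descriptions" can mean, here is a toy inverted index; the class name, dataset IDs and descriptions are all hypothetical. Each word in a description points to the datasets that contain it, so a query only has to intersect a few small sets instead of scanning every repository.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

class DescriptionIndex:
    """Toy inverted index mapping each word to the IDs of datasets
    whose description contains that word."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, dataset_id, description):
        for token in tokenize(description):
            self._postings[token].add(dataset_id)

    def search(self, query):
        """Return sorted IDs of datasets whose description contains every query word."""
        tokens = tokenize(query)
        if not tokens:
            return []
        matches = [self._postings.get(token, set()) for token in tokens]
        return sorted(set.intersection(*matches))

# Hypothetical dataset descriptions from three different repositories.
index = DescriptionIndex()
index.add("repo-A:001", "Whole genome sequencing of breast cancer tumour samples")
index.add("repo-B:042", "RNA-seq of breast tissue, open access")
index.add("repo-C:007", "Whole exome sequencing of rare disease trios")

print(index.search("breast sequencing"))  # ['repo-A:001']
print(index.search("whole sequencing"))   # ['repo-A:001', 'repo-C:007']
```

A real search portal would add ranking, synonyms (e.g. matching "RNA-seq" to "transcriptome") and filters on metadata such as consent or access type; this sketch only shows the core lookup.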

Page 39

Repositive is a web platform

Make your data visible

Repositive is a two-sided marketplace, so users can also make their own data findable.

SHARE KNOWLEDGE

Page 40

Active Repositive users increase benefits

Build a data community

BUILD TRUST

Users can interact to find relevant collaborators for their research either to analyse their data or to combine data sources.

Page 41

Active Repositive users increase benefits

Find data collaborators

SAVE TIME

Feedback from other users through ratings and comments helps users evaluate data quality

Page 42

Benefit for both sides

Data consumers:
• Find relevant data faster
• Feedback from other users through ratings and comments to evaluate data quality
• Find collaborators with data

Data producers:
• Make your data visible
• Build credibility as a trusted provider of quality data
• Find collaborators to analyse your data

Page 43

Live demo

Sign up as beta tester: http://repositive.io

Page 44

Best practices - recap

• Get credit – publish data
• Give credit – cite data
• Understand consent

Page 45

Thank you!