
Post on 15-Sep-2020


Page 1

Reasons why other people should share their data.

Phillip Lord, Newcastle University

Page 2

“In the standard model, one collects data, publishes a paper or papers and then gradually loses the original dataset.”

THE NEW KNOWLEDGE ECONOMY AND SCIENCE AND TECHNOLOGY POLICY
Geoffrey Bowker, University of California, San Diego

Page 3

What am I?

•Web-Enabled

•Open Access

•Open Source

•Blogger

•Friend-Feeding

•Tweeting

•Emailer

A Web 2.0, open, data-sharing Junkie, used to washing his dirty laundry in public

Colour Blind, Design Blind, and Tasteless

Page 4

CARMEN – eScience for the Neurosciences

Stirling

St. Andrews

Newcastle

York

Sheffield

Cambridge

Imperial

Plymouth

Warwick

Leicester

Manchester

• 6M EUR over 4 years
• 20 Investigators
• Commenced 1st October 2006

Page 5

Research Challenge

Understanding the brain may be the greatest informatics challenge of the 21st century

Worldwide >100,000 neuroscientists (~5,000 in UK) are generating vast amounts of data

Principal experimental data formats:

• molecular (genomic/proteomic)
• neurophysiological (time-series electrical measures of activity)
• anatomical (spatial)
• behavioural

Neuroinformatics concerns how these data are handled and integrated, including the application of computational modelling

Page 6

Need for Cooperation


OECD Neuroinformatics Working Group identified the need to work cooperatively in order to achieve major advances

Cooperation will permit:

• development of common processes
• best value from data, including long-term curation
• ‘mega-analysis’ of large data sets
• integration of data sets across different scales and different approaches
• interdisciplinary research

Page 7

CARMEN – Focus on Neural Activity

• resolving the ‘neural code’ from the timing of action potential activity
• raw voltage signal data collected by patch-clamp and single & multi-electrode array recording
• novel optical recording, particularly the activity dynamics of large networks

[Figure labels: neurone 1, neurone 2, neurone 3]

Page 8

Sharing of Knowledge

• How do we share the data?

http://en.wikipedia.org/wiki/File:Usbkey_internals.jpg

http://en.wikipedia.org/wiki/File:Jet2_aeroplane_landing_at_EDI.jpg

Page 9

[Diagram labels: Data; Metadata; Core Services; External Client (twice); Service 1, Service 2, Service n (twice); Client Dynamically Deployed Services; Workflow Enactment Engine; Registry]

We want to share data, but also programmatic tools

Page 10

Sharing data!

• CARMEN is based around the idea that sharing data is good.
– If it’s someone else’s

• Common Worries:
– We did the experimental work, we need the papers
– Other people might not understand the data
– It won’t be of any use to other people
– Other people might use it wrongly

Page 11

We did the experimental work, we need the papers

• We can implement a security system
– Fine-Grained (per item)
– Role-Based
• But this has its problems
• I have no idea how this works
• We lack the metrics to show that people will get more papers from release.

Page 12

Other people might not understand the data

[Figure: timeline with year labels 2002 to 2009]

• There are many answers to this one:
– “surely, that’s their problem”
• Metadata: Minimal Information About a Neuroscience Investigation (MINI)
• We lack the metrics to show that better annotated data is better used (i.e. leads to more papers)

Page 13

It won’t be of any use to other people

• Data sharing in other domains works!
– Yeah, but that’s different
• Who is using my data?
– How often has my data been downloaded?
    • Easy to provide but not that good an indicator.
– Who has downloaded it?
    • Easy to provide but a barrier to reuse.

http://en.wikipedia.org/wiki/File:Rooster04_adjusted.jpg

http://en.wikipedia.org/wiki/File:Coturnix_coturnix_eggs.jpg

Page 14

Other people might use it wrongly

• This seems to centre around the idea that the data is too hard to understand.
• Metadata!!
• If your data is not comprehensible, then your analysis is not repeatable. So, it’s not science.
• We need attribution methods other than authorship
– Authorship from your data == Career Value!
– Authorship => I agree with the paper.
    • These two should be separate

Page 15

Sharing Code

• Neuroscientists don’t have a strong tradition of sharing code.

• Computer scientists do have a strong tradition of not sharing code.

• Surely code is just data?
– But data is an artefact
– Code is a snapshot of a development process.

Page 16

Common concerns

• We did the experimental work, we need the papers
– I wrote the code; it’s my startup company
• Other people might not understand the data
– My code is really ugly
• It won’t be of any use to other people
– It doesn’t work and I don’t want everyone to know
• Other people might use it wrongly
– It’s going to wipe their hard drive

Page 17

Additional Issues: Configuration

sub go_dbi_connection{
    ## Edit these appropriately for your database.
    my $go_dbi_database = "DBI:mysql:database=go_full_2006_05;host=somewhere.ncl.ac.uk";
    my $go_dbi_username = "root";
    my $go_dbi_password = "akd0skdmw";

    return DBI->connect( $go_dbi_database, $go_dbi_username, $go_dbi_password );
}

• This is some of my code. Oh dear.
• Protocol Hacking
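One low-cost way to make code like the snippet on this slide shareable is to keep the credentials out of the source file. The following is a minimal sketch only, not the original code: the environment variable names GO_DBI_DSN, GO_DBI_USERNAME and GO_DBI_PASSWORD and the helper go_dbi_settings are hypothetical, chosen for illustration. It assumes the Perl DBI module is installed when a connection is actually requested.

```perl
use strict;
use warnings;

# Read the connection settings from the environment instead of the source.
# GO_DBI_DSN, GO_DBI_USERNAME and GO_DBI_PASSWORD are illustrative names
# (an assumption), not part of the original code.
sub go_dbi_settings {
    my @names   = qw(GO_DBI_DSN GO_DBI_USERNAME GO_DBI_PASSWORD);
    my @missing = grep { !defined $ENV{$_} } @names;
    die "Missing environment variables: @missing\n" if @missing;
    return map { $ENV{$_} } @names;
}

sub go_dbi_connection {
    require DBI;    # loaded lazily, so the settings check runs without DBI
    return DBI->connect( go_dbi_settings() );
}
```

With the three variables set in the shell, or in a file kept outside version control, the script can be published without publishing the password.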

Page 18

It won’t be of any use: may be true!

• It doesn’t work and I don’t want everyone to know
– It depends on a third-party library
– It won't build without a development environment
– It was written for a specific purpose
– There are answers to all these, but they are expensive

Page 19

And the biggie

•Software is a slice in time

•There is a social commitment

•Code maintenance is hard

•Funding for it is hard

•"It's just code; we're doing science"

•No one cites you

•Perhaps standard metadata could help

Page 20

Conclusions

• Sharing is good, but hard
– if the researchers don't want to, it ain't gonna happen.
• Attribution, referencing, credit are critical
– Understanding the level of ongoing commitment
• The social aspects vary between domains
• Different kinds of data require different handling
• Small changes can be a big help!

Page 21

You can move a mountain

One pebble at a time

Page 22

Acknowledgements

MINI: Frank Gibson, Paul G Overton, Tom V Smulders, Simon R Schultz, Stephen J Eglen, Colin D Ingram, Stefano Panzeri, Phil Bream, Evelyne Sernagor, Mark Cunningham, Christopher Adams, Christoph Echtermeyer, Jennifer Simonotto, Marcus Kaiser, Daniel C Swan, Martyn Fletcher, Phillip Lord
