introduction the imber ssc created a dmc, which has made three major proposals: educate, change the...


Upload: lauren-greer

Post on 27-Mar-2015


TRANSCRIPT

Page 1: Introduction The IMBER SSC created a DMC, which has made three major proposals: Educate, change the negative ethos Help you to do DM by creating a cookbook

Introduction

• The IMBER SSC created a DMC, which has made three major proposals:

• Educate, change the negative ethos

• Help you to do DM by creating a cookbook of simple guidelines

• Advocate that every IMBER project or cruise identify a Data Scientist to help with DM

Page 2

Writing papers

• Writing papers is an essential part of a researcher’s job

• Writing papers is time consuming

• Writing papers is tedious/boring

• Writing papers needs attention to detail

• Publications are a legacy of your research

Page 3

Data management

• Data management is an essential part of a researcher’s job

• Data management is time consuming

• Data management is tedious/boring

• Data management needs attention to detail

• Data sets are a legacy of your research

– potentially more objective than your publications, and available for re-interpretation

Page 4

So why do we accept that we must write papers, but treat DM as the poor relation?

• Because we get recognition for publishing

• “Publish or perish”

• But we don’t get recognition for DM

• Let’s try to change that

Page 5

Recognition for DM

• Carrots and sticks

• Stick

– No more funding

• Carrots

– Referenceable data sets using DOIs (Digital Object Identifiers) - SCOR/IODE initiative

– Give help with DM: develop a cookbook

– Data Scientist will look good on your CV

Page 6

Example 1

Example of data spreadsheet submission to the BODC

Page 7

Good filename

Good data set title

Good overall data organisation

Good explicit column headers

Page 8

Use a consistent data format – use a number to indicate a missing numerical value and use text to indicate a missing text value

Avoid blank cells unless the value is missing

Do not mix characters and numbers in the same field

Avoid free text for dates – best to use separate columns for year, month, day.

Definitions need to be explicit
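
The formatting guidelines above lend themselves to a quick automated check before a spreadsheet goes to a data centre. What follows is a minimal sketch in Python, not something taken from the IMBER cookbook. The column names, the -999 and NA missing-value flags, and the sample rows are all made up for illustration.

```python
import csv
import io

# Made-up sample: -999 flags a missing number, NA flags missing text,
# and the date is held in separate year, month and day columns.
SAMPLE = """station,year,month,day,speed,gear
1,2000,1,1,-999,CTD
2,2000,1,1,29.21,NA
3,2000,1,2,,CTD
4,2000,1,2,approx 43,CTD
"""

def check_rows(rows, numeric_columns):
    """Return (row_number, column, problem) for every cell that breaks a rule."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            value = (value or "").strip()
            if value == "":
                # Blank cells are ambiguous: use an explicit missing-value flag.
                problems.append((row_number, column, "blank cell"))
            elif column in numeric_columns:
                try:
                    float(value)
                except ValueError:
                    # Characters and numbers mixed in the same field.
                    problems.append((row_number, column, "text in numeric field"))
    return problems

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = check_rows(rows, {"station", "year", "month", "day", "speed"})
for problem in problems:
    print(problem)
```

With this sample, the check reports the blank speed in row 3 and the free-text speed in row 4, while the rows that use the explicit flags pass.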

Page 9

Example guideline 2

Station  Lon        Lat       Time   SPEED
1        -69.30732  39.86233   7:00
2        -68.93825  38.70241   8:00   29.21
3        -68.54282  37.30523   9:00   34.85
4        -67.96285  35.5917   10:00   43.42
5        -66.56567  33.1664   11:00   67.18
6        -66.11751  32.45462  12:00   20.19
7        -67.54106  34.58994  13:00   61.59
8        -65.03667  30.87291  14:00  107.57
9        -64.11399  30.84654  15:00   22.15
10       -63.56039  31.37378  16:00   18.35
11       -65.64299  34.53722  18:00   45.45
12       -67.35653  38.46515  19:00  102.85
13       -60.89783  38.14881  19:15  620.78
14       -67.67287  39.41418  20:00  220.55
15       -68.25284  40.38957  21:00   27.23

Page 10

Example guideline 2: map stations

Station  Lon        Lat       Time   SPEED
1        -69.30732  39.86233   7:00
2        -68.93825  38.70241   8:00   29.21
3        -68.54282  37.30523   9:00   34.85
4        -67.96285  35.5917   10:00   43.42
5        -66.56567  33.1664   11:00   67.18
6        -66.11751  32.45462  12:00   20.19
7        -67.54106  34.58994  13:00   61.59
8        -65.03667  30.87291  14:00  107.57
9        -64.11399  30.84654  15:00   22.15
10       -63.56039  31.37378  16:00   18.35
11       -65.64299  34.53722  18:00   45.45
12       -67.35653  38.46515  19:00  102.85
13       -60.89783  38.14881  19:15  620.78
14       -67.67287  39.41418  20:00  220.55
15       -68.25284  40.38957  21:00   27.23
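
Alongside mapping the stations, a few simple automated checks can point to rows in a table like this one that deserve a second look. The sketch below, in Python, uses the values from the slide. The slide does not say which problems the example is meant to expose, so the two rules used here (a SPEED more than 10 times the median, and a time step other than 60 minutes) are assumptions chosen for illustration.

```python
from statistics import median

# Station, Lon, Lat, Time, SPEED as shown on the slide; station 1 has no SPEED.
TABLE = """\
1 -69.30732 39.86233 7:00
2 -68.93825 38.70241 8:00 29.21
3 -68.54282 37.30523 9:00 34.85
4 -67.96285 35.5917 10:00 43.42
5 -66.56567 33.1664 11:00 67.18
6 -66.11751 32.45462 12:00 20.19
7 -67.54106 34.58994 13:00 61.59
8 -65.03667 30.87291 14:00 107.57
9 -64.11399 30.84654 15:00 22.15
10 -63.56039 31.37378 16:00 18.35
11 -65.64299 34.53722 18:00 45.45
12 -67.35653 38.46515 19:00 102.85
13 -60.89783 38.14881 19:15 620.78
14 -67.67287 39.41418 20:00 220.55
15 -68.25284 40.38957 21:00 27.23
"""

def parse(table):
    """Turn the whitespace-separated table into a list of dictionaries."""
    stations = []
    for line in table.splitlines():
        fields = line.split()
        hours, minutes = fields[3].split(":")
        stations.append({
            "station": int(fields[0]),
            "lon": float(fields[1]),
            "lat": float(fields[2]),
            "minutes": int(hours) * 60 + int(minutes),  # minutes since midnight
            "speed": float(fields[4]) if len(fields) > 4 else None,
        })
    return stations

stations = parse(TABLE)

# Rows where SPEED was left blank.
missing_speed = [s["station"] for s in stations if s["speed"] is None]

# Assumed rule of thumb: a SPEED above 10x the median is suspect.
typical = median(s["speed"] for s in stations if s["speed"] is not None)
suspect_speed = [s["station"] for s in stations
                 if s["speed"] is not None and s["speed"] > 10 * typical]

# Assumed nominal sampling interval of 60 minutes between stations.
irregular_steps = [(a["station"], b["station"], b["minutes"] - a["minutes"])
                   for a, b in zip(stations, stations[1:])
                   if b["minutes"] - a["minutes"] != 60]

# Bounding box (min lon, max lon, min lat, max lat), a first check before mapping.
bounding_box = (min(s["lon"] for s in stations), max(s["lon"] for s in stations),
                min(s["lat"] for s in stations), max(s["lat"] for s in stations))

print("missing SPEED:", missing_speed)
print("suspect SPEED:", suspect_speed)
print("irregular time steps:", irregular_steps)
print("bounding box:", bounding_box)
```

Under these assumptions the checks flag station 1 (no SPEED), station 13 (SPEED 620.78) and the time steps between stations 10 and 11, 12 and 13, and 13 and 14. They do not replace a map: a position that is valid but in the wrong place only shows up when the stations are plotted.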

Page 11

Some suggestions

• Talk to a Data Centre from the start

• Fill in Metadata right from the start

• Delegate someone to help right from the start

• Follow the guidelines in the cookbook

• Maintain an Event Log during a cruise

• Take regular copies of notes and data
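
An Event Log can be as simple as a table with one timestamped row per event. The sketch below shows one possible shape in Python. The field names, the use of UTC timestamps and the two sample entries are assumptions made for illustration, not a format prescribed by IMBER or a Data Centre.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical set of columns for a cruise event log.
FIELDS = ["utc_time", "station", "instrument", "action", "operator"]

def log_event(writer, station, instrument, action, operator, when=None):
    """Write one event row; the current UTC time is used if none is given."""
    when = when or datetime.now(timezone.utc)
    writer.writerow({
        "utc_time": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "station": station,
        "instrument": instrument,
        "action": action,
        "operator": operator,
    })

# An in-memory buffer stands in for the log file in this example.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()

# Made-up entries with fixed times so the example is reproducible.
log_event(writer, 1, "CTD", "deployed", "AB",
          when=datetime(2000, 1, 1, 7, 0, tzinfo=timezone.utc))
log_event(writer, 1, "CTD", "recovered", "AB",
          when=datetime(2000, 1, 1, 7, 45, tzinfo=timezone.utc))

log_text = buffer.getvalue()
print(log_text)
```

At sea the same function could write to a file opened in append mode, and that file would be included in the regular copies of notes and data.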

Page 12

The bottom line

• DM cannot be an afterthought

• If you give DM some thought when you first plan a project, it will be

– relatively straightforward

– not too much effort

– remarkably useful to all participants

– valuable to those who come after

• Help is at hand: talk to a Data Centre right at the start

Page 13

So, what is a Data Scientist?

• The Data Scientist is someone who helps and advises the project/cruise Principal Scientist and researchers to document their data sets so that they are properly described

• The DS also interacts with PIs and Data Specialists to calibrate, validate, save and archive data

• Why is it FUN? - because you learn so much yourself by having to talk to people

• Can be full or part-time; paid or unpaid; hire, cajole or volunteer

Page 14

What does the DS gain?

• Broadening your experience, learning from other PIs

• Advancing your own DM skills

• Great management training! (listening to others, looking for problems)

• Looks great on your CV

• You might even get paid

Page 15

IMBER cookbook

• Draft version by Christmas

• Find it on-line via the IMBER web site

– Click on Data/How to do?

• Advise widely and seek your comments

• Create a downloadable version plus supporting templates etc. (to take to sea)

Page 16

IMBER cookbook demo

Page 17

Summary

• Plan DM right at the start and allocate funds (5-10%, includes Data Centre time)

• Help us get the cookbook right

• Follow the guidelines in the cookbook

• Appoint/delegate/cajole somebody to be Data Scientist on a major project/cruise

– Both he/she and you will benefit