Research Data Management - UK Data Archive

Page 1: Research Data Management - UK Data Archive

RESEARCH DATA MANAGEMENT

VEERLE VAN DEN EYNDEN & TOM ENSOM

UK DATA ARCHIVE

UNIVERSITY OF ESSEX

University of Essex, Looking after your research data

24 January 2013

Page 2: Research Data Management - UK Data Archive

UK DATA ARCHIVE

• the UK Data Archive has over forty years' experience in selecting, ingesting, curating and providing access to social science data

• we have extensive experience of supporting researchers and data creators in the social sciences and related disciplines

• we facilitate data sharing for the ESRC Data Policy (since 1995) and the Rural Economy and Land Use programme (2004-2012)

• our best practice approaches to making data shareable are based on:

  • challenges faced by researchers in sharing data

  • handling research data – quantitative and qualitative

  • highly skilled staff comprising researchers, technical and information specialists

www.data-archive.ac.uk

Page 3: Research Data Management - UK Data Archive

OUR MANAGING AND SHARING DATA RESOURCES

Managing and sharing guidance

• sections

• references

• training programme

www.data-archive.ac.uk/create-manage

www.data-archive.ac.uk/media/2894/managingsharing.pdf

Training resources:

• presentations

• exercises and discussions / answers

www.data-archive.ac.uk/create-manage/training-resources

Page 4: Research Data Management - UK Data Archive

OVERVIEW FOR TODAY

• Data management planning

• Storing your data, including data security, data transfer, encryption and file sharing

• Formatting and organising data

• Data confidentiality, legal and ethical issues

Page 5: Research Data Management - UK Data Archive

BENEFITS OF MANAGING AND SHARING YOUR DATA

DATA CREATED FROM RESEARCH ARE VALUABLE RESOURCES THAT CAN BE USED AND RE-USED FOR FUTURE SCIENTIFIC AND EDUCATIONAL PURPOSES. SHARING DATA FACILITATES NEW SCIENTIFIC INQUIRY, AVOIDS DUPLICATE DATA COLLECTION AND PROVIDES RICH REAL-LIFE RESOURCES FOR EDUCATION AND TRAINING.

Page 6: Research Data Management - UK Data Archive

DATA LIFECYCLE & DATA MANAGEMENT PLANNING

A DATA MANAGEMENT AND SHARING PLAN HELPS RESEARCHERS CONSIDER, WHEN RESEARCH IS BEING DESIGNED AND PLANNED, HOW DATA WILL BE MANAGED DURING THE RESEARCH PROCESS AND SHARED AFTERWARDS WITH THE WIDER RESEARCH COMMUNITY.

AREAS OF COVERAGE

• Data management planning (why and how) and the research lifecycle

• Data management checklist

• Roles and responsibilities

• Costing data management

Page 7: Research Data Management - UK Data Archive

WHY DMP?

• Research funders require planning for data management and data sharing, e.g. UK Research Councils

  • which data

  • how to manage them

  • how to share, preserve and curate them

  • rights to access, use, …

  • roles & responsibilities

DCC: UK research funders' DMP expectations

• Research benefits

  • think about what to do with research data, how to collect them and how to look after them

  • keep track of research data (e.g. staff leaving)

  • identify support, resources, services needed

  • plan storage, short & long-term

  • plan security, ethical aspects

  • be prepared for data requests (FoI, funder)

Page 8: Research Data Management - UK Data Archive

DATA PLAN REQUIREMENTS

Funder | Required at application | Data topics in DMP

AHRC | Technical plan | Standards, preservation, continued access & use

BBSRC | Data management and sharing plan | Type, format, standards, sharing methods, restrictions, timeframe

CRUK | Data sharing plan | Volume, format, standards, metadata, documentation, sharing method, timescale, preservation, restrictions

DFID | Access and data management plan | Repositories, limits, timescale, responsibilities, resources, access strategy

EPSRC | Policy framework |

ESRC | Data management plan | Volume, type, quality, archiving plans, difficulties sharing, consent sharing, IPR, responsibilities

Page 9: Research Data Management - UK Data Archive

DATA PLAN REQUIREMENTS

Funder | Required at application | Data topics in DMP

MRC | Data management plan | Collection methods, documentation, standards, preservation, curation, security, confidentiality, sharing & access, timescale, responsibilities

NERC | Outline data management plan | DM procedures, created data

STFC | Data management plan | Type, preservation, metadata, value, sharing, timescale, resources needed

Wellcome Trust | Data management and sharing plan | What data? When share? Where share? How access? Limits, how preserve? What resources?

Digital Curation Centre, Funders’ data plan requirements: www.dcc.ac.uk/resources/data-management-plans/funders-requirements

Knight, G. (2012) Funder Requirements for Data Management and Sharing. London School of Hygiene and Tropical Medicine, London. researchonline.lshtm.ac.uk/208596/

Page 10: Research Data Management - UK Data Archive

HOW

• Funder template for DMP

  • ESRC DMP requirements in data policy and DMP guidance

  • MRC DMP guidance and template

  • AHRC technical plan requirements

• DCC’s DMPonline tool

• UK Data Archive data management checklist

Page 11: Research Data Management - UK Data Archive

ROLES & RESPONSIBILITIES

Assign roles and responsibilities for data management; do not presume them

Who?

• PI

• Research staff / students - collecting, creating, processing, analysing data

• External contractors - data collection, collation, processing; e.g. transcribers

• Support staff - managing, administering research

• Essex ISS - data storage, security, back-up services

• External/institutional data centres / archives - data sharing

Page 12: Research Data Management - UK Data Archive

COSTING

• Cost data management and sharing into research

• Identify resources needed to make research data shareable beyond the primary research team, over and above planned standard research procedures and practices

• Resources = people, equipment, infrastructure, tools to manage, document, organise, store and provide access to data

• Early planning can reduce costs

• See our data management costing tool

Page 13: Research Data Management - UK Data Archive

FORMATTING YOUR DATA

USING STANDARD AND INTERCHANGEABLE OR OPEN LOSSLESS DATA FORMATS ENSURES LONG-TERM USABILITY OF DATA. HIGH QUALITY DATA ARE WELL ORGANISED, STRUCTURED, NAMED AND VERSIONED AND THE AUTHENTICITY OF MASTER FILES IDENTIFIED.

AREAS OF COVERAGE

• File formats

• File conversions

• Organising files and folders

• File naming

• Version control and authenticity

Page 14: Research Data Management - UK Data Archive

CAN YOU UNDERSTAND/USE THESE DATA?

SrvMthdDraft.doc

SrvMthdFinal.doc

SrvMthdLastOne.doc

SrvMthdRealVersion.doc

Page 15: Research Data Management - UK Data Archive

FILE FORMATS

Choice of software format for digital data depends on:

• planned data analyses

• software availability/cost

• hardware used – e.g. audio capture

• discipline-specific standards and customs

Digital data = software dependent

Digital data are endangered by obsolescence of software/hardware

Best formats for long-term preservation - standard formats, interchangeable formats, open formats

e.g. tab-delimited, comma-delimited (CSV), ASCII, RTF, PDF/A, OpenDocument format, SPSS portable, XML

see UK Data Archive optimal file formats for various data types

Page 16: Research Data Management - UK Data Archive

FILE FORMAT CONVERSIONS

Convert data for preservation or back-up: export, save as, scripts

Beware of conversion errors and losses:

• loss of internal metadata

e.g. convert MS Access to tab-delimited tables

• loss of editing, formatting, formulae

e.g. convert DOCX to RTF; XLSX to CSV

• truncation or loss of data

e.g. string variables lost in SPSS – Stata conversion; MS Access memo fields truncated in conversion to CSV; loss of sample labels on sequence data between annotation and analysis formats

Check for errors and changes after conversion
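The check after conversion can be partly automated. The following is a minimal sketch, assuming the data are already held as rows of text values; the helper names (`write_tab_delimited`, `read_tab_delimited`, `conversion_differences`) and the sample rows are hypothetical and not part of the original training material. It writes a small table to tab-delimited text, reads it back, and lists any cell that changed.

```python
import csv
import tempfile
from pathlib import Path


def write_tab_delimited(rows, path):
    """Write rows (lists of strings) to a tab-delimited text file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


def read_tab_delimited(path):
    """Read a tab-delimited text file back into a list of rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]


def conversion_differences(original, converted):
    """Return (row, column, before, after) for every cell that changed."""
    differences = []
    if len(original) != len(converted):
        differences.append(("row count", None, len(original), len(converted)))
    for r, (before_row, after_row) in enumerate(zip(original, converted)):
        if len(before_row) != len(after_row):
            differences.append((r, "column count", len(before_row), len(after_row)))
        for c, (before, after) in enumerate(zip(before_row, after_row)):
            if before != after:
                differences.append((r, c, before, after))
    return differences


# Hypothetical survey extract; two free-text cells contain a tab and a line break.
rows = [
    ["id", "age", "comment"],
    ["001", "34", "likes\ttea"],
    ["002", "51", "two\nlines"],
]

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "survey.tab"
    write_tab_delimited(rows, target)
    restored = read_tab_delimited(target)

# An empty list means every cell survived the round trip unchanged.
print(conversion_differences(rows, restored))
```

A comparison like this only catches changes to cell values; lost formatting, formulae or internal metadata still need a manual check against the source file.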

Page 17: Research Data Management - UK Data Archive

EXAMPLE: FORMAT CONVERSION

[Figure: the same table shown in MS Excel (.XLSX) format and in tab-delimited text format, with callouts marking loss of annotation and formatting change]

Page 18: Research Data Management - UK Data Archive

ORGANISING DATA

Plan in advance how best to organise data

Use a logical structure and ensure collaborators understand it

Examples

• hierarchical structure of files, grouped in folders, e.g. audio, transcripts and annotated transcripts

• measurement data – original, processed, analysed etc.

• interview transcripts - individual well-named files
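A hierarchical structure like the examples above can be created in one step at the start of a project. This is a rough sketch only: the folder names in `LAYOUT` and the function `create_layout` are illustrative assumptions, and a real project would choose names that match its own data types.

```python
import tempfile
from pathlib import Path

# Hypothetical layout for an interview project; folder names are illustrative.
LAYOUT = {
    "audio": [],
    "transcripts": ["original", "annotated"],
    "measurements": ["original", "processed", "analysed"],
    "documentation": [],
}


def create_layout(root, layout):
    """Create the folders described by layout under root and return their paths."""
    created = []
    for folder, subfolders in layout.items():
        for parts in [[folder]] + [[folder, sub] for sub in subfolders]:
            path = Path(root).joinpath(*parts)
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


# A temporary folder stands in for the project root in this demonstration.
with tempfile.TemporaryDirectory() as project_root:
    paths = create_layout(project_root, LAYOUT)
    relative = sorted(p.relative_to(project_root).as_posix() for p in paths)

for name in relative:
    print(name)
```

Keeping the layout in one place makes it easy to share with collaborators and to recreate the same structure on another machine.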

Page 19: Research Data Management - UK Data Archive

FILE NAMING

• file name = principal identifier of file

• use logical naming i.e. easy to identify, locate, retrieve, access

• naming provides organisation, context & consistency

• name elements: version number, date, content description, creator name

Best practice

• name independent of location

• brief & relevant

• no special characters, dots or spaces

• for separation use underscores _

• versioning via filename: ascending, decimal version numbers

• use names to classify broad types of files

• avoid very long file names
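The naming rules above lend themselves to a small helper that builds and checks names consistently. The sketch below is one possible reading of those rules, not a convention prescribed by the UK Data Archive: the element order, the zero-padded integer version (`v03`), and the 40-character limit are all assumptions chosen for illustration.

```python
import re
from datetime import date

# Letters, digits, hyphens and underscores only: no spaces, dots or other special characters.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def build_name(description, created, version, extension):
    """Join content description, ISO date and version number with underscores."""
    stem = f"{description}_{created.isoformat()}_v{version:02d}"
    if not ALLOWED.fullmatch(stem):
        raise ValueError(f"unsuitable characters in file name: {stem!r}")
    return f"{stem}.{extension}"


def is_well_formed(stem, max_length=40):
    """Check a file name (without its extension) against the character and length rules."""
    return bool(ALLOWED.fullmatch(stem)) and len(stem) <= max_length


print(build_name("FoodInterview", date(2013, 1, 24), 3, "rtf"))
print(is_well_formed("SrvMthd Final.last"))
```

Writing dates in ISO order (year-month-day) is a design choice that keeps files sorted chronologically when listed by name.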

Page 20: Research Data Management - UK Data Archive

DIRECTORY STRUCTURE

Page 21: Research Data Management - UK Data Archive

QUIZ: FILE NAMING

From the UK Data Archive’s Managing and Sharing Data: Training Resources

http://data-archive.ac.uk/create-manage/training-resources/format

Page 22: Research Data Management - UK Data Archive

VERSION CONTROL

Keep track of different copies or versions of data files

Which method to use depends on:

• single site vs. across locations

• single vs. multiple users

• different versions to be stored vs. synchronised files

Best practice:

• unique identifiers for files (file names)

• record file status/versions

• record relationships between files

e.g. data file and documentation; similar data files

• keep track of file locations

e.g. laptop vs. PC

Page 23: Research Data Management - UK Data Archive

VERSION CONTROL

Single user of data files

• file naming – unique file names with date or version number (avoid spaces!)

e.g. FoodInterview_1_draft; FoodInterview_1_final; HealthTest_06-04-2008; BGHSurveyProcedures_00_04

• version control table or file history within or alongside data file

• version control facility within software

e.g. MS WORD
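A version control table kept alongside a data file could be maintained with a short script rather than by hand. The sketch below is an assumption about how such a table might look: the column names in `FIELDS`, the log file name, and the sample entries are invented for illustration and are not taken from the original slides.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["version", "date", "author", "changes"]  # hypothetical column names


def record_version(log_path, version, date, author, changes):
    """Append one row to the version control table, writing the header on first use."""
    is_new = not Path(log_path).exists()
    with open(log_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(
            {"version": version, "date": date, "author": author, "changes": changes}
        )


def read_versions(log_path):
    """Return the version history as a list of dictionaries, oldest first."""
    with open(log_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as folder:
    log = Path(folder) / "FoodInterview_1_versions.csv"
    record_version(log, "1.0", "2013-01-10", "Researcher A", "First transcription")
    record_version(log, "1.1", "2013-01-24", "Researcher B", "Corrected speaker labels")
    history = read_versions(log)

print([row["version"] for row in history])
```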

Page 24: Research Data Management - UK Data Archive

VERSION CONTROL

Multiple users of data files

• control rights to file editing: read/write permissions

e.g. Windows Explorer

• versioning/file sharing software: check files out/in

e.g. SharePoint, CMS, Google Docs, Amazon S3

• manual merging of multiple entries/edits

Synchronise files

• software

e.g. MS SyncToy

• cloud-based

e.g. DropBox, Google Drive

Page 25: Research Data Management - UK Data Archive

EXAMPLE: VERSION CONTROL TABLE

Page 26: Research Data Management - UK Data Archive

DEMO: SYNCHRONISING

Synchronise files between two folders using SyncToy (Windows) software

www.data-archive.ac.uk/media/375817/formattingyourdata_synchronisingexercise.pdf

Page 27: Research Data Management - UK Data Archive

QUIZ: FORMATTING YOUR DATA

From the UK Data Archive’s Managing and Sharing Data: Training Resources

http://data-archive.ac.uk/create-manage/training-resources/format

Page 28: Research Data Management - UK Data Archive

STORING YOUR DATA

LOOKING AFTER RESEARCH DATA FOR THE LONGER-TERM AND PROTECTING THEM FROM UNWANTED LOSS REQUIRES HAVING GOOD STRATEGIES IN PLACE FOR SECURELY STORING, BACKING-UP, TRANSMITTING, AND DISPOSING OF DATA. COLLABORATIVE RESEARCH BRINGS CHALLENGES FOR THE SHARED STORAGE OF, AND ACCESS TO, DATA.

AREAS OF COVERAGE

• Making back-ups

• Data storage

• Data security

• Data transmission and encryption

• File sharing and collaborative environments

• Data disposal

• Disseminating data

Page 29: Research Data Management - UK Data Archive

BACKING-UP DATA

• Why do back-ups? Risk of loss and change - would your data survive a disaster?

• Protect against: software failure, hardware failure, malicious attack, natural disasters

• Back-ups are additional copies that can be used to restore originals

• It’s not backed-up unless backed-up with a strategy

Page 30: Research Data Management - UK Data Archive

BACK-UP STRATEGY

Consider

• what’s backed-up? - all, some, just the bits you change?

• where? - original copy, external local and remote copies

• what media? - CD, DVD, external hard drive, tape, etc.

• how often? – assess frequency and automate the process

• for how long is it kept?

• verify and recover - never assume, regularly test a restore

Backing-up need not be expensive

• 1 TB external drives cost around £50, with back-up software

Page 31: Research Data Management - UK Data Archive

DATA STORAGE

All digital media are fallible

File formats and physical storage media become obsolete

• optical (CD, DVD) and magnetic media (hard drive, tapes) degrade

• never assume the format will be around for ever

Best practice

• use data formats with long-term availability

• storage strategy - at least two forms of storage and locations

• maintain original copy, external local copy and external remote copy

• copy data files to new media two to five years after first created

• check data integrity of stored data files regularly (checksum)

• know your personal/institutional back-up strategy

• know data retention policies that apply: funder, publisher, home institution

• what to protect? not only data, and not only digital

Page 32: Research Data Management - UK Data Archive

NON-DIGITAL STORAGE

Printed materials, photographs

• degradation from sunlight and acid (sweat on skin, in paper)

• use high quality media for long-term storage/preservation

e.g. using acid-free paper & boxes, non-rust paperclips (no staples)

Confidential items, e.g. signed consent forms, interview notes

• store securely, behind lock

• separate from data files

Page 33: Research Data Management - UK Data Archive

ENCRYPTION

Always encrypt personal or sensitive data

Encrypt anything you would not send on a postcard

• for moving files e.g. transcripts

• for storing files e.g. shared areas, mobile devices

Basic principles

• use an algorithm to transform information (A=1)

• need a ‘key’ to decrypt

Free software that is easy to use

• SafeHouse

• TrueCrypt

• AxCrypt

These tools

• encrypt hard drives, partitions, files and folders

• encrypt portable storage devices such as USB flash drives
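The two basic principles above (an algorithm transforms the information, and a key is needed to reverse it) can be illustrated with a toy letter-shift cipher. This is a teaching sketch only: a shift cipher is trivially breakable and must never be used to protect real data; the function names and the sample text are invented for illustration.

```python
import string

ALPHABET = string.ascii_uppercase


def shift(text, key):
    """Shift each letter A-Z forward by key positions; other characters pass through."""
    result = []
    for character in text.upper():
        if character in ALPHABET:
            position = (ALPHABET.index(character) + key) % len(ALPHABET)
            result.append(ALPHABET[position])
        else:
            result.append(character)
    return "".join(result)


def encrypt(plaintext, key):
    """Transform the text using the key."""
    return shift(plaintext, key)


def decrypt(ciphertext, key):
    """Reverse the transformation; only the same key restores the original."""
    return shift(ciphertext, -key)


secret = encrypt("INTERVIEW 12", key=3)
print(secret)                  # LQWHUYLHZ 12
print(decrypt(secret, key=3))  # INTERVIEW 12
```

Dedicated encryption tools apply the same idea with far stronger algorithms and much longer keys.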

Page 34: Research Data Management - UK Data Archive

DATA DESTRUCTION

When you delete data and documentation from a hard drive, it is probably not gone

• files need to be overwritten to ensure they are irretrievably deleted:

  • BCWipe - uses ‘military-grade procedures to surgically remove all traces of any file’

  • AxCrypt

• if in doubt, physically destroy the drive using an approved secure destruction facility

• physically destroy portable media, as you would shred paper
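The idea of overwriting a file before deleting it can be sketched in a few lines. This is an illustration of the principle and not a replacement for a dedicated tool: on solid-state drives, journaling or copy-on-write file systems, and cloud-synchronised folders, earlier copies of the data may survive an overwrite. The function name and the default of three passes are assumptions, and the whole file is held in memory, so the sketch suits small files only.

```python
import os
import tempfile
from pathlib import Path


def overwrite_and_delete(path, passes=3):
    """Overwrite a file's contents with random bytes, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            handle.write(os.urandom(size))
            handle.flush()
            os.fsync(handle.fileno())  # ask the operating system to write to disk
    os.remove(path)


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "consent_scan.txt"  # hypothetical file name
    target.write_bytes(b"confidential")
    overwrite_and_delete(target)
    still_there = target.exists()

print(still_there)
```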

Page 35: Research Data Management - UK Data Archive

DATA SECURITY

Protect data from unauthorised access, use, change, disclosure and destruction

Personal data need more protection – always keep separate

Control access to computers

• passwords

• anti-virus and firewall protection, power surge protection

• networked vs non-networked PCs

• all devices: desktops, laptops, memory sticks, mobile devices

• all locations: work, home, travel

• restrict access to sensitive materials e.g. consent forms, patient records

Proper disposal of equipment (and data)

• even reformatting the hard drive is not sufficient

Control physical access to buildings, rooms, cabinets

Page 36: Research Data Management - UK Data Archive

FILE SHARING & COLLABORATIVE ENVIRONMENTS

Sharing data between researchers and teams

• too often email attachments

• YouSendIt, Dropbox – consider whether appropriate, as services can be hosted outside the EU (a Data Protection Act issue for personal data); e.g. encrypt files first

• Virtual Research Environments

• MS SharePoint

• Sakai

• file transfer protocol (ftp)

• physical media

• Essex ZendTo

Page 37: Research Data Management - UK Data Archive

DEMO: DATA ENCRYPTION

Create an encrypted storage space using free software SafeHouse

www.data-archive.ac.uk/media/312652/storingyourdata_encryptionexercise.pdf

Page 38: Research Data Management - UK Data Archive

DEMO: DATA INTEGRITY & BACK-UP

Calculate the MD5 checksum value of a file to check its integrity, e.g. after back-up

www.data-archive.ac.uk/media/361550/storingyourdata_checksumexercise.pdf
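The same check can be scripted. The sketch below, which is not the exercise from the linked PDF, uses Python's standard `hashlib` module to compute an MD5 checksum, copies a file as a stand-in for a back-up, and then compares the checksums; the file names and the helper `md5_checksum` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_checksum(path, chunk_size=65536):
    """Return the MD5 checksum of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "interview_01.txt"  # hypothetical file name
    backup = Path(folder) / "interview_01_backup.txt"
    original.write_bytes(b"Transcript text\n")
    shutil.copyfile(original, backup)

    checksum_before = md5_checksum(original)
    backup_intact = md5_checksum(backup) == checksum_before

    # Simulate corruption of the back-up copy by changing one character.
    backup.write_bytes(b"Transcript test\n")
    corruption_detected = md5_checksum(backup) != checksum_before

print(backup_intact, corruption_detected)
```

MD5 is adequate for spotting accidental change; where deliberate tampering is a concern, a stronger hash such as SHA-256 (`hashlib.sha256`) is a common substitute.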

Page 39: Research Data Management - UK Data Archive

ETHICAL AND LEGAL ISSUES IN DATA SHARING

A COMBINATION OF GAINING CONSENT FOR DATA SHARING, ANONYMISING AND REGULATING ACCESS TO DATA WILL INCREASE THE POTENTIAL FOR MAKING PEOPLE-RELATED RESEARCH DATA MORE READILY AND WIDELY AVAILABLE

AREAS OF COVERAGE

• Legal and ethical aspects

• Informed consent for data sharing

• Anonymising data

• Controlling access to data

Page 40: Research Data Management - UK Data Archive

ETHICAL ARGUMENTS FOR ARCHIVING DATA

• store and protect data securely

• not burden over-researched, vulnerable groups

• make best use of hard-to-obtain data (e.g., elites, socially excluded, over-researched)

• extend voices of participants

• provide greater research transparency

• enable fullest ethical use of rich data

In each, ethical duties to participants, peers and public may be present

Page 41: Research Data Management - UK Data Archive

DUTY OF CONFIDENTIALITY AND DATA SHARING

• Duty of confidentiality exists in common law and may apply to research data

• If participant consents to share data, then sharing does not breach confidentiality

• Public interest can override duty of confidentiality; best practice is to avoid vague or general promises in consent forms

Page 42: Research Data Management - UK Data Archive

DATA PROTECTION ACT, 1998

• Personal data:

  • relate to living individual

  • individual can be identified from those data or from those data and other information

  • include any expression of opinion about the individual

• Requirements for handling personal data

  • processed fairly and lawfully

  • obtained and processed for a specified purpose

  • adequate, relevant and not excessive for the purpose

  • accurate

  • not kept longer than necessary

  • processed in accordance with the rights of data subjects, e.g. right to be informed about how data will be used, stored, processed, transferred, destroyed, …; right to access info and data held

  • kept secure

  • not transferred abroad without adequate protection

• Only disclosed if consent has been given to do so (except legal duty)

Page 43: Research Data Management - UK Data Archive

DATA PROTECTION ACT & RESEARCH

• Exceptions for personal data collected as part of research:

  • can be retained indefinitely (if needed)

  • can be used for other purposes in some circumstances

  • people should still be informed

• If data are anonymised (personal identifiers removed) then DP laws will not apply as these no longer constitute ‘personal data’

DPA is not intended to, and does not, inhibit ethical research

Page 44: Research Data Management - UK Data Archive

SENSITIVE DATA

• Data regarding an individual's race or ethnic origin, political opinion, religious beliefs, trade union membership, physical or mental health, sex life, criminal proceedings or convictions (DPA 1998)

• Can only be processed for research purposes if:

  • explicit consent (ideally in writing) has been obtained; or

  • medical research by a health professional or equivalent with duty of confidentiality; or

  • analysis of racial/ethnic origins for purpose of equal opportunities monitoring; or

  • in substantial public interest and not causing substantial damage and distress

Page 45: Research Data Management - UK Data Archive

OPTIONS FOR SHARING CONFIDENTIAL DATA

Researchers to consider

• obtaining informed consent, also for data sharing and preservation / curation

• protecting identities e.g. anonymisation, not collecting personal data

• restricting / regulating access where needed (all or part of data) e.g. by group, use, time period

• securely storing personal or sensitive data

Consider jointly and in dialogue with participants

Plan early in research

Page 46: Research Data Management - UK Data Archive

CONSENT NEEDED ACROSS THE DATA LIFE CYCLE

• Engagement in the research process

• decide who approves final versions of transcripts

• Dissemination in presentations, publications, the web

• decide who approves research outputs

• Data sharing and archiving

• consider future uses of data

Always dependent on the research context

UK Data Archive model consent form

Page 47: Research Data Management - UK Data Archive

ANONYMISATION PREVENTS IDENTITY DISCLOSURE

A person’s identity can be disclosed through:

• direct identifiers

e.g. name, address, postcode, telephone number, voice, picture

Often NOT essential research information (administrative)

• indirect identifiers – possible disclosure in combination with other information

e.g. occupation, geography, unique or exceptional values (outliers) or characteristics

Page 48: Research Data Management - UK Data Archive

KEY POINTS FOR ANONYMISING

• never disclose personal data - unless consent for disclosure

• reasonable/appropriate level of anonymity

• maintain maximum meaningful information

• where possible replace rather than remove

• identifying information may provide context, do not over-anonymise

• re-users of data have the same legal and ethical obligation to NOT disclose confidential information as primary users

Page 49: Research Data Management - UK Data Archive

ANONYMISING QUANTITATIVE DATA

• remove direct identifiers

e.g. names, address, institution, photo

• reduce the precision/detail of a variable through aggregation

e.g. birth year vs. date of birth, occupational categories, area rather than village

• generalise meaning of detailed text variable

e.g. occupational expertise

• restrict upper and lower ranges of a variable to hide outliers

e.g. income, age

• combine variables

e.g. creating non-disclosive rural/urban variable from place variables
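Several of the techniques above can be applied together in a single pass over each record. The following is a minimal sketch under stated assumptions: the field names, the income limits, the set `RURAL_PLACES`, and the sample respondent are all invented for illustration, and appropriate thresholds depend on the dataset.

```python
# Hypothetical list of places classed as rural; a real project would document its own source.
RURAL_PLACES = {"Village A", "Village B"}


def birth_year(date_of_birth):
    """Reduce precision: keep only the year of an ISO date such as '1975-03-14'."""
    return date_of_birth[:4]


def clamp(value, lower, upper):
    """Restrict a value to the range lower..upper so that outliers are hidden."""
    return max(lower, min(value, upper))


def anonymise(record, income_range=(5000, 100000)):
    """Return a new record without direct identifiers and with reduced detail."""
    return {
        "birth_year": birth_year(record["date_of_birth"]),
        "income": clamp(record["income"], *income_range),
        # Combine place information into a non-disclosive rural/urban variable.
        "settlement": "rural" if record["place"] in RURAL_PLACES else "urban",
    }


respondent = {
    "name": "Respondent 17",
    "date_of_birth": "1975-03-14",
    "income": 250000,
    "place": "Village A",
}
print(anonymise(respondent))
```

Building a new record from a list of permitted fields, rather than deleting fields from the original, means a newly added identifier is left out by default.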

Page 50: Research Data Management - UK Data Archive

GEO-REFERENCED DATA

Spatial references (point coordinates, small areas) may disclose position of individuals, organisations, businesses

Removing spatial references prevents disclosure, but all geographical and related information is also lost

Better

• reduce precision - replace point coordinates with larger, non-disclosing geographical areas e.g. km² area, postcode district, ward, road

• reduce precision - replace point coordinate with meaningful variable typifying the geographical position; or summary statistics of location

e.g. catchment area, poverty index, population density

• keep spatial references and impose access restrictions on data
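Reducing the precision of point coordinates can be as simple as snapping each point to the grid cell that contains it. The sketch below assumes projected coordinates in metres (for example eastings and northings on a national grid); the function name, the 1 km default cell size, and the sample coordinates are illustrative assumptions.

```python
import math


def to_grid_cell(easting, northing, cell_size=1000):
    """Replace a point coordinate (in metres) with the south-west corner of its grid cell."""
    return (
        math.floor(easting / cell_size) * cell_size,
        math.floor(northing / cell_size) * cell_size,
    )


# Hypothetical point; the result identifies a 1 km square rather than an exact location.
print(to_grid_cell(599123, 225678))  # (599000, 225000)
```

A larger `cell_size` gives stronger protection at the cost of spatial detail, so the choice depends on how densely populated the study area is.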

Page 51: Research Data Management - UK Data Archive

ANONYMISING QUALITATIVE DATA

• do not collect disclosive data unless necessary

• plan or apply editing at time of transcription

except: longitudinal studies - anonymise when data collection complete (linkages)

• avoid blanking out; use pseudonyms or replacements

• avoid over-anonymising - removing/aggregating information in text can distort data, make them unusable, unreliable or misleading

• consistency within research team and throughout project

• identify replacements, e.g. with [brackets]

• keep anonymisation log of all replacements, aggregations or removals made – keep separate from anonymised data files

• XML mark-up can be used for anonymisation

<seg type="anonymised">word to be anonymised</seg>

Page 52: Research Data Management - UK Data Archive

ANONYMISING QUALITATIVE DATA

Example: Anonymisation log interview transcripts

Interview / Page | Original | Changed to

Int1 | |

p1 | Spain | European country

p1 | E-print Ltd | Printing company

p2 | 20th June | June

p2 | Amy | Moira

Int2 | |

p1 | Francis | my friend
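An anonymisation log like the one above can also drive the replacements themselves, which helps keep them consistent across a team. The sketch below reuses the log entries from the example; the function `anonymise_text` and the sample sentences are invented for illustration, and replacements are marked with [brackets] as suggested earlier.

```python
import re

# Replacements taken from the example log: (interview, page, original, changed to).
LOG = [
    ("Int1", "p1", "Spain", "European country"),
    ("Int1", "p1", "E-print Ltd", "Printing company"),
    ("Int1", "p2", "20th June", "June"),
    ("Int1", "p2", "Amy", "Moira"),
    ("Int2", "p1", "Francis", "my friend"),
]


def anonymise_text(text, interview, log):
    """Apply the logged replacements for one interview, marking each with [brackets]."""
    for logged_interview, _page, original, replacement in log:
        if logged_interview == interview:
            # Match whole words only, so that shorter names inside longer words are left alone.
            pattern = r"\b" + re.escape(original) + r"\b"
            text = re.sub(pattern, lambda _match, r=replacement: f"[{r}]", text)
    return text


sentence = "Amy joined E-print Ltd on 20th June after moving from Spain."  # invented sentence
print(anonymise_text(sentence, "Int1", LOG))
```

The log itself still has to be stored separately from the anonymised files, since it maps every replacement back to the original.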

Page 53: Research Data Management - UK Data Archive

ACCESS CONTROLS ON DATA

• Essential when anonymisation ineffective or damaging to quality

• visual or audio data

• disclosive microdata

• UK Data Archive has gradation of access controls

• small number of studies are open (no registration)

• majority require registration

• data users sign legally binding End User Licence – e.g. agreeing not to identify any potentially identifiable individuals

• stricter regulations for certain types of data:

• Special Licences

• Approved researchers

• require data access authorisation from data owner prior to data release

• embargo for given time period

• Secure Data Service (no direct data access)

• Multiple access controls can apply to different data types within one study

Page 54: Research Data Management - UK Data Archive

COPYRIGHT

COPYRIGHT IS AN INTELLECTUAL PROPERTY RIGHT ASSIGNED AUTOMATICALLY TO THE CREATOR, THAT PREVENTS UNAUTHORISED COPYING AND PUBLISHING OF AN ORIGINAL WORK. COPYRIGHT APPLIES TO RESEARCH DATA AND PLAYS A ROLE WHEN CREATING, SHARING AND RE-USING DATA.

Page 55: Research Data Management - UK Data Archive

COPYRIGHT AND DATA SHARING

• Copyright permissions sought and granted prior to data sharing / archiving

• Clearing copyright – reach agreement with copyright holder

• Data archives publish data – they hold no copyright

• Copyright holders give permission to data archives to preserve data and make them accessible to users

• For secondary use, copyright clearance is needed before data can be reproduced

• Exception - fair dealing - for non-commercial research, private study, teaching, quotations, criticism or review; then author and source must be cited

Page 56: Research Data Management - UK Data Archive

CASE STUDY

Health and Social Consequences of the Foot and Mouth Disease Epidemic in North Cumbria, 2001-2003 (SN5407 at UK Data Archive)

Maggie Mort, Lancaster University

• funded by Department of Health

• recruit panel of 54 local people in affected area at time of foot and mouth crisis: farmers, agricultural professionals, small businesses, health professionals, vets, residents

• weekly diaries for 18 months describing how their life was affected by the crisis and the process of recovery observed around them (handwritten)

• In-depth interviews and group discussions (audio recordings, transcripts)

• at end of research – feeling by researchers that data should be archived

• how would you approach data archiving in this case?

• ethical aspects

• legal aspects

• how to engage with panel

• practical aspects

• If this was your research project, which data management aspects would be essential to consider, and when? Use the DM checklist to plan DM activities within a research cycle

Page 57: Research Data Management - UK Data Archive

CASE STUDY

Researchers' approach

• seek advice from copyright specialist re. terms of agreement for archiving

• meet with UK Data Archive, Qualidata - for advice on data archiving

• develop separate consent forms for written and audio material, with opt in / opt out and embargo option

• pilot discussion on data archiving with 4 panel members to explore:

• feelings re. data anonymisation, confidentiality, copyright, ownership

• user options of archived data - scholarly / educational purposes

• understanding of archiving by participants and information required

• discuss archiving individually with each panel member

• 7 panel members declined to have their data archived

• 40 interview and diary transcripts are archived and available for re-use by registered users

• 3 interviews and 5 diaries are embargoed until 2015

• audio files archived and only available by permission from researchers

Detailed information:

www.esds.ac.uk/doc/5407%5Cmrdoc%5Cpdf%5Cq5407userguide.pdf

Page 58: Research Data Management - UK Data Archive

CONTACT

Workshop materials:

http://data-archive.ac.uk/news-events

http://www.data-archive.ac.uk/news-events/events.aspx?id=3329

UK DATA ARCHIVE

UNIVERSITY OF ESSEX

WIVENHOE PARK

COLCHESTER

ESSEX CO4 3SQ

T: +44 (0)1206 872001

E: [email protected]

W: www.data-archive.ac.uk