RESEARCH DATA MANAGEMENT
VEERLE VAN DEN EYNDEN & TOM ENSOM
UK DATA ARCHIVE
UNIVERSITY OF ESSEX
University of Essex, Looking after your research data
24 January 2013
UK DATA ARCHIVE
• the UK Data Archive has over forty years' experience in selecting, ingesting, curating and providing access to social science data
• we have extensive experience of supporting researchers and data creators in the social sciences and related disciplines
• we support data sharing under the ESRC Data Policy (since 1995) and for the Rural Economy and Land Use programme (2004-2012)
• our best practice approaches to making data shareable are based on:
  • challenges faced by researchers in sharing data
  • handling research data, both quantitative and qualitative
  • highly skilled staff comprising researchers, technical and information specialists
www.data-archive.ac.uk
OUR MANAGING AND SHARING DATA RESOURCES
Managing and sharing guidance
• sections
• references
• training programme
www.data-archive.ac.uk/create-manage
www.data-archive.ac.uk/media/2894/managingsharing.pdf
Training resources:
• presentations
• exercises and discussions / answers
www.data-archive.ac.uk/create-manage/training-resources
OVERVIEW FOR TODAY
• Data management planning
• Storing your data, including data security, data transfer, encryption and file sharing
• Formatting and organising data
• Data confidentiality, legal and ethical issues
BENEFITS OF MANAGING AND SHARING YOUR DATA
DATA CREATED FROM RESEARCH ARE VALUABLE RESOURCES
THAT CAN BE USED AND RE-USED FOR FUTURE SCIENTIFIC AND
EDUCATIONAL PURPOSES. SHARING DATA FACILITATES NEW
SCIENTIFIC INQUIRY, AVOIDS DUPLICATE DATA COLLECTION AND
PROVIDES RICH REAL-LIFE RESOURCES FOR EDUCATION AND
TRAINING
DATA LIFECYCLE & DATA MANAGEMENT PLANNING
A DATA MANAGEMENT AND SHARING PLAN HELPS
RESEARCHERS CONSIDER, WHEN RESEARCH IS BEING
DESIGNED AND PLANNED, HOW DATA WILL BE MANAGED
DURING THE RESEARCH PROCESS AND SHARED
AFTERWARDS WITH THE WIDER RESEARCH COMMUNITY
AREAS OF COVERAGE
• Data management planning: why and how, and the research lifecycle
• Data management checklist
• Roles and responsibilities
• Costing data management
WHY DMP?
• Research funders require planning for data management and data sharing, e.g. UK Research Councils
  • which data
  • how to manage them
  • how to share, preserve and curate them
  • rights to access, use, …
  • roles & responsibilities
DCC: UK research funders' DMP expectations
• Research benefits
  • think about what to do with research data: how to collect them, how to look after them
  • keep track of research data (e.g. when staff leave)
  • identify support, resources, services needed
  • plan storage, short & long-term
  • plan security, ethical aspects
  • be prepared for data requests (FoI, funder)
DATA PLAN REQUIREMENTS
Funder | Required at application | Data topics in DMP
AHRC | Technical plan | Standards, preservation, continued access & use
BBSRC | Data management and sharing plan | Type, format, standards, sharing methods, restrictions, timeframe
CRUK | Data sharing plan | Volume, format, standards, metadata, documentation, sharing method, timescale, preservation, restrictions
DFID | Access and data management plan | Repositories, limits, timescale, responsibilities, resources, access strategy
EPSRC | Policy framework |
ESRC | Data management plan | Volume, type, quality, archiving plans, difficulties sharing, consent sharing, IPR, responsibilities
MRC | Data management plan | Collection methods, documentation, standards, preservation, curation, security, confidentiality, sharing & access, timescale, responsibilities
NERC | Outline data management plan | DM procedures, created data
STFC | Data management plan | Type, preservation, metadata, value, sharing, timescale, resources needed
Wellcome Trust | Data management and sharing plan | What data? When share? Where share? How access? Limits, how preserve? What resources?
Digital Curation Centre, Funders’ data plan requirements: www.dcc.ac.uk/resources/data-management-plans/funders-requirements
Knight, G. (2012) Funder Requirements for Data Management and Sharing. London School of Hygiene and Tropical Medicine, London. researchonline.lshtm.ac.uk/208596/
HOW
• Funder template for DMP
  • ESRC DMP requirements in data policy and DMP guidance
  • MRC DMP guidance and template
  • AHRC technical plan requirements
• DCC’s DMPonline tool
• UK Data Archive data management checklist
ROLES & RESPONSIBILITIES
Assign roles and responsibilities for data management; do not presume them
Who?
• PI
• Research staff / students - collecting, creating, processing, analysing data
• External contractors - data collection, collation, processing; e.g. transcribers
• Support staff - managing, administering research
• Essex ISS - data storage, security, back-up services
• External/institutional data centres / archives - data sharing
COSTING
• Cost data management and sharing into the research project
• Identify resources needed to make research data shareable beyond the primary research team, over and above planned standard research procedures and practices
• Resources = people, equipment, infrastructure, tools to manage, document, organise, store and provide access to data
• Early planning can reduce costs
• See our data management costing tool
FORMATTING YOUR DATA
USING STANDARD AND INTERCHANGEABLE OR OPEN
LOSSLESS DATA FORMATS ENSURES LONG-TERM
USABILITY OF DATA. HIGH QUALITY DATA ARE WELL
ORGANISED, STRUCTURED, NAMED AND VERSIONED AND
THE AUTHENTICITY OF MASTER FILES IDENTIFIED.
AREAS OF COVERAGE
• File formats
• File conversions
• Organising files and folders
• File naming
• Version control and authenticity
CAN YOU UNDERSTAND/USE THESE DATA?
SrvMthdDraft.doc
SrvMthdFinal.doc
SrvMthdLastOne.doc
SrvMthdRealVersion.doc
FILE FORMATS
Choice of software format for digital data depends on:
• planned data analyses
• software availability/cost
• hardware used – e.g. audio capture
• discipline-specific standards and customs
Digital data = software dependent
Digital data endangered by obsolescence of software/ hardware
Best formats for long-term preservation - standard formats, interchangeable formats, open formats
e.g. tab-delimited, comma-delimited (CSV), ASCII, RTF, PDF/A, OpenDocument format, SPSS portable, XML
see UK Data Archive optimal file formats for various data types
FILE FORMAT CONVERSIONS
Convert data for preservation or back-up: export, save as, scripts
Beware of conversion errors and losses:
• loss of internal metadata
e.g. convert MS Access to tab-delimited tables
• loss of editing, formatting, formulae
e.g. convert DOCX to RTF; XLSX to CSV
• truncation or loss of data
e.g. string variables lost in SPSS – Stata conversion; MS Access memo fields truncated in conversion to CSV; loss of sample labels on sequence data between annotation and analysis formats
Check for errors and changes after conversion
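Checking for errors after a conversion can be partly automated. The sketch below is only an illustration, assuming a tab-delimited source converted to CSV with Python's standard csv module; the function names tab_to_csv and conversion_differences are hypothetical and not part of any UK Data Archive tool. It compares the two files cell by cell and reports any difference.

```python
import csv
import io

def tab_to_csv(tab_text):
    """Convert tab-delimited text to comma-delimited (CSV) text."""
    rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

def conversion_differences(tab_text, csv_text):
    """List (row, column, original, converted) for every cell that changed."""
    original = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
    converted = list(csv.reader(io.StringIO(csv_text)))
    problems = []
    if len(original) != len(converted):
        problems.append(("row count", None, len(original), len(converted)))
    for r, (row_a, row_b) in enumerate(zip(original, converted)):
        if len(row_a) != len(row_b):
            problems.append((r, "column count", len(row_a), len(row_b)))
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                problems.append((r, c, a, b))
    return problems

# Illustrative data only: a value containing a comma is a typical trouble spot.
source = "id\tcomment\n1\tyes, maybe\n2\tno answer\n"
converted = tab_to_csv(source)
print(conversion_differences(source, converted))  # an empty list means no changes
```

A check like this only catches changes to cell values; losses of formatting, formulae or internal metadata still need to be reviewed by eye.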
EXAMPLE: FORMAT CONVERSION
[Figure: the same table in MS Excel (.XLSX) format and in tab-delimited text format, showing loss of annotation and formatting change]
ORGANISING DATA
Plan in advance how best to organise data
Use a logical structure and ensure collaborators understand it
Examples
• hierarchical structure of files, grouped in folders, e.g. audio, transcripts and annotated transcripts
• measurement data – original, processed, analysed etc.
• interview transcripts - individual well-named files
FILE NAMING
• file name = principal identifier of file
• use logical naming, i.e. easy to identify, locate, retrieve, access
• naming provides organisation, context & consistency
• name elements: version number, date, content description, creator name
Best practice
• name independent of location
• brief & relevant
• no special characters, dots or spaces
• for separation use underscores _
• versioning via filename: ascending, decimal version numbers
• use names to classify broad types of files
• avoid very long file names
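The naming rules above can be applied consistently with a small helper. This is a minimal sketch, not an Archive convention: the function build_file_name, the order of the name elements, the YYYYMMDD date format and the creator initials "AB" are all assumptions made for illustration.

```python
import re
from datetime import date

def build_file_name(description, creator, when, major, minor, extension):
    """Join content description, creator, date and version with underscores."""
    def clean(text):
        # Keep letters and digits only; capitalise each word so that
        # word boundaries stay readable without spaces.
        words = re.findall(r"[A-Za-z0-9]+", text)
        return "".join(w[:1].upper() + w[1:] for w in words)

    # Zero-padded version numbers keep files in ascending order when sorted.
    version = f"{major:02d}_{minor:02d}"
    stem = "_".join([clean(description), clean(creator),
                     when.strftime("%Y%m%d"), version])
    # The only dot in the name is the one before the file extension.
    return f"{stem}.{extension}"

print(build_file_name("food interview 1", "AB", date(2013, 1, 24), 1, 2, "rtf"))
# prints FoodInterview1_AB_20130124_01_02.rtf
```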
DIRECTORY STRUCTURE
QUIZ: FILE NAMING
From the UK Data Archive’s Managing and Sharing Data: Training Resources
http://data-archive.ac.uk/create-manage/training-resources/format
VERSION CONTROL
Keep track of different copies or versions of data files
Which method to use depends on:
• single site vs. across locations
• single vs. multiple users
• different versions to be stored vs. synchronised files
Best practice:
• unique identifiers for files (file names)
• record file status/versions
• record relationships between files
e.g. data file and documentation; similar data files
• keep track of file locations
e.g. laptop vs. PC
VERSION CONTROL
Single user of data files
• file naming – unique file names with date or version number (avoid spaces!)
e.g. FoodInterview_1_draft; FoodInterview_1_final; HealthTest_06-04-2008; BGHSurveyProcedures_00_04
• version control table or file history within or alongside data file
• version control facility within software
e.g. MS Word
VERSION CONTROL
Multiple users of data files
• control rights to file editing: read/write permissions
e.g. Windows Explorer
• versioning/file sharing software: check files out/in
e.g. SharePoint, CMS, Google Docs, Amazon S3
• manual merging of multiple entries/edits
Synchronise files
• software
e.g. MS SyncToy
• cloud-based
e.g. Dropbox, Google Drive
EXAMPLE: VERSION CONTROL TABLE
DEMO: SYNCHRONISING
Synchronise files between two folders using SyncToy (Windows) software
www.data-archive.ac.uk/media/375817/formattingyourdata_synchronisingexercise.pdf
QUIZ: FORMATTING YOUR DATA
From the UK Data Archive’s Managing and Sharing Data: Training Resources
http://data-archive.ac.uk/create-manage/training-resources/format
STORING YOUR DATA
LOOKING AFTER RESEARCH DATA FOR THE LONGER-TERM AND PROTECTING THEM FROM UNWANTED LOSS REQUIRES HAVING GOOD STRATEGIES IN PLACE FOR SECURELY STORING, BACKING-UP, TRANSMITTING, AND DISPOSING OF DATA. COLLABORATIVE RESEARCH BRINGS CHALLENGES FOR THE SHARED STORAGE OF, AND ACCESS TO, DATA.
AREAS OF COVERAGE
• Making back-ups
• Data storage
• Data security
• Data transmission and encryption
• File sharing and collaborative environments
• Data disposal
• Disseminating data
BACKING-UP DATA
• Why do back-ups? Risk of loss and change - would your data
survive a disaster?
• Protect against: software failure, hardware failure, malicious
attack, natural disasters
• Back-ups are additional copies that can be used to restore
originals
• It’s not backed-up unless backed-up with a strategy
BACK-UP STRATEGY
Consider
• what’s backed-up? - all, some, just the bits you change?
• where? - original copy, external local and remote copies
• what media? - CD, DVD, external hard drive, tape, etc.
• how often? – assess frequency and automate the process
• for how long is it kept?
• verify and recover - never assume, regularly test a restore
Backing-up need not be expensive
• 1 TB external drives cost around £50, with back-up software
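The "verify and recover" step can be built into even a very simple back-up routine. The following is a sketch under stated assumptions rather than a recommended tool: the function back_up and the time-stamped naming scheme are hypothetical, and the demonstration runs in a temporary folder so that no real data are touched.

```python
import filecmp
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

def back_up(source, backup_dir):
    """Copy source into backup_dir under a time-stamped name and verify the copy."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also keeps timestamps where possible
    # shallow=False forces a byte-by-byte comparison instead of
    # trusting file size and modification time.
    if not filecmp.cmp(source, target, shallow=False):
        raise IOError(f"back-up of {source} does not match the original")
    return target

# Demonstration with made-up content in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "survey.csv"
    original.write_text("id,answer\n1,yes\n")
    copy = back_up(original, Path(tmp) / "backup")
    print(copy.name, copy.read_text() == original.read_text())
```

A copy in the same folder tree is not a full strategy: the slide's advice on external local and remote copies still applies, and a routine like this would normally be scheduled rather than run by hand.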
DATA STORAGE
All digital media are fallible
File formats and physical storage media become obsolete
• optical (CD, DVD) and magnetic media (hard drive, tapes) degrade
• never assume the format will be around for ever
Best practice
• use data formats with long-term availability
• storage strategy - at least two forms of storage and locations
• maintain original copy, external local copy and external remote copy
• copy data files to new media two to five years after they were first created
• check data integrity of stored data files regularly (checksum)
• know your personal/institutional back-up strategy
• know data retention policies that apply: funder, publisher, home institution
• what to protect? not only data, and not only digital
NON-DIGITAL STORAGE
Printed materials, photographs
• degradation from sunlight and acid (sweat on skin, in paper)
• use high quality media for long-term storage/preservation
e.g. using acid-free paper & boxes, non-rust paperclips (no staples)
Confidential items, e.g. signed consent forms, interview notes
• store securely, behind lock
• separate from data files
ENCRYPTION
Always encrypt personal or sensitive data
Encrypt anything you would not send on a postcard
• for moving files e.g. transcripts
• for storing files e.g. shared areas, mobile devices
Basic principles
• use an algorithm to transform information (A=1)
• need a ‘key’ to decrypt
Free software that is easy to use
• SafeHouse
• TrueCrypt
• AxCrypt
These tools can
• encrypt hard drives, partitions, files and folders
• encrypt portable storage devices such as USB flash drives
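The two basic principles, an algorithm that transforms information and a key needed to reverse it, can be shown with a toy letter-shift cipher. This is purely a teaching sketch and must never be used to protect real data; the function shift_text and the key "ESSEX" are made up for illustration, and the tools listed above use far stronger algorithms.

```python
import string

ALPHABET = string.ascii_uppercase

def shift_text(text, key, direction):
    """Shift each letter by the matching key letter.

    direction is +1 to encrypt and -1 to decrypt. The key must consist of
    upper-case letters A-Z. Characters other than letters are left unchanged.
    """
    result = []
    position = 0
    for char in text.upper():
        if char in ALPHABET:
            offset = ALPHABET.index(key[position % len(key)])
            shifted = (ALPHABET.index(char) + direction * offset) % 26
            result.append(ALPHABET[shifted])
            position += 1
        else:
            result.append(char)
    return "".join(result)

secret = shift_text("INTERVIEW TRANSCRIPT", "ESSEX", +1)
print(secret)                             # unreadable without the key
print(shift_text(secret, "ESSEX", -1))    # the same key reverses the shift
```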
DATA DESTRUCTION
When you delete data and documentation from a hard drive, they are probably not gone
• files need to be overwritten to ensure they are irretrievably deleted:
  • BCWipe - uses ‘military-grade procedures to surgically remove all traces of any file’
  • AxCrypt
• if in doubt, physically destroy the drive using an approved secure destruction facility
• physically destroy portable media, as you would shred paper
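The idea of overwriting before deleting can be sketched in a few lines. This is an illustration of the principle only, with a hypothetical function overwrite_and_delete; it is not a substitute for the dedicated tools above. On solid-state drives, journaling or copy-on-write file systems, and anywhere back-ups or synchronised copies exist, overwriting a file in place does not guarantee that every trace is gone.

```python
import os
import tempfile

def overwrite_and_delete(path, passes=3):
    """Overwrite a file's contents with random bytes, then remove the file."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            # For very large files, write in smaller chunks instead.
            handle.write(os.urandom(size))
            handle.flush()
            os.fsync(handle.fileno())  # ask the operating system to write to the device
    os.remove(path)

# Demonstration on a throw-away temporary file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"confidential interview notes")
os.close(fd)
overwrite_and_delete(demo_path)
print(os.path.exists(demo_path))  # prints False
```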
DATA SECURITY
Protect data from unauthorised access, use, change, disclosure and destruction
Personal data need more protection – always keep separate
Control access to computers
• passwords
• anti-virus and firewall protection, power surge protection
• networked vs non-networked PCs
• all devices: desktops, laptops, memory sticks, mobile devices
• all locations: work, home, travel
• restrict access to sensitive materials e.g. consent forms, patient records
Proper disposal of equipment (and data)
• even reformatting the hard drive is not sufficient
Control physical access to buildings, rooms, cabinets
FILE SHARING & COLLABORATIVE ENVIRONMENTS
Sharing data between researchers and teams
• too often email attachments
• YouSendIt, Dropbox: consider whether these are appropriate, as services can be hosted outside the EU (a DPA issue for personal data); consider safeguards, e.g. encryption
• Virtual Research Environments
• MS SharePoint
• Sakai
• file transfer protocol (ftp)
• physical media
• Essex ZendTo
DEMO: DATA ENCRYPTION
Create an encrypted storage space using free software SafeHouse
www.data-archive.ac.uk/media/312652/storingyourdata_encryptionexercise.pdf
DEMO: DATA INTEGRITY & BACK-UP
Calculate the MD5 checksum value of a file to check its integrity, e.g. after back-up
www.data-archive.ac.uk/media/361550/storingyourdata_checksumexercise.pdf
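For those who prefer a script to a desktop tool, an MD5 checksum can also be calculated with Python's standard hashlib module. This is a minimal sketch; the function name md5_checksum and the demonstration files are assumptions for illustration. MD5 is adequate for spotting accidental corruption, but it should not be relied on to detect deliberate tampering.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def md5_checksum(path, chunk_size=65536):
    """Return the MD5 checksum of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration: a file and its back-up copy should give the same value.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "interview.txt"
    original.write_text("made-up file content")
    backup = Path(tmp) / "interview_backup.txt"
    shutil.copy2(original, backup)
    print(md5_checksum(original) == md5_checksum(backup))  # prints True
```

Storing the checksum alongside the file at the time of back-up makes it possible to repeat the comparison later as a regular integrity check.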
ETHICAL AND LEGAL ISSUES IN DATA SHARING
A COMBINATION OF GAINING CONSENT FOR DATA SHARING,
ANONYMISING AND REGULATING ACCESS TO DATA WILL
INCREASE THE POTENTIAL FOR MAKING PEOPLE-RELATED
RESEARCH DATA MORE READILY AND WIDELY
AVAILABLE
AREAS OF COVERAGE
• Legal and ethical aspects
• Informed consent for data sharing
• Anonymising data
• Controlling access to data
ETHICAL ARGUMENTS FOR ARCHIVING DATA
• store and protect data securely
• not burden over-researched, vulnerable groups
• make best use of hard-to-obtain data (e.g., elites, socially excluded, over-researched)
• extend voices of participants
• provide greater research transparency
• enable fullest ethical use of rich data
In each, ethical duties to participants,
peers and public may be present
DUTY OF CONFIDENTIALITY AND DATA SHARING
• Duty of confidentiality exists in common law and may apply to research data
• If participant consents to share data, then sharing does not breach confidentiality
• Public interest can override duty of confidentiality; best practice is to avoid vague or general promises in consent forms
DATA PROTECTION ACT, 1998
• Personal data:
  • relate to a living individual
  • the individual can be identified from those data, or from those data and other information
  • include any expression of opinion about the individual
• Requirements for handling personal data:
  • processed fairly and lawfully
  • obtained and processed for a specified purpose
  • adequate, relevant and not excessive for the purpose
  • accurate
  • not kept longer than necessary
  • processed in accordance with the rights of data subjects, e.g. right to be informed about how data will be used, stored, processed, transferred, destroyed, …; right to access info and data held
  • kept secure
  • not transferred abroad without adequate protection
• Only disclosed if consent has been given to do so (except where there is a legal duty)
DATA PROTECTION ACT & RESEARCH
• Exceptions for personal data collected as part of research:
• can be retained indefinitely (if needed)
• can be used for other purposes in some circumstances
• people should still be informed
• If data are anonymised (personal identifiers removed) then DP laws will not apply as these no longer constitute ‘personal data’
DPA is not intended to, and does not, inhibit ethical research
SENSITIVE DATA
• Data regarding an individual's race or ethnic origin, political opinion, religious beliefs, trade union membership, physical or mental health, sex life, criminal proceedings or convictions (DPA 1998)
• Can only be processed for research purposes if:
• explicit consent (ideally in writing) has been obtained; or
• medical research by a health professional or equivalent with duty of confidentiality; or
• analysis of racial/ethnic origins for purpose of equal opportunities monitoring; or
• in substantial public interest and not causing substantial damage and distress
OPTIONS FOR SHARING CONFIDENTIAL DATA
Researchers to consider
• obtaining informed consent, also for data sharing and preservation / curation
• protecting identities e.g. anonymisation, not collecting personal data
• restricting / regulating access where needed (all or part of data) e.g. by group, use, time period
• securely storing personal or sensitive data
Consider jointly and in dialogue with participants
Plan early in research
CONSENT NEEDED ACROSS THE DATA LIFE CYCLE
• Engagement in the research process
• decide who approves final versions of transcripts
• Dissemination in presentations, publications, the web
• decide who approves research outputs
• Data sharing and archiving
• consider future uses of data
Always dependent on the research context
UK Data Archive model consent form
ANONYMISATION PREVENTS IDENTITY DISCLOSURE
A person’s identity can be disclosed through:
• direct identifiers
e.g. name, address, postcode, telephone number, voice, picture
Often NOT essential research information (administrative)
• indirect identifiers – possible disclosure in combination with other information
e.g. occupation, geography, unique or exceptional values (outliers) or characteristics
KEY POINTS FOR ANONYMISING
• never disclose personal data - unless consent for disclosure
• reasonable/appropriate level of anonymity
• maintain maximum meaningful information
• where possible replace rather than remove
• identifying information may provide context, do not over-anonymise
• re-users of data have the same legal and ethical obligation to NOT disclose confidential information as primary users
ANONYMISING QUANTITATIVE DATA
• remove direct identifiers
e.g. names, address, institution, photo
• reduce the precision/detail of a variable through aggregation
e.g. birth year vs. date of birth, occupational categories, area rather than village
• generalise meaning of detailed text variable
e.g. occupational expertise
• restrict upper or lower ranges of a variable to hide outliers
e.g. income, age
• combine variables
e.g. creating non-disclosive rural/urban variable from place variables
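Several of these techniques can be applied to a record in a few lines. The sketch below is illustrative only: the field names, the income cap of 100,000 and the place names in RURAL_PLACES are all hypothetical, and suitable thresholds depend on the data and the disclosure risk.

```python
from datetime import date

RURAL_PLACES = {"Ashby", "Little Marsh"}  # hypothetical place names

def anonymise_record(record, income_cap=100_000):
    """Return a copy of a survey record with reduced disclosure risk."""
    return {
        # Direct identifiers such as the name are simply not copied across.
        # Aggregation: keep the birth year rather than the full date of birth.
        "birth_year": record["date_of_birth"].year,
        # Top-coding: values above the cap are recorded as the cap itself,
        # to be read as "income_cap or more".
        "income": min(record["income"], income_cap),
        # Combining variables: a rural/urban flag replaces the place name.
        "settlement": "rural" if record["place"] in RURAL_PLACES else "urban",
    }

example = {
    "name": "A. Respondent",
    "date_of_birth": date(1970, 5, 17),
    "income": 250_000,
    "place": "Little Marsh",
}
print(anonymise_record(example))
# prints {'birth_year': 1970, 'income': 100000, 'settlement': 'rural'}
```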
GEO-REFERENCED DATA
Spatial references (point coordinates, small areas) may disclose position of individuals, organisations, businesses
Removing spatial references prevents disclosure, but all geographical and related information is also lost
Better
• reduce precision - replace point coordinates with larger, non-disclosing geographical areas e.g. km² area, postcode district, ward, road
• reduce precision - replace point coordinate with meaningful variable typifying the geographical position; or summary statistics of location
e.g. catchment area, poverty index, population density
• keep spatial references and impose access restrictions on data
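Reducing the precision of point coordinates can be as simple as snapping each point to the corner of the grid square that contains it. This sketch assumes projected coordinates in metres (for example eastings and northings); the function to_grid_square and the coordinates used are made up for illustration, and whether a 1 km square is non-disclosing depends on how many people or organisations it contains.

```python
def to_grid_square(easting, northing, cell_size=1000):
    """Return the south-west corner of the grid square containing the point."""
    return (easting // cell_size * cell_size,
            northing // cell_size * cell_size)

# A point located to the nearest metre becomes a 1 km square reference.
print(to_grid_square(600123, 224987))         # prints (600000, 224000)
# A coarser 10 km square hides more detail.
print(to_grid_square(600123, 224987, 10000))  # prints (600000, 220000)
```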
ANONYMISING QUALITATIVE DATA
• do not collect disclosive data unless necessary
• plan or apply editing at time of transcription
except: longitudinal studies - anonymise when data collection complete (linkages)
• avoid blanking out; use pseudonyms or replacements
• avoid over-anonymising - removing/aggregating information in text can distort data, make them unusable, unreliable or misleading
• consistency within research team and throughout project
• identify replacements, e.g. with [brackets]
• keep anonymisation log of all replacements, aggregations or removals made – keep separate from anonymised data files
• xml mark-up can be used for anonymisation
<seg type="anonymised">word to be anonymised</seg>
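Replacing identifiers consistently and logging each change can be supported by a short script. This is a sketch under assumptions, not the Archive's own procedure: the function anonymise_transcript and the sample sentence are invented for illustration. Replacements are applied one after another, so a replacement text should not itself contain a term that is replaced later.

```python
import re

def anonymise_transcript(text, replacements):
    """Replace each original term with a [bracketed] replacement and log it."""
    log = []
    for original, replacement in replacements.items():
        # \b restricts matches to whole words or phrases.
        pattern = re.compile(r"\b" + re.escape(original) + r"\b")
        text, count = pattern.subn(lambda match: f"[{replacement}]", text)
        if count:
            log.append({"original": original,
                        "changed_to": replacement,
                        "occurrences": count})
    return text, log

transcript = "Amy said she moved to Spain after leaving E-print Ltd."
replacements = {
    "Amy": "Moira",
    "Spain": "European country",
    "E-print Ltd": "Printing company",
}
clean_text, log = anonymise_transcript(transcript, replacements)
print(clean_text)
# prints [Moira] said she moved to [European country] after leaving [Printing company].
for entry in log:
    print(entry)
```

The log holds the original identifying terms, so it should be stored separately from the anonymised files and protected as confidential material.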
ANONYMISING QUALITATIVE DATA
Example: Anonymisation log for interview transcripts
Interview / Page | Original | Changed to
Int1
p1 | Spain | European country
p1 | E-print Ltd | Printing company
p2 | 20th June | June
p2 | Amy | Moira
Int2
p1 | Francis | my friend
ACCESS CONTROLS ON DATA
• Essential when anonymisation is ineffective or damaging to data quality
  • visual or audio data
  • disclosive microdata
• UK Data Archive has a gradation of access controls
  • a small number of studies are open (no registration)
  • the majority require registration
  • data users sign a legally binding End User Licence, e.g. agreeing not to identify any potentially identifiable individuals
  • stricter regulations for certain types of data:
    • Special Licences
    • Approved researchers
    • require data access authorisation from data owner prior to data release
    • embargo for given time period
    • Secure Data Service (no direct data access)
• Multiple access controls can apply to different data types within one study
COPYRIGHT
COPYRIGHT IS AN INTELLECTUAL PROPERTY RIGHT ASSIGNED AUTOMATICALLY TO THE CREATOR, THAT PREVENTS UNAUTHORISED COPYING AND PUBLISHING OF AN ORIGINAL WORK. COPYRIGHT APPLIES TO RESEARCH DATA AND PLAYS A ROLE WHEN CREATING, SHARING AND RE-USING DATA.
COPYRIGHT AND DATA SHARING
• Copyright permissions sought and granted prior to data sharing / archiving
• Clearing copyright – reach agreement with copyright holder
• Data archives publish data – they hold no copyright
• Copyright holders give permission to data archives to preserve data and make them accessible to users
• For secondary use, copyright clearance before data can be reproduced
• Exception - fair dealing - for non-commercial research, private study, teaching, quotations, criticism or review; then author and source must be cited
CASE STUDY
Health and Social Consequences of the Foot and Mouth Disease Epidemic in North Cumbria, 2001-2003 (SN5407 at UK Data Archive)
Maggie Mort, Lancaster University
• funded by Department of Health
• recruited panel of 54 local people in affected area at time of FM crisis: farmers, agricultural professionals, small businesses, health professionals, vets, residents
• weekly diaries for 18 months describing how their lives were affected by the crisis and the process of recovery observed around them (handwritten)
• in-depth interviews and group discussions (audio recordings, transcripts)
• at end of research: feeling among researchers that data should be archived
• how would you approach data archiving in this case?
  • ethical aspects
  • legal aspects
  • how to engage with panel
  • practical aspects
• if this was your research project, which data management aspects would be essential to consider, and when? Use the DM checklist to plan DM activities within a research cycle
CASE STUDY
Researchers' approach
• seek advice from copyright specialist re. terms of agreement for archiving
• meet with UK Data Archive, Qualidata - advice on data archiving
• develop separate consent forms for written and audio material, with opt in / opt out and embargo option
• pilot discussion on data archiving with 4 panel members to explore:
  • feelings re. data anonymisation, confidentiality, copyright, ownership
  • user options of archived data - scholarly / educational purposes
  • understanding of archiving by participants and information required
• discuss archiving individually with each panel member
• 7 panel members declined archiving their data
• 40 interview and diary transcripts are archived and available for re-use by registered users
• 3 interviews and 5 diaries are embargoed until 2015
• audio files archived and only available by permission from researchers
Detailed information:
www.esds.ac.uk/doc/5407%5Cmrdoc%5Cpdf%5Cq5407userguide.pdf
CONTACT
Workshop materials:
http://data-archive.ac.uk/news-events
http://www.data-archive.ac.uk/news-events/events.aspx?id=3329
UK DATA ARCHIVE
UNIVERSITY OF ESSEX
WIVENHOE PARK
COLCHESTER
ESSEX CO4 3SQ
T: +44 (0)1206 872001
W: www.data-archive.ac.uk