growing open data: making the sharing of xxl-sized research data files online a reality, using...

GROWING OPEN DATA: MAKING THE SHARING OF XXL-SIZED RESEARCH DATA FILES ONLINE A REALITY, USING EDINBURGH DATASHARE PAULINE WARD: [email protected] @PAULINEDATAWARD GEORGE HAMILTON

Upload: pauline-ward

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Growing Open Data: Making the sharing of XXL-sized research data files online a reality, using Edinburgh DataShare

GROWING OPEN DATA: MAKING THE SHARING OF XXL-SIZED RESEARCH DATA FILES ONLINE A REALITY, USING EDINBURGH DATASHARE
PAULINE WARD: [email protected] @PAULINEDATAWARD
GEORGE HAMILTON

Page 2

THE CHALLENGE

• Researchers are generating bigger files. At the University of Edinburgh, all researchers are entitled to 500 GB of storage.

Page 3

THE CHALLENGE

• Researchers need to be able to share their data online.
  • For impact.
  • For discoverability.
  • For reproducibility.
  • For compliance.

Page 4

THE CHALLENGE

• DataShare is the Institutional Repository for research data for staff and students at the University of Edinburgh: datashare.is.ed.ac.uk.
• Previous file size limit of 2.1 GB.
• Largest file we’ve been asked to share: 20 GB – split into smaller files.
• Largest fileset we’ve been asked to share: 226 GB – split into smaller filesets.
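
As a rough worked example of what the old limit meant in practice, the sketch below computes the minimum number of parts a file must be split into so that each part fits under a size limit. It assumes decimal units (1 GB = 1000 MB) and parts of equal maximum size; the function name `parts_needed` is hypothetical and is not part of DataShare or DSpace.

```python
def parts_needed(total_mb: int, limit_mb: int) -> int:
    """Return the smallest number of parts, each at most limit_mb in size.

    Sizes are whole megabytes so the ceiling division stays exact.
    """
    if total_mb < 0 or limit_mb <= 0:
        raise ValueError("total_mb must be >= 0 and limit_mb must be > 0")
    # Integer ceiling division: round up without using floats.
    return (total_mb + limit_mb - 1) // limit_mb


# The 20 GB file from the slide against the previous 2.1 GB limit
# (assumption: 1 GB = 1000 MB).
print(parts_needed(20_000, 2_100))
```

Under these assumptions, the 20 GB file needs at least ten parts, which the depositor has to create and the downloader has to reassemble.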

Page 5

THE CHALLENGE

• Some files had to be imported via a time-consuming batch import process because they were too big or too numerous for web deposit.
• Some files are still waiting to be shared because they are too big for users to download conveniently.
• These files are generated by a wide range of disciplines and a wide range of methods.

Page 6

THE SOLUTION

• Getting the files from the depositors: address upload.
• Allowing users to get the files: address download.

Page 7

THE SOLUTION: UPLOAD

• HTML5 resumable upload

Page 8

THE SOLUTION: UPLOAD

• EDINA’s code for implementing HTML5 upload in DSpace is on GitHub: https://github.com/edina/DSpace/tree/xml-html5-upload
• Uses resumable.js.
• This was the XMLUI re-write of functionality that was available for DSpace 5.0 JSPUI. See https://jira.duraspace.org/browse/DS-1562 for further details.
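
To illustrate the general idea behind a resumable upload, here is a minimal in-memory sketch of the bookkeeping a server needs: record which numbered chunks have arrived, report the ones still missing after an interruption, and join them once the set is complete. This is not EDINA's DSpace code and it does not reproduce resumable.js's exact chunking rules; the class `ResumableUpload`, its methods, and the fixed-size chunking with chunks numbered from 1 are all assumptions made for illustration.

```python
import math


class ResumableUpload:
    """Tracks the chunks of one file as they arrive (illustrative sketch)."""

    def __init__(self, total_size: int, chunk_size: int):
        if total_size < 0 or chunk_size <= 0:
            raise ValueError("total_size must be >= 0 and chunk_size > 0")
        # Assumption: fixed-size chunks, with a shorter final chunk.
        self.total_chunks = max(1, math.ceil(total_size / chunk_size))
        self.chunks = {}  # chunk number (1-based) -> bytes received

    def has_chunk(self, number: int) -> bool:
        """Lets a client skip chunks that survived an interrupted session."""
        return number in self.chunks

    def add_chunk(self, number: int, data: bytes) -> None:
        if not 1 <= number <= self.total_chunks:
            raise ValueError(f"chunk {number} is out of range")
        self.chunks[number] = data

    def missing(self) -> list:
        """Chunk numbers the client still has to send."""
        return [n for n in range(1, self.total_chunks + 1)
                if n not in self.chunks]

    def assemble(self) -> bytes:
        """Join the chunks in order once every one has arrived."""
        if self.missing():
            raise ValueError(f"missing chunks: {self.missing()}")
        return b"".join(self.chunks[n]
                        for n in range(1, self.total_chunks + 1))


# An upload interrupted after chunks 1 and 3, then resumed with chunk 2.
upload = ResumableUpload(total_size=10, chunk_size=4)
upload.add_chunk(1, b"abcd")
upload.add_chunk(3, b"ij")
print(upload.missing())
upload.add_chunk(2, b"efgh")
print(upload.assemble())
```

A real server would write chunks to disk rather than hold them in memory, which matters at the multi-gigabyte sizes discussed here.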

Page 9

Page 10

Page 11

THE SOLUTION: UPLOAD

• Testing shows files up to 15 GB upload successfully.
• (cf. figshare 5 GB file size limit, Zenodo 2 GB)
• A 20 GB file upload has been done in testing, but it generates an error message in the browser, and the user must find and Resume the submission from the Submissions page.
• Multiple files can be uploaded by drag’n’drop.

Page 12

THE SOLUTION: DOWNLOAD

We wanted a mechanism, which DSpace doesn’t provide, of zipping up files for download.
• BitTorrent was one possible approach: could be added at a later date.
• Other approaches possible (Rsync, Secure Copy (SCP)).

Page 13

THE SOLUTION: DOWNLOAD

• FTP download: agreed.
• Tried and tested technology that we are confident we can put in place and will work well.
• All files will be accessed from the FTP server anonymously.
• Users can still download files via browser via FTP.
• Users who wish can use an FTP client, allowing them to resume a download.
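
The resume behaviour an FTP client offers can be sketched with Python's standard `ftplib`: the client checks how many bytes it already has locally and asks the server to restart the transfer from that offset. This is a minimal sketch, assuming an anonymous FTP server that supports restarting transfers; the host and paths in the usage comment are hypothetical placeholders, not DataShare's actual server details.

```python
import os
from ftplib import FTP


def resume_offset(local_path: str) -> int:
    """Bytes already on disk, i.e. where the transfer should restart."""
    return os.path.getsize(local_path) if os.path.exists(local_path) else 0


def download_with_resume(host: str, remote_path: str, local_path: str) -> None:
    """Fetch remote_path, continuing a partial local file if one exists."""
    offset = resume_offset(local_path)
    with FTP(host) as ftp:
        ftp.login()  # no credentials given: anonymous login
        # Append mode keeps the bytes from the earlier, interrupted attempt.
        with open(local_path, "ab") as out:
            # rest=offset asks the server to start sending at that byte;
            # None requests the whole file from the beginning.
            ftp.retrbinary(f"RETR {remote_path}", out.write,
                           rest=offset or None)


# Hypothetical usage (placeholder host and paths):
# download_with_resume("ftp.example.org", "/pub/fileset.zip", "fileset.zip")
```

Running the same call again after a dropped connection continues from the end of the partial file instead of starting over.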

Page 14

THE SOLUTION: DOWNLOAD

• Specification:
  • All files will still be required to have appropriate metadata stored in DSpace.
  • All filesets will now be downloadable as a zip file (previous 5.2 GB limit).
  • Move the DSpace assetstore to a location where more storage is available.
  • Statistics (i.e. numbers) of file downloads by SFTP will be added to DSpace statistics.

Page 15

THE SOLUTION: DOWNLOAD

• This is a replacement for our current on-the-fly zip file creation of Item bitstreams.
• It will mitigate potential performance issues because it will use fewer server resources (Java threads and RAM).
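
One way to avoid on-the-fly zipping is to build each fileset's archive once, ahead of time, and then serve it as an ordinary static file. The sketch below shows that idea with Python's standard `zipfile` module. It assumes the archive is built in advance, which the slides do not state explicitly, and the function name `build_fileset_zip` is hypothetical rather than part of DSpace.

```python
import os
import zipfile


def build_fileset_zip(file_paths, zip_path):
    """Write all files into one zip archive, once, ahead of any download."""
    with zipfile.ZipFile(zip_path, "w",
                         compression=zipfile.ZIP_STORED,  # store, no CPU cost
                         allowZip64=True) as archive:     # archives over 4 GB
        for path in file_paths:
            # Store each file under its base name; this sketch does not
            # handle two files that share the same base name.
            archive.write(path, arcname=os.path.basename(path))
    return zip_path
```

Because the archive already exists when a download request arrives, the web application does not have to hold a thread and memory for the duration of each large transfer.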

Page 16

SUMMARY

• We have implemented HTML5 upload in the DataShare (DSpace) web interface to allow depositors to easily and quickly deposit individual files up to 15 GB.
• We are working on integrating an SFTP server to allow users to retrieve filesets larger than our current 20 GB limit. Storage, rather than network/browser timeout, will become the limiting factor on fileset size. We anticipate making numerous filesets around 100 GB available in this way in the medium term.