Coping with the Long Tail of Data Variety (EDF 2014)


Upload: andre-freitas

Post on 10-May-2015

Category: Science


DESCRIPTION

The talk will discuss current challenges, approaches and future directions for coping with data variety. The discussion will be grounded in exemplar use cases from leaders in industry and in large-scale scientific projects, such as IBM Watson, CrowdFlower, BBC, Press Association, Protein Data Bank, Data.gov.uk and ChemSpider, among others. The use cases were collected in interviews with Big Data industry and academic experts in the context of the BIG Project. They provide a glimpse of the state-of-the-art techniques currently used to cope with data variety, and of the future directions and emerging trends for this field.

TRANSCRIPT

Page 1: Coping with the Long Tail of Data Variety (EDF 2014)

Data curation is enabling more complete and high quality data-driven models for knowledge organisations.

eScience projects are the key innovators while Biomedical and Media companies are the early adopters.

Pre-competitive economic models can support the creation of curation infrastructures.

Curation at scale requires blending of automated curation platforms with large numbers of data curators.

Improvement of human-data interaction is needed.

Standards and models needed to reduce data curation effort.

Interviews with domain experts, sector case studies and literature analysis.

Focus on [...], and [...].

Five main categories of analysis: [...]

Figure: The long tail of data variety and data curation scalability.

Provide a [...] for the future of data curation.

Distributed data generation.

Data quality issues.

Increasing data variety and volume.

Data curation activities as a fundamental process for coping with the [...].

Project co-funded by the European Commission within the 7th Framework Program (Grant Agreement No. 318062).