Sustainable Preservation of Linked Data Vassilis Christophides

Post on 17-Dec-2015

Page 1

Sustainable Preservation of Linked Data

Vassilis Christophides

Page 2

Linked Data vs Cultural Artifacts!

• Linked datasets are digitally born objects that are designed to be copied, rely on vocabularies and integrity constraints (understandable by both people and programs), and have data and structures that change over time

Page 3

Digital Object vs Data Preservation

Source: Preserving Our Digital Heritage: The National Digital Information Infrastructure and Preservation Program 2010 Report. A Collaborative Initiative of the Library of Congress

Page 4

Frame Linked Data Preservation as a Sustainable Economic Activity

• Economic activity: deliberate allocation of resources
– Cost of losing datasets

• Sustainable: ongoing resource allocation over long periods of time
– Involved data subjects

• Articulate the problem/provide recommendations & guidelines
– Economic and societal benefits

(Diagram: Technical, Social, Economic)

Blue Ribbon Task Force on Sustainable Digital Preservation and Access, Final report 2010

Page 5

Sustainability Conditions

• Who benefits from use of the preserved data?

• Who selects what data to preserve?

• Who owns the data?

• Who preserves the data?

• Who pays for both the data and the preservation services?

• recognition of the benefits of preservation by decision makers

• selection of datasets with long-term value

• incentives for decision makers to act in the public interest or to elaborate new business models

• appropriate governance of preservation activities

• ongoing and efficient allocation of resources to preservation

• timely actions to ensure long-term data access and usability

Page 6

The Scientific Data Life Cycle

• Data Life Cycle Labs: A New Concept to Support Data-Intensive Science

Page 7

(Diagram, Philip Lord, 2003: the research, publication and curation processes around the scientist. Labels include primary, secondary (derived) and tertiary data for publication; primary, secondary and tertiary publications; peer review, e-prints and publication archives; library, peers, public and industry; web content and patent data; research based on data; and curator, metadata, archived data and data repositories.)

Page 8

Data-as-a-Service (DaaS) Pricing Models

• By far the most common case is a fixed price for the entire dataset (CustomLists, Infochimps) or a fixed number of transactions per month based on client subscriptions (Azure DataMarket, Infochimps API)

• DaaS pricing models are based on tiered data access and fall into three categories:
– Volume-based model: 1) quantity-based pricing and 2) pay per call (a "call" is a single request/response interaction with the API for data)
– Data type-based model: for example, a mapping API that offers the geo-coordinates and zip codes of the neighbourhoods in an urban area, while additional attributes such as school or post office locations are sold for an additional charge
– Hybrid pricing models combine value with volume charges to create finer-grained pricing that better meets the needs of both buyers and sellers

• Existing pricing models essentially favour big customers, who can typically afford to purchase the entire datasets they need; small customers, however, often need only a few data items and cannot afford to pay the full price
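The Python sketch below makes the pricing models above concrete. It is a minimal illustration under stated assumptions and does not describe any provider's actual pricing. Every price, tier boundary, attribute name (such as `school_locations`) and function name is hypothetical, and all amounts are in integer cents. The slide does not say how quantity tiers are applied, so the sketch assumes graduated tiers, where each record is charged at the rate of the tier it falls into. Likewise, `hybrid` is only one possible way to combine a value charge with a volume charge.

```python
"""Toy comparison of DaaS pricing models (hypothetical numbers throughout).

Amounts are integer cents so the arithmetic is exact.
"""

# Whole-dataset model: one fixed price, however few items the buyer needs.
FULL_DATASET_PRICE = 50_000  # hypothetical: 500.00

# Volume-based model (1): quantity-based pricing with graduated tiers.
# Each tier is (upper bound on records, cents per record in that tier);
# None marks the open-ended last tier.
QUANTITY_TIERS = [(1_000, 5), (10_000, 3), (None, 1)]

# Volume-based model (2): pay per call, i.e. per request/response.
PRICE_PER_CALL = 2  # hypothetical cents per API call

# Data type-based model, after the mapping-API example: base attributes are
# covered by the base price, extra attributes are sold as add-ons.
BASE_ATTRIBUTES = {"geo_coordinates", "zip_code"}
BASE_PRICE = 5_000
ADDON_PRICES = {"school_locations": 2_000, "post_office_locations": 1_500}


def quantity_based(records: int) -> int:
    """Charge each record at the rate of the tier it falls into."""
    cost, lower = 0, 0
    for upper, rate in QUANTITY_TIERS:
        if upper is None or records <= upper:
            return cost + (records - lower) * rate
        cost += (upper - lower) * rate
        lower = upper
    return cost  # not reached while the last tier is open-ended


def pay_per_call(calls: int) -> int:
    """Flat charge for every request/response interaction."""
    return calls * PRICE_PER_CALL


def data_type_based(attributes: set[str]) -> int:
    """Base price plus a surcharge for each add-on attribute requested."""
    unknown = attributes - BASE_ATTRIBUTES - set(ADDON_PRICES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return BASE_PRICE + sum(ADDON_PRICES[a] for a in attributes if a in ADDON_PRICES)


def hybrid(attributes: set[str], calls: int) -> int:
    """One possible hybrid: a value charge (attributes) plus a volume charge (calls)."""
    return data_type_based(attributes) + pay_per_call(calls)


if __name__ == "__main__":
    needed = 20  # a small customer who needs only 20 records
    print("whole dataset:", FULL_DATASET_PRICE, "cents")
    print("quantity-based:", quantity_based(needed), "cents")
    print("cost per record, whole dataset:", FULL_DATASET_PRICE // needed, "cents")
```

Consider a customer who needs only 20 records. Under these made-up figures, that customer would pay 100 cents with the quantity-based tiers but 50,000 cents for the whole dataset. That works out to 2,500 cents per record actually used, which is the imbalance the last bullet points to.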