
Data on the Web Life Cycle

Bernadette Farias Lóscio
bfl@cin.ufpe.br

March, 2014


Outline

• Definition of data on the Web
• Data on the Web life cycle
  – Spiral model
  – Overview
    • Data collection
    • Data generation
    • Data distribution
    • Data usage
• Data on the Web life cycle + best practices
  – Examples of best practices


Data on the Web

Data from diverse domains (e.g., government data, cultural heritage, scientific data, cross-domain data) available on the Web in a machine-processable format.


Data on the Web Life Cycle

A set of tasks or activities that take place during the process of publishing and using data on the Web.

The process may pass through some number of iterations and may be represented using a spiral model.


Data on the Web Life Cycle

[Figure: Data on the Web life cycle. Author: Bernadette Lóscio]


An overview of the DATA ON THE WEB LIFE CYCLE


Data on the Web Life Cycle

• Data collection
  – Source selection: identification of data sources that may offer relevant data (e.g., relational databases, XML files, Excel documents)


Data on the Web Life Cycle

• Data Generation (1st iteration)
  – Dataset project
    • Define the schema of the target dataset (structural metadata)
    • Choose standard vocabularies
      – Data (e.g., FOAF, DC, SKOS, Data Cube)
      – Dataset (e.g., DCAT, PROV, VoID, Data Quality Vocab)
      – Data Catalog (e.g., DCAT)
    • Choose data formats (machine-processable data)
    • Create new vocabularies
    • …
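
As a rough illustration of the dataset project step, the sketch below records a target schema and maps each field to a term from a standard vocabulary. The field names and the `TARGET_SCHEMA` and `validate_record` names are hypothetical and chosen only for this example; the FOAF and Dublin Core term URIs are the published ones.

```python
# Namespaces of two standard vocabularies (FOAF and Dublin Core terms).
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"

# Hypothetical target schema: each field is tied to a vocabulary term.
TARGET_SCHEMA = {
    "name": {"term": FOAF + "name", "type": str, "required": True},
    "homepage": {"term": FOAF + "homepage", "type": str, "required": False},
    "created": {"term": DCT + "created", "type": str, "required": False},
}


def validate_record(record, schema=TARGET_SCHEMA):
    """Return a list of problems found when checking a record against the schema."""
    problems = []
    for field, spec in schema.items():
        if field not in record:
            if spec["required"]:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], spec["type"]):
            problems.append(f"wrong type for field: {field}")
    for field in record:
        if field not in schema:
            problems.append(f"unknown field: {field}")
    return problems
```

Writing the schema down as data, rather than leaving it implicit in the loading code, lets later steps reuse the same decisions.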


Data on the Web Life Cycle

• Data Generation (2nd iteration)
  – ETL process (Extract, Transform and Load)
    • Extract data from the selected data sources, transform the data according to the decisions made during the dataset project, and load the data into the target dataset
  – Metadata generation
    • Produce (manually or automatically) structured metadata according to the metadata standards defined during the dataset project
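
The ETL step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the CSV content, the column names, the `COLUMN_TO_TERM` mapping and the `http://example.org/` base URI are all assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical source: a CSV export from one of the selected data sources.
SOURCE_CSV = "id,full_name,site\n1,Ana Silva,http://example.org/ana\n2,Rui Costa,\n"

# Hypothetical mapping from source columns to vocabulary terms (FOAF).
COLUMN_TO_TERM = {
    "full_name": "http://xmlns.com/foaf/0.1/name",
    "site": "http://xmlns.com/foaf/0.1/homepage",
}


def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, base_uri="http://example.org/person/"):
    """Transform: rename columns to vocabulary terms and drop empty values."""
    records = []
    for row in rows:
        record = {"uri": base_uri + row["id"]}
        for column, term in COLUMN_TO_TERM.items():
            if row.get(column):
                record[term] = row[column]
        records.append(record)
    return records


def load(records):
    """Load: serialize the target dataset as JSON, a machine-processable format."""
    return json.dumps(records, indent=2, sort_keys=True)


dataset = transform(extract(SOURCE_CSV))
serialized = load(dataset)
```

Keeping the three stages as separate functions means a different source (for example a relational database) only requires a new `extract`.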


Data on the Web Life Cycle

• Data Distribution (1st iteration)
  – URIs project
    • Design URIs that will persist and will continue to mean the same thing in the long term
  – Choose one or more solutions for data publishing
    • data catalogue, API, SPARQL endpoint, dataset dump, …
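
One possible way to approach the URI design step is to generate URIs from a fixed pattern that leaves out details likely to change, such as file extensions or technology names. The sketch below assumes a hypothetical publisher domain (`http://example.org`) and a `kind/identifier` pattern; neither comes from the slides.

```python
import re
import unicodedata

BASE = "http://example.org"  # hypothetical publisher domain


def slugify(label):
    """Turn a human-readable label into a lowercase ASCII slug."""
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def mint_uri(kind, identifier):
    """Build a URI from a resource kind and a stable identifier.

    The pattern deliberately carries no file extension or technology
    name, since those tend to change over time.
    """
    return f"{BASE}/{slugify(kind)}/{slugify(identifier)}"
```

Because the function is deterministic, the same kind and identifier always yield the same URI, which is a precondition for the URI continuing to mean the same thing.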


Data on the Web Life Cycle

• Data Distribution (2nd iteration)
  – Publish data and metadata
    • Make data and metadata available on the Web
• Data Distribution (3rd iteration)
  – Update data
    • Make a new version of the dataset available on the Web
  – Update metadata
    • Make a new version of the metadata available on the Web
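
To make the publish and update steps concrete, the sketch below describes a dataset with DCAT and Dublin Core terms and derives the metadata for a new version. The dataset, its URLs, the dates and the `new_version` helper are hypothetical; the use of `owl:versionInfo` for the version number is one possible choice, not something the slides prescribe.

```python
import copy

# Hypothetical dataset description using DCAT and Dublin Core terms.
metadata_v1 = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "owl": "http://www.w3.org/2002/07/owl#",
    },
    "@id": "http://example.org/dataset/people",
    "@type": "dcat:Dataset",
    "dct:title": "People",
    "dct:issued": "2014-03-01",
    "dct:modified": "2014-03-01",
    "owl:versionInfo": "1",
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": "http://example.org/dataset/people/1/people.json"},
        }
    ],
}


def new_version(metadata, modified, download_url):
    """Return metadata for a new dataset version; the input is left unchanged."""
    updated = copy.deepcopy(metadata)
    updated["owl:versionInfo"] = str(int(metadata["owl:versionInfo"]) + 1)
    updated["dct:modified"] = modified
    updated["dcat:distribution"][0]["dcat:downloadURL"] = {"@id": download_url}
    return updated


metadata_v2 = new_version(
    metadata_v1, "2014-04-01", "http://example.org/dataset/people/2/people.json"
)
```

The dataset identifier and issue date stay the same across versions; only the modification date, version number and download location change.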


Data on the Web Life Cycle

• Data usage
  – Explore data
    • Bring important aspects of the data into focus for further analysis
  – Analyze data
    • Develop applications, build visualizations, …
  – Give feedback
    • Provide useful information about the dataset (e.g., dataset relevance, data quality, …)
    • Provide data usage descriptions
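
A very simple form of data exploration is to profile how completely each field is filled in, which can also feed into quality feedback for the publisher. The sketch below is only an assumption about what such a step might look like; the `profile` function and the sample records are hypothetical.

```python
from collections import Counter


def profile(records):
    """Count, for each field, how many records carry a non-empty value."""
    counts = Counter()
    for record in records:
        for field, value in record.items():
            if value not in (None, ""):
                counts[field] += 1
    return dict(counts)


# Hypothetical records obtained from a published dataset.
sample = [
    {"name": "Ana Silva", "site": ""},
    {"name": "Rui Costa", "site": "http://example.org/rui"},
]
summary = profile(sample)
```

A field that is filled in for only a small share of the records is a natural candidate for a data quality comment back to the publisher.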


An overview of the DATA ON THE WEB LIFE CYCLE + BEST PRACTICES


Data on the Web Best Practices

• Best practices may be applied during the whole process of publishing and using data on the Web.

• Best practices may be defined according to the activities performed in each one of the quadrants (or tasks).


Data on the Web Life Cycle + Best Practices

[Figure: Data on the Web life cycle + best practices. Author: Bernadette Lóscio]


Examples of Best Practices

• Data collection
  – Best practices:
    • Have a catalogue to describe potential data sources, i.e., data sources that could provide data to be published on the Web
    • …
• Data Generation
  – Best practices:
    • Document the process of data generation
    • Use standard vocabularies to describe data
    • Use standard vocabularies to describe datasets and data catalogues (e.g., DCAT)
    • Provide stable URIs
    • Provide data in machine-processable formats
    • Provide metadata to describe data
    • …


Examples of Best Practices

• Data Distribution
  – Use standard ways to distribute data (e.g., data catalogues and APIs)
  – Provide details about data access
  – Provide details about the data licence
  – Provide details about dataset provenance and quality
  – Provide a schedule of dataset updates
  – Keep a dataset history
  – Provide ways to collect data consumers' feedback
  – Announce the publication of new datasets or new versions of existing datasets
  – …
• Data usage
  – Provide feedback about datasets
  – Provide descriptions of the usage of the dataset
  – …
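
Several of the distribution practices above can be checked mechanically against a dataset's metadata. The sketch below is one hypothetical way to do so: the field names in `PRACTICE_FIELDS` are invented for this example and do not come from any standard vocabulary.

```python
# Hypothetical mapping from metadata fields to the best practice they support.
PRACTICE_FIELDS = {
    "access": "Provide details about data access",
    "licence": "Provide details about the data licence",
    "provenance": "Provide details about dataset provenance and quality",
    "update_schedule": "Provide a schedule of dataset updates",
    "history": "Keep a dataset history",
    "feedback_channel": "Provide ways to collect data consumers' feedback",
}


def missing_practices(metadata):
    """List the best practices whose supporting field is absent or empty."""
    return [
        practice
        for field, practice in PRACTICE_FIELDS.items()
        if not metadata.get(field)
    ]


# Hypothetical metadata record for a published dataset.
example = {
    "access": "http://example.org/api",
    "licence": "CC-BY-4.0",
    "provenance": "Exported from the city registry",
    "update_schedule": "monthly",
    "history": "",
}
gaps = missing_practices(example)
```

A publisher could run such a check before each release and treat a non-empty result as a reminder rather than a hard failure.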


Data on the Web Best Practices

• For each best practice, guidance on how to implement it must be provided

• Some best practices may be implemented in more than one way

(to be continued)