Linked Data Generation Process
Post on 28-Jul-2015
LD4SC Summer School 7th - 12th June, Cercedilla, Spain
1st Summer School on Smart Cities and Linked Open Data (LD4SC-15)
Linked Data Generation Process
Raúl García-Castro, Filip Radulovic, Oscar Corcho, María Poveda, Víctor Rodríguez-Doncel, Asunción Gómez-Pérez, Daniel Vila-Suero
Presenter: Raúl García-Castro
Index
• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description
Data in smart cities
http://br.fiberhomegroup.com/pt/Enterprise/324/2282.aspx
Open data… for what?
• For example, (re)using open transport data
  – Provide travel information to persons
  – Allow better multimodal route planning
  – Facilitate public transport management
  – …
  – Accessibility
    • Which metro accesses are accessible for wheelchair users?
    • In which bus stops is it safer and more convenient for a wheelchair user to wait?
    • Is there any accessible parking space nearby a bus stop?
    • etc.
Legal framework and open data initiatives
• Aarhus Convention (1998)
  – Right to participation and access; 41 countries and the EU
• Open Access Initiative (2001)
  – Scientific information on the Web; > 510 organisations
• PSI Directive
  – PSI Reuse (2003/98/EC)
• Convention for the access to official documents (2009)
  – Signed by 12 countries
  – Belgium, Finland, Norway, Sweden, Hungary, Estonia, Lithuania, Slovenia, Georgia, Montenegro, Serbia and Macedonia
• Law 37/2007. PSI Reuse
• Law 11/2007. Citizen access to public services and right to the quality of services
• RD 4/2010 National Interoperability Scheme
  – Open standards
  – Technology neutral
  – Open source software
• RD 1495/2011. It develops Law 37/2007
• Norma Técnica de Interoperabilidad (19/02/2013, BOE 4/3/2013)
Adapted from Antonio Rodríguez Pascual (IGN)
The problem: lack of interoperability
[Diagram: several providers "Publish" data and consumers "Extract" it, with the following statements]
• "I want to publish data in an interoperable structure and format"
• "I use GTFS"
• "I use my own CSV structure"
• "I provide a web service"
• "Build an app that is available all over the world"
Scenario: open transport data
Is there any open transport data already?
We are surrounded by them
Open data and how they are published
1) In notice boards
  – For those who have a lot of free time
  – Or those who are there at the right moment in time
Adapted from Antonio Rodríguez Pascual (IGN)
[Figure label: DATA]
Open data and how they are published
2) In web pages and mobile apps
  – For people
Adapted from Antonio Rodríguez Pascual (IGN)
[Figure labels: DATA; On the Web, open license; Machine-readable; Non-proprietary format]
Open data and how they are published
3) As web files
  – So that they can be loaded by humans in their information systems (XML, HTML, CSV, etc.)
  – Hopefully it is not a scanned PDF
Adapted from Antonio Rodríguez Pascual (IGN)
[Figure labels: DATA; On the Web, open license; Machine-readable; Non-proprietary format]
Open data and how they are published
4) Via web services
  – For humans and machines
  – It allows generating added-value services
  – And can be integrated in the application business logic
Adapted from Antonio Rodríguez Pascual (IGN)
[Figure labels: DATA; On the Web, open license; Machine-readable; Non-proprietary format]
What is open data?
• Open data are data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and share alike.
• The most important aspects to consider:
  – Availability and Access: data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. Data must also be available in a convenient and modifiable form.
  – Reuse and Redistribution: data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets.
  – Universal Participation: everyone must be able to use, reuse and redistribute; there should be no discrimination against fields of endeavour or against persons or groups. For example, "non-commercial" or "only in education" restrictions are not allowed.
Source: Open Data Handbook
Scenario: open transport data
Is there any open transport data already?
Can we do it better?
Going into 4 and 5 stars: Linked Data
★ Make your stuff available on the Web (whatever format) under an open license
★★ Make it available as structured data (e.g., Excel instead of an image scan of a table)
★★★ Use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ Use URIs to identify things, so that people can point at your stuff
★★★★★ Link your data to other data to provide context
Use URIs + RDF (RDF standards)
Source data, each converted to RDF:
• API: José; Mobility impairment; Boardgames
• CSV: Mirasierra; Ventisquero de la Condesa; Yes
• CSV: Mega Games; Ventisquero de la Condesa; Yes
• HTML: Mega Games; Conquer & Smash!; MG; 29,95
Resulting RDF statements:
• José hasImpairment Mobility Impairment
• Mobility Impairment requires WheelchairAccessibility
• José likes Boardgame
• Mirasierra address Ventisquero de la Condesa
• Mirasierra hasAccessibility WheelchairAccessibility
• Mega Games address Ventisquero de la Condesa
• Mega Games hasAccessibility WheelchairAccessibility
• Mega Games sells Conquer & Smash!
• Conquer & Smash! is a Boardgame
Link your data (Linked RDF)
[Diagram: the RDF graphs obtained from the API, the two CSV files and the HTML page are merged into a single graph by sharing the nodes WheelchairAccessibility, Ventisquero de la Condesa and Boardgame]
Make complex queries
• Where can I buy the Conquer & Smash! game?
• Which are the most accessible routes for Christmas shopping?
• "Expansion pack for Conquer & Smash! Take metro line 9 and in 35 minutes we can demo it to you!" (MG)
• "Or better take bus 231 because it is sunny and you can take a glance at the outdoor art exhibition in Plaza de Castilla"
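The first question above can be sketched over the example statements from the previous slides. This is only an illustration in plain Python: strings stand in for URIs, the helper names are hypothetical, and the "requires" statement is read off the diagram, so it is an assumption. A real deployment would use an RDF store and SPARQL.

```python
# Statements per source; plain strings stand in for URIs (illustration only).
api_triples = {
    ("José", "hasImpairment", "MobilityImpairment"),
    ("MobilityImpairment", "requires", "WheelchairAccessibility"),  # assumed reading
    ("José", "likes", "Boardgame"),
}
csv_triples = {
    ("Mirasierra", "address", "Ventisquero de la Condesa"),
    ("Mirasierra", "hasAccessibility", "WheelchairAccessibility"),
    ("Mega Games", "address", "Ventisquero de la Condesa"),
    ("Mega Games", "hasAccessibility", "WheelchairAccessibility"),
}
html_triples = {
    ("Mega Games", "sells", "Conquer & Smash!"),
    ("Conquer & Smash!", "is a", "Boardgame"),
}

# Linking: the sources share identifiers, so a set union merges the graphs.
graph = api_triples | csv_triples | html_triples


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def accessible_sellers(person):
    """Shops that sell something the person likes and meet the person's needs."""
    needs = {
        need
        for impairment in objects(person, "hasImpairment")
        for need in objects(impairment, "requires")
    }
    products = {
        product
        for liked_type in objects(person, "likes")
        for product in subjects("is a", liked_type)
    }
    shops = {shop for product in products for shop in subjects("sells", product)}
    return {shop for shop in shops if needs <= objects(shop, "hasAccessibility")}


print(accessible_sellers("José"))
```

The query only works because the three sources use the same identifiers for WheelchairAccessibility and Boardgame, which is the point of linking.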
Using Linked Open Transport Data
• Calculate accessible routes
  – Combined with geographical data (IGN)
  – Which stop should I use if I have mobility problems?
• Commercial routes by bus
  – Combined with Madrid's shop census (from Ayto. Madrid)
• Geomarketing decisions for entrepreneurs
  – Where should I open my shop? Based on the combination of the number of travellers per stop, demographic data, data about other businesses and shops around, etc.
• Personalised offers to travellers
  – With real-time data and data about consumption patterns (e.g., credit card transactions)
• …
Index
• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description
Linked Data life cycle
[Cycle diagram with the phases: Specification, Modelling, Generation, Linking, Publication, Exploitation]
Requirements (smart cities domain)
1. Tabular formats (i.e., SQL, XLS or CSV)
   – Other data structures (e.g., XML) are less important in practice or are unstructured and would require much more work
2. Changing data (dynamic or streaming data), versioning, (automatic) data quality assurance and reliability
3. Data access through web services, proprietary APIs and data files
4. Legal aspects (e.g., licensing, data ownership)
5. Access rights management or mechanisms for extracting public data (plenty of confidential data)
Linked Data generation process
[Process diagram: each step and its output]
• Select data source → Data source
• Obtain access to data source → Access, data
• Analyse licensing of the data source → License
• Analyse data source → Schema, data
• Define resource naming strategy → Resource naming strategy
• Develop ontology → Ontology
• Transform data source → RDF data
• Link with other datasets → Linked dataset
F. Radulovic, M. Poveda-Villalón, D. Vila-Suero, V. Rodríguez-Doncel, R. García-Castro and A. Gómez-Pérez, Guidelines for Linked Data generation and publication: An example in building energy consumption, Automation in Construction, Special Issue on Linked Data in Architecture and Construction. Available online April 2015.
Linked Data generation process
[Process diagram repeated, highlighting the stage: DATA PREPARATION]
Select data source
• Select the data source that will be transformed into Linked Data
• Steps:
  – To define the requirements for selection
  – To select one or several data sources
• The data set may be:
  – Owned by your organization…
  – … or not (external data sources)
Select data source – LCC example
• Requirements
  – Real-world scenario in the smart city domain
  – Available for use
  – Available in machine-processable format (the more structured the data are, the better)
  – Can be linked with generic entities (e.g., location)
• Leeds City Council – energy consumption
  – http://data.gov.uk/dataset/council-energy-consumption
Obtain access to data source
• Data access means
  – Technical means to retrieve the data
  – Legal rights to use the data
• If the data is not accessible:
  – To identify the person to contact
  – To request the access
  – To obtain access and to retrieve the data
• Access alternatives:
  – file,
  – programming interface,
  – database,
  – data stream,
  – etc.
Obtain access to data source – LCC example
• Data set already available as a CSV file
Analysing licensing of the data source
• Licenses specify the legal terms under which a data set can be used and exploited
• There are neither legal prescriptions on how to declare licenses nor common standard practices to do so
• Steps (not automatable):
  – To identify the rightsholder and the authoritative publisher
    • Rightsholder vs. authorized distributor
  – To find the applicable license
    • Web page, data set metadata, data themselves
    • Contact the publisher
  – To read the license and analyse legal terms
• Tips
  – Analysis should be performed upon all copies and formats of the data
  – Ensure license compatibility when integrating several data sources
Linked Data resources can be protected
• Ontologies are intellectual works; they can be protected by copyright
• RDF datasets can be considered as databases, also legally protected in the EU
Create, consume, aggregate, derive and publish Linked Data in a lawful environment
Always license your data
[Figure labels: Data shops; Government; Individuals; …]
Licensed Linked Data
[Diagram: Non-licensed Linked Data + License = Licensed Linked Data]
• Unless there is a license allowing it, the resource cannot be copied, modified or published. In practice, non-licensed resources are useless in industrial settings
• Licensed Linked Data can be used
Licensed Linked Data in practice
• Linked Open Data: published, open license
• (Published) Linked Data: published, no open license
• Linked Data: not published, no open license
Guidelines for licensing linked data
1. Add "rights" metadata in the dataset description (e.g., VoID, DCAT)
2. Use standard predicates to declare "rights" statements (e.g., Dublin Core terms: dc:rights, dct:license)
3. Is a standard license available?
   – Yes: use the URI of the standard license (e.g., CC0)
   – No: use a rights declaration language (e.g., ODRL)
ODRL: Open Digital Rights Language
DCAT: Data Catalog Vocabulary
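The three guidelines can be illustrated with a minimal sketch that writes a DCAT dataset description carrying a dct:license statement pointing at the CC0 URI. The function name, the dataset URI and the title are hypothetical. The Turtle is assembled with plain string formatting, so this is a teaching aid and not a substitute for an RDF library.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"


def dataset_description(dataset_uri, title, license_uri):
    """Return a small Turtle document describing a dataset and its license."""
    # Escape backslashes and double quotes so the title is a valid Turtle string.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"@prefix dct: <{DCT}> .\n"
        f"@prefix dcat: <{DCAT}> .\n"
        "\n"
        f"<{dataset_uri}> a dcat:Dataset ;\n"
        f'    dct:title "{safe_title}" ;\n'
        f"    dct:license <{license_uri}> .\n"
    )


# Hypothetical dataset URI and title, for illustration only.
print(dataset_description("http://example.org/dataset/ds01", "Example dataset", CC0))
```

Using the URI of a standard license, rather than free text, lets consumers compare licenses mechanically when they integrate several sources.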
Licensing Linked Data is Simple…
The British National Bibliography (BNB) lists the books and new journal titles published or distributed in the United Kingdom and Ireland since 1950.
… or complex, depending on your needs
Policies can be expressed with ODRL 2.0 to govern access to Linked Data.
Example of access to Linked Data for a price (15 EUR for the dataset or 0.01 EUR for a triple thereof):

# Standard namespace declarations needed by the terms used below
@prefix odrl: <http://www.w3.org/ns/odrl/2/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix gr: <http://purl.org/goodrelations/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

<http://salonica.dia.fi.upm.es/ldr/policy/cdaddba4-fc2e-4ee0-a784-e62f1db259bf>
    a odrl:Set ;
    rdfs:label "License Offering Paid Linked Data" ;
    odrl:permission [
        a odrl:Permission ;
        odrl:target <http://example.org/dataset/ds01> ;
        odrl:action odrl:reproduce ;
        odrl:duty [
            a odrl:Duty ;
            rdfs:label "Pay" ;
            gr:UnitOfMeasurement dcat:Dataset ;
            gr:amountOfThisGood "1" ;
            odrl:action odrl:pay ;
            odrl:target "15,00 EUR"
        ]
    ] , [
        a odrl:Permission ;
        odrl:action odrl:reproduce ;
        odrl:target <http://example.org/dataset/ds01> ;
        odrl:duty [
            a odrl:Duty ;
            rdfs:label "Pay" ;
            gr:UnitOfMeasurement rdf:Statement ;
            gr:amountOfThisGood "1" ;
            odrl:action odrl:pay ;
            odrl:target "0,01 EUR"
        ]
    ] .

The target can be an ontology, a dataset, a SPARQL endpoint…
…or a SPARQL query itself or a triple pattern: {mysubject, ?p , ?o}
And you have support for that
• Conditional access to Linked Data
  – http://conditional.linkeddata.es
• Dataset of licenses in RDF
  – http://rdflicense.appspot.com
• ODRL Profile for Linked Data
  – http://purl.oclc.org/NET/ldr/ns#
  – https://www.w3.org/community/odrl/profile/linkeddata/
Analyse licensing – LCC example
Analyse data source
• Get insight into the data structure and organization
• Steps:
  – To analyse the characteristics of the data
    • Data values, data ranges, etc.
  – To obtain the schema of the data
    • Concepts and their relationships
• Data can be available as:
  – Structured data
  – Unstructured data
• If the schema does not exist:
  – Use a standard modeling language for describing the data schema (e.g., UML)
Analyse data source – LCC example
• Metadata not quite descriptive:
  – Different types of council sites (mostly buildings)
  – Electricity, gas and oil consumptions
  – 1-year intervals: 2010/11, 2011/12, 2012/13
• Analysis required contacting people from LCC open data
[Screenshots of the data set; URL shown: http://localhost:3333/]
• Analyse the characteristics of data using facets
• Obtain the schema of the data
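What a text facet computes can be sketched in a few lines: it counts how often each distinct value occurs in a column. The sample rows below are made up for illustration (only the column names come from the LCC data set), and the function name is hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; only the column names are taken from the LCC data set.
SAMPLE = """Site Name,Address 4,PostCode
Council Offices Belgrave House,Leeds,AA1 1AA
Community Centre Tunstall Road,leeds,
Library Example,Leeds,AA2 2AA
"""


def text_facet(rows, column):
    """Count the occurrences of each distinct value in a column."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for value, count in text_facet(rows, "Address 4").most_common():
    print(repr(value), count)
```

A facet like this makes inconsistent spellings and empty cells visible at a glance, which is how problems such as "leeds" vs "Leeds" tend to be spotted.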
Data characteristics and schema – LCC
Column | Type | Comments / Range (rounded) | Problems
uprn | String | Not unique, empty values |
Site Name | String | Unique? Site types + name | 4 repeated sites
Address 2 | String | Not unique, empty values |
Address 3 | String | Not unique, empty values | Village? Civil Parish?
Address 4 | String | Not unique, empty values | City? Metropolitan district? "leeds" vs "Leeds"
PostCode | String | Not unique, empty values |
Electricity 10/11 | Decimal | 0 to 2,700,000 |
Electricity 11/12 | Decimal | 0 to 2,300,000 |
Electricity 12/13 | Decimal | 0 to 2,400,000 |
Gas 10/11 | Decimal | -100,000 to 6,100,000 | Negative values
Gas 11/12 | Decimal | -100,000 to 7,800,000 | Negative values
Gas 12/13 | Decimal | -100,000 to 8,300,000 | Negative values
Oil 12/13 | Decimal | -1,000,000 to 13,000,000 | Negative values
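The kind of per-column summary shown in the table (range, empty values, negative values) can be sketched as follows. The function name and the sample values are hypothetical; the sketch only assumes that the column arrives as a list of raw strings.

```python
from decimal import Decimal, InvalidOperation


def column_report(values):
    """Summarise a numeric column: range, empty cells, invalid and negative values."""
    numbers, empty, invalid = [], 0, 0
    for raw in values:
        text = raw.strip()
        if not text:
            empty += 1
            continue
        try:
            numbers.append(Decimal(text))
        except InvalidOperation:
            # The cell is not empty but cannot be parsed as a decimal number.
            invalid += 1
    return {
        "min": min(numbers) if numbers else None,
        "max": max(numbers) if numbers else None,
        "empty": empty,
        "invalid": invalid,
        "negative": sum(1 for number in numbers if number < 0),
    }


# Made-up values, for illustration only.
print(column_report(["1200.5", "", "-100", "n/a", "300"]))
```

Whether a negative consumption is an error or a meaningful correction is a question for the data publisher; the report only flags it.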
Linked Data generation process
[Process diagram repeated, highlighting the stage: DEFINE RESOURCE NAMING STRATEGY]
Hash and slash URIs
• Hash URIs (#)
  – http://www.energycompany.com/about#energyCompany
  – The fragment part has to be stripped off when the URI is requested from the server (i.e., the resource cannot be retrieved directly)
  – Hash URIs can be used to identify non-document resources
• Slash URIs (/)
  – http://www.energycompany.com/about/energyCompany
  – Imply a 303 redirection to the location of a document that represents the resource (+ content negotiation)
    • E.g., http://www.energycompany.com/about/energyCompany.rdf
  – Drawbacks: HTTP round-trip, redirects, web server configuration
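The difference between the two forms can be seen by splitting off the fragment, which is what an HTTP client does before sending the request. The sketch uses `urldefrag` from the Python standard library; the wrapper function name is hypothetical.

```python
from urllib.parse import urldefrag


def requested_document(resource_uri):
    """Return (URI sent to the server, fragment kept on the client side)."""
    url, fragment = urldefrag(resource_uri)
    return url, fragment


hash_uri = "http://www.energycompany.com/about#energyCompany"
slash_uri = "http://www.energycompany.com/about/energyCompany"

# The hash URI is requested as .../about; the fragment never reaches the server.
print(requested_document(hash_uri))
# The slash URI is requested as is, so the server can answer with a 303 redirect.
print(requested_document(slash_uri))
```

With hash URIs, every term in the same namespace resolves to the same document, which is why they suit small vocabularies that are downloaded as a whole.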
Hash or slash?
• Depends on the data and on their expected use
• Small data:
  – Hash namespace
  – Access all the data as a whole
  – HTTP GET would return a single information resource with everything
• Large / frequently-updated / modular data:
  – Slash namespace
  – Access resources individually or in groups
  – Resource descriptions may be divided among many information resources or may be managed via a query service (e.g., SPARQL)
  – Progressively greater detail about resources may be retrieved through multiple accesses
Define resource naming strategy
• Steps:
  – To choose a URI form (hash or slash)
  – To choose a domain for the URIs
  – To choose a path for the URIs
  – To choose a pattern for ontology classes and properties in the ontology, as well as for individuals
• Tips:
  – One URI must identify only one item (e.g., avoid mixing web pages and real-world objects)
  – URIs should be persistent and should not change over time (e.g., avoid state information); PURL may support this
  – Use a domain that is under your control (or a service such as PURL)
  – Separate the ontology model from its instances
  – Define meaningful URIs
Resource naming strategy – LCC
• Hash URIs for ontological terms, slash URIs for individuals
• Domain: http://smartcity.linkeddata.es/
• Ontological terms path:
  – http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#
• Individuals path:
  – http://smartcity.linkeddata.es/lcc/resource/
• Ontological terms pattern:
  – http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#<term_name>
  – Ex.: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#hasQuantitativeValue
• Individuals pattern:
  – http://smartcity.linkeddata.es/lcc/resource/<resource_type>/<resource_name>
  – Ex.: http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/WetJohnCharlesCentreforSport
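The two patterns can be turned into small helper functions. This is a sketch under one assumption that the slides do not state explicitly: that names are compacted by dropping every character other than ASCII letters and digits, which is what the example URI suggests. The function names are hypothetical.

```python
import re

BASE = "http://smartcity.linkeddata.es/lcc/"
ONTOLOGY_NS = BASE + "ontology/EnergyConsumption#"
RESOURCE_NS = BASE + "resource/"


def compact(name):
    """Drop everything except ASCII letters and digits (assumed convention)."""
    return re.sub(r"[^A-Za-z0-9]", "", name)


def term_uri(term_name):
    """Hash URI for an ontology term."""
    return ONTOLOGY_NS + term_name


def individual_uri(resource_type, resource_name):
    """Slash URI for an individual: <resource_type>/<resource_name>."""
    return f"{RESOURCE_NS}{compact(resource_type)}/{compact(resource_name)}"


print(term_uri("hasQuantitativeValue"))
print(individual_uri("Leisure Centre", "Wet John Charles Centre for Sport"))
```

Keeping the pattern in one place makes it easier to apply the same strategy in every mapping and to change it before publication if needed.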
Linked Data generation process
[Process diagram repeated, highlighting the stage: DEVELOP ONTOLOGY]
Ontology development
1. Requirements definition
2. Terms extraction
3. Ontology conceptualization
   3.1 Initial model drafting
   3.2 Detailed model definition
4. Ontology search
5. Ontology selection
6. Ontology implementation
   6.1 Ontology integration
   6.2 Ontology completion
7. Ontology evaluation
[Diagram notes: "Can you represent all your data?"; "You did this yesterday"]
Ontology development – LCC
Linked Data generation process
[Process diagram repeated, highlighting the stage: TRANSFORM DATA]
Data transformation
• Steps:
  – To select the RDF serialization
    • RDF/XML, Turtle, N-Triples, JSON-LD
  – To select a tool. Depends on:
    • The format of the data (database, spreadsheets, etc.)
    • Concrete needs of the transformation process (e.g., dynamicity)
  – To transform the data into RDF
    • Usually requires a mapping between the data and the ontology
    • The mapping implements the resource naming strategy
  – To evaluate the obtained RDF data:
    • Syntax, Completeness, Accuracy, Conciseness, Modelling, Understandability, Versatility, Usage, Licensing, …
Data transformation tools
• Database to RDF
  – morph-RDB
  – D2R Server
  – TopBraid Composer
• Data streams to RDF
  – morph-streams
  – D2R Server
• Spreadsheets to RDF
  – TopBraid Composer
  – Excel2RDF
  – RDF123
  – XLWrap
  – OpenRefine/LODRefine
• XML to RDF
  – XML2RDF
  – TopBraid Composer
  – OpenRefine/LODRefine
Overview of OpenRefine
OpenRefine basic operations
• Installing
• Creating a new project
• Data analysis
  – Exploring data
  – Sorting data
  – Faceting data
  – Filtering data
• Basic data transformation (cleaning/preparing)
  – Columns:
    • Move
    • Rename
    • Remove columns
    • Collapse and expand
    • Common transformations
  – Rows:
    • Remove rows
• Export whole project
Adding derived columns
  Edit column → Add column based on this column...
Splitting data across columns
  Edit column → Split into several columns...
Handling multi-valued cells
  Edit cells → Split multi-valued cells...
  Edit cells → Join multi-valued cells...
Rows and records
  Show as: rows / records
  [Screenshot labels: Record; Row]
Clustering similar cells
  Edit cells → Cluster and edit...
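The idea behind key-collision clustering can be sketched as follows: values that reduce to the same normalised key are proposed as one cluster. This is a simplified illustration and not OpenRefine's actual implementation; the function names and sample values are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Simplified key: trim, lowercase, strip punctuation, sort unique tokens."""
    cleaned = value.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with several variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Made-up cell values, for illustration only.
cells = ["Leeds", "leeds", "Leeds ", "Wakefield", "Road, Tunstall", "Tunstall Road"]
for group in cluster(cells):
    print(group)
```

Each proposed cluster still needs a human decision on which spelling to keep, which is what the "Cluster and edit" dialog asks for.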
Transposing rows and columns
  Transpose → Transpose cells across columns into rows...
  Transpose → Columnize by key/value columns...
Other useful utilities
• Regular expressions
  – Java regular expressions
• Custom transformations
  – General Refine Expression Language (GREL)
  – Jython (Python implemented in Java)
  – Clojure (functional language that resembles Lisp)
Using the project history
• Project history:
  – Access operation history
  – Undo operations
  – Extract operations (in JSON)
  – Apply operations
• Caution:
  – Transformations are registered in the history; filters and facets are not
Solving memory problems
https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory
OpenRefine RDF extension - RDF skeleton
• Resource naming strategy
  – Ontological terms pattern: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#<term_name>
  – Individuals pattern: http://smartcity.linkeddata.es/lcc/resource/<resource_type>/<resource_name>
• Add base URI
• Add prefixes
Creating individuals
  lccRes:CouncilOfficesBelgraveHouse rdf:type schema:CivicStructure
Previewing results
Adding property values
  lccRes:CouncilOfficesBelgraveHouse rdf:type schema:CivicStructure
  lccRes:CouncilOfficesBelgraveHouse rdfs:label "Belgrave House" (xsd:string)
Exporting RDF
  Export → RDF as Turtle

@prefix schema: <http://schema.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lcc: <http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://smartcity.linkeddata.es/lcc/resource/CivicStructure/CouncilOfficesBelgraveHouse>
    a schema:CivicStructure ;
    rdfs:label "Belgrave House" .

<http://smartcity.linkeddata.es/lcc/resource/CivicStructure/CommunityCentreTunstallRoad>
    a schema:CivicStructure ;
    rdfs:label "Tunstall Road" .
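For readers who want to see what such an export does conceptually, the mapping from table rows to Turtle can be sketched in plain Python. This is not how the OpenRefine RDF extension is implemented. The column names "Site Type" and "Site Name" are assumptions (for example, the result of splitting the original column), and the helper names are hypothetical.

```python
import csv
import io
import re

RESOURCE_NS = "http://smartcity.linkeddata.es/lcc/resource/"
PREFIXES = (
    "@prefix schema: <http://schema.org/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)

# Hypothetical input table; the column names are assumptions.
SAMPLE = """Site Type,Site Name
Council Offices,Belgrave House
Community Centre,Tunstall Road
"""


def compact(text):
    """Drop everything except ASCII letters and digits (assumed convention)."""
    return re.sub(r"[^A-Za-z0-9]", "", text)


def row_to_turtle(row):
    """Map one row to an individual with a type and a label."""
    local_name = compact(row["Site Type"]) + compact(row["Site Name"])
    # Escape backslashes and double quotes so the label is a valid Turtle string.
    label = row["Site Name"].replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"<{RESOURCE_NS}CivicStructure/{local_name}>\n"
        "    a schema:CivicStructure ;\n"
        f'    rdfs:label "{label}" .\n'
    )


def csv_to_turtle(text):
    """Apply the row mapping to every row and prepend the prefixes."""
    rows = csv.DictReader(io.StringIO(text))
    return PREFIXES + "\n" + "\n".join(row_to_turtle(row) for row in rows)


print(csv_to_turtle(SAMPLE))
```

The row mapping plays the role of the RDF skeleton: it fixes, once, how a row becomes a URI, a type and a label.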
Evaluating the exported data
• Manual inspection
• Syntax evaluation (with syntax validator)
• Consistency with the ontologies (with reasoner)
• Usage evaluation (e.g., by running SPARQL queries)
  – Show all electricity consumptions and the related time periods for all council sites related to culture
  – Show all energy consumptions and the related time periods of council sites from the Wakefield district
Index
• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description
Richer schema (and data)
[Ontology diagram. Legend: class; datatype property :: datatype; object property; subclass-of relation]
• Classes with datatype properties:
  – schema:CivicStructure: lcc:uprn :: xsd:String; dc:title :: xsd:String
  – schema:PostalAddress: schema:addressLocality :: xsd:String; schema:addressRegion :: xsd:String; schema:streetAddress :: xsd:String; schema:postalCode :: xsd:String
  – time:Instant: time:inXSDDateTime :: xsd:dateTime
  – lcc:hasQuantityValue :: xsd:decimal
• Other classes: time:Interval, schema:City, schema:Place, schema:AdministrativeArea, admingeo:District, ssn:Observation, ssn:SensorOutput, ssn:ObservationValue, ssn:FeatureOfInterest, ssn:Property, ero:FinalEnergy, ero:EnergyConsumerFacility, om:Unit_of_measure
• Site classes: SupplyOrStorageSite, OpenAirSite, AccomodationSite, AdministrativeSite, OfficeSite, EducationalSite, SocialSite, CulturalSite, LeisureSite, OtherSite
• Object properties: ssn:observationSamplingTime, ssn:observationResult, ssn:hasValue, ssn:featureOfInterest, ssn:observedProperty, schema:address, schema:containedIn, admingeo:district, time:hasBeginning, time:hasEnd, ero:consumesEnergyType, lcc:hasQuantityUnitOfMeasurement
Linked Data are just data
[Plots: electrical consumption per building (electric1011, electric1112, electric1213); electricity vs gas consumption 12/13; electricity vs oil consumption 12/13]
Benefits of linking data
[Plot: total electric consumption. Original data + geolocation]
[Plot: total electric consumption in locations with population > 20,000. Original data + geolocation + population]
Benefits of reasoning
[Plot: total electric consumption in cultural buildings]
[Class hierarchy: schema:CivicStructure > CulturalSite > Museum, Library]
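The benefit can be sketched with a tiny subclass closure: a query for CulturalSite also returns sites typed only as Museum or Library. The hierarchy follows the slide as read from the diagram; the sites, the figures and the function names are made up for illustration, and a real setup would rely on an OWL or RDFS reasoner.

```python
# Subclass axioms as (subclass, superclass), following the hierarchy on the slide.
SUBCLASS_OF = {
    ("CulturalSite", "schema:CivicStructure"),
    ("Museum", "CulturalSite"),
    ("Library", "CulturalSite"),
}

# Made-up sites and consumption figures, for illustration only.
SITES = [
    ("City Museum", "Museum", 500_000),
    ("Central Library", "Library", 250_000),
    ("Town Hall", "schema:CivicStructure", 900_000),
]


def subclasses(cls):
    """Return cls together with all its direct and indirect subclasses."""
    found, pending = {cls}, [cls]
    while pending:
        current = pending.pop()
        for sub, sup in SUBCLASS_OF:
            if sup == current and sub not in found:
                found.add(sub)
                pending.append(sub)
    return found


def total_consumption(cls):
    """Sum the consumption of every site whose type is cls or one of its subclasses."""
    wanted = subclasses(cls)
    return sum(value for _, site_type, value in SITES if site_type in wanted)


print(total_consumption("CulturalSite"))
print(total_consumption("schema:CivicStructure"))
```

Without the subclass closure, the first query would return nothing, because no site is typed directly as CulturalSite.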
Index
• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description
What are we going to do?
[Diagrams: the Linked Data life cycle (Specification, Modelling, Generation, Linking, Publication, Exploitation) and the Linked Data generation process, from selecting the data source to linking with other datasets]
Hands-on task 1
• Goal: to get familiar with the first steps in the Linked Data generation process
• The students will have to take their selected dataset(s) and perform the following tasks:
  – Analyse Data Set
    • Both the data (quantities, value ranges, etc.) and the schema
  – Analyse Licensing of the Data Source
    • Who is the publisher and the rightsholder?
    • What is the license?
    • Which will be the license to be used for the generated dataset?
  – Define Resource Naming Strategy
    • For the ontology and the data (URI form, content negotiation, URIs domain, path, patterns, etc.)
  – Finish Ontology Development
    • Lightweight ontology (i.e., classes, properties, domains and ranges)
Hands-on task 1 - Deliverables
• A document that includes:
  – The analyses performed over the data source
  – The licensing of the data source and the potential license
  – The resource naming strategy defined
• An OWL file with the ontology developed, according to the resource naming strategy defined
Hands-on task 2
• Goal: to get familiar with the transformation of CSV data into RDF using LODRefine
• The students will have to take their selected dataset(s) and perform the following tasks:
  – Import data into LODRefine
  – Analyse and fix data
    • Analysis performed in the previous class, but can be updated with new findings
    • Fix the data to remove errors
    • Transform the data to facilitate RDF generation
  – Export data to RDF
    • Define an RDF skeleton for the data
    • Export the data to RDF (Turtle syntax)
Hands-on task 2 - Deliverables
For each dataset:
• An RDF file in the Turtle syntax with the data transformed into RDF
Thank you for your attention!