Linked Data Generation Process


Page 1: Linked Data Generation Process

LD4SC Summer School, 7th - 12th June, Cercedilla, Spain

1st Summer School on Smart Cities and Linked Open Data (LD4SC-15)

Linked Data Generation Process

Raúl García-Castro, Filip Radulovic, Oscar Corcho, María Poveda, Víctor Rodríguez-Doncel, Asunción Gómez-Pérez, Daniel Vila-Suero

Presenter: Raúl García-Castro

Page 2: Linked Data Generation Process

Index

• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description

Page 3: Linked Data Generation Process

Data in smart cities

http://br.fiberhomegroup.com/pt/Enterprise/324/2282.aspx

Page 4: Linked Data Generation Process

Open data… for what?

• For example, (re)using open transport data
  – Provide travel information to persons
  – Allow better multimodal route planning
  – Facilitate public transport management
  – …
  – Accessibility
    • Which metro entrances are accessible for wheelchair users?
    • At which bus stops is it safer and more convenient for a wheelchair user to wait?
    • Is there any accessible parking space near a bus stop?
    • etc.

Page 5: Linked Data Generation Process

Legal framework and open data initiatives

• Aarhus Convention (1998)
  – Right to participation and access; 41 countries and the EU
• Open Access Initiative (2001)
  – Scientific information on the Web; > 510 organisations
• PSI Directive
  – PSI Reuse (2003/98/EC)
• Convention for the access to official documents (2009)
  – Signed by 12 countries
  – Belgium, Finland, Norway, Sweden, Hungary, Estonia, Lithuania, Slovenia, Georgia, Montenegro, Serbia and Macedonia
• Law 37/2007. PSI Reuse
• Law 11/2007. Citizen access to public services and right to the quality of services
• RD 4/2010 National Interoperability Scheme
  – Open standards
  – Technology neutral
  – Open source software
• RD 1495/2011, which develops Law 37/2007
• Norma Técnica de Interoperabilidad (19/02/2013, BOE 4/3/2013)

Adapted from Antonio Rodríguez Pascual (IGN)

Page 6: Linked Data Generation Process

The problem: lack of interoperability

[Figure: several publish/extract pairs. Publishers say "I use GTFS", "I use my own CSV structure" and "I provide a web service". One publisher's goal: "I want to publish data in an interoperable structure and format". The consumer's goal: "Build an app that is available all over the world".]

Page 7: Linked Data Generation Process

Scenario: open transport data

Is there any open transport data already?

We are surrounded by them

Page 8: Linked Data Generation Process

Open data and how they are published

1) In notice boards
  – For those who have a lot of free time
  – Or those who are there at the right moment in time

[Figure labels: DATA]

Adapted from Antonio Rodríguez Pascual (IGN)

Page 9: Linked Data Generation Process

Open data and how they are published

2) In web pages and mobile apps
  – For people

[Figure labels: DATA; On the Web, open license]

Adapted from Antonio Rodríguez Pascual (IGN)

Page 10: Linked Data Generation Process

Open data and how they are published

2) In web pages and mobile apps
  – For people

[Figure labels: DATA; On the Web, open license; Machine-readable; Non-proprietary format]

Adapted from Antonio Rodríguez Pascual (IGN)

Page 11: Linked Data Generation Process

Open data and how they are published

3) As web files
  – So that they can be loaded by humans in their information systems (XML, HTML, CSV, etc.)
  – Hopefully it is not a scanned PDF

[Figure labels: DATA; On the Web, open license; Machine-readable; Non-proprietary format]

Adapted from Antonio Rodríguez Pascual (IGN)

Page 12: Linked Data Generation Process

Open data and how they are published

4) Via web services
  – For humans and machines
  – It allows generating added-value services
  – And can be integrated in the application business logic

[Figure labels: DATA; On the Web, open license; Machine-readable; Non-proprietary format]

Adapted from Antonio Rodríguez Pascual (IGN)

Page 13: Linked Data Generation Process

What is open data?

• Open data are data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and share alike.
• The most important aspects to consider:
  – Availability and Access: data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. Data must also be available in a convenient and modifiable form.
  – Reuse and Redistribution: data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets.
  – Universal Participation: everyone must be able to use, reuse and redistribute; there should be no discrimination against fields of endeavour or against persons or groups. For example, 'non-commercial' or 'only in education' restrictions are not allowed.

Source: Open Data Handbook

Page 14: Linked Data Generation Process

Scenario: open transport data

Is there any open transport data already?

Can we do it better?

Page 15: Linked Data Generation Process

Going into 4 and 5 stars: Linked Data

• 1 star: Make your stuff available on the Web (whatever format) under an open license
• 2 stars: Make it available as structured data (e.g., Excel instead of an image scan of a table)
• 3 stars: Use non-proprietary formats (e.g., CSV instead of Excel)
• 4 stars: Use URIs to identify things, so that people can point at your stuff
• 5 stars: Link your data to other data to provide context

Page 16: Linked Data Generation Process

USE URIs + RDF → RDF standards

[Figure: four sources, each converted to its own RDF graph.
• API (José; Mobility impairment; Boardgames) → RDF with nodes José, Mobility Impairment, WheelchairAccessibility and Boardgame, and edges hasImpairment, requires and likes
• CSV (Mirasierra; Ventisquero de la Condesa; Yes) → RDF: Mirasierra address Ventisquero de la Condesa; Mirasierra hasAccessibility WheelchairAccessibility
• CSV (Mega Games; Ventisquero de la Condesa; Yes) → RDF: Mega Games address Ventisquero de la Condesa; Mega Games hasAccessibility WheelchairAccessibility
• HTML (Mega Games; Conquer & Smash!; 29,95) → RDF: Mega Games sells Conquer & Smash!; Conquer & Smash! is a Boardgame]

Page 17: Linked Data Generation Process

Link your data → Linked RDF

[Figure: the four sources (API, CSV, CSV, HTML) and their RDF graphs, with nodes José, Mobility Impairment, WheelchairAccessibility, Boardgame, Mirasierra, Ventisquero de la Condesa, Mega Games and Conquer & Smash!, and edges hasImpairment, requires, likes, address, hasAccessibility, sells and is a. The graphs start to be connected through the nodes they have in common.]

Page 18: Linked Data Generation Process

Link your data → Linked RDF

[Figure: the four RDF graphs (from the API, the two CSV files and the HTML page) joined into one graph through the shared nodes WheelchairAccessibility, Ventisquero de la Condesa and Boardgame.
• José: hasImpairment, requires, likes (nodes Mobility Impairment, WheelchairAccessibility, Boardgame)
• Mirasierra: address Ventisquero de la Condesa; hasAccessibility WheelchairAccessibility
• Mega Games: address Ventisquero de la Condesa; hasAccessibility WheelchairAccessibility; sells Conquer & Smash!
• Conquer & Smash!: is a Boardgame]
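The effect of linking can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: plain strings stand in for URIs, the grouping of triples per source follows the figure as far as it can be read, and the `requires` edge from the impairment to the accessibility feature is an assumption. The helper names `objects` and `accessible_shops_for` are hypothetical.

```python
# Each source contributes triples (subject, predicate, object).
# Plain strings stand in for URIs: the same string means the same resource.
API = {
    ("José", "hasImpairment", "MobilityImpairment"),
    # Assumption: "requires" links the impairment to the accessibility feature.
    ("MobilityImpairment", "requires", "WheelchairAccessibility"),
    ("José", "likes", "Boardgame"),
}
CSV_1 = {
    ("Mirasierra", "address", "Ventisquero de la Condesa"),
    ("Mirasierra", "hasAccessibility", "WheelchairAccessibility"),
}
CSV_2 = {
    ("Mega Games", "address", "Ventisquero de la Condesa"),
    ("Mega Games", "hasAccessibility", "WheelchairAccessibility"),
}
HTML = {
    ("Mega Games", "sells", "Conquer & Smash!"),
    ("Conquer & Smash!", "is a", "Boardgame"),
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def accessible_shops_for(graph, person):
    """Shops that sell something the person likes and meet the person's accessibility needs."""
    liked = objects(graph, person, "likes")
    needs = set()
    for impairment in objects(graph, person, "hasImpairment"):
        needs |= objects(graph, impairment, "requires")
    shops = set()
    for shop, predicate, product in graph:
        if predicate != "sells":
            continue
        sells_liked_kind = bool(objects(graph, product, "is a") & liked)
        is_accessible = needs <= objects(graph, shop, "hasAccessibility")
        if sells_liked_kind and is_accessible:
            shops.add(shop)
    return sorted(shops)


# Linking here is just the union of the graphs: shared names join them.
linked = API | CSV_1 | CSV_2 | HTML
print(accessible_shops_for(linked, "José"))
```

No single source can answer the question on its own; the answer only appears once the graphs share identifiers for WheelchairAccessibility and Boardgame.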

Page 19: Linked Data Generation Process

Make complex queries

• Where can I buy the Conquer & Smash! game?
• Which are the most accessible routes for Christmas shopping?

[Figure: example answers]
• "Expansion pack for Conquer & Smash! Take metro line 9 and in 35 minutes we can demo it to you!"
• "Or better take bus 231 because it is sunny and you can take a glance at the outdoor art exhibition in Plaza de Castilla"

Page 20: Linked Data Generation Process

Using Linked Open Transport Data

• Calculate accessible routes
  – Combined with geographical data (IGN)
  – Which stop should I use if I have mobility problems?
• Commercial routes by bus
  – Combined with Madrid's shop census (from Ayto. Madrid)
• Geomarketing decisions for entrepreneurs
  – Where should I open my shop? Based on the combination of the number of travellers per stop, demographic data, data about other businesses and shops around, etc.
• Personalised offers to travellers
  – With real-time data and data about consumption patterns (e.g., credit card transactions)
• …

Page 21: Linked Data Generation Process

Index

• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description

Page 22: Linked Data Generation Process

Linked Data life cycle

[Figure: life cycle with the activities Specification, Modelling, Generation, Publication, Exploitation and Linking]

Page 23: Linked Data Generation Process

Requirements (smart cities domain)

1. Tabular formats (i.e., SQL, XLS or CSV)
  – Other data structures (e.g., XML) are less important in practice, or are unstructured and would require much more work
2. Changing data (dynamic or streaming data), versioning, (automatic) data quality assurance and reliability
3. Data access through web services, proprietary APIs and data files
4. Legal aspects (e.g., licensing, data ownership)
5. Access rights management or mechanisms for extracting public data (plenty of confidential data)

Page 24: Linked Data Generation Process

Linked Data generation process

[Figure: process diagram. Activities and the outputs they produce:
• Select data source → Data source
• Obtain access to data source → Access, data
• Analyse licensing of the data source → License
• Analyse data source → Schema, data
• Define resource naming strategy → Resource naming strategy
• Develop ontology → Ontology
• Transform data source → RDF data
• Link with other datasets → Linked dataset]

F. Radulovic, M. Poveda-Villalón, D. Vila-Suero, V. Rodríguez-Doncel, R. García-Castro and A. Gómez-Pérez, Guidelines for Linked Data generation and publication: An example in building energy consumption, Automation in Construction, Special Issue on Linked Data in Architecture and Construction. Available online April 2015.

Page 25: Linked Data Generation Process

Linked Data generation process

[Figure: the Linked Data generation process diagram, with the label DATA PREPARATION]

Page 26: Linked Data Generation Process

Select data source

• Select the data source that will be transformed into Linked Data
• Steps:
  – To define the requirements for selection
  – To select one or several data sources
• The data set may be:
  – Owned by your organization…
  – … or not (external data sources)

Page 27: Linked Data Generation Process

Select data source – Example

• Requirements
  – Real-world scenario in the smart city domain
  – Available for use
  – Available in machine-processable format (the more structured the data are, the better)
  – Can be linked with generic entities (e.g., location)
• Leeds City Council – energy consumption
  – http://data.gov.uk/dataset/council-energy-consumption

Page 28: Linked Data Generation Process

Obtain access to data source

• Data access means
  – Technical means to retrieve the data
  – Legal rights to use the data
• If the data is not accessible:
  – To identify the person to contact
  – To request the access
  – To obtain access and to retrieve the data
• Access alternatives:
  – file,
  – programming interface,
  – database,
  – data stream,
  – etc.

Page 29: Linked Data Generation Process

Obtain access to data source – Example

• Data set already available as a CSV file

Page 30: Linked Data Generation Process

Analysing licensing of the data source

• Licenses specify the legal terms under which a data set can be used and exploited
• There are neither legal prescriptions on how to declare licenses nor common standard practices for doing so
• Steps (not automatable):
  – To identify the rightsholder and the authoritative publisher
    • Rightsholder vs. authorized distributor
  – To find the applicable license
    • Web page, data set metadata, data themselves
    • Contact the publisher
  – To read the license and analyse legal terms
• Tips
  – Analysis should be performed upon all copies and formats of the data
  – Ensure license compatibility when integrating several data sources

Page 31: Linked Data Generation Process

Linked Data resources can be protected

• Ontologies are intellectual works; they can be protected by copyright
• RDF datasets can be considered databases, which are also legally protected in the EU

Page 32: Linked Data Generation Process

Always license your data

Create, consume, aggregate, derive and publish Linked Data in a lawful environment

[Figure labels: Data shops, Government, Individuals, …]

Page 33: Linked Data Generation Process

Licensed Linked Data

Non-licensed Linked Data + License → Licensed Linked Data

• Non-licensed Linked Data: unless a license allows it, the resource cannot be copied, modified or published. In practice, non-licensed resources are useless in industrial settings
• Licensed Linked Data can be used

Page 34: Linked Data Generation Process

Licensed Linked Data in practice

• Linked Open Data: published, open license
• (Published) Linked Data: published, no open license
• Linked Data: not published, no open license

Page 35: Linked Data Generation Process

Guidelines for licensing linked data

1. Add "rights" metadata in the dataset description (e.g., VoID, DCAT)
2. Use standard predicates to declare "rights" statements (e.g., Dublin Core terms: dc:rights, dct:license)
3. Is a standard license available?
  – Yes: use the URI of the standard license, e.g., CC0
  – No: use a rights declaration language, e.g., ODRL

ODRL: Open Digital Rights Language
DCAT: Data Catalog Vocabulary
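As an illustration of guidelines 1, 2 and the "standard license" branch of 3, the sketch below builds a minimal VoID description that points to CC0 through dct:license. It is not taken from the slides: the function name `describe_dataset` is hypothetical, the dataset URI and title are placeholders, and titles are assumed to contain no quotes or backslashes.

```python
# URI of the CC0 1.0 public domain dedication.
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"


def describe_dataset(dataset_uri, title, license_uri):
    """Return a small VoID description in Turtle that declares the dataset's license.

    Assumption: `title` contains no double quotes or backslashes, so no
    literal escaping is done here.
    """
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dct:title "{title}" ;',
        f"    dct:license <{license_uri}> .",
    ])


# Placeholder dataset URI and title, used only for illustration.
print(describe_dataset("http://example.org/dataset/ds01", "Example dataset", CC0))
```

When no standard license fits, the dct:license value would instead point to a policy written in a rights declaration language such as ODRL.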

Page 36: Linked Data Generation Process

Licensing Linked Data is Simple…

The British National Bibliography (BNB) lists the books and new journal titles published or distributed in the United Kingdom and Ireland since 1950.

Page 37: Linked Data Generation Process

… or complex, depending on your needs

Policies can be expressed with ODRL 2.0 to govern access to Linked Data.
Example of access to Linked Data for a price (15 EUR for the dataset or 0.01 EUR for a triple thereof):

# The slide declares only gr: and dcat:. The odrl:, rdfs: and rdf: prefixes
# below are assumed (standard namespace IRIs) so that the snippet parses.
@prefix odrl: <http://www.w3.org/ns/odrl/2/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix gr:   <http://purl.org/goodrelations/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

<http://salonica.dia.fi.upm.es/ldr/policy/cdaddba4-fc2e-4ee0-a784-e62f1db259bf>
    a odrl:Set ;
    rdfs:label "License Offering Paid Linked Data" ;
    odrl:permission [
        a odrl:Permission ;
        odrl:target <http://example.org/dataset/ds01> ;
        odrl:action odrl:reproduce ;
        odrl:duty [
            a odrl:Duty ;
            rdfs:label "Pay" ;
            gr:UnitOfMeasurement dcat:Dataset ;
            gr:amountOfThisGood "1" ;
            odrl:action odrl:pay ;
            odrl:target "15,00 EUR"
        ]
    ] , [
        a odrl:Permission ;
        odrl:action odrl:reproduce ;
        odrl:target <http://example.org/dataset/ds01> ;
        odrl:duty [
            a odrl:Duty ;
            rdfs:label "Pay" ;
            gr:UnitOfMeasurement rdf:Statement ;
            gr:amountOfThisGood "1" ;
            odrl:action odrl:pay ;
            odrl:target "0,01 EUR"
        ]
    ] .

The target can be an ontology, a dataset, a SPARQL endpoint… or a SPARQL query itself or a triple pattern: {mysubject, ?p, ?o}

Page 38: Linked Data Generation Process

And you have support for that

• Conditional access to Linked Data
  – http://conditional.linkeddata.es
• Dataset of licenses in RDF
  – http://rdflicense.appspot.com
• ODRL Profile for Linked Data
  – http://purl.oclc.org/NET/ldr/ns#
  – https://www.w3.org/community/odrl/profile/linkeddata/

Page 39: Linked Data Generation Process

Analyse licensing – Example

Page 40: Linked Data Generation Process

Analyse data source

• Get insight into the data structure and organization
• Steps:
  – To analyse the characteristics of the data
    • Data values, data ranges, etc.
  – To obtain the schema of the data
    • Concepts and their relationships
• Data can be available as:
  – Structured data
  – Unstructured data
• If the schema does not exist:
  – Use a standard modeling language for describing the data schema (e.g., UML)

Page 41: Linked Data Generation Process

Analyse data source – Example

• The metadata are not very descriptive:
  – Different types of council sites (mostly buildings)
  – Electricity, gas and oil consumption
  – 1-year intervals: 2010/11, 2011/12, 2012/13
• The analysis required contacting people from LCC open data

Page 42: Linked Data Generation Process

Analyse data source – Example

http://localhost:3333/

Page 43: Linked Data Generation Process

Analyse data source – Example

Page 44: Linked Data Generation Process

Analyse data source – Example

• Analyse the characteristics of data using facets
• Obtain the schema of the data

Page 45: Linked Data Generation Process

Data characteristics and schema – LCC

Column | Type | Comments / Range (rounded) | Problems
uprn | String | Not unique, empty values |
Site Name | String | Unique? Site types + name | 4 repeated sites
Address 2 | String | Not unique, empty values |
Address 3 | String | Not unique, empty values | Village? Civil Parish?
Address 4 | String | Not unique, empty values | City? Metropolitan district? "leeds" vs "Leeds"
PostCode | String | Not unique, empty values |
Electricity 10/11 | Decimal | 0 to 2,700,000 |
Electricity 11/12 | Decimal | 0 to 2,300,000 |
Electricity 12/13 | Decimal | 0 to 2,400,000 |
Gas 10/11 | Decimal | -100,000 to 6,100,000 | Negative values
Gas 11/12 | Decimal | -100,000 to 7,800,000 | Negative values
Gas 12/13 | Decimal | -100,000 to 8,300,000 | Negative values
Oil 12/13 | Decimal | -1,000,000 to 13,000,000 | Negative values
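A profile like the one above can be approximated outside OpenRefine with a short script. The following is a minimal sketch, assuming the data is a CSV file with a header row; the sample rows are invented for illustration (only the column names follow the slide), and `to_decimal` and `profile_column` are hypothetical helper names.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def to_decimal(text):
    """Return text as a Decimal, or None if it is not numeric."""
    try:
        return Decimal(text.replace(",", ""))
    except InvalidOperation:
        return None


def profile_column(rows, column):
    """Summarise one column: type, empty cells, uniqueness and, if numeric, range."""
    values = [row[column].strip() for row in rows]
    filled = [v for v in values if v]
    numbers = [to_decimal(v) for v in filled]
    numeric = bool(filled) and all(n is not None for n in numbers)
    profile = {
        "type": "Decimal" if numeric else "String",
        "empty": len(values) - len(filled),
        "unique": len(set(filled)) == len(filled),
    }
    if numeric:
        profile["min"] = min(numbers)
        profile["max"] = max(numbers)
        profile["negative"] = sum(1 for n in numbers if n < 0)
    return profile


# Invented sample rows; the values do not come from the LCC file.
SAMPLE = """Site Name,Address 4,Gas 10/11
Library A,Leeds,1200.5
Office B,leeds,-100
Office B,,300
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for name in rows[0]:
    print(name, profile_column(rows, name))
```

Such a summary flags the same kinds of problems listed in the table (repeated site names, empty cells, negative consumption values), but it does not detect case variants such as "leeds" vs "Leeds".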

Page 46: Linked Data Generation Process

Linked Data generation process

[Figure: the Linked Data generation process diagram, with the label DEFINE RESOURCE NAMING STRATEGY]

Page 47: Linked Data Generation Process

Hash and slash URIs

• Hash URIs (#)
  – http://www.energycompany.com/about#energyCompany
  – The fragment part is stripped off before the URI is requested from the server (i.e., the resource cannot be retrieved directly)
  – Hash URIs can be used to identify non-document resources
• Slash URIs (/)
  – http://www.energycompany.com/about/energyCompany
  – Imply a 303 redirection to the location of a document that represents the resource (+ content negotiation)
    • E.g., http://www.energycompany.com/about/energyCompany.rdf
  – Drawbacks: HTTP round-trip, redirects, web server configuration
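The difference between the two URI forms shows up in what a client actually sends to the server. The sketch below uses the standard library's `urllib.parse.urldefrag` to separate the part that is requested from the fragment that stays on the client side; `request_target` is a hypothetical helper name.

```python
from urllib.parse import urldefrag


def request_target(uri):
    """Return (URL sent to the server, fragment kept by the client)."""
    url, fragment = urldefrag(uri)
    return url, fragment


# Hash URI: the server only ever sees the document URL.
print(request_target("http://www.energycompany.com/about#energyCompany"))

# Slash URI: the whole URI is requested, so the server can answer
# for this resource specifically (e.g., with a 303 redirect).
print(request_target("http://www.energycompany.com/about/energyCompany"))
```

With a hash URI, every term in the namespace resolves to the same document, which is why this form suits small vocabularies that are retrieved as a whole.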

Page 48: Linked Data Generation Process

Hash or slash?

• Depends on the data and on their expected use
• Small data:
  – Hash namespace
  – Access all the data as a whole
  – HTTP GET would return a single information resource with everything
• Large / frequently-updated / modular data:
  – Slash namespace
  – Access resources individually or in groups
  – Resource descriptions may be divided among many information resources or may be managed via a query service (e.g., SPARQL)
  – Progressively greater detail about resources may be retrieved through multiple accesses

Page 49: Linked Data Generation Process

Define resource naming strategy

• Steps:
  – To choose a URI form (hash or slash)
  – To choose a domain for the URIs
  – To choose a path for the URIs
  – To choose a pattern for the classes and properties in the ontology, as well as for individuals
• Tips:
  – One URI must identify only one item (e.g., avoid mixing web pages and real-world objects)
  – URIs should be persistent and should not change over time (e.g., avoid including state information); PURL may support this
  – Use a domain that is under your control (or a service such as PURL)
  – Separate the ontology model from its instances
  – Define meaningful URIs

Page 50: Linked Data Generation Process

Resource naming strategy – LCC

• Hash URIs for ontological terms, slash URIs for individuals
• Domain: http://smartcity.linkeddata.es/
• Ontological terms path:
  – http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#
• Individuals path:
  – http://smartcity.linkeddata.es/lcc/resource/
• Ontological terms pattern:
  – http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#<term_name>
  – Ex.: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#hasQuantitativeValue
• Individuals pattern:
  – http://smartcity.linkeddata.es/lcc/resource/<resource_type>/<resource_name>
  – Ex.: http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/WetJohnCharlesCentreforSport
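The two patterns can be captured in a pair of small functions. This is a sketch under assumptions: the slides do not say how labels are turned into local names, so `local_name` simply drops every character that is not an ASCII letter or digit, and the input labels in the usage lines are guesses that happen to reproduce the example URIs above.

```python
import re

ONTOLOGY = "http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#"
RESOURCE = "http://smartcity.linkeddata.es/lcc/resource/"


def local_name(label):
    """Drop everything except ASCII letters and digits (an assumed rule)."""
    return re.sub(r"[^A-Za-z0-9]", "", label)


def term_uri(term_name):
    """Hash URI for an ontological term."""
    return ONTOLOGY + term_name


def individual_uri(resource_type, resource_name):
    """Slash URI for an individual: <path>/<resource_type>/<resource_name>."""
    return f"{RESOURCE}{local_name(resource_type)}/{local_name(resource_name)}"


print(term_uri("hasQuantitativeValue"))
# Assumed input labels; only the resulting URI appears on the slide.
print(individual_uri("Leisure Centre", "Wet John Charles Centre for Sport"))
```

Keeping the rule in one place makes it easier to apply the same naming strategy in the mapping that later generates the RDF.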

Page 51: Linked Data Generation Process

Linked Data generation process

[Figure: the Linked Data generation process diagram, with the label DEVELOP ONTOLOGY]

Page 52: Linked Data Generation Process

Ontology development

1. Requirements definition
2. Terms extraction
3. Ontology conceptualization
  3.1 Initial model drafting
  3.2 Detailed model definition
4. Ontology search
5. Ontology selection
6. Ontology implementation
  6.1 Ontology integration
  6.2 Ontology completion
7. Ontology evaluation

[Figure notes: decision point "Can you represent all your data?"; callout "You did this yesterday"]

Page 53: Linked Data Generation Process

Ontology development – LCC

Page 54: Linked Data Generation Process

Linked Data generation process

[Figure: the Linked Data generation process diagram, with the label TRANSFORM DATA]

Page 55: Linked Data Generation Process

Data transformation

• Steps:
  – To select the RDF serialization
    • RDF/XML, Turtle, N-Triples, JSON-LD
  – To select a tool. Depends on:
    • The format of the data (database, spreadsheets, etc.)
    • Concrete needs of the transformation process (e.g., dynamicity)
  – To transform the data into RDF
    • Usually requires a mapping between the data and the ontology
    • The mapping implements the resource naming strategy
  – To evaluate the obtained RDF data:
    • Syntax, Completeness, Accuracy, Conciseness, Modelling, Understandability, Versatility, Usage, Licensing, …
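To make the "mapping" step concrete, here is a minimal hand-written mapping from one tabular row to N-Triples. It is only a sketch, not the tool-based approach described in the slides: the row keys `site_type` and `site_name` are hypothetical, typing every site as schema:CivicStructure is an assumption, and the use of rdfs:label for the site name is an illustrative choice.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RESOURCE = "http://smartcity.linkeddata.es/lcc/resource/"
# Assumption: every site is typed with this schema.org class.
SITE_CLASS = "http://schema.org/CivicStructure"


def compact(text):
    """Keep letters and digits only, so the text can be used in a URI path."""
    return "".join(ch for ch in text if ch.isalnum())


def escape_literal(text):
    """Escape the characters that N-Triples does not allow raw in a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_ntriples(row):
    """Map one row to N-Triples lines, applying the resource naming pattern."""
    subject = f"<{RESOURCE}{compact(row['site_type'])}/{compact(row['site_name'])}>"
    return [
        f"{subject} <{RDF_TYPE}> <{SITE_CLASS}> .",
        f'{subject} <{RDFS_LABEL}> "{escape_literal(row["site_name"])}" .',
    ]


# Hypothetical row; the keys are not the column names of the LCC file.
for line in row_to_ntriples({"site_type": "Leisure Centre", "site_name": "Example Pool"}):
    print(line)
```

N-Triples is used here because each line is a complete statement, which keeps a generator like this trivial; a real transformation would normally rely on one of the dedicated tools.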

Page 56: Linked Data Generation Process

Data transformation tools

• Database to RDF: morph-RDB, D2R Server, TopBraid Composer
• Data streams to RDF: morph-streams, D2R Server
• Spreadsheets to RDF: TopBraid Composer, Excel2RDF, RDF123, XLWrap, OpenRefine/LODRefine
• XML to RDF: XML2RDF, TopBraid Composer, OpenRefine/LODRefine

Page 57: Linked Data Generation Process

Overview of OpenRefine

Page 58: Linked Data Generation Process

OpenRefine basic operations

• Installing
• Creating a new project
• Data analysis
  – Exploring data
  – Sorting data
  – Faceting data
  – Filtering data
• Basic data transformation (cleaning/preparing)
  – Columns:
    • Move
    • Rename
    • Remove columns
    • Collapse and expand
    • Common transformations
  – Rows:
    • Remove rows
• Export whole project

Page 59: Linked Data Generation Process

Adding derived columns

Edit column → Add column based on this column...

Page 60: Linked Data Generation Process

Splitting data across columns

Edit column → Split into several columns...

Page 61: Linked Data Generation Process

Handling multi-valued cells

Edit cells → Split multi-valued cells...

Edit cells → Join multi-valued cells...

Page 62: Linked Data Generation Process

Rows and records

Show as: rows | records

[Figure labels: Record, Row]

Page 63: Linked Data Generation Process

Clustering similar cells

Edit cells → Cluster and edit...
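Clustering groups cell values that are probably different spellings of the same thing. One common approach is key collision: each value is reduced to a normalized key, and values sharing a key form a cluster. The Python sketch below is a simplified illustration of that idea, not OpenRefine's exact algorithm; the normalization steps and the variant spellings are assumptions.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Simplified key: lowercase, drop ASCII punctuation, sort unique tokens."""
    table = str.maketrans("", "", string.punctuation)
    cleaned = value.strip().lower().translate(table)
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep groups with 2+ variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical variant spellings of the same building name.
values = ["Belgrave House", "belgrave house ", "House, Belgrave", "Tunstall Road"]
clusters = cluster(values)
```

Each resulting cluster is a candidate for merging into a single canonical value; the decision to merge remains a manual one.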

Page 64: Linked Data Generation Process

Transposing rows and columns

Transpose → Transpose cells across columns into rows...

Transpose → Columnize by key/value columns...
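The two transpose operations convert between a wide layout (one column per period) and a long layout (one row per measurement). A minimal Python sketch of both directions follows. The function names, the `period` and `kwh` column names, and the numeric values are hypothetical; the `electric1112` and `electric1213` names merely echo series labels that appear later in the deck.

```python
def columns_to_rows(rows, id_column, value_columns, key_name, value_name):
    """Wide to long: one output row per (input row, value column) pair."""
    return [
        {id_column: row[id_column], key_name: col, value_name: row[col]}
        for row in rows
        for col in value_columns
    ]


def columnize(rows, id_column, key_name, value_name):
    """Long to wide: one output row per id, one column per distinct key."""
    wide = {}
    for row in rows:
        target = wide.setdefault(row[id_column], {id_column: row[id_column]})
        target[row[key_name]] = row[value_name]
    return list(wide.values())


# Hypothetical values.
wide_rows = [{"site": "A", "electric1112": 10, "electric1213": 12}]
long_rows = columns_to_rows(
    wide_rows, "site", ["electric1112", "electric1213"], "period", "kwh"
)
round_trip = columnize(long_rows, "site", "period", "kwh")
```

The long layout is usually the more convenient starting point for RDF generation, because each row can then map to one observation.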

Page 65: Linked Data Generation Process

Other useful utilities

• Regular expressions
  – Java regular expressions
• Custom transformations
  – General Refine Expression Language (GREL)
  – Jython (Python implemented in Java)
  – Clojure (functional language that resembles Lisp)
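As an illustration of a regular-expression-based custom transformation, the following standalone Python sketch parses a column name such as `electric1213` into an energy kind and a pair of years. It uses Python's own `re` module rather than OpenRefine's expression editor, and it assumes that the four digits encode two consecutive two-digit years in the 2000s; that reading of the names is an assumption, not something the slides state.

```python
import re

# Lowercase letters followed by exactly two 2-digit numbers, e.g. "electric1213".
PERIOD = re.compile(r"^(?P<kind>[a-z]+)(?P<start>\d{2})(?P<end>\d{2})$")


def parse_period_column(name):
    """Return (kind, start_year, end_year), or None if the name does not match."""
    match = PERIOD.match(name)
    if match is None:
        return None
    # Assumption: two-digit years belong to the 2000s.
    return (match["kind"], 2000 + int(match["start"]), 2000 + int(match["end"]))
```

For example, `parse_period_column("electric1213")` yields the kind `electric` with the years 2012 and 2013, while a name that does not follow the pattern yields `None`.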

Page 66: Linked Data Generation Process

Using the project history

• Project history:
  – Access operation history
  – Undo operations
  – Extract operations (in JSON)
  – Apply operations
• Caution:
  – Transformations are registered in the history; filters and facets are not

Page 67: Linked Data Generation Process

Solving memory problems

https://github.com/OpenRefine/OpenRefine/wiki/FAQ:-Allocate-More-Memory

Page 68: Linked Data Generation Process

OpenRefine RDF extension - RDF skeleton

• Resource naming strategy
  – Ontological terms pattern:
    http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#<term_name>
  – Individuals pattern:
    http://smartcity.linkeddata.es/lcc/resource/<resource_type>/<resource_name>

Add base URI
Add prefixes
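The two URI patterns above can be applied mechanically once the resource type and name are known. The following Python sketch mints URIs according to those patterns. The helper names are hypothetical, and the CamelCase conversion of labels is an assumption about how resource names such as `CouncilOfficesBelgraveHouse` were derived.

```python
import re
from urllib.parse import quote

ONTOLOGY_NS = "http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#"
RESOURCE_BASE = "http://smartcity.linkeddata.es/lcc/resource/"


def to_camel_case(label):
    """Join the alphanumeric words of a label, capitalizing each first letter."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "".join(word[:1].upper() + word[1:] for word in words)


def term_uri(term_name):
    """URI of an ontology term: <ontology namespace>#<term_name>."""
    return ONTOLOGY_NS + quote(term_name, safe="")


def individual_uri(resource_type, label):
    """URI of an individual: <resource base>/<resource_type>/<resource_name>."""
    name = to_camel_case(label)
    return RESOURCE_BASE + quote(resource_type, safe="") + "/" + quote(name, safe="")
```

Percent-encoding both path segments keeps the generated URIs valid even when a label contains characters that the CamelCase step does not remove.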

Page 69: Linked Data Generation Process

Creating individuals

lccRes:CouncilOfficesBelgraveHouse rdf:type schema:CivicStructure

Page 70: Linked Data Generation Process


Previewing  results  


Page 71: Linked Data Generation Process

Adding property values

lccRes:CouncilOfficesBelgraveHouse rdf:type schema:CivicStructure
lccRes:CouncilOfficesBelgraveHouse rdfs:label "Belgrave House" (xsd:string)

Page 72: Linked Data Generation Process

Exporting RDF

@prefix schema: <http://schema.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lcc: <http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://smartcity.linkeddata.es/lcc/resource/CivicStructure/CouncilOfficesBelgraveHouse>
    a schema:CivicStructure ;
    rdfs:label "Belgrave House" .

<http://smartcity.linkeddata.es/lcc/resource/CivicStructure/CommunityCentreTunstallRoad>
    a schema:CivicStructure ;
    rdfs:label "Tunstall Road" .

Export → RDF as Turtle
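To make the mapping from table rows to Turtle explicit, the following Python sketch generates statements of the same shape as the export above using only string formatting. It is an illustration, not what the RDF extension does internally: the function names and the `id`/`label` row keys are hypothetical, and only the two statements per individual shown on the slide are produced.

```python
PREFIXES = {
    "schema": "http://schema.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
RESOURCE_BASE = "http://smartcity.linkeddata.es/lcc/resource/CivicStructure/"


def escape_literal(text):
    """Escape backslashes, double quotes, and newlines for a Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_turtle(rows):
    """Serialize each row as a schema:CivicStructure with an rdfs:label."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    for row in rows:
        label = escape_literal(row["label"])
        lines.append("")
        lines.append(f"<{RESOURCE_BASE}{row['id']}>")
        lines.append("    a schema:CivicStructure ;")
        lines.append(f'    rdfs:label "{label}" .')
    return "\n".join(lines) + "\n"


rows = [
    {"id": "CouncilOfficesBelgraveHouse", "label": "Belgrave House"},
    {"id": "CommunityCentreTunstallRoad", "label": "Tunstall Road"},
]
turtle = rows_to_turtle(rows)
```

For real datasets an RDF library with a proper serializer is the safer choice, since hand-built strings only cover the escaping cases handled explicitly here.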

Page 73: Linked Data Generation Process

Evaluating the exported data

• Manual inspection
• Syntax evaluation (with syntax validator)
• Consistency with the ontologies (with reasoner)
• Usage evaluation (e.g., by running SPARQL queries)
  – Show all electricity consumptions and the related time periods for all council sites related to culture
  – Show all energy consumptions and the related time periods of council sites from the Wakefield district

Page 74: Linked Data Generation Process

Index

• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description

Page 75: Linked Data Generation Process

Richer schema (and data)

[Figure: ontology diagram. Legend: class; datatype property :: datatype; object property; subclass-of relation]

Classes with their datatype properties:
• schema:CivicStructure
  – lcc:uprn :: xsd:string
  – dc:title :: xsd:string
• schema:PostalAddress
  – schema:addressLocality :: xsd:string
  – schema:addressRegion :: xsd:string
  – schema:streetAddress :: xsd:string
  – schema:postalCode :: xsd:string
• time:Instant
  – time:inXSDDateTime :: xsd:dateTime

Other datatype properties: lcc:hasQuantityValue :: xsd:decimal

Other classes: time:Interval, schema:City, schema:Place, schema:AdministrativeArea, admingeo:District, ssn:Observation, ssn:SensorOutput, ssn:ObservationValue, ssn:FeatureOfInterest, ssn:Property, ero:FinalEnergy, ero:EnergyConsumerFacility, om:Unit_of_measure

Site classes: SupplyOrStorageSite, OpenAirSite, AccomodationSite, AdministrativeSite, OfficeSite, EducationalSite, SocialSite, OtherSite, CulturalSite, LeisureSite

Object properties: ssn:observationSamplingTime, ssn:hasValue, ssn:featureOfInterest, ssn:observedProperty, ssn:observationResult, schema:address, schema:containedIn, admingeo:district, time:hasBeginning, time:hasEnd, ero:consumesEnergyType, lcc:hasQuantityUnitOfMeasurement

Page 76: Linked Data Generation Process

Linked Data are just data

[Figure: plots of electrical consumption per building (panels: electric1011, electric1112, electric1213; axes: Building, Electrical consumption)]
[Figure: plot "Electricity vs gas consumption 12/13" (axes: Electricity, Gas)]
[Figure: plot "Electricity vs oil consumption 12/13" (axes: Electricity, Oil)]

Page 77: Linked Data Generation Process

Benefits of linking data

[Figure: total electric consumption (resPlus$electricTotal). Original data + geolocation]

[Figure: total electric consumption in locations with population > 20,000 (resPopulation$electricTotal). Original data + geolocation + population]
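The second plot depends on a value (population) that the original dataset does not contain; it becomes available only after linking each site to an external description of its locality. The following Python sketch imitates that join with in-memory dictionaries. All site names, localities, and numbers are hypothetical, and the lookup table stands in for data that would come from a linked external dataset.

```python
# Hypothetical consumption records, already enriched with a locality link.
consumption = [
    {"site": "Building A", "locality": "Town X", "electricTotal": 500000},
    {"site": "Building B", "locality": "Town X", "electricTotal": 250000},
    {"site": "Building C", "locality": "Town Y", "electricTotal": 100000},
]

# Hypothetical population figures, standing in for an external linked dataset.
population = {"Town X": 75000, "Town Y": 12000}


def total_in_populous_localities(records, population_by_locality, threshold):
    """Sum electricTotal for records whose locality exceeds `threshold` inhabitants."""
    return sum(
        record["electricTotal"]
        for record in records
        if population_by_locality.get(record["locality"], 0) > threshold
    )


total = total_in_populous_localities(consumption, population, 20000)
```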

Page 78: Linked Data Generation Process

Benefits of reasoning

[Figure: total electric consumption in cultural buildings (resPlus$electricTotal)]

Class hierarchy shown: schema:CivicStructure > CulturalSite > Museum, Library
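The benefit illustrated here is that a building typed only as a museum or a library can still be counted as a cultural site once the subclass hierarchy is taken into account. The Python sketch below imitates that inference with a hand-written subclass table. It assumes a single parent per class, and the buildings and their consumption values are hypothetical.

```python
# Subclass links as read from the hierarchy on the slide (child -> parent).
SUBCLASS_OF = {
    "Museum": "CulturalSite",
    "Library": "CulturalSite",
    "CulturalSite": "schema:CivicStructure",
}


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or reaches it by following subclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


# Hypothetical buildings: (name, asserted class, total electric consumption).
buildings = [
    ("Building A", "Museum", 300000),
    ("Building B", "Library", 150000),
    ("Building C", "OfficeSite", 900000),
]

cultural_total = sum(
    total for _, cls, total in buildings if is_a(cls, "CulturalSite")
)
```

Without the subclass step, a query for `CulturalSite` would match none of these buildings, because none of them is asserted to be a cultural site directly.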

Page 79: Linked Data Generation Process

Index

• Linked Open Data in Smart Cities
• Guidelines for the Generation of Linked Data
• Discussion
• Hands-on Description

Page 80: Linked Data Generation Process

What are we going to do?

[Figure: Linked Data generation process with the activities Specification, Modelling, Generation, Publication, Exploitation, and Linking]

Page 81: Linked Data Generation Process

What are we going to do?

[Figure: workflow of tasks and the artefacts they produce]

Tasks:
• Select data source
• Obtain access to data source
• Analyse data source
• Analyse licensing of the data source
• Define resource naming strategy
• Develop ontology
• Transform data source
• Link with other datasets

Artefacts:
• Data source
• Access, data
• License
• Schema, data
• Resource naming strategy
• Ontology
• RDF data
• Linked dataset

Page 82: Linked Data Generation Process

Hands-on task 1

• Goal: to get familiar with the first steps in the Linked Data generation process
• The students will have to take their selected dataset(s) and perform the following tasks:
  – Analyse Data Set
    • Both the data (quantities, value ranges, etc.) and the schema
  – Analyse Licensing of the Data Source
    • Who is the publisher and the rightsholder?
    • What is the license?
    • Which license will be used for the generated dataset?
  – Define Resource Naming Strategy
    • For the ontology and the data (URI form, content negotiation, URI domain, path, patterns, etc.)
  – Finish Ontology Development
    • Lightweight ontology (i.e., classes, properties, domains and ranges)

Page 83: Linked Data Generation Process

Hands-on task 1 - Deliverables

• A document that includes:
  – The analyses performed over the data source
  – The licensing of the data source and the potential license
  – The resource naming strategy defined
• An OWL file with the ontology developed, according to the resource naming strategy defined

Page 84: Linked Data Generation Process

Hands-on task 2

• Goal: to get familiar with the transformation of CSV data into RDF using LODRefine
• The students will have to take their selected dataset(s) and perform the following tasks:
  – Import data into LODRefine
  – Analyse and fix data
    • Analysis performed in the previous class, but can be updated with new findings
    • Fix the data to remove errors
    • Transform the data to facilitate RDF generation
  – Export data to RDF
    • Define an RDF skeleton for the data
    • Export the data to RDF (Turtle syntax)

Page 85: Linked Data Generation Process

Hands-on task 2 - Deliverables

For each dataset:
• An RDF file in the Turtle syntax with the data transformed into RDF

Page 86: Linked Data Generation Process

1st Summer School on Smart Cities and Linked Open Data (LD4SC-15)

Thank you for your attention!