Vanessa Lopez: Linked Data and Search

IBM Research – Ireland, © 2012 IBM Corporation. Linked Data and Search. Vanessa Lopez, Smarter Cities Technology Centre, IBM Research Ireland.

Upload: dub-linked

Post on 12-Jan-2015

DESCRIPTION

Dublinked Technical Workshop - Linked Data & Search by Vanessa Lopez

TRANSCRIPT

Page 1: Vanessa Lopez, Linked Data and Search

IBM Research – Ireland

© 2012 IBM Corporation

Linked Data and Search

Vanessa Lopez, Smarter Cities Technology Centre

IBM Research Ireland

Page 2

Background: Why Linked Data

• Provides explicit semantics
• Extensible
• Interoperability-focused: to enable automatic discovery and ingestion
• Large existing corpora
• Fundamentally incremental (like the Web)
• W3C standard representation and common format
• Government push (e.g. data.gov, data.gov.uk, Linked Government Data)

Page 3

Yes, yes... richer structured queries, but...

...limited usability for both data publishers and consumers.

Page 4

How can we help users query and explore Semantic Web content?

Page 5

State of the art

• Semantic search over messy, heterogeneous data and mash-ups
• Exploratory and faceted systems
• Query builders and relationship finders
• Question answering over Linked Data sources
• Google Knowledge Graph

http://technologies.kmi.open.ac.uk/poweraqua

Page 6

State of the art

Page 7

Linked Data and Search: Problem domain

What makes city data so special?

How can we make it more accessible?

Page 8

Semantic processing of urban data: why is it different?

• How can we go from raw data to insight into the operation of a city with minimal effort?

Return on investment (because data integration is expensive)

Fit for all (citizen engagement)

Page 9

Challenges: Big city data

• Volume: lots of relevant information, not linked to authoritative sources
• Velocity: streams, frequent updates
• Variety: different models and file formats; open domain with unknown schema
• Veracity: diverse sources; quality is difficult to assess

Page 10

Business case: open data as a means to an end

Page 11

Business case

• Why are ambulances late?

Sources of information

• 100s of datasets from four municipal authorities in Dublin; most static, some dynamic
• Social media: Twitter, LiveDrive, Eventful, Eventbrite, …
• Linked Data: DBpedia, …
• Vocabularies: IPSV, FOAF, VOID, PROV, DCAT, WSG

Domain of information

• Locations of health services
• Ambulance call-outs and response times
• Tweets about traffic congestion
• Geo-located tweets about people movement
• Road network
• Event Web services
• …

Page 12

Issues

• Linked Data to enrich data and give contextual insight for publishers and consumers:
  – Publish (vocabularies, annotation)
  – Discovery and search (metadata/cataloguing, full-text indexing, semantic entities)
  – Link (schema alignment, linked data, social media)
  – Extract interesting views
  – Reason (diagnose traffic problems)

Ubiquitous aspects: provenance, governance, performance, security, privacy

Page 13

Approach: Data model

Documents + Metadata

Structure, Entities, Links, Views, Insight

Tabular graph: C1 a Cell; C1 inRow r1; C1 value “name”; …

Entity graph: e1 a Entity; e1 inRow r1; e1 inCol c2; …

Annotation graph: e1 a Entity; e1 rdfs:label “name”; e1 addr “X st”; e1 lat “53.23”; …

Mapping graph: e1 a Entity; e1 sameAs e2; …

Pay-as-you-go, gain-as-you-go:

• Structured metadata -> queries over the metadata
• Files into a standard representation -> queries over the data
• Partially integrate schemata -> queries across datasets
• Integrate globally -> queries across Web data
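The layered graphs above can be illustrated with a small Python sketch. It assumes plain tuples as triples, a CSV file as input, one entity per row identified by a chosen label column, and exact (case-insensitive) label matching for `sameAs` links. The functions `lift` and `mapping_graph` and the identifier scheme (`ds:r1c1`, `ds:e1`) are hypothetical; this is not QuerioCity's actual implementation.

```python
import csv
import io

Triple = tuple[str, str, str]  # (subject, predicate, object)


def lift(ds: str, csv_text: str, label_col: int) -> dict[str, list[Triple]]:
    """Lift one CSV file into tabular, entity and annotation graphs.

    Identifiers such as "ds:r1c1" and "ds:e1" are a hypothetical naming scheme.
    """
    header, *rows = list(csv.reader(io.StringIO(csv_text)))
    tabular: list[Triple] = []
    entity: list[Triple] = []
    annotation: list[Triple] = []
    for r, row in enumerate(rows, start=1):
        row_id = f"{ds}:r{r}"
        # Tabular graph: every cell becomes a node tied to its row and value.
        for c, value in enumerate(row, start=1):
            cell = f"{ds}:r{r}c{c}"
            tabular += [(cell, "a", "Cell"), (cell, "inRow", row_id),
                        (cell, "value", value)]
        # Entity graph: assume one entity per row, located by the label column.
        e = f"{ds}:e{r}"
        entity += [(e, "a", "Entity"), (e, "inRow", row_id),
                   (e, "inCol", f"{ds}:c{label_col + 1}")]
        # Annotation graph: a label plus the remaining columns as properties.
        annotation.append((e, "rdfs:label", row[label_col]))
        annotation += [(e, header[c], v)
                       for c, v in enumerate(row) if c != label_col]
    return {"tabular": tabular, "entity": entity, "annotation": annotation}


def mapping_graph(a: dict[str, list[Triple]],
                  b: dict[str, list[Triple]]) -> list[Triple]:
    """Mapping graph: sameAs links between entities with matching labels."""
    by_label: dict[str, list[str]] = {}
    for s, p, o in b["annotation"]:
        if p == "rdfs:label":
            by_label.setdefault(o.strip().lower(), []).append(s)
    return [(s, "sameAs", t)
            for s, p, o in a["annotation"] if p == "rdfs:label"
            for t in by_label.get(o.strip().lower(), [])]
```

Each layer only adds triples and never rewrites earlier ones, which mirrors the pay-as-you-go idea: the tabular graph alone already supports queries over the data, while the mapping graph is only needed for queries across datasets.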

Page 14

Discovery: Publishing and cataloguing

• Metadata
  – Many data publishers and disconnected datasets
  – Link metadata using domain vocabularies: IPSV (Integrated Public Sector Vocabulary)
  – Convert to a simple RDF format

Vocabulary matching

IPSV
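A minimal sketch of converting a catalogue record to a simple RDF format and matching its keywords against a controlled vocabulary. It assumes flat records with `title`, `publisher` and `keywords` fields and uses real DCAT and Dublin Core terms. IPSV itself is replaced by a tiny hypothetical label-to-concept table under `example.org`; real IPSV concept URIs are not reproduced here, and `record_to_ntriples` is an illustrative name.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

# Hypothetical stand-in for a domain vocabulary such as IPSV:
# lower-cased labels (preferred and alternative) -> placeholder concept URIs.
VOCAB = {
    "beaches": "http://example.org/vocab/beaches",
    "coast": "http://example.org/vocab/beaches",
    "traffic": "http://example.org/vocab/transport",
    "roads": "http://example.org/vocab/transport",
}


def literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(uri: str, record: dict) -> list[str]:
    """Convert a flat catalogue record into N-Triples using DCAT/Dublin Core terms."""
    s = f"<{uri}>"
    lines = [
        f"{s} <{RDF_TYPE}> <{DCAT}Dataset> .",
        f"{s} <{DCT}title> {literal(record['title'])} .",
        # Simplification: the publisher is kept as a literal, not an agent resource.
        f"{s} <{DCT}publisher> {literal(record['publisher'])} .",
    ]
    themes = set()
    for kw in record.get("keywords", []):
        lines.append(f"{s} <{DCAT}keyword> {literal(kw)} .")
        # Vocabulary matching: exact match on the normalised keyword label.
        concept = VOCAB.get(kw.strip().lower())
        if concept:
            themes.add(concept)
    lines += [f"{s} <{DCAT}theme> <{c}> ." for c in sorted(themes)]
    return lines
```

Exact label matching is the simplest option; fuzzier matching would link more keywords to vocabulary concepts, at the cost of more false links.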

Page 16

Search and linking

• Full-text indexing for search over metadata and content
• Entity linking and navigation (keywords, categories, publishing agencies, regions, …)
• Open metadata and vocabularies (VOID, PROV, etc.) for data discovery and linking
• Mining descriptions (DBpedia Spotlight)
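Full-text indexing over metadata and content can be sketched as a toy inverted index with boolean AND queries. The slide does not say which search engine was used; `MetadataIndex` and its tokenizer are hypothetical, and the sketch does no stemming or relevance ranking.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MetadataIndex:
    """Toy inverted index over dataset metadata and content fields."""

    def __init__(self) -> None:
        # term -> set of dataset ids whose fields contain that term
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, ds_id: str, *fields: str) -> None:
        """Index every token of every field under the dataset id."""
        for field in fields:
            for term in tokenize(field):
                self.postings[term].add(ds_id)

    def search(self, query: str) -> set[str]:
        """Return ids of datasets containing all query terms (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

For example, after indexing a title and description per dataset, `search("Fingal beaches")` returns only the datasets whose fields mention both terms.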

Page 17

Faceted search: “beaches in Fingal”
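Faceted search narrows results with filters on metadata facets such as region or category, and shows how many datasets carry each facet value. A minimal sketch, assuming each catalogue entry is a dict mapping facet names to lists of values; the facet names, the entries and the functions `facet_counts` and `apply_facets` are hypothetical.

```python
from collections import Counter

# Hypothetical catalogue entry: facet name -> list of values, since a dataset
# may carry several categories, regions or publishers.
Dataset = dict[str, list[str]]


def facet_counts(datasets: list[Dataset], facet: str) -> Counter:
    """Count how many datasets carry each value of one facet."""
    return Counter(v for d in datasets for v in set(d.get(facet, [])))


def apply_facets(datasets: list[Dataset], selected: dict[str, str]) -> list[Dataset]:
    """Keep datasets matching every selected facet value (AND across facets)."""
    return [d for d in datasets
            if all(value in d.get(facet, []) for facet, value in selected.items())]
```

One way to serve a query like "beaches in Fingal" is to resolve "Fingal" to a region value and "beaches" to a category value, then call `apply_facets`; the counts from `facet_counts` would populate the facet menus.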

Page 19

Content integration

• Incrementally lift data content (beyond search, to querying across the content of datasets):
  – Extract entities represented in RDF (pay-as-you-go, PAYGO)
  – Label extraction and annotation
  – Link when we have higher confidence (lat, long)
  – Geo-coding and taxonomy of tweets (traffic)

Minimal entry cost

Provenance-based dataset ranking
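"Link when we have higher confidence (lat, long)" can be sketched as requiring two kinds of evidence before emitting a `sameAs` link: nearby coordinates and similar labels. The haversine distance, the `difflib` label similarity, the 100 m and 0.8 thresholds, and the record fields `id`, `label`, `lat`, `lon` are all assumptions chosen for illustration.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (spherical approximation)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def link_candidates(left: list[dict], right: list[dict],
                    max_dist_m: float = 100.0,
                    min_label_sim: float = 0.8) -> list[tuple[str, str, str]]:
    """Emit sameAs links only when location and label evidence both agree."""
    links = []
    for a in left:
        for b in right:
            # Without coordinates on both sides, confidence is too low to link.
            if None in (a.get("lat"), a.get("lon"), b.get("lat"), b.get("lon")):
                continue
            dist = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
            sim = SequenceMatcher(None, a["label"].lower(), b["label"].lower()).ratio()
            if dist <= max_dist_m and sim >= min_label_sim:
                links.append((a["id"], "sameAs", b["id"]))
    return links
```

The nested loop compares every pair, which is fine for a sketch; a larger deployment would need some spatial blocking so that only nearby records are compared.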

Page 20

Views

• Beyond search, to guiding users to create meaningful views:
  – Guide users to annotate data, recommend related datasets and create data views on the fly
  – Ranking and context-based recommendations
  – Allow semantics-based analysis over multiple views

Hidden information discovery

Multiple endpoints

Cross-domain queries

Multiple interpretations
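Recommending related datasets can be sketched with a deliberately simple technique swapped in for illustration: content-based ranking by keyword overlap (Jaccard similarity). The slide does not describe the actual ranking method; the `keywords` field and the functions `jaccard` and `recommend` are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two keyword sets: |a ∩ b| / |a ∪ b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: dict, catalogue: list[dict], k: int = 3) -> list[str]:
    """Rank other datasets by keyword overlap with the target; return the top-k ids."""
    tk = set(target["keywords"])
    scored = [(jaccard(tk, set(d["keywords"])), d["id"])
              for d in catalogue if d["id"] != target["id"]]
    # Drop unrelated datasets, then sort by score (desc) with id as tie-breaker.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [ds_id for _, ds_id in ranked[:k]]
```

In the ambulance business case, for instance, a dataset of response times tagged with "ambulance" and "response time" would rank traffic datasets sharing those keywords above unrelated ones such as beach water quality.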

Page 21

Demo

• Currently: Web services and technology demonstrator

• Next: open RDF-based data management deployed in Dublin City (read/write); deployment of a traffic diagnoser.

• SPUD: Semantic Processing of Urban Data (2nd prize at the Semantic Web Challenge, ISWC)

• Live demo: www.dublinked.ie/sandbox/SemanticWebChall

Spyros Kotoulas, Vanessa Lopez, Raymond Lloyd, Marco Luca Sbodio, Freddy Lecue, Martin Stephenson, Elizabeth Daly, Veli Bicer, Aris Gkoulalas-Divanis, Giusy Di Lorenzo, Anika Schumann, Denis Patterson, and Pol Mac Aonghusa

Page 22

Thank you!

Reference publication:

• QuerioCity: A Linked Data Platform for Urban Information Management. V. Lopez, S. Kotoulas, M. L. Sbodio, M. Stephenson, A. Gkoulalas-Divanis, P. Mac Aonghusa. In-Use track, 11th International Semantic Web Conference (ISWC).

City Fabric Team: