Post on 25-Mar-2018


  • |

    HOW TO INTEGRATE LINKED DATA INTO YOUR APPLICATION

    LDIF Team:

    Andreas Schultz, Freie Universität Berlin

    Andrea Matteini, mes|semantics

    Robert Isele, Freie Universität Berlin

    Pablo N. Mendes, Freie Universität Berlin

    Christian Becker, mes|semantics

    Christian Bizer, Freie Universität Berlin

    With contributions by: Hannes Mühleisen, Freie Universität Berlin; William Smith, Vulcan Inc.

    SEMANTIC TECHNOLOGY & BUSINESS CONFERENCE

    SAN FRANCISCO, JUNE 5, 2012

  • |

    WHAT IS LINKED DATA?

    Raw data (RDF)

    Accessible on the web

    Data can link to other data sources

    Benefits: Ease of access and re-use; enables discovery

    One API for all data sources?

    [Figure: five data sources, A to E, each containing several Things; data links connect Things across the sources]

  • |

    LINKING OPEN DATA CLOUD

    As of September 2011

    [Figure: Linking Open Data cloud diagram. Each node is a published data set (for example DBpedia, GeoNames, MusicBrainz, UniProt, DrugBank, LinkedGeoData, Freebase). Nodes are grouped by domain: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]

    http://lod-cloud.net

  • |

    TYPES OF LINKED DATA

    Linked Enterprise Data

    Open, Public Data (LOD Cloud)

    Commercial Linked Data (very soon?)

    ... AND WHAT YOU CAN DO WITH THEM

    Provide interfaces on top of them

    Augment your website

    Integrate them into your application logic

    Create specialized data marts

  • |

    AUGMENT YOUR WEBSITE: BBC

    BBC online properties make intensive use of data from Wikipedia and MusicBrainz

  • |

    DATA MARTS: NEUROWIKI

    NeuroWiki creates views over gene, drug and disease data from four RDF data sources

    Provides navigation and composition tools for accessing and mining the data

  • |

    APPLICATION LOGIC: IBM WATSON

    IBM Watson makes use of Linked Data sources such as DBpedia

    http://www.flickr.com/photos/ibm_media/


  • |

    4 STEPS TO LINKED DATA INTEGRATION

  • |

    STEP #1: ACCESS LINKED DATA

    Linked Data is published via HTTP, SPARQL endpoints, and RDF dumps

    Live access allows quick prototyping and limited production use

    As data sets grow in size and more data sources are added, a crawling/caching architecture often becomes necessary

    Access methods per architecture:

    Architecture             | HTTP Dereferencing | SPARQL | Dump import
    On-The-Fly Dereferencing | X                  |        |
    Query Federation         |                    | X      |
    Crawling and Caching     | X                  | X      | X

    Decision factors per architecture:

    Architecture             | Recency | Speed / Scalability                              | Reliability | Complexity
    On-The-Fly Dereferencing | High    | Low                                              | Low         | High
    Query Federation         | High    | Decreases exponentially as new sources are added | Low         | Moderate with SPARQL 1.1 SERVICE clause
    Crawling and Caching     | Depends | High                                             | High        | High

    Adapted from: Linked Data: Evolving the Web into a Global Data Space (Heath/Bizer 2011)
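The trade-off between live access and a crawling/caching architecture can be illustrated with a small read-through cache. This is only a sketch under stated assumptions, not how LDIF or any other tool on these slides works: `CachedSource`, `fake_fetch` and `max_age_seconds` are hypothetical names, and the fetch function is injected so the example runs without network access.

```python
import time
from typing import Callable, Dict, Tuple


class CachedSource:
    """Read-through cache in front of a (hypothetical) live Linked Data fetcher."""

    def __init__(self, fetch: Callable[[str], str], max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}
        self.live_requests = 0  # counts how often the live source was hit

    def get(self, uri: str) -> str:
        now = self._clock()
        cached = self._store.get(uri)
        # Serve from the cache while the copy is young enough.
        if cached is not None and now - cached[0] <= self._max_age:
            return cached[1]
        # Otherwise go to the live source and remember the result.
        self.live_requests += 1
        document = self._fetch(uri)
        self._store[uri] = (now, document)
        return document


def fake_fetch(uri: str) -> str:
    """Stand-in for an HTTP or SPARQL request (assumption: no real network)."""
    return f"<{uri}> a <http://example.org/Thing> ."


source = CachedSource(fake_fetch, max_age_seconds=3600)
first = source.get("http://example.org/resource/A")
second = source.get("http://example.org/resource/A")
print(source.live_requests)  # the second call is answered from the cache
```

A larger `max_age_seconds` improves speed and reliability at the cost of recency, which is why the table lists recency for crawling and caching as "Depends".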

  • |

    STEP #1: ACCESS LINKED DATA

    Implementations:

    On-the-fly dereferencing
        LDspider, SQUIN, Semantic Web Client library

    Query federation
        SPARQL 1.1 SERVICE clause

    Crawling and Caching
        Triplestore import script
        Public caches (e.g. Sindice, OpenLink LOD endpoint)
        LDIF
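For the two live access methods, HTTP dereferencing and SPARQL, the requests themselves are simple to construct. The sketch below builds, but does not send, a content-negotiated GET request and a SPARQL protocol query URL using only the Python standard library. The function names, the `Accept` value and the DBpedia URLs are illustrative assumptions rather than anything prescribed by the slides.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed preference order: Turtle first, RDF/XML as a fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def dereference_request(uri: str) -> Request:
    """Build (but do not send) a GET that asks for RDF via content negotiation."""
    return Request(uri, headers={"Accept": RDF_ACCEPT})


def sparql_query_url(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET URL; the query travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


request = dereference_request("http://dbpedia.org/resource/Berlin")
url = sparql_query_url(
    "http://dbpedia.org/sparql",
    "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Berlin> ?p ?o } LIMIT 10",
)
print(request.get_header("Accept"))
print(url)
```

Passing `request` or `url` to `urllib.request.urlopen` would perform the live call; that step is left out so the example stays runnable offline.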

  • |

    STEP #2: NORMALIZE VOCABULARIES

    Data sources that overlap in content use a wide range of vocabularies.

    [Figure: tag cloud of the most widely used vocabularies in the LOD cloud (08/10/2011): po, bib, swrc, dcam, tl, mpeg7, rdfg, compass, wot, txn, metalex, doap, wdrs, admingeo, vann, api, org, sawsdl, sdmx, geospecies, qb, xml, vu-wordnet, rev, umbel, uniprot, http, scovo, void, tag, dbp, bio, ore, dbo, gr, dbpedia, event, time, xsd, frbr, geonames, cc, sioc, vcard, mo, bibo, akt, xhtml, geo, skos, foaf, dc. Source: FU Berlin / DERI; http://www4.wiwiss.fu-berlin.de/lodcloud/state/]

    Over 60% of all LOD sources use proprietary vocabularies

    It's up to the data consumer to normalize the vocabularies

    Enterprise: Need to translate between internal and external vocabularies


  • |

    STEP #2: NORMALIZE VOCABULARIES

    Approaches to Schema Mapping:

    Hand-crafting queries against individual sources: no different than an API

    Ontology Representation Languages: OWL, RDFS

    Rules: SWRL, RIF

    Query Languages

    SPARQL CONSTRUCT clause

    TopQuadrant SPARQLMotion

    Mosto

    R2R (part of LDIF)

    OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
    OPTIONAL { ?ow dbp-prop:location ?loc .
               ?loc rdf:type umbel-sc:City ;
                    ot:preferredLabel ?city_db_loc }
    OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }

    Source: http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php
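The query fragment above needs one OPTIONAL pattern per source vocabulary just to find a city. The same burden shows up in application code, as this simplified Python sketch suggests. It keeps only the three property names from the fragment and ignores the `ot:preferredLabel` and `umbel-sc:City` steps; the sample records and the `first_city` helper are hypothetical.

```python
from typing import Dict, List, Optional

# Candidate properties in priority order, taken from the query fragment.
CITY_PROPERTIES: List[str] = [
    "fb:location.location.containedby",
    "dbp-prop:location",
    "dbp-ont:city",
]


def first_city(record: Dict[str, str],
               properties: List[str] = CITY_PROPERTIES) -> Optional[str]:
    """Return the value of the first candidate property present in the record."""
    for prop in properties:
        value = record.get(prop)
        if value:
            return value
    return None


# Hypothetical records, each using a different vocabulary for the same fact.
freebase_style = {"fb:location.location.containedby": "Paris"}
dbpedia_style = {"dbp-ont:city": "New York"}
unknown_style = {"ex:town": "Berlin"}

print(first_city(freebase_style))  # Paris
print(first_city(dbpedia_style))   # New York
print(first_city(unknown_style))   # None: this vocabulary is not covered
```

Every new source vocabulary means another entry in the list, which is the maintenance cost that mapping the data to one target vocabulary up front is meant to avoid.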

  • |

    STEP #2: NORMALIZE VOCABULARIES

    Using SPARQL:

    Rename a class

    CONSTRUCT { ?s a mo:MusicArtist }
    WHERE { ?s a dbpedia-owl:MusicalArtist }

    Value transformation

    CONSTRUCT { ?s movie:runtime ?runtimeInMinutes . }
    WHERE {
      ?s dbpedia-owl:runtime ?runtime .
      BIND(?runtime * 60 AS ?runtimeInMinutes)
    }

    Create URI from literal

    CONSTRUCT {
      ?s diseasome:omim ?omimuri .
      ?omimuri dc:identifier ?identifier .
    }
    WHERE {
      ?s dbpedia-owl:omim ?omim .
      BIND(IRI(concat("http://bio2rdf.org/omim:", ?omim)) AS ?omimuri)
      BIND(concat("omim:", ?omim) AS ?identifier)
    }

    Slide credits: Andreas Schultz
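The three mapping patterns (rename a class, transform a value, create a URI from a literal) can also be read as plain functions over triples. The following Python sketch mirrors them on in-memory tuples. It is an illustration only, not R2R or any SPARQL engine: the `normalize` function and the `ex:` subjects are hypothetical, and the factor of 60 is copied from the BIND expression on the slide.

```python
from typing import List, Tuple, Union

Triple = Tuple[str, str, Union[str, float]]


def normalize(triples: List[Triple]) -> List[Triple]:
    """Apply three mapping rules modeled on the CONSTRUCT queries."""
    out: List[Triple] = []
    for s, p, o in triples:
        if p == "rdf:type" and o == "dbpedia-owl:MusicalArtist":
            # Rename a class.
            out.append((s, "rdf:type", "mo:MusicArtist"))
        elif p == "dbpedia-owl:runtime":
            # Value transformation, same factor as the slide's BIND expression.
            out.append((s, "movie:runtime", float(o) * 60))
        elif p == "dbpedia-owl:omim":
            # Create a URI from a literal, plus a prefixed identifier literal.
            omim_uri = "http://bio2rdf.org/omim:" + str(o)
            out.append((s, "diseasome:omim", omim_uri))
            out.append((omim_uri, "dc:identifier", "omim:" + str(o)))
        # Like CONSTRUCT, triples that match no rule are not emitted.
    return out


# Hypothetical input data.
source_triples: List[Triple] = [
    ("ex:artist1", "rdf:type", "dbpedia-owl:MusicalArtist"),
    ("ex:film1", "dbpedia-owl:runtime", 2),
    ("ex:disease1", "dbpedia-owl:omim", "104300"),
    ("ex:film1", "rdfs:label", "Example"),
]

for triple in normalize(source_triples):
    print(triple)
```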

  • |

    STEP #3: RESOLVE IDENTIFIERS

    Data sources that overlap in content use different identifiers for the same real-world entity.

    [Figure: bar chart, number of linked data sets per source (08/10/2011). Source: FU Berlin / DERI; http://www4.wiwiss.fu-berlin.de/lodcloud/state/]

    Linked data sets | Sources
    1                | 98
    2                | 62
    3                | 38
    4                | 19
    5                | 5
    6 - 10           | 17
    > 10             | 27

    Most LOD sources only provide owl:sameAs links to one other data source

    It's up to the data consumer to generate additional links

    Enterprise: Need to link both internal and external resources
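Once owl:sameAs links exist, a consumer still has to collapse each group of equivalent identifiers into one. A union-find structure is one common way to do this. The sketch below is an assumption about how that step could look, not a description of LDIF's internals: the `canonical_uris` function, the example URIs, and the rule of keeping the lexicographically smallest URI are all hypothetical choices.

```python
from typing import Dict, Iterable, Tuple


def canonical_uris(same_as: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map every URI to one representative of its owl:sameAs cluster (union-find)."""
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps trees shallow
            uri = parent[uri]
        return uri

    for a, b in same_as:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller root (arbitrary but stable).
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    return {uri: find(uri) for uri in list(parent)}


# Hypothetical sameAs links between three sources plus one unrelated pair.
links = [
    ("http://b.example/city/42", "http://c.example/id/berlin"),
    ("http://a.example/Berlin", "http://b.example/city/42"),
    ("http://a.example/Paris", "http://c.example/id/paris"),
]

for uri, representative in sorted(canonical_uris(links).items()):
    print(uri, "->", representative)
```

Because sameAs is treated as transitive here, the first two links place all three Berlin identifiers in one cluster even though no single source links to both of the others.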


  • |

    STEP #3: RESOLVE IDENTIFIERS

    Approaches to Identity Resolution:

    Improvised or manual merging

    Rule-based approaches:
        SILK (part of LDIF)
        LIMES

    [Figure: the label "Union Square" at 37°47′ N, 122°24′ W is compared with three candidates: Union Sq., New York; Union Sq., Seattle; Union Sq., San Francisco. It is matched to Union Sq., San Francisco (37°47′ N, 122°24′ W).]
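The Union Square example suggests why rule-based matching combines several comparisons: the labels alone are ambiguous, and the coordinates settle the question. The Python sketch below expresses that idea with a string similarity threshold and a distance threshold. It is not Silk or LIMES syntax; the function names, both thresholds and the decimal coordinates (rough approximations chosen for illustration) are assumptions.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

Place = Tuple[str, float, float]  # (label, latitude, longitude) in decimal degrees


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance using the haversine formula."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def best_match(query: Place, candidates: Dict[str, Place],
               min_similarity: float = 0.4, max_km: float = 5.0) -> Optional[str]:
    """Return the candidate that passes both rules with the most similar label."""
    best_id, best_score = None, 0.0
    for candidate_id, (label, lat, lon) in candidates.items():
        similarity = label_similarity(query[0], label)
        if similarity < min_similarity:
            continue  # rule 1: labels must be reasonably similar
        if distance_km(query[1], query[2], lat, lon) > max_km:
            continue  # rule 2: locations must be close together
        if similarity > best_score:
            best_id, best_score = candidate_id, similarity
    return best_id


# Approximate, illustrative coordinates (assumptions, not taken from the slides).
query = ("Union Square", 37.783, -122.400)
candidates = {
    "ny": ("Union Sq., New York", 40.736, -73.990),
    "seattle": ("Union Sq., Seattle", 47.610, -122.332),
    "sf": ("Union Sq., San Francisco", 37.788, -122.407),
}

print(best_match(query, candidates))  # sf
```

All three candidates pass the label rule, so only the distance rule singles out San Francisco.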
