
Georgi Kobilarov, Chris Bizer, Sören Auer, Jens Lehmann

Freie Universität Berlin, Universität Leipzig

Querying Wikipedia like a Database

[Figure: a Wikipedia article annotated with its parts: Title, Description, Languages, Web Links, Categorization, Domain-specific Data, Images, Infoboxes]

Infobox Extraction

dbpedia:Albert_Einstein p:name "Albert Einstein"

dbpedia:Albert_Einstein p:birth_place dbpedia:Ulm

dbpedia:Albert_Einstein p:birth_date "1879-03-14"
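A minimal Python sketch of the infobox extraction step. It assumes a simplified infobox with one `| key = value` property per line. The sample wikitext, the template name and the function `extract_infobox` are hypothetical, and real infoboxes (nested templates, references, comments) need a far more robust parser. A value made of a single wiki link becomes a `dbpedia:` resource, and anything else stays a literal.

```python
import re

# Hypothetical, simplified infobox wikitext for illustration only.
WIKITEXT = """{{Infobox scientist
| name        = Albert Einstein
| birth_place = [[Ulm]]
| birth_date  = 1879-03-14
}}"""

# A value that consists of exactly one wiki link, e.g. [[Ulm]] or [[Ulm|the city]]
LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")


def extract_infobox(subject: str, wikitext: str) -> list[tuple[str, str, str]]:
    """Turn '| key = value' lines of an infobox into (s, p, o) triples."""
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skip the template name and closing braces
        key, value = (part.strip() for part in line[1:].split("=", 1))
        if not value:
            continue  # empty infobox fields carry no information
        match = LINK.match(value)
        if match:
            # Linked article -> resource; URIs use underscores for spaces
            obj = "dbpedia:" + match.group(1).strip().replace(" ", "_")
        else:
            obj = f'"{value}"'  # everything else stays a plain literal
        triples.append((f"dbpedia:{subject}", f"p:{key}", obj))
    return triples


for s, p, o in extract_infobox("Albert_Einstein", WIKITEXT):
    print(s, p, o)
```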

Property Synonyms

Structuring Wikipedia's Knowledge

• Structuring actual data, not modeling the world

• Bound to Wikipedia Templates, parsers handle template values based on rules (property splitting, merging, transformation)
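To make "property synonyms" and rule-based value handling concrete, here is a small Python sketch. The synonym table, the `{{birth date|Y|M|D}}` transformation and the `<br>` splitting rule are illustrative assumptions, not DBpedia's actual rule set. The helper names `normalise_property` and `transform_value` are hypothetical.

```python
import re

# Hypothetical synonym table: different templates name the same
# property differently, so map each variant to one canonical name.
SYNONYMS = {
    "birthplace": "birth_place",
    "place_of_birth": "birth_place",
    "date_of_birth": "birth_date",
}

# Transformation rule (assumed): {{birth date|YYYY|M|D}} -> ISO date
BIRTH_DATE = re.compile(
    r"\{\{\s*birth date\s*\|\s*(\d{4})\s*\|\s*(\d{1,2})\s*\|\s*(\d{1,2})",
    re.IGNORECASE,
)


def normalise_property(key: str) -> str:
    """Merge synonyms: lower-case, spaces to underscores, then look up."""
    key = key.strip().lower().replace(" ", "_")
    return SYNONYMS.get(key, key)


def transform_value(value: str) -> list[str]:
    """Apply a date transformation rule, otherwise a list-splitting rule."""
    m = BIRTH_DATE.search(value)
    if m:
        year, month, day = (int(g) for g in m.groups())
        return [f"{year:04d}-{month:02d}-{day:02d}"]
    # Splitting rule (assumed): <br>-separated items become separate values
    parts = re.split(r"<br\s*/?>", value, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


print(normalise_property("Place of birth"))          # birth_place
print(transform_value("{{Birth date|1879|3|14}}"))   # ['1879-03-14']
```

Keeping the synonym table separate from the value rules means new template variants can be merged without touching the parsing code.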

DBpedia Ontology

• DBpedia Ontology built from scratch

• 170 classes, 900 properties

No living things

Class Hierarchy

"Select all TV Episodes …"

Template Mapping

Class TV Episode (Work)

Wikipedia Templates: Television Episode, UK Office Episode, Simpsons Episode, DoctorWhoBox

Template Mapping

Infobox Cricketer, Infobox Historic Cricketer, Infobox Recent Cricketer, Infobox Old Cricketer, Infobox Cricketer Biography

=> Class Cricketer (Athlete)
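The template mappings above, combined with the class hierarchy, are what make a query like "Select all TV Episodes" possible across many differently named templates. Below is a Python sketch under stated assumptions. The mapping entries come from the slides, the class names follow their spelling with spaces removed, and the superclass of Athlete (`Person`), the article titles and the helper names are hypothetical.

```python
# Excerpt of template-to-class mappings from the slides above.
TEMPLATE_TO_CLASS = {
    "Television Episode": "TelevisionEpisode",
    "UK Office Episode": "TelevisionEpisode",
    "Simpsons Episode": "TelevisionEpisode",
    "DoctorWhoBox": "TelevisionEpisode",
    "Infobox Cricketer": "Cricketer",
    "Infobox Historic Cricketer": "Cricketer",
    "Infobox Recent Cricketer": "Cricketer",
    "Infobox Old Cricketer": "Cricketer",
    "Infobox Cricketer Biography": "Cricketer",
}

# Assumed fragment of the class hierarchy: subclass -> direct superclass.
SUPERCLASS = {
    "TelevisionEpisode": "Work",
    "Cricketer": "Athlete",
    "Athlete": "Person",
}


def ancestors(cls: str) -> list[str]:
    """Return cls and all of its superclasses, most specific first."""
    chain = [cls]
    while chain[-1] in SUPERCLASS:
        chain.append(SUPERCLASS[chain[-1]])
    return chain


def select_instances(articles: dict[str, str], wanted: str) -> list[str]:
    """Titles whose infobox template maps to `wanted` or a subclass of it."""
    hits = []
    for title, template in articles.items():
        cls = TEMPLATE_TO_CLASS.get(template)  # unmapped templates are ignored
        if cls is not None and wanted in ancestors(cls):
            hits.append(title)
    return sorted(hits)


# Hypothetical articles and the infobox template each one uses
ARTICLES = {
    "Some_Office_Episode": "UK Office Episode",
    "Some_Simpsons_Episode": "Simpsons Episode",
    "Some_Doctor_Who_Episode": "DoctorWhoBox",
    "Some_Cricketer": "Infobox Historic Cricketer",
    "Some_Film": "Infobox Film",
}

print(select_instances(ARTICLES, "TelevisionEpisode"))
# ['Some_Doctor_Who_Episode', 'Some_Office_Episode', 'Some_Simpsons_Episode']
```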

People: Actors, Athlete, Journalist, MusicalArtist, Politician, Scientist, Writer

Places: Airport, City, Country, Island, Mountain, River

Organisations: Band, Company, Educational Institution, Radio Station, Sports Team

Event: Convention, Military Conflict, Music Event, Sport Event

Work: Book, Broadcast, Film, Software, Television

More structured data

• Categories in SKOS
• Intra-wiki links
• Disambiguation
• Redirects

• Links to images (and Flickr)
• Links to external webpages

• Data about 2.6 million “things”

• 274 million pieces of information (RDF triples)

Multilingual Abstracts

– English: 2,613,000
– German: 391,000
– French: 383,000
– Dutch: 284,000
– Polish: 256,000
– Italian: 286,000
– Spanish: 226,000
– Japanese: 199,000
– Portuguese: 246,000
– Swedish: 144,000
– Chinese: 101,000

DBpedia as Linked Data Hub

Semantic Web

“My document can point at your document on the Web, but my database can't point at something in your database without writing special purpose code. The Semantic Web aims at fixing that.”

Prof. James Hendler

Web of Documents

[Figure: Web browsers and search engines access HTML documents A to D over HTTP; the documents are connected by hyperlinks.]

Web of Data

[Figure: Linked Data browsers, Linked Data mashups and search engines access things described in data sources A to E over HTTP; the things are connected by data links.]

Linked Data

• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs, so that they can discover more things.

Wikipedia Article URI: http://en.wikipedia.org/wiki/Madrid

DBpedia Resource URI: http://dbpedia.org/resource/Madrid
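The Madrid example suggests that a DBpedia resource URI reuses the article title from the English Wikipedia URL. A minimal Python sketch of that mapping follows, assuming the title path segment is copied unchanged. Percent-encoding and other language editions are not handled, and `dbpedia_resource_uri` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def dbpedia_resource_uri(wikipedia_url: str) -> str:
    """Map an English Wikipedia article URL to a DBpedia resource URI."""
    parts = urlsplit(wikipedia_url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia article URL: {wikipedia_url}")
    title = parts.path[len("/wiki/"):]  # e.g. "Madrid"
    return "http://dbpedia.org/resource/" + title


print(dbpedia_resource_uri("http://en.wikipedia.org/wiki/Madrid"))
# http://dbpedia.org/resource/Madrid
```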

HTTP URIs

Information Resources

http://dbpedia.org/page/Madrid

HTTP GET -> 200 OK

Real-World Resources

http://dbpedia.org/resource/Madrid

HTTP GET -> 303 See Other -> http://dbpedia.org/page/Madrid (HTML) or http://dbpedia.org/data/Madrid (RDF) -> 200 OK
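The redirect scheme above can be written as a small decision function. This Python sketch is simplified. It checks only whether the Accept header mentions an RDF media type and ignores q-values. The set of RDF media types and the function name `respond` are assumptions. The path layout (/resource/, /page/, /data/) follows the Madrid example.

```python
BASE = "http://dbpedia.org"
RDF_TYPES = ("application/rdf+xml", "text/turtle")  # assumed RDF media types


def respond(path: str, accept: str = "text/html") -> tuple[int, str]:
    """Return (status code, Location URI or kind of document served)."""
    if path.startswith("/page/"):
        return 200, "HTML"  # information resource: served directly
    if path.startswith("/data/"):
        return 200, "RDF"
    if path.startswith("/resource/"):
        name = path[len("/resource/"):]
        wants_rdf = any(t in accept for t in RDF_TYPES)
        # A real-world thing cannot be sent over HTTP, so redirect (303)
        # to a document that describes it, chosen by content negotiation.
        return 303, BASE + ("/data/" if wants_rdf else "/page/") + name
    return 404, ""


print(respond("/resource/Madrid", "application/rdf+xml"))
# (303, 'http://dbpedia.org/data/Madrid')
```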

[Figure: Linking Open Data cloud, with data sets grouped into the domains Life Sciences, Publications, Online Activities, Music, Geographic and Cross-Domain]

4.5 billion triples

180 million data links

Use Cases

1. Data source for Web applications
2. Querying Wikipedia like a database
3. Tag Web content with concepts instead of free-text tags
4. Vocabulary and semantic backbone for enterprise linked data integration

DBpedia as data source

• Embed structured information from Wikipedia into your web applications

• Build (mobile) maps applications using DBpedia data about places

• Display multilingual titles & descriptions in 15 languages

DBpedia Mobile

SPARQL Endpoint

http://dbpedia.org/sparql

Wikipedia Query
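As an illustration of querying Wikipedia like a database through the SPARQL endpoint, this Python sketch only builds a SPARQL Protocol GET request URL, with the query passed in the `query` parameter. The class URI `dbo:TelevisionEpisode` is an assumption based on the "TV Episode" class above, so check the current ontology for the exact name. The helper name is hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# "Select all TV Episodes" (class URI assumed; verify against the ontology)
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?episode WHERE {
  ?episode a dbo:TelevisionEpisode .
}
LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL carrying the query string."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

To actually run the query, the URL could be fetched with `urllib.request.urlopen(urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"}))` and the JSON result parsed with `json.load`. That step needs network access and is left out of the sketch.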

Annotating Documents

• Use DBpedia concepts to annotate documents instead of free-text tags

• Named Entity Extraction Systems already use DBpedia URIs (OpenCalais, Muddy Boots)

• Social Bookmarking with DBpedia URIs as tags: www.faviki.com

"Apple"

http://dbpedia.org/resource/Apple_Inc.

http://dbpedia.org/resource/Apple_(fruit)

http://dbpedia.org/resource/Apple_Records
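The "Apple" example shows why a tag must resolve to one of several DBpedia URIs. A deliberately naive Python sketch follows. It picks the candidate whose keyword profile overlaps most with the words of the surrounding text. The keyword sets are hypothetical; a real annotator would derive them from DBpedia abstracts, categories and links rather than hand-written lists. The function `tag` is also hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword profiles for the candidate URIs
CANDIDATES = {
    "http://dbpedia.org/resource/Apple_Inc.": {"iphone", "mac", "computer", "company", "shares"},
    "http://dbpedia.org/resource/Apple_(fruit)": {"fruit", "tree", "orchard", "juice", "eat"},
    "http://dbpedia.org/resource/Apple_Records": {"beatles", "label", "album", "records", "music"},
}


def tag(text: str) -> Optional[str]:
    """Return the candidate URI sharing most words with text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {uri: len(words & keywords) for uri, keywords in CANDIDATES.items()}
    best = max(scores, key=scores.get)  # ties go to the first candidate listed
    return best if scores[best] > 0 else None


print(tag("The Beatles released the album on their own label"))
# http://dbpedia.org/resource/Apple_Records
```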

Annotating Documents

• BBC editors tag news articles with DBpedia concepts

• DBpedia Lookup Service: http://lookup.dbpedia.org

Linking Enterprise Data

Take the Linking Open Data approach to the enterprise

• Connect data sets with DBpedia as shared vocabulary

• Enable meaningful navigation paths across BBC websites
– Browsing Madonna-related information across BBC News, BBC Music, BBC Programmes, …

• Make use of the rich background information: relate the release of a music album to a news article about the artist

Linking Enterprise Data

The Future of DBpedia

Improve Information Extraction

Crowd-source Information Extraction

Crowd-sourced Extraction

Where's the user benefit?

Data Fusion

Cross-Language Data Fusion

• 264 Wikipedia editions in different languages
– Italian Wikipedians know more about Italian villages
– German Wikipedia contains more person infoboxes

• Augment the infobox dataset with facts from other Wikipedia editions.
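A minimal Python sketch of one possible cross-language fusion policy. Languages are tried in a preference order, the first edition that has a property wins, and later editions only fill gaps. The policy, the sample facts and the function name `fuse` are assumptions. The sketch also assumes that property names have already been mapped to shared names across editions, which is the harder part of real cross-language fusion.

```python
def fuse(editions: dict[str, dict[str, str]],
         preference: list[str]) -> dict[str, tuple[str, str]]:
    """Merge infobox facts from several language editions.

    For each property, the value from the earliest language in
    `preference` wins; other editions only fill gaps.
    Returns property -> (value, source language).
    """
    fused: dict[str, tuple[str, str]] = {}
    for lang in preference:
        for prop, value in editions.get(lang, {}).items():
            fused.setdefault(prop, (value, lang))  # keep the first value seen
    return fused


# Hypothetical facts about one village, already mapped to shared
# property names, as extracted from three editions
editions = {
    "en": {"name": "Example Village"},
    "it": {"name": "Villaggio Esempio", "population": "1234", "elevation_m": "450"},
    "de": {"elevation_m": "452"},
}

for prop, (value, lang) in fuse(editions, ["en", "it", "de"]).items():
    print(f"{prop}: {value} (from {lang})")
```

Recording the source language alongside each value keeps the fused data traceable, so conflicting values (here the two elevations) can be reviewed later.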

Augment DBpedia with External Data

• The Linking Open Data cloud provides more data than Wikipedia
– Eurostat provides additional statistical information about countries.
– MusicBrainz contains additional information about bands.
– GeoNames provides additional information about locations.

• Idea
– Augment DBpedia with additional data from external sources.

Contribute back to Wikipedia

• Opportunity
– Feed data back to Wikipedia

• Extend the Wikipedia authoring environment with
– Suggestions for infobox values
– Cross-language consistency checking for infoboxes

• Currently going on
– New maps in Wikipedia based on DBpedia Mobile code (OpenStreetMap)

Contribute back to Wikipedia

• Initiate Wikipedia clean-up cycles
– Data-driven search interfaces expose the weaknesses of the Wikipedia template system.
– Preferred items not showing up in end-user interfaces may motivate Wikipedia editors to use templates more stringently.

Live Update

• Current Situation
– DBpedia update cycle: 3 months
– Wikipedia provides us with access to the live update stream

• Opportunity
– Increase the currency of the DBpedia dataset using this update stream

• Result
– DBpedia in synchronization with Wikipedia.
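A minimal Python sketch of how a live update stream could keep an extracted dataset synchronized. For every changed article, the triples previously extracted from it are dropped and re-extracted from the new version; deletions just drop them. The event format (`Change`), the in-memory store and `apply_changes` are hypothetical and do not describe Wikipedia's actual stream format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """Hypothetical change event: new infobox of an article, None if deleted."""
    title: str
    infobox: Optional[dict[str, str]]


def apply_changes(store: dict[str, set[tuple[str, str, str]]],
                  changes: list[Change]) -> None:
    """Replace all triples of each changed article with freshly extracted ones."""
    for change in changes:
        subject = "dbpedia:" + change.title
        store.pop(subject, None)  # drop everything extracted from the old version
        if change.infobox is not None:
            store[subject] = {(subject, "p:" + k, v) for k, v in change.infobox.items()}


store: dict[str, set[tuple[str, str, str]]] = {}
apply_changes(store, [Change("Madrid", {"name": "Madrid"}),
                      Change("Madrid", {"name": "Madrid", "country": "Spain"})])
print(sorted(store["dbpedia:Madrid"]))
```

Replacing an article's triples as a whole avoids computing fine-grained diffs between old and new infobox values, at the cost of rewriting unchanged triples.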

Open Source

Open Data

What is the Wikipedia for Data?

Wikipedia is the Wikipedia for Data

Summary

http://dbpedia.org
