DBpedia - A Crystallization Point for the Web of Data 2011.10.05 Junghee - Han



Page 1: DBpedia - A Crystallization Point

DBpedia - A Crystallization Point for the Web of Data

2011.10.05

Junghee - Han

Page 2: DBpedia - A Crystallization Point

Outline

- The DBpedia Project
- Understanding Linked Data
- The DBpedia Knowledge Extraction Framework
- The DBpedia Knowledge Base
- Accessing the DBpedia Knowledge Base
- Applications facilitated by DBpedia

Page 3: DBpedia - A Crystallization Point

DBpedia is a community effort to:

- Extract structured information from Wikipedia
- Make this information available on the Web under an open license
- Interlink the DBpedia dataset with other open datasets on the Web

Page 4: DBpedia - A Crystallization Point

The DBpedia Project

The DBpedia knowledge base currently describes more than 2.6 million entities, including:

- 198,000 persons
- 328,000 places
- 101,000 musical works
- 34,000 films
- 20,000 companies

The knowledge base contains 3.1 million links to external web pages and 4.9 million RDF links into other Web data sources.

Page 5: DBpedia - A Crystallization Point

Linked Data

Reference:

Page 6: DBpedia - A Crystallization Point

Linked Data

Reference:

[Figure: web browsers and search engines accessing linked data over HTTP]

Page 7: DBpedia - A Crystallization Point

Linked Data

RDF stands for:

- Resource: anything that has a URI (web pages, images, videos, etc.)
- Description: the attributes, characteristics, and relationships of resources
- Framework: the model, language, and syntax used to describe the above

RDF has a graph model.

Reference: [KSWC2010] Linked Data that Increases the Value of Data
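A minimal sketch of the graph model: RDF statements are stored as (subject, predicate, object) triples, and reading each triple as a labelled edge from subject to object yields a directed graph. The `ex:` names are hypothetical placeholders for full URIs, and plain Python tuples stand in for a real RDF library.

```python
from collections import defaultdict

# Each RDF statement is a (subject, predicate, object) triple.
# The "ex:" names are hypothetical; real RDF data uses full URIs.
TRIPLES = {
    ("ex:Alice", "ex:name", '"Alice"'),       # object is a literal value
    ("ex:Alice", "ex:livesIn", "ex:Berlin"),  # object is another resource
    ("ex:Berlin", "ex:country", "ex:Germany"),
}

def to_graph(triples):
    """Index triples as a directed, labelled graph: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for subject, predicate, obj in sorted(triples):
        graph[subject].append((predicate, obj))  # one labelled edge per triple
    return dict(graph)

graph = to_graph(TRIPLES)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

Because `ex:Berlin` is the object of one triple and the subject of another, the two statements join into a path; that shared node is what turns a set of statements into a graph.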

Page 8: DBpedia - A Crystallization Point

Linked Data Graph Model Example

RDF Syntax

Representation as triples

Reference: [KSWC2010] Linked Data that Increases the Value of Data

SPARQL (SPARQL Protocol and RDF Query Language): the RDF query language developed by the W3C

Page 9: DBpedia - A Crystallization Point

Linked Data

1. Use URIs (Uniform Resource Identifiers) as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful RDF information
4. Include RDF statements that link to other URIs so that they can discover related things

Tim Berners-Lee 2007 http://www.w3.org/DesignIssues/LinkedData.html
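Principles 2 and 3 are commonly implemented with HTTP content negotiation: the client states in the `Accept` header that it wants RDF rather than HTML. The sketch below only builds such a request with Python's standard `urllib` and never sends it; the URI is the DBpedia example used later in this deck, and the particular media-type preferences are an assumption.

```python
import urllib.request

def rdf_request(uri: str) -> urllib.request.Request:
    """Build, but do not send, a GET request asking for an RDF representation of `uri`."""
    return urllib.request.Request(
        uri,
        headers={
            # Prefer RDF/XML, accept Turtle, and fall back to HTML at a lower quality value.
            "Accept": "application/rdf+xml, text/turtle;q=0.9, text/html;q=0.5",
        },
        method="GET",
    )

req = rdf_request("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
# Actually sending it needs network access: urllib.request.urlopen(req)
```

DBpedia typically answers requests for `/resource/` URIs with an HTTP 303 redirect to a separate document (an HTML `/page/` or an RDF `/data/` URL), so a client following this sketch should expect to be redirected before receiving RDF.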

Page 10: DBpedia - A Crystallization Point

Reference: [KSWC2010] Linked Data that Increases the Value of Data

Linked Data

http://bibleontology.com/page/Bilhah

1. Use URIs as names for things

Page 11: DBpedia - A Crystallization Point

Reference: [KSWC2010] Linked Data that Increases the Value of Data

Linked Data

http://bibleontology.com/page/Bilhah

2. Use HTTP URIs so that people can look up those names

Page 12: DBpedia - A Crystallization Point

Reference: [KSWC2010] Linked Data that Increases the Value of Data

Linked Data

http://bibleontology.com/page/Bilhah

3. When someone looks up a URI, provide useful RDF information

Page 13: DBpedia - A Crystallization Point

Reference: [KSWC2010] Linked Data that Increases the Value of Data

Linked Data

http://bibleontology.com/page/Bilhah

4. Include RDF statements that link to other URIs so that they can discover related things

Page 14: DBpedia - A Crystallization Point

Linked Data

[Figure: example Linked Data graph. The resource HongGilDong has [name] "Hong, Gil Dong" and [age] 35, and links via [residences] to Seoul (http://dbpedia.org/resource/Seoul) and via [researches] to SemanticWeb (http://dbpedia.org/resource/Semantic_Web). Seoul links via [sameAs] to http://sws.geonames.org/1835848/, which links via [nearbyFeatures] to http://sws.geonames.org/1835848/nearby.rdf; Semantic_Web links via [hasPhotoCollection] to http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Semantic_Web]

Reference: [KSWC2010] Linked Data that Increases the Value of Data
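The fourth principle is what lets a client move from one dataset to the next. The sketch below encodes one reading of the example graph as triples (predicate names come from the figure's edge labels; which nodes each edge connects is an interpretation of the figure) and walks it breadth-first, following every HTTP-URI object the way a Linked Data crawler would dereference links. `HongGilDong` is a hypothetical local identifier, and nothing is fetched over the network.

```python
from collections import deque

# One reading of the example graph; predicate names are the figure's edge labels.
DBP = "http://dbpedia.org/resource/"
GEO = "http://sws.geonames.org/1835848/"
FLICKR = "http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Semantic_Web"

TRIPLES = [
    ("HongGilDong", "name", "Hong, Gil Dong"),
    ("HongGilDong", "age", 35),
    ("HongGilDong", "residences", DBP + "Seoul"),
    ("HongGilDong", "researches", DBP + "Semantic_Web"),
    (DBP + "Seoul", "sameAs", GEO),
    (GEO, "nearbyFeatures", GEO + "nearby.rdf"),
    (DBP + "Semantic_Web", "hasPhotoCollection", FLICKR),
]

def is_link(value) -> bool:
    """Treat HTTP URIs as followable links; everything else is a literal."""
    return isinstance(value, str) and value.startswith("http://")

def follow_links(start, triples):
    """Breadth-first walk from `start`, following URI-valued objects once each."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for subject, _predicate, obj in triples:
            if subject == node and is_link(obj) and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return order

for uri in follow_links("HongGilDong", TRIPLES):
    print(uri)
```

Starting from one local description, the walk reaches DBpedia, GeoNames and flickr wrappr resources without any of those sources knowing about each other in advance.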

Page 15: DBpedia - A Crystallization Point

Linked Data

[Figure: SPARQL compared with SQL]

Reference: [KSWC2010] Linked Data that Increases the Value of Data
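Where SQL matches rows in tables, a SPARQL query matches triple patterns against a graph: variables (written `?x`) are bound so that every pattern in the query becomes a triple in the data, and patterns sharing a variable act like a join. The toy evaluator below shows only this basic-graph-pattern idea; it is not a SPARQL implementation, and the data and `ex:` predicates are hypothetical.

```python
# Toy evaluator for SPARQL-style basic graph patterns (illustration only).
# Terms starting with "?" are variables; any other term must match exactly.
DATA = [
    ("ex:Alice", "ex:livesIn", "ex:Seoul"),
    ("ex:Bob", "ex:livesIn", "ex:Berlin"),
    ("ex:Seoul", "ex:country", "ex:South_Korea"),
    ("ex:Berlin", "ex:country", "ex:Germany"),
]

def is_var(term: str) -> bool:
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if result.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None
    return result

def evaluate(patterns, data):
    """Return every binding that satisfies all patterns (a join over shared variables)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# Roughly: SELECT ?person WHERE { ?person ex:livesIn ?city . ?city ex:country ex:South_Korea }
QUERY = [("?person", "ex:livesIn", "?city"), ("?city", "ex:country", "ex:South_Korea")]
print([b["?person"] for b in evaluate(QUERY, DATA)])
```

The shared variable `?city` plays the role that a foreign-key join condition would play in the equivalent SQL query over relational tables.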

Page 16: DBpedia - A Crystallization Point

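[Figure: from disconnected to connected national public information. Isolated public datasets (spatial, travel, transportation, real estate, cultural heritage, bibliographic, land, environmental, product, job, and other "XXX" information) become connected national public information, which is in turn linked with private-sector information (portals and media, universities, others) and overseas information (DBpedia, BBC, etc.)]

Linked Data

Reference: [KSWC2010] Linked Data that Increases the Value of Data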

Page 17: DBpedia - A Crystallization Point

Wikipedia Content

- Title
- Description
- Languages
- Web Links
- Categorization
- Domain-specific Data
- Images
- Infoboxes

Page 18: DBpedia - A Crystallization Point

The DBpedia Knowledge Extraction Framework (1/2)

Until March 2010, the DBpedia project used a PHP-based extraction framework to extract different kinds of structured information from Wikipedia. This framework has been superseded by the new Scala-based extraction framework, and the old PHP framework is no longer maintained.

Currently 19 extractors:

- Labels (title, rdfs:label)
- Abstracts (first paragraph, rdfs:comment)
- Interlanguage links
- Images
- Redirects
- Disambiguation (dbpedia:disambiguates)
- External links (dbpedia:reference)
- Page links (dbpedia:wikilink)
- Homepages (foaf:homepage)
- Geo-coordinates
- Person data
- PND
- SKOS categories
- Page ID
- Revision ID
- Category label
- Article categories
- Mappings
- Infobox

Page 19: DBpedia - A Crystallization Point

The DBpedia Knowledge Extraction Framework (2/2)

Two workflows:

Dump-based extraction

- The Wikimedia Foundation publishes SQL dumps of all Wikipedia editions on a monthly basis.
- The dump-based workflow uses the DatabaseWikipedia page collection as the source of article texts and the N-Triples serializer as the output destination.
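N-Triples, the output format of the dump-based workflow, is line-based: one triple per line, URIs in angle brackets, literals in double quotes, and each line ending in " .". The sketch below assumes plain string literals only (no language tags or datatypes) and escapes just backslash, quote, and line breaks; it illustrates the format and is not the framework's Scala serializer. The example triples reuse the Seoul URIs from the Linked Data graph example.

```python
def nt_term(term: str, is_literal: bool = False) -> str:
    """Render one N-Triples term: <URI> or "escaped literal"."""
    if is_literal:
        escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
                       .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{term}>"

def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_literal) tuples, one per line."""
    lines = [f"{nt_term(s)} {nt_term(p)} {nt_term(o, literal)} ."
             for s, p, o, literal in triples]
    return "\n".join(lines) + "\n"

TRIPLES = [
    ("http://dbpedia.org/resource/Seoul",
     "http://www.w3.org/2000/01/rdf-schema#label", "Seoul", True),
    ("http://dbpedia.org/resource/Seoul",
     "http://www.w3.org/2002/07/owl#sameAs", "http://sws.geonames.org/1835848/", False),
]
print(to_ntriples(TRIPLES), end="")
```

Because every line is a complete triple, N-Triples dumps can be split, concatenated, or streamed line by line, which suits very large extraction outputs.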

Live extraction

- Wikipedia changes are received through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and RDF is re-extracted for each updated article.
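OAI-PMH requests are plain HTTP GETs with a `verb` parameter; an incremental harvest uses `ListRecords` with a `from` datestamp and pages through results with a `resumptionToken`. The sketch below only builds request URLs. The base URL is hypothetical (not Wikipedia's actual feed), and `oai_dc` is used because every OAI-PMH repository must support it; the metadata format DBpedia Live actually requested is not stated here.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is exclusive: only `verb` may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # only records changed since this datestamp
    return base_url + "?" + urlencode(params)

BASE = "http://oai.example.org/oai"  # hypothetical endpoint, for illustration only
print(list_records_url(BASE, "oai_dc", from_date="2011-10-01"))
```

A harvester would repeat the request with each returned `resumptionToken` until none is returned, then remember the latest datestamp for the next run.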


Page 20: DBpedia - A Crystallization Point

Infobox Extraction

dbpedia:BBC p:network_name "British Broadcasting Corporation (BBC)"

dbpedia:BBC p:country dbpedia:United_Kingdom

dbpedia:BBC p:key_people dbpedia:Michael_Lyons

dbpedia:BBC p:key_people dbpedia:Mark_Thompson
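The triples above can be approximated by a small parser: each `| key = value` infobox line becomes a `p:key` property, wiki links `[[Target]]` become DBpedia resources, and other values become literals. The wikitext below is a hypothetical, simplified version of the BBC infobox, and the parser assumes one attribute per line; the real extraction framework handles nested templates, units, dates, and many other cases this sketch ignores.

```python
import re

# Hypothetical, simplified infobox wikitext matching the slide's example.
INFOBOX = """{{Infobox network
| network_name = British Broadcasting Corporation (BBC)
| country = [[United Kingdom]]
| key_people = [[Michael Lyons]], [[Mark Thompson]]
}}"""

# [[Target]] or [[Target|label]]: capture only the link target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def extract_infobox(subject, wikitext):
    """Turn '| key = value' lines into (subject, p:key, object) triples."""
    triples = []
    for line in wikitext.splitlines():
        if not line.startswith("|") or "=" not in line:
            continue  # skip the template name and the closing braces
        key, value = (part.strip() for part in line[1:].split("=", 1))
        links = LINK.findall(value)
        if links:
            # Internal wiki links become DBpedia resources (spaces -> underscores).
            for target in links:
                triples.append((subject, "p:" + key, "dbpedia:" + target.strip().replace(" ", "_")))
        else:
            triples.append((subject, "p:" + key, f'"{value}"'))
    return triples

for triple in extract_infobox("dbpedia:BBC", INFOBOX):
    print(*triple)
```

Splitting a multi-link value into one triple per link mirrors how `p:key_people` yields two separate statements on the slide.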


Page 21: DBpedia - A Crystallization Point

The DBpedia Knowledge Base

Identifying Entities

Resources are assigned a URI according to the pattern http://dbpedia.org/resource/Name, where Name is taken from the URL of the source Wikipedia article, which has the form http://en.wikipedia.org/wiki/Name.
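The naming pattern can be written as a small function. This sketch handles only English Wikipedia article URLs of the form shown above and copies the article name unchanged; percent-encoding details, redirects, and other language editions, which a real extractor must handle, are deliberately ignored.

```python
from urllib.parse import urlsplit

WIKI_PREFIX = "/wiki/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

def dbpedia_uri(wikipedia_url: str) -> str:
    """Map http://en.wikipedia.org/wiki/Name to http://dbpedia.org/resource/Name."""
    parts = urlsplit(wikipedia_url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(WIKI_PREFIX):
        raise ValueError(f"not an English Wikipedia article URL: {wikipedia_url}")
    name = parts.path[len(WIKI_PREFIX):]  # the article name, kept as-is
    return DBPEDIA_RESOURCE + name

print(dbpedia_uri("http://en.wikipedia.org/wiki/Berlin"))
print(dbpedia_uri("http://en.wikipedia.org/wiki/Semantic_Web"))
```

Deriving the identifier directly from the article name is what makes DBpedia URIs predictable: anyone who knows a Wikipedia article can construct the corresponding resource URI.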

Classifying Entities

DBpedia entities are classified within four classification schemata in order to fulfill different application requirements:

- Wikipedia Categories
- YAGO
- UMBEL (Upper Mapping and Binding Exchange Layer)
- DBpedia Ontology

Describing Entities

Every DBpedia entity is described by a set of general properties.

Page 22: DBpedia - A Crystallization Point

Accessing the DBpedia Knowledge Base over the Web

- Linked Data: DBpedia resource identifiers (e.g. http://dbpedia.org/resource/Berlin)
- SPARQL Endpoint: http://dbpedia.org/sparql
- RDF Dumps: http://wiki.dbpedia.org/Downloads32
- Lookup Index: http://lookup.dbpedia.org/api/search.asmx
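Under the SPARQL Protocol, a query can be sent to an endpoint such as http://dbpedia.org/sparql as an HTTP GET with the query text URL-encoded in the `query` parameter. The sketch below only builds that URL; the example query (all labels of the Berlin resource) is an assumption chosen for illustration.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # the public endpoint listed on this slide

# Example query (an assumption for illustration): all labels of the Berlin resource.
QUERY = """SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}"""

def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode `query` as a SPARQL Protocol GET request using the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_get_url(ENDPOINT, QUERY)
print(url)
# Sending it (network required) with an Accept header such as
# "application/sparql-results+json" would be expected to return bindings for ?label.
```

GET keeps the request bookmarkable and cacheable; very long queries are usually sent as an HTTP POST instead.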


Page 23: DBpedia - A Crystallization Point

Interlinked Web Content

The DBpedia knowledge base currently contains 4.9 million outgoing RDF links into other Web data sources.

Page 24: DBpedia - A Crystallization Point

Applications facilitated by DBpedia (1/3)

Browsing and Exploration: DBpedia Mobile

Page 25: DBpedia - A Crystallization Point

Applications facilitated by DBpedia (2/3)

Querying and Search: DBpedia Query Builder

http://querybuilder.dbpedia.org

Page 26: DBpedia - A Crystallization Point

Applications facilitated by DBpedia (3/3)

Querying and Search: Relationship Finder

Page 27: DBpedia - A Crystallization Point

Conclusions and Future Work

Conclusion

The resulting DBpedia knowledge base covers a wide range of different domains and connects entities across these domains.

Future Work

Cross-language infobox knowledge fusion

- Derive an astonishingly detailed multi-domain knowledge base

Wikipedia article augmentation

- Develop a MediaWiki extension that augments Wikipedia articles with additional information as well as media items (pictures, audio) from these sources

Wikipedia consistency checking

- Improve the overall quality of Wikipedia