DBpedia+ / DBpedia meeting in Dublin

Upload: dimitris-kontokostas

Post on 15-Jul-2015

TRANSCRIPT

Page 1: DBpedia+ / DBpedia meeting in Dublin

DBpedia (in) ALIGNED

From DBpedia to DBpedia+

Dimitris Kontokostas

AKSW Group, Leipzig University

DBpedia Association

Page 2: DBpedia+ / DBpedia meeting in Dublin

February 9th 2015 / 3rd DBpedia Meeting in Dublin

DBpedia @ 2007

Page 3: DBpedia+ / DBpedia meeting in Dublin

DBpedia @ 2008

Page 4: DBpedia+ / DBpedia meeting in Dublin

DBpedia @ 2009

Page 5: DBpedia+ / DBpedia meeting in Dublin

DBpedia @ 2010

Page 6: DBpedia+ / DBpedia meeting in Dublin

DBpedia @ 2011

Page 7: DBpedia+ / DBpedia meeting in Dublin

DBpedia @ 2014

Page 8: DBpedia+ / DBpedia meeting in Dublin

RDF Stats (2014 release)

3B facts (only 580M facts in English)
● DBpedia En: 4.58M Things / 4.22M typed
● 125 Localized versions: 38.3M Things
● 50M links to other datasets

Many more stats @: dbpedia.org/Datasets2014/DatasetStatistics
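As a quick reading aid for these figures, the ratios they imply can be worked out directly. The snippet below is only arithmetic on the rounded numbers quoted on the slide (so the results are approximate); the variable names are illustrative.

```python
# Figures as quoted on the slide (2014 release), rounded: M = million, B = billion.
total_facts = 3_000_000_000
english_facts = 580_000_000
english_things = 4_580_000
english_typed = 4_220_000

# Share of all facts that come from the English edition.
english_share = english_facts / total_facts
# Share of English "Things" that carry a type.
typed_share = english_typed / english_things

print(f"English share of all facts: {english_share:.1%}")    # 19.3%
print(f"Typed share of English Things: {typed_share:.1%}")   # 92.1%
```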

Page 9: DBpedia+ / DBpedia meeting in Dublin

Dev Stats

DBpedia Information Extraction Framework
● Java/Scala based framework
  ○ Old PHP-based framework
● 5.1K Commits
● 52K lines of code (100K/1M AT)
● 71 total contributors

Many more stats @: www.openhub.net/p/dbpedia

Page 10: DBpedia+ / DBpedia meeting in Dublin

Aligning Problem

Lots of code & a lot more data
● Wikipedia evolves over time
  ○ Infobox templates change, merge, or get deleted
  ○ New formatting templates
  ○ Structural differences per language edition
● Code should adapt to all the changes
  ○ hard at this (data) scale

Page 11: DBpedia+ / DBpedia meeting in Dublin

Unit-testing to the rescue?

● Software & Data testing
● Straightforward for software (since the 70s)
● Preliminary for (RDF) data
  ○ RDFUnit, SPIN, OWL, PelletICV, ShEx, ...
    ■ W3C Data Shapes WG

Data testing++
● Generation: manual, (semi)automatic, ...
● Linking: data & software tests
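To make the idea of a "unit test for data" concrete, here is a minimal sketch in plain Python. It does not use RDFUnit or any of the tools listed above; the triples, the `ex:` property names, and the `check_birth_before_death` helper are all hypothetical and only illustrate the pattern of asserting a constraint over extracted facts and collecting the violating resources.

```python
from datetime import date

# Toy triples as (subject, predicate, object); all names are illustrative only.
TRIPLES = [
    ("ex:Alice", "ex:birthDate", date(1950, 1, 1)),
    ("ex:Alice", "ex:deathDate", date(2010, 5, 3)),
    ("ex:Bob", "ex:birthDate", date(1990, 7, 7)),
    ("ex:Bob", "ex:deathDate", date(1980, 2, 2)),  # deliberately inconsistent
]

def values(triples, subject, predicate):
    """Return all objects stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def check_birth_before_death(triples):
    """Return subjects that have a birth date not earlier than a death date."""
    violations = []
    for subject in sorted({s for s, _, _ in triples}):
        births = values(triples, subject, "ex:birthDate")
        deaths = values(triples, subject, "ex:deathDate")
        if any(born >= died for born in births for died in deaths):
            violations.append(subject)
    return violations

print(check_birth_before_death(TRIPLES))  # ['ex:Bob']
```

A test like this can be written by hand or generated from a reusable pattern ("value of property A must be smaller than value of property B"), which is one way to read the "Generation: manual, (semi)automatic" bullet.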

Page 12: DBpedia+ / DBpedia meeting in Dublin

RDFUnit

http://rdfunit.aksw.org

Page 13: DBpedia+ / DBpedia meeting in Dublin

UT feedback loop

Data verification and feedback at different data extraction stages
● Three main points of failure in DBpedia:
  ○ Code
  ○ Infobox mappings
  ○ Wikipedia (!!!)
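The slides do not say how a failure is attributed to one of these three points, so the following is only one hypothetical way to picture it: run a check after each extraction stage and report the first stage that fails. The record fields (`raw_value`, `mapped_property`, `extracted_value`) and the checks themselves are assumptions made for illustration, not part of the DBpedia framework.

```python
from collections import Counter

# Hypothetical per-stage checks, ordered the way data flows through extraction.
CHECKS = [
    ("Wikipedia", lambda r: r["raw_value"].strip() != ""),
    ("Infobox mappings", lambda r: r["mapped_property"] is not None),
    ("Code", lambda r: r["extracted_value"] is not None),
]

def first_failing_stage(record, checks=CHECKS):
    """Return the name of the first stage whose check fails, or None if all pass."""
    for stage, check in checks:
        if not check(record):
            return stage
    return None

# Illustrative records for one infobox field each.
records = [
    {"raw_value": "1950-01-01", "mapped_property": "ex:birthDate", "extracted_value": "1950-01-01"},
    {"raw_value": "", "mapped_property": "ex:birthDate", "extracted_value": None},
    {"raw_value": "1 Jan 1950", "mapped_property": None, "extracted_value": None},
    {"raw_value": "1 Jan 1950", "mapped_property": "ex:birthDate", "extracted_value": None},
]

# Aggregate failures per stage so each can be fed back to the right place.
feedback = Counter(
    stage for stage in map(first_failing_stage, records) if stage is not None
)
print(dict(feedback))
```

Grouping failures this way is what would let feedback go to developers, mapping editors, or Wikipedia editors separately.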

Page 14: DBpedia+ / DBpedia meeting in Dublin

DBpedia+ Workflow

Page 15: DBpedia+ / DBpedia meeting in Dublin

Additional feedback

We are looking into:
● Reporting
● Statistics
● Inter-Wikipedia cross-checking
● ML techniques
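As an illustration of what inter-Wikipedia cross-checking could look like, the sketch below compares the value extracted for the same fact from several language editions and flags the editions that disagree with the majority. The majority-vote rule, the sample values, and the `cross_check` helper are assumptions for this example; the slides only name the idea.

```python
from collections import Counter

# Hypothetical values for one fact, as extracted from different language editions.
population_by_edition = {"en": 1234, "de": 1234, "fr": 1243, "el": 1234}

def cross_check(values_by_edition):
    """Return (majority_value, sorted list of editions that disagree with it)."""
    counts = Counter(values_by_edition.values())
    majority, _ = counts.most_common(1)[0]
    outliers = sorted(
        edition for edition, value in values_by_edition.items() if value != majority
    )
    return majority, outliers

print(cross_check(population_by_edition))  # (1234, ['fr'])
```

A flagged edition is only a candidate for review: the majority can be wrong, for example when several editions copied the same outdated figure.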

Page 16: DBpedia+ / DBpedia meeting in Dublin

Thank you & Questions?

ALIGNED: Aligned, Quality-centric Software and Data Engineering