Wikipedia as source of collaboratively created Knowledge Organization Systems


Upload: jakob-

Post on 16-Apr-2017



Digitale Bibliothek

Jakob Voß

Wikipedia as source of collaboratively created Knowledge Organization Systems

Fachhochschule Hannover, 25 June 2009

Wikipedia

de facto standard online reference

> 13 million articles, > 230 languages

run by Wikimedia, run with MediaWiki

open content (CC-BY-SA / GFDL)

it's a wiki!

dense hypertext

anyone can edit (but it is a medium of its own)

http://de.wikipedia.org/wiki/Portal:BID

Structure of Wikipedia

articles

internal and external links

redirects and disambiguation pages

lists, portals, and navigation templates

categories

infoboxes and geodata

(bibliographic) references

revisions, flags, featured content ...

Articles

text, intro, substructure

specific structure for specific article types
(years, people etc.)

Links

[[target]] or [[target|label]]

connect on textual and conceptual level

structure of hyperlinks encodes relations
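
The two link forms can be pulled out of wikitext with a regular expression. This is a minimal sketch, assuming flat links without nested brackets; the function name `extract_links` and the sample sentence are made up for illustration and are not part of MediaWiki.

```python
import re

# Matches [[target]] and [[target|label]]; nested brackets are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def extract_links(wikitext):
    """Return (target, label) pairs; the label defaults to the target."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        label = match.group(2)
        links.append((target, label.strip() if label else target))
    return links

sample = "The [[Cologne Cathedral|cathedral]] stands in [[Cologne]]."
print(extract_links(sample))
# [('Cologne Cathedral', 'cathedral'), ('Cologne', 'Cologne')]
```

The (target, label) pairs are the raw material for the relations the link structure encodes: the target names a concept, the label is a term used for it.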

External links

links to references

links to other structured knowledge bases

authority files (for instance PND)

MusicBrainz, IMDB ...

interlanguage links to other wikipedias

Redirects and disambiguations

control synonyms and homonyms

Lists and Portals

list: lead section followed by a list of links to articles in a particular subject area, such as people or places, or a timeline of events

List of _, Outline of _, Glossary of _,
Timeline of _, Index of _ ...

portal: intended to serve as Main Pages for specific topics or areas. May be associated with one or more WikiProjects.

en: ~140 featured portals of ~600 total

http://en.wikipedia.org/wiki/Portal:Featured_portals

Navigation templates

grouping of links used in multiple related articles to facilitate navigation between those articles.

Categories

Nordrhein-Westfalen nach Ort

Ort als Thema

Rheinland

Köln

Kultur (Köln)

Kölner Dom

Geschichte Kölns

Messe Köln

Multi-hierarchy of categories

Tagged article (social tagging, set model)

Categories: Katholische Bischofskirche (Deutschland) | Kölner Dom | Weltkulturerbe in Deutschland | Geschütztes Kulturgut | Architekturikone | Gotisches Bauwerk | Historisches Bauwerk | Stadtbezirk Köln-Innenstadt | Kultbau
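
A multi-hierarchy means a category can have more than one parent, so the structure is a directed graph rather than a tree. The sketch below walks such a graph upwards. The category names are taken from the example, but the parent edges are illustrative assumptions, since the slide's diagram is not preserved.

```python
# Category -> parent categories. A category may have several parents.
# These edges are assumed for illustration only.
PARENTS = {
    "Kölner Dom": ["Kultur (Köln)", "Geschichte Kölns"],
    "Kultur (Köln)": ["Köln"],
    "Geschichte Kölns": ["Köln"],
    "Messe Köln": ["Köln"],
    "Köln": ["Rheinland", "Ort als Thema"],
}

def ancestors(category, parents=PARENTS):
    """Collect all direct and indirect parent categories."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(ancestors("Kölner Dom")))
```

The `seen` set matters in practice: category graphs edited by many people can contain cycles, and a plain recursive walk would not terminate on them.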


Infoboxes and Geodata

structured tables via MediaWiki Templates, a simple field-value-structure

used for cities, animals, bands, chemicals ...

qualifiers problematic: date, unit, source ...

special and popular case:
geographical coordinates

This and the following slides are based on Georgi Kobilarov's presentation.

Field values are not atomic
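
To make the field-value structure concrete, here is a minimal sketch that reads a flat infobox template into a dictionary. It assumes no nested templates and no pipes inside values, which real infoboxes often violate. The function name and the sample values are invented for illustration.

```python
def parse_infobox(wikitext):
    """Parse a flat {{Infobox ...}} template into a field -> raw value dict."""
    body = wikitext.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("not a template")
    parts = [part.strip() for part in body[2:-2].split("|")]
    fields = {}
    for part in parts[1:]:  # parts[0] is the template name
        if "=" in part:
            name, value = part.split("=", 1)
            fields[name.strip()] = value.strip()
    return fields

# Invented sample values, for illustration only.
sample = """{{Infobox city
| name = Cologne
| population = 1,000,000 (2008 estimate)
| area = 405 km2
}}"""

fields = parse_infobox(sample)
print(fields["population"])
```

The population value shows why field values are not atomic: one string mixes a number, a date qualifier and a note, and the area value carries its unit inline.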

References

vast amount of bibliographic data

Wikipedia cataloguing rules (sic!)

partly structured via templates:

Examples without templates

Revisions and other metadata

Information about articles

which user changed what at which time

flagged revisions

featured content

...

Interesting data available for wiki research

Wikipedia is/are not just articles but a structured system of knowledge management

And all of it is available for further processing!

Use as Knowledge Organization System (KOS)

WikiWord

DBpedia

Semantic Tagging

... it's up to you!

Summary

WikiWord

WikiWord builds a multilingual thesaurus
by mining the link structure

Every page describes a concept

Link labels are terms referring
to those concepts

Links and categories define relations

Multilingual by merging languages
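
The idea of treating pages as concepts and link labels as terms can be sketched in a few lines. This is not WikiWord's actual implementation, only an illustration under simple assumptions: links arrive as (target page, label) pairs, and redirects map an alias to its canonical page. All names and sample data are hypothetical.

```python
from collections import defaultdict

def build_thesaurus(links, redirects):
    """Group terms (labels and page titles) under the concept they refer to.

    links: iterable of (target_page, label)
    redirects: dict mapping alias page -> canonical page
    """
    terms = defaultdict(set)
    for target, label in links:
        concept = redirects.get(target, target)  # redirects collapse synonyms
        terms[concept].add(label)
        terms[concept].add(target)  # the linked title is itself a term
    return dict(terms)

links = [("Cologne", "Cologne"), ("Cologne", "Köln"), ("Koeln", "Koeln")]
redirects = {"Koeln": "Cologne"}
print(build_thesaurus(links, redirects))
```

Merging languages would add one more mapping step, from each language's page to a shared concept via the interlanguage links.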

German Thesis by Daniel Kinzler

http://brightbyte.de/page/WikiWord

WikiWord Thesaurus

English, German, French, Dutch, Norwegian

>20 million labels

>11 million concepts

>2 million definitions

>75 million related links

>11 million hierarchical links

Available in SKOS/RDF

Source code available to generate more

Resource Description Framework

RDF is URI + Unicode + Triples [+ Rules]

triple: subject, predicate, object

literal objects: "Object"@lang or "Object"^^type-URI

RDF example (this: SKOS)

agro:c385 skos:prefLabel "Ananas"@de

agro:c385 rdf:type skos:Concept

URI namespaces for abbreviation

@prefix skos: .

@prefix agro: .
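
The SKOS example can be mirrored with plain Python tuples to show how little a triple needs. This sketch keeps the prefixed names as strings instead of expanding them to full URIs, and models a language-tagged literal as a (text, language) pair. Both choices are simplifications made here, not part of RDF itself.

```python
# Each triple is (subject, predicate, object).
triples = [
    ("agro:c385", "rdf:type", "skos:Concept"),
    ("agro:c385", "skos:prefLabel", ("Ananas", "de")),  # literal with language tag
]

def pref_labels(subject, triples):
    """Return the preferred labels of a concept as a language -> text dict."""
    labels = {}
    for s, p, o in triples:
        if s == subject and p == "skos:prefLabel":
            text, lang = o
            labels[lang] = text
    return labels

print(pref_labels("agro:c385", triples))  # {'de': 'Ananas'}
```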

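RDF formats

graph

http://d-nb.info/96327841X dc:title "Zettelwirtschaft"

http://d-nb.info/96327841X dc:creator http://d-nb.info/gnd/13150794X

http://d-nb.info/gnd/13150794X foaf:firstName "Markus"

http://d-nb.info/gnd/13150794X foaf:secondName "Krajewski"

N3

@prefix foaf .

@prefix dc .

<http://d-nb.info/96327841X>
dc:title "Zettelwirtschaft" ;
dc:creator <http://d-nb.info/gnd/13150794X> .

<http://d-nb.info/gnd/13150794X>
foaf:firstName "Markus" ;
foaf:secondName "Krajewski" .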
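
How the graph maps onto the N3 notation can be shown with a small serializer: triples that share a subject are grouped, predicate-object pairs are separated by ";", and each group ends with ".". This is a rough sketch, not a complete N3 writer. It omits the prefix declarations and expects objects already formatted as `<URI>` or quoted literal. The function name `to_n3` is made up here.

```python
def to_n3(triples):
    """Serialize (subject, predicate, object) strings in N3-like shorthand."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append(f"    {p} {o}")
    blocks = [s + "\n" + " ;\n".join(pairs) + " ."
              for s, pairs in by_subject.items()]
    return "\n\n".join(blocks)

book = "<http://d-nb.info/96327841X>"
person = "<http://d-nb.info/gnd/13150794X>"
triples = [
    (book, "dc:title", '"Zettelwirtschaft"'),
    (book, "dc:creator", person),
    (person, "foaf:firstName", '"Markus"'),
    (person, "foaf:secondName", '"Krajewski"'),
]
n3 = to_n3(triples)
print(n3)
```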

RDF/XML format


Zettelwirtschaft

Markus Krajewski

initiative to connect and publish open collections of data with RDF on the Web

one of the largest collections and main hub:

DBpedia (http://dbpedia.org)

DBpedia extraction framework

http://dbpedia.svn.sourceforge.net (Open Source)

Wikipedia -> Extraction -> Triple Store

DBpedia extraction framework

core ontology

people

places

organizations

events

works

specific infoboxes

Parsers for each field

RDF Triples
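
The step from infobox fields to RDF triples via per-field parsers might look roughly like this. It is a sketch of the general idea only, not the DBpedia extraction framework's code. The `ex:` predicates, the mapping table and the sample values are hypothetical.

```python
import re

def parse_int(raw):
    """Pick the first number out of a raw value such as '1,000,000 (2008)'."""
    match = re.search(r"\d[\d,]*", raw)
    return int(match.group().replace(",", "")) if match else None

def parse_text(raw):
    return raw.strip() or None

# Hypothetical mapping: infobox field -> (predicate, parser).
FIELD_PARSERS = {
    "name": ("ex:name", parse_text),
    "population": ("ex:population", parse_int),
}

def extract_triples(subject, fields):
    """Turn raw infobox fields into (subject, predicate, value) triples."""
    triples = []
    for field, raw in fields.items():
        if field not in FIELD_PARSERS:
            continue  # unmapped fields are skipped
        predicate, parser = FIELD_PARSERS[field]
        value = parser(raw)
        if value is not None:
            triples.append((subject, predicate, value))
    return triples

fields = {"name": "Cologne", "population": "1,000,000 (2008 estimate)", "mayor": "?"}
print(extract_triples("ex:Cologne", fields))
```

Note that the integer parser silently drops the date qualifier, which is exactly the loss of information that makes qualifiers problematic.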

Crowd Sourced Extraction

Wikipedia -> Extraction -> Triple Store

Linked Data

Use URIs to identify things.

Use HTTP URIs so that these things can be looked up.

When someone looks up a URI, provide useful information.

Include links to other URIs so that more things can be looked up.

Tim Berners-Lee (2006): Linked Data Design Issues http://www.w3.org/DesignIssues/LinkedData.html
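
Looking up an HTTP URI and asking for data rather than a web page is usually done with the HTTP Accept header. The sketch below only builds such a request with Python's standard library and does not send it. The DBpedia resource URI and the requested media type are example assumptions.

```python
from urllib.request import Request

def rdf_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) an HTTP request asking for RDF about a URI."""
    return Request(uri, headers={"Accept": media_type})

req = rdf_request("http://dbpedia.org/resource/Cologne")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup. Whether a server answers with RDF depends on how it implements content negotiation.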

May 2007

March 2009

September 2008

More complex queries

examples

people born in 1965 that
contributed music in films

books about these people

notation in SPARQL (SQL for RDF)

one of several ways to access Semantic Web

Example query

data: Björk music Dancer in the Dark; Björk born 1965

query: ? music ?; ? born 1965

Which films have music made by someone who was born in 1965?

Problem: the predicates music (has-made-music-in) and born (was-born-in-year) must be known and used consistently!
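
The example query is a join of two triple patterns that share one unknown, the person. A minimal sketch in plain Python, standing in for what a SPARQL engine does, is shown below. The predicate names `music` and `born` follow the example, and the second person and film are invented so that the filter has something to exclude.

```python
triples = [
    ("Björk", "music", "Dancer in the Dark"),
    ("Björk", "born", 1965),
    ("Someone Else", "music", "Another Film"),  # invented counter-example
    ("Someone Else", "born", 1970),             # invented counter-example
]

def films_with_music_by_people_born_in(year, triples):
    """Join '?person born year' with '?person music ?film'."""
    born = {s for s, p, o in triples if p == "born" and o == year}
    return sorted(o for s, p, o in triples if p == "music" and s in born)

print(films_with_music_by_people_born_in(1965, triples))
# ['Dancer in the Dark']
```

The join only works because both patterns use exactly the same predicate names as the data, which is the consistency problem named above.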

Linking sources

DBpedia: Dancer in the Dark dc:creator Björk; Björk dbpedia:birthYear 1965

OPAC: book about Björk dc:subject PND:119525054

link: Björk owl:sameAs PND:119525054
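
Answering "books about people born in 1965" needs both sources plus the owl:sameAs link between them. A rough sketch of that three-step join follows. The identifiers `dbpedia:Björk` and `opac:book1` are hypothetical stand-ins, and only the PND number is taken from the example.

```python
dbpedia = [("dbpedia:Björk", "dbpedia:birthYear", 1965)]
opac = [("opac:book1", "dc:subject", "PND:119525054")]  # hypothetical record id
same_as = [("dbpedia:Björk", "owl:sameAs", "PND:119525054")]

def books_about_people_born_in(year, dbpedia, opac, same_as):
    """Follow owl:sameAs from DBpedia persons to authority ids used in the OPAC."""
    people = {s for s, p, o in dbpedia
              if p == "dbpedia:birthYear" and o == year}
    ids = {o for s, p, o in same_as if p == "owl:sameAs" and s in people}
    return sorted(s for s, p, o in opac if p == "dc:subject" and o in ids)

print(books_about_people_born_in(1965, dbpedia, opac, same_as))
# ['opac:book1']
```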

Inference rules

if ... then also ...

a frbr:creator B => B rdf:type frbr:Work

danger of inference and discrimination

Bowker and Star (1999): Sorting Things out. Classification and its consequences

Voss (2007): The Semantic Web and
why Wikipedia should bother.

reality is fuzzy, data is not
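
An "if ... then also ..." rule can be applied mechanically, which is both its strength and its danger. The sketch below applies the frbr:creator rule exactly as stated on the slide to a set of triples. The `ex:` identifiers are hypothetical.

```python
def apply_creator_rule(triples):
    """Rule as stated: a frbr:creator B  =>  B rdf:type frbr:Work."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "frbr:creator":
            inferred.add((o, "rdf:type", "frbr:Work"))
    return inferred

data = {("ex:a", "frbr:creator", "ex:b")}
for triple in sorted(apply_creator_rule(data)):
    print(triple)
```

The rule fires on every matching triple without judging it. If one source uses the predicate in the opposite direction, a person ends up typed as a work, and nothing in the data flags the mistake.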

Semantic Tagging

assign controlled concepts to resources

subject indexing reinvented

practised at BBC (!) with DBpedia concepts

SKOS and CommonTags ontology

Open Issues

user interfaces for query and display

data quality (needs humans)

fuzzy concepts and mapping (e.g. languages)

versioning and changes

underestimated regularly

interesting research topics

References

Wikipedia itself (practise editing and discussion!)

Kinzler (2008): Automatischer Aufbau eines multilingualen Thesaurus durch Extraktion semantischer und lexikalischer Relationen aus der Wikipedia.

Kobilarov, Bizer, Auer & Lehmann (2009): DBpedia - A Linked Data Hub and Data Source for Web and Enterprise Applications.

Voß (2006): Collaborative thesaurus tagging the Wikipedia way.

What do you think?


Jakob Voß: Wikipedia as a basis for the collaborative creation of concept networks
25 June 2009 at Fachhochschule Hannover


Unless stated otherwise, the contents of this presentation are released by Jakob Voß under the Creative Commons Attribution-Share Alike 3.0 Unported license.