semantic media mining seminar - kickoff

Semantic Media Mining Bachelor Seminar - WS 2015/16

Dr. Harald Sack / Jörg Waitelonis / Magnus Knuth / Tamara Bobic / Dinesh Reddy / Tabea Tietz

Hasso-Plattner-Institut für Softwaresystemtechnik



Semantic Media Mining

1. Tutors

2. Semantic Media Mining

3. Seminar Challenges

4. Administrative Issues

Seminar Semantic Media Mining, Dr. Harald Sack et al., Hasso-Plattner-Institute, University of Potsdam, WS 2015/16

Semantic Media Mining

Dr. Harald Sack

● Head of Research Group “Semantic Technologies”

● Senior Researcher at Hasso Plattner Institute (HPI)

○ Research Topics

■ Semantic Web Technologies

■ Ontological Engineering

■ Information Retrieval

■ Multimedia Analysis & Retrieval

■ Knowledge Mining

■ Data/Information Visualization

○ Research Projects

My apologies for not attending the first seminar lectures!

I’m representing HPI and our Semantic Web Technology research at the following conferences:

Semantic Media Mining

Dipl.-Inform. Jörg Waitelonis

● Computer Science Univ. of Jena, 2006

● 2006-2008 Start-up Activities (osotis, yovisto)

● Developer for Multimedia Portal ETH-Zürich, CH

● Since 2009 at HPI

● Research: Semantic Web, Linked Data, Multimedia Retrieval, Semantic Search Technologies


Semantic Media Mining

Dipl.-Inf. Magnus Knuth

● Studied Information Science @ Uni Leipzig

● 2007-2010: Research Assistant @ Institute for Medical Informatics, Statistics and Epidemiology Leipzig (imise)

● since 2010: PhD student @ HPI

○ Semantic Web, Linked Data Cleansing, Linked Data Change Management, Knowledge Management, Read-Write-Web


Semantic Media Mining

Tamara Bobic, M.Sc.

● B.Sc. in Computer Science @ Belgrade, Serbia

● M.Sc. “Life Science Informatics” @ Uni Bonn

● PhD student @ HPI (since June 2014)

● Research interests: Semantic Web, Fact Ranking, Recommender Systems, Knowledge Engineering


Semantic Media Mining

Dinesh Reddy, M.Sc.

● Studied Life Science Informatics @ Uni Bonn until March 2014

● 2012-2014: Research Assistant @ Fraunhofer Institute for Algorithms and Scientific Computing, St. Augustin

● since May 2014: PhD student @ HPI

○ Semantic Web Technologies, Knowledge Engineering, Linked Data, Temporal Mining


Semantic Media Mining

Tabea Tietz, B.A.

● 2014: B.A. Economics and Social Science @ Potsdam University

● since 2014: M.A. studies @ Potsdam University

● 2010 - 2015: Student coworker @ HPI

● 2014 - 2015: Scholarship @ MIZ-Babelsberg

● since 2015: Scientific coworker @ HPI

● Interests: Semantic Web, Linked Data, DBpedia, Visualization


Semantic Media Mining

Semantic Technologies and Multimedia Research Group

http://semex.hpi.uni-potsdam.de/semex/

http://semex.hpi.uni-potsdam.de/mggui-dev2/#search

http://blog.yovisto.com/the-hero-of-mushroom-kingdom-turns-27-super-mario/

http://commons.dbpedia.org/

http://dbpedia.org/

http://de.dbpedia.org/

Semantic Technologies Research Group Blog: http://s16a.org/

http://linkeddata.org/


Semantic Media Mining

1. Tutors

2. Semantic Media Mining

3. Seminar Challenges

4. Administrative Issues


The Semantic Web

● Extension of the WWW with formal knowledge representations (ontologies)

● Information in natural language is explicitly annotated with semantic metadata

● Semantic metadata encode the meaning (semantics) of the information content and can be read and correctly interpreted (= understood) by machines


● Semantics

○ OWL, RDFS, SKOS, ...

● Model = RDF

● Syntax

○ N3, Turtle, XML

○ RDFa, JSON-LD

● Web Platform

○ URI/IRI, HTTP

○ UNICODE, AUTH

Semantic Media Mining


Semantic Media Mining

http://dbpedia.org/resource/Neil_Armstrong

Semantic Media Mining

structured data

semantic data


Semantic Media Mining

http://dbpedia.org/resource/Neil_Armstrong


RDF - Resource Description Framework


RDF Triple: Subject -- Property -- Object

Subject: http://dbpedia.org/resource/Neil_Armstrong (Neil Armstrong)

Property: http://www.w3.org/1999/02/22-rdf-syntax-ns#type (rdf:type)

Object: http://dbpedia.org/ontology/Astronaut (Astronaut)

URIs for unique identification

Entities and classes:

http://dbpedia.org/resource/Neil_Armstrong (entity) -- is a -- http://dbpedia.org/ontology/Astronaut (class)

http://dbpedia.org/ontology/Astronaut (class) -- is a subclass of -- http://dbpedia.org/ontology/Person (class)
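The "is a" and "is a subclass of" links above let a machine derive facts that are never stated explicitly. The following is a minimal sketch of that idea, assuming triples are held as plain Python tuples and that "is a subclass of" corresponds to the standard `rdfs:subClassOf` property; the function name `infer_types` is hypothetical and not part of the seminar material.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

# The two statements from the slide, as (subject, property, object) tuples.
triples = {
    (DBR + "Neil_Armstrong", RDF_TYPE, DBO + "Astronaut"),
    (DBO + "Astronaut", RDFS_SUBCLASS_OF, DBO + "Person"),
}


def infer_types(graph):
    """Return the graph plus all rdf:type triples implied by rdfs:subClassOf."""
    inferred = set(graph)
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for cls, q, parent in list(inferred):
                if q == RDFS_SUBCLASS_OF and cls == o:
                    new_triple = (s, RDF_TYPE, parent)
                    if new_triple not in inferred:
                        inferred.add(new_triple)
                        changed = True
    return inferred


closure = infer_types(triples)
# The derived fact: Neil Armstrong is also a Person.
print((DBR + "Neil_Armstrong", RDF_TYPE, DBO + "Person") in closure)
```

The nested loop is quadratic and only meant for toy graphs; a triple store or reasoner would normally do this step.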


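Classes vs. Entities

Classes:

http://dbpedia.org/ontology/Person -- has birthdate -- xsd:date

http://dbpedia.org/ontology/Person -- has birthplace -- http://dbpedia.org/ontology/City

Entities:

http://dbpedia.org/resource/Neil_Armstrong -- is a -- http://dbpedia.org/ontology/Person

http://dbpedia.org/resource/Neil_Armstrong -- has birthdate -- "1930-08-05"

http://dbpedia.org/resource/Neil_Armstrong -- has birthplace -- http://dbpedia.org/resource/Wapakoneta,_Ohio

http://dbpedia.org/resource/Wapakoneta,_Ohio -- is a -- http://dbpedia.org/ontology/City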


RDF - Resource Description Framework

● Triple

○ Subject

○ Property

○ Object

RDF Statement = Subject + Property + Object

○ Subject: URI

○ Property: URI

○ Object: URI or Literal


RDF - Turtle Serialization (1)

Each statement lists subject (S), property (P) and object (O), followed by a dot.

Object is a resource:

<http://dbpedia.org/resource/Neil_Armstrong> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Astronaut> .

Object is a literal:

<http://dbpedia.org/resource/Neil_Armstrong> <http://dbpedia.org/ontology/birthDate> "1930-08-05" .


RDF – Turtle Serialization (2)

@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbo: <http://dbpedia.org/ontology/> .

dbr:Neil_Armstrong rdf:type dbo:Astronaut .
dbr:Neil_Armstrong dbo:birthDate "1930-08-05" .

The same two statements, with the shared subject written only once (";" separates property-object pairs):

@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbo: <http://dbpedia.org/ontology/> .

dbr:Neil_Armstrong rdf:type dbo:Astronaut ;
    dbo:birthDate "1930-08-05" .

http://www.w3.org/TR/rdf11-primer/
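How the prefixed names above map back to full URIs can be sketched in a few lines of Python. This is not a Turtle parser; it only illustrates prefix expansion, assuming statements are already split into three terms. The helper names `expand` and `to_ntriples` are hypothetical.

```python
# Prefix table as declared with @prefix on the slide.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbo": "http://dbpedia.org/ontology/",
}


def expand(term):
    """Expand a prefixed name to <full URI>; quoted literals are left unchanged."""
    if term.startswith('"'):
        return term
    prefix, local = term.split(":", 1)
    return "<" + PREFIXES[prefix] + local + ">"


def to_ntriples(statements):
    """Turn (subject, property, object) tuples into one full-URI line each."""
    return [" ".join(expand(t) for t in stmt) + " ." for stmt in statements]


statements = [
    ("dbr:Neil_Armstrong", "rdf:type", "dbo:Astronaut"),
    ("dbr:Neil_Armstrong", "dbo:birthDate", '"1930-08-05"'),
]

for line in to_ntriples(statements):
    print(line)
```

Because expansion is plain string concatenation, the trailing "/" or "#" of a namespace URI matters: without it, `dbr:Neil_Armstrong` would expand to a different, unintended URI.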

Semantic Media Mining


http://dbpedia.org/resource/Neil_Armstrong

Semantic Media Mining

1. Tutors

2. Semantic Media Mining

3. Seminar Challenges

4. Administrative Issues


Semantic Media Mining

Seminar Challenges

1. Sound2Triple - Convert audio features to RDF

2. DBpedia - Mining Implicit Knowledge

3. Temporal Information Extraction

4. Important knowledge coverage in the LOD cloud


Seminar Challenges

1. Sound2Triple - Convert audio features to RDF

● detect different kinds of audible events:

○ silence

○ speech

○ music

○ changes in dynamics

○ changes in speed

○ etc.

● use existing tools, e.g.

○ http://essentia.upf.edu/

○ http://www.praat.org/

○ http://clam-project.org/

● create an RDF annotation using the Open Annotation Model

● visualize the annotation, if possible in real time, e.g. with the Web Audio API: http://www.w3.org/TR/webaudio/

○ inspired by http://ianreah.com/2013/02/28/Real-time-analysis-of-streaming-audio-data-with-Web-Audio-API.html
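To make the event detection step above concrete, here is a toy sketch that separates silence from sound using the RMS energy of fixed-size frames. It is a stand-in for illustration only: the tools listed above provide far richer features, and telling speech from music needs more than an energy threshold. The function names, the frame size and the threshold value are all assumptions.

```python
import math


def frame_rms(samples, frame_size):
    """RMS energy of each non-overlapping frame of frame_size samples."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def detect_events(samples, sample_rate, frame_size=1000, silence_threshold=0.01):
    """Label frames 'silence' or 'sound' and merge runs into (label, start_s, end_s)."""
    events = []
    for idx, rms in enumerate(frame_rms(samples, frame_size)):
        label = "silence" if rms < silence_threshold else "sound"
        start = idx * frame_size / sample_rate
        end = (idx + 1) * frame_size / sample_rate
        if events and events[-1][0] == label:
            # Same label as the previous frame: extend the running event.
            events[-1] = (label, events[-1][1], end)
        else:
            events.append((label, start, end))
    return events


# Synthetic test signal: 1 s of silence followed by 1 s of a 440 Hz tone.
rate = 8000
signal = [0.0] * rate + [
    0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)
]
print(detect_events(signal, rate))
```

The start and end times in seconds are exactly what a media fragment identifier of the form `#t=start,end` needs.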


Seminar Challenges

1. Sound2Triple - Convert audio features to RDF

Open Annotation Model

Media Fragment Identifier: http://example.org/audio.mp3#t=23,55

Event: increasing volume

RDF Triples:

<http://example.org/A-1> oac:hasTarget <http://example.org/audio.mp3#t=23,55> .
<http://example.org/A-1> oac:hasBody <http://example.org/event1> .
<http://example.org/event1> cnt:chars "increasing volume" .
<http://example.org/event1> cnt:characterEncoding "utf-8" .
<http://example.org/event1> rdf:type cnt:ContentAsText .
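Generating the annotation triples above from a detected event could look like the following sketch. It reuses the `oac:` and `cnt:` prefixed names and the `http://example.org/` URIs exactly as they appear on the slide; the function name `annotation_triples` and its parameters are hypothetical, and the sketch assumes the event text contains no double quotes (no escaping is done).

```python
def annotation_triples(index, media_uri, start, end, text):
    """Return the five Turtle-style statements for one audio event.

    start and end are offsets in seconds and become the media fragment
    identifier #t=start,end on the target URI.
    """
    annotation = f"<http://example.org/A-{index}>"
    body = f"<http://example.org/event{index}>"
    target = f"<{media_uri}#t={start},{end}>"
    return [
        f"{annotation} oac:hasTarget {target} .",
        f"{annotation} oac:hasBody {body} .",
        f'{body} cnt:chars "{text}" .',
        f'{body} cnt:characterEncoding "utf-8" .',
        f"{body} rdf:type cnt:ContentAsText .",
    ]


for line in annotation_triples(
    1, "http://example.org/audio.mp3", 23, 55, "increasing volume"
):
    print(line)
```

A complete Turtle document would additionally need `@prefix` declarations for `oac:`, `cnt:` and `rdf:`.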

Seminar Challenges

1. Sound2Triple - Convert audio features to RDF

Visualization: e.g. consume the triple stream and visualize events on a timeline

[Figure: timeline showing event 1, event 2, event 3, event 4, ...]

Seminar Challenges

2. DBpedia - Mining Implicit Knowledge from DBpedia Categories

● Wikipedia Categories of Wilhelm Conrad Röntgen:

https://en.wikipedia.org/wiki/Wilhelm_R%C3%B6ntgen

http://dbpedia.org/resource/Wilhelm_R%C3%B6ntgen

● Triples extracted from this article section in article-categories_en.ttl:

<http://dbpedia.org/resource/Wilhelm_Röntgen> <http://purl.org/dc/terms/subject>
    <http://dbpedia.org/resource/Category:1845_births> ,
    <http://dbpedia.org/resource/Category:1923_deaths> ,
    <http://dbpedia.org/resource/Category:Wilhelm_Röntgen> ,
    <http://dbpedia.org/resource/Category:People_from_Remscheid> ,
    <http://dbpedia.org/resource/Category:ETH_Zurich_alumni> ,
    <http://dbpedia.org/resource/Category:Experimental_physicists> ...

Seminar Challenges

2. DBpedia - Mining Implicit Knowledge from DBpedia Categories

● DBpedia category memberships contain implicit information about a resource, e.g. Wilhelm Conrad Röntgen:

category                          fact
1845_births                       dbo:birthDate "1845-03-27"
People_from_Remscheid             dbo:birthPlace dbp:Remscheid
University_of_Würzburg_faculty    dbo:workInstitutions dbp:University_of_Würzburg (implicit / missing)
Nobel_laureates_in_Physics        dbo:award dbp:Nobel_Prize_in_Physics (implicit / missing)

● Task:

○ Learn relationships from DBpedia properties and categories

○ Extract implicit facts for category members

○ Find inconsistencies, e.g. 1845_births ⇔ dbo:birthDate "1890-12-24"

https://en.wikipedia.org/wiki/Wilhelm_R%C3%B6ntgen

http://dbpedia.org/resource/Wilhelm_R%C3%B6ntgen
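The inconsistency check named in the task list can be sketched for the simplest case, a `<year>_births` category compared against a `dbo:birthDate` value. This is an illustration under assumptions: category names are given without the `Category:` URI prefix, birth dates are ISO strings, and both function names are hypothetical. Learning such category-to-property rules automatically is the actual challenge and is not shown.

```python
import re


def birth_year_from_categories(categories):
    """Return the year encoded in a category like '1845_births', or None."""
    for category in categories:
        match = re.fullmatch(r"(\d{1,4})_births", category)
        if match:
            return int(match.group(1))
    return None


def is_consistent(categories, birth_date):
    """Compare the category year with an ISO 'YYYY-MM-DD' birth date."""
    year = birth_year_from_categories(categories)
    if year is None:
        return True  # no birth category, so nothing contradicts the date
    return year == int(birth_date[:4])


categories = ["1845_births", "1923_deaths", "People_from_Remscheid"]
print(is_consistent(categories, "1845-03-27"))  # category and date agree
print(is_consistent(categories, "1890-12-24"))  # the inconsistency from the slide
```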

Seminar Challenges

3. Temporal information extraction

Extraction of timelines or, more generally, temporal sequences from text.

Tasks

● Temporal information extraction from Wikipedia articles

● Infer new temporal knowledge from existing temporal information

● Classification of extracted events

References

● Stanford Temporal Tagger: SUTime

● Java tools - DateParser

● Temporal Mining - Ground truth Dataset

● E. Kuzey, G. Weikum: Extraction of temporal facts and events from Wikipedia. TempWeb 2012


Timeline of Mark Zuckerberg

Seminar Challenges

3. Temporal information extraction

For example, given the following text:

Jennifer Hosten was born on 12 March 1990. She won the title Miss New York in 2010. A year later she also won the title Miss America. Jennifer is a native of Wichita, Kansas, and a 2011 graduate of Wichita High School East. Her father is Mark Wagner and her mother is Krista Wagner.

● Temporal information extraction from Wikipedia articles

○ 1990-03-12: Jennifer Hosten was born on 12 March 1990.

○ 2010: She won the title Miss New York in 2010.

○ 2011: Jennifer is a native of Wichita, Kansas, and a 2011 graduate of Wichita High School East.

● Infer new temporal knowledge from existing temporal information

○ 2011: A year later she also won the title Miss America.

● Classification of extracted events

○ Events can be classified as life, career, or education events, etc.
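The extraction and inference steps on the example text can be sketched with regular expressions. This is a deliberately naive stand-in for a temporal tagger such as SUTime: it only knows "day Month year" dates, bare four-digit years, and the phrase "A year later", which it resolves against the most recently seen year. The function name `extract_timeline` and all patterns are assumptions.

```python
import re

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

FULL_DATE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract_timeline(text):
    """Return (date, sentence) pairs in text order; undated sentences are skipped."""
    timeline = []
    last_year = None
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        full = FULL_DATE.search(sentence)
        if full:
            day, month, year = int(full.group(1)), MONTHS[full.group(2)], int(full.group(3))
            timeline.append((f"{year:04d}-{month:02d}-{day:02d}", sentence))
            last_year = year
            continue
        bare = YEAR.search(sentence)
        if bare:
            last_year = int(bare.group(1))
            timeline.append((str(last_year), sentence))
        elif sentence.startswith("A year later") and last_year is not None:
            # Inference step: relative expression resolved against the last year.
            last_year += 1
            timeline.append((str(last_year), sentence))
    return timeline


text = (
    "Jennifer Hosten was born on 12 March 1990. "
    "She won the title Miss New York in 2010. "
    "A year later she also won the title Miss America. "
    "Jennifer is a native of Wichita, Kansas, and a 2011 graduate of "
    "Wichita High School East. "
    "Her father is Mark Wagner and her mother is Krista Wagner."
)

for date, sentence in extract_timeline(text):
    print(date, "|", sentence)
```

Splitting sentences on a period followed by whitespace breaks on abbreviations, so real Wikipedia text needs a proper sentence splitter.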

Seminar Challenges

4. Important knowledge coverage in the LOD cloud

● DBpedia - a large-scale knowledge base extracted from Wikipedia

○ Most interlinked dataset in the Linked Open Data (LOD) graph

○ English version of DBpedia 2015 describes 5.9 million entities with 737 million facts in the form of RDF triples

○ How much knowledge is actually there?

● Find explicit semantic relations:

○ Dirk Nowitzki -- team -- Dallas Mavericks

○ Amsterdam -- country -- Netherlands

○ Garry Kasparov -- ??? -- Chess

○ Dalai Lama -- ??? -- Buddhism

(sample with entities is provided)


Seminar Challenges

4. Important knowledge coverage in the LOD cloud

Tasks:

● Evaluate coverage of important facts in DBpedia (sample). Facts can be found:

○ fully (Dirk Nowitzki -- team -- Dallas Mavericks),

○ partially (Garry Kasparov -- dc:subject -- Chess grandmasters),

○ indirectly (Cristiano Ronaldo -- birthPlace -- Funchal -- country -- Portugal)

● Complement missing information with interlinked datasets (Yago, CIA World Factbook)

○ e.g. from Wikidata: Garry Kasparov -- sport -- Chess

● Propose new RDF triples for facts that were not fully found

○ statistically derive connecting properties (e.g. Dalai Lama -- religion -- Buddhism)

● Compare coverage of LOD to traditional knowledge bases

References:

● DBpedia - A Crystallization Point for the Web of Data

● https://open.hpi.de/courses/semanticweb2014 (Week 2, Week 3)

● http://www.w3.org/wiki/LinkedData

● http://lod-cloud.net/
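The three coverage levels from the task list (fully, partially, indirectly) can be illustrated on a tiny in-memory graph. This is a sketch under loud assumptions: entities are plain label strings rather than URIs, "indirectly" means exactly two hops, and "partially" is approximated by a case-insensitive substring match on object labels, which is a hypothetical heuristic and not a method given in the seminar material.

```python
def coverage(graph, subject, prop, obj):
    """Classify how the fact (subject, prop, obj) is covered by a set of triples."""
    if (subject, prop, obj) in graph:
        return "fully"
    # Indirectly: subject -> intermediate -> obj via any two properties.
    for s, _, intermediate in graph:
        if s == subject and any(
            s2 == intermediate and o2 == obj for s2, _, o2 in graph
        ):
            return "indirectly"
    # Partially: some object of the subject mentions the target label.
    for s, _, o in graph:
        if s == subject and obj.lower() in o.lower():
            return "partially"
    return "missing"


graph = {
    ("Dirk Nowitzki", "team", "Dallas Mavericks"),
    ("Garry Kasparov", "dc:subject", "Chess grandmasters"),
    ("Cristiano Ronaldo", "birthPlace", "Funchal"),
    ("Funchal", "country", "Portugal"),
}

facts = [
    ("Dirk Nowitzki", "team", "Dallas Mavericks"),
    ("Garry Kasparov", "sport", "Chess"),
    ("Cristiano Ronaldo", "country", "Portugal"),
    ("Dalai Lama", "religion", "Buddhism"),
]

for fact in facts:
    print(fact, "->", coverage(graph, *fact))
```

Against the real DBpedia dataset, the same checks would more plausibly be expressed as SPARQL queries than as loops over an in-memory set.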

Semantic Media Mining

1. Tutors

2. Semantic Media Mining

3. Seminar Challenges

4. Administrative Issues


Semantic Media Mining

Administrative Issues

● Weekly hours: 4

○ plenary sessions and individual team meetings (about 30 min. each)

● ECTS: 6

● Grading:

○ Implementation of a research application

○ Presentation of achieved results

■ Midterm presentation, final (poster) presentation, team meetings

○ Written final report of achieved results (= seminar paper)

■ about 20 pages per group

■ we provide an introduction to scientific writing and a template


Semantic Media Mining

Administrative Issues

● Teams of 3 (max. 4) students work on a common problem

● Schedule (possible changes tba):

○ 22.10.2015: Formation of student teams, technical introduction

○ 29.10.2015: First team meetings

○ 10.12.2015: Midterm presentations (plenary session)

○ 04.02.2016: Final presentations (plenary session)

○ 31.03.2016: Deadline for seminar reports


Semantic Media Mining

Administrative Issues

Bibliography:

○ H. Sack: Linked Data Technologien - Ein Überblick, in T. Pellegrini, H. Sack, S. Auer (eds.), Linked Enterprise Data, Springer Vieweg, Heidelberg, 2014, pp. 21-62.

○ C. Bizer, J. Lehmann et al.: DBpedia - A Crystallization Point for the Web of Data, in Journal of Web Semantics 7(3):154-165 (2009).

○ OpenHPI: Knowledge Engineering with Semantic Web Technologies

○ Alexander Lerch: An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics, ISBN: 978-1-1182-6682-3

Blog with seminar material:

○ http://smm2016.blogspot.de/


Semantic Media Mining

Administrative Issues

Homework:

● Find your groups of 3 (max. 4) students and sign up:

○ http://bit.ly/smm2016_doodle

● If possible, email us your group’s first and second favorite seminar topic


Next session: 22.10.2015 - Technical Introduction