OpenHPI 6.6 - Named Entity Recognition

This file is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0). Dr. Harald Sack, Hasso Plattner Institute for IT Systems Engineering, University of Potsdam, Spring 2013. Semantic Web Technologies, Lecture 6: Applications in the Web of Data, 06: Named Entity Recognition.

Upload: harald-sack

Post on 08-Sep-2014

TRANSCRIPT

Page 1: OpenHPI 6.6 - Named Entity Recognition

This file is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0)

Dr. Harald Sack

Hasso Plattner Institute for IT Systems Engineering

University of Potsdam

Spring 2013

Semantic Web Technologies

Lecture 6: Applications in the Web of Data

06: Named Entity Recognition

Page 2: OpenHPI 6.6 - Named Entity Recognition

Semantic Web Technologies, Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam

Lecture 6: Applications in the Web of Data

Open HPI - Course: Semantic Web Technologies

Page 3: OpenHPI 6.6 - Named Entity Recognition

06 - Named Entity Recognition

Open HPI - Course: Semantic Web Technologies - Lecture 6: Applications in the Web of Data

Page 4: OpenHPI 6.6 - Named Entity Recognition

Semiotic triangle (figure)

Symbol -- symbolizes --> Concept
Concept -- refers to --> Object
Symbol -- stands for --> Object

Example shown: "Jaguar"

Further labels in the figure: Context, Pragmatics, Meaning, Experience (twice), sender, receiver

Ogden, Richards: The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism (1923)

Page 5: OpenHPI 6.6 - Named Entity Recognition

Armstrong

Page 6: OpenHPI 6.6 - Named Entity Recognition
Page 7: OpenHPI 6.6 - Named Entity Recognition

'Armstrong' is more than just a character string

Page 18: OpenHPI 6.6 - Named Entity Recognition

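'Armstrong' is more than just a character string

Neil Armstrong -- is a --> Astronaut
Neil Armstrong -- is a --> Person
Astronaut -- subClassOf --> Science Occupation
Science Occupation -- subClassOf --> Employment
Cosmonaut -- same as --> Astronaut
Juri Gagarin -- is a --> Cosmonaut

Further edge labels in the figure: has an, is NOT a

Layer labels in the figure: Entities, Ontologies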
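The relations in the figure can be read as a very small knowledge graph. The following is a minimal sketch, assuming plain Python dictionaries instead of an RDF store; the names `IS_A`, `SUBCLASS_OF`, and `types_of` are hypothetical and not part of the lecture. It derives every class an entity belongs to by following the subClassOf edges. The "same as", "has an", and "is NOT a" edges are left out to keep the sketch short.

```python
# Direct type assertions ("is a") taken from the slide.
IS_A = {
    "Neil Armstrong": {"Astronaut", "Person"},
    "Juri Gagarin": {"Cosmonaut"},
}

# Class hierarchy ("subClassOf") taken from the slide.
SUBCLASS_OF = {
    "Astronaut": {"Science Occupation"},
    "Science Occupation": {"Employment"},
}


def types_of(entity):
    """Return all classes of an entity, following subClassOf transitively."""
    found = set()
    todo = list(IS_A.get(entity, ()))
    while todo:
        cls = todo.pop()
        if cls not in found:
            found.add(cls)
            todo.extend(SUBCLASS_OF.get(cls, ()))
    return found


print(sorted(types_of("Neil Armstrong")))
```

The point of the sketch is that the string 'Armstrong', once mapped to an entity, gives access to facts that are never stated in the text itself, such as the entity being an instance of Employment via two subClassOf steps.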

Named Entity Recognition

(also Entity Identification or Entity Extraction)

"locating and classifying atomic elements ... into predefined categories such as names, persons, organizations, locations, expressions of time, quantities, monetary values, etc."

C. J. van Rijsbergen, Information Retrieval (1979)
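To make "locating and classifying" concrete, here is a toy sketch that assumes a hand-written gazetteer and two regular expressions. The names `GAZETTEER`, `PATTERNS`, and `recognize`, as well as the category assignments, are hypothetical; real recognizers rely on linguistic analysis and trained models rather than fixed lists.

```python
import re

# Hypothetical gazetteer: surface form -> category.
GAZETTEER = {
    "Neil Armstrong": "person",
    "NASA": "organization",
    "Moon": "location",
}

# Illustrative patterns for two non-name categories.
PATTERNS = {
    "time": re.compile(r"\b(?:19|20)\d{2}\b"),
    "monetary value": re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?"),
}


def recognize(text):
    """Return (start, end, surface form, category) tuples sorted by position."""
    hits = []
    for name, category in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            hits.append((m.start(), m.end(), m.group(), category))
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), m.group(), category))
    return sorted(hits)


for hit in recognize("In 1969 NASA sent Neil Armstrong to the Moon."):
    print(hit)
```

Each hit carries both the position in the text (locating) and a category (classifying). Deciding which concrete entity a name refers to is a separate step.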

Page 19: OpenHPI 6.6 - Named Entity Recognition

Where does the knowledge come from...?

Page 22: OpenHPI 6.6 - Named Entity Recognition

Web of Data = Linked Open Data

Page 23: OpenHPI 6.6 - Named Entity Recognition

Armstrong

Page 24: OpenHPI 6.6 - Named Entity Recognition

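Entity Mapping

Armstrong (character string) --> Neil Armstrong (entity)

Neil Armstrong -- is a --> Astronaut
Neil Armstrong -- is a --> Person
Astronaut -- subClassOf --> Science Occupation
Science Occupation -- subClassOf --> Employment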

Page 26: OpenHPI 6.6 - Named Entity Recognition

Semiotic triangle (figure)

Symbol -- symbolizes --> Concept
Concept -- refers to --> Object
Symbol -- stands for --> Object

Example shown: Armstrong

Further labels in the figure: Context, Pragmatics, Meaning, Experience (twice), sender, receiver

Ogden, Richards: The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism (1923)

http://commons.wikimedia.org/wiki/User:McSmit

Page 27: OpenHPI 6.6 - Named Entity Recognition

Armstrong landed the Eagle on the Moon.

Page 28: OpenHPI 6.6 - Named Entity Recognition

Armstrong landed the Eagle on the Moon.

Determine all possible Entity Mapping Candidates

• linguistic analysis (POS tagging)
• normalization
• encoding and spelling
• special (language dependent) characters
• language dependent spellings
• abbreviations, acronyms
• type dependent spellings
• alternative names and synonyms
• fuzzy string mapping
• ...
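Two of the steps listed above, normalization and fuzzy string mapping, can be sketched with the Python standard library. This is only an illustration under stated assumptions: `ENTITY_LABELS` is a small excerpt of labels from the slides, the helper names `normalize` and `candidates` are hypothetical, and the similarity cutoff of 0.8 is an arbitrary choice. A real system would query a knowledge base and also handle abbreviations and synonyms.

```python
import difflib
import unicodedata

# Small excerpt of entity labels; a real system would query a knowledge base.
ENTITY_LABELS = [
    "Neil Armstrong",
    "Louis Armstrong",
    "Lance Armstrong",
    "Armstrong, Ontario",
    "Armstrong (moon crater)",
    "Eagle (lunar module)",
    "Moon",
]


def normalize(label):
    """Lowercase, strip accents, and replace punctuation by spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def candidates(term, cutoff=0.8):
    """Return labels containing a token equal or similar to a single-word term."""
    query = normalize(term)
    result = []
    for label in ENTITY_LABELS:
        tokens = normalize(label).split()
        if difflib.get_close_matches(query, tokens, n=1, cutoff=cutoff):
            result.append(label)
    return result


print(candidates("Armstrong"))
print(candidates("Armstrnog"))  # misspelled on purpose
```

Matching on normalized tokens rather than on whole labels keeps the candidate set deliberately broad; narrowing it down is the job of the later entity selection step.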

Page 30: OpenHPI 6.6 - Named Entity Recognition

Armstrong landed the Eagle on the Moon.

Determine all possible Entity Mapping Candidates

Armstrong, Florida
Armstrong, Ontario
Armstrong County, Texas
Armstrong Tunnel
Louis Armstrong
Armstrong Tools
Armstrong (moon crater)
Armstrong (car)
The Armstrongs
Craig Armstrong
Anton Armstrong
Edward Armstrong
Gary Armstrong
George Armstrong
The Armstrong Twins
Ian Armstrong
Neil Armstrong
Armstrong Bridge
Lance Armstrong
+ 400 more...

Page 31: OpenHPI 6.6 - Named Entity Recognition

Armstrong landed the Eagle on the Moon.

Entity Selection process is determined by
• context
• ambiguity of source data / mapping
• accuracy / reliability of source data / mapping

Page 33: OpenHPI 6.6 - Named Entity Recognition

Candidate entities (figure). Candidate counts shown in the figure: 95 entities, 448 entities, 156 entities.

Armstrong
George Armstrong Custer
Neil Armstrong
The Armstrong Twins
Armstrong, Florida
Armstrong, Ontario
Armstrong Automobile
Joe Armstrong
Armstrong County, Texas
Armstrong Gun
Craig Armstrong
Armstrong (Moon Crater)
Louis Armstrong
Armstrong Tunnel
Louis Armstrong International Airport
Armstrong's Theorem
Sir Thomas Armstrong
Ian Armstrong
Armstrong (British Columbia)
Karen Armstrong
Curtis Armstrong
Gillian Armstrong
Hilary Armstrong
William L. Armstrong

Eagle
Eagle (Bird)
Eagle (heraldry)
USCGC Eagle
The Eagle (2011 film)
Eagle (song)
John H. Eagle
Eagle (typeface)
Eagle Falls (Washington)
Eagle (Moon Crater)
Eagle (comic)
Eagle (lunar module)
Eagle TV
The Eagle (Pub)
War Eagle
The Eagle (newspaper)
Eagle (racehorse)
Angela Eagle
Linda Eagle
James Philipp Eagle

Moon
Man on the Moon (film)
Moon (song)
Moon Son-Ri
C Moon
The Moon (Tarot card)
Edgar Moon
Moon OS
Moon (Band)
Moon
Moon 44
Man on the Moon (soundtrack)
William Moon
Lottie Moon
Mr. Moon (song)
Man on the Moon (musical)
Darvin Moon
Moon 83
Francis Moon
Gary Moon
Robert Charles Moon
Black Moon
Allan Moon
Ban-Ki Moon
Fly me to the Moon (song)

Consider all entities within the same context

Armstrong landed the Eagle on the Moon.

Page 34: OpenHPI 6.6 - Named Entity Recognition

Named Entity Recognition Strategies

Entity Selection Process

Select matching entities from all possible candidate entities:

• Popularity based strategies

• Linguistic strategies

• Statistical strategies

• Semantic based strategies

General Approach

1. Make an assumption

2. Check whether the strategies support or contradict your assumption

3. Make a decision according to logical and probabilistic rules/constraints

• reference text corpus (Wikipedia)

• link graph (Wikipedia)

• semantic graph (DBpedia)

N. Ludwig, H. Sack: "Named entity recognition for user-generated tags", TIR 2011
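The general approach above can be sketched as a weighted vote: each candidate is assumed in turn, every strategy reports how strongly it supports that assumption, and the best supported candidate wins. This is a minimal sketch, not the method of the cited paper. All scores, word sets, and weights are invented for illustration, and only two of the four strategy families are shown.

```python
# All scores below are invented for illustration, not measured values.
POPULARITY = {
    "Louis Armstrong": 0.9,
    "Neil Armstrong": 0.8,
    "Armstrong, Ontario": 0.1,
}
CONTEXT_WORDS = {
    "Louis Armstrong": {"jazz", "trumpet", "music"},
    "Neil Armstrong": {"moon", "eagle", "apollo", "landed"},
    "Armstrong, Ontario": {"canada", "township"},
}


def popularity_strategy(candidate, context):
    """Support grows with the (hypothetical) prior popularity of the candidate."""
    return POPULARITY.get(candidate, 0.0)


def statistical_strategy(candidate, context):
    """Share of context words that co-occur with the candidate."""
    words = CONTEXT_WORDS.get(candidate, set())
    return len(words & context) / len(context) if context else 0.0


# (strategy, weight) pairs; the weights are an arbitrary assumption.
STRATEGIES = [(popularity_strategy, 0.3), (statistical_strategy, 0.7)]


def select_entity(candidates, context):
    """Assume each candidate in turn and keep the best supported one."""
    def support(candidate):
        return sum(w * strategy(candidate, context) for strategy, w in STRATEGIES)
    return max(candidates, key=support)


context = {"landed", "eagle", "moon"}
print(select_entity(list(POPULARITY), context))
```

Without any context the popularity prior decides alone, which is why context is listed first among the factors that determine entity selection.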

Page 41: OpenHPI 6.6 - Named Entity Recognition

Entity Selection Process: (Semantic) Graph Analysis

Armstrong landed the Eagle on the Moon.

Candidate entities for Armstrong, Eagle and Moon as listed on Page 33 (95 entities, 448 entities, 156 entities).

Entities listed separately from the candidate sets on this slide:
Neil Armstrong
Eagle (lunar module)
Moon
Louis Armstrong
Fly me to the Moon (song)

N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013
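One way to read the graph analysis step is as a joint choice: pick one candidate per term so that the chosen entities are as strongly connected as possible. The sketch below illustrates that idea only and is not the algorithm of the cited paper. `CANDIDATES` is a small excerpt of the candidate lists, the entries in `LINKS` are hypothetical and would in practice come from a link graph or semantic graph, and the exhaustive search is feasible only for tiny candidate sets.

```python
from itertools import combinations, product

# Candidate entities per term, a small excerpt of the lists on the slides.
CANDIDATES = {
    "Armstrong": ["Louis Armstrong", "Neil Armstrong", "Armstrong Tunnel"],
    "Eagle": ["Eagle (Bird)", "Eagle (lunar module)", "Eagle (song)"],
    "Moon": ["Moon (song)", "Moon", "Fly me to the Moon (song)"],
}

# Hypothetical undirected links between entities.
LINKS = {
    frozenset(pair)
    for pair in [
        ("Neil Armstrong", "Eagle (lunar module)"),
        ("Neil Armstrong", "Moon"),
        ("Eagle (lunar module)", "Moon"),
        ("Louis Armstrong", "Fly me to the Moon (song)"),
    ]
}


def coherence(entities):
    """Count the links among a set of jointly chosen entities."""
    return sum(frozenset(pair) in LINKS for pair in combinations(entities, 2))


def disambiguate(candidates):
    """Pick one entity per term so that the chosen entities are best connected."""
    terms = list(candidates)
    best = max(product(*(candidates[t] for t in terms)), key=coherence)
    return dict(zip(terms, best))


print(disambiguate(CANDIDATES))
```

With the assumed links, the combination Neil Armstrong, Eagle (lunar module), Moon forms a fully connected triangle, while a combination around Louis Armstrong reaches at most one link.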

Page 42: OpenHPI 6.6 - Named Entity Recognition

Armstrong landed the Eagle on the Moon.

http://dbpedia.org/resource/Neil_Armstrong

http://dbpedia.org/resource/Apollo_Lunar_Module

http://dbpedia.org/resource/Moon

Page 43: OpenHPI 6.6 - Named Entity Recognition

07 - Semantic Search

Open HPI - Course: Semantic Web Technologies - Lecture 6: Applications in the Web of Data