Post on 23-Aug-2020

OBIE – Ontology-based Information Extraction

An Approach to Extract and Deal with

Imprecise Temporal Data and Spelling Errors

PhD Proposal

HEGLER TISSOT

Advisor: Marcos Didonet Del Fabro

Universidade Federal do Paraná

Curitiba – Brazil

Feb / 2014

• Introduction
– Context
– Motivating example
– Problem
– Objectives

• State of the art
– Information Extraction
– Ontologies
– OBIE Systems
– Temporal Information

• Proposed work
– Spelling errors
– Temporal Information

Outline

• Information Management
– Large volumes of data are available
• 80% = text (on the Internet or within companies) (Aranha, 2007)
– Unstructured data formats
• New system modeling and building techniques
– What is the challenge?
• ???

Context

• Information Management
– Text Mining
• From “Data Mining”
• A technology whose purpose is to extract non-trivial and interesting knowledge from large collections of unstructured documents
• Classification / Clustering
• Indexing for Search
• Information Extraction

Context

• Text Mining
– Classification / clustering
• Machine learning algorithms

Based on the textual content of medical records, how can a possible disease group be identified?

The most discriminant words do not necessarily represent the most suitable concepts.
(“not” >> E10 x E11)

Context

• Text Mining
– Information Extraction (IE)
• In IE, relevant information from natural language (NL) texts is identified, collected and normalized.
– NLP
» Natural Language Processing
» Exhaustive deep NL analysis of all aspects of a text
– OBIE (Ontology-based Information Extraction)

Context

Context

Unstructured Textual Content

Medical Record Sample: Blood pressure is lower. No vision complaints. Sub optimal sugar, control with retinopathy and neuropathy, high glucometer readings. Will work harder on diet. Will increase insulin by 2 units.

Information Extraction (IE) / Ontology-based Information Extraction (OBIE)

[Figure: terms extracted from the sample record: vision (OK), blood pressure (lower), glucometer (high), sugar, retinopathy, neuropathy]
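The extraction illustrated for the sample record can be sketched in a few lines of Python. This is only a minimal illustration and not the OBIE system itself: the `CONCEPTS` and `QUALIFIERS` lists are hypothetical stand-ins for what an ontology would supply, and attaching qualifiers per comma- or period-delimited clause is an assumption made to keep the example short.

```python
import re

# Hypothetical mini-lexicon; in an OBIE system these terms would come from an ontology.
CONCEPTS = ["blood pressure", "vision", "sugar", "retinopathy", "neuropathy", "glucometer"]
QUALIFIERS = ["lower", "high", "sub optimal", "no"]

SAMPLE = ("Blood pressure is lower. No vision complaints. Sub optimal sugar, "
          "control with retinopathy and neuropathy, high glucometer readings. "
          "Will work harder on diet. Will increase insulin by 2 units.")

def extract(text):
    """Return (concept, qualifiers) pairs found in each clause of the text."""
    findings = []
    # Split into clauses on periods and commas.
    for clause in re.split(r"[.,]", text.lower()):
        for concept in CONCEPTS:
            if concept in clause:
                # Keep only qualifiers that appear as whole words in the same clause.
                quals = [q for q in QUALIFIERS
                         if re.search(rf"\b{re.escape(q)}\b", clause)]
                findings.append((concept, quals))
    return findings

if __name__ == "__main__":
    for concept, quals in extract(SAMPLE):
        print(concept, quals)
```

Normalizing a pair such as ("vision", ["no"]) into "vision OK" would need domain rules that this sketch does not attempt.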

Where to apply?

1. Information Extraction
• Medical records
– Statistical view from unstructured data
• Internet data/news/...
– Knowing more about competitors
• Social networks
– How to sort and identify specific profiles?
» (e.g. drug dealers)
• Documents
– How many (Word, PDF, ...) documents do you have on your computer/server? What do they say?

Context

2. Answering natural language queries

Context

Which patients presented symptoms X, Y or Z in cases of diseases A, B or C in the last two years?

Result: Patient 1, Patient 2, Patient 3, ...

“match”

3. Semantic + Analytical Data

Context

Who were the best customers in the last six months?

Medical Record Example (in Portuguese)

Motivating Example

Temporal Information (Precise + Imprecise)

Motivating Example

Spelling Errors

Motivating Example

– Extract temporal information from text
– Organize events in a timeline
– Uncertainty → imprecise temporal data
• “a few weeks ago”
• “the coming months”
• “around 10:00 am”
• “in the beginning of next month”
– Spelling errors
• “in the last tree days”

Problem
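One way to make such imprecise expressions usable on a timeline is to map each one to an interval with an earliest and a latest bound. The sketch below is illustrative only. The function names and every numeric choice ("a few" meaning 2 to 4, "around" meaning plus or minus 30 minutes, "beginning of the month" meaning the first 10 days) are assumptions and are not taken from the proposal.

```python
from datetime import datetime, timedelta

def few_weeks_ago(ref, low=2, high=4):
    """'a few weeks ago' -> (earliest, latest), assuming 'a few' means low..high weeks."""
    return ref - timedelta(weeks=high), ref - timedelta(weeks=low)

def around(moment, tolerance=timedelta(minutes=30)):
    """'around 10:00 am' -> (earliest, latest), assuming a symmetric tolerance."""
    return moment - tolerance, moment + tolerance

def beginning_of_next_month(ref, days=10):
    """'in the beginning of next month' -> the first `days` days of the following month."""
    if ref.month == 12:
        year, month = ref.year + 1, 1
    else:
        year, month = ref.year, ref.month + 1
    start = datetime(year, month, 1)
    return start, start + timedelta(days=days)

if __name__ == "__main__":
    ref = datetime(2014, 2, 15, 9, 0)  # hypothetical date of the medical record
    print(few_weeks_ago(ref))
    print(around(datetime(2014, 2, 15, 10, 0)))
    print(beginning_of_next_month(ref))
```

With a fixed reference date, each expression becomes a pair of bounds that can be ordered or compared against precise timestamps.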

OBIE approach to extract and deal with:

– Uncertain Temporal Information

– Spelling Errors

Objective

(Bird et al., 2009)

Information Extraction

(Nedellec and Nazarenko, 2006)

Ontology-based Information Extraction (OBIE)

Ontologies

– Formal specification of concepts
– Knowledge Domain

Classes + Instances + Properties + Relations + Axioms = Formal Conceptualization

(Gruber, 1993)

Web Ontology Language (OWL)

Ontology
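The components listed above can be pictured as a small data structure. The following is a rough sketch in plain Python and not OWL: the `Ontology` class, its field names and the medical terms in the example are all hypothetical, and the only axiom modelled is a subclass hierarchy.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    classes: set = field(default_factory=set)
    subclass_of: dict = field(default_factory=dict)  # axiom: child class -> parent class
    instances: dict = field(default_factory=dict)    # instance -> class
    properties: dict = field(default_factory=dict)   # (instance, property) -> value
    relations: set = field(default_factory=set)      # (subject, relation, object)

    def is_a(self, instance, cls):
        """True if the instance belongs to cls directly or through the subclass axioms."""
        current = self.instances.get(instance)
        while current is not None:
            if current == cls:
                return True
            current = self.subclass_of.get(current)
        return False

if __name__ == "__main__":
    onto = Ontology(
        classes={"Finding", "Complication", "Measurement"},
        subclass_of={"Complication": "Finding"},
        instances={"retinopathy": "Complication", "blood pressure": "Measurement"},
        properties={("blood pressure", "trend"): "lower"},
        relations={("retinopathy", "complicationOf", "diabetes")},
    )
    print(onto.is_a("retinopathy", "Finding"))  # follows Complication -> Finding
```

A real OBIE system would load these elements from an OWL file instead of defining them inline.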

– Process unstructured text (natural language)
– Guided by ontologies
– Present output using ontologies
– Specific knowledge domain extraction
(Wimalasuriya and Dou, 2010)

– Desired features
• String similarity
• Inexact matching
• Large repositories
• Multiple ontologies
• Temporal information

OBIE Systems

OBIE General Framework

2. State of the Art

– Organize events in a timeline
– Establish chronological order
– Answer temporal questions (Wong et al., 2005)
• “Which were the most prescribed drugs in the last weeks?”
• “Who used aspirin before having <symptom>?”
• “When did <event-description> happen?”
– Challenges (Temporal Information Extraction)
• Linguistics: different expressions
• Reference resolution: “tomorrow”
• Negation: “not before”, “it didn't happen last year”

Temporal Information Extraction

Tokens that represent a temporal entity (point in time, duration, frequency) (Sanampudi and Kumari, 2010)

• Explicit
– January 2013
• Implicit
– Christmas 2012
• Relative (indexed)
– Yesterday, next month, three days ago (Alonso et al., 2007)
• Vague
– Several weeks, in the next days (Schilder and Habel, 2003)

Temporal Expressions
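The four categories can be told apart, at least for the examples on this slide, with a handful of patterns. The sketch below is a toy classifier for English only. The pattern lists are hypothetical and far from complete, and checking "vague" before "relative" is a design assumption so that an expression such as "a few weeks ago" is treated as vague.

```python
import re

MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")
UNITS = "days?|weeks?|months?|years?"

# Order matters: the first matching category wins.
PATTERNS = [
    ("explicit", re.compile(rf"\b({MONTHS})\s+\d{{4}}\b")),
    ("implicit", re.compile(r"\b(christmas|easter)\s+\d{4}\b")),
    ("vague", re.compile(rf"\b(several|a few|some)\s+({UNITS})\b"
                         r"|\bin the next (days|weeks|months|years)\b")),
    ("relative", re.compile(rf"\b(yesterday|today|tomorrow)\b"
                            rf"|\b(next|last)\s+({UNITS})\b"
                            rf"|\b\w+\s+({UNITS})\s+ago\b")),
]

def classify(expression):
    """Return the first category whose pattern matches, or 'unknown'."""
    text = expression.lower()
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return "unknown"

if __name__ == "__main__":
    for example in ["January 2013", "Christmas 2012", "three days ago", "several weeks"]:
        print(example, "->", classify(example))
```

Portuguese expressions, which the proposal also targets, would need their own pattern lists.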

– Formal representation of temporal concepts
• OWL-Time (Hobbs and Pan, 2004)
• TL-OWL (Kim et al., 2008)
• Other temporal approaches and OWL extensions
– Challenges (Imprecise Temporal Information)
• Extraction
• Representation
• Logics and algebra

Temporal Ontologies

Dealing with Imprecise Temporal Information

Logics and algebra

[Figure: timeline of events A and B, each with a begin and an end point]

before( A.begin , B.begin )? → probably true/false
before( B.begin , A.end )? → probably true/false
before( A.end , B.end )? → probably true/false
before( A.begin , B.end )? → TRUE

[Figure: timeline of events A, B and C]

before( A.begin , B.begin )? → probably true/false
<statement>: after( B.begin , C.end )
before( A.begin , B.begin )? → TRUE
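The behaviour of `before` on these slides can be reproduced with a simple interval model, in which every time point is known only to lie between an earliest and a latest bound. This is a sketch under that assumption and not the algebra the proposal will define: `TimePoint`, `constrain_after` and all numeric bounds are hypothetical, and the undecided case is returned as `None` where the slides say "probably true/false".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TimePoint:
    earliest: float  # lower bound of the imprecise time point
    latest: float    # upper bound of the imprecise time point

def before(p: TimePoint, q: TimePoint) -> Optional[bool]:
    """True or False when the bounds decide the order, None when they overlap."""
    if p.latest < q.earliest:
        return True
    if p.earliest >= q.latest:
        return False
    return None  # undecided: "probably true/false"

def constrain_after(p: TimePoint, q: TimePoint) -> TimePoint:
    """Apply the statement after(p, q): p cannot lie before q's earliest bound."""
    return TimePoint(max(p.earliest, q.earliest), p.latest)

if __name__ == "__main__":
    a_begin, a_end = TimePoint(1, 5), TimePoint(6, 10)
    b_begin, b_end = TimePoint(3, 8), TimePoint(9, 12)
    c_end = TimePoint(6, 7)

    print(before(a_begin, b_begin))  # bounds overlap, so undecided
    print(before(a_begin, b_end))    # decided by the bounds alone

    # <statement>: after( B.begin , C.end ) narrows B.begin
    b_begin = constrain_after(b_begin, c_end)
    print(before(a_begin, b_begin))  # now decided
```

The added statement removes the overlap between A.begin and B.begin, which is why the same query changes from undecided to TRUE.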

– Inaccurate temporal expression (+)
– Define temporal concepts (+)
– Perform arithmetic or logic operations (-)

Temporal Information Approaches

– Spelling Errors
• WordNet Extensions to deal with phonetic similarity
– Imprecise Temporal Information Extraction

Proposed Work

* ED (Levenshtein, 1966), TS (Oliver, 1993), HD (Hamming, 1950), LCS (Allison and Dix, 1986), SWD (Smith and Waterman, 1981), MED (Monge and Elkan, 1996), JWD (Winkler and Thibaudeau, 1991), Soundex (Knuth, 1968), FastSS (Bocek et al., 2007)

Spelling Errors x Similarity
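As an illustration of the first measure in the list, the edit distance (Levenshtein, 1966) counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a standard dynamic-programming version. The `closest` helper and the small vocabulary are hypothetical additions for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep on a match
        prev = curr
    return prev[-1]

def closest(word, vocabulary):
    """Return the vocabulary entry with the smallest edit distance to word."""
    return min(vocabulary, key=lambda v: edit_distance(word, v))

if __name__ == "__main__":
    print(edit_distance("tree", "three"))
    print(closest("thre", ["three", "days", "last"]))
```

For the slide's example, "tree" and "three" are one insertion apart. Because "tree" is itself a valid English word, a dictionary lookup alone would not flag it, so the surrounding context also has to be considered.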

– Multi language / Multi dictionary
– Derivative Words
• Verb conjugation in Portuguese
– 13 tenses; 67 variations
» Unlike English (7 variations for ‘to be’): {am, is, are, was, were, being, been}
– Fast Phonetic Similarity Search

WordNet Extensions

Stringsim function

Fast Phonetic Similarity Search

PhoneticMapPT

Fast Phonetic Similarity Search

PhoneticMapSimPT function

Fast Phonetic Similarity Search

– Similarity Search Methods
• Full
• Fast
– PhoneticSearchPT function (fast search method)

Fast Phonetic Similarity Search
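The difference between a full and a fast similarity search can be sketched as follows. This is not the PhoneticSearchPT function, whose definition is not reproduced in this transcript. The `phonetic_key` letter classes are a hypothetical toy, and similarity is measured with `difflib.SequenceMatcher` from the Python standard library as a stand-in for the proposal's own functions.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical letter classes: consonants that often sound alike share one code.
_CODES = {}
for letters, code in [("bp", "B"), ("ckqg", "K"), ("dt", "T"), ("fv", "F"),
                      ("szxj", "S"), ("mn", "N"), ("l", "L"), ("r", "R")]:
    for ch in letters:
        _CODES[ch] = code

def phonetic_key(word):
    """Toy phonetic key: consonant codes only, directly adjacent repeats collapsed."""
    key, prev = [], None
    for ch in word.lower():
        code = _CODES.get(ch)  # vowels, h, w, y and other characters give None
        if code and code != prev:
            key.append(code)
        prev = code
    return "".join(key)

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

def full_search(query, words, threshold=0.8):
    """Full method: compare the query against every word."""
    return sorted(w for w in words if similarity(query, w) >= threshold)

def build_index(words):
    """Group words by phonetic key so that lookups touch a single bucket."""
    index = defaultdict(list)
    for w in words:
        index[phonetic_key(w)].append(w)
    return index

def fast_search(query, index, threshold=0.8):
    """Fast method: compare the query only against words sharing its phonetic key."""
    candidates = index.get(phonetic_key(query), [])
    return sorted(w for w in candidates if similarity(query, w) >= threshold)

if __name__ == "__main__":
    words = ["three", "tree", "there", "days", "last", "weeks"]
    index = build_index(words)
    print(fast_search("thre", index))
    print(full_search("thre", words))
```

The fast method trades recall for speed: it can only return words that land in the same bucket as the query, so its results are always a subset of what the full method finds.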

– Precise x Imprecise Temporal Information
• “08:15 am” x “earlier in the morning”
» Which one happened before?
– Experiment
• 4,748 medical records (MR)
• 3,583 imprecise expressions (in 2,018 MR – 42.5%)

Uncertain Temporal Information Extraction

• A. Temporal Information Mapping
– A.1. Temporal Ontology
– A.2. Temporal Expressions
– A.3. Numeral Ontology
• B. Extracting Process
– B.1. Temporal Representation
– B.2. Annotation Schemes
– B.3. Phonetic Similarity
– B.4. OWL Extension
– B.5. Extraction Rules
• C. Answering User Queries
– C.1. Temporal Algebra
– C.2. Analytical Queries
• D. Case Study
– D.1. Ontology Generator
– D.2. Domain Ontology
– D.3. Information Extraction
– D.4. Accuracy Evaluation

Proposed Activities

Proposed Activities

A.1 Temporal Ontology: Create an ontology to define imprecise temporal concepts
A.2 Temporal Expressions: List possible temporal expressions in Portuguese and English
A.3 Numeral Ontology: Search for a numeral ontology that maps numeric values in the form of words
B.1 Temporal Representation: Define a representation for temporal expressions
B.2 Annotation Schemes: Review and adapt temporal annotation schemes to support uncertain temporal data
B.3 Phonetic Similarity: Apply the Phonetic Fast Search method in the annotation and extraction processes
B.4 Temporal OWL Extension: Propose an extension to the OWL metamodel to support temporal-dependent elements
B.5 Extraction Rules: Define a set of extraction rules needed to handle uncertain temporal data
C.1 Temporal Algebra: Review the literature concerning Temporal Algebra
C.2 Analytical Queries: Convert natural language queries to analytical queries
D.1 Ontology Generator: Review the literature to describe methods to convert data models to ontologies
D.2 Domain Ontology: Create an ontology to handle medical domain knowledge available in InfoSaude
D.3 Information Extraction: Design and develop part of the proposed framework (case study: medical records)
D.4 Accuracy Evaluation: Search for a benchmark to measure accuracy of proposed work
P.x Articles
T Thesis

• Schedule

Uncertain Temporal Information Extraction

– Uncertain x Imprecise x Inaccurate
• How can the differences between these word senses help organize such temporal expressions into different groups?
– Fuzzy Logic
• Imprecise time ≡ fuzzy time? How to apply fuzzy logic to inaccurate temporal data? Are there alternatives?
– OBIE Accuracy
• How to evaluate IE accuracy?

Pending questions...
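To make the fuzzy-logic question concrete, an expression such as “around 10:00 am” could be modelled by a membership function that says how well a given clock time fits the expression. The sketch below uses a trapezoidal function over minutes since midnight. The function names and the chosen breakpoints (9:30, 9:50, 10:10, 10:30) are assumptions for illustration only and do not answer the open question.

```python
def trapezoid(x, a, b, c, d):
    """Membership is 0 outside (a, d), 1 on [b, c], and linear on the two slopes."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising slope between a and b
    return (d - x) / (d - c)      # falling slope between c and d

def around_ten_am(minutes_since_midnight):
    """Hypothetical fuzzy set for 'around 10:00 am'."""
    return trapezoid(minutes_since_midnight, 570, 590, 610, 630)

if __name__ == "__main__":
    for hh, mm in [(9, 40), (10, 0), (10, 20), (10, 40)]:
        print(f"{hh:02d}:{mm:02d}", around_ten_am(hh * 60 + mm))
```

A membership value between 0 and 1 is one possible reading of "probably true/false"; interval bounds or probabilities are alternatives.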
