Presented by Rani Qumsiyeh & Andrew Zitzelberger

Posted 02-Jan-2016
TRANSCRIPT

Page 1: Presented by Rani Qumsiyeh & Andrew Zitzelberger

Page 2:

Common approaches:

▪ Collocation analysis: producing anonymous relations without a label.

▪ Syntactic dependencies: the dependencies between verbs and their arguments.

▪ Hearst's approach: matching lexico-syntactic patterns.

Page 3:

Definition: A pair of words which occur together more often than expected by chance within a certain boundary.

Can be detected with Student's t-test or the χ² test.

Examples of such techniques are presented in the related work section.
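The t-test criterion can be sketched in a few lines of Python; the corpus counts below are invented for illustration:

```python
import math

def t_score(bigram_count, w1_count, w2_count, n):
    """Student's t-score for a candidate collocation: compares the
    observed bigram probability with the probability expected if
    the two words co-occurred only by chance."""
    observed = bigram_count / n                 # P(w1, w2) observed
    expected = (w1_count / n) * (w2_count / n)  # P(w1) * P(w2) under independence
    return (observed - expected) / math.sqrt(observed / n)

# Invented counts for a 1,000,000-word corpus
t = t_score(bigram_count=30, w1_count=4000, w2_count=3000, n=1_000_000)
print(t > 2.576)  # exceeds the 0.005 critical value -> likely a collocation
```

A pair is kept as a collocation when its score exceeds the critical value for the chosen significance level; the χ² test is applied analogously over the 2x2 co-occurrence contingency table of the two words.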

Page 4:

“A person works for some employer”
Relation: work-for; Concepts: person, employer

The acquisition of selectional restrictions

Detecting verbs denoting the same ontological relation.

Hierarchical ordering of relations.

Discussed later in detail.

Page 5:

Used to discover very specific relations such as part-of, cause, purpose.

Charniak employed part-of-speech tagging to detect such patterns.

Other approaches to detect causation and purpose relations are discussed later.

Page 6:

Learning attributes: relying on the syntactic relation between a noun and its modifying adjectives.

Learning relations: on the basis of verbs and their arguments.

Learning qualia structures for nouns: matching lexico-syntactic patterns.

Page 7:

Attributes are defined as relations with a datatype as range.

Attributes are typically expressed in texts using the preposition "of", the verb "have", or genitive constructs:

▪ the color of the car
▪ every car has a color
▪ the car's color
▪ Peter bought a new car. Its color [...]

Page 8:
Page 9:

▪ attitude adjectives, expressing the opinion of the speaker, as in 'good house'

▪ temporal adjectives, such as the 'former president' or the 'occasional visitor'

▪ membership adjectives, such as the 'alleged criminal' or a 'fake cowboy'

▪ event-related adjectives, such as 'abusive speech', in which either the agent of the speech or the event itself is abusive

Page 10:

Find the corresponding description for the adjective by looking up its corresponding attribute in WordNet.

Consider only those adjectives which do have such an attribute relation.

This increases the probability that the adjective being considered denotes the value of some attribute, quality or property.

Page 11:

Tokenize and part-of-speech tag the corpus using TreeTagger.

Match the following two expressions and extract adjective/noun pairs:

▪ (\w+{DET})? (\w+{NN})+ is{VBZ} \w+{JJ}

▪ (\w+{DET})? \w+{JJ} (\w+{NN})+

Cond(n, a) := f(n, a) / f(n)
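The conditional weighting Cond(n, a) = f(n, a)/f(n) can be computed directly from the extracted pairs; a minimal sketch with invented pair data:

```python
from collections import Counter

def conditional_weights(pairs):
    """Cond(n, a) := f(n, a) / f(n): the fraction of occurrences of
    noun n in which it appears with adjective a."""
    pair_counts = Counter(pairs)                # f(n, a)
    noun_counts = Counter(n for n, _ in pairs)  # f(n)
    return {(n, a): c / noun_counts[n] for (n, a), c in pair_counts.items()}

# Invented adjective/noun pairs, stored as (noun, adjective)
pairs = [("car", "fast"), ("car", "fast"), ("car", "red"), ("hotel", "nice")]
weights = conditional_weights(pairs)
print(weights[("car", "fast")])  # 2 of 3 "car" occurrences
```

Pairs whose weight exceeds a threshold are then kept as candidate attribute values.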

Page 12:

[Table: adjectives extracted from the tourism corpus, threshold = 0.01; example concept: car]

Page 13:

For each of the adjectives we look up the corresponding attribute in WordNet:

▪ age is one of {new, old}
▪ value is one of {black}
▪ numerousness/numerosity/multiplicity is one of {many}
▪ otherness/distinctness/separateness is one of {other}
▪ speed/swiftness/fastness is one of {fast}
▪ size is one of {small, little, big}

Page 14:

Evaluate every domain concept according to (i) its attributes and (ii) their corresponding ranges, assigning each a rating from '0' to '3':

▪ '3' means that the attribute or its range is totally reasonable and correct.

▪ '0' means that the attribute or the range does not make any sense.

Page 15:
Page 16:

A new approach that not only lists relations but finds the general relation.

work-for(man, department), work-for(employee, institute), work-for(woman, store) → work-for(person, organization)

Page 17:

Three measures are considered: conditional probability, pointwise mutual information (PMI), and a measure based on the χ²-test.

Evaluated by applying their approach to the Genia corpus using the Genia ontology.
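Of the three measures, PMI has the simplest form; a sketch with invented probabilities:

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information: log2(P(x, y) / (P(x) * P(y))).
    Positive when x and y co-occur more often than independence predicts."""
    return math.log2(p_xy / (p_x * p_y))

# Invented probabilities: a verb and a concept co-occurring four times
# more often than chance would predict
print(pmi(0.004, 0.05, 0.02))  # ≈ 2.0 bits
```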

Page 18:

Extract verb frames using Steven Abney's chunker.

Extract tuples NP-V-NP and NP-V-P-NP; construct binary relations from the tuples.

Use the lemmatized verb V as the corresponding relation label.

Use the heads of the NP phrases as concepts.

Page 19:
Page 20:

protein_molecule: 5
Protein_family_or_group: 10
amino-acid: 10

Page 21:

Take into account the frequency of occurrence.

Choose the highest one.

Page 22:

Penalize concepts c which occur too frequently.

P(amino-acid) = 0.27, P(protein) = 0.14

Page 23:

Compares contingencies between two variables (whether the two variables are statistically independent or not).

We can generalize c to cᵢ if the χ²-test reveals the verb v and c to be statistically dependent.

Level of significance = 0.05
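A minimal version of this χ² check on a 2x2 contingency table (the counts are invented for illustration; no continuity correction is applied):

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]], via the
    shortcut formula n*(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Invented counts: verb v with/without concept c vs. all other verbs
stat = chi_square_2x2(40, 60, 100, 800)
# 3.841 is the critical value for 1 degree of freedom at alpha = 0.05
print(stat > 3.841)  # dependent -> generalize c
```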

Page 24:

The Genia corpus contains 18,546 sentences with 509,487 words and 51,170 verbs.

Extracted 100 relations; 15 were regarded as inappropriate by a biologist evaluator.

The 85 remaining were evaluated using:

▪ Direct matches for domain and range (DM)
▪ Average distance in terms of number of edges between correct and predicted concept (AD)
▪ A symmetric variant of the Learning Accuracy (LA)

Page 25:
Page 26:
Page 27:

Nature of Objects (Aristotle):

▪ Material cause (made of)
▪ Agentive cause (movement, creation, change)
▪ Formal cause (form, type)
▪ Final cause (purpose, intention, aim)

Page 28:

Generative Lexicon framework [Pustejovsky, 1991]

Qualia structures:
▪ Constitutive (components)
▪ Agentive (created)
▪ Formal (hypernym)
▪ Telic (function)

Knife

Page 29:

Human: subjective decisions

Web: linguistic errors, ranking errors, commercial bias, erroneous information, lexical ambiguity

Page 30:
Page 31:

Pattern library: tuples (p, c), where p is a pattern and c is a clue (c: string -> string).

Given a term t and a clue c, c(t) is sent to the search engine.

π(x) refers to plural forms of x.
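The clue mechanism amounts to a table of functions from a term to a query string; the patterns and clues below are illustrative stand-ins, not the authors' actual library:

```python
def plural(x):
    """Naive stand-in for π(x), the plural form of x."""
    return x + "s"

# Each entry: (pattern the hits are matched against, clue c: string -> string)
pattern_library = [
    ("NP is a kind of NP", lambda t: f'"{t} is a kind of"'),
    ("NP such as NP",      lambda t: f'"such as {plural(t)}"'),
]

term = "knife"
# c(t): the query strings sent to the search engine for term t
queries = [clue(term) for _, clue in pattern_library]
print(queries)
```

Each returned snippet is then matched against the corresponding pattern p to extract the related concept.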

Page 32:
Page 33:
Page 34:
Page 35:

Amount words: variety, bundle, majority, thousands, millions, hundreds, number, numbers, set, sets, series, range

Example: “A conversation is made up of a series of observable interpersonal exchanges.”
▪ Constitutive role = exchange

Page 36:

PURP := (\w+{VB} NP | NP | be{VB} \w+{VBD})

Page 37:

No good patterns:
▪ X is made by Y
▪ X is produced by Y

Instead:

Agentive_verbs = {build, produce, make, write, plant, elect, create, cook, construct, design}

Page 38:

e = element, t = term

Page 39:

Lexical elements: knife, beer, book, computer

Abstract Noun: conversation

Specific multi-word terms: natural language processing, data mining

Page 40:

Student scores: 0 = incorrect, 1 = not totally wrong, 2 = still acceptable, 3 = totally correct

Page 41:
Page 42:
Page 43:

Reasoning: Formal and constitutive patterns are more ambiguous.

Page 44:
Page 45:
Page 46:

Maedche and Staab, 2000: find relations using association rules

A transaction is defined as words occurring together in a syntactic dependency.

Calculate support and confidence.

Precision = 11%, Recall = 13%
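Support and confidence over such transactions can be sketched as follows (the transactions are invented):

```python
def support_confidence(transactions, a, b):
    """Association-rule measures for the rule a -> b:
    support    = fraction of transactions containing both a and b
    confidence = support(a, b) / support(a)"""
    n = len(transactions)
    both = sum(1 for t in transactions if a in t and b in t)
    has_a = sum(1 for t in transactions if a in t)
    return both / n, both / has_a

# Invented transactions: word pairs occurring in a syntactic dependency
transactions = [
    {"hotel", "room"}, {"hotel", "room"}, {"hotel", "bar"}, {"room", "bed"},
]
sup, conf = support_confidence(transactions, "hotel", "room")
print(sup, conf)  # support 0.5, confidence 2/3
```

Rules above minimum support and confidence thresholds are proposed as candidate relations.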

Page 47:

Kavalec and Svatek, 2005: added an ‘above expectation’ heuristic

▪ Measures the association between a verb and a pair of concepts

Page 48:

Gamallo et al., 2002: map syntactic dependencies to semantic relations

1) Shallow parser + heuristics to derive syntactic dependencies
2) Cluster based on syntactic positions

Problems:
▪ Mapping is underspecified
▪ Largely domain dependent

Page 49:

Ciaramita et al., 2005: statistical dependency parser to extract:

▪ SUBJECT-VERB-DIRECT_OBJECT
▪ SUBJECT-VERB-INDIRECT_OBJECT

χ² test – keep those occurring significantly more often than by chance

83% of learned relations are correct; 53.1% of generalized relations are correct

Page 50:

Heyer et al., 2001: calculate 2nd-order collocations; use a set of defined rules to reason

Page 51:

Ogata and Collier, 2004: Hearst patterns for extraction; use heuristic reasoning rules

Page 52:

Yamaguchi, 2001: word space algorithm using a 4-word window

Cos(angle) measure for similarity
▪ If similarity > threshold → relationship

Precision = 59.89% for a legal corpus
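The cosine comparison of word-space vectors reduces to a few lines; the co-occurrence vectors and the threshold below are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two co-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Invented context vectors collected with a 4-word window
judge = [3, 0, 2, 1]
court = [2, 1, 2, 0]
print(cosine(judge, court) > 0.8)  # above threshold -> propose a relationship
```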

Page 53:

Poesio and Almuhareb, 2005: classify attributes into one of six categories:

▪ Quality, part, related-object, activity, related-agent, non-attribute

Classifier was trained using:
▪ Morphological information, clustering results, search engine results, and heuristics

Better results from combining related-object and part.

F-measure = 53.8% for the non-attribute class, and between 81-95% for the other classes

Page 54:

Claveau et al., 2003: Inductive Logic Programming approach; doesn’t distinguish between different qualia roles

Page 55:

Learning relations from non-verbal structures

Gold standard of qualia structures

Deriving a reasoning calculus

Page 56:

Strengths: explained (their) methods in detail

Weaknesses: required a lot of NLP background knowledge; short summaries of others’ work