WordNet® and its Java API ♦ Introduction to WordNet ♦ WordNet API for Java ♦ Name: Hao Li, Uni: hl2489

Upload: paula-strickland

Post on 29-Dec-2015


WordNet® and its Java API

♦ Introduction to WordNet

♦ WordNet API for Java

Name: Hao Li Uni: hl2489

Introduction to WordNet®

1. WordNet® is a large lexical database of English, something like a dictionary. It was developed by the Cognitive Science Laboratory of Princeton University.

2. In WordNet, nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

3. In WordNet, synsets are interlinked by means of conceptual-semantic and lexical relations.

4. WordNet is freely and publicly available for download, and APIs exist for several programming languages. Its structure makes it a useful tool for computational linguistics and natural language processing.

WordNet API for Java (1)

Method summary of class WordNetDatabase

― abstract String[] getBaseFormCandidates(String inflection, SynsetType type)

Returns lemmas representing word forms that might be present in WordNet.

√ static WordNetDatabase getFileInstance()

Returns an implementation of this class that can access the WordNet database by searching files on the local file system.

― Synset[] getSynsets(String wordForm)

Returns all synsets that contain the specified word form or a morphological variation of that word form.

√ Synset[] getSynsets(String wordForm, SynsetType type)

Returns only the synsets of a particular type (e.g., noun) that contain a word form or morphological variation of that form.

― abstract Synset[] getSynsets(String wordForm, SynsetType type, boolean useMorphology)

Returns only the synsets of a particular type (e.g., noun) that contain a word form matching the specified text or one of that word form's variants.
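To make the lookup methods above concrete, here is a minimal, self-contained sketch of what getSynsets(wordForm, type) does conceptually: find every synset containing the word form (or a base form of it) and keep only those of the requested type. It is not the library's implementation. The in-memory index, the nested `Synset` record and the naive "strip a trailing s" rule standing in for getBaseFormCandidates are all hypothetical simplifications; the real class loads WordNet's data files via getFileInstance() and uses WordNet's own morphology rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SynsetIndex {

    enum SynsetType { NOUN, VERB, ADJECTIVE, ADVERB }

    // Hypothetical stand-in for a synset: its type, its word forms and its definition.
    record Synset(SynsetType type, List<String> wordForms, String definition) {}

    // Hypothetical in-memory index: word form -> every synset that contains it.
    private static final Map<String, List<Synset>> INDEX = new LinkedHashMap<>();

    private static void add(Synset s) {
        for (String form : s.wordForms()) {
            INDEX.computeIfAbsent(form, k -> new ArrayList<>()).add(s);
        }
    }

    static {
        // Two noun senses of "pig" from the slides' example; no verb data is loaded.
        add(new Synset(SynsetType.NOUN,
                List.of("hog", "pig", "grunter", "squealer", "Sus scrofa"), "domestic swine"));
        add(new Synset(SynsetType.NOUN,
                List.of("slob", "sloven", "pig", "slovenly person"), "a coarse obnoxious person"));
    }

    // Naive, hypothetical stand-in for getBaseFormCandidates: also try the form without a trailing "s".
    static List<String> baseFormCandidates(String inflection) {
        List<String> candidates = new ArrayList<>();
        candidates.add(inflection);
        if (inflection.length() > 1 && inflection.endsWith("s")) {
            candidates.add(inflection.substring(0, inflection.length() - 1));
        }
        return candidates;
    }

    // Synsets of the requested type that contain the word form or one of its candidate base forms.
    static List<Synset> getSynsets(String wordForm, SynsetType type) {
        List<Synset> result = new ArrayList<>();
        for (String candidate : baseFormCandidates(wordForm)) {
            for (Synset s : INDEX.getOrDefault(candidate, List.of())) {
                if (s.type() == type && !result.contains(s)) {
                    result.add(s);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Synset s : getSynsets("pigs", SynsetType.NOUN)) {
            System.out.println(s.wordForms() + " - " + s.definition());
        }
        // The toy index holds nouns only, so a verb lookup finds nothing.
        System.out.println(getSynsets("pigs", SynsetType.VERB).size());
    }
}
```

Filtering by type after the word-form lookup mirrors the difference between getSynsets(wordForm) and getSynsets(wordForm, type) in the summary above.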

WordNet API for Java (2)

Method summary of class Synset

― WordSense[] getAntonyms(String wordForm)

Returns the antonyms (words with the opposite meaning), if any, associated with a word form in this synset.

√ String getDefinition()

Retrieve a short description / definition of this concept.

― WordSense[] getDerivationallyRelatedForms(String wordForm)

Returns word forms that are derivationally related to the one specified.

√ int getTagCount(String wordForm)

Returns a number that's intended to provide an approximation of how frequently the specified word form is used to represent this meaning relative to how often it's used to represent other meanings.

― SynsetType getType()

Retrieve the type of synset this object represents.

― String[] getUsageExamples()

Retrieve sentences showing examples of how this synset is used.

√ String[] getWordForms()

Retrieve the word forms.
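The Synset accessors used in the project can be pictured with a small hypothetical model. The record below only mimics getType(), getWordForms(), getDefinition() and getTagCount(String) with hand-supplied data (the [coffee, java] synset and tag counts quoted later in these slides). A real implementation would read this information from the WordNet data files. Returning 0 for a word form with no recorded count is an assumption of this sketch, not documented library behavior.

```java
import java.util.List;
import java.util.Map;

public class SynsetSketch {

    enum SynsetType { NOUN, VERB, ADJECTIVE, ADVERB }

    // Hypothetical stand-in for the Synset class; tag counts are supplied by hand.
    record Synset(SynsetType type, List<String> wordForms, String definition,
                  Map<String, Integer> tagCounts) {

        SynsetType getType() { return type; }

        String[] getWordForms() { return wordForms.toArray(new String[0]); }

        String getDefinition() { return definition; }

        // Assumption of this sketch: a word form with no recorded count returns 0.
        int getTagCount(String wordForm) { return tagCounts.getOrDefault(wordForm, 0); }
    }

    public static void main(String[] args) {
        // The [coffee, java] synset and its tag counts, as quoted in these slides.
        Synset coffee = new Synset(SynsetType.NOUN, List.of("coffee", "java"),
                "a beverage consisting of an infusion of ground coffee beans",
                Map.of("coffee", 46, "java", 1));

        System.out.println(coffee.getType() + " " + String.join(",", coffee.getWordForms()));
        System.out.println(coffee.getDefinition());
        for (String form : coffee.getWordForms()) {
            System.out.println(form + ": " + coffee.getTagCount(form));
        }
    }
}
```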

Method used in the project (1): WordNetDatabase.getSynsets(String wordForm, SynsetType type)

Take the word “pig” (as a noun) as an example:

Synset[0]=Noun@2395406[hog,pig,grunter,squealer,Sus scrofa] - domestic swine

Synset[1]=Noun@10612210[slob,sloven,pig,slovenly person] - a coarse obnoxious person

Synset[2]=Noun@10179649[hog,pig] - a person regarded as greedy and pig-like

Synset[3]=Noun@9879144[bull,cop,copper,fuzz,pig] - uncomplimentary terms for a policeman

Synset[4]=Noun@3935116[pig bed,pig] - mold consisting of a bed of sand in which pig iron is cast

Synset[5]=Noun@3934998[pig] - a crude block of metal (lead or iron) poured from a smelting furnace
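Once the synsets of “pig” are available, each one's word forms give the synonyms of “pig” in that particular sense. The sketch below copies the six synsets listed above into a hypothetical `Synset` record, standing in for the array getSynsets("pig", SynsetType.NOUN) would return, and removes “pig” itself from each synset's word forms.

```java
import java.util.ArrayList;
import java.util.List;

public class PigSenses {

    // Hypothetical stand-in for one noun synset: its numeric id, word forms and definition.
    record Synset(long offset, List<String> wordForms, String definition) {}

    // The six noun senses of "pig" exactly as listed on the slide.
    static final List<Synset> PIG = List.of(
            new Synset(2395406L, List.of("hog", "pig", "grunter", "squealer", "Sus scrofa"),
                    "domestic swine"),
            new Synset(10612210L, List.of("slob", "sloven", "pig", "slovenly person"),
                    "a coarse obnoxious person"),
            new Synset(10179649L, List.of("hog", "pig"),
                    "a person regarded as greedy and pig-like"),
            new Synset(9879144L, List.of("bull", "cop", "copper", "fuzz", "pig"),
                    "uncomplimentary terms for a policeman"),
            new Synset(3935116L, List.of("pig bed", "pig"),
                    "mold consisting of a bed of sand in which pig iron is cast"),
            new Synset(3934998L, List.of("pig"),
                    "a crude block of metal (lead or iron) poured from a smelting furnace"));

    // Synonyms of a word within one sense: every word form of the synset except the word itself.
    static List<String> synonyms(Synset s, String word) {
        List<String> out = new ArrayList<>(s.wordForms());
        out.remove(word);
        return out;
    }

    public static void main(String[] args) {
        for (int i = 0; i < PIG.size(); i++) {
            Synset s = PIG.get(i);
            System.out.println("Synset[" + i + "] " + s.definition() + " -> " + synonyms(s, "pig"));
        }
    }
}
```

Grouping synonyms per synset, rather than merging them, keeps “cop” from being offered as a synonym of the farm animal.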

Method used in the project (2): Synset.getDefinition()

Take Synset[0] of the word “pig” as an example:

domestic swine

Method used in the project (3): Synset.getTagCount(String wordForm)

It is a very useful method. It returns an approximation of how frequently the specified word form is used to represent this meaning, relative to how often it is used to represent its other meanings.

As I understand it, this method has two uses:

(1) Analyse the different synsets of the same word.

(2) Analyse the different words of the same synset.

Analysing the different synsets of the same word: Synset.getTagCount(String wordForm)

The results show which meaning of the word is used most frequently.

For example:

The tag count (frequency) of the word “bridge” in the following synset is 4:

Synset[0]=Noun@2898711[bridge,span] - a structure that allows people or vehicles to cross an obstacle such as a river or canal or railway etc.

In another synset of “bridge”, it is 1:

Synset[4]=Noun@490569[bridge] - any of various card games based on whist for four players

This means that when people talk about a “bridge”, they are more likely referring to the structure than to the card game.
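The first use amounts to ranking a word's senses by tag count and picking the highest. The sketch below does this for the two “bridge” senses quoted above (the word's other senses are left out) and also computes the chosen sense's share of the tagged occurrences among the listed senses. The `Sense` record is a hypothetical pairing of a synset with the tag count getTagCount("bridge") would give for it.

```java
import java.util.Comparator;
import java.util.List;

public class MostFrequentSense {

    // Hypothetical pairing of one sense of a word with that word's tag count in the sense.
    record Sense(int index, String definition, int tagCount) {}

    // The sense with the highest tag count, i.e. the meaning most often seen in tagged text.
    static Sense mostFrequent(List<Sense> senses) {
        return senses.stream()
                .max(Comparator.comparingInt(Sense::tagCount))
                .orElseThrow();
    }

    // Share of one sense among the tagged occurrences of all listed senses.
    static double share(Sense sense, List<Sense> senses) {
        int total = senses.stream().mapToInt(Sense::tagCount).sum();
        return total == 0 ? 0.0 : (double) sense.tagCount() / total;
    }

    public static void main(String[] args) {
        // Only the two "bridge" senses quoted in the slides; the other senses are omitted.
        List<Sense> bridge = List.of(
                new Sense(0, "a structure that allows people or vehicles to cross an obstacle"
                        + " such as a river or canal or railway etc.", 4),
                new Sense(4, "any of various card games based on whist for four players", 1));

        Sense best = mostFrequent(bridge);
        System.out.println("Synset[" + best.index() + "] - " + best.definition());
        // 4 / (4 + 1) = 0.8
        System.out.println("share among these senses: " + share(best, bridge));
    }
}
```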

Analysing the different words of the same synset: Synset.getTagCount(String wordForm)

The results show which word expresses a given meaning more precisely, that is, which word is less likely to cause word sense ambiguity.

For example, one synset of the word “java” is:

Synset=Noun@7929519[coffee,java] - a beverage consisting of an infusion of ground coffee beans

The tag count (frequency) of the word “coffee” in this synset is 46, while that of “java” is 1.

This means “coffee” is much more representative of the meaning “a beverage consisting of an infusion of ground coffee beans” than “java” is. When people talk about “coffee”, you understand that they mean this beverage and not some other meaning. When people talk about “java”, they may mean either the beverage or the programming language Java.
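The second use is the mirror image: within one synset, pick the word form with the highest tag count. This sketch applies that rule to the [coffee, java] synset quoted above. Passing the counts as a plain map is a hypothetical simplification; with the library they would come from calling getTagCount on each entry of getWordForms().

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RepresentativeWord {

    // The word form with the highest tag count within one synset: the form people
    // most often use for this meaning.
    static String mostRepresentative(Map<String, Integer> tagCounts) {
        return tagCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow()
                .getKey();
    }

    public static void main(String[] args) {
        // Tag counts for Noun@7929519 [coffee, java], as quoted in the slides.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("coffee", 46);
        counts.put("java", 1);
        System.out.println(mostRepresentative(counts)); // coffee
    }
}
```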

Conclusion

1. WordNet has two purposes: to provide a combination of dictionary and thesaurus that is more intuitive to use, and to support automatic text analysis and artificial intelligence applications.

2. Because of these features, WordNet is now widely used in information systems, for word sense disambiguation, information retrieval, automatic text classification, automatic text summarization, and even automatic crossword puzzle generation.

And it is also used in our project!

I will tell you what our WordNet-based algorithm is in the demo next week.

Thank you!