stemming stemming is crude chopping of affixes in inflected words. it is used to coalesce terms for...


Upload: sandra-pearson

Post on 19-Jan-2018



TRANSCRIPT

Page 1:

Stemming

• Stemming is the crude chopping of affixes from inflected words.

• It is used to coalesce terms for effective information retrieval.

• The base form of a word is the stem, while the pieces attached to the stem are affixes.

• Example:

  • Affixes: stem "Affix", affix "es"

  • Functional: stem "Function", affix "al"
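The two examples above can be reproduced with a very small suffix-stripping routine. This is only a sketch: the suffix list, the minimum stem length, and the name `crude_stem` are assumptions made for illustration, and practical stemmers (for example the Porter stemmer) apply ordered rule sets rather than a flat list.

```python
from typing import Tuple

# Illustrative suffix list (an assumption); checked in this order.
SUFFIXES = ("es", "al", "ing", "ed", "s")


def crude_stem(word: str, min_stem_len: int = 3) -> Tuple[str, str]:
    """Return (stem, affix); affix is "" when no listed suffix applies."""
    lower = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word is left to serve as a stem.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= min_stem_len:
            return lower[: -len(suffix)], suffix
    return lower, ""


if __name__ == "__main__":
    for w in ("Affixes", "Functional"):
        print(w, crude_stem(w))
```

Because the chopping is purely string-based, the result is not guaranteed to be a dictionary word, which is why stemming is described as crude.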

Page 2:

Lemmatization

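• It is a more complex form of stemming.

• Here it implies identifying synonyms of the words in user queries. (Strictly, lemmatization maps a word to its dictionary form, the lemma; matching synonyms is usually called query expansion.)

• Example:

  • Engineering -> Technology

  • Attire -> Wear, Dress

• Stemming and lemmatization are used to simplify the job of the designer and to better serve users.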
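The synonym examples above amount to a table lookup. The sketch below assumes a hand-written table `SYNONYMS` holding only the two slide examples; the table and both function names are hypothetical, and a real system would draw on a thesaurus or lexical database instead.

```python
# Hypothetical synonym table containing only the examples from the slide.
SYNONYMS = {
    "engineering": ["technology"],
    "attire": ["wear", "dress"],
}


def synonyms_for(word):
    """Return the known synonyms of a word, or an empty list."""
    return SYNONYMS.get(word.lower(), [])


def synonyms_in_query(query):
    """Map each query word that has known synonyms to those synonyms."""
    return {w: synonyms_for(w) for w in query.split() if synonyms_for(w)}


if __name__ == "__main__":
    print(synonyms_in_query("Office Attire"))
```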

Page 3:

Implementation

• Step 1:

A. Expand query

• Query Input:

• Query Output:

  • Office Attire wear apparels dress for

  • Eradicate Mosquitoes remove kill mosquito

B. Assign QueryId

  • 1-Eradicate 2-Mosquitoes 3-remove 4-kill 5-mosquito

  • 1-Office 2-Attire 3-wear 4-apparels 5-dress for
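Step 1 can be sketched as two small functions. The expansion table below is an assumption reverse-engineered from the Query Output lines (the slide does not show the query input or where the extra terms come from), and `expand_query` and `assign_query_ids` are hypothetical names.

```python
# Hypothetical expansion table; entries mirror the Query Output lines above.
EXPANSIONS = {
    "attire": ["wear", "apparels", "dress for"],
    "eradicate": ["remove", "kill"],
    "mosquitoes": ["mosquito"],
}


def expand_query(terms):
    """Return the original terms followed by the expansions of each term."""
    expanded = list(terms)
    for term in terms:
        for extra in EXPANSIONS.get(term.lower(), []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


def assign_query_ids(terms):
    """Number the expanded terms from 1, in order of appearance."""
    return {i: term for i, term in enumerate(terms, start=1)}


if __name__ == "__main__":
    expanded = expand_query(["Eradicate", "Mosquitoes"])
    print(expanded)
    print(assign_query_ids(expanded))
```

Note that the ids restart at 1 for every query, as in the slide, so an id is only meaningful together with the query it belongs to.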

Page 4:

Implementation (cont’d)

• Step 2: Map Function

• Input:

• Output:

map(String key, String value)
    // key: Qword
    // value: SERP text
    FOREACH Dword IN value
        EmitIntermediate(Qword, Proximity word);
    NEXT
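A runnable sketch of the map step follows. The slide does not define what counts as a proximity word, so this version assumes it means any word within `window` positions of an occurrence of the query word in the SERP text; that definition, the `window` parameter, and the name `map_fn` are all assumptions.

```python
import re


def map_fn(qword, serp_text, window=1):
    """Yield (qword, proximity_word) pairs for words near qword in serp_text."""
    words = re.findall(r"[a-z]+", serp_text.lower())
    target = qword.lower()
    for i, dword in enumerate(words):
        if dword != target:
            continue
        # Assumed definition: neighbours within `window` positions of the match.
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield (qword, words[j])


if __name__ == "__main__":
    for pair in map_fn("kill", "How to kill mosquitoes fast"):
        print(pair)
```

In a MapReduce framework each yielded pair would be an intermediate key/value pair, grouped by query word before the reduce step.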

Page 5:

Implementation (cont’d)

• Step 3: Reduce Function

• Input:

• Output:

reduce(String key, Iterator values)
    // key: Qword
    // values: a list of Proximity words
    QwId = fn_GetQueryId(Qword)
    FOREACH Pword IN values
        IF Qword IS verb
            Emit(QwId, Qword + Pword);
        ELSE
            Emit(QwId, Pword + Qword);
    NEXT
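The reduce step can be sketched in the same way, assuming the word tested for being a verb is the query word (the key) and each value is one proximity word. The lookup tables standing in for `fn_GetQueryId` and for the verb test are hypothetical, as is the space inserted between the two words when they are joined.

```python
# Hypothetical tables standing in for fn_GetQueryId and the verb test.
QUERY_IDS = {"eradicate": 1, "mosquitoes": 2, "remove": 3, "kill": 4, "mosquito": 5}
VERBS = {"eradicate", "remove", "kill"}


def get_query_id(qword):
    """Look up the QueryId assigned to a query word in Step 1."""
    return QUERY_IDS[qword.lower()]


def reduce_fn(qword, pwords):
    """Yield (query_id, phrase) pairs built from qword and each proximity word."""
    qw_id = get_query_id(qword)
    is_verb = qword.lower() in VERBS
    for pword in pwords:
        if is_verb:
            # Verb first, e.g. "kill mosquitoes".
            yield (qw_id, f"{qword} {pword}")
        else:
            # Other words follow their proximity word.
            yield (qw_id, f"{pword} {qword}")


if __name__ == "__main__":
    print(list(reduce_fn("kill", ["mosquitoes"])))
    print(list(reduce_fn("mosquito", ["dead"])))
```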
