www.sharon-it.com
Refining – Finding Words/Expanding
Taly Sharon

Contents
• Expanding/Learning terms
• Categorization/Clustering engines
• Google Suggest
• SurfWax FocusWords
• When you don’t know where to start

Make Longer Queries
• Yahoo
• Harvest Digital
[Chart: Average Search Terms per Query; categories: Overall, Experienced]

Adding Words
• Holocaust: 23,200,000
• holocaust memorial: 836,000
• holocaust memorial budapest: 42,300
• holocaust memorial budapest danube: 4,910
• holocaust memorial budapest danube promenade: 692
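
The effect of each added word can be made concrete with a little arithmetic. The sketch below assumes the figures on the slide are result counts per query; the function name `narrowing_factors` is made up for this illustration.

```python
# Counts copied from the slide (assumed to be result counts per query).
counts = [
    ("holocaust", 23_200_000),
    ("holocaust memorial", 836_000),
    ("holocaust memorial budapest", 42_300),
    ("holocaust memorial budapest danube", 4_910),
    ("holocaust memorial budapest danube promenade", 692),
]

def narrowing_factors(rows):
    """Return (query, count, factor); factor = previous count / this count."""
    out = []
    previous = None
    for query, count in rows:
        factor = None if previous is None else previous / count
        out.append((query, count, factor))
        previous = count
    return out

for query, count, factor in narrowing_factors(counts):
    note = "" if factor is None else f"  (about {factor:.1f}x fewer)"
    print(f"{count:>12,}  {query}{note}")
```

Each extra word shrinks the result set by a large factor, which is the point of making queries longer.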

Classification/Categorization
• Classification: the process of deciding the appropriate category for a given document.
• Examples:
  – deciding which newsgroup an article belongs to.
  – deciding which folder an email message should be directed to.
  – identifying the general topic of an essay.
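
As a rough illustration of the email-folder example, here is a minimal keyword-overlap classifier. It is a sketch only: the folder names, keyword lists, and the `classify` function are hypothetical, and real classifiers are usually trained on labeled documents rather than hand-written rules.

```python
# Hypothetical folders and keyword lists, invented for illustration only.
FOLDER_KEYWORDS = {
    "invoices": {"invoice", "payment", "receipt"},
    "travel": {"flight", "hotel", "itinerary"},
}

def classify(message, folder_keywords=FOLDER_KEYWORDS, default="inbox"):
    """Pick the folder whose keywords overlap most with the message words."""
    words = set(message.lower().split())
    best_folder, best_score = default, 0
    for folder, keywords in folder_keywords.items():
        score = len(words & keywords)
        if score > best_score:
            best_folder, best_score = folder, score
    return best_folder

print(classify("Your flight itinerary and hotel booking"))
```

The categories are fixed in advance; the only decision is which existing category a new document falls into.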

Clustering
• The process of automatically grouping similar documents, without predefined categories.
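
To contrast with classification, the sketch below groups documents with no categories given up front. It is a simple greedy approach using word overlap (Jaccard similarity), chosen for brevity; it is not the method used by any of the search engines in this deck, and the sample documents and threshold are assumptions.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(documents, threshold=0.25):
    """Greedy single-pass clustering: join the first cluster whose seed
    document is similar enough, otherwise start a new cluster."""
    clusters = []  # list of (seed_words, [documents])
    for doc in documents:
        words = set(doc.lower().split())
        for seed_words, members in clusters:
            if jaccard(words, seed_words) >= threshold:
                members.append(doc)
                break
        else:
            clusters.append((words, [doc]))
    return [members for _, members in clusters]

# Sample documents invented for illustration.
docs = [
    "biofuel from corn ethanol",
    "corn ethanol biofuel production",
    "holocaust memorial budapest",
    "budapest danube memorial",
]
for group in cluster(docs):
    print(group)
```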

Search Categorization/Clustering
• The result documents are ordered according to categories.
• The searcher can select the relevant category to display the related documents.
• Examples:
  – Vivisimo/Clusty
  – Excite
  – Teoma
  – Exalead

Clusty

Excite

Ask

Exalead

Google Suggest
• As you type, you get query suggestions and the number of results for each suggested query.
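
The behavior described above can be sketched as prefix matching over known queries. This is only an illustration of the idea, not how Google Suggest is implemented; the queries, counts, and the `suggest` function are hypothetical.

```python
# Hypothetical query log with result counts, invented for illustration.
QUERY_COUNTS = {
    "biofuel": 1200,
    "biofuel production": 450,
    "biodiesel": 900,
    "bird of prey": 300,
}

def suggest(prefix, query_counts=QUERY_COUNTS, limit=5):
    """Return (query, count) pairs starting with prefix, most results first."""
    prefix = prefix.lower()
    matches = [(q, n) for q, n in query_counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:limit]

print(suggest("bio"))
```

Seeing the counts next to each suggestion helps in choosing between a broad and a narrow phrasing before running the search.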

Google Suggest (2)

Google Suggest

Yahoo Search Assist

SurfWax FocusWords
• SurfWax has an option “Focus”
• This option invokes the FocusWords mechanism
• You get suggestions to make your query:
  – Broader
  – Similar
  – Narrower
• http://www.surfwax.com
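
The broader/similar/narrower idea can be sketched with a small thesaurus lookup. How SurfWax actually produces FocusWords is not described in these slides; the entries and the `focus_words` function below are hypothetical.

```python
# Hypothetical thesaurus entries; not SurfWax's actual data or method.
THESAURUS = {
    "biodiesel": {
        "broader": ["biofuel", "renewable energy"],
        "similar": ["bioethanol"],
        "narrower": ["soy biodiesel"],
    },
}

def focus_words(term, thesaurus=THESAURUS):
    """Return broader, similar, and narrower suggestions for a term."""
    entry = thesaurus.get(term.lower(), {})
    return {kind: list(entry.get(kind, []))
            for kind in ("broader", "similar", "narrower")}

print(focus_words("biodiesel"))
```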

SurfWax FocusWords

When you don’t know how to start
• Reverse Dictionary
• Glossaries and Dictionaries
• Taxonomy/Folksonomy
• Pearl Culturing
• Analyzing pages
• Finding similar pages
  – Google’s related:
  – Alexa www.alexa.com

Reverse Dictionary
• OneLook reverse dictionary: http://www.onelook.com/reverse-dictionary.shtml
• Example: “bird of prey” => raptor
• Example: economic measure of a nation’s wealth => Gross Domestic Product
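
A reverse dictionary goes from a description to a term. The sketch below shows the idea by matching description words against definitions; it is not OneLook's method, and the glossary entries, stopword list, and function names are assumptions made for the example.

```python
# Tiny hypothetical glossary; two entries echo the slide's examples.
GLOSSARY = {
    "raptor": "a bird of prey such as an eagle or hawk",
    "gross domestic product": "economic measure of a nation's wealth and output",
    "glossary": "an alphabetical list of terms with definitions",
}

STOPWORDS = {"a", "an", "of", "the", "and", "or", "as", "with", "such"}

def content_words(text):
    """Lowercased words with edge punctuation and stopwords removed."""
    return {w.strip(".,'\"") for w in text.lower().split()} - STOPWORDS

def reverse_lookup(description, glossary=GLOSSARY):
    """Rank terms by how many description words appear in their definition."""
    wanted = content_words(description)
    scored = [(len(wanted & content_words(definition)), term)
              for term, definition in glossary.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [term for score, term in ranked if score > 0]

print(reverse_lookup("bird of prey"))
```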

Glossaries and Dictionaries
• Google search:
  – <topic>
  – glossary OR thesaurus OR dictionary OR taxonomy
• Example 1: agriculture glossary
  – http://www.cnie.org/nle/AgGlossary/AgGlossary.htm
  – http://www.cahe.nmsu.edu/news/aggloss.html
  – http://agriculture.house.gov/info/glossary.html
• Example 2: agriculture thesaurus
  – http://agclass.nal.usda.gov/agt/agt.shtml
  – http://www.fao.org/aims/ag_intro.htm (multilingual)
• http://www.glossarist.com
  – http://www.glossarist.com/glossaries/business/primary-industry/agriculture.asp
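
The query pattern above can be assembled mechanically for any topic. A minimal sketch, assuming the standard Google search URL with a `q` parameter; the helper names `vocabulary_query` and `google_url` are made up for this example.

```python
from urllib.parse import quote_plus

# The four vocabulary words come from the slide.
VOCABULARY_WORDS = ("glossary", "thesaurus", "dictionary", "taxonomy")

def vocabulary_query(topic, words=VOCABULARY_WORDS):
    """Build '<topic> glossary OR thesaurus OR dictionary OR taxonomy'."""
    return f"{topic} " + " OR ".join(words)

def google_url(query):
    """URL-encode the query for a Google search link."""
    return "https://www.google.com/search?q=" + quote_plus(query)

query = vocabulary_query("agriculture")
print(query)
print(google_url(query))
```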

Taxonomy/Folksonomy
• Taxonomies
  – Found via directory search (example DMOZ): http://search.dmoz.org/cgi-bin/search?search=taxonomy
  – www.taxonomywarehouse.com (paid)
• Folksonomy
  – Use tags in
    • Technorati www.technorati.com
    • Delicious www.del.icio.us
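
One way tags help in learning new terms is co-occurrence: tags that appear next to a known tag are candidate keywords. The sketch below assumes tag sets have already been collected by hand; the bookmark data and the `related_tags` function are hypothetical, and no Technorati or Delicious API is used.

```python
from collections import Counter

# Hypothetical bookmarks with user-assigned tags, invented for illustration.
BOOKMARKS = [
    {"biofuel", "energy", "ethanol"},
    {"biofuel", "biodiesel", "energy"},
    {"taxonomy", "classification"},
    {"biofuel", "ethanol"},
]

def related_tags(tag, bookmarks=BOOKMARKS):
    """Count tags that appear together with `tag` on the same bookmark."""
    counts = Counter()
    for tags in bookmarks:
        if tag in tags:
            counts.update(tags - {tag})
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

print(related_tags("biofuel"))
```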

del.icio.us

Pearl Culturing
• What to do when you have neither the category nor the right keywords?
• Find one good relevant website
• Look it up in directories
• You will find:
  – the category/main keywords
  – authoritative websites
• Useful search engine: Exalead

Analyze Pages
• Distilling: what is problematic in a bad page?
  – What is wrong? Is an interfering keyword/term appearing?
  – Remove interfering terms (using “-”).
• Identifying clues and patterns in a good page.
  – Read the document: what are the clues?
  – Look for new keywords, word combinations, and other things that differentiate it from non-authoritative documents.
  – Use a frequency counter:
    • http://www.wordcounter.com/
    • http://www.georgetown.edu/faculty/ballc/webtools/web_freqs.html
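
Both steps above are easy to try locally. The sketch below is a minimal frequency counter plus a helper that appends “-term” exclusions to a query; it is not the code behind the linked tools, and the stopword list, sample text, and function names are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; real tools use longer ones.
STOPWORDS = {"the", "and", "from", "for", "with", "that", "this", "are"}

def top_terms(text, n=5, stopwords=STOPWORDS):
    """Most frequent words, ignoring stopwords and words under 3 letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stopwords and len(w) > 2)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:n]

def exclude_terms(query, interfering):
    """Append '-term' for each interfering term found in bad pages."""
    return query + "".join(f" -{term}" for term in interfering)

# Sample text invented for illustration.
sample = ("Biofuel is fuel made from biomass. Biofuel includes ethanol "
          "and biodiesel; ethanol is common.")
print(top_terms(sample, n=2))
print(exclude_terms("jaguar", ["car"]))
```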

Frequency Counter

Wordcounter

References
• http://hacks.oreilly.com/pub/ht/2
• www.batesinfo.com
• http://www.searchtools.com/info/classifiers.html

Exercises
1. What are bad user interfaces called? (hint: try Google Suggest)
2. Reverse dictionary
  a. Find relevant keywords for chemistry.
  b. What is the term for when menstruation stops?
  c. What was the separation between the West and the Soviet Union called?
3. What terms are related to Competitive Intelligence?
4. Check suggestions from Google Suggest for a query starting with biofuel.
5. Using SurfWax, learn options to focus or broaden the query: biodiesel.
6. Identify the most relevant terms in the website: www.uspto.gov
7. Identify the most relevant terms in the Biofuel Wikipedia entry http://en.wikipedia.org/wiki/Biofuel.
8. Search the OneLook reverse dictionary and other glossaries for terms such as fuel, natural energy, and geothermal. Look at the results.