Implementation of Meta-Search Engine
by:
Antony Pranata
http://antonypr.pair.com
Motivation
• Most Internet users use search engines to find information on the Web.
• Explore the idea behind meta-search engines.
• Build a “new” client-side meta-search engine SimpleFind (http://antonypr.pair.com/simplefind.html).
What are Search Engines?
• Search engines are interactive tools to help people locate information available via the World Wide Web.
• Search engines are actually databases that contain references to thousands of resources.
• Search engines provide interfaces between the users and the underlying databases.
Types of Search Engines
• Robot-Driven Search Engines.
  – Example: AltaVista, Excite, HotBot, Lycos.
• Web Directory Services Search Engines.
  – Example: Yahoo!, Snap, LookSmart.
• Meta-Search Engines.
  – Example: MetaCrawler, Mamma, SavvySearch.
How do People Find New Web Sites?
Source: GVU Center at Georgia Institute of Technology.
Search Engines Used
Source: GVU Center at Georgia Institute of Technology.
Why do We Need Meta-Search Engines?
• Each search engine provides its own database, interface, and special features.
• Each search engine collects resources differently; therefore, the same query typed into several search engines is likely to produce different results.
• Most search engines contain less than 20% of the data on the Web.
What are Meta-Search Engines?
• Meta-search engines search the databases of other search engines and directories.
• Meta-search engines don’t create their own databases of information.
• The results are a compilation of the results from all search engines queried.
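The idea described above (send one query to several engines, then compile what comes back) can be sketched roughly as follows. This is only an illustration, not SimpleFind's implementation: `engine_a` and `engine_b` are hypothetical stand-ins that return canned results, and the `example.com` URLs are placeholders. A real client would issue HTTP requests and parse each engine's result page.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search engines. Each takes a query string
# and returns a list of (title, url) tuples with canned placeholder data.
def engine_a(query):
    return [(f"{query} home", "http://example.com/"),
            (f"{query} news", "http://example.com/news")]

def engine_b(query):
    return [(f"{query} home", "http://example.com/"),
            (f"{query} faq", "http://example.org/faq")]

def meta_search(query, engines):
    """Send the query to every engine in parallel and compile the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        # pool.map keeps the order of `engines`, so the output is deterministic.
        per_engine = list(pool.map(lambda engine: engine(query), engines))
    compiled = []
    for engine, results in zip(engines, per_engine):
        for title, url in results:
            compiled.append({"engine": engine.__name__, "title": title, "url": url})
    return compiled

results = meta_search("toefl", [engine_a, engine_b])
```

Note that the compiled list still contains the same URL twice (once per engine); merging such duplicates is a separate step.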
What is SimpleFind?
• A “new” client-side meta-search engine implemented in my thesis.
• Developed with C++Builder + STL + ICS.
• Minimum system requirements: Pentium PC with Windows 95/98/2000, 32 MB RAM, 2 MB of hard disk space, and an Internet connection.
Features of SimpleFind
• Currently supports seven major search engines: AltaVista, Excite, HotBot, Infoseek (Go), Lycos, WebCrawler, and Yahoo!
• Sends the query to multiple search engines simultaneously.
• Duplicated links are merged into one link.
• Customizable sort method.
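Merging duplicated links and sorting the merged list might look like the sketch below. The normalization rules (lower-case host, `index.html` and a trailing slash treated as the same page, query string ignored) and the sort key (number of engines that returned the link) are assumptions made for illustration; the slides do not say which rules or sort methods SimpleFind uses. The `example.com` entry is placeholder data.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to a comparison key (assumed rules, for illustration only)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path
    # Treat ".../index.html" and a trailing "/" as the same page.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return host + path.rstrip("/")

def merge_duplicates(hits):
    """Merge hits that share a normalized URL, remembering which engines found them."""
    merged = {}
    for hit in hits:
        key = normalize(hit["url"])
        entry = merged.setdefault(
            key, {"title": hit["title"], "url": hit["url"], "engines": []})
        if hit["engine"] not in entry["engines"]:
            entry["engines"].append(hit["engine"])
    return list(merged.values())

def sort_by_engine_count(links):
    # sorted() is stable, so links with equal counts keep first-seen order.
    return sorted(links, key=lambda link: len(link["engines"]), reverse=True)

hits = [
    {"engine": "AltaVista", "title": "Example news", "url": "http://example.com/news"},
    {"engine": "AltaVista", "title": "Satelindo", "url": "http://www.satelindo.co.id"},
    {"engine": "Lycos", "title": "Satelindo home", "url": "http://www.satelindo.co.id/index.html"},
]
ranked = sort_by_engine_count(merge_duplicates(hits))
```

Here the two Satelindo URLs collapse into one link reported by two engines, which therefore sorts ahead of the link reported by only one.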
Features of SimpleFind
• Saves the results in SimpleFind format, HTML, or CSV.
• Customizable search engines.
• Supports the AND, OR, and NOT operators as well as the + and - operators.
• Customizable title and description weight.
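One way the + and - operators and the title and description weights could interact is sketched below. The scoring formula, the default weights, and the function names are assumptions for illustration only, not SimpleFind's actual method, and the AND, OR, and NOT keywords are not handled here.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-), and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional

def score(link, query, title_weight=2.0, description_weight=1.0):
    """Weighted count of query terms in the title and description (assumed formula)."""
    required, excluded, optional = parse_query(query)
    title_words = link["title"].lower().split()
    desc_words = link["description"].lower().split()
    words = set(title_words) | set(desc_words)
    # A link with an excluded term, or missing a required term, scores zero.
    if any(t in words for t in excluded) or not all(t in words for t in required):
        return 0.0
    total = 0.0
    for term in required + optional:
        total += title_weight * title_words.count(term)
        total += description_weight * desc_words.count(term)
    return total

link = {"title": "Delphi programming tips",
        "description": "Tips and tricks for Delphi programming"}
other = {"title": "Java tips", "description": "Delphi versus Java"}
query = "+delphi tips -java"
```

With these sample links, `score(link, query)` counts "delphi" and "tips" once each in the title and once each in the description, while `other` is rejected because it contains the excluded term "java".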
Company Name Test
Query                     URL of home page
satelindo                 http://www.satelindo.co.id
                          http://www.satelindo.co.id/index.html
toefl                     http://www.toefl.org
                          http://www.toefl.org/index.html
inprise                   http://www.inprise.com
                          http://www.inprise.com/index.html
nokia                     http://www.nokia.com
                          http://www.nokia.com/main.html
gadjah mada university    http://www.ugm.ac.id
                          http://www.ugm.ac.id/index.html
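The rank of the home page in a result list, as measured in this test, could be computed along the lines of the sketch below. The function name is hypothetical and the `result_urls` list is made-up sample data, not a measured result; only the two accepted Nokia URLs come from the table above.

```python
def home_page_rank(result_urls, home_page_urls):
    """Return the 1-based rank of the first result that is an accepted
    home page URL, or None if the home page does not appear."""
    # Ignore a trailing slash when comparing (a simplifying assumption).
    accepted = {url.rstrip("/") for url in home_page_urls}
    for rank, url in enumerate(result_urls, start=1):
        if url.rstrip("/") in accepted:
            return rank
    return None

# Accepted home page URLs for the query "nokia", from the table above.
nokia_home = ["http://www.nokia.com", "http://www.nokia.com/main.html"]

# Made-up result list for illustration only.
result_urls = [
    "http://example.com/phones",
    "http://www.nokia.com/main.html",
    "http://www.nokia.com/",
]
rank = home_page_rank(result_urls, nokia_home)
```

A lower rank means the home page appears closer to the top of the list; `None` means it was not returned at all.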
Company Name Test (Query: satelindo)
Test was conducted on January 15, 2000
[Bar chart: Rank (home page) and Rank (other page), on a scale of 0 to 12, for SimpleFind (Engine), SimpleFind (Score), AltaVista, Excite, HotBot, Infoseek, Lycos, and WebCrawler.]
Company Name Test (Query: toefl)
Test was conducted on January 15, 2000
[Bar chart: Rank (home page) and Rank (other page), on a scale of 0 to 12, for SimpleFind (Engine), SimpleFind (Score), AltaVista, Excite, HotBot, Infoseek, Lycos, and WebCrawler.]
Company Name Test (Query: inprise)
Test was conducted on January 15, 2000
[Bar chart: Rank (home page) and Rank (other page), on a scale of 0 to 12, for SimpleFind (Engine), SimpleFind (Score), AltaVista, Excite, HotBot, Infoseek, Lycos, and WebCrawler.]
Company Name Test (Query: nokia)
Test was conducted on January 15, 2000
[Bar chart: Rank (home page) and Rank (other page), on a scale of 0 to 12, for SimpleFind (Engine), SimpleFind (Score), AltaVista, Excite, HotBot, Infoseek, Lycos, and WebCrawler.]
Company Name Test (Query: gadjah mada university)
Test was conducted on January 15, 2000
[Bar chart: Rank (home page) and Rank (other page), on a scale of 0 to 12, for SimpleFind (Engine), SimpleFind (Score), AltaVista, Excite, HotBot, Infoseek, Lycos, and WebCrawler.]
Phrase Test
Query                          Relevant Links   Irrelevant Links   Dead Links
indonesia programmer           3                5                  2
tip trick delphi programming   6                4                  -
download free mp3 music        5                1                  4
Phrase Test (Query: indonesia programmer)
Test was conducted on August 1, 1999
[Bar chart: Relevant Links, Irrelevant Links, and Dead Links, on a scale of 0 to 12.]
Phrase Test (Query: tip trick delphi programming)
Test was conducted on August 1, 1999
[Bar chart: Relevant Links, Irrelevant Links, and Dead Links, on a scale of 0 to 10.]
Phrase Test (Query: download free mp3 music)
Test was conducted on January 15, 2000
[Bar chart: Relevant Links, Irrelevant Links, and Dead Links, on a scale of 0 to 8, for SimpleFind, AltaVista, Excite, HotBot, Infoseek, Lycos, and WebCrawler.]
Reality Check
• SimpleFind has been distributed to and tested by more than 100 users worldwide.
• “Amazing program. One of the most cogently designed, effective web utility programs developed to date.” (William P. Welty, M.Div., Executive Director of The ISV Foundation)
SimpleFind Distribution
Summary
• Meta-search engines are useful for saving time by searching multiple search engines at once.
• Searching with meta-search engines does not always give the best results; however, starting a search with a meta-search engine is recommended.
Further Development
• More search engines (Google, GoTo, Northern Light, etc.) can be added to the list by modifying the SimpleFind.ini file.
• The program can be improved to search for other information, such as e-mail addresses (WhoWhere, Four11), software (TuCows, Download, HotFiles), etc.
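Keeping the engine list in an INI file, as mentioned in the first point above, might work roughly like the sketch below. The section layout, the `enabled` and `query_url` keys, and the `search.example` URLs are all hypothetical; the actual SimpleFind.ini format is not described in these slides.

```python
import configparser
from urllib.parse import quote_plus

# Hypothetical layout: one section per engine with a URL template.
# This is NOT the real SimpleFind.ini format, which the slides do not show.
SAMPLE_INI = """
[AltaVista]
enabled = yes
query_url = http://search.example/altavista?q={query}

[Google]
enabled = yes
query_url = http://search.example/google?q={query}

[Lycos]
enabled = no
query_url = http://search.example/lycos?q={query}
"""

def load_engines(ini_text):
    """Return {engine name: URL template} for every enabled engine."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    engines = {}
    for name in parser.sections():
        if parser.getboolean(name, "enabled", fallback=True):
            engines[name] = parser.get(name, "query_url")
    return engines

def build_url(template, query):
    """Fill the URL template with the URL-encoded query."""
    return template.format(query=quote_plus(query))

engines = load_engines(SAMPLE_INI)
url = build_url(engines["Google"], "gadjah mada university")
```

Under this assumed layout, adding an engine means adding one section to the file, with no change to the program itself.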