Post on 20-Dec-2015

Search Engines

Jan Damsgaard
Dept. of Informatics
Copenhagen Business School
http://www.cbs.dk/staff/damsgaard

EBUSS Jan Damsgaard, 2004

Introduction

How to find relevant information on the web is a major problem
Size, growth, and lack of universal semantic organization are major impediments
Two major strategies:
1. Improve users' search capability by using raw computer power: search engines
2. Help organize user-relevant information into meaningful categories and bundles of services: portals

Definitions

Search engine
– Specific information retrieval software that returns URLs and descriptions of web pages as results

Portal
– Site that serves as a major starting point for users when they connect to the web; portals combine directories, services, search capabilities and personalization

Search Engines

Technical and business solutions that provide these services on a mass scale are important internet phenomena for two reasons:
– 1) they obtain immense hit rates and are therefore major points of origin for any internet activity
– 2) they are the most important means to channel user search and retrieval
– Therefore they are strategically important, as reflected in the market valuations of the search engine companies

www.mediametrix.com www.nielsen-netratings.com

FDIM (top ten)

msn.dk                1,365,657
dr.dk                   863,095
krak.dk                 794,017
tv2.dk                  540,977
eniro.dk                496,780
ekstrabladet.dk         480,470
ofir.dk                 454,639
tdconline.dk            412,923
bt.dk                   336,704
sol.dk                  317,125
netdoktor.dk (26)       103,669

Look at the stickiness

Top 10 sites in November 2000 in terms of minutes spent per month

Where Do Search Engines Develop Market Value?

Market recognition, leading to
– popular use and adoption
– selling ad impressions
– long-term contracts for search engine functionality

Market assessment of real options associated with the recognition of the tool in the marketplace
– future value-added alliances and spin-offs

Search engine basics

Basic information retrieval techniques
Market trends and capabilities
Awareness of popular assessment metrics for search engine performance
Search engine business models

How Search Engines Work

Three components:
– spider or link crawler software agent
– index or catalog database of content
– search engine software or combined meta-search engine

Require significant hardware horsepower, server connectivity and database capabilities
If your site is not connected (no crawled page links to it), you submit your links yourself
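
The three components can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `WEB` dictionary stands in for real pages fetched over HTTP, and the function names `crawl`, `build_index` and `search` are hypothetical, not taken from any actual search engine.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real spider would fetch pages over HTTP instead.
WEB = {
    "a.example/home": ("search engines index the web", ["a.example/about", "b.example/"]),
    "a.example/about": ("portals bundle services", []),
    "b.example/": ("web search and portals", ["a.example/home"]),
    "c.example/island": ("unlinked page about search", []),
}


def crawl(seed_urls):
    """Spider: follow links breadth-first from the seeds, return visited URLs."""
    seen, queue = set(), deque(seed_urls)
    while queue:
        url = queue.popleft()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        queue.extend(WEB[url][1])
    return seen


def build_index(urls):
    """Index: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url in urls:
        for word in WEB[url][0].lower().split():
            index[word].add(url)
    return index


def search(index, keyword):
    """Search software: look the keyword up in the index."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(crawl(["a.example/home"]))
print(search(index, "portals"))   # ['a.example/about', 'b.example/']
print(search(index, "unlinked"))  # []: the spider never reached c.example/island

# Submitting the unlinked URL as an extra seed makes it findable.
index = build_index(crawl(["a.example/home", "c.example/island"]))
print(search(index, "unlinked"))  # ['c.example/island']
```

The last two calls show why an unconnected site has to be submitted: a link crawler can only discover pages that are reachable from pages it already knows.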

How do search engines work

Add keywords to text fields
Critical are the choice of keywords, the possibilities for combining them, and how the search engine exploits the results
Multilingual support
Another issue is how it organizes the search results
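
One common way to combine keywords is with Boolean operators. The sketch below assumes a toy index (`INDEX`) and hypothetical helper names; actual search engines differ in which combinations they support and in their query syntax.

```python
# Hypothetical toy index: keyword -> set of page identifiers.
INDEX = {
    "search": {"p1", "p2", "p4"},
    "engine": {"p1", "p4"},
    "portal": {"p2", "p3"},
}


def match_all(index, keywords):
    """AND: pages containing every keyword."""
    sets = [index.get(k, set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


def match_any(index, keywords):
    """OR: pages containing at least one keyword."""
    return set().union(*(index.get(k, set()) for k in keywords))


def match_without(index, wanted, unwanted):
    """AND NOT: pages with all wanted keywords and none of the unwanted ones."""
    return match_all(index, wanted) - match_any(index, unwanted)


print(sorted(match_all(INDEX, ["search", "engine"])))      # ['p1', 'p4']
print(sorted(match_any(INDEX, ["engine", "portal"])))      # ['p1', 'p2', 'p3', 'p4']
print(sorted(match_without(INDEX, ["search"], ["portal"])))  # ['p1', 'p4']
```

Adding a keyword with AND narrows the result set, while OR widens it, which is why the choice and combination of keywords matter so much.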

The most popular search engines

Search Engine   Total from   Total from   Total from
                Dec. 2002    March 2002   Aug. 2001
Google          9,732        8,371        6,567
AlltheWeb       6,757        4,388        4,969
AltaVista       5,419        3,432        3,112
WiseNut         4,664        5,009        4,587
HotBot          3,680        2,869        3,277
MSN Search      3,267        2,523        3,005
Teoma           3,259        1,839        2,219
NLResearch      2,352        3,610        3,321
Gigablast       2,352        NA           NA

Popularity over time

March 2002: Google, WiseNut, AlltheWeb
August 2001: Google, Fast, WiseNut
April 2001: Google, Fast, MSN (Inktomi)
Oct. 2000: Fast, Google, Northern Light
July 2000: iWon, Google, AltaVista
April 2000: Fast, AltaVista, Northern Light
Feb. 2000: Fast, Northern Light, AltaVista
Jan. 2000: Fast, Northern Light, AltaVista
Nov. 1999: Northern Light, Fast, AltaVista
Sept. 1999: Fast, Northern Light, AltaVista
Aug. 1999: Fast, Northern Light, AltaVista
May 1999: Northern Light, AltaVista, Anzwers
March 1999: Northern Light, AltaVista, HotBot
January 1999: Northern Light, AltaVista, HotBot
August 1998: AltaVista, Northern Light, HotBot
May 1998: AltaVista, HotBot, Northern Light
February 1998: HotBot, AltaVista, Northern Light
October 1997: AltaVista, HotBot, Northern Light
September 1997: Northern Light, Excite, HotBot
June 1997: HotBot, AltaVista, Infoseek
October 1996: HotBot, Excite, AltaVista

http://searchengineshowdown.com/stats/size.shtml

Also specific services

E.g. Google provides
– Find PDF files
– Stock quotes
– Cached links
– Similar pages
– Who links to you
– Specific site
– Dictionary definitions
– Find maps

Major design issues: completeness and relevance

[Figure: two overlapping sets, "The set of relevant replies" and "The set of obtained results"]

The larger the overlap, the better in terms of completeness

The smaller the set of non-relevant replies, the more relevant the search

How to organize the results for fast reviewing
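
The two design goals can be expressed as ratios over the two sets. In information retrieval these ratios are usually called recall (completeness) and precision (relevance); those names, the function names and the sample sets below are not from the slides and are used only as an illustration.

```python
def completeness(relevant, obtained):
    """Share of the relevant replies that were actually obtained (recall)."""
    return len(relevant & obtained) / len(relevant) if relevant else 0.0


def relevance(relevant, obtained):
    """Share of the obtained results that are relevant (precision)."""
    return len(relevant & obtained) / len(obtained) if obtained else 0.0


# Hypothetical example: 4 relevant pages exist, the engine returns 5 results,
# and 2 of those results are relevant.
relevant = {"p1", "p2", "p3", "p4"}
obtained = {"p3", "p4", "p5", "p6", "p7"}

print(completeness(relevant, obtained))  # 0.5
print(relevance(relevant, obtained))     # 0.4
```

Returning more results tends to raise completeness but can lower relevance, so the two goals usually have to be traded off against each other.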

Page Ranking for Relevance

Biased or unbiased by search engine?
The size of the search space (e.g. Google currently addresses 1,346,966,000 pages)
Use of keywords: in the title, in meta-tag information in the HTML code, or near the top of the page
Use of other facilities like semantic nets or reliability indices (e.g. Google uses page ranks and filtering)
Daily, weekly or monthly refresh by the web crawler software
For an analysis see http://www.notess.com/search/
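
How keyword location might feed into ranking can be sketched as a weighted score. The weights, the `TOP_WORDS` cut-off and the page structure below are assumptions chosen for illustration; they do not describe how Google or any other engine actually ranks pages.

```python
# Assumed weights: a hit in the title counts most, then meta tags, then the
# first words of the body. The numbers are illustrative only.
WEIGHTS = {"title": 3.0, "meta": 2.0, "top": 1.0}
TOP_WORDS = 20  # how many leading body words count as "near the top"


def score(page, keyword):
    """Sum the weights of every location in which the keyword appears."""
    kw = keyword.lower()
    total = 0.0
    if kw in page["title"].lower().split():
        total += WEIGHTS["title"]
    if kw in [m.lower() for m in page["meta"]]:
        total += WEIGHTS["meta"]
    if kw in page["body"].lower().split()[:TOP_WORDS]:
        total += WEIGHTS["top"]
    return total


def rank(pages, keyword):
    """Return URLs of matching pages, highest score first (ties by URL)."""
    scored = [(score(p, keyword), p["url"]) for p in pages]
    ordered = sorted(scored, key=lambda t: (-t[0], t[1]))
    return [url for s, url in ordered if s > 0]


# Hypothetical pages.
PAGES = [
    {"url": "y.example", "title": "Cooking", "meta": ["food"],
     "body": "search for recipes here"},
    {"url": "x.example", "title": "Search tips", "meta": ["search", "help"],
     "body": "how to search the web"},
    {"url": "z.example", "title": "Gardening", "meta": ["plants"],
     "body": "roses and tulips"},
]

print(rank(PAGES, "search"))  # ['x.example', 'y.example']
```

A scheme like this is easy for page authors to manipulate by stuffing keywords into titles and meta tags, which is one motivation for link-based reliability indices.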

Special features of search engines

Multilingual searches
Natural language interfaces
Image searches
Agents (specific crawlers and service providers, e-mail, news agents, shopping and trading agents)

Search Assistance Features

Phrase Searching
– Finds the terms you enter into the search box as a phrase; tells you in the results whether any full or partial matches were found

Stemming
– Ability of a search engine to search for variations of a word based on its stem
– Entering "swim" might also find "swims" and maybe "swimming," depending on the search engine; stemming is more important in some other languages
– Some search engines have stemming switched on by default

Clustering
– Allows only one page per site to be represented in the results
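
Stemming and clustering can both be sketched in a few lines. The suffix rules in `stem` are a deliberately crude assumption (real stemmers use far richer, language-specific rules), and `cluster_by_site` is a hypothetical helper that treats the host name of a URL as the "site".

```python
from urllib.parse import urlsplit


def stem(word):
    """Crude stemmer: strip "ing" or "s", then undo consonant doubling."""
    w = word.lower()
    for suffix in ("ing", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    # "swimming" -> "swimm" -> "swim"
    if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
        w = w[:-1]
    return w


def cluster_by_site(ranked_urls):
    """Keep only the first (best-ranked) URL for each host."""
    seen_hosts, kept = set(), []
    for url in ranked_urls:
        host = urlsplit(url).netloc
        if host not in seen_hosts:
            seen_hosts.add(host)
            kept.append(url)
    return kept


print([stem(w) for w in ["swim", "swims", "swimming"]])  # ['swim', 'swim', 'swim']

results = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/x",
    "http://a.example/3",
]
print(cluster_by_site(results))  # ['http://a.example/1', 'http://b.example/x']
```

With stemming applied to both the indexed words and the query, a search for "swim" matches pages containing "swims" or "swimming". Clustering keeps one dominant site from filling the whole first page of results.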

Conclusions

Search engines are key elements of Internet business

Next wave will integrate new interfaces and new access channels (Digital TV, wireless)

Mass-scale business in which value comes from the installed base