INFORMATION TECHNOLOGY IN BUSINESS AND SOCIETY

SESSION 9 – SEARCH AND ADVERTISING

SEAN J. TAYLOR


ADMINISTRATIVIA

• Assignment 2 online, due Saturday 2/25 at 1am

• Assignment 2 resources

• Assignment 3 preview

• Guest speaker on Tuesday 2/28: Chrys Wu discussing IT and Journalism

• Substitute on Thursday 3/1: Professor Dylan Walker

LEARNING OBJECTIVES

1. Learn how search engines rank pages

2. Learn how to design effectively for high rankings

3. Learn how online advertising works, especially search ads and keyword auctions

4. The future of search

SEARCH ENGINES AND WEB DIRECTORIES

Resources on the Web that help you find sites with the information and/or services you want.

• Directory search engine - organizes listings of Web sites into hierarchical lists.

• Search engine - uses software agents (“spiders” or “bots”) to search the Web for key words and place them into indexes.
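The spider-and-index idea can be made concrete with a toy crawler. This is a minimal sketch, assuming a small in-memory “Web” (the page names, their texts, and the `crawl` function are all hypothetical): it discovers pages by following links breadth-first and records, for every word, which pages contain it (an inverted index). A real spider would fetch pages over HTTP, obey robots.txt, and tokenize text far more carefully.

```python
from collections import deque

# Hypothetical toy Web: URL -> (page text, outgoing links)
WEB = {
    "home": ("welcome to the search course", ["ranking", "ads"]),
    "ranking": ("how search engines rank pages", ["home", "pagerank"]),
    "pagerank": ("pagerank ranks pages by links", ["ranking"]),
    "ads": ("search ads and keyword auctions", ["home"]),
}


def crawl(start):
    """Breadth-first crawl from `start`; returns (inverted index word -> URLs, visited URLs)."""
    index = {}
    seen = {start}
    frontier = deque([start])
    while frontier:
        url = frontier.popleft()
        text, links = WEB[url]
        # Index every word on the page
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Discover new pages by following links
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index, seen


index, visited = crawl("home")
print(sorted(index["search"]))  # ['ads', 'home', 'ranking']
```

The `pagerank` page is never linked from `home` directly; the crawler only finds it by following the link from `ranking`.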

WEB DIRECTORIES EXAMPLE

Advantages? Disadvantages?

SEARCH ENGINE EXAMPLES

Advantages? Disadvantages?

SEARCH ENGINES DRIVE ECOMMERCE!

WHERE IS CONSUMERS’ ATTENTION?

EYETRACKING STUDY OF GOOGLE RESULTS

HOW SEARCH ENGINES WORK

– Search engines discover new pages by following links

– They keep track of the words that appear in each page; when you enter a query, the search engine returns a ranked list of matching pages

– Text content is important, but it is not enough! (Why?)

How do search engines rank pages? (Why does this matter?)
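One way to see why text content alone is not enough: a deliberately naive ranker that scores pages only by how often the query words occur. The pages and the scoring rule below are hypothetical; the point is that a page which simply repeats a keyword (the keyword spam discussed later in this session) outranks a genuinely useful page.

```python
from collections import Counter

# Hypothetical pages: URL -> text
PAGES = {
    "hotel-guide": "guide to london hotels with reviews prices and maps",
    "spam-page": "hotels hotels hotels hotels hotels cheap hotels",
    "museum": "london museums and galleries",
}


def rank(query, pages):
    """Rank pages by the total count of query words (term frequency only)."""
    words = query.lower().split()
    scores = {}
    for url, text in pages.items():
        counts = Counter(text.lower().split())
        score = sum(counts[w] for w in words)
        if score > 0:
            scores[url] = score
    # Highest score first; ties broken by URL for a stable order
    return sorted(scores, key=lambda u: (-scores[u], u))


print(rank("london hotels", PAGES))  # ['spam-page', 'hotel-guide', 'museum']
```

Link-based signals such as PageRank are one answer to this weakness: repeating a word on your own page is free, but getting important pages to link to you is not.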

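PAGERANK IS REALLY A “RANDOM SURFER” MODEL

Random Surfer Model: a surfer on a page follows one of its links at random; at each step, with probability α, the surfer is “picked up” instead.

Transfer Matrix:

The probability that a surfer follows a link from webpage i to webpage j is [Prob. you were not “picked up”] × [Prob. of following link i → j] = (1 − α) W_ij, where

$$W_{ij} = \begin{cases} 1/d_i & \text{if page } i \text{ links to page } j \\ 0 & \text{otherwise} \end{cases}$$

and d_i is the number of links on page i.

Let’s count the surfers that pass through each point, starting with one surfer on every page (T is a row vector with one entry per page):

$$T = \mathbf{1}^{\top} + (1-\alpha)\,\mathbf{1}^{\top}W + (1-\alpha)^2\,\mathbf{1}^{\top}W^2 + \cdots = \mathbf{1}^{\top}\left(I - (1-\alpha)W\right)^{-1}$$

What about getting stuck in loops? α takes care of that: a surfer survives k steps only with probability (1 − α)^k, so no surfer circles a loop forever and the sum converges.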
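The surfer count can be checked numerically. Assuming α is the per-step probability that a surfer is “picked up” and W_ij = 1/d_i when page i links to page j (d_i = number of links on page i), the expected number of surfers passing through each page is the series 1ᵀ + (1 − α)1ᵀW + (1 − α)²1ᵀW² + ⋯. This sketch sums that series on a hypothetical three-page graph with a loop; the graph, α = 0.15, and the 500-term cutoff are assumptions for illustration.

```python
# Hypothetical 3-page graph: page -> pages it links to (every page has at least one link)
LINKS = {"x": ["y"], "y": ["z"], "z": ["x", "y"]}
ALPHA = 0.15  # assumed probability that a surfer is "picked up" at each step


def surfer_counts(links, alpha, steps=500):
    """Expected visits to each page when one surfer starts on every page.

    Adds up the terms 1, (1-alpha)*1W, (1-alpha)^2*1W^2, ... of the series,
    stopping after `steps` terms (the remaining terms are negligibly small).
    """
    pages = list(links)
    current = {p: 1.0 for p in pages}  # expected surfers on each page at step 0
    total = dict(current)
    for _ in range(steps):
        nxt = {p: 0.0 for p in pages}
        for i in pages:
            for j in links[i]:
                # not picked up (1 - alpha) AND follows link i -> j (W_ij = 1/d_i)
                nxt[j] += (1 - alpha) * current[i] / len(links[i])
        current = nxt
        for p in pages:
            total[p] += current[p]
    return total


T = surfer_counts(LINKS, ALPHA)
# With no link-less pages, normalizing T to sum to 1 gives PageRank with damping factor 1 - alpha
print({p: round(v / sum(T.values()), 3) for p, v in T.items()})
```

Even though the surfers on `y` and `z` can bounce between those two pages indefinitely, the totals stay finite because every step shrinks the surviving population by a factor of 1 − α.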

MEASURING IMPORTANCE OF LINKING

PageRank Algorithm

Idea: important pages are pointed to by other important pages

Method:

• Each link from one page to another is counted as a “vote” for the destination page

• The number of incoming links is important, but it is not enough!

• Not every “vote” counts the same: PageRank gives more weight to votes that come from pages with a large number of votes (and so on, recursively)

Compare, for example, the circled page in cases A and B.

[Figure: two link graphs, cases A and B, each with a circled page]

People who bought this also bought… (example: four books, each linking to the books on its “also bought” list)

• BOOK A → book B, book C, book D

• BOOK B → book A, book C

• BOOK C → book A

• BOOK D → book C

COMPUTING PAGERANK

(ignoring damping factor for illustration)

Every book starts with PageRank .250. At each step, each book splits its current PageRank equally among the books it links to (A sends a third of its PageRank to each of B, C and D; B sends half of its PageRank to each of A and C; C sends all of its PageRank to A; D sends all of its PageRank to C), and each book’s new PageRank is the sum of what it receives. For example, after the first step, A = .250 (from C) + .250/2 (from B) = .375.

Step        A       B       C       D
Start       .250    .250    .250    .250
1           .375    .083    .458    .083
2           .500    .125    .250    .125
Converged   .400    .133    .333    .133
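The iteration in the table can be reproduced with a few lines of code. This is a minimal sketch of the undamped update described above (each book splits its score evenly over its outgoing links); the function name `pagerank_step` is hypothetical, and the graph is the book example.

```python
# Book graph from the example: each book links to the books on its
# "People who bought this also bought..." list
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["C"],
}


def pagerank_step(pr, links):
    """One step without damping: each page splits its PageRank evenly over its outgoing links."""
    new = {p: 0.0 for p in links}
    for page, targets in links.items():
        for t in targets:
            new[t] += pr[page] / len(targets)
    return new


pr = {p: 0.25 for p in LINKS}  # start: .250 each
for step in range(1, 3):
    pr = pagerank_step(pr, LINKS)
    print(step, {p: round(v, 3) for p, v in pr.items()})
# 1 {'A': 0.375, 'B': 0.083, 'C': 0.458, 'D': 0.083}
# 2 {'A': 0.5, 'B': 0.125, 'C': 0.25, 'D': 0.125}

for _ in range(200):  # keep iterating until the values settle
    pr = pagerank_step(pr, LINKS)
print({p: round(v, 3) for p, v in pr.items()})
# {'A': 0.4, 'B': 0.133, 'C': 0.333, 'D': 0.133}
```

The converged values are a fixed point of the update: for example, A = C + B/2 = .333 + .133/2 = .400.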

GAMING PAGERANK AND TRUST

TrustRank Algorithm

Initial votes come only from trusted pages

Compare, for example, the circled page in cases A and B.

[Figure: cases A and B, with pages labeled “trusted page” and “links from untrusted sources”]
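A minimal sketch of the “initial votes come only from trusted pages” idea: run the same link-following propagation, but let both the starting scores and the random restarts land only on a hand-picked set of trusted pages. The graph, the trusted set, and the restart probability of 0.15 are assumptions for illustration; the published TrustRank algorithm adds further details, such as how the trusted seed set is chosen.

```python
# Hypothetical graph: two trusted pages and an honest page they link to,
# plus a small link farm whose pages only link to each other and to "target"
LINKS = {
    "trusted1": ["honest"],
    "trusted2": ["honest", "trusted1"],
    "honest": ["trusted1"],
    "farm1": ["farm2", "target"],
    "farm2": ["farm1", "target"],
    "target": ["farm1"],
}
TRUSTED = {"trusted1", "trusted2"}
ALPHA = 0.15  # assumed restart probability


def trust_scores(links, trusted, alpha=ALPHA, iterations=100):
    """PageRank-style propagation whose starting votes and restarts go only to trusted pages."""
    seed = {p: (1 / len(trusted) if p in trusted else 0.0) for p in links}
    score = dict(seed)  # initial votes come only from trusted pages
    for _ in range(iterations):
        new = {p: alpha * seed[p] for p in links}
        for page, targets in links.items():
            for t in targets:
                new[t] += (1 - alpha) * score[page] / len(targets)
        score = new
    return score


scores = trust_scores(LINKS, TRUSTED)
print({p: round(s, 3) for p, s in scores.items()})
```

Because no trusted page links into the farm, `farm1`, `farm2`, and `target` end with a score of exactly zero, however many links they exchange among themselves. With restarts spread over all pages (ordinary PageRank), the same farm would lift `target` above zero.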

SIMULATING CHANGES IN PAGERANK


Change               PR of A   PR of C
C cuts link to A      0.18      0.50
C links to B          0.38      0.33
C links to D          0.24      0.40
C links to B & D      0.22      0.38

Before any change: A = .400, B = .133, C = .333, D = .133
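A sketch for experimenting with link changes like those in the table, on the book graph. It uses damped PageRank (damping 0.85, an assumed value) and lets a page with no outgoing links spread its score evenly over all pages (one common convention, also an assumption). It also assumes “C links to B” means C keeps its link to A and adds one to B. Because the slide does not state its damping factor or how link-less pages are handled, the exact numbers printed here need not match the table.

```python
def pagerank(links, damping=0.85, iterations=200):
    """Damped PageRank by power iteration; pages with no links spread their score evenly."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1 / n for p in pages}
    for _ in range(iterations):
        # Score held by pages with no outgoing links is shared by every page
        dangling = sum(pr[p] for p in pages if not links[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page in pages:
            for t in links[page]:
                new[t] += damping * pr[page] / len(links[page])
        pr = new
    return pr


BASE = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A"], "D": ["C"]}


def with_change(links, page, new_targets):
    """Copy of the graph in which `page` links to exactly `new_targets`."""
    changed = {p: list(t) for p, t in links.items()}
    changed[page] = list(new_targets)
    return changed


scenarios = {
    "no change": BASE,
    "C cuts link to A": with_change(BASE, "C", []),
    "C links to B": with_change(BASE, "C", ["A", "B"]),
    "C links to D": with_change(BASE, "C", ["A", "D"]),
    "C links to B & D": with_change(BASE, "C", ["A", "B", "D"]),
}
for name, graph in scenarios.items():
    pr = pagerank(graph)
    print(f"{name:18} A={pr['A']:.2f} C={pr['C']:.2f}")
```

Whatever the exact parameters, the direction of the first change agrees with the table: when C stops linking to A, A loses most of its PageRank, while C keeps more of its own.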

IMPORTANCE OF ANCHOR TEXT

<a href=http://www.sims…>INFOSYS 141</a>

<a href=http://www.sims…>A terrific course on search engines</a>

The anchor text summarizes what the linked page is about.
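Because anchor text describes the page it points to, a search engine can index those words under the *target* URL rather than only under the page that contains the link. This sketch collects anchor text per `href` using Python’s standard `html.parser`; the URL `http://example.edu/course` is a hypothetical stand-in for the truncated URLs above, and the `AnchorTextCollector` class is illustrative only.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collects the visible text of each <a href=...> link, keyed by target URL."""

    def __init__(self):
        super().__init__()
        self.anchors = {}  # target URL -> list of anchor texts
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace inside the anchor text
            text = " ".join("".join(self._text).split())
            self.anchors.setdefault(self._href, []).append(text)
            self._href = None


# Two hypothetical pages linking to the same (hypothetical) course URL
html = """
<p>See <a href="http://example.edu/course">INFOSYS 141</a>.</p>
<p>Also <a href="http://example.edu/course">A terrific course on search engines</a>!</p>
"""
collector = AnchorTextCollector()
collector.feed(html)
print(collector.anchors)
# {'http://example.edu/course': ['INFOSYS 141', 'A terrific course on search engines']}
```

Indexed this way, the course page can match a query for “search engines course” even if those words never appear on the page itself, which is why the second, more descriptive anchor is more useful.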

OTHER RANKING FACTORS

Location, Location, Location...and Frequency

• Query words in title, or in first few sentences

• The more frequent the query words, the better

Click-through measurement

• How often users click on your URL when they see it

• How long they stay (measured using toolbars!)

OUTLINE

1. Learn how search engines rank pages

2. Learn how to design effectively for high rankings

3. Learn how online advertising works, especially search ads and keyword auctions

4. The future of search

ACHIEVING HIGHER RESULT RANKINGS

• Position your keywords (title, headings, early on the page)

• Make text visible (no tiny fonts, no white-on-white)

• Frames can kill

• Have relevant content

• Do not change topics

• Just say no to search engine spamming

• Submit your key pages

• Verify your listing often

MANIPULATING RANKINGS

Motives

• Commercial, political, religious, lobbies

• Promotion funded by advertising budget

Operators

• Contractors (Search Engine Optimizers) for lobbies, companies

• Web masters

• Hosting services

What techniques do ranking manipulators use?

MANIPULATION TECHNOLOGIES

Cloaking

• Serve fake content to search engine robot

• DNS cloaking: Switch IP address. Impersonate

Doorway pages

• Pages optimized for a single keyword that re-direct to the real target page

Keyword Spam

• Misleading meta-keywords, excessive repetition of a term, fake “anchor text”

• Hidden text with colors, CSS tricks, etc.

Link spamming

• Mutual admiration societies, hidden links, awards

• Domain flooding: numerous domains that point or re-direct to a target page

Robots

• Fake click stream

• Fake query stream

[Figure: Cloaking. The server asks “Is this a search engine spider?” and branches on Y/N; labels: “SPAM”, “FakeDoc”]

Meta-Keywords = “… London hotels, hotel, holiday inn, hilton, discount, booking, reservation, sex, mp3, britney spears, viagra, …”

It is risky to use any of these techniques, as search engines are getting better at detecting and punishing them.
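Cloaking hinges on one question: “Is this a search engine spider?” A simple way to catch the user-agent variant is to request the same page twice, once identifying as a spider and once as a browser, and compare the two versions. This is a toy sketch: both “servers” are hypothetical functions, and the user-agent strings and the 0.5 similarity threshold are assumptions. Real search engines do not publish their detection methods.

```python
def cloaking_server(user_agent):
    """Hypothetical cloaking site: shows keyword-stuffed text to spiders only."""
    if "bot" in user_agent.lower():  # "Is this a search engine spider?"
        return "london hotels cheap hotels discount booking reservation"
    return "buy our unrelated product today"


def honest_server(user_agent):
    """Hypothetical honest site: same content for every visitor."""
    return "london hotels guide with reviews"


def word_overlap(a, b):
    """Jaccard similarity of the word sets of two documents (0 = disjoint, 1 = identical)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0


def looks_cloaked(server, threshold=0.5):
    """Fetch as a spider and as a browser; flag the site if the two versions differ a lot."""
    as_spider = server("ExampleBot/1.0")
    as_browser = server("Mozilla/5.0")
    return word_overlap(as_spider, as_browser) < threshold


print(looks_cloaked(cloaking_server), looks_cloaked(honest_server))  # True False
```

A check based only on the user-agent string would miss DNS/IP-based cloaking, where the site recognizes the spider by its IP address instead; catching that requires fetching from addresses the site does not associate with the search engine.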

OUTLINE

1. Learn how search engines rank pages

2. Learn how to design effectively for high rankings

3. Learn how online advertising works, especially search ads and keyword auctions

4. The future of search

PAID RANKING

Keyword bidding for targeted ads

• Pay-per-click

• Higher bids result in higher ranks for the ad

• A higher percentage of clicks on the ad (click-through rate) increases the rank as well (why?)

Google's AdWords is the biggest player

• Google’s revenue was more than $16 billion in 2007 and about $22 billion in 2008, mostly from such ads

Promoting without Manipulation: Paid placement
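One common way to answer the “(why?)”: the search engine is paid per click, so its expected revenue from showing an ad once is bid × click-through rate. This sketch ranks ads by that product and charges each winner just enough per click to keep its position (a generalized second-price rule). The advertisers, bids, and click-through rates are made up, the `Ad` and `run_auction` names are hypothetical, and Google’s actual AdWords formula is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid: float  # maximum price per click, in dollars
    ctr: float  # estimated click-through rate (clicks per impression)


def run_auction(ads, slots):
    """Rank by bid * CTR; each winner pays just enough per click to beat the next ad."""
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)
    results = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            price = nxt.bid * nxt.ctr / ad.ctr  # generalized second price
        else:
            price = 0.0  # no competitor below (reserve prices ignored)
        results.append((ad.advertiser, round(price, 2)))
    return results


ads = [
    Ad("high-bid", bid=2.00, ctr=0.01),  # expected $0.020 per impression
    Ad("popular", bid=1.00, ctr=0.03),   # expected $0.030 per impression
    Ad("average", bid=1.50, ctr=0.01),   # expected $0.015 per impression
]
print(run_auction(ads, slots=2))  # [('popular', 0.67), ('high-bid', 1.5)]
```

The highest bidder does not win the top slot: the ad that users click three times as often earns the engine more per impression, and it also pays less per click than it bid.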

EXAMPLE

[Figure: search results page showing AdWords placements alongside the most relevant sites]

FUND YOUR WEBSITE: ADSENSE

Google also delivers ads to other websites

Sign up for Google AdSense, and Google delivers ads to your website (a common source of income for “professional” bloggers)

How ads are delivered:

• If the website is the best match for the targeted keywords

• If users of the website click on the results

Strategies for successful ads:

• Place the ads on top

• Blend with the rest of the website

• Ads at the bottom are ignored consistently

EXAMPLE: WASHINGTON POST WEBSITE

[Figure: Analysis of the Washington Post website]

TARGETING BANNER ADS

The user visits publisher sites, which send a request for an ad to the ad server. The ad server uses:

• IP address: country, domain, company

• Browser, operating system

• Surfing behavior from cookies

• Demographic data?

The targeted ad is delivered to the user.

Example: context: movie reviews; user profile: NYU user, New York
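The targeting step can be sketched as a matching rule over the attributes the ad server derives from a request. Everything here is hypothetical: the ad inventory, the attribute names, and the rule “among ads whose criteria all match, prefer the most specific one.” Real ad servers, DART included, use their own logic, which this sketch does not reproduce.

```python
# Hypothetical ad inventory: each ad lists the request attributes it targets
ADS = {
    "broadway-tickets": {"country": "US", "city": "New York", "context": "movie reviews"},
    "streaming-trial": {"context": "movie reviews"},
    "uk-bank": {"country": "UK"},
}


def choose_ad(request, ads):
    """Pick the ad whose targeting criteria all match the request, preferring more specific ads."""
    best, best_matches = None, -1
    for name, criteria in ads.items():
        if all(request.get(k) == v for k, v in criteria.items()):
            if len(criteria) > best_matches:
                best, best_matches = name, len(criteria)
    return best


# Attributes an ad server might derive for one request (IP address -> country/city,
# page context, cookie-based profile); all values here are made up
request = {"country": "US", "city": "New York", "context": "movie reviews", "browser": "Firefox"}
print(choose_ad(request, ADS))  # broadway-tickets
```

A visitor reading movie reviews from an unknown location would still get `streaming-trial`, since that ad targets only the page context.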

CLOSED LOOP MARKETING

• The user visits publisher sites; ads are delivered by DART for Advertisers

• The user clicks and visits the advertiser’s site

• Boomerang captures user action data; data analysis in the databank

• Boomerang compiles and reports the response for future targeting

Source: Doubleclick, Inc.

FUTURE OF SEARCH

1. Information Extraction: Search on Structured Data

2. Social Search

3. Privacy Preserving Search

INFORMATION EXTRACTION

Information extraction applications extract structured relations from unstructured text

May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…

Information Extraction System (e.g., NYU’s Proteus)

Disease Outbreaks in The New York Times

Date        Disease Name        Location
Jan. 1995   Malaria             Ethiopia
July 1995   Mad Cow Disease     U.K.
Feb. 1995   Pneumonia           U.S.
May 1995    Ebola               Zaire
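To make “extract structured relations from unstructured text” concrete, here is a toy pattern-based extractor that turns the Ebola sentence into one row of the outbreak table. The two regular expressions and the `extract_outbreak` function are hypothetical and would break on most other sentences; systems such as NYU’s Proteus are far more sophisticated, and this sketch does not reproduce how they work.

```python
import re

TEXT = ("May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, "
        "which is in the front line of the world's response to the deadly Ebola "
        "epidemic in Zaire, is finding itself hard pressed to cope with the crisis")

# Hand-written patterns (hypothetical): a dateline at the start, and "<Disease> epidemic in <Place>"
DATE = re.compile(r"^(?P<month>[A-Z][a-z]+) \d{1,2} (?P<year>\d{4})")
OUTBREAK = re.compile(r"(?P<disease>[A-Z]\w+) epidemic in (?P<location>[A-Z]\w+)")


def extract_outbreak(text):
    """Return a (date, disease, location) row, or None if the patterns do not match."""
    date, outbreak = DATE.search(text), OUTBREAK.search(text)
    if not (date and outbreak):
        return None
    return (f"{date['month']} {date['year']}", outbreak["disease"], outbreak["location"])


print(extract_outbreak(TEXT))  # ('May 1995', 'Ebola', 'Zaire')
```

Run over many news articles, each extracted row becomes one line of a table that can be queried directly, which is the sense in which search can return structured answers rather than web pages.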

RETURN STRUCTURED ANSWERS, NOT WEBPAGES

FUTURE OF SEARCH

1. Information Extraction: Search on Structured Data

2. Social Search

3. Privacy Preserving Search

Y! ANSWERS

Launched in the second half of 2005

Incentive system based on points and voting for best answers

Questions grouped by category

Some statistics:

• over 60 million users

• over 120 million answers, available in 18 countries and in 6 languages


LONG-TERM PROSPECTS

Questions follow a power law:

• Many people will ask the same questions (20% of questions → 80% of requests)

• We only need one answer for each question

• Quickly acquire high-quality answers for 80% of queries

• …and over time, people will take care of the “long tail” of the remaining questions
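To see why answering the most popular questions covers most requests, assume (hypothetically) that question popularity follows Zipf’s law, so the k-th most popular question is asked in proportion to 1/k. This sketch computes the share of all requests that goes to the top 20% of questions. With an assumed 10,000 distinct questions it comes out at about 84%, close to the 80/20 rule of thumb. The number of questions and the exponent are assumptions, and real question logs need not follow this exact law.

```python
def top_share(n_questions, top_fraction=0.2, exponent=1.0):
    """Share of requests going to the most popular `top_fraction` of questions under Zipf's law."""
    weights = [1 / k ** exponent for k in range(1, n_questions + 1)]
    top = int(n_questions * top_fraction)
    return sum(weights[:top]) / sum(weights)


share = top_share(10_000)
print(f"{share:.0%}")  # 84%
```

Raising the exponent concentrates requests even more on the head of the distribution; lowering it fattens the long tail that the community would have to cover over time.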

FUTURE OF SEARCH

1. Information Extraction: Search on Structured Data

2. Social Search

3. Privacy Preserving Search

PRIVACY PRESERVING SEARCH

NEXT CLASS: SOCIAL NETWORKS

• Work on Assignment 2