. democratizing search

. 1 / 43 Title

Bernhard Rieder

Université de Paris VIII - Vincennes Saint-Denis

Laboratoire Paragraphe

Democratizing Search

Concepts and Challenges

Deep Search

World-Information Institute

8 / 11 / 2008

. 2 / 43 I - Search engine basics

Search engine basics rehashed

Search engines have emerged as the dominant pathways into

the depths of the Web for 1.5 billion Internet users.

After email, search is the second most frequent activity online.

Search engines play an important role in shaping which sites

are visible on the Web and which sites are not.

. 3 / 43 I - The problem with search

The problem with search

"Search is broken." [ Jimmy Wales 2008 ]

The most common points of critique:

• Crawling and ranking are not transparent ( black box )

• Might favor "commercial" sites

• Smaller sites have little visibility

• Susceptible to manipulation ( SEO )

• Results are "read only"

• Google as monopoly gatekeeper

A "quick fix" is quite improbable.

. 4 / 43 I - Search as a strange object

Search as a strange object

Web search is a phenomenon that is not easy to categorize

and conceptualize.

• The Web is an information space unlike any other

• Search can be done using different techniques

• It is part of a variety of practices

What is the closest antecedent? The library catalogue? Mass

media? Guidebooks? Domain experts?

. 5 / 43 I - Searching or Filtering?

Search is not search!?

Web search is part of a larger shift from information scarcity to

abundance.

"The task is not to design information-distributing systems but

intelligent information-filtering systems." [ H. Simon 1969 ]

Search engines are not systems of classification, they are

machines that make judgments on the importance of pieces of

information relative to a query.

. 6 / 43 I - SCREEN: Yahoo 1997

. 7 / 43 I - SCREEN: AltaVista 1996

. 8 / 43 I - The search pipeline

The search pipeline

A search engine includes several distinct stages:

Crawler → Index → Search & Rank → GUI
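The four stages can be sketched end to end in a few lines. This is a toy illustration only: the three pages, their URLs and the function names are invented for the example, and a real engine differs at every stage.

```python
from collections import defaultdict

# Hypothetical three-page "Web"; URLs and texts are invented for illustration.
PAGES = {
    "http://a.example": "a house on the hill",
    "http://b.example": "the hill behind the house and the other house",
    "http://c.example": "a villa by the sea",
}


def crawl(pages):
    """Crawler: fetch documents (here they are simply read from a dict)."""
    for url, text in pages.items():
        yield url, text


def build_index(documents):
    """Index: map each term to the documents containing it, with counts."""
    index = defaultdict(dict)
    for url, text in documents:
        for term in text.lower().split():
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search_and_rank(index, query):
    """Search & Rank: look up one term, order hits by occurrence count."""
    hits = index.get(query.lower(), {})
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


def render(results):
    """GUI: turn the ranked list into text for display."""
    return "\n".join(
        f"{rank}. {url} ({count})"
        for rank, (url, count) in enumerate(results, 1)
    )


index = build_index(crawl(PAGES))
results = search_and_rank(index, "house")
print(render(results))
```

Each function stands for one box of the pipeline, so replacing `search_and_rank` alone changes the result order without touching crawling or indexing.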

. 9 / 43 I - Some basic ranking principles

Some basic ranking principles ( content ranking )

query: "house"
rank by: number of occurrences

query: "house AND hill"
rank by: closeness
"there is a house on the hill"
"from my house I can see a beautiful hill"

query: "house"
rank by: location in document
"<title>house</title>"
"<p>house</p>"

query: "house"
rank by: URL
"http://www.house.com"
"http://www.villa.com"
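The four content signals above can each be written as a small scoring function. The sketch below is one possible reading of the slide, not how any particular engine computes them: the function names, the `TITLE_WEIGHT` value and the inverse-distance formula are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

TITLE_WEIGHT = 3  # assumed: a title hit counts three times a body hit


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def occurrence_score(term, text):
    """Rank by number of occurrences of the term."""
    return tokens(text).count(term)


def proximity_score(term_a, term_b, text):
    """Rank by closeness: the nearer the two terms, the higher the score."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    if not pos_a or not pos_b:
        return 0.0
    distance = min(abs(i - j) for i in pos_a for j in pos_b)
    return 1.0 / max(distance, 1)


def location_score(term, html):
    """Rank by location in document: title hits outweigh body hits."""
    flags = re.S | re.I
    title = " ".join(re.findall(r"<title>(.*?)</title>", html, flags=flags))
    body = re.sub(r"<title>.*?</title>", " ", html, flags=flags)
    body = re.sub(r"<[^>]+>", " ", body)  # strip remaining tags
    return TITLE_WEIGHT * tokens(title).count(term) + tokens(body).count(term)


def url_score(term, url):
    """Rank by URL: 1 if the term is a label of the host name, else 0."""
    host = urlparse(url).hostname or ""
    return 1 if term in host.split(".") else 0


near = proximity_score("house", "hill", "there is a house on the hill")
far = proximity_score("house", "hill", "from my house I can see a beautiful hill")
print(near > far)
print(location_score("house", "<title>house</title>"),
      location_score("house", "<p>house</p>"))
print(url_score("house", "http://www.house.com"),
      url_score("house", "http://www.villa.com"))
```

In practice such signals would be combined into a single score; how they are weighted against each other is exactly the kind of judgment the ranking function encodes.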

. 10 / 43 I - Link analysis

The dominant paradigm: recursive link analysis
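The slide names no specific algorithm. As an illustration of what "recursive" means here, the sketch below uses a PageRank-style power iteration, in which a page's score depends on the scores of the pages linking to it. The graph, the damping value and the function name `link_rank` are assumptions for the example.

```python
def link_rank(links, damping=0.85, iterations=100):
    """PageRank-style scores for a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score ...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # ... plus an equal share of the score of each page linking to it.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: four pages link to "hub", which links back to two of them.
LINKS = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub"],
}
scores = link_rank(LINKS)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Pages "a" and "b" end up well ahead of "c" and "d" although each has only one incoming link: what counts is that the link comes from the hub.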

. 11 / 43 II - The Web as scale-free network

The Web as scale-free network

. 12 / 43 II - Link analysis and the logic of the hit

Link analysis and the logic of the hit

Link analysis projects the hypertext graph as a hierarchical

list that strongly favors hubs and networks of hubs.

Growth principle: "preferential attachment"

• "cumulative advantage"

• "the rich get richer"

• "logic of the hit"
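The growth principle can be simulated with a very small model: every new page adds one link, and it picks its target with a probability proportional to the number of links the target already has. The sketch follows the general idea of preferential attachment under simplifying assumptions (one link per new node, undirected links, a fixed random seed); the function name is invented for the example.

```python
import random


def grow_network(num_nodes, seed=0):
    """Grow a network node by node; return a dict {node: number of links}."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}  # start with two nodes joined by one link
    endpoints = [0, 1]     # each node appears once per link it takes part in
    for new_node in range(2, num_nodes):
        # Picking uniformly from the endpoint list means picking a node
        # with probability proportional to its current number of links.
        target = rng.choice(endpoints)
        degree[target] += 1
        degree[new_node] = 1
        endpoints.extend([target, new_node])
    return degree


degree = grow_network(2000)
ordered = sorted(degree.values(), reverse=True)
print("five best-connected nodes:", ordered[:5])
print("nodes with a single link:", ordered.count(1))
```

Runs of this kind typically show a handful of heavily linked nodes next to a large majority with a single link, which is the pattern the slide calls "the rich get richer".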

"We will have to realize that hierarchies fulfill a semantic

function and that semantic systems are hierarchic by

principle." [ Hartmut Winkler 1997 ]

. 13 / 43 II - CITATION: Best friend

"So what’s our straightforward

definition of the ideal search engine?

Your best friend with instant access to

all the world’s facts and a photographic

memory of everything you’ve seen and

know. That search engine could tailor

answers to you based on your

preferences, your existing knowledge

and the best available information."

- Marissa Mayer, Google VP

. 14 / 43 II - Current guiding principles

Current guiding principles

The two dominant guiding principles currently are:

• popularity ( the logic of the hit )

• convenience ( personalization )

. 15 / 43 II - Where to look for alternative principles?

Where can we look for alternative principles?

Web search is a new phenomenon; it can nonetheless be

compared to adjacent domains.

• Libraries and documentation ( freedom of access )

• Media and journalism ( neutrality, plurality )

• Cultural policy - "exception culturelle" ( diversity )

• Community organization ( participation )

• Liberal democracy ( transparency, accountability )

. 16 / 43 III - CITATION: Democracy!

"Democracy! Bah! When I hear that

word I reach for my feather Boa!"

- Allen Ginsberg

. 17 / 43 III - Two concepts of democracy: community

Democracy as community

"The second big element of Web 2.0 is democracy. We now

have several examples to prove that amateurs can surpass

professionals, when they have the right kind of system to

channel their efforts. [ … ] Another place democracy seems to

win is in deciding what counts as news. I never look at any

news site now except Reddit." [ Paul Graham 2005 ]

"Democratizing search" would mean letting users rank results.

The community decides which information is best ( markers:

votes, clicks, pageviews, etc. ).
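A minimal sketch of ranking by community markers could look as follows. The field names, the example numbers and above all the weights are assumptions; choosing them is precisely where a normative decision enters the system.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    votes: int
    clicks: int
    pageviews: int


# Assumed weights: an explicit vote counts more than a click,
# and a click counts more than a pageview.
WEIGHTS = {"votes": 3.0, "clicks": 1.0, "pageviews": 0.1}


def community_score(result):
    """Combine the community markers into one number."""
    return (
        WEIGHTS["votes"] * result.votes
        + WEIGHTS["clicks"] * result.clicks
        + WEIGHTS["pageviews"] * result.pageviews
    )


def rank_by_community(results):
    """Order results by community score, highest first."""
    return sorted(results, key=community_score, reverse=True)


# Invented example data.
candidates = [
    Result("http://a.example", votes=2, clicks=10, pageviews=100),
    Result("http://b.example", votes=10, clicks=5, pageviews=50),
]
ranked = rank_by_community(candidates)
print([result.url for result in ranked])
```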

. 18 / 43 III - CITATION: Wales on bias

"The idea that all 'selection' is equally

'biased' is fallacious. We intuitively

understand this when we talk about

other forms of writing or journalism;

we need to understand it for *this*

form of journalism as well."

- Jimmy Wales

. 19 / 43 III - Wikia Search

Wikia Search

Wikia Search tries to apply the Wikipedia principle to ranking

search results, following the NPOV principle.

• All technology is open source

• Crawling is distributed using GRUB

• Currently in an experimental stage

Wikia Search follows a series of explicit principles:

• Transparency

• Community

• Quality

• Privacy

. 20 / 43 III - SCREEN: Wikia Search Abortion

. 21 / 43 III - SCREEN: Wikia Search McCain

. 22 / 43 III - Two concepts of democracy: society

Democracy as society

Large-scale collective governance based on bureaucratic

institutions limited by checks and balances.

"Democratizing search" could mean adapting search to the

requirements of liberal democracy.

Web search would serve the goal of informing citizens on the

different courses of action.

. 23 / 43 III - What should we strive for?

What should we strive for?

Reforming the search landscape is a normative project that

would produce winners and losers.

• Transparency => Plurality of opinion

• Community => Society

• Quality

• Privacy

The goal would be having a variety of high-quality search

applications that deliver different sets of results.

. 24 / 43 III - Democratizing search: main challenges

Democratizing search: main challenges

Entry into the search market has become difficult.

• Cost for infrastructure / datacenter

• Difficulty finding quantifiable markers for ranking

• Changing user habits / software defaults

Every part of the search pipeline has specific costs and

specific engineering challenges. In order to have very fast

end-user performance, there has to be sophisticated load

balancing and an elaborate datacenter architecture.

. 25 / 43 III - Overview

Democratizing search: overview

User side

• education

Provider side

• antitrust measures

• financial aid

Interaction between user and service

• interface / algorithm additions

• search APIs

• search sandbox

. 26 / 43 III - CITATION: Mind of god

"The perfect search engine would be

like the mind of God."

- Sergey Brin

. 27 / 43 III - A: User side

User side: education

Information access is driven by informational practices as

much as by technology itself. User education can include:

• General information on search engines and how they work

• Using a search engine to its full potential

• Learning about alternatives to the dominant player

• Understanding that linking is not an innocent practice

• General informational ecology

These points could easily be included in teaching curricula.

. 28 / 43 III - A: SCREEN: Cheat sheet

. 29 / 43 III - B: SCREEN: Monopoly

comScore European Search Properties March 08

. 30 / 43 III - B: Antitrust measures

Provider side: antitrust measures

Ownership is commonly an issue in the world of media.

Google is politically quite active.

But how to split up

http://google.com?

. 31 / 43 III - B: Financial aid

Provider side: financial aid

A series of countries grant direct or indirect subsidies to

newspapers.

France taxes cinema tickets and redistributes the money to

level the playing field.

Countries can offer targeted R&D grants ( e.g. Quaero ).

There could be public search engines or a public datacenter

infrastructure.

. 32 / 43 III - B: A public infrastructure

Provider side: building a public infrastructure

Crawler → Index → Search & Rank → GUI

. 33 / 43 III - B: SCREEN: exalead

. 34 / 43 III - C: Empower the user through interaction

Between user and provider: interaction possibilities

Crawler → Index → Search & Rank → GUI

. 35 / 43 III - C: SCREEN: exalead

. 36 / 43 III - C: SCREEN: exalead

. 37 / 43 III - C: SCREEN: msn sliders

. 38 / 43 III - C: SCREEN: clusty

. 39 / 43 III - C: Opening the results

Between user and provider: better Web APIs

Search APIs allow external applications to download a limited

number of results ( Google ~8, Yahoo BOSS 50, Live API 50 ).

With larger result sets, effective reranking or more powerful

user interaction would be possible.
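Why the size of the result set matters can be shown with a client-side reranker. The sketch below calls no real search API: the result list is generated locally, and the host names, the `.org` preference and the function name are assumptions for illustration.

```python
def rerank(results, preferred_suffixes):
    """Move results from preferred hosts to the top, keep engine order otherwise."""
    def sort_key(item):
        position, result = item
        preferred = any(result["host"].endswith(s) for s in preferred_suffixes)
        return (0 if preferred else 1, position)

    return [result for _, result in sorted(enumerate(results), key=sort_key)]


# Hypothetical engine output: 50 results, every tenth one from an .org host.
engine_results = [
    {"host": f"site{i}.example.org" if i % 10 == 9 else f"site{i}.example.com"}
    for i in range(50)
]

small = rerank(engine_results[:8], [".org"])   # an API returning 8 results
large = rerank(engine_results[:50], [".org"])  # an API returning 50 results

print(small[0]["host"])
print(large[0]["host"])
```

With eight results the reranker has nothing to promote, because the first preferred host sits at position ten in the engine's own ordering. With fifty results the same user preference changes the whole first screen.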

. 40 / 43 III - C: Opening the index

Between user and provider: the search sandbox

Crawler → Index → Search & Rank → GUI

. 41 / 43 III - C: Opening the index

Between user and provider: the search sandbox

A search sandbox would have the following elements:

• Run on corporate infrastructure

• A safe execution environment for untrusted code

• A limited set of API calls to access the index

• Users and institutions could propose alternative ranking methods

• Quota rules for processing time

This might allow an ecosystem of search methods to develop

in a situation that is both technically and economically viable.
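The shape of such a sandbox can be sketched as a restricted index object plus quota accounting. This is an outline of the interface only and provides no real isolation: a plain Python function call is not a safe execution environment for untrusted code, and the time quota is merely checked after the fact. All class, method and parameter names are invented for the example.

```python
import time


class QuotaExceeded(Exception):
    """Raised when submitted ranking code uses more than its allowance."""


class SandboxIndex:
    """Limited, read-only view of the index handed to submitted ranking code."""

    def __init__(self, postings, max_calls):
        self._postings = postings
        self._calls_left = max_calls

    def _charge(self):
        # Every API call is counted against the quota.
        if self._calls_left <= 0:
            raise QuotaExceeded("API call quota used up")
        self._calls_left -= 1

    def documents_for(self, term):
        self._charge()
        return tuple(self._postings.get(term, {}))

    def term_count(self, term, url):
        self._charge()
        return self._postings.get(term, {}).get(url, 0)


def run_ranker(ranker, postings, query, max_calls=100, max_seconds=0.5):
    """Run a submitted ranking function against the limited index API."""
    index = SandboxIndex(postings, max_calls)
    started = time.perf_counter()
    ranking = list(ranker(index, query))
    # Simplification: a real sandbox would interrupt the code while it runs.
    if time.perf_counter() - started > max_seconds:
        raise QuotaExceeded("processing time quota used up")
    return ranking


def rare_first_ranker(index, query):
    """Example of an alternative ranking method: fewest occurrences first."""
    urls = index.documents_for(query)
    return sorted(urls, key=lambda url: (index.term_count(query, url), url))


# Invented index content: term -> {url: number of occurrences}.
POSTINGS = {"house": {"http://a.example": 1, "http://b.example": 2}}
print(run_ranker(rare_first_ranker, POSTINGS, "house"))
```

The submitted function only ever sees `SandboxIndex`, so the provider decides which index operations exist and how much each submission may consume.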

. 42 / 43 Conclusions

Conclusions

We will have to put humans "back into the loop", render

search configurations hybrid and more complex.

In order to open up the search landscape and get closer to

the goal of plurality, we will have to combine all three levels.

We need more large-scale empirical data on search habits

and consequences of ranking.

Without a better conceptual grasp of search engines,

regulatory efforts are highly improbable.

. 43 / 43 The End

Thank you for your attention!

bernhard.rieder@univ-paris8.fr

http://bernhard.rieder.fr

http://thepoliticsofsystems.net
