peer-to-peer search that works, djoerd hiemstra

PEER-TO-PEER SEARCH THAT WORKS

Djoerd Hiemstra

http://www.cs.utwente.nl/~hiemstra

Yandex, Moscow, 27 April 2011

WHAT DOES A SEARCH ENGINE LOOK LIKE?

A DATA CENTER...?

Goose Creek, California

A DATA CENTER...?

In Eemshaven...?

Biggest data center in Europe

100,000 servers, 19,000 m²

Uses electricity equal to 80,000 households

A DATA CENTER...?

… where the * is Eemshaven?

Close to a power plant

Close to the sea (cooling!)

WHAT ELSE DOES A SEARCH ENGINE LOOK LIKE?

A “BIG BROTHER” ?

NO REALLY, WHAT DOES A SEARCH ENGINE LOOK LIKE?

… FINDS WHAT YOU NEED ?

SO, NOT NECESSARILY...

Green and environmentally friendly, respecting privacy, objective... nor democratic.

WHAT SHOULD A SEARCH ENGINE LOOK LIKE?

YOUR PERSONAL SYSTEM: PEER-TO-PEER SEARCH

YOUR PERSONAL SYSTEM:

Each user brings processing power: As search consumer and search supplier

Green!

Democratic

No “big brother”

PEER-TO-PEER SEARCH

Moscow

Results for “Moscow”

PEER-TO-PEER SEARCH

RuSSIR

Go to peer 74

PEER-TO-PEER SEARCH

RuSSIR

Go to peer 74

PEER-TO-PEER SEARCH

RuSSIR

Go to peer 2

PEER-TO-PEER SEARCH

RuSSIR

Results for “RuSSIR”

RuSSIR

Go to peer 2

OVERVIEW

1. Caching in P2P networks

2. Query-based sampling using snippets

3. Deep web querying

P2P LOAD BALANCING BY CACHING

If you do not index documents, cache them!

Handles query bursts (e.g., “Michael Jackson's death”)
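A minimal sketch of what result caching at a single peer could look like, assuming a bounded least-recently-used cache keyed on the raw query string. The names `ResultCache` and `slow_search` are hypothetical, and `slow_search` merely stands in for forwarding the query to the peer that holds the index; none of this is taken from the system described in the talk.

```python
from collections import OrderedDict


class ResultCache:
    """Bounded LRU cache mapping a query string to its result list."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, query, fetch):
        """Return cached results, or call fetch(query) and cache its answer."""
        if query in self._entries:
            self._entries.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self._entries[query]
        self.misses += 1
        results = fetch(query)
        self._entries[query] = results
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return results


def slow_search(query):
    # Hypothetical stand-in for asking the peer that indexes the documents.
    return [f"result for {query}"]


cache = ResultCache(capacity=2)
# A burst: the same query arrives five times, followed by one other query.
for q in ["michael jackson's death"] * 5 + ["moscow"]:
    cache.lookup(q, slow_search)
print(cache.hits, cache.misses)
```

During the burst only the first request reaches the indexing peer; the other four are answered from the cache.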

QUERY LOG & CACHING POTENTIAL

SHARE RATIOS

CACHE SIZES

EFFECT OF TEXT PROCESSING

DISCUSSION

About 55% from cache in ideal case

About 78% from cache with subsumption, stemming, etc.

About 33% from cache if bounded cache and churn (but no subsumption)
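To illustrate why text processing can raise the share of queries answered from cache, here is a rough sketch that maps query variants onto one cache key. The function `normalize` is hypothetical, and its plural stripping is only a crude stand-in for a real stemmer; it does not implement subsumption and is not the processing used to obtain the figures above.

```python
import re


def normalize(query):
    """Map query variants to one cache key (lowercase, light stemming, sorted terms)."""
    text = re.sub(r"'s\b", "", query.lower())  # drop possessive 's
    terms = re.findall(r"[a-z0-9]+", text)
    # Crude plural stripping; an assumption standing in for proper stemming.
    stemmed = [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in terms]
    return " ".join(sorted(set(stemmed)))


print(normalize("Michael Jackson's death"))
print(normalize("death michael jacksons"))
```

Both variants normalize to the same key, so the second one would hit a cache entry created by the first.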

OVERVIEW

QUERY-BASED SAMPLING

Never download any documents

Instead, use the search result snippets to learn about documents
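The idea can be sketched as a loop: send a one-term query, add the words of the returned snippets to a term-frequency model, then pick the next query term at random from that model. Everything below is an assumption made for illustration: `TOY_INDEX` and `toy_search` simulate a remote search engine, and `sample_with_snippets` is a hypothetical name, not the authors' implementation.

```python
import random
import re
from collections import Counter

# Hypothetical snippets that a remote engine might return.
TOY_INDEX = [
    "peer to peer search distributes the index over many peers",
    "query based sampling estimates the vocabulary of a search engine",
    "snippets are short summaries shown in search results",
    "caching search results reduces load on popular peers",
    "deep web sites hide content behind complex web forms",
]


def toy_search(term, limit=3):
    """Return up to `limit` snippets containing `term`; no documents are fetched."""
    return [s for s in TOY_INDEX if term in s.split()][:limit]


def sample_with_snippets(search, seed_term, rounds=20, rng=None):
    """Build a term-frequency model of an engine from its result snippets only."""
    rng = rng or random.Random(0)
    model = Counter()
    seen = set()
    term = seed_term
    for _ in range(rounds):
        for snippet in search(term):
            if snippet not in seen:  # count each snippet once
                seen.add(snippet)
                model.update(re.findall(r"[a-z]+", snippet.lower()))
        if not model:
            break  # the seed term returned nothing to learn from
        term = rng.choice(sorted(model))  # next query: a random learned term
    return model


model = sample_with_snippets(toy_search, "search")
print(model.most_common(5))
```

Because the snippets arrive with the search results anyway, the model is built without any additional document downloads.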

DO SAMPLES RESEMBLE THE FULL INDEX?

CAN WE DO BETTER THAN RANDOM?

DISCUSSION

1. Sampling snippets is as effective as sampling full documents

2. Can be done at no extra cost(!)

3. Random sampling is an effective strategy

OVERVIEW

DEEP WEB QUERYING

Opportunity: while we are sending queries to search engines directly...

… we might as well search the deep web!

YOUR TYPICAL DEEP WEB SITE

http://www.ns.nl

NATURAL LANGUAGE QUERYING

EASY TO SPECIFY

USER STUDY

A = from

B = to

V = via

D = date

T = time
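As a rough illustration of turning a free-text query into the fields of a travel-planner form, the sketch below splits the query on a few keywords and files the words that follow under the matching field letter from the legend above. The keyword list and the function `parse_query` are assumptions made for this example; they are not the query description format used in the study.

```python
# Keyword -> form field letter (A = from, B = to, V = via, D = date, T = time).
# The choice of English keywords is an assumption for illustration.
FIELD_KEYWORDS = {"from": "A", "to": "B", "via": "V", "on": "D", "at": "T"}


def parse_query(text):
    """Assign the words after each keyword to that keyword's form field."""
    fields = {}
    current = None
    for token in text.split():
        key = FIELD_KEYWORDS.get(token.lower())
        if key is not None:
            current = key  # a keyword starts a new field
            fields[current] = []
        elif current is not None:
            fields[current].append(token)
    return {k: " ".join(v) for k, v in fields.items() if v}


print(parse_query("from Enschede to Amsterdam via Utrecht on 27-04-2011 at 10:30"))
print(parse_query("to Amsterdam from Enschede"))
```

The field order in the query does not matter, which accommodates some variation between users. A place name that contains one of the keywords would be split incorrectly by this simple approach.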

DISCUSSION

1. Users like the interface

2. Users perform the tasks faster

3. Considerable query variation between subjects: no “one size fits all”!

CONCLUSIONS

Peer-to-peer is a viable approach to large-scale search

Peer-to-peer search will make Google, Yahoo, Bing and Yandex irrelevant ;)

PUBLICATIONS

Almer Tigelaar, Djoerd Hiemstra, and Dolf Trieschnigg, Search Result Caching in P2P Information Retrieval Networks, Proceedings of the 2nd Information Retrieval Facility Conference (IRFC), 2011.

Almer Tigelaar and Djoerd Hiemstra, Query-Based Sampling using Snippets, Proceedings of the SIGIR 2010 Workshop on Large-Scale Distributed Systems for Information Retrieval, 2010.

Kien Tjin-Kam-Jet, Dolf Trieschnigg, and Djoerd Hiemstra, Free-Text Search versus Complex Web Forms, Proceedings of the European Conference on Information Retrieval (ECIR), 2011.

ACKNOWLEDGEMENTS

Netherlands Organization for Scientific Research

Almer Tigelaar

Kien Tjin-Kam-Jet

Dolf Trieschnigg

“MAIL” RESULTS FROM YANDEX ?
