Network software system laboratory

Rana Shahout & Ibrahim Baransi

Supervisor: Edward Bortnikov

Winter 2011

Real-Time Search Engine

Post on 19-Dec-2015



Agenda

• The problem & motivation
• Background in search systems
• The architecture
• CIP policies
• Software design

What?

What is the project goal?

Serving fresh search results when the data is constantly changing

Websites such as Twitter, Facebook, and news sites now change at a high frequency.

 

Background in search systems

Search caches

Why is that a problem?

Search engines use caching to answer queries faster and more efficiently. When the data is dynamic, some of the cached information becomes out of date.

A search engine looks up each query in the cache first, and searches the index only on a cache miss.

Thus, when the data has changed but the old result for a query is still in the cache, the search engine returns an INCORRECT (stale) result.
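
The staleness problem described above can be reproduced with a toy engine. This is only a minimal sketch: the class `StaleCacheDemo`, the in-memory list standing in for the index, the substring matching, and the document "Obama visits Cairo" are all hypothetical and are not taken from the project.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy search engine: results are cached per query and never invalidated. */
final class StaleCacheDemo {
    // Hypothetical stand-in for the real index: a plain list of document texts.
    private final List<String> index = new ArrayList<>();
    // Query string -> results computed the first time the query was asked.
    private final Map<String, List<String>> cache = new HashMap<>();

    void addDocument(String doc) {
        index.add(doc); // the cache is deliberately NOT touched here
    }

    List<String> search(String query) {
        List<String> cached = cache.get(query);
        if (cached != null) {
            return cached; // cache hit: the index is not consulted
        }
        // Cache miss: scan the index (naive substring match) and remember the result.
        List<String> results = new ArrayList<>();
        String needle = query.toLowerCase(Locale.ROOT);
        for (String doc : index) {
            if (doc.toLowerCase(Locale.ROOT).contains(needle)) {
                results.add(doc);
            }
        }
        cache.put(query, results);
        return results;
    }

    public static void main(String[] args) {
        StaleCacheDemo engine = new StaleCacheDemo();
        engine.addDocument("Obama visits Cairo");
        System.out.println(engine.search("obama").size()); // prints 1 (miss, computed from the index)
        engine.addDocument("President Barak Obama meets Mubarak in London");
        System.out.println(engine.search("obama").size()); // prints 1 again: the cached result is stale
    }
}
```

The second search should have found two documents. Invalidating the cached entry for "obama" when the new document arrives would force a fresh lookup in the index.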

General picture

Why?

The Architecture

Data structures required for implementation

Index: Lucene index directory. Lucene is a free text-indexing and text-searching API written in Java. A typical Lucene index is stored in a single directory in the file system on a hard disk.

Cache:

It was implemented as a linked list combined with a hash table.

The replacement policy is LRU (least recently used).
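
One way to get a hash table plus linked list with LRU replacement in Java is `java.util.LinkedHashMap` in access order. The sketch below is an illustration under that assumption; the slides do not say which classes the project actually used, and the name `LruCacheSketch` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache: a hash table whose entries are also kept in a linked list. */
final class LruCacheSketch<K, V> {
    private final LinkedHashMap<K, V> map;

    LruCacheSketch(final int capacity) {
        // accessOrder = true: the internal list runs from least to most recently used.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after every put; evict the least recently used entry when over capacity.
                return size() > capacity;
            }
        };
    }

    V get(K key) { return map.get(key); }                   // counts as a use of the entry
    void put(K key, V value) { map.put(key, value); }       // may evict the LRU entry
    void invalidate(K key) { map.remove(key); }             // explicit removal, e.g. by an invalidator
    boolean contains(K key) { return map.containsKey(key); } // does not change the usage order
    int size() { return map.size(); }
}
```

With capacity 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "b" is the least recently used entry at that point.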

CIP: Cache Invalidation Predictors

The CIP is formed of two major parts:

The synopsis generator is responsible for preparing synopses of the new documents coming in.

The invalidator interacts with the runtime system and decides which cached entries to invalidate, according to one of two policies.
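
The slides do not define what a synopsis contains. As a rough sketch only, assume it is the set of distinct lower-cased terms of the document; the class `SynopsisSketch` and the method `synopsisOf` are hypothetical names introduced for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class SynopsisSketch {
    /** Assumption: a synopsis is the set of distinct lower-cased terms of the document. */
    static Set<String> synopsisOf(String documentText) {
        Set<String> terms = new LinkedHashSet<>();
        // Split on every run of characters that are neither letters nor digits.
        for (String token : documentText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [president, barak, obama, meets, mubarak, in, london]
        System.out.println(synopsisOf("President Barak Obama meets Mubarak in London"));
    }
}
```

A real synopsis could be smaller, for example only the most significant terms of the document. The term set is simply the easiest form for an invalidator to match against cached queries.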

Invalidation Policies

• Basic: invalidate every cached query that appears in the synopsis.

• Score: find all cached queries that are contained in the synopsis; for each of them compute score(q,d), where d is the added/updated document, and invalidate the top K by score.
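
The two policies can be sketched as follows. Several things here are assumptions rather than the project's code: a query is taken to "appear in" the synopsis when all of its terms are in it, the scoring function is passed in as a parameter because the slides do not define score(q,d), and the names `InvalidationPolicies`, `matches`, `basic` and `score` are hypothetical. The demo uses the cached queries, the added document, and the score values from the slide illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleFunction;

final class InvalidationPolicies {

    /** Assumption: a query appears in the synopsis when every query term is in it. */
    static boolean matches(String query, Set<String> synopsis) {
        for (String term : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!synopsis.contains(term)) {
                return false;
            }
        }
        return true;
    }

    /** Basic policy: every cached query that matches the synopsis is invalidated. */
    static List<String> basic(List<String> cachedQueries, Set<String> synopsis) {
        List<String> toInvalidate = new ArrayList<>();
        for (String q : cachedQueries) {
            if (matches(q, synopsis)) {
                toInvalidate.add(q);
            }
        }
        return toInvalidate;
    }

    /** Score policy: among the matching queries, only the k highest-scoring are invalidated. */
    static List<String> score(List<String> cachedQueries, Set<String> synopsis,
                              ToDoubleFunction<String> scoreFn, int k) {
        List<String> candidates = basic(cachedQueries, synopsis);
        candidates.sort(Comparator.comparingDouble(scoreFn).reversed()); // highest score first
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        // Terms of the added document "President Barak Obama meets Mubarak in London".
        Set<String> synopsis = new HashSet<>(Arrays.asList(
                "president", "barak", "obama", "meets", "mubarak", "in", "london"));
        List<String> cached = Arrays.asList("President Mubarak", "Egypt Mubarak",
                "President Obama", "Barak Obama", "Facebook features", "Facebook account");
        // Score values copied from the slide; how they were computed is not stated there.
        Map<String, Double> slideScores = new HashMap<>();
        slideScores.put("President Obama", 0.56);
        slideScores.put("President Mubarak", 0.32);
        slideScores.put("Barak Obama", 0.001);

        System.out.println(basic(cached, synopsis));
        // prints [President Mubarak, President Obama, Barak Obama]
        System.out.println(score(cached, synopsis, q -> slideScores.getOrDefault(q, 0.0), 1));
        // prints [President Obama]
    }
}
```

Basic invalidates more entries and so favors freshness. Score with a small K keeps more of the cache and accepts that some of the remaining entries may be stale.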

Illustration

Cache:

Key        Value
Mubarak    President Mubarak, Egypt Mubarak
Obama      President Obama, Barak Obama
Facebook   Facebook features, Facebook account

Added Document: President Barak Obama meets Mubarak in London

Basic Invalidation


CIP will help here!


My work is done


Score Invalidation (K=1)

Added Document d: President Barak Obama meets Mubarak in London

Query               Score(q,d)
President Obama     0.56
President Mubarak   0.32
Barak Obama         0.001


Software Design – UML Diagrams

• Search query with a cache miss
• Adding a document to the index with basic invalidation

Skills

We acquired the following skills in this project:

• Knowledge: reading scientific publications
• Java (and advanced Java topics)
• Working with a web server (Apache)
• Learning Lucene's features and how to use them
• Building a software cache
• UML
• XML parsing
• HTML