Memex: A Browsing Assistant for Collaborative Archiving and Mining of Surf Trails Soumen Chakrabarti Sandeep Srivastava Mallela Subramanyam Mitul Tiwari Indian Institute of Technology Bombay

Memex: A Browsing Assistant for Collaborative Archiving and Mining of Surf Trails

Soumen Chakrabarti, Sandeep Srivastava, Mallela Subramanyam, Mitul Tiwari

Indian Institute of Technology Bombay

IITB 2000

Sources of Web information

Sources already exploited
• Text on pages (keyword search)
• Links between pages (popularity rating)
• Topic taxonomies (query expansion)

Sources not exploited enough yet
• Public surfing history
• Public bookmarks

Collaboration is central to hypertext
Lack of trust limits collaboration on the Web

Our goals

Infrastructure to support spontaneous formation of topic-based collaborative Web communities
• Browsing assistant client
• Community server

Mining algorithms for personal and community level topic management and collaborative resource discovery

Extensible API for plugging in additional hypertext analysis tools

1: Create a Memex account (password sent by email)
2: Install the Memex applet signing certificate and visit the applet page
3: Allow the Memex client to attach to your Web browser
4: Log on to the Memex server

Memex client applet attaches to browser

Privacy choice

Function tabs

Preparing to import initial bookmarks

Bookmarks imported

For Memex to suggest an initial topic organization, select all bookmarks…

…and send them to the clustering tab

Switch to the clustering tab

URLs to be clustered appear here

Submit the URLs to the server-side Memex clustering demon

Check later if the server has completed the clustering task

Two top-level clusters about software and music
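
The slides do not say which algorithm the server-side clustering demon runs. As a minimal sketch of the idea, assuming each bookmark is represented by a bag of words taken from its page text, a greedy single-pass grouping by cosine similarity is already enough to separate software pages from music pages. The function names, the sample URLs, the made-up page text and the 0.2 threshold are illustrative assumptions, not part of Memex.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn page text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_bookmarks(pages, threshold=0.2):
    """Greedy single-pass clustering (an assumed stand-in algorithm):
    put each URL into the most similar existing cluster, or start a
    new cluster when no cluster reaches the similarity threshold."""
    clusters = []  # each cluster: {"urls": [...], "centroid": Counter}
    for url, text in pages.items():
        vec = vectorize(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"urls": [url], "centroid": Counter(vec)})
        else:
            best["urls"].append(url)
            best["centroid"].update(vec)  # centroid = summed term counts
    return [cluster["urls"] for cluster in clusters]


# Hypothetical bookmarks with made-up page text.
pages = {
    "http://example.org/compilers": "software compiler source code build tools",
    "http://example.org/jazz": "music jazz albums concert recordings",
    "http://example.org/editors": "software editor source code plugins",
    "http://example.org/opera": "music opera concert tickets recordings",
}
groups = cluster_bookmarks(pages)
```

A real demon would more plausibly build a hierarchy and name the clusters; the flat grouping here only illustrates how unstructured bookmarks can fall into topical groups without user input.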

Expanding the software cluster to study it in more detail

User can freely reorganize URL placement using cut-and-paste

Moving an entire folder from the cluster tab…

…to the folder tab together with example URLs

Folder names can be edited as per taste; this also gives Memex additional clues about the folder’s contents

New folders can be created to hold clusters found in the cluster tab

A topic hierarchy which is too detailed for the user can be flattened

Groups of closely related URLs can be moved back to folders in the folder tab

Memex helps the user derive a starting topic hierarchy from unstructured bookmarks

The user then continues browsing in multiple sessions. Relevant pages found by other members of the community and made public are available for collaborative surfing

If permission is granted, the Memex applet monitors the trail that the surfer follows and uploads it to the server for further analysis and mining
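
One way to picture the permission-gated trail capture is a small recorder that buffers page visits only while the user's privacy choice allows archiving, then hands the batch to the server. This is a sketch under assumptions: the `Visit` and `TrailRecorder` names, their fields and the `upload` callback are hypothetical and stand in for whatever the Memex applet and its client-server protocol actually do.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Visit:
    url: str
    referrer: Optional[str]  # page the surfer came from, if known
    timestamp: float


@dataclass
class TrailRecorder:
    """Buffers page visits only while archiving is permitted (hypothetical design)."""
    upload: Callable[[List[Visit]], None]  # stand-in for the client-server call
    permitted: bool = False                # privacy choice, off by default
    buffer: List[Visit] = field(default_factory=list)

    def on_page_visit(self, url, referrer=None):
        if not self.permitted:
            return  # nothing is recorded without permission
        self.buffer.append(Visit(url, referrer, time.time()))

    def flush(self):
        """Send the buffered trail to the server and clear the buffer."""
        if self.buffer:
            self.upload(list(self.buffer))
            self.buffer.clear()


received = []  # plays the role of the server-side archive
recorder = TrailRecorder(upload=received.extend)
recorder.on_page_visit("http://example.org/private")  # ignored: no permission yet
recorder.permitted = True
recorder.on_page_visit("http://example.org/a")
recorder.on_page_visit("http://example.org/b", referrer="http://example.org/a")
recorder.flush()
```

Keeping the referrer with each visit preserves the trail structure (which page led to which), which is what distinguishes a surf trail from a flat history list.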

Such surf trails together with page contents are valuable inputs to the Memex server-side hypertext mining and resource discovery demons

In the background, the Memex classifier finds the most suitable folder to assign to each history item. History is never deleted (disk is cheap). When the user refreshes the view, surf history from others and from herself is found categorized into the user’s familiar topic tree.

‘?’ indicates that Memex is not sure about the folder assignment. Users can easily correct mistakes and this forms additional valuable training data.
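
The slides do not name the classifier. A minimal sketch of the behavior described, assuming a multinomial naive Bayes model over page words with one class per folder, could flag an assignment as unsure (the ‘?’ marker) when the top folder's posterior falls below a threshold, and treat every user correction as one more training example. The class name, the 0.75 threshold and the sample texts are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class FolderClassifier:
    """Multinomial naive Bayes over page words, one class per folder
    (an assumed model, not necessarily the one Memex uses)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.doc_counts = Counter()              # folder -> number of training pages
        self.vocab = set()

    def train(self, folder, text):
        words = text.lower().split()
        self.word_counts[folder].update(words)
        self.doc_counts[folder] += 1
        self.vocab.update(words)

    def classify(self, text, min_confidence=0.75):
        """Return (folder, unsure); unsure=True corresponds to the '?' marker."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for folder, n_docs in self.doc_counts.items():
            counts = self.word_counts[folder]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            log_scores[folder] = score
        # Normalize log scores into posterior probabilities.
        top = max(log_scores.values())
        probs = {f: math.exp(s - top) for f, s in log_scores.items()}
        z = sum(probs.values())
        best = max(probs, key=probs.get)
        return best, probs[best] / z < min_confidence


clf = FolderClassifier()
clf.train("Software", "compiler source code build tools")
clf.train("Music", "jazz albums concert recordings")

confident = clf.classify("open source compiler tools")
ambiguous = clf.classify("concert tools")   # words from both folders
clf.train("Music", "concert tools")         # user correction becomes training data
corrected = clf.classify("concert tools")
```

In this toy run the ambiguous page is first assigned to a folder with the unsure flag set; after the single correction the same page is classified without the flag, which is the feedback loop the slide describes.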

Automatic collaborative classification also lets users return to a topic-restricted surfing context quickly, and replay the last few surfing actions within that topic of interest.

Personalized topic-based history management is far superior to the one-dimensional history list provided by popular browsers

Users can switch topics with a single click, and browsing is not limited by the linear “back and forward” paradigm supported by browsers.

A flexible interactive search lets the user locate any page ever visited from anywhere using this account, combining content with popularity, site selections and timeliness
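
The slide names the signals that the search combines but not how they are combined. Purely as an illustration, the sketch below assumes a weighted sum of a content-match score, a saturating popularity score and a recency score, with the site selection acting as a filter. The `ArchivedPage` fields, the weights and the formulas are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ArchivedPage:
    url: str
    text: str
    visit_count: int         # popularity: visits recorded for this page
    days_since_visit: float  # timeliness: age of the latest visit


def score(page, query, site=None, w_content=1.0, w_popularity=0.5, w_recency=0.5):
    """Combine content match, popularity and timeliness into one score.
    Returns None when the page is filtered out."""
    if site is not None and site not in page.url:
        return None  # outside the selected site
    terms = query.lower().split()
    words = page.text.lower().split()
    content = sum(words.count(t) for t in terms) / max(len(words), 1)
    if content == 0:
        return None  # no query term occurs on the page
    popularity = page.visit_count / (1 + page.visit_count)  # saturates toward 1
    recency = 1 / (1 + page.days_since_visit)               # 1 for today, decays with age
    return w_content * content + w_popularity * popularity + w_recency * recency


def search(pages, query, site=None):
    """Return matching URLs, best score first."""
    scored = [(score(p, query, site), p.url) for p in pages]
    ranked = sorted((x for x in scored if x[0] is not None), reverse=True)
    return [url for _, url in ranked]


# Hypothetical archive contents.
pages = [
    ArchivedPage("http://example.org/old", "memex surf trails", visit_count=1, days_since_visit=30),
    ArchivedPage("http://example.org/new", "memex surf trails", visit_count=9, days_since_visit=0),
    ArchivedPage("http://example.com/other", "cooking recipes", visit_count=50, days_since_visit=0),
]
results = search(pages, "memex")
site_results = search(pages, "memex", site="example.com")
```

With identical text, the more popular and more recently visited page ranks first; restricting the query to a site where no page matches returns an empty list.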

Close integration of the Memex client with the browser is non-trivial to implement but adds greatly to comfort and ease of use

Memex system diagram

[Figure: system diagram. Labels recovered from the slide:
• Browser; Client JAR; Running client applet (arrows: Visit, Download, Attach)
• Memex server; Event-handler servlets: Search, Folder, Context, Archive
• Memex client-server protocol and workload sharing negotiations
• Relational metadata; Text index; Topic models
• Mining demons: Taxonomy synthesis, Resource discovery, Recommendation, Classification, Clustering]

Document workflow

[Figure: document workflow diagram. Labels recovered from the slide:
• Browser; Memex client; Page visit and bookmarking events logged
• NODE table; Per-document version queue; Demon Registry
• Crawler (Push new version); Garbage collector (Pop and discard old version)
• Search indexer; Classifier service; Clustering service]
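
The diagram labels suggest that new page versions are pushed onto a per-document queue and that a garbage collector pops old versions once they are no longer needed. The slide does not spell out the rule, so the sketch below is one plausible reading, not the documented design: a version may be discarded only after every demon in the registry has processed it, and the newest version is always kept. All class and method names are hypothetical.

```python
from collections import deque


class VersionQueue:
    """Per-document queue of page versions (hypothetical reading of the diagram)."""

    def __init__(self, registry):
        self.registry = set(registry)  # names of registered demons
        self.versions = deque()        # oldest version on the left
        self.done = {}                 # version id -> demons that processed it
        self.next_id = 0

    def push(self, content):
        """Called when a new version of the document is fetched."""
        vid = self.next_id
        self.next_id += 1
        self.versions.append((vid, content))
        self.done[vid] = set()
        return vid

    def mark_processed(self, demon, vid):
        if demon not in self.registry:
            raise KeyError(f"unregistered demon: {demon}")
        self.done[vid].add(demon)

    def collect_garbage(self):
        """Pop and discard old versions that every registered demon has
        processed, always keeping the newest version."""
        discarded = []
        while len(self.versions) > 1:
            vid, _ = self.versions[0]
            if self.done[vid] != self.registry:
                break  # some demon still needs this version
            self.versions.popleft()
            del self.done[vid]
            discarded.append(vid)
        return discarded


queue = VersionQueue(["indexer", "classifier", "clustering"])
v0 = queue.push("first crawl")
v1 = queue.push("second crawl")
for demon in ("indexer", "classifier"):
    queue.mark_processed(demon, v0)
early = queue.collect_garbage()           # clustering has not seen v0 yet
queue.mark_processed("clustering", v0)
late = queue.collect_garbage()            # now v0 can be discarded
```

Decoupling the demons through a queue like this would let the indexer, classifier and clustering services run at different speeds without losing a version that a slower service still has to read.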

Autonomous topic organization

Bookmarks often collected into topics
Surfers use personal topic organization
One-size-fits-all taxonomy inadequate
• Many topics over-developed for most of us
• http://dmoz.org/Sports/Hockey/Underwater_Hockey/
• But deeper interests often underdeveloped
• Structure reorganization also desirable

Best taxonomy depends on community behavior as well as page content

Autonomy and collaboration

Personalization picking Yahoo nodes
Complex relations between topics
Need “simplest common ground”
• Coalesce similar topics where possible…
• …without sacrificing individual taste

[Figure: topic trees for Yahoo, User1, User2 and User3, illustrating subsumption and tree ‘inversion’. Node labels: Sports, Hiking, Cycling, Biz, Shops, Bikeshops.]

Taxonomy synthesis example

Generating themes makes map simpler
But distorts contents of original folders
Joint optimization gives best themes

[Figure: folders Entertainment, Studios, Broadcasting and Media, holding kpfa.org, bbc.co.uk, kron.com, channel4.com, kcbs.com, foxmovies.com, miramax.com and lucasfilms.com. Edge labels: Share document, Share folder, Share terms. Themes: ‘Radio’, ‘Television’, ‘Movies’.]

Summary and project status

Collaborative resource discovery and topic management system
Testbed for hypertext mining research
Signed Java2 client
• Netscape 4.5+ available
• IE5+ planned
Server for Unix and Windows
• IBM UDB, Berkeley DB, servlets
• Non-trivial to install and manage
• Simple-to-use RPMs being planned

http://www.cse.iitb.ernet.in/~soumen