U-P2P A Peer-to-peer System for Description and Discovery of Resource-sharing Communities Aloke Mukherjee, Carleton University August 28, 2003


Page 1

U-P2P

A Peer-to-peer System for Description and Discovery of Resource-sharing Communities

Aloke Mukherjee, Carleton University

August 28, 2003

Page 2

Peer-to-peer File-sharing

Exploit storage capability of the edge

Balance load

Robustness to failure

Weaknesses: Search and Communities

Page 3

Search Problem

Lack of structured metadata
  Filenames, keyword matching
  Opaque identifiers
  Support for popular formats

Ignoring structured metadata
  Implicit indicators
  Collaborative filtering

Page 4

State of the Art: Search

Metadata: Napster, Kazaa, Limewire, JxtaSearch

Query Routing: Gnutella, Routing Indices, Limewire, Neurogrid

Communities: JxtaSearch, Alpine, Associative P2P

Search in DHTs: PIER, FASD, Inverted Indices

Page 5

Community Problem

Not simple to create a community for sharing a new file format

Current state
  Different protocols/apps (gnutella, fasttrack, jxtasearch)
  Inadequate metadata (filename matching, limited schemas)
  Ad-hoc attempts aimed at specific domains

Scattered and isolated: there is no easy way to discover communities

Page 6

State of the Art: Communities

Opaque: No existing rich metadata search, no way to add it

Limited: Rich metadata search for some formats but no way to support new formats

Implicit: Implicit indicators are used to identify communities, no way to specify explicitly

Partial: Users can explicitly form groups but each grouping is in the eye of the beholder

Unshared: Users can explicitly direct rich metadata queries to a community, but response format is not specified

Page 7

Improving Search

Standard metadata layer

Explicit structured metadata

All resources are XML files

XML Schema used to describe format (e.g. MP3, design pattern)

Page 8

Schema instantiates resource

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="designpattern">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="author" type="xsd:string"/>
        <xsd:element name="context" type="xsd:string"/>
        <xsd:element name="problem" type="xsd:string"/>
        <xsd:element name="design" type="xsd:string"/>
        <xsd:element name="diagram" type="xsd:anyURI"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<designpattern>
  <name>singleton</name>
  <author>gang of four</author>
  <context>when creating a new class…</context>
  <problem>ensure a class only has…</problem>
  <design>make the class itself responsible…</design>
  <diagram>http://example.com/singleton.jpg</diagram>
</designpattern>

Page 9

Automated interface generation

[Diagram: the schema instantiates the resource XML. XSLT transforms generate the resource create form and the resource search form from the schema, and the resource view from the resource.]

Page 10

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="stamps">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="description" type="xsd:string"/>
        <xsd:element name="country" type="xsd:string"/>
        <xsd:element name="dateOfIssue" type="xsd:date"/>
        <xsd:element minOccurs="0" name="lastDayOfSale" type="xsd:date"/>
        <xsd:element minOccurs="0" name="denomination" type="xsd:string"/>
        . . .

XSL Transform

[Diagram: repeat of the interface-generation diagram (schema, resource XML, resource create form, resource search form, resource view, connected by XSL transforms), shown next to the example schema.]
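The step from schema to create form can be sketched outside XSLT as well. The following is a minimal Python illustration, not U-P2P's stylesheet (U-P2P performs this step with XSL transforms). The function name `schema_to_form`, the shortened schema, and the HTML layout are assumptions made for the example. It reads the element declarations of a schema like the stamps schema above and emits one input per declared field.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Shortened version of the stamps schema, closed so that it parses.
STAMPS_SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="stamps">
    <xsd:complexType>
      <xsd:all>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="country" type="xsd:string"/>
        <xsd:element name="dateOfIssue" type="xsd:date"/>
        <xsd:element minOccurs="0" name="lastDayOfSale" type="xsd:date"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def schema_to_form(schema_xml: str) -> str:
    """Build a plain HTML create form from the schema's element declarations."""
    root = ET.fromstring(schema_xml)
    top = root.find(f"{XSD}element")  # declaration of the resource's root element
    lines = [f'<form name="{top.get("name")}">']
    for field in top.iter(f"{XSD}element"):
        if field is top:
            continue  # iter() yields the starting element first
        name = field.get("name")
        # Assumed mapping: xsd:date becomes a date input, everything else text.
        kind = "date" if field.get("type") == "xsd:date" else "text"
        required = "" if field.get("minOccurs") == "0" else " required"
        lines.append(
            f'  <label>{name} <input type="{kind}" name="{name}"{required}></label>'
        )
    lines.append("</form>")
    return "\n".join(lines)


form = schema_to_form(STAMPS_SCHEMA)
print(form)
```

Optional fields (`minOccurs="0"`) lose the `required` attribute, which is one way the schema's constraints can carry over into the generated interface.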

Page 11

<?xml version="1.0" encoding="UTF-8"?>
<stamps title="2002 Olympic Winter Games">
  <name>2002 Olympic Winter Games</name>
  <description>To celebrate the spirit of the 2002 Winter Games that took place February 8-24, 2002 in Salt Lake City, Canada Post issued four stamps featuring some of the most exciting events of the games.</description>
  <country>Canada</country>
  <dateOfIssue>2002-01-25</dateOfIssue>
  <lastDayOfSale>2003-01-24</lastDayOfSale>
  <denomination>4 x 48&#xA2;</denomination>
  <design>Bhandari and Plater Inc.</design>
  <dimensions>30 mm x 40 mm (vertical)</dimensions>
  <gumType>P.V.A.</gumType>
  <paperType>Tullis Russell Coatings</paperType>
  . . .

XSL Transform

[Diagram: repeat of the interface-generation diagram (schema, resource XML, resource create form, resource search form, resource view, connected by XSL transforms), shown next to the example resource.]
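A structured-metadata search over resources like the stamp above can be written as an XPath query. The sketch below uses the limited XPath subset in Python's standard library; U-P2P itself relies on Java XPath components and the eXist database. The `resources` list, the second (invented) resource, and the `search` helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Two toy resources; the second one is invented purely for contrast.
resources = [
    "<stamps><name>2002 Olympic Winter Games</name><country>Canada</country></stamps>",
    "<stamps><name>Example stamp</name><country>Elsewhere</country></stamps>",
]


def search(docs, xpath):
    """Return the <name> of every resource matched by the XPath expression."""
    hits = []
    for doc in docs:
        # Wrap each resource so the query can address its root as a child.
        wrapper = ET.Element("collection")
        wrapper.append(ET.fromstring(doc))
        for match in wrapper.findall(xpath):
            hits.append(match.findtext("name"))
    return hits


results = search(resources, "./stamps[country='Canada']")
print(results)
```

Because the query names a field (`country`) rather than matching a filename, it is an example of the explicit metadata search that filename-based systems cannot express.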

Page 12

Community Creation and Discovery: What is a Community?

Concrete object with defined tuple of attributes

Simplest form: (format, protocol, …)

Known examples: (mp3, napster), (video, kazaa)

Examples that don’t exist: (design patterns, gnutella), (p2p papers, jxtasearch)

Tuple is specified as an XML file

Page 13

Simplifying Community Creation

<community>
  <name>designpatterns</name>
  <schema>designpattern.xsd</schema>
  <protocol>gnutella</protocol>
  <display>designpattern.stylesheet</display>
</community>

User-designed communities
  Compose schema to describe format
  Compose community XML file
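To make the "community as tuple" idea concrete, here is a minimal sketch that loads a community XML file like the one above into a named tuple. The `Community` type and the `load_community` function are hypothetical names chosen for this example and are not part of U-P2P.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Community(NamedTuple):
    name: str
    schema: str
    protocol: str
    display: str


COMMUNITY_XML = """
<community>
  <name>designpatterns</name>
  <schema>designpattern.xsd</schema>
  <protocol>gnutella</protocol>
  <display>designpattern.stylesheet</display>
</community>
"""


def load_community(xml_text: str) -> Community:
    """Parse a community descriptor and check that every attribute is present."""
    root = ET.fromstring(xml_text)
    if root.tag != "community":
        raise ValueError(f"expected <community>, got <{root.tag}>")
    values = {}
    for field in Community._fields:
        text = root.findtext(field)
        if text is None:
            raise ValueError(f"missing <{field}>")
        values[field] = text.strip()
    return Community(**values)


community = load_community(COMMUNITY_XML)
print(community)
```

Creating a new community then amounts to writing two files: the schema that describes the format and a descriptor like this one that points to it.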

Page 14

Community as class

[Diagram: an mp3 file belongs to the mp3 community in the same way that an mp3 object is an instance of the mp3 class.]

Page 15

Metaclass analogy

[Diagram: the previous picture extended one level up. The mp3 community is itself a member of the "community" community, just as the mp3 class is an instance of the "class" class.]
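The analogy can be tried out directly in a language with metaclasses. This is only an illustration of the analogy, not of how U-P2P is implemented; the names `CommunityMeta` and `MP3` are invented for the example.

```python
class CommunityMeta(type):
    """Plays the role of the 'community' community: its instances are communities."""

    known = []

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        CommunityMeta.known.append(cls)  # defining a community also lists it


class MP3(metaclass=CommunityMeta):
    """Plays the role of the mp3 community: its instances are mp3 resources."""

    def __init__(self, title):
        self.title = title


song = MP3("example track")

print(isinstance(song, MP3))           # a resource is an instance of its community
print(isinstance(MP3, CommunityMeta))  # a community is an instance of the metaclass
print([c.__name__ for c in CommunityMeta.known])
```

The list kept by the metaclass corresponds to the idea developed on the following slides: the place where communities are defined is also the place where they can be discovered.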

Page 16

Community discovery is file discovery
  MP3 community shares MP3 files
  Community community shares communities

[Diagram: mp3 files shared in the mp3 community; communities shared in the community community.]

Page 17

Simplifying Community Discovery

A Community for Communities: The Root Community

Communities are files shared in a real community

Root Community includes schema for communities

(format, protocol) = (community, centralized db)
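A small sketch of the two-step lookup this implies: a community is first found by searching the root community, and resources are then searched inside the community that was found. The dictionary layout, the `search` helper, and the sample entries (including `mp3.xsd`) are assumptions for illustration.

```python
# Each community maps to the resources shared in it.
# The root community's resources are themselves community descriptors.
shared = {
    "root": [
        {"name": "mp3", "schema": "mp3.xsd", "protocol": "napster"},
        {"name": "designpatterns", "schema": "designpattern.xsd", "protocol": "gnutella"},
    ],
    "designpatterns": [
        {"name": "singleton", "author": "gang of four"},
    ],
}


def search(community, **criteria):
    """Return the resources in `community` whose fields equal every criterion."""
    return [
        resource
        for resource in shared.get(community, [])
        if all(resource.get(key) == value for key, value in criteria.items())
    ]


# Step 1: discover a community the same way a file is discovered.
found = search("root", name="designpatterns")
# Step 2: search inside the discovered community.
patterns = search(found[0]["name"], author="gang of four")
print(found[0]["protocol"], [p["name"] for p in patterns])
```

The same `search` function serves both steps, which is the point of treating communities as files shared in a community of their own.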

Page 18

Schema for Communities

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="community">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="schema" type="xsd:anyURI"/>
        <xsd:element name="protocol" type="protocolTypes"/>
        <xsd:element name="display" type="xsd:anyURI"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

The Root Community

<community>
  <name>root community</name>
  <schema>community.xsd</schema>
  <protocol>central-db</protocol>
  <display>community.stylesheet</display>
</community>

Page 19

What is U-P2P?

A framework that breathes life into these ideas

Explicit metadata search and creation for every Community

Creation of Community tuples (format, protocol etc…)

Discovery of Community tuples

Page 20

Design

[Diagram: several peers connected through a shared network layer. Each peer is a stack of User, WebAdapter, NetworkAdapter, and Repository.]

Page 21

Technologies

Java

Tomcat Servlet Container

Java Server Pages (JSP) + Servlets

XSLT (transforms), XPath (queries)

Java components for XSLT, XPath (Xerces, Xalan)

eXist XML Database

Log4j (logging infrastructure), JUnit (unit testing)

Page 22

Evaluation and Validation: Areas of Interest

Publish and Search times as Community size increases

Breaking down Publish and Search operations

Community effect

Multiple central servers

Page 23

Publish

[Figure: Time to publish a file. x-axis: Number of Files (1 to 991); y-axis: Milliseconds (0 to 2500). Series: Publish time; Linear trend, y = 0.4473x + 260.84.]
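The trend line can be read as a simple cost model: a fixed cost of roughly 261 ms per publish plus about 0.45 ms for every file already in the community. A short worked example follows; the function name is hypothetical, and averaging over 1000 files is an assumption based on the x-axis range.

```python
def publish_time_ms(n_files: float) -> float:
    """Trend line from the slide: y = 0.4473x + 260.84 (x = number of files)."""
    return 0.4473 * n_files + 260.84


for n in (1, 500, 991):
    print(n, round(publish_time_ms(n), 1))

# Mean of the trend over the 1st to the 1000th file (assumed run length).
mean_ms = sum(publish_time_ms(n) for n in range(1, 1001)) / 1000
print(round(mean_ms, 1))
```

Under that assumption the mean of the trend comes out near 485 ms, which agrees with the single-community average publish time reported on the Community Effect slide.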

Page 24

Search

[Figure: Search time vs. Number of Files. x-axis: 100s of files (1 to 15); y-axis: Milliseconds (0 to 2000). Series: Search time.]

Page 25

Community Effect

[Figure: Time to Publish With Communities Present. x-axis: Number of Files (1 to 751); y-axis: Milliseconds (0 to 1400). Series: Time to add files (250 file groups).]

[Figure: Time to publish a file. x-axis: Number of Files (1 to 991); y-axis: Milliseconds (0 to 2500). Series: Publish time; Linear trend, y = 0.4473x + 260.84.]

Average Publish Time
  Multiple communities: 356 ms
  Single community: 485 ms

Page 26

Multiple Central Servers

[Diagram: Single Server Deployment. One Central Server hosts both the Root community and Community A. The Client retrieves Community A info and then searches/publishes in Community A on that same server.]

[Diagram: Multiple Server Deployment. The Client retrieves Community A info from Central Server (Root), then searches/publishes in Community A on Central Server (A).]

Page 27

Publish with Multiple Servers

Server  Processor   Speed    OS
1       Pentium 4   1.8 GHz  Windows 2000
2       Pentium II  250 MHz  Linux (RH7)
3       Celeron     1 GHz    Windows XP

[Figure: Time to Publish Files with Three Servers. x-axis: Number of files (250 / server), 1 to 501; y-axis: Milliseconds (0 to 3500). Series: Time to Publish (avg: 517.732 ms), divided into Server 1, Server 2, and Server 3.]

Page 28

Vs. Without Multiple Central Servers

Server                      Avg. time to publish a file (750 files published)
S1                          455 ms
S2                          1355 ms
S3                          645 ms
S1, S2, S3 (load-balanced)  517 ms
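The table is easier to interpret once each single-server figure is expressed relative to the load-balanced one. The arithmetic below uses only the four numbers from the table; the variable names are illustrative.

```python
single_server_ms = {"S1": 455, "S2": 1355, "S3": 645}  # from the table
load_balanced_ms = 517                                 # from the table

for server, ms in single_server_ms.items():
    ratio = ms / load_balanced_ms
    print(f"{server}: {ms} ms, {ratio:.2f}x the load-balanced average")

best = min(single_server_ms, key=single_server_ms.get)
mean_single = sum(single_server_ms.values()) / len(single_server_ms)
print(best, round(mean_single, 1))
```

The load-balanced deployment (517 ms) is slower than the fastest machine on its own (S1, 455 ms) but far better than the slowest (S2, 1355 ms) and well below the mean of the three single-server averages (about 818 ms).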

Page 29

Contributions

Standard Metadata Layer
  All communities include support for explicit metadata search and creation

User-designed Communities
  Users can easily share new formats with full support for metadata

Community for Communities
  Prevents fragmented, isolated communities by providing metadata about communities and a standard method for discovering them

Performance and Scalability Gains
  Communities can improve performance and scalability vs. systems where resources are undifferentiated

Page 30

Future Work

Performance improvements

Protocol independence (adapters for Gnutella, Freenet, etc.)

Community-aware Gnutella routing

More Community parameters (security, authentication, etc.)

Page 31

Future Work continued

Trust metrics (to differentiate between communities, metadata quality)

Community evolution

Inheritance and multiple inheritance for Communities

Page 32

U-P2P Publications

A. Mukherjee, B. Esfandiari, N. Arthorne, "U-P2P: A Peer-to-peer System for Description and Discovery of Resource-sharing Communities", ICDCS Workshops 2002: 701-705, July 2002.

N. Arthorne, B. Esfandiari, A. Mukherjee, "U-P2P: A Peer-to-peer Framework for Universal Resource Sharing and Discovery", Proceedings of Freenix track of Usenix 2003, 29-38, June 2003.

http://u-p2p.sourceforge.net

Page 33

Backup slides

Page 34

WebAdapter: User Interaction Model

[Diagram: Standard user interaction model for peer-to-peer applications. The User works with an Application/UI that talks to the network layer.]

[Diagram: User interaction model for U-P2P. The User works with a Web Browser, which talks to a Web service that sits on the network layer.]

Page 35

Repository Design

[Diagram: repository contents shown as a Community Resource, a Resource Collection, and two Attachments, labelled Id: 1, Id: 2, Id: 3.]

Page 36

Repository Design: Resource IDs

Traditional model

[Diagram: file system tree with Root (/) and directories /genes, /molecules, /music, and /music/rock holding file1 through file12.]

User designates /music as the root directory beneath which all files and directories are shared.

U-P2P model

[Diagram: the same file system tree, with a table of Resource IDs pointing into it.]

Resource IDs act as indirect references to files anywhere in the file system.

Resource IDs
  4d5e…f7  /molecules/file1
  82db…0a  /genes/file5
  9e40…f9  /music/rock/file10
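The indirection can be sketched as a lookup table from opaque IDs to paths. The slides show only abbreviated, hex-like IDs and do not say how they are derived, so hashing the file content with SHA-256 is purely an assumption here, as are the `ResourceIndex` class and the sample file contents.

```python
import hashlib


class ResourceIndex:
    """Maps opaque resource IDs to file paths anywhere in the file system."""

    def __init__(self):
        self._paths = {}

    def add(self, path: str, content: bytes) -> str:
        # Assumption: the ID is a hex digest of the content. The slides do not
        # specify the scheme actually used to generate resource IDs.
        resource_id = hashlib.sha256(content).hexdigest()
        self._paths[resource_id] = path
        return resource_id

    def resolve(self, resource_id: str) -> str:
        return self._paths[resource_id]


index = ResourceIndex()
rid = index.add("/molecules/file1", b"example molecule data")
other = index.add("/music/rock/file10", b"example audio data")
print(rid[:8], "->", index.resolve(rid))
```

Because peers exchange IDs rather than paths, shared files do not have to live under a single shared directory as in the traditional model.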

Page 37

Repository Design: XML Database

Requirements
  Flexibility to store wide variety of formats
  Handle powerful queries over all metadata

XML Database better suited than RDBMS
  Difficult to map fields to rows and columns

Chose eXist XML database
  Open source
  Written in Java
  Support for XML:DB API

Page 38

Network Adapter Design

Abstract interface to peer-to-peer network
  Routing search requests, handling results, handling incoming search requests, etc.

Only implemented Hybrid model (Napster model)

All peers can act as client and/or server
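An abstract adapter of this kind might look like the sketch below. U-P2P is written in Java and its real interface is not shown on the slides, so the class names, method names, and the substring-matching query are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class NetworkAdapter(ABC):
    """Protocol-independent view of the peer-to-peer network (names assumed)."""

    @abstractmethod
    def publish(self, community, resource_id, metadata):
        """Make a resource discoverable in the given community."""

    @abstractmethod
    def search(self, community, query):
        """Return the IDs of matching resources in the given community."""


class HybridAdapter(NetworkAdapter):
    """Napster-style adapter: every request goes to one central index."""

    def __init__(self, central_index):
        self.central_index = central_index  # dict shared by all peers in this sketch

    def publish(self, community, resource_id, metadata):
        self.central_index.setdefault(community, {})[resource_id] = metadata

    def search(self, community, query):
        entries = self.central_index.get(community, {})
        return sorted(rid for rid, meta in entries.items() if query in meta)


index = {}
peer_a, peer_b = HybridAdapter(index), HybridAdapter(index)
peer_a.publish("designpatterns", "res-1", "singleton gang of four")
print(peer_b.search("designpatterns", "singleton"))
```

An adapter for another protocol would subclass `NetworkAdapter` in the same way, leaving the rest of the peer unchanged.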

Page 39

Network Adapter: Protocol

[Diagram: register exchange between Peer and Central Server]

1. RegisterRequest( community, resource id )
2. RegisterResponse( is resource known? )
3. RegisterRequest( community, resource id, metadata )

[Diagram: search exchange between Peer and Central Server]

1. SearchRequest( community, query )
2. SearchResponse( results )
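The register exchange can be simulated in memory. The sketch assumes that the third message, which carries the metadata, is sent only when the server answers that the resource is not yet known; the slide does not state this explicitly. Class names, the peer names, the resource ID `res-1`, and the substring query are likewise assumptions.

```python
class CentralServer:
    def __init__(self):
        self.index = {}  # (community, resource_id) -> metadata
        self.peers = {}  # (community, resource_id) -> peers holding the resource

    def register(self, peer, community, resource_id, metadata=None):
        """Handle a RegisterRequest; the return value is the RegisterResponse."""
        key = (community, resource_id)
        if metadata is not None:
            self.index[key] = metadata
        known = key in self.index
        if known:
            self.peers.setdefault(key, set()).add(peer)
        return known

    def search(self, community, query):
        """Handle a SearchRequest; the return value is the SearchResponse."""
        return [rid for (comm, rid), meta in self.index.items()
                if comm == community and query in meta]


def publish(server, peer, community, resource_id, metadata):
    """Run the register exchange and return how many requests the peer sent."""
    requests = 1
    if not server.register(peer, community, resource_id):        # messages 1 and 2
        server.register(peer, community, resource_id, metadata)  # message 3
        requests += 1
    return requests


server = CentralServer()
meta = "<name>2002 Olympic Winter Games</name>"
first = publish(server, "peer-a", "stamps", "res-1", meta)
second = publish(server, "peer-b", "stamps", "res-1", meta)
print(first, second, server.search("stamps", "Olympic"))
```

Under this reading, the first publisher of a resource sends two requests and every later publisher of the same resource sends one, so metadata crosses the network only once per resource.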

Page 40

Evaluation and Validation: Challenges

Finding large XML collections
  Berkeley Drosophila Genome Project: genome annotations
  Other sources: DBLP (CS papers), EDGAR (SEC filings), GeneOntology (gene-related concepts)

Transforming DTDs to XML Schema (DTDXS package)

Automation
  XML-RPC interface for publish and search

Page 41

Publish: Breakdown of Operations

[Diagram: the publish operation split into steps 1, 2, 3, and 3b across User, browser, Client (data structures, database), and Server (data structures, database).]

Page 42

Publish: Client Timings

[Figure: Time to Store File in Client DB. x-axis: Number of files (1 to 991); y-axis: Milliseconds (0 to 600). Series: Time to store in client db; Linear trend, y = 0.1232x + 48.702.]

Page 43

Publish: Server Timings

[Figure: Comparison of Total Publish Time vs. Server Operations. x-axis: Number of Files (1 to 991); y-axis: Milliseconds (0 to 1000). Series: Publish to Server; Resource Lookups + Stored in Db; a linear trend for each series. Trend equations shown: y = 0.3622x + 114.72 and y = 0.3476x + 47.319.]

Page 44

Network Adapter: Protocol

[Diagram: register exchange between Peer and Central Server]

1. RegisterRequest( community, resource id )
2. RegisterResponse( is resource known? )
3. RegisterRequest( community, resource id, metadata )

[Diagram: search exchange between Peer and Central Server]

1. SearchRequest( community, query )
2. SearchResponse( results )

Page 45

Search: Breakdown of Operations

[Diagram: the search operation split into steps 1 and 2 across User, browser, Client, and Server (data structures, database).]

Page 46

Search: Total vs. Server Timings

[Figure: Components of Search Operation. x-axis: Number of Searches (58 / 100 files), 1 to 701; y-axis: Milliseconds (0 to 2500). Series: Time to Search Server Database; Total Search Time.]