An Overview of Issues in P2P database systems Presented by Ahmed Ataullah Wednesday, November 29th 2006


Page 1

An Overview of Issues in P2P database systems

Presented by Ahmed Ataullah

Wednesday, November 29th 2006

Page 2

Why mix P2P and databases?

- More and more intelligent mobile devices
  - Storage capacities of 8 gigabytes and beyond are becoming the norm
  - Most devices are multipurpose and do more than just storage
  - These nodes can often connect independently to other multipurpose devices
- P2P systems have a ‘network effect’
  - No special infrastructure required to join (usually)
  - No requirements of availability and reliability
  - Community orientation
- Some motivating P2P database examples
  - Provincial health care network
  - Travel agents (worldwide)

Page 3

P2PDBMS – A generally accepted definition

- Unmanaged distributed database system
- Number of nodes > 10^6
- Most nodes (at least half) are offline at any given time
- Nodes can leave at any given time and join from different locations
- Nodes are independent local database systems as well
  - Have a local schema and may contribute with some local resources (data, processing power, bandwidth etc.)

Page 4

Widely accepted assumptions

- No central control
  - No standard schema (FNAME == FIRST_NAME)
  - No standardized local DBMS
- Goal centric communities
  - Peers are co-operative
  - Some work related to game theory has been done with the contrary assumption
- Location-dependent and location-independent scenarios are treated differently by applications
- No reliability, serializability and correctness guarantees; best effort is acceptable
- Virtually no access control
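The "no standard schema" point (FNAME == FIRST_NAME) can be illustrated with a small sketch. It assumes one peer keeps a hand-written mapping for another peer's attribute names; the `B_TO_A` table and the `translate_row` helper are hypothetical names introduced only for this illustration, not part of any system discussed in the talk.

```python
# Hypothetical mapping that peer A keeps for peer B's attribute names.
B_TO_A = {"FIRST_NAME": "FNAME", "SURNAME": "LNAME"}


def translate_row(row, mapping):
    """Rename a remote row's attributes into the local schema.

    Attributes with no known mapping are dropped, because the local
    peer has no way to interpret them.
    """
    return {mapping[name]: value for name, value in row.items() if name in mapping}


# A row as peer B would return it, including an attribute peer A does not know.
remote_row = {"FIRST_NAME": "Ada", "SURNAME": "Lovelace", "AGE": 36}
local_row = translate_row(remote_row, B_TO_A)
```

Pairwise mappings like this avoid a central schema, at the cost of every peer having to maintain a mapping for each schema it wants to query.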

Page 5

P2P Database Management Systems

What it boils down to…

- File sharing, formalized and taken up a notch
- Our objective is to port everything from the relational world (tables, constraints, foreign keys, materialized views, triggers etc.) into a highly scalable and loosely connected network of database systems
- Why is that so difficult?

Page 6

The Query Processing Nightmare

SELECT MIN(PRICE), DATE, FLIGHT_NUMBER
FROM FLIGHTS
NATURAL JOIN AVAILABILITY
WHERE ORIGIN = 'TORONTO'
AND DESTINATION = 'LONDON'

- Schema issues
  - Schemas may not agree
  - Knowledge may not be consistent: Toronto = YYZ and London = LHR or LGW etc.
- Correctness
  - Have to look at every peer. Not possible? Alternative solutions?
- Response time
  - Most accurate answer up to a certain point in time
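A rough sketch of the "most accurate answer up to a certain point in time" idea, under loudly stated assumptions: peers are visited one after another, each with a simulated latency, and the search stops once a time budget is used up. The `AIRPORTS` table, the `cheapest_so_far` function, the peer list and the flight numbers are all hypothetical and made up for illustration.

```python
# Hypothetical local knowledge: city names mapped to airport codes.
AIRPORTS = {"TORONTO": {"YYZ"}, "LONDON": {"LHR", "LGW"}}


def codes(place):
    """Return the airport codes for a place; an unknown name is treated as a code."""
    return AIRPORTS.get(place.upper(), {place.upper()})


def cheapest_so_far(peers, origin, destination, budget_ms):
    """Best-effort MIN(PRICE): visit peers until the simulated budget runs out.

    `peers` is a list of (latency_ms, flights); each flight is a tuple
    (price, date, flight_number, origin, destination).
    Returns (best_flight_or_None, number_of_peers_that_answered).
    """
    origins, destinations = codes(origin), codes(destination)
    best, answered, elapsed = None, 0, 0
    for latency_ms, flights in peers:
        if elapsed + latency_ms > budget_ms:
            break  # out of time: report the best answer seen so far
        elapsed += latency_ms
        answered += 1
        for flight in flights:
            price, _date, _number, org, dst = flight
            # Match on airport codes so 'TORONTO' and 'YYZ' are treated alike.
            if codes(org) & origins and codes(dst) & destinations:
                if best is None or price < best[0]:
                    best = flight
    return best, answered


# Made-up peers: the third one is too slow to answer within the budget.
peers = [
    (40, [(720, "2006-12-01", "F1", "TORONTO", "LONDON")]),
    (50, [(650, "2006-12-02", "F2", "YYZ", "LHR")]),
    (200, [(500, "2006-12-03", "F3", "YYZ", "LGW")]),
]
best, answered = cheapest_so_far(peers, "Toronto", "London", budget_ms=100)
```

Here the cheapest flight overall sits on the slow peer, so the returned answer is only the best one seen within the budget, which is the trade-off between correctness and response time raised above.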

Page 7

The Query Processing Nightmare

SELECT MIN(PRICE), DATE, FLIGHT_NUMBER
FROM FLIGHTS
NATURAL JOIN AVAILABILITY
WHERE ORIGIN = 'TORONTO'
AND DESTINATION = 'LONDON'

- Data placement issues
  - A correct answer may have to be derived
  - May require coordination among peers
- Local vs. remote processing
  - Dynamic coordination rules
  - Is bandwidth more available or processing power?
- Cyclic nature of networks
  - Query propagation and update requests (and all other algorithms) have to be bounded
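One common way to bound propagation in a cyclic network is a hop limit (TTL) combined with remembering which peers have already seen the query. The sketch below simulates this in a single process; the `propagate` function and the ring topology are hypothetical. In a real network each peer would keep its own record of query identifiers it has seen, and the single `hops` dictionary here merely stands in for that.

```python
from collections import deque


def propagate(neighbours, start, ttl):
    """Flood a query from `start`, at most `ttl` hops, visiting each peer once.

    `neighbours` maps a peer to the peers it forwards queries to.
    Returns a dict mapping each reached peer to its hop count.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        if hops[peer] == ttl:
            continue  # hop limit reached: do not forward any further
        for nxt in neighbours.get(peer, ()):
            if nxt not in hops:  # already-seen peers are skipped, which breaks cycles
                hops[nxt] = hops[peer] + 1
                queue.append(nxt)
    return hops


# A four-peer ring: without the "seen" check the query would circulate forever.
ring = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
reached = propagate(ring, "A", ttl=2)
```

The TTL bounds how far a query travels, while the "seen" check bounds how often a peer handles it; either one alone leaves a failure mode open (unbounded reach or repeated work).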

Page 8

The Query Optimization Nightmare

SELECT MIN(PRICE), DATE, FLIGHT_NUMBER
FROM FLIGHTS
NATURAL JOIN AVAILABILITY
WHERE ORIGIN = 'TORONTO'
AND DESTINATION = 'LONDON'

- Redundancy issues
  - Same flight and price but different date?
- Materialized views
  - How often do we update these views?
  - Update propagation problem for offline peers (push/pull strategy)
- Inserts and deletes
  - Is every item unique?
  - Ownership model
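The pull side of the push/pull question can be sketched as follows, assuming the peer that owns the base data keeps a numbered log of updates and a peer holding a materialized copy asks for everything it missed when it comes back online. The `ViewOwner` and `ViewHolder` classes and the flight identifiers are hypothetical, and the sketch ignores deletes and log truncation.

```python
class ViewOwner:
    """Peer that owns the base data and keeps a numbered log of price updates."""

    def __init__(self):
        self.log = []  # entry i holds update number i + 1

    def update(self, flight, price):
        self.log.append((flight, price))

    def changes_since(self, version):
        """Return the updates made after `version`, plus the current version."""
        return self.log[version:], len(self.log)


class ViewHolder:
    """Peer holding a materialized copy; pulls missed updates on reconnect."""

    def __init__(self):
        self.view = {}
        self.version = 0

    def pull(self, owner):
        changes, self.version = owner.changes_since(self.version)
        for flight, price in changes:
            self.view[flight] = price  # later updates overwrite earlier ones
        return len(changes)


owner, holder = ViewOwner(), ViewHolder()
owner.update("F1", 700)
first_pull = holder.pull(owner)

# The holder goes offline and misses two updates, then reconnects and pulls.
owner.update("F1", 650)
owner.update("F2", 500)
second_pull = holder.pull(owner)
```

A push strategy would instead have the owner send each update as it happens, which does not work for peers that are offline at that moment; that is why some pull step on reconnect is hard to avoid.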

Page 9

Other issues which need attention

SELECT MIN(PRICE), DATE, FLIGHT_NUMBER
FROM FLIGHTS
NATURAL JOIN AVAILABILITY
WHERE ORIGIN = 'TORONTO'
AND DESTINATION = 'LONDON'

- Semantic optimization
  - Not very well studied
  - Must have a well designed model
- Fairness
  - Can one agent lie about his/her ticket prices?
  - Incentives and detection mechanisms
- Access control
  - Can it be offered at a high granularity?
  - Consequences?

Page 10

Conclusion (lessons learnt)

- P2P database systems are more than just database engines with networking modules above them
- A lot more work can be done in various sub-areas
- A minor tweak or assumption change can often lead to surprisingly different results
- Interesting ideas like semantic query optimization, fine-grained access control, fairness and control related issues have not been addressed
  - The need to do so has perhaps also not been recognized