An Overview of Issues in P2P database systems
Presented by Ahmed Ataullah
Wednesday, November 29th 2006

Why mix P2P and databases?
- More and more intelligent mobile devices
  - Storage capacities of 8 gigabytes and beyond are becoming the norm
  - Most devices are multipurpose and do more than just storage
  - These nodes can often connect independently to other multipurpose devices
- P2P systems have a ‘network effect’
  - No special infrastructure required to join (usually)
  - No requirements of availability and reliability
  - Community orientation
- Some motivating P2P database examples
  - Provincial health care network
  - Travel agents (worldwide)

P2PDBMS – A generally accepted definition
- Unmanaged distributed database system
  - Number of nodes > 10^6
  - Most nodes (at least half) are offline at any given time
  - Nodes can leave at any given time and join from different locations
- Nodes are independent local database systems as well
  - Have a local schema and may contribute some local resources (data, processing power, bandwidth, etc.)

Widely accepted assumptions
- No central control
  - No standard schema (one peer's FNAME is another peer's FIRST_NAME)
  - No standardized local DBMS
- Goal-centric communities
  - Peers are co-operative
    - Some work related to game theory has been done with the contrary assumption
  - Location-dependent and location-independent scenarios are treated differently by applications
- No reliability, serializability or correctness guarantees; best effort is acceptable
- Virtually no access control

P2P Database Management Systems: What it boils down to…
- File sharing, formalized and taken up a notch
- Our objective is to port everything from the relational world (tables, constraints, foreign keys, materialized views, triggers, etc.) into a highly scalable and loosely connected network of database systems
- Why is that so difficult?

The Query Processing Nightmare

SELECT MIN(PRICE), DATE, FLIGHT_NUMBER
FROM FLIGHTS
NATURAL JOIN AVAILABILITY
WHERE ORIGIN = 'TORONTO'
AND DESTINATION = 'LONDON'

- Schema issues
  - Schemas may not agree
  - Knowledge may not be consistent: Toronto = YYZ and London = LHR or LGW, etc.
- Correctness
  - Have to look at every peer. Not possible? Alternative solutions?
- Response time
  - Most accurate answer up to a certain point in time
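
The schema and correctness issues above can be illustrated with a minimal sketch. It assumes each peer publishes a mapping from its local column names to a shared vocabulary, and that a shared alias table resolves airport codes to cities. The names `COLUMN_MAP`, `CITY_ALIASES`, `normalize`, `cheapest` and the sample peers are hypothetical and not part of the talk. An offline peer is modelled as `None`, so the result is a best-effort minimum over the peers that answered, reported together with the coverage.

```python
# Hypothetical per-peer mappings from local column names to a shared vocabulary.
COLUMN_MAP = {
    "peer_a": {"ORIGIN": "origin", "DESTINATION": "destination", "PRICE": "price"},
    "peer_b": {"FROM_CITY": "origin", "TO_CITY": "destination", "FARE": "price"},
}

# Hypothetical alias table: city names and airport codes resolve to one city key.
CITY_ALIASES = {
    "TORONTO": "TORONTO", "YYZ": "TORONTO",
    "LONDON": "LONDON", "LHR": "LONDON", "LGW": "LONDON",
}


def normalize(peer, row):
    """Translate one local row into the shared vocabulary."""
    mapping = COLUMN_MAP[peer]
    out = {mapping[col]: val for col, val in row.items() if col in mapping}
    for field in ("origin", "destination"):
        key = out[field].upper()
        out[field] = CITY_ALIASES.get(key, key)
    return out


def cheapest(peers, origin, destination):
    """Best-effort MIN(price) over the peers that answered.

    Returns (best_price, peers_answered, peers_total) so the caller can
    see how much of the network the answer actually covers.
    """
    best = None
    answered = 0
    for peer, rows in peers.items():
        if rows is None:  # peer is offline, skip it
            continue
        answered += 1
        for row in rows:
            r = normalize(peer, row)
            if r["origin"] == origin and r["destination"] == destination:
                if best is None or r["price"] < best:
                    best = r["price"]
    return best, answered, len(peers)


PEERS = {
    "peer_a": [{"ORIGIN": "Toronto", "DESTINATION": "London", "PRICE": 720}],
    "peer_b": [
        {"FROM_CITY": "YYZ", "TO_CITY": "LHR", "FARE": 650},
        {"FROM_CITY": "YYZ", "TO_CITY": "LGW", "FARE": 610},
    ],
    "peer_c": None,  # offline: its fares cannot influence the answer
}

result = cheapest(PEERS, "TORONTO", "LONDON")
```

Here `result` carries the cheapest fare found plus the fact that only two of three peers were consulted; a cheaper fare on the offline peer would go unnoticed, which is exactly the correctness trade-off the slide raises.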

The Query Processing Nightmare

SELECT MIN(PRICE), DATE, FLIGHT_NUMBER
FROM FLIGHTS
NATURAL JOIN AVAILABILITY
WHERE ORIGIN = 'TORONTO'
AND DESTINATION = 'LONDON'

- Data placement issues
  - A correct answer may have to be derived
  - May require coordination among peers
- Local vs. remote processing
  - Dynamic coordination rules
  - Is bandwidth more available or processing power?
- Cyclic nature of networks
  - Query propagation and update requests (and all other algorithms) have to be bounded
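
One common way to bound propagation in a cyclic overlay is to combine a hop limit (TTL) with a record of peers already visited. The sketch below is only an illustration of that idea, not a method described in the talk; `flood_query` and the `GRAPH` topology are hypothetical, and the visited set is kept centrally for brevity, whereas real peers would each remember the query identifiers they have seen.

```python
from collections import deque


def flood_query(graph, start, ttl):
    """Breadth-first query propagation bounded by a hop limit.

    graph maps each peer to the list of its neighbours.
    Returns the set of peers reached and the number of messages sent.
    """
    visited = {start}
    queue = deque([(start, 0)])
    messages = 0
    while queue:
        node, hops = queue.popleft()
        if hops == ttl:  # hop budget exhausted, do not forward further
            continue
        for neighbor in graph[node]:
            messages += 1  # a message is sent even if it turns out redundant
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return visited, messages


# Hypothetical overlay with a cycle A-B-C and a tail B-D-E.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E"],
    "E": ["D"],
}

reached, sent = flood_query(GRAPH, "A", ttl=2)
```

With `ttl=2` the query started at `A` never reaches `E`, and the cycle does not cause repeated forwarding because each peer is enqueued at most once. Raising the TTL widens coverage at the cost of more messages.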

The Query Optimization Nightmare

SELECT MIN(PRICE), DATE, FLIGHT_NUMBER
FROM FLIGHTS
NATURAL JOIN AVAILABILITY
WHERE ORIGIN = 'TORONTO'
AND DESTINATION = 'LONDON'

- Redundancy issues
  - Same flight and price but different date?
- Materialized views
  - How often do we update these views?
  - Update propagation problem for offline peers (push/pull strategy)
- Inserts and deletes
  - Is every item unique?
  - Ownership model
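
A pull strategy for offline peers can be sketched as follows, under two assumptions that are not stated in the talk: each item has a single owning peer, and the owner keeps a versioned, append-only update log. The `Owner` and `Replica` classes and the flight keys are hypothetical. A replica that was offline catches up by replaying every log entry newer than the last version it applied.

```python
class Owner:
    """Peer that owns the base data and keeps an append-only update log."""

    def __init__(self):
        self.log = []  # entries are (version, key, value); value None means delete

    def write(self, key, value):
        self.log.append((len(self.log) + 1, key, value))

    def updates_since(self, version):
        return [entry for entry in self.log if entry[0] > version]


class Replica:
    """Peer holding a materialized copy that may miss updates while offline."""

    def __init__(self):
        self.view = {}
        self.version = 0  # last log version applied to the view

    def pull(self, owner):
        """On reconnect, replay every update newer than our version."""
        for version, key, value in owner.updates_since(self.version):
            if value is None:
                self.view.pop(key, None)  # propagate a delete
            else:
                self.view[key] = value  # propagate an insert or update
            self.version = version


owner = Owner()
replica = Replica()

owner.write("AC856", 720)
replica.pull(owner)  # replica is online and up to date

# The replica goes offline and misses three updates.
owner.write("AC856", 650)
owner.write("BA098", 610)
owner.write("AC856", None)

replica.pull(owner)  # the replica rejoins and catches up
```

Until the second `pull`, the replica would still answer queries with the stale fare of 720 for `AC856`. How long such staleness is tolerable is the "how often do we update these views" question; a push strategy would move that cost to the owner, which must then track which peers are reachable.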

Other issues which need attention

SELECT MIN(PRICE), DATE, FLIGHT_NUMBER
FROM FLIGHTS
NATURAL JOIN AVAILABILITY
WHERE ORIGIN = 'TORONTO'
AND DESTINATION = 'LONDON'

- Semantic optimization
  - Not very well studied
  - Must have a well-designed model
- Fairness
  - Can one agent lie about his/her ticket prices?
  - Incentives and detection mechanisms
- Access control
  - Can it be offered at a high granularity? Consequences?

Conclusion (lessons learnt)
- P2P database systems are more than just database engines with networking modules above them
- A lot more work can be done in various sub-areas
  - A minor tweak or assumption change can often lead to surprisingly different results
- Interesting ideas like semantic query optimization, fine-grained access control, fairness and control-related issues have not been addressed
  - The need to do so has perhaps not been recognized either