Data Management in the Cloud Platforms
Sefa Şahin Koç, Dev&Ops

Upload: shnkoc

Post on 20-Jun-2015

DESCRIPTION

Data Management in cloud platforms. Replication. Master election

TRANSCRIPT

Page 1: Data Management in Cloud Platforms

Data Management in the Cloud Platforms
Sefa Şahin Koç
Dev&Ops

Page 2: Data Management in Cloud Platforms

Abstract
● Introduction to Cloud Computing
● Cloud Characteristics
● Data Analysis in the Cloud
● Replication
● Master-slave election
● References
● Q&A

Page 3: Data Management in Cloud Platforms

Introduction to Cloud Computing
● Encompasses computer processing, storage, and software delivery
● Removes the need for large IT investments and their management
  ○ No need for configuration or extra employees to do it
● Gives professionals access to powerful computing resources
  ○ Powerful computers are hard to buy
  ○ Maintenance is expensive
● The pay-as-you-go model suits startups
  ○ Pay only for what you use

Page 4: Data Management in Cloud Platforms

Cloud Characteristics
● Elasticity lets the database grow with demand
  ○ New resources can be added quickly
● Security risk for data
  ○ Governments may have a legal right to access servers
● Replication across large geographic distances
  ○ Latency in data transfer
● Heterogeneous infrastructure
  ○ VMs in the same cloud can differ in resource usage

Page 5: Data Management in Cloud Platforms

Data Analysis in the Cloud
● Wish list
  ○ Efficiency
  ○ Fault tolerance
    ■ Hard to guarantee ACID properties in transactional data management over large geographical distances
    ■ Complex queries can take time on weak processors
  ○ Ability to run in a heterogeneous environment
    ■ Nodes differ in performance
  ○ Ability to encrypt data
    ■ Decrypt data before sending to avoid high bandwidth
  ○ Ability to interface with business products
    ■ ODBC or JDBC

Page 6: Data Management in Cloud Platforms

Replication (1)
● Master-slave
  ○ Master: controller node
  ○ Slave: read-only node
● Write operations are done on the master node; slaves replicate the changes.
● Multi-master replication
  ○ If one master fails, the others continue
  ○ Masters at different physical locations can shorten the distance to slaves
  ○ Loosely consistent
  ○ Violates ACID
  ○ Complex and increases latency
  ○ Requires conflict resolution
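The master-slave flow above (writes go to the master, slaves copy the changes and serve reads) can be sketched as follows. This is a minimal in-memory illustration, not how any particular database implements it; the class names `Master` and `Slave` and the `replicate` method are hypothetical.

```python
class Node:
    """A replica holding a key-value store and a count of applied writes."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of log entries applied so far


class Slave(Node):
    """Read-only replica: it serves reads and rejects direct writes."""

    def read(self, key):
        return self.data.get(key)

    def write(self, key, value):
        raise PermissionError("slaves are read-only")


class Master(Node):
    """Controller node: accepts writes and ships them to its slaves."""

    def __init__(self, name):
        super().__init__(name)
        self.log = []     # ordered list of (key, value) writes
        self.slaves = []

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value
        self.applied = len(self.log)

    def replicate(self):
        # Send each slave only the log entries it has not applied yet.
        for slave in self.slaves:
            for key, value in self.log[slave.applied:]:
                slave.data[key] = value
            slave.applied = len(self.log)


master = Master("m1")
replica = Slave("s1")
master.slaves.append(replica)
master.write("user:1", "alice")
master.replicate()
```

Until `replicate()` runs, a read on the slave can return stale data, which is the latency cost of asynchronous replication.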

Page 7: Data Management in Cloud Platforms

Replication (2)
● Multi-master replication (cont.)
  ○ e.g. CouchDB, Cloudant, Oracle, MySQL, etc.
  ○ Multiversion Concurrency Control (MVCC)
● Replication types
  ○ Storage-level replication
    ■ Guarantees "zero data loss"
    ■ Copies disk blocks
  ○ File-level replication
    ■ Uses less bandwidth
    ■ Knows what to replicate
    ■ Uses CPU
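To illustrate why file-level replication "knows what to replicate" and saves bandwidth at the cost of CPU, here is a rough sketch that hashes file contents and copies only files that differ. The in-memory dictionaries standing in for file systems and the function names are assumptions made for the example; deletions are not handled.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content hash; computing it is the CPU cost of file-level replication."""
    return hashlib.sha256(content).hexdigest()


def files_to_replicate(primary: dict, secondary: dict) -> list:
    """Return paths whose content is missing or different on the secondary."""
    changed = []
    for path, content in primary.items():
        if path not in secondary or digest(secondary[path]) != digest(content):
            changed.append(path)
    return sorted(changed)


def replicate_files(primary: dict, secondary: dict) -> list:
    """Copy only the changed files and return the paths that were sent."""
    paths = files_to_replicate(primary, secondary)
    for path in paths:
        secondary[path] = primary[path]
    return paths
```

Storage-level replication, by contrast, would copy changed disk blocks without looking at which files they belong to.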

Page 8: Data Management in Cloud Platforms

Replication (3)
● Replication types (cont.)
  ○ Journaling
    ■ Operation logs
    ■ See which operations were done and apply them on the secondaries
    ■ May be preferable for sensitive data
● Database size may differ between replicas
  ○ Different pre-allocation
  ○ Different disk fragmentation
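The journaling approach above, where secondaries replay the operations recorded in a log, could look roughly like this. The entry format `(op, key, value)` and the names `apply_entry` and `Secondary` are hypothetical and chosen only for this sketch.

```python
def apply_entry(store: dict, entry: tuple) -> None:
    """Apply one journal entry of the form (op, key, value) to a store."""
    op, key, value = entry
    if op == "set":
        store[key] = value
    elif op == "delete":
        store.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


class Secondary:
    """Replica that replays the primary's journal from where it left off."""

    def __init__(self):
        self.store = {}
        self.last_applied = 0  # index of the next journal entry to apply

    def catch_up(self, journal: list) -> None:
        for entry in journal[self.last_applied:]:
            apply_entry(self.store, entry)
        self.last_applied = len(journal)


journal = [("set", "a", 1), ("set", "b", 2), ("delete", "a", None)]
secondary = Secondary()
secondary.catch_up(journal)
```

Because only operations are shipped, two replicas can hold the same logical data while their on-disk sizes differ.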

Page 9: Data Management in Cloud Platforms

Replication (4)
● Comparison

Page 10: Data Management in Cloud Platforms

Master-slave election
● Needs to be immediate and fast
  ○ Absence of a primary should be detected quickly
  ○ Election must start immediately
  ○ Without a primary node, the replica set is read-only
● An odd number of nodes is recommended
  ○ The master will be the node that can connect to a majority.
  ○ Accept and reject votes cannot be tied.
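The arithmetic behind the odd-number recommendation can be checked with a short sketch. The function names are made up for this example, and a strict majority (more than half of all nodes) is assumed.

```python
def majority(total_nodes: int) -> int:
    """Smallest number of nodes that is more than half of the set."""
    return total_nodes // 2 + 1


def can_become_master(reachable_nodes: int, total_nodes: int) -> bool:
    """A candidate needs to reach a majority, counting itself."""
    return reachable_nodes >= majority(total_nodes)


def tolerated_failures(total_nodes: int) -> int:
    """How many nodes can be lost while a majority still exists."""
    return total_nodes - majority(total_nodes)
```

With these definitions a 4-node set tolerates one failure, the same as a 3-node set, and a 2-2 split leaves neither side with a majority, so the extra even node adds cost without adding fault tolerance.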

Page 11: Data Management in Cloud Platforms

Master-slave election (2)
● Assign priorities for a quick election
  ○ The node with the highest priority gets the votes.
  ○ A node with high priority can cancel the candidacy of a node with low priority.
● Network partitions
  ○ Put the majority of nodes in the same cloud
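Priorities and network partitions interact: only the side of a partition that holds a majority may elect a master, and within that side the highest-priority node wins. The sketch below assumes this simplified rule; the `elect` function and the tie-break by node name are illustrative choices, not the behavior of a specific product.

```python
def elect(priorities: dict, reachable: set):
    """Pick a master among the reachable nodes, or None without a majority.

    priorities maps every node name in the replica set to its priority.
    reachable is the set of node names on this side of the partition.
    """
    needed = len(priorities) // 2 + 1
    if len(reachable) < needed:
        return None  # minority side stays without a master (read-only)
    # Highest priority wins; ties are broken by name (an arbitrary assumption).
    return max(reachable, key=lambda name: (priorities[name], name))


priorities = {"a": 2, "b": 1, "c": 1, "d": 1, "e": 1}
majority_side = {"b", "c", "d"}
minority_side = {"a", "e"}
new_master = elect(priorities, majority_side)
```

In this example the highest-priority node `a` sits on the minority side and cannot win, which is why placing the majority of nodes in the same cloud matters.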

Page 12: Data Management in Cloud Platforms

References
● http://en.wikipedia.org/wiki/Replication_(computing)
● http://en.wikipedia.org/wiki/Leader_election
● http://docs.mongodb.org/manual/faq/replica-sets/
● http://docs.mongodb.org/manual/core/replica-set-elections/
● Abadi, Daniel J. Data Management in the Cloud: Limitations and Opportunities. IEEE Data Eng. Bull. 32(1): 3-12 (2009). Available at http://sites.computer.org/debull/A09mar/abadi.pdf.

Page 13: Data Management in Cloud Platforms

Questions?