Data Management in Cloud Platforms
Sefa Şahin Koç
Dev&Ops
Abstract
● Introduction to Cloud Computing
● Cloud Characteristics
● Data Analysis in the Cloud
● Replication
● Master-slave election
● References
● Q&A
Introduction to Cloud Computing
● Encompasses computer processing, storage, and software delivery
● Removes the need for large IT investments and their management
○ no need to configure infrastructure or hire extra employees to do so
● Gives professionals access to powerful computing resources
○ Powerful computers are expensive to buy
○ Maintenance is expensive
● The pay-as-you-go model suits startups
○ pay only for what you use
Cloud Characteristics
● Elasticity lets a database scale out as demand grows
○ New resources can be added quickly
● Security risk for data
○ Governments may have legal rights to access the servers
● Replication across large geographic distances
○ Latency in data transfer
● Heterogeneous infrastructure
○ VMs in the same cloud can differ in available resources and performance
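Elasticity is usually automated with a scaling rule that adds nodes as load rises. The sketch below is a minimal, hypothetical rule: it assumes each node serves a fixed number of requests per second and is kept below a target utilization; the parameter names and the 1 to 20 node bounds are illustrative assumptions, not any cloud provider's API.

```python
import math


def desired_nodes(load_rps: float, capacity_per_node_rps: float,
                  min_nodes: int = 1, max_nodes: int = 20,
                  target_utilization: float = 0.7) -> int:
    """Return how many nodes a simple scaling rule would provision for load_rps.

    Hypothetical model: each node serves capacity_per_node_rps requests per
    second, and nodes are kept at or below target_utilization of that
    capacity to leave headroom for bursts.
    """
    if capacity_per_node_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity and target utilization must be positive")
    usable_per_node = capacity_per_node_rps * target_utilization
    needed = math.ceil(load_rps / usable_per_node)
    # Never shrink below min_nodes or grow past max_nodes.
    return max(min_nodes, min(max_nodes, needed))


# As load grows, the rule asks for more nodes (up to the cap).
for load in (50, 400, 1500):
    print(load, "req/s ->", desired_nodes(load, capacity_per_node_rps=100), "nodes")
```

A production rule would typically also smooth the load signal or wait between scaling actions so the cluster does not oscillate.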
Data Analysis in the Cloud
● Wish List
○ Efficiency
○ Fault tolerance
■ hard to guarantee ACID properties in transactional data management over large geographical distances
■ complex queries can run for a long time on weak processors
○ Ability to run in a heterogeneous environment
■ nodes can differ in performance
○ Ability to operate on encrypted data
■ working directly on encrypted data avoids shipping whole tables out of the cloud for decryption, which costs high bandwidth
○ Ability to interface with business intelligence products
■ e.g., via ODBC or JDBC
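In a heterogeneous environment, splitting a query's work evenly means the slowest node finishes last and holds up the result. One common mitigation, shown here as a minimal sketch rather than the method of any particular system, assigns each node a number of data partitions proportional to its measured throughput. The node names and throughput figures are hypothetical.

```python
def assign_partitions(num_partitions: int,
                      throughput: dict[str, float]) -> dict[str, int]:
    """Split num_partitions across nodes in proportion to their throughput.

    Uses the largest-remainder method so the shares always add up to
    num_partitions exactly.
    """
    total = sum(throughput.values())
    if total <= 0:
        raise ValueError("total throughput must be positive")
    exact = {node: num_partitions * t / total for node, t in throughput.items()}
    shares = {node: int(x) for node, x in exact.items()}  # round down first
    leftover = num_partitions - sum(shares.values())
    # Give the remaining partitions to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


# Hypothetical measured throughputs (e.g. rows scanned per second per VM).
speeds = {"vm-a": 300.0, "vm-b": 100.0, "vm-c": 100.0}
print(assign_partitions(10, speeds))
```

The faster VM receives three times as many partitions, so all three nodes should finish at roughly the same time.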
Replication (1)
● Master-slave
○ master: controller node
○ slaves: read-only nodes
● Write operations are performed on the master node; slaves replicate the changes.
● Multi-master replication
○ if one master fails, the others continue
○ masters at different physical locations can shorten the distance to slaves
○ loosely consistent
○ does not guarantee ACID
○ complex and increases latency
○ requires conflict resolution
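The master-slave rule above (only the master accepts writes; slaves apply its changes and serve reads) can be illustrated with a minimal in-memory sketch. The class names are hypothetical, replication here is synchronous and in-process, and failures are not modeled.

```python
from typing import Optional


class Replica:
    """Read-only copy: clients can only read; changes arrive from the master."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        # Called by the master to replicate a change; clients do not call this.
        self._data[key] = value

    def read(self, key: str) -> Optional[str]:
        return self._data.get(key)


class Master(Replica):
    """Controller node: the only node that exposes a write operation."""

    def __init__(self, slaves: list[Replica]) -> None:
        super().__init__()
        self.slaves = slaves

    def write(self, key: str, value: str) -> None:
        self.apply(key, value)
        # Replicate the change to every slave (synchronously, for simplicity).
        for slave in self.slaves:
            slave.apply(key, value)


slaves = [Replica(), Replica()]
master = Master(slaves)
master.write("user:1", "Ada")
print(slaves[1].read("user:1"))
```

Real systems usually ship changes to slaves asynchronously over the network, so a slave can briefly lag behind the master.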
Replication (2)
● Multi-master replication (cont.)
○ e.g., CouchDB, Cloudant, Oracle, MySQL
○ Multiversion Concurrency Control (MVCC)
● Replication types
○ Storage-level replication
■ guarantees 'zero data loss'
■ copies disk blocks
○ File-level replication
■ uses less bandwidth
■ knows what to replicate
■ costs CPU time
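MVCC, listed above, keeps several versions of each value instead of overwriting in place, so a reader sees a consistent snapshot while writers continue. The sketch below is a minimal single-process illustration, assuming a simple logical counter as the commit timestamp; it shows snapshot reads only, not the conflict handling a multi-master system such as CouchDB performs, and it never discards old versions (a real system must eventually garbage-collect them).

```python
from typing import Optional


class MVCCStore:
    """Keeps every committed version of each key, tagged with a logical timestamp."""

    def __init__(self) -> None:
        self._clock = 0  # logical commit timestamp, incremented on each write
        self._versions: dict[str, list[tuple[int, str]]] = {}

    def write(self, key: str, value: str) -> int:
        self._clock += 1
        # Append a new version instead of overwriting the old one.
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self) -> int:
        """A reader's snapshot is the latest commit timestamp when it starts."""
        return self._clock

    def read(self, key: str, snapshot_ts: int) -> Optional[str]:
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("balance", "100")
reader_snapshot = store.snapshot()   # a long-running reader starts here
store.write("balance", "50")         # a concurrent writer commits a new version
print(store.read("balance", reader_snapshot))   # the old reader keeps its snapshot
print(store.read("balance", store.snapshot()))  # a new reader sees the latest value
```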
Replication (3)
● Replication types (cont.)
○ Journaling
■ operation logs
■ the log records which operations were done; secondaries replay them
■ may be preferable for sensitive data
● Database size may differ between replicas
○ different pre-allocation
○ different disk fragmentation
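Journaling-based replication, as described above, records each operation in an ordered log; a secondary remembers how far into the log it has replayed and applies only newer entries. A minimal sketch: the names (`Op`, `Primary`, `Secondary`) and the two operation kinds are hypothetical. Operation logs such as MongoDB's oplog follow this idea with their own entry format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Op:
    """One journal entry: either set key=value or delete key."""
    kind: str
    key: str
    value: Optional[str] = None


def apply_op(data: dict[str, str], op: Op) -> None:
    if op.kind == "set" and op.value is not None:
        data[op.key] = op.value
    elif op.kind == "delete":
        data.pop(op.key, None)
    else:
        raise ValueError(f"invalid operation: {op}")


@dataclass
class Primary:
    data: dict[str, str] = field(default_factory=dict)
    log: list[Op] = field(default_factory=list)  # operation log (journal)

    def execute(self, op: Op) -> None:
        apply_op(self.data, op)
        self.log.append(op)  # record the operation so secondaries can replay it


@dataclass
class Secondary:
    data: dict[str, str] = field(default_factory=dict)
    applied: int = 0  # how many log entries have been replayed so far

    def sync(self, log: list[Op]) -> None:
        # Replay, in order, only the entries not applied yet.
        for op in log[self.applied:]:
            apply_op(self.data, op)
            self.applied += 1


primary, secondary = Primary(), Secondary()
primary.execute(Op("set", "a", "1"))
primary.execute(Op("set", "b", "2"))
secondary.sync(primary.log)
primary.execute(Op("delete", "a"))
secondary.sync(primary.log)
print(secondary.data, secondary.applied)
```

Because the secondary replays operations rather than copying bytes, its on-disk layout (and therefore its size) can differ from the primary's even when the data is identical.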
Replication (4)
● Comparison
Master-slave election
● Needs to be immediate and fast
○ The absence of a primary should be detected quickly
○ An election must start immediately
○ Without a primary node, the replica set is read-only
● An odd number of nodes is recommended
○ The master is the node that can connect to a majority.
○ Accept and reject votes cannot tie.
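The majority rule for elections, and why an odd number of nodes is recommended, can be checked with a little arithmetic. A minimal sketch, assuming every node has exactly one vote and a candidate needs a strict majority of all voting members; the heartbeat-timeout check for detecting a missing primary uses a hypothetical timeout parameter, not any system's default.

```python
def majority(num_voters: int) -> int:
    """Smallest vote count that is a strict majority of num_voters."""
    return num_voters // 2 + 1


def wins(votes_for: int, num_voters: int) -> bool:
    """A candidate becomes master only with a strict majority of all voters."""
    return votes_for >= majority(num_voters)


def tolerated_failures(num_voters: int) -> int:
    """How many nodes can be lost while the rest can still form a majority."""
    return num_voters - majority(num_voters)


def primary_missing(last_heartbeat_s: float, now_s: float, timeout_s: float) -> bool:
    """Detect an absent primary: no heartbeat received within timeout_s seconds."""
    return now_s - last_heartbeat_s > timeout_s


for n in (3, 4, 5):
    print(f"{n} nodes: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

With 4 nodes, a 2-2 split leaves neither side with a majority, and the fourth node adds no failure tolerance over 3 nodes; with an odd count, a vote can never tie.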
Master-slave election (2)
● Assign priorities for a quicker election
○ The node with the highest priority will be voted master.
○ A node with a higher priority can make a lower-priority node drop its candidacy.
● Network partitions
○ Place a majority of the nodes in the same cloud, so that side can still elect a master after a partition
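The priority and partition rules above can be combined in one sketch: inside a partition, the reachable node with the highest priority becomes master, but only if that partition holds a majority of all voting nodes. Node names, priorities, and cloud labels are hypothetical. The sketch assumes, as in MongoDB, that priority 0 means a node can never become master; real systems apply further checks (for example, how up to date a candidate's data is) that are omitted here.

```python
from typing import Optional


def elect(partition: set[str], priorities: dict[str, int]) -> Optional[str]:
    """Return the node elected master inside `partition`, or None.

    priorities maps every voting node in the replica set to its priority;
    partition is the set of nodes that can still reach each other.
    """
    reachable = {n for n in partition if n in priorities}
    if len(reachable) < len(priorities) // 2 + 1:
        return None  # minority side: no master, so it stays read-only
    # Assumption: priority 0 means the node may never become master.
    candidates = [n for n in reachable if priorities[n] > 0]
    if not candidates:
        return None
    # Highest priority wins; ties are broken by name so the result is deterministic.
    return max(candidates, key=lambda n: (priorities[n], n))


# Hypothetical 5-node set: the majority (3 nodes) is placed in cloud "aws".
priorities = {"aws-1": 2, "aws-2": 1, "aws-3": 1, "gcp-1": 3, "gcp-2": 1}
print(elect(set(priorities), priorities))              # no partition
print(elect({"aws-1", "aws-2", "aws-3"}, priorities))  # partition: aws side
print(elect({"gcp-1", "gcp-2"}, priorities))           # partition: gcp side
```

Because the majority sits in one cloud, a partition between the clouds still leaves that side able to elect a master (aws-1), even though the highest-priority node (gcp-1) is cut off.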
References
● http://en.wikipedia.org/wiki/Replication_(computing)
● http://en.wikipedia.org/wiki/Leader_election
● http://docs.mongodb.org/manual/faq/replica-sets/
● http://docs.mongodb.org/manual/core/replica-set-elections/
● Abadi, Daniel J. Data Management in the Cloud: Limitations and Opportunities. IEEE Data Eng. Bull. 32(1): 3-12 (2009). Available at http://sites.computer.org/debull/A09mar/abadi.pdf.
Questions?