Consensus algorithm with a distributed key-value store in a distributed system

TRANSCRIPT

1. Using consensus algorithms and distributed stores in designing a distributed system
   Atin Mukherjee, GlusterFS hacker, @mukherjee_atin
2. Topics
   - What is consensus in a distributed system?
   - What is the CAP theorem in a distributed system?
   - Different distributed system design approaches
   - Challenges in the design of a distributed system
   - What is the RAFT algorithm and how does it work?
   - Distributed stores
   - Combining RAFT and a distributed store in the form of technologies like consul, etcd, zookeeper, etc.
   - Q & A
3. What is consensus in a distributed system?
   - Consensus: an agreement, but for what and between whom?
   - For what: whether the operation/transaction is to be committed or not
   - Between whom: the answer is pretty simple, the nodes forming the distributed system
   - Quorum: (n/2) + 1
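To make the quorum bullet concrete, here is a minimal Go sketch of the (n/2) + 1 rule; the function name quorum is just for illustration.

    package main

    import "fmt"

    // quorum returns the minimum number of nodes that must agree
    // for a decision in a cluster of n nodes: (n/2) + 1.
    func quorum(n int) int {
        return n/2 + 1
    }

    func main() {
        for _, n := range []int{3, 5, 7} {
            fmt.Printf("cluster of %d nodes needs %d votes\n", n, quorum(n))
        }
    }
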
4. CAP theorem
   Any two of the following three guarantees:
   - Consistency: all nodes see the same data at the same time
   - Availability: a guarantee that every request receives a response about whether it succeeded or failed
   - Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system
5. Design approaches for a distributed system
   - No metadata server: all nodes share their data across the cluster
   - Metadata server: one node holds the data and the others fetch from it
   - So which one is better? Probably neither of them? Ask yourself for a minute...
6. Challenges in the design of a distributed system
   No metadata server:
   - N * N exchange of network messages
   - Not scalable when N is in the hundreds or thousands
   - Initialization time can be very high
   - Can end up in a situation of whom to believe and whom not to, popularly known as split brain
   - How to undo a transaction locally?
7. Challenges in the design of a distributed system, contd...
   MDS (metadata server):
   - Single point of failure (SPOF)
   - Ahh!! So is this the only drawback? How about having replicas, and then what replica count?
   - Additional network hop, lower performance
8. RAFT
   A consensus algorithm. Key features:
   - Leader/followers based model
   - Leader election
   - Normal operation
   - Safety and consistency after leader changes
   - Neutralizing old leaders
   - Client interactions
   - Configuration changes
9. RAFT: Server states
   (Diagram: server state transitions)
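The slide showed a state transition diagram. As a rough stand-in, a small Go sketch of the three Raft server states, with the usual transitions noted in comments; the type and String helper are illustrative.

    package main

    import "fmt"

    // The three server states in Raft.
    type State int

    const (
        Follower State = iota
        Candidate
        Leader
    )

    func (s State) String() string {
        return [...]string{"Follower", "Candidate", "Leader"}[s]
    }

    func main() {
        // Typical transitions (illustrative summary):
        // Follower  -> Candidate : election timeout elapses
        // Candidate -> Leader    : receives votes from a majority
        // Candidate -> Follower  : discovers a valid leader or a higher term
        // Leader    -> Follower  : discovers a server with a higher term
        fmt.Println(Follower, Candidate, Leader)
    }
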
10. RAFT: Terms
    - Time is divided into terms; each term has two parts: an election, then normal operation
    - At most 1 leader per term
    - A failed election (split vote) leaves a term with no leader
    - Each server maintains its current term value
11. RAFT: Replicated state machine
    A picture says a thousand words... (diagram)
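The picture referred to is the replicated state machine model: every server applies the same committed log entries, in the same order, to its own copy of the state machine, so all copies end up identical. A toy Go sketch under that assumption, using a hypothetical key-value state machine.

    package main

    import "fmt"

    // LogEntry is one command in the replicated log (hypothetical shape).
    type LogEntry struct {
        Term  int
        Key   string
        Value string
    }

    // StateMachine applies committed entries in log order; every server
    // that applies the same log arrives at the same state.
    type StateMachine struct {
        data map[string]string
    }

    func (sm *StateMachine) Apply(e LogEntry) {
        sm.data[e.Key] = e.Value
    }

    func main() {
        sm := &StateMachine{data: map[string]string{}}
        entries := []LogEntry{{1, "x", "1"}, {1, "y", "2"}, {2, "x", "3"}}
        for _, e := range entries { // apply entries up to the commit index, in order
            sm.Apply(e)
        }
        fmt.Println(sm.data) // map[x:3 y:2]
    }
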
12. RAFT: The different RPCs
    - RequestVote RPCs: a candidate sends these to the other nodes to get itself elected as leader
    - AppendEntries RPCs: the normal operation workload
    - AppendEntries RPCs with no entries are heartbeat messages, which the leader sends to all followers to announce its presence
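A hedged Go sketch of what these two RPC argument types typically carry, following the Raft paper's field names; the exact struct layout is illustrative, not taken from the slides.

    package raft

    // RequestVoteArgs is sent by a candidate to ask for a vote.
    type RequestVoteArgs struct {
        Term         int // candidate's term
        CandidateID  string
        LastLogIndex int // index of candidate's last log entry
        LastLogTerm  int // term of candidate's last log entry
    }

    // AppendEntriesArgs carries normal-operation log replication.
    // With an empty Entries slice it doubles as the leader's heartbeat.
    type AppendEntriesArgs struct {
        Term         int // leader's term
        LeaderID     string
        PrevLogIndex int
        PrevLogTerm  int
        Entries      []LogEntry // empty => heartbeat
        LeaderCommit int
    }

    type LogEntry struct {
        Term    int
        Command string
    }

    // heartbeat builds an AppendEntries message with no entries.
    func heartbeat(term int, leaderID string, commit int) AppendEntriesArgs {
        return AppendEntriesArgs{Term: term, LeaderID: leaderID, LeaderCommit: commit}
    }
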
13. RAFT: Leader election
    - current_term++
    - Follower -> Candidate
    - Vote for self
    - Send RequestVote RPCs to all other servers; retry until either:
      - votes are received from a majority of servers
      - an RPC is received from a valid leader
      - the election timeout elapses (then increment the term and start over)
    Election properties:
    - Safety: allow at most one winner per term
    - Liveness: some candidate must eventually win
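As a rough illustration of the steps above, a single-process Go sketch; there is no real network, randomized timeouts are only mentioned in a comment, and requestVote is a stand-in that always grants the vote.

    package main

    import "fmt"

    type node struct {
        id          int
        currentTerm int
        state       string // "follower", "candidate", "leader"
    }

    // requestVote stands in for the real RequestVote RPC; here the peers
    // simply grant the vote (no log comparison, no vote-per-term tracking).
    func requestVote(peer int, term int) bool { return true }

    func (n *node) startElection(peers []int) {
        // In the real protocol this runs when a randomized election timeout fires.
        n.currentTerm++       // current_term++
        n.state = "candidate" // Follower -> Candidate
        votes := 1            // vote for self
        for _, p := range peers {
            if requestVote(p, n.currentTerm) {
                votes++
            }
        }
        if votes >= (len(peers)+1)/2+1 { // majority of the full cluster
            n.state = "leader"
        }
    }

    func main() {
        n := &node{id: 1, state: "follower"}
        n.startElection([]int{2, 3, 4, 5})
        fmt.Println(n.state, "term", n.currentTerm) // leader term 1
    }
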
14. RAFT: Picking the best leader
    - The candidate includes log info in its RequestVote RPCs: the index and term of its last log entry
    - The voting server V denies its vote if its own log is more complete:
      (votingServerLastTerm > candidateLastTerm) ||
      ((votingServerLastTerm == candidateLastTerm) && (votingServerLastIndex > candidateLastIndex))
    - But is this enough to have crash consistency?
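The denial condition translates almost directly into code; a small Go sketch with illustrative parameter names.

    package raft

    // denyVote implements the "more complete log" check from the slide:
    // the voting server denies its vote if its own log is more up to date
    // than the candidate's.
    func denyVote(votingLastTerm, votingLastIndex, candLastTerm, candLastIndex int) bool {
        return votingLastTerm > candLastTerm ||
            (votingLastTerm == candLastTerm && votingLastIndex > candLastIndex)
    }
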
15. RAFT: New commitment rules
    For a leader to decide an entry is committed:
    - it must be stored on a majority of servers, and
    - at least one new entry from the leader's term must also be stored on a majority of servers
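A sketch of how a leader might check that rule, assuming a matchIndex slice (highest replicated index per server, leader included) and a logTerm lookup; both names are hypothetical.

    package raft

    // committed sketches the rule on the slide: the leader only advances the
    // commit point to idx if the entry at idx is stored on a majority of
    // servers AND that entry belongs to the leader's current term (which is
    // what guarantees at least one current-term entry on a majority).
    func committed(idx int, matchIndex []int, logTerm func(int) int, currentTerm int) bool {
        replicas := 0
        for _, m := range matchIndex {
            if m >= idx {
                replicas++
            }
        }
        onMajority := replicas >= len(matchIndex)/2+1
        return onMajority && logTerm(idx) == currentTerm
    }
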
16. RAFT: Log inconsistency
    The leader repairs a follower's log entries by:
    - deleting extraneous entries
    - filling in missing entries from the leader
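One common way this repair works is through the AppendEntries consistency check: the follower rejects an AppendEntries whose previous entry does not match, the leader retries from an earlier index, and once a match is found the follower drops conflicting entries and appends the leader's. A simplified, 0-indexed Go sketch of the follower side; the real RPC carries more fields.

    package raft

    type LogEntry struct{ Term int }

    // appendEntries applies the consistency check and repair. prevLogIndex
    // is -1 when the leader is sending from the very start of the log.
    func appendEntries(log []LogEntry, prevLogIndex, prevLogTerm int, entries []LogEntry) ([]LogEntry, bool) {
        if prevLogIndex >= len(log) || (prevLogIndex >= 0 && log[prevLogIndex].Term != prevLogTerm) {
            return log, false // check failed; the leader retries with an earlier index
        }
        // Drop extraneous entries after the match point, fill in missing ones.
        log = append(log[:prevLogIndex+1], entries...)
        return log, true
    }
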
17. RAFT: Neutralizing old leaders
    - The sender sends its term over the RPC
    - If the sender's term is older than the receiver's term, the RPC is rejected
    - Otherwise the receiver steps down to follower, updates its term and processes the RPC
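A minimal Go sketch of that term check on the receiving side; state handling is reduced to a string for brevity.

    package raft

    // handleRPC returns false for a stale sender (the RPC is rejected and the
    // old leader will step down once it sees the newer term in the reply);
    // otherwise the receiver adopts the newer term, steps down to follower
    // and goes on to process the RPC.
    func handleRPC(senderTerm int, currentTerm *int, state *string) bool {
        if senderTerm < *currentTerm {
            return false
        }
        if senderTerm > *currentTerm {
            *currentTerm = senderTerm
            *state = "follower"
        }
        return true
    }
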
18. RAFT: Client protocol
    - Send commands to the leader; if the leader is unknown, send to any server
    - If the contacted server is not the leader, it will redirect the client to the leader
    - The client gets back the response after the command has completed the full cycle at the leader
    - On request timeout, the client re-issues the command to another server
    - A unique id for each command at the client avoids duplicate execution
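A toy Go sketch of the client side, assuming a hypothetical sendToServer call standing in for the real RPC; the unique command id is what lets a retried command be deduplicated by the cluster.

    package main

    import (
        "errors"
        "fmt"
    )

    var errNotLeader = errors.New("not the leader")

    // sendToServer is a stand-in for the real network call; here we pretend
    // only one hard-coded address is the current leader.
    func sendToServer(addr string, cmdID int64, cmd string) error {
        if addr != "10.0.0.1" {
            return errNotLeader
        }
        return nil
    }

    // submit walks the server list until one of them (the leader) accepts the
    // command; the same cmdID is reused on every retry so the command is
    // executed at most once.
    func submit(servers []string, cmdID int64, cmd string) error {
        for _, s := range servers {
            if err := sendToServer(s, cmdID, cmd); err == nil {
                return nil
            }
        }
        return errors.New("no leader reachable, re-issue with the same cmdID")
    }

    func main() {
        servers := []string{"10.0.0.3", "10.0.0.1", "10.0.0.2"}
        fmt.Println(submit(servers, 42, "set x=1"))
    }
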
19. Joint consensus
    - A 2-phase approach
    - Election and commitment need a majority of both the old and the new configurations
    - A configuration change is just a log entry, applied immediately on receipt (committed or not)
    - Once joint consensus is committed, begin replicating the log entry for the final configuration
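A small Go sketch of the "majority of both configurations" check; function and parameter names are illustrative.

    package raft

    // hasMajority reports whether the servers in cfg that agreed (voted, or
    // stored the entry) form a majority of cfg.
    func hasMajority(agreed map[string]bool, cfg []string) bool {
        n := 0
        for _, s := range cfg {
            if agreed[s] {
                n++
            }
        }
        return n >= len(cfg)/2+1
    }

    // jointDecision applies the joint-consensus rule: a decision needs a
    // majority of the old configuration AND a majority of the new one.
    func jointDecision(agreed map[string]bool, oldCfg, newCfg []string) bool {
        return hasMajority(agreed, oldCfg) && hasMajority(agreed, newCfg)
    }
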
20. Distributed store
    - A common store which can be shared by different nodes
    - In the form of key-value pairs for ease of use
    - Several such distributed key-value store implementations are available
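A minimal, single-node Go sketch of the key-value interface such a store exposes; the real systems additionally run every write through the consensus log before applying it.

    package main

    import (
        "fmt"
        "sync"
    )

    // KVStore is a toy in-memory key-value store guarded by a lock.
    type KVStore struct {
        mu   sync.RWMutex
        data map[string]string
    }

    func NewKVStore() *KVStore { return &KVStore{data: map[string]string{}} }

    func (s *KVStore) Put(k, v string) {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.data[k] = v
    }

    func (s *KVStore) Get(k string) (string, bool) {
        s.mu.RLock()
        defer s.mu.RUnlock()
        v, ok := s.data[k]
        return v, ok
    }

    func main() {
        kv := NewKVStore()
        kv.Put("config/replica-count", "3")
        fmt.Println(kv.Get("config/replica-count"))
    }
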
21. etcd
    - Named after "/etc, distributed"
    - Open source, distributed, consistent key-value store
    - Highly available and reliable
    - Sequentially consistent
    - Watchable
    - Exposed via HTTP
    - Runtime reconfigurable (scaling feature)
    - Durable (snapshot backup/restore)
    - Time-to-live keys (keys with a timeout)
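Since etcd is exposed via HTTP, a quick way to try it is to write a key from a few lines of Go. This sketch assumes a local etcd listening on the default client port 2379 and uses the v2 keys API that was current around the time of this talk; the key name, value and TTL are arbitrary.

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "net/url"
        "strings"
    )

    func main() {
        // PUT /v2/keys/foo with value=bar and a 30-second time-to-live.
        form := url.Values{"value": {"bar"}, "ttl": {"30"}}
        req, err := http.NewRequest(http.MethodPut,
            "http://127.0.0.1:2379/v2/keys/foo", strings.NewReader(form.Encode()))
        if err != nil {
            panic(err)
        }
        req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        fmt.Println(string(body)) // JSON describing the key that was set
    }
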
22. etcd, contd...
    - Bootstrapping using RAFT
    - Proxy mode for a node
    - Cluster configuration: etcdctl member add/remove/list
    - Similar projects such as consul and zookeeper are also available
23. Why etcd?
    - Vibrant community
    - 500+ applications, like kubernetes and cloud foundry, are using it
    - 150+ developers
    - Stable releases
24. References
    - https://raftconsensus.github.io/
    - https://www.youtube.com/watch?v=YbZ3zDzDnrw
    - https://github.com/coreos/etcd#etcd
    - https://consul.io/
25. Q & A
26. THANK YOU