Distributed Systems
Maciej Łopatka
Facebook Inbox Search
Authors Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik
Facebook code dump
Community
Transfer to Apache Software Foundation
An Apache top level project
BigTable data model
An Amazon Dynamo-like infrastructure
Distributed multidimensional map indexed by a key
Four or five dimensions (keyspace, column family, row key, [super column,] column)
Column: key (name), value, timestamp
Data
Keyspace → Column Family
Column Family → Column Family Row
Column Family Row → Columns
Column → Data value
Keyspace → Super Column Family
Super Column Family → Super Column Family Row
Super Column Family Row → Super Columns
Super Column → Columns
Column → Data value
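To make the nesting concrete, the four- or five-level map can be sketched with nested Python dictionaries. This is a minimal in-memory model, not Cassandra's storage format or client API; the `Column` and `insert` names and the `Users`/`user42` data are hypothetical. It assumes last-write-wins reconciliation by column timestamp; how equal timestamps are resolved here (the existing value is kept) is an assumption of the sketch.

```python
from typing import Dict, NamedTuple


class Column(NamedTuple):
    """Value and timestamp of one column; the column name is the dict key."""
    value: bytes
    timestamp: int  # client-supplied; the higher timestamp wins on conflict


# Standard column family: row key -> column name -> Column
ColumnFamily = Dict[str, Dict[str, Column]]
# Super column family: row key -> super column name -> column name -> Column
SuperColumnFamily = Dict[str, Dict[str, Dict[str, Column]]]


def insert(cf: ColumnFamily, row_key: str, name: str, value: bytes, ts: int) -> None:
    """Write a column unless a version with a newer or equal timestamp exists.

    Assumption of this sketch: on equal timestamps the existing value is kept.
    """
    columns = cf.setdefault(row_key, {})
    current = columns.get(name)
    if current is None or ts > current.timestamp:
        columns[name] = Column(value, ts)


# Keyspace: column family name -> column family (names are hypothetical)
keyspace: Dict[str, ColumnFamily] = {"Users": {}}
insert(keyspace["Users"], "user42", "email", b"a@example.com", ts=1)
insert(keyspace["Users"], "user42", "email", b"b@example.com", ts=2)
insert(keyspace["Users"], "user42", "email", b"old@example.com", ts=1)  # stale, ignored

# A super column family adds one more level of nesting.
inbox: SuperColumnFamily = {"user42": {"hello": {"msg1": Column(b"", 3)}}}

print(keyspace["Users"]["user42"]["email"].value)  # b'b@example.com'
```

Because conflicts are settled per column by timestamp, writes never need to read the old value first, which fits the write-heavy design goals.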
Replication
Log file
Bootstrapping
Partitioning
Consistent Hashing
Periodic Data Compaction
Gossip
Anti-entropy data sync (uses Merkle trees)
Write and Read Quorum
W + R > N
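The rule W + R > N says that a write acknowledged by W of the N replicas and a read answered by R replicas must share at least one replica: by the pigeonhole principle, a set of W and a set of R replicas drawn from only N cannot be disjoint when W + R > N, so the read reaches a copy of the latest write. A brute-force check over every possible write set and read set illustrates this; `quorums_overlap` is a hypothetical helper written only for this sketch.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every w-replica write set shares a replica with every r-replica read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


for n, w, r in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 3, 2)]:
    print(f"N={n} W={w} R={r}  W+R>N: {w + r > n}  always overlap: {quorums_overlap(n, w, r)}")
```

In each case the brute-force result matches W + R > N. For (N, W, R) = (5, 3, 2) the sum only equals N, and a write set {0, 1, 2} can miss a read set {3, 4}.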
RandomPartitioner
OrderPreservingPartitioner
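Both partitioners place rows on a consistent-hashing ring: each node has a token, each row key is mapped to a token, and the row goes to the first node clockwise from that token, with further replicas on the following nodes. RandomPartitioner hashes the key with MD5, which spreads rows evenly but loses key order; OrderPreservingPartitioner uses the key itself as the token, which keeps range scans cheap but can create hot spots. The sketch below is a simplified model under those assumptions: `Ring`, `random_token`, `ordered_token`, and the node tokens are hypothetical, and replica placement is the plain "next nodes on the ring" strategy.

```python
import bisect
import hashlib
from typing import Dict, List, Union

Token = Union[int, str]


def random_token(key: str) -> int:
    """RandomPartitioner-style token: the key's MD5 digest read as an integer.

    Simplified: Cassandra's exact digest-to-integer conversion may differ.
    """
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def ordered_token(key: str) -> str:
    """OrderPreservingPartitioner-style token: the key itself."""
    return key


class Ring:
    """Consistent-hashing ring; a node owns the range (previous token, its token]."""

    def __init__(self, node_tokens: Dict[str, Token]) -> None:
        # Sort (token, node) pairs so tokens can be binary-searched.
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def replicas(self, token: Token, n: int) -> List[str]:
        """First node at or after `token` (wrapping around), then the next n-1 nodes."""
        start = bisect.bisect_left(self.tokens, token) % len(self.entries)
        count = min(n, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


# Order-preserving ring: neighboring keys land on the same or adjacent nodes.
ordered = Ring({"A": "g", "B": "n", "C": "t"})
print(ordered.replicas(ordered_token("hello"), 2))  # ['B', 'C']
print(ordered.replicas(ordered_token("zebra"), 2))  # ['A', 'B'] (wraps around)

# Random ring: tokens evenly spaced over the 128-bit MD5 output space.
space = 2 ** 128
hashed = Ring({"A": 0, "B": space // 3, "C": 2 * space // 3})
print(hashed.replicas(random_token("hello"), 2))
```

Adding or removing a node only moves the keys in the ranges adjacent to its token, which is the main reason for using consistent hashing rather than `hash(key) % node_count`.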
Terabytes of data
Replaced MySQL
Detecting failures in 15 seconds
ZooKeeper used to locate nodes
Replaced by HBase
50+TB of data on a 150 node cluster, east and west coast data centers
Term search: UserId (row key) → Word (super column) → MessageId columns
Interaction search: UserId (row key) → Recipient UserId (super column) → MessageId columns
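Both Inbox Search indexes map onto super column families: the user id is the row key, the word (or the other user's id) is the super column, and the ids of matching messages are the columns. An in-memory sketch of that layout follows; the tokenizer (lowercase, split on whitespace) and all function names are hypothetical, and the production system stored these indexes in Cassandra rather than in Python dictionaries.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set

# row key (UserId) -> super column (word or recipient UserId) -> MessageId columns
Index = DefaultDict[str, DefaultDict[str, Set[str]]]


def new_index() -> Index:
    return defaultdict(lambda: defaultdict(set))


def index_message(terms: Index, interactions: Index, user_id: str,
                  message_id: str, text: str, recipients: Iterable[str]) -> None:
    """Add one message to both indexes of the user who owns it."""
    for word in set(text.lower().split()):
        terms[user_id][word].add(message_id)
    for recipient in recipients:
        interactions[user_id][recipient].add(message_id)


def term_search(terms: Index, user_id: str, word: str) -> List[str]:
    # .get avoids creating empty rows for users or words never indexed
    row = terms.get(user_id, {})
    return sorted(row.get(word.lower(), set()))


def interaction_search(interactions: Index, user_id: str, other_id: str) -> List[str]:
    row = interactions.get(user_id, {})
    return sorted(row.get(other_id, set()))


terms, interactions = new_index(), new_index()
index_message(terms, interactions, "u1", "m1", "Lunch on Friday", ["u2"])
index_message(terms, interactions, "u1", "m2", "Friday deadline", ["u3"])
print(term_search(terms, "u1", "friday"))            # ['m1', 'm2']
print(interaction_search(interactions, "u1", "u2"))  # ['m1']
```

Each search touches a single row (one user), so a query is served by the replicas owning that user's key rather than by a scatter-gather across the cluster.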
Latency Stat   Search Interactions   Term Search
Min            7.69 ms               7.78 ms
Median         15.69 ms              18.27 ms
Max            26.13 ms              44.41 ms
Tab. Read performance
Workload A (update heavy): 50 percent reads and 50 percent updates; (a) read operations, (b) update operations.
Six server-class machines (dual 64-bit quad-core 2.5 GHz Intel Xeon CPUs, 8 GB of RAM, a 6-disk RAID-10 array and Gigabit Ethernet)
Workload B (read heavy): 95 percent reads and 5 percent updates; (a) read operations, (b) update operations.
Designed to run on cheap commodity hardware
Handle high write throughput while not sacrificing read efficiency
Decentralized
Elasticity
Fault-tolerant
Tunable consistency
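Tunable consistency means each read or write chooses how many of the N replicas must answer before the request succeeds, trading latency and availability against consistency. A minimal sketch, assuming a replication factor of 3 and only the levels ONE, QUORUM, and ALL; the enum and helper functions are illustrative stand-ins, not a Cassandra driver API.

```python
from enum import Enum
from itertools import product


class ConsistencyLevel(Enum):
    """A few of Cassandra's consistency level names (others exist, e.g. ANY, TWO)."""
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def replicas_required(level: ConsistencyLevel, replication_factor: int) -> int:
    """Number of replicas that must respond for a request at this level."""
    if level is ConsistencyLevel.ONE:
        return 1
    if level is ConsistencyLevel.QUORUM:
        return replication_factor // 2 + 1  # strict majority
    return replication_factor  # ALL


def reads_see_latest_write(write: ConsistencyLevel, read: ConsistencyLevel, rf: int) -> bool:
    """Simple quorum model: True when W + R > N, so read and write sets overlap."""
    return replicas_required(write, rf) + replicas_required(read, rf) > rf


for write, read in product(ConsistencyLevel, repeat=2):
    ok = reads_see_latest_write(write, read, 3)
    print(f"write={write.value:<6} read={read.value:<6} consistent={ok}")
```

Only combinations whose replica counts sum to more than N, such as QUORUM writes with QUORUM reads or ONE writes with ALL reads, guarantee that a read reaches a replica holding the latest acknowledged write; ONE/ONE is the fastest choice but may return stale data.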
http://en.wikipedia.org/wiki/Apache_Cassandra
Cassandra - A Decentralized Structured Storage System, Avinash Lakshman, Prashant Malik, Facebook
http://maxgrinev.com/2010/07/09/a-quick-introduction-to-the-cassandra-data-model/
http://www.facebook.com/note.php?note_id=454991608919
http://horicky.blogspot.com/2010/10/bigtable-model-with-cassandra-and-hbase.html
http://www.datastax.com/docs/1.0/ddl/index
Benchmarking Cloud Serving Systems with YCSB, Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears