Accumulo Summit 2014: Accismus -- Percolating with Accumulo

Accismus: A Percolator implementation using Accumulo (Keith Turner)


DESCRIPTION

Talk Info
Title: Percolating with Accumulo
Abstract: A talk about conditional mutations and Accismus (a Percolator prototype) covering the following topics:
- Conditional mutation use cases and overview
- Conditional mutation implementation
- Percolator overview and use cases
- Percolator implementation

TRANSCRIPT

1. Accismus: A Percolator implementation using Accumulo (Keith Turner)

2. Accismus
A form of irony where one pretends indifference and refuses something while actually wanting it.

3. Google's Problem
- Use M/R to process ~10^15 bytes
- ~10^12 bytes of new data arrive
- Use M/R to process 10^15 + 10^12 bytes
- High latency before new data is available for query

4. Solution
- Percolator: incremental processing for big data
- Layer on top of BigTable
- Offers fault tolerant, cross-row transactions
  - Lazy recovery
- Offers snapshot isolation
  - Only read committed data
- Uses the BigTable data model, except the timestamp
  - Accismus adds visibility
- Has its own API

5. Observers
- User defined function that executes a transaction
- Triggered when a user defined column is modified (called a notification in the paper)
- Guarantee that only one transaction will execute per notification

6. Initialize bank

tx1.begin()
if (tx1.get('bob', 'balance') == null) tx1.set('bob', 'balance', 100)
if (tx1.get('joe', 'balance') == null) tx1.set('joe', 'balance', 100)
if (tx1.get('sue', 'balance') == null) tx1.set('sue', 'balance', 100)
tx1.commit()

What could possibly go wrong?

7. Two threads transferring

Thread 1 on node A:
tx2.begin()
b1 = tx2.get('joe', 'balance')
b2 = tx2.get('bob', 'balance')
tx2.set('joe', 'balance', b1 + 7)
tx2.set('bob', 'balance', b2 - 7)
tx2.commit()

Thread 2 on node B:
tx3.begin()
b3 = tx3.get('joe', 'balance')
b4 = tx3.get('sue', 'balance')
tx3.set('joe', 'balance', b3 + 5)
tx3.set('sue', 'balance', b4 - 5)
tx3.commit()

8. Accismus stochastic bank test
- Bank account per row
- Initialize N bank accounts with 1000
- Run random transfer threads
- A complete scan always sums to N * 1000

9. Phrasecount example
- Have documents + source URI
- Dedupe documents based on SHA1
- Count the number of unique documents each phrase occurs in
- Can do this with two map reduce jobs
- https://github.com/keith-turner/phrasecount

10. Accismus Application
(Diagram: Map Reduce + Bulk Import, Load Transactions, Observers, Export Transactions)

11. Load transaction 1 (diagram)
- Documents: document:b4bf617e ("my dog is very nice")
- References: http://foo.com/a

12. Load transaction 2 (diagram)
- Documents: document:b4bf617e ("my dog is very nice")
- References: http://foo.com/a, http://foo.net/a

13. Load transaction 3 (diagram)
- Documents: document:1e111475 ("his dog is very nice"), document:b4bf617e ("my dog is very nice")
- References: http://foo.com/a, http://foo.net/a, http://foo.com/c

14. Observer transaction 1 (diagram)
- Documents and references as in slide 13
- Phrase counts: "my dog is very": 1, "dog is very nice": 1

15. Observer transaction 2 (diagram)
- Documents and references as in slide 13
- Phrase counts: "my dog is very": 1, "his dog is very": 1, "dog is very nice": 2

16. Load transaction 4 (diagram)
- Documents, references, and phrase counts as in slide 15

17. Observer transaction 3 (diagram)
- Documents: document:b4bf617e ("my dog is very nice")
- References: http://foo.com/a, http://foo.net/a, http://foo.com/c
- Phrase counts: "my dog is very": 1, "dog is very nice": 1

18. Phrasecount schema

Row        Column          Value
uri:       doc:hash
doc:       doc:content
doc:       doc:refCount
doc:       index:check     null
doc:       index:status    INDEXED | null
phrase:    stat:docCount

19. Querying phrase counts
- Query Accismus directly
  - Lazy recovery may significantly delay a query
  - High load may delay queries
- Export transaction writes to an Accumulo table
  - WARNING: leaving the sane world of transactions
  - Faults during export
  - Concurrently exporting the same item
  - Out of order arrival of exported data
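To make the phrasecount flow above concrete, here is a minimal sketch of a document observer over the schema in slide 18. The Transaction interface mirrors the tx.get()/tx.set() pseudocode used in these slides; it is an illustrative assumption, not the actual Accismus API.

// Hypothetical sketch of a phrasecount observer; the Transaction interface
// below is an assumption modeled on the slide pseudocode, not real Accismus code.
import java.util.HashSet;
import java.util.Set;

interface Transaction {
  String get(String row, String column);             // committed value, or null
  void set(String row, String column, String value);
}

class DocumentObserver {

  // Runs when a doc:<hash> row is flagged for indexing.
  void process(Transaction tx, String docRow) {
    String content = tx.get(docRow, "doc:content");
    String[] words = content.split("\\s+");

    // Collect distinct 4-word phrases; docCount counts documents, not occurrences.
    Set<String> phrases = new HashSet<>();
    for (int i = 0; i + 4 <= words.length; i++) {
      phrases.add(words[i] + " " + words[i + 1] + " " + words[i + 2] + " " + words[i + 3]);
    }

    for (String phrase : phrases) {
      String row = "phrase:" + phrase;
      String current = tx.get(row, "stat:docCount");
      int count = (current == null) ? 0 : Integer.parseInt(current);
      tx.set(row, "stat:docCount", Integer.toString(count + 1));
    }

    tx.set(docRow, "index:status", "INDEXED");        // mark document as indexed
  }
}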
20. Export transaction strategy
- Only export committed data (intent log)
  - Don't export something a transaction is going to commit
- Idempotent
  - An export transaction can fail
  - Expect repeated execution (possibly concurrent)
- Use a committed sequence # to order data
  - A thread could read export data, pause, then export old data
  - Use the seq # as the timestamp in the Accumulo export table

21. Phrasecount export schema

Row        Column          Value
phrase:    export:check
phrase:    export:seq
phrase:    export:sum
phrase:    stat:sum

22. Phrasecount problems
- No handling for high cardinality phrases
  - Weak notifications are mentioned in the paper
  - A multi-row tree is another possibility
- Possible memory exhaustion
  - Percolator uses many threads to get high throughput
  - The example loads the entire document into memory
  - Many threads x large documents == dead worker

23. Weak notifications (queue)

String pr = "phrase:" + phrase;
int current = tx1.get(pr, "stat:docCount");
if (isHighVolume(phrase)) {
  tx1.set(pr, "stat:docCount" + rand, delta);
  tx1.weakNotify(pr);  // trigger observer to collapse the random columns
} else {
  tx1.set(pr, "stat:docCount", delta + current);
}

24. Multi-row tree for high cardinality

phrase:
  phrase_0:
    phrase_00:
    phrase_01:
  phrase_1:
    phrase_10:
    phrase_11:

- Incoming updates go to the leaves
- Observers percolate counts up to the root
- Export from the root

25. Timestamp Oracle
- Lightweight centralized service that issues timestamps
- Allocates batches of timestamps from ZooKeeper
- Gives batches of timestamps to the nodes executing transactions

26. Timestamp oracle
- Gives a logical global ordering to events
- Transactions get a timestamp at start
  - Only read data committed before it
- Transactions get a timestamp when committing

27. Percolator Implementation
- Two phase commit using conditional mutations
  - Write lock + data to the primary row/column
  - Write lock + data to all other row/columns
  - Commit the primary row/column if still locked
  - Commit all other row/columns
- A lock fails if the column changed between the start and commit timestamps
- All row/columns in a transaction point to the primary
  - In case of failure, the primary is the authority
- No centralized locking

28. Handling failures
- Transaction dies in phase 1
  - Some locks + data have been written
  - Must roll back
- Transaction dies in phase 2
  - All locks + data have been written
  - Roll forward and write the data pointers

29. Transfer transaction

Row   Column    Percolator Type   Time   Value
bob   balance   write             1      0
bob   balance   data              0      100
joe   balance   write             1      0
joe   balance   data              0      100

Percolator appends the column type to the qualifier. Accismus uses the high 4 bits of the timestamp.

30. Lock primary

Row   Column    Percolator Type   Time   Value
bob   balance   write             1      0
bob   balance   lock              3      bob:balance
bob   balance   data              3      93
bob   balance   data              0      100
joe   balance   write             1      0
joe   balance   data              0      100

31. Lock other

Row   Column    Percolator Type   Time   Value
bob   balance   write             1      0
bob   balance   lock              3      bob:balance
bob   balance   data              3      93
bob   balance   data              0      100
joe   balance   write             1      0
joe   balance   lock              3      bob:balance
joe   balance   data              3      107
joe   balance   data              0      100

32. Commit primary

Row   Column    Percolator Type   Time   Value
bob   balance   write             6      3
bob   balance   write             1      0
bob   balance   data              3      93
bob   balance   data              0      100
joe   balance   write             1      0
joe   balance   lock              3      bob:balance
joe   balance   data              3      107
joe   balance   data              0      100

What happens if a transaction with start time 7 reads joe and bob?
The commit timestamp is obtained after all locks are written. Why?

33. Commit other

Row   Column    Percolator Type   Time   Value
bob   balance   write             6      3
bob   balance   write             1      0
bob   balance   data              3      93
bob   balance   data              0      100
joe   balance   write             6      3
joe   balance   write             1      0
joe   balance   data              3      107
joe   balance   data              0      100
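A minimal sketch of how the "lock primary" step (slide 30) could be expressed with Accumulo's conditional mutation API. The table name, column layout, and literal timestamps are illustrative assumptions; Accismus itself packs the column type into the high 4 bits of the timestamp rather than into the qualifier.

// Sketch only, not Accismus source. Assumes a table named "bank" and that the
// lock/data entries are stored in separate qualifiers for readability.
import org.apache.accumulo.core.client.ConditionalWriter;
import org.apache.accumulo.core.client.ConditionalWriter.Status;
import org.apache.accumulo.core.client.ConditionalWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Condition;
import org.apache.accumulo.core.data.ConditionalMutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class LockPrimaryExample {

  // Try to lock bob:balance as the primary column at start timestamp 3.
  static boolean lockPrimary(Connector conn) throws Exception {
    ConditionalWriter writer =
        conn.createConditionalWriter("bank", new ConditionalWriterConfig());
    try {
      ConditionalMutation cm = new ConditionalMutation("bob");

      // A condition with no value requires the column to be absent, i.e. no
      // other transaction currently holds a lock on this row/column.
      cm.addCondition(new Condition("balance", "lock"));

      // Write the lock (pointing at the primary) and the new data version.
      cm.put(new Text("balance"), new Text("lock"), 3L, new Value("bob:balance".getBytes()));
      cm.put(new Text("balance"), new Text("data"), 3L, new Value("93".getBytes()));

      ConditionalWriter.Result result = writer.write(cm);

      // ACCEPTED: the lock was written atomically on the tablet server.
      // REJECTED: the column changed first; the transaction must retry or abort.
      return result.getStatus() == Status.ACCEPTED;
    } finally {
      writer.close();
    }
  }
}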
34. Garbage collection
- Not mentioned in the paper
- Use a compaction iterator
- Currently keep X versions
  - Could determine the oldest active scan's start timestamp
- Must keep data about the success/failure of the primary column
  - Added an extra column type to indicate when the primary can be collected
  - Never collected in the failure case

35. After GC Iterator

Row   Column    Percolator Type   Time   Value
bob   balance   write             6      3:TRUNC
bob   balance   write             1      0
bob   balance   data              3      93
bob   balance   data              0      100
joe   balance   write             6      3:TRUNC
joe   balance   write             1      0
joe   balance   data              3      107
joe   balance   data              0      100

A transaction with a read time of 5 would see a StaleScanException.

36. Snapshot iterator
- Used to read data
- Analyzes the Percolator metadata on the tserver
- Returns committed data
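A rough sketch of the snapshot read rule the last few slides describe, using the bob:balance versions from slide 33: pick the newest committed write entry at or below the reader's snapshot timestamp, then return the data version it points to. The map-based layout is a stand-in for the key/value pairs the snapshot iterator would actually see on a tserver.

// Sketch only, not the Accismus snapshot iterator.
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SnapshotReadSketch {

  // writes: commit timestamp -> start timestamp it points to
  // data:   start timestamp  -> value written at that timestamp
  static String snapshotRead(TreeMap<Long, Long> writes, Map<Long, String> data, long snapshotTs) {
    Map.Entry<Long, Long> newest = writes.floorEntry(snapshotTs);
    if (newest == null) {
      return null;                 // nothing committed before this snapshot
    }
    return data.get(newest.getValue());
  }

  public static void main(String[] args) {
    // bob:balance after "commit other" (slide 33): write 6 -> 3, write 1 -> 0
    TreeMap<Long, Long> writes = new TreeMap<>();
    writes.put(1L, 0L);
    writes.put(6L, 3L);
    Map<Long, String> data = new HashMap<>();
    data.put(0L, "100");
    data.put(3L, "93");

    System.out.println(snapshotRead(writes, data, 7));   // 93  (sees the transfer)
    System.out.println(snapshotRead(writes, data, 5));   // 100 (snapshot before commit ts 6)
    // After the GC iterator truncates old versions (slide 35), a snapshot this
    // old would instead surface a StaleScanException.
  }
}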