Cassandra Day NY 2014: Apache Cassandra & Python for The New York Times ⨍aбrik Platform
DESCRIPTION
In this session, you’ll learn how Apache Cassandra is used with Python in the NY Times ⨍aбrik messaging platform. Michael starts with an overview of the NYT⨍aбrik global message bus platform and its “memory” features, then discusses the team’s use of the open source Apache Cassandra Python driver by DataStax. He presents a progression of benchmarks that test features and performance, from naive and synchronous to asynchronous with multiple IO loops, all tailored to usage at the NY Times. Code snippets, followed by beer, for those who survive. All code is available on GitHub!
TRANSCRIPT
Cassandra Python Driver: Benchmarking Concurrency for NYT ⨍aбrik
A Global Mesh with a Memory
Message-based: WebSocket, AMQP, SockJS
If in doubt:
• Resend
• Reconnect
• Reread
Idempotent:
• Replicating
• Racy
• Resolving
Classes of service:
• Gold: replicate/race
• Silver: prioritize
• Bronze: queueable
Millions of users
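The "if in doubt, resend" rule is only safe when writes are idempotent. The following is a minimal sketch of that idea, not the platform's implementation: `MessageStore` is a hypothetical in-memory stand-in, keyed by message id, so that writing the same message several times leaves exactly one row.

```python
import uuid


class MessageStore:
    """Hypothetical in-memory stand-in for an idempotent message store."""

    def __init__(self):
        self._rows = {}

    def write(self, message_id, body):
        # Writing the same id again overwrites the row with identical
        # content, so a resend leaves the store unchanged.
        self._rows[message_id] = body

    def __len__(self):
        return len(self._rows)


store = MessageStore()
message_id = uuid.uuid1()

# The sender never saw an ack, so it sends the same message three times.
for _ in range(3):
    store.write(message_id, b"payload")

print(len(store))  # 1
```

Because the key identifies the message, the sender does not need to know whether an earlier attempt succeeded before trying again.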
Message: an event with data
    CREATE TABLE source_data (
        hash_key int,          -- real ones are more complex
        message_id timeuuid,
        body blob,             -- whatever
        metadata text,         -- JSON
        PRIMARY KEY (hash_key, message_id)
    );
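The `message_id` column is a `timeuuid`, which is a version 1 UUID, and Cassandra sorts such values by their embedded timestamp. As a small illustration (the helper name `timeuuid_to_unix` is an assumption made for this sketch, not part of the talk's code), Python's `uuid.uuid1()` produces a compatible value and the creation time can be recovered from it:

```python
import time
import uuid

# Number of 100-ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_unix(u):
    """Return the Unix timestamp (in seconds) embedded in a version 1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET) / 1e7


message_id = uuid.uuid1()
created = timeuuid_to_unix(message_id)

# The embedded time should be within a few seconds of "now".
print(message_id.version, abs(created - time.time()) < 5)
```

With `message_id` as the clustering column, rows inside one `hash_key` partition are therefore stored in time order.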
[Diagram: Push. Labels: 1-10 kB, Ack]
[Diagram: Pull. Labels: 1 kB, 10-150 kB]
Synchronous: C* Thrift or CQL Native
Concurrent: degree = 3 (using the libev event loop)
Asynchronous: CQL Native only
More Concurrency
Can also try:
• DC Aware
• Token Aware
• Subprocessing
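For the first two options, the DataStax Python driver ships `DCAwareRoundRobinPolicy` and `TokenAwarePolicy` in `cassandra.policies`. The sketch below shows one way to combine them; the function name `build_cluster` and its arguments are assumptions for illustration, and newer driver releases may prefer configuring this through execution profiles, so check the current driver documentation.

```python
def build_cluster(contact_points, local_dc):
    """Return a Cluster that routes token-aware within the local data center.

    Requires the DataStax driver (pip install cassandra-driver). It is
    imported inside the function so this module still loads without it.
    """
    from cassandra.cluster import Cluster
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    # TokenAwarePolicy wraps a child policy: replicas that own the
    # partition are tried first, then the child policy's ordering applies.
    policy = TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc=local_dc))
    return Cluster(contact_points=contact_points, load_balancing_policy=policy)
```

A call such as `build_cluster(["127.0.0.1"], "dc1").connect()` would then open a session; the host and data center name here are placeholders.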
Build one
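    # Needs: import os, uuid; from datetime import datetime; from random import randint
    def build_message(self):
        message = {
            "message_id": str(uuid.uuid1()),
            "hash_key": randint(0, self._hash_key_range),  # int(e ** 8)
            "app_id": self._app_id,
            "timestamp": datetime.utcnow().isoformat() + 'Z',
            "content_type": "application/binary",
            "body": os.urandom(randint(1, self._body_range))  # int(e ** 9)
        }
        return message  # not shown on the slide; push_message() uses the result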
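The comments `int(e ** 8)` and `int(e ** 9)` hint at how the two ranges are derived. Assuming `e` is Euler's number (the slide does not say so explicitly), the arithmetic works out as follows:

```python
import math

# Assumption: "e" in the slide comments is Euler's number.
hash_key_range = int(math.e ** 8)  # upper bound for hash_key
body_range = int(math.e ** 9)      # upper bound for the body size in bytes

print(hash_key_range, body_range)  # 2980 8103
```

Under that assumption, bodies are between 1 and 8103 random bytes, which is consistent with the 1-10 kB label on the push diagram.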
Kick-off
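    def push_message(self):
        # next() works on Python 2.6+ and 3; the slide used .next()
        if next(self._submitted_count) < self._message_count:
            message = self.build_message()
            self.submit_query(message)

    def push_initial_data(self):
        self._start_time = time()

        try:
            with self._lock:
                # Prime the pipeline with up to CONCURRENCY in-flight writes.
                for i in range(0, min(CONCURRENCY, self._message_count)):
                    self.push_message()
        # (the except/finally clause is cut off on the slide)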
Put it in the pipeline
    def submit_query(self, message):
        # The body goes into its own blob column; the rest becomes JSON metadata.
        body = message.pop('body')

        substitution_args = (
            json.dumps(message, **JSON_DUMPS_ARGS),
            body,
            message['hash_key'],
            uuid.UUID(message['message_id'])
        )

        # execute_async() returns a future immediately instead of blocking.
        future = self._cql_session.execute_async(
            self._query, substitution_args
        )

        future.add_callback(self.push_or_finish)
        future.add_errback(self.note_error)
Maintain concurrency or finish
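    def push_or_finish(self, _):
        # Called when a write is confirmed: submit the next message, or finish.
        try:
            if (
                self._unfinished
                and next(self._confirmed_count) < self._message_count
            ):
                with self._lock:
                    self.push_message()
            else:
                self.finish()
        # (the except/finally clause is cut off on the slide)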
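The snippets above keep a fixed number of writes in flight: a few requests are started up front, and every confirmation triggers the next submission. The sketch below reproduces that pattern with the standard library only, so it runs without a cluster. It is an approximation under stated assumptions: a `ThreadPoolExecutor` stands in for `execute_async`, `_fake_write` stands in for the Cassandra write, and the class and constant names are invented for the example.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

CONCURRENCY = 3       # writes kept in flight
MESSAGE_COUNT = 20    # total writes to perform


class Pusher:
    def __init__(self, executor):
        self._executor = executor
        # Reentrant lock: a done-callback runs inline in the submitting
        # thread when the future has already finished.
        self._lock = threading.RLock()
        self._submitted = itertools.count()
        self._confirmed = 0
        self.done = threading.Event()

    def _fake_write(self, n):
        return n  # stand-in for a Cassandra write

    def _push_message(self):
        # Submit one more write unless all messages have been submitted.
        n = next(self._submitted)
        if n < MESSAGE_COUNT:
            future = self._executor.submit(self._fake_write, n)
            future.add_done_callback(self._push_or_finish)

    def _push_or_finish(self, _future):
        # Each confirmation either refills the pipeline or ends the run.
        with self._lock:
            self._confirmed += 1
            if self._confirmed < MESSAGE_COUNT:
                self._push_message()
            else:
                self.done.set()

    def run(self):
        with self._lock:
            for _ in range(min(CONCURRENCY, MESSAGE_COUNT)):
                self._push_message()
        self.done.wait()
        return self._confirmed


with ThreadPoolExecutor(max_workers=CONCURRENCY) as executor:
    confirmed = Pusher(executor).run()

print(confirmed)  # 20
```

The number of in-flight writes never exceeds `CONCURRENCY`, because a new write is only submitted when an earlier one is confirmed.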
[Diagram: Push. Labels: 1-10 kB, Ack]
Push some messages
    usage: bm_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-d LOCAL_DC]
                      [--remote-dc-hosts REMOTE_DC_HOSTS] [-p PREFETCH_COUNT]
                      [-w WORKER_COUNT] [-a] [-t]
                      [-n {ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM,
                           EACH_QUORUM, SERIAL, LOCAL_SERIAL, LOCAL_ONE}]
                      [-r] [-j]
                      [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
Push messages from a RabbitMQ queue into a Cassandra table.
Push messages many times
    usage: run_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-i ITERATIONS]
                       [-d LOCAL_DC] [-w [worker_count [worker_count ...]]]
                       [-p [prefetch_count [prefetch_count ...]]]
                       [-n [level [level ...]]] [-a] [-t]
                       [-m MESSAGE_EXPONENT] [-b BODY_EXPONENT]
                       [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
Run multiple test cases based upon the product of worker_counts, prefetch_counts, and consistency_levels. Each test case may be run with up to 4 variations reflecting the use or not of the dc_aware and token_aware policies. The results are output to stdout as a JSON object.
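The test matrix described above can be pictured with `itertools.product`. This is only an illustration of how the case count grows; the input values are made up for the example and the dictionary layout is an assumption, not the script's actual output format.

```python
import itertools

# Illustrative inputs (not taken from the talk).
worker_counts = [1, 2]
prefetch_counts = [10, 100]
consistency_levels = ["ONE", "LOCAL_QUORUM"]

# Up to 4 variations: each of dc_aware and token_aware is off or on.
policy_variations = list(itertools.product([False, True], repeat=2))

test_cases = [
    {
        "worker_count": workers,
        "prefetch_count": prefetch,
        "consistency_level": level,
        "dc_aware": dc_aware,
        "token_aware": token_aware,
    }
    for workers, prefetch, level in itertools.product(
        worker_counts, prefetch_counts, consistency_levels
    )
    for dc_aware, token_aware in policy_variations
]

print(len(test_cases))  # 32
```

Two values per option with all four policy variations already yields 2 × 2 × 2 × 4 = 32 runs per iteration, so the lists are worth keeping short.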
[Diagram: Pull. Labels: 1 kB, 10-150 kB]