Coursera, Cassandra, Java Drivers

Upload: datastax-academy

Posted on 06-Jan-2017


TRANSCRIPT

Page 1: Coursera Cassandra Driver

Coursera, Cassandra, Java Drivers

Biography

Daniel Chia @DanielJHChia

Software Engineer, Infrastructure Team

1 Introduction

2 Why We Chose Cassandra

3 Example Use Cases

4 Pain Points

5 Java Drivers

Coursera

Web, iOS, Android

Why Cassandra

Coursera Tech Stack

• 100% AWS
• MySQL + Cassandra
• Service-oriented

Consistently Fast Latencies

Availability

Scalability

Use Case #1

• Resume video where you left off
• High write volume
• TTL data

CREATE TABLE video_progress_kvs_basic (
    user_id int,
    course_id varchar,
    video_id varchar,
    viewed_up_to bigint,
    updated_at bigint,
    PRIMARY KEY ((user_id, course_id, video_id))
);
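The write path for this use case might look roughly like the sketch below. It assumes a prepared upsert with a `USING TTL ?` bind marker and a 90-day retention period; the retention value and the class name are hypothetical, not taken from the talk. The sketch only builds the CQL and computes the TTL, so it runs without a cluster; with the driver, the string would be passed to `session.prepare(...)`.

```java
import java.time.Duration;

// Hypothetical helper for Use Case #1; builds the CQL only, no driver needed.
public class VideoProgressWrites {
    // INSERT in Cassandra is an upsert: writing a new position for the same
    // (user_id, course_id, video_id) overwrites the previous row.
    // USING TTL ? lets the TTL be bound alongside the other values.
    static final String UPSERT_CQL =
            "INSERT INTO video_progress_kvs_basic "
            + "(user_id, course_id, video_id, viewed_up_to, updated_at) "
            + "VALUES (?, ?, ?, ?, ?) USING TTL ?";

    // Assumed retention period, not taken from the talk.
    static final Duration RETENTION = Duration.ofDays(90);

    // CQL TTLs are whole seconds, bound as an int.
    static int ttlSeconds(Duration retention) {
        return Math.toIntExact(retention.getSeconds());
    }

    public static void main(String[] args) {
        System.out.println(UPSERT_CQL);
        System.out.println("TTL seconds: " + ttlSeconds(RETENTION)); // 7776000
    }
}
```

Binding the TTL per write keeps the retention period in application code, so it can change without rewriting the statement.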

Use Case #2: Media Asset Service

Use Case #3: Video Workflows

[Diagram: Input.mp4 goes through Step 1: Audio, Step 2: Low Res Video and Step 3: High Res Video; Assemblies 1 to 5 end as Crash, Ok, Crash, Ok, Ok.]

CREATE TABLE transloadit_workflow (
    workflow_id text,
    step_id text,
    assembly_id text,
    step_details text,
    step_payload map<text, text>,
    step_status text,
    PRIMARY KEY (workflow_id, step_id, assembly_id)
);
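To illustrate what this primary key provides, below is a toy in-memory model (hypothetical, not Cassandra code): `workflow_id` selects the partition, and rows inside it stay sorted by the clustering columns `step_id`, then `assembly_id`, so a single-partition query returns every step and assembly of a workflow in order. The model orders with Java string comparison, which matches Cassandra's `text` ordering for ASCII ids; the ids in `main` are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of transloadit_workflow (hypothetical, not Cassandra code).
public class WorkflowPartitionModel {
    // workflow_id -> step_id -> assembly_id -> step_status.
    // Partitions are unordered (HashMap); rows inside a partition are sorted
    // by the clustering columns (TreeMaps), like (step_id, assembly_id).
    private final Map<String, TreeMap<String, TreeMap<String, String>>> partitions =
            new HashMap<>();

    // Upsert: writing the same full primary key again replaces step_status.
    void upsert(String workflowId, String stepId, String assemblyId, String status) {
        partitions
                .computeIfAbsent(workflowId, k -> new TreeMap<>())
                .computeIfAbsent(stepId, k -> new TreeMap<>())
                .put(assemblyId, status);
    }

    // Models: SELECT step_id, assembly_id, step_status
    //         FROM transloadit_workflow WHERE workflow_id = ?
    List<String> readWorkflow(String workflowId) {
        List<String> rows = new ArrayList<>();
        TreeMap<String, TreeMap<String, String>> steps = partitions.get(workflowId);
        if (steps == null) {
            return rows;
        }
        steps.forEach((step, assemblies) ->
                assemblies.forEach((assembly, status) ->
                        rows.add(step + "/" + assembly + "=" + status)));
        return rows;
    }

    public static void main(String[] args) {
        WorkflowPartitionModel table = new WorkflowPartitionModel();
        table.upsert("wf1", "step2", "asm2", "ok");
        table.upsert("wf1", "step1", "asm1", "crash");
        table.upsert("wf1", "step1", "asm2", "ok");
        // Prints step1/asm1=crash, step1/asm2=ok, step2/asm2=ok (one per line).
        table.readWorkflow("wf1").forEach(System.out::println);
    }
}
```

Writing the same (workflow_id, step_id, assembly_id) again overwrites step_status, so recording a retried assembly's new outcome needs no read beforehand.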

Looking Back

Cassandra - Initial Pain Points

• Can’t execute arbitrary queries
  • Filtering, sorting, etc.

• Can’t be abused as an OLAP database

• Worries about ‘eventual’ consistency

Gotchas

• Running lots of truly ad-hoc queries is hard
  • Don’t use C* directly to explore your data (Spark?)
• Sorting and filtering can be hard
  • Consider Solr / Elasticsearch
  • Or even MySQL, depending on load / importance

Helpful Things

• Data modeling consulting

• Monitoring

• Data access layer for common use cases

Java Drivers

Best Practices

• Driver Choice
• Cluster / Connection Setup
• Executing Queries

DataStax Java Drivers

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class Scratch {
    static Cluster cluster;

    public static void main(String[] args) {
        cluster = Cluster.builder()
                .addContactPoint("cassandra")
                .build();

        readRow("asset:QoMqLLyCEeSOi3paAormVw");

        cluster.close();
    }

    static void readRow(String id) {
        Session session = cluster.connect("asset");
        ResultSet result = session.execute(
                "SELECT * from asset_kvs_timestamp where part_key = ?", id);
        System.out.println(result.one());
        session.close();
    }
}

cluster = Cluster.builder()
        .addContactPoint("cassandra")
        .build();

// Token-aware routing on top of a data-center-aware round robin
LoadBalancingPolicy policy = new TokenAwarePolicy(
        new DCAwareRoundRobinPolicy());

cluster = Cluster.builder()
        .addContactPoint("cassandra")
        .withLoadBalancingPolicy(policy)
        .build();
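The query-plan ordering this combination aims for can be modelled in plain Java. The sketch below is a simplified model, not the driver's implementation: it ignores host state, remote data centers and the replica shuffling some driver versions apply, and the host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model (not driver code) of TokenAwarePolicy wrapping
// DCAwareRoundRobinPolicy. Ignores host state, remote DCs and shuffling.
public class QueryPlanModel {
    // Child policy: round-robin over local-DC hosts, starting at a
    // position that advances with each query.
    static List<String> roundRobin(List<String> localHosts, int queryCounter) {
        List<String> plan = new ArrayList<>();
        int n = localHosts.size();
        for (int i = 0; i < n; i++) {
            plan.add(localHosts.get(Math.floorMod(queryCounter + i, n)));
        }
        return plan;
    }

    // Token-aware wrapper: replicas of the partition key that live in the
    // local DC go first, then the child's plan without duplicates.
    static List<String> tokenAwarePlan(List<String> localHosts,
                                       List<String> replicas,
                                       int queryCounter) {
        Set<String> plan = new LinkedHashSet<>();
        for (String replica : replicas) {
            if (localHosts.contains(replica)) {
                plan.add(replica);
            }
        }
        plan.addAll(roundRobin(localHosts, queryCounter));
        return new ArrayList<>(plan);
    }

    public static void main(String[] args) {
        List<String> local = Arrays.asList("h1", "h2", "h3", "h4");
        // Hypothetical replica set for one key: two local hosts, one remote.
        List<String> replicas = Arrays.asList("h3", "remote1", "h1");
        System.out.println(tokenAwarePlan(local, replicas, 1)); // [h3, h1, h2, h4]
    }
}
```

Trying a replica first lets the coordinator answer from its own data instead of forwarding the request, which is the main point of wrapping the round-robin policy.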

cluster = Cluster.builder()
        .addContactPoint("cassandra")
        .withLoadBalancingPolicy(policy)
        .withRetryPolicy(retryPolicy) // e.g. DefaultRetryPolicy.INSTANCE
        .build();

Default Retry Policy

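• Retries a read once if enough replicas responded but the replica asked for the data did not.
• Retries a write once, and only when the batch log write timed out (WriteType.BATCH_LOG).
• Retries once on the next host on Unavailable (driver 2.0.11+ or 2.1.7+, JAVA-709).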
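The three rules above can be written down as a small decision model. This is a plain-Java sketch of the behaviour described on the slide, allowing at most one retry per request; the enum and method names are hypothetical and are not the driver's `RetryPolicy` interface.

```java
// Plain-Java model (hypothetical names, not the driver's RetryPolicy API)
// of the default retry rules listed above. Each rule allows at most one retry.
public class DefaultRetryModel {
    enum Decision { RETRY_SAME_HOST, TRY_NEXT_HOST, RETHROW }

    // Subset of the write types a coordinator can report on a write timeout.
    enum WriteType { SIMPLE, BATCH, UNLOGGED_BATCH, COUNTER, BATCH_LOG }

    // Read timeout: retry once if enough replicas answered but the
    // replica asked for the actual data did not reply in time.
    static Decision onReadTimeout(int required, int received,
                                  boolean dataRetrieved, int retriesSoFar) {
        if (retriesSoFar > 0) {
            return Decision.RETHROW;
        }
        return (received >= required && !dataRetrieved)
                ? Decision.RETRY_SAME_HOST
                : Decision.RETHROW;
    }

    // Write timeout: retry once only when writing to the batch log timed out.
    static Decision onWriteTimeout(WriteType type, int retriesSoFar) {
        if (retriesSoFar > 0) {
            return Decision.RETHROW;
        }
        return type == WriteType.BATCH_LOG ? Decision.RETRY_SAME_HOST : Decision.RETHROW;
    }

    // Unavailable: too few replicas are alive according to this coordinator,
    // so try once more on the next host in the query plan.
    static Decision onUnavailable(int retriesSoFar) {
        return retriesSoFar == 0 ? Decision.TRY_NEXT_HOST : Decision.RETHROW;
    }

    public static void main(String[] args) {
        System.out.println(onReadTimeout(2, 2, false, 0));       // RETRY_SAME_HOST
        System.out.println(onWriteTimeout(WriteType.SIMPLE, 0)); // RETHROW
        System.out.println(onUnavailable(0));                    // TRY_NEXT_HOST
    }
}
```

Ordinary writes are not retried because a timed-out write may already have been applied; retrying is only safe where the operation is known to be idempotent.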

Share Session!

public static void main(String[] args) {
    cluster = Cluster.builder()
            .addContactPoint("cassandra")
            .build();

    readRow("asset:QoMqLLyCEeSOi3paAormVw");
    readRow("asset:7i2ClbKnEeSk_npaAormVw");
    readRow("asset:KS1vywpGEeWKtzoMw4q1xg");

    cluster.close();
}

static void readRow(String id) {
    // Anti-pattern: opens (and tears down) a new Session on every call.
    Session session = cluster.connect("asset");
    ResultSet result = session.execute(
            "SELECT * from asset_kvs_timestamp where part_key = ?", id);
    System.out.println(result.one());
    session.close();
}

public static void main(String[] args) {
    cluster = Cluster.builder()
            .addContactPoint("cassandra")
            .build();

    // One Session for the whole application; it is thread-safe and pools connections.
    session = cluster.connect();

    readRow("asset:QoMqLLyCEeSOi3paAormVw");
    readRow("asset:7i2ClbKnEeSk_npaAormVw");
    readRow("asset:KS1vywpGEeWKtzoMw4q1xg");
    session.close();
    cluster.close();
}

static void readRow(String id) {
    // connect() was called without a keyspace, so the table name is qualified.
    ResultSet result = session.execute(
            "SELECT * from asset.asset_kvs_timestamp where part_key = ?", id);
    System.out.println(result.one());
}

Use prepared statements

• If running a query more than once
• Better performance: the query is parsed once, then only bound values are sent
• Token-aware routing

static PreparedStatement statement;

public static void main(String[] args) {
    …

    session = cluster.connect();
    // Prepare once at startup; the same statement is reused for every read.
    statement = session.prepare(
            "SELECT * from asset.asset_kvs_timestamp where part_key = ?");

    readRow("asset:QoMqLLyCEeSOi3paAormVw");

    …
}

static void readRow(String id) {
    BoundStatement bound = statement.bind().setString("part_key", id);
    ResultSet result = session.execute(bound);
    System.out.println(result.one());
}
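Since preparing only pays off when the statement is reused, one common arrangement is to prepare each query string once and cache the result. Below is a hypothetical helper (not part of the driver) that does this with `ConcurrentHashMap.computeIfAbsent`; with the DataStax driver the preparer could be `session::prepare`, but a plain `Function` keeps the sketch runnable on its own.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper (not part of the driver): prepares each distinct
// query string once and hands back the cached result afterwards.
// With the DataStax driver, S would be PreparedStatement and the
// preparer could be session::prepare.
public class PreparedCache<S> {
    private final ConcurrentMap<String, S> cache = new ConcurrentHashMap<>();
    private final Function<String, S> preparer;

    public PreparedCache(Function<String, S> preparer) {
        this.preparer = preparer;
    }

    public S get(String query) {
        // ConcurrentHashMap applies the mapping function at most once per
        // absent key, even with concurrent callers.
        return cache.computeIfAbsent(query, preparer);
    }

    public static void main(String[] args) {
        PreparedCache<String> statements =
                new PreparedCache<>(q -> "prepared[" + q + "]");
        String q = "SELECT part_key, time_key, content"
                + " FROM asset.asset_kvs_timestamp WHERE part_key = ?";
        // Both calls return the same cached object.
        System.out.println(statements.get(q) == statements.get(q)); // true
    }
}
```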

There Be Dragons: JAVA-420

statement = session.prepare(
        "SELECT part_key, time_key, content from asset.asset_kvs_timestamp where part_key = ?");

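Always specify columns explicitly for prepared statements! The result metadata is fixed when a statement is prepared, so a prepared SELECT * can break after columns are added to or removed from the table.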
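One way to keep to this rule is to build prepared queries from an explicit column list. The helper below is a hypothetical sketch, not a driver API; the table and column names are the ones from the slide.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds a single-partition SELECT with its columns
// named explicitly, so the statement never relies on SELECT *.
public class ExplicitSelect {
    static String select(String table, List<String> columns, String keyColumn) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        return "SELECT " + String.join(", ", columns)
                + " FROM " + table
                + " WHERE " + keyColumn + " = ?";
    }

    public static void main(String[] args) {
        String cql = select("asset.asset_kvs_timestamp",
                Arrays.asList("part_key", "time_key", "content"), "part_key");
        // SELECT part_key, time_key, content FROM asset.asset_kvs_timestamp WHERE part_key = ?
        System.out.println(cql);
    }
}
```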

Consider Async

// Synchronous version: each execute() blocks, so the reads run one after another.
static List<String> readRows(List<String> ids) {
    return ids.stream()
            .map(id -> {
                BoundStatement bound = statement.bind().setString("part_key", id);
                ResultSet result = session.execute(bound);
                return result.one().getString("c_enc");
            })
            .collect(Collectors.toList());
}

Async

static ListenableFuture<List<String>> readRowsAsync(List<String> ids) {
    List<ListenableFuture<String>> futures = ids.stream()
            .map(id -> {
                BoundStatement bound = statement.bind().setString("part_key", id);
                // executeAsync returns immediately; the query runs in the background.
                ResultSetFuture future = session.executeAsync(bound);
                return Futures.transform(future,
                        (ResultSet result) -> result.one().getString("c_enc"));
            })
            .collect(Collectors.toList());
    // Completes with all values in input order, or fails if any query fails.
    return Futures.allAsList(futures);
}

http://www.datastax.com/dev/blog/java-driver-async-queries
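For comparison, the same fan-out can be expressed with the JDK's `CompletableFuture` instead of Guava. This is a stdlib analog, not the driver API: `lookup` is a hypothetical stand-in for an async query plus transform, and `allOf` followed by `join` plays the role of `Futures.allAsList`, keeping results in input order and failing if any lookup fails.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// JDK-only analog (not the driver API) of the Guava fan-out above.
public class AsyncFanOut {
    static CompletableFuture<List<String>> readAllAsync(
            List<String> ids, Function<String, CompletableFuture<String>> lookup) {
        // Start every lookup first; collect() forces the stream eagerly.
        List<CompletableFuture<String>> futures = ids.stream()
                .map(lookup)
                .collect(Collectors.toList());
        // allOf completes when all lookups have completed; join() is then
        // non-blocking and the list keeps the order of the input ids.
        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // Hypothetical lookup standing in for an async Cassandra read.
        List<String> values = readAllAsync(
                Arrays.asList("id1", "id2", "id3"),
                id -> CompletableFuture.supplyAsync(() -> "content-of-" + id))
                .join();
        System.out.println(values); // [content-of-id1, content-of-id2, content-of-id3]
    }
}
```

One difference: Guava's `allAsList` fails as soon as any input fails, whereas `CompletableFuture.allOf` waits for every lookup to finish before the combined future reports the failure.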

Thank you

Cassandra Summit 2016 September 7-9 San Jose, CA

Get 15% Off with Code: MeetupPromo Cassandrasummit.org