coursera cassandra driver

Posted on 06-Jan-2017

TRANSCRIPT

  • Coursera, Cassandra, Java Drivers

  • Biography

    Daniel Chia @DanielJHChia

    Software Engineer, Infrastructure Team

  • 1. Introduction
    2. Why We Chose Cassandra
    3. Example Use Cases
    4. Pain Points
    5. Java Drivers

  • Coursera

    Web, iOS, Android

  • Why Cassandra

  • Coursera Tech Stack

    100% AWS
    MySQL + Cassandra
    Service-oriented

  • Consistently Fast Latencies

  • Availability

  • Scalability

  • Use Case #1

    Resume video where you left off
    High write volume
    TTL data

    CREATE TABLE video_progress_kvs_basic (
        user_id int,
        course_id varchar,
        video_id varchar,
        viewed_up_to bigint,
        updated_at bigint,
        PRIMARY KEY ((user_id, course_id, video_id))
    );
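Because all three columns form the partition key, saving progress is a single-partition upsert and resuming is a single-partition read. The plain-Java sketch below models that access pattern in memory so it runs without a cluster; it is not how Cassandra stores data. The class name, the 30-day TTL, and expiring a row relative to its write time are assumptions for illustration (in CQL the TTL would be set per write, e.g. `INSERT ... USING TTL 2592000`).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for video_progress_kvs_basic (not Cassandra).
public class VideoProgressModel {
    // Assumed TTL of 30 days in milliseconds; the slide gives no value.
    static final long TTL_MILLIS = 30L * 24 * 60 * 60 * 1000;

    // One "row": the resume position plus the time it was written.
    static final class Progress {
        final long viewedUpTo;
        final long updatedAt;

        Progress(long viewedUpTo, long updatedAt) {
            this.viewedUpTo = viewedUpTo;
            this.updatedAt = updatedAt;
        }
    }

    // Composite partition key (user_id, course_id, video_id) used as a list key.
    private final Map<List<Object>, Progress> rows = new HashMap<>();

    // Like a CQL INSERT, this is an upsert: a later write replaces the earlier one.
    void save(int userId, String courseId, String videoId, long viewedUpTo, long now) {
        rows.put(List.of(userId, courseId, videoId), new Progress(viewedUpTo, now));
    }

    // Single-partition read; a row older than the TTL behaves as if deleted.
    Optional<Long> resumeAt(int userId, String courseId, String videoId, long now) {
        Progress p = rows.get(List.of(userId, courseId, videoId));
        if (p == null || now - p.updatedAt >= TTL_MILLIS) {
            return Optional.empty();
        }
        return Optional.of(p.viewedUpTo);
    }

    public static void main(String[] args) {
        VideoProgressModel store = new VideoProgressModel();
        store.save(42, "ml", "lecture-1", 120_000L, 0L);
        store.save(42, "ml", "lecture-1", 305_000L, 1_000L); // overwrites the first write
        System.out.println(store.resumeAt(42, "ml", "lecture-1", 2_000L)); // Optional[305000]
        System.out.println(store.resumeAt(42, "ml", "lecture-1", TTL_MILLIS + 2_000L)); // Optional.empty
    }
}
```

Passing `now` explicitly keeps the sketch deterministic; a real client would let the server stamp write times.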

  • Use Case #2: Media Asset Service

  • Use Case #3: Video Workflows

    Input.mp4

    Step 1: Audio

    Step 2: Low Res Video

    Step 3: High Res Video

    Assembly 1: Crash

    Assembly 2: Ok

    Assembly 3: Crash

    Assembly 4: Ok

    Assembly 5: Ok

  • CREATE TABLE transloadit_workflow (
        workflow_id text,
        step_id text,
        assembly_id text,
        step_details text,
        step_payload map,
        step_status text,
        PRIMARY KEY (workflow_id, step_id, assembly_id)
    );
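In this key, `workflow_id` is the partition key and `step_id`, `assembly_id` are clustering columns, so all rows for one workflow live in one partition, sorted by step and then by assembly. Fetching every assembly of one step is therefore a contiguous slice of a single partition. The sketch below models that layout with nested `TreeMap`s; it is a hypothetical in-memory illustration (the class name is invented and only `step_status` is kept), not driver or server code.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of transloadit_workflow's key layout:
// partition key workflow_id, then clustering columns step_id and assembly_id.
public class WorkflowPartitions {
    // workflow_id -> step_id -> assembly_id -> step_status (inner levels sorted)
    private final Map<String, NavigableMap<String, NavigableMap<String, String>>> partitions =
            new TreeMap<>();

    // Like a CQL INSERT: upsert one clustering row inside one partition
    public void upsert(String workflowId, String stepId, String assemblyId, String status) {
        partitions.computeIfAbsent(workflowId, w -> new TreeMap<>())
                .computeIfAbsent(stepId, s -> new TreeMap<>())
                .put(assemblyId, status);
    }

    // Like: SELECT assembly_id, step_status ... WHERE workflow_id = ? AND step_id = ?
    public NavigableMap<String, String> assembliesForStep(String workflowId, String stepId) {
        NavigableMap<String, NavigableMap<String, String>> partition = partitions.get(workflowId);
        NavigableMap<String, String> slice = partition == null ? null : partition.get(stepId);
        return slice == null
                ? Collections.emptyNavigableMap()
                : Collections.unmodifiableNavigableMap(slice);
    }

    public static void main(String[] args) {
        WorkflowPartitions table = new WorkflowPartitions();
        table.upsert("wf-1", "step-1", "asm-2", "ok");
        table.upsert("wf-1", "step-1", "asm-1", "crash");
        table.upsert("wf-1", "step-2", "asm-3", "ok");
        // Rows come back in clustering order regardless of insert order
        System.out.println(table.assembliesForStep("wf-1", "step-1")); // {asm-1=crash, asm-2=ok}
    }
}
```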

  • Looking Back

  • Cassandra - Initial Pain Points

    Can't execute arbitrary queries (filtering, sorting, etc.)

    Can't be abused as an OLAP database

    Worries about eventual consistency

  • Gotchas

    Lots of truly ad-hoc queries are hard: don't use C* directly to explore your data (Spark?)

    Sorting and filtering can be hard: consider Solr / Elasticsearch, or even MySQL, depending on load / importance

  • Helpful Things

    Data modeling consulting

    Monitoring

    Data access layer for common use cases

  • Java Drivers

  • Best Practices

    Driver Choice
    Cluster / Connection Setup
    Executing Queries

  • DataStax Java Drivers

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;

    public class Scratch {
        static Cluster cluster;

        public static void main(String[] args) {
            cluster = Cluster.builder()
                    .addContactPoint("cassandra")
                    .build();

            readRow("asset:QoMqLLyCEeSOi3paAormVw");

            cluster.close();
        }

        static void readRow(String id) {
            Session session = cluster.connect("asset");
            ResultSet result = session.execute(
                    "SELECT * from asset_kvs_timestamp where part_key = ?", id);
            System.out.println(result.one());
            session.close();
        }
    }

    cluster = Cluster.builder()
            .addContactPoint("cassandra")
            .build();

    LoadBalancingPolicy policy = new TokenAwarePolicy(
            new DCAwareRoundRobinPolicy());

    cluster = Cluster.builder()
            .addContactPoint("cassandra")
            .withLoadBalancingPolicy(policy)
            .build();
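`TokenAwarePolicy` tries to send each request straight to a replica that owns the statement's partition, and delegates to the wrapped policy (here `DCAwareRoundRobinPolicy`) for everything else. The core idea can be sketched with a sorted token ring: hash the partition key to a token, then walk clockwise to the first node whose token is greater than or equal to it. This is a simplified model, not the driver's implementation: `String.hashCode()` stands in for Cassandra's Murmur3 partitioner, the node names and tokens are made up, and replica placement mimics `SimpleStrategy` only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of token-aware replica selection (not the driver's code).
public class TokenRingSketch {
    // token -> node; a node owns the range (previous token, its token]
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public TokenRingSketch(Map<Long, String> tokens) {
        ring.putAll(tokens);
    }

    // Stand-in hash: Cassandra's default partitioner uses Murmur3, not hashCode()
    static long token(String partitionKey) {
        return partitionKey.hashCode();
    }

    // First rf distinct nodes clockwise from the key's token (SimpleStrategy-like)
    public List<String> replicas(String partitionKey, int rf) {
        Long start = ring.ceilingKey(token(partitionKey));
        if (start == null) {
            start = ring.firstKey(); // past the highest token: wrap around the ring
        }
        List<String> walk = new ArrayList<>(ring.tailMap(start, true).values());
        walk.addAll(ring.headMap(start, false).values());
        List<String> out = new ArrayList<>();
        for (String node : walk) {
            if (out.size() == rf) {
                break;
            }
            if (!out.contains(node)) {
                out.add(node);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch(
                Map.of(-100L, "node-A", 0L, "node-B", 100L, "node-C"));
        // "a".hashCode() == 97, so its token falls in node-C's range (0, 100]
        System.out.println(ring.replicas("a", 2)); // [node-C, node-A]
    }
}
```

A token-aware client sends the request to the first replica in that list (subject to the child policy's locality rules) instead of to an arbitrary coordinator, saving one network hop.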

    cluster = Cluster.builder()
            .addContactPoint("cassandra")
            .withLoadBalancingPolicy(policy)
            .withRetryPolicy(retryPolicy)
            .build();

  • Default Retry Policy

    Retries a read if enough replicas are alive but the data fetch failed.
    Retries a write only for batched writes.
    Retries the next host on Unavailable (2.0.11+ or 2.1.7, JAVA-709).
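The three default-policy rules can be restated as small decision functions. The sketch below is a hypothetical, simplified restatement for illustration: the enums, method names and parameters are invented here and are not the driver's `RetryPolicy` interface. It assumes each rule retries at most once (`nbRetry == 0`) and rethrows afterwards, and it models "batched writes" as a timeout while writing the batch log, which is also an assumption.

```java
// Hypothetical restatement of the default retry rules listed on the slide.
// Not the driver's RetryPolicy API: all names here are invented for illustration.
public class DefaultRetrySketch {
    enum Decision { RETRY_SAME_HOST, TRY_NEXT_HOST, RETHROW }

    // Subset of write kinds, for illustration only
    enum WriteType { SIMPLE, UNLOGGED_BATCH, BATCH, BATCH_LOG, COUNTER }

    // Read timeout: retry once only if enough replicas answered but the data
    // itself was not retrieved.
    static Decision onReadTimeout(int required, int received, boolean dataRetrieved, int nbRetry) {
        if (nbRetry > 0) {
            return Decision.RETHROW;
        }
        return (received >= required && !dataRetrieved) ? Decision.RETRY_SAME_HOST : Decision.RETHROW;
    }

    // Write timeout: retry once only when the timeout hit the batch log write
    // (assumed to be what "batched writes" refers to).
    static Decision onWriteTimeout(WriteType type, int nbRetry) {
        if (nbRetry > 0) {
            return Decision.RETHROW;
        }
        return type == WriteType.BATCH_LOG ? Decision.RETRY_SAME_HOST : Decision.RETHROW;
    }

    // Unavailable: move on to the next host once (the JAVA-709 behavior per the slide).
    static Decision onUnavailable(int nbRetry) {
        return nbRetry == 0 ? Decision.TRY_NEXT_HOST : Decision.RETHROW;
    }

    public static void main(String[] args) {
        System.out.println(onReadTimeout(2, 2, false, 0));       // RETRY_SAME_HOST
        System.out.println(onWriteTimeout(WriteType.SIMPLE, 0)); // RETHROW
        System.out.println(onUnavailable(0));                    // TRY_NEXT_HOST
    }
}
```

The conservative write rule reflects that a timed-out ordinary write may or may not have been applied, so blindly retrying it is not always safe.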

  • Share Session!

    public static void main(String[] args) {
        cluster = Cluster.builder()
                .addContactPoint("cassandra")
                .build();

        readRow("asset:QoMqLLyCEeSOi3paAormVw");
        readRow("asset:7i2ClbKnEeSk_npaAormVw");
        readRow("asset:KS1vywpGEeWKtzoMw4q1xg");

        cluster.close();
    }

    // Anti-pattern: opens (and closes) a new Session for every read
    static void readRow(String id) {
        Session session = cluster.connect("asset");
        ResultSet result = session.execute(
                "SELECT * from asset_kvs_timestamp where part_key = ?", id);
        System.out.println(result.one());
        session.close();
    }

    static Session session;

    public static void main(String[] args) {
        cluster = Cluster.builder()
                .addContactPoint("cassandra")
                .build();
        // One Session, created once and shared by every read
        session = cluster.connect();

        readRow("asset:QoMqLLyCEeSOi3paAormVw");
        readRow("asset:7i2ClbKnEeSk_npaAormVw");
        readRow("asset:KS1vywpGEeWKtzoMw4q1xg");

        session.close();
        cluster.close();
    }

    static void readRow(String id) {
        // Keyspace-qualified table, since the shared session has no default keyspace
        ResultSet result = session.execute(
                "SELECT * from asset.asset_kvs_timestamp where part_key = ?", id);
        System.out.println(result.one());
    }

  • Use prepared statements

    If a query runs more than once: better performance, token-aware routing

    static PreparedStatement statement;

    public static void main(String[] args) {
        session = cluster.connect();
        // Prepared once; bound and executed many times
        statement = session.prepare(
                "SELECT * from asset.asset_kvs_timestamp where part_key = ?");

        readRow("asset:QoMqLLyCEeSOi3paAormVw");
    }

    static void readRow(String id) {
        BoundStatement bound = statement.bind().setString("part_key", id);
        ResultSet result = session.execute(bound);
        System.out.println(result.one());
    }
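Since the benefit comes from preparing once and binding many times, a common pattern is to prepare each distinct query string only once and reuse the result. The sketch below shows that with `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key even under concurrent calls. It is generic so it runs without a cluster: the class name is invented, and the `prepare` function is a hypothetical stand-in for something like `session::prepare`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical prepare-once cache keyed by query text; P stands in for the
// driver's PreparedStatement so the sketch runs without a cluster.
public class PreparedCache<P> {
    private final ConcurrentMap<String, P> cache = new ConcurrentHashMap<>();
    private final Function<String, P> prepare;

    public PreparedCache(Function<String, P> prepare) {
        this.prepare = prepare;
    }

    // computeIfAbsent applies `prepare` at most once per distinct query string
    public P get(String query) {
        return cache.computeIfAbsent(query, prepare);
    }

    public static void main(String[] args) {
        AtomicInteger prepareCalls = new AtomicInteger();
        PreparedCache<String> cache = new PreparedCache<>(q -> {
            prepareCalls.incrementAndGet(); // in real use: a round trip to the server
            return "prepared[" + q + "]";
        });
        String q = "SELECT part_key, time_key, content from asset.asset_kvs_timestamp where part_key = ?";
        for (int i = 0; i < 3; i++) {
            cache.get(q);
        }
        System.out.println(prepareCalls.get()); // 1
    }
}
```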

  • There Be Dragons: JAVA-420

    statement = session.prepare(
            "SELECT part_key, time_key, content from asset.asset_kvs_timestamp where part_key = ?");

    Always specify columns explicitly for prepared statements!
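One way a prepared `SELECT *` can go wrong: if the client labels result values by position using the column list it cached at prepare time, and the table's columns change afterwards, values can end up under the wrong column names. The sketch below is a simplified, hypothetical model of that failure mode; it is not the driver's decoding code, not a description of the exact JAVA-420 bug, and the column names are invented. With an explicit column list, adding a column to the table does not change the shape of the result.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: rows arrive as positional values and are labeled with the
// column names the client cached when the statement was prepared.
public class StaleColumnsSketch {
    static Map<String, Object> decode(List<String> cachedColumns, List<Object> values) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < cachedColumns.size() && i < values.size(); i++) {
            row.put(cachedColumns.get(i), values.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        // Cached at prepare time for "SELECT *" on a table (id, body)
        List<String> cached = List.of("id", "body");
        // Hypothetical: a column "author" is added later and "*" now yields (id, author, body)
        List<Object> row = List.of("asset:1", "alice", "hello");
        System.out.println(decode(cached, row)); // {id=asset:1, body=alice}  (wrong value under "body")
    }
}
```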

  • Consider Async

    static List<String> readRows(List<String> ids) {
        // Synchronous: each execute() blocks until its row arrives
        return ids.stream().map(id -> {
            BoundStatement bound = statement.bind().setString("part_key", id);
            ResultSet result = session.execute(bound);
            return result.one().getString("c_enc");
        }).collect(Collectors.toList());
    }

  • Async

    static ListenableFuture<List<String>> readRowsAsync(List<String> ids) {
        // Fire every query first; executeAsync() does not block
        List<ListenableFuture<String>> futures = ids.stream().map(id -> {
            BoundStatement bound = statement.bind().setString("part_key", id);
            ResultSetFuture future = session.executeAsync(bound);
            return Futures.transform(future,
                    (ResultSet result) -> result.one().getString("c_enc"));
        }).collect(Collectors.toList());
        // Completes once every query has completed
        return Futures.allAsList(futures);
    }

    http://www.datastax.com/dev/blog/java-driver-async-queries
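The same fan-out can be written with only the JDK, using `CompletableFuture` in place of Guava's `ListenableFuture`: start every lookup first, then combine them with `CompletableFuture.allOf`. This is a hedged sketch, not the driver's API: the class name is invented, `fetch` is a hypothetical stand-in for `executeAsync` plus the per-row transform, and the thread pool only simulates I/O. One difference worth knowing: `allOf` completes only after every input has finished, whereas Guava's `Futures.allAsList` fails as soon as any input fails.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// JDK-only analog of the executeAsync + Futures.allAsList pattern above.
public class AsyncFanOut {
    static CompletableFuture<List<String>> readRowsAsync(
            List<String> ids, Function<String, CompletableFuture<String>> fetch) {
        // Start every request before waiting on any of them
        List<CompletableFuture<String>> futures =
                ids.stream().map(fetch).collect(Collectors.toList());
        // Completes when all inputs complete (exceptionally if any failed);
        // results keep the order of the input ids
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(v -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<String> rows = readRowsAsync(List.of("a", "b", "c"),
                    id -> CompletableFuture.supplyAsync(() -> "row:" + id, pool)).join();
            System.out.println(rows); // [row:a, row:b, row:c]
        } finally {
            pool.shutdown();
        }
    }
}
```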

  • Thank you

  • Cassandra Summit 2016 September 7-9 San Jose, CA

    Get 15% Off with Code: MeetupPromo Cassandrasummit.org
