spark and cassandra: 2 fast, 2 furious
TRANSCRIPT
![Page 1: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/1.jpg)
Spark and Cassandra: 2 Fast, 2 Furious
Russell Spitzer DataStax Inc.
![Page 2: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/2.jpg)
Russell, ostensibly a software engineer
• Did a Ph.D in bioinformatics at some point
• Written a great deal of automation and testing framework code
• Now develops for Datastax on the Analytics Team
• Focuses a lot on the Datastax OSS Spark Cassandra Connector
![Page 3: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/3.jpg)
Datastax Spark Cassandra ConnectorLet Spark Interact with your Cassandra Data!
https://github.com/datastax/spark-cassandra-connectorhttp://spark-packages.org/package/datastax/spark-cassandra-connector
http://spark-packages.org/package/TargetHolding/pyspark-cassandra
Compatible with Spark 1.6 + Cassandra 3.0
• Bulk writing to Cassandra • Distributed full table scans • Optimized direct joins with Cassandra • Secondary index pushdown • Connection and prepared statement pools
![Page 4: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/4.jpg)
Cassandra is essentially a hybrid between a key-value and a column-oriented (or tabular) database management system. Its data model is a partitioned row
store with tunable consistency*
*https://en.wikipedia.org/wiki/Apache_Cassandra
Let's break that down
1.What is a C* Partition and Row
2.How does C* Place Partitions
![Page 5: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/5.jpg)
CQL looks a lot like SQLCREATE TABLE tracker ( vehicle_id uuid, ts timestamp, x double, y double, PRIMARY KEY (vehicle_id, ts))
![Page 6: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/6.jpg)
INSERTS look almost IdenticalCREATE TABLE tracker ( vehicle_id uuid, ts timestamp, x double, y double, PRIMARY KEY (vehicle_id, ts))
INSERT INTO tracker (vehicle_id, ts, x, y) Values ( 1, 0, 0, 1)
![Page 7: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/7.jpg)
Cassandra Data is stored in PartitionsCREATE TABLE tracker ( vehicle_id uuid, ts timestamp, x double, y double, PRIMARY KEY (vehicle_id, ts))
vehicle_id 1
ts: 0
x: 0 y: 1
INSERT INTO tracker (vehicle_id, ts, x, y) Values ( 1, 0, 0, 1)
![Page 8: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/8.jpg)
C* Partitions Store Many RowsCREATE TABLE tracker ( vehicle_id uuid, ts timestamp, x double, y double, PRIMARY KEY (vehicle_id, ts))
vehicle_id 1
ts: 0
x: 0 y: 1
ts: 1
x: -1 y:0
ts: 2
x: 0 y: -1
ts: 3
x: 1 y: 0
![Page 9: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/9.jpg)
C* Partitions Store Many RowsCREATE TABLE tracker ( vehicle_id uuid, ts timestamp, x double, y double, PRIMARY KEY (vehicle_id, ts))
vehicle_id 1
ts: 0
x: 0 y: 1
ts: 1
x: -1 y:0
ts: 2
x: 0 y: -1
ts: 3
x: 1 y: 0Partition
![Page 10: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/10.jpg)
C* Partitions Store Many RowsCREATE TABLE tracker ( vehicle_id uuid, ts timestamp, x double, y double, PRIMARY KEY (vehicle_id, ts))
vehicle_id 1
ts: 0
x: 0 y: 1
ts: 1
x: -1 y:0
ts: 2
x: 0 y: -1
ts: 3
x: 1 y: 0Row
![Page 11: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/11.jpg)
Within a partition there is ordering based on the Clustering Keys
CREATE TABLE tracker ( vehicle_id uuid, ts timestamp, x double, y double, PRIMARY KEY (vehicle_id, ts))
vehicle_id 1
ts: 0
x: 0 y: 1
ts: 1
x: -1 y:0
ts: 2
x: 0 y: -1
ts: 3
x: 1 y: 0
Ordered by Clustering Key
![Page 12: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/12.jpg)
Slices within a Partition are Very EasyCREATE TABLE tracker ( vehicle_id uuid, ts timestamp, x double, y double, PRIMARY KEY (vehicle_id, ts))
vehicle_id 1
ts: 0
x: 0 y: 1
ts: 1
x: -1 y:0
ts: 2
x: 0 y: -1
ts: 3
x: 1 y: 0
Ordered by Clustering Key
![Page 13: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/13.jpg)
Cassandra is a Distributed Fault-Tolerant Database
San Jose
Oakland
San Francisco
![Page 14: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/14.jpg)
Data is located on a Token Range
San Jose
Oakland
San Francisco
![Page 15: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/15.jpg)
Data is located on a Token Range
0
San Jose
Oakland
San Francisco
1200
600
![Page 16: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/16.jpg)
The Partition Key of a Row is Hashed to Determine it's Token
0
San Jose
Oakland
San Francisco
1200
600
![Page 17: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/17.jpg)
The Partition Key of a Row is Hashed to Determine it's Token
01200
600
San Jose
Oakland
San Francisco
vehicle_id 1
INSERT INTO tracker (vehicle_id, ts, x, y) Values ( 1, 0, 0, 1)
![Page 18: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/18.jpg)
The Partition Key of a Row is Hashed to Determine it's Token
01200
600
San Jose
Oakland
San Francisco
vehicle_id 1
INSERT INTO tracker (vehicle_id, ts, x, y) Values ( 1, 0, 0, 1)
Hash(1) = 1000
![Page 19: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/19.jpg)
How The Spark Cassandra Connector Reads0
500
![Page 20: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/20.jpg)
How The Spark Cassandra Connector Reads0
San Jose Oakland San Francisco
500
![Page 21: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/21.jpg)
Spark Partitions are Built From the Token Range
0San Jose Oakland San Francisco
500
500 -
600
600 -
700
500 1200
![Page 22: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/22.jpg)
Each Partition Can be Drawn Locally from at Least One Node
0San Jose Oakland San Francisco
500
500 -
600
600 -
700
500 1200
![Page 23: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/23.jpg)
Each Partition Can be Drawn Locally from at Least One Node
0San Jose Oakland San Francisco
500
500 -
600
600 -
700
500 1200
Spark Executor Spark Executor Spark Executor
![Page 24: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/24.jpg)
No Need to Leave the Node For Data!0
500
![Page 25: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/25.jpg)
Data is Retrieved using the Datastax Java Driver
0
Spark Executor Cassandra
![Page 26: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/26.jpg)
A Connection is Established0
Spark Executor Cassandra
500 -
600
![Page 27: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/27.jpg)
A Query is Prepared with Token Bounds0
Spark Executor CassandraSELECT * FROM TABLE WHERE Token(PK) > StartToken AND Token(PK) < EndToken500
- 600
![Page 28: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/28.jpg)
The Spark Partitions Bounds are Placed in the Query
0
Spark Executor CassandraSELECT * FROM TABLE WHERE Token(PK) > 500 AND Token(PK) < 600500
- 600
![Page 29: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/29.jpg)
Paged a number of rows at a Time0
Spark Executor Cassandra
500 -
600
SELECT * FROM TABLE WHERE Token(PK) > 500 AND Token(PK) < 600
![Page 30: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/30.jpg)
Data is Paged0
Spark Executor Cassandra
500 -
600
SELECT * FROM TABLE WHERE Token(PK) > 500 AND Token(PK) < 600
![Page 31: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/31.jpg)
Data is Paged0
Spark Executor Cassandra
500 -
600
SELECT * FROM TABLE WHERE Token(PK) > 500 AND Token(PK) < 600
![Page 32: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/32.jpg)
Data is Paged0
Spark Executor Cassandra
500 -
600
SELECT * FROM TABLE WHERE Token(PK) > 500 AND Token(PK) < 600
![Page 33: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/33.jpg)
Data is Paged0
Spark Executor Cassandra
500 -
600
SELECT * FROM TABLE WHERE Token(PK) > 500 AND Token(PK) < 600
![Page 34: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/34.jpg)
How we write to Cassandra• Data is written via the Datastax Java Driver • Data is grouped based on Partition Key
(configurable) • Batches are written
![Page 35: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/35.jpg)
Data is Written using the Datastax Java Driver0
Spark Executor Cassandra
![Page 36: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/36.jpg)
A Connection is Established0
Spark Executor Cassandra
![Page 37: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/37.jpg)
Data is Grouped on Key0
Spark Executor Cassandra
![Page 38: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/38.jpg)
Data is Grouped on Key0
Spark Executor Cassandra
![Page 39: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/39.jpg)
Once a Batch is Full or There are Too Many Batches The Largest Batch is Executed
0
Spark Executor Cassandra
![Page 40: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/40.jpg)
Once a Batch is Full or There are Too Many Batches The Largest Batch is Executed
0
Spark Executor Cassandra
![Page 41: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/41.jpg)
Once a Batch is Full or There are Too Many Batches The Largest Batch is Executed
0
Spark Executor Cassandra
![Page 42: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/42.jpg)
Once a Batch is Full or There are Too Many Batches The Largest Batch is Executed
0
Spark Executor Cassandra
![Page 43: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/43.jpg)
Most Common Features• RDD APIs
• cassandraTable • saveToCassandra • repartitionByCassandraTable • joinWithCassandraTable
• DF API • Datasource
![Page 44: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/44.jpg)
Full Table Scans, Making an RDD out of a Table
importcom.datastax.spark.connector._
sc.cassandraTable(KeyspaceName,TableName)
importcom.datastax.spark.connector._
sc.cassandraTable[MyClass](KeyspaceName,TableName)
![Page 45: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/45.jpg)
Pushing Down CQL to Cassandraimportcom.datastax.spark.connector._
sc.cassandraTable[MyClass](KeyspaceName,TableName).select("vehicle_id").where("ts>10")
SELECT "vehicle_id" FROM TABLE WHERE Token(PK) > 500 AND Token(PK) < 600 AND ts > 10
![Page 46: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/46.jpg)
Distributed Key Retrievalimportcom.datastax.spark.connector._
rdd.joinWithCassandraTable("keyspace","table")
San Jose Oakland San Francisco
Spark Executor Spark Executor Spark Executor
RDD
![Page 47: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/47.jpg)
But our Data isn't Colocatedimportcom.datastax.spark.connector._
rdd.joinWithCassandraTable("keyspace","table")
San Jose Oakland San Francisco
Spark Executor Spark Executor Spark Executor
RDD
![Page 48: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/48.jpg)
RBCR Moves bulk reshuffles our data so Data Will Be Local
rdd.repartitionByCassandraReplica("keyspace","table").joinWithCassandraTable("keyspace","table")
San Jose Oakland San Francisco
Spark Executor Spark Executor Spark Executor
RDD
![Page 49: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/49.jpg)
The Connector Provides a Distributed Pool for Prepared Statements and Sessions
CassandraConnector(sc.getConf)
rdd.mapPartitions{it=>{valps=CassandraConnector(sc.getConf).withSessionDo(s=>s.prepare)it.map{ps.bind(_).executeAsync()}}
Your Pool Ready to Be Deployed
![Page 50: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/50.jpg)
The Connector Provides a Distributed Pool for Prepared Statements and Sessions
CassandraConnector(sc.getConf)
rdd.mapPartitions{it=>{valps=CassandraConnector(sc.getConf).withSessionDo(s=>s.prepare)it.map{ps.bind(_).executeAsync()}}
Your Pool Ready to Be DeployedPrepared Statement CacheSession Cache
![Page 51: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/51.jpg)
The Connector Supports the DataSources ApisqlContext.read.format("org.apache.spark.sql.cassandra") .options(Map("keyspace"->"read_test","table"->"simple_kv")) .load
importorg.apache.spark.sql.cassandra._
sqlContext.read.cassandraFormat("read_test","table").load
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md
![Page 52: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/52.jpg)
The Connector Works with Catalyst to Pushdown Predicates to Cassandra
importcom.datastax.spark.connector._
df.select("vehicle_id").filter("ts>10")
SELECT "vehicle_id" FROM TABLE WHERE Token(PK) > 500 AND Token(PK) < 600 AND ts > 10
![Page 53: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/53.jpg)
The Connector Works with Catalyst to Pushdown Predicates to Cassandra
importcom.datastax.spark.connector._
df.select("vehicle_id").filter("ts>10")
QueryPlan Catalyst PrunedFilteredScan
Only Prunes (projections) and Filters (predicates) are able to be pushed down.
![Page 54: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/54.jpg)
Recent Features• C* 3.0 Support
• Materialized Views • SASL Indexes (for pushdown)
• Advanced Spark Partitioner Support • Increased Performance on JWCT
![Page 55: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/55.jpg)
Use C* Partitioning in Spark
![Page 56: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/56.jpg)
Use C* Partitioning in Spark• C* Data is Partitioned • Spark has Partitions and partitioners • Spark can use a known partitioner to speed up
Cogroups (joins) • How Can we Leverage this?
![Page 57: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/57.jpg)
Use C* Partitioning in Spark
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/16_partitioning.md
Now if keyBy is used on a CassandraTableScanRDD and the PartitionKey is included in the key. The RDD will be given a C* Partitioner
valks="doc_example"valrdd={sc.cassandraTable[(String,Int)](ks,"users").select("name"as"_1","zipcode"as"_2","userid").keyBy[Tuple1[Int]]("userid")}
rdd.partitioner//res3:Option[org.apache.spark.Partitioner]=Some(com.datastax.spark.connector.rdd.partitioner.CassandraPartitioner@94515d3e)
![Page 58: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/58.jpg)
Use C* Partitioning in Spark
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/16_partitioning.md
Share partitioners between Tables for joins on Partition Key
valks="doc_example"valrdd1={sc.cassandraTable[(Int,Int,String)](ks,"purchases").select("purchaseid"as"_1","amount"as"_2","objectid"as"_3","userid").keyBy[Tuple1[Int]]("userid")}
valrdd2={sc.cassandraTable[(String,Int)](ks,"users").select("name"as"_1","zipcode"as"_2","userid").keyBy[Tuple1[Int]]("userid")}.applyPartitionerFrom(rdd1)//Assignsthepartitionerfromthefirstrddtothisone
valjoinRDD=rdd1.join(rdd2)
![Page 59: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/59.jpg)
Use C* Partitioning in Spark
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/16_partitioning.md
Many other uses for this, try it yourself!
• All self joins using the Partition key • Groups within C* Partitions • Anything formerly using SpanBy • Joins with other RDDs
And much more!
![Page 60: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/60.jpg)
Enhanced Parallelism with JWCT• joinWithCassandraTable now has increased
concurrency and parallelism! • 5X Speed increases in some cases • https://datastax-oss.atlassian.net/browse/
SPARKC-233 • Thanks Jaroslaw!
![Page 61: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/61.jpg)
The Connector wants you!• OSS Project that loves community involvement • Bug Reports • Feature Requests • Doc Improvements • Come join us!
Vin Diesel may or may not be a contributor
![Page 62: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/62.jpg)
Tons of Videos at Datastax Academyhttps://academy.datastax.com/
https://academy.datastax.com/courses/getting-started-apache-spark
![Page 63: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/63.jpg)
Tons of Videos at Datastax Academyhttps://academy.datastax.com/
https://academy.datastax.com/courses/getting-started-apache-spark
![Page 64: Spark And Cassandra: 2 Fast, 2 Furious](https://reader034.vdocuments.net/reader034/viewer/2022042619/586f7a331a28ab10258b7259/html5/thumbnails/64.jpg)
See you at Cassandra Summit!
Join me and 3500 of your database peers, and take a deep dive into Apache Cassandra™, the massively scalable NoSQL database that powers global businesses like Apple, Spotify, Netflix and Sony.
San Jose Convention CenterSeptember 7-9, 2016
https://cassandrasummit.org/Build Something Disruptive