CCM, AlchemyAPI and real-time aggregation
DESCRIPTION
An exploratory look into KairosDB (OpenTSDB) connected to Cassandra (CCM) and using AlchemyAPI for entity, topic and sentiment extraction. Sprinkled in is a bit of Data Modeling, Truth Tables, Primary Keys, Partition Keys and Cluster Keys. All written in Python!
TRANSCRIPT
TIME SERIES AGGREGATES
USING CASSANDRA, KAIROSDB & ALCHEMY API
Who is Victor Anjos?
• Bio-Informatics Engineer
• Business Analyst
• Data Warehouse Specialist
• System Operations / DevOps
• Founder & Lead Technologist
• Presenter, Speaker, Organizer
• Founder / Do-Gooder
• Data Engineer & Manager
KEEP TWEETING
@VictorFAnjos
@viafoura
@AlchemyAPI
@Datastax
@Data_for_Good
#BDWTO #BDW14
Quick Review…
Why Real-Time?
REMEMBER --- TWEET
PLEASE MAKE SURE TO TWEET…
NEED TWEETS TO THE HASHTAGS BELOW AT THE END
#BDWTO #BDW14
Keys in C*
cqlsh:test> CREATE TABLE example (
        ... field1 int PRIMARY KEY,
        ... field2 int,
        ... field3 int);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (1, 2, 3);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (4, 5, 6);
cqlsh:test> INSERT INTO example (field1, field2, field3) VALUES (7, 8, 9);
cqlsh:test> SELECT * FROM example;
 field1 | field2 | field3
--------+--------+--------
      1 |      2 |      3
      4 |      5 |      6
      7 |      8 |      9
[default@test] list example;
-------------------
RowKey: 1
=> (column=, value=, timestamp=1374546754299000)
=> (column=field2, value=00000002, timestamp=1374546754299000)
=> (column=field3, value=00000003, timestamp=1374546754299000)
-------------------
RowKey: 4
=> (column=, value=, timestamp=1374546757815000)
=> (column=field2, value=00000005, timestamp=1374546757815000)
=> (column=field3, value=00000006, timestamp=1374546757815000)
-------------------
RowKey: 7
=> (column=, value=, timestamp=1374546761055000)
=> (column=field2, value=00000008, timestamp=1374546761055000)
=> (column=field3, value=00000009, timestamp=1374546761055000)
cqlsh:test> CREATE TABLE example (
        ... partitionKey1 text,
        ... partitionKey2 text,
        ... clusterKey1 text,
        ... clusterKey2 text,
        ... normalField1 text,
        ... normalField2 text,
        ... PRIMARY KEY ( (partitionKey1, partitionKey2), clusterKey1, clusterKey2 )
        ... );
cqlsh:test> INSERT INTO example (partitionKey1,
        ... partitionKey2, clusterKey1, clusterKey2,
        ... normalField1, normalField2) VALUES (
        ... 'partitionVal1',
        ... 'partitionVal2',
        ... 'clusterVal1',
        ... 'clusterVal2',
        ... 'normalVal1',
        ... 'normalVal2');
cqlsh:test> SELECT * FROM example;
 partitionkey1 | partitionkey2 | clusterkey1 | clusterkey2 | normalfield1 | normalfield2
---------------+---------------+-------------+-------------+--------------+--------------
 partitionVal1 | partitionVal2 | clusterVal1 | clusterVal2 |   normalVal1 |   normalVal2
[default@test] list example;
-------------------
RowKey: partitionVal1:partitionVal2
=> (column=clusterVal1:clusterVal2:, value=, timestamp=1374630892473000)
=> (column=clusterVal1:clusterVal2:normalfield1, value=6e6f726d616c56616c31, timestamp=1374630892473000)
1. The first part of a composite key [inside the inner brackets] is called the “Partition Key”; the rest [not inside the inner brackets] are “Cluster Keys”.
2. Cassandra stores columns differently when composite keys are used. The partition key becomes the row key. The values of the remaining keys (the cluster keys) are concatenated with each column name (“:” as separator) to form the stored column names. Column values remain unchanged.
3. Cluster keys (unlike partition keys) are ordered, and you are not allowed to search on arbitrary columns: you have to specify the cluster keys in order, from the first onward, and can run a range query only on the last one you specify.
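The mapping described in point 2 can be sketched in a few lines of Python. This is only an illustration of the layout shown in the `list example` output, not Cassandra's actual storage code; the function name `to_storage_row` is made up for this example.

```python
def to_storage_row(partition_keys, cluster_keys, fields):
    """Map one CQL row onto (row key, {stored column name: value})."""
    # Partition key values are joined with ":" to form the row key.
    row_key = ":".join(partition_keys)
    # Cluster key values become a prefix on every stored column name.
    prefix = (":".join(cluster_keys) + ":") if cluster_keys else ""
    # Each CQL row also gets one marker column with an empty value.
    columns = {prefix: ""}
    for name, value in fields.items():
        columns[prefix + name] = value
    return row_key, columns


row_key, columns = to_storage_row(
    partition_keys=["partitionVal1", "partitionVal2"],
    cluster_keys=["clusterVal1", "clusterVal2"],
    fields={"normalfield1": "normalVal1", "normalfield2": "normalVal2"},
)
print("RowKey:", row_key)
for name in sorted(columns):
    print("=> column=%s, value=%s" % (name, columns[name]))
```

With no cluster keys the prefix is empty, which reproduces the simpler layout of the first `example` table (row key `1`, columns `field2` and `field3`).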
A bit of data modelling
USER ACTIVITY DATA MODEL
CREATE TABLE user_activity (
… username varchar,
… interaction_time timeuuid,
… activity_code varchar,
… detail varchar,
… PRIMARY KEY (username, interaction_time)
… ) WITH CLUSTERING ORDER BY (interaction_time DESC);
CREATE TABLE user_activity_history (
… username varchar,
… interaction_date varchar,
… interaction_time timeuuid,
… activity_code varchar,
… detail varchar,
… PRIMARY KEY ((username,interaction_date),interaction_time)
… );
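The second table adds interaction_date to the partition key, so each user's history is split into one partition per day. A minimal sketch of how a writer might derive that key follows. The 'YYYY-MM-DD' UTC format for interaction_date is an assumption (the slide only says varchar), and `history_partition` is a hypothetical helper. The two timestamps are the write times from the `list example` output, truncated to seconds.

```python
from collections import defaultdict
from datetime import datetime, timezone


def history_partition(username, interaction_ts):
    """Return the (username, interaction_date) partition key for one event."""
    # Assumption: the date bucket is the UTC calendar day as 'YYYY-MM-DD'.
    day = datetime.fromtimestamp(interaction_ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return (username, day)


events = [
    ("victor", 1374546754, "login"),
    ("victor", 1374546761, "comment"),
    ("victor", 1374630892, "login"),
]

# Group events the way the table would spread them across partitions.
partitions = defaultdict(list)
for username, ts, activity_code in events:
    partitions[history_partition(username, ts)].append(activity_code)

for key in sorted(partitions):
    print(key, partitions[key])
```

Bucketing by day keeps any single partition from growing without bound, at the cost of one query per day when reading a longer range.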
Data modelling 4 QUERIES
FIND A CAR IN A LOT
CREATE TABLE car_location_index (
… make varchar,
… model varchar,
… colour varchar,
… vehicle_id int,
… lot_id int,
… PRIMARY KEY ((make,model,colour),vehicle_id)
… );
Truth(iness) Table
INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('Ford','Mustang','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('Ford','Mustang','',1234,8675309);
INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('Ford','','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('Ford','','',1234,8675309);
INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('','Mustang','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('','Mustang','',1234,8675309);
INSERT INTO car_location_index (make,model,colour,vehicle_id,lot_id) VALUES ('','','Blue',1234,8675309);
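The seven INSERTs above are the rows of a three-column truth table with the all-blank row left out. A minimal Python sketch that generates the same combinations, in the same order, is shown here; `index_rows` is a hypothetical helper name, and no database connection is involved.

```python
from itertools import product


def index_rows(make, model, colour, vehicle_id, lot_id):
    """Yield one index row per non-empty combination of (make, model, colour)."""
    rows = []
    for keep in product([True, False], repeat=3):
        if not any(keep):
            continue  # skip the all-blank key; the slide has no such row
        # Blank out every attribute whose truth-table bit is False.
        key = tuple(v if k else "" for v, k in zip((make, model, colour), keep))
        rows.append(key + (vehicle_id, lot_id))
    return rows


rows = index_rows("Ford", "Mustang", "Blue", 1234, 8675309)
for row in rows:
    print(row)
```

Each tuple can then be bound to a single parameterised INSERT, so one car always costs 2^3 - 1 = 7 writes in exchange for single-partition reads on any combination of attributes.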
FIND A CAR IN A LOT
SELECT vehicle_id, lot_id
FROM car_location_index
WHERE make = 'Ford'
AND model = ''
AND colour = 'Blue';
 vehicle_id |  lot_id
------------+---------
       1234 | 8675309
SELECT vehicle_id, lot_id
FROM car_location_index
WHERE make = ''
AND model = ''
AND colour = 'Blue';
 vehicle_id |  lot_id
------------+---------
       1234 | 8675309
       8765 | 5551212
Enter KairosDB
[
  {
    "name": "archive.file.tracked",
    "datapoints": [[1359788400000, 123], [1359788300000, 13.2], [1359788410000, 23.1]],
    "tags": {
      "host": "server1",
      "data_center": "DC1"
    }
  },
  {
    "name": "archive.file.search",
    "timestamp": 999,
    "value": 321,
    "tags": {"host": "test"}
  }
]
http://localhost:8080/api/v1/datapoints
http://localhost:8080/api/v1/datapoints/query
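A rough Python sketch of talking to the two endpoints above, using only the standard library. The requests are built but not sent, so nothing here needs a running server. The query body (start_relative plus a sum aggregator sampled per minute) reflects my understanding of the KairosDB REST query format and should be checked against the KairosDB documentation; the helper names are made up for this example.

```python
import json
import urllib.request

KAIROS_URL = "http://localhost:8080"  # host and port as shown on the slide


def _post(path, body):
    """Build (without sending) a JSON POST request to KairosDB."""
    return urllib.request.Request(
        KAIROS_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_push_request(metrics):
    """Request that writes a list of metric dicts to /api/v1/datapoints."""
    return _post("/api/v1/datapoints", metrics)


def build_query_request(metric_name, minutes=60):
    """Request asking for per-minute sums over the last `minutes` minutes."""
    query = {
        "start_relative": {"value": minutes, "unit": "minutes"},
        "metrics": [{
            "name": metric_name,
            "aggregators": [{
                "name": "sum",
                "sampling": {"value": 1, "unit": "minutes"},
            }],
        }],
    }
    return _post("/api/v1/datapoints/query", query)


metrics = [{
    "name": "archive.file.tracked",
    "datapoints": [[1359788400000, 123], [1359788300000, 13.2]],
    "tags": {"host": "server1", "data_center": "DC1"},
}]
push = build_push_request(metrics)
query = build_query_request("archive.file.tracked")
print(push.get_method(), push.full_url)
print(query.get_method(), query.full_url)
# To actually send either one: urllib.request.urlopen(push)
```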
Sentiment Analysis NLP
He loves me
He loves me not
AlchemyAPI
AlchemyAPI uses natural language processing technology and machine learning algorithms to extract semantic meta-data from content, such as information on people, places, companies, topics, facts, relationships, authors, and languages.
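To tie the pieces together, a sentiment result can be turned into a time-series datapoint. The sketch below is a guess at that glue code, not the talk's actual implementation. The response shape (a "status" field and a "docSentiment" object with "type" and a string "score") is an assumption about what the service returns, and the metric name and tags are invented for illustration.

```python
def sentiment_to_metric(response, timestamp_ms, source):
    """Turn one sentiment result into a KairosDB-style metric dict."""
    # Assumption: a failed call is signalled by a status other than "OK".
    if response.get("status") != "OK":
        return None
    doc = response["docSentiment"]
    # Assumption: neutral results may carry no score, so default to 0.0.
    score = float(doc.get("score", 0.0))
    return {
        "name": "sentiment.score",  # hypothetical metric name
        "timestamp": timestamp_ms,
        "value": score,
        "tags": {"source": source, "type": doc["type"]},
    }


# Hand-written sample in the assumed response shape; no API call is made.
sample = {"status": "OK", "docSentiment": {"type": "positive", "score": "0.42"}}
metric = sentiment_to_metric(sample, 1359788400000, "twitter")
print(metric)
```

Tagging by sentiment type lets the time-series store aggregate positive and negative scores separately for the same metric.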
Prep Work…
Install CCM
https://gist.github.com/vanjos/6169734
Install KairosDB
https://code.google.com/p/kairosdb/wiki/GettingStarted
Get some API Keys
https://dev.twitter.com & https://apps.twitter.com/
http://www.alchemyapi.com/api/register.html