Posted on 28-May-2015
Counters for real-time statistics
Aug 2011
Quick Cassandra storage primer
Standard columns
Idempotent writes – last client timestamp wins
Store byte[] – can have validators
No internal locking
No read before write
Example:
set Users['ecapriolo']['fname']='ed';
Counter columns
Store integral values only
Can be incremented or decremented with a single RPC
Local read before write
Merged on read
Example:
incr followers['ecapriolo']['x'] by 30
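The difference between the two column types is in write semantics. A rough in-memory sketch (plain Java with illustrative names, no Cassandra involved): a standard column write replaces the value, while a counter column merges increments.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: simulating counter-column semantics in memory.
public class CounterSketch {
    static Map<String, Long> followers = new HashMap<>();

    // incr followers[row][col] by delta: increments merge instead of overwriting
    static void incr(String row, String col, long delta) {
        followers.merge(row + "/" + col, delta, Long::sum);
    }

    public static void main(String[] args) {
        incr("ecapriolo", "x", 30); // incr followers['ecapriolo']['x'] by 30
        System.out.println(followers.get("ecapriolo/x")); // 30
    }
}
```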
Counters combine powers with:
Composite keys: incr stats['user/date']['page'] by 1 – bucketed row keys distribute writes so the cluster scales
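The 'user/date' composite key can be built from an ordinary day-bucket date format. A minimal sketch (class and method names here are illustrative, not from the deck):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch: composing a row key like 'user/date' so one user's counters
// spread across one row per day instead of one ever-growing row.
public class CompositeKey {
    static final SimpleDateFormat DAY = new SimpleDateFormat("yyyy-MM-dd");

    static String statsKey(String user, Date when) {
        // e.g. "edward/2011-01-02"
        return user + "/" + DAY.format(when);
    }
}
```

Because the day is part of the key, writes for a busy user land on different rows (and thus different replicas) as time advances.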
And you get:
A distributed system to record events
Pre-calculated real-time stats
Other ways to collect and report
Store in files, process into reports
Example: data -> hdfs -> hive queries -> reports
Light work on front end, heavy on back end
Store into a relational database
Example: data -> rdbms (indexed) -> real-time queries & reports
Divides work between front end and back end
Indexes can become choke points
Example data set
url | username | event_time | time_to_serve_millis
/page1.htm | edward | 2011-01-02 04:01:04 | 45
/page1.htm | stacey | 2011-01-02 04:01:05 | 46
/page1.htm | stacey | 2011-01-02 04:02:07 | 40
/page2.htm | edward | 2011-01-02 04:02:45 | 22
“Query” one: hit count bucketed by minute
page | time | count
/page1.htm | 2011-01-02 04:01 | 2
/page1.htm | 2011-01-02 04:02 | 1
/page2.htm | 2011-01-02 04:02 | 1
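As a sanity check, bucketing the four sample rows by page and minute in plain Java (illustrative names, no Cassandra) reproduces the counts above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory version of "query" one: count hits per (page, minute) bucket.
public class HitCounts {
    static Map<String, Integer> count(String[][] events) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] e : events) {
            // e[0] = url, e[1] = event_time; truncate the time to the minute
            String minute = e[1].substring(0, 16); // "yyyy-MM-dd HH:mm"
            counts.merge(e[0] + " | " + minute, 1, Integer::sum);
        }
        return counts;
    }
}
```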
“Query” two: resources consumed by user per hour
user | time | total_time_to_serve
edward | 2011-01-02 04 | 67
stacey | 2011-01-02 04 | 86
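The same in-memory check works for the per-user hourly totals above (again illustrative Java, not the deck's code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory version of "query" two: sum time_to_serve per (user, hour) bucket.
public class UsagePerHour {
    static Map<String, Integer> total(String[][] events) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String[] e : events) {
            // e[0] = user, e[1] = event_time, e[2] = time_to_serve_millis
            String hour = e[1].substring(0, 13); // "yyyy-MM-dd HH"
            totals.merge(e[0] + " | " + hour, Integer.parseInt(e[2]), Integer::sum);
        }
        return totals;
    }
}
```

edward's two requests (45 + 22) sum to 67; stacey's (46 + 40) to 86, matching the table.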
Turn a record line into a POJO
class Record {
String url, username;
Date date;
int timeToServe;
}
Use your imagination here:
public static List<Record> readRecords(String file) throws Exception {
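The deck leaves readRecords() to the imagination. One plausible sketch of parsing a single pipe-delimited line into the Record POJO (the date-format string is an assumption based on the sample data set):

```java
import java.text.SimpleDateFormat;

// Assumed parser for one line like:
// "/page1.htm | edward | 2011-01-02 04:01:04 | 45"
class Record {
    String url, username;
    java.util.Date date;
    int timeToServe;

    static Record parse(String line) {
        try {
            String[] f = line.split("\\s*\\|\\s*"); // split on " | "
            Record r = new Record();
            r.url = f[0];
            r.username = f[1];
            r.date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(f[2]);
            r.timeToServe = Integer.parseInt(f[3]);
            return r;
        } catch (java.text.ParseException e) {
            throw new RuntimeException(e);
        }
    }
}
```

readRecords() would then just apply parse() to each line of the file.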
writeRecord() Method
public static void writeRecord(Cassandra.Client c, Record r) throws Exception {
DateFormat bucketByMinute = new SimpleDateFormat("yyyy-MM-dd HH:mm");
DateFormat bucketByDay = new SimpleDateFormat("yyyy-MM-dd");
DateFormat bucketByHour = new SimpleDateFormat("yyyy-MM-dd HH");
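To see what the three formats produce, here is a small standalone sketch (the helper name is made up) that truncates one sample event time to each bucket size:

```java
import java.text.SimpleDateFormat;

// Demonstrates the three bucket granularities used by writeRecord().
public class Buckets {
    static String bucket(String eventTime, String pattern) {
        try {
            java.util.Date d =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(eventTime);
            return new SimpleDateFormat(pattern).format(d);
        } catch (java.text.ParseException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bucket("2011-01-02 04:01:04", "yyyy-MM-dd HH:mm")); // 2011-01-02 04:01
        System.out.println(bucket("2011-01-02 04:01:04", "yyyy-MM-dd HH"));    // 2011-01-02 04
        System.out.println(bucket("2011-01-02 04:01:04", "yyyy-MM-dd"));       // 2011-01-02
    }
}
```

One event increments counters at every granularity it is bucketed into, so all three "queries" are answered at write time.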
“Query” 1 page counts by minute
CounterColumn counter = new CounterColumn();
ColumnParent cp = new ColumnParent("page_counts_by_minute");
counter.setName(ByteBufferUtil.bytes(bucketByMinute.format(r.date)));
counter.setValue(1);
c.add(ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.url),
      cp, counter, ConsistencyLevel.ONE);
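For the first sample event, the add() call writes the row key and counter name sketched below (illustrative helper names, with plain string truncation standing in for the date formats since the layout is fixed); these are the same values that reappear in the cli results later:

```java
// Sketch of the key composition for "query" one.
public class KeySketch {
    static String rowKey(String eventTime, String url) {
        // day bucket + "-" + url
        return eventTime.substring(0, 10) + "-" + url;
    }

    static String counterName(String eventTime) {
        // minute bucket
        return eventTime.substring(0, 16);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("2011-01-02 04:01:04", "/page1.htm")); // 2011-01-02-/page1.htm
        System.out.println(counterName("2011-01-02 04:01:04"));          // 2011-01-02 04:01
    }
}
```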
“Query” 2 usage by users per hour
CounterColumn counter2 = new CounterColumn();
// note: hour buckets, despite the column family name
ColumnParent cp2 = new ColumnParent("user_usage_by_minute");
counter2.setName(ByteBufferUtil.bytes(bucketByHour.format(r.date)));
counter2.setValue(r.timeToServe);
c.add(ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.username),
      cp2, counter2, ConsistencyLevel.ONE);
How this works
Results
[default@counttest] list user_usage_by_minute;
-------------------
RowKey: 2011-01-02-stacey
=> (counter=2011-01-02 04, value=86)
-------------------
RowKey: 2011-01-02-edward
=> (counter=2011-01-02 04, value=67)
More Results
[default@counttest] list page_counts_by_minute;
-------------------
RowKey: 2011-01-02-/page1.htm
=> (counter=2011-01-02 04:01, value=2)
=> (counter=2011-01-02 04:02, value=1)
-------------------
RowKey: 2011-01-02-/page2.htm
=> (counter=2011-01-02 04:02, value=1)
Recap
Counters push work to the “front end”
Data is bucketed, sorted, and indexed on insert
Data is already “ready” on read
Designed around how you want to read the data
Writes are distributed across the cluster
Data is bucketed by time, user, page, etc.
Different from a single table/index contention point
Questions?
Full code at: http://www.jointhegrid.com/highperfcassandra/?cat=7