Acunu and Hailo: a realtime analytics case study on Cassandra

Post on 24-Jan-2015

DESCRIPTION

A use case (Hailo, a taxi e-hailing app) for Acunu's realtime analytics on Cassandra NoSQL

TRANSCRIPT

@daiclegg @acunu

Hailo - a case study for Cassandra & Acunu

Dai Clegg, October 2013, JAX London

What is Hailo?

‣ The world’s highest-rated taxi app – over 11,000 five-star reviews

‣ Over 500,000 registered passengers

‣ A Hailo hail is accepted around the world every 4 seconds

‣ In nearly 2 years of operation, Hailo has expanded to 15 cities on 3 continents, from Tokyo to Toronto

The Adoption of Cassandra & Acunu at Hailo

‣ Launched on AWS

‣ Two PHP/MySQL web apps plus a Java backend

‣ Mostly built by a team of 3 or 4 backend engineers

‣ MySQL multi-master for resilience within a single availability zone

‣ Get/create/update entity

‣ Analytics

‣ Text search

The Adoption of Cassandra & Acunu at Hailo

‣ A desire for greater resilience – “become a utility”

‣ Cassandra is designed for high availability

‣ Plans for international expansion around a single consumer app

‣ Cassandra is good at global replication

‣ Expected growth

‣ Cassandra scales linearly for both reads and writes

‣ Prior experience

‣ successful in-team experience with Cassandra

The Adoption of Cassandra & Acunu at Hailo

‣ Replacement of key consumer app functionality,

‣ split PHP/MySQL web app into:

‣ a mixture of PHP/Java services

‣ backed by a Cassandra data store

‣ Launched into production in September 2012

‣ originally just powering North American expansion,

‣ gradually switching over Dublin and London

The Adoption of Cassandra & Acunu at Hailo

‣ Further decompose functionality into Go/Java SOA

‣ Migrating:

‣ Entity databases to Cassandra

‣ Analytics to Acunu

‣ Search to Elasticsearch

Cassandra

“Cassandra just works” (Dom W, Senior Engineer, Hailo)

Some Considerations for Data Modeling

‣ Do not read the entire entity, update one property and then write back a mutation containing every column

‣ Only mutate columns that have been set

‣ This avoids read-before-write race conditions

‣ Choose row key carefully, since this partitions the records

‣ Think about how many records you want in a single row

‣ Denormalise on write into many indexes/views
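
The first three considerations above can be illustrated with a small in-memory sketch. It is not Cassandra client code: the `Row` class, its per-column timestamps and the `demo` function are hypothetical names that only mimic last-write-wins behaviour per column, which is an assumption about how the slide's "mutation" should be read. It shows why writing back every column after a read can silently undo a concurrent update, while mutating only the changed column does not.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical row: column name -> (value, write timestamp)."""
    columns: dict = field(default_factory=dict)

    def mutate(self, changes: dict, timestamp: int) -> None:
        # Each column is resolved independently; the newest write wins.
        for name, value in changes.items():
            current = self.columns.get(name)
            if current is None or timestamp >= current[1]:
                self.columns[name] = (value, timestamp)

    def read(self) -> dict:
        return {name: value for name, (value, _) in self.columns.items()}


def demo(write_whole_entity: bool) -> dict:
    row = Row()
    row.mutate({"name": "Alice", "phone": "111", "city": "London"}, timestamp=1)

    # Two clients read the same snapshot, then each changes one property.
    snapshot_a = row.read()
    snapshot_b = row.read()
    snapshot_a["phone"] = "222"
    snapshot_b["city"] = "Dublin"

    if write_whole_entity:
        # Anti-pattern: each client writes back every column, stale ones included.
        row.mutate(snapshot_a, timestamp=2)
        row.mutate(snapshot_b, timestamp=3)
    else:
        # Only the columns that were actually set are mutated.
        row.mutate({"phone": "222"}, timestamp=2)
        row.mutate({"city": "Dublin"}, timestamp=3)
    return row.read()
```

With `demo(True)` the second client's stale `phone` value overwrites the first client's change; with `demo(False)` both changes survive.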

Some Considerations for Data Modeling: not obvious!

[Chart: average years of experience per team member, MySQL vs. Cassandra]

Some Repercussions of Data Modeling: whoops!

Some Considerations for Application Development

[Diagram: people who can attempt to query MySQL vs. people who can attempt to query Cassandra]

Some Considerations for Application Development

Acunu Analytics

Acunu Analytics

Hailo needed to understand system performance/business SLAs

‣ Raw Cassandra lacks analytic primitives

‣ e.g. COUNT, SUM, AVG, GROUP BY

‣ Acunu Analytics provides a platform for real-time analytics

‣ for pre-planned query templates

‣ It uses Cassandra as the store

‣ so it is highly available, resilient and globally distributed

‣ Integration is straightforward
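
As a rough illustration of what "analytic primitives" means here, the sketch below keeps COUNT, SUM and AVG per group up to date as each event is written, so a query is a lookup rather than a scan. It is a plain in-memory Python sketch, not Acunu's or Cassandra's API; the `RunningAggregates` class and the city and response-time values are hypothetical.

```python
from collections import defaultdict


class RunningAggregates:
    """Hypothetical helper: COUNT, SUM and AVG per group, updated on write."""

    def __init__(self) -> None:
        self._count = defaultdict(int)
        self._sum = defaultdict(float)

    def record(self, group: str, value: float) -> None:
        # Update the aggregates incrementally instead of storing raw rows only.
        self._count[group] += 1
        self._sum[group] += value

    def count(self, group: str) -> int:
        return self._count.get(group, 0)

    def total(self, group: str) -> float:
        return self._sum.get(group, 0.0)

    def average(self, group: str):
        # AVG is derived from the two stored aggregates; None if no events yet.
        n = self.count(group)
        return self.total(group) / n if n else None


# Example: response times in milliseconds, grouped by city (made-up values).
aggregates = RunningAggregates()
aggregates.record("London", 120.0)
aggregates.record("London", 80.0)
aggregates.record("Dublin", 200.0)
```

The trade-off is the one the slide hints at: the groupings have to be planned in advance, because only the aggregates that are maintained at write time can be answered instantly.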

Acunu Analytics: technology

Real-time incremental cubing provides instant answers to Big Data questions

Apache Cassandra is the repository

[Diagram: build cube from history, with Apache Cassandra as the store]

Acunu Analytics: an example

Define aggregate cubes:

    CREATE CUBE APPROX TOP(keyword) WHERE browser, time GROUP BY time

New events update cubes

Rich instant queries over cubes:

    SELECT TOP(keyword) FROM table WHERE browser = 'chrome' AND time BETWEEN.. GROUP BY d1, d2, ... JOIN ... HAVING .. ORDER BY ..

Drill down to raw events

Populate new cubes from historic data
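
To make the cube idea on this slide concrete, here is a minimal in-memory sketch of a TOP(keyword) cube keyed by browser and time bucket. It is not AQL and not Acunu's implementation: `KeywordCube`, `BUCKET_SECONDS` and the hourly bucketing are assumptions made for illustration, and the counts are exact rather than approximate as `APPROX` in the slide suggests.

```python
from collections import Counter, defaultdict

BUCKET_SECONDS = 3600  # assumed bucket size: one hour


class KeywordCube:
    """Hypothetical cube: (browser, time bucket) -> keyword counts."""

    def __init__(self) -> None:
        self._cells = defaultdict(Counter)

    def add_event(self, browser: str, timestamp: int, keyword: str) -> None:
        # New events update the cube incrementally as they arrive.
        bucket = timestamp - timestamp % BUCKET_SECONDS
        self._cells[(browser, bucket)][keyword] += 1

    def top_keywords(self, browser: str, start: int, end: int, n: int = 1) -> dict:
        """Top n keywords per time bucket for one browser, buckets in [start, end]."""
        result = {}
        for (cell_browser, bucket), counts in self._cells.items():
            if cell_browser == browser and start <= bucket <= end:
                result[bucket] = counts.most_common(n)
        return dict(sorted(result.items()))


def build_cube(events) -> KeywordCube:
    # Populating a new cube from historic data is a replay of stored events.
    cube = KeywordCube()
    for browser, timestamp, keyword in events:
        cube.add_event(browser, timestamp, keyword)
    return cube
```

A query such as the slide's `WHERE browser = 'chrome' AND time BETWEEN..` then corresponds to `cube.top_keywords("chrome", start, end)`, which reads only the pre-aggregated cells rather than the raw events.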

Acunu Analytics: summary

Overview of the workflow

define aggregation cubes with DDL or infer from self-service queries

define connector: either from library, toolkit or REST

define pre-processors: programmatic, Java or JavaScript; or AQL query

develop queries in AQL, query builder or self-service data explorer

invoke queries from within applications with JSON query API

populate new cubes from historic data

define event schema with DDL or infer from sample events

define alerts to be raised on trigger conditions

Acunu Analytics at Hailo: some sample screenshots

[Screenshots: “drill-across” to see breakdown of data, and in-depth analysis]

Acunu Analytics at Hailo: use cases

‣ Infrastructure and Application monitoring

‣ Real-time A/B testing of app layout and incentives

‣ Real time geo-view of supply/demand for drivers

‣ More in the pipeline

Conclusions

Conclusions: choosing the platform

‣ Solid Cassandra design

‣ High availability characteristics

‣ Easy multi-data centre setup

‣ Simplicity of operation

‣ With Acunu

‣ SQL-like rich queries

‣ easier data modeling

Conclusions: exploiting the platform

‣ Have an advocate

‣ sell the dream

‣ Learn the fundamentals

‣ get the best out of Cassandra

‣ Invest in tools to make life easier

‣ Keep management in the loop

‣ explain the trade-offs

Apache, Apache Cassandra, Cassandra and the eye logo are trademarks of the Apache Software Foundation.

Thank You.
