Scaling Up WSO2 BAM for Billions of Requests and Terabytes of Data Buddhika Chamith Software Engineer – WSO2 BAM

Upload: wso2

Post on 21-Jun-2015


Page 1: Scaling up wso2 bam for billions of requests and terabytes of data

Scaling Up WSO2 BAM for Billions of Requests and Terabytes of Data

Buddhika Chamith
Software Engineer – WSO2 BAM

Page 2: Scaling up wso2 bam for billions of requests and terabytes of data

Business Activity Monitoring

“The aggregation, analysis, and presentation of real-time information about activities inside organizations and involving customers and partners.” - Gartner

Page 3: Scaling up wso2 bam for billions of requests and terabytes of data

Aggregation

● Capturing data
● Data storage
● What data to capture?

Page 4: Scaling up wso2 bam for billions of requests and terabytes of data

Analysis

● Data operations
● Building KPIs
● Operate on large amounts of historic data or new data
● Building BI

Page 5: Scaling up wso2 bam for billions of requests and terabytes of data

Presentation

● Visualizing KPIs/BI
● Custom dashboards
● Visualization tools
● Not just dashboards!

Page 6: Scaling up wso2 bam for billions of requests and terabytes of data

Need for Scalability

Page 7: Scaling up wso2 bam for billions of requests and terabytes of data

BAM 2.x - Component Architecture

Page 8: Scaling up wso2 bam for billions of requests and terabytes of data

Data Agents

● Push data to BAM
● Collecting
    ● Service data
    ● Mediation data
    ● Logs etc.
● Various interceptors used
    ● Axis2 Handlers
    ● Synapse Mediators
    ● Tomcat Valves
    ● Log4j Appenders

Page 9: Scaling up wso2 bam for billions of requests and terabytes of data

Performance Considerations

● Should be asynchronous
● Event batching
● SOAP?
● Apache Thrift (binary protocol)
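The first two points, asynchronous publishing and event batching, can be sketched as follows. This is a minimal illustration only, not the BAM agent API: the class `BatchingPublisher`, the `send_batch` callback, and the `batch_size` and `flush_interval` parameters are all hypothetical names chosen for this example.

```python
import queue
import threading


class BatchingPublisher:
    """Hypothetical agent-side publisher (not the BAM SDK).

    Callers enqueue events and return immediately; a background thread
    groups the events into batches and hands each batch to send_batch.
    """

    def __init__(self, send_batch, batch_size=100, flush_interval=0.05):
        self._send_batch = send_batch          # callable taking a list of events
        self._batch_size = batch_size
        self._flush_interval = flush_interval  # seconds to wait for more events
        self._queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, event):
        # Asynchronous for the caller: no network I/O on the service thread.
        self._queue.put(event)

    def close(self):
        # Ask the worker to drain the queue, flush, and exit.
        self._stop.set()
        self._worker.join()

    def _run(self):
        batch = []
        while True:
            try:
                batch.append(self._queue.get(timeout=self._flush_interval))
                timed_out = False
            except queue.Empty:
                timed_out = True
            # Flush when the batch is full or when the queue has gone quiet.
            if batch and (len(batch) >= self._batch_size or timed_out):
                self._send_batch(batch)
                batch = []
            if timed_out and self._stop.is_set():
                return
```

The monitored service pays only the cost of a queue insert per event, and the transport sees fewer, larger messages.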

Page 10: Scaling up wso2 bam for billions of requests and terabytes of data

Apache Thrift

● An RPC framework
● With a pluggable architecture for mixing different transports with different protocols
● Has multiple language bindings (Java, C++, Python, Perl, C#, etc.)
● We mainly use the Java binding

Page 11: Scaling up wso2 bam for billions of requests and terabytes of data

Not Just Performance...

● Load balancing
● Failover
● All available within a Java SDK library.
● You can use it too.
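As a rough illustration of what client-side load balancing with failover means, here is a minimal sketch. It is not the Java SDK mentioned on the slide: `LoadBalancingClient` is a hypothetical class, each receiver is modeled as a plain callable, and a failed node is assumed to raise `ConnectionError`.

```python
import itertools


class LoadBalancingClient:
    """Hypothetical client: round-robin over receivers, skipping failed ones."""

    def __init__(self, receivers):
        self._receivers = list(receivers)   # each receiver is a callable(batch)
        self._next = itertools.cycle(range(len(self._receivers)))

    def send(self, batch):
        """Send one batch and return the index of the receiver that took it."""
        last_error = None
        # Try every receiver at most once, starting with the next in rotation.
        for _ in range(len(self._receivers)):
            index = next(self._next)
            try:
                self._receivers[index](batch)
                return index
            except ConnectionError as error:
                last_error = error          # fail over to the next receiver
        raise ConnectionError("all receivers failed") from last_error
```

Rotation spreads the load across receivers, and the retry loop gives failover without any server-side coordinator.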

Page 12: Scaling up wso2 bam for billions of requests and terabytes of data

Data Receiver

● Capture and transfer data to subscribed sinks.
● Not just the database.
● Can be clustered.
● Load balancing is handled from client side.
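The "subscribed sinks" idea can be pictured as a simple fan-out. The sketch below is an assumption-laden illustration, not the BAM receiver: `DataReceiver`, `subscribe`, and `receive` are hypothetical names, and sinks are modeled as callables that accept a list of events.

```python
class DataReceiver:
    """Hypothetical receiver that forwards every batch to all subscribed sinks."""

    def __init__(self):
        self._sinks = []

    def subscribe(self, sink):
        # A sink is any callable taking a list of events, for example a
        # database writer or a real-time processor.
        self._sinks.append(sink)

    def receive(self, events):
        # Fan out: every sink sees every batch, so storage is only one consumer.
        for sink in self._sinks:
            sink(events)
```

For example, subscribing both a storage writer and a counter means each incoming batch is persisted and counted, without the publisher knowing about either consumer.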

Page 13: Scaling up wso2 bam for billions of requests and terabytes of data

Data Bridge

Page 14: Scaling up wso2 bam for billions of requests and terabytes of data

Data Storage

● Apache Cassandra
● NoSQL column family implementation
● Scalable, HA and no SPOF.
● Very high write throughput and good read throughput
● Tunable consistency with data replication
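The arithmetic behind tunable consistency can be sketched in a few lines. With a replication factor N, a quorum is floor(N/2) + 1 replicas, and a read is guaranteed to overlap the latest write when the number of replicas read (R) plus the number written (W) exceeds N. The function names below are hypothetical helpers for illustration, not part of any Cassandra client.

```python
def quorum(replication_factor):
    """Number of replicas that forms a majority for the given replication factor."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap the most recent write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, quorum reads and quorum writes (2 + 2 > 3) always overlap, whereas reading and writing at a single replica (1 + 1) trades that guarantee for lower latency.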

Page 15: Scaling up wso2 bam for billions of requests and terabytes of data

Deployment – Storage Cluster

Page 16: Scaling up wso2 bam for billions of requests and terabytes of data

Receiver Cluster

Page 17: Scaling up wso2 bam for billions of requests and terabytes of data

Results

Measured with a single receiver node on a quad-core machine running RHEL, with a 2 GB heap allocated.

Page 18: Scaling up wso2 bam for billions of requests and terabytes of data

Disk Growth

Page 19: Scaling up wso2 bam for billions of requests and terabytes of data

Analyzer Engine

● Idea: Distribute processing to multiple nodes to run in parallel
● Obvious choice: Hadoop
● Uses the Map Reduce programming paradigm

Page 20: Scaling up wso2 bam for billions of requests and terabytes of data

Map Reduce

● Process multiple data chunks in parallel at Mappers.

● Aggregate map outputs having similar keys at Reducers and store the result.

● Let's think of a useful example..
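One example in the spirit of BAM is counting requests per service. The sketch below runs the map, shuffle, and reduce phases in a single process purely to show the data flow; it is not Hadoop code, and the event shape (a dict with a `service` key) and all function names are assumptions made for this illustration.

```python
from collections import defaultdict


def map_chunk(events):
    """Mapper: emit a (service, 1) pair for every request event in one chunk."""
    return [(event["service"], 1) for event in events]


def shuffle(mapped_chunks):
    """Group the values emitted by all mappers under their key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reducer: aggregate the values that share a key into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def request_counts(chunks):
    # On a real cluster each chunk would be mapped on a different node in
    # parallel; here the mappers simply run one after another.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_counts(shuffle(mapped))
```

Because each mapper only needs its own chunk, adding nodes lets more chunks be processed at once; only the grouped intermediate pairs have to move between machines.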

Page 21: Scaling up wso2 bam for billions of requests and terabytes of data

Hadoop Components

● Job Tracker
● Name Node
● Secondary Name Node
● Task Trackers
● Data Nodes

Page 22: Scaling up wso2 bam for billions of requests and terabytes of data

It's Cool But...

● Do we need to have a Hadoop cluster in order to try out BAM?
● Are we supposed to code Hadoop jobs to get BAM to summarize something?
● Answers
    1) No
    2) No. OK, maybe very rarely at best.

Courtesy: http://goo.gl/QEnpN

Page 23: Scaling up wso2 bam for billions of requests and terabytes of data

Apache Hive

● You write SQL. (Almost)
● Let Hive convert to Map Reduce jobs.
● So Hive does two things
    ● Provide an abstraction for Hadoop Map Reduce
    ● Submit the analytic jobs to Hadoop
● Hive may spawn a Hadoop JVM locally or delegate to a Hadoop cluster

Page 24: Scaling up wso2 bam for billions of requests and terabytes of data

A Typical Hive Script

Page 25: Scaling up wso2 bam for billions of requests and terabytes of data

Results

Page 26: Scaling up wso2 bam for billions of requests and terabytes of data

Task Framework

● Runs Hive scripts periodically
● Schedules can be specified as cron expressions or predefined templates
● Handles task failover in case of node failure
● Uses Zookeeper for coordination

Page 27: Scaling up wso2 bam for billions of requests and terabytes of data

Zookeeper

● Can be run separately or embedded within BAM

Page 28: Scaling up wso2 bam for billions of requests and terabytes of data

Analyzer Cluster

Page 29: Scaling up wso2 bam for billions of requests and terabytes of data

Dashboard

● Making dashboard scale.

Page 30: Scaling up wso2 bam for billions of requests and terabytes of data

Deployment Patterns

Single Node

Page 31: Scaling up wso2 bam for billions of requests and terabytes of data

High Availability

Page 32: Scaling up wso2 bam for billions of requests and terabytes of data

Fully Distributed Setup

Page 33: Scaling up wso2 bam for billions of requests and terabytes of data

Summary

● BAM
● Need for scalability
● Scaling BAM components
● Results
● BAM deployment patterns