WSO2 Product Release Webinar: WSO2 Data Analytics Server 3.0
WSO2 Data Analytics Server 3.0.0 Product Release Webinar
Inosh Goonewardena, Associate Technical Lead
WSO2 Analytics Platform
Data Processing Pipeline
Introducing WSO2 Data Analytics Server
● Fully open-source solution for building systems and applications that collect and analyze data and communicate the results.
● Embodies the WSO2 Analytics Platform by combining batch, real-time, interactive and predictive analytics capabilities
● High performance data capture framework
● Highly available and scalable by design
Advantages of DAS 3.0 over WSO2 BAM 2.5.0
● Complete rewrite from the ground up, with performance and extensibility as core values
● Faster analytics powered by Apache Spark, 10x - 100x speedup
● Rich indexing support, with near real-time text search
● Pluggable data store support, from lightweight embedded RDBMS to highly scalable HBase/HDFS
● Revamped Analytics Dashboard with wizard-based gadget generation
WSO2 DAS Architecture
Collecting Data
Data Model
{
  "name": "stream.name",
  "version": "1.0.0",
  "nickName": "stream nickname",
  "description": "description of the stream",
  "metaData": [
    {"name": "meta_data_1", "type": "STRING"}
  ],
  "correlationData": [
    {"name": "correlation_data_1", "type": "STRING"}
  ],
  "payloadData": [
    {"name": "payload_data_1", "type": "BOOL"},
    {"name": "payload_data_2", "type": "LONG"}
  ]
}
● Published data conforms to a strongly typed data stream
● One API for Batch and Real-time Analytics.
● Its asynchronous, non-blocking nature enables extremely fast writes.
● Supports multiple transport adapters for data collection
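Because published data must conform to a strongly typed stream, a receiver can reject malformed events before they reach storage or processing. The sketch below shows that idea in plain Python against the stream definition from the Data Model slide. It assumes an event carries one positional value list per section (metaData, correlationData, payloadData); the TYPE_CHECKS mapping and validate_event function are hypothetical illustrations, not the WSO2 data publisher API.

```python
import json

# Stream definition from the Data Model slide (valid JSON form).
STREAM_DEF = json.loads("""
{
  "name": "stream.name",
  "version": "1.0.0",
  "metaData": [{"name": "meta_data_1", "type": "STRING"}],
  "correlationData": [{"name": "correlation_data_1", "type": "STRING"}],
  "payloadData": [
    {"name": "payload_data_1", "type": "BOOL"},
    {"name": "payload_data_2", "type": "LONG"}
  ]
}
""")

# Hypothetical mapping from stream attribute types to Python type checks.
TYPE_CHECKS = {
    "STRING": lambda v: isinstance(v, str),
    "BOOL": lambda v: isinstance(v, bool),
    "INT": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "LONG": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "FLOAT": lambda v: isinstance(v, float),
    "DOUBLE": lambda v: isinstance(v, float),
}


def validate_event(stream_def, event):
    """Return a list of error strings; an empty list means the event conforms."""
    errors = []
    for section in ("metaData", "correlationData", "payloadData"):
        attrs = stream_def.get(section, [])
        values = event.get(section, [])
        if len(values) != len(attrs):
            errors.append(f"{section}: expected {len(attrs)} values, got {len(values)}")
            continue
        # Check each positional value against the declared attribute type.
        for attr, value in zip(attrs, values):
            if not TYPE_CHECKS[attr["type"]](value):
                errors.append(
                    f"{section}.{attr['name']}: expected {attr['type']}, got {value!r}"
                )
    return errors
```

For example, validate_event(STREAM_DEF, {"metaData": ["m"], "correlationData": ["c"], "payloadData": [True, 42]}) returns an empty list, while passing "yes" as payload_data_1 yields a single type error.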
Data Receiver
Highly Pluggable Event Receiver Architecture
Data Persistence
● Data Abstraction Layer to enable pluggable data connectors
○ RDBMS, Cassandra and HBase/HDFS connectors are offered; custom connectors can easily be written
● Analytics Table
○ The data persistence entity in WSO2 Data Analytics Server
○ Provides a backend-data-source-agnostic way of storing and retrieving data
○ Allows applications to be written so that they do not depend on a specific data source API, e.g. JDBC (RDBMS), Cassandra APIs, etc.
○ WSO2 DAS provides a standard REST API for accessing the Analytics Tables
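The point of the data abstraction layer is that application code talks to an Analytics Table, never to JDBC or Cassandra directly. A minimal Python sketch of that separation follows; RecordStore, InMemoryRecordStore and AnalyticsTable are hypothetical names chosen for illustration, and the in-memory backend merely stands in for an RDBMS or HBase connector that would implement the same two methods.

```python
from abc import ABC, abstractmethod


class RecordStore(ABC):
    """Hypothetical backend interface; applications never see which database is behind it."""

    @abstractmethod
    def put(self, table, records): ...

    @abstractmethod
    def get(self, table): ...


class InMemoryRecordStore(RecordStore):
    """Stand-in backend; an RDBMS or HBase connector would implement the same methods."""

    def __init__(self):
        self._tables = {}

    def put(self, table, records):
        self._tables.setdefault(table, []).extend(records)

    def get(self, table):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._tables.get(table, []))


class AnalyticsTable:
    """Application-facing handle: code written against it is backend-agnostic."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def insert(self, *records):
        self._store.put(self.name, list(records))

    def scan(self):
        return self._store.get(self.name)
```

Swapping InMemoryRecordStore for another RecordStore subclass changes where data lives without touching any code that uses AnalyticsTable.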
Data Persistence
● Analytics Record Stores
○ An Analytics Record Store houses a specific set of Analytics Tables
○ The Analytics Record Stores used for storing incoming events and query processing output are configurable
○ Single Analytics Table namespace; the target record store is given only at table creation time
○ Useful for creating Analytics Tables whose data is stored in multiple target databases
● Analytics File System
○ The location where the indexing data is stored
○ Multiple implementations are provided out of the box, or custom implementations can be written
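The record-store model (one table namespace, with the backing store chosen once at creation time) can be illustrated with a small routing table. In this hypothetical sketch, plain dicts stand in for record stores, and the store names events_db and results_db are invented for the example; they are not DAS configuration values.

```python
class AnalyticsDataService:
    """Hypothetical service: one table namespace spread over several named record stores."""

    def __init__(self, record_stores, default_store):
        self._stores = record_stores   # store name -> dict acting as that store's backend
        self._default = default_store
        self._table_to_store = {}      # single namespace: table name -> store name

    def create_table(self, table, store_name=None):
        # The target record store is fixed only when the table is created.
        store_name = store_name or self._default
        if table in self._table_to_store:
            raise ValueError(f"table {table!r} already exists")
        if store_name not in self._stores:
            raise KeyError(f"unknown record store {store_name!r}")
        self._table_to_store[table] = store_name

    def put(self, table, record):
        # Callers address tables by name only; routing to the store is internal.
        self._stores[self._table_to_store[table]].setdefault(table, []).append(record)

    def get(self, table):
        return list(self._stores[self._table_to_store[table]].get(table, []))

    def store_of(self, table):
        return self._table_to_store[table]
```

For instance, raw events can land in events_db while a summary table created with store_name="results_db" lives elsewhere, yet both are read and written through the same put/get calls.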
Analyzing Data
Batch Analytics
Batch Analytics - Overview
● Powered by Apache Spark for 10x-100x higher performance than Hadoop
● Parallel, distributed, with optimized in-memory processing
● Scalable script-based analytics written using an easy-to-learn, SQL-like query language powered by Spark SQL
● Interactive built-in web interface for ad-hoc query execution
● Scheduled query script execution support with high availability and failover
● Run Spark on a single node, in a Spark-embedded Carbon server cluster, or connect to an external Spark cluster
create temporary table product_data using CarbonAnalytics
options (schema …)
create temporary table products using CarbonAnalytics
options (schema …)
insert into products select product_name from product_data
group by …
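The script above follows a common batch pattern: read one table, aggregate it, and write the result into another table. To show that pattern runnably without a Spark installation, the sketch below swaps in Python's built-in sqlite3 module; it does not use Spark SQL or the CarbonAnalytics provider, and the column names product_name and quantity are hypothetical because the slide elides the real schemas.

```python
import sqlite3


def summarize_products(rows):
    """Run an insert-into-select-group-by job, with SQLite standing in for Spark SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schemas; the slide elides the real ones as "schema …".
        conn.execute("CREATE TABLE product_data (product_name TEXT, quantity INTEGER)")
        conn.execute("CREATE TABLE products (product_name TEXT, total_quantity INTEGER)")
        conn.executemany("INSERT INTO product_data VALUES (?, ?)", rows)
        # Same shape as the batch script: aggregate one table into another.
        conn.execute(
            "INSERT INTO products "
            "SELECT product_name, SUM(quantity) FROM product_data GROUP BY product_name"
        )
        return conn.execute(
            "SELECT product_name, total_quantity FROM products ORDER BY product_name"
        ).fetchall()
    finally:
        conn.close()
```

Calling summarize_products([("apple", 3), ("pear", 1), ("apple", 2)]) returns [("apple", 5), ("pear", 1)]. In DAS the same SQL would run distributed over Analytics Tables.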
Batch Analytics - Spark SQL
Batch Analytics - Interactive Console
Batch Analytics - Spark Scripts
Interactive Analytics
● Full text data indexing support powered by Apache Lucene
● Drill-down search support
● Distributed data indexing
○ Designed to support scalability
● Near real-time data indexing and retrieval
○ Data indexed immediately as received
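The two ideas in this slide, indexing on receipt and drill-down search, can be shown with a toy inverted index. This Python sketch is only an illustration of the concepts; DAS itself uses Apache Lucene, and the TinyIndex class and its facet filtering are hypothetical simplifications.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: documents become searchable as soon as add() returns."""

    def __init__(self):
        self._docs = {}
        self._postings = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc_id, text, facets):
        # Index immediately on receipt, mirroring "data indexed immediately as received".
        self._docs[doc_id] = {"text": text, "facets": facets}
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query, **drill_down):
        # AND semantics over query terms, then narrow by facet values (drill-down).
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        for facet, value in drill_down.items():
            hits = {d for d in hits if self._docs[d]["facets"].get(facet) == value}
        return sorted(hits)
```

A search such as idx.search("payment failed", region="US") first matches the text terms and then drills down to documents whose region facet is "US".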
Real-time Analytics
What is Real-time Analytics?
● Gather data from multiple sources
● Correlate data streams over time
● Find interesting occurrences
● And notify
● All in real time
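The steps above (gather, correlate over time, detect, notify) can be sketched with a sliding time window. This Python class is a hypothetical illustration, not a Siddhi query; it assumes events arrive in timestamp order and that a shared key (for example a user ID) correlates events coming from different sources.

```python
from collections import deque


class SlidingWindowAlert:
    """Counts events per key in a sliding time window and fires when a threshold is reached."""

    def __init__(self, window_seconds, threshold, notify):
        self.window = window_seconds
        self.threshold = threshold
        self.notify = notify   # callback implementing the "and notify" step
        self._events = {}      # key -> deque of timestamps inside the window

    def on_event(self, key, timestamp):
        # Assumes timestamps for a key arrive in non-decreasing order.
        q = self._events.setdefault(key, deque())
        q.append(timestamp)
        # Expire timestamps that have slid out of the window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        if len(q) >= self.threshold:
            self.notify(key, len(q), timestamp)
```

With a 60-second window and a threshold of 3, three events for the same key at t=0, 10 and 20 trigger one notification, while a fourth event at t=100 does not, because the earlier events have expired.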
Predictive Analytics (upcoming)
What is Predictive Analytics?
● Extract, pre-process, and explore data
● Create models, tune algorithms and make predictions
● Integrate for better intelligence
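As a generic illustration of the pre-process, model and predict steps listed above, the sketch below drops incomplete rows and fits an ordinary least-squares line. It is not the algorithm set of the upcoming predictive feature; the functions prepare, fit_line and predict are hypothetical, and simple linear regression is chosen only because it is the smallest complete example of training and prediction.

```python
def prepare(rows):
    """Pre-processing step: drop rows with missing values before training."""
    clean = [(x, y) for x, y in rows if x is not None and y is not None]
    return [x for x, _ in clean], [y for _, y in clean]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict(model, x):
    a, b = model
    return a + b * x
```

For rows [(1, 3), (2, 5), (None, 7), (3, 7), (4, None)], prepare keeps three points, fit_line returns (1.0, 2.0), and predict gives 21.0 for x = 10.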
Communicating Results
Dashboards
● “Overall idea” at a glance (e.g. a car dashboard)
● Support for personalization: you can build your own dashboard
● The entry point for drill-down
● Building a custom dashboard
○ Dashboard via Google Gadgets, with content via HTML5 + JavaScript
○ Leverages WSO2 User Engagement Server to build the dashboard
○ Uses charting libraries such as Vega and D3.js
Dashboards: Gadget Generation Wizard
● Start with data in tabular format
● Map each column to a dimension in your plot, such as X, Y, color, point size, etc.
● Also supports drill-downs
● Create a chart with a few clicks
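The wizard workflow (tabular data in, columns mapped to visual dimensions, chart out) can be pictured as building a declarative chart specification. The Python sketch below emits a Vega-Lite-style dictionary using the standard x, y, color and size encoding channels. Whether the DAS wizard produces exactly this structure is an assumption, and build_chart_spec, together with its first-row type inference, is hypothetical.

```python
def build_chart_spec(rows, mapping, mark="point"):
    """Turn tabular rows plus a column-to-channel mapping into a Vega-Lite-style spec."""
    columns = set(rows[0]) if rows else set()
    encoding = {}
    for channel, column in mapping.items():
        if column not in columns:
            raise KeyError(f"column {column!r} not in data")
        # Guess the field type from the first row: numbers are quantitative, others nominal.
        sample = rows[0][column]
        is_number = isinstance(sample, (int, float)) and not isinstance(sample, bool)
        encoding[channel] = {
            "field": column,
            "type": "quantitative" if is_number else "nominal",
        }
    return {"data": {"values": rows}, "mark": mark, "encoding": encoding}
```

Mapping {"x": "cpu", "y": "mem", "color": "server"} over rows of server metrics yields a point chart with two quantitative axes and one color per server, which is the "few clicks" result expressed as data.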
Alerts
● Detecting conditions can be done via CEP queries
● The “last mile” is key
○ Email
○ SMS
○ Push notifications to a UI
○ Pager
○ Trigger a physical alarm
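Detecting a condition is only half the job; the alert still has to reach someone over one of the last-mile channels listed above. This hypothetical Python sketch routes an alert to channels by severity. The channels are plain callables standing in for real email, SMS or pager publishers, and Alert and AlertDispatcher are illustrative names, not DAS output adapters.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    rule: str
    message: str
    severity: str = "WARN"


@dataclass
class AlertDispatcher:
    """Routes a detected condition to every last-mile channel registered for its severity."""

    channels: Dict[str, Callable[[Alert], None]] = field(default_factory=dict)
    routes: Dict[str, List[str]] = field(default_factory=dict)  # severity -> channel names

    def register(self, name, sender):
        self.channels[name] = sender

    def dispatch(self, alert):
        sent = []
        for name in self.routes.get(alert.severity, []):
            self.channels[name](alert)  # e.g. email, SMS, UI push, pager
            sent.append(name)
        return sent
```

Keeping routing separate from detection means a CRITICAL alert can fan out to email, SMS and pager while a WARN alert only sends email, without changing the detection query.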
APIs
● With mobile apps, most data is exposed and shared with end users as APIs (REST/JSON).
● Analytics results can be exposed through APIs
○ REST API
○ JavaScript API
What can WSO2 DAS do for you?
Common Use Cases of WSO2 DAS
● KPI Statistics
○ Application Statistics Monitoring
○ Network / Service Statistics
○ Sensor Data Aggregation
● Solving Optimization Problems
○ Urban Planning
○ Revenue Distribution Analysis
● Activity Monitoring
○ Tracking Message Flows
● HL7 Data Exploration
○ ESB HL7 Transport Interfaced with DAS
● Log Analysis
○ Application / System Logs
● Sports
○ Real-time Analysis of Player Performance
○ Real-time Match Analysis
● Geo-Spatial
○ Traffic Monitoring and Alerting
○ Geo-fencing
● Anomaly Detection
○ Fraud Detection
○ Network Intrusion Detection
○ Server Health Monitoring
API Statistics
HTTP Monitoring
Activity Monitoring
Activity monitoring tracks events from multiple nodes in a flow to understand a specific activity.
● Example:
○ A client initiates a web service request that travels through multiple ESBs and application servers and then returns. This flow is uniquely identified and visualized in DAS.
● Used for tracing messages and finding performance hotspots in the flow
● Implemented using a correlation-ID-based mechanism on top of Interactive Analytics
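The correlation-ID mechanism can be illustrated by grouping events that share an ID, ordering them by time, and measuring the gap between consecutive hops to find the slowest one. In this hypothetical sketch the field names correlation_id, node and timestamp are assumptions for illustration, not the DAS activity event schema.

```python
from collections import defaultdict


def trace_flows(events):
    """Group events by correlation ID and report each flow's path and slowest hop."""
    flows = defaultdict(list)
    for e in events:
        flows[e["correlation_id"]].append(e)
    report = {}
    for cid, evs in flows.items():
        evs.sort(key=lambda e: e["timestamp"])
        # Latency of each hop = time between consecutive events in the same flow.
        hops = [
            (a["node"], b["node"], b["timestamp"] - a["timestamp"])
            for a, b in zip(evs, evs[1:])
        ]
        slowest = max(hops, key=lambda h: h[2]) if hops else None
        report[cid] = {"path": [e["node"] for e in evs], "slowest_hop": slowest}
    return report
```

Events may arrive out of order from different nodes; sorting within each correlation ID reconstructs the path (for example ESB-1, ESB-2, AppServer) and exposes the hop where the request spent the most time.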
Fraud Detection
● Built for detecting credit card fraud
● The rules are extensible with customized Siddhi execution plans for any type of fraud detection
● Currently leverages Real-time and Interactive Analytics features
Source: multichannelmerchant.com
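In DAS the fraud rules are expressed as Siddhi execution plans; the Python sketch below is not Siddhi, only an illustration of what one extensible rule might check. The AmountSpikeRule class, its factor and min_history parameters, and the "amount far above the card's running average" heuristic are all hypothetical choices made for the example.

```python
class AmountSpikeRule:
    """Flags a transaction that exceeds `factor` times the card's running average."""

    def __init__(self, factor=5.0, min_history=3):
        self.factor = factor
        self.min_history = min_history
        self._stats = {}  # card id -> (transaction count, total amount)

    def check(self, card, amount):
        count, total = self._stats.get(card, (0, 0.0))
        # Only judge cards with enough history to have a meaningful average.
        suspicious = count >= self.min_history and amount > self.factor * (total / count)
        # Update the history after deciding, so the current amount does not mask itself.
        self._stats[card] = (count + 1, total + amount)
        return suspicious
```

For a card with purchases of 20, 30 and 25, a following 500 transaction is flagged because it exceeds five times the average of 25. Additional rules of this kind could run side by side, which is the extensibility the slide describes.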
Log Analysis
● Distributed indexing and searching of any type of log stored in the system
● Notification support with real-time event processing features
● Application / server health prediction with Machine Learning
● Utilizes Interactive Analytics, Real-time Analytics and Machine Learning features
Source: www.retrospective.centeractive.com
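Before logs can be indexed and searched, raw lines have to become structured records. The sketch below assumes a hypothetical log layout ("date time LEVEL [component] message") and parses it with a regular expression, then runs one simple query over the records. Real deployments would need a pattern per log format, and parse_logs and errors_by_component are illustrative names only.

```python
import re

# Hypothetical log layout: "2015-10-20 10:15:32,123 ERROR [component] message"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) \[(?P<component>[^\]]+)\] (?P<message>.*)$"
)


def parse_logs(lines):
    """Turn raw log lines into structured records ready for indexing; skip non-matching lines."""
    records = []
    for line in lines:
        m = LOG_PATTERN.match(line.strip())
        if m:
            records.append(m.groupdict())
    return records


def errors_by_component(records):
    """A simple query over the parsed records: count ERROR entries per component."""
    counts = {}
    for r in records:
        if r["level"] == "ERROR":
            counts[r["component"]] = counts.get(r["component"], 0) + 1
    return counts
```

Lines that do not match the pattern, such as stack-trace continuations, are skipped here; a fuller version might attach them to the preceding record instead.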
Urban Route Planning
Product Demonstration
Questions?