C* Summit EU 2013: Blending Cassandra Data into the Mix
DESCRIPTION
Speaker: Matt Casters, Chief Architect & PDI/Kettle Project Founder at Pentaho
Video: http://www.youtube.com/watch?v=r7BEp-C60bQ&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=8
Traditionally, data is delivered to business analytics tools through a relational database. However, there are cases where that can be inconvenient, for example when the volume of data is just too high or when you can't wait until the database tables are updated. This presentation by Pentaho Kettle founder Matt Casters demonstrates a data "blending" solution, which allows a data integration user to create a transformation capable of delivering data directly to Pentaho, and other, business analytics tools. Matt will demonstrate taking data from Cassandra, blending it with other data from both SQL and NoSQL sources, and then visualizing that data. He will also explain how it becomes possible to create a virtual "database" with "tables" where the data actually comes from a transformation step.
TRANSCRIPT
#CASSANDRAEU | CASSANDRA SUMMIT EU
Blending Cassandra Data Into the Mix
Matt Casters | Chief Architect, Data Integration at Pentaho; Kettle Project Founder
What we will discuss today…
* About Pentaho
* Blended Big Data Integration
* Demo
* Takeaway & QA
About Pentaho
Our mission and key takeaways
Pentaho Mission
Enabling the future of analytics
Modern unified business analytics and data integration platform
• Full spectrum of advancing analytics for all key roles
• Embeddable, cloud-ready analytics
• Big data blending for analytics in real-time environments
• Broadest and deepest big data integration
Innovation through open source
• Open, pluggable, purpose built for the future
• Early sustained leadership in big data ecosystem with technology innovation
Critical mass achieved
• Over 1,200 commercial customers
• Over 10,000 production deployments
Pentaho and Cassandra
* ETL and Analytics that complement Cassandra
* Create data transformations from source systems into Cassandra, and Cassandra to target systems, via drag and drop
* Quickly visualize and explore data inside Cassandra with Pentaho Data Services
* Deeper Cassandra/Pentaho integration in development
* Keep up with the latest Cassandra developments
* Provide underlying API compatibility layer
The New Reality
Simplified Analysis for all Users
ANY Analytics
• Reports
• Dashboards
• Visualizations
• Discovery
• Predictive Analytics
ANY Environment
• Data warehouses
• Data marts
• Stack vendors
• Cloud
• Embedded
Existing & New Data
Infrastructure & Processes
ANY Data
• Relational
• Operational
• Big Data
• Data sources not yet anticipated…
[Diagram labels: Billing, Location, Social Media, Customer, Web, Network]
Pentaho 5.0 Architected for the Future
Simplified analytics experience for all users
Simplified Analytics Experience
Enterprise Big Data Integration
Blended Big Data
Basic Cassandra Use Case
• Enterprise Customer Data Store
Source Systems
…
Pentaho Data Integration
Enterprise Data Store
Pentaho Data Integration
Target Systems
System Scope
Pentaho Analytics
• Reporting
• Dashboards
• Visualization
• Discovery
• Visual ETL development with Pentaho Data Integration
• Reporting, Dashboards, Visualization and Data discovery with full spectrum analytics
Big Data Orchestration
Orchestration Toolkit
Pentaho Visual Development
Would you rather do this?
Integrate, Manipulate, Ingest
… or this?
Schedule
Model
Cassandra cluster
Analytics
Broad Connectivity
PDI
Blending data
When copying data all over the place stops making sense
Analytics on Cassandra – Two Approaches
Cassandra cluster
Analytics
PDI Data Services
Direct Access
RDBMS
PDI ETL
Analytics
Access via Database
Direct Access to Cassandra Data
PDI ETP
Extract -> Transform -> Present
Pentaho Operational Reports
Pentaho Operational Dashboards
Cassandra cluster
Pentaho Operational Dashboards
Architected Access for Reliable Executive Insight
Customer Value from Big Data
Monetizing big data-driven use cases driving need to blend data
Improve operational effectiveness
• Machines/sensors: predict failures, network attacks
• Financial risk management: reduce fraud, increase security
Reduce data warehouse cost
• Integrate new data sources without increased database cost
• Provide online access to ‘dark data’
Drive incremental revenue
• Predict customer behavior across all channels
• Understand and monetize customer behavior
• Begin to monetize data as a service
Why Blending at the Source Matters
Customer Experience Analytics for loyalty and revenue
Analytics
Analyze quality of service:
• Network outages
• Dropped calls
• Poor quality
• Calls to support center
For profiles of customers:
• Up for renewal
• Profitable
• Multiple agreements/services
• In competitive area
Determine best action to take:
• Billing Credit
• Customer Coupon
• No Action
EDW
Existing ETL Tool or PDI
Customer
Billing
Provisioning
Call Detail Records from:
• Billing
• Payment
• Usage
NoSQL
Network
Location
PDI
Call Detail Records from Network:
• Outages
• Drops
• Service Quality
PDI
Blend revenue-related and quality-of-service data together to find customers at risk
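The blend described on this slide can be illustrated with a minimal Python sketch. It is not Pentaho's implementation: an in-memory SQLite table stands in for the relational customer data, a plain list of dictionaries stands in for network event rows read from a NoSQL store such as Cassandra, and every table name, field name, and threshold below is a hypothetical chosen for the example.

```python
import sqlite3
from collections import Counter


def load_customer_profiles():
    """Relational side (stand-in for the EDW): hypothetical customer profiles."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "id TEXT PRIMARY KEY, monthly_revenue REAL, up_for_renewal INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [("c1", 120.0, 1), ("c2", 15.0, 1), ("c3", 90.0, 0)],
    )
    rows = conn.execute(
        "SELECT id, monthly_revenue, up_for_renewal FROM customer"
    ).fetchall()
    conn.close()
    return {
        cid: {"monthly_revenue": revenue, "up_for_renewal": bool(renewal)}
        for cid, revenue, renewal in rows
    }


def load_network_events():
    """NoSQL side: hypothetical quality-of-service events keyed by customer."""
    return [
        {"customer_id": "c1", "event": "dropped_call"},
        {"customer_id": "c1", "event": "dropped_call"},
        {"customer_id": "c1", "event": "dropped_call"},
        {"customer_id": "c2", "event": "dropped_call"},
        {"customer_id": "c2", "event": "dropped_call"},
        {"customer_id": "c3", "event": "dropped_call"},
        {"customer_id": "c3", "event": "outage"},
    ]


def blend(profiles, events, drop_threshold=2, revenue_threshold=50.0):
    """Join both sources on customer id and pick an action per customer."""
    drops = Counter(
        e["customer_id"] for e in events if e["event"] == "dropped_call"
    )
    blended = []
    for cid, profile in sorted(profiles.items()):
        dropped = drops.get(cid, 0)
        # Assumed rule: at risk = up for renewal and poor service quality.
        at_risk = profile["up_for_renewal"] and dropped >= drop_threshold
        if not at_risk:
            action = "no_action"
        elif profile["monthly_revenue"] >= revenue_threshold:
            action = "billing_credit"
        else:
            action = "customer_coupon"
        blended.append(
            {"customer_id": cid, "dropped_calls": dropped, "action": action}
        )
    return blended


result = blend(load_customer_profiles(), load_network_events())
for row in result:
    print(row)
```

The point of the sketch is that neither source is copied into the other: the join happens in the blending step, at the moment the result is requested.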
Accurate, Blended Big Data Analytics
Optimally stored data, blended when needed
• Just in time blending of data from multiple sources for a complete picture
• Connect, combine and transform data from multiple sources
• Query data directly from any transformation
• Access architected blends with the full spectrum of Pentaho Analytics
• Manage governance and security of data for on-going accuracy
EDW
Existing ETL Tool or PDI
Customer
Billing
Provisioning
NoSQL
Network
Location
PDI
PDI Analytics
Just in time blending
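The idea of querying a transformation as if it were a table can be sketched as follows. This is only an analogy under stated assumptions, not how Pentaho Data Services work internally: a Python generator plays the role of the transformation step, and an in-memory SQLite table named `blended` (a hypothetical name, like every field and value here) is rebuilt from it each time a query arrives.

```python
import sqlite3


def transformation_step():
    """Hypothetical transformation: yields blended rows on demand."""
    usage_minutes = {"c1": 300, "c2": 45}  # stand-in for one source
    dropped_calls = {"c1": 3, "c2": 0}     # stand-in for another source
    for cid in sorted(usage_minutes):
        yield (cid, usage_minutes[cid], dropped_calls.get(cid, 0))


def query_transformation(sql, params=()):
    """Run the transformation, expose its output as a table, then query it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE blended ("
            "customer_id TEXT, minutes INTEGER, dropped_calls INTEGER)"
        )
        # The table is filled at query time, so nothing is stored beforehand.
        conn.executemany(
            "INSERT INTO blended VALUES (?, ?, ?)", transformation_step()
        )
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


rows = query_transformation(
    "SELECT customer_id FROM blended WHERE dropped_calls > ? "
    "ORDER BY customer_id",
    (1,),
)
print(rows)
```

Because the transformation runs again for every query, the result reflects the sources as they are at query time, which is the "just in time" property the slide describes.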
Bring More Big Data to Life
Adaptive Big Data Layer: broadest, deepest big data support
Broadest options for storing and blending data
• New analytic use case templates for Hadoop and Splunk
• Deeper NoSQL integration to and direct reporting
• Hadoop high availability support with MapR
• Expanded big data integration
• New integrations: Redshift, Impala and Splunk
• New certifications: DataStax, Cassandra, Intel, Hortonworks, latest Cloudera, MapR, MongoDB, …
Demo!
Demonstrate how to easily write to and read from Cassandra
Demonstrate how to blend data
Takeaways…
Pentaho 5.0 key takeaways
Meeting the demands of the big data-driven enterprise
Blended Big Data at the source for more accurate insights
Enterprise-ready data integration and simplified embedding for any environment
Simplified analytics experience with a new modern interface
Analytics
Blended Big Data
Enterprise Big Data Integration
THANK YOU
Any questions?
blog.pentaho.com
@Pentaho
Facebook.com/Pentaho
Pentaho Business Analytics
www.pentaho.com