Copyright © 2018 MicroStrategy Incorporated. All Rights Reserved.
Analytics for NoSQL Sources
How to Build Dossiers with MongoDB, Cassandra, and other NoSQL Sources
Andrew Kern, MicroStrategy World 2018
Safe Harbor Notice
This presentation describes features that are under development by MicroStrategy. The objective of this presentation is to provide insight into MicroStrategy’s technology direction. The functionalities described herein may or may not be released as shown.
This presentation contains statements that may constitute “forward-looking statements” for purposes of the safe harbor provisions under the Private Securities Litigation Reform Act of 1995, including estimates of future technology releases. Forward-looking statements inherently involve risks and uncertainties that could cause actual results of MicroStrategy Incorporated and its subsidiaries (collectively, the “Company”) to differ materially from the forward-looking statements.
Factors that could contribute to such differences include: the Company’s ability to develop, market and deliver on a timely and cost-effective basis new or enhanced offerings that respond to technological change or new customer requirements; delays in the Company’s ability to develop or ship new products; the extent and timing of market acceptance of MicroStrategy’s new offerings; continued acceptance of the Company’s other products in the marketplace; competitive factors; general economic conditions; and other risks detailed in the Company’s registration statements and periodic reports filed with the Securities and Exchange Commission. By making these forward-looking statements, the Company undertakes no obligation to update these statements for revisions or changes after the date of this presentation.
3 Database Admins walked into a NoSQL bar. A little while later, they walked out because they couldn’t find a table.
Agenda
• MicroStrategy Historical Connectivity
• Evolution of Big Data
• NoSQL Overview
• MongoDB
• Cassandra / HBase
• Apache Drill / Presto
• Suggestions/Best Practices
• Q&A
RDBMS Platforms
Traditional Enterprise DW
Schema Model
Connecting to Data
Traditional SQL based access for reporting and dashboarding
Uses Project Schema and Developer to build models on top of Hadoop
Build reports, documents and dashboards via live connect or in-memory cubes
Preferred method if requirements include:
• Leverage Hadoop layer security at runtime
• Live connection
• Project schema is required
High-performance, parallelized native access to Hadoop
Uses Web Data Import functionality to publish in-memory cubes for access and modeling
Build reports, documents and dashboards on top of in-memory cubes
Preferred method if requirements include:
• Bulk data transfer into memory
• Data wrangling
• Browse and preview Hadoop files via import GUI
Multiple SQL Platforms
Leveraging disparate data sources
Data Import
Multi-Source & Bring-your-own-Data
Evolution of Databases
RDBMS Strengths
• Clearly structured tables
• ANSI SQL Standard, relatively end user friendly
• Filtering/joining capabilities
RDBMS Limitations
• Not necessarily ideal for heavy read/write concurrency
• Clear structure is not developer friendly, requires additional ETL to take initial data streams and load them into RDBMS
• Not entirely scalable for the exponential growth/streams of data existing in modern web applications
• Not traditionally ideal for scanning wide tables
Evolution of Databases
Industry Drivers
• Increasing volumes of incoming data via web applications
• Need to provide quick performance on read actions for web users
• Need for consistency and high availability
Evolution
• Hadoop/HDFS
• Columnar Databases
• MPP platforms
• Key Value stores
• Document data stores (JSON, XML)
NoSQL Databases: How we define them
Structure
• Data is stored in non-relational structures, such as key-value stores or document-like structures such as XML and JSON
Flexibility
• Allows data to be stored in a framework that is friendlier to developers, providing a simpler write-back methodology
• Offers increased flexibility compared to RDBMS systems (CAP Theorem)
Use cases
• Web development
• Real-time data streams
Popular NoSQL Data Sources
DBMS           Ranking (January 2018)   Database Model
MongoDB        5                        Document Store
Cassandra      8                        Wide Column Store
HBase          16                       Wide Column Store
Apache Drill   74                       Document Store/RDBMS
Presto         124                      Relational DBMS
*Note: Ranking is out of 341 DB systems. Data comes from http://db-engines.com/en/ranking
Strengths of NoSQL: Why should I leverage NoSQL databases in my ecosystem?
Developer friendly
• No need to convert document ‘objects’ into an RDBMS framework; simply write them as-is into collections
Flexible storage
• Not constrained by rigid limitations on columns within a table
• e.g. Document 1 contains elements for attributes A, B, and C; Document 2 contains elements for attributes A, C, D, and E
Real time
• The ability to quickly locate a specific document within a collection supports fast read performance
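The flexible-storage and quick-lookup points above can be illustrated with a minimal Python sketch. It uses a plain dictionary as a hypothetical stand-in for a collection (the names `collection`, `insert`, and `find_by_id` are invented for illustration and are not any database's API); it only shows that two documents with different attribute sets can sit side by side and be fetched directly by key.

```python
# Hypothetical in-memory stand-in for a document collection, keyed by document id.
collection = {}

def insert(doc_id, document):
    """Store the document as-is; no table schema has to be altered first."""
    collection[doc_id] = document

# Document 1 carries attributes A, B, C; Document 2 carries A, C, D, E.
insert(1, {"A": 10, "B": "north", "C": True})
insert(2, {"A": 20, "C": False, "D": [1, 2], "E": {"nested": "ok"}})

def find_by_id(doc_id):
    """Direct key lookup: a single dictionary access, no scan and no join."""
    return collection[doc_id]

def attributes_in_use():
    """Union of the attribute names that appear across all documents."""
    names = set()
    for document in collection.values():
        names.update(document)
    return sorted(names)

print(find_by_id(2)["D"])
print(attributes_in_use())
```

A relational table would need columns A through E (with NULLs) to hold both records; here each document simply omits what it does not have.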
NoSQL and MicroStrategy: Connectivity
Certified Databases
• MongoDB (3.x)
• Apache Cassandra (3.x)
• MarkLogic (7)
• HBase
• Presto
• Drill 1.10
Connectivity
• ODBC and JDBC connectivity (Presto is JDBC only)
• MicroStrategy currently ships an ODBC driver with the MicroStrategy Intelligence Server for MongoDB
• Supports access through both traditional Project Schema tables and attributes/metrics, as well as through Data Import
MongoDB: Overview
Concepts
RDBMS       MongoDB
Database    Database
Table       Collection
Row         Document
Index       Index
Join        Embedded Documents

Document Data Model
{
  Emp ID: 007,
  Name: “James”,
  Age: 62,
  Department: {
    Dep ID: 012,
    Dep Name: “Agent”,
    Priority: “high”
  }
}

Relational Model
Employee:       int Emp_ID, String Name, int Age
Department:     int Dep_ID, String Dep_Name, String Priority
Relation Table: Emp_ID, Dep_ID
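To make the Join versus Embedded Documents row of the concept mapping concrete, here is a small Python sketch that models both layouts from the slide with plain lists and dictionaries. It is not MongoDB or SQL client code; the function names are hypothetical, and the ids are written as 7 and 12 because Python does not allow leading zeros in integer literals.

```python
# Relational model: three tables, with the relation table linking the other two.
employees = [{"Emp_ID": 7, "Name": "James", "Age": 62}]
departments = [{"Dep_ID": 12, "Dep_Name": "Agent", "Priority": "high"}]
emp_dept = [{"Emp_ID": 7, "Dep_ID": 12}]

def department_via_join(emp_id):
    """Resolve an employee's department by walking the relation table."""
    dep_ids = {row["Dep_ID"] for row in emp_dept if row["Emp_ID"] == emp_id}
    return [dep for dep in departments if dep["Dep_ID"] in dep_ids]

# Document model: the department is embedded inside the employee document.
employee_doc = {
    "Emp ID": 7,
    "Name": "James",
    "Age": 62,
    "Department": {"Dep ID": 12, "Dep Name": "Agent", "Priority": "high"},
}

def department_via_embedding(document):
    """No join needed: the related data travels with the document."""
    return document["Department"]

print(department_via_join(7)[0]["Dep_Name"])
print(department_via_embedding(employee_doc)["Dep Name"])
```

Both calls reach the same department; the relational path touches three structures, while the document path reads one.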
MongoDB: Benefits
• Easy to scale out - Transparent to Application
• High Availability
• Flexible Data Model
• Intuitive Design for Programmers
• Good query performance for Data Consumers
• BSON (Binary JSON) Storage - fast for scan (http://bsonspec.org/)
MongoDB: MicroStrategy Integration
• Support first introduced in MicroStrategy 9.4.1
• MongoDB 3.x certified in MicroStrategy 10.7; now certified and available in MicroStrategy 10.4.5 and later
• ODBC and JDBC drivers are shipped with MicroStrategy releases
• The SQL engine generates SQL that is optimized for MongoDB
• 65 out of 131 analytical functions are pushed down
MongoDB: MicroStrategy Integration - Demo
Wide Column Store (Cassandra/HBase): Intro
• Based on Google’s BigTable
• Very large number of columns
• Two-dimensional key-value store
[Diagram: a row identified by a Row Key spans Column Family 1 and Column Family 2; within a family, a Column Qualifier addresses a Cell]
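The row key, column family, column qualifier, and cell terms from the diagram can be sketched as nested dictionaries. The class below is a hypothetical teaching model written for this transcript, not the storage format or client API of Cassandra or HBase; the family and qualifier names are invented.

```python
from collections import defaultdict

class WideColumnTable:
    """Toy model: row key -> column family -> column qualifier -> cell value."""

    def __init__(self, column_families):
        # Column families are fixed up front; qualifiers inside them are free-form.
        self.column_families = set(column_families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family, qualifier):
        # Missing rows, families, or qualifiers simply yield None.
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)

table = WideColumnTable(["profile", "activity"])
table.put("user#1", "profile", "name", "James")
table.put("user#1", "activity", "2018-01-15", "login")
table.put("user#2", "profile", "name", "Ada")
table.put("user#2", "profile", "email", "ada@example.com")

print(table.get("user#2", "profile", "email"))
print(table.get("user#1", "profile", "email"))
```

Each row can carry a different set of qualifiers, which is how such a table can grow to a very large number of columns without a schema change.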
Cassandra and HBase: Comparison
Cassandra
• AP out of CAP
• High Availability (no single point of failure)
• BigTable as the data model
• Dynamo as the storage model
• CQL (Cassandra Query Language)
HBase
• CP out of CAP
• Strongly consistent read/write
• BigTable as the data model
• HDFS as storage
• Part of the Hadoop ecosystem; relies on ZooKeeper
CAP definitions
• Consistency: Every read receives the most recent write or an error
• Availability: Every request receives a (non-error) response, without guarantee that it contains the most recent write
• Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
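The difference between the AP and CP choices can be sketched with a deliberately tiny Python model of two replicas and a network partition. Everything here (`TwoReplicaStore`, `PartitionError`, `demo`) is hypothetical and heavily simplified; it does not reflect how Cassandra or HBase are implemented, only the two CAP definitions above.

```python
class PartitionError(Exception):
    """Raised when a CP-style store refuses to answer during a partition."""

class TwoReplicaStore:
    """Toy store with two replicas; mode is 'AP' or 'CP'."""

    def __init__(self, mode):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.values = [None, None]
        self.partitioned = False

    def write(self, node, value):
        if self.partitioned and self.mode == "CP":
            raise PartitionError("write cannot reach the other replica")
        self.values[node] = value
        if not self.partitioned:
            # Healthy network: replicate to the other node immediately.
            self.values[1 - node] = value

    def read(self, node):
        if self.partitioned and self.mode == "CP":
            raise PartitionError("replica cannot confirm it has the latest write")
        return self.values[node]

def demo(mode):
    store = TwoReplicaStore(mode)
    store.write(0, "v1")          # replicated to both nodes
    store.partitioned = True      # the network between the nodes fails
    try:
        store.write(0, "v2")      # AP accepts this on node 0 only
        return store.read(1)      # AP answers with node 1's stale value
    except PartitionError as error:
        return f"error: {error}"

print(demo("AP"))
print(demo("CP"))
```

In AP mode the read succeeds but returns the older value; in CP mode the store returns an error rather than risk a stale answer.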
MicroStrategy Integration: Cassandra
• Integrated Connector in MicroStrategy available initially in 10.8 (introduced later in 10.4.5)
• ODBC driver shipped out of the box with MicroStrategy
MicroStrategy Integration: HBase
• Connectivity to MicroStrategy can be achieved via Apache Phoenix
• MicroStrategy has certified HBase via Phoenix 4.x
Phoenix:
i. A query engine on top of HBase
ii. Maps HBase tables to relational tables
iii. Provides a JDBC connection
iv. HBase-specific push down
v. Translates SQL queries into HBase calls
vi. Executes scans in parallel
Apache Drill: Overview
• A SQL query layer against NoSQL data sources
• JSON data model
• Data-driven query: compiles queries on the fly without knowing the schema ahead of time
• Columnar execution: shredded, in-memory columnar execution
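As a rough illustration of the two ideas above (discovering the schema while reading, and shredding records into in-memory columns), the following Python sketch reads a few JSON lines with different fields and pivots them into per-field columns. It is a simplified stand-in, not Drill's actual execution engine; the sample records and the `shred` function are invented for this example.

```python
import json

# Hypothetical JSON records whose fields differ from row to row.
raw_lines = [
    '{"name": "James", "age": 62}',
    '{"name": "Ada", "city": "London"}',
    '{"name": "Lin", "age": 35, "city": "Taipei"}',
]

def shred(records):
    """Discover fields while reading and pivot rows into per-field columns."""
    columns = {}
    for row_index, record in enumerate(records):
        for field in record:
            if field not in columns:
                # New field seen for the first time: backfill earlier rows with None.
                columns[field] = [None] * row_index
        for field, values in columns.items():
            values.append(record.get(field))
    return columns

columns = shred(json.loads(line) for line in raw_lines)

# A column-wise aggregate only touches the one column it needs.
ages = [value for value in columns["age"] if value is not None]
average_age = sum(ages) / len(ages)

print(sorted(columns))
print(average_age)
```

No schema was declared before the scan; the set of columns is whatever the data turned out to contain.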
Apache Drill: Workflow in Hadoop
[Diagram: a Client connects to a ZooKeeper-managed cluster of three Drillbits, each paired with a Data Source]
Apache Drill: Workflow
[Diagram: a Client connects to a ZooKeeper-managed cluster of three Drillbits]
Presto: Overview
Presto IS:
• A query layer against HDFS (and others)
• Distributed queries
• Efficient querying against large volumes of data
• SQL support
Presto is NOT:
• A general-purpose RDBMS to replace MySQL/Oracle
• Designed to handle OLTP scenarios
Presto Usage
• Facebook
• Airbnb
• Dropbox
Presto: Workflow
[Diagram: a Client talks to the Presto Coordinator; the Coordinator and three Presto Workers form the Presto Cluster, which reads from HDFS]
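The coordinator and worker roles in the workflow above follow a scatter-gather pattern, which can be sketched in Python with a thread pool. This is only an analogy under simplifying assumptions: the data splits, function names, and the grouped sum are hypothetical, and real Presto planning and scheduling are far more involved.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data splits, one per worker, holding (region, amount) rows.
splits = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 8), ("north", 1)],
]

def worker_partial_sum(split):
    """Worker role: aggregate only the rows of its own split."""
    partial = {}
    for region, amount in split:
        partial[region] = partial.get(region, 0) + amount
    return partial

def coordinator(all_splits):
    """Coordinator role: fan work out to workers, then merge partial results."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        partials = list(pool.map(worker_partial_sum, all_splits))
    totals = {}
    for partial in partials:
        for region, amount in partial.items():
            totals[region] = totals.get(region, 0) + amount
    return totals

result = coordinator(splits)
print(result)
```

The result corresponds to a query of the form SELECT region, SUM(amount) ... GROUP BY region, computed in parallel over the splits and merged once at the end.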
MicroStrategy Integration
Drill
• ODBC/JDBC connectivity available
• ODBC driver shipped with the MicroStrategy platform
• MicroStrategy is certified on MapR 5.x (Drill 1.10) in 10.10
Presto
• JDBC connectivity support
• JDBC driver is shipped with the MicroStrategy platform
Drill and Presto: Summary
Supported Data Sources
Apache Drill: HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files
Presto: Apache Accumulo, Cassandra, Hive, HDFS, Apache Kafka, Local File (on each Presto Worker), MongoDB, MySQL, PostgreSQL, Redis
NoSQL and MicroStrategy: Capabilities and Recommendations
Limitations
• A SQL interface may exist, but do not assume it has the same strengths as an RDBMS
• Joins may be supported by the driver used but are a last resort
Recommendations
• Limit queries to one table/document
• If multiple tables are needed, consider MTDI cubes
• Thoroughly understand the types of calculations you plan to perform
Leveraging NoSQL in the Future
MongoDB BI Connector
• Developed by MongoDB as a native BI connector
• Translates SQL queries into Mongo queries
• Does not store data; instead it provides a structured schema that MicroStrategy can read to generate appropriate queries
• Where functionality cannot be pushed down to Mongo, it performs the action in memory
Real-Time Data Streams
• Shipping more JDBC drivers out of the box
• New development around Application schema
• Provides a quick and clean interface for reviewing your data structures and performing cleansing/ETL tasks
• Is envisioned to include the ability to support real-time data streams for reporting within MicroStrategy
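To give a feel for what translating a SQL query into a Mongo query can look like, the sketch below pairs a SQL statement with a MongoDB aggregation pipeline of the same intent and checks the pipeline against sample data. The table name, field names, and documents are hypothetical; the pipeline uses the standard `$match`, `$group`, `$gt`, and `$sum` operators, but the tiny `run_pipeline` evaluator is written only for this example and handles just these two stage shapes. The actual output of the BI Connector is not shown in this presentation and may differ.

```python
# Hypothetical employee documents with an embedded department.
documents = [
    {"name": "James", "age": 62, "department": {"name": "Agent"}},
    {"name": "Ada", "age": 36, "department": {"name": "Research"}},
    {"name": "Lin", "age": 45, "department": {"name": "Agent"}},
]

# SQL as a BI tool might issue it (table and column names are assumptions).
sql = (
    "SELECT department_name, COUNT(*) AS n "
    "FROM employees WHERE age > 40 GROUP BY department_name"
)

# An aggregation pipeline expressing the same intent.
pipeline = [
    {"$match": {"age": {"$gt": 40}}},
    {"$group": {"_id": "$department.name", "n": {"$sum": 1}}},
]

def resolve(document, path):
    """Follow a dotted field path such as 'department.name'."""
    value = document
    for part in path.split("."):
        value = value[part]
    return value

def run_pipeline(docs, stages):
    """Evaluate only the two stage shapes used above; not a general engine."""
    for stage in stages:
        if "$match" in stage:
            field, condition = next(iter(stage["$match"].items()))
            threshold = condition["$gt"]
            docs = [d for d in docs if resolve(d, field) > threshold]
        elif "$group" in stage:
            key_path = stage["$group"]["_id"].lstrip("$")
            counts = {}
            for d in docs:
                key = resolve(d, key_path)
                counts[key] = counts.get(key, 0) + 1
            docs = [{"_id": key, "n": n} for key, n in counts.items()]
    return docs

result = run_pipeline(documents, pipeline)
print(result)
```

The filter and the grouped count both run as pipeline stages here, which is the kind of work that is pushed down to the database; anything that cannot be expressed this way would have to be computed in memory afterward.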
Reference
• https://db-engines.com/en/
• Introduction to HBase Schema Design
• Bigtable: A Distributed Storage System for Structured Data
• MongoDB: Bringing Online Big Data to Business Intelligence & Analytics
• Next Generation Databases: NoSQL, NewSQL, and Big Data by Guy Harrison
Thank You
Q&A