Analytics for NoSQL Sources: How to Build Dossiers with MongoDB, Cassandra, and other NoSQL Sources
Andrew Kern, MicroStrategy World 2018
Copyright © 2018 MicroStrategy Incorporated. All Rights Reserved.


Page 1: Analytics for NoSQL Sources...Title TRACK 9 - Analytics for NoSQL Sources-How to build dossiers with MongoDB, Cassandra, and other NoSQL sources_v1 Created Date 1/24/2018 7:16:44 PM


Analytics for NoSQL Sources
How to Build Dossiers with MongoDB, Cassandra, and other NoSQL Sources
Andrew Kern, MicroStrategy World 2018

Page 2

Safe Harbor Notice

This presentation describes features that are under development by MicroStrategy. The objective of this presentation is to provide insight into MicroStrategy’s technology direction. The functionalities described herein may or may not be released as shown.

This presentation contains statements that may constitute “forward-looking statements” for purposes of the safe harbor provisions under the Private Securities Litigation Reform Act of 1995, including estimates of future technology releases. Forward-looking statements inherently involve risks and uncertainties that could cause actual results of MicroStrategy Incorporated and its subsidiaries (collectively, the “Company”) to differ materially from the forward-looking statements.

Factors that could contribute to such differences include: the Company’s ability to develop, market and deliver on a timely and cost-effective basis new or enhanced offerings that respond to technological change or new customer requirements; delays in the Company’s ability to develop or ship new products; the extent and timing of market acceptance of MicroStrategy’s new offerings; continued acceptance of the Company’s other products in the marketplace; competitive factors; general economic conditions; and other risks detailed in the Company’s registration statements and periodic reports filed with the Securities and Exchange Commission. By making these forward-looking statements, the Company undertakes no obligation to update these statements for revisions or changes after the date of this presentation.

Page 3

3 Database Admins walked into a NoSQL bar. A little while later, they walked out because they couldn’t find a table.

Page 4

Agenda

• MicroStrategy Historical Connectivity
• Evolution of Big Data
• NoSQL Overview
• MongoDB
• Cassandra / HBase
• Apache Drill / Presto
• Suggestions/Best Practices
• Q&A

Page 5

RDBMS Platforms

Traditional Enterprise DW

Schema Model

Connecting to Data

Traditional SQL based access for reporting and dashboarding

Uses Project Schema and Developer to build models on top of Hadoop

Build reports, documents and dashboards via live connect or in-memory cubes

Preferred method if requirements include:
• Leverage Hadoop layer security at runtime
• Live connection
• Project schema is required

High-performance, parallelized native access to Hadoop

Uses Web Data Import functionality to publish in-memory cubes for access and modeling

Build reports, documents and dashboards on top of in-memory cubes

Preferred method if requirements include:
• Bulk data transfer into memory
• Data wrangling
• Browse and preview Hadoop files via import GUI

Page 6

Multiple SQL Platforms

Leveraging disparate data sources

Data Import

Multi-Source & Bring-your-own-Data

Traditional SQL based access for reporting and dashboarding

Uses Project Schema and Developer to build models on top of Hadoop

Build reports, documents and dashboards via live connect or in-memory cubes

High-performance, parallelized native access to Hadoop

Uses Web Data Import functionality to publish in-memory cubes for access and modeling

Build reports, documents and dashboards on top of in-memory cubes

Preferred method if requirements include:
• Bulk data transfer into memory
• Data wrangling
• Browse and preview Hadoop files via import GUI

Page 7

Evolution of Databases

RDBMS Strengths
• Clearly structured tables
• ANSI SQL Standard, relatively end user friendly
• Filtering/joining capabilities

Limitations
• Not necessarily ideal for heavy read/write concurrency
• Clear structure is not developer friendly, requires additional ETL to take initial data streams and load them into RDBMS
• Not entirely scalable for the exponential growth/streams of data existing in modern web applications
• Not traditionally ideal for scanning wide tables

Page 8

Evolution of Databases

Industry Drivers
• Increasing volumes of incoming data via web applications
• Need to provide quick performance on read actions for web users
• Need for consistency and high availability

Evolution
• Hadoop/HDFS
• Columnar Databases
• MPP platforms
• Key Value stores
• Document data stores (JSON, XML)

Page 9

NoSQL Databases: How we define them

Structure
• Data is stored in non-relational structures, such as key-value stores or document-like structures (XML, JSON)
• Allows data to be stored in a framework that is friendlier to developers, providing a simpler write-back methodology

Flexibility
• Increased flexibility compared to RDBMS systems (CAP Theorem)

Use cases
• Web development
• Real-time data streams

Page 10

Popular NoSQL Data Sources

DBMS            Ranking (January 2018)    Database Model
MongoDB         5                         Document Store
Cassandra       8                         Wide Column Store
HBase           16                        Wide Column Store
Apache Drill    74                        Document Store/RDBMS
Presto          124                       Relational DBMS

*Note: Ranking is out of 341 DB systems. Data comes from http://db-engines.com/en/ranking

Page 11

Strengths of NoSQL: Why should I leverage NoSQL databases in my ecosystem?

Developer friendly
• No need to convert document ‘objects’ into an RDBMS framework; simply write them as-is into collections

Flexible storage
• Not constrained by rigid limitations on columns within a table
• e.g. Document 1 contains elements for attributes A, B, and C; Document 2 contains elements for attributes A, C, D, and E

Real Time
• The ability to quickly address a specific document within a collection supports fast read performance
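The flexible-storage example (Document 1 with A, B, C; Document 2 with A, C, D, E) can be pictured with a minimal Python sketch. The sample values and the `to_rows` helper are assumptions made for illustration; the sketch only shows how a reporting layer might present such documents as one table by taking the union of fields and leaving missing values empty, not how any particular driver does it.

```python
from typing import Any

# Two documents in one hypothetical collection, mirroring the slide:
# Document 1 has attributes A, B, C; Document 2 has A, C, D, E.
documents = [
    {"A": 1, "B": "x", "C": 3.5},
    {"A": 2, "C": 4.0, "D": True, "E": "y"},
]


def to_rows(docs: list[dict[str, Any]]) -> tuple[list[str], list[list[Any]]]:
    """Flatten documents into a header plus rows, using None for missing fields."""
    # The "schema" is simply the union of every field seen in any document.
    columns = sorted({key for doc in docs for key in doc})
    rows = [[doc.get(col) for col in columns] for doc in docs]
    return columns, rows


columns, rows = to_rows(documents)
```

Neither document had to declare its fields up front; the tabular view is derived only when the data is read.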

Page 12

NoSQL and MicroStrategy: Connectivity

Certified Databases
• MongoDB (3.x)
• Apache Cassandra (3.x)
• MarkLogic (7)
• HBase
• Presto
• Drill 1.10

Connectivity
• ODBC and JDBC connectivity (Presto is JDBC only)
• MicroStrategy currently ships an ODBC driver with the MicroStrategy Intelligence Server for MongoDB
• Supports access through both traditional Project Schema tables and attributes/metrics, as well as through Data Import

Page 13

MongoDB: Overview

Concepts

RDBMS       MongoDB
Database    Database
Table       Collection
Row         Document
Index       Index
Join        Embedded Documents

Document Data Model

    {
      Emp ID: 007,
      Name: "James",
      Age: 62,
      Department: {
        Dep ID: 012,
        Dep Name: "Agent",
        Priority: "high"
      }
    }

Relational Model

    Employee: int Emp_ID, String Name, int Age
    Department: int Dep_ID, String Dep_Name, String Priority
    Relation Table: Emp_ID, Dep_ID
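As a rough illustration of the mapping between the two models, the sketch below splits the employee document from this slide into the three relational tables. The snake_case field names and the `normalize` helper are assumptions for the example, and the IDs are written as plain integers (7 and 12) because Python does not accept literals such as 007.

```python
# The employee document from the slide, written as a Python dict.
employee_doc = {
    "emp_id": 7,
    "name": "James",
    "age": 62,
    "department": {"dep_id": 12, "dep_name": "Agent", "priority": "high"},
}


def normalize(doc: dict) -> dict:
    """Split one employee document into rows for the three relational tables."""
    dept = doc["department"]
    return {
        # Employee(Emp_ID, Name, Age)
        "employee": (doc["emp_id"], doc["name"], doc["age"]),
        # Department(Dep_ID, Dep_Name, Priority)
        "department": (dept["dep_id"], dept["dep_name"], dept["priority"]),
        # Relation Table(Emp_ID, Dep_ID) replaces the embedding
        "relation": (doc["emp_id"], dept["dep_id"]),
    }


tables = normalize(employee_doc)
```

The embedded department travels with the employee in the document model, whereas the relational model needs the relation table and a join to bring the two back together.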

Page 14

MongoDB: Benefits

• Easy to scale out, transparent to the application
• High Availability
• Flexible Data Model
• Intuitive Design for Programmers
• Good query performance for Data Consumers
• BSON (Binary JSON) storage, fast to scan (http://bsonspec.org/)

Page 15

MongoDB: MicroStrategy Integration

• Support first introduced in MicroStrategy 9.4.1
• MongoDB 3.x certified in MicroStrategy 10.7; now certified and available in MicroStrategy 10.4.5 and later
• ODBC and JDBC drivers are shipped with MicroStrategy releases
• The SQL engine generates SQL that is optimized for MongoDB
• 65 out of 131 analytical functions are pushed down
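To make "pushed down" concrete, here is a minimal sketch of how a simple SQL aggregate could be expressed as a MongoDB aggregation pipeline, so that the database does the filtering and averaging instead of the BI tool. The `build_pipeline` helper, the `employees` collection, and the field names are assumptions for illustration; this is not the SQL or the query that MicroStrategy or its drivers actually generate. Only the `$match`, `$gt`, `$group`, and `$avg` operators are standard MongoDB.

```python
def build_pipeline(group_field: str, avg_field: str, min_age: int) -> list[dict]:
    """Build an aggregation pipeline roughly equivalent to:

    SELECT <group_field>, AVG(<avg_field>)
    FROM employees
    WHERE age > <min_age>
    GROUP BY <group_field>
    """
    return [
        # WHERE age > min_age
        {"$match": {"age": {"$gt": min_age}}},
        # GROUP BY group_field, AVG(avg_field)
        {
            "$group": {
                "_id": f"${group_field}",
                "avg_value": {"$avg": f"${avg_field}"},
            }
        },
    ]


pipeline = build_pipeline("department.dep_name", "age", 30)
```

With a MongoDB client library such as PyMongo, a pipeline like this would typically be passed to the collection's `aggregate` method; a function with no pipeline equivalent would instead have to be computed after the data is fetched.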

Page 16

MongoDB: MicroStrategy Integration - Demo

Page 17

Wide Column Store (Cassandra/HBase): Intro

• Based on Google’s BigTable
• Very large number of columns
• Two-dimensional key-value store

[Figure: wide column layout. Labels: Row Key, Column Family 1, Column Family 2, Column Qualifier, Cell]
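The "two-dimensional key-value store" idea can be sketched in a few lines of Python: the first key is the row key, the second is a (column family, column qualifier) pair, and the value is the cell. The `WideColumnTable` class and the sample data are hypothetical; real stores add timestamps, sorting, and distribution that this toy omits.

```python
from collections import defaultdict
from typing import Optional


class WideColumnTable:
    """Toy wide column store: row key -> (column family, qualifier) -> cell."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[tuple[str, str], str]] = defaultdict(dict)

    def put(self, row_key: str, family: str, qualifier: str, value: str) -> None:
        """Write one cell; rows may hold entirely different sets of columns."""
        self._rows[row_key][(family, qualifier)] = value

    def get(self, row_key: str, family: str, qualifier: str) -> Optional[str]:
        """Read one cell, or None if the row or column does not exist."""
        return self._rows.get(row_key, {}).get((family, qualifier))

    def columns(self, row_key: str) -> list[tuple[str, str]]:
        """List the (family, qualifier) pairs actually present in a row."""
        return sorted(self._rows.get(row_key, {}))


table = WideColumnTable()
table.put("user1", "profile", "name", "Ann")
table.put("user1", "activity", "last_login", "2018-01-24")
table.put("user2", "profile", "name", "Bob")
```

Here `user1` has two columns and `user2` only one, which is why such tables can grow very wide without every row paying for every column.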

Page 18

Cassandra and HBase: Comparison

Cassandra
• AP out of CAP
• High availability (no single point of failure)
• BigTable as the data model
• Dynamo as the storage model
• CQL (Cassandra Query Language)

HBase
• CP out of CAP
• Strongly consistent reads and writes
• BigTable as the data model
• HDFS as storage
• Part of the Hadoop ecosystem; relies on ZooKeeper

• Consistency: every read receives the most recent write or an error
• Availability: every request receives a (non-error) response, without guarantee that it contains the most recent write
• Partition Tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
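The AP versus CP trade-off can be illustrated with a deliberately simplified toy: two replicas of one value and a switch that simulates a network partition. Everything here (`ToyReplicatedStore`, `Unavailable`, the replica names) is hypothetical and does not model how Cassandra or HBase are actually implemented; it only mirrors the three definitions above.

```python
from typing import Optional


class Unavailable(Exception):
    """Raised when the CP-mode store refuses a request during a partition."""


class ToyReplicatedStore:
    """Two replicas ("a" and "b") of a single value; mode is "CP" or "AP"."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.partitioned = False
        self.replicas: dict[str, Optional[str]] = {"a": None, "b": None}

    def write(self, value: str) -> None:
        """Writes arrive at replica a; copying to b needs a working network."""
        if self.partitioned:
            if self.mode == "CP":
                raise Unavailable("cannot replicate during a partition")
            self.replicas["a"] = value  # AP: accept locally, b goes stale
        else:
            self.replicas["a"] = value
            self.replicas["b"] = value

    def read(self, replica: str) -> Optional[str]:
        """CP returns an error rather than risk a stale value; AP always answers."""
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest write during a partition")
        return self.replicas[replica]


# AP: both requests succeed during the partition, but replica b is stale.
ap = ToyReplicatedStore("AP")
ap.write("v1")
ap.partitioned = True
ap.write("v2")
stale_read = ap.read("b")

# CP: the store gives an error instead of a possibly outdated answer.
cp = ToyReplicatedStore("CP")
cp.write("v1")
cp.partitioned = True
try:
    cp.read("b")
    cp_answered = True
except Unavailable:
    cp_answered = False
```

For reporting, the practical difference is whether a dashboard query during a network fault may show slightly old data (AP) or may fail outright (CP).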

Page 19

MicroStrategy Integration: Cassandra

• Integrated Connector in MicroStrategy available initially in 10.8 (introduced later in 10.4.5)

• ODBC driver shipped out of the box with MicroStrategy

Page 20

MicroStrategy Integration: HBase

• Connectivity to MicroStrategy can be achieved via Apache Phoenix
• MicroStrategy has certified HBase via Phoenix 4.x

Phoenix:
i. A query engine on top of HBase
ii. Maps HBase tables to relational tables
iii. Provides a JDBC connection
iv. HBase-specific push-down
v. Translates SQL queries into HBase calls
vi. Executes scans in parallel

Page 21

Apache Drill: Overview

• A SQL query layer against NoSQL data sources
• JSON data model
• Data-driven query: compiles the query on the fly without knowing the schema ahead of time
• Columnar execution: shredded, in-memory columnar execution
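The last two bullets can be illustrated with a small Python sketch: JSON-like records are "shredded" into per-field columns as they are read, with no schema declared in advance, and an aggregate then scans a single column. The `shred` function and the sample records are assumptions for illustration and say nothing about Drill's internal data structures.

```python
from typing import Any

# Row-oriented, JSON-like records with no declared schema (sample data).
records = [
    {"name": "James", "age": 62, "dept": "Agent"},
    {"name": "Eve", "age": 35},
    {"name": "Bill", "age": 48, "dept": "Staff"},
]


def shred(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Convert records to columns, discovering fields as they appear."""
    columns: dict[str, list[Any]] = {}
    for i, row in enumerate(rows):
        for key in row:
            if key not in columns:
                # New field: back-fill None for the rows already processed.
                columns[key] = [None] * i
        for key, values in columns.items():
            values.append(row.get(key))
    return columns


columns = shred(records)

# An aggregate now touches only the one column it needs.
average_age = sum(columns["age"]) / len(columns["age"])
```

Scanning one contiguous column instead of every full record is the main reason columnar execution suits analytical queries.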

Page 22

Apache Drill: Workflow in Hadoop

[Figure: diagram labels: Client; ZooKeeper Managed Cluster; three Drillbits, each paired with a Data Source]

Page 23

Apache Drill: Workflow

[Figure: diagram labels: Client; ZooKeeper Managed Cluster; three Drillbits]

Page 24

Presto: Overview

Presto IS:
• A query layer against HDFS (and others)
• Distributed queries
• Efficient querying against large volumes of data
• SQL support

Presto Is NOT:
• A general-purpose RDBMS to replace MySQL/Oracle
• Designed to handle OLTP scenarios

Presto Usage
• Facebook
• Airbnb
• Dropbox

Page 25

Presto: Workflow

[Figure: diagram labels: Client; Presto Cluster containing one Presto Coordinator and three Presto Workers; HDFS]

Page 26

MicroStrategy Integration

Drill
• ODBC/JDBC connectivity available
• ODBC driver shipped with the MicroStrategy platform
• MicroStrategy is certified on MapR 5.x (Drill 1.10) in 10.10

Presto
• JDBC connectivity support
• JDBC driver is shipped with the MicroStrategy platform

Page 27

Drill and Presto: Summary

Supported Data Sources

Apache Drill
• HBase
• MongoDB
• MapR-DB
• HDFS
• MapR-FS
• Amazon S3
• Azure Blob Storage
• Google Cloud Storage
• Swift
• NAS and local files

Presto
• Apache Accumulo
• Cassandra
• Hive
• HDFS
• Apache Kafka
• Local File (on each Presto Worker)
• MongoDB
• MySQL
• PostgreSQL
• Redis

Page 28

NoSQL and MicroStrategy: Capabilities and Recommendations

Limitations
• A SQL interface may exist, but do not assume the same strengths as RDBMS systems
• Joins may be supported by the driver used but are a last resort

Recommendations
• Limit queries to one table/document
• If multiple tables are needed, consider MTDI cubes
• Thoroughly understand the types of calculations you plan to perform
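The first two recommendations amount to keeping each query against the source to a single table or collection and relating the results afterwards in memory. The sketch below shows that general idea with a plain hash join in Python; the sample data and the `join_in_memory` helper are hypothetical, and this is not a description of how MTDI cubes are implemented.

```python
# Results of two separate single-table queries (hypothetical sample data).
employees = [
    {"emp_id": 7, "name": "James", "dep_id": 12},
    {"emp_id": 8, "name": "Eve", "dep_id": 15},
    {"emp_id": 9, "name": "Bill", "dep_id": 99},  # no matching department
]
departments = [
    {"dep_id": 12, "dep_name": "Agent"},
    {"dep_id": 15, "dep_name": "Staff"},
]


def join_in_memory(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner hash join: index the right side by key, then probe with each left row."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


joined = join_in_memory(employees, departments, "dep_id")
```

Each source query stays a cheap single-collection read, and the join cost is paid once in memory rather than by a driver emulating joins on the NoSQL side.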

Page 29

Leveraging NoSQL in the Future

MongoDB BI Connector
• Developed by MongoDB as a native BI connector
• Connector works to translate SQL queries into Mongo queries
• Does not store data; instead acts to provide a structured schema that can be read by MicroStrategy to generate appropriate queries
• In some areas where functionality cannot be pushed down to Mongo, it will perform the action in-memory

Real-Time Data Streams
• Shipping more JDBC drivers out of the box
• New development around Application schema
• Provides a quick and clean interface for reviewing your data structures and performing cleansing/ETL tasks
• Is envisioned to include the ability to support real-time data streams for reporting within MicroStrategy

Page 30

Reference

• https://db-engines.com/en/
• Introduction to HBase Schema Design (Link)
• Bigtable: A Distributed Storage System for Structured Data (Link)
• MongoDB: Bringing Online Big Data to Business Intelligence & Analytics (Link)
• Next Generation Databases: NoSQL, NewSQL, and Big Data by Guy Harrison

Page 31

Thank You

Page 32

Q&A