NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence


Post on 19-Feb-2017


Page 1: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

NoSQL Tel Aviv Meetup #1: Polyglot Persistence

Arthur Gimpel [email protected]

Wifi: zx Password: n0tWireless

Page 2: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

Welcome


Page 3: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

SaaS and more…

Page 4: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

SaaS and more…

Page 5: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

SaaS and more…

BaaS

Page 6: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

NoSQL Tel Aviv: Meetup Agenda

OBJECTIVE COMPARISONS

NETWORKING

KNOWLEDGE SHARING

Page 7: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

About Me

•Working with databases for 8 years

•5 years, SQL Server & .NET

•3 years with NoSQL & Python & Node.js

•2015 - Founded DataZone

Page 8: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

DataZone | Data is our business! What’s yours?

•Consultancy & projects

•Private & public training

•Multi-vendor, multi-tier support with SLA

•Child unit of CloudZone, public cloud leaders

Page 9: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

Use case


Page 10: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Toolbar Company

•uBar’s toolbar provides a search engine and various utilities on the toolbar itself

•uBar’s revenue streams:

•Ads, provided on uBar’s search engine

•Bundled downloads with partners

•Selling user data & statistics, gathered by analysing how users use the toolbar

Page 11: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Architecture

[Diagram: the Sessions, Toolbar Usage, and Analytics services, backed by MSSQL]

•uBar’s solution is built on a service-oriented architecture (SOA):

•Sessions: Session & users mgmt. service

•Toolbar Usage: user statistics gathering

•Analytics: Near realtime BI

Page 12: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Sessions Service - Features

[Diagram: Sessions service, backed by MSSQL]

•A session is created when a client opens a browser

•A session ends when the client closes the browser, or after a set period of inactivity

•Users are mainly marketing and campaign managers, media buyers, and similar roles; they consume data from the Analytics service
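The session lifecycle described on this slide can be sketched as a small predicate. This is only an illustration: the 30-minute `IDLE_TIMEOUT` and the `is_ended` helper are assumptions, since the slide says only that a session ends after "some specific time" without activity.

```python
from datetime import datetime, timedelta

# Assumed value; the slide does not state the actual inactivity window.
IDLE_TIMEOUT = timedelta(minutes=30)


def is_ended(last_activity: datetime, now: datetime, browser_closed: bool = False) -> bool:
    """A session ends on browser close, or once it has been idle for IDLE_TIMEOUT."""
    return browser_closed or (now - last_activity) >= IDLE_TIMEOUT


opened = datetime(2017, 2, 19, 10, 0)
still_active = is_ended(opened, opened + timedelta(minutes=5))
timed_out = is_ended(opened, opened + timedelta(minutes=45))
closed = is_ended(opened, opened + timedelta(minutes=1), browser_closed=True)
```

Many key-value stores can enforce this kind of rule themselves through a per-key expiry, which is one reason session data is a common fit for them.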

Page 13: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Sessions Service - Main Objects

[Diagram: Sessions service, backed by MSSQL]

•Session: SessionId, ToolbarClientId, UserId, UserAgent, StartTime

•User: UserId, UserPermissions, Username, PasswordHash

•UserPermissions: UserId, PermissionId

•Permissions: PermissionId, Name
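As a rough illustration of how these objects relate, the sketch below models them as Python dataclasses. The field names follow the slide; the field types, the `permission_ids` set (standing in for the UserPermissions relation), and the `user_can` helper are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Permission:
    permission_id: int
    name: str


@dataclass
class User:
    user_id: int
    username: str
    password_hash: str
    # Stands in for the UserPermissions table (UserId, PermissionId).
    permission_ids: set = field(default_factory=set)


@dataclass
class Session:
    toolbar_client_id: int
    user_id: int
    user_agent: str
    session_id: UUID = field(default_factory=uuid4)
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def user_can(user: User, permission: Permission) -> bool:
    """Hypothetical helper: check a permission through the many-to-many relation."""
    return permission.permission_id in user.permission_ids


view_dashboards = Permission(1, "view_dashboards")
manager = User(7, "campaign_manager", "<hash>", {1})
session = Session(toolbar_client_id=42, user_id=manager.user_id, user_agent="Mozilla/5.0")
```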

Page 14: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Toolbar Usage Service - Features

[Diagram: Toolbar Usage service, backed by MSSQL]

•Every time an event occurs, such as a client opening a browser or browsing the internet, the usage service saves data about the event in the relevant table.

•ToolbarUsage writes about 50M events per day

Page 15: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Toolbar Usage Service - Main Objects

[Diagram: Toolbar Usage service, backed by MSSQL]

•ToolbarStart: ToolbarClientId, StartTime, [User data columns]

•NewTab: ToolbarClientId, NewTabUrl, [User data columns]

•ToolbarClicks: ToolbarClientId, ToolbarFeatureId, [User data columns]

•WebsiteVisit: ToolbarClientId, WebsiteUrl, [User data columns]

•ToolbarClients: ToolbarClientId, ToolbarVersion, BundledVersion, BundleId

Page 16: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Analytics Service - Features

[Diagram: Analytics service, backed by MSSQL]

•The Analytics service provides users with data-driven dashboards.

•The data is pre-aggregated every hour in the database and saved to separate tables

•The Analytics service provides the most important KPIs when releasing campaigns to millions of users, and operational decisions are made according to its data (stopping bad campaigns, detecting bugs, A/B testing, etc.)
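The hourly pre-aggregation can be pictured as a roll-up of raw events into per-hour counters. The sketch below is a minimal in-memory version; the `(timestamp, event_type)` input shape and both function names are assumptions for illustration, not the deck's actual job.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hourly(events):
    """Count events per (hour, event_type); events are (timestamp, event_type) pairs."""
    counts = Counter()
    for ts, event_type in events:
        counts[(hour_bucket(ts), event_type)] += 1
    return counts


events = [
    (datetime(2017, 2, 19, 10, 5), "NewTab"),
    (datetime(2017, 2, 19, 10, 40), "NewTab"),
    (datetime(2017, 2, 19, 11, 1), "WebsiteVisit"),
]
hourly = aggregate_hourly(events)
```

A dashboard then reads the small aggregated result instead of scanning raw events, at the cost of data being up to one aggregation interval old.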

Page 17: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Challenges

•Velocity: 10k writes/sec on the Usage service, 1k writes/sec on the Sessions service

•Volume: 1 TB of operational data (1 month retention)

•New clients increase the velocity, and the IO subsystem is a bottleneck

•Campaign managers want more and more insights in realtime, which requires writing complex, CPU-intensive aggregation jobs on the database.

[Diagram: the Sessions, Toolbar Usage, and Analytics services, backed by a single RDBMS]
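For a sense of scale, the daily event count quoted earlier in the deck (about 50M events per day) can be converted to an average per-second rate and set beside the 10k writes/sec figure. The deck does not say whether 10k writes/sec is a peak rate or counts several writes per event, so the ratio below is only a side-by-side comparison, not an explanation.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

events_per_day = 50_000_000             # "about 50M events per day" (Toolbar Usage slide)
stated_writes_per_sec = 10_000          # "10k writes/sec on the Usage service"

avg_events_per_sec = events_per_day / SECONDS_PER_DAY
ratio = stated_writes_per_sec / avg_events_per_sec

print(f"average: {avg_events_per_sec:.0f} events/sec")    # average: 579 events/sec
print(f"stated rate is {ratio:.1f}x the daily average")   # stated rate is 17.3x the daily average
```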

Page 18: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

Issues with Relational Database Management Systems in the IoT Age

•Everything is persisted synchronously, so throughput is limited by IO performance.

•All data is bound to a tabular schema, which is hard to change in big databases.

•All data relies on a single data store, making it hard to scale horizontally.

•A complex schema slows down aggregations and queries drastically.

[Diagram: the Sessions, Toolbar Usage, and Analytics services, backed by a single RDBMS]

Page 19: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

Polyglot Persistence: Overview

Concept: Every data store serves a different component of the application, according to its access patterns and needs.

Key Value: Suitable for key value access patterns. Main benefits are concurrency control at the key level (optimistic & pessimistic) and extremely easy scaling.

Document Store: Data which is more suitable for OOP languages; stores complex data (JSON) while allowing scaling and distribution.

Search / Index stores: Suitable for cases where the main data store cannot handle complex querying. Allows scaling the querying layers separately from operational data access (CUD in CRUD).
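Key-level optimistic concurrency, mentioned above as a benefit of key-value stores, is usually built on a compare-and-swap: a write succeeds only if the key is still at the version the writer last read. The sketch below is a toy in-memory version; `VersionedKV` and its methods are hypothetical names and do not correspond to any particular product's API.

```python
import threading


class VersionedKV:
    """Toy in-memory key-value store with per-key versions (optimistic locking)."""

    def __init__(self):
        self._data = {}                 # key -> (version, value)
        self._lock = threading.Lock()   # protects the dict itself, not individual keys

    def get(self, key):
        """Return (version, value); a missing key reads as version 0, value None."""
        with self._lock:
            return self._data.get(key, (0, None))

    def put_if_version(self, key, value, expected_version):
        """Write only if the key is still at expected_version; return True on success."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False            # another writer got there first; re-read and retry
            self._data[key] = (current_version + 1, value)
            return True


store = VersionedKV()
version, _ = store.get("session:abc")
first = store.put_if_version("session:abc", {"user_id": 7}, version)   # succeeds
stale = store.put_if_version("session:abc", {"user_id": 8}, version)   # rejected: stale version
```

Because conflicts are detected per key, writers touching different keys never block each other, which is part of why this model scales out easily.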

Page 20: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: New Data Solution’s Targets

New Data Solution

•Handle the traffic: velocity and volume should not limit the product

•Allow more realtime analytics, and more complex slice & dice for the product

•Use open source where possible to reduce costs

Page 21: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

Evaluation


Page 22: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Analysing Sessions schema & access patterns

[Diagram: Sessions service, data store undecided (?)]

•Sessions are written with a UUID (SessionId) and are not sorted in any way in the table (heap).

•Values:

•ToolbarClientId (foreign key to ToolbarClient)

•UserId (foreign key to User)

•UserAgent (unstructured string)

•StartTime (DateTime)

Page 23: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Analysing Sessions schema & access patterns

[Diagram: Sessions service, data store undecided (?)]

•The Users and Permissions tables are quite simple: each holds its own values, and a many-to-many relation table (UserPermissions) links them

Page 24: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Analysing Sessions schema & access patterns

[Diagram: Sessions service, data store undecided (?)]

•Session write velocity is 1k/sec. IO is a bottleneck.

•Sessions are written in a key value pattern

•Users and Permissions are not problematic, since they are cached in the application and rarely change.
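Writing sessions "in a key value pattern" means the SessionId acts as the key and every other field travels together as one value. A minimal sketch, assuming a JSON-encoded value, a `session:` key prefix, and a plain dict standing in for the real store (all three are illustrative choices, not taken from the deck):

```python
import json
from datetime import datetime, timezone
from uuid import uuid4


def session_to_kv(toolbar_client_id, user_id, user_agent, start_time=None):
    """Return a (key, value) pair: the SessionId is the key, the rest is one JSON value."""
    session_id = str(uuid4())
    start_time = start_time or datetime.now(timezone.utc)
    value = json.dumps({
        "ToolbarClientId": toolbar_client_id,
        "UserId": user_id,
        "UserAgent": user_agent,
        "StartTime": start_time.isoformat(),
    })
    return f"session:{session_id}", value


kv_store = {}                        # stand-in for a real key-value store
key, value = session_to_kv(42, 7, "Mozilla/5.0")
kv_store[key] = value                # one write by key, no joins or secondary indexes
loaded = json.loads(kv_store[key])   # one read by key
```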

Page 25: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Possible data stores for Sessions service

[Diagram: Sessions service, data store undecided (?)]

•Candidate technologies with the needed throughput, velocity, and complex data support: Redis, Couchbase, MarkLogic

Page 26: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Analysing Toolbar Usage schema & access patterns

•Toolbar Usage tables are not normalized in SQL Server, and are written as raw data.

•The usage write pattern is key value, where the value is large (30 KB) and unstructured (user agent).

•Write velocity is 10k/sec.

•Toolbar Usage data is also time series data. The tables are clustered on a TimeStamp column (and partitioned by it), for easier analytics and aggregation.

[Diagram: Toolbar Usage: ?; Sessions: Redis? Couchbase? MarkLogic?]
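One common way to keep the time-series property when moving such data to a key-value store is to put a time bucket into the key, so that all events of one type and one hour can be fetched together. The sketch below assumes hourly buckets and a `EventType:YYYYMMDDHH` key format; both are illustrative choices, and a `defaultdict` stands in for the store.

```python
from collections import defaultdict
from datetime import datetime


def bucket_key(event_type: str, ts: datetime) -> str:
    """One bucket per event type per hour, e.g. 'NewTab:2017021910'."""
    return f"{event_type}:{ts:%Y%m%d%H}"


buckets = defaultdict(list)   # stand-in for a key-value store holding one list per bucket


def record_event(event_type: str, ts: datetime, payload: dict) -> None:
    """Append the event to the bucket for its type and hour."""
    buckets[bucket_key(event_type, ts)].append({"ts": ts.isoformat(), **payload})


record_event("NewTab", datetime(2017, 2, 19, 10, 5), {"ToolbarClientId": 42})
record_event("NewTab", datetime(2017, 2, 19, 10, 59), {"ToolbarClientId": 43})
record_event("NewTab", datetime(2017, 2, 19, 11, 0), {"ToolbarClientId": 42})
```

The bucket size is a trade-off: smaller buckets spread writes across more keys, while larger buckets mean fewer reads per aggregation run.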

Page 27: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Possible data stores for Toolbar Usage service

•Again, the needed write pattern is key value.

•Data size and needed throughput fit Redis, Couchbase, and MarkLogic equally well.

•Sessions and ToolbarUsage can both (potentially) rely on the same data store.

[Diagram: Toolbar Usage: Redis? Couchbase? MarkLogic?; Sessions: Redis? Couchbase? MarkLogic?]

Page 28: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Analysing Analytics schema & access patterns

•The Analytics service’s schema is based on aggregated data from the ToolbarUsage & Sessions services.

•Development should be simple, to allow maximum flexibility for product and analysts.

•Analysts should be able to query the data ad hoc

•Data refresh should take less than 15 minutes

[Diagram: Toolbar Usage: Redis? Couchbase? MarkLogic?; Sessions: Redis? Couchbase? MarkLogic?; Analytics: ?]

Page 29: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Possible data stores for Analytics service

•Possible solutions for analytics fall into several groups:

•Classic BI solutions: Tableau, QlikView, Pentaho

•Column store DBMS: Redshift, Vertica, etc.

•Pure search engines: Elasticsearch, Solr, etc.

[Diagram: Toolbar Usage: Redis? Couchbase? MarkLogic?; Sessions: Redis? Couchbase? MarkLogic?; Analytics: BI Tools, ColumnStore, Search Engine]

Page 30: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Putting it all together - Operational Needs

            Velocity   Volume   Price
Couchbase   ✓          ✓        Low - Mid
Redis       ✓          ✓        Low - Mid
MarkLogic   ✓          ✓        High

[Diagram: Toolbar Usage: Redis? Couchbase? MarkLogic?; Sessions: Redis? Couchbase? MarkLogic?; Analytics: BI Tools, ColumnStore, Search Engine]

Page 31: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Putting it all together - Operational Needs

            Support                Integration                                         Final Notes
Couchbase   Vendor support (SLA)   Elasticsearch (XDCR); SQL compatible (JDBC, ODBC)   Rich integrations, high quality support
Redis       Redis Labs (managed)   Plugin for Solr                                     Managed, no maintenance

[Diagram: Toolbar Usage: Redis? Couchbase?; Sessions: Redis? Couchbase?; Analytics: BI Tools, ColumnStore, Search Engine]

Page 32: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Putting it all together - Analytical Needs

                 Possibilities                Pros                                                    Cons
BI Solutions     Tableau, Pentaho, QlikView   Simple for business users, integrates with Couchbase   Might get expensive
Search Engines   Elasticsearch, Solr          Highly customizable                                     Querying is not straightforward

[Diagram: Toolbar Usage: Redis? Couchbase?; Sessions: Redis? Couchbase?; Analytics: BI Tools, Search Engine]

Page 33: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Final Architecture #1

[Diagram: Toolbar Usage: Managed Redis; Sessions: Managed Redis; Analytics: Elasticsearch]

•Redis is managed: no maintenance at all for an operational, scalable cluster.

•Using Elasticsearch with Kibana is great for time series data

•Data transformation will be done through ETL.
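The transform step of such an ETL can be as small as a function that turns one raw key-value event into a flat, typed document ready for indexing. The sketch below is an assumption-heavy illustration: the raw JSON layout, the epoch-seconds `ts` field, and the output field names are all hypothetical, and no search-engine client is called.

```python
import json
from datetime import datetime, timezone


def transform(raw_value: str, event_type: str) -> dict:
    """Turn one raw usage event (a JSON string) into a flat document for indexing."""
    raw = json.loads(raw_value)
    ts = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)   # "ts" is a hypothetical field
    return {
        "event_type": event_type,
        "toolbar_client_id": int(raw["ToolbarClientId"]),
        "@timestamp": ts.isoformat(),
        "url": raw.get("WebsiteUrl"),   # None for event types that carry no URL
    }


raw_event = json.dumps(
    {"ToolbarClientId": "42", "ts": 1487498700, "WebsiteUrl": "http://example.com"}
)
doc = transform(raw_event, "WebsiteVisit")
```

Keeping the transform a pure function makes it easy to test on its own, separately from the extract and load steps.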

Page 34: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

uBar: Final Architecture #2

[Diagram: Toolbar Usage: Couchbase; Sessions: Couchbase; Analytics: BI Tools + Elasticsearch]

•Couchbase is easy to use.

•With Couchbase’s SQL for JSON (N1QL), it takes zero configuration to make it a data source for any BI solution

•Couchbase’s filtered replication to Elasticsearch means Elasticsearch is needed only where SQL is not enough.

Page 35: NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence

So, what do you choose?