NoSQL Tel Aviv Meetup #1: Introduction to Polyglot Persistence
TRANSCRIPT
NoSQL Tel Aviv Meetup #1: Polyglot Persistence
Arthur Gimpel [email protected]
Wifi: zx Password: n0tWireless
Welcome
Arthur Gimpell
SaaS and more…
BaaS
NoSQL Tel Aviv: Meetup Agenda
OBJECTIVE COMPARISONS
NETWORKING
KNOWLEDGE SHARING
About Me
•Working with databases for 8 years
•5 years, SQL Server & .NET
•3 years with NoSQL & Python & Node.js
•2015 - Founded DataZone
DataZone | Data is our business! What’s yours?
•Consultancy & projects
•Private & public training
•Multi vendor, multi tier support with SLA
•Child unit of CloudZone, public cloud leaders
Use case
uBar: Toolbar Company
•uBar’s toolbar provides a search engine and various utilities on the toolbar itself
•uBar’s revenue streams:
•Ads, provided on uBar’s search engine
•Bundled downloads with partners
•Selling user data & statistics, gathered by analysing how users use the toolbar
uBar: Architecture
[Diagram: Sessions, Toolbar Usage and Analytics services on top of MSSQL]
•uBar’s solution is built on SOA:
•Sessions: session & user management service
•Toolbar Usage: user statistics gathering
•Analytics: near-realtime BI
uBar: Sessions Service - Features
•A session is created when a client opens a browser
•A session ends when the client closes the browser, or after a set period with no activity
•Users are mainly marketing people, campaign managers, media buyers and more. These users consume data from the Analytics service
uBar: Sessions Service - Main Objects
•Session: SessionId, ToolbarClientId, UserId, UserAgent, StartTime
•User: UserId, UserPermissions, Username, PasswordHash
•UserPermissions: UserId, PermissionId
•Permissions: PermissionId, Name
uBar: Toolbar Usage Service - Features
•Every time an event occurs, such as a client opening a browser or browsing the internet, the usage service saves data about the event in the relevant table.
•ToolbarUsage writes about 50M events per day
uBar: Toolbar Usage Service - Main Objects
•ToolbarStart: ToolbarClientId, StartTime, [User data columns]
•NewTab: ToolbarClientId, NewTabUrl, [User data columns]
•ToolbarClicks: ToolbarClientId, ToolbarFeatureId, [User data columns]
•WebsiteVisit: ToolbarClientId, WebsiteUrl, [User data columns]
•ToolbarClients: ToolbarClientId, ToolbarVersion, BundledVersion, BundleId
uBar: Analytics Service - Features
•The Analytics service provides Users with dashboards filled with data.
•The data is pre-aggregated every hour in the database and saved to separate tables
•The Analytics service provides the most important KPIs when releasing campaigns to millions of users, and operational decisions are made according to its data (stopping bad campaigns, detecting bugs, A/B testing, etc.)
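The hourly pre-aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not uBar's actual job: the (timestamp, event type) input shape, the function names and the sample events are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hourly(events):
    """Count events per (hour, event type).

    `events` is an iterable of (timestamp, event_type) pairs; this shape is
    an assumption made for the sketch, not uBar's actual schema.
    """
    counts = Counter()
    for ts, event_type in events:
        counts[(hour_bucket(ts), event_type)] += 1
    return counts


# Made-up sample events, for illustration only.
events = [
    (datetime(2015, 6, 1, 10, 5), "NewTab"),
    (datetime(2015, 6, 1, 10, 40), "NewTab"),
    (datetime(2015, 6, 1, 10, 59), "WebsiteVisit"),
    (datetime(2015, 6, 1, 11, 2), "NewTab"),
]
hourly = aggregate_hourly(events)
```

Each (hour, event type) count corresponds to one row in a pre-aggregated table; dashboards then read these small tables instead of scanning raw events.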
uBar: Challenges
•Velocity: 10k writes/sec on the Usage service, 1k writes/sec on the Sessions service
•Volume: 1 TB of operational data (1 month retention)
•New clients increase the velocity, and the IO subsystem is a bottleneck
•Campaign managers want more and more realtime insights, which requires writing complex aggregation jobs on the database that use the CPU intensively.
Issues with Relational Database Management Systems in the IoT Age
•Everything is persisted, synchronously. Limited by IO performance.
•All data is bound to a tabular schema, hard to make changes in big databases.
•All data relies on a single data store, making it hard to scale horizontally.
•Complex schema slows down aggregations and queries drastically.
Polyglot Persistence: Overview
Concept
Every data store serves a different component of the application, according to its access patterns and needs.
Key Value
Suitable for key-value access patterns. Main benefits are concurrency control at the key level (optimistic & pessimistic) and extremely easy scaling.
Document Store
Suitable for data that maps naturally to OOP languages: stores complex data (JSON) while allowing scaling and distribution.
Search / Index Stores
Suitable for cases where the main data store cannot handle complex querying. Allows scaling the querying layer separately from operational data access (CUD in CRUD).
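Optimistic concurrency at the key level can be illustrated with a check-and-set write: a writer reads a version, and its write succeeds only if the key is still at that version. The sketch below uses an in-memory stand-in rather than any real product's client API; the class, method names and the "session:42" key are assumptions for illustration.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class KeyValueStore:
    """In-memory stand-in for a key-value store with per-key versions."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); a missing key has version 0 and value None."""
        return self._data.get(key, (0, None))

    def put(self, key, value, expected_version):
        """Write only if the key is still at expected_version (check-and-set)."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise VersionConflict(key)
        self._data[key] = (current_version + 1, value)
        return current_version + 1


store = KeyValueStore()
version, _ = store.get("session:42")
# The first writer holding this version succeeds.
store.put("session:42", {"clicks": 1}, expected_version=version)

try:
    # A second writer still holding the old version is rejected.
    store.put("session:42", {"clicks": 99}, expected_version=version)
    conflict = False
except VersionConflict:
    conflict = True
```

The rejected writer would normally re-read the key and retry. A pessimistic scheme would instead lock the key before reading, trading throughput for fewer retries.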
uBar: Targets for the New Data Solution
•Handle the traffic: velocity and volume should not limit the product
•Allow more realtime analytics and more complex slice & dice for the product
•Use open source where possible to reduce costs
Evaluation
uBar: Analysing the Sessions schema & access patterns
•Sessions are written with a UUID (SessionId) and are not sorted in any way in the table (heap).
•Values:
•ToolbarClientId (foreign key to ToolbarClient)
•UserId (foreign key to User)
•UserAgent (unstructured string)
•StartTime (DateTime)
uBar: Analysing the Sessions schema & access patterns
•The Users and Permissions tables are quite simple: each holds its own values, and they are linked by a many-to-many relation table (UserPermissions)
uBar: Analysing the Sessions schema & access patterns
•Sessions write velocity is 1k/sec. IO is a bottleneck.
•Sessions are written in a key-value pattern
•Users and Permissions are not problematic, since they are cached in the application and rarely change.
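To make the key-value pattern concrete, a session can be stored as one JSON value under its SessionId, with an expiry that is pushed forward on every activity. The sketch below is an in-memory stand-in, not Redis or Couchbase client code; the class name, the numeric `now` timestamps and the 1800-second idle timeout are assumptions for illustration.

```python
import json
import uuid


class SessionStore:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, idle_timeout_seconds):
        self.idle_timeout = idle_timeout_seconds
        self._data = {}  # session_id -> (expires_at, JSON string)

    def start(self, toolbar_client_id, user_id, user_agent, now):
        """Create a session keyed by a fresh UUID and return its id."""
        session_id = str(uuid.uuid4())
        value = json.dumps({
            "ToolbarClientId": toolbar_client_id,
            "UserId": user_id,
            "UserAgent": user_agent,
            "StartTime": now,
        })
        self._data[session_id] = (now + self.idle_timeout, value)
        return session_id

    def touch(self, session_id, now):
        """Record activity and return the session, or None if it has ended."""
        entry = self._data.get(session_id)
        if entry is None or entry[0] <= now:
            # Unknown or idle for too long: the session is over.
            self._data.pop(session_id, None)
            return None
        # Activity pushes the expiry forward by the idle timeout.
        self._data[session_id] = (now + self.idle_timeout, entry[1])
        return json.loads(entry[1])


store = SessionStore(idle_timeout_seconds=1800)  # timeout value is an assumption
sid = store.start(toolbar_client_id=7, user_id=3, user_agent="Mozilla/5.0", now=0)
active = store.touch(sid, now=600)           # activity within the timeout
expired = store.touch(sid, now=600 + 1800)   # no activity for the full timeout
```

Every read and write addresses exactly one key, so no joins or secondary indexes are needed on the hot path, which is what makes this workload a fit for a key-value store.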
uBar: Possible data stores for Sessions service
•Candidate technologies with the needed throughput, velocity and complex data support: Redis, Couchbase, MarkLogic
uBar: Analysing the Toolbar Usage schema & access patterns
•Toolbar Usage tables are not normalized in SQL Server; events are written as raw data.
•The usage write pattern is key-value, where the value is large (30 KB) and unstructured (user agent).
•Write velocity is 10k/sec.
•Toolbar Usage data is also time series data. The tables are clustered on a TimeStamp column (and partitioned by it) for easier analytics and aggregation.
[Diagram: Sessions and Toolbar Usage; candidate stores for Sessions: Redis, Couchbase, MarkLogic]
uBar: Possible data stores for Toolbar Usage service
•Again, the needed write pattern is key-value.
•Data sizing and needed throughput fit Redis, Couchbase and MarkLogic equally well.
•Sessions and ToolbarUsage can both (potentially) rely on the same data store.
uBar: Analysing the Analytics schema & access patterns
•The Analytics service’s schema is based on aggregated data from the ToolbarUsage & Sessions services.
•Development should be simple, to allow maximal flexibility for product and analysts.
•Analysts should be able to run ad hoc queries on the data
•Data refresh should be under 15 minutes
uBar: Possible data stores for Analytics service
•Possible services for analytics fall into several groups:
•Classic BI solutions: Tableau, QlikView, Pentaho
•Column store DBMS: Redshift, Vertica...
•Pure search engines: Elasticsearch, Solr...
uBar: Putting it all together - Operational Needs
Store        Velocity   Volume   Price
Couchbase    Yes        Yes      Low - Mid
Redis        Yes        Yes      Low - Mid
MarkLogic    Yes        Yes      High
uBar: Putting it all together - Operational Needs
Store        Support                 Integration                                          Final Notes
Couchbase    Vendor support - SLA    Elasticsearch (XDCR); SQL compatible (JDBC, ODBC)    Rich integrations, high-quality support
Redis        Redis Labs - managed    Plugin for Solr                                      Managed - no maintenance
uBar: Putting it all together - Analytical Needs
Option            Possibilities                 Pros                                                     Cons
BI Solutions      Tableau, Pentaho, QlikView    Simple for business users, integrates with Couchbase     Might get expensive
Search Engines    Elasticsearch, Solr           Highly customizable                                      Querying is not straightforward
uBar: Final Architecture #1
[Diagram: Sessions and Toolbar Usage on managed Redis; Analytics on Elasticsearch]
•Redis is managed: no maintenance at all for an operational, scalable cluster.
•Elasticsearch with Kibana is great for time series data
•Data transformation will be done through ETL.
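One possible shape for the transform step of such an ETL is sketched below: raw usage events are flattened into timestamped documents and serialized as newline-delimited JSON, the format the Elasticsearch bulk API accepts. The raw event shape, the field names, the "toolbar-usage" index prefix and the daily index naming are assumptions for illustration; older Elasticsearch versions also expect a `_type` in the action line.

```python
import json
from datetime import datetime, timezone


def to_document(event_type, raw):
    """Transform a raw usage event into a flat, timestamped document."""
    ts = datetime.fromtimestamp(raw["timestamp"], tz=timezone.utc)
    return {
        "@timestamp": ts.isoformat(),
        "event_type": event_type,
        "ToolbarClientId": raw["ToolbarClientId"],
    }


def to_bulk_body(docs, index_prefix="toolbar-usage"):
    """Serialize documents as newline-delimited JSON for a bulk index request.

    Each document is preceded by an action line naming a daily index
    (the index prefix is a hypothetical name).
    """
    lines = []
    for doc in docs:
        day = doc["@timestamp"][:10].replace("-", ".")
        lines.append(json.dumps({"index": {"_index": f"{index_prefix}-{day}"}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"


# Made-up raw events (Unix timestamps), for illustration only.
raw_events = [
    ("NewTab", {"timestamp": 0, "ToolbarClientId": 7}),
    ("WebsiteVisit", {"timestamp": 86400, "ToolbarClientId": 7}),
]
body = to_bulk_body(to_document(t, r) for t, r in raw_events)
```

The resulting body would be sent to the cluster's `_bulk` endpoint. Splitting the data into one index per day keeps time-range queries cheap and makes a retention policy a matter of dropping old indices.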
uBar: Final Architecture #2
[Diagram: Sessions and Toolbar Usage on Couchbase; Analytics on BI tools + Elasticsearch]
•Couchbase is easy to use.
•With Couchbase’s SQL for JSON (N1QL), it takes zero configuration to make it a data source for every possible BI solution
•Couchbase’s filtered replication to Elasticsearch means Elasticsearch is used only where SQL is not enough.
So, what do you choose?