Turbocharge your MySQL analytics with ElasticSearch

Guillaume Lefranc, Data & Infrastructure Architect, Productsup GmbH

Percona Live Europe 2017

About the Speaker

Guillaume Lefranc

• Data Architect at Productsup

• Replication Manager for MySQL and MariaDB - Lead Architect

• DBA Manager at MariaDB Corporation

• Infrastructure Consultant at Coinigy

• DB Architect at dailymotion.com

Takeaways

This presentation covers:

● How Elasticsearch works as a document and column store

● Its strengths and weaknesses when it comes to analytics

● How to sync data with MySQL

● How to build aggregations

Case study

Disclaimer: This case study is about medium data, not big data

What is medium data?

Answer: from 100GB to a few TBs

Types of data:

● User activity (Clicks)

● Market Data

● Trips

Use Cases

Business cases:

● A ride-sharing app

Example dataset: NYC Taxi Data (6 months: 78 million trips)

● Cryptocurrency market data

200 million documents per month - Courtesy of coinigy.com

MySQL and Medium Data

● Medium data can scale well in MySQL for point lookups (SELECT … WHERE id = ?)

● … but not for analytics

● Not every query can be covered by an index

● Aggregations can be slow, especially when they require table scans

Elasticsearch - What?

● “You know, for Search” -> Inverted Index

● Document Store

● REST API

○ POST /market_data -d '{ "market": "BTC/USD", "value": "3418.03953" }'

● JSON Native

Elasticsearch - Anatomy of a Document

{
  "_index" : "raw-2016-07",
  "_type" : "market_raw",
  "_id" : "108051174765",
  "_score" : 7.455347,
  "_source" : {
    "quantity" : 0.64130859,
    "time_local" : "2016-07-12 06:45:34",
    "type" : "SELL",
    "market" : "USD/BTC",
    "total" : 414.52263332,
    "@timestamp" : "2016-07-12T06:45:34.000Z",
    "price" : 646.37,
    "exchange" : "BITS",
    "id" : 108051174765,
    "tradeid" : "11649948"
  }
}
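As a rough illustration of the REST API and the document shape above, the following Python sketch builds (but does not send) an indexing request using only the standard library. The helper name `build_index_request` is hypothetical, the host `es1:9200` is borrowed from the Logstash output configuration in this deck, and the URL layout `/{index}/{type}/{id}` is an assumption that fits Elasticsearch versions that still use mapping types, as the `_type` field suggests.

```python
import json
import urllib.request


def build_index_request(host, index, doc_type, doc_id, source):
    """Build a PUT request that would index `source` under the given ID.

    Hypothetical helper: the URL layout /{index}/{type}/{id} assumes an
    Elasticsearch version that still uses mapping types.
    """
    url = "http://{}/{}/{}/{}".format(host, index, doc_type, doc_id)
    body = json.dumps(source).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# A subset of the _source fields from the slide.
trade = {
    "market": "USD/BTC",
    "type": "SELL",
    "price": 646.37,
    "quantity": 0.64130859,
    "exchange": "BITS",
    "@timestamp": "2016-07-12T06:45:34.000Z",
}

request = build_index_request(
    "es1:9200", "raw-2016-07", "market_raw", "108051174765", trade
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. That step is left out so the sketch runs without a cluster.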

Elasticsearch - Why?

● Distributed

● Fault Tolerant

● Scales Horizontally

Elasticsearch - Column Store

● All fields are indexed by default

● Query is (almost) always an index scan

● Doc values

○ Field values serialized on disk

○ Not stored in the JVM heap; relies on the OS file system cache

○ Compressed

○ Enabled by default for all numerics, geo_points, dates, IPs and not_analyzed strings (keywords)

● Not a general-purpose column-store replacement

● Not exactly fast as a document DB

Bibliography:

● https://www.elastic.co/blog/elasticsearch-as-a-column-store

● https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html
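To make the column-store idea concrete, here is a toy Python model of the difference between a row-oriented view (one record per document) and a column-oriented view (one list of values per field). It is only an analogy for what doc values provide and says nothing about how Lucene actually lays data out on disk. The sample trips and the helper name `to_columns` are invented for illustration.

```python
from collections import defaultdict

# Row-oriented view: one dict per document, similar to _source.
# The values are invented sample data.
documents = [
    {"driver_id": 102, "payment_type": 1, "total_amount": 12.5},
    {"driver_id": 102, "payment_type": 2, "total_amount": 7.0},
    {"driver_id": 7, "payment_type": 1, "total_amount": 30.5},
]


def to_columns(docs):
    """Pivot documents into one list per field, indexed by document position.

    Assumes every document has the same set of fields.
    """
    columns = defaultdict(list)
    for doc in docs:
        for field, value in doc.items():
            columns[field].append(value)
    return dict(columns)


columns = to_columns(documents)

# An aggregation on one field only needs to read that field's column.
total = sum(columns["total_amount"])
print(total)
```

Summing `total_amount` touches a single list here, whereas the row-oriented form would require visiting every document in full.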

Alternatives (Open Source)

● MariaDB ColumnStore

● Yandex ClickHouse

● Apache Spark

Bibliography:

https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/

MySQL Analytics: Performance

Fast if the query can be served from an index

Very slow if a linear scan is required

Example:

SELECT DATE(pickup_datetime) AS date, SUM(total_amount) AS earnings
FROM trips
WHERE driver_id = 102
GROUP BY date
ORDER BY date DESC;
-> returns in milliseconds

SELECT DATE(pickup_datetime) AS date, SUM(total_amount) AS earnings
FROM trips
GROUP BY date
ORDER BY date DESC;
-> 4 minutes

Importing data: Logstash

● Logstash is an open source data collection engine with real-time pipelining capabilities.

● The L in ELK Stack

● ETL for Elasticsearch

● Pipeline model (input -> filter -> output)

Importing data - Input

input {
  jdbc {
    jdbc_driver_library => "/usr/share/java/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://db1:3306/taxi_platform?useCursorFetch=true"
    jdbc_user => "root"
    jdbc_password => "admin"
    jdbc_fetch_size => 100000
    statement => "SELECT id, driver_id, passenger_id, pickup_datetime, dropoff_datetime, CONCAT(pickup_latitude, ',', pickup_longitude) AS pickup_location, CONCAT(dropoff_latitude, ',', dropoff_longitude) AS dropoff_location, payment_type, total_amount FROM trips WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => id
    #schedule => "*/5 * * * *"
  }
}

Importing data - Input

● Dealing with large result sets

jdbc_connection_string => "jdbc:mysql://db1:3306/taxi_platform?useCursorFetch=true"
jdbc_fetch_size => 10000

● SQL Last Value

use_column_value => true
tracking_column => id
statement => "SELECT … WHERE id > :sql_last_value"

● Scheduler

schedule => "*/5 * * * *"

Reference: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
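The `:sql_last_value` mechanism amounts to an incremental fetch with a high-water mark. The Python sketch below imitates it with an in-memory SQLite database standing in for MySQL, so it can run anywhere. The table contents and the helper name `fetch_new_rows` are invented for illustration, and the real plugin persists the tracked value itself rather than relying on caller code like this.

```python
import sqlite3


def fetch_new_rows(conn, last_value):
    """Return rows with id greater than last_value, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, total_amount FROM trips "
        "WHERE id > :sql_last_value ORDER BY id",
        {"sql_last_value": last_value},
    ).fetchall()
    if rows:
        # Remember the highest id seen so the next run starts after it.
        last_value = rows[-1][0]
    return rows, last_value


# SQLite stands in for MySQL here; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER PRIMARY KEY, total_amount REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?)", [(1, 9.5), (2, 14.0)])

first_batch, last_value = fetch_new_rows(conn, 0)

# A new row arrives between two scheduled runs.
conn.execute("INSERT INTO trips VALUES (?, ?)", (3, 21.0))
second_batch, last_value = fetch_new_rows(conn, last_value)
print(len(first_batch), len(second_batch), last_value)
```

Tracking an auto-increment `id` this way only picks up newly inserted rows. Rows that are updated or deleted after being fetched are not seen again.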

Importing Data - Filters

Filters (Input Transformation)

filter {
  mutate { convert => [ "pickup_datetime", "string" ] }
  date { match => [ "pickup_datetime", "ISO8601" ] }
}

● The date field is used for partitioning. If there is no suitable field, Logstash uses the current date and time.

Importing Data - Output

output {
  elasticsearch {
    hosts => [ "es1:9200" ]
    user => "elastic"
    password => "elasticpassword"
    index => "taxi-%{+YYYY-MM}"
    document_type => "trips"
    document_id => "%{id}"
  }
}

Importing Data - Output

● Document partitioning

index => "taxi-%{+YYYY-MM}"

● Matching ID with MySQL

document_id => "%{id}"
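The two output settings above can be read as a routing rule: the event's timestamp picks a monthly index, and the MySQL primary key becomes the document ID. A minimal Python sketch of that rule follows. The function names and the sample row are hypothetical, and the timestamp is assumed to already be in the time zone used for partitioning.

```python
from datetime import datetime


def index_name(timestamp, prefix="taxi"):
    """Return the monthly index name for an event timestamp, e.g. taxi-2016-01."""
    return "{}-{}".format(prefix, timestamp.strftime("%Y-%m"))


def route(row):
    """Map a MySQL row (as a dict) to its target index and document ID."""
    return index_name(row["pickup_datetime"]), str(row["id"])


# Invented sample row.
row = {"id": 42, "pickup_datetime": datetime(2016, 1, 9, 8, 30), "total_amount": 61.0}
target_index, document_id = route(row)
print(target_index, document_id)
```

Because the document ID is derived from the MySQL `id`, importing the same row twice overwrites the existing document instead of creating a duplicate.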

Schema Design - Indexes

● Document Partitioning

● Elasticsearch Types

● Number of Indices

● Number of Shards

● Replication Factor

Schema Design - Indexes

PUT /taxi
{
  "settings" : {
    "index" : {
      "number_of_shards" : 5,
      "number_of_replicas" : 1
    }
  }
}

PUT _template/template1
{
  "template": "taxi*",
  "settings": {

Schema Design - Mapping

● Indexing with the optimal type

● Avoiding Full Text Search indexing (aka text or “analyzed”)

● Type overview

○ keyword

○ long

○ byte

○ date

○ geo_point

○ scaled_float
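Of the types listed, `scaled_float` is probably the least self-explanatory: the value is multiplied by a scaling factor and stored as an integer. The Python sketch below shows the idea with a factor of 100, the value this deck's mapping uses for `total_amount`. The helper names are hypothetical and the exact rounding behaviour shown here is an assumption, so treat it as an illustration of the concept rather than of Elasticsearch's implementation.

```python
from decimal import Decimal, ROUND_HALF_UP

SCALING_FACTOR = 100  # two decimal places, e.g. cents for a currency amount


def to_scaled(value, factor=SCALING_FACTOR):
    """Scale a decimal amount and round it to the nearest integer."""
    scaled = Decimal(str(value)) * factor
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def from_scaled(stored, factor=SCALING_FACTOR):
    """Recover the approximate original amount from the stored integer."""
    return stored / factor


stored = to_scaled(12.34)
print(stored, from_scaled(stored))
```

Digits beyond the scaling factor's precision are lost: with a factor of 100, an amount such as 11.119 is stored as 1112 and reads back as 11.12.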

Schema Design - Mapping

{
  "mappings": {
    "trips": {
      "properties": {
        "id": { "type": "long" },
        "driver_id": { "type": "long" },
        "passenger_id": { "type": "long" },
        "pickup_datetime": { "type": "date" },
        "dropoff_datetime": { "type": "date" },
        "passenger_count": { "type": "byte" },
        "pickup_location": { "type": "geo_point" },
        "dropoff_location": { "type": "geo_point" },
        "payment_type": { "type": "byte" },
        "total_amount": { "type": "scaled_float", "scaling_factor": 100 }
      }
    }
  }
}

Query Design - Aggregation

High Level concepts:

● Buckets

○ driver name, passenger name, location

○ currency exchange (bitstamp, coinbase, etc)

○ Can be nested

● Metrics

○ count

○ pricing sum, avg, max, min, etc

SELECT metric FROM index GROUP BY bucket
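The bucket/metric split can be mimicked in a few lines of plain Python: documents are first grouped into buckets by a field, then a metric is computed per bucket. This is only a conceptual sketch over invented sample data. The function name `aggregate` is hypothetical and the code does not reflect how Elasticsearch executes aggregations across shards.

```python
from collections import defaultdict

# Invented sample documents.
trips = [
    {"driver_id": 102, "total_amount": 10.0},
    {"driver_id": 102, "total_amount": 20.0},
    {"driver_id": 7, "total_amount": 5.0},
]


def aggregate(docs, bucket_field, metric_field):
    """Group docs by bucket_field, then compute count and average of metric_field."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    }


result = aggregate(trips, "driver_id", "total_amount")
print(result)
```

Nesting buckets corresponds to running the grouping step again inside each bucket before computing the metric.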

Building an aggregation, step by step

Goal: each driver's average earnings per day, restricted to trips paid by credit card

● Search component: payment_type:1

● Bucket terms aggregation (driver name or id)

● Bucket date histogram aggregation (per day, month, specific interval, etc)

● Metric average aggregation
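The four steps above can be assembled into a single search body. The Python sketch below builds it as a dictionary, reusing the aggregation names `drivers`, `by_day` and `avg_earning` and the field names from this deck. Expressing the search component as a `term` query in the body (instead of the `q=payment_type:1` URL parameter) is an assumption made for illustration, as is the function name `build_query`.

```python
import json


def build_query(payment_type=1):
    """Return a search body: filter first, then terms > date_histogram > avg."""
    return {
        "size": 0,  # return aggregations only, no documents
        # Assumption: body-level term query replacing q=payment_type:1
        "query": {"term": {"payment_type": payment_type}},
        "aggs": {
            "drivers": {
                "terms": {"field": "driver_id"},
                "aggs": {
                    "by_day": {
                        "date_histogram": {
                            "field": "pickup_datetime",
                            "interval": "day",
                        },
                        "aggs": {
                            "avg_earning": {"avg": {"field": "total_amount"}}
                        },
                    }
                },
            }
        },
    }


body = build_query()
print(json.dumps(body, indent=2))
```

Each nested `aggs` key runs inside the buckets of its parent, so the average is computed per driver and per day.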

Query design - The search component

● Aggregations can be filtered by a search

● Multiple indexes can be queried at once using wildcard patterns

GET /taxi-*/_search?q=payment_type:1

● More complex searches are also possible

GET /taxi-2016-01/_search?q=+pickup_datetime:2016-01-09 +total_amount:>60&size=10000

Query Design - The Terms bucket

{
  "size": 0,                    <- do not return documents
  "aggs": {                     <- define aggregation
    "drivers": {                <- aggregation name (chosen by the user)
      "terms": {                <- agg type
        "field": "driver_id"    <- agg argument (and options)
      }
    }
  }
}

Query Design - The date histogram bucket

"aggs": {
  "drivers": {
    "terms": { "field": "driver_id" },
    "aggs": {
      "by_day": {
        "date_histogram": {
          "field": "pickup_datetime",
          "interval": "day"
        },

Query design - Metric aggregation

"aggs": {
  "by_day": {
    "date_histogram": {
      "field": "pickup_datetime",
      "interval": "day"
    },
    "aggs": {
      "avg_earning": {
        "avg": { "field": "total_amount" }
      }
    }
  }

Query Results

The full response is too long to show here. General information comes first:

{
  "took" : 1804,              <- 1804 ms execution time
  "timed_out" : false,
  "_shards" : {
    "total" : 30,             <- number of shards queried
    "successful" : 30,
    "failed" : 0
  },
  "hits" : {
    "total" : 46066859        <- number of matching documents
  }

Query Results - Example aggregation result

{
  "key" : "Shay Banon",
  "doc_count" : 5476,
  "by_day" : {
    "buckets" : [
      {
        "key_as_string" : "2017-01-01 00:00:00",
        "doc_count" : 152,
        "avg_earning" : {
          "value" : 11.11
        }
      },
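Nested bucket results like the one above are usually flattened into rows before being charted or exported. The Python sketch below walks such a response and yields one tuple per driver and day. The sample dictionary is a trimmed stand-in containing only the single bucket shown on the slide, wrapped in the `aggregations` / `buckets` envelope that search responses use, and the function name `flatten` is hypothetical.

```python
def flatten(response):
    """Yield (driver, day, avg_earning) tuples from a nested aggregation response."""
    for driver in response["aggregations"]["drivers"]["buckets"]:
        for day in driver["by_day"]["buckets"]:
            yield driver["key"], day["key_as_string"], day["avg_earning"]["value"]


# Trimmed sample shaped like the slide's result; only one bucket is included.
sample = {
    "aggregations": {
        "drivers": {
            "buckets": [
                {
                    "key": "Shay Banon",
                    "doc_count": 5476,
                    "by_day": {
                        "buckets": [
                            {
                                "key_as_string": "2017-01-01 00:00:00",
                                "doc_count": 152,
                                "avg_earning": {"value": 11.11},
                            }
                        ]
                    },
                }
            ]
        }
    }
}

rows = list(flatten(sample))
print(rows)
```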

Query filtering - Filtering by location

"filter": {
  "geo_polygon": {
    "pickup_location": {
      "points": [
        {"lat": 40.56, "lon": -74.04},
        {"lat": 40.74, "lon": -74.04},
        {"lat": 40.74, "lon": -73.83},
        {"lat": 40.57, "lon": -73.83}
      ]
    }
  }
}
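To get a feel for what a polygon filter selects, the following Python sketch runs a standard ray-casting point-in-polygon test. The polygon uses the filter's four corner values read as New York City coordinates (latitude near 40.6, longitude near -74.0). It treats latitude and longitude as flat planar coordinates, which is an assumption that is reasonable for a city-sized area but is not how Elasticsearch evaluates geo queries. The function name and the two test points are invented for illustration.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: is (lat, lon) inside the polygon of (lat, lon) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the point's latitude?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude.
            crossing = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < crossing:
                inside = not inside
    return inside


# Polygon vertices written as (lat, lon).
area = [(40.56, -74.04), (40.74, -74.04), (40.74, -73.83), (40.57, -73.83)]

print(point_in_polygon(40.70, -74.00, area))  # pickup inside the area
print(point_in_polygon(40.80, -74.00, area))  # pickup north of the area
```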

Going further - Bucket aggregations

● Adjacency Matrix Aggregation
● Children Aggregation
● Date Histogram Aggregation
● Date Range Aggregation
● Diversified Sampler Aggregation
● Filter Aggregation
● Filters Aggregation
● Geo Distance Aggregation
● GeoHash grid Aggregation
● Global Aggregation
● Histogram Aggregation
● IP Range Aggregation
● Missing Aggregation
● Nested Aggregation
● Range Aggregation
● Reverse nested Aggregation
● Sampler Aggregation
● Significant Terms Aggregation
● Terms Aggregation

Going Further - Metric Aggregations

● Avg Aggregation
● Cardinality Aggregation
● Extended Stats Aggregation
● Geo Bounds Aggregation
● Geo Centroid Aggregation
● Max Aggregation
● Min Aggregation
● Percentiles Aggregation
● Percentile Ranks Aggregation
● Scripted Metric Aggregation
● Stats Aggregation
● Sum Aggregation
● Top hits Aggregation
● Value Count Aggregation

Going Further - Kibana

Build cool dashboards with your aggregations

We’re Hiring!

● Frontend Developers

● Backend Developers

● Data Scientists
