query analytics for the day-to-day developer with mysql 8or around 100mb/table now supports...
TRANSCRIPT
Query analytics for the day-to-day developer with MySQL 8.0
Gabriela D'Ávila Ferrara @gabidavila Developer Advocate @ Google Cloud ^(gabi|gabby)\.dev$
I try to solve all my problems with a single SQL query
Is it always pretty?
FizzBuzz
WITH RECURSIVE fizz_buzz (sequence, modulo_3, modulo_5) AS ( SELECT 1, CAST('' AS CHAR(4)), CAST('' AS CHAR(5)) UNION ALL SELECT sequence + 1, IF(MOD(sequence + 1, 3) = 0, 'Fizz', ''), IF(MOD(sequence + 1, 5) = 0, 'Buzz', '') FROM fizz_buzz WHERE sequence < 100 )
SELECT IF( CONCAT(modulo_3, modulo_5) = '', sequence, CONCAT(modulo_3, modulo_5)) AS fizzbuzz FROM fizz_buzz;
Brief History
● Created to handle from up to 10 to 100M rows or around 100MB/table
● Now supports terabyte-sized databases
● Supports SQL standards as new as SQL 2016
● But... some stuff from SQL 2003 just became available (Window Functions, CTEs)
MySQL 8.0.16
Be warned!
Don’t do this in production!
● Just because you can, doesn’t mean you should!
● The cost of analytical queries are too high ● Run on a replica if you must to ● Remember this is just to help you in an
emergency
Demo
Considerations
How often you write in raw SQL…
● A CRUD operation? ● A DML operation? ● Create/edit a Function? ● Create/edit a Procedure? ● A View?
Have you ever…
● Generated reports using a scripting language? (Python, PHP)
● Did an ETL? ● Synced data across different types of
databases? (i.e. full-text search)
Scenario
Let’s say you own a store
🏬 💵 💻
Tables: •products •users •orders •order_items
And you want several reports
● Most expensive order per user
● The highest price each product was ever sold and what was the date
of that ● The monthly amount
sold in a year together with the growth
Subqueries
SHOW CREATE TABLE `users`;
CREATE TABLE `users` ( `id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT, `username` varchar(255) NOT NULL, `created_at` timestamp NOT NULL, `updated_at` timestamp NOT NULL )
SELECT * FROM `users`;
Who never did this?
SELECT users.id, users.username, (SELECT id FROM orders WHERE users.id = user_id ORDER BY total LIMIT 1) AS order_id, (SELECT total FROM orders WHERE users.id = user_id ORDER BY total LIMIT 1) AS order_total FROM users ORDER BY users.id LIMIT 10;
The most expensive order for each user
LATERAL
SELECT users.id, users.username, total_orders.id AS order_id, total_orders.total AS order_total FROM users, LATERAL( SELECT id, total FROM orders WHERE users.id = user_id ORDER BY total LIMIT 1 ) AS total_orders ORDER BY users.id LIMIT 10;
Demo
OLTP vs OLAP
OLTP: Online Transaction Processing
● Transactional ● Single-source of truth ● Normalized (usually) ● Examples: MySQL, PostgreSQL, SQL
Server, etc.
OLAP: Online Analytical Processing
● Aggregations ● Star-Schema, Cube ● Forecasts ● Examples: Redshift, Pentaho*
Query analytics? 🤔
Analytical clause, up to 4 parts
● Analytical function ● Partitioning Clause ● Order by Clause ● Windowing Clause
Window Functions
What they do?
● Allows to analyze the rows of a given result set
● Can behave like a GROUP BY without changing
the result set
● Allows you to use a frame to "peek" OVER a
PARTITION of a window
Window Functions
● Examples:
● Enumerate rows - ROW_NUMBER()
● Show Aggregated sums - SUM()
● Rank results - RANK(), DENSE_RANK()
● Look at neighboring rows - LEAD(), LAG()
SHOW CREATE TABLE `orders`;CREATE TABLE `orders` ( `id` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT, `user_id` bigint(20) UNSIGNED NOT NULL, `status` varchar(20) NOT NULL DEFAULT 'new', `total` decimal(10,2) NOT NULL, `created_at` datetime NOT NULL, `updated_at` datetime NOT NULL )
SELECT … FROM `orders` WHERE … ORDER BY created_at
+---------+---------+-----------+---------------------+ | id | user_id | status | created_at | +---------+---------+-----------+---------------------+ | 6534954 | 654321 | canceled | 2011-08-21 19:06:59 | | 6534949 | 654321 | canceled | 2013-11-16 09:02:19 | | 6534953 | 654321 | completed | 2015-10-28 02:21:30 | | 6534951 | 654321 | new | 2018-01-05 17:12:23 | | 6534952 | 654321 | new | 2018-07-19 04:23:41 | | 6534955 | 654321 | new | 2018-11-12 05:37:48 | | 6534950 | 654321 | pending | 2018-12-20 06:11:23 | +---------+---------+-----------+---------------------+
SELECT … FROM `orders` WHERE … ORDER BY created_at
+---------+---------+-----------+---------------------+---------------------+---------------------+ | id | user_id | status | previous_order | created_at | next_order | +---------+---------+-----------+---------------------+---------------------+---------------------+ | 6534954 | 654321 | canceled | NULL | 2011-08-21 19:06:59 | 2013-11-16 09:02:19 | | 6534949 | 654321 | canceled | 2011-08-21 19:06:59 | 2013-11-16 09:02:19 | 2015-10-28 02:21:30 | | 6534953 | 654321 | completed | 2013-11-16 09:02:19 | 2015-10-28 02:21:30 | 2018-01-05 17:12:23 | | 6534951 | 654321 | new | 2015-10-28 02:21:30 | 2018-01-05 17:12:23 | 2018-07-19 04:23:41 | | 6534952 | 654321 | new | 2018-01-05 17:12:23 | 2018-07-19 04:23:41 | 2018-11-12 05:37:48 | | 6534955 | 654321 | new | 2018-07-19 04:23:41 | 2018-11-12 05:37:48 | 2018-12-20 06:11:23 | | 6534950 | 654321 | pending | 2018-11-12 05:37:48 | 2018-12-20 06:11:23 | NULL | +---------+---------+-----------+---------------------+---------------------+---------------------+
Previous and Next orders | LAG and LEAD
SELECT id, user_id, status, LAG(created_at) OVER(ORDER BY created_at) AS previous_order, created_at, LEAD(created_at) OVER(ORDER BY created_at) AS next_order FROM orders WHERE user_id = 654321 ORDER BY created_at LIMIT 10;
Break down
windowfunction
column
# rows preceding
LAG(created_at, 1) OVER (ORDER BY created_at)
Repetition?
SELECT id, user_id, status, LAG(created_at) OVER(ORDER BY created_at) AS previous_order, created_at, LEAD(created_at) OVER(ORDER BY created_at) AS next_order FROM orders WHERE user_id = 654321 ORDER BY created_at LIMIT 10;
Named Windows!
SELECT id, user_id, status, LAG(created_at) OVER(dates) AS previous_order, created_at, LEAD(created_at) OVER(dates) AS next_order FROM orders WHERE user_id = 654321 WINDOW dates AS (ORDER BY created_at) ORDER BY created_at LIMIT 10;
Demo
CTE Common Table Expressions
CTE
Common Table Expressions
● Similar to CREATE [TEMPORARY] TABLE
● Doesn’t need CREATE privilege
● Can reference other CTEs (if those are already defined)
● Can be recursive
● Easier to read
Recursive CTE
● Useful with hierarchical data
● The Recipe is:
● Base query comes first
● Second query comes after an UNION statement
● And the stop condition should be on the recursive call
FizzBuzz
WITH RECURSIVE fizz_buzz (sequence, modulo_3, modulo_5) AS ( SELECT 1, CAST('' AS CHAR(4)), CAST('' AS CHAR(5)) UNION ALL SELECT sequence + 1, IF(MOD(sequence + 1, 3) = 0, 'Fizz', ''), IF(MOD(sequence + 1, 5) = 0, 'Buzz', '') FROM fizz_buzz WHERE sequence < 100 )
SELECT IF( CONCAT(modulo_3, modulo_5) = '', sequence, CONCAT(modulo_3, modulo_5)) AS fizzbuzz FROM fizz_buzz;
SELECT * FROM `store`.`categories`;
+------+-------------------------------+--------------------+ | id | name | parent_category_id | +------+-------------------------------+--------------------+ | 1 | Animal | 0 | | 2 | Plant | 0 | | 3 | Dog | 1 | | 4 | Cat | 1 | | 5 | Tulip | 10 | | 6 | West Highlander White Terrier | 12 | | 7 | Lettuce | 11 | | 8 | Sunflower | 10 | | 10 | Flowers | 2 | | 11 | Veggies | 2 | | 12 | Terrier | 3 | +------+-------------------------------+--------------------+ 11 rows in set (0.00 sec)
Demo
Recursive CTE
WITH RECURSIVE tree (depth_level, node, path, node_id) AS ( SELECT 1, CAST('root' AS CHAR(255)), CAST('root' AS CHAR(65535)), 0 ) SELECT * FROM tree;
Recursive CTE
WITH RECURSIVE tree (depth_level, node, path, node_id) AS ( SELECT 1, CAST('root' AS CHAR(255)), CAST('root' AS CHAR(65535)), 0 UNION ALL SELECT tree.depth_level + 1, categories.name, CONCAT_WS('/', tree.path, categories.name), categories.id FROM tree INNER JOIN categories
ON tree.node_id = categories.parent_category_id WHERE tree.depth_level < 5 ) SELECT * FROM tree ORDER BY path;
Thank you!
● Twitter: @gabidavila ● Co-host at GCP Podcast: gcppodcast.com ● Website: gabi.dev