NoSQL and MySQL: News about JSON
NoSQL and SQL: The Best of Both Worlds
Mario Beck, MySQL Presales Manager EMEA
Mablomy.blogspot.de
3rd November, 2015
Copyright © 2015, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
NoSQL
Simple access patterns
Compromise on consistency for performance
Ad-hoc data format
Simple operation
SQL
Complex queries with joins
ACID transactions
Well defined schemas
Rich set of tools
Still a role for SQL (RDBMS)?
Scalability
Performance
HA
Ease of use
SQL/Joins
ACID Transactions
26th March 2015 Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 3
MySQL Cluster Overview
REAL-TIME
• In-Memory Optimization + Disk-Data
• Predictable Low-Latency, Bounded Access Time
HIGH SCALE, READS + WRITES
• Auto-Sharding, Multi-Master
• ACID Compliant, OLTP + Real-Time Analytics
99.999% AVAILABILITY
• Shared nothing, no Single Point of Failure
• Self Healing + On-Line Operations
SQL + NoSQL
• Key/Value + Complex, Relational Queries
• SQL + Memcached + JavaScript + Java + HTTP/REST & C++
LOW TCO
• Open Source + Commercial Editions
• Commodity hardware + Management, Monitoring Tools
MySQL Cluster Scaling
MySQL Cluster Data Nodes
Clients
Application Layer
Data Layer
NoSQL Access to MySQL Cluster data
Apps
• PHP, Perl, Python, Ruby, JDBC → MySQL Server
• JPA → ClusterJPA → ClusterJ → JNI
• JS → Node.js
• Apache → mod_ndb
• Memcached → ndb_eng
NDB API (C++)
MySQL Cluster Data Nodes
1.2 Billion UPDATEs per Minute
• Distributed Joins also possible
[Chart: millions of UPDATEs per second (0 to 25) against number of MySQL Cluster data nodes (2 to 30)]
Scalability ✓
Performance ✓
HA ✓
Ease of use ✓
SQL/Joins ✓
ACID Transactions ✓
• Memory optimized tables
– Durable
– Mix with disk-based tables
• Massively concurrent OLTP
• Distributed Joins for analytics
• Parallel table scans for non-indexed searches
• MySQL Cluster 7.4 FlexAsync – 200M NoSQL Reads/Second
MySQL Cluster 7.4 NoSQL Performance 200 Million NoSQL Reads/Second
[Chart: FlexAsync reads per second (0 to 250,000,000) against number of data nodes (2 to 32)]
Cluster & Memcached - Configured Schema
Application view (memcached key and value):
<town:maidenhead,SL6>
prefix "town:", key "maidenhead", value "SL6"

Config tables:
Prefix   Table     Key-col   Val-col   policy
town:    map.zip   town      code      cluster

SQL view (table map.zip):
town         ...   code   ...
maidenhead   ...   SL6    ...
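The prefix mapping above can be pictured with a small Python sketch. It is only an illustration of the idea, not the memcached plugin's code: the class name, the function names and the longest-prefix rule are assumptions made for this example, and the rendered SQL is for display only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixMapping:
    """One row of the prefix configuration (hypothetical in-memory form)."""
    table: str
    key_col: str
    val_col: str
    policy: str


# Mirrors the config row on the slide: prefix "town:" maps to map.zip(town, code).
MAPPINGS = {
    "town:": PrefixMapping(table="map.zip", key_col="town",
                           val_col="code", policy="cluster"),
}


def resolve(memcached_key):
    """Split a memcached key into (mapping, row key).

    Picking the longest matching prefix is an assumption of this sketch.
    """
    for prefix in sorted(MAPPINGS, key=len, reverse=True):
        if memcached_key.startswith(prefix):
            return MAPPINGS[prefix], memcached_key[len(prefix):]
    raise KeyError(f"no mapping configured for key {memcached_key!r}")


def equivalent_sql(memcached_key):
    """Render the SQL that a GET on this key corresponds to (display only)."""
    mapping, row_key = resolve(memcached_key)
    return (f"SELECT {mapping.val_col} FROM {mapping.table} "
            f"WHERE {mapping.key_col} = '{row_key}'")


print(equivalent_sql("town:maidenhead"))
# SELECT code FROM map.zip WHERE town = 'maidenhead'
```

The same row is therefore reachable as a key/value pair from the application and as an ordinary table row from SQL.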
Node.js NoSQL API
• Native JavaScript access to MySQL Cluster
– End-to-End JavaScript: browser to the app & DB
– Storing and retrieving JavaScript objects directly in MySQL Cluster
– Eliminate SQL transformation
• Implemented as a module for node.js
– Integrates Cluster API library within the web app
• Couple high performance, distributed apps, with high performance distributed database
• Optionally routes through MySQL Server
– Use with InnoDB
V8 JavaScript Engine
MySQL Cluster Node.js Module
MySQL Cluster Data Nodes
Clients
NoSQL API for Node.js & FKs
FKs enforced on all APIs:
{ message: 'Error',
  sqlstate: '23000',
  ndb_error: null,
  cause:
   { message: 'Foreign key constraint violated: No parent row found [255]',
     sqlstate: '23000',
     ndb_error:
      { message: 'Foreign key constraint violated: No parent row found',
        code: 255,
        classification: 'ConstraintViolation',
        handler_error_code: 151,
        status: 'PermanentError' },
     cause: null } }
Choosing the right application API
SQL
• Industry standard
• Joins & Complex queries
• Relational model
Memcached
• Simple to use API
• Key/value
• Drivers for many languages
Mod-ndb
• REST
• HTML
• Plugin for Apache
ClusterJ
• Simple to Use Java API
• Web & telco
• Object Relational Mapping
• Native & fast access to data
ClusterJPA
• OpenJPA plugin
• Standards defined ORM
• Cross table Joins
JavaScript/Node.js
• Native JavaScript: client to DB
• Blazing fast asynchronous throughput
MySQL 5.6 Memcached with InnoDB
[Chart: TPS (0 to 80,000) against client connections (8, 32, 128, 512), comparing Memcached API and SQL]
Clients and Applications
MySQL Server Memcached Plug-in
innodb_memcached
local cache (optional)
Handler API InnoDB API
InnoDB Storage Engine
mysqld process
SQL Memcached Protocol
Up to 9x Higher “SET / INSERT” Throughput
Core New JSON features in MySQL 5.7
• Native JSON datatype
• JSON Functions
• Generated Columns
The JSON Type
CREATE TABLE employees (data JSON);
INSERT INTO employees VALUES ('{"id": 1, "name": "Jane"}');
INSERT INTO employees VALUES ('{"id": 2, "name": "Joe"}');
SELECT * FROM employees;
+---------------------------+
| data                      |
+---------------------------+
| {"id": 1, "name": "Jane"} |
| {"id": 2, "name": "Joe"}  |
+---------------------------+
2 rows in set (0,00 sec)
JSON Type Tech Specs
• utf8mb4 character set
• Optimized for read intensive workload
• Parse and validation on insert only
• Dictionary
• Sorted objects' keys
• Fast access to array cells by index
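Why sorted keys help reads can be shown with a short Python sketch. It is not MySQL's binary layout. The function names are invented for the example, and ordering keys by length and then by text is an assumption inferred from the key order that JSON_OBJECT() returns later in this deck ("id", "type", "street").

```python
import bisect


def build_object(pairs):
    """Store an object as two parallel lists, with keys kept in sorted order.

    Sort order (length, then text) is an assumption of this sketch.
    """
    items = sorted(pairs.items(), key=lambda kv: (len(kv[0]), kv[0]))
    sort_keys = [(len(k), k) for k, _ in items]
    values = [v for _, v in items]
    return sort_keys, values


def lookup(obj, key):
    """Find a member by binary search instead of scanning every key."""
    sort_keys, values = obj
    probe = (len(key), key)
    i = bisect.bisect_left(sort_keys, probe)
    if i < len(sort_keys) and sort_keys[i] == probe:
        return values[i]
    raise KeyError(key)


obj = build_object({"street": "RAUSCH", "id": 122976, "type": "Feature"})
print([k for _, k in obj[0]])   # ['id', 'type', 'street']
print(lookup(obj, "type"))      # Feature
```

Because the keys are ordered once at insert time, each later read costs a binary search rather than a re-parse of the whole document.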
JSON Type Tech Specs (cont.)
• Supports all native JSON types
• Numbers, strings, bool
• Objects, arrays
• Extended
• Date, time, datetime, timestamp
• Other
Advantages over TEXT/VARCHAR
1. Provides Document Validation:
INSERT INTO employees VALUES ('some random text');
ERROR 3130 (22032): Invalid JSON text: "Expect a value here." at position 0 in value (or column) 'some random text'.
2. Efficient Binary Format: allows quicker access to object members and array elements
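The validate-on-insert behaviour can be mimicked in a few lines of Python. This is a sketch using Python's own json module, which is not the parser MySQL uses, and the function name is hypothetical. The point is only that a JSON column rejects bad text at write time, while a TEXT column would accept it.

```python
import json


def validate_json_text(text):
    """Return the text unchanged if it parses as JSON, else raise ValueError."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where parsing failed.
        raise ValueError(
            f"Invalid JSON text at position {exc.pos}: {exc.msg}"
        ) from exc
    return text


validate_json_text('{"id": 1, "name": "Jane"}')   # accepted

try:
    validate_json_text("some random text")
except ValueError as err:
    print(err)   # rejected before anything is stored
```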
JSON Functions
SET @document = '[10, 20, [30, 40]]';
SELECT JSON_EXTRACT(@document, '$[1]');
+---------------------------------+
| JSON_EXTRACT(@document, '$[1]') |
+---------------------------------+
| 20                              |
+---------------------------------+
1 row in set (0.01 sec)
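To make the path syntax concrete, here is a minimal Python sketch of a path evaluator. It handles only the `$`, `.key` and `[n]` steps used in this deck and is not MySQL's implementation. The function name merely echoes JSON_EXTRACT(), and returning None for a missing path stands in for SQL NULL.

```python
import json
import re

# One path step: either .member or [index].
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def json_extract(document, path):
    """Evaluate a simple JSON path against a JSON text."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    value = json.loads(document)
    pos = 1
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            raise ValueError(f"unsupported path syntax at offset {pos}")
        if step.group(1) is not None:
            # Object member access.
            if not isinstance(value, dict) or step.group(1) not in value:
                return None
            value = value[step.group(1)]
        else:
            # Array cell access by index.
            index = int(step.group(2))
            if not isinstance(value, list) or index >= len(value):
                return None
            value = value[index]
        pos = step.end()
    return value


print(json_extract("[10, 20, [30, 40]]", "$[1]"))      # 20
print(json_extract("[10, 20, [30, 40]]", "$[2][1]"))   # 40
```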
JSON Array Creation
SELECT JSON_ARRAY(id,
  feature->"$.properties.STREET",
  feature->"$.type") AS json_array
FROM features ORDER BY RAND() LIMIT 3;
+-------------------------------+
| json_array                    |
+-------------------------------+
| [65298, "10TH", "Feature"]    |
| [122985, "08TH", "Feature"]   |
| [172884, "CURTIS", "Feature"] |
+-------------------------------+
3 rows in set (2.66 sec)
JSON Object Creation
SELECT JSON_OBJECT('id', id,
  'street', feature->"$.properties.STREET",
  'type', feature->"$.type"
) AS json_object
FROM features ORDER BY RAND() LIMIT 3;
+--------------------------------------------------------+
| json_object                                            |
+--------------------------------------------------------+
| {"id": 122976, "type": "Feature", "street": "RAUSCH"}  |
| {"id": 148698, "type": "Feature", "street": "WALLACE"} |
| {"id": 45214, "type": "Feature", "street": "HAIGHT"}   |
+--------------------------------------------------------+
3 rows in set (3.11 sec)
JSON Functions
• MySQL 5.7 supports functions to CREATE, SEARCH, MODIFY and RETURN JSON values:
JSON_ARRAY_APPEND()
JSON_ARRAY_INSERT()
JSON_ARRAY()
JSON_CONTAINS_PATH()
JSON_CONTAINS()
JSON_DEPTH()
JSON_EXTRACT()
JSON_INSERT()
JSON_KEYS()
JSON_LENGTH()
JSON_MERGE()
JSON_OBJECT()
JSON_QUOTE()
JSON_REMOVE()
JSON_REPLACE()
JSON_SEARCH()
JSON_SET()
JSON_TYPE()
JSON_UNQUOTE()
JSON_VALID()
https://dev.mysql.com/doc/refman/5.7/en/json-functions.html
Tests Using Real Life Data
• Via SF OpenData
• 206K JSON objects representing subdivision parcels.
• Imported from https://github.com/zemirco/sf-city-lots-json + small tweaks
CREATE TABLE features (
  id INT NOT NULL auto_increment primary key,
  feature JSON NOT NULL
);

{
  "type":"Feature",
  "geometry":{
    "type":"Polygon",
    "coordinates":[
      [
        [-122.42200352825247,37.80848009696725,0],
        [-122.42207601332528,37.808835019815085,0],
        [-122.42110217434865,37.808803534992904,0],
        [-122.42106256906727,37.80860105681814,0],
        [-122.42200352825247,37.80848009696725,0]
      ]
    ]
  },
  "properties":{
    "TO_ST":"0",
    "BLKLOT":"0001001",
    "STREET":"UNKNOWN",
    "FROM_ST":"0",
    "LOT_NUM":"001",
    "ST_TYPE":null,
    "ODD_EVEN":"E",
    "BLOCK_NUM":"0001",
    "MAPBLKLOT":"0001001"
  }
}
Naive Performance Comparison
Unindexed traversal of 206K documents

# as JSON type
SELECT DISTINCT feature->"$.type" as json_extract FROM features;
+--------------+
| json_extract |
+--------------+
| "Feature"    |
+--------------+
1 row in set (1.25 sec)

# as TEXT type
SELECT DISTINCT feature->"$.type" as json_extract FROM features;
+--------------+
| json_extract |
+--------------+
| "Feature"    |
+--------------+
1 row in set (12.85 sec)

Explanation: The binary format of the JSON type is very efficient at searching. Storing as TEXT performs over 10x worse at traversal.
The -> operator used here is a shortcut for JSON_EXTRACT(), coming in 5.7.9.
Introducing Generated Columns
CREATE TABLE t1 (
  id INT NOT NULL PRIMARY KEY auto_increment,
  my_integer INT,
  my_integer_plus_one INT AS (my_integer+1)
);

id   my_integer   my_integer_plus_one
1    10           11
2    20           21
3    30           31
4    40           41

UPDATE t1 SET my_integer_plus_one = 10 WHERE id = 1;
ERROR 3105 (HY000): The value specified for generated column 'my_integer_plus_one' in table 't1' is not allowed.

Column automatically maintained based on your specification.
Read-only, of course.
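A rough analogy in Python, assuming a hypothetical Row class: a read-only property behaves much like a VIRTUAL generated column, since it is derived from another field on every read and cannot be assigned directly. It is a mental model, not how MySQL stores or evaluates the column.

```python
class Row:
    """One row of t1, with my_integer_plus_one derived rather than stored."""

    def __init__(self, id, my_integer):
        self.id = id
        self.my_integer = my_integer

    @property
    def my_integer_plus_one(self):
        # Computed on read; None plays the role of SQL NULL.
        if self.my_integer is None:
            return None
        return self.my_integer + 1


row = Row(1, 10)
print(row.my_integer_plus_one)   # 11

row.my_integer = 20              # change the base column
print(row.my_integer_plus_one)   # 21, follows automatically

try:
    row.my_integer_plus_one = 10  # direct writes are rejected
except AttributeError:
    print("generated value is read-only")
```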
Generated Columns Support Indexes!
ALTER TABLE features ADD feature_type VARCHAR(30) AS (feature->"$.type");
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
Metadata change only (FAST). Does not need to touch the table.

ALTER TABLE features ADD INDEX (feature_type);
Query OK, 0 rows affected (0.73 sec)
Records: 0 Duplicates: 0 Warnings: 0
Creates the index only. Does not modify table rows.

SELECT DISTINCT feature_type FROM features;
+--------------+
| feature_type |
+--------------+
| "Feature"    |
+--------------+
1 row in set (0.06 sec)
From a table scan on 206K documents to an index scan on 206K materialized values: down from 1.25 sec to 0.06 sec.
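The speed-up comes from materializing the extracted value once, in the index, instead of parsing every document per query. A toy Python sketch of that idea follows. The sample rows and function name are invented, and a dictionary stands in for the B-tree.

```python
import json
from collections import defaultdict

# Three made-up documents in the shape of the features table.
rows = {
    1: '{"type": "Feature", "properties": {"STREET": "10TH"}}',
    2: '{"type": "Feature", "properties": {"STREET": "CURTIS"}}',
    3: '{"type": "Feature", "properties": {"STREET": "HAIGHT"}}',
}


def build_index(table, expression):
    """Evaluate the generated-column expression once per row and index it."""
    index = defaultdict(list)
    for row_id, doc_text in table.items():
        index[expression(json.loads(doc_text))].append(row_id)
    return dict(index)


# Index on the equivalent of feature->"$.type".
feature_type_index = build_index(rows, lambda doc: doc["type"])

# SELECT DISTINCT now reads index keys and never parses a document.
print(sorted(feature_type_index))        # ['Feature']
print(feature_type_index["Feature"])     # [1, 2, 3]
```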
Generated Columns (cont.)
• Used for “functional index”
• Available as either VIRTUAL (default) or STORED:
• Both types of computed columns permit indexes to be added.

ALTER TABLE features ADD feature_type varchar(30) AS (feature->"$.type") STORED;
Query OK, 206560 rows affected (4.70 sec)
Records: 206560 Duplicates: 0 Warnings: 0
Indexing Options Available
STORED                    VIRTUAL
Primary and Secondary     Secondary Only
BTREE, Fulltext, GIS      BTREE Only
Mixed with fields         Mixed with fields
Requires table rebuild    No table rebuild
Not Online                INSTANT Alter
                          Faster Insert
Bottom Line: Unless you need a PRIMARY KEY, FULLTEXT or GIS index, VIRTUAL is probably better.
Virtual vs. Stored Performance
• Approximate worst case scenario via a table scan:
SELECT DISTINCT feature_type FROM features;
+--------------+
| feature_type |
+--------------+
| "Feature"    |
+--------------+

VIRTUAL-TEXT (9.89 sec)
STORED-TEXT (0.22 sec)
VIRTUAL-JSON (0.85 sec)
STORED-JSON (0.24 sec)

Clarification: Since indexes are materialized (stored) themselves, the real-life case for STORED is when generating the column is computationally expensive and you cannot use indexes effectively.
Road Map
• In-place partial update of JSON/BLOB (performance)
• Partial streaming of JSON/BLOB (replication)
• Full text and GIS index on virtual columns
• Currently works for "STORED"
• Improved performance through condition pushdown
Prefer the Relational Model - Storing as a Column
• Easier to apply a schema to your application
• Schema may make applications easier to maintain over time, as change is controlled;
• Do not have to expect as many permutations
• Allows some constraints over data
Prefer the Document Model - Storing as JSON
• More flexible way to represent data that is hard to model in schema;
• Easier denormalization; an optimization that is important in some specific situations
• No painful schema changes*
• Easier prototyping, Fewer types to consider
• No enforced schema, start storing values immediately
* MySQL 5.6 has Online DDL. This is not as large an issue as it was historically.
Prefer the Hybrid Model – Just do it!
SSDs have capacity_in_gb, CPUs have a core_count. These attributes are not consistent across products.

CREATE TABLE pc_components (
  id INT NOT NULL PRIMARY KEY,
  description VARCHAR(60) NOT NULL,
  vendor VARCHAR(30) NOT NULL,
  serial_number VARCHAR(30) NOT NULL,
  attributes JSON NOT NULL
);
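A small runnable sketch of the hybrid model, using SQLite from the Python standard library as a stand-in for MySQL. The attributes column is plain TEXT here rather than MySQL's JSON type, the sample vendor and serial numbers are invented, and the filtering is done in Python instead of with MySQL's JSON functions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pc_components (
        id INTEGER NOT NULL PRIMARY KEY,
        description VARCHAR(60) NOT NULL,
        vendor VARCHAR(30) NOT NULL,
        serial_number VARCHAR(30) NOT NULL,
        attributes TEXT NOT NULL  -- JSON in MySQL 5.7; text in this stand-in
    )
""")

# Fixed columns are shared by every product; attributes vary per product type.
components = [
    (1, "Solid state drive", "ExampleVendor", "SN-0001", {"capacity_in_gb": 512}),
    (2, "Processor", "ExampleVendor", "SN-0002", {"core_count": 8}),
]
conn.executemany(
    "INSERT INTO pc_components VALUES (?, ?, ?, ?, ?)",
    [(i, d, v, s, json.dumps(a)) for i, d, v, s, a in components],
)


def components_with(attribute):
    """Return (id, description, value) for rows whose JSON has the attribute."""
    found = []
    query = "SELECT id, description, attributes FROM pc_components ORDER BY id"
    for comp_id, description, attrs_text in conn.execute(query):
        attrs = json.loads(attrs_text)
        if attribute in attrs:
            found.append((comp_id, description, attrs[attribute]))
    return found


print(components_with("capacity_in_gb"))   # [(1, 'Solid state drive', 512)]
print(components_with("core_count"))       # [(2, 'Processor', 8)]
```

The relational columns keep the constraints that apply to every component, while the JSON document absorbs the attributes that differ between product types.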
Prefer Simple Access Pattern – Using Key-Value
• Full access to relational data
– Value can be col1|col2|col3
– Value can be JSON
• Much higher throughput
• Only single-row, primary key access
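The two value shapes mentioned above can be sketched in Python. The helper names and the example column names are hypothetical, and the "|" separator simply follows the col1|col2|col3 notation on the slide.

```python
import json

SEPARATOR = "|"


def pack_value(columns):
    """Join several column values into one key-value payload."""
    for col in columns:
        if SEPARATOR in col:
            raise ValueError(f"column value {col!r} contains the separator")
    return SEPARATOR.join(columns)


def unpack_value(value, column_names):
    """Split a payload back into named columns."""
    parts = value.split(SEPARATOR)
    if len(parts) != len(column_names):
        raise ValueError("payload does not match the expected column count")
    return dict(zip(column_names, parts))


# Shape 1: several relational columns behind one key.
payload = pack_value(["SL6", "Berkshire", "UK"])
print(payload)                                               # SL6|Berkshire|UK
print(unpack_value(payload, ["code", "county", "country"]))

# Shape 2: a single JSON document as the value.
print(json.dumps({"code": "SL6", "county": "Berkshire"}))
```

Either way the access is by primary key only, which is what makes the key-value path so much cheaper than a general SQL query.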
[Chart: TPS (0 to 80,000) against client connections (8, 32, 128, 512), comparing Memcached API and SQL]
Options for Dev – Simplicity for Ops
• Always the same tool to Backup (MySQL Enterprise Backup)
• Always the same tool to Monitor (MySQL Enterprise Monitor)
• Always the same tool to Audit (MySQL Enterprise Audit)
• Always the same tool to Protect (MySQL Enterprise Firewall)
• Always the same source of Support (Oracle MySQL Support)
• Always the same way to Deploy (Repos, Openstack, ...)
Polyglot Persistence with Operational Stability
Resources
• http://mysqlserverteam.com/
• http://mysqlserverteam.com/tag/json/
• https://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html
• http://dev.mysql.com/doc/relnotes/mysql/5.7/en/
• https://dev.mysql.com/doc/refman/5.7/en/json.html
• https://dev.mysql.com/doc/refman/5.7/en/json-functions.html
• http://www.thecompletelistoffeatures.com
Thank You!