data systems for the cloud - stanford...
TRANSCRIPT
![Page 2: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/2.jpg)
Outline
What is the cloud and what’s different with it?
S3 & Dynamo: object stores
Aurora: transactional DBMS
BigQuery: analytical DBMS
Delta Lake: ACID over object stores
CS 245 2
![Page 3: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/3.jpg)
Outline
What is the cloud and what’s different with it?
S3 & Dynamo: object stores
Aurora: transactional DBMS
BigQuery: analytical DBMS
Delta Lake: ACID over object stores
CS 245 3
![Page 4: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/4.jpg)
What is Cloud Computing?
Computing as a service, managed by an external party» Software as a Service (SaaS): application
hosted by a provider, e.g. Salesforce, Gmail» Platform as a Service (PaaS): APIs to
program against, e.g. DB or web hosting» Infrastructure as a Service (IaaS): raw
computing resources, e.g. VMs on AWS
CS 245 4
Large shift in industry over past 10-20 years!
![Page 5: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/5.jpg)
History of Cloud Computing
Old idea, but became successful in the 2000s
CS 245 5
1960 1970 1980 1990 2000 2010 2020
“Utility computing” first
used, talking about shared mainframes
Sun Cloud,HP Utility Datacenter, Loudcloud, VMware
Virtual private web servers
Amazon S3, EC2 (2006)
Google BigQuery (2011), AWS Redshift (2012)
AWS Aurora, Lambda (2014)
Salesforce (1999)
![Page 6: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/6.jpg)
6
Traditional Software Cloud Software
Vend
orC
usto
mer
s
Dev Team
Release
6-12 months
UsersOps
UsersOps
UsersOps
UsersOps
Dev + Ops Team
1-2 weeks
UsersOps
UsersOps
UsersOps
UsersOps
6-12 months
Development Process
![Page 7: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/7.jpg)
Why Might Customers Use Cloud Services?
Management built-in: more value than the software bits alone (security, availability, etc)
Elasticity: pay-as-you-go, scale on demand
Better features released faster
![Page 8: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/8.jpg)
Differences in Building Cloud Software
+ Release cycle: send releases to users faster, get feedback faster
+ Only need to maintain 2 software versions (current & next), fewer configs than on-premise
– Upgrading without regressions: critical for users to trust service, as updates are forced
– Building a multitenant service: difficult scaling, security and performance isolation work
![Page 9: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/9.jpg)
How Do These Factors Affect Data Systems?Data systems already had to support many users robustly, but new challenges arise:» Much larger scale: millions of users, VMs, …» Multitenancy: users don’t trust each other, so
need stronger security, perf isolation, etc» Elasticity: how can our system be elastic?» Updatability: avoid regressions & downtime
CS 245 9
![Page 10: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/10.jpg)
Outline
What is the cloud and what’s different with it?
S3 & Dynamo: object stores
Aurora: transactional DBMS
BigQuery: analytical DBMS
Delta Lake: ACID over object stores
CS 245 10
![Page 11: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/11.jpg)
S3, Dynamo & Object Stores
Goal: I just want to store some bytes reliably and cheaply for a long time period
Interface: key-value stores» Objects have a key (e.g. bucket/imgs/1.jpg)
and value (arbitrary bytes)» Values can be up to a few TB in size» Can only do operations on 1 key atomically
Consistency: eventual consistency
CS 245 11Store trillions of objects and exabytes of data
![Page 12: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/12.jpg)
Example: S3 API
PUT(key, value): write object with a key» Atomic update: replaces the whole object
GET(key, [range]): return object with a key» Can optionally read a byte range in the
object
LIST([startKey]): list keys in a bucket in lexicographic order, starting at a given key» Limit of 1000 returned keys per call
CS 245 12
![Page 13: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/13.jpg)
S3 Consistency Model
Eventual consistency: different readers may see different versions of the same object
Read-your-own-writes for new PUT: if you GET a new object that you PUT, you see it» Unless you had previously called GET while
it was missing, in which case you might not!
CS 245 13
![Page 14: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/14.jpg)
Why These Choices?
The primary goal is scale: keep the interface very simple to support trillions of objects» No cross-object operations except LIST!
Mostly target immutable or rarely changing data, so consistency is not as important
Can try to build stronger consistency on top
CS 245 14
![Page 15: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/15.jpg)
Implementing Object Stores
CS 245 15
![Page 16: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/16.jpg)
Goals
CS 245 16
![Page 17: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/17.jpg)
Goals
CS 245 17
![Page 18: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/18.jpg)
Goals
CS 245 18
Obviously different for S3!
![Page 19: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/19.jpg)
Dynamo Implementation
Commodity nodes with local storage on disks
Nodes form a “ring” to split up the key space among them» Actually, each node covers
many ranges (over-partitioning)
Use quorums and gossip to manage updates to each key
CS 245 19
![Page 20: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/20.jpg)
Reads and Writes to Dynamo
Quorums with configurable # of writersand readers required for success» E.g. 3 nodes, write to 2, read from 2» E.g. 3 nodes, write to 2, read from 1
(weaker consistency!)
Nodes gossip & merge updatesin an application-specific way
CS 245
Client 1: Write
Client 2: Read
![Page 21: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/21.jpg)
Usage of Object Stores
Very widely used (probably the largest storage systems in the world)
But the semantics can be complex» E.g. many users try to mount these as file systems
but they’re not the same
CS 245 21
![Page 22: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/22.jpg)
Outline
What is the cloud and what’s different with it?
S3 & Dynamo: object storage
Aurora: transactional DBMS
BigQuery: analytical DBMS
Delta Lake: ACID over object stores
CS 245 22
![Page 23: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/23.jpg)
Amazon Aurora
Goal: I want a transactional DBMS managed by the cloud vendor
Interface: same as MySQL/Postgres» ODBC, JDBC, etc
Consistency: strong consistency (similar to traditional DBMSes)
CS 245 23
Some of the largest & most profitable cloud services
![Page 24: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/24.jpg)
Initial Attempt at DBMS on AWS
Just run an existing DBMS (e.g. MySQL) on cloud VMs, and use replicated disk storage
CS 245 24
Same thing users would do on-premise
MySQL MySQL
Primary Backup
Replicated disk(e.g. EBS)
Apply log to recreate same pages
pages
log
pages
![Page 25: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/25.jpg)
Problems with This Model
Elasticity: doesn’t leverage the elastic nature of the cloud, or give users elasticity
Efficiency: mirroring and disk-level replication is expensive at global scale
CS 245 25
![Page 26: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/26.jpg)
Inefficiency of Mirrored DBMS
CS 245 26
Write amplification: each write at app level results in many writes to physical storage
For Aurora, Amazon wanted “4 out of 6” quorums (3 zones and2 nodes in each zone)
![Page 27: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/27.jpg)
Aurora’s Design
Implement replication at a higher level: only replicate the redo log (not disk blocks)
Enable elastic frontend and backend by decoupling API & storage servers» Lower cost & higher performance per tenant
CS 245 27
![Page 28: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/28.jpg)
Aurora’s Design
CS 245 28
Redo log
![Page 29: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/29.jpg)
Design DetailsLogging uses async quorum: wait until 4 of 6 nodes reply (faster than waiting for all 6)
Each storage node takesthe log and rebuilds theDB pages locally
Care taken to handleincomplete logs dueto async quorums
CS 245 29
![Page 30: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/30.jpg)
Performance
CS 245 30
![Page 31: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/31.jpg)
Other Features from this Design
Rapidly add or remove read replicas
Efficient DB recovery, cloning and rollback (use a prefix of the log and older pages)
Serverless Aurora (only pay when actively running queries)
CS 245 31
![Page 32: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/32.jpg)
CS 245 32
![Page 33: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/33.jpg)
Outline
What is the cloud and what’s different with it?
S3 & Dynamo: object stores
Aurora: transactional DBMS
BigQuery: analytical DBMS
Delta Lake: ACID over object stores
CS 245 33
![Page 34: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/34.jpg)
Google BigQuery
Goal: I want a cheap & fast analytical DBMS managed by the cloud vendor
Interface: SQL, JDBC, ODBC, etc
Consistency: depends on storage chosen (object stores or richer table storage)
CS 245 34
![Page 35: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/35.jpg)
Traditional Data Warehouses
Provision a fixed set of nodes that have both storage and computing» Big servers with lots of disks, etc» Makes sense when buying servers on-premise
Problem: no elasticity!
Interestingly, this was the model chosen by AWS Redshift initially (using ParAccel)
CS 245 35
![Page 36: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/36.jpg)
BigQuery and Other Elastic Analytics SystemsSeparate compute and storage» One set of nodes (or the cloud object store)
stores data, usually over 1000s of nodes» Separate set of nodes handle queries (again,
possibly scaling out to 1000s)
Users pay separately for storage & queries
Get performance of 1000s of servers to run a query, but only pay for a few seconds of use
CS 245 36
![Page 37: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/37.jpg)
Results
These elastic services generally provide better performance and cost for ad-hoc small queries than launching a cluster
For big organizations or long queries, paying per query can be challenging, so these services let you bound total # of nodes
CS 245 37
![Page 38: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/38.jpg)
Outline
What is the cloud and what’s different with it?
S3 & Dynamo: object stores
Aurora: transactional DBMS
BigQuery: analytical DBMS
Delta Lake: ACID over object stores
CS 245 39
![Page 39: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/39.jpg)
Delta Lake Motivation
Object stores are the largest & lowest-cost storage systems, but their semantics make it hard to manage mutable datasets
Goal: analytical table storage over object stores, built as a client library (no other services)
Interface: relational tables with SQL queries
Consistency: serializable ACID transactions
CS 245 40Open source at https://delta.io
![Page 40: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/40.jpg)
Setup
CS 245 41
Job 1
Job 2
Client library
![Page 41: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/41.jpg)
Naïve Way to Use Object Stores for Tables“Just a bunch of objects”: a table is a set of files (maybe partitioned on some fields)
mytable/date=2020-01-01/p1.parquet/p2.parquet
/date=2020-01-02/p1.parquet/p2.parquet/p3.parquet
/date=2020-01-03/p1.parquet...
CS 245 42
Columnar files of recordswith date=2020-01-01
Columnar files of recordswith date=2020-01-02
…
![Page 42: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/42.jpg)
Problems with “Just Objects”
No multi-object transactions» Hard to insert multiple objects at once
(what if your load job crashes partway through?)» Hard to update multiple objects at once
(e.g. delete a user or fix their records)» Hard to change data layout & partitioning
Poor performance» LIST is expensive (only 1000 results/request!)» Can’t do streaming inserts (too many small files)» Expensive to load metadata (e.g. column stats)
CS 245 43
![Page 43: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/43.jpg)
Example Problems
CS 245 44
![Page 44: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/44.jpg)
Example Problems
CS 245 45
![Page 45: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/45.jpg)
Delta Lake’s Approach
Can we implement a transaction log on top of the object store to retain its scale & reliability but provide stronger semantics?
CS 245 46
![Page 46: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/46.jpg)
Inspiration: Bolt-On Consistency
CS 245 47
![Page 47: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/47.jpg)
Delta Lake Implementation
Table = directory of data objects, with a set of log objects stored in _delta_log subdir» Log specifies which data objects are part of
the table at a given version
One log object for each write transaction, in order: 000001.json, 000002.json, etc
Periodic checkpoints of the log in Parquet format contain object list + column statistics
CS 245 48
![Page 48: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/48.jpg)
Delta Table Example
CS 245 49
mytable/date=2020-01-01/1b8a32d2ad.parquet/a2dc5244f7.parquet/f52312dfae.parquet/ba68f6bd4f.parquet
/_delta_log/00001.json/00002.json/00003.json/00003.parquet/00004.json/00005.json/_last_checkpoint
Data objects(partitioned
by date field)
Log recordsand checkpoints
Contains {version: “00003”}Coalesces logrecords 1–3Transaction’s operations, e.g.,
add date=2020-01-01/a2dc5244f7f7.parquetadd date=2020-01-02/ba68f6bd4f1e.parquet
![Page 49: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/49.jpg)
Log Record Types
Add data object + its column statistics
Remove data object
Change metadata, e.g. table schema or Delta Lake format version
A few others for streaming writes (allows treating a table like a message bus)
CS 245 50
![Page 50: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/50.jpg)
Writing to Delta Lake
1) Add new objects in the data directories; readers will ignore them because the log has no add entries for them
2) Try to add a new log record with the next valid log record number (e.g. 00006.json)» Various ways to make this atomic per cloud
3) Optional: write a new Parquet checkpoint
CS 245 51
What if one of these steps fails?
![Page 51: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/51.jpg)
What Kind of Concurrency Approach is This?
Optimistic! Even simpler than validation
Also MVCC: keep old data versions around
Why is this okay for Delta Lake’s workloads?
CS 245 52
![Page 52: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/52.jpg)
Reading from Delta Lake
1) Read the _last_checkpoint object to find a checkpoint number
2) Read that Parquet file, and use LIST to find any newer .json log records after it
3) Determine which objects are “add”ed but not “remove”d from those logs and read those» Use column min/max stats to prune data
CS 245 53
What if one step sees old versions of that data?
![Page 53: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/53.jpg)
Isolation Levels
Transactions with writes are serializable: one serial order, given by log record numbers
Read transactions can get either snapshot isolation (read older version) or serializability (by adding a dummy write)
CS 245 54
Takeaway: by using atomic operations on just one object at a time (last log record key), we
got ACID transactions for a whole table!
![Page 54: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/54.jpg)
Impact on Performance
Reading the list of object names from a Parquet file much faster than making many LIST operations
Reading column stats from this file is also faster than range GETs on each object
CS 245 55
0.1
1
10
100
1000
10000
100000
1000 10K 100K 1M
Tim
e (s
econ
ds, l
og s
cale
)
Number of Partitions
Apache Spark on Delta (no cache)Apache Spark on Delta (cache)Apache Spark on ParquetApache Hive on ParquetPresto on Parquet
![Page 55: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/55.jpg)
Other Features from this Design
Caching data & log objects on workers is safe because they are immutable
Time travel: can query or restore an old version of the table while those objects are retained
Background optimization: compact small writes or change data ordering (e.g. Z-order) without affecting concurrent readers
Audit logging: who wrote to the table?
CS 245 56
![Page 56: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/56.jpg)
Applications & Impact
Delta Lake now manages exabytes of data (>60% of Databricks’ workload in 3 years)!
Reduced support escalations relating to cloud storage from ~50% to nearly none
Largest single tables hold exabytes of data across billions of data objects
CS 245 57
![Page 57: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/57.jpg)
![Page 58: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/58.jpg)
Other “Bolt-On” Systems
Apache Hudi (at Uber) and Iceberg (at Netflix) also offer table storage on S3
Google BigTable was built over GFS
Filesystems that use S3 as a block store(e.g. early Hadoop s3:/, Goofys, MooseFS)
CS 245 60
![Page 59: Data Systems for the Cloud - Stanford Universityweb.stanford.edu/class/cs245/slides/16-Cloud-Systems.pdf · 2021. 3. 4. · Computing as a service, managed by an external party »Software](https://reader035.vdocuments.net/reader035/viewer/2022070221/613ae7a5f8f21c0c8268b3f8/html5/thumbnails/59.jpg)
Conclusion
Cloud computing requires changes in data management systems» Elasticity with separate compute & storage» Very large scale» Multitenancy: security, performance isolation» Updating without regressions
Can design and analyze these systems using the ideas we saw!
CS 245 61