Scaling & Sharding PostgreSQL: Principles and Practice
Jason Petersen
Software Developer, Citus Data
Copyright © 2015 Citus Data, Inc. 1
This talk
What we talk about when we talk about sharding:
Horizontal partitioning
Horizontal partitioning […] involves putting different rows into different tables.
— Wikipedia, “Shard (database architecture)”
Sharding goes beyond this: […] it does this across potentially multiple instances of the schema.
— Also Wikipedia
Putting our foot down
Sharding is a form of horizontal partitioning which distributes database rows across totally separate physical database servers.
A form of horizontal partitioning which distributes rows across
totally separate physical database servers.
— Citus Data
What is Citus Data?
(Pronounced like “Midas”)
(We make CitusDB)
What is CitusDB?
— Scalable analytics DB
— Extends PostgreSQL
— Brings distributed query logic
— Supports all types, extensions
— Does it all using sharding
You may be thinking…
Node #1 (PostgreSQL): click_events_2012
Node #2: click_events_2013
Node #3: click_events_2014
How does that scale?
Not very well…
Node #1: click_events_2012 (4 TB)
Node #2: click_events_2013 (4 TB)
Node #3: click_events_2014 (4 TB)
Node #4 (new): takes 1 TB (each) from the others
What about load characteristics?
Not great, either…
Node #1: click_events_2012
Node #2: click_events_2013
Node #3: click_events_2014
Node #4: click_events_2012
Node #5: click_events_2013
Node #6: click_events_2014
So what to do?
… when initially implementing sharding you’ll want to create an
arbitrary number of logical shards.
— Craig Kerstiens, “Sharding Your Database”
“Logical”?
[the] system consists of several thousand ‘logical’ shards that are
mapped in code to far fewer physical shards…
— “Sharding & IDs at Instagram”
… we can start with just a few database servers, and eventually move to many more…
— “Sharding & IDs at Instagram”
A better approach
Node #1 (PostgreSQL): shards 1, 3, 4, 6, 7, 9, …
Node #2: shards 1, 2, 4, 5, 7, 8, …
Node #3: shards 2, 3, 5, 6, 8, 9, …
Easier growth…
Node #1 (PostgreSQL): shards 1, 3, 4, 6, 7, 9, …
Node #2: shards 1, 2, 4, 5, 7, 8, …
Node #3: shards 2, 3, 5, 6, 8, 9, …
Node #4 (new): takes shards of 512 MB (each)
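A minimal Python sketch of why small logical shards make growth easier. It is not pg_shard or CitusDB code: the `rebalance` function, its greedy strategy, and the node names are assumptions made for illustration. Only the shard layout and the 512 MB shard size come from the slides.

```python
SHARD_SIZE_MB = 512  # per-shard size shown on the slide

def rebalance(placements, new_node):
    """Greedily move logical shards from the fullest nodes onto new_node.

    placements maps node name -> set of logical shard ids stored there.
    Returns the list of (shard_id, source_node) moves that were made.
    """
    placements[new_node] = set()
    moves = []
    while True:
        donor = max((n for n in placements if n != new_node),
                    key=lambda n: (len(placements[n]), n))
        # Stop once another move would no longer even things out.
        if len(placements[donor]) - len(placements[new_node]) < 2:
            break
        # Never put two replicas of the same shard on one node.
        candidates = sorted(placements[donor] - placements[new_node])
        if not candidates:
            break
        shard = candidates[0]
        placements[donor].remove(shard)
        placements[new_node].add(shard)
        moves.append((shard, donor))
    return moves

placements = {
    "node1": {1, 3, 4, 6, 7, 9},
    "node2": {1, 2, 4, 5, 7, 8},
    "node3": {2, 3, 5, 6, 8, 9},
}
moves = rebalance(placements, "node4")
moved_mb = len(moves) * SHARD_SIZE_MB
```

Each move copies one small table instead of carving a slice out of a multi-terabyte yearly table, so the new node can be filled incrementally.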
Graceful failure…
Node #1 (PostgreSQL): shards 1, 6, 7, …
Node #2: shards 1, 2, 7, …
Node #3: shards 2, 3, 8, …
Node #4: shards 3, 4, 8, …
Node #5: shards 4, 5, 9, …
Node #6: shards 5, 6, 9, …
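A small Python sketch of what this layout buys when a node fails. It models only the shard ids visible on the slide (the elided "…" entries are left out), and the `surviving_replicas` helper is a hypothetical name, not part of any Citus Data product.

```python
PLACEMENTS = {
    "node1": {1, 6, 7},
    "node2": {1, 2, 7},
    "node3": {2, 3, 8},
    "node4": {3, 4, 8},
    "node5": {4, 5, 9},
    "node6": {5, 6, 9},
}

def surviving_replicas(placements, failed_node):
    """Map each shard on failed_node to the other nodes that still hold it."""
    return {
        shard: sorted(node for node, shards in placements.items()
                      if node != failed_node and shard in shards)
        for shard in sorted(placements[failed_node])
    }

takeover = surviving_replicas(PLACEMENTS, "node1")
```

With whole-node mirroring, one replica inherits all of a failed node's traffic. Here the shards of the failed node are covered by more than one survivor (`node2` and `node6` in this example).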
Logical!
Logical shard benefits
— Enables rebalancing
— Better failure modes
— More granular migrations
— Performance benefits
But wait!
Sharding concerns
— Operations burden
— Network resiliency
— ACID tradeoffs?
— No return
(Sharding should be your last resort)
Pyramids!
Self-Actualization
Esteem
Love/Belonging
Safety
Physiological
Sharding!
Split Load
Hardware and Tuning
Database Schema
Application Code
Always use SCIENCE
Getting to Scale
Principles
1. Generate realistic load
2. Measure, measure, measure…
3. Change one thing
4. Determine the impact
5. GOTO the first step
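The loop above can be sketched as a tiny Python harness. This is an illustrative assumption, not a tool named in the talk: `measure` and `compare` are hypothetical helpers, and the two workloads are stand-ins for "before" and "after" a single change.

```python
import statistics
import time

def measure(workload, runs=5):
    """Return the median wall-clock seconds over several runs of workload()."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def compare(baseline, candidate, runs=5):
    """Measure both variants the same way; return (base, cand, speedup)."""
    base = measure(baseline, runs)
    cand = measure(candidate, runs)
    return base, cand, base / cand

# Stand-in workloads: the same computation, with exactly one thing changed.
def before():
    total = 0
    for i in range(50_000):
        total += i * i
    return total

def after():
    return sum(i * i for i in range(50_000))

base_s, cand_s, speedup = compare(before, after)
```

Taking the median of repeated runs guards against a single noisy sample, and changing only one thing per comparison keeps the measured impact attributable.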
Getting to Scale
Determining Workload
— pg_stat_statements
— pgBadger
— PoWA
— New Relic
Getting to Scale
Generating Load
— pgbench
— apachebench
— jmeter
— Fill up your queue!
Getting to Scale
Measuring
— Long runtimes
— Eliminate hidden unknowns
— pgbench-tools
— time
Change and Compare
Sharding!
Split Load
Hardware and Tuning
Database Schema
Application Code
Getting to sharding…
Optimize application logic
Add caching. Use connection pools. Bundle writes and issue them in batches. Use JOINs judiciously. Dig beneath your ORM.
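As one illustration of "bundle writes and issue them in batches", here is a minimal Python sketch. The `WriteBatcher` class is a hypothetical helper, and `flush_fn` stands in for whatever actually sends rows to the database (for example a multi-row INSERT or COPY); no database driver is assumed.

```python
class WriteBatcher:
    """Buffer rows and hand them to flush_fn in groups of batch_size."""

    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.pending = []

    def add(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered, then start a fresh buffer.
        if self.pending:
            self.flush_fn(self.pending)
            self.pending = []

# Usage: record each batch instead of talking to a real database.
batches = []
batcher = WriteBatcher(batches.append, batch_size=3)
for event_id in range(7):
    batcher.add((event_id, "click"))
batcher.flush()  # do not forget the final partial batch
```

Seven rows turn into three round trips instead of seven, which is the point of batching.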
Getting to sharding…
✓ Optimize application logic
Tweak schemas
Denormalize where necessary. Add indexes to all commonly used columns. Locally partition tables if it makes sense. Move hot columns to separate tables.
Getting to sharding…
✓ Optimize application logic
✓ Tweak schemas
Upgrade and tune
Benchmark your system. Determine resource bottlenecks. Upgrade. Tune postgresql.conf to within an inch of its life. Do the same [1] for your OS.
[1] Check out Brendan Gregg’s USE Method
Getting to sharding…
✓ Optimize application logic
✓ Tweak schemas
✓ Upgrade and tune
Try replication
Use a read replica. Use read replicas for every distinct workload (to avoid background jobs evicting your app’s working set from the DB cache).
Getting to sharding…
✓ Optimize application logic
✓ Tweak schemas
✓ Upgrade and tune
✓ Try replication
Split writes
Modularize concerns within your app to isolate write-heavy tables to their own database.
You’ve already…
✓ Optimized application logic
✓ Tweaked schemas
✓ Upgraded and tuned
✓ Tried replication
✓ Split writes
When you’re on the best hardware with a tuned OS, optimized queries, and servers devoted to each workload, and you’re still worried about scaling?
You’re ready to shard.
Our dream extension
— Creates and manages shards
— Uses regular SQL commands
— Supports replicas/failover
— Integrated with CitusDB
pg_shard
pg_shard
Motivation
— Real-time ingest for CitusDB
— Customers building their own
— Could be NoSQL alternative
pg_shard
User needs
— Dynamic rebalancing/scaling
— “Automagic” failure handling
— Transactions not so important
Good News, Everyone!
Upcoming Developments
— Streamlining offerings
— CitusDB soon open-source
— Extension, not standalone
— Real-time modifications
— Contact us for early access
Sharding principles
Principles of sharding
— Need to know where to put rows
— And where to find stored ones
— Designate a dimension of data as key
— In relational databases: a column
— Logical shard covers range of values
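These principles can be sketched in a few lines of Python. The ranges, shard ids, and the `find_shard` name below are made up for illustration; the only idea taken from the slide is that each logical shard covers a range of key values, so one lookup answers both "where do I put this row?" and "where do I find it?".

```python
import bisect

# (min_value, max_value, shard_id), inclusive ranges sorted by min_value.
SHARD_RANGES = [
    (0, 249, 1),
    (250, 499, 2),
    (500, 749, 3),
    (750, 999, 4),
]
_MINS = [lo for lo, _, _ in SHARD_RANGES]

def find_shard(key):
    """Return the id of the logical shard whose range contains key."""
    idx = bisect.bisect_right(_MINS, key) - 1
    if idx < 0:
        raise KeyError(f"no shard covers key {key}")
    lo, hi, shard_id = SHARD_RANGES[idx]
    if key > hi:
        raise KeyError(f"no shard covers key {key}")
    return shard_id
```

The same function serves writes and reads, which is why the key has to be available in both the INSERT and the later query.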
Visualized
MongoDB uses logical shards, but calls them “chunks”. Weird, but they made a decent diagram [2] of the concept:
[2] From the MongoDB Manual, “Shard Keys”
Shard key refinements
— Pass into hash function (smooths out distribution)
— Use contiguous range
— Specify a list of columns
— Generalize to any expression
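A hedged Python sketch of the first three refinements: hash the key, split the hash space into contiguous ranges, and allow a list of columns. SHA-256 is used only as a convenient stand-in; it is not the hash function PostgreSQL or pg_shard uses, and `hash_key` and `shard_for` are hypothetical names.

```python
import hashlib
from collections import Counter

def hash_key(*columns):
    """Hash one or more column values to a signed 32-bit integer."""
    raw = "\x1f".join(str(c) for c in columns).encode("utf-8")
    unsigned = int.from_bytes(hashlib.sha256(raw).digest()[:4], "big")
    return unsigned - 2**31  # shift [0, 2**32) down to [-2**31, 2**31)

def shard_for(hash_value, shard_count):
    """Pick one of shard_count contiguous ranges of the 32-bit hash space."""
    return (hash_value + 2**31) * shard_count // 2**32

# Ever-increasing ids would all land in the newest range if used directly;
# hashed first, they spread across every shard.
counts = Counter(shard_for(hash_key(cid), 4) for cid in range(10_000))

# A list of columns works the same way.
composite = shard_for(hash_key("acme", "2015-03"), 4)
```

Hashing trades away range scans on the raw key in exchange for a smoother write distribution, which is the trade-off the first bullet hints at.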
Choosing a key
The field you choose as your hashed shard key should have a good cardinality.
— MongoDB Manual, “Shard Keys”
… the correct shard key can have a great impact on […] performance [and] capability…
— ibid., “Considerations for Selecting Shard Keys”
Choosing a key
— What is most important to your application?
— Spreading incoming writes
— Targeting reads to reduce latency
— Consider a key that appears frequently in WHERE clauses
— Use a hybrid approach when it makes sense (shard on customer, partition on time)
— Mind the “hot spots”
Costs of poor choice
— Cross-shard scans hurt performance
— Low cardinality limits ultimate scalability
— Switching keys after distribution burdensome
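The cardinality point can be checked with a short Python experiment. The column values and the `shard_of` / `shards_used` helpers are invented for illustration, and SHA-256 again stands in for a real database hash function.

```python
import hashlib

def shard_of(key, shard_count):
    """Assign a key to one of shard_count shards by hashing it."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % shard_count

def shards_used(keys, shard_count):
    """Count how many distinct shards actually receive rows."""
    return len({shard_of(k, shard_count) for k in keys})

plan_rows = ["free", "pro", "enterprise"] * 1000  # 3 distinct key values
user_rows = range(10_000)                         # 10,000 distinct key values

low = shards_used(plan_rows, 64)   # can never exceed 3
high = shards_used(user_rows, 64)
```

However many shards or servers are added, a key with three distinct values can occupy at most three of them, so the remaining capacity sits idle.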
So how does this thing work?
pg_shard
Installation
— Build from GitHub source
— pgxnclient install pg_shard
— sudo yum install pg_shard_94
— CloudFormation templates [3]
[3] Available on the Citus Data blog
pg_shard
Master Node (PostgreSQL + pg_shard): shard and shard placement metadata
Worker Node #1: shards 1, 3, 4, 6, 7, 9, …
Worker Node #2: shards 1, 2, 4, 5, 7, 8, …
Worker Node #3: shards 2, 3, 5, 6, 8, 9, …
pg_shard
Master node
— Holds authoritative shard state
— One metadata row per:
— Sharded table
— Shard
— Placement
— Just regular tables
pg_shard
Master failure
Options, in order of increasing acceptable downtime…
1. Use streaming replication and failover
2. Use EBS volume for data directory
3. Restore from pg_dump, etc.
4. Reconstruct from workers
pg_shard
Metadata structure
postgres=# SELECT * FROM pgs_distribution_metadata.shard;
  id   | relation_id | storage |  min_value  |  max_value
-------+-------------+---------+-------------+-------------
 10004 |      177880 | t       | -2147483648 | -1879048194
 10005 |      177880 | t       | -1879048193 | -1610612739
 10006 |      177880 | t       | -1610612738 | -1342177284
 10007 |      177880 | t       | -1342177283 | -1073741829
 10008 |      177880 | t       | -1073741828 | -805306374
 10009 |      177880 | t       | -805306373  | -536870919
 ...   | ...         | ...     | ...         | ...
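The min_value and max_value columns above look like an even split of the signed 32-bit hash space. The Python sketch below reproduces the listed ranges under two assumptions that the slide does not state: the table has 16 shards (inferred from the range width), and the last shard is stretched to cover the remainder of the space. `shard_ranges` is a hypothetical helper, not a pg_shard function.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def shard_ranges(shard_count):
    """Split the signed 32-bit hash space into contiguous inclusive ranges."""
    # Integer increment that matches the range width shown on the slide.
    increment = (2**32 - 1) // shard_count
    ranges = []
    for index in range(shard_count):
        min_value = INT32_MIN + index * increment
        max_value = min_value + increment - 1
        if index == shard_count - 1:
            # Assumption: the final shard absorbs the leftover hash values.
            max_value = INT32_MAX
        ranges.append((min_value, max_value))
    return ranges


for min_value, max_value in shard_ranges(16)[:3]:
    print(min_value, max_value)
```

Because the ranges are contiguous and inclusive, every hash value maps to exactly one shard, which is what makes a simple BETWEEN lookup on this table sufficient.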
pg_shard
Worker nodes
— Logical shards are placed on nodes
— Each placement is one PostgreSQL table
— Object names extended by shard identifier (e.g. click_events_1001 for shard 1001)
— Indexes, constraints propagated at creation
pg_shard
Worker failure
— Unreachable nodes marked as inactive
— Repair with master_copy_shard_placement
1. Replay DDL commands for table, objects
2. Copy data from healthy node
3. Update master metadata
pg_shard
First steps…
pg_shard
Distributing a table
-- create regular table and some indexes
CREATE TABLE users (
    id integer NOT NULL,
    name text NOT NULL,
    birthday date NOT NULL,
    CONSTRAINT name_present CHECK (btrim(name) != '')
);

CREATE INDEX id_idx ON users (id);
CREATE INDEX bday_idx ON users (birthday);
CREATE INDEX name_idx ON users (name);
CREATE INDEX pfx_idx ON users (lower(name) text_pattern_ops);
pg_shard
Distributing a table
CREATE EXTENSION IF NOT EXISTS pg_shard;
-- designate table as distributed; specify key
SELECT master_create_distributed_table('users', 'id');

-- create sixteen shards, each with two copies
SELECT master_create_worker_shards('users', 16, 2);
pg_shard
Just use SQL!
INSERT INTO users VALUES (1, 'Jason Petersen', '2015-03-23');
INSERT INTO users VALUES (2, 'Ozgun Erdogan', '2013-02-11');
-- rejected: birthday is declared NOT NULL
INSERT INTO users VALUES (3, 'Ageless', NULL);
-- rejected: violates the name_present CHECK constraint
INSERT INTO users VALUES (4, ' ', '2010-08-17');

DELETE FROM users WHERE id = 2;

UPDATE users SET birthday = '1900-06-01' WHERE id = 1;

SELECT name FROM users WHERE id = 1;
SELECT max(birthday) FROM users;
Under the hood
pg_shard
PostgreSQL hooks
— Full control over command lifecycle
— Specific hooks for specific needs:
— Planning
— Execution (Start, Run, Finish, End)
— Utility
pg_shard
Planning phase
— Determine whether distributed
— Fall through to PostgreSQL if not (enables regular tables on master!)
— Find involved shards based on shard key
— Deparse query to shard-specific SQL
pg_shard
Planning example
Starting with the input SQL…
INSERT INTO users VALUES (5, 'Tom Lane', '2005-07-08');
pg_shard
Planning example
… determine the partition key clauses…
(id = 5)
pg_shard
Planning example
… use them to find the proper shard…
SELECT id FROM pgs_distribution_metadata.shard
WHERE hashint4(5) BETWEEN min_value::integer AND max_value::integer
  AND relation_id = 'users'::regclass;

  id
-------
 10003
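The lookup above amounts to finding the one inclusive range that contains the hashed key. A minimal Python sketch of that idea follows, using a binary search over sorted range minimums. Everything here is illustrative: `stand_in_hash` is a CRC-based placeholder and does not reproduce PostgreSQL's `hashint4`, and the four-shard table and its ids are hypothetical.

```python
import bisect
import zlib


def stand_in_hash(key):
    """Map an int key to a signed 32-bit value. Stand-in only, not hashint4."""
    unsigned = zlib.crc32(key.to_bytes(4, "big", signed=True))
    return unsigned - 2**32 if unsigned >= 2**31 else unsigned


def find_shard(shards, hashed):
    """Return the id of the shard whose inclusive range contains `hashed`.

    `shards` is a list of (shard_id, min_value, max_value) sorted by min_value.
    """
    mins = [min_value for _, min_value, _ in shards]
    position = bisect.bisect_right(mins, hashed) - 1
    if position < 0:
        raise LookupError(f"no shard covers hash {hashed}")
    shard_id, min_value, max_value = shards[position]
    if not min_value <= hashed <= max_value:
        raise LookupError(f"no shard covers hash {hashed}")
    return shard_id


# Hypothetical four-shard table covering the whole signed 32-bit space.
SHARDS = [
    (10000, -2**31, -2**30 - 1),
    (10001, -2**30, -1),
    (10002, 0, 2**30 - 1),
    (10003, 2**30, 2**31 - 1),
]

print(find_shard(SHARDS, stand_in_hash(5)))
```

The printed shard id depends on the placeholder hash, so it will not match the 10003 shown on the slide.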
pg_shard
Planning example
… generate shard-specific SQL…
INSERT INTO users_10003 VALUES (5, 'Tom Lane', '2005-07-08');
… and send it to the shard’s placements.
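The rewrite step can be pictured as substituting the shard-extended table name into the statement. The Python sketch below uses plain string templating as a stand-in; per the planning slide, pg_shard deparses the query itself, so this is not its implementation. `render_literal` and `shard_specific_insert` are hypothetical helpers.

```python
def render_literal(value):
    """Render a Python value as a SQL literal (ints, floats, strings, None)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any embedded single quotes, as SQL string literals require.
    return "'" + str(value).replace("'", "''") + "'"


def shard_specific_insert(table, shard_id, values):
    """Build an INSERT that targets the shard-extended table name."""
    shard_table = f"{table}_{shard_id}"
    rendered = ", ".join(render_literal(value) for value in values)
    return f"INSERT INTO {shard_table} VALUES ({rendered});"


print(shard_specific_insert("users", 10003, (5, "Tom Lane", "2005-07-08")))
```

The same statement text can then be sent unchanged to every placement of shard 10003, since each placement uses the same shard-extended table name.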
pg_shard
Execution
Now we know what the SQL is and where it should be routed. Execution logic differs depending on whether the query is a SELECT or a modification.
pg_shard
Distributed modification
— Locks enforce safe commutation
— Replicas visited in predictable order
— Per-session libpq connection pool
— If replica errors out, mark as inactive
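The bullets above can be sketched as a loop over a shard's placements. This Python model is a simplification under stated assumptions: placements are in-memory objects rather than libpq connections, locking is omitted, and the rule that nothing is marked inactive when every placement fails is an assumption, not something the slide states. `Placement` and `distributed_insert` are hypothetical names.

```python
class Placement:
    """In-memory stand-in for one copy of a shard on a worker node."""

    def __init__(self, node, healthy=True):
        self.node = node
        self.healthy = healthy
        self.state = "active"
        self.rows = []

    def execute(self, row):
        if not self.healthy:
            raise ConnectionError(f"{self.node} unreachable")
        self.rows.append(row)


def distributed_insert(placements, row):
    """Apply `row` to each active placement in a fixed order.

    Placements that error out are marked inactive, but only if at least
    one other placement accepted the write (assumption).
    """
    succeeded, failed = [], []
    # Sorting by node name gives every session the same visiting order.
    for placement in sorted(placements, key=lambda p: p.node):
        if placement.state != "active":
            continue
        try:
            placement.execute(row)
            succeeded.append(placement)
        except ConnectionError:
            failed.append(placement)
    if not succeeded:
        raise RuntimeError("no placement accepted the write")
    for placement in failed:
        placement.state = "inactive"
    return len(succeeded)


node1 = Placement("node1")
node3 = Placement("node3", healthy=False)
print(distributed_insert([node3, node1], {"id": 6}), node3.state)
```

Skipping inactive placements on later writes is what keeps a stale copy from silently diverging further until it is repaired.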
[Diagram: Single-shard INSERT, replication factor: 2. Shows the master, the statement INSERT INTO customer_reviews ..., and three worker nodes holding shards (1, 3, 4, 6, 7, 9, …), (1, 2, 4, 5, 7, 8, …), and (2, 3, 5, 6, 8, 9, …).]
[Diagram: Single-shard INSERT, one replica fails. Shows the master, the statement INSERT INTO customer_reviews ..., and the three worker nodes.]
[Diagram: Single-shard INSERT, master marks inactive. For the statement INSERT INTO customer_reviews ..., the master sets shard 6, node 3 to inactive status.]
pg_shard
Modification semantics
— Consistent (read your own writes)
— Safety comes from commutativity rules
— Can reorder SELECTs and INSERTs
— Not so for UPDATEs and DELETEs
— Constraints require predictable order
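To see why ordering matters for some statements and not others, consider two replicas that receive the same operations in different orders. The Python sketch below is a toy model: a table is a dict keyed by id, and `apply_ops` is a hypothetical helper, not part of pg_shard.

```python
def apply_ops(ops, table=None):
    """Apply (kind, key, value) operations to a copy of `table` in order."""
    state = dict(table or {})
    for kind, key, value in ops:
        if kind == "insert":
            state[key] = value
        elif kind == "update" and key in state:
            state[key] = value
    return state


# INSERTs of distinct rows commute: both orders give the same table.
inserts = [("insert", 1, "a"), ("insert", 2, "b")]
inserts_commute = apply_ops(inserts) == apply_ops(reversed(inserts))

# UPDATEs to the same row do not: the last writer wins on each replica.
updates = [("update", 1, "x"), ("update", 1, "y")]
start = {1: "a"}
updates_commute = apply_ops(updates, start) == apply_ops(reversed(updates), start)

print(inserts_commute, updates_commute)
```

If two replicas applied the UPDATEs in opposite orders they would end up holding different values for the same row, which is why such statements need a predictable order across replicas.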
pg_shard
Targeted SELECT
— Fetch entire result from single shard
— Fail over to another replica on error
— Do not modify state if failure occurs
— Common key-value access pattern
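The read path described above can be sketched as trying placements in turn and returning the first result. In this Python model, placements are plain callables standing in for worker connections, and `targeted_select` is a hypothetical name. Unlike the write path, a failure here changes no placement state.

```python
def targeted_select(placements, query):
    """Return the first successful result; never modifies placement state."""
    last_error = None
    for run_on_placement in placements:
        try:
            return run_on_placement(query)
        except ConnectionError as error:
            # Remember the failure and move on to the next replica.
            last_error = error
    raise RuntimeError("all placements failed") from last_error


def unreachable_worker(query):
    raise ConnectionError("worker unreachable")


def healthy_worker(query):
    return [("HN892", "example row")]


query = "SELECT * FROM customer_reviews WHERE customer_id = 'HN892';"
rows = targeted_select([unreachable_worker, healthy_worker], query)
print(rows)
```

Leaving state untouched on a failed read is a reasonable choice because a read cannot leave replicas inconsistent, so a transient error need not take a placement out of service.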
[Diagram: Targeted SELECT, try first placement. Shows the master, the three worker nodes, and the statement SELECT * FROM customer_reviews WHERE customer_id = 'HN892';]
[Diagram: Targeted SELECT, encounter error. Shows the master, the three worker nodes, and the statement SELECT * FROM customer_reviews WHERE customer_id = 'HN892';]
[Diagram: Targeted SELECT, try next placement. Shows the master, the three worker nodes, and the statement SELECT * FROM customer_reviews WHERE customer_id = 'HN892';]
pg_shard
Limitations
— Transactions cannot…
— involve multiple shards
— span multiple statements
— Cross-shard constraints unenforced
What are people building?
pg_shard’s capabilities and limitations are similar to those of many popular NoSQL solutions.
What are people building?
pg_shard in Production
— Clickstream event data
— HyperLogLog [4] for scalable UNIQUEs
— 30,000 INSERTs/second ingest
— Around 200GB data already
— CitusDB SELECTs: 100x faster
[4] “HyperLogLog data structures as a native [PostgreSQL] data type”
Upcoming features?
— More SQL coverage
— Rebalancing
— Multi-master
— Auto-recovery
— INSERT streaming/pipelining
— Suggestions welcome
Scaling summary
— Explore these avenues first!
— Many little experiments
— Cross-cutting; whole-stack
— Get out every ounce
Sharding summary
— Shard once you rule out all else
— Use many small “logical shards”
— Think carefully when picking key
— pg_shard/CitusDB merging!
pg_shard summary
— Open source sharding for PostgreSQL
— First-class PostgreSQL extension
— LOAD, CREATE TABLE, distribute
— https://github.com/citusdata/pg_shard
Contact
— Jason: [email protected]
— General: [email protected]
Questions