Parallel Database Systems
A SNAP Application
Bell & Gray 4/15/95
Gordon Bell
450 Old Oak Court, Los Altos, CA 94022
GBell@Microsoft.com
Jim Gray
310 Filbert, SF CA [email protected]
Network Platform

Outline
• Cyberspace Pep Talk:
  – Databases are the dirt of Cyberspace
  – Billions of clients mean millions of servers
• Parallel Imperative:
  – Hardware trend: Many little devices
  – Consequence: Servers are arrays of commodity components
  – PCs are the bricks of Cyberspace
  – Must automate parallel {design / operation / use}
  – Software parallelism via dataflow & Data Partitioning
• Parallel database techniques
  – Parallel execution of many little jobs (OLTP)
  – Data Partitioning
  – Pipeline Execution
  – Automation techniques
• Summary

Kinds Of Information Processing

| | Point-to-Point | Broadcast | |
|---|---|---|---|
| Immediate | conversation, money | lecture, concert | Network |
| Time Shifted | mail | book, newspaper | DataBase |

It's ALL going electronic
Immediate is being stored for analysis (so ALL database)
Analysis & Automatic Processing are being added

Why Put Everything in Cyberspace?
Low rent: min $/byte
Shrinks time: now or later
Shrinks space: here or there
Automate processing: knowbots
[Figure: Point-to-Point OR Broadcast, Immediate OR Time Delayed; Network and DataBase; Locate, Process, Analyze, Summarize]

Databases Store ALL Data Types
• The New World:
  – Billions of objects
  – Big objects (1MB)
  – Objects have behavior (methods)
• The Old World:
  – Millions of objects
  – 100-byte objects
[Figure: two People tables, one with columns Name and Address, the other with columns Name, Address, Papers, Picture, Voice. Sample names: David, Mike, Won; sample addresses: NY, Berk, Austin]
Paperless office
Library of Congress online
All information online: entertainment, publishing, business
Information Network, Knowledge Navigator, Information at your fingertips

Magnetic Storage Cheaper than Paper
• File Cabinet:
  – cabinet (4 drawer): 250$
  – paper (24,000 sheets): 250$
  – space (2x3 @ 10$/ft2): 180$
  – total: 700$
  – 3 ¢/sheet
• Disk:
  – disk (8 GB): 4,000$
  – ASCII: 4 m pages
  – 0.1 ¢/sheet (30x cheaper)
• Image:
  – 200 k pages
  – 2 ¢/sheet (similar to paper)
• Store everything on disk
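
The per-sheet costs on this slide follow from dividing each total by the number of sheets it holds. The short sketch below redoes that arithmetic with the slide's own figures; the constant and function names are illustrative and not part of the original deck.

```python
# Figures are copied from the slide; the names below are illustrative only.
CABINET_DOLLARS = 700        # cabinet + paper + floor space, as totalled on the slide
CABINET_SHEETS = 24_000
DISK_DOLLARS = 4_000         # one 8 GB disk
ASCII_PAGES = 4_000_000      # "4 m pages" of ASCII text
IMAGE_PAGES = 200_000        # "200 k pages" of page images


def cents_per_sheet(dollars, sheets):
    """Storage cost of one sheet, in cents."""
    return 100.0 * dollars / sheets


paper = cents_per_sheet(CABINET_DOLLARS, CABINET_SHEETS)
disk_ascii = cents_per_sheet(DISK_DOLLARS, ASCII_PAGES)
disk_image = cents_per_sheet(DISK_DOLLARS, IMAGE_PAGES)

print(f"paper: {paper:.1f} cents/sheet")
print(f"disk, ASCII: {disk_ascii:.1f} cents/sheet ({paper / disk_ascii:.0f}x cheaper)")
print(f"disk, image: {disk_image:.1f} cents/sheet")
```

The paper figure comes out just under 3 cents, so the slide's "30x cheaper" is a rounded ratio.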

Cyberspace Demographics
• Computer History:
  – 1950 National Computer
  – 1960 Corporate Computer
  – 1970 Site Computer
  – 1980 Departmental Computer
  – 1990 Personal Computer
  – 2000 ?
• most computers are small
  – NEXT: 1 Billion X for some X (phone?)
• most of the money is in clients and wiring
  – 1990: 50% desktop
  – 1995: 75% desktop
[Figure: population (1K, 1M, 100M) and revenue (1B$, 10B$, 100B$) by computer class: SUPER, MAINFRAME, MINI, WS, PC]

Billions of Clients
• Every device will be “intelligent”
  – Doors, rooms, cars, ...
• Computing will be ubiquitous

Billions of Clients Need Millions of Servers
All clients are networked to servers
  – may be nomadic or on-demand
Fast clients want faster servers
Servers provide data, control, coordination, communication
Super Servers: Large Databases, High Traffic shared data
[Figure: Clients (mobile clients, fixed clients) connected to Servers (server, super server)]

Outline
• Cyberspace Pep Talk:
  – Databases are the dirt of Cyberspace
  – Billions of clients mean millions of servers
• Parallel Imperative:
  – Hardware trend: Many little devices
  – Consequence: Server arrays of commodity parts
  – PCs are the bricks of Cyberspace
  – Must automate parallel {design / operation / use}
  – Software parallelism via dataflow & Data Partitioning
• Parallel database techniques
  – Parallel execution of many little jobs (OLTP)
  – Data Partitioning
  – Pipeline Execution
  – Automation techniques
• Summary

Moore’s Law Restated
Many Little Won over Few Big
Hardware trends: Few generic parts:
  – CPU
  – RAM
  – Disk & Tape arrays
  – ATM for LAN/WAN
  – ?? for CAN
  – ?? for OS
These parts will be inexpensive (commodity components)
Systems will be arrays of these parts
Software challenge: how to program arrays
[Figure: Mainframe, Mini, Micro, Nano at 1 M$, 100 K$, 10 K$; disk sizes 9", 5.25", 3.5", 2.5", 1.8"]

Future SuperServer
Array of processors, disks, tapes, comm lines
Challenge: How to program it
Must use parallelism
  – Pipeline: hide latency
  – Partition: bandwidth, scaleup
[Figure: 100 Nodes, 1 Tips; 1,000 discs = 10 Terrorbytes; 100 Tape Transports = 1,000 tapes = 1 PetaByte; High Speed Network (10 Gb/s)]

The Hardware is in Place and Then A Miracle Occurs
?
SNAP: Scaleable Network And Platforms
Commodity Distributed OS built on
  – Commodity Platforms
  – Commodity Network Interconnect

Why Parallel Access To Data?
At 10 MB/s: 1.2 days to scan 1 Terabyte
1,000 x parallel: 1.7 minute SCAN of 1 Terabyte (100 seconds)
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
BANDWIDTH
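
The scan times follow directly from dividing the data size by the aggregate bandwidth. A minimal sketch of that calculation is below, assuming decimal units (1 Terabyte = 10^12 bytes, 1 MB = 10^6 bytes) and a perfectly even split across disks with no startup, interference, or skew; `scan_seconds` is an illustrative helper, not something from the deck.

```python
TERABYTE = 10**12            # bytes; decimal units are an assumption
DISK_RATE = 10 * 10**6       # 10 MB/s per disk, as on the slide


def scan_seconds(num_bytes, rate_per_disk, num_disks=1):
    """Time to scan num_bytes spread evenly over num_disks, all read at once."""
    return num_bytes / (rate_per_disk * num_disks)


serial = scan_seconds(TERABYTE, DISK_RATE)
parallel = scan_seconds(TERABYTE, DISK_RATE, num_disks=1000)

print(f"1 disk:      {serial / 86400:.1f} days")
print(f"1,000 disks: {parallel:.0f} seconds")
```

Real systems fall short of this ideal for the reasons the deck lists later under the perils of parallelism.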

DataFlow Programming
Prefetch & Postwrite Hide Latency
• Can't wait for the data to arrive
• Need a memory that gets the data in advance (100MB/S)
• Solution:
  – Pipeline from source (tape, disc, ram...) to cpu cache
  – Pipeline results to destination
LATENCY

The New Law of Computing
Grosch's Law: 2x $ is 4x performance
  – 1 MIPS: 1 $
  – 1,000 MIPS: 32 $ (.03 $/MIPS)
Parallel Law: 2x $ is 2x performance
  – 1 MIPS: 1 $
  – 1,000 MIPS: 1,000 $
  – Needs Linear Speedup and Linear Scaleup
  – Not always possible
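
The two price points on this slide can be reproduced from the two laws. Under Grosch's Law performance grows as the square of cost, so cost grows as the square root of performance; under the Parallel Law cost grows linearly. The sketch below assumes the slide's baseline of 1 MIPS for 1 $; the function names are illustrative.

```python
import math


def grosch_cost(mips, base_mips=1.0, base_cost=1.0):
    """Cost under Grosch's Law: 2x dollars buys 4x performance."""
    return base_cost * math.sqrt(mips / base_mips)


def parallel_cost(mips, base_mips=1.0, base_cost=1.0):
    """Cost under the Parallel Law: 2x dollars buys 2x performance."""
    return base_cost * (mips / base_mips)


for label, cost in (("Grosch", grosch_cost(1000)), ("Parallel", parallel_cost(1000))):
    print(f"{label}: 1,000 MIPS for {cost:.0f} $ ({cost / 1000:.2f} $/MIPS)")
```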

Parallelism: Performance is the Goal
Goal is to get 'good' performance.
Law 1: parallel system should be faster than serial system
Law 2: parallel system should give near-linear scaleup or near-linear speedup or both.
Parallelism is faster, not cheaper: trades money for time.

The Perils of Parallelism
Startup: Creating processes, Opening files, Optimization
Interference: Device (cpu, disc, bus), logical (lock, hotspot, server, log,...)
Skew: If tasks get very small, variance > service time
[Figure: three plots against Processors & Discs: The Good Speedup Curve (Linearity); A Bad Speedup Curve (No Parallelism Benefit); Three Perils (Startup, Interference, Skew)]

Kinds of Parallel Execution
Pipeline
Partition: outputs split N ways, inputs merge M ways
[Figure: a pipeline of two boxes labelled Any Sequential Program; partitioned copies of Any Sequential Program with Sequential streams in and out]

Data Rivers
Split + Merge Streams
N producers, M Consumers, N X M Data Streams
Producers add records to the river, Consumers consume records from the river
Purely sequential programming.
River does flow control and buffering
River does partition and merge of data records
River = Exchange operator in Volcano.
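
A toy version of a river can make the idea concrete. The sketch below is only an illustration under stated assumptions, not the Volcano Exchange operator: it uses Python threads and bounded queues, routes each record by a CRC32 hash, and marks end of stream with `None`. The name `run_river` and its parameters are hypothetical.

```python
import queue
import threading
import zlib


def run_river(producer_inputs, num_consumers, buffer_size=4):
    """Route records from N producers to M consumers by hashing each record."""
    # One bounded queue per consumer: the bound provides flow control.
    streams = [queue.Queue(maxsize=buffer_size) for _ in range(num_consumers)]
    results = [[] for _ in range(num_consumers)]

    def producer(records):
        # A producer is a plain sequential loop; put() blocks when a buffer is full.
        for rec in records:
            target = zlib.crc32(rec.encode()) % num_consumers
            streams[target].put(rec)

    def consumer(i):
        # A consumer is also a sequential loop; None marks end of stream.
        while True:
            rec = streams[i].get()
            if rec is None:
                break
            results[i].append(rec)

    consumers = [threading.Thread(target=consumer, args=(i,)) for i in range(num_consumers)]
    producers = [threading.Thread(target=producer, args=(recs,)) for recs in producer_inputs]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    for s in streams:
        s.put(None)  # close every consumer's stream once all producers are done
    for t in consumers:
        t.join()
    return results


if __name__ == "__main__":
    print(run_river([["a", "b", "c"], ["d", "e"], ["f"]], num_consumers=2))
```

Neither the producer nor the consumer loop knows about the other side's degree of parallelism, which is the point of the "purely sequential programming" line above.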

Partitioned Data and Execution
Spreads computation and IO among processors
Partitioned data gives NATURAL execution parallelism
[Figure: A Table split into partitions A...E, F...J, K...N, O...S, T...Z; a Count runs on each partition and feeds a final Count]

Partitioned + Merge + Pipeline Execution
Pure dataflow programming
Gives linear speedup & scaleup
But, top node may be bottleneck
So....
[Figure: partitions A...E, F...J, K...N, O...S, T...Z, each with its own Sort and Join; the five streams feed one Merge]

N x M way Parallelism
N inputs, M outputs, no bottlenecks.
[Figure: partitions A...E, F...J, K...N, O...S, T...Z, each with its own Sort and Join; the five streams feed three Merge nodes]

Why are Relational Operators Successful for Parallelism?
Relational data model: uniform operators on uniform data stream
Closed under composition
Each operator consumes 1 or 2 input streams
Each stream is a uniform collection of data
Sequential data in and out: Pure dataflow
Partitioning some operators (e.g. aggregates, non-equi-join, sort,..) requires innovation
AUTOMATIC PARALLELISM

SQL
a NonProcedural Programming Language
• SQL: functional programming language describes answer set.
• Optimizer picks best execution plan
  – Picks data flow web (pipeline),
  – degree of parallelism (partitioning)
  – other execution parameters (process placement, memory,...)
[Figure: GUI, Schema, Plan, Monitor, Optimizer, Execution Planning, Rivers, Executors]

Database Systems “Hide” Parallelism
• Automate system management via tools
  – data placement
  – data organization (indexing)
  – periodic tasks (dump / recover / reorganize)
• Automatic fault tolerance
  – duplex & failover
  – transactions
• Automatic parallelism
  – among transactions (locking)
  – within a transaction (parallel execution)

Success Stories
• Online Transaction Processing
  – many little jobs
  – SQL systems support 3700 tps-A (24 cpu, 240 disk)
  – SQL systems support 21,000 tpm-C (110 cpu, 800 disk)
• Batch (decision support and Utility)
  – few big jobs, parallelism inside
  – Scan data at 100 MB/s
  – Linear Scaleup to 50 processors
[Figure: two plots, transactions / sec against hardware and recs / sec against hardware]

Kinds of Partitioned Data
Split a SQL table to subset of nodes & disks
Partition within set:
  – Range: Good for equijoins, range queries, group-by
  – Hash: Good for equijoins
  – Round Robin: Good to spread load
[Figure: each scheme shown as five partitions labelled A...E, F...J, K...N, O...S, T...Z]
Shared disk and memory less sensitive to partitioning,
Shared nothing benefits from "good" partitioning
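
The three schemes differ only in how a record is mapped to a partition number. The sketch below shows one plausible mapping for each, assuming five partitions with the letter ranges from the slide; the function names, the use of CRC32 as the hash, and the sample keys are all illustrative choices rather than anything the deck prescribes.

```python
import bisect
import zlib

# Upper bounds of the first four letter ranges: A...E, F...J, K...N, O...S, (T...Z).
LETTER_BOUNDS = ["E", "J", "N", "S"]


def range_partition(key, bounds=LETTER_BOUNDS):
    """Partition by the first letter of the key; nearby keys land together."""
    return bisect.bisect_left(bounds, key[0].upper())


def hash_partition(key, num_partitions):
    """Partition by a hash of the key; equal keys always land together."""
    return zlib.crc32(key.encode()) % num_partitions


def round_robin_partition(sequence_number, num_partitions):
    """Partition by arrival order; ignores the key and spreads load evenly."""
    return sequence_number % num_partitions


if __name__ == "__main__":
    for i, name in enumerate(["Bell", "Gray", "Oracle", "Tandem", "Teradata"]):
        print(name, range_partition(name), hash_partition(name, 5), round_robin_partition(i, 5))
```

Range partitioning keeps neighbouring keys on the same node, which is why it suits range queries; round robin carries no key information, which is why it only helps with spreading load.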

Picking Data Ranges
Disk Partitioning
  – For range partitioning, sample load on disks. Cool hot disks by making range smaller
  – For hash partitioning, cool hot disks by mapping some buckets to others
River Partitioning
  – Use hashing and assume uniform
  – If range partitioning, sample data and use histogram to level the bulk
Teradata, Tandem, Oracle use these tricks
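
One way to "sample data and use histogram to level the bulk" is to pick range boundaries at evenly spaced positions in a sorted sample, so each range receives about the same number of records even when key values are skewed. The sketch below assumes that approach; it is not claimed to be what Teradata, Tandem, or Oracle actually do, and the function names are hypothetical.

```python
import bisect


def split_points_from_sample(sample, num_partitions):
    """Pick num_partitions - 1 boundaries at equal-count positions in the sample."""
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // num_partitions] for i in range(1, num_partitions)]


def partition_of(key, split_points):
    """Keys below the first boundary go to partition 0, and so on upward."""
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    skewed = [x * x for x in range(100)]          # keys bunch up at the low end
    points = split_points_from_sample(skewed, 4)
    counts = [0, 0, 0, 0]
    for key in skewed:
        counts[partition_of(key, points)] += 1
    print(points, counts)
```

Equal-width ranges over the same keys would put half of the records in the first quarter of the key space; the sampled boundaries avoid that.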

Parallel Data Scan
Select image
from landsat
where date between 1970 and 1990
  and overlaps(location, :Rockies)
  and snow_cover(image) >.7;
Assign one process per processor/disk:
  – find images with right date & location
  – analyze image, if 70% snow, return it
[Figure: the Landsat table has columns date (1/2/72 ... 4/8/95), loc (33N 120W ... 34N 120W), and image; the date, location, & image tests (Temporal, Spatial, Image) select the images returned in the Answer]

Parallel Aggregates
For aggregate function, need a decomposition strategy:
  $\mathrm{count}(S) = \sum_i \mathrm{count}(s(i))$, ditto for sum()
  $\mathrm{avg}(S) = \left(\sum_i \mathrm{sum}(s(i))\right) / \sum_i \mathrm{count}(s(i))$
  and so on...
For groups,
  – sub-aggregate groups close to the source
  – drop sub-aggregates into a hash river.
[Figure: A Table split into partitions A...E, F...J, K...N, O...S, T...Z; a Count runs on each partition and feeds a final Count]
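
The decomposition above can be sketched in a few lines: each partition computes a (count, sum) pair per group near its data, and a combining step adds the pairs and derives avg at the end. The sketch below runs the partitions one after another and leaves out the hash river that would route sub-aggregates by group; `sub_aggregate` and `combine` are illustrative names.

```python
from collections import defaultdict


def sub_aggregate(partition):
    """Per-partition pass over (group, value) records: group -> [count, sum]."""
    acc = defaultdict(lambda: [0, 0.0])
    for group, value in partition:
        acc[group][0] += 1
        acc[group][1] += value
    return acc


def combine(sub_aggregates):
    """Add the per-partition pairs, then derive avg from the combined pair."""
    total = defaultdict(lambda: [0, 0.0])
    for sub in sub_aggregates:
        for group, (n, s) in sub.items():
            total[group][0] += n
            total[group][1] += s
    return {g: {"count": n, "sum": s, "avg": s / n} for g, (n, s) in total.items()}


if __name__ == "__main__":
    partitions = [[("a", 1), ("b", 2)], [("a", 3)], [("b", 4), ("b", 6)]]
    print(combine(sub_aggregate(p) for p in partitions))
```

Note that avg is never averaged across partitions; only the sums and counts are combined, which is why the formula divides one total by the other.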

Parallel Sort
M input N output Sort design
River is range or hash partitioned
Disk and merge not needed if sort fits in memory
Scales linearly because $\frac{\log(10^{12})}{\log(10^{6})} = \frac{12}{6} = 2$ => 2x slower
Sort is benchmark from hell for shared nothing machines
net traffic = disk bandwidth, no data filtering at the source
[Figure: Scan or other source; Range or Hash Partition River; Sub-sorts generate runs; Merge runs]
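
The shape of this design can be shown with a small in-memory sketch: a range-partitioned river sends each record to one sub-sort, the sub-sorts run independently, and because the ranges do not overlap the sorted partitions can simply be concatenated. This assumes everything fits in memory, so the run generation and merge phases are skipped; `parallel_sort` and its split points are illustrative, and Python threads are used only to show the structure, not to gain real CPU parallelism.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(records, split_points):
    """Range-partition records, sort each partition independently, concatenate."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for rec in records:
        # The river: route each record to the sub-sort that owns its key range.
        partitions[bisect.bisect_right(split_points, rec)].append(rec)
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_parts = list(pool.map(sorted, partitions))
    # Ranges are disjoint and ordered, so no final merge is needed.
    return [rec for part in sorted_parts for rec in part]


if __name__ == "__main__":
    print(parallel_sort([5, 3, 9, 1, 7, 2, 8, 6, 4, 0], split_points=[3, 6]))
```

Every record crosses the river exactly once, which illustrates the slide's remark that network traffic equals disk bandwidth.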

Blocking Operators = Short Pipelines
An operator is blocking if it does not produce any output until it has consumed all its input
Examples: Sort, Aggregates, Hash-Join (reads all of one operand)
Blocking operators kill pipeline parallelism
Make partition parallelism all the more important.
[Figure: Database Load Template has three blocked phases. Boxes: Tape, File, SQL Table, or Process as the source; Scan; Sort Runs; Merge Runs; Table Insert into the SQL Table; Index Insert into Index 1, Index 2, Index 3]

Hash Join
Hash smaller table into N buckets (hope N=1)
If N=1 read larger table, hash to smaller
Else, hash outer to disk then bucket-by-bucket hash join.
Purely sequential data behavior
Always beats sort-merge and nested unless data is clustered.
Good for equi, outer, exclusion join
Lots of papers, products just appearing (what went wrong?)
Hash reduces skew
[Figure: Right Table and Left Table hashed into Hash Buckets]
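
The two cases on this slide can be sketched as follows. `hash_join` is the N=1 case: build a hash table on the smaller input, then stream the larger input past it. `partitioned_hash_join` is the N>1 case: split both inputs into buckets with the same hash function and join bucket by bucket. In this sketch the buckets are in-memory lists standing in for files on disk, rows are dictionaries, and all names are illustrative.

```python
import zlib
from collections import defaultdict


def hash_join(smaller, larger, key_small, key_large):
    """N=1 case: build on the smaller input, probe with the larger one."""
    table = defaultdict(list)
    for row in smaller:
        table[row[key_small]].append(row)
    return [(s, l) for l in larger for s in table.get(l[key_large], [])]


def partitioned_hash_join(smaller, larger, key_small, key_large, num_buckets):
    """N>1 case: bucket both inputs the same way, then join bucket by bucket."""
    def bucket(value):
        return zlib.crc32(repr(value).encode()) % num_buckets

    small_buckets = [[] for _ in range(num_buckets)]
    large_buckets = [[] for _ in range(num_buckets)]
    for row in smaller:
        small_buckets[bucket(row[key_small])].append(row)
    for row in larger:
        large_buckets[bucket(row[key_large])].append(row)

    out = []
    for sb, lb in zip(small_buckets, large_buckets):
        out.extend(hash_join(sb, lb, key_small, key_large))
    return out


if __name__ == "__main__":
    depts = [{"dept": 1, "name": "db"}, {"dept": 2, "name": "os"}]
    emps = [{"emp": "a", "dept": 1}, {"emp": "b", "dept": 2},
            {"emp": "c", "dept": 1}, {"emp": "d", "dept": 3}]
    for d, e in hash_join(depts, emps, "dept", "dept"):
        print(d["name"], e["emp"])
```

Both inputs are read front to back exactly once in each phase, which matches the "purely sequential data behavior" claim, and each bucket pair is an independent job that could run on its own node.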

Observation: Execution “easy”, Automation “hard”
It is “easy” to build a fast parallel execution environment (no one has done it, but it is just programming)
It is hard to write a robust and world-class query optimizer.
  – There are many tricks
  – One quickly hits the complexity barrier
Common approach:
  – Pick best sequential plan
  – Pick degree of parallelism based on bottleneck analysis
  – Bind operators to process
  – Place processes at nodes
  – Place scratch files near processes
  – Use memory as a constraint

Systems That Work This Way
Shared Nothing
  – Teradata: 400 nodes
  – Tandem: 110 nodes
  – IBM / SP2 / DB2: 48 nodes
  – ATT & Sybase: 112 nodes
  – Informix/SP2: 48 nodes
Shared Disk
  – Oracle: 170 nodes
  – Rdb: 24 nodes
Shared Memory
  – Informix: 9 nodes
  – RedBrick: ? nodes
[Figure: one diagram per architecture, each with CLIENTS attached; labels Memory, Processors]

Research Problems
• Automatic data placement (partition: random or organized)
• Automatic parallel programming (process placement)
• Parallel concepts, algorithms & tools
• Parallel Query Optimization
• Execution Techniques load balance, checkpoint/restart, pacing,
[Figure: 100 Nodes, 1 Tips; 1,000 discs = 10 Terrorbytes; 100 Tape Transports = 1,000 tapes = 1 PetaByte; High Speed Network (10 Gb/s)]

Summary
• Cyberspace is Growing
• Databases are the dirt of cyberspace
  – PCs are the bricks, Networks are the mortar.
  – Many little devices: Performance via Arrays of {cpu, disk, tape}
• Then a miracle occurs: a scaleable distributed OS and net
  – SNAP: Scaleable Networks and Platforms
• Then parallel database systems give software parallelism
  – OLTP: lots of little jobs run in parallel
  – Batch TP: data flow & data partitioning
  – Automate processor & storage array administration
  – Automate processor & storage array programming
• 2000 platforms as easy as 1 platform.