PostgreSQL: Joining 1 million tables
TRANSCRIPT
Joining 1 million tables
Hans-Jürgen Schönig
www.postgresql-support.de
The goal
- Creating and joining 1 million tables
- Pushing PostgreSQL to its limits (and definitely beyond)
- How far can we push this without patching the core?
- Being the first one to try this ;)
Creating 1 million tables is easy, no?
- Here is a script:
#!/usr/bin/perl
print "BEGIN;\n";
for (my $i = 0; $i < 1000000; $i++)
{
    print "CREATE TABLE t_tab_$i (id int4);\n";
}
print "COMMIT;\n";
Creating 1 million tables
Who expects this to work?
Well, it does not ;)
CREATE TABLE
CREATE TABLE
WARNING: out of shared memory
ERROR: out of shared memory
HINT: You might need to increase
max_locks_per_transaction.
ERROR: current transaction is aborted, commands ignored
until end of transaction block
Why does it fail?
- There is a variable called max_locks_per_transaction
- max_locks_per_transaction * max_connections = maximum number of locks
- 100 connections x 64 locks per transaction = 6400 locks, which is by far not enough
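The arithmetic with stock defaults (a minimal sketch; note that max_locks_per_transaction only changes after a restart):

SHOW max_locks_per_transaction;  -- 64 by default
SHOW max_connections;            -- 100 by default
-- lock table capacity: 64 * 100 = 6400 object locks, while a single
-- transaction creating 1,000,000 tables needs one lock per table.
-- Fix: raise max_locks_per_transaction in postgresql.conf and restart.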
A quick workaround
- Use single transactions
- Avoid nasty disk flushes on the way
SET synchronous_commit TO off;
time ./create_tables.pl | psql test > /dev/null
real 14m45.320s
user 0m45.778s
sys 0m18.877s
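How the workaround fits together (a sketch, assuming the generator simply drops the surrounding BEGIN/COMMIT so that every CREATE TABLE autocommits and never holds more than its own lock):

SET synchronous_commit TO off;   -- skip the WAL flush on each of the
                                 -- 1,000,000 tiny commits
CREATE TABLE t_tab_0 (id int4);  -- autocommits; its lock is released at once
CREATE TABLE t_tab_1 (id int4);  -- and so on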
Reporting success . . .
test=# SELECT count(tablename)
FROM pg_tables
WHERE schemaname = 'public';
count
---------
1000000
(1 row)
- One million tables have been created
Creating an SQL statement
- Writing this by hand is not feasible
- Desired output:
SELECT 1
FROM t_tab_1, t_tab_2, t_tab_3, t_tab_4, t_tab_5
WHERE t_tab_1.id = t_tab_2.id AND
t_tab_2.id = t_tab_3.id AND
t_tab_3.id = t_tab_4.id AND
t_tab_4.id = t_tab_5.id AND
1 = 1;
A first attempt to write a script
#!/usr/bin/perl
my $joins = 5;
my $sel   = "SELECT 1 ";
my $from  = "";
my $where = "";
for (my $i = 1; $i < $joins; $i++)
{
    $from  .= "t_tab_$i, \n";
    $where .= " t_tab_" . $i . ".id = t_tab_" . ($i + 1) . ".id AND \n";
}
$from  .= " t_tab_$joins ";
$where .= " 1 = 1; ";
print "$sel \n FROM $from \n WHERE $where\n";
1 million tables: What it means
- If you do this for 1 million tables . . .
perl create.pl | wc
2000001 5000003 54666686
- 54 MB of SQL is quite a bit of code
- First observation: The parser works ;)
Giving it a try
- Runtimes with PostgreSQL default settings:
- n = 10: 158 ms
- n = 100: 948 ms
- n = 200: 3970 ms
- n = 400: 18356 ms
- n = 800: 87095 ms
- n = 1000000: prediction = 1575 days :)
- Ouch, this does not look linear (see the arithmetic below)
- 1575 days was too close to the conference
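A quick check of those numbers (my own arithmetic, not from the talk): each doubling of n multiplies the runtime by roughly 4 to 5:

87095 / 18356 ≈ 4.7
18356 / 3970  ≈ 4.6
3970 / 948    ≈ 4.2

A factor of 4 per doubling would be exactly quadratic, so planning cost here grows slightly faster than quadratically in the number of tables.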
GEQO:
- At some point GEQO will kick in
- Maybe it is the problem?
SET geqo TO off;
- n = 10: 158 ms
- n = 100: ERROR: canceling statement due to user request
  (Time: 680847.127 ms)
- It clearly is not . . .
It seems GEQO is needed
- Let us see how far we can get by adjusting GEQO settings
SET geqo_effort TO 1;
- n = 10: 158 ms
- n = 100: 916 ms
- n = 200: 1464 ms
- n = 400: 4694 ms
- n = 800: 18896 ms
- n = 1000000: maybe 1 year? :)
=> the conference deadline is coming closer
=> still 999,000 tables to go ...
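For reference, the relevant GEQO knobs (defaults as documented in PostgreSQL; a sketch, not the talk's exact configuration):

SET geqo TO on;            -- genetic query optimizer, on by default
SET geqo_threshold TO 12;  -- GEQO takes over at this many FROM items
SET geqo_effort TO 1;      -- scale 1..10, default 5; lower values trade
                           -- plan quality for planning speed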
Trying with 10,000 tables
./make_join.pl 10000 | psql test
ERROR: stack depth limit exceeded
HINT: Increase the configuration parameter
max_stack_depth (currently 2048kB), after ensuring
the platform's stack depth limit is adequate.
- This is easy to fix
Removing limitations
- Problem can be fixed easily
- OS: ulimit -s unlimited
- postgresql.conf: set max_stack_depth to a very high value
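With the OS limit raised, the parameter can then be set per session, exactly as the skeleton query later in this talk does:

SET max_stack_depth TO '1GB';  -- only safe after ulimit -s has been raised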
Next stop: Exponential planning time
- Obviously planning time goes up exponentially
- There is no way to do this without patching
- As long as we are way slower than linear, it is impossible to join 1 million tables.
This is how PostgreSQL joins tables:
Tables a, b, c, d form the following joins of 2:
[a, b], [a, c], [a, d], [b, c], [b, d], [c, d]
By adding a single table to each group, the following joins of 3 are constructed:
[[a, b], c], [[a, b], d], [[a, c], b], [[a, c], d], [[a, d], b], [[a, d], c], [[b, c], a], [[b, c], d], [[b, d], a], [[b, d], c], [[c, d], a], [[c, d], b]
More joining
Likewise, add a single table to each to get the final joins:
[[[a, b], c], d], [[[a, b], d], c], [[[a, c], b], d], [[[a, c], d], b], [[[a, d], b], c], [[[a, d], c], b], [[[b, c], a], d], [[[b, c], d], a], [[[b, d], a], c], [[[b, d], c], a], [[[c, d], a], b], [[[c, d], b], a]
- Got it? ;)
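A back-of-the-envelope count (my own, not from the slides): treating [x, y] and [y, x] as the same join, there are n! / 2 left-deep join orders for n tables:

4! / 2 = 12

which matches the 12 complete paths listed above. For n = 1000000 the value of n! / 2 is beyond astronomical, which is why this exhaustive search cannot scale.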
A little patching
A little patch ensures that the list of tables is grouped into "sub-lists", where the length of each sub-list is equal to from_collapse_limit. If from_collapse_limit is 2, the list of 4 becomes . . .
[[a, b], [c, d]]
[a, b] and [c, d] are joined separately, so we only get one "path":
[[a, b], [c, d]]
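For comparison, the knob itself exists in stock PostgreSQL as a session setting (a sketch; the grouping behavior described above comes from the patch, not from this setting alone):

SET from_collapse_limit TO 2;
EXPLAIN SELECT 1
FROM t_tab_1, t_tab_2, t_tab_3, t_tab_4
WHERE t_tab_1.id = t_tab_2.id
  AND t_tab_2.id = t_tab_3.id
  AND t_tab_3.id = t_tab_4.id;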
The downside:
- This ensures minimal planning time but is also no real optimization.
- There is no potential for GEQO to improve things
- Speed is not an issue in our case anyway. It just has to work "somehow".
Are we on a path to success?
- Of course not ;)
The next limitation
- The join fails again
ERROR: too many range table entries
This time life is easy
Solution: in include/nodes/primnodes.h, change

#define INNER_VAR 65000      /* reference to inner subplan */
#define OUTER_VAR 65001      /* reference to outer subplan */
#define INDEX_VAR 65002      /* reference to index column */

to

#define INNER_VAR 2000000
#define OUTER_VAR 2000001
#define INDEX_VAR 2000002
Next stop: RAM starts to be an issue
| Tables (thousands) | Planning time (s) | Memory  |
|--------------------|-------------------|---------|
|                  2 |                 1 | 38.5 MB |
|                  4 |                 3 | 75.6 MB |
|                  8 |                31 | 192 MB  |
|                 16 |               186 | 550 MB  |
|                 32 |              1067 | 1.7 GB  |
(this is for planning only)
Expected amount of memory: not available ;)
Is there a way out?
- Clearly: The current path leads to nowhere
- Some other approach is needed
- Divide and conquer?
Changing the query
- Currently there is . . .
SELECT 1 ...
FROM tab1, ..., tabn
WHERE ...
- The problem must be split
CTEs might come to the rescue
- How about creating WITH-clauses joining smaller sets of tables?
- At the end those WITH statements could be joined together easily.
- 1,000 CTEs with 1,000 tables each gives 1,000,000 tables
- Scary stuff: Runtime + memory
- Unforeseen stuff
A skeleton query
SET enable_indexscan TO off;
SET enable_hashjoin TO off;
SET enable_mergejoin TO off;
SET from_collapse_limit TO 2;
SET max_stack_depth TO ’1GB’;
WITH cte_0(id_1, id_2) AS (
SELECT t_tab_0.id, t_tab_16383.id
FROM t_tab_0, ...
cte_1 ...
SELECT ... FROM cte_0, cte_1 WHERE ...
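To make the shape concrete, here is a miniature of my own with two CTEs of two tables each (illustrative only; the real query used far larger groups per CTE):

WITH cte_0(id) AS (
    SELECT t_tab_0.id
    FROM t_tab_0, t_tab_1
    WHERE t_tab_0.id = t_tab_1.id
), cte_1(id) AS (
    SELECT t_tab_2.id
    FROM t_tab_2, t_tab_3
    WHERE t_tab_2.id = t_tab_3.id
)
SELECT 1
FROM cte_0, cte_1
WHERE cte_0.id = cte_1.id;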
Finally: A path to enlightenment
- With all the RAM we had . . .
- With all the patience we had . . .
- With a fair amount of luck . . .
- And with a lot of confidence . . .
Finally: It worked . . .
- Amount of RAM needed: around 40 GB
- Runtime: 10h 36min 53 sec
- The result:
?column?
----------
(0 rows)
Finally
Thank you for your attention
Cybertec Schönig & Schönig GmbH
Hans-Jürgen Schönig
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
www.postgresql-support.de