Lessons PostgreSQL learned from commercial databases, and didn’t
Ilya
Preamble
PostgreSQL is a great database!
• (You always need to say so if you are going to say that PostgreSQL lags behind commercial databases or has some limitations)
• The only open source database technology massively used as an alternative to commercial RDBMSs
• Moreover, 10 years ago some people seriously disputed whether PostgreSQL could outperform MySQL
• Moreover, 5 years ago any Oracle-to-Postgres migration case study was guaranteed acceptance at any PostgreSQL conference
• Only PostgreSQL has made such impressive progress!
• Well, Linux did too, but Linux is not a database system
What made that possible?
• Good initial architecture
• Well-organized community work
• SQL close to the standard
• Procedural languages
• Lots of other things; if you are here, you probably know them
Did PostgreSQL learn something? (from commercial databases)
• Well, not directly
• At least, this is the worst possible way to start a discussion on [HACKERS]: "...we need this feature because Oracle has it..."
• Most likely, people came from Oracle, did not find some beloved tools, and started implementing substitutes
Anecdotally
• The prominent Soviet aircraft designer Tupolev, when unofficially accused of plagiarizing some of his designs, used to say that all beautiful aircraft look alike, and that is why they can fly
• Tupolev’s ill-wishers believed he had plagiarized that formula as well, from some other aircraft designer...
• For aviation engineers, it was always obvious that internally the airplanes were totally different
• Anyway, databases _are_ like aircraft: the common theory beneath them makes them look similar
That common theory was
Transactions
• If your data is important, use a database that supports ACID transactions
• In PostgreSQL: MVCC implementation since version 6.5 (1999), WAL since 7.1 (2001)
• Adopting MVCC instead of a pure locking scheduler was wise (DB2 and MS SQL Server proved that over time)
• That made it possible to implement a reliable backup/recovery mechanism and replication for high availability
• And that was the pivotal point that started PostgreSQL adoption in enterprise-level solutions
• Ironically, the current MVCC implementation itself has become a limitation for Postgres
OK, hold on
What can actually stop you from choosing Postgres instead of Oracle or DB2?
• Write performance - Yes, absolutely
• Database size - Yes, definitely
• Lack of diagnostic tools - Yes
• We need to run PostgreSQL in a Microsoft environment - Yes
• Lack of qualified people - Maybe
• Lack of a built-in analog of RAC/PureScale - Yes and No
• We are talking about heavy workloads and comparing with enterprise licenses
Main problem
Write performance and database size
PostgreSQL uses buffered writes
[Diagram: Buffered IO vs. Direct IO; each panel shows the layers shared_buffers, kernel buffer, and disks]
• Effectively, one PostgreSQL process writes pages one by one into the kernel buffer, and that buffer is later flushed to disk
• Besides double caching, this is slow and rules out some useful features (O_ATOMIC)
• Oracle can bypass the kernel buffer using direct IO. Moreover, both Oracle’s database writer and log writer can spawn threads to write asynchronously
• That is a serious limitation for reaching high TPS figures on a single instance
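To make the difference between the two write paths concrete, here is a minimal Python sketch, assuming Linux for the direct path (`os.O_DIRECT` does not exist on every platform, and some filesystems reject it). It is not how PostgreSQL or Oracle actually write pages; `write_buffered` and `write_direct` are hypothetical helper names. The buffered version copies the page into the kernel buffer and relies on `fsync` to push it to disk; the direct version skips the kernel buffer, which is why it needs an aligned memory buffer (here obtained from an anonymous `mmap`, which is page-aligned).

```python
import mmap
import os

PAGE_SIZE = 8192  # PostgreSQL's default block size (BLCKSZ)

# Hypothetical helpers for illustration only; not PostgreSQL's or Oracle's write path.


def write_buffered(path: str, page: bytes) -> None:
    """Buffered IO: write() copies the page into the kernel buffer (page cache)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, page)  # lands in the kernel buffer first
        os.fsync(fd)        # forces the kernel buffer out to the disk
    finally:
        os.close(fd)


def write_direct(path: str, page: bytes) -> bool:
    """Direct IO: O_DIRECT bypasses the kernel buffer (assumes Linux).

    Returns False when the platform or filesystem refuses direct IO.
    """
    if not hasattr(os, "O_DIRECT"):  # not available on e.g. macOS or Windows
        return False
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT, 0o600)
    except OSError:  # some filesystems reject O_DIRECT (EINVAL)
        return False
    # O_DIRECT requires an aligned buffer; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, PAGE_SIZE)
    try:
        buf.write(page[:PAGE_SIZE])
        os.write(fd, buf)  # no copy into the kernel buffer
        os.fsync(fd)       # still needed for metadata and device caches
        return True
    except OSError:
        return False
    finally:
        buf.close()
        os.close(fd)
```

Usage: `write_buffered("page.dat", b"x" * PAGE_SIZE)`. Note that even the direct path calls `fsync`: bypassing the kernel buffer avoids double caching, but it does not by itself make the write durable.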
Huge database
• Same problem - double caching
• Storage overhead
• Backup performance and recovery time
• Autovacuum performance becomes an issue
Backup performance
• No built-in parallelism
• Only level 0 (full) backups plus PITR
• Keeping undo information right in the data files can be a problem for incremental backups
The current MVCC implementation is itself a limitation
Nothing new; I only want to mention that it may be the largest challenge for PostgreSQL in the next 20 years
• It solves only one problem, "snapshot too old" (and modern Oracle solves that one better)
• Undo information spread across the data files brings a lot of problems
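Why does undo information end up in the data files? A simplified Python model of MVCC visibility makes it visible. It loosely follows PostgreSQL's naming (each row version carries `xmin`, the creating transaction, and `xmax`, the deleting or updating one; a snapshot records its own `xmin`, `xmax`, and the transactions still in progress), but the class and function names are hypothetical, and it ignores own-transaction changes, hint bits, subtransactions, and XID wraparound. The key point is that an UPDATE does not overwrite the row: it stamps the old version with its transaction id and writes a new version into the table, so the old version serves as undo data.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set

# Simplified, hypothetical model loosely based on PostgreSQL's xmin/xmax scheme.
# Ignores own-transaction changes, hint bits, subtransactions and XID wraparound.


@dataclass(frozen=True)
class Snapshot:
    xmin: int                    # oldest transaction still running at snapshot time
    xmax: int                    # first transaction id not yet assigned at snapshot time
    in_progress: FrozenSet[int]  # transactions in [xmin, xmax) still running

    def sees_committed(self, xid: int, committed: Set[int]) -> bool:
        """True if xid had committed before this snapshot was taken."""
        if xid >= self.xmax:
            return False  # started after the snapshot
        if xid >= self.xmin and xid in self.in_progress:
            return False  # was still running when the snapshot was taken
        return xid in committed  # aborted transactions never count


@dataclass
class TupleVersion:
    value: str
    xmin: int            # transaction that created this version
    xmax: Optional[int]  # transaction that deleted/updated it, None if live


def visible(t: TupleVersion, snap: Snapshot, committed: Set[int]) -> bool:
    """Visible if the creator is seen as committed and the deleter is not."""
    if not snap.sees_committed(t.xmin, committed):
        return False
    return t.xmax is None or not snap.sees_committed(t.xmax, committed)


# An UPDATE by transaction 105 leaves both versions in the same "data file".
heap = [TupleVersion("old", xmin=100, xmax=105), TupleVersion("new", xmin=105, xmax=None)]
committed = {100, 105}
before = Snapshot(xmin=105, xmax=106, in_progress=frozenset({105}))
after = Snapshot(xmin=106, xmax=106, in_progress=frozenset())
print([t.value for t in heap if visible(t, before, committed)])  # ['old']
print([t.value for t in heap if visible(t, after, committed)])   # ['new']
```

Nothing in this model ever removes the "old" version. In PostgreSQL that is VACUUM's job, which is one reason autovacuum performance becomes an issue as databases grow.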
Lack of diagnostic tools
• OK, there are plenty of them
• Tools that require kernel-developer experience, such as perf, are not the right tools for a DBA
• Full-time PostgreSQL developers are not DBAs. We need to explain to them what we need and why
• Adding wait information to pg_stat_activity is a good example of such a joint effort
• And a good lesson learned from Oracle. Not the last one, I hope
PostgreSQL performance on Windows
• Well, there is no such thing. By the way, Oracle performs well
• At the same time, there is a lot of PostgreSQL on Windows
• Lack of enthusiasts for proper porting
• At the same time, we support various BSDs and even Tru64 UNIX!
• Welcome to the world of open source!
Documentation
• Relatively small, but efficient, not over-engineered, and covers all topics well, at first glance
• No diagrams. Agreeing on a graphics format seems much easier than reworking MVCC!
• No guidebooks. An application developer must read half of the documentation to install Postgres in a test environment!
• OK, there is the PostgreSQL wiki, but it is not under release control
In spite of all this
PostgreSQL is a great database!
• It is still relatively simple to start with and to live with
• It is safe. We have no listener, and so no thick books about securing the listener from external attacks.
• It learns fast
• Maybe it will change the global database market the way Linux changed the global operating system market
Questions?