apache derby performance - huihoo
TRANSCRIPT
Apache Derby Performance
Olav Sandstå, Dyre Tjeldvoll, Knut Anders HatlenDatabase Technology GroupSun Microsystems
• Derby Architecture• Performance Evaluation of Derby• Performance Tips• Comparing Derby, MySQL and PostgreSQL
Overview
Derby Architecture: Embedded
Java Virtual Machine
Application
JDBC
SQL
Access
Log Data
StorageDatabase buffer
● Parsing of SQL● Compiling ● Optimization● Byte-code gen.● Execution
● Access path● Indexes
● Database buffer● Log generation● Log execution
● Synchronous write of log records
● Asynchronous write of data (checkpoint)● Synchronous read of data
Derby Architecture: Client-Server
Java Virtual Machine
Application
JDBC client
Java Virtual Machine
JDBC
SQL
Access
Log Data
StorageDatabase buffer
Network Server
Performance Evaluation of Derby
Overview:• Evaluation of disk and file system configurations:
> Disk write cache> Database and transaction log on separate disks
• Database configurations:> Size of database buffer
• Comparing Embedded and Client-Server
What Is “Performance”?
• How to measure database performance?> Throughput> Response time> Scalability
• Which configuration?> Out-of-the-box configuration> Carefully tuned
• How to compare database systems with different properties and different tuning possibilities?
Test Configuration
Load clients:
1. TPC-B like load:> 3 UPDATES > 1 SELECT> 1 INSERT
2. Single-record SELECT:> Select of one record on
primary key
Test platform:
• Sun JVM 1.5.0• Solaris 10 x86• 2 x 2.4 GHz AMD Opteron• 2 GB RAM• 2 SCSI disks• UFS file system
Effect of Write Cache on Disk
TPC-B like load:
WARNING: The write cache reduces probability of successful recovery after power failure
Separate Data and Log Devices
Log device:> Sequential write of
transaction log> Synchronous as part of
commit> Group commit
Data device:> Data in database buffer
regularly written do disk as part of checkpoint
> Data read from disk on demand Tip: use separate disks for
data and log device
Database Buffer
• Cache of frequently used data pages in memory
• Cache-miss leads to a read from disk (or disk cache)
• Size: > default 4 MB> derby.storage. pageCacheSize
Performance tip: • increase the size of the database buffer to get frequently
accessed data in memory
Example:single-record select:
Comparing Embedded and Client-Server
• TPC-B like load:
Comparing Embedded and Client-Server
• Single-record SELECT:
Comparing Embedded and Client-Server:CPU Usage
TPC-B Embedded
TPC-B Client-Server
Select Embedded
Select Client-Server
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
CPU usage per transaction [ms]
User CPU
System CPU
• Network Server add 30%-50% to the CPU usage
• System CPU usage:> Message sending and
receiving
• User CPU usage:> Message parsing> Character set conversions
Performance Tips
Performance Tips 1:Use Prepared Statements• Compilation of SQL
statements is expensive:> Derby generates Java byte
code and loads generated classes
• Prepared statements eliminate this cost
Tip:• USE prepared statements • and REUSE them
PreparedStatement Statement
0
0.25
0.5
0.75
1
1.25
1.5
1.75
2
2.25
2.5
2.75
System time
User time
CP
U u
sag
e (
ms/
tx)
Performance Tips 2:Avoid Table Scans• Use indexes to optimize much used access paths:
> CREATE INDEX....
• Check the query plan:> derby.language.logQueryPlan=true
• Use RunTimeStatistics:> SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)> SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()
Tip: • Use indexes• Use Derby's tools to understand the query execution
Comparing the Performance ofMySQL, PostgreSQL and Derby
Performance Evaluation:
MySQL, PostgreSQL and Derby
Evaluated performance of:• MySQL/InnoDB (version 5.0.10)• PostgreSQL (version 8.0.3) • Derby Embedded (version 10.1.1.0)• Derby Client-Server
Database Configurations
Configurations:• “Out-of-box” performance• No tuning, except:
> size of database buffer> database and transaction
log on separate disks➔ No Benchmark
Load:> 1-100 concurrent clients
Databases:1. Main-memory database:
> 10 MB user data> 64 MB database buffer
2. Disk database:> 10 GB user data> 64 MB database buffer
Throughput: TPC-B like load
Main-memory database (10 MB): Disk-based database (10 GB):
Throughput: Single-record Select
Main-memory database (10 MB): Disk-based database (10GB):
Observations
• Derby outperforms MySQL on disk-based databases> Derby has 100% higher throughput than MySQL
• MySQL performs better on small main-memory databases> Update-intensive load: Derby has 20-50% lower
throughput> Read-intensive load: Derby has 50% lower throughput
• PostgreSQL performs best on read-only databases, and has lowest throughput on update-intensive databases
Why?
CPU Usage Main-Memory Database
Derby embedded
Derby client/server
MySQL PostgreSQL
00.10.20.30.40.50.6
0.70.80.9
11.11.21.31.41.5
TPC-B
System time
User time
ms/
tran
sact
ion
Derby embedded
Derby client/server
MySQL PostgreSQL0
0.025
0.05
0.075
0.1
0.125
0.15
0.175
0.2
0.225
0.25
0.275
0.3
0.325
SELECT
System time
User time
ms/
tran
sact
ion
CPU Usage Disk-Based Database
Derby embedded
Derby client/server
MySQL PostgreSQL0
0.1
0.20.3
0.40.5
0.6
0.70.8
0.91
1.11.2
1.31.4
1.5
CPU usage per transaction
System time
User time
ms/
tra
nsa
ctio
n
Derby embedded
Derby client/server
MySQL PostgreSQL0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
5.5
6
CPU usage per transaction
System time
User time
ms/
tra
nsa
ctio
n
TPC-B SELECT
Network I/O
Derby client/server
MySQL PostgreSQL0
0.51
1.52
2.53
3.54
4.55
5.56
6.57
7.58
8.59
Incoming packets
Outgoing packets
Pac
kets
per
tra
nsa
ctio
n
Derby client/server
MySQL PostgreSQL0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
Incoming packets
Outgoing packets
Pac
kets
per
tra
nsa
ctio
n
TPC-B SELECT
Observation:• Derby has two extra round-trips for SELECT operations
Disk I/O: Main-Memory Database (TPC-B)
Derby MySQL PostgreSQL0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
2.2
2.4
2.6
Write Operations
Data
Log
Writ
e o
pe
ratio
ns
pe
r tr
an
sact
ion
Derby MySQL PostgreSQL0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Write Volume
KB
per
tran
sact
ion
Disk I/O: Disk-Based Database (TPC-B)
Derby MySQL PostgreSQL0
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
Data Volume
KB
pe
r tr
ansa
ctio
n
Derby MySQL PostgreSQL0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
I/O Operations
Log Data write Data read
Ope
ratio
ns p
er tr
ans
act
ion
Disk I/O: Disk-Based Database (SELECT)
Derby MySQL PostgreSQL0
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
Data Volume
KB
pe
r tr
ansa
ctio
n
Derby MySQL PostgreSQL0
0.25
0.5
0.75
1
1.25
1.5
1.75
2
2.25
2.5
2.75
I/O Operations
Data write Data read
Ope
ratio
ns p
er tr
ansa
ctio
n
Conclusions: Resource Usage
• MySQL performs better than Derby when> The database is small and fits in the database buffer> Throughput becomes CPU-bound> Derby uses more CPU and sends more messages over the net
• Derby performs better than MySQL when> The database is large and does not fit in the database buffer> Throughput becomes I/O-bound
• PosgreSQL performs best with read-only load> Update-intensive load results in much disk I/O
Performance Improvement Activities
CPU usage:• High CPU usage during
initialization of database buffer (DONE)
• Reduce CPU usage in network server (message parsing and generation)
• Improve use of hash tables and reduce lock contention
Network IO:• Reduce number of
messages sent and received
Disk IO:• Improve fairness of
scheduling of I/O requests (DONE)
• Allow concurrent read operations on data files
Performance tip:• Next version of Derby will have even better performance
Summary
• Disk Write Cache: > Improves the performance, but.....> WARNING: durability and the probability of successful recovery
after a power failure are reduced
• Data and log devices:> Keep the transaction log and database on a separate devices
• Database buffer:> Keep frequently accessed data in the database buffer
• Client-Server versus Embedded:> Throughput: down by 5% (TPC-B) to 30% (SELECT)
• Use:> Prepared statements and indexes
Olav Sandstå, Dyre Tjeldvoll, Knut Anders [email protected], [email protected], [email protected]
Questions?