Non-volatile Memory (NVM) Logging
2016.05.20
Takashi HORIKAWA

Who am I
Name: Takashi HORIKAWA, Ph.D.
Research interests: Performance evaluation of computer and communication systems, including performance engineering of IT systems, with the focus of the research shifting slightly to CPU scalability
Papers:
• Latch-free data structures for DBMS: design, implementation, and evaluation, SIGMOD '13
• An Unexpected Scalability Bottleneck in a DBMS: A Hidden Pitfall in Implementing Mutual Exclusion, PDCS '11
• An approach for scalability-bottleneck solution: identification and elimination of scalability bottlenecks in a DBMS, ICPE '11
• A method for analysis and solution of scalability bottleneck in DBMS, SoICT '10

Contents
• Introduction
• Problems to be solved
• Implementation
• Evaluation
• Technical trends
• Conclusion

Introduction
• Write Ahead Logging
• Sync. vs Async. Commit
• Difference in Performance
• Fundamental idea for NVM Logging
• Byte or Block Addressable NVM
• Byte addressable NVMs

Write Ahead Logging
Widely used method to make transactions durable
[Figure: worker processes perform transaction processing in memory, updating tables and indexes in the shared buffer and appending XLog records to the WAL buffer. The shared buffer is written to the data files in storage asynchronously; the WAL buffer is written to the WAL files in storage synchronously.]
Overhead: the WAL write latency is included in the transaction response time

Sync. vs Async. Commit
[Figure: timeline t showing transaction processing followed by the WAL write (I/O); the transaction becomes durable when the WAL write completes.]
• Sync. commit: the client is notified of the transaction commit after the transaction becomes durable.
– Durability: OK
– Response: Slow
• Async. commit: the client is notified of the transaction commit before the transaction becomes durable.
– Durability: NG (not guaranteed)
– Response: Fast

Difference in Performance
[Figure: PGBENCH throughput (TPS, 0 to 50000) versus number of clients (0 to 160) for async. commit and sync. commit; two panels: disk-drive cache off and disk-drive cache on.]

Fundamental idea for NVM Logging
[Figure: the Write Ahead Logging architecture in two steps. First, memory (shared buffer and WAL buffer) is volatile and storage is non-volatile, so the WAL buffer is written to the WAL files synchronously. Then the WAL buffer is placed in NVM (non-volatile) while the shared buffer stays volatile, and both the data files and the WAL files are written asynchronously.]
XLog records are stored in non-volatile memory

Byte or Block Addressable NVM
• Byte addressable NVM (expected to be used from now on)
– Memory, accessed by the CPU with store instructions
– Accessed in units of bytes
– Key device in NVM Logging
• Block addressable NVM (commonly used already)
– Storage (HD, SSD), accessed through I/O operations
– Accessed in units of blocks

Byte addressable NVMs
• Combination of existing technologies: DRAM, SSD, battery (NVDIMM)
– AgigA Tech (Micron Technology)
– Viking Technology
– SK Hynix
• Use of a new memory cell (Storage Class Memory)
– Phase Change Memory (PCM)
– Magnetoresistive Random Access Memory (MRAM)
– Ferroelectric RAM (FRAM)
– The memristor

               NVDIMM          Storage Class Memory
Capacity       Small           Large
Access time    ~100 ns         ~μs
Availability   Ready-to-use    Near future (?)

Problems to be solved
• Fundamental idea for NVM Logging (Again)
• It is not as simple as it looks
• Necessary condition for Recovery
• Problems
– Partial write
– Unreachable XLog Record
– CPU cache effect

Fundamental idea for NVM Logging (Again)
[Figure: worker processes write XLog records to a WAL buffer placed in NVM (non-volatile), while the shared buffer holding tables and indexes stays in volatile memory; both the data files and the WAL files in storage are written asynchronously.]
XLog records are stored in non-volatile memory

It is not as simple as it looks
• Naive implementation of NVM Logging (not sufficient)
– Allocating the WAL buffer in the NVM area
– Using asynchronous commit mode
• Problems are:
– Partial write
– Unreachable XLog Record
– CPU cache effect

Necessary condition for Recovery
• If the recovery process reads all XLog records (XLRs) of transaction m correctly, transaction m can be recovered
• Otherwise, transaction m will be lost
[Figure: the XLog along the LSN axis, containing XLR(m,N-2), XLR(m,N-1), and XLR(m,N) of transaction m.]
A worker process finishes the commit of a transaction only after all XLRs of the transaction are stored in the non-volatile memory.

Partial write
• The recovery process will read an incomplete XLR if the system crashes in the middle of writing an XLR
[Figure: XLR 1 in the WAL buffer, along the LSN axis.]
(The CRC in the XLR may be effective, but it is not perfect)
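
The CRC remark can be illustrated with a toy record format. This is a minimal sketch, not PostgreSQL's record layout: the 8-byte header, `make_record`, and `is_complete` are hypothetical, and `zlib.crc32` stands in for the CRC-32C that PostgreSQL uses in its WAL records.

```python
import struct
import zlib

def make_record(payload: bytes) -> bytes:
    """Build a toy record: 4-byte length, 4-byte CRC-32 of the payload, payload."""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def is_complete(buf: bytes) -> bool:
    """Return True if buf holds a whole record whose CRC matches its payload."""
    if len(buf) < 8:
        return False
    length, crc = struct.unpack_from("<II", buf, 0)
    payload = buf[8:8 + length]
    return len(payload) == length and zlib.crc32(payload) == crc

record = make_record(b"UPDATE table1 SET value = 42")
# Simulate a crash in the middle of the write: the last 4 bytes of the
# record still hold zeros from the freshly initialized buffer.
torn = record[:-4] + bytes(4)
```

Here `is_complete(record)` holds and `is_complete(torn)` does not. A 32-bit checksum can still collide for other corruptions, which is one reading of "not perfect" on the slide.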

Unreachable XLog Record
• The recovery process cannot find an XLR of a committed transaction
– A worker process finishes writing XLR 3 before another worker process begins to write XLR 2
[Figure: the WAL buffer along the LSN axis holding XLR 1, XLR 2, and XLR 3; Length 1 is the length of XLR 1.]
The XLog reader finds the head position of an XLR by adding the length of the previous XLR to the head position of that previous XLR
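
The length-chained lookup, and how a hole breaks it, can be sketched with a toy buffer. The header layout and the helper names `put_record` and `read_records` are assumptions made for illustration; they are not PostgreSQL's XLogReader.

```python
import struct

HEADER = struct.Struct("<I")  # toy header: only a 4-byte total-length field

def put_record(buf: bytearray, pos: int, payload: bytes) -> int:
    """Write one record at pos and return the position right after it."""
    total = HEADER.size + len(payload)
    HEADER.pack_into(buf, pos, total)
    buf[pos + HEADER.size:pos + total] = payload
    return pos + total

def read_records(buf: bytearray) -> list:
    """Walk the buffer from position 0, hopping by each record's length."""
    found, pos = [], 0
    while pos + HEADER.size <= len(buf):
        (total,) = HEADER.unpack_from(buf, pos)
        if total == 0:  # nothing written here yet: the chain ends
            break
        found.append(bytes(buf[pos + HEADER.size:pos + total]))
        pos += total
    return found

wal = bytearray(64)
end1 = put_record(wal, 0, b"XLR1")   # XLR 1 is written
end2 = end1 + HEADER.size + 4        # space reserved for XLR 2, still empty
put_record(wal, end2, b"XLR3")       # XLR 3 is written before XLR 2
before = read_records(wal)           # the reader stops at the hole
put_record(wal, end1, b"XLR2")       # XLR 2 finally arrives
after = read_records(wal)            # now the whole chain is reachable
```

If the system crashed at the point where `before` is taken, XLR 3 would be in the buffer but unreachable, even if its transaction had already been reported as committed.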

CPU cache effect
• The recovery process will read an inconsistent XLR
[Figure: the CPU with its cache, and the memory; the contents of the cache and the memory are inconsistent.]
If the CPU internal cache uses a write-back policy, an XLR written by the CPU does not reach the memory immediately

Implementation
• Prototype architecture
• Preventing partial write
• Preventing an unreachable XLR
• Use of Write-combined mode
• GUC parameter for NVM Logging
• Accessing NVM at recovery
• Wrap around of WAL buffer
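
Prototype architecture
[Figure: in user space, PostgreSQL (9.6devel as of mid-March) uses open and mmap to map the WAL buffer onto pseudo NVM (pram), next to the shared buffer; in kernel space, the file systems (ext4, ext3, xfs, ext2, pm) and a kernel module for the pseudo NVM. Legend: modified, newly added.]
The kernel module is similar to a RAM disk, but its CPU cache mode differs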

Preventing partial write
[Figure: timeline t of writing one XLog record along the LSN axis: reserve (the tail moves from Start to the new tail), write data, write len.]
1. Move the tail pointer to reserve the buffer area
2. Write the XLR data other than the length field
3. Write the length field of the XLR
(Before step 3, the length field of the XLR in the XLog buffer is 0)
If the XLog reader finds an XLR whose length field is not zero, all data of that XLR has been written to the XLog buffer.
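
The three steps above can be sketched as follows. This is a toy model under stated assumptions: `ToyWalBuffer` and its methods are hypothetical names, the record has only a 4-byte length field, and the buffer is zero-filled so that an unwritten length field reads as 0.

```python
import struct

LEN = struct.Struct("<I")  # toy length field at the head of each record

class ToyWalBuffer:
    def __init__(self, size: int):
        self.buf = bytearray(size)  # zero-filled, so every length field starts at 0
        self.tail = 0

    def reserve(self, payload_len: int) -> int:
        """Step 1: move the tail pointer; the length field is still 0."""
        start = self.tail
        self.tail += LEN.size + payload_len
        return start

    def write_data(self, start: int, payload: bytes) -> None:
        """Step 2: write everything except the length field."""
        body = start + LEN.size
        self.buf[body:body + len(payload)] = payload

    def write_length(self, start: int, payload_len: int) -> None:
        """Step 3: publish the record by writing its length last."""
        LEN.pack_into(self.buf, start, LEN.size + payload_len)

    def is_complete(self, start: int) -> bool:
        """Reader-side check: a non-zero length means the data is all there."""
        return LEN.unpack_from(self.buf, start)[0] != 0

wal = ToyWalBuffer(64)
payload = b"commit"
start = wal.reserve(len(payload))
wal.write_data(start, payload)
seen_before_step3 = wal.is_complete(start)  # a crash here leaves length == 0
wal.write_length(start, len(payload))
seen_after_step3 = wal.is_complete(start)
```

In this single-threaded model the order of the stores is simply program order. On real hardware the same guarantee also depends on the stores reaching the NVM in that order, which is where the CPU cache effect comes in.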

Preventing an unreachable XLR
[Figure: timeline t along the LSN axis in which worker processes (WPs) reserve and write XLR 1 to XLR 4 (Reserve 1, Write 1, ..., Reserve 4, Write 4).]
• A WP waits until its commit can be finished
• A WP finishes the commit once all previous XLRs are written

Wait control
• A wait mechanism is already implemented in PostgreSQL
static XLogRecPtr
WaitXLogInsertionsToFinish(XLogRecPtr upto)
If any XLR with an LSN smaller than the upto parameter has not yet been copied into the WAL buffer, the worker process sleeps until all of those XLRs have been copied into the WAL buffer.
NVM Logging implements the wait mechanism by using this function
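
The wait condition can be modeled without threads. The sketch below is not PostgreSQL's implementation of WaitXLogInsertionsToFinish; `finished_upto` and `can_finish_commit` are hypothetical helpers that only express the rule "a commit finishes when every XLR with a smaller LSN has been copied".

```python
def finished_upto(reserved_upto: int, in_progress: list) -> int:
    """Return the LSN below which every reserved record has been copied.

    in_progress holds the start LSNs of records that are reserved
    but not yet fully copied into the WAL buffer.
    """
    return min(in_progress, default=reserved_upto)

def can_finish_commit(commit_end_lsn: int, reserved_upto: int, in_progress: list) -> bool:
    """A commit may finish only when all XLRs up to its own end are copied."""
    return finished_upto(reserved_upto, in_progress) >= commit_end_lsn

# XLR 2 occupies LSN 100-200 and is still being copied;
# XLR 3 occupies LSN 200-300 and has already been copied.
must_wait = not can_finish_commit(300, 300, [100])
# Once XLR 2 is copied, nothing is in progress any more.
may_finish = can_finish_commit(300, 300, [])
```

In the real system the worker process sleeps instead of polling; the model only shows when the sleep ends.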

Use of Write-combined mode
[Figure: the prototype architecture again: PostgreSQL maps the WAL buffer onto pseudo NVM (pram) with open and mmap; the kernel module sits below the file systems (ext4, ext3, xfs, ext2, pm).]
Write-combined mode, which is a variation of write-through, is set for the memory pages of the pseudo NVM.

GUC parameter for NVM Logging
• NVM Logging is enabled through one GUC parameter (described in postgresql.conf)
– PRAM_FILE_NAME = "NVM File name"
• When PRAM_FILE_NAME is set
– XLOGShmemInit() invokes open() and mmap() on "NVM File name" and uses the memory area for the WAL buffer
– CopyXLogRecordToWAL() copies the XLR into the WAL buffer according to the procedure that prevents partial write
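
The open() and mmap() step can be sketched as follows. This is an illustration in Python rather than the prototype's C code: `map_wal_buffer` is a hypothetical helper, and an ordinary temporary file stands in for the file named by PRAM_FILE_NAME.

```python
import mmap
import os
import tempfile

def map_wal_buffer(path: str, size: int) -> mmap.mmap:
    """Open the file at path and map size bytes of it into memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)      # make sure the file covers the mapping
        return mmap.mmap(fd, size)  # shared, read-write mapping by default
    finally:
        os.close(fd)                # the mapping stays valid after close

# Hypothetical stand-in for the file named by PRAM_FILE_NAME.
path = os.path.join(tempfile.mkdtemp(), "pram_wal")
wal = map_wal_buffer(path, 4096)
wal[0:4] = b"XLR1"  # a plain memory store, with no write() call
wal.flush()         # push the mapped pages to the backing file
wal.close()

with open(path, "rb") as f:
    persisted = f.read(4)
```

The point of the design is that, once the area is mapped, writing an XLR is an ordinary memory copy rather than an I/O operation.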

Accessing NVM at recovery
[Figure: the LSN axis divided into WAL segments numbered n-2 to n+3, with WAL files and the XLog buffer in NVM; XLogReader reads the range from minLSN to maxLSN from the XLog buffer in NVM.]
XLogReader accesses the NVM XLog buffer to obtain XLog records whose LSN is between minLSN and maxLSN

Wrap around of WAL buffer
[Figure, logical view: the LSN axis with a window of NVM size; labeled regions: saved in XLog files, written in NVM only; labeled positions: EvaluatedUpto, current write point, InitializedUpto.]
[Figure, physical view: the NVM area (NVM size), divided into WAL segments and reused in rounds ((n-1)th round, nth round), with the same positions: EvaluatedUpto, current write point, InitializedUpto.]
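
The relation between the logical and the physical view can be sketched as simple modular arithmetic. This is an assumption about how a circular buffer is usually addressed, not a description of the prototype's code; `physical_position`, `can_overwrite`, and the `saved_upto` parameter are hypothetical and are not meant to match the slide's InitializedUpto or EvaluatedUpto pointers.

```python
def physical_position(lsn: int, nvm_size: int) -> tuple:
    """Map a logical LSN to (round, offset) in a circular NVM buffer."""
    return divmod(lsn, nvm_size)

def can_overwrite(write_lsn: int, saved_upto: int, nvm_size: int) -> bool:
    """Writing at write_lsn reuses the slot that held write_lsn - nvm_size.

    That older byte may be overwritten only if it has already been
    saved in the XLog files, i.e. it lies below saved_upto.
    """
    return write_lsn - nvm_size < saved_upto

nvm_size = 1024
first_round = physical_position(1000, nvm_size)   # still in round 0
second_round = physical_position(1030, nvm_size)  # wrapped into round 1
```

With `saved_upto = 512`, a write at LSN 1030 reuses offset 6, which is already saved, while a write at LSN 1600 would reuse the slot of LSN 576, which exists in NVM only and must not be overwritten yet.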

Evaluation
• Experimental Setup
• Performance
– PGBENCH
– DBT-2
• Durability
– Durability test
– Result
• Write amplification Reduction

Experimental setup
• DB server
– CPU: E5-2650 v2 x 2 (16 cores)
– Memory: 64GB
– Storage
  • RAID0: 200GB SSD x 2 for data
  • RAID0: 1TB ATA HD x 4 for WAL
• Client
– CPU: E7420 x 4 (16 cores)
– Memory: 8GB
• Network
– Gigabit Ethernet x 1

PGBENCH Performance
[Figure: throughput (TPS, 0 to 50000) versus number of clients (0 to 160) for Async, NVM Logging, and Sync; two panels: disk-drive cache off and disk-drive cache on.]

DBT-2 Performance
[Figure: throughput (NOTPM, 0 to 400000) versus number of clients (0 to 50) for Async, NVM Logging, and Sync; two panels: disk-drive cache off and disk-drive cache on.]

Write amplification Reduction
[Figure: writes to a page of the storage under each method.]
• Sync. commit: the tail WAL block is written at every commit
• NVM Logging: the tail WAL block is written once
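
A back-of-the-envelope count shows the size of the effect. The sketch below assumes one small XLR per commit, no record crossing a block boundary, and PostgreSQL's default 8 kB WAL block size; `tail_block_writes` is a hypothetical helper and the numbers are illustrative, not measurements from the talk.

```python
def tail_block_writes(commits: int, record_size: int, block_size: int = 8192) -> tuple:
    """Count storage writes of WAL blocks under two policies.

    Returns (sync_writes, once_writes):
    sync_writes - the tail block is written at every commit
    once_writes - each block is written to storage once
    """
    total_bytes = commits * record_size
    blocks_touched = -(-total_bytes // block_size)  # ceiling division
    sync_writes = commits          # one tail-block write per commit
    once_writes = blocks_touched   # one write per block
    return sync_writes, once_writes

sync_writes, once_writes = tail_block_writes(commits=1000, record_size=128)
```

For 1000 commits of 128 bytes each, this gives 1000 block writes with sync. commit and 16 when each block is written once.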

Durability test
[Figure: each client p runs transactions against table1 and table2 (columns: key, value; key = Kp) in the DBMS; a fault occurs while the transactions are running.]
while (1) {
  Begin
  Select value From table1 Where key = Kp
  (v = value + 1)
  Update table1 set value = v Where key = Kp
  (… same for table 2)
  End
  (record v as committed transaction)
}
After the recovery, durability is examined by checking whether the values in table1 and table2 are equal and whether the value is equal to or greater than the v that each client recorded as the result of its last transaction.
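
The post-recovery check can be written down as a small function. This is a sketch under the assumption that the tables are available as key-to-value dictionaries after recovery; `durability_ok` and its argument names are hypothetical.

```python
def durability_ok(table1: dict, table2: dict, recorded: dict) -> bool:
    """Check the durability condition after recovery.

    table1, table2 - key -> value as read back after recovery
    recorded       - key Kp -> last v that the client recorded as committed
    """
    for key, v in recorded.items():
        value1, value2 = table1.get(key), table2.get(key)
        if value1 is None or value1 != value2:
            return False  # the two tables disagree: a transaction was torn
        if value1 < v:
            return False  # a transaction reported as committed was lost
    return True
```

A recovered value greater than the recorded v is acceptable: the transaction may have become durable just before the fault, before the client could record it.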

Results
• The results were just what we expected
– Durability is ensured
  • Sync. commit, NVM Logging
– Durability is not ensured
  • Async. commit

Technical trends
• NVDIMM for DB servers
• Programming support for NVM

NVDIMM in DB Servers
• NVDIMM-N Standardization
– JEDEC Hybrid Memory Task Group
– SNIA NVDIMM SIG
• Server product: HP ProLiant XL230a Server
– Up to 2 Intel® Xeon® E5-2600 v3 Series, 6/8/10/12/14/16 Cores (16)
– DDR4, (512GB max), support for NVDIMM
– …
http://community.hpe.com/t5/Servers-The-Right-Compute/Address-your-Compute-needs-with-HP-ProLiant-Gen9/ba-p/6794213#.VybkulK3GA9
DB server with NVDIMM is just around the corner!!

Programming support for NVM
• pmem.io
– The Linux NVM Library builds on the Direct Access (DAX) changes under development in Linux.
– This project focuses specifically on how persistent memory is exposed to server-class applications which will explicitly manage the placement of data among the three tiers (volatile memory, persistent memory, and storage).
http://pmem.io/

Conclusion
• NVM is becoming a commodity
– NVDIMM is already shipped as a product
– Servers have begun to be equipped with NVDIMM
• Benefits of NVM Logging
– Performance improvement: almost the same as async. commit
– Durability assurance: similar to sync. commit
– Write amplification reduction: good for SSD lifetime

Future work
• Bring to a state acceptable for the mainline
– Cope with a standard for NVM access
  • libpmem is a promising candidate
– Check the operation using real NVM

That’s it
Thank you for listening