The Linux Virtual Memory System
Patrick Ladd
Technical Account [email protected] / [email protected]
NY Red Hat Users Group
June 8, 2016
Slides are available at http://people.redhat.com/pladd/
Topics
● Evolution of the Linux Memory System
● Evolution of system memory architectures
● Latest innovations in Linux Memory System
  ● NUMA
  ● Hugepages
Virtual Memory
Virtual Memory System (simplified)
Virtual pages
● Cost "nothing"
● Basically unlimited on 64-bit architectures
Physical pages
● RAM of the system
● Cost money!
Page tables
● Virtual → physical mapping
PageTables
[Diagram: four-level page table tree, pgd → pud → pmd → pte]
Data Structure
● Common code and x86 formats are the same
● All 4K in size
● 2^9 (512) pointers per table
● grep PageTables /proc/meminfo
Accessible Memory
● Total: (2^9)^4 * 4096 bytes = 2^48 bytes → 48 bits
● 281.5 TB (256 TiB)
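The accessible-memory figure follows directly from the table geometry listed on this slide. The sketch below redoes that arithmetic; the constant and function names are illustrative and do not come from the kernel source.

```python
# Sketch of the arithmetic on the PageTables slide (x86-64, four levels).
PAGE_SIZE = 4096            # bytes per page and per page table ("All 4K in size")
ENTRIES_PER_TABLE = 2 ** 9  # 512 pointers per table
LEVELS = 4                  # pgd -> pud -> pmd -> pte


def addressable_bytes(levels=LEVELS, entries=ENTRIES_PER_TABLE, page_size=PAGE_SIZE):
    """Bytes reachable through a fully populated page table tree."""
    return entries ** levels * page_size


total = addressable_bytes()
address_bits = total.bit_length() - 1  # exact because total is a power of two

print(f"{total} bytes = 2^{address_bits}")
print(f"{total / 10**12:.1f} TB (decimal), {total // 2**40} TiB (binary)")
```

Each level contributes 9 address bits and the page offset contributes 12, so adding a level multiplies the reachable space by 512.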
Virtual Memory Fabric
Data structures for connecting
Hardware-constrained structures
● pagetables
to
Software abstractions
● Tasks
● Processes
● Virtual memory areas (VMAs)
● mmap (glibc malloc())
Physical Page and struct page
● Each PAGE_SIZE (4K) page of physical RAM is described by one struct page (64 bytes)
● The struct page entries together form mem_map
● Overhead: 64 / 4096 (1.56% of RAM)
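To make the 1.56% figure concrete, the sketch below computes the mem_map size for a given amount of RAM, using the 4K page and 64-byte struct page sizes from the slide. The 16 GiB machine is a hypothetical example, and real kernels may use a different struct page size depending on configuration.

```python
PAGE_SIZE = 4096       # bytes, from the slide
STRUCT_PAGE_SIZE = 64  # bytes, from the slide


def mem_map_overhead(ram_bytes):
    """Return (number of struct page entries, bytes used by mem_map)."""
    pages = ram_bytes // PAGE_SIZE
    return pages, pages * STRUCT_PAGE_SIZE


ram = 16 * 2 ** 30  # hypothetical 16 GiB machine
pages, overhead = mem_map_overhead(ram)
print(f"{pages} pages, mem_map uses {overhead // 2**20} MiB "
      f"({100 * overhead / ram:.2f}% of RAM)")
```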
Memory Management Algorithms
Memory System Heuristics
● Algorithms doing computations on memory fabric structures
● Need to solve hard problems with no guaranteed perfect solution
  ● When is the right time to unmap pages? (swappiness)
  ● Which page should I page out?
● Some design is unchanged
  ● Measurement of how hard it is to free memory
  ● Free memory used as cache
  ● Overcommit by default
Page reclaim clock algorithm
Early kernels (~2.2), early '90s
Small system RAM sizes
[Diagram: the page reclaim clock algorithm sweeps over mem_map, page[0] … page[n]]
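The slide names the clock algorithm without spelling it out. As a rough illustration, here is a textbook second-chance clock sweeping over a mem_map-like array. This is an assumption about the general technique, not the code of any particular kernel release; the `Page` and `ClockReclaimer` names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    referenced: bool = False  # set when the page is touched
    in_use: bool = True


class ClockReclaimer:
    """Textbook second-chance clock over a mem_map-like list (not kernel code)."""

    def __init__(self, mem_map):
        self.mem_map = mem_map
        self.hand = 0

    def reclaim_one(self):
        """Advance the hand until an in-use page with a clear referenced bit is found."""
        n = len(self.mem_map)
        for _ in range(2 * n):  # two sweeps suffice: the first clears every bit
            index = self.hand
            page = self.mem_map[index]
            self.hand = (self.hand + 1) % n
            if not page.in_use:
                continue
            if page.referenced:
                page.referenced = False  # give the page a second chance
            else:
                page.in_use = False      # reclaim it
                return index
        return None  # nothing left to reclaim


mem_map = [Page(referenced=True), Page(referenced=False), Page(referenced=True)]
clock = ClockReclaimer(mem_map)
victim = clock.reclaim_one()
print(victim)
```

The cost of each sweep grows with the number of physical pages, which is one reason this approach suited small RAM sizes.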
pgtable scan clock algorithm
Kernel 2.2, mid '90s
[Diagram: mem_map (page[0], page[1], … page[n]); several "process virtual memory" boxes, each with its pgtables; the page reclaim clock algorithm runs over mem_map and the pgtable scan clock algorithm runs over the pgtables]
Least Recently Used List
Kernel 2.2, late '90s
[Diagram: mem_map (page[0], page[1], … page[n]); several "process virtual memory" boxes, each with its pgtables; a page_lru list]
Active and Inactive List LRU
Kernel 2.4 - 2001
The active page LRU preserves the active memory working set
● Only the inactive LRU loses information as fast as use-once I/O goes
● Works well enough also with an arbitrary balance
● Active/inactive list optimum balancing algorithm was solved in 2012-2014
● Shadow radix tree nodes that detect re-faults (more patches last month)
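The behaviour described in these bullets can be illustrated with a toy two-list LRU. This is a simplified sketch under the assumption that a page is promoted on its second access; it omits the list balancing and refault detection that the slide mentions, and the class name is hypothetical.

```python
from collections import OrderedDict


class TwoListLRU:
    """Simplified active/inactive page lists (illustrative, not kernel code)."""

    def __init__(self):
        self.active = OrderedDict()    # oldest entry first
        self.inactive = OrderedDict()

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)   # still part of the working set
        elif page in self.inactive:
            del self.inactive[page]         # second touch: promote
            self.active[page] = True
        else:
            self.inactive[page] = True      # first touch: on probation

    def reclaim(self):
        """Evict the oldest inactive page; demote from active only if needed."""
        if not self.inactive and self.active:
            page, _ = self.active.popitem(last=False)
            self.inactive[page] = True
        if self.inactive:
            page, _ = self.inactive.popitem(last=False)
            return page
        return None


lru = TwoListLRU()
for page in ["a", "a", "b", "b"]:   # working set, touched twice
    lru.access(page)
for page in ["s1", "s2", "s3"]:     # use-once streaming I/O
    lru.access(page)

evicted = [lru.reclaim() for _ in range(3)]
print(evicted, list(lru.active))
```

In this run the streaming pages pass through the inactive list and are evicted, while the twice-touched working set stays on the active list.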
Active & Inactive LRU Lists
Kernel 2.4, early 2000's
[Diagram: mem_map (page[0], page[1], … page[n]); several "process virtual memory" boxes, each with its pgtables; active_lru (use-many pages) and inactive_lru (use-once pages to trash)]
$ grep -i active /proc/meminfo
Active:         11192976 kB
Inactive:        2643936 kB
Active(anon):   10402692 kB
Inactive(anon):  2058248 kB
Active(file):     790284 kB
Inactive(file):   585688 kB
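In the output above, the Active and Inactive totals are the sums of their anon and file parts. The sketch below parses the same six lines (embedded as a string rather than read from a live /proc/meminfo) and checks that relationship; the `parse_meminfo` helper is written for this example.

```python
SAMPLE = """\
Active:         11192976 kB
Inactive:        2643936 kB
Active(anon):   10402692 kB
Inactive(anon):  2058248 kB
Active(file):     790284 kB
Inactive(file):   585688 kB
"""


def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integer values."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name.strip()] = int(rest.split()[0])
    return fields


info = parse_meminfo(SAMPLE)
for kind in ("Active", "Inactive"):
    parts = info[f"{kind}(anon)"] + info[f"{kind}(file)"]
    print(kind, info[kind], "kB; anon + file =", parts, "kB")
```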
rmap Obsoletes pgtable Scan Clock Algorithm
To free a candidate page, we must first drop all references to it (mark the page table entry non-present)
MM & VMA
● mm_struct aka MM
  ● Memory of a process
  ● Shared by all threads
● vm_area_struct aka VMA
  ● Virtual memory area
  ● Created and torn down by mmap & munmap
  ● Defines the virtual address space of a MM
rmap Obsoletes pgtable Scan Clock Algorithm
rmap – reverse mapping
● Allows direct connection to pagetables from any given physical page without scanning
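A toy model may help show why a reverse mapping removes the need to scan. In the sketch below, each physical page number keeps the set of page table entries that point at it, so dropping all references touches only those entries. The dictionary layout and the `ReverseMap` name are assumptions for illustration; the kernel reaches the mappings through anon_vma and inode structures instead.

```python
from collections import defaultdict


class ReverseMap:
    """Toy rmap: physical page -> the page table entries that map it."""

    def __init__(self):
        self.page_tables = {}         # (mm, vaddr) -> physical page number
        self.rmap = defaultdict(set)  # physical page number -> {(mm, vaddr)}

    def map_page(self, mm, vaddr, pfn):
        self.page_tables[(mm, vaddr)] = pfn
        self.rmap[pfn].add((mm, vaddr))

    def unmap_all(self, pfn):
        """Drop every reference to pfn without scanning all page tables."""
        mappings = self.rmap.pop(pfn, set())
        for key in mappings:
            del self.page_tables[key]  # stands in for marking the pte non-present
        return len(mappings)


vm = ReverseMap()
vm.map_page("mm_A", 0x1000, pfn=42)
vm.map_page("mm_B", 0x7000, pfn=42)  # same physical page shared by two processes
vm.map_page("mm_A", 0x2000, pfn=43)

dropped = vm.unmap_all(42)
print(dropped, vm.page_tables)
```

The work done by `unmap_all` is proportional to the number of mappings of that one page, not to the total size of all page tables.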
rmap after unmap event
If the user space program accesses the page, it will trigger a pagein/swapin
[Diagram: a free page; each of the two former mappings is labelled "Page fault swapin"]
objrmap / anon-vma / ksm
[Diagram: physical pages (anonymous, filesystem, KSM) and the structures that lead from them to vmas: anon_vma, inode (objrmap), prio_tree]
Active & Inactive + rmap
Kernel 2.6, late 2000's
[Diagram: mem_map (page[0], page[1], … page[n]); several "process virtual memory" boxes, each with its pgtables; active_lru (use-many pages) and inactive_lru (use-once pages to trash)]
/proc/meminfo
MemTotal:          16054316 kB
MemFree:            1390476 kB
MemAvailable:       2946740 kB
Buffers:             112292 kB
Cached:             2652012 kB
SwapCached:           39876 kB
Active:            11228856 kB
Inactive:           2087104 kB
Active(anon):      10440148 kB
Inactive(anon):     1710336 kB
Active(file):        788708 kB
Inactive(file):      376768 kB
Unevictable:           3188 kB
Mlocked:               3188 kB
SwapTotal:          8069116 kB
SwapFree:           6731456 kB
Dirty:                23076 kB
Writeback:                0 kB
AnonPages:         10531332 kB
Mapped:              844892 kB
Shmem:              1599248 kB
Slab:                862968 kB
SReclaimable:        644192 kB
SUnreclaim:          218776 kB
KernelStack:          23776 kB
PageTables:          147796 kB
NFS_Unstable:             0 kB
Bounce:                   0 kB
WritebackTmp:             0 kB
CommitLimit:       16096272 kB
Committed_AS:      32734512 kB
VmallocTotal:   34359738367 kB
VmallocUsed:         763212 kB
VmallocChunk:   34358783056 kB
HardwareCorrupted:        0 kB
AnonHugePages:      1280000 kB
HugePages_Total:          0
HugePages_Free:           0
HugePages_Rsvd:           0
HugePages_Surp:           0
Hugepagesize:          2048 kB
DirectMap4k:         305040 kB
DirectMap2M:       15040512 kB
DirectMap1G:        1048576 kB
More recent changes: Many More LRUs
● Separated LRU for anon and file backed mappings
● memcg (memory cgroups) introduced per-memcg LRUs
● Removal of un-freeable pages from LRUs
  ● anonymous memory with no swap
  ● mlocked memory
● Transparent Hugepages in the LRU increase scalability further (lru size decreased 512 times)
CPU Memory Architectures
Older Architectures
Direct memory bus
[Diagram: CPU, Memory and I/O attached to a single Main Bus]
Older Architectures
Northbridge / Southbridge memory bus
[Diagram: CPU, FSB, Northbridge, Memory, HS I/O (graphics / PCI-E), Southbridge, I/O]
NUMA Architecture
Multi-core Multi-Bus Architecture
[Diagram: four CPUs, each with its own Memory and I/O]
NUMA – Non-Uniform Memory Architecture
● Multiple Nodes in a NUMA System
● Each Node
  ● CPU
  ● Memory
  ● PCI/Devices
● All Nodes are interconnected
● Interconnects are slow
● Why NUMA?
Linux and NUMA
NUMA memory policies
{
	MPOL_DEFAULT,    /* no numactl */
	MPOL_PREFERRED,  /* --preferred=node */
	MPOL_BIND,       /* default numactl */
	MPOL_INTERLEAVE, /* --interleave=nodes */
	MPOL_LOCAL,      /* --localalloc */
}
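As a rough intuition for how these policies differ, the sketch below picks the node a page would be allocated from under each one. It is a heavily simplified model under stated assumptions: fallback goes in numeric node order rather than by distance, the `free_pages` bookkeeping is invented for the example, and `pick_node` is a hypothetical helper, not a kernel or libnuma function.

```python
from enum import Enum, auto


class Policy(Enum):
    DEFAULT = auto()
    PREFERRED = auto()
    BIND = auto()
    INTERLEAVE = auto()
    LOCAL = auto()


def pick_node(policy, cpu_node, free_pages, nodes=(), page_index=0):
    """Return the node a page would come from, or None if the policy cannot be met."""
    def first_with_memory(candidates):
        return next((n for n in candidates if free_pages.get(n, 0) > 0), None)

    everything = sorted(free_pages)
    if policy in (Policy.DEFAULT, Policy.LOCAL):
        return first_with_memory([cpu_node] + everything)  # local node first
    if policy is Policy.PREFERRED:
        return first_with_memory([nodes[0]] + everything)  # preferred, then fall back
    if policy is Policy.BIND:
        return first_with_memory(nodes)                    # never leaves the bound set
    if policy is Policy.INTERLEAVE:
        return nodes[page_index % len(nodes)]              # round-robin by page
    raise ValueError(policy)


free = {0: 0, 1: 100}  # hypothetical state: node 0 is full
print(pick_node(Policy.DEFAULT, 0, free))
print(pick_node(Policy.BIND, 0, free, nodes=[0]))
print([pick_node(Policy.INTERLEAVE, 0, free, nodes=[0, 1], page_index=i)
       for i in range(4)])
```

The contrast between the first two calls is the practical point: an unbound allocation can spill to a remote node, while a bound one cannot.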
Virt startup on CPU #0
MPOL_DEFAULT
[Diagram: Node #0 (RAM #0, CPU #0) and Node #1 (RAM #1, CPU #1), with KVM]
Virt Mach allocates from RAM #0
MPOL_DEFAULT, no bindings
[Diagram: Node #0 (RAM #0, CPU #0) and Node #1 (RAM #1, CPU #1), with KVM and Guest RAM]
Scheduler CPU migration to #1
MPOL_DEFAULT, no bindings
[Diagram: Node #0 (RAM #0, CPU #0) and Node #1 (RAM #1, CPU #1), with KVM, Guest RAM and two gcc processes]
Load goes away
Linux scheduler is blind – KVM will stay on CPU #1 with slow memory access
[Diagram: Node #0 (RAM #0, CPU #0) and Node #1 (RAM #1, CPU #1), with KVM and Guest RAM]
Hard NUMA bindings
● man numactl
● man numastat
● Kernel API
● sys_mempolicy
● sys_mbind
● sys_sched_setaffinity
● sys_move_pages
● /dev/cpuset
● Full topology available in /sys
Numad can use the kernel API to monitor memory pressure and act accordingly
No NUMA affinity
# numastat -c qemu-kvm
Per-node process memory usage (in Mbs)
PID               Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
----------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
51722 (qemu-kvm)      68     16    357   6936      2      3    147    598  8128
51747 (qemu-kvm)     245     11      5     18   5172   2532      1     92  8076
53736 (qemu-kvm)      62    432   1661    506   4851    136     22    445  8116
53773 (qemu-kvm)    1393      3      1      2     12      0      0   6702  8114
----------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
Total               1769    463   2024   7462  10037   2672    169   7837 32434
NUMA affinity
# numastat -c qemu-kvm
Per-node process memory usage (in Mbs)
PID               Node 0 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Total
----------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
51722 (qemu-kvm)       0      0      7      0   8072      0      1      0  8080
51747 (qemu-kvm)       0      0      7      0      0      0   8113      0  8120
53736 (qemu-kvm)       0      0      7      0      0      0      1   8110  8118
53773 (qemu-kvm)       0      0   8050      0      0      0      0      0  8051
----------------  ------ ------ ------ ------ ------ ------ ------ ------ -----
Total                  0      0   8072      0   8072      0   8114   8110 32368
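One way to read the two numastat outputs is to ask what share of each qemu-kvm process sits on its busiest node. The sketch below does that with the per-node figures copied from the two slides; the `locality` helper and the dictionary layout are choices made for this example.

```python
# Per-node memory (Node 0 .. Node 7) for each PID, copied from the two outputs.
NO_AFFINITY = {
    51722: [68, 16, 357, 6936, 2, 3, 147, 598],
    51747: [245, 11, 5, 18, 5172, 2532, 1, 92],
    53736: [62, 432, 1661, 506, 4851, 136, 22, 445],
    53773: [1393, 3, 1, 2, 12, 0, 0, 6702],
}
AFFINITY = {
    51722: [0, 0, 7, 0, 8072, 0, 1, 0],
    51747: [0, 0, 7, 0, 0, 0, 8113, 0],
    53736: [0, 0, 7, 0, 0, 0, 1, 8110],
    53773: [0, 0, 8050, 0, 0, 0, 0, 0],
}


def locality(per_node):
    """Return (busiest node, share of the process memory on that node)."""
    total = sum(per_node)
    node = max(range(len(per_node)), key=per_node.__getitem__)
    return node, per_node[node] / total


for label, table in (("no affinity", NO_AFFINITY), ("affinity", AFFINITY)):
    for pid, row in table.items():
        node, share = locality(row)
        print(f"{label:12} pid {pid}: node {node} holds {share:.0%}")
```

With affinity, essentially all of each guest's memory sits on a single node; without it, a substantial part of every guest is spread over other nodes.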
Automatic NUMA balancing
Introduces "gravity" between CPU and memory
● Memory attracts CPU AGGRESSIVELY
  ● If idle load balancing permits
● CPU attracts memory slowly
  ● If anti-false sharing permits
[Diagram: two CPU boxes and two Memory boxes]
NUMA example
Startup state
[Diagram: RAM #0 and RAM #1, each holding RAM regions, with Thread 1 and Thread 2 shown twice]
NUMA example
Converged state
[Diagram: RAM #0 and RAM #1, each holding RAM regions, with Thread 1 and Thread 2 shown twice]
NUMA threading example
Startup state
[Diagram: RAM #0 and RAM #1; one RAM region; Thread 1, Thread 2, Thread 3, Thread 4]
NUMA threading example
Converged state
[Diagram: RAM #0 and RAM #1; two RAM regions, one with Thread 1 and Thread 2, the other with Thread 3 and Thread 4]
Automatic NUMA balancing benchmark
● Intel SandyBridge (Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz)
  ● 2 Sockets – 32 Cores with Hyperthreads
  ● 256G Memory
● Software:
  ● RHEV 3.6
  ● Host bare metal – 3.10.0-327.el7 (RHEL7.2)
  ● VM guest – 3.10.0-324.el7 (RHEL7.2)
  ● VM – 32P, 160G (Optimized for Server)
  ● Oracle – 12C, 128G SGA
● Storage – Violin 6616 – 16G Fibre Channel
● Test – Running Oracle OLTP workload with increasing user count and measuring Trans / min for each run as a metric for comparison
4 VMs with different NUMA options
Automatic NUMA Balancing Configuration
● In RHEL7 Automatic NUMA balancing is enabled when:
# numactl --hardware shows multiple nodes
● To disable automatic NUMA balancing:
# echo 0 > /proc/sys/kernel/numa_balancing
● To enable automatic NUMA balancing:
# echo 1 > /proc/sys/kernel/numa_balancing
● At boot:
numa_balancing=enable|disable
HugePages
● Traditionally x86 hardware gave us 4KiB pages
● The more memory the bigger the overhead in managing 4KiB pages
● What if you had bigger pages?
● 512 times bigger → 2MiB
Why HugePages?
● Improve CPU performance
● Enlarge TLB size
● Speed up TLB miss
● Need 3 accesses to memory instead of 4 to refill the TLB
● Faster to allocate memory initially (minor)
● Page colouring inside the hugepage (minor)
● Higher scalability of the page LRUs
● Cons
● clear_page/copy_page less cache friendly
● higher memory footprint sometimes
● Direct compaction takes time
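Two of the benefits listed above are simple arithmetic: how much address space a fixed number of TLB entries covers, and how many memory accesses a TLB refill needs. The sketch below assumes x86-64 with four-level page tables, where a 2 MiB mapping ends at the pmd level; the 64-entry TLB is a hypothetical size chosen only for illustration.

```python
PAGE_4K = 4 * 2 ** 10
PAGE_2M = 2 * 2 ** 20


def walk_accesses(page_size):
    """Memory accesses to refill the TLB on x86-64 with four-level page tables."""
    # A 2 MiB mapping ends at the pmd level, so the pte lookup is skipped.
    return 3 if page_size == PAGE_2M else 4


def tlb_reach(entries, page_size):
    """Bytes of address space covered by a TLB with the given number of entries."""
    return entries * page_size


TLB_ENTRIES = 64  # hypothetical TLB size, for illustration only
for size in (PAGE_4K, PAGE_2M):
    reach = tlb_reach(TLB_ENTRIES, size)
    print(f"{size // 1024:>5} KiB pages: reach {reach // 1024} KiB, "
          f"{walk_accesses(size)} accesses per miss")

print(PAGE_2M // PAGE_4K)  # ratio between the two page sizes
```

The same factor of 512 is why tracking memory as hugepages shrinks the page LRUs by that amount.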
TLB Miss cost: # of memory accesses
Conclusion & Questions
Recent Trends
● Large memory usage processes → HugePages (4KB → 2MB)
● Programs or Virtual Machines duplicating memory → KSM
● Optimization of workloads for you, without manual tuning
  ● Automatic NUMA balancing
  ● Transparent HugePages
● Page pinning → MMU notifier
● Direct managed private device memory (i.e. GPU) → UVM (unified virtual memory)
● 4th Layer of pagetables
● pagetables in high memory region
THANK YOU