06/21/11 UPDATES Isa Ansharullah

Upload: isa-ansharullah

Post on 11-May-2015


Page 1: Updates

06/21/11

UPDATES

Isa Ansharullah

Page 2: Updates

PROCESS in LINUX

• An executing program
• Described by a task_struct structure, which is stored in a circular doubly linked list (the task list)
  – Also called a process descriptor or Process Control Block (PCB)
• Allocated via the slab allocator

Linux Kernel Development, Robert Love p.25
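A minimal user-space sketch of a circular doubly linked task list may help picture the structure described on this slide. The names struct task, task_list_init, task_list_add_tail and task_list_count are hypothetical; the kernel itself embeds a generic struct list_head in task_struct rather than using fields like these.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for task_struct. */
struct task {
    int pid;
    struct task *next;   /* next task in the ring */
    struct task *prev;   /* previous task in the ring */
};

/* A list holding only the head: it points at itself in both directions. */
void task_list_init(struct task *head)
{
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/* Link t in just before head, i.e. at the tail of the ring. */
void task_list_add_tail(struct task *head, struct task *t)
{
    assert(head != NULL && t != NULL);
    t->prev = head->prev;
    t->next = head;
    head->prev->next = t;
    head->prev = t;
}

/* Walk the ring once, starting after head, and count the other tasks. */
int task_list_count(const struct task *head)
{
    int n = 0;
    for (const struct task *t = head->next; t != head; t = t->next)
        n++;
    return n;
}
```

Because the list is circular, a walk can start from any task and stops when it returns to its starting point, so no NULL terminator is needed.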

Page 3: Updates

<include/linux/sched.h> struct task_struct

• About 1.6 to 2 KB in size on x86 (varies per architecture)
• Contains:
  – PID (pid_t pid)
  – Open files
  – Process’ address space (struct mm_struct *mm)
  – Process’ state (waiting, running, ready)
  – Process’ stack address (void *stack)
  – Others (parent’s PID, ...)

Page 4: Updates

PROCESS’ STACKS

• Each process has two kinds of stack:
  – User-space stack: can grow
  – Kernel stack: FIXED size
• Mode switch: the switch from the user stack to the kernel stack when a system call or exception handler runs
• Context switch: suspending one process and switching to another, which happens in kernel mode

Page 5: Updates

Process’ Kernel Stack (1)

• It is stored in the kernel area of physical memory: physically contiguous and non-swappable
  – It is kept as small as possible and fixed in size, to prevent fragmentation and the hazards of expanding it

Page 6: Updates

Process’ Kernel Stack (2)

Professional Linux Kernel Architecture, Wolfgang Mauerer, p.71

task_struct is referenced through the thread_info structure at the bottom (lowest address) of the kernel stack, which gives fast access to the current task
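The fast access comes from plain address arithmetic: if the kernel stack is aligned to its own size, masking the low bits of any address inside the stack gives the stack's lowest address, where thread_info lives in this layout. The sketch below assumes an 8 KB stack (THREAD_SIZE is set by hand here) and uses a hypothetical helper name, thread_info_base; it runs in user space and only demonstrates the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed kernel stack size: 8 KB, aligned to its own size. */
#define THREAD_SIZE 8192u

/*
 * Given any address inside a kernel stack, return the stack's lowest
 * address, which is where thread_info sits in this layout.
 */
uintptr_t thread_info_base(uintptr_t sp)
{
    /* The masking trick only works for power-of-two stack sizes. */
    assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);
    return sp & ~((uintptr_t)THREAD_SIZE - 1);
}
```

For example, a stack pointer of 0xc0102f40 maps to the base 0xc0102000, with no memory lookup needed.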

Page 7: Updates

PROCESS DUPLICATION

• There are actually 3 approaches:
  – fork(): heavy-weight call (duplicates the parent)
    • Made cheaper by copy-on-write
  – vfork(): light-weight call (shares the parent’s resources)
    • Since fork() implements COW, this has little advantage
  – clone(): lets the caller choose which resources to share
• fork() in Linux is implemented via clone()
  – clone() takes flags that specify which resources are shared

Page 8: Updates

Copy-on-write

• Usually, after forking, the child calls exec(), which replaces the resources copied from the parent
  – Copying them up front is therefore wasted work
• Copy-on-write: parent and child share the data, and a private copy is made only when the shared data is written to (by either parent or child)
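Copy-on-write itself is invisible to user space, but its guarantee can be observed: a write in the child never changes what the parent sees. The following is a small sketch of that behaviour using fork() and waitpid(); the function name parent_value_after_child_write is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that overwrites `value`, wait for it, and return the value
 * the parent still sees. Returns -1 if fork() or waitpid() fails.
 */
int parent_value_after_child_write(void)
{
    volatile int value = 42;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the write lands on the child's own copy of the page. */
        value = 99;
        _exit(value == 99 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);

    /* Parent: its view of the variable is unchanged. */
    return value;
}
```

The parent gets 42 back even though the child stored 99, because the page holding the variable was duplicated at the moment of the child's write.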

Page 9: Updates

FORKING PROCESS

fork() -> clone() -> do_fork() -> copy_process()

• clone(): takes several flags about resource sharing, etc.
• do_fork(): defined in <kernel/fork.c>, architecture-independent
• copy_process(): does the actual work of duplicating the process
• If copy_process() returns a new child successfully, the child is woken up and run. In the common case the child calls exec() immediately, so thanks to copy-on-write there is no copying overhead.

Page 10: Updates

<kernel/fork.c> copy_process()

• Creates a new kernel stack, task_struct and thread_info structure, similar to its parent’s
• Copies or shares resources

Professional Linux Kernel Architecture, p.73
Professional Linux Kernel Architecture, p.68

Page 11: Updates

PROCESS vs THREAD

• In Linux a THREAD is treated as a PROCESS
  – THREAD = PROCESS
• THREAD
  – A process that shares resources with its parent

clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0)
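The effect of these flags can be tried from user space with the glibc clone() wrapper, which has a different signature from the kernel-internal call shown on the slide. This is a Linux-only sketch under several assumptions: the names child_fn, run_thread_like_child and CHILD_STACK_SIZE are hypothetical, the 64 KB child stack is an arbitrary choice, and SIGCHLD is added to the flags so that the parent can use an ordinary waitpid().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

static volatile int shared_counter;

/* Runs in the child; with CLONE_VM it writes to the parent's memory. */
static int child_fn(void *arg)
{
    (void)arg;
    shared_counter = 7;
    return 0;
}

/*
 * Start a thread-like child with the slide's flags and return the value of
 * shared_counter the parent sees afterwards, or -1 on failure.
 */
int run_thread_like_child(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    shared_counter = 0;

    /* The stack grows downwards on x86, so pass the top of the buffer. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                      NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid)
        return -1;
    assert(WIFEXITED(status));

    return shared_counter;
}
```

Unlike a fork()ed child, this child shares the address space, so the parent observes the child's write to shared_counter.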

Page 12: Updates

Questions (06/13/11)

• How is the user-space stack allocated?
  – It is set up during fork; see dup_mmap() in kernel/fork.c (line 288)
• What are namespaces?
  – Still unclear; the description is quite complex. Some pointers:
    • Professional Linux Kernel Architecture, p.47
    • <linux_source>/Documentation/unshare.txt

Page 13: Updates

Allocating Process Descriptors (task_struct)

• Where are process descriptors stored?
  – Inside the kernel’s address space, in the task list, with init_task (the initial task’s process descriptor) set up at boot
• How are task_struct structures allocated?
  – By the SLAB ALLOCATOR

Understanding the Linux Kernel (3rd Edition)

Page 14: Updates

Slab allocator & Buddy Allocator

• Linux uses both allocators
  – The buddy allocator manages physical memory in units of pages (4KB on x86)
  – The slab allocator speeds up allocation of small, frequently used data structures (smaller than a page)
    • e.g. task_struct

Takuo Watanabe, Operating System Lecture slide, “Buddy System”

Page 15: Updates

The Slab Allocator

• Memory allocation for kernel objects
• Retains allocated memory that holds data objects of a certain type, for reuse

07/04/11

Proposed by Jeff Bonwick (Sun Microsystems); read "The Slab Allocator: An Object-Caching Kernel Memory Allocator" (google it!)

Kernel objects: inode structure, task_struct, vm_area_struct, etc.

Page 16: Updates


Not only task_struct...

( from kernel/fork.c )

many structures are also allocated using the slab allocator

Page 17: Updates

Basis

• The cost of initializing and destroying objects can outweigh the cost of allocating memory for them
• Object caching is used to mitigate the overhead of initializing objects
• It also avoids internal fragmentation of memory (i.e. memory allocated but not used, which happens with the buddy allocator)

Page 18: Updates

Overview

From the linux/mm/slab.c header comment:

The memory is organized in caches, one cache for each object type (e.g. inode_cache, dentry_cache, buffer_head, vm_area_struct). Each cache consists out of many slabs (they are small (usually one page long) and always contiguous), and each slab contains multiple initialized objects.
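The object-caching idea can be sketched in user space with a per-type free list: released objects are kept and handed out again instead of going back to the general allocator. This is a toy model, not the kernel's slab code: struct obj, struct obj_cache, cache_alloc, cache_free and cache_destroy are hypothetical names, malloc() stands in for the page allocator, and slabs, constructors and per-CPU arrays are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object type; the leading pointer links free objects. */
struct obj {
    struct obj *next_free;
    int payload;
};

/* Hypothetical cache for one object type: released objects kept for reuse. */
struct obj_cache {
    struct obj *free_list;
    unsigned long reused;   /* allocations served from free_list */
    unsigned long fresh;    /* allocations that had to call malloc() */
};

/* Prefer a cached object; fall back to the general allocator. */
struct obj *cache_alloc(struct obj_cache *c)
{
    assert(c != NULL);
    if (c->free_list != NULL) {
        struct obj *o = c->free_list;
        c->free_list = o->next_free;
        c->reused++;
        return o;
    }
    struct obj *o = malloc(sizeof(*o));
    if (o != NULL)
        c->fresh++;
    return o;
}

/* Keep the object on the free list instead of releasing its memory. */
void cache_free(struct obj_cache *c, struct obj *o)
{
    assert(c != NULL && o != NULL);
    o->next_free = c->free_list;
    c->free_list = o;
}

/* Hand everything on the free list back to the general allocator. */
void cache_destroy(struct obj_cache *c)
{
    while (c->free_list != NULL) {
        struct obj *o = c->free_list;
        c->free_list = o->next_free;
        free(o);
    }
}
```

An allocate, free, allocate sequence returns the same object the second time, which is the reuse that the slab allocator exploits to skip repeated initialization.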

Page 19: Updates

include/linux/slab_def.h | struct kmem_cache

struct kmem_cache {
    struct array_cache *array[NR_CPUS];
    unsigned int batchcount;
    unsigned int limit;
    unsigned int shared;

    unsigned int buffer_size;
    u32 reciprocal_buffer_size;
    unsigned int flags;
    unsigned int num;           /* number of objects per slab */
    unsigned int gfporder;      /* order of pages per slab (2^n) */
    gfp_t gfpflags;
    size_t colour;
    unsigned int colour_off;
    struct kmem_cache *slabp_cache;
    unsigned int slab_size;
    unsigned int dflags;

    void (*ctor)(void *obj);    /* object constructor */

    const char *name;           /* cache name */
    struct list_head next;

    struct kmem_list3 *nodelists[MAX_NUMNODES];
};

Page 20: Updates


/proc/slabinfo - seeing the caches

from : isa’s personal VPS @ webbynode.com

Page 21: Updates

From allocating to freeing (1)

• Allocating task_struct
  – copy_process() calls dup_task_struct(current), which asks the slab allocator for a new task_struct (and a thread_info) and makes it a direct copy of the current task, the parent process

Page 22: Updates


kernel/fork.c | dup_task_struct()

( continues.. )

Page 23: Updates

From allocating to freeing (2)

• Freeing task_struct
  – When the process calls the exit() syscall, the following happens:
    • After all objects associated with the process (address space, open files, ...) are freed, the process enters the zombie state (exit_state = EXIT_ZOMBIE)
    • The parent is informed that the child’s life has ended
    • The task_struct is returned to its slab cache via release_task(), which calls put_task_struct()

Page 24: Updates

Slob & Slub allocators (not yet covered)

• Slob allocator: a simple list of blocks, optimized for embedded systems
• Slub allocator: optimized for large-scale systems

Page 25: Updates

This week’s Updates

• The Buddy Allocator


Page 26: Updates

The Buddy Allocator (mm/page_alloc.c)

• “Page frame” (physical page) memory management

• All page allocations must go through this system
• Designed to limit external fragmentation of memory:
  – Free space becomes divided into small fragments scattered here and there

Page 27: Updates

Basics

• All free page frames are grouped into lists
• Each list holds blocks of 2^order contiguous page frames (alloc_pages(gfp_mask, order))
• There are 11 lists:
  – 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 page-frame lists (orders 0…10)

Page 28: Updates

cat /proc/buddyinfo

Columns: order 0 1 2 3 4 5 6 7 8 9 10

Shows the number of free blocks of each order in each zone

Page 29: Updates

Example | Allocating 256 pages (1MB)

1. Look in the 256 page-frame list; if a block is available, allocate it.
2. If not, look in the next larger list, the 512 page-frame list.
   1. If a block exists, split it: allocate 256 pages and put the remaining 256 page frames on the 256 page-frame list.
   2. If not, look in the next larger list, the 1024 page-frame list.
      1. If a block exists, allocate 256 pages, move the remaining 512 page frames to the 512 page-frame list and the remaining 256 page frames to the 256 page-frame list.
      2. If not, the algorithm reports an error (1024 is already the largest block).
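The steps above can be sketched as a loop over the free lists. The model below is a simplification that only tracks how many free blocks each order has, much like the counts in /proc/buddyinfo, and ignores addresses; struct free_area_model and buddy_alloc are hypothetical names, not kernel API.

```c
#include <assert.h>

#define MAX_ORDER 11   /* orders 0..10, as on the slide */

/* nr_free[k] = number of free blocks of 2^k page frames (toy model). */
struct free_area_model {
    unsigned int nr_free[MAX_ORDER];
};

/*
 * Take one block of the requested order, splitting a larger block if needed.
 * Returns 0 on success, -1 if no block of that order or larger is free.
 */
int buddy_alloc(struct free_area_model *fa, unsigned int order)
{
    assert(order < MAX_ORDER);

    /* Find the smallest order >= the request that has a free block. */
    unsigned int cur = order;
    while (cur < MAX_ORDER && fa->nr_free[cur] == 0)
        cur++;
    if (cur == MAX_ORDER)
        return -1;

    fa->nr_free[cur]--;

    /* Split down: each split leaves one free half on the lower list. */
    while (cur > order) {
        cur--;
        fa->nr_free[cur]++;
    }
    return 0;
}
```

Starting with a single free 1024-page block (order 10), a request for 256 pages (order 8) leaves one free block on the 512-page list and one on the 256-page list, matching step 2.2.1.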


Page 30: Updates

Freeing pages

• The kernel tries to merge pairs of free buddy blocks of size b into a single block of size 2b. Two blocks are considered buddies if:
  – Both have the same size b
  – They are located at contiguous physical addresses (neighbors)
  – The first page frame of the lower block is aligned to a multiple of 2b
• The merge repeats until the block reaches the largest size (1024 page frames) or its buddy is not free
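Because every block starts at a page frame number that is a multiple of its size, a block's buddy can be found by flipping a single bit of that number. The sketch below shows the arithmetic; buddy_pfn and merged_pfn are hypothetical helper names, and the free/not-free bookkeeping is left out.

```c
#include <assert.h>

/* Page frame number of the buddy of the 2^order block starting at pfn. */
unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
    /* A block's start must be aligned to its own size. */
    assert((pfn & ((1UL << order) - 1)) == 0);
    return pfn ^ (1UL << order);
}

/* Start of the 2^(order+1) block formed by merging pfn with its buddy. */
unsigned long merged_pfn(unsigned long pfn, unsigned int order)
{
    return pfn & ~(1UL << order);
}
```

Note that being neighbors is not enough: the 256-page blocks at page frames 256 and 512 are adjacent, but the buddy of 256 is 0 and the buddy of 512 is 768, so those two never merge with each other.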


Page 31: Updates

Disadvantage

• It creates internal fragmentation: a whole block must be allocated even when the required size is smaller
  – E.g. to allocate 275 pages, a 512-page block is used, wasting 237 pages
• This loss can be reduced by using the slab allocator (explained earlier)
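The waste in the example follows from rounding the request up to the next power of two. A short sketch of that arithmetic, with hypothetical helper names order_for_pages and wasted_pages:

```c
#include <assert.h>

/* Smallest order whose block of 2^order page frames holds nr_pages. */
unsigned int order_for_pages(unsigned long nr_pages)
{
    assert(nr_pages > 0);
    unsigned int order = 0;
    while ((1UL << order) < nr_pages)
        order++;
    return order;
}

/* Page frames allocated but not needed once nr_pages is rounded up. */
unsigned long wasted_pages(unsigned long nr_pages)
{
    return (1UL << order_for_pages(nr_pages)) - nr_pages;
}
```

For 275 pages this gives order 9 (512 page frames) and 512 - 275 = 237 wasted page frames, while a request for exactly 256 pages wastes none.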


Page 32: Updates

Process’s Address Space

• Defined by the mm_struct structure
• Every process descriptor holds a pointer to an mm_struct
• Can be shared among processes (thus creating what we call threads)
• After fork(), its pages are shared with the parent until copy-on-write duplicates them
• Consists of contiguous virtual memory blocks

Page 33: Updates

/bin/gonzo’s address space


http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory

Page 34: Updates

pmap <pid>


Linux kernel Development, Robert Love p.314
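On Linux, a process can inspect its own address space through the /proc/self/maps file, which lists one memory area per line in a layout similar to pmap's output. The sketch below counts those areas; count_memory_areas is a hypothetical name, and the buffer size is an assumption chosen to fit a line with a long path.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Count the memory areas of the calling process by parsing the
 * "start-end" address range at the beginning of each line of
 * /proc/self/maps. Returns -1 if the file cannot be opened.
 */
int count_memory_areas(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[4352];   /* assumed large enough for one line */
    int areas = 0;
    unsigned long start, end;

    while (fgets(line, sizeof line, f) != NULL) {
        /* Each area begins with its address range in hexadecimal. */
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            assert(end > start);
            areas++;
        }
    }

    fclose(f);
    return areas;
}
```

Even a trivial program reports several areas: the ELF executable's text and data, the heap, shared libraries and the stack each appear as separate ranges.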

(Figure annotations on the pmap output: library, ELF, library)