Memory

Post on 20-Nov-2014








1. Memory Management in Linux

Anand Sivasubramaniam

2. Two Parts

  • Architecture-independent memory model
    • Should be flexible and portable across platforms
  • Implementation for a specific architecture

3. Architecture Independent Memory Model

  • The process virtual address space is divided into pages
  • Page size is given by the PAGE_SIZE macro in asm/page.h
    • (4K for x86 and 8K for Alpha)
  • The pages are divided among 4 segments:
    • User Code, User Data, Kernel Code, Kernel Data
  • In user mode, a process can access only User Code and User Data
  • In kernel mode, the kernel also needs access to User Data
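As a small illustration of the page split, the sketch below computes the page number and in-page offset of a virtual address. It assumes 4 KB pages as on x86; TOY_PAGE_SHIFT, page_number(), page_offset() and page_align_up() are hypothetical names, not the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pages (x86); an 8 KB Alpha page would use a shift of 13. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* Page number: the address with the in-page offset bits shifted away. */
static inline uint32_t page_number(uint32_t vaddr)
{
    return vaddr >> TOY_PAGE_SHIFT;
}

/* Offset of the byte within its page (the low TOY_PAGE_SHIFT bits). */
static inline uint32_t page_offset(uint32_t vaddr)
{
    return (uint32_t)(vaddr & (TOY_PAGE_SIZE - 1));
}

/* Round an address up to the next page boundary (e.g. when sizing a region). */
static inline uint32_t page_align_up(uint32_t vaddr)
{
    return (uint32_t)((vaddr + TOY_PAGE_SIZE - 1) & TOY_PAGE_MASK);
}
```

With 4 KB pages, address 0x12345 lies at offset 0x345 in page 0x12, and rounding it up gives the next boundary, 0x13000.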


  • put_user(), get_user(), memcpy_tofs() and memcpy_fromfs() allow the kernel to access user data (defined in asm/segment.h)
  • Registers cs and ds point to the code and data segments of the current mode
  • fs points to the data segment of the calling process in kernel mode.
  • get_ds(), get_fs() and set_fs() are defined in asm/segment.h


  • Segment + offset gives a 32-bit linear address (a 4 GB linear address space)
  • Of this, user space = 3 GB (defined by the TASK_SIZE macro) and kernel space = 1 GB
  • A linear address is converted to a physical address through 3 levels of page tables

[Figure: linear address layout: index into page directory | index into page middle directory | index into page table | page offset]

6. Page Dir. and Middle Dir. Access Functions (in asm/page.h and asm/pgtable.h)

  • Structures pgd_t and pmd_t define an entry of these tables.
  • pgd_alloc()/pgd_free() allocate and free a page for the page directory.
  • pmd_alloc(), pmd_alloc_kernel()/pmd_free(), pmd_free_kernel() allocate and free a page middle directory in the user and kernel segments.
  • pgd_set(), pgd_clear()/pmd_set(), pmd_clear() set and clear an entry of their tables.
  • pgd_present()/pmd_present() check whether what the entry points to is present.
  • pgd_page()/pmd_page() return the base address of the page the entry points to.
  • ..
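The three-level translation that these functions support can be sketched as a walk over nested tables. This is a minimal model, assuming a made-up 8/6/6/12-bit split of a 32-bit linear address; the toy_* types and toy_translate() are hypothetical stand-ins for pgd_t, pmd_t, pte_t and the *_present() checks, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split of a 32-bit linear address (not a real architecture):
 * | pgd index: 8 bits | pmd index: 6 bits | pte index: 6 bits | offset: 12 bits | */
#define OFFSET_BITS 12
#define PTE_BITS    6
#define PMD_BITS    6
#define PGD_BITS    8

#define PTRS_PER_PTE (1u << PTE_BITS)
#define PTRS_PER_PMD (1u << PMD_BITS)
#define PTRS_PER_PGD (1u << PGD_BITS)

/* A page-table entry holds a physical frame number; 0 means "not present". */
typedef struct { uint32_t frame; } toy_pte;
typedef struct { toy_pte *table; } toy_pmd;  /* NULL table = not present */
typedef struct { toy_pmd *table; } toy_pgd;  /* NULL table = not present */

static unsigned pgd_index(uint32_t va) { return va >> (OFFSET_BITS + PTE_BITS + PMD_BITS); }
static unsigned pmd_index(uint32_t va) { return (va >> (OFFSET_BITS + PTE_BITS)) & (PTRS_PER_PMD - 1); }
static unsigned pte_index(uint32_t va) { return (va >> OFFSET_BITS) & (PTRS_PER_PTE - 1); }

/* Walk the three levels. Returns 0 and stores the physical address in *pa,
 * or -1 if any level is missing (where the kernel would take a page fault). */
static int toy_translate(const toy_pgd *pgd, uint32_t va, uint32_t *pa)
{
    const toy_pgd *dir = &pgd[pgd_index(va)];
    if (dir->table == NULL)            /* like !pgd_present() */
        return -1;
    const toy_pmd *mid = &dir->table[pmd_index(va)];
    if (mid->table == NULL)            /* like !pmd_present() */
        return -1;
    const toy_pte *pte = &mid->table[pte_index(va)];
    if (pte->frame == 0)               /* like !pte_present() */
        return -1;
    *pa = (pte->frame << OFFSET_BITS) | (va & ((1u << OFFSET_BITS) - 1));
    return 0;
}
```

On x86 without PAE, Linux folds the middle level away (each middle directory has a single entry), so the same walk runs with two effective levels.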

7. Page Table Entry (pte_t)

  • Attributes
    • Presence (is the page present in physical memory?)
    • Read, Write and Execute
    • Accessed? (age)
    • Dirty
  • Macros of pgprot_t type
    • PAGE_NONE (invalid)
    • PAGE_SHARED (read-write)
    • PAGE_COPY/PAGE_READONLY (read-only, used by copy-on-write)
    • PAGE_KERNEL (accessible only by the kernel)

8. Page Table Functions

  • mk_pte(), pte_clear(), set_pte()
  • pte_mkclean(), pte_mkdirty(), pte_mkread(), ...
  • pte_none() (checks whether the entry is empty, i.e. not set)
  • pte_page() (returns the address of the page)
  • pte_dirty(), pte_present(), pte_young(), pte_read(), pte_write()
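A PTE is essentially a page frame address with flag bits in its low-order bits, and most of the functions above are bit operations on it. The sketch below assumes the x86 bit positions for present, read/write, user, accessed and dirty; the toy_-prefixed names are hypothetical analogues of mk_pte(), pte_none(), pte_dirty() and friends, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits, following the x86 page-table-entry layout. */
#define TOY_PAGE_PRESENT  0x001u
#define TOY_PAGE_RW       0x002u
#define TOY_PAGE_USER     0x004u
#define TOY_PAGE_ACCESSED 0x020u
#define TOY_PAGE_DIRTY    0x040u

#define TOY_PAGE_SHIFT 12
#define TOY_FLAG_MASK  ((1u << TOY_PAGE_SHIFT) - 1)   /* low 12 bits hold flags */

typedef uint32_t toy_pte_t;

/* Combine a page-aligned physical address with protection bits (cf. mk_pte()). */
static toy_pte_t toy_mk_pte(uint32_t page_addr, uint32_t prot)
{
    return (page_addr & ~TOY_FLAG_MASK) | (prot & TOY_FLAG_MASK);
}

/* Queries: each tests one bit (or, for none, the whole entry). */
static int toy_pte_none(toy_pte_t pte)    { return pte == 0; }
static int toy_pte_present(toy_pte_t pte) { return (pte & TOY_PAGE_PRESENT) != 0; }
static int toy_pte_write(toy_pte_t pte)   { return (pte & TOY_PAGE_RW) != 0; }
static int toy_pte_dirty(toy_pte_t pte)   { return (pte & TOY_PAGE_DIRTY) != 0; }
static int toy_pte_young(toy_pte_t pte)   { return (pte & TOY_PAGE_ACCESSED) != 0; }

/* Address of the page the entry maps: the entry with its flag bits cleared. */
static uint32_t toy_pte_page(toy_pte_t pte) { return pte & ~TOY_FLAG_MASK; }

/* Modifiers return a new entry value; the caller stores it (cf. set_pte()). */
static toy_pte_t toy_pte_mkdirty(toy_pte_t pte)   { return pte | TOY_PAGE_DIRTY; }
static toy_pte_t toy_pte_mkclean(toy_pte_t pte)   { return pte & ~TOY_PAGE_DIRTY; }
static toy_pte_t toy_pte_wrprotect(toy_pte_t pte) { return pte & ~TOY_PAGE_RW; }
```

Clearing the RW bit with toy_pte_wrprotect() is the kind of step a copy-on-write scheme relies on: the next write to the page faults.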

9. Process Address Space (not to scale!)

[Figure: process address space, labels from top to bottom: Kernel; 0xC0000000; file name, environment, arguments; stack; bss (_end, _bss_start); data (_edata); code (_etext); header; 0x84000000; shared libs]

10. Address Space Descriptor

  • mm_struct is referenced from the process descriptor (in linux/sched.h).
  • If CLONE_VM is specified when forking, the descriptor is shared with the child (its count is incremented); otherwise it is duplicated.
  • struct mm_struct {
    • int count; // no. of processes sharing this descriptor
    • pgd_t *pgd; // page directory pointer
    • unsigned long start_code, end_code;
    • unsigned long start_data, end_data;
    • unsigned long start_brk, brk;
    • unsigned long start_stack;
    • unsigned long arg_start, arg_end, env_start, env_end;
    • unsigned long rss; // no. of pages resident in memory
    • unsigned long total_vm; // total size of this address space, in pages
    • unsigned long locked_vm; // amount locked in memory, in pages
    • unsigned long def_flags; // default flags for newly created memory regions
    • struct vm_area_struct *mmap; // pointer to the first region descriptor
    • struct vm_area_struct *mmap_avl; // AVL tree of region descriptors, for faster search
    • };
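The sharing rule for mm_struct can be sketched with a reference count. This minimal model assumes a cut-down descriptor and a hypothetical TOY_CLONE_VM flag; toy_copy_mm() and toy_mmput() are illustrative names. A real duplicate also copies the region list and page tables (sharing pages copy-on-write), which is omitted here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for mm_struct: only the fields this sketch needs. */
struct toy_mm {
    int count;                      /* number of processes sharing this descriptor */
    unsigned long start_brk, brk;
};

/* Hypothetical flag value for illustration. */
#define TOY_CLONE_VM 0x100UL

/* Give a child an address-space descriptor on fork/clone:
 * with TOY_CLONE_VM the parent's descriptor is shared (count bumped),
 * otherwise the child gets its own copy. Returns NULL on allocation failure. */
static struct toy_mm *toy_copy_mm(struct toy_mm *parent, unsigned long flags)
{
    if (flags & TOY_CLONE_VM) {
        parent->count++;
        return parent;
    }
    struct toy_mm *child = malloc(sizeof *child);
    if (child == NULL)
        return NULL;
    memcpy(child, parent, sizeof *child);  /* real kernel: also copy regions/page tables */
    child->count = 1;                      /* the copy starts with a single user */
    return child;
}

/* Drop one reference; free the descriptor when the last user goes away. */
static void toy_mmput(struct toy_mm *mm)
{
    assert(mm->count > 0);
    if (--mm->count == 0)
        free(mm);
}
```

A thread-like child thus sees every later brk() change made by its parent, while a forked child's copy evolves independently.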

11. Region Descriptors

  • Why allocate all of the VAS up front? Allocate only on demand.
  • Use a region descriptor for each allocated region of the VAS.
  • Map allocated but still unused pages to the same physical page to save memory.
  • struct vm_area_struct {
    • struct mm_struct *vm_mm; // descriptor of the VAS
    • unsigned long vm_start, vm_end; // bounds of this region
    • pgprot_t vm_page_prot; // protection attributes for this region
    • short vm_avl_height;
    • struct vm_area_struct *vm_avl_left; // left-hand child
    • struct vm_area_struct *vm_avl_right; // right-hand child
    • struct vm_area_struct *vm_next_share, *vm_prev_share; // doubly linked
    • struct vm_operations_struct *vm_ops;
    • struct inode *vm_inode; // inode of the mapped file, or NULL = anonymous mapping
    • unsigned long vm_offset; // offset in file/device
    • };
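Finding the region descriptor that covers a faulting address is the basic lookup on this structure. The sketch assumes the regions are kept on a list sorted by address, reached from mm_struct's mmap pointer (the vm_next link is an assumption of this sketch, not one of the fields listed above), and does a linear scan; the AVL fields exist to make the same search logarithmic. toy_vma and toy_find_region() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down region descriptor: [vm_start, vm_end) plus a link to the next region. */
struct toy_vma {
    unsigned long vm_start, vm_end;    /* region covers vm_start <= addr < vm_end */
    struct toy_vma *vm_next;           /* next region, sorted by address */
};

/* Return the region containing addr, or NULL if addr falls in a hole.
 * Linear scan of the sorted list; an AVL tree gives an O(log n) search instead. */
static struct toy_vma *toy_find_region(struct toy_vma *mmap, unsigned long addr)
{
    for (struct toy_vma *vma = mmap; vma != NULL; vma = vma->vm_next) {
        if (addr < vma->vm_start)
            return NULL;               /* sorted list: addr lies in a gap */
        if (addr < vma->vm_end)
            return vma;
    }
    return NULL;                       /* past the last region */
}
```

An address that falls in a gap between regions has no descriptor, which is why such an access is treated as an invalid reference rather than a demand-paging fault.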


  • If vm_inode is NULL (anonymous mapping), all PTEs for this region initially point to the same page.
  • When the process writes to any of these pages, the fault handler creates a new physical page for it (copy-on-write).
  • This is used by the brk() system call.
  • Operations specific to this region (including fault handling) are specified in vm_operations_struct.
  • Hence, different regions can have different functions.
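The copy-on-write behaviour of anonymous mappings can be modelled in user space. In this sketch, assumed for illustration only, each "PTE" is a pointer to a page buffer, every page starts out pointing at one shared zero-filled page, and the first write to such a page allocates a private copy. toy_anon_region, toy_read(), toy_write() and toy_anon_free() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_NPAGES    4

/* One shared, all-zero page that every untouched anonymous page maps to. */
static unsigned char zero_page[TOY_PAGE_SIZE];

/* A toy anonymous region: each "PTE" is just a pointer to a page frame. */
struct toy_anon_region {
    unsigned char *pte[TOY_NPAGES];
};

static void toy_anon_init(struct toy_anon_region *r)
{
    for (int i = 0; i < TOY_NPAGES; i++)
        r->pte[i] = zero_page;         /* read-only mapping of the shared page */
}

/* Reads never allocate: they see whatever page the PTE points at. */
static unsigned char toy_read(const struct toy_anon_region *r, size_t addr)
{
    assert(addr / TOY_PAGE_SIZE < TOY_NPAGES);
    return r->pte[addr / TOY_PAGE_SIZE][addr % TOY_PAGE_SIZE];
}

/* The first write to a page still mapped to zero_page "faults": a private page
 * is allocated and filled from the old one (copy-on-write), then the write
 * proceeds. Returns 0 on success, -1 if no memory is available. */
static int toy_write(struct toy_anon_region *r, size_t addr, unsigned char val)
{
    size_t idx = addr / TOY_PAGE_SIZE;
    assert(idx < TOY_NPAGES);
    if (r->pte[idx] == zero_page) {
        unsigned char *page = malloc(TOY_PAGE_SIZE);
        if (page == NULL)
            return -1;
        memcpy(page, zero_page, TOY_PAGE_SIZE);
        r->pte[idx] = page;
    }
    r->pte[idx][addr % TOY_PAGE_SIZE] = val;
    return 0;
}

/* Release only the private pages; the shared zero page is never freed. */
static void toy_anon_free(struct toy_anon_region *r)
{
    for (int i = 0; i < TOY_NPAGES; i++)
        if (r->pte[i] != zero_page)
            free(r->pte[i]);
}
```

Untouched pages cost no memory beyond the single shared page, which is what makes growing a heap with brk() cheap until the pages are actually written.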


  • struct vm_operations_struct {
    • void (*open)(struct vm_area_struct *);
    • void (*close)(struct vm_area_struct *);
    • void (*unmap)();
    • void (*protect)();
    • void (*sync)();
    • unsigned long (*nopage)(struct vm_area_struct *, unsigned long address, unsigned long page, int write_access);
    • void (*swapout)(struct vm_area_struct *, unsigned long, pte_t *);
    • pte_t (*swapin)(struct vm_area_struct *, unsigned long, unsigned long);
  • };
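The point of vm_operations_struct is dispatch through function pointers: the generic fault path calls whatever nopage handler the region installed. The sketch below models only a simplified nopage hook (its signature differs from the one above), with a hypothetical file-backed handler and a default for anonymous regions; all toy_ names and the returned "page" tags are illustrative.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vma;

/* Per-region operations in the spirit of vm_operations_struct
 * (only a simplified nopage-style hook is modelled). */
struct toy_vm_ops {
    unsigned long (*nopage)(struct toy_vma *vma, unsigned long address);
};

struct toy_vma {
    unsigned long vm_start, vm_end;
    const struct toy_vm_ops *vm_ops;   /* NULL: use the default handler */
};

/* Hypothetical file-backed handler: returns a fake page tag derived from the address. */
static unsigned long file_nopage(struct toy_vma *vma, unsigned long address)
{
    (void)vma;
    return 0xF000 | (address >> 12);   /* pretend: page read in from the mapped file */
}

/* Default for regions without their own hook. */
static unsigned long anon_default_nopage(unsigned long address)
{
    (void)address;
    return 0;                          /* pretend: the shared zero page */
}

static const struct toy_vm_ops file_ops = { .nopage = file_nopage };

/* Generic fault path: the region's own hook decides how to supply the page. */
static unsigned long toy_handle_fault(struct toy_vma *vma, unsigned long address)
{
    assert(address >= vma->vm_start && address < vma->vm_end);
    if (vma->vm_ops != NULL && vma->vm_ops->nopage != NULL)
        return vma->vm_ops->nopage(vma, address);
    return anon_default_nopage(address);
}
```

The fault path never needs to know what kind of region it is handling; adding a new kind of mapping means supplying a new operations table.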

14. Traditional mmap()

  • int do_mmap(struct file *, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long off);
  • Creates a new memory region
  • Creates the required PTEs
  • Sets the PTEs so that the first access to each page faults
  • The fault handler (nopage) then either does copy-on-write for an anonymous mapping or brings in the required page of the file.
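The lazy behaviour of do_mmap() can be sketched as a region whose PTEs all start out not-present, so each page is brought in by a fault on first touch and later accesses to that page do not fault again. This is a user-space model under stated assumptions: a fixed 16-page maximum, a per-page present flag standing in for the PTE, and hypothetical names toy_mmap() and toy_touch().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_MAX_PAGES 16

/* A toy mapping: do_mmap-style setup records the range but installs no pages. */
struct toy_mapping {
    unsigned long start, len;
    bool present[TOY_MAX_PAGES];       /* "PTE present" bit per page */
    unsigned faults;                   /* how many pages were brought in lazily */
};

/* Create the region; every PTE starts out not-present so the first touch faults.
 * Returns 0 on success, -1 if the length is zero or too large for this sketch. */
static int toy_mmap(struct toy_mapping *m, unsigned long start, unsigned long len)
{
    if (len == 0 || len > TOY_MAX_PAGES * TOY_PAGE_SIZE)
        return -1;
    m->start = start;
    m->len = len;
    m->faults = 0;
    for (size_t i = 0; i < TOY_MAX_PAGES; i++)
        m->present[i] = false;
    return 0;
}

/* Touch an address. Returns -1 outside the region (a real kernel would signal
 * the process); otherwise faults in the page if needed and returns 0. */
static int toy_touch(struct toy_mapping *m, unsigned long addr)
{
    if (addr < m->start || addr - m->start >= m->len)
        return -1;
    size_t idx = (addr - m->start) / TOY_PAGE_SIZE;
    if (!m->present[idx]) {
        m->faults++;                   /* nopage would do copy-on-write or read the file here */
        m->present[idx] = true;
    }
    return 0;
}
```

Mapping a large file therefore costs only a region descriptor up front; the work is spread over the faults on the pages actually used.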

15. How is brk() implemented?

  • Check whether the allocation is allowed (deny it if there is not enough physical memory, if it exceeds the process's VA limits, or if it would cross into the stack).
  • Then call do_mmap() for an anonymous mapping between the old and new values of brk (in the process table).
  • Return the new brk value.
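The brk() steps above can be sketched as follows. The checks are simplified assumptions: a hypothetical fixed data limit stands in for the real resource limit, the stack test compares against a single start_stack value, and the physical-memory check is omitted. toy_brk() reports a refused request by returning the unchanged break, as the brk system call itself does.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_PAGE_ALIGN(x) (((x) + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1))

/* Only the fields of the address-space descriptor this sketch needs. */
struct toy_mm {
    unsigned long start_brk, brk;      /* heap bounds */
    unsigned long start_stack;         /* simplified: heap must stay below this */
};

/* Hypothetical per-process data limit standing in for the real resource limit. */
static unsigned long toy_data_limit = 64UL * 1024 * 1024;

/* Move the break. Returns the new break on success and the unchanged break
 * when the request is refused. */
static unsigned long toy_brk(struct toy_mm *mm, unsigned long new_brk)
{
    if (new_brk < mm->start_brk)
        return mm->brk;                /* cannot shrink below the heap start */
    if (new_brk - mm->start_brk > toy_data_limit)
        return mm->brk;                /* exceeds the VA limit */
    if (TOY_PAGE_ALIGN(new_brk) >= mm->start_stack)
        return mm->brk;                /* would run into the stack */
    /* Here the kernel would do_mmap() an anonymous region covering
     * [PAGE_ALIGN(old brk), PAGE_ALIGN(new brk)) when growing. */
    mm->brk = new_brk;
    return mm->brk;
}
```

The C library's brk() wrapper can detect failure by comparing the returned value with the one it asked for.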

16. Kernel Segment

  • On a system call, CS points to the kernel segment, and DS and ES are set to the kernel segment as well.
  • Next, FS is set to the user data segment.
  • put_user() and get_user() can then access user space if needed.
  • The address parameters to these functions must not exceed 0xC0000000.
  • Violating this should cause a trap, as should any write to a read-only page. The latter is a problem on the 386, which does not trap on kernel-mode writes to read-only pages; the 486 and Pentium do.
  • Hence, verify_area() is typically called before performing such operations.
  • Physical and virtual addresses are the same, except for memory allocated with vmalloc().
  • The kernel segment is shared across processes (it is not switched!).
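The address check that verify_area() performs before put_user()/get_user() can be sketched as a range test against the 3 GB user limit. This sketch covers only that range test (the real function also takes an access type); toy_verify_area() and TOY_TASK_SIZE are hypothetical names, and the comparison is arranged so that addr + size cannot wrap around.

```c
#include <assert.h>
#include <stdint.h>

/* Top of the user part of the 4 GB linear space (3 GB user / 1 GB kernel). */
#define TOY_TASK_SIZE 0xC0000000UL

/* Return 0 if [addr, addr + size) lies entirely in user space, -1 otherwise.
 * Checking size first and then comparing addr with the remaining room avoids
 * computing addr + size, which could overflow 32 bits. */
static int toy_verify_area(uint32_t addr, uint32_t size)
{
    if (size > TOY_TASK_SIZE)
        return -1;
    if (addr > TOY_TASK_SIZE - size)
        return -1;
    return 0;
}
```

A naive `addr + size <= TASK_SIZE` test in 32-bit arithmetic could wrap to a small value and let a kernel address through; the ordering above rules that out.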

17. Memory Allocation for the Kernel Segment

  • Static
    • memory_start = console_init(memory_start, memory_end);
    • Typically used by drivers, and by some other kernel components, to reserve memory areas.
  • Dynamic
    • void *kmalloc(size, priority), void kfree(void *) // in mm/kmalloc.c
    • void *vmalloc(size), void vfree(void *) // in mm/vmalloc.c
    • kmalloc() returns physically contiguous memory, while vmalloc() memory is not necessarily physically contiguous
    • Allocated memory is not initialized (and is not paged out).
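kmalloc() rounds each request up to one of a fixed set of block sizes. A minimal sketch of choosing the smallest block size that fits, using the sizes shown in the kmalloc() data-structures figure; it ignores any per-block header the real allocator accounts for, and toy_size_class() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Block sizes as listed in the kmalloc() data-structures figure. */
static const size_t toy_block_sizes[] = {
    32, 64, 128, 252, 508, 1020, 2040, 4080,
    8176, 16368, 32752, 65520, 131056
};
#define TOY_NR_SIZES (sizeof toy_block_sizes / sizeof toy_block_sizes[0])

/* Return the index of the smallest size class that can hold `size` bytes,
 * or -1 if the request is larger than the biggest class.
 * Simplification: any per-block header is not added to the request. */
static int toy_size_class(size_t size)
{
    for (size_t i = 0; i < TOY_NR_SIZES; i++)
        if (size <= toy_block_sizes[i])
            return (int)i;
    return -1;
}
```

Rounding up wastes space inside a block (a 260-byte request occupies a 508-byte block) in exchange for simple per-size free lists.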

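18. kmalloc() data structures

[Figure: sizes[] array of size_descriptor entries, one per block size (32, 64, 128, 252, 508, 1020, 2040, 4080, 8176, 16368, 32752, 65520, 131056); each points to page_descriptor pages holding blocks (bh), with lists terminated by NULL]

19. vmalloc()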

  • Allocates virtually contiguous pages, which need not be physically contiguous.
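What "virtually contiguous but not physically contiguous" means can be shown with a toy mapping: consecutive virtual pages of one area are each backed by whatever physical frame was free. The sketch assumes frames are supplied by the caller and uses an arbitrary base address; toy_vm_area, toy_vmalloc() and toy_vmalloc_to_phys() are hypothetical, not the kernel's vmalloc() internals.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_VMALLOC_PAGES 8

/* A toy "vmalloc area": consecutive virtual pages starting at base, each backed
 * by whatever physical frame happened to be free (frames need not be adjacent). */
struct toy_vm_area {
    unsigned long base;                        /* first virtual address */
    size_t npages;
    unsigned long frame[TOY_VMALLOC_PAGES];    /* physical frame number per page */
};

/* Map `npages` free frames, taken in order from free_frames[], at consecutive
 * virtual pages. Returns 0, or -1 if the request is empty or too large. */
static int toy_vmalloc(struct toy_vm_area *area, unsigned long base,
                       const unsigned long *free_frames, size_t npages)
{
    if (npages == 0 || npages > TOY_VMALLOC_PAGES)
        return -1;
    area->base = base;
    area->npages = npages;
    for (size_t i = 0; i < npages; i++)
        area->frame[i] = free_frames[i];       /* virtual page i -> any free frame */
    return 0;
}

/* Translate a virtual address inside the area to a physical address. */
static unsigned long toy_vmalloc_to_phys(const struct toy_vm_area *area, unsigned long va)
{
    assert(va >= area->base && va < area->base + area->npages * TOY_PAGE_SIZE);
    size_t idx = (va - area->base) / TOY_PAGE_SIZE;
    return area->frame[idx] * TOY_PAGE_SIZE + (va - area->base) % TOY_PAGE_SIZE;
}
```

This is why vmalloc() can satisfy large requests when no physically contiguous run of free pages exists, and also why its memory does not follow the "physical equals virtual" rule of the rest of the kernel segment.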
