linux operating system 許 富 皓

67
1 Linux Operating System 許 許 許

Upload: yetty

Post on 13-Jan-2016

41 views

Category:

Documents


0 download

DESCRIPTION

Linux Operating System 許 富 皓. Chapter 3 Processes. Non-circular Doubly Linked Lists. A sequence of nodes chained together through two kinds of pointers : a pointer to its previous node and a pointer to its subsequent node. Each node has two links : - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Linux Operating System 許 富 皓

1

Linux Operating System

許 富 皓

Page 2: Linux Operating System 許 富 皓

2

Chapter 3 Processes

Page 3: Linux Operating System 許 富 皓

3

Non-circular Doubly Linked Lists

A sequence of nodes chained together through two kinds of pointers: a pointer to its previous node and a pointer to its subsequent node.

Each node has two links: one points to the previous node, or points to a null value or empty list if

it is the first node and one points to the next, or points to a null value or empty list if it is the

final node.

Page 4: Linux Operating System 許 富 皓

4

Problems with Doubly Linked Lists

The Linux kernel contains hundred of various data structures that are linked together through their respective doubly linked lists. Drawbacks:

a waste of programmers' efforts to implement a set of primitive operations, such as,

initializing the list inserting and deleting an element scanning the list.

a waste of memory to replicate the primitive operations for each different list.

Page 5: Linux Operating System 許 富 皓

5

Data Structure struct list_head

Therefore, the Linux kernel defines the struct list_head data structure, whose only fields next and prev represent the forward and back pointers of a generic doubly linked list element, respectively.

It is important to note, however, that the pointers in a list_head field store the addresses of other list_head fields rather than the addresses of the whole data structures in which

the list_head structure is included.

Page 6: Linux Operating System 許 富 皓

6

list_head

A Circular Doubly Linked List with Three Elements

next prev

next prev

list_head

next prev

list_head

next prev

list_head

data structure 1

data structure 2

data structure 3

Page 7: Linux Operating System 許 富 皓

7

Macro LIST_HEAD(list_name) A new list is created by using the LIST_HEAD(list_name) macro. it declares a new variable named list_name of type list_head, which is a dummy first element that acts as a placeholder for the head of the new list.

and it initializes the prev and next fields of the list_head data structure so as to point to the list_name variable itself.

Page 8: Linux Operating System 許 富 皓

8

Code of Macro LIST_HEAD(list_name)struct list_head

{

struct list_head *next, *prev;

};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define LIST_HEAD(name) \

struct list_head name = LIST_HEAD_INIT(name)

Page 9: Linux Operating System 許 富 皓

9

An Empty Doubly Linked List

LIST_HEAD(my_list)

prev

next

struct list_head my_list

Page 10: Linux Operating System 許 富 皓

p

1

n

list_add(n,p) list_add_tail(n,p) list_del(p) list_empty(p)

10

Relative Functions and Macros (1)

. . .2 n

Page 11: Linux Operating System 許 富 皓

11

Relative Functions and Macros (2) list_for_each(p,h) list_for_each_entry(p,h,m) list_entry(p,t,m)

Returns the address of the data structure of type t in which the list_head field that has the name m and the address p is included.

Page 12: Linux Operating System 許 富 皓

12

Example of list_entry(p,t,m)

sturct class{

char name[20];

char teacher[20];

struct student_pointer *student;

struct list_head link;

};

struct class grad_1A;

struct list_head *poi;

poi=&(grad_1A.link);

list_entry(poi,struct class,link)

&grad_1A

name

(20 bytes)

teacher

(20 bytes)

student

(4 bytes)

link next (4 bytes)

prev(4 bytes)

Page 13: Linux Operating System 許 富 皓

13

Code of list_entry

typedef unsigned int __kernel_size_t; typedef __kernel_size_t size_t;

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

#define list_entry (ptr, type, member) \

({ \

const typeof(((type *)0)->member)*__mptr= (ptr); \

(type *)((char *)__mptr - offsetof(type,member) );\

})

Page 14: Linux Operating System 許 富 皓

14

Explanation of list_entry(p,t,m)

list_entry(poi,struct class,link)

((struct class *)((char *)(poi)-(unsigned long)(&((struct class *)0)->link)))

name

(20 bytes)

teacher

(20 bytes)

student

(4 bytes)

link next (4 bytes)

prev(4 bytes)

poi

offset

list_entry(…)

#define list_entry(ptr, type, member) \

((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

poi

= list_entry()

offset

- offset

Page 15: Linux Operating System 許 富 皓

15

hlist_head The Linux kernel 2.6 supports another kind

of doubly linked list, which mainly differs from a list_head list because it is NOT circular.

It is mainly used for hash tables. The list head is stored in an hlist_head

data structure, which is simply a pointer to the first element in the list (NULL if

the list is empty).

Page 16: Linux Operating System 許 富 皓

16

hlist_node Each element is represented by an hlist_node data structure, which includes a pointer next to the next element and a pointer pprev to the next field of the

previous element. Because the list is not circular, the pprev

field of the first element and the next field of the last element are set to NULL.

Page 17: Linux Operating System 許 富 皓

17

A Non-circular Doubly Linked List

struct hlist_node

*first pprev next pprev next pprev next

struct hlist_headstruct hlist_node

Page 18: Linux Operating System 許 富 皓

18

Functions and Macro for hlist_head and hlist_node The list can be handled by means of

several helper functions and macros similar to those listed in previous sixth slide: hlist_add_head( ), hlist_del( ), hlist_empty( ), hlist_entry, hlist_for_each_entry, and so on.

Page 19: Linux Operating System 許 富 皓

19

The Process ListThe process list is a circular

doubly linked list that links the process descriptors of all existing thread group leaders:Each task_struct structure includes a tasks

field of type list_head whose prev and next fields point, respectively, to the previous and to the next task_struct element’s tasks field.

Page 20: Linux Operating System 許 富 皓

20

The Head of the Process List

The head of the process list is the init_task task_struct descriptor; it is the process descriptor of the so-called process 0 or swapper (see the section "Kernel Threads" later in this chapter).

The tasks->prev field of init_task points to the tasks field of the process descriptor inserted last in the list.

Page 21: Linux Operating System 許 富 皓

21

Code for init_taskstruct task_struct init_task = INIT_TASK(init_task);

#define INIT_TASK(tsk) \ { \ .state = 0, \ .thread_info = &init_thread_info, \ .usage = ATOMIC_INIT(2), \ .flags = 0, \ .lock_depth = -1, \ .prio = MAX_PRIO-20, \ .static_prio = MAX_PRIO-20, \ .policy = SCHED_NORMAL, \ .cpus_allowed = CPU_MASK_ALL, \ .mm = NULL, \ .active_mm = &init_mm, \ .run_list = LIST_HEAD_INIT(tsk.run_list), \ : :}

Page 22: Linux Operating System 許 富 皓

22

Insert and Delete a Process Descriptor from the Process List The SET_LINKS and REMOVE_LINKS

macros are used to insert and to remove a process descriptor in the process list, respectively.

These macros also take care of the parenthood relationship of the process (see the section "How Processes Are Organized" later in this chapter).

Page 23: Linux Operating System 許 富 皓

23

Scans the Whole Process List with Macro for_each_process#define for_each_process(p) \ for (p=&init_task; (p=list_entry((p)->tasks.next, \ struct task_struct, tasks)) != &init_task; )

The macro starts by moving PAST init_task to the next task and continues until it reaches init_task again (thanks to the circularity of the list).

At each iteration, the variable p passed as the argument of the macro contains the address of the currently scanned process descriptor, as returned by the list_entry macro.

Page 24: Linux Operating System 許 富 皓

24

Example Macro for_each_process scans the whole

thread group leader list. The macro is the loop control statement after

which the kernel programmer supplies the loop.

e.g. counter=1; /* for init_task */

for_each_process(t)

{ if(t->state==TASK_RUNNING)

++counter;

}

Page 25: Linux Operating System 許 富 皓

25

The Lists of TASK_RUNNING Processes – in Early Linux Version When looking for a new process to run on a CPU,

the kernel has to consider only the runnable processes (that is, the processes in the TASK_RUNNING state).

Earlier Linux versions put all runnable processes in the same list called runqueue.

Because it would be too costly to maintain the list ordered according to process priorities, the earlier schedulers were compelled to scan the whole list in order to select the "best" runnable process.

Linux 2.6 implements the runqueue differently.

Page 26: Linux Operating System 許 富 皓

26

The Lists of TASK_RUNNING Processes – in Linux Version 2.6 Linux 2.6 achieves the scheduler speedup by

splitting the runqueue in many lists of runnable processes, one list per process priority.

Each task_struct descriptor includes a run_list field of type list_head.

If the process priority is equal to k (a value ranging between 0 and 139), the run_list field links the process descriptor into the list of runnable processes having priority k.

Page 27: Linux Operating System 許 富 皓

27

runqueue in a Multiprocessor System Furthermore, on a multiprocessor

system, each CPU has its own runqueue, that is, its own set of lists of processes.

Page 28: Linux Operating System 許 富 皓

28

Trade-off of runqueue

runqueue is a classic example of making a data structures more complex to improve performance: to make scheduler operations more efficient,

the runqueue list has been split into 140 different lists!

Page 29: Linux Operating System 許 富 皓

29

The Main Data Structures of a runqueue

The kernel must preserve a lot of data for every runqueue in the system.

The main data structures of a runqueue are the lists of process descriptors belonging to the runqueue.

All these lists are implemented by a single prio_array_t (= struct prio_array ) data structure.

Page 30: Linux Operating System 許 富 皓

30

struct prio_arraystruct prio_array

{ unsigned int nr_active;

unsigned long bitmap[BITMAP_SIZE];

struct list_head queue[MAX_PRIO];

};

nr_active: the number of process descriptors linked into the lists.

bitmap: a priority bitmap: each flag is set if and only if the corresponding priority list is not empty

queue: the 140 heads of the priority lists.

Page 31: Linux Operating System 許 富 皓

31

The prio and array Field of a Process Descriptor The prio field of the process descriptor

stores the dynamic priority of the process.

The array field is a pointer to the prio_array_t data structure of its current runqueue. P.S.: Each CPU has its own runqueue.

Page 32: Linux Operating System 許 富 皓

32

Scheduler-related Fields of a Process Descriptor

:

int prio

struct prev list_head run_list next

prio_array_t *array

:

unsigned int nr_active

unsigned long bitmap[5]

struct [0] list_head [1] queue[140][x]

prio_array_t

struct task_struct

:

int prio

struct prev list_head run_list next

prio_array_t *array

:

struct task_struct

:

int prio

struct prev list_head run_list next

prio_array_t *array

:

struct task_struct

.

.

.

Page 33: Linux Operating System 許 富 皓

33

The enqueue_task(p,array) function inserts a process descriptor into a runqueue list; its code is essentially equivalent to:

list_add_tail(&p->run_list, &array->queue[p->prio]);

__set_bit(p->prio, array->bitmap);

array->nr_active++;

p->array = array;

Function enqueue_task(p,array)

Page 34: Linux Operating System 許 富 皓

34

Function dequeue_task(p,array) Similarly, the dequeue_task(p,array)

function removes a process descriptor from a runqueue list.

Page 35: Linux Operating System 許 富 皓

35

Relationships among Processes

Processes created by a program have a parent/child relationship.

When a process creates multiple children, these children have sibling relationships.

Several fields must be introduced in a process descriptor to represent these relationships with respect to a given process P.

Processes 0 and 1 are created by the kernel. Process 1 (init) is the ancestor of all other

processes.

Page 36: Linux Operating System 許 富 皓

36

real_parent:points to the process descriptor of the process

that created P or points to the descriptor of process 1 (init) if

the parent process no longer exists. Therefore, when a user starts a background

process and exits the shell, the background process becomes the child of init.

Fields of a Process Descriptor Used to Express Parenthood Relationships (1)

Page 37: Linux Operating System 許 富 皓

37

Fields of a Process Descriptor Used to Express Parenthood Relationships (2) parent:

Points to the current parent of P this is the process that must be signaled when the

child process terminates. its value usually coincides with that of real_parent.

It may occasionally differ, such as when another process issues a ptrace( ) system call requesting that it be allowed to monitor P.

see the section "Execution Tracing" in Chapter 20.

Page 38: Linux Operating System 許 富 皓

38

Fields of a Process Descriptor Used to Express Parenthood Relationships (3) struct list_head children:

The head of the list containing all children created by P. This list is formed through the sibling field of the child

processes. struct list_head sibling:

The pointers to the next and previous elements in the list of the sibling processes, those that have the same parent as P.

P.S.:

/* children/sibling forms the list of my children plus the tasks I'm ptracing. */

struct list_head children; /* list of my children */ struct list_head sibling; /* linkage in my parent's children list */

Page 39: Linux Operating System 許 富 皓

39

The children Field of a Patent Process Points to the sibling Field of a Child Process

#define add_parent(p, parent)

list_add_tail(&(p)->sibling,&(parent)->children)

=========================================================

#define SET_LINKS(p)

do { if (thread_group_leader(p))

list_add_tail(&(p)->tasks,&init_task.tasks);

add_parent(p, (p)->parent);

} while (0)

Page 40: Linux Operating System 許 富 皓

40

Iterate over a Process’s Children

Similarly, it is possible to iterate over a process's children with

#define list_for_each(pos, head) \ for(pos = (head)->next; pos != (head); pos = pos->next)

struct task_struct *task; struct list_head *list; list_for_each(list, &current->children) { task = list_entry(list, struct task_struct, sibling); /* task now points to one of current's children */ }

Page 41: Linux Operating System 許 富 皓

41

Example

Process P0 successively created P1, P2, and P3. Process P3, in turn, created process P4.

children/sibling fields forms the list of children of P0 (those links marked with )

Page 42: Linux Operating System 許 富 皓

42

Process Groups

Modern Unix operating systems introduce the notion of process groups to represent a job abstraction. For example,

in order to execute the command line: $ ls | sort | more

a shell that supports process groups, such as bash, creates a new group for the three processes corresponding to ls, sort, and more.

In this way, the shell acts on the three processes as if they were a single entity (the job, to be precise).

Page 43: Linux Operating System 許 富 皓

Process Groups [waikato]

One important feature is that it is possible to send a signal to every process in the group.

Process groups are used for distribution of signals, andby terminals to arbitrate requests for their

input and output.

Page 44: Linux Operating System 許 富 皓

Process Groups [waikato]

Foreground Process Groups A foreground process has read and write access to

the terminal. Every process in the foreground receives SIGINT (^C

) SIGQUIT (^\ ) and SIGTSTP signals. Background Process Groups

A background process does not have read access to the terminal.

If a background process attempts to read from its controlling terminal its process group will be sent a SIGTTIN.

Page 45: Linux Operating System 許 富 皓

45

Group Leaders and Process Group IDs Each process descriptor includes a field

containing the process group ID. Each group of processes may have a

group leader, which is the process whose PID coincides with the process group ID.

Page 46: Linux Operating System 許 富 皓

46

Creation of a New Process Group [

Bhaskar]

A newly created process is initially inserted into the process group of its parent.

The shell after doing a fork would explicitly call setpgid to set the process group of the child.

The process group is explicitly set for purposes of job control.

When a command is given at the shell prompt, that process or processes (if there is piping) is assigned a new process group.

Page 47: Linux Operating System 許 富 皓

47

Login Sessions

Modern Unix kernels also introduce login sessions.

Informally, a login session contains all processes that are descendants of the process that has started a working session on a specific terminal -- usually, the first command shell process created for the user.

Page 48: Linux Operating System 許 富 皓

48

Login Sessions vs. Process Groups

All processes in a process group must be in the same login session.

A login session may have several process groups active simultaneously; one of these process groups is always in the

foreground, which means that it has access to the terminal.

The other active process groups are in the background. When a background process tries to access the terminal, it

receives a SIGTTIN or SIGTTOUT signal. In many command shells, the internal commands bg and fg can

be used to put a process group in either the background or the foreground.

Page 49: Linux Operating System 許 富 皓

49

Other Relationship between Processes

There exist other relationships among processes: a process can be a leader of a process

group or of a login session, it can be a leader of a thread group, and it can also trace the execution of other

processes (see the section "Execution Tracing" in Chapter 20).

Page 50: Linux Operating System 許 富 皓

50

Other Process Relationship Fields of a Process Descriptor P (1) struct task_struct * group_leader

Process descriptor pointer of the group leader of P.

signal->pgrp PID of the group leader of P.

pid_t tgid PID of the thread group leader of P.

signal->session PID of the login session leader of P.

Page 51: Linux Operating System 許 富 皓

51

Other Process Relationship Fields of a Process Descriptor P (2)

struct list_head ptrace_childrenThe head of a list containing all children of P

being traced by a debugger. struct list_head ptrace_list

The pointers to the next and previous elements in the real parent's list of traced processes (used when P is being traced).

Page 52: Linux Operating System 許 富 皓

More about ptrace_list and ptrace_children[1]

/*

ptrace_list/ptrace_children forms

the list of my children that were stolen by a

ptracer.

*/

52

Page 53: Linux Operating System 許 富 皓

More about ptrace_list and ptrace_children[2]

void __ptrace_link(task_t *child, task_t *new_parent)

{

if (!list_empty(&child->ptrace_list))

BUG();

if (child->parent == new_parent)

return;

list_add(&child->ptrace_list, &child->parent->ptrace_children);

REMOVE_LINKS(child);

child->parent = new_parent;

SET_LINKS(child);

}

53

Page 54: Linux Operating System 許 富 皓

54

The pidhash Table and Chained Lists

In several circumstances, the kernel must be able to derive the process descriptor pointer corresponding to a PID. This occurs, for instance, in servicing the kill( )

system call. When process P1 wishes to send a signal to another process,

P2, it invokes the kill( ) system call specifying the PID of P2 as the parameter.

The kernel derives the process descriptor pointer from the PID and then extracts the pointer to the data structure that records the pending signals from P2's process descriptor.

Page 55: Linux Operating System 許 富 皓

55

Multiple Hash Tables

The process descriptor includes fields that represent different types of PID, and each type of PID requires its own hash table.

Due to the above reason, FOUR hash tables have been introduced to derive the process descriptor pointer corresponding to a PID.

Page 56: Linux Operating System 許 富 皓

56

The Four Hash Tables and Corresponding

Fields in the Process Descriptor Hash Table Type Field

NameDescription

PIDTYPE_PID pid PID of the process

PIDTYPE_TGID tgid PID of thread group leader process

PIDTYPE_PGID pgrp PID of the group leader process

PIDTYPE_SID session PID of the session leader process

Page 57: Linux Operating System 許 富 皓

57

Array pid_hash

The four hash tables are dynamically allocated during the kernel initialization phase, and their addresses are stored in the pid_hash array.

static struct hlist_head *pid_hash[PIDTYPE_MAX];

struct hlist_head *

pid_hash

Page 58: Linux Operating System 許 富 皓

58

Initialization of the Four Hash Tables -- pidhash_init(void) for (i = 0; i < PIDTYPE_MAX; i++) { pid_hash[i]=alloc_bootmem(pidhash_size* sizeof(*(pid_hash[i]))); if (!pid_hash[i]) panic("Could not alloc pidhash!\n"); for (j = 0; j < pidhash_size; j++) INIT_HLIST_HEAD(&pid_hash[i][j]); }

Page 59: Linux Operating System 許 富 皓

59

The Four Initialized Hash Tables

struct hlist_head *

pid_hash

struct

hlist_head

[0] [1] [2] [3] [2047]

struct

hlist_head

[0] [1] [2] [3] [2047]

Page 60: Linux Operating System 許 富 皓

60

PID and Table Index The PID is transformed into a table index using the pid_hashfn macro,

which expands to: #define pid_hashfn(x) hash_long((unsigned long) x, pidhash_shift)

The pidhash_shift variable stores the length in bits of a table index (11, in our example).

The hash_long( ) function is used by many hash functions; on a 32-bit architecture it is essentially equivalent to:

unsigned long hash_long(unsigned long val, unsigned int bits)

{ unsigned long hash = val * 0x9e370001UL; return hash >> (32 - bits); }

Because in our example pidhash_shift is equal to 11, pid_hashfn yields values ranging between 0 and 211 - 1 = 2047.

Page 61: Linux Operating System 許 富 皓

61

Colliding

As every basic computer science course explains, a hash function does not always ensure a one-to-one correspondence between PIDs and table indexes.

Two different PIDs that hash into the same table index are said to be colliding.

Linux uses chaining to handle colliding PIDs; each table entry is the head of a doubly linked list of colliding process descriptors.

Page 62: Linux Operating System 許 富 皓

62

Colliding Example The following figure illustrates a PID hash table with two lists.

The processes having PIDs 2,890 and 29,384 hash into the 200th element of the table, while the process having PID 29,385 hashes into the 1,466th element of the table.

Page 63: Linux Operating System 許 富 皓

63

Properties of Data Structures Used in the PID Hash Tables

The data structures used in the PID hash tables must keep track of the relationships between the processes.

As an example, suppose that the kernel must retrieve all processes belonging to a given thread group, that is, all processes whose tgid field is equal to a given number. Looking in the hash table for the given thread group number

returns just one process descriptor, that is, the descriptor of the thread group leader.

To quickly retrieve the other processes in the group, the kernel must maintain a list of processes for each thread group.

The same situation arises when looking for the processes belonging to a given login session or belonging to a given process group.

Page 64: Linux Operating System 許 富 皓

64

Properties of a Process Descriptor’s pids Field The PID hash tables' data structures solve

all these problems, because they allow the definition of a list of processes for any PID number included in a hash table.

The core data structure is an array of four pid structures embedded in the pids field of the process descriptor.

Page 65: Linux Operating System 許 富 皓

65

struct pid

struct pid {int nr; struct hlist_node pid_chain; struct list_head pid_list; };

nr: The PID number.

pid_chain: The links to the next and previous elements in the hash

chain list. pid_list:

The head of the per-PID list.

Page 66: Linux Operating System 許 富 皓

66

The PID Hash Tables

Page 67: Linux Operating System 許 富 皓

67

pidhash-related Functions

do_each_task_pid(nr, type, task) while_each_task_pid(nr, type, task)

find_task_by_pid_type(type, nr) find_task_by_pid(nr) attach_pid(task, type, nr) detach_pid(task, type) next_thread(task)