linux operating system 許富皓

1

Linux Operating System

許富皓

2

Chapter 3 Processes

3

Non-circular Doubly Linked Lists

A sequence of nodes chained together through two kinds of pointers: a pointer to its previous node and a pointer to its subsequent node.

Each node has two links: one points to the previous node, or points to a null value or empty list if

it is the first node and one points to the next, or points to a null value or empty list if it is the

final node.

4

Problems with Doubly Linked Lists

The Linux kernel contains hundred of various data structures that are linked together through their respective doubly linked lists. Drawbacks:

a waste of programmers' efforts to implement a set of primitive operations, such as,

initializing the list inserting and deleting an element scanning the list.

a waste of memory to replicate the primitive operations for each different list.

5

Data Structure struct list_head

Therefore, the Linux kernel defines the struct list_head data structure, whose only fields next and prev represent the forward and back pointers of a generic doubly linked list element, respectively.

It is important to note, however, that the pointers in a list_head field store the addresses of other list_head fields rather than the addresses of the whole data structures in which

the list_head structure is included.

6

list_head

A Circular Doubly Linked List with Three Elements

next prev

next prev

list_head

next prev

list_head

next prev

list_head

data structure 1

data structure 2

data structure 3

7

Macro LIST_HEAD(list_name) A new list is created by using the LIST_HEAD(list_name) macro. it declares a new variable named list_name of type list_head, which is a dummy first element that acts as a placeholder for the head of the new list.

and it initializes the prev and next fields of the list_head data structure so as to point to the list_name variable itself.

8

Code of Macro LIST_HEAD(list_name)struct list_head

{

struct list_head *next, *prev;

};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define LIST_HEAD(name) \

struct list_head name = LIST_HEAD_INIT(name)

9

An Empty Doubly Linked List

LIST_HEAD(my_list)

prev

next

struct list_head my_list

p

1

n

list_add(n,p) list_add_tail(n,p) list_del(p) list_empty(p)

10

Relative Functions and Macros (1)

. . .2 n

11

Relative Functions and Macros (2) list_for_each(p,h) list_for_each_entry(p,h,m) list_entry(p,t,m)

Returns the address of the data structure of type t in which the list_head field that has the name m and the address p is included.

12

Example of list_entry(p,t,m)

sturct class{

char name[20];

char teacher[20];

struct student_pointer *student;

struct list_head link;

};

struct class grad_1A;

struct list_head *poi;

poi=&(grad_1A.link);

list_entry(poi,struct class,link)

&grad_1A

name

(20 bytes)

teacher

(20 bytes)

student

(4 bytes)

link next (4 bytes)

prev(4 bytes)

13

Code of list_entry

typedef unsigned int __kernel_size_t; typedef __kernel_size_t size_t;

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

#define list_entry (ptr, type, member) \

({ \

const typeof(((type *)0)->member)*__mptr= (ptr); \

(type *)((char *)__mptr - offsetof(type,member) );\

})

14

Explanation of list_entry(p,t,m)

list_entry(poi,struct class,link)

((struct class *)((char *)(poi)-(unsigned long)(&((struct class *)0)->link)))

name

(20 bytes)

teacher

(20 bytes)

student

(4 bytes)

link next (4 bytes)

prev(4 bytes)

poi

offset

list_entry(…)

#define list_entry(ptr, type, member) \

((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member)))

poi

= list_entry()

offset

- offset

15

hlist_head The Linux kernel 2.6 supports another kind

of doubly linked list, which mainly differs from a list_head list because it is NOT circular.

It is mainly used for hash tables. The list head is stored in an hlist_head

data structure, which is simply a pointer to the first element in the list (NULL if

the list is empty).

16

hlist_node Each element is represented by an hlist_node data structure, which includes a pointer next to the next element and a pointer pprev to the next field of the

previous element. Because the list is not circular, the pprev

field of the first element and the next field of the last element are set to NULL.

17

A Non-circular Doubly Linked List

struct hlist_node

*first pprev next pprev next pprev next

struct hlist_headstruct hlist_node

18

Functions and Macro for hlist_head and hlist_node The list can be handled by means of

several helper functions and macros similar to those listed in previous sixth slide: hlist_add_head( ), hlist_del( ), hlist_empty( ), hlist_entry, hlist_for_each_entry, and so on.

19

The Process ListThe process list is a circular

doubly linked list that links the process descriptors of all existing thread group leaders:Each task_struct structure includes a tasks

field of type list_head whose prev and next fields point, respectively, to the previous and to the next task_struct element’s tasks field.

20

The Head of the Process List

The head of the process list is the init_task task_struct descriptor; it is the process descriptor of the so-called process 0 or swapper (see the section "Kernel Threads" later in this chapter).

The tasks->prev field of init_task points to the tasks field of the process descriptor inserted last in the list.

21

Code for init_taskstruct task_struct init_task = INIT_TASK(init_task);

#define INIT_TASK(tsk) \ { \ .state = 0, \ .thread_info = &init_thread_info, \ .usage = ATOMIC_INIT(2), \ .flags = 0, \ .lock_depth = -1, \ .prio = MAX_PRIO-20, \ .static_prio = MAX_PRIO-20, \ .policy = SCHED_NORMAL, \ .cpus_allowed = CPU_MASK_ALL, \ .mm = NULL, \ .active_mm = &init_mm, \ .run_list = LIST_HEAD_INIT(tsk.run_list), \ : :}

22

Insert and Delete a Process Descriptor from the Process List The SET_LINKS and REMOVE_LINKS

macros are used to insert and to remove a process descriptor in the process list, respectively.

These macros also take care of the parenthood relationship of the process (see the section "How Processes Are Organized" later in this chapter).

23

Scans the Whole Process List with Macro for_each_process#define for_each_process(p) \ for (p=&init_task; (p=list_entry((p)->tasks.next, \ struct task_struct, tasks)) != &init_task; )

The macro starts by moving PAST init_task to the next task and continues until it reaches init_task again (thanks to the circularity of the list).

At each iteration, the variable p passed as the argument of the macro contains the address of the currently scanned process descriptor, as returned by the list_entry macro.

24

Example Macro for_each_process scans the whole

thread group leader list. The macro is the loop control statement after

which the kernel programmer supplies the loop.

e.g. counter=1; /* for init_task */

for_each_process(t)

{ if(t->state==TASK_RUNNING)

++counter;

}

25

The Lists of TASK_RUNNING Processes – in Early Linux Version When looking for a new process to run on a CPU,

the kernel has to consider only the runnable processes (that is, the processes in the TASK_RUNNING state).

Earlier Linux versions put all runnable processes in the same list called runqueue.

Because it would be too costly to maintain the list ordered according to process priorities, the earlier schedulers were compelled to scan the whole list in order to select the "best" runnable process.

Linux 2.6 implements the runqueue differently.

26

The Lists of TASK_RUNNING Processes – in Linux Version 2.6 Linux 2.6 achieves the scheduler speedup by

splitting the runqueue in many lists of runnable processes, one list per process priority.

Each task_struct descriptor includes a run_list field of type list_head.

If the process priority is equal to k (a value ranging between 0 and 139), the run_list field links the process descriptor into the list of runnable processes having priority k.

27

runqueue in a Multiprocessor System Furthermore, on a multiprocessor

system, each CPU has its own runqueue, that is, its own set of lists of processes.

28

Trade-off of runqueue

runqueue is a classic example of making a data structures more complex to improve performance: to make scheduler operations more efficient,

the runqueue list has been split into 140 different lists!

29

The Main Data Structures of a runqueue

The kernel must preserve a lot of data for every runqueue in the system.

The main data structures of a runqueue are the lists of process descriptors belonging to the runqueue.

All these lists are implemented by a single prio_array_t (= struct prio_array ) data structure.

30

struct prio_arraystruct prio_array

{ unsigned int nr_active;

unsigned long bitmap[BITMAP_SIZE];

struct list_head queue[MAX_PRIO];

};

nr_active: the number of process descriptors linked into the lists.

bitmap: a priority bitmap: each flag is set if and only if the corresponding priority list is not empty

queue: the 140 heads of the priority lists.

31

The prio and array Field of a Process Descriptor The prio field of the process descriptor

stores the dynamic priority of the process.

The array field is a pointer to the prio_array_t data structure of its current runqueue. P.S.: Each CPU has its own runqueue.

32

Scheduler-related Fields of a Process Descriptor

:

int prio

struct prev list_head run_list next

prio_array_t *array

:

unsigned int nr_active

unsigned long bitmap[5]

struct [0] list_head [1] queue[140][x]

prio_array_t

struct task_struct

:

int prio


prio_array_t *array

:

struct task_struct

:

int prio


prio_array_t *array

:

struct task_struct

.

.

.

33

The enqueue_task(p,array) function inserts a process descriptor into a runqueue list; its code is essentially equivalent to:

list_add_tail(&p->run_list, &array->queue[p->prio]);

__set_bit(p->prio, array->bitmap);

array->nr_active++;

p->array = array;

Function enqueue_task(p,array)

34

Function dequeue_task(p,array) Similarly, the dequeue_task(p,array)

function removes a process descriptor from a runqueue list.

35

Relationships among Processes

Processes created by a program have a parent/child relationship.

When a process creates multiple children, these children have sibling relationships.

Several fields must be introduced in a process descriptor to represent these relationships with respect to a given process P.

Processes 0 and 1 are created by the kernel. Process 1 (init) is the ancestor of all other

processes.

36

real_parent:points to the process descriptor of the process

that created P or points to the descriptor of process 1 (init) if

the parent process no longer exists. Therefore, when a user starts a background

process and exits the shell, the background process becomes the child of init.

Fields of a Process Descriptor Used to Express Parenthood Relationships (1)

37

Fields of a Process Descriptor Used to Express Parenthood Relationships (2) parent:

Points to the current parent of P this is the process that must be signaled when the

child process terminates. its value usually coincides with that of real_parent.

It may occasionally differ, such as when another process issues a ptrace( ) system call requesting that it be allowed to monitor P.

see the section "Execution Tracing" in Chapter 20.

38

Fields of a Process Descriptor Used to Express Parenthood Relationships (3) struct list_head children:

The head of the list containing all children created by P. This list is formed through the sibling field of the child

processes. struct list_head sibling:

The pointers to the next and previous elements in the list of the sibling processes, those that have the same parent as P.

P.S.:

/* children/sibling forms the list of my children plus the tasks I'm ptracing. */

struct list_head children; /* list of my children */ struct list_head sibling; /* linkage in my parent's children list */

39

The children Field of a Patent Process Points to the sibling Field of a Child Process

#define add_parent(p, parent)

list_add_tail(&(p)->sibling,&(parent)->children)

=========================================================

#define SET_LINKS(p)

do { if (thread_group_leader(p))

list_add_tail(&(p)->tasks,&init_task.tasks);

add_parent(p, (p)->parent);

} while (0)

40

Iterate over a Process’s Children

Similarly, it is possible to iterate over a process's children with

#define list_for_each(pos, head) \ for(pos = (head)->next; pos != (head); pos = pos->next)

struct task_struct *task; struct list_head *list; list_for_each(list, &current->children) { task = list_entry(list, struct task_struct, sibling); /* task now points to one of current's children */ }

41

Example

Process P0 successively created P1, P2, and P3. Process P3, in turn, created process P4.

children/sibling fields forms the list of children of P0 (those links marked with )

42

Process Groups

Modern Unix operating systems introduce the notion of process groups to represent a job abstraction. For example,

in order to execute the command line: $ ls | sort | more

a shell that supports process groups, such as bash, creates a new group for the three processes corresponding to ls, sort, and more.

In this way, the shell acts on the three processes as if they were a single entity (the job, to be precise).

Process Groups [waikato]

One important feature is that it is possible to send a signal to every process in the group.

Process groups are used for distribution of signals, andby terminals to arbitrate requests for their

input and output.

Process Groups [waikato]

Foreground Process Groups A foreground process has read and write access to

the terminal. Every process in the foreground receives SIGINT (^C

) SIGQUIT (^\ ) and SIGTSTP signals. Background Process Groups

A background process does not have read access to the terminal.

If a background process attempts to read from its controlling terminal its process group will be sent a SIGTTIN.

45

Group Leaders and Process Group IDs Each process descriptor includes a field

containing the process group ID. Each group of processes may have a

group leader, which is the process whose PID coincides with the process group ID.

46

Creation of a New Process Group [

Bhaskar]

A newly created process is initially inserted into the process group of its parent.

The shell after doing a fork would explicitly call setpgid to set the process group of the child.

The process group is explicitly set for purposes of job control.

When a command is given at the shell prompt, that process or processes (if there is piping) is assigned a new process group.

47

Login Sessions

Modern Unix kernels also introduce login sessions.

Informally, a login session contains all processes that are descendants of the process that has started a working session on a specific terminal -- usually, the first command shell process created for the user.

48

Login Sessions vs. Process Groups

All processes in a process group must be in the same login session.

A login session may have several process groups active simultaneously; one of these process groups is always in the

foreground, which means that it has access to the terminal.

The other active process groups are in the background. When a background process tries to access the terminal, it

receives a SIGTTIN or SIGTTOUT signal. In many command shells, the internal commands bg and fg can

be used to put a process group in either the background or the foreground.

49

Other Relationship between Processes

There exist other relationships among processes: a process can be a leader of a process

group or of a login session, it can be a leader of a thread group, and it can also trace the execution of other

processes (see the section "Execution Tracing" in Chapter 20).

50

Other Process Relationship Fields of a Process Descriptor P (1) struct task_struct * group_leader

Process descriptor pointer of the group leader of P.

signal->pgrp PID of the group leader of P.

pid_t tgid PID of the thread group leader of P.

signal->session PID of the login session leader of P.

51

Other Process Relationship Fields of a Process Descriptor P (2)

struct list_head ptrace_childrenThe head of a list containing all children of P

being traced by a debugger. struct list_head ptrace_list

The pointers to the next and previous elements in the real parent's list of traced processes (used when P is being traced).

More about ptrace_list and ptrace_children[1]

/*

ptrace_list/ptrace_children forms

the list of my children that were stolen by a

ptracer.

*/

52

More about ptrace_list and ptrace_children[2]

void __ptrace_link(task_t *child, task_t *new_parent)

{

if (!list_empty(&child->ptrace_list))

BUG();

if (child->parent == new_parent)

return;

list_add(&child->ptrace_list, &child->parent->ptrace_children);

REMOVE_LINKS(child);

child->parent = new_parent;

SET_LINKS(child);

}

53

54

The pidhash Table and Chained Lists

In several circumstances, the kernel must be able to derive the process descriptor pointer corresponding to a PID. This occurs, for instance, in servicing the kill( )

system call. When process P1 wishes to send a signal to another process,

P2, it invokes the kill( ) system call specifying the PID of P2 as the parameter.

The kernel derives the process descriptor pointer from the PID and then extracts the pointer to the data structure that records the pending signals from P2's process descriptor.

55

Multiple Hash Tables

The process descriptor includes fields that represent different types of PID, and each type of PID requires its own hash table.

Due to the above reason, FOUR hash tables have been introduced to derive the process descriptor pointer corresponding to a PID.

56

The Four Hash Tables and Corresponding

Fields in the Process Descriptor Hash Table Type Field

NameDescription

PIDTYPE_PID pid PID of the process

PIDTYPE_TGID tgid PID of thread group leader process

PIDTYPE_PGID pgrp PID of the group leader process

PIDTYPE_SID session PID of the session leader process

57

Array pid_hash

The four hash tables are dynamically allocated during the kernel initialization phase, and their addresses are stored in the pid_hash array.

static struct hlist_head *pid_hash[PIDTYPE_MAX];

struct hlist_head *

pid_hash

58

Initialization of the Four Hash Tables -- pidhash_init(void) for (i = 0; i < PIDTYPE_MAX; i++) { pid_hash[i]=alloc_bootmem(pidhash_size* sizeof(*(pid_hash[i]))); if (!pid_hash[i]) panic("Could not alloc pidhash!\n"); for (j = 0; j < pidhash_size; j++) INIT_HLIST_HEAD(&pid_hash[i][j]); }

59

The Four Initialized Hash Tables

struct hlist_head *

pid_hash

struct

hlist_head

[0] [1] [2] [3] [2047]

struct

hlist_head

[0] [1] [2] [3] [2047]

60

PID and Table Index The PID is transformed into a table index using the pid_hashfn macro,

which expands to: #define pid_hashfn(x) hash_long((unsigned long) x, pidhash_shift)

The pidhash_shift variable stores the length in bits of a table index (11, in our example).

The hash_long( ) function is used by many hash functions; on a 32-bit architecture it is essentially equivalent to:

unsigned long hash_long(unsigned long val, unsigned int bits)

{ unsigned long hash = val * 0x9e370001UL; return hash >> (32 - bits); }

Because in our example pidhash_shift is equal to 11, pid_hashfn yields values ranging between 0 and 211 - 1 = 2047.

61

Colliding

As every basic computer science course explains, a hash function does not always ensure a one-to-one correspondence between PIDs and table indexes.

Two different PIDs that hash into the same table index are said to be colliding.

Linux uses chaining to handle colliding PIDs; each table entry is the head of a doubly linked list of colliding process descriptors.

62

Colliding Example The following figure illustrates a PID hash table with two lists.

The processes having PIDs 2,890 and 29,384 hash into the 200th element of the table, while the process having PID 29,385 hashes into the 1,466th element of the table.

63

Properties of Data Structures Used in the PID Hash Tables

The data structures used in the PID hash tables must keep track of the relationships between the processes.

As an example, suppose that the kernel must retrieve all processes belonging to a given thread group, that is, all processes whose tgid field is equal to a given number. Looking in the hash table for the given thread group number

returns just one process descriptor, that is, the descriptor of the thread group leader.

To quickly retrieve the other processes in the group, the kernel must maintain a list of processes for each thread group.

The same situation arises when looking for the processes belonging to a given login session or belonging to a given process group.

64

Properties of a Process Descriptor’s pids Field The PID hash tables' data structures solve

all these problems, because they allow the definition of a list of processes for any PID number included in a hash table.

The core data structure is an array of four pid structures embedded in the pids field of the process descriptor.

65

struct pid

struct pid {int nr; struct hlist_node pid_chain; struct list_head pid_list; };

nr: The PID number.

pid_chain: The links to the next and previous elements in the hash

chain list. pid_list:

The head of the per-PID list.

66

The PID Hash Tables

67

pidhash-related Functions

do_each_task_pid(nr, type, task) while_each_task_pid(nr, type, task)

find_task_by_pid_type(type, nr) find_task_by_pid(nr) attach_pid(task, type, nr) detach_pid(task, type) next_thread(task)

linux operating system 許 富 皓

Documents

linux operating system 許富皓