model-based kernel testing for concurrency bugs through counter example replay moonzoo kim, shin...

16
Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim , Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST, South Korea Taeho Kim Embedded System Group ETRI, South Korea

Upload: jasmin-ford

Post on 29-Dec-2015

253 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Model-based Kernel Testing for Concurrency Bugs through Counter

Example Replay

Moonzoo Kim, Shin Hong, Changki HongProvable Software Lab.

CS Dept. KAIST, South Korea

Taeho KimEmbedded System Group

ETRI, South Korea

Page 2: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

Introduction• There are increasing need for operating systems customized for

embedded systems– Mobile phones, portable players, etc

• Conventional testing cannot provide satisfactory reliability to op-erating system kernel – Complexity of code– Multi-threaded program– Difficulty of unit testing– Lack of proper testing tools

• A model checking solves some of these problems through abstrac-tion, but there is a gap between a real kernel and its abstract model

2 / 16

Þ To combine both model checking and testing through replaying a counter example on the real code

Page 3: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

Summary of the Approach• MOdel-based KERnel Testing (MOKERT) framework

3 / 16

FormalModel

Req.Property Model

CheckerErrorTrace

Model Extractor

Translation Script 𝒯

Target pro-gram

Automated Instru-mentation𝒯-1

OK Model checking

Testing

Model extraction

Page 4: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

Traditional Model Checking v.s. MOKERT

4 / 16

FormalModel

Req.Property

Model Checker

Manual modeling

Manual Analysis

Target pro-gram

ErrorTrace

Traditional model checking MOKERT framework

Modeling Manual (Semi) Automatic

Counter example analysis Manual Automatic

OK

Manual Script Writing

FormalModel

Req.Property

SpinErrorTrace

Modex

Translation Script 𝒯

Target pro-gram

𝒯-1

Automated Instrumenta-

tion

OK

Page 5: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

Model Extraction• Similarity between C and Promela

– Control statements (if, goto, loop)– Support complex data structures (array, typedef)

• Without a translation script– C control statements are automatically translated– Others C statements are embedded into a promela

model using c_expr{…} and c_code{…}

5 / 16

68: do { 69: spin_unlock(&proc_subdir_lock); 70: if (filldir(dirent, de->name, de->namelen,

filp->f_pos, de->low_ino, de->mode >> 12) < 0 ) 71: goto out; 72: spin_lock(&proc_subdir_lock);

… 77: } while (de);

Target C code Promela code123: do124: ::125: spin_unlock(proc_subdir_lock); /* line 69 */126: if127: :: false; /* line 70 */128: goto out; 129: :: else; /* line 72 */130: fi;131: spin_lock(proc_subdir_lock); /* line 72 */ …137: od;

Patterns in C code Corresponding Promela codespin_unlock(&proc_subdir_lock); spin_unlock(proc_subdir_lock)(filldir(dirent… falsespin_lock(&proc_subdir_lock); spin_lock(proc_subdir_lock)

Translation table

• Modex translate a C program into a Promela model based on a trans-lation script

- A translation script consists of translation tables for functions and an environment model

Page 6: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

Architecture of MOKERT

6 / 16

Program

Translation script

Instrumented program

Req.property

Spin (2)

Modex (1)

Target C program

Promela Model

Mapping info

Instrumentation module (3)

Okay

Counter example

Static phase

Targetthreads

Controllerthread

Monitorthread

Process state

Scheduling signals

Controllerstate

Testdriver

Run-time phase

User input

Objects

ProcessLegend

Page 7: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

Automatic Instrumentation

7 / 16

• Target C program is automatically instrumented based on the mapping in-formation and the counter example.

Target C program 68 : do {69: spin_unlock (&proc_subdir_lock);70: if (filldir(dirent, de->name, de->namelen, filp->f_pos, de->low_ino, de->mode >> 12) < 0)…77: } while (de)

Promela model…123 : do 124: :: 125: spin_unlock (proc_subdir_lock); /* line 69*/126: if127: :: false; /* line 70 */

Counter example...44: proc 1 (proc_readdir) line 116 [( (!i) )]45: proc 1 (proc_readdir) line 125 [((proc_subdir_lock==1))]46: proc 2 (remove_proc_entry) line 156 [((proc_subdir_lock==0))]47: proc 2 (remove_proc_entry) line 157 [p = proc_parent.next]

Context switch occurs after the exe-cution of line 125 of the promela model

insert a probe : the probe enforces context switching

Page 8: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

Automatic Instrumentation (cont.)• An example of a probe

8 / 16

Target C program 68 : do { /* filldir passes info to user space */69: spin_unlock (&proc_subdir_lock);70: if (filldir(dirent, de->name, de->namelen, filp->f_pos, de->low_ino, de->mode >> 12) < 0)…76: de = de->next;77: } while (de);

Instrumented C program 68 : do {69: spin_unlock (&proc_subdir_lock); if (test_start == true) { m_pid = get_model_pid(current_pid); switch(m_pid) { case m_pid_1: switch (context_switching_point[m_pid][69]) {

case 3: notify_controller(); wait_for_signal(); default: context_switching_point[m_pid][69]+

+; break;}

} 70: if (filldir(dirent, de->name, de->namelen, filp->f_pos, de->low_ino, de->mode >> 12) < 0)…77: } while (de);

Suppose that a counter example has a context switching after the 3rd execution of line 69 of the C code

notify_controller() : notify the controller thread that context switching should occur

wait_for_signal() : wait for the signal from the controller thread to continue execution while(runnable[m_pid]==false) { sys_sched_yield(); }

Page 9: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

Detection of Replay Failure• The monitoring thread decides whether a counter example is replayed

correctly or not based on the following information– Probe entering/leaving logs

• Each probe reports its entering/exit to the monitoring thread. – Process status

• A user can read a process's status through proc file system – Target threads' call-stack

• Example (see case study 2)– If a target thread enters into a probe but not leave and – the status of the thread is "Running“ for long time,

• then the monitoring thread reports that the counter example replay failed, since the target thread is waiting for the controller thread's signal indefinitely

9 / 16

Page 10: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay 10 / 20

Context Switch

Case Study 1: Data Race between proc_readdir() and remove_proc_entry()

Context Switch

• Linux Changelog 2.6.22 reported that proc_readdir() and remove_proc_entry() had a data race bug in the Linux 2.6.21 kernel.– proc_readdir() : 67 lines (68 lines in Promela)– remove_proc_entry() : 35 lines (36 lines in Promela) – Test driver (environment): 76 lines (24 lines in Promela)– Two graduate students spent 2 days for replaying the bug

Page 11: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

• Replaying the data race bug– Environment setting

11 / 16

“Jan” “Mar” “Apr”“Feb”

“Month”

x

Case Study 1: Data Race between proc_readdir() and remove_proc_entry()

“Jan” “Feb”

“Month”

- Result of replaying the data race bug

“Jan” “Feb” “Mar” “Apr”

Removed directory entry

The first directory entry x

free_proc_entry(de) does not actually free de, but modifies incoming and outgoing links

Page 12: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

• We found a data race in a Promela model for ext2_readdir() (78 lines/65 lines) and ext2_rmdir() (15 lines/15 lines) in Linux 2.6.25 as follows:

• However, the counter example could not be replayed on the real Linux kernel– The monitor thread found that a thread for ext2_readdir() was

waiting for the signal from the controller thread without progress 12 / 16

Case Study 2: Model Refinement of ext2_readdir() and ext2_rmdir()

ext2_readdir() ext2_rmdir() 67a: if(de->inode) { …74a: offset = (char *) de – kaddr;

93b: struct inode * inode = dentry->d_inode; …97b: if (ext2_empty_dir(inode)) {98b: err = ext2_unlink(dir, dentry);

75a: over = filldir(dirent, de->name, de->name_len,76a: (n<PAGE_CACHE_SHIFT) | offset,77a: le32_to_cpu(de->inode), d_type);

Page 13: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

• Replaying the data race bug– vfs_readdir() and do_rmdir() were invoked instead, since ext2_readdir() and

ext2_rmdir() could be invoked only from these two functions – We found that vfs_readdir() and do_rmdir() protected ext2_readir() and

ext2_rmdir() through mutex, thus preventing the data race.

13 / 16

Case Study 2: Model Refinement of ext2_readdir() and ext2_rmdir()

22 : int vfs_readdir(struct file *file, filldir_t filler, void *buf) {…33: mutex_lock(&inode->i_mutex); …36: res = file->f_op->readdir(file, buf, filler); …39: mutex_unlock(&inode->i_mutex); …

ext2_readdir(struct file *filp, void *dirent, filldir_t filldir) {

2038: static long do_rmdir(int dfd, const char __user *pathname) { …2064: mutex_lock_nested(… ); …2069: error = vfs_rmdir (…); …2072: mutex_unlock(…); …

2005: int vfs_rmdir(struct inode *dir, struct dentry *dentry) { …2024: error = dir->i_op->rmdir(dir, dentry) … ext2_rmdir( struct inode * dir, struct dentry *dentry) {

(a) Calling sequence from vfs_readdir() to ext2_readdir()

(b) Calling sequence from do_rmdir() to ext2_readdir()

Page 14: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

• MOKERT was applied to verify the proc file system in Linux 2.6.28.2 and found a data race between remove_proc_entry()’s– It may cause a null pointer dereference

14 / 16

Case Study 3: Data Race between remove_proc_entry() ‘s

remove_proc_entry(“dir1”, null) remove_proc_entry(“dir1/dir2”,null)755a: spin_lock(&proc_subdir_lock);756a: for(p=&parent->subdir; *p; p=&(*p)->next) { …764a: spin_unlock(&proc_subdir_lock) ; …807a: if ( de->subdir != NULL) {

755b: spin_lock(&proc_subdir_lock) ;756b: for(p=&parent->subdir; *p; p=&(*p)->next) { /* if the linked list pointed by parent->subdir becomes empty, subdir is set as NULL */ …763b: }764b: spin_unlock(&proc_subdir_lock) ;

808a: printk(KERN_WARNING “%s: removing non-empty directory …” ,de->parent->name, de->name, de->subdir->name) ;

dir1/dir2

Page 15: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

Case Study 3: Data Race between remove_proc_entry() ‘s

• Replaying the data race bug– We could reuse the previous experimental setting of the case study 1

• Promela model of remove_proc_entry(): 81 lines• Environment model: 73 lines

– It took one day for a graduate student to perform this verification task

15 / 16

Page 16: Model-based Kernel Testing for Concurrency Bugs through Counter Example Replay Moonzoo Kim, Shin Hong, Changki Hong Provable Software Lab. CS Dept. KAIST,

Moonzoo Kim@ Model-based Kernel Testing for Concurrency Bugs Provable Software Lab, KAIST through Counter Example Replay

Conclusions• We proposed the MOKERT framework to verify multi-threaded

program like an operating system kernel.– The framework applies the analysis result (i.e., a counter ex-

ample) of model checking to the actual kernel code.– We have demonstrated the effectiveness of MOKERT through

the case studies• We found a data race bug in proc file system of Linux

2.6.28.2 • How to create an abstract model systematically is still issue

– Currently, we depend on human knowledge • We plan to apply MOKERT to more components of Linux file

systems

16 / 16