embedded linux barco-20121001

Internal Barco Training
September 24th / October 1st, 2012
Kubrick training room
Noordlaan 5, 8520 Kuurne, Belgium

Introduction to Embedded Linux for Engineering

Marc Leeman, VNG
Peter Korsgaard, DnA

2011

Contents

1 Introduction
  1.1 Preconditions and Goals
  1.2 System Overview
  1.3 Some Hackable Examples
    1.3.1 Marvell SheevaPlug
    1.3.2 Dreambox
    1.3.3 Linksys NSLU2
    1.3.4 Buffalo Linkstation Live
    1.3.5 Neo Freerunner
    1.3.6 AzBox HD

2 Cross Compilation Toolchain
  2.1 Introduction
  2.2 GNU Toolchain
  2.3 C Library
    2.3.1 GNU C Library
    2.3.2 uClibc
  2.4 Compilation
    2.4.1 Crosstool-NG
    2.4.2 Buildroot
  2.5 Hands On - Toolchain with Buildroot
    2.5.1 Getting the Source
    2.5.2 Configuring
    2.5.3 Finishing up
  2.6 References

3 The Linux Boot Process
  3.1 Introduction
  3.2 Step 1: The Boot Manager
    3.2.1 System startup
    3.2.2 Extracting the MBR
    3.2.3 Stage 1 boot loader
    3.2.4 Stage 2 boot loader
      3.2.4.1 GRUB stage boot loaders
    3.2.5 Kernel
      3.2.5.1 Manual boot in GRUB
      3.2.5.2 decompress kernel output
  3.3 Step 2: init
    3.3.1 Step 2.1: /etc/inittab
  3.4 Step 3: Services
  3.5 Step 4: More inittab fun
  3.6 Hands On
  3.7 References

4 Boot Loaders
  4.1 Introduction
  4.2 RedBoot
  4.3 Das U-Boot
  4.4 Barebox
  4.5 Conclusion
  4.6 Hands On - Explore U-Boot
  4.7 Hands On - Replace Bootloader
    4.7.1 Introduction
      4.7.1.1 Getting the Source
      4.7.1.2 Configuration
      4.7.1.3 Building and booting
      4.7.1.4 A note about the flash layout
      4.7.1.5 Adjusting the U-Boot environment
      4.7.1.6 Fine tuning the Startup Behaviour
  4.8 References

5 The Linux Kernel
  5.1 Introduction
  5.2 Timeline
  5.3 Technical features
    5.3.1 Architecture
    5.3.2 Programming Languages
    5.3.3 Portability
    5.3.4 Versions
    5.3.5 Getting the Source
    5.3.6 Source Tree
    5.3.7 Tracking Development
  5.4 Hands On - Build Kernel
    5.4.1 NFS - Network File System
    5.4.2 Configuration
    5.4.3 Upgrading a Kernel
  5.5 Device Tree (Powerpc, Microblaze, only for now)
    5.5.1 Flash Mapping in the Device Tree
    5.5.2 What if Something Goes Wrong
      5.5.2.1 What Is The Kernel Symbol Table?
      5.5.2.2 What Is An Oops?
      5.5.2.3 What Does An Oops Have To Do With System.map?
  5.6 Device Drivers
    5.6.1 Introduction
      5.6.1.1 How to load a module
      5.6.1.2 Choosing the device type
    5.6.2 Busses
      5.6.2.1 Platform Bus
      5.6.2.2 PCI Bus
    5.6.3 A Real Life Barco Example
      5.6.3.1 Introduction
      5.6.3.2 Initialisation
      5.6.3.3 Registering a PCI Driver
      5.6.3.4 Assigning the I/O and Memory Spaces
      5.6.3.5 PCI Interrupts
    5.6.4 Adding a Character Interface
      5.6.4.1 Major and Minor Numbers
      5.6.4.2 The Internal Representation of Device Numbers
      5.6.4.3 Some Important Data Structures
      5.6.4.4 File Operations
      5.6.4.5 Char Device Registration
      5.6.4.6 Open and Release
      5.6.4.7 The Release Method
      5.6.4.8 Read and Write
      5.6.4.9 Ioctl
    5.6.5 Interrupt Handling
      5.6.5.1 Installing an Interrupt Handler
      5.6.5.2 The /proc Interface
      5.6.5.3 Fast and Slow Handlers
      5.6.5.4 Implementing a Handler
      5.6.5.5 Handler Arguments and Return Values
      5.6.5.6 Top and Bottom Halves
      5.6.5.7 Interrupt Sharing
      5.6.5.8 Adding Your Driver in KConfig
    5.6.6 Configure the Flash Map
      5.6.6.1 Introduction
      5.6.6.2 Flash Map Driver
      5.6.6.3 Combining Multiple Flash Chips (hardcoded)
      5.6.6.4 Adding Your Driver in KConfig
  5.7 Hands On - LED Driver
    5.7.1 Hardware Verification
    5.7.2 Kernel Driver
      5.7.2.1 Platform Bus
      5.7.2.2 Hardware Access
      5.7.2.3 Sysfs Interface
      5.7.2.4 Timers
  5.8 References

6 File Systems
  6.1 Introduction
  6.2 Disk Based File Systems
    6.2.1 Choice
  6.3 Flash Based File Systems
    6.3.1 Choice
  6.4 Network File Systems
  6.5 Virtual File Systems
  6.6 Conclusion
  6.7 References

7 Userspace
  7.1 Introduction
  7.2 BusyBox
    7.2.1 Examples
    7.2.2 Configuring and Building BusyBox
    7.2.3 Manual configuration
    7.2.4 Hands On - Adding New Commands to BusyBox
  7.3 Dropbear
  7.4 Build Systems and Distributions
    7.4.1 Buildroot
  7.5 Hands On - Explore Buildroot
  7.6 References

8 Creating an image with a full Linux system
  8.1 Introduction
  8.2 Preparing an ARM GNU/Debian based system on a GNU/Debian based build system
  8.3 Preparing an ARM GNU/Debian based system on a Non GNU/Debian based build system
  8.4 Customising the ARM root filesystem
  8.5 Starting up: Compiling the Linux kernel for NAND boot
  8.6 Starting up: Creating the base root filesystem image
  8.7 Install GNU/Debian 6.0 on the internal NAND flash
    8.7.1 Files
  8.8 Booting in the final system

9 Hacking the SheevaPlug
  9.1 Introduction
  9.2 OpenOCD Commands
    9.2.1 Daemon
    9.2.2 Target state handling
    9.2.3 Memory access commands
    9.2.4 Flash commands
  9.3 Hacking the SheevaPlug
  9.4 Hands On - Tweak System
  9.5 Flashing the system from the bootloader
    9.5.1 GNU/Debian
  9.6 References

10 Debugging with GDB
  10.1 GDB and gdbserver
  10.2 gdb Remote debugging
    10.2.1 Major Differences
    10.2.2 ELF and Binutil Background
  10.3 Remote Debugging With GDB
  10.4 Talking Dirty with GDB and SSH Tunnelling
  10.5 SSH Tunnelling and GDB
  10.6 References

A The GNU/Linux System
  A.1 Introduction
  A.2 History
  A.3 Licensing
  A.4 Linux and GNU/Linux
  A.5 Distributions
  A.6 Development efforts
  A.7 Applications
  A.8 Usability
  A.9 Market share
  A.10 Installation
  A.11 Installation on an existing platform
  A.12 Demonstration
  A.13 Configuration
  A.14 Programming on Linux
  A.15 Portability of Linux
  A.16 Support
  A.17 References

B Setting up a Server
  B.1 Setting up the NFS Root Filesystem
  B.2 Set up a Firewall with a private address range
    B.2.1 firehol
  B.3 Configure your BDI probe
    B.3.1 Check the serial connection to the BDI
    B.3.2 Activating BOOTP
    B.3.3 Load/Update the BDI firmware/logic
    B.3.4 Transmit the initial configuration parameters
    B.3.5 Fixed Configuration
    B.3.6 Check configuration and exit loader mode
    B.3.7 Summarising the upgrade procedure

C Miscellaneous Tools
  C.1 Introduction
  C.2 Patch
    C.2.1 Applying a patch
    C.2.2 Creating a patch
  C.3 Quilt
    C.3.1 Introduction
    C.3.2 Some Examples
    C.3.3 References

D Network Configuration
  D.1 TCP/IP
    D.1.1 Static
    D.1.2 DHCP
      D.1.2.1 IP address allocation
      D.1.2.2 Protocol Anatomy
      D.1.2.3 DHCP and firewalls
    D.1.3 ZCIP
    D.1.4 DNS-SD & uPnP
  D.2 Writing a simple web interface
    D.2.1 What is CGI?
    D.2.2 Structure of a CGI Script
    D.2.3 Reading the User’s Form Input
    D.2.4 Sending the Response Back to the User
    D.2.5 Haserl
  D.3 References

List of Figures

1.1 Architecture of a Linux system
1.2 Marvell SheevaPlug
1.3 Dreambox 7025 S
1.4 Linksys NSLU2
1.5 Buffalo Linkstation Live
1.6 Neo Freerunner
1.7 The AzBox HD decoder, and much more...

2.1 http://buildroot.net
2.2 Configuration Interface
2.3 Selecting a system wide path with a date-string avoids confusion and overwriting existing toolchains

3.1 A high level view of the Linux boot process
3.2 Anatomy of the MBR
3.3 Anatomy of bzImage
3.4 Major functions flow for the Linux kernel x86 boot

5.1 The kernel.org website
5.2 make ARCH=arm menuconfig
5.3 make ARCH=arm gconfig
5.4 Layout of a typical PCI System.
5.5 The standardised PCI configuration registers.
5.6 The arguments to read.
5.7 Flash map for SVC mk II.

7.1 Busybox configuration screen.
7.2 Winscp, a drag and drop interface to your embedded target

10.1 Running ddd with a remote target.
10.2 Lab setup with a workstation on a LAN (10.x); public servers (150.158.231.x) and embedded targets on the LAN. The gateway (niobe) is not directly accessible but provides an ssh tunnel on port 22 to gemini on the LAN
10.3 After putting the ssh tunnel in place, the connections on 150.158.231.13, port 4000 are forwarded over TCP to the target 10.2.4.10 on port 2200.

A.1 Richard Stallman, founder of the GNU project for a free operating system.
A.2 Linus Torvalds, creator of the Linux kernel.
A.3 A GNOME Desktop.

C.1 Drawing a graphical dependency between patches with quilt.

PREFACE

As an engineer, there is nothing more fun than poking around in the internals of a system to see how it reacts and how it all works. It is very hard to cope with black boxes: we want to know what causes output c when inputs a and b are provided. Even more, we want to know that a system is designed well, and if there is an error, we want to figure out what causes it and fix it.

An open source operating system allows us to do exactly that. First of all, it gives us the freedom to design the hardware platform we want, limited only by time and money, to add the peripherals we care about and to configure our hardware as we see fit. Obviously, we’ll encounter some bumps along the road and we’ll need to dig into our designs, adding debugging code to the kernel. But in the end, we can always get it to work.

One of the advantages of Linux, in our view, is that we can run the same software on all our systems: from the servers that compile and manage our environment, over our desktops, to our embedded targets. As the requirements shrink¹, so does our operating system. A typical server installation quickly surpasses a couple of GB, while we can shrink the root file system of an embedded Linux system to a couple of hundred kB.

In this text, we have tried to bundle some of the experience and techniques we have gathered while building embedded Linux systems. We have tried to ensure everything is correct, but some errors are bound to have slipped through. If you feel something is incorrect or missing, please let us know, so we can correct the text for future trainings.

Though a lot of the text is original, some sections were added or adapted from publicly accessible sources. Wherever possible, you should be able to find the original text through the references section at the end of each chapter.

We hope you have as much fun and as good a learning experience as we had while drafting this text.

Flanders, August 2006, May 2008, December 2008, June 2009, August 2010, January 2011, September 2012.

Marc Leeman & Peter Korsgaard

¹ Choosing an embedded processor does not mean that it has fewer capabilities than a desktop or server processor; quite the contrary. A lot of functionality that is otherwise reserved for external peripherals is on the processor die itself. As a rule, embedded processors are clocked slower than their desktop and server counterparts and are redesigned to consume less power during operation.

Chapter 1

Introduction

1.1 Preconditions and Goals

Embedded Linux is a huge topic that cannot realistically be covered in such a short session (even if we knew it all). The idea of this training is therefore not to cover everything related to building embedded Linux systems, but rather to provide an introduction to the subject and get you up to speed as fast as possible. We will try to share the experience we have and to show which solutions we have found to work. This is not to say that these are the only workable solutions, though!

To limit the scope a bit and provide real life examples, we base the examples on the existing embedded Linux systems within Barco and on the Marvell SheevaPlug. We also assume that the reader is familiar with Linux on PC hardware. If not, have a look at appendix A.

1.2 System Overview

Like other embedded systems, the detailed architecture of embedded Linux systems varies a lot, but certain basic components are common to all systems.

The basic hardware consists of a CPU, RAM, some kind of storage and a number of peripherals for I/O.

Linux supports a wide range of CPUs, but ARM and PowerPC processors are typically used within Barco. Storage can also vary a lot: disks, network, NOR/NAND/managed flash, with flash being the most commonly used solution. I/O peripherals probably show the most variation of all, but the most interesting ones from a Linux system design perspective are UARTs and Ethernet MACs.

The software consists of a boot loader, a Linux kernel and one or more file systems containing the applications and libraries. Boot loaders are important for bringing up a system, but once the kernel is loaded the boot loader is no longer active; they are further described in chapter 4. The generic architecture of a running Linux system can be seen in figure 1.1.

Figure 1.1: Architecture of a Linux system


At the bottom we have the hardware. Right above it sits the kernel. The kernel is the core part of the operating system, and its purpose is to manage the hardware and provide high level abstractions to the user level software. The kernel is (normally) the only software that talks directly to the hardware. The kernel is further described in chapter 5. Above the kernel sit the user space applications and the statically or dynamically linked libraries. Libraries provide higher level abstractions for applications than those provided by the kernel. Libraries exist for just about everything, but all Linux systems contain at least a C library¹. User space applications are further described in chapter 7.

Notice that this generic architecture is the same for all Linux systems, no matter if they are server, desktop or embedded systems.
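The layering described above can be made concrete with a small user space sketch. It is only an illustration, not part of the training material: the function names `count_bytes_libc` and `count_bytes_syscall` are hypothetical. The first function goes through the buffered stdio layer of the C library, the second uses the thin `open`/`read`/`close` wrappers that map almost directly onto kernel system calls. In both cases it is the kernel, not the application, that talks to the storage hardware.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Count the bytes in a file through the C library (buffered stdio).
 * Returns -1 if the file cannot be opened. */
long count_bytes_libc(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long total = 0;
    while (fgetc(fp) != EOF)   /* stdio refills its buffer via read(2) */
        total++;

    fclose(fp);
    return total;
}

/* Count the same bytes using the system call wrappers directly.
 * Returns -1 if the file cannot be opened or a read fails. */
long count_bytes_syscall(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char buf[256];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)   /* one kernel entry per call */
        total += n;

    close(fd);
    return (n < 0) ? -1 : total;
}
```

Both functions should report the same size for any regular file; the difference lies only in how many abstraction layers sit between the application and the kernel.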

1.3 Some Hackable Examples

1.3.1 Marvell SheevaPlug

The Marvell SheevaPlug is a cheap, powerful device in a small form factor. It contains a 1.2 GHz Marvell Sheeva processor (ARMv5), 512 MB DDR2 and 512 MB NAND flash, USB, gigabit Ethernet and an SDIO interface.

Figure 1.2: Marvell SheevaPlug

For development, it is also very interesting that the device comes with serial and JTAG access (through USB) out of the box, making it very easy to get started.

    root@debian:~# cat /proc/cpuinfo
    Processor       : ARM926EJ-S rev 1 (v5l)
    BogoMIPS        : 1192.75
    Features        : swp half thumb fastmult edsp
    CPU implementer : 0x56
    CPU architecture: 5TE
    CPU variant     : 0x2
    CPU part        : 0x131
    CPU revision    : 1
    Cache type      : write-back
    Cache clean     : cp15 c7 ops
    Cache lockdown  : format C
    Cache format    : Harvard
    I size          : 16384
    I assoc         : 4
    I line length   : 32
    I sets          : 128
    D size          : 16384
    D assoc         : 4
    D line length   : 32
    D sets          : 128
    Hardware        : Feroceon-KW
    Revision        : 0000
    Serial          : 0000000000000000

¹ You could imagine a setup without it, but it wouldn’t be very useful.

The price of a SheevaPlug is around 75 Euros.
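The cache fields in the listing above are related: the cache size equals the number of sets times the associativity times the line length. The following sketch, which is an illustration rather than part of the original text, checks that relation and shows one way to pull a numeric field out of a `key : value` line. The helper names `cache_size_bytes` and `cpuinfo_field` are hypothetical, and the parser assumes the simple line layout shown in the listing.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Cache size in bytes = sets * ways (associativity) * line length. */
unsigned long cache_size_bytes(unsigned sets, unsigned assoc, unsigned line_len)
{
    return (unsigned long)sets * assoc * line_len;
}

/* Extract a numeric value from a "key : value" line.
 * Returns 1 and stores the value if the key matches and the value is
 * numeric (decimal or 0x-prefixed hex), otherwise returns 0. */
int cpuinfo_field(const char *line, const char *key, unsigned long *value)
{
    assert(line != NULL && key != NULL && value != NULL);

    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return 0;

    /* Length of the key part, ignoring padding before the colon. */
    size_t len = (size_t)(colon - line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;

    if (len != strlen(key) || strncmp(line, key, len) != 0)
        return 0;

    char *end;
    unsigned long v = strtoul(colon + 1, &end, 0);
    if (end == colon + 1)      /* no digits found, e.g. "Feroceon-KW" */
        return 0;

    *value = v;
    return 1;
}
```

For the SheevaPlug values, `cache_size_bytes(128, 4, 32)` yields 16384, which matches the `I size` and `D size` lines of the listing.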

1.3.2 Dreambox

The Dreambox devices (see Figure 1.3) are very popular DVB (S/T/C) decoders that all run Linux. Since the code is open, a lot of alternative firmwares are available on the internet, offering more flexibility and functionality than the original firmware.

    root@dm7025:~> cat /proc/cpuinfo
    system type             : ATI XILLEON HDTV SUPERTOLL
    processor               : 0
    cpu model               : MIPS 4KEc V4.8
    BogoMIPS                : 297.98
    wait instruction        : yes
    microsecond timers      : yes
    tlb_entries             : 16
    extra interrupt vector  : yes
    hardware watchpoint     : yes
    VCED exceptions         : not available
    VCEI exceptions         : not available

Unfortunately, the Flemish DVB-C provider has chosen a closed box approach (generating revenue from trivial functionality like recording, pause, delayed playback, ...) and getting a Dreambox to work with cable TV is not that trivial in Flanders. There are reports that programming the default box number (read with a JTAG probe) should work.

The most popular use is receiving DVB-S. Even though the satellite provider does not support the Dreambox, it is fully functional with the default firmware in combination with a CI (Common Interface) module, or by replacing the firmware with an alternative version that provides a software CAM (Conditional Access Module).

Figure 1.3: Dreambox 7025 S

Depending on the model, a Dreambox can be obtained from 300 Euros onwards.


1.3.3 Linksys NSLU2

Another device that was extremely popular until recently is the Linksys NSLU2 (Network Storage Link for USB 2.0, see Figure 1.4). Out of the box it offers an ARMv5 CPU running from flash. Via a web interface, the user can configure the hard disks, which can then be accessed via a number of network protocols (e.g. NFS, Samba, ...).

The really interesting part of this device is that the hacker does not need to stick with the onboard flash to build the system on. If a USB disk (or memory stick) is connected, the root filesystem can be stored on the external device while the kernel boots from flash. With this modification, the NSLU2 can serve as a full-fledged Linux server, taking into account hardware limitations such as the 32 MB of memory.

[marc@chiana ~]$ cat /proc/cpuinfo

Processor : XScale-IXP42x Family rev 1 (v5l)

BogoMIPS : 266.24

Features : swp half fastmult edsp


CPU architecture: 5TE

CPU variant : 0x0

CPU part : 0x41f

CPU revision : 1

Cache type : undefined 5

Cache clean : undefined 5

Cache lockdown : undefined 5


I size : 32768

I assoc : 32

I line length : 32

I sets : 32

D size : 32768

D assoc : 32

D line length : 32

D sets : 32

Hardware : Linksys NSLU2

Revision : 0000

Serial : 0000000000000000

When the external HDD is replaced by a flash memory pen, the full power of the NSLU2 is unleashed: a running Linux system can be used with as little as 4 Watt power consumption. Some people use it for e.g. domotics control (EIB), network access points for all kinds of USB devices, ssh tunnel server, bittorrent downloader, ...

The price of an NSLU2 used to be around 70 Euros.

1.3.4 Buffalo Linkstation Live

Unfortunately, the NSLU2 was made obsolete in the course of 2008, but a good candidate to fill the niche left by the NSLU2 is the Buffalo Linkstation Live (see Figure 1.5).

Two of the drawbacks of the NSLU2 were the limited CPU clock (133 MHz, or 266 MHz for newer devices) and only 32 MB of memory. In contrast, the Linkstation Live pictured here has an ARM9 CPU core clocked at 400 MHz and 128 MB of memory. Especially when running a home server, the additional memory comes in handy for multiple concurrent processes.

Again, the stock firmware can be replaced with Debian GNU/Linux, and support for the Feroceon processor is included from kernel 2.6.27 onwards.


Figure 1.4: Linksys NSLU2

Processor : Feroceon rev 0 (v5l)

BogoMIPS : 266.24

Features : swp half thumb fastmult edsp


CPU architecture: 5TEJ

CPU variant : 0x0

CPU part : 0x926

CPU revision : 0



Cache lockdown : format C


I size : 32768

I assoc : 1

I line length : 32

I sets : 1024

D size : 32768

D assoc : 1

D line length : 32

D sets : 1024

Hardware : Buffalo Linkstation Pro/Live

Revision : 0000

Serial : 0000000000000000

As a really nice hacker feature, the case designers left a hole through which a serial level converter can be connected, giving direct access to the U-Boot bootloader. It is enough to solder a 90 degree header on the motherboard to get serial access to the device.

Depending on the size of the disk, the price of a Linkstation Live is anywhere between 100 and 200 Euros. Note that it can be cheaper to buy a device with a small HDD and replace the HDD with a larger one than to buy the Linkstation with the large disk in the first place.


Figure 1.5: Buffalo Linkstation Live

1.3.5 Neo Freerunner

The Neo FreeRunner (see Figure 1.6), made by FIC, is a smartphone developed by the Openmoko project. It is the successor to the first development phase smartphone Neo 1973, and is intended for hackers, since it gives the user great customizability.

Processor : ARM920T rev 0 (v4l)

BogoMIPS : 199.47

Features : swp half thumb


CPU architecture: 4T

CPU variant : 0x1

CPU part : 0x920

CPU revision : 0



Cache lockdown : format A


I size : 16384

I assoc : 64

I line length : 32

I sets : 8

D size : 16384

D assoc : 64

D line length : 32

D sets : 8

Hardware : GTA02

Revision : 0360

Serial : 0000000000000000

The default OpenMoko distribution can be replaced by Debian. The Freerunner costs about 350 Euro.


Figure 1.6: Neo Freerunner

1.3.6 AzBox HD

Just like the Dreambox devices, the AzBox is a DVB decoder based on Linux. It has full hardware decoding of MPEG4, which allows you to decode almost any current video, audio or image format on the box.

It allows you to add disk space via Samba, eSATA, USB, ...

Figure 1.7: The AzBox HD decoder, and much more...

system type: Sigma Designs TangoX

processor: 0

cpu model: MIPS 4KEc V6.9

Initial bogomips: 296.96

wait instruction: yes

microsecond timers: yes

tlb_entries: 32

extra interrupt vector: yes

Hardware watchpoint: yes

ASES implemented: mips16

VCED exceptions: not available

VCEI exceptions: not available

System bus frequency: 200250000 Hz


CPU frequency: 300375000 Hz

DSP frequency: 300375000 Hz

At around 350 Euro, it is a lot cheaper than its Dreambox HD counterpart (DM 8000). As with all Dreambox devices, the custom firmwares use a software CAM (Conditional Access Module) to keep track of the key negotiation for image decoding. As such, keys can be shared over the network.

Chapter 2

Cross Compilation Toolchain

2.1 Introduction

Before we can get started with developing embedded Linux systems, we need a toolchain suitable for generating code for our embedded platform. Development can be done natively (i.e. on the embedded system itself once it is bootstrapped), but by far the most common setup is to use a cross compiler.

A cross compiler allows the developer to run the compilation on a much more powerful platform (a multiuser server or a powerful desktop machine) instead of on the slower and more resource-constrained embedded system.

This chapter describes how to configure and compile such a cross toolchain from sources. It is possible to download pre-compiled cross toolchains like the ones included in ELDK, but even if you are not going to compile the toolchain yourself, it can be very useful to know how it is done.

Just like on a desktop Linux system, the toolchain of choice for an embedded Linux system is the GNU toolchain.

Compiling a program takes place by running a compiler on the build platform. The compiled program will run on the host platform. Usually these two are the same; if they are different, the process is called cross-compilation.

Typically the hardware architecture differs, for example when compiling a program destined for the PowerPC architecture on an x86-64 computer. Cross-compilation is also applicable when only the operating system environment differs, as when compiling a FreeBSD program under Linux, or even just the system library, as when compiling programs with uClibc on a glibc host.

The GNU Autotools packages (i.e. autoconf, automake, and libtool) use the notion of a build platform, a host platform, and a target platform.

The build platform is where the code is actually compiled.

The host platform is where the compiled code will execute.

The target platform usually only applies to compilers, as it represents the type of object code the package itself will produce (as when cross-compiling a cross-compiler); otherwise the target platform setting is irrelevant.
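As a rough illustration of how these three notions show up in practice: Autotools-based packages accept them as the --build, --host and --target options of ./configure. The sketch below only assembles such a command line. The helper name configure_args and the triplets used in the example are illustrative assumptions, not values prescribed by this course.

```shell
#!/bin/sh
# Sketch: assemble the platform options for an Autotools ./configure run.
# The triplets used in the example are assumptions for illustration.

# configure_args BUILD HOST [TARGET]
# Prints the options; TARGET is only emitted when given, since it matters
# only for packages that themselves generate code (such as compilers).
configure_args() {
    build=$1
    host=$2
    target=${3:-}
    printf '%s' "--build=$build --host=$host"
    if [ -n "$target" ]; then
        printf ' %s' "--target=$target"
    fi
    printf '\n'
}

# Cross-compiling an ordinary program on x86-64 for an ARM uClibc system:
configure_args x86_64-pc-linux-gnu arm-unknown-linux-uclibcgnueabi
```

The printed options would typically be passed on as ./configure $(configure_args ...) from within the package source tree.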

Since we will be compiling a target filesystem image that fits in under 1 MB, we cannot perform the compilation on the target itself (which is limited in flash). Even if we could, it would still be better and faster to do this on a server class machine. Even when compiling for a target architecture that is similar to the server/development environment, there are valid arguments for using a cross-compiler, especially when the product is relatively long lived and there are no plans to upgrade the operating system’s libc version.¹

¹ This is not a good idea in any case, but it beats having to keep around that single version of the obsolete RH7.0 merely for building the firmware.


In this chapter, the building blocks will be laid out for a cross compiler specifically targeted at small embedded systems.

First, gcc will be introduced, followed by glibc. gcc and glibc are the typical compiler and C library combination used in most desktop systems. The following section will cover a smaller alternative to glibc: uClibc. Finally, gdb (and gdbserver) is introduced.

These are the building blocks for the cross compilation toolchain that we need for our previously introduced target. Manually hacking up a compiler can be a challenging task, but luckily there is an easier way: Buildroot, which is a set of Makefiles doing exactly this.²

2.2 GNU Toolchain

A minimal GNU toolchain consists of binutils, the GNU Compiler Collection (GCC), and a C library.

Binutils are the binary utilities of the toolchain, i.e. the programs that work with the binary and object files. This includes the assembler, linker, archiver and a number of smaller, more-or-less obscure utilities.

GCC is the compiler itself. GCC contains front-ends for a lot of languages (C, C++, Java, Ada, Objective C, Fortran, ...), but here we will only focus on the C compiler.

Last, but not least, a C library is needed. The C library is part of the configuration and creation of the GNU toolchain because part of the compiler configuration depends on the chosen C library.

Due to this, GCC has to be compiled in two steps. First a bootstrap compiler is compiled, which is then used to compile the C library, which in turn is used to compile the final compiler.

To summarise, a GNU toolchain configuration depends on 3 high level choices:

• Build and target CPU type

• Build and target Operating System

• C library to use

A configuration could for example be: a cross compiler running on an x86 Linux PC which creates executables for an embedded Linux system with a PowerPC processor, using the uClibc C library (see below). To keep track of all these configuration parameters, the following naming convention is normally used for the binaries:

<target-cpu>-<target-os>-<target-c-library>-<toolname>

e.g. the C compiler for the above would be called:

powerpc-linux-uclibc-gcc
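A small sketch of this naming convention: every tool of one toolchain shares the same prefix, so the full binary name can be derived from the three configuration choices plus the tool name. The helper tool_name is a hypothetical name introduced for illustration only.

```shell
#!/bin/sh
# Sketch: build toolchain binary names following
# <target-cpu>-<target-os>-<target-c-library>-<toolname>.

# tool_name CPU OS LIBC TOOL
tool_name() {
    printf '%s-%s-%s-%s\n' "$1" "$2" "$3" "$4"
}

# All tools of one toolchain share the same prefix:
for tool in gcc ld as objdump; do
    tool_name powerpc linux uclibc "$tool"
done
```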

Next to these major configuration choices, some more subtle tweaking is still available. One of the most important of these is the floating point mode. The compiler can be configured either to generate hardware floating point instructions or to use software floating point emulation. Hardware floating point instructions can be used even if the CPU doesn’t have an FPU, but then the kernel has to emulate them, which is a lot slower than soft float (10-100x).

2.3 C Library

Which C library to use? Several options exist, the most popular being the GNU C library (Glibc) and uClibc.

² There are a number of alternatives that will not be covered here.


2.3.1 GNU C Library

Glibc is the GNU project’s C standard library. It is free software and is available under the GNU Lesser General Public License. The lead contributor and maintainer is Ulrich Drepper.

Glibc is what is used for practically all desktop and server Linux distributions. It is very featureful and supports a lot of different hardware platforms and operating systems. Unfortunately it is also very big (several MBs), which makes it less suitable for building small embedded Linux systems.

2.3.2 uClibc

uClibc is a small C library intended for embedded Linux systems.

uClibc was created to support uClinux, a version of Linux not requiring a memory management unit and thus suited for microcontrollers (hence the “uC” in the name), but now also runs on “real” Linux.

uClibc is much smaller than Glibc, but still very much compatible. For most applications no change to the source code is needed to use uClibc.

While Glibc is intended to fully support all relevant C standards across a wide range of platforms, uClibc is specifically focused on embedded Linux. Features can be enabled or disabled according to space requirements.

uClibc doesn’t support operating systems other than Linux. It supports, amongst others, i386, ARM, AVR32, Blackfin, h8300, m68k, Microblaze, MIPS, Nios/Nios2, PowerPC, SuperH, SPARC, and x86-64 processors.

2.4 Compilation

As described above, the GNU toolchain is a big system consisting of several independent packages, and not every version of one package is compatible with every version of the others without extra patches. Finding a working combination of all these packages and compiling the toolchain by hand is not a simple job.

Luckily there now exist tools to automate it: crosstool(-NG) and Buildroot.

2.4.1 Crosstool-NG

Crosstool-ng is a tool by Yann E. Morin, which makes it easy to create cross toolchains using uClibc/Glibc/EGlibc. Crosstool-ng is nice, but it only creates toolchains, so we will here instead focus on Buildroot (see chapter 7).

Notice that Crosstool-ng toolchains can be used with Buildroot through its external toolchain support.

2.4.2 Buildroot

Buildroot is a set of Makefiles and patches that makes it easy to generate cross toolchains using uClibc. Actually it is more than that, as it can also be used to build the complete userspace for a system, but more about that in chapter 7.

2.5 Hands On - Toolchain with Buildroot

2.5.1 Getting the Source

While the hardware platform for the duration of the course will be the Marvell SheevaPlug, most of the Barco designs use Buildroot to create a toolchain and/or the target filesystem. The central website for Buildroot is http://www.buildroot.net (see Figure 2.1).


Figure 2.1: http://buildroot.net

Until recently Buildroot didn’t have releases on a regular basis, but luckily that has changed. To get the source, we take the latest version available (alternatively, you can check out the sources with git).

[mleeman@cypher code]$ wget http://www.buildroot.net/downloads/buildroot-2012.08.tar.bz2

[mleeman@cypher code]$ tar jxf buildroot-2012.08.tar.bz2

[mleeman@cypher code]$ cd buildroot-2012.08

2.5.2 Configuring

[mleeman@cypher buildroot-2012.08]$ make menuconfig

As with most projects of the complexity of building a kernel, the configuration can be done in detail, configuring each and every component, or in a more coarse fashion. Since most of the developers focus on small and fast, it can be assumed that the defaults are reasonable (this has been verified by experience).

At this point, only a toolchain is created: the compiler, the binutils, optionally gdb, and the (uC)libc version that is heavily intertwined with the compiler. When gdb is enabled for the host, gdbserver for the target needs to be enabled too (one without the other does not make much sense). When browsing through the options, disable all the target packages.

Figure 2.2: Configuration Interface

If we have an existing toolchain configuration file from a previous build, we can load it in the configuration tool (see Figure 2.2), which is modelled on the Linux kernel configuration.

Figure 2.3: Selecting a system-wide path with a date string avoids confusion and overwriting existing toolchains

Compilers and libc libraries improve and evolve over time. On the other hand, installing a new toolchain changes the entire engine of your embedded development and needs to be done with care. Therefore, a date string is added to the system-wide path where the toolchain will be placed, so that a new toolchain never overwrites an existing one. This way, users can play with different compilers by just changing the date string in their $PATH environment variable (see Figure 2.3).
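Switching between dated toolchains can be sketched as follows. This is only an illustration: the base directory /opt/barco/arm, the usr/bin sub-directory and the helper names toolchain_bindir and use_toolchain are assumptions and have to be adapted to the path actually chosen in the configuration.

```shell
#!/bin/sh
# Sketch: pick a toolchain by its date string.
# TOOLCHAIN_BASE and the directory layout are assumptions for illustration.
TOOLCHAIN_BASE=${TOOLCHAIN_BASE:-/opt/barco/arm}

# toolchain_bindir DATE
# Prints the bin directory of the toolchain installed under the date string.
toolchain_bindir() {
    printf '%s/%s/usr/bin\n' "$TOOLCHAIN_BASE" "$1"
}

# use_toolchain DATE
# Prepends the selected toolchain to PATH for the current shell.
use_toolchain() {
    PATH="$(toolchain_bindir "$1"):$PATH"
    export PATH
}

# Example: switch this shell to the toolchain dated 20120911.
use_toolchain 20120911
```

Because only the date string changes, an older toolchain stays installed and can be selected again at any time.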

Exit and save the configuration. The final list of changed options is rather short:

[mleeman@cypher buildroot-2012.08]$ make savedefconfig

[mleeman@cypher buildroot-2012.08]$ cat defconfig

BR2_arm=y

BR2_arm926t=y

BR2_PACKAGE_GDB_SERVER=y

BR2_PACKAGE_GDB_HOST=y

make savedefconfig creates a defconfig file from the full .config, containing only the settings that differ from the defaults.

Run make, sit back and enjoy³:

[mleeman@cypher buildroot-2012.08]$ make

Buildroot will now download and compile all the packages. If you are asked for input, just opt for the default values.

Depending on the speed of your machine, this will take from about an hour to several hours (after all, the GCC compiler is compiled 3×).

The result for the target is a number of files located in output/images. The number of files depends on the targets you selected for the root filesystem. Typical targets are archive, ubifs, ext2, jffs2, ...

In order to use the toolchain, add these lines to the bottom of your ~/.bashrc:

³ You will need to configure wget either to use a proxy that does not require authentication and that uses the Barco proxy as a parent, or to use the proxy-user and proxy-password options in your .wgetrc.


PATH=/users/firmware/mleeman/Development/buildroot-2012.08/buildroot-2012.08/output/host/usr/bin:$PATH
export PATH

and re-source your .bashrc

[mleeman@neo buildroot]$ . ~/.bashrc

A final check of our toolchain should result in:

[mleeman@cypher bin]$ ./arm-unknown-linux-uclibcgnueabi-gcc -v

Using built-in specs.

COLLECT_GCC=./arm-unknown-linux-uclibcgnueabi-gcc

COLLECT_LTO_WRAPPER=/users/firmware/mleeman/Development/buildroot-2012.08/buildroot-2012.08/output/host/usr/libexec/gcc/arm-unknown-linux-uclibcgnueabi/4.7.1/lto-wrapper

Target: arm-unknown-linux-uclibcgnueabi

...

Thread model: posix

gcc version 4.7.1 (Buildroot 2012.08)
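Beyond checking the version banner, a quick smoke test is to let the new compiler build a trivial program. The sketch below assumes the compiler name shown above and uses a throwaway hello.c; the helper name smoke_test is introduced here for illustration only.

```shell
#!/bin/sh
# Sketch: smoke-test a cross compiler by building a trivial C program.
# hello.c is a throwaway file created in a temporary directory.

# smoke_test COMPILER
# Returns 0 if COMPILER is on PATH and can compile and link a minimal program.
smoke_test() {
    cc=$1
    command -v "$cc" >/dev/null 2>&1 || {
        echo "$cc: not found on PATH" >&2
        return 1
    }
    tmpdir=$(mktemp -d) || return 1
    printf '%s\n' '#include <stdio.h>' \
        'int main(void) { puts("hello"); return 0; }' > "$tmpdir/hello.c"
    "$cc" -o "$tmpdir/hello" "$tmpdir/hello.c"
    status=$?
    # The resulting binary is built for the target, so it is not executed here.
    rm -rf "$tmpdir"
    return $status
}

# Usage:
#   smoke_test arm-unknown-linux-uclibcgnueabi-gcc && echo "toolchain OK"
```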

2.5.3 Finishing up

After creating the toolchain, we want to distribute it in a clean fashion to other machines of similar architecture (e.g. to colleagues debugging in the field with laptops).

In order to do that, select a location more suitable than a home directory (/opt/barco/arm/20120911/toolchain_uclibc_arm/) and build the toolchain there.

Assuming you’ve built the toolchain on a comparable machine, use the following commands to package the toolchain in a Debian package:

[mleeman@neo buildroot-20120911]$ tar cvfz toolchain_arm_uclibc_20120911.tar.gz \
    /opt/barco/arm/20120911/

[mleeman@neo buildroot-20120911]$ fakeroot alien --fixperms \
    toolchain_arm_uclibc_20120911.tar.gz

toolchain-arm-uclibc-20120911_1-2_all.deb generated

Note that we put the time stamp in the package name instead of in the version, since we want to allow different versions to exist next to each other after installation. Otherwise, installing a package with a more recent version would replace (and remove) the older package.

2.6 References

• Cross Compile: http://en.wikipedia.org/wiki/Cross-compile

• Remote Debugging: http://www.cucy.net/lacp/archives/000024.html

• GCC: http://gcc.gnu.org

• Glibc: http://www.gnu.org/software/libc/

• uClibc: http://www.uclibc.org

• Crosstool-NG: http://ymorin.is-a-geek.org/projects/crosstool

• Buildroot: http://buildroot.net

• Embedded Linux Development Kit (ELDK): http://www.denx.de/wiki/DULG/ELDK


Chapter 3

The Linux Boot Process

In the beginning, there was GRUB (or maybe LILO) and GRUB loaded the kernel, and kernel begat init, and init begat rc, and rc begat network and httpd and getty, and getty begat login, and login begat shell and so on.

3.1 Introduction

This section will cover the boot process of most Linux distributions. Even though there are some differences between the distributions, the process is alike.

The process of booting a Linux system consists of a number of stages, but whether an x86 or x86-64 desktop, a server or a deeply embedded processor is booted, the flow is similar. In this chapter, we will explore the Linux boot process from the initial bootstrap to the start of the first user-space application. Along the way, several boot-related topics such as bootloaders, kernel decompression, RAM disks and other elements of the Linux boot process will be introduced.

As an example, Debian GNU/Linux 6.0 (Squeeze) on x86-64 will be used to explain the process, but booting on x86, PowerPC, SPARC, ... is more or less the same.

In modern computers the bootstrapping process begins with the CPU executing software contained in ROM (for example, the BIOS of an IBM PC) at a predefined address (the CPU is designed to execute this software after reset without outside help). This software contains rudimentary functionality to search for devices eligible to participate in booting, and to load a small program from a special section (most commonly the boot sector) of the most promising device.

Boot loaders may face peculiar constraints, especially in size; for instance, on the IBM PC and compatibles, the first stage of boot loaders must fit into the first 446 bytes of the Master Boot Record, in order to leave room for the 64-byte partition table and the 2-byte AA55h ’signature’, which the BIOS requires for a proper boot loader.

Today’s computers are equipped with facilities to simplify the boot process, but that doesn’t necessarily make it simple.

Figure 3.1 shows a high level view of the Linux boot process. In the next sections, each step will be elaborated.

When a system is first booted, or is reset, the processor executes code at a well-known location. In a personal computer (PC), this location is in the basic input/output system (BIOS), which is stored in flash memory on the motherboard. The central processing unit (CPU) in an embedded system invokes the reset vector to start a program at a known address in flash/ROM. Many Linux-based embedded processors boot from a well-known address (e.g. 0x00000100 on Chip Select 0 (CS0)). Placing the bootloader (e.g. U-Boot) at that location will start it.

In either case, the result is the same. Because PCs offer so much flexibility, the BIOS must determine which devices are candidates for boot. We’ll look at this in more detail later.

When a boot device is found, the first-stage boot loader is loaded into RAM and executed. This boot loader is less than 512 bytes in length (a single sector), and its job is to load the second-stage boot loader.

Figure 3.1: A high level view of the Linux boot process

When the second-stage boot loader is in RAM and executing, a splash screen is commonly displayed, and Linux and an optional initial RAM disk (temporary root file system) are loaded into memory. When the images are loaded, the second-stage boot loader passes control to the kernel image and the kernel is decompressed and initialised. At this stage, the kernel checks and initialises the system hardware, enumerates the attached hardware devices, mounts the root device, and then loads the necessary kernel modules. When complete, the first user-space program (init) starts, and high-level system initialisation is performed.

That’s Linux boot in a nutshell. Now let’s dig in a little further and explore some of the details of the Linux boot process.

3.2 Step 1: The Boot Manager

The boot manager is a small program that resides mostly in the MBR¹ and presents a menu for choosing the operating system (if more than one is present), kernel or boot options to boot.

In the regular, plain-old-booting-linux business, all the boot loader does is:

• Load the kernel into memory

• Optionally load a ramdisk called initrd containing stuff like disk drivers

• Pass the kernel arguments, of which we are only interested in runlevel and init

• Start execution of the kernel.

3.2.1 System startup

The system startup stage depends on the hardware that Linux is being booted on. On an embedded platform, a bootstrap environment is used when the system is powered on, or reset. Examples include U-Boot, RedBoot, and MicroMonitor from Lucent. Embedded platforms are commonly shipped with a boot monitor. These programs reside in a special region of flash memory on the target hardware and provide the means to download a Linux kernel image into flash memory and subsequently execute it. In addition to having the ability to store and boot a Linux image, these boot monitors perform some level of system test and hardware initialisation. In an embedded target, these boot monitors commonly cover both the first- and second-stage boot loaders.

¹ Master Boot Record.

In a PC, booting Linux begins in the BIOS at address 0xFFFF0. The first step of the BIOS is the power-on self test (POST). The job of the POST is to perform a check of the hardware. The second step of the BIOS is local device enumeration and initialisation.

Given the different uses of BIOS functions, the BIOS is made up of two parts: the POST code and runtime services. After the POST is complete, it is flushed from memory, but the BIOS runtime services remain and are available to the target operating system.

To boot an operating system, the BIOS runtime searches for devices that are both active and bootable in the order of preference defined by the complementary metal oxide semiconductor (CMOS) settings. A boot device can be a floppy disk, a CD-ROM, a partition on a hard disk, a device on the network, or even a USB flash memory stick.

Commonly, Linux is booted from a hard disk, where the Master Boot Record (MBR) contains the primary boot loader. The MBR is a 512-byte sector, located in the first sector on the disk (sector 1 of cylinder 0, head 0). After the MBR is loaded into RAM, the BIOS yields control to it.

3.2.2 Extracting the MBR

As an exercise, the MBR can be inspected. Use these commands:

$ sudo dd if=/dev/sda of=mbr.bin bs=512 count=1

$ od -xa mbr.bin

The dd command needs to be run as root. Since logging into your system as root is a bad habit, we use the sudo command, which temporarily gives the user root permissions. dd reads the first 512 bytes from /dev/sda (the first disk drive) and writes them to the mbr.bin file. The od command prints the binary file in hex and ASCII formats.

3.2.3 Stage 1 boot loader

The primary boot loader that resides in the MBR is a 512-byte image containing both program code and a small partition table (see Figure 3.2). The first 446 bytes are the primary boot loader, which contains both executable code and error message text. The next sixty-four bytes are the partition table, which contains a record for each of four partitions (sixteen bytes each). The MBR ends with two bytes that are defined as the magic number (0xAA55). The magic number serves as a validation check of the MBR.
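The layout described above can be checked on a dump such as the mbr.bin file created earlier. The following sketch reads the last two bytes with od; the helper names mbr_signature and mbr_is_valid are hypothetical and only serve this illustration.

```shell
#!/bin/sh
# Sketch: check the signature of a 512-byte MBR dump
# (for example the mbr.bin file created with dd).

# mbr_signature FILE
# Prints the two bytes at offsets 510 and 511 as hex, in on-disk order.
mbr_signature() {
    od -An -tx1 -j510 -N2 "$1" | tr -d ' \n'
    echo
}

# mbr_is_valid FILE
# On disk the bytes are 0x55 followed by 0xAA; read as a little-endian
# 16-bit value this is the magic number 0xAA55.
mbr_is_valid() {
    [ "$(mbr_signature "$1")" = "55aa" ]
}

# Usage:
#   mbr_is_valid mbr.bin && echo "valid MBR signature"
```

Note that 446 + 64 + 2 = 512, so the signature occupies exactly the last two bytes of the sector.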

The job of the primary boot loader is to find and load the secondary boot loader (stage 2). It does this by looking through the partition table for an active partition. When it finds an active partition, it scans the remaining partitions in the table to ensure that they’re all inactive. When this is verified, the active partition’s boot record is read from the device into RAM and executed.
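The scan for an active partition can be imitated on an MBR dump. In the sketch below, each sixteen-byte entry starts at offset 446 + 16 * n and its first byte is the boot flag, which is 0x80 for an active partition. The helper name active_partitions is an assumption made for this example.

```shell
#!/bin/sh
# Sketch: list the partition table entries of an MBR dump that are flagged
# active. An entry starts at offset 446 + 16 * n; its first byte is 0x80
# when the partition is marked active and 0x00 otherwise.

# active_partitions FILE
# Prints the 1-based numbers of all entries flagged active, one per line.
active_partitions() {
    n=0
    while [ "$n" -lt 4 ]; do
        offset=$((446 + 16 * n))
        flag=$(od -An -tx1 -j"$offset" -N1 "$1" | tr -d ' \n')
        if [ "$flag" = "80" ]; then
            echo $((n + 1))
        fi
        n=$((n + 1))
    done
}

# Usage:
#   active_partitions mbr.bin
```

A well-formed table yields exactly one line of output; more than one line means several entries are flagged active.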

3.2.4 Stage 2 boot loader

The secondary, or second-stage, boot loader could be more aptly called the kernel loader. The task at this stage is to load the Linux kernel and optional initial RAM disk.

The first- and second-stage boot loaders combined are called Linux Loader (LILO) or GRand Unified Bootloader (GRUB) in the x86 PC environment. Both alternatives are pretty well documented, so elaborating on the options serves little purpose here. Most of the configuration is done in a text file, with many of the options explained in comments (e.g. /boot/grub/menu.lst for GRUB and /etc/lilo.conf for LILO). Some distributions ship patched versions that include graphical themes instead of the default minimalistic text or curses-like approach. One difference that should be mentioned is that LILO requires the lilo command to be run after modifying the configuration file, while current GRUB versions do not: changes in /boot/grub/menu.lst are picked up at the next boot without any further step.


Figure 3.2: Anatomy of the MBR

Because LILO has some disadvantages that were corrected in GRUB, let’s look into GRUB.

The great thing about GRUB is that it includes knowledge of Linux file systems. Instead of using raw sectors on the disk, as LILO does, GRUB can load a Linux kernel from an ext2 or ext3 file system. It does this by making the two-stage boot loader into a three-stage boot loader. Stage 1 (MBR) boots a stage 1.5 boot loader that understands the particular file system containing the Linux kernel image. Examples include reiserfs_stage1_5 (to load from a Reiser journaling file system) or e2fs_stage1_5 (to load from an ext2 or ext3 file system). When the stage 1.5 boot loader is loaded and running, the stage 2 boot loader can be loaded.

With stage 2 loaded, GRUB can, upon request, display a list of available kernels (defined in /boot/grub/menu.lst). You can select a kernel and even amend it with additional kernel parameters. Optionally, you can use a command-line shell for greater manual control over the boot process.

With the second-stage boot loader in memory, the file system is consulted, and the default kernel image and initrd image are loaded into memory. With the images ready, the stage 2 boot loader invokes the kernel image.

3.2.4.1 GRUB stage boot loaders

The /boot/grub directory contains the stage1, stage1.5, and stage2 boot loaders, as well as a number of alternate loaders (for example, CD-ROMs use the iso9660_stage1_5).


3.2.5 Kernel

With the kernel image in memory and control given from the stage 2 boot loader, the kernel stage begins. The kernel image isn’t so much an executable kernel, but a compressed kernel image. On Linux systems, vmlinux is a statically linked executable file that contains the Linux kernel in one of the executable file formats supported by Linux, including ELF, COFF and a.out. The vmlinux file might be required for kernel debugging, generating a symbol table or other operations, but must be made bootable before being used as an operating system kernel by adding a multiboot header, bootsector and setup routines.

Typically this is a zImage (compressed image, less than 512KB) or a bzImage (big compressed image, greater than 512KB) that has been previously compressed with zlib. As the Linux kernel matured, the size of the kernels generated by users grew beyond the limits imposed by some architectures, where the space available to store the compressed kernel code is limited. The bzImage (big zImage) format was developed to overcome this limitation by cleverly splitting the kernel over discontiguous memory regions (see Figure 3.3). The bzImage format is still compressed using the zlib algorithm.²

Figure 3.3: Anatomy of bzImage

At the head of this kernel image is a routine that does some minimal amount of hardware setup and then decompresses the kernel contained within the kernel image and places it into high memory. If an initial RAM disk image is present, this routine moves it into memory and notes it for later use. The routine then calls the kernel and the kernel boot begins.

When the bzImage (for an x86 image) is invoked, you begin at ./arch/x86/boot/header.S in the start assembly routine (see Figure 3.4 for the major flow). This routine does some basic hardware setup and invokes the startup_32 routine in ./arch/x86/boot/compressed/header.S. This routine sets up a basic environment (stack, etc.) and clears the Block Started by Symbol (BSS). The kernel is then decompressed through a call to a C function called decompress_kernel (located in ./arch/x86/boot/compressed/misc.c). When the kernel is decompressed into memory, it is called. This is yet another startup_32 function, but this function is in ./arch/x86/kernel/header.S.

In the new startup_32 function (also called the swapper or process 0), the page tables are initialised and memory paging is enabled. The type of CPU is detected along with any optional floating-point unit (FPU) and stored away for later use. The start_kernel function is then invoked (init/main.c), which takes you to the non-architecture specific Linux kernel. This is, in essence, the main function for the Linux kernel.

² Although there is the popular misconception that the bz- prefix means that bzip2 compression is used (the bzip2 package is often distributed with tools prefixed with bz-, such as bzless, bzcat, etc.), this is not the case.


Figure 3.4: Major functions flow for the Linux kernel x86 boot

With the call to start kernel, a long list of initialisation functions are called to set up inter-rupts, perform further memory configuration, and load the initial RAM disk. In the end, a call ismade to kernel thread (in ./arch/x86/kernel/process.c) to start the init function, which isthe first user-space process. Finally, the idle task is started and the scheduler can now take control(after the call to cpu idle). With interrupts enabled, the pre-emptive scheduler periodically takescontrol to provide multitasking.

During the boot of the kernel, the initial-RAM disk (initrd) that was loaded into memoryby the stage 2 boot loader is copied into RAM and mounted. This initrd serves as a temporaryroot file system in RAM and allows the kernel to fully boot without having to mount any physicaldisks. Since the necessary modules needed to interface with peripherals can be part of the initrd,the kernel can be very small, but still support a large number of possible hardware configurations.After the kernel is booted, the root file system is pivoted (via pivot root) where the initrd rootfile system is unmounted and the real root file system is mounted.

The initrd function allows you to create a small Linux kernel with drivers compiled as loadable modules. These loadable modules give the kernel the means to access disks and the file systems on those disks, as well as drivers for other hardware assets. Because the root file system is a file system on a disk, the initrd function provides a means of bootstrapping to gain access to the disk and mount the real root file system. In an embedded target without a hard disk, the initrd can be the final root file system, or the final root file system can be mounted via the Network File System (NFS).

3.2.5.1 Manual boot in GRUB

From the GRUB command line, you can boot a specific kernel with a named initrd image as follows:

grub> kernel /bzImage-2.6.22.6

[Linux-bzImage, setup=0x1400, size=0x29672e]

grub> initrd /initrd-2.6.22.6.img

[Linux-initrd @ 0x5f13000, 0xcc199 bytes]


grub> boot

Uncompressing Linux... Ok, booting the kernel.

If you don’t know the name of the kernel to boot, just type a forward slash (/) and press the Tab key. GRUB will display the list of kernels and initrd images.

3.2.5.2 decompress_kernel output

The decompress_kernel function is where you see the usual decompression messages emitted to the display:

Uncompressing Linux... Ok, booting the kernel.

3.3 Step 2: init

After the kernel is booted and initialised, the kernel starts the first user-space application. This is the first program invoked that is compiled with the standard C library. Prior to this point in the process, no standard C applications have been executed.

The init argument that the boot loader can pass to the kernel is the name of a program. Usually none is given, and the default, /sbin/init, is used. But it need not be. Embedded systems rarely require the extensive initialisation provided by init (as configured through /etc/inittab). In many cases, you can invoke a simple shell script that starts the necessary embedded applications.

A good example where /sbin/init is replaced by a script is an embedded system in which a read-only filesystem is overlaid with another filesystem that is writable. The changes are written to a flash filesystem, and during boot these changes are again overlaid on the read-only filesystem. In this case, e.g. init=/etc/preinit is passed to the kernel as an argument.

#!/bin/sh

# script to do pivot_root and allow the entire root filesystem to be

# written to

/sbin/insmod /lib/modules/$(uname -r)/kernel/fs/mini_fo/mini_fo.ko

/sbin/insmod /lib/modules/$(uname -r)/kernel/lib/zlib_deflate/zlib_deflate.ko

/sbin/insmod /lib/modules/$(uname -r)/kernel/fs/jffs2/jffs2.ko

if ! /bin/mount -t jffs2 -w -o noatime,nodiratime /dev/mtdblock7 /mnt/mtdblock7

then

/usr/bin/eraseall /dev/mtd7

/bin/mount -t jffs2 -w -o noatime,nodiratime /dev/mtdblock7 /mnt/mtdblock7

fi

mount -t mini_fo -o base=/,sto=/mnt/mtdblock7 / /mnt/mini_fo

cd /mnt/mini_fo

[ -e old_rootfs ] || mkdir -p old_rootfs

pivot_root . old_rootfs

exec /usr/sbin/chroot . /sbin/init

echo "Oops, exec chroot didnt work! :( :( :( "

exit 1

If we instead pass the parameter init=/bin/sh to the kernel, a plain shell is started instead of init.


What does the kernel do with init? It starts it. It’s the only program the kernel itself starts; everything else is started by init.
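The choice of the first program can be sketched in a few lines of Python. This is a simplified illustration, not the kernel's code: the helper name choose_init and the exists callback are made up for this example, and the fallback list reflects the usual kernel defaults that are tried when no usable init= argument is given.

```python
# Sketch: how the first user-space program is chosen (simplified).
FALLBACKS = ("/sbin/init", "/etc/init", "/bin/init", "/bin/sh")


def choose_init(cmdline, exists):
    """Return the path that would be started as process 1, or None.

    cmdline: kernel command line (the text found in /proc/cmdline).
    exists:  hypothetical helper, a callable reporting whether a path can be
             executed; on a real system os.path.exists would be close enough.
    """
    requested = None
    for arg in cmdline.split():
        if arg.startswith("init="):
            requested = arg[len("init="):]  # a later init= overrides an earlier one
    if requested and exists(requested):
        return requested
    for path in FALLBACKS:
        if exists(path):
            return path
    return None  # the kernel would panic at this point


# A made-up root filesystem, represented as a set of existing paths.
rootfs = {"/sbin/init", "/bin/sh", "/etc/preinit"}
print(choose_init("console=ttyS0,115200 init=/etc/preinit", rootfs.__contains__))
print(choose_init("console=ttyS0,115200", rootfs.__contains__))
```

With the made-up root filesystem above, the first call selects /etc/preinit and the second falls back to /sbin/init.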

The regular Linux init will then read a file called /etc/inittab to see what it has to do. The format of that file is somewhat involved and archaic, but it’s not too complex³.

In order to understand the process of init, the concept of a runlevel needs to be introduced. A runlevel is a state or mode that is defined by the services that run in that mode. Runlevels are derived from Linux's historical Unix roots. Here, services means daemons such as sshd, network, ftpd, crond, ...

Runlevels are needed because different systems can be used in different ways. Some services are not available until the system is in a particular state or mode. Only when some lower-level services are available can other, higher-level services be started or used.

Suppose the system disk of, say, a LAN server is corrupted and you want to repair it. In such a situation, you do not want other users to log in to the system. You can switch to runlevel 1 and perform the maintenance tasks on your disk. Since runlevel 1 doesn’t support network or multi-user login, other users cannot log in to the system while it is under maintenance (i.e. when a low-level service such as the filesystem is not available, higher-level services such as multi-user or network login cannot be started or used).

Linux has the following runlevels:

0 : Halt (Shutdown)

1 : Single User Mode

2 : Basic Multi-User mode without NFS

3 : Full Multi-User mode

4 : Not Used (User Definable)

5 : Full Multi User Mode with X11 Login

6 : Reboot

Each runlevel runs a particular set of services. The scripts for all services in the system are in the /etc/init.d directory. There is also a directory that corresponds to each runlevel:

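• For runlevel 0: /etc/rc0.d

• For runlevel 1: /etc/rc1.d

• For runlevel 2: /etc/rc2.d

• For runlevel 3: /etc/rc3.d

• For runlevel 4: /etc/rc4.d

• For runlevel 5: /etc/rc5.d

• For runlevel 6: /etc/rc6.d

• For runlevel S: /etc/rcS.d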

3 A lot of embedded systems do not use the SysV init, but BusyBox init. The format of the BusyBox inittab file is slightly different. Another option is initng. Classic init executes processes in sequence, and since a lot of these tasks are hardware dependent, the processor is idle while waiting for the reply from the hardware. initng tackles this by starting independent tasks in parallel, resulting in a faster boot-up, but it is a lot harder to configure due to the dependencies between tasks.


Each of these directories contains many symbolic links. These links point to the service scripts in the /etc/init.d directory. Each link is named with a prefix of "K" or "S", according to whether that particular service needs to be killed or started in that runlevel.

E.g. consider the following entries (symbolic links) in the directory /etc/rc0.d:

[mleeman@seraph ~]$ ls -1 /etc/rc0.d/

K11anacron

K11cron

K20autofs

K20courier-authdaemon

K20courier-mta

...

S50mdadm-raid

S60umountroot

S90halt

This directory corresponds to runlevel 0, which is "shutdown". Here the entries prefixed with "S" (in this abbreviated listing mdadm-raid, umountroot and halt) are started; all other services are killed, as can be seen from their "K" prefix. You may wonder what happens if a start entry such as halt runs before all the other services have been killed. Fortunately, that doesn’t happen: first all the kill entries in the directory are executed, followed by the start entries. If you need further information, look into the /etc/init.d/rc file, which manages the starting and stopping of services when switching runlevels.
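The ordering rule (all K links first, then all S links, each group sorted by its two-digit number) can be illustrated with a short Python sketch. It is not the /etc/init.d/rc script itself; the function name rc_order is invented for this example, and the link names are taken from the listing above.

```python
import re

# A link name is a K or S prefix, a two-digit sequence number, then the service.
LINK_RE = re.compile(r"^([KS])(\d\d)(.+)$")


def rc_order(links):
    """Return (link, argument) pairs in the order they would be executed."""
    parsed = []
    for name in links:
        match = LINK_RE.match(name)
        if match:  # anything that is not a K/S link is ignored
            parsed.append((match.group(1), int(match.group(2)), match.group(3), name))
    kills = sorted(p for p in parsed if p[0] == "K")
    starts = sorted(p for p in parsed if p[0] == "S")
    return ([(p[3], "stop") for p in kills] +
            [(p[3], "start") for p in starts])


links = ["S90halt", "K20autofs", "S60umountroot", "K11cron", "K11anacron"]
for link, argument in rc_order(links):
    print("/etc/rc0.d/%s %s" % (link, argument))
```

Whatever order the directory listing returns, the halt entry always ends up last.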

When init loads, the system is in an undefined state (sometimes called N). It then switches to one runlevel or another, depending on the runlevel argument passed from the bootloader to the kernel and on the contents of /etc/inittab.

For example, if the bootloader passed 5 as the runlevel, init will try to switch to that state. If no runlevel argument was passed, it will use its default.

The default runlevel is defined in the /etc/inittab file:

# The default runlevel.

id:3:initdefault:

By default it is set to runlevel 3 or 5 (when X11 is installed). It can be customised to your needs⁴.

Some distributions (like Debian) define a sysinit runlevel (/etc/rcS.d) that is run first and starts as few processes as possible⁵.

Normally the only reason for the bootloader to pass an argument is if you want it to boot in an unusual state, for example, a single-user mode for maintenance (runlevel 1), or with a replacement init because of disk corruption (init=/bin/sh).

So, let’s look at that file in more detail.

3.3.1 Step 2.1: /etc/inittab

All lines starting with # are comments. The other lines are like this:

1:2345:respawn:/sbin/getty 38400 tty1

They have four fields, separated by colons, with the following meaning (taken from the inittab(5) man page):

id : is a unique sequence of 1-4 characters which identifies an entry in inittab (for versions of sysvinit compiled with the old libc5 (< 5.2.18) or a.out libraries the limit is 2 characters).

4 Alert: Be sure not to set the default to 0 or 6.

5 In fact, Debian, as well as most of the distributions based on it, like Ubuntu, does not make any difference between runlevels 2 to 5; they are all there for the local admin to configure to his or her taste.


runlevels : lists the runlevels for which the specified action should be taken.

action : describes which action should be taken.

process : specifies the process to be executed. If the process field starts with a ‘+’ character, init will not do utmp and wtmp accounting for that process. This is needed for gettys that insist on doing their own utmp/wtmp housekeeping.
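As an illustration of the four-field format, the following Python sketch splits inittab lines into their fields. The names Entry and parse_inittab are invented for this example; real init implementations do more validation.

```python
from collections import namedtuple

Entry = namedtuple("Entry", "id runlevels action process")


def parse_inittab(text):
    """Split the non-comment lines of an inittab into their four fields."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # The process field may itself contain colons, so split at most 3 times.
        fields = line.split(":", 3)
        if len(fields) == 4:
            entries.append(Entry(*fields))
    return entries


sample = """\
# The default runlevel.
id:3:initdefault:
1:2345:respawn:/sbin/getty 38400 tty1
"""
for entry in parse_inittab(sample):
    print(entry)
```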

When it’s booting, to decide the desired runlevel (again, if it’s not passed as an argument), init will look for a line with the initdefault action.

id:5:initdefault:

That means: go to runlevel 5. So, if you wanted to change the default runlevel, that’s what you change.
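The decision can be sketched as follows. This is a simplified model rather than the sysvinit source: the function name boot_runlevel is made up, and only a single digit, S, s or the word single on the command line are treated as a runlevel request.

```python
def boot_runlevel(inittab_text, cmdline=""):
    """Return the runlevel init would aim for, or None if none is defined."""
    # An explicit runlevel on the kernel command line wins.
    for arg in cmdline.split():
        if arg == "single":
            return "S"
        if len(arg) == 1 and arg in "0123456Ss":
            return arg.upper()
    # Otherwise use the runlevels field of the initdefault entry.
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":", 3)
        if len(fields) == 4 and fields[2] == "initdefault":
            return fields[1]
    return None  # a real init would ask for a runlevel on the console


inittab = "# The default runlevel.\nid:5:initdefault:\n"
print(boot_runlevel(inittab))                            # default from inittab
print(boot_runlevel(inittab, "root=/dev/mtdblock3 1"))   # override from the boot loader
```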

But what does it mean to go to one runlevel? Well, each runlevel runs a different configuration of software. One runlevel may have a webserver running, and another not have it. One runlevel may show you a graphical login screen, or not, or give you 6 text terminals, or one.

So, for example, if you switched to runlevel 6, it would reboot. You can switch runlevels at any moment using the telinit command, but for the purposes of booting and this article, you switch only once, to the default runlevel, and you’re done.

So, what happens after you know you are going to runlevel 5?

If you are booting, you check all lines with actions sysinit, boot and bootwait, in that order, and run what the process field says.

For GNU/Debian, this is

si::sysinit:/etc/init.d/rcS

So, it will run a script called /etc/init.d/rcS, which does things like loading a terminal font, checking disks and mounting filesystems: the basic drudge work that makes the system habitable.

Then it will get all lines with action once and wait that have the desired runlevel in the runlevel field, run their commands, and wait until the commands of the wait lines have finished.

In GNU/Debian, for runlevel 5:

l5:5:wait:/etc/init.d/rc 5

What this particular script does is start all services configured for runlevel 5. That’s what you see when it says things like "configuring foo bar" on boot. Now, let’s see the details...
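Before moving on, the boot-time selection of inittab lines described above can be summarised in a small Python sketch. It only models the order in which commands are picked; the name boot_sequence is invented for this example and nothing is actually executed.

```python
BOOT_ACTIONS = ("sysinit", "boot", "bootwait")


def boot_sequence(inittab_text, runlevel):
    """List the commands run while booting into the given runlevel."""
    entries = []
    for line in inittab_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            fields = line.split(":", 3)
            if len(fields) == 4:
                entries.append(fields)
    sequence = []
    # First the boot-only actions, in this order; their runlevels field is ignored.
    for action in BOOT_ACTIONS:
        sequence += [process for _, _, act, process in entries if act == action]
    # Then the once and wait lines that list the target runlevel, in file order.
    sequence += [process for _, levels, act, process in entries
                 if act in ("once", "wait") and runlevel in levels]
    return sequence


inittab = """\
id:5:initdefault:
si::sysinit:/etc/init.d/rcS
l3:3:wait:/etc/init.d/rc 3
l5:5:wait:/etc/init.d/rc 5
1:2345:respawn:/sbin/getty 38400 tty1
"""
print(boot_sequence(inittab, "5"))
```

The respawn line is deliberately absent from the result; those entries are handled later.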

3.4 Step 3: Services

When you install decently packaged software that needs to run without being manually started by a user (think webserver), it should provide a control script for itself and place it in the standard location: /etc/init.d/.

There you will find many scripts. For example, there is one called /etc/init.d/networking which, amazingly enough, controls the network.

For example, when /etc/init.d/networking stop is executed, it brings down the network, and /etc/init.d/networking start brings the networking back up.

Some services support more or fewer commands, but all support stop, start and restart. To see what is supported, call the script without arguments:

[mleeman@seraph ~]$ /etc/init.d/networking

Usage: /etc/init.d/networking {start|stop|restart|force-reload}


For each runlevel, there’s a list of services that should be started, and a list of services that should be stopped. On entering runlevel 5, for example, you may want to stop service httpd but start service smb, or whatever. I strongly recommend using a system management tool, like Debian’s rcconf, to handle this; such tools are simple and work just fine. But if you want to do it by hand, or just want to know how that configuration is stored, read on :-)

For each runlevel N, there is a directory called /etc/rcN.d (/etc/rc.d/rcN.d on Red Hat style systems).

Here is part of runlevel 5:

[mleeman@seraph ~]$ ls -al /etc/rc5.d/ | cut -c 52-

...

S10sysklogd -> ../init.d/sysklogd

S11klogd -> ../init.d/klogd

S14ppp -> ../init.d/ppp

S19slapd -> ../init.d/slapd

S20autofs -> ../init.d/autofs

...

S91apache2 -> ../init.d/apache2

S99rmnologin -> ../init.d/rmnologin

S99stop-bootlogd -> ../init.d/stop-bootlogd

As mentioned before, the links that start with K are to be stopped and those which start with S are to be started. The numbers give them an order in which to be killed or started.

The stopping or starting is simply done by calling, for example

/etc/rc5.d/S20autofs start

Since S20autofs is a symbolic link to /etc/init.d/autofs, it’s just the same as what we used before to start the network service.

Once all of that is done and all services are started, we get back to inittab.

3.5 Step 4: More inittab fun

Now init will get all lines with action respawn for the desired runlevel and start their processes. respawn commands are restarted when they end, so they will be running pretty much all the time as long as you are in this runlevel.

For GNU/Debian in runlevel 5:







co:2345:respawn:/sbin/getty -L console 57600 vt220

The lines with id 1 through 6 run a program in the terminals you reach using ALT-F1 through ALT-F6, which asks for your username. Yes, those are what you use to log in in text mode.

The line with co spawns a serial console on the serial port.

And voila, you are booted and ready to log in.

3.6 Hands On

By now, the SheevaPlug device should be up and running. Log in to your device (user: root, password: nosoup4u) and examine the /etc/inittab file. Follow the logic and scripts in that file.

After this, check /proc/cmdline and match this with the inittab file.


3.7 References

• http://www-128.ibm.com/developerworks/linux/library/l-linuxboot/

• http://en.wikipedia.org/wiki/BzImage

• http://sourceforge.net/projects/u-boot

• http://www.faqs.org/docs/Linux-HOWTO/Kernel-HOWTO.html


Chapter 4

Boot Loaders

4.1 Introduction

The boot loader is the very first thing running after power on. Its task is to initialise (some of) the hardware and provide a means for loading the kernel from some kind of storage and executing it. Bootloaders also often have monitor functionality to read/write memory, program flash and so on.

A lot of Linux compatible bootloaders exist. For Linux systems running on PCs the most popular are LILO and GRUB, but they are not interesting for most embedded Linux systems as they are x86 specific and only support booting from disks.

Most boot loaders are by their nature very platform/board specific, but three strive to be portable: RedBoot, “Das U-Boot” and Barebox. A portable boot loader is very interesting as you don’t need to write or get familiar with a new boot loader every time you change hardware platform.

4.2 RedBoot

RedBoot (Red Hat Embedded Debug and Bootstrap) is an advanced bootloader by Red Hat, written on top of the eCos embedded operating system. Features of special interest are:

• Portable (ARM, Calmrisc16/32, Coldfire, Frv, H8300, x86, M68K, Mips, OpenRisc, PPC, SH, Sparc, V85x)

• Interactive command line interface over serial and telnet

• Boot scripting

• TCP/IP stack with BOOTP and DHCP support

• Image download from file, X/Y modem, TFTP and HTTP

• ELF, SREC and binary image formats, optionally GZIP compressed

• Flash interface (NOR/NAND) with image system (FIS)

• Read-only file system access (JFFS2, FAT, EXT2)

• Integrated GDB stubs for easy debugging (serial and TCP/IP)

• Boot support for eCos and Linux


The Flash Image System (FIS) is especially interesting as Linux can also parse it (see CONFIG_MTD_REDBOOT_PARTS), so no special effort is needed to keep the bootloader's and the kernel’s idea of the flash layout in sync.

As Red Hat is no longer working on eCos, development activity around RedBoot has unfortunately slowed down quite a bit.

RedBoot’s eCos heritage also means that it has a fairly large source base (100 MB), where most of the source code isn’t relevant for RedBoot. This might lead to a steeper learning curve than for dedicated boot loaders. Furthermore, the memory map is not optimised for loading Linux kernels (e.g. on PPC, RedBoot is normally located at the bottom of the memory map, so a raw Linux kernel cannot be loaded directly; instead the zImage target, with its small loader that moves the kernel into place after loading, must be used).

RedBoot is licensed under the eCos License, which is for all intents the same as GPL.

4.3 Das U-Boot

U-Boot, the universal boot loader, is probably the most feature-rich, flexible and most actively developed open source boot loader available. It is maintained by Wolfgang Denk of DENX Software Engineering.

It started as a PowerPC specific boot loader (PPCBoot), but now also runs on ARM, Blackfin, x86, M68K, Microblaze, Mips and Nios (1 & 2) boards.

It is very much focused on booting Linux systems, and the development approach is also clearly inspired by it (GIT version control, coding style, reuse of Linux drivers, ...).

Features of special interest are:

• Flash support (NOR, NAND, Dataflash)

• Compression (GZIP, BZIP2)

• Interactive command line interface and boot scripting

• TCP/IP stack with BOOTP, DHCP, TFTP and NFS support

• Lots of drivers (IDE, SCSI, MMC, PCMCIA, USB, LCD, I2C, SPI, ...)

• x86 emulation for graphics card POST on non-x86

• File systems (JFFS2, Cramfs, EXT2, FAT, ReiserFS, ..)

• Boot splash images

• FPGA configuration

U-Boot is licensed under the GPL.

4.4 Barebox

Barebox is a relatively new bootloader (2009). It started its life under the code name U-Boot v2 by Sascha Hauer from Pengutronix as a technology study to see if it was possible to merge the nice user features of U-Boot with infrastructure concepts inspired by the Linux kernel (driver model, POSIX, ...).

Barebox has many of the same features as U-Boot, a cleaner code base and command set, but not as broad hardware support or popularity. It follows a relatively aggressive development flow with monthly releases.

Barebox is also licensed under the GPL.


4.5 Conclusion

Both RedBoot and U-Boot are or have been in use within Barco and both are valid options, but because of its active development and strong Linux focus we recommend going with U-Boot for new development. Barebox has to our knowledge not been used within Barco yet, but is also a very interesting project to consider.

4.6 Hands On - Explore U-Boot

The SheevaPlug device runs U-Boot; log into the device over the serial line and poke around to discover the hardware, using U-Boot commands. Inspect the environment that is stored in flash.

4.7 Hands On - Replace Bootloader

4.7.1 Introduction

The SheevaPlug uses U-Boot as a bootloader. If any custom hardware is made, it is always wise to start from a well known reference design.

First of all, these kinds of evaluation boards are designed to allow customers to evaluate the on-board functionality of the processor. As such, a lot of the hardware peripherals will be accessible or, at the very least, defined in the code (e.g. the IMMR registers) of both the bootloader (Das U-Boot) and the Linux kernel.

Secondly, whenever there is a design error, a lot of users will have the same error, possibly saving you time if someone else already encountered the problem (and provided a patch and/or workaround). As we all know, this is often the case in the first revisions of new devices. It is not unusual to see patches for these devices appearing on the mailing lists of the bootloader (the most likely place to tackle silicon bugs), or possibly even in the Linux kernel.

Deviation from the reference design should be undertaken with great care, and always in cooperation with the person(s) doing the U-Boot and Linux kernel port. What can be a simple stroke of the pen for a hardware designer (re-connecting chip selects) can cause a lot of work in locating the relevant code snippets and/or adjusting the code, especially in a start-up phase where we are not yet certain what causes a particular problem.

A general rule of thumb should be: don’t change anything without a compelling reason. While some changes can be fixed rather rapidly, others (such as changing interrupt lines) can cause serious headaches and maintenance problems for the remainder of the product life cycle. Changes are often easier to incorporate once the base platform is understood and ported: e.g. configuring memory from a SO-DIMM device is easy since the settings can be read out via I2C from a small EEPROM on the SO-DIMM; soldering the memory onto the board is cheaper and more compact, but requires setting the timings yourself.

In the remainder of this chapter, we will have a look at the U-Boot configuration and the Linux kernel configuration for the SheevaPlug port.

In the following, we assume the usage of a BDI2000 or BDI3000 probe or OpenOCD, and that the configuration file for the BDI probe has already been defined. A working serial line is just as important as a working JTAG probe, especially for early debugging and U-Boot access. Again, we assume that this has been taken care of during early system design.

4.7.1.1 Getting the Source

U-Boot has a regular 3 month release interval, not unlike the kernel. We use the 2010.06 release tarball. Later releases can be used as well, but the board support configuration is somewhat changed. Download from ftp://ftp.denx.de/pub/u-boot/u-boot-2010.06.tar.bz2.


4.7.1.2 Configuration

Since we know that the SheevaPlug port has already been done, we start by looking for the configuration file. The configuration of U-Boot for a particular board is defined in a header file in include/configs/; include/configs/sheevaplug.h in our case.

Configuration is usually done using C preprocessor defines; the rationale behind that is to avoid dead code whenever possible. There are two classes of configuration variables:

Configuration OPTIONS : These are selectable by the user and have names beginning with CONFIG_.

Configuration SETTINGS : These depend on the hardware etc. and should not be meddled with if you don’t know what you’re doing; they have names beginning with CFG_.

The options themselves are documented in the README.

Since we want to create a derived configuration from the reference board, we copy include/configs/sheevaplug.h to include/configs/myplug.h. In order to build our variant, we add the following lines to the Makefile (in the top source directory).

myplug_config: unconfig

@$(MKCONFIG) $(@:_config=) arm arm926ejs $(@:_config=) barco kirkwood

The first option after the target name specifies that we will be building for the arm architecture, followed by the processor type arm926ejs and the board name (myplug, derived from the target name)¹. The second-to-last option indicates that we will be placing our board port under board/barco/ instead of directly under board/ (Barco uses the board/barco/ directory, as was agreed upon within Barco²).

Since our port is based on board/Marvell/sheevaplug/, we copy that directory to provide a base to work with.

[mleeman@neo u-boot-2010.06]$ cp -a board/Marvell/sheevaplug/ board/barco/myplug/
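To see how the Makefile rule and the copied directory fit together, the sketch below maps the arguments passed to $(MKCONFIG) onto the board directory. It assumes the positional order target, architecture, CPU, board, vendor, SoC for the mkconfig script of this U-Boot generation; the Python names are invented for the illustration and are not part of U-Boot.

```python
# Assumed positional meaning of the mkconfig arguments (see lead-in).
FIELDS = ("target", "arch", "cpu", "board", "vendor", "soc")


def mkconfig_fields(args):
    """Pair each positional argument with its assumed meaning."""
    return dict(zip(FIELDS, args))


def board_directory(fields):
    """Directory that is expected to hold the board specific code."""
    vendor = fields.get("vendor")
    if vendor and vendor != "NULL":
        return "board/{}/{}".format(vendor, fields["board"])
    return "board/{}".format(fields["board"])


# $(@:_config=) expands to "myplug" for the myplug_config target.
fields = mkconfig_fields(["myplug", "arm", "arm926ejs", "myplug", "barco", "kirkwood"])
print(fields["arch"], fields["cpu"], fields["soc"])
print(board_directory(fields))
```

Under that assumption the rule points at board/barco/myplug, which is exactly the destination of the cp command.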

Since we opted to keep changes to a minimum and to leverage as much of the U-Boot functionality as possible while keeping maintenance low, we chose not to do this.

We carefully inspect, validate and adjust the settings where needed. If your design is close to the reference design, you will not need to make any other code changes.

While this is not strictly needed, U-Boot gives us the possibility to store the environment in flash. Since this is a very powerful tool, we enable it by making certain that the following variables are correct:

#define CFG_ENV_IS_IN_FLASH 1

#define CFG_ENV_ADDR (CFG_MONITOR_BASE + 0x40000)

#define CFG_ENV_SECT_SIZE 0x20000 /* 128K (one sector) for env */

#define CFG_ENV_SIZE 0x20000

/* Address and size of Redundant Environment Sector */

#define CFG_ENV_ADDR_REDUND (CFG_ENV_ADDR + CFG_ENV_SECT_SIZE)

#define CFG_ENV_SIZE_REDUND (CFG_ENV_SIZE)

This instructs U-Boot that an environment will be stored in flash at location CFG_MONITOR_BASE + 0x40000, of size 0x20000 (one sector). We also add a redundant configuration section just after it.

1 If the board port is significantly different from the reference design, or if you need to add extensive and specific functionality, it might be wise to completely branch the original port into a specific one.

2 Since Barco is creating a lot of U-Boot based boards, it is a good idea to provide a vendor (barco) directory to place the boards in.


4.7.1.3 Building and booting

At this point, you can try to build the bootloader. We specify the usual CROSS_COMPILE and ARCH parameters (see Chapter 2). First, we configure U-Boot for our newly defined board; then we build the bootloader and, finally, the Marvell (kwb) image.

make ARCH=arm CROSS_COMPILE=arm-linux- myplug_config

make ARCH=arm CROSS_COMPILE=arm-linux-

make ARCH=arm CROSS_COMPILE=arm-linux- u-boot.kwb

If all goes well, you should end up with a binary and a Marvell (kwb) image (u-boot.bin and u-boot.kwb). We will use the Marvell image to load over the network and burn it to flash. For the initial loading of the bootloader in flash, we copy the image to our tftpboot directory and burn it with OpenOCD.

The first step is to get OpenOCD working (see Chapter 9 for more information about OpenOCD). With this information, connect the OpenOCD JTAG emulator. OpenOCD is part of GNU/Debian and the most recent versions (testing/unstable) have been tested and are known to work with the SheevaPlug. Another option is to use the precompiled version available from http://www.openplug.org.

Start OpenOCD:

[marc@staleek Sheeva]$ openocd -f /usr/share/openocd/scripts/board/sheevaplug.cfg

If OpenOCD complains with a similar message:


Open On-Chip Debugger 0.3.0-in-development (2009-08-13-23:22) svn:r2529

$URL: http://svn.berlios.de/svnroot/repos/openocd/trunk/src/openocd.c $

For bug reports, read http://svn.berlios.de/svnroot/repos/openocd/trunk/BUGS

2000 kHz

jtag_nsrst_delay: 200

jtag_ntrst_delay: 200

dcc downloads are enabled

Error: unable to open ftdi device: device not found

Runtime error, file "command.c", line 469:

[marc@staleek Sheeva]$

you might need to change the file: sheevaplug-installer-v1.0/uboot/openocd/config/interface/sheevaplug.cfg

[marc@staleek ~]$ cat /usr/share/openocd/scripts/interface/sheevaplug.cfg

#

# Marvel SheevaPlug Development Kit

#

# http://www.marvell.com/products/embedded_processors/developer/kirkwood/sheevaplug.jsp

#

interface ft2232

ft2232_layout sheevaplug

ft2232_vid_pid 0x9e88 0x9e8f

ft2232_device_desc "SheevaPlug JTAGKey FT2232D B"

jtag_khz 2000

This is due to a changed vendor ID after 07/2009.

Connect to the telnet port of your OpenOCD session.

[marc@crichton ~]$ nc localhost 4444

Open On-Chip Debugger

http://www.openplug.org


> sheevaplug_init

sheevaplug_init

target state: halted

target halted in ARM state due to debug-request, current mode: Supervisor

cpsr: 0x000000d3 pc: 0xffff0000

MMU: disabled, D-Cache: disabled, I-Cache: disabled

0 0 1 0: 00052078

>

Start by clearing the flash (whatever was on the SheevaPlug). Next, load the kwb image of the bootloader.

> nand probe 0

nand probe 0

> nand erase 0 0 0x20000000

nand erase 0 0 0x20000000

bad block: 1137

didn’t erase block 1133; status: 0xe1

erased blocks 0 to 4096 on NAND flash device #0 ’NAND 512MiB 3,3V 8-bit’

> nand write 0 u-boot.kwb 0 oob_softecc_kw

nand write 0 u-boot.kwb 0 oob_softecc_kw

After issuing the resume command, we should see something like the following on our serial console:

U-Boot 2010.06 (Jul 26 2010 - 10:36:14)

Marvell-Sheevaplug

SoC: Kirkwood 88F6281_A0

DRAM: 512 MiB

NAND: 512 MiB

*** Warning - bad CRC or NAND, using default environment

In: serial

Out: serial

Err: serial

Net: egiga0

88E1116 Initialized on egiga0

Hit any key to stop autoboot: 0

Marvell>>

If you set your CONFIG_BOOTDELAY sufficiently large (> 0)³, you can now stop the bootloader from executing its default startup command as defined in CONFIG_BOOTCOMMAND. U-Boot can now be used for upgrading itself and for writing kernel and filesystem images to flash, as well as for hardware inspection. Running help from the U-Boot command line gives you a list of simple commands.

4.7.1.4 A note about the flash layout

At the U-Boot shell prompt, the mtdparts variable can be inspected.

Marvell>> printenv mtdparts

mtdparts=orion_nand:512k(uboot),3m@1m(kernel),1m@4m(psm),13m@5m(rootfs) rw

This configuration is passed to the kernel, allowing the kernel to partition the flash using the cmdlinepart driver. In this particular case, 512 kB is reserved for U-Boot, the kernel has 3 MB at an offset of 1 MB, a configuration space of 1 MB (not used any further here) is stored at an offset of 4 MB and, finally, the root filesystem can be up to 13 MB large and is stored at an offset of 5 MB.

3 When set to -1, it will not boot automatically.
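The layout encoded in mtdparts can be decoded with a short Python sketch. It handles only the subset of the syntax used here (decimal sizes with an optional k, m or g suffix, an optional @offset and a name in parentheses); the function names are invented for this example, and the real cmdlinepart driver accepts more forms.

```python
import re

UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
PART_RE = re.compile(r"(\d+[kmg]?)(?:@(\d+[kmg]?))?\(([^)]+)\)")


def parse_size(text):
    """Convert '512k', '3m' or a plain byte count into bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text)
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_mtdparts(value):
    """Return (device, [(name, offset, size), ...]) for an mtdparts string."""
    value = value.split()[0]  # drop separate arguments such as a trailing 'rw'
    if value.startswith("mtdparts="):
        value = value[len("mtdparts="):]
    device, _, spec = value.partition(":")
    partitions = []
    next_offset = 0  # partitions without @offset follow the previous one
    for item in spec.split(","):
        size, offset, name = PART_RE.fullmatch(item).groups()
        start = parse_size(offset) if offset else next_offset
        length = parse_size(size)
        partitions.append((name, start, length))
        next_offset = start + length
    return device, partitions


device, parts = parse_mtdparts(
    "mtdparts=orion_nand:512k(uboot),3m@1m(kernel),1m@4m(psm),13m@5m(rootfs) rw")
for name, start, length in parts:
    print("%-8s offset 0x%08x size 0x%08x" % (name, start, length))
```

Comparing the end of one partition with the start of the next makes unallocated ranges visible.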

A careful observer will have seen that there is a gap between the bootloader and the kernel that is not accounted for in this setup. Inspecting the environment definition in include/configs/sheevaplug.h shows:

#define CONFIG_ENV_SIZE 0x20000 /* 128k */

#define CONFIG_ENV_ADDR 0x60000

#define CONFIG_ENV_OFFSET 0x60000 /* env starts here */

128 kB is used for storing the environment in flash, at an offset of 0x60000, i.e. at 384 kB. Of the 512 kB reserved for the bootloader, 384 kB can be used by the bootloader binary itself, while the last 128 kB is used for storing the configuration.
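The arithmetic can be checked directly. The sketch below simply converts the two defines to kB and compares them with the 512 kB bootloader partition; the variable names other than the two CONFIG_ values are chosen for this example.

```python
KIB = 1024

UBOOT_PARTITION_SIZE = 512 * KIB   # from mtdparts: 512k(uboot)
CONFIG_ENV_OFFSET = 0x60000        # from include/configs/sheevaplug.h
CONFIG_ENV_SIZE = 0x20000

env_start_kib = CONFIG_ENV_OFFSET // KIB
env_size_kib = CONFIG_ENV_SIZE // KIB
env_end_kib = (CONFIG_ENV_OFFSET + CONFIG_ENV_SIZE) // KIB

print("environment starts at %d kB" % env_start_kib)
print("environment size is %d kB" % env_size_kib)
print("environment ends at %d kB" % env_end_kib)
print("room left for the U-Boot binary: %d kB" % env_start_kib)
print("fits inside the uboot partition:",
      CONFIG_ENV_OFFSET + CONFIG_ENV_SIZE <= UBOOT_PARTITION_SIZE)
```

The environment therefore occupies exactly the last 128 kB of the 512 kB bootloader partition.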

Storing the environment in the bootloader partition has the advantage that the configuration is hidden from userspace (which can be a design choice), but it does raise some concerns about addressing the configuration space from userspace. In that respect, it can be better to explicitly add a partition for the configuration that is addressable from userspace separately from the bootloader.

Another improvement is to add a redundant configuration space, for the following reason. When flash is modified, a complete sector is typically erased and re-written. If an error occurs before the flash is rewritten, the system will end up with an inconsistent configuration (resorting to factory settings again when properly designed). When a redundant configuration space is used, a sequence number is prepended to the config space. This (together with a CRC) is used to figure out which of the two configurations is most recent.

This is enabled by defining in the header file:

#define CONFIG_ENV_ADDR_REDUND (CONFIG_ENV_ADDR + CONFIG_ENV_SECT_SIZE)

#define CONFIG_ENV_SIZE_REDUND (CONFIG_ENV_SIZE)
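The selection between the two copies (CRC first, then sequence number) can be sketched as follows. The record layout used here, a little-endian CRC32 followed by a one-byte sequence number and the data, and the wrap-around comparison are assumptions made for the illustration; they are not claimed to match U-Boot's on-flash format byte for byte.

```python
import struct
import zlib

HEADER = struct.Struct("<IB")  # assumed layout: CRC32 of the data, then a sequence byte


def make_copy(sequence, data):
    """Build one environment copy for the example."""
    return HEADER.pack(zlib.crc32(data) & 0xFFFFFFFF, sequence) + data


def is_valid(copy):
    """A copy is valid when the stored CRC matches the CRC of its data."""
    crc, _sequence = HEADER.unpack(copy[:HEADER.size])
    return crc == (zlib.crc32(copy[HEADER.size:]) & 0xFFFFFFFF)


def pick_environment(copy0, copy1):
    """Return the data of the most recent valid copy, or None if both are bad."""
    valid = [c for c in (copy0, copy1) if is_valid(c)]
    if not valid:
        return None  # a real boot loader would fall back to its built-in defaults
    if len(valid) == 1:
        return valid[0][HEADER.size:]
    seq0, seq1 = copy0[4], copy1[4]
    # Both valid: the copy that is 1..127 steps ahead (modulo 256) is the newer one.
    newer = copy1 if 0 < ((seq1 - seq0) & 0xFF) < 0x80 else copy0
    return newer[HEADER.size:]


old = make_copy(7, b"bootdelay=3\0")
new = make_copy(8, b"bootdelay=5\0")
print(pick_environment(old, new))
# Simulate an interrupted write of the newer copy: its CRC no longer matches.
print(pick_environment(old, new[:-1] + b"X"))
```

If the write of the newer copy is interrupted, the older copy is still intact and gets selected instead.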

And the revised mtdparts variable would then be:

mtdparts=orion_nand:384k(uboot), \

128k@384k(config0),128k@512k(config1), \

3m@1m(kernel),1m@4m(psm),13m@5m(rootfs) rw

The bootloader has a separate partition and the two configuration spaces follow it; the second one consumes 128 kB of the previously unallocated space.

As a result, for a uClibc based system, only 18 MB of the entire flash size (512 MB) is used. However, the remaining flash can be partitioned further for other, independent systems. An example of this will be discussed next.

4.7.1.5 Adjusting the U-Boot environment

When the CFG_ENV_IS_IN_FLASH configuration option is set, the settings are stored in flash and can be modified there. This is a very powerful tool to modify the early startup behaviour of the U-Boot bootloader.

The main commands are setenv, printenv and saveenv.

The MAC address on the SheevaPlug is set by the bootloader; since the bootloader got replaced, all SheevaPlug devices will now have the same default MAC address. For this reason, the ethaddr variable needs to be set again in U-Boot. Check the back of the device and set the correct value from the command line.

> setenv ethaddr 00:02:a5:74:2a:34

> printenv ethaddr

> saveenv


Remember to use saveenv to burn the changes to flash after the modifications to make them persistent.

U-Boot also comes with a number of tools (C code) that provide the same functionality from within a running Linux system. These programs are called fw_setenv and fw_printenv. They allow the user to access the U-Boot configuration and modify it. The most important difference is that changing a variable writes it to flash directly (no counterpart of the saveenv command is required).

4.7.1.6 Fine tuning the Startup Behaviour

One of the strong points of U-Boot is that these commands can be combined in a script-like behaviour[4].

The following section uses settings as used in a Barco design; they can easily be adjusted for the demo board.

Though all the variables can be modified and saved in flash (see the printenv, setenv and saveenv U-Boot commands in Section 4.7.1.5), we should add valid defaults to our include/configs/myplug.h configuration file. The place to store these extensions is CONFIG_EXTRA_ENV_SETTINGS. In what follows, the settings are explained.

#define CONFIG_ETHADDR 02:04:a5:01:05:ce

#define CONFIG_ETH1ADDR 02:04:a5:01:05:ce

#define CONFIG_IPADDR 10.2.4.40

#define CONFIG_ROOTPATH _bad_non_dhcp_default_value_rootpath_

#define CONFIG_BOOTFILE _bad_non_dhcp_default_value_bootfile_

Each board is different, but a number of default values are needed to assure proper operation even when board-dependent parameters are not stored during production. Among these parameters are the default MAC addresses (in this case for the first two network devices) and the default IP address. Finally, we end with two values that are obviously not valid (this assures that an error, e.g. loading the wrong kernel, is detected early).

#define CONFIG_EXTRA_ENV_SETTINGS \

"netdev=eth0\0" \

Since we are defining a single preprocessor macro, it should be on one long line. For readability, we split it over several lines and tell the preprocessor that they form one logical line by adding a \ at the end of each line. A U-Boot variable consists of a normal key/value pair. The pair is closed by the NUL character (\0).
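To make the resulting memory layout concrete, the following host-side sketch builds a small list the same way and looks up a key in it. It is an illustration only: demo_env and env_lookup are hypothetical names, not U-Boot functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same construction as CONFIG_EXTRA_ENV_SETTINGS: adjacent string
 * literals are concatenated, every pair ends in \0, and the compiler
 * appends one more \0, so the list ends with an empty string. */
static const char demo_env[] =
    "netdev=eth0\0"
    "kernel=4\0"
    "";

/* Returns a pointer to the value of key, or NULL when key is absent. */
static const char *env_lookup(const char *env, const char *key)
{
    size_t klen;

    assert(env != NULL && key != NULL);
    klen = strlen(key);
    /* Walk from pair to pair until the empty string that ends the list. */
    for (const char *p = env; *p != '\0'; p += strlen(p) + 1) {
        if (strncmp(p, key, klen) == 0 && p[klen] == '=')
            return p + klen + 1;
    }
    return NULL;
}
```

Calling env_lookup(demo_env, "netdev") returns a pointer to "eth0", while a key that is only a prefix of an existing one (such as "net") is not matched.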

"nfsargs=setenv bootargs root=/dev/nfs rw\0" \

"addnfsipargs=setenv bootargs ${bootargs} " \

"nfsroot=${serverip}:${rootpath},tcp\0" \

"addip=setenv bootargs ${bootargs} " \

"ip=${ipaddr}:${serverip}:${gatewayip}:${netmask}" \

":${hostname}:${netdev}:off panic=1\0" \

"addtty=setenv bootargs ${bootargs} console=ttyS0,${baudrate}\0"\

"dhcp_nfs=run nfsargs addnfsipargs addip addtty;bootm $loadaddr - $fdtaddr\0" \

"kernel=4\0" \

The above piece of code is the base of the script to start a development system over the network. The first line (nfsargs) sets the bootargs passed to the kernel so that /dev/nfs is used as the root device. The second line (addnfsipargs) extends bootargs with nfsroot, the location where the root filesystem is placed.

In the third line (addip), the bootloader passes its IP address to the kernel, avoiding any IP confusion or the assignment of a different IP address after mounting the NFS root.

Next, addtty passes the serial communication options (baudrate) to the kernel boot arguments. dhcp_nfs binds all this logic together and calls the scripts we just defined. Finally, it boots the kernel from address $loadaddr with the flat device tree at address $fdtaddr. More on the Flat Device Tree later.

The kernel=4 value plays a central role in this boot logic. For the moment, it’s enough to know that a value of 4 is an invalid value and signals factory defaults.

[4] Make certain to enable the CFG_HUSH_PARSER define.

"release=ngs103.2.6.continuous\0" \

"ubootargs=setenv uboot /home/services/tftpboot/\

v-$release/u-boot.$release.img\0" \

"flashkernelargs=setenv flashkernel /home/services/tftpboot/\

v-$release/flashfs/kernel.$release.img\0" \

"filesystemargs=setenv filesystem /home/services/tftpboot/\

v-$release/flashfs.$release.img\0" \

"tftpargs=run ubootargs flashkernelargs filesystemargs dtblobargs\0"\

These lines define the (default) locations of the firmware images on the development server. The release is parametrised, which makes it easy to upload a different release via the bootloader (Exercise: update these addresses from a PowerPC based design to something suitable for your plug device).

"factkernsaddr=fe080000\0" \

"factkerneaddr=fe1bffff\0" \

"factfssaddr=fe1c0000\0" \

"factfseaddr=fe6fffff\0" \

"upgrkernsaddr=fe700000\0" \

"upgrkerneaddr=fe83ffff\0" \

"upgrfssaddr=fe840000\0" \

"upgrfseaddr=fed7ffff\0" \

"jffsfssaddr=fed80000\0" \

"jffsfseaddr=fefbffff\0" \

"ubootsaddr=fe000000\0" \

"ubooteaddr=fe03ffff\0" \

"cfg1saddr=fe040000\0" \

"cfg1eaddr=fe05ffff\0" \

"cfg2saddr=fe060000\0" \

"cfg2eaddr=fe07ffff\0" \

Next, there is an entire list of addresses that reflect the address map. Even though these addresses can also be found in the device tree, they are duplicated here for the purpose of easy scripting in the bootloader shell. There are two systems that are mirrored (upgrade and factory), a redundant configuration space and a single JFFS2 filesystem to store persistent changes.
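Because these start and end addresses are maintained by hand, a quick host-side check can catch typing mistakes. The sketch below copies the addresses from the variables above and verifies that the regions are contiguous and that the mirrored systems have equal sizes; the struct and function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region {
    const char *name;
    uint32_t start;
    uint32_t end;  /* inclusive, as in the U-Boot variables */
};

/* Copied from the *saddr / *eaddr variables, sorted by address. */
static const struct region flash_map[] = {
    { "uboot",    0xfe000000u, 0xfe03ffffu },
    { "cfg1",     0xfe040000u, 0xfe05ffffu },
    { "cfg2",     0xfe060000u, 0xfe07ffffu },
    { "factkern", 0xfe080000u, 0xfe1bffffu },
    { "factfs",   0xfe1c0000u, 0xfe6fffffu },
    { "upgrkern", 0xfe700000u, 0xfe83ffffu },
    { "upgrfs",   0xfe840000u, 0xfed7ffffu },
    { "jffs",     0xfed80000u, 0xfefbffffu },
};
#define FLASH_MAP_COUNT (sizeof flash_map / sizeof flash_map[0])

/* Size in bytes of one region (end address is inclusive). */
static uint32_t region_size(const struct region *r)
{
    assert(r != NULL && r->end >= r->start);
    return r->end - r->start + 1u;
}

/* Non-zero when every region starts right after the previous one ends. */
static int map_is_contiguous(const struct region *m, size_t n)
{
    assert(m != NULL);
    for (size_t i = 1; i < n; i++) {
        if (m[i].start != m[i - 1].end + 1u)
            return 0;
    }
    return 1;
}
```

For this map the factory and upgrade kernels are both 0x140000 bytes (1.25 MB) and the two flash filesystems are both 0x540000 bytes (5.25 MB), which is what mirroring requires.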

"loaduboot=run ubootargs; tftp 100000 ${uboot}\0" \

"updateuboot=protect off ${ubootsaddr} ${ubooteaddr};" \

"era ${ubootsaddr} ${ubooteaddr};" \

"cp.b 100000 ${ubootsaddr} ${filesize};" \

"protect on ${ubootsaddr} ${ubooteaddr}\0" \

"burnuboot=run loaduboot;run updateuboot\0" \

"loadflashfs=run filesystemargs; tftp 100000 ${filesystem}\0" \

"updatefactflashfs=protect off ${factfssaddr} ${factfseaddr};" \

"era ${factfssaddr} ${factfseaddr};" \

"cp.b 100000 ${factfssaddr} ${filesize};" \

"protect on ${factfssaddr} ${factfseaddr}\0" \

"burnfactflashfs=run loadflashfs;run updatefactflashfs\0" \


"updateupgrflashfs=protect off ${upgrfssaddr} ${upgrfseaddr};"\

"era ${upgrfssaddr} ${upgrfseaddr};" \

"cp.b 100000 ${upgrfssaddr} ${filesize}\0" \

"burnupgrflashfs=run loadflashfs;run updateupgrflashfs\0" \

"loadkernel=run flashkernelargs; tftp 100000 ${flashkernel}\0" \

"updatefactkernel=protect off ${factkernsaddr} ${factkerneaddr};"\

"era ${factkernsaddr} ${factkerneaddr};" \

"cp.b 100000 ${factkernsaddr} ${filesize};" \

"protect on ${factkernsaddr} ${factkerneaddr}\0" \

"burnfactkernel=run loadkernel;run updatefactkernel\0" \

"updateupgrkernel=protect off ${upgrkernsaddr} ${upgrkerneaddr};"\

"era ${upgrkernsaddr} ${upgrkerneaddr};" \

"cp.b 100000 ${upgrkernsaddr} ${filesize}\0" \

"burnupgrkernel=run loadkernel;run updateupgrkernel\0" \

"burnfact=run burnuboot;run burnfactkernel;run burnfactflashfs\0"\

"burnupgr=run burnupgrkernel;run burnupgrflashfs\0" \

The next lines are the scripting to load the appropriate firmware files. For example, loaduboot sets the environment variables, fetches the file over tftp and stores it in memory. The update commands take a file from memory and burn it into flash, removing and restoring the flash protection as appropriate. Finally, the burn commands combine the two, fetching the file over tftp and burning it into flash. All the flash sections/firmware sections follow the same structure. At the bottom, we combine these again in burnfact and burnupgr to burn the entire factory firmware or the entire upgrade firmware from tftp into flash.

"factargs=setenv bootargs noinitrd quiet root=/dev/mtdblock4 ro\0" \

"upgrargs=setenv bootargs noinitrd quiet root=/dev/mtdblock6 rw\0" \

"runfact=run factargs addtty;bootm ${factkernsaddr} \0" \

"runupgr=run upgrargs addtty;bootm ${upgrkernsaddr} \0" \

"eraseconfig=protect off ${cfg1saddr} ${cfg2eaddr};" \

"era ${cfg1saddr} ${cfg2eaddr}\0" \

"erasejffs=era ${jffsfssaddr} ${jffsfseaddr}\0" \

The next batch of commands sets the scene for booting the board: factargs prepares the kernel arguments for booting from a flash file system (similar to what was first introduced for NFS, but simpler) and runfact boots the factory system. eraseconfig and erasejffs erase the persistent storage (for both U-Boot and the Linux based system).

We end with the boot magic:

"scriptcmd=dhcp; run dhcp_nfs;\0" \

"runsystem=if itest $kernel -eq 0;" \

"then run scriptcmd;" \

"else if itest $kernel -eq 1;" \

"then run runupgr;" \

"else if itest $kernel -eq 2;" \

"then run runfact;" \

"else if itest $kernel -eq 3;" \

"then setenv kernel 2; saveenv; run runupgr;" \

"else setenv kernel 2; saveenv; run runfact;" \

"fi;fi;fi;fi\0" \

"fdtaddr=400000\0" \

""

runsystem is the main startup command (as defined in the CONFIG_BOOTCOMMAND variable, see below). Previously, we hinted that we enabled the hush shell in the bootloader. By doing this, simple shell commands, variables and even control flow can be used. As mentioned before, we need a reliable way to upgrade firmware: this upgrade can include additional functionality, bug-fixes or overall performance improvements.

The selection of the system to boot is based on the kernel variable we defined earlier. kernel can contain the following valid values:

0 : Run the user defined script. In this case, this is scriptcmd.

1 : We know that we have a valid upgrade system. Always boot this system.

2 : We only trust the factory system. This might be because we are in an uncertain state (upgrading), because the upgrade system is corrupted, or because the factory system is the only system that is present on flash.

3 : During our previously booted session, the system was upgraded. However, since we have not yet run that system, it is not yet certain that this upgrade system is good. Therefore, at the end of the upgrade, signal the bootloader that we are in an uncertain state by setting the kernel value to 3.

With this knowledge, runsystem does the following, depending on the value of kernel. If it is 0, run the user command without any failsafe behaviour, assuming that the user knows what he or she is doing[5]. If the value of kernel is 1, we have a confirmed and working upgrade system: just run it. The same goes for a kernel value of 2: don’t perform any checks, just boot the factory system[6].

A kernel value of 3 indicates to the bootloader that the system is in an unconfirmed and somewhat uncertain state: the user has upgraded the running system and the new system that resides in flash has not been confirmed: it has not yet booted successfully and the included firmware was not yet able to verify itself. Therefore, we adjust kernel to 2 (the good ol’ trusted system), save the environment to flash (for the next reboot) and boot the upgrade system with runupgr. When the system is up and running and userspace was able to execute the self-check, it changes the kernel value from 2 to 1 (with the userspace tool fw_setenv).

Finally, the else branch tackles everything else: for some reason, kernel is set to a value that is not known to the system. In this case, only the factory system should be booted: the kernel variable is corrected (setting it to 2), saved, and the factory system is booted. Note that this is the branch that is executed during the first boot on a virgin production system, or on a system that defaults back to the hard-coded settings: the kernel variable is set to 4, which is not valid.
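The decision logic described above can be summarised as a small state table. The C model below is only a host-side sketch of what the text describes; the type and function names are hypothetical and nothing here is part of U-Boot.

```c
#include <assert.h>
#include <stddef.h>

enum boot_target { BOOT_SCRIPT, BOOT_UPGRADE, BOOT_FACTORY };

struct boot_decision {
    enum boot_target target;  /* what runsystem would run */
    int kernel_after;         /* value of kernel left in the environment */
    int saved;                /* non-zero when saveenv is executed */
};

/* Models runsystem for a given value of the kernel variable. */
static struct boot_decision runsystem_model(int kernel)
{
    /* Default is the final else branch: unknown value (e.g. the
     * compiled-in 4), so correct it to 2 and boot the factory system. */
    struct boot_decision d = { BOOT_FACTORY, 2, 1 };

    switch (kernel) {
    case 0:  /* user-defined script, no failsafe */
        d.target = BOOT_SCRIPT;  d.kernel_after = 0; d.saved = 0;
        break;
    case 1:  /* confirmed upgrade system */
        d.target = BOOT_UPGRADE; d.kernel_after = 1; d.saved = 0;
        break;
    case 2:  /* only the factory system is trusted */
        d.target = BOOT_FACTORY; d.kernel_after = 2; d.saved = 0;
        break;
    case 3:  /* unconfirmed upgrade: fall back to 2, then try it once */
        d.target = BOOT_UPGRADE; d.kernel_after = 2; d.saved = 1;
        break;
    default:
        break;
    }
    return d;
}
```

The important property is visible in case 3: the fallback value 2 is saved before the untested upgrade system is started, so a hang or crash during that boot leads to the factory system on the next reset.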

""

Finally, the extra settings are closed.

#define CONFIG_BOOTCOMMAND "getmac; setcy22150; run runsystem" \

" || protect off fe040000 fe07ffff" \

" && era fe040000 fe07ffff" \

" && era fed80000 fefbffff" \

" && saveenv" \

" && setenv bootargs noinitrd root=/dev/mtdblock1 ro console=ttyS0,115200" \

" && bootm fe080000 - fefc0000"

The default behaviour for U-Boot should be to run the command runsystem. If that fails (the environment is corrupted), we try to boot the factory system. Note that we cannot depend on variables here since we must assume that the environment cannot be trusted anymore; therefore we erase the environment sections.

[5] We wish!

[6] Obviously, the factory system should be tested thoroughly before being uploaded during production: not only would it cost a lot to upgrade all the boards, but since the factory system is the ultimate fallback in case something goes wrong, we must be able to really trust it.


4.8 References

• RedBoot: http://ecos.sourceware.org/redboot/

• U-Boot: http://u-boot.sf.net

• U-Boot chapter of DULG: http://www.denx.de/wiki/view/DULG/UBoot

• Barebox: http://barebox.org


Chapter 5

The Linux Kernel

5.1 Introduction

The Linux kernel is a Unix-like operating system kernel that was begun by Linus Torvalds in 1991 and subsequently developed with the assistance of developers worldwide.

The project was launched in 1991. At the time, the GNU project had created many of the components required for a free operating system, but its own kernel project, the GNU Hurd, was incomplete and unavailable. The BSD operating system had not yet freed itself from legal encumbrances. This left a space for the Linux kernel to fill, and despite the limited functionality of the early versions it rapidly accumulated developers and users. Early on, Minix hackers contributed code and ideas to the Linux kernel, and today it has received contributions from thousands of programmers.

5.2 Timeline

• April 1991 - Linus Torvalds, then 21, starts working on some simple ideas for an operating system. He starts with a task switcher in 386 assembly and a terminal driver.

• 25 August 1991 - Torvalds posts to comp.os.minix:

I’m doing a (free) operating system (just a hobby, won’t be big and professional like GNU) for 386(486) AT clones. This has been brewing since April, and is starting to get ready. I’d like any feedback on things people like/dislike in Minix, as my OS resembles it somewhat (same physical layout of the file-system (due to practical reasons) among other things).

I’ve currently ported bash(1.08) and gcc(1.40), and things seem to work. This implies that I’ll get something practical within a few months [...] Yes - it’s free of any minix code, and it has a multi-threaded fs. It is NOT portable (uses 386 task switching etc., and it probably never will support anything other than AT-harddisks, as that’s all I have :-(.

[...] It’s mostly in C, but most people wouldn’t call what I write C. It uses every conceivable feature of the 386 I could find, as it was also a project to teach me about the 386. As already mentioned, it uses a MMU, for both paging (not to disk yet) and segmentation. It’s the segmentation that makes it REALLY 386 dependent (every task has a 64Mb segment for code & data - max 64 tasks in 4Gb. Anybody who needs more than 64Mb/task - tough cookies). [...] Some of my "C"-files (specifically mm.c) are almost as much assembler as C. [...] Unlike minix, I also happen to LIKE interrupts, so interrupts are handled without trying to hide the reason behind them


• September 1991 - Linux version 0.01 is released. (10,239 lines of code)

• October 1991 - Linux version 0.02 is released.

• December 1991 - Linux 0.11 is released. This version is the first that is self-hosted (Linux 0.11 can be compiled under Linux 0.11)

• January 19, 1992 - First post to alt.os.linux newsgroup.

• March 31, 1992 - The newsgroup comp.os.linux is created.

• March 1992 - Linux version 0.95 is the first to be capable of running the X Window System.

• During the whole of 1993, and early 1994 - 15 development versions 0.99.*, with 0.99.11 (July 1993) introducing BogoMips into the kernel

• March 14, 1994 - Linux 1.0.0 is released. (176,250 lines of code)

• March 1995 - Linux 1.2.0 is released (310,950 lines of code)

• May 9, 1996 - Tux the penguin is suggested as mascot for Linux

• June 9, 1996 - Linux 2.0.0 is released. (777,956 lines of code.)

• January 25, 1999 - Linux 2.2.0 is released. (1,800,847 lines of code)

• December 18, 1999 - IBM mainframe patches for 2.2.13 published, bringing Linux into the biggest enterprises.

• January 4, 2001 - Linux 2.4.0 is released. (3,377,902 lines of code)

• December 17, 2003 - Linux 2.6.0 is released. (5,929,913 lines of code)

• July 22nd, 2011 - Linux 3.0 is released. (14,619,185 lines of code)

• July 21st, 2012 - Linux 3.5 is released. (15,596,464 lines of code)

5.3 Technical features

The Linux kernel supports true preemptive multitasking (both in user mode and kernel mode), virtual memory, shared libraries, demand loading, shared copy-on-write executables, memory management, TCP/IP networking, and threading.

5.3.1 Architecture

Linux is a monolithic kernel. Device drivers and kernel extensions run in kernel space (ring 0), with full access to the hardware, although some exceptions run in user space. The graphics subsystem (the X Window System) is not part of the kernel, is optional, and runs in user space, in contrast to Microsoft Windows.

Kernel mode preemption means device drivers can be preempted under certain conditions. This feature was added to handle hardware interrupts correctly and to improve support for symmetric multiprocessing. Preemption also improves latency and increases responsiveness, making Linux more suitable for real-time applications.

The fact that Linux is not a microkernel was the topic of a famous flame war between Linus Torvalds and Andy Tanenbaum on comp.os.minix in 1992. This subject was revisited in 2006.

Unlike traditional monolithic kernels, device drivers are easily configured as modules, and loaded or unloaded while running the system.


5.3.2 Programming Languages

The Linux kernel is written in C using GNU GCC extensions, together with a number of relatively short sections of code written in the assembly language (in GCC’s "AT&T-style" syntax) of the target architecture. Because of the extensions to C it supports, GCC was for a long time the only compiler capable of correctly building a Linux kernel. Recently, Intel claims to have modified its C compiler so that it is also capable of correctly compiling the kernel.

Many other languages are used in some way, primarily in connection with the kernel build process (the methods whereby the bootable image is created from the sources). These include Perl, Python, and various shell scripting languages. Some drivers may also be written in C++, Fortran, or other languages, but this is strongly discouraged. The kernel’s build system only officially supports GCC as a kernel and driver compiler.

5.3.3 Portability

While not originally designed to be portable, Linux is now one of the most widely ported operating system kernels, running on a diverse range of systems from the iPAQ (a handheld computer) to the IBM System z9 (a massive mainframe server that can run hundreds or even thousands of concurrent Linux instances), to the iPod (a portable mp3 player). Linux is intended to run as the main operating system on IBM’s new Blue Gene supercomputer architecture when it is finished.

It is important to note that Torvalds’ efforts were also directed successfully at a different sort of portability. Portability, according to Torvalds, was the ability to easily compile applications from a variety of sources on his system; thus Linux originally became popular in part because it required the least effort to get popular free software and other open source applications running.

5.3.4 Versions

Further developing his own code and integrating changes made by other programmers, Linus Torvalds keeps releasing new versions of the Linux kernel. These are called "vanilla" kernels, meaning they have not been modified by anyone. Many providers of GNU/Linux operating systems modify the kernels of their product, mainly in order to add support for drivers or features which have not officially been released as stable, while some distributions, such as Slackware, rely on vanilla kernels.

The kernel follows a quite rigid release schedule with new releases every 2-3 months.

• 2.6.35: August 1st 2010

• 2.6.36: October 20th, 2010

• 2.6.37: January 5th, 2011

• 2.6.38: March 15th, 2011

• 2.6.39: May 19th, 2011

• 3.0: July 22nd, 2011

• 3.1: October 24th, 2011

• 3.2: January 5th, 2012

• 3.3: March 22nd, 2012

• 3.4: May 21st, 2012

• 3.5: July 21st, 2012


New features are only added to the kernel during the first two weeks after a release (the merge window); after that, -rc1 is released. After release candidate 1, only bug fixes are allowed, and the stabilisation phase begins with a new release candidate every 1-2 weeks. This continues until the kernel is sufficiently stable (normally 6-7 release candidates).

The first two weeks after a release are very hectic, with the -rc1 patch typically around 1M lines (around 10,000 individual changes).

If serious bugs are found after a release, a 3.x.y kernel is released with the fixes.

5.3.5 Getting the Source

There are a number of ways to get the source code for the Linux kernel. Whereas in the pre-2.6 series it was customary to have important architecture-dependent development trees, this is much less the case with the current 3.x series. In fact, new embedded code gets picked up quickly in the main Linux kernel tree.

The place to be for the Linux kernel code is still http://www.kernel.org; it gives an overview of the development status of the different trees and the latest changes.

Figure 5.1: The kernel.org website

5.3.6 Source Tree

The kernel source tree is BIG! It is currently (3.5) around 10M lines of code (97% C, 3% ASM), or around 525 MB, so finding your way around it can seem a bit overwhelming at first. Luckily most of the code can be ignored as it isn’t applicable to embedded systems.

• 256M drivers

• 120M arch

• 33M fs

• 25M sound

• 24M include

• 22M net

• 21M Documentation

• 5.4M kernel

• 2.5M scripts

• 2.4M mm

• 2.0M crypto

• 2.0M lib

• 2.1M security

• 872K block

• 236K ipc

• 172K init

• 204K samples

• 40K usr

As can be seen, the biggest part of the kernel is by far device drivers. Most of those device drivers are for devices not available on your embedded system, so they can be ignored[1]. The arch/<cpu> and include/asm-<cpu> directories contain architecture-specific code, typically a few MBs per architecture (depending on the number of supported boards).

5.3.7 Tracking Development

Several methods exist for tracking development of the Linux kernel. You can subscribe to the kernel mailing list[2], but that is not recommended if you want to get other stuff done, as it is a very high bandwidth list (200-300 mails/day). Likewise, mailing lists exist for specific subsystems of the kernel (e.g. driver types, architectures, ...).

A more high-level view of the development can be obtained by reading the weekly kernel section in Linux Weekly News at http://lwn.net/Kernel or checking the LinuxChanges page on kernelnewbies at http://www.kernelnewbies.org/LinuxChanges.

5.4 Hands On - Build Kernel

Once the bootloader is up and running, we obviously want to run our own built kernel. When using the SheevaPlug, we can start from the work done by others, but when porting to a custom board, this is not the case. Unless you have a very ambitious project where you port the Linux kernel to a new processor, there are a lot of examples: there will probably be a reference design from which the current design is derived, so that is a good place to start.

Always make certain that the Linux kernel has been ported to a processor family before designing it in, to avoid unpleasant mistakes.

The central point to get a pristine kernel copy is of course http://www.kernel.org and the easiest way is to download the tarball from the public repository.

We now set off to download the latest version:

[mleeman@seraph ~]$ wget http://www.kernel.org/pub/linux/\
kernel/v3.x/linux-3.5.4.tar.bz2

[1] They can be very useful as examples though.

[2] http://vger.kernel.org/vger-lists.html#linux-kernel


And extract the tarball

[mleeman@seraph code]$ tar xfj linux-3.5.4.tar.bz2

If you are really adventurous, you can even get Linus’ own tree with git:

[marc@scorpius tmp]$ git clone git://git.kernel.org/\

pub/scm/linux/kernel/git/torvalds/linux-2.6.git

NOTE: The kernel tree is very large. Cloning it means downloading several hundred megabytes of data.

5.4.1 NFS - Network File System

As an exercise, an NFS bootable kernel will be created. Start by initialising the kernel configuration with the default configuration for the SheevaPlug that is in the upstream Linux kernel.

$ make ARCH=arm CROSS_COMPILE=arm-linux- kirkwood_defconfig

Next, open the configuration (as stored in .config) and verify the settings.

$ make ARCH=arm CROSS_COMPILE=arm-linux- menuconfig

Configuring the kernel for our target is not unlike configuring a kernel for a desktop machine; however, just as with the bootloader, we need to provide the correct architecture. As is the case with the U-Boot bootloader, the ARCH variable specifies which architecture to compile for and CROSS_COMPILE passes the cross compiler prefix to the kernel build system.

If you are developing on the same processor family as the target (a PowerPC host for ppc targets, or an x86 host for x86 targets), you can omit these variables, as you will (in most cases) be using the same compiler as the system, and there is no libc dependency as there is when compiling userspace applications. However, using these prefixes is a good habit, especially for embedded development: from a compilation point of view, you then no longer care much which architecture your development system has and what the target architecture is.

menuconfig presents us with an ncurses interface (alternatives are gconfig and xconfig). Since the ncurses interface is used by a lot of the projects that we use in our embedded design, menuconfig is the best option here (see Figure 5.2).

Going through all the configuration options of the Linux kernel is a challenge on its own, and it is left up to the reader to try this. Most options come with a good or decent help section explaining the effect of enabling or disabling a kernel configuration option.

The targets gconfig and xconfig provide a graphical configuration with the GTK+ and the Qt toolkits respectively (see Figure 5.3).

For our embedded purpose, there are a number of rules of thumb:

• Do not enable an option unless you know what it is doing and when and where you will use it. In general, this results in a kernel with a small footprint, saving you flash space and upgrade time.

• You should only build in the options that are needed for booting. Other drivers (and especially the ones for custom hardware) should be kept as modules. As such, they end up in the filesystem and can be loaded and unloaded at runtime. For development, it is for example useful to unload a module and load a slightly modified one without the need for rebooting the target.


Figure 5.2: make ARCH=arm menuconfig

5.4.2 Configuration

In order to get a running NFS kernel for our target, these are some of the more important options that should be enabled:

• Machine selection: Marvell Kirkwood → Marvell Sheevaplug Reference Board

• Enable loadable module support: Kernel modules are small pieces of compiled code which can be inserted in the running kernel, rather than being permanently built into the kernel. You use the "modprobe" tool to add (and sometimes remove) them. If you say Y here, many parts of the kernel can be built as modules (by answering M instead of Y where indicated): this is most useful for infrequently used options which are not required for booting. For more information, see the man pages for modprobe, lsmod, modinfo, insmod and rmmod.

• Networking and DHCP configuration support: Networking Support → Networking options → Packet socket, Packet socket: mmapped IO, Unix domain sockets, TCP/IP networking → IP: kernel level autoconfiguration → IP: DHCP support

• File systems → Network File Systems → NFS client support, Root file system on NFS: If you want your Linux box to mount its whole root file system (the one containing the directory /) from some other computer over the net via NFS (presumably because your box doesn’t have a hard disk), say Y. Read <file:Documentation/nfsroot.txt> for details. It is likely that in this case, you also want to say Y to "Kernel level IP autoconfiguration" so that your box can discover its network address at boot time.

• The settings for configuring the kernel to boot over NFS can be hardcoded in the kernel; a more generic and flexible approach is to pass them as kernel command line parameters from the bootloader to the kernel.

or wherever you plan to locate your NFS root filesystem.

Compile the kernel with:

[mleeman@zee linux-3.5.4]$ make ARCH=arm CROSS_COMPILE=arm-linux- uImage


Figure 5.3: make ARCH=arm gconfig

Copy the resulting U-Boot image (arch/arm/boot/uImage) to the tftpboot directory, with the correct name as defined in the U-Boot configuration.

Hit the reset button of the board, sit back, watch and enjoy. The kernel boots correctly from memory, gets the correct IP address and mounts the root filesystem over NFS.

You can override a number of variables, and when a configurable DHCP server is available, it can make things easy for you, e.g. by supplying the bootfile.

In /etc/dhcp/dhcpd.conf:

host sheeva01 {

hardware ethernet 00:04:a5:03:05:fb;

fixed-address 172.2.1.1;

option root-path "/home/barco/mleeman/nfs";

filename "/home/services/tftpboot/kernel.mleeman";

}

In this case, DHCP is used to pass the name of the kernel image to the bootloader (filename) and the NFS root filesystem to the kernel (root-path). For the latter to work, we need DHCP support in the kernel and must pass ip=dhcp from the bootloader to the kernel.

Or, when an unconfigured DHCP server is available, it can still be used to get a unique IP address:

Marvell>> setenv serverip 172.0.0.1

Marvell>> setenv bootfile /mleeman/uImage

Marvell>> setenv bootargs 'console=ttyS0,115200 root=/dev/nfs \

nfsroot=172.0.0.1:/home/services/nfs/mleeman/,tcp ip=dhcp'

Marvell>> dhcp

Marvell>> bootm


Furthermore, configure the NFS root directory in the kernel command line and verify the bootargs and bootcmd environment variables in U-Boot; remember that the boot command executes bootcmd.

5.4.3 Upgrading a Kernel

About every 3 months, a new kernel is released, containing bugfixes, general improvements, new features and support for new hardware. Once a kernel has been tailored, the .config file can be re-used to initialise the configuration of the new kernel.

[mleeman@zee linux-3.5.4]$ make ARCH=arm \
CROSS_COMPILE=arm-linux- oldconfig

The target oldconfig uses the values stored in the .config file and tries to match them to the ones that are being used in the new kernel configuration. For Kconfig values for which it does not find a match, the user is prompted. The target silentoldconfig does the same, but is more terse in its output.

As a result, the old .config is replaced with a matching one for the kernel in which the oldconfig target was called[3].

If menuconfig is run instead, the unmatched values are filled in with their default values, and this is not always preferable.

5.5 Device Tree (Powerpc, Microblaze, only for now)

During the recent development of the Linux/ppc64 kernel, and more specifically the addition of new platform types outside of the old IBM pSeries/iSeries pair, it was decided to enforce some strict rules regarding the kernel entry and the bootloader <-> kernel interfaces, in order to avoid the degeneration that the ppc32 kernel entry point, and the way a new platform had to be added to the kernel, had become.

Since 2006, the powerpc 32-bit and 64-bit architectures have been merged and reworked in the kernel. The result of this is the arch/powerpc/ architecture.

Since the powerpc architectures have always been very popular as embedded processors, a lot of kernel configuration options were nothing more than changed addresses and interrupts. Instead of recompiling the kernel for these changes, the kernel can load this configuration data and configure the drivers as indicated.

The device tree reflects this hardware description. The bootloader (U-Boot) uses the device tree to initialise the correct data and passes it to the kernel, after which the kernel extracts the information required to boot the system.

Work is in progress to also use device trees on other architectures like ARM, but for now they are only used on a large scale on Powerpc and Microblaze (and Sparc).

In form, the device tree source (dts) is a hierarchical text structure, found in arch/powerpc/boot/dts/.

cpus {
    #address-cells = <1>;
    #size-cells = <0>;

    PowerPC,8349@0 {
        device_type = "cpu";
        reg = <0>;
        d-cache-line-size = <32>;
        i-cache-line-size = <32>;
        d-cache-size = <32768>;
        i-cache-size = <32768>;
        timebase-frequency = <0>; // from bootloader
        bus-frequency = <0>; // from bootloader
        clock-frequency = <0>; // from bootloader
    };
};

[3] Back up the original file.

This snippet from the arch/powerpc/boot/dts/mpc8349emitx.dts file in the 3.5.4 kernel shows that a number of settings are fixed in the tree (e.g. the cache line size), while a number of other critical settings are filled in by the bootloader, as indicated by the placeholder comments (e.g. bus-frequency).

The dts file further contains information about the system on chip (SoC) structure of many embedded powerpc processors (e.g. I2C, USB, Ethernet MACs, ...) and their peripherals (e.g. PCI devices).

There are two major ways a powerpc kernel can be booted from U-Boot.

1. separate device tree blob (dtb) file

2. combined kernel/device tree blob file

Both approaches have their advantages and disadvantages. While the first approach offers better modularity and flexibility, it requires a bit more flash (typically one sector to be safe). For the first approach, the kernel is built as a regular uImage:

[mleeman@zee linux-3.5.4]$ make ARCH=powerpc \
    CROSS_COMPILE=powerpc-linux-uclibc- uImage

For the creation of the image, the mkimage tool needs to be in the PATH. It is provided with the source of the U-Boot bootloader in tools/mkimage.c.

The device tree is built with (it requires the installation of a recent version of device-tree-compiler):

dtc -I dts -O dtb -S 0x3000 -R 8 barco8347svc2.dts -o barco8347svc2.dtb

The command line specifies the input format (-I dts), the output format (-O dtb), a minimum size for the blob (-S, which leaves extra space so that the bootloader can create extra device nodes) and the number of memory reserve map entries to make room for (-R).

Another option is to build a kernel image that has the device tree wrapped into it. For this, the cuImage target is provided in the kernel build:

[mleeman@zee linux-3.5.4]$ make ARCH=powerpc \
    CROSS_COMPILE=powerpc-linux-uclibc- cuImage

While the first approach of a separate dtb and kernel is preferred by many, the second can provide a solution for platforms that are ported from the ppc architecture and where it is not allowed to modify the flash map to include the dtb partitions.

Chapter 4 shows how a dts file is used and passed to the kernel.

bootm ${factkernsaddr} - ${factdtbsaddr}

While booting a classic kernel needs one or two addresses (the kernel and, optionally, an initrd), booting with a device tree blob requires an extra parameter:

factkernsaddr the location of the kernel image in flash or in memory.

- the location of the initrd file in flash or in memory (which we do not use here).

factdtbsaddr the location of the device tree blob file.


5.5.1 Flash Mapping in the Device Tree

Later in this chapter, writing a flash map driver will be explained. Even though this is still valid for PowerPC architectures, as well as other architectures, the use of a device tree allows a cleaner and more user-friendly way to define these maps.

The following shows the example of a flash map of a Barco board:

/* Flash map */
flash@fe000000 {
    device_type = "rom";
    compatible = "amd,s29gl128n", "cfi-flash";
    reg = <0xfe000000 0x01000000>;
    bank-width = <2>;

    partition@0 {
        label = "U-Boot";
        reg = <0x00000000 0x00040000>;
    };
    partition@40000 {
        label = "Config Space 1";
        reg = <0x00040000 0x00020000>;
    };
    partition@60000 {
        label = "Config Space 2";
        reg = <0x00060000 0x00020000>;
    };
    partition@80000 {
        label = "Factory Kernel";
        reg = <0x00080000 0x00140000>;
    };
    partition@1c0000 {
        label = "Factory Filesystem";
        reg = <0x001c0000 0x00540000>;
    };
    partition@700000 {
        label = "Upgrade Kernel";
        reg = <0x00700000 0x00140000>;
    };
    partition@840000 {
        label = "Upgrade Filesystem";
        reg = <0x00840000 0x00540000>;
    };
    partition@d80000 {
        label = "JFFS2";
        reg = <0x00d80000 0x00240000>;
    };
    partition@fc0000 {
        label = "Factory DTB";
        reg = <0x00fc0000 0x00020000>;
    };
    partition@fe0000 {
        label = "Upgrade DTB";
        reg = <0x00fe0000 0x00020000>;
    };
};


The format is almost self-explanatory: each partition is given a label, and its reg property holds the start address of the partition (as an offset from the start of the flash, i.e. starting from 0x0) followed by the length of the partition.
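Because every partition is just an (offset, length) pair, a map like this can be sanity-checked before it ends up in flash. The following is a minimal user-space sketch, not kernel code: the names struct flash_part, barco_map and flash_map_is_contiguous are invented for this illustration. It checks that the partitions follow each other without gaps or overlaps and end exactly at the flash size, using the values of the map above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one partition node: label plus reg = <offset size>. */
struct flash_part {
    const char *label;
    uint32_t offset;
    uint32_t size;
};

/* Partition entries copied from the flash@fe000000 node shown above. */
static const struct flash_part barco_map[] = {
    { "U-Boot",             0x00000000, 0x00040000 },
    { "Config Space 1",     0x00040000, 0x00020000 },
    { "Config Space 2",     0x00060000, 0x00020000 },
    { "Factory Kernel",     0x00080000, 0x00140000 },
    { "Factory Filesystem", 0x001c0000, 0x00540000 },
    { "Upgrade Kernel",     0x00700000, 0x00140000 },
    { "Upgrade Filesystem", 0x00840000, 0x00540000 },
    { "JFFS2",              0x00d80000, 0x00240000 },
    { "Factory DTB",        0x00fc0000, 0x00020000 },
    { "Upgrade DTB",        0x00fe0000, 0x00020000 },
};

#define BARCO_MAP_COUNT (sizeof(barco_map) / sizeof(barco_map[0]))

/*
 * Return 1 if the partitions start at offset 0, follow each other
 * without gaps or overlaps, and end exactly at flash_size; 0 otherwise.
 */
static int flash_map_is_contiguous(const struct flash_part *parts,
                                   size_t count, uint64_t flash_size)
{
    uint64_t next = 0; /* offset at which the next partition must start */
    size_t i;

    assert(parts != NULL);
    for (i = 0; i < count; i++) {
        if (parts[i].offset != next)
            return 0;
        next += parts[i].size;
    }
    return next == flash_size;
}
```

Calling flash_map_is_contiguous(barco_map, BARCO_MAP_COUNT, 0x01000000) with the size taken from the reg property of the flash node accepts this map; dropping the last partition or passing a different flash size makes the check fail.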

5.5.2 What if Something Goes Wrong

If your design is or should be close to a reference design, chances are that somewhere along the line some change or error got into the hardware. In any case, the kernel should be able to give sufficient feedback to circumvent the problem.

5.5.2.1 What Is The Kernel Symbol Table?

The kernel doesn’t use symbol names. It’s much happier knowing a variable or function name by the variable or function’s address. Rather than using size_t BytesRead, the kernel prefers to refer to this variable as (for example) 0xc0343f20.

Humans, on the other hand, do not appreciate names like 0xc0343f20. We prefer to use something like size_t BytesRead. Normally, this doesn’t present much of a problem. The kernel is mainly written in C, so the compiler/linker allows us to use symbol names when we code and allows the kernel to use addresses when it runs. Everyone is happy.

There are situations, however, where we need to know the address of a symbol (or the symbol for an address). This is done with a symbol table, and is very similar to how gdb can give you the function name from an address (or an address from a function name). A symbol table is a listing of all symbols along with their addresses. Here is an example of a symbol table:

...
00000000c000ef74 T __div64_32
00000000c000f010 T mpc83xx_restart
00000000c000f0a0 T mpc83xx_power_off
00000000c000f0b0 T mpc83xx_halt
00000000c000f0c0 T ppc_sys_device_remove
00000000c000f15c T platform_notify_map
00000000c000f218 T ppc_sys_device_initfunc
00000000c000f2d0 T ppc_sys_device_setfunc
00000000c000f3d0 T ppc_sys_device_disable
00000000c000f45c T ppc_sys_device_enable
...

5.5.2.2 What Is An Oops?

What is the most common bug in your home-brewed programs? The segfault. Good ol’ signal 11. What is the most common bug in the Linux kernel? The segfault. Except here, the notion of a segfault is much more complicated and can be, as you can imagine, much more serious. When the kernel dereferences an invalid pointer, it’s not called a segfault, it’s called an "oops". An oops indicates a kernel bug and should always be reported and fixed.

Note that an oops is not the same thing as a segfault. Your program cannot recover from a segfault. The kernel doesn’t necessarily have to be in an unstable state when an oops occurs. The Linux kernel is very robust; the oops may just kill the current process and leave the rest of the kernel in a good, solid state.

An oops is not a kernel panic. In a panic, the kernel cannot continue; the system grinds to a halt and must be restarted. An oops may cause a panic if a vital part of the system is destroyed. An oops in a device driver, for example, will almost never cause a panic.

When an oops occurs, the system will print out information that is relevant to debugging the problem, like the contents of all the CPU registers, and the location of page descriptor tables. In particular, the contents of the instruction pointer are printed (EIP on x86; a PowerPC kernel prints NIP instead). Like this:

EIP: 0010:[<00000000>]
Call Trace: [<c010b860>]


5.5.2.3 What Does An Oops Have To Do With System.map?

You can agree that the information given in EIP and Call Trace is not very informative. But more importantly, it’s really not informative to a kernel developer either. Since a symbol doesn’t have a fixed address, c010b860 can point anywhere.

To help us use this cryptic oops output, Linux uses a daemon called klogd, the kernel logging daemon. klogd intercepts kernel oopses and logs them with syslogd, replacing some of the useless information like c010b860 with information that humans can use. In other words, klogd is a kernel message logger which can perform name-address resolution. Once klogd transforms the kernel message, it uses whatever logger is in place to log system wide messages, usually syslogd.

To perform name-address resolution, klogd uses System.map. Now you know what an oops has to do with System.map.

Fine print: there are actually two types of address resolution performed by klogd.

• Static translation, which uses the System.map file.

• Dynamic translation, which is used with loadable modules, doesn’t use System.map and is therefore not relevant to this discussion, but I’ll describe it briefly anyhow. Suppose you load a kernel module which generates an oops. An oops message is generated, and klogd intercepts it. It is found that the oops occurred at d00cf810. Since this address belongs to a dynamically loaded module, it has no entry in the System.map file. klogd will search for it, find nothing, and conclude that a loadable module must have generated the oops. klogd then queries the kernel for symbols that were exported by loadable modules. Even if the module author didn’t export his symbols, at the very least, klogd will know what module generated the oops, which is better than knowing nothing about the oops at all.

On an embedded system you will typically not have klogd; instead, you’ll have to rely on dmesg and the serial interface. It is then up to the developer to look in System.map for the symbol address that directly precedes the address in the EIP, in order to find the location where the oops occurred.
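The manual lookup described here amounts to finding the last System.map entry whose address is not greater than the faulting address. The following user-space sketch illustrates that search on the ten symbols of the example table shown earlier; struct ksym, system_map and ksym_lookup are names invented for this illustration, and a real tool would first parse the complete System.map file into such a sorted array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One System.map line: address and symbol name (the type letter is dropped). */
struct ksym {
    uint32_t addr;
    const char *name;
};

/* Excerpt of the example symbol table, sorted by ascending address. */
static const struct ksym system_map[] = {
    { 0xc000ef74, "__div64_32" },
    { 0xc000f010, "mpc83xx_restart" },
    { 0xc000f0a0, "mpc83xx_power_off" },
    { 0xc000f0b0, "mpc83xx_halt" },
    { 0xc000f0c0, "ppc_sys_device_remove" },
    { 0xc000f15c, "platform_notify_map" },
    { 0xc000f218, "ppc_sys_device_initfunc" },
    { 0xc000f2d0, "ppc_sys_device_setfunc" },
    { 0xc000f3d0, "ppc_sys_device_disable" },
    { 0xc000f45c, "ppc_sys_device_enable" },
};

#define SYSTEM_MAP_COUNT (sizeof(system_map) / sizeof(system_map[0]))

/*
 * Binary search for the last symbol with addr <= pc.
 * Returns NULL when pc lies below the first symbol in the table.
 */
static const struct ksym *ksym_lookup(const struct ksym *map, size_t count,
                                      uint32_t pc)
{
    size_t lo = 0, hi = count; /* search window is [lo, hi) */

    assert(map != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (map[mid].addr <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo now equals the number of symbols with addr <= pc */
    return lo ? &map[lo - 1] : NULL;
}
```

For a (made-up) fault address of 0xc000f0a4 the lookup returns the mpc83xx_power_off entry, and the difference between the two addresses is the offset into that function. Note that any address beyond the last entry is attributed to the last symbol, so a result near the end of the table deserves some suspicion.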

5.6 Device Drivers

Linux kernel drivers are a huge topic that cannot be covered in its entirety here. This chapter instead aims to provide an overview of aspects of device driver programming common to embedded systems. For more details, please see the (free) book Linux Device Drivers [4].

5.6.1 Introduction

Many modern operating systems have a method for installing special files to make hardware work. Linux device drivers work through special kernel code that directly accesses the hardware. To make the services that the card or other device offers available to normal user programs, the kernel uses the special files in /dev.

One end of the file in /dev can be opened normally and the other end is attached to the kernel. That is of course an oversimplification, but I think you get the general idea: hardware, kernel, special file, user program and the same path back from user program to hardware. There are two forms of the kernel portion of this equation: compiled-in drivers that are coded in permanently when the kernel is built, and modules.

Another much-trumpeted advantage of Linux is that it does not need to be rebooted as often as other operating systems. You might think that this is due to its rock solid stability. You may think I am now going to talk about the quality of the device drivers. But you’d be wrong. The reason that Linux device drivers lead to less rebooting is that we can reconfigure, load or unload them without restarting the system.

To do this, modular kernel drivers are used.

[4] http://lwn.net/Kernel/LDD3/


5.6.1.1 How to load a module

Most people configure their modules at install time and then leave them alone. All the major distributions have taken to modules because of another advantage they have: size. Distribution makers want to support all the possible cards and devices that Linux can. If we compiled all these into the kernel it would be huge. If several different static kernels for different devices were supplied then they would take up too much space, as well. With the modular system, distribution makers supply a stripped down kernel plus a comprehensive set of device drivers. This typically only occupies two or three floppy disks in total.

modprobe, lsmod and insmod

If you want to load a module after system setup time, then the easiest way is as follows:

[mleeman@seraph ~]$ sudo modprobe xfs

This example loads the XFS subsystem driver with the modprobe command. If the module takes parameters, like IRQ numbers, then you can specify them with modprobe too.

To see what modules are loaded and to see information on how they depend on each other we use lsmod. Here is some example output from lsmod.

[mleeman@seraph ~]$ lsmod
Module                  Size  Used by
xfs                   639312  0
nfs                   340672  1
nfsd                  355080  13
exportfs               25592  2 xfs,nfsd
nfs_acl                22056  2 nfs,nfsd
lockd                 112280  3 nfs,nfsd
sunrpc                245576  11 nfs,nfsd,nfs_acl,lockd
autofs4                48240  2

In this example the nfs kernel device driver depends on the sunrpc device driver.

In the normal course of events the modules we asked for when Linux was installed are loaded at boot time. To achieve this the file /etc/modules is used. This is a list of modules to be loaded. A more flexible way to load modules is a hotplug system such as udev (the successor of the older devfs), which probes the hardware at boot time and loads the appropriate modules.

5.6.1.2 Choosing the device type

Block drivers

A block device is something that can host a filesystem, such as a disk. A block device can only be accessed in multiples of a block, where a block is usually one kilobyte of data.

Character drivers

A character device is one that can be accessed like a file, and a char driver is in charge of implementing this behaviour. This driver implements the open, close, read and write system calls. The console and parallel ports are examples of char devices.

5.6.2 Busses

Devices are connected to the main CPU through a bus. Linux supports a plethora of busses (PCI, ISA, USB, FireWire, I2C, SPI, W1, SCSI, ...), but because of the new common driver model in the 2.6 kernel the general kernel interface of most busses is pretty similar:

• Device drivers register themselves with the driver core and describe what kind of devices they can handle (vendor/device IDs for USB or PCI, ...).

• When a new device is detected the driver core will call the probe() method of the driver, which then sets up the hardware.

• If the device disappears again (e.g. USB hotplug) the remove() method of the driver is called.


5.6.2.1 Platform Bus

The platform bus is a common logical bus for all devices on simple “un-intelligent” busses like the old ISA bus or the internal busses of SoCs. Because those busses are so simple, there isn’t a lot for the kernel to do, and hence the support is pretty minimal.

The simple busses don’t have a mechanism for plug-n-play like more complicated busses like PCI and USB, so the driver core cannot automatically detect what devices are present. Board specific code instead registers data structures (struct platform_device [5]) with information about the type of device and any resources needed to use it:

struct platform_device {
    const char *name;
    u32 id;
    struct device dev;
    u32 num_resources;
    struct resource *resource;
};

Resources are a generic way of providing platform details such as base address and interrupt numbers to the driver.

Likewise, the platform devices do not have vendor/device IDs like PCI and USB. Instead the .name member is simply matched. Every platform driver has a unique name, and its probe() function will get called for every platform device registered with the same name.

Below you see the code needed to set up the driver for the SMSC 9117 Ethernet chip on the thinLITE board [6]:

static struct resource smc911x_resources[] = {
    [0] = {
        .start = 0x8e000000,
        .end = 0x8e0000ff,
        .flags = IORESOURCE_MEM,
    },
    [1] = {
        .start = 4,
        .end = 4,
        .flags = IORESOURCE_IRQ,
    },
};

static struct platform_device thinlite_eth = {
    .name = "smc911x",
    .id = 0,
    .num_resources = ARRAY_SIZE(smc911x_resources),
    .resource = smc911x_resources,
};

The smc911x driver has a corresponding struct platform_driver [7]:

static struct platform_driver smc911x_driver = {
    .probe = smc911x_drv_probe,
    .remove = smc911x_drv_remove,
    .driver = {
        .name = "smc911x",
    }
};

[5] include/linux/platform_device.h
[6] arch/ppc/platforms/4xx/barco_thinlite.c
[7] drivers/net/smc911x.c
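The name matching done by the platform bus can be illustrated outside the kernel. The sketch below is a deliberately simplified user-space model and not the kernel API: the mock_ structures and functions are invented for this illustration and only keep the fields needed to show that probe() is called once for every registered device whose name equals the driver name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures, not the real definitions. */
struct mock_platform_device {
    const char *name;
    int id;
    int bound; /* set to 1 once a driver has claimed the device */
};

struct mock_platform_driver {
    const char *name;
    int (*probe)(struct mock_platform_device *pdev);
};

/*
 * Call drv->probe() for every unbound device with the same name as the
 * driver. Returns the number of devices for which probe() returned 0.
 */
static int mock_bind_driver(const struct mock_platform_driver *drv,
                            struct mock_platform_device *devs, size_t count)
{
    int bound = 0;
    size_t i;

    assert(drv != NULL && devs != NULL);
    for (i = 0; i < count; i++) {
        if (devs[i].bound || strcmp(devs[i].name, drv->name) != 0)
            continue;
        if (drv->probe(&devs[i]) == 0) {
            devs[i].bound = 1;
            bound++;
        }
    }
    return bound;
}

/* Example probe(): a real one would map the memory resource and request the IRQ. */
static int mock_smc911x_probe(struct mock_platform_device *pdev)
{
    (void)pdev;
    return 0;
}
```

With two devices named "smc911x" and one device with another name registered, binding a driver named "smc911x" calls its probe function twice and leaves the third device untouched.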

5.6.2.2 PCI Bus

The PCI specification covers most issues related to computer interfaces. This section will explain how a PCI driver can find its hardware and gain access to it [8].

The PCI architecture was designed as a replacement for the ISA standard, with three main goals:

• to get better performance when transferring data between the computer and its peripherals

• to be as platform independent as possible

• to simplify adding and removing peripherals to the system

The PCI bus achieves better performance by using a higher clock rate than ISA; its clock runs at 25 or 33 MHz; 66 MHz and even 133 MHz implementations have been deployed as well. Moreover, it is equipped with a 32-bit or 64-bit data bus. Platform independence is often a goal in the design of a computer bus: PCI is currently used extensively on ia-32, Alpha, PowerPC, Sparc64 and ia-64 systems.

What is most relevant to the kernel developer is the PCI support for auto detection of interface boards: PCI devices are jumperless and are automatically configured at boot time. Then, the device driver must be able to access configuration information in the device in order to complete initialisation. This happens without the need to perform any probing.

In the remainder of this chapter, we will explain a driver that is used to manipulate and communicate with a TI DSP. The DSP itself communicates with the processor over PCI. This is elaborated next. After the PCI part, we will go into detail on the char device access that is used for finer granularity access and data transfer.

PCI Addressing

Each PCI peripheral is identified by a bus number, a device number, and a function number. The PCI specification permits a single system to host up to 256 buses, but because 256 buses are not sufficient for many large systems, Linux now supports PCI domains. Each PCI domain can host up to 256 buses. Each bus hosts up to 32 devices, and each device can be a multi-function board with a maximum of eight functions. Therefore, each function can be identified at hardware level by a 16-bit address, or key. Device drivers written for Linux, though, don’t need to deal with those binary addresses, because they use a specific data structure, called pci_dev, to act on the devices.

Most recent workstations feature at least two PCI buses. Plugging more than one bus in a single system is accomplished by means of bridges, special-purpose PCI peripherals whose task is joining two buses. The overall layout of a PCI system is a tree where each bus is connected to an upper-layer bus, up to bus 0 at the root of the tree. The CardBus PC-card system is also connected to the PCI system via bridges. A typical PCI system is represented in Figure 5.4, where the various bridges are highlighted.

[8] The following text contains important extracts from Linux Device Drivers, adjusted and expanded to match Barco specific examples.


Figure 5.4: Layout of a typical PCI System.

The 16-bit hardware addresses associated with PCI peripherals, although mostly hidden in the struct pci_dev object, are still visible occasionally, especially when lists of devices are being used. One such situation is the output of lspci (part of the pciutils package, available with most distributions) and the layout of information in /proc/pci and /proc/bus/pci. The sysfs representation of PCI devices also shows this addressing scheme, with the addition of the PCI domain information [9]. When the hardware address is displayed, it can be shown as two values (an 8-bit bus number and an 8-bit device and function number), as three values (bus, device, and function), or as four values (domain, bus, device, and function); all the values are usually displayed in hexadecimal.

For example, /proc/bus/pci/devices uses a single 16-bit field (to ease parsing and sorting), while /proc/bus/busnumber splits the address into three fields. This is the listing of an NSLU2 device; on general purpose machines many more devices would typically appear.

[marc@chiana ~]$ lspci
00:01.0 USB Controller: NEC Corporation USB (rev 43)
00:01.1 USB Controller: NEC Corporation USB (rev 43)
00:01.2 USB Controller: NEC Corporation USB 2.0 (rev 04)

[marc@chiana ~]$ cat /proc/bus/pci/devices | cut -f1
0008
0009
000a

[marc@chiana ~]$ tree /sys/bus/pci/devices/
/sys/bus/pci/devices/
|-- 0000:00:01.0 -> ../../../devices/pci0000:00/0000:00:01.0
|-- 0000:00:01.1 -> ../../../devices/pci0000:00/0000:00:01.1
`-- 0000:00:01.2 -> ../../../devices/pci0000:00/0000:00:01.2

3 directories, 0 files

Taking the third USB controller as an example, 0x000a corresponds to 0000:00:01.2: the 16-bit value splits into bus (8 bits), device (5 bits) and function (3 bits), and sysfs puts the domain (16 bits) in front of it.
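The split of the 16-bit key can be written down as a few shifts and masks. The following user-space sketch decodes the values from the listing above; struct pci_addr, pci_addr_decode and pci_addr_format are names invented for this illustration and are not part of any kernel or pciutils interface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded form of the 16-bit bus/device/function key. */
struct pci_addr {
    unsigned int bus; /* 8 bits */
    unsigned int dev; /* 5 bits */
    unsigned int fn;  /* 3 bits */
};

/* Split a 16-bit key as found in /proc/bus/pci/devices. */
static struct pci_addr pci_addr_decode(uint16_t key)
{
    struct pci_addr a;

    a.bus = (key >> 8) & 0xff;
    a.dev = (key >> 3) & 0x1f;
    a.fn = key & 0x07;
    return a;
}

/*
 * Format the address as domain:bus:device.function, the notation used
 * in sysfs. Returns the value returned by snprintf().
 */
static int pci_addr_format(char *buf, size_t len, unsigned int domain,
                           struct pci_addr a)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%04x:%02x:%02x.%x",
                    domain, a.bus, a.dev, a.fn);
}
```

Decoding 0x0008, 0x0009 and 0x000a this way yields bus 0, device 1 and functions 0, 1 and 2, which matches the three lspci entries 00:01.0, 00:01.1 and 00:01.2.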

The hardware circuitry of each peripheral board answers queries pertaining to three address spaces: memory locations, I/O ports, and configuration registers. The first two address spaces are shared by all the devices on the same PCI bus (i.e., when you access a memory location, all the devices on that PCI bus see the bus cycle at the same time). The configuration space, on the other hand, exploits geographical addressing. Configuration queries address only one slot at a time, so they never collide.

[9] Some architectures also display the PCI domain information in the /proc/pci and /proc/bus/pci files.

As far as the driver is concerned, memory and I/O regions are accessed in the usual ways via inb, readb, and so forth. Configuration transactions, on the other hand, are performed by calling specific kernel functions to access configuration registers. With regard to interrupts, every PCI slot has four interrupt pins, and each device function can use one of them without being concerned about how those pins are routed to the CPU. Such routing is the responsibility of the computer platform and is implemented outside of the PCI bus. Since the PCI specification requires interrupt lines to be shareable, even a processor with a limited number of IRQ lines, such as the x86, can host many PCI interface boards (each with four interrupt pins).

The I/O space in a PCI bus uses a 32-bit address bus (leading to 4 GB of I/O ports), while the memory space can be accessed with either 32-bit or 64-bit addresses. 64-bit addresses are available on more recent platforms. Addresses are supposed to be unique to one device, but software may erroneously configure two devices to the same address, making it impossible to access either one. But this problem never occurs unless a driver is willingly playing with registers it shouldn’t touch. The good news is that every memory and I/O address region offered by the interface board can be remapped by means of configuration transactions. That is, the firmware initialises PCI hardware at system boot, mapping each region to a different address to avoid collisions. The addresses to which these regions are currently mapped can be read from the configuration space, so the Linux driver can access its devices without probing. After reading the configuration registers, the driver can safely access its hardware.

The PCI configuration space consists of 256 bytes for each device function (except for PCI Express devices, which have 4 KB of configuration space for each function), and the layout of the configuration registers is standardised. Four bytes of the configuration space hold a unique function ID, so the driver can identify its device by looking for the specific ID for that peripheral. In summary, each device board is geographically addressed to retrieve its configuration registers; the information in those registers can then be used to perform normal I/O access, without the need for further geographic addressing.

Boot Time

To see how PCI works, we start from system boot, since that’s when the devices are configured. When power is applied to a PCI device, the hardware remains inactive. In other words, the device responds only to configuration transactions. At power on, the device has no memory and no I/O ports mapped in the computer’s address space; every other device-specific feature, such as interrupt reporting, is disabled as well.

It is the task of the motherboard firmware (BIOS or U-Boot) to access the device configuration address space by reading and writing registers in the PCI controller. At system boot, the bootloader, or the Linux kernel, performs configuration transactions with every PCI peripheral in order to allocate a safe place for each address region it offers. By the time a device driver accesses the device, its memory and I/O regions have already been mapped into the processor’s address space. The driver can change this default assignment, but it never needs to do that.

[mleeman@chiana ~]$ dmesg | grep PCI
PCI: IXP4xx is host
PCI: IXP4xx Using direct access for memory space
PCI: bus0: Fast back to back transfers disabled
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
PCI: enabling device 0000:00:01.0 (0140 -> 0142)

As suggested, the user can look at the PCI device list and the devices’ configuration registers by reading /proc/bus/pci/devices and /proc/bus/pci/*/*. The former is a text file with (hexadecimal) device information, and the latter are binary files that report a snapshot of the configuration registers of each device, one file per device. The individual PCI device directories in the sysfs tree can be found in /sys/bus/pci/devices. A PCI device directory contains a number of different files:


[marc@chiana ~]$ tree /sys/bus/pci/devices/0000\:00\:01.1
/sys/bus/pci/devices/0000:00:01.1
|-- broken_parity_status
|-- bus -> ../../../bus/pci
|-- class
|-- config
|-- device
|-- driver -> ../../../bus/pci/drivers/ohci_hcd
|-- enable
|-- irq
|-- local_cpus
|-- modalias
|-- pools
|-- resource
|-- resource0
|-- subsystem -> ../../../bus/pci
|-- subsystem_device
|-- subsystem_vendor
|-- uevent
|-- usb2
<...>
|-- usb_host:usb_host2 -> ../../../class/usb_host/usb_host2
`-- vendor

23 directories, 64 files

The file config is a binary file that allows the raw PCI config information to be read from the device (just like /proc/bus/pci/*/* provides). The files vendor, device, subsystem_device, subsystem_vendor, and class all refer to the specific values of this PCI device (all PCI devices provide this information). The file irq shows the current IRQ assigned to this PCI device, and the file resource shows the current memory resources allocated by this device.

Configuration Registers and Initialisation

All PCI devices feature at least a 256-byte address space. The first 64 bytes are standardised, while the rest are device dependent. Figure 5.5 shows the layout of the device independent configuration space.

Figure 5.5: The standardised PCI configuration registers.


As the figure shows, some of the PCI configuration registers are required and some are optional. Every PCI device must contain meaningful values in the required registers, whereas the contents of the optional registers depend on the actual capabilities of the peripheral. The optional fields are not used unless the contents of the required fields indicate that they are valid. Thus, the required fields assert the board’s capabilities, including whether the other fields are usable.

It’s interesting to note that the PCI registers are always little-endian. Although the standard is designed to be architecture independent, the PCI designers sometimes show a slight bias toward the PC environment. The driver writer should be careful about byte ordering when accessing multibyte configuration registers; code that works on the PC might not work on other platforms. The Linux developers have taken care of the byte-ordering problem, but the issue must be kept in mind. If you ever need to convert data from host order to PCI order or vice versa, you can resort to the functions defined in <asm/byteorder.h>.
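The byte-ordering issue shows up as soon as a raw dump of the configuration space (for instance the contents of the sysfs config file) is interpreted on a big-endian host such as a PowerPC. The user-space sketch below assembles multibyte registers byte by byte, which gives the same result on any host; cfg_read16, cfg_read32 and the two offset macros are names invented for this illustration. Inside a driver this is not needed, since the kernel accessors such as pci_read_config_word() already return values in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets of two standardised registers in the configuration header. */
#define CFG_VENDOR_ID 0x00
#define CFG_DEVICE_ID 0x02

/*
 * Read a 16-bit little-endian register at byte offset off from a raw
 * configuration space dump, independent of the host byte order.
 */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
    assert(cfg != NULL);
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Same for a 32-bit register, for example vendor and device ID together. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
    assert(cfg != NULL);
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}
```

For a dump that starts with the bytes 0x86 0x80, cfg_read16(cfg, CFG_VENDOR_ID) gives 0x8086 on both little-endian and big-endian machines, whereas casting the buffer to a uint16_t pointer would give 0x8680 on a big-endian CPU.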

Three or five PCI registers identify a device: vendorID, deviceID, and class are the three that are always used. Every PCI manufacturer assigns proper values to these read-only registers, and the driver can use them to look for the device. Additionally, the fields subsystem vendorID and subsystem deviceID are sometimes set by the vendor to further differentiate similar devices.

Let’s look at these registers in more detail:

vendorID : This 16-bit register identifies a hardware manufacturer. For instance, every Intel device is marked with the same vendor number, 0x8086. There is a global registry of such numbers, maintained by the PCI Special Interest Group, and manufacturers must apply to have a unique number assigned to them.

deviceID : This is another 16-bit register, selected by the manufacturer; no official registration is required for the device ID. This ID is usually paired with the vendor ID to make a unique 32-bit identifier for a hardware device. We use the word signature to refer to the vendor and device ID pair. A device driver usually relies on the signature to identify its device; you can find what value to look for in the hardware manual for the target device.

class : Every peripheral device belongs to a class. The class register is a 16-bit value whose top 8 bits identify the "base class" (or group). For example, "ethernet" and "token ring" are two classes belonging to the "network" group, while the "serial" and "parallel" classes belong to the "communication" group. Some drivers can support several similar devices, each of them featuring a different signature but all belonging to the same class; these drivers can rely on the class register to identify their peripherals, as shown later.

subsystem vendorID

subsystem deviceID : These fields can be used for further identification of a device. If the chip is a generic interface chip to a local (onboard) bus, it is often used in several completely different roles, and the driver must identify the actual device it is talking with. The subsystem identifiers are used to this end.

Using these different identifiers, a PCI driver can tell the kernel what kind of devices it supports. The struct pci_device_id structure is used to define a list of the different types of PCI devices that a driver supports. This structure contains the following fields:

__u32 vendor; __u32 device : These specify the PCI vendor and device IDs of a device. If a driver can handle any vendor or device ID, the value PCI_ANY_ID should be used for these fields.

__u32 subvendor; __u32 subdevice : These specify the PCI subsystem vendor and subsystem device IDs of a device. If a driver can handle any type of subsystem ID, the value PCI_ANY_ID should be used for these fields.

__u32 class; __u32 class_mask : These two values allow the driver to specify that it supports a type of PCI class device. The different classes of PCI devices (a VGA controller is one example) are described in the PCI specification. If a driver can handle any class, both fields are set to 0, as in the Barco example later in this section.

kernel_ulong_t driver_data : This value is not used to match a device but is used to hold information that the PCI driver can use to differentiate between different devices if it wants to.

There are two helper macros that should be used to initialise a struct pci_device_id structure:

PCI_DEVICE(vendor, device) : This creates a struct pci_device_id that matches only the specific vendor and device ID. The macro sets the subvendor and subdevice fields of the structure to PCI_ANY_ID.

PCI_DEVICE_CLASS(device_class, device_class_mask) : This creates a struct pci_device_id that matches a specific PCI class.
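How these fields are compared against a device can be shown with a small user-space model. The sketch below is not the kernel code; the mock_ names are invented for this illustration, and only the matching rule is meant to reflect what the PCI core does: an ID field matches when it is PCI_ANY_ID or equal to the value of the device, and the class matches when the bits selected by class_mask are equal. With class and class_mask both 0 the class test always succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PCI_ANY_ID 0xffffffffu /* all bits set, like PCI_ANY_ID */

/* Reduced model of struct pci_device_id (driver_data left out). */
struct mock_pci_id {
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
    uint32_t class_code, class_mask;
};

/* Values read from the configuration space of one device. */
struct mock_pci_dev {
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
    uint32_t class_code;
};

/* Return 1 if the table entry matches the device, 0 otherwise. */
static int mock_pci_id_matches(const struct mock_pci_id *id,
                               const struct mock_pci_dev *dev)
{
    assert(id != NULL && dev != NULL);
    return (id->vendor == MOCK_PCI_ANY_ID || id->vendor == dev->vendor) &&
           (id->device == MOCK_PCI_ANY_ID || id->device == dev->device) &&
           (id->subvendor == MOCK_PCI_ANY_ID ||
            id->subvendor == dev->subvendor) &&
           (id->subdevice == MOCK_PCI_ANY_ID ||
            id->subdevice == dev->subdevice) &&
           ((id->class_code ^ dev->class_code) & id->class_mask) == 0;
}

/*
 * Walk a table that ends with an all-zero entry and return the first
 * matching entry, or NULL if the driver does not support the device.
 */
static const struct mock_pci_id *
mock_pci_match_table(const struct mock_pci_id *table,
                     const struct mock_pci_dev *dev)
{
    assert(table != NULL);
    for (; table->vendor || table->subvendor || table->class_mask; table++)
        if (mock_pci_id_matches(table, dev))
            return table;
    return NULL;
}
```

A table with two entries for the same vendor but different device IDs, wildcard subsystem IDs and a zero class mask therefore matches exactly those two device types, whatever their subsystem IDs or class.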

5.6.3 A Real Life Barco Example

5.6.3.1 Introduction

While the examples for PCI usage in Barco embedded systems are no doubt many, we will mainly focus on a couple of modules. The first (sfpga) is a PCI driver that accesses Altera FPGAs over PCI and the second example (ppc2dsp) is a module used to steer TI C64x DSPs over PCI.

5.6.3.2 Initialisation

The code in drivers/barco/ppc2dsp.c shows an example of a struct pci_device_id table, with the fields described above filled in explicitly instead of through the helper macros:

/* --- Supported devices --- */
static struct pci_device_id sfpga_id_table[] __devinitdata = {
    {
        .vendor = VENDOR_ID_ALTERA,
        .device = DEVICE_ID_SFPGA,
        .subvendor = PCI_ANY_ID,
        .subdevice = PCI_ANY_ID,
        .class = 0,
        .class_mask = 0,
        .driver_data = 0
    }, {
        .vendor = VENDOR_ID_ALTERA,
        .device = DEVICE_ID_NWW,
        .subvendor = PCI_ANY_ID,
        .subdevice = PCI_ANY_ID,
        .class = 0,
        .class_mask = 0,
        .driver_data = 0
    }, {
        0,
    }
};
MODULE_DEVICE_TABLE(pci, sfpga_id_table);


This pci_device_id structure needs to be exported to user space to allow the hotplug and module loading systems to know what module works with what hardware devices. The macro MODULE_DEVICE_TABLE accomplishes this. An example is:

MODULE_DEVICE_TABLE(pci, sfpga_id_table);

This statement creates a local variable called __mod_pci_device_table that points to the list of struct pci_device_id. Later in the kernel build process, the depmod program searches all modules for the symbol __mod_pci_device_table. If that symbol is found, it pulls the data out of the module and adds it to the file /lib/modules/KERNEL_VERSION/modules.pcimap. After depmod completes, all PCI devices that are supported by modules in the kernel are listed, along with their module names, in that file. When the kernel tells the hotplug system that a new PCI device has been found, the hotplug system uses the modules.pcimap file to find the proper driver to load.

This particular example shows that the same driver can be used for two different Altera based FPGAs.

5.6.3.3 Registering a PCI Driver

The main structure that all PCI drivers must create in order to be registered with the kernel properly is the struct pci_driver structure. This structure consists of a number of function callbacks and variables that describe the PCI driver to the PCI core. Here are the fields in this structure that a PCI driver needs to be aware of:

const char *name : The name of the driver. It must be unique among all PCI drivers in the kernel and is normally set to the same name as the module name of the driver. It shows up in sysfs under /sys/bus/pci/drivers/ when the driver is in the kernel.

const struct pci_device_id *id_table : Pointer to the struct pci_device_id table described earlier in this chapter.

int (*probe) (struct pci_dev *dev, const struct pci_device_id *id) : Pointer to the probe function in the PCI driver. This function is called by the PCI core when it has a struct pci_dev that it thinks this driver wants to control. A pointer to the struct pci_device_id that the PCI core used to make this decision is also passed to this function. If the PCI driver claims the struct pci_dev that is passed to it, it should initialise the device properly and return 0. If the driver does not want to claim the device, or an error occurs, it should return a negative error value. More details about this function follow later in this chapter.

void (*remove) (struct pci_dev *dev) : Pointer to the function that the PCI core calls when the struct pci_dev is being removed from the system, or when the PCI driver is being unloaded from the kernel. More details about this function follow later in this chapter.

int (*suspend) (struct pci_dev *dev, u32 state) : Pointer to the function that the PCI core calls when the struct pci_dev is being suspended. The suspend state is passed in the state variable. This function is optional; a driver does not have to provide it.

int (*resume) (struct pci_dev *dev) : Pointer to the function that the PCI core calls when the struct pci_dev is being resumed. It is always called after suspend has been called. This function is optional; a driver does not have to provide it.

In summary, to create a proper struct pci_driver structure, only four fields need to be initialised [10]:

[10] #define DRV_NAME "sfpga"


    static struct pci_driver sfpga_pci_driver = {
        .name     = SFPGA_DRV_NAME,
        .id_table = sfpga_id_table,
        .probe    = sfpga_probe,
        .remove   = sfpga_remove,
    };

To register the struct pci_driver with the PCI core, a call to pci_register_driver is made with a pointer to the struct pci_driver. This is traditionally done in the module initialisation code for the PCI driver:

    int sfpga_init_module(void)
    {
        int32_t result = 0;

        DBG("FUNCTIONCALL: %s at %d\n", __FUNCTION__, __LINE__);
        printk(KERN_INFO SFPGA_DRV_NAME ": " SFPGA_DRV_DESCRIPTION ", "
               SFPGA_DRV_VERSION "\n");
        printk(KERN_INFO SFPGA_DRV_NAME ": " SFPGA_DRV_COPYRIGHT "\n");

        /* Register PCI driver */
        if (pci_register_driver(&sfpga_pci_driver)) {
            result = -ENODEV;
            goto fail;
        }

        return result; /* succeed */

    fail:
        sfpga_cleanup_module();
        return result;
    }

When the PCI driver is to be unloaded, the struct pci_driver needs to be unregistered from the kernel. This is done with a call to pci_unregister_driver. When this call happens, any PCI devices that were currently bound to this driver are removed, and the remove function for this PCI driver is called before the pci_unregister_driver function returns.

    /*
     * The cleanup function is used to handle initialization failures as well.
     * Therefore, it must be careful to work correctly even if some of the items
     * have not been initialised
     */
    void sfpga_cleanup_module(void)
    {
        sfpga_clean_chardevs();

        pci_unregister_driver(&sfpga_pci_driver);

        printk(KERN_INFO SFPGA_DRV_NAME ": device removed.\n");
    }

In the probe function for the PCI driver, before the driver can access any device resource (I/O region or interrupt) of the PCI device, the driver must call the pci_enable_device function:

int pci_enable_device(struct pci_dev *dev) : This function actually enables the device. It wakes up the device and in some cases also assigns its interrupt line and I/O regions. This happens, for example, with CardBus devices (which have been made completely equivalent to PCI at the driver level).

    static int __devinit sfpga_probe(struct pci_dev *dev,
                                     const struct pci_device_id *pci_id)
    {
        int retval = 0;

        /* .... */
        /* Initialise device before it's used by the driver */
        if (pci_enable_device(dev)) {
            printk(KERN_INFO DRV_NAME ": could not enable PCI device\n");
            retval = -EIO;
            goto error_pci_enable;
        }

    error_pci_enable:
        pci_set_drvdata(dev, NULL);

    error_kmalloc:
        kfree(pci_dev);
        return retval;
    }

In case of an error during enabling of the device, we clean up what we have previously allocated.

5.6.3.4 Assigning the I/O and Memory Spaces

Accessing the I/O and Memory Spaces

A PCI device implements up to six I/O address regions. Each region consists of either memory or I/O locations. Most devices implement their I/O registers in memory regions, because it's generally a saner approach. However, unlike normal memory, I/O registers should not be cached by the CPU because each access can have side effects. The PCI device that implements I/O registers as a memory region marks the difference by setting a "memory-is-prefetchable" bit in its configuration register [11]. If the memory region is marked as prefetchable, the CPU can cache its contents and do all sorts of optimization with it; nonprefetchable memory access, on the other hand, can't be optimised because each access can have side effects, just as with I/O ports. Peripherals that map their control registers to a memory address range declare that range as nonprefetchable, whereas something like video memory on PCI boards is prefetchable. In this section, we use the word region to refer to a generic I/O address space that is memory-mapped or port-mapped.

An interface board reports the size and current location of its regions using configuration registers: the six 32-bit registers shown in Figure 12-2, whose symbolic names are PCI_BASE_ADDRESS_0 through PCI_BASE_ADDRESS_5. Since the I/O space defined by PCI is a 32-bit address space, it makes sense to use the same configuration interface for memory and I/O. If the device uses a 64-bit address bus, it can declare regions in the 64-bit memory space by using two consecutive PCI_BASE_ADDRESS registers for each region, low bits first. It is possible for one device to offer both 32-bit regions and 64-bit regions.
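As an illustration of what the base address registers encode, the sketch below decodes a raw register value in user space. The bit positions follow the PCI specification; the names bar_info, decode_bar and the BAR_* constants are invented for this example and are not kernel identifiers.

```c
#include <stdint.h>

/* Bit layout of a base address register, per the PCI specification. */
#define BAR_SPACE_IO      0x01u   /* bit 0: 1 = I/O space, 0 = memory */
#define BAR_MEM_TYPE_MASK 0x06u   /* bits 2:1: memory decoder width */
#define BAR_MEM_TYPE_64   0x04u   /* 64-bit region, uses two registers */
#define BAR_MEM_PREFETCH  0x08u   /* bit 3: prefetchable memory */

struct bar_info {
    int is_io;
    int is_64bit;
    int prefetchable;
    uint64_t base;
};

/* Decode register 'lo'; 'hi' is the next register, used for 64-bit regions. */
static struct bar_info decode_bar(uint32_t lo, uint32_t hi)
{
    struct bar_info info = { 0, 0, 0, 0 };

    if (lo & BAR_SPACE_IO) {
        info.is_io = 1;
        info.base = lo & ~0x03u;          /* low two bits are flags */
        return info;
    }
    info.is_64bit = (lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
    info.prefetchable = (lo & BAR_MEM_PREFETCH) != 0;
    info.base = lo & ~0x0fu;              /* low four bits are flags */
    if (info.is_64bit)
        info.base |= (uint64_t)hi << 32;  /* low bits first, then high */
    return info;
}
```

A driver normally never does this by hand; the decoding is shown only to clarify where the prefetchable and 64-bit properties come from.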

In the kernel, the I/O regions of PCI devices have been integrated into the generic resource management. For this reason, you don't need to access the configuration variables in order to know where your device is mapped in memory or I/O space. The preferred interface for getting region information consists of the following functions:

unsigned long pci_resource_start(struct pci_dev *dev, int bar) : The function returns the first address (memory address or I/O port number) associated with one of the six PCI I/O regions. The region is selected by the integer bar (the base address register), ranging from 0-5 (inclusive).

unsigned long pci_resource_end(struct pci_dev *dev, int bar) : The function returns the last address that is part of the I/O region number bar. Note that this is the last usable address, not the first address after the region.

unsigned long pci_resource_flags(struct pci_dev *dev, int bar) : This function returns the flags associated with this resource.

Resource flags are used to define some features of the individual resource. For PCI resources associated with PCI I/O regions, the information is extracted from the base address registers, but can come from elsewhere for resources not associated with PCI devices.

All resource flags are defined in <linux/ioport.h>; the most important are:

IORESOURCE_IO

IORESOURCE_MEM : If the associated I/O region exists, one and only one of these flags is set.

IORESOURCE_PREFETCH

IORESOURCE_READONLY : These flags tell whether a memory region is prefetchable and/or write protected. The latter flag is never set for PCI resources.

By making use of the pci_resource_ functions, a device driver can completely ignore the underlying PCI registers, since the system already used them to structure resource information.
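Because pci_resource_end returns the last usable address rather than the first address past the region, the length of a region is end - start + 1. The small user-space sketch below shows that arithmetic; region_len is a name made up for this example, and the treatment of an all-zero region as empty is an assumption that mirrors how the kernel's pci_resource_len helper is commonly described.

```c
#include <stdint.h>

/* Length of a region whose first and last usable addresses are known. */
static uint64_t region_len(uint64_t start, uint64_t end)
{
    /* Assumption: an unassigned region reports start == end == 0. */
    if (start == 0 && end == 0)
        return 0;
    return end - start + 1;   /* 'end' is inclusive */
}
```

For a 64 KiB window starting at 0xe0000000, the last usable address is 0xe000ffff and the function returns 0x10000.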

[11] The information lives in one of the low-order bits of the base address PCI registers. The bits are defined in <linux/pci.h>.


    /* Inspect PCI BARs and remap I/O memory */
    for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
        /* the function returns the first address (memory address
         * or I/O port number) associated with one of the six PCI
         * I/O regions. The region is selected by integer i (the
         * base address register), ranging from 0 to 5, inclusive */
        if (pci_resource_start(dev, i) != 0) {
            DBG("BAR %d (%#08x-%#08x), len=%d, flags=%#08x\n", i,
                (uint32_t) pci_resource_start(dev, i),
                (uint32_t) pci_resource_end(dev, i),
                (uint32_t) pci_resource_len(dev, i),
                (uint32_t) pci_resource_flags(dev, i)
            );
        }
        /* if the associated I/O region exists, one and only
         * one of these flags is set (IORESOURCE_MEM xor
         * IORESOURCE_IO)
         */
        if (pci_resource_flags(dev, i) & IORESOURCE_MEM) {
            if (!(pci_dev->sfpga_devices[i].mmio_addr =
                      ioremap(pci_resource_start(dev, i),
                              pci_resource_len(dev, i)))) {
                DBG("unable to remap I/O memory\n");
                retval = -ENOMEM;
                goto error_ioremap;
            }
            pci_dev->sfpga_devices[i].mmio_len = pci_resource_len(dev, i);
        }
        else if (pci_resource_flags(dev, i) & IORESOURCE_IO) {
            request_region(pci_resource_start(dev, i),
                           pci_resource_len(dev, i), DRV_NAME);
            pci_dev->sfpga_devices[i].mmio_addr =
                (void *)pci_resource_start(dev, i);
            pci_dev->sfpga_devices[i].mmio_len =
                pci_resource_len(dev, i);
        }
    }

In this example, we scan the memory ranges and save them in the driver data structure. If it is a memory range, the range is remapped. For accessing the PCI configuration space, these functions are provided by Linux:

int pci_read_config_byte(struct pci_dev *dev, int where, u8 *val);

int pci_read_config_word(struct pci_dev *dev, int where, u16 *val);

int pci_read_config_dword(struct pci_dev *dev, int where, u32 *val); : Read one, two, or four bytes from the configuration space of the device identified by dev. The where argument is the byte offset from the beginning of the configuration space. The value fetched from the configuration space is returned through the val pointer, and the return value of the functions is an error code. The word and dword functions convert the value just read from little-endian to the native byte order of the processor, so you need not deal with byte ordering.

int pci_write_config_byte(struct pci_dev *dev, int where, u8 val);

int pci_write_config_word(struct pci_dev *dev, int where, u16 val);

int pci_write_config_dword(struct pci_dev *dev, int where, u32 val); : Write one, two, or four bytes to the configuration space. The device is identified by dev as usual, and the value being written is passed as val. The word and dword functions convert the value to little-endian before writing to the peripheral device.
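The byte-order conversion mentioned above can be illustrated in user space by treating the configuration space as a plain byte array. This is only a model of the idea, not the kernel implementation: cfg_read_word, cfg_read_dword and the CFG_* names are hypothetical, while the offsets themselves are the standard header offsets from the PCI specification.

```c
#include <stdint.h>

/* Standard header offsets from the PCI specification. */
#define CFG_VENDOR_ID      0x00   /* 16 bits */
#define CFG_DEVICE_ID      0x02   /* 16 bits */
#define CFG_INTERRUPT_LINE 0x3c   /* 8 bits, decimal offset 60 */

/* Assemble a little-endian 16-bit value; works on any host byte order. */
static uint16_t cfg_read_word(const uint8_t *cfg, int where)
{
    return (uint16_t)(cfg[where] | (cfg[where + 1] << 8));
}

/* Assemble a little-endian 32-bit value from four consecutive bytes. */
static uint32_t cfg_read_dword(const uint8_t *cfg, int where)
{
    return (uint32_t)cfg[where] |
           ((uint32_t)cfg[where + 1] << 8) |
           ((uint32_t)cfg[where + 2] << 16) |
           ((uint32_t)cfg[where + 3] << 24);
}
```

Building the value from individual bytes gives the same result on little-endian and big-endian hosts, which is the property the kernel accessors provide to drivers.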

5.6.3.5 PCI Interrupts

As far as interrupts are concerned, PCI is easy to handle. By the time Linux boots, the computer's firmware has already assigned a unique interrupt number to the device, and the driver just needs to use it. The interrupt number is stored in configuration register 60 (PCI_INTERRUPT_LINE), which is one byte wide. This allows for as many as 256 interrupt lines, but the actual limit depends on the CPU being used. The driver doesn't need to bother checking the interrupt number, because the value found in PCI_INTERRUPT_LINE is guaranteed to be the right one.

If the device doesn't support interrupts, register 61 (PCI_INTERRUPT_PIN) is 0; otherwise, it's nonzero. However, since the driver knows if its device is interrupt driven or not, it doesn't usually need to read PCI_INTERRUPT_PIN.

Thus, PCI-specific code for dealing with interrupts just needs to read the configuration byte to obtain the interrupt number that is saved in a local variable, as shown in the following code.

    if ((result = pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &myirq))) {
        /* deal with error */
    }

When applied to a practical example from drivers/barco/ppc2dsp.c:

    /* check for the interrupt */
    if ((returnvalue = pci_read_config_byte(dev, PCI_INTERRUPT_LINE,
                                            &ppc2dsp_irq))) {
        printk(KERN_INFO DRV_NAME ": could not get IRQ from PCI configuration\n");
    }
    else if ((returnvalue =
              request_irq(ppc2dsp_irq, dsp_interrupt, 0x0, DRV_NAME, NULL))) {
        printk(KERN_INFO DRV_NAME ": could not get IRQ %d\n", ppc2dsp_irq);
    } else {
        unsigned long hsr;
        /* enable DSP interrupt here */
        hsr = inl(PCI_REGISTER(legacyptr, 0));
        hsr &= ~0x4;
        outl(hsr, PCI_REGISTER(legacyptr, 0));
        wmb();
    }

In this example from ppc2dsp_probe, we read the interrupt line assigned to the device from the PCI configuration space and register the interrupt handler with the system. Finally, we enable the interrupt on the DSP.


5.6.4 Adding a Character Interface

For our PCI driver, we want to make the communication to the peripheral available to user space applications via device files. For this purpose, a char device interface is added to the sfpga driver.

5.6.4.1 Major and Minor Numbers

Char devices are accessed through names in the filesystem. Those names are called special files or device files or simply nodes of the filesystem tree; they are conventionally located in the /dev/ directory tree. Special files for char drivers are identified by a c in the first column of the output of ls -l. Block devices appear in /dev/ as well, but they are identified by a b.

    # ls -l /dev | grep dsp
    crw-rw-rw- 1 0 0 252, 4 Jul 20 2006 dspa
    crw-r----- 1 0 0 252, 0 Jul 20 2006 dspa0
    crw-r----- 1 0 0 252, 1 Jul 20 2006 dspa1
    crw-r----- 1 0 0 252, 2 Jul 20 2006 dspa2
    crw-r----- 1 0 0 252, 3 Jul 20 2006 dspa3
    # ls -l /dev | grep fpga
    crw-rw-rw- 1 0 0 253, 4 Jul 20 2006 fpgaa
    crw-r----- 1 0 0 253, 0 Jul 20 2006 fpgaa0
    crw-r----- 1 0 0 253, 1 Jul 20 2006 fpgaa1
    crw-r----- 1 0 0 253, 2 Jul 20 2006 fpgaa2
    crw-r----- 1 0 0 253, 3 Jul 20 2006 fpgaa3
    crw-r----- 1 0 0 251, 0 Jul 20 2006 nwwfpgaa

If you issue the ls -l command, you'll see two numbers, separated by a comma, in the device file entries before the date of the last modification, where the file length normally appears. These numbers are the major and minor device number for the particular device. The listing shows the character devices on an SVC mk I board. The major number identifies the driver associated with the device. For example, /dev/dspa and /dev/dspa0 are both managed by driver 252, while /dev/fpgaa is managed by driver 253. Modern Linux kernels allow multiple drivers to share major numbers, but most devices that you will see are still organised on the one-major-one-driver principle.

The minor number is used by the kernel to determine exactly which device is being referred to. Depending on how your driver is written, you can either get a direct pointer to your device from the kernel, or you can use the minor number yourself as an index into a local array of devices. Either way, the kernel itself knows almost nothing about minor numbers beyond the fact that they refer to devices implemented by your driver.

5.6.4.2 The Internal Representation of Device Numbers

Within the kernel, the dev_t type (defined in <linux/types.h>) is used to hold device numbers, both the major and minor parts. dev_t is a 32-bit quantity with 12 bits set aside for the major number and 20 for the minor number. Your code should, of course, never make any assumptions about the internal organization of device numbers; it should, instead, make use of a set of macros found in <linux/kdev_t.h>. To obtain the major or minor parts of a dev_t, use:

MAJOR(dev_t dev);

MINOR(dev_t dev);

If, instead, you have the major and minor numbers and need to turn them into a dev_t, use:

MKDEV(int major, int minor);
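To show what these macros do with the 12/20-bit split described above, here is a user-space model. The MODEL_ prefix marks these as illustrative copies written for this text; driver code should always use the real macros from <linux/kdev_t.h> and not rely on the layout.

```c
#include <stdint.h>

/* Model of the 12-bit major / 20-bit minor split; illustration only. */
#define MODEL_MINORBITS 20
#define MODEL_MINORMASK ((1u << MODEL_MINORBITS) - 1)

#define MODEL_MAJOR(dev)    ((unsigned int)((dev) >> MODEL_MINORBITS))
#define MODEL_MINOR(dev)    ((unsigned int)((dev) & MODEL_MINORMASK))
#define MODEL_MKDEV(ma, mi) (((uint32_t)(ma) << MODEL_MINORBITS) | (mi))
```

Taking /dev/fpgaa from the listing above (major 253, minor 4), MODEL_MKDEV(253, 4) yields 0x0fd00004, and the two extraction macros return 253 and 4 again.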


5.6.4.3 Some Important Data Structures

Most of the fundamental driver operations involve three important kernel data structures, called file_operations, file, and inode. A basic familiarity with these structures is required to be able to do much of anything interesting, so we will now take a quick look at each of them before getting into the details of how to implement the fundamental driver operations.

5.6.4.4 File Operations

So far, we have reserved some device numbers for our use, but we have not yet connected any of our driver's operations to those numbers. The file_operations structure is how a char driver sets up this connection. The structure, defined in <linux/fs.h>, is a collection of function pointers. Each open file (represented internally by a file structure, which we will examine shortly) is associated with its own set of functions (by including a field called f_op that points to a file_operations structure). The operations are mostly in charge of implementing the system calls and are therefore named open, read, and so on.

Conventionally, a file_operations structure or a pointer to one is called fops (or some variation thereof). Each field in the structure must point to the function in the driver that implements a specific operation, or be left NULL for unsupported operations. The exact behavior of the kernel when a NULL pointer is specified is different for each function, as the list later in this section shows.

The following list introduces all the operations that an application can invoke on a device. We've tried to keep the list brief so it can be used as a reference, merely summarising each operation and the default kernel behavior when a NULL pointer is used.

As you read through the list of file_operations methods, you will note that a number of parameters include the string __user. This annotation is a form of documentation, noting that a pointer is a user-space address that cannot be directly dereferenced. For normal compilation, __user has no effect, but it can be used by external checking software to find misuse of user-space addresses.

The rest of the chapter, after describing some other important data structures, explains the role of the most important operations and offers hints, caveats, and real code examples. We defer discussion of the more complex operations to later chapters, because we aren't ready to dig into topics such as memory management, blocking operations, and asynchronous notification quite yet.

struct module *owner : The first file_operations field is not an operation at all; it is a pointer to the module that "owns" the structure. This field is used to prevent the module from being unloaded while its operations are in use. Almost all the time, it is simply initialised to THIS_MODULE, a macro defined in <linux/module.h>.

loff_t (*llseek) (struct file *, loff_t, int); : The llseek method is used to change the current read/write position in a file, and the new position is returned as a (positive) return value. The loff_t parameter is a "long offset" and is at least 64 bits wide even on 32-bit platforms. Errors are signaled by a negative return value. If this function pointer is NULL, seek calls will modify the position counter in the file structure (described in the section "The file Structure") in potentially unpredictable ways.

ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); : Used to retrieve data from the device. A null pointer in this position causes the read system call to fail with -EINVAL ("Invalid argument"). A nonnegative return value represents the number of bytes successfully read (the return value is a "signed size" type, usually the native integer type for the target platform).

ssize_t (*aio_read)(struct kiocb *, char __user *, size_t, loff_t); : Initiates an asynchronous read, that is, a read operation that might not complete before the function returns. If this method is NULL, all operations will be processed (synchronously) by read instead.


ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); : Sends data to the device. If NULL, -EINVAL is returned to the program calling the write system call. The return value, if nonnegative, represents the number of bytes successfully written.

ssize_t (*aio_write)(struct kiocb *, const char __user *, size_t, loff_t *); : Initiates an asynchronous write operation on the device.

int (*readdir) (struct file *, void *, filldir_t); : This field should be NULL for device files; it is used for reading directories and is useful only for filesystems.

unsigned int (*poll) (struct file *, struct poll_table_struct *); : The poll method is the back end of three system calls: poll, epoll, and select, all of which are used to query whether a read or write to one or more file descriptors would block. The poll method should return a bit mask indicating whether nonblocking reads or writes are possible, and, possibly, provide the kernel with information that can be used to put the calling process to sleep until I/O becomes possible. If a driver leaves its poll method NULL, the device is assumed to be both readable and writable without blocking.

int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); : The ioctl system call offers a way to issue device-specific commands (such as formatting a track of a floppy disk, which is neither reading nor writing). Additionally, a few ioctl commands are recognised by the kernel without referring to the fops table. If the device doesn't provide an ioctl method, the system call returns an error for any request that isn't predefined (-ENOTTY, "No such ioctl for device").

int (*mmap) (struct file *, struct vm_area_struct *); : mmap is used to request a mapping of device memory to a process's address space. If this method is NULL, the mmap system call returns -ENODEV.

int (*open) (struct inode *, struct file *); : Though this is always the first operation performed on the device file, the driver is not required to declare a corresponding method. If this entry is NULL, opening the device always succeeds, but your driver isn't notified.

int (*flush) (struct file *); : The flush operation is invoked when a process closes its copy of a file descriptor for a device; it should execute (and wait for) any outstanding operations on the device. This must not be confused with the fsync operation requested by user programs. Currently, flush is used in very few drivers; the SCSI tape driver uses it, for example, to ensure that all data written makes it to the tape before the device is closed. If flush is NULL, the kernel simply ignores the user application request.

int (*release) (struct inode *, struct file *); : This operation is invoked when the file structure is being released. Like open, release can be NULL [12].

int (*fsync) (struct file *, struct dentry *, int); : This method is the back end of the fsync system call, which a user calls to flush any pending data. If this pointer is NULL, the system call returns -EINVAL.

int (*aio_fsync)(struct kiocb *, int); : This is the asynchronous version of the fsync method.

int (*fasync) (int, struct file *, int); : This operation is used to notify the device of a change in its FASYNC flag. The field can be NULL if the driver doesn't support asynchronous notification.

int (*lock) (struct file *, int, struct file_lock *); : The lock method is used to implement file locking; locking is an indispensable feature for regular files but is almost never implemented by device drivers.

[12] Note that release isn't invoked every time a process calls close. Whenever a file structure is shared (for example, after a fork or a dup), release won't be invoked until all copies are closed. If you need to flush pending data when any copy is closed, you should implement the flush method.


ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);

ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *); : These methods implement scatter/gather read and write operations. Applications occasionally need to do a single read or write operation involving multiple memory areas; these system calls allow them to do so without forcing extra copy operations on the data. If these function pointers are left NULL, the read and write methods are called (perhaps more than once) instead.

ssize_t (*sendfile)(struct file *, loff_t *, size_t, read_actor_t, void *); : This method implements the read side of the sendfile system call, which moves the data from one file descriptor to another with a minimum of copying. It is used, for example, by a web server that needs to send the contents of a file out a network connection. Device drivers usually leave sendfile NULL.

ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); : sendpage is the other half of sendfile; it is called by the kernel to send data, one page at a time, to the corresponding file. Device drivers do not usually implement sendpage.

unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); : The purpose of this method is to find a suitable location in the process's address space to map in a memory segment on the underlying device. This task is normally performed by the memory management code; this method exists to allow drivers to enforce any alignment requirements a particular device may have. Most drivers can leave this method NULL.

int (*check_flags)(int) : This method allows a module to check the flags passed to an fcntl(F_SETFL...) call.

int (*dir_notify)(struct file *, unsigned long); : This method is invoked when an application uses fcntl to request directory change notifications. It is useful only to filesystems; drivers need not implement dir_notify.
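The way a table of function pointers with NULL entries leads to per-operation default behaviour can be modelled in a few lines of user-space C. This is a sketch of the idea only: model_fops, model_file, model_sys_read and model_sys_open are invented names, and the real kernel dispatch code is considerably more involved.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct model_file;

/* A two-entry stand-in for struct file_operations. */
struct model_fops {
    ssize_t (*read)(struct model_file *f, char *buf, size_t count);
    int (*open)(struct model_file *f);
};

struct model_file {
    const struct model_fops *f_op;
};

/* Models the read system call: a missing method fails with -EINVAL. */
static ssize_t model_sys_read(struct model_file *f, char *buf, size_t count)
{
    if (f->f_op->read == NULL)
        return -EINVAL;
    return f->f_op->read(f, buf, count);
}

/* Models open: a missing method means the open simply succeeds. */
static int model_sys_open(struct model_file *f)
{
    if (f->f_op->open == NULL)
        return 0;
    return f->f_op->open(f);
}
```

The two wrappers follow the defaults given in the list above for read and open; each other operation has its own default, which is why a driver only needs to fill in the methods it actually supports.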

The sfpga driver only implements the most important device methods. Its file_operations structure is initialised as follows:

    struct file_operations sfpga_fops = {
        .owner   = THIS_MODULE,
        .llseek  = sfpga_llseek,
        .read    = sfpga_read,
        .write   = sfpga_write,
        .open    = sfpga_open,
        .release = sfpga_release,
        /* .ioctl = sfpga_ioctl, */
        .mmap    = sfpga_mmap,
    };

5.6.4.5 Char Device Registration

struct cdev is the kernel's internal structure that represents char devices; the i_cdev field of an inode contains a pointer to that structure when the inode refers to a char device file.

struct cdev *i_cdev;

Before the kernel invokes your device's operations, you must allocate and register one or more of these structures. To do so, your code should include <linux/cdev.h>, where the structure and its associated helper functions are defined.

There are two ways of allocating and initializing one of these structures. If you wish to obtain a standalone cdev structure at runtime, you can allocate one with cdev_alloc() and set its ops field yourself.

Chances are, however, that you will want to embed the cdev structure within a device-specific structure of your own; that is what sfpga does. In that case, you should initialize the structure that you have already allocated with:

    void cdev_init(struct cdev *cdev, struct file_operations *fops);

In the sfpga driver, this looks as follows:

    cdev_init(&dev->cdev, &sfpga_fops);
    dev->cdev.owner = THIS_MODULE;
    dev->cdev.ops = &sfpga_fops;

Either way, there is one other struct cdev field that you need to initialize. Like the file_operations structure, struct cdev has an owner field that should be set to THIS_MODULE.

Once the cdev structure is set up, the final step is to tell the kernel about it with a call to:

    int cdev_add(struct cdev *dev, dev_t num, unsigned int count);

Here, dev is the cdev structure, num is the first device number to which this device responds, and count is the number of device numbers that should be associated with the device. Often count is one, but there are situations where it makes sense to have more than one device number correspond to a specific device. Consider, for example, the SCSI tape driver, which allows user space to select operating modes (such as density) by assigning multiple minor numbers to each physical device.

There are a couple of important things to keep in mind when using cdev_add. The first is that this call can fail. If it returns a negative error code, your device has not been added to the system. It almost always succeeds, however, and that brings up the other point: as soon as cdev_add returns, your device is "live" and its operations can be called by the kernel. You should not call cdev_add until your driver is completely ready to handle operations on the device.

To remove a char device from the system, call:

    void cdev_del(struct cdev *dev);

Clearly, you should not access the cdev structure after passing it to cdev_del.

5.6.4.6 Open and Release

Now that we’ve taken a quick look at the fields, we start using them.

The open Method

The open method is provided for a driver to do any initialization in preparation for later operations. In most drivers, open should perform the following tasks:

• Check for device-specific errors (such as device-not-ready or similar hardware problems)

• Initialize the device if it is being opened for the first time

• Update the f_op pointer, if necessary

• Allocate and fill any data structure to be put in filp->private_data


The first order of business, however, is usually to identify which device is being opened. Remember that the prototype for the open method is:

    int (*open)(struct inode *inode, struct file *filp);

The inode argument has the information we need in the form of its i_cdev field, which contains the cdev structure we set up before. The only problem is that we do not normally want the cdev structure itself, we want the sfpga_dev structure that contains that cdev structure. The C language lets programmers play all sorts of tricks to make that kind of conversion; programming such tricks is error prone, however, and leads to code that is difficult for others to read and understand. Fortunately, in this case, the kernel hackers have done the tricky stuff for us, in the form of the container_of macro, defined in <linux/kernel.h>:

    container_of(pointer, container_type, container_field);

This macro takes a pointer to a field of type container_field, within a structure of type container_type, and returns a pointer to the containing structure. In sfpga_open, this macro is used to find the appropriate device structure:

    struct sfpga_dev *dev; /* device information */

    if (!(dev = container_of(inode->i_cdev, struct sfpga_dev, cdev))) {
        printk(KERN_ERR DRV_NAME ": no private_data file this filehandle\n");
        return (-EBADF);
    }
    filp->private_data = dev; /* for other methods */

    return 0; /* success */

Once it has found the sfpga_dev structure, sfpga stores a pointer to it in the private_data field of the file structure for easier access in the future.
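The pointer arithmetic behind container_of can be tried out in user space. The sketch below uses a simplified macro built on offsetof (the kernel version adds a type check); model_container_of, model_cdev, model_sfpga_dev and dev_from_cdev are hypothetical names, and the structure layout is invented for the example.

```c
#include <stddef.h>

/* Portable variant without the kernel macro's type check. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct model_cdev { int dummy; };

struct model_sfpga_dev {          /* hypothetical layout */
    unsigned long size;
    struct model_cdev cdev;       /* embedded, not a pointer */
};

/* Recover the enclosing device from a pointer to its embedded cdev. */
static struct model_sfpga_dev *dev_from_cdev(struct model_cdev *c)
{
    return model_container_of(c, struct model_sfpga_dev, cdev);
}
```

Since the result is obtained by subtracting a fixed offset from the member pointer, it only makes sense when the cdev really is embedded in the outer structure; a separately allocated cdev would yield a meaningless pointer.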

5.6.4.7 The Release Method

The role of the release method is the reverse of open. Sometimes you'll find that the method implementation is called device_close instead of device_release. Either way, the device method should perform the following tasks:

• Deallocate anything that open allocated in filp->private_data

• Shut down the device on last close

sfpga has no hardware to shut down, since the DSP it is communicating with operates stand-alone and resets are performed through an FPGA, so the code required is minimal:

    int sfpga_release(struct inode *inode, struct file *filp)
    {
        filp->private_data = NULL;
        return 0;
    }


5.6.4.8 Read and Write

The read and write operations on the filehandles copy data from the kernel to application code(and vice versa). Since the operations and prototypes are quit similar, only the read operationwill be described.

ssize_t sfpga_read(struct file *filp, char __user *buf, size_t count,
                   loff_t *f_pos)
{
    struct sfpga_dev *dev = filp->private_data;

    if ((uint32_t)*f_pos + count >= dev->size) {
        return 0;
    }

    /* copy_to_user returns the number of bytes it could not copy */
    if (copy_to_user(buf, (uint32_t *)dev->mmio_addr + (uint32_t)*f_pos, count)) {
        return -EFAULT;
    }

    *f_pos += count;

    return count;
}

The filp pointer is the file pointer and count is the size of the requested data transfer. The buf argument points to the user buffer holding the data to be written or the empty buffer where the newly read data should be placed. Finally, f_pos is a pointer to a loff_t ("long offset type") object that indicates the file position the user is accessing. The return value is of type ssize_t ("signed size type").

The buf argument to the read and write methods is a user-space pointer. Therefore, it cannot be directly dereferenced by kernel code. There are a few reasons for this restriction:

• Depending on which architecture your driver is running on, and how the kernel was configured, the user-space pointer may not be valid while running in kernel mode at all. There may be no mapping for that address, or it could point to some other, random data.

• Even if the pointer does mean the same thing in kernel space, user-space memory is paged, and the memory in question might not be resident in RAM when the system call is made. Attempting to reference the user-space memory directly could generate a page fault, which is something that kernel code is not allowed to do. The result would be an "oops," which would result in the death of the process that made the system call.

• The pointer in question has been supplied by a user program, which could be buggy or malicious. If your driver ever blindly dereferences a user-supplied pointer, it provides an open doorway allowing a user-space program to access or overwrite memory anywhere in the system. If you do not wish to be responsible for compromising the security of your users’ systems, you cannot ever dereference a user-space pointer directly.

Obviously, your driver must be able to access the user-space buffer in order to get its job done. This access must always be performed by special, kernel-supplied functions, however, in order to be safe.

The code for read and write in sfpga needs to copy a whole segment of data to or from the user address space. This capability is offered by the following kernel functions, which copy an arbitrary array of bytes and sit at the heart of most read and write implementations:

unsigned long copy_to_user(void __user *to, const void *from, unsigned long count);
unsigned long copy_from_user(void *to, const void __user *from, unsigned long count);


Although these functions behave like normal memcpy functions, a little extra care must be used when accessing user space from kernel code. The user pages being addressed might not be currently present in memory, and the virtual memory subsystem can put the process to sleep while the page is being transferred into place. This happens, for example, when the page must be retrieved from swap space. The net result for the driver writer is that any function that accesses user space must be reentrant, must be able to execute concurrently with other driver functions, and, in particular, must be in a position where it can legally sleep.

The role of the two functions is not limited to copying data to and from user space: they also check whether the user-space pointer is valid. If the pointer is invalid, no copy is performed; if an invalid address is encountered during the copy, on the other hand, only part of the data is copied. In both cases, the return value is the amount of memory still to be copied. The sfpga code looks for this error return, and returns -EFAULT to the user if it’s not 0.

As far as the actual device methods are concerned, the task of the read method is to copy data from the device to user space (using copy_to_user), while the write method must copy data from user space to the device (using copy_from_user). Each read or write system call requests transfer of a specific number of bytes, but the driver is free to transfer less data (see Figure 5.6).

Figure 5.6: The arguments to read.

Whatever the amount of data the methods transfer, they should generally update the file position at *f_pos to represent the current file position after successful completion of the system call. The kernel then propagates the file position change back into the file structure when appropriate.
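The interplay between a short transfer and the file position can be tried out in user space. The sketch below is a model only: struct mem_dev and model_read are hypothetical names, and memcpy stands in for copy_to_user. Unlike the sfpga listing, which returns 0 when a request would run past the end of the device, this model shortens the transfer to whatever is left.

```c
#include <string.h>
#include <sys/types.h>   /* ssize_t, off_t */

/* Hypothetical stand-in for a device: a fixed-size memory region. */
struct mem_dev {
    const unsigned char *data;
    size_t size;
};

/*
 * User-space model of a read method: returns the number of bytes copied,
 * 0 at end of device, and advances *pos by the amount actually transferred.
 */
static ssize_t model_read(const struct mem_dev *dev, void *buf,
                          size_t count, off_t *pos)
{
    if (*pos < 0 || (size_t)*pos >= dev->size)
        return 0;                              /* nothing left to read */
    if (count > dev->size - (size_t)*pos)
        count = dev->size - (size_t)*pos;      /* short read at the end */
    memcpy(buf, dev->data + *pos, count);      /* stands in for copy_to_user */
    *pos += (off_t)count;
    return (ssize_t)count;
}
```

An application that wants all of its data simply calls read again with the updated position until it gets 0 back.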

Finally, we can use an lseek operation to set the read/write pointer on the file handle. In userspace, this is done with e.g. the following code:

lseek(handle, 0x7 << 2, SEEK_SET);
if (write(handle, &val, 0x4) < 0) {
    return EXIT_FAILURE;
}

In kernel space, sfpga_llseek takes care of setting the file pointer for the subsequent read and write operations:


loff_t sfpga_llseek(struct file *filp, loff_t off, int whence)
{
    struct sfpga_dev *dev = filp->private_data;
    loff_t newpos;

    switch (whence) {
    case 0: /* SEEK_SET */
        newpos = off;
        break;

    case 1: /* SEEK_CUR */
        newpos = filp->f_pos + off;
        break;

    case 2: /* SEEK_END */
        newpos = dev->size + off;
        break;

    default: /* can't happen */
        return -EINVAL;
    }
    if (newpos < 0)
        return -EINVAL;
    filp->f_pos = newpos;
    return newpos;
}

5.6.4.9 Ioctl

Most drivers need, in addition to the ability to read and write the device, the ability to perform various types of hardware control via the device driver. Most devices can perform operations beyond simple data transfers; user space must often be able to request, for example, that the device lock its door, eject its media, report error information, change a baud rate, or self destruct. These operations are usually supported via the ioctl method, which implements the system call by the same name.

In user space, the ioctl system call has the following prototype:

int ioctl(int fd, unsigned long cmd, ...);

The prototype stands out in the list of Unix system calls because of the dots, which usually mark the function as having a variable number of arguments. In a real system, however, a system call can’t actually have a variable number of arguments. System calls must have a well-defined prototype, because user programs can access them only through hardware "gates." Therefore, the dots in the prototype represent not a variable number of arguments but a single optional argument, traditionally identified as char *argp. The dots are simply there to prevent type checking during compilation. The actual nature of the third argument depends on the specific control command being issued (the second argument). Some commands take no arguments, some take an integer value, and some take a pointer to other data. Using a pointer is the way to pass arbitrary data to the ioctl call; the device is then able to exchange any amount of data with user space.

The unstructured nature of the ioctl call has caused it to fall out of favor among kernel developers. Each ioctl command is, essentially, a separate, usually undocumented system call, and there is no way to audit these calls in any sort of comprehensive manner. It is also difficult to make the unstructured ioctl arguments work identically on all systems; for example, consider 64-bit systems with a userspace process running in 32-bit mode. As a result, there is strong pressure to implement miscellaneous control operations by just about any other means. Possible alternatives include embedding commands into the data stream or using virtual filesystems, either sysfs or driver-specific filesystems. However, the fact remains that ioctl is often the easiest and most straightforward choice for true device operations.

The ioctl driver method has a prototype that differs somewhat from the user-space version:

int (*ioctl) (struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg);

The inode and filp pointers are the values corresponding to the file descriptor fd passed on by the application and are the same parameters passed to the open method. The cmd argument is passed from the user unchanged, and the optional arg argument is passed in the form of an unsigned long, regardless of whether it was given by the user as an integer or a pointer. If the invoking program doesn’t pass a third argument, the arg value received by the driver operation is undefined. Because type checking is disabled on the extra argument, the compiler can’t warn you if an invalid argument is passed to ioctl, and any associated bug would be difficult to spot.

As you might imagine, most ioctl implementations consist of a big switch statement that selects the correct behavior according to the cmd argument. Different commands have different numeric values, which are usually given symbolic names to simplify coding. The symbolic name is assigned by a preprocessor definition. Custom drivers usually declare such symbols in their header files; ppc2dsp.h declares them for ppc2dsp. User programs must, of course, include that header file as well to have access to those symbols.

Choosing the ioctl commands Before writing the code for ioctl, you need to choose the numbers that correspond to commands. The first instinct of many programmers is to choose a set of small numbers starting with 0 or 1 and going up from there. There are, however, good reasons for not doing things that way. The ioctl command numbers should be unique across the system in order to prevent errors caused by issuing the right command to the wrong device. Such a mismatch is not unlikely to happen, and a program might find itself trying to change the baud rate of a non-serial-port input stream, such as a FIFO or an audio device. If each ioctl number is unique, the application gets an EINVAL error rather than succeeding in doing something unintended.

To help programmers create unique ioctl command codes, these codes have been split up into several bitfields. The first versions of Linux used 16-bit numbers: the top eight were the "magic" numbers associated with the device, and the bottom eight were a sequential number, unique within the device. This happened because Linus was "clueless" (his own word); a better division of bitfields was conceived only later. Unfortunately, quite a few drivers still use the old convention. They have to: changing the command codes would break no end of binary programs, and that is not something the kernel developers are willing to do.

To choose ioctl numbers for your driver according to the Linux kernel convention, you should first check include/asm/ioctl.h and Documentation/ioctl-number.txt. The header defines the bitfields you will be using: type (magic number), ordinal number, direction of transfer, and size of argument. The ioctl-number.txt file lists the magic numbers used throughout the kernel, so you’ll be able to choose your own magic number and avoid overlaps. The text file also lists the reasons why the convention should be used.

The approved way to define ioctl command numbers uses four bitfields, which have the following meanings. New symbols introduced in this list are defined in <linux/ioctl.h>.

type : The magic number. Just choose one number (after consulting ioctl-number.txt) and use it throughout the driver. This field is eight bits wide (_IOC_TYPEBITS).

number : The ordinal (sequential) number. It’s eight bits (_IOC_NRBITS) wide.

direction : The direction of data transfer, if the particular command involves a data transfer. The possible values are _IOC_NONE (no data transfer), _IOC_READ, _IOC_WRITE, and _IOC_READ|_IOC_WRITE (data is transferred both ways). Data transfer is seen from the application’s point of view; _IOC_READ means reading from the device, so the driver must write to user space. Note that the field is a bit mask, so _IOC_READ and _IOC_WRITE can be extracted using a bitwise AND operation.

size : The size of user data involved. The width of this field is architecture dependent, but is usually 13 or 14 bits. You can find its value for your specific architecture in the macro _IOC_SIZEBITS. Proper use of this field can help detect user-space programming errors and enable you to implement backward compatibility if you ever need to change the size of the relevant data item.

The header file <asm/ioctl.h>, which is included by <linux/ioctl.h>, defines macros that help set up the command numbers as follows: _IO(type,nr) (for a command that has no argument), _IOR(type,nr,datatype) (for reading data from the driver), _IOW(type,nr,datatype) (for writing data), and _IOWR(type,nr,datatype) (for bidirectional transfers). The type and number fields are passed as arguments, and the size field is derived by applying sizeof to the datatype argument. The header also defines macros that may be used in your driver to decode the numbers: _IOC_DIR(nr), _IOC_TYPE(nr), _IOC_NR(nr), and _IOC_SIZE(nr). We won’t go into any more detail about these macros because the header file is clear, and sample code is shown later in this section.

Here is how some ioctl commands are defined in ppc2dsp. In particular, these commands set and get the driver’s configurable parameters.

/* ... */
#define PPC2DSP_IOC_MAGIC 0xBA

#define PPC2DSP_SET_DSPP         _IOW(PPC2DSP_IOC_MAGIC, 1, int)
#define PPC2DSP_SET_HDCR         _IOW(PPC2DSP_IOC_MAGIC, 2, int)
#define PPC2DSP_SET_HSR          _IOW(PPC2DSP_IOC_MAGIC, 3, int)
#define PPC2DSP_SET_REG          _IOW(PPC2DSP_IOC_MAGIC, 4, int)
#define PPC2DSP_SET_MEM          _IOW(PPC2DSP_IOC_MAGIC, 5, int)
#define PPC2DSP_SET_MEM_BLOCK    _IOW(PPC2DSP_IOC_MAGIC, 6, int)
#define PPC2DSP_GET_REG          _IOWR(PPC2DSP_IOC_MAGIC, 7, int)
#define PPC2DSP_GET_HDCR         _IOR(PPC2DSP_IOC_MAGIC, 8, int)
#define PPC2DSP_GET_MEM_BLOCK    _IOWR(PPC2DSP_IOC_MAGIC, 9, int)
#define PPC2DSP_SEND_INTERRUPT   _IOW(PPC2DSP_IOC_MAGIC, 10, int)
#define PPC2DSP_GET_MEM          _IOWR(PPC2DSP_IOC_MAGIC, 11, int)
#define PPC2DSP_GET_HSR          _IOR(PPC2DSP_IOC_MAGIC, 12, int)
#define PPC2DSP_CONSISTENT_ALLOC _IOWR(PPC2DSP_IOC_MAGIC, 13, int)
#define PPC2DSP_CONSISTENT_FREE  _IOWR(PPC2DSP_IOC_MAGIC, 14, int)
#define PPC2DSP_XFER_BUFFER      _IOW(PPC2DSP_IOC_MAGIC, 15, int)
#define PPC2DSP_ALLOCLOG_BUFFER  _IOR(PPC2DSP_IOC_MAGIC, 16, int)
#define PPC2DSP_GETLOG_BUFFER    _IOR(PPC2DSP_IOC_MAGIC, 17, int)
#define PPC2DSP_FREELOG_BUFFER   _IOR(PPC2DSP_IOC_MAGIC, 18, int)
#define PPC2DSP_MMAPAVAIL        _IOR(PPC2DSP_IOC_MAGIC, 19, int)
#define PPC2DSP_GETDATA_BUFFER   _IOWR(PPC2DSP_IOC_MAGIC, 20, int)

#define PPC2DSP_IOC_MAXNR (20)
/* ... */
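Since <linux/ioctl.h> is also usable from user space, the way the four bitfields are packed into one command number can be inspected without a driver. The following sketch defines a few made-up DEMO_ commands (they are not part of ppc2dsp; only the magic value mirrors the header above) and unpacks them again with the decoding macros; struct ioc_fields and ioc_decode are hypothetical helpers.

```c
#include <linux/ioctl.h>   /* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */

/* Hypothetical commands, for illustration only. */
#define DEMO_IOC_MAGIC  0xBA
#define DEMO_RESET      _IO(DEMO_IOC_MAGIC, 0)
#define DEMO_SET_VALUE  _IOW(DEMO_IOC_MAGIC, 1, int)
#define DEMO_GET_VALUE  _IOR(DEMO_IOC_MAGIC, 2, int)
#define DEMO_XCHG_VALUE _IOWR(DEMO_IOC_MAGIC, 3, int)

/* The four bitfields of a command number, unpacked. */
struct ioc_fields {
    unsigned int dir;    /* _IOC_NONE, _IOC_READ, _IOC_WRITE or both */
    unsigned int type;   /* magic number */
    unsigned int nr;     /* ordinal number */
    unsigned int size;   /* sizeof the argument type */
};

static struct ioc_fields ioc_decode(unsigned int cmd)
{
    struct ioc_fields f;

    f.dir  = _IOC_DIR(cmd);
    f.type = _IOC_TYPE(cmd);
    f.nr   = _IOC_NR(cmd);
    f.size = _IOC_SIZE(cmd);
    return f;
}
```

Decoding DEMO_SET_VALUE, for instance, yields the magic number, the ordinal 1, the direction _IOC_WRITE and a size of sizeof(int); the numeric encoding of the direction bits differs between architectures, which is why the symbolic names should be compared rather than raw values.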

The implementation of ioctl is usually a switch statement based on the command number. But what should the default selection be when the command number doesn’t match a valid operation? The question is controversial. Several kernel functions return -EINVAL ("Invalid argument"), which makes sense because the command argument is indeed not a valid one. The POSIX standard, however, states that if an inappropriate ioctl command has been issued, then -ENOTTY should be returned. This error code is interpreted by the C library as "inappropriate ioctl for device," which is usually exactly what the programmer needs to hear. It’s still pretty common, though, to return -EINVAL in response to an invalid ioctl command.

Using the ioctl argument Another point we need to cover before looking at the ioctl code for the ppc2dsp driver is how to use the extra argument. If it is an integer, it’s easy: it can be used directly. If it is a pointer, however, some care must be taken.

When a pointer is used to refer to user space, we must ensure that the user address is valid. An attempt to access an unverified user-supplied pointer can lead to incorrect behavior, a kernel oops, system corruption, or security problems. It is the driver’s responsibility to make proper checks on every user-space address it uses and to return an error if it is invalid.

Previously, we looked at the copy_from_user and copy_to_user functions, which can be used to safely move data to and from user space. Those functions can be used in ioctl methods as well, but ioctl calls often involve small data items that can be more efficiently manipulated through other means. To start, address verification (without transferring data) is implemented by the function access_ok, which is declared in <asm/uaccess.h>:

int access_ok(int type, const void *addr, unsigned long size);

The first argument should be either VERIFY_READ or VERIFY_WRITE, depending on whether the action to be performed is reading the user-space memory area or writing it. The addr argument holds a user-space address, and size is a byte count. If ioctl, for instance, needs to read an integer value from user space, size is sizeof(int). If you need to both read and write at the given address, use VERIFY_WRITE, since it is a superset of VERIFY_READ.

Unlike most kernel functions, access_ok returns a boolean value: 1 for success (access is OK) and 0 for failure (access is not OK). If it returns false, the driver should usually return -EFAULT to the caller.

There are a couple of interesting things to note about access_ok. First, it does not do the complete job of verifying memory access; it only checks to see that the memory reference is in a region of memory that the process might reasonably have access to. In particular, access_ok ensures that the address does not point to kernel-space memory. Second, most driver code need not actually call access_ok. The memory-access routines described later take care of that for you. Nonetheless, we demonstrate its use so that you can see how it is done.

The ppc2dsp source exploits the bitfields in the ioctl number to check the arguments before the switch:

if (_IOC_TYPE(cmd) != PPC2DSP_IOC_MAGIC) {
    printk(KERN_INFO DRV_NAME ": invalid ppc2dsp ioctl magic code.\n");
    return -ENOTTY;
}

/* reject command numbers beyond the last one defined */
if (_IOC_NR(cmd) > PPC2DSP_IOC_MAXNR) {
    printk(KERN_INFO DRV_NAME ": invalid ppc2dsp ioctl command: %x.\n", cmd);
    return -ENOTTY;
}

if (_IOC_DIR(cmd) & _IOC_READ) {
    retval = !access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd));
}

if (_IOC_DIR(cmd) & _IOC_WRITE) {
    retval = !access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd));
}

if (retval) {
    printk(KERN_INFO DRV_NAME ": access control failed for command: %x.\n", cmd);
    return -EFAULT;
}

In this snippet of code from the ppc2dsp driver, the ioctl accesses are checked by using the appropriate bitfields as defined in the header files: first the command is checked to see if it is valid for the current driver (_IOC_TYPE) and if the command number is still valid (not too large: _IOC_NR). Finally, depending on the direction that was defined with the _IOR, _IOW and _IOWR macros, the access to the addresses in user space is verified.

After calling access_ok, the driver can safely perform the actual transfer. In addition to the copy_from_user and copy_to_user functions, the programmer can exploit a set of functions that are optimised for the most used data sizes (one, two, four, and eight bytes). These functions are described in the following list and are defined in <asm/uaccess.h>:

put_user(datum, ptr)

__put_user(datum, ptr) : These macros write the datum to user space; they are relatively fast and should be called instead of copy_to_user whenever single values are being transferred. The macros have been written to allow the passing of any type of pointer to put_user, as long as it is a user-space address. The size of the data transfer depends on the type of the ptr argument and is determined at compile time using the sizeof and typeof compiler builtins. As a result, if ptr is a char pointer, one byte is transferred, and so on for two, four, and possibly eight bytes. put_user checks to ensure that the process is able to write to the given memory address. It returns 0 on success, and -EFAULT on error. __put_user performs less checking (it does not call access_ok), but can still fail if the memory pointed to is not writable by the user. Thus, __put_user should only be used if the memory region has already been verified with access_ok.

As a general rule, you call __put_user to save a few cycles when you are implementing a read method, or when you copy several items and, thus, call access_ok just once before the first data transfer, as shown above for ioctl.

get_user(local, ptr)

__get_user(local, ptr) : These macros are used to retrieve a single datum from user space. They behave like put_user and __put_user, but transfer data in the opposite direction. The value retrieved is stored in the local variable local; the return value indicates whether the operation succeeded. Again, __get_user should only be used if the address has already been verified with access_ok.

If an attempt is made to use one of the listed functions to transfer a value that does not fit one of the specific sizes, the result is usually a strange message from the compiler, such as "conversion to non-scalar type requested." In such cases, copy_to_user or copy_from_user must be used.
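How a macro can pick the transfer size from the pointer type at compile time can be shown with a small user-space analogue. This is not the kernel implementation: put_sized, struct reply and fill_reply are invented for this sketch, a plain assignment replaces the actual user-space access, and __typeof__ is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Store datum through ptr and evaluate to the number of bytes written.
 * Both the conversion and the size follow from the type ptr points to.
 */
#define put_sized(datum, ptr) \
    (*(ptr) = (__typeof__(*(ptr)))(datum), sizeof(*(ptr)))

/* Hypothetical reply structure with fields of different widths. */
struct reply {
    uint8_t  status;
    uint16_t reg;
    uint32_t value;
};

/* Fill the reply field by field; returns the total number of bytes stored. */
static size_t fill_reply(struct reply *r)
{
    size_t n = 0;

    n += put_sized(0x01, &r->status);       /* 1 byte  */
    n += put_sized(0x0102, &r->reg);        /* 2 bytes */
    n += put_sized(0xDEADBEEF, &r->value);  /* 4 bytes */
    return n;
}
```

The same call site therefore moves one, two or four bytes depending only on the declared type of the destination, which is also why such macros cannot be used for structures or arrays.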

The ioctl Implementation A full implementation for the ppc2dsp ioctl commands would lead us too far; the following are a number of commands that read and write simple data elements from and to kernel space.

switch (cmd) {
/* ... */
case PPC2DSP_SET_HDCR:
{
    unsigned int hdcr;

    __get_user(hdcr, (unsigned int *)arg);
    spin_lock_irq(&ppc2dsp_pciaccess);
    outl(hdcr, PCI_REGISTER(p, 1));
    spin_unlock_irq(&ppc2dsp_pciaccess);
    wmb();

    break;
}

case PPC2DSP_GET_HDCR:
{
    unsigned int hdcr;

    hdcr = inl(PCI_REGISTER(p, 1));
    __put_user(hdcr, (unsigned int *)arg);

    break;
}
/* ... */
}

The first command sets a register on the peripheral DSP. Since the access check has already been done by using the _IOC bitmasks, we can now use the faster __get_user to obtain the data in the kernel module. Once the value has been read, it is written to the DSP (DSP access is protected with a spinlock, effectively serialising access to the DSP).

The second command does the reverse: it reads the HDCR value from the DSP and writes it back to userspace. Again, since the access checks have already been done, we can use the faster __put_user.

The full code is accessible in the Subversion repository (http://neo.barco.com/svn/code/trunk/firmware/ppc/kernel/).
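The overall shape of such a method (validate the bitfields, then switch on the command, with -ENOTTY for everything unknown) can be exercised in user space. The model below is a sketch under stated assumptions: the TOY_ commands and toy_ioctl are hypothetical, the argument is an ordinary int pointer rather than a user-space address, and no hardware is touched.

```c
#include <errno.h>
#include <linux/ioctl.h>

/* Hypothetical command set, for illustration only. */
#define TOY_IOC_MAGIC  'd'
#define TOY_SET_VALUE  _IOW(TOY_IOC_MAGIC, 1, int)
#define TOY_GET_VALUE  _IOR(TOY_IOC_MAGIC, 2, int)
#define TOY_IOC_MAXNR  2

static int toy_value;   /* stands in for a device register */

/* User-space model of an ioctl method; arg points to an int. */
static long toy_ioctl(unsigned int cmd, int *arg)
{
    /* Reject commands that belong to another driver or are out of range. */
    if (_IOC_TYPE(cmd) != TOY_IOC_MAGIC)
        return -ENOTTY;
    if (_IOC_NR(cmd) > TOY_IOC_MAXNR)
        return -ENOTTY;

    switch (cmd) {
    case TOY_SET_VALUE:
        toy_value = *arg;
        return 0;
    case TOY_GET_VALUE:
        *arg = toy_value;
        return 0;
    default:
        /* In range but not implemented: same answer as above. */
        return -ENOTTY;
    }
}
```

Returning the same error for a foreign magic number, an out-of-range ordinal and an unimplemented command keeps the behaviour predictable for applications, which only ever see "inappropriate ioctl for device".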

ioctl in userspace As mentioned before, the interfaces for ioctl in user space and kernel space are slightly different. In order to use our defined ioctl commands in user space, we include the kernel header file barco/ppc2dsp.h. The ioctl is called as follows:


/* set the HSR reg to a particular value */
static inline void ppc2dsp_sethsr(const DSP *dsphandle, unsigned long hsr)
{
    ioctl(*dsphandle, PPC2DSP_SET_HSR, &hsr);
}

5.6.5 Interrupt Handling

Although some devices can be controlled using nothing but their I/O regions, most real devices are a bit more complicated than that. Devices have to deal with the external world, which often includes things such as spinning disks, moving tape, wires to distant places, and so on. Much has to be done in a time frame that is different from, and far slower than, that of the processor. Since it is almost always undesirable to have the processor wait on external events, there must be a way for a device to let the processor know when something has happened.

That way, of course, is interrupts. An interrupt is simply a signal that the hardware can send when it wants the processor’s attention. Linux handles interrupts in much the same way that it handles signals in user space. For the most part, a driver need only register a handler for its device’s interrupts, and handle them properly when they arrive. Of course, underneath that simple picture there is some complexity; in particular, interrupt handlers are somewhat limited in the actions they can perform as a result of how they are run.

It is difficult to demonstrate the use of interrupts without a real hardware device to generate them. Thus, the sample code used in this section again uses the ppc2dsp driver code.

Before we get into the topic, however, it is time for one cautionary note. Interrupt handlers, by their nature, run concurrently with other code. Thus, they inevitably raise issues of concurrency and contention for data structures and hardware. A solid understanding of concurrency control techniques is vital when working with interrupts.

5.6.5.1 Installing an Interrupt Handler

If you want to actually "see" interrupts being generated, writing to the hardware device isn’t enough; a software handler must be configured in the system. If the Linux kernel hasn’t been told to expect your interrupt, it simply acknowledges and ignores it.

Interrupt lines are a precious and often limited resource, particularly when there are only 15 or 16 of them. The kernel keeps a registry of interrupt lines, similar to the registry of I/O ports. A module is expected to request an interrupt channel (or IRQ, for interrupt request) before using it and to release it when finished. In many situations, modules are also expected to be able to share interrupt lines with other drivers, as we will see. The following functions, declared in <linux/interrupt.h>, implement the interrupt registration interface:

int request_irq(unsigned int irq,
                irqreturn_t (*handler)(int, void *),
                unsigned long flags,
                const char *dev_name,
                void *dev_id);

void free_irq(unsigned int irq, void *dev_id);

The value returned from request_irq to the requesting function is either 0 to indicate success or a negative error code, as usual. It’s not uncommon for the function to return -EBUSY to signal that another driver is already using the requested interrupt line. The arguments to the functions are as follows:

unsigned int irq : The interrupt number being requested.

irqreturn_t (*handler)(int, void *) : The pointer to the handling function being installed. We discuss the arguments to this function and its return value later in this chapter.

unsigned long flags : As you might expect, a bit mask of options (described later) related to interrupt management.

const char *dev_name : The string passed to request_irq is used in /proc/interrupts to show the owner of the interrupt (see the next section).

void *dev_id : Pointer used for shared interrupt lines. It is a unique identifier that is used when the interrupt line is freed and that may also be used by the driver to point to its own private data area (to identify which device is interrupting). If the interrupt is not shared, dev_id can be set to NULL, but it is a good idea anyway to use this item to point to the device structure. We’ll see a practical use for dev_id in the section "Implementing a Handler."

The bits that can be set in flags are as follows:

IRQF_DISABLED : When set, this indicates a "fast" interrupt handler. Fast handlers are executed with interrupts disabled on the current processor (the topic is covered in the section "Fast and Slow Handlers").

IRQF_SHARED : This bit signals that the interrupt can be shared between devices. The concept of sharing is outlined in the section "Interrupt Sharing."

IRQF_SAMPLE_RANDOM : This bit indicates that the generated interrupts can contribute to the entropy pool used by /dev/random and /dev/urandom. These devices return truly random numbers when read and are designed to help application software choose secure keys for encryption. Such random numbers are extracted from an entropy pool that is fed by various random events. If your device generates interrupts at truly random times, you should set this flag. If, on the other hand, your interrupts are predictable (for example, vertical blanking of a frame grabber), the flag is not worth setting; it wouldn’t contribute to system entropy anyway. Devices that could be influenced by attackers should not set this flag; for example, network drivers can be subjected to predictable packet timing from outside and should not contribute to the entropy pool. See the comments in drivers/char/random.c for more information.

The interrupt handler can be installed either at driver initialisation or when the device is first opened. Although installing the interrupt handler from within the module’s initialisation function might sound like a good idea, it often isn’t, especially if your device does not share interrupts. Because the number of interrupt lines is limited, you don’t want to waste them. You can easily end up with more devices in your computer than there are interrupts. If a module requests an IRQ at initialisation, it prevents any other driver from using the interrupt, even if the device holding it is never used. Requesting the interrupt at device open, on the other hand, allows some sharing of resources.

It is possible, for example, to run a frame grabber on the same interrupt as a modem, as long as you don’t use the two devices at the same time. It is quite common for users to load the module for a special device at system boot, even if the device is rarely used. A data acquisition gadget might use the same interrupt as the second serial port. While it’s not too hard to avoid connecting to your Internet service provider (ISP) during data acquisition, being forced to unload a module in order to use the modem is really unpleasant.

The correct place to call request_irq is when the device is first opened, before the hardware is instructed to generate interrupts. The place to call free_irq is the last time the device is closed, after the hardware is told not to interrupt the processor any more. The disadvantage of this technique is that you need to keep a per-device open count so that you know when interrupts can be disabled.
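The bookkeeping behind such a per-device open count is small. The following user-space sketch models it with callbacks in place of the real calls: struct demo_dev, demo_open, demo_release and the two stub functions are hypothetical, and the locking a real driver would need around the counter is deliberately left out.

```c
/* Hypothetical per-device state; the two callbacks stand in for
 * request_irq/free_irq plus the hardware enable/disable steps. */
struct demo_dev {
    unsigned int open_count;
    int  (*irq_acquire)(struct demo_dev *dev);   /* 0 or a negative error */
    void (*irq_release)(struct demo_dev *dev);
};

/* Acquire the interrupt on the first open only. */
static int demo_open(struct demo_dev *dev)
{
    if (dev->open_count == 0) {
        int err = dev->irq_acquire(dev);
        if (err)
            return err;          /* leave the count untouched on failure */
    }
    dev->open_count++;
    return 0;
}

/* Release the interrupt when the last user closes the device. */
static void demo_release(struct demo_dev *dev)
{
    if (dev->open_count > 0 && --dev->open_count == 0)
        dev->irq_release(dev);
}

/* Stub callbacks that only count calls, so the sketch runs in user space. */
static int acquired, released;

static int stub_acquire(struct demo_dev *dev)
{
    (void)dev;
    acquired++;
    return 0;
}

static void stub_release(struct demo_dev *dev)
{
    (void)dev;
    released++;
}
```

With two opens followed by two closes, the acquire callback runs once on the first open and the release callback once on the second close.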

This discussion notwithstanding, short requests its interrupt line at load time. This was done so that you can run the test programs without having to run an extra process to keep the device open. short, therefore, requests the interrupt from within its initialisation function (short_init) instead of doing it in short_open, as a real device driver would.

The interrupt requested by the following code is ppc2dsp_irq. As mentioned before, we try to obtain the IRQ from the PCI configuration and then request the IRQ from the kernel.

#ifdef ENABLE_INTERRUPT
/* check for the interrupt */
if ((retval = pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &ppc2dsp_irq))) {
    printk(KERN_INFO DRV_NAME ": could not get IRQ from PCI configuration\n");
    goto error_irq;
} else if ((retval = request_irq(ppc2dsp_irq, dsp_interrupt, IRQF_DISABLED, DRV_NAME, NULL))) {
    printk(KERN_INFO DRV_NAME ": could not get IRQ %d\n", ppc2dsp_irq);
    goto error_irq;
} else {
    unsigned long hsr;
    /* enable DSP interrupt here */
    hsr = inl(PCI_REGISTER(p, 0));
    hsr &= ~0x4;
    outl(hsr, PCI_REGISTER(p, 0));
    wmb();
}
#endif

The code shows that the handler being installed (dsp_interrupt) is a fast handler (IRQF_DISABLED), doesn’t support interrupt sharing (IRQF_SHARED is missing), and doesn’t contribute to system entropy (IRQF_SAMPLE_RANDOM is missing, too). The outl call then enables the interrupt for the DSP.

For what it’s worth, the x86 architectures define a function for querying the availability of an interrupt line:

int can_request_irq(unsigned int irq, unsigned long flags);

This function returns a nonzero value if an attempt to allocate the given interrupt succeeds. Note, however, that things can always change between calls to can_request_irq and request_irq.

5.6.5.2 The /proc Interface

Whenever a hardware interrupt reaches the processor, an internal counter is incremented, providing a way to check whether the device is working as expected. Reported interrupts are shown in /proc/interrupts. The following snapshot was taken on a two-processor PowerPC system:

[mleeman@seraph ~]$ cat /proc/interrupts
           CPU0       CPU1
 18:      96191      56369   XICS      Edge      IPI
 32:     781431    2344278   XICS      Edge      ide0, ide1
 35:          0          0   XICS      Edge      ohci_hcd:usb1, ohci_hcd:usb2
 45:    7417626   22251447   XICS      Edge      eth1
BAD:

The first column is the IRQ number. You can see from the IRQs that are missing that the file shows only interrupts corresponding to installed handlers. For example, the first serial port (which uses interrupt number 4) is not shown, indicating that the modem isn’t being used. In fact, even if the modem had been used earlier but wasn’t in use at the time of the snapshot, it would not show up in the file; the serial ports are well behaved and release their interrupt handlers when the device is closed.

The /proc/interrupts display shows how many interrupts have been delivered to each CPU on the system. On many systems the Linux kernel handles most interrupts on the first CPU as a way of maximising cache locality; in the snapshot above, however, both CPUs service a substantial share. The remaining columns give information on the programmable interrupt controller that handles the interrupt (and that a driver writer does not need to worry about), and the name(s) of the device(s) that have registered handlers for the interrupt (as specified in the dev_name argument to request_irq).

The /proc tree contains another interrupt-related file, /proc/stat; sometimes you’ll find one file more useful and sometimes you’ll prefer the other. /proc/stat records several low-level statistics about system activity, including (but not limited to) the number of interrupts received since system boot. Each line of stat begins with a text string that is the key to the line; the intr mark is what we are looking for. The following (truncated) snapshot was taken on an x86 system:

intr 5167833 5154006 2 0 2 4907 0 2 68 4 0 4406 9291 50 0 0

The first number is the total of all interrupts, while each of the others represents a single IRQ line, starting with interrupt 0. All of the counts are summed across all processors in the system. This snapshot shows that interrupt number 4 has been used 4907 times, even though no handler is currently installed. If the driver you’re testing acquires and releases the interrupt at each open and close cycle, you may find /proc/stat more useful than /proc/interrupts.
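The field layout described above (total first, then one counter per IRQ line starting at interrupt 0) can be read back with a few lines of user-space C. This is a minimal sketch; the function name intr_count is hypothetical and the line is assumed to have already been read from /proc/stat.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the counter for `irq` from an "intr ..." line of /proc/stat,
 * or -1 if the line is not an intr line or has no field for that IRQ.
 */
long long intr_count(const char *line, unsigned int irq)
{
    const char *p = line;
    char *end;
    unsigned int i;
    long long v;

    if (strncmp(p, "intr", 4) != 0)
        return -1;
    p += 4;

    /* The first number is the grand total; skip it. */
    (void)strtoll(p, &end, 10);
    if (end == p)
        return -1;
    p = end;

    /* The following numbers are IRQ 0, IRQ 1, ... */
    for (i = 0; ; i++) {
        v = strtoll(p, &end, 10);
        if (end == p)
            return -1;  /* ran out of fields */
        if (i == irq)
            return v;
        p = end;
    }
}
```

With the x86 snapshot above, asking for IRQ 4 yields 4907, the value discussed in the text. A line that only carries the total has no per-IRQ fields, so any lookup on it returns -1.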

Another difference between the two files is that interrupts is not architecture dependent (except, perhaps, for a couple of lines at the end), whereas stat is; the number of fields depends on the hardware underlying the kernel. The number of available interrupts varies from as few as 15 on the SPARC to as many as 256 on the IA-64 and a few other systems. It’s interesting to note that the number of interrupts defined on the x86 is currently 224, not 16 as you may expect; this, as explained in include/asm-i386/irq.h, depends on Linux using the architectural limit instead of an implementation-specific limit (such as the 16 interrupt sources of the old-fashioned PC interrupt controller). The following shows the output for the aforementioned PowerPC system:

intr 32947827

There is only one entry giving the total of the interrupts. The following is a snapshot of /proc/interrupts taken on an IA-64 system. As you can see, besides different hardware routing of common interrupt sources, the output is very similar to that from the PowerPC system shown earlier.

[mleeman@zee ~]$ cat /proc/interrupts
             CPU0        CPU1        CPU2        CPU3
  0:    415621701   415704853   417032882   417064739   IO-APIC-edge    timer
  8:            0           0           0           0   IO-APIC-edge    rtc
 11:            0           0           0           0   IO-APIC-level   acpi
 16:          124         137         115         133   IO-APIC-level   uhci_hcd:usb1
 17:            0           0           0           0   IO-APIC-level   uhci_hcd:usb2
 18:     44677234    44720078    45851087    45888779   IO-APIC-level   ioc0
 19:    621848669   621722523   619263507   619193909   IO-APIC-level   eth0
NMI:       400167      402384      400123      402364
LOC:   1665445215  1665445191  1665445169  1665445020
ERR:            3
MIS:            0


5.6.5.3 Fast and Slow Handlers

Older versions of the Linux kernel took great pains to distinguish between “fast” and “slow” interrupts. Fast interrupts were those that could be handled very quickly, whereas handling slow interrupts took significantly longer. Slow interrupts could be sufficiently demanding of the processor that it was worthwhile to re-enable interrupts while they were being handled. Otherwise, tasks requiring quick attention could be delayed for too long.

In modern kernels, most of the differences between fast and slow interrupts have disappeared. There remains only one: fast interrupts (those that were requested with the IRQF_DISABLED flag) are executed with all other interrupts disabled on the current processor. Note that other processors can still handle interrupts, although you will never see two processors handling the same IRQ at the same time.

So, which type of interrupt should your driver use? On modern systems, IRQF_DISABLED is intended only for use in a few, specific situations such as timer interrupts. Unless you have a strong reason to run your interrupt handler with other interrupts disabled, you should not use IRQF_DISABLED.

5.6.5.4 Implementing a Handler

So far, we’ve learned to register an interrupt handler but not to write one. Actually, there’s nothing unusual about a handler: it’s ordinary C code.

The only peculiarity is that a handler runs at interrupt time and, therefore, suffers some restrictions on what it can do. These restrictions are the same as those we saw with kernel timers. A handler can’t transfer data to or from user space, because it doesn’t execute in the context of a process. Handlers also cannot do anything that would sleep, such as calling wait_event, allocating memory with anything other than GFP_ATOMIC, or locking a semaphore. Finally, handlers cannot call schedule.

The role of an interrupt handler is to give feedback to its device about interrupt reception and to read or write data according to the meaning of the interrupt being serviced. The first step usually consists of clearing a bit on the interface board; most hardware devices won’t generate other interrupts until their “interrupt-pending” bit has been cleared. Depending on how your hardware works, this step may need to be performed last instead of first; there is no catch-all rule here. Some devices don’t require this step, because they don’t have an “interrupt-pending” bit; such devices are a minority, although the parallel port is one of them.


#ifdef ENABLE_INTERRUPT
irqreturn_t dsp_interrupt(int irq, void *dev_id)
{
        unsigned long hsr = 0ul;

        /* ACK interrupt to DSP: set INTSRC bit in HSR on the DSP to 1 */
        hsr |= 0x1;
        spin_lock(&ppc2dsp_pciaccess);
        outl(hsr, PCI_REGISTER(legacyptr, 0));
        spin_unlock(&ppc2dsp_pciaccess);

        /* if the atomic_read buffer is set, we can assume
         * that it was an xfer to the DSP, otherwise it is
         * an interrupt from the DSP to let the PPC know data
         * is present to xfer to userspace */
        if (atomic_read(&buffer_xfer)) {
                atomic_set(&buffer_xfer, 0);
        } else {
                wake_up_interruptible_sync(&ppc2dsp_queue);
        }

        return IRQ_HANDLED;
}
#endif

The first lines signal to the DSP that the interrupt is being handled and that new interrupts can be sent. By checking the variable buffer_xfer, we determine whether the process is waiting for data to come from the DSP (and was put to sleep) or whether a DMA transfer to the DSP is in progress. In the latter case, we clear the transfer variable (signalling that the DSP is ready to receive new data); in the former, we wake up the userspace program and allow it to read, from the driver, the data that the DSP DMA’d to the PowerPC.

5.6.5.5 Handler Arguments and Return Values

Though ppc2dsp ignores them, two arguments are passed to an interrupt handler: irq and dev_id. Let’s look at the role of each.

The interrupt number (int irq) is useful as information you may print in your log messages, if any. The second argument, void *dev_id, is a sort of client data; a void * argument is passed to request_irq, and this same pointer is then passed back as an argument to the handler when the interrupt happens. You usually pass a pointer to your device data structure in dev_id, so a driver that manages several instances of the same device doesn’t need any extra code in the interrupt handler to find out which device is in charge of the current interrupt event.

Typical use of the argument in an interrupt handler is as follows:

static irqreturn_t sample_interrupt(int irq, void *dev_id)
{
        struct sample_dev *dev = dev_id;

        /* now 'dev' points to the right hardware item */
        /* .... */
}

The typical open code associated with this handler looks like this:

static int sample_open(struct inode *inode, struct file *filp)
{
        struct sample_dev *dev = hwinfo + MINOR(inode->i_rdev);

        request_irq(dev->irq, sample_interrupt,
                    0 /* flags */, "sample", dev /* dev_id */);
        /*....*/
        return 0;
}

Interrupt handlers should return a value indicating whether there was actually an interrupt to handle. If the handler found that its device did, indeed, need attention, it should return IRQ_HANDLED; otherwise the return value should be IRQ_NONE. You can also generate the return value with this macro:

IRQ_RETVAL(handled)

where handled is nonzero if you were able to handle the interrupt. The return value is used by the kernel to detect and suppress spurious interrupts. If your device gives you no way to tell whether it really interrupted, you should return IRQ_HANDLED.
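The return-value convention can be tried out in user space with stand-in definitions. The sketch below is not kernel code: the enum, the macro body, the status bit DEMO_STATUS_PENDING and the function demo_check_status are simplified, hypothetical stand-ins that only mirror the behaviour described above (nonzero handled gives IRQ_HANDLED, zero gives IRQ_NONE).

```c
/* Simplified stand-ins for the kernel definitions (assumption). */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
#define IRQ_RETVAL(x) ((x) ? IRQ_HANDLED : IRQ_NONE)

/* Hypothetical "interrupt pending" bit in a device status register. */
#define DEMO_STATUS_PENDING 0x80u

/*
 * Model of the decision a handler takes: claim the interrupt only if
 * the device's own status register says that it raised one.
 */
irqreturn_t demo_check_status(unsigned int status)
{
    int handled = 0;

    if (status & DEMO_STATUS_PENDING) {
        /* ... acknowledge and service the device here ... */
        handled = 1;
    }
    return IRQ_RETVAL(handled);
}
```

A real handler would read the status from the hardware instead of taking it as a parameter; passing it in keeps the sketch testable without a device.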

5.6.5.6 Top and Bottom Halves

One of the main problems with interrupt handling is how to perform lengthy tasks within a handler. Often a substantial amount of work must be done in response to a device interrupt, but interrupt handlers need to finish up quickly and not keep interrupts blocked for long. These two needs (work and speed) conflict with each other, leaving the driver writer in a bit of a bind.

Linux (along with many other systems) resolves this problem by splitting the interrupt handler into two halves. The so-called top half is the routine that actually responds to the interrupt, the one you register with request_irq. The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time. The big difference between the top-half handler and the bottom half is that all interrupts are enabled during execution of the bottom half; that’s why it runs at a safer time. In the typical scenario, the top half saves device data to a device-specific buffer, schedules its bottom half, and exits: this operation is very fast. The bottom half then performs whatever other work is required, such as awakening processes, starting up another I/O operation, and so on. This setup permits the top half to service a new interrupt while the bottom half is still working.

Almost every serious interrupt handler is split this way. For instance, when a network interface reports the arrival of a new packet, the handler just retrieves the data and pushes it up to the protocol layer; actual processing of the packet is performed in a bottom half.

The Linux kernel has two different mechanisms that may be used to implement bottom-half processing. Tasklets are often the preferred mechanism for bottom-half processing; they are very fast, but all tasklet code must be atomic. The alternative to tasklets is workqueues, which may have a higher latency but are allowed to sleep.

The following discussion works with an example driver, short. When loaded with a module option, short can be told to do interrupt processing in a top/bottom-half mode with either a tasklet or workqueue handler. In this case, the top half executes quickly; it simply remembers the current time and schedules the bottom half processing. The bottom half is then charged with encoding this time and awakening any user processes that may be waiting for data.

Tasklets. Remember that tasklets are a special function that may be scheduled to run, in software interrupt context, at a system-determined safe time. They may be scheduled to run multiple times, but tasklet scheduling is not cumulative; the tasklet runs only once, even if it is requested repeatedly before it is launched. No tasklet ever runs in parallel with itself, since they run only once, but tasklets can run in parallel with other tasklets on SMP systems. Thus, if your driver has multiple tasklets, they must employ some sort of locking to avoid conflicting with each other.

Tasklets are also guaranteed to run on the same CPU as the function that first schedules them. Therefore, an interrupt handler can be secure that a tasklet does not begin executing before the handler has completed. However, another interrupt can certainly be delivered while the tasklet is running, so locking between the tasklet and the interrupt handler may still be required.

Tasklets must be declared with the DECLARE_TASKLET macro:

DECLARE_TASKLET(name, function, data);

name is the name to be given to the tasklet, function is the function that is called to execute the tasklet (it takes one unsigned long argument and returns void), and data is an unsigned long value to be passed to the tasklet function.

The short driver declares its tasklet as follows:

void short_do_tasklet(unsigned long);
DECLARE_TASKLET(short_tasklet, short_do_tasklet, 0);

The function tasklet_schedule is used to schedule a tasklet for running. If short is loaded with tasklet=1, it installs a different interrupt handler that saves data and schedules the tasklet as follows:

irqreturn_t short_tl_interrupt(int irq, void *dev_id)
{
        /* cast to stop 'volatile' warning */
        do_gettimeofday((struct timeval *) tv_head);
        short_incr_tv(&tv_head);
        tasklet_schedule(&short_tasklet);
        /* record that an interrupt arrived */
        short_wq_count++;
        return IRQ_HANDLED;
}

The actual tasklet routine, short_do_tasklet, will be executed shortly (so to speak) at the system’s convenience. As mentioned earlier, this routine performs the bulk of the work of handling the interrupt; it looks like this:


void short_do_tasklet(unsigned long unused)
{
        int savecount = short_wq_count, written;

        /* we have already been removed from the queue */
        short_wq_count = 0;

        /*
         * The bottom half reads the tv array, filled by the top half,
         * and prints it to the circular text buffer, which is then consumed
         * by reading processes
         */

        /* First write the number of interrupts that occurred before this bh */
        written = sprintf((char *)short_head, "bh after %6i\n", savecount);
        short_incr_bp(&short_head, written);

        /*
         * Then, write the time values. Write exactly 16 bytes at a time,
         * so it aligns with PAGE_SIZE
         */
        do {
                written = sprintf((char *)short_head, "%08u.%06u\n",
                                  (int)(tv_tail->tv_sec % 100000000),
                                  (int)(tv_tail->tv_usec));
                short_incr_bp(&short_head, written);
                short_incr_tv(&tv_tail);
        } while (tv_tail != tv_head);

        /* awake any reading process */
        wake_up_interruptible(&short_queue);
}

Among other things, this tasklet makes a note of how many interrupts have arrived since it was last called. A device such as short can generate a great many interrupts in a brief period, so it is not uncommon for several to arrive before the bottom half is executed. Drivers must always be prepared for this possibility and must be able to determine how much work there is to perform from the information left by the top half.
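The bookkeeping described here (several interrupts, one bottom-half run, a count left behind by the top half) can be modelled without any kernel API. The following user-space sketch is purely illustrative: struct demo_bh and the demo_ functions are hypothetical, and a plain flag stands in for the kernel's tasklet scheduling state.

```c
/* Hypothetical model of the state shared by a top and a bottom half. */
struct demo_bh {
    int scheduled;          /* a bottom-half run has been requested */
    unsigned int irq_count; /* interrupts since the last bottom-half run */
    unsigned int bh_runs;   /* number of times the bottom half really ran */
};

/* Top half: note that an interrupt arrived and request the bottom half. */
void demo_top_half(struct demo_bh *s)
{
    s->irq_count++;
    s->scheduled = 1;   /* requesting it again does not queue a second run */
}

/*
 * Called at a "safe time". Runs the bottom half at most once and returns
 * how many interrupts it had to account for (0 if nothing was pending).
 */
unsigned int demo_run_pending(struct demo_bh *s)
{
    unsigned int savecount;

    if (!s->scheduled)
        return 0;
    s->scheduled = 0;

    /* Like short_do_tasklet: take over the count, then reset it. */
    savecount = s->irq_count;
    s->irq_count = 0;
    s->bh_runs++;
    return savecount;
}
```

Three calls to demo_top_half followed by one demo_run_pending give a single bottom-half run that reports three interrupts, which is the situation short_do_tasklet handles with savecount. The model ignores concurrency; a real driver needs the locking discussed above.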

Workqueues. Recall that workqueues invoke a function at some future time in the context of a special worker process. Since the workqueue function runs in process context, it can sleep if need be. You cannot, however, copy data into user space from a workqueue (under normal circumstances); the worker process does not have access to any other process’s address space.

The short driver, if loaded with the wq option set to a nonzero value, uses a workqueue for its bottom-half processing. It uses the system default workqueue, so there is no special setup code required; if your driver has special latency requirements (or might sleep for a long time in the workqueue function), you may want to create your own, dedicated workqueue. We do need a work_struct structure, which is declared and initialised with the following:


static struct work_struct short_wq;

/* this line is in short_init( ) */
INIT_WORK(&short_wq, (void (*)(void *)) short_do_tasklet, NULL);

Our worker function is short_do_tasklet, which we have already seen in the previous section.

When working with a workqueue, short establishes yet another interrupt handler that looks like this:

irqreturn_t short_wq_interrupt(int irq, void *dev_id)
{
        /* Grab the current time information. */
        do_gettimeofday((struct timeval *) tv_head);
        short_incr_tv(&tv_head);

        /* Queue the bh. Don't worry about multiple enqueueing */
        schedule_work(&short_wq);

        short_wq_count++; /* record that an interrupt arrived */
        return IRQ_HANDLED;
}

As you can see, the interrupt handler looks very much like the tasklet version, with the exception that it calls schedule_work to arrange the bottom-half processing.

5.6.5.7 Interrupt Sharing

The notion of an IRQ conflict is almost synonymous with the PC architecture. In the past, IRQ lines on the PC have not been able to serve more than one device, and there have never been enough of them. As a result, frustrated users have often spent much time with their computer case open, trying to find a way to make all of their peripherals play well together.

Modern hardware, of course, has been designed to allow the sharing of interrupts; the PCI bus requires it. Therefore, the Linux kernel supports interrupt sharing on all buses, even those (such as the ISA bus) where sharing has traditionally not been supported. Device drivers for the 3.x kernel should be written to work with shared interrupts if the target hardware can support that mode of operation. Fortunately, working with shared interrupts is easy, most of the time.

Installing a Shared Handler. Shared interrupts are installed through request_irq just like nonshared ones, but there are two differences:

• The IRQF_SHARED bit must be specified in the flags argument when requesting the interrupt.

• The dev_id argument must be unique. Any pointer into the module’s address space will do, but dev_id definitely cannot be set to NULL.

The kernel keeps a list of shared handlers associated with the interrupt, and dev_id can be thought of as the signature that differentiates between them. If two drivers were to register NULL as their signature on the same interrupt, things might get mixed up at unload time, causing the kernel to oops when an interrupt arrived. For this reason, modern kernels complain loudly if passed a NULL dev_id when registering shared interrupts.

When a shared interrupt is requested, request_irq succeeds if one of the following is true:

• The interrupt line is free.

• All handlers already registered for that line have also specified that the IRQ is to be shared.

Whenever two or more drivers are sharing an interrupt line and the hardware interrupts the processor on that line, the kernel invokes every handler registered for that interrupt, passing each its own dev_id. Therefore, a shared handler must be able to recognise its own interrupts and should quickly exit when its own device has not interrupted. Be sure to return IRQ_NONE whenever your handler is called and finds that the device is not interrupting.

If you need to probe for your device before requesting the IRQ line, the kernel can’t help you. No probing function is available for shared handlers. The standard probing mechanism works if the line being used is free, but if the line is already held by another driver with sharing capabilities, the probe fails, even if your driver would have worked perfectly. Fortunately, most hardware designed for interrupt sharing is also able to tell the processor which interrupt it is using, thus eliminating the need for explicit probing.

Releasing the handler is performed in the normal way, using free_irq. Here the dev_id argument is used to select the correct handler to release from the list of shared handlers for the interrupt. That’s why the dev_id pointer must be unique.
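The role of dev_id as a signature can be illustrated with a small user-space model of one shared line. Nothing below is kernel code: struct demo_irq_line, the demo_ functions and the fixed-size table are hypothetical simplifications that only mimic the rules stated above (unique non-NULL dev_id on registration, every handler called on an interrupt, dev_id selecting the handler to release).

```c
#include <stddef.h>

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t; /* stand-in */
typedef irqreturn_t (*demo_handler_t)(int irq, void *dev_id);

#define DEMO_MAX_SHARED 4

struct demo_irq_line {
    demo_handler_t handler[DEMO_MAX_SHARED];
    void *dev_id[DEMO_MAX_SHARED];
    int count;
};

/* Register a handler; dev_id must be non-NULL and unique on the line. */
int demo_request_irq(struct demo_irq_line *l, demo_handler_t h, void *dev_id)
{
    int i;

    if (dev_id == NULL || l->count == DEMO_MAX_SHARED)
        return -1;
    for (i = 0; i < l->count; i++)
        if (l->dev_id[i] == dev_id)
            return -1;
    l->handler[l->count] = h;
    l->dev_id[l->count] = dev_id;
    l->count++;
    return 0;
}

/* Release the handler that was registered with this dev_id. */
int demo_free_irq(struct demo_irq_line *l, void *dev_id)
{
    int i, j;

    for (i = 0; i < l->count; i++) {
        if (l->dev_id[i] != dev_id)
            continue;
        for (j = i + 1; j < l->count; j++) {
            l->handler[j - 1] = l->handler[j];
            l->dev_id[j - 1] = l->dev_id[j];
        }
        l->count--;
        return 0;
    }
    return -1;
}

/* Call every handler with its own dev_id; return how many claimed it. */
int demo_dispatch(struct demo_irq_line *l, int irq)
{
    int i, handled = 0;

    for (i = 0; i < l->count; i++)
        if (l->handler[i](irq, l->dev_id[i]) == IRQ_HANDLED)
            handled++;
    return handled;
}

/* Hypothetical device: the handler claims the IRQ only if it is pending. */
struct demo_dev { int pending; int serviced; };

irqreturn_t demo_interrupt(int irq, void *dev_id)
{
    struct demo_dev *dev = dev_id;

    (void)irq;
    if (!dev->pending)
        return IRQ_NONE;
    dev->pending = 0;
    dev->serviced++;
    return IRQ_HANDLED;
}
```

Registering two demo_dev instances with the same demo_interrupt function and dispatching once shows that both handlers run, but only the device with pending set reports IRQ_HANDLED.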

A driver using a shared handler needs to be careful about one more thing: it can’t play with enable_irq or disable_irq. If it does, things might go haywire for other devices sharing the line; disabling another device’s interrupts for even a short time may create latencies that are problematic for that device and its user. Generally, the programmer must remember that his driver doesn’t own the IRQ, and its behaviour should be more “social” than is necessary if one owns the interrupt line.

Running the Handler. As suggested earlier, when the kernel receives an interrupt, all the registered handlers are invoked. A shared handler must be able to distinguish between interrupts that it needs to handle and interrupts generated by other devices.

Loading short with the option shared=1 installs the following handler instead of the default:

irqreturn_t short_sh_interrupt(int irq, void *dev_id)
{
        int value, written;
        struct timeval tv;

        /* If it wasn't short, return immediately */
        value = inb(short_base);
        if (!(value & 0x80))
                return IRQ_NONE;

        /* clear the interrupting bit */
        outb(value & 0x7F, short_base);

        /* the rest is unchanged */
        do_gettimeofday(&tv);
        written = sprintf((char *)short_head, "%08u.%06u\n",
                          (int)(tv.tv_sec % 100000000), (int)(tv.tv_usec));
        short_incr_bp(&short_head, written);
        wake_up_interruptible(&short_queue); /* awake any reading process */
        return IRQ_HANDLED;
}

An explanation is due here. Since the parallel port has no “interrupt-pending” bit to check, the handler uses the ACK bit for this purpose. If the bit is high, the interrupt being reported is for short, and the handler clears the bit.

The handler resets the bit by zeroing the high bit of the parallel interface’s data port; short assumes that pins 9 and 10 are connected together. If one of the other devices sharing the IRQ with short generates an interrupt, short sees that its own line is still inactive and does nothing.

A full-featured driver probably splits the work into top and bottom halves, of course, but that’s easy to add and does not have any impact on the code that implements sharing. A real driver would also likely use the dev_id argument to determine which, of possibly many, devices might be interrupting.

Note that if you are using a printer (instead of the jumper wire) to test interrupt management with short, this shared handler won’t work as advertised, because the printer protocol doesn’t allow for sharing, and the driver can’t know whether the interrupt was from the printer.

The /proc Interface and Shared Interrupts. Installing shared handlers in the system doesn’t affect /proc/stat, which doesn’t even know about handlers. However, /proc/interrupts changes slightly.

            CPU0
  0:   892335412   XT-PIC   timer
  1:      453971   XT-PIC   i8042
  2:           0   XT-PIC   cascade
  5:           0   XT-PIC   libata, ehci_hcd
  8:           0   XT-PIC   rtc
  9:           0   XT-PIC   acpi
 10:    11365067   XT-PIC   ide2, uhci_hcd, uhci_hcd, SysKonnect SK-98xx, EMU10K1
 11:     4391962   XT-PIC   uhci_hcd, uhci_hcd
 12:         224   XT-PIC   i8042
 14:     2787721   XT-PIC   ide0
 15:      203048   XT-PIC   ide1
NMI:       41234
LOC:   892193503
ERR:         102
MIS:           0

This system has several shared interrupt lines. IRQ 5 is shared by the serial ATA (libata) and USB 2.0 (ehci_hcd) controllers; IRQ 10 has several devices, including an IDE controller, two USB controllers, an Ethernet interface, and a sound card; and IRQ 11 also is used by two USB controllers.

5.6.5.8 Adding Your Driver in KConfig

The new driver is added in a dedicated barco driver directory. This has the advantage that the changes are largely contained to that location. In order to select the compilation of the driver, a drivers/barco/Kconfig file is added:

#
# Barco SMD embedded devices configuration
#

menu "Barco Control Rooms Embedded"
        depends on BARCO8245G1

config PCI_BARCO_PPC2FPGA
        tristate "Communication with Altera FPGA (Streaming)"
        depends on PCI
        default m
        help
          Create driver to allow PCI communication between the PowerPC
          and the FPGA. This is only for the Streaming platforms.

endmenu


The Makefile in the drivers/barco/ directory is straightforward:

#
# Makefile for the Barco SMD Streaming Platforms (G1) drivers
#

obj-$(CONFIG_PCI_BARCO_PPC2DSP) += ppc2dsp.o
obj-$(CONFIG_PCI_BARCO_PPC2FPGA) += ppc2fpga.o

Since the drivers are located in the new drivers/barco directory, the directory needs to be added in the parent drivers/Makefile:

obj-$(CONFIG_BARCO8245G1) += barco/

Because the modules depend on the selection of the BARCO8245G1 platform, they are not compiled for other platforms.

Finally, we include the Kconfig file in the larger scheme in drivers/Kconfig:

source "drivers/barco/Kconfig"

5.6.6 Configure the Flash Map

5.6.6.1 Introduction

Once the bootloader is able to detect and access the flash, the kernel boots via tftpboot, and the kernel mounts the root filesystem over NFS, userspace code can be written, implemented and tested. However, at some point, the system will need to save settings to flash, and finally the entire system will need to run from flash (in most cases, certainly when the systems are stand-alone).

For this reason, logical partitions are needed on the currently homogeneous flash chip.

Another problem can be that the system needs a large flash footprint that is currently not available in one chip, or that is too expensive to be used on the devices. In such a case, two (or even more) flash chips can be mapped in memory and should be presented to user level as one homogeneous area.

In this case, not only are logical partitions needed, but the physical boundaries must also be masked. In both cases, a flash map driver tackles this problem.

Notice that for most setups (single flash chip, linear mapping), you don’t even need to write a new flash driver as the kernel already contains a generic driver: physmap.c. This driver just needs to be initialised with the platform specific settings (base address, flash size, partition layout; see Section 9.3 and Section 4.7.1.4).

For PowerPC architectures, there is even a flexible way to define the flash map by defining it in the device tree.

All these remarks aside, the following section explains how to write a driver from scratch to explain how it all works in detail.

5.6.6.2 Flash Map Driver

Before the code of a flash map can be written, a clear definition of what is needed must be provided. During the design of the firmware, redundancy and reliability are very important, both from a hardware point of view and from a firmware point of view. With this in mind, the following was used for the prototype of the SVC mk II. Defining the partition table is very important since it will fix the behaviour of the system; once boards/systems are shipped and installed, it cannot easily be changed for that product line without breaking backward compatibility.

The definition of the sizes cannot be completely arbitrary[14]: since writing to flash is done in blocks of a sector[15], the sizes we use for partitions should be aligned on sectors.

[14] In theory, it can; in practice, we should be more cautious.

[15] As is well known, empty flash contains 0xFF: all bits are set to 1. Toggling a bit from 1 to 0 can be done by accessing only one bit, but the other way around requires a sector to be erased and then written again.
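Both constraints (sector alignment and the one-way nature of programming a bit) are easy to express as small checks. The sketch below is a user-space illustration; the function names are hypothetical and the sector size of 0x20000 bytes is an assumption chosen for the example, not a value stated for the actual flash device.

```c
#include <stdint.h>

/* Assumed erase-sector size for the example (128 KiB). */
#define DEMO_SECTOR_SIZE 0x20000u

/* A partition size or offset is usable only if it is a sector multiple. */
int demo_sector_aligned(uint32_t value)
{
    return (value % DEMO_SECTOR_SIZE) == 0;
}

/*
 * Programming flash can only clear bits (1 -> 0). Writing new_val over
 * old_val therefore needs a sector erase first as soon as new_val has a
 * bit set that is already 0 in old_val.
 */
int demo_needs_erase(uint8_t old_val, uint8_t new_val)
{
    return (new_val & (uint8_t)~old_val) != 0;
}
```

For example, going from the erased value 0xFF to any other byte never needs an erase, whereas turning 0x00 back into 0x01 does.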


Das U-Boot: When starting to define the flash layout, the bootloader is the first one to determine since it is basically the only one that has a location that is not completely free to choose. In the case of the device that we are using (a Freescale MPC8347E), the processor starts booting from address 0x00000100 from CS0 and reads the RCW (Reset Configuration Words) from address 0x00000000 from CS0. When looking at the binary u-boot image[16], we see that the RCW that was configured in the include/configs/BARCO834XG1.h file is specified at the beginning of the file. The remaining bytes up until address 0x100 are padded, and U-Boot binary code starts at that address. Indeed, when we look at the binary code from other platforms (e.g. the 8245 from the previous SVC, included in the default U-Boot release), address 0x0 only contained a version string. From this, we can conclude that saving U-Boot to address 0x00000000 on the flash should suffice to boot the processor[17].

Kernel and Filesystem: As specified before, a kernel and filesystem need to be stored. For that, two partitions are reserved (one for the kernel and one for the filesystem). From our initial builds, we know that the kernel is between 700 and 800 kB. We add a margin of 50% to include possible future extensions[18]. Determining the filesystem size is more difficult since a lot of development is still going on. However, we know that for the previous system, the filesystem was around 2.5 MB (the bulk of which are binary DSP codecs). To be on the safe side, we double it to a size of around 5 MB.

Configuration: In order to manipulate the behaviour of the bootloader in a flexible way, a partition is required for saving settings to flash. This will be a partition without a filesystem, writing data to raw flash. This partition will also be used for saving some application specific settings: U-Boot provides the fw_printenv and fw_setenv tools for adding and editing key/value pairs in userspace[19].

Configuration Filesystem: Even though it is not mandatory to have a read/write filesystem where files can be written to persistently, it is useful for a number of cases, such as saving critical logging (or saving at certain checkpoints) or saving ssh keys (otherwise, they need to be re-generated every time the system is booted).

Furthermore, we opted for a fully redundant system: at production, a kernel/filesystem is uploaded to the boards before shipping. At installation, these must be updated in a secure fashion: when the system gets corrupted somewhere during the upgrade or during operation, it should revert to working (factory) settings. For this reason, we add an additional kernel and filesystem partition.

The U-Boot bootloader provides a mechanism to save the settings in a redundant way: two locations are kept and after one location is written with the new settings, the second one is made inactive by toggling a bit from 1 to 0, thus avoiding the need to erase/write the entire sector. Furthermore, the entire section is protected with a CRC32. Since writing to flash is relatively slow (and much can go wrong), we opt for this redundant strategy for saving the settings in raw flash.
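The redundancy scheme can be modelled in a few lines of user-space C. The sketch below is not U-Boot code: struct demo_env, the encoding of the active flag (1 = active, 0 = obsolete) and the selection policy are assumptions made for illustration; only the idea of two copies protected by a CRC32 is taken from the text. The CRC is assumed here to be the common reflected CRC-32 with polynomial 0xEDB88320.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320). */
uint32_t demo_crc32(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical in-memory view of one copy of the settings. */
struct demo_env {
    uint32_t crc;        /* CRC-32 stored alongside the data */
    uint8_t active;      /* assumed encoding: 1 = active, 0 = obsolete */
    const uint8_t *data; /* key/value pairs */
    size_t len;
};

/* A copy is trusted only if the stored CRC matches its data. */
int demo_env_valid(const struct demo_env *e)
{
    return demo_crc32(e->data, e->len) == e->crc;
}

/*
 * One plausible selection policy: prefer a valid copy that is marked
 * active, fall back to any valid copy, and give up if both are corrupt.
 */
const struct demo_env *demo_env_select(const struct demo_env *a,
                                       const struct demo_env *b)
{
    int va = demo_env_valid(a), vb = demo_env_valid(b);

    if (va && a->active)
        return a;
    if (vb && b->active)
        return b;
    if (va)
        return a;
    if (vb)
        return b;
    return NULL;
}
```

The point of the fallback branch is the failure case described above: if power is lost while the new copy is being written, its CRC does not match and the older copy is still usable.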

The Configuration filesystem (JFFS2) is not made redundant, since it only serves to keep some persistent logging and save a couple of keys. When this data is lost, it is not critical to the operation of the system.

Ultimately, the partitioning in Figure 5.7 is drawn up.

The code for flash map drivers can be found in drivers/mtd/maps.

First, we define some of the physical information about our flash mapping: the width, size and start address (where flash is mapped in memory). Instead of hard coding this, we will modify KConfig to allow entering this information during configuration (cf. infra).

[16] Edit the file with vim -b u-boot.bin and use xxd to see the file in split hex/binary mode.

[17] Provided, of course, that the settings we entered in the RCW are correct.

[18] As mentioned before, some kernel functionality can be shifted from the kernel to userspace (or the other way around) by compiling drivers as modules.

[19] The fw_setenv utility only edits one key/value pair at a time. It is a simple exercise to extend and modify fw_setenv to edit any number of key/value pairs and write them away to flash in one pass, see http://neo.barco.com/svn/code/trunk/firmware/ppc/env-1.1.4/.



struct map_info barco834xg1_map = {
        .name = "Barco SMD Streaming Platforms (BARCO834XG1) Flash Layout",
        .phys = CONFIG_MTD_BARCO_START,
        .size = CONFIG_MTD_BARCO_LEN,
        .bankwidth = CONFIG_MTD_BARCO_BUSWIDTH,
};

The partition table as described in Figure 5.7 is entered as an array of struct mtd_partition. The offset is only entered for the first partition (offset 0); the others are just appended to the previous partition with MTDPART_OFS_APPEND. For each partition, the appropriate name and size are determined. The partition masks are not modified, except for the factory partitions.

Since the factory settings/firmware are what the system should always be able to fall back on when other partitions and/or functionality are corrupted, it should not be possible to overwrite these partitions. To this end, we mask off the writable bit. When the system is running, it will not be possible to write to those partitions.


static struct mtd_partition barco834xg1_partitions[] = {
        {
                .name = "Das U-Boot",
                .size = 0x40000,
                .offset = 0,
#ifndef CONFIG_MTD_BARCO_UNPROTECT
                .mask_flags = MTD_WRITEABLE, /* Force Read-Only */
#endif
        },
        {
                .name = "Configuration Space 0",
                .size = 0x20000,
                .offset = MTDPART_OFS_APPEND,
        },
        {
                .name = "Configuration Space 1",
                .size = 0x20000,
                .offset = MTDPART_OFS_APPEND,
        },
        {
                .name = "Factory Kernel",
                .size = 0x140000,
                .offset = MTDPART_OFS_APPEND,
#ifndef CONFIG_MTD_BARCO_UNPROTECT
                .mask_flags = MTD_WRITEABLE, /* Force Read-Only */
#endif
        },
        {
                .name = "Factory Filesystem",
                .size = 0x540000,
                .offset = MTDPART_OFS_APPEND,
#ifndef CONFIG_MTD_BARCO_UNPROTECT
                .mask_flags = MTD_WRITEABLE, /* Force Read-Only */
#endif
        },
        {
                .name = "Upgrade Kernel",
                .size = 0x140000,
                .offset = MTDPART_OFS_APPEND,
        },
        {
                .name = "Upgrade Filesystem",
                .size = 0x540000,
                .offset = MTDPART_OFS_APPEND,
        },
        {
                .name = "JFFS2 Flash Filesystem",
                .size = 0x280000,
                .offset = MTDPART_OFS_APPEND,
        }
};
static int num_barco834xg1_partitions = 8;


This can be tested be setting all bits to 0:

# cat /dev/zero > /dev/mtd4

-sh: cannot create /dev/mtd4: Permission denied

# cat /dev/zero > /dev/mtd6

cat: Write Error: No space left on device

The first command simply fails (factory root filesystem), while the second command just fills up the entire partition with 0s.

This setting can again be configured with a Kconfig option: MTD_BARCO_UNPROTECT. This was added for the rare occasion where the factory firmware in the field was so unstable that it had to be replaced under all circumstances. Since we've protected this as much as possible, a special firmware that has these protections disabled must be uploaded before we can do this dangerous upgrade.

The rest of the code is just a minor adaptation of the generic driver code in drivers/mtd/maps/physmap.c, where we use our flash mapping instead of the example mymtd.
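To make the effect of MTDPART_OFS_APPEND concrete, the following userspace sketch recomputes the start offset of each partition from the sizes in the table above. It is only an illustration: part_offset() and barco_part_sizes are hypothetical names that do not exist in the driver, and the kernel does this bookkeeping itself when the partitions are registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes copied from the partition table, in table order. */
static const uint32_t barco_part_sizes[] = {
    0x040000, /* Das U-Boot */
    0x020000, /* Configuration Space 0 */
    0x020000, /* Configuration Space 1 */
    0x140000, /* Factory Kernel */
    0x540000, /* Factory Filesystem */
    0x140000, /* Upgrade Kernel */
    0x540000, /* Upgrade Filesystem */
    0x280000, /* JFFS2 Flash Filesystem */
};

#define BARCO_NUM_PARTS \
    (sizeof(barco_part_sizes) / sizeof(barco_part_sizes[0]))

/*
 * Start offset of partition 'index' when every partition is appended to
 * the previous one: the sum of the sizes of all earlier partitions.
 * Passing BARCO_NUM_PARTS returns the total size of the layout.
 */
uint32_t part_offset(size_t index)
{
    uint32_t offset = 0;
    size_t i;

    assert(index <= BARCO_NUM_PARTS);
    for (i = 0; i < index; i++)
        offset += barco_part_sizes[i];
    return offset;
}
```

With these sizes, part_offset(4) places the Factory Filesystem (/dev/mtd4) at 0x1c0000, and the eight partitions together cover exactly 0x1000000 bytes (16 MB).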

5.6.6.3 Combining Multiple Flash Chips (hardcoded)

Writing a flash mapping for one flash chip is simple enough. However, we can also hide the fact that a flash region is in fact spread over two different chips. As such, two cheaper devices can be (re)used instead of a larger one. In this case, we need to redefine a number of flash operations in the struct map_info.

As an example, we use the implementation for the first SVC card. This card had a flash device of 16 MB, where the second half is addressed with a different chip select, creating two different devices in one package. Note that a more generic approach is described in Section 9.3 and Section 4.7.1.4: defining the mtdparts variable in the bootloader and using the physmap driver at kernel level.

struct map_info barco8245g1_map = {
    .name = "Barco SMD Streaming Platforms (G1) Flash Layout",
    .phys = CONFIG_MTD_BARCO_START,
    .size = CONFIG_MTD_BARCO_LEN,
    .bankwidth = CONFIG_MTD_BARCO_BUSWIDTH,
    .read = barco8245g1_read8,
    .copy_from = barco8245g1_copy_from,
    .write = barco8245g1_write8,
    .copy_to = barco8245g1_copy_to,
};

The read/write functions access a single byte within the flash, while the copy_from/copy_to functions handle block reads and writes.

During the initialisation of the driver, the two flash areas are saved in the internal structure:


if (!(barco8245g1_map.map_priv_1 = (unsigned long)ioremap(
        CONFIG_MTD_BARCO_START, (CONFIG_MTD_BARCO_LEN >> 1)))) {
    returnvalue = -EIO;
    goto error_ioremap1;
}

if (!(barco8245g1_map.map_priv_2 = (unsigned long)ioremap(
        CONFIG_MTD_BARCO_START + (CONFIG_MTD_BARCO_LEN >> 1),
        (CONFIG_MTD_BARCO_LEN >> 1)))) {
    returnvalue = -EIO;
    goto error_ioremap2;
}

The code for the single beat accesses is pretty straightforward (the read operation is similar).

static void barco8245g1_write8(struct map_info *map, map_word d, unsigned long adr)
{
    if (adr < (CONFIG_MTD_BARCO_LEN >> 1)) {
        writeb(d.x[0], (u8 *)(map->map_priv_1 + adr));
    } else if (adr < CONFIG_MTD_BARCO_LEN) {
        writeb(d.x[0], (u8 *)(map->map_priv_2 + (adr - (CONFIG_MTD_BARCO_LEN >> 1))));
    } else {
        printk(KERN_WARNING "Invalid address (%#10lx) at %s: %s (%d)\n",
               adr, __FILE__, __FUNCTION__, __LINE__);
    }
}

Based on the address, the function selects the correct device (mapped on map_priv_1 and map_priv_2) and writes the data to the correct location. If the address offset is completely bogus, i.e. past the end of the device, we print a warning in the kernel log and ignore the attempt.

For reading a block of data, the same technique is used:


void barco8245g1_copy_from(struct map_info *map, void *to, unsigned long from, ssize_t len)
{
    if (len > CONFIG_MTD_BARCO_LEN) {
        printk(KERN_WARNING "Trying to read a length which cannot be contained in the chip\n");
        printk(KERN_WARNING " %#10x at %#10lx\n", len, from);
    } else if (from < (CONFIG_MTD_BARCO_LEN >> 1)) {
        if ((from + len) > (CONFIG_MTD_BARCO_LEN >> 1)) {
            /* The block straddles both chips: read it in two parts */
            memcpy_fromio(to, (void *)(map->map_priv_1 + from),
                          (CONFIG_MTD_BARCO_LEN >> 1) - from);
            memcpy_fromio(to + ((CONFIG_MTD_BARCO_LEN >> 1) - from),
                          (void *)(map->map_priv_2),
                          len - ((CONFIG_MTD_BARCO_LEN >> 1) - from));
        } else {
            memcpy_fromio(to, (void *)(map->map_priv_1 + from), len);
        }
    } else {
        if ((from + len) <= (CONFIG_MTD_BARCO_LEN)) {
            memcpy_fromio(to, (void *)(map->map_priv_2 +
                          (from - (CONFIG_MTD_BARCO_LEN >> 1))), len);
        } else {
            printk(KERN_WARNING "Length too large at a too large offset for reading\n");
            printk(KERN_WARNING " %#10x at %#10lx\n", len, from);
        }
    }
}

First, a basic check is done on the validity of the length of the block. Next, the chip is selected based on the starting address of the read operation. If this is in the first part, we must check whether the end address of the block is still in the current chip. If it is not, the first part of the block must be read from the first memory range (chip) and the second part from the second, appended to the first.

If the start address is in the second part, we check whether the block still fits within the second chip range. The offset is adjusted to be relative to the second chip range and the data is returned.

The barco8245g1_copy_to function is very similar to the previous function.
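The splitting logic can be tried out in userspace without any flash hardware. The sketch below is a simplified model, not the driver code: two plain arrays stand in for the two ioremap()ed windows, memcpy() replaces memcpy_fromio(), and FLASH_LEN is an arbitrary small stand-in for CONFIG_MTD_BARCO_LEN. The names two_chip_map and two_chip_copy_from are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FLASH_LEN 16u               /* stand-in for CONFIG_MTD_BARCO_LEN */
#define HALF_LEN  (FLASH_LEN >> 1)  /* size of each chip select window */

/* Two separate buffers play the role of map_priv_1 and map_priv_2. */
struct two_chip_map {
    unsigned char chip1[HALF_LEN];
    unsigned char chip2[HALF_LEN];
};

/*
 * Copy 'len' bytes starting at linear offset 'from' into 'to'.
 * Returns 0 on success, -1 if the range does not fit in the device.
 */
int two_chip_copy_from(const struct two_chip_map *map, void *to,
                       size_t from, size_t len)
{
    unsigned char *dst = to;

    assert(map != NULL && to != NULL);
    if (from > FLASH_LEN || len > FLASH_LEN - from)
        return -1;

    if (from < HALF_LEN) {
        size_t first = HALF_LEN - from;  /* bytes left in chip 1 */

        if (len <= first) {
            memcpy(dst, map->chip1 + from, len);
        } else {
            /* Block straddles the boundary: finish chip 1, then chip 2 */
            memcpy(dst, map->chip1 + from, first);
            memcpy(dst + first, map->chip2, len - first);
        }
    } else {
        /* Offset is relative to the start of the second chip */
        memcpy(dst, map->chip2 + (from - HALF_LEN), len);
    }
    return 0;
}
```

Unlike the driver, which only logs a warning, this model reports an out-of-range request through its return value so that the behaviour can be checked by a caller.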

5.6.6.4 Adding Your Driver in KConfig

After these changes have been added, we want to compile them into the Linux kernel (or as a module). Therefore we add the required options to the appropriate Kconfig files.

The relevant additions to drivers/mtd/maps/Kconfig are shown below; the help sections should speak for themselves:

config MTD_BARCO834XG1
    tristate "CFI Flash device mapped on Barco Streaming 834X G1 Platform"
    depends on BARCO834XG1 && MTD_CFI && MTD_PARTITIONS
    default y
    help
      Enable this if you want to access the Flash devices on the
      Barco Streaming 834X G1 Platform boards.
      This provides a 'mapping' driver which allows the CFI probe and
      command set driver code to communicate with flash chips which
      are mapped physically into the CPU's memory.

menu "Barco SMD Streaming Platforms (G1) Flash layout configuration"
    depends on MTD_BARCO8245G1 || MTD_BARCO8245G2 || MTD_BARCO834XG1

config MTD_BARCO_START
    hex "Physical start address of flash mapping"
    default "0xFF000000"
    help
      This is the physical memory location at which the flash chips
      are mapped on your particular target board. Refer to the
      memory map which should hopefully be in the documentation for
      your board.

config MTD_BARCO_LEN
    hex "Physical length of flash mapping"
    default "0x1000000"
    help
      This is the total length of the mapping of the flash chips on
      your particular board. If there is space, or aliases, in the
      physical memory map between the chips, this could be larger
      than the total amount of flash present. Refer to the memory
      map which should hopefully be in the documentation for your
      board.

config MTD_BARCO_BUSWIDTH
    int "Bus width in octets"
    default "1"
    help
      This is the total width of the data bus of the flash devices
      in octets. For example, if you have a data bus width of 32
      bits, you would set the bus width octet value to 4. This is
      used internally by the CFI drivers.

endmenu

If a default value is specified, this is the value that will be used when running oldconfig, or proposed when running menuconfig. In the latter case, the setting can still be changed.

Also, a line is needed in drivers/mtd/maps/Makefile to compile the selected file:

obj-$(CONFIG_MTD_BARCO834XG1) += barco834xg1.o


5.7 Hands On - LED Driver

During porting, one of the first things done is giving an indication to the outside world that the board is up and running. The most visible way to do this is by blinking Light Emitting Diodes (LEDs). The Sheevaplug has 2 visible LEDs: an orange one for power, and a green/blue one for activity.

The power LED is connected directly to the power supply and hence not under direct CPU control, but the activity LED is connected to General Purpose Input / Output (GPIO) pin 49 on the processor.

The GPIO controller is a very simple device with only three 32-bit registers: Out, Direction and In. Bit N in each register is used to control GPIO pin N, e.g. bit 2 controls GPIO pin 2.

• The Out register is used to set the output level (1 = high) of the GPIO pins configured as outputs.

• The Direction register is used to configure the direction of the GPIO pins (0 = output, 1 = input).

• The In register is used to read the input level of the GPIO pins configured as inputs.

The SoC contains two GPIO controllers, one for GPIO 0..31 and another for GPIO 32..63. The two controllers are located at base address 0xf1010100 (low) and 0xf1010140 (high), with the following offsets relative to the base (X is 0 for the low controller and 4 for the high one; Y is 1 and 5 respectively):

• Out is at offset 0 (0xf10101X0)

• Direction is at offset 4 (0xf10101X4)

• In is at offset 16 (0xf10101Y0)

See page 765 of the 88F6281 functional description for more info.
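As a quick sanity check of the address arithmetic, the small sketch below maps a GPIO pin number onto a controller base address and a bit mask, using only the addresses and offsets listed above. The helper names gpio_base() and gpio_mask() are made up for this illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define GPIO_LOW_BASE  0xf1010100u /* controller for GPIO 0..31 */
#define GPIO_HIGH_BASE 0xf1010140u /* controller for GPIO 32..63 */

#define GPIO_REG_OUT 0x00u
#define GPIO_REG_DIR 0x04u
#define GPIO_REG_IN  0x10u

/* Base address of the controller that owns GPIO pin 'pin' (0..63). */
uint32_t gpio_base(unsigned int pin)
{
    assert(pin < 64);
    return pin < 32 ? GPIO_LOW_BASE : GPIO_HIGH_BASE;
}

/* Bit mask of 'pin' inside its controller's 32-bit registers. */
uint32_t gpio_mask(unsigned int pin)
{
    assert(pin < 64);
    return (uint32_t)1 << (pin % 32);
}
```

For the activity LED on GPIO 49 this gives the high controller at 0xf1010140 and mask 0x20000 (bit 17); the In register of that controller is then at gpio_base(49) + GPIO_REG_IN = 0xf1010150.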

5.7.1 Hardware Verification

It is always a good idea to do a quick check of the hardware assumptions before starting on a kernel driver. Luckily, this is easy to do in U-Boot. GPIO 49 is bit 17 of the high GPIO controller (49 - 32 = 17), so we look at address 0xf101014X:

__ __ _ _

| \/ | __ _ _ ____ _____| | |

| |\/| |/ _‘ | ’__\ \ / / _ \ | |

| | | | (_| | | \ V / __/ | |

|_| |_|\__,_|_| \_/ \___|_|_|

_ _ ____ _

| | | | | __ ) ___ ___ | |_

| | | |___| _ \ / _ \ / _ \| __|

| |_| |___| |_) | (_) | (_) | |_

\___/ |____/ \___/ \___/ \__|

** MARVELL BOARD: SHEEVA PLUG LE

U-Boot 1.1.4 (Jul 14 2009 - 06:46:57) Marvell version: 3.4.1

..


Marvell>> md.l f1010140 5

f1010140: 00000000 00000000 00000000 00000000 ................

f1010150: 00000000 ....


We see that all GPIOs are configured as low outputs without any blinking. If we now set bit 17 (0x20000) of the Out register, we see the LED changing from blue to green (note that U-Boot always works with hexadecimal numbers):

Marvell>> mw.l f1010140 20000

And back:

Marvell>> mw.l f1010140 0

5.7.2 Kernel Driver

With this out of the way the real task can be started. We will implement a Linux kernel driver to control the green/blue LED from userspace on the Sheevaplug device.

The driver that will be implemented is called ledtest. Start out with a minimalistic framework, printing a kernel message when the driver is initialised and when it is removed. The driver is placed in drivers/misc/ledtest.c.

The code is explained step by step in the following sections. When entering the code in your file, take care that the order of the declarations and definitions is correct: define or declare a function before using it (e.g. read and write functions that are passed to a structure).

#include <linux/init.h>
#include <linux/module.h>

static int __init ledtest_init(void)
{
    printk(KERN_INFO "ledtest_init\n");
    return 0;
}

static void __exit ledtest_exit(void)
{
    printk(KERN_INFO "ledtest_exit\n");
}

module_init(ledtest_init);
module_exit(ledtest_exit);

Add the driver to the kernel build infrastructure by adding the configuration option (LEDTEST) to the Makefile in drivers/misc/Makefile:

obj-$(CONFIG_LEDTEST) += ledtest.o

Next, include the required information to select this option in the menu structure of the kernel in drivers/misc/Kconfig.

config LEDTEST
    tristate "Sheevaplug example LED driver"
    depends on MACH_SHEEVAPLUG
    help
      Example driver for green/blue LED on Sheevaplug.


Enable the driver in make menuconfig and recompile the kernel (don't forget the cross compilation prefix and selecting the kernel architecture) and load it with the bootloader into the memory of the Sheevaplug. Since the device driver is specific to our hardware, it can only be selected when the kernel is being compiled for that hardware.

Boot the kernel again over NFS. Carefully inspect the kernel messages and look for the indication that the ledtest driver is active:

ledtest_init

5.7.2.1 Platform Bus

This is a simple driver. In the next phase, add the platform infrastructure to the module. For this, the driver needs to be modified slightly, and extra resources are added in the board description file, arch/arm/mach-kirkwood/sheevaplug-setup.c:

static struct resource sheevaplug_ledtest_res[] = {
    {
        .flags = IORESOURCE_MEM,
        .start = 0xf1010140,
        .end = 0xf1010153,
    },
};

static struct platform_device sheevaplug_ledtest = {
    .name = "ledtest",
    .resource = sheevaplug_ledtest_res,
    .num_resources = ARRAY_SIZE(sheevaplug_ledtest_res)
};

The first structure specifies the address range that must be read and written in order to steer the GPIO pins. This structure is used to define the platform device sheevaplug_ledtest.

Finally we register it at the bottom of sheevaplug_init():

platform_device_register(&sheevaplug_ledtest);

The second part of the work is done in the ledtest driver itself:

static struct platform_driver ledtest_driver = {
    .probe = ledtest_probe,
    .remove = __devexit_p(ledtest_remove),
    .driver = {
        .owner = THIS_MODULE,
        .name = "ledtest",
    },
};

Add the structure (not unlike the structures that are used for e.g. adding PCI drivers) that defines the probe and remove functionality and a name that identifies the device driver. This name needs to match the value that was used in the platform device structure defined in the architecture definition (the board description file).
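Why the two names must be identical can be illustrated with a deliberately simplified userspace model of the matching step. This is a sketch, not the kernel's implementation: the real platform bus has additional matching mechanisms, and find_driver() as well as the driver list (apart from "ledtest") are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative driver names; only "ledtest" comes from this section. */
static const char *const registered_drivers[] = {
    "serial8250",
    "ledtest",
    "orion-ehci",
};

/*
 * Return the index of the driver whose name equals the device name,
 * or -1 when no driver claims the device (probe is never called).
 */
int find_driver(const char *device_name)
{
    size_t i;

    assert(device_name != NULL);
    for (i = 0; i < sizeof(registered_drivers) / sizeof(registered_drivers[0]); i++) {
        if (strcmp(registered_drivers[i], device_name) == 0)
            return (int)i;
    }
    return -1;
}
```

A device registered as "ledtest" finds its driver, while a typo such as "led-test" in the board file would leave the device without a driver and the probe function would never run.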


The probe and remove functions themselves do not do much yet, apart from printing a debug message:

#include <linux/platform_device.h>

static int __devinit ledtest_probe(struct platform_device *pdev)
{
    printk(KERN_INFO "ledtest_probe\n");
    return 0;
}

static int __devexit ledtest_remove(struct platform_device *pdev)
{
    return 0;
}

In the init and exit code of the driver, add the platform registration and unregistration code:

static int __init ledtest_init(void)
{
    printk(KERN_INFO "ledtest_init\n");

    return platform_driver_register(&ledtest_driver);
}

static void __exit ledtest_exit(void)
{
    printk(KERN_INFO "ledtest_exit\n");

    platform_driver_unregister(&ledtest_driver);
}

Reload the kernel in the bootloader and inspect the kernel log during booting for the correct feedback.

5.7.2.2 Hardware Access

At this point, our platform driver loads nicely but does not do much yet. To change that, code needs to be added to drivers/misc/ledtest.c that reads and writes the memory locations of the GPIO registers.

The probing gets more complex as we store the base address in a local structure and need to handle errors while initialising:


struct ledtest {
    void __iomem *base;
};

static int __devinit ledtest_probe(struct platform_device *pdev)
{
    struct ledtest *led;
    struct resource *res;
    int ret;

    printk(KERN_INFO "ledtest_probe\n");

    res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
    if (!res)
        return -ENODEV;

    led = kzalloc(sizeof(*led), GFP_KERNEL);
    if (!led)
        return -ENOMEM;

    led->base = ioremap(res->start, resource_size(res));
    if (!led->base) {
        dev_err(&pdev->dev, "Unable to map registers\n");
        ret = -EIO;
        goto map_failed;
    }

    platform_set_drvdata(pdev, led);

    ledtest_clear(led, REG_DIR, GPIO_LED);
    ledtest_set(led, REG_OUT, GPIO_LED);

    dev_info(&pdev->dev, "Ledtest LEDs at %p\n", led->base);

    return 0;

map_failed:
    kfree(led);

    return ret;
}

The remove clears the initialised pointer:


static int __devexit ledtest_remove(struct platform_device *pdev)
{
    struct ledtest *led;

    led = platform_get_drvdata(pdev);
    platform_set_drvdata(pdev, NULL);

    iounmap(led->base);
    kfree(led);

    return 0;
}

At the end of the probe, the GPIO is configured as a high output (green). In order to do so, the correct offsets and bitmasks need to be defined:

#define REG_OUT 0x00
#define REG_DIR 0x04
#define REG_IN  0x10

#define GPIO_LED (1 << 17)

Some helper functions are defined to read and write the registers; the clear and set functions are each other's complement.

#include <linux/io.h>
#include <linux/slab.h>

static void ledtest_write(struct ledtest *led, int reg, u32 val)
{
    writel(val, led->base + reg);
}

static u32 ledtest_read(struct ledtest *led, int reg)
{
    return readl(led->base + reg);
}

static void ledtest_clear(struct ledtest *led, int reg, u32 mask)
{
    ledtest_write(led, reg, ledtest_read(led, reg) & ~mask);
}

static void ledtest_set(struct ledtest *led, int reg, u32 mask)
{
    ledtest_write(led, reg, ledtest_read(led, reg) | mask);
}

Recompile the kernel and reload it on the board. When booting, closely watch the activity LED.
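The read-modify-write behaviour of the set and clear helpers can be checked on a desktop machine with a fake register window. In the sketch below an ordinary array replaces the ioremap()ed registers and plain assignments replace readl()/writel(); struct fake_gpio and the fake_* helpers are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define REG_OUT  0x00
#define REG_DIR  0x04
#define REG_IN   0x10
#define GPIO_LED (1u << 17)

/* Fake register window of 0x14 bytes, like 0xf1010140..0xf1010153. */
struct fake_gpio {
    uint32_t regs[5];
};

static uint32_t fake_read(const struct fake_gpio *g, int reg)
{
    assert(reg >= 0 && reg <= REG_IN && reg % 4 == 0);
    return g->regs[reg / 4];
}

static void fake_write(struct fake_gpio *g, int reg, uint32_t val)
{
    assert(reg >= 0 && reg <= REG_IN && reg % 4 == 0);
    g->regs[reg / 4] = val;
}

/* Clear only the bits in 'mask'; all other bits keep their value. */
void fake_clear(struct fake_gpio *g, int reg, uint32_t mask)
{
    fake_write(g, reg, fake_read(g, reg) & ~mask);
}

/* Set only the bits in 'mask'; all other bits keep their value. */
void fake_set(struct fake_gpio *g, int reg, uint32_t mask)
{
    fake_write(g, reg, fake_read(g, reg) | mask);
}

/* Same sequence as the end of ledtest_probe(): output, driven high. */
void fake_led_init(struct fake_gpio *g)
{
    fake_clear(g, REG_DIR, GPIO_LED);
    fake_set(g, REG_OUT, GPIO_LED);
}
```

The point of the exercise is that neighbouring GPIO pins are left alone: only bit 17 changes in the Direction and Out registers.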


5.7.2.3 Sysfs Interface

The current ledtest driver drives the LED as expected. Allowing user-level programs to drive the LED requires a small modification.

To this end, sysfs support is added. Sysfs is a virtual file system provided by Linux. It exports information about devices and drivers from the kernel device model to userspace, and is also used for configuration.

For each object added to the driver model tree (drivers, devices including class devices) a directory in sysfs is created. The parent/child relationship is reflected with subdirectories under /sys/devices/ (reflecting the physical layout). The subdirectory /sys/bus/ is populated with symbolic links, reflecting how the devices belong to different busses. /sys/class/ shows devices grouped according to classes, like network, while /sys/block/ contains the block devices.

For device drivers and devices, attributes may be created. These are simple files; the rule is that they should only contain a single value and/or allow a single value to be set (unlike some files in procfs, which need to be heavily parsed). These files show up in the subdirectory of the device. Using attribute groups, a subdirectory filled with attributes may also be created.

In the ledtest_probe function, add the sysfs attribute (before driving the LEDs at the end of the initialisation):

ret = device_create_file(&pdev->dev, &dev_attr_color);
if (ret)
    goto err_sysfs_color;

platform_set_drvdata(pdev, led);

Even though goto statements are considered evil in many programming courses, they do provide an elegant way of handling errors in sequential initialisation. As such, goto statements are very common in kernel driver code. Each initialisation step is matched with an error label whose code undoes whatever initialisation occurred before the error was triggered. The following error handling is added at the bottom of the probe function:

err_sysfs_color:
    iounmap(led->base);    /* undo the ioremap() done earlier in probe */
map_failed:
    kfree(led);
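The unwinding pattern is not specific to the kernel and can be tried in a small userspace program. The sketch below uses malloc()/free() in place of kzalloc(), ioremap() and friends; struct widget, widget_init() and the fail_at parameter (which simulates a failing step) are hypothetical and exist only to make the pattern testable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct widget {
    char *buf_a;
    char *buf_b;
};

/*
 * Two-step initialisation with goto-based unwinding.
 * fail_at: 0 = no simulated failure, 1 = step 1 fails, 2 = step 2 fails.
 * Returns 0 on success or a negative value identifying the failed step.
 */
int widget_init(struct widget *w, int fail_at)
{
    int ret;

    assert(w != NULL);
    w->buf_a = NULL;
    w->buf_b = NULL;

    w->buf_a = (fail_at == 1) ? NULL : malloc(16);
    if (!w->buf_a) {
        ret = -1;
        goto err_a;
    }

    w->buf_b = (fail_at == 2) ? NULL : malloc(16);
    if (!w->buf_b) {
        ret = -2;
        goto err_b;
    }

    return 0;

err_b:
    /* Step 2 failed: undo step 1, then fall through */
    free(w->buf_a);
    w->buf_a = NULL;
err_a:
    return ret;
}

/* Counterpart of widget_init(): release everything in reverse order. */
void widget_exit(struct widget *w)
{
    free(w->buf_b);
    free(w->buf_a);
    w->buf_a = w->buf_b = NULL;
}
```

The labels appear in reverse order of the initialisation steps, so jumping to a label undoes exactly the steps that had already succeeded.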

As usual, what has been done during probe needs to be undone during remove:

device_remove_file(&pdev->dev, &dev_attr_color);

The next piece of code adds the interaction from the sysfs filesystem towards the kernel. Add the color attribute to the ledtest driver. This attribute is readable by everybody (user, group and other), and is only writable by the owner of the device (root). Two functions are added: one for reading the value and one for writing it (show and store).

static DEVICE_ATTR(color, S_IRUGO|S_IWUSR, ledtest_color_show, ledtest_color_store);

A lot of drivers simply output the boolean value in the sysfs file, but in this case, the driver should show "green" and "blue" instead of 1 and 0. Add the following functions to the driver code to implement the previously used ledtest_color_show and ledtest_color_store.


static ssize_t ledtest_color_show(struct device *dev,
        struct device_attribute *attr, char *buf)
{
    struct platform_device *pdev =
        container_of(dev, struct platform_device, dev);
    struct ledtest *led = platform_get_drvdata(pdev);

    return sprintf(buf, "%s\n",
                   (ledtest_read(led, REG_OUT) & GPIO_LED) ? "green" : "blue");
}

static ssize_t ledtest_color_store(struct device *dev,
        struct device_attribute *attr, const char *buf, size_t count)
{
    /* Same lookup of the driver data as in ledtest_color_show() */
    struct platform_device *pdev =
        container_of(dev, struct platform_device, dev);
    struct ledtest *led = platform_get_drvdata(pdev);

    if (sysfs_streq("green", buf))
        ledtest_set(led, REG_OUT, GPIO_LED);
    else if (sysfs_streq("blue", buf))
        ledtest_clear(led, REG_OUT, GPIO_LED);
    else
        return -EINVAL;

    return count;
}
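The store function uses sysfs_streq() rather than strcmp() because the buffer written by echo ends in a newline. The userspace sketch below mimics that comparison so the behaviour can be explored outside the kernel; streq_ignore_newline() is a hypothetical stand-in written for this example and should not be read as the kernel's exact source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Compare two strings, treating a single trailing newline on either
 * one as if it were the end of the string.
 */
bool streq_ignore_newline(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);

    /* Skip the common prefix */
    while (*s1 && *s1 == *s2) {
        s1++;
        s2++;
    }

    if (*s1 == *s2)
        return true;               /* both ended: identical strings */
    if (!*s1 && *s2 == '\n' && !s2[1])
        return true;               /* s2 has one extra trailing newline */
    if (*s1 == '\n' && !s1[1] && !*s2)
        return true;               /* s1 has one extra trailing newline */
    return false;
}
```

With this comparison both "green" and "green\n" select the green LED, while anything else, such as "greenish\n", is rejected and the store function returns -EINVAL.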

Recompile the kernel, load it on the board and reboot the system. Log in to the system and look for the ledtest interfaces in the /sys/bus/platform directory of the sysfs filesystem.

You should now be able to drive the function of the LEDs with:

# echo green > color

# echo blue > color

# cat color

blue

or

# while true; do echo green > color; sleep 3; echo blue > color; sleep 3; done

to make the LED change colour every 3 seconds.

5.7.2.4 Timers

This way of blinking works, but isn't very efficient. It would be nice if the kernel could handle the blinking for us. In a final step the ledtest driver is modified to use a timer for blinking, with the period of the timer being the variable that the user can influence from the sysfs interface.

Modify the ledtest structure so that the LED has a timer attached to it and a delay value:


#include <linux/timer.h>

struct ledtest {
    void __iomem *base;
    struct timer_list timer;
    long delay;
};

Next to the color attribute, we will add a blink attribute where the blink delay in milliseconds can be accessed:

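static ssize_t ledtest_blink_show(struct device *dev,
        struct device_attribute *attr, char *buf)
{
    struct platform_device *pdev =
        container_of(dev, struct platform_device, dev);
    struct ledtest *led = platform_get_drvdata(pdev);

    return sprintf(buf, "%ld\n", led->delay);
}

static ssize_t ledtest_blink_store(struct device *dev,
        struct device_attribute *attr, const char *buf, size_t count)
{
    struct platform_device *pdev =
        container_of(dev, struct platform_device, dev);
    struct ledtest *led = platform_get_drvdata(pdev);

    /* Stop a running timer before changing the delay */
    del_timer_sync(&led->timer);

    led->delay = simple_strtol(buf, 0, 0);
    if (led->delay)
        mod_timer(&led->timer, jiffies + msecs_to_jiffies(led->delay));

    return count;
}

static DEVICE_ATTR(blink, S_IRUGO|S_IWUSR, ledtest_blink_show, ledtest_blink_store);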

When the timer expires, the driver toggles the value of the LED. We add the following code to the driver:


static void ledtest_timer(unsigned long data)
{
    struct ledtest *led = (struct ledtest *)data;

    ledtest_write(led, REG_OUT, ledtest_read(led, REG_OUT) ^ GPIO_LED);
    mod_timer(&led->timer, jiffies + msecs_to_jiffies(led->delay));
}

And finally, we need to add the attribute to the kernel and initialise the timer correctly in probe (and clean up in the remove code).

    ret = device_create_file(&pdev->dev, &dev_attr_blink);
    if (ret)
        goto err_sysfs_blink;

    init_timer(&led->timer);
    led->timer.function = ledtest_timer;
    led->timer.data = (unsigned long)led;

    ...

err_sysfs_blink:
    device_remove_file(&pdev->dev, &dev_attr_color);

}

In the remove function, we delete the timer again.

device_remove_file(&pdev->dev, &dev_attr_blink);

del_timer_sync(&led->timer);

Recompile the kernel, reload it on the board and see the effect of modifying the blinking interval with the blink sysfs attribute.

# echo 100 > blink

# echo 200 > blink
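Two small calculations sit behind the blink attribute: converting the delay in milliseconds to timer ticks, and flipping a single bit in the Out register. The sketch below models both in userspace. It assumes a tick rate of 100 per second (SKETCH_HZ is a made-up constant; the real value is the HZ setting of the kernel in use) and rounds up to whole ticks; ms_to_ticks() and toggle_led() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_HZ 100u        /* assumed tick rate, see the text above */
#define GPIO_LED  (1u << 17)

/* Round a delay in milliseconds up to a whole number of ticks. */
unsigned long ms_to_ticks(unsigned long ms)
{
    assert(ms <= (ULONG_MAX - 999u) / SKETCH_HZ);
    return (ms * SKETCH_HZ + 999u) / 1000u;
}

/* What the timer callback does to Out: flip only the LED bit. */
uint32_t toggle_led(uint32_t out)
{
    return out ^ GPIO_LED;
}
```

Under this assumption, echo 100 > blink re-arms the timer every 10 ticks and echo 200 > blink every 20 ticks. Delays shorter than one tick (10 ms here) are rounded up to a single tick, so the LED cannot blink faster than the tick rate allows.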

5.8 References

• http://www.kroah.com/lkn/

• http://janitor.kernelnewbies.org/docs/driver-howto.html

• http://lwn.net/Kernel/LDD3/

• http://www.xml.com/ldd/chapter/book/

• http://linuxplanet.com/linuxplanet/tutorials/1019/1/

• http://www.freeos.com/articles/2677/

• http://en.wikipedia.org/wiki/Linux_kernel



• http://en.wikipedia.org/wiki/Sysfs

• http://www.marvell.com/products/processors/embedded/kirkwood/

• http://www.marvell.com/products/processors/embedded/kirkwood/FS_88F6180_9x_6281_OpenSource.pdf


Chapter 6

File Systems

6.1 Introduction

Linux supports more than 50 file systems in the mainline kernel (and more if you count out-of-tree ones), a lot more than any other operating system! But with this flexibility comes the question: which file system to use?

The file systems can roughly be divided into 4 groups:

1. Disk based

2. Flash based

3. Network file systems

4. “Virtual” file systems

6.2 Disk Based File Systems

Most of the file systems supported by Linux are disk based. They can be further divided into “native” file systems and compatibility file systems:

Native file systems are normally used for the main storage on Linux systems, and they all support the POSIX features expected of the system (hard and soft links, owner/group/others permissions, atime and mtime, ...). These include:

• EXT2/3/4

• Reiserfs3/4

• XFS

• JFS

• Btrfs

The other group of file systems is made for compatibility with other operating systems. These file systems might not have all the POSIX features and might not perform as well as the native ones, but they do allow access to data from “foreign” systems. Examples of these are: FAT16/32, NTFS, HPFS, BEFS, UFS, AFFS, HFS(+), ISO9660, UDF, ...


6.2.1 Choice

Disk based file systems are not that common in embedded systems, but they do exist, typically in x86 based designs. Also remember that some flash based technologies emulate disks (CompactFlash/MMC/SD cards, USB memory sticks, ...). When using those technologies it is important to minimise writes by using non-journaling file systems like EXT2 or FAT and to mount with the noatime option [1].

6.3 Flash Based File Systems

The Linux MTD (flash) layer can emulate a block device, so you can use the disk based file systems on them, but Linux also supports several file systems which are optimised for flash, e.g. by compressing data and limiting flash writes. These are:

• Cramfs

• Squashfs

• Journalling Flash File System version 2 (JFFS2)

• UBIFS

• Logfs

Cramfs and squashfs are both read-only file systems. Cramfs is fairly old and limited, and squashfs is a newer system with better compression and more features. JFFS2, UBIFS and Logfs all allow complete read/write access to the flash.

6.3.1 Choice

For a long time JFFS2 was the only option if read/write access was needed, but recently UBIFS (and more recently Logfs) have been added. JFFS2 works well for what it was made for, but it doesn't scale very well to bigger (>200MB) flashes because of its memory consumption and mount time. For those situations UBIFS is becoming more popular.

Note though that read/write file systems have their drawbacks as well. They are more complex, take longer to mount and have a worse compression ratio than the read-only file systems. Consider writing data to a raw flash partition or EEPROM if the amount of data to be written is limited, or use a RAM disk if the data doesn't need to be persistent.

For read-only access we recommend squashfs, even though it was only fairly recently added to the mainline kernel. It compresses better, has fewer limits and is faster than cramfs.

Notice that cramfs and squashfs can also be used on disks.

6.4 Network File Systems

Linux supports several network file systems, the most popular being NFS and SMBFS/CIFS, but also relatively obscure ones such as AFS, CODA and 9FS. They are normally not used in production on embedded Linux systems, but NFS can be very handy during development, as the kernel can boot from an NFS share and NFS supports all the basic POSIX features.

[1] Otherwise the file system must be written to every time a file is read, to update the access timestamp.


6.5 Virtual File Systems

And finally, Linux supports several virtual file systems. The term “virtual” refers to the fact that there isn't any underlying medium to store the data on.

The best known virtual file systems are the tmpfs RAM disk file system, devtmpfs, procfs and sysfs, where the last two are a clever interface to the kernel from user space.

Tmpfs, as the name implies, is often used together with flash file systems for temporary data in /tmp and /var. It has the nice property that it is dynamic in size, i.e. it doesn't use any more space than needed to contain the files. This file system is also used by the new devtmpfs file system, which automatically manages /dev device nodes.

Procfs (and lately sysfs) is used by a lot of utilities for reading information from and communicating with the kernel. They can also be useful for adding debugging hooks to drivers during development.

6.6 Conclusion

As can be seen from the above, there is plenty to choose from, and no single file system is the best in all situations. Most embedded Linux systems use several file systems (e.g. squashfs, tmpfs, procfs and sysfs).

6.7 References

• Linux file systems wiki: http://linuxfs.pbwiki.com/


Chapter 7

Userspace

7.1 Introduction

Userspace is where the one or more applications containing the actual functionality of the embedded system reside, and it is therefore also here that we see the most variation between systems. Still, a number of standard components are used in most systems.

The major difference with the boot loader and kernel discussed up until now is that user applications no longer access the hardware directly. If this access is needed, a request is sent to the kernel, which handles it. The applications in userland add functionality to the system while the kernel tackles the interaction with the hardware itself.

In this chapter, the focus is on building the components that are missing to have a fully functional embedded Linux system, containing shell access over ssh and a webserver. Next to this come the application(s) performing the application specific tasks of the design.

7.2 BusyBox

Busybox is a program combining many standard Unix utilities into a single small executable. It can provide most of the utilities specified in the Single Unix Specification plus many other utilities a user would expect to see on a GNU/Linux system. Busybox is typically used in a single-floppy or embedded Linux system because of its small size, although it has also been used in the distributions of Linux for the Sharp Zaurus and the Nokia 770. It is free software, licensed under the GNU GPL.

According to the project home page, Busybox is "The Swiss Army Knife of Embedded Linux", and it is often paired with uClibc for embedded Linux systems.

Originally written by Bruce Perens in 1996, the intent of Busybox was to put a complete bootable system on a single floppy that would be both a rescue disk and an installer for the Debian GNU/Linux distribution. It has since become the de facto standard for embedded Linux devices and Linux distribution installers. Since each Linux executable requires several KB of overhead, having the Busybox program combine the 107 programs can save considerable space.

Busybox exploits the fact that the standard Linux utilities share many common elements. For example, many file-based utilities (such as grep and find) require code to recurse a directory in search of files. When the utilities are combined into a single executable, they can share these common elements, which results in a smaller executable. In fact, Busybox can pack almost 3.5MB of utilities into around 200KB. This provides greater functionality to bootable floppy disks and embedded devices that use Linux. You can use Busybox with both the 2.4 and 2.6 Linux kernels.


7.2.1 Examples

Programs included in Busybox can be run simply by adding their name as an argument when running the program, as seen here.

/bin/busybox ls

More commonly, the Busybox executable is linked to (using hard or symbolic links) the desired command names; Busybox notices the name it is called as, and runs the appropriate command, for example just

/bin/ls

after /bin/ls is linked to /bin/busybox.

A program or applet can thus be called directly, in which case the code uses argv[0] to determine the functionality, or it can be passed as the first argument of the main program, in which case the arguments are simply shifted when this is detected. From a coding perspective, this is easily accomplished:

if (!strcmp(basename(argv[0]), "busybox")) {
    if (argc > 1) {
        /* Called as "busybox applet ...": shift the arguments */
        argv++, argc--;
    } else {
        fprintf(stderr, "No applet specified.\n");
        return show_usage();
    }
}

if (!strcmp(basename(argv[0]), "foo")) {
    return foo_main(argc, argv);
} else if (!strcmp(basename(argv[0]), "bar")) {
    return bar_main(argc, argv);
} else {
    fprintf(stderr, "No such applet \"%s\".\n", argv[0]);
    return show_usage();
}
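With more than a handful of applets, the if/else chain is usually replaced by a lookup table. The following sketch shows that idea in a self-contained form; it is an illustration under assumptions, not BusyBox source code: the applet table, the stub foo_main()/bar_main() functions, applet_name() and run_applet() are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*applet_main_fn)(int argc, char **argv);

struct applet {
    const char *name;
    applet_main_fn main;
};

/* Stub applets; their return values only identify which one ran. */
static int foo_main(int argc, char **argv) { (void)argc; (void)argv; return 10; }
static int bar_main(int argc, char **argv) { (void)argc; (void)argv; return 20; }

static const struct applet applets[] = {
    { "bar", bar_main },
    { "foo", foo_main },
};

/* Strip leading directories, like basename(), without modifying 'path'. */
static const char *applet_name(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}

/* Run the applet selected by argv; returns -1 if none is found. */
int run_applet(int argc, char **argv)
{
    const char *name;
    size_t i;

    assert(argc > 0 && argv != NULL);
    name = applet_name(argv[0]);

    if (!strcmp(name, "busybox")) {
        if (argc < 2)
            return -1;            /* no applet specified */
        argv++, argc--;           /* shift: applet name becomes argv[0] */
        name = applet_name(argv[0]);
    }

    for (i = 0; i < sizeof(applets) / sizeof(applets[0]); i++) {
        if (!strcmp(name, applets[i].name))
            return applets[i].main(argc, argv);
    }
    return -1;                    /* no such applet */
}
```

Calling the function with argv = { "/bin/foo" } or with argv = { "/bin/busybox", "foo" } ends up in the same foo_main(), which is exactly the behaviour the links in /bin rely on.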

For busybox, a list of compiled-in functionality can be obtained by just typing busybox, without specifying an applet:

[mleeman@zee busybox-1.20.2]$ ./busybox

BusyBox v1.20.2 (2012-09-11 10:26:34 CEST) multi-call binary.

Copyright (C) 1998-2011 Erik Andersen, Rob Landley, Denys Vlasenko

and others. Licensed under GPLv2.

See source distribution for full notice.

Usage: busybox [function] [arguments]...

or: busybox --list[-full]

or: busybox --install [-s] [DIR]

or: function [arguments]...

BusyBox is a multi-call binary that combines many common Unix

utilities into a single executable. Most people will create a


link to busybox for each function they wish to use and BusyBox

will act like whatever it was invoked as.

Currently defined functions:

[, [[, addgroup, adduser, ar, arping, ash, awk, basename, blkid,

bunzip2, bzcat, cat, catv, chattr, chgrp, chmod, chown, chroot,

chrt, chvt, cksum, clear, cmp, cp, cpio, crond, crontab, cut,

date, dc, dd, deallocvt, delgroup, deluser, devmem, df, diff,

dirname, dmesg, dnsd, dnsdomainname, dos2unix, du, dumpkmap, echo,

egrep, eject, env, ether-wake, expr, false, fdflush, fdformat,

fgrep, find, fold, free, freeramdisk, fsck, fuser, getopt, getty,

grep, gunzip, gzip, halt, hdparm, head, hexdump, hostid, hostname,

hwclock, id, ifconfig, ifdown, ifup, inetd, init, insmod, install,

ip, ipaddr, ipcrm, ipcs, iplink, iproute, iprule, iptunnel, kill,

killall, killall5, klogd, last, less, linux32, linux64, linuxrc,

ln, loadfont, loadkmap, logger, login, logname, losetup, ls,

lsattr, lsmod, lsof, lspci, lsusb, lzcat, lzma, makedevs, md5sum,

mdev, mesg, microcom, mkdir, mkfifo, mknod, mkswap, mktemp,

modprobe, more, mount, mountpoint, mt, mv, nameif, netstat,

nice, nohup, nslookup, od, openvt, passwd, patch, pidof, ping,

pipe_progress, pivot_root, poweroff, printenv, printf, ps, pwd,

rdate, readlink, readprofile, realpath, reboot, renice, reset,

resize, rm, rmdir, rmmod, route, run-parts, runlevel, sed, seq,

setarch, setconsole, setkeycodes, setlogcons, setserial, setsid,

sh, sha1sum, sha256sum, sha512sum, sleep, sort, start-stop-daemon,

strings, stty, su, sulogin, swapoff, swapon, switch_root, sync,

sysctl, syslogd, tail, tar, tee, telnet, test, tftp, time,

top, touch, tr, traceroute, true, tty, udhcpc, umount, uname,

uniq, unix2dos, unlzma, unxz, unzip, uptime, usleep, uudecode,

uuencode, vconfig, vi, vlock, watch, watchdog, wc, wget, which,

who, whoami, xargs, xz, xzcat, yes, zcat

7.2.2 Configuring and Building BusyBox

You can download the latest version of BusyBox from its Web site [1]. Like most open source programs, it’s distributed in a compressed tarball, and you can transform it into a source tree using the following command:

[mleeman@zee code]$ tar xjf busybox-1.20.2.tar.bz2

(If you downloaded a version other than 1.20.2, use the appropriate version number in this and other version-specific commands.)

7.2.3 Manual configuration

If you’re building an embedded device that has very specific needs, you can manually configure the contents of your Busybox with the menuconfig make target. If you’re familiar with building a Linux kernel, note that menuconfig is the same target for configuring the contents of the Linux kernel. In fact, the ncurses-based application is the same.

Using manual configuration, you can specify the commands to be included in the final Busybox image. You can also configure the Busybox environment, such as including support for the United States National Security Agency’s (NSA) Security-Enhanced Linux (SELinux), specifying the compiler to use (for cross-compiling in an embedded environment), and whether Busybox should be compiled statically or dynamically. Figure 7.1 shows the main screen for menuconfig. Here you can see the different major classes of applications (applets) that you can configure for Busybox.

[1] http://www.busybox.net

Figure 7.1: Busybox configuration screen.

To manually configure Busybox, use the following commands:

[mleeman@zee busybox-1.20.2]$ make \
CROSS_COMPILE=arm-linux- menuconfig
[mleeman@zee busybox-1.20.2]$ make \
CROSS_COMPILE=arm-linux-

This provides you with a Busybox binary that can be invoked. The next step is to build an environment around Busybox, including the symbolic links that redirect the standard Linux commands to the Busybox binary. You can do this very simply with the following command:


[mleeman@zee busybox-1.20.2]$ make \
CROSS_COMPILE=arm-linux- install

By default, a new local subdirectory is created, called _install, which contains the basic Linux environment. At the root, you’ll find a linuxrc program that links to BusyBox. The linuxrc program is useful when building an install or rescue disk (it permits a modularised boot). Also at the root is a /sbin subdirectory that contains operating system binaries (used primarily for administration), and a /bin subdirectory that contains binaries intended for users. You can then migrate this _install directory into your target environment when building a floppy distribution or embedded initial RAM disk. You can also use the CONFIG_PREFIX option with the make program to redirect the install subdirectory to a new location. For example, the following code segment installs the symlinks using the /tmp/newtarget root directory instead of the ./_install directory:

[mleeman@zee busybox-1.20.2]$ make \
CROSS_COMPILE=arm-linux- CONFIG_PREFIX=/tmp/newtarget install

The links that are created through the install make target come from the busybox.links file. This file is created when Busybox is compiled, and it contains the list of commands that have been configured. When install is performed, the busybox.links file is checked for the symlinks to create.
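What the install target does with busybox.links can be approximated in a few lines of shell. This is a sketch under the assumption that the file holds one absolute link path per line (for example /bin/ls); the function name install_links and its arguments are hypothetical, and the real install script handles more cases.

```sh
# install_links LINKS_FILE PREFIX TARGET
# Create one symbolic link per line of LINKS_FILE below PREFIX,
# each pointing at TARGET (the path of the busybox binary on the target).
install_links() {
    links_file=$1
    prefix=$2
    target=$3
    while IFS= read -r link; do
        [ -n "$link" ] || continue              # skip empty lines
        mkdir -p "$prefix$(dirname "$link")"    # e.g. PREFIX/usr/bin
        ln -sf "$target" "$prefix$link"
    done < "$links_file"
}
```

A call such as install_links busybox.links /tmp/newtarget /bin/busybox would then populate /tmp/newtarget with links that resolve correctly once that directory becomes the root of the target.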

The command links to Busybox can also be created dynamically at runtime using Busybox. The CONFIG_FEATURE_INSTALLER option enables this feature, which can be performed at runtime as follows:

[mleeman@zee busybox-1.20.2]$ ./busybox --install -s

The -s option forces symbolic links to be created (otherwise, hard links are created). This option requires that the /proc file system is present.

7.2.4 Hands On - Adding New Commands to BusyBox

In the following, a simple command will be added to busybox. Download the source tarball of the last release from the website and extract the sources on the server.

Even though busybox can be cross compiled, for debugging and integrating it works as well on the server platform as on the target platform. In order to minimise overhead, just compile busybox on the server and execute the binary on the server.

Adding a new command to BusyBox is simple because of its well-defined architecture. The first step is to choose a location for your new command’s source. Select the location based on the type of command (networking, shell, and so on), and be consistent with other commands. This is important because your new command will ultimately show up in the particular configuration menu for menuconfig (in this case, in the Miscellaneous Utilities menu).

For this example, I’ve called the new command newcmd and placed it in the ./miscutils directory. The new command’s source is shown here:

#include "libbb.h"

int newcmd_main(int argc, char **argv) MAIN_EXTERNALLY_VISIBLE;
int newcmd_main(int argc, char *argv[])
{
    int i;

    printf("newcmd called:\n");

    /* print every argument, starting with the applet name */
    for (i = 0; i < argc; i++) {
        printf("arg[%d] = %s\n", i, argv[i]);
    }

    return 0;
}

Next, add your new command source to the Kbuild file in the chosen subdirectory. In this example, I update ./miscutils/Kbuild. Add your new command in alphabetical order to maintain consistency with the existing commands:

lib-$(CONFIG_MT) += mt.o

lib-$(CONFIG_NEWCMD) += newcmd.o

lib-$(CONFIG_RAIDAUTORUN) += raidautorun.o

Next, update the configuration file, again within the ./miscutils directory, to make your new command visible within the configuration process. This file is called Config.in, and your new command is added in alphabetical order:


config NEWCMD
    bool "newcmd"
    default n
    help
      newcmd is a new test command.

This structure defines a new config entry (through the config keyword) and then the config option (CONFIG_NEWCMD). Your new command will either be enabled or disabled, so use the bool (Boolean) menu attribute for configuration. Its default is disabled (n for No), and you end with a short Help description. You can see the entire grammar for the configuration syntax in the source tree at ./scripts/config/Kconfig-language.txt.

Next, update the ./include/applets.src.h file to include your new command. Add the following line to this file, remembering to keep it in alphabetical order. It’s important to maintain this order, otherwise your command will not be found.

IF_NEWCMD(APPLET(newcmd, BB_DIR_USR_BIN, BB_SUID_DROP))

This defines your command name (newcmd), its function name in the Busybox source (newcmd_main), where the link will be created for this new command (in this case, in the /usr/bin directory), and, finally, whether the command has permissions to set the user id (in this case, no).

The penultimate step is to add detailed Help information to the ./include/usage.h file. As you’ll see from examples in this file, usage information can be quite verbose. In this case, I’ve just added a little information so I can build the new command:

#define newcmd_trivial_usage "None"

#define newcmd_full_usage "None"

#define newcmd_example_usage "$ newcmd arg1"

The final step is to enable your new command (through make menuconfig and then enable the option in the Miscellaneous Utilities menu) and then build Busybox with make.

With your new Busybox available, you can test your new command, as shown next:

[mleeman@zee busybox-1.20.2]$ ./busybox newcmd arg1

newcmd called:

arg[0] = newcmd

arg[1] = arg1

$ ./busybox newcmd --help

BusyBox v1.20.2 (2012.09.18-13:47+0000) multi-call binary

Usage: newcmd None

None

That’s it! The BusyBox developers made a tool that’s not only great but also simple to extend.

7.3 Dropbear

For many years telnet has been the means by which users logged onto remote computers. But telnet transmits data in plain readable text, which is readily intercepted by hackers.

There’s now a better choice: SSH (which stands for Secure Shell). SSH clients work just like traditional telnet clients. You can use SSH to do anything you might typically do with telnet, with the assurance that your password and other sensitive information are secure.

Dropbear is a popular open source SSH (version 2 only) server and client. It is optimised for small resource usage and therefore suitable for embedded systems and other resource-constrained environments. It is developed by Matt Johnston.

SSH isn’t just a replacement for telnet. Adding dropbear to your embedded system will add a lot of functionality to it: next to the secure shell access, you will get file transfer functionality with ssh, scp, sfs, ... For Windows users (e.g. support), WinSCP gives a simple end-user interface to the target, see Figure 7.2.

Figure 7.2: WinSCP, a drag and drop interface to your embedded target

All in all, there is no reason to stick with telnet.

7.4 Build Systems and Distributions

Compiling each component by hand and manually editing configuration files might work for really small systems, but it quickly gets tiring. Luckily, several options exist for automating it.

7.4.1 Buildroot

As described in chapter 2, Buildroot is a set of makefiles and patches to make it easy to generate embedded Linux systems. Besides creating cross toolchains, it can also compile Linux kernels and bootloaders and create file system images containing the entire userspace. The buildroot system doesn’t actually contain the source code of all the packages that it can build; instead, the Makefiles contain instructions which automatically download the needed source tarballs over the Internet.

Just like BusyBox, buildroot is also very easy to extend with extra packages.

7.5 Hands On - Explore Buildroot

Start the Buildroot menuconfig and have a look at the configuration options. Try to enable a few packages and let it build a rootfs. After this, burn the kernel that was built with buildroot and the filesystem onto the SheevaPlug and let it boot standalone.


Buildroot already supports the SheevaPlug [2]. In order to get all the images built within buildroot, enable compilation of the bootloader and configure it correctly (sheevaplug, kwb image format). Do the same for the Linux kernel (kirkwood).

For the flash layout, enable Flash Type (NAND flash with 2kB Page and 128 kB erasesize).

A careful inspection of the flash map and a bit of knowledge allows resetting the SheevaPlug to the factory settings. See section 4.7.1.4 for the discussion.

The NFS modified environment should look something like:

Marvell>>

Marvell>> printenv

bootcmd=dchp; bootm

baudrate=115200

ipaddr=10.175.196.18

x_bootargs=console=ttyS0,115200 mtdparts=orion_nand:512k(uboot),\


x_bootcmd_kernel=nand read 0x6400000 0x100000 0x300000

x_bootcmd_usb=usb start

x_bootargs_root=root=/dev/mtdblock3 rw rootfstype=jffs2

stdin=serial

stdout=serial

stderr=serial

ethact=egiga0

serverip=172.0.0.1

netmask=255.0.0.0

bootdelay=10

gatewayip=172.0.0.1

ethaddr=f0:ad:4e:00:a0:10

bootargs=console=ttyS0,115200 root=/dev/nfs \

rootfs=172.0.0.1:/home/services/nfs/mleeman/,tcp ip=dhcp

Environment size: 645/131068 bytes

Marvell>>

As discussed in section 4.7.1.4, the environment is located in the last 128 kB of the U-Boot partition, or from 384 kB (0x60000) onwards. In order to reset to the factory default, it is enough to erase the last NAND block.

Marvell>> nand device 0

Device 0: nand0... is now current device

Marvell>> nand erase 0x60000 0x20000

NAND erase: device 0 offset 0x60000, size 0x20000

Erasing at 0x60000 -- 100% complete.

OK

Marvell>>
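The offset passed to nand erase above follows from the flash map: the U-Boot partition is 512 kB (0x80000) and the environment occupies its last 128 kB erase block (0x20000). As a quick check, the arithmetic can be done on the development machine. The helper name env_offset is hypothetical, and the sketch assumes the partition starts at offset 0.

```sh
# env_offset PARTITION_SIZE ENV_SIZE
# Print the start of the environment, assuming it occupies the
# last ENV_SIZE bytes of a partition that starts at offset 0.
env_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

env_offset 0x80000 0x20000    # prints 0x60000
```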

The size of the environment is also clearly visible when executing printenv ("Environment size: 645/131068 bytes") and is typically one sector (128 kB with typical current flash chips). Verify by resetting.

Marvell>> reset

resetting ...

U-Boot 2010.06 (Jul 26 2010 - 10:36:14)

[2] In fact, it comes with a sensible sheevaplug_defconfig.


Marvell-Sheevaplug


DRAM: 512 MiB

NAND: 512 MiB


In: serial

Out: serial

Err: serial

Net: egiga0



Marvell>>

Marvell>> printenv

bootcmd=${x_bootcmd_kernel}; setenv bootargs ${x_bootargs} ${x_bootargs_root};\

${x_bootcmd_usb}; bootm 0x6400000;

bootdelay=3

baudrate=115200

ethaddr=04:25:fe:ed:00:18

ipaddr=10.175.196.18

serverip=10.175.196.221

gatewayip=10.175.196.1

netmask=255.255.255.0

x_bootargs=console=ttyS0,115200 mtdparts=orion_nand:512k(uboot),\





stdin=serial

stdout=serial

stderr=serial

ethact=egiga0


The bootloader indicates that no valid configuration was found in flash, and all settings are reset to the hardcoded defaults. First off, reset the network related customisations:

Marvell>> setenv ethaddr f0:ad:4e:00:a0:10


Marvell>> setenv gatewayip 172.0.0.1

Marvell>> setenv netmask 255.0.0.0

Marvell>> saveenv

Saving Environment to NAND...

Erasing Nand...


Writing to Nand... done

Marvell>>

In the flash map, the kernel is located at 0x100000 with a size of 0x300000. Even though it was cleared in a previous step, clear only the relevant area.





Erasing at 0x3e0000 -- 100% complete.

OK

Marvell>>

Make certain that the kernel is located on the TFTP server; fetch it into memory and burn it to the flash.

Marvell>> tftp 0x8000000 /mleeman/uImage

Using egiga0 device

TFTP from server 10.0.0.21; our IP address is 10.175.196.18

Filename ’/mleeman/uImage’.

Load address: 0x8000000

Loading: #################################################################

#################################################################

############################

done

Bytes transferred = 2309020 (233b9c hex)

Marvell>> nand write.e 0x8000000 0x100000 0x300000

NAND write: device 0 offset 0x100000, size 0x300000

3145728 bytes written: OK
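Before writing an image it is worth checking that it fits in the partition reserved for it in the flash map. The sketch below uses the numbers from the transfer above (2309020 bytes into a 0x300000 byte kernel partition, 128 kB erase blocks); the function names are hypothetical.

```sh
# fits_partition IMAGE_BYTES PARTITION_BYTES
# Succeeds (exit status 0) when the image is not larger than the partition.
fits_partition() {
    [ $(( $1 )) -le $(( $2 )) ]
}

# round_up_block BYTES BLOCK_BYTES
# Print BYTES rounded up to a whole number of erase blocks, in hex.
round_up_block() {
    printf '0x%x\n' $(( ($1 + $2 - 1) / $2 * $2 ))
}

if fits_partition 2309020 0x300000; then
    echo "kernel fits"
fi
round_up_block 2309020 0x20000    # prints 0x240000
```

The rounded value is the smallest block-aligned size that covers the image; the transcript above simply writes the whole 0x300000 partition instead.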

The same is done for the root filesystem. Initially, the jffs2 filesystem is used. In the flash map, it is located at 0x500000 and occupies 13 MB (0xd00000).

Marvell>> nand erase 0x500000 0xD00000

NAND erase: device 0 offset 0x500000, size 0xd00000


OK

Marvell>> tftp 0x8000000 /mleeman/rootfs.jffs2

Using egiga0 device


Filename ’/mleeman/rootfs.jffs2’.


Loading: #################################################################

done

Bytes transferred = 939840 (e5740 hex)

Marvell>> nand write.e 0x8000000 0x500000 0xD00000

NAND write: device 0 offset 0x500000, size 0xd00000


Hit reset and watch the boot process.

Marvell>> reset

....

Initializing random number generator... done.

Starting network...

The last lines are starting the network. There is no serial console attached; have a look in buildroot in output/target/etc/inittab and output/target/etc/network/interfaces and determine why.


Also, enable dropbear in the buildroot network tools and rerun the build. Burn the resulting image to the sheeva again.

After setting the root password to something not empty (dropbear does not allow an empty password for remote access), ssh can be used to log in.

On the target via serial:

buildroot login: root

# passwd

Changing password for root

New password:

Retype password:

Password for root changed by root

From a remote machine:

[marc@staleek ~]$ ssh root@10.4.1.193

The authenticity of host ’10.4.1.193 (10.4.1.193)’ can’t be established.

RSA key fingerprint is b1:77:a7:c9:29:ab:1a:1c:81:f1:80:f2:d2:d0:2b:22.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added ’10.4.1.193’ (RSA) to the list of known hosts.

root@10.4.1.193’s password:

#

A next step is enabling squashfs on the target. Verify that the filesystem is compiled into the Linux kernel (output/build/linux-2.6.37/) with the usual make ARCH=arm menuconfig and recompile the image, either from within the kernel source or by letting buildroot handle the changes; reflash and reboot the system to verify the kernel. If all is fine, replace the jffs2 filesystem in flash with the squashfs filesystem.

We also need to modify the boot arguments and let the kernel detect the root filesystem type (squashfs); since squashfs is read only, adjust the permissions.

Marvell>> setenv x_bootargs_root root=/dev/mtdblock3 ro

Marvell>> saveenv


Erasing Nand...



Marvell>>

Reboot and verify the changes. Note that due to the changes, dropbear will not work since it cannot store the keys. The easiest way to circumvent this is to add the dropbear directory in the buildroot target and mount a tmpfs on /etc/dropbear (see /etc/fstab on the target and copy/adjust the entry for /tmp).
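One way to add such an entry is sketched below. The mount options shown are an assumption modelled on a generic tmpfs line; compare with the existing /tmp entry in the target's /etc/fstab before using it. The function name add_tmpfs_entry is hypothetical.

```sh
# add_tmpfs_entry FSTAB MOUNTPOINT
# Append a tmpfs line for MOUNTPOINT to FSTAB unless one is already there.
add_tmpfs_entry() {
    fstab=$1
    mountpoint=$2
    if ! grep -q "^tmpfs[[:space:]]\{1,\}$mountpoint[[:space:]]" "$fstab" 2>/dev/null; then
        # fields: device, mount point, type, options (assumed), dump, pass
        printf 'tmpfs\t%s\ttmpfs\tdefaults\t0\t0\n' "$mountpoint" >> "$fstab"
    fi
}
```

In the buildroot tree this could be invoked as add_tmpfs_entry output/target/etc/fstab /etc/dropbear before the image is regenerated. Keys stored on a tmpfs are lost at power-off, so the host key changes on every boot.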

For completeness, replace the bootloader without the previously used OpenOCD. The choice can be made to replace it and delete the settings, or to replace it while retaining the changes in the environment by careful selection of the erase size (0x80000 vs 0x60000).

U-Boot 2010.06 (Jul 26 2010 - 10:36:14)

Marvell-Sheevaplug


DRAM: 512 MiB

NAND: 512 MiB

In: serial

Out: serial


Err: serial

Net: egiga0



Marvell>>






OK

Marvell>> tftp 0x8000000 /mleeman/u-boot.kwb

Using egiga0 device


Filename ’/mleeman/u-boot.kwb’.


Loading: ###########################

done

Bytes transferred = 382608 (5d690 hex)




Marvell>> reset

resetting ...

U-Boot 2010.12 (Jan 18 2011 - 11:44:24)

Marvell-Sheevaplug


DRAM: 512 MiB

NAND: 512 MiB

*** Warning - bad CRC, using default environment

In: serial

Out: serial

Err: serial

Net: egiga0



Marvell>>

7.6 References

• http://en.wikipedia.org/wiki/Userspace

• http://en.wikipedia.org/wiki/Userland

• http://www-128.ibm.com/developerworks/linux/library/l-busybox/?ca=dgr-lnxw06BusyBox

• BusyBox: http://busybox.net

• Dropbear: http://matt.ucc.asn.au/dropbear/dropbear.html

• GNU/Debian: http://www.debian.org

Chapter 8

Creating an image with a full Linux system

8.1 Introduction

Up until relatively recently, running GNU/Linux on an embedded system typically included minimising the system to match the reduced resource availability (e.g. memory and flash).

Over the last years, flash memory has decreased dramatically in price, while continuously increasing in size. Furthermore, the performance of the processors that are being used, while still very much System on Chip (SoC), is often on par with low end desktop systems.

On top of that, the behaviour and required functionality of the systems that are controlled with embedded Linux become more complex and intelligent.

As a result, a consensus seems to be forming that even on these devices, a full blown operating system needs to be used.

Indeed, the Sheeva Plug already includes a full blown Linux system when shipped from the factory.

In this chapter, a GNU/Debian Linux system will be built and loaded on the Sheeva Plug. When the device is accessed from the network, there is little or no difference with e.g. server systems at first glance, apart from the raw performance.

8.2 Preparing an ARM GNU/Debian based system on a GNU/Debian based build system

debootstrap is used to create a foreign root filesystem for the armel architecture. Note that this is the GNU/Debian architecture name (armel) and not the kernel one (arm).

Contrary to the previous sections where most, if not all, of the code needed to be compiled, it is now possible to leverage the large amount of available and readily packaged deliverables in GNU/Debian. The first task is thus to compose a root filesystem by combining the packages available in GNU/Debian.

GNU/Debian has a good tool to create a base root filesystem from scratch: debootstrap (a shell script) or cdebootstrap (a rewrite in C).

$ mkdir sheeva-armel-wheezy/

$ sudo debootstrap --verbose --include=apt --arch armel \

--foreign wheezy sheeva-armel-wheezy/ \

ftp://ftp.nl.debian.org/debian \

/usr/share/debootstrap/scripts/wheezy


This creates a first phase root filesystem: after this point, the system needs to be booted into this base system, and configuration and tweaking happen on the system itself.

However, with the advent of multiarch in GNU/Debian, this second phase can also be done on the development system. Multiarch is the term used to refer to the capability of a system to install and run applications of multiple different binary targets on the same system, for example running an i386-linux-gnu application on an amd64-linux-gnu system. This example is the most common case, but many other working combinations are possible, such as armel and armhf.

Multiarch also simplifies cross-building, where foreign-architecture libraries and headers are needed on a system during building.

This second phase is optional; it can be done while preparing the target or it can be done when the target is being booted for the first time (on the target). If the distribution that is being used does not support multiarch, the latter approach will be the preferred one.

Since at this point binaries will be run that are compiled for a different architecture, the ARM instruction set needs to be translated at run time to x86 instructions. Multiarch foresees this, but in order to do this in a changeroot environment, emulation needs to be provided in this chroot:

$ sudo apt-get install qemu-user-static

$ sudo cp /usr/bin/qemu-arm-static sheeva-armel-wheezy/usr/bin/

Next, the second phase of the installer is called with ARM emulation:

$ sudo chroot sheeva-armel-wheezy/ /debootstrap/debootstrap --second-stage

When this step is finished, a working armel root filesystem has been created. It is also possible to add extra packages by simply chrooting into the newly created system and using apt to install packages from the network.

At this point, the running system kernel has been used (typically x86). In order to boot the system on the ARM target, a kernel needs to be provided for ARM.

8.3 Preparing an ARM GNU/Debian based system on a Non GNU/Debian based build system

In case a distribution does not support multiarch, some extra work needs to be done: the second phase of the installer is run on the sheevaplug, off a USB stick. To this end, the bootloader needs to be configured to run the filesystem off the USB stick and a kernel needs to be provided to boot from the USB stick. This filesystem is then used to create the final system to run on the NAND flash.

There are also other options available here (e.g. telnet into the GNU/Debian installer).

• http://crichton.homelinux.org/~marc/downloads/sheevaplug-linux-config-boot-from-usb.xz

In the first phase, the initial base root filesystem install needs to be finished. Since the final goal is to have a clean Wheezy image that allows simple flashing onto the internal NAND flash, this is an intermediary step. As such, a simple USB stick is used to store the filesystem on, and the current kernel needs to support booting from USB with an ext2 filesystem.

Even though buildroot can perfectly compile the kernel, this is done out of tree, using a separate copy of the kernel. Download the Linux kernel from kernel.org (http://www.kernel.org), extract it and add a config with support for booting from USB.

$ wget http://crichton.homelinux.org/~marc/downloads/sheevaplug-linux-config-boot-from-usb.xz

$ unxz sheevaplug-linux-config-boot-from-usb.xz

$ cp sheevaplug-linux-config-boot-from-usb linux-3.1.4/.config

$ cd linux-3.1.4

$ make ARCH=arm CROSS_COMPILE=arm-linux- uImage

• http://crichton.homelinux.org/~marc/downloads/sheevaplug-u-boot.kwb.xz

• http://crichton.homelinux.org/~marc/downloads/sheevaplug-uImage-boot-from-usb.xz

Use a VFAT formatted memory stick and copy the bootloader file and the kernel image file onto it. Connect the SheevaPlug to your machine with the USB cable and power it up. Make certain you are ready to interrupt the boot cycle with your favourite serial terminal (e.g. minicom, screen). The serial settings are 115200 8N1 on device node /dev/ttyUSB0. The bootloader only needs to be replaced here if it was not yet done before.

$ screen /dev/ttyUSB0 115200

nand device 0

usb start

Erase the bootloader location and burn the newly compiled bootloader onto the NAND flash.

nand erase 0x0 0xa0000

fatload usb 0:1 0x8000000 /u-boot.kwb

nand write.e 0x8000000 0x0 0xa0000

reset

At this point, the system should boot with the newly compiled bootloader. Make certain to interrupt the boot cycle before the counter hits 0.

resetting ...

U-Boot 2011.06 (Dec 01 2011 - 13:37:18)

Marvell-Sheevaplug


DRAM: 512 MiB

NAND: 512 MiB

In: serial

Out: serial

Err: serial

Net: egiga0



Marvell>>

In a next step, the existing kernel is removed and replaced by the USB capable kernel.

nand erase 0x100000 0x400000

fatload usb 0:1 0x8000000 /uImage

nand write.e 0x8000000 0x100000 0x400000

In order to boot from the USB stick, the behaviour of the bootloader is adjusted. The following commands will erase settings and modify them to boot from the USB stick. Finally, the settings are saved (verify with printenv).

setenv bootargs

setenv boot_nand

setenv bootargs_console 'console=ttyS0,115200'

setenv bootargs_root 'root=/dev/sda1 rootdelay=10'

setenv bootcmd_usb 'usb start; nand read.e 0x2000000 0x100000 0x400000'

setenv bootcmd 'setenv bootargs $(bootargs_console) $(bootargs_root); \

run bootcmd_usb; bootm 0x2000000'

saveenv

reset

The system reboots, the kernel starts and mounts the USB based filesystem. Instead of offering the user a shell, the installation of GNU/Debian Wheezy is continued: packages will be installed (already available on the image). If your installation stops for some reason, there is some housekeeping to do; if all goes fine, a brand new Wheezy system will be running off your USB stick.

If the installation stops with e.g. a kernel panic, some cleaning up needs to be done on the filesystem. Start the system and get dropped into a shell on the serial console. If this fails, a lot of the modifications can be done on any other Linux system by mounting the stick as ext2:

$ sudo mount /dev/sdb1 /media/usb0

$ su -

# cd /media/usb0/

# mv sbin/init.REAL sbin/init

8.4 Customising the ARM root filesystem

Before burning the file system on a headless device, some improvements can be made.

When multiarch is used, it is possible to simply chroot into the filesystem (as was done for executing the second phase of debootstrap).

$ sudo chroot sheeva-armel-wheezy/

Make certain that init is spawning a serial console. There are two options: one with a login (first line) or spawning a bash shell in any case (second line). A good option is to use bash initially, but replace it with a login once the system is properly configured.

# T0:23:respawn:/sbin/getty -L ttyS0 115200 linux

T0:23:respawn:/bin/bash

There are a number of things that need further taking care of like:

• Set the root password (passwd).

Finally, add a proper network configuration in /etc/network/interfaces and add a correct /etc/fstab:

# cat /etc/fstab

# /etc/fstab: static file system information.

#

# <file system> <mount point> <type> <options> <dump> <pass>

# tmpfs /var/cache/apt tmpfs defaults,noatime

proc /proc proc rw,noexec,nosuid,nodev 0 0

/dev/root / rootfs rw 0 0

and for the network configuration, default back to DHCP:

# cat /etc/network/interfaces

# The loopback network interface

auto lo


iface lo inet loopback

# The primary network interface

allow-hotplug eth0

iface eth0 inet dhcp

Currently, only the root user is available on the system. Create a default user, e.g. knx/knx. Note that this user will also be created when installing the knx packages.

# adduser knx

# mkdir -p /lib/init/rw/

Since part of the filesystem is mounted in memory (see /etc/fstab), some crucial directories are not created by default. These directories can be created at boot time. Add the following lines to /etc/rc.local.

mkdir -p /var/log/knx/

mkdir -p /var/run/knx/

chown knx.nogroup /var/log/knx/

chown knx.nogroup /var/run/knx/

mkdir -p /var/cache/apt/archives/partial

Before finishing up; some extra packages are installed:

# echo "deb ftp://ftp.nl.debian.org/debian wheezy main contrib non-free" \

> /etc/apt/sources.list.d/debian.list

# apt-get update

# apt-get install openssh-server ntp vim

# apt-get clean

Before continuing, some minor modifications are done. In order to clean up the target system, qemu can be removed if present. Exit the chroot if used (exit).

$ sudo rm sheeva-armel-wheezy/usr/bin/qemu-arm-static

Create an image of the root filesystem that will be used to finish the installation (mind the dot at the end):

$ sudo tar cfJ /home/marc/sheevaplug-images/sheeva-armel-wheezy-step1.tar.xz \

-C sheeva-armel-wheezy/ .

At this point, the clean root filesystem is available.

8.5 Starting up: Compiling the Linux kernel for NAND boot

At this point, a working GNU/Debian filesystem is available. All that remains is to prepare a kernel that boots off the NAND flash and wrap up the filesystem into an image to use on the flash.

• http://crichton.homelinux.org/~marc/downloads/sheevaplug-modules-3.1.4.tar.xz

• http://crichton.homelinux.org/~marc/downloads/sheevaplug-uImage-boot-from-nand.xz



At this point, a kernel is required that boots from NAND. This means that at least the required functionality to detect the hardware and to boot off NAND with the filesystem needs to be compiled into the kernel.

Contrary to the stock kernel, the kernel to be compiled uses ubifs, a filesystem fitted for flash storage, instead of the older jffs2. JFFS2 was designed for small flashes and does not handle the current large flashes very well, resulting in block balancing problems.

$ wget ftp://ftp.kernel.org/pub/linux/kernel/v3.0/linux-3.1.4.tar.bz2

$ tar xf linux-3.1.4.tar.bz2

$ cd linux-3.1.4

$ wget http://crichton.homelinux.org/~marc/downloads/sheevaplug-linux-config-boot-from-mtd.xz

$ unxz sheevaplug-linux-config-boot-from-mtd.xz

$ mv sheevaplug-linux-config-boot-from-mtd .config

$ make ARCH=arm CROSS_COMPILE=arm-linux- uImage

$ make ARCH=arm CROSS_COMPILE=arm-linux- modules

Note that the modules can be installed immediately into the ARM filesystem instead of using an intermediate archive. However, the archive is handier when a running system is to be updated rather than created from scratch.

The module selection in this example configuration is pretty broad.

$ sudo make INSTALL_MOD_PATH=/home/marc/sheevaplug-build/sheeva-armel-wheezy/ \

ARCH=arm CROSS_COMPILE=arm-linux- modules_install

$ sudo tar cfJ /home/marc/sheevaplug-images/sheeva-armel-wheezy-target.tar.xz \

-C sheeva-armel-wheezy/ .

Or with an intermediate archive:

$ mkdir /home/marc/sheevaplug-build/modules-3.1.4/

$ sudo make INSTALL_MOD_PATH=/home/marc/sheevaplug-build/modules-3.1.4/ \

ARCH=arm CROSS_COMPILE=arm-linux- modules_install

$ sudo tar cfJ /home/marc/sheevaplug-images/sheevaplug-modules-3.1.4.tar.xz \

-C /home/marc/sheevaplug-build/modules-3.1.4/ .

At this point, everything needed to create the final images is available.

8.6 Starting up: Creating the base root filesystem image

• http://crichton.homelinux.org/~marc/downloads/sheevaplug-armel-wheezy-target.tar.xz

• http://crichton.homelinux.org/~marc/downloads/sheevaplug-armel-wheezy-ubi.img.xz

After creating a filesystem and compiling the kernel with modules, the two need to be combined. Either mount the USB stick and extract the modules into the filesystem on it, or extract them into the root filesystem staging directory.

$ cd /media/usb1

$ sudo tar xf /home/marc/sheevaplug-images/sheevaplug-modules-3.1.4.tar.xz

$ sudo find . | while read -r i ; do sudo touch "$i"; done

The filesystem on the memory stick is now a perfect image of what needs to be burnt into flash. Before continuing, create a snapshot. The same technique is used as for creating an image of the modules.

First, an image is created of the filesystem on the USB stick (for backup purposes).




$ sudo tar cfJ /home/marc/sheevaplug-images/sheeva-armel-wheezy-target.tar.xz \

-C /media/usb1/ .

Finally, create the filesystem image to flash. The utilities are a part of mtd-utils.

$ sudo mkfs.ubifs -r sheeva-armel-wheezy -m 2048 -e 129024 \

-c 4096 -o ubifs.img -x zlib

[mleeman@bane debian-rootfs]$ cat ubi.cfg

[ubifs]

mode=ubi

image=ubifs.img

vol_id=0

vol_size=256MiB

vol_type=dynamic

vol_name=rootfs

vol_flags=autoresize

[mleeman@bane debian-rootfs]$ sudo ubinize -o ubi.img -m 2048 \

-p 128KiB -s 512 ubi.cfg
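The value passed to mkfs.ubifs with -e is not arbitrary: it is the logical eraseblock (LEB) size, i.e. the physical eraseblock size given to ubinize with -p minus the space UBI keeps for its own headers at the start of each block. The sketch below redoes that arithmetic. The helper name ubi_leb_size is only illustrative, and the data offset of 2048 bytes is taken from the UBI messages in the boot log of section 8.8 rather than derived here; on a flash without sub-page support the offset would be larger.

```shell
# LEB size = physical eraseblock (PEB) size minus the UBI data offset.
# Assumption: the data offset (2048) is the one UBI reports at boot for
# this flash with 2048-byte pages and 512-byte sub-pages.
ubi_leb_size() {
    peb=$1          # physical eraseblock size in bytes (ubinize -p)
    data_offset=$2  # offset in bytes at which the LEB data starts
    echo $(( peb - data_offset ))
}

ubi_leb_size 131072 2048    # prints 129024, the value used for mkfs.ubifs -e
```

If the two tools are given inconsistent values, the image will typically be rejected when it is attached, so it is worth recomputing this number whenever the flash geometry changes.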

8.7 Install GNU/Debian Wheezy on the internal NAND flash

8.7.1 Files

• http://crichton.homelinux.org/~marc/downloads/sheevaplug-u-boot.kwb.xz

• http://crichton.homelinux.org/~marc/downloads/sheevaplug-uImage-boot-from-nand.xz

• http://crichton.homelinux.org/~marc/downloads/sheevaplug-armel-wheezy-ubi.img.xz

At this point, the final images are created. There are several ways of getting the images onto the SheevaPlug, but using a USB stick is probably the simplest (an alternative is e.g. TFTP).

Place the 3 images on a VFAT-formatted USB stick:

1. u-boot.kwb

2. uImage

3. ubi.img

Initialise the attached USB drive. Note that I did this with a USB disk that contains a partition; I don’t know whether the bootloader will recognise one that does not have a partition.

nand device 0

usb start

setenv quiet 0

1. Write U-Boot. This is not strictly required, but strongly recommended in order to get better USB support than with the standard firmware. After writing the bootloader, press reset and let the plug start the new bootloader.


fatload usb 0:1 0x8000000 /u-boot.kwb


reset







As expected, the new bootloader is activated:

U-Boot 2012.04.01 (Sep 28 2012 - 14:52:25)

Marvell-Sheevaplug


DRAM: 512 MiB

WARNING: Caches not enabled

NAND: 512 MiB

*** Warning - bad CRC, using default environment

In: serial

Out: serial

Err: serial

Net: egiga0



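2. Write the kernel to the NAND flash. As with the other two images, uImage has to be loaded from the USB stick to address 0x8000000 before it is written.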

nand device 0

usb start

setenv quiet 0

nand erase 0x100000 0x400000


nand write.e 0x8000000 0x100000 0x400000
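Before writing, it can be worth checking on the build host that the kernel image actually fits in the 0x400000 bytes (4 MiB) erased above. A minimal sketch; the function name fits_in_region is hypothetical and GNU stat is assumed for reading the size of a real file:

```shell
# Succeeds when a file of <size> bytes fits in a flash region of <region> bytes.
# The region may be given in hex, as in the U-Boot commands.
fits_in_region() {
    [ "$1" -le "$(( $2 ))" ]
}

# 1951104 is the kernel data size reported in the boot log further on;
# the uImage file itself adds a 64-byte header to that.
if fits_in_region 1951104 0x400000; then
    echo "kernel fits"
else
    echo "kernel too large"
fi
# With a real file: fits_in_region "$(stat -c %s uImage)" 0x400000
```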

Finally, write the UBIFS root filesystem:

nand erase 0x500000 0x1fb00000

fatload usb 0:1 0x8000000 /ubi.img

nand write.e 0x8000000 0x500000 0x12000000

The write size is the full flash; if there are too many bad blocks, just make the write size a bit smaller (e.g. 0x1f000000). Don’t worry too much about the bad blocks; bad blocks are pretty normal for low-cost NAND flash (in comparison with more expensive NOR flash). A better option is to quickly convert the size of the image into hex and use that as the write size:

$ echo "obase=16; ibase=10; 300000000" | bc
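Typing a size by hand is error-prone, so the conversion can be wrapped in a small helper that also rounds the size up to a whole number of erase blocks. This is only a sketch: the function name nand_write_size is made up for this example, rounding to a block boundary is a convenience rather than something this text requires, the 128 KiB default matches the -p 128KiB passed to ubinize earlier, and GNU stat is assumed for the file-size variant.

```shell
# Round a size in bytes up to a whole number of NAND erase blocks and
# print it in hex, ready to paste into a U-Boot "nand write" command.
# Assumption: 128 KiB (131072-byte) erase blocks unless told otherwise.
nand_write_size() {
    size=$1
    block=${2:-131072}
    printf '0x%x\n' $(( (size + block - 1) / block * block ))
}

nand_write_size 300000000      # prints 0x11e20000
# With a real file: nand_write_size "$(stat -c %s ubi.img)"
```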

In some cases, the serial console just hangs after writing. In this case, reset the device and continue the configuration from the bootloader shell.

8.8 Booting into the final system

Wrapping up, tell the bootloader to start our system by default and configure it to boot a mainline kernel.

setenv bootargs 'console=ttyS0,115200 ubi.mtd=2 root=ubi0:rootfs rootfstype=ubifs'

setenv boot_nand 'nand read.e 0x2000000 0x100000 0x400000'

setenv bootcmd 'run boot_nand; bootm 0x2000000'

setenv ethaddr f0:ad:4e:00:36:62

saveenv

reset


If the plug was reflashed, the MAC address must be set again.

Remove the memory stick (not required) and reboot. If all works out, the following output should appear:

Marvell>> reset

resetting ...

U-Boot 2012.04.01 (Sep 28 2012 - 14:52:25)

Marvell-Sheevaplug


DRAM: 512 MiB

WARNING: Caches not enabled

NAND: 512 MiB

In: serial

Out: serial

Err: serial

Net: egiga0



NAND read: device 0 offset 0x100000, size 0x400000

4194304 bytes read: OK

## Booting kernel from Legacy Image at 02000000 ...

Image Name: Linux-3.1.4

Image Type: ARM Linux Kernel Image (uncompressed)

Data Size: 1951104 Bytes = 1.9 MiB

Load Address: 00008000

Entry Point: 00008000

Verifying Checksum ... OK

Loading Kernel Image ... OK

OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.

[ 0.000000] Initializing cgroup subsys cpuset

[ 0.000000] Initializing cgroup subsys cpu

[ 0.000000] Linux version 3.1.4 (mleeman@zee) (gcc version 4.5.4 (Buildroot 2012.08) ) #1 Fri Sep 28 15:09:46 CEST 2012

[ 0.000000] CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977

[ 0.000000] CPU: VIVT data cache, VIVT instruction cache

[ 0.000000] Machine: Marvell SheevaPlug Reference Board

[ 0.000000] Memory policy: ECC disabled, Data cache writeback

[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 130048

[ 0.000000] Kernel command line: console=ttyS0,115200 ubi.mtd=2 root=ubi0:rootfs rootfstype=ubifs

[ 0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)

[ 0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)

[ 0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)

[ 0.000000] Memory: 512MB = 512MB total

[ 0.000000] Memory: 515740k/515740k available, 8548k reserved, 0K highmem

[ 0.000000] Virtual kernel memory layout:

[ 0.000000] vector : 0xffff0000 - 0xffff1000 ( 4 kB)

[ 0.000000] fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB)


[ 0.000000] DMA : 0xffc00000 - 0xffe00000 ( 2 MB)

[ 0.000000] vmalloc : 0xe0800000 - 0xfe800000 ( 480 MB)

[ 0.000000] lowmem : 0xc0000000 - 0xe0000000 ( 512 MB)

[ 0.000000] modules : 0xbf000000 - 0xc0000000 ( 16 MB)

[ 0.000000] .text : 0xc0008000 - 0xc037392c (3503 kB)

[ 0.000000] .init : 0xc0374000 - 0xc0396000 ( 136 kB)

[ 0.000000] .data : 0xc0396000 - 0xc03bb840 ( 151 kB)

[ 0.000000] .bss : 0xc03bb864 - 0xc03ebb78 ( 193 kB)

[ 0.000000] NR_IRQS:114

[ 0.000000] sched_clock: 32 bits at 200MHz, resolution 5ns, wraps every 21474ms

[ 0.000000] Console: colour dummy device 80x30

[ 14.872095] Calibrating delay loop... 1191.11 BogoMIPS (lpj=5955584)

[ 14.961988] pid_max: default: 32768 minimum: 301

[ 14.962112] Security Framework initialized

[ 14.962133] SELinux: Disabled at boot.

[ 14.962193] Mount-cache hash table entries: 512

[ 14.962504] Initializing cgroup subsys cpuacct

[ 14.962529] Initializing cgroup subsys devices

[ 14.962541] Initializing cgroup subsys freezer

[ 14.962550] Initializing cgroup subsys net_cls

[ 14.962623] CPU: Testing write buffer coherency: ok

[ 14.965501] print_constraints: dummy:

[ 14.965920] NET: Registered protocol family 16

[ 14.968463] Kirkwood: MV88F6281-A1, TCLK=200000000.

[ 14.968477] Feroceon L2: Enabling L2

[ 14.968513] Feroceon L2: Cache support initialised.

[ 14.976906] bio: create slab <bio-0> at 0

[ 14.977475] vgaarb: loaded

[ 14.978841] Switching to clocksource orion_clocksource

[ 14.981971] Switched to NOHz mode on CPU #0


[ 14.991022] IP route cache hash table entries: 4096 (order: 2, 16384 bytes)

[ 14.991628] TCP established hash table entries: 16384 (order: 5, 131072 bytes)

[ 14.991981] TCP bind hash table entries: 16384 (order: 4, 65536 bytes)

[ 14.992161] TCP: Hash tables configured (established 16384 bind 16384)

[ 14.992170] TCP reno registered

[ 14.992180] UDP hash table entries: 256 (order: 0, 4096 bytes)

[ 14.992202] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)


[ 14.992557] NetWinder Floating Point Emulator V0.97 (double precision)

[ 14.993418] audit: initializing netlink socket (disabled)

[ 14.993452] type=2000 audit(0.100:1): initialized

[ 15.004481] VFS: Disk quotas dquot_6.5.2

[ 15.004574] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)

[ 15.004679] JFFS2 version 2.2. (NAND) (SUMMARY) © 2001-2006 Red Hat, Inc.

[ 15.004992] msgmni has been set to 1007

[ 15.005687] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)

[ 15.005702] io scheduler noop registered

[ 15.005709] io scheduler deadline registered

[ 15.005756] io scheduler cfq registered (default)

[ 15.005808] mv_xor_shared mv_xor_shared.0: Marvell shared XOR driver

[ 15.005835] mv_xor_shared mv_xor_shared.1: Marvell shared XOR driver

[ 15.038902] mv_xor mv_xor.0: Marvell XOR: ( xor cpy )

[ 15.078901] mv_xor mv_xor.1: Marvell XOR: ( xor fill cpy )


[ 15.118899] mv_xor mv_xor.2: Marvell XOR: ( xor cpy )

[ 15.158900] mv_xor mv_xor.3: Marvell XOR: ( xor fill cpy )

[ 15.159432] Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled

[ 15.180124] serial8250.0: ttyS0 at MMIO 0xf1012000 (irq = 33) is a 16550A

[ 15.586995] console [ttyS0] enabled

[ 15.597588] brd: module loaded

[ 15.601915] NAND device: Manufacturer ID: 0xec, Chip ID: 0xdc (Samsung NAND 512MiB 3,3V 8-bit)

[ 15.610598] Scanning device for bad blocks

[ 15.731337] Bad eraseblock 1571 at 0x00000c460000

[ 15.923381] Creating 3 MTD partitions on "orion_nand":

[ 15.928542] 0x000000000000-0x000000100000 : "u-boot"

[ 15.934515] uncorrectable error :

[ 15.938355] 0x000000100000-0x000000500000 : "uImage"

[ 15.944496] ftl_cs: FTL header not found.

[ 15.949128] 0x000000500000-0x000020000000 : "root"

[ 15.955333] ftl_cs: FTL header not found.

[ 15.960349] UBI: attaching mtd2 to ubi0

[ 15.964206] UBI: physical eraseblock size: 131072 bytes (128 KiB)

[ 15.970518] UBI: logical eraseblock size: 129024 bytes

[ 15.975939] UBI: smallest flash I/O unit: 2048

[ 15.980666] UBI: sub-page size: 512

[ 15.985304] UBI: VID header offset: 512 (aligned 512)

[ 15.991171] UBI: data offset: 2048

[ 16.570565] UBI: max. sequence number: 0

[ 16.609261] UBI: volume 0 ("rootfs") re-sized from 2081 to 4011 LEBs

[ 16.616206] UBI: attached mtd2 to ubi0

[ 16.619992] UBI: MTD device name: "root"

[ 16.624890] UBI: MTD device size: 507 MiB

[ 16.629887] UBI: number of good PEBs: 4055

[ 16.634612] UBI: number of bad PEBs: 1

[ 16.639076] UBI: number of corrupted PEBs: 0

[ 16.643533] UBI: max. allowed volumes: 128

[ 16.648163] UBI: wear-leveling threshold: 4096

[ 16.652888] UBI: number of internal volumes: 1

[ 16.657345] UBI: number of user volumes: 1

[ 16.661808] UBI: available PEBs: 0

[ 16.666266] UBI: total number of reserved PEBs: 4055

[ 16.671252] UBI: number of PEBs reserved for bad PEB handling: 40

[ 16.677366] UBI: max/mean erase counter: 1/0

[ 16.681655] UBI: image sequence number: 0

[ 16.685826] UBI: background thread "ubi_bgt0d" started, PID 282

[ 16.699067] mv643xx_eth: MV-643xx 10/100/1000 ethernet driver version 1.4

[ 16.706021] mv643xx_eth smi: probed

[ 16.739309] mv643xx_eth_port mv643xx_eth_port.0: eth0: port 0 with MAC address f0:ad:4e:00:36:62

[ 16.759239] mousedev: PS/2 mouse device common for all mice

[ 16.778964] rtc-mv rtc-mv: rtc core: registered rtc-mv as rtc0

[ 16.785061] i2c /dev entries driver

[ 16.799290] cpuidle: using governor ladder

[ 16.818941] cpuidle: using governor menu

[ 16.830217] TCP cubic registered


[ 16.837941] Registering the dns_resolver key type

[ 16.859193] registered taskstats version 1

[ 16.863774] rtc-mv rtc-mv: setting system clock to 2012-09-28 16:11:21 UTC (1348848681)


[ 17.099480] UBIFS: mounted UBI device 0, volume 0, name "rootfs"

[ 17.105523] UBIFS: file system size: 516096000 bytes (504000 KiB, 492 MiB, 4000 LEBs)

[ 17.113578] UBIFS: journal size: 9033728 bytes (8822 KiB, 8 MiB, 71 LEBs)

[ 17.120929] UBIFS: media format: w4/r0 (latest is w4/r0)

[ 17.126787] UBIFS: default compressor: zlib

[ 17.130990] UBIFS: reserved for root: 0 bytes (0 KiB)

[ 17.136873] VFS: Mounted root (ubifs filesystem) on device 0:12.

[ 17.143322] Freeing init memory: 136K

INIT: version 2.88 booting

[info] Using makefile-style concurrent boot in runlevel S.

[ ok ] Starting the hotplug events dispatcher: udevd.

[....] Synthesizing the initial hotplug events...[ 20.119208] usbcore: registered new interface driver usbfs

[ 20.124817] usbcore: registered new interface driver hub

[ 20.225457] usbcore: registered new device driver usb

[ 20.251401] mmc0: mvsdio driver initialized, lacking card detect (fall back to polling)

[ ok [ 20.284447] ehci_hcd: USB 2.0 ’Enhanced’ Host Controller (EHCI) Driver

[ 20.291084] orion-ehci orion-ehci.0: Marvell Orion EHCI

[ 20.296399] orion-ehci orion-ehci.0: new USB bus registered, assigned bus number 1

[ 20.328963] orion-ehci orion-ehci.0: irq 19, io mem 0xf1050000

done.

[ 20.405468] orion-ehci orion-ehci.0: USB 2.0 started, EHCI 1.00

[ 20.411523] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002

[ 20.418348] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1

[ 20.425616] usb usb1: Product: Marvell Orion EHCI

[ 20.430345] usb usb1: Manufacturer: Linux 3.1.4 ehci_hcd

[ 20.435681] usb usb1: SerialNumber: orion-ehci.0

[ 20.440663] hub 1-0:1.0: USB hub found

[ 20.444438] hub 1-0:1.0: 1 port detected

[....] Waiting for /dev to be fully populated...[ 20.759023] usb 1-1: new high speed USB device number 2 using orion-ehci

done.

[ 21.027952] usb 1-1: New USB device found, idVendor=058f, idProduct=6387

[ 21.034719] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3

[ 21.041903] usb 1-1: Product: Mass Storage

[ 21.046021] usb 1-1: Manufacturer: USB

[ 21.049797] usb 1-1: SerialNumber: 21474C65

[ 21.143435] SCSI subsystem initialized

[ 21.190038] usbcore: registered new interface driver uas

[ 21.225699] Initializing USB Mass Storage driver...

[ 21.259057] scsi0 : usb-storage 1-1:1.0

[ 21.264610] usbcore: registered new interface driver usb-storage

[ 21.270673] USB Mass Storage support registered.

[ ok ] Activating swap...done.

[warn] Creating compatibility symlink from /etc/mtab to /proc/mounts. ... (warning).

[ ok ] Cleaning up temporary files... /tmp /lib/init/rw.

[ 22.277630] scsi 0:0:0:0: Direct-Access USB USB 2.0 Flash 8.07 PQ: 0 ANSI: 2

[ 22.294843] scsi: killing requests for dead queue








[ 22.405963] sd 0:0:0:0: [sda] 2006016 512-byte logical blocks: (1.02 GB/979 MiB)


[ 22.414934] sd 0:0:0:0: [sda] Write Protect is off

[ 22.428952] sd 0:0:0:0: [sda] No Caching mode page present

[ 22.434472] sd 0:0:0:0: [sda] Assuming drive cache: write through



[ 22.549650] sda: sda1



[ 22.566564] sd 0:0:0:0: [sda] Attached SCSI removable disk

[ 22.591037] sd 0:0:0:0: Attached scsi generic sg0 type 0

[ ok ] Activating lvm and md swap...done.

[....] Checking file systems...fsck from util-linux 2.20.1

done.

[ ok ] Mounting local filesystems...done.

[ ok ] Activating swapfile swap...done.

[ ok ] Cleaning up temporary files....

[ ok ] Setting kernel variables ...done.

[ ok ] Configuring network interfaces...done.

[ ok ] Cleaning up temporary files....

INIT: Entering runlevel: 2

[info] Using makefile-style concurrent boot in runlevel 2.

[ ok ] Starting enhanced syslogd: rsyslogd.

[....] Starting periodic command scheduler: cron[ 26.265945] NET: Registered protocol family 10

. ok

[....] Starting NTP server: ntpd[ 26.709561] ADDRCONF(NETDEV_UP): eth0: link is not ready

. ok

[ ok ] Starting OpenBSD Secure Shell server: sshd.

bash: cannot set terminal process group (-1): Inappropriate ioctl for device

bash: no job control in this shell

root@eee1215n:/# [ 29.830738] mv643xx_eth_port mv643xx_eth_port.0: eth0: link up, 1000 Mb/s, full duplex, flow control disabled

[ 29.840878] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

root@eee1215n:/#

The password for the root user is root. Change it and regenerate the SSH host keys:

rm /etc/ssh/ssh_host*

ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key -N ''

ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -N ''

The system is now prepared with a GNU/Debian Wheezy base on which to install the remainder of the software.

Note: this system is not yet optimised for flash; it treats the NAND flash as a normal disk. As an exercise, look up the settings for tuning the system to run from flash and move frequently written locations such as /var/log/ to a tmpfs filesystem.

Chapter 9

Hacking the SheevaPlug

9.1 Introduction

The SheevaPlug is an excellent platform for all kinds of applications. With its small form factor and low power consumption for a very decent performance, it is a perfect tool for running control applications.

Unfortunately, there are a number of flaws in the out-of-the-box firmware that prompt us to replace it with a cleaner and better alternative. The main problems are:

• The installation is a mix of Debian and Ubuntu; it is not clean. As far as we can tell, the base is Debian Lenny and it has been upgraded with Ubuntu 9.04. Even though this might be enough for the hobbyist, it needs to be cleaned up for professional applications.

• The RW filesystem is based on JFFS2. JFFS2 has been a decent filesystem for years on NAND and NOR flash based systems, but the size of the onboard flash pushes the JFFS2 algorithms to the limit. Very soon, the SheevaPlug will end up spending a lot of cycles on writing and balancing the wear of the flash blocks (see the load of pdflush and jffs2 based applications).

• The bootloader has old code that does not properly initialise the USB devices, making them slow and consume too much power (just connect a flash stick to it and feel the heat of the connector after some time).

For these reasons, we’ll upgrade the bootloader, the kernel and the filesystem.

With the previously built environment, the SheevaPlug can now be flashed. Since there is a working U-Boot bootloader on the system, that one could be used. In this particular case, however, the choice is made to ignore the present firmware completely and to program the device from scratch, as if it had just left the factory floor.

To this purpose, OpenOCD is used. OpenOCD (Open On-Chip Debugger) is an open source JTAG and debugging interface for ARM processors. The openocd daemon provides both a simple command line interface over telnet and a GDB interface for more sophisticated debugging.

9.2 OpenOCD Commands

The Open On-Chip Debugger (OpenOCD) allows user interaction through a telnet interface (default: port 4444) and a GDB server (default: port 3333). The command line interpreter is available from both the telnet interface and a GDB session. To issue commands to the interpreter from within a GDB session, use the ’monitor’ command, e.g. use ’monitor poll’ to issue the ’poll’ command. All output is relayed through the GDB session.


9.2.1 Daemon

sleep <msec> : Wait for <msec> milliseconds before resuming. Useful in connection with script files (script command and target script configuration).

shutdown : Close the OpenOCD daemon, disconnecting all clients (GDB, Telnet).

debug_level [n] : Display or adjust the debug level to n (0-3).

log_output <file> : Redirect logging to <file> (default: stderr).

script <file> : Execute commands from <file>

9.2.2 Target state handling

poll [’on’|’off’] : Poll the target for its current state. If the target is in debug mode, architecture-specific information about the current state is printed. An optional parameter allows continuous polling to be enabled and disabled.

halt : Send a halt request to the target. The debugger signals the debug request, and waits for the target to enter debug mode.

resume [address] : Resume the target at its current code position, or at an optional address.

step [address] : Single-step the target at its current code position, or at an optional address.

reset [’run’|’halt’|’init’|’run_and_halt’|’run_and_init’] : Do a hard reset. The optional parameter specifies what should happen after the reset. This optional parameter overwrites the setting specified in the configuration file, making the new behaviour the default for the ’reset’ command.

• run: Let the target run.

• halt: Immediately halt the target (works only with certain configurations).

• init: Immediately halt the target, and execute the reset script (works only with certain configurations).

• run_and_halt: Let the target run for a certain amount of time, then request a halt.

• run_and_init: Let the target run for a certain amount of time, then request a halt. Execute the reset script once the target has entered debug mode.

9.2.3 Memory access commands

These commands allow accesses of a specific size to the memory system:

mdw <addr> [count] : display memory words

mdh <addr> [count] : display memory half-words

mdb <addr> [count] : display memory bytes

mww <addr> <value> : write memory word

mwh <addr> <value> : write memory half-word

mwb <addr> <value> : write memory byte

load_image <file> <address> [’bin’|’ihex’|’elf’] : Load image <file> to target memory at <address>.

dump_image <file> <address> <size> : Dump <size> bytes of target memory starting at <address> to a (binary) <file>.


9.2.4 Flash commands

flash banks : List configured flash banks

flash info <num> : Print info about flash bank <num>

flash probe <num> : Identify the flash, or validate the parameters of the configured flash. Operation depends on the flash type.

flash erase_check <num> : Check erase state of sectors in flash bank <num>. This is the only operation that updates the erase state information displayed by ’flash info’. That means you have to issue an ’erase_check’ command after erasing or programming the device to get updated information.

flash protect_check <num> : Check protection state of sectors in flash bank <num>.

flash erase <num> <first> <last> : Erase sectors at bank <num>, starting at sector <first> up to and including <last>. Sector numbering starts at 0. Depending on the flash type, erasing might require the protection to be disabled first (e.g. Intel Advanced Bootblock flash using the CFI driver).

flash write_binary <num> <file> <offset> : Write the binary <file> to flash bank <num>, starting at <offset> bytes from the beginning of the bank.

flash write_image <file> [offset [type]] : Write the image <file> to the current target’s flash bank(s). A relocation [offset] can be specified and the file [type] can be specified explicitly as ’bin’ (binary), ’ihex’ (Intel hex), ’elf’ (ELF file) or ’s19’ (Motorola s19).

flash protect <num> <first> <last> <’on’|’off’> : Enable (’on’) or disable (’off’) protection of flash sectors <first> to <last> of flash bank <num>.

flash auto_erase <on|off> : Enable (’on’) to erase flash banks prior to writing; this applies to the flash write_image command only. The default is ’off’, in which case flash banks have to be erased using the flash erase command.

9.3 Hacking the SheevaPlug

With this information, connect the OpenOCD JTAG emulator. OpenOCD is part of GNU/Debian, and the most recent versions (testing/unstable) have been tested and are known to work with the SheevaPlug. Another option is to use the precompiled version available from http://www.openplug.org.

Start OpenOCD:


If OpenOCD complains with a message similar to:


Open On-Chip Debugger 0.3.0-in-development (2009-08-13-23:22) svn:r2529

$URL: http://svn.berlios.de/svnroot/repos/openocd/trunk/src/openocd.c $

For bug reports, read http://svn.berlios.de/svnroot/repos/openocd/trunk/BUGS

2000 kHz

jtag_nsrst_delay: 200

jtag_ntrst_delay: 200

dcc downloads are enabled

Error: unable to open ftdi device: device not found

Runtime error, file "command.c", line 469:

[marc@staleek Sheeva]$



you might need to change the file: sheevaplug-installer-v1.0/uboot/openocd/config/interface/sheevaplug.cfg

[marc@staleek ~]$ cat /usr/share/openocd/scripts/interface/sheevaplug.cfg

#

# Marvel SheevaPlug Development Kit

#

# http://www.marvell.com/products/embedded_processors/developer/kirkwood/sheevaplug.jsp

#

interface ft2232

ft2232_layout sheevaplug

ft2232_vid_pid 0x9e88 0x9e8f

ft2232_device_desc "SheevaPlug JTAGKey FT2232D B"

jtag_khz 2000

This is due to a changed vendor ID after 07/2009.

Connect to the telnet interface of your OpenOCD session:

[marc@crichton ~]$ nc localhost 4444

Open On-Chip Debugger

> sheevaplug_init

sheevaplug_init

target state: halted

target halted in ARM state due to debug-request, current mode: Supervisor

cpsr: 0x000000d3 pc: 0xffff0000

MMU: disabled, D-Cache: disabled, I-Cache: disabled

0 0 1 0: 00052078

>

Start by clearing the flash (whatever was on the SheevaPlug). Next, load the kwb image of the bootloader.

> nand probe 0

nand probe 0

> nand erase 0 0 0x20000000

nand erase 0 0 0x20000000

bad block: 1137

didn’t erase block 1133; status: 0xe1

erased blocks 0 to 4096 on NAND flash device #0 ’NAND 512MiB 3,3V 8-bit’

> nand write 0 u-boot.kwb 0 oob_softecc_kw

nand write 0 u-boot.kwb 0 oob_softecc_kw

The bootloader is started with resume; make certain that the USB/serial connection is up and running. Interrupt the boot cycle in the serial connection terminal by pressing any key (Enter is probably the best choice here).

> resume

resume

U-Boot 2010.06 (Jul 26 2010 - 10:36:14)

Marvell-Sheevaplug


DRAM: 512 MiB

NAND: 512 MiB



In: serial

Out: serial

Err: serial

Net: egiga0



Marvell>>

Before flashing the system, initialise some variables to reduce typing. Append your login to the file names in order to avoid overwrites and name clashes.



Marvell>> setenv uboot u-boot.kwb.mleeman

Marvell>> setenv kernel uImage.mleeman

Marvell>> setenv flashfs rootfs.jffs2.mleeman

Configure the network to something more useful:

Marvell>> setenv ipaddr 172.2.4.100


Marvell>> setenv gatewayip 172.0.0.1

Marvell>> setenv netmask 255.0.0.0

Verify and save your settings:

Marvell>> printenv

bootcmd=${x_bootcmd_kernel}; setenv bootargs ${x_bootargs} ${x_bootargs_root}; ${x_bootcmd_usb}; bootm 0x6400000;

bootdelay=3

baudrate=115200

ethaddr=04:25:fe:ed:00:18

x_bootargs=console=ttyS0,115200 mtdparts=orion_nand:512k(uboot),3m@1m(kernel),1m@4m(psm),13m@5m(rootfs) rw




ethact=egiga0

uboot=u-boot.kwb

kernel=uImage

flashfs=rootfs.jffs2

ipaddr=172.2.4.100

serverip=172.0.0.1

gatewayip=172.0.0.1

netmask=255.0.0.0

stdin=serial

stdout=serial

stderr=serial


Marvell>> saveenv


Erasing Nand...




Now you’re set to flash the remainder of the system. Though there are a number of ways to get the files onto the SheevaPlug (e.g. USB or SD card), TFTP is used here.

The kernel command line gives a clear indication of the flash layout:

x_bootargs=console=ttyS0,115200 mtdparts=orion_nand:512k(uboot),3m@1m(kernel),1m@4m(psm),13m@5m(rootfs) rw

• The bootloader is located at the beginning of the flash and occupies 512 kB (0x80000).

• The kernel is located at offset 1 MB (0x100000) and occupies 3 MB (0x300000).

• A partition psm is located at 4 MB (0x400000) and occupies 1 MB (0x100000).

• The root filesystem is located at 5 MB (0x500000) and occupies 13 MB (0xD00000).

It can be rewritten as:

x_bootargs=console=ttyS0,115200 \

mtdparts=orion_nand:0x80000(uboot),0x300000@0x100000(kernel),0x100000@0x400000(psm),\

0xd00000@0x500000(rootfs) rw
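The conversion from the size@offset notation to hex offsets can also be done mechanically, which helps to avoid slips when typing the nand erase and nand write commands. The following is a rough sketch in plain shell; the function names to_bytes and mtdparts_to_hex are invented for this example, and only the k and m suffixes that occur in the command line above are handled.

```shell
# Convert a size such as 512k, 3m or 0x80000 to a number of bytes.
to_bytes() {
    case $1 in
        *k) echo $(( ${1%k} * 1024 )) ;;
        *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
        *)  echo $(( $1 )) ;;
    esac
}

# Print "name offset size" (offset and size in hex) for every partition
# in an mtdparts partition list. A partition without "@offset" is assumed
# to start where the previous one ended.
mtdparts_to_hex() {
    next=0
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        name=${part#*\(}
        name=${name%\)}
        spec=${part%%\(*}
        case $spec in
            *@*) size=$(to_bytes "${spec%@*}"); offset=$(to_bytes "${spec#*@}") ;;
            *)   size=$(to_bytes "$spec"); offset=$next ;;
        esac
        printf '%s 0x%x 0x%x\n' "$name" "$offset" "$size"
        next=$(( offset + size ))
    done
    IFS=$old_ifs
}

mtdparts_to_hex '512k(uboot),3m@1m(kernel),1m@4m(psm),13m@5m(rootfs)'
# uboot 0x0 0x80000
# kernel 0x100000 0x300000
# psm 0x400000 0x100000
# rootfs 0x500000 0xd00000
```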

For the remainder, attach a serial console to /dev/ttyUSB0. This is mainly done because upgrading from the bootloader is much faster than using the OpenOCD/JTAG access.

The flashing is done in two phases per file: first, the file is fetched over TFTP into the memory of the SheevaPlug; second, the image is written from memory to flash. Note that in the following, the flash is erased again; this is not required since the entire flash was already cleared.

Marvell>> tftp 0x8000000 ${kernel}

Using egiga0 device


Filename ’uImage’.


Loading: #################################################################

#################################################################

########################################################

done

Bytes transferred = 2720260 (298204 hex)




OK




Marvell>> tftp 0x8000000 ${flashfs}

Using egiga0 device


Filename ’rootfs.jffs2’.


Loading: #################################################################

#################################################################

#################################################################

#################################################################

#################################################################


#################################################################

#################################################################

####################################################

done

Bytes transferred = 7429476 (715d64 hex)

Marvell>> nand erase 0x500000 0x1fb00000

NAND erase: device 0 offset 0x500000, size 0xd00000

Erasing at 0xcf00000 -- 100% complete.

OK

Marvell>> nand write.e 0x8000000 0x500000 0xd00000

NAND write: device 0 offset 0x500000, size 0xd00000

Note the use of the variables. This process can be scripted further, so that the entire process can be automated with one command.
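One way to prepare such a script is to generate the command sequence on the host and paste it into the serial console (or store it in a U-Boot environment variable). The sketch below only prints the commands already used in this section; the function name flash_cmds is hypothetical, and the load address 0x8000000 and the partition offsets and sizes are the ones from the text above.

```shell
# Print the U-Boot commands that fetch one file over TFTP and write it to NAND.
# Arguments: <file on the TFTP server> <flash offset> <size>
flash_cmds() {
    printf 'tftp 0x8000000 %s\n' "$1"
    printf 'nand erase %s %s\n' "$2" "$3"
    printf 'nand write.e 0x8000000 %s %s\n' "$2" "$3"
}

# Kernel and root filesystem, using the partition layout described earlier.
flash_cmds uImage.mleeman 0x100000 0x300000
flash_cmds rootfs.jffs2.mleeman 0x500000 0xd00000
```

Generating the sequence from one place keeps the offset and size identical in the erase and write commands, which is where hand-typed sessions tend to go wrong.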

Power down the device, attach power again and connect the serial line immediately after power-up.

Welcome to minicom 2.4

OPTIONS: I18n

Compiled on Dec 26 2009, 13:37:54.

Port /dev/ttyUSB0

Press CTRL-A Z for help on special keys

0

NAND read: device 0 offset 0x100000, size 0x300000

3145728 bytes read: OK

(Re)start USB...

USB: Register 10011 NbrPorts 1

USB EHCI 1.00

scanning bus for devices... 1 USB Device(s) found

scanning bus for storage devices... 0 Storage Device(s) found

## Booting kernel from Legacy Image at 06400000 ...

Image Name: Linux-2.6.34

Image Type: ARM Linux Kernel Image (uncompressed)

Data Size: 2720196 Bytes = 2.6 MiB

Load Address: 00008000

Entry Point: 00008000

Verifying Checksum ... OK

Loading Kernel Image ... OK

OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.

Linux version 2.6.34 (mleeman@cypher) (gcc version 4.3.5 (Buildroot

2010.08-git) ) #1 PREEMPT Fri Jul 23 09:07:37 CEST 2010

CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977

CPU: VIVT data cache, VIVT instruction cache

Machine: Marvell SheevaPlug Reference Board


Memory policy: ECC disabled, Data cache writeback

Built 1 zonelists in Zone order, mobility grouping on. Total pages: 130048

Kernel command line: console=ttyS0,115200

mtdparts=orion_nand:512k(uboot),3m@1m(kernel),1m@4m(psm),13m@5m(rootfs) rw

root=/dev/mtdblock3 rw rootfstype=jffs2

PID hash table entries: 2048 (order: 1, 8192 bytes)

Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)

Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)

Memory: 256MB 256MB = 512MB total

Memory: 513768k/513768k available, 10520k reserved, 0K highmem

Virtual kernel memory layout:

vector : 0xffff0000 - 0xffff1000 ( 4 kB)

fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB)

DMA : 0xffc00000 - 0xffe00000 ( 2 MB)

vmalloc : 0xe0800000 - 0xfe800000 ( 480 MB)

lowmem : 0xc0000000 - 0xe0000000 ( 512 MB)

modules : 0xbf000000 - 0xc0000000 ( 16 MB)

.init : 0xc0008000 - 0xc0028000 ( 128 kB)

.text : 0xc0028000 - 0xc04ee000 (4888 kB)

.data : 0xc050a000 - 0xc05398c0 ( 191 kB)

SLUB: Genslabs=11, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1

Hierarchical RCU implementation.

NR_IRQS:114

Console: colour dummy device 80x30

Calibrating delay loop... 1192.75 BogoMIPS (lpj=5963776)

Mount-cache hash table entries: 512

CPU: Testing write buffer coherency: ok

NET: Registered protocol family 16

Kirkwood: MV88F6281-A0, TCLK=200000000.

Feroceon L2: Cache support initialised.

bio: create slab <bio-0> at 0

vgaarb: loaded

SCSI subsystem initialized

usbcore: registered new interface driver usbfs

usbcore: registered new interface driver hub

usbcore: registered new device driver usb

cfg80211: Calling CRDA to update world regulatory domain

Switching to clocksource orion_clocksource


IP route cache hash table entries: 4096 (order: 2, 16384 bytes)

TCP established hash table entries: 16384 (order: 5, 131072 bytes)

TCP bind hash table entries: 16384 (order: 4, 65536 bytes)

TCP: Hash tables configured (established 16384 bind 16384)

TCP reno registered

UDP hash table entries: 256 (order: 0, 4096 bytes)

UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)


RPC: Registered udp transport module.

RPC: Registered tcp transport module.

RPC: Registered tcp NFSv4.1 backchannel transport module.

JFFS2 version 2.2. (NAND) 2001-2006 Red Hat, Inc.

JFS: nTxBlock = 4013, nTxLock = 32110

msgmni has been set to 1003

alg: No test for stdrng (krng)


io scheduler noop registered

io scheduler deadline registered

io scheduler cfq registered (default)

Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled

serial8250.0: ttyS0 at MMIO 0xf1012000 (irq = 33) is a 16550A

console [ttyS0] enabled

brd: module loaded

loop: module loaded

NAND device: Manufacturer ID: 0xad, Chip ID: 0xdc (Hynix NAND 512MiB 3,3V

8-bit)

Scanning device for bad blocks

Bad eraseblock 1133 at 0x000008da0000

4 cmdlinepart partitions found on MTD device orion_nand

Creating 4 MTD partitions on "orion_nand":

0x000000000000-0x000000080000 : "uboot"

0x000000100000-0x000000400000 : "kernel"

0x000000400000-0x000000500000 : "psm"

0x000000500000-0x000001200000 : "rootfs"

MV-643xx 10/100/1000 ethernet driver version 1.4

mv643xx_eth smi: probed

net eth0: port 0 with MAC address 04:25:fe:ed:00:18

libertas_sdio: Libertas SDIO driver

libertas_sdio: Copyright Pierre Ossman

ehci_hcd: USB 2.0 ’Enhanced’ Host Controller (EHCI) Driver

orion-ehci orion-ehci.0: Marvell Orion EHCI

orion-ehci orion-ehci.0: new USB bus registered, assigned bus number 1

orion-ehci orion-ehci.0: irq 19, io mem 0xf1050000

orion-ehci orion-ehci.0: USB 2.0 started, EHCI 1.00

hub 1-0:1.0: USB hub found

hub 1-0:1.0: 1 port detected

Initializing USB Mass Storage driver...

usbcore: registered new interface driver usb-storage

USB Mass Storage support registered.

usbcore: registered new interface driver ums-datafab

usbcore: registered new interface driver ums-freecom

usbcore: registered new interface driver ums-jumpshot

usbcore: registered new interface driver ums-sddr09

usbcore: registered new interface driver ums-sddr55

mice: PS/2 mouse device common for all mice

rtc-mv rtc-mv: rtc core: registered rtc-mv as rtc0

i2c /dev entries driver

cpuidle: using governor ladder

cpuidle: using governor menu

sdhci: Secure Digital Host Controller Interface driver

sdhci: Copyright(c) Pierre Ossman

mmc0: mvsdio driver initialized, lacking card detect (fall back to polling)

mv_xor_shared mv_xor_shared.0: Marvell shared XOR driver

mv_xor_shared mv_xor_shared.1: Marvell shared XOR driver

mv_xor mv_xor.0: Marvell XOR: ( xor cpy )

mv_xor mv_xor.1: Marvell XOR: ( xor fill cpy )

mv_xor mv_xor.2: Marvell XOR: ( xor cpy )

mv_xor mv_xor.3: Marvell XOR: ( xor fill cpy )

usbcore: registered new interface driver hiddev

usbcore: registered new interface driver usbhid


usbhid: USB HID core driver

oprofile: using timer interrupt.

TCP cubic registered


lib80211: common routines for IEEE802.11 drivers

rtc-mv rtc-mv: setting system clock to 2034-09-14 09:19:41 UTC (2041838381)

VFS: Mounted root (jffs2 filesystem) on device 31:3.

Freeing init memory: 128K

Initializing random number generator... done.

Starting network...

Starting dropbear sshd: generating rsa key... generating dsa key... OK

9.4 Hands On - Tweak System

The current interactivity is rather limited. Have a look in /etc/inittab to see what is wrong and what causes this. Since dropbear is started, it might also be interesting to configure the network with DHCP by default. Adjust the /etc/network/interfaces file and create a new filesystem image.

The Sheeva has 512 MB of flash available. In this setup, only a fraction has been used. Use the remaining flash to store another kernel and filesystem and boot from the second system.

9.5 Flashing the system from the bootloader

The SheevaPlug can also be flashed at home from a USB stick, without OpenOCD.

Make certain that a serial console is attached to your plug and interrupt the boot at the bootloader prompt.

Place the required files on your USB stick. The files u-boot.kwb, uImage and ubi.img should be in the top directory. Connect the USB stick to your plug and:

• Initialise the attached USB drive. Note that I did this with a USB disk that contains a partition; I don’t know if the bootloader will recognise one that does not have a partition.

nand device 0

usb start

• Write u-boot. This is not strictly required, but strongly suggested, to get better USB support than the standard firmware offers (assuming, of course, that you haven’t already replaced the default firmware shipping with the SheevaPlug with something better). After writing the bootloader, press reset and let the plug start with the new bootloader.


fatload usb 0:1 0x8000000 /u-boot.bin


reset

• Write the kernel to the NAND flash:

nand erase 0x100000 0x400000

fatload usb 0:1 0x8000000 /uImage

nand write.e 0x8000000 0x100000 0x400000

• Finally, write the UbiFS root filesystem:

nand erase 0x500000 0x1fb00000

fatload usb 0:1 0x8000000 /ubi.img

nand write.e 0x8000000 0x500000 0x1fb00000
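The offsets and sizes passed to nand erase and nand write.e are easy to mistype, so it can help to check them on the workstation before typing them at the bootloader prompt. The following is only a sketch using plain shell arithmetic; the helper name region_end is made up for this illustration and is not a U-Boot command. It prints the first address after a flash region, which should match the start of the next region or the end of the flash.

```shell
#!/bin/sh
# region_end OFFSET SIZE
# Hypothetical helper (not part of U-Boot): prints, in hex, the first
# address after a flash region given its start offset and its size.
region_end() {
    printf '0x%x\n' "$(( $1 + $2 ))"
}

# Kernel region: starts at 1 MiB and is 4 MiB long.
region_end 0x100000 0x400000
# Root filesystem region: starts at 5 MiB and runs to the end of the flash.
region_end 0x500000 0x1fb00000
```

The kernel region ends at 0x500000, exactly where the root filesystem region starts, and the root filesystem region ends at 0x20000000, which is the end of the 512 MB NAND.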


9.5.1 GNU/Debian

Since the SheevaPlug has 512 MB of internal NAND flash, a standard GNU/Debian (or another distribution that fits within this size) can easily be installed. See http://chiana.homelinux.org/~marc/eib_sheeva.html for more info.

Other options are to run a distribution from an MMC card or even a USB stick.

9.6 References

1. http://openfacts.berlios.de/index-en.phtml?title=OpenOCD_commands

2. http://chiana.homelinux.org/~marc/eib_sheeva.html



Chapter 10

Debugging with GDB

10.1 GDB and gdbserver

A very useful add-on to a minimal GNU toolchain is the GNU Debugger, usually just called GDB, the standard debugger for the GNU software system. It is a portable debugger that runs on many Unix-like systems and works for many programming languages, including C, C++, and FORTRAN.

The GNU debugger offers a simple command line interface and a lot of different commands. With the help of front-ends like DDD (Data Display Debugger, a graphical front-end for command line debuggers), you get a powerful graphical user interface for the GNU debugger. DDD is a part of most GNU/Linux distributions.

An interesting component of GDB for embedded development is gdbserver, a small headless server implementing the low level features of GDB. This allows debugging of userspace applications on embedded Linux systems without having to run the big GDB natively.

An example usage of the command line interface of GDB is:

[mleeman@seraph ~]$ vim stack.c

[mleeman@seraph ~]$ gcc -o stack -g stack.c

[mleeman@seraph ~]$ gdb stack

GNU gdb 6.4-debian

Copyright 2005 Free Software Foundation, Inc.

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type "show copying" to see the conditions.

There is absolutely no warranty for GDB. Type "show warranty" for details.

This GDB was configured as "powerpc-linux-gnu"...Using host libthread_db library

(gdb) run

Starting program: /users/firmware/mleeman/stack

This program will demonstrate gdb

Program received signal SIGSEGV, Segmentation fault.

0x100004f4 in function_2 (x=24) at stack.c:24

24 return *y;

(gdb) edit

(gdb) shell gcc -o stack -g stack.c

(gdb) run

The program being debugged has been started already.

Start it from the beginning? (y or n) y

‘/users/firmware/mleeman/stack’ has changed; re-reading symbols.


Starting program: /users/firmware/mleeman/stack


24

Program exited normally.

(gdb)

10.2 gdb Remote debugging

Ever debugged a program remotely and felt like telling your computer where to go and how to get there? Hopelessly adding calls to printf() and recompiling, as a steady stream of expletives flows from your over-caffeinated brain?

In embedded systems development, making do with less is the name of the game: less CPU power, less physical RAM and less persistent storage (if any!), to name a few. Debugging a misbehaving process in this environment can be challenging, but a little ingenuity coupled with plenty of free software eases the problem.

10.2.1 Major Differences

While many differences exist between desktops and embedded systems running GNU/Linux, the biggest difference is size and power. Embedded systems have very specific design goals limiting the overall power consumption and physical dimensions. This leads to the use of low power CPUs, limited physical RAM and little or no persistent storage.

The next major difference is the CPU architecture: embedded systems often use low power CPUs that are not x86 based. To compile programs on your x86 based desktop you need "cross-compilation tools", which run on your local desktop but generate executable code for the target architecture. In this chapter I will be cross-compiling for the PowerPC architecture.

The last major difference is perspective. You will be debugging a process that is executing on a remote CPU, not on your local workstation. This requires a slightly different mindset than traditional debugging.

10.2.2 ELF and Binutil Background

Before getting to the nuts and bolts, here’s a quick review of how executable code and debugging information are stored in an ELF binary. Most modern *NIX systems use the ELF format for executables and shared libraries.

On a GNU/Linux system a family of utilities called binutils exists for examining and manipulating ELF objects. In a cross-compiling development environment the usual tool names like gcc, gdb and all the binutils will have a prefix that describes the target architecture.

Let’s play around with some of the binutils using a simple stack application:


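#include <stdio.h>

/* Deliberate bug: the integer argument is used as an address,
   so the read below faults when the program runs. */
int function_2(int x)
{
    int *y = (int *)x;
    return *y;
}

int function_1(void)
{
    int x = function_2(24);
    return x;
}

int main(void)
{
    int x;

    printf("This program will demonstrate gdb\n");
    x = function_1();
    printf("%d", x);
    return 0;
}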

First let’s compile the program with debugging symbols using the -g option.

$ powerpc-linux-uclibc-gcc -g -o stack stack.c

Note that I used the cross compiler, powerpc-linux-uclibc-gcc, to compile the program for the PowerPC 83xx target.

On my system the resulting binary size is 10257 bytes.

To see what symbols the binary contains, use the nm binary utility. Remember that I’m using the PowerPC version of nm, powerpc-linux-uclibc-nm.

[mleeman@seraph code]$ powerpc-linux-uclibc-nm stack

10010894 A _DYNAMIC

10010974 T _GLOBAL_OFFSET_TABLE_

[... stuff deleted]

10010964 W data_start

10000510 t frame_dummy

1000059c T function_1

10000594 T function_2

100005a8 T main

10010a14 b object.2

1001096c d p.0

U printf

U puts

[mleeman@seraph code]$

The interesting lines for us are near the bottom:

100005a8 T main

U printf

The first line shows that the address of the main() function is 0x100005a8. "T" means the text section, which is an old term for the section where the code resides. The next line shows that printf() is an undefined symbol ("U") that will be resolved from a shared library at run time.
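When a binary contains many symbols, it can be handy to filter the nm listing down to the functions defined in the text section. The sketch below assumes the three-column "address type name" layout shown above; the function name text_symbols is invented for this example.

```shell
#!/bin/sh
# text_symbols: reads nm output on stdin and prints the names of the
# symbols defined in the text section (type "T" or "t").
# Undefined symbols have only two columns and are skipped.
text_symbols() {
    awk 'NF == 3 && ($2 == "T" || $2 == "t") { print $3 }'
}

# Sample input: a few lines copied from the nm listing above.
# With the cross toolchain installed, one would instead run:
#   powerpc-linux-uclibc-nm stack | text_symbols
text_symbols <<'EOF'
10000510 t frame_dummy
1000059c T function_1
10000594 T function_2
100005a8 T main
10010a14 b object.2
         U printf
EOF
```

For the sample lines this lists frame_dummy, function_1, function_2 and main, and leaves out the data symbol object.2 and the undefined printf.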


The ELF format defines sections where various information about the executable is stored. The most interesting sections are:

.text : where the executable code lives

.data : where initialised global variables live

.rodata : where global "read only" constants live

In addition, several other sections are also present, including sections containing the debugging information. To see all the sections and their sizes, use the objdump binutil with the -h option to display section headers.

[mleeman@seraph code]$ powerpc-linux-uclibc-objdump -h stack

stack: file format elf32-powerpc

Sections:

Idx Name Size VMA LMA File off Algn

0 .interp 00000014 100000f4 100000f4 000000f4 2**0

CONTENTS, ALLOC, LOAD, READONLY, DATA

1 .hash 00000090 10000108 10000108 00000108 2**2


2 .dynsym 00000110 10000198 10000198 00000198 2**2


3 .dynstr 000000c1 100002a8 100002a8 000002a8 2**0


4 .gnu.version 00000022 1000036a 1000036a 0000036a 2**1


5 .gnu.version_r 00000020 1000038c 1000038c 0000038c 2**2


6 .rela.dyn 00000024 100003ac 100003ac 000003ac 2**2


7 .rela.plt 0000003c 100003d0 100003d0 000003d0 2**2


8 .init 00000024 1000040c 1000040c 0000040c 2**2

CONTENTS, ALLOC, LOAD, READONLY, CODE

9 .text 00000404 10000430 10000430 00000430 2**2


10 .fini 00000020 10000834 10000834 00000834 2**2


11 .rodata 00000027 10000854 10000854 00000854 2**2


12 .sdata2 00000000 1000087c 1000087c 0000087c 2**2


13 .eh_frame 00000004 1000087c 1000087c 0000087c 2**2


14 .ctors 00000008 10010880 10010880 00000880 2**2

CONTENTS, ALLOC, LOAD, DATA

15 .dtors 00000008 10010888 10010888 00000888 2**2


16 .jcr 00000004 10010890 10010890 00000890 2**2


17 .dynamic 000000d0 10010894 10010894 00000894 2**2



18 .data 0000000c 10010964 10010964 00000964 2**2


19 .got 0000001c 10010970 10010970 00000970 2**2

CONTENTS, ALLOC, LOAD, CODE

20 .sdata 00000000 1001098c 1001098c 0000098c 2**2


21 .sbss 00000000 1001098c 1001098c 0000098c 2**0

ALLOC

22 .plt 00000084 1001098c 1001098c 0000098c 2**2

ALLOC, CODE

23 .bss 0000001c 10010a10 10010a10 0000098c 2**2

ALLOC

24 .comment 00000094 00000000 00000000 0000098c 2**0

CONTENTS, READONLY

25 .debug_aranges 00000020 00000000 00000000 00000a20 2**0

CONTENTS, READONLY, DEBUGGING

26 .debug_pubnames 00000039 00000000 00000000 00000a40 2**0


27 .debug_info 00000118 00000000 00000000 00000a79 2**0


28 .debug_abbrev 000000ac 00000000 00000000 00000b91 2**0


29 .debug_line 00000046 00000000 00000000 00000c3d 2**0


30 .debug_frame 00000048 00000000 00000000 00000c84 2**2


31 .debug_str 000000a0 00000000 00000000 00000ccc 2**0


WOW! That’s a lot of sections, and many of them contain debug information. These sections can add quite a bit of size to an executable and none of it is essential to running the program. The information is only useful when trying to debug.
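To put a number on how much the debug sections contribute, the sizes in the objdump -h listing can be added up. The following sketch assumes the column layout shown above (index, name, hexadecimal size, ...); the function name debug_bytes is made up for this illustration.

```shell
#!/bin/sh
# debug_bytes: reads "objdump -h" output on stdin and prints the total
# size, in bytes, of all sections whose name starts with ".debug_".
# Lines that do not describe such a section (flags, headers) are ignored.
debug_bytes() {
    total=0
    while read -r idx name size rest; do
        case "$name" in
            .debug_*) total=$(( total + 0x$size )) ;;
        esac
    done
    echo "$total"
}

# Sample input: the debug sections from the listing above.
# With the cross toolchain installed, one would instead run:
#   powerpc-linux-uclibc-objdump -h stack | debug_bytes
debug_bytes <<'EOF'
 25 .debug_aranges 00000020 00000000 00000000 00000a20 2**0
                  CONTENTS, READONLY, DEBUGGING
 26 .debug_pubnames 00000039 00000000 00000000 00000a40 2**0
 27 .debug_info 00000118 00000000 00000000 00000a79 2**0
 28 .debug_abbrev 000000ac 00000000 00000000 00000b91 2**0
 29 .debug_line 00000046 00000000 00000000 00000c3d 2**0
 30 .debug_frame 00000048 00000000 00000000 00000c84 2**2
 31 .debug_str 000000a0 00000000 00000000 00000ccc 2**0
EOF
```

For this small program the seven .debug_ sections add up to 843 bytes. The symbol tables, which are not listed as loadable sections here, take up additional space in the file.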

Aside: the .ctors and .dtors sections are for "constructors" and "destructors", like those used for static C++ objects.

In order to save as much space as possible on an embedded system, we often strip all the non-essential information from the ELF file. The binutil strip does just this.

[mleeman@seraph code]$ cp stack stack.orig

[mleeman@seraph code]$ powerpc-linux-uclibc-strip stack

[mleeman@seraph code]$ ls -al stack stack.orig

-rwxr-x--- 1 mleeman firmware 3872 Aug 17 17:11 stack

-rwxr-x--- 1 mleeman firmware 10257 Aug 17 17:11 stack.orig

Now the size of my executable is 3872 bytes, a reduction of 6385 bytes. That is a reduction of over 60%, but it comes at a price: without the debug sections, debugging will be difficult.
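The percentage can be checked with shell arithmetic. This is a small sketch; the function name percent_saved is invented here, and because the shell only does integer arithmetic the result is rounded down.

```shell
#!/bin/sh
# percent_saved BEFORE AFTER
# Prints the size reduction from BEFORE to AFTER bytes as a whole
# percentage (integer arithmetic, rounded down).
percent_saved() {
    echo "$(( ($1 - $2) * 100 / $1 ))"
}

# Sizes in bytes of stack.orig and the stripped stack, as listed by ls above.
percent_saved 10257 3872
```

For the two sizes above this prints 62. On a real system the sizes could be taken directly from the files, for example with percent_saved "$(wc -c < stack.orig)" "$(wc -c < stack)".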

10.3 Remote Debugging With GDB

GDB offers a remote mode that is often used when debugging embedded systems. Remote operation is when GDB runs on one machine and the program being debugged runs on another. GDB communicates with a remote "stub" that understands the GDB protocol, via a serial line or TCP/IP.

The same mode is also used by KGDB for debugging a running Linux kernel at source level with gdb. With kgdb, kernel developers can debug a kernel much like an application program. It makes it possible to place breakpoints in kernel code, step through the code and observe variables.

On some architectures, where hardware debugging registers are available, watchpoints can also be set; these trigger when a certain memory address is executed or accessed. kgdb requires an additional machine, connected to the machine to be debugged using a serial cable or Ethernet. On FreeBSD, debugging using Firewire DMA is also possible.

Write your C program and translate the C source code with the GNU cross C compiler into an executable that includes the debugging symbols. Use the following command line with the -g parameter. This sample command line builds an executable called stack from a source code file with the name stack.c.

$ powerpc-linux-uclibc-gcc -Wall -g -Os -o stack stack.c

Transfer the executable from your PC hard disk drive to the NFS root filesystem or to the RAM of your target board and run the executable on the target with the help of gdbserver. The transfer itself can be done with cp (in case of NFS) or scp¹.

target$ cd /tmp

target$ scp [email protected]:code/stack .

target$ gdbserver 150.158.231.13:2200 ./stack

The first command changes your working directory to /tmp, a place where we typically mount a RAM filesystem. The second command transfers the executable stack from the server to the target. The third command starts stack under gdbserver. You need the IP address of the server together with a TCP/IP port number. We use the port number 2200 for this example.

Run the GNU cross debugger powerpc-linux-uclibc-gdb with the help of DDD on your server. Use the following command line. The parameter --debugger powerpc-linux-uclibc-gdb tells ddd the name of the debugger; the program to debug, stack, is loaded afterwards with the file command.

$ ddd --debugger powerpc-linux-uclibc-gdb

Now the debugger waits for your debugging commands. Always start by entering the following commands in the ddd command line window (see Figure 10.1).

(gdb) file stack

(gdb) target remote 150.158.231.6:2200

The target remote command sets up the Ethernet-based TCP/IP connection between the board and the server. Use the same TCP/IP port number that was given to gdbserver. The sample command line assumes that the board itself is using the IP address 10.2.4.10 and is reached through 150.158.231.6 (see footnote 1).

Then set your breakpoints within the C source code and run your program in the remote debugging session between the PC and the target.

$ gdbserver 150.158.231.13:2200 ./stack

Process ./stack created; pid = 131

Listening on port 2200

Remote debugging from host 150.158.231.13


DDD allows you to set breakpoints with your mouse. Just put the mouse cursor over the source code line of your choice and press the right-hand mouse button. Then use the Continue button from the command button menu window to run the program. The program runs to the next (or first) breakpoint. You can also use the Step button for single stepping at C language level through your program. If the program execution stops, you can enter debugger commands within the DDD command line window, for example:

(gdb) show version


Figure 10.1: Running ddd with a remote target.

The GNU debugger then shows some copyright and version information and the current configuration.

When working with the BDI2000 debugger, it can be used as a low level gdbserver for kernel and bootloader debugging. Just connect to the IP of the BDI probe on port 2001 in ddd.

10.4 Talking Dirty with GDB and SSH Tunnelling

So now we have a stripped application that we can execute on our embedded system. But what if it is crashing and we want to debug it? A couple of obstacles sit in our way.

First, the target system has limited storage, so we did not bother to put a cross-compiled version of GDB on it. On the development workstation the GDB executable alone is 4322408 bytes, over 4 megabytes. Clearly that won’t leave much room for anything else if only 16 MB of flash is available (let alone room to build in our required redundancy).

[mleeman@seraph code]$ du -ks $(which powerpc-linux-uclibc-gdb)

4236 /opt/barco/20060727/toolchain_uclibc_powerpc/bin//powerpc-linux-uclibc-gdb

¹ In these examples, we are using 2 networks: a public Barco network (150.158.231.x) and a local class A network (10.x). Initially, 150.158.231.6:2200 is forwarded with iptables NAT to 10.2.4.10:2200, but we’ll get to that in detail later.


The answer is to use gdbserver, a small footprint server that implements the low level features of GDB.

Consider the diagram in Figure 10.2: using TCP, the feature-rich GDB on the server seraph connects to the lightweight gdbserver running on the embedded system (target). Most of the heavy lifting is done by the GDB on the workstation, while gdbserver deals with the low level interactions.

Figure 10.2: Lab setup with a workstation on a LAN (10.x), public servers (150.158.231.x) and embedded targets on the LAN. The gateway (niobe) is not directly accessible but provides an ssh tunnel on port 22 to gemini on the LAN.

On my system gdbserver is only 64092 bytes, a considerable improvement over the size of the full GDB program.

In the following examples the workstation, seraph, has the IP address 150.158.231.13 and the embedded system, svc, has the IP address 10.2.4.10, as shown in the above diagram.

The first step is to attach gdbserver to a process on the target system. You can have gdbserver start a program and attach to it immediately, or you can attach to an already running process using its process ID (pid). You also need to specify the TCP port that gdbserver will listen on. Here’s the syntax:

gdbserver host:2200 PROGRAM [ARGS...]

gdbserver host:2200 --attach PID

In the above examples gdbserver would listen on port 2200. Using gdbserver to launch our stack application on the embedded system looks like this:

$ gdbserver 150.158.231.13:2200 ./stack



The prompt does not come back, as gdbserver is now blocking, waiting for connections on port 2200.

The next step is to start the main GDB program on your workstation and connect to the gdbserver process. In order for the main GDB to debug my program, it needs to examine the "unstripped" version of the executable that contains all of the debugging symbols. The simplest thing to do is to chdir to the directory containing the unstripped executable and start the cross-compiled version of gdb, like this:

[mleeman@seraph code]$ powerpc-linux-uclibc-gdb

GNU gdb 6.4







This GDB was configured as "--host=powerpc-pc-linux-gnu --target=powerpc-linux-uclibc"...

To "tell" GDB to read the symbols from the unstripped executable, use the GDB file command like this:

(gdb) file stack.orig

Load new symbol table from "/users/firmware/mleeman/code/stack.orig"? (y or n) y

Reading symbols from /users/firmware/mleeman/code/stack.orig...done.

If your application uses shared libraries (and most real world applications do) and you want to debug them, then you also need to tell GDB where to locate these libraries. These need to be the unstripped versions of the libraries, so that GDB can give you more information.

Set the GDB "solib-search-path" variable so that GDB can find the shared libraries used by your application, like this:

(gdb) set solib-search-path [path to libraries]

If your application has a lot of shared libraries spread all over your source tree (and most real world ones do), then here’s a little trick for the solib-search-path variable. Create one directory and populate it with symlinks to all of your shared libraries. Then you need only specify this one directory when setting the solib-search-path variable. This comes in handy.
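The symlink trick can be scripted. The following is a minimal sketch, assuming the libraries follow the usual *.so and *.so.* naming; the function name collect_solibs and the example paths are made up for this illustration. The destination directory should live outside the tree that is being searched.

```shell
#!/bin/sh
# collect_solibs SRC_TREE DEST_DIR
# Hypothetical helper: creates DEST_DIR and fills it with symlinks to
# every shared library (*.so and *.so.*) found below SRC_TREE.
# DEST_DIR should not be located inside SRC_TREE.
collect_solibs() {
    # Use an absolute path so that the links resolve from anywhere.
    src=$(cd "$1" && pwd) || return 1
    mkdir -p "$2" || return 1
    find "$src" \( -type f -o -type l \) \
         \( -name '*.so' -o -name '*.so.*' \) |
    while IFS= read -r lib; do
        ln -sf "$lib" "$2/$(basename "$lib")"
    done
}

# Example with assumed paths:
#   collect_solibs "$HOME/src/project" /tmp/solibs
# and then, inside GDB:
#   (gdb) set solib-search-path /tmp/solibs
```

If two libraries in different directories share the same file name, the one found last wins; for a real project that case would need a closer look.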

Now we are ready to connect to the gdbserver running on the embedded system. We use the target remote command from the main GDB command prompt, like this:

(gdb) target remote 150.158.231.6:2200

Remote debugging using 150.158.231.6:2200

0x30000c90 in ?? ()

warning: Unable to find dynamic linker breakpoint function.

GDB will be unable to debug shared library initialisers

and track explicitly loaded dynamic code.

On the embedded system console you should see this output:


Now we are connected and the program being debugged is currently paused. Now would be a good time to set some breakpoints and then continue running the program. Here’s an example:

(gdb) b main

Breakpoint 1 at 0x100005bc: file stack.seg.c, line 7.

(gdb) c

Continuing.

Error while mapping shared library sections:

/lib/libc.so.0: No such file or directory.


/lib/ld-uClibc.so.0: No such file or directory.

Error while reading shared library symbols:




Breakpoint 1, main () at stack.seg.c:7

7 printf("This program will demonstrate gdb\n");


And there we are! We are remotely debugging the stripped executable. Pretty cool, huh? I love it! You can now use all your favourite GDB commands and techniques to debug.
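The commands typed by hand in this session can also be collected in a GDB command file and passed to the debugger with its -x option. The sketch below only generates such a file; the function name gdb_commands and the library directory /tmp/solibs are assumptions for this example, not something prescribed by GDB.

```shell
#!/bin/sh
# gdb_commands PROGRAM SOLIB_DIR HOST:PORT
# Hypothetical helper: prints a GDB command file that repeats the
# steps typed by hand above (load symbols, set the library search
# path, connect to gdbserver, stop at main).
gdb_commands() {
    cat <<EOF
file $1
set solib-search-path $2
target remote $3
break main
continue
EOF
}

# /tmp/solibs is an assumed example path for the unstripped libraries.
gdb_commands stack.orig /tmp/solibs 150.158.231.6:2200
# To use it with the cross debugger, save the output to a file and run:
#   powerpc-linux-uclibc-gdb -x stack.gdb
```

Keeping the commands in a file makes the session repeatable, which is convenient when the target has to be restarted often.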

10.5 SSH Tunnelling and GDB

Suppose you have the network topology shown in Figure 10.2 and you want to debug a process running on the host target. In this topology the host niobe is dual homed, with one interface on the 10.0.0.0/8 network and one on the 150.158.231.0/24 network. The host target is also on the 10.0.0.0 network, while the development server seraph is on the 150.158.231.0 network.

The diagram also depicts a serial console connection from seraph to target. Serial UART connections are very common on embedded systems for debug purposes, a back door for when the network connection is not working. Even though they are a bit slow, UARTs are cheap, reliable and easy to configure. A serial console is usually the first bit of hardware tested out when bringing up a new board.

The problem here is that no routable network path exists from seraph to target: niobe cannot act as a gateway and forward packets in the normal manner. Until now we have worked around this by adjusting the NAT tables on the firewall.

However, changing these tables is very sensitive. It is unlikely that many people will have access to them, let alone root access to restart the firewall on the niobe gateway host. In addition, using a UART cable from a server is also out of the question.

Another method is to use the port forwarding capabilities of your old friend, ssh. The trick here is to forward connections on a local seraph port to a port on target. As a side benefit, the traffic on the tunnelled port is also encrypted by the SSH protocol.

Log in on the target from the local 10.0.0.0 network and perform basic commands. Just like in the previous examples, we start gdbserver on target:

$ gdbserver 150.158.231.13:2200 ./stack



Now we need to create an SSH tunnel from seraph to target via niobe². Here’s the command to do that:

[mleeman@seraph code]$ ssh -L 4000:10.2.4.10:2200 [email protected]

Linux gemini 2.6.8-1-686 #1 Thu Oct 7 03:15:25 EDT 2004 i686 GNU/Linux

The programs included with the Debian GNU/Linux system are free software;

the exact distribution terms for each program are described in the

individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent

permitted by applicable law.

You have new mail.

Last login: Thu Aug 17 12:37:21 2006 from 10.4.0.3

[mleeman@gemini ~]$

This opens a listening TCP socket on the local host, seraph:4000. Whenever a connection is made to this socket, it is forwarded to the sshd process on niobe, which then opens a connection to target:2200. This is shown in Figure 10.3.
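The three parts of the -L argument are easy to mix up, so here is a small sketch that assembles the command from named pieces and prints it for inspection. The function name tunnel_command and the login user@gateway are placeholders invented for this example. The -N option is a standard ssh option that skips the remote shell, so the session only carries the forwarded port; the session shown above does not use it.

```shell
#!/bin/sh
# tunnel_command LOCAL_PORT TARGET_HOST TARGET_PORT LOGIN
# Hypothetical helper: prints the ssh command that forwards LOCAL_PORT
# on this machine to TARGET_HOST:TARGET_PORT, as seen from the host
# that LOGIN connects to.
tunnel_command() {
    echo "ssh -N -L $1:$2:$3 $4"
}

# "user@gateway" is a placeholder for the account and host used to log in.
tunnel_command 4000 10.2.4.10 2200 user@gateway
```

Note that the target address is resolved on the far side of the ssh connection, which is why the private address 10.2.4.10 can be used even though it is not routable from seraph.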

Now we can start GDB on seraph like before, but this time the target remote command uses localhost:4000, like this:

² In fact, we do not have direct access to our gateway server niobe, but we know it forwards connections on port 22 to a server within the network. This serves our purpose just fine since it has the same effect: it is as if the server gemini is just the second interface on niobe.

Figure 10.3: After putting the ssh tunnel in place, the connections on 150.158.231.13, port 4000 are forwarded over TCP to the target 10.2.4.10 on port 2200.

[mleeman@seraph code]$ powerpc-linux-uclibc-gdb

GNU gdb 6.4






This GDB was configured as "--host=powerpc-pc-linux-gnu --target=powerpc-linux-uclibc".

(gdb) file stack.orig

Reading symbols from /users/firmware/mleeman/code/stack.orig...done.

(gdb) target remote localhost:4000

Remote debugging using localhost:4000

0x30000c90 in ?? ()

warning: Unable to find dynamic linker breakpoint function.

GDB will be unable to debug shared library initialisers

and track explicitly loaded dynamic code.

(gdb) b main

Breakpoint 1 at 0x100005bc: file stack.seg.c, line 7.

(gdb) c

Continuing.









Breakpoint 1, main () at stack.seg.c:7

7 printf("This program will demonstrate gdb\n");

This connection is tunnelled via niobe (and gemini) to target:2200. On the target ssh console you should see:



Note that target thinks the debug request is coming from niobe, not seraph.

Now you can proceed to debug as before.

10.6 References

• http://en.wikipedia.org/wiki/Cross-compile

• http://www.cucy.net/lacp/archives/000024.html

• GCC: http://gcc.gnu.org

• Glibc: http://www.gnu.org/software/libc/

• uClibc: http://www.uclibc.org

• Crosstool-NG: http://ymorin.is-a-geek.org/dokuwiki/projects/crosstool




Appendix A

The GNU/Linux System

A.1 Introduction

Linux is a computer operating system and its kernel. It is one of the most prominent examples of free software and of open-source development; unlike proprietary operating systems such as Windows, all of its underlying source code is available to the public for anyone to freely use, modify, improve, and redistribute.

In its narrowest sense, the term Linux refers to the Linux kernel, but it is commonly used to describe complete Unix-like operating systems (also known as GNU/Linux) based on a combination of the kernel and components from the GNU Project and elsewhere. A Linux distribution bundles many applications with the core system, and can provide standardised installation and upgrades. Desktop environments such as GNOME and KDE are sometimes associated with Linux and referred to as integral components of the system, though they are independent packages that function on several Unix-like systems.

Initially, Linux was primarily developed and used by individual enthusiasts. Since then, Linux has gained the support of major corporations such as IBM, Sun Microsystems, Hewlett-Packard, and Novell for use in servers and is gaining popularity in the desktop market [1]. Proponents and analysts attribute this success to its vendor independence (the opposite of vendor lock-in), low cost, security, and reliability.

Linux was originally developed for Intel 386 microprocessors. It now supports virtually all popular computer architectures, as well as several obscure ones. It is deployed in applications ranging from embedded systems (such as mobile phones and personal video recorders) to personal computers to supercomputers.

A.2 History

In 1983, Richard Stallman founded the GNU project, which today provides key elements of mostLinux systems (see also GNU/Linux, below). The goal of GNU was to develop a complete Unix-like operating system composed entirely of Free Software. By the beginning of the 1990s, GNUhad produced or collected most of the necessary components of this system – libraries, compilers,text editors, a Unix-like shell – except for the lowest level, the kernel. The GNU project begandeveloping its own kernel, the Hurd, in 1990, based on the Mach microkernel. This Mach-baseddesign subsequently proved unexpectedly difficult, however, and the Hurd’s development proceededslowly.

Meanwhile, in 1991, another kernel – eventually named “Linux” – was begun as a hobby by Finnish university student Linus Torvalds while attending the University of Helsinki. Torvalds originally used Minix, a simplified Unix-like system written by Andrew Tanenbaum for teaching operating system design. However, Tanenbaum did not permit others to extend his operating system, leading Torvalds to develop a replacement for Minix. Linux started out as a terminal emulator written in IA-32 assembler and C, which was compiled into binary form and booted from a floppy disk so that it would run outside of any operating system. The program ran two threads, one for sending and one for receiving characters from the serial port. When Linus needed to read and write files to disk, this task-switching terminal emulator was extended with an entire file-system handler. After that, it gradually evolved into an entire operating system kernel intended as a foundation for POSIX-compliant systems. The first version of the Linux kernel (0.01) was released to the Internet on September 17, 1991, with the second version following shortly thereafter in October [2]. Since then, thousands of developers from around the world have participated in the project.

Figure A.1: Richard Stallman, founder of the GNU project for a free operating system.

By the 0.01 release, Linus had implemented enough POSIX system calls to make Linux run the GNU Bash shell; after this bootstrapping procedure, development accelerated rapidly. A computer running Minix was originally necessary in order to configure, compile, and install Linux. Initial versions of Linux also required an operating system to be present in order to boot from a hard disk, but soon there were independent bootloaders, the most well known being LILO. The Linux system quickly surpassed Minix in functionality; Torvalds and other early Linux kernel developers adapted their kernel to work with the GNU components and user-space programs to create a complete, fully functional, free operating system.

Today, Torvalds continues to direct the development of the kernel, while other subsystems such as the GNU components are developed separately. The task of producing an integrated system which combines these components with a graphical interface and application software is now performed by Linux distribution vendors and organisations.

Tux the penguin, based on an image created by Larry Ewing in 1996, is the logo and mascot of the Linux kernel, and by extension has become known as the mascot for Linux-based systems.

The name “Linux” was coined not by Torvalds but by Ari Lemmke. Lemmke was working for the Helsinki University of Technology (TKK), located in Espoo near Helsinki, as an administrator of ftp.funet.fi, an FTP server which belongs to the Finnish University and Research Network (FUNET), which has numerous organisations as its members, amongst them the TKK and the University of Helsinki. He invented the name Linux for the directory from which Torvalds’ project was first available for download [3]. (The name Linux was derived from “Linus’ Minix”.) The name was later trademarked (see below). Originally, Linus was going to call it Freax for “free” and with the often-used X in the names of Unix-like systems.

A.3 Licensing

The Linux kernel, along with most of the GNU components, is licensed under the GNU General Public License (GPL). The GPL requires that all source code modifications and derived works also be licensed under the GPL, and is sometimes referred to as a “share and share-alike” (or copyleft) license. In 1997, Linus Torvalds stated, “Making Linux GPL’d was definitely the best thing I ever did.” [4] Other subsystems use other licenses, although all of them share the property of being free / open-source; for example, several libraries use the LGPL (a more-permissive variant of the GPL), and the X Window System uses the permissive (non-copyleft) MIT License.

Figure A.2: Linus Torvalds, creator of the Linux kernel.

The Linux trademark (U.S. Reg No: 1916230) is owned by Linus Torvalds, registered for “Computer operating system software to facilitate computer use and operation.” The licensing of the trademark is now handled by the Linux Mark Institute (LMI). LMI has also sought to enforce the Linux trademark in countries other than the U.S. In September 2005, Intellectual Property Australia, the trademark regulator in Australia, rejected an application to trademark Linux. Torvalds has stated that he only trademarked the name to prevent someone else doing so, but was bound in 2005 by United States trademark law to take active measures to enforce the trademark. As a result, the LMI sent out a number of letters to distribution developers requesting that a fee be paid for the use of the name, with which a number of companies complied.

A.4 Linux and GNU/Linux

Because the GNU libraries and programs, an essential part of nearly all Linux distributions, stem from a free operating system project that predates the Linux kernel, Richard Stallman and the Free Software Foundation ask that the combined system be referred to as “GNU/Linux” or “a Linux-based GNU system”. Torvalds, the creator of the Linux kernel, has said that he finds calling Linux in general GNU/Linux “just ridiculous”. Some distributions do use this name – notably Debian GNU/Linux – while most people simply refer to the system as Linux. The distinction between Torvalds’ kernel and entire Linux-based systems that contain the kernel is a perennial source of confusion, and the naming remains controversial.


A.5 Distributions

Linux is predominantly used as part of a Linux distribution (commonly called a ’distro’). These are compiled by individuals, loose-knit teams, and various professional organisations. They include additional system software and application programs, as well as certain processes to install these systems on a computer. Distributions are created for many different purposes, including localisation, architecture support, real-time applications, and embedded systems, and many deliberately include only free software. Over 450 distributions are available [8].

A typical general-purpose distribution includes the Linux kernel, some GNU libraries and tools, command-line shells, the graphical X Window System and an accompanying desktop environment such as KDE or GNOME, together with thousands of application software packages, from office suites to compilers, text editors, and scientific tools. Throughout the remainder of this text, all screenshots are taken from a GNU/Linux desktop (e.g. see Figure A.3).

A.6 Development efforts

More Than a Gigabuck: Estimating GNU/Linux’s Size, a study of Red Hat Linux 7.1, found that this particular distribution contained 30 million source lines of code (SLOC). The Linux kernel contained 2.4 million lines of code, or 8% of the total. Using the Constructive Cost Model (COCOMO), the study estimated that this distribution required about eight thousand person-years of development time. Had all this software been developed by conventional proprietary means, it would have cost 1.08 billion dollars (year 2000 U.S. dollars) to develop in the United States. Slightly over half of the code in that distribution was licensed under the GPL.

In a later study, Counting potatoes: the size of Debian 2.2, the same analysis was performed for Debian GNU/Linux version 2.2. This distribution contained over fifty-five million source lines of code, and the study estimated that it would have cost 1.9 billion dollars (year 2000 U.S. dollars) to develop by conventional proprietary means.

The source code for the Linux kernel used to be maintained using a software application called BitKeeper but, partly because of a license dispute, it is now maintained via Git, the directory content manager created by Linus Torvalds himself.

A.7 Applications

In the past, a user needed significant knowledge of computers in order to install and configure Linux. Because of this, and because of being attracted by access to the internals of the system, Linux users have traditionally tended to be more technologically oriented than users of Microsoft Windows and Mac OS, sometimes revelling in the tag of “hacker” or “geek”.

This stereotype has been dispelled in recent years by the increased user-friendliness and broad adoption of many Linux distributions. Linux has made considerable gains in server and special-purpose markets, such as image rendering and Web services, and is now making inroads into the high volume desktop market.

Linux is the cornerstone of the so-called LAMP server-software combination (Linux, Apache, MySQL, Perl/PHP/Python) that has achieved widespread popularity among Web developers, making it one of the most common platforms on the Web. A prominent example of this software combination in use is MediaWiki – the software primarily written for Wikipedia. Additionally, Linux has a plethora of database software such as MySQL, Sybase ASE (Linux application), mSQL and others.

The multi-billion dollar video game industry will see widespread Linux use with the 2006 launch of the Sony PlayStation 3 video game console which will run Linux out of the box. Sony has previously released a PS2 Linux kit for their PlayStation 2 video game console.

Linux is also often used in embedded systems. Its low cost makes it particularly useful in set-top boxes and for devices such as the Simputer, a computer aimed mainly at low-income populations in developing nations. In mobile phones, Linux has become a major competitor to the proprietary Symbian OS software. In handheld devices, it is an increasingly popular alternative to the Windows CE and Palm OS operating systems (PalmSource has stated that future versions of Palm OS Cobalt will be built as a layer on top of the Linux kernel [9]). The popular TiVo digital video recorder also uses a customised version of Linux. A large number of network firewalls and routers, including several from Linksys and Netgear, use Linux internally, taking advantage of the advanced firewalling and routing capabilities built in the kernel itself. The TomTom satellite navigation system also uses an embedded version of the Linux kernel. Linux is also expanding into telecommunications equipment through efforts such as Carrier Grade Linux.

Figure A.3: A GNOME Desktop.

Linux is increasingly common as an operating system for supercomputers, most recently on 64-bit AMD Opterons in the Cray XD1. As of June 2005, the 3 fastest supercomputers in the world (as recorded by the Top500) run Linux.

Linux is rapidly gaining popularity as a desktop operating system. In desktop environments like GNOME and KDE, Linux may be used with a user interface that is similar to that of Mac OS, Microsoft Windows, or other desktop environments, in addition to its traditional Unix-like command line interface. Graphical Linux software exists for almost any area and in some areas there is a greater quality and quantity of software available than for proprietary operating systems.

A.8 Usability

Once viewed as an operating system only computer professionals and aficionados could use, Linux distributions have become user-friendly, with many graphical interfaces and applications.

Users might have to switch application software, and there may be fewer options, as in the case of computer games. Equivalents of some specific programs may not be available. However, general applications like spreadsheets, word processors, and browsers are available for Linux in profusion.

Linux and other free software projects have been frequently criticised for not going far enough in terms of ensuring usability, and Linux was once considered more difficult to use than Windows or the Macintosh, although this has changed. Applications running within graphical desktop environments such as GNOME and KDE in Linux are very similar to those running on other operating systems. While some very specific applications may not be available for Linux, there usually exists a replacement of equal or better quality. A growing number of proprietary software vendors are supporting Linux, and open source development for Linux is also steadily increasing. Additionally, proprietary software for other operating systems may be run through compatibility layers, such as Wine. The area of hardware and services configuration is where user experience is most varied. GUI configuration tools and control panels are available for many system settings and services, but editing of plain-text configuration files is often required. On the command shell, many usability hang-ups from early Unix days generally remain, such as the difficulty in finding some commands, and the inability to undo many operations such as file deletion. Many older programs with text user interfaces (TUI) have wild inconsistencies between them, but they maintain loyal followings.

There have been conflicting studies of Linux’s usability in the past. Relevantive, the renowned Berlin-based organisation specialising in providing consultation to companies on the usability of software and Web services, concluded that the usability of Linux for a set of desktop-related tasks is “equal to Windows XP.” Since then, there have been numerous independent studies and articles, such as [10] [11] [12], that show that a modern Linux desktop using GNOME or KDE is on par with or superior to Microsoft Windows.

A.9 Market share

Its market share of desktops has been steadily growing, with the most noticeable gains being made in Linux desktop adoption in the last five years. According to market research company IDC, in 2002, only 25% of servers and 2.8% of desktop computers were already running Linux. However, argued advantages of Linux, such as lower cost, fewer security vulnerabilities [13], and lack of vendor lock-in, have spurred a growing number of high-profile cases of mass adoption of Linux by corporations and governments. The Linux market is among the fastest growing and is projected to exceed $35.7 billion by 2008 [14] (this statistic is not comparable to capitalised operating systems like Windows - since Linux is free to use and modify).

Because of reluctance to change and the fact that many computers still come with Microsoft Windows pre-installed, there has been a slow initial adoption of new desktop operating systems. Linux is past that stage now, with numerous manufacturers installing Linux and many organisations having five or more years experience with Linux - since installation evolved to graphical user interfaces - or Unix, which has been around for decades. Linux is rapidly gaining popularity as a desktop operating system as it is increasingly used in schools and workplaces and more people are becoming familiar with it.

Support for certain new and obscure hardware remains an issue. Though some vendors provide device drivers, many device drivers must be developed by volunteers after the release of the product. Often, this development requires reverse engineering of some sort, as certain manufacturers remain secretive and refuse to provide the hardware or firmware specifications for their products. Deliberately non-portable hardware drivers like Winmodems and Winprinters have been a general problem.

Linux distributions have been criticised for unpredictable development schedules, thus making enterprise users less comfortable with Linux than they might be with other systems (Marcinkowski, 2003). However, some observers claim that the intervals between Linux distribution releases are no worse, and often better, than the project management “schedule slipping” that occurs with other operating systems and with software systems in general. The large number of choices of Linux distributions can also confuse users and software vendors.

The total cost of ownership (TCO) of Linux has been a contentious issue surrounding the adoption of Linux. Some studies, such as those by IDC and Gartner, have argued that Linux had a higher TCO than Windows. Others, such as those by Soeren Research and RFG, claim the opposite. Many of the studies, most notably studies which were later found to have been funded by Microsoft themselves, have been criticised as unbalanced and biased. Also, a number of studies which gave results favourable to Linux were commissioned by companies such as IBM and Novell.

The paper Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers! identifies many quantitative studies of open source software, on topics including market share and reliability, with many studies specifically examining Linux.

A.10 Installation

In the past, difficulty of installation was a barrier to wide adoption of Linux-based systems, but the process has been made easy in recent years. Many distributions are at least as easy to install as a comparable version of Windows. It is unnecessary to file license numbers and enter them during installation. Also, it is not normally necessary to install drivers after installing Linux, as most hardware is supported out of the box. Further, personal computers that come with Linux distributions already installed are readily available from numerous vendors, including large mainstream vendors like Hewlett-Packard and Dell.

The most common method of installing Linux, supported by all major distributions, is by booting from a CD that contains the installation program and installable software. Such a CD can be burned from a downloaded ISO image, purchased alone for a low price, or can be obtained as part of a box set that may also include manuals and additional commercial software.

Some distributions, such as Debian, can be installed from a small set of floppy disks. After a basic system is installed, more software can be added by downloading it from the Internet or using CDs.

Other distributions, such as Knoppix, can be run directly from a “live CD” running entirely in RAM, rather than installing it to the hard drive. With this, one boots from the CD and can use Linux without making any modification to the contents of the hard drive. Similarly, some minimal distributions, such as tomsrtbt, can be run directly from as little as 1 floppy disk without needing to change the hard drive contents.

Still another mode of installation of Linux is to install on a powerful computer to use as a server and to use ordinary less powerful machines (perhaps without hard drives, and having less memory and slower CPUs) as thin clients over the network. Clients can boot over the network from the server and display results and pass information to the server where all the applications run. A Linux Terminal Server is a single machine to which many clients can connect this way, so one obtains the benefit of installing Linux on many machines for the cost of installing on one. The clients can be ordinary PCs with the addition of the network bootloader on a drive or network interface controller. Variations on this mode include using local drives and computing power to run applications. The cost savings achieved by using thin clients can be invested in greater computing power or storage on the server.

Many distributions also support booting over a network, so an installation on a properly configured machine can be done remotely.

Anaconda, one of the more popular installers, is used by Red Hat Linux, Fedora Core and other distributions to simplify the installation process. It is famous for its ability to automatically partition a hard drive using the Disk Druid utility.

A.11 Installation on an existing platform

Many distribution companies now are sparing no effort to provide users with advanced, easy and specific installations. Some beginners (especially those familiar with Microsoft Windows and Mac OS) may still feel that making the shift can be hard but many solutions have been created to solve this problem.

Some let the user install Linux on top of their current system. Consider WinLinux, for example. After downloading the installer (more than 100MB), the user can install Linux just like any other Windows application. The software provides all the needed features; it is a real Linux distribution. The difference is that it is not necessary for the user to leave Windows, since Linux is installed to the Windows hard-disk partition. A Linux boot loader will boot the Linux system when the PC is restarted and the user chooses to boot Linux. Similar approaches include coLinux.


Virtual machine technology (such as Virtual PC or VMware) also enables Linux to be run inside another OS such as Microsoft Windows. The virtual machine software will simulate an isolated environment onto which the Linux system is installed. After everything is done, the virtual machine can be booted just as if it were an independent computer.

A.12 Demonstration

The difficulty in quickly demonstrating Linux on the computer of a potential new user remains an obstacle, slowing its adoption as a personal computing platform. So-called “live CDs” that simply boot from CD and automatically load the necessary drivers for the user’s respective system promise to change that. Linux User Groups, or LUGs, still provide the primary face-to-face forum for demonstration of Linux. Commercial exhibitions provide Linux demonstrations to potential new users, especially corporate buyers. Many commercial distributions are hard to install, but with work, allow someone to re-use an old machine to see what the Linux desktop is like. The approach by Knoppix, which runs Linux directly from a CD without disturbing the PC’s hard drive, is probably the most successful demonstration tool to date. MEPIS also runs from CD like Knoppix, and both can be installed onto a PC like any other Linux distribution. Ubuntu has a separate “Live” version of their distribution which runs from CD. However, the fastest approach is probably that of Workspot, which uses VNC to provide a free Linux desktop demo online.

A.13 Configuration

Most system-wide settings are stored in a single directory called /etc, while user-specific settings are stored in hidden files in the user’s home directory. A few programs use a configuration database instead of files.

There are a number of ways to change these settings. The easiest way to do this is by using tools provided by distributions such as Debian’s debconf, Mandriva’s Control Center, or SUSE’s YaST. Others, like Linuxconf, Gnome System Tools, and Webmin, are not distribution-specific. There are also many command line utilities for configuring programs. Since nearly all settings are stored in ordinary text files they can be configured by any text editor, which is the standard method of configuration for some distributions such as Slackware.
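The split between system-wide and per-user settings can be inspected directly from a shell. The following is a minimal sketch; the helper name list_dotfiles is an assumption made for this illustration and is not a standard tool.

```shell
# list_dotfiles DIR: print the hidden entries (names starting with a dot)
# found directly in DIR. Hypothetical helper, for illustration only.
list_dotfiles() {
    # ls -A includes hidden entries but leaves out "." and ".."
    ls -A "$1" | grep '^\.'
}

echo "System-wide settings (first entries of /etc):"
ls /etc | head -n 5

echo "Per-user settings (first hidden entries of the home directory):"
list_dotfiles "${HOME:-/}" | head -n 5
```

The output depends on the machine; on a typical desktop the second list contains entries such as shell start-up files.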

A.14 Programming on Linux

A number of compilers are available for Linux. The GNU Compiler Collection (GCC) comes with the vast majority of distributions. GCC supports C, C++ and Java (for example by using GCJ) among other languages.

There are also a number of IDEs available for Linux. Some of the most popular are Anjuta, Code::Blocks, KDevelop, NetBeans IDE, Glade (actually a user interface designer), Eclipse, the famous Emacs and Vim.

Another option for Linux programming is writing shell scripts. These are applications that are written without the need for compilation of the code. They are interpreted line-by-line as commands entered in the shell.
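As a small illustration of such a script, the sketch below reports what kind of file each argument is. The function name describe and the example paths are assumptions chosen for this example only.

```shell
# describe PATH...: report whether each argument is a directory,
# a regular file, or something else. Illustrative example only.
describe() {
    for path in "$@"; do
        if [ -d "$path" ]; then
            echo "$path: directory"
        elif [ -f "$path" ]; then
            echo "$path: regular file"
        else
            echo "$path: missing or special"
        fi
    done
}

# No compilation step is needed: the shell interprets these lines directly.
describe /etc /etc/hostname /nonexistent
```

Saved to a file and made executable with chmod +x, the same lines run as a standalone program.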

Linux also integrates well with Python, Perl, PHP and Ruby.

A.15 Portability of Linux

As originally envisioned by Linus Torvalds, Linux was strictly a 386 (operating system) kernel. It was later ported to other architectures, including:

• Intel/AMD x86


• x86-64 (AMD’s AMD64 and Intel’s EM64T)

• IA-64

• ARM

• DEC Alpha

• ESA/390

• Motorola 68K

• MIPS

• PA-RISC

• PowerPC

• SuperH

• SPARC

• . . .
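On a running system, uname -m prints the machine name the kernel was built for. The sketch below maps some commonly seen values onto the families listed above; the function name arch_family is hypothetical and the mapping is an illustrative, incomplete assumption.

```shell
# arch_family MACHINE: map a `uname -m` style machine name onto one of the
# architecture families from the list above. Incomplete by design.
arch_family() {
    case "$1" in
        i?86)          echo "x86" ;;
        x86_64)        echo "x86-64" ;;
        ia64)          echo "IA-64" ;;
        arm*|aarch64)  echo "ARM" ;;
        alpha)         echo "DEC Alpha" ;;
        s390)          echo "ESA/390" ;;
        m68k)          echo "Motorola 68K" ;;
        mips*)         echo "MIPS" ;;
        parisc*)       echo "PA-RISC" ;;
        ppc*)          echo "PowerPC" ;;
        sh[0-9]*)      echo "SuperH" ;;
        sparc*)        echo "SPARC" ;;
        *)             echo "other ($1)" ;;
    esac
}

# Report the family of the machine this script runs on.
arch_family "$(uname -m)"
```

This is handy when cross compiling: the build host and the target board will usually report different families.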

A.16 Support

Technical support is provided by commercial suppliers and by other Linux users, usually in online forums, newsgroups and mailing lists. Linux users are often organised in so-called Linux User Groups (LUGs).

The business model of commercial suppliers is generally dependent on charging for support, especially for business users. Companies that offer a special business version of their distribution add special support packages and special tools to administer large numbers of installations or to make administrative tasks easier.

A.17 References

• Glyn Moody: Rebel Code: Linux and the Open Source Revolution, Perseus Publishing, ISBN 0-713-99520-3

• Gedda, R. (2004). Linux breaks desktop barrier in 2004: Torvalds. Retrieved January 16, 2004 from [15]

• Mackenzie, K. (2004). Linus Torvalds Q&A. Retrieved January 19, 2004 from [16]

• More Than a Gigabuck: Estimating GNU/Linux’s Size by David A. Wheeler

• Counting potatoes: the size of Debian 2.2 by Jesús M. González-Barahona et al.

• Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers! by David A. Wheeler

• Desktop Linux: Ready for Prime Time? by Emmett Dulaney, Redmond Magazine, June 2005, retrieved on 21 December 2005

• Mandrake 8.1 easier than Win-XP by Thomas C. Greene, The Register, retrieved December 22, 2005

Appendix B

Setting up a Server

B.1 Setting up the NFS Root Filesystem

Loading a filesystem image is time consuming, and during a testing/debugging/development cycle this is done quite often. An NFS filesystem, where changes can be made on the fly, is therefore more useful. The boards are configured to use BOOTP for booting. The following describes the basic modifications needed to configure a GNU/Linux server as a BOOTP server.

First of all, a number of services need to be installed on the serving machine (gemini in this example):

• an NFS server

• a DHCP server

On a Debian GNU/Linux system, the following packages need to be installed:

[mleeman@gemini mleeman]$ dpkg -l |grep nfs

ii nfs-common 1.0-2woody1 NFS support files common to client and serve

ii nfs-kernel-ser 1.0-2woody1 Kernel NFS server support

[mleeman@gemini mleeman]$ dpkg -l |grep dhcp

ii dhcp 2.0pl5-19 DHCP server for automatic IP address assignment

Other distributions should have similarly named packages.

The main configuration is done in two files:

• /etc/exports lists the directories on the server that are exported over the Network File System (NFS). In this case, this will be the root filesystem for our embedded board.

• /etc/dhcpd.conf contains the configuration of the DHCP server. You have to make certain that you are the only DHCP server in your network range. If you are not, a lot of unpleasantness can occur, since your server will reply with DHCPNAK to requests meant for the official DHCP server.

Extract a root filesystem tarball (or something similar) to your home directory (or a predetermined location). A free and general purpose PPC root filesystem can be found in ELDK (http://www.denx.de/ELDK/).

[mleeman@gemini mleeman]$ mkdir targets

[mleeman@gemini mleeman]$ sudo tar xfz target.scn.tar.gz -C targets/

With the current BARCO builds, you can also use the archive of the squashfs to create your NFS root FS¹:

[mleeman@gemini mleeman]$ sudo tar xfj target-svc2.2.6.nightly.tar.bz2 -C targets/

¹ Do not forget to copy your stream.cfg file, which basically re-confirms your IP address to ppcserver, into the /root/ppcstream directory. This is only required for SMD firmware builds that do not yet use the U-Boot configuration settings.
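The two tar invocations above differ only in the decompression flag. A small wrapper can choose it from the archive name; this is a sketch, and the function name unpack_rootfs is an assumption, not part of any BARCO tooling.

```shell
# unpack_rootfs ARCHIVE DESTDIR: extract a root filesystem tarball into
# DESTDIR, choosing the decompression flag from the file name.
# Hypothetical helper; run it as root for a real root filesystem, as the
# sudo in the commands above does, so that ownership and device nodes
# stored in the archive are preserved.
unpack_rootfs() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    case "$archive" in
        *.tar.gz|*.tgz)   tar -xzf "$archive" -C "$dest" ;;
        *.tar.bz2|*.tbz2) tar -xjf "$archive" -C "$dest" ;;
        *)
            echo "unpack_rootfs: unknown archive type: $archive" >&2
            return 1
            ;;
    esac
}

# Example (not executed here):
#   unpack_rootfs target.scn.tar.gz targets/
```

Creating the destination directory first avoids the failure tar reports when the directory given to -C does not exist.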


Secondly, add the following entry to the exports file (/etc/exports), indicating that you want to export the directory to any client without restrictions:

/home/mleeman/targets/target.scn.01 *(rw,no_root_squash,no_all_squash)

Finally, restart the NFS server:

[mleeman@gemini mleeman]$ sudo /etc/init.d/nfs-kernel-server restart

Obviously, these last two steps are not required if a parent directory of your target is already listed in the /etc/exports file.
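The check described here can be scripted. The following sketch prints an exports entry in the form used above and tests whether a directory, or one of its parent directories, already appears in an exports file. The function names export_line and is_exported are assumptions made for this example.

```shell
# export_line DIR: print an /etc/exports entry for DIR with the options
# used in this section.
export_line() {
    printf '%s *(rw,no_root_squash,no_all_squash)\n' "$1"
}

# is_exported DIR [EXPORTS_FILE]: succeed if DIR or one of its parent
# directories is the first field of a line in EXPORTS_FILE
# (default: /etc/exports). Hypothetical helper.
is_exported() {
    dir=$1
    file=${2:-/etc/exports}
    [ -r "$file" ] || return 1
    while [ -n "$dir" ]; do
        if awk -v d="$dir" '$1 == d { found = 1 } END { exit !found }' "$file"; then
            return 0
        fi
        # stop once the top of the path has been checked
        case "$dir" in
            /|.) break ;;
        esac
        dir=$(dirname "$dir")
    done
    return 1
}

export_line /home/mleeman/targets/target.scn.01
```

On systems with the standard NFS utilities, running exportfs -ra as root is usually enough to make the server re-read /etc/exports without a full restart.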

We’re halfway through. We now need to link this NFS filesystem to the MAC address of our board. This is done by adding the following in the /etc/dhcpd.conf file:

# global configuration options

# allow bootp packets

allow bootp;

# the first subnet we listen to

subnet 150.158.231.0 netmask 255.255.255.0 {

# options for the subnet in question

option routers 150.158.231.1;

default-lease-time 1209600;

max-lease-time 31557600;

# for each board, add the following configuration block.

# It is best that you use different root FS directories

# for each board.

group {

host svc2.01{

hardware ethernet 00:04:a5:04:05:93;

fixed-address 10.2.4.10;

option root-path "/home/firmware/mleeman/targets/svc2/nightly";

filename "/home/services/tftpboot/kernels/kernel.svc2.2.6.continuous.img";

}

}

}

We now need to restart the dhcp server:

[mleeman@gemini mleeman]$ sudo /etc/init.d/dhcpd restart
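When several boards share one server, the per-board host block is easy to get wrong by hand. The sketch below prints such a block from its five parameters; the function name dhcp_host is an assumption, and the output still has to be pasted inside the group section of /etc/dhcpd.conf manually.

```shell
# dhcp_host NAME MAC IP ROOT_PATH KERNEL: print one host block in the
# format used in the dhcpd.conf listing above. Hypothetical helper.
dhcp_host() {
    cat <<EOF
host $1 {
    hardware ethernet $2;
    fixed-address $3;
    option root-path "$4";
    filename "$5";
}
EOF
}

# Reproduce the entry for the board from the listing above.
dhcp_host svc2.01 00:04:a5:04:05:93 10.2.4.10 \
    /home/firmware/mleeman/targets/svc2/nightly \
    /home/services/tftpboot/kernels/kernel.svc2.2.6.continuous.img
```

Giving each board its own root-path argument follows the advice in the listing to use a different root FS directory per board.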

If you do not know the MAC address, and it is not indicated on the board in some way, just let the target boot (loaded with an NFS kernel of course) and watch the messages of the DHCP server; it will display the MAC address (but will not pass a root FS).

[mleeman@gemini mleeman]$ sudo tail -f /var/log/messages

and look for lines like the following:

Apr 9 07:53:22 gemini dhcpd: BOOTREQUEST from 00:04:a5:00:05:19 via eth0

Apr 9 07:53:22 gemini dhcpd: No applicable record for BOOTP host 00:04:a5:00:05:19 via eth0

More detailed information can be seen in the syslog:

sudo tail -f /var/log/syslog

Jul 13 15:57:08 gemini dhcpd: BOOTREQUEST from 00:04:a5:04:05:0c via eth0

Jul 13 15:57:08 gemini dhcpd: BOOTREPLY for 150.158.231.121 to hydra_stream_dmar (00:04:a5:04:05:0c) via eth0

Jul 13 15:57:13 gemini rpc.mountd: authenticated mount request from 150.158.231.121:800 for /home/dmartens/targets/target.01 (/home/dmartens/targets/target.01)

In the first log excerpt, 00:04:a5:00:05:19 is the MAC address of our board. More details about the kernel parameters are found in the attached file.
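Picking the address out of a busy log by eye is error-prone. The sketch below filters the "No applicable record" lines shown earlier and prints each unknown MAC address once. The function name unknown_bootp_macs is an assumption, and the pattern relies on the log format of the excerpts in this section.

```shell
# unknown_bootp_macs [FILE...]: print the MAC addresses of BOOTP clients
# for which dhcpd reported no applicable record. Reads standard input
# when no file is given. Hypothetical helper.
unknown_bootp_macs() {
    sed -n 's/.*dhcpd: No applicable record for BOOTP host \([0-9A-Fa-f:]\{17\}\) via .*/\1/p' "$@" |
        sort -u
}

# Demonstration with the log lines from this section:
printf '%s\n' \
    'Apr 9 07:53:22 gemini dhcpd: BOOTREQUEST from 00:04:a5:00:05:19 via eth0' \
    'Apr 9 07:53:22 gemini dhcpd: No applicable record for BOOTP host 00:04:a5:00:05:19 via eth0' |
    unknown_bootp_macs
# prints: 00:04:a5:00:05:19
```

On the server itself, the function can be fed from the live log, for example with sudo tail -n 200 /var/log/messages | unknown_bootp_macs.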


B.2 Set up a Firewall with a private address range

B.2.1 firehol

Firehol is a configuration tool on top of iptables that allows you to write a very powerful firewall in a small number of lines. The following example creates a firewall for a machine with 2 network interface controllers (NICs), eth0 and eth1, with eth0 on a public address range and eth1 on a private address range. Machines from the private range will transparently be able to access the external network, while selected ports will be forwarded through to machines on the private network.

########################################################################

#

# These are the valid NAT ranges. Make certain that the network behind

# the NAT is different than the one from the client/control network.

# Class                From         To               CIDR Mask                    Decimal Mask

# Class "A" or 24 Bit  10.0.0.0     10.255.255.255   /8                           255.0.0.0

# Class "B" or 20 Bit  172.16.0.0   172.31.255.255   /12 (or more typically /16)  255.240.0.0 (or 255.255.0.0)

# Class "C" or 16 Bit  192.168.0.0  192.168.255.255  /16 (or more typically /24)  255.255.0.0 (or 255.255.255.0)

#

# The IPs are set in the file /etc/network/interfaces

# See interfaces(5) for more information

#

# Marc Leeman <[email protected]>

#

# After a change in the configuration file, execute

# $ sudo /etc/init.d/firehol restart

# to make the changes in the firewall active.

#

########################################################################

version 5

########################################################################

#

# Custom services

# According to IANA (http://www.iana.org/assignments/port-numbers)

# The Dynamic and/or Private Ports are those from 49152 through 65535

#

########################################################################

# 100 tcp/udp ports for a service

server_hydra1_ports="tcp/50000:50099 udp/50000:50099"

client_hydra1_ports="default"

server_hydra2_ports="tcp/50100:50199 udp/50100:50199"

client_hydra2_ports="default"

########################################################################

#

# forward traffic arriving on this machine on ports 50080 and 50180 to

# port 80 of machines 172.16.0.2 and 172.16.0.3 respectively

#

########################################################################

dnat to 172.16.0.2:80 inface eth0 proto tcp dport 50080

dnat to 172.16.0.3:80 inface eth0 proto tcp dport 50180

########################################################################

#

# The IP address from this network interface should be assigned by DHCP

# or is the client network.

#

########################################################################

interface eth0 control

protection strong

policy reject

server hydra1 accept

server hydra2 accept

server ssh accept

server http accept

client all accept

########################################################################

#

# This network interface is under control of Barco. You can choose any

# of the private network ranges mentioned above, just make certain that

# they do not clash.

#

########################################################################

interface eth1 stream

protection strong

policy reject

server ssh accept

server http accept

client all accept

########################################################################

#

# The routing magic.

#

########################################################################

router control2stream inface eth0 outface eth1

masquerade reverse

client all accept

route ssh accept

route http accept
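The comments at the top of this configuration ask you to pick a private range that does not clash with the control network. As a rough aid, the following sketch checks whether a dotted-quad address falls in one of the three private ranges listed there. The function name is_private_ipv4 is made up for this example, and the input is assumed to be a well-formed numeric IPv4 address.

```shell
# Return success when $1 lies in 10/8, 172.16/12 or 192.168/16.
# Assumes a well-formed dotted-quad address with numeric octets.
is_private_ipv4() {
    old_ifs=$IFS
    IFS=.
    set -f                  # no filename expansion while splitting
    set -- $1               # split the address into its four octets
    set +f
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    case $1 in
        10)  return 0 ;;
        172) [ "$2" -ge 16 ] && [ "$2" -le 31 ] ;;
        192) [ "$2" -eq 168 ] ;;
        *)   return 1 ;;
    esac
}

for ip in 172.16.0.2 192.168.1.100 151.120.25.101; do
    if is_private_ipv4 "$ip"; then
        echo "$ip private"
    else
        echo "$ip public"
    fi
done
```

The check only classifies an address; verifying that two networks do not overlap still requires comparing their masks.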


B.3 Configure your BDI probe

Building the GNU/Linux setup tool: the setup tool is delivered only as source files, which makes it possible to build it on any Linux / Unix host. To build the tool, simply run the make utility.

[root@LINUX_1 bdisetup]# make

cc -O2 -c -o bdisetup.o bdisetup.c

cc -O2 -c -o bdicnf.o bdicnf.c

cc -O2 -c -o bdidll.o bdidll.c

cc -s bdisetup.o bdicnf.o bdidll.o -o bdisetup

B.3.1 Check the serial connection to the BDI

With bdisetup -v you may check the serial connection to the BDI. The BDI will respond with information about the currently loaded firmware and network configuration. Note: you need to have root permissions, otherwise you probably have no access to the serial port.

[mleeman@bpscltpd bdisources]$ sudo ./bdisetup -v -p/dev/ttyS0 -b57 -s

BDI Type : BDI2000 Rev.C (SN: 94330418)

Loader : V1.05

Firmware : V1.15 bdiGDB for PPC6xx/PPC7xx

Logic : V1.02 PPC6xx/PPC7xx

MAC : 00-0c-01-94-33-04

IP Addr : 0.0.0.0

Subnet : 255.255.255.255

Gateway : 255.255.255.255

Host IP : 255.255.255.255

Config : svc8245-streaming.cfg

B.3.2 Activating BOOTP

The BDI can also get the network configuration and the name of the configuration file via BOOTP. For this, simply enter 0.0.0.0 as the BDI’s IP address (see the following sections). If present, the subnet mask and the default gateway (router) are taken from the BOOTP vendor-specific field as defined in RFC 1533. With the Linux setup tool, simply use the default parameters for the -c option:

[root@LINUX_1 bdisetup]# ./bdisetup -c -p/dev/ttyS0 -b57

The MAC address is derived from the serial number as follows: MAC: 00-0C-01-xx-xx-xx, where xx-xx-xx is replaced with the six leftmost digits of the serial number. Example: SN# 93123457 => 00-0C-01-93-12-34
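Knowing the MAC address in advance is handy when preparing the BOOTP server entry. The derivation rule can be sketched as a small shell function; the name bdi_mac_from_serial is made up for this example, and the serial number is assumed to start with at least six digits.

```shell
# Derive the BDI2000 MAC address from its serial number:
# fixed prefix 00-0C-01, followed by the six leftmost digits in pairs.
# Input that does not start with six digits is printed unchanged.
bdi_mac_from_serial() {
    printf '%s\n' "$1" |
        sed 's/^\([0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\).*$/00-0C-01-\1-\2-\3/'
}

bdi_mac_from_serial 93123457    # prints 00-0C-01-93-12-34
```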

You only need to specify the configuration file (which will contain the host address of your TFTP server):

sudo ./bdisetup -c -p/dev/ttyS0 -b57 -fsvc8245-streaming.cfg

Connecting to BDI loader

Writing network configuration

Writing init list and mode

Configuration passed

Then check the result and exit the probe programming mode:

[mleeman@bpscltpd bdisources]$ sudo ./bdisetup -v -p/dev/ttyS0 -b57 -s



Loader : V1.05



MAC : 00-0c-01-95-56-03

IP Addr : 0.0.0.0

Subnet : 255.255.255.255

Gateway : 255.255.255.255

Host IP : 255.255.255.255


B.3.3 Load/Update the BDI firmware/logic

With bdisetup -u the firmware is loaded and the CPLD within the BDI2000 is programmed. This configures the BDI for the target you are using. Based on the parameters -a and -t, the tool selects the correct firmware / logic files. If the firmware / logic files are in the same directory as the setup tool, there is no need to enter a -d parameter. Note: there is no difference between the CPU types PPC600, PPC700, MPC8200 and MPC7400.

[root@LINUX_1 bdisetup]# ./bdisetup -u -p/dev/ttyS0 -b57 -aGDB -tPPC700


Erasing CPLD

Programming firmware with ./b20copgd.108

Programming CPLD with ./copjed21.102

B.3.4 Transmit the initial configuration parameters

With bdisetup -c the configuration parameters are written to the flash memory within the BDI. The following parameters are used to configure the BDI:

BDI IP Address: The IP address for the BDI2000. Ask your network administrator to assign an IP address to this BDI2000. Every BDI2000 in your network needs a different IP address.

Subnet Mask: The subnet mask of the network the BDI is connected to. A subnet mask of 255.255.255.255 disables the gateway feature. Ask your network administrator for the correct subnet mask. If the BDI and the host are in the same subnet, it is not necessary to enter a subnet mask.

Default Gateway: Enter the IP address of the default gateway. Ask your network administrator for the correct gateway IP address. If the gateway feature is disabled, you may enter 255.255.255.255 or any other value.

B.3.5 Fixed Configuration

Host IP Address: Enter the IP address of the host with the configuration file. The configuration file is automatically read by the BDI2000 after every start-up.

Configuration file: Enter the full path and name of the configuration file. This file is read via TFTP. Keep in mind that TFTP has its own root directory (usually /tftpboot). You can simply copy the configuration file to this directory and then use the file name without any path. For more information about TFTP, see "man tftpd".

[root@LINUX_1 bdisetup]# ./bdisetup -c -p/dev/ttyS0 -b57 \

> -i151.120.25.101 \

> -h151.120.25.118 \

> -fppc750.cnf


Writing network configuration

Writing init list and mode

Configuration passed


B.3.6 Check configuration and exit loader mode

The BDI is in loader mode when there is no valid firmware loaded or when you connect to it with the setup tool. While in loader mode, the Mode LED is flashing. The BDI will not respond to network requests while in loader mode. To exit loader mode, "bdisetup -v -s" can be used. You may also power off the BDI, wait some time (1 min.) and power it on again to exit loader mode.

[root@LINUX_1 bdisetup]# ./bdisetup -v -p/dev/ttyS0 -b57 -s


Loader : V1.05



MAC : 00-0c-01-92-15-21

IP Addr : 151.120.25.101

Subnet : 255.255.255.255

Gateway : 255.255.255.255

Host IP : 151.120.25.118

Config : ppc750.cnf

The Mode LED should go off, and you can try to connect to the BDI via Telnet.

[root@LINUX_1 bdisetup]# telnet 151.120.25.101

B.3.7 Summarising the upgrade procedure

This is the verbatim output of the last upgrade (02092005) to the SMD lab probes (cf. firmware at the bottom). That’s all there is to it ;)

[mleeman@bpscltpd bdi2000_v121]$ sudo ./bdisetup/bdisetup -v -p/dev/ttyS1 -b57 -s


Loader : V1.05



MAC : 00-0c-01-95-56-03

IP Addr : 0.0.0.0

Subnet : 255.255.255.255

Gateway : 255.255.255.255

Host IP : 255.255.255.255


[mleeman@bpscltpd bdi2000_v121]$ sudo ./bdisetup/bdisetup -u -p/dev/ttyS1 -b57 -aGDB -tPPC700


Erasing CPLD

Programming firmware with ./b20copgd.121

Erasing firmware flash ....

Erasing firmware flash passed

Programming firmware flash ....

Programming firmware flash passed

Programming CPLD with ./copjed21.105

[mleeman@bpscltpd bdi2000_v121]$ sudo ./bdisetup/bdisetup -v -p/dev/ttyS1 -b57 -s


Loader : V1.05



MAC : 00-0c-01-95-56-03

IP Addr : 0.0.0.0


Subnet : 255.255.255.255

Gateway : 255.255.255.255

Host IP : 255.255.255.255


Appendix C

Miscellaneous Tools

C.1 Introduction

Embedded firmware engineers have to keep track of complex code bases that are often subject to continuous change on a lot of fronts: there are improvements to drivers that suddenly give you a better throughput, workarounds for silicon bugs that explain the mysterious and unpredictable crashes of your systems, new hardware support for that processor watchdog you’ve been dying to try out, ...

As such, it is not acceptable to choose one particular version of U-Boot, the Linux kernel, Busybox and uClibc and start making changes throughout the code base. We need a number of tools to track our changes, trace them back and carry them over with a minimum of effort when a new kernel is released to the public.

This chapter will introduce a number of these tools that are commonly used in Free and Open Source development.

C.2 Patch

A patch is an update to the source code of a program. Be careful: every patch is designed for a specific version and cannot be applied to every tree.

However, patching should work in most cases if the reference code (the context lines before and after the changes) has not changed and is not too far away from its original location. For example, it is fine to add lines in functions before and after the function that is contained in the patch, but patching will fail if the lines directly before and after the patch modifications are different.

C.2.1 Applying a patch

To apply a patch to a project, you first have to obtain the project source code. You should always keep a tar.gz archive with the source of your current project on your shell. In the next step, change to your source directory (e.g. /usr/src/linux-2.6.17/) and type the following command:

[mleeman@seraph linux-2.6.17]$ patch -p1 < ../path.to.the/patch

Once this is complete, execute the following command:

[mleeman@seraph linux-2.6.17]$ find . -name "*.rej" -print

If it returns a list of filenames ending with the .rej extension, then the patch didn’t apply properly. Ensure that the patch is intended for your version and that you have the original source. You should also try to re-download the patch to ensure that the patch is not corrupted.

If you get an error such as this:


|Index: Makefile.in

|===================================================================

|RCS file: /usr/local/cvsroot/project-1.6/Makefile.in,v

|retrieving revision 1.38

|diff -u -r1.38 Makefile.in

|--- Makefile.in 17 Jun 2004 05:43:28 -0000 1.38

|+++ Makefile.in 23 Jul 2004 21:58:23 -0000

--------------------------

File to patch:

Then you should try using a different ’-p’ option. Try -p0 first, and then -p2, -p3, etc.

If the patch applied properly, the only thing left to do is to recompile your project and install the new modules and binaries.
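To see why the right ’-p’ value matters, it helps to picture what patch does with the file names in the patch header: -pN strips the leading path up to and including the Nth slash. The sketch below models only that name handling; strip_components is a made-up helper for this illustration, not part of patch, and it returns failure when the name has fewer than N slashes, which is the situation in which patch typically ends up asking "File to patch:".

```shell
# Model of the file name handling of "patch -pN":
# remove the first N slash-terminated components from a path.
strip_components() {
    n=$1
    path=$2
    while [ "$n" -gt 0 ]; do
        case $path in
            */*) path=${path#*/} ;;
            *)   return 1 ;;        # fewer than N slashes: no usable name
        esac
        n=$((n - 1))
    done
    printf '%s\n' "$path"
}

strip_components 0 linux-2.6.17/include/asm-ppc/mpc83xx.h
strip_components 1 linux-2.6.17/include/asm-ppc/mpc83xx.h
```

With -p1 the second call yields include/asm-ppc/mpc83xx.h, which is the name patch looks for when it is run from inside the linux-2.6.17 directory.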

C.2.2 Creating a patch

If you fixed a bug and/or changed something in a project’s source code, it would be really nice to let the development team know about it, so that the change can possibly be applied to the next release of the project.

There are several steps to submit a patch:

• Create a directory with original source tree and one with modified source tree.

• Run the following:

[mleeman@seraph code]$ diff -urN linux-2.6.17.orig linux-2.6.17 > my.changes.diff
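The two steps can be tried out safely on a throw-away tree. The sketch below builds a tiny original and modified tree in a temporary directory and runs the same diff command on them; the directory and file names (project.orig, project, src/main.c, NEWS) are invented for the illustration.

```shell
# Build a small "original" and "modified" tree in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/project.orig/src" "$work/project/src"
printf 'int answer = 41;\n' > "$work/project.orig/src/main.c"
printf 'int answer = 42;\n' > "$work/project/src/main.c"
printf 'Fixed the answer.\n' > "$work/project/NEWS"   # new file, picked up by -N

# diff exits with status 1 when the trees differ, which is expected here.
(cd "$work" && diff -urN project.orig project > my.changes.diff) || true

cat "$work/my.changes.diff"
```

The -u option selects the unified format, -r recurses into sub-directories and -N treats files that exist in only one tree as empty in the other, so that added and removed files end up in the patch as well.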

C.3 Quilt

C.3.1 Introduction

Quilt is a tool to manage large sets of patches by keeping track of the changes each patch makes. Patches can be applied, un-applied, refreshed, etc. The key philosophical concept is that your primary output is patches.

With quilt, all work occurs within a single directory tree. Commands can be invoked from anywhere within the source tree. They are of the form quilt cmd, similar to CVS commands.

Patch files are located in the patches sub-directory of the source tree. The QUILT_PATCHES environment variable can be used to override this location. The patches directory may contain sub-directories. It may also be a symbolic link instead of a directory.

A file called series contains a list of patch file names that defines the order in which patches are applied. In this file, each patch file name is on a separate line. Patch files are identified by pathnames that are relative to the patches directory; patches may be in sub-directories below this directory. Lines in the series file that start with a hash character (#) are ignored. When quilt adds, removes, or renames patches, it automatically updates the series file.
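Because the series file is plain text, it is easy to inspect with standard tools. The sketch below lists the patches a series file names, in order, skipping the ignored comment lines; list_series and the sample file contents are made up for this illustration, and empty lines are skipped as well as an assumption.

```shell
# Print the patch names from a quilt series file, in application order.
list_series() {
    # Drop comment lines (leading '#') and empty lines; keep the order.
    grep -v -e '^#' -e '^[[:space:]]*$' "$1"
}

series=$(mktemp)
cat > "$series" <<'EOF'
# board support
linux-2.6.16-barco.diff
squashfs/squashfs3.0-patch

# serial-2.6.17.diff
linux-2.6.17.diff
EOF

list_series "$series"
```

Commenting out a line, as done for serial-2.6.17.diff here, is a quick way to take a patch out of the stack temporarily.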

C.3.2 Some Examples

When starting out with a new project to which you want to apply your changes while keeping them in a patch set, use new to initialise quilt with a new patch on top:

[mleeman@seraph linux-2.6.17]$ quilt new linux-2.6.17.diff

Patch linux-2.6.17.diff is now on top

After this, the files that need to be changed can be added manually with add or implicitly with edit:


[mleeman@seraph linux-2.6.17]$ quilt add include/asm-ppc/mpc83xx.h

File include/asm-ppc/mpc83xx.h added to patch linux-2.6.17.diff

add adds the file to the list of files to keep track of. You can then change the file at any time in the source tree, with any editor [1]. In contrast, edit adds the file and immediately spawns your favourite editor, as set in the $EDITOR environment variable:

[mleeman@seraph linux-2.6.17]$ quilt edit include/asm-ppc/mpc83xx.h

File include/asm-ppc/mpc83xx.h added to patch linux-2.6.17.diff

Use add and edit both for changing existing files and for creating new files. Finally, before committing the changed patch to the repository, use refresh to create or update your patch:

[mleeman@seraph linux-2.6.17]$ quilt refresh

Refreshed patch linux-2.6.17.diff

Quilt keeps a stack of patches. This allows developers to keep a functional distinction in the changes, and store these changes in different patches: a certain file can occur in different patches without a problem.

Moving up and down in the stack of patches is done with the commands pop and push [2]:

[mleeman@seraph linux-2.6.17]$ quilt pop

Removing patch linux-2.6.17.diff

Restoring include/asm-ppc/mpc83xx.h

Now at patch config-2.6.17.diff

[mleeman@seraph linux-2.6.17]$ quilt push

Applying patch linux-2.6.17-barco834xg1.diff

patching file config.barco834xg1.nfs

patching file config.barco834xg1.flashfs

patching file drivers/mtd/maps/barco834xg1.c

patching file arch/powerpc/platforms/83xx/Kconfig

patching file arch/ppc/Kconfig

patching file arch/powerpc/platforms/83xx/Makefile

patching file include/asm-ppc/mpc83xx.h

patching file arch/ppc/platforms/83xx/mpc834x_sys.c

patching file config.barco8245g1.flashfs

patching file config.barco8245g1.nfs

patching file config.barco8245g2.flashfs

patching file config.barco8245g2.nfs

patching file arch/ppc/platforms/83xx/Makefile

patching file drivers/mtd/maps/Kconfig

patching file drivers/mtd/maps/Makefile

patching file drivers/net/phy/phy.c

Now at patch linux-2.6.17-barco834xg1.diff

Using these techniques, it is equally easy to insert a patch in the stack (instead of adding it on top of the stack): just pop the stack to the location where the patch needs to be inserted, add the new patch and then push the remainder back on the stack.

When a file is changed that is already contained in the current patch, just use refresh. When a file is adjusted that you know is somewhere in the patch stack, use refresh and pop repeatedly until you’ve refreshed the first patch on the stack. The changes will be kept in the patch that has the last change (last occurrence in the series file).

[1] It is important that a file is added before a change is made to it, since the unmodified file is needed as a reference.

[2] quilt push -a applies all the remaining patches in the stack, quilt pop -a removes all the patches in the stack.

To determine which patches are already applied in a source tree, the applied command is used.

[mleeman@seraph linux-2.6.17]$ quilt applied

linux-2.6.16-barco.diff

squashfs3.0-patch

squashfs2.2-2.6.13-config.diff

linux-2.6.16-barco8245g2.diff

linux-2.6.17.diff

Similarly, unapplied lists the unapplied patches on the source tree:

[mleeman@seraph linux-2.6.17]$ quilt unapplied

serial-2.6.17.diff

linux-2.6.17-barco834xg1.diff

For further examples, Chapter 7 already includes an extensive example of upgrading a project that contains a working quilt configuration.

A graph of the dependencies between the patches can be obtained using a number of simple commands (see Figure C.1):

quilt graph | dot -Tpng | display

[mleeman@neo linux-2.6.18]$ cat ~/.quiltrc

QUILT_REFRESH_ARGS="--sort --no-timestamps"

The ~/.quiltrc setting shown above passes --sort and --no-timestamps to every quilt refresh. Leaving the timestamps out of the patch headers ensures that only real changes show up when a patch is refreshed; otherwise patches change quite a lot on every refresh, simply because the timestamps are updated.

C.3.3 References

Quilt is a set of scripts to effectively manage a series of patches, e.g. to apply, remove, reorder or modify patch series. See “Introduction to Quilt” in the quilt distribution (http://savannah.nongnu.org/projects/quilt) for more details.

• Andrew Morton: Patch Management Scripts, http://lwn.net/Articles/13518/ and http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.9.

• Andreas Grünbacher et al.: Patchwork Quilt, http://savannah.nongnu.org/projects/quilt.

• IEEE Std. 1003.1-2001: Standard for Information Technology, Portable Operating System Interface (POSIX), Shell and Utilities, diff command, pp. 317. Online version available from the Austin Common Standards Revision Group, http://www.opengroup.org/austin/.

• GNU diff info pages (info Diff), section Output Formats.

• Edward C. Bailey: Maximum RPM: Taking the Red Hat Package Manager to the Limit, http://www.rpm.org/max-rpm/.


Figure C.1: Drawing a graphical dependency between patches with quilt.

Appendix D

Network Configuration

D.1 TCP/IP

D.1.1 Static

With static configuration, the IP settings are entered (or hard coded) and are not subject to change or negotiation. Though this might seem the easiest way to configure the network, it is also the least flexible one.

D.1.2 DHCP

In the context of computer networking, the Dynamic Host Configuration Protocol (DHCP) is a client-server networking protocol. A DHCP server provides configuration parameters specific to the requesting DHCP client host, generally the information the client host requires to participate in an IP network. DHCP also provides a mechanism for allocating IP addresses to client hosts.

DHCP emerged as a standard protocol in October 1993. RFC 2131 provides the latest (March 1997) DHCP definition. DHCP functionally became a successor to the older BOOTP protocol. Due to the backward-compatibility of DHCP, very few networks continue to use pure BOOTP.

D.1.2.1 IP address allocation

Depending on implementation, the DHCP server has three methods of allocating IP-addresses:

manual allocation where the DHCP server performs the allocation based on a table with MAC address - IP address pairs, manually filled in by the server administrator. Only requesting clients with a MAC address listed in this table get the IP address according to the table.

automatic allocation where the DHCP server permanently assigns to a requesting client a free IP address from a range given by the administrator.

dynamic allocation the only method which provides dynamic re-use of IP addresses. A network administrator assigns a range of IP addresses to DHCP, and each client computer on the LAN has its TCP/IP software configured to request an IP address from the DHCP server when that client computer’s network interface card starts up. The request-and-grant process uses a lease concept with a controllable time period. This eases the network installation procedure on the client computer side considerably.

The allocation method the server uses remains transparent to clients.


D.1.2.2 Protocol Anatomy

DHCP uses the same two IANA assigned ports as BOOTP: 67/udp for the server side, and 68/udp for the client side.

DHCP Discover : The client broadcasts on the local physical subnet to find available servers. Network administrators can configure a local router to forward DHCP packets to a DHCP server on a different subnet. The client creates a UDP packet with the broadcast destination 255.255.255.255 and may also request its last-known IP address (192.168.1.100 in this example), although the server may ignore this optional parameter.

DHCP Offer : The server determines the configuration based on the client’s hardware address as specified in the CHADDR field. Here the server, 192.168.1.1, specifies the offered IP address in the YIADDR field.

DHCP Request : The client selects a configuration out of the DHCP "Offer" packets it has received and broadcasts its choice on the local subnet. Again, this client requests the 192.168.1.100 address that the server specified. If the client has received multiple offers, it specifies the server whose offer it is accepting.

DHCP Acknowledge : The server acknowledges the request and sends the acknowledgement to the client. The system as a whole expects the client to configure its network interface with the supplied options.

DHCP Inform : The client sends a request to the DHCP server, either to request more information than the server sent with the original DHCPACK, or to repeat data for a particular application. For example, browsers use DHCP Inform to obtain web proxy settings via WPAD. Such queries do not cause the DHCP server to refresh the IP expiry time in its database.

DHCP Release : The client sends a request to the DHCP server to release the DHCP lease, and the client unconfigures its IP address. As clients usually do not know when users may unplug them from the network, the protocol does not define the sending of DHCP Release as mandatory.
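When looking at a packet capture, the messages above are told apart by the DHCP message type option (option 53 in RFC 2132). The following sketch maps the numeric codes to their names and to the side that sends them; dhcp_message is a made-up helper for this illustration. Codes 4 (Decline) and 6 (NAK) are included for completeness although they are not discussed above.

```shell
# Map a DHCP message type code (option 53) to its name and sender.
dhcp_message() {
    case "$1" in
        1) echo "DHCPDISCOVER client" ;;
        2) echo "DHCPOFFER server" ;;
        3) echo "DHCPREQUEST client" ;;
        4) echo "DHCPDECLINE client" ;;
        5) echo "DHCPACK server" ;;
        6) echo "DHCPNAK server" ;;
        7) echo "DHCPRELEASE client" ;;
        8) echo "DHCPINFORM client" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# A first-time lease negotiation: Discover, Offer, Request, Acknowledge.
for code in 1 2 3 5; do
    dhcp_message "$code"
done
```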

D.1.2.3 DHCP and firewalls

Firewalls usually have to permit DHCP traffic explicitly. The specification of the DHCP client-server protocol describes several cases in which packets must have the source address 0x00000000 or the destination address 0xffffffff. Anti-spoofing policy rules and tight inclusive firewalls often stop such packets. Multi-homed DHCP servers require special consideration and further complicate configuration.

To allow DHCP, network administrators need to allow several types of packets through the server-side firewall. All DHCP packets travel as UDP datagrams; all client-sent packets have source port 68 and destination port 67; all server-sent packets have source port 67 and destination port 68. For example, a server-side firewall should allow the following types of packets:

• Incoming packets from 0.0.0.0 or dhcp-pool to dhcp-IP

• Incoming packets from any address to 255.255.255.255

• Outgoing packets from dhcp-IP to dhcp-pool or 255.255.255.255

where dhcp-IP represents any address configured on the DHCP server host and dhcp-pool stands for the pool from which the DHCP server assigns addresses to clients.
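As a rough illustration of the packet types listed above, the sketch below prints (but does not apply) two iptables rules for a DHCP server. dhcp_server_rules is a made-up helper and the interface name eth1 is only an example. The rules match on the UDP ports alone, which also covers the 0.0.0.0 source and the 255.255.255.255 destination; restricting them further to dhcp-IP and dhcp-pool is left out here.

```shell
# Print iptables rules that let a DHCP server on interface $1
# receive client requests and send its replies.
# The rules are only printed; applying them requires root.
dhcp_server_rules() {
    iface=$1
    cat <<EOF
iptables -A INPUT  -i $iface -p udp --sport 68 --dport 67 -j ACCEPT
iptables -A OUTPUT -o $iface -p udp --sport 67 --dport 68 -j ACCEPT
EOF
}

dhcp_server_rules eth1
```

Printing the rules first makes it possible to review them before piping the output to a root shell.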


D.1.3 ZCIP

Zeroconf or Zero Configuration Networking is a set of techniques that automatically create a usable IP network without configuration or special servers. This allows inexperienced users to connect computers, networked printers, and other items together and expect them to work. Without Zeroconf or something similar, a knowledgeable user must either set up special servers, like DHCP and DNS, or set up each computer by hand.

D.1.4 DNS-SD & uPnP

DNS Service Discovery (DNS-SD) is Apple’s lightweight protocol, used in Apple products, many network printers and a considerable number of third party products and applications on various operating systems. It is considered simpler and easier to implement than SSDP (below) because it uses DNS rather than HTTP. It uses DNS SRV (RFC 2782), TXT, and PTR records to advertise Service Instance Names, which are details of available services like instance, service type, domain name and optional configuration parameters. Service types are given informally on a first-come basis. A service type registry is maintained and published by DNS-SD.org.

Simple Service Discovery Protocol (SSDP) is UPnP’s protocol, used in Windows XP and several brands of network equipment. Despite its name, it is considered complex and requires more effort to implement than DNS-SD. SSDP uses HTTP notification announcements that give a service-type URI and a Unique Service Name (USN). Service types are regulated by the Universal Plug and Play Steering Committee.

D.2 Writing a simple web interface

D.2.1 What is CGI?

Common Gateway Interface (CGI) is not a language; it is a simple protocol used to communicate between web forms and programs. A CGI script can be written in any language that can read STDIN, write to STDOUT, and read environment variables, i.e. virtually any programming language, including C, Perl, or even shell scripting.

D.2.2 Structure of a CGI Script

Here’s the typical sequence of steps for a CGI script:

1. Read the user’s form input.

2. Do what you want with the data.

3. Write the HTML response to STDOUT.

The first and last steps are described below.

D.2.3 Reading the User’s Form Input

When the user submits the form, your script receives the form data as a set of name-value pairs. The names are what you defined in the INPUT (or SELECT or TEXTAREA) tags, and the values are whatever the user typed in or selected. (Users can also submit files with forms, but this primer doesn’t cover that.)

The set of name-value pairs arrives as one long string, in one of these two formats:

name1=value1&name2=value2&name3=value3

name1=value1;name2=value2;name3=value3


So just split on the ampersands or semicolons, then on the equal signs. Then, do two more things to each name and value:

1. Convert all "+" characters to spaces, and

2. Convert all "%xx" sequences to the single character whose ASCII value is "xx", in hex. For example, convert "%3d" to "=".

This is needed because the original long string is URL-encoded, to allow for equal signs, ampersands, and so forth in the user’s input. So where do you get the long string? That depends on the HTTP method the form was submitted with:

• For GET submissions, it’s in the environment variable QUERY_STRING.

• For POST submissions, read it from STDIN. The exact number of bytes to read is in the environment variable CONTENT_LENGTH.
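The splitting and decoding steps above can be sketched in plain POSIX shell, which is often all that is available on a small target. The function names urldecode and parse_query are made up for this illustration; the code assumes well-formed input and prints the decoded pairs one per line instead of storing them.

```shell
# Decode one URL-encoded token: '+' becomes a space, %xx becomes the byte
# with hexadecimal value xx. Anything else is copied through unchanged.
urldecode() {
    s=$(printf '%s' "$1" | tr '+' ' ')
    while [ -n "$s" ]; do
        case $s in
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                rest=${s#???}
                hex=${s%"$rest"}
                hex=${hex#%}
                # Convert hex to octal, the numeric escape POSIX printf offers.
                printf "\\$(printf '%03o' "0x$hex")"
                s=$rest ;;
            *)
                rest=${s#?}
                printf '%s' "${s%"$rest"}"
                s=$rest ;;
        esac
    done
}

# Split the long string on '&' or ';', then on the first '=',
# and print each decoded pair as name=value.
parse_query() {
    old_ifs=$IFS
    IFS='&;'
    set -f                      # no filename expansion while splitting
    set -- $1
    set +f
    IFS=$old_ifs
    for pair in "$@"; do
        [ -n "$pair" ] || continue
        case $pair in
            *=*) name=${pair%%=*}; value=${pair#*=} ;;
            *)   name=$pair; value= ;;
        esac
        printf '%s=%s\n' "$(urldecode "$name")" "$(urldecode "$value")"
    done
}

parse_query 'name=Marc+Leeman&expr=1%2b1%3d2'
```

In a real CGI script the argument would be "$QUERY_STRING" for GET. For POST, the string first has to be read from STDIN, for example with head -c "$CONTENT_LENGTH" where that option is available.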

D.2.4 Sending the Response Back to the User

First, write the line

Content-type: text/html

plus a blank line, to STDOUT. After that, write your HTML response page to STDOUT, and it will be sent to the user when your script is done. That’s all there is to it.

Yes, you’re generating HTML code on the fly. It’s not hard; it’s actually pretty straightforward. HTML was designed to be simple enough to generate this way.

#include <stdio.h>

int main(void) {
    /* Print the CGI response header, required for all HTML output. */
    /* Note the extra \n, to send the blank line. */
    printf("Content-type: text/html\n\n");

    /* Print the HTML response page to STDOUT. */
    printf("<html>\n");
    printf("<head><title>CGI Output</title></head>\n");
    printf("<body>\n");
    printf("<h1>Hello, world.</h1>\n");
    printf("</body>\n");
    printf("</html>\n");

    return 0;
}
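Since CGI only requires writing to STDOUT, the same response can also be produced by a shell script, which avoids cross compiling for quick experiments. The sketch below wraps the output in a function called cgi_hello, a name chosen for this illustration; in an actual CGI script the body of the function would simply be the script itself.

```shell
# Emit a complete CGI response: header line, blank line, HTML page.
cgi_hello() {
    # The header must be followed by an empty line.
    printf 'Content-type: text/html\n\n'
    cat <<'EOF'
<html>
<head><title>CGI Output</title></head>
<body>
<h1>Hello, world.</h1>
</body>
</html>
EOF
}

cgi_hello
```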

D.2.5 Haserl

An alternative to doing all of this by hand is to use a framework which takes care of these things. Most people have probably heard of server side scripting languages like PHP, which make it easy to create dynamic web pages. PHP is unfortunately quite large (several MBs) and overkill for a simple configuration web interface.


A nice alternative for embedded systems is Haserl. Haserl is conceptually very similar to PHP, but it uses shell as the scripting language instead of the PHP language. The advantage of this is that it is very small (14k) and that you don’t need to learn a new language.

Haserl automatically parses GET/POST form parameters and creates FORM_<parameter> environment variables with the values. Just like PHP, it allows scripting lines to be interleaved with HTML. The easiest way of understanding it is probably to see an example:

#!/usr/bin/haserl

content-type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

<html>

<head>

<title>Sample Haserl Form</title>

</head>

<body>

<h1>Sample form running on <? hostname ?></h1>

<form action="<? echo -n $SCRIPT_NAME ?>" method=POST>

<textarea name=textarea1><? echo -n $FORM_textarea1 | tr a-z A-Z ?>

</textarea>

<input type=submit value=GO>

</form>

<?if [ "$FORM_textarea1" != "$(echo -n $FORM_textarea1 | tr a-z A-Z)" ] ?>

<b>Please note that I had to uppercase some of your input</b>

<?el?>

<?if [ -n "$FORM_textarea1" ] ?>

<b>Input above was already uppercased</b>

<?fi?>

<?fi?>

</body>

</html>

D.3 References

• CGI specification: http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

• PHP: Hypertext Preprocessor http://www.php.net

• Haserl: http://haserl.sf.net
