OPERATING SYSTEMS AND COMPUTER NETWORKS
LECTURE NOTES

UNIVERSITY OF DUISBURG-ESSEN
FACULTY OF ENGINEERING
INSTITUTE OF COMPUTER ENGINEERING
PROF. DR.-ING. A. HUNGER



Table of Contents

1 INTRODUCTION TO OPERATING SYSTEMS
1.1 TASKS OF OPERATING SYSTEMS
1.2 TYPES OF OPERATING SYSTEMS

2 ADDRESSING

3 FILE MANAGEMENT
3.1 FILE SYSTEMS ON DISK (PHYSICAL STORAGE)
3.1.1 FILE MANAGEMENT ON DISKS AND FLOPPY DISKS
3.2 STRUCTURE OF A HARD DISK
3.3 STRUCTURE OF PORTABLE DATA DISCS
3.4 BLOCK SIZE AND MANAGEMENT OF FREE SPACES
3.4.1 OPTIMAL BLOCK SIZE
3.4.2 FREE SPACE MANAGEMENT
3.4.2.1 FAT Sizes: FAT12, FAT16 and FAT32
3.5 DIRECTORIES
3.5.1 FILE ORGANIZATION IN UNIX

4 MEMORY MANAGEMENT
4.1 MEMORY HIERARCHY
4.2 CACHE MEMORY
4.2.1 CACHE STRUCTURE
4.2.2 CACHE PERFORMANCE
4.2.3 AVERAGE ACCESS TIME
4.2.4 CACHE ORGANIZATION
4.3 MAIN MEMORY MANAGEMENT
4.3.1 MEMORY MANAGEMENT WITHOUT SWAPPING AND PAGING
4.3.1.1 Relocation
4.3.1.2 Protection
4.3.1.3 Partitions
4.3.1.3.1 Creating Partitions
4.3.1.3.2 Allocation Strategies
4.3.2 MEMORY MANAGEMENT WITH SWAPPING AND PAGING
4.3.3 REPLACEMENT STRATEGIES FOR PAGES
4.4 VIRTUAL MEMORY

5 PROCESS MANAGEMENT
5.1 PROCESS STATES AND PRINCIPLES OF HANDLING
5.1.1 PROCESS MANAGEMENT TOOLS
5.1.2 STRUCTURE OF MULTI-TASKING OPERATING SYSTEMS
5.1.2.1 The time sharing concept (Solution for problem 1: to change between processes)
5.1.2.2 Scheduling algorithms (Solution for problem 2: increase efficiency of use of CPU)
5.2 PROCESS CHANGE
5.3 SCHEDULING
5.3.1 SCHEDULER
5.3.2 SCHEDULING ALGORITHMS
5.3.2.1 Requirements to a Scheduling Algorithm
5.3.2.2 Classification of scheduling algorithms
5.3.3 ANALYSIS OF SCHEDULING ALGORITHMS
5.3.3.1 Gantt Diagram
5.3.3.2 Timing Diagram
5.3.3.3 Example of Planning Algorithms


Part I - OPERATING SYSTEMS

1 INTRODUCTION TO OPERATING SYSTEMS

A modern computer consists of one or more processors, main memory, data-storage devices and I/O peripherals. Compared to the computers of the last century, these devices are far more complex and challenging to understand. It is a necessity that users can operate a computer without knowledge of the mechanisms within. To accomplish this, a layer of software called the operating system exists in almost every computer or embedded system; its job is to provide user programs with a simpler interface and to manage all the resources of the underlying system. Basically, an operating system is a program that serves as the mediator between computer users and the computer hardware (ref. Fig. 1-1).

Fig. 1-1 Components of Computer System

According to Fig. 1-1, a computer system is composed of four components:

Hardware

Operating systems

Service programs

Application programs

At its most basic level, every computer consists of a combination of billions of transistors. All those transistors serve a specific purpose, and if switched correctly a signal is generated that will visualize pictures and working environments on a monitor. Since the manual control of every transistor in a computer is nigh impossible, software is implemented that automates most of the basic hardware functions.

One of the lowest levels of software is the instruction set of a CPU. These are predetermined command words that execute certain functions within the CPU. An instruction set can be visualized as a predefined set of binary numbers that trigger the desired reaction within the transistors of a CPU. Another low-level piece of system software is the BIOS, a by now outdated firmware that guaranteed basic functions for the connected hardware and was used to boot up the PC and start the higher-level operating system. The BIOS is now being replaced by UEFI (Unified Extensible Firmware Interface).

For the user of a computing device, the OS is a program that organizes all resources connected to the device and makes them accessible to the user. Since humans interact with such a device through sight and mechanical movement, a GUI (Graphical User Interface) and peripheral input devices like a keyboard or touchpad need to be implemented to interact with the computer and enter the desired operations. Although there are alternatives like voice control and audio output, the most commonly used method of interacting with the system remains mechanical inputs translated into digital signals.

Some operating systems do not require a GUI. In most cases those OS control electrical and/or mechanical operations in a more complex system, either autonomously or controlled by a remote device. Such systems are usually embedded systems. The operating system is stored (flashed) on the device and cannot be changed without reinstalling the entire OS.

The obvious advantage of an operating system for software developers is that they do not need to create their programs for a specific type of hardware system. It is only necessary to provide compatibility with the OS that the device is running. The OS takes care of the necessary hardware management, translates the commands of the application into machine code and executes them. The results of those executions are then translated back into higher functions or specific signals for connected devices. This structure can be visualized in the form of layers, where the GUI is on the top level and the machine language is on the lowest level.

Tanenbaum's layered model illustrates the distinctive layers of a computer system and its software hierarchy, i.e. the underlying structure necessary to execute programs on the hardware (ref. Fig. 1-2, 1-3).

Fig. 1-2 Computer as a Multilevel Machine


Fig. 1-3 Tanenbaum introduced a layered Principle

Short description of the particular levels in Tanenbaum's principle:

-Level 0: Combination of gates, arithmetic circuits, memory flip-flops and latches, microprocessors, chips and similar basic elements. The basic logic at this level is described with Boolean algebra.

-Level 1: True machine language level, a purely numeric language. Programs at this level consist of simple arithmetic operations and logical combinations. The execution is usually done by an Arithmetic Logic Unit (ALU).

-Level 2: The Instruction Set Architecture (ISA), also known as the machine language, is the target of compilers for high-level languages. Commands are interpreted by microprograms or executed directly by hardware.

-Level 3: The Operating System Machine 'sits' on top of the ISA and provides additional sets of instructions. New mechanisms of memory organization can be realized, and the possibility of parallel program execution is usually implemented (in modern OS). This level is the lowest level within the hierarchy of an operating system.

-Level 4: The Assembler Level translates assembly language into machine language, e.g. symbolic instruction names into numerical instruction codes, register names into register numbers and symbolic variable names into numerical memory locations.

-Level 5: The Problem-Oriented Language Level usually consists of application-oriented programming languages (C, C++, Java, LISP, to name a few). The compiler translates commands into the Level 3 and 4 languages and serves as an interpreter for specialized application areas. Software at this level deals with, e.g., arrays, complex arithmetic operations and Boolean expressions.
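The Level 4 translation of symbolic names into numbers can be illustrated with a toy assembler. The mnemonics, opcodes, register numbers and variable addresses below are invented for illustration and do not belong to any real ISA:

```python
# Toy assembler sketch: translates symbolic instructions into numeric codes.
# All opcodes, registers and addresses are invented for illustration.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}
REGISTERS = {"R0": 0, "R1": 1, "R2": 2}
VARIABLES = {"x": 0x10, "y": 0x11}  # symbolic names -> memory addresses

def assemble(line):
    """Translate e.g. 'ADD R1 x' into a list of numeric codes."""
    mnemonic, *operands = line.split()
    code = [OPCODES[mnemonic]]
    for op in operands:
        if op in REGISTERS:
            code.append(REGISTERS[op])
        else:
            code.append(VARIABLES[op])
    return code

print(assemble("LOAD R1 x"))  # [1, 1, 16]
print(assemble("ADD R1 y"))   # [2, 1, 17]
```

A real assembler additionally resolves labels and encodes the numbers into fixed-width instruction words, but the core idea is the same symbolic-to-numeric mapping.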

Summarizing all the requirements for an OS, it can be said that the main purposes of an operating system are:

To provide an environment for a computer user to execute programs and/or tasks on a given set of hardware in a convenient and efficient manner.

To allocate the separate resources of the computer as needed to solve a given problem and/or function. The allocation process should be as fair and efficient as possible.

As a control program it serves two major functions: (1) supervision of user programs to prevent errors and improper use of the computer, and (2) management of the operation and control of I/O devices.

Figure 1-4 shows an abstract model of a computer system. The user usually operates the system with the Graphical User Interface (GUI), but direct control with the help of a command console (e.g. the Windows cmd interface) is also possible in most cases. In both cases the gray-colored fields describe the components of an operating system.

Every computer has at least one bus system. The system bus is the communication route between the CPU and peripheral components. The user sends input signals through the external connector to the CPU, where the instructions are interpreted and executed.

During the development of computer systems, additional peripheral devices were added in order to optimize the PC. Back in the 1970s, computers were equipped with a basic on-board OS, but every piece of software needed to be started from one (or more) removable disks. Changes in a set of data needed to be stored on those disks because computers were usually not equipped with permanent data storage. When HDDs became more common, the requirements for operating systems increased. The permanent storage system needed to be mapped and organized for maximum efficiency, since every byte was expensive. In the chapter 'Memory Management', several methods of organizing and storing data on HDDs but also in volatile memory will be shown.

All the different hardware components are managed by drivers. A driver is a set of software libraries that contains all the necessary routines to manipulate a piece of hardware. Drivers are usually installed within the core of an OS and added to the startup routine to guarantee their function on every boot sequence. Drivers can also be seen as interfaces. In software development, interfaces are one of the key designs in every program, because they allow other users direct access to the underlying functions without a complete understanding of the code behind them. In the second part of this script the function of a network driver will be explained in detail, showing how this piece of software translates given commands into signals that can be transmitted and successfully received by other computers.
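The idea of a driver as an interface can be sketched as follows; the class and method names are hypothetical and not taken from any real operating system:

```python
# Sketch: a driver exposes a uniform interface so applications need not
# know the hardware behind it. All names here are hypothetical.
from abc import ABC, abstractmethod

class BlockDevice(ABC):
    """Uniform interface the OS could expect from every storage driver."""
    @abstractmethod
    def read_block(self, n: int) -> bytes: ...
    @abstractmethod
    def write_block(self, n: int, data: bytes) -> None: ...

class RamDisk(BlockDevice):
    """A trivial 'driver' that stores blocks in a dictionary."""
    def __init__(self):
        self.blocks = {}
    def read_block(self, n):
        # Unwritten blocks read back as zeros, like a blank medium.
        return self.blocks.get(n, b"\x00" * 512)
    def write_block(self, n, data):
        self.blocks[n] = data

# Application code works with any BlockDevice, regardless of the hardware.
dev = RamDisk()
dev.write_block(0, b"hello")
print(dev.read_block(0))  # b'hello'
```

Swapping `RamDisk` for a class that talks to a real disk would leave the application code unchanged, which is exactly the decoupling the paragraph describes.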


1.1 TASKS OF OPERATING SYSTEMS

Fig. 1-4 Abstract Model of a Computer System


Deriving from the complex systems modern computers have developed into, operating systems are responsible for four main tasks:

1. Process management: When a computer is switched on, the operating system loads with the respective presets, in many cases starting the hardware and a GUI, ready to accept user input. By this point, a whole lot of processes have already run: the ROM-resident boot program prompted the loading of the OS from a storage system. A program call from the user or the OS itself causes the operating system to load the called program from disk and run it. Possibly, during its execution, the program calls on other software components, like drivers, which are to be loaded and executed by the operating system.

If parallel or concurrent program execution is supported, the operating system has to assign the programs to be executed to the CPU according to a specific strategy. Parallel execution means that several processors are working at the same time on different tasks. Concurrent means the ability to execute several tasks on one CPU, using specific scheduling algorithms for optimal throughput (see chapter 5).

2. Memory management: Every piece of software needs temporary or permanent memory during execution. The operating system must organize the available memory in order to reduce access times and prevent memory overflow. A computer has several layers of memory, starting at the CPU cache, which has the lowest access time, up to the HDD, which has the highest. With the "virtual memory concept", where not all parts of a program are located in memory, the operating system must organize the reloading of the files from a hard disk in order to keep waiting times within the CPU at a low level. Additionally, access by other processes to sensitive data and programs must be prevented.

3. File/data management: File/data management describes the organization of the mass storage systems, using a system of files and directories (keyword: file tree). On the top layer, files are organized in folders and by names. On the HDD, those files are mapped to binary addresses, with pointers and file maps showing the complete set of bytes that belong to a specific file. The task of the operating system is to map the logical name to a physical address and keep data fragmentation to a minimum.
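The mapping from logical file names to physical block addresses can be sketched with a simple file map; the file names and block numbers below are invented for illustration:

```python
# Sketch: a directory maps logical file names to the (possibly
# non-contiguous) list of disk blocks holding the file's bytes.
# File names and block numbers are invented for illustration.
directory = {
    "notes.txt":  [4, 5, 9],     # fragmented: spread over two areas
    "report.pdf": [12, 13, 14],  # contiguous
}

def physical_blocks(name):
    """Resolve a logical name to its physical block addresses."""
    return directory[name]

def is_fragmented(name):
    """A file is fragmented if its blocks are not consecutive."""
    blocks = directory[name]
    return any(b + 1 != c for b, c in zip(blocks, blocks[1:]))

print(physical_blocks("notes.txt"))  # [4, 5, 9]
print(is_fragmented("notes.txt"))    # True
print(is_fragmented("report.pdf"))   # False
```

Real file systems store this map on disk itself (e.g. as a file allocation table or inode block lists, both treated in chapter 3), but the logical-name-to-physical-address lookup works as above.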

4. Resource management: Every OS needs to track and map all connected internal and external devices. When a device is added, its logical and physical addresses are added to the bus system. All commands targeting a device are then sent to the physical address of said device and can be further executed by its embedded software. If several programs are in progress and ask for devices, the OS needs to make schedules for the bus manager to prevent collisions.


1.2 TYPES OF OPERATING SYSTEMS

To distinguish the different types of operating systems it is necessary to understand that even within one computer there is not one specific operating system working, but several. The mainboard has an integrated microcontroller with pre-installed software that manages the main bus between CPU and main memory, the SATA connectors, PCI Express bus systems, etc. The same principle goes for every other added component, though the predefined functions differ according to the functionality of the device.

Different types of operating systems perform different tasks. The following list offers an overview of different kinds of systems and their specified functions:

Operating systems that rely on console commands: Those systems rely heavily on the user to perform the desired tasks. Many command lines need to be entered manually or be pre-scripted by the user, using only a text field as a GUI. MS-DOS serves as an example, but the modern Linux can also be used in console mode. A much older version is an electrical computer system using punch cards to block or access electrical lines, which then executed the desired operations. Progress could be monitored by indicator lights on the casing of the computer.

o Example of console operations by the user doing the following steps:

Load program (entry via: switches, paper tape, voice-command)

switch input of the starting address and the start command

Tracking the program course via indicator lights or outputs on the monitor after each step.

Time sharing systems: Time sharing systems mostly consist of several terminals connected to one (usually powerful) computer system. Each user demands processing power and the OS distributes its resources accordingly. The OS of a time sharing system therefore has two additional tasks:

o Selection of jobs from the pool for the transfer into memory

o Select which job gets how much CPU-Time (dependent on method).

Key words: job-scheduling, CPU-scheduling


Distributed systems: Those systems are a setup of loosely coupled systems; each processor is autonomous with its own memory. They communicate with each other over different data links (buses, telephone lines, networks).

The execution has three forms: (1) All processes share one processor, (2) each process has its own processor and all of them share the memory, (3) each process has its own processor and the processors are distributed (computer networks, LAN, WAN).

Real-time systems: A distinction is made between "hard" and "soft" real-time conditions. Hard real-time conditions demand an action at a given time; a soft real-time system has a better tolerance for missing a deadline. Example: a control unit for a power electronics system demands a control signal every 200 µs. If the signal does not arrive in time, the control sequence is faulty, which may lead to permanent damage of the system. The control unit must fulfil the criterion: reaction time + operation time < maximum delay (here: 200 µs), which makes it a hard real-time system. Some properties of non-real-time systems are usually missing, e.g. the virtual memory concept, since the unpredictable delay that may occur while reloading data from an HDD is a problem for programmers.
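The hard real-time criterion from the example (reaction time plus operation time must stay below the maximum delay) can be checked directly; the timing values below other than the 200 µs limit are invented:

```python
# Hard real-time criterion from the power-electronics example:
# reaction_time + operation_time < maximum_delay (here 200 µs).
def meets_hard_deadline(reaction_us, operation_us, max_delay_us=200):
    """True only if the control loop finishes strictly within the limit."""
    return reaction_us + operation_us < max_delay_us

print(meets_hard_deadline(50, 120))  # True: 170 µs < 200 µs
print(meets_hard_deadline(90, 120))  # False: 210 µs misses the deadline
```

Note the strict inequality: a loop that needs exactly 200 µs already violates the hard deadline, which is why such systems are dimensioned with margin.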

Embedded systems: An embedded system refers to a microprocessor or microcontroller for process control that is embedded in a technical environment. For example, the engine management system in an automobile is an ES. The micro-electronic core, possibly configured as a standard component with a special program for its task, handles sensors and actuators. The processor must, under some circumstances, accept information simultaneously from various sensors, do calculations and output information to the actuators. Here, concepts of programming concurrent processes are to be considered.

Multi-user systems: In contrast to a single-user system, a multi-user system allows processing time and other system resources to be shared between multiple users. Access can be established via a network or a shared workstation. A multi-user system can also be a time sharing system, but only if every user has their own access to the system itself. If more than one user shares the same terminal/access to the system, it is a time and workspace sharing system, or: multi-user system.


No matter which kind of system is used, all OS also have a set of subsystems / operating methods that need to be clarified. A time sharing system can work its jobs as a batch. Therefore, a list of example structures for job handling is presented.

Multitasking / time sharing systems: In such a system the CPU(s) work(s) on several jobs at the same time, providing each job a specific amount of CPU time. Though a CPU can only execute the instructions of one given job at a specific time, many programs have data accesses and idle time during execution. During those idle times the system puts the current job on hold and loads another one to be executed. While another job is processed, the user is able to interact with the program, allowing for interrupts during the execution. This gives the user a shorter response time for changes to commit. Time sharing systems are able to multitask: they not only schedule the CPU time of every instruction of every program, but also the different user requests for processing time. Those systems require the handling of two additional tasks:

o Selection of jobs from the pool for the transfer into memory

o Select which job gets how much CPU-Time (dependent on method).

A practical example of a multitasking system is the 'Hyper-Threading' feature of an Intel CPU. The CPU itself handles different threads and automatically switches between jobs when a memory access occurs. Since this feature is implemented within the CPU itself, the job-exchange times are very low and the throughput of programs is higher compared to a system without multithreading technology.

Key words: job-scheduling, CPU-scheduling

Fig. 1-6 Multitasking with ‘Round Robin’- method
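The round-robin method shown in Fig. 1-6 can be sketched as a small simulation; the job names, CPU demands and time slice below are invented for illustration:

```python
from collections import deque

def round_robin(jobs, quantum):
    """Simulate round-robin scheduling.
    jobs: dict of job name -> remaining CPU demand; quantum: time slice.
    Returns the order in which jobs receive the CPU."""
    queue = deque(jobs.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        schedule.append(name)          # job runs for one time slice
        remaining -= quantum
        if remaining > 0:              # not finished: back of the queue
            queue.append((name, remaining))
    return schedule

# Three jobs with invented CPU demands, time slice of 2 units.
print(round_robin({"A": 4, "B": 2, "C": 5}, quantum=2))
# ['A', 'B', 'C', 'A', 'C', 'C']
```

Every job gets the CPU in turn for at most one quantum, so no job can monopolize the processor; finished jobs simply drop out of the rotation.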


Batch systems (batch processing): A batch process describes a sequence of jobs a system computes with a given set of data, in which the user has no option to intervene. In modern computer systems a batch is used in cases of non-interactive data handling. Since the execution cannot be stopped, there is no need for idle time for user inputs. Those systems also have faster error handling: they dump a job that creates a fault and proceed to the next one. Jobs with similar needs are packed together (for example, all jobs that require the FORTRAN compiler). At the end of a job, or in case of disorders, the operator can intervene.

Overlapping CPU and I/O operation: In order to accelerate input/output operations, data from a slower storage medium is copied to faster storage units. For example, a CD can be copied onto an SSD in order to enable faster reading of the data by the CPU. While the data is being copied, the CPU can already access parts of it. Program files are regularly copied into the main memory while the CPU calculates its current task, in case a memory access is needed.

SPOOL (Simultaneous Peripheral Operation On-Line): A further improvement allowed the storing of jobs from an interface into a buffer before processing them. A print job, for example, can be stored on a hard disk if the printer is busy. The CPU does not need to wait for the printer to finish and can go on working on other jobs.

Fig. 1-5 SPOOL-ing System
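The spooling idea can be sketched with a simple print buffer; the function names and job names are invented for illustration:

```python
from collections import deque

# Sketch of SPOOLing: the CPU drops print jobs into a buffer and
# continues working; the printer drains the buffer independently.
spool_buffer = deque()

def submit_print_job(job):
    """CPU side: store the job in the buffer and return immediately."""
    spool_buffer.append(job)

def printer_drain():
    """Printer side: process buffered jobs whenever the device is free."""
    printed = []
    while spool_buffer:
        printed.append(spool_buffer.popleft())
    return printed

submit_print_job("report.pdf")   # CPU continues after each call
submit_print_job("notes.txt")
print(printer_drain())           # ['report.pdf', 'notes.txt']
```

The buffer decouples the fast producer (CPU) from the slow consumer (printer), which is the essence of the SPOOL concept shown in Fig. 1-5.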

Parallel systems / multi-processor systems: In principle, we distinguish between "tightly coupled" and "loosely coupled" systems. In the first case, two or more processors share the same bus, the same clock and the same memory and peripheral devices (e.g. the i-series CPUs from Intel). Loosely coupled indicates, e.g., computers in distributed locations connected with one or more communication systems => distributed systems. The goal in both cases is improved throughput, increased reliability and maybe a more economical system compared to a bigger processing unit.


2 ADDRESSING

Every block of information in a computer system needs to be mapped in order to grant successful access to its contents or properties. Therefore, every device within the system has its own identification number. Bus schedulers define and manage given connected sets of hardware and either label them with a physical or virtual address or just map the access information provided by the device itself. The CPU does not necessarily need to know in which block of RAM the data currently used by a process will be stored, only which blocks within the entire RAM system are available. Once the data is stored, the system needs to be 100% accurate about its whereabouts, because a mismatch in data access can crash the program and/or the entire system. This is why data blocks are organized with binary addresses that point to a specific byte within the memory block for complete data access. Figure 2-1 shows an example of addressing in a RAM module.

Fig. 2-1 Addressing in a RAM module

The address is built by counting through the separate layers of the module. To select the specific memory module, 3 bits are necessary (2^3 = 8, so 3 bits count from 0 to 7). To count through all 131,072 pages of one 512 MiB module, 2^17 numbers need to be distinguished, meaning 17 bits are needed for the page index. Within every page, 4 KiB of data is stored. The segment of an address that points to a specific byte within the page is often called the 'offset'. For the offset, 12 bits are necessary. To obtain the total number of bits necessary to point to one specific byte within the memory, all parts of the address need to be added up:


N_Bits,RAM address = N_Bits,Memory module + N_Bits,Page + N_Bits,Offset
                   = 3 Bits + 17 Bits + 12 Bits = 32 Bits

This shows that a 4 GiB RAM system needs 32 bits for addressing. Some operating systems still in use rely on the 32-bit architecture (which was an upgrade from the 16-bit architecture), which has its limits regarding the addressing space. Those OS cannot work with more than 4 GiB of RAM without workarounds, which is why CPU architectures and top-level OS were upgraded to the 64-bit architecture. This means that instead of 4 bytes, 8 bytes are used for addressing, allowing the mapping of 16 exbibytes (EiB) of main memory. Since such an amount of main memory is currently not available or necessary for computer systems, parts of the address can be used for other means by the OS. Mapping the other storage devices within a computer system works in a similar pattern: registers within the CPU, the cache and even the HDD memory have a defined starting register/byte/cluster and are then counted through until the end of the memory. Specific methods of mapping, file and memory management will be discussed in the related chapters.
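The address-width calculation above can be reproduced directly; the geometry (8 modules, 131,072 pages per module, 4 KiB pages) is taken from the RAM example:

```python
from math import log2

def address_bits(n_modules, pages_per_module, bytes_per_page):
    """Bits needed to address one byte, as in the RAM example."""
    module_bits = int(log2(n_modules))       # 8 modules  -> 3 bits
    page_bits = int(log2(pages_per_module))  # 131,072    -> 17 bits
    offset_bits = int(log2(bytes_per_page))  # 4 KiB      -> 12 bits
    return module_bits + page_bits + offset_bits

bits = address_bits(8, 131_072, 4_096)
print(bits)        # 32
print(2 ** bits)   # 4294967296 addressable bytes, i.e. 4 GiB
```

The same function shows why a 64-bit address can cover 2^64 bytes = 16 EiB: doubling the address width squares the addressable space.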


3 FILE MANAGEMENT

A 'file', as used in computer science, describes a combined set of data, defined by the user or corresponding software, to be stored on a mass storage device, e.g. hard drives, CDs or flash memory storage. According to the type of file system, these data sets are compiled and stored as a combination of binary expressions. Those files must be managed and archived until the content is needed again by the OS or user. The file manager is a component of an operating system that manages the entire space on a (mass) storage device. The tasks that the manager has to fulfill include:

1. Locating the files requested by users.

All operating systems aim to achieve device independence, so that data access does not depend on the type of device on which the files are stored. The file manager must be able not only to find a file, but also to recognize files on any compatible storage system and make them available to the OS or the user (e.g. USB sticks).

2. Space allocation for newly created files.

When data is written to a storage system it is sometimes only necessary to capture the position of the last track (e.g. an unfinished CD). But on a hard drive, files can be deleted and the space once occupied becomes available again. A file manager has to be able to recognize free areas and assign them to new files if needed. This may lead to a set of data being spread over several areas of the storage system.

3. Overview of the files and associated memory. Aside from 'knowing' where files are located and how much space they occupy, a file manager must, when a file is spread over several areas, also know where the components of that specific file are located.

A file system is different from a database management system. Database systems are program systems for structuring large amounts of data; they provide a set of commands for inputting, requesting, and modifying the content.


3.1 FILE SYSTEMS ON DISK (PHYSICAL STORAGE)

3.1.1 FILE MANAGEMENT ON DISKS AND FLOPPY DISKS

A modern computer system unifies many types of data storage. The working memory (RAM) is a low-capacity, fast-access storage system, which usually cannot save data permanently but allows access to specific bits within a very short time. This makes the random access memory a perfect complement for a CPU, but unusable for long-term data storage. To this day, the most common system for saving large amounts of data is the magnetic-disk hard drive (HDD). An HDD consists of at least one magnetic disk and two heads to read and write data. Due to the physical design of disks and floppy disks, the data storage device is organized in:

Concentric tracks, which are divided into sectors. The sector size varies between 32 and 4096 bytes. The sector is the smallest addressable unit; a combination of several sectors is called a cluster.

Front and back side, possibly with several platters. In the case of a stack of platters, two heads are assigned to each platter. The set of tracks at the same position on all platters is called a cylinder, see Fig. 3-1.

Fig. 3-1 Construction of a disk storage layout

To access a given set of data, the head has to move to the right track and the disk needs to spin to the right position, so the rotation speed of a disk is a major variable in determining the latency of data access. If a file is scattered across one or more


disks, the data access takes considerably more time compared to a defragmented file. The process of assigning disk space to a file is referred to as allocation. To achieve optimal utilization of a given storage medium it is necessary to develop an effective strategy, and to determine the working parameters of that strategy it is necessary to have all information about the storage device. For example, the today rarely used magnetic tape permits a purely sequential organization, because it is not possible to 'jump' to a certain position on the medium. Magnetic disks, on the other hand, allow direct and indexed sequential arrangements on one or more platters.

When stored data is modified or deleted, or new content is inserted, it is not desirable to rewrite the entire file on the storage device. To avoid this, a file is divided into blocks separated by gaps. Each block comprises one or more sectors. The blocks may be distributed over the disk and represent the logical layout of a file; the physical counterparts are sectors and clusters. Often there is no distinction between block and cluster. Since new data can (or at least should) only be inserted into free memory space, it is necessary to keep track of unused memory blocks. This is usually done with a free memory list in which information about free clusters/blocks is stored. The block and/or cluster size can usually be defined by the operating system or the user once the storage device is attached to the system. This choice affects the data transfer rate as well as the utilization of disk space. Here, four common allocation strategies are presented:

1. Coherent (contiguous) storage, also called sequential storage: the data are stored successively on the disk and are also found by searching in this order. Inserting, modifying, and deleting data can only be done by rewriting the entire file, if the file is not organized in blocks.

All blocks of a file are located one after another.

Advantages: small directory size, because only the address of the first block/cluster is saved with the file name; the file can be read without interruption. Disadvantages: the complete size of the file must be known from the beginning, and fragmentation of the disk space will occur when files are repeatedly deleted and created. This system was usually found on magnetic tape storage, which is outdated today. Modern mass storage devices are able to jump to a specific position on the medium, which allows more effective allocation strategies.

2. Allocation and access via linked blocks (Sectors)

This allocation method organizes files in several blocks that can be placed anywhere on the disk. A block of the file contains the information where the next part of the file is located. The memory directory contains the information


where a file starts and, optionally, where it ends. The last block of a file contains a NIL to mark the end of the file. Advantages: no external fragmentation of the disk space when files are deleted and expanded repeatedly. Disadvantages: slow access to a specific block, since a file has to be searched sequentially, following the pointers, to find the correct block. The pointer has to be stored in every block of the file, which leads to a loss of space; e.g. if a pointer needs 4 bytes of disk space at a block size of 512 bytes, the effective data volume is reduced by 0.78%. To counter this loss of space it is possible to create clusters sized at multiples of 512 bytes. This results in fewer pointers to the next block/cluster but increases the ever-present internal fragmentation.

Fig. 3-2 Allocation and Access via Linked Blocks
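The pointer-overhead figure from the example above (4-byte pointers in 512-byte blocks) can be verified with a small sketch:

```python
def pointer_overhead(block_size, pointer_size=4):
    # fraction of each block consumed by the next-block pointer
    return pointer_size / block_size

# 4 bytes of every 512-byte block: about 0.78 % of the data volume is lost
assert round(pointer_overhead(512) * 100, 2) == 0.78
# larger clusters reduce the relative overhead, at the cost of
# more internal fragmentation
assert pointer_overhead(4096) < pointer_overhead(512)
```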

3. Allocation and access via a pointer list, Fig. 3-3: This method creates a list of pointers in main memory, which stores the pointers to the memory blocks of the stored files and keeps the information where each file continues (ref. Fig. 3-3). The list is written to the HDD when the system is shut down.

Advantage: Faster access via list in main memory

Disadvantage: possibly a huge table in main memory; an error in the table can cause serious problems.


File A

File A resides in sectors 2,7,5,1023,3

Fig. 3-3 Allocation and Access via list of pointers

4. Indexed allocation (combination of 2 and 3): The indexed allocation method combines the attributes of both previously described methods. The first block of a file contains the pointers for all parts of the specific file. This means that all addresses for the physical location of a block/cluster are stored in this index block (ref. Fig. 3-4).

Fig. 3-4 Indexed Allocation

Pointer table belonging to Fig. 3-3 (the start sector, 2, is stored in the directory entry for File A; each table entry holds the number of the next sector of the file):

Sector   Next sector
0        –
1        –
2        7
3        EOF
4        –
5        1023
6        –
7        5
…        …
1023     3
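Following such a pointer list can be sketched as below, using the sector numbers of Fig. 3-3 (the dictionary `fat` and the function name are illustrative, not part of any real file system API):

```python
EOF = -1                        # sentinel marking the last block of a file
fat = {2: 7, 7: 5, 5: 1023, 1023: 3, 3: EOF}

def read_chain(table, start):
    # collect the sectors of a file by following next-pointers until EOF
    chain, sector = [], start
    while sector != EOF:
        chain.append(sector)    # visit this sector ...
        sector = table[sector]  # ... then follow its pointer
    return chain

assert read_chain(fat, 2) == [2, 7, 5, 1023, 3]
```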


When a file is created, all pointers in the index block are assigned the value NIL (not in list). When a block is written for the first time, the file management system removes a free block from the free memory list and writes its address into the index block. Advantages: no external fragmentation; faster access than linked blocks. Disadvantages: wasted space for the pointers in the index block; the block size limits the size of the index block.

Usually one disk block is selected for each index block. If this is not sufficient, the last entry of an index block can be assigned to point to another index block with further file blocks. If the first blocks of a file point to blocks that contain further pointers to data blocks, the method is called 'multi-level indexing' (see below).

5. Multi-level indexing: The first entries of a file contain pointers to other pointer blocks. This method allows the storage of very large files on different segments of a disk, but becomes slower the bigger the file gets. Apart from this, the advantages and disadvantages are similar to indexed allocation.

Fig. 3-5 Two-stage block access (multi-level indexing): the entries of the file's index block (1st level) point to further index blocks (2nd level), whose entries in turn point to the data blocks.


In UNIX a combined method is used to prevent internal fragmentation. Small files are organized with an index table for direct access: if the size of a file is equal to or smaller than the cluster size, it is linked to a direct addressing block. For bigger files the single, double, triple (and so on) indirect index blocks are used (ref. Fig. 3-6). This method allows the addressing scheme to grow with the file size, in contrast to a fixed-size method.

Fig. 3-6 UNIX I-Node

(The i-node contains 12 direct block pointers, followed by single, double and triple indirect pointers.)
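A rough sketch of the file sizes reachable with this scheme (the parameters, 12 direct pointers, 4-byte block addresses and 4 KiB blocks, are assumptions for illustration, not the values of a specific UNIX version):

```python
def max_file_size(block_size=4096, ptr_size=4, direct=12):
    ptrs = block_size // ptr_size                 # pointers per index block (1024 here)
    # direct blocks + single + double + triple indirect blocks
    blocks = direct + ptrs + ptrs**2 + ptrs**3
    return blocks * block_size

# with 4 KiB blocks, the triple-indirect tree alone covers more than 4 TiB
assert max_file_size() > 4 * 2**40
```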


3.2 STRUCTURE OF A HARD DISK

The effectiveness of a file management system depends to a certain degree on the quality and size of the data storage system. Since HDDs are still the most common file storage for computer systems, it is useful to get at least a superficial understanding of the medium. The following figures depict the components of a disk drive and the structure of a hard disk.

Fig. 3-7 Components of a disk

The platters of a hard disk are constructed either out of metal or out of plastic, coated with magnetizable material. The recording and retrieving of data is done by a conducting coil fixed on a moving arm, which is called the 'head'. During a read/write operation the head stays stationary while the disk rotates underneath it.

Write mechanism: During a write operation an electric current flows through the coil within a head and generates a magnetic field. This field magnetizes the target area of the disk (usually 1 bit); depending on the polarity of the field, a logical '0' or '1' is written. The current used for the magnetic field is pulsed, meaning that it can change its direction at a very high frequency, allowing a recording speed that is usually limited only by the rotational speed of the disks.

Read mechanism: When data is read from the disk, it spins beneath the head and induces a current in the coil depending on the polarization of the field of a particular bit. This current results in a corresponding voltage at the control unit of the head, which interprets it as either '1' or '0'.


Figure 3-8 shows a simplified layout of a magnetic disk. The disk is divided into several tracks which contain a fixed number of sectors and are separated from adjacent tracks by small gaps. This prevents, or at least minimizes, errors due to misalignment of the head or magnetic interference between tracks. The width of a track matches the width of the assigned head to ensure that no undesired magnetization of adjacent tracks occurs.

Fig. 3-8 Structure of a hard disk

The head reads/writes on one track in a bit-serial manner, i.e. bit by bit. To read/write the next track, the arm has to re-position to that track. In practice it is unlikely that data packages will fill a whole track, therefore sectors or clusters are used to store data. One cluster can consist of one or more sectors, depending on the formatting of the HDD. An exemplary sector size is 512 bytes, which is the standard for most HDDs in use. CDs and DVDs use 2048 bytes as sector size, while newer HDDs with the Advanced Format attribute may even have a sector size of 4096 bytes (4 KiB).

To determine the capacity (volume) of a storage medium (BV), the following formula is used:

BV = NOH · NOT · NOS · NBS

NOH: number of heads

NOT: number of tracks

NOS: number of sectors

NBS: number of Bytes per Sector (by default 512 Bytes/ Sector)
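As a sketch, the formula can be applied to the classic 1.44 MB floppy disk geometry (2 heads, 80 tracks, 18 sectors of 512 bytes; these example values are not from the text):

```python
def disk_capacity(heads, tracks, sectors, bytes_per_sector=512):
    # BV = NOH * NOT * NOS * NBS
    return heads * tracks * sectors * bytes_per_sector

# 2 * 80 * 18 * 512 bytes = 1,474,560 bytes (marketed as "1.44 MB")
assert disk_capacity(2, 80, 18) == 1_474_560
```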


3.3 STRUCTURE OF PORTABLE DATA DISCS

The following chapter describes the structure and organization of portable discs. While the floppy disk was the most common portable data storage system from the 80s to the early 90s, it has today been superseded by CDs and DVDs as data media. There are still some areas where floppy disks are used even today, mostly because the system still works and an update would be unnecessary and expensive. In order to understand the evolution of disc-shaped data devices, the 3.5″ floppy disk (1.44 MB) will be explained and compared to optical data storage systems.

Fig. 3-9 Allocation of capacity on a disk

Figure 3-9 shows the schematic of a 3.5″ floppy disk and its similarities to a HDD. But while a HDD can use every sector on a platter, the floppy disk cannot use its innermost tracks due to the different data density. This means that parts of the physical space of the disk cannot be used for data storage. This problem was solved with the technology of optical data discs through a spiral alignment of sectors with a fixed size (ref. Fig. 3-10), resulting in a much higher data density.


Fig. 3-10 Sector alignment on a CD-ROM

Unlike most CDs, floppy disks and hard disks can be written on both sides. The sector identification numbers were assigned by track: the sectors of one side were numbered first, and numbering then continued on the other side. When all tracks on the back side were numbered, the front side followed (ref. Fig. 3-11).

Fig. 3-11 Numbering of the sectors on a two-sided disk


3.4 BLOCK SIZE AND MANAGEMENT OF FREE SPACES

3.4.1 OPTIMAL BLOCK SIZE

To determine the optimal block size for a data storage system it is necessary to understand the characteristics of each option. If the disk is organized in small blocks, the disk space can be utilized better than with large blocks (→ fragmentation). The disadvantage of this method is the increased cost of managing the huge number of blocks (ref. chap. 3.1.1). For example, a HDD with 16 GiB disk space and a block size of 512 bytes has about 33.55 million blocks to manage, and thus addresses of 25 bits length are needed. In addition, small blocks may result in long loading times if a big file is distributed over many different blocks, depending on the data rate of the HDD. Large blocks, on the other hand, may result in poor disk exploitation due to internal fragmentation. The reading time of the HDD is the other variable that needs to be considered in order to determine the optimal block size. Figure 3-12 shows an exemplary platter with the associated rotation and positioning times.

Fig. 3-12 Disk layout, head positioning and rotation time

The following formula shows how to determine the required time to read a block.

Read time = (Block size / Track size) · Rotation time

16.67 ms/rotation ↔ 3600 rotations/min
32,768 bits/track ↔ 64 sectors of 512 bits each

Figure 3-13 shows the relationship between data rate, block size and disk exploitation. The data rate and disk usage are plotted as a function of the block size, for block/cluster sizes from 128 bytes to 8 KiB and an assumed file size of 1 KiB.


Fig. 3-13 Disk Usage and Data Rate

The solid curve (left-hand scale) shows the data rate of the disk. The dashed curve (right-hand scale) shows the efficiency of disk usage.

To calculate the data rate, the following formula was used:

Data rate = Block size / (time to position head + ½ rotation time + transfer time for the block)

For example, with the block size k in bits and a track length of 32,768 bits, the access time is

[30 + 8.3 + (k/32,768) · 16.67] ms

4K: 38.3 ms + (4K/32K) · 16.67 ms = 40.38 ms → 101.44 Kbit/s
2K: 39.34 ms → 50.84 Kbit/s
1K: 38.82 ms → 25.76 Kbit/s

To determine the percentage of disk exploitation the following formula was used:

E_disk = 1, if block size < file size
E_disk = file size / block size, if block size ≥ file size

Note that this formula only works for files whose size is a multiple of the block size. The terms external and internal fragmentation are related to the allocation and the chosen block size. External fragmentation occurs when stored data is released and the related disk space is rewritten multiple times. Internal fragmentation occurs when the typical file size is smaller than the set block size. Both effects are undesirable.
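The worked example above can be reproduced with a short sketch (times in ms, sizes in bits, constants taken from the text):

```python
SEEK_MS, HALF_ROT_MS, ROT_MS, TRACK_BITS = 30.0, 8.3, 16.67, 32768

def access_time_ms(k_bits):
    # seek + half a rotation + time to pass the block under the head
    return SEEK_MS + HALF_ROT_MS + (k_bits / TRACK_BITS) * ROT_MS

def data_rate_kbit_s(k_bits):
    return k_bits / access_time_ms(k_bits)   # bit/ms equals Kbit/s

def exploitation(block_size, file_size=1024):
    # fraction of an allocated block actually holding file data
    return 1.0 if block_size < file_size else file_size / block_size

assert round(access_time_ms(4096), 2) == 40.38
assert round(data_rate_kbit_s(4096), 1) == 101.4
assert exploitation(8192) == 0.125           # 1 KiB file in an 8 KiB block
```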



3.4.2 FREE SPACE MANAGEMENT

Since a HDD is divided into a vast number of separate blocks, it is necessary to keep track of used and unused disk space. The memory mapping needs to be efficient and accurate whenever a read/write operation is issued by the CPU. The size of the blocks determines the complexity and space requirement of the management method and thus its overall effectiveness. Two methods of free-space management are presented below:

1. Bitmap.

Fig. 3-14 Bit-Map

A bitmap provides a simple way of tracking the state of the clusters in a fixed amount of memory. The size of the bitmap depends on the size of the mapped memory and on how it is allocated. Every cluster is mapped according to its position (usually represented by its cluster number) and represented by one bit. The bit is set to 0 if the space is free and set to 1 if the space is occupied. The smaller the cluster size, the larger the bitmap gets, because more clusters need to be addressed and represented in the bitmap.

Advantage: a fast and effective method to find an unoccupied memory block, or n consecutive blocks, because there is no need to access the HDD for this information.

Disadvantage: to be effective, the bitmap has to be stored in the main memory of the system. The space (in bits) it occupies corresponds to the number of clusters of the HDD, plus an offset of system-relevant bytes.

Example: A HDD of 128 GiB, formatted with NTFS, has a cluster size of 4 KiB. The size of the pure bitmap is calculated according to the following formula, where k is the number of bits by which one cluster is represented in the bitmap (here: 1):

S_Bitmap = (S_HDD / S_Cluster) · k Bit = (128 GiB / 4 KiB) · (1/8) Byte = (128 · 2^30) / (4 · 2^10 · 8) Byte = 4 MiB
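The bitmap-size calculation can be sketched as follows:

```python
GIB, KIB, MIB = 2**30, 2**10, 2**20

def bitmap_size_bytes(disk_bytes, cluster_bytes, bits_per_cluster=1):
    n_clusters = disk_bytes // cluster_bytes    # 33,554,432 clusters here
    return n_clusters * bits_per_cluster // 8   # one bit per cluster -> bytes

# 128 GiB disk, 4 KiB clusters: the bitmap occupies 4 MiB of main memory
assert bitmap_size_bytes(128 * GIB, 4 * KIB) == 4 * MIB
```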


2. Linked list of free blocks.

Fig. 3-15 Linked list

This method creates a linked list of disk blocks, with each block holding as many addresses of free disk blocks as possible. The space needed for such a list can be calculated with the following formula:

S_table = (number of bytes per pointer · number of free blocks) · (1 + 1 / cluster size)

Example: The same HDD as in the bitmap example is used; all blocks are considered free.

Number of free blocks = S_HDD / S_Cluster = 128 GiB / 4 KiB = 32 · 2^20 = 33,554,432

Number of bytes per pointer: 33,554,432 = 2^25, so 25 bits are needed, i.e. 4 bytes.

S_table = (4 Byte · 33,554,432) · (1 + 1/(4 · 1024)) ≈ 128 MiB

The linked list is much bigger than the bitmap, but it does not need to be stored in main memory, since it lists only free areas of the disk, making a search routine for free memory blocks unnecessary. A search for a bulk of free clusters is only carried out if the free clusters need to be adjacent to each other. Like the linked-block allocation method, this method has the same structure and the same disadvantages: the last entry of a block contains the pointer to the next block with addresses of free clusters.
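The numbers of the example can be checked in a short sketch:

```python
GIB, KIB, MIB = 2**30, 2**10, 2**20

def free_list_size_bytes(disk_bytes, cluster_bytes, ptr_bytes=4):
    n_free = disk_bytes // cluster_bytes                # all blocks assumed free
    # the (1 + 1/cluster_size) factor accounts for the next-block pointers
    return ptr_bytes * n_free * (1 + 1 / cluster_bytes)

# 128 GiB disk, 4 KiB clusters: the free list needs about 128 MiB
assert free_list_size_bytes(128 * GIB, 4 * KIB) // MIB == 128
```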

In the days of MS-DOS this method was developed into a more effective one, which is still used today: the File Allocation Table (FAT). Files are stored via the linked-list method (ref. Fig. 3-16), and a table in main memory keeps track of where each file begins and which blocks of the HDD are unused.


Fig. 3-16 Organization of files in MS-DOS (FAT)

Figure 3-17 shows the interaction between the directory table and the memory. To find a free block, the FAT is searched from the beginning until a cluster appears that matches the requirements.

Fig. 3-17 Linked List on MS-DOS (FAT)


3.4.2.1 FAT Sizes: FAT12, FAT16 and FAT32

The file allocation table (FAT) stores information about the clusters of the disk in a table. There are three different varieties of this table, which differ in their maximum size. The partitioning tool of the system will normally choose the optimal FAT type for the available volume; however, the type of FAT can sometimes also be chosen manually.

Since each cluster has one entry in the FAT, and these entries are used to hold the cluster number of the next cluster used by the file, the size of the FAT is the limiting factor on how many clusters any disk volume can contain. The following are the three different FAT versions now in use:

• FAT12: The oldest type of FAT uses a 12-bit binary number to hold the cluster number. A volume formatted using FAT12 can hold a maximum of 4,086 clusters, which is 2^12 minus a few values (to allow for reserved values to be used in the FAT). FAT12 is therefore most suitable for very small volumes, and is used on floppy disks and hard disk partitions smaller than about 16 MB.

• FAT16: This FAT type is mostly outdated but may still be found on older systems; it uses a 16-bit binary number to hold cluster numbers. A volume using FAT16 can hold a maximum of 65,526 clusters, which is 2^16 less a few values (again for reserved values in the FAT). FAT16 is used for hard disk volumes ranging in size from 16 MB to 2,048 MB. VFAT is a variant of FAT16.

• FAT32: The newest FAT type, supported by all Windows versions since Windows 95. FAT32 uses a 28-bit binary cluster number, not 32, because 4 of the 32 bits are reserved. 28 bits are still enough to permit large volumes: FAT32 can theoretically handle volumes with over 268 million clusters and will (theoretically) support drives up to 2 TB in size. However, to do this the FAT itself grows very large. Because of these limitations, most storage devices with a size > 4 GiB use the now-standard NTFS as the default file system.
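The cluster counts translate into rough volume limits as sketched below (the 32 KiB maximum cluster size is taken from the table in Fig. 3-18; this is an illustration of the arithmetic, not an authoritative limit):

```python
KIB = 2**10

def max_volume(n_clusters, cluster_bytes):
    # a volume can span at most n_clusters clusters of the given size
    return n_clusters * cluster_bytes

# FAT16: 65,526 clusters of at most 32 KiB -> just under 2 GiB
assert max_volume(65_526, 32 * KIB) // 2**20 == 2047
# FAT32: 2**28 clusters of 32 KiB would exceed 2 TiB in theory
assert max_volume(2**28, 32 * KIB) > 2 * 2**40
```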

The following table compares the three FAT types (with NTFS for reference):

Attribute                  | FAT12                                     | FAT16                             | FAT32                                                        | NTFS
Used for                   | Floppies and very small hard disk volumes | Small hard disk volumes (old)     | Small to medium-sized hard disk volumes (pre-NTFS standard)  | New standard for data storage ≥ 4 GiB
Size of each FAT entry     | 12 bits                                   | 16 bits                           | 28 bits                                                      | –
Maximum number of clusters | 4,086                                     | 65,526                            | ~268,435,456                                                 | ~4,295 billion
Cluster size used          | 0.5 KiB to 4 KiB (usually 512 Byte)       | 0.5 KiB to 64 KiB (usually 4 KiB) | 0.5 KiB to 32 KiB (usually 4 KiB)                            | 4 KiB to 64 KiB
Maximum volume size (Byte) | 16,736,256                                | 2,147,123,200                     | Win NT 3.5: ~2^41; newer OS: ~2^35                           | 256 · 10^40

Fig. 3-18 FAT size table


3.5 DIRECTORIES

Directories are used in a computer system for several reasons. They are used to order files or to separate them from each other in order to avoid unintended data modification. Users can use them to create backups of existing files, and an OS does the same before an update is installed. Folders also group the data used by a joint program, to minimize the search for certain software elements, and they sort files by certain criteria to simplify searches initiated by the OS. Summarized:

To order files

Backup and updates

To avoid unintended data modification

Allow joint use of data

Issue of access rights

Shorter search time for files by operating system

The directories are located on the HDD in disk blocks, and they contain the mapping of file names to their respective disk blocks. A directory entry in MS-DOS looks like this, Fig. 3-19:

Field                Size (Bytes)
File name            8
Name extension       3
Attribute            1
Reserved             10
Time                 2
Date                 2
Number of 1st block  2
Size                 4
Total                32

Fig. 3-19 Directory entry in MS-DOS

The following table lists the attributes used in MS-DOS directory entries:

Attribute                                    Flag
Read only: not modifiable                    R
Archive: archived since last modification    A
System: cannot be deleted using del         S
Hidden: is not listed in dir                 H
Directory                                    D
Disk label                                   V

Fig. 3-20 Attributes for entries in MS-DOS


The following table shows the entries of a directory in detail and offers information about the file and cluster sizes (ref. Fig. 3-21). Here the size of a sector is fixed at 512 bytes and there is no distinction between block and sector.

Position    Length  Content
0x00-0x07   8       File name
0x08-0x0A   3       File name extension
0x0B        1       Attribute:
                      0x01  File is read-only
                      0x02  Hidden file (will not be shown with DIR)
                      0x04  System file (will not be shown with DIR)
                      0x08  Volume label (directory entry is the disk name)
                      0x10  Directory entry refers to a subdirectory
                      0x20  File has not yet been archived
0x0C-0x15   10      Reserved for DOS
0x16-0x17   2       Time of last modification or creation of the file
0x18-0x19   2       Date of last modification or creation of the file
0x1A-0x1B   2       Start cluster number of the file
0x1C-0x1F   4       Length of the file in bytes

Cluster size in sectors   Max. file system size (FAT12)   (FAT16)
1  = 512 Byte             2 MiByte                        32 MiByte
4  = 2 KiByte             8 MiByte                        128 MiByte
8  = 4 KiByte             16 MiByte                       256 MiByte
16 = 8 KiByte             32 MiByte                       512 MiByte
32 = 16 KiByte            64 MiByte                       1024 MiByte

Cluster size in sectors   FAT size with a 32-MiByte file system
1  = 512 Byte             64 KiByte (16-bit FAT)
4  = 2 KiByte             16 KiByte (16-bit FAT)
8  = 4 KiByte             6 KiByte (12-bit FAT)
16 = 8 KiByte             3 KiByte (12-bit FAT)

Cluster size in sectors   Number of files   Lost storage/KiB
1                         100               25
4                         500               500
8                         1000              2000
16                        2000              8000

Fig. 3-21 Directory entries, file and cluster sizes
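As an illustrative sketch (not authoritative), a 32-byte directory entry with the layout of Fig. 3-21 can be unpacked as follows; the byte order is little-endian, as on x86, and the sample entry is made up:

```python
import struct

def parse_dir_entry(raw: bytes):
    assert len(raw) == 32
    name = raw[0x00:0x08].decode("ascii").rstrip()   # 8-byte file name, space-padded
    ext  = raw[0x08:0x0B].decode("ascii").rstrip()   # 3-byte extension
    attr = raw[0x0B]                                 # attribute byte
    # time, date, start cluster (2 bytes each), file length (4 bytes)
    time, date, start, size = struct.unpack("<HHHI", raw[0x16:0x20])
    return name, ext, attr, start, size

# a made-up entry: read-only file LECTURE.TXT, start cluster 2, 1024 bytes
entry = b"LECTURE TXT" + bytes([0x01]) + bytes(10) + struct.pack("<HHHI", 0, 0, 2, 1024)
name, ext, attr, start, size = parse_dir_entry(entry)
assert (name, ext, start, size) == ("LECTURE", "TXT", 2, 1024)
```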

The following figures show several examples of directory structures in different systems. While the general structure of the different directory methods is quite similar, there are differences between them according to their type of space management. On older floppy disks and hard disks the file directory was structured according to figure 3-22.


| Boot sector (read from ROM-BIOS) | FAT | FAT copy | Root directory | File blocks |

Fig. 3-22 Arrangement of entries on disk

The boot sector is the first sector in this arrangement; it starts at relative address 0. It has always been a reserved sector, and usually the only one. It is followed by the file allocation table, whose size depends on the chosen FAT type, then the root directory and the file blocks.

The UNIX file system uses the following directory entry:

| I-node number (2 Byte) | File name (14 Byte; 255 in BSD-UNIX) |

Fig. 3-23 UNIX Directory Entries

These entries are arranged on a hard disk as shown in figure 3-24.

| Boot block | Super block | I-nodes (64 bytes each) | Data blocks |
(block size: 512 bytes)

Fig. 3-24 Arrangement of entries on a disk

Every i-node (of 64 bytes) corresponds to one file. Every file is stored in data blocks; the i-node references the blocks that belong to the file, and the super block contains relevant information about the file system.



3.5.1 FILE ORGANIZATION IN UNIX

Each file system has a table of contents in which all existing files are recorded. In UNIX this is the i-node list; the i-nodes are the elements which represent the file headers.

The structure has the following order:

Block 0 (boot block)

Superblock

List of file headers (i-node list)

Range of the data blocks

Fig. 3-25 UNIX file structure

The super block in main memory contains information about:

the size of the file system in blocks of 512 bytes

the name of the logical disk

the name of the file system

the size of the i-node list

a pointer to the first element of the list of free data blocks

a pointer to the first element of the list of free i-nodes

the date of the last modification

the date of the last backup

an identification whether a 512-byte or a 1-KiB file system exists


The structure of an i-node is shown in Fig. 3-26. I-node 1 manages faulty blocks; i-node 2 contains the root directory.

Fig. 3-26 Search for file /usr/gd/DV3


4 MEMORY MANAGEMENT

The term memory management covers the methods used to order, store and, if necessary, relocate data on the various storage systems. As noted before, the memory access speed has a large impact on the overall speed of the system. A typical computer has several levels of memory, each serving a specific purpose. The fast-access cache, usually implemented on the CPU, stores code segments needed for the current operation. The main memory (in a PC called RAM) stores data relevant for the executing application. Data with a low access priority is stored on a mass storage device (HDD). Those devices are designed for storage capacity and are organized by the operating system for optimal usage of disk space.

The amount of disk space that can actually be used to store files depends on how the files are stored and ordered and on the size of the storage medium (see Chapter 3).

When dealing with wasted memory space, two terms are encountered:

Internal fragmentation: unused memory inside allocated blocks, which results from allocations being larger than the space actually requested.

External fragmentation: very small free memory spaces spread all over the memory that can neither be used for allocation nor be merged with one another or, in the case of a HDD, free memory space between used memory blocks.

The performance of a computer system depends partially on the effective usage of space in the cache and the main memory. If a large amount of memory is wasted, due to fragmentation, ineffective indexing and similar problems, a request for data will take more time than necessary. While this data request is pending the CPU cannot continue calculating the current process and the system will slow down.

A high amount of fragmentation can lead to memory shortage in the RAM, which will result in a higher amount of necessary reloads from the hard disk (Virtual Memory Concept or Storage Access), thus resulting in longer memory access times. An inefficient indexing affects the data access time directly because the process of copying data from the main memory to the cache gets interrupted by the tracing of the next memory block from the current memory segment.

Developers want to achieve an increase of computer performance as much as possible. This performance is considered by a user as time between request and response; the operator of a router considers this as throughput, i.e. number of routed packets/time unit.

The ratio of two execution times is calculated as:

n = execution time X / execution time Y

with n giving the relative improvement (or degradation) between the two execution times. Performance is measured by programs that load (as in burden) different systems with an equal task and compare the results with one another (benchmarks).


Locality Principle

According to a rule of thumb, a program spends 90% of its execution time in only 10% of its code. This is known as locality of access. The same applies to data accesses.

We distinguish between temporal and spatial locality:

a) Temporal locality states that recently addressed commands are more likely to be addressed again next;

b) Spatial locality means that temporally consecutive commands are also spatially adjacent (for example, loops).

Observations show that the most recently used addresses are indeed the most likely to be addressed again next. It is further shown that accesses to spatially neighboring addresses also occur close together in time. From this experience results the concept of the memory hierarchy as an organizational measure for performance improvement. We distinguish between CPU registers, cache, main memory (RAM), hard disks, floppy disks and CDs.


4.1 MEMORY HIERARCHY

Implementing a smart memory hierarchy is meant to speed up the execution of tasks. To address these improvements some basic concepts should be cleared first:

Addressing is divided into two parts: a physical address (for CPU registers, cache, RAM) and a logical address (for mass storage). Mapping is the translation from the logical (virtual) address to the physical address.

Amdahl´s Law:

The performance improvement to be gained from a faster mode of execution is limited by the fraction of the system that cannot use the faster mode. This means that the slowest part of a process determines the best time in which the process can be completed. The performance improvement is also called speedup.

Example formula for determining the speedup of a system:

S_overall = original execution time / improved execution time = t_old / t_new

S_overall = 1 / ((1 − P_improved) + P_improved / S_fraction)

Formula 4-1 Speedup

The overall improvement of a system is therefore determined by the fraction of time the improved part is active (P_improved) and by the speedup that part provides (S_fraction).


            CPU         L1 Cache   L2 Cache   Main Memory   I/O-Device
            (Register)  (SRAM)     (SRAM)     (DDR3 RAM)    (Drives)
Size        256 B       32 KiB     256 KiB    4 GiB         >128 GiB
Access time 0.28 ns     ~1 ns      ~3 ns      ~40 ns        ~5 ms

To take a look at memory access, Amdahl's law is used to compare a system with and without a cache. If it is supposed that the cache is 10 times faster than main memory and the cache can be used 90% of the time (90% cache hits), the following performance improvement (speedup) is achieved:

S_overall = 1 / ((1 − % cache access) + % cache access / S_cache)

S_overall = 1 / ((1 − 0.9) + 0.9 / 10) = 1 / 0.19 = 5.26
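This cache speedup can be checked with a short sketch of Amdahl's law (Python; the function name is ours):

```python
def amdahl_speedup(p_improved, s_fraction):
    """Amdahl's law: overall speedup when the fraction p_improved
    of the execution can run s_fraction times faster."""
    return 1.0 / ((1.0 - p_improved) + p_improved / s_fraction)

# 90 % cache hits, cache 10 times faster than main memory:
print(round(amdahl_speedup(0.9, 10), 2))  # → 5.26
```

Note that even with a cache that is 10 times faster, the 10% of accesses that miss limit the overall speedup to about 5.3.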

If the CPU execution is considered, the execution time is composed of

• the time texecution to handle a cache hit (i.e. the time to execute the instructions only including cache accesses) and

• The time tmemory_stall the CPU is stalled waiting for a memory access.

Resulting in:

CPU_t = t_execution + t_memory_stall

CPU_t = (CPU cycles + wait cycles on memory) · T

with T = clock cycle time and

wait cycles on memory = number of misses · penalty cycles

To calculate the CPU execution time, the different terms of the sum should be considered in more detail:

Access to the cache

The CPU execution time without any memory access (including all cache hits) is the product of:

• CPI, cycles per instruction,

• IC, the number of instructions called instruction count and

• T, the CPU clock cycle time, e.g. at 1 GHz: T = 1 ns.

Resulting in:


𝒕𝒆𝒙𝒆𝒄𝒖𝒕𝒊𝒐𝒏 = 𝑪𝑷𝑰 × 𝑰𝑪 × 𝑻

Memory accesses

The time the CPU is stalled during the cache misses is the product of

• MPI, the memory accesses per instruction

• IC, the instruction counter,

• MR, the (cache) miss rate,

• penaltymiss , the number of cycles in case of a miss and

• T, the clock cycle time.

Resulting in:

tmemory_stall = MPI · MR · penaltymiss · IC · T

The miss rate MR is calculated by the number of the accesses that miss divided by number of accesses (both hits and misses).

MPI describes the number of memory accesses per instruction; only instructions that actually demand memory accesses contribute to it.

The product MPI · MR describes the number of the cache misses per instruction.

So the final execution time for CPU is:

CPUt = (CPI + MPI · MR · penaltymiss) · IC · T

Example: Assume a system with CPI = 8.5, MPI = 3, MR = 11%, penaltymiss = 6. Calculate the execution time!

What will be the CPU execution time if no cache exists?

With Cache:

CPUt = (8.5 + 3 · 0.11 · 6) · IC · T = 10.48 · IC · T

Without Cache:

CPUt = (8.5 + 3 · 1 · 6) · IC · T = 26.5 · IC · T
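The example can be verified numerically (a Python sketch; the function name is ours, and note that 8.5 + 3 · 1 · 6 = 26.5):

```python
def cpi_effective(cpi, mpi, mr, penalty):
    """Effective cycles per instruction: CPI + MPI * MR * penalty.
    The full execution time is this value multiplied by IC and T."""
    return cpi + mpi * mr * penalty

print(round(cpi_effective(8.5, 3, 0.11, 6), 2))  # with cache, MR = 11 % → 10.48
print(round(cpi_effective(8.5, 3, 1.0, 6), 2))   # without cache, every access misses → 26.5
```

The cache cuts the effective cycles per instruction from 26.5 to 10.48, i.e. to less than half.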

Thus it is reasonable, in addition to a correspondingly large set of registers, to provide a cache of sufficient size, which holds reasonably large program parts and thus results in shorter processing times. The term cache makes clear that it is not a matter of addressable storage locations, such as the registers or the RAM, but of an area that is hidden from the programmer (assembler level). The operating system administers the (automatic) management of the memory hierarchy; in the case of the cache, this is done by special hardware (usually on the CPU).


4.2 CACHE MEMORY

The “hidden” cache memory is located between the CPU and the RAM, initially outside, today on the CPU chip. Nowadays two cache levels, an internal and an external one with different sizes and speeds, are common in order to further increase the speed. Since the physical size of memory elements has decreased over the past years, most modern CPUs have a third-level cache (L3 cache) that is usually shared between the cores and has a size of up to 8 MiB. The basic cache arrangement is shown here:

Fig. 4-1 Basic memory arrangement with cache

Figure 4-2 shows the attributes and relative location of the different types of memory:

Fig. 4-2 Basic relation between different components

Figure 4-2 shows the location of the memory relative to the CPU. The speed of a data access from the CPU to a memory system depends on the speed of the bus system and the relative “distance” to the specific memory. A cache is usually connected directly to a CPU core and stores only small amounts of data, so a transfer can be done in one CPU clock. Transferring data from RAM (e.g. a DDR3-1600 system) is done at a rate of 1600 MT/s (megatransfers per second) on a 64-bit bus, requiring about 5 system clocks (not including delays and access times).


DRAM and SRAM:

The difference between the two RAM types lies in the first letter: D stands for “dynamic” and S stands for “static”. The cycle time of SRAM is 10 to 20 times shorter than that of DRAM, but for the same technology the capacity of DRAM is 5 to 10 times larger than that of SRAM. Therefore it can be said:

• Main memory is DRAM

• On-chip caches are SRAM

• Off-chip caches: depends on the design

4.2.1 CACHE STRUCTURE

A cache miss may increase the processing time of a given task. To improve a system it is necessary to understand what causes a cache miss and how to decrease the chance of one. The usual causes of a miss are:

First access (Compulsory)

Capacity is too low

Conflict by different block addresses

Parameters of a cache:

Block (or row) size    4 - 128 bytes
Hit time               1 - 4 cycles (normally 1)
Failure access time    8 - 32 cycles (time to replace a block)
Access time            6 - 10 cycles (access to the 1st word of the block)
Transfer time          2 - 22 cycles (time for the remaining words)
Failure access rate    1% - 20%
Cache size             1 KiB - 64 KiB (L1), 256 KiB - 2 MiB (L2), 2 - 20 MiB (L3)


4.2.2 CACHE PERFORMANCE

To calculate the time a CPU needs to acquire a set of data, the following formula is used:

𝒕𝒂𝒄𝒄 = 𝜶 ∙ 𝒕𝒉𝒊𝒕 + (𝟏 − 𝜶) ∙ 𝒕𝒑𝒆𝒏𝒂𝒍𝒕𝒚

Here t_hit stands for the time the process needs in case of a cache hit, t_penalty equals the time the system needs for an external memory access, and α is the probability of a cache hit.

Based on the experience that the hit rate for instructions is higher than that for data, one can now make a decision between a separate or a combined instruction and data cache.

Example:

t_hit = 5 ns, t_penalty = 5 ns · 20 penalty cycles = 100 ns

Statistical studies of programs show that approximately 26% of all memory references are instruction fetches and 9% are data accesses. This means that 74% of the memory accesses (26 of 26 + 9) go to the instruction cache and 26% to the data cache. Table 4-1 shows measured miss rates for various cache sizes.

Memory Space   Instruction Cache   Data Cache   Shared Cache
1 KiB          3.06%               24.61%       13.34%
2 KiB          2.26%               20.57%        9.78%
4 KiB          1.78%               15.94%        7.24%
8 KiB          1.10%               10.19%        4.57%
16 KiB         0.64%                6.47%        2.87%
32 KiB         0.39%                4.82%        1.99%
64 KiB         0.15%                3.77%        1.35%
128 KiB        0.02%                2.88%        0.95%

Table 4-1 Measured miss rates for various cache sizes


Example:

Which arrangement gives the lower miss rate?

Shared cache with 32 KiB or

Separate instruction and data cache with 16 KiB each

About 74% of the memory accesses in a system relate to instructions and 26% to data. With the values from Table 4-1, the miss rate of the separate 16 KiB caches is calculated as:

(0.74 × 0.64%) + (0.26 × 6.47%) ≈ 2.16%

The table gives, on the other hand, 1.99% for the 32 KiB shared cache. For the performance, however, the average access time must be considered, not only the miss rates!

For the average access time to memory:

t_acc = %instruction accesses · (hit time + miss rate × penalty time) + %data accesses · (hit time + miss rate × penalty time)

For separate cache 16 KiB:

t_acc-split = 74% · (1 + 0.64% × 50) + 26% · (1 + 6.47% × 50) ≈ 2.08 cycles

For combined cache 32 KiB:

t_acc-shared = 74% · (1 + 1.99% × 50) + 26% · (1 + 1* + 1.99% × 50) ≈ 2.26 cycles

* The additional clock cycle resulting from the collision between instruction and data cache access in shared cache
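Both average access times can be reproduced with a small sketch (Python; the function name and the modelling of the port conflict as one extra cycle are ours):

```python
def avg_access_cycles(f_instr, mr_instr, mr_data, penalty, extra_data_cycle=0):
    """Average cycles per memory access: a hit costs 1 cycle, a miss adds
    `penalty` cycles; extra_data_cycle models the extra clock a shared
    cache needs when an instruction and a data access collide."""
    f_data = 1 - f_instr
    return (f_instr * (1 + mr_instr * penalty)
            + f_data * (1 + extra_data_cycle + mr_data * penalty))

split  = avg_access_cycles(0.74, 0.0064, 0.0647, 50)                      # 2 x 16 KiB
shared = avg_access_cycles(0.74, 0.0199, 0.0199, 50, extra_data_cycle=1)  # 1 x 32 KiB
print(f"split: {split:.3f} cycles, shared: {shared:.3f} cycles")  # ≈ 2.08 and ≈ 2.26
```

So the split cache wins on access time despite its higher combined miss rate, because it avoids the access collision.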

4.2.3 AVERAGE ACCESS TIME

The average access time for each memory access (to either cache or to the main memory) can be calculated as follows:

𝒕𝒂𝒄𝒄 = 𝜶 ∙ 𝒕𝒄𝒂𝒄𝒉𝒆 + (𝟏 − 𝜶) ∙ 𝒕𝑫𝑹𝑨𝑴

Where:
α = cache hit rate
1 − α = cache miss rate (MR)
t_cache = time to access the cache
t_DRAM = time to access main memory

For example:
main memory with 125 ns access time, 11 wait states*
cache with 12 ns access time, “zero wait states”
processor clock cycle 12 ns ≈ 83 MHz

* Wait states: number of clock cycles, the CPU waits for memory


The average access time of a set of data with a hit rate of 90% results in:

t_acc = 0.9 · 12 ns + 0.1 · 125 ns = 23.3 ns

The average number of wait states can be calculated as: N_wait = 0.9 · 0 + 0.1 · 11 = 1.1
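Both values can be checked with a few lines (Python; the variable names are ours):

```python
hit_rate = 0.9
t_cache, t_dram = 12, 125   # access times in ns
wait_states_dram = 11       # wait states on a main-memory access

# weighted average over cache hits and misses:
t_avg  = hit_rate * t_cache + (1 - hit_rate) * t_dram
n_wait = hit_rate * 0 + (1 - hit_rate) * wait_states_dram
print(round(t_avg, 1), round(n_wait, 1))  # → 23.3 1.1
```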

The average access time gives a rough overview of the access time of a given system; it cannot, however, cover all cases of a cache access. For example, if a given set of data is not simultaneously copied to the CPU registers and the cache, a second data access (this time a cache hit) is required to access the newly copied data. This depends on the working parameters of the system that manages the cache memory and data access.

4.2.4 CACHE ORGANIZATION

Cache can be organized as follows:

• Fully associative: a block can be placed anywhere in the cache

• Direct mapped: each block has only one place it can appear in the cache: block number = block address MOD number of blocks in cache

• N-way set associative: a block can be placed in a restricted set of n blocks in the cache: set number = block address MOD number of sets in cache

The range of caches from direct mapped to fully associative is in fact a subtype of levels of set associative cache:

• Direct mapped cache is one-way set associative

• Fully associative cache consisting of m blocks is m-way associative.

Thus direct mapped cache and fully associative cache are special cases of the n-way set associative cache. The majority of processor caches today are direct mapped, 2-way set associative or 4-way set associative, depending on their respective level.

If the cache is not direct mapped, there are many blocks to choose from on a miss. Which strategies are employed for selecting which block to replace?

Random – the candidate blocks are randomly selected (simple to build in hardware).

Least-recently used – the block replaced is the one that has been unused for the longest time.


The following figure depicts the miss rates vs. set associativity.

Fig. 4-2 Miss rates vs. set associativity

The figure shows that a bigger cache and a higher degree of set associativity result in lower miss rates. However, higher levels of associativity may lead to longer cache searches, resulting in a smaller time improvement compared with lower associativity. The following is an example of a cache system with different mapping methods:

A cache contains 8 blocks and the main memory consists of 32 blocks. Fig. 4-3 and Fig. 4-4 describe the mapping of the block 12 from the main memory into the cache.

Fig. 4-3 Block 12 of the memory

Fig. 4-4 Set Associative Cache Mappings

A real cache contains hundreds of blocks and has to map millions of blocks of a real RAM system.
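The MOD formulas from above, applied to the block-12 example of the figures, can be sketched as follows (Python; the constants match the 8-block example, not a real cache):

```python
N_BLOCKS = 8                     # cache blocks, as in the figure
N_SETS_2WAY = N_BLOCKS // 2      # 2-way: 4 sets of 2 blocks each

block_addr = 12                  # memory block to be placed in the cache
print(block_addr % N_BLOCKS)     # direct mapped: only this slot is legal → 4
print(block_addr % N_SETS_2WAY)  # 2-way set associative: set 0, 2 candidate blocks
# fully associative: block 12 may be placed in any of the 8 cache blocks
```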


4.3 MAIN MEMORY MANAGEMENT

Main memory management is an important task of the operating system. If the RAM is managed effectively, the overall speed of the computer system improves. Main memory management systems can usually be classified into:

1. One that transfers processes during execution between main memory and hard disk by means of swapping and paging

2. One that does not.

4.3.1 MEMORY MANAGEMENT WITHOUT SWAPPING AND PAGING

The simplest memory management keeps only one process in memory. The process has full access to the whole memory, and after it terminates the next process is loaded.

As discussed later in this lecture, having several processes in memory is better for many reasons. However, to run several processes, memory has to be assigned to each specific process.

All multi-process operating systems face the problems of relocation and protection.

4.3.1.1 Relocation

Since it is unknown where in memory a process will be loaded, absolute addressing of memory needs additional modification.

For example, if a process wants to read or write data to the address 100 and the process begins at the position 100KiB, the memory access must be modified to (100KiB + 100). If the process starts at 200 KiB it must be modified to (200KiB + 100) and so on.

One solution is to store additional information that references all positions in the program using absolute addressing. Another solution is to use a hardware register whose value is automatically added to all memory accesses (segmentation).

4.3.1.2 Protection

Relocation does not solve the problem that a process may read or write in the partitions of other processes (spying on or destroying their data).

Segmentation can be a solution by adding a second register that marks the end of a partition. Memory requests not lying between the start and the end of the partition are blocked by the hardware.

4.3.1.3 Partitions

4.3.1.3.1 Creating Partitions

There are two ways for creating partitions:

Fixed partitions: the memory is split into n fixed partitions. This is by far the easiest way. The general problem of this method is that the number and the sizes of the partitions can never be optimal. For example:


Consider the following scenario: memory is divided into 4 partitions with the respective sizes as depicted in Fig. 4-5. The green boxes indicate the jobs assigned to the corresponding partitions. The distribution of jobs is carried out using:

o Several wait queues: each partition manages its own list of jobs, and jobs are added to the shortest partition list. The problem of this method is that the lists of small partitions may be full while the lists of bigger partitions are empty (waste of time).

o One wait queue: the partitions share a common list, and jobs are distributed to the next idle partition. Here the problem appears that small jobs may waste space of a big partition while smaller partitions are idle, and a job that needs a big partition has to wait.

Fig. 4-5 a) Several wait queues b) One wait queue

Variable partitions: this method splits the memory into partitions of varying size, depending on the request. Considering the size and number of partitions, this method is efficient. However, memory allocation and de-allocation become complicated.

4.3.1.3.2 Allocation Strategies

The following strategies are used to find the partitions to be allocated:

1. First fit: takes the first partition that is large enough to accommodate the request. Easy, quick and cheap to implement.

2. Next fit: remembers the last position of a free partition, and starts from there to find the next appropriate free space.

3. Best fit: finds the partition whose size fits the request most closely. Although this strategy prevents large memory waste, it often leads to the creation of many very small partitions that cannot be used at all.

4. Worst fit: finds the largest possible partition to be allocated to the process.
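First fit and best fit can be sketched as a search over a list of free partition sizes (Python; the function names and the example sizes are invented):

```python
def first_fit(free_sizes, request):
    """Index of the first free partition large enough for the request."""
    for i, size in enumerate(free_sizes):
        if size >= request:
            return i
    return None

def best_fit(free_sizes, request):
    """Index of the smallest free partition that still fits the request."""
    fitting = [(size, i) for i, size in enumerate(free_sizes) if size >= request]
    return min(fitting)[1] if fitting else None

free = [200, 80, 500, 120]   # free partition sizes in KiB
print(first_fit(free, 100))  # → 0 (the 200 KiB partition)
print(best_fit(free, 100))   # → 3 (the 120 KiB partition, least waste)
```

Next fit and worst fit differ only in where the scan starts and which fitting partition is preferred.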

The 4 strategies above often lead to the main problem of memory allocation: the creation of many very small partitions that cannot be allocated to large memory requests. The solution to this was memory compaction, where neighbouring free partitions are merged to form larger partitions. Unfortunately, the process of finding neighbouring partitions results in large


administration overhead. To reduce this overhead, the Buddy-System is introduced.

Buddy-System:

Memory is divided into buddies (neighboring partitions of exactly the same size) only when a request arrives. The size of a partition is always 2^k and is determined according to the request. When buddies are freed, they can be merged to form a larger partition.

Advantages: fast allocation and de-allocation, reduced overhead for memory compaction.

Disadvantages: some requests result in large unused memory spaces. For example, a request of 513 KiB when the complete memory size is 1 MiB: the whole memory has to be assigned to the request!
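The 2^k rounding of the buddy system, and the waste in the 513 KiB example, can be sketched as follows (Python; the function name is ours):

```python
def buddy_block_size(request_bytes):
    """Smallest power of two that can hold the request; partition
    sizes in a buddy system are always 2**k."""
    size = 1
    while size < request_bytes:
        size *= 2
    return size

request = 513 * 1024                      # 513 KiB
size = buddy_block_size(request)
print(size // 1024, "KiB")                # → 1024 KiB, i.e. the whole 1 MiB
print(round(100 * (1 - request / size)))  # → 50 (% of the block left unused)
```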

4.3.2 MEMORY MANAGEMENT WITH SWAPPING AND PAGING

As long as there is enough memory to keep all processes, there is no need for anything more complicated (e.g. in embedded systems). In other systems there may not be enough memory for all processes. If currently idle, waiting or interrupted processes can be moved from memory to disk, their partitions can be freed, compacted and used by a process reloaded from disk to its (new) position.

In a swapping system the whole process and its data are moved in or out. This can amount to several megabytes every time the process is moved. For swapping to be possible, the whole process and its data must fit into the available memory.

Example of process loading

Fig. 4-6 Example of a filled memory

Now suppose Process B is swapped out.


Example of process loading (cont.)

Fig. 4-7 Swapping of processes in memory

Simple Paging

• Main memory is partitioned into equal fixed-sized chunks (of relatively small size)

• Trick: each process is also divided into chunks of the same size called pages

• The process pages can thus be assigned to the available chunks in main memory called frames (or page frames)

• Consequence: a process does not need to occupy a contiguous portion of memory

Page tables

Fig. 4-8 Example of page tables

• The OS now needs to maintain (in main memory) a page table for each process

• When process A and C are blocked, the pager loads a new process D consisting of 5 pages

• Process D does not occupy a contiguous portion of memory

• There is no external fragmentation

• Internal fragmentation consists only of the unused part of the last page of each process


• Each entry of a page table consists of the frame number where the corresponding page is physically located

• The page table is indexed by the page number to obtain the frame number

• A free frame list, available for pages, is maintained

Logical address used in paging

• Within each program, each logical address consists of a page number and an offset within the page

• A CPU register always holds the starting physical address of the page table of the currently running process

• Presented with the logical address (page number, offset) the processor accesses the page table to obtain the physical address (frame number, offset)

Fig. 4-9 Example of paging

• By using a page size of a power of 2, the pages are invisible to the programmer, compiler/assembler, and the linker

• Address-translation at run-time is then easy to implement in hardware

• The logical address becomes a relative address when the page size is a power of 2

• Example: if 16-bit addresses are used with 10 bits for the offset, 6 bits are available for the page number

• The 16-bit address, with the 10 least significant bits as offset and the 6 most significant bits as page number, is then a location relative to the beginning of the process
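Splitting such a 16-bit address can be sketched with shift and mask operations (Python; the sample address is invented):

```python
OFFSET_BITS = 10                      # page size 2**10 = 1 KiB
OFFSET_MASK = (1 << OFFSET_BITS) - 1  # 0b1111111111

logical = (3 << OFFSET_BITS) | 5      # page 3, offset 5

page   = logical >> OFFSET_BITS       # 6 most significant bits
offset = logical & OFFSET_MASK        # 10 least significant bits
print(page, offset)  # → 3 5
```

Because the page size is a power of two, the split needs no division; this is why the hardware can do it so cheaply.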


Abstract addresses

A programmer and the CPU “think” in abstract addresses, because at the time a program is designed and executed the physical addresses of the system in use are unknown. Therefore a separation of physical and logical addresses is necessary. When a program is installed, its contents are stored on a disk and the physical addresses of the contents are mapped within the program / the operating system.

When the program is executed, its functions work with abstract addresses that are translated to physical ones as soon as a data access is necessary. If parts of the program are stored in main memory, the mapping has to be dynamic, since the data blocks can be relocated within the RAM, and the abstract address the CPU works with then needs to be remapped to the new physical address.

Logical-to-Physical Address

The logical address (n, m) is translated to the physical address (k, m) by indexing the page table and appending the same offset m to the frame number k (ref. Fig. 4-10).

Fig. 4-10 Translation in paging

This figure shows how a logical address is translated into a physical one. The 6 bits referencing the page number are looked up in the page table, and the resulting frame number replaces the page bits of the logical address.
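The full (n, m) to (k, m) translation can be sketched as a page-table lookup (Python; the table contents are invented):

```python
OFFSET_BITS = 10                        # page size 1 KiB
page_table = {0: 5, 1: 2, 2: 7, 3: 4}   # page number -> frame number (invented)

def translate(logical):
    """Translate a logical address (page, offset) to a physical one."""
    page   = logical >> OFFSET_BITS
    offset = logical & ((1 << OFFSET_BITS) - 1)
    frame  = page_table[page]            # a missing entry would mean a page fault
    return (frame << OFFSET_BITS) | offset

# page 1, offset 20 maps to frame 2: physical address 2 * 1024 + 20
print(translate((1 << OFFSET_BITS) | 20))  # → 2068
```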


4.3.3 REPLACEMENT STRATEGIES FOR PAGES

When a process is still running on the CPU but the free space in the main memory/cache reaches a critical level, old data must be replaced as soon as a new data request appears. There are several strategies to determine which set of data is deleted in order to make space for a new one. These replacement methods can have a huge impact on system speed if they delete a set of data that is needed by the process within the next cycles: in that case a new access to main memory has to be made, causing a delay depending on the data access speed of the HDD. The following are some of the most used strategies for swapping data pages, with a short description of how they work.

Not recently used: The NRU method deletes a page that has not been marked as ‘referenced’ recently; pages are time-stamped at each reference, and the page with the oldest time stamp is usually the one marked for deletion. If more than one page (or no page) qualifies for this criterion, the NRU method checks whether one of those pages has been modified lately. If there is a match, that page is deleted; if more than one page fulfils the criterion, one is chosen randomly. Should no page qualify for replacement, one that has been read recently is chosen randomly and deleted. This method is easy to implement and gives good results.

Least recently used: The LRU algorithm checks the ‘referenced’ marking of all pages and chooses the one with the oldest ‘last access’ entry. The problem of this method is that pages that are used often, but with long intervals between two accesses, may be removed from main memory. This results in repeated loading of those pages from the HDD.

FIFO: FIFO stands for first in, first out, which describes how this method works: the data stored first is replaced first. The method pays no regard to how often data is modified or when it was last used, which makes it inefficient for modern RAM management. The FIFO principle is still used, however, e.g. in queues that memorize a sequence of tasks.

Second Chance: The second chance algorithm works like the FIFO algorithm, but instead of replacing the page instantly it checks for the ‘reference’ bit. If the bit is set the page will not be swapped and the next page in line will be checked, thus preventing the removal of a heavily used page. Should all the pages in the main memory have their ‘reference’ bit set, second chance turns into a simple FIFO algorithm.

Not frequently used: The NFU method is more difficult to implement, since every page needs a counter byte (or several). Every time a particular page is referenced, its counter is incremented by one. If a page has to be replaced, the algorithm picks the one with the smallest number of accesses. This method is effective for larger main memory systems, since it needs time to figure out which data sets are used most. Should a replacement be necessary in the early stages of a process, all counters may still be at a very low level, which could result in the replacement of a page that is needed more often in the future.
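The second chance strategy described above can be sketched as a FIFO queue that skips referenced pages (Python; the function and page names are invented):

```python
from collections import deque

def second_chance_victim(pages_fifo, referenced):
    """Return the page to replace: FIFO order, but a page whose reference
    bit is set gets a second chance (bit cleared, page moved to the back)."""
    queue = deque(pages_fifo)
    while True:
        page = queue.popleft()
        if referenced.get(page, False):
            referenced[page] = False   # clear the bit, re-queue the page
            queue.append(page)
        else:
            return page

# A and B were referenced recently, so the oldest unreferenced page C goes
print(second_chance_victim(["A", "B", "C", "D"], {"A": True, "B": True}))  # → C
```

If every page has its reference bit set, the loop clears all bits and eventually returns the oldest page, i.e. it degenerates to plain FIFO, exactly as described above.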


4.4 VIRTUAL MEMORY

Most computers today have many GiB of RAM available for the CPU to use. Unfortunately, that amount of RAM is often not enough to run all the programs a user expects to run at once; it may not even be enough to run a single program. The concept of virtual memory was developed to solve this problem.

There are three types of memory organization:

• One-word-wide memory organization

Fig. 4-11 One-word-wide memory

• Wide memory organization

Fig. 4-12 Wide memory

• Interleaved memory organization

Fig. 4-13 Interleaved memory


Virtual memory uses the concept of paging presented in 3.3.2. A set of data that is swapped out of working memory because it is not needed by the current process may be stored on a HDD for later use. In that case the physical location of the data is recorded so that it can be brought back into RAM (or even cache) when it is needed again. Since a physical HDD address can itself be quite large, it is usually indexed through a page table. This creates the illusion of a very large working memory, while in fact parts of the HDD are used to store process-relevant data. Since data access on a HDD takes considerably more time than access to cache and/or RAM, only rarely used blocks of data are kept in virtual memory. The translation is usually done by a hardware-implemented translation unit. The page table usually serves not only as a reference to the physical address but also holds information about the stored data, such as:

Is the content of the address resident in main memory?

Has a modification occurred, i.e. does the HDD version differ from the copy in main memory?
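The two status questions above can be pictured with a toy page table in Python. This is a minimal sketch: the page size, the table layout and the field names (`frame`, `present`, `modified`) are assumptions for illustration, not the lecture's notation.

```python
PAGE_SIZE = 4096  # bytes; an assumed page size

# Hypothetical page table: virtual page number -> entry with a frame number,
# a 'present' bit (is the page resident in main memory?) and a 'modified'
# bit (does the HDD version differ from the copy in main memory?).
page_table = {
    0: {"frame": 5, "present": True,  "modified": False},
    1: {"frame": 9, "present": True,  "modified": True},
    2: {"frame": 0, "present": False, "modified": False},  # swapped out to HDD
}

def translate(virtual_address):
    """Translate a virtual address to a physical one, or signal a page fault."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table[vpn]
    if not entry["present"]:
        raise LookupError(f"page fault: page {vpn} must be loaded from disk")
    return entry["frame"] * PAGE_SIZE + offset
```

In real hardware this lookup is performed by the translation unit mentioned below, not by software on every access.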

The drawbacks of virtual memory are the loss of memory capacity caused by the tables. The method also tends to create internal fragmentation, since pages can only be handled as a whole, whether or not they need all of the allocated memory space. When deciding how large a page should be, several trade-offs must be considered. If pages are too small, the result is huge tables that must be reloaded often, but with less internal fragmentation. If they are too large, fewer reloads are needed, but main memory may be used ineffectively. In order to allocate virtual memory to best effect, the operating system applies the partition strategies from 3.3.1.3 to the virtual memory concept.

For a detailed analysis of virtual memory we recommend the paper "Virtual Memory: Issues of Implementation" by Bruce Jacob & Trevor Mudge.


5 PROCESS MANAGEMENT

A process in a computer system describes an instance of a computer program that is being executed by the hardware. A program contains the data for a number of instructions; when it is started, the corresponding hardware executes these instructions within one or several processes. The operating system needs to manage and distribute these processes according to the abilities of the hardware in order to achieve the best possible performance. To understand which method works best for a system it is necessary to examine the specifics of a process and the time frame needed for its completion. A program usually resides on a local storage device and is partially or entirely transferred to main memory upon execution. A process can then comprise:

Programs, sub-routines

Data

Instruction pointers

Stack pointers

CPU states

Register contents

The hardware unit that executes programs or processes data is usually the CPU; some programs may use other hardware such as a graphics accelerator card, but in this lecture the CPU is treated as the main processing unit.

Processes are described via their context and are registered with a process control block (PCB). The PCB often consists of two parts:

[a] Hardware-PCB (internal by operating system)

a. description of current process

b. or (if stopped): status of all frozen variables

[b] Software-PCB (external by programmer)

a. Identification

b. State

c. Priority, etc.
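As a rough illustration, the two PCB parts listed above could be modelled as follows. This is a sketch only: real PCBs are kernel data structures, and all field names here are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class HardwarePCB:
    """Hardware part, kept internally by the operating system:
    the frozen machine state of a stopped process."""
    program_counter: int = 0
    stack_pointer: int = 0
    registers: dict = field(default_factory=dict)

@dataclass
class SoftwarePCB:
    """Software part, visible externally: identification, state, priority."""
    pid: int = 0
    state: str = "new"      # e.g. new / ready / running / blocked / terminated
    priority: int = 0

@dataclass
class PCB:
    hardware: HardwarePCB = field(default_factory=HardwarePCB)
    software: SoftwarePCB = field(default_factory=SoftwarePCB)
```

On a context change the dispatcher would fill the hardware part from the CPU registers, while the scheduler consults the software part (state, priority) to make its decisions.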


5.1 PROCESS STATES AND PRINCIPLES OF HANDLING

Several possible process states are defined for regular process administration. They are depicted in the following figure, which also shows the conditions and triggers for the state transitions.

Fig. 5-1 Process states and their transitions

1. Process creation by running program or operating system

2. Integration into process system by running program or operating system

3. Done by the scheduler according to execution time, or by the dispatcher according to interrupt/priority

4. Blocked while waiting for input

5. Input is completed, ready to continue

6. Process termination
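The transitions listed above can be captured in a small table. The sketch below is our own illustration; the state names are assumptions, since the figure itself is not reproduced here.

```python
# Allowed state transitions, following the numbered list above.
TRANSITIONS = {
    ("new", "ready"):          "integration into process system",
    ("ready", "running"):      "selected by scheduler/dispatcher",
    ("running", "ready"):      "preempted by interrupt/priority",
    ("running", "blocked"):    "blocked while waiting for input",
    ("blocked", "ready"):      "input completed, ready to continue",
    ("running", "terminated"): "process termination",
}

def change_state(current, new):
    """Return the new state if the transition is legal, else raise an error."""
    if (current, new) not in TRANSITIONS:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

The important point is that every transition has a defined trigger; an operating system never moves a process between states arbitrarily.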


5.1.1 PROCESS MANAGEMENT TOOLS

Running a single process on a single processor is simple (e.g. on a standard personal computer), but if several processes have to run on a single processor, or are even distributed across several processors, the definitions above become critically important. In this case the operating system needs to monitor free resources and open tasks so that it can allocate CPU time to a task while another process is waiting for data and not actively using processor power. The tasks of the usual administrative tools which carry out this work are illustrated in Fig. 5-2.

Fig. 5-2 Process management tools


5.1.2 STRUCTURE OF MULTI-TASKING OPERATING SYSTEMS

A multi-tasking operating system constantly allocates upcoming tasks to the CPU (or to one core of the CPU) to create the impression of parallel processing of several tasks. In reality, multitasking does not mean parallel execution, but depending on the strategy the overall efficiency of processing can be increased. These strategies almost always operate on the same principle: processes are split into segments, and each segment is given a specific amount of time on the CPU (cf. Fig. 5-3). How much time a process is given, and when, depends on the implemented algorithm. Multitasking can be described as time multiplexing, since a single CPU (or core) is occupied with several tasks and allocated to each of them within a certain time frame.

Fig. 5-3 Structure of a multitasking system

In a multi-user system the OS has to monitor how many users are currently demanding the resources of the system in order to allocate them efficiently. As new users log into the operating system, access rights are checked (log-in) and fixed contingents of computing time, storage space and further resources are assigned. Furthermore, an account is opened by the operating system to keep track of the computing time used and possibly of the executed functions. Accounting allows the control of all computer operations during use as well as afterwards; in addition, an actual bill for the computing services used can be produced.

For every user, his/her tasks are set up in a dynamic administrative chart. Apart from that, there are a number of general administrative tasks, e.g. log-in and accounting, as mentioned above, as well as the usual terminal I/O functions, printing, etc. The scheduler assigns the processor to work on the different tasks in a time-sharing operation mode.

Two questions can arise from that structure:

Problem 1: How to change between processes

Problem 2: How to optimize the efficiency of CPU use


5.1.2.1 The time-sharing concept (Solution for Problem 1: How to change between processes)

Fig. 5-4 shows the model of the time-sharing concept displayed as a time wheel. It is important to note that one turn of the time wheel takes not only the combined time slices of all processes, but additionally the time needed to change between tasks multiplied by the number of processes.

Fig. 5-4 Time-wheel model

Assume T = 100 ms are available for the pseudo-parallel execution of multiple tasks on a single processor. Tasks P1, P2, P3 occur only a few times over hours (log-in) or seldom (print). Tasks P4–P8 are user tasks; they require as much time as possible.

A scheduler divides the CPU time and allocates it to a task, which runs with its context of (e.g.) registers, program counter and stack. After the processing time of a task is over, the scheduler changes the context and, as a result, moves on to the next task (e.g. accounting). This sequential allotment of time slices and context changes continues until the time slice of the last task has been completed at time T, and a new turn of the time wheel starts.
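The length of one turn of the time wheel can thus be estimated as the sum of all time slices plus one context change per task. A small sketch — the slice values and switch cost below are invented for illustration, not taken from the lecture:

```python
def time_wheel_period(slices_ms, switch_ms):
    """Total duration of one turn of the time wheel: the time slices of
    all tasks plus one context change per task."""
    return sum(slices_ms) + len(slices_ms) * switch_ms

# Eight tasks P1..P8 sharing T = 100 ms (values assumed for illustration):
slices = [2, 2, 2, 18, 18, 18, 18, 18]   # ms; P1-P3 seldom, P4-P8 user tasks
print(time_wheel_period(slices, switch_ms=0.5))  # -> 100.0
```

The formula makes visible why a fast context change matters: the switch cost is paid once per task on every turn of the wheel.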

To change from one task to another, the context has to be changed too. Context changes reload the process-specific registers of the processor, i.e. they switch from one virtual processor to the next. At a context change, status has to be saved and reloaded; since a von Neumann machine does not differentiate between data and instructions, this makes context changes slow. A solution is the separation of data and instructions.

When a process/task changes, the stack pointer, registers, etc. must be changed according to the next process. It is necessary to ensure that the context change is done in a very short amount of time.


There are two methods to do so:

The complete set of user data or process data is stored in a private memory space.

Context change by pointers only:

- PC holds the program state.

- Pointers point to the ceiling/start of the memory slot. That is why a context change can occur in a few µs (another reason why increasing memory can increase performance).

- Disadvantage: the entire task must be kept within the memory.

5.1.2.2 Scheduling algorithms (Solution for Problem 2: Increase efficiency of CPU use)

The second main task of a multi-tasking operating system is the synchronisation of running programs with technical processes. The interlocking of data-processing tasks with a measured value or a user input, for example, can be realized by:

a) cyclic flow control (polling).

Processes watch inputs or other processes until either the data is available or an input occurs. The process then starts other processes or releases the CPU. Drawback: the CPU is always busy with polling processes, yet often idle while waiting for I/O operations.

Fig. 5-5 Flowchart of cyclic flow control
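Cyclic flow control can be sketched as a busy-wait loop. The toy device below is entirely hypothetical; the counter makes the drawback visible by counting the CPU checks wasted on polling alone.

```python
def poll(ready, read_input, max_iterations=1_000_000):
    """Busy-wait until ready() reports data, then read it.
    The CPU is occupied with checking even while nothing arrives."""
    wasted_checks = 0
    for _ in range(max_iterations):
        if ready():
            return read_input(), wasted_checks
        wasted_checks += 1          # CPU cycle spent only on polling
    raise TimeoutError("no input arrived")

# Toy device that becomes ready after three unsuccessful checks (assumed):
state = {"checks": 0}
def ready():
    state["checks"] += 1
    return state["checks"] > 3

value, wasted = poll(ready, lambda: 42)
```

Every increment of `wasted_checks` is processor time that an interrupt-driven design (method c below) would have left free for other processes.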


b) time-controlled.

Processes get a certain (fixed) amount of time for operation. After the allocated time expires, the next process in line gets the same amount of time. Drawback: how to handle idle time? If the time intervals are too short, the amount of data swapping increases before a process finishes, which decreases overall efficiency. If the time intervals are too long, a finished process leaves the CPU idle.

Fig. 5-6 Flowchart of time controlled processing


c) by request (interrupt).

Processes run under an underlying algorithm until an interrupt occurs. The interrupt stops the current process and the interrupt routine is executed.

Fig. 5-7 Flowchart of an interrupt based algorithm

In a wait-active model, processes stay idle and cannot start until an interrupt occurs and the appropriate process is started by the dispatcher.

Fig. 5-8 Flowchart of a request based model

Process requests (triggered by the technical process):

• can be announced at any time

• have high priority (importance) by

- priority in execution and/or

- blocking other requests

• also lead to context change


An example of this is shown in Fig. 5-9. A background process (this can also be the time-sharing operation) is interrupted by two interrupts.

Fig. 5-9 Interrupt Control

Here the priority of the background process is the lowest, at 1, i.e. every interrupt is able to interrupt this process, as can be seen with I1. The priority of IRS 1 is still relatively low at 2, so that it can in turn be interrupted by I2 with its higher-priority IRS 2. In this example IRS 2 completes its task without further interruption, then IRS 1 completes its remaining work, and finally the background process is resumed.

As mentioned above, time-sharing operations and interrupt control usually overlap.

A process computer allows the execution of several data manipulation processes (quasi-parallelism) as well as real-time measuring, controlling or regulation of technical processes.

From now on the terms process (in the sense of data processing) and task are used synonymously; the former originates from data processing and computer science, the latter from the user's point of view. As far as process computers are concerned, the terms are identical. However, the terms task and user have to be clearly distinguished. At least one task (or the solution of one) is assigned to every user, but a user can request several tasks without any problem, and likewise several tasks can be assigned to no user at all, since they simply co-operate with a technical process. Thus a process computer is fundamentally a multi-tasking computer which can be used as a single-user as well as a multi-user computer.

For the sake of simplicity the following explanation of administrative mechanisms will begin with the structure of a multi-tasking administration, then moving on to the co-operation between computer and technical processes. In practice both mechanisms are intertwined.


5.2 PROCESS CHANGE

If only one processor or core is available and the operating system is multitasking, the CPU has to alternate between processes. Fig. 5-10 shows the time consumed within the CPU by each activity in a time-sharing situation. The time taken to switch can range from microseconds down to nanoseconds. The address range of the old process may also have to be saved and updated by complex memory management.

Fig. 5-10 Switching between processes by the operating system (OS)

The operating system maintains several queues: one for jobs that are waiting to join the distribution process, one for jobs waiting for their next allocation of CPU time, and others for the devices: I/O, printer, storage, etc. Fig. 5-11 shows an example:

Fig. 5-11: Queues in the operating system

If a process switch is triggered by (e.g.) a timer, the context switch is carried out by a program in the operating system, the so-called dispatcher. Fig. 5-12 shows a possible procedure. The new context is loaded from the corresponding stack region. The stack pointer (SP) points to the current position of an operation and is memorized should a context change occur.


Fig. 5-12: Context save for process change

The command sequence performed by the dispatcher is shown in Fig. 5-13. The RET instruction causes CS:IP (on the 80x86) to be loaded with the return address from the stack.

Fig. 5-13: New context update



The situation in the queue is shown in Fig. 5-14: a) before, b) after the process change.

Fig. 5-14: The queue of “ready” processes a) before, b) after the process change

In this example the position in the queue determines the priority; this is recorded in the PCB. This means that there must be a (linked) list of PCBs in the operating system. The register contents are saved by the dispatcher in each PCB, whose address is passed to the dispatcher when the scheduler is called. The SP can then be used again for the rapid exchange.


5.3 SCHEDULING

When a computer is designed for multiprogramming, it frequently has multiple processes or threads that need the CPU at the same time. In the scheduling process the operating system decides the next steps for upcoming or already queued processes and which one will get the hardware next. In terms of planning we distinguish between:

a) Long-term planning (in Batch Systems, job scheduling)

Long-term planning organizes the multi-program behaviour; it occurs when a process comes to an end and is then initialized to schedule the next line of jobs.

b) Short-term planning

Short-term planning is active for roughly 100 µs and should take the shortest possible time.

c) Dispatcher

The dispatcher performs the context switch, which is an immediate action taking about 1 µs down to 1 ns. If the context switch exceeds a fixed number of clock cycles, the system stores parts of the job in a higher memory level, resulting in a delay in the workflow.

A program alternates between CPU cycles, which occur in bursts, and individual I/O instructions. The dispatcher itself should require as little CPU activity as possible.

5.3.1 SCHEDULER

Depending on certain events the scheduler selects the next process from the queue of ready processes in the operating system. This is done by the short-term scheduler. The data in the queue generally consists of the PCBs of the processes.

Scheduling decisions can be made:

- In the transition of a process from the active to the waiting state (I/O),

- During the transition of a process from active to ready state (Interrupt),

- During the transition of a process from the waiting to the ready state (I/O end),

- Upon termination of a process.

If a process has to wait for an I/O input that would take several cycles, overall efficiency improves if the process is temporarily removed from the CPU and another process is assigned. If the scheduler already knows that a process will need another I/O input, it can plan ahead. Usually the scheduler cannot know which actions a process will take until it runs, but the process itself can change variables while running, such as its own priority. With this in mind, two kinds of scheduling algorithms can be analyzed: preemptive and non-preemptive.

In non-preemptive scheduling a process is picked and started, and it runs until it blocks (either on I/O or in a waiting state [example: MS-Windows]) or until it voluntarily releases the CPU. The process is not forcibly suspended, even when a clock interrupt occurs, i.e. when the time the process was given has run out: during the clock interrupt no scheduling decisions are made. Only if there is now a process with a higher priority in the queue will the old process be replaced by the new one.

In preemptive scheduling, once a process is started it runs for at most a fixed amount of time. If it is still running at the end of that time, it is suspended and the scheduler picks another process. Preemptive scheduling requires a clock interrupt at the end of every time interval to give control of the CPU back to the scheduler.

There are associated problems, which require additional effort:

• Data consistency: two processes use the same data set

• Calling the scheduler while the operating system is working on a system call: the scheduler activity would eventually replace the process that issued the invocation. Some operating systems, e.g. many versions of UNIX, first finish the system call or the I/O block before running the context switch; real-time processing is then not possible.

Planning Criteria

The scheduler needs to be planned with respect to the following criteria:

• CPU utilization (40 %–90 %)

• Throughput (completed processes / time unit)

• Cycle time (time from input to output of the results)

• Waiting time (time spent in the CPU queue)

• Response time (In interactive systems: Time to reaction)
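Several of these criteria can be computed from arrival and burst times. The sketch below does so for a first-come, first-served single-CPU run; it is a simplified model of our own (it assumes the lists are already in arrival order and ignores context-switch cost).

```python
def fcfs_metrics(arrivals_ms, bursts_ms):
    """Waiting time, cycle (turnaround) time and throughput for processes
    served first-come, first-served on one CPU."""
    clock = 0
    waiting, cycle = [], []
    for arrival, burst in zip(arrivals_ms, bursts_ms):
        start = max(clock, arrival)        # CPU may be busy or idle
        waiting.append(start - arrival)    # time spent in the ready queue
        clock = start + burst
        cycle.append(clock - arrival)      # input to output of the results
    throughput = len(bursts_ms) / clock    # completed processes per ms
    return waiting, cycle, throughput

waiting, cycle, throughput = fcfs_metrics([0, 0, 0], [300, 40, 32])
print(waiting)   # -> [0, 300, 340]
```

Changing the service order changes waiting and cycle times but not CPU utilization, which is why the criteria have to be weighed against each other.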

5.3.2 SCHEDULING ALGORITHMS

A scheduling algorithm is a set of rules that determines which process is to be run at a particular time or in a particular period of time. If there are several processors, the scheduling algorithm also determines the distribution across the processors.

5.3.2.1 Requirements to a Scheduling Algorithm

In order to fulfill the planning criteria shown in 5.3.1, to a full or at least partial extent, some requirements must be met.

• Fairness: each process needs a fair share of the CPU. This is important because comparable processes should get comparable service. Of course, different categories of processes may be treated differently, but in the end importance and time consumption should always be respected.

• Good utilization of resources, keep all parts of the system busy.

• The algorithm must be executed efficiently.


5.3.2.2 Classification of scheduling algorithms

Several different algorithms exist to manage scheduling. To classify them, the following types are distinguished:

1. Static (a priori) Scheduling Algorithms

The scheduling decisions of static algorithms take place at fixed time intervals; the schedule is planned before the programs run. The input of the scheduling algorithm is a set of processes to be considered for scheduling; all processes that arrive at the scheduler during the runtime of a process are stored for the next cycle. The scheduled processes run until all of them are done and/or until the given time is over.

Conditions for static algorithms: no dynamic process creation during the program run. Event-driven processes can be incorporated only if their time conditions are schedulable.

2. Dynamic Scheduling Algorithms

The coordination process takes place while the processes are running. Time points of the update are:

o at fixed time intervals,

o as soon as a new process is created,

o as soon as a process ends.

Advantages and disadvantages of dynamic scheduling: event-driven processes can be coordinated, but the decision process during a scheduler operation costs time. Due to the limited efficiency of the algorithms, heuristics are required.

3. Algorithms for multiple Processors

In contrast to scheduling for a single processor, for multiple processors there are in general only NP-complete formulations, so one must work with heuristics.

4. Algorithms for complex Process models

Complex process models are understood as processes with one or more of the following properties:

o Interruptible or non-interruptible (preemptive, non-preemptive) processes, with the further distinction between processes interruptible at any point or only at certain points

o Cyclic, non-cyclic processes

o Switching to another processor


5.3.3 ANALYSIS OF SCHEDULING ALGORITHMS

This chapter looks at several scheduling algorithms and describes how they work and what advantages and disadvantages they have. For most algorithms it is difficult to perform optimally due to:

- Complexity

- The execution time of a task might be unknown in advance

To visualize the function of an algorithm several types of diagrams will be introduced to give a clear picture of the working order and time consumption of each process.

5.3.3.1 Gantt-Diagram

The Gantt diagram shows the task order of a CPU depending on the time a process arrives at the scheduler. It is assumed that each process needs a given amount of time that is either known or can be estimated; in reality these times are not needed by every algorithm. If more than one processor is available, the y-axis specifies the processor used for the allocated tasks.

Fig. 5-15: Gantt-Diagram of a 2 processor system with 5 tasks

Example:

Two processors I and II are available and five processes have to be handled. The scheduling algorithm specifies the task sequence depending on the priority of a task. A running process is interrupted if another process with a higher priority arrives; however, if a CPU is unoccupied when a higher-priority task needs to be scheduled, running tasks are not interrupted, and the free processor is assigned to that task instead. The priorities for the example tasks P1–P5 are:

P3 > P2

P4, P5 > P1

P1 > P2

Process arrivals and executions

- P1 and P2 arrive at the same time (t=0), P3 at t=1, P4 at t=5, P5 at t=6



Priority of P1 is higher, but both Processors are available => Assign P1 -> Processor I

=> Assign P2 -> Processor II

- after that: P3, then P4 arrives

Priority P1 > P2

Priority P3 > P2

P2 will be interrupted by P3 (on Processor II)

P2 continues (on Processor I) after P1 finished (P4 not yet arrived)

By arrival of P4

Priority P4 > P1 > P2

Priority P3 > P2

P2 will again be interrupted, now by P4 (on Processor I)

- While P3, P4 are running; arrival of P5

P4, P5 > P1

P3 > P2

P1 > P2

P3 will be interrupted by P5 (on Processor I)

After P4 finished, P3 will be continued on Processor I

5.3.3.2 Timing Diagram

A timing diagram shows the sequence of processes for (mostly) a single processor. The y-axis is divided according to the number of tasks currently running on the CPU, while the x-axis shows the time frame. In the single-CPU case only one task can be executed at any given time (a special case of the Gantt chart). For each additional CPU the number of tasks running simultaneously increases by one.

Fig. 5-16: Timing-Diagram of a 1 processor system with 3 tasks


5.3.3.3 Example of Planning Algorithms

The following is a list of several commonly known and used scheduling algorithms.

First-come, first-served (FCFS); FIFO queue

The FCFS algorithm does exactly what its name describes: the first process that arrives at the scheduler is the next in line; there are no interrupts.

Process P1: τ1 = 300 ms Waiting period 0 ms

P2: τ2 = 40 ms Waiting period 300 ms

P3: τ3 = 32 ms Waiting period 340 ms

Gantt Chart:

Fig. 5-17: Gantt-Diagram of a 1 processor system with 3 tasks

The average wait time is 640/3 ≈ 213 ms.

The average wait time is minimal for the order P3, P2, P1: (0 + 32 + 72) / 3 ≈ 35 ms.
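Both averages are easy to verify. The helper below is our own, not part of the lecture; it accumulates each process's wait as the sum of its predecessors' burst times.

```python
def average_wait(bursts_ms):
    """Average waiting time under FCFS for the given service order."""
    clock, total = 0, 0
    for burst in bursts_ms:
        total += clock        # each process waits for all predecessors
        clock += burst
    return total / len(bursts_ms)

print(round(average_wait([300, 40, 32])))   # order P1, P2, P3 -> 213
print(round(average_wait([32, 40, 300])))   # order P3, P2, P1 -> 35
```

Serving the shortest bursts first minimizes the average wait, which is precisely the idea behind the SJF algorithm below.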

Shortest-Job-first (SJF)

Another example of a self-describing name: when the scheduler has to decide which process runs next, it chooses the job that needs the least amount of time. This provides the optimum in terms of average waiting time. For short-term planning the method is not applicable, since the computation time of the next CPU burst is not known. The method is also problematic should a more critical process with a long runtime arrive: it will be stalled until the probably less important but faster processes have been computed.

Assuming that the time required for the next burst is known, the example shown in Fig. 5-18 can be constructed (times in ms).


Fig. 5-18: Gantt-Diagram of a 1 processor system with 4 tasks

Priority Scheduling

Priority scheduling chooses the process with the highest priority as the next in line. Should another process with a higher priority arrive, an interrupt is sent and the active task is replaced by the new one.

The problem is that low-priority processes can wait a very long time if high-priority, long-running processes occupy the CPU. One solution is to gradually increase the priority of waiting tasks.

Round-robin Scheduler (RR)

A round-robin algorithm switches tasks after a fixed interval of time. New processes are added to this list and completed ones are removed. This guarantees that each process receives a fair amount of time on the CPU.

The disadvantage of this method is that with many active tasks a lot of time is wasted on process changes. Also, the completion of a long-running process is delayed further as the number of active tasks increases.
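A minimal round-robin simulation can look like this (a sketch of our own; it ignores the context-switch cost, which is exactly the cost the paragraph above warns about):

```python
from collections import deque

def round_robin(bursts_ms, quantum_ms):
    """Simulate round-robin scheduling on one CPU: each process runs for
    at most one time quantum, then rejoins the end of the queue if it is
    not finished. Returns the completion time of each process."""
    remaining = list(bursts_ms)
    queue = deque(range(len(bursts_ms)))   # all processes ready at t = 0
    finish = [0] * len(bursts_ms)
    clock = 0
    while queue:
        pid = queue.popleft()
        run = min(quantum_ms, remaining[pid])
        clock += run
        remaining[pid] -= run
        if remaining[pid] > 0:
            queue.append(pid)              # another turn on the time wheel
        else:
            finish[pid] = clock
    return finish

print(round_robin([6, 3, 1], 2))  # completion times -> [10, 8, 5]
```

Note how the long process (6 ms burst) finishes last at t = 10, later than under FCFS: each additional active task pushes its completion further out.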

Combination: several allocation levels

A combination could be time sharing as a background job (RR) plus interrupts on demand based on priorities.

Processes differ according to priorities; within a priority level, allocation is made according to RR.

After an appropriate waiting time, processes can move up in priority. Example: aging.
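Aging can be sketched as a priority bonus that grows with waiting time. This is a toy model of our own; the linear bonus scheme is an assumption for illustration, not taken from the lecture.

```python
def pick_with_aging(tasks, age_bonus=1):
    """Priority scheduling with aging: every waiting task gains priority
    each turn, so low-priority tasks cannot starve forever.
    tasks: list of dicts with 'name', 'priority', 'waiting_turns'."""
    for t in tasks:
        t["effective"] = t["priority"] + age_bonus * t["waiting_turns"]
    chosen = max(tasks, key=lambda t: t["effective"])
    for t in tasks:                  # everyone not chosen waits one more turn
        if t is not chosen:
            t["waiting_turns"] += 1
        else:
            t["waiting_turns"] = 0
    return chosen["name"]

tasks = [{"name": "A", "priority": 5, "waiting_turns": 0},
         {"name": "B", "priority": 1, "waiting_turns": 6}]
print(pick_with_aging(tasks))   # -> B
```

Here the low-priority task B wins because its long wait has raised its effective priority above A's, which is exactly the starvation fix described above.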
