
Page 1: Final jaypaper linux

Shriji JaySwaminarayan Bapa

2010-2011

Submitted By:

Jay J Bhatt

Vishwanath Kamble

7th SEM ISE

Srinivas Institute of Technology


I. Abstract

The purpose of this paper is to understand how to develop a project that shares the load on one system with other systems by process sharing on the Linux platform. If one system is executing some task and the load on that system becomes too high, load sharing lets us distribute that load across systems.

II. History

The days of supercomputers and mainframes dominating computing are over. With the cost benefits of the mass production of smaller machines, the majority of today's computing power exists in the form of PCs or workstations, with more powerful machines performing more specialized tasks. Even with increased computer power and availability, some tasks require more resources. Load balancing and process migration allocate processes to interconnected workstations on a network to better take advantage of available resources. This paper presents a survey of process migration mechanisms. There have been other survey papers on process migration, but this paper is more current and includes one new algorithm, the Freeze Free algorithm. Motivations for load balancing and process migration are presented in Section 1. Section 2 gives a brief overview of load balancing, and Section 3 describes process migration; the differences between process migration and load balancing are discussed, and several of the current process migration implementations and algorithms are examined. Section 4 presents future work, possible improvements, and extended applications of process migration. Finally, concluding remarks are given.

III. Introduction

Process migration is the act of transferring an active process between two machines and restoring the process from the point it left off on the selected destination node. The purpose of this is to provide an enhanced degree of dynamic load distribution, fault resilience, eased system administration, and data access locality. The potential of process migration is great, especially in a large network of personal workstations. In a typical working environment, there generally exists a large pool of idle workstations whose power goes unharnessed if the potential of process migration is not tapped. Providing a reliable and efficient migration module that helps handle inconsistencies in load distribution and allows effective usage of the system's resource potential is therefore highly warranted. As high-performance facilities shift from supercomputers to networks of workstations, and with the ever-increasing role of the World Wide Web, we expect migration to play a more important role and eventually to be widely adopted. This paper reviews the field of process migration by summarizing the key concepts, giving an overview of the most important implementations, and providing a unique alternative implementation of our own. Design and implementation issues of process migration are analyzed in general, and these are used as pointers in describing our implementation.

Figure 1: Process Migration and Mobility

Process migration, which is the main theme of this paper, primarily involves transferring both code and data across nodes. It also involves transfer of authorisation information, for instance access to a shared resource, but the scope is limited to being under the control of a single administration. Finally, mobile agents involve transferring code, data, and complete authorisation information, to act on behalf of the owner on a wide scale, such as the world-wide Internet.

IV. Characteristics

1) Harnessing resource locality: Processes running on a node distant from the node that houses the data they are using tend to spend most of their time communicating between the nodes to access that data. Process migration can move a distant process closer to the data it is processing, ensuring it spends most of its time doing useful work.

2) Resource sharing: Nodes that have a large amount of resources can act as receiver nodes in a process migration environment.

3) Effective load balancing: Migration is particularly important in receiver-initiated distributed systems, where a lightly loaded node announces its availability, enabling the arbitrator to offer that node's processing power to another node that is relatively heavily loaded.

4) Fault tolerance: This aspect of a system is improved by migrating a process away from a partially failed node, or, in the case of long-running processes, when different kinds of failures are probable. In conjunction with checkpointing, this goal can be achieved.

5) Eased system administration: When a node is about to be shut down, the system can migrate the processes running on it to another node, enabling each process to run to completion, either on the destination node or back on the source node by migrating it again.

6) Mobile computing: Users may decide to migrate a running process from their workstations to their mobile computers, or vice versa, to exploit the large amount of resources that a workstation can provide.

V. Motivations

Load balancing and process migration are used in a network of workstations (NOW) to share the load of heavily loaded processors with more lightly loaded processors. The major reason for load balancing is that a lot of distributed processing power goes unused. Instead of overloading a few machines with many tasks, it makes sense to allocate some of these tasks to under-used machines to increase performance. This allows tasks to be completed quickly, and possibly in parallel, without affecting the performance of other users on other machines. There are also more subtle motivations for performing process migration. By moving a process, it may be possible to improve communication performance. For example, if a process is accessing a huge remote data source, it may be more efficient to move the process to the data source instead of moving the data to the process. This results in less communication and delay and makes more efficient use of network resources. Process migration could also be used to move communicating processes closer together. Other motivations for process migration include providing higher availability, simplifying reconfiguration, and utilizing special machine capabilities. Higher availability can be achieved by migrating a process, presumably a long-running process, from a machine that is prone to failure or is about to be disconnected from the network.


Thus, the process will continue to be available even after machine failure or disconnection. This availability is desirable in reconfiguration, where certain processes are required to be continually operational (e.g. name servers). Migration allows the process to be available almost continually, except for migration time, as opposed to restarting the process after the service is moved to another machine. Finally, certain machines may have better capabilities (e.g. fast floating-point processing) that can be utilized by moving the process to that machine. In this case, the process can be started on a regular machine and then migrated to a more powerful machine when one becomes available. Process migration may also find application in mobile computing and wide-area computing. In mobile computing, it may be desirable to migrate a process to and from the base station supporting the mobile unit. In this way, the process can optimize its performance based on network conditions and the relative processing power of the mobile unit and base station. Also, as a mobile unit moves, a process associated with it running on a base station can migrate with the mobile unit to its new base station. This avoids communication between the base stations to support the mobile unit's process. Process migration also has applications in wide-area computing, as processes can be migrated to the location of the data they use. This is important because, despite increasing network bandwidth, network delays are not being reduced significantly. Thus, it may be beneficial to move a process to the location of the data to improve performance. In total, there are many motivations for load balancing and process migration. The goal of these techniques is to improve performance by distributing tasks to utilize idle processing power. Process migration can also reduce network transmission and provide high availability. The following sections examine how load balancing and process migration achieve these goals.

VI. Load Balancing

Load balancing is the static allocation of tasks to processors to increase throughput and overall processor utilization. It can be done among individual processors in a parallel machine or among interconnected workstations in a network. In either case, there is an implicit assumption of cooperation between the processors: a processor must be willing to execute a process for any other. The key word in this definition is static. The most fundamental issue in load balancing is determining a measure of load. There are many possible measures, including the number of tasks in the run queue, load average, CPU utilization, I/O utilization, and the amount of free CPU time or memory, or any combination of these indicators. Other factors to consider when deciding where to place a process include nearness to resources, processor and operating system capabilities, and specialized hardware/software features. The "best" load measure as determined by empirical studies is the number of tasks in the run queue. This statistic is readily available using current operating system calls, and it is less time-sensitive than many other measures. Even though the number of tasks in the system is a dynamic measure, it changes more slowly than, for example, CPU or I/O utilization, which may vary within a single task's execution. The time-sensitivity of an indicator is very important because load balancing is performed with stale information: each site makes load balancing decisions based on its view of the loads at other sites. There are numerous proposed load balancing algorithms; some are topology specific, very theoretical (and hence not practical), or overly simplistic. Load balancing ensures that print jobs are always output as quickly as possible. It does this by evenly distributing the workload so that, for example, if one workflow is busy processing a large-volume print job or if one server encounters a problem, the print job will automatically be diverted to another workflow. Once a print job starts being processed, it is deleted from the hot folder, ensuring that it is not processed by more than one workflow.

VII. Process Migration

Process migration is the movement of a currently executing process to a new processor. It allows the system to take advantage of changes in process execution or network utilization during the execution of a process. A combination of load balancing and process migration is the most beneficial: by providing a process migration facility, a load balancing algorithm has flexibility in its initial allocation, since a process can be moved at any time. In addition, process migration must handle movement of process state, communication migration, and process restart given state information. Process migration must also be performed quickly to realize an increase in performance. The process migration latency is the time from issuing the migration request until the process can execute on the new machine. Excision is the process of extracting process information from the operating system kernel, and insertion is the process of inserting process information into the operating system kernel. Process migration algorithms must also handle process immobility: a process is considered immobile if it should not be moved from its current location, such as a process that interacts heavily with I/O devices.

VIII. Components

1) Central Server: The centralized server locates the appropriate machine when an overload signal is received. Every machine on the network communicates with this module, first to establish its identity and then to report its load status.

2) Load Balancer: The Load Balancer keeps track of the relative loads of the machines on the network and chooses the appropriate machine when an overload signal is received.

3) Checkpointer: The Checkpointer sends a signal to the process chosen by the Load Balancer for pre-emption, making it dump a core file. Thereafter, this module traverses the /proc directory to locate the attributes of the process selected for preemption. It then creates a file named filedescriptors, which stores details about all the open files used by the process. These are used as checkpointing information.

4) Restarter: The Restarter uses the Linux ptrace system call to get and set the values of the process's attributes.

5) File Transferrer: The File Transferrer establishes a UDP connection with the main server and transfers the checkpointing files to that server when the overload message is sent. The following files are sent by this module: the core dump and the filedescriptors file.

6) Load Calculator: The Load Calculator is a daemon that runs on every machine on the network and calculates the load at regular intervals. The calculation formula is a parametric equation based on various parameters: the total load on the system is calculated as the sum of the loads of the individual processes, with memory and CPU usage the most significant parameters. The /proc file system has one directory per process, and these directories contain the relevant memory and CPU usage information for every process in the system. When the load exceeds a certain threshold value, this module sends an overload signal to the main server, which in turn begins the process of preemption, checkpointing, and scheduling.


Fig: Process Migration and its modules

IX. Homogeneous Migration

Homogeneous process migration involves migrating processes in a homogeneous environment, where all systems have the same architecture and operating system but not necessarily the same resources or capabilities. Process migration can be performed either at the user level or at the kernel level.

1) User-level process migration: User-level techniques support process migration without changing the operating system kernel. User-level implementations are easier to develop and maintain but have two common problems: they cannot access kernel state, which means they cannot migrate all processes, and they must cross the kernel/application boundary using kernel requests, which are slow and costly.

2) Kernel-level process migration: Kernel-level techniques modify the operating system kernel to make process migration easier and more efficient. Kernel modifications allow the migration to be done more quickly and allow more types of processes to be migrated. The Linux kernel is a monolithic kernel, i.e. it is one single large program in which all the functional components of the kernel have access to all of its internal data structures and routines. The alternative is the microkernel structure, where the functional pieces of the kernel are broken out into units with strict communication mechanisms between them. In a monolithic kernel, adding new components via the configuration process is rather time consuming. The best and most robust alternative is the ability to dynamically load and unload components of the operating system using Linux kernel modules. Kernel modules are pieces of code that can be dynamically linked into the kernel as needed, even after system boot-up, and unlinked and removed when they are no longer needed. Most Linux kernel modules are device drivers or pseudo-device drivers, such as network drivers or file systems. When a kernel module is loaded, it becomes part of the Linux kernel like normal kernel code, and it possesses the same rights and responsibilities as kernel code. A kernel module is not an independent executable but an object file that is linked into the kernel at runtime; as a result, modules should be compiled with the -c flag.

Loading a module into the kernel involves four major tasks:

1) Preparing the module in user space. Read the object file from disk and resolve all undefined symbols.

2) Allocating memory in the kernel address space to hold the module text, data, and other relevant information.

3) Copying the module code into this newly allocated space and providing the kernel with any information necessary to maintain the module (such as the new module's own symbol table).

4) Executing the module initialization routine, init_module() (now in the kernel).

The Linux module loader, insmod, is responsible for reading an object file to be used as a module, obtaining the list of kernel symbols, resolving all references in the module with these symbols, and sending the module to the kernel for execution.

While insmod has many features, it is most commonly invoked as insmod module.o, where module.o is an object file corresponding to a module.

X. ELF Files

ELF (Executable and Linking Format) is a file format that defines how an object file is composed and organized. With this information, the kernel and the binary loader know how to load the file, where to look for the code, where to look for the initialized data, which shared libraries need to be loaded, and so on.

An ELF file has two views: the program header shows the segments used at run-time, whereas the section header lists the set of sections of the binary. Each ELF file is made up of one ELF header, followed by file data. The file data can include:

Program header table, describing zero or more segments

Section header table, describing zero or more sections

Data referred to by entries in the program header table or section header table

The segments contain information that is necessary for runtime execution of the file, while sections contain important data for linking and relocation. Each byte in the entire file is owned by no more than one section, but there can be orphan bytes that are not covered by any section. In the normal case of a Unix executable, one or more sections are enclosed in one segment. Some object file control structures can grow, because the ELF header contains their actual sizes; if the object file format changes, a program may encounter control structures that are larger or smaller than expected. The ELF header is defined as follows:

#define EI_NIDENT 16

typedef struct {
    unsigned char e_ident[EI_NIDENT];
    Elf32_Half    e_type;
    Elf32_Half    e_machine;
    Elf32_Word    e_version;
    Elf32_Addr    e_entry;
    Elf32_Off     e_phoff;
    Elf32_Off     e_shoff;
    Elf32_Word    e_flags;
    Elf32_Half    e_ehsize;
    Elf32_Half    e_phentsize;
    Elf32_Half    e_phnum;
    Elf32_Half    e_shentsize;
    Elf32_Half    e_shnum;
    Elf32_Half    e_shstrndx;
} Elf32_Ehdr;

e_ident: The initial bytes mark the file as an object file and provide machine-independent data with which to decode and interpret the file's contents.

e_type: This member identifies the object file type.

Name        Value   Meaning
ET_NONE     0       No file type
ET_REL      1       Relocatable file
ET_EXEC     2       Executable file
ET_DYN      3       Shared object file
ET_CORE     4       Core file
ET_LOPROC   0xff00  Processor-specific
ET_HIPROC   0xffff  Processor-specific

e_machine: This member's value specifies the required architecture for an individual file.

e_version: This member identifies the object file version.

Name        Value   Meaning
EV_NONE     0       Invalid version
EV_CURRENT  1       Current version

e_entry: This member gives the virtual address to which the system first transfers control, thus starting the process.

e_phoff: This member holds the program header table's file offset in bytes. If the file has no program header table, this member holds zero.

e_shoff: This member holds the section header table's file offset in bytes. If the file has no section header table, this member holds zero.

e_flags: This member holds processor-specific flags associated with the file. Flag names take the form EF_machine_flag.

e_ehsize: This member holds the ELF header's size in bytes.

e_phentsize: This member holds the size in bytes of one entry in the file's program header table; all entries are the same size.

e_phnum: This member holds the number of entries in the program header table.

e_shentsize: This member holds a section header's size in bytes. A section header is one entry in the section header table; all entries are the same size.

e_shnum: This member holds the number of entries in the section header table.

e_shstrndx: This member holds the section header table index of the entry associated with the section name string table.

XI. Freeze Free Algorithm

The algorithm is as follows:

1) The old host suspends the migrating process.
2) The old host marshalls the execution state and process control state and transmits them to the new host.
3) The new host initializes an empty process object and responds with an accept or reject.
4) Simultaneously, right after its initial transmission, the old host determines the current code, stack, and heap pages and transmits them.
5) When the new host receives the first code, stack, and (optionally) heap page, it resumes the migrating process.
6) Concurrently, the old host ships the remaining stack pages, followed by communication links and file descriptor information.
7) The old host then flushes dirty heap and file cache pages. (This can be done in parallel with the transfers to the new host if there is a separate communication link.)
8) After all data is transferred off the old host, it sends a flush-complete message to the new host.

XII. Migration Algorithm

1. A migration request is issued to a remote node. After negotiation, migration is accepted.

2. The process is detached from its source node by suspending its execution, declaring it to be in a migrating state, and temporarily redirecting communication as described in the following step.

3. Communication is temporarily redirected by queuing up arriving messages directed to the migrated process and delivering them to the process after migration. This step continues in parallel with steps 4, 5, and 6 as long as there are additional incoming messages. Once the communication channels are enabled after migration (as a result of step 7), the migrated process is known to the external world.

4. The process state is extracted, including memory contents; processor state (register contents); communication state (e.g., opened files and message channels); and relevant kernel context. The communication state and kernel context are OS-dependent, and some of the local OS internal state is not transferable. The process state is typically retained on the source node until the end of migration, and in some systems it remains there even after migration completes. Processor dependencies, such as register and stack contents, have to be eliminated in the case of heterogeneous migration.

5. A destination process instance is created, into which the transferred state will be imported. The destination instance is not activated until a sufficient amount of state has been transferred from the source instance; after that, it is promoted into a regular process.

6. State is transferred and imported into the new instance on the remote node. Not all of the state needs to be transferred; some of it can be lazily brought over after migration is completed.

7. Some means of forwarding references to the migrated process must be maintained, in order to communicate with the process or to control it. This can be achieved by registering the current location at the home node (e.g. in Sprite), by searching for the migrated process (e.g. in the V Kernel, at the communication protocol level), or by forwarding messages across all visited nodes (e.g. in Charlotte). This step also enables the migrated communication channels at the destination, and it ends step 3, as communication is permanently redirected.

8. The new instance is resumed when sufficient state has been transferred and imported. With this step, process migration completes. Once all of the state has been transferred from the original instance, the original may be deleted on the source node.

XIII. Why Linux

1. Low cost: You don’t need to spend time and money to obtain licenses since Linux and much of its software come with the GNU General Public License. You can start to work immediately without worrying that your software may stop working anytime because the free trial version expires. Additionally, there are large repositories from which you can freely download high quality software for almost any task you can think of.

2. Stability: Linux doesn’t need to be rebooted periodically to maintain performance levels. It doesn’t freeze up or slow down over time due to memory leaks and such. Continuous up-times of hundreds of days (up to a year or more) are not uncommon. 3. Performance: Linux provides persistent high performance on workstations and on networks. It can handle unusually large numbers of users simultaneously, and can make old computers sufficiently responsive to be useful again.

4. Network friendliness: Linux was developed by a group of programmers over the Internet and therefore has strong support for network functionality; client and server systems can easily be set up on any computer running Linux. It can perform tasks such as network backups faster and more reliably than alternative systems. 5. Flexibility: Linux can be used for high-performance server applications, desktop applications, and embedded systems. You can save disk space by installing only the components needed for a particular use, and you can restrict the use of specific computers by installing, for example, only selected office applications instead of the whole suite. 6. Compatibility: It runs all common Unix software packages and can process all common file formats. 7. Choice: The large number of Linux distributions gives you a choice. Each distribution is developed and supported by a different organization. You can pick the one you like best; the core functionalities are the same, and most software runs on most distributions. 8. Fast and easy installation: Most Linux distributions come with user-friendly installation and setup programs, and popular distributions include tools that make installing additional software very user friendly as well. 9. Full use of hard disk: Linux continues to work well even when the hard disk is almost full. 10. Multitasking: Linux is designed to do many things at the same time; e.g., a large printing job in the background won't slow down your other work. 11. Security: Linux is one of the most secure operating systems. "Walls" and flexible file access permission systems prevent access by unwanted visitors or viruses. Linux users have the option to select and safely download software, free of charge, from online repositories containing thousands of high-quality packages; no purchase transactions requiring credit card numbers or other sensitive personal information are necessary. 12. Open source: If you develop software that requires knowledge or modification of the operating system code, Linux's source code is at your fingertips. Most Linux applications are open source as well.

XIV. Advantages

1) Dynamic load distribution, by migrating processes from overloaded nodes to less loaded ones.

2) Fault resilience, by migrating processes from nodes that may have experienced a partial failure.

3) Improved system administration, by migrating processes from nodes that are about to be shut down or otherwise made unavailable.

XV. Application

1) Distributed applications can be started on certain nodes and then migrated, either at the application level or by a system-wide migration facility, in response to events such as load imbalance.

2) Long-running applications, which can run for days or weeks on end, can suffer various interruptions and benefit from being moved off failing or retiring nodes.

3) Multiuser applications can benefit greatly from process migration: as users come and go, the load on individual nodes varies widely.

XVI. References

Black, D., Golub, D., Julin, D., Rashid, R., Draves, R., Dean, R., Forin, A., Barrera, J., Tokuda, H., Malan, G., and Bohman, D. (April 1992). Microkernel Operating System Architecture and Mach. Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures.

Goldberg, A. and Jefferson, D. (1987). Transparent Process Cloning: A Tool for Load Management of Distributed Systems. Proceedings of the 8th International Conference on Parallel Processing.

Zhou, S. and Ferrari, D. (September 1987). An Experimental Study of Load Balancing Performance. Proceedings of the 7th IEEE International Conference on Distributed Computing Systems.

Google.com

Linuxgenerals.com

Linuxpdf.com