An Application Service Provider for a Mobile Computing Environment
by
Gürhan Küçük
A thesis submitted to the Institute of Graduate Studies in Science and Engineering in partial fulfillment of the
requirements for the degree of Master of Science
in Information and Computer Science
YEDİTEPE UNIVERSITY 1998
Acknowledgements
I would like to express my deepest gratitude to my supervisor, Associate Professor
Şebnem Baydere, for her invaluable guidance, motivation and support during all
stages of this study.
I would especially like to thank Onur Demir, my colleague and partner throughout
the past four years, for his invaluable help and patience. I would also like to thank the
members of the MaROS project Mehmet Can Yıldız and Giray Devlet, and former
MaROS members Nurhan Çetin, Değer Cenk Erdil and Nilüfer Girgin for their
presence and invaluable help. Moreover, thanks to Yeditepe University Engineering
Faculty staff Alper Özpınar, Ayşegül Ergin, Ertan Toprakbastı, İlker Birbil, Mehmet
Ali Özcan, Rana Belen, Tansel Çağlar and Zeynep Yazıcıoğlu for their moral support
and help.
Last but not least, thanks to my mother, Emek Küçük, and my father, İlhan Küçük,
and the rest of my family for their moral support and patience throughout all stages of
this study and my life.
Abstract
The MaROS (Mobile and Relocatable Object System) application development platform
is designed especially for mobile computers. On this platform, registered mobile
computers may transfer their applications to the MaROS server. This process is called
“relocation”. In this thesis, the design and implementation of the MaROS Notification
and Recovery modules are presented.
The MaROS Notification module deals with the transfer and removal of MaROS objects. It is
the preparation step for the object relocation process. The MaROS Recovery module deals
with system recovery and the startup process after voluntary shutdown requests. It
coordinates the recovery process by keeping a table called the “Recovery Table”.
Contents
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
  1.1 Introduction
  1.2 Mobile and Relocatable Object System (MaROS)
  1.3 Motivation and Aims
  1.4 Background Information
    1.4.1 Mobile Computing
    1.4.2 Mobile Host
    1.4.3 Disconnected Communication
    1.4.4 Objects
    1.4.5 Object Relocation (Object Migration)
    1.4.6 Object Notification
    1.4.7 Object Recovery
  1.5 Thesis Summary
2 PREVIOUS WORK
  2.1 Introduction
  2.2 Rover: A Toolkit for Mobile Information Access
  2.3 ARTEMIS: Advanced Reliable disTributed Environment Middleware System
  2.4 Eden
  2.5 LOCUS
  2.6 Discussion
3 MaROS ENVIRONMENT
  3.1 Introduction
  3.2 The Physical Structure of MaROS
  3.3 The Logical Structure of MaROS
  3.4 The System Agents
    3.4.1 Communication Agent
    3.4.2 Object Manager
    3.4.3 Notification Agent
    3.4.4 Migration Agent
    3.4.5 Recovery Agent
  3.5 Host Registration and Authentication Protocol
  3.6 MaROS Objects
    3.6.1 Object Types
    3.6.2 Object Modes
    3.6.3 Object States
  3.7 Communication Structure
  3.8 System Recovery
4 NOTIFICATION DESIGN
  4.1 Introduction
  4.2 Notification in General
  4.3 Detailed Design
    4.3.1 Peer Entities
    4.3.2 Reserving Ports
    4.3.3 Notifier Tables
      4.3.3.1 Notifier Object Transfer Table (NOTT)
      4.3.3.2 Partial Object Transfer Table (POTT)
      4.3.3.3 Class Dependency Table (CDT) and Class Replica Table (CRT)
      4.3.3.4 Notifier Information Table (NIT)
    4.3.4 Object Transfer (Object Creation)
      4.3.4.1 Full Transfer
      4.3.4.2 Partial Transfer
      4.3.4.3 No Need To Transfer
    4.3.5 Object Deletion
5 SYSTEM RECOVERY DESIGN
  5.1 Introduction
  5.2 Recovery Table (RT) and Recovery Tree Structure
  5.3 Recoverable Objects vs. Unrecoverable Objects
  5.4 System Shutdown
    5.4.1 Creating Image Files
  5.5 System Startup
    5.5.1 Object Manager Startup Process
    5.5.2 Recovery Agent Startup Process
    5.5.3 Startup of the System Agents
    5.5.4 Mutation of SS fields and RA Garbage Collector
6 PILOT SYSTEM IMPLEMENTATION
  6.1 Introduction
  6.2 Pilot System Implementation Language
  6.3 Pilot System Implementation Environment
  6.4 Pilot System Implementation
    6.4.1 Notify Package
      6.4.1.1 Notify.NotificationAgent Class
      6.4.1.2 Notify.NIT Class
      6.4.1.3 Notify.NotifierClass Class
      6.4.1.4 Notify.NOTT and Notify.POTT Classes
    6.4.2 Notify.CDT Package
      6.4.2.1 Notify.CDT.CDT Class
      6.4.2.2 Notify.CDT.CRT Class
    6.4.3 Recovery Package
      6.4.3.1 RecoveryTable Class
      6.4.3.2 RecoveryAgent Class
      6.4.3.3 Recoverable Object Implementation
7 EVALUATION AND FUTURE WORK
  7.1 Introduction
  7.2 Performance Evaluation
    7.2.1 Full Transfer Tests
    7.2.2 No Need Type Object Transfer and Object Deletion Tests
  7.3 Future Work
    7.3.1 Future Work on the Notification Module
    7.3.2 Future Work on the Recovery Module
    7.3.3 Future Work on MaROS
8 CONCLUSION
REFERENCES
BIBLIOGRAPHY
List of Tables
Table 1.1 Characteristics of computer hardware
Table 1.2 Characteristics of network technology
Table 3.1 An instance of the Host Identification Table (HIT)
Table 3.2 Object states
Table 4.1 Sample NOTT and POTT instances
Table 4.2 Sample CDT and CRT instances
Table 4.3 Sample NIT instances
Table 6.1 The format of the MH version of the NIT
Table 6.2 The format of the MSP version of the NIT
Table 6.3 The format of the NOTT
Table 6.4 The format of the POTT
Table 6.5 The format of the CDT
Table 6.6 The format of the CRT
Table 6.7 The format of the Recovery Table
Table 7.1 Transfer results of 505319 bytes object
Table 7.2 The results of the No Need type object transfer tests
List of Figures
Figure 2.1 The Rover toolkit client/server distributed object model
Figure 2.2 Highly reliable distributed environment provided by ARTEMIS
Figure 2.3 Transitions between active and passive representations in Eden
Figure 3.1 The physical structure of MaROS
Figure 3.2 MaROS layers
Figure 3.3 The host authentication process
Figure 4.1 Initial phase of a Notification process
Figure 4.2 A Notification design approach (Initial Phase)
Figure 4.3 Current design of the Notification process (Initial Phase)
Figure 4.4 The comparison of two approaches
Figure 4.5 Scope of the Notifier tables
Figure 4.6 Object transfer (creation) process
Figure 4.7 Message format of a notification request
Figure 4.8 cfinfo field in detail
Figure 4.9 Message format of the message that is sent from HMSP to NotifierMSP
Figure 4.10 pcfinfo field in detail
Figure 4.11 Message format of the message sent from Notifiers to Handlers
Figure 4.12 Object deletion process
Figure 5.1 A sample Recovery Table instance and its corresponding tree structure
Figure 5.2 Setting the SS field
Figure 5.3 The OM realizes the Shutdown Signal
Figure 5.4 Flow of the Shutdown Signal
Figure 5.5 Startup of the Object Manager
Figure 5.6 Startup of the Recovery Agent
Figure 5.7 Code replication solution for the recovery process
Figure 6.1 The example piece of code showing the addition of code states
Figure 6.2 The implementation of check_ShutdownSignal() method
Figure 6.3 An example code part of the saveImage() method
Figure 6.4 An example image file reader code piece
Figure 7.1 Transfer time vs. buffer size graph for 100 Mbit tests
Figure 7.2 Transfer time vs. buffer size graph for 115200 bit tests
Figure 7.3 Transfer time vs. buffer size graph for 19200 bit tests
Figure 7.4 Transfer speed vs. buffer size graph for 100 Mbit tests
Figure 7.5 Transfer speed vs. buffer size graph for 115200 bit tests
Figure 7.6 Transfer speed vs. buffer size graph for 19200 bit tests
List of Abbreviations
AO Authentication Object
CA Communication Agent
CPU Central Processing Unit
CDT Class Dependency Table
CGI Common Gateway Interface
CRT Class Replica Table
DBMS Database Management System
DNS Domain Name Service
FTP File Transfer Protocol
H Handler
HIT Host Identification Table
ID Identifier
JVM Java Virtual Machine
LAN Local Area Network
MA Migration Agent
MaROS Mobile and Relocatable Object System
MASL MaROS Application Support Layer
MCSL MaROS Communication Support Layer
MH Mobile Host
MID Mobile Host Identifier
MMX Multimedia Extension
MRB MaROS Recycle Bin
MSP Mobile Host Service Provider
MUI MaROS User Interface
N Notifier
NA Notification Agent
NIT Notifier Information Table
NOTT Notifier Object Transfer Table
O MaROS Object
OID Object Identifier
OOP Object Oriented Programming
OM Object Manager
OS Operating System
OT Object Table
POTT Partial Object Transfer Table
QRPC Queued Remote Procedure Call
RA Recovery Agent
RDO Relocatable Dynamic Object
RT Recovery Table
SS Shutdown Signal
TCp Turkish Coffee Protocol
TCP Transmission Control Protocol
UDP User Datagram Protocol
VPM Virtual Port Mapper
VPR Virtual Port Reservation
WWW World Wide Web
1
Introduction
1.1 Introduction
Computers and computer networks have opened a new era in world history: the
Information Age. Today, we can access a database located anywhere in the
world as if it were locally available, join multimedia conferences, and shop
online. Distributed systems and client/server architecture are perhaps the
keywords that best describe this age. A distributed system consists of many
computers that are connected to a computer network. In the client/server model,
servers are the computers that share their resources with the system, and clients are the
computers that use these resources. This scheme is ideal for computer networks with
fixed hosts.
At the beginning of this decade, wireless networks started to become popular
as sales of portable computers increased. Wireless networks have
provided computers with wireless interfaces that allow networked communication
even while a user is travelling. The rapid advances in cellular communication
technology, wireless LANs, and satellite services have enabled mobile users to access
information anywhere and at any time [1].
Wireless networks need everything that classical computer networks need. However,
they also need an improvement over the classical client/server model, because reliable
transport protocols have been tuned for networks composed of wired links and
stationary hosts [2]. Moreover, the classical client/server model assumes that both the
clients and the servers are connected to a network via a fixed and continuous
connection. Portable computers have deficiencies such as low bandwidth
capacity, limited power supply, limited CPU power, and vulnerability to line failures,
hand-offs, etc. Table 1.1 presents the characteristics of computer hardware. The
table shows that portability comes at the cost of performance.
Hardware/Characteristic   Server   Workstation   Laptop             Palmtop
Processing power          High     High          Medium             Limited
Storage capacity          High     High          Medium             Limited
Portability               None     Limited       Slightly limited   Full
User interface            Full     Full          Slightly limited   Limited
Reliability               High     Medium        Limited            Limited
Table 1.1: Characteristics of computer hardware
Table 1.2 summarizes the major characteristics of network technology. This
table shows that availability comes at the cost of performance.
Technology/Characteristic   Fixed WAN/LAN            Dial-up wire                    Dial-up cellular
Bandwidth                   High                     Medium                          Low
Reliability                 High                     Medium                          Low
Initial cost                High                     Low                             Low
Latency                     Low                      Medium                          High
Cost to use                 Low                      Medium                          High
Topology                    Fixed                    Fixed, but readily changeable   Dynamic
Available at                Outlet in organization   Any phone outlet                Anywhere (theoretically)
Table 1.2: Characteristics of network technology
A new type of client/server model, or an application development platform, has to be
designed to deal with the inadequacies of portable computers. This new model has
to support ideas such as disconnected communication, object relocation,
and system recovery, which are hardly needed in the classical client/server model.
1.2 Mobile and Relocatable Object System (MaROS)
MaROS is a mobile computing environment designed specifically to compensate
for the inadequacies of mobile computers [3]. In a mobile computing
environment, portable computers are connected to a static network via
wireless links. Portable computers in these environments have a limited power
supply and limited communication bandwidth. Because the classical client/server
model assumes that both the clients and the server (or servers) are connected to a
network via a fixed and uninterrupted connection, it is not a good solution for
portable computers. Moreover, classical communication protocols, such as TCP, do
not take wireless networks into consideration.
MaROS proposes a new type of client/server model and a new type of
communication protocol. This model and protocol help mobile hosts to extend their
computing environment. The MaROS server, called the MSP (Mobile Host Service
Provider), is a fixed and powerful host that has a wireless interface to communicate
with mobile hosts (MaROS clients). MaROS clients (MHs, or Mobile Hosts)
may transfer their objects to the MSP and execute them there. This approach enables Mobile
Hosts to extend their computing environment using fixed and powerful server
machines.
1.3 Motivation and Aims
Extending the computing environment of a Mobile Host is one of the major aims of
MaROS. With this service, Mobile Hosts may run CPU-bound and bandwidth-bound
applications even when they are powered off. However, this service requires the
transfer of applications, or parts of applications, to the MSP site. This process is a kind
of object synchronization that creates an exact copy of an object at the
MSP. The Notification service is primarily concerned with the transfer of MaROS
objects. Its additional task is the deletion of transferred objects when
they are no longer needed. The object notification process is also a preparation for
the object migration process: it simplifies migration by transferring
objects when they are created.
The mobile computing environment of MaROS should be efficient and reliable. If the
Mobile Host has to be shut down (a very common case for a portable
machine) without a Recovery service, many jobs have to be restarted from the
beginning or, worst of all, may be lost and never restarted.
The Recovery service enables Mobile Hosts to continue their execution after a
proper system shutdown and start-up process. It provides a much more stable and
efficient computing environment for Mobile Host users. There are many research
efforts related to recovery in distributed systems. Some of them deal with failure
recovery, which covers software and hardware failures together. Many of them use
checkpointing algorithms to roll back to a previous state after a failure. The
current recovery system of MaROS does not deal with hardware failures. The system
tries to recover itself after a shutdown request is given by the user. The recovery
module is one of the most important parts of the MaROS system. It enables the
MaROS user to exit the system whenever necessary, knowing that at the next
startup the system will continue its execution from the exact point where it
stopped.
1.4 Background Information
This thesis is mainly concerned with an application development environment for
mobile computers. Therefore, it is vital for the reader to know the basic concepts in
this area. This section provides general background information on mobile
computing environments and object-related issues.
1.4.1 Mobile Computing
With the rapid increase in the number of portable computers, and the availability of
wireless link interfaces for them, a new computing approach has become feasible: Mobile
Computing. The terms Wireless Computing, Ubiquitous Computing, Location-independent
Computing, and Nomadic Computing are used interchangeably. The portable
computers in a mobile computing system are usually called Mobile Hosts.
1.4.2 Mobile Host
A Mobile Host is a portable computer that may connect to a network via its wireless
link interface. It may be carried easily; however, it has some important disadvantages.
It has to be recharged periodically, since it has a limited power supply. It may be
connected to a mobile network via a cellular phone line when on the move. This type
of connection involves communication hand-offs when changing cells. Therefore, the
communication is not as reliable as it is in fixed networks. Moreover, the communication
bandwidth is far lower than the bandwidth of fixed networks. Furthermore, mobile
communication is still expensive. A Mobile Host is also vulnerable. It may be
dropped and physically broken, or it may be stolen. All of these handicaps have
forced computer scientists to design new types of computing environments that
cover Mobile Hosts.
1.4.3 Disconnected Communication
A Mobile Application Environment should provide communication primitives that are
transparent to the applications. Disconnected communication is part of a new type
of communication protocol designed to support mobile hosts. It tries to minimize,
or totally remove, the side effects of connecting to a network via a mobile host. By
using queuing strategies at both sides of the connection, the client and the server may
send data even when there is no connection. The queued data is sent to the target
host when the communication link comes up again.
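The queuing idea described above can be illustrated with a minimal Java sketch. This is not the MaROS protocol: the class name DisconnectedSender, its methods, and the in-memory list that stands in for the remote host are all assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration of queued, disconnected sending; not MaROS code.
class DisconnectedSender {
    private final Queue<String> pending = new ArrayDeque<>();  // messages waiting for a link
    private final List<String> delivered = new ArrayList<>();  // stands in for the target host
    private boolean linkUp = false;

    // The application always calls send(); it never checks the link itself.
    void send(String message) {
        pending.add(message);
        flush();
    }

    // Called by an (assumed) link layer whenever connectivity changes.
    void setLinkUp(boolean up) {
        linkUp = up;
        flush();
    }

    // Drains the queue in FIFO order while the link is available.
    private void flush() {
        while (linkUp && !pending.isEmpty()) {
            delivered.add(pending.remove());
        }
    }

    int pendingCount() { return pending.size(); }
    List<String> delivered() { return delivered; }
}
```

In this sketch the application code is identical whether or not a connection exists, which is one way to read the requirement that the primitives be transparent to applications.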
1.4.4 Objects
The object concept is used in many areas. A car, a TV set, a computer program, and a
door knob are all examples of objects. An object has two main characteristics
that are universal: 1) properties and 2) methods. Each object has its own properties.
For instance, a car object has a color, a type, a width, and a length. It also has
methods such as drive, stop, and change gear. In the MaROS context, the objects are
Java threads. They have properties such as recoverability, relocatability, and
state. A MaROS object also has many methods. The suspendObject, sleepObject,
activateObject, and relocateObject methods are some examples.
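As a rough illustration of an object with properties and methods, the following Java sketch models a thread that carries the three properties named above and the four methods listed in the text. Only the method names come from the text; the class name SketchObject, the state names, and all method bodies are assumptions, not the MaROS API.

```java
// Hypothetical sketch: state names and method bodies are assumptions, not the MaROS API.
class SketchObject extends Thread {
    enum ObjectState { ACTIVE, SLEEPING, SUSPENDED, RELOCATED }

    // Properties
    private final boolean recoverable;
    private final boolean relocatable;
    private ObjectState state = ObjectState.ACTIVE;

    SketchObject(boolean recoverable, boolean relocatable) {
        this.recoverable = recoverable;
        this.relocatable = relocatable;
    }

    // Methods (named after those listed in the text)
    synchronized void suspendObject()  { state = ObjectState.SUSPENDED; }
    synchronized void sleepObject()    { state = ObjectState.SLEEPING; }
    synchronized void activateObject() { state = ObjectState.ACTIVE; }

    // In this sketch, relocation succeeds only for objects marked relocatable.
    synchronized boolean relocateObject() {
        if (!relocatable) {
            return false;
        }
        state = ObjectState.RELOCATED;
        return true;
    }

    synchronized ObjectState objectState() { return state; }
    boolean isRecoverable() { return recoverable; }
    boolean isRelocatable() { return relocatable; }

    @Override
    public void run() {
        // The object's own work would go here; left empty in this sketch.
    }
}
```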
1.4.5 Object Relocation (Object Migration)
Sometimes, an object may need to be moved from one machine to another. There may be
various reasons for this. The host that contains the object may be heavily
loaded, and the object may be moved to a less loaded machine. This process is called
load balancing. Another reason may be long-running objects. They require
uninterrupted execution, and this may not always be possible. For instance, a
portable computer has a limited power supply and an unreliable communication
interface. These handicaps make it unsuitable for such objects. In short,
Object Relocation is the movement of an object from one host to another. In MaROS
terms, it is the movement of a MaROS object from a Mobile Host (MaROS client) to the
Mobile Host Service Provider (MaROS server), since MaROS supports one-way
object relocation.
1.4.6 Object Notification
In order to support object relocation, copies of the objects should be created at the
MSP. These mirror objects should also be deleted when the original object is deleted
from the mobile host. This is a kind of object synchronization process, and since the
MSP is informed about the process, it is called notification.
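The synchronization described above can be sketched in Java as two cooperating classes. This is a simplified illustration under stated assumptions: the names MirrorStore and MobileHostSketch, the string object identifiers, and the direct method calls that replace network messages are all hypothetical and do not reflect the actual MaROS design.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of notification as object synchronization; not the MaROS protocol.
class MirrorStore {                        // stands in for the MSP side
    private final Map<String, byte[]> mirrors = new HashMap<>();

    void onCreate(String oid, byte[] image) { mirrors.put(oid, image.clone()); }
    void onDelete(String oid) { mirrors.remove(oid); }
    boolean hasMirror(String oid) { return mirrors.containsKey(oid); }
}

class MobileHostSketch {                   // stands in for the MH side
    private final Map<String, byte[]> objects = new HashMap<>();
    private final MirrorStore msp;

    MobileHostSketch(MirrorStore msp) { this.msp = msp; }

    void createObject(String oid, byte[] image) {
        objects.put(oid, image);
        msp.onCreate(oid, image);          // notify: a mirror is created at the MSP
    }

    void deleteObject(String oid) {
        if (objects.remove(oid) != null) {
            msp.onDelete(oid);             // notify: the mirror is removed as well
        }
    }
}
```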
1.4.7 Object Recovery
In a mobile computing environment, there are portable computers that are not reliable.
The recovery process is used for handling system failures. There are two types
of failures: involuntary and voluntary. The first type may occur at any time, and its
source may be a hardware or software problem. Failures of this type are handled by
periodically saving a snapshot of the system (checkpointing) and rolling back to a
previous stable state at the next system startup. The second type of failure is
voluntary. The user may want to shut down the system, and the system may be
signalled before the shutdown occurs. Voluntary system interruptions give
the system a chance to save its crucial data. Of course, dealing with the first type of
failure is much more difficult.
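The voluntary case can be illustrated with a short Java sketch: a job saves the point it has reached when a shutdown is signalled, and a new instance resumes from that point at the next startup. The class ResumableJob, its methods, and the in-memory byte array that stands in for a saved image are assumptions for illustration and are not taken from the MaROS implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of recovery after a voluntary shutdown; not MaROS code.
class ResumableJob {
    private int nextStep; // the crucial data: the step at which to resume

    ResumableJob(int nextStep) { this.nextStep = nextStep; }

    // Runs steps until stopBefore is reached (job finished or shutdown signalled).
    void runUntil(int stopBefore) {
        while (nextStep < stopBefore) {
            nextStep++; // one unit of work
        }
    }

    // Called when a shutdown signal arrives: serialize the state to an image.
    byte[] saveState() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(nextStep);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Called at the next startup: rebuild the job from the saved image.
    static ResumableJob restoreState(byte[] image) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
            return new ResumableJob(in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int nextStep() { return nextStep; }
}
```

Handling involuntary failures would additionally require saving such images periodically, since no signal arrives before the failure.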
1.5 Thesis Summary
MaROS is an application development platform designed especially for portable
computers. These mobile hosts are called MaROS clients. MaROS is a client/server
environment and consists of two types of machine: MaROS clients (MHs, in short)
and the MaROS server, or Mobile Host Service Provider (MSP). A MaROS client runs
the MaROS client code and connects to the MSP via a wireless link. The MSP accepts
connections from MaROS clients and permits them to relocate their objects.
This thesis is mainly concerned with the design and implementation of the Notification and
Recovery modules of MaROS. It also covers the general architecture of MaROS to
give the reader a clearer view of the system. In Chapter 2, previous work in
this area is surveyed. Chapter 3 covers the overall design of MaROS. In Chapters 4
and 5, the design of the Notification and Recovery services of MaROS is explained in
detail. Chapter 6 provides the implementation details of the pilot system. Chapter 7
presents the performance evaluation and additional ideas that may be
implemented in the future. Chapter 8 is the final chapter, and it presents a conclusion
for the whole work.
2
Previous Work
2.1 Introduction
Several operating systems and toolkits focus on mobile and/or distributed
computing platforms. In this chapter, some of those projects are discussed in detail,
with particular attention to their recovery modules.
2.2 Rover: A Toolkit for Mobile Information Access
Rover is one of the projects most closely related to MaROS. The Rover toolkit
was developed in the Computer Science Laboratory at MIT. It provides mobile
application developers with a set of tools to isolate mobile applications from the
limitations of mobile communication systems. It supports mobile communication by
providing Relocatable Dynamic Objects (RDOs) and Queued Remote Procedure Call
(QRPC). An RDO is an object with a well-defined interface that can be dynamically
loaded into a client computer from a server computer (or vice versa) to reduce client-
server communication requirements. Queued remote procedure call is a
communication system that permits applications to continue to make non-blocking
remote procedure calls [6] even when a host is disconnected: requests and responses
are exchanged upon network reconnection [5].
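The queued remote procedure call idea can be illustrated with a small sketch. It is not Rover's API: the class QueuedRpcClient, its methods, and the demo_server function are hypothetical names introduced only to show how calls return immediately while disconnected and are exchanged on reconnection.

```python
from collections import deque

class QueuedRpcClient:
    """Toy QRPC client: calls are queued and delivered when a link is available."""

    def __init__(self, server):
        self.server = server      # any callable taking (method, args) and returning a result
        self.connected = False
        self.pending = deque()    # requests not yet delivered
        self.responses = {}       # request id -> result
        self._next_id = 0

    def call(self, method, *args):
        """Queue a request and return its id immediately (non-blocking)."""
        request_id = self._next_id
        self._next_id += 1
        self.pending.append((request_id, method, args))
        if self.connected:
            self.flush()
        return request_id

    def reconnect(self):
        """Mark the link as up and exchange all queued requests and responses."""
        self.connected = True
        self.flush()

    def disconnect(self):
        self.connected = False

    def flush(self):
        while self.pending:
            request_id, method, args = self.pending.popleft()
            self.responses[request_id] = self.server(method, args)

def demo_server(method, args):
    # Hypothetical server exposing a single "add" method.
    if method == "add":
        return sum(args)
    raise ValueError("unknown method: " + method)
```

A caller keeps the returned request id and looks up the result in responses once the host has reconnected.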
The Rover toolkit offers applications a uniform distributed object system based on
client/server architecture. Rover applications employ a check-in, check-out model of
data sharing: They import RDOs into their address spaces, invoke methods provided
by the RDOs, and export the RDOs back to servers.
The latest extensions provide the tools for handling a specific class of faults: transient,
recoverable faults. These faults are typically caused by environmental circumstances
(e.g. power glitches, communication link errors or failures, resource exhaustion due to
high system load, etc.) or software errors in rarely used code paths. The extensions do
not address repeatable or non-recoverable failures (e.g. those due to critical design or
implementation errors).
The reliability extensions leverage functionality already provided by the Rover
toolkit: stable logging of each message sent by a client and message retransmission
after communication failures. While the use of stable logging at the client provides
reliable delivery of a message to a server, it does not handle failures at the server [7].
Figure 2.1 shows the Rover toolkit client/server distributed object model. Rover offers
applications client caching and optimistic concurrency control based upon a check-in,
check-out model of data sharing. Client applications use QRPCs to import RDOs
from servers (steps 1 and 2) and to export changed RDOs back to servers (steps 3 and
4).
Figure 2.1: The Rover toolkit client/server distributed object model.
2.3 ARTEMIS: Advanced Reliable disTributed Environment Middleware System
ARTEMIS is middleware that improves the reliability of application programs
executed in a distributed environment, such as 3-tier client-server application
programs or groupware application programs, without changing them.
ARTEMIS is implemented as library routines and daemon processes, in a
configuration where each server computer has a backup computer. ARTEMIS
uses checkpoints as its key method for achieving high reliability. It provides a
checkpointing protocol that takes consistent checkpoints of distributed
processes.
Figure 2.2: Highly reliable distributed environment provided by ARTEMIS.
Figure 2.2 shows the environment provided by ARTEMIS. In this example,
ARTEMIS controls a WWW server, CGI application programs and a DBMS running
in a server computer as well as WWW browsers running in client computers. Under
control of ARTEMIS, even if the primary server computer goes down, all the
processes in the primary server computer can be resumed in the backup server
computer using their checkpoints and replicated files. DBMS can continue to run in
the backup server computer without executing journal recovery processing.
In the ARTEMIS environment, it is not necessary to modify application programs,
because ARTEMIS libraries are linked to application programs dynamically, and
they have the same interfaces with and operating systems. ARTEMIS libraries keep
watch on the behavior of the process to which they are linked, and acquire checkpoints of
that process [8].
2.4 Eden
The Eden system was developed at the University of Washington in Seattle. The goal
of the Eden system was to investigate logically integrated but physically distributed
operating systems.
Eden was based on the object model. It is a descendant of Hydra (Wulf et al. 1981). All
‘traditional’ programs and physical and logical resources are represented as objects.
There are no pure data objects – Eden objects are supported by active processes. An
Eden object may be seen as an instance of an abstract data type. Because there are
some differences between Eden’s objects and those of other systems and languages,
the designers refer to them as Ejects (for Eden Objects).
The underlying system of Eden is Berkeley UNIX running on VAXes. Each active
Eject executes within a separate UNIX process with its own address space. This
process is managed by the Eden kernel using UNIX facilities.
Ideally, an Eject should be active. However, it is not always active, either because it
or its computer has crashed, or because it has explicitly deactivated itself in order to
economize on the use of system resources. Thus, an Eject has two manifestations: An
active representation (with its system-level process) and a passive representation. The
passive representation consists primarily of a disk file, and only the passive
representation can survive a crash.
An Eject can perform a Checkpoint operation. This operation creates a passive
representation, that is, a data structure designed to endure system crashes. This means
that the information in a passive representation should be sufficient to enable the Eject to
reconstruct its long-term state. Acquiring and releasing active and passive
representations are illustrated in Figure 2.3.
Figure 2.3: Transitions between active and passive representations in Eden.
The figure shows that when an Eject is created, only an active representation exists. It
does not have its state saved in permanent store. This implies that if this Eject were to
Deactivate, or if the system were to crash, it would vanish and it could not be invoked
again.
Performing a Checkpoint operation results in the following operations: opening a
passive representation of an Eject, writing its state in a series of PutData calls, and
completing the passive representation with a call. The Eject then has its state and
identity on permanent store. If this Eject deactivates or crashes, its active
representation vanishes, but the passive representation remains. If an Eject that has a
passive representation is invoked by another Eject, then the kernel reactivates it, that
is, it constructs a new active representation [9].
2.5 LOCUS
LOCUS is a UNIX-compatible, distributed operating system developed by Popek,
Walker and their co-workers at the University of California, Los Angeles. The system
has been in use for several years.
LOCUS’s general goals include making the development of distributed applications
as simple as single machine programming, and realizing the potential that distributed
systems with redundancy have for highly reliable, available operation. The LOCUS
architecture addresses the goals of:
(1) Network transparency – giving all users the illusion of operating on a single
computer. The network is not visible; there is no need to refer to a specific node
of a network;
(2) High reliability and availability – introduced for two general reasons. First,
many applications demand a high level of reliability and availability. Second, the
distributed environment presents new sources of failure, and recovery
mechanisms to deal with them are far more difficult to construct than in
centralized computer systems. LOCUS possesses one very important reliability
feature, namely, it supports automatic replication of stored data, with the degree
of replication indicated by associated reliability profiles; and
(3) Good performance – LOCUS achieves two basic performance characteristics
desirable in a distributed system:
(a) Access to local resources in a distributed system should have comparable
performance to access to resources in a centralized system, as if mechanisms
for remote access were not present.
(b) Remote access, of course slower than local access, should be reasonably
comparable to local access [9].
2.6 Discussion
All of the above platforms provide system reliability by utilizing a special system
object (agent) or by adding an extension package to the system. Some of the platforms
use a replication strategy for a more reliable system [8,9]. The rest of the platforms
prefer a checkpointing strategy for system recovery. The checkpointing strategy is
divided into two camps. One camp applies the checkpointing operation periodically from
a system-wide perspective [7], and the other uses a one-time checkpointing
operation over only recoverable objects [8].
The replication strategy requires extensive network traffic, which is not suitable
for mobile platforms. In mobile platforms, the most vulnerable machines are the
mobile hosts. Keeping a replica for each mobile host is not a feasible approach, since
the mobile hosts do not have enough network bandwidth to support this kind of
strategy. However, this strategy is very effective when dealing with hardware failures.
On the other hand, the checkpointing strategy may be used effectively by mobile
platforms. However, it is not as effective as the replication strategy for hardware-based
failures. It is possible to continue execution from the last checkpoint; however,
nothing can be done if the checkpoint information is damaged. The second type of
checkpointing strategy (one-time checkpointing) cannot deal with hardware
failures.
3 MaROS Environment
3.1 Introduction
This thesis covers the design and implementation of two crucial parts of the MaROS
environment, the notification and the recovery modules. However, it is
necessary to explain the MaROS environment clearly to give an idea of the whole
system before going into more detail about its modules. The first topic of this
chapter is the physical and logical structure of MaROS. Then, the registration and
authentication process of the mobile hosts is explained in detail. The object-specific
events and the system recovery process are discussed at the end of this chapter.
3.2 The Physical Structure of MaROS
The physical structure of MaROS consists of many portable computers and a fixed
host. The portable computers may connect to and disconnect from the fixed host via their
wireless network interfaces using cellular phones. The fixed host also has a wireless
network interface to communicate with the portable computers. The portable
computers run the MaROS client software, and they are called MaROS clients. The
fixed host, on the other hand, runs the MaROS server software, and it is called the
MaROS server. In MaROS terms, the MaROS clients are called Mobile Hosts (or
MHs, in short), and the MaROS server is called the Mobile Host Service Provider (or
MSP, in short). In Figure 3.1, the physical structure of MaROS is shown in detail. In
the current design and implementation of MaROS, only one MSP is available. The
design may be extended with additional MSPs, which would make features such as
parallel processing and load balancing possible in future versions.
Figure 3.1: The physical structure of MaROS.
MaROS is platform independent. It may run on any OS that supports the Java Virtual
Machine (JVM). This is a great advantage for both system programmers and
application developers. Once Java code is compiled, it may be transferred to any
other platform and run there.
3.3 The Logical Structure of MaROS
MaROS uses a layered approach. In each layer, there may be one or more modules.
Each module is responsible for a specific task, and it may use the services provided
by the lower layer modules. The interaction between layers is shown in Figure 3.2.
RA: Recovery Agent; MA: Migration Agent; OM: Object Manager; NA: Notification Agent; CA: Communication Agent; A, B: MaROS Application
Figure 3.2: MaROS layers.
The lowest layer is a composite layer. It is the combination of the OS kernel and the
JVM. The upper layers do not directly communicate with the kernel. They use the
lowest layer services via JVM.
The second layer is called the MaROS Communication Support Layer (MCSL). The
Communication Agent (CA) resides in this layer. It provides reliable
communication primitives. This layer also supports disconnected communication,
which is essential for the mobile hosts. The communication protocol provided by this
layer is called the Turkish Coffee Protocol (TCp). The name is presumably
inspired by Java.
The third layer is the MaROS Application Support Layer (MASL). This layer is very
rich in agents. There are four agents in this layer: Object Manager, Notification
Agent, Migration Agent, and Recovery Agent. These agents provide many services to
the MaROS applications. Even the Communication Agent uses some of these
services. Object-specific services such as object creation, deletion, notification,
relocation and system recovery are supported in this layer.
The MaROS applications reside in the top layer. A MaROS application is a
Java application that may use MaROS services provided by the lower layers.
3.4 The System Agents
The software agent concept is one of the latest programming techniques in the computing
world. Software agents are software modules with cognitive abilities such as
motivation, goal processing, reasoning and autonomy [10]. They are capable of
learning and act independently of the user to achieve a given goal [11]. In the current
implementation of MaROS, the system agents are MaROS applications that are
responsible for performing the given tasks and providing services to the applications
independently of the user. They do not have artificial intelligence.
There are five system agents in MaROS: Communication Agent (CA), Object
Manager (OM), Notification Agent (NA), Migration Agent (MA) and Recovery
Agent (RA). In the current design, the MSP does not support recovery. Therefore, the
Recovery Agent is available only for MHs. The other system agents, on the other hand,
have both MH and MSP versions.
3.4.1 Communication Agent
In MaROS, the Communication Agent is the system agent that is responsible for
the entire communication backbone. It uses a new communication protocol called TCp
(Turkish Coffee Protocol) that supports disconnected operations and virtual
connections. Some crucial tasks of the Communication Agent are handling
disconnected operations, providing non-blocking primitives, and re-establishing
existing connections after voluntary shutdowns.
3.4.2 Object Manager
The Object Manager is the system agent that is responsible for all the object-specific
operations. It manages the Object Table, which keeps information about all objects in the
local system [12]. It is the creator of the other system agents. It interacts with other
system agents to support notification, migration and recovery operations. Another task
of the Object Manager is to provide system-wide unique identification for all the
objects.
3.4.3 Notification Agent
When a relocatable object¹ is to be created or deleted, the MSP site, which keeps an
exact copy of the object, has to be informed. This process is necessary for
object synchronization between the MH and the MSP, and is called notification. In the
object creation phase, the copy of the object is automatically created at the MSP site.
The Notification Agent uses notifier threads for handling notification requests.
In effect, the notification process is a preparation for the relocation process. Chapter 4
explains the Notification Agent in detail.
3.4.4 Migration Agent
The agent that handles the object relocation requests is the Migration Agent. Since
the Notification Agent automatically creates copies of the relocatable objects at the
MSP site, the Migration Agent deals with transferring the parameters, running the
object at the MSP site, and retrieving the results when the object execution ends. The
logical structure of the Migration Agent is very similar to that of the Notification Agent. It uses
migrator objects to handle the relocation requests [13].
3.4.5 Recovery Agent
The mobile hosts cannot run forever. They need to be shut down periodically. The
task of the Recovery Agent is the recovery of the system after voluntary shutdowns. It
detects the shutdown request, and coordinates the shutdown process by managing a
table called the Recovery Table (RT). This agent is explained in Chapter 5.
¹ Types of objects are explained in Section 3.6.1.
3.5 Host Registration and Authentication Protocol
System security is one of the most important aspects of a system. The system should
be protected from unauthorized access. MaROS tries to provide system security
by registering its users. After the registration process, the system can easily identify
and authenticate its clients.
The Host Identification Table (HIT) keeps the information of registered users. If a
mobile host user wants to use the MaROS environment, s/he supplies some information
to the administrator of the MSP. This information is recorded in the HIT, and the user is
given a password for further authentication [12].
NAME      ID           AT          MT          OI              TI           P
gurhan    00:27:45:10  30/08/1998  30/08/1998  Yeditepe Univ.  Compaq35 NB  xxxx
mandrake  00:26:40:12  01/09/1998  01/09/1998  ABC Company     IBM 350L     xxxx
…         …            …           …           …               …            …
Table 3.1: An instance of the Host Identification Table (HIT).
The Host Identification Table contains seven fields:
1. NAME: The name of the mobile host.
2. ID: Network interface number (It is a worldwide unique identifier)
3. AT: Time of the host registration (Addition Time).
4. MT: Time of modification.
5. OI: Mobile host owner information.
6. TI: Technical details of the mobile host.
7. P: Encrypted password of the mobile host.
Authentication is performed at startup by the Authentication Object (AO). In order
to join the MaROS environment, the Authentication Object at the mobile host sends a
data packet that contains its NAME, ID and password to its peer at the MSP site. The
Authentication Object at the MSP site searches the Host Identification Table for the
ID of the mobile host. If the ID is found in the table, the password is checked. If the
ID is not found in the table or the password is incorrect, the authentication request is
discarded. Otherwise, the host is admitted to the MaROS environment by sending a
positive acknowledgement. The host authentication process is depicted in Figure 3.3.
AO: Authentication Object; HIT: Host Identification Table
Figure 3.3: The host authentication process.
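The authentication check performed at the MSP site can be sketched as follows. This is a minimal illustration under stated assumptions, not the MaROS code: the thesis does not specify how passwords are encrypted, so a SHA-256 digest stands in for it, and the passwords, the dictionary layout and the function names are hypothetical. Only the two IDs are taken from Table 3.1.

```python
import hashlib

def encrypt(password):
    # Placeholder for the unspecified password encryption: a plain SHA-256 digest.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical Host Identification Table, keyed by the ID field.
HIT = {
    "00:27:45:10": {"name": "gurhan", "password": encrypt("secret-1")},
    "00:26:40:12": {"name": "mandrake", "password": encrypt("secret-2")},
}

def authenticate(hit, packet):
    """Return True for a positive acknowledgement, False if the request is discarded."""
    entry = hit.get(packet["id"])
    if entry is None:
        # The ID is not registered in the table.
        return False
    # The ID is known, so the password is checked.
    return entry["password"] == encrypt(packet["password"])
```

The lookup is keyed by ID because the text states that the table is searched for the ID of the mobile host before the password is checked.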
3.6 MaROS Objects
An object can be defined as a collection comprising a data structure and a set of
operations on this data structure [9]. In MaROS, an object is a program, or a part of one,
that can be executed in the MaROS environment.
3.6.1 Object Types
The MaROS environment offers its objects an important capability: the relocation
process. However, not every object needs the relocation capability.
Therefore, two types of objects are available in MaROS: ordinary and relocatable.
The type of an object must be decided at creation time. After the creation
of an object, it is not possible to change its type. Ordinary objects do not have the
relocation capability, and they may only be run on the host on which they are created.
Relocatable objects, on the other hand, are automatically transferred to the MSP
site via the notification process. After a successful notification process, a relocatable
object may be run either on the MH or on the MSP. However, there is a restriction on
relocatable objects: once they start to execute, they may not change their host.
Since each site has a copy of the object, another internal property determines
which copy of a relocatable object is executed: the object mode.
3.6.2 Object Modes
A relocatable object has two copies, one at each site. It is not possible to run both at the
same time. A relocatable object has two modes: active and
passive. Only the copy that is in active mode may be run. Ordinary objects have
only one mode, which is always active.
An object may be activated in one of two possible ways:
• By using the activate() call: Ordinary objects and relocatable objects that do not need
relocation are activated by using this call. They are run in the environment where
they are created.
• By using relocate() call: Only the relocatable objects that are notified may use
this call. They are run in the MSP site, if this call is used. The mode of the object
at the mobile host is set to passive, and the mode of the copy object located in the
MSP site is set to active.
3.6.3 Object States
Every object has a state. For instance, a student may be in the studying state, and
a worker may be in the working state. A MaROS object may be in one of
nine possible states:
• created: This state is the initial state of ordinary objects.
• created_notnotified: This state is the initial state of relocatable objects. The
objects in this state can be activated on the MH. However, the relocate()
primitive may not be used in this state.
• created_notified: After a successful object notification phase, the state of the
object is set to created_notified. By the use of the relocate() primitive, that object
may be run at the MSP site.
• ready: When an object is activated, its state becomes ready. The objects in the
ready state may be suspended or deleted. If the object is a relocatable object, it
cannot be relocated once it is activated.
• sleeping: If the execution of an object is suspended temporarily, its state becomes
sleeping. In this state, the object can be resumed (activated again) or deleted.
• relocating: The relocation process transfers all the input parameters of a
relocatable object to the MSP site. Until the end of the transfer operation, the
state of the object is relocating. The next state may be either relocated or
deleted_notnotified.
• relocated: After the transfer of all the input parameters of the object, the state of
the object is set to relocated. The next state may only be deleted_notnotified.
• finished: This state is special to the relocatable objects. If a relocatable object on
the MSP finishes its execution (not being deleted by a system call), its output
values must be transferred back to the mobile host. During the output transfer
process, the state of the object is set to finished.
• deleted_notnotified: After the end of the output transfer operation or a delete
request from the parent object, the object changes its state to deleted_notnotified.
This state lasts until the end of the successful notification for the object deletion
process, and ends up with the death of the object.
It is not possible to create relocatable objects at the MSP site. That means some of the
states are not used at the MSP site, whereas some of them are not used at the MH site.
Table 3.2 shows all the possible states of a MaROS object depending on its type and
location. More detailed information can be found in [3].
Object States (columns: Relocatable MH, Relocatable MSP, Ordinary MH, Ordinary MSP)
ready √ √ √ √
sleeping √ √ √ √
created √ √ √
created_notnotified √
created_notified √
relocating √
relocated √
finished √
deleted_notnotified √ √
Table 3.2: Object States.
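The state descriptions above can be read as a small state machine. The sketch below encodes only the transitions stated in the text for a relocatable object as seen from the mobile host. It is an illustrative assumption rather than the MaROS implementation: apart from activate and relocate, the event names (notified, suspend, resume, transfer_done, delete) are hypothetical labels.

```python
# Transitions of a relocatable object at the mobile host.
# Event names other than "activate" and "relocate" are hypothetical labels.
TRANSITIONS = {
    ("created_notnotified", "notified"): "created_notified",
    ("created_notnotified", "activate"): "ready",
    ("created_notified", "activate"): "ready",
    ("created_notified", "relocate"): "relocating",
    ("ready", "suspend"): "sleeping",
    ("ready", "delete"): "deleted_notnotified",
    ("sleeping", "resume"): "ready",
    ("sleeping", "delete"): "deleted_notnotified",
    ("relocating", "transfer_done"): "relocated",
    ("relocating", "delete"): "deleted_notnotified",
    ("relocated", "delete"): "deleted_notnotified",
}

class IllegalTransition(Exception):
    """Raised when an event is not allowed in the current state."""

def next_state(state, event):
    """Return the state that follows `state` when `event` occurs."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise IllegalTransition(
            f"{event!r} is not allowed in state {state!r}"
        ) from None
```

The table makes the two restrictions from the text explicit: relocate is rejected in created_notnotified, and an object that has reached ready can no longer be relocated.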
3.7 Communication Structure
The communication design is one of the most important parts of a system like
MaROS. The performance of the system heavily depends on its communication
infrastructure.
Mobile systems mostly operate in a voluntarily disconnected state. Since
current communication primitives are designed for fixed networks, they easily
block a mobile host if there is no connection. In MaROS, a new communication
protocol, the Turkish Coffee Protocol (TCp), is designed to overcome the problems of
the mobile hosts. It runs over UDP, and provides virtual ports, non-blocking
primitives and message queues to support reliable disconnected operation.
The TCp protocol uses two system objects to manage queues and virtual ports: the
Communication Agent (CA) and the Virtual Port Mapper (VPM).
The CA is the creator and the manager of the outgoing message queue (send queue).
This queue contains the messages that are to be sent to the remote host. The use of
queues prevents the loss of messages in case the mobile host is disconnected from the
system. The other system agents and user programs may get information about the
connection status via the CA.
The VPM is the controller of the virtual ports. The VPM has two main tasks:
1. Virtual Port Assignment: The VPM maps virtual ports to physical ports.
When an object requests a port, the VPM allocates
a physical port. Then, it maps the port to a virtual port in its table, and returns the
virtual port number to the object. The requesting object may
ask for any virtual port or for a specific virtual port. The VPM provides services for
both types of requests.
2. Virtual Port Reservation: An object may need to reserve a port for its
subobjects. The system agents use this service frequently to save time.
The object reserves a virtual port and is given a key for accessing that port later.
Then that object, or any object that knows the port number and the key, may use the
reserved port. Section 4.3.2 explains the process in more detail.
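The two VPM tasks listed above can be sketched as follows. This is a simplified illustration under assumptions, not the TCp implementation: the class and method names, the starting physical port number, and the use of a random hexadecimal string as the key are all hypothetical, and no real sockets are opened.

```python
import secrets

class VirtualPortMapper:
    """Toy mapper from virtual port numbers to (simulated) physical ports."""

    def __init__(self, first_physical=50000):
        self._next_physical = first_physical  # hypothetical physical port range
        self._next_virtual = 1
        self._table = {}      # virtual port -> physical port
        self._reserved = {}   # virtual port -> access key

    def _allocate(self, virtual=None):
        if virtual is None:
            # Any free virtual port will do.
            while self._next_virtual in self._table:
                self._next_virtual += 1
            virtual = self._next_virtual
        elif virtual in self._table:
            raise ValueError(f"virtual port {virtual} is already in use")
        self._table[virtual] = self._next_physical
        self._next_physical += 1
        return virtual

    def assign(self, virtual=None):
        """Virtual Port Assignment: map a physical port to any or a specific virtual port."""
        return self._allocate(virtual)

    def reserve(self):
        """Virtual Port Reservation: return (virtual port, key) for later use."""
        virtual = self._allocate()
        key = secrets.token_hex(8)
        self._reserved[virtual] = key
        return virtual, key

    def open_reserved(self, virtual, key):
        """Return the physical port if the caller knows both the port number and the key."""
        if virtual not in self._reserved or self._reserved[virtual] != key:
            raise PermissionError("wrong key, or the port is not reserved")
        return self._table[virtual]
```

Because the reservation returns the port number before anything uses the port, the reserving object can hand the number and key to a subobject that does not exist yet.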
3.8 System Recovery
The mobile hosts have a limited power supply. They have to be shut down voluntarily
when their batteries need recharging. Because of this limitation, a mobile host
user is reluctant to run a long-running process. However, when a system shutdown is
necessary, a program may signal all the programs in the system to create their
recovery files. In MaROS, this is exactly the task of the Recovery Agent. It
detects and coordinates the shutdown process. More detailed information may be found
in Chapter 5.
4 Notification Design
4.1 Introduction
This chapter briefly explains the design of the Notification process of the MaROS
system. In MaROS, there are two sites that need to communicate with each other: The
Mobile Host and the Mobile Host Service Provider. The user at the MH may need to
transfer his/her MaROS programs (objects) to the MSP site, and may want to delete
them after running those programs and retrieving the results. The transfer operation
creates copies of the chosen objects at the MSP site, and deletion operation deletes
those copies. In both operations, the MSP site is said to be notified. There is a special
system agent that is responsible for the Object Transfer and the Object Deletion
operations. This agent is called the Notification Agent, and the term Notification
covers both Object Transfer and Object Deletion.
4.2 Notification in General
A relocatable object may only be created at the MH site and may only be transferred
to the MSP. The initial state of a relocatable object is created_notnotified. In this state,
a MaROS user has two possible choices: 1) running the object at the MH site, or 2)
running the object at the MSP site. If the first choice is selected, the user may run the
object whenever necessary. However, once the object is running, the user may not
interrupt its execution and relocate it to the MSP site (in the current design and
implementation, the migration of running objects is not supported). If the user
selects the second choice, he/she should wait for the object state to change to
created_notified. When a relocatable object is created, this state change is
automatically initiated by MaROS. The Notification Agent is responsible for the
transfer of the Java class files of the relocatable object to the MSP site. Once the
files are transferred, the object may also be run at the MSP site. The Notification
Agent receives Notification requests² directly from the Handler (the worker thread of
the Object Manager), and informs it whether or not the transfer or deletion job was
successful. If the notification job is successful and it is an object transfer,
then the OM changes the state of the relocatable object to created_notified.
In order to run the object at the MSP site, the relocate() primitive is used. The relocate()
primitive activates the Migration Agent, and the MA transfers the command line
parameters that are necessary to run the object to the MSP site.
² A Notification request is either an object transfer or an object deletion request.
4.3 Detailed Design
The NA uses the reliable communication primitives of the Turkish Coffee Protocol (TCp). It
receives information directly from the Handler thread of the local Object Manager.
This information contains the type of the task (the NA may handle more than
one type of task). After a successful transfer of the object, the user may want to delete the
object. It is the NA's responsibility to delete the object-specific files after the arrival
of a delete request.
The Notification Agent is a very busy agent, and it should be available whenever it is
needed. Transferring and deleting objects are very time-consuming tasks, and they
require extensive work. If the request is a transfer or delete request (if it is
an invalid packet, the NA does nothing and ignores the packet), the NA immediately
reserves a virtual port from the Virtual-Port-Mapping service of the Communication
Agent, and creates a Notifier. The Notifier is a system object that is responsible for
handling a transfer or delete request. When creating the Notifier, the NA passes all the
necessary information to it as parameters. This information contains the
reserved virtual port number, the type of the task, the Object ID and name, and the names
of the Java class files to be transferred (if the job type is object transfer). The Notifier
uses the reserved virtual port for communicating with other system objects. There may be
more than one Notifier running at a time. Each of them is responsible for one particular
transfer or delete request.
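The dispatching behaviour described above, where the NA hands each valid request to its own Notifier and stays available for the next one, can be sketched with threads. This is only an illustration under assumptions: the request dictionary layout, the class and method names, and the reserve_port callable (a stand-in for the port reservation service) are hypothetical, and the Notifier merely records its job instead of moving files.

```python
import threading

VALID_TASKS = {"transfer", "delete"}

class NotificationAgent:
    """Toy NA: one Notifier thread per valid notification request."""

    def __init__(self, reserve_port):
        self.reserve_port = reserve_port  # stand-in for the port reservation service
        self.results = []                 # (object_id, task, port) recorded by Notifiers
        self._lock = threading.Lock()

    def handle(self, request):
        """Start a Notifier for a valid request; ignore anything else."""
        if request.get("task") not in VALID_TASKS:
            return None  # invalid packet: do nothing
        port = self.reserve_port()
        notifier = threading.Thread(target=self._notify, args=(request, port))
        notifier.start()
        # The NA returns at once and is free to accept the next request.
        return notifier

    def _notify(self, request, port):
        # A real Notifier would transfer or delete class files; this one only records the job.
        with self._lock:
            self.results.append((request["object_id"], request["task"], port))
```

Reserving the port before the Notifier is created means the port number is known to the NA up front, so it can be passed to the Notifier as a parameter.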
4.3.1 Peer Entities
Four of the system agents are located on both the MSP and the MH. For instance,
the Notification Agent at the MH has a peer at the MSP site. In the rest of the text, the
NAMH refers to the Notification Agent at the Mobile Host, and the NAMSP refers to the
Notification Agent at the MSP site. When the NAMH is creating its Notifier, it also
informs its peer of the new incoming request. As depicted in Figure 4.1, the NAMSP
reserves a virtual port number and creates a peer Notifier at the MSP site. When
creating the Notifier, the NAMSP passes it all the necessary information about the notification
request and its peer Notifier, which is located at the MH site. This information
contains the virtual port number of the NotifierMH, the type of the notification request,
the Internet address of the Mobile Host, the reserved virtual port number and the virtual
port key for using that port.
O: MaROS object; OM: Object Manager; H: Handler; NA: Notification Agent; N: Notifier; VPM: Virtual Port Mapper
Figure 4.1: Initial phase of a Notification process
Figure 4.1 shows the initial phase of the Notification process. At this stage, both the
NAMH and the NAMSP assign the given task to the Notifier peers and start waiting for
new requests. In the figure, the numbers on arrows indicate the order of events.
4.3.2 Reserving Ports
The Notification process is time-critical, and everything should be
organized efficiently. The first priority of the NAMH is to inform its peer
(NAMSP) as soon as possible. Any latency in this step delays the Notification
process. There are two design choices for the initial phase of the Notification process. The
first approach is simpler than the second one; however, it is not optimal. Figure
4.2 illustrates the first approach.
tnc: Time of Notifier creation at MH; tnc’: Time of Notifier creation at MSP; ti: Time for informing peer NA; tm: Time for informing OMMSP
Figure 4.2: A Notification design approach (Initial Phase).
In this approach, the NAMH creates its Notifier before informing its peer. The NAMH
and the NAMSP run on different machines and could carry on their work in parallel,
but this approach forces a sequential execution: the NAMSP sits idle until the
NotifierMH informs it. In Figure 4.2, tnc denotes the time for the Notifier creation, ti
denotes the time for informing the NAMSP, tnc' denotes the time for the NotifierMSP
creation, and tm denotes the time for informing the OMMSP.
Figure 4.3 illustrates the second, currently used approach. The NAMH informs the
NAMSP immediately. In this case, the two agents work in parallel, which yields
shorter Notification times than the first approach. The two approaches are compared
in Figure 4.4.
tnc: time of Notifier creation at MH; tnc': time of Notifier creation at MSP; ti: time for informing peer NA; tm: time for informing OMMSP
Figure 4.3: Current design of the Notification process (Initial Phase).
In the second approach, however, the NAMH has to inform its peer about its
NotifierMH, which has not been created yet. The NAMH must know the virtual port number
of its NotifierMH before that NotifierMH is created. The same holds at the MSP site
for the NAMSP. As a result, the NAMH cannot inform its peer before creating its
Notifier, and it should not create its Notifier before informing its peer. This
deadlock is solved by a service called Virtual Port Reservation (VPR), provided by
the Virtual Port Mapper (VPM) system object of the communication layer. The VPR is
not an NA-specific service: it is also used by the Handler objects of the Object
Manager, and it may be used by any other user program. The VPR enables the NA to
reserve port numbers. The NA requests a reservation and receives a port number
together with a key for using that reserved port. This adds a small amount of time
to the Notification process, shown in Figure 4.4 as tpr (the time for virtual port
reservation). The key provides security in the reservation process: only the
Notifier holding this key may obtain the reserved port. Consequently, the Notifier
uses a different method when creating a connection: it passes the virtual port
number and the virtual port key to the VPM to obtain the reserved port.
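The reservation idea described above can be illustrated with a minimal Java sketch.
It is not the MaROS implementation: the class name VirtualPortMapper, the methods
reserve() and claim(), the starting port number and the 64-bit key are all
assumptions made for illustration. It only shows how a port can be handed out
before its owner exists and later claimed with the matching key.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a Virtual Port Mapper with keyed port reservation. */
class VirtualPortMapper {
    /** A reserved port number together with the key needed to claim it. */
    static final class Reservation {
        final int port;
        final long key;
        Reservation(int port, long key) { this.port = port; this.key = key; }
    }

    private final Map<Integer, Long> reserved = new HashMap<>(); // port -> key
    private final SecureRandom random = new SecureRandom();
    private int nextPort = 1000; // arbitrary starting point (assumption)

    /** Reserves a fresh virtual port and returns it with its key. */
    synchronized Reservation reserve() {
        int port = nextPort++;
        long key = random.nextLong();
        reserved.put(port, key);
        return new Reservation(port, key);
    }

    /** Claims a reserved port; succeeds only once and only with the right key. */
    synchronized boolean claim(int port, long key) {
        Long expected = reserved.get(port);
        if (expected == null || expected != key) {
            return false;
        }
        reserved.remove(port);
        return true;
    }
}
```

In this sketch the agent would call reserve() first, send the port number to its
peer, and pass both the port number and the key to the Notifier it creates
afterwards; the Notifier then calls claim() when it opens its connection.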
tnc: time of Notifier creation at MH; tnc': time of Notifier creation at MSP; ti: time for informing peer NA; tm: time for informing OMMSP; tpr: time for port reservation
Figure 4.4: The comparison of two approaches.
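The comparison can also be written as a rough timing model. The expressions below
are a sketch rather than results from the measurements: the symbols T_first and
T_second are introduced here, and the model assumes that the steps on each machine
run back to back, that the MSP side also spends tpr on its own reservation in the
second approach, and that network and scheduling effects are folded into ti.

```latex
\begin{align*}
T_{\text{first}}  &= t_{nc} + t_{i} + t_{nc'} + t_{m} \\
T_{\text{second}} &\approx t_{pr} + t_{i} + \max\bigl(t_{nc},\; t_{pr} + t_{nc'} + t_{m}\bigr)
\end{align*}
```

Under these assumptions the second approach is faster whenever the work that
overlaps on the two machines outweighs the added reservation time.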
4.3.3 Notifier Tables
Notifiers are the worker threads of the Notification Agent. Five different
tables are held and used by the Notifiers: Notifier Object Transfer Table (NOTT), Partial
Object Transfer Table (POTT), Class Dependency Table (CDT), Class Replica Table
(CRT), and Notifier Information Table (NIT).
4.3.3.1 Notifier Object Transfer Table (NOTT)
This table holds the names and full paths of the class files to be transferred,
together with the lengths of these files. The NOTT is always used whenever an
object transfer is in progress.
4.3.3.2 Partial Object Transfer Table (POTT)
The POTT is used when a partial transfer operation is in progress. An object may be
only partially transferred to the MSP site because of factors such as a system
shutdown or a link failure. When the object transfer operation is interrupted,
retransferring all the already-transferred files at the next startup is not ideal:
it wastes time and system resources. MaROS tries to optimize the transfer operation
by keeping an additional table called the POTT. The POTT keeps the indices of the
partial files in the NOTT. It additionally keeps the current length of each partial
file. This information is used to transfer only the missing parts of the partial
files.
NOTT
idx  Filename with full path  File length
1    /MaROS/test.class        17000
2    /MaROS/sample.class      25000
3    /MaROS/rect.class        1007

POTT
idx  NOTT index  Current length
1    1           4096
2    2           8192
3    3           0

Table 4.1: Sample NOTT and POTT instances.
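The relation between the two tables can be sketched in Java as follows. The class
and field names are hypothetical; the data is the sample from Table 4.1. The sketch
only shows how a POTT row points into the NOTT and how the number of bytes that
still has to be sent for a partial file could be derived.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a NOTT and a POTT held by one Notifier. */
class TransferTables {
    /** One NOTT row: full path and total length of a class file. */
    static final class NottEntry {
        final String path;
        final long length;
        NottEntry(String path, long length) { this.path = path; this.length = length; }
    }

    /** One POTT row: 1-based NOTT index and the bytes already present at the MSP. */
    static final class PottEntry {
        final int nottIndex;
        final long currentLength;
        PottEntry(int nottIndex, long currentLength) {
            this.nottIndex = nottIndex;
            this.currentLength = currentLength;
        }
    }

    final List<NottEntry> nott = new ArrayList<>();
    final List<PottEntry> pott = new ArrayList<>();

    /** Bytes that still have to be transferred for the given partial file. */
    long remainingBytes(PottEntry p) {
        NottEntry file = nott.get(p.nottIndex - 1); // table indices start at 1
        return file.length - p.currentLength;
    }

    /** Builds the sample instances of Table 4.1. */
    static TransferTables sample() {
        TransferTables t = new TransferTables();
        t.nott.add(new NottEntry("/MaROS/test.class", 17000));
        t.nott.add(new NottEntry("/MaROS/sample.class", 25000));
        t.nott.add(new NottEntry("/MaROS/rect.class", 1007));
        t.pott.add(new PottEntry(1, 4096));
        t.pott.add(new PottEntry(2, 8192));
        t.pott.add(new PottEntry(3, 0));
        return t;
    }
}
```

With the sample data, 12904 bytes of test.class, 16808 bytes of sample.class and
all 1007 bytes of rect.class remain to be transferred.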
4.3.3.3 Class Dependency Table (CDT) and Class Replica Table (CRT)
The CDT and its sub-table, the CRT, are used only by the Notifiers on the MSP.
Figure 4.5 depicts the scope of all tables used by the Notification Agent and its
Notifiers. Each NOTT and POTT is created and used by only one Notifier and, as the
figure shows, each NOTT and POTT has a copy at the peer Notifier.
Figure 4.5: Scope of the Notifier tables.
On the other hand, the NIT, the CDT and the CRT are global tables used and updated
by all Notifiers. For this reason, access to these tables must be synchronized to
avoid the readers-writers problem (a classic synchronization problem).
CDT
OID  # of Classes  File 1  File 2  File 3  …
28   3             1       2       3
29   2             2       3
…    …             …       …       …       …

CRT
idx  Class Name            # of occurrences
1    /gurhan/Circle.class  1
2    /gurhan/Rect.class    2
3    /gurhan/Test.class    2
…    …                     …

Table 4.2: Sample CDT and CRT instances.
The sample tables above illustrate the task of each table. The CDT keeps the object
records: it knows which object depends on which classes, and it refers to the CRT
for the class names. For example, in the sample CDT, the object Circle (OID 28) has
three classes, which are referenced as the 1st, 2nd, and 3rd entries of the CRT. A
class file may be used by more than one MaROS object. The CRT records how many
objects use each dependent class. From the sample CRT, it is seen that class Rect
and class Test are used by two objects, whereas class Circle is used by only one
object. Moreover, all these classes were uploaded from the mobile host named
gurhan.
If an object has to be deleted, the CDT is searched first and all of its dependent
classes are located in the CRT. The occurrence number of each of these classes is
decremented by 1 in the CRT. If the occurrence number of a class file drops below 1,
that class file may be physically deleted, since no other object is using it.
(Another design option is to keep the class file, since a new object may need it
very soon. In that case, however, a ttl (time-to-live) value may need to be
attached to that record. The CRT is then checked periodically, and if any ttl
reaches zero, the corresponding class file is deleted.)
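The deletion rule above amounts to reference counting over the class files. The
Java sketch below is an illustration only: the class ClassTables and its methods
are hypothetical, and for brevity the CDT stores class names directly instead of
CRT indices. The methods are synchronized because the text states that these
global tables are shared by all Notifiers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the CDT/CRT pair as a reference-counted class store. */
class ClassTables {
    private final Map<Integer, List<String>> cdt = new HashMap<>(); // OID -> class files
    private final Map<String, Integer> crt = new HashMap<>();       // class file -> occurrences

    /** Records an object and increments the occurrence number of each of its classes. */
    synchronized void addObject(int oid, List<String> classFiles) {
        cdt.put(oid, new ArrayList<>(classFiles));
        for (String c : classFiles) {
            crt.merge(c, 1, Integer::sum);
        }
    }

    /** Removes an object and returns the class files that no object uses any more. */
    synchronized List<String> deleteObject(int oid) {
        List<String> unused = new ArrayList<>();
        List<String> classFiles = cdt.remove(oid);
        if (classFiles == null) {
            return unused; // unknown object: nothing to do
        }
        for (String c : classFiles) {
            int left = crt.merge(c, -1, Integer::sum);
            if (left < 1) {
                crt.remove(c);
                unused.add(c); // candidate for physical deletion
            }
        }
        return unused;
    }

    /** Current occurrence number of a class file (0 if it is not in the CRT). */
    synchronized int occurrences(String classFile) {
        return crt.getOrDefault(classFile, 0);
    }
}
```

With the sample data of Table 4.2, deleting object 28 leaves Rect and Test with one
occurrence each and reports only the Circle class file as unused.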
4.3.3.4 Notifier Information Table (NIT)
As its name implies, this table holds information about all Notifiers. There are two
versions of the NIT: the MH version and the MSP version. The MSP version holds the
Mobile Host Identifier (MID) as an additional field. Each Notifier records itself in
the table at the beginning of its execution and deletes itself at the end of its
execution. Like the CDT and the CRT, it is a synchronized table. The NITMH consists
of three fields: the Object ID of the Notifier, the Object ID of the object that the
Notifier deals with, and the notification type. The notification type may be
Creation or Deletion. The NITMSP additionally contains the MID field, which
identifies which objects belong to which Mobile Hosts. This table is mainly used by
the Object Deletion process (Section 4.3.5).
NITMSP
Notifier OID  OID  MID          Type of Notification
3090          980  00000000001  C
3095          993  00000000001  C
3110          765  00000000001  D
…             …    …            …

NITMH
Notifier OID  OID  Type of Notification
1050          980  C
1067          993  C
1056          765  D
…             …    …

Table 4.3: Sample NIT instances.
4.3.4 Object Transfer (Object Creation)
The object transfer process copies all the Java class files of the object to
the MSP site. It is a unidirectional process: object transfer is only possible from
the MH to the MSP. The process is initiated automatically by the Object Manager
when an object is created as relocatable. Figure 4.6 shows the entire object
transfer scenario.
Figure 4.6: Object transfer (creation) process
The Object Manager creates a Handler (3), and the Handler at the Mobile Host (HMH)
sends a Notification request to the Notification Agent (4). The message format of the
request is as follows:
pnum | rt | oid | path | ncf | cfinfo | opts
Figure 4.7: Message format of Notification request.
This message is sent from the HMH to the NAMH. The first field, pnum, contains the
virtual port number of the HMH; the NotifierMH uses it to connect to the HMH. The
second field, rt, is the request type. As indicated before, a Notification request
may be either a Create (Transfer) or a Delete request. The third field, oid,
contains the object identifier of the object that will be notified. The next three
fields are used only when the request is an Object Creation request. The path field
contains the Java CLASSPATH of the object. The ncf field contains the number of
class files to be transferred. The next field, cfinfo, holds the information about
the class files; Figure 4.8 shows this part of the message in detail. The final
field, opts, is reserved for future use: it is included in the message format to
hold any options that may be added to the Notification process later.
name | len | cdt | owner | opts | # | … | #
Figure 4.8: cfinfo field in detail.
The cfinfo field (the sixth field of the Notification request message) contains
information about the class files that the object has. Each file record has five
fields, and file records are separated with # signs. The first field contains the
name of the class file. This information is combined with the path value to locate
the class file. The next field is the length of the file. The third field contains
the creation date and time of the file. This information is planned to be used to
detect different versions of the objects. The fourth field holds the owner of that
file in the system. In the current design, there may be only one user running MaROS
on an MH, and this field is set to the user maros. The last field is again reserved
for future use and is currently null.
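A parser for the cfinfo field could look like the Java sketch below. The text only
states that file records are separated with # signs; the comma used here to
separate the five fields inside a record, the date format in the usage example and
all class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of one cfinfo record and a parser for the whole field. */
class ClassFileInfo {
    final String name;  // class file name, combined with path to locate the file
    final long len;     // file length
    final String cdt;   // creation date and time (format is an assumption)
    final String owner; // owner of the file, "maros" in the current design
    final String opts;  // reserved for future use, currently empty

    ClassFileInfo(String name, long len, String cdt, String owner, String opts) {
        this.name = name;
        this.len = len;
        this.cdt = cdt;
        this.owner = owner;
        this.opts = opts;
    }

    /** Parses records separated by '#'; the ',' field separator is an assumption. */
    static List<ClassFileInfo> parse(String cfinfo) {
        List<ClassFileInfo> records = new ArrayList<>();
        for (String rec : cfinfo.split("#")) {
            if (rec.isEmpty()) {
                continue;
            }
            String[] f = rec.split(",", -1); // -1 keeps a trailing empty opts field
            if (f.length != 5) {
                throw new IllegalArgumentException("bad cfinfo record: " + rec);
            }
            records.add(new ClassFileInfo(f[0], Long.parseLong(f[1]), f[2], f[3], f[4]));
        }
        return records;
    }
}
```

For example, parsing "test.class,17000,1998-05-01T10:00,maros,#" yields a single
record with an empty opts field.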
The NAMH reserves a virtual port (5) for its (currently nonexistent) Notifier and
writes this port number to the first field of the message. Then it adds the MID
(Mobile Host Identifier) and a 48-bit security code (see footnote 3) to the head of
the message, forwards it to its peer (6), and creates a Notifier to deal with the
request (7). The Notifiers use a file transfer protocol similar to FTP (File
Transfer Protocol). Since TCP primitives are used, the Notifiers do not deal with
packet sequencing or error correction. When the NAMSP receives the request, it
reserves a virtual port number (8) for its (currently nonexistent) Notifier and
writes this port number to the pnum field, which is now the third field of the
message because it follows the MID and the security code. This is very similar to
the job of the NAMH. After replacing the pnum field in the message, it forwards the
message to the OMMSP (10). The OMMSP creates a Handler (HMSP) for informing the
NotifierMSP (11). The HMSP checks the file structure and creates a message for the
NotifierMSP (12).
3 In the current design, this security code is reserved for future work. It does not have any function yet.
tt | ncpf | pcfinfo | opts
Figure 4.9: Message format of the message that is sent from HMSP to NotifierMSP.
This message contains all the necessary information related to the object transfer.
The first data field contains the type of transfer. There are three types of object
transfer: Full, Partial, and No Need, carried in the tt field shown in Figure 4.9.
When the HMSP checks the object files in the file system of the MSP, it may find
that none of the object files exist there. In that case, it sets the object transfer
type to Full transfer. If some of the object files are in the system, or some of
them are only partially in the system, the object transfer type is set to Partial
transfer. In the third case, all of the object files are already in the system, so
there is no need for an object transfer and the transfer type is No Need. A brief
explanation of these types is as follows:
4.3.4.1 Full Transfer
In Full Transfer mode, the HMSP only sends an F (indicating Full transfer) in the
message. When the NotifierMSP receives this message, it knows that all the files
in the NOTT should be transferred (the NOTT is created and filled at the beginning
of the Notifier execution). The NotifierMSP forwards the packet to the NotifierMH
(13), and the NotifierMH starts transferring the files one by one (14).
4.3.4.2 Partial Transfer
In Partial transfer mode, the HMSP sends a P (indicating Partial transfer) followed
by the number of partial files (ncpf). The pcfinfo field contains the information
about the partial class files. Figure 4.10 shows the pcfinfo field in detail. It
contains the NOTT indices and the lengths of those partial files at the MSP site
(if a file does not exist, its length is 0). The file records are again separated
with # signs. The NotifierMSP forwards this packet to the NotifierMH (13). Both of
them create their POTT, and the NotifierMH starts transferring only those partial
files (14).
NOTT_index | len | opts | # | … | #
Figure 4.10: pcfinfo field of the message in Figure 4.9.
4.3.4.3 No Need To Transfer
In this case, the HMSP sends an N (indicating No transfer is needed) in the message.
When the NotifierMSP receives this message it decides that all the files already exist in
the MSP. It simply forwards this message to the NotifierMH (13).
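The choice among the three transfer types described above can be sketched in Java
as follows. The class TransferPlanner, its method names and the use of plain arrays
are assumptions for illustration; the sketch compares the expected file lengths
from the NOTT with the lengths found at the MSP (0 for a missing file).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of how a Handler at the MSP could pick the transfer type. */
class TransferPlanner {
    enum TransferType { FULL, PARTIAL, NONE }

    /**
     * expected[i] is the length of file i in the NOTT; present[i] is the number of
     * bytes of that file already at the MSP (0 if the file does not exist).
     */
    static TransferType decide(long[] expected, long[] present) {
        boolean anyPresent = false;
        boolean allComplete = true;
        for (int i = 0; i < expected.length; i++) {
            if (present[i] > 0) {
                anyPresent = true;
            }
            if (present[i] < expected[i]) {
                allComplete = false;
            }
        }
        if (allComplete) {
            return TransferType.NONE;
        }
        return anyPresent ? TransferType.PARTIAL : TransferType.FULL;
    }

    /** 1-based NOTT indices of the files that are missing or incomplete. */
    static List<Integer> incompleteIndices(long[] expected, long[] present) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < expected.length; i++) {
            if (present[i] < expected[i]) {
                indices.add(i + 1);
            }
        }
        return indices;
    }
}
```

In this sketch, the indices returned by incompleteIndices() together with the
present lengths correspond to the records that would go into the pcfinfo field.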
When the transfer operation ends, the NotifierMSP signals the HMSP, and the
NotifierMH signals the HMH, reporting success or failure of the object transfer
operation (15). Figure 4.11 displays the format of the message that is sent from
the Notifiers to the Handlers. The first field contains a Success or a Fail,
indicating the result of the Notification process. The second field contains the
object identifier of the object that was just notified or failed to be notified. If
the operation finishes successfully, the HMSP and the HMH update their tables, and
the state of the object at the MH site changes from created_notnotified to
created_notified.
result | oid
Figure 4.11: Message format of the message sent from Notifiers to Handlers.
4.3.5 Object Deletion
After running the object and obtaining the results, the user may want to delete the
object. To delete the object, the deleteObject() primitive of the OM is used.
Figure 4.12 illustrates the object deletion process.
Figure 4.12: Object deletion process
When the OM receives the deletion request (1), it immediately checks the type of the
object (2). If the object is an ordinary (non-relocatable) object, the OM may
delete it immediately; ordinary objects may be deleted regardless of their state.
If the object is relocatable, however, the OM checks the state of the object. The
OM creates an HMH (3), and the HMH interacts with the other system agents to
complete the object deletion.
The object deletion process is very similar to the object transfer process. The
Notifiers are created at both sites (7, 9), and the OMMSP is informed as in object
transfer (10). However, each Notifier checks the Notifier Information Table (NIT) to
learn whether that object has already been transferred (8, 10). If there is a
Notifier still dealing with the transfer of that object, it is stopped by the
Notifier that is in charge of deleting the object. Meanwhile, the OMMSP creates an
HMSP to deal with the deletion (11). The HMSP checks the Object Table (OT) and, if
the object is present, removes it from the OT (12). Then it informs the NotifierMSP
whether the deletion was successful (13). If the object has been deleted from the
OT successfully, the NotifierMSP tries to delete all the class files of that
object. First, it deletes the object record from the CDT and the CRT (this step is
explained in Section 4.3.3.3). Then it sends the result of the deletion process to
the NotifierMH (14). Finally, the NotifierMH simply forwards the packet to the HMH
(15). The first field of this packet contains an S or an F, indicating success or
failure of the deletion process. The second and last field contains the object
identifier of the object, which provides security in the Notification process.
5
System Recovery Design
5.1 Introduction
System recovery is one of the most crucial parts of MaROS. There are two possible
types of recovery: heavy-weight recovery and light-weight recovery. The first type
deals with all types of unexpected failures, such as system lock-ups and hardware
failures. This type of recovery is not handled in the current design and
implementation. The second type deals with expected system interruptions, such as a
shutdown request by the user. Since a shutdown request is detected by the system, a
negligible amount of time may be spent backing up some crucial data. This enables
the system to continue its execution at the next system startup as if there had
been no interruption. Backing up all the crucial data is not straightforward, and
it must be coordinated carefully. MaROS uses a special agent to control the
recovery process: the Recovery Agent (RA). The Recovery Agent provides a controlled
shutdown, which is called System Suspension. When MaROS is rebooted, every object
continues its execution from the point where it was suspended. Without the RA, all
interrupted processes would have to be restarted from the beginning, which wastes
resources and time.
5.2 Recovery Table (RT) and Recovery Tree Structure
Java does not support signal handling. This deficiency led the MaROS group to
implement their own signal handling backbone. The Recovery Agent and the
Recovery Table are the two main components of the signal handling structure. The
Recovery Agent keeps track of the Recovery Table (RT) for handling recoverable
objects. The RT holds all recoverable objects and their subobjects (if there are any).
This table actually holds a recovery tree structure (see footnote 4) as depicted in Figure 5.1. The root
of the tree is the Recovery Agent. Since each system agent controls a crucial part of
the system, all of them are recoverable.
The RA creates and handles a Recovery Tree structure with the help of the RT. If a
MaROS object has subobjects, the programmer should decide whether these
subobjects need recovery or not. For instance, the Notifiers are subobjects that
rely on recovery. They transfer class files from the MH to the MSP. In case of an
interruption, the whole transfer operation should not have to be restarted from the
beginning.
4 Recovery Table does not contain the root (RA) of the Recovery Tree.
Figure 5.1: A sample Recovery Table instance and its corresponding tree structure.
One of the main tasks of the RA is to detect shutdown requests and initiate the
shutdown process. Before a proper shutdown, all recoverable objects should be
signalled and given a chance to write their crucial data to disk. The RA signals all
recoverable objects by traversing the Recovery Tree using the RT. Traversing the
Recovery Tree is a cooperative task. The RA initiates the process, and the OM
continues. All recoverable objects signal their subobjects by using the RT class
methods and wait until the subobjects finish their recovery process. Then, they do
their recovery work and signal their parent object.
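The cooperative traversal described above can be sketched in Java as a post-order
walk of the tree. The class RecoverableNode and its methods are hypothetical, and
the plain recursive call stands in for the real protocol, in which a parent signals
its subobjects through the RT and waits for them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a node in the Recovery Tree. */
class RecoverableNode {
    final String name;
    final List<RecoverableNode> children = new ArrayList<>();

    RecoverableNode(String name) { this.name = name; }

    /** Adds a subobject and returns this node so that calls can be chained. */
    RecoverableNode add(RecoverableNode child) {
        children.add(child);
        return this;
    }

    /**
     * Signals all subobjects first, waits for them (here: the recursive call
     * returns), then does its own recovery work and reports to its parent.
     */
    void shutdown(List<String> log) {
        for (RecoverableNode child : children) {
            child.shutdown(log);
        }
        log.add(name); // own recovery work, e.g. writing crucial data to disk
    }
}
```

For a tree in which the OM has the NA as a subobject and the NA has two Notifiers,
the recovery work is done in the order Notifier 1, Notifier 2, NA, OM.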
5.3 Recoverable Objects vs. Unrecoverable Objects
Recoverable objects are MaROS objects that are recorded in the RT. This table
contains the recoverable objects and their subobjects. This means that if a
subobject of an object is recoverable, the object itself should also be
recoverable. The best examples of recoverable objects are system objects such as
the Notification Agent and its Notifiers. They are all recoverable, and they are
recorded in the RT during system startup. A MaROS programmer should plan and decide
which of his/her objects should be recoverable. Unfortunately, choosing the objects
that are eligible to be recoverable is not straightforward. Making an object
recoverable has some drawbacks. First of all, a recoverable object needs extra
code, and this additional code slightly decreases the execution speed. So, an
object that executes for only about five minutes may not be worth making
recoverable. However, if an object performs a very time-consuming calculation,
recovery may be needed: after many hours of calculation, it may be necessary to
shut down the system, and the object should be recoverable in order to survive
system shutdowns.
If an object is chosen as recoverable, it is written into the RT by the Object Manager
(or by its parent object if it is a subobject). A recoverable object should be signalled
for any incoming shutdown process. The object is signalled by setting the SS
(Shutdown Signal) field of the recoverable object from 0 to 1 in the RT.
Each recoverable object has a record in the RT. Each record has a Shutdown Signal
(SS) field that is initially 0. The objects should periodically check their SS
fields. An object knows that the system will be shut down soon if its SS field
becomes 1.
The shutdown process is hierarchical and decentralized. It is hierarchical because
some objects have to wait for other objects to enter the recovery state. It is
decentralized because each recoverable object is responsible for its own recovery.
The RA does not deal with how the recoverable objects recover themselves. It just
coordinates the proper shutdown process.
Periodically checking the SS field requires additional code inside the
recoverable objects. This additional code deals with the object execution states.
An object changes its execution state when it receives data, produces data or sends
data. After changing its execution state, the object should check the RT for its SS
field. It is up to the programmer to determine these execution states. States may
be fine-grained or coarse, depending on the program behavior. However, one must
understand that after the system startup, the object continues its execution from
the last state it visited. Unfortunately, it loses all calculations made after that
state.
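The polling scheme described above can be illustrated with a small Java sketch. It
is an assumption-laden simplification: the class RecoverableWorker is hypothetical,
an AtomicInteger stands in for the SS field in the RT, and an integer field stands
in for the data that a real object would write to disk.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a recoverable object that polls its SS field. */
class RecoverableWorker {
    final AtomicInteger ss = new AtomicInteger(0); // stands in for the SS field in the RT
    private int state = 0;       // last completed execution state
    private int savedState = -1; // stands in for the backed-up data

    /**
     * Runs one execution state and then checks the SS field.
     * Returns false once the object has suspended itself.
     */
    boolean step() {
        state++; // the real work of this execution state would happen here
        if (ss.get() == 1) {
            savedState = state; // back up the crucial data
            ss.set(0);          // tell the parent that the recovery work is done
            return false;
        }
        return true;
    }

    /** The state recorded at suspension, or -1 if the object was never suspended. */
    int savedState() {
        return savedState;
    }
}
```

If the signal is set while the object is between its second and third state, the
object only notices it after completing the third state, which matches the
behavior described in the text: the shutdown is detected at the next state change.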
This approach has some disadvantages. First of all, it increases the object code
size and the execution time. Its advantage, however, is obvious: an object may
continue its execution from the point where it was interrupted. Another big
advantage over a signal-based recovery approach is that it is more efficient and
nearly optimal. If a shutdown signal arrives between states, the object is not
aware of the shutdown event until it changes its state. This approach requires
slightly more shutdown time, but provides more efficient recovery.
5.4 System Shutdown
When the RA receives a shutdown request, it immediately sets the SS field of the OM
to 1. When periodically checking the RT, the OM notices that its SS field is 1, and
it immediately sets the SS fields of its subobjects to 1 in a hierarchical order.
By doing so, it signals the user objects and its Handlers, and then waits until all
of their SS fields become 0 again. When they have all become 0, the OM knows that
the user objects have properly finished their execution.
Figure 5.2: Setting the SS field.
Then it signals the system objects in a hierarchical order. It signals the
Notification Agent and the Migration Agent first. (Since there are no user objects
at this stage, the Object Manager does not receive any request such as create,
delete, or relocate, and therefore does not create new Handlers.) There is a design
choice at this stage. The OM could signal its Handlers only when the NA and the MA
are done with the shutdown process (their SS fields have become 0 again), instead of
signalling them simultaneously with the user objects. Handlers are the worker
threads of the Object Manager; they just send Notification requests to the
Notification Agent and then wait for the Notification results from the Notifiers.
This design alternative may optimize the shutdown process. In the current design,
however, the Handlers are signalled before the NA and the MA. The CA is the final
object to be signalled, since it is the lowest-layer agent, responsible for the
whole communication backbone of MaROS. Figure 5.4 illustrates the whole recovery
hierarchy. The numbers indicate the shutdown order; objects that share the same
number are processed in parallel.
Figure 5.3: The OM realizes the Shutdown Signal.
Figure 5.4: Flow of the Shutdown Signal
5.4.1 Creating Image Files
As indicated in Section 5.3, the RA uses a decentralized approach to recovery.
Each recoverable object is responsible for backing up its valuable data before
shutdown. Basically, each object creates a file that stores all the information
necessary to resume the object at the next startup. This file is called an image
file. After creating the file, the recoverable object sets its SS field to 0 again
to signal to its parent that it has finished its recovery work.
Each recoverable object is responsible for the content of its image file. The file
name is the MaROS object identifier of the object. For instance, an object with an
Object ID of 4 creates an image file named 4. It is ideal to keep all the image
files in the same directory.
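Writing and reading such an image file could be sketched in Java as follows. This
is not the MaROS code: the class ImageFiles, the use of java.util.Properties as the
file content and the java.nio.file API (which is newer than the Java version
available when MaROS was written) are assumptions. Only the naming rule, file name
equals object identifier, is taken from the text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical sketch of saving and restoring image files named after the OID. */
class ImageFiles {
    /** Writes the object's data to a file whose name is the object identifier. */
    static void save(Path imageDir, int oid, Properties data) {
        Path file = imageDir.resolve(Integer.toString(oid));
        try (OutputStream out = Files.newOutputStream(file)) {
            data.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the saved data, or null if the object has no image file. */
    static Properties restore(Path imageDir, int oid) {
        Path file = imageDir.resolve(Integer.toString(oid));
        if (!Files.exists(file)) {
            return null; // no image file: normal startup
        }
        Properties data = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            data.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return data;
    }
}
```

An object with OID 4 would call save(dir, 4, data) during shutdown and
restore(dir, 4) at the next startup; a null result means that there is nothing to
recover.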
5.5 System Startup
When the system starts up, the first object to run is the Object Manager
(the creator of all other MaROS objects). It checks whether there is an image file
in the image file directory. If there is none, the normal startup process is
initiated. If there is such an image file, however, the OM immediately enters its
recovery state. Each recoverable object has a recovery state that restores its
tables and data. In this state, the object reads its image file and restores its
environment.
5.5.1 Object Manager Startup Process
Figure 5.5 illustrates the startup process of the Object Manager. The OM restores
its Object Table and creates the Recovery Agent. Then it waits for the RA to finish
its recovery work. After its creation, the RA restores its RT and sets all SS
fields to 1. Then it signals the OM. The OM continues by creating the other system
and user objects. Thereafter, it waits for the SS fields of all objects to become
0. Each recoverable object checks whether its image file is present. If it is
present, the recoverable object starts by running its recovery state. When the
recovery state finishes, the object sets its SS field to 0. Once all the SS fields
that the OM is checking have become 0, the OM sets them to 1 again, signalling to
all of its objects that the recovery state has completed successfully, and then
starts its normal execution. When a recoverable object sees that its SS field is 1
again, it knows that everything is OK. It sets its SS field to 0 again and
continues its execution.
Figure 5.5: Startup of the Object Manager
5.5.2 Recovery Agent Startup Process
The very first job of the Recovery Agent after its creation is to check its image file. If
its image file does not exist, it continues its normal execution without entering any
recovery state. Otherwise, it enters a recovery state in which it restores its Recovery
Table and its state. Figure 5.6 depicts the startup process of the Recovery Agent. After
restoring the Recovery Table, the RA flushes SS fields of all objects with 1s. This
process is one of the key points of the recovery protocol. The Object Manager detects
recovered objects by checking their SS fields.
Figure 5.6: Startup of the Recovery Agent.
5.5.3 Startup of the System Agents
The startup processes of the other system agents are very similar to each other,
and the same approach may be used by all recoverable objects. In the current
design, each recoverable object checks whether there is a file whose name equals
its object identifier. If there is, the object first restores its tables and
variables by reading that image file. Then it continues from the state where it
left off. To make this possible, the original code is replicated for each state. An
alternative approach would be to use labels for jumping to the exact desired state.
However, Java does not allow this solution, since it does not support unconditional
jump operations. Figure 5.7 depicts the code replication process.
Figure 5.7: Code replication solution for the recovery process.
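One possible reading of the code replication idea is sketched below in Java. Since
Figure 5.7 is not reproduced here, the structure is an assumption: the class
ReplicatedStates, the three steps A, B and C and the integer state numbering are
all hypothetical. Each branch repeats the code of every state that still has to
run, so execution can resume after the last completed state without a jump.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of resuming execution by replicating code per state. */
class ReplicatedStates {
    /**
     * lastState is the number of states already completed before the shutdown
     * (0 for a normal startup). The returned list records which steps ran.
     */
    static List<String> run(int lastState) {
        List<String> trace = new ArrayList<>();
        if (lastState == 0) {
            trace.add("A"); // state 1
            trace.add("B"); // state 2
            trace.add("C"); // state 3
        } else if (lastState == 1) {
            trace.add("B"); // replicated code of state 2
            trace.add("C"); // replicated code of state 3
        } else if (lastState == 2) {
            trace.add("C"); // replicated code of state 3
        }
        return trace;
    }
}
```

In a real object the steps would be the actual work of each execution state, and
lastState would be read from the image file.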
5.5.4 Mutation of SS fields and RA Garbage Collector
A recoverable object may finish its execution at any time, and immediately removing
such an object from the RT is not a viable solution, since its parent object may be
blocked waiting on the suddenly dead child object. The parent objects should be
informed in some way. The mutation of the shutdown signal field sets the SS field
of the dead object to the extraordinary value of 2. This value indicates that the
owner of that SS field is dead.
The Recovery Agent runs a garbage collector in order to remove those dead objects
from the Recovery Table. The garbage collector periodically checks the SS fields of
all recoverable objects for mutation. If it finds a mutated field, it immediately
removes that object from the list in its parent object's record and from the
Recovery Table.
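One pass of such a garbage collector can be sketched in Java as follows. The class
RecoveryTableSketch and its methods are hypothetical, and the sketch does not
handle a dead object that still has live subobjects; it only shows the removal of
records whose SS field has mutated to 2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a Recovery Table with a garbage collection pass. */
class RecoveryTableSketch {
    static final int DEAD = 2; // mutated SS value marking a finished object

    static final class Record {
        final int oid;
        int ss = 0;
        final List<Integer> children = new ArrayList<>();
        Record(int oid) { this.oid = oid; }
    }

    private final Map<Integer, Record> records = new HashMap<>();

    /** Adds a record; parentOid is null for a top-level object. */
    synchronized void add(int oid, Integer parentOid) {
        records.put(oid, new Record(oid));
        if (parentOid != null) {
            records.get(parentOid).children.add(oid);
        }
    }

    /** Mutates the SS field of a finished object. */
    synchronized void markDead(int oid) {
        records.get(oid).ss = DEAD;
    }

    synchronized boolean contains(int oid) {
        return records.containsKey(oid);
    }

    synchronized List<Integer> childrenOf(int oid) {
        return new ArrayList<>(records.get(oid).children);
    }

    /** One garbage collection pass; returns the number of removed records. */
    synchronized int collect() {
        List<Integer> dead = new ArrayList<>();
        for (Record r : records.values()) {
            if (r.ss == DEAD) {
                dead.add(r.oid);
            }
        }
        for (Integer oid : dead) {
            records.remove(oid);
            for (Record r : records.values()) {
                r.children.remove(oid); // Integer argument: removes by value
            }
        }
        return dead.size();
    }
}
```

In the design described above the Recovery Agent would run such a pass
periodically, for example from a dedicated thread.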
6
Pilot System Implementation
6.1 Introduction
This chapter provides detailed information on the pilot implementation of the MaROS
environment. First, the language chosen to implement MaROS is discussed briefly.
Secondly, the pilot system implementation environment is described. Finally, the
implementation details of the Notification and Recovery modules of MaROS are given.
6.2 Pilot System Implementation Language
Java was chosen as the implementation language of the system. It is an
object-oriented programming language very similar to C++. It has many advantages
and, unfortunately, some disadvantages. The choice of Java as the programming
language of MaROS was one of the milestones in the design phase. Java is a very
powerful programming language because:
• Java is platform independent. Compiled code may be run on any hardware and OS
that provides a Java Virtual Machine, without any modification or even
recompilation. This was one of the main reasons for choosing Java as the
implementation language.
• Java does not support pointers. This property provides system security, since
users cannot corrupt crucial memory locations through pointer operations.
• Java does not support the disadvantageous properties of other object-oriented
languages. For example, many OOP languages support multiple inheritance,
which can sometimes lead to confusion or unnecessary complications. Java does
not.
• Java provides many pre-implemented utility classes such as the Hashtable, Stack
and Vector classes. This spares programmers from implementing and using their own
classes, which keeps programs simple and also enhances code readability. Of
course, programmers may extend these main classes by writing their own methods.
• Java has a simple Thread package. MaROS is a multithreaded system, and Java is
one of the ideal programming languages for implementing such a system.
Java also has some disadvantages. First of all, it is slower than its counterparts,
since it interprets the compiled byte code at runtime. For the sake of system
security, it does not give full control to the programmer. For instance, the lack
of pointers and of process management tools may make the work of system programmers
very frustrating.
In brief, the choice of Java is a trade-off between system performance on the one
hand and system portability and security on the other.
6.3 Pilot System Implementation Environment
The MaROS environment consists of one SUN UltraSparc1 and eight Intel PCs. The
SUN system runs the Solaris 2.5.1 operating system. Four of the PCs run Windows 95,
and the other four run the Turkuaz 1.0.3 GNU/Linux operating system. The
UltraSparc1 is used as the MSP, and the Windows 95 machines are used as MHs. The
Linux machines were used as local MSPs by each programmer for testing MaROS
modules. When new versions of the modules were ready, they were transferred to the
MSP.
6.4 Pilot System Implementation
Five main modules form MaROS when combined together. Each module is implemented as
a Java package. The five main packages are listed below:
• OM: The MaROS.OM package is the Object Manager of the system. It handles
objects and their operations.
• net: The MaROS.net package is responsible for the communication
infrastructure of MaROS.
• Notify: The MaROS.Notify package handles the object notification process.
• Migration: The relocation of the objects is managed by the MaROS.Migration
package.
• Recovery: The system recovery process after voluntary shutdowns is handled by
the MaROS.Recovery package.
There are also additional utility packages in MaROS. This thesis only covers the
Notify and the Recovery packages.
6.4.1 Notify Package
The MaROS.Notify package contains all the applications that are necessary for the
object notification process. The NotificationAgent is the main class of that package.
In the following subsections, the classes of the Notify package are overviewed.
6.4.1.1 Notify.NotificationAgent Class
This class is the heart of the notification process. It is implemented as a MaROS
thread that listens for notification requests coming from the Object
Manager. The class also contains the Notifier class and three additional table classes:
Notifier Information Table (NIT), Notifier Object Transfer Table (NOTT), and Partial
Object Transfer Table (POTT).
There are two versions (MH and MSP) of this class, since there are two types of
machines in MaROS. For system recovery, the MH version has additional
methods such as check_ShutdownSignal() and saveImage(). Since the job of the
Notification Agent is to listen for notification requests and to assign Notifiers to those
requests, it contains an infinite loop and the Notifier creation code.
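The listen-and-dispatch structure described above may be pictured with the following minimal sketch. It is not the MaROS source: the class name AgentLoopSketch, the STOP sentinel, the request strings and the use of java.util.concurrent (which is newer than the Java 1.1.6 used in this study) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: an agent that waits for notification requests and
// hands each one to a freshly created worker thread (a "Notifier").
class AgentLoopSketch {
    static final String STOP = "STOP"; // assumed sentinel that ends the demo loop

    // Runs the agent loop until STOP arrives; returns the requests that were handled.
    static List<String> runAgent(BlockingQueue<String> requests) {
        List<String> handled = new CopyOnWriteArrayList<>();
        List<Thread> notifiers = new ArrayList<>();
        try {
            while (true) {                          // the agent's "infinite" loop
                String request = requests.take();   // block until a request arrives
                if (request.equals(STOP)) {
                    break;
                }
                // Notifier creation: one worker thread per request.
                Thread notifier = new Thread(() -> handled.add(request));
                notifier.start();
                notifiers.add(notifier);
            }
            for (Thread t : notifiers) {
                t.join();                           // wait for all workers to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // preserve the interrupt status
        }
        return handled;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("create:42");
        queue.add("delete:7");
        queue.add(STOP);
        System.out.println(runAgent(queue).size() + " requests handled");
    }
}
```

In this sketch the agent itself never processes a request; it only accepts requests and delegates them, which keeps the loop free to accept the next request.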
6.4.1.2 Notify.NIT Class
The NotificationAgent class maintains the Notifier Information Table (NIT) to keep track
of its Notifiers. The NIT is a global table that is used by all the Notifiers. Therefore,
all of the NIT methods are synchronized, so that only one thread can execute a NIT
method at a time. The NIT has been implemented as a table structure mapped
onto arrays.
There are two versions of the table: the MH and the MSP versions. In the MSP
version, there is an additional field for keeping the MaROS identifier of the mobile
host that is the source of the notification request. Tables 6.1 and 6.2 display the
formats of the two versions of the NIT.
Notifier Object Identifier | Object Identifier | Notification Type
int                        | int               | char
Table 6.1: The format of the MH version of the NIT.
Notifier Object Identifier | Object Identifier | Notification Type | Mobile Host Identifier
int                        | int               | char              | 12 chars
Table 6.2: The format of the MSP version of the NIT.
The table size is 255 by default. That number determines the maximum number of
Notifiers that may run simultaneously. It may seem very large for an MH and very
small for the MSP. Currently, the use of the array structure seems optimal; however,
the use of other data structures may be considered for better scalability in the future.
The methods of the NIT are explained below:
• ClearTable(): It sets the length of the table to 0. All elements in the table are left
untouched. However, they may be overwritten with the new table entries.
• insert (int NOID, int OID, char Type): This method inserts a new item into the
NIT. It accepts three parameters: The Object Identifier of the Notifier (NOID),
Object Identifier of the object which is to be notified (OID), and finally the type of
the notification (Create or Delete).
• delete (int NOID): This method deletes an existing entry from the table.
• get_NOID (int OID, char Type): It returns the Notifier Object Identifier (NOID)
of a given notification request.
• get_tmax (): This method returns the current size of the table.
• set_tmax (int mx): With a given parameter, this method sets the table size. It is
used by the recovery process.
• get_Notifier_OID (int idx), get_ObjectID (int idx), get_Type (int idx): Those
three methods are used by the recovery process to restore the table entries.
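The table operations listed above may be sketched as follows. This is only an illustration under stated assumptions: the class name NitSketch, the Java-style method names, the 0-based indexing, the return values and the type codes 'C' and 'D' are hypothetical; only the field layout of Table 6.1, the default size of 255 and the synchronized access come from the text.

```java
// Hypothetical sketch of an array-backed, synchronized table in the spirit of the NIT.
class NitSketch {
    static final int DEFAULT_SIZE = 255;                 // default capacity stated in the text
    private final int[] noid = new int[DEFAULT_SIZE];    // Notifier Object Identifiers
    private final int[] oid = new int[DEFAULT_SIZE];     // Object Identifiers
    private final char[] type = new char[DEFAULT_SIZE];  // notification types (assumed codes 'C'/'D')
    private int tmax = 0;                                // number of entries in use

    // Appends an entry; returns false when the table is full.
    synchronized boolean insert(int notifierOid, int objectOid, char notificationType) {
        if (tmax == DEFAULT_SIZE) {
            return false;
        }
        noid[tmax] = notifierOid;
        oid[tmax] = objectOid;
        type[tmax] = notificationType;
        tmax++;
        return true;
    }

    // Removes the entry of the given Notifier by shifting the later entries down.
    synchronized boolean delete(int notifierOid) {
        for (int i = 0; i < tmax; i++) {
            if (noid[i] == notifierOid) {
                for (int j = i; j < tmax - 1; j++) {
                    noid[j] = noid[j + 1];
                    oid[j] = oid[j + 1];
                    type[j] = type[j + 1];
                }
                tmax--;
                return true;
            }
        }
        return false;
    }

    // Returns the Notifier that handles the given request, or -1 if there is none.
    synchronized int getNoid(int objectOid, char notificationType) {
        for (int i = 0; i < tmax; i++) {
            if (oid[i] == objectOid && type[i] == notificationType) {
                return noid[i];
            }
        }
        return -1;
    }

    // Resets the length only; old slots are simply overwritten by later inserts.
    synchronized void clearTable() {
        tmax = 0;
    }

    synchronized int getTmax() {
        return tmax;
    }

    public static void main(String[] args) {
        NitSketch nit = new NitSketch();
        nit.insert(100, 42, 'C');
        nit.insert(101, 43, 'D');
        System.out.println("Notifier for object 43: " + nit.getNoid(43, 'D'));
        nit.delete(100);
        System.out.println("Entries in use: " + nit.getTmax());
    }
}
```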
6.4.1.3 Notify.NotifierClass Class
The NotifierClass is the class that is responsible for the Object Notification process.
It accesses the NIT to register or remove the current Notifier instance. It handles
the notification request forwarded by the Notification Agent. For the object transfer,
two additional tables are used: the NOTT and the POTT.
Like the NotificationAgent class, the NotifierClass has two versions: the MH
and the MSP versions. The MH version contains additional methods and code for the
recovery process. As a result, the MH versions of the
Notification Agent and the Notifiers run slower than their MSP counterparts for the
sake of reliability.
6.4.1.4 Notify.NOTT and Notify.POTT Classes
The Notifier Object Transfer Table (NOTT) and the Partial Object Transfer Table
(POTT) are used by the NotifierClass, and they are local tables for each of the
Notifiers. Each Notifier that is responsible for an object transfer process manages
its own NOTT. The POTT, on the other hand, is created
and used only when the object transfer operation is of the partial type.
File Names | File Lengths
String     | long
Table 6.3: The format of the NOTT.
File Indices | Current File Lengths
int          | long
Table 6.4: The format of the POTT.
The structures of the NOTT and the POTT are very similar to that of the NIT. The formats of
the two tables are shown in Table 6.3 and Table 6.4, respectively. The maximum table
size is 255 for both tables by default. The methods of the NOTT are explained below:
• ClearTable(): It sets the length of the table to 0. All elements in the table are left
untouched. However, they may be overwritten with the new table entries.
• insert (String Filename, long Filelength): This method inserts a new item into
the NOTT. It accepts two parameters: the name and the length of the file. This
information is used for the object transfer operation.
• get_tmax (): It returns the current size of the table.
• get_filename (int idx): It returns the filename field of a given index in the NOTT.
• get_filelength (int idx): This method returns the filelength field of a given index
in the NOTT.
The POTT has almost the same methods. The methods that differ are
listed below:
• insert (int Fileindex, long currentFilelength): This method inserts a new item
into the POTT. It accepts two parameters. The first parameter is the index of the
file in the NOTT. The second parameter contains the partial file length of the file.
The partial transfer starts from that point.
• get_fileidx (int idx): It returns the fileindex field of a given index in the POTT.
• get_currfilelength (int idx): This method returns the filecurrlength field of a
given index in the POTT.
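The relation between the two tables may be illustrated with a short sketch. It assumes, following the description of the POTT insert() method, that the current file length is the offset from which a partial transfer resumes. The class and member names (TransferTablesSketch, NottEntry, PottEntry, remainingBytes) and the file names in the demo are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how a POTT row refers to a NOTT row during a partial transfer.
class TransferTablesSketch {
    // One NOTT row: file name and full file length (layout of Table 6.3).
    static final class NottEntry {
        final String fileName;
        final long fileLength;

        NottEntry(String fileName, long fileLength) {
            this.fileName = fileName;
            this.fileLength = fileLength;
        }
    }

    // One POTT row: index of the file in the NOTT and the length already present (Table 6.4).
    static final class PottEntry {
        final int fileIndex;
        final long currentFileLength;

        PottEntry(int fileIndex, long currentFileLength) {
            this.fileIndex = fileIndex;
            this.currentFileLength = currentFileLength;
        }
    }

    // Number of bytes that still have to be sent for a partially transferred file.
    static long remainingBytes(List<NottEntry> nott, PottEntry partial) {
        NottEntry file = nott.get(partial.fileIndex);
        return file.fileLength - partial.currentFileLength;
    }

    public static void main(String[] args) {
        List<NottEntry> nott = new ArrayList<>();
        nott.add(new NottEntry("Editor.class", 12000L));
        nott.add(new NottEntry("Buffer.class", 8000L));
        PottEntry partial = new PottEntry(1, 3000L); // second file, 3000 bytes already there
        System.out.println(nott.get(partial.fileIndex).fileName
                + ": resume at byte " + partial.currentFileLength
                + ", " + remainingBytes(nott, partial) + " bytes left");
    }
}
```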
6.4.2 Notify.CDT Package
The Class Dependency Table (CDT) package is available in the MSP version of the
Notify package. A mobile host user may want to transfer many objects that use shared
classes. If a copy of a class file is already available at the MSP site, there is no need
to transfer it again. The CDT package prevents retransmission of the same classes by
maintaining a table called the Class Dependency Table (CDT). The CDT class
uses another class called the Class Replica Table (CRT). Chapter 4 contains
detailed information about the logical structures of these tables. This section gives a
detailed explanation of the physical structure of the tables.
6.4.2.1 Notify.CDT.CDT Class
The Class Dependency Table (CDT) has been implemented as an array structure. The
table size is 512 by default. Since the CDT and the CRT are global tables that may be
updated by many Notifiers at a time, they are synchronized. The format of the CDT is
shown in Table 6.5.
Object Identifier | Object Name | # of Dependent Classes | Dependent Classes
int               | String      | int                    | int Vector
Table 6.5: The format of the CDT.
Each MaROS object consists of at least one class file. The CRT contains the names
and the number of occurrences of transferred files at the MSP. The CDT contains the
number of these files and their references to the CRT. The Dependent Classes field is
a Vector structure that contains the indices of these files in the CRT. All of the vector
components are in integer format.
There are a number of methods for the table management:
• insert (int OID, String ObjectName, int DepClassNo): This method inserts a
new element into the table. The dependent classes are inserted by the insertClass()
method.
• insertClass (int index, String ClassName): It inserts the dependent class file
information into the dependent classes vector field. This information contains the
CRT index and the name of the class file.
• delete (int OID): This method removes an entry from the CDT. It also updates
the CRT entries.
• getClassName (int index, int classIndex): It is used for obtaining the file names
from the CDT. This method is mainly used by the Notifiers for the full object
transfer operation.
• printCDT(): It is used for debugging purposes.
6.4.2.2 Notify.CDT.CRT Class
The Class Replica Table (CRT) is used by the CDT. The CRT has been implemented
as an array structure. Its default size is 16384 (4000 in hexadecimal format). In the
future, it is planned to be reimplemented by using a hashtable structure. The format of the
table is shown in Table 6.6. This table simply keeps the number of occurrences of
each transferred class file at the MSP.
Class Names | How Many Copies
String      | int
Table 6.6: The format of the CRT.
There are several methods for maintaining the table:
• insert (String ClassName): It inserts a new class name into the table. If that class
name already exists, the How Many Copies field is incremented by 1.
• delete (int index): This method decrements the How Many Copies field by 1. If
its value becomes 0, that record is deleted, and its corresponding class file is
removed from the system.
• getClassName (int index): It returns the name of the class file at a given index.
• printCRT(): It is used for debugging purposes.
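The reference counting behaviour of the CRT methods may be sketched as follows. The sketch is an assumption-laden illustration rather than the MaROS code: it uses growable lists instead of the fixed-size array, the names CrtSketch and copiesOf are hypothetical, and freeing a slot by marking it (so that indices stored elsewhere stay valid) is a design choice of the sketch only. The removal of the class file itself is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a reference-counted class replica table.
class CrtSketch {
    private final List<String> names = new ArrayList<>();   // Class Names
    private final List<Integer> copies = new ArrayList<>(); // How Many Copies

    // Registers one more user of the class and returns its index in the table.
    synchronized int insert(String className) {
        int index = names.indexOf(className);
        if (index >= 0) {
            copies.set(index, copies.get(index) + 1); // already present: count one more copy
            return index;
        }
        names.add(className);
        copies.add(1);
        return names.size() - 1;
    }

    // Drops one user of the class. Returns true when the last user is gone,
    // which is the point where the class file could be removed.
    synchronized boolean delete(int index) {
        int left = copies.get(index) - 1;
        if (left > 0) {
            copies.set(index, left);
            return false;
        }
        names.set(index, null); // free the slot but keep the other indices stable
        copies.set(index, 0);
        return true;
    }

    // Returns how many objects currently use the class (0 if unknown).
    synchronized int copiesOf(String className) {
        int index = names.indexOf(className);
        return index < 0 ? 0 : copies.get(index);
    }

    public static void main(String[] args) {
        CrtSketch crt = new CrtSketch();
        int first = crt.insert("Shared.class");  // transferred with the first object
        int second = crt.insert("Shared.class"); // second object reuses the same file
        System.out.println("same index: " + (first == second)
                + ", copies: " + crt.copiesOf("Shared.class"));
        crt.delete(first);
        System.out.println("last user removed: " + crt.delete(second));
    }
}
```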
6.4.3 Recovery Package
The Recovery package has been designed and implemented to increase the reliability
of the mobile hosts. Therefore, this package has only an MH version. There are two
classes in the Recovery package: the RecoveryAgent and the RecoveryTable classes.
Since there is no signal handling in Java, an alternative approach has been designed.
This approach uses a mutually exclusive, shared global table: the Recovery Table.
Chapter 5 contains all of the details of the design of this table. In this section, the
implementation of the table is explained.
6.4.3.1 RecoveryTable Class
The Recovery Table is used by the Recovery Agent and all recoverable MaROS
objects. It is the signal handling backbone of MaROS. A recoverable object may be
signalled by setting its Shutdown Signal (SS) field to 1 in the Recovery Table.
Moreover, by using special Recovery Table methods, a recoverable object may wait
for another recoverable object and then continue its work. The Recovery Table is a
global table that must be accessed in mutual exclusion; therefore, all the methods of the
Recovery Table are synchronized. The format of the table is shown in Table 6.7. The
table is a hashtable whose keys are the OIDs of the objects. All hashtable
elements are vectors, and each element of a vector is of type Object.
Shutdown Signal | Object Identifier (Hash Key) | OID of SubObject #1 | OID of SubObject #2 | …
Object          | Object                       | Object              | Object              | …
Table 6.7: The format of the Recovery Table
The RecoveryTable class methods are explained below:
• Insert (Object key): This method inserts a new recoverable object information
into the RT. In all of the RT methods key is the object identifier, and Pkey is the
parent object's object identifier.
• Insert_SubObjectID (Object Pkey, Object key): It is used by the Recovery
Agent for recovering the table at the startup.
• Insert_SubObject (Object Pkey, Object key): This method inserts a new
element into RT, and updates its parent record adding the key of the subobject.
• Shutdown_Signal (Object key): It returns either 0 or 1. If this method returns a
1, that means the object with the given key should start its recovery procedure.
• Signal_Object (Object key): An object may signal another object by using this
method. It simply sets the Shutdown Signal (SS) field of the given object to 1.
• Wait_Object (Object key): In the recovery hierarchy, an object may have to wait for
another object (e.g. one of its subobjects) before going on with its own recovery
procedure. This method blocks the calling object until the object with the given key
finishes its recovery.
• Signal_All_SubObjects (Object Pkey): This method is the enhanced version of
the Signal_Object() method. The parameter Pkey is the object identifier of the
object with one or many subobjects. It simply calls Signal_Object() method for all
of the subobjects.
• Wait_All_SubObjects (Object Pkey): This method is the enhanced version of
the Wait_Object() method. The object whose object identifier is equal to Pkey
waits until all of its subobjects finish their recovery.
• clearShutdownSignal (Object key): It clears the Shutdown Signal (SS) field of
the given object in the Recovery Table.
• mutateShutdownSignal (Object key): This method sets the Shutdown Signal
field of the given object to 2. It is used by the terminating objects as a last call.
The methods Wait_Object() and Wait_All_SubObjects() check the SS field of the
object(s). These objects are removed from the RT by the garbage collector.
• deleteObject (Object Pkey, Object key): This method removes the object whose
object identifier is equal to key. It also removes its entry from the record of its
parent object.
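The signalling and waiting behaviour described above may be sketched with a synchronized hashtable of vectors. This is a minimal illustration, not the MaROS implementation: the class name RecoveryTableSketch, the Java-style method names and the use of wait() and notifyAll() for blocking are assumptions, as is the reading that a waiting object is released when the awaited object sets its SS field to 2. Unlike Shutdown_Signal() in the text, shutdownSignal() here returns the raw field value (0, 1 or 2).

```java
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical sketch of a recovery table: key = OID, value = Vector whose element 0
// is the Shutdown Signal and whose remaining elements are the OIDs of the subobjects.
class RecoveryTableSketch {
    private final Hashtable<Object, Vector<Object>> table = new Hashtable<>();

    synchronized void insert(Object key) {
        Vector<Object> row = new Vector<>();
        row.add(Integer.valueOf(0));            // SS field starts cleared
        table.put(key, row);
    }

    // Adds a new object and records it as a subobject of its parent.
    synchronized void insertSubObject(Object pkey, Object key) {
        insert(key);
        table.get(pkey).add(key);
    }

    synchronized int shutdownSignal(Object key) {
        return (Integer) table.get(key).get(0);
    }

    synchronized void signalObject(Object key) {
        setSignal(key, 1);
    }

    // Last call of a terminating object: marks its recovery work as finished.
    synchronized void mutateShutdownSignal(Object key) {
        setSignal(key, 2);
    }

    synchronized void signalAllSubObjects(Object pkey) {
        Vector<Object> row = table.get(pkey);
        for (int i = 1; i < row.size(); i++) {
            signalObject(row.get(i));
        }
    }

    // Blocks the caller until the object with the given key has set its SS field to 2.
    synchronized void waitObject(Object key) {
        while (shutdownSignal(key) != 2) {
            try {
                wait();                         // releases the table lock while waiting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private synchronized void setSignal(Object key, int value) {
        table.get(key).set(0, Integer.valueOf(value));
        notifyAll();                            // wake up objects blocked in waitObject()
    }

    public static void main(String[] args) {
        RecoveryTableSketch rt = new RecoveryTableSketch();
        rt.insert(1);
        rt.insertSubObject(1, 2);
        rt.signalAllSubObjects(1);
        // The subobject notices its signal and reports that it has finished.
        Thread child = new Thread(() -> {
            if (rt.shutdownSignal(2) == 1) {
                rt.mutateShutdownSignal(2);
            }
        });
        child.start();
        rt.waitObject(2);
        System.out.println("subobject 2 finished, SS = " + rt.shutdownSignal(2));
    }
}
```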
6.4.3.2 RecoveryAgent Class
The RecoveryAgent class is responsible for the coordination of the recovery
process. It is one of the main MaROS system agents. It manages the Recovery Table
(RT) and enables the system and the user objects to use the available Recovery Table
methods.
Another task of the Recovery Agent is the recovery of the Recovery Table. When the
system recovery is in progress in the system startup, the Recovery Agent restores the
Recovery Table.
The RecoveryAgent class maps the RecoveryTable class methods. Those methods
have already been explained in the previous section. The Recovery Agent has also
additional methods for the recovery process of the Recovery Table. These methods
are explained below:
• Signal (): This method is the controller of the shutdown process. It is triggered by
a shutdown request. Then, it initiates the recovery process.
• saveImage(): It is used for saving the image of the Recovery Table.
6.4.3.3 Recoverable Object Implementation
The current design and implementation of the recovery process in MaROS does not
provide a user transparent interface. In order to make an object recoverable, the
programmer should complete the following steps:
• Shutdown Specific Steps:
• The determination of the code states: MaROS code may be divided into
several pieces. These pieces are called code states. A code state change may
occur when a program sends, receives or updates data. Figure 6.1 displays a
piece of code before and after the addition of the code states.
BEFORE:
    String tmpstr = strarr.substring (5);
    strarr=tmpstr;
    lngth = lngth - 5;
    strarr=strarr.substring(0,lngth)+'\0';
    // Get Virtual Port Number from VPM
    try {
        dummy = new TCpClient ();
    }catch(MaROS.net.VPortException vpe) {
        // exception handling
    }
    PortNumber = dummy.reservePort();
    PortKey = dummy.getKey();

AFTER:
    String tmpstr = strarr.substring (5);
    strarr=tmpstr;
    lngth = lngth - 5;
    strarr=strarr.substring(0,lngth)+'\0';
    // ##################################
    // MESSAGE RECEIVED
    // E N T E R I N G S T A T E 1
    STATE = 1;
    // NECESSARY ROLLBACK DATA:
    // - NIT
    // - lngth
    // - strarr (<- OMMH)
    // - HandlerPort
    // Check Shutdown Signal
    if (check_ShutdownSignal() == 1)
        return;
    // ##################################
    // Get Virtual Port Number from VPM
    try {
        dummy = new TCpClient ();
    }catch(MaROS.net.VPortException vpe){
        // exception handling
    }
    PortNumber = dummy.reservePort();
    PortKey = dummy.getKey();
Figure 6.1: The example piece of code showing the addition of code states.
• The addition of the signal checker and the image file creator: Each
recoverable object has a record in the Recovery Table. Since there is no signal
handling mechanism in Java, the objects should periodically check their
Shutdown Signal field in the Recovery Table. This check may be done
at the code state transitions. At each transition, a method may be called
to check the Shutdown Signal field of the recoverable object. By convention, this
method is named check_ShutdownSignal(). Figure 6.1 and Figure 6.2 show the use
and the implementation of this method, respectively.
// This method check shutdown signal for this object.
// It returns 0, if everything is usual
// Otherwise, it returns 1 indicating shutdown signal has reached
public static int check_ShutdownSignal() {
    if (RecoveryAgent.Shutdown_Signal(new Integer(MaROSobject.currentObject().getOID() ) ) == 1) {
        saveImage();
        RecoveryAgent.clearShutdownSignal(new Integer (MaROSobject.currentObject().getOID()) );
        return (1);
    }
    return (0);
} // check_ShutdownSignal()
Figure 6.2: The implementation of check_ShutdownSignal() method.
The next job is the creation of the image file creator. By convention, the
saveImage() method is used for this purpose. This method creates a random access file in
the image directory, named after the object identifier of the recoverable
object. Then, it saves the necessary tables and variables one by one. The format of
the image file is left to the programmer. However, it is also a convention to use the "^"
character as a separator between the fields.
// This method takes image of the current object instance to disk
// including tables, etc.
public static void saveImage() {
    // Signal all subobjects
    RecoveryAgent.Signal_All_SubObjects (new Integer(MaROSobject.currentObject().getOID() ));
    // All subobjects signalled
    // Now wait them to finish
    RecoveryAgent.Wait_All_SubObjects (new Integer (MaROSobject.currentObject().getOID() ));
    // All subobjects finished their recovery job
    // Save Image File
    RandomAccessFile imagefile;
    String imagefilename = null;
    // Create file
    imagefilename = SysConst.TempRecoveryPATH+MaROSobject.currentObject().getOID();
    try {
        imagefile = new RandomAccessFile (imagefilename,"rw");
    }catch (IOException ioe){
        // error handling
        return;
    }
    // STATE
    try {
        imagefile.writeByte (STATE);
        imagefile.writeBytes ("^");
    } catch (IOException ioe) {
        // Error handling: Unable to write image file
    }
    // Write data into file
    if (STATE >= 0) {
        // NIT
        try {
            int i;
            int tmax = NIT_instance.get_tmax();
            imagefile.writeInt(tmax);
            for (i=1; i<=tmax; i++) {
                imagefile.writeInt (NIT_instance.get_Notifier_OID (i));
                imagefile.writeBytes ("^");
                imagefile.writeInt (NIT_instance.get_ObjectID (i));
                imagefile.writeBytes ("^");
                imagefile.writeChar(NIT_instance.get_Type (i));
                imagefile.writeBytes ("^");
            }
        } catch (IOException ioe) {
            // Error handling: Unable to write NIT to image file
        }
        // NIT written
    }
    if (STATE >= 1) {
        <CODE CONTINUES>
Figure 6.3: An example code part of the saveImage() method.
• Startup Specific Steps:
• The addition of the image file reader: The code of each recoverable object starts by
checking for its image file. If the object has an image file, the file should be read and
the last state has to be restored. An example image file reader code is shown in
Figure 6.4.
RandomAccessFile imagefile;
String imagefilename = null;
byte recovery_state = (byte) 255;
// Image file name
imagefilename = SysConst.TempRecoveryPATH+MaROSobject.currentObject().getOID();
try {
    int mx=0; // Maximum size of the NIT
    int i;
    int _NOID, _OID;
    char _Type;
    imagefile = new RandomAccessFile (imagefilename,"r");
    // There is an image file
    // Get STATE First
    try {
        recovery_state = imagefile.readByte();
        imagefile.readByte();
    } catch (IOException ioe){
        // Error Handling: Unable to read image file
    }
    if (recovery_state >= 0) {
        // recover NIT
        mx = (imagefile.readInt ());
        NIT_instance.set_tmax (mx);
        for (i=1;i<=mx;i++) {
            _NOID = imagefile.readInt ();
            imagefile.readByte();
            _OID = imagefile.readInt ();
            imagefile.readByte();
            _Type = imagefile.readChar();
            imagefile.readByte();
            NotificationAgent.NIT_instance.insert (_NOID, _OID, _Type);
        }
    }
    if (recovery_state >= 1) {
        <CODE CONTINUES>
Figure 6.4: An example image file reader code piece.
• The replication of the original code for each state: The original code should
be replicated for each state in the final step of the recovery. The replicated
code at each state enables MaROS objects to continue their execution from
that state. Another approach would be the use of labels and unconditional jumps;
however, Java has no goto statement, and its labels can only be targeted by
break and continue. Figure 5.7 shows the code replication process in detail.
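The image file convention used in the steps above, where every field is followed by a "^" byte and the reader consumes the fields in exactly the order in which they were written, may be tried out with the following self-contained sketch. The class name ImageFileSketch, the roundTrip() helper and the use of a temporary file instead of the MaROS image directory are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical sketch: writes a STATE byte and one (NOID, OID, Type) record,
// each followed by '^', and reads them back in the same order.
class ImageFileSketch {
    static String roundTrip(byte state, int noid, int oid, char type) {
        try {
            File file = File.createTempFile("image", ".tmp"); // assumed location for the demo
            file.deleteOnExit();
            try (RandomAccessFile out = new RandomAccessFile(file, "rw")) {
                out.writeByte(state);
                out.writeBytes("^");
                out.writeInt(noid);
                out.writeBytes("^");
                out.writeInt(oid);
                out.writeBytes("^");
                out.writeChar(type);
                out.writeBytes("^");
            }
            try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
                byte s = in.readByte();
                in.readByte();             // skip the '^' separator
                int n = in.readInt();
                in.readByte();
                int o = in.readInt();
                in.readByte();
                char t = in.readChar();
                in.readByte();
                return s + "," + n + "," + o + "," + t;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip((byte) 1, 17, 42, 'C'));
    }
}
```

Because the fields are written in binary form, the separator does not make the file self-describing; the reader still has to know the type of every field, which is why the reader and the writer must be kept in step.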
7
Evaluation and Future Work
7.1 Introduction
This chapter presents the results and the evaluation of the performance tests for the
notification module. Moreover, the future research areas for the system are discussed at
the end of the chapter.
7.2 Performance Evaluation
The testing platform has been set up by using two different computers: one of them
for the MaROS client and the other for the MaROS server. A Pentium 166MMX
machine with the Turkuaz GNU/Linux 0.99 operating system has been set up as the MSP,
and a Pentium 200 machine with the Windows 95 operating system has been set up as the
MaROS client. In the tests, two types of object transfer (full transfer and no transfer)
and the object deletion process have been tested on a 100 Mbit ethernet connection and on
115200 bit/sec., 24000 bit/sec. and 19200 bit/sec. modem connections. In all the tests, the
machines had minimal CPU load, and there was minimal network traffic. Both of
the machines ran Java 1.1.6.
7.2.1 Full Transfer Tests
NotifierMH reads the files into a buffer, and then it sends them to NotifierMSP. The
buffer size is 4096 bytes (4K) by default. The NotifierMSP reconstructs the files by
assembling the incoming packets. The default buffer size may be increased or
decreased by changing the SysConst.DEF_BUFF_SIZE system constant. In the full
transfer tests, the effect of different buffer sizes on the transfer speed has been
measured. Since the TCp uses an 8K packet size, the tests were run for buffer sizes of
2K, 4K, 6K and 8000 bytes (a full 8K buffer is not allowed, since an 8K TCp packet
also contains a header). A MaROS object with a size of approximately 500K has been
transferred in each test. The results are shown in Table 7.1. The timer is
started before the first packet is sent from the NotifierMH to the NotifierMSP, and
stopped right after the HandlerMH receives the notification result.
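The buffered transfer loop described above may be sketched as follows. The sketch copies data between in-memory streams instead of network connections, so it only illustrates how the buffer size determines the number of send operations. The names BufferedTransferSketch, transfer() and packetsNeeded() are hypothetical, and the constant merely mirrors the 4K default mentioned in the text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch of a sender that forwards a file in buffer-sized chunks.
class BufferedTransferSketch {
    static final int DEF_BUFF_SIZE = 4096; // mirrors the 4K default stated in the text

    // Copies everything from in to out in chunks of at most bufferSize bytes and
    // returns the number of write operations ("packets") that were needed.
    static int transfer(InputStream in, OutputStream out, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        int packets = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);   // send only the bytes that were actually read
                packets++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return packets;
    }

    // Convenience helper for the demo: transfers objectSize zero bytes in memory.
    static int packetsNeeded(int objectSize, int bufferSize) {
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        int packets = transfer(new ByteArrayInputStream(new byte[objectSize]), received, bufferSize);
        if (received.size() != objectSize) {
            throw new IllegalStateException("receiver did not get the whole object");
        }
        return packets;
    }

    public static void main(String[] args) {
        int objectSize = 505319; // size of the test object in Table 7.1
        for (int bufferSize : new int[] {2048, DEF_BUFF_SIZE, 6144, 8000}) {
            System.out.println(bufferSize + " bytes per buffer: "
                    + packetsNeeded(objectSize, bufferSize) + " packets");
        }
    }
}
```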
Buffer Size vs. Transfer Time (ms.) | 100 Mbit | 115200 bit | 19200 bit
2K (2048 bytes) buffer size         | 60040    | 259910     | 432083
4K (4096 bytes) buffer size         | 32627    | 197017     | 364160
6K (6144 bytes) buffer size         | 26290    | 192423     | 346157
8000 bytes buffer size              | 18837    | 186070     | 351723
Table 7.1: Transfer results of a 505319-byte object.
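The transfer times in Table 7.1 can be converted into average throughput by dividing the object size by the elapsed time. The following sketch does this for the 100 Mbit and the 19200 bit columns. The class and method names are hypothetical; the numbers are copied from the table. Since the timer also covers the notification result, the values are end-to-end averages rather than raw link speeds.

```java
// Hypothetical helper that derives average throughput from the figures in Table 7.1.
class ThroughputSketch {
    static final long OBJECT_BYTES = 505319L; // object size from Table 7.1

    // Average throughput in bytes per second for a transfer that took millis milliseconds.
    static double bytesPerSecond(long bytes, long millis) {
        return bytes * 1000.0 / millis;
    }

    public static void main(String[] args) {
        String[] buffers = {"2K", "4K", "6K", "8000"};
        long[] ethernetMs = {60040, 32627, 26290, 18837};   // 100 Mbit column
        long[] modemMs = {432083, 364160, 346157, 351723};  // 19200 bit column
        for (int i = 0; i < buffers.length; i++) {
            System.out.printf("%-5s %10.0f B/s (100 Mbit) %8.0f B/s (19200 bit)%n",
                    buffers[i],
                    bytesPerSecond(OBJECT_BYTES, ethernetMs[i]),
                    bytesPerSecond(OBJECT_BYTES, modemMs[i]));
        }
    }
}
```

With these figures, the modem throughput for the 6K buffer comes out slightly higher than for the 8000-byte buffer, whereas the ethernet throughput keeps growing with the buffer size.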
Figures 7.1 through 7.3 show the transfer time vs. buffer size graphs for the
100 Mbit ethernet and the modem tests.
Those figures show that the performance of the 100 Mbit ethernet connection
increases when the buffer size is increased. On the other hand, the modem tests show that
there is a barrier value for the buffer size (Figure 7.3). There is no performance
increase in the full transfer operation when this barrier is exceeded. Moreover, the
use of larger buffer sizes may even decrease the performance as a side effect,
since large buffers have to be segmented for TCp encapsulation. To summarize,
there is no speed-up when using buffer size values larger than the communication
bandwidth of the mobile host.
[Graph: x-axis Buffer Size (2K, 4K, 6K, 8000); y-axis Transfer Time (ms.), 0 to 70000]
Figure 7.1: Transfer time vs. buffer size graph for 100 Mbit tests
[Graph: x-axis Buffer Size (2K, 4K, 6K, 8000); y-axis Transfer Time (ms.), 0 to 300000]
Figure 7.2: Transfer time vs. buffer size graph for 115200 bit tests
[Graph: x-axis Buffer Size (2K, 4K, 6K, 8000); y-axis Transfer Time (ms.), 0 to 500000]
Figure 7.3: Transfer time vs. buffer size graph for 19200 bit tests
The transfer speed vs. buffer size graphs below clearly depict the effect of different
buffer size values on the full transfer operation. It is seen that the transfer speed increases
if the buffer size is increased. However, there is a barrier for the buffer size, as
seen in Figure 7.6. This barrier value is directly affected by the network bandwidth.
For example, the 19200 bit/sec. modem connection provides maximum of about 6
Kbit/sec network bandwidth with compression, and this is the barrier for the buffer
size value.
[Graph: x-axis Buffer Size (2K, 4K, 6K, 8000); y-axis Transfer Speed (Kbyte/sec.), 0 to 30000]
Figure 7.4: Transfer speed vs. buffer size graph for 100 Mbit tests
[Graph: x-axis Buffer Size (2K, 4K, 6K, 8000); y-axis Transfer Speed (Kbyte/sec.), 0 to 3000]
Figure 7.5: Transfer speed vs. buffer size graph for 115200 bit tests
[Graph: x-axis Buffer Size (2K, 4K, 6K, 8000); y-axis Transfer Speed (Kbyte/sec.), 0 to 1600]
Figure 7.6: Transfer speed vs. buffer size graph for 19200 bit tests
7.2.2 No Need Type Object Transfer and Object Deletion Tests
A No Need Type object transfer is an object transfer operation without a full transfer
or a partial transfer. The tests were performed on the same test platform as the full
transfer tests. The timer is started right after the notification request is
received by the Notification Agent. It is stopped right after the HandlerMH receives the
notification result. In the current code, there is a five-second synchronization delay
included in these test results. Since there is no object transfer, the size of the buffer
has no effect on the test results. Table 7.2 displays the results of the tests. The
modem tests required approximately three more seconds to finish the operation
than the 100 Mbit tests. The table also contains the results of the
object deletion tests. There is not much time difference between the two types of tests.
These tests show that the bandwidth of the TCp connection does not have much
effect on the No Need Type object transfer and the object deletion operations.
Connection (Operation)               | Run #1 (ms.) | Run #2 (ms.) | Run #3 (ms.) | Average (ms.)
24 Kbit modem (No Need Transfer)     | 12300        | 11810        | 12410        | 12173
24 Kbit modem (Object Deletion)      | 11860        | 12300        | 12300        | 12153
100 Mbit ethernet (No Need Transfer) | 10050        | 9410         | 9120         | 9527
100 Mbit ethernet (Object Deletion)  | 9060         | 10110        | 9170         | 9447
Table 7.2: The results of the No Need type object transfer tests.
7.3 Future Work
This section presents the future research areas related to the Notification and
Recovery modules of MaROS. Furthermore, the future work planned for
MaROS as a whole is discussed at the end of the section.
7.3.1 Future Work on the Notification Module
The Notification Module deals with the transfer and the deletion operations on the
relocatable objects. The performance tests have shown that there is no
considerable speed-up in the object transfer operation when the buffer size exceeds
the communication bandwidth. Since the mobile hosts may connect to the MSP at
different connection speeds, dynamic buffer size values may be used in order to
optimize the transfer operation for each connection.
Object compression is another useful approach for an optimal transfer process.
MaROS objects can be compressed and then transferred to the MSP. When the
MSP receives the compressed object data, it may decompress the data and create the
relocatable MaROS object. This process requires additional time for compression;
however, the transfer speed is expected to improve considerably.
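The compress-then-transfer idea may be illustrated with the standard java.util.zip classes. The text does not name a compression algorithm, so the choice of GZIP here is an assumption, as are the class name CompressionSketch and the artificial, highly repetitive test data; the compression ratio of real class files will differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: compress object data on the sender, decompress it on the receiver.
class CompressionSketch {
    // Sender side: returns the GZIP-compressed form of the data.
    static byte[] compress(byte[] data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
                gzip.write(data);
            } // closing the stream writes the GZIP trailer
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Receiver side: restores the original data from its compressed form.
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] object = new byte[50000];
        Arrays.fill(object, (byte) 'A');            // artificial, highly compressible data
        byte[] wire = compress(object);
        byte[] restored = decompress(wire);
        System.out.println(object.length + " bytes -> " + wire.length
                + " bytes on the wire, restored intact: " + Arrays.equals(object, restored));
    }
}
```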
The object deletion process may be modified by adding a new server machine next to
the MSP. This machine may be called the MaROS Recycle Bin (MRB), and all the
objects that are to be deleted may be moved to the MRB instead of being deleted
from the MSP immediately. This approach does not increase the object deletion time
too much, since there will be a very fast network connection between the MSP and
the MRB.
7.3.2 Future Work on the Recovery Module
Currently, the system is vulnerable to failures such as system lockups and hardware
problems. In order to overcome these problems, the Recovery Module should be
completely redesigned. Since Java does not provide signal handling primitives, the
implementation language may need to be changed. However, this would harm the
portability of MaROS.
In the future, the Recovery Module may be made at least semi-transparent to the
programmers by providing a programming interface. In the current version of the Recovery
Module, a MaROS programmer has to know almost everything about the Recovery
Module in order to write recoverable applications.
7.3.3 Future Work on MaROS
There are many research areas that are not covered by the design and implementation of
the current version of MaROS. Some of these areas are system security, heavy-weight
migration, and load balancing on multiple MSPs.
System security is one of the most important issues in a system like MaROS,
since there may be many unauthorized attempts to access the system. There is a
host registration and authentication protocol; however, in the future, the design of a
new agent (Security Agent) should be considered.
In the current design and implementation, the Migration Agent only deals with the
light-weight type object migration. In the future, the migration of running MaROS
objects may be implemented. This type of migration may be made possible with
increase in the bandwidth of wireless connections in the future.
Another possible enhancement that may be implemented in the future is the use of
multiple MSPs. The current design may be extended to a distributed system of MSPs
connected via high-speed networks. In this case, MaROS may be optimized by using
techniques such as load balancing and parallel processing.
In order to increase the system performance, the MaROS threads may communicate
through shared memory instead of using the MaROS communication primitives. However,
possible problems such as starvation and deadlock of the objects would have to be
dealt with in that case.
Finally, the implementation language may be changed in order to increase the overall
system performance. However, in this case, the system should be redesigned
considering the advantages and disadvantages of the new implementation language.
C++ seems to be the ideal alternative. In order to keep the portability feature of
MaROS, the Java-based MaROS objects may continue to be used.
8
Conclusion
The Mobile and Relocatable Object System (MaROS) is an application development
platform especially designed to minimize the problems that arise from the limitations
of mobile computers. The system supports disconnected operations, object relocation,
and recovery of MaROS clients. In this dissertation, the design and the
implementation of the Notification and the Recovery modules have been presented.
The transfer operation of the relocatable objects is automatically initiated by the
system. A copy of the relocatable object is created on the MSP site while the object is
being created on the mobile host. This process is called notification. The
notification process simplifies and speeds up the object relocation process. The
Notification Agent and other system agents use worker threads to achieve optimal
response times for the requests.
System recovery is one of the most important issues in a system like MaROS. In the
current design, the recovery of all recoverable objects is possible after voluntary
shutdowns. However, the Recovery module is not transparent to the programmers.
Currently, a MaROS programmer should follow a well-defined path to code
recoverable objects.
90
The recovery process is hierarchical and decentralized. The Recovery Agent only
coordinates the process by signalling the system objects in hierarchical order.
The process is not centralized, however, since all the system agents and recoverable
objects are responsible for their own recovery.
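A minimal sketch of this division of labour is given below. The Recoverable interface, NamedComponent, and RecoveryCoordinator are hypothetical names chosen for illustration and are not the MaROS classes. The coordinator only decides the order in which levels are signalled. What recovery means for a given object is left entirely to that object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the interface and class names are illustrative only.
interface Recoverable {
    void recover(); // each object restores its own state
}

// A trivial recoverable component that records when it was signalled.
class NamedComponent implements Recoverable {
    private final String name;
    private final List<String> log;

    NamedComponent(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override
    public void recover() {
        log.add(name); // a real object would reload its saved state here
    }
}

// Plays the coordinating role: it signals, but does not perform, recovery.
class RecoveryCoordinator {
    private final List<List<Recoverable>> levels = new ArrayList<>();

    // Levels are signalled in the order in which they are added.
    void addLevel(List<Recoverable> level) {
        levels.add(level);
    }

    void signalAll() {
        for (List<Recoverable> level : levels) {
            for (Recoverable component : level) {
                component.recover();
            }
        }
    }
}
```

In this sketch, system agents would be placed in an earlier level than the application objects that depend on them, so that every object finds its agents already recovered when it is signalled.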
The current design of the Recovery module does not cover recovery from
hardware and OS failures. To deal with that type of recovery, a
checkpointing approach should be designed and implemented. Java does not provide
low-level primitives for accessing system resources directly. Therefore, another
programming language may be chosen for the implementation in the future.
Mobile computing is the technology of the future. Currently, many research
projects are being carried out on mobile computing platforms. The aim of those projects
is to improve the performance and functionality of mobile computers in general.
MaROS is one of them; it tries to provide an application development platform
designed especially for mobile computers.
References
[1] M. Faiz, A. Zaslavsky, B. Srinivasan. Revising Replication Strategies for Mobile
Computing Environments
[2] Ramon Caceres, Liviu Iftode. Improving the Performance of Reliable Transport
Protocols in Mobile Computing Environments, Proceedings of the IEEE, special issue
on Mobile Computing Networks, 1994.
[3] Şebnem Baydere et al. MaROS: A Framework For Mobile Application
Development, EURO-PDS'97 International Conference on Distributed and Parallel
Systems, June'97, Barcelona, Spain http://www.yeditepe.edu.tr/MaROS/paper1.ps.Z
[4] Jeppe D. Nielsen. Transactions in Mobile Computing, 1995.
[5] Anthony D. Joseph, Alan F. deLespinasse, Joshua A. Tauber, David K. Gifford
and M. Frans Kaashoek, Rover: A Toolkit for Mobile Information Access, Proceedings
of the Fifteenth Symposium on Operating Systems Principles, December 1995.
[6] A.D. Birrell and B.J. Nelson, Implementing Remote Procedure Calls, ACM Trans.
Comp. Syst., 2(1):39-59, Feb. 1984.
[7] Anthony D. Joseph, M. Frans Kaashoek, Building Reliable Mobile-Aware
Applications Using the Rover Toolkit, Wireless Networks Magazine, Vol.3 (1997),
No. 5, October 1997.
[8] Toshio Shirakihara, Hideaki Hirayama, Kiyoko Sato and Tatsunori Kanai,
ARTEMIS: Advanced Reliable disTributed Environment Middleware System,
Proceedings of the International Conference on Parallel and Distributed Processing
Techniques and Applications, July 97.
[9] Andrzej Goscinski, Distributed Operating Systems The Logical Design, 1992,
Addison-Wesley Publishing Company
[10] Ören, T.I., Software Agents: Basic Concepts and Internet Applications,
Bilisim’96, Bildiriler96, 1996.
[11] Wreggit, D.J., Software Agents Using Java, Distributed Processing, 1995.
[12] Yıldız, M.C., Object Naming and Creation in a Mobile System, MSc. thesis,
Yeditepe University, 1998.
[13] Demir, O., Object Relocation in a Mobile Computing Environment, MSc. thesis,
Yeditepe University, 1998.
[14] Devlet, G., A Communications Infrastructure for Disconnected Operations in a
Client/Server Computing Environment, MSc. Thesis, Yeditepe University, 1998.
Bibliography
[1] Naughton P., Schmidt H., Java: The Complete Reference, Osborne/McGraw-Hill.
[2] Brian N. Bershad and Henry M. Levy. A remote computation facility for a
heterogeneous environment, Computer, 21(5): 50-60, May 1988.
[3] Bruce Walker, Gerald Popek, Robert English, Charles Cline and Greg Thiel, The
LOCUS Distributed Operating System, In Proceedings of the Ninth ACM Symposium
on Operating System Principles, pages 49-70, October 1983.
[4] P. Stanski, An Integrating Architecture for Distributed and Persistent Mobile
Software Agents, PESOS Technical Report, Monash University Department of
Computer Technology, Australia, 1997.
[5] Theimer M.M., Lantz K.A. and Cheriton D.R., Preemptable Remote Execution
Facilities for the V System, In Proceedings of the Tenth ACM Symposium on
Operating Systems Principles, Orcas Island, Washington pp.2-12, 1985.
[6] V. Koudounas, Why Mobile Computing? Where can It be Used?,
http://www-dse.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/vk5/article1.html
[7] J.F. Bartlett, W4 – the Wireless World Wide Web, In Proceedings of IEEE
Workshop on Mobile Computing Systems and Applications, December 1994.
[8] T.F. La Porta, K.K. Sabnani and R.D. Gitlin, Challenges for nomadic computing:
Mobility management and wireless communications, Mobile Networks and
Applications 1(1), 1996.
[9] Object Management Group, Corba Services: Common Object Services
Specification, revised edition, 95-3-31, March 1995.
[10] Object Management Group, The Common Object Request Broker Architecture
and Specification 2.0, July 1995.
[11] G.M. Voelker and B.N. Bershad, Mobisaic: An Information system for a mobile
wireless computing environment, In Proceedings of IEEE Workshop on Mobile
Computing Systems and Applications, December 1994.
[12] R. Want et al., An overview of the ParcTab ubiquitous computing environment,
IEEE Personal Communications Magazine, 2(6), December 1995.
[13] T.F. La Porta et al., Experiences with network-based user agents for mobile
applications, Mobile Networks and Applications, Vol.3 pp.123-141, August 1998.
[14] N. Davies et al., L2imbo: A distributed systems platform for mobile computing,
Mobile Networks and Applications, Vol.3 pp.143-156, August 1998.
[15] W.N. Schilit, A system architecture for context-aware mobile computing, PhD.
Thesis, Department of Computer Science, Columbia University, New York, 1995.
[16] B.D. Noble, M. Price and M. Satyanarayanan, A programming interface for
application-aware adaptation in mobile computing, In Proceedings of MLIC’95,
pp.57-66, Ann Arbor, MI, April 1995.
[17] A. Friday and N. Davies, Distributed systems support for mobile applications, In
Proceedings of IEEE Symposium on Mobile Computing and its Applications, Savoy
Place, London, November 1995.
[18] N. Davies, S. Pink and G.S. Blair, Services to support distributed applications in
a mobile environment, In Proceedings of SDNE’94, pp.84-89, Prague, June 1994.
[19] R. Prakash, M. Singhal, Dependency sequences and hierarchical clocks: Efficient
alternatives to vector clocks for mobile computing systems, Wireless Networks, Vol.3.
pp.349-360, October 1997.
[20] G.H. Forman and J. Zahorjan, The challenges of mobile computing, IEEE
Computer 27(4) pp.38-47, April 1994.
[21] K. Birman and T. Joseph, Reliable communication in the presence of failures,
ACM Transactions on Computer Systems 5(1) pp.47-76, February 1987.
[22] M. Ahamad, P. Dasgupta and R.J. Leblanc, Fault-tolerant atomic computations
in an object-based distributed system, Distributed Computing 4 pp.69-80, 1990.
[23] Sun Microsystems Corporation, Remote Method Invocation for Java,
http://chatsubo.javasoft.com/current/rmi/index.html, July 1996.