search on centralized networks
8/6/2019 Search on Centralized Networks
Peter S.B. Dushkin
Efficient Search Methods
In Centralized Systems
Diploma in Computer Science
Queens' College, Cambridge 2003
Proforma
Peter B Dushkin
Queens' College
Peer to Peer Content Sharing
Diploma, Computer Science, 2003
Word Count: 9,039
Project Originator: Mr. Peter Dushkin
Project Supervisor: Mr. Meng How Lim
Original Aims:
The original aim of this project was to build an example of a peer to peer content search
and distribution system. Each user of the system is capable of entering keywords (such as
getFile or getPeers) to find out information about attached nodes on the network. The
project follows Napster's example by making a central server available to the connected
clients. The server contains a registry file that maintains and keeps current information
about connected clients and what content each is advertising.
Work Completed:
An RMI client and server were designed and implemented to test various search and
retrieval methods in a centralized network environment. Class libraries were built for
both the local and remote software and data was collected to test the project.
Special Difficulties:
There were no difficulties.
Declaration:
I, Peter Dushkin of Queens' College, being a candidate for the Diploma in Computer
Science, hereby declare that this dissertation and the work described in it are my own
work, unaided except as may be specified below, and that the dissertation does not
contain material that has already been used to any substantial extent for a comparable
purpose.
Signed
Date
Table of Contents
List of Figures
1 Introduction
    1.1 Abstract
    1.2 Motivation
    1.3 Subject Overview and Terminology
2 Preparation
    2.1 Resources
    2.2 Planning & Documentation
3 Implementation
    3.2 Classes and Methods
    3.2 The Client
    3.3. The Server
    3.4 Security
4 Evaluation
    4.1 Data Collection
    3.3 Node Discovery
    3.4 Content Discovery
    3.5 Content Delivery
    3.6 Observing results
5 Conclusions
    5.1 Further Development
    5.2 Final Conclusion
Appendices
    A. Key Code Samples
    B. Bibliography
    C. Project Proposal
    Supervision Requirements
List of Figures
Figure 1. An example of the question and answer exercises in the planning and documentation
period. A number of key outputs of this exercise helped us get a better sense of what
needed to be addressed in sequence diagrams and use cases.
Figure 2. User view of the system. This use-case describes the main inputs required of a user on the
system. Later, these inputs will be translated into classes and methods.
Figure 3. The pseudo code for how the client is intended to interact with the system. A number of
such pseudo code examples were designed and updated later, informing sequence
diagrams and class and method design.
Figure 4. Sequence flow of the central indexing architecture. The local host queries the remote
directory server for the location of a given piece of content. The server replies with the IP
location of the node containing the text file. Local host then handshakes with the remote
host to receive the content.
Figure 5. Design of the system's core class diagrams.
Figure 6. The public interface for the client.
Figure 7. Design of the system's core class diagrams.
Figure 8. The above class example remotely queries the server and returns a time stamp.
Figure 9. The startPeers method on the client side invokes the remote givePeers.
Figure 10. Functioning of the timing sequences. Commands issued on the local client, the execution
of remote methods, and returned results are all time stamped to issue data points for
further exploration.
Figure 11. Wait time for locating remote connected nodes.
Figure 12. Wait time for locating remote connected nodes. One client.
Figure 13. Wait time for multiple peers requesting multiple files.
Figure 14. Wait time for multiple peers requesting the same file.
Figure 15. Wait time for multiple peers requesting small files.
Figure 16. Wait time for multiple peers requesting a large file.
1 Introduction
1.1 Abstract
The digital exchange of information over peer-to-peer networks is not a new topic.
Applications such as classroom educational tools, chat services, multicast applications,
and, more commonly, electronic mail could all be categorized as variations on this
design. The now infamous birth of Napster as a quick and dirty way to access music files
has reinvigorated industrial and academic activity in the p2p arena. Core to such
development activities have been issues such as disk space utilization, bandwidth
constraints, effectiveness of peer discovery and group management, points of failure,
quality of data location, and reliable exchange of information. The fruit of
these development efforts has been second-generation services such as Gnutella, Kazaa,
JXTA, and Pastry, all of which offer a largely or completely decentralized method of file sharing.
1.2 Motivation
A common trait of all such peer-to-peer systems is that they establish their own particular
foundation, leaving the application development community much flexibility to build
products and services on top. The assumption, of course, is that the foundation
laid by their predecessors is both efficient and reliable. This simple quandary
is the driving motivation behind this diploma project: how do I test the effectiveness
of the underlying architecture of a file-sharing system? Given the popularity of this area
of development, there seem to be as many protocols as there are claims to a solid
solution. For example, Gnutella's flexible network design dramatically decreases the
possibility of a single point of failure. But, as the system scales, Gnutella's method of
flooding the network with queries eventually does more harm than good. Another
example is Pixie, a peer-to-peer architecture that uses the concept of content scheduling
to ease the limitations imposed by network utilization. In this case, rather than flood
the network with requests, the application schedules content based on the efficient use of
resources. Finally, there are the recently popular hybrid networks, a combination of the
decentralized and centralized architectures of systems such as Gnutella and Napster
respectively.
With this latest evolution in networking approaches in mind, the goal of this project is to
design a small centralized network and test how efficiently that network performs search,
discovery, and retrieval under varying conditions. Since Napster's introduction,
central indexing services, as a scalable solution for content delivery, have been largely
replaced by decentralized systems that are less vulnerable to a single point of failure.
However, as both research and industrial efforts continue to offer solutions to the
shortcomings of both centralized and decentralized networks, it is becoming increasingly
apparent that a combination of the two is (currently) the best solution. This is
the stuff that the KaZaA file-sharing network is made of. In the KaZaA solution,
centralized servers are located throughout decentralized peer groups, combining the fast
access of a central index with the request propagation strengths of a decentralized
solution.
In large part due to the renewed utility of centralized search methods within hybrid
solutions, this project sets out to explore how centralized search and retrieval is
accomplished and, if need be, where it can be improved. The four primary system
features considered in this project are:
Node Discovery. One of the major challenges in peer-to-peer systems design is
the discovery of nodes on the network. Each individual node needs a reliable
method for discovering and handshaking with every other node. Additionally,
information about the various nodes on the network needs to be stored in some
manner. In an indexing system such as this one, a central server is used to
maintain information about the nodes on the network. Each individual node must
log onto the network, register with the central server, and query the server's
registry to discover other nodes on the network.
Content Discovery. Content discovery is the location of files on the network. In
decentralized systems, this can be done directly from one peer to another. In our system,
the directory server acts as the adjudicating element in the network, directing the
local client's requests. Specifically, requesting peers are given IP references to
nodes on the network that hold the requested files. Of interest to me in this project is the
time it takes between content request and node discovery.
Content Delivery. Once the server, remote hosts, and content location are
discovered, there needs to be an efficient mechanism for delivering files over the
network. In many peer-to-peer systems, this piece of the puzzle is key to the
overall success of the design. For centralized architectures in particular, content
delivery can become a thorn in the side as the number of nodes on the
network grows. The more hosts requesting content, the more likely it is that the single
directory server is unavailable to reply to requests.
Security. Security should never be overlooked when designing any networked
system. Security is especially important in peer-to-peer networks, where both the
volume of content and the number of network nodes can be quite large. Common security
problems such as viruses, encryption cracking, bandwidth clogging, internal and
external network attacks, eavesdropping, and so on are all concerns when
designing such a system. Additionally, as the number of nodes on the peer-to-peer
network increases, so does the system's overall vulnerability to security breaches.
For our centralized application of peer to peer, I have decided to implement a
basic example of Secure Sockets Layer (SSL) from the standard Java SDK.
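The node and content discovery features above can be illustrated with a minimal sketch of a central index. This is not the project's server code: the class name CentralIndex, its method names, and the choice of host address strings as keys are all assumptions made for illustration, and the registry here lives in memory rather than in a registry file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the directory server's registry. */
class CentralIndex {
    // host address -> file names that host advertises (insertion order is kept)
    private final Map<String, List<String>> filesByHost = new LinkedHashMap<>();

    /** Node discovery, step 1: a node logs on and advertises its files. */
    synchronized void register(String host, List<String> files) {
        filesByHost.put(host, new ArrayList<>(files));
    }

    /** A node that leaves the network takes its advertised files with it. */
    synchronized void deregister(String host) {
        filesByHost.remove(host);
    }

    /** Node discovery, step 2: list every host currently registered. */
    synchronized List<String> peers() {
        return new ArrayList<>(filesByHost.keySet());
    }

    /** Content discovery: which hosts advertise the requested file? */
    synchronized List<String> locate(String fileName) {
        List<String> hosts = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : filesByHost.entrySet()) {
            if (entry.getValue().contains(fileName)) {
                hosts.add(entry.getKey());
            }
        }
        return hosts;
    }
}
```

A requesting peer would call locate with a file name and then contact one of the returned hosts directly for delivery, so the index itself never carries file contents.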
1.3 Subject Overview and Terminology
At the heart of any discussion of efficient network design is overall topology, or how
to best connect the nodes within a group. For a centralized system such as the one I
considered, the topology is considered in terms of the information flow of the network as
a whole. The nodes in the graph are the peers, and links (or edges) between peers
indicate a regular sharing of information. For the network to be truly effective, the
nodes should be able to use the edges to share information without unnecessarily
loading the network.
How the network is designed determines how information is shared. Below are
definitions of the most common network models in use today:
1. Centralized. The architecture considered in this diploma project. Centralized
client/server systems are currently the most popular form of network, with a
central server adjudicating among its client peers. Examples include web
servers, databases, SETI@Home, and Napster.
2. Ring. A common method for scaling centralized services is to use a cluster of
machines arranged in a ring to act as a distributed server. Communication
between servers coordinates the sharing of the system state. This establishes a
group of nodes that provide identical function to a single server but incorporate
redundancy and load balancing capabilities. Typically, ring systems consist of
machines that are nearby on the network and owned by a single organization.
3. Hierarchical. DNS is an example of such a system. In hierarchical
systems, authority flows from the root name servers to the server for the
registered name and downward. Usenet is another example of a large
hierarchical system.
4. Decentralized. Popular as true peer-to-peer computing, decentralized systems
communicate symmetrically, where each node takes on all responsibility as both
client and server. Popular examples are Gnutella and FreeNet.
Each of the above system architectures dramatically affects the overall success
and usability of the network. In our centralized example, there are numerous
environmental factors that can influence the overall stability and effectiveness of the
system. For example, resources may suddenly become unavailable if a user decides to
disconnect from the network or power off a machine. Of course, the number of users and
their volume of use is a key concern. There are also random events such as connectivity
failures, hackers, viruses, etc. (albeit not a consideration in this project) that can
influence the system's performance.
2 Preparation
This chapter details the requirements gathering and design phase prior to algorithm
design and data collection. In this phase, a waterfall-type method was employed,
consisting of evaluating the project's requirements, building and updating use cases,
planning the final system design, and finally, implementing, debugging and testing it.
2.1 Resources
2.1.1 Hardware
The only significant hardware requirement of this project was access to a small
network. After considering setting up a number of Linux boxes in Concroft, it was
decided to opt for the convenience of setting up a small network environment within
my college room. This was done by networking my laptop directly to a rented PC.
2.1.2 Protocol
There are a great many file-sharing protocols on the market today. Each reflects its
own particular flavor of file-sharing. For example, the Gnutella network's decentralized
approach is, in fact, its own protocol and can be developed for accordingly. When I
originally considered protocols with my supervisor, we opted to go with the relatively
new and much-discussed JXTA API provided by Sun Microsystems. This decision
was based entirely on our desire to explore a new application of peer-to-peer
networking with a large development community supporting it. Later on in the
project, after learning a good deal about the JXTA API, we decided to move to
Remote Method Invocation (RMI) as the protocol of choice. We did this to be able to
dig deeper than the JXTA solution would have allowed.
2.2 Planning & Documentation
A good deal of design-time effort went into this project. Requirements analysis was
made up of several sections, each defining a particular functionality of the system.
This project used the following planning constructs: Analysis (the textual what,
how, and why of planning for development), Use Case Diagrams (graphical
descriptions of how the user interacts with the system), Sequence Diagrams (the
particular step-by-step process of the system), and Class Diagrams (the
classes and their methods). Other diagrams such as State, Activity, Collaboration, and
Deployment were considered but deemed overkill and, in many cases, redundant.
2.2.1 Analysis
The analysis phase is used as a top-level exploration of the project as a whole. It
follows a simple question and answer format and is intended to touch on each and
every aspect of the system, from hardware and software requirements to how
users collaborate. Below is a piece of the initial analysis phase Q and A.
Q: What is the intended purpose of this project?
A: To build a simple example of a peer-to-peer system. In this case, a
central directory server will be used (similar to Napster's model).
Q: What are the particular inputs of the system?
A: The user should have a way of connecting to the central server
and registering with the network. Additionally, the user should be able
to retrieve information on network nodes and file contents.
Finally, methods for searching for and retrieving files should be
designed.
Q: What software is required?
A: For this project, the j2sdk1.4.2 was used in addition to Sparx Systems
UML tools. Additionally, RMI was used for remote networking. RMI is
included in the Java development kit.
Q: What hardware is required?
A: For realistic client/server interaction, two computers would be nice
but much can be done on a single computer.
Q: How many users of the system will there be?
A: For this system, a single directory server and three remote clients.
Q:
Figure 1: An example of the question and answer exercises in the
planning and documentation period. A number of key outputs of this
exercise helped us get a better sense of what needed to be addressed in
sequence diagrams and use cases.
2.2.2 Use Case Diagrams
Use case diagrams were designed to describe how the individual user interacts
with the system. This exercise helped us to answer questions raised in the analysis phase
such as "what are the desired inputs?", "how is a file advertised?" and "what is being
accomplished?". It became clear early on that one of the key issues was going to be
how Content Delivery was going to occur. Working directly with I/O streams, TCP/IP sockets,
or UDP would have made the job easier but, since this is an RMI project, I had
to rely on remote object calls. This issue is further explored in the Implementation
section below.
Figure 2: User view of the system. This use-case describes the main
inputs required of a user on the system. Later, these inputs will be
translated into classes and methods.
2.2.3 Sequence Diagrams
At this stage of the project's documentation, the overall architecture of the project
started to take shape. When I discussed different design alternatives for this
project with my supervisor, we concentrated on determining the factors most
important to peer location, content location, content delivery and
security. Several early architecture ideas were designed and eventually dropped.
The main issue we initially struggled with was the choice of
protocol. Sun Microsystems's JXTA seemed like a good option initially but, once
we decided to move away from a decentralized architecture to a directory server,
it was no longer relevant.
Pseudo code was designed in this stage for both the client and server
implementations. The pseudo code underwent numerous revisions as the project
progressed. Below is a version of the client implementation:
Pseudo Code for Client Implementation
if the directory server accepts a new connection
    register client with the server/network;
    update the directory;
    initialize the peer group;
else if the server refuses to connect
    attempt to establish a connection n times;
    else fail;
if there is input into the user interface
    search clients for files matching entered string;
else
    download files from clients;
Figure 3: The pseudo code for how the client is intended to interact with
the system. A number of such pseudo code examples were designed
and updated later, informing sequence diagrams and class and
method design.
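The "attempt to establish a connection n times; else fail" branch of the pseudo code can be sketched in Java as follows. RetryingConnector and its connect method are hypothetical names introduced here, and the sketch assumes that any exception from an attempt counts as a refused connection; it does not show the project's actual client code.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: try an operation up to maxAttempts times, then fail. */
class RetryingConnector {
    static <T> T connect(Callable<T> attempt, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                // For an RMI client this would be the registry lookup plus register call.
                return attempt.call();
            } catch (Exception e) {
                last = e;   // remember the failure and try again
            }
        }
        // Every attempt was refused: the "else fail" branch.
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

A real client would probably pause between attempts; the sketch leaves the delay out to keep the control flow of the pseudo code visible.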
Both the Use Case Diagrams and the pseudo code laid a good foundation for the
development of the several sequence diagrams that were built to help us describe the
steps that would be taken during the exchange between client and server. A good
example of the flow of events can be seen in Figure 4.
Figure 4. Sequence flow of the central indexing architecture. The local host queries
the remote directory server for the location of a given piece of content. The server
replies with the IP location of the node containing the text file. Local host then
handshakes with the remote host to receive the content.
3 Implementation
The work completed is outlined in the following sections. The implementation of the
RMI client/server was informed by the use cases set up in the requirements. Classes and
their associated attributes and procedures were designed based on the information
gathered during the planning and requirements gathering phase. A first draft of the Server
and Client implementations was designed. Because RMI transparently accomplishes a
good deal of the work involved in setting up a file sharing system, it made much of the
design-time effort straightforward.
I make a few assumptions during implementation. I assume that only one peer is
accessing the server at a time and that queries to remote methods are not happening
concurrently. I also assume that the same is true for similar activity on the system such as file
requests and downloads. Finally, I have designed a small network and assume the reader
understands that the results will not be the same should the number of nodes or overall
usage increase.
3.2 Classes and Methods
Classes and methods were designed during the implementation phase to transport method
calls from the local GUI to the remote server. When I designed the methods in
my client and server classes, they essentially followed the conventions below:
Derive an interface from java.rmi.Remote that contains the methods to be made
available to RMI clients.
Define a class that extends the appropriate subclass of
java.rmi.server.RemoteServer. In our case, this class is UnicastRemoteObject.
Implement the derived interface in the derived class.
Use javac to create class files.
Create stub and skeleton classes with the JDK rmic utility, and make the stub
classes available to the client and servers.
Start the RMI registry on the local machine.
Start the main application, which should instantiate the RMI server class and
register it with the local registry.
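The steps above can be sketched end to end in a single program. This is a minimal illustration under stated assumptions rather than the project's code: the names DirectoryServer, DirectoryServerImpl and RmiDemo are hypothetical, the object is exported with UnicastRemoteObject.exportObject instead of by subclassing, and the sketch assumes a JDK from Java 5 onward, where stubs are generated dynamically and the rmic step is not needed. The registry is started inside the program so the example runs in one JVM.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical remote interface: the methods an RMI client may call. */
interface DirectoryServer extends Remote {
    void register(String host) throws RemoteException;
    List<String> givePeers() throws RemoteException;
}

/** Server-side implementation; it is exported explicitly in main below. */
class DirectoryServerImpl implements DirectoryServer {
    private final List<String> clients = new ArrayList<>();

    public synchronized void register(String host) {
        if (!clients.contains(host)) {
            clients.add(host);
        }
    }

    public synchronized List<String> givePeers() {
        return new ArrayList<>(clients);   // return a copy of the current list
    }
}

class RmiDemo {
    public static void main(String[] args) throws Exception {
        // Port 0 asks for any free port; a deployment would normally use 1099.
        Registry registry = LocateRegistry.createRegistry(0);
        DirectoryServerImpl impl = new DirectoryServerImpl();
        DirectoryServer stub = (DirectoryServer) UnicastRemoteObject.exportObject(impl, 0);
        registry.rebind("directory", stub);

        // Client side. In a separate JVM the registry would be obtained with
        // LocateRegistry.getRegistry(host, port) before calling lookup.
        DirectoryServer server = (DirectoryServer) registry.lookup("directory");
        server.register("10.0.0.2");
        System.out.println(server.givePeers());

        // Unexport both objects so that the JVM can exit.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

Keeping the implementation class free of RMI base classes means its logic can be exercised without any networking, which is convenient when testing the registry behaviour on its own.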
Originally, the system was designed with a command-line interface but a GUI was added
later on in the development process. On the server side, Start() and Stop() methods were
designed for allowing clients to register and deregister with the server. As information
flows to and from the server, an updateDir() method updates the directory of files.
Figure 5. Design of the system's core class diagrams.
The startPeers() method is provided as a callback from the server to the
requesting client. The server calls this method back on the client and passes it references
to the other peers on the network. Once this happens, the receiving peers hold the
node information in an array and search this information when necessary.
The generic actionPerformed() method is used here to handle all the inputs from
the GUI. The possible inputs are:
Search Nodes on the Network. To do this, the client user enters the
getPeers string in the command line. This puts a call to the server
to get a new set of peers for the client.
Search Files on the Network. To do this, the client calls the remote file
search method on all peers in the client's peer array and puts the results in the
list object of the graphical interface.
Download a Located File. To do this, the client user calls a getFile
method on the peer it needs to receive the file from. The results are written
to a byte array on the local client.
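The file search across the peer array can be sketched as below. FileLister and PeerSearch are hypothetical names standing in for the remote Client interface and the GUI handler, and case-insensitive substring matching is an assumption; the source does not say how the entered string is compared with file names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical view of a peer: just enough to list the files it advertises. */
interface FileLister {
    String host();
    List<String> fileNames();
}

class PeerSearch {
    /** Return "host/file" entries for every advertised file whose name contains the query. */
    static List<String> search(List<FileLister> peers, String query) {
        List<String> hits = new ArrayList<>();
        String needle = query.toLowerCase(Locale.ROOT);
        for (FileLister peer : peers) {
            for (String name : peer.fileNames()) {
                if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                    hits.add(peer.host() + "/" + name);
                }
            }
        }
        return hits;
    }
}
```

The returned list is what a GUI list component would display; selecting an entry identifies both the file and the peer to ask for it.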
Additional methods are used to return basic information to the requesting client.
They include writeFile, which writes the contents of the byte array to a
file, and getFile, which is called by the remote client to return the byte array of the contents
of the file that is requested. In other words, if a client wants to download a file
called A_Tale_of_Two_Cities.txt, it must call getFile on the providing client.
3.2 The Client
After the classes were designed and revised, the next step was to implement the core code
for the local client. The client design is declared in a public interface for remote
objects. Accordingly, this interface extends java.rmi.Remote and its methods are declared
to throw RemoteException.
As each new client joins and leaves the network, it calls the methods designed in the
Client interface class. Additionally, filenames and fileSizes arrays were added to store
information about the files and file sizes located on the other clients on the network. The
class takes the form:
package network;

import java.rmi.*;
import java.io.*;

public interface Client extends Remote {
    // Interface fields are implicitly public, static and final.
    String[] filenames = new String[99];
    long[] fileSizes = new long[99];

    // Receive references to the other peers on the network.
    void initPeers(Client clientA, Client clientB, Client clientC) throws RemoteException;
    // Node location.
    void getHost() throws RemoteException;
    // Local directory listing.
    void listFile(int i) throws RemoteException;
    void getNumFiles() throws RemoteException;
    void getIP(String searchString) throws RemoteException;
    // File transfer.
    void writeFile(String filename) throws RemoteException;
    byte[] getFile(String filename) throws RemoteException;
}
Figure 6. The public interface for the client.
The interface's purpose is to mark derived interfaces that contain methods to be exported
by the remote RMI Server. These method calls were designed to find out as much as
possible about the surrounding network environment, most importantly information about the
location of other nodes, via getIP() and getHost(), and information about the files each and
every node has in its local directory, via listFile and getNumFiles. Finally, the
methods to download a file are getFile and writeFile. These last two methods later
proved to be somewhat problematic, in particular the getFile byte array. I will discuss
reasons and solutions further down in the paper.
Once the public Client interface was designed and tested, the main client implementation
class ClientImpl was coded. This class contains the actual logic of the designed
methods and procedures in addition to the relationship to the graphical user interface. The
core functionality of this class is its ability to register each new client with the network
and ultimately make its information available to the other clients. Other key algorithms
involve the deregistering of the client and the method that updates the array of file names
and sizes on the current listing. This is the updateDir() method.
This file is too large to demonstrate here, but one method of note is the getFile method,
which reads the file named by the filename argument into an array of bytes. The contents of the
file are held in memory in a temporary byte array. The
temporary byte array is later written to a new file on the local disk of the requesting
node. The essence of this process is detailed in the steps below:
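public byte[] getFile(String filename) throws RemoteException {
    File inputFile = new File(filename);
    // The whole file is held in one array, so its length must fit in an int.
    byte[] contents = new byte[(int) inputFile.length()];
    try {
        FileInputStream in = new FileInputStream(inputFile);
        try {
            int offset = 0;
            // read() may return fewer bytes than requested, so loop until full
            while (offset < contents.length) {
                int n = in.read(contents, offset, contents.length - offset);
                if (n < 0) break;
                offset += n;
            }
        } finally {
            in.close();
        }
    } catch (IOException e) {
        throw new RemoteException("Could not read " + filename, e);
    }
    return contents;
}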
Figure 7. Design of the system's core class diagrams.
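The pairing of getFile on the providing node with writeFile on the requesting node can be sketched as a local round trip. FileTransferSketch is a hypothetical class, not the project's ClientImpl: it uses java.nio.file (available from Java 7, later than the SDK used in the project), and its writeFile takes the byte array as a second argument, which is an assumption that differs from the one-argument signature in the Client interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class FileTransferSketch {
    /** Providing side: load the whole file into a byte array. */
    static byte[] getFile(String fileName) {
        try {
            return Files.readAllBytes(Paths.get(fileName));
        } catch (IOException e) {
            throw new UncheckedIOException("could not read " + fileName, e);
        }
    }

    /** Requesting side: write the received bytes to a local file. */
    static void writeFile(String fileName, byte[] contents) {
        try {
            Files.write(Paths.get(fileName), contents);
        } catch (IOException e) {
            throw new UncheckedIOException("could not write " + fileName, e);
        }
    }
}
```

Because the file travels as a single array, both nodes must hold the complete contents in memory at once, and a Java array cannot exceed Integer.MAX_VALUE elements.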
3.3. The Server
The remote extension of the server class is, by comparison, not as complex as the design
of the client classes. Essentially, the server was designed to do just the following
things: register and deregister clients, store information about client activity, and give
information to requesting clients as need be. Accordingly, the Server class's methods are
register, deregister, and givePeers.
In the implementation of the Server class, a Vector was provided that keeps
information about client activity. When local clients query the server, the server searches
its Vector to provide information about which clients are currently active on the
network. The code for this functionality takes the form:
protected Vector clients;   // registered clients (java.util.Vector)

public ServerImpl() throws RemoteException {
    clients = new Vector();
}
Additional efforts were made to determine how long objects were taking to execute on
the server side. The two key operations for the remote timing of events were, first, the
initialization of a timer at the local node that executes on the server. The timer terminates
once the result is returned from the remote server. The data from this operation helped me
gather more information about how the network was behaving in different environments,
such as increased traffic or the transmission of different file sizes.
The server code that tests the start and stop time of a remote call should be distinguished
from the Timer package, which attempts to gauge the length of remote execution. To do
this, the local Java code makes a call on the remote object being implemented and returns
a date and time in association with the object being acted on. An example of one of the
classes in this package takes the form:
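import java.rmi.RemoteException;
import java.util.Date;

public class Timer implements Runnable {
    // Remote object that is told the current time.
    TimeMonitor tm;

    public Timer(TimeMonitor tm) {
        this.tm = tm;
    }

    public void start() {
        // start(), not run(), so that the loop executes on its own thread
        (new Thread(this)).start();
    }

    public void run() {
        while (true) {
            try {
                Thread.sleep(10000);   // wait ten seconds between reports
            } catch (InterruptedException x) {
                return;                // stop reporting once interrupted
            }
            if (this.tm != null) {
                try {
                    this.tm.tellMeTheTime(new Date());
                } catch (RemoteException x) {
                    // monitor unreachable; try again on the next pass
                }
            }
        }
    }
}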
Figure 8. The above class example remotely queries the server and returns
a time stamp.
The purpose of this code (together with its related classes) is to remotely start a timer on each new thread that is executing on the server. When the thread has completed its execution, the current system time is returned as a result. The end result is that I get a sense of how long the various methods take to execute on the server.
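As a complement to the time stamps above, the round trip of a single call can also be measured entirely on the calling side, which avoids comparing clocks on two machines. The sketch below is an assumption-laden illustration: the class name CallTimer and the helper timeMillis are hypothetical, and Thread.sleep stands in for a remote call such as givePeers.

```java
import java.util.concurrent.Callable;

// Hypothetical helper; not part of the project's Time package.
class CallTimer {
    // Runs the task and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Thread.sleep stands in for a remote method invocation.
        long elapsed = timeMillis(() -> {
            Thread.sleep(50);
            return null;
        });
        System.out.println("call took about " + elapsed + " ms");
    }
}
```

System.nanoTime is used here because it is intended for measuring elapsed intervals, whereas Date values reflect the wall clock of whichever machine created them.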
3.4 Security
Security in networking, and particularly in large peer-to-peer applications, is an important topic. Because this diploma project is about the effective sharing of resources, there is a lot of potential to deliver harmful material to any of the nodes on the network. In larger networks this topic of research is crucial to the vitality of the system.
In my particular RMI client/server program, the intention was to decrease the flexibility enjoyed by local clients when invoking remote classes. Otherwise, any client program could run any server object, some of which could be potentially harmful to the network. When researching solutions provided by RMI, the answer I decided to use in my implementation was to install a security manager. Without the installation of a security manager there are no restrictions placed on how remote objects are accessed and by whom.
I used the RMISecurityManager class from the java.rmi package to instantiate the security manager with the statement below:
if (System.getSecurityManager() == null) {
    System.setSecurityManager(new RMISecurityManager());
}
In addition to the above lines of code, the Java SDK that I was using for this project required that a security policy file be specified at runtime. This is done by defining the java.security.policy property (note that there are no spaces around the equals sign):
java -Djava.security.policy=mypolicy
In order to access remote objects on the system, Java looks for a system-wide policy file in its runtime library. It also looks for a local policy file in the home directory of each requesting client. A sample policy file that grants full access permissions to everyone looks like:
grant {
permission java.security.AllPermission;
};
Through the use of its local policy file, each client on the network can grant permissions to each other node on the network. This exchange is made possible by the Permission classes in the java.security package, which provide access grants to specific resources.
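To make the idea of an access grant concrete, the sketch below checks whether one permission covers another using Permission.implies. It is an illustration under assumptions: the class name PermissionCheck, the /shared directory, and the file name are hypothetical, and java.io.FilePermission is used simply as one familiar subclass of java.security.Permission.

```java
import java.io.FilePermission;
import java.security.Permission;

// Hypothetical demonstration class; the paths are made up for the example.
class PermissionCheck {
    // True if the granted permission covers the requested one.
    static boolean covers(Permission granted, Permission requested) {
        return granted.implies(requested);
    }

    public static void main(String[] args) {
        // "/shared/-" means every file under /shared, recursively.
        Permission granted = new FilePermission("/shared/-", "read");
        System.out.println(covers(granted, new FilePermission("/shared/notes.txt", "read")));
        System.out.println(covers(granted, new FilePermission("/shared/notes.txt", "write")));
    }
}
```

A narrower grant of this kind is one alternative to AllPermission: a read-only grant on a shared directory would not cover a request to write to it.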
4 Evaluation
4.1 Data Collection
Presented in this project are the results of three experiments. In all cases, the intent is to learn something about the strengths and weaknesses of centralized search methods. To do this effectively, I use the algorithms designed in the Time package, which help to gauge how long an action takes from the time it is initiated to the time a response is returned from the server. Data is collected from performance tests in each of the three areas of interest mentioned above: node discovery, content discovery, and content download. In each case, the load placed on the system is 50 simultaneous queries performed by each of the three clients.
System tests were completed over a period of one week. The tests and resulting data were not gathered consecutively, as research has suggested that this can cause results to vary significantly. Later in the testing phase, results were compiled and explored.
4.2 Node Discovery
Of primary interest when implementing the Timer class is the location of additional peers on the network. As each new node comes onto the network, it registers with the server. The server's chief task is to keep track of all the nodes currently logged onto the system and give references to requesting peers. To keep information updated, any given node can, at any time, contact the server to request an updated list of information about the other nodes.
When a new client comes onto the network, it initializes the startPeers method and remotely invokes the server's givePeers method. Information about the other nodes on the network is then loaded into a local array. The initialization method on the client side takes the form:
public void getPeers(Client clientA, Client clientB, Client clientC)
{
    clients[0] = clientA.getHost();
    clients[1] = clientB.getHost();
    clients[2] = clientC.getHost();
}
Figure 9. The startPeers method on the client side invokes the remote givePeers
method on the server. A resulting list of connected nodes is delivered to the
client.
It is the first objective of this project to study the effectiveness of this peer discovery interplay between client and server. Data is collected by timing the initialization of the getPeers method, the remote invocation of the givePeers method, and the final response from the server. The sequence of events takes the form:
> getPeers
client sent time
Time: Tues Aug 02 12:11:09 CST 2003
server received time
Time: Tues Aug 02 12:11:10 CST 2003
client returned time
Time: Tues Aug 02 12:11:11 CST 2003
Figure 10. Functioning of the timing sequences. Commands issued on the local
client, the execution of remote methods, and returned results are all time stamped
to provide data points for further exploration.
As Figures 11 and 12 indicate, I tested the getPeers method call in an environment where only one client was querying the server and then in an environment where three clients were simultaneously querying the server. In the case of Figure 11, the overall average of the combined data points for each client was 1.562. In Figure 12, the single client's average was 1.262. While it wasn't surprising that the increased load of Figure 11 resulted in a larger overall average, I did expect the numbers to be further apart.
A second observation was the difference in fluidity between the two figures. In Figure 11, a somewhat erratic behavior is observed that is much less (or not at all) present in its counterpart. I am guessing that this observation can possibly be attributed to the three clients competing for the same method invocations on the remote server. This touches on the issues inherent in concurrent systems programming mentioned earlier.
For me, the question that Figure 11 raises is whether or not RMI is thread-safe. There are a lot of possibilities here. One such possibility is that the connections are being pooled in such a way that only one is being used by an outstanding remote call at a time. Just because the stub never modifies any instance data does not mean that concurrent calls writing to the same socket will marshal correctly. Another possible explanation is the actual activation of the remote objects. In Sun's documentation it was unclear to me how to tell whether a remote object is in an active or passive state when being accessed. Without clarity here, it is possible that the graph below reflects multiple threads trying to spawn multiple processes for the same activation group (in this case, the givePeers method).
Figure 11: wait time for locating remote connected nodes.
Figure 12: wait time for locating remote connected nodes. One client.
4.3 Content Discovery
The way in which content is stored and advertised on a network can dramatically influence the effectiveness of its associated search methods. The advantage of a system with a centralized directory is that it is possible to quickly gain access to information about which nodes contain which files. Systems such as KaZaA use this fact to their advantage by combining a fast directory lookup node, or supernode, with the propagation power of decentralized systems.
Fast access to content references initially happens when the client registers with the remote server. As each new node registers and deregisters with the remote server, the updateDir() method updates the array of file names and sizes with the current information. The local client then stores that information as arrays in its local directory:
String[] fileNames = new String[99]; // stores file list
long[] fileSizes = new long[99]; // stores file sizes
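The central directory idea described above, where the server can answer "which nodes hold this file" in a single lookup, can be sketched with a map from file name to hosts. This is an illustrative assumption rather than the project's updateDir() implementation: the class FileDirectory and its method names are hypothetical, and hosts are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical central directory; not the project's updateDir() code.
class FileDirectory {
    // file name -> hosts currently advertising that file
    private final Map<String, List<String>> hostsByFile = new HashMap<>();

    // Called when a client registers: records each advertised file.
    public synchronized void addClient(String host, String[] fileNames) {
        for (String name : fileNames) {
            hostsByFile.computeIfAbsent(name, k -> new ArrayList<>()).add(host);
        }
    }

    // Called when a client deregisters: drops all of its entries.
    public synchronized void removeClient(String host) {
        hostsByFile.values().forEach(hosts -> hosts.remove(host));
        hostsByFile.values().removeIf(List::isEmpty);
    }

    // Returns the hosts holding the file, or an empty list if none do.
    public synchronized List<String> locate(String fileName) {
        List<String> hosts = hostsByFile.get(fileName);
        return hosts == null ? Collections.<String>emptyList() : new ArrayList<>(hosts);
    }

    public static void main(String[] args) {
        FileDirectory dir = new FileDirectory();
        dir.addClient("clientA", new String[] {"Timer.java", "Bio.txt"});
        dir.addClient("clientB", new String[] {"Bio.txt"});
        System.out.println(dir.locate("Bio.txt"));
        dir.removeClient("clientA");
        System.out.println(dir.locate("Timer.java"));
    }
}
```

Keying the directory by file name keeps lookup cost independent of the number of registered clients, at the price of a scan over all entries when a client deregisters.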
The metrics I used to explore the content location qualities of centralized systems are outlined in the charts below. Two approaches were designed. In the first test, each peer simultaneously searches for a different file on the network. In the second, each peer simultaneously searches for the same file on the network. The focus of these two tests is, in general, to gauge the overall time it takes to locate a file on the network and how well file location performs under increased load conditions.
Figure 13: wait time for multiple peers requesting the same file.
Figure 14: wait time for multiple peers requesting multiple files.
As Figures 13 and 14 show, there are no significant differences between multiple clients accessing the same file and multiple files being accessed by multiple clients. I did expect to see some variation in the results. There are, however, some notable spikes in Figure 13's activity. Whether or not I can attribute these small increases to issues such as thread safety or the remote activation of objects is hard to say, although I doubt it. It is more likely that these spikes are due to slight variations in the results. As a side note, it is important to point out that, of the files searched, the majority of the desirable files (likely the ones I was querying most often) were located on a single client. In terms of wait time resulting from competition for resources, this observation (on a small scale, anyway) doesn't seem to have much effect on the system's activity.
A second test was conducted to rank the expected results from content searches based on the popularity of the content. Unfortunately, I don't have the luxury of a large network used by a diverse group of users with varied interests to get a truly random sampling of how popularity may influence usage of the network. As a next best solution, I gave each file on the network a popularity ranking. This was done by assigning values from 1 to 10 (10 being the highest) to each of the ten files on each of the three clients being tested. As the remote method invocations request files at random, the final results should give us an idea of where traffic might be directed within the network. The results of this test revealed:
Client A     Rank  Hits   Client B       Rank  Hits   Client C      Rank  Hits
Timer.java   10    x      Crossley.txt   10    -      Hayden.txt    10    -
Server.java  9     xxx    Hello.c        9     x      Client.java   9     -
Bio.txt      8     xxxxx  Resume.txt     8     xxx    Letter.txt    8     x
Memo.doc     7     xxx    Dad.doc        7     xx     Crypt.java    7     -
Summary.txt  6     xxx    Itinerary.txt  6     xx     ToDo.txt      6     xx
Funny.txt    5     xx     FindIt.html    5     xx     Flight.txt    5     xx
RMI.html     4     xx     NIHA.txt       4     x      Monitor.java  4     x
NMH.html     3     x      Dickens.txt    3     xxx    eCOS.html     3     x
Sam.doc      2     -      Stream.java    2     x      JXTA.txt      2     xx
Mom.doc      1     -      Jill.doc       1     -      Columbia.txt  1     x
(Each x marks one hit; - marks no hits.)
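The tally marks in the table can be totalled per client to see where the load falls. The sketch below does that arithmetic; the hit counts are my reading of the tally marks in the table above (ordered from rank 10 down to rank 1), and the class name HitTally is hypothetical.

```java
// Hypothetical helper that sums the tally marks from the popularity table.
class HitTally {
    // Hits per file, ordered from rank 10 down to rank 1, read off the table.
    static final int[] CLIENT_A = {1, 3, 5, 3, 3, 2, 2, 1, 0, 0};
    static final int[] CLIENT_B = {0, 1, 3, 2, 2, 2, 1, 3, 1, 0};
    static final int[] CLIENT_C = {0, 0, 1, 0, 2, 2, 1, 1, 2, 1};

    // Sums the hits recorded for one client.
    static int total(int[] hits) {
        int sum = 0;
        for (int h : hits) {
            sum += h;
        }
        return sum;
    }

    public static void main(String[] args) {
        int a = total(CLIENT_A);
        int b = total(CLIENT_B);
        int c = total(CLIENT_C);
        int all = a + b + c;
        System.out.println("A=" + a + " B=" + b + " C=" + c + " total=" + all);
        System.out.println("Client A share of hits: " + (100 * a / all) + "%");
    }
}
```

On this reading Client A receives 20 of the 45 recorded hits, Client B 15, and Client C 10, which is the weighting towards Client A discussed in the results.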
4.4 Content Delivery
This section of my diploma project explores how content delivery behaves in a centralized network environment. Namely, I evaluate how information flows from one node to the next. To get a sense of how efficient this type of information exchange is in our small network, the two tests I used evaluated 1) multiple file downloads (of various small files under 2 MB in size) happening at the same time and 2) multiple downloads of a single large file (20 MB in size).
Figure 15: wait time for multiple peers requesting small files.
Figure 16: wait time for multiple peers requesting a large file.
Once a file was located on the network, the actual transmission from one node to the next proved to be a bit more complicated than I anticipated. After researching solutions such as TCP/IP and IO streams, it seemed that the best method for RMI file transfer was to read the file's contents into an array of bytes on the remote client. To do this, I followed this sequence of events:
1) Instantiate the remote object.
2) Open the file and get its size.
3) Allocate the byte array and read the file into that array.
4) Copy the file name.
Once these steps were accomplished, the remote file object could be transferred by calling the getFile method on the remote client. This method call fills the local client's array of bytes with the bytes from the remote file. Then, the local client calls the writeFile method, which writes the contents of the byte array to a file.
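The read-into-bytes and write-from-bytes steps above can be sketched locally, leaving out the RMI call that would carry the array between machines. The method names getFile and writeFile mirror the text, but the bodies are assumptions: they use java.nio.file.Files, which postdates the Java SDK the project was built on, and the class name ByteArrayTransfer is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical local stand-in for the remote getFile / writeFile pair.
class ByteArrayTransfer {
    // Sender side: load the whole file into a byte array.
    static byte[] getFile(Path source) throws IOException {
        return Files.readAllBytes(source);
    }

    // Receiver side: write the received bytes out to a file.
    static void writeFile(Path target, byte[] data) throws IOException {
        Files.write(target, data);
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("source", ".txt");
        Path target = Files.createTempFile("target", ".txt");
        Files.write(source, "hello peer".getBytes("UTF-8"));

        byte[] payload = getFile(source);   // would travel over RMI
        writeFile(target, payload);

        System.out.println(Arrays.equals(payload, Files.readAllBytes(target)));
        Files.delete(source);
        Files.delete(target);
    }
}
```

Because the whole file is held in one array on both sides, memory use grows with file size, which is relevant to the large-file results discussed next.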
As observed in the results for multiple peers requesting a large file, initial large-file transfer tests yielded poor results. In most cases, the transfer of a large file proved too memory intensive and the system simply hung. After exploring this problem, I discovered that this wasn't a shortcoming of the centralized architecture design but of how the writeFile method call was writing bytes into the local array. The (short-term) remedy was to cut the client file into 5 MB byte arrays and transfer the file as a sequential series, reassembling it on the receiving end.
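The split-and-reassemble remedy can be sketched as two small helpers. This is a minimal illustration under assumptions: the class name Chunker and the methods split and join are hypothetical, the chunk size is a parameter (5 MB in the text, a few bytes in the demo), and the data is held in memory rather than streamed from disk.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical chunking helpers; not the project's transfer code.
class Chunker {
    // Cuts data into consecutive pieces of at most chunkSize bytes.
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(data.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    // Reassembles the pieces in the order they were received.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = new byte[12];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) i;
        }
        List<byte[]> chunks = split(file, 5);   // pieces of 5, 5 and 2 bytes
        System.out.println(chunks.size() + " chunks");
        System.out.println(Arrays.equals(file, join(chunks)));
    }
}
```

In a real transfer each chunk would be sent in its own remote call, so only one chunk at a time needs to be marshalled.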
4.5 Observing results
The network characteristics of centralized systems were studied with peer location, content location, and file download effectiveness in mind. In our first test, I tested a single client's ability to invoke remote method calls on the server to get a listing of connected peers on the network. The metric used is the wait time between the execution of the command and the returned results. In the first case, I found that the average wait time is roughly 1.562 in an environment where load is being placed on the server. In the case where one node is accessing the remote server to get peer information, the wait time is comparatively less (1.262), as might be expected.
The ability of a local client to quickly find the location of files on the network was shown in the second exploration. In both cases (multiple nodes querying the same file and multiple nodes querying different files) the response time was immediate. As an added observation to this test, it is interesting to note that the percentage of responses to requests maintains a high and predictable level. At no time was a request for a file rejected by the remote server. A second test was added on the subject of content popularity. The popularity of a particular file (or group of files on a particular node) on the network can dramatically influence how activity is distributed. In our test, a sampling taken at random shows that the overall load of the network was weighted towards Client A. In such a case, especially if the popularity of files is proportionally small on the surrounding peers, a bottleneck could possibly occur. As the weighted results for Client A show, an effective way of evenly distributing how and when remote objects are invoked by connected clients is an important consideration when dealing with a centralized system.
Finally, the effects that file retrieval has on the system were tested. In both cases, all three peers in the network were transferring files: first a relatively small file under 2 MB and then a larger file of 20 MB.
5 Conclusions
5.1 Further Development
There are a number of improvements I would likely make to this project, given more time. As mentioned earlier, one of the primary problems of centralized systems is that they are not as efficient in propagating requests throughout the network as their decentralized counterparts. The problem is that a remote peer cannot send unrequested data to a client doing a search. The remote peer can only send data when it is explicitly called for by the requesting client. This inflexibility of the centralized RMI approach makes it quite impossible to seamlessly share information throughout the network. To illustrate the point, consider Client A. When Client A wants a file, it makes a call to the remote Server, requesting information about the other connected nodes. If Client B has the requested file, then Client A makes a direct connection. But what if the file does not exist within Client A's peer group, but is available somewhere else on the network? In an ideal situation, Client B would be able to refer the requesting client to another node on the network that does have the file. This is the essence of the JXTA API. JXTA uses the notion of Advertisements and a Peer Discovery Protocol (PDP) to fluidly locate references to information throughout the network. Advertisements are essentially messages represented as XML that make available information stored in a given peer's cache, such as other peers, peer groups, or available local or remote content. When a peer attempts to discover a particular piece of content, it searches the referring advertisements until a reference to the correct node is found. The efficient propagation methods of the JXTA protocol are not possible in an RMI environment where information exchange is a one-to-one dynamic.
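The referral idea in the Client A example, where a peer that lacks a file passes the request on to a peer it knows, can be sketched as a bounded depth-first search. This is a deliberately simplified illustration and not the JXTA Peer Discovery Protocol: the class ReferralPeer, the hop limit, and the in-memory neighbour lists are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical peer that can refer a request on to its neighbours.
class ReferralPeer {
    final String name;
    final Set<String> files = new HashSet<>();
    final List<ReferralPeer> neighbours = new ArrayList<>();

    ReferralPeer(String name, String... fileNames) {
        this.name = name;
        files.addAll(Arrays.asList(fileNames));
    }

    // Returns the name of a peer holding the file, or null if this simple
    // depth-first walk finds none within the given number of referrals.
    String locate(String fileName, int hops, Set<String> visited) {
        if (!visited.add(name)) {
            return null; // this peer has already been asked
        }
        if (files.contains(fileName)) {
            return name;
        }
        if (hops == 0) {
            return null; // referral budget used up
        }
        for (ReferralPeer neighbour : neighbours) {
            String found = neighbour.locate(fileName, hops - 1, visited);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ReferralPeer a = new ReferralPeer("A");
        ReferralPeer b = new ReferralPeer("B", "Memo.doc");
        ReferralPeer c = new ReferralPeer("C", "Bio.txt");
        a.neighbours.add(b); // A knows B
        b.neighbours.add(c); // B knows C
        System.out.println(a.locate("Bio.txt", 2, new HashSet<>())); // C
    }
}
```

The hop limit and the visited set keep a request from circulating indefinitely, which is the kind of traffic control a propagating design has to provide for itself.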
A second improvement would address the problem of concurrency within distributed systems. The way RMI currently works, a method dispatched by the RMI runtime to a remote object implementation may or may not execute in a separate thread. The RMI runtime makes no guarantees with respect to mapping remote object invocations to threads. As a result, when an RMI server is written, any support for executing calls safely in separate threads must be hand coded. This introduces a degree of complexity that, although I did not have time to address it in this diploma project, is crucial to any system that entertains the possibility of numerous simultaneous client requests (such as a multi-user file-sharing system).
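One hand-coded form of that protection is to synchronize the methods that touch shared server state. The sketch below shows the idea with plain threads instead of RMI dispatch; the class SafeRegistry and the helper registerConcurrently are hypothetical, and the figures of three clients and 50 calls each simply mirror the test load used in this project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical registry whose shared list is guarded by synchronized methods.
class SafeRegistry {
    private final List<String> clients = new ArrayList<>();

    // Only one thread at a time may add to the list and read its size.
    public synchronized int register(String host) {
        clients.add(host);
        return clients.size() - 1;
    }

    public synchronized int size() {
        return clients.size();
    }

    // Registers from several threads at once and returns the final count.
    static int registerConcurrently(int threads, int perThread) throws InterruptedException {
        SafeRegistry registry = new SafeRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.register("client" + id + "-" + i);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return registry.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(registerConcurrently(3, 50)); // 150
    }
}
```

Without the synchronized keyword, concurrent calls to ArrayList.add could lose entries or corrupt the list, so the final count would not be reliable.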
Another improvement I would like to add, time allowing, would be the inclusion of additional tests for each of the three subject groups. I feel that additional variations could be done on system load testing. For example, in the case of the content discovery tests, it might be interesting to explore how the system behaves when content is evenly distributed throughout the network versus unevenly located in only a few nodes. Another such improvement might be a decent stab at building a system that manages to propagate requests from node to node. Given the focus of this project, I was not able to invest much effort in finding a good solution to this problem. Nonetheless, propagation is the shortcoming of centralized systems (and the advantage of their decentralized kin), and the design of an RMI system that intelligently tackles this problem would certainly be interesting. Finally, it would definitely be useful to gauge how each of the search, discovery, and retrieval subject areas behaves as the size of the system scales. While a three-node network is useful for the purposes of an academic exploration, translation into day-to-day environments would require a more robust architecture.
5.2 Final Conclusion
Overall, the goals of this project have been accomplished. I have spent a good deal of time testing the various strengths of searching a centralized network and have found that such a network can be both powerful and powerless depending on what you are demanding of it. Centralized directory servers are a very powerful tool for providing fast references to remote locations on the network. This is certainly a valuable commodity in large, multi-node environments where multiple files are being shared. The data points under the peer discovery and peer delivery sections certainly back up this finding. On the other hand, I found content delivery to be problematic for the reasons stated above. I don't feel this is the result of a centralized environment but, rather, the shortcomings of RMI. Certainly, my earlier solution of cutting up my files into segments of byte streams could be replaced with sockets or some other such solution, but the issue of propagation makes RMI a poor solution for large-scale file sharing environments, centralized or decentralized.
Appendices
A. Key Code Samples
// The group of Time classes in the Time package acts as an
// aid for data collection by remotely invoking the stub objects
// on the server, returning the times that objects were
// invoked.
TimeMonitorImpl.java
package Time;

import java.rmi.*;
import java.util.Date;
import java.io.Serializable;

public class TimeMonitorImpl implements TimeMonitor, Serializable
{
    public void tellMeTheTime(Date d) throws RemoteException
    {
        System.out.println("Time: " + d.toString() + "\n");
    }
}
// The method below helps register the clients with the server.
register() in ServerImpl.java
public int register(Client client) {
    String chost = "";
    try {
        chost = getClientHost();
    } catch (ServerNotActiveException ignored) {
    }
    clients.addElement(client);
    try {
        System.out.println(chost + " has registered - sharing "
            + client.getNumFiles() + " files");
        givePeers(client);
    } catch (RemoteException ignored) {
    }
    return clients.size() - 1;
}
public void givePeers(Client client) throws RemoteException {
    if (clients.size() > 1) { // Only give random clients if there is more than one
        try {
            // Indices run from 0 to size-2, so the last registered client is never picked
            int randNumber = (int) (Math.random() * (clients.size() - 1));
            System.out.println("Giving new Peer " + randNumber + " to remote.");
            Client temp = (Client) clients.elementAt(randNumber);
            randNumber = (int) (Math.random() * (clients.size() - 1));
            System.out.println("Giving new Peer " + randNumber + " to remote.");
            Client temp2 = (Client) clients.elementAt(randNumber);
            randNumber = (int) (Math.random() * (clients.size() - 1));
            System.out.println("Giving new Peer " + randNumber + " to remote.");
            Client temp3 = (Client) clients.elementAt(randNumber);
            client.initPeers(temp, temp2, temp3);
        } catch (RemoteException ignored) {
        }
    } else { // First client, register with itself three times
        Client temp = (Client) clients.elementAt(0);
        try {
            client.initPeers(temp, temp, temp);
        } catch (RemoteException ignored) {
        }
    }
}
// The main method of the ClientImpl class. Key methods from the
// Client class are passed via this method.
public static void main(String[] args) throws RemoteException,
        NotBoundException {
    if (args.length != 1) throw new IllegalArgumentException
        ("please enter host name");
    // lower-case variable name so it does not shadow the Client type
    ClientImpl client = new ClientImpl(args[0]);
    client.start();
}
public String listFile(int i) throws java.rmi.RemoteException {
    return this.fileNames[i];
}
public long listSize(int i) throws java.rmi.RemoteException {
    return this.fileSizes[i];
}
public int getNumFiles() throws java.rmi.RemoteException {
    return numFiles;
}
public int getTime() throws java.rmi.RemoteException {
    return getTime;
}
public String getIP() throws java.rmi.RemoteException {
    return myip;
}
B. Bibliography
[1] Brookshier, Govoni, Krishnan, & Soto, JXTA: Java P2P Programming.
[2] Gradecki, Mastering JXTA.
[3] Rosenberg & Scott, Applying Use Case Driven Object Modeling with
UML.
[4] Fowler & Scott, UML Distilled, Second Edition.
[5] Kolesnikov & Hatch, Building Linux Virtual Private Networks (VPNs).
[6] Nelson Minar: Distributed Systems Topologies, Parts 1 and 2
http://www.openp2p.com/pub/a/p2p/2001/12/14/topologies_one.html
C. Project Proposal
Peter Dushkin
Queens
pbd22
Diploma in Computer Science Project Proposal
A File Discovery Scheme in Decentralized Computing
December 6th, 2002
Project Originator: Peter Dushkin
Project Supervisors: Meng How Lim
Signature:
Director of Studies: Dr. Robin Walker
Signature:
Overseers: Dr. Larry Paulson & Dr. Tim Harris
Table of Contents
Introduction
Project Proposal
    I. Front end application
    II. Node software
Resources
Supervision Requirements
Phases of Development
Timetable and Milestones
    Weeks 1 and 2: Proposal Definition
    Weeks 3 to 6: Paper Network Design
    Weeks 7 to 10: Paper Software Design
    Weeks 11 to 13: Physical Network Implementation
    Weeks 14 to 20: Application Coding
    Weeks 21 to 27: Evaluation & Debugging
    Weeks 28 to 35: Evaluation & Debugging/Dissertation
    Week 36: Final Form
Introduction
The ways in which a network of computers shares files have come a long way over the past decade. The days of a small office or university network sharing documents over a dedicated LAN or WAN have rapidly evolved into radically new areas of network computing. The two most commonly used today are centralized and decentralized architectures.
Centralized, or client/server, networks rely on one central server to adjudicate activity. The central computer maintains a database of files owned by computers on the network. When a computer requests a file, it is checked by the central server against the database and, if acknowledged, a direct connection can be established between the requesting and sending computers.
The problem with a centralized network architecture is that a lot of demand is placed on the central server. As a result, the network can become quite slow due to bottlenecks. Also, should the central server experience problems or go down, the whole network is affected.
A response to these problems is decentralized computing. In this model, all of the nodes on the network act in both a client and server capacity, removing the need for a central server. This project will be using a file location scheme to show the comparative advantages of decentralized over centralized computing.
Project Proposal
This diploma project will use the TCP and RMI protocols to search for files on a small decentralized network. A Java-based GUI application will be developed to serve as the primary interface to the network. It will allow the end user to ping the nodes on the network and discover information about the various computers, with the end goal of file location. Each node on the network will provide a simple query interface that enables it to receive requests and respond accordingly.
Outlined below are some possible features of the requesting and responding nodes:
I. Front end application
The front-end application will be the graphical interface to the decentralized network. Possible networking protocols involved are RMI, TCP, IP and UDP. The application's core objective is to serve as an interface for the location and discovery of files. Additionally, it should return information about individual nodes. Some of the returning information might be:
a) The IP address of each computer.
b) Network bandwidth used by nodes.
c) Network status of each computer.
d) Time in milliseconds between ping and pong.
e) The geographic location of the computers.
II. Node software
The node software will directly relate and respond to the incoming packets sent by the requesting computer. As a result, the primary responsibility of the node software is to return the appropriate information or pass the request along to the next computer. Some of the resulting class definitions should be:
Pong (to return a packet request with related information)
Retrieval of information
Download of file(s)
Figure 1: Intended Network Build
Below is the intended network set-up. I will be logging onto the suggested network through SSH.
[Network diagram. Labels: College Server; Cambridge Network; Router; 131.111.128.110; three PWF Server/Client nodes (L. 1, L. 2, L. 3); hosts linux2.pwf.cl.cam.ac.uk, linux3.pwf.cl.cam.ac.uk, linux4.pwf.cl.cam.ac.uk]
Possible Extension of the Project
Depending on the overall development of the project, ways of decreasing bandwidth utilization may be considered. A number of peer-to-peer networking protocols have been wrestling with possible solutions to the problem of excessive network traffic. Below are some suggested solutions.
I. Pong Limiting
Pong limiting reduces the amount of traffic on the network by only returning a pong with its own address if the host is not restricted by a firewall. Moreover, only a fixed number of pongs should be returned in response to a given ping.
II. Pong Caching
A drawback of pong limiting is that it is inefficient if too many pings are being sent. A possible solution to this problem is to cache the most recent pongs and avoid the broadcast of pings. In other words, if the appropriate reply is cached, then the distance that the matching request has to travel can be significantly reduced.
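A bounded cache of recent pongs can be sketched with java.util.LinkedHashMap, which can evict its eldest entry once a size limit is exceeded. This is only an illustration of the caching idea: the class PongCache, the capacity, and the mapping from host name to a pong payload string are assumptions, not part of any particular peer-to-peer protocol.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache keeping only the most recently seen pongs.
class PongCache extends LinkedHashMap<String, String> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    PongCache(int capacity) {
        // accessOrder = true: entries that are read or re-put move to the "recent" end
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each put; evicts the eldest entry when full.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        PongCache cache = new PongCache(2);
        cache.put("hostA", "pong from hostA");
        cache.put("hostB", "pong from hostB");
        cache.put("hostC", "pong from hostC"); // hostA is evicted
        System.out.println(cache.keySet());    // [hostB, hostC]
    }
}
```

A node holding such a cache could answer a ping from the cached entries instead of forwarding it, which is the traffic saving the scheme aims for.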
III. Ping Multiplexing
The idea behind Ping Multiplexing is that when a single incoming ping reaches a node, it is "multiplexed" into numerous outgoing pings. The reverse is true for pongs (numerous pongs can be "demultiplexed" into one pong).
Resources
I am planning on using the following:
1. Operating Systems: Linux
2. Programming Language: Java, UNIX, possibly Perl
3. Networking Protocols: TCP, IP, RMI, UDP
4. Hardware: C4 computers via SSH.
5. Additional Software: Visio for UML
6. Storage: My ADP Tape drive, possibly Penguin.
Supervision Requirements
I will be sitting down with Mr. How Lim once every two weeks to provide a project update and discuss milestones. Otherwise, we will be corresponding via email as needed.
Phases of Development
1. Research and Information Gathering
2. Paper Planning
3. Physical Network Design
4. Physical Software Design
5. Dissertation and Completion
Timetable and Milestones
Weeks 1 and 2: Proposal Definition
* Meetings with Supervisor, Overseer, Director of Studies.
* Set up schedule of meetings with overseers and supervisor.
* Consolidation of project plan and overall direction.
* Acknowledgement of resource availability.
* Supporting research and data collection.
* Project fine-tuned per overseer/supervisor comments and finalized.
* Network hardware availability sorted out.
* All appropriate signatures prior to final project plan.
* Milestone: Final project plan.
Weeks 3 to 6: Paper Network Design
* Study Linux and how it is going to be used for this project.
* Study particularly relevant aspects of Unix/Linux documentation.
* Study particularly relevant aspects of P2P documentation.
* Start designing sketches of network diagram and related functions.
* Software Modules
* Start building UML diagrams of approved sketches.
* Milestone: Final Network Design.
Weeks 7 to 10: Paper Software Design
* Study related networking Java source code.
* Study documentation on ping/pong schemes.
* Create sketches of the application's GUI.
* Create pseudocode for all of the items in "A" to "I" above.
* Design methods, classes, procedures, functions, etc.
* Apply work to UML diagrams.
* Milestone: Final Application Design
Weeks 11 to 13: Physical Network Implementation
* Implement Network Design
* Start to think about dissertation.
* Milestone: Running Network
Weeks 14 to 20: Application Coding
* Implement Application Design
* Successful implementation of all "core items".
* Start to work on the development of dissertation.
* Milestone: Prototype Application
Weeks 21 to 27: Evaluation & Debugging
* Check consistency, logic, etc.
* Evaluate the overall protocol
* Check against original schematics/specification
* Evaluate network efficiency/functioning
* Evaluate bandwidth usage
Weeks 28 to 35: Evaluation & Debugging/Dissertation
* These weeks are a repetition of the last weeks.
* Check consistency, logic, etc.
* Evaluate the overall protocol
* Check against original schematics/specification
* Evaluate network efficiency/functioning
* Continuation of work and development on dissertation.
* Milestone: Fully fleshed-out dissertation
Week 36: Final Form
* Completed application and dissertation.
* Milestone: Completed Dissertation and Application