DEGREE PROJECT AT CSC, KTH

Evaluation of Reliable Network Communication Solutions in an Industrial Environment

Utvärdering av tillförlitliga kommunikationslösningar för nätverk i en industriell miljö

ALBERT DUNBERG
[email protected]

28 February 2007

Degree project in: Computer Science
Supervisor: Olof Hagsand
Examiner: Stefan Arnborg
Commissioned by: Omnivisor


Evaluation of Reliable Network Communication Solutions

in an Industrial Environment

ALBERT DUNBERG

Master’s Thesis in Computer Science (20 credits)
at the School of Computer Science and Engineering
Royal Institute of Technology, year 2007
Supervisor at CSC was Olof Hagsand
Examiner was Stefan Arnborg

TRITA-CSC-E 2007:019
ISRN-KTH/CSC/E--07/019--SE
ISSN-1653-5715

Royal Institute of Technology
School of Computer Science and Communication
KTH CSC
SE-100 44 Stockholm, Sweden
URL: www.csc.kth.se

Evaluation of Reliable Network Communication Solutions in an Industrial Environment

Abstract

Industrial systems often have different requirements and priorities than systems found in a desktop environment. In this master’s thesis, various network communication solutions are examined and evaluated in order to find a solution that is suitable for an existing industrial system. The study covers several network protocols, including common Internet protocols, and communication solutions from the growing field of Industrial Ethernet.

The featured system is found on the plant floor in a harsh industrial environment where electromagnetic interference from other equipment is expected to affect the system. It is an isolated system with well-defined requirements and properties that will seldom change. Under these conditions, the system requires reliable, message-oriented communication with low latencies and has low demands on capacity.

Two promising communication solutions are selected for further study. The first solution is based on the Transmission Control Protocol (TCP) and is simple to implement since TCP is widely available. The second solution is based on the Reliable User Datagram Protocol (RUDP) and requires a larger implementation effort since RUDP itself has to be implemented. Both solutions are implemented and evaluated regarding communication latency and error handling. To sum up, the solution based on RUDP is recommended because it is more adjustable and therefore able to provide better error handling performance for the particular system.

Evaluation of Reliable Communication Solutions for Networks in an Industrial Environment

Summary

Industrial systems often have other requirements and priorities than systems found in a desktop environment. In this report, various solutions for network communication are studied and evaluated with the intent of finding a communication solution that fulfils the requirements of an existing industrial system. The study covers several network protocols, including protocols found on the Internet, and communication solutions from the growing field of Industrial Ethernet.

The system described in this report is found on the plant floor in a harsh industrial environment where electromagnetic interference from other equipment is expected to affect the system. It is an isolated system with well-defined requirements and properties that seldom change. Under these conditions, the system requires reliable, message-based communication with short delays and has low demands on capacity.

Two promising communication solutions are selected for further study. The first solution is based on the Transmission Control Protocol (TCP) and is simple to implement since TCP is widely available. The second solution is based on the Reliable User Datagram Protocol (RUDP) and requires a larger implementation effort since it also includes an implementation of RUDP. These two solutions are implemented and evaluated with respect to delays during communication as well as error handling. In summary, the solution based on RUDP is recommended since it allows more adjustments and can therefore deliver better performance when handling errors in the system described in this report.

Preface

The master’s project described in this thesis was carried out at Omnivisor during the year of 2006. It was performed within the realm of Computer Science at the School of Computer Science and Communication (CSC) at the Royal Institute of Technology (KTH).

I would like to thank associate professor Olof Hagsand, my supervisor at CSC, for his excellent support. I would also like to thank Arne Dunberg, my supervisor at Omnivisor, for making this project a reality.

Last but not least, a big thanks to my friends at KTH throughout the years; you know who you are.

Contents

1 Introduction
   1.1 Background
   1.2 Problem Definition
   1.3 Delimitations
   1.4 Method
   1.5 Thesis Outline

2 Theory
   2.1 Layered Protocols
   2.2 Reliability
   2.3 Error Detection
   2.4 Error Correction
      2.4.1 Acknowledgment and Retransmission
      2.4.2 Forward Error Correction
   2.5 Flow Control
      2.5.1 Stop-and-Wait
      2.5.2 Sliding Window
   2.6 Congestion Control

3 Technology
   3.1 Physical and Data Link Layer
      3.1.1 Ethernet
      3.1.2 Medium Access Control
      3.1.3 Switched Ethernet
      3.1.4 Full-Duplex Connections
      3.1.5 Error Detection
      3.1.6 Performance
   3.2 Network Layer
      3.2.1 Internet Protocol (IP)
      3.2.2 Address Resolution Protocol (ARP)
      3.2.3 Internet Control Message Protocol (ICMP)
   3.3 Transport Layer
      3.3.1 User Datagram Protocol (UDP)
      3.3.2 Transmission Control Protocol (TCP)
      3.3.3 Reliable Data Protocol (RDP)
      3.3.4 Reliable User Datagram Protocol (RUDP)
      3.3.5 Stream Control Transmission Protocol (SCTP)
      3.3.6 Datagram Congestion Control Protocol (DCCP)
      3.3.7 Real-time Transport Protocol (RTP)
   3.4 Middleware
   3.5 Industrial Ethernet
      3.5.1 Modbus TCP
      3.5.2 Ethernet/IP
      3.5.3 EtherCat and Ethernet Powerlink

4 Summary and Selection
   4.1 Available Solutions
   4.2 Solution Based on TCP
   4.3 Solution Based on RUDP

5 Implementation
   5.1 General Implementation Details
   5.2 Implementation of Solution Based on TCP
   5.3 Implementation of Solution Based on RUDP

6 Evaluation
   6.1 Test Environment
      6.1.1 Platform
      6.1.2 Simulated Communication
   6.2 Preliminary Evaluation of the RUDP Implementation
   6.3 Comparative Evaluation
      6.3.1 Communication Latency
      6.3.2 Error Detection and Handling

7 Conclusion and Recommendation

Bibliography

A Acronyms and Abbreviations

Chapter 1

Introduction

This chapter describes the background and aim of the master’s project described in this thesis. It also defines the problem, presents a method for solving the problem and outlines the thesis.

1.1 Background

The purpose of this master’s thesis is to present a replacement for the communication solution used in an existing industrial system. Presently, the system is using point-to-point communication through slow serial links. The system consists of a handful of nodes in a centralized architecture. One node is the master and the rest are slaves, with serial links connecting each slave to the master. All communication within the system is initiated by the master node.

The aim of the project is a system where all nodes can exchange messages with each other using a local network. In order to make this possible, a communication solution that uses the network and meets certain requirements had to be found.

Once the transition to communication using the network is complete, the exchange of messages within the system will initially resemble the communication within the present system, including the centralized architecture. The only change will be that the network is used instead of the serial links. In the future, this transition will enable the communication within the system to develop and become more distributed by abandoning the centralized architecture and fully embracing the advantages of the network architecture.

1.2 Problem Definition

The master’s project described in this thesis consisted of finding a suitable solution for communication over a network which meets the requirements of the existing system. The project was required to include a study of available technology on which to base further work. A selection of promising communication solutions was to be evaluated and, finally, a recommended communication solution for the system was to be presented.

The requirements for a communication solution to be used in this system are as follows:

• The communication within the system is based on an exchange of short messages between the nodes. It is important that the delays when delivering these messages are short. As a reference, earlier tests experienced delays of about 200 ms which was regarded as unacceptably long.

• The communication solution has to be reliable since the current application expects this. The sender should receive an acknowledgment of correctly received messages.

• Another aspect of reliability is the ability to cope with external electromagnetic interference. Since the network will be used on the plant floor in an industrial environment, interference from electrical equipment is expected. The current communication solution experiences some problems which are assumed to be caused by interference.

• The handling of errors occurring during communication is important. A swift response when errors occur is a requirement when short delays during the delivery of messages are of the essence.

• The platform of the system is computers equipped with Microsoft Windows NT/2000 and a Fast Ethernet network. Any communication solution of interest must be available on this platform.

• A communication solution may not be dependent on technology which involves licensing fees.

The evaluation was to include communication latency, error detection, the handling of errors and of course the ability to meet the requirements in general. Measurements were to be made in a simulated environment based on the characteristics of the existing system.

Any source code produced during this project was to be written in C and is the property of Omnivisor.

1.3 Delimitations

There were also several conditions that restricted the scope of the project.

A single message sent by the system is guaranteed to be small enough to fit within the maximum supported size of the network. Therefore, a message will never be divided into several parts, and then reassembled at the destination, in order to traverse the network.


The capacity of the network is much greater than the combined demand of the connected nodes. It can also be assumed that the network is used solely for the purpose of communication within the scope of this project; no other traffic is present.

Only unicast technology is relevant since the communication within the system is inherently point-to-point. Multicast technology does not provide any significant advantages for this system, not even for future use, and was not to be considered.

1.4 Method

In accordance with the problem definition, this master’s project was divided into the following five tasks:

1. Perform a study of existing solutions for communication over networks which may be suitable for the system featured in this project.

2. Select promising communication solutions for further work and for inclusion in a more detailed evaluation.

3. Implement the previously selected communication solutions in order to perform the evaluation.

4. Evaluate the selected communication solutions using a simulated environment.

5. Recommend a communication solution for the system based on the evaluation.

1.5 Thesis Outline

Chapter 1 is an introduction to this master’s thesis. A problem is defined and a method for solving the problem is presented. In Chapter 2, common concepts found within the field of network communication, and which are important for the understanding of this thesis, are described. Chapter 3 contains the result of the study performed in the beginning of the project and is the basis for further work. This study is summarized in Chapter 4 and a selection of suitable communication solutions is presented. In Chapter 5, the implementation of each selected solution is described. Chapter 6 contains the details of the evaluation performed and presents the obtained results. The thesis is concluded with Chapter 7, which also presents a recommended solution to the problem.


Chapter 2

Theory

This chapter describes some fundamental concepts and mechanisms of network communication which are vital to the understanding of this thesis. As an aid to the reader, a reference of acronyms and abbreviations commonly used in this thesis can be found in Appendix A.

2.1 Layered Protocols

Network communication is often discussed in terms of different layers of the network architecture. The dominating network architecture is the TCP/IP protocol suite which is made up of five ordered layers: physical, data link, network, transport and application [7]. The TCP/IP model of network communication between two hosts is shown in Figure 2.1. Each host features the five layers, and each layer is connected to any adjacent lower or higher layer through an interface.

The physical layer is responsible for moving individual bits from one node of the network to the next. The data link layer adds grouping of bits into frames. The purpose of the network layer is to add communication between networks, and the transport layer allows communication between specific processes on hosts connected to a network. The transport layer also provides an interface to user-developed applications which reside in the application layer.

The only direct communication between host A and B in Figure 2.1 exists at the physical layer, which is connected to the network. Information exchanged between hosts by protocols at higher layers must move down through the layers to the physical layer, over to the other host and then back up through the layers. Each layer creates a packet by adding its own information to a message it receives from the layer above and passes it on to the layer below. The process is reversed when a packet is received from the lower layer: the layer-specific information is stripped and the message contained in the packet is passed on upwards. This mechanism of wrapping and unwrapping information in packets is called encapsulation.
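As an illustration of encapsulation, the sketch below builds an outgoing frame by placing one header per layer in front of the application data; the receiver strips them off in the reverse order. The header structs and the function are hypothetical and heavily simplified, and are not taken from any real protocol stack.

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical, heavily simplified headers; real protocol headers carry more fields. */
    struct transport_hdr { uint16_t src_port, dst_port; };
    struct network_hdr   { uint32_t src_addr, dst_addr; };
    struct link_hdr      { uint8_t  dst_mac[6], src_mac[6]; };

    /* Encapsulation: each layer adds its own header in front of the data handed
     * down from the layer above. Returns the total frame length. */
    size_t encapsulate(uint8_t *frame, const uint8_t *msg, size_t msg_len,
                       const struct transport_hdr *t,
                       const struct network_hdr *n,
                       const struct link_hdr *l)
    {
        size_t off = 0;
        memcpy(frame + off, l, sizeof *l); off += sizeof *l;   /* data link layer  */
        memcpy(frame + off, n, sizeof *n); off += sizeof *n;   /* network layer    */
        memcpy(frame + off, t, sizeof *t); off += sizeof *t;   /* transport layer  */
        memcpy(frame + off, msg, msg_len); off += msg_len;     /* application data */
        return off;
    }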


Figure 2.1. Two hosts connected to a network by the TCP/IP protocol suite.

2.2 Reliability

The definition of reliability in communication protocols incorporates several features. A transport service between two users is reliable if and only if it has all of the following features [10]:

• No loss of data. All data that is sent is guaranteed to either arrive at the receiver or the sender is informed if this is not the case.

• No duplicated data. All data that is sent is guaranteed to arrive at the receiver at most once.

• Ordered data. All data is guaranteed to arrive at the receiver in the same order it was submitted by the sender.

• Data integrity. There is a high probability that all data bits received are identical to those originally submitted.

If even one of these features is lacking, the service is unreliable. To maintain a reliable service, errors must be detected and corrected.

2.3 Error Detection

Error detection mechanisms are used to detect anomalies threatening the conditions for a reliable service. By using sequence numbers, where each data packet sent is marked with a monotonically increasing number, it is possible to detect lost, duplicated and out-of-order data [10]. Lost data will create a gap in the sequence of received numbers, and keeping track of received sequence numbers also makes it possible to discard duplicates and handle data received out of order.

Corrupted data has suffered bit errors due to some interference, that is, one or more data bits have had their values changed. Bit errors are detected by means of redundant information in the form of error detecting codes [6]. Extra bits are added and are used by the receiver to check the integrity of the received data. Common error detecting codes are parity check, checksum and cyclic redundancy check (CRC).
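As a minimal sketch of one such code, the function below computes a 16-bit ones' complement checksum of the kind used by IP, TCP and UDP; the function name is chosen here for illustration only.

    #include <stddef.h>
    #include <stdint.h>

    /* 16-bit ones' complement checksum. The sender stores the complement of
     * the sum; the receiver recomputes the sum over data plus checksum and
     * expects 0xffff if no bit errors occurred. */
    uint16_t inet_checksum(const uint8_t *data, size_t len)
    {
        uint32_t sum = 0;

        while (len > 1) {                  /* sum 16-bit words */
            sum += (uint32_t)data[0] << 8 | data[1];
            data += 2;
            len -= 2;
        }
        if (len == 1)                      /* pad an odd trailing byte */
            sum += (uint32_t)data[0] << 8;

        while (sum >> 16)                  /* fold carries back into the low 16 bits */
            sum = (sum & 0xffff) + (sum >> 16);

        return (uint16_t)~sum;
    }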

In order to maintain and keep track of sequence numbers, a transport service must preserve its state information between transmissions, which means that it is connection-oriented [10]. A connection-oriented service typically consists of three phases: connection establishment, data transmission and connection termination. In the connection establishment phase, state information such as sequence numbers is initialized. During the data transmission phase this state is maintained and finally discarded in the termination phase.

2.4 Error Correction

Error correction is the mechanism used by sender and receiver to recover from previously detected errors and maintain reliability. It is traditionally accomplished by retransmission using timers, sequence numbers and acknowledgments of received data.

The problem of lost or corrupt data can be solved by retransmission. The sequence numbers make it possible to simply discard duplicates. Data received out of order can be stored while waiting for previous data or discarded. If out-of-order data is discarded it has to be retransmitted until it arrives in order.

2.4.1 Acknowledgment and Retransmission

A popular form of error correction is to require the receiver to acknowledge received data. If the sender does not receive an acknowledgment within a certain timeframe, the data is assumed lost and transmitted again. This mechanism relies on positive acknowledgments (ACK) from the receiver [6, 10]. The sender maintains a buffer for sent data and a timer mechanism. When an ACK is received, the acknowledged data may be cleared from the buffer. This mechanism has no error reporting; it relies on the timer for error detection and retransmission of data.

Adjusting the retransmission timeout (RTO) to a suitable value is essential. If the RTO is too short, unnecessary retransmissions will occur because the ACKs have not had enough time to arrive at the sender. An unreasonably long RTO will limit responsiveness and throughput. The RTO is usually based on the round-trip time (RTT), which is the time required to traverse the link from the sender to the receiver and back again.


Normally only data is acknowledged. If an ACK is lost during transmission, the sender will wait until the timer expires and then send the unacknowledged data once again, which will create a duplicate of the data. The receiver will discard the duplicate and send another ACK when it is received.

In contrast to positive acknowledgments there are also negative acknowledgments (NACK) [6, 10]. A NACK is an explicit identifier of data that has not been received, an error reporting mechanism, and can be used to immediately start a retransmission without waiting for a timeout.

Retransmission is used to recover from data loss during transmission and in many cases also from bit errors. Since the common case is for the receiver to simply discard data where bit errors have occurred, a retransmission is forced due to the lack of acknowledgment.

2.4.2 Forward Error Correction

In forward error correction (FEC) techniques the receiver makes use of redundant data and error correcting codes in order to handle bit errors [6]. In error detection, redundant data is used to detect the occurrence of bit errors. The purpose of FEC is to use even more redundant data in order to not just locate invalid bits but also correct them, making retransmission unnecessary for corrupt data.

The trivial FEC code is to repeat every bit of data several times. If a bit error occurs, the correct value can be determined by a majority decision based upon the values of all copies of the bit. This code requires a lot of redundant data. There exist several more advanced and more efficient error correcting codes, such as the Hamming code.
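A minimal sketch of this repetition code, assuming each byte is simply transmitted three times; a single bit error in any one copy is corrected by the bitwise majority vote, and the function names are illustrative.

    #include <stdint.h>

    /* Encode one byte with a 3x repetition code: three identical copies are sent. */
    void repetition3_encode(uint8_t data, uint8_t copies[3])
    {
        copies[0] = copies[1] = copies[2] = data;
    }

    /* Decode by majority vote per bit position: for each bit, at least two of
     * the three received copies agree on the correct value, so a single bit
     * error is corrected without retransmission. */
    uint8_t repetition3_decode(const uint8_t copies[3])
    {
        return (copies[0] & copies[1]) |
               (copies[0] & copies[2]) |
               (copies[1] & copies[2]);
    }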

2.5 Flow Control

Flow control is the mechanism regulating transmission of data at the sender [10]. It determines the amount of data that can be sent before receiving acknowledgment. Ultimately, the sender should be able to limit the rate of data being sent in order to avoid overflowing buffers at the destination.

2.5.1 Stop-and-Wait

Stop-and-wait is the simplest flow control mechanism [6]. The sender transmits one unit of data and then waits for it to be acknowledged before sending the next unit. The single data unit is stored while waiting for acknowledgment and retransmitted if necessary due to a timeout. Since the sender is idle while waiting for acknowledgment, stop-and-wait is quite inefficient, especially if the RTT is high. There is also an unnecessary communication overhead since each unit of data sent requires an acknowledgment.


2.5.2 Sliding Window

The sliding window mechanism addresses the issues of low efficiency and large overhead for acknowledgments found in the stop-and-wait flow control mechanism [6, 10]. To improve the efficiency, multiple units of data are in transit while waiting for acknowledgment instead of a single unit.

The mechanism resembles a window covering a part of the stream of sequence numbers. Data units with sequence numbers within the window can be sent immediately. The imaginary window has two walls. Sequence numbers to the left of the window are already acknowledged, while those to the right cannot yet be transmitted. The maximum number of data units the sender can transmit before receiving acknowledgment is determined by the size of the window, the distance between the walls.

Since several units of data are transmitted at once, the receiver does not have to acknowledge every single one with a separate ACK; it can send a cumulative ACK covering several sequence numbers and therefore reduce the communication overhead. It is called a sliding window because the walls move as the transmission continues. Once an ACK covering the leftmost sequence number is received by the sender, the window slides to the right. The window then includes new, unsent units of data which can be transmitted; at least this is true for a constant window size.

The size of the window can be variable in order to provide the receiver with a method to limit the rate of data being sent. Using a variable window size, an ACK is not used to move the right wall of the window, only the left. The location of the right wall, and the size of the window, is determined by the window size advertised by the receiver. The size is not constant and can be changed by the receiver at any time in order to control the flow of data. The advertised window size typically reflects the available space in the receiver’s incoming buffer.

The sliding window mechanism requires larger buffers at both the sending and receiving side but is capable of higher efficiency since the link can be utilized while waiting for acknowledgment. The sender’s buffer must be able to store all the data in the current window in case of retransmissions. Using large enough buffers and window size, a high efficiency can be maintained even when the RTT of the link is high. Sliding window is of course well suited to bulk transfer of data.
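A sketch of the sender-side bookkeeping described above; the struct and function names are illustrative and not taken from any particular protocol.

    #include <stdbool.h>
    #include <stdint.h>

    /* Sequence numbers below 'base' are acknowledged, [base, next_seq) are in
     * flight, and numbers at or beyond base + window_size may not be sent yet. */
    struct sw_sender {
        uint32_t base;        /* oldest unacknowledged sequence number (left wall) */
        uint32_t next_seq;    /* next sequence number to assign to new data        */
        uint32_t window_size; /* window advertised by the receiver (right wall)    */
    };

    /* The sender may transmit as long as the next sequence number lies inside the window. */
    bool sw_can_send(const struct sw_sender *s)
    {
        return s->next_seq < s->base + s->window_size;
    }

    /* A cumulative ACK for sequence number 'ack' slides the left wall to the
     * right; data up to and including 'ack' can be dropped from the
     * retransmission buffer. With a variable window, the ACK also carries the
     * receiver's currently advertised window size. */
    void sw_on_ack(struct sw_sender *s, uint32_t ack, uint32_t advertised_window)
    {
        if (ack >= s->base)
            s->base = ack + 1;
        s->window_size = advertised_window;
    }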

2.6 Congestion Control

Congestion control refers to the mechanisms used to handle occurrences of traffic congestion and to keep the load below the capacity of the network. It is often tightly coupled with the flow control mechanism since both control the flow of data between sender and receiver.

Congestion in a network occurs when the load of the network, the number of packets sent, is greater than the capacity of the network [6]. Network devices like routers and switches have queues. If packets arrive at a higher rate than the device can handle or the arrival rate is higher than the departure rate, congestion occurs as packets are blocked in the queues. The congestion can be temporary and resolve itself over time, only resulting in delayed packets, or the queues may become full and cause packets to be dropped. Dropped packets, and sometimes also delayed packets, result in retransmissions and even more packets on the network, which makes the congestion problem worse.

The goal of congestion control is to avoid congestion if possible and to resolve it when it occurs. Often it is desirable to provide fairness between data flows and to optimize the usage of the network [10]. Since the network is a shared resource, this requires cooperation between the involved users to limit their traffic and follow the rules of the congestion control mechanism. Cooperation is a difficult thing to accomplish on a busy wide area network (WAN) such as the Internet, where several different protocols are used, some lacking congestion control completely. On a local area network (LAN), congestion control is not as important since available capacity is generally much greater and the LAN is rarely congested.


Chapter 3

Technology

This chapter is the result of the study of available technology made in the beginning of this master’s project. Existing solutions for network communication are described and significant differences between them are discussed.

3.1 Physical and Data Link Layer

The technology chosen for the physical and data link layer of the system featured in this project is Fast Ethernet. Even though there are other technologies available for LANs, Ethernet is by far the most dominating one in use today [6]. The hardware and software are commonly available and very cost efficient. There exist several variants of Ethernet; Fast Ethernet is one of them and operates at 100 Mbps.

The purpose of this section is to determine what can be expected of a Fast Ethernet network.

3.1.1 Ethernet

Ethernet is responsible for delivery of data between devices connected to the same LAN. To achieve this, every Ethernet device has a unique physical address (MAC address). Data is encapsulated in units called frames containing both source and destination addresses. Ethernet also provides error detection but is still an unreliable service since there is no error correction mechanism. Neither does it provide any flow or congestion control mechanisms; these must be implemented in higher layers.

Ethernet in its original form has generally not been used in industrial applications because of its principle of a shared medium with the risk of transmission collisions [14]. Such collisions can cause non-deterministic delays in transmissions and therefore Ethernet has been thought of as unsuitable for applications demanding low latency or real-time communication. The medium access control (MAC) method used by Ethernet is the source of this problem, as described below. Further development of Ethernet has reduced or even eliminated this problem altogether by introducing new technology.


3.1.2 Medium Access Control

Carrier Sense Multiple Access with Collision Detection (CSMA/CD) is the method used by Ethernet to access the medium [6]. This method states that any device can send data on the medium, which is shared between all devices. Only one device can use the medium at a time. To transmit, a device first listens to the medium. If the medium is idle, it starts to send, otherwise it waits. The device then monitors the medium to see if the transmission was successful. Because of the propagation delay, it is possible that two devices both find the medium idle and start to transmit, which results in a collision between frames. All sending devices will notice the collision since they will receive garbled data while monitoring the medium. If this does not happen, the transmission was successful.

When a collision occurs, the frame has to be sent again. To reduce the possibility of simultaneous resending of data, the device waits a random period before trying again. Each time a collision occurs the maximum time to wait is doubled, thus creating what is called an exponential backoff. This algorithm is the source of the unpredictable delay found in Ethernet. Eventually, after a total of 16 failed attempts to send the frame, it is dropped.
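A minimal sketch of this truncated binary exponential backoff; the detail that the random range stops growing after ten collisions is standard Ethernet behavior and is assumed here, and the function name is illustrative.

    #include <stdlib.h>

    #define MAX_ATTEMPTS  16   /* the frame is dropped after 16 failed attempts      */
    #define BACKOFF_LIMIT 10   /* the random range stops growing after 10 collisions */

    /* After the n-th collision the device waits a random number of slot times
     * drawn from [0, 2^min(n,10) - 1]. Returns the number of slot times to
     * wait, or -1 if the frame should be dropped. */
    int csma_cd_backoff(int collisions)
    {
        if (collisions >= MAX_ATTEMPTS)
            return -1;                        /* give up, the frame is lost */

        int k = collisions < BACKOFF_LIMIT ? collisions : BACKOFF_LIMIT;
        unsigned range = 1u << k;             /* 2^k possible slot counts */
        return (int)(rand() % range);         /* random wait in whole slot times */
    }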

It is clear that CSMA/CD is not deterministic. It is impossible to determine beforehand how long the collision resolution will take. The question is how big this problem is in reality, which of course depends on the application.

In an analysis [26] which assumes an isolated network and traffic consisting of many small packets of similar size, it is shown that the probability of CSMA/CD causing unreasonable delays is very small when the load is light. Using Fast Ethernet and sending 1000 packets of 128 bytes each per second, it is 99% certain that there will not be a delay of more than 2 ms due to collisions in 300 000 years. Increasing the packet size to 1024 bytes, it will not happen in 600 years. In the latter case we have a total load created by traffic of 8 Mbps, slightly less than one tenth of the capacity of the network. When network load is increased, performance will quickly get worse.

It seems unlikely that CSMA/CD should be a problem if the capacity of the network far exceeds the demands of communication; then collisions will be few and resolution very quick. Fortunately, this is actually no longer an issue with the introduction of switched Ethernet and full-duplex connections.

3.1.3 Switched Ethernet

There are several kinds of connecting devices used to build a LAN [6]. The simplest is the repeater. The repeater operates only in the physical layer and acts as a node with two ports. It simply copies the signal received on one of the ports and forwards it to the other port. A repeater has no filtering capability; it forwards every signal it receives.

A multiport repeater is called a hub. An incoming signal on one of the ports is forwarded to all the other ports. Communication through a hub is broadcast to all connected devices and because of this they share the total capacity of the network and compete for access to the medium. The larger the network gets, the less capacity there is per node and the greater the risk of collision when transmitting.

A more effective connecting device is the bridge. It operates in both the physical and the data link layers. As a physical layer device it works like a repeater. As a data link layer device it has access to the MAC addresses of the Ethernet frame and can therefore provide filtering of traffic. Depending on the destination address, a bridge can decide whether to forward the frame or not. This requires the bridge to have a table that maps addresses to ports. Upon receipt of a frame, the destination address is used to find the appropriate port in the table. If the departing port is different from the arriving port, the frame is forwarded; otherwise it is dropped. The forwarding table is dynamically created by the bridge. In the beginning the table is empty and frames have to be broadcast because the device locations are not known. By inspecting the traffic, the bridge learns the locations of the devices automatically.

In an Ethernet network with hubs, the total capacity is divided among all the connected devices and contention for access to the medium is a problem. By using bridges, a network can be divided into segments that work as separate networks concerning capacity and medium access. Only traffic destined to another segment is forwarded by the bridge. This makes better use of the network capacity and, perhaps more importantly, creates several smaller collision domains. Devices only have to contend with devices of the same segment for access to the medium, which greatly reduces the probability of collisions.

From bridged Ethernet the next logical step is switched Ethernet, where the network is divided into as many segments as there are connected devices. A switch is a multiport bridge with a single device connected to each port. The capacity and collision domain are shared only between the device and the switch, which is a great improvement compared to Ethernet using hubs as connecting devices.

3.1.4 Full-Duplex Connections

Half-duplex connections mean that a device can either send or receive, but not both at the same time. Since devices are connected using a single link, CSMA/CD is necessary in case both devices decide to transmit at the same time.

Full-duplex connections, on the other hand, use two links, one to transmit and one to receive. In a full-duplex switched Ethernet network there is no need for the CSMA/CD method [6]. Each device is connected to the switch via two separate links and both ends of the connection can send and receive independently without the risk of collisions. Since each link is a dedicated one-way path between device and switch, collisions are no longer possible and CSMA/CD is no longer needed.


3.1.5 Error Detection

Ethernet provides a method for detection of data corruption which may happen during transmission. It uses the cyclic redundancy check (CRC) technique, which is a very powerful error detection technique and is considered to be more reliable than those of the common protocols found in the higher layers [6]. CRC is based on binary division and each frame is padded with a number to make it exactly divisible by a predetermined binary number. At the destination, the frame is divided by the same number and if there is no remainder the frame is assumed to be intact. If there is a remainder, the frame has been corrupted in transit and is dropped.
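A software sketch of this binary division, using the CRC-32 generator polynomial employed by Ethernet in its bit-reflected form; real network hardware computes the check per frame, so the function below is purely illustrative.

    #include <stddef.h>
    #include <stdint.h>

    /* Bitwise CRC-32 with the Ethernet generator polynomial (0x04C11DB7,
     * reflected form 0xEDB88320). The result is appended to the frame as the
     * frame check sequence; the receiver recomputes it and drops the frame on
     * a mismatch. */
    uint32_t crc32_ethernet(const uint8_t *data, size_t len)
    {
        uint32_t crc = 0xffffffffu;                      /* initial remainder */

        for (size_t i = 0; i < len; i++) {
            crc ^= data[i];
            for (int bit = 0; bit < 8; bit++) {
                if (crc & 1u)
                    crc = (crc >> 1) ^ 0xedb88320u;      /* divide by the generator */
                else
                    crc >>= 1;
            }
        }
        return ~crc;                                     /* final complement */
    }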

Error correction is not implemented in Ethernet; when errors occur, frames are simply lost. The use of CRC makes it difficult to implement forward error correction (FEC) techniques in protocols using Ethernet since the data will never reach the higher layers if an error is detected. In order to change this, modifications to the existing implementation are required.

3.1.6 Performance

With the use of full-duplex switched Ethernet, the risk of frame collisions is eliminated, which means that old arguments about Ethernet not being suitable for low-latency industrial applications are no longer valid. Performance evaluations [14] show that switched Ethernet will keep its low-latency characteristics, with delays of a few milliseconds, well above a network load of 50%, while at the same time Ethernet using CSMA/CD will have delays clocking in around 500 ms and frequent packet loss due to repeated collisions. The communication delay of switched Ethernet is very low with little variation and is therefore well suited for low-latency and real-time applications.

The only thing that could hamper the performance of switched Ethernet is the switch itself. Most switches work along the store-and-forward principle, which means that when receiving a frame, the switch checks the link to the destination and if it is idle, it forwards the frame. If the link is busy, the frame is stored in a buffer and sent as soon as the link is idle. If the capacity of a link is exceeded this buffer may overflow and frames are dropped. Overflow may happen if several incoming links are forwarded to the same outgoing link and the combined load is too large for a single link. One can also safely assume that modern switches have the processing power required to forward a large enough amount of packets to handle almost any communication requirements [14].

Switched Ethernet using full-duplex connections is not a guaranteed reliable technology, but it can be assumed that packet loss is unlikely to happen. Since collisions are a thing of the past, almost the only reason for packet loss on a LAN would be data corruption, which depends on the environment where the network is used and can vary greatly.


3.2 Network Layer

While the data link layer handles physical addressing and delivery from one node to the next within a network, the network layer is responsible for delivery between hosts across possibly multiple networks. Independent networks are connected together into large internetworks using a network layer protocol which implements a logical addressing scheme for the entire internetwork. The devices connecting the networks are called routers and deliver packets between networks based on the logical address.

3.2.1 Internet Protocol (IP)

The Internet Protocol (IP) [7, 9] is the network layer protocol used on the Internet and is very often found in other networks, both small and large. IP is the network layer protocol of the TCP/IP protocol suite, which also includes transport layer protocols that will be described later on. The version most in use today is version 4 (IPv4) and is the one described here. IP is an unreliable and connectionless protocol for delivery of packets. Each packet, called a datagram, is handled independently and may travel across a different path to the destination. Datagrams can arrive out of order and be lost or corrupted. IP relies on higher level protocols to deal with reliability issues.

An IP datagram has a header containing, among other things, the source and destination IP addresses, a checksum and a time to live (TTL) counter. The checksum only covers the header itself. Since the header is changed at each router when the TTL counter is decreased, the checksum has to be recalculated; it therefore ignores the payload in order to speed up the calculation. Since datagrams can travel through many different networks using different physical and data link layers, IP must be able to cope with a changing maximum transfer unit (MTU). The MTU is the maximum supported size of a packet in a particular network.

Each router extracts the IP datagram from the frame it receives and then encapsulates it in a new frame supported by the destination network. If an IP datagram must travel through a network supporting an MTU which is smaller than the current datagram, the datagram has to be divided into several parts; this is called fragmentation. The sending host usually does not fragment an IP datagram; it uses the MTU of the network it is connected to and creates datagrams of appropriate size. When a datagram is fragmented, each fragment gets its own header and is treated as an independent IP datagram until it reaches the destination. A datagram can be subjected to repeated fragmentation during transmission. The receiving host finally reassembles the original datagram; if fragments are lost the whole datagram is discarded.
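As a small illustration of how fragment sizes follow from the MTU, the sketch below assumes a 20-byte IP header without options; every fragment except the last must carry a payload that is a multiple of 8 bytes. The function is hypothetical and only prints the resulting split.

    #include <stdio.h>

    #define IP_HDR_LEN 20   /* assuming a header without options */

    /* Print how a datagram payload would be split into fragments for a given
     * MTU. Each fragment gets its own IP header; offsets are shown in bytes. */
    void print_fragments(int payload_len, int mtu)
    {
        int max_frag = ((mtu - IP_HDR_LEN) / 8) * 8;   /* per-fragment payload, rounded down to 8-byte units */
        int offset = 0;

        while (payload_len > 0) {
            int this_frag = payload_len > max_frag ? max_frag : payload_len;
            printf("fragment: offset=%d, payload=%d bytes\n", offset, this_frag);
            offset += this_frag;
            payload_len -= this_frag;
        }
    }

    /* Example: a 4000-byte payload over an Ethernet MTU of 1500 bytes becomes
     * three fragments carrying 1480, 1480 and 1040 bytes of payload. */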

Hosts and routers on an internetwork are identified by their unique logical addresses, IP addresses. The physical address of the data link layer is a different identifier and depends on the chosen technology; Ethernet uses the MAC address as previously described. When a datagram traverses an Ethernet network, its IP address determines its path through routers while the MAC addresses are used to create the Ethernet frames in order to traverse each link between two nodes. The mapping of IP addresses to MAC addresses can be achieved by using a static table or some other mechanism.

3.2.2 Address Resolution Protocol (ARP)

The Address Resolution Protocol (ARP) [7, 20] is used to dynamically resolve the MAC address using a known IP address. Anytime a host or a router needs to find out a MAC address, it broadcasts an ARP query containing the IP address. The receiver which recognizes its IP address in the query sends back a response including its MAC address. The ARP messages are encapsulated directly into Ethernet frames.

In order to improve efficiency, ARP uses cache tables. It is highly inefficient to do an ARP query for each datagram if many are sent to the same address during a short time. The cache table stores the known mapping of an IP address to a MAC address for a limited time. Unless the MAC address is already known, the ARP mechanism adds extra work and therefore a delay to the sending of an IP datagram. There is usually the possibility to add static entries to the cache table in order to avoid unnecessary latencies when sending to predefined hosts.

3.2.3 Internet Control Message Protocol (ICMP)

IP lacks mechanisms for reporting errors and performing network diagnostics. For these purposes the Internet Control Message Protocol (ICMP) [7, 22] was created. It defines a number of messages for reporting errors and query messages for diagnostic purposes. Each ICMP message is encapsulated in an IP datagram.

ICMP error messages are always sent to the original source using the source IP address found in the datagram causing the error. Some of the errors that can be reported are unreachable hosts, datagrams discarded by routers due to congestion and erroneous datagram headers.

The query messages define requests and corresponding replies. One example is the echo-request and echo-reply messages, which are used by the well-known ping program to determine if two hosts can communicate with each other and to provide statistical information about the link between them.

3.3 Transport Layer

As previously described, some features are left out of the lower layers and have to be implemented at the transport layer. Perhaps the most important thing is the issue of reliability. The main task of the transport layer protocols is to provide process-to-process communication. Addressing a specific process on a host is accomplished by a combination of the host’s IP address and the port number registered by the process. This pair is called a socket.

This section is intended to describe common or otherwise interesting transport layer protocols.


3.3.1 User Datagram Protocol (UDP)

The User Datagram Protocol (UDP) [21] is a message-oriented transport protocol that provides a minimum of protocol mechanisms. The messages are sent in packets which have a small header with a fixed size. In addition to the compulsory source and destination port numbers it contains the length of the packet and an optional checksum. UDP provides no error correction and is therefore an unreliable transport protocol.

UDP also lacks flow control and congestion control mechanisms. The lack of flow control means that there is a possibility of incoming buffers overflowing and packets being dropped at the receiver, since there is no mechanism to control the rate of incoming data [7].

UDP provides a connectionless service where each packet sent is an independent packet. Since there is no relationship between packets, they are not numbered and provide no ordering other than the order they arrive in at the destination. This lack of ordering is significant on the Internet, where packets may travel different paths, but it is not a problem on a LAN. Since a LAN provides only a single path between two nodes, packets will arrive in the order in which they were sent. Sadly though, packets may be lost due to corruption of data and without sequential numbering of the packets this will not be detected.

A limitation of UDP being connectionless is that each message sent by an application has to fit into a single packet. It cannot be divided into several packets and then reassembled at the destination since the packets are completely independent. The maximum size for a UDP packet is 64 kB, even though it is usually further limited by lower protocol layers and by the specific implementation [28]. This is of course not a limitation for a system sending messages that are small enough.

UDP is a lightweight and efficient protocol using a minimum of overhead and provides the basic functionality of a transport layer protocol, communication between processes. As such, it is a good framework to extend and to build other protocols on top of, and UDP is often used in this way.
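A minimal sketch of message-oriented communication over UDP using the BSD sockets API; the address and port are placeholders, and on the Windows NT/2000 platform used in this project the corresponding Winsock calls would be used instead (after WSAStartup).

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);          /* UDP socket */
        if (sock < 0) { perror("socket"); return 1; }

        struct sockaddr_in peer = {0};
        peer.sin_family = AF_INET;
        peer.sin_port = htons(5000);                         /* placeholder port */
        inet_pton(AF_INET, "192.0.2.10", &peer.sin_addr);    /* placeholder address */

        const char msg[] = "hello";
        /* Each sendto() produces exactly one UDP packet; the message must fit in it. */
        sendto(sock, msg, sizeof msg, 0, (struct sockaddr *)&peer, sizeof peer);

        char buf[512];
        ssize_t n = recvfrom(sock, buf, sizeof buf, 0, NULL, NULL);  /* blocks for a reply */
        if (n > 0)
            printf("received %zd bytes\n", n);

        close(sock);
        return 0;
    }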

3.3.2 Transmission Control Protocol (TCP)

The Transmission Control Protocol (TCP) [31] is alongside UDP one of the traditional transport layer protocols of the TCP/IP protocol suite. While UDP provides a minimum of features, TCP is a reliable transport protocol with flow control and congestion control mechanisms.

TCP provides a stream delivery service that allows sender and receiver to exchange data as a stream of bytes, a virtual pipe carrying data across the network. Since the layers below TCP handle data in packets, TCP groups a number of bytes together and adds a header, creating a packet called a segment. Segments can vary in size and are marked with sequence numbers in order to provide a reliable transport service. A consequence of TCP being byte-oriented rather than message-oriented is that the sequence numbers are not pure segment numbers. The sequence number for each segment is the number of the first byte carried in that segment.

TCP is connection-oriented and provides full-duplex communication, that is, the two hosts can send segments to each other simultaneously. The connection is established using a three-way handshake where initial sequence numbers are exchanged as well as parameters and options. Once the connection is established, data and acknowledgments can be sent in both directions. Acknowledgments are carried in the same segments as data if possible; this is called piggybacking.

Streams and Buffers

There is a sending and a receiving buffer at each end of the TCP connection. Applications write a stream of data to TCP and it ends up in the sending buffer. Depending on the state of the connection, the data is organized in one or more segments and transmitted. The segments eventually end up in the receiving buffer at the other end, where they are reassembled into a stream of data and delivered to the receiving application. The buffers are managed to favor the transmission of large segments to improve TCP efficiency and may introduce unwanted delays. During normal operation the applications have no control or knowledge of the TCP buffering mechanism.

There are occasions when an application wants to ensure swift delivery of data and when unnecessary delays due to buffers are unacceptable. For those occasions TCP provides the push feature. During a push operation the sender’s TCP must immediately create a segment and transmit the data. It also sets the push flag (PSH) in the segment header. When a segment where PSH is set is received, the data must not be delayed in the receiver’s buffer but delivered to the application as soon as possible. Most APIs do not provide a way for the applications to demand a push operation because data is usually sent when it is written anyway and PSH is set if the data segment being sent empties the buffer [28]. This behavior is also supported by the specification [23]. The PSH flag is not a record marker and may not correspond with segment boundaries. It is optional for TCP to pass a received PSH flag to the application layer.

Normally TCP delivers data in order, but it also provides a mechanism for applications that may need urgent data delivered out of order. The segment header contains an urgent pointer field as well as a flag (URG) to tell that the field is valid. If URG is set, the urgent field contains a pointer to the last byte of urgent data of the segment. When an application sends urgent data, TCP creates a segment and inserts the urgent data at the beginning; the remainder of the segment may contain normal data. The URG flag is set and the pointer field denotes the transition from urgent to normal data in the stream. When a segment with the URG flag set is received, the urgent data can be extracted from the stream by using the urgent pointer and delivered out of order to the receiving application [7]. This is not true out-of-band data since the urgent data is sent in the same stream [28]. True out-of-band data could be accomplished by using a second TCP connection.


Error Detection and Correction

TCP is a reliable transport protocol and accomplishes error correction by retransmitting lost segments and segments failing the checksum test. TCP uses a mandatory checksum for each segment and corrupted segments are discarded. For each transmitted segment the sender starts a timer and when the timer expires the segment is presumed lost and therefore retransmitted. Timers are not used for segments carrying only acknowledgments and no such segment is ever retransmitted.

Received segments are acknowledged by positive cumulative acknowledgments (ACKs). The sequence number of the ACK advertises the next byte expected by the receiver and confirms all bytes with a lower sequence number. This strategy is not optimal and may lead to retransmissions of more data than necessary on a timeout, since segments can be received out of order but only acknowledged cumulatively. The selective acknowledgment (SACK) option addresses this issue and extends the normal ACK with additional acknowledgments of segments received out of sequence [7]. Since TCP is full-duplex, the efficiency is improved by always piggybacking ACKs to data segments going in the other direction if possible. ACKs can even be delayed in order to wait for data segments being sent; however, this delay should not be so long that it triggers retransmissions. A typical value for the maximum delay of an ACK is 200 ms and it must be lower than 500 ms [28].

When retransmissions are made, TCP is not forced to resend the identical segments; it can form new ones if appropriate. This is possible since the TCP sequence numbers are byte-related instead of segment-related. Retransmissions occur either because of a retransmission timeout (RTO) or because three duplicate ACKs have been received. TCP sends an immediate duplicate ACK when an out-of-order segment with a sequence number higher than the expected one is received, to inform the sender of a possibly lost segment. The three duplicate ACK mechanism is called fast retransmission and can make a big difference when the value of the RTO is large.

The value of the RTO is dynamically calculated to handle all kinds of connections and fluctuating environments. It is based on the measured round-trip time (RTT) and smoothed by using a weighted average, then the calculated deviation is added. For each retransmission the value of the RTO is doubled, according to the exponential backoff strategy, until TCP finally gives up and signals an error.
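A sketch of this calculation; the weights 1/8 and 1/4 and the factor 4 applied to the deviation are the values commonly used by TCP implementations and are assumed here, since the text above does not specify them.

    #include <math.h>

    /* Smoothed RTT estimation in the style used by TCP: an exponentially
     * weighted average of RTT samples plus a multiple of the smoothed
     * deviation. Times are in milliseconds. */
    struct rto_state {
        double srtt;    /* smoothed round-trip time       */
        double rttvar;  /* smoothed RTT deviation         */
        double rto;     /* current retransmission timeout */
    };

    void rto_update(struct rto_state *s, double rtt_sample)
    {
        s->rttvar = 0.75 * s->rttvar + 0.25 * fabs(s->srtt - rtt_sample);
        s->srtt   = 0.875 * s->srtt + 0.125 * rtt_sample;
        s->rto    = s->srtt + 4.0 * s->rttvar;
    }

    /* On each retransmission the timeout is doubled (exponential backoff)
     * until TCP eventually gives up and signals an error. */
    void rto_backoff(struct rto_state *s)
    {
        s->rto *= 2.0;
    }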

Flow Control

TCP uses a variable sliding window mechanism for flow control. The size of the window is specified in bytes and is constantly advertised in the receiver’s ACKs. The size is the number of bytes available in the receiver’s incoming buffer and the sender computes its usable window using the last acknowledged sequence number. The window can be shrunk, the right edge moved to the left, although this is not encouraged. Instead, a window size of zero can be advertised to temporarily shut down the transmission; the sender does not really shrink the window, it just stops transmitting. The sender can still transmit a segment of 1 byte, which is used for probing in case the segment opening the window again is lost. The size of the receiver’s buffer, and therefore also the window, can have a large impact on performance during bulk transfers [28].

Situations can arise in the flow control mechanism which result in small movements of the window and therefore also small segments being transmitted. This problem is called the silly window syndrome and results in poor efficiency since the overhead for each segment is very large. At the sending end of the connection, this problem is prevented by forcing TCP to wait and collect data in larger segments using Nagle’s algorithm. The algorithm states that the sender can transmit the first piece of data regardless of size. After sending a segment, the next one cannot be sent until an ACK for the previous one has arrived or there is enough data collected to send a maximum-sized segment. On slow links, Nagle’s algorithm leads to larger segments and increased efficiency. At the receiving end of the connection, the silly window syndrome is prevented by delaying ACKs, which results in advertising the largest window possible since there is more space in the buffer as more data has been processed.

The silly window syndrome is mostly a problem on WANs; on a LAN the capacity is normally much greater and the poor efficiency is not as significant. Nagle’s algorithm may introduce unwanted delays when sending data, for instance, a situation could arise where it waits for a delayed ACK before sending the data. For some applications this is not acceptable and TCP must provide a way to disable Nagle’s algorithm [23].
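In the sockets API, Nagle’s algorithm is disabled with the TCP_NODELAY socket option, as in the sketch below; the same option exists in Winsock on the Windows platform used in this project.

    #include <netinet/in.h>
    #include <netinet/tcp.h>   /* TCP_NODELAY */
    #include <sys/socket.h>

    /* Disable Nagle's algorithm on an existing TCP socket so that small
     * messages are transmitted immediately instead of being collected into
     * larger segments. Returns 0 on success, -1 on failure. */
    int disable_nagle(int sock)
    {
        int on = 1;
        return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
    }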

Congestion Control

The congestion control mechanism of TCP is tightly integrated with the flow control mechanism and the size of the sender’s window. The window size is not only determined by the receiver but also by congestion in the network. The actual size of the sender’s window is the smaller of the size advertised by the receiver and the size determined by the congestion control mechanism. The congestion control is based on three phases: slow start, congestion avoidance and congestion detection.

TCP begins with the slow start phase, which starts with a small congestion window and quickly increases the size. The initial congestion window size is one maximum segment size (MSS), which is determined during connection establishment. The MSS is usually set to 1460 bytes on connections across Ethernet to fit within the maximum transfer unit (MTU) of 1500 bytes including headers [28]. The size then increases by one MSS for each ACK received, which provides an exponential increase until a predefined threshold is reached. Then the congestion avoidance phase begins, which uses an additive increase instead of an exponential one, in order to slow down the rate of increase and hopefully avoid congestion.

Congestion is detected by the loss of a segment and the need for retransmission, though not all cases of segment loss are caused by congestion. Retransmission may occur because of a timeout or when three duplicate ACKs are received. If a timeout occurs there is a stronger possibility of congestion and in this case TCP starts over with the slow start phase, this time with a decreased threshold set to half the current window size. In the latter case there is a weaker possibility of congestion since some segments later in the sequence must have arrived safely. TCP then sets the congestion window and the threshold to half the current window size and starts the congestion avoidance phase; this is called fast recovery.
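A simplified sketch of the window adjustments described above, counting the window in whole segments; a real TCP implementation tracks the window in bytes and has additional rules, so this only illustrates the three reactions.

    #include <stdint.h>

    struct cwnd_state {
        uint32_t cwnd;       /* congestion window, in segments           */
        uint32_t ssthresh;   /* slow start threshold, in segments        */
        uint32_t acked;      /* ACKs counted during congestion avoidance */
    };

    /* Called for each new ACK that is received. */
    void on_ack(struct cwnd_state *c)
    {
        if (c->cwnd < c->ssthresh) {
            c->cwnd += 1;                    /* slow start: +1 segment per ACK (exponential per RTT) */
        } else if (++c->acked >= c->cwnd) {  /* congestion avoidance: +1 segment per window of ACKs  */
            c->cwnd += 1;
            c->acked = 0;
        }
    }

    /* A timeout is taken as a strong sign of congestion: halve the threshold
     * and start over with slow start. */
    void on_timeout(struct cwnd_state *c)
    {
        c->ssthresh = c->cwnd / 2 > 1 ? c->cwnd / 2 : 1;
        c->cwnd = 1;
        c->acked = 0;
    }

    /* Three duplicate ACKs are a weaker sign: halve both threshold and window
     * and continue in congestion avoidance (fast recovery). */
    void on_triple_dup_ack(struct cwnd_state *c)
    {
        c->ssthresh = c->cwnd / 2 > 1 ? c->cwnd / 2 : 1;
        c->cwnd = c->ssthresh;
        c->acked = 0;
    }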

TCP Improvements

TCP is a very successful protocol, but although it is quite adaptable to most situations it is often regarded as too slow and bloated with features to be used in high performance computing [8]. Significant efforts have been made to address different performance issues of TCP. To increase throughput on modern high capacity networks it is of interest to decrease the overhead imposed by the large header used by TCP. This is accomplished by recognizing and using the larger MTUs supported by many modern networks in order to create larger segments and therefore less overhead. There are also several TCP options designed to enhance performance on high capacity networks, such as the option to use larger windows and a different RTT measurement mechanism using timestamps [11]. To decrease latency in TCP connections, effort has been put into decreasing the involvement of the operating system by offloading work to the network interface adapters, especially checksum calculations [3]. Modifications have also been made in order to avoid unnecessary copying between different layers of buffers in the operating system. Several of these enhancements require modifications to existing implementations or the use of special hardware.

There is also an extension to TCP called TCP for Transactions (T/TCP) [2]which aims at reducing communication overhead for transactions. A transactionis simply a request followed by a response. When using normal TCP, a transac-tion would result in first a three way handshake to establish the connection, thenthe exchange of messages followed by a connection termination. T/TCP accom-plishes the same result with a minimal three messages, combining the request andconnection initiation messages as well as the response and connection terminationmessages [28]. The beauty of T/TCP is that it is an extension to TCP and isused when possible while still being backwards compatible. It also makes use of theexisting TCP mechanisms for reliability and congestion control.

3.3.3 Reliable Data Protocol (RDP)

The Reliable Data Protocol (RDP) [19, 32] is based around the idea of providing features similar to TCP but with less complicated mechanisms and therefore also with less computational overhead. The original idea was to design a transport protocol for reliable and efficient bulk transfer of data. By providing only the necessary services, the protocol should remain efficient and relatively easy to implement. A small footprint would also allow devices with limited resources to make use of the protocol.

RDP provides a reliable, message-oriented transport service. It detects damaged, lost, duplicated and out-of-order packets by using checksums and sequence numbers and handles errors by retransmission of packets or by informing the user. A message from a higher level is transmitted as a single packet; there is no mechanism for aggregating several messages into one packet or splitting messages into several packets.

RDP is a connection-oriented protocol and the connection is full-duplex. Upon the creation of a connection, the nodes exchange sequence numbers using a three-way handshake. The nodes also advertise a window size to be used by the flow control mechanism and a maximum segment size (MSS), specifying the maximum size of a packet. Each end of the connection specifies its own values. Messages which result in a packet larger than the MSS are not allowed and if the user tries to send such a message, an error will be returned. RDP provides for both ordered and unordered delivery of packets. Which delivery mechanism to use is also established during the connection phase.

Each data packet in RDP has a checksum, which is the same one as TCP uses, and a sequence number. The receiver acknowledges packets as they arrive. RDP allows two types of acknowledgments (ACKs). A cumulative ACK is used to acknowledge all packets received in sequence up to a specified sequence number and a non-cumulative ACK to acknowledge packets received out of sequence. The latter is similar to the TCP SACK option and has the benefit of no unnecessary retransmission of packets; only unacknowledged packets are retransmitted. The type of ACK that is used depends on the received packets; if possible, the cumulative ACK is used.

Packets may be lost during transmission or be discarded at the receiver because of a failed checksum test or if the MSS of the connection is exceeded. Missing packets are detected using a retransmission timer. A timer is set by the sender for each transmitted packet. When an ACK is received for a packet the timer for that packet is cancelled. If the timer expires, the packet is assumed lost and is retransmitted and the timer is restarted. There is no specified method for determining the initial value of the timer; what value to use is decided upon implementation.

RDP uses a simple version of the sliding window flow control mechanism based on the advertised window size. The window size specifies how many unacknowledged packets may be in transit at a time. The receiver decides which window size to use according to available buffers. Once the window size is advertised during the connection phase, it may not be changed. This makes the flow control mechanism very limited since the receiver has no way to throttle down the rate of transmission other than to delay ACKs, which may lead to unnecessary retransmissions and in the end a congested network [18]. This would be unfortunate since RDP does not specify a mechanism for congestion control.

RDP is in an experimental state and implementations are scarce, probably because the benefits over using TCP are not that great and its usefulness is hampered by the lack of decent flow and congestion control mechanisms. On the other hand, if these mechanisms were added to RDP its complexity would increase. TCP has also received several updates during its lifetime, especially to increase efficiency for bulk transfers on fast links, and RDP has probably failed to maintain an edge over TCP even in that area.

3.3.4 Reliable User Datagram Protocol (RUDP)

The Reliable User Datagram Protocol (RUDP) [1] is not a standardized protocol but rather the result of a discussion of a simple, message-oriented transport protocol. It is based on the specification of RDP [19, 32] and then adapted to run on top of UDP, hence the name.

RUDP packets are encapsulated in UDP packets and RUDP therefore uses the delivery mechanism of UDP. What RUDP adds is reliable delivery of packets and a flow control mechanism, while still keeping a low overhead by using simple mechanisms and therefore providing high performance. Rather than aiming at bulk transfers as RDP did, RUDP aims at signaling protocols that need a highly configurable transport protocol.

Just like RDP, RUDP is a connection-oriented protocol that provides reliable delivery of messages. It is designed to be very adaptable; the behavior of retransmissions, timers and other parameters is configurable for each connection. Which parameters to use is negotiated and determined during the connection establishment phase.

RUDP also adds a keep alive mechanism to determine if the other end of the connection is still active and what to do if the connection is broken. It uses a timer and a special message which the other node immediately has to respond to when received. The timer is started when the connection is opened and reset each time a user message is sent. If the connection is idle, the timer will eventually expire and the special message will be sent to check if the other node is still active. If the other node is not responding, RUDP has an auto reset feature which tears down the connection and tries to reestablish it by going through the connection phase once again. If this is not possible, the user is informed.

RUDP shares the features and limitations of RDP as it tries to refresh the ideas of RDP and make it more useful. It is still a simple and probably efficient protocol with a limited flow control mechanism and a non-existent congestion control mechanism. This protocol was proposed in a draft and since then there has been no further development, but it still serves as a template for a reliable protocol built upon UDP.

3.3.5 Stream Control Transmission Protocol (SCTP)

The Stream Control Transmission Protocol (SCTP) [29] is a new advanced transport protocol. The motivation for developing SCTP is that even though TCP has provided excellent service for a long time, there is an increasing number of applications which find TCP too limiting. SCTP provides a reliable and message-oriented transport service that combines good features of UDP and TCP [7] as well as adding completely new features like multihoming and multiple streams within a connection.

One of the annoyances of TCP is its byte-oriented nature. TCP provides a stream of bytes which means that applications using messages must add their own marking to preserve message boundaries and possibly make use of the TCP push facility to ensure quick delivery of the message. By being message-oriented from the start, SCTP removes this limitation.

SCTP is a connection-oriented protocol supporting data exchange between two hosts. These hosts may be represented by multiple IP addresses in order to provide redundancy and improve fault tolerance; this is the multihoming feature. A connection in SCTP is called an association and can contain multiple streams. In comparison, TCP features a single stream in each direction which means that if there is a loss of a piece of the stream, TCP blocks the delivery of the rest of the data until the missing piece is retransmitted. SCTP allows for data to be partitioned into multiple independent streams and if one is blocked, only delivery within that stream is affected; the others can still deliver their data.

SCTP accomplishes multiple streams by creating independence between data transmission and data delivery [17]. An SCTP packet can contain several chunks of either data or control information. Each chunk of data uses two sets of sequence numbers, a transmission sequence number (TSN) and the stream ID (SI)/stream sequence number (SSN) pair. The TSN is used to govern the transmission of data and detect the loss of packets; the SI/SSN pair is used in the delivery of data to maintain the correct sequence of received data. If there is a gap in the transmission sequence there will be a corresponding gap in the stream sequence of affected streams; other streams will not show a gap and can continue to deliver data.

A stream supports both ordered and unordered data delivery. Unordered delivery ignores sequence numbers and delivers data to the application in the order it arrives. This feature negates the need for an urgent pointer facility as used by TCP. Thanks to its multistream capability, SCTP can deliver urgent messages in a separate unordered stream instead.

SCTP is a reliable transport protocol and provides error correction by retransmitting data. This is where the TSN is used. Data is acknowledged using the selective acknowledgment procedure (SACK). The receipt of data is acknowledged by sending a SACK that indicates the current cumulative TSN range received and, if there is a gap in the sequence, any non-cumulative TSNs also received. Similar to TCP, SACKs are delayed with an upper limit on the delay between SACKs. Retransmission of missing TSNs occurs when either the retransmission timer expires and causes a timeout or SACKs are received indicating a gap in the received TSN sequence. Once again similar to TCP, the retransmission timeout (RTO) is dynamically adjusted based on estimates of the round-trip delay (RTT).

Since transport is done within a single SCTP association, all streams are subjected to a common flow and congestion control mechanism which reduces the overhead required at the transport level compared to using several TCP streams. Flow and congestion control in SCTP follow the corresponding TCP mechanisms, using sliding window and slow start, congestion avoidance and fast recovery.

SCTP also provides other features and enhancements over TCP, like a connection establishment procedure which is less vulnerable to denial of service attacks and an improved checksum.

SCTP is definitely an advanced protocol, set to replace TCP in applications that could make good use of the multistreaming and multihoming features, like IP telephony [7]. It is still quite new and it will probably take a while before SCTP is as mature as current TCP implementations.

3.3.6 Datagram Congestion Control Protocol (DCCP)

The Datagram Congestion Control Protocol (DCCP) [12, 13] is a transport protocol under development that provides a congestion-controlled flow of unreliable packets. One of the motivations for DCCP is to enable the use of congestion control for applications that would otherwise be using UDP, such as streaming media. Instead of implementing various different congestion control mechanisms on top of UDP for higher level protocols, DCCP will provide a common base. Just like UDP it aims to provide a minimal foundation to use as a building block of more advanced services.

One thing of interest in DCCP is the use of sequence numbers and acknowledgments. In order to make congestion control work, the sender must know about occurring packet losses. In DCCP sequence numbers and acknowledgments are used to inform the sender about packet loss due to congestion in the network. They are not used to provide a reliable transport service. Lost packets are never retransmitted and there are no guarantees of packets being delivered in order.

Congestion control is an important feature on the Internet but not necessarily on a LAN where capacity mostly is not a problem.

3.3.7 Real-time Transport Protocol (RTP)

The Real-time Transport Protocol (RTP) [27] is designed to handle real-time traffic. Since it does not have any delivery mechanism of its own, it is built on top of UDP; each RTP packet is simply encapsulated in a UDP packet [7]. RTP adds facilities for payload type identification, sequence numbering, timestamping and delivery monitoring. RTP is mainly used for real-time audio and video data.

Timestamping is provided to be able to preserve a relationship in time between packets in a transmission. When used for video, the receiver knows when to play each incoming packet by looking at the timestamps. The timestamps alone are not enough to ensure ordered delivery and therefore sequence numbers are also provided. The sequence numbers allow the receiver to reconstruct the sender's packet sequence and detect packet losses.

RTP only allows packets that carry data from the source to the destination. In order to provide delivery monitoring the recipient must be able to send feedback to the source. The data transport is augmented by a control protocol, the Real-time Transport Control Protocol (RTCP), which defines several messages which allow the source and receiver to communicate during the transmission. The primary function of RTCP is to provide feedback on the quality of service being provided by RTP.

RTP and RTCP themselves do not provide mechanisms to ensure reliable or timely delivery; these things have to be provided by some other mechanism.

3.4 Middleware

Middleware is really an application that logically lives in the application layer but which provides some service to other applications [30]. This service is often a high level communication protocol, completely independent from the application that makes use of it. The idea of middleware is to hide the mechanisms at work at the lower layers and to provide a common interface to the application even if the lower layers are changed. This makes it possible to write applications for a certain middleware and run them on different platforms since the interface to the middleware is constant.

Middleware is heavily used in today's applications and common middleware mechanisms are remote procedure call (RPC), distributed objects and message queuing.

The idea behind RPC is to provide applications with the ability to call procedures located on other machines on a network. To the user there is no difference between local and remote procedure calls: the procedure is called, parameters are passed and a result is returned. The resulting network communication is invisible.

Distributed objects use the same principle as RPC and apply it to objects. Objects may reside on one machine and be invoked remotely from an application on another machine on the network, just as if they were local objects.

Message queuing systems support message exchange between applications based on persistent communication, that is, if the destination is unavailable at the time of the message creation the message will reside in some storage and will be delivered as soon as possible. This behavior is quite different from RPC and distributed objects which require that both sender and receiver are active at the same time. The emphasis of message queuing is not on quick delivery but rather to provide loosely coupled communication.

These middleware mechanisms provide the user with powerful methods of information exchange between nodes on a network. They trade control of protocol mechanisms for ease of use. The primary reason why they are so convenient to use is that they hide the details of network communication and present a familiar interface to the user, like procedure calls or objects, which fit into the common development environment. Which mechanism to use largely depends on the programming paradigm of choice. Usually a middleware uses TCP or UDP as its transport protocol.


3.5 Industrial Ethernet

Industrial Ethernet is the name given to the use of the Ethernet protocol in an industrial environment, for automation and production machine control. Historically, several different transmission media, like serial links, have been used for these applications along with some proprietary or possibly open protocol. Decreasing costs of Ethernet hardware, the huge advantage in capacity and an increasing demand for a single network type, from boardroom to plant floor, have led to the development of Industrial Ethernet. The emerging solutions usually complement the old standards by using common user interfaces [4]. The trend is to define an application layer environment along with the TCP/IP protocol suite to realize an industrial networking solution. These application layer protocol implementations could be called middleware since they provide a communication service to other applications.

The Industrial Ethernet protocols provide a higher level service than the transport layer protocols and often define an entire system for communication with definitions of different types of messages and the exchange of messages. A good reason to use any of these protocols is the backwards compatibility offered. Many industrial systems are already built around one protocol and by using an implementation of the same protocol on top of Ethernet, the systems can expand to new technology without severe changes to the existing parts. These protocols are also well established standards which ensures compatibility between products from different sources. Two common Industrial Ethernet protocols are Modbus TCP and Ethernet/IP.

3.5.1 Modbus TCP

Modbus TCP [25] is an implementation of the Modbus protocol using TCP/IP. It is an open specification that aims for ease of implementation. Therefore it uses TCP as its transport protocol since it already provides a reliable service. Since TCP is byte-oriented, Modbus messages are prefixed with a header containing the length of the message before being inserted into the TCP stream. This enables the receiver to extract the messages from the stream. The Modbus implementation guide [16] calls for the use of the standard BSD Socket interface. It is recommended that Nagle's algorithm is disabled in order to avoid performance issues regarding latency; otherwise the default TCP parameters are acceptable. Modbus TCP does not specify a required response time since it is designed to be used in various situations.

3.5.2 Ethernet/IP

Ethernet/IP [24], where the IP stands for Industrial Protocol, is an implementation of the Industrial Protocol at the application layer, similar to Modbus TCP. It uses both TCP and UDP for communication. TCP is used for general purpose traffic and UDP is used for traffic with stricter timing requirements. During initialization, and other exchange of protocol and service information, TCP is used because its reliable, connection-oriented exchange is preferred and its worse timing characteristics are negligible. Once the connection is initialized, UDP may be used to transfer information with high timing requirements. Only data is transferred this way, no commands or control information. The meaning of the data is configured at initialization. The usage of UDP and the protocol stack adds an element of non-determinism and even though response times may be quick, it is not always possible to guarantee a timely delivery, which is required of a true real-time solution [5].

3.5.3 EtherCat and Ethernet Powerlink

There are also more advanced Industrial Ethernet solutions than those described above. EtherCat and Ethernet Powerlink are two solutions for systems that require communication with real-time guarantees [5]. Both solutions accomplish this by bypassing most of the protocol stack and accessing the link layer directly. Messages are inserted into Ethernet frames. A single master controls the network by initiating all communication; the slaves only respond. Because of this, the master can schedule messages in a deterministic way and guarantee a timely delivery. The drawbacks of such solutions are that the compatibility with other networking products is compromised and that other traffic on the network may hamper the performance. These communication solutions are also significantly more demanding to implement than solutions using the common TCP/IP protocol suite. EtherCat and Ethernet Powerlink are specialized solutions for applications that require true real-time communication.


Chapter 4

Summary and Selection

In this chapter, the available technology described in the previous chapter is summarized and suitable solutions are selected for implementation and evaluation.

4.1 Available Solutions

To begin with, using Ethernet as the transmission medium for the system featured in this project is a good choice and one that is being made by many others as Ethernet is currently phasing out older technology in industrial applications. The capacity offered by Fast Ethernet, which is selected for this system, is much greater than the demand and by using switches as connecting devices it is ensured that the network will provide stellar service. What Ethernet does not provide and is still needed is some means of reliable communication in order to handle possible data corruption caused by electro-magnetic interference.

The Industrial Ethernet technologies described all provide a complete solution for communication, although they use different approaches. EtherCat and Ethernet Powerlink are suitable for systems with strict time limits for communication and by providing this, their designs become more complex and harder to implement since most of the protocol stack is bypassed. Such solutions must also be used together with an operating system (OS) that is able to provide the same guarantees regarding time limits in order to be effective. Since the Windows OS used in our system is not designed to provide such support, there is little to gain from using these advanced solutions compared to other simpler solutions.

Modbus TCP and Ethernet/IP both use transport protocols available in the TCP/IP protocol suite and are therefore less complex and easier to implement. The downside with these solutions is that they inherit unnecessary complexity from the older protocols which they are based on. Our system has no compatibility issues that require the use of any of these protocols, which makes these solutions less attractive. They do not add enough functionality compared to other alternatives to justify the higher cost of implementation and be viable solutions for our system.


The middleware mechanisms described are unsuitable simply because they do not fit the programming model. Their purpose is to hide the network communication and provide a familiar interface to the programmer, which is of no interest here.

The requirements for our system can be met by a reliable transport protocol. Of the transport protocols described, UDP, DCCP and RTP are unreliable and therefore unsuitable. This leaves us with TCP and SCTP, which are standardized protocols, and the more experimental RDP and RUDP. All of them are reliable transport protocols. For our purposes, SCTP and TCP are very similar solutions since most of the features available only in SCTP will be unused. The exception is that SCTP provides message-oriented communication in comparison to the byte-oriented communication provided by TCP. The T/TCP extension is not suitable since it creates and terminates the connection for each message sent. For our system it is more appropriate to keep the connection open. SCTP is still under development and no suitable implementation for the required platform was found; therefore TCP was selected for further work since it is already available on the platform. A solution based on TCP is quite simple to implement since TCP already provides most necessary features.

A solution based on TCP was implemented and during initial testing it showed potential problems, so an alternate solution was desired for comparison. The options that remained were solutions based on RDP and RUDP, two very similar protocols since RUDP is based on RDP. RUDP was selected for further work because it is easier to implement since it runs on top of UDP which is available on the platform. In comparison to TCP, RUDP is a simple transport protocol with few features. A solution based on RUDP is possible to customize and adapt to the system and can provide only the necessary features. This approach is quite different compared to a solution based on TCP. Since TCP is developed to work in all kinds of environments, it features mechanisms such as congestion control and a dynamic retransmission mechanism. These are features that make a lot of sense when the protocol is used on the Internet but they are not at all necessary on the small LAN available to our system.

Two solutions based on different transport protocols, one that provides almost all the features one would ever need and one that provides the bare minimum, provided for an interesting evaluation.

4.2 Solution Based on TCP

A solution based on TCP is easy to implement since TCP already provides reliable communication. However, message-oriented communication has to be implemented on top of the byte-oriented service provided by TCP. Preferably a message should also make use of the TCP push feature to ensure the smallest possible delay due to buffers. Possibly, Nagle's algorithm should also be disabled in order to minimize negative effects of the flow control mechanism.

Other mechanisms of concern are the congestion control mechanism and the dynamic RTO calculation. Congestion control should not cause any problems on a LAN unless an unnecessarily small MSS is advertised.

A TCP implementation usually supports several options and parameters that can be changed. Most TCP options concern the maximum throughput which is not of interest for our system. Platform specific parameters that may increase the performance in our system were investigated.

4.3 Solution Based on RUDP

The second solution is a simplified variant of RUDP in order to reduce the time required for implementation. Features that are not necessary for our system were not implemented. The parameter negotiation during connection establishment is not necessary since this version of RUDP will only be used in this particular system. The network and its nodes were known beforehand and are not subject to frequent changes; therefore parameters could be decided upon during implementation. No keep-alive mechanism was implemented either.

The flow control mechanism was changed from the proposed sliding window to the much less complex stop-and-wait. This severely affects throughput but since the latencies on a LAN are low, it should still provide enough throughput for our system.


Chapter 5

Implementation

The previous chapter described the general approach of the two solutions selected for implementation. In this chapter there is a more detailed description of what was implemented.

5.1 General Implementation Details

The implementation of the selected solutions was made in C and compiled using the Borland version 5.6 compiler on a Microsoft Windows 2000 platform. Both solutions were implemented in user space as a layer above the transport layer of the TCP/IP protocol suite using the Windows Sockets 2 (Winsock) API. The implementations aim to be thread safe, or at least as thread safe as the Winsock library.

A common, easy to use interface for applications to manage connections and exchange messages was used for both solutions to facilitate the development of applications for testing purposes.

5.2 Implementation of Solution Based on TCP

Using the implementation of Modbus TCP [16] as a template, messages over TCP were implemented by prefixing each message with a 2-byte header. The header contains the length of the entire packet, that is, the combined length of the message and the header. The receiver reads from the TCP stream and can extract a packet from it using the length specified in the header. Then the message contained in the packet is delivered to the application. In order to minimize the number of Winsock function calls, an extra level of buffering was used at this stage. The receiver reads chunks from the TCP stream and stores them in a buffer, then the header is interpreted and the received packet is processed. Also per recommendation, Nagle's algorithm was disabled by enabling the TCP_NODELAY option for all sockets.
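
A minimal sketch of this framing is given below; the helper names and the MAX_MSG limit are invented for the example, and writing the length field in network byte order is an assumption since the text does not specify the byte order. The setsockopt() and send() calls are standard Winsock functions.

    /* Sketch: send one length-prefixed message over TCP and disable Nagle's algorithm. */
    #include <winsock2.h>
    #include <string.h>

    #define MAX_MSG 4096                    /* assumed upper bound on message size */

    /* Disable Nagle's algorithm once per connected socket, as recommended. */
    static void disable_nagle(SOCKET s)
    {
        BOOL on = TRUE;
        setsockopt(s, IPPROTO_TCP, TCP_NODELAY, (const char *)&on, sizeof(on));
    }

    /* Prefix the message with a 2-byte header holding the total packet length
       (header plus message) and write the packet to the TCP stream. */
    static int send_framed(SOCKET s, const char *msg, unsigned short len)
    {
        char packet[2 + MAX_MSG];
        unsigned short total = (unsigned short)(len + 2);

        packet[0] = (char)(total >> 8);     /* length, high byte */
        packet[1] = (char)(total & 0xff);   /* length, low byte  */
        memcpy(packet + 2, msg, len);
        return send(s, packet, total, 0);
    }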

Winsock provides no means of manually controlling the TCP push feature when sending; it is handled automatically by TCP. When data arrives with the push flag set, Winsock correctly interprets it and completes the receive function call [15].


Figure 5.1. The RUDP header, consisting of the control bits (SYN, ACK, RST, NUL and a zero field), the sequence number and the ACK number.

During initial testing the push feature worked as expected; the push flag was set for each individual packet sent. It was also verified that a correct MSS, based on the MTU of Ethernet, was advertised.

The TCP/IP protocol suite implementation for Windows provides a number of parameters in the Windows registry that may be altered. The few parameters that might be of interest to this system were TcpDelAckTicks, which controls the delay of ACKs, and TcpMaxDataRetransmissions, which determines the number of retransmissions of data segments. Since the RTO of TCP is based on the RTT measurements, it is possible that delayed ACKs may affect the value of the RTO. The default delay in Windows is 200 ms and it is possible to completely disable delayed ACKs. Whether this parameter can be used to increase performance was determined during the evaluation of this implementation. The number of retransmissions, which defaults to 5, definitely affects the time before an error is signalled when TCP is unable to deliver a segment. Decreasing this value will have a negative effect on reliability and the parameter remained at its default value for our system.

5.3 Implementation of Solution Based on RUDP

The second solution is a simplified variant of RUDP, as previously stated. No parameter negotiation or keep-alive mechanism was implemented. Also, the less complex flow control mechanism, stop-and-wait, was used. Since the link layer error detection fulfills our system's need for error detection, no checksum was used for the RUDP packets. The checksum of the UDP packets which are transporting the RUDP packets was disabled for the same reason by enabling the UDP_NOCHECKSUM socket option.
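
As a minimal sketch, and assuming an invented helper name, the socket setup with the checksum disabled might look as follows; error handling is omitted.

    /* Sketch: create the UDP socket carrying the RUDP packets and disable the
       UDP checksum, relying on the Ethernet CRC for error detection. */
    #include <winsock2.h>
    #include <ws2tcpip.h>

    static SOCKET open_rudp_socket(void)
    {
        SOCKET s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
        BOOL no_checksum = TRUE;

        setsockopt(s, IPPROTO_UDP, UDP_NOCHECKSUM,
                   (const char *)&no_checksum, sizeof(no_checksum));
        return s;
    }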

The header of this implementation, shown in Figure 5.1, is simpler than the proposed RUDP header as it only features a control field, the sequence number and the acknowledgment number. The length of the packet is reported by UDP and the length of the message contained in the packet can be calculated since the size of the header is constant. The control bits indicate what is present in the packet. The details are as follows, and an illustrative declaration of the header is sketched after the list:

• The SYN bit indicates that the packet is a synchronization packet, which is used to establish connections.

• The ACK bit indicates that the ACK number in the header is valid.

• The RST bit indicates that the packet is a reset packet, which is used to terminate connections.

• The NUL bit indicates that the packet is empty. Empty packets are used for keep-alive probes, which are not yet implemented.

• The 0 field must have the value 0.
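
To make the layout concrete, the simplified header could be declared roughly as below. Only the field names follow Figure 5.1; the field widths and the bit values are assumptions and are not taken from the thesis text.

    /* Illustrative declaration of the simplified RUDP header. */
    #define RUDP_SYN 0x80   /* synchronization packet                  */
    #define RUDP_ACK 0x40   /* the acknowledgment number is valid      */
    #define RUDP_RST 0x20   /* reset packet                            */
    #define RUDP_NUL 0x10   /* empty packet (keep-alive probe)         */

    struct rudp_header {
        unsigned char control;   /* SYN, ACK, RST and NUL bits; the rest must be 0 */
        unsigned char seq;       /* sequence number of this packet                 */
        unsigned char ack;       /* sequence number being acknowledged             */
    };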

The inner workings of RDP, on which RUDP is based, are well documented [18] and served as a specification for some parts of this implementation. The states of a connection, the state changes during connection establishment and termination and the contents of the connection record of this RUDP implementation are based on this specification. Table 5.1 shows a description of the possible states and a state transition diagram for this implementation is shown in Figure 5.2. The circles of the diagram represent the states and the arrows the transitions between states. Each arrow has a box describing first the event that triggers the state transition and second, the output produced by RUDP.

Table 5.1. States for RUDP

State        Description
CLOSED       There is no connection.
LISTEN       Passive open performed, waiting for incoming request.
SYN-SENT     Active open performed, request sent to destination.
SYN-RCVD     Incoming request received and acknowledged, waiting for remote acknowledgment.
OPEN         Connection established, data transfer may begin.
CLOSED-WAIT  Connection closing, wait before cleaning up.

The implementation is a two layer design with the top layer providing the same interface to the application as the TCP-based solution and the bottom layer containing the bulk of the RUDP implementation, with an interface to the upper layer mimicking that of Winsock with blocking function calls.

The implementation of RUDP is multi-threaded, using a separate thread to handle the receipt of packets for each open connection. The thread sends ACKs for correctly received packets, handles received ACKs and enqueues messages from incoming data packets. Messages are enqueued in an internal circular buffer waiting for a call to receive. During other operations than receive, the calling thread is used to perform the work, often blocking it in order to wait for results. A call to send is blocked until an ACK is received and also performs necessary retransmissions; a call to connect is blocked until the connection is up. A call that fails returns a value indicating an error. Threads are synchronized using mutex locks and semaphores.

Figure 5.2. The state transition diagram for this RUDP implementation.

Parameters such as the maximum number of retransmissions and the RTO are constant and determined at compile time. The RTO was set to 10 ms for this system. Since Windows 2000 does not guarantee real-time execution, timeouts shorter than this make little sense. The internal timer of the operating system usually runs at 100 Hz, which results in an interrupt every 10 ms, and the time-slices scheduled to processes are usually in the range of 40 ms. During heavy load, even an RTO of 10 ms may not work as expected since the process can be forced to wait for execution, but I/O bound processes have a high priority and during moderate load it worked when tested. A 10 ms RTO cannot be guaranteed but has a good chance of working and if it occasionally does not, the retransmission will simply be delayed.
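
A rough sketch of the blocking send path with stop-and-wait retransmission is shown below; the helper functions, the connection type and the MAX_RETRANS value are illustrative only and do not reproduce the actual code.

    /* Sketch of the blocking send with stop-and-wait retransmission. */
    #define RTO_MS      10    /* retransmission timeout used in this system     */
    #define MAX_RETRANS 5     /* illustrative value, not taken from the thesis  */

    struct connection;                                          /* connection record      */
    void transmit_packet(struct connection *c, const char *msg, int len);
    int  wait_for_ack(struct connection *c, int timeout_ms);    /* blocks on a semaphore  */

    int rudp_send(struct connection *c, const char *msg, int len)
    {
        int attempt;

        for (attempt = 0; attempt <= MAX_RETRANS; attempt++) {
            transmit_packet(c, msg, len);           /* sendto() on the UDP socket */
            if (wait_for_ack(c, RTO_MS))            /* ACK arrived before the RTO */
                return 0;
        }
        return -1;                                  /* give up and signal an error */
    }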


Chapter 6

Evaluation

This chapter describes the tests performed in order to evaluate the two implemented solutions. The results of the tests are presented and discussed.

6.1 Test Environment

The evaluation was made in a simulated environment. The system used for running the tests was similar to the actual system and input was created by simulating communication within the system.

6.1.1 Platform

The tests were performed on two computers equipped with the Microsoft Windows 2000 operating system, Pentium III/Athlon class CPUs and Fast Ethernet network interfaces, a setup which resembles the actual system. Each computer was connected to a separate Fast Ethernet switch and the switches were connected as shown in Figure 6.1. All connections were full-duplex. Using this setup, the cable connecting the switches can be disconnected in order to simulate connection errors without alerting the operating system on any of the computers, which would be the case if the connection between a computer and a switch was disconnected since the network interface can sense link failure.

All tests were performed with hot ARP caches in order to avoid unexpected delays. No traffic other than what was generated by the test programs was present on the network. Measurements were made using the Windows high-resolution timer within the test programs or by studying the traffic using the Ethereal network protocol analyzer program.

6.1.2 Simulated Communication

The communication present in the current system is limited by the slow serial links with a capacity of 9.6 kbps that are connecting the hosts. A network-based system with 100 Mbps links should have little trouble coping with the generated traffic.


Figure 6.1. The test platform: host A and host B are each connected to a separate switch, and the two switches are connected to each other.

Therefore the simulated communication used for testing purposes was based on future demands. If the tested solutions can handle these higher demands, they will easily handle the current demands as well.

It was decided that a very high load for the network-based system would be generated if each message had a size of 1000 bytes and each node sent such a message every 10 ms. The generated traffic would demand a capacity of 0.8 Mbps for each connection. This load is not likely to be continuous, but for testing purposes it was. The number of nodes on the network today is four and it will remain small for the foreseeable future; a total of five nodes was decided to be suitable for evaluation purposes.
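
For reference, the per-connection figure follows directly from these numbers: a 1000-byte message corresponds to 8000 bits, and one such message every 10 ms gives 8000 bits / 0.010 s = 800 kbit/s = 0.8 Mbps.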

6.2 Preliminary Evaluation of the RUDP Implementation

Because of the changes made to the RUDP implementation regarding the flow control mechanism, from the specified sliding window to stop-and-wait, the throughput of the RUDP-based solution was tested. The stop-and-wait mechanism is inefficient regarding throughput and it was necessary to validate that the implementation can supply the system with sufficient capacity.

Test Setup

Since the throughput is heavily dependent on the size of each packet, with bigger packets giving a higher throughput, the message size of 1000 bytes used for the other tests was also used in this test. This ensures that the result from this test is not an artificially high value with little correlation to the other tests.

Testing of the throughput was performed by sending one million messages from host A in Figure 6.1 to host B as fast as possible, with no delays between messages. The time required to complete the entire transfer was measured. Then the resulting throughput was calculated.


Test Result

The throughput was measured to approximately 19.6 Mbps during this test, which is significantly lower than the capacity of the network at 100 Mbps. It is still enough to be a huge improvement compared to the serial links with a capacity of 9.6 kbps which are used in the current system, and easily enough to cope with the demands of the simulated communication which needs 0.8 Mbps for each connection.

Since the requirements for capacity are low, the RUDP implementation is able to provide enough throughput even though it is crippled by an inefficient flow control mechanism which does not make full use of the capacity of the network.

6.3 Comparative Evaluation

In order to find out which solution is the most suitable for the system featured in this project, an evaluation of the implemented solutions was performed. The evaluation consisted of testing both of the solutions and comparing their performance regarding the properties specified in the problem definition. Both solutions provide reliable and message-oriented communication, which is a requirement. The important properties that remained to test were the communication latency and the handling of errors.

6.3.1 Communication Latency

In the problem definition it is stated that the delay during communication should be short. This test is designed to find out if the implemented solutions are capable of communication with low latency.

Test Setup

In order to measure the latency present during communication, a message was sent from host A in Figure 6.1 to host B, which then immediately echoed the message back to host A. Host A kept track of the resulting RTT, the time elapsed between sending and receiving the message. The size of the messages was 1000 bytes and they were sent at 10 ms intervals. An average RTT and the standard deviation were calculated from 10 000 measurements.
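
A sketch of one such round-trip measurement using the Windows high-resolution timer is given below; the message-exchange helpers are hypothetical placeholders for the common interface of the implemented solutions, while QueryPerformanceFrequency() and QueryPerformanceCounter() are standard Win32 calls.

    /* Sketch: measure the RTT of one echoed 1000-byte message on host A. */
    #include <windows.h>

    void send_test_message(const char *buf, int len);   /* hypothetical helpers */
    void recv_test_message(char *buf, int len);

    double measure_rtt_ms(void)
    {
        LARGE_INTEGER freq, start, stop;
        char msg[1000] = {0};
        char echo[1000];

        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&start);
        send_test_message(msg, sizeof(msg));
        recv_test_message(echo, sizeof(echo));           /* blocks until the echo returns */
        QueryPerformanceCounter(&stop);

        return 1000.0 * (double)(stop.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    }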

The solution based on TCP was tested twice, once with delayed ACKs set at their default setting and once with delayed ACKs turned off completely. Delayed ACKs were not expected to have a great impact on these measurements, but both alternatives were tested since the TCP-based solution was tested in this way further on as well.

In order to simulate a heavy load, five connections with equal load were created between host A and B simultaneously during the measurements. This resembles a real world scenario where a single host is serving several other hosts, which puts more strain on the system. A heavy load may increase the latencies and therefore it was more interesting to evaluate the RTT under such circumstances.

Table 6.1. Results from the measurements of the RTT in milliseconds (ms).

Solution based on         x̄     min   max    σ
TCP                       1.0   0.8   12.1   0.30
TCP (Delayed ACKs off)    1.0   0.8   14.3   0.28
RUDP                      1.1   0.9   13.6   0.30

Test Results

The results from the RTT measurements are shown in Table 6.1. It is clear that the TCP-based solution does not gain or lose any significant performance due to turning off the delayed ACKs, just as predicted. It is also apparent that the higher complexity of TCP and all the mechanisms involved does not prevent it from showing great low-latency performance. The solution based on RUDP seems to perform slightly worse but is still quite comparable. During all of these measurements, a few spikes were experienced where the RTT was in the range of 10–15 ms. These occurrences were few, less than one in a thousand, but they are a good reminder of the fact that the Windows operating system does not guarantee this kind of performance, due to the scheduling of processes, although on a moderately loaded system it is quite achievable. These few but very large values also affect the standard deviation significantly.

Both solutions present an average RTT close to 1 ms, which means that they are capable of low-latency communication. They are also well below the limit of 200 ms, even in the worst case, which was specified as an unacceptably high delay in the problem definition.

The outcome of this test was not decisive concerning which solution is the best suited for the system featured in this project, but the solution based on TCP came out slightly on top.

6.3.2 Error Detection and Handling

The problem definition states that the detection and handling of errors is important due to expected electro-magnetic interference from the surrounding environment. Both solutions are equally good at detecting errors during communication. The CRC present in Ethernet provides both of the solutions with a good and sufficient error detection mechanism. Since the policy is to discard any packets containing errors, both solutions use the same approach to correct errors.

Errors are corrected by retransmission but even though the mechanisms work in a similar way, the RTO is handled very differently and will dictate the responsiveness of the error handling mechanism of each solution. This test is designed to find out if the dynamically calculated RTO present in TCP or the preset RTO of RUDP is the most suitable strategy for our system.

Table 6.2. Results from the measurements of the RTO in milliseconds (ms).

                          1st RTO       2nd RTO       3rd RTO        4th RTO
Solution based on         x̄     σ      x̄     σ      x̄      σ      x̄      σ
TCP                       429   49     805   0.44    1610   0.46    3219   0.66
TCP (Delayed ACKs off)    352   30     805   0.47    1609   0.50    3219   0.70
RUDP                      10    0.49   11    0.50    11     0.50    10     0.50

Test Setup

During this test, messages were sent from host A in Figure 6.1 to host B at regular intervals. To simulate a lost packet, the cable connecting the two switches was disconnected, causing host A to send a packet but not receive any corresponding ACK from host B. Eventually the RTO was triggered and host A sent the lost packet once again. Finally, the maximum number of retransmissions was reached and an error was signalled.

The solution based on TCP was tested twice, once with delayed ACKs set at their default setting of 200 ms and once with delayed ACKs turned off completely. These tests were made in order to investigate if turning off delayed ACKs has a positive effect on the RTT measurements made by TCP and therefore also lowers the RTO, which is based on the RTT. An initial transfer of 1000 messages was made over the connection before the cable was disconnected in order to stabilize the RTT measurements and the value of the RTO.

Using Ethereal running on host A, the times of the first four occurrences of retransmissions were investigated and the RTO for each retransmission was calculated. This process was repeated 100 times. Then the average RTO and the standard deviation were calculated for each order of retransmission.

Test Results

In Table 6.2, the average RTO and the standard deviation of the first four retransmissions of the TCP-based solution, with and without delayed ACKs, and the RUDP-based solution are presented. During the measurements, all values were very stable, except for the first timeout in both of the solutions based on TCP. This is apparent when comparing the standard deviation of the first timeout to the others. In the case of TCP, the exponential backoff strategy is also clearly visible when comparing the 2nd, 3rd and 4th timeouts, with the RTO effectively doubled between each retransmission. RUDP manages to keep its RTO close to the preset value of 10 ms and uses no exponential backoff.


The value of the first timeout for TCP varied significantly between measurements but the test still indicates that turning off delayed ACKs lowered the RTO somewhat, although the improvement failed to affect the RTO of further retransmissions. Unfortunately, it seems as if TCP still does not take full advantage of the low-latency LAN and presents timeouts of several hundred and even thousands of milliseconds, and no additional parameters that may affect the RTO were found.

The implication of the long RTO of TCP is that in an environment where packets are lost, the delay when delivering packets may vary wildly: from a couple of milliseconds when everything is working to several hundred milliseconds when a packet is lost and has to be retransmitted. In contrast, RUDP will retransmit lost packets much earlier and therefore maintains quite a short delay, even in an environment where interference causes packet loss. The downside with aggressive retransmission is the greater risk of unnecessary retransmissions which leads to an increased load on the network. On a LAN where capacity is not a problem and latency is low, the long RTO and the exponential backoff strategy of TCP are unsuitable for systems such as the one featured in this project.

To some extent, the TCP fast retransmit mechanism can help with the problem of the long RTO of TCP. As previously described, it may perform early retransmissions and increase error handling performance. Fast retransmission works well during bulk transfers but if packets are sent more seldom, it is less likely that enough traffic will be generated to trigger this mechanism before the RTO expires. Therefore the fast retransmit mechanism will not be a reliable means of shortening the time between the loss of a packet and its retransmission in a system where the traffic consists of sporadic messages.

Further Discussion

The total time elapsed before an error is signalled when trying to send a message using any of these solutions is determined by the RTO and the number of retransmissions made. The RUDP implementation allows both these parameters to be adjusted while the TCP implementation offers no direct control of the RTO but allows the number of retransmissions to be changed. This is of interest if one wants the system to signal errors within a reasonable time. The only way to decrease the time before an error is signalled using TCP is to adjust the number of retransmissions to make, which will dramatically decrease the total time because of the exponential backoff seen in action in Table 6.2. Needless to say, RUDP offers more fine grained control of this condition.
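
As a rough illustration based on the values in Table 6.2, and assuming that the exponential backoff continues to double and that the default of five retransmissions is kept, the TCP-based solution would signal an error only after approximately 0.4 + 0.8 + 1.6 + 3.2 + 6.4 ≈ 12 seconds, whereas the RUDP-based solution configured for five retransmissions with its 10 ms RTO would signal an error after roughly 50 ms.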

One has to remember that the performance of TCP, and especially the parameters available to tweak the performance, are highly dependent on which implementation of TCP is used. All results presented above are specific to the featured platform and other results may be obtained on other platforms.

A way to increase performance in a system experiencing heavy packet loss could be to deliberately duplicate each packet sent. This way, the chance of at least one packet reaching the destination intact is increased. The cost of using such a mechanism would be more traffic on the network, caused both by the extra packets and by the ACKs being sent in return. There would also be an increased strain on the hosts handling all the extra packets. Packet duplication can easily be added to the RUDP implementation, but not to the TCP-based solution. Everything sent through TCP is part of a stream and if a segment of this stream is lost, the remainder of the stream is not delivered until the lost segment is retransmitted and received, effectively blocking any duplicate packets that may be present later in the stream.


Chapter 7

Conclusion and Recommendation

The aim of this master's thesis is to present a suitable solution for communication using a network for an existing industrial system. Important properties are reliable, message-oriented communication with low latency and resistance to errors caused by electro-magnetic interference from other equipment.

In the study described in Chapter 3, I established that the Fast Ethernet network chosen for the system is a good choice and when equipped with switches and full-duplex connections will have excellent performance. I also performed a study of available communication solutions for systems of this kind.

During the selection process in Chapter 4, I came to the conclusion that none of the technologies under the name of Industrial Ethernet are suitable for this system. A less complex and quite adequate solution is to adjust one of the reliable transport protocols to suit the requirements. Two different approaches were selected for implementation and further testing. The first solution was based on the Transmission Control Protocol (TCP) and the second one was based on the Reliable User Datagram Protocol (RUDP).

Both solutions feature reliable and message-oriented communication. The TCP-based solution was easy to implement since TCP already incorporates almost all of the required features. The solution based on RUDP was more demanding to implement since RUDP is not readily available, but it was still relatively straightforward.

During the evaluation in Chapter 6, it became clear that both solutions are capable of low-latency communication and that the issue separating the two candidates is the ability to quickly respond to packets being lost during communication. Thanks to the fact that RUDP offers more control over the parameters involved in the error handling mechanism than TCP does, it was possible to adapt the solution based on RUDP to suit the particular system. The RUDP-based solution features quicker retransmissions of lost packets and therefore less impact on the delay when delivering messages in the case of packets being lost due to errors caused by external interference, a property which is crucial. Therefore the solution based on RUDP is my recommendation for the system featured in this project.


The downside of the RUDP-based solution is of course that it requires more effort to implement, and therefore probably also for testing, than the solution based on TCP. Quite a large part of the required implementation was made during this project but there is still some work to do before it can be used in a critical system.


Bibliography

[1] T. Bova and T. Krivoruchka. "Reliable UDP Protocol". Internet-Draft draft-ietf-sigtran-reliable-udp-00, Internet Engineering Task Force, 1999. Expired, work in progress.

[2] R. Braden. "T/TCP — TCP Extensions for Transactions Functional Specification". RFC 1644, Internet Engineering Task Force, 1994.

[3] J. S. Chase, A. J. Gallatin and K. G. Yocum. "End system optimizations for high-speed TCP". In IEEE Communications Magazine, volume 39, issue 4, pages 68–74, 2001.

[4] P. Doyle. "Real-Time Ethernet 1 — Introduction to Real-Time Electronic Control Systems". Industrial Ethernet University. http://www.industrialethernetu.com/courses/401_1.htm, visited 2006-05-30.

[5] P. Doyle. "Real-Time Ethernet 2 — Introduction to Real-Time Solutions Available to Industry". Industrial Ethernet University. http://www.industrialethernetu.com/courses/402_1.htm, visited 2006-05-30.

[6] B. A. Forouzan. Data Communications and Networking, 3rd edition. McGraw-Hill, New York, 2004.

[7] B. A. Forouzan. TCP/IP Protocol Suite, 3rd edition. McGraw-Hill, New York, 2005.

[8] P. Gilfeather and A. B. Maccabe. "Making TCP Viable as a High Performance Computing Protocol". In Proceedings of the LACSI Symposium, 2002.

[9] "Internet Protocol". J. Postel, Editor. RFC 791, Internet Engineering Task Force, 1981.

[10] S. Iren, P. D. Amer and P. T. Conrad. "The transport layer: tutorial and survey". In ACM Computing Surveys, volume 31, issue 4, pages 360–404, 1999.

[11] V. Jacobson, R. Braden and D. Borman. "TCP Extensions for High Performance". RFC 1323, Internet Engineering Task Force, 1992.

[12] E. Kohler, M. Handley and S. Floyd. "Designing DCCP: Congestion Control Without Reliability". ICSI Center for Internet Research, 2003.

[13] E. Kohler, M. Handley and S. Floyd. "Datagram Congestion Control Protocol (DCCP)". Internet-Draft draft-ietf-dccp-spec-13, Internet Engineering Task Force, 2005. Work in progress.

[14] K. C. Lee and S. Lee. "Performance evaluation of switched Ethernet for networked control systems". In IEEE Annual Conference of the Industrial Electronics Society, volume 4, pages 3170–3175, 2002.

[15] D. MacDonald and W. Barkley. "Microsoft Windows 2000 TCP/IP Implementation Details". Microsoft TechNet. http://www.microsoft.com/technet/itsolutions/network/deploy/depovg/tcpip2k.mspx, visited 2006-06-01.

[16] "MODBUS Messaging on TCP/IP Implementation Guide". Modbus-IDA, 2004.

[17] L. Ong and J. Yoakum. "An Introduction to the Stream Control Transmission Protocol (SCTP)". RFC 3286, Internet Engineering Task Force, 2002.

[18] C. Partridge. "Implementing the Reliable Data Protocol (RDP)". In Proceedings 1987 Summer USENIX Conference, pages 367–380, 1987.

[19] C. Partridge and R. Hinden. "Version 2 of the Reliable Data Protocol (RDP)". RFC 1151, Internet Engineering Task Force, 1990.

[20] D. C. Plummer. "An Ethernet Address Resolution Protocol". RFC 826, Internet Engineering Task Force, 1982.

[21] J. Postel. "User Datagram Protocol". RFC 768, Internet Engineering Task Force, 1980.

[22] J. Postel. "Internet Control Message Protocol". RFC 792, Internet Engineering Task Force, 1981.

[23] "Requirements for Internet Hosts — Communication Layers". R. Braden, Editor. RFC 1122, Internet Engineering Task Force, 1989.

[24] J. Rinaldi. "An Overview of EtherNet/IP: An Application Layer Protocol for Industrial Automation". Real Time Automation. http://www.rtaautomation.com/ethernetip/, visited 2006-03-24.

[25] J. Rinaldi. "Modbus TCP Overview: Modbus TCP Unplugged — An introduction to Modbus TCP Addressing, Function Codes and Modbus TCP Networking". Real Time Automation. http://www.rtaautomation.com/modbustcp/, visited 2006-03-24.

[26] S. Schneider, G. Pardo-Castellote and M. Hamilton. "Can Ethernet be Real Time?". Real-Time Innovations Inc., 1998.

[27] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson. "RTP: A Transport Protocol for Real-Time Applications". RFC 3550, Internet Engineering Task Force, 2003.

[28] W. R. Stevens. TCP/IP Illustrated, Volume 1: The Protocols. Addison Wesley, 1994.

[29] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang and V. Paxson. "Stream Control Transmission Protocol". RFC 2960, Internet Engineering Task Force, 2000.

[30] A. S. Tanenbaum and M. van Steen. Distributed Systems: Principles and Paradigms. Prentice-Hall, New Jersey, 2002.

[31] "Transmission Control Protocol". J. Postel, Editor. RFC 793, Internet Engineering Task Force, 1981.

[32] D. Velten, R. Hinden and J. Sax. "Reliable Data Protocol". RFC 908, Internet Engineering Task Force, 1984.


Appendix A

Acronyms and Abbreviations

ACK       Acknowledgment
ARP       Address Resolution Protocol
CRC       Cyclic redundancy check
CSMA/CD   Carrier Sense Multiple Access with Collision Detection
DCCP      Datagram Congestion Control Protocol
FEC       Forward error correction
ICMP      Internet Control Message Protocol
IP        Internet Protocol
kbps      Kilobit per second
LAN       Local area network
MAC       Medium access control
Mbps      Megabit per second
MSS       Maximum segment size
MTU       Maximum transfer unit
NACK      Negative acknowledgment
RDP       Reliable Data Protocol
RPC       Remote procedure call
RTCP      Real-time Transport Control Protocol
RTO       Retransmission timeout
RTP       Real-time Transport Protocol
RTT       Round-trip time
RUDP      Reliable User Datagram Protocol
SACK      Selective acknowledgment
SCTP      Stream Control Transmission Protocol
TCP       Transmission Control Protocol
T/TCP     TCP for transactions
TTL       Time to live
UDP       User Datagram Protocol
WAN       Wide area network

