

Singapore ICCS/ISITA '92

Studies on Protocol Layer Residency in an Intelligent Network Interface Unit

Author(s): Nit S. Bapat, S.V. Raghavan

Department of Computer Science & Engineering, Indian Institute of Technology, Madras, India

    ABSTRACT

To exploit the high data rates offered by the physical media, efficient protocol processing in every network node is a must. Network Interface Units (NIUs) have a significant role in this regard. Balanced protocol partitioning and processing in a host machine and an NIU is a key to achieving faster protocol processing in a network node. This paper discusses various issues related to partitioning a protocol stack between a host machine and an NIU. We describe the experiments carried out to understand the partitioning of the OSI protocol layers between a host machine and an NIU. OSINET, networking software developed at IIT Madras, is used in the experiments that were designed specifically for this purpose. The implementation details of the work along with performance measurements are presented.

1.0 Introduction

In the networking environment, the data throughput available to an application program has been considerably lower than the raw data rate of the underlying physical medium. Protocol processing overhead [2] and lack of proper architectural support [5] are considered to be the factors causing this inefficiency. We concentrate our study on protocol processing in a network node. With the advent of hardware technology, the so-called 'intelligent NIUs' are fast becoming commonplace. Most such NIUs have at least one on-board processor that performs some protocol functions and controls the protocol processing hardware.

Such an intelligent NIU forms a multiprocessor system with the host machine. Protocol processing has to be properly partitioned between the processors of this system to achieve optimum performance from the network node. This paper addresses the issues in partitioning the OSI protocol layers in a typical host-NIU system. We describe the experiments, carried out using the OSINET protocol software and an intelligent NIU, to support our discussion.

    2.0 Related Work

Several designs are proposed for a high performance NIU that off-loads a host machine by performing certain time-critical protocol functions in hardware. The designs range from the use of protocol-specific hardware to implementing protocols in silicon. Kanakia et al [5] and Chesson [1] have designed protocol-specific NIUs which perform some transport level functions in hardware. Idou et al [4] have a design which uses a general purpose processor to execute protocol layers in the NIU. They propose to execute the entire OSI stack on the NIU. Giarrizzo et al [3] have a design which employs multiple processors on the NIU, executing individual layers on separate processors. These designs can off-load the complete protocol stack from the host machine, but only prototypes or simulation studies are reported. All these efforts differ in the amount of protocol functions being executed on the NIU. There appears to be no general consensus on how much protocol code should reside on the NIU. These designs concentrate on the NIU and try to make protocol processing on the NIU more efficient. In this paper, we try to study the host machine and the NIU as a single system, to optimize the performance of a network node.

3.0 Typical Host-NIU System

NIUs on a network can have different types of host interface mechanisms. An NIU can have a character I/O type, a DMA type, or a shared memory type of interface with the host machine. In the character I/O type of interface, the interface overhead is borne by the host machine. In the DMA type of interface, the job of carrying out the DMA transfer generally lies with the NIU. The shared memory type of interface avoids an extra copy of data and offers the least interface overhead. For every mechanism, there is a certain overhead of data exchange across the interface, which is borne either by the NIU or by the host.

Various cases of host-NIU operating environments are depicted in figure 1. The host machine can run a unitasking or a multitasking operating system. In a unitasking host, the protocol layers may run as a single monolithic program, as shown in figures 1a and 1d, or they may run as multiple threads of a single program, as in figures 1b and 1e. A multitasking host will generally run protocol layers as separate processes, as in figures 1c and 1f. The processor on the NIU may execute some protocol functions in software, apart from controlling the hardware operations. The protocol software running on the NIU may run as a single process, as in figures 1a, 1b and 1c, or it may run different layers as separate threads of a single process, as in figures 1d, 1e and 1f. Some implementations can also have multiple processors on the NIU. The operating environment of the host decides the typical processing time for protocol functions. In a unitasking host, the host processor is mainly utilized for protocol processing, so the protocol processing time can be fixed. In a multitasking host, the processing time will depend on factors like the scheduling policy of the host and its processing load.

For our discussion, we consider a typical NIU architecture as shown in figure 2. Depending on the implementation, some of the blocks of the figure may be void and some may have varying complexity. Protocol processing time on the NIU will depend on its execution speed and the hardware resources available.

Hence, a typical host-NIU system can be specified by the host protocol processing time, the NIU protocol processing time, and the overhead at the interface.
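These three quantities are enough for a first-order model of a host-NIU pair. The sketch below is one way to capture them; the field names, the millisecond units, and the assumption that the interface overhead is charged to whichever processor performs the copy are illustrative, not taken from the paper.

```c
/* Hedged sketch: one way to capture the three quantities named above.
 * Field names and unit choices are assumptions made for illustration. */
typedef struct {
    double host_ms;        /* host protocol processing time per packet    */
    double niu_ms;         /* NIU protocol processing time per packet     */
    double interface_ms;   /* data-exchange overhead across the interface */
    int    copy_on_niu;    /* non-zero if the NIU bears the copy overhead */
} host_niu_system;

/* If the host and the NIU overlap on a stream of packets (section 4.3),
 * throughput is limited by the busier processor; with no overlap the
 * three times simply add up. */
static double pipelined_pkts_per_ms(const host_niu_system *s)
{
    double host_load = s->host_ms + (s->copy_on_niu ? 0.0 : s->interface_ms);
    double niu_load  = s->niu_ms  + (s->copy_on_niu ? s->interface_ms : 0.0);
    return 1.0 / (host_load > niu_load ? host_load : niu_load);
}

static double serial_pkts_per_ms(const host_niu_system *s)
{
    return 1.0 / (s->host_ms + s->niu_ms + s->interface_ms);
}
```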

4.0 Issues in Partitioning the Protocol Software

In this section, we discuss the issues that have to be addressed when protocol functions have to be partitioned among the processors of the host-NIU system.


Figure 1. Host-NIU Operating Environments (panels 1a-1f pair a monolithic, multi-thread, or multi-tasking host protocol stack with a monolithic or multi-thread NIU protocol stack).

4.1 Overhead due to Protocol Partitioning

When all the protocol layers are running on a single processor, passing of control and data between them is easier. Data may be passed between layers using pointers, in shared memory or through a system area. The underlying operating system provides suitable primitives for inter-layer communication and synchronization. When the protocol stack is partitioned, it executes on separate processors. It may involve an additional copy of data, as data is exchanged across the processors. The overhead of this copy is borne by the host processor or by the NIU processor. Synchronization between the two processors has to be done explicitly, through interrupts or through some polling mechanism. If there are frequent interrupts from the NIU, then the context switching overhead can be significant. In the polling type of synchronization, processor cycles are wasted. An improvement in the performance of the overall system can be seen only when these additional costs due to partitioning are overcome by the performance advantages gained by splitting the stack.
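The polling alternative amounts to a busy-wait on a flag that both sides can see. A minimal sketch, assuming a hypothetical status byte mapped into host-visible NIU memory and an illustrative flag value:

```c
/* Minimal sketch of the polling-type synchronization mentioned above.
 * `niu_status` is assumed to point at a status byte in NIU memory. */
#define NIU_DATA_READY 0x01

extern volatile unsigned char *niu_status;   /* assumed mapping into NIU memory */

/* Busy-wait until the NIU signals that a packet is ready. Every iteration
 * is a host processor cycle spent on synchronization rather than on
 * protocol processing, which is the cost referred to in the text. */
static void wait_for_niu(void)
{
    while ((*niu_status & NIU_DATA_READY) == 0)
        ;   /* wasted cycles */
}
```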

4.2 The Host-NIU Communication

Every NIU uses some local buffer memory. Data packets reside in this buffer memory for a brief transit time, until they are processed by the NIU and transmitted, or they are accepted by the host process and discarded. In the character I/O or DMA type of interface with the host, data packets are copied into this memory, processed and discarded. In the shared memory type of interface, the host and the NIU use the shared memory for forming and processing packets. If either the host or the NIU processing stream is slow, data packets will get queued in this buffer memory. If the difference in their processing times is large, more and more packets will get queued. At some point of time, inter-layer flow control will be enforced, or the faster stream will get blocked on overrunning the buffer space. If the buffer memory is sufficiently large, then blocking may not be observed, but the response time for a packet queued at the interface will still deteriorate due to longer queue lengths. The blocking will remove temporal parallelism between the host and the NIU processes. Balanced protocol processing by the host and the NIU will reduce the queuing time of data packets at the interface, and will increase the degree of parallelism between the host and the NIU processors.

Figure 2. Typical NIU Architecture.

4.3 Protocol Layer Residency

In the layered approach to protocol processing, each layer successively acts on a data packet. As a data packet is processed by the layers, protocol headers are either added to or removed from the data packet. In a multiprocessor situation (e.g. a host and an NIU), performance can be improved by identifying some temporal parallelism in the processing of data packets. Pipelined execution of protocol functions enables this parallelism. If a pipeline of data packets exists, the host and the NIU processors can concurrently act on this pipeline. If both processors are continuously busy, the additional cost incurred by partitioning the protocol layers can be amortized over several overlapping cycles. The partitioning of protocol layers should ensure a moving pipeline of data between the host and the NIU.

If blocking is encountered at the host-NIU interface, further packets exchanged between the host and the NIU are likely to follow a stop-and-wait type of communication, irrespective of the amount of buffer space at the interface. The blocking will remove temporal parallelism between the host and the NIU processors, and the (moving) pipeline (of data packets) will assume an alternating, bursty characteristic. Balanced partitioning of the protocol processing load between the host and the NIU can be used to ensure a (smoothly!) moving pipeline of data packets.
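The effect of balance on the pipeline can be illustrated with a toy model: two stages (host and NIU) separated by a finite buffer pool. The buffer depth, stage times, and packet count below are illustrative assumptions, not measurements from the paper; the point is that with the same total work per packet, the unbalanced split throttles the pipeline to the speed of the slower stage.

```c
/* Toy model of the host-NIU packet pipeline with a finite buffer pool at
 * the interface. All numbers are illustrative assumptions. */
#include <stdio.h>

#define N        1000   /* packets pushed through the pipeline    */
#define BUFFERS  4      /* interface buffers between host and NIU */

static double pkts_per_ms(double host_ms, double niu_ms)
{
    double host_done[N], niu_done[N];
    for (int i = 0; i < N; i++) {
        /* The host may start packet i only after finishing its previous
         * packet and after the NIU has freed a buffer (packet i-BUFFERS). */
        double start = (i > 0) ? host_done[i - 1] : 0.0;
        if (i >= BUFFERS && niu_done[i - BUFFERS] > start)
            start = niu_done[i - BUFFERS];          /* host blocks here */
        host_done[i] = start + host_ms;

        /* The NIU starts once the packet has crossed the interface and
         * it has finished the previous one. */
        double nstart = host_done[i];
        if (i > 0 && niu_done[i - 1] > nstart)
            nstart = niu_done[i - 1];
        niu_done[i] = nstart + niu_ms;
    }
    return N / niu_done[N - 1];
}

int main(void)
{
    printf("balanced   (4 ms + 4 ms): %.3f pkt/ms\n", pkts_per_ms(4.0, 4.0));
    printf("unbalanced (2 ms + 6 ms): %.3f pkt/ms\n", pkts_per_ms(2.0, 6.0));
    return 0;
}
```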



5.0 OSINET and its Architecture

OSINET [6] is an implementation of a subset of the ISO-OSI model for LANs. It has a network kernel which provides the basic networking support, and User Applications that run over the kernel. The network kernel consists of the lower six layers of the OSI reference model. The MAC and the LLC type 1 layers form the data link layer. The network layer is null. Transport class 4 is implemented to provide reliable connection oriented service. The session layer supports dialogue management and synchronization. The presentation layer provides ASN.1 encoding and decoding. The application layer consists of FTAM, CASE and DASE services. A single control loop schedules execution of each layer in round robin fashion.
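The control loop can be pictured as follows. This is a minimal sketch under assumed handler names and stub bodies; it is not OSINET code, only an illustration of round-robin scheduling of the layer entry points.

```c
/* Minimal sketch of a single control loop giving each layer a turn in
 * round-robin order. Handler names and stub bodies are assumptions. */
typedef void (*layer_handler)(void);

static void datalink_work(void)     { /* MAC + LLC type 1            */ }
static void transport_work(void)    { /* transport class 4           */ }
static void session_work(void)      { /* dialogue mgmt, sync points  */ }
static void presentation_work(void) { /* ASN.1 encoding and decoding */ }
static void application_work(void)  { /* FTAM, CASE, DASE            */ }

static layer_handler layers[] = {
    datalink_work, transport_work, session_work,
    presentation_work, application_work,
};

/* The kernel's scheduler: loop forever, letting each layer run briefly. */
static void network_kernel_loop(void)
{
    for (;;)
        for (unsigned i = 0; i < sizeof layers / sizeof layers[0]; i++)
            layers[i]();
}

int main(void) { network_kernel_loop(); }   /* runs until the node shuts down */
```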

6.0 The Intelligent NIU

An intelligent NIU, the PC Link2 Network Interface Adapter from Intel Inc., was used in our work. The board is an Ethernet interface NIU, providing an i82586 Ethernet co-processor, an i80186 processor and 256 kbytes of local memory. Any arbitrary protocol software can be downloaded and executed on the NIU. The board represents a general purpose intelligent NIU, and hence was selected for the experiments.

6.1 The Host-NIU Communication

The host machine can access the entire 256 kbytes of NIU memory through an 8 kbyte window which is mapped into its address space. The NIU memory is shared by the host processor, the on-board 80186 and the 82586. Access to the memory by the host machine is through an 8-bit data path, while the NIU local processors access it through a 16-bit data path. Because of this, and due to the dual-ported implementation, access to the NIU memory is somewhat slower for the host processor than access to its local memory. The host machine controls the windowing mechanism and the operation of the adapter through two control ports mapped in its I/O space. The NIU communicates with the host machine through interrupt signals on the standard PC bus. Apart from this, it cannot interfere with the host operation.
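Access through such a window amounts to selecting a page and then reading or writing within the mapped range. The sketch below assumes a Turbo C-style MS-DOS environment; the window segment, page-select port, and register layout are invented for illustration and are not the PC Link2's actual programming interface.

```c
/* Sketch of paged access to NIU memory through an 8 kbyte window.
 * Segment, port number, and paging scheme are illustrative assumptions. */
#include <dos.h>                       /* MK_FP(), outportb() */

#define WINDOW_SEG   0xD000u           /* assumed segment of the 8K window  */
#define WINDOW_SIZE  0x2000u           /* 8 kbytes                          */
#define PAGE_PORT    0x0300u           /* assumed control port: window page */

/* Read one byte at an absolute offset within the NIU's 256 kbyte memory. */
static unsigned char niu_read(unsigned long niu_offset)
{
    unsigned char far *win =
        (unsigned char far *)MK_FP(WINDOW_SEG, (unsigned)(niu_offset % WINDOW_SIZE));
    outportb(PAGE_PORT, (int)(niu_offset / WINDOW_SIZE));   /* select the page */
    return *win;                                            /* 8-bit host path */
}
```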

7.0 Implementation of Protocol Partitioning

We have performed our work on unitasking host machines running the MS-DOS operating system. The host-NIU operating environment looks like the one in figure 1a. Data and commands are exchanged between the host and the NIU through the shared memory on the NIU.

    7.1 Host-NIU Data Exchange

In OSINET, data exchange across the protocol layers is done by reference, using a buffer passing mechanism. Two buffer pools are used. OSINET handles application layer messages in chunks of 2000 bytes, which are segmented by the transport layer into segments of 1000 bytes each. One of the buffer pools, called the session buffer pool, thus has fixed sized buffers of 2044 bytes each, enough to accommodate the largest application layer message. The other buffer pool, called the transport buffer pool, has buffers of 1518 bytes each, enough to accommodate the largest Ethernet frame. The application layer treats large and small data differently, and copies it into a buffer from the appropriate buffer pool. This avoids an extra copy of data. The host and the NIU protocol partitions use a similar buffer structure in their own address space.

The data exchange between the host and the NIU takes place through the buffer pools defined in the NIU memory space. The host protocol partition passes data to the NIU partition by writing it into the appropriate buffer pool in the NIU memory. Handshake variables are used, which indicate to the host protocol partition where the data meant for the NIU is to be written. This data is processed by the NIU and handed over to the Ethernet coprocessor for transmission on the network. The data on the network is received by the co-processor into a transport buffer pool buffer. This data is processed by the NIU and reassembled into a session buffer pool buffer if necessary. The NIU passes this data to the host by posting the buffer number and the buffer pool identity into the handshake variables. Figure 3 shows the typical data flow between the host and the NIU. The NIU offers a shared memory option, but we still prefer a copy of data across the interface. This is because of the slower access to the NIU memory from the host machine, as explained in the earlier section.
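The shared structures implied by this description can be sketched as follows. Buffer sizes follow the text (2044-byte session buffers, 1518-byte transport buffers); the structure layout, field names, pool depth, and posting helper are illustrative assumptions rather than the actual OSINET definitions.

```c
/* Sketch of the buffer pools and handshake variables in NIU memory.
 * Layout and names are assumptions made for illustration. */
#include <stdint.h>

#define SESS_BUF_SIZE   2044   /* largest application layer message  */
#define TRANS_BUF_SIZE  1518   /* largest Ethernet frame             */
#define POOL_DEPTH      8      /* assumed number of buffers per pool */

enum pool_id { SESSION_POOL = 0, TRANSPORT_POOL = 1 };

/* Handshake variables are read and written by both partitions. */
struct handshake {
    volatile uint8_t host_to_niu_pool;  /* pool the host wrote into         */
    volatile uint8_t host_to_niu_buf;   /* buffer number handed to the NIU  */
    volatile uint8_t niu_to_host_pool;  /* pool holding received data       */
    volatile uint8_t niu_to_host_buf;   /* buffer number handed to the host */
};

/* The buffer pools and handshake area as laid out in NIU memory. */
struct niu_shared {
    uint8_t session_pool[POOL_DEPTH][SESS_BUF_SIZE];
    uint8_t transport_pool[POOL_DEPTH][TRANS_BUF_SIZE];
    struct handshake hs;
};

/* Host side: after copying a packet into session buffer `buf_no`, post it
 * to the NIU partition through the handshake variables. */
static void host_post_to_niu(struct niu_shared *shm, uint8_t buf_no)
{
    shm->hs.host_to_niu_pool = SESSION_POOL;
    shm->hs.host_to_niu_buf  = buf_no;
}
```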

Figure 3. Host-NIU Data Movement (memory area of the host protocol process, host memory, and the shared buffer pools in NIU memory; large transmit data is segmented, large received data is reassembled, and transmit/receive buffers are linked to the Ethernet coprocessor data structures on the network side).


7.3 Organization of the NIU Software

The software running in the NIU is organized into the following three modules.
1. Host Interface Module: This module handles interaction with the host machine. A corresponding module runs on the host machine.
2. OSI Protocol Software: This module consists of the protocol layers of the OSINET software.
3. LAN Interface Module: This module consists of software that handles the Ethernet co-processor.

    8.0 Performance Studies

The experiments to measure the performance of a network node were carried out on IBM PC XT and PC AT compatible machines running the MS-DOS operating system. The machines were connected on a 10 Mbps Ethernet, which forms the backbone of the departmental LAN at the Computer Science department of IIT Madras.

8.1 Design of the Experiments

We intend to study protocol layer partitioning in different host-NIU systems and verify that the optimum partitioning varies from one host-NIU system to another. An 8 MHz PC XT host with an 8 MHz NIU, and a 25 MHz PC AT host with an 8 MHz NIU, provide the two host-NIU systems under study. Throughput offered by the host-NIU system was chosen as the performance parameter. A memory-to-memory file transfer program, directly accessing the session layer services, was used for the measurements. We consider the following three cases of OSI layer partitioning.

I. Only the MAC layer executes in the NIU.
II. Layers up to the transport layer reside in the NIU.
III. Layers up to the session layer reside in the NIU.

Each case of protocol partitioning represents different loads on the host and the NIU processors. To compute the processing load on each processor, we measured the typical processing time for each individual protocol layer on the host and on the NIU, and the overhead at the interface. With these timings, the partitioning of the protocol processing load was calculated. Throughput offered by the host-NIU system was measured for application message sizes of 16 to 1024 bytes.
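The load calculation itself is a simple accounting over the measured per-layer times. The sketch below shows the shape of it for case II; all the timing values are placeholders, not the paper's measurements, and the assumption that the NIU bears the interface copy is also illustrative.

```c
/* Sketch of the load calculation for case II (data link and transport on
 * the NIU). Timing values are placeholders, not measured data. */
#include <stdio.h>

#define LAYERS 5   /* data link, transport, session, presentation, application */

int main(void)
{
    double host_ms[LAYERS] = { 0.8, 2.5, 0.3, 1.2, 1.0 };  /* per-layer, on host */
    double niu_ms[LAYERS]  = { 1.0, 3.0, 0.4, 1.5, 1.2 };  /* per-layer, on NIU  */
    double interface_ms    = 0.6;                          /* copy + handshake   */

    int on_niu[LAYERS] = { 1, 1, 0, 0, 0 };   /* case II residency */

    double host_load = 0.0;
    double niu_load  = interface_ms;          /* NIU bears the copy here */
    for (int i = 0; i < LAYERS; i++) {
        if (on_niu[i]) niu_load  += niu_ms[i];
        else           host_load += host_ms[i];
    }

    printf("host share of processing load: %.0f%%\n",
           100.0 * host_load / (host_load + niu_load));
    return 0;
}
```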

8.2 Observations

By partitioning the stack we incur the additional cost of a data copy across the interface. We are using a general purpose NIU, which is not faster than the host machine. The NIU does not perform any function in hardware, so the complete protocol is executed in software on the NIU processor. In such an environment, a performance gain is expected only due to parallel operation of the host and the NIU processors.

Figure 4 shows the relative performance obtained in the three cases of protocol partitioning. In each host-NIU case, the throughput obtained for an application message size of 16 bytes with only the MAC layer executing on the NIU is treated as unity, and the throughput obtained in the other cases is plotted with reference to this throughput. Thus, this ratio plot gives the relative performance of the host-NIU system for various application message sizes. In the 8 MHz host machine, for case I of layer partitioning, the host processor has about 80% of the processing load. The NIU processor remains under-loaded and so there is a lesser degree of parallelism between the two processors. In case II, the host shares about 46% of the processing load, while in case III, the host has about 44% of the processing load. In the data exchange stage of a connection, the session layer has minimal tasks, so cases II and III reflect similar partitioning of the processing load.

Figure 4. Comparative throughput: relative throughput versus application message size (16 to 1024 bytes) for the MAC-on-NIU, transport-on-NIU, and session-on-NIU cases on the AT (25 MHz) and XT (8 MHz) host machines.


In these cases, the protocol partitions are well balanced in terms of their execution times. No blocking of the host process is observed at the interface. The higher degree of parallelism between the host and the NIU processors results in higher throughput.

In the 25 MHz host machine, the protocol layers are migrated from a faster host machine to a slower NIU. Case I of protocol partitioning represents a 46% load on the host, while case II and case III represent 23% and 21% load on the host machine respectively. Thus, case I reflects balanced partitioning of the protocol processing load. For this case, no blocking was observed at the host-NIU interface. This results in higher throughput than the other two cases of partitioning. In cases II and III, the host process, having less processing load, queues up packets at the interface at a faster rate than the NIU can process them. Hence, it runs out of buffer space in the NIU and gets blocked at the interface for every packet transferred to the NIU. Thus, the host-NIU communication reduces to a stop-and-wait type of protocol. The host process has to wait for a buffer, and the throughput of the system drops compared to that obtained for case I. These two cases emphasize the loss in performance caused by blocking of the host process. The blocking can be avoided by balancing the protocol processing load.

The performance gain observed due to balanced protocol load partitioning is available for all sizes of application message. Our experimental results show that, by appropriate partitioning of protocol layers, temporal parallelism between the host and the NIU processors can be exploited and the throughput of the network node can be improved. For the 8 MHz host machine, up to a 4 times increase in throughput is obtained due to the balanced sharing of protocol load in cases II and III. For the 25 MHz host machine, the blocking at the interface observed in cases II and III of layer partitioning reduces the throughput by a factor of 2 compared to the throughput obtained in case I. Balancing the protocol processing load suggests different protocol layer residency for different host-NIU pairs.

The advantages due to balanced protocol load sharing hold good for a typical host-NIU pair of a network node. Our implementation used a general purpose NIU, which gave a performance gain by suitably off-loading the host machine. Up to a four-fold improvement in throughput of a host-NIU system was obtained by exploiting temporal parallelism between the host and the NIU processors. Imbalance in the partitioning of the protocol processing load deteriorated the performance by a factor of two in another host-NIU system.

In a high performance NIU, migrating the protocol functions to the efficient NIU may give better results, even in the absence of any parallelism. But the performance may be further improved if the host and the NIU processors exhibit a higher degree of parallelism. Balancing the host and the NIU protocol processing times will give different protocol layer residency solutions for different host-NIU pairs. This is being investigated further.

References

[1] G. Chesson, "XTP/PE Design Considerations", in Protocols for High Speed Networks, H. Rudin & R. Williamson (Eds), Elsevier Science Publ., 1989, pp 27-33.

[2] D. Clark et al, "Architectural Considerations for a New Generation of Protocols", in Proc. of ACM SIGCOMM '90, 1990, pp 200-208.

[3] D. Giarrizzo et al, "High Speed Parallel Protocol Implementation", in Pr