
Post on 02-Jan-2016


Page 1: Building Network-Centric Systems Liviu Iftode

Building Network-Centric Systems

Liviu Iftode

Page 2: Building Network-Centric Systems Liviu Iftode

Before WWW, people were happy...

• Mostly local computing
• Occasional TCP/IP networking with low expectations and mostly non-interactive traffic
    • local area networks: file server (NFS)
    • wide area networks (Internet): E-mail, Telnet, FTP
• Networking was not a major concern for the OS

[Figure: CS.umd.EDU and CS.rutgers.EDU connected over TCP/IP; labels: NFS, E-mail, Telnet, Emacs]

Page 3: Building Network-Centric Systems Liviu Iftode

One Exception: Cluster Computing

• Cost-effective solution for high-performance distributed computing
• TCP/IP networking was the headache
    • large software overheads
• Software DSM: not a network-centric system :-(

[Figure: multicomputers and clusters of computers]

Page 4: Building Network-Centric Systems Liviu Iftode

The Great WWW Challenge

• World Wide Web made access over the Internet easy
• Internet became commercial
• Dramatic increase of interactive traffic
• WWW networking creates a network-centric system: the Internet server
    • performance: service more network clients
    • availability: be accessible all the time over the network
    • security: protect resources against network attacks

[Figure: Web browsing of http://www.Bank.com over TCP/IP to the Bank.com server]

Page 5: Building Network-Centric Systems Liviu Iftode

Network-Centric Systems

Networking dominates the operating system

• Mobile Systems
    • mobility-aware TCP/IP (Mobile IP, I-TCP, etc.), disconnected file systems (Coda), adaptation-aware applications for mobility (Odyssey), etc.
• Internet Servers
    • resource allocation (Lazy Receive Processing, Resource Containers), OS shortcuts (Scout, IO-Lite), etc.
• Pervasive/Ubiquitous Systems
    • TinyOS, sensor networks (Directed Diffusion, etc.), programmability (One.world, etc.)
• Storage Networking
    • network-attached storage (NASD, etc.), peer-to-peer systems (OceanStore, etc.), secure file systems (SFS, Farsite), etc.

Page 6: Building Network-Centric Systems Liviu Iftode

Big Picture

• Research sparked by various OS-networking tensions
• Shift of focus from performance to availability and manageability
• Networking and storage I/O convergence
• Server-based and serverless systems
• TCP/IP and non-TCP/IP protocols
• Local-area, wide-area, ad-hoc and application/overlay networks
• Significant interest from industry

Page 7: Building Network-Centric Systems Liviu Iftode

Outline

• TCP Servers
• Migratory-TCP and Service Continuations
• Cooperative Computing, Smart Messages and Spatial Programming
• Federated File Systems
• Talk Highlights and Conclusions

Page 8: Building Network-Centric Systems Liviu Iftode

Problem 1: TCP/IP is too Expensive

Breakdown of the CPU time for Apache (uniprocessor-based Web server):

    Network processing    71%
    Other system calls     9%
    User space            20%

Page 9: Building Network-Centric Systems Liviu Iftode

Traditional Send/Receive Communication

Sender (App, OS, NIC):
    App: send(a)
    OS:  copy(a, send_buf)
    OS:  DMA(send_buf, NIC)
    NIC: send_buf is transferred

Receiver (NIC, OS, App):
    NIC: DMA(NIC, recv_buf), interrupt
    OS:  copy(recv_buf, b)
    App: receive(b)
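The send/receive path on this slide can be traced with a small model. This is only a sketch: `traditional_transfer` is a hypothetical helper, buffers are plain byte arrays, and the DMA steps are modeled as ordinary copies. It shows that the CPU performs two copies per message, one on each host.

```python
# Hypothetical model of the slide's copy path; not a real socket or NIC API.
def traditional_transfer(a: bytes):
    """Move `a` from the sender application to the receiver application."""
    steps = []
    send_buf = bytearray(a)          # copy(a, send_buf): app buffer -> OS buffer
    steps.append("copy(a, send_buf)")
    wire = bytearray(send_buf)       # DMA(send_buf, NIC): OS buffer -> NIC
    steps.append("DMA(send_buf, NIC)")
    recv_buf = bytearray(wire)       # DMA(NIC, recv_buf): NIC -> OS buffer
    steps.append("DMA(NIC, recv_buf)")
    steps.append("interrupt")        # receiver OS is notified
    b = bytes(recv_buf)              # copy(recv_buf, b): OS buffer -> app buffer
    steps.append("copy(recv_buf, b)")
    return b, steps

b, steps = traditional_transfer(b"hello")
cpu_copies = [s for s in steps if s.startswith("copy")]
print(b, len(cpu_copies))
```

The two `copy` steps and the interrupt are the per-message costs that the later slides try to remove or move off the application processor.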

Page 10: Building Network-Centric Systems Liviu Iftode

A Closer Look

    TCP send                        45%
    TCP receive                      7%
    IP send                          0%
    IP receive                       0%
    Software interrupt processing   11%
    Hardware interrupt processing    8%
    Other system calls               9%
    User space                      20%
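As a quick consistency check, the detailed breakdown can be summed to confirm that it agrees with the 71% network-processing share quoted for Problem 1. The dictionary names below are only illustrative; the percentages are the ones on the slides.

```python
# Percentages taken from the slide; grouping names are illustrative.
network = {
    "TCP send": 45,
    "TCP receive": 7,
    "IP send": 0,
    "IP receive": 0,
    "Software interrupt processing": 11,
    "Hardware interrupt processing": 8,
}
other = {"Other system calls": 9, "User space": 20}

network_total = sum(network.values())
total = network_total + sum(other.values())
print(network_total, total)
```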

Page 11: Building Network-Centric Systems Liviu Iftode

Multiprocessor Server Performance Does not Scale

[Figure: throughput (requests/s, 0 to 700) vs. offered load (connections/s, 300 to 750) for the uniprocessor and dual-processor configurations]

Apache Web server 1.3.20 on 1-way and 2-way 300 MHz Pentium II SMP, repeatedly accessing a static 16 KB file

Page 12: Building Network-Centric Systems Liviu Iftode

TCP/IP-Application Co-Habitation

• TCP/IP “steals” compute cycles and memory from applications
• TCP/IP executes in kernel mode: mode-switching overhead
• TCP/IP executes asynchronously
    • interrupt processing overhead
    • internal synchronization on multiprocessor servers causes execution serialization
• Cache pollution
• Hidden “service work”
    • TCP packet retransmission
    • TCP ACK processing
    • ARP request service
• Extreme cases can compromise server performance
    • Receive livelocks
    • Denial-of-service (DoS) attacks

Page 13: Building Network-Centric Systems Liviu Iftode

Two Solutions

• Replace TCP/IP with a lightweight transport protocol
• Offload some or all of TCP from the host to a dedicated computing unit (processor, computer or “intelligent” network interface)
• Industry: high-performance, expensive solutions
    • Memory-to-memory communication: InfiniBand
    • “Intelligent” network interface: TCP Offloading Engine (TOE)
• Cost-effective and flexible solutions: TCP Servers

Page 14: Building Network-Centric Systems Liviu Iftode

Memory-to-Memory (M-M) Communication

[Figure: sender and receiver hosts, each with Application, OS, TCP/IP, memory buffer and Network Interface (NIC); send/receive through the OS and TCP/IP is contrasted with M-M communication using Remote DMA between the NICs]

Page 15: Building Network-Centric Systems Liviu Iftode

Memory-to-Memory Communication is Non-Intrusive

Sender: low overhead. Receiver: zero overhead.

    Sender App:   RDMA_Write(a, b)
    NIC to NIC:   a transferred into b
    Receiver App: b is updated

Page 16: Building Network-Centric Systems Liviu Iftode

TCP Server at a Glance

• A software offloading architecture using existing hardware
• Basic idea: dedicate one or more computing units exclusively to TCP/IP
• Compared to TOE
    • tracks technology better: latest processors
    • flexible: adapts to changing load conditions
    • cost-effective: no extra hardware
• Isolate application computation from network processing
• Eliminate network interrupts and context switches
• Efficient resource allocation
• Additional performance gains (zero-copy) with an extended socket API
• Related work
    • Very preliminary offloading solutions: Piglet, CSP
    • Socket Direct Protocol, Zero-copy TCP

Page 17: Building Network-Centric Systems Liviu Iftode

Two TCP Server Architectures

• TCP Servers for Multiprocessor Servers
[Figure: two CPUs over shared memory; one runs the TCP-Server (TCP/IP), the other the server application]

• TCP Servers for Cluster-based Servers
[Figure: a TCP-Server node (TCP/IP) connected to a server application node by M-M communication]

Page 18: Building Network-Centric Systems Liviu Iftode

Where to Split TCP/IP Processing? (How much to offload?)

    APPLICATION
    SYSTEM CALLS

    SEND                             RECEIVE
    copy_from_application_buffers    copy_to_application_buffers
    TCP_send                         TCP_receive
    IP_send                          IP_receive
    packet_scheduler                 software_interrupt_handler
    setup_DMA                        interrupt_handler
    packet_out                       packet_in

[Figure labels on the two sides of the split: Application Processors, TCP Servers]

Page 19: Building Network-Centric Systems Liviu Iftode

Evaluation Testbed

• Multiprocessor server
    • 4-way 550 MHz Intel Pentium II system running the Apache 1.3.20 web server on Linux 2.4.9
    • NIC: 3-Com 996-BT Gigabit Ethernet
• Used sclients as the client program [Banga 97]

Page 20: Building Network-Centric Systems Liviu Iftode

Comparative Throughput

[Figure: bar chart of throughput (requests/sec, 0 to 3500) for four configurations: Uniprocessor, SMP 4 processors, SMP - 1 TCP Server, SMP - 2 TCP Servers]

Clients issue file requests according to a web server trace

Page 21: Building Network-Centric Systems Liviu Iftode

Adaptive TCP Servers

• Static TCP Server configuration
    • Too few TCP Servers can make network processing the bottleneck
    • Too many TCP Servers degrade the performance of CPU-intensive applications
• Dynamic TCP Server configuration
    • Monitor the TCP Server queue lengths and system load
    • Dynamically add or remove TCP Server processors
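The dynamic configuration above amounts to a small feedback policy. The sketch below is one possible shape for it, not the policy used in the talk: the function name, the thresholds (`high_q`, `low_q`, `busy`) and the rule of always keeping at least one application processor are all assumptions made for illustration.

```python
def adjust_tcp_servers(n_tcp, n_cpus, queue_len, app_load,
                       high_q=64, low_q=8, busy=0.9):
    """Return the new number of processors dedicated to TCP/IP.

    n_tcp     -- processors currently acting as TCP Servers
    n_cpus    -- total processors in the machine
    queue_len -- monitored TCP Server queue length (hypothetical metric)
    app_load  -- utilization of the application processors, 0.0 to 1.0
    """
    # Network processing is the bottleneck: dedicate one more processor,
    # but always leave at least one processor for the application.
    if queue_len > high_q and n_tcp < n_cpus - 1:
        return n_tcp + 1
    # Queues are short and the application is starved: give a processor back,
    # but always keep at least one TCP Server.
    if queue_len < low_q and app_load > busy and n_tcp > 1:
        return n_tcp - 1
    return n_tcp

print(adjust_tcp_servers(1, 4, queue_len=100, app_load=0.5))
print(adjust_tcp_servers(2, 4, queue_len=2, app_load=0.95))
```

Using two separate thresholds rather than one leaves a band in which the configuration is stable, which avoids adding and removing a processor on every sample.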

Page 22: Building Network-Centric Systems Liviu Iftode

Next Target: Storage Networking

Storage networking dilemma: TCP or not TCP?

• non-TCP/IP solutions require new wiring or tunneling over IP-based Ethernet networks
• TCP/IP solutions require TCP offloading

[Figure labels: TCP Offloading, iSCSI (SCSI over IP); M-M Communication (InfiniBand), DAFS (Direct Access File System)]

Page 23: Building Network-Centric Systems Liviu Iftode

Future Work: TCP Servers & iSCSI

• Use TCP-Servers to connect to SCSI storage using the iSCSI protocol over TCP/IP networks

[Figure: two CPUs over shared memory; one runs the server application, the other runs the TCP-Server & iSCSI, which reaches SCSI storage through iSCSI over TCP/IP]

Page 24: Building Network-Centric Systems Liviu Iftode

Problem 2: TCP/IP is too Rigid

• Server vs. service availability
    • the client is interested in service availability
• Adverse conditions may affect service availability
    • internetwork congestion or failure
    • servers overloaded, failed or under DoS attack
• TCP has one response
    • network delays => packet loss => retransmission
• TCP limits the OS solutions for service availability
    • early binding of service to a server
    • client cannot switch to another server for sustained service after the connection is established

Page 25: Building Network-Centric Systems Liviu Iftode

Service Availability through Migration

[Figure: a client, Server 1 and Server 2; the client's connection moves from one server to the other]

Page 26: Building Network-Centric Systems Liviu Iftode

Migratory TCP at a Glance

• Migratory TCP migrates live connections among cooperative servers
• Migration mechanism is generic (not application specific), lightweight (fine-grained migration) and low-latency
• Migration triggered by client or server
• Servers can be geographically distributed (different IP addresses)
• Requires changes to the server application
• Totally transparent to the client application
• Interoperates with existing TCP
• Migration policies decoupled from migration mechanism

Page 27: Building Network-Centric Systems Liviu Iftode

Basic Idea: Fine-Grained State Migration

[Figure: a client, a Server1 process and a Server2 process with connections C1 to C6; only the application state and connection state of the migrating connection (C2) move between the server processes]

Page 28: Building Network-Centric Systems Liviu Iftode

Migratory-TCP (Lazy) Protocol

[Figure: message sequence between Client, Server 1 and Server 2; labels: C’]

    (0) Connect
    (1) Migration Request
    (2) <State Request>
    (3) <State Reply>
    (4) Migration Accept
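The numbered messages of the lazy protocol can be written down as an ordered trace. The message names and numbers come from the slide; the sender and receiver of each message are an assumption here, since the arrows of the original figure did not survive extraction, and `lazy_migration` is a hypothetical helper.

```python
def lazy_migration(client, old_server, new_server):
    """Return the message sequence as (step, sender, receiver, message).

    Assumed directions: the client asks the new server to take over, and
    the new server then pulls the connection state from the old server.
    """
    return [
        (0, client, old_server, "Connect"),
        (1, client, new_server, "Migration Request"),
        (2, new_server, old_server, "State Request"),
        (3, old_server, new_server, "State Reply"),
        (4, new_server, client, "Migration Accept"),
    ]

for step, src, dst, msg in lazy_migration("Client", "Server 1", "Server 2"):
    print(f"({step}) {src} -> {dst}: {msg}")
```

"Lazy" in this reading means that state is fetched only when a migration is actually requested, rather than being pushed to candidate servers in advance.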

Page 29: Building Network-Centric Systems Liviu Iftode

Non-Intrusive Migration

• Migrate state without involving the old-server application (only the old server OS)
• Old server exports per-connection state periodically
• Connection state and application state can go out of sync
• Upon migration, the new server imports the last exported state of the migrated connection
• OS uses connection state to synchronize with the application
• Non-intrusive migration with M-M communication
    • uses RDMA read to extract state from the old server with zero overhead
    • works even when the old server is overloaded or frozen

Page 30: Building Network-Centric Systems Liviu Iftode

Service Continuation (SC)

[Figure: a front-end server process holds the socket and its connection state; it is connected by pipes to Back-End Server Process1 and Back-End Server Process2. The SC groups the connection state, the pipe state and the state exported by each process]

SC API

Front-end server process:

    sc = create_cont(C1);
    p1 = pipe();
    associate(sc, p1);
    fork_exec(Process1);
    ….
    export(sc, state)

Back-end server process 1:

    sc = open_cont(p1);
    export(sc, state)

Back-end server process 2:

    sc = open_cont(p2);
    ….
    export(sc, state)

Page 31: Building Network-Centric Systems Liviu Iftode

Related Work

• Process migration: Sprite [Douglis ’91], Locus [Walker ’83], MOSIX [Barak ’98], etc.
• VM migration [Rosenblum ’02, Nieh ’02]
• Migration in web server clusters [Snoeren ’00, Luo ’01]
• Fault-tolerant TCP [Alvisi ’00]
• TCP extensions for host mobility: I-TCP [Bakre ’95], Snoop TCP [Balakrishnan ’95], end-to-end approaches [Snoeren ’00], Msocks [Maltz ’98]
• SCTP (RFC 2960)

Page 32: Building Network-Centric Systems Liviu Iftode

Evaluation

• Implemented SC and M-TCP in the FreeBSD kernel
• Integrated SC in real Internet servers
    • web, media streaming, transactional DB
• Microbenchmark
    • impact of migration on client-perceived throughput for a two-process server using TTCP
• Real applications
    • sustain web server throughput under load produced by increasing the number of client connections

Page 33: Building Network-Centric Systems Liviu Iftode

Impact of Migration on Throughput

[Figure: effective throughput (KB/s, 7,300 to 8,000) vs. migration period (no migration, 2, 5, 10 s) for SC sizes of 1 KB, 5 KB and 10 KB]

Page 34: Building Network-Centric Systems Liviu Iftode

Web Server Throughput

[Figure: throughput (replies/s, 0 to 900) of Apache and M-Apache, and number of migrated connections (0 to 12,000, second axis), vs. offered load (connections/s, 300 to 1,700)]

Page 35: Building Network-Centric Systems Liviu Iftode

Future Research: Use SC to Build Self-Healing Cluster-based Systems

[Figure: cluster with service continuations labeled SC2 and SC3]

Page 36: Building Network-Centric Systems Liviu Iftode

Problem 3: Computer Systems move Outdoors

[Figure: Linux car, sensors, Linux camera, Linux watch]

• Massive numbers of computers will be embedded everywhere in the physical world
• Dynamic ad-hoc networking
• How to execute user-defined applications over these networks?

Page 37: Building Network-Centric Systems Liviu Iftode

Outdoor Distributed Computing

• Traditional distributed computing has been indoor
    • Target: performance and/or fault tolerance
    • Stable configuration, robust networking (TCP/IP or M-M)
    • Relatively small scale
    • Functionally equivalent nodes
    • Message passing or shared memory programming
• Outdoor distributed computing
    • Target: collect/disseminate distributed data and/or perform collective tasks
    • Volatile nodes and links
    • Node equivalence determined by their physical properties (content-based naming)
    • Data migration is not good
        • expensive to perform end-to-end transfer control
        • too rigid for such a dynamic network

Page 38: Building Network-Centric Systems Liviu Iftode

Cooperative Computing at a Glance

• Distributed computing with execution migration
    • Smart Message: carries the execution state (and possibly the code) in addition to the payload
        • execution state assumed to be small (explicit migration)
        • code usually cached (few applications)
• Nodes “cooperate” by allowing Smart Messages
    • to execute on them
    • to use their memory to store “persistent” data (tags)
• Nodes do not provide routing
    • Smart Message executes on each node of its path
    • Application executed on target nodes (nodes of interest)
    • Routing executed on each node of the path (self-routing)
• During its lifetime, an application generates at least one, possibly multiple, Smart Messages

Page 39: Building Network-Centric Systems Liviu Iftode

Smart vs. “Dumb” Messages

[Figure: Mary’s lunch (appetizer, entree, dessert), served by data migration vs. execution migration]

Page 40: Building Network-Centric Systems Liviu Iftode

Smart Messages

Application:

    do
        migrate(Hot_tag, timeout);
        Water_tag = ON;
        N = N + 1;
    until (N == 3 or timeout);

Routing:

    migrate(tag, timeout) {
        do
            if (NextHot_tag)
                sys_migrate(NextHot_tag, timeout);
            else {
                spawn_SM(Route_Discovery, Hot);
                block_SM(NextHot_tag, timeout);
            }
        until (Hot_tag or timeout);
    }

SM execution:

[Figure: the Smart Message travels along a path of nodes, some tagged Hot; the value of N at successive nodes is 0 1 1 1 2 2 3]
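The application loop above can be simulated on a linear path of nodes. This is a simplified sketch: each node is a dictionary standing in for its tag space, the timeout and route discovery are left out, and the positions of the Hot nodes are an assumption chosen so that the counter reproduces the 0 1 1 1 2 2 3 sequence shown on the slide.

```python
def run_smart_message(path, target=3):
    """Carry the counter N along `path`, watering nodes tagged Hot.

    path   -- list of per-node tag dictionaries (hypothetical tag space)
    target -- stop once this many Hot nodes have been watered
    Returns the value of N after executing on each visited node.
    """
    n = 0                      # execution state carried by the Smart Message
    trace = []
    for tags in path:
        if tags.get("Hot"):    # node of interest: run the application step
            tags["Water"] = "ON"
            n += 1
        trace.append(n)
        if n == target:        # until (N == 3 ...)
            break
    return trace

hot_positions = {1, 4, 6}      # assumed placement of Hot nodes
path = [{"Hot": i in hot_positions} for i in range(7)]
trace = run_smart_message(path)
print(trace)
```

The only state that moves from node to node is `n`, which illustrates why execution state is assumed to be small.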

Page 41: Building Network-Centric Systems Liviu Iftode

Cooperative Node Architecture

[Figure: an arriving SM passes through the Admission Manager and scheduling to the Virtual Machine, which uses the Tag Space and the OS & I/O; SM migration leaves the node]

• Admission control for resource security
• Non-preemptive scheduling with timeout-kill
• Tags created by SMs (limited lifetime) or I/O tags (permanent)
    • global tag name space {hash(SM code), tag name}
    • five protection domains defined using hash(SM code), SM source node ID, and SM starting time

Page 42: Building Network-Centric Systems Liviu Iftode

Related Work

• Mobile agents (D’Agents, Ajanta)
• Active networks (ANTS, SNAP)
• Sensor networks (Diffusion, TinyOS, TAG)
• Pervasive computing (One.world)

Page 43: Building Network-Centric Systems Liviu Iftode

Prototype Implementation

• 8 HP iPAQs running Linux
• 802.11 wireless communication
• Sun Java K Virtual Machine
• Geographic (simplified GPSR) and On-Demand (AODV) routing

[Figure: network of user node, intermediate nodes and node of interest]

Completion time:

    Routing algorithm     Code not cached (ms)   Code cached (ms)
    Geographic (GPSR)     415.6                  126.6
    On-demand (AODV)      506.6                  314.7

Page 44: Building Network-Centric Systems Liviu Iftode

Self-Routing

• There is no best routing outdoors
    • Depends on application and node property dynamics
• Application-controlled routing
    • Possible with Smart Messages (execution state carried in the message)
    • When migration times out, the application is upcalled on the current node to decide what to do next

Page 45: Building Network-Centric Systems Liviu Iftode

Self-Routing Effectiveness (simulation)

• geographical routing to reach target regions
• on-demand routing within region
• application decides when to switch between the two

[Figure legend: starting node, node of interest, other node]
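The switching rule in this simulation can be stated as a one-line decision made by the application at each hop. The sketch below assumes, for illustration only, that a target region is an axis-aligned rectangle and that the Smart Message knows the current node's position; `choose_routing` is a hypothetical name.

```python
def choose_routing(position, region):
    """Pick the routing algorithm for the next hop.

    position -- (x, y) of the current node
    region   -- (x_min, y_min, x_max, y_max) of the target region (assumed shape)
    """
    x, y = position
    x_min, y_min, x_max, y_max = region
    inside = x_min <= x <= x_max and y_min <= y <= y_max
    # Geographic routing gets the message to the region;
    # on-demand routing finds the nodes of interest inside it.
    return "on-demand" if inside else "geographic"

region = (10, 10, 20, 20)
modes = [choose_routing(p, region) for p in [(0, 0), (9, 15), (12, 15)]]
print(modes)
```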

Page 46: Building Network-Centric Systems Liviu Iftode

Next Target: Spatial Programming

• Smart Message: too low-level programming
• How to describe distributed computing over dynamic outdoor networks of embedded systems with limited knowledge about resource number, location, etc.
• Spatial Programming (SP) design guidelines:
    • space is a first-order programming concept
    • resources named by their expected location and properties (spatial reference)
    • reference consistency: spatial reference-to-resource mappings are consistent throughout the program
    • program must tolerate resource dynamics
• SP can be implemented using Smart Messages (the spatial reference mapping table carried as payload)

Page 47: Building Network-Centric Systems Liviu Iftode

Spatial Programming Example

Program sprinklers to water the hottest spot of the Left Hill

[Figure: Left Hill and Right Hill with mobile sprinklers with temperature sensors; a hot spot is marked]

    for (i = 0; i < 10; i++)                         /* What if < 10 hot spots? */
        if ({Left_Hill:Hot}[i].temp > Max_temp) {    /* spatial reference for hot spots on Left Hill */
            Max_temp = {Left_Hill:Hot}[i].temp;
            id = i;
        }
    {Left_Hill:Hot}[id].water = ON;                  /* spatial reference consistency */
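The slide's question, "What if < 10 hot spots?", is about tolerating resource dynamics. One way to handle it is sketched below: the spatial reference `{Left_Hill:Hot}` is modeled as a plain list of whatever hot spots were actually found, which is an assumption for illustration and not the SP runtime's interface.

```python
def water_hottest(hot_spots, limit=10):
    """Turn on the sprinkler at the hottest of up to `limit` hot spots.

    hot_spots -- list of dicts with a 'temp' key, standing in for the
                 resources bound to {Left_Hill:Hot}[0..]
    Returns the index that was watered, or None if no hot spot exists.
    """
    bound = hot_spots[:limit]          # there may be fewer than `limit`
    if not bound:
        return None
    hottest = max(range(len(bound)), key=lambda i: bound[i]["temp"])
    # Reference consistency: index `hottest` names the same resource here
    # as it did when its temperature was read.
    bound[hottest]["water"] = "ON"
    return hottest

spots = [{"temp": 30}, {"temp": 41}, {"temp": 35}]
print(water_hottest(spots), spots[1])
```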

Page 48: Building Network-Centric Systems Liviu Iftode

Problem 4: Manageable Distributed File Systems

• Most distributed file servers use TCP/IP both for client-server and intra-server communication
• Strong file consistency, file locking and load balancing: difficult to provide
• File servers require significant human effort to manage: add storage, move directories, etc.
• Cluster-based file servers are cost-effective
• Scalable performance requires load balancing
    • Load balancing may require file migration
    • File migration is limited if file naming is location-dependent
• We need a scalable, location-independent and easy-to-manage cluster-based distributed file system

Page 49: Building Network-Centric Systems Liviu Iftode

Federated File System at a Glance

Global file name space over a cluster of autonomous local file systems interconnected by an M-M network

[Figure: applications A1, A2, A2, A3, A3, A3 run over FedFS, which spans four nodes, each with its own Local FS, joined by the M-M interconnect]

Page 50: Building Network-Centric Systems Liviu Iftode

Location Independent Global File Naming

• Virtual Directory (VD): union of local directories
    • volatile, created on demand (dirmerge)
    • contains information about files including location (homes of files)
    • assigned dynamically to nodes (managers)
    • supports location-independent file naming and file migration
• Directory Tables (DT): local caches of VD entries (~TLB)

[Figure: the local directories usr/file1 on local file system 1 and usr/file2 on local file system 2 merge into the virtual directory usr containing file1 and file2]
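The virtual directory in the figure can be built with a simple merge. The sketch below is only an illustration of the idea: it borrows the name `dirmerge` from the slide, but the signature, the dictionary representation and the tie-breaking rule (first node listed wins) are assumptions, not the FedFS implementation.

```python
def dirmerge(local_dirs):
    """Build a virtual directory as the union of local directories.

    local_dirs -- {node name: list of file names in that node's copy
                   of the directory}
    Returns {file name: home node}, i.e. each entry records where the
    file physically lives.
    """
    virtual_dir = {}
    for node, names in local_dirs.items():
        for name in names:
            # Keep the first home seen for a name (assumed tie-break).
            virtual_dir.setdefault(name, node)
    return virtual_dir

vd = dirmerge({
    "local file system 1": ["file1"],
    "local file system 2": ["file2"],
})
print(sorted(vd))
```

Because clients look files up by name in the merged view and read the home from the entry, a file can migrate by updating its entry without changing its name.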

Page 51: Building Network-Centric Systems Liviu Iftode

Direct Access File System (DAFS)

Page 52: Building Network-Centric Systems Liviu Iftode

Federated DAFS

[Figure: three architectures compared.
 Distributed NFS over FedFS: applications with NFS clients reach NFS servers over TCP/IP; each NFS server runs over FedFS and a local FS, and the servers are linked by M-M.
 Direct Access FS (DAFS): an application with a DAFS client reaches a DAFS server and its local FS over M-M.
 Federated DAFS: applications with DAFS clients reach DAFS servers over M-M; each DAFS server runs over FedFS and a local FS, and the servers are linked by M-M.]

Page 53: Building Network-Centric Systems Liviu Iftode

Related Work

• Cluster-based file systems
    • Frangipani [Thekkath’97], PVFS [Carns’00], GFS, Archipelago [JI’00], Trapeze (Duke)
• DAFS [NetApp’03, Magoutis’01,02,03]
• User-level communication in cluster-based network servers [Carrera’02]

Page 54: Building Network-Centric Systems Liviu Iftode

Experimental Platform

Eight node server cluster
  800 MHz PIII, 512 MB SDRAM, 9 GB 10K RPM SCSI
Client
  Dual processor (300 MHz PII), 512 MB SDRAM
Linux-2.4
Servers and Clients equipped with Emulex cLAN adapter (M-M network)

Page 55: Building Network-Centric Systems Liviu Iftode

Workload I

Postmark: synthetic benchmark
  Short-lived small files
  Mix of metadata-intensive operations
Postmark outline
  Create a pool of files
  Perform transactions: READ/WRITE paired with CREATE/DELETE
  Delete created files
Each Postmark client performs 30,000 transactions
Clients distribute requests to servers using a hash function on pathnames
Files are physically placed on the node which receives client requests
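
The request distribution in this workload can be sketched as follows. This is only an illustration: the slides do not say which hash function was used, so CRC32 is an assumption, and `server_for` and the `/pool/file…` names are hypothetical.

```python
import zlib
from collections import Counter


def server_for(pathname: str, num_servers: int) -> int:
    """Pick a server index from a stable hash of the pathname.

    CRC32 is an assumption; any hash that is stable across clients
    sends requests for the same pathname to the same server.
    """
    return zlib.crc32(pathname.encode("utf-8")) % num_servers


# Hypothetical pool of Postmark-style file names spread over 8 servers.
paths = [f"/pool/file{i}" for i in range(1000)]
placement = Counter(server_for(p, 8) for p in paths)
```

Because a file is created on the node that receives the request, and every client hashes the same pathname to the same server, later requests for that file also arrive at its home node in this workload.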

Page 56: Building Network-Centric Systems Liviu Iftode

Postmark Throughput

[Figure: Postmark throughput (txns/sec, axis from 0 to 30000) versus number of servers (axis from 0 to 9), with one series each for file sizes 2K, 4K, 8K and 16K.]

Page 57: Building Network-Centric Systems Liviu Iftode

Workload II

Postmark performs only READ transactions
  No create/delete operations
Federated DAFS does not control file placement
  No client request sent to file’s correct location

Page 58: Building Network-Centric Systems Liviu Iftode

Postmark Read Throughput

[Figure: Postmark read throughput (txns/sec, axis from 0 to 60000) versus number of servers (2 and 4), with two series: PostmarkRead and PostmarkRead - NoCache.]

Page 59: Building Network-Centric Systems Liviu Iftode

Next Target: Federated DAFS over the Internet

[Figure: the Federated DAFS configuration (applications with DAFS clients on M-M links, DAFS servers layered on FedFS and the local FS), drawn with an Internet cloud and a TCP/IP link.]

Page 60: Building Network-Centric Systems Liviu Iftode

Outline

TCP Servers
Migratory-TCP and Service Continuations
Cooperative Computing, Smart Messages and Spatial Programming
Federated File Systems
Talk Highlights and Conclusions

Page 61: Building Network-Centric Systems Liviu Iftode

Talk Highlights

Back to Migration
  Service Continuation: service availability and self-healing clusters
  Smart Messages: programming dynamic networks of embedded systems

Exploit Non-Intrusive M-M Communication
  TCP offloading
  State migration
  Federated file systems

Network and Storage I/O Convergence
  TCP Servers & iSCSI
  Federated File Systems & M-M

Programmability
  Smart Messages and Spatial Programming
  Extended Server API: Service Continuation, TCP Servers, Federated file system

Page 62: Building Network-Centric Systems Liviu Iftode

Conclusions

Network-Centric Systems: very promising border-crossing systems research area

Common issues for a large spectrum of systems and networks

Tremendous potential to impact industry

Page 63: Building Network-Centric Systems Liviu Iftode

Acknowledgements

UMD students: Andrzej Kochut, Chunyuan Liao, Tamer Nadeem, Iulian Neamtiu and Jihwang Yeo.

Rutgers students: Ashok Arumugam, Kalpana Banerjee, Aniruddha Bohra, Cristian Borcea, Suresh Gopalakrisnan, Deepa Iyer, Porlin Kang, Vivek Pathak, Murali Rangarajan, Rabita Sarker, Akhilesh Saxena, Steve Smaldone, Kiran Srinivasan, Florin Sultan and Gang Xu.

Post-doc: Chalermek Intanagonwiwat

Collaborations at Rutgers: EEL (Ulrich Kremer), DARK (Ricardo Bianchini), PANIC (Rich Martin and Thu Nguyen)

Support: NSF ITR ANI-0121416 and CAREER CCR-013366