Tal Lavian, [email protected], www.nortel.com/drac
Advanced Technology Research, Nortel Networks
Pile of selected Slides
August 18th, 2005
2
Optical Networks Change the Current Pyramid
[Figure: pyramid diagram, after George Stix, Scientific American, January 2001]
DWDM: a fundamental imbalance between computation and communication
3
New Networking Paradigm
> Great vision
• LambdaGrid is one step toward this concept
> LambdaGrid
• A novel service architecture
• Lambda as a scheduled service
• Lambda as a prime resource, like storage and computation
• Changes our current systems assumptions
• Potentially opens new horizons
“A global economy designed to waste transistors, power, and silicon area, and conserve bandwidth above all, is breaking apart and reorganizing itself to waste bandwidth and conserve power, silicon area, and transistors.”
George Gilder, Telecosm (2000)
☺
4
The “Network” is a Prime Resource for Large-Scale Distributed Systems
Integrated SW system provides the “glue”: a dynamic optical network as a fundamental Grid service in data-intensive Grid applications, to be scheduled, managed, and coordinated to support collaborative operations
[Figure: the integrated SW system at the center, gluing together the prime resources: instrumentation, person, storage, visualization, network, computation]
5
From Super-computer to Super-network
> In the past, computer processors were the fastest part
• Peripheral bottlenecks
> In the future, optical networks will be the fastest part
• Computer, processor, storage, visualization, and instrumentation become the slower “peripherals”
> eScience cyber-infrastructure focuses on computation, storage, data, analysis, and workflow
• The network is vital for better eScience
> How can we improve the way of doing eScience?
6
Data Schlepping Scenario: Mouse Operation
> The “BIRN Workflow” requires moving massive amounts of data:
• The simplest service: just copy from a remote DB to local storage at a mega-compute site
• Copy multi-terabytes (10-100 TB) of data
• Store first, compute later; not real time, a batch model
> Mouse network limitations:
• Need to copy ahead of time
• L3 networks can’t handle these amounts effectively, predictably, in a short time window
• L3 network provides full connectivity, a major bottleneck
• Apps optimized to conserve bandwidth and waste storage
• The network does not fit the “BIRN Workflow” architecture
7
Limitations of Solutions with Current Network Technology
> BIRN networking is unpredictable and a major bottleneck, specifically over the WAN; it limits the type, method, and data sizes of biomedical research, and prevents true Grid Virtual Organization (VO) research collaborations
> The network model doesn’t fit the “BIRN Workflow” model; the network is not an integral resource of the BIRN cyber-infrastructure
8
Problem Statement
> Problems: BIRN Mouse often
• requires interaction and cooperation of resources that are distributed over many heterogeneous systems at many locations;
• requires analysis of large amounts of data (order of terabytes);
• requires the transport of large-scale data;
• requires sharing of data;
• requires support for a workflow cooperation model
Q: Do we need a new network abstraction?
9
BIRN Network Limitations
> Optimized to conserve bandwidth and waste storage
• Geographically dispersed data
• Data can easily scale up 10-100 times
> L3 networks can’t handle multi-terabytes efficiently and cost-effectively
> The network does not fit the “BIRN Workflow” architecture
• Collaboration and information sharing is hard
> Mega-computation; not possible to move the computation to the data (instead, data moves to the computation site)
> Not interactive research; must first copy, then analyze
• Analysis is local, but with strong geographical limitations
• Don’t know ahead of time where the data is
• Can’t navigate the data interactively or in real time
• Can’t “webify” the information at large volumes
> No cooperation/interaction between the storage and network middleware(s)
10
Our proposed Solution
> Switching technology:
• Lambda switching for data-intensive transfer
> New abstraction:
• Network resource encapsulated as a Grid service
> New middleware service architecture:
• LambdaGrid service architecture
11
Our proposed Solution
> We propose a LambdaGrid service architecture that interacts with the BIRN cyber-infrastructure and overcomes BIRN data limitations efficiently and effectively by:
• treating the “network” as a primary resource, just like “storage” and “computation”
• treating the “network” as a scheduled resource
• relying upon a massive, dynamic transport infrastructure: the dynamic optical network
12
Goals of Our Investigation
> Explore a new type of infrastructure which manages codependent storage and network resources
> Explore dynamic wavelength switching, based on new optical technologies
> Explore protocols for managing dynamically provisioned wavelengths
> Encapsulate “optical network resources” into the Grid services framework to support dynamically provisioned, data-intensive transport services (a minimal sketch follows this list)
> Explicit representation of future scheduling in the data and network resource management model
> Support a new set of application services that can intelligently schedule/re-schedule/co-schedule resources
> Provide for large scale data transfer among multiple geographically distributed data locations, interconnected by paths with different attributes.
> Provide inexpensive access to advanced computation capabilities and extremely large data sets
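To make the encapsulation goal concrete, here is a minimal Python sketch of what a network-as-a-Grid-service interface might look like. It is purely illustrative: the names (`LightpathRequest`, `LambdaGridService`, `request_lightpath`) are hypothetical and not drawn from the actual LambdaGrid, Globus, or DRAC APIs.

```python
from dataclasses import dataclass, field

@dataclass
class LightpathRequest:
    """Hypothetical request for a dedicated lambda between two sites."""
    src: str          # source site, e.g. "BIRN-UCSD"
    dst: str          # destination site, e.g. "BIRN-Chicago"
    gigabytes: int    # data volume to move
    earliest: float   # earliest acceptable start (hours)
    latest: float     # latest acceptable finish (hours)

@dataclass
class LambdaGridService:
    """Sketch of the network exposed as a schedulable Grid resource,
    a peer to storage and computation rather than opaque plumbing."""
    reservations: list = field(default_factory=list)

    def request_lightpath(self, req: LightpathRequest) -> bool:
        # A real implementation would consult topology, policy (AAA),
        # and the optical control plane; here we simply record the request.
        self.reservations.append(req)
        return True

svc = LambdaGridService()
granted = svc.request_lightpath(
    LightpathRequest("BIRN-UCSD", "BIRN-Chicago", 50_000, 3.5, 5.5))
print("granted" if granted else "denied")
```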
13
Example: Lightpath Scheduling
> Request for 1/2 hour between 4:00 and 5:30 on Segment D granted to User W at 4:00
> New request from User X for same segment for 1 hour between 3:30 and 5:00
> Reschedule user W to 4:30; user X to 3:30. Everyone is happy.
Route allocated for a time slot; new request comes in; 1st route can be rescheduled for a later slot within window to accommodate new request
[Timeline figure, 3:30 to 5:30: W initially holds 4:00-4:30; X needs one hour within 3:30-5:00; after rescheduling, X holds 3:30-4:30 and W holds 4:30-5:00]
☺
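The rescheduling logic above can be captured in a few lines: try to place a new request on the segment, and if it collides with an existing booking, slide that booking within its own time window. A minimal, illustrative Python sketch (the `Booking` structure and `try_place` routine are invented for this example, not taken from any real scheduler):

```python
from dataclasses import dataclass

@dataclass
class Booking:
    user: str
    start: float    # granted start time (hours)
    dur: float      # duration (hours)
    win_lo: float   # earliest acceptable start
    win_hi: float   # latest acceptable finish

def overlaps(a: Booking, b: Booking) -> bool:
    return a.start < b.start + b.dur and b.start < a.start + a.dur

def try_place(existing: list, new: Booking) -> bool:
    """Place `new` on the segment; on a collision, try sliding the
    existing booking later, within its own window."""
    for b in existing:
        if overlaps(b, new):
            shifted = new.start + new.dur     # earliest slot after `new`
            if shifted + b.dur <= b.win_hi:
                b.start = shifted
            else:
                return False                  # no feasible reschedule
    existing.append(new)
    return True

# The slide's example: W holds 4:00-4:30 (window 4:00-5:30);
# X wants 1 hour anywhere within 3:30-5:00.
segment_d = [Booking("W", 4.0, 0.5, 4.0, 5.5)]
ok = try_place(segment_d, Booking("X", 3.5, 1.0, 3.5, 5.0))
print(ok, [(b.user, b.start) for b in segment_d])
# -> True [('W', 4.5), ('X', 3.5)]: W moves to 4:30, everyone is happy
```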
14
Scheduling Example: Reroute
> A request for 1 hour between nodes A and B, between 7:00 and 8:30, is granted for 7:00 using Segment X (and other segments)
> A new request for 2 hours between nodes C and D, between 7:00 and 9:30, needs to use Segment E to be satisfied
> Reroute the first request onto another path through the topology to free up Segment E for the 2nd request. Everyone is happy
[Figure: before rerouting, the A-B request at 7:00-8:00 crosses the contested segment; after rerouting it takes an alternate segment (Y), freeing the contested segment for the C-D request]
Route allocated; new request comes in for a segment in use; 1st route can be altered to use different path to allow 2nd to also be serviced in its time window
☺
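The reroute step is just a constrained path search: when a new request needs a segment held by an existing path, look for an alternate path for the first request that avoids the contested segment. A hedged Python sketch over an invented toy topology (none of this reflects a real control-plane algorithm):

```python
from collections import deque

def find_path(topology, src, dst, banned=frozenset()):
    """Breadth-first search for a src->dst path avoiding `banned` edges."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in topology.get(node, []):
            edge = frozenset((node, nxt))
            if nxt not in seen and edge not in banned:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None   # no feasible reroute

# Toy topology: A reaches B via M1 or M2; the C-D request needs edge M1-B.
topology = {"A": ["M1", "M2"], "M1": ["A", "B", "C"],
            "M2": ["A", "B"], "B": ["M1", "M2", "D"],
            "C": ["M1"], "D": ["B"]}
contested = frozenset(("M1", "B"))          # segment wanted by both requests
print(find_path(topology, "A", "B", banned={contested}))
# -> ['A', 'M2', 'B']: the A-B request sidesteps the contested segment
```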
15
Generalization and Future Directions for Research
> Need to develop and build services on top of the base encapsulation
> The LambdaGrid concept can be generalized to other eScience apps, enabling a new way of doing scientific research where bandwidth is “infinite”
> The new concept of the network as a scheduled Grid service presents new and exciting problems for investigation:
• New software systems that are optimized to waste bandwidth
• Networks, protocols, algorithms, software, architectures, systems
• Lambda distributed file system
• The network as large-scale distributed computing
• Resource co-allocation and optimization with storage and computation
• Grid system architecture
• Enables new horizons for network optimization and lambda scheduling
• The network as a white box: optimal scheduling and algorithms
16
Goals
> An application drives shares of network resources
• Resources = {bandwidth, security, acceleration, sensors, …}
• Within policy-allowed envelopes, end-to-end
• No more point-and-click interfaces, no operator involvement, etc.
> Dynamic provisioning of such resources
• JIT, TOD, schedulable
• Create alternatives to bandwidth peak-provisioning across MAN/WAN
• With a continuum of satisfaction vs. utilization points
> Tame and exploit network diversity
• Heterogeneous and independently managed network clouds, end-to-end
• Integrated packet-optical to best accommodate known traffic patterns
> Network as a 1st-class resource in Grid-like constructs
• Joins CPU and DATA resources
Improve Network Experience and Optimize CapEx/OpEx
17
What used to be a bit-blasting race has now become a complex mix of:
Bit-blasting
+ Finesse (granularity of control)
+ Virtualization (access to diverse knobs)
+ Resource bundling (network AND …)
+ Security (AAA to start)
+ Standards for all of the above
= [it’s a lot!]
18
Enabling new degrees of App/Net coupling
> Optical packet hybrid (a toy steering sketch follows this slide)
• Steer the herd of elephants to ephemeral optical circuits (few-to-few)
• Mice or individual elephants go through packet technologies (many-to-many)
• Either application-driven or network-sensed; hands-free in either case
• Other impedance mismatches being explored (e.g., wireless)
> Application-engaged networks
• The application makes itself known to the network
• The network recognizes its footprints (via tokens, deep packet inspection)
• E.g., storage management applications
> Workflow-engaged networks
• Through workflow languages, the network is privy to the overall “flight plan”
• Failure handling is cognizant of the same
• Network services can anticipate the next step, or run what-ifs
• E.g., healthcare workflows over a distributed hospital enterprise
DRAC - Dynamic Resource Allocation Controller
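As a toy illustration of the elephant/mouse split above, the sketch below classifies flows by size and steers large ones to an optical circuit while small ones stay on the packet path. The threshold and names are invented; this is not DRAC's actual steering logic.

```python
ELEPHANT_BYTES = 1 << 30   # assumed threshold: flows over ~1 GiB are elephants

def steer(flow_bytes: int) -> str:
    """Toy policy: elephants get an ephemeral optical circuit (few-to-few);
    mice ride the packet network (many-to-many)."""
    return "optical-circuit" if flow_bytes >= ELEPHANT_BYTES else "packet-path"

for size in (64_000, 5 * (1 << 30)):   # one mouse, one elephant
    print(f"{size:>12,} bytes -> {steer(size)}")
```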
19
Teamwork
[Architecture figure: DRAC, as portable SW, sits between Application(s) and Agile Network(s) across three planes: a connectivity plane, a virtualization plane, and a dynamic provisioning plane. Applications express demand; DRAC negotiates with AAA and Admin, detects supply and events from the network elements (NE), exchanges supply/events with peering DRACs, and can alert, adapt, route, and accelerate]
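The figure boils down to a simple control loop: applications submit demand, DRAC checks policy (AAA) and available supply, then grants, adapts, or defers. A hedged Python sketch of that loop, with entirely invented names and units:

```python
def drac_negotiate(demands, supply_gbps, aaa_allowed):
    """Toy DRAC loop. `demands` is a list of (app, gbps) requests;
    `aaa_allowed` is the set of apps that pass policy."""
    outcomes = []
    for app, gbps in demands:
        if app not in aaa_allowed:        # AAA gate
            outcomes.append((app, "denied: policy"))
        elif gbps <= supply_gbps:         # supply check on the agile network
            supply_gbps -= gbps
            outcomes.append((app, f"granted {gbps} Gb/s"))
        else:                             # negotiate: adapt, reroute, or wait
            outcomes.append((app, "adapt: reroute or defer"))
    return outcomes

print(drac_negotiate([("viz", 10), ("backup", 40), ("rogue", 1)],
                     supply_gbps=40, aaa_allowed={"viz", "backup"}))
```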
20
Bird’s Eye View of the Service Stack
[Figure: the DRAC service stack. Above <DRAC> sit end-to-end policy, session convergence and nexus establishment, a workflow language, AAA, 3rd-party services, value-add services, and a Grid community scheduler. DRAC built-in services (a sampler): smart bandwidth management, Layer x <-> L1 interworking, alternate site failover, SLA monitoring and verification, service discovery, workflow language interpreter. Below </DRAC> sit legacy sessions (management and control planes: Control Plane A, Control Plane B, OAM&P), fronted by proxies (P-CSCF, physical P-CSCF), over a topology of sources/sinks, access, metro, and core]
21
SC2004 CONTROL CHALLENGE
• Finesse the control of bandwidth across multiple domains
• while exploiting scalability and intra-/inter-domain fault recovery
• through layering of a novel SOA upon legacy control planes and NEs
[Figure: data and control planes spanning Chicago and Amsterdam; peering DRAC* instances, each paired with AAA, sit over applications and services across OMNInet/ODIN, StarLight, NetherLight, and UvA, driving the underlying networks via ASTN and SNMP]
* Dynamic Resource Allocation Controller
22
Grid Network Services, www.nortel.com/drac
[Demo topology figure: GT4 Grid hosts at sites A and B connect across the GW05 show floor through PP8600 switches and OM3500 metro optical nodes; the slow path rides the Internet, the FA$T path rides dynamically provisioned fiber]
23
Conclusion
> Several application realms call for high-touch user/network experiences
• Away from point-and-click or cumbersome 1-800 network provisioning
• Away from ad-hoc application/network interfaces, and contaminations thereof
• Away from gratuitous discontinuities as we cross LAN/MAN/WAN
• E.g., point-in-time storage (bandwidth-intensive)
• E.g., PACS retrievals (latency-critical and bandwidth-intensive)
• E.g., industrial design, bio-tech as in Grids (bandwidth- and computation-intensive)
> DRAC is network middleware of broad applicability
• It can run on 3rd-party systems or L2-L7 switches
• Its core is upward decoupled from the driving application, and downward decoupled from the underlying connectivity (e.g., ASTN vs. RPR vs. IP/Eth)
• It can be extended by 3rd parties
• It can be applied to workflows and other processes on fixed or wireless networks
> DRAC fosters managed service bundles and fully aligns with Grid challenges
• It equips Globus/OGSA with a powerful network prong
Back-up
25
e-Science example “Mouse BIRN”
Source: Tal Lavian (UCB, Nortel Labs)
Application Scenario / Current MOP / Network Issues:

1. Pt-Pt data transfer of multi-TB data sets
• Current MOP: copy from remote DB takes ~10 days (unpredictable); store, then copy/analyze
• Network issues: want << 1 day, even << 1 hour, to enable new bio-science innovation; the architecture is forced to optimize BW utilization at the cost of storage

2. Access to multiple remote DBs
• Current MOP: N x the previous scenario
• Network issues: simultaneous connectivity to multiple sites; multi-domain; dynamic connectivity hard to manage; don’t know the next connection needs

3. Remote instrument access (radio telescope)
• Current MOP: can’t be done from the home research institute
• Network issues: needs fat unidirectional pipes; tight QoS requirements (jitter, delay, data loss)

Other observations:
• Not feasible to port computation to data
• Delays preclude interactive research: copy, then analyze
• Uncertain transport times force a sequential process: schedule processing after data has arrived
• No cooperation/interaction among storage, computation, and network middlewares
• Dynamic network allocation as part of Grid workflow allows new scientific experiments that are not possible with today’s static allocation
26
Bandwidth is nothing without control
The Wright Brothers had plenty of thrust; their problem was how to control it.
Likewise, we have plenty of bandwidth. We must now devise ways to harness it, and expose the proper value propositions where and when they are needed.
27
Why are we here today? New protocols for today’s realities
28
Big Disk Availability
> What are the implications of having very large disks?
> What will we do with abundant disks?
> Currently, disk costs about $700/TB
> What types of applications can use it?
> Video files: a 1 GB movie costs about 70 cents to store today, and in 5 years about 0.3 cents (a quick check follows below)
• How does this change the use of personal storage?
• What new things will we store if disk is so inexpensive?
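A quick back-of-the-envelope check of those numbers in Python. The five-year figure implies cost falling roughly 3x per year (about 243x over five years); that rate is my reading of the slide's 70-cent to 0.3-cent drop, not a stated assumption:

```python
cost_per_tb = 700.0                         # dollars per TB (slide's figure)
movie_gb = 1.0                              # a ~1 GB movie

cost_now = cost_per_tb / 1000 * movie_gb    # $/GB times GB
print(f"store one movie today: {cost_now * 100:.0f} cents")    # ~70 cents

yearly_drop = 3.0                           # assumed ~3x/year price decline
cost_in_5y = cost_now / yearly_drop ** 5
print(f"in five years: {cost_in_5y * 100:.2f} cents")          # ~0.29 cents
```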
29
Bandwidth is Becoming a Commodity
> Price per bit went down by 99% in the last 5 years on the optical side
• This is one of the problems of the current telecom market
> Optical metro: cheap, high-bandwidth access
• $1000 a month for 100FX (in major cities)
• This is less than the cost of a T1 several years ago
> Optical long-haul and metro access: a change of the price point
• Reasonable prices drive more users (non-residential)
30
Technology Composition
• L3 routing: dropping packets as a mechanism (10^-3 loss looks good)
• Circuit switching: set up the link ahead of time
• Optical networking: bit-transmission reliability (error rates of 10^-9 to 10^-12)
• L3 delay, vs. almost no delay in the optical layers
• Routing protocols are slow; optics protects in 50 ms
• Failure mechanisms and redundancy
• DWDM lambda tradeoff: higher bandwidth per lambda vs. more lambdas
• For agile L1-L2 routing, one may need to compromise on bandwidth
• RPR breaks L3 geographical subnetting
• Dumb network, smart edge? Or the opposite?
[Figure: a single fiber and a fiber bundle feeding a layered stack: optical transport (SONET OC-3/12/48/192, STS-N TDM, STS-Nc, VTs), optical switching, data networking (Ethernet), and overlay networks, with rate labels from 1M to 1000M]
31
• An economical, scalable, high-quality network has to be a dumb network.
• Deployment of a dumb network is overdue.
32
Underlay Optical Networks
> Problem
• Radical mismatch between the optical transmission world and the electrical forwarding/routing world; currently, a single strand of optical fiber can transmit more bandwidth than the entire Internet core
• Mismatch between L3 core capabilities and disk cost: with $2M of disks (2 PB), one can fill the entire Internet core for a year
> Approach
• A service architecture that interacts with the optical control plane and provides applications a dedicated, on-demand, point-to-point optical link that is not on the public Internet
> Current focus
• Grid computing, OGSA, MEMS, 10GE, optical technologies
• The OMNInet testbed in Chicago, which will be connected to major national and international optical networks
> Potential impact
• Enabling technology for data-intensive applications (multi-terabytes)
33
Service Composition
Current peering is mainly at L3. What can be done at L1-L2?
• The appearance of optical access, metro, and regional networks
• L1-L2 connectivity service composition
• Across administrative domains
• Across functionality domains (access, metro, regional, long-haul, undersea)
• Across boundaries (management, trust, security, control, technologies)
• Peering, brokering, measurement, scalability
• Appearance of standards: UNI, NNI
[Figure: a client-to-server path composed across Access (Provider A), Metro (Provider B, Trust C, Control E, Provider F, Technology G), Regional (Admin L, Trust T, Security S), and Long-Haul (latency P, bandwidth Q, resiliency R)]
34
ASF (Alternate Site Failover) with Leased Pipes
• Very expensive and lengthy provisioning of leased pipes
• Maintenance of many PHYs is cumbersome and doesn’t scale
• A pipe is a single point of failure; there is no workaround other than purchasing a companion leased pipe (and wasting good money on it)
[Figure: an enterprise with Data Center A and Data Center B joined by leased pipes]

Do outages cost money?
Hourly business downtime costs:
• Brokerage operations: $6,450,000
• Credit card sales authorizations: $2,600,000
• Pay-per-view TV: $150,000
• Home shopping TV: $113,000
• Catalog sales: $90,000
• Airline reservations: $90,000
• Tele-ticket sales: $69,000
• Package shipping: $28,000
• ATM fees: $14,500
35
Summary
> The Grid problem: resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
> Grid architecture: emphasizes protocol and service definition to enable interoperability and resource sharing
> The Globus Toolkit: a source of protocol and API definitions, and reference implementations
> Communication today is static; the next wave is dynamic optical VPNs
> Some relation to Sahara
• Service composition: computation, servers, storage, disk, network, …
• Sharing, cooperating, peering, brokering, …
36
Improvements in Large-Area Networks
> Network vs. computer performance
• Computer speed doubles every 18 months
• Network speed doubles every 9 months
• Difference: an order of magnitude per 5 years (checked in the sketch below)
> 1986 to 2000
• Computers: x 500
• Networks: x 340,000
> 2001 to 2010
• Computers: x 60
• Networks: x 4,000
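The order-of-magnitude claim follows directly from the two doubling times; a quick check in Python:

```python
years = 5
computer_gain = 2 ** (years * 12 / 18)   # doubling every 18 months: ~10x
network_gain = 2 ** (years * 12 / 9)     # doubling every 9 months: ~102x
print(computer_gain, network_gain, network_gain / computer_gain)
# -> ~10.1 ~101.6 ~10.1: networks gain one order of magnitude every 5 years
```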
Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner Perkins Caufield & Byers.
37
Gilder vs. Moore: Impact on the Future of Computing
[Log-growth chart, 1995-2007, scale 100 to 10,000 to 1M: processor performance doubles every 18 months while WAN/MAN bandwidth doubles every 9 months, so bandwidth pulls ahead by roughly 10x every 5 years]
38
Example: LambdaCAD (CO2 meets Grids)
“LambdaCAD”: when the user starts an interactive 3D visualization session, the 5 CO2 instances dynamically pool to negotiate BoD and diversity so that the large (TB+) databases can pump queried data (GB+) into the visualization “crunch” corridor at times t0, t1, …, tn based on data inter-dependencies, with the various terabyte-scale datasets all guaranteed to start flowing into the visualization corridor (Site B) at the same time. The corridor then only needs a modest circular buffer to stage incoming data awaiting processing, as opposed to requiring petabytes of local, possibly outdated storage.
[Figure: Headquarters, Contractor X, and Sites A and C hold datasets of 0.8 TB, 1.1 TB, 1.7 TB, and 4.2 TB; dynamically provisioned tributaries pump them through CO2 instances into the visualization corridor at Site B]
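The coordination arithmetic behind such pooling can be sketched simply. As one plausible rule, suppose all transfers must complete by a common deadline; each start time then falls out of the path speed. The dataset sizes are the slide's, but the site pairings, rates, and deadline below are invented for illustration:

```python
# Terabyte-scale datasets (sizes from the slide; the site pairing is guessed)
datasets_tb = {"HQ": 0.8, "SiteA": 1.1, "SiteC": 1.7, "ContractorX": 4.2}
rate_gbps = {"HQ": 2, "SiteA": 4, "SiteC": 4, "ContractorX": 10}   # assumed

deadline_s = 4000.0   # all flows must complete by this common time (seconds)

for site, tb in datasets_tb.items():
    bits = tb * 1e12 * 8                          # TB -> bits
    duration = bits / (rate_gbps[site] * 1e9)     # seconds on that path
    start = deadline_s - duration                 # back-computed start time
    print(f"{site:12s} start at t={start:7.1f}s  (transfer {duration:6.1f}s)")
```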
39
End-to-end Nexus via DRAC
[Figure: a chain of administrative domains between Customer A and Customer B. Each domain runs Applications and Services over a DRAC instance with AAA (Authentication, Authorization, and Accounting) and a Topology DB; peering DRACs exchange a boundary address list and keep the Topology DBs synchronized, with DRAC populating them. Underneath sit the physical networks and services (e.g., RPR, ASTN)]
40
What happens if:
• 5M businesses connected at 100 Mbps = 500 Terabit/s
• 100M consumers connected at 5 Mbps = 500 Terabit/s
• Together: > 1 Petabit/s; constant streaming is ~260 exabytes a month
• The current (4/2002) monthly data transfer on the core Internet is 40-70 TB
Can the current Internet architecture handle growth of 5-6 orders of magnitude?
What are the technologies and the applications that might drive this growth? (the arithmetic is checked in the sketch below)
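A quick check of the aggregate arithmetic in Python. The exabyte figure depends on utilization: a full 1 Pb/s over a 30-day month is ~324 EB, so the slide's ~260 EB corresponds to roughly 80% sustained utilization (my inference, not stated on the slide):

```python
businesses = 5e6 * 100e6        # 5M businesses at 100 Mb/s
consumers = 100e6 * 5e6         # 100M consumers at 5 Mb/s
total_bps = businesses + consumers
print(f"aggregate: {total_bps / 1e12:.0f} Tb/s")          # 1000 Tb/s = 1 Pb/s

month_s = 30 * 24 * 3600
exabytes = total_bps * month_s / 8 / 1e18                 # bits -> exabytes
print(f"continuous streaming: {exabytes:.0f} EB/month")   # ~324 EB at 100%
# vs. the slide's core-Internet baseline of 40-70 TB/month (4/2002)
```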
41
Some key folks checking us out at our booth, GlobusWORLD ‘04, Jan ‘04