Reliable Network/Service Infrastructures

Upload: shanon-barber

Post on 30-Dec-2015


Availability, Reliability and Survivability

Availability

• The expected ratio of system uptime to total elapsed time
• Empirical factor
• Probabilistic
– Expected time between failures (MTBF)
– Expected time to recover (MTTR)
• $A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$

Reliability

• The probability that the system stays available (does not fail) over a given period of time
• Empirical factor
• Probabilistic
– Expected time between failures (MTBF)
• $R(t) = e^{-\lambda t}$, where $\lambda = \frac{1}{\mathrm{MTBF}}$ and $t$ is the time interval

Survivability

• The capability of the system to continue operating and fulfill its mission, at full or limited scale, during failure
• Non-probabilistic
– Assumes explicit failures of different span and magnitude
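The availability and reliability definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the sample MTBF/MTTR values are assumptions chosen for the example, and the reliability function assumes a constant failure rate of 1/MTBF.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of no failure over t hours: R(t) = exp(-t / MTBF)."""
    failure_rate = 1.0 / mtbf_hours  # lambda, assumed constant over time
    return math.exp(-failure_rate * t_hours)


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the slides).
    mtbf, mttr = 10_000.0, 8.0
    print(f"A        = {availability(mtbf, mttr):.6f}")
    print(f"R(720 h) = {reliability(720.0, mtbf):.6f}")
```

Note the difference in what the two numbers say: availability depends on how quickly the system is repaired, while reliability over an interval depends only on how often it fails.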


Availability    Downtime per Year (24x7x365)
99.000%         3 Days 15 Hours 36 Minutes
99.500%         1 Day 19 Hours 48 Minutes
99.900%         8 Hours 46 Minutes
99.950%         4 Hours 23 Minutes
99.990%         53 Minutes
99.999%         5 Minutes
99.9999%        30 Seconds
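The downtime figures above follow directly from the availability percentage and the number of minutes in a 365-day year. A short sketch of that arithmetic, assuming hypothetical helper names `downtime_minutes_per_year` and `format_downtime` (not from the slides):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, the 24x7x365 basis


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected downtime per year for a given availability percentage."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


def format_downtime(minutes: float) -> str:
    """Render a duration in minutes as days, hours, minutes and seconds."""
    total_seconds = round(minutes * 60)
    days, rem = divmod(total_seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    mins, secs = divmod(rem, 60)
    return f"{days}d {hours}h {mins}m {secs}s"


if __name__ == "__main__":
    for pct in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999, 99.9999):
        print(f"{pct}% -> {format_downtime(downtime_minutes_per_year(pct))}")
```

The slide values are rounded, so the computed results will differ slightly in the last unit for the higher availability targets.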

What Is “High Availability”?

• The ability to define, achieve, and sustain “target availability objectives” across the services and/or technologies supported in the network, aligned with the objectives of the business (e.g. 99.9%, 99.99%, 99.999%)


Leading Causes of Downtime

SOURCE: Graph Data: The Yankee Group, The Road to a Five Nines Network, Feb 2004.

• Change management

• Process consistency

• Communications

• Links

• Hardware Failure

• Design

• Environmental issues

• Natural disasters

Share of downtime by cause (chart data):

Telco/ISP: 35%
Power Failure: 14%
Human Error: 31%
Hardware Failure: 12%
Unresolved: 8%


Link/Circuit Diversity

[Figure: three diagrams of an Enterprise connecting to a Service Provider Network, captioned "THIS Is Better Than… THIS, which Is Better Than… THIS"]

But what is beyond this???


Network Point of Presence/Data Center

• Cable management

• Power: Diversity/UPS

• HVAC

• Hardware placement

• Physical security

• Labeling

• Environmental control systems


Technology Can Increase MTBF

People, Process, and Politics Can Increase Complexity, Which Decreases MTBF and Increases MTTR

Network Complexity

Network Design


Network Design

• Hierarchical

• Modular and consistent

• Scalable

• Manageable

• Reduced failure domain (Layer II/III)

• Interoperability

• Performance

• Availability

• Security

Primary Design Considerations


Examples of Hardware Reliability (Reliability Block Diagrams)

Hardware Reliability = 99.938% with 4 Hour MTTR (325 Minutes/Year)

Hardware Reliability = 99.961% with 4 Hour MTTR (204 Minutes/Year)

Hardware Reliability = 99.9999% with 4 Hour MTTR (30 Seconds/Year)


Network Availability Calculation

Routers R1, R2, R3 and R4

MTBF = 16000 Hours (can include hardware + software components)

MTTR = 24 Hours

[Figure: reliability block diagram of routers R1, R2, R3 and R4]

1. Router availability (R1, R2, R3 and R4) = 16000/(16000+24) = 0.9985

2. Availability of R1, R2 and of R3, R4 in series = (0.9985 × 0.9985) = 0.997006

3. Availability of R1, R2 in parallel with R3, R4 = 1 - ((1 - 0.997)(1 - 0.997)) = 0.99999104

4. Network availability = 99.999%

Based only on device availability values; link availability is not included.
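The calculation above can be reproduced with a short Python sketch. It assumes the topology implied by the slide: R1 and R2 form one path in series, R3 and R4 form a second path in series, and the two paths run in parallel. The helper names `series` and `parallel` are illustrative. Carrying the unrounded router availability through each step is what yields the slide's 0.997006 and 0.99999104; the 0.9985 and 0.997 shown in the formulas are rounded for display.

```python
from math import prod


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Device availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*avail: float) -> float:
    """Every block must be up, so the availabilities multiply."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """At least one block must be up: 1 minus the product of unavailabilities."""
    return 1.0 - prod(1.0 - a for a in avail)


router = availability(16_000, 24)   # step 1: each of R1..R4
path = series(router, router)       # step 2: R1-R2 (and likewise R3-R4)
network = parallel(path, path)      # step 3: the two paths in parallel

if __name__ == "__main__":
    print(f"router  = {router:.6f}")
    print(f"path    = {path:.6f}")
    print(f"network = {network:.8f}")
```

As in the slide, this covers device availability only; including links would mean adding each link's availability as another block in the relevant series path.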


High Availability - Layered Approach

• Application Level Resiliency: Global Server Load Balancing and positioning Gateways, gatekeepers, SIP servers, DB servers

• Protocol Level Resiliency: NSF/SSO, HSRP, VRRP, GLBP, IP Event Dampening, Graceful Restart (GR): BGP, IS-IS, OSPF, EIGRP, OER, BGP multipath, fast polling, MARP, incremental SPF

• Transport/Link Level Resiliency: Circuits, SONET APS, RPR, DWDM, EtherChannel, 802.1d, 802.1w, 802.1s, PVST+, PortFast, BPDU guard, PAgP, LACP, UDLD, StackWise technology, PPP

• Device Level Resiliency: Redundant Processors (RP), Switch Fabric, Line Cards, Ports, Power, CoPP, ISSU, Config Rollback