multi-lane pmd reliability and partial fault protection...

25
HUAWEI TECHNOLOGIES Co., Ltd. IEEE 802.3ba, January, 2008 Multi-Lane PMD Reliability and Partial Fault Protection (PFP) IEEE 802.3ba Task Force, 23-25 Jan 2008 WB Jiang, Zeng Li, Min Ye, Chiwu Ding Huawei Technologies

Upload: vuongnhu

Post on 05-Jul-2018

224 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

IEEE 802.3ba, January, 2008

Multi-Lane PMD Reliability and Partial Fault Protection (PFP)

IEEE 802.3ba Task Force, 23-25 Jan 2008

WB Jiang, Zeng Li, Min Ye, Chiwu DingHuawei Technologies

Page 2: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 2

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Outline

• Multi-Lane PMD reliability• Current network level protection scheme• Partial fault protection (PFP) proposal• Conclusions

Page 3: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 3

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Leading IEEE 802.3ba PMD Options

• 4x25G xWDM around 1310nm over 10km and 40km SMF– Early products will likely be EML based

• Traditional InGaAsP based cooled EML

• Uncooled InAlGaAs based EML– Cooling required for non-CWDM

– Uncooled DML could be a low cost option if the distance is reduced to less

than 10km to accommodate CWDM

• 4x10G and 10x10G over 100m parallel OM3 MMF– Likely candidate will be oxide VCSEL array

Page 4: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 4

IEEE 802.3ba Task Force, 21-25 Jan, 2008

OE Device Failure Rate Characteristics --- Bathtub Curve

• Early failures: Infant mortality normally eliminated through burn-in• Random failures: constant failure rate occurring throughout the device life

due to accidental reasons, such as mis-handling, ESD, EOS, etc.• Wear-out failures: device aged and close to the end of its life

Page 5: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 5

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Single Mode Fiber PMD

• Traditional InGaAsP/InP technology has long been proven reliable• InAlGaAs/InP technology improves high temperature performance by

better confining carriers in the active region– Al is chemically active and easy to trap oxygen– Exposed cleaved facets tend to rapidly degrade laser operation and facet

degradation is the top cause of failures– Good facet protection is required to reduce early and random/sudden failures– Field deployment history is not long enough and volume not high enough for

high confidence, although lab studies from some vendors have shown good long term reliability potential

• Early deployment will be based on 4 discrete laser components for the four-lane xWDM PMD, and the failure rate will be four times higher

Page 6: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 6

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Multimode Fiber PMD

• 4-channel oxide VCSEL array for 40GE• 10-channel oxide VCSEL array for 100GE• Extension of discrete 10G oxide VCSEL technology

– Smaller oxide confinement aperture than 1-4G oxide VCSELs

Page 7: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 7

IEEE 802.3ba Task Force, 21-25 Jan, 2008

n-GaAs Substrate

n-AlGaAs DBR

p-AlGaAs DBR

n-Au Contact

p-Au Contact

GaAs/AlGaAs MQW Active

AlGaAs oxidized into AlxOy

850 nm laser emission

Current confinement aperture diameter•1-4G VCSEL: 12 – 17 μm•10G VCSEL: 6 – 10 μm

Oxide VCSEL Structure Schematics

Page 8: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 8

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Oxide VCSEL Structure

• A thin layer of AlGaAs (97-98%Al) oxidized into AlxOy @ 400+ °C in water vapor for current confinement

• Mechanical stress introduced due to materials shrinkage after oxidation– Stress increases at lower temperature

– Source of Dark Line Defects (|110>DLD)

– High temperature burn-in not effective in removing early failures due to this cause

• AlxOy introduced in the VCSEL structure calls for hermetic packaging– A cause for random failure for array VCSEL not hermetically packaged

• 10G VCSELs less reliable due to smaller apertures and more susceptible to mechanical stress

• Smaller devices more susceptible to ESD due to smaller capacitance– Latent ESD and point defects as sources for |100> DLD

Page 9: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 9

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Oxide VCSEL Failure Modes --- Wear-out

• Wear-out failure– Failure rate increasing over the time

– Uniform dimming due to point defects generation across active emission area

– MTTF several million hours or longer

– Rarely a concern for VCSEL or VCSEL array• An array behaves similarly to a discrete in wear-out

Page 10: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 10

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Oxide VCSEL Failure Modes --- Random and ESD• Random failure

– Constant failure rate

– Due to process flaws, miss-handling, poor packaging, not hermetically sealed, etc.

– |100> DLD or |110> DLD• Activation energy of 0.7eV normally used for wear-out does not apply

• Lower temperature failure rate could be equivalent to or higher than higher temperature due

to extra mechanical stress (flat or negative activation energy)

– Random nature, follow Gaussian statistics• X10 array will be 10 times more likely to fail than a discrete VCSEL

• ESD failure (HBM, MM, CDM)– Large black spots in EL topography

– Latent ESD can be a source for |100>DLD, thus a cause for random failure

– 10G VCSEL more susceptible to ESD

– Unlike CMOS IC, VCSEL does not have any built-in ESD protection

– The failure path follows the weakest link• An x10 array will be 10 times more likely to fail than a discrete VCSEL

Page 11: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 11

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Oxide VCSEL Failure Modes --- Early Failures

• Infant Mortality– Failure rate decreasing over the time– Bad processed/grown wafers– Excessive mechanical stress due to oxidation– ESD during process and packaging– Burn-in removes most of the early failures, but some could

escape from the burn-in screening due to long infant mortality tails of oxide VCSELs

• Screening effectiveness varies from vendor to vendor• Once a while, bad wafers escape from being screened out• High temperature burn-in not effective in removing failures due to

mechanical stress in the oxide VCSEL

Page 12: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 12

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Field Failure Returns & Implication to Array

• Field transceiver failures due to VCSELs are not wear-out failures– Oxide VCSEL transceiver shipping history is less than 7 years

– VCSEL wear-out MTTF is at least 10 times longer, even operating at 85°C

• 90%+ VCSEL failures from field returns are due to either DLD or ESD– Categorized as either residual early failures, random failures or ESD

– A x10 VCSEL array will be 10 times more likely to fail than the current field

returns based on discrete VCSELs

• Published VCSEL reliability study and projection by VCSEL suppliers are done at temperatures too high to reveal the failure modes due to oxide layer stress, thus not effective in predicating random failure rates under normal operation condition

Page 13: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 13

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Field Return FA from a VCSEL Vendor

•None due to wear-out

•Predominant failures as “probable ESD/EOS”

Ref: “A plot twist: the continuing story of VCSELs at AOC”, Technical Publication, www.finisar.com

Page 14: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 14

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Partial Fault Protection (PFP)

• Network level and physical layer link redundancy are normally used to protect from single link failure

• Initial 100G applications are more likely for 10G aggregation, and its cost can be high

• Considering 4 times more likely to fail for a 4-lane PMD and 10 times more likely to fail for a 10-lane PMD, PFP on the PMD/PHY layer will reduce the burden on having network layer protection, and reducethe need for a redundant physical layer link.– A multi-lane PMD provides an opportunity to utilize the surviving lanes for data

transmission without an immediate need for replacement and reducing the

network redundancy requirements

Page 15: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 15

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Current Line-card Protection Scheme

• The connection between router and transport devices must be highly reliable

• 1+1 line card protection has been used• But, it is costly, and makes physical links more complex

Data network

Transport networkOTN

Line-cardRouter

Line-cardOTN

Line-card

A B

Data network

Router Line-card

OTN Line-card

Router Line-card Router

Line-cardOTN

Line-card

Backup link Backup link

Page 16: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 16

IEEE 802.3ba Task Force, 21-25 Jan, 2008

100GE Implementation per Current (1+1) Protection Scheme

• Two sets of system fabric buses are on for backup• Reduce the equipment capacity and increase the ssytem cost due to

redundancy

OTN Line-card

Router Line-card

OTN Line-card

Router Line-card

OTN Line-card

Router Line-card

OTN Line-card

Router Line-card

System Fabric Bus

System Fabric Bus

4x25G xWDM

(4-Lane PMD)10x10G Multi-lane fiber

(10-Lane PMD)

Page 17: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 17

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Current Network Level Protection Scheme

• Path H-A-E-D-F, E to D is a 100GE link by multi-lane PHY.

• If one lane fails in the Node E, node E will not be available. A new 100G link needs to be computed

– Backup with an additional 100G link (e.g. 1+1) is very costly to the network (Capex)

– If relying upon reroute protection protocol (e.g. FRR, 1:1), it may take a long time to compute new flow path when there are large numbers of 100G flow streams.

Mesh network

H F

A

C

B

D

E

hundreds msec<200

>several sec>200

TimeFlow numbers

ASON Recovering time, depending on how many paths to be recovered

Page 18: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 18

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Proposed Line-card Protection with PFP

• With PFP (partial fault protection), when a laser fails, the link connection will be maintained at a lower data rate– Laser has been the weakest link among all components in the system

• Partial fault indication may be sent to inform the higher level protocol (802.1, FRR, 1:1 protect) to decrease the bandwidth to accommodate the remaining lanes

• It will have little effect on the flow in the data network.• Failed modules will be replaced during scheduled maintenance• Network level protection kicks in when PFP fails

– Reduce the need for redundant link card protection

Data network

Transport networkData

network OTN Line-card

A

Router Line-card

Router Line-card

OTN Line-card

B

Page 19: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 19

IEEE 802.3ba Task Force, 21-25 Jan, 2008

PFP for Multi-Lane PMD

• When a laser or lane fails, receiver or DDM detects the failure.• Receiver or SD sends LLF (Local Link Fail) to the higher level, and

sends RLF (Remote Link Fail) to the transmit.• Transmit inserts NULL blocks to the failed lane, and sends a flow control

indication to the high level protocol in order to decrease the bandwidth to accommodate the remaining lanes

High Level Protocol

Flow Control Indicate

MAC PBL PHY PHY MACPBL

LLF

RLF

NULL

Flow Control

Transmitter Receiver

Page 20: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 20

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Optional Line-card Protection for Fiber Breakage

• Passive splitter and optical switch introduced as 1+1 link protection to protect the link from fiber breakage

• The combination of PFP and 1+1 link protection provides a seamless PHY level protection to alleviate the need for network level protection through data rerouting

OTN Line-card

Router Line-card

splitter optical switch

Work link fiber

Backup link fiber

Page 21: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 21

IEEE 802.3ba Task Force, 21-25 Jan, 2008

PFP for 10-Lane PMD in PBL181716

21

1 2 16 17 18

20 19 18 2 1

110

821711

918

A A A A A

110

211

817

918

A A A A A

RS

PBL

PCS

First block181716

21

1 2 16 17 18

18 17 16 2 1

110

211

818

918

A A A A A

110 2

118

17

918

AA A

A

A

RS

PBL

PCS

AFirst block

Alignment

TXC<3:0>

Sync(2bits)

Transmit Data Path Receive Data Path

Lane1 Lane10Lane1 Lane10

1 Data block

N Special Control Block

NN

NN

NN

NN

DiscardInsert Special Control Block into failure Lane

• When informed of a lane failure, TX inserts NULL blocks into thefailed lane, keeping valid blocks distributed to the valid lanes.

Page 22: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 22

IEEE 802.3ba Task Force, 21-25 Jan, 2008

PFP for 4-Lane PMD in PBL181716

21

1 2 16 17 18

20 19 18 2 1

14

25

37

A A A A

14

25

37

A A A A

RS

PBL

PCS

First block181716

21

1 2 16 17 18

18 17 16 2 1

14

25

37

A A A A

14 2

5

37

AA

A

A

RS

PBL

PCS

AFirst block

Alignment

TXC<3:0>

Sync(2bits)

Transmit Data Path Receive Data Path

Lane1 Lane4Lane1 Lane4

1 Data block

N Special Control Block

NN

NN

NN

NN

DiscardInsert Special Control Block into failure Lane

• When informed of a lane failure, TX inserts NULL blocks into the failed lane, keeping valid blocks distributed to the valid lanes.

Page 23: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 23

IEEE 802.3ba Task Force, 21-25 Jan, 2008

PFP for 4-Lane PMD with CTBI in PBL

4X25G

Skew Line

PL1PL2

PL3PL4

4X25G

Skew Line

PL1PL2

PL3PL4

OAM--RLF

Distributor

Respond to insert NULL blocks to disable the lane

Send to higher layer with flow control

Page 24: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

Page 24

IEEE 802.3ba Task Force, 21-25 Jan, 2008

Conclusions• Non-wear-out failures are the major concerns for field PMD degradation• Current single channel PMD field return rate due to laser failures has not been

negligible and multi-lane PMD failure rate will increase by four to ten times– Encourage equipment vendors to share transceiver field failure statistics to help with the decision

making

• Multi-lane PMD provides an opportunity for lower cost link fault protection by utilizing the surviving lanes --- PFP

– PFP is complementary to the current network protection architecture

– PFP makes line-card protection more economical• NULL block is defined to disable the failure lane

• Laser/PHY-Lane fail----- use PFP

• Fiber fail ----- use 1+1 protection

• Router or node fail ----- use network protection

• PFP indication may be used by higher layer (802.1) to achieve network wide protection

– Dialogue needed with 802.1

Page 25: Multi-Lane PMD Reliability and Partial Fault Protection …grouper.ieee.org/groups/802/3/ba/public/jan08/jiang_01_0108.pdf · Multi-Lane PMD Reliability and Partial Fault Protection

HUAWEI TECHNOLOGIES Co., Ltd.

IEEE 802.3ba, January, 2008

Thank You