
DesignCon 2019

100+ Gb/s Ethernet Forward

Error Correction (FEC) Analysis

Cathy Ye Liu, Broadcom Inc.



Abstract

In this paper, high-speed serial link error propagation models and different Ethernet PMA

multiplexing and codeword interleaving schemes have been studied and simulated to

provide FEC performance analysis for 100/200/400 GbE systems with 100+ Gb/s per

lane PAM4 interfaces. Different scenarios such as 1/ (1+D) mod4 precoding, PMA bit

multiplexing, symbol multiplexing and FEC codeword interleaving and their impacts on

overall system performance will be discussed. Multi-part link where a single FEC is

shared between electrical and optical parts is studied as well. Advanced FEC schemes

beyond current KP4 code will be explored at the end.

Authors Biography

Cathy Ye Liu, Distinguished Engineer and Director, currently heads the Broadcom SerDes architecture and modeling group. Previously she worked as R&D director and distinguished engineer at Avago/LSI, which acquired Broadcom in 2016. Since 2002, she

has been working on high speed transceiver solutions. Previously she has developed read

channel and mobile digital TV receiver solutions. Her technical interests are signal

processing, FEC, and modeling in high-speed optical and electrical transceiver solutions.

She has published many journal and conference papers and holds 20+ US patents. Cathy

has demonstrated leadership in industry standards bodies and forums. Currently she serves as a member of the board of directors of the Optical Internetworking Forum (OIF), a member of the board of advisors for the Department of Electrical & Computer Engineering (ECE) at the University of California, Davis, and co-chair of the DesignCon technical track on high-speed signal processing, equalization and coding. She

received her B.S. degree in Electronic Engineering from Tsinghua University, China, in

1995 and received her M.S. and Ph.D. degrees in Electrical Engineering from University

of Hawaii in 1997 and 1999, respectively.


1. Introduction

Future data centers and high-speed computing require 100+ Gb/s per lane connectivity to meet growing application and bandwidth demands. The IEEE 802.3bj 100 GbE interface is based on an aggregation of 4 lanes, each with a data rate of 25.78125 Gb/s [1]. To enable higher density and lower cost systems, IEEE 802.3bs [2] and 802.3cd [3] deployed 4 or 8 lanes of 53.125 Gb/s to support 200 GbE and 400 GbE. To continue doubling system bandwidth and density, IEEE 802.3 has recently established the 802.3ck [4] 100 Gb/s, 200 Gb/s, and 400 Gb/s electrical interfaces task force to support single or multiple lanes of 100+ Gb/s.

Since 2013 the industry has painfully but successfully moved the signaling format from NRZ to PAM4 during the transition from 25 Gb/s to 50 Gb/s link data rates. The design challenges of PAM4 SerDes, such as linearity and tuning complexity, are not the focus of this paper. However, assuming the same maximum signal amplitude, the detection penalty of the four-level signaling format (PAM4) relative to the two-level signaling format (NRZ) is 9.54 dB, or even larger once the horizontal margin degradation due to multi-level signal crossings is considered. The PAM4 detection penalty can be partially offset by forward error correction (FEC), and FEC has become part of the PAM4 system solution [5]. Interfaces at 25 Gb/s or slower data rates normally do not take advantage of FEC and therefore target a very low detector error ratio (DER0). Note that DER0 is the baud-rate PAM4 symbol error rate for a symbol-by-symbol detector. It is equivalent to the bit error ratio (BER) for the NRZ signaling format, while DER0 is generally twice the BER for the PAM4 signaling format. FEC can relax the DER0 target significantly with the promise that the BER will be acceptably low following error correction. For example, the DER0 requirement in IEEE 802.3cd is 1e-4 to achieve a final post-FEC BER of 1e-13 for 200/400 GbE.
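The 9.54 dB figure quoted above can be checked with a one-line calculation. As a sketch, assume the same peak-to-peak swing and the same noise for both formats, so that PAM4 stacks three eyes into the amplitude NRZ uses for one (the symbols A_NRZ and A_PAM4 for the eye amplitudes are introduced here for illustration only):

```latex
\text{penalty} = 20\log_{10}\!\left(\frac{A_{NRZ}}{A_{PAM4}}\right)
               = 20\log_{10}(3) \approx 9.54\ \text{dB}
```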

The experience (and lessons learned) from PAM4 SerDes development and system production [6] [7] highlighted that pre-FEC SerDes DER0 performance is no longer a reliable overall system performance metric, and does not correlate well with the post-FEC metric required for an accurate system performance evaluation. Furthermore, it is hard (if not impossible) to simulate system performance bit by bit because of the long simulation time and the low post-FEC BER requirement of 1e-13 or lower. Therefore, accurate and fast analysis methods become essential for 50 Gb/s and 100 Gb/s serial link systems.

This paper will start with error propagation models of random and burst errors. The

models can be analytical and Monte Carlo simulation based and the correlation between

those two is presented. Different Ethernet coding scenarios such as 1/(1+D) mod4

precoding, Physical Medium Attachment (PMA) multiplexing and codeword interleaving

schemes have been studied and simulated to provide FEC performance analysis for

100/200/400 GbE systems with 100+ Gb/s per lane PAM4 interface. Multi-part link

systems where a single FEC is shared between electrical and optical parts are then

addressed. Finally, advanced FEC schemes beyond the current KP4 code are explored.


2. Random and burst error models

In this section, different error models will be studied.

2.1 Binary symmetric channel (BSC) random error model

A very simple random error model can be used to study FEC coding gain for a binary

symmetric channel with additive Gaussian noise. The random PAM4 symbol error rate

prior to FEC decoding is

$$SER_{PAM4} = \frac{3}{4}\,\mathrm{erfc}\left(\sqrt{\frac{SNR}{2 \times 5}}\right) \qquad \text{Eq. 1}$$

where SNR is the signal-to-noise ratio at the PAM4 symbol detector. Eq. 1 assumes the PAM4 signal variance is 5, with the four signaling levels at 3, 1, -1, and -3. Note that SER_PAM4 is equivalent to DER0 as defined in Ethernet specifications 802.3bs and 802.3cd.

For the PAM4 4-level signaling format, noise is most likely to cause a detection error between two adjacent levels. For Gray-coded PAM4, this causes one of the two bits in each PAM4 symbol to be in error. Therefore,

$$BER_{PAM4} = \frac{1}{2}\,SER_{PAM4} \qquad \text{Eq. 2}$$

Independent random bit errors are assumed at the input of the FEC decoder of a Reed-Solomon (RS) code (n, k, t) over GF(2^l), where n is the codeword length and k is the information length [11]. Each RS code symbol has l bits, i.e., m=l/2 2-bit PAM4 symbols. The input RS symbol error rate prior to the FEC decoder can be calculated as

$$SER_{RS\_pre} = 1 - (1 - SER_{PAM4})^{m} \qquad \text{Eq. 3}$$

With t symbol error correction capability in each codeword, the uncorrectable RS codeword error rate after decoding is

$$CER = \sum_{i=t+1}^{n} \binom{n}{i}\, SER_{RS\_pre}^{\,i} \cdot (1 - SER_{RS\_pre})^{n-i} \qquad \text{Eq. 4}$$

The RS symbol error rate after decoding is

$$SER_{RS\_post} = \sum_{i=t+1}^{n} \frac{i}{n} \binom{n}{i}\, SER_{RS\_pre}^{\,i} \cdot (1 - SER_{RS\_pre})^{n-i} \qquad \text{Eq. 5}$$

Then the bit error rate after RS decoding is approximately

$$BER_{post} \approx \frac{1}{l}\,SER_{RS\_post} \qquad \text{Eq. 6}$$

2.2 Analytical Gilbert-Elliot channel burst error model

The previous assumptions of independent random detector errors and Gaussian noise do not always hold for channels dominated by inter-symbol interference (ISI). Furthermore, error propagation produces burst errors instead of random bit errors when a DFE is included in the receiver. To improve the model accuracy, the Gilbert-Elliot burst error model [8] [9] can be modified to predict FEC coding gain for an ISI channel and a DFE-based receiver over PAM4 interfaces.

The error propagation is modeled in this paper on the assumption that the probability of getting an error in the symbol following an initial error is “a”. The probability of a burst of 3 errors is then a^2, the probability of a burst of 4 errors is a^3, and so on. The Gilbert-Elliot model is limited to 1-tap DFE architectures. If the DFE tap coefficient equals 1 (the same amplitude as the main signal cursor), a=0.75 for PAM4. This follows because 1/4 of the time the error drives the slicer input to saturate at the highest or lowest symbol, so no further error occurs, while the rest of the time (3/4) there is a 100% chance of an error when the DFE tap coefficient is 1. Similarly, if the DFE tap coefficient is reduced to 0.5, a=0.375. For the random error case, a=0.

In this section, we only consider the symbol multiplexing and single KP4 FEC coding

scheme as described in Figure 1 in which there is no bit-multiplexing and codeword

interleaving applied. We will add other coding schemes and their impacts on FEC

performance in later sections. Figure 1 shows how a group of bits encode to RS symbols

and later to PAM4 symbols. Each block represents one bit. For each bit the top number

represents corresponding RS symbol index number that it is encoded to and the bottom

number represents the corresponding bit index number within that RS symbol. Each 10-bit RS symbol consists of five PAM4 symbols. Each PAM4 symbol consists of two bits, the lsb (least significant bit) and the msb (most significant bit). In general, a Gray-coded PAM4 symbol error has either the lsb or the msb in error, not both at once. If a run of PAM4 symbol errors (up to five symbols long) crosses an RS symbol boundary, it causes two RS symbols to be in error; otherwise it causes only one RS symbol error.

Figure 1. Symbol multiplexing coding scheme without bit-multiplexing and codeword

interleaving


Now we can calculate error signature, {p(1), p(2), p(3), …, p(t)}, the probability of a

burst error exactly causing 1, 2, 3, … RS symbol errors given an initial error and its error

propagation probability a as

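$$p(1) = \sum_{i=0}^{m-1} \frac{m-i}{m} \cdot a^{i} \cdot (1-a) \qquad \text{Eq. 7}$$

$$p(k) = \sum_{i=(k-2) \cdot m+1}^{(k-1) \cdot m} \frac{i-(k-2) \cdot m}{m} \cdot a^{i} \cdot (1-a) \; + \sum_{i=(k-1) \cdot m+1}^{k m-1} \frac{k m-i}{m} \cdot a^{i} \cdot (1-a) \qquad \text{Eq. 8}$$

for k>1.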
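Eq. 7 and Eq. 8 translate almost directly into code. The following is a minimal sketch, not the author's implementation; the function name `error_signature` and its argument names are chosen here for illustration.

```python
def error_signature(a, m, k_max):
    """Return [p(1), ..., p(k_max)] following Eq. 7 and Eq. 8.

    a: error propagation probability, m: PAM4 symbols per RS symbol.
    """
    def weight(i):
        # Probability that the initial error is followed by exactly i more errors.
        return a**i * (1.0 - a)

    # Eq. 7: the whole burst stays inside one RS symbol.
    sig = [sum((m - i) / m * weight(i) for i in range(m))]

    # Eq. 8: the burst touches exactly k RS symbols, k > 1.
    for k in range(2, k_max + 1):
        first = sum((i - (k - 2) * m) / m * weight(i)
                    for i in range((k - 2) * m + 1, (k - 1) * m + 1))
        second = sum((k * m - i) / m * weight(i)
                     for i in range((k - 1) * m + 1, k * m))
        sig.append(first + second)
    return sig


if __name__ == "__main__":
    for a in (0.0, 0.375, 0.75):
        print(a, [round(p, 4) for p in error_signature(a, m=5, k_max=4)])
```

For KP4 (m=5) and a=0.75 this gives p(1) of about 0.542, and for a=0 it collapses to p(1)=1, the random error case.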

Based on the error signature {p(1), p(2), p(3), …, p(t)} and the initial symbol error rate

SERRS, the RS (n, k, t) codeword error rate can be calculated as

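$$CER = \sum_{j=1}^{t} \binom{n}{j}\, SER_{RS}^{\,j}\, (1-SER_{RS})^{n-j} \cdot P_j(>t) \; + \sum_{i=t+1}^{n} \binom{n}{i}\, SER_{RS}^{\,i}\, (1-SER_{RS})^{n-i} \qquad \text{Eq. 9}$$

The first sum covers j = 1 (single burst error), 2, 3, …, t separate burst errors in a codeword, and the second sum covers t+1 or more separate burst errors. P_j(>t) is the probability that j separate burst errors together cause more than t RS symbol errors. It is built from products of error signature terms: for a single burst error P_1(>t) = p(t+1) + p(t+2) + …, and for two burst errors it collects all products p(k_1)·p(k_2) with k_1 + k_2 > t, and so on.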
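The sum over separate burst errors in Eq. 9 can be evaluated numerically by convolving the error signature with itself. The sketch below is one possible way to do this, not the paper's implementation. It assumes that burst errors start independently at rate SER_RS per RS symbol and do not overlap. The name `burst_cer` and the list-based signature input are hypothetical.

```python
from math import comb


def burst_cer(ser_rs, signature, n, t):
    """Codeword error rate for independent burst errors (sketch of Eq. 9).

    signature[k-1] = p(k). Probability mass beyond the end of the list is
    ignored, so the list should extend well past t for bursty channels.
    """
    # tail[k] = P(a single burst causes at least k RS symbol errors)
    tail = [0.0] * (len(signature) + 2)
    for k in range(len(signature), 0, -1):
        tail[k] = tail[k + 1] + signature[k - 1]

    dist = [1.0] + [0.0] * t   # dist[s] = P(exactly s symbol errors so far, s <= t)
    failed = 0.0               # P(more than t symbol errors so far)
    cer = 0.0
    for j in range(1, n + 1):  # j separate burst errors in one codeword
        new = [0.0] * (t + 1)
        for s, ps in enumerate(dist):
            if ps == 0.0:
                continue
            need = t - s + 1   # symbol errors needed to exceed t
            if need < len(tail):
                failed += ps * tail[need]
            for k in range(1, min(need - 1, len(signature)) + 1):
                new[s + k] += ps * signature[k - 1]
        dist = new
        cer += comb(n, j) * ser_rs**j * (1.0 - ser_rs)**(n - j) * failed
    return cer
```

With the signature [1.0] (every error event hits exactly one RS symbol) the result reduces to the random error CER of Eq. 4, which is a convenient sanity check before feeding in a burst signature.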

For Ethernet applications, the frame loss ratio (FLR) is normally used as the post FEC

system performance metric:

𝐹𝐿𝑅 = 𝐶𝐸𝑅 ∙ (1 + 𝑀𝐹𝐶)/𝑀𝐹𝐶 Eq. 10

where MFC is the number of MAC frames per codeword, say MFC=8. FLR to post-FEC

BER ratio is roughly 620 [10].

In this section, we focus on the analysis of KP4 FEC, RS (544, 514, 15) over GF(2^10). Table 1 shows the pre-FEC SerDes detector SNR and DER0 requirements to achieve 1e-18 post-FEC BER (or 6.2e-16 FLR) for random errors and for burst errors with a=0.75 and a=0.375. Figures 2 and 3 show post-FEC FLR performance versus pre-FEC DER0 and SNR, with and without KP4 FEC, for random and burst errors. The three dashed-line thresholds are equivalent to post-FEC BERs of 1e-12 (100 GbE), 1e-13 (200/400 GbE) and 1e-15 (OIF CEI). KP4 FEC significantly relaxes the SerDes DER0 requirement and provides SNR gain. Burst errors with a=0.75 yield less coding gain, i.e., they tighten the pre-FEC DER0 requirement, compared with the random error case (a=0) and burst errors with a=0.375.

Table 1. Pre-FEC SNR and DER0 requirements with RS (544, 514, 15) to achieve 1e-18 post-FEC BER

| | Random Error | Burst Error a=0.75 | Burst Error a=0.375 |
|---|---|---|---|
| SNR (dB) | 17.98 | 22.96 | 18.65 |
| DER0 | 2.95e-4 | 2.47e-10 | 9.66e-5 |
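The random error column of Table 1 can be approximately reproduced from Eq. 1 to Eq. 6 and Eq. 10. The following sketch chains those equations for KP4; the function names and the default MFC=8 (the example value given with Eq. 10) are illustrative choices, not part of the paper.

```python
from math import comb, erfc, sqrt


def ser_pam4(snr_db):
    """Eq. 1: PAM4 symbol error rate (DER0) for levels 3, 1, -1, -3."""
    snr = 10 ** (snr_db / 10)
    return 0.75 * erfc(sqrt(snr / (2 * 5)))


def kp4_random(snr_db, n=544, t=15, l=10, mfc=8):
    """Return (DER0, CER, post-FEC BER, FLR) for independent random errors."""
    m = l // 2                                         # PAM4 symbols per RS symbol
    der0 = ser_pam4(snr_db)
    ser_pre = 1 - (1 - der0) ** m                      # Eq. 3
    counts = range(t + 1, n + 1)
    terms = [comb(n, i) * ser_pre**i * (1 - ser_pre) ** (n - i) for i in counts]
    cer = sum(terms)                                   # Eq. 4
    ser_post = sum(i / n * term for i, term in zip(counts, terms))  # Eq. 5
    ber_post = ser_post / l                            # Eq. 6
    flr = cer * (1 + mfc) / mfc                        # Eq. 10
    return der0, cer, ber_post, flr


if __name__ == "__main__":
    der0, cer, ber, flr = kp4_random(17.98)
    print(f"DER0={der0:.3g}  CER={cer:.3g}  BER={ber:.3g}  FLR={flr:.3g}")
```

At 17.98 dB this sketch yields a DER0 of about 2.95e-4 and an FLR on the order of 6e-16, in line with the random error entries of Table 1.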

Figure 2. FLR vs. SerDes detector DER0 with and without RS (544, 514, 15) FEC for

random and burst error cases


Figure 3. FLR vs. SerDes detector SNR with and without RS (544, 514, 15) FEC for

random and burst error cases

2.3 1/(1+D) mod 4 precoding

The burst error run length caused by DFE error propagation can be reduced by using

precoding. PAM4 precoding 1/(1+D) mod 4 as defined in IEEE 802.3cd 120.5.7.2 [3] is

illustrated in Figure 4.

Figure 4. Block diagram of 1/(1+D) mod 4 precoding

The purpose of this precoding is to reduce a long run of consecutive burst errors {1, -1, 1, -1, 1, -1 …} caused by a 1-tap DFE with a=0.75 to 2 errors per error event, one error at the entry and the other at the exit. We can calculate the error signature {p(1), p(2), p(3), …, p(t)} with precoding for a 1-tap DFE burst error with a=0.75 as

$$p(1) = \sum_{i=0}^{m-2} \frac{m-i-1}{m} \cdot a^{i} \cdot (1-a), \qquad \text{Eq. 11}$$

$$p(2) = 1 - p(1),$$

$$p(k > 2) = 0.$$
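The entry/exit behaviour can be seen on a toy sequence. The sketch below uses the usual form of a 1/(1+D) mod 4 precoder, P(n) = (G(n) - P(n-1)) mod 4, with the receiver undoing it as G'(n) = (P'(n) + P'(n-1)) mod 4. The zero initial state, the data pattern and the error positions are assumptions picked so that every errored symbol stays within the levels 0 to 3.

```python
def precode(data):
    """1/(1+D) mod 4 precoder: P(n) = (G(n) - P(n-1)) mod 4, with P(-1) = 0."""
    out, prev = [], 0
    for g in data:
        prev = (g - prev) % 4
        out.append(prev)
    return out


def decode(received):
    """Receiver side (1+D) mod 4: G'(n) = (P'(n) + P'(n-1)) mod 4."""
    out, prev = [], 0
    for p in received:
        out.append((p + prev) % 4)
        prev = p
    return out


def error_positions(received, data):
    return [n for n, (d, g) in enumerate(zip(decode(received), data)) if d != g]


data = [1] + [3] * 9            # chosen so the precoded stream alternates 1, 2
tx = precode(data)

# Alternating +1/-1 burst on 5 consecutive slicer decisions (DFE error propagation).
burst = list(tx)
for n, e in zip(range(2, 7), (1, -1, 1, -1, 1)):
    burst[n] += e
print("burst: slicer errors at 2..6, decoded errors at", error_positions(burst, data))

# A single isolated slicer error.
single = list(tx)
single[4] += 1
print("single: slicer error at 4, decoded errors at", error_positions(single, data))
```

The five-symbol alternating burst leaves decoded errors only at positions 2 and 7 (entry and exit), while the single slicer error becomes two decoded errors at positions 4 and 5.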

On the other hand, a single random error at the slicer output turns into two errors after the precoding is removed. Similarly, for a 1-tap DFE with a smaller error propagation factor of a=0.375, the error pattern is unlikely to be a consecutive run such as {1, -1, 1, -1, 1, -1 …}, so the reduction of a long burst error to 2 errors no longer holds. Therefore, precoding doesn’t always mitigate error propagation. The analysis results in Table 2 and Figures 5 and 6 support this statement. For a burst error model with a=0.75, precoding provides 3.67 dB of SNR gain and relaxes the DER0 target by five orders of magnitude, while for a burst error model with a=0.375, precoding instead incurs a 0.35 dB SNR penalty and roughly a ½ order of magnitude DER0 penalty.

Since precoding is simple and easy to implement, IEEE 802.3cd has adopted it as a mandatory function in the transmitter, but the link can be configured to enable or disable the precoder depending on the receiver architecture and the error propagation characteristics.

Table 2. Pre-FEC SNR and DER0 requirements to achieve 1e-18 post-FEC BER with and without precoding functions for burst error a=0.75 and 0.375

| | Random Error | Burst Error a=0.75 | Burst Error a=0.375 | Burst Error a=0.75 +precoding | Burst Error a=0.375 +precoding |
|---|---|---|---|---|---|
| SNR (dB) | 17.98 | 22.96 | 18.65 | 19.29 | 19.00 |
| DER0 | 2.95e-4 | 2.47e-10 | 9.66e-5 | 2.82e-5 | 5.06e-5 |


Figure 5. FLR vs. SerDes detector DER0 for random and burst error cases with and

without precoding over RS (544, 514, 15) FEC

Figure 6. FLR vs. SerDes detector SNR for random and burst error cases with and without

precoding over RS (544, 514, 15) FEC


2.4 Multiple-tap DFE burst error model

The previous analytical Gilbert-Elliot channel burst error model is based on a 1-tap DFE

receiver architecture. In this section, the analysis is extended to multiple tap DFE

architectures. A Monte Carlo simulation approach is applied to model multiple tap DFE

error propagation characteristics.

First, random data is generated and encoded using a real set of multiple-tap DFE tap

weights obtained from a link simulation or bench measurement. The DFE tap weights are

assumed to be exactly equal to the post-cursor samples of the pulse response. A simple

pulse response is then generated based on the DFE tap magnitudes. The generated pulse

response length matches the length of the DFE buffer. It is convolved with the data to

generate the channel response at the DFE input. Gaussian noise is added at the detector

slicer such that the received symbols have the simulated or measured DER0 (say 1e-4).

Second, a single error is injected into the detector slicer and error propagation is

monitored through the multiple-tap DFE feedback and random noise at the detector

slicer. This process is repeated for a programmed number of burst error events (say 1e7 in

this paper). Each burst error event simulation uses an independent generation of random

data.

Third, error signatures with different FEC coding schemes can be calculated among the

simulated number of burst error events. Burst length is defined as the total length of a

burst error from the injected error until the last error. Additionally, the error signatures

{p(1), p(2), p(3), …, p(t)} of different coding schemes are calculated to understand how

they affect FEC performance. Pre-coding functions can be applied in the Monte Carlo

simulator to capture its effect. Ethernet bit-multiplexing and codeword interleaving

schemes described in the later sections can be simulated as well. Each burst error event

occurs randomly and it could locate anywhere in the RS symbols or codewords. To

capture different alignments the burst error event is swept through all the possible

locations in the RS codewords.

The analysis concludes with the error signature of each case being fed into KP4 FEC

model to see its post-FEC FLR and BER performance.
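The injection step of this procedure can be sketched in a few lines. The code below is a simplified illustration, not the simulator used in the paper. It assumes PAM4 levels of ±1 and ±3, DFE taps `h` expressed relative to a main cursor of 1, a Gaussian noise sigma of 0.26 (which corresponds to a DER0 of roughly 1e-4), and it only records burst length. All names are hypothetical.

```python
import random

LEVELS = (-3, -1, 1, 3)


def slice_pam4(y):
    """Nearest-level PAM4 decision."""
    return min(LEVELS, key=lambda level: abs(y - level))


def simulate_burst(h, sigma, rng, max_len=10_000):
    """Inject one adjacent-level error; return the burst length in PAM4 symbols."""
    x = rng.choice(LEVELS)
    step = rng.choice([s for s in (-2, 2) if -3 <= x + s <= 3])
    # Past decision errors (sent minus decided), most recent first.
    errors = [-step] + [0] * (len(h) - 1)
    length, last_error = 1, 1
    while any(errors) and length < max_len:
        x = rng.choice(LEVELS)
        # Wrong past decisions leave residual ISI of h[k] * e[n-k] at the slicer.
        y = x + sum(hk * ek for hk, ek in zip(h, errors)) + rng.gauss(0.0, sigma)
        e = x - slice_pam4(y)
        length += 1
        if e != 0:
            last_error = length
        errors = [e] + errors[:-1]
    return last_error  # distance from the injected error to the last error


def fraction_propagating(h, sigma=0.26, events=20_000, seed=1):
    """Fraction of injected errors whose burst is longer than one symbol."""
    rng = random.Random(seed)
    return sum(simulate_burst(h, sigma, rng) >= 2 for _ in range(events)) / events


if __name__ == "__main__":
    for taps in ([0.0], [0.5], [1.0]):
        print(taps, fraction_propagating(taps))
```

For a 1-tap DFE the returned fraction is an estimate of the propagation factor a, so h=[1.0] and h=[0.5] should land near 0.75 and 0.375. A fuller simulator would also record which bits are hit so that the error signature of each coding scheme can be accumulated.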

To correlate the previous analytical model with the Monte Carlo simulation, we can

compare the results, the error signatures {p(1), p(2), p(3), …, p(t)}, generated by those

two. For the random error case, we simply set the DFE tap weights to all zero, h=[0 0 0 0

0 …]. For a 1-tap DFE burst error case we can set DFE tap weights as h=[1 0 0 0 0 …]

and h=[0.5 0 0 0 0 …] for a=0.75 and a=0.375, respectively. Figure 7 shows good match

between the Monte Carlo simulation (dots and circles) and the analytical random error

and the 1-tap DFE Gilbert-Elliot burst error model (solid and dash lines) with and without

precoding.


Figure 7. Correlation between analytical model and Monte Carlo simulation results

Now we can use the matched Monte Carlo model to simulate the multiple-tap DFE. Four

test cases are selected, all with 12 DFE taps. The chosen 12-tap DFE tap weights h are:

Case 1: h=[0.7 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2]

Case 2: h=[0.7 -0.2 0.2 -0.2 0.2 -0.2 0.2 -0.2 0.2 -0.2 0.2 -0.2]

Case 3: h=[0.700 0.072 -0.027 -0.039 -0.023 -0.017 -0.012 -0.009 -0.006 -0.006 -

0.005 -0.005]

Case 4: h=[0.700 0.200 0.200 0.200 0.200 0.147 0.116 0.086 0.071 0.056 0.044

0.042]

Case 1 and 2 apply the maximum tap weights as defined in IEEE 802.3cd [3], b_max

(1)=0.7 and b_max (2:12)=0.2. Case 1 has the same polarity for DFE taps h2-h12 while

case 2 has an opposite polarity. Case 1 and 2 represent the worst cases to make some

bounds on performance. Cases 3 and 4 are real DFE tap coefficients obtained from COM

calculations over a real channel at 112 Gb/s data rate. Case 3 has reflections and Case 4

has a long decaying ISI tail.

Precoding is enabled for all cases. From the simulation results shown in Figures 8 and 9 we can see that the results of Cases 3 and 4 are close to the result of the 1-tap DFE with a=0.75. In other words, the analytical Gilbert-Elliot channel burst error model is a good candidate for FEC performance analysis in 100/200/400 GbE systems with 100+ Gb/s per lane PAM4 interfaces. Case 1 performs slightly worse than the 1-tap DFE case. However, Case 2, with DFE coefficients of alternating polarity, shows significant error propagation beyond the KP4 error correction capability. Case 2 is too pessimistic for predicting FEC performance in 100/200/400 GbE systems, since the chance of all DFE taps alternating in polarity and reaching the maximum coefficient constraints is very low.

Figure 8. FLR vs. SerDes detector DER0 with different DFE profiles

Figure 9. FLR vs. SerDes detector SNR with different DFE profiles


3. Ethernet FEC encoder and decoder interface and coding schemes

In Section 2 we only focused on symbol multiplexing mode as shown in Figure 1. In this

section other coding schemes like bit-multiplexing and codeword interleaving will be

added.

3.1 PMA bit multiplexing

In today’s FEC for 100 GbE (2x50 Gb/s), PMA 2:1 bit multiplexing is deployed as shown in Figure 10 and defined in IEEE 802.3bs section 120.5.2 [2]. Figure 11 shows how a group of bits is encoded to PAM4 symbols and RS symbols for this case. Unlike symbol multiplexing, the lsb and msb of a PAM4 symbol belong to different RS symbols in 2:1 bit multiplexing. A very short error pattern, such as 2 PAM4 symbol errors in a row, can therefore easily cause two RS symbol errors, so bit multiplexing can harm FEC performance. For 100 GbE with a 100 Gb/s per lane interface, the bit multiplexing could be 4:1 instead of 2:1, as shown in Figure 12. Compared with 2:1 bit multiplexing, a short burst error can easily cause 4 RS symbol errors with 4:1 bit multiplexing. Therefore, we expect further FEC performance degradation from 4:1 bit multiplexing. This is confirmed by the analysis results in Figures 13 and 14 for the burst error model with a=0.75. For the 100 GbE FLR target of 6.2e-10 (equivalent to a BER of 1e-12), 2:1 bit multiplexing shows 0.41 dB of FEC coding gain degradation relative to the symbol multiplexing scheme, while 4:1 bit multiplexing shows 1.32 dB of degradation. However, precoding nearly eliminates the coding gain difference between the cases with and without bit multiplexing. Therefore, precoding is a necessary function to alleviate the bit-multiplexing penalty for a burst error channel with a large error propagation factor a.
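The way bit multiplexing spreads a short burst over several RS symbols can be illustrated with a toy mapping. The sketch below is not the exact lane mapping of the standard: it assumes `lanes` aligned FEC lanes that are bit-interleaved round-robin, with consecutive bit pairs of the muxed stream forming PAM4 symbols, and it treats `lanes=1` as the symbol multiplexing case. Each PAM4 symbol error is assumed to hit either the msb or the lsb.

```python
from itertools import product


def worst_case_rs_symbols(burst_len, lanes, start=0, bits_per_rs=10):
    """Largest number of distinct RS symbols hit by a run of PAM4 symbol errors.

    Hypothetical mapping: muxed bit b belongs to lane b % lanes, and each lane
    groups its own consecutive bits into RS symbols of bits_per_rs bits.
    """
    worst = 0
    for hits in product((0, 1), repeat=burst_len):   # 0 = msb in error, 1 = lsb
        touched = set()
        for offset, bit in enumerate(hits):
            b = 2 * (start + offset) + bit            # index in the muxed bit stream
            lane, lane_bit = b % lanes, b // lanes
            touched.add((lane, lane_bit // bits_per_rs))
        worst = max(worst, len(touched))
    return worst


if __name__ == "__main__":
    for lanes in (1, 2, 4):
        print(f"{lanes}:1", [worst_case_rs_symbols(n, lanes) for n in (1, 2, 3, 4)])
```

Under these assumptions a burst of two PAM4 symbol errors stays within one RS symbol for symbol multiplexing but can reach two with 2:1 bit multiplexing, and a burst of four can reach four RS symbols with 4:1 bit multiplexing, matching the qualitative argument above.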

Figure 10. Block diagram of current 50 GbE and 100 GbE FEC


Figure 11. 2:1 bit multiplexing coding scheme without codeword interleaving

Figure 12. 4:1 bit multiplexing coding scheme without codeword interleaving

Figure 13. FLR vs. SerDes detector DER0 with symbol multiplexing, 2:1 bit multiplexing

and 4:1 bit multiplexing with and without precoding


Figure 14. FLR vs. SerDes detector SNR with symbol multiplexing, 2:1 bit multiplexing

and 4:1 bit multiplexing with and without precoding

3.2 Codeword interleaving

In today’s FEC for 200 GbE and 400 GbE with 50 Gb/s per lane, 2:1 codeword interleaving is deployed in a checkerboard order, as in Figures 119-10 and 119-11 of [2]. Figure 15 illustrates how 2:1 codeword interleaving would apply to 200 GbE and 400 GbE with a 100 Gb/s per lane interface. Figure 16 shows how a group of bits is encoded to PAM4 symbols and RS symbols for the 4:1 bit multiplexing and 2:1 codeword interleaving coding scheme. The block colors (blue and red) indicate which codeword each bit belongs to. The purpose of codeword interleaving is to mitigate error propagation by splitting a long burst error across two separate codewords, and thus to improve the coding gain. From the analysis results in Figures 17 and 18 for a burst error model with a=0.75, we can see that for the 200/400 GbE FLR target of 6.2e-11 (equivalent to a BER of 1e-13), 2:1 codeword interleaving provides 2.06 dB more coding gain than no codeword interleaving for the cases without precoding. With precoding the difference between the two becomes smaller, but 0.45 dB of coding gain from 2:1 codeword interleaving is still observed.


Figure 15. Block diagram of 200 GbE and 400 GbE FEC with 2:1 codeword interleaving

Figure 16. 4:1 bit multiplexing coding scheme and 2:1 codeword interleaving

Figure 17. FLR vs. SerDes detector DER0 with and without 2:1 codeword interleaving


Figure 18. FLR vs. SerDes detector SNR with and without 2:1 codeword interleaving

3.3 Summary for 100/200/400 GbE with 100 Gb/s per lane interface

Table 3 shows FEC performance for 100 GbE and 200/400 GbE with a 100 Gb/s

interface with different coding schemes and burst error profiles. For 100 Gb/s Ethernet,

the target FLR is 6.2e-10. There is no codeword interleaving but there would be PMA 4:1

bit multiplexing which may harm KP4 FEC performance. For 200 Gb/s and 400 Gb/s

Ethernet, the target FLR is 6.2e-11. Two codewords are interleaved in a checkerboard

pattern and there would be PMA 4:1 bit multiplexing. At the SerDes PHY level

precoding can be enabled or disabled. Random error, 1-tap DFE and 12-tap DFE are

considered for the analysis.

From the results we can conclude:

- Compared with the random error case, DFE error propagation degrades KP4 FEC coding gain. A multiple-tap DFE could be worse than a 1-tap DFE.
- Precoding is a necessary function to mitigate the bit-multiplexing penalty for a burst error channel with a large error propagation factor a.
- PMA 4:1 bit multiplexing harms KP4 FEC performance while 2:1 codeword interleaving helps.
- DFE architecture plays an important role in FEC performance. If a 1-tap DFE is used, the DER0 requirements to achieve the FLR targets are higher than 1e-4 for both 100 GbE and 200/400 GbE, even with the worst case of error propagation a=0.75. However, if a 12-tap DFE with maximum DFE tap coefficients (defined in 802.3cd) is used, the DER0 requirement is tightened to the order of 1e-5.


Table 3. Pre-FEC SNR and DER0 requirements to achieve FLR performance of 100 GbE and 200/400 GbE with different coding schemes and burst error profiles

| Case | DER0, FLR=6.2e-10 (100 GbE) | SNR (dB), FLR=6.2e-10 (100 GbE) | DER0, FLR=6.2e-11 (200/400 GbE) | SNR (dB), FLR=6.2e-11 (200/400 GbE) |
|---|---|---|---|---|
| Random | 7.53E-04 | 17.33 | 6.40E-04 | 17.45 |
| a=0.75+symbol | 2.88E-05 | 19.28 | 9.78E-06 | 19.78 |
| a=0.75+symbol+prec | 1.57E-04 | 18.37 | 1.18E-04 | 18.54 |
| a=0.75+BM(2:1) | 1.20E-05 | 19.69 | 3.65E-06 | 20.19 |
| a=0.75+BM(2:1)+prec | 1.42E-04 | 18.43 | 1.06E-04 | 18.60 |
| a=0.75+BM(4:1) | 1.24E-06 | 20.60 | 2.64E-07 | 21.13 |
| a=0.75+BM(4:1)+prec | 1.22E-04 | 18.52 | 9.07E-05 | 18.69 |
| a=0.75+BM(4:1)+CI(2) | - | - | 4.39E-05 | 19.07 |
| a=0.75+BM(4:1)+CI(2)+prec | - | - | 1.63E-04 | 18.35 |
| 12-tap DFE case 1 +symbol+prec | 9.02E-05 | 18.69 | 3.61E-05 | 19.17 |
| 12-tap DFE case 3 +symbol+prec | 1.29E-04 | 18.49 | 7.08E-05 | 18.82 |
| 12-tap DFE case 1 +BM(4:1)+prec | 4.70E-05 | 19.04 | 1.59E-05 | 19.56 |
| 12-tap DFE case 3 +BM(4:1)+prec | 8.54E-05 | 18.72 | 4.59E-05 | 19.05 |
| 12-tap DFE case 1 +BM(4:1)+CI(2)+prec | - | - | 2.33E-05 | 19.38 |
| 12-tap DFE case 3 +BM(4:1)+CI(2)+prec | - | - | 4.83E-05 | 19.02 |

4. FEC for the multi-part link

In this section, let’s extend the FEC analysis to the multi-part link. Figure 19 and Figure

20 illustrate the differences between the single part link where a FEC is dedicated to the

link and the multi-part link where a FEC is shared between 2 or more parts of the link.

For a multi-part link with two chip-chip or chip-module electrical links and one optical link as shown in Figure 20, the three parts of the link share a single KP4 FEC encoder and decoder. If the FEC parity check bytes are added at the beginning of the link and the correction is applied only at the destination of the link, the worst-case input BER for the FEC decoder must be met by the concatenation of all of the sub-links. In general, the electrical sub-links might be chip-module and chip-chip with DFE (and error propagation), while the optical link is assumed to have random errors. The toughest part of the link is assigned the bulk of the coding gain. For Ethernet, this is the optical link. However, to allocate even a relatively small coding gain to the electrical links, an SNR penalty must be taken from the optical link. We now assume that a tolerable SNR penalty (e.g., 0.1-0.2 dB) is taken from the optical part of the link.


Figure 19. Example of single part link

Figure 20. Example of multi-part link

Note that the optical link is random error dominant with RS symbol error rate SER_RSo, while the electrical links are burst error dominant, each with an equal error contribution SER_RSe. The principle of the multi-part link FEC model is to calculate the probability of t+1 or more symbol errors over the optical and electrical parts of the link. The calculation combines the probability of t, t-1, t-2, …, 1 and 0 symbol errors due to the electrical sub-links with the probability of 0, 1, 2, 3, …, t symbol errors due to the optical sub-link. In other words, when the optical random error sub-link produces i=0, 1, 2, 3, …, t symbol errors, the electrical burst (or random) error sub-links have a less powerful RS code with t_e=t-i to rely on. Therefore, the overall CER can be calculated as in Eq. 12 when both the electrical and optical sub-links are random error based, and as in Eq. 13 when the optical sub-link is random error based and the electrical sub-link is burst error based.

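$$CER_{random/random} = \underbrace{\sum_{i=t+1}^{n} \binom{n}{i} SER_{RSo}^{\,i} (1-SER_{RSo})^{n-i}}_{>t \text{ optical random symbol errors}} + \sum_{i=0}^{t} \Bigg[ \underbrace{\binom{n}{i} SER_{RSo}^{\,i} (1-SER_{RSo})^{n-i}}_{i \le t \text{ optical random symbol errors}} \cdot \underbrace{\sum_{j=t-i+1}^{n-i} \binom{n-i}{j} SER_{RSe}^{\,j} (1-SER_{RSe})^{n-i-j}}_{\text{RS code for electrical: } t_e = t-i \text{ and } n_e = n-i} \Bigg] \qquad \text{Eq. 12}$$

$$CER_{random/burst} = \underbrace{\sum_{i=t+1}^{n} \binom{n}{i} SER_{RSo}^{\,i} (1-SER_{RSo})^{n-i}}_{>t \text{ optical random symbol errors}} + \sum_{i=0}^{t} \Bigg[ \underbrace{\binom{n}{i} SER_{RSo}^{\,i} (1-SER_{RSo})^{n-i}}_{i \le t \text{ optical random symbol errors}} \cdot \underbrace{CER_{burst}\big(RS(n_e, k_e, t_e)\big)}_{\text{RS code for electrical: } t_e = t-i \text{ and } n_e = n-i} \Bigg] \qquad \text{Eq. 13}$$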
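Eq. 12 is straightforward to evaluate numerically. The sketch below implements only the random/random case. The function names are hypothetical, and the burst case of Eq. 13 would replace the inner binomial tail with a burst CER evaluated for the reduced code (n_e, t_e).

```python
from math import comb


def binom_tail(n, p, k_min):
    """P(at least k_min of n symbols are in error), each with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(max(k_min, 0), n + 1))


def multipart_cer_random(ser_o, ser_e, n=544, t=15):
    """Eq. 12: random errors on both the optical and the electrical sub-links."""
    cer = binom_tail(n, ser_o, t + 1)            # more than t optical symbol errors
    for i in range(t + 1):                       # i <= t optical symbol errors
        p_i = comb(n, i) * ser_o**i * (1 - ser_o) ** (n - i)
        # The electrical part is left with t_e = t - i over n_e = n - i symbols.
        cer += p_i * binom_tail(n - i, ser_e, t - i + 1)
    return cer


if __name__ == "__main__":
    print(multipart_cer_random(1e-3, 5e-4))
```

A useful check of this form: with independent random errors on both parts, the result equals the single-link CER of Eq. 4 evaluated at the combined symbol error rate 1 - (1 - SER_RSo)(1 - SER_RSe).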

Table 3 below lists the target DER0 and SNR that an electrical link would need to meet to keep the optical link penalty at 0 dB, 0.1 dB, 0.2 dB and 0.7 dB. We can see that if the optical sub-link cannot take a large penalty (in other words, cannot cede more FEC capability to the electrical sub-links), the DER0 and SNR requirements for the electrical sub-link are significantly tightened. For 200/400 GbE systems where 4:1 bit multiplexing, 2:1 codeword interleaving and precoding are deployed, if the optical sub-link can only take a 0.1 dB penalty, the electrical link DER0 target has to be lower than 2.1e-5, while it only needs to be 1.6e-4 if the FEC is dedicated to the electrical link. In order to keep the electrical sub-link DER0 target higher than 1e-4, about 0.7 dB of penalty has to be taken from the optical sub-link, which could be considered too much.

Table 3. Pre-FEC SNR and DER0 requirements for electrical link to achieve FLR=6.2e-11 (200/400 GbE) performance with different optical link penalties

| Case | Single part: DER0 (end-end) | Optical penalty 0 dB, optical DER0 6.4E-4: DER0 (elec) | Optical penalty 0.1 dB, optical DER0 5.5E-04: DER0 (elec) | Optical penalty 0.2 dB, optical DER0 4.8E-04: DER0 (elec) | Optical penalty 0.7 dB, optical DER0 2.0E-04: DER0 (elec) |
|---|---|---|---|---|---|
| random | 6.40E-04 | 4.36E-07 | 9.05E-05 | 1.60E-04 | 4.41E-04 |
| a=0.75 | 9.78E-06 | 9.43E-10 | 1.84E-07 | 3.68E-07 | 3.23E-06 |
| a=0.75+prec | 1.18E-04 | 6.82E-08 | 1.46E-05 | 2.66E-05 | 7.90E-05 |
| a=0.75+BM(4:1) | 2.64E-07 | 2.39E-10 | 4.52E-09 | 9.01E-09 | 9.11E-08 |
| a=0.75+BM(4:1)+prec | 9.07E-05 | 5.31E-08 | 1.14E-05 | 2.08E-05 | 6.22E-05 |
| a=0.75+BM(4:1)+CI(2) | 4.39E-05 | 1.59E-08 | 3.84E-06 | 7.51E-06 | 2.68E-05 |
| a=0.75+BM(4:1)+CI(2)+prec | 1.63E-04 | 9.90E-08 | 2.10E-05 | 3.78E-05 | 1.09E-04 |

5. Advanced FEC

Previous sections focused only on the RS (544, 514, 15) code over GF(2^10) (also known as KP4 FEC) for 100 GbE and 200/400 GbE systems, and on potential coding schemes such as 4:1 bit multiplexing and 2:1 codeword interleaving. We can see that for severe error propagation cases or multi-part links, the KP4 FEC might not be able to relax DER0 to the values SerDes designers require, or provide the SNR coding gain that system designers want. On the other hand, the KP4 FEC adds about 100-200 ns of latency to the system data path. For latency-sensitive applications, an alternative FEC that requires shorter encoding and decoding time is more attractive. In this section, we explore other options, including different coding schemes and even different FEC codes. Three coding factors are briefly discussed: coding gain, encoder/decoder latency, and complexity. Note that this paper is focused on performance analysis, while the detailed study of implementation complexity and latency is beyond its scope.

5.1 Alternative coding schemes and RS codes

First let’s start with picking some low hanging fruit.

From Section 3.1 we concluded that PMA bit multiplexing harms FEC performance. To make things worse, Ethernet systems with 100 Gb/s per lane interfaces are likely to increase the bit multiplexing from 2:1 (deployed in the current 50 Gb/s interface) to 4:1. To avoid the bit multiplexing penalty we can consider the symbol multiplexing described in Section 2. From Table 3 (Section 3.3) we can see that over 1 dB (without precoding) and up to 0.4 dB (with precoding) more coding gain can be achieved by replacing 4:1 bit multiplexing with symbol multiplexing. However, bit multiplexing is good to have for backward compatibility, especially in the chip-to-module interface. Symbol multiplexing coding schemes are therefore more suitable for chip-to-chip or backplane/cable interfaces.

The next low-hanging fruit is to increase the codeword interleaving depth. The current 50 Gb/s interface has an interleaving depth of 2. We can consider increasing it to 4, i.e. 4:1 codeword interleaving. By doing this, we expect long burst errors to be divided across 4 separate codewords. The analysis shows that 0.3-0.5 dB of coding gain can be achieved by increasing the codeword interleaving from 2:1 to 4:1 without precoding. However, the gain becomes negligible if precoding is deployed. Furthermore, decoding latency increases in proportion to the interleaving depth unless striping over multiple lanes is implemented to reduce the latency, at the cost of design complexity.
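The effect of interleaving depth on a burst can be sketched with a few lines of Python. This is an illustrative model only: it assumes round-robin interleaving at FEC symbol granularity (wire symbol i belongs to codeword i % depth), and the burst length of 20 symbols is a hypothetical value chosen to exceed the KP4 correction capability of t=15.

```python
from collections import Counter


def errors_per_codeword(burst_start, burst_len, depth):
    """Split a contiguous burst of symbol errors across interleaved codewords.

    Assumes round-robin symbol interleaving: wire symbol i -> codeword i % depth.
    Returns the number of errored symbols landing in each codeword.
    """
    counts = Counter(i % depth
                     for i in range(burst_start, burst_start + burst_len))
    return [counts.get(cw, 0) for cw in range(depth)]


if __name__ == "__main__":
    T_KP4 = 15            # symbols correctable per KP4 codeword
    BURST = 20            # hypothetical burst length in FEC symbols
    for depth in (1, 2, 4):
        per_cw = errors_per_codeword(0, BURST, depth)
        print(depth, per_cw, "correctable" if max(per_cw) <= T_KP4 else "fails")
```

With no interleaving the 20-symbol burst exceeds t=15 in a single codeword, while at depth 2 or 4 each codeword sees at most 10 or 5 errored symbols in this model.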

The third low-hanging fruit is to explore longer RS codes to improve the coding gain, or shorter RS codes to reduce latency. For this paper, we studied two other RS codes besides the KP4 RS(544,514,15) code: KR4 RS(528,514,7) and Long RS(1023,967,28), both over GF(2^10). KR4 FEC has a lower overhead (i.e., a higher code rate) than KP4 FEC but a weaker error correction capability of t=7. The Long RS code has a similar overhead to KP4 but a longer codeword (almost 2x) and therefore a stronger error correction capability of t=28. We expect the KR4 code to have slightly shorter encoding/decoding latency than KP4 and a significant gate and area reduction due to its smaller t value, while the Long RS code has 2x the encoding/decoding latency unless striping over multiple lanes is implemented.
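The parameters of the three codes follow directly from (n, k) and the 10-bit symbol size. The short sketch below computes the correction capability t = (n - k) / 2, the code rate k / n, the overhead (n - k) / k, and the codeword length in bits; the helper name rs_params is ours and only the (n, k) pairs come from the text.

```python
CODES = {
    "KR4": (528, 514),
    "KP4": (544, 514),
    "Long RS": (1023, 967),
}


def rs_params(n, k, symbol_bits=10):
    """Basic parameters of an RS(n, k) code over GF(2^symbol_bits)."""
    return {
        "t": (n - k) // 2,                 # correctable symbols per codeword
        "code_rate": k / n,
        "overhead": (n - k) / k,           # parity relative to payload
        "codeword_bits": n * symbol_bits,
    }


if __name__ == "__main__":
    for name, (n, k) in CODES.items():
        p = rs_params(n, k)
        print(f"{name:8s} t={p['t']:2d} rate={p['code_rate']:.4f} "
              f"overhead={p['overhead'] * 100:.2f}% bits={p['codeword_bits']}")
```

The computed values reproduce t=7, 15 and 28, show that KR4 has the highest code rate, and show that the Long RS codeword is a little under twice the KP4 codeword length.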

Table 4 shows the FEC performance with a coding scheme of 4:1 bit multiplexing and precoding to achieve 1e-18 post-FEC BER for the three RS codes and different error models. Figures 21 and 22 show the trend of FLR performance vs. SerDes detector DER0 and SNR requirements for the three RS codes. We can see that the stronger the error correction capability, the better the long-burst error tolerance, at the cost of encoder/decoder complexity and latency.
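For the random-error case, the link between t and codeword failure probability can be illustrated with the textbook binomial tail: a codeword is uncorrectable when more than t of its n symbols are in error. The sketch below is not the simulation model used for Table 4 (it ignores bit multiplexing, precoding and DFE error propagation) and assumes independent symbol errors; the symbol error rate of 1e-3 is a hypothetical input.

```python
import math


def codeword_failure_prob(n, t, p_sym):
    """P(more than t of n symbols are in error), independent symbol errors.

    Terms are evaluated in the log domain to avoid overflow in the
    binomial coefficients for long codewords.
    """
    log_p = math.log(p_sym)
    log_q = math.log1p(-p_sym)
    total = 0.0
    for i in range(t + 1, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    P_SYM = 1e-3  # hypothetical pre-FEC symbol error rate
    for name, n, t in (("KR4", 528, 7), ("KP4", 544, 15), ("Long RS", 1023, 28)):
        print(f"{name:8s} {codeword_failure_prob(n, t, P_SYM):.3e}")
```

At the same symbol error rate the failure probability drops by many orders of magnitude from KR4 to KP4 to the Long RS code, which matches the ordering of the random-error column in Table 4.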

Table 4. Pre-FEC SNR and DER0 requirements with coding scheme of 4:1 bit

multiplexing and precoding to achieve 1e-18 post-FEC BER performance for different RS

codes and different error models

Case       Random              1-tap DFE a=0.75    12-tap DFE case 1   12-tap DFE case 3
           DER0      SNR (dB)  DER0      SNR (dB)  DER0      SNR (dB)  DER0      SNR (dB)
KR4        1.79E-05  19.51     1.54E-07  21.30     3.60E-14  24.53     7.04E-09  22.17
KP4        2.95E-04  17.98     2.07E-05  19.44     6.50E-07  20.83     1.72E-05  19.52
Long RS    7.86E-04  17.30     1.38E-04  18.45     8.27E-05  18.74     1.34E-04  18.46
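The SNR columns of Table 4 can be reduced to two simple differences: how much SNR each code saves relative to KP4, and how much extra SNR each code needs under a burst error model compared with the random-error case. The sketch below uses only the values from Table 4 and assumes the SNR figures are in dB; the function names are ours.

```python
MODELS = ["Random", "1-tap DFE a=0.75", "12-tap DFE case 1", "12-tap DFE case 3"]

# Required pre-FEC SNR from Table 4, one value per error model.
SNR_DB = {
    "KR4": [19.51, 21.30, 24.53, 22.17],
    "KP4": [17.98, 19.44, 20.83, 19.52],
    "Long RS": [17.30, 18.45, 18.74, 18.46],
}


def snr_relief_vs_kp4(code):
    """SNR saved relative to KP4 per error model (negative means more SNR needed)."""
    return [round(kp4 - x, 2) for kp4, x in zip(SNR_DB["KP4"], SNR_DB[code])]


def burst_penalty(code):
    """Extra SNR required under each error model relative to random errors."""
    random_snr = SNR_DB[code][0]
    return [round(x - random_snr, 2) for x in SNR_DB[code]]


if __name__ == "__main__":
    for code in SNR_DB:
        print(code, dict(zip(MODELS, snr_relief_vs_kp4(code))))
        print(code, dict(zip(MODELS, burst_penalty(code))))
```

For the 12-tap DFE case 1 model, the burst penalty shrinks from about 5.0 dB for KR4 to about 2.9 dB for KP4 and about 1.4 dB for the Long RS code, which is the burst tolerance trend described in the text.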

Figure 21. FLR vs. SerDes detector DER0 with different RS codes

Figure 22. FLR vs. SerDes detector SNR with different RS codes

5.2 Advanced FEC options for next generation Ethernet

In this section, other FEC codes besides RS codes are briefly discussed in terms of their coding gain, latency and complexity.

BCH codes, a class of cyclic codes [11], are similar to Reed-Solomon codes. The main difference is that a BCH code is defined over GF(2), so it is a binary counterpart of an RS code for correcting multiple random errors. BCH codes are widely considered for applications demanding low latency, since their codeword length in bits is shorter than that of a similar RS code. However, BCH encoding and decoding complexity is nominally higher than for RS codes, since the computation is bit based instead of symbol based. Another disadvantage of BCH coding is its performance: compared with RS codes, it is fragile in burst channels with high DFE error propagation.

To combine the complementary advantages of BCH and RS codes, we can use the two in a concatenated code [11], a class of error correcting codes that consists of an inner code (BCH code) and an outer code (RS code). The outer code can be a KP4 type of RS code, and the inner code can be a short BCH code that corrects only t=2 or 3 errors. With this arrangement, the complexity and latency are slightly higher than for KP4 alone, but the coding gain is improved.
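One cost of concatenation that is easy to quantify is the overall code rate, which is the product of the inner and outer code rates. The sketch below is a hypothetical example: the text does not specify the inner code, so we assume a primitive binary BCH code of length 2^m - 1 with m=7 and t=2, and use the standard bound that such a code needs at most m*t parity bits. The outer code is KP4 RS(544,514).

```python
def bch_max_parity_bits(m, t):
    """Upper bound on parity bits of a primitive binary BCH code of length 2^m - 1."""
    return m * t


def concatenated_rate(outer_n, outer_k, inner_n, inner_k):
    """Overall rate of a concatenated code: outer rate times inner rate."""
    return (outer_k / outer_n) * (inner_k / inner_n)


if __name__ == "__main__":
    M, T = 7, 2                                     # hypothetical inner code size
    inner_n = 2 ** M - 1                            # 127-bit inner codeword
    inner_k = inner_n - bch_max_parity_bits(M, T)   # worst-case payload bits
    rate = concatenated_rate(544, 514, inner_n, inner_k)
    print(f"inner BCH({inner_n},{inner_k}), overall rate {rate:.4f}")
```

Under these assumptions the overall rate falls noticeably below that of KP4 alone, so the choice of inner code length trades coding gain against bandwidth overhead as well as latency and complexity.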

For an optical link that requires a high coding gain but is less sensitive to latency, a more advanced FEC such as a turbo product code, a staircase code, or a low-density parity-check (LDPC) code with an iterative soft-decision decoding algorithm can be considered. Some of those codes can provide impressive coding gains and operate very close to the Shannon limit [11]. However, we believe their long latency (on the order of µs or even ms) makes them unsuitable for electrical links.

6. Conclusions

In this paper, high-speed serial link error propagation models and different Ethernet coding schemes have been studied and simulated to provide FEC performance analysis for 100/200/400 GbE systems with 100+ Gb/s per lane PAM4 interfaces. Different scenarios such as 1/(1+D) mod4 precoding, PMA bit multiplexing, symbol multiplexing and FEC codeword interleaving, and their impacts on overall system performance, have been discussed. Multi-part links, where a single FEC is shared between electrical and optical parts, have been briefly studied as well. Advanced FEC schemes beyond the current KP4 code have also been explored.

Acknowledgments

I would like to thank my Broadcom colleagues Adam Healey and Shaohua Yang, who helped me with FEC modeling and analysis through many fruitful discussions and much feedback. I also want to thank Pete Anslow from Ciena, who spent time and effort correlating his results with ours for next-generation 100 Gb/s and 400 Gb/s Ethernet systems.

References

[1]: IEEE Std 802.3bj-2014, IEEE Standard for Ethernet Amendment 2: Physical Layer Specifications and Management Parameters for 100 Gb/s Operation Over Backplanes and Copper Cables.

[2]: IEEE Std 802.3bs-2017, IEEE Standard for Ethernet Amendment 10: Media Access Control Parameters, Physical Layers and Management Parameters for 200 Gb/s and 400 Gb/s Operation.

[3]: IEEE Std 802.3cd-2018, IEEE Standard for Ethernet Amendment 3: Media Access Control Parameters for 50 Gb/s and Physical Layers and Management Parameters for 50 Gb/s, 100 Gb/s, and 200 Gb/s Operation.

[4]: IEEE 802.3 100 Gb/s, 200 Gb/s, and 400 Gb/s Electrical Interfaces Task Force: http://www.ieee802.org/3/ck/index.html

[5]: A. Healey and C. Liu, "Channel Operating Margin for 56 Gb/s PAM4 Chip-to-Chip and Backplane Interfaces," DesignCon 2016, Santa Clara, CA, 2016.

[6]: X. Dong, N. Huang and G. Zhang, "Improved Engineering Analysis in FEC System Gain for 56G PAM4 Applications," DesignCon 2018, Santa Clara, CA, 2018.

[7]: Y. Lu, L. Ma, D. Mo and L. Liang, "High Gain Low Complexity Low Latency FEC Codes for Ethernet and Backplane Applications," DesignCon 2018, Santa Clara, CA, 2018.

[8]: E. N. Gilbert, "Capacity of a burst-noise channel," Bell System Technical Journal, vol. 39, pp. 1253–1265, 1960, doi:10.1002/j.1538-7305.1960.tb03959.x.

[9]: E. O. Elliott, "Estimates of error rates for codes on burst-noise channels," Bell System Technical Journal, vol. 42, pp. 1977–1997, 1963, doi:10.1002/j.1538-7305.1963.tb00955.x.

[10]: P. Anslow, "BER and FER for 100GBASE-SR4," 2012. Available online: http://www.ieee802.org/3/bm/public/mmfadhoc/meetings/nov29_12/anslow_01a_1112_mmf.pdf

[11]: S. Lin and D. Costello, Error Control Coding, Prentice Hall, February 2002.
