telecom systems simulations acceleration via cpu/gpu...
Post on 09-Mar-2018
Paolo Spallaccini, Stefano Chinnici
Telecom Systems Simulations via
CPU/GPU co-processing
NVIDIA GPU Technology Conference 2012, San Jose, CA
TURBO CODES CASE STUDY
Ericsson Telecomunicazioni, Milan (Italy)
GTC 2012 | © Ericsson Telecomunicazioni SpA 2012 | 2012-05-15
Fast Simulations for Communications Systems
- PHY layer simulation is extremely time-consuming
- Extrapolating short-timescale results is risky
So what?
- HW prototyping (full/reduced speed) is costly
- Educated guesses are not always optimal
Fast Simulations for Communications Systems
- CPU-based, serially iterated simulations
- First-time-right ASIC design
- Trading off computation latency for data-level parallelism!
- TTM (time-to-market) driven development
- Higher design quality
Describing the battlefield
- Simulating a whole telecom system chain is a very time-intensive task, due to the complexity of the overall system
- Typically, physical layer simulations on conventional CPUs have a runtime of several weeks
- Several algorithms that depend on the particular processed layer have to be implemented. They often do not benefit from parallel data processing
- An adequate statistical characterization of the simulation often requires a very large number of iterations
The PHY layer simulation model
Random Bits Source -> Serial Turbo Code (SCCC) Encoder -> QAM Modulator -> AWGN Channel -> Soft QAM Demod -> Serial Turbo Code Decoder -> BER Meter
The key players:
- traffic source
- channel features
- transceiver modulation and coding
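As a rough illustration of such a chain, the sketch below runs a much simpler link than the one in the slides: uncoded BPSK with a hard-decision demodulator instead of turbo-coded QAM with soft demodulation. The function names, the unit bit energy and the fixed seed are assumptions made for the example, not part of the original simulator.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Monte Carlo BER of uncoded BPSK over AWGN (toy stand-in for the full chain).
// ebn0Db: Eb/N0 in dB, numBits: simulated bits, seed: RNG seed (all hypothetical names).
double simulateBpskBer(double ebn0Db, std::uint64_t numBits, unsigned seed) {
    assert(numBits > 0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution source(0.5);              // random bits source
    const double ebn0 = std::pow(10.0, ebn0Db / 10.0);
    const double sigma = std::sqrt(1.0 / (2.0 * ebn0));   // noise std-dev for Eb = 1
    std::normal_distribution<double> noise(0.0, sigma);   // AWGN channel

    std::uint64_t errors = 0;
    for (std::uint64_t k = 0; k < numBits; ++k) {
        const bool bit = source(rng);
        const double tx = bit ? 1.0 : -1.0;               // modulator
        const double rx = tx + noise(rng);                // channel
        const bool decided = rx > 0.0;                    // hard-decision demodulator
        errors += (decided != bit);                       // BER meter
    }
    return static_cast<double>(errors) / static_cast<double>(numBits);
}

// Closed-form BPSK bit error rate, 0.5 * erfc(sqrt(Eb/N0)), used as a cross-check.
double theoreticalBpskBer(double ebn0Db) {
    const double ebn0 = std::pow(10.0, ebn0Db / 10.0);
    return 0.5 * std::erfc(std::sqrt(ebn0));
}
```

Every block here processes one bit per loop iteration, which is the serially iterated CPU pattern the deck starts from.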
Our case study...
Random Bits Source -> Serial Turbo Code Encoder -> QAM Modulator -> AWGN Channel -> Soft QAM Demod -> Serial Turbo Code Decoder -> BER Meter
- Turbo Forward Error Correction (FEC) performance characterization below BER 10^-9
- Iterative decoding has a very high computational complexity
- The decoding algorithm performs both intrinsically parallel calculations and recursive computations
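To get a feeling for why characterization below BER 10^-9 is so expensive, the following sketch estimates the simulation volume. The target of 100 observed bit errors and the 10,000 information bits per FEC block are hypothetical values chosen for illustration; only the 45.7 ms per-block CPU time comes from the profiling figures quoted later in the deck.

```cpp
#include <cassert>
#include <cmath>

// Bits to simulate so that, on average, `targetErrors` bit errors are observed.
double requiredBits(double targetBer, double targetErrors) {
    assert(targetBer > 0.0 && targetErrors > 0.0);
    return targetErrors / targetBer;
}

// FEC blocks needed to carry that many information bits (rounded up).
double requiredBlocks(double bits, double infoBitsPerBlock) {
    assert(infoBitsPerBlock > 0.0);
    return std::ceil(bits / infoBitsPerBlock);
}

// Wall-clock days when blocks are processed one after another.
double serialDays(double blocks, double msPerBlock) {
    return blocks * msPerBlock / 1000.0 / 86400.0;
}
```

Under these assumptions a single operating point needs about 10^11 bits, that is about 10^7 blocks, or roughly 5.3 days of serial CPU time; a full BER curve multiplies this by the number of SNR points.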
...and the next steps
Random Bits Source -> Serial Turbo Code Encoder -> QAM Modulator -> AWGN Channel -> Soft QAM Demod -> Serial Turbo Code Decoder -> BER Meter
Deliver a GPU-accelerated Simulation Library
Next functions to be attacked (in order of computational complexity):
1. Soft-decision demodulator
2. Additive White Gaussian Noise (AWGN) channel model
3. Turbo-code encoder
Our CPU-GPU co-processing perspective
The T-shaped approach:
1. (Horizontal) Span the whole simulation system to identify the best CPU-GPU synergy, in a scenario able to exploit parallel processing
2. (Vertical) Dive deeply into the block targeted for GPU implementation
Coarse Profiling, first (and maybe some coffee...)
...Checking system performance...
- The “GPU idea” is promising, but you can feel you’re going along an unpaved path!
- Easy finding, the rule of thumb: the decoder must be “accelerated” before any other simulation block
- Very next in line is the soft demod, heavier for larger modulations

CPU execution times [ms]      QPSK    1024-QAM
Average time per FEC block    45.7    57.8
Spent in SISO decoder         43.0    49.1
Spent in soft demodulator      2.0     7.2
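The rule of thumb can be checked with Amdahl's law applied to the profiling figures. The sketch below is only a back-of-the-envelope aid; the function names are made up for the example, and the per-block speed-up factor passed in is whatever a GPU port might achieve, not a measured result.

```cpp
#include <cassert>
#include <cmath>

// Overall speed-up when only `partMs` out of `totalMs` runs `partSpeedup` times faster.
double overallSpeedup(double totalMs, double partMs, double partSpeedup) {
    assert(totalMs > 0.0 && partMs >= 0.0 && partMs <= totalMs && partSpeedup > 0.0);
    const double untouched = totalMs - partMs;
    return totalMs / (untouched + partMs / partSpeedup);
}

// Upper bound: the accelerated part takes no time at all.
double speedupBound(double totalMs, double partMs) {
    assert(partMs >= 0.0 && totalMs > partMs);
    return totalMs / (totalMs - partMs);
}
```

With the table's numbers, accelerating only the SISO decoder bounds the overall gain at about 16.9x for QPSK but only about 6.6x for 1024-QAM; including the soft demodulator lifts the 1024-QAM bound to about 38.5x, which is why it is next in line.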
“Complex” scenario in a parallel processing perspective
The algorithm mix in a typical telecom network PHY layer model is extremely complex
Execution-level parallelization work often requires algorithm re-engineering
Data-level parallelism is:
- either inherent (but always limited)
- or obtainable by treating larger input data sets
- don’t forget we are not running real-time stuff!
BUT, surely this is not an isolated case!
Inherently serial vs embarrassingly parallel algorithms
Random Bits Source -> Serial Turbo Code Encoder -> QAM Modulator -> AWGN Channel -> Soft QAM Demod -> Serial Turbo Code Decoder -> BER Meter
Block annotations in the figure:
- Simple, light, parallel unfriendly
- Simple, medium, parallel friendly
- Simple, medium, parallel friendly
- Complex, heavy, parallel unfriendly
- Simple, medium, inherently serial
- Complex, very heavy, poses challenges
From GTC 2010 - among many other examples: deflation (highly parallel) and preconditioning (inherently serial) of conjugate gradient
Exploiting parallelism in a telecom system
[Figure: axes "Algorithm Intrinsic Parallelism" and "Data Level Parallelism"; labels "System Block 1", "Parallel domain 1", "Parallel domain 2"]
From A Naive...
PARALLEL
...to a more structured approach
The T-shaped approach is good for proving the feasibility of a GPU-based simulation platform for heterogeneous and (very) complex systems
But to what extent should we push efforts to parallelize and optimize (in the “CUDA sense”) the implementation of a single block...
...rather than look at the overall simulation? Ultimately, what kind of research effort is needed to sort out the challenges posed by a telecom system?
Need for newer abstraction perspectives in modeling?
So, possibly the most important lesson learned is that a different modeling strategy is necessary in order to ... define methods and criteria to cope optimally with the different topics of a fully parallel approach to the problem of simulating a telecommunication network (even if at just the PHY layer)
Serial turbo code overview
Message Bits -> Outer Conv. Coder (rate ½) -> Outer Code Punct (6/8) -> π -> Inner Conv. Coder (rate ½) -> Inner Code Punct -> Coded Bits
OUTER CONVOLUTIONAL CODE C_OUT (RATE k/p) -> INTERLEAVER N -> INNER CONVOLUTIONAL CODE C_IN (RATE p/n)
SCCC (rate = k/n)
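The puncturing blocks in this diagram can be sketched as follows. Reading "6/8" as "6 of every 8 coded bits are kept" is an assumption, and so are the particular pattern used in the usage note and all function names; the slides do not give the actual pattern. Depuncturing at the receiver re-inserts the deleted positions as LLR 0, meaning "no information".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keep only the coded bits whose position is marked true in the periodic pattern.
std::vector<int> puncture(const std::vector<int>& coded, const std::vector<bool>& pattern) {
    assert(!pattern.empty());
    std::vector<int> out;
    for (std::size_t i = 0; i < coded.size(); ++i)
        if (pattern[i % pattern.size()]) out.push_back(coded[i]);
    return out;
}

// Receiver side: put received LLRs back in place, erased positions get LLR 0.
std::vector<double> depuncture(const std::vector<double>& llr,
                               const std::vector<bool>& pattern, std::size_t codedLen) {
    assert(!pattern.empty());
    std::vector<double> out(codedLen, 0.0);
    std::size_t j = 0;
    for (std::size_t i = 0; i < codedLen && j < llr.size(); ++i)
        if (pattern[i % pattern.size()]) out[i] = llr[j++];
    return out;
}

// Code rate after puncturing: motherRate * period / keptPositions.
double puncturedRate(double motherRate, const std::vector<bool>& pattern) {
    std::size_t kept = 0;
    for (bool b : pattern) kept += b;
    assert(kept > 0);
    return motherRate * static_cast<double>(pattern.size()) / static_cast<double>(kept);
}
```

Under this reading, a rate ½ mother code punctured with a 6-of-8 pattern ends up at rate 2/3, and the overall SCCC rate is the product of the punctured outer and inner rates, consistent with (k/p)(p/n) = k/n.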
Serial turbo code: key points
Iterative Decoding of Turbo Codes
- Iterated functions acceleration leads to high speed-up
- Non-iterated functions acceleration removes memory bottlenecks
Figure labels: inner code puncturing, data permutation, constituent codes soft decoding, outer code puncturing
BCJR decoding algorithm: BLOCK DIAGRAM
Turbo Code Decoding Algorithm based on Bahl, Cocke, Jelinek and Raviv (1974), updated by Berrou (1993) and Benedetto (1996)
Iterative Algorithm:
- Minimizes the bit error probability
- Iterates until convergence is reached
Each iteration uses a double recursion to compute updated probabilities of each bit in the received FEC block, given the channel characteristics and the code structure.
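Decoder constituents: definitions
The INTERLEAVER $I$ maps an input sequence $x_i$ to an output sequence $y_i$, $y = I(x)$, with the permutation $\pi(i)$ taken under a causality assumption
SOFT DEMODULATOR, from the received sample $y_k$:
$$L_k(c) \stackrel{\text{def}}{=} p(y_k \mid c)$$
Log-Likelihood ratio:
$$\mathrm{LLR} \stackrel{\text{def}}{=} \ln \frac{P(u_k = +1 \mid y)}{P(u_k = -1 \mid y)}$$
Extrinsic information:
$$\tilde{L}_2^{(0)}(u_k = i) = 1$$
$$\tilde{L}_1^{(m)}(u_k = i) = \sum_{u:\, u_k = i} L_1(c(u)) \prod_{l \neq k} \tilde{L}_2^{(m-1)}(u_l)\, p_a(u_l)$$
$$\tilde{L}_2^{(m)}(u_k = i) = \sum_{u:\, u_k = i} L_2(c(u)) \prod_{l \neq k} \tilde{L}_1^{(m)}(u_l)\, p_a(u_l)$$
Extrinsic information provided by decoder 1 is used as a-priori information by the other decoder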
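As a concrete instance of these definitions, the sketch below evaluates the channel log-likelihood ratio for the simplest case: BPSK symbols ±1 over AWGN with equally likely bits, where the ratio of the two Gaussian likelihoods collapses to 2y/σ². The slides use QAM, so the BPSK mapping and the function names are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cmath>

// Gaussian likelihood p(y | c) for transmitted amplitude c and noise std-dev sigma.
double likelihood(double y, double c, double sigma) {
    assert(sigma > 0.0);
    const double pi = std::acos(-1.0);
    const double d = (y - c) / sigma;
    return std::exp(-0.5 * d * d) / (sigma * std::sqrt(2.0 * pi));
}

// LLR straight from the definition: ln p(y | +1) / p(y | -1).
double llrFromLikelihoods(double y, double sigma) {
    return std::log(likelihood(y, +1.0, sigma) / likelihood(y, -1.0, sigma));
}

// Closed form of the same quantity for BPSK over AWGN.
double llrClosedForm(double y, double sigma) {
    assert(sigma > 0.0);
    return 2.0 * y / (sigma * sigma);
}
```

A soft demodulator for larger constellations has to sum such likelihoods over all symbols sharing a bit value, which is consistent with it getting heavier as the modulation order grows.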
Decoding algorithm details: SISO block
(A)-SISO: inputs $P_k(c;I)$ and $P_k(u;I)$, outputs $P_k(c;O)$ and $P_k(u;O)$
The core is the Soft-Input Soft-Output (SISO) decoder
- based on a double recursion
- log-domain formulation (log-MAP algorithm) used
- lower computation complexity!
Key Steps:
- branch metrics computation
- forward and backward recursions
- soft output computation
All these operations are performed on the code trellis
(1st SISO)
$$P_k(u;I) = \tilde{L}_{2k}^{(n-1)}(u)\, p_a(u), \qquad P_k(c;I) = L_{1k}(c_1)$$
(2nd SISO)
$$P_k(u;I) = \tilde{L}_{1k}^{(n)}(u)\, p_a(u) \ \text{(after permutation)}, \qquad P_k(c;I) = L_{2k}(c_2)$$
A posteriori extrinsic: $\tilde{L}_{1k}^{(n)}(u) = P_k(u;O)$ from SISO 1 and $\tilde{L}_{2k}^{(n)}(u) = P_k(u;O)$ from SISO 2
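The log-domain formulation replaces products of probabilities with sums of logarithms, and sums of probabilities with the so-called max* operator (Jacobian logarithm). A minimal sketch of that operator follows; the function names are chosen for the example, and whether the original kernels use the exact correction term or the cheaper max-log approximation is not stated in the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Jacobian logarithm: log(exp(a) + exp(b)) computed without overflow.
double maxStar(double a, double b) {
    return std::max(a, b) + std::log1p(std::exp(-std::fabs(a - b)));
}

// Max-log approximation: drops the correction term, trading accuracy for speed.
double maxLog(double a, double b) {
    return std::max(a, b);
}
```

Because the correction term is bounded by ln 2, `maxStar` stays finite even for large metrics where a direct `log(exp(a) + exp(b))` would overflow.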
SISO decoding: double recursions
SISO decoding on the code trellis
1- branch metrics (normed distance of the received symbol from every possible code alphabet symbol) are computed as in Viterbi decoding
2- forward path metrics are computed according to [equation shown on slide]
3- backward path metrics are computed according to [equation shown on slide]
4- soft output is computed according to [equation shown on slide]
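The slide equations for steps 2 to 4 did not survive in this transcript. For orientation only, the additive log-domain SISO recursions usually attributed to Benedetto et al. (1996) are reproduced below; the edge notation ($e$ is a trellis edge, $s^S(e)$ and $s^E(e)$ its starting and ending states, $u(e)$ and $c(e)$ its input and output symbols, $\lambda_k(\cdot\,;I) = \ln P_k(\cdot\,;I)$) is an assumption and may differ from what the authors actually showed.

```latex
\begin{aligned}
\alpha_k(s) &= {\max_{e:\, s^E(e)=s}}^{\!*}
  \left[\alpha_{k-1}\big(s^S(e)\big) + \lambda_k\big(u(e);I\big) + \lambda_k\big(c(e);I\big)\right] \\
\beta_k(s) &= {\max_{e:\, s^S(e)=s}}^{\!*}
  \left[\beta_{k+1}\big(s^E(e)\big) + \lambda_{k+1}\big(u(e);I\big) + \lambda_{k+1}\big(c(e);I\big)\right] \\
\lambda_k(u;O) &= {\max_{e:\, u(e)=u}}^{\!*}
  \left[\alpha_{k-1}\big(s^S(e)\big) + \lambda_k\big(c(e);I\big) + \beta_k\big(s^E(e)\big)\right] \\
{\max}^{*}(a,b) &= \max(a,b) + \ln\!\left(1 + e^{-|a-b|}\right)
\end{aligned}
```

The first two lines are the forward and backward recursions: each value depends on its neighbour in time, which is what makes them inherently serial. The third line can be evaluated for every trellis step independently once $\alpha$ and $\beta$ are known.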
Decoding iterations (turbo code generic scheme)
[Figure: SISO 1 and SISO 2 connected through the interleaver $I$ and its inverse $I^{-1}$; a priori LLR $p_a(u)$; code inputs $L_{1k}(c_1)$ and $L_{2k}(c_2)$; extrinsic terms $\tilde{L}_{2k}^{(n-1)}(u)$, $\tilde{L}_{1k}^{(n)}(u)$, $\tilde{L}_{2k}^{(n)}(u)$ and their permuted versions; a decision output; two SISO outputs marked "not used"]
SISO 1: $P_k(u;I) = \tilde{L}_{2k}^{(n-1)}(u)\, p_a(u)$, $P_k(c;I) = L_{1k}(c_1)$
SISO 2: $P_k(u;I) = \tilde{L}_{1k}^{(n)}(u)\, p_a(u)$ (after permutation), $P_k(c;I) = L_{2k}(c_2)$
A single decoding iteration involves
- two SISO decoding operations
  - on inner and outer code
- two soft bits permutations
  - direct and reverse
Iterations are repeated until convergence - or until a limit value is reached
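The iteration control described here can be sketched as a loop with a convergence test and an iteration cap. The stop rule below (hard decisions unchanged between two consecutive iterations) is a common heuristic picked for illustration; the slides do not say which rule the authors' StopRule kernel applies, and `decodeUntilStable` and `hardDecisions` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hard decision on each soft bit: LLR >= 0 maps to 1, otherwise 0.
std::vector<int> hardDecisions(const std::vector<double>& llr) {
    std::vector<int> bits(llr.size());
    for (std::size_t k = 0; k < llr.size(); ++k) bits[k] = llr[k] >= 0.0 ? 1 : 0;
    return bits;
}

// Calls `iterate` (one full decoding iteration updating the LLRs in place) until the
// hard decisions stop changing or `maxIterations` is reached. Returns iterations run.
template <typename IterationFn>
int decodeUntilStable(std::vector<double>& llr, IterationFn iterate, int maxIterations) {
    assert(maxIterations > 0);
    std::vector<int> previous = hardDecisions(llr);
    for (int n = 1; n <= maxIterations; ++n) {
        iterate(llr);  // stands for: SISO, permutation, SISO, reverse permutation
        std::vector<int> current = hardDecisions(llr);
        if (current == previous) return n;  // assumed stop rule: decisions are stable
        previous = current;
    }
    return maxIterations;
}
```

In a batched GPU run the same test would be evaluated per frame, so frames that converge early stop contributing work while the others keep iterating.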
turbo decoding parallel algorithm
Parallel-friendly sections:
- permutation (direct and reverse)
  - memory access problems to be taken into account!
- SISO branch metrics computation
- soft output computation
Inherently serial sections:
- forward and backward recursions
  - algorithmic reformulation/re-engineering needed!
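The permutation is parallel friendly because every output element can be produced independently, but its memory access pattern is irregular. The sketch below writes it in "gather" form, with each loop iteration standing in for one GPU thread; the index convention `out[i] = in[perm[i]]` and the function names are assumptions made for the example, not the convention of the original kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather form: output element i reads input element perm[i].
// Writes are contiguous, reads are scattered (the memory access issue noted above).
std::vector<double> interleave(const std::vector<double>& in,
                               const std::vector<std::size_t>& perm) {
    assert(in.size() == perm.size());
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[perm[i]];
    return out;
}

// Precomputed inverse permutation, so deinterleaving is also a gather.
std::vector<std::size_t> invert(const std::vector<std::size_t>& perm) {
    std::vector<std::size_t> inv(perm.size());
    for (std::size_t i = 0; i < perm.size(); ++i) {
        assert(perm[i] < perm.size());
        inv[perm[i]] = i;
    }
    return inv;
}
```

Deinterleaving is then `interleave(y, invert(perm))`, which keeps both directions free of write conflicts at the cost of storing a second index table.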
decoding bottleneck: double recursion
SISO decoding bottleneck
- forward path metrics computation: recursive sum over “past” trellis steps
- backward path metrics computation: recursive sum over “future” trellis steps
Breaking recursions
Recursions can be split over N windows
- minimum window size is FEC dependent
- window boundary values are taken from the previous iteration
Parallel-friendly algorithmic re-engineering
- side effect: convergence is slowed down
- more iterations are required
[Figure: input codewords vector split into window 0, window 1, ..., window k-1, window k, ..., window N-1, each with FORWARD and BACKWARD recursions; labels: forward recursion state metric distribution, backward recursion state metric distribution, INIT, iteration 1, iteration 2]
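The windowing idea can be illustrated on the forward recursion alone. In the sketch below each window starts from a boundary metric saved during the previous pass, so all windows can run concurrently; the 2-state toy trellis, the container types and the function names are assumptions for the example and do not reproduce the authors' kernel.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using State = std::array<double, 2>;                 // log-domain metrics, 2-state toy trellis
using Gamma = std::array<std::array<double, 2>, 2>;  // branch metric gamma[from][to]

inline double maxStar(double a, double b) {
    return std::max(a, b) + std::log1p(std::exp(-std::fabs(a - b)));
}

// One trellis step of the forward recursion.
inline State forwardStep(const State& a, const Gamma& g) {
    return State{maxStar(a[0] + g[0][0], a[1] + g[1][0]),
                 maxStar(a[0] + g[0][1], a[1] + g[1][1])};
}

// Reference: plain serial recursion over the whole block.
std::vector<State> serialForward(const std::vector<Gamma>& g, const State& init) {
    std::vector<State> alpha(g.size() + 1);
    alpha[0] = init;
    for (std::size_t k = 0; k < g.size(); ++k) alpha[k + 1] = forwardStep(alpha[k], g[k]);
    return alpha;
}

// One windowed pass. boundary[w] is the start metric of window w, taken from the
// previous pass; the loop over w has no cross-window dependency (one thread each).
void windowedPass(const std::vector<Gamma>& g, std::size_t windowLen,
                  std::vector<State>& boundary, std::vector<State>& alpha) {
    assert(windowLen > 0 && alpha.size() == g.size() + 1);
    assert(boundary.size() * windowLen >= g.size());
    std::vector<State> next = boundary;
    alpha[0] = boundary[0];
    for (std::size_t w = 0; w < boundary.size(); ++w) {
        const std::size_t begin = w * windowLen;
        const std::size_t end = std::min(begin + windowLen, g.size());
        State a = boundary[w];
        for (std::size_t k = begin; k < end; ++k) {
            a = forwardStep(a, g[k]);
            alpha[k + 1] = a;
        }
        if (w + 1 < boundary.size()) next[w + 1] = a;  // hand over to the next pass
    }
    boundary = next;
}
```

With fixed branch metrics, exact boundary values move one window further per pass, so the windowed result matches the serial one after as many passes as there are windows. In the real decoder the metrics change between iterations, which is one way to see why the slide reports slower convergence and more iterations.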
BCJR algorithm implemented - iterations
cudaMonolithicBCJR_I <<< B, T >>>
    (Cod_cu, Inf_cu, OInf_cu, AlphaWinMem_I_cu, BetaWinMem_I_cu);
cuda_Deinterleaver <<< blocksInter, threadsInter >>>
    (Permutation_cu, InfOInner_cu, Ext_cu);
cudaO_Depunct <<< blocksPunct, threadsPunct >>>
    (Ext_cu, OCod_cu);
cudaMonolithicBCJR_O <<< B, T >>>
    (Cod_cu, Punctarray_cu, OInf_cu, OCod_cu, OPunct_cu, AlphaWinMem_O_cu,
     BetaWinMem_O_cu, StopOrGo_cu);
cuda_Punct_Interleaver_StopRule <<< blocksInter, threadsInter >>>
    (Permutation_cu, Ext_cu, OPunct_cu, Inf_cu, Stopping_cu);
Kernel annotations on the slide:
- Smaller grid, time-intensive kernel (huge data vectors exchanged with host)
- Larger grid, lean kernel
- Larger grid, lean kernel (no host-device data exchange)
- Smaller grid, time-intensive kernel
- Main data interface with host; relatively large grid
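The difference between "smaller" and "larger" grids can be made concrete with the usual ceiling division for launch sizes. The thread mappings mentioned in the note below (one thread per recursion window for the BCJR kernels, one thread per soft bit for the permutation kernels) and the frame counts are hypothetical; the slides do not state how `B`, `T`, `blocksInter` or `threadsInter` were chosen.

```cpp
#include <cassert>
#include <cstddef>

// Number of thread blocks needed so that blocks * threadsPerBlock covers workItems.
std::size_t gridSize(std::size_t workItems, std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (workItems + threadsPerBlock - 1) / threadsPerBlock;
}
```

For example, 1024 frames with 16 windows each give 16,384 threads, or 64 blocks of 256 threads, for a window-level kernel, while the same 1024 frames with 4096 soft bits each give 16,384 blocks of 256 threads for a bit-level kernel.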
Sim Architecture – baseline evolution
[Figure, shown twice: TX/RX chain input -> enc -> mod -> chan -> demod -> dec, each block in looped execution over input frames 0 1 2 ... N-3 N-2 N-1]
Data-parallel “unfriendly” model: CPU baseline architecture
“Very large input vector” means that many input frames are processed at the same time. Only parallel architectures allow this kind of processing for telecom systems!
Sim Architecture – cuda
[Diagram: very large input vector (0 1 2 ... N-3 N-2 N-1) feeding the chain enc -> mod -> chan -> demod. Labels: CPU based blocks (looped execution); BCJR - GPU grids (I_SISO, O_SISO, Deinter + Depunct, Inter + Punct).]
The “monolithic kernel” example
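extern "C"
__global__
void cudaMonolithicBCJR_I
(int* Cod_cu, int* Inf_cu, int* OInf_cu, int* AlphaWinMem_I_cu, int* BetaWinMem_I_cu)
{
    // base is the global thread index within the grid
    int vector, window, p, i, j, jj, k, m, base = blockDim.x*blockIdx.x + threadIdx.x;
    […]
    // Load four state values from the beta window memory, reading downwards from m+EIGHT
    mP = &BetaWinMem_I_cu[m+EIGHT];
    tempState3 = *--mP;
    tempState2 = *--mP;
    tempState1 = *--mP;
    tempState0 = *--mP;
    // This loop walks the window with decreasing indices (k counts down to 0)
    for(jj = (vector + disp*TRELLIS_MEM_LENGTH__TIMES__TWO),
        p=vector, k=window; k>0; p-=N_STATES, k-=EIGHT, jj-=N_STATES)
    {
        […]
    }
    // Write the four state values back through mP
    *--mP = tempState3;
    *--mP = tempState2;
    *--mP = tempState1;
    *--mP = tempState0;
    // Load four state values from the alpha window memory, reading upwards from m
    mP = &AlphaWinMem_I_cu[m];
    tempState0 = *mP++;
    tempState1 = *mP++;
    tempState2 = *mP++;
    tempState3 = *mP++;
    vector = base >> 2;
    // This loop walks the window with increasing indices (k counts up from 0)
    for(p=vector, k=0; k<WINDOW_SIZE__TIMES__N_STATES; p+=INPUTS, k+=EIGHT)
    {
        […]
    }
    // Write the four state values back through mP
    *mP++ = tempState0;
    *mP++ = tempState1;
    *mP++ = tempState2;
    *mP = tempState3;
}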
15 or 18 μs in there! (≈30% of GPU time for a single decoding iteration)
What Visual Profiler said: (ipse dixit...)
1. This kernel is most probably compute bound
2. Global memory accesses have a large margin for improvement (as we already suspected...)
3. GPU computational resources should be better used (occupancy issues)
We do have potential for performance improvements, but:
Should we stick with this CUDA architecture or try to reorganize data structures and computation?
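The kernel above derives two indices: base = blockDim.x*blockIdx.x + threadIdx.x and vector = base >> 2. As a small host-side model of just that arithmetic, the sketch below shows that four consecutive threads end up with the same vector value. The struct and function names are hypothetical, and what the four threads do with the shared value is elided in the slide, so it is not modeled here.

```cpp
#include <cassert>

// Host-side model of the kernel's index computation (names are hypothetical).
struct ThreadIndex {
    int base;    // global thread index: blockDim.x * blockIdx.x + threadIdx.x
    int vector;  // base >> 2, so four consecutive threads share one value
};

inline ThreadIndex map_thread(int block_dim, int block_idx, int thread_idx)
{
    assert(block_dim > 0 && block_idx >= 0);
    assert(thread_idx >= 0 && thread_idx < block_dim);
    const int base = block_dim * block_idx + thread_idx;
    return ThreadIndex{base, base >> 2};
}
```

For example, with 128 threads per block, thread 5 of block 2 has base 261 and vector 65, and threads 260 to 263 all map to vector 65.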
Simulations: BER results in two corner cases
[Plot: BER 1024 QAM AWGN. x-axis: Eb/No coded, 22.50 to 25.50; y-axis: BER, 1.00E-10 to 1.00E-02 (log scale).]
[Plot: BER QPSK AWGN. x-axis: Eb/No coded, 4 to 6; y-axis: BER, 1.00E-09 to 1.00E-01 (log scale).]
Simulation performance comparison
Execution Time: Whole Simulation @ BER = 10^-6
              CPU time [s]    CUDA Accelerated time [s]    Speed-up factor
QPSK          3070            190                          16.17
1024 QAM      3601            305                          11.82
Execution Time: Decoder @ BER = 10^-6
              CPU time [ms]   CUDA Accelerated time [ms]   Speed-up factor
QPSK          51              2.021                        25.24
1024 QAM      62              1.9552                       31.71
CPU: Intel Xeon X5690 3.47GHz; 12GB RAM
GPU: NVIDIA Tesla C2050
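The speed-up factors on the charts are consistent with a plain ratio of CPU time to CUDA-accelerated time. The sketch below states that ratio explicitly; the function name speedup is hypothetical, and the check uses only the decoder timings quoted on the slide (51 ms and 62 ms on the CPU, 2.021 ms and 1.9552 ms accelerated).

```cpp
#include <cassert>

// Speed-up factor: CPU execution time divided by accelerated execution time.
// Both times must be expressed in the same unit.
inline double speedup(double cpu_time, double gpu_time)
{
    assert(cpu_time >= 0.0 && gpu_time > 0.0);
    return cpu_time / gpu_time;
}
```

speedup(51.0, 2.021) evaluates to roughly 25.2 and speedup(62.0, 1.9552) to roughly 31.7, in line with the decoder speed-up factors reported on the slide. The whole-simulation times appear to be rounded on the chart, so recomputing their ratios matches the reported factors only approximately.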
A glance into next challenges
[Plot: normalized execution time (1 to 1.7) versus frame parallel factor (0 to 50).]
Execution Time: Single Decoder Iteration @ BER = 10^-6
              CPU time [ms]   CUDA Accelerated time [ms]   Speed-up factor
QPSK          17814           198                          89.97
1024 QAM      17072           194                          88.00