toward more efficient dpa-resistant aes hardware ... · toward more efficient dpa-resistant aes...
TRANSCRIPT
![Page 1: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/1.jpg)
Rei Ueno, Naofumi Homma, and Takafumi Aoki
Toward More Efficient DPA-Resistant AES Hardware
Architecture Based on Threshold Implementation
13th April 2017, Paris, France
Constructive Side-Channel Analysis and Secure Design (COSADE)
Tohoku University
![Page 2: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/2.jpg)
Threshold Implementation (TI)
Achieve provable security considering glitches
Masking-based countermeasure
a: secret value,ai: share
Exploits pipelining to avoid propagating glitches
dth-order TI defeats dth-order DPAs
2
![Page 3: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/3.jpg)
(Unprotected) AES hardware architectures
3
Latency
Are
a
Round-
based
Byte-
serial
Un-
rolled
Resource
sharing
Datapath
replication
2K~
10K~
100K~
GE
![Page 4: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/4.jpg)
TI-based AES architectures
4
Byte-
serial
Round-
based
Un-
rolled
Latency
Are
a
6K~
40K~
GE
400K~
![Page 5: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/5.jpg)
Randomness in TI-based AES architectures
5
Byte-
serial
Round-
based
Un-
rolled
Latency
Are
a
6K~
40K~
GE
400K~
32~bit
512~
bit
5,120~
bit
![Page 6: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/6.jpg)
This work
New TI-based S-box
Combine algebraic characteristic of AES S-box with
state-of-the-art TI construction (d + 1 input share TI)
Achieve 25% smaller area than conventional ones
Efficient byte-serial AES HW architecture for TI
Resister-retiming for low latency encryption
Achieve 11-21% lower latency without area overhead
6
![Page 7: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/7.jpg)
Outline
Introduction
TI-based AES S-box
AES HW architecture for TI
Experimental leakage evaluation
Concluding remarks
7
![Page 8: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/8.jpg)
Conditions for dth-order TI
Correctness
dth-order non-completeness
Uniformity (or mask refreshing)
8
First-order
non-complete circuitRing-refreshing
z0
z1
z2
![Page 9: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/9.jpg)
Two constructions of TI-based circuit
TI with td + 1 input shares (t : algebraic degree)
Less registers and randomness
Efficiently applied to some practical 4-bit S-boxes
TI with d + 1 input shares
Smaller area, but more registers and randomness
Input shares must be independent of each other
9
![Page 10: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/10.jpg)
TI-based AES
Linear functions are easily realized
For non-linear function (i.e., S-box)?
Inversion determines security-order and performance
10
ShiftRows
MixColumns
ShiftRows
MixColumns
ShiftRows
MixColumns
ShiftRows
MixColumns
ShiftRows
MixColumns
Unprotected TI-realization
S-box
Unprotected
TI-based
S-box
TI-realization
![Page 11: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/11.jpg)
TI-based inversion circuits
Efficiently decomposed into three stages based
on tower-field arithmetic
Three designs w.r.t. Stage 2 and TI constructions
11
Algebraic
degree 2 3 2
Tower-field inversion
Moradi et al.,
Eurocrypt 2011
Bilgin et al.,
TCAD 2015
Cnudde et al.,
CHES 2016
Stage 2 2-stage pipelined Non-pipelined 2-stage pipelined
Input shares td + 1 td + 1 d + 1
Area (GE) 4,244 2,224 1,872
![Page 12: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/12.jpg)
Proposed TI-based inversion
TI with d + 1 input shares
Non-pipelined Stage 2
Non-pipelined Stage 2 can be efficiently optimized
using OR (NOR) gates and factoring
a xor b xor (a and b) = a or b
Difficult to be applied to pipelined Stage 212
![Page 13: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/13.jpg)
First-order TI-based S-boxes
Logic synthesis with area optimization
Tool: Synopsys Design Compiler
Technology: TSMC 65 nm standard CMOS
25% more compact and efficient
13
Moradi+,
Eurocrypt
2011
Bilgin+,
TCAD
2015
Cnudde+,
CHES
2016
This work
Area (GE) 4,244 2,224 1,872 1,342
Latency 5 4 6 5
Area-Latency
product21,220 8,896 11,232 6,710
Randomness
[bit]44 32 56 64
Synthesis results
![Page 14: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/14.jpg)
Outline
Introduction
TI-based AES S-box
AES HW architecture for TI
Experimental leakage evaluation
Concluding remarks
14
![Page 15: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/15.jpg)
Criteria
Only one TI-based inversion
20 clock cycles per round
16 clocks for SubBytes
4 clocks for SubWords in key scheduling
Latency caused by pipelining should have
impact on only first round
Exploit parallelism
With few additional modules and few path selectors
15
![Page 16: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/16.jpg)
Proposed AES HW architecture
Separated Inversion, Isomorphism, and Affine
TI is not applied to key scheduler
16
![Page 17: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/17.jpg)
State Array
Register-retiming for parallel computation
Last four affine transformations after ShiftRows
Output to TI-based inversion during MixColumns17
![Page 18: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/18.jpg)
Timing diagram
18
Clock cycle
SubBytes
. . . 15 1610 17 18 19 20 21 22 23 24
SR MixColumns
SubWord in KeyScheduling
6 7 . . .
Conventional method
20 (16 + pipeline-latency) clocks for SubBytes
Distinct clocks for ShiftRows (SR) and MixColumns
![Page 19: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/19.jpg)
20 clock cycles per round instead of 25
Decompose SubBytes into Inversion and Affine
Parallel execution of SR, Inversion, Affine, MixColumns
Timing diagram
19
Clock cycle
This work
SubBytes
. . . 15 1610 17 18 19 20 21 22 23 24
SR MixColumns
SR MixColumns
SubWord in KeyScheduling
6 7 . . .
SubWord
Inversion
Affine
Conventional method
Affine
MixColums
Inversion
![Page 20: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/20.jpg)
Performance evaluation
11-21% lower latency and 25% higher efficient
20
Moradi+,
Eurocrypt
2011
Bilgin+,
TCAD
2015
Cnudde+,
CHES
2016
This work
Area [GE] 11,114 8,119 6,681 6,334
Latency 266 246 276 219
Power [uW]24.12
(3.14*)No data 3.06
Area-Latency
product2,956 K 1,997 K 1,844 K 1,387 K
Power-Latency
product
6,416.92
(835.24*)No data 672.33
Process [nm] 180 65
* Multiplied by square of process rate
![Page 21: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/21.jpg)
Outline
Introduction
TI-based AES S-box
AES HW architecture for TI
Experimental leakage evaluation
Concluding remarks
21
![Page 22: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/22.jpg)
Experimental evaluation of DPA-resistance
Implement proposed AES on FPGA to perform
Test Vector Leakage Assessment (TVLA)
22
Setup overview Experimental setup
Board SASEBO-G
FPGAXilinx Virtex
PRO II
Frequency 24MHz
OscilloscopeTektronix
DPO7254
Sampling rate 1GS/s
# of traces 500,000
Proposed AES on FPGA
![Page 23: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/23.jpg)
Test Vector Leakage Assessment (TVLA)
23
Measured traces
Random
plaintext
Fixed
plaintext
t-test
4.5
4.5
-4.5-4.5
ResistantVulnerablet-
value
t-
value
![Page 24: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/24.jpg)
Measured trace
Random number for TI-based S-box is
generated using LFSR-based PRNG
24
PRNG off
(Unprotected)
PRNG on
(Protected)
AES PRNG
![Page 25: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/25.jpg)
Result
We can confirm first-order DPA resistance of
our architecture under 500,000 traces
25
PRNG off
(Unprotected)
PRNG on
(Protected)
4.5
-4.5
4.5-4.5
t-value t-value
![Page 26: Toward More Efficient DPA-Resistant AES Hardware ... · Toward More Efficient DPA-Resistant AES Hardware Architecture Based on Threshold Implementation 13th April 2017, Paris, France](https://reader034.vdocuments.net/reader034/viewer/2022042104/5e81f3a8751d0474ff1b6ec9/html5/thumbnails/26.jpg)
Concluding remarks
Efficient DPA-resistant AES HW architecture
based on TI
New TI-based S-box
• 25% smaller area
Proposed AES HW architecture for TI
• 11—21% lower-latency without area overhead
• Secure under first-order DPAs with 500,000 traces
Future works
Design and evaluate proposed HW architecture with
higher-order TI-based S-boxes
Reduce randomness
26