Lightweight TBC-Based Modes for Small Hardware Implementations. Mustafa Khairallah, NTU, Singapore. December 15, 2019. Joint work with Tetsu Iwata, Kazuhiko Minematsu and Thomas Peyrin.


Page 1: Lightweight TBC-Based Modes for Small Hardware …

Lightweight TBC-Based Modes for Small Hardware Implementations

Mustafa Khairallah¹

NTU, Singapore

December 15, 2019

¹ Joint with Tetsu Iwata, Kazuhiko Minematsu and Thomas Peyrin

Page 2: Lightweight TBC-Based Modes for Small Hardware …

Overview

Lightweight Cryptography Design Space

Our Goals

State of the art

Romulus-N

Misuse Resistance

FSM Optimization

Results

What’s next for Romulus?

Page 3: Lightweight TBC-Based Modes for Small Hardware …

Lightweight Cryptography Design Space

[Figure: lightweight cryptography design space, with axes Security, Area, Throughput, Energy and Power.]

Page 6: Lightweight TBC-Based Modes for Small Hardware …

What went wrong?

1. Broken
2. Too slow
3. Too much energy
4. Too complex
5. Too big?!

Page 7: Lightweight TBC-Based Modes for Small Hardware …

Security Goals

1. [Urgent] Provable 128-bit security in the standard TBC model.
2. [Urgent] Easy to mask for side-channel protection.
3. [Optional] Misuse resistance.

Page 8: Lightweight TBC-Based Modes for Small Hardware …

Area Goals

1. [Urgent] No extra state beyond the TBC.
2. [Urgent] No feed-forward.
3. [Urgent] No key/nonce storage.
4. [Urgent] Around 6000 GEs for round-based implementations.
5. [Urgent] Below 10,000 GEs for threshold round-based implementations.
6. [Urgent] Strictly inverse-free.
7. [Urgent] Decryption is almost free.
8. [Important] Minimal use of multiplexers.
9. [Important] Around 3000 GEs for byte-serial implementations.
10. [Important] A wide variety of trade-offs.

Page 9: Lightweight TBC-Based Modes for Small Hardware …

Performance Goals

1. [Urgent] Fast for short messages and authentication.
2. [Important] Fast enough (≫ 1 Gbps on modern ASIC technologies) even with first-order masking.
3. [Important] Wide performance range.
4. [Important] Competitive with AES-based designs and the CAESAR lightweight portfolio (namely, Ascon).

Page 10: Lightweight TBC-Based Modes for Small Hardware …

Where to look?

▶ ΘCB3 has great features:
  ▶ Standard TPRP assumption.
  ▶ The security bound is independent of the length.
  ▶ Parallelizable.
  ▶ Not inverse-free, needs n extra flip-flops, needs an extra call.
▶ ΘCB3 needs n extra flip-flops at least.

Page 11: Lightweight TBC-Based Modes for Small Hardware …

Where to look?

1. iCOFB is an interesting starting point: lightweight.
2. ZAE has a higher rate for authentication compared to encryption.

Page 12: Lightweight TBC-Based Modes for Small Hardware …

Romulus-N

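[Figure: Romulus-N mode diagram. Labels recoverable from the slide: associated-data blocks A[1], A[2], ..., A[a−1], pad(A[a]); nonce N; n-bit state S, starting from 0^n; feedback function ρ; message blocks M[1], ..., pad(M[m]); ciphertext blocks C[1], ..., C[m]; tag T; lsb truncation; TBC calls E_K with tweak labels (8,1), (8,a−2), (ω_A,a), (4,1), (ω_M,a−2); wire widths n and t; ω_A ∈ {24, 26}, ω_M ∈ {20, 21}.]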

Page 13: Lightweight TBC-Based Modes for Small Hardware …

Rx Architecture

[Figure: Rx architecture block diagram. Labels: state, Skinny, lt, input, output.]

Page 14: Lightweight TBC-Based Modes for Small Hardware …

Sx Architecture

[Figure: Sx architecture block diagram. Labels: a 4×4 array of state cells S0 to Sf, SBox, RC, RTK, ρ, input, output, len, and the constant 0x00 (twice).]

Page 15: Lightweight TBC-Based Modes for Small Hardware …

Why Serial?

Parallelization is not that efficient in hardware.

Table: Synthesis results of the Deoxys-I-128 implementation using TSMC 65nm technology with x4 parallelization

Impl.     Area (KGE)   Max. Freq. (MHz)   Throughput (Mbps)   Efficiency (Mbps/KGE)
[KCP17]   59.53        847                7,227               121.40
ATHENa    53.37        549                4,684               87.76

Page 16: Lightweight TBC-Based Modes for Small Hardware …

Why Serial?

▶ PFB: designed independently by Naito and Sugawara [NS19] around the same time as our work.
▶ It has partial parallelization for encryption only.
▶ Doesn’t work for authentication and decryption.
▶ Makes the feedback function and padding complicated.

Page 17: Lightweight TBC-Based Modes for Small Hardware …

Why Skinny?

▶ Cheap.
▶ Wide variety of trade-offs.
▶ Easy to mask.
▶ Well analyzed with large security margin.
▶ Large tweakey space.
▶ Designed and supported by a strong team of researchers providing different implementations.

Page 18: Lightweight TBC-Based Modes for Small Hardware …

Skinny SBox

[Figure: Skinny S-box circuit, with MSB and LSB labels (shown twice).]

Page 19: Lightweight TBC-Based Modes for Small Hardware …

Skinny Tweakey Schedule: Where the magic happens

Utilize the fully linear tweakey scheduling, mostly routing and renaming bytes
▶ Reverse tweakey schedule at the end of every TBC call, instead of keeping input
▶ Very low area, only 67 XOR gates! including both key correction and block counter.
▶ If we were to maintain tweakey state (due to modes/TBC), at least 320 FFs
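The reversal idea can be sketched in software. This is a minimal model, not the hardware described on the slide: the cell permutation `PT` and the two byte LFSRs are written from memory of the Skinny specification and should be checked against it before reuse, and `ROUNDS = 56` is an assumed round count used only for illustration. The point is that every step of the schedule is linear and invertible, so the original tweakey can be recomputed after a TBC call instead of being stored.

```python
# Assumed constants (recalled from the Skinny specification; verify before reuse).
PT = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7]
ROUNDS = 56  # assumption for illustration


def lfsr2(x):
    """TK2 cell update: (x7..x0) -> (x6..x0, x7 ^ x5)."""
    return ((x << 1) & 0xFF) | (((x >> 7) ^ (x >> 5)) & 1)


def lfsr3(x):
    """TK3 cell update: (x7..x0) -> (x0 ^ x6, x7..x1). It is the inverse of lfsr2."""
    return (x >> 1) | (((x ^ (x >> 6)) & 1) << 7)


def permute(tk):
    return [tk[PT[i]] for i in range(16)]


def unpermute(tk):
    out = [0] * 16
    for i in range(16):
        out[PT[i]] = tk[i]
    return out


def forward_round(tk1, tk2, tk3):
    """One tweakey update: permute all words, then LFSR the top two rows of TK2/TK3."""
    tk1, tk2, tk3 = permute(tk1), permute(tk2), permute(tk3)
    tk2 = [lfsr2(c) for c in tk2[:8]] + tk2[8:]
    tk3 = [lfsr3(c) for c in tk3[:8]] + tk3[8:]
    return tk1, tk2, tk3


def backward_round(tk1, tk2, tk3):
    """Undo one update: inverse LFSRs first, then the inverse permutation."""
    tk2 = [lfsr3(c) for c in tk2[:8]] + tk2[8:]  # lfsr3 undoes lfsr2
    tk3 = [lfsr2(c) for c in tk3[:8]] + tk3[8:]  # lfsr2 undoes lfsr3
    return unpermute(tk1), unpermute(tk2), unpermute(tk3)


def run(tks, step, rounds=ROUNDS):
    for _ in range(rounds):
        tks = step(*tks)
    return tks
```

Running `run(run(tks, forward_round), backward_round)` returns the starting tweakey words, which is the property the reversed schedule relies on.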

Page 20: Lightweight TBC-Based Modes for Small Hardware …

Performance Trade-offs

Lightweight core is suitable to unroll, excellent trade-off
▶ Speeding up ×2 by two-round unrolling: ≈ +1,000 GEs, +20 % of total area

Page 21: Lightweight TBC-Based Modes for Small Hardware …

Why LFSR Counter?

▶ Counters are the bottleneck of TBC-based designs.
▶ A 50-bit arithmetic counter costs ∼ 600 GEs, with depth = 100.
▶ An LFSR counter costs 7 ∼ 15 GEs.
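A toy sketch of why an LFSR counter is so cheap: one step is a shift plus a conditional XOR with a fixed mask, with no carry chain. The 8-bit width and the polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) are illustrative assumptions, not the counter used in Romulus; the same structure scales to wider counters with a different primitive polynomial.

```python
def lfsr_step(state, poly=0x11D, width=8):
    """Multiply the state by x modulo the given polynomial (Galois LFSR step)."""
    state <<= 1
    if state >> width:      # a bit fell off the top: reduce by the polynomial
        state ^= poly
    return state


def period(seed=1):
    """Count steps until the counter returns to its seed."""
    s, n = lfsr_step(seed), 1
    while s != seed:
        s, n = lfsr_step(s), n + 1
    return n
```

Because 0x11D is primitive, the counter visits every nonzero 8-bit value before repeating, so it distinguishes 255 blocks while needing only the XORs at the polynomial's tap positions.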

Page 22: Lightweight TBC-Based Modes for Small Hardware …

Why ρ?

Possible feedback types:
▶ Plaintext Feedback.
▶ Combined Feedback.
▶ Hybrid Feedback.

Page 23: Lightweight TBC-Based Modes for Small Hardware …

Is it all about the gate count?

Hybrid Feedback (e.g. HyENA, mixFeed): n XORs.

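[Figure: hybrid feedback mode diagram. Labels: block cipher calls E_Ka, E_Kb, E_Kc; message blocks M0 to M3; ciphertext blocks C0 to C3; δM; tag T.]

128 XORs. 32 XORs and 32 MUXes for a 32-bit bus.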

Page 24: Lightweight TBC-Based Modes for Small Hardware …

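Is it all about the gate count?

▶ COFB Feedback: 288 XORs
▶ In order to serialize, we need 32 Flip Flops and 32 MUXes.

[Figure: two block diagrams of the feedback function, with labels S, ρ, X, Y, S′ and G.]

G as written, with all four words updated at once:

1. S0 = S1
2. S1 = S2
3. S2 = S3
4. S3 = S3 ⊕ S0

G serialized, with a temporary register T:

1. T ← S0, S0 ← S1
2. S1 = S2
3. S2 = S3
4. S3 = S3 ⊕ T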
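The two step lists above can be checked with a small sketch. It assumes S0 to S3 are the four 32-bit words of the state and that the first list is meant as a simultaneous update; the function names are hypothetical. The third function shows what goes wrong when the steps run one after another without the temporary T, which is why serialization costs extra flip-flops.

```python
def g_parallel(s):
    """All four words updated at once from the old state."""
    s0, s1, s2, s3 = s
    return [s1, s2, s3, s3 ^ s0]


def g_serial(s):
    """One word per step, with a temporary register T holding the old S0."""
    s = list(s)
    t = s[0]
    s[0] = s[1]
    s[1] = s[2]
    s[2] = s[3]
    s[3] = s[3] ^ t
    return s


def g_serial_no_temp(s):
    """Same steps without T: step 4 reads the already overwritten S0."""
    s = list(s)
    s[0] = s[1]
    s[1] = s[2]
    s[2] = s[3]
    s[3] = s[3] ^ s[0]
    return s
```

For a 32-bit bus, T is one 32-bit register, which matches the 32 flip-flops quoted on the slide.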

Page 25: Lightweight TBC-Based Modes for Small Hardware …

G goal: 0 XORs

▶ Impossible!!
▶ What about 1? Possible, yet needs extra flip-flops and MUXes to serialize.
▶ What about 1 XOR for byte serial, 4 for a 32-bit bus, 16 in total? Romulus Feedback.

Page 26: Lightweight TBC-Based Modes for Small Hardware …

ρ Feedback Function

Simple operation defined over bytes
▶ Each input byte affects one and only one byte.
▶ Rotation happens within the same byte.
▶ Computation is on the fly even for 8-bit buses.

Choice of G

1. S0 = G4(S0)
2. S1 = G4(S1)
3. S2 = G4(S2)
4. S3 = G4(S3)

[Figure: two block diagrams of the feedback function, with labels S, ρ, X, Y, S′ and G.]

$$
G_4 =
\begin{pmatrix}
G_s & 0 & 0 & 0 \\
0 & G_s & 0 & 0 \\
0 & 0 & G_s & 0 \\
0 & 0 & 0 & G_s
\end{pmatrix}
$$
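A byte-level sketch of such a feedback function, under stated assumptions: the byte map `gs` below, (x7..x0) → (x0 ⊕ x7, x7, ..., x1), and the equations C = M ⊕ G(S), S′ = S ⊕ M are taken from memory of the Romulus specification rather than from the slide, so treat them as assumptions to verify. The sketch illustrates the listed properties: each byte is processed independently, the rotation stays inside the byte, and `gs` uses a single XOR.

```python
def gs(b):
    """Assumed byte map G_s: shift right by one, new MSB = x0 ^ x7."""
    return (b >> 1) | (((b ^ (b >> 7)) & 1) << 7)


def g(state):
    """Apply G_s to every byte of the state independently."""
    return bytes(gs(b) for b in state)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def rho(state, m):
    """Encryption direction: returns (next state, ciphertext block)."""
    return xor(state, m), xor(g(state), m)


def rho_inv(state, c):
    """Decryption direction: returns (next state, message block)."""
    m = xor(g(state), c)
    return xor(state, m), m
```

Both directions only ever evaluate `g` forwards, so decryption reuses the same circuit, in line with the "decryption is almost free" goal.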

Page 27: Lightweight TBC-Based Modes for Small Hardware …

Romulus M-variants

[Figure: Romulus-M composition. Labels: blocks Romulus-AD and Romulus-ENC, inputs A, M, N, outputs T and C.]

▶ (Fully) Nonce-misuse resistance via SIV [RS06].
▶ Shares most Romulus-N components (easy to implement both using the same circuit).
▶ Only 1.5 TBC calls per block (3 TBC calls to process two blocks).

Page 28: Lightweight TBC-Based Modes for Small Hardware …

Last pieces of the puzzle

▶ Cryptographers often neglect the cost of the control logic and padding, which is drastic for LWC.
▶ Kumar et al. [KHK+17] showed that a naive/reference implementation of the CAESAR Hardware API requires 4 KGE, almost as much as the lightweight scheme itself.
▶ Our goals for the FSM: simple logic, simple padding, low area and low power.

Page 29: Lightweight TBC-Based Modes for Small Hardware …

Padding.

For $X \in \{0,1\}^{\le 128}$ let

$$
\mathrm{pad}_l(X) =
\begin{cases}
X & \text{if } |X| = 128, \\
X \,\|\, 0^{\,l-|X|-8} \,\|\, \mathrm{len}(X) & \text{if } 0 \le |X| < 128.
\end{cases}
$$

▶ A bus of width w needs only w + 4 MUXes for padding; the length is already stored in the FSM, so no decoding is required.
▶ 10∗ padding requires an n/8-bit decoder and 9w/8 MUXes.
▶ Security is exactly the same.
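A minimal sketch of this padding rule for l = 128. It assumes byte-aligned inputs and that len(X) is the length of X in bytes, encoded in the final byte; both are assumptions, since the slide does not spell out the encoding.

```python
BLOCK_BYTES = 16  # l = 128 bits


def pad(x: bytes, block_bytes: int = BLOCK_BYTES) -> bytes:
    """Full blocks pass through; shorter ones get zeros and a length byte."""
    if len(x) > block_bytes:
        raise ValueError("input longer than one block")
    if len(x) == block_bytes:
        return x
    # l - |X| - 8 zero bits is (block_bytes - len(x) - 1) zero bytes.
    return x + bytes(block_bytes - len(x) - 1) + bytes([len(x)])
```

For example, `pad(b"abc")` gives the three input bytes, twelve zero bytes, and a final byte 0x03. The padding bytes depend only on the stored length, which is why no position decoder is needed.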

Page 30: Lightweight TBC-Based Modes for Small Hardware …

FSM Optimization

[Figure: FSM state diagram with states qa (start), qb, qc, qd, qe, qf, qg, qh, qi.]

Page 31: Lightweight TBC-Based Modes for Small Hardware …

FSM Optimization

[Figure: FSM state diagram with states qa (start), qb, qc, qd, qe, and the label X = αi.]

Page 32: Lightweight TBC-Based Modes for Small Hardware …

Current Design Corners for Romulus-N1 on TSMC 65nm

Arch.   Area (GE)   Power (mW)   Energy (pJ), Enc/Auth   Throughput (Gbps), Enc/Auth only
R1      5772        0.2          24/13                   4.2/8
R2      6635        0.25         16/9                    8/14.2
R4      8740        0.32         13.8/8.5                8.8/14.5
R8      12990       0.45         21.1/14.4               7.3/10.6
S1      3318        0.15         489/247.5               0.131/0.259
P1      8048        0.28         36/19.2                 4.2/8
PS      5154        0.21         772/391                 0.131/0.259
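One way to read the table is throughput per unit area. The sketch below uses only the area and encryption-throughput columns above; the metric (Gbps per KGE) and the helper names are choices made here for illustration, not something the slide reports.

```python
# (area in GE, encryption throughput in Gbps), copied from the table above.
CORNERS = {
    "R1": (5772, 4.2),
    "R2": (6635, 8.0),
    "R4": (8740, 8.8),
    "R8": (12990, 7.3),
    "S1": (3318, 0.131),
    "P1": (8048, 4.2),
    "PS": (5154, 0.131),
}


def gbps_per_kge(name):
    """Encryption throughput divided by area in thousands of gate equivalents."""
    area, throughput = CORNERS[name]
    return throughput / (area / 1000)


def most_efficient():
    return max(CORNERS, key=gbps_per_kge)


def unroll_cost(base="R1", unrolled="R2"):
    """Extra area in GE and speed-up factor when moving between two corners."""
    (a0, t0), (a1, t1) = CORNERS[base], CORNERS[unrolled]
    return a1 - a0, t1 / t0
```

With these numbers the two-round unrolled corner R2 comes out as the most area-efficient for encryption, and going from R1 to R2 adds 863 GE for roughly twice the throughput.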

Page 33: Lightweight TBC-Based Modes for Small Hardware …

Comments and future work

▶ Romulus is not the limit: Remus-N2 achieves the same bit-security with a smaller state and a smaller TBC, but under different assumptions (ICM instead of TPRP).
▶ Focus on low power/energy implementations.
▶ Study the design space of the threshold implementations more closely.
▶ Experiments on Romulus-M.
▶ Architectures to combine Romulus with TBC-based hash functions.
▶ Higher level Side-Channel protection solutions.

Thank you!