#RSAC
Gauss Sieve on GPUs
Shang-Yi Yang1, Po-Chun Kuo1, Bo-Yin Yang2, and Chen-Mou Cheng1
1 Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
{ilway25,kbj,doug}@crypto.tw
2 Institute of Information Science, Academia Sinica, Taipei, Taiwan, [email protected]
Agenda
Motivation
Lattice-based Cryptography and Cryptanalysis
Sieve Algorithm
Lifting Technique
Parallel Method
Results
Motivation
NIST announces the Post-Quantum Standard competition
Lattice-based cryptography provides many efficient cryptosystems
But what about the security? Security models are all based on lattice enumeration
What security estimate does the sieve algorithm give?
Lattice-based cryptography
Resistance to quantum attacks
Provable security
• Worst-case to average-case reduction [Ajtai’96, MicReg’05]
Most crypto-primitives can be constructed from lattice problems
• FHE [Gen’09, BV’11a]
• PKC secure under CCA [PW’08, Pei’09]
• OT [PVW’08]
• IBE [GPV’08, CHKP’10, ABBF’10]
• FE [GKP+’13]
• One-way functions [Ajt’96]
• Digital signature schemes [GPV’08, CHKP’10]
• Zero-knowledge proofs [MV’03, KTX’08]
Lattice Problems
Central problem: Shortest Vector Problem (SVP)
Related problems: CVP, BDDP, GapSVP, short basis problem, covering radius problem, etc.
Relations between these problems: cf. [LyuMic’09, AhaReg’05]
SVP: given a basis, find the shortest nonzero lattice vector
How to estimate the hardness of lattice problems
Lattice basis reduction
• Shorter vectors
• More orthogonal
Approximation algorithms
• LLL
• BKZ
Exact algorithms
• Enumeration
• Sieve
• Voronoi cell computation
Algorithm             Time                  Space            Type
Enumeration           $2^{O(n \log n)}$     poly(n)          Deterministic
Voronoi cell-based    $2^{O(cn)}$           $2^{O(cn)}$      Deterministic
Sieve                 $2^{O(cn)}$           $2^{O(cn)}$      Randomized
Sieve algorithm
Proposed by Ajtai, Kumar, and Sivakumar in 2001
Main idea
• Sample $2^{cn}$ points
• Cover the samples with spheres of sufficient radius centered at samples
• Obtain shorter vectors by subtracting the centers
Pros
• Time complexity is bounded by $2^{O(cn)}$
Cons
• Space complexity is $2^{O(cn)}$
Algorithm                  Time                   Space
AKS’2001                   $2^{O(n)}$             $2^{O(n)}$
Regev’04                   $2^{16n+o(n)}$         $2^{8n+o(n)}$
NV’08                      $2^{5.9n+o(n)}$        $2^{2.95n+o(n)}$
ListSieve’10               $2^{3.2n+o(n)}$        $2^{1.6n+o(n)}$
ListSieve’10 (birthday)    $2^{2.465n+o(n)}$      $2^{1.233n+o(n)}$
Heuristic Version of Sieve
Algorithm                  Time                   Space
NV’08                      $2^{0.415n+o(n)}$      $2^{0.2075n+o(n)}$
Three-Level                $2^{0.3778n+o(n)}$     $2^{0.2833n+o(n)}$
GaussSieve’10              $2^{0.52n+o(n)}$       $2^{0.41n+o(n)}$
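The main idea of the sieve (sample, cover with spheres, subtract centers) can be illustrated by a single sieving pass. This is a minimal sketch, not the AKS algorithm itself: the shrink factor `gamma`, the greedy choice of centers and the toy input (random integer vectors, i.e. points of the lattice Z^4) are all assumptions made for illustration.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def sieve_pass(vectors, gamma):
    """One pass: from vectors of norm <= R, derive vectors of norm <= gamma * R."""
    R = max(norm(v) for v in vectors)
    centers, shorter = [], []
    for v in vectors:
        if norm(v) <= gamma * R:
            shorter.append(v)                # already short enough
            continue
        for c in centers:
            diff = [a - b for a, b in zip(v, c)]
            if norm(diff) <= gamma * R:      # v lies in the sphere around c
                if any(diff):                # drop zero vectors (collisions)
                    shorter.append(diff)
                break
        else:
            centers.append(v)                # v becomes a new center and is not output
    return shorter

# Hypothetical toy input: random nonzero points of Z^4.
rng = random.Random(1)
points = [[rng.randint(-20, 20) for _ in range(4)] for _ in range(2000)]
points = [p for p in points if any(p)]
out = sieve_pass(points, gamma=0.8)
```

Every output is either an input vector or a difference of two input vectors, so it stays in the lattice. The centers are lost in each pass, which is why the sieve needs exponentially many samples to start with.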
Gauss Sieve
Algorithm described by Micciancio and Voulgaris in 2009
All vectors in the list are pairwise Gauss-reduced:
$\|a \pm b\| \ge \max(\|a\|, \|b\|)$
Algorithm: subtract from a the integer multiple of b closest to its projection on b, swap a and b, and repeat
• $a := a - \lfloor \langle a, b \rangle / \langle b, b \rangle \rceil \cdot b$
• Swap(a, b) and go to the previous step
[Figure: the two vectors a and b]
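The reduce-and-swap loop for a single pair can be sketched as follows. This is a minimal sketch assuming two linearly independent integer vectors; the names `dot` and `gauss_reduce` are illustrative, and the stopping rule (stop once the reduced vector is no shorter than the other) is the usual one for this procedure rather than something stated on the slide.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gauss_reduce(a, b):
    """Return a Gauss-reduced pair (a, b), |a| <= |b|, spanning the same lattice."""
    a, b = list(a), list(b)
    if dot(a, a) > dot(b, b):
        a, b = b, a                          # keep a as the shorter vector
    while True:
        # Integer multiple of a closest to the projection of b on a (exact arithmetic).
        m = round(Fraction(dot(a, b), dot(a, a)))
        b = [y - m * x for x, y in zip(a, b)]
        if dot(b, b) >= dot(a, a):           # no further shortening possible
            return a, b
        a, b = b, a                          # swap and repeat

# (5, 8) and (8, 13) have determinant 1, so they span Z^2.
a, b = gauss_reduce([5, 8], [8, 13])
```

For this input both returned vectors have norm 1, as expected for a reduced basis of Z^2.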
Gauss Sieve
[Diagram: Sampler, Stack, Vector, List]
Repeat steps 1 to 3 until a short vector is found:
1. Take a Vector from the Stack, or sample one with the GaussianSampler
2. Reduce the Vector by the List and reduce the List vectors by the Vector; if a List vector is reduced, move it into the Stack
3. Move the Vector into the List
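Steps 1 to 3 can be sketched as a short sequential loop. Several parts are assumptions made for illustration: `sample` is a stand-in that draws small random integer combinations of the basis (not the Gaussian sampler of the slides), the loop stops after a fixed number of collisions (vectors reduced to zero) since the true shortest length is unknown, and the toy basis is hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reduce_by(v, w):
    """Shorten v with an integer multiple of w if possible; return (v, changed)."""
    ww, vw = dot(w, w), dot(v, w)
    if 2 * abs(vw) <= ww:
        return v, False
    m = (2 * vw + ww) // (2 * ww)            # nearest integer to vw / ww
    return [x - m * y for x, y in zip(v, w)], True

def sample(basis, rng, spread=3):
    """Stand-in sampler (assumption): small random nonzero combination of the basis."""
    while True:
        v = [0] * len(basis[0])
        for b in basis:
            c = rng.randint(-spread, spread)
            v = [x + c * y for x, y in zip(v, b)]
        if any(v):
            return v

def gauss_sieve(basis, max_collisions=50, seed=0):
    rng = random.Random(seed)
    lst, stack, collisions = [], [], 0
    while collisions < max_collisions:
        # Step 1: take a vector from the stack, or sample a new one.
        v = stack.pop() if stack else sample(basis, rng)
        # Step 2a: reduce v by the list until no list vector shortens it.
        changed = True
        while changed and any(v):
            changed = False
            for w in lst:
                v, c = reduce_by(v, w)
                changed = changed or c
        if not any(v):
            collisions += 1                  # v collapsed to zero
            continue
        # Step 2b: reduce list vectors by v; reduced ones move to the stack.
        keep = []
        for w in lst:
            w2, c = reduce_by(w, v)
            if not c:
                keep.append(w)
            elif any(w2):
                stack.append(w2)
        # Step 3: move v into the list.
        lst = keep + [v]
    return min(lst, key=lambda u: dot(u, u))

basis = [[1, 0, 0], [0, 1, 0], [7, 11, 23]]  # hypothetical toy basis
shortest = gauss_sieve(basis)
```

The list stays pairwise Gauss-reduced throughout, which is what bounds its size.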
Gauss Sieve Implementation
[IKMT PKC’14]
• First massively parallel implementation
• Parallel on CPUs via MPI
Our work [Kuo, Yang, Cheng, Yang]
• 50 times faster than the previous work on a single CPU core
• Lifting computations in prime-cyclotomic ideals
• Reduces the complexity of inner products of ideals from $O(n^3)$ to $O(n^2)$
• Parallelizes Gauss Sieve on a single GPU using the framework of Ishiguro et al.
• Parallelizes Gauss Sieve on multiple GPUs using the framework of Bos et al.
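Lifting computations in prime-cyclotomic ideals
Prime-cyclotomic ideals: $\mathcal{L}$ = … mod $x^n + x^{n-1} + \dots + 1$
Cyclic ideals: $\bar{\mathcal{L}}$ = … mod $x^{n+1} - 1$, where $x^{n+1} - 1 = (x^n + x^{n-1} + \dots + 1)(x - 1)$
Prime-cyclotomic ideals: knowing $F(x) \bmod x^n + x^{n-1} + \dots + 1$, computing $xF(x) \bmod x^n + x^{n-1} + \dots + 1$ costs O(n), not very easy
Cyclic ideals: knowing $F(x) \bmod x^n - 1$, computing $xF(x) \bmod x^n - 1$ costs O(1), very easy
Problems: wasted space; how to compute the inner product in $\bar{\mathcal{L}}$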
lifting computations in prime-cyclotomic ideals
• Use the freedom to make vector components sum to zero
• Reduces the complexity of inner products over ideal lattices from $O(n^3)$ to $O(n^2)$
• Fastest in prime-cyclotomic ideal lattices, to the best of our knowledge
$\langle u, v \rangle = \langle \bar{u}, \bar{v} \rangle - p\,\bar{u} - q\,\bar{v} + (n+1)\,pq$
(1, 2, 3, 4, 5) mod $x^4 + x^3 + x^2 + x + 1$
(1, 2, 3, 4, 5, 0) mod $x^5 - 1 = (x^4 + x^3 + x^2 + x + 1)(x - 1)$
(2, 3, 4, 5, 6, 1) mod $x^5 - 1$
(1−p, 2−p, 3−p, 4−p, 5−p, p) mod $x^5 - 1$
p is the norm of this vector
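A small numerical check of the lifting idea, under one explicit convention that is an assumption and may differ from the slide's notation: the example takes n + 1 = 5, an element is stored as a lifted vector of n + 1 coefficients, its canonical form subtracts the last coefficient from all the others, and p, q denote the last lifted coefficients of the two vectors. The function names are illustrative.

```python
def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def canonical(lifted):
    """Canonical n-coefficient form: subtract the last lifted coefficient everywhere."""
    last = lifted[-1]
    return [c - last for c in lifted[:-1]]

def lifted_inner(ubar, vbar):
    """Inner product of the canonical forms, computed on the lifted vectors only."""
    p, q = ubar[-1], vbar[-1]
    return inner(ubar, vbar) - p * sum(vbar) - q * sum(ubar) + len(ubar) * p * q

def mul_by_x_lifted(ubar):
    """Multiplication by x modulo x^(n+1) - 1 is a rotation of the coefficients."""
    return ubar[-1:] + ubar[:-1]

ubar = [2, 3, 4, 5, 1]                       # lift of (1, 2, 3, 4), shifted by 1
vbar = [0, 1, -1, 2, 3]                      # hypothetical second vector
direct = inner(canonical(ubar), canonical(vbar))
via_lift = lifted_inner(ubar, vbar)
rotated = canonical(mul_by_x_lifted([1, 2, 3, 4, 0]))
```

When the lifted components sum to zero, the two middle terms vanish and only the lifted inner product and the (n + 1)·p·q correction remain, which is where the freedom mentioned above pays off.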
Parallel Gauss Sieve (inner layer, in GPU)
[Diagram: Sampler, Stack, Vectors, List]
1. Sample Vectors from the Stack or the GaussianSampler
2. Reduce the Vectors by the List
3. Reduce the Vectors by themselves; if a vector is reduced, move it into the Stack
4. Reduce the List vectors by the Vectors; if a List vector is reduced, move it into the Stack; then move the Vectors into the List
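Steps 2 to 4 of the batched variant can be sketched sequentially as below. This is an illustration only, not GPU code: the per-vector work in steps 2 and 4 is independent, which is what a GPU would run in parallel. The function names and the toy batch are assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reduce_by(v, w):
    """Shorten v with an integer multiple of w if possible; return (v, changed)."""
    ww, vw = dot(w, w), dot(v, w)
    if 2 * abs(vw) <= ww:
        return v, False
    m = (2 * vw + ww) // (2 * ww)            # nearest integer to vw / ww
    return [x - m * y for x, y in zip(v, w)], True

def fully_reduce(v, others):
    """Reduce v by all of `others` until nothing shortens it; return (v, changed)."""
    changed_any, changed = False, True
    while changed and any(v):
        changed = False
        for w in others:
            v, c = reduce_by(v, w)
            if c:
                changed = changed_any = True
    return v, changed_any

def batch_step(lst, stack, batch):
    # Step 2: reduce every batch vector by the list.
    batch = [fully_reduce(v, lst)[0] for v in batch]
    survivors = [v for v in batch if any(v)]
    # Step 3: reduce batch vectors by each other; reduced ones go to the stack.
    i = 0
    while i < len(survivors):
        v, changed = fully_reduce(survivors[i], survivors[:i] + survivors[i + 1:])
        if changed:
            survivors.pop(i)
            if any(v):
                stack.append(v)
        else:
            i += 1
    # Step 4: reduce list vectors by the batch; reduced ones go to the stack.
    kept = []
    for w in lst:
        w2, changed = fully_reduce(w, survivors)
        if not changed:
            kept.append(w)
        elif any(w2):
            stack.append(w2)
    return kept + survivors, stack           # the batch joins the list

lst, stack = batch_step([], [], [[3, 1, 0], [1, 2, 0], [0, 1, 4], [2, 2, 1]])
batch, stack = stack, []                     # step 1: next batch comes from the stack
lst, stack = batch_step(lst, stack, batch)
```

Moving reduced vectors to the stack instead of the list keeps the list pairwise Gauss-reduced after each step.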
Parallel Gauss Sieve (outer layer, between GPUs)
[Diagram: Sampler, Stack, Vectors; the List is split into List_0, List_1, …, List_{n-1}, each shown with its own Vectors and Stack]
Update the minimal List_i
Our record
Implementation Results
CUDA version 7.5
8x NVIDIA GeForce GTX TITAN X
4 in the main machine
4 in a PCIe extension box.
SVP instances are from Darmstadt’s Ideal Lattice Challenge
All inputs are preprocessed by BKZ with blocksize=30 and delta=0.99
Parallel Efficiency (within a GPU)
Compare 1 GPU to a single CPU core
• In dimension 96, [IKMT14] requires 200 CPU-hours; our single-GPU implementation requires 9.6 GPU-hours
• 21.5x faster
Hardness comparison between general and ideal lattices
• Use the same model as [IKMT14] for the speed-up ratio in ideal lattices
• [IKMT14]: 600x speedup in anti-cyclic ideal lattices in dimension 128
• This work: 300x speedup in prime-cyclotomic ideal lattices in dimension 130
• Prime-cyclotomic is “1/2 as nice” as anti-cyclic
Parallel Efficiency (between GPUs)
$\text{parallel efficiency} = \dfrac{\text{runtime for 1 GPU}}{N \times \text{runtime for } N \text{ GPUs}}$
In dimension 108, the parallel efficiency is 74%, 72%, 55% and 45%, respectively
In dimension 112, with 2 GPUs as the baseline, the parallel efficiency is 86%, 81% and 74%, respectively
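A small helper for the efficiency figures, using the usual definition: baseline GPU count times baseline runtime, divided by N times the runtime on N GPUs. The `n_base` parameter covers the 2-GPU baseline case. The runtimes in the example are hypothetical, not measurements from this work.

```python
def parallel_efficiency(t_base, t_n, n, n_base=1):
    """Efficiency of n GPUs relative to a baseline run on n_base GPUs."""
    return (n_base * t_base) / (n * t_n)

# Hypothetical runtimes in hours, for illustration only.
eff_4_vs_1 = parallel_efficiency(t_base=100.0, t_n=30.0, n=4)
eff_4_vs_2 = parallel_efficiency(t_base=100.0, t_n=50.0, n=4, n_base=2)
```

An efficiency of 1.0 means perfect scaling; values below 1.0 reflect communication and synchronization overhead between GPUs.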
Experiment Results
[Bar chart: log_2 of running time [GPU-seconds] and of number of vectors, for dimensions 112, 126 and 130. Values as extracted: 16.81, 23.23, 24.5 and 18.76, 21.4, 22.1]
Hardness Estimation Model from Sieve
conservative model of SVP hardness in ideal lattices, with approximation
Conclusion
We propose the first implementation on GPUs
• Both inner- and outer-layer parallelism
We solve a 130-dimensional SVP instance over an ideal lattice
• Specifically, a prime-cyclotomic ideal lattice
We propose the first hardness estimation model for (ideal-)SVP based on the sieve algorithm
Thank you!
Any questions?
Atsushi Takayasu and Noboru Kunihiro
A Tool Kit for Partial Key Exposure Attacks on RSA
The University of Tokyo, Japan
Background
RSA
Public key: 𝑁, 𝑒
Secret key: (𝑝, 𝑞, 𝑑)
Key generation: 𝑁 = 𝑝𝑞 and 𝑒𝑑 = 1 mod (𝑝 − 1)(𝑞 − 1)
The security relates to the hardness of factoring 𝑁.
Several attacks with partial information of the secret key have been studied using lattice-based Coppersmith’s method.
Partial Key Exposure Attacks
Partial information of 𝑝, 𝑞: 𝑝 = 100101010?????????
MSBs of 𝑑: 𝑑 = 111011010?????????
LSBs of 𝑑: 𝑑 = ?????????101101001
Multi-Prime RSA
Public key: 𝑁, 𝑒
Secret key: (𝑝1, ⋯ , 𝑝𝑟 , 𝑑)
Key generation: $N = \prod_{i=1}^{r} p_i$ and $ed = 1 \bmod \prod_{i=1}^{r} (p_i - 1)$
Standard RSA is the special case $r = 2$.
Analogous attacks have been studied.
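Key generation for (Multi-Prime) RSA can be tried out with toy numbers. This is a minimal sketch: the primes are far too small for real use, and `keygen` and the chosen values are illustrative. It needs Python 3.8+ for `math.prod` and for `pow` with a negative exponent.

```python
from math import prod

def keygen(primes, e):
    """Return (N, phi, d) with e*d = 1 mod prod(p_i - 1)."""
    n = prod(primes)
    phi = prod(p - 1 for p in primes)
    d = pow(e, -1, phi)                      # modular inverse; requires gcd(e, phi) = 1
    return n, phi, d

e = 65537
n, phi, d = keygen([1009, 1013, 1019], e)    # r = 3, toy primes
n2, phi2, d2 = keygen([1009, 1013], e)       # standard RSA, r = 2

message = 123456
ciphertext = pow(message, e, n)
recovered = pow(ciphertext, d, n)            # decrypts back to `message`
```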
Previous Works
MSBs/LSBs of 𝑑 for RSA [BDF98],[BM03],[EJMW@EC’05],[SGM10],[TK@SAC’14]
Small prime differences for RSA [Weg02]
MSBs/LSBs of 𝑑 for Multi-Prime RSA [Hin08]
MSBs of 𝑝, 𝑞 for RSA [SMS08]
MSBs/LSBs of 𝑑 and MSBs of 𝑝, 𝑞 for RSA [SM08]
𝑝, 𝑞 for RSA sharing the LSBs [SWS+08]
Small prime differences for Multi-Prime RSA [ZT13],[ZT14],[TK@ICISC’14]
Are all the papers valuable?
Our Contributions
General Exposure Scenarios
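Public key: $(N, e = N^{\alpha})$
Secret key: $(p_1, \dots, p_r, d = N^{\beta})$
Key generation: $N = \prod_{i=1}^{r} p_i$ and $ed = 1 \bmod \Phi(N)$
$(\alpha, \beta, \gamma, \delta)$-Partial Key Exposure Attacks
Attacks with $\tilde{d}$ and $\tilde{\Phi}(N)$ such that
• $\tilde{d}$ is the $(\beta - \gamma) \log N$-bit MSBs/LSBs of $d$
• $|\tilde{\Phi}(N) - \Phi(N)| \le N^{\delta}$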
Our proposed Attacks
We propose attacks for general exposure scenarios.
Our attacks contain all the currently known best attacks [EJMW@EC’05],[TK@SAC’14],[TK@ICISC’14] as special cases.
Special cases of our attacks improve attacks with
the MSBs/LSBs of 𝑑 for Multi-Prime RSA [Hin08],
the MSBs/LSBs of 𝑑 and MSBs of 𝑝, 𝑞 for RSA [SM08].
The result can be viewed as a tool kit for partial key exposure attacks on RSA.
MSBs of 𝑑 and MSBs of 𝑝, 𝑞 for 𝛾 = 5/16
[Plot: x-axis: sizes of secret exponents; y-axis: portions of partial information for 𝑑]
MSBs of 𝑑 and LSBs of 𝑝, 𝑞 for 𝛾 = 5/16
[Plot: x-axis: sizes of secret exponents; y-axis: portions of partial information for 𝑑]
MSBs of 𝑑 for Multi-Prime RSA for 𝑟 = 3
[Plot: x-axis: sizes of secret exponents; y-axis: portions of partial information for 𝑑]
LSBs of 𝑑 for Multi-Prime RSA for 𝑟 = 3
[Plot: x-axis: sizes of secret exponents; y-axis: portions of partial information for 𝑑]
Coppersmith’s Methods
Coppersmith’s method uses the LLL lattice reduction algorithm to solve modular/integer equations that have small roots.
Formulate RSA key generation as appropriate equations, e.g. $f(x, y) = x(N + y) + 1 - e d_1 = 0 \bmod eM$.
Construct a matrix whose rows are the coefficient vectors of polynomials that have the same roots as the original equations.
Since the short vectors of the lattice generated by the matrix carry information about the roots, recover these vectors by applying LLL reduction.
The matrix construction is crucial to obtain the best bounds.
Our General Formulations
Spirit of Our General Formulations
[Diagram: three settings: partial information of 𝑑; partial information of 𝑝, 𝑞; Multi-Prime RSA]
Partial information of 𝑝, 𝑞 and Multi-Prime RSA: essentially the same information
Example
Given $d_1$, the LSBs of $d$, for RSA:
$e(d_0 M + d_1) = 1 + k(N - p - q + 1)$
$f(x, y) = x(N + y) + 1 - e d_1 = 0 \bmod eM$
The root $(x, y) = (k, -p - q + 1)$ is bounded in absolute value by $(X, Y)$.
Example
Given $d_1$, the LSBs of $d$, and $p_1, q_1$, the MSBs of $p, q$, for RSA:
$e(d_0 M + d_1) = 1 + k(N - p - q + 1)$
$f(x, y) = x(N - p_1 - q_1 + y) + 1 - e d_1 = 0 \bmod eM$
The root $(x, y) = (k, -p + p_1 - q + q_1 + 1)$ is bounded in absolute value by $(X, Y)$.
Example
Given $d_1$, the LSBs of $d$, for Multi-Prime RSA:
$e(d_0 M + d_1) = 1 + k \prod_{i=1}^{r} (p_i - 1)$
$f(x, y) = x(N + y) + 1 - e d_1 = 0 \bmod eM$
The root $(x, y) = (k, \prod_{i=1}^{r} (p_i - 1) - N)$ is bounded in absolute value by $(X, Y)$.
Spirit of Our General Formulations
Partial information of 𝑝, 𝑞: smaller 𝑌
Multi-Prime RSA: larger 𝑌
Our Lattice Constructions
Applying the previous best strategy appropriately
• [EJMW@EC’05]: the basic attacks that solve integer equations by simple lattice constructions.
• [TK@SAC’14]: the attacks solve modular equations and are better for small $\tilde{d}$.
• [TK@ICISC’14]: the attacks solve modular equations and are better for large $\tilde{\Phi}(N)$.
Summary
We defined general exposure scenarios that include several partial key exposure attacks on (Multi-Prime) RSA as special cases.
For the general scenarios, we propose several attacks.
The attacks contain all the state-of-the-art attacks as special cases.
The attacks improve previous ones in two scenarios.
Our result enables newcomers to Coppersmith’s method to analyze the security of RSA.