An Expandable Montgomery Modular Multiplication Processor

Adnan Abdul-Aziz Gutub, Alaaeldin A. M. Amin

Computer Engineering Department

King Fahd University of Petroleum & Minerals

Dhahran, SAUDI ARABIA

Post on 21-Dec-2015



Presentation Outline

• Introduction (RSA cryptographic system)
• The Systolic Multiplier
• The Basic Cell
• Montgomery Product (MP) Algorithm
• Expandability of the Parallel Design
• The Expandable MP Hardware
• Conclusion

RSA Public Key Cryptosystem

• Developed in 1978 by Rivest, Shamir & Adleman
• Its security is based on the integer factoring problem
• The most popular method:
– simple to understand & implement
– same algorithm for encryption & decryption
– can also be used for digital signatures

Concept

[Figure: plaintext message → RSA encryption (using the encryption key) → ciphertext → RSA decryption (using the decryption key) → plaintext. The encryption key and the decryption key are different.]

RSA Algorithm

For encryption: $C = M^E \bmod N$

For decryption: $M = C^D \bmod N$

M is the message, (E,N) is the encryption key, C is the cipher text, (D,N) is the decryption key.

Encryption key (E,N): public

Decryption key (D,N): private
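As a small illustration of the two formulas above, here is a toy round trip in Python. The primes, exponent and message are textbook-sized values chosen for this sketch, not values from the slides, and far too small for real use.

```python
# Toy RSA round trip (illustrative values only, assumed for this sketch).
p, q = 61, 53                 # two small primes
N = p * q                     # modulus: 3233
phi = (p - 1) * (q - 1)       # 3120
E = 17                        # public exponent, coprime to phi
D = pow(E, -1, phi)           # private exponent: E*D = 1 (mod phi), Python 3.8+

M = 65                        # message, must satisfy 0 <= M < N
C = pow(M, E, N)              # encryption: C = M^E mod N
M_back = pow(C, D, N)         # decryption: M = C^D mod N

print(C, M_back)
```

The same modular exponentiation routine serves both directions; only the exponent changes.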

RSA Security

* Security depends on the key size: larger key size → more secure system.

RSA Implementations

• Modular Exponentiation: repeated squaring (software: slow speed)
• Modular Multiplication (hardware):
– multiply/divide
– add/subtract
– logarithmic speed
– Montgomery
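Repeated squaring reduces a modular exponentiation to a sequence of modular multiplications, which is why the multiplier is the part worth building in hardware. A minimal sketch of the right-to-left binary method follows; the function name `mod_exp` is an assumption of this example, not something named in the slides.

```python
def mod_exp(base, exponent, modulus):
    """Square-and-multiply: compute base**exponent mod modulus."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # current exponent bit is 1
            result = (result * base) % modulus
        base = (base * base) % modulus        # square for the next bit
        exponent >>= 1
    return result


print(mod_exp(65, 17, 3233))
```

For a k-bit exponent this performs about k squarings and at most k further multiplications, each one a modular multiplication.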

Modular Arithmetic

[Figure: tree diagram rooted at "Modular Arithmetic". Labels as they appear on the slide: (G. Alia 1991) & (C. Wu 1994); Lookup Tables; (E. F. Brickell 1983); Residue Number Systems; "lower speed, not very useful for our problem"; (S. E. Eldridge & C. D. Walter 1993); (C. D. Walter 1995); Hardware Designs (non-systolic); (C. D. Walter 1993); Systolic arrays; Montgomery’s Modular Multiplication Algorithm; (C. D. Walter 1994); A Logarithmic Speed Modular Multiplication Algorithm.]

Montgomery’s Method

• Introduced by P. Montgomery in 1985
• Modular multiplication without trial division
• Can be implemented in VLSI
• Requires some pre-computations
• Suitable for large number multiplication

Montgomery Modular Multiplication

To compute $Z = XY \bmod N$:

1. Pre-computation: $R$, $R^{-1}$, $N'$

2. Mapping X & Y to Montgomery’s domain: $x = XR \bmod N$, $y = YR \bmod N$

3. Montgomery Product (the OBJECTIVE): $z = MP(x,y) = xyR^{-1} \bmod N$

4. Map z from Montgomery to normal: $Z = MP(1,z)$
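The four steps can be checked numerically. The sketch below uses a reference definition of MP that multiplies by $R^{-1}$ directly, so it shows only the domain mapping and not the division-free hardware algorithm. The modulus and operands are small values assumed for illustration.

```python
# Assumed illustrative parameters: odd modulus N and R = 2^7 > N.
N = 97
R = 1 << 7
R_inv = pow(R, -1, N)              # step 1: R^-1 with R*R^-1 mod N = 1


def mp(a, b):
    """Reference Montgomery product: a*b*R^-1 mod N (not division-free)."""
    return (a * b * R_inv) % N


X, Y = 42, 57
x = (X * R) % N                    # step 2: map X into the Montgomery domain
y = (Y * R) % N                    #         map Y into the Montgomery domain
z = mp(x, y)                       # step 3: z = x*y*R^-1 mod N = X*Y*R mod N
Z = mp(1, z)                       # step 4: map back, Z = X*Y mod N

print(Z, (X * Y) % N)
```

Step 3 leaves the result in the Montgomery domain, so a chain of products (as in exponentiation) needs the mapping only once on the way in and once on the way out.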

Montgomery’s Algorithm

To compute: $XY \bmod N$

Pre-computations (performed by software):

• choose $R = 2^k$; k = number of bits of N; $R > N$ & $\gcd(R,N) = 1$
• compute $R^{-1}$ such that $R^{-1}R \bmod N = 1$ & $0 < R^{-1} < N$
• compute $N'$ such that $N' = -N^{-1} \bmod R$ & $0 < N' < R$
• compute $x = X \cdot R \bmod N$
• compute $y = Y \cdot R \bmod N$
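The software pre-computations can be sketched in a few lines of Python. The helper name `montgomery_precompute` is hypothetical, and taking k as the bit length of N is an assumption that satisfies $R > N$; the modulus must be odd so that $\gcd(R,N) = 1$.

```python
def montgomery_precompute(N):
    """Return (k, R, R_inv, N_prime) for an odd modulus N > 1."""
    if N <= 1 or N % 2 == 0:
        raise ValueError("N must be odd and greater than 1 so that gcd(R, N) = 1")
    k = N.bit_length()               # smallest k with 2^k > N
    R = 1 << k
    R_inv = pow(R, -1, N)            # R^-1 mod N, in the range 0 < R_inv < N
    N_prime = (-pow(N, -1, R)) % R   # N' = -N^-1 mod R, in the range 0 < N' < R
    return k, R, R_inv, N_prime


k, R, R_inv, N_prime = montgomery_precompute(97)
print(k, R, R_inv, N_prime)
```

The two inverses are tied together by the identity $R R^{-1} - N N' = 1$, which makes a convenient sanity check.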

Montgomery’s Algorithm: $MP(x,y) = xyR^{-1} \bmod N$

Montgomery’s Modular Multiplication: MP(x,y)

• $P = x \cdot y$
• $U = P + N \cdot (P \cdot N' \bmod R)$
• $S = U/R$
• $MP = S$ if $S < N$, else $MP = S - N$

With $R = 2^k$, a number A splits into a high part A2 and a low part A1 of k bits each:

* A mod R : A1
* A/R : A2
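Because $R = 2^k$, "mod R" keeps the low k bits and "/R" drops them, so the four steps need only multiplication, addition, masking and shifting. A minimal Python sketch under that reading follows; the function name and the sample parameters are assumptions of this example.

```python
def montgomery_product(x, y, N, k, N_prime):
    """Return x*y*R^-1 mod N for R = 2^k, with no trial division."""
    mask = (1 << k) - 1                  # A mod R  is  A & mask   (part A1)
    P = x * y
    U = P + N * ((P * N_prime) & mask)   # the low k bits of U are now zero
    S = U >> k                           # A / R    is  A >> k     (part A2)
    return S if S < N else S - N         # S < 2N, so one subtraction suffices


# Illustrative parameters (assumed): odd modulus N, k = bit length of N.
N = 97
k = N.bit_length()                       # R = 2^7 = 128 > N
R = 1 << k
N_prime = (-pow(N, -1, R)) % R           # N' = -N^-1 mod R
result = montgomery_product(42, 57, N, k, N_prime)
print(result)
```

The addition of $N \cdot (P N' \bmod R)$ does not change the value modulo N; it only clears the low k bits so that the shift is exact.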

Number Representation

A : $A_{l-1}\ A_{l-2}\ \dots\ A_2\ A_1\ A_0$ (each word is b bits)

A : k bits = l words = l·b bits

$A = A_0 + A_1 2^{b} + A_2 2^{2b} + \dots + A_{l-2} 2^{(l-2)b} + A_{l-1} 2^{(l-1)b}$
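The word decomposition can be sketched as a pair of helper functions. The names `to_words` and `from_words` are hypothetical, and words are listed least significant first ($A_0$ at index 0).

```python
def to_words(A, l, b):
    """Split A into l words of b bits each, least significant word first."""
    mask = (1 << b) - 1
    return [(A >> (i * b)) & mask for i in range(l)]


def from_words(words, b):
    """Recombine words: A = A_0 + A_1*2^b + A_2*2^(2b) + ..."""
    return sum(w << (i * b) for i, w in enumerate(words))


words = to_words(0xABCD, l=4, b=4)
print([hex(w) for w in words], hex(from_words(words, b=4)))
```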

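The Systolic Multiplier

[Figure: a clocked systolic multiplier block computing $p = x \cdot y + q$, with the digit streams listed below.]

• x input: $0,\dots,0,\ x_{l-1},\ x_{l-2},\dots,\ x_1,\ x_0$
• y input: $0,\dots,0,\ y_{l-1},\ y_{l-2},\dots,\ y_1,\ y_0$
• q input: $0,\ q_{2l-1},\ q_{2l-2},\dots,\ q_1,\ q_0$
• z (control input): $0,\dots,0,1$
• p output: $p_0,\ p_1,\dots,\ p_{2l-1},\ p_{2l}$ ($p_0$ is the first product digit)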
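The input/output behaviour of the block, $p = x \cdot y + q$ delivered one digit at a time starting with $p_0$, can be modelled functionally. The sketch below is not cycle-accurate and does not model the cells or the control input z; it only reproduces the digit stream, and its function name is an assumption.

```python
def mac_digit_stream(x, y, q, b):
    """Digits p_0..p_2l of p = x*y + q, least significant first.

    x and y are lists of l base-2^b digits, q is a list of 2l digits,
    all least significant digit first.
    """
    l = len(x)
    mask = (1 << b) - 1
    carry = 0
    digits = []
    for t in range(2 * l + 1):               # one output digit per step
        column = carry + (q[t] if t < len(q) else 0)
        for i in range(l):                   # partial products of weight 2^(b*t)
            j = t - i
            if 0 <= j < l:
                column += x[i] * y[j]
        digits.append(column & mask)
        carry = column >> b
    return digits


# 0x23 * 0x15 + 0x0001 with 4-bit digits (illustrative operands).
p_digits = mac_digit_stream([3, 2], [5, 1], [1, 0, 0, 0], b=4)
print(p_digits)
```

With l-digit x and y and a 2l-digit q, the result always fits in 2l+1 digits, which matches the output stream $p_0,\dots,p_{2l}$ on the slide.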

Building the Systolic Multiplier

[Figure: cells 1, 2, ..., l/2+1 connected in a chain and driven by a common clock. Each cell has inputs $z_{in}$, $x_{in}$, $y_{in}$, $q_{in}$ and an output $p_{out}$. The chain is fed with x = $0,\dots,0,x_{l-1},\dots,x_1,x_0$; y = $0,\dots,0,y_{l-1},\dots,y_1,y_0$; q = $0,q_{2l-1},\dots,q_1,q_0$; z = $0,\dots,0,1$, and produces p = $p_0,p_1,\dots,p_{2l-1},p_{2l}$. A constant 0 is also shown as an input to the chain.]

• (l/2 + 1) cells required for l-digit multiplication

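Expandable Systolic Multiplier

[Figure: the multiplier for l digits (cell 1 ... cell l/2+1, common clock) has inputs $z_{in}$, $x_{in}$, $y_{in}$, $q_{in}$, $p_{in}$ and outputs $z_{out}$, $x_{out}$, $y_{out}$, $q_{out}$, $p_{out}$. A multiplier for 2l digits is formed from two such l-digit multipliers connected through these ports; the $p_{in}$ of the last block is tied to 0.]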

Systolic Montgomery Reduction (J. Sauerbrey 1992)

$N'_0 = -N^{-1} \bmod 2^b$ ;

$p = x \cdot y$ ;

for i = 0 to l-1

$v_i = p_i \cdot N'_0 \bmod 2^b$ ;

$p = p + v_i N 2^{bi}$ ;

end for ;

return $p/R$ ;

Note that $x, y < N < R$ where $R = 2^{l \cdot b}$ & $\gcd(R,N) = 1$
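The word-serial loop above translates almost line for line into Python. This is a sketch of the arithmetic only, not of the systolic hardware; the function name and sample parameters are assumptions. As written on the slide, the loop returns $p/R$ without a final comparison, so the value can lie in $[0, 2N)$ and may still need the single subtraction of N used in MP.

```python
def word_serial_montgomery_reduction(x, y, N, l, b):
    """Return p/R with p/R = x*y*R^-1 (mod N), R = 2^(l*b); result is below 2N."""
    base = 1 << b
    mask = base - 1
    N0_prime = (-pow(N, -1, base)) % base   # N'_0 = -N^-1 mod 2^b (N must be odd)
    p = x * y
    for i in range(l):
        p_i = (p >> (i * b)) & mask         # digit i of the current p
        v_i = (p_i * N0_prime) & mask
        p += (v_i * N) << (i * b)           # clears digit i of p
    return p >> (l * b)                     # exact: the low l digits are zero


# Illustrative parameters (assumed): 4-bit words, l = 2, so R = 256 > N.
N, l, b = 97, 2, 4
s = word_serial_montgomery_reduction(42, 57, N, l, b)
print(s, s % N)
```

Only the single-word constant $N'_0$ is needed here, instead of the full k-bit $N'$ of the original algorithm.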

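[Figure: the clocked systolic multiplier ($p = x \cdot y + q$, control input z = $0,\dots,0,1$) applied to the reduction. Labelled streams: $0,\dots,0,N_{l-1},\dots,N_0$; $0,\dots,0,N'_0,\dots,N'_0$ (l times); $0,p_{2l-1},p_{2l-2},\dots,p_1,p_0$; $0,\dots,0,t_0,t_1,\dots,t_{l-1}$ (l times).]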

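VHDL Implementation of the Systolic Montgomery Reduction for l = 4

[Figure: array of systolic multiplier cells for l = 4. Legend: one cell type takes x, y, q and computes $x \cdot y + q$; the other takes x, y and computes $x \cdot y \bmod 2^b$, where $2^b$ is the base of numbers x & y. Blocks marked T and 2T are delay elements; 2T is a delay of 2 clock cycles. Inputs: N, $N'_0$ (preceded by zeros) and p(0). Output: p(4).]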

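Clarification for l = 4

• $N'_0 = -N^{-1} \bmod 2^b$ ;
• $p(0) = x \cdot y$ ;
• for i = 0 to l-1
• $v_i = p_i(i) \cdot N'_0 \bmod 2^b$ ;
• $p(i+1) = p(i) + v_i N 2^{bi}$ ;
• end for ;
• return $p(l)/R$ ;

[Figure: the l = 4 array annotated with its intermediate values. Inputs: N, $N'_0$ and p(0). Digit multipliers: $v_0$, $v_1$, $v_2$, $v_3$. Partial results: p(1), p(2), p(3). Output: p(4). Blocks marked T and 2T are delay elements.]

p(0) & $N'_0$ are precomputed.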
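To make the intermediate values p(0) ... p(4) concrete, the following sketch records every p(i) and $v_i$ for l = 4. The operands and the 4-bit word size are hypothetical values chosen so that $x, y < N < 2^{16}$. The point to observe is that p(i) has its i lowest digits equal to zero, which is why the final division by R is exact.

```python
def reduction_trace(x, y, N, l=4, b=4):
    """Return ([p(0), ..., p(l)], [v_0, ..., v_{l-1}]) for the reduction loop."""
    base = 1 << b
    N0_prime = (-pow(N, -1, base)) % base     # N'_0 = -N^-1 mod 2^b
    p = [x * y]                               # p(0) is precomputed
    v = []
    for i in range(l):
        p_i = (p[i] >> (i * b)) % base        # digit i of p(i)
        v_i = (p_i * N0_prime) % base
        v.append(v_i)
        p.append(p[i] + ((v_i * N) << (i * b)))
    return p, v


# Hypothetical operands for illustration: odd N, with x, y < N < 2^16.
x, y, N = 0x1234, 0x0FED, 0xF123
p, v = reduction_trace(x, y, N)
for i, value in enumerate(p):
    print(f"p({i}) = {value:#x}")
```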

Expandability of the Parallel Implementation

• basic design for l-digits
• expanded design for 2l-digits
• expanded design for 3l-digits

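Projection

[Figure: the systolic Montgomery reduction array for l = 4. Legend: one cell type takes x, y, q and computes $x \cdot y + q$; the other takes x, y and computes $x \cdot y \bmod 2^b$, where $2^b$ is the base of numbers x & y. Blocks marked T and 2T are delay elements; 2T is a delay of 2 clock cycles. Inputs: N, $N'_0$ (preceded by zeros) and p(0). Output: p(4).]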

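The Serial MP Design

[Figure: a single systolic multiplier ($p = xy + q$, ports z, x, y, q, p) and a second multiplier block, reused in a loop for i = 0 to l-1. Labelled signals: z(i), N(i), v(i) and p(i) entering the stage; z(i+1), N(i+1) and p(i+1) leaving it; a 2T delay; a Mux selecting between $N'_0$ and 0; register widths marked 2l and 2l+1.]

LOOP: i = 0 to l-1; p(0) is precomputed.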

For Expandability

• Allow input data to have more digits
• Allow systolic multiplier to be expandable
• Allow registers to be expandable
• Multiplexing

The Expandable MP System

[Figure: a basic chip for l digits takes the input data and produces the results. Adding a chip for an additional l digits gives the design for 2l digits; adding further chips for additional l digits gives the designs for 3l digits and 4l digits.]

VHDL Modeling

• All three designs were modeled in VHDL
• Structural level => similar to real hardware
• Designs fully parametrized in terms of:
– ‘l’: number of words
– ‘b’: number of bits in each word
– ‘t’: time delay for each gate

Conclusion

An expandable Montgomery modular multiplication processor was designed, modeled in VHDL, and analyzed.

Systolic Montgomery Reduction: signal flow graph for l = 4

• $N'_0 = -N^{-1} \bmod 2^b$ ;
• $p(0) = x \cdot y$ ;
• for i = 0 to l-1
• $v_i = p_i(i) \cdot N'_0 \bmod 2^b$ ;
• $p(i+1) = p(i) + v_i N 2^{bi}$ ;
• end for ;
• return $p(l)/R$ ;

[Figure: signal flow graph over time steps 0 to 6. Input streams: $\dots, p(0)_1, p(0)_0$; $\dots, 0, 0, 0, 0, N'_0$; $\dots, 0, 0, N_3, N_2, N_1, N_0$. Legend: one cell type takes x, y, q and computes $x \cdot y + q$; the other takes x, y and computes $x \cdot y \bmod 2^b$, where $2^b$ is the base of numbers x & y.]

Montgomery’s Algorithm: $MP(x,y) = xyR^{-1} \bmod N$

Loop: i = 0
• $v_0 = p_0(0) \cdot N'_0 \bmod 2^b$
• $p(1) = p(0) + v_0 N 2^{0}$

Loop: i = 1
• $v_1 = p_1(1) \cdot N'_0 \bmod 2^b$
• $p(2) = p(1) + v_1 N 2^{b}$

Loop: i = 2
• $v_2 = p_2(2) \cdot N'_0 \bmod 2^b$
• $p(3) = p(2) + v_2 N 2^{2b}$

RSA Designs

[Figure: tree diagram rooted at "RSA Designs". Labels as they appear on the slide:]

• The RSA cryptographic processor (H. Sedlak 1988); VLSI implementation of public key encryption algorithms (G. Orton 1987); Fast RSA-Hardware (F. Hoorneart 1988): not enough information, or not practical for expandability
• VICTOR: An Efficient RSA Hardware (H. Orup 1990); A High Speed RSA Processor (F. Al-Tuwaijry 1991); A Modular Systolic Exponentiation Unit (J. Sauerbrey 1992): well-defined proposed implementations
• suitable for expandability; logical start