1 Lucas-Lehmer Primality Tester Team: W-4 Nathan Stohs W4-1 Brian Johnson W4-2 Joe Hurley W4-3 Marques Johnson W4-4 Design Manager: Prateek Goenka

Post on 20-Dec-2015

Lucas-Lehmer Primality Tester

Team: W-4

Nathan Stohs W4-1

Brian Johnson W4-2

Joe Hurley W4-3

Marques Johnson W4-4

Design Manager: Prateek Goenka

Agenda

• Background (Marques)
• Project Description (Marques)
• Algorithmic Description (Joe)
• Data Flow/Block Diagram (Joe)
• Design Process (Nathan)
• Simulations (Nathan)
• Floorplan/Layout (Brian)
• Conclusions (Brian)

History of $2^P - 1$

• In the 16th century it was widely believed that $2^P - 1$ was prime for every prime $P$

• 1536: Hudalricus Regius showed that $2^{11} - 1 = 2047 = 23 \times 89$ is not prime

• The French monk Marin Mersenne published Cogitata Physico-Mathematica (1644), in which he stated that $2^P - 1$ was prime for P = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127 and 257

Lucas-Lehmer

• François Edouard Anatole Lucas

• In 1876 he proved that $2^{127} - 1$ is prime using his own methods

• Derrick Lehmer refined Lucas's method in 1930

Make History

• December 2005
• 43rd known Mersenne prime found!
• Dr. Curtis Cooper and Dr. Steven Boone
• Professors at Central Missouri State University
• $2^{30{,}402{,}457} - 1$

Prime Number Competitions

• Electronic Frontier Foundation

• $50,000 to the first individual or group who discovers a prime number with at least 1,000,000 decimal digits (awarded Apr. 6, 2000)

• $100,000 to the first individual or group who discovers a prime number with at least 10,000,000 decimal digits

• $150,000 to the first individual or group who discovers a prime number with at least 100,000,000 decimal digits

• $250,000 to the first individual or group who discovers a prime number with at least 1,000,000,000 decimal digits

Mersenne Prime Algorithm

• Only used for numbers of the form $2^P - 1$

• For prime $P > 2$

• $2^P - 1$ is prime if and only if $S_{P-2}$ is zero in this sequence:

• $S_0 = 4$

• $S_N = (S_{N-1}^2 - 2) \bmod (2^P - 1)$
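
The recurrence above can be sketched directly in C. This is a minimal software model, not the team's code: the function name `lucas_lehmer` and the limit of p <= 31 (so that the square fits in 64 bits) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if 2^p - 1 passes the Lucas-Lehmer test,
   0 otherwise. Assumes p is an odd prime with 3 <= p <= 31, so that
   s * s always fits in a 64-bit unsigned integer. */
int lucas_lehmer(unsigned p)
{
    assert(p >= 3 && p <= 31);

    uint64_t m = ((uint64_t)1 << p) - 1;   /* modulus 2^p - 1 */
    uint64_t s = 4;                        /* S_0 = 4 */

    /* Apply S_n = (S_{n-1}^2 - 2) mod m exactly p - 2 times. */
    for (unsigned n = 1; n <= p - 2; n++) {
        /* Add m before subtracting 2 so the value never goes negative. */
        s = ((s * s) % m + m - 2) % m;
    }
    return s == 0;                         /* prime iff S_{p-2} == 0 */
}
```

Calling `lucas_lehmer(7)` walks through the same values as the worked example for 127, while `lucas_lehmer(11)` ends on a non-zero residue because 2047 is composite.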

Example to Show $2^7 - 1$ is Prime

• $2^7 - 1 = 127$

• S0 = 4

• S1 = (4 * 4 - 2) mod 127 = 14

• S2 = (14 * 14 - 2) mod 127 = 67

• S3 = (67 * 67 - 2) mod 127 = 42

• S4 = (42 * 42 - 2) mod 127 = 111

• S5 = (111 * 111 - 2) mod 127 = 0

Algorithmic description

Computations needed:
- Squaring (not a problem…)
- Add/Subtract (not a problem…)
- Modulo ($2^n - 1$) multiplication (?)

We knew the necessary computations, but how to translate that to gates?

Mechanisms behind the math

• If done with brute force, modulo $2^n - 1$ could have been ugly.
  – Would need to square and find the remainder via division.

• Luckily, for that specific computation, math is on our side: the $2^n - 1$ constraint saves us from division, as will be seen.

• A quick search on www.ieee.org produced inspiration.

• Reto Zimmermann. Efficient VLSI Implementation of Modulo ($2^n \pm 1$) Addition and Multiplication. Proc. 14th IEEE Symposium on Computer Arithmetic, 1999, pp. 158-167.
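
Why the $2^n - 1$ constraint removes the division can be sketched in a few lines of C. Because $2^n \equiv 1 \pmod{2^n - 1}$, the bits above position n can simply be added back onto the low n bits. The function name `fold_mod` is hypothetical, and this is a software illustration of the idea rather than the circuit described in the paper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reduce x modulo 2^n - 1 using only shifts, masks
   and additions (1 <= n <= 32). */
uint64_t fold_mod(uint64_t x, unsigned n)
{
    assert(n >= 1 && n <= 32);

    uint64_t m = ((uint64_t)1 << n) - 1;   /* modulus, also the n-bit mask */

    /* x = hi * 2^n + lo is congruent to hi + lo, so fold until x <= m. */
    while (x > m)
        x = (x & m) + (x >> n);

    /* The all-ones pattern is a second encoding of zero. */
    return (x == m) ? 0 : x;
}
```

For example, `fold_mod(111 * 111 - 2, 7)` folds 12319 into 96 + 31 = 127, which is the all-ones pattern and therefore zero.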

Useful Math: Multiplication

Just like any other multiplication, a modulo multiplication can be computed by (modulo) summing the partial products.

So modulo multiplication is multiplication using a modulo adder.

From the Zimmermann paper
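
A minimal C sketch of that idea follows. Modulo $2^n - 1$, the partial product $y \cdot 2^i$ is just an n-bit left rotation of $y$, so the product is a modulo sum of rotated copies of $y$. The names `rotl_n`, `add_mod_m` and `mod_mul` are hypothetical, and the adder here uses a plain conditional subtraction, which is an assumption for readability and not the adder structure from the paper.

```c
#include <assert.h>
#include <stdint.h>

/* y * 2^i mod (2^n - 1) for y < 2^n: rotate y left by i within n bits. */
uint32_t rotl_n(uint32_t y, unsigned i, unsigned n)
{
    uint32_t m = ((uint32_t)1 << n) - 1;
    i %= n;
    if (i == 0)
        return y;
    return ((y << i) & m) | (y >> (n - i));
}

/* Modulo adder for operands in [0, m): add, then subtract m at most once. */
uint32_t add_mod_m(uint32_t a, uint32_t b, uint32_t m)
{
    uint32_t s = a + b;
    return (s >= m) ? s - m : s;
}

/* x * y mod (2^n - 1) for 2 <= n <= 16 and x, y below 2^n - 1. */
uint32_t mod_mul(uint32_t x, uint32_t y, unsigned n)
{
    assert(n >= 2 && n <= 16);
    uint32_t m = ((uint32_t)1 << n) - 1;
    assert(x < m && y < m);

    uint32_t acc = 0;
    for (unsigned i = 0; i < n; i++) {
        /* Bit i of x selects the partial product y * 2^i (mod m). */
        if ((x >> i) & 1)
            acc = add_mod_m(acc, rotl_n(y, i, n), m);
    }
    return acc;
}
```

With n = 7, `mod_mul(14, 14, 7)` sums the rotations 28, 56 and 112 and reduces to 69, that is 196 mod 127.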

Block Diagram

[Figure: block diagram. Labeled blocks: FSM, Counter, Count, Compare, two Registers, Next Partial Product, Mod add, Mod Calc, Subtract 2. Labeled signals: P, start, done, Out. Most buses are marked 16 bits wide.]

Figure annotations:

• Loop x16
• Loop x(P-2)
• S1 = (4 * 4) mod 127 - 2 = 14
• S2 = (14 * 14) mod 127 - 2 = 67
• ...
• S5 = (111 * 111 - 2) mod 127 = 0

Design Process

The process so far:

- Found mathematical means (core algorithm)

- Found computational means (modulo multiplier, adder)

From the above, a high-level C program was written in a manner that would translate easily to Verilog and gates, or at least to more standard operations:

    /* Bit-serial (value * value + offset) mod (2^p - 1), for value and
       offset already reduced below 2^p - 1. offset seeds the accumulator. */
    int mod_square_minus(int value, int p, int offset)
    {
        int acc, i;
        int mod = (1 << p) - 1;

        for (acc = offset, i = 0; i < (int)(sizeof(int) * 8 - 1); i++) {
            int a = (value >> i) & 1;    /* bit i of the multiplier */
            int temp;

            if (a) {
                /* bits of (value << i) at or above position p wrap around */
                if (i - p > 0)
                    temp = value << (i - p);
                else
                    temp = value >> (p - i);
                /* add the wrapped bits and the low p bits of (value << i) */
                acc = acc + temp + ((value << i) & ((1 << p) - 1));
            }
            /* one conditional subtraction keeps acc below mod */
            if (acc >= mod)
                acc = acc - mod;
        }
        return acc;
    }

This translated easily into behavioral Verilog and was readily turned into a gate-level implementation. Essentially, it was written in a low-level manner.

Design Process

The rest of the design can simply be thought of as a wrapper for the modulo multiplier.

The following slides contain Verilog code that was derived directly from the C code shown earlier.

    module mod_mult(out, itrCount, x, y, mod, p, reset, en, clk);
        input  [15:0] x, y, mod, p;
        output [15:0] out;
        input         reset, en, clk;
        output [3:0]  itrCount;

        wire [15:0] pp, ma0, temp;

        counter         mycount(itrCount, reset, en, clk);
        partial_product ppg(pp, x, y, itrCount, mod, p);
        mod_add         modAdder(out, pp, temp, mod);
        dff_16_lp       partial(clk, out, temp, reset, en);
    endmodule

Top level of multiplier

    module partial_product(out, x, y, i, mod, p);
        output [15:0] out;
        input  [15:0] x, y, mod, p;
        input  [3:0]  i;

        wire [15:0] diff1, diff2, added, result, corrected, final;
        wire [15:0] high, low, shifted, toadd;
        wire        cout1, cout2, ithbit, toobig;

        // wrapped-around (high) part of y shifted by i
        sub_16      difference1(diff1, cout1, {12'b0, i}, p);
        sub_16      difference2(diff2, cout2, p, {12'b0, i});
        shift_left  shiftL(high, y, diff1[3:0]);
        shift_right shiftR(low, y, diff2[3:0]);
        mux16       choose(high, low, shifted, cout1);

        // low p bits of y shifted left by i
        shift_left  shiftL2(toadd, y, i);
        and16       bigand(added, toadd, mod);

        fulladder_16 addhighlow(.out(result), .xin(added), .yin(shifted), .cin({1'b0}), .cout(nowhere));

        sub_16 correct(.out(corrected), .cout(toobig), .xin(mod), .yin(result));
        mux16  correctionMux(.out(final), .high(corrected), .low(result), .sel(toobig));

        // output is zero unless bit i of x is set
        shift_right ibit({15'b0, ithbit}, x, i);
        select16    checkfor0(.out(out), .x(result), .sel(ithbit));
    endmodule

Partial Product Unit w/ modulo reduction

    module mod_add(out, x, y, mod);
        input  [15:0] x, y, mod;
        output [15:0] out;

        wire        cout, isDouble, cin;
        wire [15:0] plus, lowbits, done, mod_bar, check;

        fulladder_16 add(.out(plus), .xin(x), .yin(y), .cin(cin), .cout());

        invert_16 inverter(mod_bar, mod);

        and16 hihnbits(check, plus, mod_bar);
        and16 lownbits(done, plus, mod);

        or8 (cin, check[0], check[1], check[2], check[3],
                  check[4], check[5], check[6], check[7],
                  check[8], check[9], check[10], check[11],
                  check[12], check[13], check[14], check[15]);

        compare_16 checkfordouble(isDouble, done, 16'b1111_1111_1111_1111);
        mux16      fixdouble(.out(out), .high(16'b0), .low(done), .sel(isDouble));
    endmodule

Modulo Adder
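
The behaviour such an adder is aiming for can be modelled in C as an end-around-carry addition: add, keep the low p bits, feed any overflow back in as a carry, and map the all-ones pattern to zero. This is a behavioural sketch under stated assumptions, not a translation of the module above: the name `mod_add_eac` is hypothetical, and the model requires x + y <= 2(2^p - 1) so that a single carry is enough.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical behavioural model: (x + y) mod (2^p - 1) with one
   end-around carry. Requires x + y <= 2 * (2^p - 1). */
uint32_t mod_add_eac(uint16_t x, uint16_t y, unsigned p)
{
    assert(p >= 1 && p <= 16);

    uint32_t mod = ((uint32_t)1 << p) - 1;  /* also the mask for the low p bits */
    uint32_t sum = (uint32_t)x + y;
    assert(sum <= 2 * mod);                 /* at most one carry out of bit p-1 */

    uint32_t carry  = sum >> p;             /* 0 or 1 */
    uint32_t folded = (sum & mod) + carry;  /* end-around carry */

    /* All ones is the second representation of zero. */
    return (folded == mod) ? 0 : folded;
}
```

For instance, `mod_add_eac(126, 126, 7)` produces 252, whose low seven bits are 124 with a carry of 1, giving 125.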

Final Design Process Notes

• Lessons learned: Never tweak the schematics without retesting the Verilog first. Timing issues can be subtle, and Verilog is better than schematics for catching them and for quickly fixing and retesting.

• Considering total time spent during this phase, roughly half was on the “core” and the FSM, the rest on the “wrapper”.

Road to verification: C

2 examples of the high-level C implementation:

    Tyrion:~/Desktop/15525 nstohs$ ./prime4 7
    round 1: (4 * 4 - 2) mod 127 = 14
    round 2: (14 * 14 - 2) mod 127 = 67
    round 3: (67 * 67 - 2) mod 127 = 42
    round 4: (42 * 42 - 2) mod 127 = 111
    round 5: (111 * 111 - 2) mod 127 = 0
    2^7-1 is prime

    Tyrion:~/Desktop/15525 nstohs$ ./prime4 11
    round 1: (4 * 4 - 2) mod 2047 = 14
    round 2: (14 * 14 - 2) mod 2047 = 194
    round 3: (194 * 194 - 2) mod 2047 = 788
    round 4: (788 * 788 - 2) mod 2047 = 701
    round 5: (701 * 701 - 2) mod 2047 = 119
    round 6: (119 * 119 - 2) mod 2047 = 1877
    round 7: (1877 * 1877 - 2) mod 2047 = 240
    round 8: (240 * 240 - 2) mod 2047 = 282
    round 9: (282 * 282 - 2) mod 2047 = 1736
    2^11-1 is not prime

Road to verification: Verilog

Samples of Verilog verification output:

Partial Product Unit, p = 7

    380 ppOut=  56, x= 14, y= 14, i= 2, mod= 127, p= 7
    400 ppOut= 112, x= 14, y= 14, i= 3, mod= 127, p= 7
    420 ppOut=   0, x= 14, y= 14, i= 4, mod= 127, p= 7
    440 ppOut=   0, x= 14, y= 14, i= 5, mod= 127, p= 7

Top Level, p = 7

    itrOut= x
    itrOut= 4
    itrOut= 14
    itrOut= 67
    itrOut= 42
    itrOut= 111
    itrOut= 0

Top Level, p = 11

    itrOut= x
    itrOut= 4
    itrOut= 14
    itrOut= 194
    itrOut= 788
    itrOut= 701
    itrOut= 119
    itrOut= 1877
    …

Tests were either specific tests on important units such as Partial_Product, or top-level tests. Note that these are the same results generated from the C code.

Road to verification: Schematic I

Schematic Test of our modular adder.

(128 + 68) mod 127 = 69

Road to verification: Schematic II

Plot of the top level output after a single iteration, p=7

Output after a single iteration is 14, the expected value.

Road to verification: Schematic III

[Figure: schematic simulation waveform. Top-level output sequence: 4, 14, 67, 42, 111]

Road to verification: Intermission

Disk space required for a full-length schematic test of p=7: 6 GB
Time required for a full-length schematic test of p=7: 5 hours

Disk space required for a full-length extractedRC test of p=7: 20 GB
Time required for a full-length extractedRC test of p=7: 8 hours

Simulations become lengthy because tests need to be “deep” to be useful.

Layout: ExtractedRC – Full Run

[Figure: extractedRC simulation waveform, full run. Output sequence: 4, 14, 67, 42, 111]

Timing

To determine the bounds of our clock, PathMill was used once major portions of the schematic were complete.

The critical path through our design is one loop through the modular multiplier, which runs through the modular adder and the partial products module.

The PathMill delay was 9 ns through the modular adder and 5.2 ns through the partial products module.

This already puts our total delay at 14.2 ns, which limits the schematic clock to about 70 MHz.

For extractedRC, due in part to simulation issues, a conservative 50 MHz was chosen as the final clock.
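
The arithmetic behind these figures can be checked with a short C sketch. The function names are hypothetical, and the run-time estimate assumes 16 clock cycles per modular squaring (one per partial product) with no FSM overhead, which is an assumption and not a measured figure from the project.

```c
#include <assert.h>

/* Highest clock frequency (MHz) allowed by a critical-path delay in ns. */
double max_clock_mhz(double delay_ns)
{
    assert(delay_ns > 0.0);
    return 1000.0 / delay_ns;
}

/* Rough time in microseconds for one full test of exponent p.
   ASSUMPTION: p - 2 squarings of 16 cycles each, no control overhead. */
double run_time_us(unsigned p, double clock_mhz)
{
    assert(p >= 3 && clock_mhz > 0.0);
    double cycles = (double)(p - 2) * 16.0;
    return cycles / clock_mhz;
}
```

Here `max_clock_mhz(9.0 + 5.2)` comes out slightly above 70 MHz, consistent with the schematic figure quoted above, and under the stated assumption `run_time_us(7, 50.0)` gives 80 cycles at 50 MHz.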

Issues

• extractedRC of partial_product module

• Registers switch
  – Custom design to DFFs with muxes

• Switching from parallel calculations to series
  – Transistor count vs. clock cycles

• Syncing up design between people
  – Transferring files
  – Different design styles

• LONG simulation times

• Floorplanning
  – Too much emphasis on aspect ratios and not enough on wiring
  – Couldn’t decide on one set floorplan

Floorplan v1.0

Floorplan v2.0

Final Floorplan

Pin Specifications

Pin        Type     # of Pins
Vdd!       In/Out   1
Gnd!       In/Out   1
p<0:15>    In       16
clk        In       1
start      In       1
Done       Out      1
out        Out      1
Total      -        22

Initial Module Specifications

Module            Transistor Count   Area (µm²)   Transistor Density (per µm²)
FSM               300                900          .33
mod_p             2,440              7,000        .35
mod_add           1,282              9,000        .14
partial_product   8,676              65,000       .13
count             1,656              6,000        .27
sub_16            704                3,500        .20
Registers         1,848              6,000        .30
compare           36                 300          .12
Total             16,942             97,700       .17

Final Module Specifications

Module            Transistor Count   Area (µm²)   Transistor Density (per µm²)   Aspect Ratio
FSM               152                1,200        .13                            2.45
mod_p             1,280              8,603        .15                            0.79
mod_add           1,168              5,603        .21                            2.40
partial_product   7,520              54,680       .14                            1.16
count             1,424              8,701        .16                            6.88
sub_16            576                2,934        .20                            4.49
Registers         896                6,028        .15                            4.76
compare           56                 201          .28                            4.41
Total             13,702             86,621       .16                            1.01

Chip Specifications

• Transistor Count: 13,702

• Size: 296.51µm x 292.13µm

• Area: 86,621µm²

• Aspect Ratio: 1.01:1

• Density: 0.16 transistors/µm²
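
As a quick consistency check, the density and aspect ratio can be recomputed from the transistor count and the die dimensions. The helper names below are hypothetical, and the sketch only redoes the arithmetic on the figures listed above.

```c
#include <assert.h>

/* Transistors per square micron. */
double density(double transistors, double area_um2)
{
    assert(area_um2 > 0.0);
    return transistors / area_um2;
}

/* Width divided by height. */
double aspect_ratio(double width_um, double height_um)
{
    assert(width_um > 0.0 && height_um > 0.0);
    return width_um / height_um;
}
```

With the listed values, `density(13702, 86621)` rounds to 0.16 and `aspect_ratio(296.51, 292.13)` rounds to 1.01 when the longer side is taken as the width.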

Final Floorplan

Final Floorplan

Partial Product

[Figure: partial product layout. Labeled regions: shift_right (x2), shift_left (x2), adder, 16-bit and, Select16, Sub_16, mux]

Poly Layer

Density: 7.14%

Active Layer

Density: 8.76%

Metal1 Layer

Density: 23.86%

Metal2 Layer

Density: 19.97%

Metal3 Layer

Density: 11.30%

Metal4 Layer

Density: 10.34%

Conclusions

• Plan for buffers: they will be hard to put in after the fact

• Your design will change dramatically from start to finish, so be flexible

• Communication is key

• Do layout in parallel