Course calendar and lecture notes (pages.cs.wisc.edu/~kzhao32/projects/cs552WISC-SP13.pdf)

6/19/2017 CS552 Course Wiki: Spring 2016 : Course calendar and lecture notes browse

http://pages.cs.wisc.edu/~karu/courses/cs552/spring2016/wiki/index.php/Main/ProjectDeadlinesAndGrading#toc5 1/10

On this page...
1. Rough Overview:
2. Form team:
3. Project plan: (2% of project grade)
4. Design review: (4% of project grade)
5. Demo #1 Unpipelined design (14% of project grade)
   5.1 Single-Cycle Memory Specification
6. Demo #2.0 Pipelined design with Perfect Memory (30% of project grade)
7. Demo #2.1 Pipelined design with Aligned Memory (0% of project grade)
   7.1 Aligned Single-Cycle Memory Specification
8. Demo #2.2 Pipelined design with Stalling Memory : 1 week after demo 2.0 (0% of project grade)
   8.1 Stalling Memory Specification
9. Cache Demo Working two-way set-associative cache (15% of project grade)
10. Demo #3 (final demo) Pipelined Multi-cycle Memory with Optimizations (30% of project grade)
11. Final Project Report: May 10th (5% of project grade)

Important Notes
All provided modules are included in a project tar file. You will want to download this when getting started.
Additional information on how to simulate your design is on the verification and simulation page on the sidebar.

Deadlines and grading

Date     Project
4 Feb    Form project team (Feb 8th)
23 Feb   Project plan
3 Mar    Design Review
17 Mar   Demo 1
12 Apr   Demo 2
14 Apr   Cache FSM turn-in
21 Apr   Cache Demo
9 May    Final demo
10 May   Final report

There are four major deadlines over the course of your term project design, which will be met in the form of project demos with the course TA and a final project report. During a demo, it is important that both team members possess a conceptual understanding of the entire design. Answers such as "I don't know, my partner did that" will not be acceptable. However, a response such as "I didn't implement that part of the design, but it works in the following way..." is perfectly fine.


Teams should be well prepared before showing up to a demo. Time is limited and your grade may be negatively impacted if the demo could not be completed. Be sure that the designs you hand in work without alteration in such a way that the TA could easily compile and simulate the design without special instructions.

1. Rough Overview:
You can think of this project as having roughly six stages of development with several demos along the way.

1. You will first build a single-cycle, non-pipelined processor with a highly idealized memory
2. Your processor can then be pipelined into distinct stages while still using a highly idealized memory
3. The memory will then be transitioned to a more realistic banked memory module that cannot respond to requests in a single cycle
4. A cache can then be implemented to improve the now degraded memory performance
5. Once the cache has been fully verified it can be incorporated into the full processor
6. Optimizations can then be added for additional processor performance

2. Form team:
The project is done in groups of two. These groups should be formed no later than February 19th. A Google doc will be made available to specify your team.

3. Project plan: (2% of project grade)

Each group needs to turn in a typed report (one to two pages, single-spaced) describing your project design and test plan. You are expected to develop a detailed schedule identifying key milestones and a breakdown of the tasks by project partner. Make sure that your schedule takes into account the remaining homework assignments and your other course obligations (e.g., midterms).

You must have thought about the design at a high level and about the partitioning of work between you and your partner. The plan you come up with will be your master plan for the semester, and you will be asked to update/revise the plan as we go along.

In addition to the design, you are expected to develop a detailed test plan, including high-level descriptions of component, module, and system tests. Include both project members' names, email addresses, and team name on the report.

Look through the course calendar for the design review, demo1, demo2, cache demo, and final demo dates and plan your work accordingly. These dates are non-negotiable and you must adhere to them. There will be a signup for a 15 minute meeting for the design review. Depending on how things shape up, we may do a signup and meetings for demo1, demo2, cache demo and final demo also.

Bring this report (printed) to class on the due date.

4. Design review: (4% of project grade)
Each group needs to create a complete hand-drawn (or drawn with the aid of a graphing program like OpenOffice Draw) schematic of an unpipelined WISC-SP13 implementation. Each module, bus, and signal should be uniquely labeled. The schematic should be hierarchical so that the top-level design contains only empty shells for each planned sub-module. In general, there will be a one-to-one mapping of modules in your schematic to the modules you will eventually write in Verilog.


While explicitly drawing pipeline stages in the schematic is not required, you should still design with a pipeline in mind. It is a good idea to place modules near their final location in the pipelined design.

During the review, individual team members should be able to describe the datapath of any legal WISC-SP13 instruction using the schematic as a reference. Teams will also be expected to defend the design decisions that they make. You need to have thought through the control path and decode logic. It is not necessary to have a complete table of signals, but if you have such a table with the control signal values for every instruction, that would be great.

Signup instructions are posted. You should sign up for a timeslot in the Google doc. Write each partner's last name against a timeslot. If none works, discuss a possible swap with your classmates. If you still cannot find a timeslot that works, email both the TAs and Karu.

Both partners are required to be present and both are expected to explain and answer questions about the whole design. Answering a question with: "I have no idea, my partner did that" is a failing answer. You must (at least) be able to answer: "My partner implemented that, but it works in the following way....".

5. Demo #1 Unpipelined design (14% of project grade)
All of the files you will need for the project are in a project tar file. You should download and untar this while getting started.

To start, you should do a single-cycle, non-pipelined implementation. Figure 4.24 on page 271 is a good place to start.

For this stage, you will use the single-cycle perfect memory. Since you will need to fetch instructions as well as read or write data in the same cycle, use two memories: one for instructions and one for data.

Your design should be running the full WISC-SP13 instruction set, except for the extra-credit instructions. It should use the single-cycle memory model. You should run vcheck and your files must all pass vcheck.

In the demo you will run a set of programs on your processor using the wsrun.pl script (check the verification and simulation page for more info) and show that your processor works on the test programs (full list in the Test Programs page). You should run the tests under the following three categories:

1. Simple tests
2. Complex tests
3. Random tests for demo1
   1. rand_simple
   2. rand_complex
   3. rand_ctrl
   4. rand_mem

If you have more than two failures in the simple tests, you will automatically lose 75% of the demo1 grade.

Use the list file to run each of the categories of test. When you run wsrun.pl with the -list option, it will generate a file called summary.log, which looks like this:


add_0.asm SUCCESS CPI:1.3 CYCLES:12 ICOUNT:9 IHITRATE: 0 DHITRATE: 0

add_1.asm SUCCESS CPI:1.7 CYCLES:7 ICOUNT:4 IHITRATE: 0 DHITRATE: 0

add_2.asm SUCCESS CPI:1.7 CYCLES:7 ICOUNT:4 IHITRATE: 0 DHITRATE: 0

SUCCESS means the test passed. Run all the categories and rename the summary.log files as shown below:

1. Simple tests: simple.summary.log
2. Complex tests: complex.summary.log
3. Random tests for demo1
   1. rand_simple: rand_simple.summary.log
   2. rand_complex: rand_complex.summary.log
   3. rand_ctrl: rand_ctrl.summary.log
   4. rand_mem: rand_mem.summary.log

The log files MUST have the exact name. These are the log files produced by running wsrun.pl -list with the all.list file for each of those sets of benchmarks. You will have to rename summary.log manually into these names. If your handed-in code does not follow this convention, it will not be accepted and you will receive a zero for this demo. If in doubt about what to submit, email the TA *before* the deadline and double-check.
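Before handing in, it can help to scan each log for failures and to confirm that all six file names are present. The following is a minimal Python sketch, not part of the provided tools. It assumes each summary.log line starts with the test name followed by a status word, as in the sample lines above; the helper names failed_tests and missing_logs are made up for illustration, and any status other than SUCCESS is treated as a failure.

```python
import re

# File names required for demo1, as listed in the text above.
REQUIRED_DEMO1_LOGS = [
    "simple.summary.log",
    "complex.summary.log",
    "rand_simple.summary.log",
    "rand_complex.summary.log",
    "rand_ctrl.summary.log",
    "rand_mem.summary.log",
]

# Assumed line shape: "<test>.asm <STATUS> CPI:... CYCLES:... ICOUNT:..."
LINE_RE = re.compile(r"^(\S+\.asm)\s+(\S+)")


def failed_tests(summary_text):
    """Return the names of tests whose status is anything other than SUCCESS."""
    failures = []
    for line in summary_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(2) != "SUCCESS":
            failures.append(match.group(1))
    return failures


def missing_logs(present_names):
    """Return the required log names that are absent from present_names."""
    present = set(present_names)
    return [name for name in REQUIRED_DEMO1_LOGS if name not in present]
```

For example, calling missing_logs(os.listdir("summary")) from your demo1 directory would list any log you still need to rename.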

You should do rigorous testing and verification and should try to have zero failures on the other categories. It is ok to have a very small number of failures, but for every failure you must know the reason. You will submit your design electronically, which will be graded automatically. The instructor will then schedule one-on-one appointments with teams that have exhibited a large number of failures.

In this demo, you must also synthesize your processor and submit the results of synthesis, including the area and timing reports. If there are any synthesis errors you will get ZERO for the entire demo1.

Everything due before class.

Electronic submission instructions

Submit a single demo1.tar containing the following directories [ tar cvf demo1.tar verilog summary synthesis ]. These subdirectories will already exist if you do your work in the demo1 directory from the original tar that was provided.

The subdirectories should contain the following files:

1. verilog/ containing all Verilog files. Please copy over ALL necessary files; your processor should be able to compile and run with files from this directory alone.
2. synthesis/ containing area_report.txt, cell_report.txt, timing_report.txt.
3. summary/ containing the 6 summary.log files.

If the summary.log files are missing, you will automatically get zero points.
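Since a malformed archive costs the whole demo, you may want to inspect demo1.tar before submitting. The sketch below is one possible check written in Python with the standard tarfile module; it is not the grading script. The function names are hypothetical, and the check that verilog/ holds at least one ".v" file is an assumption about what "all Verilog files" means.

```python
import tarfile

REQUIRED_DIRS = ("verilog", "summary", "synthesis")
REQUIRED_SYNTHESIS_REPORTS = ("area_report.txt", "cell_report.txt", "timing_report.txt")


def _normalize(name):
    """Drop a leading './' and a trailing '/' from an archive member path."""
    if name.startswith("./"):
        name = name[2:]
    return name.rstrip("/")


def check_demo1_members(member_names):
    """Return a list of problems found in a list of archive member paths."""
    names = [_normalize(n) for n in member_names]
    problems = []
    for directory in REQUIRED_DIRS:
        if not any(n == directory or n.startswith(directory + "/") for n in names):
            problems.append("missing directory: " + directory)
    for report in REQUIRED_SYNTHESIS_REPORTS:
        if "synthesis/" + report not in names:
            problems.append("missing synthesis report: " + report)
    logs = [n for n in names if n.startswith("summary/") and n.endswith(".summary.log")]
    if len(logs) != 6:
        problems.append("expected 6 summary logs, found %d" % len(logs))
    # Assumption: Verilog sources use the .v extension.
    if not any(n.startswith("verilog/") and n.endswith(".v") for n in names):
        problems.append("no .v files under verilog/")
    return problems


def check_demo1_tar(path):
    """Open a tar file and run the member checks on its contents."""
    with tarfile.open(path) as archive:
        return check_demo1_members(archive.getnames())
```

An empty list from check_demo1_tar("demo1.tar") means the layout matches the list above; it says nothing about whether the design itself passes.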

5.1 Single-Cycle Memory Specification
Since your single-cycle design must fetch instructions as well as read or write data in the same cycle, you will want to use two instances of this memory: one for data, and one for instructions.


Note: You should instantiate this memory module twice. One instance will serve as the instruction memory while the other will serve as the data memory. Note that the program binary should be loaded into both instances. This will indeed be done (without any additional effort from your side) if you use the same module definition for both instances.

                      +-------------+
data_in[15:0] >-------|             |--------> data_out[15:0]
   addr[15:0] >-------| 65536 word  |
       enable >-------| by 8 bit    |
           wr >-------| memory      |
          clk >-------|             |
          rst >-------|             |
   createdump >-------|             |
                      +-------------+

During each cycle, the "enable" and "wr" inputs determine what function the memory will perform:

enable  wr  Function       data_out
0       X   No operation   0
1       0   Read           M[addr]
1       1   Write data_in  0

During a read cycle, the data output will immediately reflect the contents of the address input and will change in a flow-through fashion if the address changes. For writes, the "wr", "addr", and "data_in" signals must be stable at the rising edge of the clock ("clk").
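If it helps to reason about the table, here is a small behavioral model in Python. It is only a sketch of the enable/wr behavior described above, not a port of the provided Verilog. The class and method names are invented, and the byte order used to build a 16-bit word from two 8-bit locations (high byte at addr, low byte at addr+1) is an assumption that the text does not state.

```python
class SingleCycleMemoryModel:
    """Behavioral sketch of the enable/wr table; names here are hypothetical."""

    def __init__(self):
        self.mem = bytearray(65536)  # 65536 locations, 8 bits each

    def read(self, enable, wr, addr):
        """Flow-through output: M[addr] on a read, 0 in every other case."""
        if enable and not wr:
            high = self.mem[addr & 0xFFFF]
            low = self.mem[(addr + 1) & 0xFFFF]
            return (high << 8) | low  # assumed byte order
        return 0

    def clock(self, enable, wr, addr, data_in):
        """Rising clock edge: commit a write only when enable and wr are both 1."""
        if enable and wr:
            self.mem[addr & 0xFFFF] = (data_in >> 8) & 0xFF
            self.mem[(addr + 1) & 0xFFFF] = data_in & 0xFF
```

Reads are modeled as a plain function of the inputs because the output is flow-through, while writes only take effect in clock(), mirroring the requirement that write inputs be stable at the rising edge.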

The memory is initialized from a file. The file name is "loadfile_all.img", but you may change that in the Verilog source to any file name you prefer. The file is loaded at the first rising edge of the clock during reset. The simulator will look for the file in the same location as your .v files (or the directory from which you run wsrun.pl). The file format is:

@0
12
12
12
12

where "@0" specifies a starting address of zero, and "12" represents any 2-digit hex number. Any number of lines may be specified, up to the size of the memory. The assembler will produce files in this format.
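To inspect an image outside the simulator, the format can be read with a few lines of Python. This is a sketch under assumptions: parse_loadfile is a made-up helper, the address after "@" is taken to be hexadecimal, and each following 2-digit value is taken to fill the next consecutive byte address.

```python
def parse_loadfile(text, size=65536):
    """Parse '@<address>' markers and 2-digit hex values into {address: byte}."""
    image = {}
    addr = 0
    for token in text.split():
        if token.startswith("@"):
            addr = int(token[1:], 16)  # assumed: addresses are written in hex
            continue
        if addr >= size:
            raise ValueError("image exceeds memory size")
        image[addr] = int(token, 16)
        addr += 1
    return image
```

For instance, parse_loadfile(open("loadfile_all.img").read()) would give you a dictionary you can compare against what your instruction memory returns.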

At the end of the simulation, the memory can produce a dumpfile so that you may determine what has been written to the memory. When "createdump" is asserted at the rising edge of the clock, the memory will create a file named "dumpfile" in the mentor directory. You may want to use the decode of the "halt" instruction to assert "createdump" for a single cycle.

When a dumpfile is created, it will contain locations zero through the highest address that has been modified with a write cycle (not the highest address loaded from the loadfile). The format is:

0000 1234
0001 1234
0002 1234

Examining the source file memory2c.v, several possible changes should be obvious. The names of the files may be changed. The format of the dumpfile may be changed by modifying the $fdisplay statement; the syntax is very similar to C's fprintf statement. The starting and ending addresses to dump may be modified in the "for" statement. The only thing that cannot be modified is the format of the loadfile; that is built-in.

When you have two copies of the memory, for instructions and data, you may want to let both memories load the same loadfile, but only have the data memory generate a dumpfile.

The way to load programs for your processor is to use the assembler to create the memory dump. Name the memory dump loadfile_all.img and copy it into the directory where memory2c.v is present.

6. Demo #2.0 Pipelined design with Perfect Memory (30% of project grade)
At this point, the pipelined version of your design needs to be running correctly, but no optimizations are needed yet. Correctly means that it must detect and do the right thing on pipeline hazards (e.g., stall). You will still use the single-cycle memory model. We will follow a protocol similar to demo1. I will run your tests and ask teams with any failures to sign up for a demo with me.

In this demo, you must also synthesize your processor and submit the results of synthesis, including the area and timing reports. If there are any synthesis errors you will get ZERO for the entire demo2.

We recommend that you write at least two additional hand-written tests to test pipelining. Writing more will help simplify debugging. If you write additional tests, include them in verification/mytests/.

You must create and submit a document that explains the behavior of your processor for the perftestdepldst.asm test. Please use the following format:

Cycle  Instruction Retired  Reason
1
2
etc

The instruction retired would either be one of the instructions from the test program or a "NOP" if dependencies necessitate any stall cycles. The reason column would give an explanation of why a stall was needed in that instance. Please include this information in a pdf file titled instruction_timeline.pdf.

Everything due before class.

What to submit

Electronic submission instructions

Submit a single demo2.tar file containing the following directories [tar cvf demo2.tar verilog verification synthesis]. These subdirectories will already exist if you do your work in the demo2 directory from the original tar that was provided.


1. verilog/ containing all Verilog files. Please copy over ALL necessary files; your processor should be able to compile and run with files from this directory alone.
2. verification/mytests/ The assembly (.asm) files that you have written.
3. verification/results/ Run all the categories and rename the summary.log files as shown below:
   1. Simple tests: simple.summary.log
   2. Complex tests: complex.summary.log
   3. Random tests for demo1
      1. rand_simple: rand_simple.summary.log
      2. rand_complex: rand_complex.summary.log
      3. rand_ctrl: rand_ctrl.summary.log
      4. rand_mem: rand_mem.summary.log
   4. Random tests for demo2: complex_demo2.summary.log
   5. Your code results: mytests.summary.log
4. verification/instruction_timeline.pdf The timeline you have created for the retiring instructions of perftestdepldst.asm
5. synthesis/ the area and timing report

The log files MUST have the exact name. These are the log files produced by running wsrun.pl -list with the all.list file for each of those sets of benchmarks. You will have to rename summary.log manually into these names. If your handed-in code does not follow this convention, it will not be accepted and you will receive a zero for this demo. If in doubt about what to submit, email the TA *before* the deadline and double-check.

The next few demos are minor changes to your processor and you should plan on doing them very quickly. They are optional. No print or electronic submissions required. April 24th is simply a suggested date. Make sure all demo2 tests pass at this phase.

7. Demo #2.1 Pipelined design with Aligned Memory (0% of project grade)
No Submission required. This is optional.

At this step, replace the original single-cycle memory with the Aligned single-cycle memory. This is a very similar module, but it has an "err" output that is generated on unaligned memory accesses. Your processor should halt when an error occurs. Verify your design.

7.1 Aligned Single-Cycle Memory Specification
Before building your cache, you should use this memory to update and test your processor's interface to properly handle unaligned accesses. Many processors (e.g., MIPS) are byte addressable, but require that all accesses be aligned to their natural size (i.e., byte loads and stores can access any individual byte, but word loads and stores must access aligned words). Since your processor only has word loads and stores, this is pretty simple (to support byte stores, the memory would need byte write enable signals; to support byte loads, either the memory or the processor needs a mux to select the right byte). Notice that the memory always returns aligned data even on a misaligned load.

The Verilog source (memory2c_align.v) and synthesizable version (memory2c_align.syn.v) were included in the project tar.

Since your single-cycle design must fetch instructions as well as read or write data in the same cycle, you will want to use two instances of this memory: one for data, and one for instructions.


                      +-------------+
data_in[15:0] >-------|             |--------> data_out[15:0]
   addr[15:0] >-------| 65536 word  |
       enable >-------| by 16 bit   |--------> err
           wr >-------| memory      |
          clk >-------|             |
          rst >-------|             |
   createdump >-------|             |
                      +-------------+

During each cycle, the "enable" and "wr" inputs determine what function the memory will perform. On an unaligned access, err is set.

enable  wr  Function      data_out       err
0       X   No operation  0              0
1       0   Read          M[addr]        0
1       1   Write         Write data_in  0
1       X   X             if (data[0])   set 1
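As a way to think about the alignment rule, the Python sketch below computes an error flag and the aligned address actually used for a word access. It is an illustration under assumptions, not the behavior of memory2c_align.v: the function name is invented, and the last row of the table is read here as "err is set when bit 0 of the address is 1 while the memory is enabled".

```python
def aligned_access(enable, addr):
    """Return (effective_addr, err) for a 16-bit word access at addr."""
    # Assumption: an access is unaligned when the address is odd.
    err = 1 if (enable and (addr & 1)) else 0
    # Dropping bit 0 models "the memory always returns aligned data".
    effective_addr = addr & 0xFFFE
    return effective_addr, err
```

In your processor, the err output would then feed whatever logic halts the machine, as described in section 7.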

8. Demo #2.2 Pipelined design with Stalling Memory : 1 week after demo 2.0 (0% of project grade)
No Submission required. This is optional.

At this step, replace the single-cycle memory with the Stalling memory. This is a very similar module, but it has stall and done signals similar to the cache you built. Your pipeline will need to stall to handle these conditions. Verify your design.

Instruction memory: First replace your instruction memory module with this stalling memory, and keep your data memory module the same (i.e. aligned perfect memory from the previous step). Verify your design. This will be easier to debug, as only one module's behavior has changed.

Data memory: Now, replace your data memory module alone with this stalling memory, and revert your instruction memory module back to the aligned perfect memory. Verify your design. This will be easier to debug, as only one module's behavior has changed.

Instruction and Data memory: Now change both instruction and data memories to the stalling memory design. Verify your design.

8.1 Stalling Memory Specification
This module has an interface identical to the cache interface in mem_system_hier.v, with the same semantics.

Examining the source file stallmem.v, you will see "rand_pat", a shift register which controls the "ready" output. This is a random 32-bit number. You can change its value by changing the seed used for random number generation. You can do this by passing in "-seed" to wsrun.pl. For example:

wsrun.pl -seed 45 -prog foo.asm proc_hier_pbench *.v


If you are executing from inside ModelSim with run All or using a testbench of your own for preliminary testing, you can pass in the seed by adding the string "+seed=<value>" to the vsim command. Or simply edit stallmem.v and set the seed to a different value.
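To see why different seeds exercise different stall timings, consider this toy Python model of a seeded 32-bit pattern that is shifted once per cycle. It is purely illustrative and assumes details the text does not give: the function name is invented, Python's random module stands in for the Verilog random number generator, and using the low bit of a rotating pattern as "ready" is a guess at the general idea rather than the logic in stallmem.v.

```python
import random


def ready_sequence(seed, cycles):
    """Return a hypothetical per-cycle 'ready' bit list from a rotating 32-bit pattern."""
    rand_pat = random.Random(seed).getrandbits(32)  # stand-in for the seeded pattern
    sequence = []
    for _ in range(cycles):
        sequence.append(rand_pat & 1)
        # Rotate right by one bit so the pattern repeats every 32 cycles.
        rand_pat = ((rand_pat >> 1) | ((rand_pat & 1) << 31)) & 0xFFFFFFFF
    return sequence
```

The same seed always reproduces the same sequence, which is what makes a failing run repeatable while you debug your stall logic.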

9. Cache Demo Working two-way set-associative cache (15% of project grade)
All information on the cache design and submission instructions can be found on the cache design page. Note that an FSM diagram is due a week earlier than the demo.

10. Demo #3 (final demo) Pipelined Multi-cycle Memory with Optimizations (30% of project grade)

Due May 9th, 5pm.
Absolutely no extensions.
If you have more than 2 failures (not counting aligntest and extra-credit failures) you will receive at least a 50% penalty.

At this final demo teams are expected to demonstrate the complete design to all specifications. This includes the following required items:

- Two-way set-associative caches with multi-cycle memory
- Register file bypassing
- Bypassing from beginning of the MEM stage to beginning of EX stage
- Bypassing from beginning of the WB stage to the beginning of the EX stage
- Branches predicted not-taken
- Halt instructions must leave the PC pointing to Halt+2. Do not let it increment past this address
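The two EX-stage bypass paths in the list above amount to a priority choice for each source operand. The Python sketch below shows one common way to express that choice; it is not a required structure. The function name and the dictionary keys 'reg_write', 'dest', and 'value' are hypothetical, and the sketch ignores the case where the instruction in MEM is a load whose data is not yet available, which still needs a stall.

```python
def select_ex_operand(src_reg, regfile_value, mem_stage, wb_stage):
    """Pick the value an EX-stage source operand should use.

    mem_stage and wb_stage are hypothetical dicts describing the instruction
    currently in that stage: 'reg_write' (bool), 'dest' (register number),
    and 'value' (the result it will write).
    """
    # The instruction in MEM is younger than the one in WB, so it wins.
    if mem_stage["reg_write"] and mem_stage["dest"] == src_reg:
        return mem_stage["value"]
    if wb_stage["reg_write"] and wb_stage["dest"] == src_reg:
        return wb_stage["value"]
    return regfile_value
```

Checking MEM before WB matters when two in-flight instructions write the same register: the operand must see the most recent result.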

Format will be similar to demo1.

What to submit:

Electronic submission instructions
Submit a single demo3.tar file containing the following directories [tar cvf demo3.tar verilog verification synthesis]. These subdirectories will already exist if you do your work in the demo3 directory from the original tar file that was provided.

1. verilog/ containing all Verilog files. Please copy over ALL necessary files; your processor should be able to compile and run with files from this directory alone.
2. verification/mytests/ The assembly (.asm) files that you have written (at least two tests).
3. verification/results/ Run all test programs and rename the summary.log files as listed below:
   1. perf.summary.log
   2. complex_demofinal.summary.log
   3. rand_final.summary.log
   4. rand_ldst.summary.log
   5. rand_idcache.summary.log
   6. rand_icache.summary.log
   7. rand_dcache.summary.log
   8. complex_demo1.summary.log
   9. complex_demo2.summary.log
   10. rand_complex.summary.log
   11. rand_ctrl.summary.log
   12. inst_tests.summary.log
4. synthesis/ the area and timing report (no reports or zero area => zero for this demo)

You can use the script run-final-all.sh to run all the required tests. It will create all these summary.log files. Running all the tests will take about 40 minutes. So plan ahead!

I will electronically grade this submission; if you have more than 2 failures you will receive at least a 50% penalty. TAs will be holding office hours 15 on Wednesday the 11th for any teams that have failures and need to meet about partial credit. If this time does not work for a team, please send us an email in advance to set up an alternative time.

No late submissions shall be graded. If we meet you for a demo, we will use files that you submitted at or before 5PM on the 9th of May.

If your design has known failures, then bring to the demo a short written explanation for as many failures as you can track down. This will exponentially increase the points you will get, compared to simply showing up and saying we don't know the reason for the failures.

If your entire design does not work, then you may show me a demo of a partially complete processor. So in your best interest, snapshot working parts of your design as you add more functionality. For example, you may show me any one of the following, if your full pipeline+cache does not work.

- Stalling instruction memory alone
- Stalling data memory alone
- Stalling inst+data memory
- Direct-mapped instruction memory alone
- Direct-mapped data memory alone
- Direct-mapped inst+data memory
- 2-way instruction memory alone
- 2-way data memory alone
- 2-way inst+data memory

Both partners are required to be present and both are expected to explain and answer questions about the whole design. Answering a question with: "I have no idea, my partner did that" is a failing answer. You must (at least) be able to answer: "My partner implemented that, but it works in the following way....".

11. Final Project Report: May 10th (5% of project grade)
Due by 1:00pm on May 10th

Each team should turn in one final report that is typed, well written, and well organized. Semantic, spelling, or grammatical errors will be penalized.

Please check the template for details on what is required.

For writing the final report use this template if you use Word, or follow the format in this pdf.

FinalReport.doc, FinalReport.pdf.

Page last modified on May 04, 2016, visited 2858 times


6/19/2017 CS552 Course Wiki: Spring 2016 : WISCSP13 Instruction Set Architecture browse

http://pages.cs.wisc.edu/~karu/courses/cs552/spring2016/wiki/index.php/Main/ISASpecification 1/5

WISC-SP13 Instruction Set Architecture

On this page...
1. Instruction Summary
2. Formats
   2.1 J-format
   2.2 I-format
   2.3 R-format
3. Special Instructions

1. Instruction Summary

Instruction Format    Syntax                   Semantics
00000 xxxxxxxxxxx     HALT                     Cease instruction issue, dump memory state to file
00001 xxxxxxxxxxx     NOP                      None
01000 sss ddd iiiii   ADDI Rd, Rs, immediate   Rd <- Rs + I(sign ext.)
01001 sss ddd iiiii   SUBI Rd, Rs, immediate   Rd <- I(sign ext.) - Rs
01010 sss ddd iiiii   XORI Rd, Rs, immediate   Rd <- Rs XOR I(zero ext.)
01011 sss ddd iiiii   ANDNI Rd, Rs, immediate  Rd <- Rs AND ~I(zero ext.)
10100 sss ddd iiiii   ROLI Rd, Rs, immediate   Rd <- Rs <<(rotate) I(lowest 4 bits)
10101 sss ddd iiiii   SLLI Rd, Rs, immediate   Rd <- Rs << I(lowest 4 bits)
10110 sss ddd iiiii   RORI Rd, Rs, immediate   Rd <- Rs >>(rotate) I(lowest 4 bits)
10111 sss ddd iiiii   SRLI Rd, Rs, immediate   Rd <- Rs >> I(lowest 4 bits)
10000 sss ddd iiiii   ST Rd, Rs, immediate     Mem[Rs + I(sign ext.)] <- Rd
10001 sss ddd iiiii   LD Rd, Rs, immediate     Rd <- Mem[Rs + I(sign ext.)]
10011 sss ddd iiiii   STU Rd, Rs, immediate    Mem[Rs + I(sign ext.)] <- Rd; Rs <- Rs + I(sign ext.)
11001 sss xxx ddd xx  BTR Rd, Rs               Rd[bit i] <- Rs[bit 15-i] for i=0..15
11011 sss ttt ddd 00  ADD Rd, Rs, Rt           Rd <- Rs + Rt
11011 sss ttt ddd 01  SUB Rd, Rs, Rt           Rd <- Rt - Rs
11011 sss ttt ddd 10  XOR Rd, Rs, Rt           Rd <- Rs XOR Rt
11011 sss ttt ddd 11  ANDN Rd, Rs, Rt          Rd <- Rs AND ~Rt
11010 sss ttt ddd 00  ROL Rd, Rs, Rt           Rd <- Rs << (rotate) Rt (lowest 4 bits)
11010 sss ttt ddd 01  SLL Rd, Rs, Rt           Rd <- Rs << Rt (lowest 4 bits)
11010 sss ttt ddd 10  ROR Rd, Rs, Rt           Rd <- Rs >> (rotate) Rt (lowest 4 bits)
11010 sss ttt ddd 11  SRL Rd, Rs, Rt           Rd <- Rs >> Rt (lowest 4 bits)
11100 sss ttt ddd xx  SEQ Rd, Rs, Rt           if (Rs == Rt) then Rd <- 1 else Rd <- 0
11101 sss ttt ddd xx  SLT Rd, Rs, Rt           if (Rs < Rt) then Rd <- 1 else Rd <- 0
11110 sss ttt ddd xx  SLE Rd, Rs, Rt           if (Rs <= Rt) then Rd <- 1 else Rd <- 0
11111 sss ttt ddd xx  SCO Rd, Rs, Rt           if (Rs + Rt) generates carry out then Rd <- 1 else Rd <- 0
01100 sss iiiiiiii    BEQZ Rs, immediate       if (Rs == 0) then PC <- PC + 2 + I(sign ext.)
01101 sss iiiiiiii    BNEZ Rs, immediate       if (Rs != 0) then PC <- PC + 2 + I(sign ext.)
01110 sss iiiiiiii    BLTZ Rs, immediate       if (Rs < 0) then PC <- PC + 2 + I(sign ext.)
01111 sss iiiiiiii    BGEZ Rs, immediate       if (Rs >= 0) then PC <- PC + 2 + I(sign ext.)
11000 sss iiiiiiii    LBI Rs, immediate        Rs <- I(sign ext.)
10010 sss iiiiiiii    SLBI Rs, immediate       Rs <- (Rs << 8) | I(zero ext.)
00100 ddddddddddd     J displacement           PC <- PC + 2 + D(sign ext.)
00101 sss iiiiiiii    JR Rs, immediate         PC <- Rs + I(sign ext.)
00110 ddddddddddd     JAL displacement         R7 <- PC + 2; PC <- PC + 2 + D(sign ext.)
00111 sss iiiiiiii    JALR Rs, immediate       R7 <- PC + 2; PC <- Rs + I(sign ext.)
00010                 siic Rs                  produce IllegalOp exception. Must provide one source register.
00011 xxxxxxxxxxx     NOP / RTI                PC <- EPC

2. Formats

WISC-SP13 supports instructions in four different formats: J-format, two I-formats, and the R-format. These are described below.

2.1 J-format

The J-format is used for jump instructions that need a large displacement.

J-Format    5 bits     11 bits
            Op Code    Displacement


Jump Instructions

The Jump instruction loads the PC with the value found by adding the PC of the next instruction (PC+2, not PC+4 as in MIPS) to the sign-extended displacement.

The Jump-And-Link instruction loads the PC with the same value and also saves the address of the next sequential instruction (i.e., PC+2) in the link register R7.

The syntax of the jump instructions is:

J displacement

JAL displacement
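
The jump semantics above can be sketched in a few lines of Python. This is only an illustrative model, not part of the specification: the function names `sign_extend` and `jump_target` are hypothetical, and the 16-bit wrap-around of the PC is an assumption based on the 16-bit instruction width.

```python
from typing import Optional, Tuple

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def jump_target(pc: int, instr: int) -> Tuple[int, Optional[int]]:
    """Return (new_pc, link_value) for a J-format instruction word.

    link_value is None for J and PC+2 for JAL (the value written to R7).
    """
    opcode = (instr >> 11) & 0x1F            # upper 5 bits: Op Code
    disp = sign_extend(instr & 0x7FF, 11)    # lower 11 bits: Displacement
    next_pc = (pc + 2) & 0xFFFF              # PC of the next instruction
    new_pc = (next_pc + disp) & 0xFFFF       # assumed 16-bit wrap-around
    if opcode == 0b00100:                    # J
        return new_pc, None
    if opcode == 0b00110:                    # JAL
        return new_pc, next_pc
    raise ValueError("not a J-format jump instruction")
```

For example, a J with displacement -2 located at address 0x0010 jumps back to itself, since the displacement is added to PC+2 rather than to PC.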

2.2 I-format

I-format instructions use either a destination register, a source register, and a 5-bit immediate value; or a destination register and an 8-bit immediate value. The two types of I-format instructions are described below.

I-format 1 Instructions

I-format 1    5 bits     3 bits    3 bits    5 bits
              Op Code    Rs        Rd        Immediate

The I-format 1 instructions include XOR-Immediate, ANDN-Immediate, Add-Immediate, Subtract-Immediate, Rotate-Left-Immediate, Shift-Left-Logical-Immediate, Rotate-Right-Immediate, Shift-Right-Logical-Immediate, Load, Store, and Store with Update.

The ANDNI instruction loads register Rd with the value of the register Rs ANDed with the one's complement of the zero-extended immediate value. (It may be thought of as a bit-clear instruction.) ADDI loads register Rd with the sum of the value of the register Rs plus the sign-extended immediate value. SUBI loads register Rd with the result of subtracting register Rs from the sign-extended immediate value. (That is, immed - Rs, not Rs - immed.) Similar instructions have similar semantics, i.e. the logical instructions have zero-extended values and the arithmetic instructions have sign-extended values.

For Load and Store instructions, the effective address of the operand to be read or written is calculated by adding the value in register Rs with the sign-extended immediate value. The value is loaded to or stored from register Rd. The STU instruction, Store with Update, acts like Store but also writes Rs with the effective address.

The syntax of the Iformat 1 instructions is:

ADDI Rd, Rs, immediate

SUBI Rd, Rs, immediate

XORI Rd, Rs, immediate

ANDNI Rd, Rs, immediate

ROLI Rd, Rs, immediate

SLLI Rd, Rs, immediate

RORI Rd, Rs, immediate

SRLI Rd, Rs, immediate

ST Rd, Rs, immediate

LD Rd, Rs, immediate


STU Rd, Rs, immediate
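
As a rough illustration of the I-format 1 semantics (sign extension for arithmetic, zero extension for logic, the reversed operand order of SUBI, and the 4-bit shift amount), the following Python sketch models the value written to Rd. The helper names `exec_iformat1` and `effective_address` are hypothetical, and registers are assumed to hold unsigned 16-bit values.

```python
MASK16 = 0xFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def exec_iformat1(opcode, rs_val, imm5):
    """Return the 16-bit value written to Rd for an I-format 1 ALU/shift op."""
    rs_val &= MASK16
    simm = sign_extend(imm5, 5)   # arithmetic ops use the sign-extended immediate
    zimm = imm5 & 0x1F            # logical ops use the zero-extended immediate
    sh = imm5 & 0xF               # shifts and rotates use the lowest 4 bits
    if opcode == 0b01000:         # ADDI
        return (rs_val + simm) & MASK16
    if opcode == 0b01001:         # SUBI: immediate minus Rs, not Rs minus immediate
        return (simm - rs_val) & MASK16
    if opcode == 0b01010:         # XORI
        return rs_val ^ zimm
    if opcode == 0b01011:         # ANDNI: clears the bits set in the immediate
        return rs_val & ~zimm & MASK16
    if opcode == 0b10100:         # ROLI
        return ((rs_val << sh) | (rs_val >> (16 - sh))) & MASK16
    if opcode == 0b10101:         # SLLI
        return (rs_val << sh) & MASK16
    if opcode == 0b10110:         # RORI
        return ((rs_val >> sh) | (rs_val << (16 - sh))) & MASK16
    if opcode == 0b10111:         # SRLI
        return rs_val >> sh
    raise ValueError("not an I-format 1 ALU/shift opcode")

def effective_address(rs_val, imm5):
    """Address used by LD, ST and STU; STU also writes this value back to Rs."""
    return (rs_val + sign_extend(imm5, 5)) & MASK16
```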

I-format 2 Instructions

I-format 2    5 bits     3 bits    8 bits
              Op Code    Rs        Immediate

The Load Byte Immediate instruction loads Rs with a sign-extended 8-bit immediate value.

The Shift-and-Load-Byte-Immediate instruction shifts Rs 8 bits to the left, and replaces the lower 8 bits with the immediate value.

The format of these instructions is:

LBI Rs, signed immediate

SLBI Rs, unsigned immediate
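
LBI followed by SLBI is the usual way to build an arbitrary 16-bit constant in a register. A minimal Python sketch of that idiom follows; the function names `lbi` and `slbi` are illustrative only, and registers are assumed to hold unsigned 16-bit values.

```python
def lbi(imm8):
    """LBI: Rs <- sign-extended 8-bit immediate, returned as a 16-bit value."""
    imm8 &= 0xFF
    return imm8 | 0xFF00 if imm8 & 0x80 else imm8

def slbi(rs_val, imm8):
    """SLBI: Rs <- (Rs << 8) | zero-extended 8-bit immediate."""
    return ((rs_val << 8) | (imm8 & 0xFF)) & 0xFFFF

# Build the constant 0x1234: load the upper byte, then shift in the lower byte.
r1 = lbi(0x12)
r1 = slbi(r1, 0x34)
```

Because SLBI shifts the old upper byte out, the sign extension performed by LBI does not affect the final 16-bit constant.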

The Jump-Register instruction loads the PC with the value of register Rs + signed immediate. The Jump-And-Link-Register instruction does the same and also saves the return address (i.e., the address of the next sequential instruction, PC+2) in the link register R7. The format of these instructions is

JR Rs, immediate

JALR Rs, immediate

The branch instructions test a general purpose register for some condition. The available conditions are: equal to zero, not equal to zero, less than zero, and greater than or equal to zero. If the condition holds, the signed immediate is added to the address of the next sequential instruction and loaded into the PC. The format of the branch instructions is

BEQZ Rs, signed immediate

BNEZ Rs, signed immediate

BLTZ Rs, signed immediate

BGEZ Rs, signed immediate
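
The branch rule can be sketched as follows. This is a hedged Python model rather than reference behaviour: `branch_next_pc` is a hypothetical name, Rs is treated as a two's-complement value for the sign tests, and the PC is assumed to wrap at 16 bits.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def branch_next_pc(opcode, rs_val, imm8, pc):
    """Return the next PC for BEQZ, BNEZ, BLTZ or BGEZ."""
    rs_signed = sign_extend(rs_val, 16)
    conditions = {
        0b01100: rs_signed == 0,   # BEQZ
        0b01101: rs_signed != 0,   # BNEZ
        0b01110: rs_signed < 0,    # BLTZ
        0b01111: rs_signed >= 0,   # BGEZ
    }
    next_pc = (pc + 2) & 0xFFFF    # address of the next sequential instruction
    if conditions[opcode]:
        return (next_pc + sign_extend(imm8, 8)) & 0xFFFF
    return next_pc
```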

2.3 R-format

R-format instructions use only registers for operands.

R-format    5 bits     3 bits    3 bits    3 bits    2 bits
            Op Code    Rs        Rt        Rd        Op Code Extension

ALU and Shift Instructions

The ALU and shift R-format instructions are similar to I-format 1 instructions, but do not require an immediate value. In each case, the value of Rt is used in place of the immediate. No extension of its value is required. In the case of shift instructions, all but the 4 least-significant bits of Rt are ignored.

The ADD instruction performs signed addition. The SUB instruction subtracts Rs from Rt. (Not Rs - Rt.) The set instructions SEQ, SLT, and SLE compare the values in Rs and Rt and set the destination register Rd to 0x1 if the comparison is true, and 0x0 if the comparison is false. SLT checks for Rs less than Rt, and SLE checks for Rs less than or equal to Rt. (Rs and Rt are two's-complement numbers.) The set instruction SCO will set Rd to 0x1 if Rs plus Rt would generate a carry-out from the most significant bit; otherwise it sets Rd to 0x0. The Bit-Reverse instruction, BTR, takes a single operand Rs and copies it to Rd, but with a left-right reversal of each bit; i.e. bit 0 goes to bit 15, bit 1 goes to bit 14, etc.

The syntax of the Rformat ALU and shift instructions is:

ADD Rd, Rs, Rt

SUB Rd, Rs, Rt

XOR Rd, Rs, Rt

ANDN Rd, Rs, Rt

ROL Rd, Rs, Rt

SLL Rd, Rs, Rt

ROR Rd, Rs, Rt

SRL Rd, Rs, Rt

SEQ Rd, Rs, Rt

SLT Rd, Rs, Rt

SLE Rd, Rs, Rt

SCO Rd, Rs, Rt

BTR Rd, Rs
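
The less obvious R-format operations (the reversed SUB, the signed comparisons, SCO, and BTR) can be modelled as below. This is a sketch under the assumption that registers hold unsigned 16-bit values; the function names are illustrative and do not come from the course materials.

```python
MASK16 = 0xFFFF

def to_signed(value):
    """Interpret a 16-bit register value as a two's-complement number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value

def sub(rs_val, rt_val):
    """SUB: Rd <- Rt - Rs (note the operand order)."""
    return (rt_val - rs_val) & MASK16

def slt(rs_val, rt_val):
    """SLT: Rd <- 1 if Rs < Rt as signed numbers, else 0."""
    return 1 if to_signed(rs_val) < to_signed(rt_val) else 0

def sle(rs_val, rt_val):
    """SLE: Rd <- 1 if Rs <= Rt as signed numbers, else 0."""
    return 1 if to_signed(rs_val) <= to_signed(rt_val) else 0

def sco(rs_val, rt_val):
    """SCO: Rd <- 1 if Rs + Rt carries out of bit 15, else 0."""
    return 1 if (rs_val & MASK16) + (rt_val & MASK16) > MASK16 else 0

def btr(rs_val):
    """BTR: Rd[bit i] <- Rs[bit 15-i] for i = 0..15."""
    result = 0
    for i in range(16):
        if rs_val & (1 << (15 - i)):
            result |= 1 << i
    return result
```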

3. Special Instructions

Special instructions use the R-format. The HALT instruction halts the processor. The HALT instruction and all older instructions execute normally, but the instruction after the halt will never execute. The PC is left pointing to the instruction directly after the halt.

The No-operation instruction occupies a position in the pipeline, but does nothing.

The syntax of these instructions is:

HALT

NOP

The SIIC and RTI instructions are extra credit and can be deferred for later. They will not be tested until the final demo.

The SIIC instruction is an illegal instruction and should trigger the exception handler. EPC should be set to PC + 2, and control should be transferred to the exception handler which is at PC 0x02.

The syntax of this instruction is:

SIIC Rs

The source register name must be ignored. The syntax is specified this way, with a dummy source register, to reuse some components from our existing assembler. The RTI instruction should remain equivalent to NOP until the rest of the design has been completed and thoroughly tested.

RTI returns from an exception by loading the PC from the value in the EPC register.

The syntax of this instruction is:

RTI

See the Optimizations page for more information.

Page last modified on January 22, 2013, visited 55 times


CS552 Final Project

Team Name: Fetch, Decode, and XORcute

Team Members: Eric Sullivan and Kai Zhao

2016 May 10

Design Overview − Less than a page describing overall what the project is about and what you did. Discuss each of the pipeline stages and cache design. Diagrams are not necessary.

− The CS552 final project is a processor for the WISC-SP13 assembly language. Features include 1) pipelining, which separates instruction execution into multiple stages so that multiple instructions execute concurrently, 2) data cache, 3) instruction cache, 4) branch predictor, 5) register file bypassing, and 6) carry-lookahead adder.

− The five pipeline stages are Fetch, Decode, Execute, Memory, and Writeback.

− The fetch (IF) stage outputs the instruction based on an internal PC. IF has a register to hold the PC. On every fetch, the PC increments by 2 via a carry-lookahead adder to get the next instruction. IF uses an instruction cache to fetch the instruction based on the PC. Lastly, IF takes an isFlush input signal to discard the current instruction and fetch a new instruction from the PC that is passed in.

− The decode (ID) stage basically takes an instruction input and outputs all the control signals. ID has a register file with bypassing to allow data to be written and read from the same register. ID uses a control unit to decipher the instruction and generate the control signals. Lastly, ID takes the RegWrite signal back from the writeback stage to write data back into the register file.

− The execute (EX) stage does the computation. The operands are selected based on the ALU_OP control signals. The operands come from the decode stage (either a register value or an immediate) or from later stages of the pipeline, depending on whether there is a data dependency. The ALU inside EX supports 15 operations: 1) add, 2) subtract, 3) xor, 4) andn, 5) rol, 6) sll, 7) ror, 8) srl, 9) btr, 10) seq, 11) slt, 12) sle, 13) sco, 14) lbi, and 15) slbi. The ALU inside EX has a secondary computation unit to compute whether the branch should be taken. The execute unit generates the address to access memory, the writeback_data, and/or the new PC along with the flush signal.

− The memory (MEM) stage either reads from or writes to the data memory system (never both at once), which uses a 2-way set-associative cache with 4 words per bank. The address is generated by the EX stage, the data_in comes from the register file in the ID stage, and the mem_read/mem_write control signal is pipelined from the ID stage. If the mem_read signal is asserted, then the MEM stage will output the data at the memory address. Lastly, MEM generates the stall signal that will stall the entire pipeline in case of a cache miss.

− The writeback (WB) stage basically determines whether the ALU or the data memory system should write back to the register file, if either.

− The forwarding unit is placed in the execute stage to determine whether an instruction further down the pipeline will write to a register that EX reads. If the register number matches, then the forwarding unit asserts control signals to bypass the value from either the EX_MEM or the MEM_WB pipeline register.
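
The forwarding decision described above can be sketched in Python as a small selection function. This is a simplified model and not our Verilog: the signal names are hypothetical, and giving EX_MEM priority over MEM_WB (so that the most recent write wins) is an assumption of the sketch.

```python
def forward_select(src_reg, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Choose where EX should take the value of source register src_reg from.

    Returns "EX_MEM", "MEM_WB", or "REGFILE" (no forwarding needed).
    """
    # The instruction one stage ahead holds the newest value, so check it first.
    if ex_mem_reg_write and ex_mem_rd == src_reg:
        return "EX_MEM"
    # Otherwise fall back to the instruction two stages ahead.
    if mem_wb_reg_write and mem_wb_rd == src_reg:
        return "MEM_WB"
    # No in-flight writer: use the value read from the register file in ID.
    return "REGFILE"
```

A load-use dependency still needs one stall cycle in this scheme, because the loaded data is not available in EX_MEM at the time EX would need it.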


− (This part is optional. You will not lose points for skipping this. Nor will you get extra credit for including this.) Design overview which should include your high-level processor schematics. Your design hierarchy must be clear. One high-level schematic that shows only the high-level pipeline is too little detail. Showing schematics for every MUX is too much. Use your discretion. It is OK to scan hand-drawn schematics or attach them to the end of the document with a clear note in the document explaining what is where. All your material must be in one single PDF or .DOC file. (Maximum 3 pages)


− (This part is optional. You will not lose points for skipping this. Nor will you get extra credit for including this.) A state diagram for any state machine controllers in your design. All other controllers should include a high-level textual description. (This also can be short, maximum 4 pages)


Optimizations and Discussions − Brief discussion of optimization implemented (Maximum 0.5 pages)

− pipelining separates instruction execution into multiple stages so that multiple instructions execute concurrently. Pipelining can increase throughput by up to 5x because every stage of the pipeline can be working on a different instruction

− forwarding handles data hazards by bypassing data from MEM or WB into EX

− data cache speeds up data accesses by serving them from a small, fast memory instead of main memory

− instruction cache speeds up instruction fetch in the same way

− register file bypassing allows writing and reading to/from the same register in a single cycle

− 2-way set-associative cache reduces conflict misses compared to a direct-mapped cache. Since the comparisons are done in parallel, the added associativity costs area rather than latency

− 4 words per cache line/block helps with spatial locality when words near (within 4 words of) a previously accessed word are accessed next

− branch predict not taken helps prevent stalling by fetching the next instruction. If the branch prediction is wrong, then the pipeline will flush

− carry-lookahead adder (CLA) for 4x faster execution than a ripple-carry adder, as the 16-bit addition is broken down into four 4-bit CLAs.
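
To illustrate the last point, the sketch below models a 16-bit adder built from four 4-bit CLA blocks in Python. It is not our Verilog: the names `cla4` and `add16` are hypothetical, and the sketch assumes the carry is passed from one 4-bit block to the next, which may differ from how the blocks are actually connected in the design.

```python
def cla4(a, b, cin):
    """4-bit carry-lookahead block: returns (4-bit sum, carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate: both bits set
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate: exactly one bit set
    c0 = cin & 1
    # Each carry is written directly in terms of g, p and c0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    cout = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, cout

def add16(a, b, cin=0):
    """16-bit add from four cla4 blocks; returns (16-bit sum, carry out)."""
    result, carry = 0, cin
    for nibble in range(4):
        shift = 4 * nibble
        s, carry = cla4((a >> shift) & 0xF, (b >> shift) & 0xF, carry)
        result |= s << shift
    return result, carry
```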


− Discussion about failures, if any (Required for partial credit): A discussion of what does not work and why. Also include what you would have liked to implement given more time. For each part of the implementation that does not work, turn in an annotated output in the form of a trace or script run that clearly shows the error. Give your thoughts as to why the error occurs and what could be done to fix it. (without counting traces this section should not exceed half a page).

Everything works now. Since the demo3 dropbox link is closed, a tar file with everything working (with the exception of siic extra credit) can be downloaded at https://www.dropbox.com/s/ap782vqgo2820mj/demo3.tar?dl=0

failure 1: Cache always wrote back to memory regardless of whether the cache line was valid.

failure 2: On a data dependency detected in ID, only IF was stalled. However, after changing to forwarding, ID no longer needs to check for stalls. The extra stall in the IF stage messed up the pipeline.

failure 3: On flush, the reg_IF_ID stage should output a nop regardless of whether there is a memory stall.

failure 4: The instruction cache should wait for memory stall.

Attached below are screenshots of the Verilog changes instead of waveforms, since the processor works now.

In the figure below, we ANDed dirty with valid in the isWriteback signal.

In the figure below, we changed isDataStall = 0;


In the figure below, we changed instr_ID to remove the conditional dependency on memStall.

In the figure below, we fixed the instruction cache by waiting on !memStall.


Design Analysis − A table listing the possible hazards that arise in your pipelined design and the number of stall cycles that each hazard incurs. (Maximum half a page)

Hazards                           Number of Stall Cycles
Structural hazard                 0, since single issue
Data hazard: load-use             1 + cache delay, since data comes from the end of MEM and needs to go to the beginning of EX
Data hazard: read-after-write     0 + cache delay, due to forwarding
Data hazard: write-after-read     0 + cache delay, since anti-dependency
Data hazard: write-after-write    0 + cache delay, since in-order with 1 writeback stage
Control hazard                    2, since EX computes isTaken and EX is 2 stages after IF

− A brief discussion of your cache design that explains the number of cycles for a cache hit, cache miss (with eviction of a line), cache miss (without any eviction). (Maximum half a page)

Cache Activity                 Number of Cycles
Cache hit                      2 (1 to get data and another 1 to return to IDLE/ready)
Cache miss with eviction       17 (2 to determine the miss, 4 to write back, 9 to allocate into the cache with stall, and 2 to redo the cache access)
Cache miss without eviction    13 (2 to determine the miss, 9 to allocate into the cache with stall, and 2 to redo the cache access)

Allocation takes 9 cycles because there are 4 words, each of which requires 1 cycle to access from memory and another cycle to write to the cache, with an extra cycle at the end to be able to access different offsets.
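
The cycle counts above follow from simple arithmetic, which can be checked with a short Python sketch. The constant and function names are illustrative only; the per-step cycle counts are the ones reported in the table.

```python
WORDS_PER_LINE = 4
DETECT_MISS_CYCLES = 2   # cycles to determine that the access missed
WRITE_BACK_CYCLES = 4    # cycles to write the evicted line back to memory
REDO_ACCESS_CYCLES = 2   # cycles to redo the access, which now hits

def allocate_cycles(words_per_line=WORDS_PER_LINE):
    """One memory read plus one cache write per word, plus one final cycle."""
    return 2 * words_per_line + 1

def miss_cycles(evict):
    """Total cycles for a cache miss, with or without evicting a line."""
    total = DETECT_MISS_CYCLES + allocate_cycles() + REDO_ACCESS_CYCLES
    if evict:
        total += WRITE_BACK_CYCLES
    return total

print(allocate_cycles(), miss_cycles(evict=False), miss_cycles(evict=True))
```

The only difference between the two miss cases is the 4-cycle write-back of the evicted line.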


Conclusions and Final Thoughts − A conclusion outlining what you learned by doing this project and what you would have done differently. (Maximum half a page)

We learned how to design a pipelined processor to execute WISC-SP13 assembly instructions. Since we were limited to a subset of Verilog for which we know the corresponding hardware, we learned how to build a pipelined processor from hardware building blocks as well as in Verilog. We learned how processors work and how to verify a processor even though processors have no data inputs/outputs. We developed intuition for coding and debugging large design projects. We also learned how to implement cache associativity and a cache eviction policy in hardware.

We would have done the cache before the forwarding, because adding the cache after a mix of stalling and forwarding messed up all stages of the pipeline. On top of that, since a leftover stall signal was the cause of one of our failures, we would have done forwarding from the start as opposed to just stalling. Also, we should have written debugging signals earlier as opposed to towards the end. Lastly, a guide to signal names would have helped, because “read1data” and “read2data” are neither descriptive nor helpful.
