Department of Computing and Software Course Lecture Notes for CS2MF3 Digital Systems and Systems Programming W.F.S.Poehlman November, 2008 Revision 1 McMaster University Hamilton, Ontario Canada L8S 4K1


Page 1: Course Lecture Notes for - McMaster Universitybruha/2mf3cs08.pdf · Course Lecture Notes . for . CS2MF3 . Digital Systems . and . Systems Programming . W.F.S.Poehlman . November,

Department of Computing and Software

Course Lecture Notes

for

CS2MF3

Digital Systems and

Systems Programming

W.F.S.Poehlman November, 2008 Revision 1

McMaster University Hamilton, Ontario Canada L8S 4K1


CS2MF3 Lecture Notes

TABLE of CONTENTS

Lectures

Lecture 1 1 OVERVIEW

Lecture 2 5 BASIC COMPUTER COMPONENTS

Lecture 3 10 HISTORICAL CONSIDERATIONS

Lecture 4 14 COMPUTER SOFTWARE CONSIDERATIONS

Lecture 5 19 DATA REPRESENTATION

Lecture 6 24 DECIMAL-TO-BINARY CONVERSIONS

Lecture 7 28 INTEGER REPRESENTATION

Lecture 8 32 COMPLEMENT ARITHMETIC

Lecture 9 37 FLOATING POINT NUMBER REPRESENTATION

Lecture 10 41 FLOATING POINT ARITHMETIC

Lecture 11 45 CHARACTER REPRESENTATIONS

Lecture 12 49 ERROR DETECTION AND ERROR CORRECTION

Lecture 13 52 BOOLEAN ALGEBRA

Lecture 14 56 BOOLEAN OPERATIONS

Lecture 15 59 DIGITAL SYSTEMS AND LOGIC

Lecture 16 63 COMBINATIONAL CIRCUITS

Lecture 17 68 SEQUENTIAL CIRCUITS AND FSMS

Lecture 18 73 APPLICATIONS: REGISTERS, COUNTERS, MEMORY ELEMENTS, ADDERS, ETC.

Lecture 19 77 A SIMPLE COMPUTER (MARIE)

Lecture 20 81 MEMORY OPERATIONS AND DATA PATHS

Lecture 21 87 MARIE PROGRAMMING MODEL AND THE MICROMACHINE

Lecture 22 92 RTN AND ASSEMBLY LANGUAGE PROGRAMMING


TABLE of CONTENTS

(continued)

Lectures – cont’d

Lecture 23 97 ADDRESSING MODES

Lecture 24 102 CPU DECODING AND ISA

Lecture 25 106 MICROMACHINE PROGRAMMING

Lecture 26 111 REAL WORLD MACHINES (RISC VS CISC)

Lecture 27 115 MORE ON ISA: ENDIANESS, STACKS, GPRS, RPN

Lecture 28 120 INSTRUCTION FORMATS: OPCODE SIZES VS ADDRESSING SIZES

Lecture 29 124 INSTRUCTION PIPELINING

Lecture 30 128 VIRTUAL MACHINES

Lecture 31 132 MEMORY SYSTEMS: CACHE

Lecture 32 137 MORE ON MEMORY SYSTEMS: CACHING

Lecture 33 143 VIRTUAL MEMORY SYSTEMS: TRANSLATION LOOK-ASIDE BUFFERING

Lecture 34 149 INPUT/OUTPUT SYSTEMS

Lecture 35 154 STORAGE SYSTEMS: BUS STRUCTURES

Lecture 36 160 STORAGE SYSTEMS: MAGNETIC -- HARD AND FLOPPY; OPTICAL

Lecture 37 165 RAID SYSTEMS

Lecture 38 170 DATA STORAGE SYSTEMS

Lecture 39 175

Lecture 40

File:2mf3toc.doc Date:12jun07/wfsp

Page 4: Course Lecture Notes for - McMaster Universitybruha/2mf3cs08.pdf · Course Lecture Notes . for . CS2MF3 . Digital Systems . and . Systems Programming . W.F.S.Poehlman . November,

Slide 1

cs2mf3/wfsp L01-1

CS2MF3 – Digital Systems and Systems Programming

What is computer organization and architecture?
Usually composed of a hardware part and a software part:

• Together these form the Computer System and operate as a unit
  – We shall study digital systems (all the main hardware parts), and
  – Systems programming (composed mainly of machine coding, but also including the operating system, system programs and high-level coding concepts)

Why study computer organization and architecture?

• Design better programs, including system software such as compilers, operating systems, and device drivers.

• Optimize program behavior.

• Evaluate (benchmark) computer system performance.

• Understand time, space, and price tradeoffs.

Slide 2


CS2MF3 – OVERVIEW I

Computer organization
Encompasses all physical aspects of computer systems.
E.g., circuit design, control signals, memory types.
How does a computer work?

Computer architecture
Logical aspects of system implementation as seen by the programmer.
E.g., instruction sets, instruction formats, data types, addressing modes.
How do I design a computer?

• For the Computer Scientist, this means we study the computer at the unit level (black box level).

• For the Computer Engineer, this means they study the computer at the individual component level (transistors and integrated circuits).

W.F.S.Poehlman CS2MF3 -- Digital Systems & Systems Programming Page 1


Slide 3


CS2MF3 – OVERVIEW II

Computer software
The Software Engineer studies how to design application solutions using programming systems and software tools, how to model the solutions so that formal logic proofs can show the design is correct, and how to test solution validity.
The Computer Scientist studies how to develop operating system software, new computer languages, handle embedded specialized processors, make better parallel computing systems, etc.

We will focus on computer Assembly Language
This is the lowest level of software possible.
Higher levels include languages such as "C", Pascal, etc.
Above these are Object Oriented programming systems.
Above these are complex Component-Based programming systems.

Slide 4


CS2MF3 – OVERVIEW III

High-level languages vs. Low-level languages

Language     PROs                                            CONs
-----------  ----------------------------------------------  ----------------------------------------------
Low-level    Very fast execution                             Harder to debug
             Low demand for resources                        Programs are very machine dependent
             Unrestricted access to all hardware features    Changes are difficult to make

High-level   Easily modifiable                               System size very large
             Error detection and program recovery good       Language may not be appropriate for solution
             Abstraction is at a high level                  Programmer constrained to language constructs


Slide 5


CS2MF3 – OVERVIEW IV
Why study machine assembly language?

• To do something that is impossible or awkward with a high-level language
• To speed up a slow program while it is being executed
• To create a program that is as small as possible (takes up very little room in computer memory)
• To learn the details of a particular microprocessor and its operating system
• For enjoyment (Yes, some people like to do this – me included!)

Principle of Equivalence of Hardware and Software:Anything that can be done with software can also be done with hardware, and anything that can be done with hardware can also be done with software.*

Slide 6


Basic Computer Components -- I

At its most primitive level, a computer consists of:

A processor to interpret and execute programs

A memory to store both data and programs

A mechanism for transferring data to and from the outside world; commonly termed I/O or Input/Output devices.

Another mechanism, usually termed "busses", that enables these units to communicate amongst themselves.


Slide 7


Basic Computer Components -- II

Most important among these is the processor, or more fully, the Central Processing Unit (CPU), which is, itself, composed of:

A control unit, or control stores
An Arithmetic and Logic Unit, or ALU
An internal memory, usually composed of registers, sometimes a register bank
A set of busses to interconnect these internal components.

Slide 8


Basic Computer Components -- III

Text sections: For this lecture see 1.1 – 1.3.

Next lecture: describing these components by specification.


Slide 1


CS2MF3 – Basic Computer Components
What are these parts for?

Recall from the last lecture that there are four main components that compose modern computers (as shown below in a slightly different form, one that focuses on data flow paths): CPU, MEMORY and I/O. This form of architecture for a computer is called a stored program computer, where a computer program (composed of a sequence of instructions) is stored in a place (called main memory) until it is needed for execution, which occurs in the Central Processing Unit or CPU.

• when either data input or result outputs are ready then these are handled by the I/O or Input/Output devices.

• Architecture like this is called von Neumann architecture – more later.

Slide 2


Basic Computer Components Exposed -- I

How do we specify them?
Usually by their most important characteristic, such as:

• CPU by its speed
• Memory by its size
• I/O devices by what they do and perhaps how fast they do it.

But computers are fast compared to humans, so typing 60 words a minute is slow for computers; in fact, even a slow CPU can execute one instruction every 1 millionth of a second. CPU speeds are specified by how many instructions they can execute in one second. Example: from above, that is 1 million instructions per second (the inverse ratio of the above). To talk such large numbers we must speak Greek! We use Greek prefixes to set the decimal point in the specification, such as 1 MHz for CPU speed, or 1 Million Instructions/Second, or cycles per second, which is what a Hertz (Hz) is.
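The reciprocal relationship between the time per instruction and the instruction rate can be sketched numerically (a minimal illustration; the function name is ours, not part of the notes):

```python
# Hypothetical sketch: a CPU's instruction rate is the reciprocal of its
# time per instruction. One instruction every millionth of a second
# gives 1 million instructions per second, as in the example above.

def instructions_per_second(seconds_per_instruction: float) -> float:
    """Instruction rate is the reciprocal of the time per instruction."""
    return 1.0 / seconds_per_instruction

rate = instructions_per_second(1e-6)      # one instruction per microsecond
print(f"{rate:,.0f} instructions/second") # 1,000,000 instructions/second
```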


Slide 3


Basic Computer Components Exposed -- II

Modern CPU speeds are better than 4 GHz (GigaHertz).
Modern memory sizes (integrated circuit form) are 1 MB (MegaBytes) -- more about bytes (= 8 bits) later.
I/O devices like printers are 10 pages/minute (fooled ya!), but servers store TeraBytes (TB) of information, now.

[Table: integer and fractional powers of ten and of two]

Slide 4


Basic Computer Components Exposed -- III
Seems easy, BUT why are powers of two listed in the table?
Consider in more detail measures of capacity and speed:

• Kilo- (K) = 1 thousand = 10^3 and 2^10

• Mega- (M) = 1 million = 10^6 and 2^20

• Giga- (G) = 1 billion = 10^9 and 2^30

• Tera- (T) = 1 trillion = 10^12 and 2^40

• Peta- (P) = 1 quadrillion = 10^15 and 2^50

• Exa- (E) = 1 quintillion = 10^18 and 2^60

• Zetta- (Z) = 1 sextillion = 10^21 and 2^70

• Yotta- (Y) = 1 septillion = 10^24 and 2^80

Each factor of three in the decimal system exponent is a factor of 10 in the powers-of-two radix system, and they are ALMOST equal.
Example – consider talking about memory sizes, where 1 byte = 8 bits:

1 KB = 10^3 bytes = thousand bytes ~ 2^10 = 1,024 bytes
1 MB = 10^6 bytes = million bytes ~ 2^20 = 1,048,576 bytes

NOTE
Whether a metric refers to a power of ten or a power of two typically depends upon what is being measured.
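The near-equality of the decimal and binary prefixes can be checked directly (a small sketch; the dictionary layout is ours, with values taken from the list above):

```python
# Sketch: compare the decimal (power-of-ten) and binary (power-of-two)
# readings of each prefix. The two values are close but never equal,
# and the gap widens as the exponent grows.

prefixes = {"K": (10**3, 2**10), "M": (10**6, 2**20),
            "G": (10**9, 2**30), "T": (10**12, 2**40)}

for name, (dec, binary) in prefixes.items():
    # e.g. 1 KB: 1,000 (decimal) vs 1,024 (binary), ratio 1.0240
    print(f"1 {name}B = {dec:,} (decimal) vs {binary:,} (binary), "
          f"ratio {binary / dec:.4f}")
```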


Slide 5


Basic Computer Components Exposed -- IV
Measures of time and space:

• Milli- (m) = 1 thousandth = 10^-3

• Micro- (µ) = 1 millionth = 10^-6

• Nano- (n) = 1 billionth = 10^-9

• Pico- (p) = 1 trillionth = 10^-12

• Femto- (f) = 1 quadrillionth = 10^-15

• Atto- (a) = 1 quintillionth = 10^-18

• Zepto- (z) = 1 sextillionth = 10^-21

• Yocto- (y) = 1 septillionth = 10^-24

Examples
Millisecond = 1 thousandth of a second
• Hard disk drive access times are often 10 to 20 milliseconds.
Nanosecond = 1 billionth of a second
• Main memory access times are often 50 to 70 nanoseconds.
Micron (micrometer) = 1 millionth of a meter
• Circuits on computer chips are measured in microns.

Slide 6


Basic Computer Components Exposed -- V
Other notes:

The cycle time is the reciprocal of the clock frequency.
Example:

• A bus operating at 133 MHz has a cycle time of 7.52 nanoseconds; that is, 1 / (133,000,000 cycles/second) = 7.52 nanoseconds/cycle, or ns/cycle.
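The cycle-time calculation can be sketched as a one-line reciprocal (an illustration only; the function name is ours):

```python
# Sketch: cycle time is the reciprocal of clock frequency.
# Multiplying by 1e9 converts seconds/cycle to nanoseconds/cycle.

def cycle_time_ns(frequency_hz: float) -> float:
    """Return the cycle time in nanoseconds for a given clock frequency."""
    return 1e9 / frequency_hz

# A 133 MHz bus has a cycle time of about 7.52 ns, matching the example above.
print(f"{cycle_time_ns(133e6):.2f} ns")  # 7.52 ns
```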

MEMORY
PRIMARY

• Computers with large main memory capacity can run larger programs with greater speed than computers having small memories.

• RAM is an acronym for random access memory. Random access means that memory contents can be accessed directly if its location is known.

• Cache is a type of temporary memory that can be accessed faster than RAM.


Slide 7


Basic Computer Components Exposed -- VI
MEMORY -- continued

SECONDARY

• Usually means magnetic or hard disk storage systems; we shall look at these in more detail later in the first half of the course

• Disk storage is measured in GB (gigabytes) for small systems, and TB (terabytes) for large systems.

Input/Output or I/O Devices
These devices are hooked to various types of ports that allow one or more devices to send or receive information to or from the computer

• Serial ports send data as a series of pulses along one or two data lines.

• Parallel ports send data as a single pulse along at least eight data lines.

• USB, Universal Serial Bus, is an intelligent serial interface that is self-configuring. (It supports “plug and play.”)

• More on the distinction between PORTS and BUSses later on in the course

Slide 8


Basic Computer Components Exposed -- VII

All these units are fine, but how did they come about? To communicate these very small or very large numbers we all need to understand their meaning. This is the job of STANDARDS ORGANIZATIONS, such as:

• The Institute of Electrical and Electronics Engineers (IEEE)
  – Promotes the interests of the worldwide electrical engineering community.
  – Establishes standards for computer components, data representation, and signaling protocols, among many other things.

• The International Telecommunications Union (ITU), formerly CCITT
  – Concerns itself with the interoperability of telecommunications systems, including data communications and telephony.

• National groups establish standards within their respective countries:
  – The American National Standards Institute (ANSI)
  – The British Standards Institution (BSI)


Slide 9


Basic Computer Components Exposed -- VIII

STANDARDS ORGANIZATIONS -- continued
The International Organization for Standardization (ISO)

• Establishes worldwide standards for everything from screw threads to photographic film.

• Is influential in formulating standards for computer hardware and software, including their methods of manufacture.

Text sections: For this lecture see 1.4, 1.7.

Next lecture:
Describing historical computers leading to modern ones.
Describing von Neumann and non-von Neumann types.


CS2MF3 – Digital Systems and Systems Programming
Historical Considerations (predominantly based on hardware evolution)

The evolution of computing machinery has taken place over several centuries.

In modern times computer evolution is usually classified into four generations according to the salient technology of the era.

Generation Zero -- Mechanical Calculating Machines (1642 - 1945)

• Calculating Clock - Wilhelm Schickard (1592 - 1635).
• Pascaline - Blaise Pascal (1623 - 1662).
• Difference Engine - Charles Babbage (1791 - 1871), who also designed but never built the Analytical Engine.
• Punched card tabulating machine – Herman Hollerith (1860 - 1929).

• The Hollerith Card, shown at the lower left, onto which a single line of a computer program source code was typed and then input into the computer by a card reader.

Historical Considerations -- I

Generation Zero – the Hollerith tabulating machine (the card punch machine)

NOTE: in order for the computer to read the information on the card, it could not be bent, spindled, stapled or mutilated in any way; and, by the way, Hollerith cards were commonly used for computer input well into the 1970s.


Historical Considerations -- II

The First Generation: Vacuum Tube Computers (1945 - 1953) -- continued

Atanasoff Berry Computer, ABC (1937 - 1938), solved systems of linear equations using the first truly completely electronic (specialized) computer.

• Developed by John Atanasoff and Clifford Berry of Iowa State University.

• The ENIAC was the first general-purpose computer.

• Electronic Numerical Integrator and Computer (ENIAC)

• Developed by John Mauchly and J. Presper Eckert at the University of Pennsylvania in 1946.

Historical Considerations -- III

The First Generation: Vacuum Tube Computers (1945 - 1953) -- continued

The IBM 650 was the first mass-produced computer (1955).
• It was phased out in 1969.

Other major computer manufacturers of this period include UNIVAC, Engineering Research Associates (ERA), and Computer Research Corporation (CRC).

° UNIVAC and ERA were bought by Remington Rand, the ancestor of the Unisys Corporation.

° CRC was bought by the Underwood (typewriter) Corporation, which left the computer business.

The first major computer in Canada was built at Chalk River in 1949 and contained 1,000 tubes, of which several would burn out per day; a technician, whose sole job was to carry a wheelbarrow full of various tubes, would find and replace them when they failed every day.


Historical Considerations -- IV

The Second Generation: Transistorized Computers (1954 - 1965)

Mainframes (most cost millions of dollars)
• IBM 7094 (scientific) and 1401 (business)
• Univac 1100 (a stack machine)
• Control Data Corporation 1604 (a multi-CPU machine)
• . . . and many others.

Minicomputers
• Digital Equipment Corporation (DEC) PDP-1, where PDP represents Programmed Data Processor; it cost about $150,000

Historical Considerations -- V

The Third Generation: Integrated Circuit (IC) Computers (1965 - 1980) – many transistors on a chip (IC)

General purpose mainframes (millions)
• IBM 360

General purpose minicomputers (hundreds of thousands)
• DEC PDP-8 and PDP-11

Cray-1 supercomputer (multi-millions)
. . . and many others.

By this time, IBM had gained overwhelming dominance in the industry.

Computer manufacturers of this era were characterized as IBM and the BUNCH (Burroughs, UNIVAC, NCR, Control Data, and Honeywell).


Historical Considerations -- VI

The Fourth Generation: VLSI Computers (1980 - ????) – millions of transistors on a chip

Very large scale integrated circuits (VLSI) have more than 10,000 components (gates, flip/flops, etc.) per chip.
Enabled the creation of microprocessors.
The first was the 4-bit Intel 4004.
Later versions, such as the 8080, 8086, and 8088, spawned the idea of "personal computing."
Lately, Intel now puts two CPUs on a single chip (dual core technology, as Moore's law takes effect) and the operating system is able to keep two processors running in parallel through the use of multi-threading (more than one thread of control) software technology.

Historical Considerations -- VI

Modern Systems
Moore's Law (1965)

• Gordon Moore, Intel founder

• "The density of transistors in an integrated circuit will double every year."

Contemporary version:
• "The density of silicon chips doubles every 18 months."

Everyone expects this Law to fail sometime, but Intel keeps providing new ideas. The latest, as mentioned above, is dual core processing chips.

[Figure: transistor density over time, progressing from tube, to transistor, to small scale integrated circuit (IC), to very large scale integrated circuit (IC)]


Historical Considerations -- VII

Text sections: For this lecture see 1.4, 1.7.

Next lecture:
Software considerations.
von Neumann architecture via software view.

A Word About Assignment Submissions in CS 2MF3
WebCT will only be used for discussion sessions and a calendar of events for the class work (including due dates of assignments).
The course web site (www.cas.mcmaster.ca/~cs2mf3) will be used for lecture and assignment notices as well as student download of assignment materials, when ready.
Assignment solutions must be printed and then submitted in sealed envelopes, bearing the student name, number and course (CS2MF3) name on the outside, into an appropriately labelled slot box on the outside wall of the CAS DIC (Drop-in-Centre – ITB-101) BEFORE the DUE DATE.
Only in special cases (approved by the instructor or TA) can electronic submissions be made to WebCT (such as lateness, excused illnesses and on-campus availability).


Slide 1


CS2MF3 – Digital Systems and Systems Programming

Computer Software Considerations
Software, unlike hardware, has had no clear improvements over the past decades, except to become more complicated.
Software Engineers are only now just becoming professionals, so this may change in time.
So we shall just detail the current state of affairs in modern software.

• We do know that before a computer can do anything worthwhile, it must use software in concert with the hardware (or be a system)

• Writing complex programs requires a “divide and conquer” approach, where each program module solves a smaller problem.

• Complex computer systems employ a similar technique through a series of virtual machine layers.

• Ultimately, the lowest level of computer software (called machine code, or the executable or the binary) will control the hardware which, in turn, will finally do the work required.

• How we get to this level is one of the major subjects of concern to the discipline of Computer Science.

Slide 2


Computer Software Considerations -- I

Computer level hierarchy

Each virtual machine layer is an abstraction of the level below it. The machines at each level execute their own particular instructions, calling upon machines at lower levels to perform tasks as required. Usually the Operating System provides tools to migrate from the upper to the lower levels, often with only minor interaction of the user.


Slide 3


Computer Software Considerations -- II

Level 6: The User Level
Program execution and user interface level. The level with which we are most familiar.

Level 5: High-Level Language Level
The level with which we interact when we write programs in languages such as C, Pascal, Lisp, and Java.

Level 4: Assembly Language Level
Acts upon assembly language produced from Level 5, as well as instructions programmed directly at this level.

Level 3: System Software Level
Controls executing processes on the system. Protects system resources. Assembly language instructions often pass through Level 3 without modification.

Slide 4


Computer Software Considerations -- III

Level 2: Machine Level
Also known as the Instruction Set Architecture (ISA) Level.

Consists of instructions that are particular to the architecture of the machine.

Programs written in machine language need no compilers, interpreters, or assemblers.

Level 1: Control Level
A control unit decodes and executes instructions and moves data through the system.
Control units can be microprogrammed or hardwired. A microprogram is a program written in a low-level language that is implemented by the hardware. (More on this in the second half.)
Hardwired control units consist of hardware that directly executes machine instructions.


Slide 5


Computer Software Considerations -- IV

Level 0: Digital Logic Level
This level is where we find digital circuits (the chips).
Digital circuits consist of gates and wires.
These components implement the mathematical logic of all other levels.

EXAMPLE:
On the ENIAC, all programming was done at the digital logic level.

Programming the computer involved moving plugs and wires, so configuring the ENIAC to solve a "simple" problem required many days' labor by skilled technicians.

A different hardware configuration was needed to solve every unique problem type.

Slide 6


Computer Software Considerations -- V
More on the von Neumann Model for Architecture

Inventors of the ENIAC, John Mauchly and J. Presper Eckert, conceived of a computer that could store instructions in memory.

The invention of this idea has since been ascribed to a mathematician, John von Neumann, who was a contemporary of Mauchly and Eckert.

Stored-program computers have become known as von Neumann Architecture systems, as shown at the lower left of this slide.


Slide 7


Computer Software Considerations -- VI
Modern stored-program computers have the following characteristics:

Three hardware systems:
• A central processing unit (CPU)
• A main memory system
• An I/O system

The capacity to carry out sequential instruction processing.
A single data path between the CPU and main memory.

• This single path is known as the von Neumann bottleneck.

These computers employ a fetch-decode-execute cycle to run programs as follows:

• The control unit fetches the next instruction from memory using the program counter to determine where the instruction is located.

• The instruction is decoded into a language that the ALU can understand.

• Any data operands required to execute the instruction are fetched from memory and placed into registers within the CPU.

• The ALU executes the instruction and places results in registers or memory.
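The fetch-decode-execute steps above can be sketched as a toy interpreter. This is a hypothetical illustration only: the opcodes, the (opcode, operand) encoding, and the memory layout are invented here and are not MARIE's actual instruction set.

```python
# Toy fetch-decode-execute loop (hypothetical ISA for illustration).
# Memory holds both the program and its data, as in a stored-program machine.

memory = [("LOAD", 5), ("ADD", 6), ("HALT", 0),  # program
          0, 0, 10, 32]                          # data at addresses 5 and 6

pc = 0            # program counter: address of the next instruction
acc = 0           # accumulator register
running = True

while running:
    opcode, operand = memory[pc]   # FETCH the instruction at the PC
    pc += 1                        # advance the PC to the next instruction
    if opcode == "LOAD":           # DECODE the opcode, then EXECUTE:
        acc = memory[operand]      #   fetch the data operand from memory
    elif opcode == "ADD":
        acc += memory[operand]     #   the "ALU" adds and stores the result
    elif opcode == "HALT":
        running = False

print(acc)  # 10 + 32 = 42
```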

Slide 8


Computer Software Considerations -- VII
Stored-program computers (von Neumann type)

Conventional stored-program computers have undergone many incremental improvements over the years. These improvements include adding specialized buses, floating-point units, and cache memories, to name only a few. But enormous improvements in computational power require departure from the classic von Neumann architecture. Adding processors is one approach.

In the late 1960s, high-performance computer systems were equipped with dual processors to increase computational throughput.

In the 1970s supercomputer systems were introduced with 32 processors.

Supercomputers with 1,000 processors were built in the 1980s.

In 1999, IBM announced its Blue Gene system containing over one million processors.


Slide 9


Computer Software Considerations -- VIII
Von Neumann Architectural Computers

Parallel processing is only one method of providing increased computational power.

More radical systems have reinvented the fundamental concepts of computation.

These advanced systems include genetic computers, quantum computers, and dataflow systems.

At this point, it is unclear whether any of these systems will provide the basis for the next generation of computers.

In the second half of this course we will immerse ourselves in Assembly Language coding, following its path to executable statements called machine code or binary, so many of the concepts above will become crystal clear.

Text Sections: 1.6-1.7.

Next Lecture: Data Representations.


Slide 1


CS2MF3 – Digital Systems and Systems Programming

Data Representation in Computers
As we shall see when we take a closer look at how computers are built a few weeks from now, computers can only work with numbers that are constructed from ones and zeros. Even alphabetic letters are really patterns of 1's and 0's to a computer. A bit is therefore the most basic unit of information in a computer.

• It is a state of "on" or "off" in a digital circuit.
• Sometimes these states are "high" or "low" voltage instead of "on" or "off."

A byte is a group of eight bits.
• A byte is the smallest possible addressable unit of computer storage.
• The term "addressable" means that a particular byte can be retrieved according to its location in memory.

A group of four bits is called a nibble (or nybble).
• Bytes, therefore, consist of two nibbles: a "high-order" nibble and a "low-order" nibble.

Slide 2

Data Representation in Computers -- I

Nomenclature
A word is a contiguous group of bytes.

• Words can be any number of bits or bytes.

• Word sizes of 16, 32, or 64 bits are most common.

• In a word-addressable system, a word is the smallest addressable unit of storage.

Bytes store numbers using the position of each bit to represent a power of 2.

The binary system is also called the base-2 system.

Our decimal system is the base-10 system. It uses powers of 10 for each position in a number.

Any integer quantity can be represented exactly using any base (or radix).

Slide 3

Data Representation in Computers -- II
Radix 10 (decimal system)

The decimal number 947 in powers of 10 is:

9 × 10² + 4 × 10¹ + 7 × 10⁰

The decimal number 5836.47 in powers of 10 is:

5 × 10³ + 8 × 10² + 3 × 10¹ + 6 × 10⁰ + 4 × 10⁻¹ + 7 × 10⁻²

Radix 2 (binary system)
The binary number 11001 in powers of 2 is:

1 × 2⁴ + 1 × 2³ + 0 × 2² + 0 × 2¹ + 1 × 2⁰ = 16 + 8 + 0 + 0 + 1 = 25

When the radix of a number is something other than 10, the base is denoted by a subscript.
• Sometimes, the subscript 10 is added for emphasis: 11001₂ = 25₁₀

Slide 4

Decimal to Binary Conversions -- I
Because binary numbers are the basis for all data representation in digital computer systems, it is important that you become proficient with this radix system.

Your knowledge of the binary numbering system will enable you to understand the operation of all computer components as well as the design of instruction set architectures.

In an earlier slide, we said that every integer value can be represented exactly using any radix system.

INTEGERS (decimal to other radix conversions)
You can use either of two methods for radix conversion: the subtraction method and the division-remainder method.

The subtraction method is more intuitive, but cumbersome. It does, however, reinforce the ideas behind radix mathematics.

Slide 5

Decimal to Binary Conversions -- II
INTEGER subtraction method:

Suppose we want to convert the decimal number 190 to base 3.

• We know that 3⁵ = 243, so our result will be less than six digits wide. The largest power of 3 that we need is therefore 3⁴ = 81, and 81 × 2 = 162.

• Write down the 2 and subtract 162 from 190, giving 28.

• The next power of 3 is 3³ = 27. We'll need one of these, so we subtract 27 and write down the numeral 1 in our result.

• The next power of 3, 3² = 9, is too large, but we have to assign a placeholder of zero and carry down the 1.

• 3¹ = 3 is again too large, so we assign a zero placeholder.

• The last power of 3, 3⁰ = 1, is our last choice, and it gives us a difference of zero.

• Our result, reading from top to bottom is:

190₁₀ = 21001₃

Slide 6

Decimal to Binary Conversions -- III

INTEGER division method:
Another method of converting integers from decimal to some other radix uses division.

This method is mechanical and easy.

It employs the idea that successive division by a base is equivalent to successive subtraction by powers of the base.

We will use the division remainder method to again convert 190 in decimal to base 3.

First we take the number that we wish to convert and divide it by the radix in which we want to express our result.

In this case, 3 divides 190 63 times, with a remainder of 1.

Record the quotient and the remainder.

Slide 7

Decimal to Binary Conversions -- IV
INTEGER division method – continued:

63 is evenly divisible by 3.

Our remainder is zero, and the quotient is 21.

Continue in this way until the quotient is zero.

In the final calculation, we note that 3 divides 2 zero times with a remainder of 2.

Our result, reading from bottom to top is:

190₁₀ = 21001₃
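The division-remainder procedure is mechanical enough to sketch in a few lines of Python (the function name `to_radix` is ours, for illustration only):

```python
def to_radix(n, radix):
    """Convert a non-negative integer to a digit string in the given
    radix (2..16) by successive division, collecting the remainders."""
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    result = ""
    while n > 0:
        n, r = divmod(n, radix)      # quotient and remainder at each step
        result = digits[r] + result  # remainders are read from bottom to top
    return result
```

For example, `to_radix(190, 3)` reproduces the 21001 result above, and `to_radix(25, 2)` gives the 11001 from the earlier slide.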

Other RADIX number systems
As a slight digression, we shall look at two common bases that humans use with computers. They are
• OCTAL or base 8, and
• HEXADECIMAL or base 16.

Slide 8

Decimal to Binary Conversions -- V

Radix and common alternative number systems

Note how extra characters are needed in hexadecimal and not all numerals are needed in octal. This is because base 2 (binary) and base 8 (octal) are less than base 10 (decimal), which in turn is less than base 16 (hexadecimal = hex(6) + decimal(10) = 16).

Decimal | Binary | Octal | Hexadecimal
   0    |      0 |   0   | 0
   1    |      1 |   1   | 1
   2    |     10 |   2   | 2
   3    |     11 |   3   | 3
   4    |    100 |   4   | 4
   5    |    101 |   5   | 5
   6    |    110 |   6   | 6
   7    |    111 |   7   | 7
   8    |   1000 |  10   | 8
   9    |   1001 |  11   | 9
  10    |   1010 |  12   | A
  11    |   1011 |  13   | B
  12    |   1100 |  14   | C
  13    |   1101 |  15   | D
  14    |   1110 |  16   | E
  15    |   1111 |  17   | F
  16    |  10000 |  20   | 10
  17    |  10001 |  21   | 11

Slide 9

Decimal to Binary Conversions -- VI

Rule of numerals
A numeral can be represented as shown below, where dₙ is the nth digit and r is called the radix. Therefore we can say that

• r = 2 for binary numerals
• r = 8 for octal numerals
• r = 10 for decimal numerals
• r = 16 for hexadecimal (hex) numerals
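The positional rule dₙrⁿ + … + d₁r¹ + d₀r⁰ can be evaluated mechanically with Horner's rule; a small Python sketch (the function name is ours):

```python
def numeral_value(digits, r):
    """Evaluate d_n*r^n + ... + d_1*r^1 + d_0*r^0 for a sequence of
    digits given most significant first, using Horner's rule."""
    value = 0
    for d in digits:
        value = value * r + d  # one multiply and one add per digit
    return value
```

For instance, `numeral_value([1, 1, 0, 0, 1], 2)` recovers 25, and `numeral_value([2, 1, 0, 0, 1], 3)` recovers 190.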

Text Sections: 2.1-2.3.

Next Lecture: More on binary representations, both integer and fractional.

dₙ × rⁿ + dₙ₋₁ × rⁿ⁻¹ + ... + d₁ × r¹ + d₀ × r⁰

Slide 1

CS2MF3 – Digital Systems and Systems Programming
Decimal to Binary Conversions – continued

More words about binary number representations in computers
• The binary numbering system is the most important radix system for digital computers, as they use electronic components that can hold only binary digits, that is, one or zero, OR true/false, OR high/low.

• However, it is difficult to read long strings of binary numbers, and even a modestly sized decimal number becomes a very long binary number.

  – For example: 11010100011011₂ = 13595₁₀

• For compactness and ease of reading, binary values are usually expressed using either the octal (base-8) or the hexadecimal (base-16) numbering system.

• We have seen examples of both these systems in the previous lecture.

Slide 2

Decimal to Binary Conversions -- continued
Alternate Numbering Systems – continued

The hexadecimal numbering system uses the numerals 0 through 9 and the letters A through F.

• The decimal number 12 is C₁₆.
• The decimal number 26 is 1A₁₆.

It is easy to convert between base 16 and base 2, because 16 = 2⁴. Thus, to convert from binary to hexadecimal, all we need to do is group the binary digits into groups of four.

Using groups of four bits, the binary number 11010100011011₂ (= 13595₁₀) in hexadecimal is 0011 0101 0001 1011, that is, 351B₁₆.

Octal (base 8) values are derived from binary by using groups of three bits (8 = 2³): 011 010 100 011 011 gives 32433₈.
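Both groupings can be sketched in Python (a hedged illustration; `group_convert` is our own name, not a library routine). A group size of 4 yields hexadecimal, a group size of 3 yields octal:

```python
def group_convert(bits, group):
    """Convert a binary string to hex (group=4) or octal (group=3)
    by padding on the left and reading fixed-size groups of bits."""
    digits = "0123456789ABCDEF"
    width = (len(bits) + group - 1) // group * group  # round up to a multiple
    bits = bits.zfill(width)                          # pad with leading zeros
    return "".join(digits[int(bits[i:i + group], 2)]
                   for i in range(0, len(bits), group))
```

So `group_convert("11010100011011", 4)` gives 351B, and the 3-bit grouping gives 32433, the octal form of the same value.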

Slide 3

Decimal to Binary Conversions -- I
FRACTIONAL methods:

Fractional values can be approximated in all base systems.

Unlike integer values, fractions do not necessarily have exact representations under all radices. The quantity ½ is exactly representable in the binary and decimal systems, but it is not in the ternary (base 3) numbering system.

Fractional decimal values have nonzero digits to the right of the decimal point.

Fractional values of other radix systems have nonzero digits to the right of the radix point. NOTE: we are more used to saying to the right of the "decimal" point, but that assumes a base 10 numbering system, which is reasonable, until we speak of computers, where other radices are possible, such as binary or base two. The more correct and generic form is therefore "radix" point.

Slide 4

Decimal to Binary Conversions -- II
FRACTIONAL methods -- continued:

Numerals to the right of a radix point represent negative powers of the radix:

0.47₁₀ = 4 × 10⁻¹ + 7 × 10⁻²

0.11₂ = 1 × 2⁻¹ + 1 × 2⁻² = ½ + ¼ = 0.5 + 0.25 = 0.75

As with whole-number conversions, it is possible to use either of two methods: a subtraction method and an easy multiplication method.

The subtraction method for fractions is identical to the subtraction method for whole numbers. Instead of subtracting positive powers of the target radix, we subtract negative powers of the radix.

Slide 5

Decimal to Binary Conversions -- III
FRACTIONAL methods -- Subtraction:

Start with the largest value first, n⁻¹, where n is the radix, and then work along using larger negative exponents.
The calculation to the right is an example of using the subtraction method to convert the decimal 0.8125 to binary.
• Our result, reading from top to bottom is:

0.8125₁₀ = 0.1101₂

• Of course, this method works with any base, not just binary.

Multiplication Method:
Using the multiplication method to convert the decimal 0.8125 to binary, multiply by the radix 2.

The first product carries into the units place.

Slide 6

Decimal to Binary Conversions -- IV

FRACTIONAL methods – Multiplication:
Ignoring the value in the units place at each step, continue multiplying each fractional part by the radix.
The calculation is complete when the product is zero, or until the desired number of binary places has been reached.
The result, reading from top to bottom is:

0.8125₁₀ = 0.1101₂

This method also works with any base. Simply use the target radix as the multiplier.
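The multiplication method translates directly into code. A Python sketch for radix 2 (our own helper, which stops after a fixed number of places to guard against non-terminating fractions):

```python
def frac_to_binary(frac, places=8):
    """Convert a decimal fraction in [0, 1) to a binary string using the
    multiplication method: multiply by 2 and peel off the units digit."""
    bits = ""
    while frac != 0 and len(bits) < places:
        frac *= 2
        bits += "1" if frac >= 1 else "0"  # the digit carried into the units place
        frac -= int(frac)                  # ignore the units place and continue
    return "0." + (bits or "0")
```

`frac_to_binary(0.8125)` reproduces the 0.1101 result above.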

Slide 7

Decimal to Binary Conversions -- V

Negative numbers and radix
The conversions so far presented have involved only positive numbers.
EXAMPLE:

• Consider a car on a road that has a trip odometer that shows 000000.
• If the car moves forward a kilometer, the odometer reads 000001.

• But if the odometer is reset to zero and the car moves backwards one kilometer, then the odometer reads 000000 - 1 = 999999.

• Looking at this more closely, 999999 + 1 = 000000, where there has been a "carry out" from the leftmost (MSB) digit position.

• PROBLEM:
  – Does 999999 represent -1, or does it represent a million minus 1?
  – That is, is this an unsigned number (10⁶ - 1)?
  – OR, is this a signed, negative number (-1)?

• It could be either, but the computer will know which, according to rules we will establish in the next lecture.

Slide 8

Decimal to Binary Conversions -- VI

Text Sections: 2.3 and first part of 2.4.

Next Lecture: More on negative binary representations, and then arithmetical operations.

Binary   | Decimal | Octal | Hexadecimal
01111111 |   127   |  177  | 7F
    …    |    …    |   …   | …
01000000 |    64   |  100  | 40
    …    |    …    |   …   | …
00000011 |     3   |    3  | 03
00000010 |     2   |    2  | 02
00000001 |     1   |    1  | 01
00000000 |     0   |    0  | 00
11111111 |    -1   |  377  | FF
11111110 |    -2   |  376  | FE
11111101 |    -3   |  375  | FD
    …    |    …    |   …   | …
11000000 |   -64   |  300  | C0
    …    |    …    |   …   | …
10000000 |  -128   |  200  | 80

Slide 1

CS2MF3 – Digital Systems and Systems Programming
Signed Integer Representation

To represent negative values, computer systems allocate the high-order bit to indicate the sign of a value.

• The high-order bit is the leftmost bit in a byte. It is also called the most significant bit or MSB.

The remaining bits contain the value of the number.
To make a decimal number negative, we simply place a minus sign in front of it; a computer cannot do so unless we allocate the MSB to be the minus sign and take it away from being part of the significant figures of the number.
However, there is more to this, as we saw at the end of the last lecture. There are really three ways we COULD represent negative numbers in a computer:

• Signed magnitude,
• One's complement, and
• Two's complement.

In an 8-bit word, signed magnitude representation places the absolute value of the number in the 7 bits to the right of the sign bit.

Slide 2

Signed Integer Representation -- I
Negative numbers (signed magnitude convention) – cont'd

For example, in 8-bit signed magnitude,
• Positive 3 is: 00000011
• Negative 3 is: 10000011

Computers perform arithmetic operations on signed magnitude numbers in much the same way as humans carry out pencil and paper arithmetic.

• Humans often ignore the signs of the operands while performing a calculation, applying the appropriate sign after the calculation is complete.

Binary addition is easy, as there are only four rules:

0 + 0 = 0        0 + 1 = 1
1 + 0 = 1        1 + 1 = 10 (note the carry to the left)

The simplicity of this system makes it possible for digital circuits to carry out arithmetic operations. These circuits will be described in a later lecture.

Slide 3

Signed Integer Representation -- II
Negative numbers (signed magnitude convention) – cont'd

Addition Example:
• Using signed magnitude binary arithmetic, find the sum of 75 and 46.

First, convert 75 and 46 to binary, and arrange them as a sum, but separate the (positive) sign bits from the magnitude bits:

  0 1001011    (+75)
+ 0 0101110    (+46)
-----------
  0 1111001    (+121)

Just as in decimal arithmetic, find the sum starting with the rightmost bit and work left. In the second bit, there is a carry, so note it above the third bit. The third and fourth bits also result in carries. Once all eight bits have been completed, we are done.
In this example, two values were chosen whose sum would fit into seven bits. If that is not the case, there is a problem.

Slide 4

Signed Integer Representation - III
Negative numbers – signed magnitude convention – cont'd

Problem Example:
• Using signed magnitude binary arithmetic, find the sum of 107 and 46.
• Since the carry from the seventh bit cannot be represented in the given number of allowed bits, we say that the carry overflows and is discarded, giving us the erroneous result: 107 + 46 = 25.

  0 1101011    (+107)
+ 0 0101110    (+46)
-----------
  0 0011001    (+25, with the carry out of the magnitude lost)

Addition of two negative numbers – Example:
• The signs in signed magnitude representation work just like the signs in pencil and paper arithmetic.
• Using signed magnitude binary arithmetic, find the sum of -46 and -25.
• Because the signs are the same, all that is necessary is to add the numbers and supply the negative sign when complete:

  1 0101110    (-46)
+ 1 0011001    (-25)
-----------
  1 1000111    (-71)

Slide 5

Signed Integer Representation - IV
Negative numbers (signed magnitude convention) – cont'd

Operational conclusions:
• Signed magnitude representation is easy for people to understand, but it requires complicated computer hardware.

• Another disadvantage of signed magnitude is that it allows two different representations for zero:
  – positive zero and negative zero
  – +0 is 0 0000000
  – -0 is 1 0000000

• For these reasons (among others) computer systems employ complement systems for numeric value representation.

• As we have mentioned previously, they are
  – One's complement, and
  – Two's complement.

Slide 6

Signed Integer Representation - V
Negative numbers (one's complement convention)

In complement systems, negative values are represented by some difference between a number and its base.
In diminished radix complement systems, a negative value is given by the difference between the absolute value of a number and one less than its base.
In the binary system, this is called one's complement. It amounts to flipping the bits of a positive binary number to obtain its negative value.
An 8-bit one's complement example:

• +3 is 00000011
• -3 is 11111100

In one's complement, as with signed magnitude, negative values are indicated by a 1 in the high-order bit.
Complement systems are useful because they eliminate the need for subtraction. The difference of two values is found by adding the minuend to the complement of the subtrahend, which is easy to do electronically.

Slide 7

Signed Integer Representation - VI
Negative numbers (one's complement convention) – cont'd

With one's complement addition, the carry bit is "carried around" and added to the sum.

• Example: Using one's complement binary arithmetic, find the sum of 48 and -19.

• Note that 19 in one's complement is 00010011, so -19 in one's complement is 11101100.

  00110000    (+48)
+ 11101100    (-19)
----------
1 00011100
        + 1   (end-around carry)
----------
  00011101    (+29)

Although the "end around carry" adds some complexity, one's complement is simpler to implement than signed magnitude.
But it still has the disadvantage of having two different representations for zero: positive zero and negative zero.
Two's complement solves this problem.
Two's complement is the radix complement of the binary numbering system.

Slide 8

Signed Integer Representation - VII

Negative numbers
ones and twos complement convention

Text Sections: 2.4
Next Lecture: Twos complement representation

Slide 1

CS2MF3 – Digital Systems and Systems Programming

More on Signed Integer Representation
Negative numbers (twos complement convention)

• Recall that two's complement is the radix complement of the binary numbering system.
• To express a value in two's complement:
  – If the number is positive, convert it to binary.
  – If the number is negative, find the one's complement of the number and then add 1.
• Examples:
• (1) -3
  – In 8-bit one's complement, positive 3 is: 00000011
  – Negative 3 in one's complement is: 11111100
  – Adding 1 gives us -3 in two's complement form: 11111101.
• (2) -2
  – In 8-bit one's complement, positive 2 is: 00000010
  – Negative 2 in one's complement is: 11111101
  – Adding 1 gives us -2 in two's complement form: 11111110.
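The flip-then-add-one recipe is easy to express in code. A Python sketch (the helper names are ours) that works on an 8-bit word by default and returns the resulting bit pattern as an unsigned integer:

```python
def ones_complement(n, bits=8):
    """One's complement of a non-negative value: flip every bit of the word."""
    return ((1 << bits) - 1) ^ n

def twos_complement(n, bits=8):
    """Two's complement bit pattern of n: for negative n, take the one's
    complement of |n| and add 1, wrapping to the word size."""
    if n >= 0:
        return n
    return (ones_complement(-n, bits) + 1) % (1 << bits)
```

`format(twos_complement(-3), "08b")` reproduces the 11111101 pattern from example (1) above.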

Slide 2

CS2MF3 – Digital Systems and Systems Programming
More on Signed Integer Representation -- continued

Negative numbers (twos complement convention)
• More Examples:
• (1) -1
  – In 8-bit one's complement, positive 1 is: 00000001
  – Negative 1 in one's complement is: 11111110
  – Adding 1 gives us -1 in two's complement form: 11111111.
• (0) 0
  – AND NOTE that +0 is 00000000
  – AND NOTE that -0 in one's complement is 11111111
  – AND NOTE that -0 in two's complement is 00000000
  – HOORAY! Only one representation for ZERO (0)
• To work with the addition of two numbers in two's complement (or two's complement subtraction), when one of the numbers is negative, do the following:
  – add our two binary numbers, and
  – discard any carry emitting from the high-order bit.
  – An example follows …

Slide 3

More on Signed Integer Representation - I
Negative numbers (twos complement convention) – cont'd

Example of two's complement subtraction (and therefore addition):
– Using two's complement binary arithmetic, find the sum of 48 and -19.
• 19 in one's complement is: 00010011, so
• -19 in one's complement is: 11101100, and
• -19 in two's complement is: 11101101.
• Add the two binary numbers, and discard any carry emitting from the high-order bit:

  00110000    (+48)
+ 11101101    (-19)
----------
1 00011101 → discard the carry → 00011101 (= +29)

PROBLEM: With the finite word size used in computers, the result of an arithmetic operation may be too big to be represented in the number of bits the computer uses; that is, the result of a calculation becomes too large to be stored in the computer. This condition is known as an OVERFLOW CONDITION.
More on this coming up on the next slide …

Slide 4

More on Signed Integer Representation - II
Negative numbers (twos complement convention) – cont'd

When a finite number of bits is used to represent a number, there is always the risk of overflow.

While overflow cannot always be prevented, it can always be detected.

In complement arithmetic, an overflow condition is easy to detect.
Example:

• Using two's complement binary arithmetic, find the sum of 107 and 46.

  01101011    (+107)
+ 00101110    (+46)
----------
  10011001    (= -103 in two's complement)

Notice that the nonzero carry from the seventh bit overflows into the sign bit, giving us the erroneous result: 107 + 46 = -103.
Rule for detecting signed two's complement overflow: When the "carry in" and the "carry out" of the sign bit differ, overflow has occurred.

Slide 5

More on Signed Integer Representation - III

Negative numbers (twos complement convention) – cont'd

Signed and unsigned numbers are both useful.
• For example, memory addresses are always unsigned.

Using the same number of bits, unsigned integers can express twice as many values as signed numbers.
Trouble arises if an unsigned value "wraps around."

• In four bits: 1111 + 1 = 0000.
Good programmers stay alert for this kind of problem.
Remember the odometer example from several lectures ago.
The question arises: are there better ways to do arithmetic, particularly ways more suitable to assembly language programming, where bit manipulation is much easier to do than in high-level programs?
As we shall see, the answer is a big YES!

Slide 6

More on Signed Integer Representation - IV
Negative numbers (twos complement convention) – cont'd

Research into finding better arithmetic algorithms has continued for over 50 years. One interesting product of this work is Booth's algorithm.
In most cases, Booth's algorithm carries out multiplication faster and more accurately than naïve pencil-and-paper methods.
The general idea is to replace arithmetic operations with bit shifting to the extent possible.
In Booth's algorithm, the first 1 in a string of 1s in the multiplier is replaced with a subtraction of the multiplicand.
LEFT shift the partial sums, one digit for each sequential 1 that follows, until the last 1 in the string is detected (1 in this case).
Then add the multiplicand.

      0011        multiplicand (3)
    × 0110        multiplier (6)
    --------
  +   0000        from the 1st digit, a 0, in the multiplier
  -  0011_        first 1 in the string: subtract the multiplicand (or add 1101, its two's complement)
  +  0000__       middle of the string of 1s: shift only
  + 0011___       the string of 1s has ended: add the multiplicand
    --------
    00010010      product (= 18)

Slide 7

More on Signed Integer Representation - V
Negative numbers (twos complement convention) – cont'd

The following is a larger example:
• Assume 8 bits of significant figures, so the result will be 16 bits long; that is, n = 8.

• The first multiplier digit is 0.

• The first 1 of the multiplier follows.

• There are 5 sequential 1's following the first one, so 5 left shifts are needed.

• Add the multiplicand.

• Ignore all bits over 2n.

      00110101            multiplicand (53)
    × 01111110            multiplier (126)
    ----------------
  + 0000000000000000      multiplier bit 0 is 0
  + 1111111110010110      first 1: subtract the multiplicand shifted left 1 (shown in two's complement)
                          (bits 2 through 6 are the middle of the string of 1s: shift only)
  + 0001101010000000      the string ends at bit 7: add the multiplicand shifted left 7
    ----------------
   10001101000010110 → ignoring the bit over 2n: 0001101000010110 (= 6678)
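Booth's recoding can be sketched as follows in Python (our own illustration: scan the multiplier right to left, subtract the shifted multiplicand at the start of each run of 1s, and add it back just past the end of the run):

```python
def booth_multiply(multiplicand, multiplier, bits=8):
    """Multiply two non-negative values with Booth's recoding, keeping
    the product in a 2*bits-wide result and ignoring bits beyond 2n."""
    product, prev = 0, 0
    for i in range(bits):
        bit = (multiplier >> i) & 1
        if bit == 1 and prev == 0:    # first 1 of a run: subtract
            product -= multiplicand << i
        elif bit == 0 and prev == 1:  # run of 1s just ended: add
            product += multiplicand << i
        prev = bit
    if prev == 1:                     # a run reaching the top bit
        product += multiplicand << bits
    return product & ((1 << 2 * bits) - 1)
```

`booth_multiply(0b00110101, 0b01111110)` reproduces the example above: 53 × 126 = 6678.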

Slide 8

More on Signed Integer Representation - VI
Negative numbers (twos complement convention) – cont'd

Overflow and carry are tricky ideas.

Signed number overflow means nothing in the context of unsigned numbers, which set a carry flag instead of an overflow flag.

If a carry out of the leftmost bit occurs with an unsigned number, overflow has occurred.

Carry and overflow occur independently of each other.

The table below summarizes these situations with four-bit examples:

Operation            | Carry? | Overflow? | Result correct?
0100 + 0010 = 0110   |  no    |  no       | yes
0100 + 0110 = 1010   |  no    |  yes      | no
1100 + 1110 = 1010   |  yes   |  no       | yes
1100 + 1000 = 0100   |  yes   |  yes      | no

Slide 9

More on Signed Integer Representation - VII

Text Sections: 2.4
Next Lecture: Floating point system representation in computers

Slide 1

CS2MF3 – Digital Systems and Systems Programming
Floating Point Number Representation

The signed magnitude, one's complement, and two's complement representations that we have just presented deal with integer values only.
We have also visited small fractional representations; however, without modification, these formats are not useful in scientific or business applications that deal with real number values.
Floating-point representation solves this problem.
It turns out that, by clever manipulation, it is possible to perform floating-point calculations using any integer format.
This is called floating-point emulation, because floating-point values aren't stored as such; we just create programs that make it seem as if floating-point values are being used.
But since the advent of complex integrated circuitry, most modern computers are equipped with specialized hardware that performs floating-point arithmetic with no special programming required.

Slide 2

Floating Point Number Representation -- I
Introduction

Recall from your background studies that any number can be written with an arbitrary number of significant figures using scientific notation.
For example: 0.125 = 1.25 × 10⁻¹

5,000,000 = 5.0 × 10⁶

And Avogadro's number is 6.023 × 10²³.

Also, floating point numbers allow an arbitrary number of decimal places to the right of the decimal point, such as 0.5 × 0.25 = 0.125.

Computers use a form of scientific notation for floating-point representation

Numbers written in scientific notation have three components:

Slide 3

Floating Point Number Representation -- II
Introduction -- continued

Computer representation of a floating-point number consists of three fixed-size fields:

This is the standard arrangement of these fields.

The one-bit sign field is the sign of the stored value.

The size of the exponent field determines the range of values that can be represented.

The size of the significand determines the precision (number of significant figures) of the representation.

Slide 4

Floating Point Number Representation -- III
Introduction -- continued

The IEEE-754 single precision floating point standard uses an 8-bit exponent and a 23-bit significand.
The IEEE-754 double precision standard uses an 11-bit exponent and a 52-bit significand.
For illustrative purposes, we will use a 14-bit model with a 5-bit exponent and an 8-bit significand.
The significand of a floating-point number is always preceded by an implied radix point; since we are using computers, the radix point (decimal point) should be termed the binary point.
Thus, the significand always contains a fractional binary value.
The exponent indicates the power of 2 to which the significand is raised.
Example: Express 32₁₀ in the simplified 14-bit floating-point model.

• Now 32 is 2⁵. So in (binary) scientific notation 32 = 1.0 × 2⁵ = 0.1 × 2⁶.
• Using this information, put 110₂ (= 6₁₀) in the exponent field and 1 in the significand, as shown below.

Slide 5

Floating Point Number Representation -- IV
Some problems

The illustrations shown at the right are all equivalent representations for 32 using our simplified model.

Not only do these synonymous representations waste space, but they can also cause confusion.

Another problem with our system is that we have made no allowance for negative exponents. We have no way to express 0.5 (= 2⁻¹) since there is no sign in the exponent field!
However, with a little bit of thought, all of these problems can be fixed with no changes being necessary to our basic model.

Slide 6

Floating Point Number Representation -- VSome problems -- continued

To resolve the problem of synonymous forms, we will establish a rule that the first digit of the significand must be 1. This results in a unique pattern for each floating-point number.

• In the IEEE-754 standard, this 1 is implied meaning that a 1 is assumed after the binary point.

• By using an implied 1, we increase the precision of the representation by a power of two.

• NOTE: since we are not specialists in scientific computation, we shall use no implied bits in our instructional model used here.

To provide for negative exponents, we will use a biased exponent. A bias is a number that is approximately midway in the range of values expressible by the exponent. We subtract the bias from the value in the exponent field to determine its true value.

• In our case, we have a 5-bit exponent. We will use 16 for our bias. This is called excess-16 representation.

In our model, exponent values less than 16 are negative, representing fractional numbers.


Slide 7

cs2mf3/wfsp L09-7

Floating Point Number Representation -- VI
Examples

Example 1 -- Express 32₁₀ in the revised 14-bit floating-point model.

We know that 32 = 1.0 × 2⁵ = 0.1 × 2⁶.

To use our excess-16 biased exponent, we add 16 to 6, giving 22₁₀ (= 10110₂). Diagrammatically:

Example 2 -- Express 0.0625₁₀ in the revised 14-bit floating-point model.

We know that 0.0625 is 2⁻⁴. So in (binary) scientific notation 0.0625 = 1.0 × 2⁻⁴ = 0.1 × 2⁻³.

To use our excess-16 biased exponent, we add 16 to -3, giving 13₁₀ (= 01101₂).

Example 3 -- Express -26.625₁₀ in the revised 14-bit floating-point model.

We find 26.625₁₀ = 11010.101₂. Normalizing, we have 26.625₁₀ = 0.11010101 × 2⁵.

To use our excess-16 biased exponent, we add 16 to 5, giving 21₁₀ (= 10101₂). We also need a 1 in the sign bit.
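The three examples above can be reproduced with a short sketch. This is not code from the course: it is a minimal illustration of the 14-bit model, and the function name encode14 is our own. It normalizes the value so the significand begins with a 1 just after the binary point (no implied bit, as in the lecture's model) and adds the excess-16 bias to the exponent.

```python
def encode14(x):
    # Hypothetical helper: encode x in the lecture's 14-bit model
    # (1 sign bit, 5-bit excess-16 exponent, 8-bit significand,
    #  normalized so the significand is 0.1xxxxxxx -- no implied bit).
    sign = 0
    if x < 0:
        sign, x = 1, -x
    exp = 0
    while x >= 1:          # shift right until the value is below 1
        x /= 2
        exp += 1
    while x < 0.5:         # shift left until the leading bit is 1
        x *= 2
        exp -= 1
    significand = int(x * 256)     # first 8 fraction bits
    return f"{sign} {exp + 16:05b} {significand:08b}"

print(encode14(32))       # 0 10110 10000000
print(encode14(0.0625))   # 0 01101 10000000
print(encode14(-26.625))  # 1 10101 11010101
```

Values needing more than 8 significand bits are silently truncated here, which is exactly the kind of rounding error the error-conditions slides discuss later.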

Slide 8

cs2mf3/wfsp L09-8

Floating Point Number Representation -- VI
More on FP Standards

The IEEE-754 single precision floating point standard uses a bias of 127 for its 8-bit exponent.

• An exponent of 255 indicates a special value.
– If the significand is zero, the value is ± infinity.
– If the significand is nonzero, the value is NaN, “not a number,” often used to flag an error condition.

The double precision standard has a bias of 1023 for its 11-bit exponent.

• The “special” exponent value for a double precision number is 2047, instead of the 255 used by the single precision standard.

Both the 14-bit model that we have presented and the IEEE-754 floating point standard allow two representations for zero.

• Zero is indicated by all zeros in the exponent and the significand, but the sign bit can be either 0 or 1.

This is why programmers should avoid testing a floating-point value for equality to zero.

• Negative zero does not equal positive zero.

Text Sections: 2.5
Next Lecture: Floating Point Arithmetic Operations


CS2MF3 – Digital Systems and Systems Programming

Review of previous work on signed and unsigned integers:

• For signed numbers – overflow has occurred if, for the sign bit, there has been a carry IN but no carry OUT, or vice versa; no overflow has occurred if there are either no carries at all or there are carries both IN and OUT.

• For unsigned numbers – overflow has occurred only when there has been a carry OUT of the most significant bit (MSB) of the number.

• Otherwise, carry and overflow occur independently of each other.

• The table below summarizes these situations.

Signed Numbers           Result      Sign Bit  Sign Bit   Overflow?  Correct
                                     Carry IN  Carry OUT             Result?
0100 (+4) + 0010 (+2)    0110 (+6)   No        No         No         Yes
0100 (+4) + 0110 (+6)    1010 (-6)   Yes       No         Yes        No
1100 (-4) + 1110 (-2)    1010 (-6)   Yes       Yes        No         Yes
1100 (-4) + 1010 (-6)    0110 (+6)   No        Yes        Yes        No

Unsigned Numbers         Result      MSB       MSB        Overflow?  Correct
                                     Carry IN  Carry OUT             Result?
0100 (4) + 0010 (2)      0110 (6)    No        No         No         Yes
1100 (12) + 1110 (14)    1010 (10)   Yes       Yes        Yes        No
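The signed-overflow rule (carry IN to the sign bit differing from carry OUT) can be checked mechanically. The following sketch is ours, not from the notes: it adds two 4-bit values bit by bit and reports both carries at the sign bit.

```python
def add4(a, b):
    # Add two 4-bit values, tracking the carries at the sign bit (bit 3).
    result, carry = 0, 0
    carry_in_sign = 0
    for i in range(4):
        s = ((a >> i) & 1) + ((b >> i) & 1) + carry
        result |= (s & 1) << i
        carry = s >> 1
        if i == 2:                  # the carry produced here goes INTO the sign bit
            carry_in_sign = carry
    carry_out = carry
    signed_overflow = carry_in_sign != carry_out
    return result, carry_in_sign, carry_out, signed_overflow

print(add4(0b0100, 0b0110))  # (0b1010, 1, 0, True):  +4 + +6 overflows to -6
print(add4(0b1100, 0b1110))  # (0b1010, 1, 1, False): -4 + -2 = -6 is correct
```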

cs2mf3/wfsp L09-1


CS2MF3 – Digital Systems and Systems Programming
Floating Point Number Representation

The signed magnitude, one’s complement, and two’s complement representations that we have just presented deal with integer values only. We have also visited small fractional representations; however, without modification, these formats are not useful in scientific or business applications that deal with real number values. Floating-point representation solves this problem.

It turns out that, by clever manipulation, it is possible to perform floating-point calculations using any integer format. This is called floating-point emulation: floating-point values aren’t stored as such; we just create programs that make it seem as if floating-point values are being used.

cs2mf3/wfsp L09-2

Since the advent of complex integrated circuitry, however, most modern computers are equipped with specialized hardware that performs floating-point arithmetic with no special programming required.


Floating Point Number Representation -- I
Introduction

Recall from your background studies that any number can be written with an arbitrary number of significant figures using scientific notation. For example:
0.125 = 1.25 × 10⁻¹
5,000,000 = 5.0 × 10⁶
And Avogadro's number is 6.023 × 10²³.

Also, floating-point numbers allow an arbitrary number of decimal places to the right of the decimal point, such as 0.5 × 0.25 = 0.125.

Computers use a form of scientific notation for floating-point representation.

cs2mf3/wfsp L09-3

Numbers written in scientific notation have three components: a sign, a significand (mantissa), and an exponent.

Floating Point Number Representation -- II
Introduction -- continued

Computer representation of a floating-point number consists of three fixed-size fields:

This is the standard arrangement of these fields.

The one-bit sign field is the sign of the stored value.

The size of the exponent field determines the range of values that can be represented.

cs2mf3/wfsp L09-4

The size of the significand determines the precision (number of significant figures) of the representation.




Slide 1

cs2mf3/wfsp L10-1

CS2MF3 – Digital Systems and Systems Programming
Floating Point Number Representation

Arithmetic Operations (Addition and Subtraction)
• Floating-point addition and subtraction are done using methods analogous to how we perform calculations using pencil and paper.

• The first thing that we do is express both operands with the same exponent, then add the significands, preserving the common exponent in the sum.

• If the exponent requires adjustment, we do so at the end of the calculation.

Example -- Find the sum of 12₁₀ and 1.25₁₀ using the 14-bit floating-point model.

• We find 12₁₀ = 0.1100 × 2⁴, and 1.25₁₀ = 0.101 × 2¹ = 0.000101 × 2⁴.
• Thus, our sum is 0.110101 × 2⁴.
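The align-then-add step can be sketched in a few lines. This is our illustration, not course code: a value is held as a (frac, exp) pair in which the 8-bit integer frac stands for the significand, i.e. value = frac/256 × 2^exp.

```python
def fp_add(a, b):
    # a and b are (frac, exp) pairs: value = frac/256 * 2**exp,
    # with frac an 8-bit integer significand (our own representation).
    (fa, ea), (fb, eb) = a, b
    if ea < eb:                   # make a the operand with the larger exponent
        (fa, ea), (fb, eb) = (fb, eb), (fa, ea)
    fb >>= ea - eb                # align the smaller operand's binary point
    s, e = fa + fb, ea
    while s >= 256:               # renormalize if the significand overflows 8 bits
        s >>= 1
        e += 1
    return s, e

twelve = (0b11000000, 4)            # 12 = 0.1100 x 2^4
one_and_a_quarter = (0b10100000, 1) # 1.25 = 0.101 x 2^1
print(fp_add(twelve, one_and_a_quarter))  # (0b11010100, 4), i.e. 0.110101 x 2^4 = 13.25
```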

Slide 2

cs2mf3/wfsp L10-2

Floating Point Number Representation -- I
• Arithmetic Operations (Multiplication and Division)

Floating-point multiplication is also carried out in a manner akin to how we perform multiplication using pencil and paper.

We multiply the two operands and add their exponents.

If the exponent requires adjustment, we do so at the end of the calculation.

Multiplication Example -- Find the product of 12₁₀ and 1.25₁₀ using the 14-bit floating-point model.

• We find 12₁₀ = 0.1100 × 2⁴, and 1.25₁₀ = 0.101 × 2¹.

Thus, our product is 0.0111100 × 2⁵ = 0.1111 × 2⁴.

The normalized product requires an exponent of 20₁₀ = 10100₂; the stored form is sign 0, exponent 10100, significand 11110000.
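The multiply-the-significands, add-the-exponents rule can be sketched the same way (again our own illustration, reusing (frac, exp) pairs where value = frac/256 × 2^exp):

```python
def fp_mul(a, b):
    # Multiply two (frac, exp) values: multiply the 8-bit significands
    # (giving a 16-bit product) and add the exponents, then renormalize.
    (fa, ea), (fb, eb) = a, b
    f, e = fa * fb, ea + eb
    while f < 32768:        # shift until the product's leading bit is set
        f <<= 1
        e -= 1
    return f >> 8, e        # keep the top 8 significand bits (truncate)

print(fp_mul((0b11000000, 4), (0b10100000, 1)))  # (0b11110000, 4): 0.1111 x 2^4 = 15
```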

W.F.S.Poehlman CS2MF3 -- Digital Systems & Systems Programming Page 48

Page 52: Course Lecture Notes for - McMaster Universitybruha/2mf3cs08.pdf · Course Lecture Notes . for . CS2MF3 . Digital Systems . and . Systems Programming . W.F.S.Poehlman . November,

Slide 3

cs2mf3/wfsp L10-3

Floating Point Number Representation -- II
• Arithmetic Operations (Multiplication) -- continued

Details of calculation:

• Use Booth’s Algorithm for the integer multiplication of the significands:

    Multiplicand:    1100
    Multiplier:     x 101
    ---------------------
                  0111100

  (From the 1st digit, a 1, subtract the multiplicand; ignore internal 0’s;
  another run-starting 1 means subtract the multiplicand again; the last
  operation adds the multiplicand.)

Check:
• 1.25₁₀ × 12₁₀ = 15₁₀, and 15₁₀ = 1111₂, or 15₁₀ = 0.1111 × 2⁴.

Since our BIAS is excess-16 in our model, we have an exponent result of 20. Or 20₁₀ = 10100₂, giving us the result from the previous page:

    0.0111100 × 2⁵, stored as exponent 10100 (= 5 + 16) with significand 11110000.

Slide 4

cs2mf3/wfsp L10-4

Floating Point Number Representation -- III
• Arithmetic Operations (Error Conditions)

No matter how many bits we use in a floating-point representation, our model must be finite. The real number system is, of course, infinite, so our models can give nothing more than an approximation of a real value. At some point, every model breaks down, introducing errors into our calculations.

By using a greater number of bits in our model, we can reduce these errors, but we can never totally eliminate them. Our job becomes one of reducing error, or at least being aware of the possible magnitude of error in our calculations. We must also be aware that errors can compound through repetitive arithmetic operations.

For example, our 14-bit model cannot exactly represent the decimal value 128.5. In binary, it is 9 bits wide:

10000000.1₂ = 128.5₁₀


Slide 5

cs2mf3/wfsp L10-5

Floating Point Number Representation -- IV

• Arithmetic Operations (Error Conditions) – continued

When we try to express 128.5₁₀ in our 14-bit model, we lose the low-order bit, giving a relative error of:

If we had a procedure that repetitively added 0.5 to 128.5, we would have an error of nearly 2% after only four iterations.

Floating-point errors can be reduced when we use operands that are similar in magnitude.

If we were repetitively adding 0.5 to 128.5, it would have been better to iteratively add 0.5 to itself and then add 128.5 to this sum.

In this example, the error was caused by loss of the low-order bit.

Loss of the high-order bit is more problematic.

(128.5 - 128) / 128.5 ≈ 0.39%
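The 0.39% figure can be verified directly (our sketch; the stored value 128 reflects the lost low-order bit):

```python
# 128.5 = 10000000.1 in binary needs 9 significand bits; an 8-bit
# significand drops the trailing 1, so the model stores 128 instead.
exact, stored = 128.5, 128.0
relative_error = (exact - stored) / exact
print(f"{relative_error:.2%}")  # 0.39%
```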

Slide 6

cs2mf3/wfsp L10-6

Floating Point Number Representation -- V
• Arithmetic Operations (Error Conditions) – continued

Floating-point overflow and underflow can cause programs to crash.

Overflow occurs when there is no room to store the high-order bits resulting from a calculation. Underflow occurs when a value is too small to store, possibly resulting in division by zero.

Experienced programmers know that it is better for a program to crash than to have it produce incorrect, but plausible, results.

When discussing floating-point numbers, it is important to understand the terms range, precision, and accuracy. The range of a numeric integer format is the difference between the largest and smallest values that it can express. Accuracy refers to how closely a numeric representation approximates a true value. The precision of a number indicates how much information we have about a value.


Slide 7

cs2mf3/wfsp L10-7

Floating Point Number Representation -- VI
• Arithmetic Operations (Error Conditions) – continued

Most of the time, greater precision leads to better accuracy, but this is not always true.

• For example, 3.1333 is a value of pi that is accurate to two digits, but has 5 digits of precision.

There are other problems with floating point numbers.

Because of truncated bits, you cannot always assume that a particular floating point operation is commutative or distributive.

(Figure: a duck-shooting cartoon illustrating the four combinations ACCURATE / INACCURATE versus PRECISE / IMPRECISE.)

Slide 8

cs2mf3/wfsp L10-8

Floating Point Number Representation -- VII
• Arithmetic Operations (Error Conditions) – continued

This means that we cannot assume:
(a + b) + c = a + (b + c)   or   a*(b + c) = ab + ac

Moreover, to test a floating point value for equality to some other number, first figure out how close the two numbers can be and still be considered equal. Call this value epsilon, let x be the difference of the two numbers, and use the statement: if (abs(x) < epsilon) then ...

In fact, there is a quantity called "machine epsilon" which represents the precision that is possible with a computer given a finite word size in which to carry out calculations.
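Both points (the failure of associativity and the epsilon-based comparison) are easy to demonstrate with IEEE-754 doubles; this sketch uses Python, and the name nearly_equal is our own:

```python
import sys

def nearly_equal(a, b, epsilon=1e-9):
    # Compare floats within a tolerance instead of testing with ==.
    return abs(a - b) < epsilon

# Addition of doubles is not associative once bits are truncated:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))              # False
print(nearly_equal((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3)))  # True

# Machine epsilon: the smallest gap the word size can resolve near 1.0.
print(sys.float_info.epsilon)  # 2.220446049250313e-16
```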

Text Sections: 2.5

Next Lecture: Character Codes – how the computer stores characters (as coded numbers), as we shall see.


Slide 1

cs2mf3/wfsp L11-1

CS2MF3 – Digital Systems and Systems Programming
Character Representations

Calculations are not useful until their results can be displayed in a manner that is meaningful to humans. We also need to store the results of calculations, and provide a means for data input.

Thus, human-understandable characters must be converted to computer-understandable bit patterns using some sort of character encoding scheme. Since a computer really only understands numbers, we must use a scheme in which characters are coded as numbers.

Character Codes

• As computers have evolved, character codes have evolved.
• Larger computer memories and storage devices permit richer character codes.
• The earliest computer coding systems used six bits.
• Binary-coded decimal (BCD) was one of these early codes. It was used by IBM mainframes in the 1950s and 1960s.

Slide 2

cs2mf3/wfsp L11-2

Character Representations -- I
Character Codes

Binary Coded Decimal (at right)
• In 1964, BCD was extended to an 8-bit code, Extended Binary-Coded Decimal Interchange Code (EBCDIC).
• EBCDIC was one of the first widely-used computer codes that supported upper and lowercase alphabetic characters, in addition to special characters, such as punctuation and control characters.
• EBCDIC and BCD are still in use by IBM mainframes today.
• The next two slides illustrate these codes and what character (or action) they represent; some character codes are non-printable, and instead specify some condition relating to input or output character operations (usually to do with written syntax or formatting, such as headings, blank lines, etc.).


Slide 3

cs2mf3/wfsp L11-3

Character Representations -- II
Character Codes – Extended BCD or EBCDIC (continued)

Slide 4

cs2mf3/wfsp L11-4

Character Representations -- III

Character Codes – EBCDIC non-printing characters


Slide 5

cs2mf3/wfsp L11-5

Character Representations -- IV

Character Codes – continued

ASCII – American Standard Code for Information Interchange

• Other computer manufacturers chose the 7-bit ASCII as a replacement for 6-bit codes.

• While BCD and EBCDIC were based upon punched card codes, ASCII was based upon telecommunications (Telex) codes.

• Until recently, ASCII was the dominant character code outside the IBM mainframe world.
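Because ASCII codes are just numbers, character arithmetic falls out directly. A quick illustration of ours, using Python's ord/chr:

```python
# ASCII assigns each character a 7-bit number; ord/chr convert both ways.
print(ord("A"))             # 65
print(chr(65 + 25))         # 'Z', the 26th uppercase letter
print(ord("a") - ord("A"))  # 32: upper- and lowercase differ by one bit
print(bin(ord("A")), bin(ord("a")))  # 0b1000001 vs 0b1100001
```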

Slide 6

cs2mf3/wfsp L11-6

Character Representations -- V
Character Codes – continued

UNICODE Character Set
• Many of today’s systems embrace Unicode, a 16-bit system that can encode the characters of every language in the world.

– The Java programming language, and some operating systems now use Unicode as their default character code.

• The Unicode codespace is divided into six parts. The first part is for Western alphabet codes, including English, Greek, and Russian.

• The Unicode code space allocation is shown at the left.


Slide 7

cs2mf3/wfsp L11-7

Character Representations -- VICharacter Codes – continued

UNICODE Character Set – continued
• The lowest-numbered Unicode characters comprise the ASCII code.

• The highest provide for user-defined codes.

Text Sections: 2.6
NOTE: we shall NOT be looking at the electrical methods used to transmit codes, as we leave this to a course more suited to the Electrical and Computer Engineer. Therefore section 2.7 will be omitted in its entirety for us in CS2MF3.

Next Lecture: Error Detection and Error Correction in code transmissions (only in elementary consideration, so not all parts of this section will be covered as, in this course, we consider them advanced topics.)


Slide 1

cs2mf3/wfsp L12-1

CS2MF3 – Digital Systems and Systems Programming
Error Detection and Error Correcting

Introduction
• It is physically impossible for any data recording or transmission medium to be 100% perfect 100% of the time over its entire expected useful life.

• As more bits are packed onto a square centimeter of disk storage, and as communications transmission speeds increase, the likelihood of error increases -- sometimes geometrically.

• Thus, error detection and correction is critical to accurate data transmission, storage and retrieval.

• Check digits, appended to the end of a long number, can provide some protection against data input errors.
– The last characters of UPC (Universal Product Code) barcodes and ISBNs (International Standard Book Numbers) are check digits.

• Longer data streams require more economical and sophisticated error detection mechanisms.

• Cyclic redundancy checking (CRC) codes provide error detection for large blocks of data.

Slide 2

cs2mf3/wfsp L12-2

Error Detection and Error Correcting Codes
Error Detection -- Parity Bits

Data transmission errors are easy to fix once an error is detected.
• Just ask the sender to transmit the data again.

In computer memory and data storage, however, this cannot be done.

• Too often the only copy of something important is in memory or on disk.

Thus, to provide data integrity over the long term, error correcting codes are required; we shall briefly look at some later in the lecture.

The first problem is to DETECT errors, so we can either resend the data or, if that is not practical, try to correct them.

Parity memory is easy to operate:

• Whenever a pattern of bits is read or written into a location (be it memory or, say, an I/O device register), its parity is checked to determine correctness.


Slide 3

cs2mf3/wfsp L12-3

Error Detection & Error Correcting Codes -- II
Error Detection -- Parity Bits (continued)

Suppose we have an 8-bit data byte that represents some character (in ASCII, say).

To detect single-bit errors (errors in two bits are much rarer than one-bit errors), when the data byte is created an extra bit is added, making 9 bits, where the extra bit is called the parity bit.

The parity bit is set either on (1) or off (0) so that the sum of all 9 bits is either odd (odd parity) or even (even parity).

If we choose even parity, then the parity bit is set to 1 or 0 so as to maintain an even sum over the 9 bits.

When the byte is read back, the parity bit summed with the 8 data bits should again give an even sum. If not, then at least one of the bits must be in error, and we have a "parity error".

Hence the parity method will only detect single-bit errors; it cannot determine which bit is incorrect, nor how to correct it.
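A minimal even-parity sketch (our own code, not from the notes): the parity bit is chosen so the 9-bit total has an even number of 1s, and a re-count on readback detects any single-bit error.

```python
def even_parity_bit(byte):
    # Parity bit that makes the total count of 1s (data + parity) even.
    return bin(byte).count("1") % 2

def parity_ok(byte, parity):
    # On readback, the 9 bits should still contain an even number of 1s.
    return (bin(byte).count("1") + parity) % 2 == 0

data = 0b01000001                   # ASCII 'A': two 1-bits
p = even_parity_bit(data)           # 0, since the count is already even
assert parity_ok(data, p)
corrupted = data ^ 0b00000100       # flip one bit in transit
assert not parity_ok(corrupted, p)  # single-bit error detected
```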

Slide 4

cs2mf3/wfsp L12-4

Error Detection & Error Correcting Codes -- III
Error Detection – Checksum Methods

This method appends more than one bit to the word size (number of bits) of the data. The data bits are summed, the sum is two's-complement negated, and the result is placed into the appended bits.

If all bits are correct, then adding the sum of the data bits to the checksum should result in zero.

This method checks all bits; errors go undetected only if the bits in error happen, by happenstance, to produce an identical sum, and the chance of this is extremely small.
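The two's-complement checksum described above can be sketched as follows (our code; an 8-bit word size is assumed):

```python
def checksum8(data):
    # Sum the bytes, then take the two's-complement negation modulo 256.
    return (-sum(data)) & 0xFF

def verify8(data, check):
    # Adding the data sum to the checksum must give zero (mod 256).
    return (sum(data) + check) & 0xFF == 0

block = [0x10, 0x23, 0x7F]
c = checksum8(block)
assert verify8(block, c)
assert not verify8([0x10, 0x22, 0x7F], c)  # a changed byte is caught
```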

CRC (Cyclic Redundancy Check) Methods

Checksums and CRCs are examples of systematic error detection. In systematic error detection, a group of error control bits is appended to the end of the block of transmitted data.

• This group of bits is called a syndrome.


Slide 5

cs2mf3/wfsp L12-5

Error Detection & Error Correcting Codes -- IV

Error Detection – CRC Methods

Just as a checksum works by addition, a CRC works by division: dividing the received information string by an agreed-upon pattern (usually a form of a polynomial works best) will give a remainder of zero if no bits are lost or corrupted.

This is as far as we will go in this course; more advanced courses in communications and architecture will carry the concept farther.
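Modulo-2 long division is easy to sketch. This is our illustration with a toy 4-bit divisor, not a standard CRC polynomial: the sender appends the remainder as check bits, and the receiver's division then comes out to zero.

```python
def crc_remainder(message, divisor):
    # Modulo-2 (XOR) long division over bit strings; the remainder is
    # one bit shorter than the divisor and becomes the check bits.
    bits = list(message + "0" * (len(divisor) - 1))
    for i in range(len(message)):
        if bits[i] == "1":
            for j, d in enumerate(divisor):
                bits[i + j] = str(int(bits[i + j]) ^ int(d))
    return "".join(bits[len(message):])

def crc_check(received, divisor):
    # The receiver divides the whole received string (message + check
    # bits); an all-zero remainder means no error was detected.
    bits = list(received)
    for i in range(len(received) - len(divisor) + 1):
        if bits[i] == "1":
            for j, d in enumerate(divisor):
                bits[i + j] = str(int(bits[i + j]) ^ int(d))
    return all(b == "0" for b in bits[len(received) - len(divisor) + 1:])

msg, divisor = "1101", "1011"          # toy values, not a real CRC standard
rem = crc_remainder(msg, divisor)      # "001"
assert crc_check(msg + rem, divisor)           # intact transmission passes
assert not crc_check("1111" + rem, divisor)    # a flipped bit is caught
```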

Error Detection AND CORRECTION

Hamming codes and Reed-Solomon codes are two important error correcting codes. The mathematics of Hamming codes is much simpler than that of Reed-Solomon codes; however, we shall discuss neither in detail in this course.

Hamming codes are code words formed by adding redundant check bits, or parity bits, to a data word, in such a way as to be able to detect where a bit is in error and what it should be.

Slide 6

cs2mf3/wfsp L12-6

Error Detection & Error Correcting Codes -- V

Error Detection and Correction

Reed-Solomon codes are particularly useful in correcting burst errors that occur when a series of adjacent bits are damaged.

• Because CD-ROMs are easily scratched, they employ a type of Reed-Solomon error correction.

Text Sections Covered: 2.8

Next Lecture: Boolean Algebra


Slide 1

cs2mf3/wfsp L13-1

CS2MF3 – Digital Systems and Systems Programming
Boolean Algebra

Introduction
• In the latter part of the nineteenth century, George Boole incensed philosophers and mathematicians alike when he suggested that logical thought could be represented through mathematical equations.

– How dare anyone suggest that human thought could be encapsulated and manipulated like an algebraic formula?

• Computers, as we know them today, are implementations of Boole’s Laws of Thought.

– John Atanasoff and Claude Shannon were among the first to see this connection.

• In the middle of the twentieth century, computers were commonly known as “thinking machines” and “electronic brains.”

– Many people were fearful of them.
• Nowadays, we rarely ponder the relationship between electronic digital computers and human logic. Computers are accepted as part of our lives.
– Many people, however, are still fearful of them.

• In this chapter, we will see the simplicity that constitutes the essence of the machine.

Slide 2

cs2mf3/wfsp L13-2

Boolean Algebra -- I
Introduction -- continued

Boolean algebra is a mathematical system for the manipulation of variables that can have one of two values.
• In formal logic, these values are “true” and “false.”
• In digital systems, these values are “on” and “off,” 1 and 0, or “high” and “low.”

Boolean expressions are created by performing operations on Boolean variables.

• Common Boolean operators include AND, OR, and NOT, much like the connective words we use in everyday IF statements when we converse.

A Boolean operator can be completely described using a truth table.

The truth tables for the Boolean operators AND and OR are shown on the next slide.

The AND operator is also known as a Boolean product. The OR operator is the Boolean sum.
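As an illustrative aside (not from the slides), the product and sum operators can be tabulated in a few lines of Python; the helper `truth_table` and the operator names here are our own, not part of the course material:

```python
from itertools import product

def truth_table(f, arity):
    """Return rows (inputs..., output) for a Boolean operator of given arity."""
    return [row + (f(*row),) for row in product((0, 1), repeat=arity)]

AND = lambda x, y: x & y   # Boolean product
OR  = lambda x, y: x | y   # Boolean sum
NOT = lambda x: 1 - x      # complement

print(truth_table(AND, 2))  # [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
```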


Slide 3

cs2mf3/wfsp L13-3

Boolean Algebra -- II
Boolean Operations

The truth table for the Boolean NOT operator is shown at the lower left.

The NOT operation is most often designated by an overbar, and we shall use this method. However, it is sometimes indicated by a prime mark ( ‘ ) or an “elbow” (¬).

A Boolean function has:
• at least one Boolean variable,
• at least one Boolean operator, and
• at least one input from the set {0,1}.

It produces an output that is also a member of the set {0,1}.

• Note that the binary numbering system is very handy in digital systems.

Slide 4

cs2mf3/wfsp L13-4

Boolean Algebra -- III
• Boolean Operations – cont'd

• The truth table for the Boolean function:

is shown at the right.

To make evaluation of the Boolean function easier, the truth table contains extra (shaded) columns to hold evaluations of subparts of the function.

As with common arithmetic, Boolean operations have rules of precedence. The NOT operator has highest priority, followed by AND and then OR.

This is how we chose the (shaded) function subparts in our table.
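The precedence rules can be sketched in code. The slide's actual function is shown only in the figure, so the function F(x, y, z) = x + y'z used here is a hypothetical stand-in; each intermediate variable plays the role of a "shaded" subpart column:

```python
# Evaluate a (hypothetical) function F(x, y, z) = x + y'z, respecting
# precedence: NOT binds tightest, then AND, then OR.
def F(x, y, z):
    not_y = 1 - y          # NOT first
    sub = not_y & z        # then AND (a "shaded" subpart column)
    return x | sub         # OR last

# Tabulate the subparts just as the slide's truth table does:
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            print(x, y, z, 1 - y, (1 - y) & z, F(x, y, z))
```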


Slide 5

cs2mf3/wfsp L13-5

Boolean Algebra -- IV
Boolean Operations – cont'd

Digital computers contain circuits that implement Boolean functions. The simpler we can make a Boolean function, the smaller the circuit that results.

• Simpler circuits are cheaper to build, consume less power, and run faster than complex circuits.

With this in mind, we always want to reduce our Boolean functions to their simplest form. There are a number of Boolean identities that help us to do this.

Most Boolean identities have an AND (product) form as well as an OR (sum) form. We give our identities using both forms. Our first group is rather intuitive and is shown at the left.
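The identity tables themselves appear only in the slide figures; as a sketch, the standard identity, null, and complement laws (assumed to be the ones shown) can be checked exhaustively, since each variable takes only the values 0 and 1:

```python
def holds(eq):
    """Check a one-variable identity for every assignment of x."""
    return all(eq(x) for x in (0, 1))

NOT = lambda x: 1 - x

# Standard laws, in both AND (product) and OR (sum) form:
assert holds(lambda x: (x & 1) == x)       # identity law, AND form
assert holds(lambda x: (x | 0) == x)       # identity law, OR form
assert holds(lambda x: (x & 0) == 0)       # null law, AND form
assert holds(lambda x: (x | 1) == 1)       # null law, OR form
assert holds(lambda x: (x & NOT(x)) == 0)  # complement law, AND form
assert holds(lambda x: (x | NOT(x)) == 1)  # complement law, OR form
```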

Slide 6

cs2mf3/wfsp L13-6

Boolean Algebra -- VBoolean Operations – cont'd

Our second group of Boolean identities should be familiar from previous studies in algebra:

The last group of Boolean identities is perhaps the most useful.

These laws come to us by way of set theory or formal logic.


Slide 7

cs2mf3/wfsp L13-7

Boolean Algebra -- VI

Boolean Operations – cont'd
We shall use these relations to simplify some complex Boolean expressions in the next lecture.

Text Sections Covered: 3.1 and 3.2


Slide 1

cs2mf3/wfsp L14-1

CS2MF3 – Digital Systems and Systems Programming
Computer Software Programming

More on Boolean Algebra
• We can use Boolean identities to simplify the function:

as follows:

Slide 2

cs2mf3/wfsp L14-2

Boolean Algebra -- I
Boolean Operations – cont'd

Sometimes it is more economical to build a circuit using the complement of a function (and complementing its result) than it is to implement the function directly.

DeMorgan’s law provides an easy way of finding the complement of a Boolean function.

Remember, DeMorgan’s law states that:

DeMorgan’s law can be extended to any number of variables: replace each variable by its complement and change all ANDs to ORs and all ORs to ANDs. Thus, we find the complement of:

is:
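The two-variable form of DeMorgan's law can be verified mechanically over all four input pairs, as a small sanity check:

```python
from itertools import product

NOT = lambda x: 1 - x

# NOT(x AND y) == NOT(x) OR NOT(y), and its dual, for every input pair.
for x, y in product((0, 1), repeat=2):
    assert NOT(x & y) == NOT(x) | NOT(y)
    assert NOT(x | y) == NOT(x) & NOT(y)

print("DeMorgan's laws hold for all inputs")
```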


Slide 3

cs2mf3/wfsp L14-3

Boolean Algebra -- II
Boolean Operations – cont'd

Through our exercises in simplifying Boolean expressions, we see that there are numerous ways of stating the same Boolean expression.

• These “synonymous” forms are logically equivalent.• Logically equivalent expressions have identical truth tables.

In order to eliminate as much confusion as possible, designers express Boolean functions in standardized or canonical form. There are two canonical forms for Boolean expressions: sum-of-products and product-of-sums.

• Recall the Boolean product is the AND operation and the Boolean sum is the OR operation.

In the sum-of-products form, ANDed variables are ORed together.• For example:

In the product-of-sums form, ORed variables are ANDed together:• For example:

Slide 4

cs2mf3/wfsp L14-4

Boolean Algebra -- III

Boolean Operations – cont'd

It is easy to convert a function to sum-of-products form using its truth table. We are interested in the values of the variables that make the function true (= 1). Using the truth table, we list the values of the variables that result in a true function value. Each group of variables is then ORed together.
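The truth-table-to-sum-of-products procedure can be sketched in a few lines. The slide's own function is shown only in the figure, so the XOR example below is a hypothetical stand-in; primes mark complemented variables:

```python
from itertools import product

def sum_of_products(f, names):
    """Build the canonical sum-of-products string for f from its truth table."""
    terms = []
    for row in product((0, 1), repeat=len(names)):
        if f(*row):  # keep only the rows where the function is true (= 1)
            terms.append("".join(n if v else n + "'" for n, v in zip(names, row)))
    return " + ".join(terms)  # the ANDed groups are then ORed together

# Hypothetical example: F(x, y) = x XOR y
print(sum_of_products(lambda x, y: x ^ y, "xy"))  # x'y + xy'
```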


Slide 5

cs2mf3/wfsp L14-5

Boolean Algebra -- IV

Boolean Operations – cont'd
The sum-of-products form for our function is:

We note that this function is not in simplest terms. Our aim is only to rewrite our function in canonical sum-of-products form.

Slide 6

cs2mf3/wfsp L14-6

Boolean Algebra -- V

Text Sections Covered: 3.2

Next Lecture Subject: Digital circuits.


Slide 1

cs2mf3/wfsp L15-1

CS2MF3 – Digital Systems and Systems ProgrammingComputer Hardware – Digital Systems

Introduction
• We have looked at Boolean functions in abstract terms.
• In this section, we see that Boolean functions are implemented in digital computer circuits called gates.
• A gate is an electronic device that produces a result based on two or more input values.
– In reality, gates consist of one to six transistors, but digital designers think of them as a single unit.
– Integrated circuits contain collections of gates suited to a particular purpose.
• The three simplest gates are the AND, OR, and NOT gates.
• They correspond directly to their respective Boolean operations, as you can see from their truth tables.

Slide 2

cs2mf3/wfsp L15-2

Digital Logic -- I
Logic Gates

Another very useful gate is the exclusive-OR (XOR) gate. The output of the XOR operation is true only when the values of the inputs differ.

• Note the special symbol ⊕ for the XOR operation.

NAND and NOR are two very important gates. Their symbols and truth tables are shown at the left. NAND and NOR are known as universal gates because they are inexpensive to manufacture and any Boolean function can be constructed using only NAND or only NOR gates.
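The universality claim for NAND can be demonstrated directly: NOT, AND, and OR each fall out of NAND alone, which we can check over all inputs:

```python
def nand(x, y):
    """NAND gate: the complement of the Boolean product."""
    return 1 - (x & y)

# NOT, AND, and OR built purely from NAND:
def not_(x):     return nand(x, x)          # tie both inputs together
def and_(x, y):  return not_(nand(x, y))    # invert the NAND output
def or_(x, y):   return nand(not_(x), not_(y))  # DeMorgan at work

for x in (0, 1):
    for y in (0, 1):
        assert and_(x, y) == (x & y)
        assert or_(x, y) == (x | y)
    assert not_(x) == 1 - x
```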


Slide 3

cs2mf3/wfsp L15-3

Digital Logic -- II
Logic Gates – continued

NAND and NOR Gates
• Used to produce the OR, NOT, and AND Boolean functions.

Gates can have multiple inputs and more than one output.
• A second output can be provided for the complement of the operation.
• We’ll see more of this later.
• Examples are shown below in this slide.

Slide 4

cs2mf3/wfsp L15-4

Digital Logic -- III
Logic Gates – continued

The main thing to remember is that combinations of gates implement Boolean functions. The circuit below implements the Boolean function:

• Normally, Boolean expressions are simplified first so that the circuits that implement them are as simple as possible.

We have designed a circuit that implements the Boolean function:

This circuit is an example of a combinational logic circuit.


Slide 5

cs2mf3/wfsp L15-5

Digital Logic -- IV
Combinational Circuits – the Half Adder

Combinational logic circuits produce a specified output (almost) at the instant when input values are applied.
• In a later section, we will explore circuits where this is not the case.

Combinational logic circuits give us many useful devices. One of the simplest is the half adder, which finds the sum of two bits. We can gain some insight into the construction of a half adder by looking at its truth table, shown at the right.

As we see, the sum can be found using the XOR operation and the carry using the AND operation.
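That observation translates directly into code; a minimal sketch of the half adder's behaviour:

```python
def half_adder(a, b):
    """Add two bits: the sum is their XOR, the carry is their AND."""
    return a ^ b, a & b  # (sum, carry)

print(half_adder(1, 1))  # (0, 1): 1 + 1 = 10 in binary
```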

Slide 6

cs2mf3/wfsp L15-6

Digital Logic -- V
Combinational Circuits – the Full Adder

We can change our half adder into a full adder by including gates for processing the carry bit.

The truth table for a full adder is shown at the upper right. The circuit for the full adder is shown on the next slide.


Slide 7

cs2mf3/wfsp L15-7

Digital Logic -- VI
Combinational Circuits – the Full Adder

Here is the completed full adder.

Slide 8

cs2mf3/wfsp L15-8

Digital Logic -- VII
Combinational Circuits – the Ripple-Carry Adder

Just as we combined half adders to make a full adder, full adders can be connected in series.

The carry bit “ripples” from one adder to the next; hence, this configuration is called a ripple-carry adder.
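A behavioural sketch of both circuits: the full adder is built from two half adders plus an OR gate (as in the previous slide's circuit), and the ripple-carry adder chains full adders so each stage's carry-out feeds the next stage's carry-in. Bit lists here are low-order bit first, which is our own convention for the sketch:

```python
def full_adder(a, b, carry_in):
    """Two half adders plus an OR gate for the carry."""
    s1, c1 = a ^ b, a & b                 # first half adder
    s2, c2 = s1 ^ carry_in, s1 & carry_in  # second half adder
    return s2, c1 | c2                     # (sum, carry_out)

def ripple_carry_add(a_bits, b_bits):
    """Add two little-endian bit lists; the carry ripples adder to adder."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry

# 0110 (6) + 0011 (3) = 1001 (9), bits listed low-order first:
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 0, 0]))  # ([1, 0, 0, 1], 0)
```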

Text Section Covered: 3.3 – 3.5

Next Lecture Contents: More on Combinational Circuits including Decoders and Multiplexers


Slide 1

cs2mf3/wfsp L16-1

CS2MF3 – Digital Systems and Systems Programming

Computer Hardware – Digital Systems
Logic Circuits -- Combinational Circuits

• We have previously seen half adders and full adders as examples of combinational circuits. There are others, as we see in this lecture.
• Specifically, decoders are another important type of combinational circuit.
• Among other things, they are useful in selecting a memory location according to a binary value placed on the address lines of a memory bus.
• Address decoders with n inputs can select any of 2^n locations.
• At the right is a block-diagram example.

Slide 2

cs2mf3/wfsp L16-2

Computer Hardware – Digital Systems
• Logic Circuits
• Combinational Circuits – Address Decoders
• This is what a 2-to-4 decoder looks like on the inside.
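The decoder's behaviour (though not its gate-level wiring) can be sketched generically: n address bits select exactly one of 2^n output lines:

```python
def decoder(address_bits):
    """n-to-2^n decoder: exactly one output line is 1, chosen by the address."""
    n = len(address_bits)
    index = 0
    for bit in address_bits:        # interpret the bits as a binary number
        index = index * 2 + bit
    return [1 if i == index else 0 for i in range(2 ** n)]

print(decoder([1, 0]))  # 2-to-4 decoder, address 10 selects line 2: [0, 0, 1, 0]
```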

• Combinational Circuits – Multiplexors
• A multiplexor does just the opposite of a decoder.
• It selects a single output from several inputs.
• The particular input chosen for output is determined by the value of the multiplexor’s control lines.
• To be able to select among n inputs, log2(n) control lines are needed, as at the left, which is a block diagram of a full multiplexor.
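Again as a behavioural sketch rather than a gate-level model: the control lines form a binary number that indexes the input to route to the single output:

```python
import math

def multiplexor(inputs, select_bits):
    """Route one of n inputs to the output, chosen by log2(n) control lines."""
    assert len(select_bits) == int(math.log2(len(inputs)))
    index = 0
    for bit in select_bits:         # control lines read as a binary number
        index = index * 2 + bit
    return inputs[index]

print(multiplexor([0, 1, 1, 0], [0, 1]))  # selects input line 1, so output is 1
```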


Slide 3

cs2mf3/wfsp L16-3

Computer Hardware – Digital Systems
• Logic Circuits -- I
• Combinational Circuits – Multiplexors
• This is what a 4-to-1 multiplexor circuit looks like on the inside.

Slide 4

cs2mf3/wfsp L16-4

Computer Hardware – Digital Systems
• Logic Circuits -- II
• Combinational Circuits – Shifters
• Recall that a byte is 8 bits and a nibble is half of that, at 4 bits.
• This shifter moves the bits of a nibble one position to the left or right.
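The shifter's behaviour can be sketched on a most-significant-bit-first list of four bits; shifting in a 0 at the vacated end is our assumption (the slide's circuit may wrap or use a serial input instead):

```python
def shift_nibble(bits, left=True):
    """Shift a 4-bit value one position left or right, filling with 0."""
    assert len(bits) == 4  # a nibble is half a byte
    return bits[1:] + [0] if left else [0] + bits[:-1]

print(shift_nibble([1, 0, 1, 1]))              # left:  [0, 1, 1, 0]
print(shift_nibble([1, 0, 1, 1], left=False))  # right: [0, 1, 0, 1]
```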

• Sequential Circuits
• Combinational logic circuits are perfect for situations when we require the immediate application of a Boolean function to a set of inputs.


Slide 5

cs2mf3/wfsp L16-5

Computer Hardware – Digital Systems
• Logic Circuits -- III

• Sequential Circuits
• There are other times, however, when we need a circuit to change its value with consideration of its current state as well as its inputs.
– These circuits have to “remember” their current state.
• Sequential logic circuits provide this functionality for us.
• As the name implies, sequential logic circuits require a means by which events can be sequenced.
• State changes are controlled by clocks.
– A “clock” is a special circuit that sends electrical pulses through a circuit.
• Clocks produce electrical waveforms such as the one shown above.
• State changes occur in sequential circuits only when the clock ticks.
• Circuits can change state on the rising edge, falling edge, or when the clock pulse reaches its highest voltage.
• Circuits that change state on the rising edge or falling edge of the clock pulse are called edge-triggered.
• Level-triggered circuits change state when the clock voltage reaches its highest or lowest level.

Slide 6

cs2mf3/wfsp L16-6

Computer Hardware – Digital Systems
• Logic Circuits -- IV
• Sequential Circuits – Flip-Flops
• To retain their state values, sequential circuits rely on feedback.
• Feedback in digital circuits occurs when an output is looped back to the input.
• A simple example of this concept is shown above.
– If Q is 0 it will always be 0; if it is 1, it will always be 1.
• You can see how feedback works by examining the most basic sequential logic component, the SR flip-flop.
– The “SR” stands for set/reset.
• The internals of an SR flip-flop are shown below, along with its block diagram.


Slide 7

cs2mf3/wfsp L16-7

Computer Hardware – Digital Systems
• Logic Circuits -- V
• Sequential Circuits – SR Flip-Flops
• The behavior of an SR flip-flop is described by a characteristic table.
• Q(t) means the value of the output at time t.
• Q(t+1) is the value of Q after the next clock pulse.
• The SR flip-flop actually has three inputs: S, R, and its current output, Q.
• Thus, we can construct a truth table for this circuit, as shown at the left.
• Notice the two undefined values. When both S and R are 1, the SR flip-flop is unstable.
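The characteristic table can be expressed as a small next-state function, with the unstable S = R = 1 case flagged explicitly:

```python
def sr_next(S, R, Q):
    """Characteristic function of the SR flip-flop: Q(t+1) from S, R, Q(t)."""
    if S == 1 and R == 1:
        raise ValueError("S = R = 1 is undefined: the flip-flop is unstable")
    if S:
        return 1   # set
    if R:
        return 0   # reset
    return Q       # S = R = 0: hold the current state

assert sr_next(0, 0, 1) == 1   # hold
assert sr_next(1, 0, 0) == 1   # set
assert sr_next(0, 1, 1) == 0   # reset
```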

Slide 8

cs2mf3/wfsp L16-8

Computer Hardware – Digital Systems
• Logic Circuits -- VI
• Sequential Circuits – JK Flip-Flops
• If we can be sure that the inputs to an SR flip-flop will never both be 1, we will never have an unstable circuit. However, this may not always be so.
• The SR flip-flop can be modified to provide a stable state when both inputs are 1.
• This modified flip-flop is called a JK flip-flop, shown at the upper right.
– The “JK” is in honor of Jack Kilby.
• At the right, we see how an SR flip-flop can be modified to create a JK flip-flop.
• The characteristic table indicates that the flip-flop is stable for all inputs, as shown, again, at the lower right.
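As with the SR flip-flop, the JK characteristic table reduces to a small next-state function; the J = K = 1 case now toggles Q instead of being undefined:

```python
def jk_next(J, K, Q):
    """JK characteristic function: stable for all inputs; J = K = 1 toggles Q."""
    if J and K:
        return 1 - Q   # toggle
    if J:
        return 1       # set
    if K:
        return 0       # reset
    return Q           # J = K = 0: hold

assert jk_next(1, 1, 0) == 1 and jk_next(1, 1, 1) == 0   # toggle
```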



Slide 9

cs2mf3/wfsp L16-9

Computer Hardware – Digital Systems
• Logic Circuits -- VII
• Sequential Circuits – D Flip-Flops
• Another modification of the SR flip-flop is the D flip-flop, shown at the right with its characteristic table.
• Notice that the output of the flip-flop remains the same during subsequent clock pulses.
• The output changes only when the value of D changes.
• The D flip-flop is the fundamental circuit of computer memory.
– D flip-flops are usually illustrated using the block diagram at the right.
• The characteristic table for the D flip-flop is shown at the right.

Slide 10

cs2mf3/wfsp L16-10

Computer Hardware – Digital Systems

Text Section Covered: 3.5 - 3.6

Next Lecture Content: Finite State Machines (FSM).


Slide 1

cs2mf3/wfsp L17-1

CS2MF3 – Digital Systems and Systems Programming
Digital Systems – more on sequential circuits

Finite State Machines
• A finite state machine (FSM) or finite state automaton (plural: automata) is a model of behaviour composed of states, transitions, and actions.
• A state stores information about the past, i.e., it reflects the input changes from the system start to the present moment.
• A transition indicates a state change and is described by a condition that must be fulfilled to enable the transition.
• An action is a description of an activity that is to be performed at a given moment. There are several action types:
– Entry action -- executed when entering the state
– Exit action -- executed when exiting the state
– Input action -- executed depending on the present state and input conditions
– Transition action -- executed when performing a certain transition

Slide 2

cs2mf3/wfsp L17-2

More on Sequential Circuits -- I

Finite State Machines

An FSM can be represented using a state diagram (or state transition diagram), as in the figure at the right. FSMs can also be characterized by truth tables, as we have been doing in our Boolean algebra section. Therefore they can represent sequential circuits.


Slide 3

cs2mf3/wfsp L17-3

More on Sequential Circuits -- II

Finite State Machines
The MOORE Machine

• The FSM uses only entry actions, i.e. output depends only on the state.

• The advantage of the Moore model is a simplification of the behaviour.

• The example at the left shows a Moore FSM of an elevator door.

• The state machine recognizes two commands: "command_open" and "command_close" which trigger state changes.

• The entry action (E:) in state "Opening" starts a motor opening the door, the entry action in state "Closing" starts a motor in the other direction closing the door.

• States "Opened" and "Closed" do not perform any actions. They signal to the outside world (e.g. to other state machines) the situation: "door is open" or "door is closed".
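The elevator-door Moore machine above can be sketched as a transition table plus a table of entry actions. The state names, the two commands, and the entry actions follow the slide; the "door_open"/"door_closed" sensor events and the motor strings are hypothetical placeholders, since the slide's diagram does not name the events that complete the Opening and Closing states:

```python
TRANSITIONS = {
    ("Closed",  "command_open"):  "Opening",
    ("Opening", "door_open"):     "Opened",    # sensor event, assumed
    ("Opened",  "command_close"): "Closing",
    ("Closing", "door_closed"):   "Closed",    # sensor event, assumed
}

ENTRY_ACTIONS = {  # output depends only on the state: the Moore property
    "Opening": "start motor: open",
    "Closing": "start motor: close",
}

def step(state, event):
    """Advance the machine one event; return (new_state, entry_action)."""
    state = TRANSITIONS.get((state, event), state)  # ignore invalid events
    return state, ENTRY_ACTIONS.get(state)          # None for Opened/Closed

print(step("Closed", "command_open"))  # ('Opening', 'start motor: open')
```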

Slide 4

cs2mf3/wfsp L17-4

More on Sequential Circuits -- III
Finite State Machines

The MEALY Machine
• The FSM uses only input actions, i.e., output depends on input and state.
• The use of a Mealy FSM often leads to a reduction in the number of states.
• The example below shows a Mealy FSM implementing the same behaviour as in the Moore example (the behaviour depends on the implemented FSM execution model).

• There are two input actions (I:): "start motor to close the door if command_close arrives" and "start motor in the other direction to open the door if command_open arrives".


Slide 5

cs2mf3/wfsp L17-5

More on Sequential Circuits -- IV
Finite State Machines

The behavior of sequential circuits can be expressed using characteristic tables or finite state machines (FSMs).
• FSMs consist of a set of nodes that hold the states of the machine and a set of arcs that connect the states.

Moore and Mealy machines are two types of FSMs that are equivalent.
• They differ only in how they express the outputs of the machine.

Moore machines place outputs on each node, while Mealy machines present their outputs on the transitions. The behavior of a JK flip-flop is depicted below by a Moore machine (left) and a Mealy machine (right).

Slide 6

cs2mf3/wfsp L17-6

More on Sequential Circuits -- V
Finite State Machines

Although the behavior of Moore and Mealy machines is identical, their implementations differ.

The Moore machine:
• We return to the behavior of a JK flip-flop.


Slide 7

cs2mf3/wfsp L17-7

More on Sequential Circuits -- VI
Finite State Machines

The Mealy machine:
• Again we return to the behavior of a JK flip-flop.

Slide 8

cs2mf3/wfsp L17-8

More on Sequential Circuits -- VII
Finite State Machines

The algorithmic state machine (ASM)
• It is difficult to express the complexities of actual implementations using only Moore and Mealy machines because:
1. they do not address the intricacies of timing very well, and
2. it is often the case that an interaction of numerous signals is required to advance a machine from one state to the next.
• For these reasons, Christopher Clare invented the algorithmic state machine (ASM).


Slide 9

cs2mf3/wfsp L17-9

More on Sequential Circuits -- VIII

Finite State Machines
The algorithmic state machine (ASM)
• Example -- the ASM for a microwave oven:

Slide 10

cs2mf3/wfsp L17-10

More on Sequential Circuits -- IX

Text Section Covered: 3.7

Next Lecture Content: Computer Systems from Sequential Circuits.


Slide 1

cs2mf3/wfsp L18-1

CS2MF3 – Digital Systems and Systems Programming
A Final Word on Combinational and Sequential Circuits

"Stateful" Applications
• Sequential circuits are used anytime we have a “stateful” application.
– A stateful application is one where the next state of the machine depends on the current state of the machine and the input.
• A stateful application requires both combinational and sequential logic.
• The following slides provide several examples of circuits that fall into this category.
• EXAMPLE #1 – a four-bit register
– The block diagram is below and the actual circuit is at the left.

Slide 2

cs2mf3/wfsp L18-2

Combinational and Sequential Circuits -- I

Example #1 – The 4-bit register
Recall the D flip-flop behaviour: it remembers!
• At the right is the equivalent circuit in SR flip-flop terms.
• At the right is the representative D flip-flop schematic.
• The D flip-flop truth table.


Slide 3

cs2mf3/wfsp L18-3

Combinational and Sequential Circuits -- II
Example #2 – A 3-bit 4-word memory circuit built from D flip-flops.

Slide 4

cs2mf3/wfsp L18-4

Combinational and Sequential Circuits -- III
Example #3 – A binary counter built from JK flip-flops.

A binary counter is another example of a sequential circuit. The low-order bit is complemented at each clock pulse. Whenever it changes from 0 to 1, the next bit is complemented, and so on through the other flip-flops.

Recall the JK flip-flop behaviour at the right, the equivalent SR flip-flop (also at the right), and the JK flip-flop truth table below.
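The counting rule just described can be sketched behaviourally. This models the carry as a ripple through the stages (low-order bit first, our convention), not the gate-level JK wiring of the slide's circuit:

```python
def counter_tick(bits):
    """One clock pulse of a ripple binary counter (low-order bit first).
    A stage that falls 1 -> 0 carries into the next stage; a stage that
    rises 0 -> 1 stops the ripple."""
    bits = list(bits)
    for i in range(len(bits)):
        bits[i] = 1 - bits[i]   # complement this stage
        if bits[i] == 1:        # no 1 -> 0 change, so the carry stops here
            break
    return bits

state = [0, 0, 0]
for _ in range(5):
    state = counter_tick(state)
print(state)  # after 5 pulses: 5 = 101 in binary, low-order first: [1, 0, 1]
```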



Slide 5

cs2mf3/wfsp L18-5

Observations:
We have seen digital circuits from two points of view:
1. digital analysis, and
2. digital synthesis.

• Digital analysis explores the relationship between a circuit's inputs and its outputs.
• Digital synthesis creates logic diagrams using the values specified in a truth table.

Digital systems designers must also be mindful of the physical behaviors of circuits, including the minute propagation delays that occur between the time when a circuit's inputs are energized and when the output is accurate and stable.

Combinational and Sequential Circuits -- IV

Slide 6

cs2mf3/wfsp L18-6

Combinational and Sequential Circuits -- V
Observations – cont'd:

Digital designers rely on specialized software to create efficient circuits.

• Thus, software is an enabler for the construction of better hardware.

Of course, software is in reality a collection of algorithms that could just as well be implemented in hardware.

• Recall the Principle of Equivalence of Hardware and Software.

When we need to implement a simple, specialized algorithm and its execution speed must be as fast as possible, a hardware solution is often preferred.

This is the idea behind embedded systems, which are small special-purpose computers that we find in many everyday things.

Embedded systems require special programming that demands an understanding of the operation of digital circuits, the basics of which you have learned in these past several lectures.


Slide 7

cs2mf3/wfsp L18-7

Conclusions:
Computers are implementations of Boolean logic. Boolean functions are completely described by truth tables. Logic gates are small circuits that implement Boolean operators. The basic gates are AND, OR, and NOT.

• The XOR gate is very useful in parity checkers and adders.

The “universal gates” are NOR and NAND. Computer circuits consist of combinational logic circuits and sequential logic circuits. Combinational circuits produce outputs (almost) immediately when their inputs change. Sequential circuits require clocks to control their changes of state. The basic sequential circuit unit is the flip-flop: the behaviors of the SR, JK, and D flip-flops are the most important to know.

Combinational and Sequential Circuits -- VI

Slide 8

cs2mf3/wfsp L18-8

Conclusions – continued (but not studied in 2006/07):
The behavior of sequential circuits can be expressed using characteristic tables or through various finite state machines. Moore and Mealy machines are two finite state machines that model high-level circuit behavior. Algorithmic state machines are better than Moore and Mealy machines at expressing timing and complex signal interactions. Examples of sequential circuits include memory, counters, and Viterbi encoders and decoders.

Text Section Covered: 3.8

Next Lecture Content: COMPUTER OPERATION (at last!).

Combinational and Sequential Circuits -- VII


Slide 1

cs2mf3/wfsp L19-1

CS2MF3 – Digital Systems and Systems Programming

Simple Computer (MARIE) Software Programming
MARIE Components – (Machine Architecture that is Really Easy and Intuitive) [boy, what a reach!]
• Some Architectural Preliminaries First
– In Lectures 1 through 4, we described a general overview of computer systems.
– In Lectures 5 through 12, we examined how data are stored and manipulated by various computer system components.
– In Lectures 13 through 18, we investigated the fundamental components of digital circuits.
– Having this background, we can now understand how computer components work, and how they fit together to create useful computer systems. We will work towards the actual programming (or coding) of the computer as a system.
– This is called systems programming. At the lowest level it involves assembly language programming (for reasons we shall see), and at the higher level it may involve the “C” or Java programming languages.
– For the rest of this course we will be working with assembly language coding as applied first to our imaginary machine (MARIE) and then to the Intel CPU.

Slide 2

cs2mf3/wfsp L19-2

Simple Computer (MARIE) Software Programming

Machine Architecture that is Really Easy and Intuitive

The computer's CPU fetches, decodes, and executes program instructions.
• Recall that von Neumann computers store programs in memory, so instructions must be fetched from memory.
• Once an instruction is fetched, it must be taken apart to see what the CPU should do with it; this is called decoding.
• Finally, once the parts of the instruction are sent to the correct units that can deal with what is to be done, the CPU completes the requested action; this is called execution.

The two principal parts of the CPU are the datapath and the control unit.
• The datapath consists of an arithmetic-logic unit (ALU) and storage units (registers) that are interconnected by a data bus that is also connected to main memory.
• Various CPU components perform sequenced operations according to signals provided by its control unit.
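The fetch-decode-execute cycle can be sketched as a loop. The instruction set below (LOAD, ADD, HALT for a one-address accumulator machine) is a made-up miniature for illustration, not MARIE's actual instruction set, which we meet in the coming lectures:

```python
# A tiny stored-program machine: instructions and data share one memory.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("HALT", None), 10: 2, 11: 3}
pc, acc = 0, 0                        # program counter and accumulator

while True:
    opcode, operand = memory[pc]      # FETCH the instruction at PC
    pc += 1                           # advance to the next instruction
    if opcode == "LOAD":              # DECODE, then EXECUTE
        acc = memory[operand]
    elif opcode == "ADD":
        acc += memory[operand]
    elif opcode == "HALT":
        break

print(acc)  # 2 + 3 = 5
```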


Slide 3

cs2mf3/wfsp L19-3

MARIE Components -- I
Machine Architecture that is Really Easy and Intuitive

CPU components
• Registers hold data that can be readily accessed by the CPU.
• They can be implemented using D flip-flops.
– A 32-bit register requires 32 D flip-flops.
• The arithmetic-logic unit (ALU) carries out logical and arithmetic operations as directed by the control unit.
• The control unit determines which actions to carry out according to the values in a program counter register and a status register.
• The CPU shares data with other system components by way of a data bus.
– A bus is a set of wires that simultaneously convey a single bit along each line.
• Two types of buses are commonly found in computer systems:
– point-to-point buses (left), and
– multipoint buses (right).

Slide 4

cs2mf3/wfsp L19-4

MARIE Components -- II
Machine Architecture that is Really Easy and Intuitive

BUS structures
• Buses consist of data lines, control lines, and address lines.
• While the data lines convey bits from one device to another, control lines determine the direction of data flow and when each device can access the bus.
• Address lines determine the location of the source or destination of the data.


Slide 5

cs2mf3/wfsp L19-5

MARIE Components -- III
Machine Architecture that is Really Easy and Intuitive

More on BUS structures
• Recall that a point-to-point bus connects two nodes, while a multipoint bus connects many.
• Because a multipoint bus is a shared resource, access to it is controlled through protocols, which are built into the hardware.
• In a master-slave configuration, where more than one device can be the bus master, concurrent bus master requests must be arbitrated.
• Four categories of bus arbitration are:
1. Daisy chain: Permissions are passed from the highest-priority device to the lowest.
2. Centralized parallel: Each device is directly connected to an arbitration circuit.
3. Distributed using self-detection: Devices decide which gets the bus among themselves.
4. Distributed using collision-detection: Any device can try to use the bus. If its data collides with the data of another device, it tries again.

Slide 6

cs2mf3/wfsp L19-6

MARIE Components -- IV
Machine Architecture that is Really Easy and Intuitive

CLOCKs
• Every computer contains at least one clock that synchronizes the activities of its components.
• A fixed number of clock cycles are required to carry out each data movement or computational operation.
• The clock frequency, measured in megahertz or gigahertz, determines the speed with which all operations are carried out.
• Clock cycle time is the reciprocal of clock frequency.
– An 800 MHz clock has a cycle time of 1.25 ns.
• Clock speed should not be confused with CPU performance.
• The CPU time required to run a program is given by the general performance equation:

CPU time = (seconds / program) = (instructions / program) × (average cycles / instruction) × (seconds / cycle)
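The cycle-time example and the performance equation can be checked with a small sketch; the 10,000-instruction workload below is a made-up illustration, not a figure from the notes.

```python
def cpu_time(instruction_count, avg_cycles_per_instruction, clock_hz):
    # CPU time = (instructions/program) x (cycles/instruction) x (seconds/cycle)
    return instruction_count * avg_cycles_per_instruction / clock_hz

# An 800 MHz clock has a cycle time of 1 / (800 * 10^6) s = 1.25 ns.
cycle_time_ns = 1e9 / 800e6

# Hypothetical workload: 10,000 instructions averaging 4 cycles each.
t = cpu_time(10_000, 4, 800e6)
```

Each of the three factors is an independent lever: fewer instructions, fewer cycles per instruction, or a faster clock all shrink the product.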


Slide 7

cs2mf3/wfsp L19-7

MARIE Components -- V
Machine Architecture that is Really Easy and Intuitive

CLOCKs -- continued
• We see that we can improve CPU throughput when we reduce the number of instructions in a program, reduce the number of cycles per instruction, or reduce the number of nanoseconds per clock cycle.

The Input / Output Subsystem
• A computer communicates with the outside world through its input/output (I/O) subsystem.
• I/O devices connect to the CPU through various interfaces.
• I/O can be memory-mapped -- where the I/O device behaves like main memory from the CPU’s point of view.
• Or I/O can be instruction-based, where the CPU has a specialized I/O instruction set.

Slide 8

cs2mf3/wfsp L19-8

MARIE Components -- VI
Machine Architecture that is Really Easy and Intuitive

Interrupts
• The normal execution of a program is altered when an event of higher priority occurs. The CPU is alerted to such an event through an interrupt.
• Interrupts can be triggered by I/O requests, arithmetic errors (such as division by zero), or when an invalid instruction is encountered.
• Each interrupt is associated with a procedure that directs the actions of the CPU when an interrupt occurs.
– Nonmaskable interrupts are high-priority interrupts that cannot be ignored.

Text Section Covered: 4.1 – 4.5

Next Lecture Content: MARIE COMPUTER ORGANIZATION.


Slide 1

cs2mf3/wfsp L20-1

CS2MF3 – Digital Systems and Systems Programming
More on MARIE: A Simple Computer

Memory components
• Computer memory consists of a linear array of addressable storage cells that are similar to registers.
• Memory can be byte-addressable or word-addressable, where a word typically consists of two or more bytes.
• Memory is constructed of RAM chips, often referred to in terms of length × width.
• If the memory word size of the machine is 16 bits, then a 4M × 16 RAM chip gives us 4M (2^22) 16-bit memory locations.
• How does the computer access a memory location corresponding to a particular address?
• We observe that 4M can be expressed as 2^2 × 2^20 = 2^22 words.
• The memory locations for this memory are numbered 0 through 2^22 − 1.
• Thus, the memory bus of this system requires at least 22 address lines.
– The address lines “count” from 0 to 2^22 − 1 in binary. Each line is either “on” or “off”, indicating the location of the desired memory element.
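The address-line count follows directly from the number of addressable words; a quick sketch in plain Python:

```python
import math

def address_lines(num_words):
    # number of address lines needed to select one of num_words locations
    return math.ceil(math.log2(num_words))

lines_4M = address_lines(4 * 2**20)   # the 4M-word memory above
lines_4K = address_lines(4 * 2**10)   # MARIE's 4K-word memory
```

The same count (12 lines for 4K words) is exactly why MARIE's MAR and PC are 12 bits wide.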

Slide 2

cs2mf3/wfsp L20-2

More on MARIE: A Simple Computer
Memory components

At the right is a powers-of-2 chart which, when numbers get big, may mean different configurations of memory. For instance, at 32 bits, the memory may be 4 words of 8 bits each, OR may be 2 words of 16-bit memory. Below is shown this effect for 8-bit word memory and 16-bit word memory. We refer to the number of bits in a word as a row, and the number of rows tells us how many words compose the memory array.


Slide 3

cs2mf3/wfsp L20-3

More on MARIE: A Simple Computer
Memory components

In the old days, memory was placed on interface “cards”, and adding more memory meant adding more memory interface cards to the backplane. Each interface card had rows of IC memory “chips”. Today, memory is on the mainboard and is added to by adding memory “banks” to memory slots.

Slide 4

cs2mf3/wfsp L20-4

More on MARIE: A Simple Computer
Memory components

Physical memory usually consists of more than one RAM chip. Access is more efficient when memory is organized into banks of chips with the addresses interleaved across the chips. With low-order interleaving, the low-order bits of the address specify which memory bank contains the address of interest.

Low-Order Interleaving


Slide 5

cs2mf3/wfsp L20-5

More on MARIE -- I
Memory components -- continued

Accordingly, in high-order interleaving, the high-order address bits specify the memory bank.

We can now bring together many of the ideas that we have discussed to this point using a very simple model computer. Our model computer, the Machine Architecture that is Really Intuitive and Easy, MARIE, was designed for the singular purpose of illustrating basic computer system concepts. While this system is too simple to do anything useful in the real world, a deep understanding of its functions will enable us to comprehend system architectures that are much more complex.

High-Order Interleaving
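The two interleaving schemes differ only in which address bits select the bank. A sketch, assuming a hypothetical 4-bank × 8-word memory:

```python
def low_order_bank(addr, num_banks):
    # low-order interleaving: low bits pick the bank, high bits the offset
    return addr % num_banks, addr // num_banks

def high_order_bank(addr, words_per_bank):
    # high-order interleaving: high bits pick the bank, low bits the offset
    return addr // words_per_bank, addr % words_per_bank

# address 9 in a 4-bank x 8-word memory
lo = low_order_bank(9, 4)    # -> (bank 1, offset 2)
hi = high_order_bank(9, 8)   # -> (bank 1, offset 1)
```

Note that under low-order interleaving, consecutive addresses land in different banks, which is what makes sequential access efficient.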

Slide 6

cs2mf3/wfsp L20-6

More on MARIE -- II
MARIE architecture

MARIE has the following characteristics:
• Binary, two's complement data representation.
• Stored program, fixed word length data and instructions.
• 4K words of word-addressable main memory.
• 16-bit data words.
• 16-bit instructions: 4 bits for the opcode and 12 for the address.
• A 16-bit arithmetic logic unit (ALU).
• Seven registers for control and data movement.


Slide 7

cs2mf3/wfsp L20-7

More on MARIE -- III
MARIE architecture -- continued

Register Organization:
1. Accumulator, AC, a 16-bit register that holds a conditional operator (e.g., "less than") or one operand of a two-operand instruction.
2. Memory address register, MAR, a 12-bit register that holds the memory address of an instruction or the operand of an instruction.
3. Memory buffer register, MBR, a 16-bit register that holds the data after its retrieval from, or before its placement in, memory.
4. Program counter, PC, a 12-bit register that holds the address of the next program instruction to be executed.
5. Instruction register, IR, which holds an instruction immediately preceding its execution.
6. Input register, InREG, an 8-bit register that holds data read from an input device.
7. Output register, OutREG, an 8-bit register that holds data that is ready for the output device.

Slide 8

cs2mf3/wfsp L20-8

More on MARIE -- IV
MARIE architecture -- continued

Data Paths:
• The registers are interconnected, and connected with main memory, through a common data bus.
• Each device on the bus is identified by a unique number that is set on the control lines whenever that device is required to carry out an operation.
• Separate connections are also provided between the accumulator and the memory buffer register, and between the ALU and the accumulator and memory buffer register.
• This permits data transfer between these devices without use of the main data bus.


Slide 9

cs2mf3/wfsp L20-9

More on MARIE -- V
MARIE architecture -- continued

Assembly Language: Instruction Formats
• A computer’s instruction set architecture (ISA) specifies the format of its instructions and the primitive operations that the machine can perform.
• The ISA is an interface between a computer’s hardware and its software.
• Some ISAs include hundreds of different instructions for processing data and controlling program execution.
• The MARIE ISA consists of only thirteen instructions.
• This is the format of a MARIE instruction:
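Since a MARIE instruction is a 16-bit word with a 4-bit opcode above a 12-bit address, decoding it is two shift/mask operations; a minimal sketch:

```python
def decode(word):
    # high 4 bits = opcode, low 12 bits = address
    opcode = (word >> 12) & 0xF
    address = word & 0x0FFF
    return opcode, address

# LOAD (opcode 1) from address 0x003 assembles to the word 0x1003
op, addr = decode(0x1003)
```

This mirrors what the control unit does with the IR: IR[15-12] selects the operation, IR[11-0] supplies the address.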

Slide 10

cs2mf3/wfsp L20-10

More on MARIE -- VI
MARIE architecture -- continued

Instruction Formats -- continued
• The fundamental (first nine of 13) MARIE instructions are:
• This is a bit pattern for a LOAD instruction as it would appear in the IR:


Slide 11

cs2mf3/wfsp L20-11

More on MARIE -- VII
MARIE architecture -- continued

Instruction Formats -- continued
• So, for a LOAD instruction, we see that the opcode is 1 and the address from which to load the data is 3.
• For the above, this is a bit pattern for a SKIPCOND instruction as it would appear in the IR.
• For this instruction, we see that the opcode is 8 and bits 11 and 10 are 10, meaning that the next instruction will be skipped if the value in the AC is greater than zero.

Slide 12

cs2mf3/wfsp L20-12

More on MARIE -- VIII

Text Section Covered: 4.6 – 4.7

Next Lecture Content: MARIE Computer Programming.


Slide 1

cs2mf3/wfsp L21-1

CS2MF3 – Digital Systems and Systems Programming

MARIE – Programming Model

VB machine simulator
• This simulator will be illustrated in one of your tutorials after the mid-term.

Slide 2

cs2mf3/wfsp L21-2

CS2MF3 – Digital Systems and Systems Programming

MARIE – Programming Model

VB machine simulator
• CPU control unit (the microMachine)


Slide 3

cs2mf3/wfsp L21-3

MARIE Programming Operations -- I
The MARIE Programming Model -- continued

The MARIE microMachine
• Each of our instructions actually consists of a sequence of smaller instructions called microoperations.
• The exact sequence of microoperations that are carried out by an instruction can be specified using register transfer language (RTL).
• In the MARIE RTL, we use the notation M[X] to indicate the actual data value stored in memory location X, and ← to indicate the transfer of bytes to a register or memory location.
– So M[MAR] is a memory read from the memory location (address) held in the Memory Address Register (MAR).
• Example Codes:
– The RTL for the LOAD instruction is:

MAR ← X
MBR ← M[MAR]
AC ← MBR

– Similarly, the RTL for the ADD instruction is:

MAR ← X
MBR ← M[MAR]
AC ← AC + MBR

Slide 4

cs2mf3/wfsp L21-4

MARIE Programming Operations -- II
The MARIE Programming Model -- continued

The MARIE microMachine
• More Example Codes:
– The assembly code to change the consecutive sequence of instructions to be executed uses the SKIPCOND instruction, the most complicated MARIE instruction, which says to skip the next instruction
1. if the value in the accumulator (AC) is negative, signaled by bits 11 and 10 of the address part of the instruction being 00, OR
2. if the value in the accumulator (AC) is zero, signaled by bits 11 and 10 of the address part of the instruction being 01, OR
3. if the value in the accumulator (AC) is positive, signaled by bits 11 and 10 of the address part of the instruction being 10.
– The RTL for the SKIPCOND instruction is:

If IR[11 - 10] = 00 then
    If AC < 0 then PC ← PC + 1
else If IR[11 - 10] = 01 then
    If AC = 0 then PC ← PC + 1
else If IR[11 - 10] = 10 then
    If AC > 0 then PC ← PC + 1


Slide 5

cs2mf3/wfsp L21-5

MARIE Programming Operations -- III
The MARIE Programming Model -- continued

MARIE Instruction Processing – the FDE cycle
• The fetch-decode-execute cycle is the series of steps that a computer carries out when it runs a program, and is shown below in the flowchart.
• We first have to fetch an instruction from memory and place it into the IR.
• Once in the IR, it is decoded to determine what needs to be done next.
• If a memory value (operand) is involved in the operation, it is retrieved and placed into the MBR.
• With everything in place, the instruction is executed.

Slide 6

cs2mf3/wfsp L21-6

MARIE Programming Operations -- IV
The MARIE Programming Model -- continued

MARIE Instruction Processing -- Interrupts
• All computers provide a way of interrupting the fetch-decode-execute cycle.
• Interrupts occur when:
– A user break (e.g., Control+C) is issued
– I/O is requested by the user or a program
– A critical error occurs
• Interrupts can be caused by hardware or software.
– Software interrupts are also called traps.
• Interrupt processing involves adding another step to the fetch-decode-execute cycle, as shown at the lower left.
• The next slide shows how an interrupt is handled – by a procedure called an interrupt handler.
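The extra step amounts to checking for a pending interrupt before each fetch; a toy sketch (the handler here just records the event, and the event names are illustrative):

```python
def run(program, pending_interrupts):
    # simplified cycle: service any pending interrupt, then fetch/execute
    trace = []
    pc = 0
    while pc < len(program):
        if pending_interrupts:
            trace.append(("interrupt", pending_interrupts.pop(0)))
        trace.append(("execute", program[pc]))
        pc += 1
    return trace

trace = run(["LOAD 104", "ADD 105"], ["keypress"])
```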


Slide 7

cs2mf3/wfsp L21-7

MARIE Programming Operations -- V
The MARIE Programming Model -- continued

MARIE Instruction Processing -- Interrupts
• A Generic Interrupt Service Routine:

Slide 8

cs2mf3/wfsp L21-8

MARIE Programming Operations -- VI
The MARIE Programming Model -- continued

MARIE Instruction Processing – Interrupts
• For general-purpose systems, it is common to disable all interrupts during the time in which an interrupt is being processed.
– Typically, this is achieved by setting a bit in the flags register.
• Interrupts that are ignored in this case are called maskable.
• Nonmaskable interrupts are those interrupts that must be processed in order to keep the system in a stable condition.
• Interrupts are very useful in processing I/O.
• However, interrupt-driven I/O is complicated, and is left for the next course in computer architecture.
– We will look into this idea in greater detail later in the term.
• MARIE, being the simplest of simple systems, uses a modified form of programmed I/O.
• All output is placed in an output register, OutREG, and the CPU polls the input register, InREG, until input is sensed, at which time the value is copied into the accumulator.


Slide 9

cs2mf3/wfsp L21-9

MARIE Programming Operations -- VII

Text Section Covered: 4.8 – 4.9

Next Lecture Content: MARIE COMPUTER ORGANIZATION.


Slide 1

cs2mf3/wfsp L22-1

CS2MF3 – Digital Systems and Systems Programming
MARIE Software Programming

The MARIE micromachine
• Underneath each of the Assembly Language instructions, there may exist either a microcode controller (an all-hardware version we shall study later) or a microMachine.
– The microMachine means that there is a small computer inside the control unit that, for every assembly code instruction, executes a sequence of instructions that manipulate the control unit as it fetches, decodes, and executes, and then handles any pending interrupts, as we shall see later.
– Instead of assembly language we use Register Transfer Notation (RTN) to show which gates will be active for each assembly language instruction that is executed (that is, fetched, decoded, and executed).
– RTL programs exist for each assembly code instruction.
– RTL is held in Control Read-Only Memory (CROM). Sometimes CROM itself may be codable; then the CROM becomes a WCS (writable control store). Code that changes at this level is called machine emulation, as opposed to computer simulation (or interpretation or compilation), which occurs at the machine assembly code level.

Slide 2

cs2mf3/wfsp L22-2

Recall the simple MARIE program that was run in the tutorial, which takes two numbers from memory, adds them together, and puts the sum back in memory at another location. It is given below and is loaded by the loader starting at location 100; that is, a set of mnemonic instructions is stored at addresses 100 - 106 (hex):

MARIE Software Programming


Slide 3

cs2mf3/wfsp L22-3

Here is what happens inside the computer when the program runs. This is the LOAD 104 instruction:

After the fetch and decode, this LOAD (in bits IR[15-12]) instruction says to go to memory (address in bits IR[11-0]), pick up the value ( M[MAR] ) contained in memory at address 104, and put it into the Accumulator ( MBR -> ACC ).

MARIE Software Programming -- I

Slide 4

cs2mf3/wfsp L22-4

You have seen what happens in the VBsim; this different simulator is less graphical, but it does provide ways to delay execution by time or to set breakpoints that stop execution in order to view machine register contents.

MARIE Software Programming -- II


Slide 5

cs2mf3/wfsp L22-5

The simulator is programmed in JAVA and can be obtained by going to the textbook URL and following the instructions there to set it up. Shown below are the "open assembly file" (left) and "set delay" (right) commands.

MARIE Software Programming -- III

URL is http://computerscience.jbpub.com/ECOA

Slide 6

cs2mf3/wfsp L22-6

MARIE Software Programming -- IV

The simulator showing the Add2No program execution, with line-by-line snapshots (a) through (c).

(a)

(b)

(c)


Slide 7

cs2mf3/wfsp L22-7

This is the second instruction, ADD 105, and its RTN implementation:

The remaining two instructions should be run through with either the textbook simulator or the VB one to review the remaining implementations in the RTN.

MARIE Software Programming -- V

Slide 8

cs2mf3/wfsp L22-8

So why is this low-level coding called "assembly language"? The reason is that early computers could only be coded with actual machine code instructions, that is, the binary encoding of instructions that the computer can execute. The next step in making programming of computers easier was to create abbreviations for the main possible codes that could be executed. Examples are ADD, LOAD, etc. These are called mnemonics; they represent what the operation code (bits 15-12) does. Of course mnemonics are NOT executable by the computer, but are only helpful to humans who wish to program at this machine level.

Assemblers translate instructions that are comprehensible to humans into the machine language that is comprehensible to computers.

We note the distinction between an assembler and a compiler: in assembly language, there is a one-to-one correspondence between a mnemonic instruction and its machine code. With compilers, this is not usually the case. Assemblers, in translating the instructions to binary (machine code), must "assemble" the addresses and mnemonics into machine code.

MARIE Software Programming -- VI


Slide 9

cs2mf3/wfsp L22-9

MARIE Software Programming -- VII
Assembler Concepts

Assemblers create an object program file from mnemonic source code in two passes. Consider our "Add2No" assembly language program.

During the first pass, the assembler assembles as much of the program as it can, while it builds a symbol table that contains memory references (by line #) for all symbols in the program. For Add2No, the opcodes are:

OpCode(Load) = 1, OpCode(Add) = 3, OpCode(Store) = 2, OpCode(Halt) = 7

with symbols referenced in pgm lines 1, 2, and 3.

During the second pass, the instructions are completed using the values from the symbol table. That is, statements that reference symbols have those symbols replaced with the memory address of each specific symbol, as determined during the first pass of the assembler (see next slide).
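The two-pass scheme can be sketched as a toy assembler for a four-instruction subset, using the opcodes listed above; the Add2No-style source, the label syntax, and the origin address 0x100 follow the notes, but the code itself is an illustration, not the real MarieSim assembler.

```python
OPCODES = {"LOAD": 1, "STORE": 2, "ADD": 3, "HALT": 7}

def assemble(lines, origin=0x100):
    # pass 1: assign an address to each line, record labels in the symbol table
    symtab, parsed = {}, []
    for addr, line in enumerate(lines, start=origin):
        if "," in line:
            label, line = line.split(",", 1)
            symtab[label.strip()] = addr
        parsed.append((addr, line.split()))
    # pass 2: resolve symbols and emit 16-bit machine words
    words = []
    for addr, toks in parsed:
        mnem = toks[0].upper()
        if mnem == "DEC":
            words.append(int(toks[1]))
        elif mnem == "HEX":
            words.append(int(toks[1], 16))
        elif mnem == "HALT":
            words.append(OPCODES["HALT"] << 12)
        else:
            words.append((OPCODES[mnem] << 12) | symtab[toks[1]])
    return words, symtab

src = ["LOAD X", "ADD Y", "STORE Z", "HALT",
       "X, DEC 10", "Y, DEC 15", "Z, DEC 0"]
code, symtab = assemble(src)
```

Pass 1 only needs the line positions to fix every label's address; pass 2 can then fill in the operand field of each instruction word, which is exactly why forward references work.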

Slide 10

cs2mf3/wfsp L22-10

MARIE Software Programming -- VIII
Assembler Concepts – continued

The final assembled program is shown at the lower left in this slide. Consider our example program once again (at the upper right).

• Note that we have included two assembler directives, HEX and DEC, that specify the radix of the constants.
• Such directives help the assembler do its job and are not really part of the assembly language instruction set.

• Text Section Covered: 4.10 – 4.11
• Next Lecture Content: More on MARIE COMPUTER ORGANIZATION.


CS2MF3 – Digital Systems and Systems Programming
MARIE Software Programming

More on the Instruction Set
• Addressing Modes:
– So far, all of the MARIE instructions that we have discussed use a direct addressing mode. This means that the address of the operand is explicitly stated in the instruction.
– It is often useful to employ indirect addressing, where the address of the address of the operand is given in the instruction.
» If you have ever used pointers in a program, you are already familiar with indirect addressing.
– To illustrate the indirect addressing mode and what happens at the machine level, there are a couple of machine instructions with indirect addressing in the MARIE instruction set.
– The ADDI instruction specifies the address of the address of the operand. The RTL below tells us what happens at the register level: what is usually the address of the operand now becomes the address of a pointer to the actual operand, and two memory accesses are necessary instead of just one.

"ADDI X" instruction:
MAR ← X
MBR ← M[MAR]
MAR ← MBR
MBR ← M[MAR]
ACC ← ACC + MBR

cs2mf3/wfsp L23-1

MARIE Software Programming -- I
More on the Instruction Set

Addressing Modes -- continued
• The Indirect Addressing Mode:
– The other instruction that uses the indirect addressing mode is the jump indirect instruction.
– The JUMPI instruction specifies the address of the operand, which itself is the address from which the next instruction should be fetched. Again, two memory accesses are necessary instead of just one.

"JUMPI X" instruction:
MAR ← X
MBR ← M[MAR]
PC ← MBR

Another helpful instruction is the CLEAR instruction.
• It simply sets the contents of the accumulator to all zeroes.
• The RTL for CLEAR is:

"CLEAR" instruction:
ACC ← 0

cs2mf3/wfsp L23-2


More on the Instruction Set

MARIE Software Programming -- II

Subroutines (procedures)
• Another helpful programming tool is the use of subroutines.
• The jump-and-store instruction, JNS, gives us limited subroutine functionality. The details of the JNS instruction are given by the following RTL:

"JNS X" instruction:
MBR ← PC          / get next instr. location
MAR ← X           / get operand addr.
M[MAR] ← MBR      / put next instr. addr. into mem.
MBR ← X           / put operand addr. into MBR
ACC ← 1           / set up to add 1 to operand addr.
ACC ← ACC + MBR   / get ready to jump to subr.
PC ← ACC          / jump to operand addr. + 1

• QUESTION: Does JNS permit recursive calls?

cs2mf3/wfsp L23-3
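A sketch of the net effect of JNS suggests how to answer the question: the return address lives in the single memory word at X, so a nested (recursive) call overwrites the first return address. The addresses below are illustrative.

```python
def jns(mem, X, pc_next):
    # JNS X: save the return address (PC already points past the JNS)
    # in the word at X, then continue execution at X + 1
    mem[X] = pc_next
    return X + 1

mem = {}
pc = jns(mem, 0x30, 0x101)   # first call: return addr 0x101 saved at 0x30
first_return = mem[0x30]
pc = jns(mem, 0x30, 0x201)   # a nested call clobbers the saved address
```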

MARIE Software Programming -- III
Using the Full Instruction Set – Example

Using a loop to add five numbers, which are stored after the program in memory:

100 | LOAD Addr      / load addr of 1st no. 4 addition
101 | STORE Next     / store this addr as next pntr
102 | LOAD Num       / load the # of items to be added
103 | SUBT One       / decrement counter
104 | STORE Ctr      / store value in Ctr as loop counter
105 | CLEAR          / using running sum so make sure ACC=0
106 | Loop, LOAD Sum / load running sum, i.e. Sum in ACC
107 | ADDI Next      / add value pointed 2 by loc Next
108 | STORE Sum      / store this sum
109 | LOAD Next      / load Next #
10A | ADD One        / inc by 1 to point to next address
10B | STORE Next     / store that value into the loc "Next"
10C | LOAD Ctr       / load loop counter value
10D | SUBT One       / have done one so dec loop cntr.
10E | STORE Ctr      / restore value back in cntr loc.
10F | SKIPCOND 000   / if cntr<0, skip the next instruction
110 | JUMP Loop      / otherwise go through loop again.
111 | HALT           / done all five #s so quit program.
112 | Addr, HEX 118  / pntr to where the five numbers begin
113 | Next, HEX 0    / pntr to the next number to be added
114 | Num, DEC 5     / number of values to be added (5)
115 | Sum, DEC 0     / the running Sum
116 | Ctr, HEX 0     / loop counter (to be dec to 0 if done)
117 | One, DEC 1     / used to inc. and dec. where needed
118 | DEC 10         / first number to add
119 | DEC 15         / second number to add
11A | DEC 2          / third number to add
11B | DEC 25         / fourth number to add
11C | DEC 30         / fifth number to add

cs2mf3/wfsp L23-4


CPU Control Unit

MARIE Software Programming -- IV

Decoding Operations
• So the CPU of a computer contains registers (such as IR and PC, etc.) but must also contain a mechanism to control data manipulation; that is, data control paths need to be opened and closed at appropriate times. This is done by a computer control unit, which keeps things synchronized, making sure that bits flow to the correct components as the components are needed.
• There are two general ways in which a control unit can be implemented: microprogrammed control and hardwired control.
1. The former is a miniature machine that performs appropriate RTN instructions in the same way a normal machine works with assembly instructions. This is called a microCoded machine, or microMachine for short, where a small program is placed into read-only memory in the micromachine or microcontroller.
2. The latter is a hardwired unit that supplies the correct control signals when and where they are needed for each assembly instruction executed within the CPU. This is called a hardwired control unit, where controllers implement this program using digital logic components.

cs2mf3/wfsp L23-5

CPU Control Unit

MARIE Software Programming -- V

Decoding Operations• The text provides a complete list of the register transfer language for

each of MARIE’s instructions.• The microoperations given by each RTL define the operation of

MARIE’s control unit.• Each microoperation consists of a distinctive signal pattern that is

interpreted by the control unit and results in the execution of an instruction.

– Recall, the RTN for the Add instruction:

"Add" Instruction:
MAR ← X
MBR ← M[MAR]
ACC ← ACC + MBR

cs2mf3/wfsp L23-6


CPU Control Unit

MARIE Software Programming -- VI

Decoding Operations
• Each of MARIE’s registers and main memory has a unique address along the datapath.
• The addresses take the form of signals issued by the control unit.
• Let us define two sets of three signals. One set, W, B, P3, controls reading from memory or a register (the MBR in this case), and the other set, consisting of R, B, P3, controls writing to memory or a register.
• The next slide shows a close-up view of MARIE’s MBR.

CPU Control Unit
Decoding Operations

MARIE Software Programming -- VII

• This register is enabled for reading when W, B & P3 (MBR) are high.
• And it is enabled for writing when R, B & P3 (bus line 3) are high.


MARIE Software Programming -- VIII

Text Section Covered: 4.12 – part of 4.13

Next Lecture Content: More on MARIE computer decoding mechanisms.



CS2MF3 – Digital Systems and Systems Programming
MARIE Software Programming -- recap

MARIE Full Instruction Set


Systems Programming -- I

MARIE Software Programming
MARIE Instruction Set Format

• Opcode is 4 bits, so 2^4 = 16 different instructions can be represented. Only 13 are defined (see previous slide).

• Address size is 12 bits, so 2^12 = 4096 or 4K of RAM is directly addressable;

• However, if indirect addressing is used, then a full 16 bits is available for the address, so 2^16 = 65,536 or 64K of RAM can be addressed (but only indirectly).
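The field split described above can be sketched in a few lines of Python. This assumes the 4-bit opcode occupies the high bits of a 16-bit word, as in MARIE; the instruction value 0x3104 used below is just an illustrative word, not taken from the notes:

```python
def decode(instruction: int):
    """Split a 16-bit MARIE-style word into a 4-bit opcode and 12-bit address."""
    opcode = (instruction >> 12) & 0xF   # high 4 bits select 1 of 16 operations
    address = instruction & 0xFFF        # low 12 bits address up to 4K of RAM
    return opcode, address

# 0x3104: opcode 0x3 with operand address 0x104 (illustrative values).
op, addr = decode(0x3104)
print(op, hex(addr))  # 3 0x104
```

Indirect addressing then treats the full 16-bit word fetched from that address as the effective address, which is how 64K becomes reachable.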



MARIE Software Programming

Systems Programming -- II

MARIE Software Programming
MARIE Control Decoder -- hardwired

• In the last lecture, we saw the control unit for the RAM MBR, which required three control lines for reading and another three for writing the register contents.

• Now we will focus on the CPU ALU which involves only three assembly language instructions, namely, ADD, SUBTR and CLEAR.

• We shall see later why we need to add another, and that is the NOP, or no-operation, directive.
• A close-up of the ALU dataflow paths is illustrated on the next slide.

Systems Programming -- III

MARIE Software Programming
MARIE Control Decoder -- hardwired

• Therefore, call the ALU control lines A0, A1, A2, and A3, for +, -, clear & NOP.

• Register controls will be P0 through P5.

• To complete the control lines from the control box we need timing lines T0 through T7 and a counter reset line Cr.

• Let us now create the control line sequence to perform an ADD assembly language instruction.

• We look to the RTN necessary to do this, which is shown on the next slide.



Systems Programming -- IV
MARIE Software Programming
MARIE Control Decoder – hardwired

• Decoder Operations for an ADD

ADD instruction in RTN:
MAR ← X
MBR ← M[MAR]
ACC ← ACC + MBR

• After an Add instruction is fetched, the address, X, is in the rightmost 12 bits of the IR, which has a datapath address of 7.
• X is copied to the MAR, which has a datapath address of 1.
• Thus we need to raise signals P2, P1, and P0 to read from the IR, and signal P5 to write to the MAR.
• Here is the complete signal sequence for MARIE’s Add instruction:

WBP0 RBP1  T0: MAR ← X
RMP12 WBP0 RBP3  T1: MBR ← M[MAR]
P8 P9 A0 P10  T2: ACC ← ACC + MBR
Cr  T3: [Reset counter]

Systems Programming -- V

MARIE Software Programming
MARIE Control Decoder – hardwired

Decoder for the ADD instruction:

WBP0 RBP1  T0: MAR ← X
RMP12 WBP0 RBP3  T1: MBR ← M[MAR]
P8 P9 A0 P10  T2: ACC ← ACC + MBR
Cr  T3: [Reset counter]

• These signals are ANDed with combinational logic to bring about the desired machine behavior.

• The timing diagram for this instruction is shown at the right of this slide.

• Notice the concurrent signal states during each machine cycle: C0 through C3.
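The interplay of decoded opcode and timing steps can be sketched as a lookup: one entry per (opcode, timing step) pair, selected exactly as the AND gates select a signal set each cycle. The control-line names below are illustrative placeholders, not MARIE's actual signal names:

```python
ADD = 0b0011  # MARIE's Add opcode

# One row per timing step of the Add instruction; raising these lines
# at the given step carries out the corresponding RTN microoperation.
TIMING_TABLE = {
    (ADD, 0): {"read_IR", "write_MAR"},   # T0: MAR <- X
    (ADD, 1): {"read_mem", "write_MBR"},  # T1: MBR <- M[MAR]
    (ADD, 2): {"alu_add", "write_ACC"},   # T2: ACC <- ACC + MBR
    (ADD, 3): {"reset_counter"},          # T3: Cr
}

def signals(opcode: int, t: int) -> set:
    """Opcode decode ANDed with the timing step selects one signal set."""
    return TIMING_TABLE.get((opcode, t), set())

for t in range(4):
    print(f"T{t}:", sorted(signals(ADD, t)))
```

In hardware, the dictionary lookup is the AND of the opcode-decode line with the timing-counter line, so the signals for one step are all asserted concurrently within that cycle.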



Systems Programming -- VI

MARIE Software Programming

MARIE Control Decoder – hardwired

• Decoder for the ADD instruction.

• At the left is the hardwired logic for MARIE’s Add = 0011 instruction.

Systems Programming -- VII
MARIE Software Programming

MARIE Control Decoder – microprogrammed

Text Section Covered: Part of 4.13

Next Lecture Content: More on MARIE computer decoding mechanisms – microMachine version.




CS2MF3 – Digital Systems and Systems Programming
MARIE Software Programming

MARIE Control Decoder – microprogrammed

• In the last lecture we described a signal pattern that was generated by hardware to perform as a control decoder. This pattern could just as well have been generated by a microprogrammed control unit as by a hardwired one.

• In microprogrammed control, which we shall look at next, the bit pattern of an instruction that comes from a miniature computer feeds directly into the combinational logic within the control unit as shown at the right.


Systems Programming -- I
MARIE Software Programming

MARIE Control Decoder – microprogrammed
• In microprogrammed control, instruction microcode produces control signal changes.
• Machine instructions are the input for a microprogram that converts the 1s and 0s of an instruction into control signals.



Systems Programming -- II
MARIE Software Programming

MARIE Control Decoder – microprogrammed
• The microprogram is stored in firmware, which is also called the control store.
– Sometimes this control store may be writeable (WCS), meaning that the program that implements assembly code instructions can be modified.
– This means that software that makes a machine perform with one assembly language instruction set may be changed.
– Implementing a new assembly language set makes this a new machine. Making one machine appear like another by changing its language set at this low level is called EMULATION. Changing a computer to look like another with a high-level language is called SIMULATION, and is usually much slower than emulation.
• A microcode instruction is retrieved during each clock cycle.
• For MARIE, the microcode instruction format is shown below.


Systems Programming -- III
MARIE Software Programming

MARIE Control Decoder – microprogrammed
• In microprogrammed control, instruction microcode produces control signal changes.
• Machine instructions are the input for a microprogram that converts the 1s and 0s of an instruction into control signals.
• The microprogram is stored in firmware, which is also called the control store.
• A microcode instruction is retrieved during each clock cycle.
• If MARIE were microprogrammed, the microinstruction format could involve two operational codes in one instruction, as below.
• MicroOp1 and MicroOp2 contain binary codes for each instruction. Jump is a single bit indicating that the value in the Dest field is a valid address and should be placed in the microsequencer.
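One way to picture such a two-micro-op format is as packed bit fields. The widths below (5-bit micro-op codes, a 1-bit Jump flag, a 7-bit Dest address) are assumptions made for illustration only; the notes do not specify the field sizes:

```python
def pack(microop1: int, microop2: int, jump: int, dest: int) -> int:
    """Pack MicroOp1 | MicroOp2 | Jump | Dest into one microinstruction word."""
    return (microop1 << 13) | (microop2 << 8) | (jump << 7) | dest

def unpack(word: int):
    """Recover the four fields from a packed microinstruction word."""
    return ((word >> 13) & 0x1F,  # MicroOp1: bits 13-17
            (word >> 8) & 0x1F,   # MicroOp2: bits 8-12
            (word >> 7) & 0x1,    # Jump: bit 7
            word & 0x7F)          # Dest: bits 0-6

w = pack(0b00101, 0b00110, 1, 0b0001001)
print(unpack(w))  # (5, 6, 1, 9)
```

The microsequencer would inspect the Jump bit after executing both micro-ops and, if set, load Dest as the next control-store address.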



Systems Programming -- IV
MARIE Software Programming

MARIE Control Decoder – microprogrammed
• The table below contains MARIE’s microoperation codes along with the corresponding RTN:


Systems Programming -- V
MARIE Software Programming

MARIE Control Decoder – microprogrammed
• The first nine lines of MARIE’s microprogram are given below (using RTN for clarity).
• The first four lines are the fetch-decode-execute cycle.
• The remaining lines are the beginning of a jump table.



Systems Programming -- VI

MARIE Software Programming
MARIE Control Decoder – microprogrammed

• It’s important to remember that a microprogrammed control unit works like a system-in-miniature.

• Microinstructions are fetched, decoded, and executed in the samemanner as regular instructions.

• The second version of VBSim shown several lectures earlier can now be understood.

• Another screen dump of the microMachine simulator is shown on the next slide.

• Note the Control Stores, which have been expanded to illustrate the contents.


Systems Programming -- VI



Systems Programming -- VII
MARIE Software Programming

MARIE Control Decoder – microprogrammed
• Note that this extra level of instruction interpretation is what makes microprogrammed control slower than hardwired control.
• The advantages of microprogrammed control are that it can support very complicated instructions, and that only the microprogram needs to be changed if the instruction set changes (or an error is found).
• This was actually the case with early versions of the Intel chip: when a floating-point error with roundoff problems was found, it was fixed first by changing the microCode.
• Now, however, for speed purposes the chip control decoder is entirely hardcoded.


Systems Programming -- VII
MARIE Software Programming

MARIE Control Decoder – the microMachine

Text Section Covered: Last Part of 4.13

Next Lecture Content: Real World CPUs.



CS2MF3 – Digital Systems and Systems Programming

Architecture of Real World Computers – CISC vs RISC
MARIE shares many features with modern architectures, but it is not an accurate depiction of them.
In the following slides, we will briefly examine two machine architectures: an Intel architecture, which is a CISC machine, and MIPS, which is a RISC machine.

• CISC is an acronym for complex instruction set computer.• RISC stands for reduced instruction set computer.

During the latter stages of this course, we will be programming briefly in Assembler for the Intel machine.


Advanced Digital Systems -- I

Real World Machine Architectures – CISC vs RISC

• Register Windows is an advanced concept, where groups of register values are placed in rapid cache memory for quick access.

• Pipelining is also an advanced concept we shall look at in a later lecture, where instructions are executed faster by using datapath flows in a rapid, consecutive manner; this implies a more complicated architecture than MARIE's.



Advanced Digital Systems -- II
Real World Machine Architectures – The Intel Chip

The classic Intel architecture, the 8086, was born in 1979. It is a CISC architecture.
It was adopted by IBM for its famed PC, which was released in 1981.
The 8086 operated on 16-bit data words and supported 20-bit memory addresses.
Later, to lower costs, the 8-bit 8088 was introduced. Like the 8086, it used 20-bit memory addresses, allowing 1MB of memory to be addressed.
The 8086 had four 16-bit general-purpose registers that could be accessed by the half-word.
It also had a flags register, an instruction register, and a stack accessed through the values in two other registers: the base pointer and the stack pointer.
The 8086 had no built-in floating-point processing.
In 1980, Intel released the 8087 numeric coprocessor, but few users elected to install one because of its cost.


Advanced Digital Systems -- III
Real World Machine Architectures – The Intel Chip

In 1985, Intel introduced the 32-bit 80386. It also had no built-in floating-point unit.
The 80486, introduced in 1989, was an 80386 that had built-in floating-point processing and cache memory.
The 80386 and 80486 offered downward compatibility with the 8086 and 8088.
Software written for the smaller-word systems was directed to use the lower 16 bits of the 32-bit registers.
Currently, Intel's most advanced 32-bit single microprocessor is the Pentium 4. Modern chips now contain more than one CPU and are challenging the limits (and definition) of von Neumann architectures.
It can run at clock rates approaching 4 GHz, roughly 800 times faster than the 4.77 MHz of the 8086.
Speed-enhancing features include multilevel cache and instruction pipelining.
Intel, along with many others, is marrying many of the ideas of RISC architectures with microprocessors that are largely CISC.



Advanced Digital Systems -- IV
Real World Machine Architectures – The MIPS Chip

The MIPS family of CPUs has been one of the most successful in its class.
In 1986 the first MIPS CPU was announced.
It had a 32-bit word size and could address 4GB of memory.
Over the years, MIPS processors have been used in general-purpose computers as well as in games.
The MIPS architecture now offers 32- and 64-bit versions.
MIPS was one of the first RISC microprocessors.
The original MIPS architecture had only 55 different instructions, as compared with the 8086, which had over 100.
MIPS was designed with performance in mind: it is a load/store architecture, meaning that only the load and store instructions can access memory.
The large number of registers in the MIPS architecture keeps bus traffic to a minimum.


Digital Systems & Systems Programming
Summary

The major components of a computer system are its control unit, registers, memory, ALU, and data path.
A built-in clock keeps everything synchronized.
Control units can be microprogrammed or hardwired.
Hardwired control units give better performance, while microprogrammed units are more adaptable to changes.
Computers run programs through iterative fetch-decode-execute cycles.
Computers can run programs that are in machine language.
An assembler converts mnemonic code to machine language.
The Intel architecture is an example of a CISC architecture; MIPS is an example of a RISC architecture.



Digital Systems Programming
Systems and Software Programming

Text Section Covered: Last Part of 4.13

Next Lecture Content: Advanced Computer Systems.



CS2MF3 – Digital Systems and Systems Programming
Advanced Architecture of Real World Computers

These next few lectures build upon the ideas of MARIE to approximate more closely real-world computer architectures.
We will take a detailed look at different instruction formats, operand types, and memory access methods.
We will investigate the interrelation between machine organization and instruction formats.
This will lead to a deeper understanding of computer architecture in general that is closer to modern processors.
FIRST we will look at instruction set architectures in more detail:

• Instruction sets are differentiated by the following:
• Number of bits per instruction.
• Stack-based or register-based.
• Number of explicit operands per instruction.
• Operand location.
• Types of operations.
• Type and size of operands.


Advanced Systems Programming -- I

Instruction Set Architectures – continued
Instruction set architectures are measured according to:

• Main memory space occupied by a program.

• Instruction complexity.

• Instruction length (in bits).

• Total number of instructions in the instruction set.

In designing an instruction set, consideration is given to:
• Instruction length.
– Whether short, long, or variable.
• Number of operands.
• Number of addressable registers.
• Memory organization.
– Whether byte- or word-addressable.
• Addressing modes.
– Choose any or all: direct, indirect, or indexed.



Advanced Systems Programming -- II

Instruction Set Architectures – continued
Byte ordering, or endianness, is another major architectural consideration.
If we have a two-byte integer, the integer may be stored so that the least significant byte is followed by the most significant byte, or vice versa.

• In little endian machines, the least significant byte is followed by the most significant byte.

• Big endian machines store the most significant byte first (at the lower address).

As an example, suppose we have the hexadecimal number 12345678.

The big endian and little endian arrangements of the bytes are shown below.
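The two arrangements can be checked directly with Python's struct module, which lets us pick the byte order explicitly when packing the same 32-bit value:

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)     # big endian: most significant byte first
little = struct.pack("<I", value)  # little endian: least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412
```

Note that within each byte the bits are unaffected; endianness only concerns the order of the bytes in memory.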


Advanced Systems Programming -- III

Instruction Set Architectures – continued
Summary of byte ordering, or endianness:

Big endian:
• Is more natural.
• The sign of the number can be determined by looking at the byte at address offset 0.
• Strings and integers are stored in the same order.

Little endian:
• Makes it easier to place values on non-word boundaries.
• Conversion from a 16-bit integer address to a 32-bit integer address does not require any arithmetic.



Advanced Systems Programming -- IV

Instruction Set Architectures – continued
Another architecture design consideration is determining how the CPU will store data.
There are three choices:
1. A stack architecture
2. An accumulator architecture
3. A general-purpose register architecture
In choosing one over another, the tradeoffs are simplicity (and cost) of hardware design versus execution speed and ease of use.
In a stack architecture, instructions and operands are implicitly taken from the stack.
• A stack cannot be accessed randomly.
In an accumulator architecture, one operand of a binary operation is implicitly in the accumulator.
• The other operand is in memory, creating lots of bus traffic.
• This was at the heart of our simple MARIE architecture.


Advanced Systems Programming -- V

Instruction Set Architectures – continued
In a general-purpose register (GPR) architecture, registers can be used instead of memory.
• Faster than an accumulator architecture.
• Efficient implementation for compilers.
• Results in longer instructions.
Most systems today are GPR systems.
There are three types:
• Memory-memory, where two or three operands may be in memory.
• Register-memory, where at least one operand must be in a register.
• Load-store, where no operands may be in memory.
The number of operands and the number of available registers has a direct effect on instruction length.



Advanced Systems Programming -- VI
Instruction Set Architectures – Instruction Formats

Stack machines use one- and zero-operand instructions.
• LOAD and STORE instructions require a single memory address operand.
• Other instructions use operands from the stack implicitly.
• PUSH and POP operations involve only the stack’s top element.
• Binary instructions (e.g., ADD, MULT) use the top two items on the stack.
• Stack architectures require us to think about arithmetic expressions a little differently.
• We are accustomed to writing expressions using infix notation, such as: Z = X + Y.
• Stack arithmetic requires that we use postfix notation: Z = XY+.
– This is also called reverse Polish notation (RPN), (somewhat) in honor of its Polish inventor, Jan Lukasiewicz (1878 – 1956).
– Most HP calculators use RPN and are hard to use at first if the user is not familiar with stack machine operations.


Advanced Systems Programming -- VII
Instruction Set Architectures – Instruction Formats

Stack operations
• The principal advantage of postfix notation is that parentheses are not used.
• For example, the infix expression
Z = (X × Y) + (W × U)
becomes
Z = X Y × W U × +
in postfix notation.
• In a stack ISA, the postfix expression Z = X Y × W U × + might look like this:
PUSH X
PUSH Y
MULT
PUSH W
PUSH U
MULT
ADD
POP Z
Note: The result of a binary operation is implicitly stored on the top of the stack!
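A stack ISA's evaluation rule is easy to mimic in software: push operand values, and let each binary operator consume the top two stack items. This minimal sketch (with hypothetical variable bindings) evaluates the postfix form of Z = (X × Y) + (W × U):

```python
def eval_postfix(tokens, env):
    """Evaluate a postfix expression the way a stack machine would."""
    stack = []
    for tok in tokens:
        if tok in ("+", "*"):
            b, a = stack.pop(), stack.pop()   # top two items feed the operator
            stack.append(a + b if tok == "+" else a * b)
        else:
            stack.append(env[tok])            # PUSH the operand's value
    return stack.pop()                        # result sits on top of the stack

# Z = (X * Y) + (W * U) with illustrative values X=2, Y=3, W=4, U=5.
env = {"X": 2, "Y": 3, "W": 4, "U": 5}
print(eval_postfix(["X", "Y", "*", "W", "U", "*", "+"], env))  # 26
```

Notice that no parentheses or precedence rules are needed; the token order alone determines the evaluation, which is exactly why stack ISAs use postfix.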



Advanced Systems Programming -- VIII

Instruction Set Architectures – Instruction Formats
In a one-address ISA, like MARIE, the infix expression Z = X × Y + W × U looks like this:
LOAD X
MULT Y
STORE TEMP
LOAD W
MULT U
ADD TEMP
STORE Z
In a two-address ISA (e.g., Intel, Motorola), the infix expression Z = X × Y + W × U might look like this:
LOAD R1,X
MULT R1,Y
LOAD R2,W
MULT R2,U
ADD R1,R2
STORE Z,R1


Advanced Systems Programming -- IX

Instruction Set Architectures – Instruction Formats
With a three-address ISA (e.g., mainframes), the infix expression Z = X × Y + W × U might look like this:
MULT R1,X,Y
MULT R2,W,U
ADD Z,R1,R2

Text Section Covered: 5.1 and Part of 5.2

Next Lecture Content: Instruction Addressing Modes.



CS2MF3 – Digital Systems and Systems Programming
Architecture of Real World Computers -- Instructions

In the last lecture, we saw how instruction length is affected by the number of operands supported by the ISA.
In any instruction set, not all instructions require the same number of operands.
Operations that require no operands, such as HALT (as in MARIE), necessarily waste some space when fixed-length instructions are used.
One way to recover some of this space is to use expanding opcodes.
Suppose a system has 16 registers and 4K of memory.

• We need 4 bits to access one of the registers. We also need 12 bits for a memory address.

• If the system is to have 16-bit instructions, we have two choices for our instructions:
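The arithmetic behind those choices is worth checking. With a 16-bit word, 4-bit register ids, and 12-bit addresses, the opcode field is whatever is left over; an expanding-opcode scheme then trades opcode bits against operand bits per instruction class. A brief sketch of the bit budget:

```python
WORD, OPCODE, REG, ADDR = 16, 4, 4, 12

# Fixed 4-bit opcode: either one memory address or three register operands fits.
assert OPCODE + ADDR == WORD       # memory-reference format
assert OPCODE + 3 * REG == WORD    # three-register format

# Expanding opcodes: instructions needing fewer operand bits can "spend"
# the freed bits on a longer opcode, enlarging the instruction set.
assert 8 + 2 * REG == WORD         # 8-bit opcode, two registers
assert 12 + 1 * REG == WORD        # 12-bit opcode, one register
print("all formats fit in", WORD, "bits")
```

The longer opcodes are carved out of values the short opcodes leave unused, so the decoder can tell the formats apart from the leading bits.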


Advanced Systems Programming -- I

Advanced Machine Instruction Architectures – Instructions
If we allow the length of the opcode to vary, we could create a very rich instruction set.
Some instructions could have three-register addressing, two-register addressing, one-register addressing, or even no register addressing, as shown below.



Advanced Systems Programming -- II

Instruction Set Architectures – Instruction Formats
Other aspects concerning instruction formats can also be considered, such as:
• Data movement.
• Arithmetic.
• Boolean.
• Bit manipulation.
• Input and Output (I/O).
• Control transfer.
• Special-purpose commands, and lastly
• Addressing modes.
We shall next consider addressing modes in more detail than we did in the last set of lectures.


Advanced Systems Programming -- III

Instruction Set Architectures – More on Addressing Modes
Addressing modes specify where an operand is located.
They can specify a constant, a register, or a memory location.
The actual location of an operand is its effective address.
Certain addressing modes allow us to determine the address of an operand dynamically.
Immediate addressing is where the data are part of the instruction.
Direct addressing is where the address of the data is given in the instruction.
Register addressing is where the data are located in a register.
Indirect addressing gives the address of the address of the data in the instruction.
Register indirect addressing uses a register to store the address of the address of the data.
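The five modes can be modelled with a dictionary standing in for memory and another for the register file. The addresses and contents below are illustrative only, and the definitions follow the text's wording (note that register indirect here, as defined in these notes, has the register hold the address of the address of the data):

```python
memory = {0x800: 0x900, 0x900: 0x1000}  # illustrative memory contents
regs = {"R1": 0x800}                    # illustrative register file

def operand(mode, field):
    """Resolve the effective value of an operand under each addressing mode."""
    if mode == "immediate":          # the field itself is the data
        return field
    if mode == "direct":             # field is the address of the data
        return memory[field]
    if mode == "register":           # field names a register holding the data
        return regs[field]
    if mode == "indirect":           # field holds the address of the address
        return memory[memory[field]]
    if mode == "register_indirect":  # register holds the address of the address
        return memory[memory[regs[field]]]
    raise ValueError(mode)

print(operand("immediate", 0x800))         # 0x800
print(operand("direct", 0x800))            # 0x900
print(operand("indirect", 0x800))          # 0x1000
print(operand("register_indirect", "R1"))  # 0x1000
```

Running the same field value through each mode, as above, is exactly the comparison the worked example on the following slides performs.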



Advanced Systems Programming -- IV
Instruction Set Architectures – More on Addressing Modes

Indexed addressing uses a register (implicitly or explicitly) as an offset, which is added to the address in the operand to determine the effective address of the data.
Based addressing is similar, except that a base register is used instead of an index register.
The difference between the two is that an index register holds an offset relative to the address given in the instruction, while a base register holds a base address, where the address field represents a displacement from this base.
In stack addressing, the operand is assumed to be on top of the stack.
There are many variations on these addressing modes, including:
• Indirect indexed.
• Base/offset.
• Self-relative.
• Auto-increment and auto-decrement.
We will leave the topics mentioned immediately above for the main course in Computer Architecture.
We will next look at an example of the principal addressing modes.


Advanced Systems Programming -- V

Instruction Set Architectures – More on Addressing Modes
For the instruction shown, what value is loaded into the accumulator for each addressing mode?



Advanced Systems Programming -- VI

Instruction Set Architectures – More on Addressing Modes
For the instruction shown, the value shown is loaded into the accumulator for each addressing mode.


Advanced Systems Programming -- IX

Instruction Set Architectures – Addressing Modes

Text Section Covered: 5.1 through 5.4

Next Lecture Content: Instruction Level Pipelining (ILP).



CS2MF3 – Digital Systems and Systems Programming
Architecture of Real World Computers – Instruction Pipelining

Some CPUs divide the fetch-decode-execute cycle into smaller steps.
These smaller steps can often be executed in parallel to increase throughput.
Such parallel execution is called instruction-level pipelining.
This term is sometimes abbreviated ILP in the literature.
An example: suppose a fetch-decode-execute cycle were broken into the following smaller steps:

Suppose we have a six-stage pipeline. S1 fetches the instruction, S2 decodes it, S3 determines the address of the operands, S4 fetches them, S5 executes the instruction, and S6 stores the result.

1. Fetch instruction.
2. Decode opcode.
3. Calculate effective address of operands.
4. Fetch operands.
5. Execute instruction.
6. Store result.

Slide 2

cs2mf3/wfsp L29-2

Advanced Systems Programming -- I

Real World Machine Architectures – Instruction Pipelining
For every clock cycle, one small step is carried out, and the stages are overlapped.

S1. Fetch instruction.
S2. Decode opcode.
S3. Calculate effective address of operands.
S4. Fetch operands.
S5. Execute.
S6. Store result.


Slide 3

cs2mf3/wfsp L29-3

Advanced Systems Programming -- II
Real World Machine Architectures – Instruction Pipelining

This will carry on for the length of the program with each fetch/decode/execute cycle:

This is true UNLESS something stops the sequential set of instructions from being followed – more on this later.

Slide 4

cs2mf3/wfsp L29-4

Advanced Systems Programming -- III

Instruction Set Architectures – Instruction Pipelining
The theoretical speedup offered by a pipeline can be determined as follows:

Let tp be the time per stage.

Each instruction represents a task, T, in the pipeline.

The first task (instruction) requires k × tp time to complete in a k-stage pipeline.

The remaining (n - 1) tasks emerge from the pipeline one per cycle.

So total time to complete remaining tasks is (n - 1)tp.

Thus, to complete n tasks using a k-stage pipeline requires:

(k × tp) + (n - 1)tp = (k + n - 1)tp.
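As a quick sanity check of this formula, the following sketch (Python, with hypothetical numbers for n, k, and tp) compares the pipelined completion time against running every instruction's k stages serially:

```python
def pipelined_time(n, k, tp):
    """Total time for n tasks on a k-stage pipeline: (k + n - 1) * tp."""
    return (k + n - 1) * tp

def serial_time(n, k, tp):
    """Total time without pipelining: every task runs all k stages."""
    return n * k * tp

# Hypothetical numbers: 100 instructions, 6 stages, 1 ns per stage.
print(pipelined_time(100, 6, 1))  # 105 (ns)
print(serial_time(100, 6, 1))     # 600 (ns)
```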


Slide 5

cs2mf3/wfsp L29-5

Advanced Systems Programming -- IV

Instruction Set Architectures – Instruction Pipelining
If we take the time required to complete n tasks without a pipeline, n × k × tp, and divide it by the time it takes to complete n tasks using a pipeline, (k + n - 1) × tp, we find the speedup:

S = (n × k × tp) / ((k + n - 1) × tp)

If we take the limit as n approaches infinity, (k + n - 1) approaches n, which results in a theoretical speedup of:

S = k
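A short sketch (Python, with a hypothetical stage count) shows the speedup climbing toward k as n grows:

```python
def speedup(n, k):
    """Pipeline speedup: n*k*tp / ((k + n - 1)*tp); tp cancels out."""
    return (n * k) / (k + n - 1)

# With k = 6 stages, the speedup approaches k = 6 as n grows.
for n in (10, 100, 10_000):
    print(n, round(speedup(n, 6), 3))
```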

Slide 6

cs2mf3/wfsp L29-6

Advanced Systems Programming -- V

Instruction Set Architectures – Instruction Pipelining
Our neat equations take a number of things for granted:
1. We have to assume that the architecture supports fetching instructions and data in parallel.
2. We assume that the pipeline can be kept filled at all times. This is not always the case: pipeline hazards arise that cause pipeline conflicts and stalls.

An instruction pipeline may stall, or be flushed, for any of the following reasons:
• Resource conflicts.
• Data dependencies.
• Conditional branching.

Measures can be taken at the software level as well as at the hardware level to reduce the effects of these hazards, but they cannot be totally eliminated.


Slide 7

cs2mf3/wfsp L29-7

Advanced Systems Programming -- VI

Instruction Set Architectures – Instruction Pipelining
Stall Example:
• Below, at instruction #3 a branch (flow control change) is executed.
• We must now wait for the pipeline to clear before we can start filling the pipeline with the new (not consecutive) next instruction.
• For this pipeline with 4 cycles per fill, we must wait 4 instructions, that is, not begin filling the pipeline until the 8th instruction or the 7th cycle!

Slide 8

cs2mf3/wfsp L29-8

Advanced Systems Programming -- VII

Instruction Set Architectures – Instruction Pipelining

Text Section Covered: 5.5

Next Lecture Content: Real World Example of ISAs.


Slide 1

cs2mf3/wfsp L30-1

CS2MF3 – Digital Systems and Systems Programming

Architecture of Real World Computers – the INTEL chip
We return briefly to the Intel and MIPS architectures from the last set of lectures, using some of the ideas introduced in our ILP section.
Intel introduced pipelining to their processor line with the Pentium chip.
The first Pentium had two five-stage pipelines. Each subsequent Pentium processor had a longer pipeline than its predecessor, with the Pentium IV having a 24-stage pipeline.
The Itanium (IA-64) has only a 10-stage pipeline.
Intel processors support a wide array of addressing modes.
The original 8086 provided 17 ways to address memory, most of them variants on the methods presented in this set of lectures.
Owing to their need for backward compatibility, the Pentium chips also support these 17 addressing modes.
The Itanium, having a RISC core, supports only one: register indirect addressing with optional post-increment.

Slide 2

cs2mf3/wfsp L30-2

Advanced Digital Systems -- I
Real World Machine Architectures – the MIPS chip

MIPS is an acronym for Microprocessor Without Interlocked Pipeline Stages. The architecture is little endian and word-addressable, with three-address, fixed-length instructions.

Like Intel, the pipeline size of the MIPS processors has grown: the R2000 and R3000 have five-stage pipelines; the R4000 and R4400 have 8-stage pipelines. The R10000 has three pipelines: a five-stage pipeline for integer instructions, a seven-stage pipeline for floating-point instructions, and a six-stage pipeline for LOAD/STORE instructions. In all MIPS ISAs, only the LOAD and STORE instructions can access memory. The ISA uses only base addressing mode. The assembler accommodates programmers who need to use immediate, register, direct, indirect register, base, or indexed addressing modes.


Slide 3

cs2mf3/wfsp L30-3

Advanced Digital Systems -- II

Real World Machine Architectures – the Virtual Machine
The Java programming language is an interpreted language that runs in a software machine called the Java Virtual Machine (JVM).
A JVM is written in a native language for a wide array of processors, including MIPS and Intel.
Like a real machine, the JVM has an ISA all of its own, called bytecode. This ISA was designed to be compatible with the architecture of any machine on which the JVM is running.
Java bytecode is a stack-based language. Most instructions are zero-address instructions.
The JVM has four registers that provide access to five regions of main memory.

The next slide shows how the pieces fit together.

Slide 4

cs2mf3/wfsp L30-4

Advanced Digital Systems -- III

Real World Machine Architectures – the JAVA VM


Slide 5

cs2mf3/wfsp L30-5

Advanced Digital Systems -- IV

Real World Machine Architectures – the JVM
All references to memory are offsets from these registers. Java uses no pointers or absolute memory references.
Java was designed for platform interoperability, not performance!

Concluding Remarks
ISAs are distinguished according to their bits per instruction, number of operands per instruction, operand location, and the types and sizes of operands.
Endianness is another major architectural consideration.

A CPU can store data based on:
1. A stack architecture
2. An accumulator architecture
3. A general-purpose register architecture

Slide 6

cs2mf3/wfsp L30-6

Advanced Digital Systems -- V

Concluding Remarks
Instructions can be fixed length or variable length.
To enrich the instruction set for a fixed-length instruction set, expanding opcodes can be used.
The addressing mode of an ISA is also another important factor. We looked at:
• Immediate
• Direct
• Register
• Register Indirect
• Indirect
• Indexed
• Based
• Stack

A k-stage pipeline can theoretically produce an execution speedup of k as compared to a non-pipelined machine. Pipeline hazards such as resource conflicts and conditional branching prevent this speedup from being achieved in practice. The Intel, MIPS, and JVM architectures provide good examples of the concepts presented in this chapter.


Slide 7

cs2mf3/wfsp L30-7

Advanced Systems Programming -- VII

Text Section Covered: 5.6

Next Lecture Content: Memory Systems


Slide 1

cs2mf3/wfsp L31-1

CS2MF3 – Digital Systems and Systems Programming

Memory Systems
Memory lies at the heart of the stored-program computer.
In previous lectures, we studied the components from which memory is built and the ways in which memory is accessed by various ISAs.
In the next several lectures, we focus on memory organization. A clear understanding of these ideas will assist in efficient assembly language programming.
There are two kinds of main memory: random access memory (RAM) and read-only memory (ROM).
There are two types of RAM: dynamic RAM (DRAM) and static RAM (SRAM).

• Dynamic RAM consists of capacitors that slowly leak their charge over time. Thus they must be refreshed every few milliseconds to prevent data loss.

• DRAM is “cheap” memory owing to its simple design.

Slide 2

cs2mf3/wfsp L31-2

Memory Systems -- I
Types of Memory Systems – RAM

Static RAM
• SRAM consists of circuits similar to the D flip-flop type that we studied earlier.
• SRAM is very fast memory and it does not need to be refreshed like DRAM does. It is used to build cache memory, which we will discuss in detail later.

ROM – read-only memory (non-volatile memory)
• ROM does not need to be refreshed, either. In fact, it needs very little charge to retain its memory.
• ROM is used to store permanent, or semi-permanent, data that persists even while the system is turned off.

General Properties
Faster memory is more expensive than slower memory.
To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion.
Small, fast storage elements are kept in the CPU; larger, slower main memory is accessed through the data bus.
Larger, (almost) permanent storage in the form of disk and tape drives is still further from the CPU.


Slide 3

cs2mf3/wfsp L31-3

Memory Systems -- II
Types of Memory Systems

This memory storage organization can be thought of as a pyramid:

Slide 4

cs2mf3/wfsp L31-4

Memory Systems -- III
Types of Memory Systems

Cache Memory
• To access a particular piece of data, the CPU first sends a request to its nearest memory, usually cache.
• If the data is not in cache, then main memory is queried. If the data is not in main memory, then the request goes to disk.
• Once the data is located, the data and a number of its nearby data elements are fetched into cache memory.

This leads us to some definitions.
• A hit is when data is found at a given memory level.
• A miss is when it is not found.
• The hit rate is the percentage of time data is found at a given memory level.
• The miss rate is the percentage of time it is not.
• Miss rate = 1 - hit rate.
• The hit time is the time required to access data at a given memory level.
• The miss penalty is the time required to process a miss, including the time that it takes to replace a block of memory plus the time it takes to deliver the data to the processor.


Slide 5

cs2mf3/wfsp L31-5

Memory Systems -- IV
Types of Memory Systems

Cache Memory – cont'd
• An entire block of data is copied after a hit because the principle of locality tells us that once a byte is accessed, it is likely that a nearby data element will be needed soon.
• There are three forms of locality:
– Temporal locality: recently-accessed data elements tend to be accessed again.
– Spatial locality: accesses tend to cluster.
– Sequential locality: instructions tend to be accessed sequentially.
• The purpose of cache memory is to speed up accesses by storing recently used data closer to the CPU, instead of storing it in main memory.
• Although cache is much smaller than main memory, its access time is a fraction of that of main memory.
• Unlike main memory, which is accessed by address, cache is typically accessed by content; hence, it is often called content addressable memory.
• Because of this, a single large cache memory isn't always desirable -- it takes longer to search.

Slide 6

cs2mf3/wfsp L31-6

Memory Systems -- V
Types of Memory Systems

Cache Memory – cont'd
• The "content" that is addressed in content addressable cache memory is a subset of the bits of a main memory address called a field.
• The fields into which a memory address is divided provide a many-to-one mapping between larger main memory and the smaller cache memory.
• Many blocks of main memory map to a single block of cache. A tag field in the cache block distinguishes one cached memory block from another.
• The simplest cache mapping scheme is direct mapped cache.
• In a direct mapped cache consisting of N blocks of cache, block X of main memory maps to cache block Y = X mod N.
• Thus, if we have 10 blocks of cache, block 7 of cache may hold blocks 7, 17, 27, 37, . . . of main memory.
• Once a block of memory is copied into its slot in cache, a valid bit is set for the cache block to let the system know that the block contains valid data.
• What could happen if there was no bit marked valid?
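The Y = X mod N rule can be sketched in a couple of lines (Python, using the 10-block cache from the bullet above):

```python
N = 10  # number of cache blocks, as in the example above

def cache_block(x, n=N):
    """Direct-mapped cache: main memory block x maps to cache block x mod n."""
    return x % n

# Main memory blocks 7, 17, 27, 37 all compete for the same cache block.
print([cache_block(x) for x in (7, 17, 27, 37)])  # [7, 7, 7, 7]
```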


Slide 7

cs2mf3/wfsp L31-7

Memory Systems -- VI
Types of Memory Systems

Cache Memory – cont'd• The diagram below is a schematic of what cache looks like.

• Block 0 contains multiple words from main memory, identified with the tag 00000000. Block 1 contains words identified with the tag 11110101.

• The other two blocks are not valid, i.e., not yet used or needed.

Slide 8

cs2mf3/wfsp L31-8

Memory Systems -- VII
Types of Memory Systems

Cache Memory – cont'd
• The size of each field into which a memory address is divided depends on the size of the cache.
• Suppose our memory consists of 2^14 words, cache has 16 = 2^4 blocks, and each block holds 8 = 2^3 words.
– Thus memory is divided into 2^14 / 2^3 = 2^11 blocks.
• For our field sizes, we know we need 4 bits for the block, 3 bits for the word, and the tag is what is left over (7 bits):
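The field widths follow directly from base-2 logarithms; a small sketch (Python, using the numbers above) works them out:

```python
from math import log2

address_bits = 14      # memory holds 2**14 words
cache_blocks = 16      # 2**4 blocks
words_per_block = 8    # 2**3 words

word_bits = int(log2(words_per_block))             # selects a word in a block
block_bits = int(log2(cache_blocks))               # selects a cache block
tag_bits = address_bits - block_bits - word_bits   # whatever is left over

print(tag_bits, block_bits, word_bits)  # 7 4 3
```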


Slide 9

cs2mf3/wfsp L31-9

Memory Systems -- VIII
Types of Memory Systems

Cache Memory – cont'd
• As an example, suppose a program generates the address 1AA. In 14-bit binary, this number is 00 0001 1|010 1|010, which splits into tag = 0000011, block = 0101 (5), and word = 010 (2).
• The first 7 bits of this address go in the tag field, the next 4 bits go in the block field, and the final 3 bits indicate the word within the block.
• If the program subsequently generates the address 1AB, it will find the data it is looking for: tag 0000011 (3), block 0101 (5), word 011 (3).
• However, if the program generates the address 3AB instead, the block loaded for address 1AA is evicted from the cache and replaced by the block associated with the 3AB reference: 3AB = 00 0011 1|010 1|011, i.e., tag 7, block 5, word 3, since at this address the block number is the same as for the 1AA address.
• Suppose a program generates a series of memory references such as: 1AB, 3AB, 1AB, 3AB, . . . The cache will continually evict and replace blocks.
• The theoretical advantage offered by the cache is lost in this extreme case.
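The address splitting in this example can be sketched as follows (Python; the 7/4/3 field widths come from the preceding slide):

```python
def split_address(addr, tag_bits=7, block_bits=4, word_bits=3):
    """Split a 14-bit main memory address into (tag, block, word) fields."""
    word = addr & ((1 << word_bits) - 1)
    block = (addr >> word_bits) & ((1 << block_bits) - 1)
    tag = addr >> (word_bits + block_bits)
    return tag, block, word

print(split_address(0x1AA))  # (3, 5, 2)
print(split_address(0x1AB))  # (3, 5, 3)  -- same tag and block: a hit
print(split_address(0x3AB))  # (7, 5, 3)  -- same block, new tag: evicts 1AA's block
```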

Slide 10

cs2mf3/wfsp L31-10

Memory Systems -- IX

Types of Memory Systems
Cache Memory – to be cont'd

• This is the main disadvantage of direct mapped cache.
• Other cache mapping schemes are designed to prevent this kind of thrashing.

Text Section Covered: 6.1 – 6.4

Next Lecture Content:More on Memory Systems


Slide 1

cs2mf3/wfsp L32-1

CS2MF3 – Digital Systems and Systems Programming

Types of Memory Systems – Cache Memory
Last time we talked about caches, and specifically direct mapped caches.
But suppose, instead of placing memory blocks in specific cache locations based on memory address, we allow a block to go anywhere in cache.

In this way, cache would have to fill up before any blocks are evicted.

This is how fully associative cache works.

A memory address is partitioned into only two fields: the tag and the word.

Slide 2

cs2mf3/wfsp L32-2

More on Memory Systems -- I
Types of Memory Systems

More on Cache Memory – cont'd
• Suppose, as before, we have 14-bit memory addresses and a cache with 16 blocks, each block of size 8. The field format of a memory reference is:

• When the cache is searched, all tags are searched in parallel to retrieve the data quickly.

• This requires special, costly hardware.

• You will recall that direct mapped cache evicts a block whenever another memory reference needs that block.

• With fully associative cache, we have no such mapping, thus we must devise an algorithm to determine which block to evict from the cache.

• The block that is evicted is the victim block.
• There are a number of ways to pick a victim; we will discuss them shortly.


Slide 3

cs2mf3/wfsp L32-3

More on Memory Systems -- II
Types of Memory Systems

More on Cache Memory – cont'd
• Set associative cache combines the ideas of direct mapped cache and fully associative cache.
• An N-way set associative cache mapping is like direct mapped cache in that a memory reference maps to a particular location in cache.
• Unlike direct mapped cache, a memory reference maps to a set of several cache blocks, similar to the way in which fully associative cache works.
• Instead of mapping anywhere in the entire cache, a memory reference can map only to the subset of cache slots.

• The number of cache blocks per set in set associative cache varies according to overall system design.

• For example, a 2-way set associative cache can be conceptualized as shown in the schematic below.

• Each set contains two different memory blocks.

Slide 4

cs2mf3/wfsp L32-4

More on Memory Systems -- III

Types of Memory Systems
More on Cache Memory – cont'd

• In set associative cache mapping, a memory reference is divided into three fields: tag, set, and word, as shown above.
• As with direct-mapped cache, the word field chooses the word within the cache block, and the tag field uniquely identifies the memory address.
• The set field determines the set to which the memory block maps.
• Suppose we have a main memory of 2^14 bytes.
• This memory is mapped to a 2-way set associative cache having 16 blocks where each block contains 8 words.
• Since this is a 2-way cache, each set consists of 2 blocks, and there are 8 sets.
• Thus, we need 3 bits for the set, 3 bits for the word, giving 8 leftover bits for the tag:
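The same bit-width arithmetic, adjusted for sets of 2 blocks, can be sketched as (Python):

```python
from math import log2

address_bits = 14      # main memory of 2**14 bytes
cache_blocks = 16
ways = 2               # 2-way set associative
words_per_block = 8

sets = cache_blocks // ways                      # 8 sets of 2 blocks each
set_bits = int(log2(sets))                       # 3
word_bits = int(log2(words_per_block))           # 3
tag_bits = address_bits - set_bits - word_bits   # 8 leftover bits

print(tag_bits, set_bits, word_bits)  # 8 3 3
```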


Slide 5

cs2mf3/wfsp L32-5

More on Memory Systems -- IV
Types of Memory Systems

More on Cache Memory – cont'd
• With fully associative and set associative cache, a replacement policy is invoked when it becomes necessary to evict a block from cache.
• An optimal replacement policy would be able to look into the future to see which blocks won't be needed for the longest period of time.
• Although it is impossible to implement an optimal replacement algorithm, it is instructive to use it as a benchmark for assessing the efficiency of any other scheme we come up with.
• The replacement policy that we choose depends upon the locality that we are trying to optimize -- usually, we are interested in temporal locality.
• A least recently used (LRU) algorithm keeps track of the last time that a block was accessed and evicts the block that has been unused for the longest period of time.
• The disadvantage of this approach is its complexity: LRU has to maintain an access history for each block, which ultimately slows down the cache.

Slide 6

cs2mf3/wfsp L32-6

More on Memory Systems -- V
Types of Memory Systems

More on Cache Memory – cont'd
• First-in, first-out (FIFO) is a popular cache replacement policy.
• In FIFO, the block that has been in the cache the longest is evicted, regardless of when it was last used.
• A random replacement policy does what its name implies: it picks a block at random and replaces it with a new block.
• Random replacement can certainly evict a block that will be needed often or needed soon, but it never thrashes.
• The performance of hierarchical memory is measured by its effective access time (EAT).
• EAT is a weighted average that takes into account the hit ratio and relative access times of successive levels of memory.
• The EAT for a two-level memory is given by:

EAT = H × AccessC + (1 - H) × AccessMM

where H is the cache hit rate and AccessC and AccessMM are the access times for cache and main memory, respectively.


Slide 7

cs2mf3/wfsp L32-7

More on Memory Systems -- VI
Types of Memory Systems

More on Cache Memory – cont'd
• For example, consider a system with a main memory access time of 200ns supported by a cache having a 10ns access time and a hit rate of 99%.

• The EAT is: 0.99(10ns) + 0.01(200ns) = 9.9ns + 2ns = 11.9ns.
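The EAT formula from the previous slide can be checked directly (Python, using the example's numbers):

```python
def eat(hit_rate, access_cache, access_main):
    """Effective access time for a two-level memory hierarchy."""
    return hit_rate * access_cache + (1 - hit_rate) * access_main

# Numbers from the example above: 10 ns cache, 200 ns main memory, 99% hit rate.
print(round(eat(0.99, 10, 200), 1))  # 11.9 (ns)
```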

• This equation for determining the effective access time can be extended to any number of memory levels, as we will see in later sections.
• Cache replacement policies must also take into account dirty blocks, those blocks that have been updated while they were in the cache.
• Dirty blocks must be written back to memory. A write policy determines how this will be done.
• There are two types of write policies: write through and write back.
• Write through updates cache and main memory simultaneously on every write.

Slide 8

cs2mf3/wfsp L32-8

More on Memory Systems -- VII
Types of Memory Systems

More on Cache Memory – cont'd
• Write back (also called copyback) updates memory only when the block is selected for replacement.
• The disadvantage of write through is that memory must be updated with each cache write, which slows down the access time on updates. This slowdown is usually negligible, because the majority of accesses tend to be reads, not writes.
• The advantage of write back is that memory traffic is minimized, but its disadvantage is that memory does not always agree with the value in cache, causing problems in systems with many concurrent users.
• The cache we have been discussing is called a unified or integrated cache, where both instructions and data are cached.
• Many modern systems employ separate caches for data and instructions.
– This is called a Harvard cache.
• The separation of data from instructions provides better locality, at the cost of greater complexity.
– Simply making the cache larger provides about the same performance improvement without the complexity.


Slide 9

cs2mf3/wfsp L32-9

More on Memory Systems -- VIII
Types of Memory Systems

More on Cache Memory – cont'd
• Cache performance can also be improved by adding a small associative cache to hold blocks that have been evicted recently.
– This is called a victim cache.
• A trace cache is a variant of an instruction cache that holds decoded instructions for program branches, giving the illusion that noncontiguous instructions are really contiguous.
• Most of today's small systems employ multilevel cache hierarchies.
• The levels of cache form their own small memory hierarchy.
• Level 1 cache (8KB to 64KB) is situated on the processor itself.
– Access time is typically about 4ns.
• Level 2 cache (64KB to 2MB) may be on the motherboard, or on an expansion card.
– Access time is usually around 15 - 20ns.

Slide 10

cs2mf3/wfsp L32-10

More on Memory Systems -- IX
Types of Memory Systems

More on Cache Memory – cont'd
• In systems that employ three levels of cache, the Level 2 cache is placed on the same die as the CPU (reducing access time to about 10ns).
• Accordingly, the Level 3 cache (2MB to 256MB) refers to cache that is situated between the processor and main memory.
• Once the number of cache levels is determined, the next thing to consider is whether data (or instructions) can exist in more than one cache level.
• If the cache system uses an inclusive cache, the same data may be present at multiple levels of cache.
• Strictly inclusive caches guarantee that all data in a smaller cache also exists at the next higher level.
• Exclusive caches permit only one copy of the data.
• The tradeoffs in choosing one over the other involve weighing the variables of access time, memory size, and circuit complexity.


Slide 11

cs2mf3/wfsp L32-11

More on Memory Systems -- X

Types of Memory Systems
Virtual Memory – to be covered

Text Section Covered: Last Parts of 6.4

Next Lecture Content:More on Memory Systems


Slide 1

cs2mf3/wfsp L33-1

CS2MF3 – Digital Systems and Systems Programming

More on Memory Systems – Virtual Memory Systems
Cache memory enhances performance by providing faster memory access speed.
Virtual memory enhances performance by providing greater memory capacity without the expense of adding main memory. Instead, a portion of a disk drive serves as an extension of main memory.
If a system uses paging, virtual memory partitions main memory into individually managed page frames that are written (or paged) to disk when they are not immediately needed.

A physical address is the actual memory address of physical memory.

Programs create virtual addresses that are mapped to physical addresses by the memory manager.

Page faults occur when a logical address requires that a page be brought in from disk.

Memory fragmentation occurs when the paging process results in the creation of small, unusable clusters of memory addresses.

Slide 2

cs2mf3/wfsp L33-2

Advanced Systems -- I
Virtual Memory Systems
Main memory and virtual memory are divided into equal-sized pages.
The entire address space required by a process need not be in memory at once. Some parts can be on disk, while others are in main memory.
Furthermore, the pages allocated to a process do not need to be stored contiguously -- either on disk or in memory.
In this way, only the needed pages are in memory at any time; the unnecessary pages are in slower disk storage.
Information concerning the location of each page, whether on disk or in memory, is maintained in a data structure called a page table (shown below). There is one page table for each active process.


Slide 3

cs2mf3/wfsp L33-3

Advanced Systems -- II
Virtual Memory Systems

When a process generates a virtual address, the operating system translates it into a physical memory address.
To accomplish this, the virtual address is divided into two fields: a page field and an offset field.
The page field determines the page location of the address, and the offset indicates the location of the address within the page.
The logical page number is translated into a physical page frame through a lookup in the page table.
If the valid bit is zero in the page table entry for the logical address, this means that the page is not in memory and must be fetched from disk.
• This is called a page fault.
• If necessary, a page is evicted from memory and is replaced by the page retrieved from disk, and the valid bit is set to 1.
If the valid bit is 1, the virtual page number is replaced by the physical frame number.
The data is then accessed by adding the offset to the physical frame number.

Slide 4

cs2mf3/wfsp L33-4

Advanced Systems -- III
Virtual Memory Systems

As an example, suppose a system has a virtual address space of 8K and a physical address space of 4K, no cache is being utilized, and the system uses byte addressing. Therefore we have 2^13 / 2^10 = 2^3 virtual pages.
A virtual address has 13 bits (8K = 2^13), with 3 bits for the page field and 10 for the offset, because the page size is 1024.

A physical memory address requires 12 bits: the first two bits for the page frame and the trailing 10 bits for the offset.
At the left is a picture representing the current state using paging and the associated page table.
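The page/offset split and the recombination with a frame number can be sketched as (Python; the sample address and the page-to-frame mapping are hypothetical):

```python
OFFSET_BITS = 10              # page size of 1024 = 2**10 bytes
PAGE_SIZE = 1 << OFFSET_BITS

def split_virtual(addr):
    """Split a 13-bit virtual address into (page, offset)."""
    return addr >> OFFSET_BITS, addr & (PAGE_SIZE - 1)

def physical_address(frame, offset):
    """Recombine a 2-bit frame number with the unchanged 10-bit offset."""
    return (frame << OFFSET_BITS) | offset

page, offset = split_virtual(5000)  # hypothetical virtual address
print(page, offset)                 # 4 904
# If the page table mapped virtual page 4 to physical frame 2 (hypothetical):
print(physical_address(2, offset))  # 2952
```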


Slide 5

cs2mf3/wfsp L33-5

Advanced Systems -- IV
Virtual Memory Systems – a complete small memory example

Take a program that is 16 bytes long. Physical memory is byte addressable and is 8 bytes long. Assume a page size of two bytes and that the program references addresses in the following order: 0, 1, 2, 3, 6, 7, 10, 11. (This is called an address reference string.)

Assume that when this program first executes, page frames 0 and 1 were last used by some other activity, but the remaining page frames are free to use.

Slide 6

cs2mf3/wfsp L33-6

Advanced Systems -- V
Virtual Memory Systems
Example – continued
1. Address 0 referenced – no part of the program is in main memory, so a page miss occurs; program page 0 goes to the first unused frame, Frame 2.
2. Address 1 referenced – page hit, as Frame 2 holds program page 0.
3. Address 2 referenced – page miss (a page fault); program page 1 goes to Frame 0.
4. Address 3 referenced – page hit in Frame 0.
5. Address 6 referenced – page miss; program page 3 goes to Frame 1.
6. Address 7 – page hit.
7. Address 10 – page miss; program page 5 goes to Frame 3.
8. Address 11 – page hit.
• The situation above is shown in figures (a) and (b) below.
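The walk above can be reproduced with a tiny simulation; the frame-fill order 2, 0, 1, 3 is taken directly from the example rather than computed by a replacement policy:

```python
PAGE_SIZE = 2
frame_order = [2, 0, 1, 3]   # order in which frames are claimed in the example
page_table = {}              # virtual page -> physical frame
log = []

for addr in [0, 1, 2, 3, 6, 7, 10, 11]:   # the address reference string
    page = addr // PAGE_SIZE
    if page in page_table:                 # page hit
        log.append(("hit", page, page_table[page]))
    else:                                  # page miss: claim the next frame
        frame = frame_order[len(page_table)]
        page_table[page] = frame
        log.append(("miss", page, frame))

print(log)
```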


Slide 7

cs2mf3/wfsp L33-7

Advanced Systems -- VI
Virtual Memory Systems – Example continued

Summary: pages 0, 1, 3 and 5 are valid in memory (RAM), but pages 2, 6 and 7 are not. Suppose the program needs address 10 a second time; to access it directly we need more information than just the page number – we also need an offset (see (c)). In (c), the page field is 3 bits (8 program pages) and the offset is 1 bit (0 or 1). In (d) below, the physical (RAM) address must be determined from the virtual address: virtual page 5 maps to physical frame 3, keeping the same offset, generating the physical address shown in (d). Just as we considered cache hit/miss ratios when calculating an Effective Access Time (EAT), we must do the same here.
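The mapping in (c) and (d) can be checked numerically (page table taken from the running example):

```python
PAGE_SIZE = 2                            # 1-bit offset, as in (c)
page_table = {0: 2, 1: 0, 3: 1, 5: 3}    # valid virtual page -> frame

page, offset = divmod(10, PAGE_SIZE)     # virtual address 10 -> page 5, offset 0
physical = page_table[page] * PAGE_SIZE + offset
print(page, offset, physical)  # 5 0 6  (binary 110, as in (d))
```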

Slide 8

cs2mf3/wfsp L33-8

Advanced Systems -- VII
Advanced Virtual Memory Systems – EAT

We have seen that each memory access now requires two physical memory accesses (one for the page table, one for the data), so the EAT must be modified. Suppose a main memory access takes 200 ns, the page fault rate is 1% (meaning that 99% of the time we find the page we need in memory), and servicing a page fault takes 10 ms (including fetching the page from disk, updating the page table, and accessing the data). Then

EAT = 0.99 (200 ns + 200 ns) + 0.01 (10 ms) = 100,396 ns

Even if no page faults occurred, the EAT would be

EAT = 1.00 (200 ns + 200 ns) = 400 ns

which is still double the access time of main memory. To speed this up, the most recently used page table entries are cached in what is called a Translation Look-aside Buffer (TLB). The TLB might appear as below.

Typically the TLB is implemented as an associative cache, with the (virtual page, frame) pairs being mapped anywhere.
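The EAT arithmetic above can be verified with a short script (figures taken straight from the notes, in nanoseconds):

```python
MEM_ACCESS_NS = 200           # one physical memory access
PAGE_FAULT_NS = 10_000_000    # 10 ms to service a page fault
FAULT_RATE = 0.01

# Every access costs two memory references: page table, then data.
eat = (1 - FAULT_RATE) * (2 * MEM_ACCESS_NS) + FAULT_RATE * PAGE_FAULT_NS
print(eat)            # 100396.0 ns

eat_no_faults = 2 * MEM_ACCESS_NS
print(eat_no_faults)  # 400 ns, still double the raw memory access time
```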


Slide 9

cs2mf3/wfsp L33-9

Advanced Systems -- VIII
Advanced Virtual Memory Systems – TLB Operation

Example of an address lookup when using a TLB:
1. Extract the page number from the virtual address.
2. Extract the offset from the virtual address.
3. Search for the virtual page number in the TLB.
4. If the (virtual page #, page frame #) pair is in the TLB, add the offset to the physical frame number and access the memory location.
5. On a TLB miss, go to the page table to get the required frame number. If the page is in memory, use the frame number and add the offset to get the physical address.
6. If the page is not in main memory, generate a page fault and restart the access when the page fault is complete.
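A sketch of the six-step lookup (the TLB and page-table contents here are made up for illustration):

```python
PAGE_SIZE = 1024

tlb = {5: 3}                        # recently used (virtual page, frame) pairs
page_table = {0: 2, 1: 0, 5: 3}     # valid entries only; absent pages are on disk

def lookup(virtual_addr):
    page, offset = divmod(virtual_addr, PAGE_SIZE)   # steps 1-2
    if page in tlb:                                  # steps 3-4: TLB hit
        frame = tlb[page]
    elif page in page_table:                         # step 5: TLB miss, table hit
        frame = page_table[page]
        tlb[page] = frame                            # refresh the TLB
    else:                                            # step 6: page fault
        raise RuntimeError(f"page fault on page {page}")
    return frame * PAGE_SIZE + offset

print(lookup(5 * PAGE_SIZE + 7))   # TLB hit -> frame 3
```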

Slide 10

cs2mf3/wfsp L33-10

Advanced Systems -- IX
Advanced Virtual Memory Systems – TLB and Caching Example

Getting the physical address from the virtual address:

1. Use the TLB to find the frame via a recently cached (page, frame) pair, OR

2. On a TLB miss, use the page table to get the frame in main memory, and update the TLB as well.

Retrieve the data with the physical address by:

1. Searching the cache to see if the data is there, OR

2. On a cache miss, going to main memory to get the data and updating the cache.


Slide 11

cs2mf3/wfsp L33-11

Advanced Systems -- X
Advanced Virtual Memory Systems – real-world example (Pentium)

The Pentium uses 32-bit virtual addresses and 32-bit physical addresses. It supports either 4KB or 4MB page sizes and has two caches with 32-byte blocks: L1 (for the CPU, split half for I (instructions) and half for D (data)) and L2 (between CPU and memory). L1 uses LRU replacement and 2-way set-associative mapping; the TLBs are 4-way set associative. L2 is 1 MB in size, uses LRU, and is 2-way set associative.

Pentium Memory Hierarchy

Slide 12

cs2mf3/wfsp L33-12

Advanced Memory Systems -- XI

Text Section Covered: 6.5

Next Lecture Content: Input/Output Systems


Slide 1

cs2mf3/wfsp L34-1

CS2MF3 – Digital Systems and Systems Programming
Input/Output Systems – Introduction

Data storage and retrieval is one of the primary functions of computer systems.

• One could easily make the argument that computers are more useful to us as data storage and retrieval devices than they are as computational machines.

All computers have I/O devices connected to them, and to achieve good performance I/O should be kept to a minimum! In studying I/O, we seek to understand the different types of I/O devices as well as how they work. Sluggish I/O throughput can have a ripple effect, dragging down overall system performance.

• This is especially true when virtual memory is involved.

The fastest processor in the world is of little use if it spends most of its time waiting for data. If we really understand what’s happening in a computer system, we can make the best possible use of its resources.

Slide 2

cs2mf3/wfsp L34-2

Input/Output and Storage Systems -- I
Introduction
The overall performance of a system is a result of the interaction of all of its components. System performance is most effectively improved when the performance of the most heavily used components is improved. This idea is quantified by Amdahl’s Law:

S = 1 / ((1 - f) + f/k)

where S is the overall speedup, f is the fraction of work performed by the faster component, and k is the speedup of the faster component.

Amdahl’s Law gives us a handy way to estimate the performance improvement we can expect when we upgrade a system component. On a large system, suppose we can upgrade the CPU to make it 50% faster (k = 1.5) for $10,000, or upgrade the disk drives for $7,000 to make them 2.5 times as fast (k = 2.5). Processes spend 70% of their time running in the CPU and 30% of their time waiting for disk service. An upgrade of which component would offer the greater benefit for the lesser cost? Sometimes it is NOT obvious until we calculate things out.


Slide 3

cs2mf3/wfsp L34-3

Input/Output and Storage Systems -- II
Introduction – Amdahl's Law and Speedup

The processor option offers an overall speedup of about 1.30:

S = 1 / (0.30 + 0.70/1.5) ≈ 1.30

And the disk drive option gives a speedup of about 1.22:

S = 1 / (0.70 + 0.30/2.5) ≈ 1.22

Each 1% of improvement for the processor costs about $333 ($10,000 / 30), and for the disk a 1% improvement costs about $318 ($7,000 / 22).
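A few lines of code built on S = 1/((1 - f) + f/k) reproduce these numbers:

```python
def amdahl_speedup(f, k):
    """Overall speedup S = 1 / ((1 - f) + f/k)."""
    return 1.0 / ((1.0 - f) + f / k)

cpu = amdahl_speedup(0.70, 1.5)     # CPU: 70% of time, made 1.5x faster
disk = amdahl_speedup(0.30, 2.5)    # disk: 30% of time, made 2.5x faster

cpu_pct = round((cpu - 1) * 100)    # ~30% overall improvement
disk_pct = round((disk - 1) * 100)  # ~22% overall improvement

print(round(cpu, 2), round(disk, 2))        # 1.3 1.22
print(10_000 / cpu_pct, 7_000 / disk_pct)   # ~333 vs ~318 dollars per 1% gained
```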

Introduction – I/O Architectures
We define input/output as a subsystem of components that moves coded data between external devices and a host system.

Slide 4

cs2mf3/wfsp L34-4

Input/Output and Storage Systems -- III
I/O Architectures – introduction

I/O subsystems include:• Blocks of main memory that are devoted to I/O functions.• Buses that move data into and out of the system.

• Control modules in the host and in peripheral devices

• Interfaces to external components such as keyboards and disks.

• Cabling or communications links between the host system and its peripherals.

At the left is a model I/O configuration.


Slide 5

cs2mf3/wfsp L34-5

Input/Output and Storage Systems -- IV
I/O Architectures – connection schemes

I/O can be controlled in four general ways:
Programmed I/O reserves a register for each I/O device. Each register is continually polled to detect data arrival.
Interrupt-driven I/O allows the CPU to do other things until I/O is requested.
Direct Memory Access (DMA) offloads I/O processing to a special-purpose chip that takes care of the details.
Channel I/O uses dedicated I/O processors.
Recall from earlier lectures that in a system which uses interrupts, the status of the interrupt signal is checked at the top of the fetch-decode-execute cycle. The particular code that is executed whenever an interrupt occurs is determined by a set of addresses called interrupt vectors, stored in low memory. The system state is saved before the interrupt service routine is executed and is restored afterward.

Slide 6

cs2mf3/wfsp L34-6

Input/Output and Storage Systems -- V

I/O Architectures – interrupt-driven
Below is an idealized I/O subsystem that uses interrupts.

• Each device connects its interrupt line to the interrupt controller.

• The controller signals the CPU when any of the interrupt lines are asserted.


Slide 7

cs2mf3/wfsp L34-7

Input/Output and Storage Systems -- VI
I/O Architectures – Direct Memory Access (DMA) driven

Below is a Direct Memory Access or DMA configuration. Notice that the DMA and the CPU share the bus. The DMA runs at a higher priority and steals memory cycles from the CPU.

Slide 8

cs2mf3/wfsp L34-8

Input/Output and Storage Systems -- VII

I/O Architectures – channel I/O operation
Very large systems employ channel I/O. Channel I/O consists of one or more I/O processors (IOPs) that control various channel paths. Slower devices such as terminals and printers are combined (multiplexed) into a single faster channel. On IBM mainframes, multiplexed channels are called multiplexor channels; the faster ones are called selector channels.

Channel I/O is distinguished from DMA by the intelligence of the IOPs.

The IOP negotiates protocols, issues device commands, translates storage coding to memory coding, and can transfer entire files or groups of files independently of the host CPU.

The host has only to create the program instructions for the I/O operation and tell the IOP where to find them.


Slide 9

cs2mf3/wfsp L34-9

Input/Output and Storage Systems -- VIII
I/O Architectures – channel I/O operation
Below is a channel I/O configuration.

Slide 10

cs2mf3/wfsp L34-10

Input/Output and Storage Systems -- IX
I/O Architectures – Types by Data Classification

Character I/O devices process one byte (or character) at a time.
• Examples include modems, keyboards, and mice.
• Keyboards are usually connected through an interrupt-driven I/O system.

Block I/O devices handle bytes in groups.
• Most mass storage devices (disk and tape) are block I/O devices.
• Block I/O systems are most efficiently connected through DMA or channel I/O.

Text Section Covered: 7.1 to 7.3

Next Lecture Content: Input/Output Devices


Slide 1

cs2mf3/wfsp L35-1

CS2MF3 – Digital Systems and Systems Programming
Input/Output and Storage Systems – Architectures

I/O buses, unlike memory buses, operate asynchronously. Requests for bus access must be arbitrated among the devices involved.

Bus control lines activate the devices when they are needed, raise signals when errors have occurred, and reset devices when necessary.

The number of data lines is the width of the bus.

A bus clock coordinates activities and provides bit cell boundaries.

At the left is a generic DMA configuration showing how the DMA circuit connects to a data bus.

Slide 2

cs2mf3/wfsp L35-2

Input/Output and Storage Systems -- I
I/O Architectures – BUS connection schemes

Bytes can be conveyed from one point to another either by sending their encoding signals simultaneously (parallel data transmission) or by sending them one bit at a time (serial data transmission).

At the left is how a bus connects to a disk drive.


Slide 3

cs2mf3/wfsp L35-3

Input/Output and Storage Systems -- II
I/O Architectures – BUS connection schemes

Timing diagrams, such as this one at the upper right, define bus operation in detail.

Parallel data transmission for a printer, as shown at the lower left, resembles the signal protocol of a memory bus:

Slide 4

cs2mf3/wfsp L35-4

Input/Output and Storage Systems -- III

I/O Architectures – Parallel/Serial connection schemes
In parallel data transmission, the interface requires one conductor for each bit. Parallel cables are fatter than serial cables. Compared with parallel data interfaces, serial communications interfaces:

• Require fewer conductors.
• Are less susceptible to attenuation.
• Can transmit data farther and faster.

• Serial communications interfaces are suitable for time-sensitive (isochronous) data such as voice and video.

• Recent innovations have revitalized the serial-line concept by adding faster interfaces that now resemble bus structures (though signals are still transferred sequentially), such as the Universal Serial Bus (USB), which can operate at up to 480 Mbps (USB 2.0), speeds comparable to slower local area networks.


Slide 5

cs2mf3/wfsp L35-5

Input/Output and Storage Systems -- IV
I/O Devices – Magnetic Storage technology

Magnetic Disks
• Magnetic disks offer large amounts of durable storage that can be accessed quickly.
• Disk drives are called random (or direct) access storage devices, because blocks of data can be accessed according to their location on the disk.
– This term was coined when all other durable storage (e.g., tape) was sequential.
• Magnetic disk organization is shown at the right.

Disk tracks are numbered from the outside edge, starting with zero.

Slide 6

cs2mf3/wfsp L35-6

Input/Output and Storage Systems -- V
I/O Devices – Hard Magnetic Disks

Hard disk platters are mounted on spindles. Read/write heads are mounted on a comb that swings radially to read the disk. The rotating disk forms a logical cylinder beneath the read/write heads. Data blocks are addressed by their cylinder, surface, and sector. A number of electromechanical properties of a hard disk drive determine how fast its data can be accessed. Seek time is the time it takes for a disk arm to move into position over the desired cylinder. Rotational delay is the time it takes for the desired sector to move into position beneath the read/write head. Seek time + rotational delay = access time.


Slide 7

cs2mf3/wfsp L35-7

Input/Output and Storage Systems -- VI

I/O Devices – Hard Magnetic Disks
Transfer rate gives us the rate at which data can be read from the disk. Average latency is a function of the rotational speed: it equals half the rotation period, (60 / rpm) / 2 seconds.
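These quantities are easy to compute; the 7200 rpm and 8 ms seek figures below are illustrative, not from the notes:

```python
def avg_latency_ms(rpm):
    """Average rotational delay: half a revolution, in milliseconds."""
    return (60_000.0 / rpm) / 2

def access_time_ms(seek_ms, rpm):
    """Access time = seek time + rotational delay, per the notes."""
    return seek_ms + avg_latency_ms(rpm)

print(avg_latency_ms(7200))       # ~4.17 ms
print(access_time_ms(8.0, 7200))  # ~12.17 ms
```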

Mean Time To Failure (MTTF) is a statistically-determined value often calculated experimentally.

• It usually does not tell us much about the actual expected life of the disk. Design life is usually more realistic.

At the left is a typical power specification sheet for a hard disk drive; such drives usually include a read-write controller and cable socket connections for bus (both data and control) lines.

Slide 8

cs2mf3/wfsp L35-8

Input/Output and Storage Systems -- VII

I/O Devices – Hard Magnetic Disks

At the left is a typical specification sheet for a hard disk drive; such drives usually include a read-write controller and cable socket connections for bus (both data and control) lines.


Slide 9

cs2mf3/wfsp L35-9

Input/Output and Storage Systems -- IX
I/O Devices – Floppy (Flexible) Magnetic Disks

Floppy (flexible) disks are organized in the same way as hard disks, with concentric tracks that are divided into sectors. Physical and logical limitations restrict floppies to much lower densities than hard disks. A major logical limitation of the DOS/Windows floppy diskette is the organization of its file allocation table (FAT).

• The FAT gives the status of each sector on the disk: Free, in use, damaged, reserved, etc.

On a standard 1.44MB floppy, the FAT is limited to nine 512-byte sectors.
• There are two copies of the FAT.
There are 18 sectors per track and 80 tracks on each surface of a floppy, for a total of 2880 sectors on the disk, so each FAT entry needs at least 12 bits (2^11 = 2048 < 2880 < 2^12 = 4096).
• Thus, FAT entries for disks smaller than 10MB are 12 bits, and the organization is called FAT12.
• FAT16 is employed for disks larger than 10MB.

Slide 10

cs2mf3/wfsp L35-10

Input/Output and Storage Systems -- X
I/O Devices – Floppy (Flexible) Magnetic Disks

The disk directory associates logical file names with physical disk locations.

Directories contain a file name and the file’s first FAT entry.

If the file spans more than one sector (or cluster), the FAT contains a pointer to the next cluster (and FAT entry) for the file.

The FAT is read like a linked list until the <EOF> entry is found.

A directory entry says that a file we want to read starts at sector 121 in the FAT fragment shown below.

• Sectors 121, 124, 126, and 122 are read. After each sector is read, its FAT entry is read to find the next sector occupied by the file.

• At the FAT entry for sector 122, we find the end-of-file marker <EOF>.
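Following the FAT like a linked list, as in the example, can be sketched as:

```python
EOF = "<EOF>"
# FAT fragment from the example: sector -> next sector of the file
fat = {121: 124, 124: 126, 126: 122, 122: EOF}

def file_sectors(start):
    """Walk the cluster chain until the <EOF> marker is found."""
    chain = [start]
    while fat[chain[-1]] != EOF:
        chain.append(fat[chain[-1]])
    return chain

print(file_sectors(121))  # [121, 124, 126, 122]
```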


Slide 11

cs2mf3/wfsp L35-11

Input/Output and Storage Systems -- XI

I/O Devices – Optical Storage Media

Text Section Covered: 7.6

Next Lecture Content: Input/Output Systems and Optical Disks


Slide 1

cs2mf3/wfsp L36-1

CS2MF3 – Digital Systems and Systems Programming
Input/Output Systems – Optical Storage Media

Optical disks provide large storage capacities very inexpensively. They come in a number of varieties including CD-ROM, DVD, and WORM. Many large computer installations produce document output on optical disk rather than on paper; this idea is called COLD (Computer Output Laser Disk). It is estimated that optical disks can endure for a hundred years; other media are good for only a decade at best. CD-ROMs were designed by the music industry in the 1980s and later adapted to data. This history is reflected by the fact that data is recorded in a single spiral track, starting from the center of the disk and spiraling outward. Binary ones and zeros are delineated by bumps in the polycarbonate disk substrate; the transitions between pits and lands define binary ones. If you could unravel a full CD-ROM track, it would be nearly five miles long!

Slide 2

cs2mf3/wfsp L36-2

Input/Output and Storage Systems -- I
I/O Devices – Optical Storage Media

Single spiral track at the upper right

At the lower left is the CD drive unit schematic

W.F.S.Poehlman CS2MF3 -- Digital Systems & Systems Programming Page 167

Page 171: Course Lecture Notes for - McMaster Universitybruha/2mf3cs08.pdf · Course Lecture Notes . for . CS2MF3 . Digital Systems . and . Systems Programming . W.F.S.Poehlman . November,

Slide 3

cs2mf3/wfsp L36-3

Input/Output and Storage Systems -- II
I/O Devices – Optical Storage Media (CDs)

The logical data format for a CD-ROM is much more complex than that of a magnetic disk, as shown at the right and on the next slide; we will not go into detail. Different formats are provided for data and music. Two levels of error correction are provided for the data format. Because of this, a CD holds at most 650MB of data, but can contain as much as 742MB of music.

Slide 4

cs2mf3/wfsp L36-4

Input/Output and Storage Systems -- III
I/O Devices – Optical Storage Media (DVDs)

DVDs can be thought of as quad-density CDs.

Varieties include single-sided single-layer, single-sided double-layer, double-sided single-layer, and double-sided double-layer.

Where a CD-ROM can hold at most 650MB of data, DVDs can hold as much as 17GB.

One of the reasons for this is that DVD employs a laser that has a shorter wavelength than the CD’s laser.

This allows pits and land to be closer together and the spiral track to be wound tighter.


Slide 5

cs2mf3/wfsp L36-5

Input/Output and Storage Systems -- IV

I/O Devices – Optical Storage Media
A shorter-wavelength laser can read and write bytes at greater densities than a longer-wavelength laser can. This is one reason that DVD's density is greater than that of CD. The manufacture of blue-violet lasers can now be done economically, bringing about the next generation of laser disks. Two incompatible formats, HD-DVD and Blu-Ray, are competing for market dominance. Blu-Ray was developed by a consortium of nine companies that includes Sony, Samsung, and Pioneer.

• Maximum capacity of a single layer Blu-Ray disk is 25GB.

HD-DVD was developed under the auspices of the DVD Forum with NEC and Toshiba leading the effort.

• Maximum capacity of a single layer HD-DVD is 15GB.
The big difference between the two is that HD-DVD is backward compatible with red-laser DVDs, and Blu-Ray is not.

Slide 6

cs2mf3/wfsp L36-6

Input/Output and Storage Systems -- V

I/O Devices – Optical Storage Media
Blue-violet laser disks have also been designed for use in the data center. The intention is to provide a means for long-term data storage and retrieval. Two types are now dominant:

• Sony’s Professional Disk for Data (PDD) that can store 23GB on one disk and

• Plasmon’s Ultra Density Optical (UDO) that can hold up to 30GB.It is too soon to tell which of these technologies will emerge as the dominant technology.


Slide 7

cs2mf3/wfsp L36-7

Input/Output and Storage Systems -- VI

I/O Devices – Magnetic Tape Storage Media
First-generation magnetic tape was not much more than wide analog recording tape, having capacities under 11MB. Data was usually written in nine vertical tracks. Today's tapes are digital and provide multiple gigabytes of data storage. Two dominant recording methods are serpentine and helical scan, which are distinguished by how the read/write head passes over the recording medium.

Slide 8

cs2mf3/wfsp L36-8

Input/Output and Storage Systems -- VII

I/O Devices – Magnetic Tape Storage Media

Serpentine recording is used in digital linear tape (DLT) and quarter-inch cartridge (QIC) tape systems. Digital audio tape (DAT) systems employ helical scan recording.

← Serpentine

Helical Scan ^


Slide 9

cs2mf3/wfsp L36-9

Input/Output and Storage Systems -- VIII
I/O Devices – Magnetic Tape Storage Media

Numerous incompatible tape formats emerged over the years.
• Sometimes even different models of the same manufacturer's tape drives were incompatible!
Finally, in 1997, HP, IBM, and Seagate collaboratively invented a best-of-breed tape standard. They called this new tape format Linear Tape Open (LTO) because the specification is openly available. LTO, as the name implies, is a linear digital tape format. The specification allowed for the refinement of the technology through four “generations.” Generation 3 was released in 2004.

• Without compression, the tapes support a transfer rate of 80MB per second and each tape can hold up to 400GB.

LTO supports several levels of error correction, providing superb reliability.

• Tape has a reputation for being an error-prone medium.

Slide 10

cs2mf3/wfsp L36-10

Input/Output and Storage Systems -- VI

I/O Devices – Magnetic and Optical Storage Media

Text Section Covered: 7.7 and 7.8

Next Lecture Content: Input/Output Systems and Magnetic Tapes


Slide 1

cs2mf3/wfsp L37-1

CS2MF3 – Digital Systems and Systems Programming
Fault-Tolerant Disk Storage Systems – RAID 0

RAID, an acronym for Redundant Array of Independent Disks, was invented to address problems of disk reliability, cost, and performance. In RAID, data is stored across many disks, with extra disks added to the array to provide error correction (redundancy). The inventors of RAID, David Patterson, Garth Gibson, and Randy Katz, provided a RAID taxonomy that has persisted for a quarter of a century, despite many efforts to redefine it. RAID Level 0, also known as drive spanning, provides improved performance, but no redundancy.

• Data is written in blocks across the entire array.
• The disadvantage of RAID 0 is its low reliability.

Slide 2

cs2mf3/wfsp L37-2

Fault-Tolerant Disk Storage Systems -- I

Redundant Array of Independent Disks – RAID 1
RAID Level 1, also known as disk mirroring, provides 100% redundancy and good performance.

• Two matched sets of disks contain the same data.
• The disadvantage of RAID 1 is cost.


Slide 3

cs2mf3/wfsp L37-3

Fault-Tolerant Disk Storage Systems -- II

Redundant Array of Independent Disks – RAID 2
A RAID Level 2 configuration consists of a set of data drives and a set of Hamming code drives.

• Hamming code drives provide error correction for the data drives.

• RAID 2 performance is poor and the cost is relatively high.

Slide 4

cs2mf3/wfsp L37-4

Fault-Tolerant Disk Storage Systems -- III

Redundant Array of Independent Disks – RAID 3
RAID Level 3 stripes bits across a set of data drives and provides a separate disk for parity.

• Parity is the XOR of the data bits.

• RAID 3 is not suitable for commercial applications, but is good for personal systems.
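A minimal sketch of RAID 3's parity idea, showing that XOR parity lets any single failed data drive be rebuilt from the survivors (the drive contents are made-up bytes):

```python
from functools import reduce

def parity(blocks):
    """Parity stream: bitwise XOR across corresponding bytes of each drive."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Stripes on three data drives (illustrative bytes)
drives = [b"\x0f\x10", b"\xf0\x01", b"\xaa\x55"]
p = parity(drives)

# If drive 1 fails, XOR the survivors with the parity drive to rebuild it.
rebuilt = parity([drives[0], drives[2], p])
print(rebuilt == drives[1])  # True
```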


Slide 5

cs2mf3/wfsp L37-5

Fault-Tolerant Disk Storage Systems -- IV

Redundant Array of Independent Disks – RAID 4
RAID Level 4 is like adding parity disks to RAID 0.

• Data is written in blocks across the data disks, and a parity block is written to the redundant drive.

• RAID 4 would be feasible if all record blocks were the same size.

Slide 6

cs2mf3/wfsp L37-6

Fault-Tolerant Disk Storage Systems -- V

Redundant Array of Independent Disks – RAID 5
RAID Level 5 is RAID 4 with distributed parity.

• With distributed parity, some accesses can be serviced concurrently, giving good performance and high reliability.

• RAID 5 is used in many commercial systems.


Slide 7

cs2mf3/wfsp L37-7

Fault-Tolerant Disk Storage Systems -- VI

Redundant Array of Independent Disks – RAID 6
RAID Level 6 carries two levels of error protection over striped data: Reed-Solomon coding and parity.

• It can tolerate the loss of two disks.
• RAID 6 is write-intensive, but highly fault-tolerant.

Slide 8

cs2mf3/wfsp L37-8

Fault-Tolerant Disk Storage Systems -- VII

Redundant Array of Independent Disks – RAID DP
Double-parity RAID (RAID DP) employs pairs of overlapping parity blocks that provide linearly independent parity functions.


Slide 9

cs2mf3/wfsp L37-9

Fault-Tolerant Disk Storage Systems -- VIII
Redundant Array of Independent Disks – RAID ADG

Like RAID 6, RAID DP can tolerate the loss of two disks. The use of simple parity functions gives RAID DP better performance than RAID 6. Of course, because two parity functions are involved, RAID DP's performance is somewhat degraded from that of RAID 5.

• RAID DP is also known as EVENODD, diagonal parity RAID, RAID 5DP, advanced data guarding RAID (RAID ADG) and, erroneously, RAID 6.

Redundant Array of Independent Disks – RAID 10

Large systems consisting of many drive arrays may employ various RAID levels, depending on the criticality of the data on the drives.

• A disk array that provides program workspace (say for file sorting) does not require high fault tolerance.

Critical, high-throughput files can benefit from combining RAID 0 with RAID 1, called RAID 10.

Keep in mind that a higher RAID level does not necessarily mean a “better” RAID level. It all depends upon the needs of the applications that use the disks.
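The block-to-disk mapping of a RAID 10 array can be sketched as striping (RAID 0) across mirrored pairs (RAID 1). The helper below is hypothetical (the pairing scheme and names are illustrative assumptions, not any vendor's layout):

```python
def raid10_location(lba, n_pairs, blocks_per_strip=1):
    """Map a logical block address to its mirror pair and the two
    physical disks holding copies. Pair p is assumed to own
    disks 2p and 2p+1; strips rotate round-robin across pairs."""
    pair = (lba // blocks_per_strip) % n_pairs
    strip_offset = lba // (blocks_per_strip * n_pairs)
    return {"pair": pair,
            "disks": (2 * pair, 2 * pair + 1),   # both copies of the block
            "strip_offset": strip_offset}
```

Every write lands on both disks of one pair, which is where RAID 10's redundancy (and its 50% capacity overhead) comes from.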

Slide 10

cs2mf3/wfsp L37-10

Fault-Tolerant Disk Storage Systems -- IX

Concepts of RAID

Text Section Covered: 7.9

Next Lecture Content: Course Summary



Slide 1

cs2mf3/wfsp L38-1

CS2MF3 – Digital Systems and Systems Programming

Future of Data Storage Systems

Advances in technology have defied all efforts to define the ultimate upper limit for magnetic disk storage.

• In the 1970s, the upper limit was thought to be around 2 Mb/in².

• Today’s disks commonly support 20 Gb/in².

Improvements have occurred in several different technologies including:

• Materials science

• Magneto-optical recording heads

• Error-correcting codes

As data densities increase, bit cells consist of proportionately fewer magnetic grains. There is a point at which there are too few grains to hold a value, and a 1 might spontaneously change to a 0, or vice versa. This point is called the superparamagnetic limit.

• In 2006, the superparamagnetic limit was thought to lie between 150 Gb/in² and 200 Gb/in².

Even if this limit is wrong by a few orders of magnitude, the greatest gains in magnetic storage have probably already been realized.
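It is worth seeing what such an areal density figure means geometrically: at 150 Gb/in², a bit cell occupies only a few thousand square nanometres. A quick Python check (this assumes square bit cells, which is a simplification; real cells are rectangular):

```python
def bit_cell_side_nm(density_bits_per_in2):
    """Side length in nm of an (assumed square) bit cell
    at the given areal density in bits per square inch."""
    IN_TO_NM = 2.54e7                     # 1 inch = 2.54 cm = 2.54e7 nm
    area_nm2 = IN_TO_NM ** 2 / density_bits_per_in2
    return area_nm2 ** 0.5
```

At 150 Gb/in² this gives a cell roughly 66 nm on a side, small enough that the number of magnetic grains per cell becomes the limiting concern described above.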

Slide 2

cs2mf3/wfsp L38-2

Disk Storage Systems -- I

Future of Data Storage Systems

Future exponential gains in data storage most likely will occur through the use of totally new technologies. Research into finding suitable replacements for magnetic disks is taking place on several fronts. Some of the more interesting technologies include:

• Biological materials

• Holographic systems

• Micro-electro-mechanical devices

Present-day biological data storage systems combine organic compounds such as proteins or oils with inorganic (magnetizable) substances. Early prototypes have encouraged the expectation that densities of 1 Tb/in² are attainable. Of course, the ultimate biological data storage medium is DNA.

• Trillions of messages can be stored in a tiny strand of DNA.

Practical DNA-based data storage is most likely decades away.



Slide 3

cs2mf3/wfsp L38-3

Disk Storage Systems -- II

Future of Data Storage Systems

Holographic storage uses a pair of laser beams to etch a three-dimensional hologram onto a polymer medium.

Slide 4

cs2mf3/wfsp L38-4

Disk Storage Systems -- III

Future of Data Storage Systems

Data is retrieved by passing the reference beam through the hologram, thereby reproducing the original coded object beam.



Slide 5

cs2mf3/wfsp L38-5

Disk Storage Systems -- IV

Future of Data Storage Systems

Because holograms are three-dimensional, tremendous data densities are possible. Experimental systems have achieved over 30 Gb/in², with transfer rates of around 1 GB/s. In addition, holographic storage is content addressable.

• This means that there is no need for a file directory on the disk. Accordingly, access time is reduced.
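Content addressability can be illustrated with a toy store in which data is located by a digest of its own contents rather than by a directory lookup. The hash-based sketch below is only a software analogy for optical content addressing (the class and method names are our own):

```python
import hashlib

class ContentStore:
    """Toy content-addressable store: each page of data is filed under
    a digest of its contents, so no separate file directory is kept."""

    def __init__(self):
        self._pages = {}

    def put(self, data: bytes) -> str:
        # The data itself determines where it lives.
        key = hashlib.sha256(data).hexdigest()
        self._pages[key] = data
        return key

    def get(self, key: str) -> bytes:
        # Retrieval is a single keyed lookup, not a directory walk.
        return self._pages[key]
```

In a holographic system the "key" is an optical pattern rather than a hash, but the effect is the same: no directory traversal stands between a request and the stored page.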

The major challenge is in finding an inexpensive, stable, rewriteable holographic medium.

Micro-electro-mechanical storage (MEMS) devices offer another promising approach to mass storage. IBM’s Millipede is one such device. Prototypes have achieved densities of 100 Gb/in², with 1 Tb/in² expected as the technology is refined. A photomicrograph of Millipede is shown on the next slide.

Slide 6

cs2mf3/wfsp L38-6

Disk Storage Systems -- V

Future of Data Storage Systems

Millipede consists of thousands of cantilevers that record a binary 1 by pressing a heated tip into a polymer substrate.

• The tip reads a binary 1 when it dips into the imprint in the polymer.

Photomicrograph courtesy of the IBM Corporation.

© 2005 IBM Corporation



Slide 7

cs2mf3/wfsp L38-7

Disk Storage Systems -- VI

Conclusions on Data Storage Systems

• I/O systems are critical to the overall performance of a computer system. Amdahl’s Law quantifies this assertion.

• I/O systems consist of memory blocks, cabling, control circuitry, interfaces, and media.

• I/O control methods include programmed I/O, interrupt-based I/O, DMA, and channel I/O.

• Buses require control lines, a clock, and data lines. Timing diagrams specify operational details.

• Magnetic disk is the principal form of durable storage.

• Disk performance metrics include seek time, rotational delay, and reliability estimates.

• Optical disks provide long-term storage for large amounts of data, although access is slow.

• Magnetic tape is also an archival medium. Recording methods are track-based, serpentine, and helical scan.

Slide 8

cs2mf3/wfsp L38-8

Disk Storage Systems -- VII

Conclusions on Data Storage Systems

• RAID gives disk systems improved performance and reliability. RAID 3 and RAID 5 are the most common.

• RAID 6 and RAID DP protect against dual disk failure, but RAID DP offers better performance.

• Any one of several new technologies including biological, holographic, or mechanical may someday replace magnetic disks.

• The hardest part of data storage may end up being locating the data after it is stored.



Slide 9

cs2mf3/wfsp L38-9

Disk Storage Systems -- VIII

Text Section Covered: 7.10

Next Lecture Content: None – EOC (End, of course!)
