

    MEMORY AND I/O INTERFACING

    MEMORY

    Memory is an important part of embedded systems. The cost and performance of an embedded

    system depend heavily on the kind of memory devices it utilizes. In this section we will discuss

    Memory Classification, Memory Technologies and Memory Management.

    (1) Memory Classification

    Memory devices can be classified based on the following characteristics:

    (a) Accessibility

    (b) Persistence of Storage

    (c) Storage Density & Cost

    (d) Storage Media

    (e) Power Consumption

    Accessibility

    Memory devices can provide Random Access, Serial Access or Block Access. In a Random
    Access memory, each word in memory can be directly accessed by specifying its address.
    RAM, SDRAM and NOR Flash are examples of Random Access memories. In a Serial Access
    memory, all the words preceding the desired word must be accessed first. I2C PROM and SPI
    PROM are examples of Serial Access memories. In Block Access memories, the entire memory
    is sub-divided into small blocks (generally of the order of a KByte). Each block can be
    randomly accessed, and each word within a given block can be serially accessed. Hard disks
    and NAND flash employ this mechanism. Word access time in a Random Access Memory is
    independent of the word's location, which is desirable for high-speed applications making
    frequent accesses to the memory.

    Persistence of Storage

    Memory devices can provide volatile or non-volatile storage. In a non-volatile storage, the
    memory contents are preserved even after power shutdown, whereas a volatile memory loses
    its contents after power shutdown. Non-volatile storage is needed for storing application code
    and re-usable data, while volatile memory can be used for all temporary storage. RAM and
    SDRAM are examples of volatile memories. Hard disks, Flash (NOR & NAND) memories,
    SD-MMC cards, and ROM are examples of non-volatile storage.

  • 8/3/2019 Memory and i

    2/52

    Storage Media

    Memory devices may employ electronic (in terms of transistors or electron states), magnetic or
    optical storage media. RAM and SDRAM are examples of electronic storage. Hard disks are an
    example of magnetic storage. CDs (Compact Discs) are an example of optical storage. Old
    computers also employed magnetic storage (magnetic storage is still common in some
    consumer electronics products).

    Storage Density & Cost

    Storage density (the number of bits which can be stored per unit area) is generally a good
    measure of cost. Dense memories (like SDRAM) are much cheaper than their less dense
    counterparts (like SRAM).

    Power Consumption

    Low power consumption is highly desirable in battery-powered embedded systems. Such
    systems generally employ memory devices which can operate at low (and ultra-low) voltage
    levels. Mobile SDRAMs are an example of low-power memories.

    (2) Memory Technologies

    RAM

    RAM stands for Random Access Memory. RAMs are the simplest and most common form of
    data storage. RAMs are volatile. The figure below shows the typical Data, Address and Control
    signals on a RAM. The number of words which can be stored in a RAM is two raised to the
    power of the number of address lines available. This severely restricts the storage capacity of
    RAMs (a 32 GB RAM would require 35 address lines), because designing circuit boards with
    more signal lines directly adds to the complexity and cost.

    DPRAM (Dual Port RAM)

    DPRAMs are static RAMs with two I/O ports. These two ports access the same memory
    locations - hence DPRAMs are generally used to implement Shared Memories in dual-processor
    systems. The operations performed on a single port are identical to those on any RAM. There
    are some common problems associated with the usage of DPRAMs:

    (a) Possible data corruption when both ports try to access the same memory location - most
    DPRAM devices provide interlocked memory accesses to avoid this problem.

    (b) Data coherency when a cache scheme is used by a processor accessing the DPRAM - this
    happens because any data modifications (in the DPRAM) by one processor are unknown to the
    cache controller of the other processor. In order to avoid such issues, shared memories are not


    mapped to the cacheable space. If the processor's cache configuration is not flexible enough (to
    define the shared memory space as non-cacheable), the cache needs to be flushed before
    performing any reads from this memory space.

    Dynamic RAM

    Dynamic RAMs use a different technique for data storage. A static RAM has four transistors
    per memory cell, whereas a dynamic RAM has only one transistor per memory cell. DRAMs
    use capacitive storage. Since the capacitor can lose its charge, these memories need to be
    refreshed periodically. This makes DRAMs more complex (because extra refresh control is
    needed) and more power consuming. However, DRAMs have a very high storage density
    (compared to static RAMs) and are much cheaper. DRAMs are generally accessed in terms of
    rows, columns and pages, which significantly reduces the number of address lines (another
    advantage over SRAM). Generally you need an SDRAM controller (which manages the
    different SDRAM commands and address translation) to access an SDRAM. Most modern
    processors come with an on-chip SDRAM controller.

    OTP- EPROM, UV-EPROM and EEPROM

    EPROMs (Erasable Programmable Read Only Memories) are non-volatile memories. Contents
    of a ROM can be randomly accessed - but generally the word RAM is used to refer only to
    volatile random access memories. The voltage required for writing into an EPROM is much
    higher than its operating voltage, hence you cannot write into an EPROM in-circuit (which is
    why it qualifies as a ROM). You need special programming stations (which have a write
    mechanism) to write into EPROMs.

    OTP-EPROMs are One Time Programmable: their contents cannot be changed once written.
    UV-EPROMs are UV-erasable EPROMs. Exposure of the memory cells to UV light erases the
    existing contents, and the device can be re-programmed after that. EEPROMs are Electrically
    Erasable EPROMs. These can be erased electrically (generally on the same programming
    station where you write into them). The number of write cycles (times you can erase and
    re-write) for UV-EPROMs and EEPROMs is fairly limited. Erasable PROMs use either
    FLOTOX (Floating gate Tunnel Oxide) or FAMOS (Floating gate Avalanche MOS) technology.

    Flash (NOR)


    Flash (or NOR Flash, to be more accurate) is quite similar to EEPROM in usage and can be
    considered in the class of EEPROM (since it is electrically erasable). However, there are a few
    differences. Firstly, flash devices are in-circuit programmable. Secondly, they are much cheaper
    compared to conventional EEPROMs. These days (NOR) flash is widely used for storing boot
    code.

    NAND FLASH

    These memories are denser and cheaper than NOR flash. However, they are block accessible
    and cannot be used for code execution. These devices are mostly used for data storage (since
    they are cheaper than NOR flash). However, some systems use them for storing the boot code
    (with external hardware, or with built-in NAND boot logic in the processor).

    SD-MMC

    SD-MMC cards provide a cheap means of mass storage. These memory cards can provide
    storage capacities of the order of GBytes. The cards are very compact and can be used with
    portable systems. Most modern hand-held devices requiring mass storage (e.g. still and video
    cameras) use memory cards for storage.

    Hard Disc

    Hard disks are magnetic storage devices. These devices are bulky, and their mechanical
    read/write hardware is bulky as well. They are generally used for mass storage, and hence do
    not appear in smaller and portable systems. However, they are used in embedded systems
    which require bulk storage without tight size constraints.

    (3) Memory Management

    Cache Memory

    The size and speed (access time) of computer memories are inversely related: increasing the
    size means a reduction in speed. In fact, most memories are made up of smaller memory
    blocks (generally 4 KB) in order to improve speed. The cost of a memory also depends heavily
    on its speed. In order to achieve good performance, it is desirable that code and data reside in a
    high-speed memory. However, using a high-speed memory for all the code and data in a
    reasonably large system may be practically impossible. Even in a smaller system, using
    high-speed memory as the only storage device can raise the


    system cost exponentially.

    Most systems employ a hierarchical memory system. They use a small, fast (and expensive)
    memory device to store frequently used code and data, whereas less frequently used data is
    stored in a big, low-speed (cheaper) memory device. In a complex system there can be multiple
    levels (in speed and cost) of memory hierarchy.

    The cache controller is hardware (generally built into the processor) which dynamically moves
    the code and data currently in use from a higher-level (slower) memory to the lower-level
    (zero-level, or cache) memory. The incoming data or code replaces old code or data (which is
    currently not being used) in the cache memory. The data (or code) movement is hidden from
    the user.

    Cache memories are based on the principle of locality in space and time. There are different
    types of cache mechanisms and replacement mechanisms.

    Software Overlays

    Why Overlays

    Low-cost microprocessors generally do not have a built-in cache controller, but on these
    devices it may still be desirable to keep the code (or data) currently in use in internal memory
    and replace it with a new code section when it is not being used. This can be done using
    software overlays. Either code or data overlays can be used. In this section we will only
    discuss code overlays (you can draw a similar analogy for data overlays).

    Overlay Basics

    (a) Each code section which is mapped to an overlay has a run space and a live space. Live
    space is a space in the external (or higher-level) memory, where the code section resides when
    it is not running. Run space is a space in the internal (or lower-level) memory, where the code
    resides during execution.

    (b) The Overlay Manager is a piece of software which dynamically moves the code sections
    from live space to run space (whenever a function from a given overlay section is called).

    (c) Linker and loader tools generate overlay symbols corresponding to the code sections which
    are mapped to overlays. The overlay symbols are supplemented by information about the run
    space and live space of the given overlay. This information is used by the overlay manager to
    move the overlays dynamically.


    (d) You can have multiple overlays in your system. The overlay sections for a given overlay
    have different live spaces but the same run space.

    Implementing overlays

    (a) Firstly, you need to make sure that your code generation tools (linker and loader) provide
    the minimum support (in terms of overlay symbols) needed for overlays.

    (b) Secondly, you need to identify mutually exclusive code sections in your application.
    Mutually exclusive means that only one of these code sections can be in use at any given point
    of time. Also make sure that the switching time between these code sections (i.e. the average
    time after which the processor will require code from a different section) is quite high;
    otherwise, software overlays will degrade the performance (rather than improving it).

    (c) Make sure that you have enough run space to accommodate the largest overlay section.

    (d) While implementing code overlays, you can still choose to keep some code sections (those
    not likely to improve performance if used as overlays) out of overlays (these sections will have
    the same live space and run space).

    Data overlays are analogous to code overlays, but they are rarely used.
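    A minimal overlay-manager sketch in C follows. The overlay_t layout and overlay_load helper
    are hypothetical names (real toolchains emit their own overlay symbols, and the copy might be
    done by DMA instead of memcpy):

        #include <string.h>

        typedef struct {
            const void *live;  /* load address in external (live-space) memory      */
            void       *run;   /* execution address in internal (run-space) memory  */
            unsigned    size;  /* section size in bytes                              */
        } overlay_t;

        static const overlay_t *resident;  /* overlay currently in run space */

        /* Copy an overlay from live space to run space, but only when the
           requested overlay is not already resident. */
        void overlay_load(const overlay_t *ov)
        {
            if (resident != ov) {
                memcpy(ov->run, ov->live, ov->size);
                resident = ov;
            }
        }

    The overlay manager would call overlay_load() before invoking any function that belongs to
    the corresponding overlay section.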

    Virtual Memory

    The Virtual Memory mechanism allows users to store their data on a hard disk, while still
    using it as if it were available in RAM. The application accesses the data in a virtual address
    space (which is mapped to RAM), whereas the actual data physically resides on the hard disk
    (and is moved to RAM for access).

    Paging Mechanism

    In virtual mode, memory is divided into pages, usually 4096 bytes long. These pages may
    reside in any available RAM location that can be addressed in virtual mode. The high-order
    bits in the memory address register form an index into page-mapping tables at specific starting
    locations in memory, and the table entries contain the starting real addresses of the
    corresponding pages. The low-order bits in the address register are an offset of 0 up to 4,095
    (0 to the page size - 1) into the page ultimately referenced by resolving all the table references
    of page locations.
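    The address arithmetic just described can be sketched in C as below, assuming 4096-byte pages
    and a hypothetical flat page table standing in for the real page-mapping tables:

        #include <stdint.h>

        #define PAGE_SHIFT 12u                       /* log2(4096)             */
        #define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)  /* offset bits: 0..4095   */

        /* Split a virtual address into page number and offset, look up the
           real page base, and rejoin the offset. */
        uint32_t translate(uint32_t vaddr, const uint32_t *page_table)
        {
            uint32_t vpn    = vaddr >> PAGE_SHIFT;  /* high-order bits: table index */
            uint32_t offset = vaddr &  PAGE_MASK;   /* low-order bits: 0..4095      */
            return (page_table[vpn] << PAGE_SHIFT) | offset;
        }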

    The distinct advantages of Virtual Memory Mechanism are:

    (a) User can access (in virtual space) more RAM space than what actually exists in the system.

    (b) In a multi-tasking application, each task can have its own independent virtual address space


    (called discrete address space).

    (c) Applications can treat data as if it were stored in contiguous memory (in the virtual address
    space), whereas it may be in non-contiguous locations (in actual memory).

    Cache Vs Virtual Memory

    Cache Memory and Virtual Memory are quite similar in concept and provide similar benefits.
    However, the two schemes differ significantly in terms of implementation:

    * Cache control is fully implemented in hardware. Virtual Memory management is done by
    software (the Operating System) with some minimum support from hardware.

    * With cache memory in use, the user still accesses the actual physical memory (and the cache
    is hidden from the user). It is the reverse with Virtual Memory: the user accesses the virtual
    memory, and the actual physical memory is hidden from the user.

    Cache memory

    The cache is a small amount of high-speed memory, usually with a memory cycle time

    comparable to the time required by the CPU to fetch one instruction. The cache is usually filled

    from main memory when instructions or data are fetched into the CPU. Often the main memory

    will supply a wider data word to the cache than the CPU requires, to fill the cache more rapidly.

    The amount of information which is replaced at one time in the cache is called the line size for

    the cache. This is normally the width of the data bus between the cache memory and the main

    memory. A wide line size for the cache means that several instruction or data words are loaded

    into the cache at one time, providing a kind of prefetching for instructions or data. Since the

    cache is small, the effectiveness of the cache relies on the following properties of most programs:

    Spatial locality -- most programs are highly sequential; the next instruction usually comes
    from the next memory location. Data is usually structured, and data in these structures is
    normally stored in contiguous memory locations.

    Temporal locality -- short loops are a common program structure, especially for the innermost
    sets of nested loops. This means that the same small set of instructions is used over and over.
    Generally, several operations are performed on the same data values, or variables.

    When a cache is used, there must be some way in which the memory controller determines

    whether the value currently being addressed in memory is available from the cache. There are


    several ways that this can be accomplished. One possibility is to store both the address and the

    value from main memory in the cache, with the address stored in a type of memory called

    associative memory or, more descriptively, content addressable memory.

    An associative memory, or content addressable memory, has the property that when a value is
    presented to the memory, the address of the value is returned if the value is stored in the
    memory; otherwise, an indication that the value is not in the associative memory is returned.
    All of the comparisons are done simultaneously, so the search is performed very quickly. This
    type of memory is very expensive, because each memory location must have both a comparator
    and a storage element. A cache memory can be implemented with a block of associative
    memory, together with a block of ``ordinary'' memory. The associative memory would hold
    the address of the data stored in the cache, and the ordinary memory would contain the data at
    that address. Such a cache memory might be configured as shown in the figure.

    Figure: A cache implemented with associative memory

    If the address is not found in the associative memory, then the value is obtained from main

    memory.

    Associative memory is very expensive, because a comparator is required for every word in the
    memory, to perform all the comparisons in parallel. A cheaper way to implement a cache
    memory, without using expensive associative memory, is to use direct mapping. Here, part of
    the memory address (usually the low-order bits of the address) is used to address a word in the
    cache. This part of the address is called the index. The remaining high-order bits in the
    address, called the tag, are stored in the cache memory along with the data.

    For example, if a processor has an 18-bit memory address, a cache of 1K words of 2 bytes (16
    bits) each, and the processor can address single bytes or 2-byte words, we might have the
    memory address field and cache organized as in the figure.

    Figure: A direct mapped cache configuration

    This was, in fact, the way the cache was organized in the PDP-11/60. In the 11/60, however,
    there are 4 other bits used to ensure that the data in the cache is valid. 3 of these are parity bits:
    one for each byte and one for the tag. The parity bits are used to check that a single-bit error
    has not occurred to the data while in the cache. A fourth bit, called the valid bit, is used to indicate


    whether or not a given location in cache is valid. In the PDP-11/60 and in many other processors,

    the cache is not updated if memory is altered by a device other than the CPU (for example when

    a disk stores new data in memory). When such a memory operation occurs to a location which

    has its value stored in cache, the valid bit is reset to show that the data is ``stale'' and does not

    correspond to the data in main memory. As well, the valid bit is reset when power is first applied

    to the processor or when the processor recovers from a power failure, because the data found in

    the cache at that time will be invalid.

    In the PDP-11/60, the data path from memory to cache was the same size (16 bits) as from
    cache to the CPU. (In the PDP-11/70, a faster machine, the data path from the CPU to cache
    was 16 bits, while from memory to cache it was 32 bits, which means that the cache had
    effectively prefetched the next instruction approximately half of the time.) The amount of
    information (instructions or data) stored with each tag in the cache is called the line size of the
    cache. (It is usually the same size as the data path from main memory to the cache.) A large
    line size allows the prefetching of a number of instructions or data words. All items in a line of
    the cache are replaced in the cache simultaneously, however, resulting in a larger block of data
    being replaced for each cache miss.

    The MIPS R2000/R3000 had a built-in cache controller which could control a cache of up to
    64K bytes. For a similar 2K-word (or 8K-byte) cache, the MIPS processor would typically
    have a cache configuration as shown in the figure. Generally, the MIPS cache would be larger
    (64K bytes would be typical, and line sizes of 1, 2 or 4 words would be typical).

    Figure: One possible MIPS cache organization

    A characteristic of the direct mapped cache is that a particular memory address can be mapped

    into only one cache location. Many memory addresses are mapped to the same cache location (in

    fact, all addresses with the same index field are mapped to the same cache location.) Whenever a

    ``cache miss'' occurs, the cache line will be replaced by a new line of information from main

    memory at an address with the same index but with a different tag.
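    A direct-mapped lookup for the 18-bit example above can be sketched in C as follows (1K
    lines of one 16-bit word each; bit 0 selects the byte, bits 1-10 form the index, bits 11-17 the
    tag -- an illustrative sketch, not the PDP-11/60's actual logic):

        #include <stdbool.h>
        #include <stdint.h>

        #define NLINES 1024u

        typedef struct {
            bool     valid;
            uint8_t  tag;    /* 7-bit tag                 */
            uint16_t data;   /* one 16-bit word per line  */
        } cache_line_t;

        static cache_line_t cache[NLINES];

        /* Return true on a hit and deliver the word; on a miss the caller
           must fetch from main memory and refill the line. */
        bool cache_lookup(uint32_t addr, uint16_t *word)
        {
            uint32_t index = (addr >> 1) & (NLINES - 1);  /* word-address low bits */
            uint8_t  tag   = (uint8_t)(addr >> 11);       /* remaining high bits   */
            cache_line_t *line = &cache[index];

            if (line->valid && line->tag == tag) {
                *word = line->data;
                return true;
            }
            return false;
        }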

    Note that if the program ``jumps around'' in memory, this cache organization will likely not be

    effective because the index range is limited. Also, if both instructions and data are stored in

    cache, it may well happen that both map into the same area of cache, and may cause each other


    to be replaced very often. This could happen, for example, if the code for a matrix operation and

    the matrix data itself happened to have the same index values.

    A more interesting configuration for a cache is the set associative cache, which uses a set
    associative mapping. In this cache organization, a given memory location can be mapped to
    more than one cache location. Here, each index corresponds to two or more data words, each
    with a corresponding tag. A set associative cache with n tag and data fields is called an
    ``n-way set associative cache''. Usually n = 2^k, for k = 1, 2, 3, is chosen for a set associative
    cache (k = 0 corresponds to direct mapping). Such n-way set associative caches allow
    interesting tradeoff possibilities; cache performance can be improved by increasing the number
    of ``ways'', or by increasing the line size, for a given total amount of memory. An example of
    a 2-way set associative cache is shown in the figure, which shows a cache containing a total of
    2K lines, or 1K sets, each set being 2-way associative. (The sets correspond to the rows in the
    figure.)

    Figure: A set-associative cache organization

    In a 2-way set associative cache, if one data word is empty for a read operation corresponding to

    a particular index, then it is filled. If both data words are filled, then one must be overwritten by

    the new data. Similarly, in an n-way set associative cache, if all n data and tag fields in a set are

    filled, then one value in the set must be overwritten, or replaced, in the cache by the new tag and

    data values. Note that an entire line must be replaced each time. The most common replacement

    algorithms are:

    Random -- the location for the value to be replaced is chosen at random from all n of the

    cache locations at that index position. In a 2-way set associative cache, this can be

    accomplished with a single modulo 2 random variable obtained, say, from an internal

    clock.

    First in, first out (FIFO) -- here the first value stored in the cache, at each index position, is
    the value to be replaced. For a 2-way set associative cache, this replacement strategy can be
    implemented by setting a pointer to the previously loaded word each time a new word is stored
    in the cache; this pointer need only be a single bit. (For set sizes > 2, this algorithm can be
    implemented with a counter value stored for each ``line'', or index in the cache, and the cache
    can be filled in a ``round robin'' fashion.)


    Least recently used (LRU) -- here the value which was actually used least recently is replaced.
    In general, it is more likely that the most recently used value will be the one required in the
    near future. For a 2-way set associative cache, this is readily implemented by setting a special
    bit called the ``USED'' bit for the other word when a value is accessed, while the
    corresponding bit for the word which was accessed is reset. The value to be replaced is then
    the value with the USED bit set. This replacement strategy can be implemented by adding a
    single USED bit to each cache location. For an n-way set associative cache, this strategy can
    be implemented by storing a modulo-n counter with each data word. (It is an interesting
    exercise to determine exactly what must be done in this case. The required circuitry may
    become somewhat complex, for large n.)
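    The single USED-bit scheme for a 2-way set associative cache can be sketched in C as below
    (the set count and the memory_fetch refill helper are illustrative assumptions):

        #include <stdbool.h>
        #include <stdint.h>

        #define NSETS 1024u

        typedef struct { bool valid; uint32_t tag; uint32_t data; } way_t;
        typedef struct { way_t way[2]; int lru; /* way to replace next */ } set_t;

        static set_t sets[NSETS];

        extern uint32_t memory_fetch(uint32_t index, uint32_t tag);  /* hypothetical */

        uint32_t cache_read(uint32_t index, uint32_t tag)
        {
            set_t *s = &sets[index % NSETS];

            for (int w = 0; w < 2; w++) {
                if (s->way[w].valid && s->way[w].tag == tag) {
                    s->lru = 1 - w;         /* other way is now least recent */
                    return s->way[w].data;
                }
            }

            /* Miss: replace the least-recently-used way and refill it. */
            int v = s->lru;
            s->way[v].valid = true;
            s->way[v].tag   = tag;
            s->way[v].data  = memory_fetch(index, tag);
            s->lru = 1 - v;                 /* the refilled way is most recent */
            return s->way[v].data;
        }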

    Cache memories normally allow one of two things to happen when data is written into a memory

    location for which there is a value stored in cache:

    Write through cache -- both the cache and main memory are updated at the same time.

    This may slow down the execution of instructions which write data to memory, because

    of the relatively longer write time to main memory. Buffering memory writes can help

    speed up memory writes if they are relatively infrequent, however.

    Write back cache -- here only the cache is updated directly by the CPU; the cache

    memory controller marks the value so that it can be written back into memory when the

    word is removed from the cache. This method is used because a memory location may

    often be altered several times while it is still in cache without having to write the value

    into main memory. This method is often implemented using an ``ALTERED'' bit in the

    cache. The ALTERED bit is set whenever a cache value is written into by the processor.

    Only if the ALTERED bit is set is it necessary to write the value back into main memory

    (i.e., only values which have been altered must be written back into main memory). The

    value should be written back immediately before the value is replaced in the cache.
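    The two policies can be contrasted in a short C sketch (memory_write is a hypothetical
    main-memory access; the 'altered' flag plays the role of the ALTERED bit described above):

        #include <stdbool.h>
        #include <stdint.h>

        typedef struct { uint16_t data; bool altered; } wline_t;

        extern void memory_write(uint32_t addr, uint16_t value);  /* hypothetical */

        /* Write-through: update cache and main memory together.
           Write-back: update only the cache and mark the line ALTERED. */
        void cache_write(wline_t *line, uint32_t addr, uint16_t value,
                         bool write_through)
        {
            line->data = value;
            if (write_through)
                memory_write(addr, value);
            else
                line->altered = true;
        }

        /* On replacement, a write-back cache flushes only altered lines. */
        void cache_evict(wline_t *line, uint32_t addr)
        {
            if (line->altered)
                memory_write(addr, line->data);
            line->altered = false;
        }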

    The MIPS R2000/3000 processors used the write-through approach, with a buffer for the
    memory writes. (This was also the approach taken by the VAX-11/780 processor.) In practice,
    memory writes are less frequent than memory reads; typically, for each memory write, an
    instruction must be fetched from main memory, and usually two operands fetched as well.


    Therefore we might expect about three times as many read operations as write operations. In

    fact, there are often many more memory read operations than memory write operations.

    The figure shows the behaviour (actually the miss ratio, which is equal to 1 - the hit ratio) of
    cache memories with various combinations of total cache memory capacity and line size. The
    results are from simulations of the behaviour of several ``typical'' program mixes. Several
    interesting things can be seen from these figures; the figure shows that the miss ratio drops
    consistently with cache size. Note, also, that increasing the line size is not always effective in
    increasing the throughput of the processor, although it does decrease the miss ratio, because of
    the additional time required to transfer large lines of data from the main memory to the cache.

    Figure: Cache memory performance for various line sizes

    It is interesting to plot the same data using log-log coordinates. Note that in this case the graph
    is (very) roughly linear; the next figure shows this plot.

    Figure: Log-log plot of cache performance for various line sizes

    The way size, or degree of associativity, of a cache also has an effect on the performance of a
    cache; the same reference determined that there was a roughly constant ratio between the
    performance of caches with a given set associativity and direct-mapped caches, independent
    of cache size. This relation is shown in the figure. (Of course, the performance of the set
    associative caches improved with associativity.)

    Figure: Cache adjustments for associativity (relative to direct mapping)

    MEMORY MANAGEMENT UNIT

    Modern MMUs typically divide the virtual address space (the range of addresses used by the
    processor) into pages, each having a size which is a power of 2, usually a few kilobytes, but
    they may be much larger. The bottom n bits of the address (the offset within a page) are left
    unchanged. The upper address bits are the (virtual) page number. The MMU normally
    translates virtual page numbers to physical page numbers via an associative cache called a
    Translation Lookaside Buffer (TLB). When the TLB lacks a translation, a slower mechanism
    involving hardware-specific data structures or software assistance is used. The data found in
    such data structures are typically called page table entries (PTEs), and the data structure itself
    is typically called a page table. The physical page number is combined with the page offset to
    give the complete physical address.

    A PTE or TLB entry may also include information about whether the page has been written to
    (the dirty bit), when it was last used (the accessed bit, for a least recently used page
    replacement algorithm), what kind of processes (user mode, supervisor mode) may read and
    write it, and whether it should be cached.
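    One possible C view of such an entry, with the fields just listed, is sketched below (purely
    illustrative -- real PTE layouts and bit positions are architecture-specific):

        #include <stdint.h>

        typedef union {
            uint32_t raw;
            struct {
                uint32_t present   : 1;   /* mapping valid                 */
                uint32_t dirty     : 1;   /* page has been written to      */
                uint32_t accessed  : 1;   /* recently used (for LRU)       */
                uint32_t user      : 1;   /* user-mode access allowed      */
                uint32_t writable  : 1;   /* writes allowed                */
                uint32_t cacheable : 1;   /* may be held in cache          */
                uint32_t unused    : 6;
                uint32_t ppn       : 20;  /* physical page number          */
            } f;
        } pte_t;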

    Sometimes, a TLB entry or PTE prohibits access to a virtual page, perhaps because no physical

    random access memory has been allocated to that virtual page. In this case the MMU signals a

    page fault to the CPU. The operating system (OS) then handles the situation, perhaps by trying to

    find a spare frame of RAM and set up a new PTE to map it to the requested virtual address. If no

    RAM is free, it may be necessary to choose an existing page (known as a victim), using some

    replacement algorithm, and save it to disk (this is called "paging"). With some MMUs, there can

    also be a shortage of PTEs or TLB entries, in which case the OS will have to free one for the new

    mapping.

    In some cases a "page fault" may indicate a software bug. A key benefit of an MMU is memory

    protection: an OS can use it to protect against errant programs, by disallowing access to memory

    that a particular program should not have access to. Typically, an OS assigns each program its

    own virtual address space.

    An MMU also reduces the problem of fragmentation of memory. After blocks of memory have
    been allocated and freed, the free memory may become fragmented (discontinuous), so that the
    largest contiguous block of free memory may be much smaller than the total amount. With
    virtual memory, a contiguous range of virtual addresses can be mapped to several
    non-contiguous blocks of physical memory.

    In some early microprocessor designs, memory management was performed by a separate
    integrated circuit such as the VLSI VI475, the Motorola 68851 used with the Motorola 68020
    CPU in the Macintosh II, or the Z8015 used with the Zilog Z80 family of processors. Later
    microprocessors such as the Motorola 68030 and the Zilog Z280 placed the MMU together
    with the CPU on the same integrated circuit, as did the Intel 80286 and later x86
    microprocessors.


    While this article concentrates on modern MMUs, commonly based on pages, early systems
    used a similar concept for base-limit addressing, which further developed into segmentation.
    Those are occasionally also present on modern architectures. The x86 architecture provided
    segmentation rather than paging in the 80286, and provides both paging and segmentation in
    the 80386 and later processors (although the use of segmentation is not available in 64-bit
    operation).

    Interrupts

    We just discussed how CALL and JUMP instructions can break the linear code flow in an
    application. Another event which can cause a change in program flow is called an
    "INTERRUPT". Interrupts are signals (hardware or software) which cause the program
    sequencer to stop the normal program flow and execute instructions from a certain pre-defined
    location (known as the Interrupt Vector Address). Interrupts can be triggered by a hardware
    event (e.g. the state of an external CPU pin) or a software event (e.g. an illegal instruction
    execution like divide by ZERO). A CPU can have multiple interrupt channels, and each of
    these channels has its own unique interrupt vector address. When an interrupt occurs, the
    program sequencer starts processing instructions from the Interrupt Vector Address (of the
    associated interrupt channel). Similar to a CALL instruction, the Return Address (the address
    of the instruction which would have been fetched in the absence of the interrupt event) is saved
    in one of the processor registers (some CPUs also save the current system state along with the
    return address). An RTI (Return From Interrupt) instruction (similar to RTS) brings the
    program flow back to the Return Address. The code which is stored at the Interrupt Vector
    Address is called the Interrupt Service Routine (ISR). An RTI instruction generally forms the
    last instruction of the ISR.

    Interrupt Controller: the hardware inside the processor which is responsible for managing the
    interrupt operations.

    Enabling Interrupts: interrupts (on most processors) can be enabled or disabled by the
    programmer using a (global) Interrupt Enable bit. Interrupt controllers also provide options for
    enabling or disabling each individual interrupt (at a local level).

    Interrupt Masking: the Interrupt Mask is a control word (generally stored in an Interrupt Mask
    Register) which can be used to temporarily disable an interrupt (on a particular channel). The
    http://en.wikipedia.org/wiki/Page_(computing)http://en.wikipedia.org/w/index.php?title=Base-limit_addressing&action=edit&redlink=1http://en.wikipedia.org/wiki/Segmentation_(memory)http://en.wikipedia.org/wiki/X86_architecturehttp://en.wikipedia.org/wiki/80286http://en.wikipedia.org/wiki/80386http://en.wikipedia.org/wiki/Page_(computing)http://en.wikipedia.org/w/index.php?title=Base-limit_addressing&action=edit&redlink=1http://en.wikipedia.org/wiki/Segmentation_(memory)http://en.wikipedia.org/wiki/X86_architecturehttp://en.wikipedia.org/wiki/80286http://en.wikipedia.org/wiki/80386

    Interrupt Mask contains a control bit (mask bit) for each interrupt channel. If this bit is set, the
    interrupt for the corresponding interrupt channel is temporarily masked (and it remains masked
    until the mask bit is cleared).
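    In C, masking and unmasking a channel is typically a bit operation on a memory-mapped
    register; the IMR address below is a made-up example, not a real device's map:

        #include <stdint.h>

        #define IMR (*(volatile uint32_t *)0x40000010u)  /* hypothetical address */

        static inline void irq_mask(unsigned channel)   { IMR |=  (1u << channel); }
        static inline void irq_unmask(unsigned channel) { IMR &= ~(1u << channel); }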

    Interrupt Priority: interrupt channels are associated with different priority levels. If two
    interrupts are acknowledged by the interrupt controller at the same time, the higher-priority
    interrupt is processed first. The interrupt priority scheme helps ensure that more important
    (interrupt) events get processed first (as compared to less critical events). Critical events (e.g.
    a system power failure) are assigned the highest priority.

    Interrupt Mapping: some interrupt controllers also provide the flexibility of mapping the
    interrupt sources (events that generate interrupts) to any of the available interrupt channels.
    This scheme has two major advantages. Firstly, in a system, (generally) not all the interrupt
    sources are active at a time. A fixed mapping (from source to channel) means that many of the
    interrupt channels will be un-utilized. With a flexible mapping, however, it is possible to
    provide fewer interrupt channels (and active sources can be mapped to these channels). This
    reduces the hardware complexity of the interrupt controller, and hence its cost. The interrupt
    controller can also provide provision for mapping multiple sources to a single interrupt
    channel. In the ISR (for a particular interrupt), the interrupt source (out of the many sources
    mapped to this channel) can be identified by reading the interrupt status register (this register
    has the corresponding bit set if an interrupt event occurs), as the sketch below illustrates.
    Secondly, the interrupt sources can be assigned to interrupt channels with different priorities,
    based on the system requirements.
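    A sketch of such a demultiplexing ISR follows (the status-register address and the
    handle_source dispatch helper are assumptions for illustration):

        #include <stdint.h>

        #define ISTAT (*(volatile uint32_t *)0x40000014u)  /* hypothetical */

        extern void handle_source(unsigned src);  /* per-source handler (hypothetical) */

        /* ISR for a channel shared by several sources: read the status
           register and dispatch once per pending source bit. */
        void shared_channel_isr(void)
        {
            uint32_t pending = ISTAT;
            for (unsigned src = 0; src < 32; src++)
                if (pending & (1u << src))
                    handle_source(src);
        }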

    Interrupts can be categorized into: maskable interrupts, non-maskable interrupts (NMI),
    inter-processor interrupts (IPI), software interrupts, and spurious interrupts.

    Maskable interrupt (IRQ) is a hardware interrupt that may be ignored by

    setting a bit in an interrupt mask register's (IMR) bit-mask.

    Non-maskable interrupt (NMI) is a hardware interrupt that lacks an associated

    bit-mask, so that it can never be ignored. NMIs are often used for timers,

    especially watchdog timers.

    Inter-processor interrupt (IPI) is a special case of interrupt that is generated

    by one processor to interrupt another processor in a multiprocessor system.

    Software interrupt is an interrupt generated within a processor by

    executing an instruction. Software interrupts are often used to implement


    system calls because they implement a subroutine call with a CPU ring level

    change.

    Spurious interrupt is a hardware interrupt that is unwanted. Spurious interrupts are
    typically generated by system conditions such as electrical interference on an
    interrupt line or through incorrectly designed hardware.

    Processors typically have an internal interrupt mask which allows software to ignore all
    external hardware interrupts while it is set. This mask may offer faster access than accessing
    an interrupt mask register (IMR) in a PIC, or disabling interrupts in the device itself. In some
    cases, such as the x86 architecture, disabling and enabling interrupts on the processor itself
    acts as a memory barrier; however, it may actually be slower.

    An interrupt that leaves the machine in a well-defined state is called a precise interrupt. Such
    an interrupt has four properties:

    The Program Counter (PC) is saved in a known place.

    All instructions before the one pointed to by the PC have fully executed.

    No instruction beyond the one pointed to by the PC has been executed (that is, there is no
    prohibition on starting instructions beyond the one pointed to by the PC; it is just that any
    changes they make to registers or memory must be undone before the interrupt happens).

    The execution state of the instruction pointed to by the PC is known.

    An interrupt that does not meet these requirements is called an imprecise interrupt.


    DMA

    DMA (Direct Memory Access) provides an efficient way of transferring data between a
    peripheral and memory, or between two memory regions. A DMA engine is a processing
    engine which can perform data transfer operations (to or from memory). In the absence of a
    DMA engine, the CPU needs to handle these data operations itself, and the overall system
    performance is heavily reduced. DMA is specifically useful in systems which involve huge
    data transfers (in the absence of DMA, the CPU would be busy doing these transfers most of
    the time and would not be available for other processing).

    DMA Parameters: DMA transfers involve a source and a destination; the DMA engine
    transfers the data from source to destination. The DMA engine requires the source and
    destination addresses, along with a transfer count, in order to perform the data transfers. The
    (source or destination) address can be a physical address (in the case of a memory) or a logical
    one (in the case of a peripheral). The transfer count specifies the number of words which need
    to be transferred. As mentioned before, a data transfer can be from a peripheral to memory
    (generally called Receive DMA), from memory to a peripheral (generally called Transmit
    DMA), or from one memory to another (generally called Memory DMA).

    Some DMA engines support additional parameters like Word-Size and Address-Increment, in
    addition to the Start Address and Transfer Count. Word-Size specifies the size of each transfer.
    Address-Increment specifies the offset from the current address (in memory) which the next
    transfer should use. This provides a way of transferring data from non-contiguous memory
    locations.
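    Collected as a C descriptor, the parameters discussed so far might look like this (field names
    are illustrative, not a specific controller's register layout):

        #include <stdint.h>

        typedef struct {
            uint32_t src_addr;   /* source address (physical or logical)  */
            uint32_t dst_addr;   /* destination address                   */
            uint32_t count;      /* number of words to transfer           */
            uint8_t  word_size;  /* bytes per word: 1, 2 or 4             */
            int32_t  addr_incr;  /* offset applied between transfers      */
        } dma_params_t;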

    DMA Channels: a DMA engine can support multiple DMA channels. This means that at a
    given time, multiple DMA transfers can be pending (though physically only one transfer may
    happen at a time, logically the DMA engine can handle many channels in parallel). This
    feature makes the software programmer's life very easy (as he does not have to wait for the
    current DMA operation to finish before programming the next one). Each DMA channel has
    control registers where the DMA parameters can be specified. Each DMA channel also has an
    interrupt associated with it (on most processors) which (optionally) triggers after completion
    of the DMA transfer. Inside the ISR, the programmer can take specific action (e.g. do some
    processing on the data which has just been received through DMA, or program a new DMA
    transfer).

    Chained DMA: certain DMA controllers support an option for specifying the DMA parameters
    in a buffer (or array) in memory, rather than writing them directly to the DMA control registers
    (this mostly applies to the second DMA operation - parameters for the first DMA operation are
    still specified in the control registers). This buffer is called a DMA Transfer Control Block
    (TCB). The DMA controller takes the address of the DMA TCB as one of its parameters (in
    addition to the control parameters for the first DMA transfer) and loads the DMA parameters
    (for the second DMA operation) automatically from memory (after the first DMA operation is
    over). The TCB also contains an entry for the "Next TCB Address", which provides an easy
    way of chaining multiple DMA operations in an automatic fashion (rather than having to
    program each one after completion of the previous). The DMA chaining can be stopped by
    specifying a ZERO address in the Next TCB Address field.
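    A TCB can be modelled as a self-referential C structure, as sketched below (real controllers
    fix the exact field order and widths; this layout is an assumption):

        #include <stdint.h>

        typedef struct dma_tcb {
            uint32_t        src_addr;  /* source of this transfer        */
            uint32_t        dst_addr;  /* destination of this transfer   */
            uint32_t        count;     /* words to transfer              */
            struct dma_tcb *next;      /* next TCB, or 0 (ZERO) to stop  */
        } dma_tcb_t;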

    Multi-dimensional DMA: combined with Address-Increment, this gives many options for
    transferring data laid out in more than one dimension.

    The simplest way to use DMA is to select a processor with an internal DMA controller. This

    eliminates the need for external bus buffers and ensures that the timing is handled

    correctly. Also, an internal DMA controller can transfer data to on-chip memory and

    peripherals, which is something that an external DMA controller cannot do. Because the

    handshake is handled on-chip, the overhead of entering and exiting DMA mode is often

    much faster than when an external controller is used.

    If an external DMA controller or processor is used, be sure that the hardware handles the

    transition between transfers correctly. To avoid the problem of bus contention, ensure that


    bus requests are inhibited if the bus is not free. This prevents the DMA controller from

    requesting the bus before the processor has reacquired it after a transfer.

    So you see, DMA is not as mysterious as it sometimes seems. DMA transfers can provide

    real advantages when the system is properly designed.

    Figure 1: A DMA controller shares the processor's memory

    Hardware interrupts were introduced as a way to avoid wasting the processor's valuable time in

    polling loops, waiting for external events. They may be implemented in hardware as a distinct

    system with control lines, or they may be integrated into the memory subsystem.

    If implemented in hardware, an interrupt controller circuit such as the IBM PC's Programmable
    Interrupt Controller (PIC) may be connected between the interrupting device and the processor's

    interrupt pin to multiplex several sources of interrupt onto the one or two CPU lines typically

    available. If implemented as part of the memory controller, interrupts are mapped into the

    system's memory address space.

    SERIAL PROTOCOLS

    I2C Bus

    The physical I2C bus

    This is just two wires, called SCL and SDA. SCL is the clock line. It is used to synchronize all data

    transfers over the I2C bus. SDA is the data line. The SCL & SDA lines are connected to all devices

    on the I2C bus. There needs to be a third wire which is just the ground or 0 volts. There may also be
    a 5 volt wire if power is being distributed to the devices. Both SCL and SDA lines are "open drain"

    drivers. What this means is that the chip can drive its output low, but it cannot drive it high. For the

    line to be able to go high you must provide pull-up resistors to the 5v supply. There should be a

    resistor from the SCL line to the 5v line and another from the SDA line to the 5v line. You only need

    one set of pull-up resistors for the whole I2C bus, not for each device, as illustrated below:

    The value of the resistors is not critical. I have seen anything from 1k8 (1800 ohms) to 47k (47000

    ohms) used. 1k8, 4k7 and 10k are common values, but anything in this range should work OK. I

    recommend 1k8 as this gives you the best performance. If the resistors are missing, the SCL and

    SDA lines will always be low - nearly 0 volts - and the I2C bus will not work.

    Masters and Slaves

    The devices on the I2C bus are either masters or slaves. The master is always the device that drives

    the SCL clock line. The slaves are the devices that respond to the master. A slave cannot initiate a


    transfer over the I2C bus, only a master can do that. There can be, and usually are, multiple slaves

    on the I2C bus, however there is normally only one master. It is possible to have multiple masters,

    but it is unusual and not covered here. On your robot, the master will be your controller and the

    slaves will be our modules such as the SRF08 or CMPS03. Slaves will never initiate a transfer. Both

    master and slave can transfer data over the I2C bus, but that transfer is always controlled by the
    master.

    The I2C Physical Protocol

    When the master (your controller) wishes to talk to a slave (our CMPS03 for example) it begins by

    issuing a start sequence on the I2C bus. A start sequence is one of two special sequences defined

    for the I2C bus, the other being the stop sequence. The start sequence and stop sequence are

    special in that these are the only places where the SDA (data line) is allowed to change while the

    SCL (clock line) is high. When data is being transferred, SDA must remain stable and not change

    whilst SCL is high. The start and stop sequences mark the beginning and end of a transaction with

    the slave device.

    Data is transferred in sequences of 8 bits. The bits are placed on the SDA line starting with the MSB

    (Most Significant Bit). The SCL line is then pulsed high, then low. Remember that the chip cannot

    really drive the line high, it simply "lets go" of it and the resistor actually pulls it high. For every 8 bits

    transferred, the device receiving the data sends back an acknowledge bit, so there are actually 9

    SCL clock pulses to transfer each 8 bit byte of data. If the receiving device sends back a low ACK

    bit, then it has received the data and is ready to accept another byte. If it sends back a high then it is

    indicating it cannot accept any further data and the master should terminate the transfer by sending

    a stop sequence.

    How fast?

The standard clock (SCL) speed for I2C is up to 100 kHz. Philips do define faster speeds: Fast mode, which is up to 400 kHz, and High Speed mode, which is up to 3.4 MHz. All of our modules are designed to work at up to 100 kHz. We have tested our modules up to 1 MHz but this needs a small delay of a few microseconds between each byte transferred. In practical robots, we have never had any need to use high SCL speeds. Keep SCL at or below 100 kHz and then forget about it.

    I2C Device Addressing

    All I2C addresses are either 7 bits or 10 bits. The use of 10 bit addresses is rare and is not covered

    here. All of our modules and the common chips you will use will have 7 bit addresses. This means

that you can have up to 128 devices on the I2C bus, since a 7-bit number can be from 0 to 127.

    When sending out the 7 bit address, we still always send 8 bits. The extra bit is used to inform the

slave if the master is writing to it or reading from it. If the bit is zero, the master is writing to the slave.


    If the bit is 1 the master is reading from the slave. The 7 bit address is placed in the upper 7 bits of

    the byte and the Read/Write (R/W) bit is in the LSB (Least Significant Bit).

    The placement of the 7 bit address in the upper 7 bits of the byte is a source of confusion for the

newcomer. It means that to write to address 21, you must actually send out 42, which is 21 moved over by 1 bit. It is probably easier to think of the I2C bus addresses as 8 bit addresses, with even

    addresses as write only, and the odd addresses as the read address for the same device. To take

our CMPS03 for example, this is at address 0xC0 ($C0). You would use 0xC0 to write to the CMPS03 and 0xC1 to read from it. So the read/write bit just makes it an odd/even address.
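To make the address arithmetic concrete, here is a minimal C sketch (the macro names are mine, not from any particular library) showing how a 7 bit address and the R/W bit combine into the byte actually sent on the bus:

#include <stdint.h>

/* The 7 bit address occupies bits 7..1 of the first byte; bit 0 is the
   R/W flag (0 = write, 1 = read). */
#define I2C_WRITE_ADDR(addr7) ((uint8_t)((addr7) << 1))         /* even */
#define I2C_READ_ADDR(addr7)  ((uint8_t)(((addr7) << 1) | 1u))  /* odd  */

/* Example: the CMPS03 sits at 8 bit address 0xC0, i.e. 7 bit address
   0x60. I2C_WRITE_ADDR(0x60) == 0xC0 and I2C_READ_ADDR(0x60) == 0xC1. */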

    The I2C Software Protocol

    The first thing that will happen is that the master will send out a start sequence. This will alert all the

slave devices on the bus that a transaction is starting and they should listen in case it is for them.

    Next the master will send out the device address. The slave that matches this address will continue

    with the transaction, any others will ignore the rest of this transaction and wait for the next. Having

    addressed the slave device the master must now send out the internal location or register number

inside the slave that it wishes to write to or read from. This number is obviously dependent on what

    the slave actually is and how many internal registers it has. Some very simple devices do not have

    any, but most do, including all of our modules. Our CMPS03 has 16 locations numbered 0-15. The

    SRF08 has 36. Having sent the I2C address and the internal register address the master can now

    send the data byte (or bytes, it doesn't have to be just one). The master can continue to send data

    bytes to the slave and these will normally be placed in the following registers because the slave will

    automatically increment the internal register address after each byte. When the master has finished

    writing all data to the slave, it sends a stop sequence which completes the transaction. So to write to

    a slave device:

    1. Send a start sequence

    2. Send the I2C address of the slave with the R/W bit low (even address)

    3. Send the internal register number you want to write to

    4. Send the data byte

    5. [Optionally, send any further data bytes]

    6. Send the stop sequence.

    As an example, you have an SRF08 at the factory default address of 0xE0. To start the SRF08

    ranging you would write 0x51 to the command register at 0x00 like this:

    1. Send a start sequence

2. Send 0xE0 (I2C address of the SRF08 with the R/W bit low, even address)

    3. Send 0x00 (Internal address of the command register)


    4. Send 0x51 (The command to start the SRF08 ranging)

    5. Send the stop sequence.
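The same write sequence can be sketched in C. This assumes a low-level driver exposing start, stop and byte primitives; the names i2c_start, i2c_write_byte and i2c_stop are placeholders for whatever your controller's library actually provides:

#include <stdbool.h>
#include <stdint.h>

/* Placeholder low-level primitives, normally supplied by the MCU's
   I2C peripheral driver or a bit-banged implementation. */
void i2c_start(void);
void i2c_stop(void);
bool i2c_write_byte(uint8_t b);   /* returns true if the slave ACKed */

/* Write one byte to an internal register of a slave device,
   following steps 1 to 6 above. */
bool i2c_reg_write(uint8_t addr8, uint8_t reg, uint8_t data)
{
    i2c_start();                      /* 1. start sequence           */
    bool ok = i2c_write_byte(addr8)   /* 2. slave address, R/W = 0   */
           && i2c_write_byte(reg)     /* 3. internal register number */
           && i2c_write_byte(data);   /* 4. the data byte            */
    i2c_stop();                       /* 6. stop sequence            */
    return ok;
}

/* Starting the SRF08 ranging is then:
   i2c_reg_write(0xE0, 0x00, 0x51); */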

    Reading from the Slave

    This is a little more complicated - but not too much more. Before reading data from the slave device,

you must tell it which of its internal addresses you want to read. So a read of the slave actually starts off by writing to it. This is the same as when you want to write to it: you send the start sequence, the

    I2C address of the slave with the R/W bit low (even address) and the internal register number you

    want to write to. Now you send another start sequence (sometimes called a restart) and the I2C

    address again - this time with the read bit set. You then read as many data bytes as you wish and

    terminate the transaction with a stop sequence. So to read the compass bearing as a byte from the

    CMPS03 module:

    1. Send a start sequence

2. Send 0xC0 (I2C address of the CMPS03 with the R/W bit low, even address)

    3. Send 0x01 (Internal address of the bearing register)

    4. Send a start sequence again (repeated start)

5. Send 0xC1 (I2C address of the CMPS03 with the R/W bit high, odd address)

    6. Read data byte from CMPS03

    7. Send the stop sequence.
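In C, reusing the placeholder primitives from the write sketch plus a read primitive, the read transaction might look like this:

#include <stdbool.h>
#include <stdint.h>

void i2c_start(void);
void i2c_stop(void);
bool i2c_write_byte(uint8_t b);
uint8_t i2c_read_byte(bool ack);  /* clocks in one byte; ack=false
                                     sends a NAK to end the read   */

bool i2c_reg_read(uint8_t addr8, uint8_t reg, uint8_t *out)
{
    i2c_start();                          /* 1. start                 */
    bool ok = i2c_write_byte(addr8)       /* 2. address, R/W bit low  */
           && i2c_write_byte(reg);        /* 3. register to read from */
    i2c_start();                          /* 4. repeated start        */
    ok = ok && i2c_write_byte(addr8 | 1); /* 5. address, R/W bit high */
    if (ok)
        *out = i2c_read_byte(false);      /* 6. read byte, NAK = done */
    i2c_stop();                           /* 7. stop                  */
    return ok;
}

/* Reading the compass bearing:
   uint8_t bearing;
   i2c_reg_read(0xC0, 0x01, &bearing); */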


    Wait a moment

    That's almost it for simple I2C communications, but there is one more complication. When the

master is reading from the slave, it's the slave that places the data on the SDA line, but it's the master that controls the clock. What if the slave is not ready to send the data? With devices such as

    EEPROMs this is not a problem, but when the slave device is actually a microprocessor with other

    things to do, it can be a problem. The microprocessor on the slave device will need to go to an

    interrupt routine, save its working registers, find out what address the master wants to read from, get

the data and place it in its transmission register. This can take many microseconds to happen, meanwhile the

    master is blissfully sending out clock pulses on the SCL line that the slave cannot respond to. The

    I2C protocol provides a solution to this: the slave is allowed to hold the SCL line low! This is called

    clock stretching. When the slave gets the read command from the master it holds the clock line low.

    The microprocessor then gets the requested data, places it in the transmission register and releases

the clock line, allowing the pull-up resistor to finally pull it high. From the master's point of view, it will issue the first clock pulse of the read by making SCL high and then check to see if it really has gone high. If it's still low then it's the slave that is holding it low and the master should wait until it goes high


    before continuing. Luckily the hardware I2C ports on most microprocessors will handle this

    automatically.
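If you are bit-banging I2C yourself rather than using a hardware port, the master side of clock stretching reduces to one small loop. A sketch, assuming hypothetical GPIO helpers (scl_release, scl_read and delay_us are placeholders, not a real API):

#include <stdbool.h>
#include <stdint.h>

void scl_release(void);     /* stop driving SCL; the pull-up raises it */
bool scl_read(void);        /* sample the actual level on the SCL pin  */
void delay_us(uint32_t us);

/* Raise SCL while honouring clock stretching: after releasing the
   line, wait until the slave actually lets it go high. The timeout
   stops a stuck bus from hanging the master forever. */
bool scl_high_with_stretch(uint32_t timeout_us)
{
    scl_release();
    while (!scl_read()) {          /* slave still holding SCL low? */
        if (timeout_us-- == 0)
            return false;          /* bus stuck - report an error  */
        delay_us(1);
    }
    return true;
}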

    CAN BUS

    Controller Area Network (CAN) is a multicast shared serial bus standard, originally

    developed in the 1980s by Robert Bosch GmbH, for connecting electronic control

    units (ECUs). CAN was specifically designed to be robust in electromagnetically

    noisy environments and can utilize a differential balanced line like RS-485. It can be

    even more robust against noise if twisted pair wire is used. Although initially

    created for automotive purposes (as a vehicle bus), nowadays it is used in many

    embedded control applications (e.g., industrial) that may be subject to noise.

Bit rates up to 1 Mbit/s are possible at network lengths below 40 m. Decreasing the bit rate allows longer network distances (e.g. 125 kbit/s at 500 m).

The CAN data link layer protocol is standardized in ISO 11898-1 (2003). This

    standard describes mainly the data link layer composed of the Logical Link

    Control (LLC) sublayer and the Media Access Control (MAC) sublayer and some

aspects of the physical layer of the ISO/OSI Reference Model. All the other protocol

    layers are left to the network designer's choice.

CAN transmits data through a binary model of "dominant" bits and "recessive" bits

    where dominant is a logical 0 and recessive is a logical 1. If one node transmits a

dominant bit and another node transmits a recessive bit then the dominant bit "wins" (a logical AND between the two).

    So, if you are transmitting a recessive bit, and someone sends a dominant bit, you

    see a dominant bit, and you know there was a collision. (All other collisions are

    invisible.) The way this works is that a dominant bit is asserted by creating a

    voltage across the wires while a recessive bit is simply not asserted on the bus. If

    anyone sets a voltage difference, everyone sees it, hence, dominant.

    Commonly when used with a differential bus, a Carrier Sense Multiple

Access/Bitwise Arbitration (CSMA/BA) scheme is implemented: if two or more devices start transmitting at the same time, there is a priority-based arbitration

    scheme to decide which one will be granted permission to continue transmitting.

    During arbitration, each transmitting node monitors the bus state and compares the

    received bit with the transmitted bit. If a dominant bit is received when a recessive

    bit is transmitted then the node stops transmitting (i.e., it lost arbitration).


    Arbitration is performed during the transmission of the identifier field. Each node

    starting to transmit at the same time sends an ID with dominant as binary 0,

    starting from the high bit. As soon as their ID is a larger number (lower priority)

    they'll be sending 1 (recessive) and see 0 (dominant), so they back off. At the end

    of ID transmission, all nodes bar one have backed off, and the highest priority

    message gets through unimpeded.
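The wired-AND behaviour makes arbitration easy to simulate. The toy C program below (the identifiers are chosen arbitrarily) plays out the bit-by-bit contest and shows that the lowest identifier, i.e. the highest priority, always wins:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NODES 3

int main(void)
{
    uint16_t id[NODES] = { 0x65A, 0x658, 0x7FF }; /* competing 11-bit IDs */
    bool active[NODES] = { true, true, true };

    for (int bit = 10; bit >= 0; bit--) {         /* MSB first            */
        int bus = 1;                              /* recessive unless some
                                                     node drives dominant */
        for (int n = 0; n < NODES; n++)
            if (active[n])
                bus &= (id[n] >> bit) & 1;        /* wired-AND of the bus */

        for (int n = 0; n < NODES; n++)           /* sent recessive, saw
                                                     dominant? back off   */
            if (active[n] && ((id[n] >> bit) & 1) != bus)
                active[n] = false;
    }

    for (int n = 0; n < NODES; n++)
        if (active[n])
            printf("node %d (id 0x%03X) wins arbitration\n", n, id[n]);
    return 0;
}

Running it prints that node 1 (id 0x658) wins, since 0x658 is the lowest of the three identifiers.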

Data transmission

Frames: All frames (aka messages) begin with a start-of-frame (SOF) bit that, obviously, denotes the start of the frame transmission.

    CAN has four frame types:

    Data frame: a frame containing node data for transmission

    Remote frame: a frame requesting the transmission of a specific identifier

    Error frame: a frame transmitted by any node detecting an error

    Overload frame: a frame to inject a delay between data and/or remote frames

Data frame

The data frame is the only frame for actual data transmission. There are two message

    formats:

    Base frame format: with 11 identifier bits

    Extended frame format: with 29 identifier bits

The CAN standard requires that an implementation must accept the base frame format and may accept the extended frame format, but it must at least tolerate the extended frame format.
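As an illustration only (this is a possible in-memory representation in C, not the wire format), the two formats might be modelled like this:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative CAN data frame. The base format carries an 11-bit
   identifier, the extended format a 29-bit identifier; the payload
   is 0 to 8 bytes, given by the data length code (DLC). */
typedef struct {
    uint32_t id;        /* 11 bits (base) or 29 bits (extended) */
    bool     extended;  /* true = extended frame format         */
    uint8_t  dlc;       /* data length code: 0..8               */
    uint8_t  data[8];   /* payload                              */
} can_frame_t;

/* Example: a base-format frame with two data bytes:
   can_frame_t f = { .id = 0x123, .extended = false,
                     .dlc = 2, .data = { 0x12, 0x34 } }; */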

    USB Protocols

    Unlike RS-232 and similar serial interfaces where the format of data being sent is not defined,

USB is made up of several layers of protocols. While this sounds complicated, don't give up

    now. Once you understand what is going on, you really only have to worry about the higher level

layers. In fact most USB controller ICs will take care of the lower layer, thus making it almost

    invisible to the end designer.

Each USB transaction consists of a

o Token Packet (header defining what it expects to follow), an

o Optional Data Packet (containing the payload) and a

o Status Packet (used to acknowledge transactions and to provide a means of error correction)


    As we have already discussed, USB is a host centric bus. The host initiates all transactions. The

first packet, also called a token, is generated by the host to describe what is to follow and whether the data transaction will be a read or a write, and what the device's address and designated endpoint are. The next packet is generally a data packet carrying the payload and is followed by a handshaking packet, reporting if the data or token was received successfully, or if the endpoint is

    stalled or not available to accept data.

    Common USB Packet Fields

Data on the USB bus is transmitted LSB (Least Significant Bit) first. USB packets consist of the following fields,

    o Sync

    All packets must start with a sync field. The sync field is 8 bits long at low and

    full speed or 32 bits long for high speed and is used to synchronise the clock of

    the receiver with that of the transmitter. The last two bits indicate where the PID

field starts.

    o PID

    PID stands for Packet ID. This field is used to identify the type of packet that is

    being sent. The following table shows the possible values.

Group       PID Value   Packet Identifier

Token       0001        OUT Token
            1001        IN Token
            0101        SOF Token
            1101        SETUP Token

Data        0011        DATA0
            1011        DATA1
            0111        DATA2
            1111        MDATA

Handshake   0010        ACK Handshake
            1010        NAK Handshake
            1110        STALL Handshake
            0110        NYET (No Response Yet)

Special     1100        PREamble
            1100        ERR
            1000        Split
            0100        Ping

There are 4 bits to the PID; however, to ensure it is received correctly, the 4 bits are complemented and repeated, making an 8 bit PID in total. The resulting format is shown below.

PID0 | PID1 | PID2 | PID3 | nPID0 | nPID1 | nPID2 | nPID3

o ADDR

The address field specifies which device the packet is designated for. Being 7 bits in length, it allows 127 devices to be supported. Address 0 is not valid, as any device which is not yet assigned an address must respond to packets sent to address zero.

    o ENDP

    The endpoint field is made up of 4 bits, allowing 16 possible endpoints. Low

speed devices, however, can only have 2 additional endpoints on top of the default

    pipe. (4 endpoints max)

    o CRC

    Cyclic Redundancy Checks are performed on the data within the packet payload.

    All token packets have a 5 bit CRC while data packets have a 16 bit CRC.

    o EOP

    End of packet. Signalled by a Single Ended Zero (SE0) for approximately 2 bit

    times followed by a J for 1 bit time.

    USB Packet Types

    USB has four different packet types. Token packets indicate the type of transaction to follow,

    data packets contain the payload, handshake packets are used for acknowledging data or

    reporting errors and start of frame packets indicate the start of a new frame.

    o Token Packets

    There are three types of token packets,

    In - Informs the USB device that the host wishes to read

    information.

    Out - Informs the USB device that the host wishes to send

    information.

    Setup - Used to begin control transfers.

    Token Packets must conform to the following format,

Sync | PID | ADDR | ENDP | CRC5 | EOP

    o Data Packets

    There are two types of data packets each capable of transmitting up to 1024 bytes

    of data.

    Data0

    Data1

    High Speed mode defines another two data PIDs, DATA2 and MDATA.

    Data packets have the following format,

Sync | PID | Data | CRC16 | EOP

    Maximum data payload size for low-speed devices is 8 bytes.

    Maximum data payload size for full-speed devices is 1023 bytes.

Maximum data payload size for high-speed devices is 1024 bytes.

    Data must be sent in multiples of bytes.

    o Handshake Packets

There are three types of handshake packets, which consist simply of the PID:

ACK - Acknowledgment that the packet has been successfully received.

NAK - Reports that the device temporarily cannot send or receive data. Also used during interrupt transactions to inform the host there is no data to send.

STALL - The device finds itself in a state where it requires intervention from the host.

    Handshake Packets have the following format,

Sync | PID | EOP

    o Start of Frame Packets

The SOF packet, consisting of an 11-bit frame number, is sent by the host every 1 ms ± 500 ns on a full speed bus or every 125 µs ± 0.0625 µs on a high speed bus.

Start of Frame packets have the following format,

Sync | PID | Frame Number | CRC5 | EOP

USB Functions

    When we think of a USB device, we think of a USB peripheral, but a USB device could mean a

    USB transceiver device used at the host or peripheral, a USB Hub or Host Controller IC device,

    or a USB peripheral device. The standard therefore makes references to USB functions which

    can be seen as USB devices which provide a capability or function such as a Printer, Zip Drive,

    Scanner, Modem or other peripheral.

So by now we should know the sort of things which make up a USB packet. No? You've forgotten how many bits make up a PID field already? Well don't be too alarmed. Fortunately

    most USB functions handle the low level USB protocols up to the transaction layer (which we

    will cover next chapter) in silicon. The reason why we cover this information is most USB

    function controllers will report errors such as PID Encoding Error. Without briefly covering this,

    one could ask what is a PID Encoding Error? If you suggested that the last four bits of the PID

    didn't match the inverse of the first four bits then you would be right.
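That check is a one-liner in C. A sketch (the function name is mine):

#include <stdbool.h>
#include <stdint.h>

/* A received PID byte carries the 4-bit PID in the low nibble and its
   one's complement in the high nibble. A "PID Encoding Error" means
   the check nibble does not match. */
bool pid_is_valid(uint8_t pid_byte)
{
    uint8_t pid   = pid_byte & 0x0F;
    uint8_t check = (pid_byte >> 4) & 0x0F;
    return check == (uint8_t)(~pid & 0x0F);
}

/* e.g. ACK has PID 0010, so the full byte is 1101 0010 = 0xD2:
   pid_is_valid(0xD2) is true, pid_is_valid(0xD3) is false. */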

    Most functions will have a series of buffers, typically 8 bytes long. Each buffer will belong to an

    endpoint - EP0 IN, EP0 OUT etc. Say for example, the host sends a device descriptor request.


    The function hardware will read the setup packet and determine from the address field whether

    the packet is for itself, and if so will copy the payload of the following data packet to the

    appropriate endpoint buffer dictated by the value in the endpoint field of the setup token. It will

    then send a handshake packet to acknowledge the reception of the byte and generate an internal

    interrupt within the semiconductor/micro-controller for the appropriate endpoint signifying it has

    received a packet. This is typically all done in hardware.

    The software now gets an interrupt, and should read the contents of the endpoint buffer and parse

    the device descriptor request.

    PCI LOCAL BUS

The PCI (Peripheral Component Interconnect) is a high performance bus for interconnecting chips, expansion boards, and memory cards. It originated at Intel in the early 1990s as a standard method of interconnecting chips on a board. It was later adopted as an industry standard administered by the PCI Special Interest Group, or PCI SIG.

(VL-Bus stands for VESA Local Bus, a local bus architecture created by VESA, the Video Electronics Standards Association. It was popularly used in early 1990s computers, typically for VGA cards that drove the graphics of the computer display.)

    The basic form of the PCI presents a fusion of sorts between ISA and VL-Bus. It provides

direct access to system memory for connected devices, but uses a bridge to connect to

    the frontside bus and therefore to the CPU. Basically, this means that it is capable of even

    higher performance than VL-Bus while eliminating the potential for interference with the

    CPU. PCI can connect more devices than VL-Bus, up to five external components. Each of

    the five connectors for an external component can be replaced with two fixed devices on the

    motherboard. Also, you can have more than one PCI bus on the same computer, although

    this is rarely done. The PCI bridge chip regulates the speed of the PCI bus independently of

    the CPU's speed. This provides a higher degree of reliability and ensures that PCI hardware

    manufacturers know exactly what to design for.

    PCI originally operated at 33 MHz using a 32-bit-wide path. Revisions to the standard

    include increasing the speed from 33 MHz to 66 MHz and doubling the bit count to 64.

Currently, PCI-X provides for 64-bit transfers at a speed of 133 MHz for an amazing 1 GBps (gigabyte per second) transfer rate.

PCI cards use 47 pins to connect (49 pins for a mastering card, which can control the PCI bus without CPU intervention). The PCI bus is able to work with so few pins because of hardware multiplexing, which means that the device sends more


than one signal over a single pin. The connectors at the end of the card are connected to

    the motherboard slot and are called gold fingers.

    PERIPHERALS

    Peripherals (of a processor) are its means of communicating with the external world.

    (1) Peripheral Classification

Peripherals can be classified based on the following characteristics:

    Simplex, Duplex & Semi Duplex

    Simplex communication involves unidirectional data transfers. Duplex communication involves bi-

    directional data transfers. Full Duplex interfaces have independent channels for transmission and

reception. Semi-duplex communication also involves bi-directional data transfers; however, at a given time, the data transfer is only possible in one direction. Semi-duplex interfaces use the same

    communication channel for both transmission and reception.

    Serial Vs Parallel

Serial peripherals communicate over a single data line. The data at the Tx end needs to be converted from parallel to serial before transmission, and the data at the Rx end needs to be converted from serial to parallel after reception. Serial peripherals imply fewer signal lines on the external interface and thus reduced hardware

(circuit board) complexity and cost. However the data rate on serial interfaces is fairly limited (as compared to the parallel interface). At the same clock rate, a parallel interface can transfer N times the data, as

    compared to the serial interface (where N is the number of Data lines).

    Synchronous Vs Asynchronous

Synchronous transfers are synchronized by a reference clock on the interface. This clock signal is generally provided by one of the communicating devices on the interface, called the master device. However, the clock can also come from an external source. Asynchronous transfers have no shared clock; the receiver instead recovers the bit timing from a pre-agreed data rate and the framing of the data itself (as in a UART).

    Data Throughput


Interfaces can also be classified based on the data throughput they offer. Generally parallel interfaces provide much more data throughput and are used for application data (this data needs to be processed by the application). Serial interfaces offer lower data throughput, and are generally used to transfer

    intermittent control data.

    (2) Common Serial Peripherals

    (a) UART (Universal Asynchronous Receiver Transmitter)

UART is one of the oldest and simplest serial interfaces. Generally UART is used to transfer data between different PCBs (Printed Circuit Boards). These PCBs can be either in the same system or across different systems. In its simplest configuration, UART consists of a two-pin interface. One pin is used for transmission, and the other for reception.

The data on UART is transferred word by word. A word consists of a Start Bit, Data Bits (5 to 8), an optional Parity Bit, and Stop Bits (1, 1.5 or 2). The individual bits of the data word are transferred one by one on the serial bus.

Start Bit: The Tx line of a UART transmitter is high during periods of inactivity (when no communication is taking place). When the transmitter wants to initiate a data transmission, it sends one START bit (drives the Tx line low) for one bit duration.

Data Bits: The number of data bits can be configured to any value between 5 and 8. UART employs LSB-first transmission.

Parity Bit: One parity bit can optionally be transmitted along with each data word. The parity bit can be configured either as odd or as even.

Stop Bit: After each word transmission, the transmitter transmits stop bits (drives the Tx line high). The number of stop bits can be configured as 1, 1.5 or 2.

Asynchronous Transmission: UART data transfers are asynchronous. The transmitter transmits each bit (of the word being transmitted) for a fixed duration (defined by the baud rate). The receiver polls the value of the transmit line (of the transmitter). In order to receive the data correctly, the receiver needs to be aware of the duration for which each bit is transmitted (again defined by the baud rate).

    Baud Rate: Baud is a measurement of transmission speed in asynchronous communication. It is defined

as the number of distinct symbol changes made to the transmission medium per second. Since the UART signal has only two levels (high and low), the baud rate here is also equal to the bit rate.
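Since each bit occupies a fixed 1/baud seconds, the frame timing follows directly from the configuration. A small worked example in C for the classic 8N1 format (1 start + 8 data + 1 stop = 10 bit times per word):

#include <stdio.h>

int main(void)
{
    double baud = 9600.0;
    double bit_time_us   = 1e6 / baud;          /* ~104.2 us per bit */
    double frame_time_us = 10.0 * bit_time_us;  /* ~1.04 ms per word */
    double bytes_per_sec = baud / 10.0;         /* ~960 bytes/s      */

    printf("bit: %.1f us, frame: %.1f us, throughput: %.0f bytes/s\n",
           bit_time_us, frame_time_us, bytes_per_sec);
    return 0;
}

This also makes the receiver's job clear: it must sample at intervals of one bit time (ideally mid-bit), starting from the falling edge of the start bit.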

    RS-232 and DB-9

    UART can be used to transfer data directly across any two devices. However the most common usage of

    UART involves transfer of data from a PC (or other host computer) to a remote board (other slave

    device). Under such scenarios (where distance between two devices is more than a few inches), physical


    interface between Tx and Rx devices is defined by RS-232 specifications. Signals at each end are

    terminated to a 9-pin (DB-9) connector.

    Debugging UART Interface

    Following steps could be helpful while debugging communication problems on a UART interface

    (a) UART loop-back: Run the internal loop-back tests on both Rx and Tx (most UART devices provide

this functionality). This will ensure that each device is functional (not damaged).

    (b) Check the Configuration: If the communication between two devices is failing, there could be a

    configuration mismatch between Tx and Rx. Cross-check the configuration at both sides and ensure that it

    is identical.

    (c) Check the Serial Cable: Generally two UARTs are connected through a serial cable (which has 9-pin

    connectors on both sides). The cable should be a cross-over (Tx on one side connects to Rx on other side).

A faulty (damaged or wrongly crossed) serial cable can also cause erratic behavior. Make sure that the cable

    is not damaged.

(d) Probe the Tx signal: If UART communication still remains erratic (after checks a, b and c), the last

    resort would be to probe the UART signals using a scope.

Limitation: Both the sender and receiver should agree to a predefined configuration (Baud Rate, Parity

    Settings, number of data and stop bits). A mismatch in the configuration at two ends (Transmitter and

Receiver) will cause communication failure (data corruption). Data rates are very slow. Also, if there are more devices involved in communication, the number of external pins needed on the device increases proportionally.

    (b) SPI

    Serial Peripheral Interface (SPI) provides an easy way to communicate across various (SPI compatible)

devices in a system. SPI involves synchronous data transfers. Examples of SPI-compatible peripherals are

    Microprocessors, Data Converters and LCD Displays. Communication on SPI bus occurs with a Master

and Slave relationship. Generally, a microprocessor acts as the SPI bus master, and peripheral devices

    (such as Data Converters or Displays) act as slave devices. At times, there could be multiple micro-

processors (or CPUs) on a given SPI bus. In such cases, a HOST processor will act as SPI Master, and

    other processors will act as SPI slaves. Multi-master configurations (though rarely used) are also possible.

SPI is a four-wire interface. The four signals on the SPI bus are:

    * CLK : Clock signal is used for synchronizing the data transfers. It is output from Master and Input to the

    slave.

    * MISO: stands for Master In Slave Out. As the name suggests it is output from Slave and Input to the

    Master. This signal is used for transferring data from Slave Device to the Master Device.


    * MOSI: stands for Master Out Slave In. This signal is an output from Master and is input to the slave. It

    is used for transferring data from Master Device to Slave device.

    * SSEL: Slave Select is output from the Master and is an input to the slave. This signal needs to be

    asserted (by the Master) for any transfers to be recognized by the slave. In a multi-slave configuration,

    Master device can have multiple slave select signals (one for each slave) and only the currently selected

    slave (corresponding SSEL signal asserted) will acknowledge the data transfers.
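A bit-banged SPI master makes the four signals concrete. The sketch below assumes hypothetical GPIO helpers (set_sclk, set_mosi, get_miso and delay_half_period are placeholders) and implements the common mode 0 timing (clock idle low, data sampled on the rising edge), MSB first:

#include <stdbool.h>
#include <stdint.h>

void set_sclk(bool level);
void set_mosi(bool level);
bool get_miso(void);
void delay_half_period(void);

/* Transfer one byte full duplex: the master shifts a byte out on MOSI
   and simultaneously shifts a byte in from MISO. */
uint8_t spi_transfer(uint8_t out)
{
    uint8_t in = 0;
    for (int bit = 7; bit >= 0; bit--) {
        set_mosi((out >> bit) & 1);       /* master drives MOSI       */
        delay_half_period();
        set_sclk(true);                   /* rising edge: both sides
                                             sample their input lines */
        in = (uint8_t)((in << 1) | get_miso());
        delay_half_period();
        set_sclk(false);
    }
    return in;
}

/* The caller frames the transfer with the (active low) slave select:
   ssel_low(); uint8_t reply = spi_transfer(cmd); ssel_high(); */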

    Multiple Slave Scenario

    Under SPI protocol, one Master device can be connected to multiple slave devices through multiple SSEL

lines. The master asserts SSEL only for the device with which it wants to communicate. Selecting multiple slaves at a time can damage the MISO line (since multiple slaves will try to drive it).

    Multi-mas