Final Year Project Spring Report
Implementing a USB 2.0 Intellectual
Property Core on FPGA
Presented By: Liza Tutunjian
Arine Hadidian George Ghanem
Advisors
Dr. Mazen Saghir Dr. Ali Chehab
Department of Electrical and Computer Engineering
American University of Beirut
May 23, 2006
Abstract
Implementing a USB 2.0 Intellectual Property core on FPGA By Liza Tutunjian, Arine Hadidian and George Ghanem
This report describes the work we completed on implementing a high-speed USB
Intellectual Property (IP) core on an FPGA.
In the first chapter, we define the problem that we set out to solve, highlight the practical importance of the
topic we chose as our FYP, summarize the tools used to complete the project and the estimated
budget, and present a schedule of our plan of work for the Spring term.
The second chapter, Review, provides the reader with background information on USB,
a summary of solutions that have already been implemented, and the reason why we chose an FPGA
implementation over an ASIC implementation.
The third chapter, Design and Analysis, presents a high level block diagram of the USB system that we will
implement and explains the relationship between the various blocks as well as the responsibility of each in
delivering a working USB IP core.
The fourth chapter, Implementation, defines the design of our hardware system and our software layer. It
provides a description of the main components involved in the system.
The fifth chapter, Evaluation, includes the test cases that we performed at different stages of the project to ensure
the functionality of our project.
The last chapter, Conclusion, presents some of the problems we faced along with alternative solutions, areas of
future work and consideration of design constraints.
Table Of Contents
1.0 Introduction
  1.1 Problem Statement Background
  1.2 Problem Statement
  1.3 Practical Importance of our FYP
  1.4 Budget
  1.5 Tools for implementing the FYP
    1.5.1 Virtex-II™ V2MB1000 Development Board
    1.5.2 Embedded Development Kit Software
2.0 Review
  2.1 Introduction
  2.2 USB Background
    2.2.1 Interlayer Communication Model
    2.2.2 USB Packet Field Formats
  2.3 Approach to solving problem
  2.4 Literature Survey
3.0 Design and Analysis
  3.1 High Level Design
  3.2 Datasheet Summary
    3.2.1 Introduction
    3.2.2 Features
    3.2.3 Block Diagram
    3.2.4 System Overview
    3.2.5 Signal Definitions
    3.2.6 Registers
    3.2.7 Choice of Design
4.0 Implementation
  4.1 USB Core General Implementation
    4.1.1 Host Controller
    4.1.2 Transmitter
    4.1.3 Receiver
    4.1.4 Host Controller Driver
5.0 Evaluation
  5.1 Host Controller Testing
  5.2 Transmitter Testing
  5.3 Receiver Testing
  5.4 Testing the USB Core on the FPGA
6.0 Conclusion
  6.1 Difficulties Faced
  6.2 Future Work
  6.3 Design Constraints
7.0 References
Table Of Figures
Figure 1: VirtexII System Board
Figure 2: Embedded Development Kit Environment
Figure 3: Simple USB Host/Device View
Figure 4: USB Implementation Areas
Figure 5: Host Communication
Figure 6: High Level System Block Diagram
Figure 7: Overall System
Figure 8: Host Controller
Figure 9: Transmitter
Figure 10: USB Cable
Figure 11: Speed and Data States
Figure 12: Handshake Packet
Figure 13: Data Packet
Figure 14: Token Packet
Figure 15: NRZI Encoding
Figure 16: Receiver
Figure 17: High Speed Data In Tick
Figure 18: Cygnal CP2101
Figure 19: FSM states
Figure 20: SETUP Transaction
Figure 21: OUT (0/1) Transaction
Figure 22: IN Transaction
Figure 23: Writing to the USB wire at High Speed
Figure 24: Idle State
Figure 25: SYNC Bytes 1 and 2
Figure 26: SYNC Byte 3 and 4
Figure 27: Token Packet
Figure 28: Setup Token Packet
Figure 29: Little Endian
Figure 30: End Of Packet
Figure 31: Data Packet
Figure 32: Bit Stuffing
Figure 33: Handshake Packet
Figure 34: High Speed Detection Waveform
Figure 35: Receiving Bits at High Speed Waveform
Figure 36: Forming a Byte Waveform
Figure 37: Processing Bytes that represent an ACK Waveform
Figure 38: Processing Bytes that represent Data Waveform
Figure 39: High Speed Rate Problem
List of Tables
Table 1: Signals
Table 2: Register Description
Table 3: Compare alternative designs with our design features
Table 4: Bus States
Table 5: Speed and Data States
Table 6: Bit to Byte Conversion
Table 7: Bytes input into the Byte Analyzer component
Table 8: Processing Bytes that represent Data
Table 9: Difficulties
1.0 Introduction
1.1 Problem Statement Background
Advances in hardware technology have produced Field Programmable Gate
Arrays (FPGAs) that are large enough to accommodate a complete system on a single device. Such
devices are called “system on a programmable chip” (SOPC). The single-chip design
allows designers to place a large number of functions onto an SOPC and to reprogram the chip from
the desktop, thus removing engineering costs from the prototyping and testing of new designs.
For the past decade, Intellectual Property (IP) cores have been developed to provide consumers with
a collection of cores to decrease the customer's time-to-market. IP cores form an essential element of
design reuse and are part of the growing trend towards repeated use of previously designed
components. A very diverse set of IP cores is now available on today’s advanced FPGA
devices. This offers system developers the opportunity to mix and match various microprocessor,
embedded memory and system I/O functions on an FPGA while cutting down design time and reducing
risk, thanks to the industry availability of well-performing, low-cost components. The vast majority of
those IP cores are created and owned by either the FPGA vendors themselves (such as Xilinx) or
licensed from microprocessor vendors.
Popular IP core implementations include functions such as USB, digital signal processing FIR and IIR
filters, fast Fourier transforms, adders, multipliers, UARTs and more.
The development of “FPGA plus IP core solutions” has accelerated because of the increased
functionality, performance and flexibility that these solutions offer over other approaches to
system design.
1.2 Problem Statement
Although high-speed Universal Serial Bus (USB) IP cores have been implemented by commercial
companies, vendors protect their trade secrets and patents because of their tremendous investment in time and
development effort. FPGA vendors are working together with core providers to devise methods to
license IP cores. A popular approach is to allow users to try out the core by downloading it from the net
for trial use in simulation and place-and-route. If the user is satisfied with the core, a license fee is paid, enabling
a key that allows programming of the device. The problem here is that these cores are relatively
expensive to purchase and only provide a border definition of the interface between the processor and
the downloaded core.
With the recent availability of advanced FPGA boards in our digital labs that have physical USB ports
on them, we found that a need exists for developing a high speed USB host IP core. Without such a
core, any hardware system downloaded on an FPGA cannot communicate with a USB device. On a
computer running the Windows operating system, this functionality is already implemented, and the user of a USB
device knows very little about the USB packets that are sent back and forth between the host computer
and the USB device. To give a hardware system that has been
downloaded on an FPGA the same ability to communicate with an attached USB device, we had to
implement the hardware design of a host that implements the USB protocol.
Due to the significant emergence of USB based applications in today’s world, we believe that
implementing a USB IP core that can be reused by coming generations of engineering students at
AUB will provide the groundwork for developing more advanced systems that make use of USB
devices.
The main idea behind our Final Year Project is to develop a non-commercial, student research based
IP core that implements a high-speed Universal Serial Bus host, and to add this core as a block to the
system hardware architecture. In short, we will design a hardware system that has a processor
running code that communicates with a USB device.
1.3 Practical Importance of our FYP
Having defined the problem that we will be addressing, we need to highlight the practicality of meeting
such a goal.
Generally speaking, developing IP cores to build complex systems allows companies
to sell standard solutions written in HDL for implementation on the designers' own FPGAs. This
removes an element of re-inventing the wheel while spreading development time, and thus costs,
across different companies.
Moreover, in today’s industry, time-to-market pressures continue to increase. Irrespective of how well
the previous project was completed, there is pressure to complete the next one in less time, at lower cost
and with higher performance. “FPGA plus IP core solutions” will continue to feed the market need to build
faster and better systems-on-chip well into the next decade and beyond.
As a result of our FYP, the availability of an IP core for use by the Faculty of Engineering at AUB will
remove an element of re-inventing the wheel and allow students and faculty in the future more time to
focus on optimizing system architectures and developing even more functionality into their USB
compliant end products.
1.4 Budget
The Faculty of Engineering and Architecture at AUB has purchased several Virtex-II™ V2MB1000
Development Boards that include Xilinx FPGAs. The price of a development kit (the FPGA and the
board) is approximately $2900. All the hardware that we need to implement the USB core is available
in the digital labs. In addition to the hardware, the software is also available and installed on all the
computers in the digital lab. To implement the USB host protocol, we had to study the
USB 2.0 Specification, which we downloaded for free, so it did not affect our budget.
1.5 Tools for implementing the FYP
1.5.1 Virtex-II™ V2MB1000 Development Board
The Virtex-II board, shown in the figure below, utilizes up to 1 million gates and contains a large
number of I/O’s to facilitate implementation.
Figure 1: VirtexII System Board
Some of the components present on the board are:
• Xilinx FPGA
• 2 clock sources
• RS-232 port
• LEDs, switches and 7-segment displays
• P160 additional Module interface that can add USB physical interface, SRAM memory and
Ethernet interface.
• 16M x 16 DDR memory
We initially wanted to place the P160 additional module, which contains the physical USB port, in the
P160 expansion slot on the board to be able to test our system. However, we faced a problem with
this and we had to come up with an alternative solution. Instead of actually testing the USB core using
the physical USB wire, we designed a block in VHDL that simulates the USB device to ensure that our
system was functional.
1.5.2 Embedded Development Kit Software
To implement our core on the FPGA, we used the Embedded Development Kit (EDK) software that came with
the Virtex-II™ V2MB1000 Development Board. The EDK is an all-encompassing design environment
for Virtex-II Pro and MicroBlaze based embedded systems in Xilinx FPGAs. The figure below shows the
tools that the EDK environment provides for the implementation of embedded applications.
Figure 2: Embedded Development Kit Environment
We used Xilinx Platform Studio within the EDK environment to build a hardware system that contains:
1- Pre-designed MicroBlaze soft processor: The design of this processor is provided with the
EDK and can be added with the click of a button.
2- VHDL USB core peripheral: We designed and implemented this peripheral ourselves.
We connected the VHDL peripheral as a slave to the MicroBlaze processor through the On-chip
Peripheral Bus (OPB). As part of the Hardware Development Flow, we synthesized the VHDL files for
the system above into a netlist that contains AND, OR, XOR, NAND gates and so forth. This netlist
was then mapped, placed and routed to fit onto the FPGA. As the final step of the Hardware Development Flow,
we generated the bitstream and downloaded it onto the FPGA through the JTAG port.
We also used XPS to write C code that functions as a simple driver for the USB hardware core and is run
by the aforementioned MicroBlaze processor. As part of the Software Development Flow, we compiled
the C code and downloaded it into on-chip memory. Specifically, we used 64 KB of block RAM (BRAM), which
serves as the local data and instruction memory.
2.0 Review
2.1 Introduction
USB (Universal Serial Bus) is a serial bus that provides Plug & Play connection of
peripherals to PCs. It removes the need to open up a PC when adding a new peripheral
device and allows the required software to be installed automatically.
In the mid-1990s, a core team of engineers from Compaq, DEC, IBM, Intel, Microsoft, NEC and
Northern Telecom (now Nortel Networks) led the development of the USB
specification. Its high-speed revision, USB 2.0, followed in 2000. Today, more than 1000 companies develop products which can be connected
to the PC via USB.
Popular in the PC and telecom market for several years now, USB is designed to support standard PC
peripherals and specialist devices. PC peripherals supported by USB include modems, keyboards,
mice, CD ROM drives, joysticks, tape/floppy/hard drives, scanners and printers. Moreover, a new
wave of peripherals such as telephones, digital speakers, digital snapshot and motion cameras, data
gloves and digitizers are to take advantage of this exciting and versatile new interface.
USB can service a range of data traffic workloads: low-speed (10 to 100 kb/s) for interactive
devices, full-speed (500 kb/s to 10 Mb/s) for phone, audio and compressed video, and high-speed (25 to 400
Mb/s) for video or storage. Note that the signaling rates for the low-speed, full-speed and high-speed
protocols are 1.5 Mb/s, 12 Mb/s and 480 Mb/s respectively. These are maximum values; in
practice, the rate of communication with a device is below these maxima.
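As a quick illustration of what these signaling rates mean for hardware timing, the nominal duration of one bit is simply the reciprocal of the rate. The short Python sketch below is ours and only illustrative; the names SIGNALING_RATES and bit_time_ns are hypothetical and do not come from the specification.

```python
# Nominal USB signaling rates in bits per second (the values quoted in the text).
SIGNALING_RATES = {
    "low-speed": 1_500_000,
    "full-speed": 12_000_000,
    "high-speed": 480_000_000,
}


def bit_time_ns(rate_bps):
    """Return the nominal duration of one bit on the wire, in nanoseconds."""
    return 1e9 / rate_bps


for name, rate in SIGNALING_RATES.items():
    print(f"{name}: {bit_time_ns(rate):.3f} ns per bit")
```

At the high-speed rate a bit lasts roughly 2.08 ns, compared with about 83.3 ns at full speed.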
2.2 USB Background
The following two sections contain background information about the USB 2.0 Specification. The first
section describes the different layers of the USB host communication model and the second section
defines some common USB packet fields.
2.2.1 Interlayer Communication Model
The USB cable provides communication services between a host and attached USB devices. A host is
any device that has USB devices attached to it. The simple view an end user sees of attaching one or more
USB devices to a host is, in fact, a little more complicated to implement than is indicated by the figure below.
Figure 3: Simple USB Host/Device View
The host is made of three distinct layers, shown in Figure 4 below. A physical device is attached to the
host. This device is typically a function that provides capabilities to the system. The physical device is
also implemented in three distinct layers (right side). Physical communication between the host and
the physical device occurs horizontally through the lowest layer which is the USB wire. The vertical
arrows between the layers indicate the actual communication on the host. Moreover, there is logical
host-device communication between each horizontal layer above the physical layer.
Figure 4: USB Implementation Areas
For our Final Year Project, we implemented parts of the host side only, which is why the following
discussion focuses on the layers on the left of Figure 4. Figure 5 below is a more detailed
view of the different layers of a USB host.
Our work focused mainly on the lowest, most physical layer (highlighted in blue in Figure 5)
that accepts hardware-defined data from the level above it and sends bits on the USB cable. In order
to test our hardware, we wrote a simple Host Controller Driver in C (highlighted in green in
Figure 5), which completes a transaction with the USB device. Our software system interface is
dependent on our hardware implementation and does not follow the specification for USB drivers. The
software layer simply abstracts the details of the protocol that are implemented by the hardware layer.
The software layer is involved at the level of transactions, whereas the hardware layer is involved at
the level of packets and bits. The highest layer, the client software, would typically
be C code that interacts with a USB device using only very high level functions such as read_USB( )
and write_USB( ). The client layer was not implemented as part of our FYP.
Figure 5: Host Communication (legend: layers we implemented, in VHDL and in C)
1. Physical Layer
The physical layer, referred to as USB Bus Interface Layer in a USB environment (see Figure 5), is the
hardware that handles the transmission of raw bits over the USB wire. This is the lowest layer in the
figure above. It is composed of two blocks: Serial Interface Engine and Host Controller. Data flowing
out of the USB host passes through the Host Controller first, then through the Serial Interface Engine.
a) Serial Interface Engine (SIE): The SIE performs several functions including serialization
and de-serialization of transmissions, encoding and decoding of the signals, generation and
verification of cyclic redundancy checks and detection of packet IDs and special signals.
b) Host Controller (HC): The host controller initiates transactions and controls access to the
USB. It divides the time into “frames” and issues a start-of-frame packet at each frame
interval. In addition, it processes requests for data to and from the host and handles errors.
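The encoding step performed by the SIE can be illustrated with a small software model. The Python sketch below is only a behavioral illustration under stated assumptions, not our VHDL design: it assumes the rules of the USB specification as we understand them (a 0 is inserted after six consecutive 1s, and NRZI represents a 0 by toggling the line and a 1 by holding it), it models the line as the letters J and K, and it assumes the line idles at J. The function names are hypothetical.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out = []
    run = 0  # number of consecutive 1s seen so far
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 6:
            out.append(0)  # stuffed bit forces a transition after NRZI
            run = 0
    return out


def nrzi_encode(bits, idle="J"):
    """Encode bits as J/K line states: a 0 toggles the line, a 1 holds it."""
    level = idle
    out = []
    for b in bits:
        if b == 0:
            level = "K" if level == "J" else "J"
        out.append(level)
    return "".join(out)


# Eight 1s in a row get one stuffed 0 after the sixth.
print(bit_stuff([1] * 8))
# Seven 0s followed by a 1 produce an alternating pattern ending in two Ks.
print(nrzi_encode([0, 0, 0, 0, 0, 0, 0, 1]))
```

Bit stuffing is applied before NRZI encoding so that the receiver sees a transition at least once every seven bit times and can stay synchronized.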
2. Protocol Engine Layer
The middle layer is composed of three sub-blocks: Host Controller Driver, USB Driver and USB
system software.
a) Host Controller Driver (HCD): The HCD (see Figure 5) is the lowest tier in the USB
software stack. It is the USB software layer that abstracts the Host Controller hardware and
provides an interface for interaction with the Host Controller. We wrote part of the HCD to test
our hardware system.
The blocks that are described after this are required to enable a client application to interact with a
USB device. Implementing these layers was not within the scope of our project, but we will describe
them so that the reader can have an idea of the logical flow that occurs from the highest to the lowest
layers.
b) USB Driver (USBD): The interface between the USB System Software and the Client
Software. This interface provides clients with convenient functions for manipulating USB
devices.
c) USB system software: The USB system software (see Figure 5) allocates bus bandwidth
and manages bus power. It identifies, enumerates, and services data requests from devices
on the bus.
3. Application Layer
The application layer is also known as Client Software (see Figure 5). Client software determines what
transfers need to be made with a function. Client software is aware only of the set of pipes (i.e., the
interface) it needs to manipulate its function. Requests made by the client software are presented via
the USBD interface.
2.2.2 USB Packet Field Formats
For the purposes of our report, we did not find it necessary to define the details of the USB protocol
such as the exact format of packets exchanged. However, we need to describe very briefly a few
terms because they will be used in the Design and Analysis chapter to explain how our Design meets
the standard.
Here, we will simply define some of the most recurring fields in USB packets.
SYNC: All packets begin with a synchronization (SYNC) field. It is used by the input circuitry to align
incoming data with the local clock. The Start-of-Packet (SOP) delimiter is part of the SYNC field.
PID: A packet identifier (PID) immediately follows the SYNC field of every USB packet. A PID consists
of a four-bit packet type field followed by a four-bit check field. The PID indicates the type of packet
and, by inference, the format of the packet and the type of error detection applied to the packet. There
are four types of packets: Token (OUT, IN, SOF or SETUP), Data (DATA0, DATA1, DATA2 or
MDATA), Handshake (ACK, NAK, STALL or NYET) and Special (PRE, ERR, SPLIT, PING or
Reserved).
Note that an IN PID specifies a transaction from a function to the host.
Whereas OUT/SETUP PIDs specify transactions from the host to a function.
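To make the PID layout concrete, the following Python sketch builds and checks a PID byte. It is an illustration under stated assumptions rather than part of our core: the 4-bit type codes are taken from the USB 2.0 specification as we read it, and the check field is assumed to be the one's complement of the type field. The names PID_CODES, make_pid_byte and check_pid_byte are hypothetical.

```python
# 4-bit packet type codes (assumed from the USB 2.0 specification's PID table).
PID_CODES = {
    "OUT": 0b0001, "IN": 0b1001, "SOF": 0b0101, "SETUP": 0b1101,
    "DATA0": 0b0011, "DATA1": 0b1011, "DATA2": 0b0111, "MDATA": 0b1111,
    "ACK": 0b0010, "NAK": 0b1010, "STALL": 0b1110, "NYET": 0b0110,
}


def make_pid_byte(code):
    """Place the type in the low nibble and its complement in the high nibble."""
    code &= 0xF
    return ((code ^ 0xF) << 4) | code


def check_pid_byte(byte):
    """Return True if the high nibble is the complement of the low nibble."""
    low = byte & 0xF
    high = (byte >> 4) & 0xF
    return high == (low ^ 0xF)


for name in ("SETUP", "IN", "OUT", "ACK"):
    print(name, hex(make_pid_byte(PID_CODES[name])))
```

A receiver can use a check like check_pid_byte to discard a packet whose PID was corrupted in transit before trying to interpret the rest of it.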
ADDRESS FIELD: Function endpoints are addressed using two fields: the function address field and
the endpoint field. A function needs to fully decode both address and endpoint fields.
DATA FIELD: The data field may range from zero to 1,024 bytes and must be an integral number of
bytes. Data bits within each byte are shifted out LSB first.
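The LSB-first rule can be shown with a short Python sketch that converts bytes to the order in which their bits appear on the wire, and back. This is an illustrative software model only; the function names are hypothetical.

```python
def bytes_to_bits_lsb_first(data):
    """Serialize bytes into a list of bits, least significant bit first."""
    bits = []
    for byte in data:
        for i in range(8):
            bits.append((byte >> i) & 1)
    return bits


def bits_to_bytes_lsb_first(bits):
    """Rebuild bytes from a bit list that was serialized LSB first."""
    if len(bits) % 8 != 0:
        raise ValueError("data field must be an integral number of bytes")
    out = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for i, bit in enumerate(bits[start:start + 8]):
            value |= bit << i
        out.append(value)
    return bytes(out)


print(bytes_to_bits_lsb_first(b"\x2d"))
```

The length check mirrors the requirement that the data field be an integral number of bytes.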
CRC: Cyclic redundancy checks (CRCs) are used to protect all non-PID fields in token and data
packets. Token and data packet CRCs provide 100% coverage for all single- and double-bit errors.
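As an example of how such a check can be computed, the Python sketch below calculates the 5-bit CRC of a token packet. It rests on assumptions drawn from our reading of the USB 2.0 specification rather than from the text above: a 7-bit address and a 4-bit endpoint sent LSB first, the generator polynomial x^5 + x^2 + 1, a register preset to all ones, and a CRC that is inverted and sent MSB first. All function names are hypothetical, and this is a bit-serial software model, not our VHDL implementation.

```python
def token_bits(addr, endp):
    """Return the 7-bit address and 4-bit endpoint as bits, each LSB first."""
    return [(addr >> i) & 1 for i in range(7)] + [(endp >> i) & 1 for i in range(4)]


def crc5_register(bits, reg=0x1F):
    """Shift bits through a 5-bit CRC register (polynomial x^5 + x^2 + 1)."""
    for b in bits:
        top = (reg >> 4) & 1
        reg = (reg << 1) & 0x1F
        if top ^ b:
            reg ^= 0x05  # feedback taps for x^2 and 1
    return reg


def crc5(bits):
    """Return the CRC to transmit: the register contents, inverted."""
    return crc5_register(bits) ^ 0x1F


def crc5_bits(crc):
    """Return the 5 CRC bits in transmit order (MSB first)."""
    return [(crc >> i) & 1 for i in range(4, -1, -1)]


fields = token_bits(0x15, 0xE)
crc = crc5(fields)
print(format(crc, "05b"))
# Receiver side: running the same register over fields plus CRC leaves a fixed residual.
print(format(crc5_register(fields + crc5_bits(crc)), "05b"))
```

Because the CRC is inverted before transmission, a receiver does not need to compare values: it shifts the whole packet, CRC included, through the same register and checks for a constant residual.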
2.3 Approach to solving problem
We decided to solve the problem of designing a hardware system that can interface to a USB
device by implementing a USB Revision 2.0 (high-speed) compliant Intellectual Property core on an
FPGA.
In fact, an alternative to implementing IP cores using FPGAs is doing so using Application Specific
Integrated Circuits (ASIC). In the past, it was a rule of thumb that densities of more than 500,000
gates and volumes above 100,000 units were beyond the capability of FPGAs. Today, FPGAs
approach ASIC-equivalent densities of 1 million gates which is the reason why we found it important to
discuss why an ASIC based approach was not taken. In what follows, we will briefly discuss the
alternative of implementing our project on an ASIC. In doing so, we will compare and contrast it to
implementation on an FPGA and state why we chose to use an FPGA based implementation instead.
First, an Application Specific Integrated Circuit (ASIC) is a chip designed to do a certain specific job or
a small group of jobs. If you want to implement different functionality, then you need to use a different
chip.
Second, an FPGA can be re-programmed again and again, until all bugs are removed and the system
is working correctly. An ASIC, however, is hard-wired with a mask. Once it is fabricated, no changes
can be made. In the commercial world, a system that has been prototyped on an FPGA is usually
migrated to an ASIC as one of the final stages before selling the product.
Although an FPGA consumes more power than an ASIC, it was a better choice for us as students
implementing an IP core, since we could take advantage of the debugging and reprogramming
capabilities that it offers.
Third, ASICs are usually made in large quantities by big companies. The total investment is large, but
the unit cost is small if the chip is manufactured in large quantities. However, ASICs carry nonrecurring
engineering (NRE) costs that are very high if the goal is to fabricate only one chip. FPGAs, on the
other hand, can be used for one-offs since they do not have NRE costs. Since in our case we will be
implementing only a single chip, using an FPGA avoids these high costs.
Moreover, the purpose of our project is not commercial, and we are not concerned about the unit cost
of the chip since it will not be for sale. In addition to the reasons mentioned above, the
defining factor in our choice of an FPGA over an ASIC was the availability of the FPGA boards with
physical USB boards in our labs.
2.4 Literature Survey
Have USB IP cores been implemented before? The answer is yes: USB cores are being implemented with
every emerging processor. A USB IP core that interfaces with the Microblaze processor is also available.
However, these are commercial IP cores and are not free.
Alternatives to USB core
The USB IP core is the controller that is required if, for example, you wish to use your USB mouse or
USB memory stick. Our controller is the device that acts as a bridge between the Microblaze
processor and other USB devices. If it is not present then there is no way that you can utilize USB
devices. To be able to interface your processor with a USB device, three options are available:
1. Buy a standard chip or product
2. Buy a commercial core
3. Design the USB core
Buy a standard chip or product
In this solution, an extra chip will be added to the design. This third party chip will have a
microcontroller and other logic that will act as a mediator between the processor and the USB devices.
Since we are trying to add a USB core to the Microblaze soft CPU on the FPGA, using this design
option would be unwise since we would end up using another chip leaving us with a bulky and costly
design.
Buy a commercial core
Another solution would be to purchase a ready made USB core that will be mapped and downloaded
into the FPGA. What we are actually purchasing is the VHDL code that describes the USB IP. This
solution is fast and not risky since the purchased USB core ensures high performance according to
USB standards. This solution is usually pursued by design companies that require a USB interface
with the processor. However, purchasing the IP core ourselves would do us no good since we wish to
design a noncommercial IP core, one that will be used freely in AUB labs.
Design the USB core
Finally we come to the solution that we have chosen. In this solution, the USB core is designed in
VHDL and implemented in the FPGA. This solution is the most tedious of all. It requires a lot of work
since familiarity with the exhaustive details of the USB protocol is needed. The hard part in creating
the design is verifying compliance and interoperability with the USB standards.
3.0 Design and Analysis
3.1 High Level Design
We designed a hardware system on an FPGA that consists of the Microblaze Microprocessor, the
RS232 Core and the USB core that we implemented. All these blocks are connected through the OPB
bus as shown in the figure below.
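[Block diagram: inside the FPGA, the Microblaze Microprocessor, the USB Core and the RS232 Core all
attach to the On Chip Peripheral Bus (OPB). A legend distinguishes blocks designed by Xilinx from
blocks designed by us.]
Figure 6: High Level System Block Diagram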
The Microblaze processor runs the C code that implements the Host Controller Driver on the FPGA.
This C code interacts with the USB core by writing and reading from registers. Finally we use the
RS232 Core to display the results of the C code on the screen.
We studied a large number of USB Embedded Host Controller datasheets from
National Semiconductor, Cypress Semiconductor, Maxim and Philips. We also designed our
system with reference to a full speed version of a USB HostSlave IP core. We wanted a USB host
controller block diagram that is simple enough for us to implement, compliant with
the USB 2.0 specification, and that includes all functionality necessary for the design of the USB-to-USB
data transfer application. We finally settled on a block diagram whose datasheet summary, written
with reference to the datasheets mentioned above, is given below.
3.2 Datasheet Summary
3.2.1 Introduction
The host controller enables an embedded system to function as a USB Host, dramatically expanding
the degree of interconnectivity and extending the applicability of USB into many new areas.
3.2.2 Features
• USB host controller for embedded applications.
• USB Specification 2.0 compliant
• Standard 8-bit microprocessor bus interface
• Supports high speed, full speed and low speed USB transactions
• Connected to the Microblaze processor as a slave on the OPB bus.
3.2.3 Block Diagram
[Block diagram of the USB CORE and its connection to the Microblaze over the OPB bus.
Blocks shown: Bus Interface, Calculate Reset, USB Host Controller, Receive FIFO, Transmit FIFO,
USB Serial Interface Engine, Physical Interface (USB Port).
Signals shown: address_i(0:7), data_i(0:7), data_o(0:7), clk, rst, we_i, strobe_i, USBSpeed,
USBWireDataOut(0:1), USBWireDataIn(0:1), HostSOFSentIntOut, HostConnEventIntOut,
HostResumeIntOut, HostTransDoneIntOut.]
3.2.4 System Overview
The host controller block diagram consists of five major blocks (refer to Figure above).
• The USB Serial Interface Engine: Seen in a dotted black box in the figure above.
Provides the interface between the physical USB wires and the USB Core. It works at bit-level
granularity, processing the incoming and outgoing data bits on the wires. It is
composed of a SIE receiver that NRZI-decodes, de-stuffs and parallelizes raw incoming data
bits, and of a SIE transmitter that does the exact opposite with the outgoing bits.
• Receive and Transmit FIFOs: Seen in a dotted light blue box in the figure above.
Implemented as First-In-First-Out buffers. The receive FIFO holds the data payload
of incoming data packets, which is read later by the software layer. The transmit FIFO
holds the data payload of data packets to be transmitted; it is loaded with data by the
higher software layer prior to a transaction.
• Host Controller: Seen in a dotted grey box in the figure above.
Operates at the transaction and packet level, in contrast to the packet and bit level at which
the USB Serial Interface Engine operates. It manages all transactions, sends packets and
waits for response packets, and notifies the software layer when a transaction is done.
• Bus Interface: Seen in a dotted red box in the figure above.
Selects and enables one of the Receive FIFO, Transmit FIFO or Host Controller by
decoding the address it receives. It also generates the USB clock from the Bus clock (clk_i).
The USB clock is 4 times slower than the Bus clock and drives the majority of the
components in the USB Core.
• Calculate Reset: Seen in a dotted blue box in the figure above.
Calculates a separate reset signal for each of the USB clock and the Bus clock domains. The Bus clock
reset should last 4 times more clock cycles than the USB clock reset.
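The block selection performed by the Bus Interface can be sketched in C as follows. This is only an illustration: it assumes that the upper nibble of the 8-bit address picks the block, which is consistent with the register addresses listed in the register table of section 3.2.6 (0x00 to 0x0c for the Host Controller, 0x20 to 0x23 for the Receive FIFO, 0x30 and 0x31 for the Transmit FIFO). The actual VHDL decode logic may differ, and the names used here are hypothetical.

```c
#include <stdint.h>

enum usb_block {
    BLOCK_HOST_CONTROLLER,
    BLOCK_RX_FIFO,
    BLOCK_TX_FIFO,
    BLOCK_NONE
};

/* Assumed decode rule: the upper nibble of address_i selects the block. */
enum usb_block decode_block(uint8_t address)
{
    switch (address >> 4) {
    case 0x0: return BLOCK_HOST_CONTROLLER;  /* 0x00 .. 0x0f */
    case 0x2: return BLOCK_RX_FIFO;          /* 0x20 .. 0x2f */
    case 0x3: return BLOCK_TX_FIFO;          /* 0x30 .. 0x3f */
    default:  return BLOCK_NONE;             /* unmapped address */
    }
}
```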
Note that our USB Core is connected to the Microblaze microprocessor as a slave on the On-Chip
Peripheral Bus (OPB), which is designed for easy connection of on-chip peripheral devices and provides a
common design point for various on-chip peripherals.
In the following sections, we provide a detailed description of each component.
3.2.5 Signal Definitions
Name IN/ OUT Description
clk_i IN The bus clock linked to the system clock of the FPGA on the board.
rst_i IN Active-high reset for all components.
address_i [7:0] IN Input Address
data_i[7:0] IN Input Data
we_i IN Write Enable
strobe_i IN Indicates the start of a bus cycle period
data_o [7:0] OUT Output data corresponding to the address_i input
hostSOFSentIntOut OUT Active-high when a SOF transmission occurs
hostConnEventIntOut OUT Active-high when a connect or disconnect of the device occurs
hostResumeIntOut OUT Active-high when a resume state of the device is detected
hostTransDoneIntOut OUT Active-high when a transaction is complete
USBSpeed [1:0] OUT Speed of the attached device (low, full or high)
Table 1: Signals
3.2.6 Registers
These registers are the ones accessed through the signals address_i and data_i above. They provide
the means of communication between the software layer that implements the Host Controller Driver
and the VHDL code that implements the USB core. By writing to and reading from these registers, the
C code sends control or configuration information to the Host Controller or data to the Transmit FIFO.
These registers also enable the software layer to read data
located in the Receive FIFO, or to read data back from the Host Controller in order to check whether the
Host Controller has processed the right configuration information.
In our implementation, the values of these registers are held in the VHDL design itself and not in on-chip or
off-chip memory. The VHDL code reads the address and data values sent on the OPB bus and decodes
this information to take one of a set of actions. The details of how the VHDL code deals with these
values are described in the table below.
Register name           Address  Bits   Field name             Description
TRANSREQ_PREEN_SOFSYNC  0x00     1      TRANS_REQ              Set to 1 to enable a transaction, 0 to disable it.
                                 2      SOF_SYNC               Set to 1 to synchronize the transaction with the end of SOF transmission.
                                 3      PRE_EN                 Set to 1 to enable preamble packets.
SOF_ENABLE              0x01     1      SOF_EN                 Set to 1 to enable automatic transmission of SOF packets.
FRAMENUM_MSB            0x02     [2:0]  FRAMENUM_MSB           Most significant part of the frame number in SOF transmission.
FRAMENUM_LSB            0x03     [7:0]  FRAMENUM_LSB           Least significant part of the frame number in SOF transmission.
CONNECT_STATE           0x04     [1:0]  CONNECT_STATE          00: device disconnected; 01: low speed; 10: full speed; 11: high speed.
DEVICE_ADDRESS          0x05     [6:0]  DEV_ADDR               USB device address.
ENDPOINT_ADDRESS        0x06     [3:0]  END_ADDR               USB device endpoint address.
TRANSACTION_TYPE        0x07     [1:0]  TRANS_TYPE             Specifies the transaction type required: SETUP=0, IN=1, OUT0=2, OUT1=3.
INTERRUPT_STATUS        0x08     0      TRANS_DONE_INT         Set to 1 when a transaction is complete.
                                 1      RESUME_INT             Set to 1 when a resume state is detected.
                                 2      CONNECTION_EVENT_INT   Set to 1 when a connect or disconnect occurs.
                                 3      SOF_SENT_INT           Set to 1 when a SOF transmission occurs.
INTERRUPT_MASK          0x09     0      TRANS_DONE_INT         Set to 1 to enable the interrupt when a transaction is complete.
                                 1      RESUME_INT             Set to 1 to enable the interrupt when a resume state is detected.
                                 2      CONNECTION_EVENT_INT   Set to 1 to enable the interrupt when a connect or disconnect occurs.
                                 3      SOF_SENT_INT           Set to 1 to enable the interrupt when a SOF transmission occurs.
PID                     0x0a     [3:0]  RX_PID                 Packet ID of the last packet received.
STATUS                  0x0b     0      CRC_ERROR              When set to 1, indicates a CRC error in the last transaction.
                                 1      BIT_STUFF_ERROR        When set to 1, indicates a bit stuffing error in the last transaction.
                                 2      OVERFLOW               When set to 1, indicates that the receive FIFO is full and cannot accept any more incoming data.
                                 3      TIME_OUT               When set to 1, indicates no response from the USB device.
                                 4      NAK                    When set to 1, indicates that NAK has been received in response to the last packet sent.
                                 5      STALL                  When set to 1, indicates that STALL has been received in response to the last packet sent.
                                 6      ACK                    When set to 1, indicates that ACK has been received in response to the last packet sent.
                                 7      DATA_SEQUENCE or NYET  For an IN transaction, the sequence number of the last packet received; for a handshake packet, whether it is a NYET.
LINE_CONTROL_INFO       0x0c     [1:0]  LINE_STATE             If direct control is enabled, LINE_STATE directly controls the state of the physical wires.
                                 2      DIRECT_CNTR            Set to 1 to enable direct control of the USB physical wires.
                                 [4:3]  LINE_POLARITY_BIT      00: low-speed line polarity; 01: full-speed line polarity; 10: high-speed line polarity.
RX_FIFO_DATA            0x20     [7:0]  RX_FIFO_DATA           Contains the receive payload of the last IN transaction.
RX_FIFO_DATA_NUM_MSB    0x21     [7:0]  RX_FIFO_DATA_NUM_MSB   Most significant byte of the number of elements in the receive FIFO.
RX_FIFO_DATA_NUM_LSB    0x22     [7:0]  RX_FIFO_DATA_NUM_LSB   Least significant byte of the number of elements in the receive FIFO.
RX_FIFO_RESET           0x23     0      FORCE_EMPTY            When set to 1, deletes all data in the receive FIFO.
TX_FIFO_DATA            0x30     [7:0]  TX_FIFO_DATA           Contains the transmit payload of the last OUT transaction.
TX_FIFO_RESET           0x31     0      FORCE_EMPTY            When set to 1, deletes all data in the transmit FIFO.
Table 2: Register Description
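To show how the software layer might use these registers, the following is a minimal C sketch that programs one transaction and checks its outcome. The register addresses and bit positions are taken from Table 2 as listed. Everything else is an assumption made for illustration: the helper names are hypothetical, and the `fake_regs` array merely stands in for the core so that the sketch can run on a desktop machine. On the board, `usb_write` and `usb_read` would instead access the OPB address at which the core is mapped.

```c
#include <stdint.h>

/* Register addresses and bit positions as listed in Table 2. */
#define REG_TRANS_CTRL      0x00
#define REG_DEVICE_ADDRESS  0x05
#define REG_ENDPOINT_ADDR   0x06
#define REG_TRANS_TYPE      0x07
#define REG_INT_STATUS      0x08
#define REG_STATUS          0x0B

#define TRANS_REQ_BIT       (1u << 1)
#define TRANS_DONE_INT_BIT  (1u << 0)
#define STATUS_ACK_BIT      (1u << 6)

enum trans_type { TRANS_SETUP = 0, TRANS_IN = 1, TRANS_OUT0 = 2, TRANS_OUT1 = 3 };

/* Hypothetical stand-in for the core's register space. */
static uint8_t fake_regs[256];

void usb_write(uint8_t reg, uint8_t value) { fake_regs[reg] = value; }
uint8_t usb_read(uint8_t reg) { return fake_regs[reg]; }

/* Program the target and transaction type, then request the transaction. */
void usb_request_transaction(uint8_t dev, uint8_t endp, enum trans_type type)
{
    usb_write(REG_DEVICE_ADDRESS, dev & 0x7F);
    usb_write(REG_ENDPOINT_ADDR, endp & 0x0F);
    usb_write(REG_TRANS_TYPE, (uint8_t)type);
    usb_write(REG_TRANS_CTRL, TRANS_REQ_BIT);
}

/* Returns -1 while the transaction is pending, 1 if it completed with
 * an ACK, and 0 if it completed with any other outcome. */
int usb_transaction_result(void)
{
    if (!(usb_read(REG_INT_STATUS) & TRANS_DONE_INT_BIT))
        return -1;
    return (usb_read(REG_STATUS) & STATUS_ACK_BIT) ? 1 : 0;
}
```

A driver would typically call `usb_request_transaction` and then poll `usb_transaction_result`, or wait for the transaction-done interrupt, before reading the Receive FIFO.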
3.2.7 Choice of Design
In our search for an appropriate block diagram for the USB host controller, we came across a number of
different implementation designs, such as the ISP1760 (Embedded Hi-Speed USB host controller) and the
ISP1563 (Hi-Speed Universal Serial Bus PCI Host Controller) from Philips, and the SL811HS (Embedded USB
Host/Slave Controller). Many of these implementations had very low-level and complicated block
diagrams and/or included more features and functionality than our project needed. The
design we settled on is as simple as possible, implementing just the functionality we need. The
table below contrasts our design choices with alternative ones and gives the
reason for each choice.
Our Choice: Implement only the host controller functionality.
Alternative Design: Include slave controller functionality along with that of the host controller.
Reason of Choice: For simplicity purposes. In future work, slave functionality may be added.
Our Choice: Processor interface not designed to satisfy common standards among other interfaces. Satisfy only the OPB bus protocol.
Alternative Design: Implement a processor interface which follows certain common standards (e.g. Wishbone-compatible).
Reason of Choice: Other design alternatives have a processor interface to many kinds of microprocessors, microcontrollers, or directly to a variety of buses such as ISA and PCMCIA, whereas our design only needs to have an interface to the Microblaze processor.
Our Choice: Host Controller can be interfaced directly via 8 bits of its data bus and 8 bits of its address bus.
Alternative Design: Provide an 8-bit bidirectional data path along with appropriate control lines to interface to external processors or controllers. Access to memory and control register space is a simple two-step process, requiring an address Write (set a certain control line called A0 to “0”) followed by a register/memory Read or Write cycle with address line A0 = “1”.
Reason of Choice: Simpler design implementation.
Table 3: Compare alternative designs with our design features
4.0 Implementation
In order to build the USB High Speed core, we had to implement the USB 2.0 specification, which only
specifies the language that high speed USB speaks but provides no details of implementation.
Therefore, our first target was to complete the VHDL code that implements the USB High Speed
protocol specification. As part of the specification, our core is supposed to be backward compatible
with all three speeds of USB devices (low speed, full speed, high speed). This is because a high
speed USB port residing on a host computer is expected to succeed in initiating communication with
all USB devices, irrespective of their speed of operation. In this chapter, we will describe the
implemented design of the core that has been written in VHDL.
Our VHDL code, which implements a host USB IP core, is composed of two main blocks as seen in
the figure below: the Host Controller component and the SIE component. The SIE component itself is
divided into the Transmitter and Receiver components. The core interfaces to the Host Controller
Driver, implemented in software, on its upper side and to the physical USB port on its lower side. Note
that raw bits of 0 and 1 are sent on the USB port as seen in the figure below.
Figure 7: Overall System
4.1 USB Core General Implementation
When a USB device transmits to the host computer, the receiver reads the bits that
are on the USB cable at the correct speed, decodes whether these bits represent a certain state
(such as start of packet, end of packet or idle) or fields that are part of USB packets, and
transfers this information to the Host Controller (HC). To achieve this, the receiver reads serial data
at one of the USB speeds and converts it to bytes, which it sends to the HC. Note that
the receiver block does a lot of processing between receiving raw bits on the wire and sending bytes to the HC.
For example, for a packet with a CRC, it recomputes the expected CRC and checks whether it
equals the one received. If it does not, it reports an error to the HC. The receiver also removes the bit
stuffing that had been performed by the USB device, because the bytes that go to the HC must be
free of stuffed bits. Moreover, the receiver detects the speed of the USB device upon connection and
provides this information to all other components. For example, if it decodes that a high speed device
was connected, it reports this to both the HC and the Transmitter. The Transmitter will then send
to the USB device only at high speed. As for the HC, it is the component that initiates and controls the
progress of all transactions, so it needs to know whether a device is high speed in order to send
Start of Frame packets more often than it would for a low or full speed device. These are just a few
examples of the sort of communication that happens when raw bits are received on the wire.
Now, assuming that the receiver has decoded the speed of the connected device, the core
is supposed to have an initial transaction with the USB device. A transaction consists of several
packets. As input to the host controller, we specify the type of transaction that the Host Controller
Driver (HCD) wishes to make with the USB device, as well as other needed fields such as the address
of the device. The host controller, knowing which transaction is to be sent, commands the transmitter
to send the packets that make up that type of transaction. It then waits for a response from the
receiver, which indicates either a handshake from the device or a timeout due to the lack of one.
For example, let us take the case of a setup transaction with the device at startup. When the HC
realizes that it needs to initiate a setup transaction, it enters a state machine in which it sequentially
sends a setup token packet, sends a data packet and waits for a handshake. The setup token packet
simply contains the address of the targeted device and indicates that the following packet carries
setup information. The data packet contains the setup information itself. So the host controller
indicates to the transmitter component that a setup token packet must be sent to the device with
address x.
The transmitter, receiving this information from the HC, implements the details of the physical USB
protocol. When the transmitter receives a command to send a setup token packet, it cannot simply
put the received bits for the packet on the line. It first sends a START OF PACKET sequence on the
line, serializes the bytes that it receives from the HC into bits, calculates the CRC over the packet,
performs bit stuffing and then sends the resulting bits on the USB cable. Once the setup token packet
has been sent, it sends an END OF PACKET sequence on the line. Note that all this processing was
for the first packet only. A similar sequence happens in the Transmitter for the data packet.
Now that both token and data packets have been sent, the core waits for a response from the device.
Two cases can occur: the device either responds with a handshake packet or does not respond. The
receiver block listens on the USB cable. If it receives no bits for the amount of time specified by the
protocol, it reports a timeout to the HC. The host controller has a timeout interrupt signal as output,
which it sets in this case. This signal is to be handled by the HCD, which should try to send the
transaction again. On the other hand, if the receiver starts receiving bits, it decodes them to find out
whether they form a handshake packet. If so, it reports to the HC that a handshake packet has been
received from the device. At this stage the HC has completed the whole transaction, so it interrupts
the Host Controller Driver layer to signal that the transaction is complete.
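The bit stuffing step mentioned above, and its removal in the receiver, can be illustrated with a minimal C sketch. It assumes the rule from the USB 2.0 specification: the transmitter inserts a 0 after every six consecutive 1 bits, and the receiver discards that 0 and flags an error if it sees a seventh 1 instead. For readability the sketch stores one bit per byte, and the function names are hypothetical. Our core performs these operations in VHDL on a serial bit stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Transmitter side: copy n bits from in to out, inserting a 0 after
 * every run of six 1s. out needs room for n + n/6 bits.
 * Returns the number of bits written. */
size_t bit_stuff(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t m = 0;
    int ones = 0;
    for (size_t i = 0; i < n; i++) {
        out[m++] = in[i];
        ones = in[i] ? ones + 1 : 0;
        if (ones == 6) {
            out[m++] = 0;   /* stuffed bit */
            ones = 0;
        }
    }
    return m;
}

/* Receiver side: drop the bit that follows six consecutive 1s.
 * Returns the number of bits written, or (size_t)-1 on a bit
 * stuffing error (a seventh consecutive 1). */
size_t bit_unstuff(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t m = 0;
    int ones = 0;
    for (size_t i = 0; i < n; i++) {
        if (ones == 6) {
            if (in[i])
                return (size_t)-1;
            ones = 0;       /* discard the stuffed 0 */
            continue;
        }
        out[m++] = in[i];
        ones = in[i] ? ones + 1 : 0;
    }
    return m;
}
```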
The explanation above is a high-level, low-detail view of the interaction that happens between
the components within the core, the software layer and the USB device. The implementation was quite
tedious, as it required us to handle many cases of transactions and packets and also to have
the core support all three speeds, each of which has a different signaling rate and different transfer
protocols.
Our first step was to write, in VHDL, the IP core that implements the hardware and to test it. We defined
the interface to our core and made sure that it was working properly as a black box.
As for the HCD software layer, a fully working version would have to cover all the cases implemented
in the core; this is in fact specified as a standard of its own, the Enhanced Host Controller
Interface (EHCI). We concerned ourselves with writing a case that validates that the VHDL core is
working, but it is not comprehensive. We will explain in what follows what we have implemented and
the future work that must be done in order to complete the core and have it communicate with a USB
device from application-level software.
When we described our VHDL core in the previous section, we only specified the main high-level
blocks such as the Host Controller, Transmitter and Receiver. This was to give the reader an
understanding of the overall functionality of the core. However, the code is quite detailed since it
implements most of the USB protocol. In fact, the core is composed of many more components that
are sub-components of the previously mentioned ones. In the discussion below, we will explain the
block-level design of each of these blocks.
4.2 Host Controller
The USB Host Controller is the main block in the USB Core that manages all outgoing and incoming
transactions. The different components in the USB Host Controller can be logically divided into 2
parts: those that manage all outgoing data transfers (Host Controller Arbiter, Control SOF, Send SOF,
HCA&SOF MUX, Check Preamble, Transmit Packet, Direct Control, SOF DC TxPacket MUX) and
those that manage all incoming data transfers (Host Controller Arbiter, Receive Packet, Interrupt
Generator). The USB Host Controller further contains a component that provides it with an interface to
the bus. The figure below depicts the block diagram of the USB Host Controller along with all its sub-
components.
Figure 8: Host Controller
The USB Host Controller processes all control information sent by the software layer (whether
automatic transmission of SOF and PREAMBLE packets is enabled, and the type of transaction
required) and, as a first step, forwards this information to the addressed components within the USB core.
It also sends information about the transaction that is taking place, or about the device the host is
attached to, back to the software layer, such as the speed of the device and the kind of handshake received.
This component also sends interrupts to the software layer when a transaction is
done, a SOF is sent, resume is detected or the connection state of the USB physical line changes
(the possible states being DISCONNECTED, LOW-SPEED, FULL-SPEED or HIGH-SPEED).
Depending on the type of transaction required (IN, OUT0, OUT1 or SETUP), it takes care of sending
the appropriate packets to the device (token, data or handshake). In case automatic transmission of SOF
and/or PREAMBLE packets is enabled, it sends SOF packets at the start of each frame, or
PREAMBLE packets before each data or token packet.
In case the software layer has enabled direct control of the USB physical wires, it also takes care
of driving the line to the state specified in the control information sent by the
software layer.
The USB Host Controller stores the payload of the data packet it receives from the SIE Receiver and
therefore from the device, in the Receive FIFO, to be read later by the software layer. It also packages
the data in the Transmit FIFO as part of the payload of the data packet within an OUT or SETUP
transaction to be sent to the SIE Transmitter and consequently to the device.
Host Controller Bus Interface
The Host Controller Bus Interface sits between the Host Controller component and the Bus
Interface. Its job is to synchronize between the USB clock and the bus clock. It takes a 4-bit address as
input, which identifies the register whose content is carried on the 8-bit dataIn signal.
The Host Controller Bus Interface either splits this input data and assigns it to the appropriate signals, or
assigns it as a whole to the dataOut output, to be sent to different components of the host controller or
to other components of the wrapper.
Host Controller Arbiter
The Host Controller Arbiter checks whether a transaction is required (transReq bit set by the software
layer), then checks which transaction type the upper level requires: SETUP, IN,
OUT0 or OUT1. Accordingly, it sets the PID of the packets that are to be sent and asks the
HCA&SOF MUX for a turn to send the packet, or it enables the Receive Packet component to read incoming
packets.
• In case an IN transaction is required, it sets the packet ID to IN, waits until this packet is sent
and a packet is received from the device, then sets the packet ID to ACK,
and finally notifies the upper level that the required transaction is done.
• In case a SETUP transaction is required, it first sets the packet ID to SETUP and waits until the
packet is sent. It then sets the packet ID to DATA0, waits until the packet is sent and an ACK is
received, and notifies the upper level that the required transaction is done.
• In case an OUT0 transaction is required, if it received a NYET in response to the
previous transaction, it sets the packet ID to PING; otherwise it sets it to OUT. It waits until the
packet is sent, then sets the packet ID to DATA0 and waits until it is sent and an ACK is received. Finally, it
notifies the upper level that the required transaction is done.
• In case an OUT1 transaction is required, it sets the packet ID to OUT, waits until the packet is
sent, then sets the packet ID to DATA1, waits until it is sent and an ACK is received, and notifies the
upper level that the required transaction is done.
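The choice of packet IDs made by the arbiter can be summarized in a short C sketch. The PID codes come from the USB 2.0 specification and the transaction type values from the register table. The sketch assumes that an OUT1 transaction carries a DATA1 packet, as its name suggests, and that PING is only considered for OUT0 as in the description above. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum trans_type { TRANS_SETUP = 0, TRANS_IN = 1, TRANS_OUT0 = 2, TRANS_OUT1 = 3 };

enum {
    PID_OUT = 0x1, PID_PING = 0x4, PID_IN = 0x9, PID_SETUP = 0xD,
    PID_DATA0 = 0x3, PID_DATA1 = 0xB
};

/* PID of the first packet the host sends for a transaction. */
uint8_t first_pid(enum trans_type t, bool last_was_nyet)
{
    switch (t) {
    case TRANS_SETUP: return PID_SETUP;
    case TRANS_IN:    return PID_IN;
    case TRANS_OUT0:  return last_was_nyet ? PID_PING : PID_OUT;
    default:          return PID_OUT;       /* TRANS_OUT1 */
    }
}

/* PID of the data packet the host sends after the token,
 * or 0 when the host sends no data packet (IN transactions). */
uint8_t host_data_pid(enum trans_type t)
{
    switch (t) {
    case TRANS_SETUP:
    case TRANS_OUT0:  return PID_DATA0;
    case TRANS_OUT1:  return PID_DATA1;     /* assumption, see lead-in */
    default:          return 0;             /* TRANS_IN */
    }
}
```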
Control SOF
This component keeps track of a timer for SOF. This timer is then used by the Send SOF component.
Send SOF
When the timer for SOF, provided by the Control SOF component, reaches a certain value (which differs
between low speed and high speed), it signals that a Start of Frame (SOF) packet needs to be transmitted.
HCA&SOF MUX
This block arbitrates between requests from the Host Controller Arbiter and the Send SOF
components both of which want to send packets. Send SOF wants to transmit SOF packets whereas
Host Controller Arbiter wants to transmit packets with any other PID. The block gives priority to the
Send SOF because the SOF packet needs to be transmitted first.
Check Preamble
As soon as there are packets that need to be sent, this block first checks if the software layer
components have enabled automatic transmission of preamble packets. If so, it waits until the
Transmit Packet component is ready to send packets, then it signals to it that a PREAMBLE packet
needs to be sent. In case the higher-level components have not enabled automatic transmission of
preamble packets, or after the PREAMBLE packet is sent, it signals the Transmit Packet component
that a packet needs to be sent, and forwards the packet’s ID with the value provided by the HCA&SOF
MUX : either SOF or any other PID provided by the Host Controller Arbiter component itself.
Note that PREAMBLE is only sent in low and full-speed transmissions before any token, data or
handshake packet.
Transmit Packet
It acts according to the packet ID (PID) it receives from the Check Preamble component. If the PID is
SOF and the attached device operates at low speed, it sends a KEEP_ALIVE
control signal to the SIE. Low speed devices do not see SOF packets; the KEEP_ALIVE
signal plays the same role for them, keeping a low speed device from
entering the Suspend state.
In all other cases (the PID is not SOF, or the attached device is not low speed), it sends a
TX_PACKET_START control signal to the SIE Transmitter, then distinguishes between data and
token packets according to their PID:
• If the packet ID is either DATA0 or DATA1, it reads data from the Transmit FIFO and sends
this data to the SIE Transmitter along with a control signal called TX_PACKET_STREAM to
indicate that it is sending data. When it has read all the data from the Transmit FIFO, it sends a
TX_PACKET_STOP control signal to indicate that it has finished sending data.
• If the packet ID is SOF, it sends the frame number to the SIE Transmitter along with a control
signal called TX_PACKET_STREAM, and it increments the frame number.
• If the packet ID is IN, OUT or SETUP, it sends the device address and endpoint number
along with a control signal called TX_PACKET_STOP to indicate the end of the packet.
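The frame number handling in the SOF case can be sketched in C as follows. The sketch relies on the fact that the SOF frame number is 11 bits wide, which matches the FRAMENUM_MSB [2:0] and FRAMENUM_LSB [7:0] registers. The function names are hypothetical.

```c
#include <stdint.h>

/* The frame number is 11 bits wide, so it wraps from 2047 back to 0. */
uint16_t next_frame_number(uint16_t frame)
{
    return (uint16_t)((frame + 1u) & 0x07FF);
}

/* Upper 3 bits, as held in FRAMENUM_MSB [2:0]. */
uint8_t frame_msb(uint16_t frame)
{
    return (uint8_t)((frame >> 8) & 0x07);
}

/* Lower 8 bits, as held in FRAMENUM_LSB [7:0]. */
uint8_t frame_lsb(uint16_t frame)
{
    return (uint8_t)(frame & 0xFF);
}
```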
Direct Control
The Direct Control block checks whether the higher-level components have enabled direct control of the state of the
USB physical wires. If so, it requests that the line state specified by the upper-level
components be sent to the SIE Transmitter, along with a control signal TX_DIRECT_CONTROL that
describes the data it is sending. If direct control is not enabled, it simply sends a control signal
called TX_IDLE to the SIE Transmitter.
SOF DC TxPacket MUX
This block acts as a multiplexer between the Control SOF, Transmit Packet and Direct Control components,
which all use the Transmit port of the host controller to send packets. Priority is given first to
the Control SOF, then to the Transmit Packet and finally to the Direct Control component.
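The fixed priority order of this multiplexer can be expressed as a small C sketch. It only illustrates the selection rule stated above; the request flags and names are hypothetical, and the real block is combinational VHDL logic.

```c
#include <stdbool.h>

enum tx_source {
    TX_NONE,
    TX_CONTROL_SOF,
    TX_TRANSMIT_PACKET,
    TX_DIRECT_CONTROL
};

/* Grant the Transmit port to the highest-priority requester. */
enum tx_source select_tx_source(bool sof_req, bool packet_req, bool direct_req)
{
    if (sof_req)
        return TX_CONTROL_SOF;       /* highest priority */
    if (packet_req)
        return TX_TRANSMIT_PACKET;
    if (direct_req)
        return TX_DIRECT_CONTROL;    /* lowest priority */
    return TX_NONE;
}
```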
Receive Packet
The Receive Packet block first checks whether the incoming data is valid, then whether the PID is
HANDSHAKE or DATA. If it is a HANDSHAKE packet, it sends the Host Controller Arbiter the
information it received from the SIE about the packet (CRC errors, overflow, RX timeout and the
data sequence). If it is a DATA packet, it reads the incoming data, as long as it is valid, and
writes it to the Receive FIFO. It also checks whether the Receive FIFO is full; in that
case, it holds back the incoming data until there is space in the FIFO.
Interrupt Generator
Interrupts the higher-level components in case the connection state is changed (the possible states
being disconnected, low speed, full speed or high speed) or resume is detected by the SIE.
SpeedCtrlMux
This block sends the speed at which signaling with the USB device should occur to the Transmitter.
4.3 Transmitter
The transmitter block, which is a sub-component of the Serial Interface Engine (SIE) block, takes as
input signals from the host controller and provides as output bits to be sent on the USB port towards
the USB device. The transmitter consists of sub-components each of which has a specific function
designed to support high speed, full speed and low speed USB communication. The figure below lists
the subcomponents within the transmitter which are: Data States, Transmit Controller, Token CRC,
Data CRC, Bit Stuffer and Serializer, Direct Bits, Bit Stuffer and Serializer and Direct Bits MUX and
USB Write. In what follows, we will describe each component in more detail.
Figure 9: Transmitter
Data States
The USB transfers signals and power over a four-wire cable. The signaling occurs over the two wires
D+ and D- while power is provided through VBUS and GND wires on each segment to deliver power to
devices.
Figure 10: USB Cable
When transferring data, there are 4 possible states on the bus:
Bus State            D+   D-
Differential 0       0    1
Differential 1       1    0
Single-Ended-Zero    0    0
Single-Ended-One     1    1
Table 4: Bus States
In addition to the bus states mentioned above, which are defined by voltages on the lines, USB also
defines two Data bus states, J and K. The J and K data states are the two logical levels used to
communicate differential data in the system. These are defined by whether the bus state is Differential
1 or 0 and whether the cable segment is low or full or high speed.
                  Data States
Bus State         Low Speed   Full Speed   High Speed
Differential 0    J           K            K
Differential 1    K           J            J
Table 5: Speed and Data States
Figure 11: Speed and Data States
The reason that J and K states are defined in this manner is so that one standard terminology can be
used to describe a logic state on the USB cable even though the actual levels on the D+ and D- lines
differ between speeds. For example, a Start-of-Packet (SOP) is signaled when the bus switches from
the J state to the K state. On a high/full speed line, this means that D- becomes more positive than D+,
while on a low-speed segment, it means that D+ becomes more positive than D-.
Now that we know what the protocol requires of us, we can explain the functionality of Data States.
This is a very simple block which takes as input the speed of the USB device that we are connected to
and depending on that, it sets the J and K data states to either Differential 1 or Differential 0. In all
blocks that follow, we just use the J and K states without having to deal with Differential 0 and 1.
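As a rough illustration of what the Data States block computes, the mapping of Table 5 can be written as a lookup. This is a Python sketch rather than the VHDL block itself; the names `DIFF0`, `DIFF1` and `data_states` are hypothetical, and line levels are written as (D+, D-) pairs as in Table 4.

```python
# (D+, D-) line levels of the two differential bus states (Table 4).
DIFF0 = (0, 1)
DIFF1 = (1, 0)


def data_states(speed):
    """Return the (J, K) line levels for speed 'LS', 'FS' or 'HS' (Table 5)."""
    if speed == "LS":
        return DIFF0, DIFF1   # low speed: J = Differential 0, K = Differential 1
    if speed in ("FS", "HS"):
        return DIFF1, DIFF0   # full/high speed: J = Differential 1, K = Differential 0
    raise ValueError("unknown speed: %r" % (speed,))


j_state, k_state = data_states("HS")
```

Every later block can then work with `j_state` and `k_state` only, which is the simplification the paragraph above describes.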
Transmit Controller
This block is at the heart of the transmitter block and is the most involved in controlling what states all
the other blocks in the transmitter will be in. It receives as input bytes from the host controller. It
compares the first byte that it receives to a constant to figure out whether a token, data, handshake or
special packet is to be sent. This byte is basically the packet id of the corresponding packet. Now, for
each of the four types of packets, it enters into a sequence of states whereby it accepts from the HC
the remaining fields of the packet, sends these fields to the Data CRC and Token CRC blocks, sends
the bytes to the Bit-Stuffer and Serializer then appends the CRC (if applicable) to the end of the
packet and sends it to be serialized and bit stuffed.
For a data packet, it first receives the first byte which is the packet id. From this it decodes that this is
a data packet. It sends this byte to the Bit Stuffer and Serializer along with control information to
indicate that this is the first byte of the packet. The Bit Stuffer and Serializer sends a start of packet
sequence before going on to bit stuff and serialize the packet id. Knowing that a data packet can have
multiple bytes after the packet id, the Transmit Controller goes on to a state waiting for the HC to write
the next data byte into it. Along with this byte comes control information that informs the Transmit
Controller whether this is the last data byte or there is more to come. If this is the last, it appends the
CRC value computed by the Data CRC component and sends all this to the Bit Stuffer and Serializer to
be further processed. If this is not the last byte, it waits for another byte and stays in this loop until the
last byte is received.
For a token packet, it first receives the first byte which is the packet id and decodes that this is a token
packet. Knowing that a token packet has a packet id, followed by an address field followed by an
endpoint field, it waits in different states until it receives the remaining two bytes. Since this is the last
field in the packet that will be sent from the host controller, it reads the CRC value calculated from
Token CRC, appends it to the packet and then sends all this to the Bit Stuffer and Serializer to be
further processed.
For a handshake/special packet, it first receives the first byte which is the packet id. From this it knows
that this is a handshake/special packet. Knowing that a handshake/special packet has only a packet id
it sends this to the Bit Stuffer and Serializer to be further processed.
Figure 12: Handshake Packet
Data CRC
A CRC is a cyclic redundancy check performed on data to see if an error has occurred in reading or
writing the data. The result of a CRC is transmitted with the checked data. At the receiving end, the
transmitted result is compared to the CRC calculated for the data to determine if an error has
occurred. The goal in inserting a CRC as part of the packet is to maximize the probability of detecting
errors using only a small number of redundant bits. The Divisor polynomial used to generate the CRC
is C(X) = X^16 + X^15 + X^2 + 1.
When a data packet is sent (shown below) a special Data CRC of 16 bits for it is calculated. Note that
the PID is not included in the CRC check. The data CRC only covers the data field of the Data packet.
The Data CRC block has the function of generating the CRC over all the data fields that are
sequentially input to it by the Transmit Controller. When the Transmit Controller has sent all the bytes
of the DATA field to the Data CRC block, it reads the resulting 16 bit CRC and it sends it to the
Serializer and Bit Stuffer block to further process the packet.
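A bit-serial model of this calculation can be sketched as follows. It is a Python illustration, not the VHDL of the Data CRC block. The function names are hypothetical. The all-ones preset, the final inversion and the MSB-first transmission of the checksum follow the USB 2.0 specification as we understand it; they are not spelled out in the text above.

```python
def crc16_register(bits, reg=0xFFFF):
    """Shift bits (in transmission order) through the CRC16 register."""
    POLY = 0x8005                       # X^16 + X^15 + X^2 + 1, X^16 term implicit
    for bit in bits:
        top = (reg >> 15) & 1
        reg = (reg << 1) & 0xFFFF
        if top ^ bit:
            reg ^= POLY
    return reg


def crc16_bits(data_bits):
    """Return the 16 CRC bits in the order they are sent (MSB of the register first)."""
    reg = crc16_register(data_bits) ^ 0xFFFF   # assumption: checksum is inverted
    return [(reg >> i) & 1 for i in range(15, -1, -1)]


def byte_to_bits(byte):
    """Bits of a byte in USB transmission order (LSB first)."""
    return [(byte >> i) & 1 for i in range(8)]


payload = [0x00, 0x01, 0x02, 0x03]      # example DATA field; the PID is not included
bits = [b for byte in payload for b in byte_to_bits(byte)]
crc = crc16_bits(bits)

# Receiver-side check: data followed by its CRC leaves a fixed residual.
residual = crc16_register(bits + crc)
```

With this structure a receiver does not need to compare two checksums: it shifts the data and the received CRC through the same register and checks for the constant residual.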
Figure 13: Data Packet
Token CRC
Similarly to the Data CRC block discussed above, TOKEN type packets are protected with a 5 bit
CRC. In this case, the function of the Token CRC block is to generate the CRC for a TOKEN packet.
The Divisor polynomial used to generate the CRC is C(X) = X^5 + X^2 + 1.
When a token packet is sent (shown below) a special Token CRC for it is calculated. Note that the PID
is not included in the CRC check. The Token CRC block has the function of generating the 5 bit CRC
over the ADDR and ENDP fields input to it sequentially by the Transmit Controller. It is later
appended to the end of the packet and sent as part of it.
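The 5 bit calculation over ADDR and ENDP can be sketched in the same bit-serial way. This is a Python illustration with hypothetical function names, not the VHDL of the Token CRC block. The all-ones preset, the inversion of the result and the MSB-first transmission of the CRC are taken from the USB 2.0 specification as we understand it.

```python
def crc5_bits(field_bits):
    """Return the 5 CRC bits in the order they are sent on the wire."""
    POLY = 0x05                 # X^5 + X^2 + 1, X^5 term implicit
    reg = 0x1F                  # assumption: register preset to all ones
    for bit in field_bits:
        top = (reg >> 4) & 1
        reg = (reg << 1) & 0x1F
        if top ^ bit:
            reg ^= POLY
    reg ^= 0x1F                 # assumption: checksum is inverted before sending
    return [(reg >> i) & 1 for i in range(4, -1, -1)]


def token_field_bits(addr, endp):
    """ADDR (7 bits) followed by ENDP (4 bits), each LSB first."""
    return [(addr >> i) & 1 for i in range(7)] + [(endp >> i) & 1 for i in range(4)]


# CRC5 for a token addressed to device 0, endpoint 0.
crc = crc5_bits(token_field_bits(addr=0, endp=0))   # -> [0, 1, 0, 0, 0]
```

Packed LSB first after the three remaining ENDP bits, these five bits give the byte 0x10, so a SETUP token to address 0, endpoint 0 is the byte sequence 2d 00 10.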
Figure 14: Token Packet
Encoder, Bit Stuffer and Serializer
This block has a multitude of functions that it completes. It receives two types of commands from the
Transmit Controller. The first is that it receives a command to send a byte of a packet and the second
case is that it receives control commands to send a special sequence on the USB cable that defines a
USB Bus State (Idle, Start of Packet, End of Packet and so forth). For example, when the Transmit
Controller wants to send a Data Packet, it sends the packet id of the data packet to this component
along with control information specifying that this is the start of the packet. This block has been
designed to automatically send at its output serial data that corresponds to the Start of Packet
sequence defined by the specification.
The encoding format used by the USB protocol is called Non-Return to Zero Inverted (NRZI) where a
“1” is represented by no change in level and a “0” is represented by a change in level. A string of zeros
causes the NRZI data to toggle each bit time. On the other hand, a string of ones causes long periods
with no transitions in the data. The figure below shows a data stream and the NRZI equivalent.
Figure 15: NRZI Encoding
Once we have NRZI encoded the data, we need to be able to send it on a physical USB cable,
specifically on the D+ and D- wires which were described before. For the sequence shown above, the
data sent to the USB cable for a high speed device would be JKKJJKKJKKKKJ. Note a Differential 1 is
a J in high speed.
Back to sending the data packet, after the start of packet (SOP) sequence is sent the bytes of PID,
DATA and CRC are bit stuffed and encoded. The protocol defines bit stuffing as the insertion of a zero
after every six consecutive ones in the data stream before the data is encoded. Note that if the data to
be sent included a sequence of 7 or more consecutive ones, such as
011111110, then, starting from a J on the line, the data sent on the USB cable without bit stuffing
would be KKKKKKKKJ. With bit stuffing, we insert a 0 after six ones to get 0111111010 and it would be
sent as KKKKKKKJJK. In the
USB protocol, the host and device do not share any clock and thus bit stuffing, which forces a toggle
in the data sent, ensures that the receiver remains synchronized with the transmitter without the
overhead of sending a separate clock signal or Start and Stop bits with each byte.
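The two steps described above, bit stuffing followed by NRZI encoding, can be sketched as follows. This is a Python illustration with hypothetical function names, not the VHDL block. The state of the line before the first bit is an assumption (J here), and the J/K letters of the result depend on it.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, run = [], 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 6:
            out.append(0)       # the stuffed 0 forces a transition once encoded
            run = 0
    return out


def nrzi_encode(bits, start="J"):
    """Encode bits as J/K symbols: a 0 toggles the line, a 1 leaves it unchanged."""
    state, out = start, []
    for bit in bits:
        if bit == 0:
            state = "K" if state == "J" else "J"
        out.append(state)
    return "".join(out)


data = [0, 1, 1, 1, 1, 1, 1, 1, 0]          # 011111110: seven ones in a row
stuffed = bit_stuff(data)                    # 0111111010
print("stuffed bits:", "".join(map(str, stuffed)))
print("on the wire :", nrzi_encode(stuffed))
```

Stuffing is applied before encoding, so the run counter looks at data bits, not at line symbols.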
With the last byte of the Data packet (CRC byte), the block receives control information stating that
this is the end of the packet so it inserts the End of Packet (EOP) sequence on the line.
The output of this block is serial data that has undergone bit stuffing and encoding and is ready to be
sent on the USB wire. However, in order to have the data written on the wire at the correct speed, we
need to have a few other blocks that manage this.
Encoder, Bit Stuffer and Serializer and Transmit Controller MUX
The component above takes care of sending bus states and USB packets on the USB cable. The
actual bits to be sent are calculated as the packet passes through the blocks of the Transmitter. The
software layer simply needs to provide the core with the type of packet and the values of fields in the
packet and the Transmitter along with the Host Controller take care of sending the packet in
accordance with the details of the specification.
Apart from this functionality, our VHDL core has been designed to allow the software layer to place
specific bits on the line, where these bits do not correspond to packet related information. To achieve
this functionality, the Transmit Controller has serial outputs through which it can output the specific bits
requested by the software layer. Note that these are predefined serial bits and need not be bit stuffed
and encoded. So, these are directly sent from the Transmit Controller to the MUX block. The MUX
block receives serial inputs from the Transmit Controller and the Encoder, Bit Stuffer and Serializer
blocks. The inputs from the Encoder, Bit Stuffer and Serializer have priority to be sent first
since a packet cannot be interrupted in the middle to send a desired
sequence. This is a very simple block that simply forwards the inputs from one of the two blocks to the
USB Write block.
USB Write
This block is the final block before previously processed data is actually sent on the physical USB
Wire. It maintains an input buffer to accept data that it receives and also manages an output buffer
that is responsible for writing data at the specified speed to the physical USB wire. We have
implemented these buffers as FIFO buffers so that the sequence of bits transmitted remains as it was
supposed to be.
Since our core supports all three speeds, this block should be able to write to the line at the rates of
1.5 Mbps (LS), 12 Mbps (FS) and 480 Mbps (HS). With a 960 MHz input clock, we implemented
a 7 bit counter that enables this block to write at all three speeds.
We can send bits at HS each time the LSB of the counter rolls over. This divides the input clock by
two to give 480 Mbps. To send at FS, we wait for the 4 LSBs of the counter to roll over 5 times. This
divides our clock by 2^4*5=80 and thus we can write at 960/80=12 Mbps. To send at LS, we wait for
the 7 bits of the counter to roll over 5 times. This divides our clock by 2^7*5=640 and thus we can
write at 960/640=1.5 Mbps.
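The divider arithmetic can be checked with a few lines of Python. The names below are hypothetical; the numbers are the ones given in the text (960 MHz clock, 1, 4 or 7 counter bits, 1 or 5 rollovers).

```python
CLOCK_HZ = 960_000_000          # input clock of the USB Write block


def divider(counter_bits, rollovers):
    """Clock cycles per bit when waiting for `rollovers` rollovers of the low counter bits."""
    return (2 ** counter_bits) * rollovers


RATES = {
    "HS": CLOCK_HZ / divider(1, 1),   # LSB rolls over once: divide by 2
    "FS": CLOCK_HZ / divider(4, 5),   # 4 LSBs roll over 5 times: divide by 80
    "LS": CLOCK_HZ / divider(7, 5),   # 7 bits roll over 5 times: divide by 640
}

for speed, rate in RATES.items():
    print(speed, rate / 1e6, "Mbps")
```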
4.4 Receiver
The receiver block, which is a sub-component of the Serial Interface Engine (SIE) block, takes as
input signals directly from the USB wire. The two inputs are the D+ and the D- signals. The main
function of the receiver is to convert the bits it is receiving into bytes which will then be analyzed and
given to the Host Controller (HC). Before it sends the bytes to the HC, it checks for CRC errors and bit
stuff errors. The Receiver has another very important task; it is responsible for detecting the speeds
of the connecting devices. Every device that works at a certain speed (low, full or high) will signal the
receiver giving it the needed data to determine the speed and thus the receiver will find out what the
speed of a connecting device is and notify the whole core of this speed.
The figure below lists the subcomponents within the receiver which are: USB Read, Bit To Byte
Converter, Byte Analyzer and Detect Speed. In what follows, we will describe each component in
more detail.
Figure 16: Receiver
USB Read
This is the lowest level component of the receiver block. Its function is to read the 1’s and 0’s from the
D+ and D- lines of the USB wire. This component can read the input at low speed, full speed
or high speed. The main concept behind reading the input is as follows: There is a 7 bit counter i that
is incremented at every rising edge of the clock. If we are working with high speed, then we will take in
a new input from the system whenever the least significant bit of i is 0. If we look at the figure below,
the first signal is the clock, the second is the counter i and the third signal is the high speed data in
tick. The high speed data in tick has a period of 2.0833 ns and this is the rate at which we take in
inputs. Thus, we will read from the wire whenever the high speed data in tick goes from 0 to 1. Note
that the high speed data in tick is derived from the least significant bit of the counter i.
Figure 17: High Speed Data in Tick
The full speed and low speed rates have data in ticks that are 40 times and 320 times slower
respectively and thus data will be read from the USB wire at those slower speeds. To toggle the full
speed rate data in tick, we would wait until the four least significant bits of the counter i become 0000
5 times. For achieving low speed, we would wait for the seven bits of i to become 0000000 5 times.
This component also takes metastability issues into account using a very simple method. To avoid
sampling the USB wire while it is changing, we simply reset the counter
whenever the input changes. This way, we will never take in bits if they have not been on the line for
2.0833 ns. Whenever a new bit is read from the USB wire, it is first written to a buffer and then output
from this component. Whenever a new bit is output, we signal to all the other components that we
have received a new bit and thus this is the only component that will have to deal with timings. The
rest of the components will be waiting for the data out tick to toggle and will thus know whether a new
bit has entered the system.
In addition to taking in inputs and outputting them whilst setting the data out bit to 1, this component
also checks if a no activity time out has occurred and outputs this signal to other higher level
components such as the HC.
Bit to Byte Converter
This component deals with converting the bits into bytes and sending the bytes to the Byte Analyzer
component. It monitors the output signal from the USB Read component and thus knows that a new
bit has entered whenever the data out signal becomes 1. The function of this component is to combine
every 8 bits using the NRZI decoding mechanism, forming a byte, and outputting the byte. In addition
to NRZI decoding, this component performs bit de-stuffing. The NRZI decoding is performed as
follows: when a bit is received, it is compared to the previous bit that was received. If they are
different, then a 0 is inserted into a byte, otherwise a 1 is inserted. This whole process is repeated 8
times until a byte is formed. As an example, suppose we receive 8 new bits. Note that the input can be
either a differential 0, a differential 1, a single ended 0 or a single ended 1. Suppose the input is J, J, K,
K, J, J, J, K, and assume that the input we had before those 8 inputs were received was a J.
The byte that will be output is formed as follows:
Received Bit    Byte
(initial)       00000000
J               10000000
J               11000000
K               01100000
K               10110000
J               01011000
J               10101100
J               11010110
K               01101011
Table 6: Bit to Byte Conversion
Thus the byte received is 01101011. This byte will then be analyzed by the Byte Analyzer which is the
next component that we will be discussing.
Byte Analyzer
This component is responsible for analyzing the bytes that were formed in the Bit to Byte Converter
component. It is also responsible for calculating the CRC and comparing it with the CRC that was
sent with the packet. Whenever a byte is sent to this component from the Bit to Byte Converter
component, a data out signal is set to 1. Thus, this component waits for an input by monitoring the
data out bit from the Bit to Byte Converter component. The first byte that we wait for is the start of
packet (SOP) which is 10000000 for low/full speed and a 1 followed by 31 zeros for high speed.
Note that this takes into consideration that we receive the bytes in little endian order. Thus, this
component needs to know the speed at which we are working. After the SOP is received we enter
a state where we wait for the next byte to come. When the next byte after the SOP is received, the
byte is analyzed and the PID field is checked to see if it is a special, token, handshake or data packet.
From the PID field we know what bytes to expect next. Finally after all the bytes are sent to us,
an end of packet byte (EOP) which is 00000000 for low/full speed and 1111111 for high speed will be
received. After the EOP is received, this component will signal the upper components that a full
transaction has been received and will output the data to the upper components. For example, let us
simulate receiving an ACK from a low speed device. The bytes that should be received are SOP:
10000000, ACK: 11010010, EOP: 00000000. After receiving the 11010010 which contains the PID of
the ACK, we will know that we should not expect to receive anything else since this is only an ACK
transaction and so we should expect an EOP. After receiving those three bytes, the Byte Analyzer
component will tell the HC (Host Controller) that an ACK has been received. If for example, instead of
receiving an ACK we are receiving a data packet, then the EOP field will tell us when the data stream
has ended.
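The bit-to-byte walk-through of Table 6 and the PID check made by the Byte Analyzer can be sketched together. This is a Python illustration with hypothetical function names, not the VHDL components. The PID encodings used (a 4 bit type in the low nibble, its complement in the high nibble, and the two lowest bits selecting the packet group) follow the USB 2.0 specification as we understand it, for example 00101101 for SETUP and 11010010 for ACK.

```python
def nrzi_decode_byte(symbols, previous):
    """Decode 8 line symbols into a byte; the first received bit ends up as the LSB."""
    byte = 0
    for symbol in symbols:
        bit = 1 if symbol == previous else 0    # no change -> 1, change -> 0
        byte = (byte >> 1) | (bit << 7)         # shift in from the left, as in Table 6
        previous = symbol
    return byte


def classify_pid(pid_byte):
    """Return the packet group of a PID byte, or 'invalid' if its check nibble is wrong."""
    low, high = pid_byte & 0x0F, pid_byte >> 4
    if low ^ high != 0x0F:                      # high nibble must complement the low one
        return "invalid"
    groups = {0b01: "token", 0b11: "data", 0b10: "handshake", 0b00: "special"}
    return groups[low & 0b11]


value = nrzi_decode_byte("JJKKJJJK", previous="J")
print(format(value, "08b"))                     # the example of Table 6
print(classify_pid(0b11010010))                 # ACK
```

Note that the byte of the Table 6 example is only a decoding exercise: its two nibbles are not complements of each other, so it would not be accepted as a PID.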
Detect Speed
This component detects the speed at which a connected device is sending us bits. Let us assume
that a low speed device is connected to our receiver. The first thing the device does is send a J bit (01)
for a specific amount of time (2.5 ns). After the 2.5 ns has elapsed with the J bit as an input, the rest of
the components are signaled to work at low speed. For full speed detection,
a K bit (10) has to be received for 2.5 ns. We are left with the detection of high speed which is a bit
more complex and requires some interaction between the device and the Detect Speed component.
Once a high speed device is connected, it will always connect at full speed first. That is, it will send a
K bit for 2.5 ns and thus establish a full speed connection. It will then wait to be reset. Once reset it will
send 01 for a certain amount of time. This 01 confirms that it is indeed a high speed device. After
Detect Speed receives the 01, it has the transmitter send a sequence of bits to tell the device that the
host is high speed compatible and has accepted the 01. To summarize the above procedure, this is
what happens in this component for high speed connections.
• A high speed device is connected.
• The device sends a “10” for 2.5 ns.
• The receiver commands the transmitter block to send a “00” for 2.5 ns to reset the device.
• Once the device is reset, it will send a “01” for 2.5 ns.
• Once the receiver detects the “01”, it commands the transmitter to send a sequence of bits to
acknowledge the device.
4.5 Host Controller Driver
So far, all the blocks that we described were implemented in VHDL. Concerning the software layer
that should interact with the VHDL core, we wrote C code that implements the sending of a complete
SETUP transaction. Such a transaction consists of the host sending token and data packets and
waiting for a handshake packet as a response from the device.
To test this software code, we would have to run it. It would automatically prompt the core to send
token and data packets to the USB device through the wire and then to wait for a response. Here, we
faced a problem whereby we could not link the output of the core (bits to be sent on the wire) to the
physical USB port that resides on the P160 additional module that we had planned to attach to the
Virtex VIIMB Development Board. The reason for this was that there was a physical chip that
interfaced to the USB pins. This chip was a RS232_USB Bridge Interface called Cygnal CP2101.
Figure 18: Cygnal CP2101
Thus, to use the USB physical port, we would have to send data according to the RS232 interface. We
had already completed a large part of the VHDL core that implements the USB specification and were
eager to test our system according to the USB specification, so we had to come up with an alternative
to ensure that our system implements its functionality on an FPGA.
Our alternative solution to this was to write a simple VHDL block that acts as a device. That is, it
simulates the actions of the device at the bit level. We implemented this as a Finite State Machine that
has the states shown in Figure 19.
The FSM waits until the bits relating to SOF and startup are sent by the VHDL core as if they were
being sent to an actual USB device. It moves to a state waiting to receive a setup token packet, then a
data packet. Once the device simulator is at this stage, it should send a handshake packet. We
simulated the device by hard-coding the bits that the device would send to the core if it were to send
an actual USB handshake packet. We assumed that the device was sending an ACK Handshake packet
which would complete the transaction. Once the device simulator sends the ACK Handshake packet,
the job of the FSM is complete and the Receiver part of the VHDL core takes over. The Receiver block
of the VHDL core receives these bits from the outputs of the device simulator and processes them to
figure out that an ACK response has been received. It then informs the Host Controller that the
device responded with an ACK. The Host Controller outputs this information to the software layer by
setting the Transaction Done bit to 1. This way, the user who initiated a transaction from the
software layer by writing into a few registers can be informed that the transaction was completed by
reading the value of the Transaction Done bit.
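The sequence followed by the device simulator can be sketched as a table-driven state machine. This is a Python illustration only; the state and event names are hypothetical and are not taken from Figure 19 or from the VHDL code.

```python
# (current state, event) -> next state; names are illustrative assumptions.
TRANSITIONS = {
    ("WAIT_STARTUP", "startup_done"): "WAIT_SETUP_TOKEN",
    ("WAIT_SETUP_TOKEN", "setup_token"): "WAIT_DATA",
    ("WAIT_DATA", "data_packet"): "SEND_ACK",
    ("SEND_ACK", "ack_sent"): "DONE",
}


def run_device_fsm(events, state="WAIT_STARTUP"):
    """Feed events to the FSM; events that are not expected in a state are ignored."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


final_state = run_device_fsm(
    ["startup_done", "setup_token", "data_packet", "ack_sent"]
)
```

In the `SEND_ACK` state the real block would shift out the hard-coded ACK bits; that part is left out of this sketch.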
Figure 19: FSM states
5.0 Evaluation
In order to verify the functionality of our code, we performed tests on each of the components
described in the Implementation chapter separately, then combined the whole system and
performed more comprehensive tests. In implementing our VHDL system, we worked in parallel
developing the three main components (Host Controller, Transmitter, Receiver) after we had
understood how they need to interface together, then we tested each separately as a black box and
finally we proceeded to test the system as a whole to verify its functionality. Note that all the test
cases provided below are described assuming a high speed device to avoid redundancy,
although the same cases work for low speed and full speed.
5.1 Host Controller Testing
The tests done independently on the Host Controller involve the four possible types of transactions
which are: SETUP, IN, OUT0 and OUT1. Note that all transactions are initiated by the host and that
every transaction consists of a number of packets.
Send SETUP Transaction
When the software layer requires a SETUP transaction, the Host Controller first sends a SETUP token
packet, then a data packet, the payload of which is read from the Transmit FIFO previously loaded by
the software layer. It then waits until a handshake packet is received from the device. By following the
states of the Host Controller Arbiter component (seen in a yellow box in the simulation below),
we can see that first it waits until the SETUP token packet is sent, then until the data packet is sent, and
finally until a handshake packet is received. At that point it interrupts the software layer by setting
the TransDone signal to 1 (circled in red in the simulation below).
Figure 20: SETUP Transaction
Send OUT(0/1) Transaction
When the software layer requires an OUT transaction, first a token packet with PID equal to OUT is
sent, then a DATA(0/1) packet is sent, the payload of which is read from the Transmit FIFO previously
loaded by the upper layer. The Host Controller core then waits until the device sends back a handshake
packet, at which point it interrupts the software layer with a Transmission Done interrupt signal
(TransDone). The figure below shows the states that the Host Controller Arbiter component passes
through (shown in a yellow box): it waits until the OUT token packet is sent, then it waits until the DATA0
packet is sent, and finally it waits until a handshake is received. It then sets the TransDone signal
(circled in red below). Note that, in case of an OUT0 transaction and a high speed device, if the
previously received handshake is a NYET, the host keeps sending only a PING token packet, without a
following data packet, until it receives an ACK. Then it can start any other transaction.
Figure 21: OUT (0/1) Transaction
Send IN Transaction
When the software layer requires an IN transaction, first a token packet with PID equal to IN is sent to
indicate to the device that if it has packets to send it can do so now. The Host Controller core then
waits until the device sends a data packet and, when it does, sends back a handshake packet to the
device in order to indicate that it processed the data packet. In case the host does not detect anything,
it sets the Time-Out bit, which is the 4th bit in the RxStatus signal. The software layer reads this signal,
sees that there is a Time-Out and initiates the IN transaction one more time. The figure below
deals with the second case where there is a time-out. It shows the states that the Host Controller
Arbiter component passes through (shown in a yellow box below): it waits until the IN token packet is
sent, then it waits for a DATA packet, which does not arrive. Finally the SIE Receiver detects a
time-out, and the Host Controller Arbiter sets the TransDone bit (circled in red below) and sets
the 4th bit of the RxStatus signal to 1 (shown in a red box below).
Figure 22: IN Transaction
5.2 Transmitter Testing
The tests done independently on the Transmitter will have granularity of packets as this is the unit of
transfer that the Transmitter deals with. In other words, we will provide test cases that ensure proper
transmission of packets to the USB device.
Writing bits on the USB wire at the correct speed
The figure below is a screenshot of the Transmitter sending a Token Packet (SETUP). From the figure,
we see that a bit is written every 2.084 ns. This verifies that our core can write at a speed of 1 /
2.084 ns ≈ 480 Mbps. This is the rate at which high speed signaling occurs.
Figure 23: Writing to the USB wire at High Speed
In order to test the functionality of the Transmitter, we will display tests that were performed to send a
typical transaction to the USB device. As we explain each case, we will highlight how the details of the
protocol were tested.
Sending a Token Packet (SETUP):
The only inputs to the transmitter required to send a SETUP Token packet are the packet id, the
address and endpoint of the targeted device. All the steps described below are carried out by the
transmitter in the mentioned sequence.
1-Transmitter sends IDLE state
The start of a packet transmission requires the USB bus to be in an idle state. Therefore, prior to
sending a packet, the Transmitter sends an Idle state. In the figure above, we see the output signal
USBwirectrlout becoming one for the first time. This signifies that a bit is being written onto the USB
wire. When we check the value of the corresponding bit, we see that USBwiredataout is a 00 (Single
Ended Zero). This signifies an IDLE state on the bus and is necessary before we send a SYNC
pattern in the next step.
Figure 24: Idle State
2- Transmitter sends SYNCHRONIZATION (SYNC)
In the USB protocol, the host and devices do not share a clock. Thus, the device cannot identify when
the host will send a transition that signals the beginning of a new packet. A single transition is not
sufficient to synchronize the receiver for the duration of a packet. Therefore, every packet has to begin
with a SYNC field to enable the device to align, or synchronize, its clock to the transmitted data. For
high speed devices, the host must send a SYNC pattern that is 4 bytes: {1 and 31 zeros} encoded
according to NRZI as fifteen KJ successions, and then a KK. The alternating Ks and Js provide the
transitions for synchronizing, and the last two Ks mark the end of the field.
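The SYNC pattern can be reproduced from the NRZI rule alone. The sketch below is a Python illustration with hypothetical names. It assumes that the 31 zeros are sent first and the single 1 last, and that the encoder starts from a J state; both are assumptions of this sketch.

```python
def nrzi_encode(bits, start):
    """Encode bits as J/K symbols: a 0 toggles the line, a 1 leaves it unchanged."""
    state, out = start, []
    for bit in bits:
        if bit == 0:
            state = "K" if state == "J" else "J"
        out.append(state)
    return "".join(out)


# High speed SYNC: 31 zeros followed by a single 1 (32 bits, 4 bytes).
sync_bits = [0] * 31 + [1]
sync = nrzi_encode(sync_bits, start="J")
print(sync)
```

Each 0 produces a transition, which gives the alternating K and J symbols; the final 1 repeats the last K and so produces the closing KK.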
Figure 25: SYNC Bytes 1 and 2
Figure 26: SYNC Byte 3 and 4
In the figures above, we illustrate the HS SYNC pattern being sent. As stated earlier, it is fifteen KJ
successions, and then a KK. Note that when transmission is at high speed, J = 2 (binary 10) and
K = 1 (binary 01). Therefore a 1 on USBWiredataOut highlighted in the figures above is a K and
similarly a 2 is a J.
In the first figure, we can see that the first 2 bytes of the sync field are sent. In the second, we see the
last two bytes. Note that every switch from a K to a J helps the receiver synchronize. The last 2
symbols sent are 1, 1 (KK), which indicates the end of the SYNC field. These 2 symbols are circled in
yellow.
3- Transmitter sends a Token Packet of type SETUP
Figure 27: Token Packet
This is the information in the packet that we input to the Transmitter:
PID=00101101, ADDR=00000000, ENDP=00000000, CRC5= to be calculated
Figure 28: Setup Token Packet
Note that we first have the 3 SYNC bytes (00), then the last SYNC byte (80), then the token PID (2d),
then the ADDR + 1 bit of ENDP (00), then the remaining 3 bits of ENDP and the CRC5.
As we can see in the figure above, this sequence of bytes passes through the stages of CRC
calculation, bit stuffing and NRZI encoding, and results in bits on the wire.
Note that, although PID=00101101 where MSB=0 and LSB=1, bits need to be sent out on the bus in
little endian order, as specified by the USB protocol. That is, the LSB of a byte is sent out first,
followed by the next LSB and through to the MSB.
Figure 29: Little Endian
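In the figure above, we can follow how the PID (00101101) bits are encoded and sent. Please note on
the figure how a J (10) and a K (01) are represented. Since the least significant bit of the byte is sent
first, the bits go out in the order 10110100. With K as the preceding state on the wire, NRZI encoding
produces:
10110100
K->KJJJKKJK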
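The mapping above can be reproduced with a few lines of C. This is a minimal sketch under the usual NRZI rule (a 0 toggles the line state, a 1 keeps it), with bits taken least significant first and bit stuffing left out; nrzi_encode_byte is a hypothetical helper, not part of our core.

```c
#include <stdint.h>

/* NRZI-encode one byte, LSB first. 'state' holds the last line state ('J' or 'K')
   and is updated in place; 'out' receives 8 symbols plus a terminating NUL. */
static void nrzi_encode_byte(uint8_t byte, char *state, char out[9])
{
    for (int i = 0; i < 8; i++) {
        if (((byte >> i) & 1u) == 0)          /* a 0 bit causes a transition */
            *state = (*state == 'J') ? 'K' : 'J';
        out[i] = *state;                      /* a 1 bit keeps the state */
    }
    out[8] = '\0';
}
```

Starting from K, the SETUP PID 0x2D comes out as KJJJKKJK; starting from J, the last SYNC byte 0x80 comes out as KJKJKJKK.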
4-Transmitter sends a HS EOP (End of Packet)
In high-speed signaling, a sequence that would generate a bit stuff error at the receiver device is
intentionally sent to indicate EOP. For almost all high-speed packets the End of High-speed Packet is
an encoded byte of 01111111, without bit stuffing. If the preceding bit was a J, the End of High-speed
Packet is KKKKKKKK. The initial 0 causes the first bit to be a change of state from J to K, and the
following 1s mean that the rest of the bits don't change. If the preceding bit was a K, the End of High-
speed Packet is JJJJJJJJ. The initial 0 causes the first bit to be a change of state from K to J, and the
following 1s mean that the rest of the bits don't change. In either case, a sequence of seven bits
without a transition causes a bit stuff error.
When all fields of the token packet are sent, a HS EOP pattern must be sent. We can observe the
behavior described above in the simulation shown in the figure below. When the packet has been sent,
a signal called HSEOP, which has been 0 all along, becomes 1. This causes a sequence of 8 data states
on the wire that are opposite to the last data state sent by the last field of the packet. In the figure,
the last bit was a J (2), so we can see a sequence of 8 Ks sent consecutively to signal the end of the
packet. This is highlighted in the white box.
Figure 30: End Of Packet
We have now tested the correct transmission of a token packet. Next, we will describe that of a Data
Packet as it is a bit different.
Sending a DATA packet:
The Idle state and the sync byte patterns that exist before a packet is sent are identical for all packets
sent, so we will skip the testing of these stages and directly start discussing the fields of the DATA
packet.
This is the information in the packet that we wish to send: PID=11000011 , DATA(1st byte)= 11110000,
DATA(2nd byte)=00001111, CRC16 (1st byte) and CRC16 (2nd byte)=to be calculated
Figure 31: Data Packet
As we can see in the figure above, the PID, then DATA(1st byte), DATA(2nd byte), CRC16(1st byte)
and CRC16(2nd byte) are sent out in succession. This completes the DATA packet.
In order to ensure adequate signal transitions, bit stuffing is employed by the transmitting device when
sending a packet on USB. The rule for bit stuffing was described earlier in the Implementation chapter.
In the figure below, the two bytes highlighted in red are sent least significant bit first. The first byte,
11110000, therefore ends on the wire with four 1s and the second byte, 00001111, begins with four 1s,
giving 8 consecutive one bits. Since this exceeds the limit of six consecutive ones, the txonecount
signal shown below is asserted and a new 0 bit is stuffed and encoded. The bit in purple is the stuffed
bit; the ones in orange are the 2+6=8 encoded bits of the second byte (11110000 in transmission order).
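As a rough illustration of this case, the sketch below applies the standard USB rule (a 0 is inserted after six consecutive 1s) to a byte stream sent least significant bit first. It is a software model only; bit_stuff is a hypothetical name, and the output is kept one bit per array element for readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Serialize 'len' bytes LSB first into 'out' (one bit per element), inserting a 0
   after every run of six 1s. Returns the number of bits written; 'out' must be
   able to hold len * 8 + (len * 8) / 6 elements. */
static size_t bit_stuff(const uint8_t *data, size_t len, uint8_t *out)
{
    size_t n = 0;
    int ones = 0;                              /* length of the current run of 1s */
    for (size_t i = 0; i < len; i++) {
        for (int b = 0; b < 8; b++) {
            uint8_t bit = (data[i] >> b) & 1u;
            out[n++] = bit;
            ones = bit ? ones + 1 : 0;
            if (ones == 6) {                   /* six 1s in a row: stuff a 0 */
                out[n++] = 0;
                ones = 0;
            }
        }
    }
    return n;
}
```

For the two bytes 11110000 and 00001111, this model produces 17 bits, with the stuffed 0 in the eleventh position, right after the first two 1s of the second byte.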
Figure 32: Bit Stuffing
The EOP pattern that exists after a packet is sent is identical for all packets, so we will not repeat
the details again.
Sending a HANDSHAKE packet:
The Idle state and the sync byte patterns that exist before a packet is sent are identical for all packets
sent, so we will directly start discussing the fields of the HANDSHAKE packet.
This is the information in the packet that we need to send for an ACK handshake:
PID=11010010
Figure 33: Handshake Packet
As we can see in the figure above, the PID is sent, and since it is the only field of a handshake packet,
this completes the packet. The EOP pattern that exists after a packet is sent is identical for all
packets, so we will not repeat the details again.
5.3 Receiver Testing
The tests done independently on the receiver operate at the granularity of bits, which are converted to
bytes at the output. In other words, we will provide test cases that ensure proper reception of packets
from the USB device and proper sending of bytes to the HC.
High Speed Detection
The first component to test in the receiver is the Detect Speed component, since this component will
notify all other components as to what speed they should be working at. If this component
malfunctions, then all the other components will be working at the wrong speed. Below is the test
waveform that shows that a high speed device has been connected.
In the figure above, the input rxwiredatain is what we are receiving from the USB wire. Connectstate,
highlighted in red above, is the output of this component. The values 00, 01, 10 and 11 correspond to
low speed, full speed, high speed and disconnected respectively. The first thing we notice above is that
connectstate changes from 11 to 10, which means that we have detected a high speed
device. We can see that the resetdevice signal, highlighted in purple, goes high upon receiving the 2
input. This reset signal forces the transmitter to send an SE0 to reset the device. This forces the
device to respond and tell us whether it is high speed or not. If the device is high speed, then it will
reply with a 01. Upon receiving the 01 input the sendjkjkjk and outputs are set to 1. This signal will
inform the receiver to send the acknowledgment sequence to the device and thus complete the
process of detecting high speed.
Figure 34: High Speed Detection Waveform
Receiving Bits
The USB Read component is the component that reads the input from the USB wire and handles the
timing. Below is a waveform that shows the timings.
In the figure above, Rxbitsin is the input read from the D+ and D- lines on the USB wire and is
highlighted in red on the waveform. The highspeedtick signal directly below the highlighted bits in red
sets the rate at which we take in the input. At every rising edge of the highspeedtick signal, we take in
a new input. The highspeedtick signal has a period of 2.0833 ns, and if the USB device holds each bit
for this amount of time, then we are guaranteed not to miss any of the bits, since each of them will
see a rising edge of highspeedtick. The bits highlighted in yellow are the outputs and will be
given to the Byte Analyzer component. We notice here that the output is 1 even though we are reading
new inputs from the wire. This is because, due to the 64 buffers present, there is a delay in outputting
the data. We also notice that fullspeedrate is 2, which means that, as required, we are reading data
at a high speed rate.
Processing the bits
In this test we will be receiving the byte: 10000000 which is part of the start of packet for high speed.
The USB device will send the 10000000 starting from the least significant bit. Thus we will first detect
seven 0’s and then one 1.
Figure 35: Receiving Bits at High Speed Waveform
In the figure above, the bits highlighted in red are the inputs that are coming from the USB read
component. We see that the inputs are 01, 10, 01, 10, 01, 10, 01, and 01. The bit count highlighted in
purple shows us which bit number we are at before we form the byte. The bits highlighted in yellow
represent the bytes that will be sent to the Byte Analyzer component. Notice that after we receive the
two 01’s, the byte being formed becomes an 80 which is 10000000. Thus we have successfully
received a byte that will be sent to the Byte Analyzer.
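The byte formation shown in the waveform can be modelled in software as follows. This is a sketch under stated assumptions: line states are given as the 2-bit codes used in the figures (1 for K, 2 for J), the state before the first symbol is J, NRZI decoding maps a transition to 0 and no transition to 1, and bits are shifted in least significant bit first. The name nrzi_decode_byte is hypothetical.

```c
#include <stdint.h>

enum { STATE_K = 1, STATE_J = 2 };   /* line-state codes as shown on the waveforms */

/* Decode 8 line states into one byte. 'prev' is the line state seen just before
   the first symbol and is updated so that consecutive bytes can be chained. */
static uint8_t nrzi_decode_byte(const uint8_t symbols[8], uint8_t *prev)
{
    uint8_t byte = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t bit = (symbols[i] == *prev) ? 1u : 0u;  /* no transition -> 1 */
        byte |= (uint8_t)(bit << i);                    /* first bit received is the LSB */
        *prev = symbols[i];
    }
    return byte;
}
```

Feeding it the sequence 01, 10, 01, 10, 01, 10, 01, 01 with J as the preceding state yields 0x80, matching the byte formed in the figure.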
Processing the Bytes
1-Receiving a Handshake Packet (ACK)
Here we will test the component that processes the bytes and forwards the information to the HC
(Host Controller). Below is a waveform showing an ACK handshake packet being received. The ACK
packet is made up of the following parts: SOP, PID, EOP where the SOP is 80,00,00,00 hex, the ACK
PID is D2 hex, and the EOP is FF.
Figure 36: Forming a Byte Waveform
In the figure above, highlighted in red we see that the Bit to Byte converter component signals the
Byte Analyzer component when it is giving it inputs by setting the processrxdatainwen bit to 1. Here
we see the inputs highlighted in yellow that are 80, 00, 00, 00, d2. Highlighted in purple, we see that,
after the byte has been analyzed, the ackrxed signal has been set to one so that higher level
components know that the ACK has just been received.
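One check that a byte analyzer can apply before raising a signal such as ackrxed is the PID self-check defined by the USB specification: the upper four bits of a PID byte are the bitwise complement of the lower four. The sketch below is an illustration in C rather than our VHDL, and pid_is_valid is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* A PID byte is well formed when its high nibble is the complement of its low nibble. */
static bool pid_is_valid(uint8_t pid)
{
    return (uint8_t)((pid >> 4) ^ (pid & 0x0Fu)) == 0x0Fu;
}
```

The three PIDs used in this chapter, 2D (SETUP), C3 (DATA) and D2 (ACK), all pass this check, whereas a single flipped bit, as in D3, does not.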
2-Receiving a DATA packet
In this test, we will be receiving a DATA packet that has two bytes in the data payload, which are
11110000 and 00110000. Before receiving the data we need to receive the PID telling us that this is
in fact a data packet. After receiving the data we should expect to receive the CRC and the EOP.
Below is the list of bytes that we should receive to complete a Data packet.
Field      Byte (hex)
SOP        80
PID        C3
Byte1      F0
Byte2      30
CRCByte1   BA
CRCByte2   5B
Figure 37: Processing Bytes that represent an ACK Waveform
Table 7: Bytes input into the Byte Analyzer component
In the figure above, highlighted in red we see the bytes that we are receiving. Note that these bytes
are coming from the Bit to Byte Converter component. We see that we are receiving the correct bytes
that are present in the table above. The CRC being calculated is highlighted in yellow and we can see
that the CRC error does not become 1 and thus the CRC that was calculated is correct. The output
from this component is highlighted in purple and will be sent to upper level components, specifically
the HC. There are three things that should be looked at in the area highlighted in purple: rxdataout,
rxcontrolout and rxdataoutwen. We see that the rxdataoutwen becomes high 6 times and so we output
data 6 times.
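The CRC bytes expected for this packet can be cross-checked in software. The sketch below assumes the standard USB data CRC (generator x^16 + x^15 + x^2 + 1, register preset to all ones, result inverted, bits processed least significant first); usb_crc16 is a hypothetical helper and is not taken from our VHDL.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC16 over a data payload; 0xA001 is the bit-reversed form of the generator 0x8005. */
static uint16_t usb_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;                    /* register preset to all ones */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u) : (uint16_t)(crc >> 1);
    }
    return (uint16_t)~crc;                    /* the CRC is sent inverted, low byte first */
}
```

For the payload F0 30 this returns 0x5BBA, that is, BA followed by 5B on the wire, which agrees with the CRC bytes listed for this test.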
The outputs are shown in the table below:
DataOut   ControlOut
0         Rx_packet_start ( 0 )
c3        Rx_packet_stream ( 1 )
f0        Rx_packet_stream ( 1 )
30        Rx_packet_stream ( 1 )
ba        Rx_packet_stream ( 1 )
5b        Rx_packet_stream ( 1 )
0         Rx_packet_stop ( 2 )
After the rx_packet_stop has been output to the upper layer, the transaction is complete and the
bytes have been successfully sent to the HC (Host Controller), which will use this information.
Figure 38: Processing Bytes that represent Data Waveform
Table 8: Processing Bytes that represent Data
5.4 Testing the USB Core on the FPGA
After testing the VHDL USB core, we proceeded with downloading the system on the FPGA and
testing it using the C code. We first had to add our VHDL USB core as a component in our hardware
system.
Note that an IP core for USB is not available in the EDK peripheral libraries; therefore after having
designed it ourselves, we had to import it into our project in XPS in order to be able to use it. To
achieve this target, we used the Create and Import Peripheral Wizard that guided us through the
design flow.
Within this wizard, we added our core peripheral as a slave device on the On-chip
Peripheral Bus (OPB), which is attached to the Microblaze soft-core processor on the FPGA.
Normally, our IP core would need an interface compliant with the OPB bus protocol.
However, EDK provides the Intellectual-Property Interface (IPIF) library, which offers a simplified bus
protocol called IP interconnect that is much easier to use than operating on the bus directly with
the OPB protocol.
Moreover, the Create and Import Peripheral wizard generates templates that take care of all the OPB
bus interface protocol and connection between IPIF and our code. In fact, this wizard generates 2 files
(among many others) in the pcores subdirectory under the project directory, one peripheral top-level
file which we don’t modify, and another called user-logic. In the user-logic file (VHDL), we added the
top-level wrapper of our USB core as a component and we port mapped each input and output of this
wrapper to a register. These registers are different from the ones described in the Design and Analysis
chapter. There are 11 of them, corresponding to address_i, data_i, rst, we_i,
strobe_i, data_o, HostResumeIntOut, HostTransDoneIntOut, HostConnEventIntOut,
HostSOFSentIntOut and USB Speed. Note that, compared to the block diagram,
registers for clk, USBWireDataOut and USBWireDataIn are missing. The reason is that we connected
the clk directly to the system clock running at 100 MHz. As for the USBWireDataOut and
USBWireDataIn signals, we omitted these because we used the device simulator, which was
implemented in VHDL. Thus, these signals do not interface to the physical USB port; instead, they
are connected within the VHDL core to the device simulator.
The last step was to generate a file with an extension .pao (peripheral analysis order file) in which
HDL Analysis Information is found (dependent library files and HDL source files to compile the
peripheral, as well as corresponding logical libraries those files will be compiled into). At
After having added the IP core, in order to interact with the VHDL core from the software layer, all the
C code has to do is to write to and read from registers described above. This will enable it to send
input data to the core and read output data from the core. These registers are located within the
address space assigned to the core. The functions used to read and write to these registers in C code
are fairly simple.
In fact, the above mentioned wizard generates a C header file called HIGH_SPEED_USB_CORE.h (in
accordance with our core which is called HIGH_SPEED_USB_CORE) in which one can find the
functions used to read and write to the registers. This header provides many functions to choose from
in order to read and write to the registers. A prototype of the functions we chose is:
HIGH_SPEED_USB_CORE_mWriteSlaveRegX(BaseAddress, Value)
HIGH_SPEED_USB_CORE_mReadSlaveRegX(BaseAddress)
Where
X: The number of the register we would like to read from or write to.
BaseAddress: The base address of the address space assigned to the core.
Value: The value we would like to write to register X.
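To show how these macros can be combined into a single access to one of the core's internal registers, here is a hedged sketch. It follows the register assignment used in our driver (slave register 1 carries address_i, 2 carries data_i, 3 carries we_i, 4 carries strobe_i and 5 returns data_o). So that the fragment can be compiled on a PC, the generated macros are replaced by stand-ins that access a plain array; the stand-ins, usb_core_write and usb_core_read are hypothetical names, and on the board the real macros from HIGH_SPEED_USB_CORE.h would be used instead.

```c
#include <stdint.h>

/* Stand-ins for the generated slave-register macros: a plain array instead of the
   OPB address space. Hypothetical, for off-target compilation only. */
static uint32_t slave_reg[11];
#define WRITE_SLAVE_REG(n, value) (slave_reg[(n)] = (value))
#define READ_SLAVE_REG(n)         (slave_reg[(n)])

enum { REG_RST = 0, REG_ADDRESS = 1, REG_DATA_IN = 2,
       REG_WE = 3, REG_STROBE = 4, REG_DATA_OUT = 5 };

/* Select an internal register of the core and present a value to it. */
static void usb_core_write(uint32_t address, uint32_t value)
{
    WRITE_SLAVE_REG(REG_WE, 1);              /* we_i = 1, as the driver does before writing */
    WRITE_SLAVE_REG(REG_STROBE, 1);          /* strobe_i = 1, likewise */
    WRITE_SLAVE_REG(REG_ADDRESS, address);   /* address_i selects the internal register */
    WRITE_SLAVE_REG(REG_DATA_IN, value);     /* data_i carries the value */
}

/* Read back whatever the core presents on data_o. */
static uint32_t usb_core_read(void)
{
    return READ_SLAVE_REG(REG_DATA_OUT);
}
```

With such a helper, writing 15 to INTERRUPT_MASK (address 9) becomes usb_core_write(9, 15), followed by usb_core_read() to check data_o.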
Our C code configures the USB Core before requesting that a transaction begins. To do this, it
assigns appropriate values to the inputs of the USB Core and consequently to the registers to which
these inputs are assigned.
It is composed of 3 files: main.c, driver.c and HIGH_SPEED_USB_CORE.h. The last file is generated
by the wizard and contains a list of functions we can choose from to write and read to registers (as
mentioned above), whereas the other 2 files were written by us and are as follows:
main.c
The file main.c simply calls the function HIGH_SPEED_USB_CORE_SETUP_TRANSACTION() which
is located in the driver.c file. It provides a higher level of abstraction to the user; the user will just have
to call a function without having to deal with writing to registers at the bit level granularity.
#include "xbasic_types.h"
#include "xstatus.h"
#include "xparameters.h"
#include "xio.h"
#include "xuartlite_l.h"
#include "xuartlite.h"
#include "stdio.h"
#include "High_Speed_USB_Core.h"

#define BASEADDR 0x77400000

/* Defined in driver.c */
int HIGH_SPEED_USB_CORE_SETUP_TRANSACTION(void);

int main(void)
{
    print("-- Entering Main() --\r\n");
    print("-- Call the function that starts a setup transaction --\r\n");
    HIGH_SPEED_USB_CORE_SETUP_TRANSACTION();
    return 0;
}

driver.c

#include "xbasic_types.h"
#include "xstatus.h"
#include "xparameters.h"
#include "xio.h"
#include "xuartlite_l.h"
#include "xuartlite.h"
#include "stdio.h"
#include "signal.h"
#include "High_Speed_USB_Core.h"

#define baseaddr 0x77400000

/* Slave registers: 0 = rst_i, 1 = address_i, 2 = data_i, 3 = we_i, 4 = strobe_i, 5 = data_o */
int HIGH_SPEED_USB_CORE_SETUP_TRANSACTION(void)
{
    Xuint32 Reg32Value;

    xil_printf("**************************************************************\n\r ");
    xil_printf("First reset all the components\n\r ");
    // rst_i = 1
    HIGH_SPEED_USB_CORE_mWriteSlaveReg0(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg0(baseaddr);
    xil_printf(" - wrote %d to rst_i\n\r", Reg32Value);

    // address_i = x"34"
    xil_printf(" Set the address equal to that of the Transmit Fifo\n\r ");
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 52);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);

    // data_i = 1
    xil_printf(" Set the data to be sent to the Transmit Fifo equal to 1\n\r, so as to delete all data in the fifo\n\r");
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i\n\r", Reg32Value);

    // we_i = 1
    HIGH_SPEED_USB_CORE_mWriteSlaveReg3(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg3(baseaddr);
    xil_printf(" - wrote %d to we_i \n\r", Reg32Value);

    // strobe_i = 1
    HIGH_SPEED_USB_CORE_mWriteSlaveReg4(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg4(baseaddr);
    xil_printf(" - wrote %d to strobe_i\n\r", Reg32Value);

    // address_i = x"24"
    xil_printf(" Set the address equal to that of the Receive Fifo \n\r, so as to delete all data in the fifo\n\r");
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 36);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i \n\r", Reg32Value);
    // data_o
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    // rst_i = 0
    HIGH_SPEED_USB_CORE_mWriteSlaveReg0(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg0(baseaddr);
    xil_printf(" - wrote %d to rst_i\n\r", Reg32Value);

    xil_printf(" Write 0 to the TRANSREQ_PREEN_SOFSYNC -> No transaction required at present time\n\r");
    // address_i = 0
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    // data_i = 0 => no transaction required
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i\n\r", Reg32Value);

    xil_printf("Set the transaction type equal to SETUP\n\r");
    // address_i = 7 => TRANSACTION_TYPE
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 7);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    // data_i = 0 => Setup transaction
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i\n\r", Reg32Value);
    xil_printf("If it has processed the incoming data, it sends it back as output\n\r");
    // data_o (read here through slave register 6)
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg6(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 0 to the DEVICE_ADDRESS\n\r");
    // address_i = 5 => DEVICE_ADDRESS
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 5);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    // data_i = 0
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    // data_o
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 0 to the ENDPOINT_ADDRESS\n\r");
    // address_i = 6 => ENDPOINT_ADDRESS
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 6);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    // data_o
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 1111 to the INTERRUPT_MASK\n\r");
    // address_i = 9 => INTERRUPT_MASK
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 9);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    // data_i = 15
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 15);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    // data_o
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    // address_i = 12
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 12);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    // xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    // data_i = 1010000
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 80);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    // xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    // data_o
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    // xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 1 to the SOF_ENABLE\n\r");
    // address_i = 1 => SOF_ENABLE
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    // data_i = 1
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    // data_o
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 11110000 to the TX_FIFO_DATA\n\r");
    // address_i = 48 => TX_FIFO_DATA
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 48);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    // data_i = 11110000
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 240);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    // data_o
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    // xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 00001111 to the TX_FIFO_DATA\n\r");
    // data_i = 00001111
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 15);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);
    // data_o
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr);
    // xil_printf(" - read %d from data_o\n\r", Reg32Value);

    xil_printf("Write 1 to the TRANSREQ_PREEN_SOFSYNC\n\r");
    // address_i = 0 => TRANSREQ_PREEN_SOFSYNC
    HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);
    xil_printf(" - wrote %d to address_i b\n\r", Reg32Value);
    // data_i = 1 => start the transaction
    HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 1);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr);
    xil_printf(" - wrote %d to data_i b\n\r", Reg32Value);

    xil_printf("Write 0 to we_i\n\r");
    // we_i = 0
    HIGH_SPEED_USB_CORE_mWriteSlaveReg3(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg3(baseaddr);
    xil_printf(" - wrote %d to we_i \n\r", Reg32Value);

    xil_printf("Write 0 to strobe_i\n\r");
    // strobe_i = 0
    HIGH_SPEED_USB_CORE_mWriteSlaveReg4(baseaddr, 0);
    Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg4(baseaddr);
    xil_printf(" - wrote %d to strobe_i \n\r", Reg32Value);

    // Poll slave register 9 until the Transaction Done interrupt bit is set
    Reg32Value = 0;
    xil_printf("Wait till the transaction is done\n\r ");
    while (Reg32Value == 0) {
        Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg9(baseaddr);
    }
    xil_printf(" - read %d from Transaction Done interrupt bit\n\r", Reg32Value);
    xil_printf("Transaction completed with no errors ! ");
    return 0;
}
The C code works as follows. It first sends some configuration information to the USB core, such as
enabling automatic transmission of Start-Of-Frame packets and setting the address of the device and
its endpoint. It also loads the Transmit FIFO with the data to be sent as part of a data packet, then
specifies the type of transaction required (in this case a setup transaction) and when exactly the
transaction should start.
Note that every time we write values to the address (address_i) and data (data_i) registers, we can
check the value in data_o, which should be equal to data_i if the VHDL has processed the address
and data correctly, meaning it is properly configured before the transaction starts.
Finally, we wait for the USB Core to set the Transaction Done bit, meaning that the transaction has
been completed. Below is the output we get on the HyperTerminal window on the computer screen.
This terminal is attached to the COM1 port, to which the FPGA sends data.
-- Entering Main() --
-- Call the function that starts a setup transaction --
**************************************************************
First reset all the components
 - wrote 1 to rst_i
Set the address equal to that of the Transmit Fifo
 - wrote 52 to address_i b
Set the data to be sent to the Transmit Fifo equal to 1
, so as to delete all data in the fifo
 - wrote 1 to data_i
 - wrote 1 to we_i
 - wrote 1 to strobe_i
Set the address equal to that of the Receive Fifo
, so as to delete all data in the fifo
 - wrote 36 to address_i
 - read 0 from data_o
 - wrote 0 to rst_i
Write 0 to the TRANSREQ_PREEN_SOFSYNC -> No transaction required at present time
 - wrote 0 to address_i b
 - wrote 0 to data_i
Set the transaction type equal to SETUP
 - wrote 7 to address_i b
 - wrote 0 to data_i
If it has processed the incoming data, it sends it back as output
 - read 0 from data_o
Write 0 to the DEVICE_ADDRESS
 - wrote 5 to address_i b
 - wrote 0 to data_i b
 - read 0 from data_o
Write 0 to the ENDPOINT_ADDRESS
 - wrote 6 to address_i b
 - read 0 from data_o
Write 1111 to the INTERRUPT_MASK
 - wrote 9 to address_i b
 - wrote 15 to data_i b
 - read 15 from data_o
Write 1 to the SOF_ENABLE
 - wrote 1 to address_i b
 - wrote 1 to data_i b
 - read 1 from data_o
Write 11110000 to the TX_FIFO_DATA
 - wrote 48 to address_i b
 - wrote 240 to data_i b
Write 00001111 to the TX_FIFO_DATA
 - wrote 15 to data_i b
Write 1 to the TRANSREQ_PREEN_SOFSYNC
 - wrote 0 to address_i b
 - wrote 1 to data_i b
Write 0 to we_i
 - wrote 0 to we_i
Wait till the transaction is done
 - read 1 from Transaction Done interrupt bit
Transaction completed with no errors !
We wrote the C code for the test case described above. As for the other transaction cases that we
had tested in simulation, we did not implement them because the FSM of the device simulator would
have to change for each case, since it deals with information at the bit level. For example, for an IN
transaction, if the device wishes to send a token packet followed by a data packet that has 10 bytes in
its data field, then we would need to simulate the sending of around 1000 bits to run the test case.
Note also that these bits need to be 100% correct or else the VHDL core would not work. For example,
if one mistake is made in a bit of the PID field and the PID is invalid, then the whole test case fails. In
any case, since the system worked on the FPGA for the case we tested, and it worked for the remaining
three cases of transactions in the VHDL simulator, it is expected to work on the FPGA for all other
cases.
6.0 Conclusion
6.1 Difficulties Faced
Throughout our work on the Final Year Project we faced many difficulties and problems, some of
which were mentioned throughout the report. The table below provides a summary of these along with
the alternative solutions we found to overcome them:
Difficulty: Understanding and implementing the whole USB protocol.
Alternative: We implemented the parts of the USB protocol that were relevant to our project. In
many cases we did not implement parts of the specification. An example is support of split
transactions that are required in isochronous transfers.
Difficulty: We had planned to link our core to the physical USB port on the FPGA, but after having
implemented most of the VHDL design, we noticed that the USB port has a parallel interface instead
of a serial interface; as a result we could not test our code with an actual physical device.
Alternative: We simulated a USB device as part of a SETUP transaction required by the software
layer; we implemented a finite state machine which, upon correctly receiving all the bits it should
receive in the scope of a SETUP transaction, responds with an ACK.
Difficulty: The maximum frequency at which the internal FPGA clock runs is 100 MHz, whereas our
High Speed Host Controller Core needs to run at 960 MHz.
Alternative: We had to readjust our core to 100 MHz to be able to make it work on the FPGA. This
meant that signaling to the device simulator was slower than required by the USB specification.
Difficulty: Regulating the timing between commands in the C code at the software layer, because
both the USB Core and its test bench are very sensitive to timing issues.
Alternative: By testing with several timing patterns on the FPGA, we managed to get the correct
timing.
Table 9: Difficulties
Other than the problems listed in the table above, one of the major problems that we faced while
making the core low, full and high speed compatible is that the high speed was too fast for us to
handle. The high speed frequency is 960 MHz whereas the full speed is 12 MHz. Our receiver, which
is the component responsible for reading the USB wire, can only handle a certain speed. Let us look
at a waveform to be able to understand the situation.
Highlighted in red is the rate at which we take in bits. Highlighted in yellow is the rate at which we
output the bits. Clearly, we are taking in bits at a faster rate than we are outputting them. The reason
for the slow output is that the bits have to go through 3 machine states before they
are output, and thus we can output one bit every 3 rising edges of the main clock. This problem can be
approached using two different methods. The first method was adding buffers. We tried several
numbers of buffers and eventually chose to add 64 buffers. Let us do the analysis: we take in one bit
every 2.0833 ns and output one bit every 3.125 ns. With those rates, data that has been written to a
buffer and has not yet been output can be overwritten at a later stage if the buffers become full, and
thus the old data will be lost.
Below is a table that shows us the time at which we take in inputs, output them and the number of
buffers that are in use.
Figure 39: High Speed Rate Problem
Time(ns)   Take in new input   Output   buffercounter
2.0833     Here                         1
3.125                          Here     0
4.1666     Here                         1
6.2499     Here                         2
6.25                           Here     1
8.3332     Here                         2
9.375                          Here     1
10.4165    Here                         2
12.4998    Here                         3
12.5                           Here     2
14.5831    Here                         3
15.625                         Here     2
16.6664    Here                         3
18.7497    Here                         4
18.75                          Here     3
20.833     Here                         4
As we can see from the table above, the buffers are being filled up fast and it will not take long until we
start overwriting data. If we have 64 buffers, then the maximum number of sequential bits X that we can
receive is found by solving
X - X (2.0833)/3.125 = 64 => X = 192 inputs, which is obviously not enough since we must receive
thousands of bits in sequence. Thus we should either increase the number of buffers or look for other
solutions. Increasing the number of buffers would increase the area used on the FPGA, the power
consumption and so on, and is not considered a good solution. We thus analyze another solution,
which tries to make the component output data at a faster rate.
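The point at which the buffers overflow can also be checked with a small simulation. The sketch below is a model, not our VHDL: it counts time in periods of the 960 MHz main clock, so that a bit is taken in every 2 ticks (2.0833 ns) and a bit is output every 3 ticks (3.125 ns), and an input that coincides with an output is counted first, as in the timing table. The function name first_lost_input is hypothetical.

```c
/* Returns the index (1-based) of the first input bit that arrives while all
   'buffers' entries are still occupied, i.e. the first bit that overwrites data. */
static long first_lost_input(int buffers)
{
    int  used = 0;                    /* buffers currently holding unsent bits */
    long inputs = 0;                  /* bits taken in so far */
    for (long tick = 1; ; tick++) {
        if (tick % 2 == 0) {          /* a new bit arrives every 2 ticks */
            inputs++;
            if (used == buffers)
                return inputs;        /* no free buffer: old data is lost */
            used++;
        }
        if (tick % 3 == 0 && used > 0)
            used--;                   /* one bit leaves every 3 ticks */
    }
}
```

For 64 buffers the model reports that the 192nd consecutive input bit finds every buffer occupied, which is far short of the thousands of bits that must be received in sequence.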
The second approach actually speeds up the rate at which we output the results. Since we are
constrained by the fact that we have to go through three machine states before we can output, the best
solution would be to make it possible to move from one state to the next on both the rising edge and the
falling edge, so that it takes less time to output the data. The output period would then go down
from 3.125 ns to 1.5625 ns, and thus we would be able to output at a faster rate than our input. Buffers
would not be necessary in this solution.
Table 10: High Speed Rate Problem
6.2 Future Work
Our USB Core functions properly concerning the main deliverables, but it still has room for
improvement. We were not able to attempt the following suggestions due to time constraints.
Future work may involve the following:
• Implementing the whole USB protocol with all its details.
• Finding a way to have the internal clock of the FPGA equal to the USB clock needed for high
speed (960 MHz). For example, this could be achieved by working on a faster processor on
the FPGA.
• Implementing a USB Device Core and trying to attach our USB Host Controller Core to it, or
as a second option, designing a simple board with only a USB physical port on it, which can
be attached to the Virtex development board through the P160 expansion slot.
• Developing the 3 remaining transaction cases in C code and generating libraries that abstract
the Host Controller Driver Layer and implement the layers above it.
6.3 Design Constraints
FPGAs give us flexibility at the cost of performance. Since the FPGAs were already available in the AUB
labs, economic constraints do not apply to us. Even so, implementing the USB core on an FPGA would be
more expensive than implementing it on a chip that can be mass manufactured. FPGAs are
devices that can stay operational for many years. However, technology is
evolving rapidly, and the design of FPGAs along with it. The Virtex-II board that we used has already
been succeeded by two newer versions, so we expect that the FPGA that we are using will become
obsolete in about a decade; sustainability is therefore a major issue in our design. Furthermore, if a
new USB specification comes out, the core will need to be redesigned.