Final Year Project Spring Report

Implementing a USB 2.0 Intellectual Property Core on FPGA

Presented By: Liza Tutunjian, Arine Hadidian, George Ghanem



Advisors

Dr. Mazen Saghir, Dr. Ali Chehab

Department of Electrical and Computer Engineering

American University of Beirut


May 23, 2006

Abstract

Implementing a USB 2.0 Intellectual Property core on FPGA By Liza Tutunjian, Arine Hadidian and George Ghanem

This report contains the work we have completed concerning the implementation of a high speed USB

Intellectual Property (IP) core on an FPGA.

In the first chapter, we define the problem that we set out to solve, highlight the practical importance of the topic we chose as our FYP, summarize the tools that we shall use to complete the project, give the estimated budget and present a schedule of our plan of work for the Spring term.

In the second chapter, we included a Review section that provides the reader with background for USB specific

information, a summary of solutions that have already been implemented and the reason why we chose an FPGA

implementation over an ASIC implementation.

The third chapter, Design and Analysis, presents a high level block diagram of the USB system that we will

implement and explains the relationship between the various blocks as well as the responsibility of each in

delivering a working USB IP core.

The fourth chapter, Implementation, defines the design of our hardware system and our software layer. It

provides a description of the main components involved in the system.

The fifth chapter, Evaluation, includes the test cases that we performed at different stages of the project to ensure

the functionality of our project.

The last chapter, Conclusion, presents some of the problems we faced along with alternative solutions, areas of

future work and consideration of design constraints.


Table Of Contents

1.0 Introduction
    1.1 Problem Statement Background
    1.2 Problem Statement
    1.3 Practical Importance of our FYP
    1.4 Budget
    1.5 Tools for implementing the FYP
        1.5.1 Virtex-II™ V2MB1000 Development Board
        1.5.2 Embedded Development Kit Software

2.0 Review
    2.1 Introduction
    2.2 USB Background
        2.2.1 Interlayer Communication Model
        2.2.2 USB Packet Field Formats
    2.3 Approach to solving problem
    2.4 Literature Survey

3.0 Design and Analysis
    3.1 High Level Design
    3.2 Datasheet Summary
        3.2.1 Introduction
        3.2.2 Features
        3.2.3 Block Diagram
        3.2.4 System Overview
        3.2.5 Signal Definitions
        3.2.6 Registers
        3.2.7 Choice of Design

4.0 Implementation
    4.1 USB Core General Implementation
        4.1.1 Host Controller
        4.1.2 Transmitter
        4.1.3 Receiver
        4.1.4 Host Controller Driver

5.0 Evaluation
    5.1 Host Controller Testing
    5.2 Transmitter Testing
    5.3 Receiver Testing
    5.4 Testing the USB Core on the FPGA

6.0 Conclusion
    6.1 Difficulties Faced
    6.2 Future Work
    6.3 Design Constraints

7.0 References


Table Of Figures

Figure 1: VirtexII System Board
Figure 2: Embedded Development Kit Environment
Figure 3: Simple USB Host/Device View
Figure 4: USB Implementation Areas
Figure 5: Host Communication
Figure 6: High Level System Block Diagram
Figure 7: Overall System
Figure 8: Host Controller
Figure 9: Transmitter
Figure 10: USB Cable
Figure 11: Speed and Data States
Figure 12: Handshake Packet
Figure 13: Data Packet
Figure 14: Token Packet
Figure 15: NRZI Encoding
Figure 16: Receiver
Figure 17: High Speed Data In Tick
Figure 18: Cygnal CP2101
Figure 19: FSM states
Figure 20: SETUP Transaction
Figure 21: OUT (0/1) Transaction
Figure 22: IN Transaction
Figure 23: Writing to the USB wire at High Speed
Figure 24: Idle State
Figure 25: SYNC Bytes 1 and 2
Figure 26: SYNC Byte 3 and 4
Figure 27: Token Packet
Figure 28: Setup Token Packet
Figure 29: Little Endian
Figure 30: End Of Packet
Figure 31: Data Packet
Figure 32: Bit Stuffing
Figure 33: Handshake Packet
Figure 34: High Speed Detection Waveform
Figure 35: Receiving Bits at High Speed Waveform
Figure 36: Forming a Byte Waveform
Figure 37: Processing Bytes that represent an ACK Waveform
Figure 38: Processing Bytes that represent Data Waveform
Figure 39: High Speed Rate Problem

List of Tables

Table 1: Signals
Table 2: Register Description
Table 3: Compare alternative designs with our design features
Table 4: Bus States
Table 5: Speed and Data States
Table 6: Bit to Byte Conversion
Table 7: Bytes input into the Byte Analyzer component
Table 8: Processing Bytes that represent Data
Table 9: Difficulties


1.0 Introduction

1.1 Problem Statement Background

Advances in hardware technology have produced Field Programmable Gate Arrays (FPGAs) that are large enough to accommodate a complete system on a single device. Such devices have been called “system on a programmable chip” (SOPC). The single-chip design allows designers to place a large number of functions onto an SOPC and to reprogram the chip from the desktop, thus removing engineering costs from the prototyping and testing of new designs.

For the past decade, Intellectual Property (IP) cores have been developed to give designers a collection of ready-made blocks that shortens their time-to-market. IP cores form an essential element of design reuse and are part of the growing trend towards repeated use of previously designed components. A very diverse set of IP cores is now available on today’s advanced FPGA devices. This offers system developers the opportunity to mix and match various microprocessor, embedded memory and system I/O functions on an FPGA while cutting down design time and reducing risk, thanks to the industry availability of well-performing and low-cost components. The vast majority of those IP cores are either created and owned by the FPGA vendors themselves (such as Xilinx) or licensed from microprocessor vendors.

Popular IP core implementations include functions such as USB, digital signal processing FIR and IIR filters, fast Fourier transforms, adders, multipliers, UARTs and more.

The development of “FPGA plus IP core solutions” accelerated because of the increased functionality, performance and flexibility that these solutions offer over other approaches to system design.


1.2 Problem Statement

Although high speed Universal Serial Bus (USB) IP cores have been implemented by commercial companies, vendors protect their trade secrets and patents because of their tremendous investment in time and development effort. FPGA vendors are working together with core providers to devise methods to license IP cores. A popular concept is to allow users to try out the core by downloading it from the net for trial use in simulation, placement and routing. If the user is satisfied with the core, a license fee is paid, enabling a key that allows programming of the device. The problem here is that these cores are relatively expensive to purchase and only provide a border definition of the interface between the processor and the downloaded core.

With the recent availability of advanced FPGA boards in our digital labs that have physical USB ports on them, we found that a need exists for developing a high speed USB host IP core. Without such a core, a hardware system downloaded on an FPGA cannot communicate with a USB device. In a computer running the Windows operating system, this functionality is already implemented, and the user of a USB device knows very little about the USB packets that are sent back and forth between the host computer and the device. To give a hardware system downloaded on an FPGA the same ability to communicate with an attached USB device, we had to implement the hardware design of a host that implements the USB protocol.

Due to the significant emergence of USB based applications in today’s world, we believe that

implementing a USB IP core that can be reused by coming generations of engineering students at

AUB will provide the groundwork for developing more advanced systems that make use of USB

devices.

The main idea behind our Final Year Project is to develop a non-commercial, student research based IP core that implements a high speed Universal Serial Bus Host, and to add this core as a block to the system hardware architecture. In short, we will design a hardware system that has a processor running code that communicates with a USB device.


1.3 Practical Importance of our FYP

Having defined the problem that we will be addressing, we need to highlight the practicality of meeting

such a goal.

Generally speaking, developing IP cores to build complex systems allows companies to sell standard solutions written in HDL for implementation on the designer's own FPGAs. This removes an element of re-inventing the wheel and spreads development time, and thus costs, across different companies.

Moreover, in today’s industry, time-to-market pressures continue to increase. Irrespective of how well the previous project was completed, there is pressure to complete the next one in less time, at lower cost and with higher performance. “FPGA plus IP core solutions” will continue to feed that market need to build faster and better systems-on-chip well into the next decade and beyond.

As a result of our FYP, the availability of an IP core for use by the Faculty of Engineering at AUB will remove an element of re-inventing the wheel and allow students and faculty in the future more time to focus on optimizing system architectures and developing even more functionality into their USB compliant end products.

1.4 Budget

The Faculty of Engineering and Architecture at AUB has purchased several Virtex-II™ V2MB1000 Development Boards that include Xilinx FPGAs. The price of a development kit (the FPGA and the board) is approximately $2900. All the hardware that we need to implement the USB core is available in the digital labs. In addition to the hardware, the software is also available and installed on all the computers in the digital lab. To be able to implement the USB host protocol, we will have to study the USB 2.0 Specification, which we downloaded for free, so it did not affect our budget.


1.5 Tools for implementing the FYP

1.5.1 Virtex-II™ V2MB1000 Development Board

The Virtex-II board, shown in the figure below, utilizes up to 1 million gates and contains a large number of I/Os to facilitate implementation.

Figure 1: VirtexII System Board

Some of the components present on the board are:

• Xilinx FPGA

• 2 clock sources

• RS-232 port

• LEDs, switches and 7-segment displays

• P160 additional module interface that can add a USB physical interface, SRAM memory and an Ethernet interface

• 16M x 16 DDR memory

We initially wanted to place the P160 additional module, which contains the physical USB port, in the

P160 expansion slot on the board to be able to test our system. However, we faced a problem with

this and we had to come up with an alternative solution. Instead of actually testing the USB core using

the physical USB wire, we designed a block in VHDL that simulates the USB device to ensure that our

system was functional.


1.5.2 Embedded Development Kit Software

To implement our core on the FPGA, we used the Embedded Development Kit (EDK) software that came with the Virtex-II™ V2MB1000 Development Board. The EDK is an all-encompassing design environment for Virtex-II Pro MicroBlaze based embedded systems in Xilinx FPGAs. The figure below shows the tools that the EDK environment provides for the implementation of embedded applications.

Figure 2: Embedded Development Kit Environment

We used Xilinx Platform Studio within the EDK environment to build a hardware system that contains:

1- Pre-designed MicroBlaze soft processor: The design of this processor is provided with the EDK and can be added with the click of a button.

2- VHDL USB core peripheral: We implemented the design of this peripheral ourselves.

We connected the VHDL peripheral as a slave to the MicroBlaze processor through the On-chip Peripheral Bus (OPB). As part of the Hardware Development Flow, we synthesized the VHDL files for the system above into a netlist that contains AND, OR, XOR, NAND gates and so forth. This netlist was then mapped, placed and routed to fit onto the FPGA. Finally, we generated the bitstream and downloaded it onto the FPGA through the JTAG port.


We also used XPS to write C code that functions as a simple driver for the USB hardware core and is run by the aforementioned MicroBlaze processor. As part of the Software Development Flow, we compiled the C code and downloaded it into on-chip memory. Specifically, we used 64 KB of BRAM, which serves as the local data and instruction memory.


2.0 Review

2.1 Introduction

USB (Universal Serial Bus) is the serial bus which can realize the Plug & Play feature for easy

connection of peripherals to PCs. It removes the need to open up a PC when adding a new peripheral

device and allows the required software to be installed automatically.

In the mid-1990s, a core team of engineers from Compaq, DEC, IBM, Intel, Microsoft, NEC and Northern Telecom (now Nortel Networks) developed the original USB specification; the high speed USB 2.0 specification followed in 2000. Today, more than 1000 companies develop products which can be connected to the PC via USB.

Popular in the PC and telecom market for several years now, USB is designed to support standard PC

peripherals and specialist devices. PC peripherals supported by USB include modems, keyboards,

mice, CD ROM drives, joysticks, tape/floppy/hard drives, scanners and printers. Moreover, a new

wave of peripherals such as telephones, digital speakers, digital snapshot and motion cameras, data

gloves and digitizers are to take advantage of this exciting and versatile new interface.

A range of data traffic workloads can be serviced over USB: low-speed (10-100 kb/s) for interactive devices; full-speed (500 kb/s – 10 Mb/s) for phone, audio and compressed video; and high-speed (25 – 400 Mb/s) for video or storage. Note that the signaling rates for the low-speed, full-speed and high-speed protocols are 1.5 Mb/s, 12 Mb/s and 480 Mb/s respectively. These are maximum values; in practice, the rate of communication with a device is below these maxima.
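To make the gap between signaling rate and achievable throughput concrete, the following minimal sketch computes a lower bound on transfer time from the three signaling rates quoted above. The function name is hypothetical, and the calculation deliberately ignores packet framing, bit stuffing and handshakes, so real transfers take longer.

```python
# Signaling rates quoted in the text, in bits per second.
SIGNALING_RATE_BPS = {
    "low": 1_500_000,
    "full": 12_000_000,
    "high": 480_000_000,
}


def min_transfer_time_s(num_bytes, speed):
    """Lower bound on the time needed to move num_bytes at the raw signaling rate.

    Assumption for illustration: protocol overhead (SYNC, PID, CRC, bit
    stuffing, handshakes) is ignored, so this is an optimistic bound.
    """
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    return num_bytes * 8 / SIGNALING_RATE_BPS[speed]
```

For example, `min_transfer_time_s(1_000_000, "high")` gives the best case for moving one megabyte at high speed; the same call with `"low"` is 320 times larger, which is simply the ratio of the two signaling rates.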


2.2 USB Background

The following two sections contain background information about the USB 2.0 Specification. The first

section describes the different layers of the USB host communication model and the second section

defines some common USB packet fields.

2.2.1 Interlayer Communication Model

The USB cable provides communication services between a host and attached USB devices. A host is any device that has USB devices attached to it. The simple view that an end user has of attaching one or more USB devices to a host is, in reality, somewhat more complicated to implement than the figure below indicates.

Figure 3: Simple USB Host/Device View

The host is made of three distinct layers, shown in Figure 4 below. A physical device is attached to the

host. This device is typically a function that provides capabilities to the system. The physical device is

also implemented in three distinct layers (right side). Physical communication between the host and

the physical device occurs horizontally through the lowest layer which is the USB wire. The vertical

arrows between the layers indicate the actual communication on the host. Moreover, there is logical

host-device communication between each horizontal layer above the physical layer.

Figure 4: USB Implementation Areas


For our Final Year Project, we implemented parts of the host side only which is why the following

discussion will focus on the layers to the left of the figure above. Figure 5 below is a more detailed

view of the different layers of a USB host.

Our work focused mainly on the lowest, most physical layer (highlighted in blue in the figure below), which accepts hardware defined data from the level above it and sends bits on the USB cable. In order to test our hardware, we wrote a simple Host Controller Driver in C, highlighted in green in the figure below, which completes a transaction with the USB device. Our software system interface depends on our hardware implementation and does not follow the specification for USB drivers. The software layer simply abstracts the details of the protocol that are implemented by the hardware layer. The software layer is involved at the level of transactions, whereas the hardware layer is involved at the level of packets and bits. The highest layer, the client software, would typically be C code that interacts with a USB device using only very high level functions such as read_USB( ) and write_USB( ). The client layer was not implemented as part of our FYP.

Figure 5: Host Communication (legend: layers we implemented, in VHDL and in C)
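The split between the transaction level (software) and the packet level (hardware) can be sketched as follows. This is an illustration only and not the interface of our driver: the class and function names are hypothetical, the hardware is replaced by a stand-in that just logs packets, and the packet order per transaction is the error-free sequence defined by the USB 2.0 Specification for non-isochronous endpoints.

```python
class LoggingHardware:
    """Stand-in for the packet-level hardware layer (hypothetical interface)."""

    def __init__(self):
        self.log = []

    def send(self, packet):
        # Host-to-device packet; a real core would serialize it onto the wire.
        self.log.append(("host", packet))

    def receive(self, expected):
        # Device-to-host packet; here we simply pretend the device answered
        # with the expected packet (error-free assumption).
        self.log.append(("device", expected))
        return expected


def setup_transaction(hw):
    """SETUP token, DATA0 from the host, then a handshake from the device."""
    hw.send("SETUP")
    hw.send("DATA0")
    return hw.receive("ACK") == "ACK"


def out_transaction(hw, toggle):
    """OUT token, DATA0/DATA1 from the host, then a handshake from the device."""
    hw.send("OUT")
    hw.send("DATA%d" % toggle)
    return hw.receive("ACK") == "ACK"


def in_transaction(hw, toggle):
    """IN token, DATA0/DATA1 from the device, then a handshake from the host."""
    hw.send("IN")
    data = hw.receive("DATA%d" % toggle)
    hw.send("ACK")
    return data
```

A caller working at the transaction level only invokes `setup_transaction`, `out_transaction` or `in_transaction`; the individual token, data and handshake packets remain the concern of the layer below.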


1. Physical Layer

The physical layer, referred to as USB Bus Interface Layer in a USB environment (see Figure 5), is the

hardware that handles the transmission of raw bits over the USB wire. This is the lowest layer in the

figure above. It is composed of two blocks: Serial Interface Engine and Host Controller. Data flowing

out of the USB host passes through the Host Controller first, then through the Serial Interface Engine.

a) Serial Interface Engine (SIE): The SIE performs several functions including serialization

and de-serialization of transmissions, encoding and decoding of the signals, generation and

verification of cyclic redundancy checks and detection of packet IDs and special signals.

b) Host Controller (HC): The host controller initiates transactions and controls access to the USB. It divides time into “frames” and issues a start-of-frame packet at each frame interval. In addition, it processes requests for data to and from the host and handles errors.
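Two of the SIE encoding duties listed above, bit stuffing and NRZI encoding, can be sketched in a few lines. This is a software illustration under stated assumptions, not our VHDL implementation: the function names are hypothetical, line states are written as the letters J and K, and the idle level is taken to be J as in full-speed signaling. The rules themselves come from the USB 2.0 Specification: a 0 is inserted after six consecutive 1s, and in NRZI a 0 is sent as a level transition while a 1 leaves the level unchanged.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out = []
    ones = 0
    for bit in bits:
        out.append(bit)
        ones = ones + 1 if bit == 1 else 0
        if ones == 6:
            out.append(0)  # stuffed bit forces a transition on the wire
            ones = 0
    return out


def nrzi_encode(bits, idle="J"):
    """Encode bits as J/K line states: 0 toggles the level, 1 keeps it."""
    level = idle
    out = []
    for bit in bits:
        if bit == 0:
            level = "K" if level == "J" else "J"
        out.append(level)
    return "".join(out)
```

Stuffing is applied before NRZI encoding, so a transmitter built from these helpers would call `nrzi_encode(bit_stuff(bits))`. Encoding the bit pattern 0, 0, 0, 0, 0, 0, 0, 1 from an idle J level yields the line-state sequence KJKJKJKK, which is the full-speed SYNC pattern.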

2. Protocol Engine Layer

The middle layer is composed of three sub-blocks: Host Controller Driver, USB Driver and USB

system software.

a) Host Controller Driver (HCD): The HCD (see Figure 5) is the lowest tier in the USB software stack. It is the USB software layer that abstracts the Host Controller hardware and provides an interface for interaction with the Host Controller. We wrote part of the HCD to test our hardware system.

The blocks that are described after this are required to enable a client application to interact with a

USB device. Implementing these layers was not within the scope of our project, but we will describe

them so that the reader can have an idea of the logical flow that occurs from the highest to the lowest

layers.


b) USB Driver (USBD): The interface between the USB System Software and the Client

Software. This interface provides clients with convenient functions for manipulating USB

devices.

c) USB system software: The USB system software (see Figure 5) allocates bus bandwidth and manages bus power. It identifies, enumerates, and services data requests from devices on the bus.

3. Application Layer

The application layer is also known as Client Software (see Figure 5). Client software determines what

transfers need to be made with a function. Client software is aware only of the set of pipes (i.e., the

interface) it needs to manipulate its function. Requests made by the client software are presented via

the USBD interface.

2.2.2 USB Packet Field Formats

For the purposes of our report, we did not find it necessary to define the details of the USB protocol

such as the exact format of packets exchanged. However, we need to describe very briefly a few

terms because they will be used in the Design and Analysis chapter to explain how our Design meets

the standard.

Here, we will simply define some of the most recurring fields in USB packets.

SYNC: All packets begin with a synchronization (SYNC) field. It is used by the input circuitry to align

incoming data with the local clock. The Start-of-Packet (SOP) delimiter is part of the SYNC field.

PID: A packet identifier (PID) immediately follows the SYNC field of every USB packet. A PID consists of a four-bit packet type field followed by a four-bit check field. The PID indicates the type of packet and, by inference, the format of the packet and the type of error detection applied to the packet. There are four types of packets: Token (OUT, IN, SOF or SETUP), Data (DATA0, DATA1, DATA2 or MDATA), Handshake (ACK, NAK, STALL or NYET) and Special (PRE, ERR, SPLIT, PING or Reserved).

Note that an IN PID specifies a transaction from a function to the host, whereas OUT and SETUP PIDs specify transactions from the host to a function.
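The PID layout described above can be illustrated with a short sketch. In the USB 2.0 Specification the four-bit check field is the one's complement of the four-bit type field, which lets a receiver reject a corrupted PID. The helper names below are hypothetical, the type codes are those of the specification for a subset of the PIDs, and the byte is shown with the type field in the low nibble because the type bits are transmitted first.

```python
# Four-bit packet type codes from the USB 2.0 Specification (subset).
PID_TYPES = {
    "OUT": 0b0001, "IN": 0b1001, "SOF": 0b0101, "SETUP": 0b1101,
    "DATA0": 0b0011, "DATA1": 0b1011,
    "ACK": 0b0010, "NAK": 0b1010, "STALL": 0b1110,
}


def make_pid_byte(name):
    """Build the PID byte: type in the low nibble, its complement in the high nibble."""
    pid = PID_TYPES[name]
    check = ~pid & 0xF
    return (check << 4) | pid


def decode_pid_byte(byte):
    """Return the PID name, or None if the check field does not match or the type is unknown."""
    pid = byte & 0xF
    check = (byte >> 4) & 0xF
    if check != (~pid & 0xF):
        return None
    for name, code in PID_TYPES.items():
        if code == pid:
            return name
    return None
```

With these definitions, `make_pid_byte("ACK")` evaluates to `0xD2` and `make_pid_byte("SETUP")` to `0x2D`; flipping any single bit of such a byte makes `decode_pid_byte` return `None`.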

ADDRESS FIELD: Function endpoints are addressed using two fields: the function address field and

the endpoint field. A function needs to fully decode both address and endpoint fields.

DATA FIELD: The data field may range from zero to 1,024 bytes and must be an integral number of

bytes. Data bits within each byte are shifted out LSB first.

CRC: Cyclic redundancy checks (CRCs) are used to protect all non-PID fields in token and data

packets. Token and data packet CRCs provide 100% coverage for all single- and double-bit errors.
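To make the token CRC concrete, here is a minimal bit-serial sketch in C. It assumes the five-bit token CRC of the USB 2.0 specification: generator polynomial x^5 + x^2 + 1, shift register preset to all ones, computed over the 11 address and endpoint bits in transmission order (LSB first), with the inverted remainder sent MSB first. The function names are hypothetical; the sketch models the arithmetic only, not our VHDL.

```c
#include <stdint.h>

/* Shift one bit into the 5-bit CRC register (polynomial x^5 + x^2 + 1). */
uint8_t crc5_shift(uint8_t reg, unsigned bit)
{
    unsigned feedback = ((reg >> 4) & 1u) ^ (bit & 1u);
    reg = (uint8_t)((reg << 1) & 0x1Fu);
    if (feedback) {
        reg ^= 0x05u;
    }
    return reg;
}

/* Run the 7-bit address and 4-bit endpoint through the register, LSB first. */
uint8_t crc5_over_token(uint8_t addr, uint8_t endp)
{
    unsigned field = (addr & 0x7Fu) | ((unsigned)(endp & 0x0Fu) << 7);
    uint8_t reg = 0x1Fu;                      /* preset to all ones */
    for (int i = 0; i < 11; i++) {
        reg = crc5_shift(reg, (field >> i) & 1u);
    }
    return reg;
}

/* CRC value placed in the token packet: the inverted register contents. */
uint8_t usb_token_crc5(uint8_t addr, uint8_t endp)
{
    return (uint8_t)(~crc5_over_token(addr, endp) & 0x1Fu);
}

/* Receiver-side check: shifting in the received CRC (MSB first) must leave
 * the fixed residual 01100 in the register. */
int usb_token_crc5_check(uint8_t addr, uint8_t endp, uint8_t crc)
{
    uint8_t reg = crc5_over_token(addr, endp);
    for (int i = 4; i >= 0; i--) {
        reg = crc5_shift(reg, (crc >> i) & 1u);
    }
    return reg == 0x0Cu;
}
```

The receiver recomputes the CRC over the received fields in the same way; a mismatch, or equivalently a wrong residual, is reported as a CRC error.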

2.3 Approach to solving problem

We have decided to solve the problem of designing a hardware system that can interface to a USB

device by implementing a USB 2.0 (high-speed) compliant Intellectual Property core on an

FPGA.

An alternative to implementing IP cores on FPGAs is to implement them as Application Specific

Integrated Circuits (ASICs). In the past, it was a rule of thumb that densities of more than 500,000

gates and volumes above 100,000 units were beyond the capability of FPGAs. Today, FPGAs

approach ASIC-equivalent densities of 1 million gates, which is why we found it important to explain

why we did not take an ASIC-based approach. In what follows, we briefly compare an ASIC

implementation of our project with an FPGA implementation and state why we chose the FPGA.

First, an Application Specific Integrated Circuit (ASIC) is a chip designed to do a certain specific job or

a small group of jobs. If you want to implement different functionality, then you need to use a different

chip.

Second, an FPGA can be re-programmed again and again, until all bugs are removed and the system

is working correctly. However, an ASIC is hard-wired with a mask. Once it is fabricated, no changes

can be made. Usually, in the commercial world, a system that has been prototyped on an FPGA is

migrated to an ASIC as one of the final stages before selling the product.

Although an FPGA consumes more power than an ASIC, it was a better choice for us as students

implementing an IP core since we could benefit from the ease of debugging and reprogramming

that it offers.

Third, ASICs are usually made in large quantities by big companies. The total investment is large, but

the unit cost is small if the chip is manufactured in large quantities. However, ASICs carry nonrecurring

engineering (NRE) costs that are prohibitively high when only one chip is to be fabricated. FPGAs,

on the other hand, can be used for one-offs since they do not incur such NRE costs. Since we are

implementing only a single chip, choosing an FPGA avoids these high costs.

Moreover, the purpose of our project is not commercial and we are not concerned about the unit cost

of implementing the chip since it will not be for sale. In addition to the reasons mentioned above, the

defining factor in our choice of an FPGA over ASIC was the availability of the FPGA boards with

physical USB boards in our labs.

2.4 Literature Survey

Have USB IPs been implemented before? The answer is yes: USB cores are being implemented with

every emerging processor. A USB IP core that interfaces to the Microblaze processor is also available.

However, these are commercial IPs and are not available for free.

Alternatives to USB core

The USB IP core is the controller that is required if, for example, you wish to use your USB mouse or

USB memory stick. Our controller is the device that acts as a bridge between the Microblaze

processor and other USB devices. If it is not present then there is no way that you can utilize USB

devices. To be able to interface your processor with a USB device, three options are available:

1. Buy a standard chip or product

2. Buy a commercial core

3. Design the USB core

Buy a standard chip or product

In this solution, an extra chip will be added to the design. This third party chip will have a

microcontroller and other logic that will act as a mediator between the processor and the USB devices.

Since we are trying to add a USB core to the Microblaze soft CPU on the FPGA, using this design

option would be unwise since we would end up using another chip leaving us with a bulky and costly

design.

Buy a commercial core

Another solution would be to purchase a ready made USB core that will be mapped and downloaded

into the FPGA. What we are actually purchasing is the VHDL code that describes the USB IP. This

solution is fast and not risky since the purchased USB core ensures high performance according to

USB standards. This solution is usually pursued by design companies that require a USB interface

with the processor. However, purchasing the IP core ourselves would do us no good since we wish to

design a noncommercial IP core, one that will be used freely in AUB labs.

Design the USB core

Finally we come to the solution that we have chosen. In this solution, the USB core is designed in

VHDL and implemented in the FPGA. This solution is the most tedious of all. It requires a lot of work

since familiarity with the exhaustive details of the USB protocol is needed. The hard part in creating

the design is verifying compliance and interoperability with the USB standards.

3.0 Design and Analysis

3.1 High Level Design

We designed a hardware system on an FPGA that consists of the Microblaze Microprocessor, the

RS232 Core and the USB core that we implemented. All these blocks are connected through the OPB

bus as shown in the figure below.

[Figure: block diagram of the FPGA containing the MicroBlaze Microprocessor, the USB Core and the RS232 Core, all attached to the On Chip Peripheral Bus (OPB). Legend: designed by Xilinx; designed by us.]

Figure 6: High Level System Block Diagram

The Microblaze processor runs the C code that implements the Host Controller Driver on the FPGA.

This C code interacts with the USB core by writing and reading from registers. Finally we use the

RS232 Core to display the results of the C code on the screen.

We studied a large number of USB Embedded Host Controller datasheets from

National Semiconductor, Cypress Semiconductor, Maxim, and Philips. We also designed our

system with reference to a full speed version of a USB HostSlave IP core. We wanted a USB host

controller block diagram that is simple enough for us to implement, compliant with

the USB 2.0 specifications, and inclusive of all functionalities necessary for the design of the USB-to-USB

data transfer application. We finally settled on a block diagram whose summary datasheet, written

with reference to the datasheets mentioned above, is given below.

3.2 Datasheet Summary

3.2.1 Introduction

The host controller enables an embedded system to function as a USB Host, dramatically expanding

the degree of interconnectivity and extending the applicability of USB into many new areas.

3.2.2 Features

• USB host controller for embedded applications.

• USB Specification 2.0 compliant

• Standard 8-bit microprocessor bus interface

• Supports high speed, full speed and low speed USB transactions

• Connected to the Microblaze processor as a slave on the OPB bus.

3.2.3 Block Diagram

[Figure: block diagram of the USB Core, attached to the Microblaze through the OPB bus. Blocks: Bus Interface, Calculate Reset, USB Host Controller, Receive FIFO, Transmit FIFO, USB Serial Interface Engine, Physical Interface (USB Port). Signals shown: address_i(0:7), data_i(0:7), data_o(0:7), clk, rst, we_i, strobe_i, USBSpeed, USBWireDataIn(0:1), USBWireDataOut(0:1), HostSOFSentIntOut, HostConnEventIntOut, HostResumeIntOut, HostTransDoneIntOut.]

3.2.4 System Overview

The host controller block diagram consists of five major blocks (refer to the figure above).

• The USB Serial Interface Engine: Seen in a dotted black box in the figure above.

Provides the interface between the Physical USB wires and the USB Core. It deals with low-

level bit granularity by processing the incoming and outgoing data bits on the wires. It is

composed of a SIE receiver that de-stuffs, parallelizes and NRZI decodes raw incoming data

bits, and of a SIE transmitter that does the exact opposite with the outgoing bits.

• Receive and Transmit FIFOs: Seen in a dotted light blue box in the figure above.

Implemented as First-In-First-Out buffers. We use the receive FIFO to hold the data payload

of incoming data packets. These will be read later by the software layer. The transmit FIFO

holds the data payload of data packets to be transmitted; it is loaded with data by the

higher software layer prior to a transaction.

• Host Controller: Seen in a dotted grey box in the figure above.

Operates at the transaction and packet level in contrast to the packet and bit-level at which

the USB Serial Interface Engine operates. It manages all transactions, sends packets and

waits for response packets, and notifies the software layer when a transaction is done.

• Bus Interface: Seen in a dotted red box in the figure above

Selects and enables one of the Receive FIFO, Transmit FIFO or Host Controller by

processing the address it receives. It also generates the USB clock from the Bus clock (clk_i);

the USB clock is 4 times slower than the Bus clock and is used by the majority of the

components in the USB Core.

• Calculate Reset: Seen in a dotted blue box in the figure above

Calculates a separate reset signal for each of the USB clock and the Bus clock. The latter

should last 4 times as many clock cycles as the former.
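The block selection performed by the Bus Interface can be modelled with a short C sketch. The address ranges are taken from the register description in Section 3.2.6 (Host Controller registers at 0x00 to 0x0C, Receive FIFO at 0x20 to 0x23, Transmit FIFO at 0x30 to 0x31); the enum and function names are hypothetical, and the VHDL may decode the address differently, for instance on the upper address bits only.

```c
#include <stdint.h>

/* Blocks that the Bus Interface can select. */
enum core_block {
    BLOCK_NONE,
    BLOCK_HOST_CONTROLLER,
    BLOCK_RX_FIFO,
    BLOCK_TX_FIFO
};

/* Map a register address on the bus to the block that owns it. */
enum core_block bus_interface_select(uint8_t address)
{
    if (address <= 0x0C) {
        return BLOCK_HOST_CONTROLLER;   /* control and status registers */
    }
    if (address >= 0x20 && address <= 0x23) {
        return BLOCK_RX_FIFO;           /* receive FIFO data, count, reset */
    }
    if (address >= 0x30 && address <= 0x31) {
        return BLOCK_TX_FIFO;           /* transmit FIFO data, reset */
    }
    return BLOCK_NONE;                  /* unmapped address */
}
```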

Note that our USB Core is connected to the Microblaze microprocessor as a slave on the On-Chip

Peripheral Bus (OPB), which is designed for easy connection of on-chip peripheral devices and provides a

common design point for them.

In the following sections, we provide a detailed description of each component.

3.2.5 Signal Definitions

Name                  IN/OUT  Description
clk_i                 IN      The bus clock linked to the system clock of the FPGA on the board.
rst_i                 IN      Resets all components if active-high.
address_i[7:0]        IN      Input Address
data_i[7:0]           IN      Input Data
we_i                  IN      Write Enable
strobe_i              IN      Indicates the start of a bus cycle period
data_o[7:0]           OUT     Output data corresponding to the address_i input
hostSOFSentIntOut     OUT     Active-high when a SOF transmission occurs
hostConnEventIntOut   OUT     Active-high when a connect or disconnect of the device occurs
hostResumeIntOut      OUT     Active-high when a resume state of the device is detected
hostTransDoneIntOut   OUT     Active-high when a transaction is complete
USBSpeed[1:0]         OUT     Speed of the attached device (low, full or high)

Table 1: Signals

3.2.6 Registers

These registers are the ones accessed through the signals address_i and data_i above. They provide

the mode of communication between the software layer that implements the Host Controller Driver

and the VHDL code that implements the USB core. By writing to and reading from these registers, the

C code sends control or configuration information to the Host Controller or data to the Transmit FIFO.

These registers also enable the software layer that implements the Host Controller Driver to read data

located in the Receive FIFOs, or to read data from the Host Controller in order to check whether the

Host Controller has processed the right configuration information.

In our implementation, the values of these registers are held within the VHDL design itself and not in on-chip or

off-chip memory. The VHDL code reads the address and data values sent on the OPB bus and decodes

this information to take one of a set of actions. The details of how the VHDL code deals with these

values are described in the table below.

Register name           Register Address  Bit position  Name                   Description
TRANSREQ_PREEN_SOFSYNC  0x00              1             TRANS_REQ              Set to 1 to enable a transaction, 0 to disable it.
                                          2             SOF_SYNC               Set to 1 to synchronize transaction with end of SOF transmission.
                                          3             PRE_EN                 Set to 1 to enable preamble packets.
SOF_ENABLE              0x01              1             SOF_EN                 Set to 1 to enable automatic transmission of SOF packets.
FRAMENUM_MSB            0x02              [2:0]         FRAMENUM_MSB           Most significant part of the frame number in SOF transmission.
FRAMENUM_LSB            0x03              [7:0]         FRAMENUM_LSB           Least significant part of the frame number in SOF transmission.
CONNECT_STATE           0x04              [1:0]         CONNECT_STATE          00 = device disconnected, 01 = low-speed, 10 = full-speed, 11 = high-speed.
DEVICE_ADDRESS          0x05              [6:0]         DEV_ADDR               USB Device Address.
ENDPOINT_ADDRESS        0x06              [3:0]         END_ADDR               USB Device Endpoint Address.
TRANSACTION_TYPE        0x07              [1:0]         TRANS_TYPE             Specifies the transaction type required: Setup=0, IN=1, OUT0=2, OUT1=3.
INTERRUPT_STATUS        0x08              0             TRANS_DONE_INT         Set to 1 when a transaction is complete.
                                          1             RESUME_INT             Set to 1 when resume state is detected.
                                          2             CONNECTION_EVENT_INT   Set to 1 when a connect or disconnect occurs.
                                          3             SOF_SENT_INT           Set to 1 when a SOF transmission occurs.
INTERRUPT_MASK          0x09              0             TRANS_DONE_INT         Set to 1 to enable interrupt when a transaction is complete.
                                          1             RESUME_INT             Set to 1 to enable interrupt when resume state is detected.
                                          2             CONNECTION_EVENT_INT   Set to 1 to enable interrupt when a connect or disconnect occurs.
                                          3             SOF_SENT_INT           Set to 1 to enable interrupt when a SOF transmission occurs.
PID                     0x0A              [3:0]         RX_PID                 Packet ID of the last packet received.
STATUS                  0x0B              0             CRC_ERROR              When set to 1, indicates CRC error in the last transaction.
                                          1             BIT_STUFF_ERROR        When set to 1, indicates bit stuffing error in the last transaction.
                                          2             OVERFLOW               When set to 1, indicates that the receive FIFO is full and cannot accept any more of the incoming data.
                                          3             TIME_OUT               When set to 1, indicates no response from USB device.
                                          4             NAK                    When set to 1, indicates that NAK has been received in response to the last packet sent.
                                          5             STALL                  When set to 1, indicates that STALL has been received in response to the last packet sent.
                                          6             ACK                    When set to 1, indicates that ACK has been received in response to the last packet sent.
                                          7             DATA_SEQUENCE OR NYET  Indicates the sequence number of the last packet received in case of IN transaction, or, if it is a handshake packet, whether it is a NYET.
LINE_CONTROL_INFO       0x0C              [1:0]         LINE_STATE             If direct control is enabled, LINE_STATE directly controls the state of the physical wires.
                                          2             DIRECT_CNTR            Set to 1 to enable direct control of the USB physical wires.
                                          [4:3]         LINE_POLARITY_BIT      00 enables low-speed line polarity, 01 full-speed line polarity, 10 high-speed line polarity.
RX_FIFO_DATA            0x20              [7:0]         RX_FIFO_DATA           Contains the receive payload of the last IN Transaction.
RX_FIFO_DATA_NUM_MSB    0x21              [7:0]         RX_FIFO_DATA_NUM_MSB   Most significant byte of the number of elements in the receive FIFO.
RX_FIFO_DATA_NUM_LSB    0x22              [7:0]         RX_FIFO_DATA_NUM_LSB   Least significant byte of the number of elements in the receive FIFO.
RX_FIFO_RESET           0x23              0             FORCE_EMPTY            When set to 1, deletes all data in the receive FIFO.
TX_FIFO_DATA            0x30              [7:0]         TX_FIFO_DATA           Contains the transmit payload of the last OUT Transaction.
TX_FIFO_RESET           0x31              0             FORCE_EMPTY            When set to 1, deletes all data in the transmit FIFO.

Table 2: Register Description
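As an illustration of how driver code might use these registers, the C sketch below requests a SETUP transaction and then interprets the STATUS register. The addresses and bit positions are copied from Table 2. Everything else is an assumption made for the sake of the example: the regs pointer standing for the memory-mapped base of the core, the idea that successive writes to TX_FIFO_DATA push successive payload bytes, the function names, and the retry policy in hcd_classify_status.

```c
#include <stdint.h>

/* Register addresses from Table 2. */
enum {
    REG_TRANS_CTRL       = 0x00,  /* TRANSREQ_PREEN_SOFSYNC */
    REG_DEVICE_ADDRESS   = 0x05,
    REG_ENDPOINT_ADDRESS = 0x06,
    REG_TRANSACTION_TYPE = 0x07,
    REG_STATUS           = 0x0B,
    REG_TX_FIFO_DATA     = 0x30
};

/* Bit positions from Table 2. */
enum {
    TRANS_REQ_BIT          = 1u << 1,  /* Table 2 lists TRANS_REQ at bit 1 */
    STATUS_CRC_ERROR       = 1u << 0,
    STATUS_BIT_STUFF_ERROR = 1u << 1,
    STATUS_OVERFLOW        = 1u << 2,
    STATUS_TIME_OUT        = 1u << 3,
    STATUS_NAK             = 1u << 4,
    STATUS_STALL           = 1u << 5,
    STATUS_ACK             = 1u << 6
};

enum { TRANS_SETUP = 0, TRANS_IN = 1, TRANS_OUT0 = 2, TRANS_OUT1 = 3 };

enum hcd_result { HCD_OK, HCD_RETRY, HCD_ERROR };

/* Load the payload, select the target and request a SETUP transaction.
 * regs is a hypothetical pointer to the base of the core's register space. */
void hcd_start_setup(volatile uint8_t *regs, uint8_t dev, uint8_t ep,
                     const uint8_t *payload, unsigned len)
{
    for (unsigned i = 0; i < len; i++) {
        regs[REG_TX_FIFO_DATA] = payload[i];   /* assumed: one byte per write */
    }
    regs[REG_DEVICE_ADDRESS]   = dev & 0x7Fu;
    regs[REG_ENDPOINT_ADDRESS] = ep & 0x0Fu;
    regs[REG_TRANSACTION_TYPE] = TRANS_SETUP;
    regs[REG_TRANS_CTRL]       = TRANS_REQ_BIT;
}

/* Illustrative policy for an OUT or SETUP transaction: errors abort,
 * timeout and NAK are retried, ACK means success. */
enum hcd_result hcd_classify_status(uint8_t status)
{
    if (status & (STATUS_CRC_ERROR | STATUS_BIT_STUFF_ERROR |
                  STATUS_OVERFLOW | STATUS_STALL)) {
        return HCD_ERROR;
    }
    if (status & (STATUS_TIME_OUT | STATUS_NAK)) {
        return HCD_RETRY;
    }
    return (status & STATUS_ACK) ? HCD_OK : HCD_ERROR;
}
```

On hardware, regs would point at the address at which the core is mapped on the OPB; in a simulation it can simply point at a byte array.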

3.2.7 Choice of Design

In our search for an appropriate block diagram for the USB host controller we came across a number of

different implementation designs, such as the ISP1760 (Embedded Hi-Speed USB host controller), the

ISP1563 (Hi-Speed Universal Serial Bus PCI Host Controller from Philips) and the SL811HS (Embedded USB

Host/Slave Controller). Many of these implementations had very low level and complicated block

diagrams and/or included more features and functionalities than what was needed for our project. The

design we settled on is as simple as possible, implementing just the functionalities we need. The

table below contrasts our design choices with alternative ones and gives the

reason for each choice.

1. Our Choice: Implement only the host controller functionality.
   Alternative Design: Include slave controller functionality along with that of the host controller.
   Reason of Choice: For simplicity purposes. In future work, slave functionality may be added.

2. Our Choice: Processor interface not designed to satisfy common standards among other interfaces. Satisfy only the OPB bus Protocol.
   Alternative Design: Implement a processor interface which follows certain common standards (e.g. Wishbone-compatible).
   Reason of Choice: Other design alternatives have a processor interface to many kinds of microprocessors, microcontrollers, or directly to a variety of buses such as ISA and PCMCIA, whereas our design only needs to have an interface to the Microblaze processor.

3. Our Choice: Host Controller can be interfaced directly via 8 bits of its data bus and 8 bits of its address bus.
   Alternative Design: Provide an 8-bit bidirectional data path along with appropriate control lines to interface to external processors or controllers. Access to memory and control register space is a simple two step process, requiring an address Write (set a certain control line called A0 to “0”) followed by a register/memory Read or Write cycle with address line A0 = “1.”
   Reason of Choice: Simpler design implementation.

Table 3: Compare alternative designs with our design features

4.0 Implementation

In order to build the USB High Speed core, we had to implement the USB 2.0 specification, which only

specifies the language that high speed USB speaks but provides no details of implementation.

Therefore, our first target was to complete the VHDL code that implements the USB High Speed

protocol specification. As part of the specification, our core is supposed to be backward compatible

with all three speeds of USB devices (low speed, full speed, high speed). This is because a high

speed USB port residing on a host computer is expected to succeed in initiating communication with

all USB devices, irrespective of their speed of operation. In this chapter, we will describe the

implemented design of the core that has been written in VHDL.

Our VHDL code, which implements a host USB IP core, is composed of two main blocks as seen in

the figure below: the Host Controller component and the SIE component. The SIE component itself is

divided into the Transmitter and Receiver components. The core interfaces to the Host Controller

Driver implemented in software on its upper side and to the physical USB port on its lower side. Note

that raw bits of 0 and 1 are sent on the USB port as seen in the figure below.

Figure 7: Overall System

4.1 USB Core General Implementation

When a USB device initiates communication with the host computer, the receiver reads the bits that

are on the USB cable at the correct speed, decoding whether these bits represent a certain state

(such as start of packet, end of packet, idle) or certain fields that are part of USB packets and

transferring this information to the Host Controller. To achieve this goal, the receiver reads serial data

at one of the USB speeds, and it converts it to bytes which it sends to the Host Controller. Note that

between receiving raw bits on the wire and sending bytes to the HC a lot of processing is done by the

receiver block. For example, for a packet with a CRC, it recomputes the expected CRC and checks if it

is equal to the one received. If it is not, it reports an error to the HC. The receiver also removes the bit

stuffing that had been performed by the USB device, because the bytes that go to the HC must be

pure of bit stuffing. Moreover, the receiver detects the speed of the USB device upon connection and it

provides this information to all other components. For example, if it decodes that a high speed device

was connected, it sends this to both the HC and the Transmitter. The Transmitter will then only send

to the USB device at High Speed. As for the HC, this is the component that initiates and controls the

progress of all transactions. So, it would need to know whether a device is high speed so that it sends

a Start of Frame packet more often than it would if the speed were low or full. These are just a few

examples to explain the sort of communication that happens when raw bits are received on the wire.
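Two of the receiver steps mentioned above, NRZI decoding and removal of bit stuffing, can be sketched in C. The sketch assumes the USB line coding rules: with NRZI, a 1 is represented by no change in line level and a 0 by a change, and the sender inserts a 0 after every six consecutive 1 bits. For readability each bit is stored in its own array element; the function names are hypothetical and the sketch says nothing about how our VHDL receiver is structured.

```c
#include <stddef.h>
#include <stdint.h>

/* NRZI decode: a level equal to the previous one is a 1, a change is a 0.
 * initial_level is the line level before the first sample (the idle state). */
size_t nrzi_decode(const uint8_t *levels, size_t n, uint8_t initial_level,
                   uint8_t *bits)
{
    uint8_t prev = initial_level;
    for (size_t i = 0; i < n; i++) {
        bits[i] = (levels[i] == prev) ? 1 : 0;
        prev = levels[i];
    }
    return n;
}

/* Remove the 0 that follows every run of six 1 bits.
 * Returns the number of bits written to out, or -1 on a bit-stuffing error
 * (a seventh consecutive 1 where a stuffed 0 was expected). */
long bit_unstuff(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t ones = 0, count = 0;
    for (size_t i = 0; i < n; i++) {
        if (ones == 6) {
            if (in[i] != 0) {
                return -1;          /* would set BIT_STUFF_ERROR */
            }
            ones = 0;               /* drop the stuffed bit */
            continue;
        }
        out[count++] = in[i];
        ones = in[i] ? ones + 1 : 0;
    }
    return (long)count;
}
```

The de-stuffed bits would then be grouped into bytes, LSB first, before being handed to the Host Controller.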

Now, assuming that the receiver has decoded what speed the connected device is running at, the core

is supposed to have an initial transaction with the USB device. A transaction consists of several

packets. The host controller initiates and controls the progress of all transactions. As input to the host

controller, we specify the type of transaction that the Host Controller Driver (HCD) wishes to make with

the USB device as well as other needed fields such as the address of the device. The host controller,

knowing what transaction is to be sent will command the transmitter to send the packets that make up

that type of transaction and then it will wait for a response from the receiver which indicates a

handshake from the device or a timeout indicating a lack of handshake. For example, let us take the

case of having a setup transaction with the device at startup. When the HC realizes that it needs to

initiate a setup transaction, it enters a state machine in which it sequentially sends a setup token

packet, a data packet and waits for a handshake. The setup token packet simply contains the address

of the targeted device and is an indicator that the following packet is used to configure setup

information. The data packet contains the setup information itself. So the host controller indicates to

the transmitter component that a setup token packet must be sent to the device with address x. The

transmitter, receiving this information from the HC, implements the details of the physical USB

protocol. For example, when the transmitter receives a command to send a setup token packet, it

cannot simply send the received bits for the packet on the line. The Transmitter needs to do a whole

lot of processing before sending the packet. The transmitter first sends a START OF PACKET

sequence on the line, serializes the bytes that it receives from the HC into bits, calculates the CRC

over the packet, performs bit stuffing and then it sends the resulting bits on the USB cable. Then,

realizing that the setup token packet has been sent, it sends an END OF PACKET sequence on the

line. Note that all this processing was still for the first packet sent. A similar sequence happens for the

data packet in the Transmitter. Now that both token and data packets have been sent, the core is in a

state of waiting for some sort of response from the device regarding the packets that were sent

previously. Two possible cases can occur. The device will either respond with a handshake packet or

it will not respond. The receiver block waits “listening” attentively on the USB cable. If it receives no

bits for a certain amount of time specified by the protocol, it reports a timeout to the HC. The host

controller has as output a timeout interrupt signal that it sets in this case. This signal is to be handled

by the HCD, which should try to send the transaction another time. On the other hand, if the receiver

starts receiving bits, it decodes them to find out whether they make a handshake packet. If yes, it

reports to the HC that a handshake packet has been received from the device. The HC at this stage

has completed the whole transaction, so it interrupts the Host Controller Driver layer to say that the

transaction is complete.
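The serialization and bit stuffing performed on the transmit side can be sketched in C as follows. The sketch assumes the USB rules that bytes are shifted out LSB first and that a 0 is inserted after every six consecutive 1 bits; the SYNC pattern, NRZI encoding and END OF PACKET signalling are left out. The function name is hypothetical and each output bit occupies one array element for readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Serialize bytes LSB first and insert a 0 after every six consecutive 1s.
 * out must have room for n_bytes * 8 data bits plus one stuffed bit for
 * every six data bits. Returns the number of bits written. */
size_t serialize_and_stuff(const uint8_t *bytes, size_t n_bytes, uint8_t *out)
{
    size_t count = 0, ones = 0;
    for (size_t i = 0; i < n_bytes; i++) {
        for (int b = 0; b < 8; b++) {
            uint8_t bit = (bytes[i] >> b) & 1u;
            out[count++] = bit;
            ones = bit ? ones + 1 : 0;
            if (ones == 6) {
                out[count++] = 0;   /* stuffed bit forces a line transition */
                ones = 0;
            }
        }
    }
    return count;
}
```

A byte of 0xFF, for example, leaves the transmitter as nine bits: six ones, the stuffed zero, then the remaining two ones.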

The explanation above is a high-level, low-detail view of the interaction that happens between

the components within the core, the software layer and the USB device. The implementation was quite

tedious as it required us to take care of so many cases of transactions and packets and also to have

the core support all three speeds, each of which has different signaling rate and different transfer

protocols.

Our first step was to write the IP core in VHDL that implements the hardware and to test it. We defined

the interface to our core and made sure that it was working properly as a black box.

As for the HCD software layer, to have it fully working, all cases implemented in the core should be

covered, a task that is in fact specified as a standard of its own called the Enhanced Host Controller

Interface (EHCI). We concerned ourselves with writing a case that validates that the VHDL core is

working, but it is not comprehensive. We will explain in what follows what we have implemented and

the future work that must be done in order to complete the core to have it communicate with a USB

device from application level software.

When we described our VHDL core in the previous section, we only specified the main high level

blocks such as Host Controller, Transmitter and Receiver. This was to give the reader an

understanding of the overall functionality of the core. However, the code is pretty detailed since it

implements most of the USB protocol. In fact, the core is composed of many more components that

are sub-components of the previously mentioned ones. In the discussion below, we will explain the

block level design of each of these blocks.

4.2 Host Controller

The USB Host Controller is the main block in the USB Core that manages all outgoing and incoming

transactions. The different components in the USB Host Controller can be logically divided into 2

parts: those that manage all outgoing data transfers (Host Controller Arbiter, Control SOF, Send SOF,

HCA&SOF MUX, Check Preamble, Transmit Packet, Direct Control, SOF DC TxPacket MUX) and

those that manage all incoming data transfers (Host Controller Arbiter, Receive Packet, Interrupt

Generator). The USB Host Controller further contains a component that provides it with an interface to

the bus. The figure below depicts the block diagram of the USB Host Controller along with all its sub-

components.

Figure 8: Host Controller

The USB Host Controller first processes all control information sent by the software layer (whether

automatic transmission of SOF and PREAMBLE packets is enabled or not, and the type of transaction

required) and sends this information to the addressed components within the USB core.

It also sends information about the transaction that is taking place, or the device the host is attached

to, back to the software layer: the speed of the device, the kind of handshake received and so forth.

This component also has the function of sending interrupts to the software layer when a transaction is

done, a SOF is sent, resume is detected or the connection state of the USB physical line is changed

(the possible states being, DISCONNECTED, LOW-SPEED, FULL-SPEED or HIGH SPEED).

Depending on the type of transaction required (IN, OUT0, OUT1 or SETUP) it takes care of sending

appropriate packets to the device (token, data or handshake). In case automatic transmission of SOF

and/or PREAMBLE packets is enabled, it sends SOF packets at the start of each frame, or

PREAMBLE packets before each data or token packet.

Also, in case the software layer has enabled direct control of the USB physical wires, it takes care

of sending to the device the state of the line as specified within the control information sent by the

software layer.

The USB Host Controller stores the payload of the data packet it receives from the SIE Receiver and

therefore from the device, in the Receive FIFO, to be read later by the software layer. It also packages

the data in the Transmit FIFO as part of the payload of the data packet within an OUT or SETUP

transaction to be sent to the SIE Transmitter and consequently to the device.

Host Controller Bus Interface

The Host Controller Bus Interface interfaces between the Host Controller component and the Bus

Interface. Its job is to synchronize between the USB clock and the bus clock. It has a 4-bit address as

input; this address represents the address of the register whose content is in the 8 bit dataIn signal.

The Host Controller Bus Interface divides this input data and assigns it to the appropriate signals, or

assigns it as a whole to the dataOut output, to be sent to different components of the host controller or

other components of the wrapper.

Host Controller Arbiter

The Host Controller Arbiter checks whether a transaction is required (transReq bit set by software

layer components), then checks which transaction type is required by the upper level: SETUP, IN,

OUT0, OUT1. Accordingly, it sets the PID of the packets that are to be sent and asks for a turn from

the HCA&SOF MUX to send the packet, or it enables the Receive Packet component to read incoming

packets.

• In case an IN transaction is required, it sets the packet ID to IN, waits till this packet is sent

and a packet is received from the device, then sets the packet ID to ACK

and notifies the upper level that the required transaction is done.

• In case a SETUP transaction is required, it first sets the packet ID to SETUP and waits till the

packet is sent, then sets the packet ID to DATA0, waits till the packet is sent and an ACK is

received, then notifies the upper level that the required transaction is done.

• In case an OUT0 transaction is required, if it had received a NYET as a response to the

previous transaction, it sets the packet ID to PING; otherwise it sets it to OUT. It waits till the

packet is sent, then sets the ID to DATA0 and waits till it is sent and an ACK is received. Finally, it

notifies the upper level that the required transaction is done.

• In case an OUT1 transaction is required, it sets the packet ID to OUT, waits till the packet is

sent, then sets the ID to DATA1, waits till it is sent and an ACK is received, then notifies the

upper level that the required transaction is done.
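The four sequences above can be summarized as a small planning function in C. This is only a model of the description in this section, not a translation of the VHDL state machine: all names are hypothetical, the steps are recorded in an array instead of being driven by handshakes with the other blocks, and OUT1 is assumed to carry a DATA1 payload, as the OUT0/OUT1 naming suggests.

```c
#include <stddef.h>

enum arb_trans { ARB_TRANS_SETUP = 0, ARB_TRANS_IN = 1,
                 ARB_TRANS_OUT0 = 2, ARB_TRANS_OUT1 = 3 };

enum arb_step {
    ARB_SEND_SETUP, ARB_SEND_IN, ARB_SEND_OUT, ARB_SEND_PING,
    ARB_SEND_DATA0, ARB_SEND_DATA1, ARB_SEND_ACK,
    ARB_WAIT_PACKET, ARB_WAIT_ACK, ARB_NOTIFY_DONE
};

/* Fill steps[] (room for at least 4 entries) with the sequence followed for
 * the requested transaction type and return the number of steps. */
size_t arbiter_plan(enum arb_trans type, int last_was_nyet,
                    enum arb_step *steps)
{
    size_t n = 0;
    switch (type) {
    case ARB_TRANS_IN:
        steps[n++] = ARB_SEND_IN;      /* token to the device */
        steps[n++] = ARB_WAIT_PACKET;  /* packet received from the device */
        steps[n++] = ARB_SEND_ACK;     /* handshake back to the device */
        break;
    case ARB_TRANS_SETUP:
        steps[n++] = ARB_SEND_SETUP;
        steps[n++] = ARB_SEND_DATA0;
        steps[n++] = ARB_WAIT_ACK;
        break;
    case ARB_TRANS_OUT0:
        steps[n++] = last_was_nyet ? ARB_SEND_PING : ARB_SEND_OUT;
        steps[n++] = ARB_SEND_DATA0;
        steps[n++] = ARB_WAIT_ACK;
        break;
    case ARB_TRANS_OUT1:
        steps[n++] = ARB_SEND_OUT;
        steps[n++] = ARB_SEND_DATA1;   /* assumption: data toggle of OUT0 */
        steps[n++] = ARB_WAIT_ACK;
        break;
    }
    steps[n++] = ARB_NOTIFY_DONE;      /* tell the upper level we are done */
    return n;
}
```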

Control SOF

This component keeps track of a timer for SOF. This timer is then used by the Send SOF component.

Send SOF

When the timer for SOF, given by the Control SOF component, reaches a certain value (which differs between low

speed and high speed), it signals that a Start of Frame (SOF) packet needs to be transmitted.
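The timer value at which a SOF becomes due follows from the frame timing of the USB 2.0 specification: one frame every 1 ms at low and full speed, and one microframe every 125 µs at high speed. The C sketch below converts these periods into clock ticks. The clock frequency is passed in as a parameter because the rate of the clock driving the timer is not stated here; the function and type names are hypothetical.

```c
#include <stdint.h>

enum usb_speed { USB_LOW_SPEED, USB_FULL_SPEED, USB_HIGH_SPEED };

/* Number of timer ticks between two SOF packets (or keep-alives). */
uint32_t sof_period_ticks(enum usb_speed speed, uint32_t timer_clk_hz)
{
    /* 8000 microframes per second at high speed, 1000 frames otherwise. */
    uint32_t periods_per_second = (speed == USB_HIGH_SPEED) ? 8000u : 1000u;
    return timer_clk_hz / periods_per_second;
}

/* Returns 1 when the SOF timer has reached the period for this speed. */
int sof_due(uint32_t timer, enum usb_speed speed, uint32_t timer_clk_hz)
{
    return timer >= sof_period_ticks(speed, timer_clk_hz);
}
```

With a 60 MHz timer clock, for instance, the threshold would be 60000 ticks at full speed and 7500 ticks at high speed.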

HCA&SOF MUX

This block arbitrates between requests from the Host Controller Arbiter and the Send SOF

components both of which want to send packets. Send SOF wants to transmit SOF packets whereas

Host Controller Arbiter wants to transmit packets with any other PID. The block gives priority to the

Send SOF because the SOF packet needs to be transmitted first.

Check Preamble

As soon as there are packets that need to be sent, this block first checks if the software layer

components have enabled automatic transmission of preamble packets. If so, it waits until the

Transmit Packet component is ready to send packets, then it signals to it that a PREAMBLE packet

needs to be sent. In case the higher-level components have not enabled automatic transmission of

preamble packets, or after the PREAMBLE packet is sent, it signals the Transmit Packet component

that a packet needs to be sent, and forwards the packet’s ID with the value provided by the HCA&SOF

MUX : either SOF or any other PID provided by the Host Controller Arbiter component itself.

Note that PREAMBLE is only sent in low and full-speed transmissions before any token, data or

handshake packet.

Transmit Packet

It acts according to the packet ID (PID) it receives from the Check Preamble component. If the PID is

SOF and the attached device operates at low speed, it sends a KEEP_ALIVE

control signal to the SIE. Low speed devices do not see SOF packets; the KEEP_ALIVE

signal plays the same role as SOF packets for low speed devices: it keeps a low-speed device from

entering the Suspend state.

Otherwise, it sends a

TX_PACKET_START control signal to the SIE Transmitter, then distinguishes between the data and

token packets along with their PID types:

• If the Packet ID is either DATA0 or DATA1, it reads data from the Transmit FIFO, and sends

this data to the SIE Transmitter along with a control signal called TX_PACKET_STREAM to

indicate it is sending data. When it has read all the data from the Transmit FIFO, it sends a

TX_PACKET_STOP control signal to indicate that it has finished sending data.

• If the Packet ID is SOF, it sends the frame number to the SIE Transmitter along with a control signal called TX_PACKET_STREAM. It also increments the frame number.

• If the Packet ID is IN, OUT or SETUP, it sends the device endpoint number and address along with a control signal called TX_PACKET_STOP, to indicate the end of the packet.
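The decision logic described above can be summarized in a short Python sketch. It only illustrates the control flow: the function name, the tuple representation of the signals and the arguments are assumptions made for this example, and only the signal names (KEEP_ALIVE, TX_PACKET_START, TX_PACKET_STREAM, TX_PACKET_STOP) come from the description. Incrementing the frame number after an SOF is left to the caller.

```python
def transmit_packet(pid, low_speed, fifo=(), frame_number=0, addr=0, endp=0):
    """Return the (control_signal, payload) pairs sent to the SIE Transmitter."""
    if pid == "SOF" and low_speed:
        # Low-speed devices never see SOF packets; keep them awake instead.
        return [("KEEP_ALIVE", None)]

    signals = [("TX_PACKET_START", pid)]
    if pid in ("DATA0", "DATA1"):
        # Stream every byte of the Transmit FIFO, then close the packet.
        signals += [("TX_PACKET_STREAM", byte) for byte in fifo]
        signals.append(("TX_PACKET_STOP", None))
    elif pid == "SOF":
        # The frame number travels with a STREAM signal.
        signals.append(("TX_PACKET_STREAM", frame_number))
    elif pid in ("IN", "OUT", "SETUP"):
        # Address and endpoint travel with the STOP signal.
        signals.append(("TX_PACKET_STOP", (addr, endp)))
    return signals
```

For instance, transmit_packet("DATA0", False, fifo=[1, 2]) yields one START, two STREAM entries and a final STOP.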

Direct Control

The Direct Control block checks whether the higher-level components allow direct control of the state of the physical USB wires. If so, it requests that the line state specified by the upper-level components be sent to the SIE Transmitter, along with a TX_DIRECT_CONTROL control signal that describes the data being sent. If direct control is not enabled, it simply sends a control signal called TX_IDLE to the SIE Transmitter.

SOF DC TxPacket MUX

This block multiplexes the Control SOF, Transmit Packet and Direct Control components onto the Transmit port of the host controller so that they can send packets. Priority is given first to Control SOF, then to Transmit Packet and finally to Direct Control.
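The fixed priority order can be expressed as a very small Python sketch. The dictionary of request flags and the function name are assumptions made for illustration; only the ordering of the three sources comes from the description.

```python
# Highest priority first, as stated in the description.
PRIORITY = ("Control SOF", "Transmit Packet", "Direct Control")


def select_source(requests):
    """Return the highest-priority requesting source, or None if nothing requests."""
    for source in PRIORITY:
        if requests.get(source):
            return source
    return None
```

With both Transmit Packet and Direct Control requesting, select_source returns "Transmit Packet".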

Receive Packet

The Receive Packet block first checks whether the incoming data is valid, then whether the PID is HANDSHAKE or DATA. If it is a HANDSHAKE packet, it sends to the Host Controller Arbiter the information it received from the SIE about the packet (errors in CRC, Overflow, RX Time Out and the data sequence). If it is a DATA packet, it reads the incoming data in, as long as it is valid, and sends it to the Receive FIFO. If the Receive FIFO is full, it holds off the incoming data until there is some space in the FIFO.

Interrupt Generator

Interrupts the higher-level components in case the connection state is changed (the possible states

being disconnected, low speed, full speed or high speed) or resume is detected by the SIE.

SpeedCtrlMux

This block sends the speed at which signaling with the USB device should occur to the Transmitter.

4.3 Transmitter

The transmitter block, which is a sub-component of the Serial Interface Engine (SIE) block, takes signals from the host controller as input and outputs the bits to be sent on the USB port towards the USB device. The transmitter consists of sub-components, each of which has a specific function designed to support high speed, full speed and low speed USB communication. The figure below lists the subcomponents within the transmitter, which are: Data States; Transmit Controller; Token CRC; Data CRC; Bit Stuffer and Serializer; Direct Bits; Bit Stuffer and Serializer and Direct Bits MUX; and USB Write. In what follows, we will describe each component in more detail.

Figure 9: Transmitter

Data States

The USB transfers signals and power over a four-wire cable. Signaling occurs over the two wires D+ and D-, while the VBUS and GND wires on each segment deliver power to devices.

Figure 10: USB Cable

When transferring data, there are 4 possible states on the bus:

Bus State           D+   D-
Differential 0      0    1
Differential 1      1    0
Single-Ended-Zero   0    0
Single-Ended-One    1    1

Table 4: Bus States

In addition to the bus states mentioned above, which are defined by voltages on the lines, USB also

defines two Data bus states, J and K. The J and K data states are the two logical levels used to

communicate differential data in the system. These are defined by whether the bus state is Differential

1 or 0 and whether the cable segment is low or full or high speed.

                    Data States
Bus State        Low Speed   Full Speed   High Speed
Differential 0   J           K            K
Differential 1   K           J            J

Table 5: Speed and Data States

Figure 11: Speed and Data States

The J and K states are defined in this manner so that one standard terminology can be used to describe a logic state on the USB cable, even though the differential bus state that represents it differs with the speed of the segment. For example, a Start-of-Packet (SOP) is signaled when the bus switches from the J state to the K state. On a high/full speed line, this means that D- becomes more positive than D+, while on a low-speed segment, it means that D+ becomes more positive than D-.

Now that we know what the protocol requires of us, we can explain the functionality of Data States.

This is a very simple block which takes as input the speed of the USB device that we are connected to

and depending on that, it sets the J and K data states to either Differential 1 or Differential 0. In all

blocks that follow, we just use the J and K states without having to deal with Differential 0 and 1.
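As a minimal sketch of what the Data States block computes, the mapping of Table 4 and Table 5 can be written as a lookup. The Python names and the string encoding of the speed are assumptions made for illustration; the actual block is written in VHDL.

```python
DIFF0 = (0, 1)  # (D+, D-) levels for Differential 0, per Table 4
DIFF1 = (1, 0)  # (D+, D-) levels for Differential 1, per Table 4


def data_states(speed):
    """Map a speed ('low', 'full' or 'high') to the line levels of J and K (Table 5)."""
    if speed == "low":
        return {"J": DIFF0, "K": DIFF1}
    if speed in ("full", "high"):
        return {"J": DIFF1, "K": DIFF0}
    raise ValueError(f"unknown speed: {speed!r}")
```

Every later block can then work with the symbols J and K alone, for example data_states("high")["J"] gives the Differential 1 levels.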

Transmit Controller

This block is at the heart of the transmitter and controls the state of all the other blocks in it. It receives bytes from the host controller (HC) as input. It compares the first byte it receives, which is the packet ID of the packet, to a set of constants to determine whether a token, data, handshake or special packet is to be sent. For each of the four types of packets, it then enters a sequence of states in which it accepts the remaining fields of the packet from the HC, sends these fields to the Data CRC and Token CRC blocks, sends the bytes to the Bit Stuffer and Serializer, and finally appends the CRC (if applicable) to the end of the packet and sends it to be bit stuffed and serialized.

For a data packet, it first receives the packet ID byte and decodes from it that this is a data packet. It sends this byte to the Bit Stuffer and Serializer along with control information indicating that this is the first byte of the packet. The Bit Stuffer and Serializer sends a start of packet sequence before going on to bit stuff and serialize the packet ID. Since a data packet can carry multiple bytes after the packet ID, the Transmit Controller moves to a state in which it waits for the HC to write a data byte. Along with each byte comes control information that tells the Transmit Controller whether this is the last data byte or whether more are to come. If it is the last byte, the controller appends the CRC value computed by the Data CRC component and sends everything to the Bit Stuffer and Serializer for further processing. If it is not the last byte, it waits for another byte and stays in this loop until the last byte is received.

For a token packet, it first receives the packet ID byte and decodes that this is a token packet. Since a token packet consists of a packet ID followed by an address field and an endpoint field, it waits in successive states until it receives the remaining two bytes. As the endpoint field is the last field supplied by the host controller, it then reads the CRC value calculated by the Token CRC block, appends it to the packet and sends everything to the Bit Stuffer and Serializer for further processing.

For a handshake/special packet, it first receives the first byte which is the packet id. From this it knows

that this is a handshake/special packet. Knowing that a handshake/special packet has only a packet id

it sends this to the Bit Stuffer and Serializer to be further processed.

Figure 12: Handshake Packet
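The first step of the Transmit Controller, deciding the packet type from the packet ID, can be sketched as follows. The report only says that the first byte is compared to constants; grouping PIDs by their two low-order bits is how the USB 2.0 specification organizes the PID types, and using that grouping here is an assumption of this sketch, as are all Python names. The field lists follow the description above, so SOF tokens (which carry a frame number) and special packets that carry extra fields are not modeled.

```python
# PID type encoded in the two low-order bits of the 4-bit PID (USB 2.0).
PID_TYPES = {0b01: "token", 0b11: "data", 0b10: "handshake", 0b00: "special"}

# Fields that follow the PID for each packet type, as described in the text.
FIELDS_AFTER_PID = {
    "token": ["ADDR", "ENDP", "CRC5"],
    "data": ["DATA", "CRC16"],
    "handshake": [],
    "special": [],
}


def classify_pid(pid):
    """Classify a 4-bit PID (for example 0b0010 for ACK) by its two low-order bits."""
    return PID_TYPES[pid & 0b11]
```

For example, classify_pid(0b1101) (SETUP) gives "token", so the controller would next expect the address and endpoint fields.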

Data CRC

A CRC is a cyclic redundancy check performed on data to see if an error has occurred in reading or writing the data. The result of a CRC is transmitted with the checked data. At the receiving end, the transmitted result is compared to the CRC calculated for the data to determine if an error has occurred. The goal in inserting a CRC as part of the packet is to maximize the probability of detecting errors using only a small number of redundant bits. The divisor polynomial used to generate the data CRC is C(X) = X^16 + X^15 + X^2 + 1.

When a data packet is sent (shown below), a 16-bit Data CRC is calculated for it. Note that the PID is not included in the CRC check; the Data CRC only covers the data field of the data packet. The Data CRC block generates the CRC over all the data bytes that are sequentially input to it by the Transmit Controller. When the Transmit Controller has sent all the bytes of the DATA field to the Data CRC block, it reads the resulting 16-bit CRC and sends it to the Bit Stuffer and Serializer block to further process the packet.
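A software model of this calculation can be sketched in Python as below. The polynomial is the one given above; starting the shift register at all ones, processing each byte least significant bit first and inverting the final remainder follow the USB 2.0 specification and are assumptions here, since the report does not spell them out for its VHDL block. The function name is ours.

```python
def crc16_usb(data: bytes) -> int:
    """CRC-16 over the data field, polynomial X^16 + X^15 + X^2 + 1."""
    crc = 0xFFFF                            # shift register starts as all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):                  # one step per bit, LSB first
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001   # 0xA001 is 0x8005 bit-reversed
            else:
                crc >>= 1
    return crc ^ 0xFFFF                     # the inverted remainder is transmitted
```

For the ASCII string 123456789 this returns 0xB4C8, the commonly published check value for this CRC variant, and an empty data field gives 0x0000.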

Figure 13: Data Packet

Token CRC

Similarly to data packets, TOKEN type packets are protected with a 5-bit CRC, and the function of the Token CRC block is to generate this CRC. The divisor polynomial used to generate the CRC is C(X) = X^5 + X^2 + 1.

When a token packet is sent (shown below), a Token CRC is calculated for it. Note that the PID is not included in the CRC check. The Token CRC block generates the 5-bit CRC over the ADDR and ENDP fields input to it sequentially by the Transmit Controller. The CRC is then appended to the end of the packet and sent as part of it.

Figure 14: Token Packet
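The token CRC can be modeled bit by bit in Python, as sketched below. The polynomial is the one stated above. The all-ones starting value, the final inversion, the 7-bit address and 4-bit endpoint widths and the least-significant-bit-first ordering follow the USB 2.0 specification and are assumptions of this sketch, as are the function names.

```python
def crc5_usb(bits):
    """CRC-5 over token bits given in transmission order, polynomial X^5 + X^2 + 1."""
    reg = 0b11111                     # shift register starts as all ones
    for bit in bits:
        feedback = bit ^ (reg >> 4)   # incoming bit XOR current top bit
        reg = (reg << 1) & 0b11111
        if feedback:
            reg ^= 0b00101            # X^2 + 1 (the X^5 term is implicit)
    return reg ^ 0b11111              # the inverted remainder is transmitted


def token_bits(addr, endp):
    """7-bit ADDR followed by 4-bit ENDP, each least significant bit first."""
    return [(addr >> i) & 1 for i in range(7)] + [(endp >> i) & 1 for i in range(4)]
```

For address 0x15 and endpoint 0xE the bit stream is 10101000111 and the resulting CRC is 10111, where the leftmost CRC bit is the first one placed on the wire.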

Encoder, Bit Stuffer and Serializer

This block performs several functions. It receives two types of commands from the Transmit Controller: commands to send a byte of a packet, and control commands to send a special sequence on the USB cable that defines a USB bus state (Idle, Start of Packet, End of Packet and so forth). For example, when the Transmit Controller wants to send a data packet, it sends the packet ID of the data packet to this component along with control information specifying that this is the start of the packet. The block then automatically outputs the serial data corresponding to the Start of Packet sequence defined by the specification.

The encoding format used by the USB protocol is called Non-Return to Zero Inverted (NRZI) where a

“1” is represented by no change in level and a “0” is represented by a change in level. A string of zeros

causes the NRZI data to toggle each bit time. On the other hand, a string of ones causes long periods

with no transitions in the data. The figure below shows a data stream and the NRZI equivalent.

Figure 15: NRZI Encoding

Once we have NRZI encoded the data, we need to be able to send it on a physical USB cable,

specifically on the D+ and D- wires which were described before. For the sequence shown above, the

data sent to the USB cable for a high speed device would be JKKJJKKJKKKKJ. Note a Differential 1 is

a J in high speed.

Returning to the data packet: after the start of packet (SOP) sequence is sent, the bytes of the PID, DATA and CRC fields are bit stuffed and encoded. The protocol defines bit stuffing as the insertion of a zero after every six consecutive ones in the data stream before the data is encoded. For example, suppose the data to be sent includes a run of seven consecutive ones, such as 011111110, and the line is in the J state before the first bit. Without bit stuffing, the data sent on the USB cable would be KKKKKKKKJ. With bit stuffing, we insert a 0 after six ones to get 0111111010, which is sent as KKKKKKKJJK. In the USB protocol, the host and device do not share a clock, so bit stuffing, which forces a transition in the transmitted data, ensures that the receiver remains synchronized with the transmitter without the overhead of sending a separate clock signal or start and stop bits with each byte.

With the last byte of the Data packet (CRC byte), the block receives control information stating that

this is the end of the packet so it inserts the End of Packet (EOP) sequence on the line.
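The two transformations applied to packet bits, bit stuffing followed by NRZI encoding, can be sketched in Python as below. This is an illustrative model, not the VHDL implementation: the function names are ours, and the line state before the first bit (the start parameter, J by default) is an assumption of the sketch. SOP and EOP generation are not modeled.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, run = [], 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 6:
            out.append(0)   # stuffed bit forces a transition after encoding
            run = 0
    return out


def nrzi_encode(bits, start="J"):
    """NRZI: a 0 toggles the line state, a 1 leaves it unchanged."""
    state, out = start, []
    for bit in bits:
        if bit == 0:
            state = "K" if state == "J" else "J"
        out.append(state)
    return "".join(out)
```

Calling nrzi_encode(bit_stuff(bits)) on a stream with seven consecutive ones shows the extra transition that the stuffed zero introduces after the sixth one.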

The output of this block is serial data that has undergone bit stuffing and encoding and is ready to be

sent on the USB wire. However, in order to have the data written on the wire at the correct speed, we

need to have a few other blocks that manage this.

Encoder, Bit Stuffer and Serializer and Transmit Controller MUX

The component above takes care of sending bus states and USB packets on the USB cable. The actual bits to be sent are calculated as the packet passes through the blocks of the Transmitter. The software layer simply needs to provide the core with the type of packet and the values of the fields in the packet, and the Transmitter along with the Host Controller take care of sending the packet in accordance with the details of the specification.

Apart from this functionality, our VHDL core has been designed to allow the software layer to place specific bits on the line that do not correspond to packet related information. To achieve this, the Transmit Controller has serial outputs through which it can output the specific bits requested by the software layer. These are predefined serial bits that need not be bit stuffed and encoded, so they are sent directly from the Transmit Controller to the MUX block. The MUX block receives serial inputs from the Transmit Controller and from the Encoder, Bit Stuffer and Serializer block. The inputs from the Encoder, Bit Stuffer and Serializer have priority, since a packet cannot be interrupted in the middle to send a requested sequence. This is a very simple block that forwards the inputs from one of the two blocks to the USB Write block.

USB Write

This block is the final block before previously processed data is actually sent on the physical USB wire. It maintains an input buffer to accept the data it receives and an output buffer that is responsible for writing data to the physical USB wire at the specified speed. We have implemented these buffers as FIFO buffers so that the order of the transmitted bits is preserved.

Since our core supports all three speeds, this block should be able to write to the line at the rates of 1.5 Mbps (LS), 12 Mbps (FS) and 480 Mbps (HS). With a 960 MHz input clock, we implemented a 7-bit counter that enables this block to write at all three speeds. We can send bits at HS each time the LSB of the counter rolls over. This divides the input clock by two to give 480 Mbps. To send at FS, we wait for the 4 LSBs of the counter to roll over 5 times. This divides our clock by 2^4 * 5 = 80 and thus we can write at 960/80 = 12 Mbps. To send at LS, we wait for the 7 bits of the counter to roll over 5 times. This divides our clock by 2^7 * 5 = 640 and thus we can write at 960/640 = 1.5 Mbps.
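The divider arithmetic can be checked with a few lines of Python. The helper names are ours; the clock frequency, counter widths and rollover counts are the ones quoted in the text.

```python
CLOCK_MHZ = 960  # input clock stated in the text


def divider(counter_bits, rollovers=1):
    """Clock cycles per transmitted bit: 2**counter_bits cycles per rollover."""
    return (2 ** counter_bits) * rollovers


def bit_rate_mbps(counter_bits, rollovers=1):
    """Resulting line rate in Mbps for the given counter width and rollover count."""
    return CLOCK_MHZ / divider(counter_bits, rollovers)
```

bit_rate_mbps(1) covers the high speed case, bit_rate_mbps(4, 5) the full speed case and bit_rate_mbps(7, 5) the low speed case.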

4.4 Receiver

The receiver block, which is a sub-component of the Serial Interface Engine (SIE) block, takes its input signals directly from the USB wire. The two inputs are the D+ and D- signals. The main function of the receiver is to convert the bits it receives into bytes, which are then analyzed and given to the Host Controller (HC). Before it sends the bytes to the HC, it checks for CRC errors and bit stuff errors. The receiver has another very important task: it is responsible for detecting the speed of connecting devices. Every device signals the receiver in a way that depends on the speed at which it works (low, full or high); from this the receiver determines the speed of the connecting device and notifies the whole core of it.

The figure below lists the subcomponents within the receiver, which are: USB Read, Bit to Byte Converter, Byte Analyzer and Detect Speed. In what follows, we will describe each component in more detail.

Figure 16: Receiver

USB Read

This is the lowest level component of the receiver block. Its function is to read the 1s and 0s from the D+ and D- lines of the USB wire. This component can read the input at low speed, full speed

or high speed. The main concept behind reading the input is as follows: There is a 7 bit counter i that

is incremented at every rising edge of the clock. If we are working with high speed, then we will take in

a new input from the system whenever the least significant bit of i is 0. If we look at the figure below,

the first signal is the clock, the second is the counter i and the third signal is the high speed data in

tick. The high speed data in tick has a period of 2.0833 ns and this is the rate at which we take in

inputs. Thus, we will read from the wire whenever the high speed data in tick goes from 0 to 1. Note

that the high speed data in tick is derived from the least significant bit of the counter i.

Figure 17: High Speed Data in Tick

The full speed and low speed data in ticks are 40 times and 320 times slower respectively, so data is read from the USB wire at those slower rates. To toggle the full speed data in tick, we wait until the four least significant bits of the counter i become 0000 5 times. For low speed, we wait for the seven bits of i to become 0000000 5 times.

This component also takes metastability issues into account using a very simple method. To avoid reading data while the USB wire is changing, we simply reset the counter whenever the input changes. This way, we never take in a bit that has not been on the line for 2.0833 ns. Whenever a new bit is read from the USB wire, it is first written to a buffer and then output from this component. Whenever a new bit is output, we signal to all the other components that a new bit has been received; thus this is the only component that has to deal with timing. The rest of the components wait for the data out tick to toggle and thus know when a new bit has entered the system.

In addition to taking in inputs and outputting them whilst setting the data out bit to 1, this component

also checks if a no activity time out has occurred and outputs this signal to other higher level

components such as the HC.

Bit to Byte Converter

This component converts the received bits into bytes and sends the bytes to the Byte Analyzer component. It monitors the output signal from the USB Read component and thus knows that a new bit has entered whenever the data out signal becomes 1. Its function is to NRZI decode the incoming bits, combine every 8 decoded bits into a byte, and output the byte. In addition to NRZI decoding, this component performs bit de-stuffing. The NRZI decoding is performed as follows: when a bit is received, it is compared to the previous bit that was received. If they are different, a 0 is inserted into the byte; otherwise a 1 is inserted. This process is repeated 8 times until a byte is formed. Let us take an example in which we receive 8 new bits. Note that each input can be either a differential 0, a differential 1, a single ended 0 or a single ended 1. Suppose the input is J, J, K, K, J, J, J, K, and that the input received just before those 8 inputs was a J.

The byte that will be output is formed as follows:

Received Bit   Byte
(initial)      00000000
J              10000000
J              11000000
K              01100000
K              10110000
J              01011000
J              10101100
J              11010110
K              01101011

Table 6: Bit to Byte Conversion

Thus the byte received is 01101011. This byte will then be analyzed by the Byte Analyzer, which is the next component that we will be discussing.

Byte Analyzer

This component is responsible for analyzing the bytes that were formed in the Bit to Byte Converter component. It is also responsible for calculating the CRC and comparing it with the CRC that was sent with the packet. Whenever a byte is sent to this component from the Bit to Byte Converter, a data out signal is set to 1; this component therefore waits for input by monitoring the data out bit of the Bit to Byte Converter. The first byte that we wait for is the start of packet (SOP), which is 10000000 for low/full speed and 1000000000000000000000000000000 for high speed. Note that this takes into consideration that we receive the bytes in little endian order. This component therefore needs to know the speed at which we are working. After the SOP is received, we enter a state where we wait for the next byte to come. When the next byte after the SOP is received, the byte is analyzed and the PID field is checked to see if it is a special, token, handshake or data packet. From the PID field we know what bytes to expect next. Finally, after all the bytes have been sent to us, an end of packet byte (EOP), which is 00000000 for low/full speed and 1111111 for high speed, will be received. After the EOP is received, this component signals the upper components that a full transaction has been received and outputs the data to them.

For example, let us simulate receiving an ACK from a low speed device. The bytes that should be received are SOP: 10000000, ACK: 00101101, EOP: 00000000. After receiving 00101101, which contains the PID of the ACK, we know that we should not expect to receive anything else since this is only an ACK transaction, and so we should expect an EOP. After receiving those three bytes, the Byte Analyzer component tells the HC (Host Controller) that an ACK has been received. If, instead of an ACK, we are receiving a data packet, then the EOP field tells us when the data stream has ended.
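The decoding walk-through of Table 6 and a basic PID sanity check can be sketched in Python as follows. This is a simplified model under stated assumptions: the function names are ours, bit de-stuffing and SOP/EOP detection are left out, and the PID check relies on the USB 2.0 rule that a PID byte carries a 4-bit PID together with its bitwise complement.

```python
def nrzi_decode(states, previous):
    """Yield 1 when the line state repeats and 0 when it changes."""
    for state in states:
        yield 1 if state == previous else 0
        previous = state


def assemble_byte(bits):
    """Shift each new bit in at the left, as in Table 6."""
    byte = 0
    for bit in bits:
        byte = (byte >> 1) | (bit << 7)
    return byte


def pid_is_consistent(byte):
    """True if one nibble of the byte is the bitwise complement of the other."""
    return ((byte >> 4) ^ (byte & 0x0F)) == 0x0F
```

Decoding J, J, K, K, J, J, J, K with a preceding J reproduces the byte 01101011 of Table 6, and the ACK byte 00101101 quoted above passes the complement check.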

Detect Speed

This component detects the speed at which a connected device is sending us bits. Let us assume that a low speed device is connected to our receiver. The first thing the device does is send a J bit (01) for a specific amount of time (2.5 ns). After 2.5 ns have elapsed with the J bit as input, the rest of the components are signaled to work at low speed. For full speed detection, a K bit (10) has to be received for 2.5 ns. The detection of high speed is a bit more complex and requires some interaction between the device and the Detect Speed component. A high speed device always connects at full speed first. That is, it sends a K bit for 2.5 ns and thus establishes a full speed connection. It then waits to be reset. Once reset, it sends 01 for a certain amount of time, which confirms that it is indeed a high speed device. After Detect Speed receives the 01, it has the transmitter send a sequence of bits to tell the device that the host is high speed compatible and has accepted the 01. To summarize, this is what happens in this component for high speed connections:

• A high speed device is connected.

• The device sends a “10” for 2.5 ns.

• The receiver commands the transmitter block to send a “00” for 2.5 ns to reset the device.

• Once the device is reset, it will send a “01” for 2.5 ns.

• Once the receiver detects the “01”, it commands the transmitter to send a sequence of bits to acknowledge the device.
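The detection sequence summarized above can be sketched as a small Python function. It is an illustration only: the function name, the list-of-observations input and the way the reset is recognized are assumptions of this sketch, the hold time is the value quoted in the text, and the acknowledgment sequence that the transmitter sends afterwards is not modeled.

```python
HOLD_NS = 2.5  # hold time quoted in the text


def detect_speed(observations):
    """Infer the device speed from (line_state, duration_ns) pairs seen after connect.

    Line states are written D+ D- as in the text: "01", "10" or "00".
    """
    speed = None
    reset_seen = False
    for state, duration in observations:
        if duration < HOLD_NS:
            continue                        # not held long enough to count
        if state == "00":
            reset_seen = speed == "full"    # reset following a full speed connect
        elif state == "01":
            if reset_seen:
                return "high"               # device confirms high speed after reset
            speed = "low"
        elif state == "10":
            speed = "full"
    return speed
```

A device that only ever shows "10" is reported as full speed, while the sequence "10", "00", "01" is reported as high speed.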

4.5 Host Controller Driver

So far, all the blocks that we described were implemented in VHDL. For the software layer that interacts with the VHDL core, we wrote C code that implements the sending of a complete SETUP transaction. Such a transaction consists of the host sending token and data packets and waiting for a handshake packet as a response from the device.

To test this software code, we would have to run it. It would automatically prompt the core to send

token and data packets to the USB device through the wire and then to wait for a response. Here, we

faced a problem whereby we could not link the output of the core (bits to be sent on the wire) to the

physical USB port that resides on the P160 additional module that we had planned to attach to the

Virtex VIIMB Development Board. The reason for this was that there was a physical chip that

interfaced to the USB pins. This chip was a RS232_USB Bridge Interface called Cygnal CP2101.

Figure 18: Cygnal CP2101

Thus, to use the physical USB port, we would have to send data according to the RS232 interface. We had already completed a large part of the VHDL core that implements the USB specification and were eager to test our system according to the USB specification, so we had to come up with an alternative to verify that our system implements its functionality on an FPGA.

Our alternative solution was to write a simple VHDL block that acts as a device. That is, it simulates the actions of the device at the bit level. We implemented this as a Finite State Machine that has the states shown in Figure 19.

The FSM waits until the bits relating to SOF and startup are sent by the VHDL core as if they were being sent to an actual USB device. It then moves to a state in which it waits to receive a SETUP token packet, then a data packet. Once the device simulator reaches this stage, it should send a handshake packet. We simulated the device by hard-coding the bits that the device would send to the core if it were to send an actual USB handshake packet. We assumed that the device sends an ACK handshake packet, which completes the transaction. Once the device simulator sends the ACK handshake packet, the job of the FSM is complete and the Receiver part of the VHDL core takes over. The Receiver block of the VHDL core receives these bits from the outputs of the device simulator and processes them to determine that an ACK response has been received. It then informs the Host Controller that the device responded with an ACK. The Host Controller passes this information to the software layer by setting the Transaction Done bit to 1. This way, the user who initiated a transaction from the software layer by writing into a few registers can learn that the transaction was completed by reading the value of the Transaction Done bit.

Figure 19: FSM states
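The behavior of the device simulator can be sketched at packet granularity in Python. This is a simplification for illustration: the real block works at the bit level in VHDL, the state names below are hypothetical (Figure 19 defines the actual states), and treating the data packet of the SETUP transaction as DATA0 follows the USB 2.0 specification rather than the text.

```python
# State names are illustrative only; Figure 19 defines the actual states.
TRANSITIONS = {
    ("WAIT_STARTUP", "SOF"): "WAIT_SETUP_TOKEN",
    ("WAIT_SETUP_TOKEN", "SETUP"): "WAIT_DATA",
    ("WAIT_DATA", "DATA0"): "SEND_ACK",
}


def run_device_simulator(packets):
    """Feed the packet types seen from the core; return (final_state, response)."""
    state = "WAIT_STARTUP"
    for packet in packets:
        # Anything unexpected in the current state is ignored.
        state = TRANSITIONS.get((state, packet), state)
        if state == "SEND_ACK":
            return "DONE", "ACK"   # hard-coded handshake, as in the VHDL block
    return state, None
```

Feeding it SOF, SETUP and DATA0 in order produces the ACK response; any shorter sequence leaves the simulator waiting.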

5.0 Evaluation

In order to verify the functionality of our code, we performed tests on each of the components described in the Implementation chapter separately, then combined the whole system together and performed more comprehensive tests. In implementing our VHDL system, we worked in parallel on developing the three main components (Host Controller, Transmitter, Receiver) after we had understood how they need to interface together. We then tested each separately as a black box and finally proceeded to test the system as a whole to verify its functionality. Note that we describe all the test cases below assuming a high speed device to avoid redundancy, although the same cases work for low speed and full speed.

5.1 Host Controller Testing

The tests done independently on the Host Controller involve the four possible types of transactions: SETUP, IN, OUT0 and OUT1. Note that all transactions are initiated by the host and that every transaction consists of a number of packets.

Send SETUP Transaction

When the software layer requires a SETUP transaction, the Host Controller first sends a SETUP token packet, then a data packet whose payload is read from the Transmit FIFO previously loaded by the software layer. It then waits until a handshake packet is received from the device. By following the states of the Host Controller Arbiter component (shown in a yellow box in the simulation below), we can see that it first waits until the SETUP token packet is sent, then until the data packet is sent, and finally until a handshake packet is received. At that point it interrupts the software layer by setting the TransDone signal to 1 (circled in red in the simulation below).

Figure 20: SETUP Transaction

Send OUT(0/1) Transaction

When the software layer requires an OUT transaction, a token packet with PID equal to OUT is sent first, followed by a DATA(0/1) packet whose payload is read from the Transmit FIFO previously loaded by the upper layer. The Host Controller core then waits until the device sends back a handshake; at that instant it interrupts the software layer with a Transmission Done interrupt signal (TransDone). The figure below shows the states that the Host Controller Arbiter component passes through (shown in a yellow box): it waits until the OUT token packet is sent, then until the DATA0 packet is sent, and finally until a handshake is received; it then sets the TransDone signal (circled in red below). Note that, in the case of an OUT0 transaction and a high speed device, if the previously received handshake is a NYET, the host keeps sending only a PING token packet, without a following data packet, until it receives an ACK. It can then start any other transaction.

Figure 21: OUT (0/1) Transaction

Send IN Transaction

When the software layer requires an IN transaction, a token packet with PID equal to IN is first sent to indicate to the device that, if it has packets to send, it can do so now. The Host Controller core then waits until the device sends a data packet and, when it does, sends back a handshake packet to the device to indicate that it processed the data packet. If the host does not detect anything, it sets the Time-Out bit, which is the 4th bit in the RxStatus signal. The software layer reads this signal, sees that there is a time-out and initiates the IN transaction one more time. The figure below deals with this second case, where there is a time-out. It shows the states that the Host Controller Arbiter component passes through (shown in a yellow box below): it waits until the IN token packet is sent, then waits for a DATA packet, which never arrives; finally the SIE Receiver detects a time-out, and the Host Controller Arbiter sets the TransDone bit (circled in red below) and sets the 4th bit of the RxStatus signal to 1 (shown in a red box below).

Figure 22: IN Transaction

5.2 Transmitter Testing

The tests done independently on the Transmitter have the granularity of packets, as this is the unit of transfer that the Transmitter deals with. In other words, we provide test cases that ensure proper transmission of packets to the USB device.

Writing bits on the USB wire at the correct speed

The figure below is a screenshot of the Transmitter sending a Token Packet (SETUP). From this

figure, we see that a bit is written every 2.084 ns. This verifies that our core can write at a speed of

1 / 2.084 ns ≈ 480 Mbps, which is the rate at which high-speed signaling occurs.
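The conversion from a measured bit period to a bit rate is simple arithmetic, shown here as a small helper. The function name bit_rate_mbps is an illustrative choice and does not come from the project.

```c
#include <assert.h>

/* Converts a bit period in nanoseconds to a bit rate in Mbit/s.
 * One bit per nanosecond is 1000 Mbit/s, hence the factor of 1000. */
double bit_rate_mbps(double bit_period_ns)
{
    assert(bit_period_ns > 0.0);
    return 1000.0 / bit_period_ns;
}
```

With the 2.084 ns period read from the waveform this gives roughly 479.8 Mbit/s, and with 2.0833 ns it gives 480.0 Mbit/s to one decimal place.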

Figure 23: Writing to the USB wire at High Speed

In order to test the functionality of the Transmitter, we will display tests that were performed to send a

typical transaction to the USB device. As we explain each case, we will highlight how the details of the

protocol were tested.

Sending a Token Packet (SETUP):

The only inputs to the transmitter required to send a SETUP Token packet are the packet id, the

address and endpoint of the targeted device. All the steps described below are carried out by the

transmitter in the mentioned sequence.

1-Transmitter sends IDLE state

The start of a packet transmission requires the USB bus to be in an idle state. Therefore, prior to

sending a packet, the Transmitter sends an Idle state. In the figure above, we see the output signal

USBwirectrlout becoming one for the first time. This signifies that a bit is being written onto the USB

wire. When we check the value of the corresponding bit, we see that USBwiredataout is a 00 (Single

Ended Zero). This signifies an IDLE state on the bus and is necessary before we send a SYNC

pattern in the next step.

Figure 24: Idle State

2- Transmitter sends SYNCHRONIZATION (SYNC)

In the USB protocol, the host and devices do not share a clock. Thus, the device cannot identify when

the host will send a transition that signals the beginning of a new packet. A single transition is not

sufficient to synchronize the receiver for the duration of a packet. Therefore, every packet has to begin

with a SYNC field to enable the device to align, or synchronize, its clock to the transmitted data. For

high speed devices, the host must send a SYNC pattern that is 4 bytes long (31 zeros followed by a single 1,

in transmission order), encoded according to NRZI as fifteen KJ successions and then a KK. The alternating

Ks and Js provide the transitions for synchronizing, and the last two Ks mark the end of the field.
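The way 31 zeros and a final 1 turn into fifteen KJ pairs followed by KK can be reproduced with a few lines of C. This is a minimal sketch, assuming the line rests at J before the field starts; the names hs_sync_states and HS_SYNC_BITS are illustrative and not taken from the VHDL core.

```c
#include <assert.h>
#include <stddef.h>

#define HS_SYNC_BITS 32

/* Writes the NRZI line states ('J' or 'K') of the 32-bit high-speed SYNC
 * field into out, followed by a terminating '\0'. Returns out.
 * Data bits: 31 zeros, then a single one. Assumed starting state: J. */
char *hs_sync_states(char out[HS_SYNC_BITS + 1])
{
    char state = 'J';
    assert(out != NULL);
    for (size_t i = 0; i < HS_SYNC_BITS; ++i) {
        int bit = (i == HS_SYNC_BITS - 1);       /* only the last bit is 1 */
        if (bit == 0)
            state = (state == 'J') ? 'K' : 'J';  /* NRZI: a 0 toggles the line */
        out[i] = state;                          /* NRZI: a 1 holds the line   */
    }
    out[HS_SYNC_BITS] = '\0';
    return out;
}
```

The first 30 zeros produce the alternating KJ pairs, the 31st zero produces one more K, and the closing 1 repeats that K.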

Figure 25: SYNC Bytes 1 and 2

Figure 26: SYNC Byte 3 and 4

In the figures above, we illustrate the HS SYNC pattern being sent. As stated earlier, it is fifteen KJ

successions, and then a KK. Note that when transmission is at high speed, J = 2 (binary 10) and K = 1

(binary 01). Therefore, a 1 on USBWiredataOut highlighted in the figures above is a K and,

similarly, a 2 is a J.

In Part 1, we can see that the first 2 bytes of the SYNC field are sent. In Part 2, we see the last two

bytes. Note that every switch from a K to a J helps the receiver synchronize. The last 2 values sent are

1 and 1, that is KK, which indicates the end of the SYNC field. These 2 values are circled in yellow.

3- Transmitter sends a Token Packet of type SETUP

Figure 27: Token Packet

This is the information in the packet that we input to the Transmitter:

PID=00101101, ADDR=00000000, ENDP=00000000, CRC5=to be calculated

Figure 28: Setup Token Packet

Note that we first have the 3 SYNC bytes (00), then the last SYNC byte (80), then the token PID (2d),

then the ADDR plus 1 bit of ENDP (00), then the remaining 3 bits of ENDP and the CRC5.

As we can see in the figure above, this sequence of bytes passes through the stages of bit

stuffing, CRC calculation and NRZI encoding, and results in bits on the wire.
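The CRC5 value left "to be calculated" and the byte layout of the token can be illustrated in C. This is a sketch rather than a transcription of the VHDL: it uses the USB token generator polynomial x^5 + x^2 + 1 in bit-reflected form, treats ADDR as 7 bits and ENDP as 4 bits as the USB token format defines them, and the function names usb_crc5 and usb_build_token are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CRC5 over the 11 token bits (7-bit address, 4-bit endpoint), processed
 * LSB first. Generator x^5 + x^2 + 1 (0x14 when bit-reflected), initial
 * value 0x1F, final inversion. Bit 0 of the result goes on the wire first. */
uint8_t usb_crc5(uint16_t data11)
{
    uint8_t crc = 0x1F;
    for (int i = 0; i < 11; ++i) {
        uint8_t bit = (uint8_t)((data11 >> i) & 1u);
        if ((crc ^ bit) & 1u)
            crc = (uint8_t)((crc >> 1) ^ 0x14);
        else
            crc = (uint8_t)(crc >> 1);
    }
    return (uint8_t)(crc ^ 0x1F);
}

/* Fills out[0..2] with the three token bytes that follow the SYNC field. */
void usb_build_token(uint8_t pid, uint8_t addr, uint8_t endp, uint8_t out[3])
{
    /* A valid PID byte carries the one's complement of its low nibble
     * in its high nibble (0x2D for SETUP). */
    assert(((pid ^ (pid >> 4)) & 0x0F) == 0x0F);
    assert(addr < 128 && endp < 16);

    uint16_t fields = (uint16_t)(addr | ((uint16_t)endp << 7));
    uint16_t word = (uint16_t)(fields | ((uint16_t)usb_crc5(fields) << 11));
    out[0] = pid;
    out[1] = (uint8_t)(word & 0xFF);  /* ADDR plus the lowest ENDP bit  */
    out[2] = (uint8_t)(word >> 8);    /* remaining ENDP bits, then CRC5 */
}
```

For the SETUP token used in this test (address 0, endpoint 0) the three bytes come out as 2d, 00 and 10 hex, which agrees with the 2d and 00 values quoted above.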

Note that, although PID=00101101 where MSB=0 and LSB=1, bits need to be sent out on the bus in

little endian order, as specified by the USB protocol. That is, the LSB of a byte is sent out first,

followed by the next LSB and through to the MSB.

Figure 29: Little Endian

In the figure above, we can follow how the PID (00101101) bits are encoded and sent. Please note on

the figure how a J (10) and a K (01) are represented. Because little endian bit order is used to send a byte,

the bits go out as 10110100. With K as the preceding line state, NRZI encoding gives:

K -> KJJJKKJK
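The same encoding can be checked with a short C routine that serializes one byte least significant bit first and applies NRZI. It is a sketch for illustration; nrzi_encode_byte is a hypothetical name, and line states are represented as the characters 'J' and 'K' instead of the two-bit values used on USBWiredataOut.

```c
#include <assert.h>
#include <stdint.h>

/* NRZI-encodes one byte, least significant bit first.
 * prev is the line state ('J' or 'K') just before the byte; out receives
 * eight state characters plus '\0'. Returns out. */
char *nrzi_encode_byte(uint8_t value, char prev, char out[9])
{
    assert(prev == 'J' || prev == 'K');
    char state = prev;
    for (int i = 0; i < 8; ++i) {
        int bit = (value >> i) & 1;              /* LSB goes out first     */
        if (bit == 0)
            state = (state == 'J') ? 'K' : 'J';  /* 0: toggle the line     */
        out[i] = state;                          /* 1: keep the same state */
    }
    out[8] = '\0';
    return out;
}
```

Calling it with the SETUP PID 0x2D and a preceding state of K (the last state of the SYNC field) yields the string KJJJKKJK.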

4-Transmitter sends a HS EOP (End of Packet)

In high-speed signaling, a sequence that would generate a bit stuff error at the receiver device is

intentionally sent to indicate EOP. For almost all high-speed packets the End of High-speed Packet is

an encoded byte of 01111111, without bit stuffing. If the preceding bit was a J, the End of High-speed

Packet is KKKKKKKK. The initial 0 causes the first bit to be a change of state from J to K, and the

following 1s mean that the rest of the bits don't change. If the preceding bit was a K, the End of High-

speed Packet is JJJJJJJJ. The initial 0 causes the first bit to be a change of state from K to J, and the

following 1s mean that the rest of the bits don't change. In either case, a sequence of seven bits

without a transition causes a bit stuff error.

When all fields of the token packet are sent, a HS EOP pattern must be sent. Having explained it above, we

now observe it experimentally in the simulation shown in the figure below. When the packet has been sent, a

signal called HSEOP, which has been 0 all along, becomes one. This causes a sequence of 8

data states on the wire that are opposite to the last data state that was sent by the last field of the

packet. In the figure, the last bit was a J (2), so we can see a sequence of 8 K’s sent consecutively

to signal the end of the packet. This is highlighted in the white box.
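Because the EOP byte is a 0 followed by seven 1s with no bit stuffing, its line states depend only on the last state of the packet. The following sketch captures that rule; hs_eop_states is a hypothetical name and the 'J'/'K' characters stand in for the wire values.

```c
#include <assert.h>

/* Fills out with the eight line states of a high-speed EOP plus '\0',
 * given the last line state of the packet ('J' or 'K'). Returns out. */
char *hs_eop_states(char last_state, char out[9])
{
    assert(last_state == 'J' || last_state == 'K');
    /* The leading 0 toggles the line once ... */
    char eop_state = (last_state == 'J') ? 'K' : 'J';
    /* ... and the seven 1s that follow hold it there. */
    for (int i = 0; i < 8; ++i)
        out[i] = eop_state;
    out[8] = '\0';
    return out;
}
```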

Figure 30: End Of Packet

We have now tested the correct transmission of a token packet. Next, we will describe that of a Data

Packet as it is a bit different.

Sending a DATA packet:

The Idle state and the sync byte patterns that exist before a packet is sent are identical for all packets

sent, so we will skip the testing of these stages and directly start discussing the fields of the DATA

packet.

This is the information in the packet that we wish to send: PID=11000011, DATA(1st byte)=11110000,

DATA(2nd byte)=00001111, CRC16(1st byte) and CRC16(2nd byte)=to be calculated

Figure 31: Data Packet

As we can see in the figure above, the PID, then DATA(1st byte), DATA(2nd byte), CRC16(1st byte)

and CRC16(2nd byte) are sent out in succession. This completes the DATA packet.

In order to ensure adequate signal transitions, bit stuffing is employed by the transmitting device when

sending a packet on USB. The rule for bit stuffing was described earlier in the Implementation chapter.

In the figure below, the two bytes highlighted in red are sent in little endian bit ordering. This means that

the byte 11110000 goes onto the wire as 00001111, followed by the byte 00001111 as 11110000, which gives

8 consecutive one bits. After the sixth consecutive one, the txonecount signal shown below is asserted and

a 0 bit is stuffed and encoded. The bit in purple is the stuffed bit; the ones in orange are the 2+6=8

encoded bits of the 11110000 (2nd)

Figure 32: Bit Stuffing
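The stuffing of the two data bytes F0 and 0F can be reproduced in software. The sketch below serializes bytes LSB first and inserts a 0 after six consecutive 1s, which is the USB bit stuffing rule; the function name usb_bit_stuff and the one-bit-per-array-element output format are choices made for this illustration, not part of the core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Serializes bytes LSB first and inserts a 0 after every run of six 1s.
 * Each output element holds one bit (0 or 1). Returns the number of bits
 * written; out_cap must be large enough for data plus stuffed bits. */
size_t usb_bit_stuff(const uint8_t *data, size_t len, uint8_t *out, size_t out_cap)
{
    size_t n = 0;
    int ones = 0;
    for (size_t i = 0; i < len; ++i) {
        for (int b = 0; b < 8; ++b) {
            uint8_t bit = (uint8_t)((data[i] >> b) & 1u);
            assert(n < out_cap);
            out[n++] = bit;
            ones = bit ? ones + 1 : 0;
            if (ones == 6) {             /* six 1s in a row: stuff a 0 */
                assert(n < out_cap);
                out[n++] = 0;
                ones = 0;
            }
        }
    }
    return n;
}
```

For the payload F0, 0F the 16 data bits become 17 bits on the wire: four 0s, six 1s, the stuffed 0, the remaining two 1s and four 0s.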

The EOP pattern that follows a packet is identical for all packets, so we will not repeat

the details.

Sending a HANDSHAKE packet:

The Idle state and the sync byte patterns that exist before a packet is sent are identical for all packets

sent, so we will directly start discussing the fields of the HANDSHAKE packet.

This is the information in the packet that we need to send to have an ACK handshake:

PID=11010010

Figure 33: Handshake Packet

As we can see in the figure above, the PID is sent, which means the whole packet has been sent. This

completes the Handshake packet. The EOP pattern that follows a packet is identical for all

packets, so we will not repeat the details.

5.3 Receiver Testing

The tests done independently on the receiver will have granularity of bits that are converted to bytes at

the output. In other words, we will provide test cases that ensure proper reception of packets from the

USB device and proper sending of bytes to the HC.

High Speed Detection

The first component to test in the receiver is the Detect Speed component since this component will

notify all other components as to what speed they should be working at. If this component

malfunctions then all the other components will be working at the wrong speed. Below is the test

waveform that shows that a high speed device has been connected.

In the figure above, the input rxwiredatain is what we are receiving from the USB wire. Connectstate

highlighted in red above is the output of this component. 00, 01, 10 and 11 correspond to low speed,

full speed, high speed and disconnected respectively. The first thing we notice above is that

connectstate has been changed from 11 to 10 and this means that we have detected a high speed

device. We can see that the resetdevice signal, highlighted in purple, goes high upon receiving the 2

input. This reset signal will force the transmitter to send an SE0 to reset the device. This will force the

device to respond and tell us if it is high speed or not. If the device is high speed then the device will

reply with a 01. Upon receiving the 01 input the sendjkjkjk and outputs are set to 1. This signal will

inform the receiver to send the acknowledgment sequence to the device and thus complete the

process of detecting high speed.

Figure 34: High Speed Detection Waveform

Receiving Bits

The USB Read component is the component that reads the input from the USB wire and handles the

timing. Below is a waveform that shows the timings.

In the figure above, Rxbitsin is the input read from the D+ and D- lines on the USB wire and is

highlighted in red on the waveform. The highspeedtick signal directly below the highlighted bits in red

is the speed at which we take in the input. At every rising edge of the highspeedtick signal, we take in

a new input. The highspeedtick signal has a period of 2.0833 ns and if the USB device sends each bit

for this amount of time, then we are guaranteed that we will not miss any of the bits since they will all

witness a rising edge of the highspeedtick. The bits highlighted in yellow are the outputs and will be

given to the Byte Analyzer component. We notice here that the output is 1 even though we are reading

new inputs from the wire. This is because, due to the 64 buffers present, there is a delay in outputting

the data. We also notice that fullspeedrate is 2 and this means that, as required, we are reading data

at a high speed rate.

Processing the bits

In this test we will be receiving the byte: 10000000 which is part of the start of packet for high speed.

The USB device will send the 10000000 starting from the least significant bit. Thus we will first detect

seven 0’s and then one 1.

Figure 35: Receiving Bits at High Speed Waveform

In the figure above, the bits highlighted in red are the inputs that are coming from the USB read

component. We see that the inputs are 01, 10, 01, 10, 01, 10, 01, and 01. The bit count highlighted in

purple shows us which bit number we are at before we form the byte. The bits highlighted in yellow

represent the bytes that will be sent to the Byte Analyzer component. Notice that after we receive the

two 01’s, the byte being formed becomes an 80 which is 10000000. Thus we have successfully

received a byte that will be sent to the Byte Analyzer.
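The byte formation traced above can be mirrored in C: each received line state is compared with the previous one, a change decodes to 0 and no change decodes to 1, and the decoded bits fill the byte from the least significant bit upward. This is a sketch under the assumption that the state preceding the byte is J; nrzi_decode_byte is a hypothetical name, and 'K' and 'J' stand for the wire values 01 and 10.

```c
#include <assert.h>
#include <stdint.h>

/* Decodes eight NRZI line states into one byte, filling it LSB first.
 * states[i] is 'J' or 'K'; prev is the line state before states[0]. */
uint8_t nrzi_decode_byte(const char states[8], char prev)
{
    uint8_t value = 0;
    for (int i = 0; i < 8; ++i) {
        assert(states[i] == 'J' || states[i] == 'K');
        if (states[i] == prev)
            value |= (uint8_t)(1u << i);  /* no transition: bit is 1 */
        prev = states[i];                 /* transition: bit stays 0 */
    }
    return value;
}
```

The inputs 01, 10, 01, 10, 01, 10, 01, 01 correspond to the states KJKJKJKK, which decode to seven 0s followed by a 1, that is the byte 80 hex.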

Processing the Bytes

1-Receiving a Handshake Packet (ACK)

Here we will test the component that processes the bytes and forwards the information to the HC

(Host Controller). Below is a waveform showing an ACK handshake packet being received. The ACK

packet is made up of the following parts: SOP, PID, EOP where the SOP is 80,00,00,00 hex, the ACK

PID is D2 hex, and the EOP is FF.

Figure 36: Forming a Byte Waveform

In the figure above, highlighted in red we see that the Bit to Byte converter component signals the

Byte Analyzer component when it is giving it inputs by setting the processrxdatainwen bit to 1. Here

we see the inputs highlighted in yellow that are 80, 00, 00, 00, d2. Highlighted in purple, we see that,

after the byte has been analyzed, the ackrxed signal has been set to one so that higher level

components know that the ACK has just been received.

2-Receiving a DATA packet

In this test, we will be receiving a DATA packet that has two bytes in the data payload which are

11110000 and 00110000. Before receiving the data we need to receive the PID telling us that this is

in fact a data packet. After receiving the data we should expect to receive the CRC and the EOP.

Below is the list of bytes that we should receive to complete a Data packet.

Field      Value (hex)
SOP        80
PID        C3
Byte1      F0
Byte2      30
CRCByte1   BA
CRCByte2   5B

Figure 37: Processing Bytes that represent an ACK Waveform

Table 7: Bytes input into the Byte Analyzer component
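The two CRC bytes in the table can be checked independently in software. The sketch below uses the standard USB data CRC (polynomial x^16 + x^15 + x^2 + 1, processed LSB first, initial value FFFF hex, final inversion); the function name usb_crc16 is illustrative and the routine is not a transcription of the VHDL CRC component.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* USB CRC16 over a data payload. 0xA001 is the bit-reflected form of the
 * generator 0x8005. The low byte of the result is transmitted first. */
uint16_t usb_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int b = 0; b < 8; ++b) {
            if (crc & 1u)
                crc = (uint16_t)((crc >> 1) ^ 0xA001);
            else
                crc = (uint16_t)(crc >> 1);
        }
    }
    return (uint16_t)~crc;
}
```

For the payload F0, 30 the result is 5BBA hex, so the bytes on the wire are BA followed by 5B, matching CRCByte1 and CRCByte2 in the table.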

In the figure above, highlighted in red we see the bytes that we are receiving. Note that these bytes

are coming from the Bit to Byte Converter component. We see that we are receiving the correct bytes

that are present in the table above. The CRC being calculated is highlighted in yellow and we can see

that the CRC error does not become 1 and thus the CRC that was calculated is correct. The output

from this component is highlighted in purple and will be sent to upper level components, specifically

the HC. There are three things that should be looked at in the area highlighted in purple: rxdataout,

rxcontrolout and rxdataoutwen. We see that the rxdataoutwen becomes high 6 times and so we output

data 6 times.

The outputs are shown in the table below:

DataOut   ControlOut
0         Rx_packet_start ( 0 )
c3        Rx_packet_stream ( 1 )
f0        Rx_packet_stream ( 1 )
30        Rx_packet_stream ( 1 )
ba        Rx_packet_stream ( 1 )
5b        Rx_packet_stream ( 1 )
0         Rx_packet_stop ( 2 )

After the rx_packet_stop has been output to the upper layer, the transaction is complete and the

bytes have been successfully sent to the HC (Host Controller), which will use this information.

Figure 38: Processing Bytes that represent Data Waveform

Table 8: Processing Bytes that represent Data

5.4 Testing the USB Core on the FPGA

After testing the VHDL USB core, we proceeded with downloading the system on the FPGA and

testing it using the C code. We first had to add our VHDL USB core as a component in our hardware

system.

Note that an IP core for USB is not available in the EDK peripheral libraries; therefore after having

designed it ourselves, we had to import it into our project in XPS in order to be able to use it. To

achieve this target, we used the Create and Import Peripheral Wizard that guided us through the

design flow.

Within this wizard, we added our core peripheral as a slave device on the On-chip

Peripheral Bus (OPB), which is attached to the MicroBlaze soft-core processor on the FPGA.

Normally, our IP core would have needed an interface compliant with the OPB bus protocol.

However, EDK provides the Intellectual-Property Interface (IPIF) library, which offers a simplified bus

protocol called IP interconnect that is much easier to use than operating on the bus directly with

the OPB protocol.

Moreover, the Create and Import Peripheral wizard generates templates that take care of all the OPB

bus interface protocol and connection between IPIF and our code. In fact, this wizard generates 2 files

(among many others) in the pcores subdirectory under the project directory, one peripheral top-level

file which we don’t modify, and another called user-logic. In the user-logic file (VHDL), we added the

top-level wrapper of our USB core as a component and we port mapped each input and output of this

wrapper to a register. These registers are different from the ones described in the Design and Analysis

chapter. There are 11 of them, and they correspond to address_i, data_i, rst, we_i,

strobe_i, data_o, HostResumeIntOut, HostTransDoneIntOut, HostConnEventIntOut,

HostSOFSentIntOut and USB Speed. Note that, compared with the block diagram,

registers for clk, USBWireDataOut and USBWireDataIn are missing. The reason is that we connected

the clk directly to the system clock running at 100 MHz. As for the USBWireDataOut and

USBWireDataIn signals, we omitted these because we used the device simulator which was

implemented in VHDL. Thus, these signals will not interface to the physical USB port but instead they

are connected within the VHDL core to the device simulator.

The last step was to generate a file with the extension .pao (peripheral analysis order file), which

contains the HDL analysis information (dependent library files and HDL source files needed to compile the

peripheral, as well as the logical libraries those files will be compiled into).

After having added the IP core, in order to interact with the VHDL core from the software layer, all the

C code has to do is to write to and read from registers described above. This will enable it to send

input data to the core and read output data from the core. These registers are located within the

address space assigned to the core. The functions used to read and write to these registers in C code

are fairly simple.

In fact, the above mentioned wizard generates a C header file called HIGH_SPEED_USB_CORE.h (in

accordance with our core which is called HIGH_SPEED_USB_CORE) in which one can find the

functions used to read and write to the registers. This header provides many functions to choose from

in order to read and write to the registers. A prototype of the functions we chose is:

HIGH_SPEED_USB_CORE_mWriteSlaveRegX(BaseAddress, Value)

HIGH_SPEED_USB_CORE_mReadSlaveRegX(BaseAddress)

Where

X: The number of the register we would like to read from or write to.

BaseAddress: The base address of the address space assigned to the core.

Value: The value we would like to write to register X.

Our C code configures the USB Core before requesting that a transaction begin. To do this, it

assigns appropriate values to the inputs of the USB Core, and consequently to the registers to which

these inputs are assigned.

The C code is composed of 3 files: main.c, driver.c and HIGH_SPEED_USB_CORE.h. The last file is generated

by the wizard and contains a list of functions we can choose from to write to and read from registers (as

mentioned above), whereas the other 2 files were written by us and are as follows:

main.c

The file main.c simply calls the function HIGH_SPEED_USB_CORE_SETUP_TRANSACTION() which

is located in the driver.c file. It provides a higher level of abstraction to the user; the user will just have

to call a function without having to deal with writing to registers at the bit level granularity.

#include "xbasic_types.h" #include "xstatus.h" #include "xparameters.h" #include "xio.h" #include "xuartlite_l.h" #include "xuartlite.h" #include "stdio.h" #include "High_Speed_USB_Core.h" #define BASEADDR 0x77400000 int main(void) { print("-- Entering Main() --\r\n"); print("-- Call the function that starts a setup transaction --\r\n"); HIGH_SPEED_USB_CORE_SETUP_TRANSACTION( ); } driver.c #include "xbasic_types.h" #include "xstatus.h" #include "xparameters.h" #include "xio.h" #include "xuartlite_l.h" #include "xuartlite.h" #include "stdio.h" #include "signal.h" #include "High_Speed_USB_Core.h" #define baseaddr 0x77400000 int HIGH_SPEED_USB_CORE_SETUP_TRANSACTION(void ) { Xuint32 Reg32Value;

Page 68: Implementing a USB 2.0 Intellectual

68

xil_printf("**************************************************************\n\r "); xil_printf("First reset all the components\n\r "); //rst_i=1 HIGH_SPEED_USB_CORE_mWriteSlaveReg0(baseaddr, 1); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg0(baseaddr); xil_printf(" - wrote %d to rst_i\n\r", Reg32Value); //address_i=x"34" xil_printf(" Set the address equal to that of the Transmit Fifo\n\r "); HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 52); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr); xil_printf(" - wrote %d to address_i b\n\r", Reg32Value); //data_i=1 xil_printf(" Set the data to be sent to the Transmit Fifo equal to 1\n\r, so as to delete all data in the fifo\n\r"); HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 1); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr); xil_printf(" - wrote %d to data_i\n\r", Reg32Value); //we_i=1 HIGH_SPEED_USB_CORE_mWriteSlaveReg3(baseaddr, 1); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg3(baseaddr); xil_printf(" - wrote %d to we_i \n\r", Reg32Value); //strobe_i=1 HIGH_SPEED_USB_CORE_mWriteSlaveReg4(baseaddr, 1); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg4(baseaddr); xil_printf(" - wrote %d to strobe_i\n\r", Reg32Value); //address_i=x"24" xil_printf(" Set the address equal to that of the Receive Fifo \n\r, so as to delete all data in the fifo\n\r"); HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 36); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr); xil_printf(" - wrote %d to address_i \n\r", Reg32Value); //data_o_S; Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr); xil_printf(" - read %d from data_o\n\r", Reg32Value); //rst_i=0 HIGH_SPEED_USB_CORE_mWriteSlaveReg0(baseaddr, 0); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg0(baseaddr); xil_printf(" - wrote %d to rst_i\n\r", Reg32Value); xil_printf(" Write 0 to the TRANSREQ_PREEN_SOFSYNC -> No transaction required at present time\n\r", Reg32Value); //address_i=0 HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 0); Reg32Value = 
HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr); xil_printf(" - wrote %d to address_i b\n\r", Reg32Value); //data_i=0 =>no transaction required HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 0); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr); xil_printf(" - wrote %d to data_i\n\r", Reg32Value); xil_printf("Set the transaction type equal to SETUP\n\r", Reg32Value); //address_i=7=>TRANSACTION_TYPE HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 7); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr); xil_printf(" - wrote %d to address_i b\n\r", Reg32Value); //data_i=0 =>Setup transaction HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 0);

Page 69: Implementing a USB 2.0 Intellectual

69

Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr); xil_printf(" - wrote %d to data_i\n\r", Reg32Value); xil_printf("If it has processed the incoming data, it sends it back as output\n\r"); //data_o_S; Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg6(baseaddr); xil_printf(" - read %d from data_o\n\r", Reg32Value); xil_printf("Write 0 to the DEVICE_ADDRESS\n\r"); //address_i=5=>DEVICE_ADDRESS HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 5); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr); xil_printf(" - wrote %d to address_i b\n\r", Reg32Value); //data_i=0 HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 0); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr); xil_printf(" - wrote %d to data_i b\n\r", Reg32Value); //data_o_S; Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr); xil_printf(" - read %d from data_o\n\r", Reg32Value); xil_printf("Write 0 to the ENDPOINT_ADDRESS\n\r"); //address_i=6=>ENDPOINT_ADDRESS HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 6); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr); xil_printf(" - wrote %d to address_i b\n\r", Reg32Value); //data_o_S; Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr); xil_printf(" - read %d from data_o\n\r", Reg32Value); xil_printf("Write 1111 to the INTERRUPT_MASK\n\r"); //address_i=9=>INTERRUPT_MASK HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 9); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr); xil_printf(" - wrote %d to address_i b\n\r", Reg32Value); //data_i=15 HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 15); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr); xil_printf(" - wrote %d to data_i b\n\r", Reg32Value); //data_o_S; Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr); xil_printf(" - read %d from data_o\n\r", Reg32Value); //address_i=12 HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 12); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr); // xil_printf(" - wrote %d to address_i b\n\r", Reg32Value); //data_i=1010000 
HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 80); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr); // xil_printf(" - wrote %d to data_i b\n\r", Reg32Value); //data_o_S; Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr); // xil_printf(" - read %d from data_o\n\r", Reg32Value); xil_printf("Write 1 to the SOF_ENABLE\n\r"); //address_i=1=>SOF_ENABLE HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 1); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr);

Page 70: Implementing a USB 2.0 Intellectual

70

xil_printf(" - wrote %d to address_i b\n\r", Reg32Value); //data_i=1 HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 1); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr); xil_printf(" - wrote %d to data_i b\n\r", Reg32Value); //data_o_S; Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr); xil_printf(" - read %d from data_o\n\r", Reg32Value); xil_printf("Write 11110000 to the TX_FIFO_DATA\n\r"); //address_i=48=>TX_FIFO_DATA HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 48); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr); xil_printf(" - wrote %d to address_i b\n\r", Reg32Value); //data_i=11110000 HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 240); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr); xil_printf(" - wrote %d to data_i b\n\r", Reg32Value); //data_o_S; Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr); // xil_printf(" - read %d from data_o\n\r", Reg32Value); xil_printf("Write 00001111 to the TX_FIFO_DATA\n\r"); //data_i=00001111 HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 15); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr); xil_printf(" - wrote %d to data_i b\n\r", Reg32Value); //data_o_S; Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg5(baseaddr); // xil_printf(" - read %d from data_o\n\r", Reg32Value); xil_printf("Write 1 to the TRANSREQ_PREEN_SOFSYNC\n\r"); //address_i=0=>TRANSREQ_PREEN_SOFSYNC HIGH_SPEED_USB_CORE_mWriteSlaveReg1(baseaddr, 0); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg1(baseaddr); xil_printf(" - wrote %d to address_i b\n\r", Reg32Value); //data_i=1 HIGH_SPEED_USB_CORE_mWriteSlaveReg2(baseaddr, 1); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg2(baseaddr); xil_printf(" - wrote %d to data_i b\n\r", Reg32Value); xil_printf("Write 0 to we_i\n\r"); //we_i=0 HIGH_SPEED_USB_CORE_mWriteSlaveReg3(baseaddr, 0); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg3(baseaddr); xil_printf(" - wrote %d to we_i \n\r", Reg32Value); xil_printf("Write 0 to strobe_i\n\r"); //we_i=0 
HIGH_SPEED_USB_CORE_mWriteSlaveReg4(baseaddr, 0); Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg4(baseaddr); xil_printf(" - wrote %d to strobe_i \n\r", Reg32Value); Reg32Value==0; xil_printf("Wait till the transaction is done\n\r ", Reg32Value); while (Reg32Value==0) { Reg32Value = HIGH_SPEED_USB_CORE_mReadSlaveReg9(baseaddr);

Page 71: Implementing a USB 2.0 Intellectual

71

} xil_printf(" - read %d from Transaction Done interrupt bit\n\r", Reg32Value); xil_printf("Transaction completed with no errors ! "); return 0; }

The C code does the following. It first sends some

configuration information to the USB core, such as enabling automatic transmission of Start-Of-Frame,

the address of the device and its endpoint, among others. It also loads the Transmit FIFO with the data to be

sent as part of a data packet, then specifies the type of transaction required (in this case a setup

transaction) and exactly when the transaction should start.

Note that every time we write values to the address (address_i) and data (data_i) registers, we can

check the value in data_o, which should be equal to data_i if the VHDL has processed the address

and data correctly, meaning it is properly configured before transaction start.
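The write-then-check pattern described here repeats many times in driver.c, and it can be factored into one helper. The sketch below is an illustration only: the register numbers follow the driver listing (1 for address_i, 2 for data_i, 5 for data_o), while the usb_core_io hooks, usb_core_write_checked and the simulated register file are hypothetical names introduced so that the example compiles without the EDK headers. It also assumes we_i and strobe_i have already been set to 1, as the driver does beforehand.

```c
#include <assert.h>
#include <stdint.h>

/* Slave register numbers as used by the driver listing in this section. */
enum { REG_RST = 0, REG_ADDRESS = 1, REG_DATA_IN = 2,
       REG_WE = 3, REG_STROBE = 4, REG_DATA_OUT = 5 };

/* Hypothetical register-access hooks. On the FPGA they would wrap the
 * generated mWriteSlaveRegX / mReadSlaveRegX macros. */
typedef struct {
    void     (*write_reg)(void *ctx, int reg, uint32_t value);
    uint32_t (*read_reg)(void *ctx, int reg);
    void      *ctx;
} usb_core_io;

/* Writes one core register through address_i/data_i and checks the echo
 * on data_o. Returns 1 when data_o matches the value written, else 0. */
int usb_core_write_checked(const usb_core_io *io, uint32_t address, uint32_t data)
{
    assert(io != 0 && io->write_reg != 0 && io->read_reg != 0);
    io->write_reg(io->ctx, REG_ADDRESS, address);
    io->write_reg(io->ctx, REG_DATA_IN, data);
    return io->read_reg(io->ctx, REG_DATA_OUT) == data;
}

/* Simulated register file for illustration: data_o echoes data_i. */
void sim_write(void *ctx, int reg, uint32_t value)
{
    uint32_t *regs = (uint32_t *)ctx;
    regs[reg] = value;
    if (reg == REG_DATA_IN)
        regs[REG_DATA_OUT] = value;
}

uint32_t sim_read(void *ctx, int reg)
{
    return ((uint32_t *)ctx)[reg];
}
```

With such a helper, writing 15 to the INTERRUPT_MASK would be a single call with address 9 and data 15, and a return value of 0 would flag a configuration write that the core did not echo back.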

Finally we wait for the USB Core to set the Transaction Done bit, meaning that the transaction has

been completed. Below is the output we get on the HyperTerminal window on the computer screen.

This terminal is attached to the COM1 port, to which the FPGA sends data.
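The wait on the Transaction Done bit in driver.c loops until the bit is set, so it never returns if the core does not signal completion. A bounded variant is sketched below as one possible alternative, not as the project's method; read_done_fn, wait_trans_done, sim_done and the poll limit are hypothetical names and values introduced for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hook that returns the Transaction Done register value. */
typedef uint32_t (*read_done_fn)(void *ctx);

/* Polls until the Transaction Done value is non-zero or max_polls reads
 * have been made. Returns the number of reads used, or -1 on give-up. */
long wait_trans_done(read_done_fn read_done, void *ctx, long max_polls)
{
    assert(read_done != 0 && max_polls > 0);
    for (long polls = 1; polls <= max_polls; ++polls) {
        if (read_done(ctx) != 0)
            return polls;
    }
    return -1;
}

/* Simulated core for illustration: reports done after *ctx further reads. */
uint32_t sim_done(void *ctx)
{
    long *reads_left = (long *)ctx;
    if (*reads_left > 0) {
        --*reads_left;
        return 0;
    }
    return 1;
}
```

On the FPGA the hook would read the same slave register that the driver polls, and the caller could print an error instead of the success message when -1 is returned.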

-- Entering Main() --
-- Call the function that starts a setup transaction --
**************************************************************
First reset all the components
 - wrote 1 to rst_i
Set the address equal to that of the Transmit Fifo
 - wrote 52 to address_i b
Set the data to be sent to the Transmit Fifo equal to 1
, so as to delete all data in the fifo
 - wrote 1 to data_i
 - wrote 1 to we_i
 - wrote 1 to strobe_i
Set the address equal to that of the Receive Fifo
, so as to delete all data in the fifo
 - wrote 36 to address_i
 - read 0 from data_o
 - wrote 0 to rst_i
Write 0 to the TRANSREQ_PREEN_SOFSYNC -> No transaction required at present time
 - wrote 0 to address_i b
 - wrote 0 to data_i
Set the transaction type equal to SETUP
 - wrote 7 to address_i b
 - wrote 0 to data_i
If it has processed the incoming data, it sends it back as output
 - read 0 from data_o
Write 0 to the DEVICE_ADDRESS
 - wrote 5 to address_i b
 - wrote 0 to data_i b
 - read 0 from data_o
Write 0 to the ENDPOINT_ADDRESS
 - wrote 6 to address_i b
 - read 0 from data_o
Write 1111 to the INTERRUPT_MASK
 - wrote 9 to address_i b
 - wrote 15 to data_i b
 - read 15 from data_o
Write 1 to the SOF_ENABLE
 - wrote 1 to address_i b
 - wrote 1 to data_i b
 - read 1 from data_o
Write 11110000 to the TX_FIFO_DATA
 - wrote 48 to address_i b
 - wrote 240 to data_i b
Write 00001111 to the TX_FIFO_DATA
 - wrote 15 to data_i b
Write 1 to the TRANSREQ_PREEN_SOFSYNC
 - wrote 0 to address_i b
 - wrote 1 to data_i b
Write 0 to we_i
 - wrote 0 to we_i
Wait till the transaction is done
 - read 1 from Transaction Done interrupt bit
Transaction completed with no errors !

We wrote the C code for the test case described above. As for the other cases of transactions that we

had tested in simulation, we did not implement them because the FSM would have to change for each

case since this deals with information at the bit level. For example, for an IN transaction, if the device

wishes to send a token packet followed by a data packet that has 10 bytes in its data field then this

would require that we simulate the sending of around 1000 bits to run the test case. Note that

these bits need to be 100% correct or else the VHDL core would not work. For example, if one

mistake is made in a bit of the PID field and the PID is invalid, then the whole test case fails. In any

case, since the system worked on the FPGA for the case we tested, and it worked for the remaining

three cases of transactions in the VHDL simulator, it is expected to work on the FPGA for all other

cases.

6.0 Conclusion

6.1 Difficulties Faced

Throughout our work on the Final Year Project we faced many difficulties and problems, some of

which were mentioned throughout the report. The table below provides a summary of these along with

the alternative solutions we found to overcome them:

Difficulties and Alternatives

Difficulty: Understanding and implementing the whole USB protocol.
Alternative: We implemented the parts of the USB protocol that were relevant to our project. In many cases we did not implement parts of the specification. An example is support of split transactions that are required in isochronous transfers.

Difficulty: We had planned to link our core to the physical USB port on the FPGA but, after having implemented most of the VHDL design, we noticed that the USB port has a parallel interface instead of a serial interface; as a result we could not test our code with an actual physical device.
Alternative: We simulated a USB device as part of a SETUP transaction required by the software layer; we implemented a finite state machine which, upon correctly receiving all the bits it should receive in the scope of a SETUP transaction, responds with an ACK.

Difficulty: The maximum frequency at which the internal FPGA clock runs is 100 MHz, whereas our High Speed Host Controller Core needs to run at 960 MHz.
Alternative: We had to readjust our core to 100 MHz to make it work on the FPGA. This meant that signaling to the device simulator was slower than required by the USB specification.

Difficulty: Regulating the timing between commands in the C code at the software layer, because both the USB Core and its test bench are very sensitive to timing.
Alternative: By testing with several timing patterns on the FPGA, we managed to get the correct timing.

Table 9: Difficulties

Other than the problems listed in the table above, one of the major problems that we faced while

making the core low, full and high speed compatible is that the high speed was too fast for us to

handle. The high speed frequency is 960 MHz whereas the full speed is 12 MHz. Our receiver, which

is the component responsible for reading the USB wire, can only handle a certain speed. Let us look

at a waveform to be able to understand the situation.

Highlighted in red is the rate at which we take in bits. Highlighted in yellow is the rate at which we output the bits. Clearly, we are taking in bits at a faster rate than we are outputting them. The output is slow because each bit has to go through 3 machine states before it is output, and thus we can output only one bit every 3 rising edges of the main clock. This problem can be approached using two different methods. The first method is adding buffers. We tried several buffer counts and eventually chose 64 buffers. Let us do the analysis: we take in one bit every 2.0833 ns and we output one bit every 3.125 ns. At those rates, once the buffers become full, data that has been written to a buffer but not yet output is overwritten by newer data, and the old data is lost.

Figure 39: High Speed Rate Problem

The table below shows the times at which we take in inputs and output them, and the number of buffers in use.

Time (ns) | Take in new input | Output | Buffer counter
2.0833 | Here | | 1
3.125 | | Here | 0
4.1666 | Here | | 1
6.2499 | Here | | 2
6.25 | | Here | 1
8.3332 | Here | | 2
9.375 | | Here | 1
10.4165 | Here | | 2
12.4998 | Here | | 3
12.5 | | Here | 2
14.5831 | Here | | 3
15.625 | | Here | 2
16.6664 | Here | | 3
18.7497 | Here | | 4
18.75 | | Here | 3
20.833 | Here | | 4

Table 10: High Speed Rate Problem

As we can see from the table above, the buffers fill up fast and it will not take long until we start overwriting data. If we have 64 buffers, the maximum number of sequential bits X that we can receive is found by solving

X - X (2.0833 / 3.125) = 64 => X = 192 inputs

which is obviously not enough, since we must receive thousands of bits in sequence. Thus we should either increase the number of buffers or look for other solutions. Increasing the number of buffers would increase the FPGA area used, the power consumption and so on, and is not considered a good solution. We thus analyze another solution, which makes the component output data at a faster rate.

The second approach speeds up the rate at which we output the results. Since we are constrained by the fact that we have to go through three machine states to be able to output, the best solution would be to make it possible to move from one state to the next on both the rising edge and the falling edge, so that it takes us less time to output the data. The output period would then go down from 3.125 ns to 1.5625 ns, and thus we would be able to output at a faster rate than our input. Buffers will not be necessary in this solution.
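The two approaches can be compared with a small simulation. The sketch below is an assumption-laden model rather than part of our design: time is counted in integer ticks of half a 960 MHz clock period, so an input period of 2.0833 ns is 4 ticks, an output period of 3.125 ns is 6 ticks and an output period of 1.5625 ns is 3 ticks. When an input and an output fall on the same tick, the input is handled first, as in the table. The function name first_overflow_input and its parameters are hypothetical.

```c
#include <assert.h>

/* Returns the index (counting from 1) of the first input bit that
   arrives while all buffers are still occupied, or 0 if no overflow
   occurs within max_inputs input bits. */
long first_overflow_input(int in_ticks, int out_ticks,
                          int capacity, long max_inputs)
{
    long inputs = 0;    /* bits stored so far */
    int occupied = 0;   /* buffers holding bits not yet output */

    assert(in_ticks > 0 && out_ticks > 0 && capacity > 0);

    for (long t = 1; inputs < max_inputs; t++) {
        if (t % in_ticks == 0) {
            if (occupied == capacity) {
                return inputs + 1;   /* this bit would overwrite unread data */
            }
            occupied++;
            inputs++;
        }
        if (t % out_ticks == 0 && occupied > 0) {
            occupied--;              /* one bit leaves the buffers */
        }
    }
    return 0;
}
```

Under these assumptions, first_overflow_input(4, 6, 64, 100000) returns 192: the backlog grows by one bit for every three inputs, so 64 buffers overflow at the 192nd bit. With the double-edge output period, first_overflow_input(4, 3, 64, 100000) returns 0, because each bit is output before the next one arrives.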


6.2 Future Work

Our USB Core functions properly with respect to the main deliverables, but it still has room for improvement. We were not able to attempt the following suggestions due to time constraints.

Future work may involve the following:

• Implementing the whole USB protocol with all its details.
• Finding a way to have the internal clock of the FPGA equal to the USB clock needed for high speed (960 MHz). For example, this could be achieved by working on a faster processor on the FPGA.
• Implementing a USB Device Core and trying to attach our USB Host Controller Core to it or, as a second option, designing a simple board with only a USB physical port on it, which can be attached to the Virtex development board through the P160 expansion slot.
• Developing the 3 remaining transaction cases in C code and generating libraries that abstract the Host Controller Driver Layer and implement the layers above it.

6.3 Design Constraints

FPGAs give us flexibility at the cost of performance. Since the FPGAs were already available in the AUB labs, economic constraints do not apply to us. Even so, implementing the USB core on an FPGA would be more expensive than implementing it on another chip that can be mass manufactured. FPGAs are devices that can stay operational for many years. However, the technology is evolving rapidly, and FPGA designs along with it. The Virtex-II board that we used has already been succeeded by two newer versions, so we expect the FPGA that we are using to become obsolete in about a decade; sustainability is therefore a major issue in our design. Furthermore, a new USB specification might come out, in which case a newer core would need to be designed.

