Arindam Goswami, Eric Huneke, Mert Ustun
Advanced Embedded Systems Architecture
Spring 2011
HW/SW Implementation of JPEG Decoder
Division of Labor
Software
  Profiling - Arindam/Eric
  Timing analysis - Arindam/Eric
  Interface to hardware - Arindam
  Test data for hardware - Eric
Hardware - Mert
  C to Verilog conversion
  Scheduling & resource allocation on FPGA
  Bus communication interface
Outline
What is JPEG?
Project Description
JPEG Algorithm
Profile Data
Software Design
Hardware Design
Results
Conclusion
What is JPEG?
Image codec released by the Joint Photographic Experts Group in 1992
  Joint committee between the ISO/IEC JTC1 and ITU-T standards committees
Informally used to describe the file format JPEG-encoded images are packed in
  The file format specified in the original standard, JPEG Interchange Format (JIF), is rarely used
  Exif or JFIF, both based on JIF, are commonly used
What is JPEG? (cont.)
Optimized for realistic images and photographs
  Color transitions should be smooth for best results
Lossy compression, which can be tuned to produce compressions of varying quality and size
  Up to 20:1 without visible loss in quality for appropriate images
  Better ratios than other formats such as GIF, but slower to compress and decompress
Has a lossless mode, but it is not widely used
Project Description
Selected an existing software JPEG implementation that we could modify to increase performance
Criteria
  Small enough to be easily understood and modified
  Reasonably fast, but not optimized
Project Description (cont.)
The most common JPEG implementation out there is libjpeg, from the Independent JPEG Group
  Fast, but hard to modify due to complexity
Various other open source implementations
  Tiny Jpeg Decoder
  jpeg-compressor
Project Description (cont.)
We ended up choosing NanoJPEG, written by Martin Fiedler
  Reasonably fast, but not optimized
  Very small code size (< 1000 lines) in a single file
  Easy to understand
I/O
  Decompresses grayscale or YCbCr images
  Outputs grayscale or RGB raw images
Other details
  Written in C
  No floating point
JPEG Algorithm
Step 1
Convert the image to the YCbCr color space (typically from RGB)
  Y for brightness
  Cb and Cr for blue and red color components
The human eye is less sensitive to color changes than it is to brightness changes
  JPEG takes advantage of this
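As a rough illustration of Step 1, the conversion can be done entirely in integer arithmetic. The sketch below assumes the full-range JFIF (BT.601) weights scaled by 65536; the function name rgb_to_ycbcr and the fixed-point scaling are illustrative choices and are not taken from NanoJPEG, which only decodes.

```c
#include <stdint.h>

/* Clamp an intermediate result to the 0..255 range of an 8-bit sample. */
static uint8_t clamp_u8(int32_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical helper: convert one RGB pixel to YCbCr.
   Coefficients are the JFIF weights multiplied by 65536 (assumed scaling).
   32768 is added for rounding, and the +128 chroma offset is folded in
   before the shift so the shifted value is never negative. */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    const int32_t half = 32768;
    const int32_t bias = 128 * 65536;

    *y  = clamp_u8(( 19595 * r + 38470 * g +  7471 * b + half) >> 16);
    *cb = clamp_u8((-11059 * r - 21709 * g + 32768 * b + bias + half) >> 16);
    *cr = clamp_u8(( 32768 * r - 27439 * g -  5329 * b + bias + half) >> 16);
}
```

Any gray pixel (R = G = B) maps to Cb = Cr = 128, which is why the chroma planes of a photograph carry comparatively little information.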
JPEG Algorithm (cont.)
Step 2
Downsample the color data (CbCr) by averaging neighboring samples horizontally and vertically
  Factor of two horizontally (along rows)
  Factor of one or two vertically (along columns)
  Data can thus be reduced by 1/3 or 1/2
Imperceptible loss in quality
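A minimal sketch of the factor-of-two case in both directions, assuming one chroma plane stored row by row with even width and height. The function name downsample_2x2 and the rounding choice are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: average every 2x2 group of chroma samples into one
   output sample. src holds width*height samples; dst must hold
   (width/2)*(height/2) samples. width and height are assumed to be even. */
void downsample_2x2(const uint8_t *src, int width, int height, uint8_t *dst)
{
    int out_w = width / 2;

    for (int y = 0; y < height / 2; y++) {
        for (int x = 0; x < out_w; x++) {
            const uint8_t *p = src + (size_t)(2 * y) * width + 2 * x;
            int sum = p[0] + p[1] + p[width] + p[width + 1];
            dst[y * out_w + x] = (uint8_t)((sum + 2) / 4); /* rounded mean */
        }
    }
}
```

Applied to both Cb and Cr, this keeps one quarter of each chroma plane, so the three planes together shrink from 3 units of data to 1.5, i.e. a reduction by 1/2.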
JPEG Algorithm (cont.)
Step 3
For each component, split the pixel data into 8x8 blocks
Run each block through a discrete cosine transform (DCT)
End up with a matrix containing one DC value and 63 AC components
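For reference, the 8x8 forward DCT is commonly written as follows. The symbols f, F and C are notation chosen here, not taken from the slides: f(x, y) is a (level-shifted) input sample, F(u, v) a frequency coefficient, and F(0, 0) is the DC value.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7}
         f(x,y)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right],
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
```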
JPEG Algorithm
Step 4
Divide each cell of the matrix by the corresponding value in a quantization matrix, then round to the nearest integer
The values in the quantization matrix are customizable
  The larger the values, the more cells are reduced to zero, and hence lost
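The divide-and-round step, and the multiplication a decoder later uses to undo it, might look like the sketch below. The function names, the flat 64-entry layout and the tie-breaking rule (away from zero) are assumptions made for illustration.

```c
/* Integer division rounded to the nearest integer (ties away from zero).
   Assumes q > 0. */
static int div_round(int value, int q)
{
    return value >= 0 ? (value + q / 2) / q : -((-value + q / 2) / q);
}

/* Hypothetical encoder-side step: quantize 64 DCT coefficients. */
void quantize_block(const int coef[64], const unsigned char qtab[64], int out[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = div_round(coef[i], qtab[i]);
}

/* Hypothetical decoder-side step: scale the quantized values back up.
   The rounding error introduced by quantize_block is not recovered. */
void dequantize_block(const int quant[64], const unsigned char qtab[64], int out[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = quant[i] * qtab[i];
}
```

For example, a coefficient of -415 with a table entry of 16 quantizes to -26 and comes back as -416, while a coefficient of 4 with a table entry of 10 quantizes to 0 and is lost.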
JPEG Algorithm (cont.)
Step 5
Take the reduced blocks and perform Huffman encoding (or arithmetic encoding) to eliminate redundant values
  Lossless compression
Step 6
Wrap data in a standard file format, along with compression data including quantization and Huffman tables
JPEG Algorithm (cont.)
Decoding is simply the reverse of the encoding process
  Get the reduced matrices back
  Multiply them by the quantization matrix
  Run an inverse DCT (IDCT)
  Upsample
  Convert to RGB
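The last decoding step can be sketched without floating point. The coefficients below are the JFIF conversion weights scaled by 256 (an assumed scaling), and ycbcr_to_rgb is a hypothetical name; this is not presented as NanoJPEG's actual conversion routine.

```c
#include <stdint.h>

/* Clip an intermediate result to the 0..255 range of an 8-bit channel. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical helper: convert one YCbCr sample triple to RGB.
   Division is used instead of a right shift so that negative
   intermediates stay well defined; they are clipped to 0 afterwards. */
void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                  uint8_t *r, uint8_t *g, uint8_t *b)
{
    int yy = y * 256 + 128;  /* luma scaled by 256, plus rounding term */
    int u  = cb - 128;       /* centered blue-difference component */
    int v  = cr - 128;       /* centered red-difference component */

    *r = clip_u8((yy + 359 * v) / 256);
    *g = clip_u8((yy - 88 * u - 183 * v) / 256);
    *b = clip_u8((yy + 454 * u) / 256);
}
```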
Profile Data
Profiled NanoJPEG on a sample image with the armsd simulator
55.10% of total time spent upsampling and converting the image to RGB
  Logically separate from the decode phase
38.34% of total time spent decoding the 8x8 blocks
  So really 85.39% of the time not spent converting/upsampling
Row and column IDCTs were about half of the block decode time
  Our main focus for speedup, since they took about 42% of decode time and were an obvious candidate for FPGA implementation
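The percentages above fit together as follows. This is a back-of-the-envelope check that assumes "decode time" means the 44.90% of run time not spent upsampling and converting; the last line is an Amdahl-style upper bound on the decode-phase speedup if the IDCTs cost nothing at all, and is not a figure from the slides.

```latex
\begin{align*}
\frac{38.34}{100 - 55.10} &= \frac{38.34}{44.90} \approx 0.8539
  && \text{(block decoding as a share of the decode phase)} \\
0.42 \times 44.90\% &\approx 18.9\%
  && \text{(IDCT share of total run time, close to } 38.34\% / 2 \text{)} \\
S_{\max} &= \frac{1}{1 - 0.42} \approx 1.72
  && \text{(decode-phase speedup bound with free IDCTs)}
\end{align*}
```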
Software Design
Block decoding code
Row and column IDCT calls
Software Design
Row IDCT
Column IDCT
Software Design
Interface
  Write 8x8 integers to FPGA addresses D3000100-1FF
  Read 8x8 integers from D3000200-2FF (output of RowIDCT)
  Read 8x8 bytes from D3000300-33F (output of ColIDCT)
Code
  Replace calls to IDCT functions with reads/writes to FPGA addresses
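The interface above could be driven by something like the following sketch. Only the three address windows come from the slide; the function name fpga_idct_block, the packing of four output bytes per 32-bit word (lowest byte first) and the absence of any ready/handshake check are assumptions. The base pointer is passed in so the routine can also be exercised against an ordinary RAM buffer.

```c
#include <stdint.h>

#define FPGA_BASE_ADDR 0xD3000000u /* base of the windows named on the slide */
#define BLOCK_IN_OFF   0x100u      /* 64 x 32-bit input coefficients */
#define ROW_OUT_OFF    0x200u      /* 64 x 32-bit row IDCT results */
#define COL_OUT_OFF    0x300u      /* 64 bytes of column IDCT results */

/* Hypothetical replacement for the software IDCT calls.
   row_out may be NULL if the intermediate row results are not needed. */
void fpga_idct_block(volatile uint8_t *base, const int32_t in[64],
                     int32_t *row_out, uint8_t out[64])
{
    volatile int32_t  *blk = (volatile int32_t *)(base + BLOCK_IN_OFF);
    volatile int32_t  *row = (volatile int32_t *)(base + ROW_OUT_OFF);
    volatile uint32_t *col = (volatile uint32_t *)(base + COL_OUT_OFF);
    int i, k;

    for (i = 0; i < 64; i++)            /* 64 bus writes */
        blk[i] = in[i];

    if (row_out != NULL_ROW_OUT_GUARD(row_out))
        for (i = 0; i < 64; i++)        /* 64 bus reads */
            row_out[i] = row[i];

    for (i = 0; i < 16; i++) {          /* 16 bus reads, 4 bytes each */
        uint32_t w = col[i];
        for (k = 0; k < 4; k++)         /* assumed order: lowest byte first */
            out[4 * i + k] = (uint8_t)((w >> (8 * k)) & 0xFFu);
    }
}
```

On the target, the call would be made with `(volatile uint8_t *)FPGA_BASE_ADDR` as the first argument.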
Hardware Design - Architecture
[Block diagram: AMBA BUS, BUS COMM. IF, 8x8x32b BLOCK Register File, IDCT CORE (ROW IDCT, COL IDCT), 8x8x8b COL_OUT Register File]
1. ARM writes row 0
2. Row IDCT: row 0 (ARM writes row 1)
3. …
4. Row IDCT: row 7 (ARM reads row 0)
5. Col IDCT: col 0 - 7 (ARM reads rest of the block)
6. ARM reads colIDCT results
Hardware Design - Optimizations
Register Files are used instead of RAMs to allow random access to any word in the block matrix
Arithmetic operations were distributed in multiple stages to share resources and therefore reduce area
Column IDCT and Row IDCT have a lot of common operations
  Use only a single datapath for both: the Core IDCT
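The reason one datapath can serve both passes is that the 2-D inverse DCT is separable into identical 1-D transforms. In the notation chosen here (not from the slides), F(u, v) are the dequantized coefficients and f(x, y) the reconstructed samples:

```latex
\begin{align*}
f(x,y) &= \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v)\,
          \cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
          \cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right] \\
       &= \frac{1}{2} \sum_{u=0}^{7} C(u)
          \cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
          \left( \frac{1}{2} \sum_{v=0}^{7} C(v)\, F(u,v)\,
          \cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right] \right),
\qquad
C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
\end{align*}
```

The inner sum is one 1-D IDCT along one axis and the outer sum is the same 1-D IDCT along the other, which gives the 8 row and 8 column transforms per block.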
Hardware Design – Core IDCT
Row IDCT
Column IDCT
Hardware Design – Optimizations (2)
The hardware speed is limited by the ARM-FPGA bus transactions (block transfers)
Optimize bus state machine:
  Started with the 6-state bus machine of Lab 2
  Reduced it to only 3 states!
Total # of FPGA cycles to process one 8x8 block:
  3 x (64 writes + (64 + 16) reads) = 432 cycles
  432 cycles for 8 row and 8 column IDCTs
Results
Hardware produces correct outputs in simulation
Integrated system does not yet match simulation
Communication overhead between ARM and FPGA is the major bottleneck
Expected speed-up:
  ARM: 8 x 60 + 8 x 120 = 1440 ARM cycles (optimistic approximation)
  FPGA: 3 x (64 writes + (64 + 16) reads) = 432 FPGA cycles
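The two cycle counts can be reproduced with a few lines of arithmetic. The helper names are hypothetical, and the comparison is in raw cycle counts only: the slides do not state the ARM and FPGA clock rates, so 1440 / 432 (about 3.3) is a wall-clock ratio only under the assumption that both run at the same frequency.

```c
/* FPGA cycles per 8x8 block when every bus transfer costs `states`
   cycles of the bus state machine. */
unsigned fpga_cycles_per_block(unsigned states, unsigned writes, unsigned reads)
{
    return states * (writes + reads);
}

/* ARM cycles per 8x8 block for 8 row IDCTs and 8 column IDCTs, given an
   estimated cycle cost for one row and one column transform. */
unsigned arm_cycles_per_block(unsigned row_cycles, unsigned col_cycles)
{
    return 8u * row_cycles + 8u * col_cycles;
}
```

With the figures above, `fpga_cycles_per_block(3, 64, 64 + 16)` gives 432 and `arm_cycles_per_block(60, 120)` gives 1440; the original 6-state bus machine would have cost 864 cycles for the same transfers.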
Conclusion
Work completed
  Parallelized IDCT routines for each block decode in FPGA
Work to be completed
  Get interface working
What we would have done differently
  Use DMA to reduce communication overhead even more
  Parallelize ARM and FPGA block processing
  Additional speed-up possible by moving njConvert (upsampling & color conversion) into FPGA
References
Joint Photographic Experts Group http://www.jpeg.org/jpeg/index.html
Introduction to JPEG http://www.faqs.org/faqs/compression-faq/part2/
NanoJPEG http://keyj.s2000.ws/?p=137
Questions?