algorithms and computation in signal processing
TRANSCRIPT
![Page 1: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/1.jpg)
Algorithms and Computation in Signal Processing
special topic course 18-799Bspring 2005
1st Lecture Jan. 11, 2005
Instructor: Markus PueschelTA: Srinivas Chellappa
![Page 2: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/2.jpg)
Motivation and Idea behind this Course
![Page 3: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/3.jpg)
The Problem: Example DFT on Pentium 4
Intel vendor library(hand-optimizedassembly code)but also FFTW, SPIRAL generated code
10x
DFT size
reasonableimplementation(Numerical recipes.GNU scientific library)
Ok, but the DFT is kind of complicated,so let’s take something simpler …
![Page 4: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/4.jpg)
The Problem: Matrix-matrix Multiplication
1000 2000 3000 4000 5000Size
1000
2000
3000
4000
5000
MFLOPS
Compiler
Model
CGw�S
Unleashed
BLAS
Matrixsize
standard: triple loop +compiler optimization
vendor library(handoptimizedassembly code)
60xATLAS generated code
Why is that?graph: Pingali, Yotov, Cornell U.
![Page 5: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/5.jpg)
Moore’s LawMoore’s Law: exponential (x2 in ~18 months) increase number of transistors/chip
But everything has its price …
sour
ce: S
cient
ific A
mer
ican,
Nov
200
4, p
. 98
![Page 6: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/6.jpg)
Moore’s Law: ConsequencesComputers are very complex
multilevel memory hierarchyspecial instruction sets beyond standard C programming modelundocumented hardware optimizations
Consequences:Runtime depends only roughly on the operations Runtime behavior is hard to understandCompiler development can hardly keep trackThe best code (and algorithm) is platform-dependentIt is very difficult to write really fast code
Computers evolve fastHighly tuned code becomes obsolete almost as fast as it as written
![Page 7: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/7.jpg)
What about the Future?
It gets rather worse: End of Moore’s Law and proliferation of multicore systems
Scientific American, Nov. 2004: “A Split at the Core,”subtitle: “[…] that is bad news for the software companies”
Dr. Dobb's Journal, 30(3), March 2005: “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software”
How to produce really fast code?and with reasonable effort?
![Page 8: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/8.jpg)
Current Research: New Approaches to Software
Linear Algebra: LAPACK, ATLAS BeBOP
Tensor Computations (Quantum Chemistry): Sadayappan, Baumgartner et al. (Ohio State)Finite Element Methods: Fenics (U. Chicago)Signal Processing:
FFTW SPIRALVSIPL (Kepner, Lebak et al., MIT Lincoln Labs)
New Compiler Techniques (Domain aware/specific):Model-based ATLAS Broadway (Lin, Guyer, U. Texas Austin)SIMD optimizations (Ueberhuber, Univ. Techn. Vienna)Telescoping Languages (Kennedy et al., Rice)
See also upcoming Proceedings of the IEEE special issueon “Program Generation, Optimization, and Adaptation,”http://www.ece.cmu.edu/~spiral/special-issue.html
![Page 9: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/9.jpg)
Possible Philosophy?
Present Future
ImplementationLevel
starting point: one algorithm/programhigh level information destroyedimplementation restricted
AlgorithmLevel
algorithm/implementation codesign
domain knowledgeused for optimization
auto
mat
ion
a u
t o m
a t
i o n
a new breed of domain-aware approaches/toolspush automation beyond what is currently possible
applies for software and hardware design alike
![Page 10: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/10.jpg)
Idea of this CourseWriting fast numerical code requires multidisciplinary knowledge of algorithms, programming/compilers, and computer architecture
Study the interaction of algorithms, implementation, and architecture at hand of cutting edge adaptive numerical softwareLearn a guideline how to write fast code and apply it in a research project
programming/compilersalgorithms
fastcodecomputer
architecture
![Page 11: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/11.jpg)
Course TopicsFoundations of algorithm analysis
cost and complexity, O-calculus, cost analysis through recurrences
Computer architecturearchitecture and microarchitecture, memory hierarchy/caches, execution units, special instruction sets (in particular, short vector instructions)
Compilersstrengths, limitations, guidelines for use
In detail: algorithms, complexity, and cutting edge adaptive software (extract design principles)
Discrete Fourier transform, other transforms, correlation, filters (FFTW, SPIRAL)Matrix-matrix multiplication (ATLAS) and possibly other linear algebra functionality (LAPACK)Sparse linear algebra (BeBOP)other as time permitswork towards a guideline for writing fast numerical codeapply that guideline in your research project
![Page 12: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/12.jpg)
About this CourseRequirements
solid C programming skillsmatrix algebrasenior or above
Grading50% research project20% midterm20% homework10% class participation
No textbookOffice Hours: yet to be determinedWebsite: www.ece.cmu.edu/~pueschel -> teaching -> 18-799B
![Page 13: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/13.jpg)
Research Project
Team up in pairs (preferably)
Topic: Very fast, ideally adaptive, implementation of (or code generation for) a numerical problem
End of January/early February: suggest to me a problem or I give you a problem
Show “milestones” during semester
Write 4 page standard conference paper (template will be provided)
Give short presentation end of April
![Page 14: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/14.jpg)
Midterm
Mostly about algorithm analysis
Some multiple-choice
There is no final exam
Final Exam
![Page 15: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/15.jpg)
Homework
Exercises on algorithm analysis (Math)
Implementation exercises study the effect of program optimizations, use of compilers, use of special instructions, etc. (Writing C code + creating runtime/performance plots)some templates will be provided
Probably: More homework in the beginning, less in the end
![Page 16: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/16.jpg)
Classes/Class Participation
I’ll start on time, duration ~1:30 (without break)be on time, it’s good style
It is important to attendmany things I’ll teach are not in booksI’ll use part slides part blackboard
Ask questions
I will provide some anonymous feedback mechanism (maybe every 3-4 weeks)
![Page 17: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/17.jpg)
Motivation from the Applications Side: Signal Processing
![Page 18: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/18.jpg)
Definitions
Definition: Signal Processing[The discipline that is concerned with] the representation, transformation, and manipulation of signals and the information they contain (Oppenheim, Schafer 1999)
Definition: Signal(In signal processing) A function over an index domain
Typical examples:
digital signal processing
![Page 19: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/19.jpg)
Examples
MultimediaSpeech (1-D), Image (2-D), Video (3-D)Quality improvement, compression, transmission
BiometricsMedical/BioimagingComputer visionCommunication
![Page 20: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/20.jpg)
Multimedia: Example Image Compression
Lena Pepper Baboon
512 × 512 × 3 bytes = 768KBWith JPEG, ~32KB
![Page 21: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/21.jpg)
JPEG: How does it Work?8 x 8 pixel block
2-D DCT
quantization(lossy)
entropy coding(lossless)
bit stream
![Page 22: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/22.jpg)
JPEG versus JPEG2000
original: 3MB
JPEG: 19KB (DCT based)
JPEG2000: 19KB (wavelet based)
![Page 23: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/23.jpg)
Multimedia Coding
MPEG-I to MPEG-IVIncludes standards for audio, image and VideoExample: MPEG-II, layer III audio = MP3
Temporal domain Frequency domain Data compression
Analysis filterbankAnalysis filterbank
Perceptual modelPerceptual model
Quantization and coding
Quantization and coding Bitstream encodingBitstream encoding
Digitalaudio Bitstream
transforms: DFT, MDCT, DCT
![Page 24: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/24.jpg)
Example: Biometrics
Facial Expression
FingerprintsIllumination
Source: Bhagavatula/Savvides
![Page 25: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/25.jpg)
How does it Work?: Registration
Training Training Images Images captured captured by by cameracamera
Filter Design Filter Design ModuleModule
Correlation Correlation Filter H Filter H (Template)(Template)
Frequency Frequency Domain arrayDomain array
Frequency Frequency Domain arrayDomain array
Frequency Frequency Domain arrayDomain array
FFTFFT
FFTFFT
FFTFFT
N x N pixelsN x N pixels N x N pixels (complex)N x N pixels (complex)
N x N pixels N x N pixels (complex)(complex)
Source: Bhagavatula/Savvides
![Page 26: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/26.jpg)
How does it Work?: Identification
Test Image Test Image captured captured by by cameracamera
Correlation Correlation Filter H Filter H (Template)(Template)
Frequency Frequency Domain arrayDomain array
FFTFFT
N x N pixelsN x N pixels
N x N pixelsN x N pixelsResulting Resulting Frequency Frequency Domain arrayDomain array
IFFTIFFT
PSRPSR
Source: Bhagavatula/Savvides
![Page 27: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/27.jpg)
longitudinal slices
Example: Cardiac MRI
short-axis
long-axis
transversal slices
non-tagged
tagged
Goal: 3D-movie from 2D dataSource: Hsien/Moura
![Page 28: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/28.jpg)
3-D Motion Estimation Procedure
Estimate 2-Ddense disp.
Establish spatial correspondences
Track temporal correspondences
Y. Sun, Y.L. Wu, K. Sato, C. Ho, and J.M.F. Moura, Proc. Annual Meeting ISMRM 2003
PreprocessingCreating a
fibrous architecture
Adopting continuum mechanics
Minimizing constrained
energy
U2D
U
Source: Hsien/Moura
![Page 29: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/29.jpg)
Example: MRI
Kidney tracking
MRI
Compensation for motionSource: Sun/Moura
![Page 30: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/30.jpg)
Example: Bioimaging
Classification
This is Tubulin!
Segmentation
Goal: automatic, fast, reliable identification of proteins from their distribution in the cellSignal processing
SegmentationClassifikation (Wavelets, Frames)
Source: Kovacevic/Murphy
![Page 31: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/31.jpg)
Images
Source: Kovacevic/Murphy
![Page 32: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/32.jpg)
Example: Computer Vision
Suberbowl 2001 (Kanade et al.)
Plot: Kanade
![Page 33: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/33.jpg)
Example: Communication
Goal: Robustness to losses in transmission
Transformcoder
TransformdecoderNetwork X
Source: Kovacevic
![Page 34: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/34.jpg)
Photo-to-Grandma ProblemGoal: send a digital photo to ItalyAvailable: FedEx or regular “post channel”
FedEx 99% reliable, cost $39.99Postal 80% reliable, cost $3.40
1 floppy per envelope onlyPhoto needs 2 floppies (CDs haven’t been invented yet)
new girlfriend Grandma lives in Italy
Source: Kovacevic
![Page 35: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/35.jpg)
Heterogenous Channel (Dumb Solution)
0.792
0.198
0.008
0.002
cost fixed to$43.39
Source: Kovacevic
![Page 36: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/36.jpg)
Heterogenous Channel (Smart Solution)
0.792
wavelets0.198
0.008
0.002
cost fixed to$43.39
Source: Kovacevic
![Page 37: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/37.jpg)
Summary:Computational Kernels in Signal Processing
Filter: FIR, IIR, correlation,filter banks
Signal transforms:DFT, DCT, wavelets, frames
Linear algebra:vector sum, matrix-vector product,…singular value decomposition, matrix inversion,…
Coding: Huffman, arithmetic, Viterbi, LDPC
Most DSP computation is linear algebra
![Page 38: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/38.jpg)
Numerical Computation Beyond DSP
More than 90% of all numerical computation are linear algebra computations/algorithms
Sciences: Chemistry, Physics, Biology; Economics; Engineering; etc.
![Page 39: Algorithms and Computation in Signal Processing](https://reader036.vdocuments.net/reader036/viewer/2022071603/613d79c3736caf36b75dc5f0/html5/thumbnails/39.jpg)
ImplementationPractically infinite speed requirements
Very large data setsRealtime
Multitude of platformsHardware: ASIC, FPGASoftware
Single vs. multiprocessor computersWorkstation versus embedded processorFloating point vs. fixed point arithmetic
Combined hardware/software platformsProblems: Implementation difficult, expensive (time/money), becomes quickly obsolete
In this course: Single processor workstations