![Page 1: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/1.jpg)
Supercomputing & Multi-core Have I/O Problems That Compression Can Solve
27 Sep 2011
Samplify Systems, Inc.
160 Saratoga Ave., Suite 150
Santa Clara, CA 95051
www.samplify.com
(888) LESS-BITS
+1 (408) 249-1500
![Page 2: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/2.jpg)
Outline
• Introduction to Samplify Systems
• Samplify Prism Compression
• Prism Results on Integers
• Prism Results on IEEE-754 Floats
• “Good Enough” Results & Uncertainty Quantification
• High-Performance Computing & Multi-core Bottlenecks
• Why Compression Can Help
• Samplify & NCAR Collaboration?
…simply the bits that matter® ©2011 Samplify Systems, Inc.
![Page 3: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/3.jpg)
About Samplify
• Intellectual property company in Santa Clara, CA providing:
  • Intellectual property for leading FPGAs & ASICs
  • Semiconductors
  • Module and system level solutions
• Private company with >$22M raised from VCs & strategics (IDT & Schlumberger)
• Founded in March 2007
• 25 employees
Executive Team:
Al Wegener, Founder & CTO
• Industry-recognized compression expert
• Inventor of Samplify Prism compression
• TI, Graychip, Morphics, Studer ReVox
Tom Sparkman, CEO
• Sales and Marketing Semico Executive
• 19 years Maxim, Motorola
Richard Tobias, VP Engineering
• Engineering Semico Executive
• Toshiba Semi, Pixelworks, White Eagle (Quicksilver)
Allan Evans, VP Marketing
• Marketing & Technology Executive
• Successful exits at Savi (LMCO), Netro (NTRO), Stanford Telecom (Newbridge)
![Page 4: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/4.jpg)
Applications for Samplify Technology
First Markets:
Ultrasound – Higher resolution ultrasound machines, lower power portables, enable U/S “ODM” model in China
CT – Double number of x-ray sensors in existing hardware. Lower cost of data transport and storage
Wireless Base Stations – Lowers cost of data transport in wireless infrastructure. Especially important for LTE.
Wireless Repeaters – Dual-band over existing copper infrastructure
New Markets:
High Speed Imaging – 2x frame rate, resolution
HPC – Supercomputing
Storage – 2x throughput & capacity
Broadcast – Reduce SDI coax links, long-range HDMI over UTP
Automotive – Driver assistance, collision avoidance, etc.
![Page 5: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/5.jpg)
Samplify’s Prism™ Signal Compression
• No other solutions operate as fast as Samplify. We start where they stop.
• No psycho-visual/acoustic tricks. Samplify’s compression free from artifacts.
• Operates in real time. Latency is very low with only a few samples of delay.
• Validated by Experts: Herfkens (Stanford), Senzig (GE), several wireless OEMs
• Samplify holds granted patents on integrating any lossless and lossy compression into data converters (US 7,088,276) and in wireless base stations (US 8,005,152)
[Figure: sample-rate axis from 1 ksample/sec to 40 Gsample/sec. Labels: Speech (ADPCM, LPC, Q-CELP) at 10 ksps; Audio at 100 ksps; Video to 50 Msps; Samplify spans 1 ksps to 40 Gsps.]
![Page 6: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/6.jpg)
Samplify Prism Eliminates Signal Whitespace
• Time domain whitespace: peak to average ratio of signals
• Frequency domain whitespace: oversampling of narrowband signals
• Full resolution not delivered by ADCs and DSP algorithms
• No “a priori” signal information required
[Figure: time-domain signal plot, sample index 0 to 4000, amplitude -150 to 150, annotated “12 Bit Resolution” and “10.5 Effective Bits”.]
Using floating point does not repeal the Nyquist criterion!
![Page 7: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/7.jpg)
Prism Compression Algorithm & Modes
[Block diagram. Labels: Input samples, Compression Engine (US 5,839,100), Compressed packets, Adaptation Engine, Param. tracking, Bit rate monitor, Samplify controller, RateTrak, OptiBit, Veribit; column headings MODE, CONTROL, RESULTS.]
Modes:
• LOSSLESS
• FIXED RATE
• FIXED QUALITY
![Page 8: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/8.jpg)
Samplify’s Customer Signal Database
3000+ customer signal files; 700+ GB of data, including:
• Medical (CT, ultrasound, MRI, digital x-ray, PET)
• Wireless (GSM, W-CDMA, cdma2000, LTE, WiMax)
• Instrumentation (scopes, waveform generators, SerDes)
• Military/defense (radar, SAR, spectra)
• Automotive (RGB, infrared, ultrasound, radar)
• Geophysical (sonobuoys, oil/gas exploration)
• Video (NTSC, PAL, HD)
• Print and still images (CMYK, YCrCb, RGB, infrared)
• Floating-point data sets (seismic, drug discovery, molecular simulation, astrophysics, weather satellite, fluid dynamics)
![Page 9: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/9.jpg)
Samplify Compression Results (Integers)
| Signal type | Sample rate @ sample width | Lossless C.R. | Fixed-rate C.R. | Quality metrics | Typical customers |
|---|---|---|---|---|---|
| Wireless baseband (3G, LTE) | 30.72 Msamp/sec @ 16 bits I & Q | 1.2:1 – 1.5:1 | 1.6:1 – 2.3:1 | EVM, PCDE, ACLR | Ericsson, Huawei, ZTE |
| Wireless RF (3G, LTE) | 600 Msamp/sec @ 16 bits I & Q | 2:1 – 3:1 | 3:1 – 5:1 | EVM, PCDE, ACLR | Ericsson, Huawei, ZTE |
| Computed tomography | 320,000 chans, 5 ksamp/sec @ 20 bits | 1.6:1 – 2.7:1 | 3:1 – 4.5:1 | Radiologists & SSIM | GE, Philips, Toshiba |
| Ultrasound (ADC) | 64 - 256 chans, 50 Msamp/sec @ 12 bits | 1.5:1 – 2:1 | 2:1 – 3:1 | Sonographers & SSIM | GE, Siemens, Sonosite |
| Ultrasound (beamformer) | 4 beams, 12 Msamp/sec @ 18 bits | 2:1 – 3:1 | 3:1 – 4:1 | Sonographers & SSIM | GE, Siemens, Sonosite |
| Images & video | 60 frames/sec, 6 Msamp/sec @ 8 bits | 1.5:1 – 2:1 | 2:1 – 3:1 | Viewers, PSNR, SSIM | 1000+ frames/sec |
| Oscilloscope (SerDes & LVDS) | 60 Gsamp/sec @ 8 bits | 1.3:1 – 2:1 | 2:1 – 4:1 | BER, rise/fall time | Agilent, Tektronix |
| Radar | 3 Gsamp/sec @ 10 bits | 2:1 – 3:1 | 3:1 – 5:1 | pd, pfa | Lockheed, Northrop |
![Page 10: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/10.jpg)
Integer Compression: CT Scanners
Example 1:
Compression of CT X-ray Sensor Values (20-bit integers)
20 bits/sample
× 3,000 samples/sec per detector
× 912 detectors per row
× 64 rows
= 3.5 Gbps
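The data-rate arithmetic on this slide can be checked with a short Python sketch. The constants come from the slide. The function names and the example compression ratios in the loop are illustrative assumptions, not part of any Samplify tool.

```python
# Raw CT sensor data rate, using the per-detector figures quoted on the slide.
BITS_PER_SAMPLE = 20
SAMPLES_PER_SEC_PER_DETECTOR = 3_000
DETECTORS_PER_ROW = 912
ROWS = 64


def raw_rate_bps():
    """Uncompressed sensor data rate in bits per second."""
    return (BITS_PER_SAMPLE * SAMPLES_PER_SEC_PER_DETECTOR
            * DETECTORS_PER_ROW * ROWS)


def compressed_rate_bps(ratio):
    """Data rate after compression at `ratio`:1 (hypothetical helper)."""
    return raw_rate_bps() / ratio


if __name__ == "__main__":
    print(f"raw: {raw_rate_bps() / 1e9:.2f} Gbps")
    # Example ratios only; achievable ratios depend on the data and mode.
    for ratio in (2.0, 3.0, 4.0):
        print(f"{ratio}:1 -> {compressed_rate_bps(ratio) / 1e9:.2f} Gbps")
```

The same multiplication applies to the wireless and camera examples later in the deck, with their own bit widths and sample rates.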
![Page 11: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/11.jpg)
Integer Compression: CT Scanners
Bottleneck 1: slip ring
Bottleneck 2: disk array
[Figure: CT scanner with x-ray source, patient, and x-ray sensors, with Bottleneck #1 and Bottleneck #2 marked; inset plot of x-ray count (10^3 to 10^5) versus sensor number (1 to 1000).]
![Page 12: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/12.jpg)
Lossy Compression Methodology
[Flow diagram: CT Projection Data Files → Compress → Decompress → “samplified” (compressed + decompressed) projection data → Image Reconstruction → “samplified” image; original projection data → Image Reconstruction → original image. Compression ratios: 3:1, 4:1, etc. The two reconstructed images are labeled A and B, with pixel axes running to 500.]
![Page 13: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/13.jpg)
Image Pair (SSIM_min = 0.9307)
![Page 14: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/14.jpg)
Success: 3:1 Compression for CT
Of 419 image pairs, Dr. Herfkens correctly identified 17 “samplified” images:
| Radiologist judgment | Number of images | Pct of images |
|---|---|---|
| “Left & right images look identical” | 402 of 419 | 95.9% |
| “Few minor streaks” | 1 of 419 | 0.2% |
| “Streaks in soft tissue” | 16 of 419 | 3.9% |

…but no effect on the radiologist’s clinical diagnosis using images created from “samplified” x-rays!
![Page 15: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/15.jpg)
Integer Compression: 4G Wireless
Example 2:
Compression of 4G Wireless Baseband Signals
16 bits/sample → 32 bits per (I, Q) sample pair
× 30.72 Msamples/sec per antenna-carrier
× 12 antenna-carriers per fiber-optic link
= 11.8 Gbps
![Page 16: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/16.jpg)
LTE Requires Distributed Base Stations
• Remote radio units required for macro-cell deployments
• To maintain coverage, LTE radio units deployed over metro fiber
• MIMO technology for 4G makes passive antennas no longer feasible
• CPRI incompatible with SONET/SDH → dark fiber required: DWDM/CWDM/PON
• LTE requires up to 10 Gbps CPRI per sector
• Each LTE RRU requires 8 wavelengths across DWDM (6 for CWDM)
• DWDM can support only 20 LTE RRUs; CWDM only 2
→ 10 Gbps CPRI links very expensive!
→ LTE fiber optic CAPEX & OPEX up to 12x greater than 3G!
[Diagram labels: LTE BBU; DWDM/CWDM/PON; DWDM/PON; up to 10 km; LTE RRUs (one per sector); hybrid 3G/4G RRU.]
![Page 17: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/17.jpg)
LTE Requires Distributed Base Stations
• Samplify Prism IQ eliminates 10 Gbps CPRI links, saving CAPEX
• Samplify Prism IQ reduces OPEX of DWDM backhaul by 75%
• Quadruple number of LTE RRUs deployed across dark fiber
→ LTE fiber optic CAPEX & OPEX up to 12x greater than 3G!
[Diagram labels: LTE BBU; DWDM; up to 10 km; LTE RRUs; hybrid 3G/4G RRU.]
![Page 18: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/18.jpg)
Success: Save ~$1500 per 4G CPRI Link
| Component | No Compression | Compression |
|---|---|---|
| Fiber Optic Line Rate | 9.8 Gbps | 3.027 Gbps |
| Radio Head FPGA | Stratix IV GX | Cyclone IV GX |
| FPGA Price (1K, 2009) | $560.00 | $65.00 |
| Fiber Optic Transceivers | $590.00 | $100.00 |
| Baseband FPGA | Stratix IV GX | Cyclone IV GX |
| BB FPGA Price (1K, 2009) | $560.00 | $65.00 |
| Total | $1,710.00 | $230.00 |
| Cost Savings per Sector | $1,480.00 | |

Callouts on the table:
• 2 fibers at 6.144 Gbps required without compression
• 4x6.144 Gbps SFP fiber optic modules
• Installation cost of 2nd fiber optic cable (150 ft)

• SAM2308 enables deployment of LTE-capable RRUs today with single fiber optic cable
• No tower climbing required to install second fiber optic cable to upgrade to LTE
![Page 19: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/19.jpg)
Compression Saves Mobile Industry $13.5B for LTE Deployment
• Industry expects 1M LTE base stations to be deployed worldwide per year
• 3 sectors/CPRI links per base station
• LTE peak deployment years 2012-2014
→ Compression saves $13.5B over the three peak deployment years

| Quantity | Value |
|---|---|
| # LTE base stations deployed per year | 1M |
| Number of sectors/CPRI links per base station | 3 |
| Number of years of peak deployment | 3 |
| Number of LTE CPRI links | 9M |
| Cost savings per link | $1,500 |
| Total savings | $13.5B |
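The savings estimate is a chain of multiplications, sketched below in Python. The component prices are those listed in the preceding cost table and the deployment figures are from this slide. The dictionary layout and function names are assumptions made for illustration.

```python
# Per-sector bill of materials (USD) from the cost-comparison table.
NO_COMPRESSION = {"radio_head_fpga": 560, "transceivers": 590, "baseband_fpga": 560}
WITH_COMPRESSION = {"radio_head_fpga": 65, "transceivers": 100, "baseband_fpga": 65}

# Deployment assumptions quoted on this slide.
BASE_STATIONS_PER_YEAR = 1_000_000
LINKS_PER_BASE_STATION = 3
PEAK_YEARS = 3
ROUNDED_SAVINGS_PER_LINK = 1_500  # the slide rounds $1,480 up to ~$1,500


def per_sector_savings():
    """Difference between the two component totals, in USD."""
    return sum(NO_COMPRESSION.values()) - sum(WITH_COMPRESSION.values())


def total_links():
    """CPRI links deployed over the whole peak period."""
    return BASE_STATIONS_PER_YEAR * LINKS_PER_BASE_STATION * PEAK_YEARS


def total_savings():
    """Total savings over the peak period, using the rounded per-link figure."""
    return total_links() * ROUNDED_SAVINGS_PER_LINK


if __name__ == "__main__":
    print(f"savings per sector: ${per_sector_savings():,}")
    print(f"links over peak years: {total_links():,}")
    print(f"total savings: ${total_savings() / 1e9:.1f}B")
```

Because the 9M links already span all three peak years, the $13.5B figure is a total for the period rather than an annual amount.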
![Page 20: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/20.jpg)
Virtually Lossless at 7.5 Effective Bits (2:1 compression)
Configuration:
• TD-LTE Downlink
• 20 MHz BW
• E-TM 3.1 per 3GPP TS36.141
Results:
• EVM = 0.55% rms
→ Virtually lossless: Equivalent to Agilent test equipment
![Page 21: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/21.jpg)
4:1 Compression for LTE (Downlink)
[Plot: EVM (%) versus Effective Number of Bits (3 to 10).]
• No compression = 15 bits
• EVM limit for LTE Downlink at 64 QAM is 8%
• Prism IQ achieves 3.75 effective bits at 8% EVM = 4:1 compression
• At 7.5 effective bits (2:1 compression) EVM performance is equivalent to Agilent test equipment
![Page 22: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/22.jpg)
Integer Compression: Imaging
Example 3:
Compression for 40 Mpixel and 2k frames/sec Cameras
16 bits/pixel × 40 Mpixel/frame × 30 fps = 19 Gbps
16 bits/pixel × 1 Mpixel/frame × 2k fps = 32 Gbps
![Page 23: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/23.jpg)
Prism Lossless Compression
• Lossless means bit-exact replica of original
• Samplify SignalZIP lossless compression achieved minimum 1.76:1 compression
• Algorithm operates in real time on FPGA
• Switch from lossless to lossy with a register setting
[Figure: four example images with lossless compression ratios 2.09:1, 1.83:1, 1.90:1, and 1.76:1.]
9/28/2011 V1.1
![Page 24: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/24.jpg)
Prism Fixed-Rate Compression
• Fixed rate provides high quality compression at a given rate
• Minimal image degradation between different steps of compression
• Algorithm operates in real time on FPGA
• Switch from lossless to lossy with a register setting
[Figure: original image alongside versions compressed at 2.65:1, 3.15:1, and 3.60:1.]
![Page 25: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/25.jpg)
Infrared Imaging
Across 40 infrared images, Prism HD achieved ~4:1 lossless (12 grayscale bits per pixel)
![Page 26: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/26.jpg)
Bayer Matrix Image Results
| File name | File size (bytes) | CR lossless | SSIM @ 2.0:1 | SSIM @ 2.5:1 | SSIM @ 3.0:1 | SSIM @ 3.5:1 | SSIM @ 4.0:1 |
|---|---|---|---|---|---|---|---|
| Cam1-b.bin | 3956064 | 1.70 | 0.9968 | 0.9887 | 0.9760 | 0.9598 | 0.9413 |
| Cam1-g1.bin | 3956064 | 1.62 | 0.9953 | 0.9858 | 0.9690 | 0.9473 | 0.9245 |
| Cam1-g2.bin | 3956064 | 1.62 | 0.9954 | 0.9860 | 0.9690 | 0.9482 | 0.9248 |
| Cam1-r.bin | 3956064 | 1.55 | 0.9951 | 0.9811 | 0.9596 | 0.9279 | 0.9127 |
| Cam2-b.bin | 3956064 | 2.12 | 1.0000 | 0.9946 | 0.9919 | 0.9853 | 0.9775 |
| Cam2-g1.bin | 3956064 | 1.90 | 0.9980 | 0.9929 | 0.9837 | 0.9699 | 0.9566 |
| Cam2-g2.bin | 3956064 | 1.90 | 0.9979 | 0.9928 | 0.9842 | 0.9696 | 0.9590 |
| Cam2-r.bin | 3956064 | 1.84 | 0.9967 | 0.9927 | 0.9827 | 0.9669 | 0.9469 |
| Cam3-b.bin | 3956064 | 1.73 | 0.9960 | 0.9894 | 0.9775 | 0.9606 | 0.9449 |
| Cam3-g1.bin | 3956064 | 1.65 | 0.9955 | 0.9858 | 0.9692 | 0.9480 | 0.9275 |
| Cam3-g2.bin | 3956064 | 1.65 | 0.9958 | 0.9866 | 0.9688 | 0.9496 | 0.9282 |
| Cam3-r.bin | 3956064 | 1.61 | 0.9950 | 0.9840 | 0.9651 | 0.9381 | 0.9180 |
![Page 27: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/27.jpg)
Example: HD Video @ 2.5:1 compression
{-2, +5} {-3, +3} {-3, +6}
![Page 28: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/28.jpg)
Compression of Floats: Prism FP*
Compression for High-Performance Computing (HPC)
* floating point
• Compressing Integers and Floating-Pt Values
• For HPC Scientific, Technical & Multi-core Apps
![Page 29: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/29.jpg)
Prism FP Compression for HPC
Prism FP features:
• User-selectable lossless & lossy modes
• Compresses integers and floating-point values
• Low complexity (“fits under a bond pad or two”)
• Low latency (< 6 clks to comp or decomp 4 numbers)
• Trade higher latency for better compression
• Scalable to PCIe Gen3, DDR3, & optical rates
• Targeted at HPC applications:
>> Prism FP solves multi-core I/O problems <<
![Page 30: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/30.jpg)
Floating-point Basics
The ONLY Standard That Matters:
IEEE-754-2008
[Figure: IEEE-754 floating-point bit layout, with the “mantissa” field labeled.]
![Page 31: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/31.jpg)
Prism FP Concept
Using floating-point representation:
• doesn’t repeal the Nyquist criterion
• doesn’t reduce dynamic range requirements!
[Figure: 32-bit float exponent axis. Base 2 scale: +127 (max exp; ± Inf, NaN) down through 0 to -128 (min exp; Denorm, ± Zero). Base 10 scale: 10^+38 down through 10^0 = 1.0000 to 10^-38. Each value carries a 23-bit window below its exponent: exp = 5 covers bits {5 .. -17}; exp = -1 covers bits {-1 .. -23}; exp = -7 covers bits {-7 .. -29}. Example exponent sequence: 5 5 4 2 -1 -2 -3 -5 -5 -7 -9 … An equivalent “noise floor” is marked.]
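To make the exponent and bit-window picture concrete, the following Python sketch unpacks a value stored as IEEE-754 binary32 into its sign, unbiased exponent, and 23-bit mantissa field. The function names are hypothetical, and `bit_window` simply follows the slide's convention of a 23-position window starting at the exponent. This is not Samplify's algorithm.

```python
import struct


def float32_fields(value):
    """Return (sign, unbiased_exponent, mantissa_field) of value as binary32.

    binary32 has 1 sign bit, 8 exponent bits (bias 127), and 23 mantissa bits.
    Biased exponents 0 (zero/denormals) and 255 (Inf/NaN) are special cases
    that this sketch does not treat separately.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    mantissa_field = bits & 0x7FFFFF
    return sign, biased_exponent - 127, mantissa_field


def bit_window(value):
    """Highest and lowest power-of-two positions, using the slide's 23-bit window."""
    _, exponent, _ = float32_fields(value)
    return exponent, exponent - 22


if __name__ == "__main__":
    for v in (40.0, 0.75, 0.01):
        sign, exponent, mantissa = float32_fields(v)
        print(v, "sign", sign, "exp", exponent, "window", bit_window(v))
```

Values in one data set whose exponents differ by several units occupy different windows, so the low-order mantissa bits of the larger values sit above positions that only the smaller values can represent.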
![Page 32: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/32.jpg)
Prism FP Results on Nvidia CUDA SDK
| Signal & datatype | Prism real-time compression rate | Prism lossless comp ratio | Prism lossy comp ratios | Quality metrics |
|---|---|---|---|---|
| 3G & 4G wireless, 16-bit integers | 3 to 10 Gbps | 1.2:1 – 1.5:1 | 1.6:1 – 2.3:1 | EVM, PCDE, ACLR |
| Computed tomography, 20-bit integers | 20 to 80 Gbps | 1.6:1 – 2.7:1 | 3:1 – 4.5:1 | Radiologists & SSIM |
| Medical ultrasound, 12-bit integers | 50 to 300 Gbps | 2:1 – 3:1 | 3:1 – 4:1 | Sonographers & SSIM |
| Image sensors, 12-bit integers | 0.6 to 10 Gbps | 1.5:1 – 2:1 | 2:1 – 3:1 | Viewers, PSNR, SSIM |
| Oscilloscopes, 8-bit integers | 100 to 600 Gbps | 1.3:1 – 2:1 | 2:1 – 4:1 | BER, rise/fall time |
| k-means clustering, 32-bit floats | 300 Mfloat/sec | 1.4:1 – 2:1 | 2:1 – 4.5:1 | SSIM, % error |
| Black-Scholes financial, 32-bit floats | 100 Mfloat/sec | 1.6:1 – 2.2:1 | 3:1 – 4:1 | % error of mean and std |
| 3D wireframe model, 32-bit floats | 60 Mfloat/sec | 1.9:1 – 2.6:1 | 2:1 – 3.5:1 | Visual inspection, SSIM |

Callouts on the table: Example 1, Example 2
![Page 33: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/33.jpg)
k-means Clustering (from CUDA SDK)
Resulting oval measurements:
• location (xi, yi) and
• axis length (Lx, Ly)
differ in the 6th significant digit, e.g.:
3.55873 vs. 3.55875
[Figure: clustering result at 2.5:1 compression]
![Page 34: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/34.jpg)
Graphics: FP Wireframe & Textures
[Figure: original and decompressed renderings side by side]
2.75:1 compression, SSIM = 0.99
![Page 35: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/35.jpg)
Geophysical Exploration Data Bottlenecks From Acquisition to Data Processing
1. Seismic sensor acquisition
2. Remote data transmission
3. Data storage & intermediate results
4. Computation
Formats:
• LIS, DLIS
• SEG-D, -Y
• WellLog ML
→ Data sets are petabytes in size!
![Page 36: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/36.jpg)
Prism FP Results for HPC Seismic
| Signal type | Signal description | Lossy comp ratio & quality metric or resolution |
|---|---|---|
| Images | Downhole imaging | 20:1 to 60:1 @ SSIM > 0.99 |
| Acoustic traces | 5 acoustic files | 2:1 to 4:1 @ 80+ dB |
| Acoustic archives | Trace headers & signals | 2:1 @ 99.1 dB; 3:1 @ 69.6 dB |
| Earth models | Delta, epsilon, velocity | 2:1 @ 137 dB; 3:1 @ 70 dB |
| Forward path RTM | Reverse Time Migration intermediate signal | 3:1 to 4:1 @ 55 - 75 dB |
| Noise-reduced acoustic traces | Reverse Time Migration input signal | 2:1 to 4:1 @ 45 - 60 dB |
| Pressure (Type 1) | 4 pressure waveforms | 2.66:1 to 3.47:1 @ 0.01 psi; 5.24:1 to 6.57:1 @ 0.1 psi |
| Pressure (Type 2) | 1 pressure waveform | 4.33:1 @ 0.01 psi; 6.2:1 @ 0.1 psi |
| Temperature | 4 temperature waveforms | 15.9:1 to 19.3:1 @ 0.01 °C; 21.9:1 to 22.6:1 @ 0.1 °C |
![Page 37: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/37.jpg)
Objective Metrics of Signal Quality:
How to Quantify “Good Enough” Results
![Page 38: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/38.jpg)
Prism Compression’s Effects on Results?
Q: How does compression affect users’ signal quality?
A: IT’S COMPLICATED – JUST TRY IT!
• Medical imaging:
  • computed tomography (CT): SSIM + radiologists’ assessment
  • ultrasound: working with 10+ Asian and 2 US ultrasound mfrs (sonographer assessment)
• Wireless:
  • Measure EVM, ACLR, spectral emissions masks, PCDE
• Seismic:
  • Ask geophysicists to assess the quality of 3D Earth images
  • SSIM on 3D Earth “slices”
  • Try on both input signals (acoustic traces) and intermediate signals
![Page 39: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/39.jpg)
Simple Signal Quality Metrics
x(i) = original signal
y(i) = decompressed signal
d(i) = x(i) – y(i) (difference signal)
Some representative signal quality metrics include:
1. mean(d): error mean
2. std(d): error standard deviation
3. max(abs(d)): worst-case error
4. SNR(x) – SNR(y): decrease in SNR
5. 100 * rms(d) / rms(x): percent error
6. FFT(y) – FFT(x): spectral effects
CAVEAT: These quality metrics are easy to measure, BUT they don’t tell you how the final results are affected!
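Metrics 1, 2, 3, and 5 from the list above can be computed in a few lines. The Python sketch below uses only the standard library. The function names and the dictionary keys are illustrative choices, and the standard deviation is taken as the population form (divide by N), which is an assumption since the slide does not say which form is meant. The SNR and FFT metrics are left out because they need a noise definition and a transform library.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def quality_metrics(x, y):
    """Compare original x with decompressed y via the difference signal d = x - y."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and the same length")
    d = [a - b for a, b in zip(x, y)]
    mean_d = sum(d) / len(d)
    # Population standard deviation (divide by N); an assumption, see lead-in.
    std_d = math.sqrt(sum((e - mean_d) ** 2 for e in d) / len(d))
    return {
        "error_mean": mean_d,
        "error_std": std_d,
        "worst_case_error": max(abs(e) for e in d),
        "percent_error": 100.0 * rms(d) / rms(x),
    }


if __name__ == "__main__":
    original = [1.0, 2.0, 3.0, 4.0]
    decompressed = [1.0, 2.0, 3.0, 5.0]
    for name, value in quality_metrics(original, decompressed).items():
        print(f"{name}: {value:.4f}")
```

As the caveat says, these numbers describe the signal-level error only. Whether a downstream result changes still has to be checked in the application itself.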
![Page 40: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/40.jpg)
Image Quality Metrics
• Difference image: $D_{i,j} = O_{i,j} - P_{i,j}$
• HU diffs: $\min(D_{i,j})$ and $\max(D_{i,j})$, vs.
• Percentile-based HU diff thresholds
• Local contrast ratio: $\mathrm{Contrast}_{\mathrm{RMS}} = \sqrt{\operatorname{mean}\left(\sum (O_{i,j} - \bar{O})^2\right)}$
• Peak signal-to-noise ratio (PSNR) << not useful
• Just-noticeable differences (JND) << not available
• Masking effects (bone, air, etc.)
• Structural Similarity (SSIM) << next page
![Page 41: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/41.jpg)
Structural Similarity Metric (SSIM)
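$$\mathrm{SSIM}(O, P) = l(O, P) \cdot c(O, P) \cdot s(O, P) = \left(\frac{2\mu_O \mu_P}{\mu_O^2 + \mu_P^2}\right) \cdot \left(\frac{2\sigma_O \sigma_P}{\sigma_O^2 + \sigma_P^2}\right) \cdot \left(\frac{\sigma_{OP}}{\sigma_O \sigma_P}\right)$$
The three factors are, in order: Brightness (µ), Contrast (σ), and “Structure” (cross-correlation).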
Ref: Wang & Bovik, IEEE Signal Processing Magazine, Jan 2009
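The three-factor product on this slide can be evaluated directly. The Python sketch below computes it over whole images given as flat lists of pixel values. It follows the slide's simplified form: it omits the small stabilizing constants and the local sliding window that published SSIM implementations normally use, so it fails on constant images (zero variance). The function name `ssim_global` is an assumption.

```python
import math


def ssim_global(o, p):
    """Simplified whole-image SSIM: brightness * contrast * structure.

    o and p are equal-length flat sequences of pixel values.
    No stabilizing constants are applied, so inputs need non-zero
    mean and non-zero variance.
    """
    if len(o) != len(p) or not o:
        raise ValueError("images must be non-empty and the same size")
    n = len(o)
    mu_o = sum(o) / n
    mu_p = sum(p) / n
    var_o = sum((a - mu_o) ** 2 for a in o) / n
    var_p = sum((b - mu_p) ** 2 for b in p) / n
    cov_op = sum((a - mu_o) * (b - mu_p) for a, b in zip(o, p)) / n
    sd_o = math.sqrt(var_o)
    sd_p = math.sqrt(var_p)

    brightness = 2 * mu_o * mu_p / (mu_o ** 2 + mu_p ** 2)
    contrast = 2 * sd_o * sd_p / (var_o + var_p)
    structure = cov_op / (sd_o * sd_p)
    return brightness * contrast * structure


if __name__ == "__main__":
    original = [1.0, 2.0, 3.0, 4.0]
    print(ssim_global(original, original))              # identical images
    print(ssim_global(original, [2.0, 4.0, 6.0, 8.0]))  # brightness and contrast doubled
```

Doubling every pixel leaves the structure term at 1 but lowers the brightness and contrast terms, which shows how the product separates the three kinds of difference.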
![Page 42: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/42.jpg)
Uncertainty Quantification (1 of 2)
In general, uncertainty quantification has to incorporate research and development efforts in three key, irreducible technical areas:
(1) Characterization of uncertainty in system parameters and the external environment;
(2) Propagation of this uncertainty through large computational engineering models; and
(3) Verification and validation of the computational models and incorporating the uncertainty of the models themselves into the global uncertainty assessment.
![Page 43: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/43.jpg)
Uncertainty Quantification (2 of 2)
![Page 44: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/44.jpg)
“What a Long, Strange Trip It’s Been”
“Multi-core Needs Compression” – REALLY??
![Page 45: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/45.jpg)
What is Numerical Data? Ints & Floats
[Figure 1 (Prior Art), panels a), b), c)]
![Page 46: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/46.jpg)
HPC is “Just” Numerical Processing
[Diagram: NUMERICAL INPUT (INTS / FLOATS) → MULTI-CORE NUMERICAL PROCESSOR → NUMERICAL OUTPUT (INTS / FLOATS); INTERMEDIATE RESULTS (INTS / FLOATS)]
Two kinds of HPC algs:
1. compute-bound
2. I/O-bound
Samplify accelerates I/O-bound applications
![Page 47: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/47.jpg)
I/O Is A Real HPC & Multi-core Problem
GPU and multi-core trends:
• Cores scale (Moore’s Law), but I/O (pins, clks, mem speed) doesn’t
• Core utilization (% busy) keeps decreasing (e.g. < 20% in seismic)
• Nvidia GPUs with 16 lanes of PCIe Gen2 (8 GB/sec)
  • In 2007: 192 SMPs (GeForce) → 41 MB/sec per core
  • In 2011: 512 SMPs (Fermi) → 15 MB/sec per core
• Intel x86
  • In 2006: 500 MB/sec per core
  • In 2011: 2 GB/sec for 4 cores → still 500 MB/sec per core
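The per-core figures in this list are simply the bus bandwidth divided by the number of cores sharing it. A minimal sketch of that arithmetic, assuming 1 GB = 1000 MB and using a hypothetical helper name:

```python
def per_core_mb_per_sec(bus_gb_per_sec: float, cores: int) -> float:
    """Share of a bus's bandwidth available to each core, in MB/sec."""
    return bus_gb_per_sec * 1000.0 / cores

# (label, bus bandwidth in GB/sec, cores sharing the bus), taken from the slide.
cases = [
    ("GeForce, 2007", 8.0, 192),
    ("Fermi, 2011", 8.0, 512),
    ("Intel x86, 2011", 2.0, 4),
]
for name, bus, cores in cases:
    print(f"{name}: {per_core_mb_per_sec(bus, cores):.1f} MB/sec per core")
```

The slide rounds the two GPU results down to 41 and 15 MB/sec. The point is that the numerator stays fixed at the PCIe Gen2 rate while the core count keeps growing.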
Int’l Supercomputing & Hot Chips Conferences:
• “Exascale is I/O-limited, while multi-core is easy” Jeffrey Vetter, DoE
• “Exascale is power-limited (20 MW/Exaflop)” Jack Dongarra, DoE
• “Communication-avoiding algorithms” Jim Demmel, UC Berkeley
![Page 48: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/48.jpg)
Why Lossy Comp is OK for HPC (1 of 2)
1. The real world is inherently noisy:
• Real-world (vs. idealized) measurements contain noise
• Signal-to-noise ratio (SNR) measures what part of measurements are “useful” (ADC analogy: resolution vs. ENOB)
• “Simulated real-world” computations add noise on purpose (Monte Carlo)
2. The real world is inherently lowpass:
• To a DSP guy, 2D Nyquist rate → choosing grid/mesh size for HPC
• Time series of adjacent HPC grid/mesh points are correlated
• Distance and time attenuate signals, often to $r^2$ or $r^3$ (e.g. SerDes on backplanes, light in space, audio signals, etc.)
• 2 kinds of HPC problems:
  • those that can be validated against the real world, and
  • those that can’t (“theoretical” HPC problems…)
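The ADC analogy in point 1 can be made concrete with the standard relation for an ideal N-bit converter driven by a full-scale sine wave, SNR ≈ 6.02 N + 1.76 dB. Solving for N gives the effective number of bits (ENOB) that a measured SNR supports. The sketch below is illustrative only: the function names are assumptions, and the 16-bit / 74 dB converter is a made-up example.

```python
def ideal_snr_db(bits: int) -> float:
    """SNR of an ideal converter with the given resolution (full-scale sine input)."""
    return 6.02 * bits + 1.76

def effective_number_of_bits(snr_db: float) -> float:
    """ENOB implied by a measured SNR, from inverting the ideal-converter relation."""
    return (snr_db - 1.76) / 6.02

# Hypothetical converter: 16 bits of resolution, but only 74 dB of measured SNR.
resolution_bits = 16
measured_snr_db = 74.0
enob = effective_number_of_bits(measured_snr_db)
print(f"{resolution_bits}-bit samples carry about {enob:.1f} useful bits")
```

In this made-up case roughly a quarter of each stored sample is noise, which is the sense in which the slide argues that not every stored bit is “useful”.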
![Page 49: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/49.jpg)
Why Lossy Comp is OK for HPC (2 of 2)
3. Application dyn range vs. Computational dyn range:
• The required dynamic range of HPC signals (input, intermediate, output) is typically lower than the dynamic range provided by 32/64-bit computational float engines
• 32-bit and 64-bit floats are arbitrary:
  • Why not 21-bit or 16-bit mantissas?
  • Why 8-bit and 11-bit exponents?
  • Why not 5-bit or 16-bit exponents…
Simple goal: “good enough” answers … sooner and faster!
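One way to explore the "why not a 16-bit mantissa?" question is to discard low-order mantissa bits and see how much the value moves. An IEEE-754 32-bit float stores 1 sign bit, 8 exponent bits and 23 mantissa bits (64-bit: 1, 11 and 52). The sketch below zeroes the low bits of a 32-bit float. Plain truncation is an assumption chosen for simplicity and is not a description of how Prism FP works. The function names are hypothetical, and the error bound in the comments applies to normal, finite values.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Zero the low (23 - keep_bits) stored mantissa bits of x as a binary32 value."""
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

value = to_float32(3.14159)
for keep_bits in (23, 16, 10):
    approx = truncate_mantissa(value, keep_bits)
    # For normal values the relative error stays below 2 ** -keep_bits.
    rel_err = abs(approx - value) / abs(value)
    print(f"{keep_bits:2d} mantissa bits: {approx!r}, relative error {rel_err:.2e}")
```

If an application's "good enough" threshold is looser than the relative error at a given mantissa width, the discarded bits were not contributing to the result.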
![Page 50: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/50.jpg)
Future: Prism 4 for Multi-Core Engines
[Diagram labels: x86 Cores 1 to 6; QPI or HT ring, ≤ 200 GB/sec (256-bit bus); front-side bus; DDRx & PCIe Gen2; DDRx DIMM #1 and #2; PCIe Gen2 bus; 3 GHz cores, 1200 – 2000 pins; bandwidth labels 8-18 GB/sec and 8 GB/sec. Legend: C = Compress, D = Decompress]
x86 bottlenecks:
o DDR3 (off-chip RAM)
o PCIe (off-chip I/O)
o Inter-core communications
o QPI and HyperTransport
GPU bottlenecks:
o On-chip “shared RAM”
o GDDR5 (video RAM)
o PCIe (off-chip I/O)
Network bottlenecks:
o Infiniband
o 10 GbE, 40 GbE
o MPI
o RapidIO
![Page 51: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/51.jpg)
How to Start? Send Samplify Signals, or Use Prism Software
Usual Samplify model: customers send Samplify (Al) > 700 GB
Option 1: Existing Prism 3 (ints) and Prism FP (floats) SW:
• Prism 3 for Windows and Matlab
• Prism FP for Linux, Windows, and Matlab
Option 2: Easy Ports:
• fwrite_c, fread_c (for file I/O)
• memcpy_c (for memory moves)
Option 3: More work, but possible:
• MPI_SEND_C, MPI_RECV_C (MPI)
• What else?
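To show the shape of an "easy port", here is a sketch of a compressing write/read pair in the spirit of fwrite_c and fread_c. Everything in it is an assumption: the slide does not give the real signatures, the names `fwrite_c_sketch` and `fread_c_sketch` and the 8-byte block header are invented for the example, and zlib (lossless, general-purpose) stands in for Prism only so that the code runs.

```python
import io
import struct
import zlib
from array import array

def fwrite_c_sketch(values, fileobj):
    """Write float32 values as [count, packed_len, payload]; return bytes written."""
    raw = array("f", values).tobytes()
    packed = zlib.compress(raw)  # lossless stand-in, NOT Prism
    fileobj.write(struct.pack("<II", len(values), len(packed)))
    fileobj.write(packed)
    return 8 + len(packed)

def fread_c_sketch(fileobj):
    """Read one block written by fwrite_c_sketch and return the values as a list."""
    count, packed_len = struct.unpack("<II", fileobj.read(8))
    values = array("f")
    values.frombytes(zlib.decompress(fileobj.read(packed_len)))
    if len(values) != count:
        raise ValueError("corrupt block: element count mismatch")
    return values.tolist()

buffer = io.BytesIO()
samples = [0.5] * 1000  # highly redundant data, so it compresses well
written = fwrite_c_sketch(samples, buffer)
buffer.seek(0)
restored = fread_c_sketch(buffer)
print(written, "bytes written for", 4 * len(samples), "bytes of float32 data")
```

The calling code changes only at the I/O call sites, which is what makes this kind of port easy. An MPI variant would need the same framing on both the sending and the receiving rank.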
![Page 52: Supercomputing & Multi-core Have I/O Problems That ... · PDF fileSupercomputing & Multi-core Have I/O Problems That Compression Can Solve Samplify Systems, Inc. ... CPRI per sector](https://reader034.vdocuments.net/reader034/viewer/2022052608/5aa0d1637f8b9a7f178ea6ff/html5/thumbnails/52.jpg)
Proposed Collaboration with NCAR
• Try Prism compression (Linux, Windows, Matlab)
• Quantify your application’s BW and/or storage bottlenecks
• Quantify your application’s sensitivity to input variations
• Quantify your application’s “good enough” results level
or
• Send Samplify your signals (in, intermediate, out) & we’ll do the work
Goal: publish collaboration results in 2012
Contact: Al [email protected]
408-221-1191