Improve Hadoop Economics, Performance,
and Security with Compression and
Encryption
Ravi Lambi
Director of Software Engineering
Data Compression and Security Business Unit
Exar Corporation
Santa Clara, CA USA
November 2014
The Storage IO Bottleneck: Performance Gap
[Figure: Processor vs. Traditional Disk, plotted on a logarithmic axis from 1 to 100,000,000]
The Storage IO Bottleneck: Current Server Solution
More disks and more rack space
Adding disks increases management cost and requires a more expensive
storage controller. There is also a limit to how far this can scale: each
server has a hard physical limit.
The Storage IO Bottleneck: Summarized Challenge
It Is Difficult to Balance Performance, Capacity Scaling, and Cost Associated with Storage IO
[Figure: the three competing factors: Performance, Cost, Capacity]
Compression Technology: Where In Hadoop To Apply Compression?
[Figure: Hadoop data flow: Ingest, Map, Compress/Distribute, Decompress/Compute, Reduce, Output. Labels: Compression Codec; Disk; Network.]
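To make the Compress/Distribute and Decompress/Compute stages concrete, here is a minimal software-only sketch in Python. It is not Hadoop code and not AltraHD: it assumes a toy word-count job, uses the standard-library zlib codec as a stand-in for a Hadoop compression codec, and all function names are hypothetical.

```python
import zlib
from collections import Counter

def map_phase(lines):
    """Emit one 'word<TAB>1' record per word, as a word-count mapper would."""
    return "\n".join(f"{word}\t1" for line in lines for word in line.split())

def compress_for_shuffle(map_output: str) -> bytes:
    """Compress intermediate map output before it reaches disk or the network."""
    return zlib.compress(map_output.encode("utf-8"))

def reduce_phase(shuffled: bytes) -> Counter:
    """Decompress the shuffled bytes, then sum the counts per word."""
    records = zlib.decompress(shuffled).decode("utf-8").splitlines()
    counts = Counter()
    for record in records:
        word, value = record.split("\t")
        counts[word] += int(value)
    return counts

# Hypothetical input: repetitive text, which compresses well.
lines = ["hadoop stores data on disk", "hadoop moves data over the network"] * 500

map_output = map_phase(lines)
shuffled = compress_for_shuffle(map_output)
counts = reduce_phase(shuffled)

raw_bytes = len(map_output.encode("utf-8"))
print(f"intermediate data: {raw_bytes} bytes raw, {len(shuffled)} bytes compressed")
print(f"hadoop={counts['hadoop']} disk={counts['disk']}")
```

The smaller intermediate payload is what reduces disk and network IO; the cost is CPU time spent in the codec, which is the work a hardware accelerator is meant to take off the host.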
Exar’s Hadoop Acceleration - AltraHD
[Figure: Hadoop data flow: Ingest, Map, Compress/Distribute, Decompress/Compute, Reduce, Output. Labels: Compression Codec; File System Filter Driver; Driver; Disk; Network; 5 GB/sec HW Compression & Encryption Accelerator.]
Offload Compression and Accelerate
AltraHD Overview
• File System Filter Driver
– Kernel plug-in at the file system layer
– Compresses/decompresses ALL files independent of application
• Transparent to the Application
– No modification to Applications or Workflow
• Seamlessly Layers over File System
– Supports EXT3, EXT4, or XFS
• Fast, Easy Deployment
– No APIs – Software installs in minutes
• Hardware Acceleration Offloads Host CPU
[Figure: software stack. Labels: Storage Volume; Native File System; File System Filter Driver; Driver; Applications; Native Linux Kernel; Exar Compression & Encryption Acceleration Card; Core Technology.]
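The "transparent to the application" idea can be illustrated with a small user-space analogy in Python. This is only a sketch under stated assumptions: the real product is described as a kernel filter driver with hardware offload, whereas the hypothetical `TransparentCompressedStore` class below is ordinary Python that uses zlib in software.

```python
import os
import tempfile
import zlib

class TransparentCompressedStore:
    """Hypothetical stand-in for a compressing layer above the file system."""

    def __init__(self, root):
        self.root = root

    def write(self, name, data: bytes) -> None:
        # Compress on the way down to storage.
        with open(os.path.join(self.root, name), "wb") as f:
            f.write(zlib.compress(data))

    def read(self, name) -> bytes:
        # Decompress on the way back up to the caller.
        with open(os.path.join(self.root, name), "rb") as f:
            return zlib.decompress(f.read())

    def stored_size(self, name) -> int:
        return os.path.getsize(os.path.join(self.root, name))

def application(store):
    """Application code: it reads and writes plain bytes only."""
    payload = b"2014-11-01 INFO block replicated\n" * 1000
    store.write("app.log", payload)
    return payload, store.read("app.log")

with tempfile.TemporaryDirectory() as root:
    store = TransparentCompressedStore(root)
    written, read_back = application(store)
    on_disk = store.stored_size("app.log")

print(f"application saw {len(read_back)} bytes; storage held {on_disk} bytes")
```

The `application` function never calls a compression API, which mirrors the "no modification to applications" claim; only the layer beneath it changes.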
Value Proposition – Performance: MapReduce 1 Terasort Benchmark
[Figure: bar chart of run time in seconds (axis 0 to 1400), Native vs. AltraHD, with % improvement per configuration]
Configuration    % Improvement
EXT3 8 Disk      93%
XFS 8 Disk       35%
EXT4 8 Disk      27%
EXT3 12 Disk     51%
XFS 12 Disk      21%
EXT4 12 Disk     22%
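The benchmark reports a "% Improvement" per configuration but does not state the formula. The sketch below shows two common definitions side by side; which one the slide uses is not stated, and the run times in the example are hypothetical, not measured values from the benchmark.

```python
def time_reduction_pct(native_s: float, accelerated_s: float) -> float:
    """Percent of the native run time that was eliminated."""
    return 100.0 * (native_s - accelerated_s) / native_s

def speedup_pct(native_s: float, accelerated_s: float) -> float:
    """Percent increase in throughput (work per second) over native."""
    return 100.0 * (native_s - accelerated_s) / accelerated_s

# Hypothetical run times in seconds, for illustration only.
native_s, accelerated_s = 1000.0, 800.0

reduction = time_reduction_pct(native_s, accelerated_s)
speedup = speedup_pct(native_s, accelerated_s)
print(f"time reduction: {reduction:.1f}%  speedup: {speedup:.1f}%")
```

The two definitions diverge as the gain grows, so it is worth confirming which one applies before comparing these percentages with other benchmarks.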
Value Proposition – Performance: MapReduce 2 Job Execution Time
[Figure: bar chart of job execution time in seconds (axis 0 to 2200), Native vs. AltraHD, with % improvement per configuration]
Configuration    % Improvement
EXT3 8 Disk      27%
XFS 8 Disk       34%
EXT4 8 Disk      31%
EXT3 12 Disk     18%
XFS 12 Disk      18%
EXT4 12 Disk     24%
Value Proposition – Storage
Increased Storage Capacity
Effective Storage Capacity (terabytes)
Workload    Native    AltraHD
MR2         192       672
MR1         192       1344
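The capacity figures on this slide imply an effective compression ratio for each workload. The short calculation below works it out, assuming that 192 TB is the native capacity in both charts and that the larger number is the AltraHD effective capacity.

```python
# Capacities in terabytes, taken from the two charts on this slide.
NATIVE_TB = 192
effective_tb = {"MR1": 1344, "MR2": 672}

# Effective capacity divided by native capacity gives the implied ratio.
ratios = {workload: tb / NATIVE_TB for workload, tb in effective_tb.items()}

for workload, ratio in ratios.items():
    print(f"{workload}: {effective_tb[workload]} TB / {NATIVE_TB} TB = {ratio:.1f}x")
```

Under those assumptions the ratios come out to 7.0x for MR1 and 3.5x for MR2.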
Value Proposition – Storage
Increased Storage Capacity
[Figure: Native storage vs. AltraHD effective storage]
Value Proposition – Security
Exar’s Compression Acceleration Card Supports Compression, Encryption, and Hashing in a Single Pass
Aligned with Hadoop Security Roadmap
[Figure: Compression, Encryption, Hashing]
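A software sketch can show what "single pass" means: each chunk of data is read once and fed through all three operations before the next chunk is touched. The code below is an illustration only, not the card's interface. It assumes the hash is taken over the uncompressed data, and `ToyStreamCipher` is a hypothetical, NOT secure placeholder for a real cipher such as AES, used here because the Python standard library has no built-in cipher.

```python
import hashlib
import zlib

class ToyStreamCipher:
    """NOT secure. XOR keystream placeholder standing in for a real cipher."""

    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0
        self.buffer = b""

    def apply(self, data: bytes) -> bytes:
        # Extend the keystream until it covers this piece of data.
        while len(self.buffer) < len(data):
            block = self.key + self.counter.to_bytes(8, "big")
            self.buffer += hashlib.sha256(block).digest()
            self.counter += 1
        stream, self.buffer = self.buffer[:len(data)], self.buffer[len(data):]
        return bytes(a ^ b for a, b in zip(data, stream))

def single_pass(chunks, key: bytes):
    """Hash, compress, and encrypt each chunk as it streams by."""
    digest = hashlib.sha256()
    compressor = zlib.compressobj()
    cipher = ToyStreamCipher(key)
    out = bytearray()
    for chunk in chunks:
        digest.update(chunk)                 # hash the original bytes
        packed = compressor.compress(chunk)  # compress
        out += cipher.apply(packed)          # encrypt the compressed bytes
    out += cipher.apply(compressor.flush())
    return bytes(out), digest.hexdigest()

def restore(blob: bytes, key: bytes) -> bytes:
    """Reverse the pipeline: decrypt, then decompress."""
    return zlib.decompress(ToyStreamCipher(key).apply(blob))

# Hypothetical data and key, for illustration only.
chunks = [(b"row %d\n" % i) * 100 for i in range(50)]
key = b"demo-key (hypothetical)"

blob, checksum = single_pass(chunks, key)
original = b"".join(chunks)
print(f"{len(original)} bytes in, {len(blob)} bytes out, sha256={checksum[:16]}...")
```

Compressing before encrypting matters in this ordering, because well-encrypted data looks random and no longer compresses.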
Value Proposition – Indirect Values: Other Savings
Reduce Indirect Costs:
• Power
• Rack Space
• Cooling
• Disk Life
• etc.