1
OpenSSL acceleration using Graphics Processing Units
Pedro Miguel Costa Saraiva
2
Introduction
• Cryptography: the study of security techniques
• SSL: a set of rules governing authentication and encrypted client/server communication
  • De facto standard for secure electronic communications
  • Computationally intensive
  • Large volumes of SSL traffic degrade server performance
3
Introduction
• GPU: a specialised processing unit designed to manipulate graphics
  • Originally used solely for graphics calculations
  • Recent developments enable its use for general-purpose computing
  • Massive computational power
4
Introduction
• OpenSSL
  • Open-source implementation of the SSL and TLS protocols
  • Core library implements a wide variety of cryptographic functions
  • Used extensively by a very large number of open-source and proprietary applications
5
Introduction
• Objectives
  • Efficiently offload cryptographic operations onto a GPU
  • Add GPU functionality to OpenSSL
  • Lighten the load on the CPU
6
Introduction
• Structure
  • State of the art
    • OpenSSL
    • GPU
    • Programming the GPU
    • OpenCL
    • CUDA
    • OpenCL vs CUDA
    • Main challenges
  • Implementation
  • Results
  • Conclusion
7
State of the art
• OpenSSL
  • Commercial-grade, full-featured open-source toolkit
  • Divided into libssl and libcrypto
  • Core library written in C
  • Supports accelerator hardware via engines
8
State of the art
GPU
• Massive parallel processing power
• Roughly ten times the floating-point throughput of a high-end CPU
• Faster growth rate than CPUs
9
State of the art
GPU - Programming
• At the end of the 1990s, graphics cards had fixed-function pipelines and could not be programmed
• Things changed in 2001, when DirectX 8 and OpenGL extensions exposed programmable shaders
• Programmers had to express their computations in terms of textures, vertices and shader programs
10
State of the art
GPU - Programming
• 2006: NVIDIA created the CUDA framework
• ATI created the low-level CTM ("Close To Metal") framework
• 2008: NVIDIA and ATI joined the Khronos Group
  • Development of an industry standard for heterogeneous computing
• OpenCL version 1.0 released in December 2008
11
State of the art
GPU - OpenCL
• Open, royalty-free standard for general-purpose parallel programming
• Supports CPUs, GPUs, and other types of processors
• Maintained by the non-profit Khronos Group consortium
• Adopted by Intel, AMD, NVIDIA, and ARM Holdings
12
State of the art
GPU - OpenCL
• API for coordinating parallel computation across different processors
• Cross-platform programming language
  • Based on a subset of ISO C99
• Lower performance than CUDA on NVIDIA GPUs
13
State of the art
GPU - CUDA
• Proprietary hardware and software architecture
• Designed by NVIDIA
• Manages computations on a GPU
• API is programmed in “C for CUDA”
• Third-party wrappers available for other languages
14
State of the art
GPU - Main Challenges
• Well suited to massively parallel problems
• Interaction between threads should be minimal
• Diverging execution paths are slow
• Limited memory
• Slow data transfers between host and GPU memory
• Data-intensive operations are discouraged
• No file or standard I/O operations
15
Implementation
Structure
• OpenSSL
• AES
• RSA Key Generation
• RSA Cipher
16
Implementation
OpenSSL
• The ENGINE component supports alternative implementations of cryptographic algorithms
• Supports dynamic loading of external engines
17
Implementation
OpenSSL Engine
• A binding function defines the supported algorithms
• Pointers to the functions implementing those algorithms are registered with the engine
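To make the binding idea concrete, the C sketch below mocks an engine as a table of function pointers filled in by a binding function; a NULL entry means the algorithm is not offloaded and the caller keeps using its software implementation. All names (`gpu_engine`, `bind_gpu_engine`, `engine_run`) are hypothetical; this is not OpenSSL's ENGINE API, which registers implementations through calls such as `ENGINE_set_id` and `ENGINE_set_ciphers`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical algorithm identifiers a GPU engine might support. */
enum alg_id { ALG_AES_128_ECB, ALG_AES_128_CBC, ALG_RSA, ALG_COUNT };

/* Signature shared by every offloaded operation in this mock. */
typedef int (*op_fn)(const unsigned char *in, unsigned char *out, size_t len);

/* Mock engine: an identifier plus one function pointer per algorithm.
 * A NULL entry means "not provided by this engine". */
struct gpu_engine {
    const char *id;
    op_fn ops[ALG_COUNT];
};

/* Placeholder for a GPU-backed implementation (here it only copies). */
static int gpu_aes_ecb_stub(const unsigned char *in, unsigned char *out, size_t len) {
    memcpy(out, in, len);
    return 1;
}

/* Binding function: declares the supported algorithms by installing
 * pointers to their implementations. */
int bind_gpu_engine(struct gpu_engine *e) {
    memset(e, 0, sizeof *e);
    e->id = "gpu";
    e->ops[ALG_AES_128_ECB] = gpu_aes_ecb_stub;
    return 1;
}

int engine_supports(const struct gpu_engine *e, enum alg_id alg) {
    assert(alg < ALG_COUNT);
    return e->ops[alg] != NULL;
}

/* Dispatch through the table; 0 tells the caller to use its CPU path. */
int engine_run(const struct gpu_engine *e, enum alg_id alg,
               const unsigned char *in, unsigned char *out, size_t len) {
    if (!engine_supports(e, alg))
        return 0;
    return e->ops[alg](in, out, len);
}
```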
18
Implementation
AES
• CBC mode encryption cannot be parallelised
  • The previous ciphertext block is required before encryption of the next one can begin
• CBC mode decryption can be parallelised
  • All blocks are decrypted in parallel
• ECB mode can be parallelised
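The dependency structure is easiest to see in code. In the minimal C sketch below, a toy invertible byte permutation stands in for AES (it is not secure, and all function names are illustrative): encryption must walk the blocks in order because block i needs ciphertext block i-1, while decrypting block i reads only ciphertext that is already available, so blocks can be processed in any order, or by one thread each.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 16

/* Toy invertible "block cipher" standing in for AES: rotate the bytes by
 * one position and XOR with the key. NOT secure; it only has the right shape. */
static void toy_encrypt_block(const unsigned char k[BLK], const unsigned char in[BLK], unsigned char out[BLK]) {
    for (int i = 0; i < BLK; i++) out[i] = in[(i + 1) % BLK] ^ k[i];
}
static void toy_decrypt_block(const unsigned char k[BLK], const unsigned char in[BLK], unsigned char out[BLK]) {
    for (int i = 0; i < BLK; i++) out[(i + 1) % BLK] = in[i] ^ k[i];
}

/* CBC encryption: C_i = E(P_i XOR C_{i-1}). Each block needs the previous
 * ciphertext, so the loop is inherently serial. */
void cbc_encrypt(const unsigned char k[BLK], const unsigned char iv[BLK],
                 const unsigned char *pt, unsigned char *ct, size_t nblocks) {
    unsigned char x[BLK];
    const unsigned char *prev = iv;
    for (size_t b = 0; b < nblocks; b++) {
        for (int i = 0; i < BLK; i++) x[i] = pt[b * BLK + i] ^ prev[i];
        toy_encrypt_block(k, x, ct + b * BLK);
        prev = ct + b * BLK;
    }
}

/* CBC decryption of block b: P_b = D(C_b) XOR C_{b-1}. Only ciphertext is
 * read, so each block index could be handled by its own GPU thread. */
void cbc_decrypt_block(const unsigned char k[BLK], const unsigned char iv[BLK],
                       const unsigned char *ct, unsigned char *pt, size_t b) {
    const unsigned char *prev = (b == 0) ? iv : ct + (b - 1) * BLK;
    toy_decrypt_block(k, ct + b * BLK, pt + b * BLK);
    for (int i = 0; i < BLK; i++) pt[b * BLK + i] ^= prev[i];
}

void cbc_decrypt(const unsigned char k[BLK], const unsigned char iv[BLK],
                 const unsigned char *ct, unsigned char *pt, size_t nblocks) {
    assert(pt != ct); /* separate buffers, as independent threads would need */
    /* Deliberately iterate backwards: the order does not matter. */
    for (size_t b = nblocks; b-- > 0;) cbc_decrypt_block(k, iv, ct, pt, b);
}
```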
19
Implementation
AES
• Initialisation
  • Key expansion is performed on the CPU
• Cipher
  • Initialises the GPU
  • Allocates host and GPU memory for input and output data
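The key schedule is computed once per key and is cheap, which is a natural reason to keep it on the CPU and copy only the resulting round keys to the GPU. The C sketch below is a plain AES-128 key expansion following FIPS-197; to stay short it computes S-box entries on the fly instead of using a lookup table. The function names are illustrative, and this is neither OpenSSL's `AES_set_encrypt_key` nor necessarily the exact code used in this work.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return p;
}

/* S-box entry: multiplicative inverse (0 maps to 0), then the FIPS-197
 * affine transform b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4) ^ 0x63. */
uint8_t aes_sbox(uint8_t x) {
    uint8_t inv = 0;
    if (x) {
        for (int y = 1; y < 256; y++)
            if (gf_mul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    }
    assert(x == 0 || gf_mul(x, inv) == 1);
    uint8_t s = inv;
    for (int r = 1; r <= 4; r++)
        s ^= (uint8_t)((inv << r) | (inv >> (8 - r)));
    return (uint8_t)(s ^ 0x63);
}

/* AES-128 key expansion: 16-byte key -> 44 words (11 round keys).
 * Every 4th word goes through RotWord, SubWord and the Rcon XOR. */
void aes128_expand_key(const uint8_t key[16], uint32_t w[44]) {
    uint8_t rcon = 0x01;
    for (int i = 0; i < 4; i++)
        w[i] = ((uint32_t)key[4 * i] << 24) | ((uint32_t)key[4 * i + 1] << 16) |
               ((uint32_t)key[4 * i + 2] << 8) | key[4 * i + 3];
    for (int i = 4; i < 44; i++) {
        uint32_t t = w[i - 1];
        if (i % 4 == 0) {
            t = (t << 8) | (t >> 24);                               /* RotWord */
            t = ((uint32_t)aes_sbox((uint8_t)(t >> 24)) << 24) |    /* SubWord */
                ((uint32_t)aes_sbox((uint8_t)(t >> 16)) << 16) |
                ((uint32_t)aes_sbox((uint8_t)(t >> 8)) << 8) |
                aes_sbox((uint8_t)t);
            t ^= (uint32_t)rcon << 24;                              /* Rcon */
            rcon = gf_mul(rcon, 0x02);
        }
        w[i] = w[i - 4] ^ t;
    }
}
```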
20
Implementation
AES
• Cipher
  • Input data is transferred to GPU memory
    • All data is transferred at once
  • The GPU kernel is called
  • Output data is transferred back from GPU memory
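The host-side sequence can be simulated on the CPU, as in the C sketch below, where `malloc`/`memcpy` stand in for device allocation and host-device copies (for example CUDA's `cudaMalloc`/`cudaMemcpy`) and a loop over thread indices stands in for a kernel launch. `run_offload` and `invert_block_kernel` are hypothetical names; the point is the shape of the flow: one bulk copy in, one launch, one bulk copy out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A "kernel" processes the work item selected by its thread index. */
typedef void (*kernel_fn)(unsigned char *buf, size_t tid);

/* CPU simulation of the host-side flow: allocate "device" memory, copy all
 * input in one transfer, run nthreads kernel instances, copy the output
 * back in one transfer. */
int run_offload(const unsigned char *in, unsigned char *out, size_t len,
                kernel_fn kernel, size_t nthreads) {
    assert(kernel != NULL);
    unsigned char *dev = malloc(len);        /* "device" buffer */
    if (dev == NULL) return 0;
    memcpy(dev, in, len);                    /* host -> device, all at once */
    for (size_t tid = 0; tid < nthreads; tid++)
        kernel(dev, tid);                    /* on a GPU these run concurrently */
    memcpy(out, dev, len);                   /* device -> host */
    free(dev);
    return 1;
}

/* Example kernel: each thread inverts the bits of one 16-byte block. */
void invert_block_kernel(unsigned char *buf, size_t tid) {
    for (size_t i = 0; i < 16; i++)
        buf[tid * 16 + i] = (unsigned char)~buf[tid * 16 + i];
}
```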
21
Implementation
AES
• GPU kernel
  • For CBC encryption, a single thread is launched
    • It encrypts every block serially
  • For CBC decryption and ECB operations, one thread is launched per block
    • All blocks are processed in parallel
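A rough sketch of how the thread count might be chosen per mode, assuming a CUDA-style launch with a fixed number of threads per block (`aes_launch_cfg` and its parameters are hypothetical): one thread for CBC encryption, one thread per 16-byte AES block otherwise, with the grid size rounded up so every block is covered.

```c
#include <assert.h>
#include <stddef.h>

enum aes_mode { ECB_ENCRYPT, ECB_DECRYPT, CBC_ENCRYPT, CBC_DECRYPT };

struct launch_cfg {
    size_t threads; /* threads that do useful work */
    size_t block;   /* threads per thread block */
    size_t grid;    /* number of thread blocks */
};

/* nbytes must be a non-zero multiple of the 16-byte AES block size. */
struct launch_cfg aes_launch_cfg(enum aes_mode mode, size_t nbytes, size_t threads_per_block) {
    struct launch_cfg c;
    assert(nbytes > 0 && nbytes % 16 == 0 && threads_per_block > 0);
    c.threads = (mode == CBC_ENCRYPT) ? 1 : nbytes / 16;
    c.block = c.threads < threads_per_block ? c.threads : threads_per_block;
    c.grid = (c.threads + c.block - 1) / c.block;   /* round up */
    return c;
}
```

Because the grid is rounded up, a real kernel would compare its global thread index against `threads` and return early when it is out of range.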
22
Implementation
RSA Key Generation
• Generation function (CPU side)
  • Calls the GPU to generate a large number of prime candidates
  • No more numbers are generated until the initial pool is exhausted
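The pool behaviour can be sketched as a small buffer that is refilled only when it runs dry, so the cost of a GPU launch is spread over many key generations. In this hypothetical C sketch, the `refill` callback stands in for the GPU generation call, `demo_refill` is a dummy generator for illustration only, and 64-bit integers replace BIGNUMs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 8

/* Stand-in for the GPU call that fills dst with up to cap candidates. */
typedef size_t (*refill_fn)(uint64_t *dst, size_t cap, void *ctx);

struct prime_pool {
    uint64_t items[POOL_SIZE];
    size_t count;       /* candidates still available */
    refill_fn refill;
    void *ctx;
};

/* Return the next candidate, calling the generator only when the pool
 * is empty. Returns 0 if the generator produced nothing. */
int pool_next(struct prime_pool *p, uint64_t *out) {
    if (p->count == 0) {
        p->count = p->refill(p->items, POOL_SIZE, p->ctx);
        assert(p->count <= POOL_SIZE);
        if (p->count == 0) return 0;
    }
    *out = p->items[--p->count];
    return 1;
}

/* Dummy generator: writes consecutive odd numbers and counts its calls. */
size_t demo_refill(uint64_t *dst, size_t cap, void *ctx) {
    unsigned *calls = ctx;
    for (size_t i = 0; i < cap; i++) dst[i] = 2 * (*calls * cap + i) + 1;
    (*calls)++;
    return cap;
}
```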
23
Implementation
RSA Key Generation
• Generation function (GPU call)
  • The GPU RNG is initialised
  • Device memory is allocated
  • A large number of threads is launched to generate prime BIGNUMs
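Each thread needs its own random stream. The C sketch below assumes a simple scheme: derive a per-thread state from a global seed and the thread index, then draw one candidate and force its top bit (full bit length) and low bit (odd). SplitMix64 is swapped in for the GPU RNG (on CUDA this role is typically played by cuRAND), 64-bit integers stand in for BIGNUMs, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SplitMix64 step: a small, well-known 64-bit generator used here as a
 * stand-in for a per-thread GPU RNG state. */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ull);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    return z ^ (z >> 31);
}

/* Work of one thread: distinct thread indices give distinct starting
 * states (odd multiplier, then XOR with the seed), so threads draw
 * different values. Top bit set for full length, low bit set for oddness. */
uint64_t thread_candidate(uint64_t seed, size_t tid) {
    uint64_t state = seed ^ ((uint64_t)tid * 0xD1B54A32D192ED03ull);
    return splitmix64(&state) | (1ull << 63) | 1ull;
}

/* Host-side "launch": fill out[0..nthreads) as a grid of threads would. */
void generate_candidates(uint64_t seed, uint64_t *out, size_t nthreads) {
    assert(out != NULL || nthreads == 0);
    for (size_t tid = 0; tid < nthreads; tid++)
        out[tid] = thread_candidate(seed, tid);
}
```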
24
Implementation
RSA Key Generation
• Generation function (GPU kernel)
  • A random BIGNUM p is generated
  • p is tested for primality
    • Miller-Rabin probabilistic primality test
  • BIGNUMs found to be prime are written into global memory
  • Each thread tests one BIGNUM
OpenSSL acceleration using Graphics Processing Units
25
Implementation
• Generation function (GPU call)
• Output data copied back to the host
• Required implementing the entire OpenSSL BIGNUM library on the GPU
Pedro Miguel Costa Saraiva
RSA Key Generation
OpenSSL acceleration using Graphics Processing Units
26
Implementation
RSA Cipher
• BIGNUMs used in RSA must be broken down into small words
  • Multiple threads can each process one word
• The Chinese Remainder Theorem splits each private-key operation into two exponentiations with half-size moduli
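The CRT saving can be checked on the textbook toy key p = 61, q = 53 (n = 3233, e = 17, d = 2753), for which d_p = d mod (p-1) = 53, d_q = d mod (q-1) = 49 and q^(-1) mod p = 38. The C sketch below (hypothetical names, moduli below 2^32 only) computes the two half-size exponentiations and recombines them with Garner's formula m = m_2 + q·(q^(-1)·(m_1 - m_2) mod p). For a 4096-bit key, the same idea replaces one exponentiation modulo n with two modulo 2048-bit primes.

```c
#include <assert.h>
#include <stdint.h>

/* Modular exponentiation; products fit in 64 bits because m < 2^32. */
static uint64_t powmod32(uint64_t b, uint64_t e, uint64_t m) {
    assert(m > 0 && m < (1ull << 32));
    uint64_t r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

struct rsa_crt_key {
    uint64_t p, q;      /* secret primes, n = p * q */
    uint64_t dp, dq;    /* d mod (p-1), d mod (q-1) */
    uint64_t qinv;      /* q^-1 mod p */
};

/* Public-key operation: c = m^e mod n. */
uint64_t rsa_public(uint64_t m, uint64_t e, uint64_t n) {
    return powmod32(m, e, n);
}

/* Private-key operation with the CRT: two half-size exponentiations,
 * then Garner's recombination back to a value mod n. */
uint64_t rsa_crt_decrypt(const struct rsa_crt_key *k, uint64_t c) {
    uint64_t m1 = powmod32(c, k->dp, k->p);
    uint64_t m2 = powmod32(c, k->dq, k->q);
    uint64_t h = (k->qinv * ((m1 + k->p - m2 % k->p) % k->p)) % k->p;
    return m2 + h * k->q;
}
```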
27
Implementation
RSA Cipher
• Multi-precision algorithm
  • A k-bit integer A is broken into s = k/64 words
  • O(s) parallel implementation
  • Runs s threads in two phases
28
Implementation
RSA Cipher
• The first phase accumulates s partial products in 2s steps
  • Carries are accumulated in a separate array
• The second phase adds the carries to the intermediate result
  • Worst case is s-1 iterations
  • Usually only one or two are needed
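The two phases can be simulated sequentially in C. In the sketch below, each iteration over a column index k stands in for a thread: phase 1 forms the column sums of partial products, keeping the low word in the result and the rest in a separate carry array, and phase 2 repeatedly adds every carry into the next column until none remain, returning the number of iterations. As an assumption for portability it uses 32-bit words so that a word product fits in a `uint64_t` (the slides use 64-bit words, so a 4096-bit operand has s = 4096/64 = 64 words); `mp_mul_two_phase` is a hypothetical name, not the kernel from this work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXW 64   /* up to 2048-bit operands with 32-bit words */

/* r = a * b for s-word operands (little-endian 32-bit words); r has 2s words.
 * Returns the number of phase-2 carry iterations. */
int mp_mul_two_phase(const uint32_t *a, const uint32_t *b, uint32_t *r, size_t s) {
    uint64_t carry[2 * MAXW];
    assert(s > 0 && s <= MAXW);
    /* Phase 1: column k sums a[i]*b[k-i]. Low and high halves are summed
     * separately so nothing overflows 64 bits. */
    for (size_t k = 0; k < 2 * s; k++) {
        uint64_t lo = 0, hi = 0;
        for (size_t i = 0; i < s; i++) {
            if (k < i || k - i >= s) continue;
            uint64_t p = (uint64_t)a[i] * b[k - i];
            lo += (uint32_t)p;
            hi += p >> 32;
        }
        r[k] = (uint32_t)lo;
        carry[k] = (lo >> 32) + hi;          /* belongs to column k+1 */
    }
    /* Phase 2: all columns read the old carries, then update together. */
    int iters = 0;
    for (;;) {
        int any = 0;
        for (size_t k = 0; k < 2 * s; k++) if (carry[k]) any = 1;
        if (!any) break;
        uint64_t next[2 * MAXW];
        for (size_t k = 0; k < 2 * s; k++) {
            uint64_t sum = (uint64_t)r[k] + (k ? carry[k - 1] : 0);
            r[k] = (uint32_t)sum;
            next[k] = sum >> 32;
        }
        for (size_t k = 0; k < 2 * s; k++) carry[k] = next[k];
        iters++;
    }
    return iters;
}
```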
29
Results
Testing Framework
• Intel Core i7 950 CPU, 3.07GHz
• NVIDIA GeForce GTX 580
• The stress tool was used for the heavy CPU load tests
  • 300 threads looping on sqrt, malloc/free and sync
30
Results
AES – CBC Decryption
31
Results
AES – CBC Encryption
32
Results
AES – ECB Encryption
33
Results
AES – ECB Decryption
34
Results
RSA Key Generation
35
Results
RSA Key Generation – Heavy CPU load
36
Results
RSA Cipher
[Charts: single message; single message, heavy CPU load; multiple messages (4096-bit)]
37
Results
RSA Key Generation – Heavy CPU load
38
Results
RSA Key Generation – Heavy CPU load
39
Conclusion
• Significant performance boost for AES ECB and CBC decryption
• AES CBC encryption is slower, but significantly lighter on the CPU
• RSA key generation is significantly faster when generating multiple keys
• The RSA cipher is significantly slower than on the CPU
40
Future Work
• AES CTR cipher mode
• The OpenSSL implementation is still unstable
• A manager to cache RSA requests and make more effective use of the GPU