External Use
TM
OpenSSL on QorIQ
Communications Platform and
C29x Crypto Coprocessor Family
FTF-NET-F0352
APR. 2014
Sam Siu | Application Engineer
Agenda
• OpenSSL Overview
• QorIQ Processors with Crypto Accelerator
• Enable OpenSSL with QorIQ SEC Engine
• Performance Benefits with SEC Engine
• Summary
Secure Socket Layer (SSL) Overview
• SSL
− The Secure Socket Layer (SSL) protocol is the most widely deployed application protocol for protecting data during transmission. It does so by:
  − Encrypting the data using popular cipher algorithms such as AES and 3DES
  − Authenticating messages using popular hash/digest algorithms such as SHA1 and MD5
− SSL is widely used in web servers (HTTPS) and in other applications such as cloud storage, webmail (POP3 and IMAP), proxy servers and many more, where protection of data in transit is essential for data integrity
− Several versions of the protocol are commonly used, such as TLSv1, SSLv3 and SSLv2
− Newer versions are also available, such as TLSv1.1, TLSv1.2 and DTLS (Datagram TLS)
− Of all the SSL protocol versions, TLSv1 and SSLv3 are the most commonly used
SSL Handshakes Between Client and Server
Messages exchanged between Client and Server, in order:
1. Client Hello
2. Server Hello
3. Certificate (Optional)
4. Certificate Request (Optional)
5. Server key exchange (Optional)
6. Server Hello Done
7. Certificate (Optional)
8. Client key exchange
9. Certificate Verify (Optional)
10. Change Cipher Spec
11. Finished
12. Change Cipher Spec
13. Finished
Encrypted Data
[Diagram labels: the client and the server each hold a Public/Private key pair]
OpenSSL Overview
• OpenSSL is the de facto standard open-source library for security in Linux user-space applications
− Sample applications: Apache, PGP, SSL-based apps
− Documents are available from http://www.openssl.org/docs/
• OpenSSL allows the selection of an "ENGINE" to replace the default implementation while a command runs.
− OpenSSL can easily be extended to use the QorIQ SEC accelerator
− The ENGINE interface provides callback hooks to integrate hardware accelerators with the crypto library
− The custom callback hooks implement the glue logic (code) to interface with various hardware accelerators
• The OpenSSL library has several sub-components:
− SSL protocol library: libssl
− Crypto library (Symmetric and Asymmetric Crypto support): libcrypto
− Digest Support
− Certificate Management: CA.pl
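As a rough illustration of the callback-hook idea described above, the following Python sketch models an engine as a table of hooks that takes precedence over a default software implementation. It is not OpenSSL's ENGINE API: the names Engine, register and run are hypothetical, and hashlib merely stands in for a hardware accelerator.

```python
import hashlib
from typing import Callable, Dict, Optional

Hook = Callable[[bytes], bytes]

# Default (software) implementations, keyed by operation name.
DEFAULT_IMPL: Dict[str, Hook] = {
    "sha1": lambda data: hashlib.sha1(data).digest(),
    "sha256": lambda data: hashlib.sha256(data).digest(),
}


class Engine:
    """Hypothetical stand-in for an ENGINE: a named table of callback hooks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.hooks: Dict[str, Hook] = {}

    def register(self, op: str, hook: Hook) -> None:
        """Install glue code for one operation (e.g. a hardware-backed digest)."""
        self.hooks[op] = hook


def run(op: str, data: bytes, engine: Optional[Engine] = None) -> bytes:
    """Use the engine's hook when one is registered, else the default code."""
    if engine is not None and op in engine.hooks:
        return engine.hooks[op](data)
    return DEFAULT_IMPL[op](data)


if __name__ == "__main__":
    offloaded = []  # records which calls went through the engine hook

    def fake_hw_sha1(data: bytes) -> bytes:
        offloaded.append(len(data))
        return hashlib.sha1(data).digest()  # pretend the accelerator did this

    eng = Engine("fake-sec")
    eng.register("sha1", fake_hw_sha1)
    print(run("sha1", b"abc", eng).hex())    # goes through the hook
    print(run("sha256", b"abc", eng).hex())  # no hook, falls back to default
    print("offloaded calls:", offloaded)
```

The point of the pattern is that the caller never changes: operations without a registered hook silently fall back to the software path.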
OpenSSL Layered Architecture with Linux Kernel
Freescale Solution for OpenSSL Hardware Offloading
• Freescale layered solution for OpenSSL hardware offloading:
− User space
  − OpenSSL: implements the SSL protocol
  − Cryptodev-engine: implements the OpenSSL ENGINE interface; talks to cryptodev-linux (/dev/crypto) via ioctls, offloading cryptographic operations to the kernel
− Kernel space
  − Cryptodev-linux: Linux module that translates ioctl requests from cryptodev-engine into calls to the Linux Crypto API
  − Linux Crypto API: Linux kernel crypto abstraction layer
  − CAAM driver: Linux device driver for the QorIQ crypto engine; it calls crypto_register_alg() to register the CAAM driver's algorithm interface function pointers with the crypto layer
• The following are offloaded to hardware in the current SDK:
− Protocols: TLS v1.0
− Cipher modes:
  − Two passes (two ioctls, one for encryption and one for authentication): all other combinations of AES with SHA, all combinations of DES and 3DES with SHA
  − Single pass (a single ioctl for both encryption and authentication): AES128-SHA
• Offloading reduces the number of memory copies and API calls needed for authentication, encryption, and protocol-specific operations
• TLS block cipher algorithms require padding to prepare data for encryption; the padding can be done with a SEC combo descriptor
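To make the authentication and padding steps concrete, here is a minimal Python sketch of what has to happen to a TLS 1.0 record before AES-CBC encryption: an HMAC-SHA1 MAC is computed and appended, then the result is padded to the cipher block size. It follows the TLS 1.0 record format as generally specified; the function names are illustrative, the AES step itself is omitted (the Python standard library has no AES), and nothing here reflects how the SEC descriptor is actually built.

```python
import hashlib
import hmac
import struct

BLOCK_SIZE = 16        # AES block size in bytes
APPLICATION_DATA = 23  # TLS content type for application data
TLS1_0 = (3, 1)        # protocol version bytes for TLS 1.0


def tls10_mac(mac_key: bytes, seq_num: int, content_type: int, payload: bytes) -> bytes:
    """HMAC-SHA1 over sequence number, record header fields and payload."""
    header = struct.pack("!QBBBH", seq_num, content_type,
                         TLS1_0[0], TLS1_0[1], len(payload))
    return hmac.new(mac_key, header + payload, hashlib.sha1).digest()


def tls_cbc_pad(data: bytes) -> bytes:
    """Append minimal TLS CBC padding: pad_len + 1 bytes, each equal to pad_len."""
    pad_len = (BLOCK_SIZE - (len(data) + 1) % BLOCK_SIZE) % BLOCK_SIZE
    return data + bytes([pad_len]) * (pad_len + 1)


def build_plaintext_record(mac_key: bytes, seq_num: int, payload: bytes) -> bytes:
    """Return payload || MAC || padding, ready for the CBC encryption step."""
    mac = tls10_mac(mac_key, seq_num, APPLICATION_DATA, payload)
    return tls_cbc_pad(payload + mac)


if __name__ == "__main__":
    record = build_plaintext_record(b"k" * 20, 0, b"hello")
    # 5 payload bytes + 20 MAC bytes + 7 padding bytes = 32 bytes (two AES blocks)
    print(len(record), record[-1])
```

In the two-pass mode the MAC and the encryption are separate requests; in the single-pass mode all of the above plus the encryption is handled by one request to the hardware.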
QorIQ Processors with Crypto
Accelerator
QorIQ Processors with Crypto Accelerator
• Frees the CPU from draining, repetitive RSA, VPN and HTTPS traffic
• Supports protocol processing for the following:
− IPSec
− 802.1ae (MACSEC)
− SSL/TLS
− 3GPP RLC
− LTE PDCP
− SRTP
− 802.11i (WiFi)
− 802.16e (WiMax)
• Data Encryption Standard Accelerators (DESA)
− DES, 3DES (2K, 3K)
− ECB, CBC, OFB modes
• AES Accelerators (AESA)
− Key lengths of 128, 192, and 256 bits
− ECB, CBC, CTR, CCM, GCM, CMAC, OFB, CFB, and XTS modes
• Message Digest Hardware Accelerators (MDHA)
− SHA-1, SHA-2 256-, 384-, 512-bit digests
− MD5 128-bit digest
− HMAC with all algorithms
• Random Number Generator
C29x Public Key Accelerator
• Device
− 45 SOI process
− 29x29 1.0mm package
• Power
− ~15W at 1.2GHz
− 0C to 105C Tj
[Block diagram: Power Architecture™ e500-v2 core (32KB D-cache, 32KB I-cache); 512KB platform cache; 512KB platform SRAM; coherent system bus; 32-bit DDR3/3L memory controller; SEC 0, SEC 1, SEC 2; PCIe; DMA; 2x eTSEC; 4-lane 5GHz SERDES; Security Fuse Processor; Security Monitor; IFC; power management; SD/MMC; 2x DUART; 2x I2C; SPI, GPIO; JTAG; real-time debug]
• Acceleration
− Secure Boot
Battery Backed Secret Key
Anti-Tamper
Side channel attack resistance
• 6Gbps AES-HMAC-SHA-1
• Asymmetric Ops
− 1024b Private Key (CRT) 115,400
− 1024b Public Key (17b exp) 1.6M
− 2048b Private Key (CRT) 31,700
− 2048b Public Key (17b exp) 571,000
• Processor
− 1x e500 v2, 32b, up to 1.2GHz
• Memory SubSystem
− 1MB Frontside L2 cache/SRAM w/ECC
− 32-bit DDR3/3L, 1200MHz data rate w/ECC
− Up to 64GB addressability (36-bit physical addressing)
• ECM Coherent System Bus
• High Speed Serial IO
− 1 PCIe 2.0 Controller (5GHz), with x1, x2, x4 options
• Network IO
− 2 x 10/100/1000 Ethernet Controllers
− RGMII /SGMII
− Lossless Flow Control, IEEE 1588
• Misc IO
− Integrated Flash Controller supporting NOR, SLC and MLC based NAND devices
− Dual UARTs, 2x I2Cs
OpenSSL with QorIQ SEC Accelerator
• The OpenSSL cryptodev "ENGINE" interface enables software to offload crypto operations to the SEC accelerator by:
− Providing hook-function glue code for interfacing with the SEC engine
− Providing a customized crypto driver and plug-in code for SEC engine access
− Supporting cipher, digest, PKI and RNG operations
• Uses the SEC capabilities as fully as possible
− Symmetric cryptography (AES, DES, 3DES)
− Digest Calculation (MD5, SHA1, SHA2)
− Random Number Generation
− Asymmetric Crypto Support (Public Key Crypto)
− SSL Record creation (TLSv1, SSLv3)
• Two Approaches:
− Direct Access to SEC (via USDPAA)
− Indirect access to SEC (via a kernel driver)
OpenSSL Features Included
• SSL protocol offloading
− TLSv1 and SSLv3
• Cipher offloading
− AES (128/192/256), 3DES and DES
• Digest offloading
− MD5, SHA1, SHA224, SHA256, SHA384 and SHA512
• Diffie-Hellman: the first published public-key (asymmetric) crypto algorithm
− Key generation: openssl genpkey -genparam -algorithm DH …
• RSA: the second publicly announced public key (asymmetric) cryptography method
− Key generation: openssl genpkey -algorithm RSA -out ${RSAKEYFILE} …
− Encrypt with public key
− Decrypt with private key
• Digital Signature Algorithm: creation and verification of cryptographically secure signatures
− Key inspection, e.g. openssl dsa -in ${DSAKEYFILE} -text -noout
− Sign, Verify
• Modified C Modules
− openssl-1.0.1c/ssl/ssl_lib.c
− openssl-1.0.1c/ssl/s3_clnt.c
− openssl-1.0.1c/ssl/s3_srvr.c
− openssl-1.0.1c/crypto/*
− openssl-1.0.1c/crypto/pkc.c
− include/openssl/ssl.h
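The RSA bullets above (encrypt with the public key, decrypt with the private key) can be illustrated with a toy, unpadded RSA key in Python. This is only a sketch: the primes are two small Mersenne primes chosen for convenience, e = 65537 is a commonly used public exponent, the function names are hypothetical, and real deployments use padding and much larger keys. The CRT variant shows the usual shortcut for private-key operations, which are the expensive part that a public-key accelerator targets.

```python
def make_toy_key(p: int, q: int, e: int = 65537) -> dict:
    """Build an RSA key from two primes (requires Python 3.8+ for pow(x, -1, m))."""
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: inverse of e modulo phi
    return {
        "n": p * q, "e": e, "d": d, "p": p, "q": q,
        "dp": d % (p - 1),      # CRT exponent for p
        "dq": d % (q - 1),      # CRT exponent for q
        "qinv": pow(q, -1, p),  # CRT coefficient: inverse of q modulo p
    }


def public_op(m: int, key: dict) -> int:
    """Encrypt (or verify): m^e mod n, cheap because e is small."""
    return pow(m, key["e"], key["n"])


def private_op(c: int, key: dict) -> int:
    """Decrypt (or sign) the straightforward way: c^d mod n."""
    return pow(c, key["d"], key["n"])


def private_op_crt(c: int, key: dict) -> int:
    """Decrypt using the Chinese Remainder Theorem: two half-size exponentiations."""
    m1 = pow(c, key["dp"], key["p"])
    m2 = pow(c, key["dq"], key["q"])
    h = (key["qinv"] * (m1 - m2)) % key["p"]
    return m2 + h * key["q"]


if __name__ == "__main__":
    key = make_toy_key(2**31 - 1, 2**61 - 1)  # toy-sized primes, NOT secure
    message = 123456789
    cipher = public_op(message, key)
    print(private_op(cipher, key) == message)      # plain private-key path
    print(private_op_crt(cipher, key) == message)  # CRT private-key path
```

Both private-key paths recover the same message; the CRT path works on numbers half the size of the modulus.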
Encryption Data Path with SSL offloading
• With SSL offloading, the same data packet can be encrypted with the MAC appended in a SINGLE PASS
[Flow chart: Plain Data/Control Packet → If DATA Packet? → No: Encrypt Packet; Yes: Offload SSL Data Record to SEC via ENGINE → Encrypt Data Record with MAC appended → Any error? → Yes: Return ERROR; No: Encrypted Data/Control Packet]
Decryption Data Path with SSL offloading
• With SSL offloading, the same data packet can be decrypted with MAC verification in a SINGLE PASS
[Flow chart: Encrypted Data/Control Packet → If DATA Packet? → No: Decrypt Packet; Yes: Offload SSL Data Record to SEC via ENGINE → Decrypt Data Record with MAC Verification → Any error? → Yes: Return ERROR; No: Plain Data/Control Packet]
OpenSSL Layered Architecture
[Layer diagram, top to bottom]
User space: OpenSSL application; SSL Library (SSLv3 handshake state machine); BIO; EVP; Engine Layer; Cryptodev engine
Kernel space: Kernel socket layer; Crypto dev framework APIs; Crypto APIs; Offload Driver (CAAM)
Hardware accelerator: T4240 SEC Engine; C290 Key Mgmt Accel
Enable OpenSSL with QorIQ
SEC Engine
US-DPAA Flow Chart for ENGINE operations
• A generic flow of data path in ENGINE glue code, used by all cryptographic offloading operations such as cipher, digest, RNG, PKI and SSL offloading.
[Flow chart: Input Data → Crypto Library Data Offloading → Engine Data Offloading → Copy Data to USDPAA DMA memory → Create: Job Descriptor & Frame Desc. → ENQ FD to SEC Frame Queue (Queue Manager ENQ/DEQ to and from the SEC Engine) → Pull QMAN for Output Data → DEQ output from FQ → Error? → Yes: Return Error to Engine; No: Copy Data to User Space Mem → Output Data]
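The flow above can be mimicked in a few lines of Python to show the order of the steps. This is a pure simulation under stated assumptions: FakeSec, FrameDescriptor and engine_offload are hypothetical names, no USDPAA or QMan API is used, and a byte-wise XOR stands in for the real cryptographic operation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameDescriptor:
    job: str           # stands in for the job descriptor
    buffer: bytearray  # stands in for a buffer in USDPAA DMA memory
    status: int = 0    # 0 means the (simulated) SEC reported success


class FakeSec:
    """Simulated Queue Manager plus SEC: frames are processed when polled."""

    def __init__(self) -> None:
        self.to_sec = deque()    # frame queue towards the SEC
        self.from_sec = deque()  # frame queue carrying results back

    def enqueue(self, fd: FrameDescriptor) -> None:
        self.to_sec.append(fd)

    def poll(self) -> Optional[FrameDescriptor]:
        """Process one pending frame, then dequeue one result if available."""
        if self.to_sec:
            fd = self.to_sec.popleft()
            if fd.job == "xor55":
                fd.buffer[:] = bytes(b ^ 0x55 for b in fd.buffer)
            else:
                fd.status = 1  # unknown job: report an error
            self.from_sec.append(fd)
        return self.from_sec.popleft() if self.from_sec else None


def engine_offload(sec: FakeSec, job: str, data: bytes) -> bytes:
    dma = bytearray(data)           # copy data to DMA memory
    fd = FrameDescriptor(job, dma)  # create job descriptor and frame descriptor
    sec.enqueue(fd)                 # ENQ the FD to the SEC frame queue
    result = None
    while result is None:           # poll for output, then DEQ it
        result = sec.poll()
    if result.status != 0:
        raise RuntimeError("SEC reported an error")  # return error to engine
    return bytes(result.buffer)     # copy data back to user-space memory


if __name__ == "__main__":
    print(engine_offload(FakeSec(), "xor55", b"\x00\xff").hex())
```

A real implementation would block or batch rather than spin in the polling loop; the sketch only preserves the sequence of steps.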
SEC Queue Interface (QI)
• When the SEC’s Queue Interface has room for more jobs, it issues dequeue requests to the Queue Manager.
− The QI uses a mechanism called Subportals to request FDs from different FQs.
− Each dequeue request specifies one of N subportal IDs.
− The QI is configured to request 1 or 3 Frame Descriptors.
− In response, the Qman provides 1-3 Frame Descriptors and Frame Queue Summary Information.
• Aside from debug scenarios, the SEC uses the following from the FQ Summary Info:
− Number of Frames dequeued
− Context A : Pointer to PreHeader
− Context B : Frame Queue ID to enqueue results
Note:
The T4240 SEC uses dedicated channel 840h, WQ 4200h to 4207h.
The P4080 SEC uses dedicated channel 80h, WQ 400h to 407h.
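The channel and work-queue numbers in the note are consistent with each channel owning eight work queues, with WQ ID = channel ID × 8 + index (840h × 8 = 4200h, 80h × 8 = 400h). That relation is inferred from the numbers on the slide rather than quoted from a reference manual, so the short Python check below should be read as a sanity check on the slide, not as a definition.

```python
WQS_PER_CHANNEL = 8  # assumption: eight work queues (WQ 0 to WQ 7) per channel


def work_queue_ids(channel: int) -> list:
    """Return the work-queue IDs of a channel, assuming WQ ID = channel * 8 + index."""
    base = channel << 3  # multiply by 8
    return [base + index for index in range(WQS_PER_CHANNEL)]


if __name__ == "__main__":
    for name, channel in (("T4240", 0x840), ("P4080", 0x80)):
        ids = work_queue_ids(channel)
        print(f"{name}: channel {channel:#x} ({channel}), "
              f"WQ {ids[0]:#x} to {ids[-1]:#x}")
```

Note that 840h is 2112 in decimal, which matches the "Channel 2112" label on the slide's figure.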
[Figure: the SEC Queue Interface, with subportals SP 1 to SP 5, dequeues through a HW Portal (DCP) from the SEC channel's work queues WQ 0 to WQ 7; the example channel is labeled Channel 2112]
SEC5 IPSec Protocol Example
preheader: 0000 0021 0000 0000
[00] BA8F0221 shrhdr: stidx=15 share=serial len=33
PDB: IPSEC ESP ENCAP (CBC) PDB
[01] 0009000D Options: NextHdr=0x09 NHOffset=0 ChainedIV IPHeaderInPDB PrependOptIPHdr tunnel
[02] 41311751 rsv(ESN)
[03] 00000001 Seq Num = 1
[04] 92CD6CE9 IV[92cd6ce9ab7c728c153a85cefd12ab79]
[05] AB7C728C
[06] 153A85CE
[07] FD12AB79
[08] 00000001 SPI=0x00000001
[09] 00000014 OptIPHdrLength=20
[10] 45A60014 Opt IP Header
[11] 58335CB7
[12] 55C809F8
[13] B44767BB
[14] 79890A98
[15] A1004011 jump: jsl1 all-match[shrd] offset=17 local->[32]
[16] 04830028 key: class2-md-split len=40 imm
[17] C8C1D7BF key=[c8c1d7bfa4e3ee84b15237063897ac9f
[18] A4E3EE84
[19] B1523706
[20] 3897AC9F
[21] 53A90ACA 53a90acac4c5e59e65940e330fa54d3d
[22] C4C5E59E
[23] 65940E33
[24] 0FA54D3D
[25] C71EFE3E c71efe3e2f205908]
[26] 2F205908
[27] 02800010 key: class1-keyreg len=16 imm
[28] 86EB545E key=[86eb545eec5347b851a0539e2ff69a8d]
[29] EC5347B8
[30] 51A0539E
[31] 2FF69A8D
[32] 87010C07 operation: encap ipsec aes-cbc hmac-sha1-160
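The annotations in the dump can be cross-checked against the raw words. The Python sketch below decodes the shared descriptor header (word [00]) and the length field of the two KEY commands. The bit positions used here are an assumption, modeled on the header layout in the Linux CAAM driver, and were chosen because they reproduce the annotations printed in the dump; they are not taken from this presentation.

```python
# Only the sharing code that appears in the dump is given a name here.
SHARE_NAMES = {2: "serial"}


def decode_shared_header(word: int) -> dict:
    """Split a shared descriptor header word into the fields annotated in the dump."""
    share_code = (word >> 8) & 0x3
    return {
        "start_index": (word >> 16) & 0x3F,  # assumed field: bits 21..16
        "share": SHARE_NAMES.get(share_code, share_code),  # assumed field: bits 9..8
        "length": word & 0x3F,               # assumed field: bits 5..0, in words
    }


def key_length(word: int) -> int:
    """Length in bytes carried by a KEY command word (assumed low 10 bits)."""
    return word & 0x3FF


if __name__ == "__main__":
    print(decode_shared_header(0xBA8F0221))  # dump says stidx=15 share=serial len=33
    print(key_length(0x04830028))            # dump says len=40
    print(key_length(0x02800010))            # dump says len=16
```

The decoded length of 33 words also agrees with the 0021 in the preheader and with the dump running from word [00] to word [32].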
Enable OpenSSL with QorIQ Processors
• Supported SDK
− SDK1.5
• Supported boards:
− P4080DS, P5040DS 32b / 64b, B4860QDS, T4240QDS
• Building OpenSSL with hardware offloading support
− Cross Compile Machine
$ ./scripts/host-prepare.sh
$ source ./poky/fsl-setup-poky -m t4240qds -t 16 -j 16 -l
$ bitbake fsl-image-core
− Development System
root@p4080ds:~# modprobe cryptodev
cryptodev: driver 1.6 loaded.
root@p4080ds:~# openssl engine
Building a Custom OpenSSL and a Web Server
• OpenSSL with TLS support can be used directly from the Freescale SDK image, or it can be built manually with support for cryptodev:
$ ./config -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
$ make && make install
• These two options are required to enable cryptodev support in OpenSSL. If nginx is used as the web server, it can be built with SSL/TLS support with the commands:
$ ./configure --with-http_ssl_module --with-openssl=<openssl_dir>
$ make && make install
• The OpenSSL directory <openssl_dir> is the one where the OpenSSL tarball was extracted. The nginx build scripts descend into it and build OpenSSL before building nginx itself.
• If support for HW acceleration is required, the nginx configuration command is slightly different. For testing TLS acceleration, nginx must be linked with the OpenSSL version from Freescale.
$ ./configure --with-http_ssl_module --with-openssl=<openssl_dir> --with-openssl-opt="-DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS"
• The cryptodev engine then has to be enabled in the nginx configuration file: ssl_engine cryptodev;
Inserting Crypto-dev Module and Basic Checks
• Test Scenario
− A Freescale board is used as an HTTPS server responding to HTTPS requests from various SSL clients (e.g. web browsers).
− TLS record offload shows a performance improvement when HTTPS is used to transfer large amounts of data between server and client.
• Load the cryptodev module and list the available engines:
$ modprobe cryptodev
cryptodev: driver 1.6 loaded.
$ openssl engine
(cryptodev) BSD cryptodev engine
(dynamic) Dynamic engine loading support
• If the cryptodev driver is not loaded, OpenSSL reports only dynamic engine support, and all operations are done in user space without HW acceleration
• If the crypto testing module was built into the kernel, it can be used to check whether TLS support is available:
$ modprobe tcrypt
$ grep tls /proc/crypto
name : tls10(hmac(sha1),cbc(aes))
driver : tls10-hmac-sha1-cbc-aes-caam
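The two checks above can be automated. The Python sketch below parses text in the format shown on this slide: the output of openssl engine and entries from /proc/crypto. The function names are illustrative, and the parsers assume only the "(id) description" and "key : value" layouts visible above; they operate on captured text so they can be tried without the hardware.

```python
import re


def parse_engine_list(text: str) -> dict:
    """Map engine id to description from `openssl engine` style output."""
    engines = {}
    for line in text.splitlines():
        match = re.match(r"\s*\(([^)\s]+)\)\s+(.*\S)", line)
        if match:
            engines[match.group(1)] = match.group(2)
    return engines


def has_cryptodev(text: str) -> bool:
    """True when the cryptodev engine is listed, i.e. offload is available."""
    return "cryptodev" in parse_engine_list(text)


def parse_proc_crypto(text: str) -> list:
    """Split /proc/crypto style text into one dict per blank-line-separated entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def tls_offload_drivers(text: str) -> list:
    """Return the driver names of all algorithms whose name starts with 'tls'."""
    return [entry["driver"] for entry in parse_proc_crypto(text)
            if entry.get("name", "").startswith("tls") and "driver" in entry]


if __name__ == "__main__":
    engine_output = ("(cryptodev) BSD cryptodev engine\n"
                     "(dynamic) Dynamic engine loading support\n")
    proc_crypto = ("name : tls10(hmac(sha1),cbc(aes))\n"
                   "driver : tls10-hmac-sha1-cbc-aes-caam\n")
    print(has_cryptodev(engine_output))
    print(tls_offload_drivers(proc_crypto))
```

On a target board the same functions could be fed the real command output, for example via subprocess.run(["openssl", "engine"], capture_output=True, text=True).stdout.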
OpenSSL Demo
• HTTPS server (nginx + 100M.html) <----> DUT
• The OpenSSL s_client command is used on the Freescale board to make the connection with the server:
− $ openssl s_client
• The command can be scripted and run without further intervention:
− $ echo GET /index.html | openssl s_client -connect <server_ip>:443 -cipher AES128-SHA -tls1 -quiet
• The option "-quiet" can be removed to see more details about the TLS session.
• OpenSSL automatically uses the HW acceleration if the cryptodev module is loaded in the kernel.
OpenSSL Demo Configuration
• Transfer a 100M file from the HTTPS server to the DUT:
− time echo GET /100M.html | openssl s_client -connect 192.168.4.1:443 -tls1 -cipher AES128-SHA -pause -quiet > /dev/null 2>&1
• Collect CPU utilization with mpstat during the HTTPS GET:
− mpstat -P ALL 1
• Example:
− if time reports real = 5 s
− throughput = 100 MB / 5 s = 20 MB/s, i.e. 160 Mbps (taking 100M as 100 megabytes)
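Assuming "100M" means 100 × 10^6 bytes, the example above works out to 20 megabytes per second, which is 160 megabits per second. A short Python sketch makes the unit conversion explicit; the function name and the decimal-megabyte assumption are illustrative choices.

```python
def throughput(size_bytes: float, seconds: float) -> tuple:
    """Return (megabytes per second, megabits per second) for one transfer."""
    mbytes_per_s = size_bytes / seconds / 1e6
    return mbytes_per_s, mbytes_per_s * 8


if __name__ == "__main__":
    # 100 MB file, "real" time of 5 s as reported by the shell's time command
    mb_s, mbit_s = throughput(100e6, 5)
    print(f"{mb_s:.1f} MB/s = {mbit_s:.1f} Mbit/s")
```

Because the wall-clock time includes the TLS handshake and process start-up, a large file such as the 100M one keeps that overhead small relative to the transfer.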
Performance Benefits with SEC
Engine
T4240QDS System
Board: T4240QDS (Rev 2.0 silicon)
OS: SMP Linux 3.8.13, 64-bit / 32-bit user space
Core: 12 x e6500 cores x 2 threads @ 1600MHz
SEC Engine: SEC 5.0, 8 x DECO
Frequency (Core/CCB/DDR): 1667/733/933
Cache:
− L1: 32 Kbytes D-cache and I-cache with 64-byte line size
− L2: 2MB shared L2 cache
− L3: 512KB CPC per DDRC, 1.5MB CPC in total for 3 DDRCs
Memory:
− CPC: 512K per DDR controller
− 6GB single-rank DIMM with 1x 64b DDR3 @ 1.8G
U-Boot: U-Boot 2013.01
Filesystem: Ramdisk file system
Compiler: gcc-4.7.3, eglibc-2.15, binutils-2.23.1
OpenSSL Offload Result for P4080DS
[Chart: OpenSSL Offload Benchmark. Axes: Mbps (0 to 600) and CPU utilization (0 to 100). Series: Mbps, CPU Utilization.]
• Benefits of SEC Accelerator
− 4x Performance improvement
Key Management CPU Utilization Without Accelerator
• The T4240 has 24 virtual cores (12 dual-threaded e6500 cores) and reaches maximum performance with one thread per virtual core
• With no offload, OpenSSL uses up all CPU resources
Key Management CPU Utilization With Accelerator (C293)
• With the C293 accelerator, CPU utilization is fairly low
• Cores 0-3 show higher CPU utilization because they are dedicated to the C290 I/O task
Summary
• Enabling OpenSSL offload through the SEC means most Linux apps will benefit from improved performance and reduced CPU utilization
• Freescale offers user-space applications the flexibility of direct access via the USDPAA environment, or indirect access to the SEC via a kernel driver
• Additional FTF sessions:
− FTF-SDS-F0218_Security_DN_101
− FTF-NET-F0111_Overview_of_Autonomous_IPSec
Introducing The QorIQ LS2 Family
Breakthrough, software-defined approach to advance the world’s new virtualized networks
• New, high-performance architecture built with ease-of-use in mind
− Groundbreaking, flexible architecture that abstracts hardware complexity and enables customers to focus their resources on innovation at the application level
• Optimized for software-defined networking applications
− Balanced integration of CPU performance with network I/O and C-programmable datapath acceleration that is right-sized (power/performance/cost) to deliver advanced SoC technology for the SDN era
• Extending the industry’s broadest portfolio of 64-bit multicore SoCs
− Built on the ARM® Cortex®-A57 architecture with integrated L2 switch enabling interconnect and peripherals to provide a complete system-on-chip solution
QorIQ LS2 Family Key Features
Unprecedented performance and ease of use for smarter, more capable networks
High performance cores with leading interconnect and memory bandwidth
• 8x ARM Cortex-A57 cores, 2.0GHz, 4MB L2 cache, w/ Neon SIMD
• 1MB L3 platform cache w/ECC
• 2x 64b DDR4 up to 2.4GT/s
A high performance datapath designed with software developers in mind
• New datapath hardware and abstracted acceleration that is called via standard Linux objects
• 40 Gbps packet processing performance with 20Gbps acceleration (crypto, Pattern Match/RegEx, Data Compression)
• Management complex provides all init/setup/teardown tasks
Leading network I/O integration
• 8x1/10GbE + 8x1G, MACSec on up to 4x 1/10GbE
• Integrated L2 switching capability for cost savings
• 4 PCIe Gen3 controllers, 1 with SR-IOV support
• 2 x SATA 3.0, 2 x USB 3.0 with PHY
Target applications: SDN/NFV Switching, Data Center, Wireless Access
See the LS2 Family First in the Tech Lab!
4 new demos built on QorIQ LS2 processors:
Performance Analysis Made Easy
Leave the Packet Processing To Us
Combining Ease of Use with Performance
Tools for Every Step of Your Design
© 2014 Freescale Semiconductor, Inc. | External Use
www.Freescale.com