Contemporary Engineering Sciences, Vol. 10, 2017, no. 12, 555 - 567
HIKARI Ltd, www.m-hikari.com
https://doi.org/10.12988/ces.2017.7653
Performance Evaluation of the Present
Cryptographic Algorithm over FPGA
Edwar Gomez
Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Cesar Hernández
Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Fredy Martinez
Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Copyright © 2017 Edwar Gomez, Cesar Hernández and Fredy Martinez. This article is
distributed under the Creative Commons Attribution License, which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Modern cryptography seeks to guarantee the confidentiality of information and to
prevent unauthorized people from accessing it. These principles may be
applied in portable devices that must protect the information they
store and process. Such applications impose design
trade-offs that are met by combining high-performance hardware with
light-weight algorithms, specifically the PRESENT algorithm.
PRESENT is a relatively new light-weight block-based encryption scheme
that has not been broken to date, with features that make it appealing
for implementation on reconfigurable architectures such as FPGAs. This work
presents the study, design, implementation and testing of this encryption algorithm.
Keywords: Block-based encryption, light-weight encryption, Present algorithm,
reconfigurable architectures
1. Introduction
The need to transmit secure data poses different scenarios and features in every
type of application. In each case, the two key elements that are integrated in the
design and determine reliability and robustness are hardware and software. In
wireless applications with low data transfer rates (throughput), the hardware tends
to be dedicated and low-cost. These applications, which are the focus of this
investigation, require embedded hardware with low computational
capacity (light-weight) [2-4], based either on an embedded microprocessor
development platform [5, 6] or on programmable logic devices such as
FPGAs [7-9].
Block-based ciphers [10, 11] are one of the most widely used encryption
schemes [12], and a large number of algorithms with different features can
be found. Most of them have been implemented on programmable logic devices
[9, 11, 13, 14]. One of these algorithms is AES (Advanced Encryption
Standard), which is a mandatory reference since it is the block-based encryption
standard [5, 13, 15], along with many others listed in numerous works
that compare their characteristics and metrics [1, 11, 16]. Another mandatory
reference is elliptic curve cryptography [17, 18], due to its background and the
need to give the project a solid mathematical foundation. Taking this into
consideration, and knowing that such algorithms can be implemented
with different techniques and devices, this research aims to implement a
block-based cipher that complies with the light-weight philosophy and that can
be easily realized in hardware. Therefore, the PRESENT algorithm [4,
8, 9, 16, 19] is chosen to be implemented in hardware, and its main metrics are
then measured.
Hardware implementations give fast solutions for applications
where data traffic is higher and real-time encryption is required [14].
Programmable logic devices are an excellent alternative, since they are
reconfigurable and their concurrency makes them well suited to massive and
complex processes with good response times [4, 8, 9].
The PRESENT algorithm is specifically chosen because it is standardized and
accepted by the entities that regulate cryptography worldwide [20, 21],
and because it is a relatively new algorithm that has not been broken so far.
Furthermore, it offers a point of comparison with a previous implementation of
another block-based encryption algorithm [5], which was presented at an
international congress.
The PRESENT algorithm is a symmetric encryption method of a thoroughly
minimalist nature that uses substitution and permutation blocks of only 4 bits as its
base. This structure allows it to be implemented in devices
with small hardware and makes efficient use of resources,
which makes it easy to implement in processors with a narrow bus. Additionally, it has
relatively short keys compared to other algorithms of the same
type, and requires a low number of rounds. From the standpoint of design and
implementation it presents an interesting challenge, since it requires an implementation
that establishes a balance between the amount of resources used and the
system's overall communication speed [1, 22, 23].
2. Present Algorithm
PRESENT is one of the best-known light-weight block-based encryption
algorithms, mainly because its design is easy to implement both in
hardware and software [4, 8]. In hardware, it can be implemented on
some of the smallest FPGAs on the market with a significantly high throughput
[9].
2.1. Structure of the PRESENT algorithm
Figure 1 shows the basic structure of the PRESENT algorithm: its blocks
and how each of its 31 rounds is carried out [4].
Figure 1. Round of the PRESENT algorithm
2.1.1. Nibble substitution layer: SboxLayer
It consists of a non-linear substitution that is applied independently to each nibble
of the state, generating a new nibble: each nibble is replaced by the result of
looking it up in the substitution table S-box (Table 1) [4, 8]. This substitution
is applied to the 16 nibbles that make up the 64
bits of information, which is the block size of the cipher.
Table 1. Substitution box S-box of the PRESENT algorithm
x     0 1 2 3 4 5 6 7 8 9 A B C D E F
S(x)  C 5 6 B 9 0 A D 3 E F 8 4 7 1 2
When an encryption algorithm is designed, the highest possible entropy in the
substituted data is sought. In this case, the word size is 4 bits; since this is a
thoroughly minimalist algorithm, each output bit of the S-box can be implemented in a
single 4-input LUT (Look-Up Table) of the FPGA.
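As an illustration, the SboxLayer can be modelled in software as a table lookup per nibble. The following is a minimal Python sketch, not the VHDL description used in this work; the names SBOX and sbox_layer, and the convention that nibble 0 is the least significant one, are assumptions made for the example.

```python
# S-box of PRESENT, copied from Table 1 (list index = input nibble x).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox_layer(state: int) -> int:
    """Apply the 4-bit S-box to each of the 16 nibbles of a 64-bit state."""
    out = 0
    for n in range(16):
        nibble = (state >> (4 * n)) & 0xF   # extract nibble n
        out |= SBOX[nibble] << (4 * n)      # substitute it in place
    return out
```

For example, sbox_layer(0) replaces every zero nibble with C and returns 0xCCCCCCCCCCCCCCCC.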
2.1.2. Bit permutation: pLayer
It is a mixing layer in which a bitwise permutation is performed on the 64-bit
information block: bit i of the state is moved to position P(i), in the order
shown in Table 2.
Table 2. Substitution order
i      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
P(i)   0  16  32  48   1  17  33  49   2  18  34  50   3  19  35  51

i     16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
P(i)   4  20  36  52   5  21  37  53   6  22  38  54   7  23  39  55

i     32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
P(i)   8  24  40  56   9  25  41  57  10  26  42  58  11  27  43  59

i     48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
P(i)  12  28  44  60  13  29  45  61  14  30  46  62  15  31  47  63
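The entries of Table 2 follow a regular pattern: P(i) = 16·i mod 63 for i < 63, and P(63) = 63. The Python sketch below uses this closed form instead of storing the table; the function names p and p_layer are hypothetical, and bit 0 is assumed to be the least significant bit of the state.

```python
def p(i: int) -> int:
    """Destination position of bit i, matching the entries of Table 2."""
    return 63 if i == 63 else (16 * i) % 63


def p_layer(state: int) -> int:
    """Move bit i of a 64-bit state to position P(i)."""
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << p(i)
    return out
```

Since the layer only rewires bits, p_layer(0b10) simply moves bit 1 to position 16. In hardware this corresponds to routing only, which is consistent with a permutation layer that occupies no logic resources.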
2.1.3. Key expansion function: addRoundKey
PRESENT accepts keys of 80 or 128 bits. However,
this design and implementation only considers the 80-bit key, which is stored in a
register K of that size whose bits are numbered k79, k78, ..., k0. In each round, only
the 64 most significant bits of the current contents of the register are mixed with
the data, so that the sub-key K_i of round i is given by:
\[ K_i = \kappa_{63}\kappa_{62}\dots\kappa_{0} = k_{79}k_{78}\dots k_{16} \tag{1} \]
After the sub-key has been extracted, the following
operations are performed on the register to generate each new sub-key K_i:
Rotation of the key register by 61 bit positions to the left:
\[ [k_{79}k_{78}\dots k_{1}k_{0}] = [k_{18}k_{17}\dots k_{20}k_{19}] \tag{2} \]
Substitution, using the S-box, of the nibble from k79 to k76 of the key:
\[ [k_{79}k_{78}k_{77}k_{76}] = S[k_{79}k_{78}k_{77}k_{76}] \tag{3} \]
Addition or mix of bits k19 to k15 of the key with the round counter,
through addition over the finite field GF(2), i.e. the XOR operation:
\[ [k_{19}k_{18}k_{17}k_{16}k_{15}] = [k_{19}k_{18}k_{17}k_{16}k_{15}] \oplus \mathit{round\_counter} \tag{4} \]
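The key update of equations (1) to (4) can be sketched in Python as follows. This is an illustrative software model under stated assumptions, not the hardware description used in this work: the names SBOX, MASK80 and round_keys are hypothetical, the 80-bit key is held in a Python integer with k0 as the least significant bit, and 32 sub-keys are produced (one per round plus one for the final key addition).

```python
# S-box of PRESENT (Table 1); repeated here so the sketch is self-contained.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK80 = (1 << 80) - 1


def round_keys(key: int) -> list:
    """Return the 32 sub-keys (64 bits each) derived from an 80-bit key."""
    keys = []
    for counter in range(1, 33):
        keys.append(key >> 16)                             # eq. (1): bits k79..k16
        key = ((key << 61) | (key >> 19)) & MASK80         # eq. (2): rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))  # eq. (3)
        key ^= counter << 15                               # eq. (4): XOR into k19..k15
    return keys
```

With an all-zero key, the first sub-key is 0 and the second is 0xC000000000000000, because the S-box maps the zero nibble to C and the round counter only reaches bit 15 of the register.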
This function generates the information blocks that are used as sub-keys,
starting from the system key K. The first sub-key is taken directly from the user
key loaded into the register K, while the remaining sub-keys are generated from it
by the successive updates described above
[4, 8].
In each round, the function takes the most significant bits of the updated
register and assigns them to the sub-key K_i, forming
a block of the same size as the state. This means that it takes Nb*4 bits
for each round, where Nb = 16 is the number of nibbles of the state.
The generation of the sub-keys (key expansion) for the decryption process
is performed in the same way as in the encryption process. The difference lies in the
sub-key selection function. In the decryption process, the sub-keys are taken from
the list starting from the final values and moving back to the initial one, which is the
user's personal key. This means that the last sub-key K_i used to
encrypt is the first one used to decrypt [4, 8]. Hence, the decryption process
has to run through all the key generation rounds in order to begin from this last
sub-key K_i, and then performs the reverse process until the original key is reached.
The round counter must therefore count down, and it is mixed in each round with
the previously indicated bits.
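The relation between encryption and decryption described above can be checked with a compact software model. The Python sketch below is an assumption-laden illustration, not the hardware design of this work: all function names are hypothetical, bit 0 is taken as the least significant bit, and for simplicity the sub-keys are precomputed and consumed in reverse order rather than regenerated backwards with a decrementing counter.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(x) for x in range(16)]
MASK80 = (1 << 80) - 1


def p(i):
    """pLayer destination of bit i."""
    return 63 if i == 63 else (16 * i) % 63


def substitute(state, box):
    """Apply a 4-bit lookup table to the 16 nibbles of the state."""
    out = 0
    for n in range(16):
        out |= box[(state >> (4 * n)) & 0xF] << (4 * n)
    return out


def permute(state, inverse=False):
    """Apply the pLayer, or its inverse, to a 64-bit state."""
    out = 0
    for i in range(64):
        if inverse:
            out |= ((state >> p(i)) & 1) << i
        else:
            out |= ((state >> i) & 1) << p(i)
    return out


def round_keys(key):
    """32 sub-keys of 64 bits from an 80-bit key."""
    keys = []
    for counter in range(1, 33):
        keys.append(key >> 16)
        key = ((key << 61) | (key >> 19)) & MASK80
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))
        key ^= counter << 15
    return keys


def encrypt(block, key):
    keys = round_keys(key)
    state = block
    for k in keys[:31]:                       # 31 full rounds
        state = permute(substitute(state ^ k, SBOX))
    return state ^ keys[31]                   # final key addition


def decrypt(block, key):
    keys = round_keys(key)
    state = block ^ keys[31]                  # last sub-key is used first
    for k in reversed(keys[:31]):
        state = substitute(permute(state, inverse=True), INV_SBOX) ^ k
    return state
```

With an all-zero key and an all-zero block, encrypt(0, 0) gives 0x5579C1387B228445, the test vector published with the algorithm [4], and decrypt recovers the original block.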
3. Implementation of the PRESENT algorithm
3.1. Block diagram: DataPath
Since every encryption algorithm (be it a block or a stream cipher) must provide
all the information needed for its implementation, one possible implementation is
shown (Figure 2) through a DataPath that realizes the algorithm. This
architecture is only one of the many
possible hardware implementations from the light-weight perspective, and it does not
guarantee an optimal use of the programmable device's hardware
resources [10]. Other implementations use the dedicated memory blocks of some
FPGAs to minimize the number of gates used in the device, at the cost of
increasing the number of clock cycles per round [9].
Figure 2. 16-bit DataPath for PRESENT’s SboxLayer and pLayer
Each time a round is performed, the key must be computed
again, in this case the sub-key K_i. Figure 3 shows functional blocks that
could be implemented in either software or hardware, since PRESENT is an
algorithm designed to be implemented with either technique without
distinction.
Figure 3. DataPath of the sub-key generator of the PRESENT algorithm
3.2. Hardware platforms
The selected embedded hardware platform is a Xilinx FPGA, using the ISE
development tool and low-cost FPGAs such as the Spartan 3AN and Spartan 3E,
which are available in at least two different development systems used at the
Universidad Distrital Francisco José de Caldas. Although
tests are not planned on Altera devices, the design is described in VHDL
(VHSIC Hardware Description Language) so that the resulting code is compatible
with FPGAs from any manufacturer.
3.3. Evaluation metrics
After choosing the technology, development systems and language to be used, the
parameters to be verified on each of the embedded platforms must be
considered. In this case, the implementation of the
block-based encryption algorithm PRESENT on the different FPGA-type
programmable logic devices was measured using the encryption and decryption
functions with a key size of up to 80 bits. The performance was tested with
test vectors of 64-bit blocks, and the evaluation metrics selected were
the throughput and the number of occupied slices (CLB-GE).
For the implementation of the cipher on an FPGA-type embedded hardware platform,
the Top-Down methodology is applied: each block is created and a
state machine is then used to control the blocks. The resulting code can
be instantiated by a higher-level design as a functional block. Some of the
most significant design diagrams are shown; they seek to increase the
throughput while reducing the use of the device in terms of slices, which are
the basic resource of any field-programmable device.
3.4. Algorithm implementation
Each of the functional blocks was designed, tested and implemented for both
the encryptor and the decryptor. At this point, each block was described at a low
level, trying to use the least possible amount of resources. Following the
methodology, each block must be simulated and tested before the blocks are joined, to
guarantee the system's proper functioning. If all blocks work but the system's
control unit fails, the system's function is compromised. Figure 4 shows
the state machine that serves as the system control unit, handling
all signals and, most importantly, the counting and verification that confirm
that the system output is correct.
Figure 4. State machine of the encryption control unit
The system rests until a start signal is issued. The following state
loads both the key and the text to encrypt, and the round counter
starts at zero. Immediately after that, the system goes to an encryption
state where the remaining 30 rounds, which mix the key and
the information, are performed. This is done by placing multiplexers in the
datapath so that the output of the previous round is fed back as the input of the
next one. The round counter must be fully
synchronized because, if a single clock cycle is lost or the key is mixed
with a wrong count, the entire encryption process is corrupted. This
is because the S-box layer and pLayer blocks spread an error of
even one bit over the entire information block through their
substitutions and permutations. Finally, a finish state generates
an output signal that is synchronized to issue a load or data-output command to
a possible transmission or storage block for the encrypted data.
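The behaviour of such a control unit can be illustrated with a small cycle-by-cycle model. The Python sketch below is a simplification under explicit assumptions and does not reproduce Figure 4: the state names, the function names step and run_once, and the terminal counter value LAST_COUNT are hypothetical. The terminal value is chosen so that the encryption state lasts 32 clock cycles, in line with the 32 cycles per block quoted in the Results; the actual design may count differently.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"        # waiting for the start signal
    LOAD = "load"        # key and text are loaded, counter cleared
    ENCRYPT = "encrypt"  # one round per clock cycle
    DONE = "done"        # output-valid signal is raised


LAST_COUNT = 31  # assumed terminal value of the round counter


def step(state, counter, start):
    """Return (next_state, next_counter) for one clock edge."""
    if state is State.IDLE:
        return (State.LOAD, 0) if start else (State.IDLE, 0)
    if state is State.LOAD:
        return State.ENCRYPT, 0
    if state is State.ENCRYPT:
        if counter == LAST_COUNT:
            return State.DONE, counter
        return State.ENCRYPT, counter + 1
    return State.IDLE, 0  # DONE returns to rest


def run_once():
    """Trace the states visited from the start pulse until DONE."""
    state, counter, trace = State.IDLE, 0, []
    start = True
    while state is not State.DONE:
        state, counter = step(state, counter, start)
        start = False  # the start signal is a single-cycle pulse
        trace.append((state, counter))
    return trace
```

Because the counter advances only inside the encryption state, a lost or repeated clock edge would shift every later counter value, which illustrates why the counter must stay synchronized with the key update.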
After the entire code is written, and using the given test vectors, a
simulation of the whole system is run. The debugging and verification process
was long and costly, and required the redesign of each block and of the
control unit at each step.
4. Results
When the implementation is complete, the hardware used by
each functional block of the description is measured and compared to the reference
implementation created by the authors of the algorithm [4]. The comparison
(Table 3) shows a significant improvement in terms of hardware use.
Table 3. Comparative table of the reference implementation and ours
                 Reference                           Own
Block            Flip-flops  Slices  % used slices   Flip-flops  Slices  % used slices
Sbox-layer       0           64      24.24%          0           37      19.17%
p-layer          0           0       0.00%           0           0       0.00%
State-machine    96          34      12.88%          11          30      15.54%
Add Round Key    0           37      14.02%          0           67      34.72%
Key-update       7           129     48.86%          7           59      30.57%
Registers        164         0       0.00%           137         0       0.00%
Total            260         264     100.00%         155         193     100.00%
Overall reduction (slices): 26.89%
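The overall reduction follows directly from the slice totals in Table 3. As a quick check, the Python sketch below recomputes it from the per-block slice counts; the variable names are hypothetical and the figures are copied from the table.

```python
# Slices per functional block, copied from Table 3.
reference_slices = {"Sbox-layer": 64, "p-layer": 0, "State-machine": 34,
                    "Add Round Key": 37, "Key-update": 129, "Registers": 0}
own_slices = {"Sbox-layer": 37, "p-layer": 0, "State-machine": 30,
              "Add Round Key": 67, "Key-update": 59, "Registers": 0}

reference_total = sum(reference_slices.values())
own_total = sum(own_slices.values())

# Relative saving in slices with respect to the reference implementation.
reduction_percent = 100 * (reference_total - own_total) / reference_total
print(f"{reference_total} -> {own_total} slices: {reduction_percent:.2f}% reduction")
```

The breakdown also shows where the saving comes from: the Key-update and Sbox-layer blocks shrink, while the Add Round Key block grows.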
One of the most important contributions is the improvement of the overall
system throughput, which can be compared directly with the values in Table 4 [4].
Table 4. Comparative table of some encryption algorithms [4]
Cipher         Key size  Block size  Cycles per block  Throughput at 100 kHz (kbps)  Logic process  Area (GE)  Area (rel.)
Block ciphers
PRESENT-80     80        64          32                200                           0.18 µm        1570       1
AES-128[16]    128       128         132               12.4                          0.35 µm        3400       2.17
HIGHT[22]      128       64          34                188.2                         0.25 µm        3048       1.65
mCrypton[30]   96        64          13                492.3                         0.13 µm        2681       1.71
Camellia[1]    128       128         20                640                           0.35 µm        11350      7.23
DES[37]        56        64          144               44.4                          0.18 µm        2309       1.47
DESXL[37]      184       64          144               44.4                          0.18 µm        2168       1.38
Stream ciphers
Trivium[18]    80        1           1                 100                           0.13 µm        2599       1.66
Grain[18]      80        1           1                 100                           0.13 µm        1294       0.82
The analysis focuses specifically on the PRESENT-80 algorithm. The
throughput reported by its authors, 200 kbps at 100 kHz, is compared to our own
throughput, derived from the maximum delay of 6.865 ns, which corresponds to a
maximum clock frequency of 145.6 MHz for the design. Each round
takes one clock cycle, so a block takes 32 clock cycles, and each block
contains 64 bits of information.
\[ \text{Approximate throughput} = \frac{\text{Maximum clock frequency}}{\text{Number of rounds}} = \frac{145.6\ \text{MHz}}{32\ \text{rounds}} = 4.55 \times 10^{6}\ \text{messages/s} \tag{5} \]
\[ \text{Maximum transfer rate (kbps)} = 4.55 \times 10^{6}\ \text{messages/s} \times 64\ \text{bits} = 291200\ \text{kbps} \tag{6} \]
This implies an increase in throughput by a factor of 1456 with respect to the
200 kbps that the reference reports at 100 kHz. The creators of the
algorithm focused on the viability of the algorithm rather than on its performance,
hence the clear difference.
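The arithmetic of equations (5) and (6) can be reproduced with a few lines of Python. This is only a numerical check; the constant names are hypothetical and the values are the ones quoted in the text and in Table 4.

```python
F_MAX_HZ = 145.6e6        # maximum clock frequency of the design
CYCLES_PER_BLOCK = 32     # one clock cycle per round
BLOCK_BITS = 64           # bits per encrypted block
REFERENCE_KBPS = 200.0    # PRESENT-80 at 100 kHz, from Table 4

blocks_per_second = F_MAX_HZ / CYCLES_PER_BLOCK              # eq. (5)
throughput_kbps = blocks_per_second * BLOCK_BITS / 1e3       # eq. (6)
ratio = throughput_kbps / REFERENCE_KBPS

print(f"{blocks_per_second / 1e6:.2f} M blocks/s")
print(f"{throughput_kbps:.0f} kbps, {ratio:.0f} times the reference figure")
```

Both figures correspond to 64 bits every 32 clock cycles, so the ratio depends on the clock frequency at which each throughput is quoted.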
5. Conclusions
PRESENT is an ultra-light-weight algorithm for block-based encryption. It is one of
the most compact encryption methods and it is the smallest encryption scheme in
comparison to the reference size. Due to these structural features, it has aroused
great interest in research on computationally efficient applications
with low energy consumption demands. This research in particular
focuses on a performance study on hardware platforms, aiming to find ideal
operating conditions for high-performance applications. A total of four
implementations were carried out on four different fixed-hardware microcontrollers,
with four specific architectures of the algorithm for different degrees of
demand, as well as an implementation on an FPGA, as reconfigurable hardware, that
showed an operating speed 1000 times greater than the one previously reported.
It is worth noting that the preliminary literature review for this project did not
find any reported implementation of this algorithm on
microcontrollers, which makes this work unique in that respect. As for the FPGA
implementation, previous implementations are well known; ours reports a 26%
reduction in the hardware used, due to an internal study of the algorithm
and of the final architecture. This implementation allows standard coding and
keeps open the possibility of using very high-level tools to describe hardware,
or automatic C code generators, to shorten the development time of the
algorithm without abandoning the light-weight philosophy.
Hardware use was reduced by close to 30% in comparison to previous
implementations on FPGA. The hardware implementation, described in the standard
VHDL language, achieved a reduction of 26.89% in the
use of resources on a Spartan 3AN FPGA from Xilinx. It is worth mentioning
that the reference implementation uses standard VHDL and can be replicated
on FPGAs from other manufacturers, where equivalent
reductions are therefore expected.
References
[1] J. Attridge, An overview of hardware security modules, SANS Institute,
InfoSec Reading Room, 1 (2002), no. 1, 1-10.
[2] H. A. Alkhzaimi and M. M. Lauridsen, Cryptanalysis of the Simon family of
block ciphers, Technical University of Denmark, Vol. 1, 2013, no. 1, 1-26.
[3] P. Valla and J. Kaps, Lightweight cryptography for FPGAs, Reconfigurable
Computing and FPGAs, 2009. ReConFig ’09. International Conference on, 2009,
225- 230. https://doi.org/10.1109/reconfig.2009.54
[4] A. Bogdanov, L. Knudsen, G. Leander, C. Paar, A. Poschmann, M. J. B.
Robshaw, Y. Seurin and C. Vikkelsoe, PRESENT: An Ultra-Lightweight Block
Cipher, Chapter in Cryptographic Hardware and Embedded Systems - CHES
2007, Springer-Verlag Berlin Heidelberg, 2007, Ch. 5, 450-466.
https://doi.org/10.1007/978-3-540-74735-2_31
[5] R. Azuero, E. Jacinto and J. Castano, A low-memory implementation of 128
AES for 32 bits architectures, in Congreso Argentino de Sistemas Embebidos
CASE 2012, 2012, 67-73.
[6] M. Kumar and A. Singhal, Efficient implementation of advanced encryption
standard (AES) for ARM based platforms, 1st International Conference on Recent
Advances in Information Technology (RAIT), 2012, 23-27.
https://doi.org/10.1109/rait.2012.6194473
[7] K. M. Abdellatif, R. Chotin-Avot and H. Mehrez, Lightweight and compact
solutions for secure reconfiguration of FPGAs, 2013 International Conference on
Reconfigurable Computing and FPGAs (ReConFig), 2013, 1-4.
https://doi.org/10.1109/reconfig.2013.6732304
[8] J. Pospíšil and M. Novotný, Evaluating cryptanalytical strength of lightweight
cipher PRESENT on reconfigurable hardware, 15th Euromicro Conference on
Digital System Design (DSD), 2012, 560-567.
https://doi.org/10.1109/dsd.2012.53
[9] E. Kavun and T. Yalcin, RAM-based ultra-lightweight FPGA implementation
of PRESENT, 2011 International Conference on Reconfigurable Computing and
FPGAs (ReConFig), 2011, 280-285. https://doi.org/10.1109/reconfig.2011.74
[10] A. Aysu, E. Gulcan and P. Schaumont, Simon says: Break area records of
block ciphers on FPGAs, IEEE Embedded Systems Letters, 6 (2014), no. 2,
37-40. https://doi.org/10.1109/les.2014.2314961
[11] S. Feizi, A. Ahmadi and A. Nemati, A hardware implementation of Simon
cryptography algorithm, 2014 4th International Conference on Computer and
Knowledge Engineering (ICCKE), 2014, 245-250.
https://doi.org/10.1109/iccke.2014.6993386
[12] Shuangqing Wei, J. Wang, R. Yin and J. Yuan, Trade-off between security
and performance in block ciphered systems with erroneous ciphertexts, IEEE
Transactions on Information Forensics and Security, 8 (2013),
no. 4, 636-645. https://doi.org/10.1109/tifs.2013.2248724
[13] C.-P. Fan and J.-K. Hwang, Implementations of high throughput sequential
and fully pipelined AES processors on FPGA, International Symposium on
Intelligent Signal Processing and Communication Systems, 2007. ISPACS 2007,
353-356. https://doi.org/10.1109/ispacs.2007.4445896
[14] Chih-Peng Fan and Jun-Kui Hwang, Implementations of high throughput
sequential and fully pipelined AES processors on FPGA, International
Symposium on Intelligent Signal Processing and Communication Systems ISPACS
2007, 2007, 353-356. https://doi.org/10.1109/ispacs.2007.4445896
[15] J. J. Tay, M. M. Wong and I. Hijazin, Compact and low power AES block
cipher using lightweight key expansion mechanism and optimal number of s-
boxes, 2014 International Symposium on Intelligent Signal Processing and
Communication Systems (ISPACS), 2014, 108-114.
https://doi.org/10.1109/ispacs.2014.7024435
[16] R. Beaulieu, D. Shors, J. Smith, S. Treatman-Clark, B. Weeks and L.
Wingers, The Simon and Speck families of lightweight block ciphers, Cryptology
ePrint Archive, Report 2013/404, 4 2013, http://eprint.iacr.org
[17] V. Trujillo-Olaya, T. Sherwood and Ç. Kaya Koç, Analysis of performance
versus security in hardware realizations of small elliptic curves for lightweight
applications, Journal of Cryptographic Engineering, 2 (2012), no. 3, 179-188.
https://doi.org/10.1007/s13389-012-0039-x
[18] F.A. Urbano-Molano, V. Trujillo-Olaya and J. Velasco-Medina, Design of an
elliptic curve cryptoprocessor using optimal normal basis over GF(2^233), 2013
IEEE Fourth Latin American Symposium on Circuits and Systems (LASCAS),
2013, 1-4. https://doi.org/10.1109/lascas.2013.6519014
[19] R. Beaulieu, D. Shors, J. Smith, S. Treatman-Clark, B. Weeks and L.
Wingers, The Simon and Speck block ciphers on AVR 8-bit microcontrollers,
Lightweight Cryptography for Security and Privacy, Vol. 1, 2014, no. 1, 1-18.
https://doi.org/10.1007/978-3-319-16363-5_1
[20] N. Hanley and M. O'Neill, Hardware comparison of the ISO/IEC 29192-2
Block ciphers, 2012 IEEE Computer Society Annual Symposium on VLSI
(ISVLSI), 2012, 57-62. https://doi.org/10.1109/isvlsi.2012.25
[21] D. Klinc, C. Hazay, A. Jagmohan, H. Krawczyk and T. Rabin, On
compression of data encrypted with block ciphers, IEEE Transactions on
Information Theory, 58 (2012), no. 11, 6989-7001.
https://doi.org/10.1109/tit.2012.2210752
[22] F. Qatan and I. Damaj, High-speed KATAN ciphers on-a-chip, 2012
International Conference on Computer Systems and Industrial Informatics
(ICCSII), 2012, 1-6. https://doi.org/10.1109/iccsii.2012.6454511
[23] S. Mane, M. Taha and P. Schaumont, Efficient and side-channel-secure block
cipher implementation with custom instructions on FPGA, 2012 22nd
International Conference on Field Programmable Logic and Applications (FPL),
2012, 20-25. https://doi.org/10.1109/fpl.2012.6339236
Received: July 11, 2017; Published: August 1, 2017