PARALLEL ALGORITHMS FOR
SYMMETRIC KEY
INFRASTRUCTURE BASED
SECURITY TECHNIQUES
THESIS
Submitted
in fulfilment of the requirements of the degree of
DOCTOR OF PHILOSOPHY
By
Disha Handa
University Regd. No: PHDENG10023
Supervised by
Dr. Bhanu Kapoor,
Professor, Chitkara University, Himachal Pradesh
December, 2014
Department of Computer Science & Engineering
CHITKARA UNIVERSITY, HIMUDA EDUCATIONAL HUB,
SOLAN, HIMACHAL PRADESH-174103
CHITKARA UNIVERSITY, HIMACHAL PRADESH
DECLARATION BY THE STUDENT
I hereby certify that the work presented in this thesis, entitled
“Parallel Algorithms for Symmetric Key Infrastructure Based Security
Techniques”, in fulfillment of the requirement for the award of the
Degree of Doctor of Philosophy and submitted in the Department of
Computer Science and Engineering, Chitkara University, Barotiwala,
Solan, Himachal Pradesh, is an authentic record of my own work carried
out under the supervision of Dr. Bhanu Kapoor.
The work has not formed the basis for the award of any other degree or
diploma, in this or any other Institution or University. In keeping with the
ethical practice in reporting scientific information, due acknowledgements
have been made wherever the findings of others have been cited.
(Signature)
(Disha Handa)
CHITKARA UNIVERSITY, HIMACHAL PRADESH
CERTIFICATE BY THE SUPERVISOR
This is to certify that the thesis entitled “Parallel Algorithms for Symmetric Key
Infrastructure Based Security Techniques”, submitted by Disha Handa, Regd. No.
PHDENG10023, to the Chitkara University, Barotiwala, Solan, Himachal Pradesh in
fulfillment for the award of the degree of Doctor of Philosophy, is a bona fide record of
research work carried out by her under my supervision. The contents of this thesis, in
full or in parts, have not been submitted to any other Institution or University for the
award of any degree or diploma.
(Signature)
Dr. Bhanu Kapoor,
Professor, Chitkara University,
Himachal Pradesh, India
ACKNOWLEDGMENT
I would like to express my special appreciation and thanks to my advisor,
Professor Dr. Bhanu Kapoor. You have been a great mentor for me. I would like to
thank you for encouraging my research and for allowing me to grow as a research
scientist. Your advice on both my research and my career has been
priceless. I would also like to thank my research colleagues, Ms. Neha Kishore,
Ms. Sapna Saxena, and Ms. Tanu Sharma, for their valuable support. Special
thanks to Ms. Harpreet Kaur from the Electronics and Communication Department for
her advice and support. Heartfelt thanks to Ms. Isha Saluja and Ms. Padma kala
for helping me refine the language of this document.
Likewise, I would like to thank my family for motivating and supporting me in
this endeavor and letting me spread my wings. My parents, husband and
my little daughter have been my backbone throughout. Only because of them
could I sustain the hard work towards my goal.
Finally, I would like to thank the management of Chitkara University, with
special thanks to Dr. Ashok Chitkara, Chancellor, Dr. Madhu Chitkara, Pro
Chancellor, Brig. (Dr.) R.S. Grewal, Vice Chancellor, Dr. Sudhir Mahajan, Dean
Research and Development and his team, Dr. Rajnish Sharma, Dean Academics,
Dr. Shaily Jain, Head of the Computer Science Department, and all the internal
and external examiners for their valuable time and their expert guidance during
the various progress seminars and presentations, and for their suggestions,
feedback and approvals.
Once again, I would like to extend my deep gratitude to everyone who has helped
me shape this dream and make it a reality.
Above all, I would like to thank the Almighty for giving me the inner strength and
passion that drives me and helps me keep going.
Disha Handa
LIST OF PUBLICATIONS
Published/Presented
Handa D and Kapoor B (2014) “State of the Art Realistic Cryptographic
Approaches for RC4 Symmetric Stream Cipher”, IJCSA, vol. 4, pp. 27-37,
DOI: 10.5121/ijcsa.2014.4403
Handa D and Kapoor B (2014) “Performance Analysis of PBlock
Algorithm Implemented Using SIMD Model to Attain Parallelism”,
Proceedings of the 49th Annual Convention of the Computer Society of
India CSI - Emerging ICT for Bridging the Future, Volume 2, Springer,
pp. 71-80, DOI: 10.1007/978-3-319-13731-5_9
Handa D and Kapoor B (2014) “PARC4: High performance
implementation of RC4 cryptographic algorithm using parallelism”,
Proceedings of the International Conference on Optimization, Reliability,
and Information Technology (ICROIT), pp. 286-289,
DOI: 10.1109/ICROIT.2014.6798339
Accepted
Handa D and Kapoor B (2015) “PARC4-I: Parallel Implementation of
Enhanced RC4A using PASCS and Loop Unrolling Mechanism”,
Computer Applications: International Journal, 2:2
Communicated
Handa D and Kapoor B (2015) “PBlock: an Energy Efficient Parallel
Approach for Faster File Encryption using Parallel Independent Feistel
Cipher Structure”, Asian Journal of Scientific Research
Handa D and Kapoor B (2015) “PARC4: An Energy Efficient, Parallel
Implementation of RC4 Cipher using Parallel Additive Stream Cipher
Structure”, International Journal of Emergent, Parallel and Distributed
Computing
ABBREVIATIONS
AES Advanced Encryption Standard
API Application Programming Interface
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
CC-NUMA Cache-Coherent Non-Uniform Memory Access
DES Data Encryption Standard
DVFS Dynamic Voltage and Frequency Scaling
ECB Electronic Code Book
FDE File and Disk Encryption
FPGA Field-Programmable Gate Array
GCC GNU Compiler Collection
GPGPU General Purpose Graphics Processing Unit
HPC High Performance Computing
IC Integrated Circuit
IPC Inter Processor Communication
KSA Key Scheduling Algorithm
MIMD Multiple Instruction Stream and Multiple
Data Stream
MISD Multiple Instruction Stream Single Data
Stream
NUMA Non-Uniform Memory Access
OpenMP Open Multiprocessing
PASCS Parallel Additive Stream Cipher Structure
PARC4 Parallel Approach of RC4
PRGA Pseudo-Random Generation Algorithm
PIFNS Parallel Independent Feistel Network Structure
RC4 Rivest Cipher 4 (ARCFOUR)
RAM Random Access Memory
ROM Read Only Memory
RSA Rivest Shamir Adleman
RAW Read after Write
SIMD Single Instruction Stream Multiple Data
Stream
SISD Single Instruction Stream Single Data
Stream
SMP Symmetric Multiprocessor Architecture
SSL Secure Sockets Layer
TLS Transport Layer Security
UMA Uniform Memory Access
VLSI Very Large Scale Integration
WEP Wired Equivalent Privacy
WAR Write after Read
WAW Write after Write
NOTATIONS
S Substitution Box
P1-P18 Array containing digits of Pi
Mod Modular arithmetic
Tp Execution time of parallel portion
Ts Execution time for serial portion
E(n) Efficiency with n processing elements
T(1) Execution time using single processing element
T(n) Execution time using n processing elements
⊕ XOR operation
Ω Standard asymptotic lower bound
Θ Standard asymptotic tight bound
𝑇𝑜 Total parallel overhead
X Number of times speedup
𝑝 Number of processors
n Number of blocks
µ micro (prefix, 10⁻⁶)
LIST OF TABLES
Table No. Title
Table 4-1: Time (In Seconds) taken by RC4 to encrypt/decrypt large data files by uniprocessor
Table 4-2: Time (In Seconds) taken by PARC4 to encrypt/decrypt large data files using 2 cores
Table 4-3: Time (In Seconds) taken by PARC4 to encrypt/decrypt large data files using 4 cores
Table 4-4: Time (In Seconds) taken by PARC4 to encrypt/decrypt large data files using 6 cores
Table 4-5: Time (In Seconds) taken by PARC4 to encrypt/decrypt large data files using 8 cores
Table 4-6: Efficiency as a function of n and p for running n blocks on p processors to encrypt input stream
Table 4-7: Comparison between PARC4 and Multithreaded approach
Table 5.1: Time taken by RC4A to encrypt/decrypt large data files by uniprocessor system
Table 5.2: Time taken by PARC4-I to encrypt/decrypt large data files using 2 cores
Table 5.3: Time taken by PARC4-I to encrypt/decrypt large data files using 4 cores
Table 5.4: Time taken by PARC4-I to encrypt/decrypt large data files using 6 cores
Table 5.5: Time taken by PARC4-I to encrypt/decrypt large data files using 8 cores
Table 5.6: Comparison between PARC4 and PARC4-I
Table 7.1: Avalanche effect in Blowfish and PBlock: change in plaintext
Table 7.2: Avalanche effect in Blowfish and PBlock: change in key
Table 7.3: Time taken by Blowfish to encrypt/decrypt large data files by single processor
Table 7.4: Time taken by PBlock to encrypt/decrypt large data files using 2 cores
Table 7.5: Time taken by PBlock to encrypt/decrypt large data files using 4 cores
Table 7.6: Time taken by PBlock to encrypt/decrypt large data files using 6 cores
Table 7.7: Time taken by PBlock to encrypt/decrypt large data files using 8 cores
Table 7.8: Efficiency vs number of processing elements for different file size
Table 7.9: Comparison between PBlock and Pipelined approach
Table 8.1: Calibrated and Non-calibrated specification
Table 8.2: Energy consumed by Blowfish and PBlock with system’s default frequency and voltage
Table 8.3: Energy consumed by existing and proposed parallel algorithms for stream cipher technique using system’s default frequency and voltage
Table 8.4: Low power states of AMD-8320 processor
LIST OF FIGURES
Figure No. Title
Fig. 1.1 Pictorial Representation of Cryptography
Fig. 1.2 Pictorial representation of Symmetric key infrastructure based security algorithms
Fig. 1.3 Stream Cipher Encryption Techniques
Fig. 1.4 Block Cipher Encryption Techniques
Fig. 2.1 Pictorial Representation of Data Parallel Model (Barney, 2010)
Fig. 2.2 Pictorial Representation of MPMD (Barney, 2010)
Fig. 2.3 Representation of Domain Decomposition (Barney, 2010)
Fig. 2.4 Pictorial Demonstration of Functional Decomposition (Barney, 2010)
Fig. 2.5 Multi-Core Processor Architecture
Fig. 3.1 Design of Parallel Additive Stream Cipher Structure
Fig. 4.1 Swapping between S[i] and S[j]
Fig. 4.2 I. Depicts sequential key generation whereas II. Presents the formation of key stream for parallel framework
Fig. 4.3 Graphical Representation of Complete Flow and Model Used to Parallelize RC4
Fig. 4.4 Pictorial Representation of Input Data Decomposition Technique
Fig. 4.5 Speedup comparison of PARC4 using multiple cores
Fig. 4.6 Speedup for Constant Data using multiple cores
Fig. 4.7 Comparison for Throughput achieved using RC4 and PARC4
Fig. 5.1 Method used to implement PARC4-I on SMPs
Fig. 5.2 Pictorial representation of normal and unwinding loop
Fig. 5.3 Execution time of 1Gb of data file using PARC4-I
Fig. 5.4 Speedup comparison using multiple cores
Fig. 5.5 Graphical representation of parallel run time of PARC4 vs PARC4-I on eight cores
Fig. 5.6 Comparison between PARC4 and PARC4-I for speedup
Fig. 5.7 Comparison of PARC4 and PARC4-I for Efficiency
Fig. 5.8 Comparison between PARC4 and PARC4-I algorithms for throughput
Fig. 6.1 Structure of Sequential Feistel Network (William, 2006)
Fig. 6.2 Parallel Independent Feistel Network Structure
Fig. 7.1 Graphical representation of F function
Fig. 7.2 Speedup comparison of PBlock using multiple cores
Fig. 7.3 For constant file size speedup tends to saturate at specific point
Fig. 8.1 Power metering interface exposed by Joulemeter
Fig. 8.2 Data file of PARC4 consisting of joules consumed at each time stamp
Fig. 8.3 Comparison of serial, parallel and parallel with calibration Blowfish and PBlock for energy consumption using platform 1
Fig. 8.4 Comparison of serial, parallel and parallel with calibration Blowfish and PBlock for energy consumption using platform 2
Fig. 8.5 Serial and Parallel algorithms for stream ciphers technique with default and calibrated frequency using platform 1
Fig. 8.6 Serial and Parallel algorithms for stream ciphers technique with default and calibrated frequency using platform 2
LIST OF ALGORITHMS
Algorithm No. Title
Algorithm 1.1 The Key Scheduling Algorithm (KSA) (Schneier, 2008)
Algorithm 1.2 The Pseudo-Random Generation Algorithm (PRGA) (Schneier, 2008)
Algorithm 4.1 Steps to Implement PARC4
Algorithm 5.1 Enhanced pseudo-random generation algorithm (PRGA)
Algorithm 5.2 Method used to parallelize multiple data chunks using PARC4-I
Algorithm 7.1 Algorithmic steps for encryption process in PBlock
Algorithm 7.2 Algorithmic steps for parallel F function
CONTENTS
DECLARATION BY THE STUDENT
CERTIFICATE BY THE SUPERVISOR
ACKNOWLEDGMENT
LIST OF PUBLICATIONS
ABBREVIATIONS
NOTATIONS
List of Tables
List of Figures
List of Algorithms
Contents
Abstract
Chapter 1
Introduction
1.1 Technology enhancements
1.2 Cryptography
1.2.1 Classification of Cryptographic techniques
Blowfish
CAST
Data Encryption Standard (DES)
IDEA
RC4
Triple DES
1.3 Issues in Symmetric Key Infrastructure based Algorithms
1.4 Possible Solutions
1.5 Motivation of Research
1.6 Research Problem
1.7 Literature Review
1.7.1 Description of Blowfish
1.7.2 Description of RC4
1.7.3 Description of RC4A
1.8 Dissertation Contribution And Delineate
Chapter 2
Basic Ideology of Parallel Computing, Tools and Experimental Setup used for Research
2.1 Introduction
2.2 Essential notions of parallel programming used in Research
2.2.1 Identifying Parallel Region in Code
2.2.2 Type of Parallel Computer
2.2.3 Speed up calculation using Amdahl’s law
2.2.4 Parallel computing memory architecture
2.2.5 Parallel Programming Models
2.2.6 Partitioning
2.2.7 Synchronization
2.2.8 Mapping for Load Balancing
2.2.9 Granularity
2.3 Experimental setup
2.4 Tools used
2.4.1 Gprof
2.4.2 OpenMP
2.4.3 MinGW
2.4.4 CodeBlocks
2.4.5 Joulemeter
Conclusion
Chapter 3
Design of Parallel Additive Stream Cipher Structure
3.1 Introduction
3.2 Motivation For Parallel Architecture
3.3 Design of PASCS
Conclusion
Chapter 4
PARC4: Parallel approach for RC4 using PASCS
4.1 Introduction
4.2 Detection of Parallelism
4.3 Method for Adding Parallelism
4.3.1 Parallelization techniques
4.4 Security Analysis
4.4.1 Shannon’s entropy
4.5 Experimental Results
4.6 Performance and Scalability Analysis
4.6.1 Speedup
4.6.2 Efficiency
4.6.3 Complexity and Cost optimality
4.6.4 Scalability
4.6.5 Throughput
4.7 Comparative Analysis
4.7.1 Mapping and Load Balance
4.7.2 Modified Key stream
4.7.3 Energy Efficiency
Conclusion
Chapter 5
PARC4-I: Parallel RC4A using PASCS and loop unrolling mechanism
5.1 Introduction
5.2 Modified KSA and PRGA
5.3 Incorporating parallelism
5.3.1 Techniques to enhance benefits of parallelization
5.4 Experimental Results
5.5 Performance and Scalability Analysis
5.5.1 Parallel Run Time
5.5.2 Speedup
5.5.3 Efficiency
5.5.4 Scalability
5.6 Comparison between PARC4 and PARC4-I
5.6.1 Parallel Run time
5.6.2 Speedup
5.6.3 Efficiency
5.6.4 Loop overhead
5.6.5 Throughput
Conclusions
Chapter 6
Design of Parallel Independent Feistel Network ..............................................................86
6.1 Introduction ......................................................................................................86
6.2 Motivation for Parallel architecture .................................................................88
6.3 Design of Parallel Independent Feistel Network Structure ..............................88
6.4 Application Area of PIFNS .................................................................................90
Conclusion ....................................................................................................................90
7 ........................................................................................................................................91
Chapter 7 ................................................................................................................92
PBlock- Parallel approach for Blowfish cipher using PIFN .............................................92
7.1 Introduction ......................................................................................................92
7.2 Implementation of PBlock using PIFNS ............................................................93
7.2.1 Parallel Methodology ...........................................................................................94
7.2.2 Design of Parallel F function ............................................................................97
7.3 Security Analysis using Avalanche effect ..........................................................97
7.4 Experimental Results ......................................................................................100
7.5 Performance and Scalability Analysis .............................................................103
7.5.1 Speedup ..........................................................................................................103
7.5.2 Efficiency .......................................................................................................105
7.5.3 Complexity and Cost optimality .....................................................................106
7.5.4 Scalability .......................................................................................................106
7.6 Comparative analysis of PBlock and Blowfish using Pipeline approach .........107
Conclusion ..................................................................................................................108
8 ......................................................................................................................................109
8 Chapter 8 ............................................................................................................110
Analysis of Energy Consumption by proposed parallel algorithms ...............................110
8.1 Introduction ....................................................................................................110
8.2 Motivation ......................................................................................................111
8.3 Tools and Techniques used for energy measurement ...................................112
8.4 How Joulemeter works to measure energy ...................................................113
8.5 Energy Measurement .....................................................................................114
8.5.1 Result and Analysis ........................................................................................114
Conclusion ..................................................................................................................120
9 Chapter 9 ...............................................................................................................121
Conclusions and Future Scope .......................................................................................122
9.1 Thesis Contribution ........................................................................................122
9.2 CONCLUSIONS ....................................................................................................123
9.3 Future Scope ...................................................................................................125
References ......................................................................................................................127
Appendix-A ....................................................................................................................134
Appendix-B ....................................................................................................................135
Appendix-C ....................................................................................................................136
Appendix-D ....................................................................................................................139
Appendix-E.....................................................................................................................141
Appendix-F .....................................................................................................................142
Appendix-G ....................................................................................................................146
Appendix-H ....................................................................................................................148
Appendix-I ......................................................................................................................150
ABSTRACT
Parallel computing involves simultaneous use of multiple compute
resources to solve a large computational problem. In real-world applications,
many of the tasks can be executed in parallel. Parallel computing is being used in
diverse areas that range from computational simulations for technical and
engineering problems to marketable applications in transaction processing and
data mining. The performance and energy benefits of parallelism are key drivers
for the growth in parallel computing.
Data security is a critical issue for businesses and individual computer
users. Client information, payment information, personal files and bank account
details, all typically needed in commercial transactions, are
potentially dangerous if they fall into the wrong hands. Thus, to secure data or
information, various cryptographic algorithms are used. These encryption
algorithms are compute-intensive and tend to be slow as a result. They
can benefit significantly from parallel implementations that utilize
the multicore processors available today.
In this thesis, parallel symmetric-key based algorithms to encrypt/decrypt
large sets of data are proposed. The design of the parallel algorithms
targets improvements in the speed and energy consumption of these algorithms.
Much recent effort has gone into enhancing the speed of these algorithms using
FPGA-based hardware implementations. The thesis proposes software-based
parallel implementations of these security algorithms running on
symmetric multiprocessing machines.
The performance of the proposed parallel algorithms has been compared with
that of the existing sequential implementations. The
comparisons show that the proposed algorithms perform significantly
better than the existing sequential algorithms. Apart from the
speedup gained from parallel implementation, the energy efficiency of the
algorithms has also been measured. The energy efficiency of the parallel
algorithms makes them suitable for use in handheld devices. The algorithms
proposed in this thesis have high potential for adoption in full-disk
encryption and other data-intensive encryption processes.
Chapter 1
Introduction
1 CHAPTER 1
INTRODUCTION
This chapter discusses security as one of the foremost concerns of today’s
computer-centric era and the requirements for security techniques, along with
some historical background on cryptographic algorithms. The issues involved in
sequential security algorithms and the motivation for the research are discussed,
followed by an in-depth literature review.
To secure information communicated over a network, different encryption
algorithms have been used over time. Encryption algorithms fall
into two broad categories: symmetric and asymmetric (Menezes, 1996).
In symmetric algorithms, the same key is used for both encryption
and decryption. Symmetric block cipher algorithms include DES,
3-DES, AES and Blowfish. RC4 is a symmetric stream cipher algorithm
(Menezes, 1996). Asymmetric algorithms use two different keys for encryption
and decryption. A public-key infrastructure-based algorithm such as RSA is an
example of an asymmetric encryption algorithm. Apart from the security level,
the speed of an encryption algorithm is also a very important aspect in the
cryptographic world. A slow algorithm can drastically reduce the speed of the entire
application and diminish its effectiveness. Power consumption by electronic
devices such as smartphones, tablets and other computing systems is another
challenging issue that has become a significant concern at the individual level
as well as the community level (the energy these systems consume and the heat
they dissipate add to environmental concerns such as greenhouse gas emissions).
In multicore processor models, applications can be executed on N cores, where N
may vary, and these cores can operate at different frequencies (Roy, 2008,
POWER, 2010, Vajda and Stenström, 2012). The overall performance and power cost
of a parallel algorithm depend on parameters such as the number of cores the
algorithm uses, the set of frequencies these cores operate at and, last but not
least, the structure of the parallel algorithm (Gepner and Kowalik, 2006).
Sequential security algorithms can be made faster using parallelization.
Fortunately, with the advent of parallel processors in computing, we now have
readily available means to parallelize algorithms and make them faster (Kumar
V). Symmetric multiprocessors such as those from Intel and AMD can be
used in conjunction with parallel programming APIs such as OpenMP to
make security algorithms parallel and faster (Chapman B, 2008). It is feasible to
use parallel algorithms for any of the cryptographic techniques currently in use.
The basic motivation for this research is to examine whether complex security
algorithms can be broken down into tasks that can be executed
in parallel, leading to performance gains.
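The idea of breaking an algorithm into independently executable tasks can be sketched as follows. This is only an illustrative sketch, not the thesis implementation: the thesis targets OpenMP, whereas the example uses Python's `concurrent.futures` thread pool as a stand-in, and `transform_chunk` is a hypothetical toy transformation (repeating-key XOR) chosen only because each chunk can be processed without reference to any other chunk.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_chunk(chunk: bytes, offset: int, key: bytes) -> bytes:
    """Toy per-byte transformation (repeating-key XOR); NOT a secure cipher.

    `offset` is the position of the chunk in the full input, so every chunk
    can be processed independently of the others.
    """
    return bytes(b ^ key[(offset + i) % len(key)] for i, b in enumerate(chunk))


def transform_sequential(data: bytes, key: bytes) -> bytes:
    """Process the whole input as one task."""
    return transform_chunk(data, 0, key)


def transform_parallel(data: bytes, key: bytes, workers: int = 4) -> bytes:
    """Split the input into independent chunks and process them as tasks."""
    size = max(1, -(-len(data) // workers))  # ceiling division
    offsets = range(0, len(data), size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so the chunks can be
        # concatenated directly.
        parts = pool.map(
            lambda off: transform_chunk(data[off:off + size], off, key), offsets
        )
        return b"".join(parts)
```

Because Python threads share one interpreter lock, this sketch shows the task decomposition rather than a real speedup; the point is that the parallel and sequential versions produce identical output when the chunks carry no dependencies between them.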
The chapter is organized as follows: Section 1.1 gives brief information about
technology enhancements. Section 1.2 gives an
overview of cryptographic techniques. Section 1.3 discusses issues related
to cryptographic algorithms, and Section 1.4 presents possible solutions to
them. Section 1.5 states the motivation for the research, and Section 1.6 gives the research
problem. The literature review related to the research problem is presented in
Section 1.7.
1.1 Technology enhancements
Cryptography is a scientific discipline in which intensive computation is
required to secure information of any type (William, 2006,
Diffie and Hellman, 1976). The information can be data transmitted over a channel
or important file or disk data. The complexity of
cryptographic algorithms has been increasing because of the heavy use of confusion
and diffusion structures, variable numbers of rounds and complex Feistel functions,
which leads to longer execution times (Patidar et al., 2009, Kanda,
2001).
Advances in CMOS technology scaling increase
performance, raise transistor density and reduce power
consumption (Davari et al., 1995, Auth et al., 2012). A chip with billions
of transistors was not unforeseen by industry, as the development
closely follows Moore’s Law, which states that the number of
transistors in an integrated circuit doubles approximately every two years
(Moore, 1965). Gordon E. Moore, a co-founder of Intel
Corporation, described this trend in 1965. His law continues to
apply to CMOS technology. In 2006, Intel designed its first chip with
more than one billion transistors. Having so many transistors on a single chip
opens up new levels of computing capability.
However, enhancing a processor’s performance without generating too much heat
is a challenge, as the following quote indicates: “Intel processors would soon be
producing more heat per square centimeter than the surface of the sun,
which is why the problem of heat is already setting hard limits to
frequency (clock speed) increases” (Koch, 2005). Parallel programmers
need to take frequency and voltage into account in order to reduce the power
consumed by complex applications.
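The role of frequency and voltage can be illustrated with the standard first-order model of dynamic CMOS power, P = a · C · V² · f, where a is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. This model is not stated in the text above and is introduced here only as an assumption for illustration; the numerical values are hypothetical.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float,
                  activity: float = 1.0) -> float:
    """First-order dynamic CMOS power in watts: P = a * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


# Hypothetical core: 1 nF switched capacitance, 1.2 V, 3 GHz.
baseline = dynamic_power(1e-9, 1.2, 3.0e9)

# Same core with both voltage and frequency scaled down by 20 %.
scaled = dynamic_power(1e-9, 0.8 * 1.2, 0.8 * 3.0e9)

# Power falls with the cube of the common scaling factor: 0.8 ** 3.
ratio = scaled / baseline
```

Under this model, running several cores at a lower voltage and frequency can deliver the same throughput as one fast core while drawing less power, which is the intuition behind energy-efficient parallel execution.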
1.2 Cryptography
Over time, an elaborate set of rules and methods has been
specified to address information security concerns. In the early days, the
major purpose of cryptography was to achieve message confidentiality,
that is, the transformation of messages from a readable form into an
unintelligible one and back again at the other end. The basic idea behind this
process is to make the information unreadable by unauthorized parties and so
ensure privacy in communications. A pictorial representation of
cryptography is shown in Fig. 1.1. Nowadays, the field has
extended beyond confidentiality and includes procedures to ensure
message integrity, digital signatures, authentication and secure
computation. Today, cryptography has the following
objectives (William, 2006, Yu et al., 2010):
1) Confidentiality: It refers to limiting access to information and preventing
access by unauthorized users.
2) Integrity: It refers to the trustworthiness and consistency of
information resources.
3) Non-repudiation: It ensures that the source of the information cannot
later deny its intent in transmitting the
information.
4) Authentication: It refers to the process of confirming the identities of
the sender and the receiver, along with confirmation of the
source and the destination of the information.
Fig. 1.1 Pictorial representation of Cryptography (Source:
http://www.onlinebusiness.newstipstricks.com/what-is-cryptography/)
Cryptography comprises diverse approaches, such as merging
words with images and using microdots to hide information in transit, among many
others. Essentially, it assures the sender and receiver that the information
cannot be retrieved by unauthorized parties. The most common traditional
ciphers fall into two major categories:
Substitution techniques: In this technique, the letters or digits of the
plaintext are replaced by other letters, digits or special
symbols. The Caesar cipher, monoalphabetic ciphers, Hill ciphers and
polyalphabetic ciphers belong to this category (Menezes,
1996).
Transposition techniques: Instead of substitution, a different kind
of mapping is achieved by permuting the plaintext according to
some predefined function (Menezes, 1996). This is
referred to as a transposition cipher. The rail fence technique is a common
method used to achieve such a permutation. In this
method, the plaintext is written down as a sequence of diagonals and then
read off row by row. For example, the message
“Go to the party”, enciphered with a two-row rail fence, is written as:
G t t e a t
o o h p r y
Finally, the encrypted message is: “Gtteatoohpry”
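The two categories above can be sketched in a few lines of code. The functions below are a minimal illustration, not taken from the thesis: `caesar_encrypt` is an assumed example of a substitution cipher with a fixed alphabet shift, and `rail_fence_encrypt` implements the two-row rail fence for the message “Go to the party”, which is the message that the two-row grid above spells out once spaces are removed.

```python
def caesar_encrypt(plaintext: str, shift: int) -> str:
    """Substitution: replace each ASCII letter by the one `shift` places on."""
    out = []
    for ch in plaintext:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave spaces, digits and symbols unchanged
    return "".join(out)


def rail_fence_encrypt(plaintext: str) -> str:
    """Transposition: two-row rail fence, spaces removed before writing."""
    letters = plaintext.replace(" ", "")
    # Row 1 holds the letters at even positions, row 2 those at odd positions.
    return letters[0::2] + letters[1::2]


def rail_fence_decrypt(ciphertext: str) -> str:
    """Interleave the two rows again to recover the space-free plaintext."""
    half = (len(ciphertext) + 1) // 2
    top, bottom = ciphertext[:half], ciphertext[half:]
    out = []
    for i in range(half):
        out.append(top[i])
        if i < len(bottom):
            out.append(bottom[i])
    return "".join(out)


print(rail_fence_encrypt("Go to the party"))  # Gtteatoohpry
print(caesar_encrypt("abc xyz", 3))           # def abc
```

The substitution cipher changes the symbols but keeps their order, while the transposition cipher keeps the symbols and changes only their order.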
To implement any cryptographic technique, the following key elements must
be involved:
Plaintext - The original, intelligible message
Cipher text - The transformed message
Cipher - An algorithm for converting plaintext into cipher text by
transposition and/or substitution methods
Key - Secret information used by the cipher to transform the
text, known only to the sender and receiver
The cipher, or algorithm, is the most important element, as all
functionality related to enciphering and deciphering resides in it.
Another important element is the key. The longer and less predictable the key,
the longer it takes to deduce it.
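The effect of key length on the effort needed to deduce a key can be made concrete by counting candidate keys. The sketch below assumes an exhaustive (brute-force) search in which every key is equally likely, so that on average half of the key space is tried; the function names and the search rate are hypothetical and used only for illustration.

```python
def keyspace_size(key_bits: int) -> int:
    """Number of distinct keys of the given length in bits."""
    return 2 ** key_bits


def average_search_seconds(key_bits: int, keys_per_second: float) -> float:
    """Expected brute-force time, assuming half the key space is searched."""
    return keyspace_size(key_bits) / 2 / keys_per_second


# Each additional key bit doubles the number of candidate keys.
print(keyspace_size(56))                         # 72057594037927936
print(keyspace_size(128) // keyspace_size(56))   # 2**72 times more keys
```

With any fixed search rate, moving from a 56-bit key to a 128-bit key multiplies the expected search time by 2^72, which is why key length dominates resistance to exhaustive search.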
1.2.1 Classification of Cryptographic techniques
Broadly, cryptographic techniques are divided into two categories.
Asymmetric Infrastructure based techniques
Asymmetric cryptographic algorithms use two different keys for
encryption and decryption (Salomaa, 1996). The key used for the
encryption process is known as the public key and the key used for the decryption
process is known as the private key. That means the sender needs the receiver’s public
key to encrypt a message, and the receiver uses its private key to decrypt it.
RSA and DSA are popular asymmetric algorithms, also known as public key
infrastructure algorithms (Schoen and Boberski, 2002).
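The public-key and private-key roles can be illustrated with textbook RSA on very small numbers. This is only a sketch under stated assumptions: the primes 61 and 53 and the exponent 17 are illustrative choices, no padding is applied, and keys of this size offer no security whatsoever.

```python
def make_toy_rsa_keys(p: int = 61, q: int = 53, e: int = 17):
    """Return ((e, n), (d, n)) for tiny illustrative primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (requires Python 3.8+)
    return (e, n), (d, n)


def rsa_encrypt(message: int, public_key) -> int:
    """Sender side: encrypt with the receiver's public key."""
    e, n = public_key
    return pow(message, e, n)


def rsa_decrypt(ciphertext: int, private_key) -> int:
    """Receiver side: decrypt with the matching private key."""
    d, n = private_key
    return pow(ciphertext, d, n)


public_key, private_key = make_toy_rsa_keys()
ciphertext = rsa_encrypt(65, public_key)
recovered = rsa_decrypt(ciphertext, private_key)  # 65 again
```

Only the holder of the private exponent can undo the encryption, while anyone who knows the public pair (e, n) can produce ciphertexts.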
Symmetric Infrastructure based techniques
Symmetric algorithms, also known as private key based algorithms, use the
same key for both the encryption and the decryption process (Bellare and
Yee, 2003). The secret keys used in symmetric-key cryptography are
strongly resistant to brute-force attacks: for a given key length, symmetric-key
algorithms are more difficult to break than their public-key counterparts.
Additionally, secret keys require less computing power to
generate than equivalent private keys in public-key cryptography. Figure
1.2 is a pictorial representation of symmetric infrastructure based
security techniques.
Fig. 1.2 Pictorial representation of Symmetric key infrastructure based
security algorithms. (Source: http://www.powayusd.com)
1.2.1.1 Classification of Symmetric Infrastructure based techniques
Symmetric algorithms are further divided into two categories:
stream cipher and block cipher algorithms.
1.2.1.1.1 Stream ciphers
Stream ciphers are one of the common types of encryption algorithms.
They encrypt the individual characters of a plaintext message one at a time,
using an encryption transformation. Figure 1.3 shows the working of the
stream cipher encryption technique.
Fig. 1.3 Stream Cipher Encryption Techniques
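The character-at-a-time behaviour of a stream cipher can be sketched as an XOR between the data and a keystream. The keystream generator below is a hypothetical toy (a small linear congruential generator seeded from the key) used only to keep the example self-contained; it is not secure and is not one of the ciphers studied in the thesis.

```python
def toy_keystream(key: bytes):
    """Hypothetical, insecure keystream: a small LCG seeded from the key."""
    state = int.from_bytes(key, "big") % (2 ** 31)
    while True:
        state = (1103515245 * state + 12345) % (2 ** 31)
        yield (state >> 16) & 0xFF


def stream_xor(data: bytes, keystream) -> bytes:
    """Combine each data byte with the next keystream byte using XOR."""
    return bytes(b ^ k for b, k in zip(data, keystream))


message = b"stream ciphers work byte by byte"
ciphertext = stream_xor(message, toy_keystream(b"k3y"))
# XOR is its own inverse, so decryption regenerates the same keystream.
plaintext = stream_xor(ciphertext, toy_keystream(b"k3y"))
```

Because each output byte depends only on the corresponding input byte and keystream byte, the data can be encrypted as it arrives, one character at a time.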
1.2.1.1.2 Block ciphers
A block cipher encrypts data in fixed-size blocks (commonly of 64 or 128 bits).
The most commonly used block ciphers are Triple DES, DES, Blowfish
and AES. Figure 1.4 demonstrates the working of the block cipher encryption
technique.
Fig. 1.4 Block Cipher Encryption Techniques
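Before a block cipher can be applied, the message has to be cut into blocks of the cipher's block size. The sketch below assumes 64-bit (8-byte) blocks and PKCS#7-style padding, in which n bytes of value n are appended; the padding scheme is an assumption made for illustration and is not specified in the text above.

```python
BLOCK_SIZE = 8  # bytes, i.e. 64 bits


def pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append n bytes of value n so the length is a multiple of block_size."""
    n = block_size - len(data) % block_size  # always between 1 and block_size
    return data + bytes([n]) * n


def unpad(padded: bytes) -> bytes:
    """Remove the padding added by pad()."""
    return padded[:-padded[-1]]


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Pad the data and return it as a list of equal-sized blocks."""
    padded = pad(data, block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


blocks = split_blocks(b"A" * 20)  # 20 bytes become three 8-byte blocks
```

Each block in the resulting list would then be passed to the block cipher's encryption function; when the blocks are processed independently, they are also natural units of parallel work.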
1.2.1.2 Symmetric key based algorithms
A brief overview of some of the commonly used stream and block
symmetric algorithms is given in this section (Schneier, 2008).
Blowfish
Blowfish (Schneier) is a symmetric encryption algorithm designed by
Bruce Schneier in 1993. It has a 64-bit block size and a variable key
length that ranges from 32 bits to 448 bits. During key
scheduling, it produces large pseudo-random lookup tables by performing many
encryptions. All of these tables depend on the key
supplied by the user. This technique has been shown to be highly
resistant to several attacks, such as differential and linear
cryptanalysis. At the same time, it means that the algorithm
cannot be used on systems where enough memory for the tables is not
available. Since its introduction, Blowfish has received considerable attention
as a strong encryption algorithm. It is unpatented and license-free.
CAST
CAST stands for Carlisle Adams and Stafford Tavares, the inventors of
CAST (Heys, 1994). It is also a popular 64-bit block cipher that
belongs to the class of symmetric encryption algorithms.
Data Encryption Standard (DES)
The Data Encryption Standard (DES) was adopted in the United States as
a federal standard in 1977 (Madson, 1998). DES uses a 56-bit key to
encrypt and decrypt data in fixed-size blocks of
64 bits each. The DES algorithm has 16 rounds, which means the
main round function is repeated 16 times to generate the cipher text. The
time needed to locate a key using a brute-force attack is governed by the
56-bit key length, whereas the number of rounds determines resistance to
cryptanalytic attacks: as the number of rounds increases, such attacks
become rapidly more expensive.
IDEA
International Data Encryption Algorithm, abbreviated as IDEA (Schneier,
2008), is a symmetric encryption algorithm developed by Dr. X.
Lai and Prof. J. Massey. It was intended as a replacement for the DES algorithm. It
uses a 128-bit key. The size of the key makes it infeasible to break by
simply trying every possible key.
RC4
The RC4 stream cipher was developed by Ron Rivest in 1987 (Schneier,
2008). The key size of the cipher can be up to 2048 bits (256 bytes). The
algorithm is extremely fast and, because of its speed, is used in many
applications. The algorithm is divided into two sub-algorithms: one for
key scheduling (KSA) and one for keystream generation (PRGA). For encryption,
the output of the generator is XORed with the data stream.
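The two sub-algorithms of RC4 can be written down compactly. The sketch below follows the commonly published description of RC4 (a 256-entry state array, a key-scheduling pass, then a keystream generator); it is a plain sequential reference version given for illustration, not the parallel variant developed in the thesis, and the function names are chosen here.

```python
def rc4_ksa(key: bytes) -> list:
    """Key-scheduling algorithm: permute the state array under the key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_prga(s: list):
    """Pseudo-random generation algorithm: yield keystream bytes forever."""
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def rc4(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; the same call encrypts and decrypts."""
    keystream = rc4_prga(rc4_ksa(key))
    return bytes(b ^ k for b, k in zip(data, keystream))


print(rc4(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```

Note that each keystream byte depends on the state left behind by the previous step, which is the data dependency that any parallel version of the cipher has to work around.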
Triple DES
Triple DES (William, 2006) is a variant of the Data Encryption Standard
(DES). Each of its keys is 64 bits long, of which 56 bits are effective key bits
and 8 are parity bits. The block size of the algorithm is 8
bytes, that is, 64 bits. The idea behind Triple DES is to
improve the security of DES by applying DES encryption three
times using three different keys. The algorithm is considerably more secure
but very slow.
1.3 Issues in Symmetric Key Infrastructure based Algorithms
1. Apart from security, execution time (speed) is also a very important
aspect of a security algorithm. Most recent efforts to improve speed are
hardware-based FPGA implementations of these algorithms.
This dissertation presents software-based parallel implementations of
security algorithms that provide good speedup on a symmetric multiprocessor
machine.
2. Another major challenge these days is to reduce the power consumed
by software applications. These algorithms consume more energy on
uniprocessor systems because of the massive calculations they perform. This
thesis shows that parallel implementations are more energy
efficient.
1.4 Possible Solutions
1) Hardware based approaches: Three types of approaches
can be implemented at the hardware level.
1.1) FPGA Implementations: A field-programmable gate array
(FPGA), as the name suggests, is an integrated circuit that is
configured or programmed by the customer, using a hardware description
language (HDL), after manufacturing (Zeidman, 1999). FPGAs
have large RAM blocks, a large number of logic gates and very fast I/O
and bidirectional data buses to implement complex digital
computations. FPGAs comprise programmable logic components
termed "logic blocks" and a hierarchy of reconfigurable
interconnects that allow different blocks to be wired together. Logic
blocks can be configured to perform complex combinational
functions, or to act as simple logic gates such as AND, OR and XOR.
1.2) ASIC Implementations: An application-specific integrated circuit (ASIC)
is an integrated circuit that is designed for a specific use, rather
than for general-purpose use (Sato et al., 1991). For example, a
chip designed solely to run a digital voice recorder is an
ASIC. Application-specific standard products (ASSPs) are
intermediate between ASICs and industry-standard ICs like the
4000 or the 7400 series. Modern ASICs often contain complete
microprocessors, memory blocks including ROM, RAM,
EEPROM and flash memory, and other large building blocks. Such an
ASIC is called a system-on-chip (SoC). A hardware description language
such as Verilog is used to describe these ASICs (Palnitkar, 2003).
1.3) VLSI Implementations: VLSI is an acronym for very-large-scale
integration, the process of creating an integrated
circuit by combining a large number of transistors on a single
chip (Mead and Conway, 1980). A circuit may comprise a
CPU, RAM, ROM and other glue logic. This technology lets
IC manufacturers integrate all of these into a single small chip.
2) Software based approaches: In today’s computing era, most
cryptographic algorithms are implemented in software. Parallel
computing is one possible solution to the issues mentioned above
because it can make better use of the underlying parallel hardware. Current
desktops and laptops are parallel in design, with multiple
cores or processors. In general, serial programs executed on the latest
computer architectures "waste" available computing power. Parallel
software is explicitly designed for optimum utilization of parallel
hardware with multiple cores.
1.5 Motivation of Research:
Security has always been one of the biggest concerns of the computing world in
terms of transmitting information and data across networks. Security
algorithms are usually implemented serially and are therefore somewhat slow, as
it takes time to perform the calculations for encryption as well as decryption.
They also require a large amount of memory, which is sometimes not available on a
single processor.
Parallel computing is an emerging area that uses multicore processors
for faster and more efficient execution of instructions. To achieve
higher performance in the area of security, security algorithms can therefore be
implemented in parallel.
Parallel implementations of security algorithms are also very important in
mobile computing and high-end servers as a means to reach
high performance targets while maintaining acceptable power
characteristics. Security algorithms are computation-intensive, and
executing them in parallel can help to reduce
power consumption, which is one of the most critical aspects in these
areas.
1.6 Research Problem
Considering the above-mentioned issues related to sequential
cryptographic algorithms, this dissertation proposes parallel
algorithms for symmetric-key based encryption methods, implemented on a
symmetric multiprocessor machine using OpenMP, and analyzes the
performance gained by parallelizing the algorithms through experiments
over a large number of data sets. To achieve the desired outcome, the research
problem is divided into the following objectives:
To study sequential algorithms that can be used to implement
symmetric-key cryptography and execute them.
To come up with parallel algorithms for symmetric-key based
encryption methods.
![Page 34: PARALLEL ALGORITHMS FOR SYMMETRIC KEY ...shodhganga.inflibnet.ac.in/bitstream/10603/46758/1/dh...CERTIFICATE BY THE SUPERVISOR This is to certify that the thesis entitled “Parallel](https://reader036.vdocuments.net/reader036/viewer/2022071110/5fe548f66b828024ec342cce/html5/thumbnails/34.jpg)
14
To implement parallel algorithm on multi core machine using
OpenMP.
To analyze how much performance can be gained by parallelizing the
security algorithm through experiments over large number of data sets
and through utilizing various parameters of the algorithms.
To use parallelism leading to more energy-efficient algorithms for
intensive computations and making them more applicable to real time
applications such as cryptographic algorithms.
1.7 Literature Review
There are many security algorithms based on symmetric key infrastructure,
as discussed in Section 1.2.1.2. This thesis proposes a general parallel
framework for the Feistel network and another for stream ciphers. Three
algorithms with Feistel network and stream cipher properties are
therefore chosen to test the performance of the parallel Feistel
framework and the parallel stream cipher framework. Blowfish (a block
cipher) is a Feistel network based algorithm, and the thesis presents the
performance of parallel Blowfish after applying the parallel Feistel
framework. RC4 and RC4A (stream ciphers) are used to test the parallel
stream cipher framework and the performance enhancement of the parallel
algorithms based on it. This section presents the structure of the
existing algorithms along with the related work for each of them.
1.7.1 Description of Blowfish
Blowfish is a symmetric-key block cipher that uses a single key to
encrypt and decrypt the data (Schneier). Blowfish was designed by Bruce
Schneier in 1993. The algorithm has a 64-bit block size and a variable
key length from 32 bits to 448 bits. It is best suited to applications
in which the key does not change frequently, for example an automatic
file encryptor or a communications link. The algorithm comprises two
sub-algorithms:
key expansion and data encryption. The key expansion function
converts the key into several subkey arrays totaling 4168 bytes.
Data encryption occurs through a 16-round Feistel
network. Each round consists of a key-dependent permutation and a key-
and data-dependent substitution. The algorithm uses only simple
operations: exclusive-or, addition modulo $2^{32}$ and table lookup. All
of these operations are efficient on microprocessors. The following
elements are involved in the two functions:
Key expansion function:
P-array: eighteen 32-bit subkeys, P1 to P18, which are XORed with
the data during encryption.
S-boxes: substitution boxes for the non-linear function, stored as four
arrays of 256 32-bit entries each. All of these arrays are initialized
with a fixed string, the hexadecimal digits of pi.
Blowfish uses a large number of subkeys. These keys can
be precomputed to make encryption and decryption faster.
Data encryption function:
Feistel function: the 64-bit input block is divided into two 32-bit halves.
F function: this is the core function of Blowfish. Its
32-bit input is divided into four 8-bit quarters.
Each quarter indexes one S-box, and each S-box entry is a
32-bit value. The outputs of S-box 1 and S-box 2 are added first, and
the result is XORed with the output of S-box 3. Finally, the output of
S-box 4 is added to the result of the XOR operation, giving a 32-bit
output.
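The F function described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the names `blowfish_f` and `toy_sboxes` are hypothetical, and the toy S-boxes are simple placeholders rather than the real pi-derived, key-dependent tables.

```python
MASK32 = 0xFFFFFFFF  # keeps every addition modulo 2**32


def blowfish_f(x, sboxes):
    """Blowfish-style F function on a 32-bit input x.

    sboxes is a sequence of four tables with 256 32-bit entries each.
    """
    a = (x >> 24) & 0xFF  # most significant byte indexes S-box 1
    b = (x >> 16) & 0xFF  # next byte indexes S-box 2
    c = (x >> 8) & 0xFF   # next byte indexes S-box 3
    d = x & 0xFF          # least significant byte indexes S-box 4
    h = (sboxes[0][a] + sboxes[1][b]) & MASK32  # add S1 and S2 outputs
    h ^= sboxes[2][c]                           # XOR with S3 output
    return (h + sboxes[3][d]) & MASK32          # add S4 output


# Placeholder tables for illustration only (NOT the real Blowfish S-boxes).
toy_sboxes = [[(i * (k + 1)) & MASK32 for i in range(256)] for k in range(4)]

result = blowfish_f(0x01020304, toy_sboxes)
```

With the real tables, the same function would be called once per round on the left half of the block.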
1.7.1.1 Related work
In the emerging era of data-intensive computing and low-cost internet
connections for global data communications, there is a growing demand for
data security and computational speedup. In recent years, successful
studies have used hardware acceleration techniques, FPGA
implementations and CUDA's GPGPU platform to speed up the
execution of cryptographic algorithms. Liu et al. (Liu, 2012) presented
an implementation method for power-efficient hardware acceleration of the
RSA and Blowfish cryptographic algorithms. They were able to reduce the
energy consumption by 9.6% for RSA and 36% for Blowfish. However, their
approach is based on a co-processor design on an FPGA platform.
Krishnamurthy G.N et al. (G.N, 2007) presented a performance
enhancement of Blowfish that modifies its F function without compromising
the memory requirements, security and simplicity of the existing Blowfish
algorithm. The modification is limited to the
implementation of the F function of the Feistel network. The
existing Blowfish divides X1 into four eight-bit quarters a, b, c and d and
computes $F(X_1) = ((S_{1,a} + S_{2,b} \bmod 2^{32}) \oplus S_{3,c}) + S_{4,d} \bmod 2^{32}$,
whereas the modified function is
$F(X_1) = (S_{1,a} + S_{2,b} \bmod 2^{32}) \oplus (S_{3,c} + S_{4,d} \bmod 2^{32})$.
The modified form allows the two addition operations to be evaluated in
parallel using threads. They were able to reduce the overall execution
time by 14%.
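The difference between the two forms of F can be seen in a small Python sketch. It is illustrative only: the function names and the toy S-boxes are hypothetical placeholders, and no threads are started here. The point is that in the modified form the two sums do not depend on each other, so they could be evaluated concurrently, whereas in the original form each step needs the result of the previous one.

```python
MASK32 = 0xFFFFFFFF  # additions are taken modulo 2**32


def split_bytes(x):
    """Split a 32-bit word into its four 8-bit quarters a, b, c, d."""
    return (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF


def f_original(x, s):
    a, b, c, d = split_bytes(x)
    # Each step depends on the previous one: add, then XOR, then add.
    return ((((s[0][a] + s[1][b]) & MASK32) ^ s[2][c]) + s[3][d]) & MASK32


def f_modified(x, s):
    a, b, c, d = split_bytes(x)
    left = (s[0][a] + s[1][b]) & MASK32   # independent of `right`
    right = (s[2][c] + s[3][d]) & MASK32  # independent of `left`
    return left ^ right


# Placeholder tables for illustration only (NOT the real Blowfish S-boxes).
toy_sboxes = [[(i * (k + 1)) & MASK32 for i in range(256)] for k in range(4)]

print(f_original(0x02030405, toy_sboxes), f_modified(0x02030405, toy_sboxes))
```

The two functions generally return different values, so the modified cipher is not interoperable with standard Blowfish.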
An ASIC implementation of a low-power, high-throughput Blowfish
security algorithm was presented by P. Karthigai Kumara and K. Baskaran
(P. Karthigai Kumara June 2010). The algorithm was prototyped in a
130 nm custom integrated circuit.
Krishnamurthy G.N et al. (G.N, March 2008) presented a performance
enhancement of the Blowfish and CAST-128 algorithms and a security
analysis of the improved Blowfish algorithm using the avalanche effect.
With a VHDL implementation, they observed that the time for encryption
and decryption was reduced by more than 12.5% compared to the
existing algorithm.
P. Karthigai Kumara and K. Baskaran (P. Karthigai kumar 2010) explored
and presented a partially pipelined VLSI implementation of the Blowfish
encryption/decryption algorithm. This implementation is a partially
pipelined, robust hardware architecture of the Blowfish algorithm. The
proposed design attains an encryption speed of 2670
MBits/sec and a decryption speed of 2642 MBits/sec.
The authors of a pipelined approach for high-performance implementation
and evaluation of the Blowfish cryptographic algorithm on the Single-Chip
Cloud Computer (SCC) (Kamak Ebadi, Dec 2012) presented a parallel
approach to Blowfish on a special processor: an experimental 48-core
processor created by Intel Labs for research projects. In
this model, the input file is split into a number of small data chunks, and
each data chunk undergoes a sequence of computations based on the
Blowfish algorithm. Each core performs a single
round of computations and then sends the data to the next core for the
next round. The authors report that this approach is 27X faster than the
sequential one. However, in this pipelined model, the use of large data
chunks can cause bandwidth saturation and higher latency, which in turn
leads to longer execution time.
1.7.2 Description of RC4
RC4 is one of the most widely used stream ciphers and appears in popular
protocols such as Secure Sockets Layer (SSL), to protect web traffic, and
WEP, to protect wireless networks (Schneier, 2008). Other application
areas of RC4 include Skype and the BitTorrent protocol. RC4 generates a
key stream, which is a pseudo-random stream of bytes. The key stream is
combined with the plaintext using bit-wise XOR to generate the encrypted
text. The algorithm has two main parts: the key scheduling algorithm
(KSA) and the pseudo-random generation algorithm (PRGA).
The KSA initializes the permutation in the S array. The key
length is the number of bytes in the key and ranges
from 1 to 256.
For i = 0 to 255
    S[i] := i
End loop
Set j := 0
For i = 0 to 255
    Set j := (j + S[i] + key[i mod key length]) mod 256
    Swap values of S[i] and S[j]
End loop
Algorithm 1.1 The Key Scheduling Algorithm (KSA) (Schneier, 2008)
As shown in Algorithm 1.1, the S array is first initialized with the
values 0 to 255. The j values are then calculated from the S array
elements and the key bytes, and S[i] and S[j] are swapped to generate a
permuted array. The swap step is executed 256 times. The result is a
key-dependent permutation of S, from which the PRGA generates the key
stream.
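The key scheduling steps described above can be written as a short Python sketch. The function name `rc4_ksa` is chosen here for illustration; the logic follows Algorithm 1.1, with the key given as a `bytes` object of 1 to 256 bytes.

```python
def rc4_ksa(key):
    """Return the 256-entry RC4 state array S produced by the KSA."""
    if not 1 <= len(key) <= 256:
        raise ValueError("key length must be between 1 and 256 bytes")
    s = list(range(256))  # identity permutation: S[i] = i
    j = 0
    for i in range(256):
        # Mix one key byte into j, then swap S[i] and S[j].
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


state = rc4_ksa(b"Key")
```

Because the only operation on the array is a swap, `state` always contains each value from 0 to 255 exactly once.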
The number of iterations in the PRGA, shown in Algorithm 1.2, depends
on the input size. In each iteration the index i takes a different value:
it starts at 1 and, after reaching 255, wraps around to 0, so for inputs
longer than 255 bytes the same index values are reused until the last
byte. In each iteration the value of j is calculated, S[i] and S[j] are
swapped, and the sum of S[i] and S[j] mod 256 is looked up in the S array
to return one key stream byte. This byte is then XORed with one byte of
the plaintext to produce one byte of ciphertext.
Set i := 0
Set j := 0
While generating output:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    Swap values of S[i] and S[j]
    K := S[(S[i] + S[j]) mod 256]
    Output K
End loop
Algorithm 1.2 The Pseudo-Random Generation Algorithm (PRGA) (Schneier, 2008)
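Putting the KSA and PRGA together gives a complete, sequential RC4 in a few lines of Python. This is a sketch for illustration; the function names are chosen here and are not taken from the thesis. Because encryption is an XOR with the key stream, the same function both encrypts and decrypts.

```python
def rc4_ksa(key):
    """Key scheduling: build the key-dependent permutation S."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_keystream(key, length):
    """PRGA: generate `length` key stream bytes for the given key."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def rc4_crypt(key, data):
    """XOR the data with the key stream (encrypts or decrypts)."""
    stream = rc4_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, stream))


ciphertext = rc4_crypt(b"Key", b"Plaintext")
recovered = rc4_crypt(b"Key", ciphertext)
```

Note that each key stream byte depends on the state left by the previous iteration, which is why the PRGA loop itself cannot simply be split across cores.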
1.7.2.1 Related work
Many researchers have worked on the parallelization of stream cipher
algorithms using hardware acceleration techniques. K.H. Tsoi et al.
(Tsoi, 2002) presented a parallel FPGA implementation of the RC4
algorithm in 2002. FPGA designs employ parallelism at the logic level to
increase the number of operations per cycle of the RC4 search engine.
Their design uses on-chip memories to attain very high memory bandwidth,
floor planning to reduce routing delays and multiple decryption units to
achieve further parallelism. A total of 96 RC4 decryption
engines were integrated on a single Xilinx Virtex XCV1000-E field
programmable gate array (FPGA). The resulting design operates at a 50
MHz clock rate and achieves a search speed of $6.06 \times 10^{6}$
keys/second, a speedup of 58 over a 1.5 GHz Pentium 4 PC.
In 2009, Changxin Li, Hongwei Wu, Shifeng Chen,
Xiaochao Li and Donghui Guo (Li, 2009, August) presented an efficient
implementation of the MD5-RC4 encryption/decryption algorithm using
NVIDIA's Graphics Processing Unit with the CUDA programming framework.
The algorithm was implemented on an NVIDIA GeForce 9800GTX GPU and
achieved a 3-5X speedup.
In 2012, T.D.B Weerasinghe (Weerasinghe, August 2012) presented a
Java-based multithreaded implementation of the RC4 algorithm on i3 and i7
series processors. The proposed method does not parallelize RC4 itself;
instead, it shows how multithreading can be used to perform encryption
and decryption when the message is a text file. The plaintext file is
split into a number of small files, and these files are then encrypted
separately using the RC4 cipher.
1.7.3 Description of RC4A
Souradyuti Paul and Bart Preneel proposed an RC4 variant, which
they call RC4A (Paul and Preneel, 2004). RC4A uses two state arrays, S1
and S2, and two indices, j1 and j2. Each time i is incremented, two bytes
are generated. First, the basic RC4 step is performed using S1 and j1,
but in the last step S1[i] + S1[j1] is looked up in S2. Second, the
operation is repeated (without incrementing i again) on S2 and j2, and
S1[S2[i] + S2[j2]] is output. Like RC4, the algorithm has two main parts:
the key scheduling algorithm (KSA) and the pseudo-random generation
algorithm (PRGA).
1.7.3.1 KSA
In this algorithm, the key stream is generated from a variable-length
key and an internal state comprising the following elements:
1) Two 256-byte arrays, S1 and S2, each of which holds a permutation of
the 256 possible byte values
2) Three index pointers, i, j1 and j2, which point to elements in the
S1 and S2 arrays
The algorithm starts by initializing the two arrays with the values 0 to
255, so that each value in an array is equal to its index. Once the
arrays are initialized, the next step is to shuffle them so that they
become key-dependent permutation arrays. To do this, each array is
iterated over 256 times; at every step the pointer j1 (or j2 for the
second array) is computed with the formula
j1 = (j1 + S1[i] + key[i mod key-length]) mod 256, where key is the
user's input value and key-length is the number of bytes in the key,
ranging from 1 to 256, and the elements at positions i and j1 are then
swapped.
Because the only operation performed on the S arrays is a swap, the only
effect is a permutation: each S array still contains every value from
0 to 255 exactly once, in a key-dependent order.
1.7.3.2 PRGA
In this step, the generated key stream is XORed with the plaintext to
produce the encrypted text as a sequence of bytes. All arithmetic is
performed modulo 256. The number of iterations in the PRGA depends on
the input size. The index i takes values in the range 0 to 255;
if the input is longer than 255 bytes, i wraps around
and the process continues until the last byte. In each
iteration, the values of j1 and j2 are calculated. S1[i] and S1[j1] are
swapped, and the sum of S1[i] and S1[j1] mod 256 is looked up in the S2
array to return one byte. The same operation is then applied to the S2
array, with the result looked up in S1. Each returned byte is XORed with
one byte of the plaintext to convert it into ciphertext.
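The two-array structure of RC4A can be sketched in Python as shown below. This is a simplified illustration, not the thesis code: the function names are hypothetical, and the sketch assumes that S1 and S2 are keyed from two separately supplied keys, whereas the published RC4A design derives the second key from the first.

```python
def ksa(key):
    """RC4-style key scheduling, used here for both state arrays."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4a_keystream(key1, key2, length):
    """Generate `length` key stream bytes, two per increment of i."""
    s1, s2 = ksa(key1), ksa(key2)  # assumption: two independent keys
    i = j1 = j2 = 0
    out = bytearray()
    while len(out) < length:
        i = (i + 1) % 256
        # Step on S1; the output byte is looked up in S2.
        j1 = (j1 + s1[i]) % 256
        s1[i], s1[j1] = s1[j1], s1[i]
        out.append(s2[(s1[i] + s1[j1]) % 256])
        # Step on S2 with the same i; the output byte is looked up in S1.
        j2 = (j2 + s2[i]) % 256
        s2[i], s2[j2] = s2[j2], s2[i]
        out.append(s1[(s2[i] + s2[j2]) % 256])
    return bytes(out[:length])


def rc4a_crypt(key1, key2, data):
    """XOR the data with the RC4A key stream (encrypts or decrypts)."""
    stream = rc4a_keystream(key1, key2, len(data))
    return bytes(d ^ k for d, k in zip(data, stream))


ciphertext = rc4a_crypt(b"first key", b"second key", b"attack at dawn")
```

Applying `rc4a_crypt` again with the same two keys recovers the plaintext.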
1.7.3.3 Related Work
The authors of (Noman, 2009) present an efficient hardware
implementation of the RC4A stream cipher. The proposed hardware
implementation achieves a data throughput of up to 22.28 MB/sec at a
frequency of 33.33 MHz and a throughput-to-area ratio of 0.37. The
implementation is also parameterized to support variable key lengths,
from 8 bits to 512 bits. The cipher was designed
using the Verilog hardware description language and implemented on a
single Altera APEX 20K200E Field Programmable Gate Array
(FPGA).
1.8 Dissertation Contribution and Outline
In this dissertation, two major issues related to sequential cryptographic
algorithms are addressed: the first is slow execution and the second is
energy consumption (Noman, 2009). To address these two issues, a parallel
stream cipher structure and a parallel Feistel network structure are
proposed, which can be applied to any Feistel-based block cipher
or stream cipher to make it parallel. The thesis incorporates
three parallel algorithms based on RC4, RC4A and Blowfish. The thesis is
organized into nine chapters.
Chapter 1 covers the major concerns of today's computer-centric
era, the need for security techniques along with some historical
background of cryptographic algorithms, the issues involved in sequential
security algorithms, and the motivation for the research, followed by the
literature review.
Chapter 2 discusses the basic concepts of parallel programming along with
the techniques and experimental setup used for the research.
These concepts are used throughout the thesis to
parallelize the algorithms.
Chapter 3 briefly introduces stream ciphers and their types, the
motivation for the parallel design of the Parallel Additive Stream Cipher
(PASCS) and a description of the new architecture.
Chapter 4 describes the parallel Feistel network and the parallel F
function.
Chapter 5 presents PARC4, the parallel approach for RC4 using PASCS, and
the corresponding data-parallel model, along with its experimental results
and comparisons with the existing algorithm.
Chapter 6 introduces PARC4-1, a parallel implementation of RC4A (an RC4
variant) using PASCS and the loop unrolling method.
Chapter 7 describes PBlock, a parallel implementation of Blowfish using
the parallel Feistel network, along with results and comparisons.
Chapter 8 gives a thorough discussion of the energy measurements of
all three parallel algorithms, along with a description of the low-power
and high-power states of the processor used to determine the power benefit.
Chapter 9 presents the conclusions and future scope of the research.
Chapter 2
Basic Ideology Of Parallel
Computing, Tools And
Experimental Setup Used For
Research
Concurrent execution of tasks has been in use for many years, especially in
high-performance computing, but interest in this area has grown recently
because physical constraints now prevent further frequency scaling. Since
power consumption by computers has become a major concern in recent years,
parallel computing has grown to be the leading paradigm in computer
architecture. This chapter briefly introduces each of the concepts
used in this research, including parallel programming, its
applications and its fundamentals. The parallelization of the algorithms
described in later chapters builds on these concepts.
2.1 Introduction
Parallel computing is a form of computation in which numerous sub-processes
are carried out concurrently on multiple cores. The concept is based on
the principle that large problems can be divided into smaller ones that
are solved simultaneously. Parallel computing provides different levels of
parallelism: bit-level, instruction-level, data and task
parallelism (Quinn, 1994, Almasi and Gottlieb, 1988).
Furthermore, parallel computers can be broadly categorized by the level
at which the hardware supports concurrent execution of tasks:
multi-processor and multi-core computers have several processing elements
within a single machine, whereas clusters and grids use a number of
computers collectively to work on a shared task.
Parallel algorithms are more
complex to write than sequential programs, since parallelism introduces
numerous potential software bugs, of which synchronization problems, data
locality problems and race conditions are the most common (Leighton,
1992). Inter-processor communication and synchronization among the
different sub-processes are usually among the main obstacles to achieving
good performance.
Parallel computing has had a tremendous impact on a number of
diverse areas, ranging from computational simulations for technical and
engineering problems to commercial applications in transaction processing
and data mining (Bo, 2009). The cost and energy benefits of parallelism,
together with the performance requirements of applications, present a
convincing argument in support of parallel computing. Although the scope
of parallel computing is very wide, its applications are divided here into
three common categories (Kumar et al., 1994):
1) Engineering and Design applications
Traditionally, parallel computing has been used in the design of airfoils,
high-speed circuits and similar systems. Nowadays, the
design of micro-electro-mechanical and nano-electro-mechanical systems
has also attracted significant attention.
2) Scientific applications
The past few years have seen a revolution in high-performance scientific
computing applications. Advances in computational physics and
chemistry have focused on understanding processes that range in scale
from quantum phenomena to macromolecular structures. These have
resulted in the design of new materials and more efficient processes.
Bioinformatics and astrophysics are other areas that present many
demanding problems involving the analysis of extremely large
datasets.
3) Applications in computer systems
In this domain, computer security is a major concern, and within this
area intrusion detection is a great challenge. For intrusion detection in
a network, data is collected at distributed sites and must be analyzed
rapidly to signal an intrusion. In the area of cryptography, some of the
most impressive applications of Internet-based parallel computing have
focused on factoring extremely large integers.
The chapter is organized as follows. Section 2.1 introduces
parallel programming and its applications.
Section 2.2 explains the concepts of parallel programming specifically
used for this research, because these concepts serve as the baseline
for parallelizing the security algorithms presented in the
following chapters. This section covers the type of
parallel computer used, how to identify parallel regions of the code,
decomposition of the problem, mapping, load balancing and speedup
calculation using Amdahl's law. The experimental setup used for the
research is described in Section 2.3. The tools used to parallelize the
algorithms are briefly discussed in the section after that.
2.2 Essential notions of parallel programming used in Research
There are some key concepts which serve as the basis of our
parallelization methodology. These concepts are:
1) Identifying portion of work that can be performed concurrently
2) Type of parallel computer
3) Decomposition technique and mapping method for load balancing.
2.2.1 Identifying Parallel Region in Code
There are several ways to distinguish the parallel regions of code from
the strictly sequential ones. The most important is to check the code for
dependencies. If there is a data dependency in the code, it must be
removed before the code can be made parallel. If, because of some
constraint, the dependency cannot be removed, that portion of code cannot
be parallelized. The different types of dependencies are discussed next
(Barney, 2010).
1) Flow dependency
Flow dependency is also known as read-after-write (RAW)
dependency (Babb, 1984). It occurs when an instruction depends on the
output of a previous instruction. For example:
1. A = 3
2. B = A
3. C = B
Here, the 3rd instruction depends on the 2nd, because the final
value of C depends on the instruction that updates the
value of B. The 2nd instruction depends on the 1st, because the
final value of B depends on the instruction that updates the value of A.
Since each instruction depends on the one before it, instruction-level
parallelism is not an option in this example.
2) Anti-dependency
Anti-dependency is also known as write-after-read (WAR) dependency (Babb,
1984). It occurs when an instruction reads a value that a later
instruction overwrites. In the following example, the 2nd instruction
anti-depends on the 3rd instruction.
1. B = 4
2. A = B + 2
3. B = 8
The sequence of these instructions cannot be altered, nor can they be
implemented in parallel because it would affect the final outcome of A.
3) Output dependency
Output dependency is also known as write-after-write
(WAW) dependency (Babb, 1984). It arises when the order of instructions
affects the final value of a variable. In the example below, there is an
output dependency between the 3rd instruction and the 1st instruction.
1. B = 5
2. A = B + 1
3. B = 10
Changing the execution order of these instructions would change the final
value of B; therefore these statements cannot be executed in parallel.
4) Control Dependency
An instruction is said to be control dependent on a previous instruction if
the outcome of the previous instruction determines whether it is
executed. In the example below, instruction I2 is control dependent on
instruction I1. However, I3 is not control dependent on I1, as I3 is always
executed regardless of the outcome of I1.
I1. if (x == y)
I2.     x = x + y
I3. y = x + y
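The practical consequence of these dependency types can be illustrated with two small loops in Python. This is a sketch with hypothetical function names. The first loop carries a flow (read-after-write) dependency from one iteration to the next and must run in order; the second has independent iterations, so its work can be handed to a pool of workers. In CPython the thread pool shows the structure of the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def running_total(values):
    """Each iteration reads `acc` written by the previous one (RAW)."""
    totals = []
    acc = 0
    for v in values:
        acc += v  # depends on the value of acc from the last iteration
        totals.append(acc)
    return totals


def square(v):
    """Uses only its own input, so calls are independent of each other."""
    return v * v


def squares_parallel(values, workers=4):
    """Independent iterations can be distributed across workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))  # map preserves input order


sequential_result = running_total([1, 2, 3, 4])
parallel_result = squares_parallel([1, 2, 3, 4])
```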
2.2.2 Type of Parallel Computer
There are different ways to categorize parallel computers. According
to Flynn's taxonomy (Barney, 2010), multi-processor computer
architectures are classified along two independent dimensions: instruction
stream and data stream. Each of these dimensions can have only one of two
possible states: single or multiple. This gives four possible
classifications:
i. Single Instruction, Single Data (SISD):
A serial computer
Single Instruction: one instruction stream is being executed by the
CPU during one clock cycle.
Single Data: one data stream is being used as input data during
one clock cycle.
Examples: older-generation mainframes, minicomputers and workstations.
ii. Single Instruction, Multiple Data (SIMD):
A form of parallel computer
Single Instruction: all processing elements execute the same
instruction at any given clock cycle.
Multiple Data: each processing element can operate on a different
data element.
Examples: graphics processing units (GPUs) and the SIMD vector units of
AMD and Intel multicore processors.
iii. Multiple Instructions, Single Data (MISD):
A form of parallel computer
Multiple Instructions: Each core/processing unit operates on the
data independently by using separate instruction streams.
Single Data: data in the form of sequence of bits/bytes is being
used as input data during one clock cycle.
Some conceivable uses of this type of system might be:
Multiple security algorithms attempting to break a single coded
message.
For fault-tolerance purposes
iv. Multiple Instructions, Multiple Data (MIMD):
A form of parallel computer
Multiple Instructions: each core/processing unit operates on the
data independently using a separate instruction stream.
Multiple Data: each core works with a different data stream.
This is currently the most common type of parallel computer.
Examples: supercomputers, multi-processor SMP computers and
multi-core PCs.
2.2.3 Speed up calculation using Amdahl’s law
The speedup of a program using multiple cores or processors is limited
by the time required to execute the sequential portion of the program. This
is known as Amdahl's law (Hill and Marty, 2008). For example, if 10
hours are required to execute a program on a single processor core, and
the sequential part of the program takes one hour whereas the remaining 9
hours (90%) of the program can execute in parallel, then
irrespective of how many cores are dedicated to executing the parallel
portion, the minimum time required to complete the whole task cannot
be less than that one hour. Hence the speedup is limited to at most
10. The following equation can be used to calculate the speedup:
$$\text{Speedup} = \frac{1}{S + \frac{P}{N}} \qquad (2.1)$$
where P = parallel fraction, N = number of processors and S = serial
fraction, with S = 1 - P.
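Amdahl's law in its usual form, Speedup = 1 / (S + P / N) with S = 1 - P, can be evaluated with a short Python sketch. The function name `amdahl_speedup` is chosen here for illustration. For the 10-hour example above (P = 0.9), the computed speedup approaches but never reaches 10 as the number of processors grows.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Theoretical speedup for a program with the given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction  # S = 1 - P
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Speedup of the 90%-parallel example for a growing number of cores.
for n in (1, 2, 4, 8, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```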
2.2.4 Parallel computing memory architecture
The memory architecture of a parallel computer is based on either a shared
memory system or a distributed memory system.
Shared Memory: In parallel computers based on this type of memory
architecture, all processors access a common memory as a global address
space. Multiple cores or processors can work independently but share the
same memory resources. If one processor changes a
memory location, the change is visible to all other processors.
Shared memory systems are further categorized as UMA and
NUMA, based on memory access times.
1) Uniform Memory Access (UMA): Computer architectures in
which every portion of main memory can be accessed with equal
bandwidth and latency are known as Uniform Memory Access
(UMA) systems. Nowadays, this type of architecture is most commonly
represented by Symmetric Multiprocessor (SMP) machines, which have
identical processors and equal access times to memory. UMA systems are
also known as CC-UMA, that is, Cache Coherent UMA: if
one processor updates a location in shared memory, the update is known to
all other processors. Cache coherency is implemented at the hardware level.
2) Non-Uniform Memory Access (NUMA): A NUMA system combines two or more
SMPs, each of which can directly access the memory of the others.
However, the access time is not the same for all processors. If cache
coherency is maintained, the system is called CC-NUMA, that is, Cache
Coherent NUMA.
Distributed Memory: In this architecture the system requires a
communication network to connect inter-processor memory. Each processor
has its own private (local) memory, so changes made by one processor to
its local memory have no effect on the memory of other processors. Hence,
the concept of cache coherency does not apply. When a processor needs to
access the data of another processor, the programmer has to define the
data communication explicitly. In this thesis, the shared memory
architecture is used, as it is the one commonly found in SMPs.
2.2.5 Parallel Programming Models
Parallel programming models exist as an abstraction above hardware and
memory architectures. These models are not specific to a particular type
of memory or machine architecture. The choice of programming model for a
parallel implementation is based mainly on the structure of the
algorithm.
The commonly used parallel programming models are:
1) Shared Memory Model
2) Distributed Memory / Message Passing Model
3) Data Parallel Model
4) Hybrid Model
5) Single Program Multiple Data (SPMD)
6) Multiple Program Multiple Data (MPMD)
In this thesis, Data Parallel and Multiple Program Multiple Data Models
are considered to implement security algorithms in parallel because these
two models are well suited for data oriented algorithms.
Data Parallel Model: It is also referred to as the Partitioned Global
Address Space (PGAS) model. In this model, the global address space is
common to all cores or processors. Most of the parallel work implemented
using this model focuses on performing operations on a data set, where
the data set is usually organized into a common structure, for example an
array or a cube. A set of tasks works collectively on the same data
structure, but each individual task works on a different portion of it.
On shared memory systems, all tasks can access the data through global
(common) memory. On distributed memory systems the data structure is
divided and resides as small portions in the local memory of each task.
Fig. 2.1 Pictorial Representation of Data Parallel Model (Barney, 2010)
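As a minimal sketch of the data parallel model on a shared memory machine, the loop below lets every thread update a different portion of one shared array. The function name scale_array is an assumption for illustration; the directive is standard OpenMP and is simply ignored by a compiler that is not run with OpenMP support, in which case the loop executes serially with the same result.

```c
#include <stddef.h>

/* Each iteration touches a different element of the shared array, so
   the iterations can be divided among the threads of the team. */
void scale_array(double *data, size_t n, double factor)
{
    #pragma omp parallel for
    for (long k = 0; k < (long)n; k++)
        data[k] *= factor;
}
```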
Multiple Program Multiple Data (MPMD): Like SPMD, the MPMD model is a
high-level programming model that can be built upon any combination of
the previously mentioned parallel programming models. In the MPMD model:
Fig. 2.2 Pictorial Representation of MPMD (Barney, 2010)
Multiple programs: Tasks may execute different programs concurrently.
The programs can be threads, message passing, data parallel or hybrid
models.
Multiple data: All tasks may use different data.
MPMD applications are more appropriate for problems where functional
decomposition is used instead of domain decomposition.
2.2.6 Partitioning
After identifying the type of parallel computer, the next step is the
decomposition of data. For large data sets it is important to decide how
to decompose the data so that an algorithm can execute in parallel. The
optimization objective of decomposition is to balance the work-load among
processing units and to minimize inter-process communication. The number
of data partitions generated by the partitioning step may not equal the
number of processing units/cores; thus a core may be idle or loaded with
multiple processes. There are two techniques for decomposition: domain
partitioning and functional partitioning.
Domain Decomposition: In this type of partitioning, the data associated
with an algorithm is decomposed, as illustrated in Fig. 2.3.
Fig. 2.3 Representation of Domain Decomposition (Barney, 2010)
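One common way to realize domain decomposition is to give each worker a contiguous range of the data. The sketch below is an illustration under that assumption; block_range_t and block_range are hypothetical names, not part of the thesis.

```c
#include <stddef.h>

/* Half-open range [start, end) of items owned by one worker. */
typedef struct { size_t start; size_t end; } block_range_t;

/* Splits n_items as evenly as possible among n_workers. */
block_range_t block_range(size_t n_items, size_t n_workers, size_t worker)
{
    size_t base  = n_items / n_workers;  /* items every worker receives    */
    size_t extra = n_items % n_workers;  /* first `extra` workers get +1   */
    block_range_t r;
    r.start = worker * base + (worker < extra ? worker : extra);
    r.end   = r.start + base + (worker < extra ? 1 : 0);
    return r;
}
```

For 10 items and 4 workers this yields the ranges [0, 3), [3, 6), [6, 8) and [8, 10), so no worker holds more than one item above the others.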
Functional Decomposition: In functional partitioning, the computations
involved in executing an algorithm, rather than the data, are decomposed
among multiple cores.
Fig. 2.4 Pictorial Demonstration of Functional Decomposition (Barney, 2010)
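Functional decomposition can be sketched with OpenMP sections, where each section is a different computation that may run on a different thread. The example is illustrative only; stats_t and sum_and_max are hypothetical names, and the caller is assumed to pass at least one element.

```c
#include <stddef.h>

typedef struct { double sum; double max; } stats_t;

/* Two different computations over the same data, one per section.
   Assumes n >= 1. Each section writes a different field of `out`. */
stats_t sum_and_max(const double *v, size_t n)
{
    stats_t out = { 0.0, v[0] };
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += v[k];
            out.sum = s;
        }
        #pragma omp section
        {
            double m = v[0];
            for (size_t k = 1; k < n; k++)
                if (v[k] > m)
                    m = v[k];
            out.max = m;
        }
    }
    return out;
}
```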
2.2.7 Synchronization
Synchronization is one of the crucial problems in developing shared
memory based parallel software. At the user level, shared resources or
shared memory usually take the form of shared variables, whereas at the
machine level they may be registers, memory locations, status flags and
so on. To increase the efficiency of parallel software, source languages
should offer high-level notions for synchronization to ease parallel
programming, and compilers must provide exact and efficient
implementations of such notions.
Broadly, there are two types of synchronization:
Barrier
Usually implies that all sub-tasks of a larger task are involved. Each
task performs its work until it reaches the barrier and then waits.
When the last task reaches the barrier, all tasks are synchronized.
Lock / semaphore
It can involve any number of tasks. It is typically used to protect
access to global data or a section of code. Only one task at a time
may hold the lock / semaphore / flag.
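Both kinds of synchronization have direct counterparts in OpenMP. The sketch below is illustrative; the function names are assumptions. In count_with_lock the critical directive plays the role of a lock around a shared counter, and in two_phase an explicit barrier ensures that every element of a is written before any thread reads it.

```c
/* Lock-style synchronization: only one thread at a time may execute
   the statement protected by the critical directive. */
long count_with_lock(long n_increments)
{
    long counter = 0;
    #pragma omp parallel for
    for (long k = 0; k < n_increments; k++) {
        #pragma omp critical
        counter++;
    }
    return counter;
}

/* Barrier synchronization: phase 2 reads values that other threads
   wrote in phase 1, so all threads must finish phase 1 first. */
void two_phase(int *a, int *b, int n)
{
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int k = 0; k < n; k++)
            a[k] = k * k;

        #pragma omp barrier

        #pragma omp for
        for (int k = 0; k < n; k++)
            b[k] = a[n - 1 - k] + 1;
    }
}
```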
2.2.8 Mapping for Load Balancing
After decomposing the data, the next step is load balancing. Load
balancing refers to distributing approximately equal amounts of work
among cores so that all cores/processing units are kept busy all of the
time. The primary optimization purpose of mapping is to balance the task
load of the processing units/cores and to minimize the cost of
inter-processor communication (IPC). Commonly, the task of load balancing
is to develop decomposition and mapping algorithms that achieve these
respective optimization objectives. Load balancing techniques can be
broadly classified into two major categories: static and dynamic. Static
load balancing techniques assign processes to processors at compile time.
This type of mapping is used when the data set is known. Dynamic
techniques bind processes to processors at run time. This approach is
used for unknown data sets.
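In OpenMP, the two categories correspond loosely to the static and dynamic loop schedules. The following sketch is an illustration rather than the thesis implementation; work, run_static and run_dynamic are hypothetical names, and the chunk size of 4 is an arbitrary choice. The tasks are deliberately uneven, which is the situation in which a dynamic schedule tends to keep cores busier.

```c
/* Simulated task whose cost grows with its index (uneven workload).
   Returns task * (task + 1) / 2. */
static long work(int task)
{
    long acc = 0;
    for (int r = 0; r <= task; r++)
        acc += r;
    return acc;
}

/* Static mapping: iterations are divided among threads in advance. */
long run_static(int n_tasks)
{
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int t = 0; t < n_tasks; t++)
        total += work(t);
    return total;
}

/* Dynamic mapping: threads fetch chunks of 4 iterations at run time. */
long run_dynamic(int n_tasks)
{
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int t = 0; t < n_tasks; t++)
        total += work(t);
    return total;
}
```

Both functions compute the same result; only the assignment of iterations to threads differs.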
2.2.9 Granularity
In parallel computing, granularity is a measure of the ratio of
computation to communication. Phases of computation are separated from
phases of communication by synchronization events.
Granularity can be classified into two categories:
Fine-grain Parallelism:
• A comparatively small amount of computational work is done between
communication events
• Low ratio of computation to communication
• Facilitates load balancing
• Implies high communication overhead and less opportunity for
performance improvement
• If the granularity is too fine, the overhead required for
communication and synchronization between tasks may take longer than
the actual computation.
Coarse-grain Parallelism:
• A large amount of computational work is done between
communication and synchronization events
• High ratio of computation to communication
• Implies more opportunity for performance improvement
• Load balancing is harder to achieve.
2.3 Experimental setup
A multi-core processor is a single computing component with two or more
independent cores (Geer, 2005). These cores can read and execute tasks
and instructions concurrently, increasing the overall speed of programs
that are amenable to parallel computing. This architecture also enhances
performance and reduces power consumption, which in turn helps to reduce
greenhouse gas emissions. Fig. 2.5 shows the architecture of a multi-core
processor.
For this research, the machine was set up with the following
configuration:
Processor - AMD FX(tm)-8320 eight-core processor running at 3500 MHz
RAM - 8 GB
System Type - 64-bit operating system, x64-based processor
Operating System - Linux/Ubuntu 12.04
OpenMP - 4.0
Compiler - GCC
Programming Language - C with OpenMP
Fig. 2.5 Multi-Core Processor Architecture
2.4 Tools used
Several steps are involved in the parallelization of an algorithm. The
first and foremost condition is that the algorithm must be designed in a
way that supports parallelism. Before a parallel algorithm is
implemented, the following points must be considered:
• How many functions are time-consuming in the sequential implementation?
• Is there any type of data dependency?
These questions can be answered with the help of a profiler tool. A code
profiler reports which functions of the algorithm are compute-intensive,
which functions call other functions and how many times, and where the
loop dependencies are. For this research, GNU’s Gprof is used to profile
the algorithm.
[Fig. 2.5 labels: Core-1 to Core-4, each with private memory; shared memory; bus interface; chip boundary; off-chip components]
2.4.1 Gprof
Gprof is a profiler that collects and arranges statistics on programs
(Graham et al., 2004, Fenlason and Stallman, 1988). A program compiled
with profiling enabled writes a “gmon.out” data file when it runs; Gprof
analyses this file and reports details such as which function is executed
the maximum number of times. It provides many options for obtaining
details about the program.
2.4.2 OpenMP
OpenMP is an application programming interface (API) for designing
parallel algorithms using the shared memory model (Jin et al., 1999).
OpenMP is used in conjunction with C/C++ and FORTRAN (Dagum and Menon,
1998, Chandra, 2001). It provides programmers with a manageable model for
developing portable and scalable parallel algorithms. It comprises three
components: environment variables, compiler directives and runtime
library routines. These constructs extend a sequential programming
language with single program multiple data (SPMD) constructs,
synchronization constructs and work-sharing constructs. OpenMP uses the
fork-join model for parallel programs: a single master thread starts
execution and, the moment it encounters a parallel region, it distributes
the tasks among a team of threads depending upon the constructs and data
in the region. At the end of the parallel region, the other threads
terminate after completing their respective tasks and only the master
thread continues execution until the next parallel region is encountered
in the program.
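The fork-join behaviour can be sketched in a few lines of C. The function name team_size is an assumption for illustration; omp_get_num_threads is a standard OpenMP runtime routine, and the team size can be influenced through the OMP_NUM_THREADS environment variable. The guards on _OPENMP let the same code compile and run serially when OpenMP support is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the number of threads that took part in the parallel region. */
int team_size(void)
{
    int threads = 1;          /* serial part: only the master thread runs  */
    #pragma omp parallel      /* fork: a team of threads is created        */
    {
        #pragma omp single    /* exactly one thread records the team size  */
        {
#ifdef _OPENMP
            threads = omp_get_num_threads();
#endif
        }
    }                         /* join: implicit barrier, the team disbands */
    return threads;           /* serial part resumes on the master thread  */
}
```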
2.4.3 MinGW
MinGW (Minimalist GNU for Windows) is a free and open source tool for
building native Microsoft Windows applications (Peters et al., 2010,
Team, 2008). It can also be cross-hosted on the GNU/Linux platform. It
consists of a port of the GNU Compiler Collection (GCC), GNU Binutils
for Windows, a set of freely distributable Windows-specific header files
and specific libraries which allow the use of the Windows API, and
various utilities. MinGW supports almost all languages that are supported
by GCC, among them C, C++, Objective-C, Objective-C++, FORTRAN and Ada.
2.4.4 CodeBlocks
Code::Blocks is a cross-platform IDE that supports compiling and running
programs in multiple programming languages.
2.4.5 Joulemeter
Joulemeter is a project that develops methods to improve the energy
efficiency of computing devices and infrastructures. It is a modeling
tool for measuring the energy consumption of desktops, laptops, servers,
virtual machines and even specific software applications running on a
computer. The visibility provided by this tool is used to reduce energy
consumption costs for data centers and to support desktop energy
optimization and mobile battery management.
Conclusion
Programs are generally written using the sequential execution model, in
which instructions execute one after another, forming a sequence. To
benefit from multi-core machine architectures, which can provide faster
execution than sequential ones, the algorithm or program must be designed
and developed using parallel computing principles. Further, profiler
tools can be used to optimize the code. For implementation, different
manual as well as directive-based approaches can be used. In this thesis,
OpenMP is used to implement the programs in parallel.
Chapter 3
Design Of Parallel Additive
Stream Cipher Structure
Stream ciphers encrypt the individual characters of a plaintext one at a
time. Various design methodologies for stream ciphers have been proposed
and comprehensively studied. Linear feedback shift registers (LFSRs) are
commonly used in key stream generators, but they are best suited to
hardware implementations. For software implementations, additive or
binary additive stream cipher structures are usually used. This chapter
provides a discussion of stream ciphers and focuses on the design of the
Parallel Additive Stream Cipher Structure (PASCS), which can be used to
develop parallel stream cipher algorithms based on the synchronous
additive stream cipher structure.
3.1 Introduction
Stream ciphers encrypt the individual characters of a plaintext one at a
time, using a transformation that varies with time (Mao, 2003). In
hardware implementations, stream ciphers are usually faster than block
ciphers and require simpler circuitry (Mao, 2003). In some cases the use
of stream ciphers is in fact mandatory. For example, in
telecommunications applications where buffering is limited or where
characters must be processed individually as they are received, only
stream ciphers can be used. Stream ciphers can be further classified as:
a. One-time pad cipher
The encryption of a binary alphabet using the Vernam cipher
(Salomon, 2003) is defined by:
$C_i = M_i \oplus K_i$ for $i = 1, 2, 3, \ldots, n$, where
$M_1, M_2, \ldots, M_n$ are the plaintext digits, $K_1, K_2, \ldots, K_n$
are the key stream bits and $C_1, C_2, \ldots, C_n$ are the cipher text
digits. Decryption is the exact inverse of the above equation and is
defined by $M_i = C_i \oplus K_i$. If the key stream digits are generated
independently and at random, the cipher is called a one-time pad.
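Because XOR is its own inverse, one routine serves for both encryption and decryption. The following is a minimal sketch operating on bytes rather than single bits; the function name xor_with_keystream is an assumption, and the caller is assumed to supply a key stream at least as long as the input.

```c
#include <stddef.h>

/* Computes out[k] = in[k] XOR key_stream[k]. Applying the function to
   the cipher text with the same key stream recovers the plaintext. */
void xor_with_keystream(const unsigned char *in,
                        const unsigned char *key_stream,
                        unsigned char *out, size_t n)
{
    for (size_t k = 0; k < n; k++)
        out[k] = in[k] ^ key_stream[k];
}
```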
b. Synchronous stream cipher
In this type of stream cipher, the key stream is produced independently
of the original message and of the cipher text (Fontaine, 2011). The
complete encryption process can be described by the following equations:
\[
\begin{aligned}
\mathrm{StateSpace}_{i+1} &= \mathrm{NextStateFunction}(\mathrm{StateSpace}_i, \mathrm{key}),\\
\mathrm{Keystream}_i &= \mathrm{CalculateKeyStream}(\mathrm{StateSpace}_i, \mathrm{key}),\\
\mathrm{Cipher}_i &= \mathrm{GenerateCipherText}(\mathrm{Keystream}_i, \mathrm{Message}_i)
\end{aligned}
\]
where $\mathrm{StateSpace}_0$ is the initial state, which can be
determined from the key, CalculateKeyStream is the function that produces
the key stream $\mathrm{Keystream}_i$, and GenerateCipherText is the
output function, which takes $\mathrm{Keystream}_i$ and
$\mathrm{Message}_i$ as input and produces $\mathrm{Cipher}_i$.
Additive stream ciphers
The most commonly used synchronous stream ciphers are additive stream
ciphers (Cusick et al., 2004). In these ciphers, each key stream digit
is XORed with the corresponding plaintext digit; decryption reverses the
process by XORing the cipher-text with the same key stream.
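The three equations of a synchronous additive stream cipher map directly onto code. The sketch below is a toy for illustration only and is not secure: the state is a 32-bit integer advanced by a linear congruential step, and the key is assumed to enter only through the initial state. All names (toy_cipher_t, toy_init, toy_crypt) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t state; } toy_cipher_t;

/* Initial state derived from the key (here: the key itself). */
void toy_init(toy_cipher_t *c, uint32_t key) { c->state = key; }

static uint8_t toy_next_keystream_byte(toy_cipher_t *c)
{
    uint8_t ks = (uint8_t)(c->state >> 24);        /* CalculateKeyStream */
    c->state = c->state * 1664525u + 1013904223u;  /* NextStateFunction  */
    return ks;
}

/* Additive output function: the same call encrypts and decrypts. */
void toy_crypt(toy_cipher_t *c, const uint8_t *in, uint8_t *out, size_t n)
{
    for (size_t k = 0; k < n; k++)
        out[k] = in[k] ^ toy_next_keystream_byte(c); /* GenerateCipherText */
}
```

The key stream never depends on the message or the cipher text, which is what makes the cipher synchronous: sender and receiver only need to start from the same state.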
c. Asynchronous stream cipher
These ciphers are also known as self-synchronizing stream ciphers. In
this type of cipher, the key stream is generated as a function of the key
and a fixed number of preceding cipher-text digits (Robshaw, 1995). The
encryption function can be described by the following equations:
\[
\begin{aligned}
\mathrm{StateSpace}_i &= (c_{i-t}, c_{i-t+1}, \ldots, c_{i-1}),\\
\mathrm{Keystream}_i &= \mathrm{CalculateKeyStream}(\mathrm{StateSpace}_i, \mathrm{key}),\\
\mathrm{Cipher}_i &= \mathrm{GenerateCipherText}(\mathrm{Keystream}_i, \mathrm{Message}_i)
\end{aligned}
\]
where $\mathrm{StateSpace}_i$ is the state formed from the $t$ preceding
cipher-text digits, CalculateKeyStream computes the key stream digits
from the state and the key, and GenerateCipherText computes the
cipher-text.
The rest of the chapter is organized as follows: Section 3.2 states the
motivation for a parallel architecture. The design of the parallel
architecture is presented in Section 3.3, followed by some conclusions
from the design.
3.2 Motivation For Parallel Architecture
For stream ciphers, the whole encryption process is based on bit-by-bit
encryption in a sequential manner. This mechanism does not make use of
parallel computing or of the multi-core processors that are commonly
available today. If multiple bits can be processed at the same time to
produce multiple cipher text bits, the encryption process can be much
faster than the traditional one, in which only a single bit is encrypted
at a time. Most software applications that use stream ciphers rely on
ciphers based on the synchronous additive stream cipher structure
discussed before. This structure is sequential in nature. A parallel
framework for it will help to process many bits concurrently and make
the application faster.
3.3 Design of PASCS
In this framework, a key is supplied to the key stream generator which
will produce random key stream. The plaintext is in the form of fixed
sized blocks. The random key stream is supplied to each individual block
to process plaintext concurrently. The following figure illustrates the
complete process used in this framework.
Fig 3.1 Design of Parallel Additive Stream Cipher Structure
As shown in Fig. 3.1, there are n fixed-size data blocks. A random key
stream of the same length is supplied to each block, and each bit of the
plaintext block is XORed with the corresponding key bit to produce the
cipher text bit. This parallel structure can be used by any stream cipher
of a synchronous nature. The size of the block depends upon the
algorithm’s structure. PASCS is based on the concept of the Vernam
cipher, in which there is an individual key bit for each bit of
plaintext. To keep the essence of the Vernam cipher and maintain its
randomness, each block should have a different key stream. Hence, a
modification of the architecture is required to implement stream cipher
algorithms. In this thesis, PASCS is applied to the RC4 and RC4A
algorithms to analyze the impact of parallelization on the speed of the
cipher. We discuss RC4 and RC4A in later chapters.
Conclusion
Linear feedback shift registers are used extensively today, but mainly
in hardware implementations. For software applications, there is a need
to redesign binary additive structures to enable parallelism so that the
speed of the application can be enhanced. PASCS is an effort in this
direction. It is a parallel framework in which multiple blocks of data,
in the form of bits/bytes, can be encrypted or decrypted concurrently in
order to improve performance.
Chapter 4
PARC4: Parallel approach for
RC4 using PASCS
This chapter introduces a parallel stream cipher, PARC4 which is used to encrypt
a large set of data. The implementation of the algorithm is based on PASCS
framework which we discussed in Chapter 3. This chapter focuses on the
development of the methodology to add parallelism and on the model that is used
to map PASCS architecture to gain the performance benefits. Various
performance metrics have been used to measure the performance of the
developed parallel algorithm and discussed in this chapter.
4.1 Introduction
To ensure the security of confidential data or information, different
encryption algorithms have been used. As discussed earlier, encryption
algorithms are of two types: symmetric and asymmetric. RC4, developed by
Ron Rivest (Rivest), is a very popular symmetric stream cipher. It
operates on individual bytes to secure the message. Although it is faster
than other symmetric ciphers such as DES and 3DES (Elminaam et al.,
2010), the algorithm does not take advantage of today’s multiprocessing
environments, which support symmetric multicore programming. Also, if we
can improve the performance of the algorithms, then we can make them more
energy-efficient at their original performance levels. We discuss this
later in this thesis. To utilize all processing cores effectively, the
structure of the algorithm must support parallelism. Furthermore,
parallelism improves the speed of encryption as well as that of
decryption, resulting in an overall speedup of the applications, an
important need for security algorithms while working with software
applications. Any compute-intensive algorithm, as security algorithms
are, can slow an application down and reduce its effectiveness. With the
adoption of Symmetric Multiprocessors (SMPs) in computing, it has become
possible to parallelize complex computational algorithms and make them
run faster (Keckler et al., 2009), (Chandra, 2001).
The complete chapter is organized as follows: Section 4.2 discusses RC4
and how it supports parallelism. An identification of the parallelism in the
algorithm, along with the parallel techniques used in the implementation
are presented in Section 4.3. In the next section, we have included
security analysis to verify that the modified algorithm is as secure as the
original one. The results on a large set of data files along with some of the
speed up calculations are discussed in Section 4.5. In Section 4.6, the
performance of the proposed algorithm has been measured using various
metrics. PARC4 is compared with the existing multithreaded approaches
in Section 4.7 followed by a discussion on the conclusions from the work.
4.2 Detection of Parallelism
As discussed in Chapter 3, the PASCS framework is applied to the RC4
algorithm to parallelize it. RC4 has two sub-algorithms: the
key-scheduling algorithm (KSA), which initializes the state array S from
the key, and the pseudo-random generation algorithm (PRGA), which
produces the key stream used for encryption and decryption. KSA (Fluhrer
et al., 2001) performs a fixed number of iterations, whereas the number
of times PRGA is invoked depends on the length of the input data. PASCS
is therefore used to make PRGA parallel. However, PRGA is based on the
exchange shuffle model, which is inherently sequential.
As shown in Fig. 4.1, the values of the S array change after each swap.
The procedure is repeated n times, where n is the size of the
message/plaintext in bytes. As a result, functional decomposition of the
algorithm with its existing structure is not feasible.
Fig 4:1 Swapping between S[i] and S[j]
4.3 Method for Adding Parallelism
The input to PASCS framework must be supplied in the form of
individual blocks which are of fixed length. So first, the input data has
been divided into fixed size blocks. Afterwards, each individual block is
encrypted simultaneously using similar steps, and finally, the output of
each block is concatenated to make the complete cipher text. All of these
operations are being done in parallel by multiple cores to achieve
performance improvements. The next objective is to decide the size of the
block. For this, consider the following code snippet of PRGA:
// Output function: used to perform encryption and decryption.
// s (the 256-entry state array), i and j are global PRGA state.
unsigned char rc4_output()
{
    i = (i + 1) % 256;
    j = (j + s[i]) % 256;
    Swap(s, i, j);                  // exchange s[i] and s[j]
    return s[(s[i] + s[j]) % 256];  // next key stream byte
}
// f_size represents the size of the input in bytes;
// the output function is called once for each input byte
for (int x = 0; x < f_size; x++)
{
    enblock[x] = (memblock[x] ^ rc4_output());
}
[Fig. 4.1 content: an eight-element S array with indices i = 0 to 7; if i = 2 and j = 7, the values at S[i] and S[j] are interchanged]
The index i takes the value 1 on the first call and is incremented on
every call, modulo 256. On the 257th call i is therefore 1 again, since
(256+1) mod 256 = 1, then (257+1) mod 256 = 2 and so on; the sequence
repeats after every 256 calls. Thus, after the completion of 256
iterations, i starts again from 1 to calculate j’s value, which
determines the swap taking place in the S array. Moreover, the length of
the S array, which is used to generate the key stream, is only 256.
Hence, it makes sense for the block size of PARC4 to be 256 bytes. If the
input data is not a multiple of 256 bytes, the last block is padded with
zeros to make it 256 bytes long, and the added bytes are discarded during
decryption. For 512 bytes of data, the values of i and j at each
iteration are given in Appendix-C for the reader’s reference.
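The block arithmetic described above can be sketched as follows. The helper names and the BLOCK_SIZE macro are assumptions for illustration, and the block size is taken to be 256 bytes.

```c
#include <stddef.h>

#define BLOCK_SIZE 256

/* Number of blocks needed to hold n_bytes (the last one may be padded). */
size_t block_count(size_t n_bytes)
{
    return (n_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Number of zero bytes appended to fill the last block. */
size_t padding_bytes(size_t n_bytes)
{
    return block_count(n_bytes) * BLOCK_SIZE - n_bytes;
}

/* Value of the PRGA index i after `calls` calls of the output function,
   given that i starts at 0 and each call performs i = (i + 1) % 256. */
unsigned prga_index_after(size_t calls)
{
    return (unsigned)(calls % 256);
}
```

For example, 513 bytes of input need three blocks with 255 padding bytes, and the index i is 1 after both the 1st and the 257th call.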
The next important objective is to supply a random key stream to each
block. For this, the block ID is added to the generated key stream value
and the result is reduced modulo 256 so that it stays within the range 0
to 255. Because the block ID is different for each block, the key stream
applied to each block is offset differently. Algorithm 4.1 presents the
whole procedure of PARC4 for running it on SMPs, whereas Figure 4.2 is
the pictorial representation of the key stream generated for the parallel
structure.
Algorithm 4-1: Steps to Implement PARC4
In Algorithm 4.1, the plaintext is declared as a shared variable because
each core needs to access this data in small chunks, and the block size
must be known to each core; it is declared as a shared variable in the
OpenMP looping construct. Furthermore, each core works on its own set of
data, which should be declared private to that core. Lines 1 and 7
delimit the parallel region. Line 2 specifies that each block involves a
fixed number of iterations and that the blocks can be processed in a
synchronized manner, and Line 3 assigns the range of each block. Line 4
specifies the loop that executes a total of 256 iterations for each block
sequentially. To synchronize the work done by multiple cores,
synchronization constructs have been used in the OpenMP implementation.
Procedure: Encryption
Model: Data Parallel Model with P processors [P = 2, 4, 6, 8]
Input: Plaintext in the form of small chunks [Chunk Size = 256], n = number of blocks
Output: Encrypted text
Declare: Plaintext and BlockID as shared variables, i as a private variable to each
processing element
1. ParBegin
2.   For ALL BlockID: [0, n-1] IN SYNC
3.     Set Start = BlockID*256 and End = Start+256
4.     For i = Start to End-1 do
5.       Output = ((keystream_byte + BlockID) Mod 256) XOR plaintext_byte
6.     End for
7. ParEnd
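The steps of Algorithm 4.1 might be sketched in C with OpenMP roughly as follows. This is an illustrative sketch rather than the code from the appendix: the function name `parc4_xor_blocks` is hypothetical, and the sketch assumes that 256 key stream bytes have already been generated and are passed in as `keystream`. Without OpenMP support the pragma is ignored and the loop simply runs sequentially.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 256  /* bytes per block */

/* Encrypts or decrypts n_blocks blocks of 256 bytes each; XOR is its own
 * inverse, so the same routine serves both directions.
 * keystream: 256 key stream bytes produced beforehand (assumed input). */
void parc4_xor_blocks(const uint8_t *in, uint8_t *out, size_t n_blocks,
                      const uint8_t keystream[BLOCK_SIZE]) {
    /* Lines 1-2 and 7: blocks are independent, so they run in parallel. */
    #pragma omp parallel for schedule(dynamic)
    for (long block_id = 0; block_id < (long)n_blocks; block_id++) {
        /* Line 3: start offset of this block. */
        size_t start = (size_t)block_id * BLOCK_SIZE;
        /* Lines 4-6: 256 sequential iterations per block. */
        for (size_t i = 0; i < BLOCK_SIZE; i++) {
            /* Line 5: shift the key byte by the block id, modulo 256. */
            uint8_t k = (uint8_t)((keystream[i] + block_id) % 256);
            out[start + i] = k ^ in[start + i];
        }
    }
}
```

Calling the function a second time on the ciphertext with the same key stream restores the plaintext, since each byte is XORed with the same modified key byte.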
Fig 4:2: Part I depicts sequential key generation, whereas Part II presents the
formation of the key stream for the parallel framework
The parallel implementation using PASCS necessitates a random key stream
for each block. Therefore, as shown in Fig. 4.2 II, the process of key
generation has the block index as an additional variable. Each block has a
different index, which is added to the key byte, and the final result is
obtained by applying the mod 256 operation:
(KeyByte + BlockID) Mod 256
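As a small worked illustration of this expression, take a key byte of 200 (a value chosen here purely for illustration, not taken from the thesis data) and apply three different block ids:

```latex
\begin{align*}
\text{BlockID} = 0   &: \quad (200 + 0)   \bmod 256 = 200 \\
\text{BlockID} = 1   &: \quad (200 + 1)   \bmod 256 = 201 \\
\text{BlockID} = 100 &: \quad (200 + 100) \bmod 256 = 44
\end{align*}
```

The third case shows the wrap-around: 300 exceeds 255, so 256 is subtracted to keep the result within one byte.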
Figure 4.3 represents the complete process of encryption as well as
decryption in PARC4. The input data is divided into fixed-size blocks of
256 bytes each. Each block is then encrypted using the modified key
stream. For the reader’s reference, the parallel implementation of PARC4
using OpenMP has been given in Appendix-C.
Fig 4:3 Graphical Representation of Complete Flow and Model Used to
Parallelize RC4
4.3.1 Parallelization techniques
The PARC4 algorithm has been developed using the following parallelization
techniques:
1) Data decomposition: Data decomposition is a commonly used technique
for deriving concurrency in algorithms that operate on large data sets
(Kumar et al., 1994). In this implementation, the “input data partitioning”
method has been used to decompose the input data. Figure 4.4 represents
this partitioning:
Fig 4:4 Pictorial Representation of Input Data Decomposition Technique
2) Mapping technique for load balancing: After the decomposition of the
data, the next step is to map each chunk of data onto a thread (each of
which runs on a different processor) so that the whole task is completed in
parallel and therefore faster. As discussed in Chapter 2, dynamic mapping
is used when the data set is not known in advance. Dynamic mapping can be
further classified into two categories: centralized and distributed. In
centralized dynamic mapping, all executable tasks are maintained in a
common task pool. A master thread initiates the task, and then each
processor independently takes some portion of the task to perform. In
distributed dynamic mapping, the set of executable tasks is distributed
among processes, which exchange tasks at run time to balance the work;
each process can send work to or receive work from any other process. In
PARC4, the centralized dynamic mapping scheme is applied to make sure that
each core has an equal load and the work is balanced among the processing
elements.
3) Data Parallel Model: In this model, the complete data set is arranged into
a shared structure such as an array. The set of instructions work together
on the shared data structure but each instruction works on a different
portion of the shared data structure.
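The centralized dynamic mapping described in item 2 can be pictured as a shared counter from which each worker claims the next unprocessed block. The sketch below is only an illustration of that idea using C11 atomics; the type `task_pool` and the functions `pool_init` and `pool_take` are hypothetical names, and the thesis implementation may instead rely on the scheduling that OpenMP provides.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Central task pool: a shared counter pointing at the next unclaimed block. */
typedef struct {
    atomic_size_t next;  /* index of the next block to hand out */
    size_t total;        /* total number of blocks in the pool */
} task_pool;

/* Prepares a pool that holds blocks 0 .. total-1. */
void pool_init(task_pool *pool, size_t total) {
    atomic_init(&pool->next, 0);
    pool->total = total;
}

/* Claims one block. Returns 1 and stores its index in *block_id,
 * or returns 0 (leaving *block_id untouched) when the pool is empty. */
int pool_take(task_pool *pool, size_t *block_id) {
    /* The atomic increment guarantees no two workers get the same index. */
    size_t id = atomic_fetch_add(&pool->next, 1);
    if (id >= pool->total) {
        return 0;
    }
    *block_id = id;
    return 1;
}
```

Each worker would loop on `pool_take` and process the returned block until the function reports that the pool is empty, so a core that finishes early simply claims more blocks.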
4.4 Security Analysis
Because a modified key stream is used in PARC4, it must be verified that
the stream of bits supplied to the different blocks is unique and random.
The most popular method to measure randomness in data is entropy. It was
introduced by Claude E. Shannon in 1948 and is also known as Shannon’s
entropy to differentiate it from other occurrences of the term, which
appear in various parts of physics in different forms (Bekenstein, 1973).
4.4.1 Shannon’s entropy
Shannon’s entropy is an important metric in information theory (Shannon,
1951). It measures the uncertainty associated with a random variable
(Rényi, 1961). Various tools are available to measure the entropy of random
numbers (Walker and ENT, 1998). For this research, PARC4 is tested for
randomness by using Shannon’s entropy formula, which is based on the
probability distribution (Shannon, 1949; Shannon, 2001). It is calculated
using the formula:
$$H(X) = -\sum_{i} P_i \log_2 P_i \tag{4.1}$$
In Eq. 4.1, P_i is the probability of a given value. Here, the logarithm
with base 2 is used because the information is in binary form. The result
gives the minimal number of bits per symbol required to encode the
information. Additionally, metric entropy can be calculated as the entropy
value divided by the string length. It gives information about the
randomness of the data in the message. The metric entropy can take values
from 0 to 1, where 1 means equally distributed random values. To measure
entropy and metric entropy for RC4 and PARC4, two different text files
containing random key stream bytes in decimal form have been generated.
Then, using an online Shannon’s entropy tool (Shannon), both measures are
calculated for both algorithms. The results show that H(X) = 3.72 for
PARC4 and H(X) = 3.14 for RC4. This indicates that PARC4 requires about 4
bits and RC4 requires about 3 bits to encode a data value optimally.
Similarly, the metric entropy is 0.0071 for PARC4 and 0.0052 for RC4,
which indicates the randomness in the data. From the above values, it is
observed that the entropy values of both algorithms are almost the same,
which shows that the changes made in the PARC4 algorithm to generate a
random key stream for each block have not disturbed the security of the
existing cipher. For the reader’s reference, the key bytes generated using
RC4 and PARC4 have been given in Appendix-A and Appendix-B.
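As a point of reference for Shannon’s entropy formula, the following worked example evaluates it for two simple distributions. Both distributions are chosen here for illustration and are not taken from the thesis measurements: a source with four symbols of probabilities 1/2, 1/4, 1/8 and 1/8, and a source whose 256 byte values are all equally likely.

```latex
\begin{align*}
H &= -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4}
      + 2\cdot\tfrac{1}{8}\log_2\tfrac{1}{8}\right)
   = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{4}
   = 1.75 \text{ bits per symbol} \\
H_{\text{uniform}} &= -\sum_{i=0}^{255} \tfrac{1}{256}\log_2\tfrac{1}{256}
   = \log_2 256 = 8 \text{ bits per symbol}
\end{align*}
```

The uniform case is the largest value the formula can reach for byte-valued symbols, since every value is then equally unpredictable.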
4.5 Experimental Results
To study the performance improvements achieved through the
parallelization of the RC4 algorithm, the sequential RC4 cryptographic
algorithm has first been executed to evaluate its execution time in a
given environment. The sequential results serve as the baseline for
comparison with the results for the parallel algorithm PARC4. All programs
have been compiled with the GCC compiler, using -O3 to enable the third
level of optimization and -march=native to enable the use of CPU-specific
instructions. All the data has been tested on a server with the
configuration mentioned in Section 2.3 of Chapter 2.
To assess the parallel framework, all tests use the text files from
t5-Corpus11 (Roussev, 2011) with some changes (required sequence of bits).
Tables 4.1 to 4.5 show the time taken by RC4 and by PARC4 on multiple
cores. The first column of each table shows the number of input bytes used
for encryption and decryption. The second and third columns show the
execution time (in seconds) for the encryption and decryption processes,
and the last column shows the overall time taken by both processes on an
AMD FX(tm)-8320 eight-core processor running at 3.5 GHz.
Table 4-1: Time (in seconds) taken by RC4 to encrypt/decrypt large data files on a
uniprocessor
Size of input data [GB]  Encryption  Decryption  Overall Time
0.1 1.31785 1.29868 2.61653
0.2 2.64969 2.62575 5.27544
0.3 3.87678 3.85207 7.72885
0.4 5.29639 5.24420 10.54059
0.5 6.46599 6.41855 12.88454
0.6 7.73803 7.67800 15.41603
0.7 9.02553 9.16311 18.18864
0.8 10.58992 10.93706 21.52698
0.9 11.59993 11.85782 23.45775
1.0 12.89147 12.97361 25.86508
Table 4-2: Time (in seconds) taken by PARC4 to encrypt/decrypt large data
files using 2 cores
Size of input data [GB]  Encryption  Decryption  Overall Time
0.1 0.67892 0.67872 1.35764
0.2 1.33886 1.33878 2.67764
0.3 2.0177 2.0176 4.03528
0.4 2.69656 2.69636 5.39292
0.5 3.37532 3.37524 6.75056
0.6 4.05413 4.05407 8.1082
0.7 4.73296 4.73288 9.46584
0.8 5.41179 5.41169 10.8235
0.9 6.09061 6.09051 12.1811
1.0 6.76942 6.76934 13.5388
Table 4-3: Time (in seconds) taken by PARC4 to encrypt/decrypt large data
files using 4 cores
Size of input data [GB]  Encryption  Decryption  Overall Time
0.1 0.399695 0.399295 0.79899
0.2 0.799947 0.799943 1.59989
0.3 1.199905 1.199903 2.39981
0.4 1.640765 1.640725 3.28149
0.5 1.99994 1.9999 3.99984
0.6 2.362748 2.362742 4.72549
0.7 2.797925 2.797885 5.59581
0.8 3.299505 3.299465 6.59897
0.9 3.649495 3.649455 7.29895
1.0 3.974905 3.974865 7.94977
Table 4-4: Time (in seconds) taken by PARC4 to encrypt/decrypt large data
files using 6 cores
Size of input data [GB]  Encryption  Decryption  Overall Time
0.1 0.25999 0.25995 0.51994
0.2 0.499937 0.499933 0.99987
0.3 0.769528 0.76952 1.53905
0.4 1.04601 1.04597 2.09198
0.5 1.28548 1.28547 2.57095
0.6 1.537027 1.537023 3.07405
0.7 1.844995 1.844955 3.68995
0.8 2.14746 2.14726 4.29472
0.9 2.33866 2.3386 4.67726
1.0 2.574919 2.574911 5.14983
Table 4-5: Time (in seconds) taken by PARC4 to encrypt/decrypt large data files
using 8 cores
Size of input data [GB]  Encryption  Decryption  Overall Time
0.1 0.18687 0.18683 0.3737
0.2 0.37683 0.37677 0.7536
0.3 0.55207 0.55203 1.1041
0.4 0.75287 0.75283 1.5057
0.5 0.92033 0.92027 1.8406
0.6 1.10114 1.10106 2.2022
0.7 1.29918 1.29912 2.5983
0.8 1.53763 1.53757 3.0752
0.9 1.67557 1.67553 3.3511
1.0 1.84755 1.84745 3.695
The encrypted and decrypted text using PARC4 has been given in Appendix-D
for the reader’s reference. The OpenMP implementation for the same has been
given in Appendix E.
4.6 Performance and Scalability Analysis
To examine the benefits of parallelism, a number of metrics, such as
speedup, efficiency, complexity, and scalability, have been used to
measure the performance of the proposed algorithm. We discuss these
metrics next.
4.6.1 Speedup
A serial algorithm is typically assessed in terms of its execution time,
which is stated as a function of its input size. In contrast, the execution
time of a parallel algorithm is determined by the input size as well as the
parallel architecture and the number of processing elements employed.
Accordingly, the speedup is defined as the ratio of the time taken to solve
a problem using a single processing element to the time required to solve
the same problem using a parallel computer with p identical processing
cores. From Tables 4.1 to 4.5, it can be observed that PARC4 achieves a
speedup corresponding to the number of cores used for the experiments.
Fig. 4.5 shows the speedup comparison for a data file of ~1 GB using PARC4
on multiple cores.
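The definition of speedup can be written out and applied to the overall times reported for 1.0 GB in Tables 4.1 to 4.5. The ratios below are computed here from those table entries and are not read off Fig. 4.5; the symbols S(p) and T(p) are notation introduced for this illustration.

```latex
\begin{align*}
S(p) &= \frac{T(1)}{T(p)} \\
S(2) &= \frac{25.86508}{13.5388} \approx 1.91 &
S(4) &= \frac{25.86508}{7.94977} \approx 3.25 \\
S(6) &= \frac{25.86508}{5.14983} \approx 5.02 &
S(8) &= \frac{25.86508}{3.695} \approx 7.00
\end{align*}
```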
Fig 4:5: Speedup comparison of PARC4 using multiple cores
It is visible from the graph above that the speedup increases as the
number of cores increases. However, as per Amdahl’s law, the speedup tends
to saturate and the efficiency drops at some point, which depends on the
sequential portion of the executing code. If the number of processing
elements and the amount of problem data increase, the overhead of
decomposing and distributing tasks among the processors also increases.
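For reference, Amdahl’s law can be stated as follows, where f denotes the fraction of the execution time that can be parallelized and p the number of cores. The symbol f and the value used in the example are assumptions made here for illustration; the thesis does not report a measured parallel fraction.

```latex
\[
S(p) = \frac{1}{(1 - f) + \dfrac{f}{p}},
\qquad
\lim_{p \to \infty} S(p) = \frac{1}{1 - f}
\]
\[
\text{Example with } f = 0.98:\quad
S(8) = \frac{1}{0.02 + 0.1225} \approx 7.02,
\qquad
\lim_{p \to \infty} S(p) = 50
\]
```

The limit shows why the speedup saturates: however many cores are added, the sequential portion 1 - f bounds the achievable gain.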
Similarly, if the execution time is observed for large input streams such
as ~1 GB of data, the total time for executing the algorithm on the
complete data is a function of the number of cores employed to complete the
task. For a constant file size, the execution time falls each time two
more cores are added: for ~1 GB the overall time drops from about 13.54 s
on two cores to 7.95 s on four, 5.15 s on six, and 3.70 s on eight cores
(Tables 4.2 to 4.5). Figure 4.6 illustrates this scenario.
Fig 4:6: Speedup for constant data using multiple cores
4.6.2 Efficiency
Although speedup measures performance gains for multiple cores
compared to a single core, it does not provide information whether the
processing elements or cores used in the parallel computer are being used
efficiently. The efficiency of a given problem on n processing elements,
E(n), is defined as the ratio of the speedup attained and the number of
processors used to attain the speedup.
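$$E(n) = \frac{S(n)}{n} = \frac{T(1)}{n \, T(n)} \tag{4.6}$$
In Equation 4.6, let T(1) = 25.8 seconds, T(n) = 3.6 seconds and n = 8 to
process ~1 GB of data. The efficiency of PARC4 is
$E(8) = \frac{T(1)}{8 \, T(8)} = \frac{25.8}{8 \times 3.6} \approx 0.89$.
Since $1 \le S(n) \le n$, there is $\frac{1}{n} \le E(n) \le 1$. PARC4
achieved an efficiency in that range, as $\frac{1}{8} \le 0.89 \le 1$.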
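The efficiency calculation can be sketched in C as below, so that it can be repeated for any pair of timings from Tables 4.1 to 4.5. The function names `speedup` and `efficiency` are hypothetical and were chosen for this illustration.

```c
/* Speedup: serial time divided by parallel time. */
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

/* Efficiency: speedup divided by the number of cores used. */
double efficiency(double t_serial, double t_parallel, int cores) {
    return speedup(t_serial, t_parallel) / cores;
}
```

With the rounded values used in the text, `efficiency(25.8, 3.6, 8)` evaluates to roughly 0.896, which lies between 1/8 and 1 as required.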
Table 4-6: Efficiency as a function of n and p for running n blocks on p
processors to encrypt input stream
Input size [GB] P=4 P=6 P=8
0.6 0.81 0.83 0.87
0.7 0.82 0.84 0.9
0.8 0.83 0.85 0.87
0.9 0.81 0.84 0.89
1.0 0.81 0.84 0.89
It is visible from Table 4.6 that, for a given problem size, as the number
of processing elements increases, the overall efficiency of the parallel
system increases. Secondly, the efficiency of the parallel system remains
almost constant if the problem size is increased while keeping the number
of processing elements constant. This is due to the fact that the
implementation of PARC4 is based on the data parallel model, in which each
processing core receives an equal share of tasks from the centralized task
pool.
4.6.3 Complexity and Cost optimality
The cost of running a program on an SMP is given as the product of the
parallel execution time and the number of cores or processing elements
employed for that program. It reflects the sum of the execution time that
each core spends solving the task. The cost of solving a specific task
using a single core is the execution time of the fastest known serial
algorithm. “A parallel system is said to be cost optimal if the cost of
solving a problem on a parallel computer has the same asymptotic growth
as a function of the input size as the fastest known sequential algorithm on
a single processing element” (Kumar et al., 1994). “Since efficiency is the
ratio of sequential cost to parallel cost, a cost optimal or pTp optimal
system has an efficiency of O(1). But due to parallel overhead involved,
the efficiency of 1 is never achieved” (Kumar et al., 1994).
Furthermore, let n be the number of data blocks and p the number of cores,
where n > p. Then n/p steps are required to execute the n blocks in
parallel, and 256*n iterations are executed in total, with the iterations
of different blocks running simultaneously. Thus the overall parallel
execution time for PARC4 is $\Theta(n \cdot m / p)$, where n is the number
of parallel blocks and m is 256. For example, if the input is 512 bytes
then n = 2. Further, when p = 2, (n*256)/p is half of the iteration count
of the original algorithm. Therefore, the parallel execution time can be
defined as:
$$T_p = \Theta\!\left(\frac{n \cdot m}{p}\right) \tag{4.7}$$
Consequently, its cost is:
$$\text{Cost} = p \times T_p \tag{4.8}$$
$$= p \times \Theta\!\left(\frac{n \cdot m}{p}\right) \tag{4.9}$$
$$= \Theta\!\left(\frac{p \cdot n \cdot m}{p}\right) \tag{4.10}$$
$$= \Theta(n \cdot m) \tag{4.11}$$
$$= \Theta(n) \tag{4.12}$$
In Equation 4.11, the number of blocks n is multiplied by m = 256, which
equals the number of iterations of the serial algorithm. Since m is a
constant, the cost remains linear in n, which makes PARC4 cost optimal.
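The cost argument can be checked numerically with a small C sketch that counts loop iterations instead of measuring seconds. The function names are hypothetical, and counting iterations rather than time is a simplifying assumption that ignores parallel overhead.

```c
#include <stddef.h>

#define ITERATIONS_PER_BLOCK 256  /* m in the text */

/* Iterations executed by the busiest core: ceil(n / p) blocks of m iterations. */
size_t iterations_per_core(size_t n_blocks, size_t cores) {
    size_t blocks_per_core = (n_blocks + cores - 1) / cores;
    return blocks_per_core * ITERATIONS_PER_BLOCK;
}

/* Cost = p * T_p, with T_p expressed as an iteration count. */
size_t parallel_cost(size_t n_blocks, size_t cores) {
    return cores * iterations_per_core(n_blocks, cores);
}
```

For the 512-byte example (n = 2, p = 2) each core runs 256 iterations and the cost is 512, the same as the n*m iterations of the serial loop; whenever p divides n the two counts coincide.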
4.6.4 Scalability
“The ability to maintain efficiency at a fixed value by concurrently
increasing the number of cores and the size of the problem is unveiled by
many SMPs. Such systems are scalable parallel systems” (Kumar et al.,
1994). In Table 4.6, it is visible that after increasing the problem size
and the number of cores, the efficiency remains almost the same. This
indicates that the proposed algorithm supports scalability, as it keeps
the efficiency fixed when the problem size and the number of processing
elements are increased simultaneously.
4.6.5 Throughput
Throughput can be calculated as the size of the plaintext in bytes or bits
divided by the time taken to encrypt and decrypt that data. Figure 4.7
shows the comparison of the throughput achieved using RC4 and PARC4 for
~0.5 GB of data. Here, PARC4 is executed on eight cores. It shows that
PARC4 provides much higher throughput than RC4 running on a single core.
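As a worked illustration of this definition, the overall times for 0.5 GB from Tables 4.1 and 4.5 give the following throughput values. These figures are computed here from the table entries and are not read from the throughput figure.

```latex
\begin{align*}
\text{Throughput}_{\text{RC4}} &= \frac{0.5\ \text{GB}}{12.88454\ \text{s}} \approx 0.039\ \text{GB/s} \\
\text{Throughput}_{\text{PARC4, 8 cores}} &= \frac{0.5\ \text{GB}}{1.8406\ \text{s}} \approx 0.272\ \text{GB/s}
\end{align*}
```

The ratio of the two values is about 7, matching the reduction in overall time between the single-core and eight-core runs.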
Fig 4:7: Comparison for throughput achieved using RC4 and PARC4
4.7 Comparative Analysis
As mentioned in Chapter, several hardware or FPGA based
implementations are available to parallelize RC4 stream cipher. Those
implementations cannot be compared with this technique because of
different technology. Thus, in this section, PARC4 is compared with a
multithreaded approach proposed by T.D.B Weerasinghe (Weerasinghe,
2014). The two approaches differ considerably because of the different
platforms used: parallel computing on symmetric multiprocessors offers
large scope for parallel programming using an API such as OpenMP, whereas
multithreading takes advantage of CPU idle time. Nevertheless, owing to
their similar concepts and common goals, the two techniques can be
compared on the basis of a few parameters:
4.7.1 Mapping and Load Balance
With threading in Java-like technologies, there is no assurance that all
available cores are being used efficiently. Moreover, the JVM takes care
of creating and assigning threads, so it is an implicit procedure. On the
contrary, parallel programming explicitly breaks the task into smaller
chunks, where each chunk can be executed on an individual core. This way,
multiple parts of the same program can be executed in parallel. As it is an
explicit approach, the programmer has to take care of the mapping between
the processes and cores.
4.7.2 Modified Key stream
In the multithreaded approach, the same key stream is used for all the file
chunks. This can affect the security of the cipher. On the contrary, a
modified key stream is used in the PARC4 algorithm so that each individual
block has a different key stream.
4.7.3 Energy Efficiency
If the power consumption of an application needs to be reduced, each core
should be operated at a low frequency and voltage. A single core with low
frequency and voltage will lower the performance further. Thus,
multithreading on a single core will work only with high frequency and
voltage, which will ultimately consume more power. This has been
discussed in detail in Chapter 8.
On the basis of the above parameters, it can be concluded that PARC4 uses
a better and more forward-looking approach than the multithreaded
approach. Table 4.7 shows the comparison of the two approaches.
Table 4-7: Comparison between PARC4 and Multithreaded approach
Parameters | RC4 using Multithreading | PARC4
Processor/Technology | Core i3, 2 cores/4 threads | AMD FX(tm) 8 core, 8 cores/8 threads
Energy Efficient | No | Yes
Type of programming | Implicit (no intervention of the programmer to map the processes onto multiple cores/threads; the programming environment takes care of that) | Explicit (the programmer can map the processes onto multiple cores according to the requirement)
Conclusion
This chapter introduces the PARC4 algorithm, a parallel approach to the
well-known RC4 cipher algorithm. The parallel implementation of the
algorithm is based on the PASCS framework, which can be used to implement
any stream cipher in parallel.
The total time for executing the algorithm on the complete data is a
function of the number of cores employed to complete the task. As the
number of processing elements increases, the overall efficiency of the
parallel system increases. Secondly, the efficiency of the parallel system
remains almost constant if the problem size is increased while keeping the
number of processing elements constant. Due to its high efficiency,
PARC4 is also cost optimal. PARC4 provides much higher throughput as
compared to RC4 running on a single core.
CHAPTER 5
PARC4-I: PARALLEL RC4A USING PASCS AND
LOOP UNROLLING MECHANISM
This chapter introduces a parallel algorithm to encrypt/decrypt large data files
and to secure communications over a channel. In this parallel approach, some
revisions have been done in the existing KSA and PRGA algorithms of RC4A in
order to produce a random key stream and to generate more than one output byte
of data during each of the iterations for parallelization. PASCS framework has
been used for parallelization along with loop unwinding technique for code
optimization purposes. This revised and parallel algorithm is then termed as
PARC4-I.
5.1 Introduction
RC4A is one of the strongest alternatives to the RC4 algorithm. It was
proposed by Paul and Preneel (Paul and Preneel, 2004). It has a modified
key stream generator which enables stronger security than RC4. Most of
the attacks on RC4 are less effective on RC4A. Moreover, RC4A requires
fewer instructions per output byte, and it is feasible to make use of its
inbuilt parallelism to get better performance.
The rest of the chapter is organized as follows: The process of adding
parallelism along with the use of parallel techniques has been discussed in
Section 5.3. We discuss the results on a set of data files of various sizes
along with the speedup calculations in the Section 5.4. The performance
analysis for the same has been discussed in the next section. In Section 5.5,
PARC4-I is compared with PARC4 to observe speedup gains followed by
conclusions.
5.2 Modified KSA and PRGA
In the KSA algorithm, a randomly chosen key k1 is supplied to the key
generator, which then produces three more keys k2, k3, and k4 using k1 as
the seed. There are four S-boxes, S1, S2, S3 and S4, which are initialized
using different keys. As discussed in Chapter 2, S1 and S2 are two random
permutations of {0, 1, ..., N-1} (Abinash Roy, 2008). In this modified
scheme there are four distinct S-boxes, all assumed to be random
permutations of {0, 1, ..., N-1}, and the KSA is assumed to generate a
uniform distribution over these permutations.
Algorithm 5.1 shows the steps for generating the four distinct key bytes,
which are then used to encrypt four plaintext bytes. All arithmetic
operations are performed modulo N, where N is 256. The transitions of the
internal states of the four S-boxes are based on an exchange shuffle, as in
RC4. In order to generate four bytes, four distinct variables j1, j2, j3
and j4, corresponding to the four S-boxes, have been introduced. The only
modification is that the index pointer S1[i]+S1[j1] evaluated on S1
generates output from the lookup S-box S2 and vice versa, and likewise for
S3 and S4, so that all four bytes are produced in this way. Please see
steps 1.4, 1.7, 1.10, and 1.13 of Algorithm 5.1. The next round starts
after each output generation. PARC4-I, with its new KSA and PRGA schemes,
uses fewer instructions per output byte than RC4. To generate four
successive output bytes, the index pointer i is incremented once in the
PARC4-I algorithm, whereas it is incremented four times to produce as many
output bytes in the RC4 algorithm. RC4A produces two output bytes at each
iteration.
5.3 Incorporating parallelism
The input text is divided into fixed-size blocks of 256 bytes each.
Afterwards, multiple data blocks are encrypted simultaneously using the
PRGA. As discussed in Section 5.2, at each increment of the index pointer
the PRGA generates four distinct bytes; therefore, four bytes of plaintext
can be fetched together to encrypt or decrypt. Using the loop unrolling
technique this process can be accomplished efficiently, and finally the
output of each block is concatenated to make the complete ciphertext. The
overhead associated with function calls is also reduced by this method,
because for every 64 bytes of data the PRGA is executed only 16 times
instead of 32 times in RC4A or 64 times in RC4. Figure 5.1 depicts the
process of encryption/decryption of PARC4-I, and for the reader’s
reference the parallel implementation of PARC4-I using OpenMP has been
given in Appendix-D.
Procedure: Pseudo random number generator
Input: Four S-Boxes: S1, S2, S3, S4
Output: Four distinct key bytes
Declare: i, j1, j2, j3, j4
Initialize: i, j1, j2, j3, j4 are set to 0
1. Repeat steps until i=255
1.1 i := i + 1
1.2 calculate j1 := j1 + S1[i]
1.3 swap values of S1[i] and S1[j1]
1.4 output S2[S1[i] + S1[j1]]
1.5 calculate j2 := j2 + S2[i]
1.6 swap values of S2[i] and S2[j2]
1.7 output S1[S2[i] + S2[j2]]
1.8 calculate j3 := j3 + S3[i]
1.9 swap values of S3[i] and S3[j3]
1.10 output S4[S3[i] + S3[j3]]
1.11 calculate j4 := j4 + S4[i]
1.12 swap values of S4[i] and S4[j4]
1.13 output S3[S4[i] + S4[j4]]
2. End
Algorithm 5.1 Enhanced pseudo-random generation algorithm (PRGA)
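One round of the four-S-box PRGA of Algorithm 5.1 might be sketched in C as follows. This is an illustrative sketch under stated assumptions: the names `prga4_state`, `half_step` and `prga4_next` are hypothetical, the four S-boxes are assumed to have been filled with key-dependent permutations by a KSA that is not shown, and all arithmetic is modulo 256 through the use of 8-bit unsigned values.

```c
#include <stdint.h>

/* Generator state: four S-boxes plus the shared index i and the four
 * per-S-box indices j1..j4. Each S-box must already hold a permutation
 * of 0..255 (the key scheduling step is not shown here). */
typedef struct {
    uint8_t s1[256], s2[256], s3[256], s4[256];
    uint8_t i, j1, j2, j3, j4;
} prga4_state;

static void swap_bytes(uint8_t *a, uint8_t *b) {
    uint8_t t = *a;
    *a = *b;
    *b = t;
}

/* Updates j from s[i], swaps s[i] and s[j], and looks the output up in the
 * partner S-box at index s[i] + s[j] (mod 256). */
static uint8_t half_step(uint8_t *s, const uint8_t *partner, uint8_t i,
                         uint8_t *j) {
    *j = (uint8_t)(*j + s[i]);
    swap_bytes(&s[i], &s[*j]);
    return partner[(uint8_t)(s[i] + s[*j])];
}

/* Produces four key bytes for a single increment of i. */
void prga4_next(prga4_state *st, uint8_t out[4]) {
    st->i = (uint8_t)(st->i + 1);
    out[0] = half_step(st->s1, st->s2, st->i, &st->j1);  /* output from S2 */
    out[1] = half_step(st->s2, st->s1, st->i, &st->j2);  /* output from S1 */
    out[2] = half_step(st->s3, st->s4, st->i, &st->j3);  /* output from S4 */
    out[3] = half_step(st->s4, st->s3, st->i, &st->j4);  /* output from S3 */
}
```

Calling `prga4_next` 16 times yields the 64 key bytes needed for 64 bytes of data, which corresponds to the count of PRGA executions quoted in the text.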
Fig. 5.1 Method used to implement PARC4-I on SMPs
The algorithm for PARC4-I is listed below:
Procedure: Encryption
Model: Data Parallel Model and loop unrolling with P processors [P = 2, 4, 6, 8]
Input: Plaintext in the form of small chunks [Chunk Size = 256], Number of blocks
Output: Encrypted text
Declare: Plaintext, BlockID as shared variables, i as private variable to each processing core
Initialize: Number of blocks = Size of plaintext / 256
1. ParBegin
2. For ALL BlockID: [0, Number of blocks - 1] IN SYNC
2.1 Set Start = BlockID*256 and End = Start+256
2.1.1 For i = Start to End-1, step 4 do
2.1.1.1 Output-1 = ((key_byte-1 + block id[i]) Mod 256) XOR msg_byte-1
2.1.1.2 Output-2 = ((key_byte-2 + block id[i+1]) Mod 256) XOR msg_byte-2
2.1.1.3 Output-3 = ((key_byte-3 + block id[i+2]) Mod 256) XOR msg_byte-3
2.1.1.4 Output-4 = ((key_byte-4 + block id[i+3]) Mod 256) XOR msg_byte-4
2.1.2 End for
2.2 Concatenate all blocks to make the complete ciphertext corresponding to the plaintext
3. ParEnd
Algorithm 5.2 Method used to parallelize multiple data chunks using PARC4-I
5.3.1 Techniques to enhance benefits of parallelization
PARC4-I uses parallel techniques similar to those of PARC4. Additionally,
it uses the loop unrolling method to further optimize performance. PARC4
cannot use this technique due to the intensive swap functionality and
because it returns only one byte per call, whereas PARC4-I returns four
distinct bytes. We next take a look at the loop unrolling method used to
optimize the code.
Loop unrolling: It is also known as loop unwinding (Krall and Lelait,
2000). It is a loop transformation technique that tries to optimize a
program's execution speed at the cost of its code size. The transformation
can be undertaken manually by the programmer. The objective of loop
unwinding is to boost a program's speed by reducing the instructions that
control the loop, for example the ‘end of loop’ test on every iteration,
reducing branch penalties, and hiding latencies, particularly the time
spent waiting to read data from memory. To eliminate this overhead, loops
can be re-written as a repeated series of similar independent statements.
This implementation has used static loop unrolling in which the
programmer analyzes the loop and convert the iterations into a series of
directions which will diminish the loop overhead.
A simple example of static loop unrolling used in this implementation is:
A function in a computer program adds 100 items from an array. This is
usually done using simple for-loop which calls the function add
(item_number). If the size of the loop is 100 then the method “add” is
called 100 times. There is loop iteration overhead each time. In order to
reduce this loop overhead, loop unrolling can be used.
Normal loop
int i;
for (i = 0; i < 100; i++)
{
    add(i);
}
After loop unrolling
int i;
for (i = 0; i < 100; i += 5)
{
    add(i);
    add(i + 1);
    add(i + 2);
    add(i + 3);
    add(i + 4);
}
After this change, the program makes just 20 iterations rather than
100. As a result, only 20% of the jumps and conditional branches are
executed. To obtain the maximum benefit, no variable in the unrolled
code should require pointer arithmetic. This normally calls for
"base plus offset" addressing rather than indexed referencing.
On the other hand, manual loop unwinding expands the source code from
3 lines to 7, all of which have to be checked and debugged, and the
compiler may have to allocate extra registers to hold variables in the
expanded loop. Moreover, the control variables and the number of
operations within the body of the loop have to be chosen carefully so
that the result is the same as in the original code. Fig.
5.2 depicts the process of loop unrolling.
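The need to choose the control variables carefully can be illustrated with a minimal C sketch. It assumes a hypothetical array-summing task (the names sum_rolled and sum_unrolled4 are illustrative and are not taken from the PARC4-I implementation) and unrolls by a factor of four. Because the element count need not be a multiple of four, a short clean-up loop processes the leftover elements so that both versions return the same result.

```c
#include <stddef.h>

/* Reference version: one element and one loop test per iteration. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by four: one loop test per four elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main body: runs only while a full group of four elements remains. */
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }

    /* Clean-up loop: at most three leftover elements. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

In the 256-byte chunks of Algorithm 5.2 the clean-up loop would never run, since 256 is a multiple of four; it matters only when the trip count is not known to be divisible by the unroll factor.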
PARC4-I has been developed using the same framework that was used to
develop PARC4. A similar key generator is used in both parallel
algorithms. Thus, PARC4-I can be considered as secure
as the original algorithm.
Fig. 5.2 Pictorial representation of a normal loop and an unrolled loop
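The combination of chunk-level data parallelism and four-way unrolling in Algorithm 5.2 can be sketched in C with OpenMP as follows. This is a simplified sketch under stated assumptions, not the thesis implementation: the key-byte derivation of Algorithm 5.2 is replaced by a precomputed keystream buffer supplied by the caller, the function name xor_chunks_unrolled is hypothetical, and bytes beyond the last full 256-byte chunk are handled by a serial loop that the algorithm itself does not describe.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 256 /* chunk size used in Algorithm 5.2 */

/*
 * XOR `len` message bytes with a caller-supplied keystream.
 * Assumption: the keystream is already generated; key scheduling and
 * the key-byte formula of Algorithm 5.2 are outside this sketch.
 */
void xor_chunks_unrolled(const uint8_t *msg, const uint8_t *keystream,
                         uint8_t *out, size_t len)
{
    long nblocks = (long)(len / CHUNK_SIZE);
    long b;

    /* Chunks are independent, so each core can take a share of them.
       Without OpenMP the loop simply runs serially. */
#ifdef _OPENMP
    #pragma omp parallel for schedule(static)
#endif
    for (b = 0; b < nblocks; b++) {
        /* Declared inside the loop, so private to each thread. */
        size_t start = (size_t)b * CHUNK_SIZE;
        size_t end = start + CHUNK_SIZE;
        size_t i;

        /* Four bytes per iteration: 64 iterations instead of 256. */
        for (i = start; i < end; i += 4) {
            out[i]     = (uint8_t)(keystream[i]     ^ msg[i]);
            out[i + 1] = (uint8_t)(keystream[i + 1] ^ msg[i + 1]);
            out[i + 2] = (uint8_t)(keystream[i + 2] ^ msg[i + 2]);
            out[i + 3] = (uint8_t)(keystream[i + 3] ^ msg[i + 3]);
        }
    }

    /* Tail bytes past the last full chunk (an assumption of this sketch). */
    for (size_t i = (size_t)nblocks * CHUNK_SIZE; i < len; i++)
        out[i] = (uint8_t)(keystream[i] ^ msg[i]);
}
```

Because XOR is its own inverse, calling the same function on the ciphertext with the same keystream recovers the plaintext.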
5.4 Experimental Results
To estimate the performance gains, the sequential RC4A cryptographic
algorithm was run first to measure its execution time. The sequential
results serve as the baseline for comparison with the results of the improved
parallel algorithm PARC4-I. The same compiler and the same
configuration options were used for all builds. The compiler option -g0
disables debugging information, -O3 enables the third level of
optimization, and -march=native enables the use of CPU-specific
instructions. All tests were run on a server with the
configuration described in Section 2.3 of Chapter 2.
To assess the parallel framework, all of the tests use the text files from
the t5-Corpus (Roussev). Tables 5.1 to 5.5 show the execution time taken
by RC4A on a single core and by PARC4-I on multiple cores. The first
column of each table shows the number of input bytes used for
encryption and decryption. The second and third columns show the
execution time, measured in seconds, for the encryption
and decryption processes, and the last column shows the overall time taken
by both processes.
Table 5.1 Time taken by RC4A to encrypt/decrypt large data files on a uniprocessor system
Data file size (GB)   Encryption time (s)   Decryption time (s)   Overall time (s)
0.1                   1.73520               1.78436               1.35034
0.2                   1.58461               1.5705                3.15511
0.3                   2.52451               2.50427               5.02878
0.4                   3.59566               3.57956               7.17522
0.5                   4.77162               4.54654               9.31816
0.6                   6.5028                6.21054               12.71334
0.7                   7.8804                7.14834               15.02874
0.8                   8.91011               8.77074               17.68085
0.9                   9.43049               9.59899               19.02948
1.0                   10.71573              10.49485              21.21058
Table 5.2 Time taken by PARC4-I to encrypt/decrypt large data files using 2 cores
Data file size (GB)   Encryption time (s)   Decryption time (s)   Overall time (s)
0.1                   0.357585              0.337585              0.69517
0.2                   0.809525              0.789525              1.59905
0.3                   1.317195              1.297195              2.61439
0.4                   1.903805              1.883805              3.78761
0.5                   2.38954               2.36954               4.75908
0.6                   3.238335              3.218335              6.45667
0.7                   3.827185              3.797185              7.61437
0.8                   4.432125              4.402125              8.82425
0.9                   4.78737               4.75737               9.53474
1.0                   5.317645              5.287645              10.61529
Table 5.3 Time taken by PARC4-I to encrypt/decrypt large data files using 4 cores
Data file size (GB)   Encryption time (s)   Decryption time (s)   Overall time (s)
0.1                   0.17993               0.17989               0.35982
0.2                   0.41564               0.41558               0.83123
0.3                   0.66285               0.66277               1.3256
0.4                   0.97849               0.97845               1.95694
0.5                   1.24622               1.24616               2.49238
0.6                   1.68213               1.68207               3.36421
0.7                   2.01168               2.01162               4.0233
0.8                   2.36608               2.36602               4.7321
0.9                   2.50562               2.50559               5.0112
1.0                   2.81159               2.81151               5.62311
Table 5.4 Time taken by PARC4-I to encrypt/decrypt large data files using 6 cores
Data file size (GB)   Encryption time (s)   Decryption time (s)   Overall time (s)
0.1                   0.11847               0.11843               0.2369
0.2                   0.2766                0.276                 0.5526
0.3                   0.4415                0.4407                0.8822
0.4                   0.6297                0.6291                1.2588
0.5                   0.8177                0.8171                1.6348
0.6                   1.1157                1.1148                2.2304
0.7                   1.31328               1.31322               2.6265
0.8                   1.55089               1.55081               3.1017
0.9                   1.6659                1.6655                3.3314
1.0                   1.8606                1.8605                3.7211
Table 5.5 Time taken by PARC4-I to encrypt/decrypt large data files using 8 cores
Data file size (GB)   Encryption time (s)   Decryption time (s)   Overall time (s)
0.1                   0.09248               0.09242               0.1849
0.2                   0.21614               0.21606               0.4322
0.3                   0.34438               0.34432               0.6887
0.4                   0.49144               0.49136               0.9828
0.5                   0.63823               0.63817               1.2764
0.6                   0.87077               0.87073               1.7415
0.7                   1.0294                1.0293                2.0587
0.8                   1.21104               1.21097               2.422
0.9                   1.30337               1.30333               2.6067
1.0                   1.45277               1.45273               2.9055
The encrypted and decrypted text produced by PARC4-I is given in
Appendix-D for the reader's reference. The OpenMP implementation
is given in Appendix F.
5.5 Performance and Scalability Analysis
Various metrics have been used to measure and analyze the performance
benefit of the proposed algorithm over the serial algorithm.
5.5.1 Parallel Run Time
The parallel run time is denoted PT(n). It is the execution time of a
parallel program on a symmetric multiprocessor with n cores. PT(1)
denotes the execution time of the serial program on a single processor.
Figure 5.3 shows the parallel run time of PARC4-I for each number of
cores.
Fig. 5.3 Execution time of 1 GB of data file using PARC4-I
It is clear from the above graph that the execution time decreases
steadily as the number of cores increases for a given data set.
5.5.2 Speedup
Speedup is a quantitative measure of performance gain that is
accomplished by a parallel algorithm when executed on SMPs over a
sequential implementation running on a single-processor system. To
capture the relative benefit of running a program in parallel, the
sequential baseline should be the fastest sequential algorithm. From the
results in Tables 5.1 to 5.5, it can be deduced that PARC4-I achieves a
7.3X speedup on eight cores and 5.7X on six cores. Figure 5.4 shows the
speedup comparison of PARC4-I on multiple cores.
Fig. 5.4 Speedup comparison using multiple cores
It is clear from the above graph that the speedup increases steadily as
the number of cores increases. However, as per Amdahl's law, speedup
tends to saturate and efficiency can drop at some point, depending on
the available parallelism in the program. The point where saturation
occurs depends on the type of parallel execution model used to execute
the program, the size of the data associated with the program and the
number of cores used to process the data.
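Amdahl's law, referred to above, can be stated compactly. In its usual formulation, the speedup on n cores is bounded in terms of the fraction of the serial run time that can be parallelized. The symbol f for that fraction is introduced here for illustration and is not used elsewhere in this chapter.

```latex
S(n) = \frac{T(1)}{T(n)} \le \frac{1}{(1 - f) + \dfrac{f}{n}},
\qquad
\lim_{n \to \infty} \frac{1}{(1 - f) + \dfrac{f}{n}} = \frac{1}{1 - f}
```

The limit on the right is the saturation point: however many cores are added, the serial fraction 1 - f caps the achievable speedup.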
Similarly, for large input streams such as 1 GB of data, the total
execution time is a function of the number of cores employed to
complete the task. It can thus be inferred that, for a given file size,
the execution time continues to decrease as cores are added, but only
up to some point.
5.5.3 Efficiency
This metric reflects the efficiency of all processing elements working
together. Basically, it is a function of load balancing. The efficiency of a
given problem using n processing elements, E (n), is defined as the ratio
of the speedup achieved and the number of processors used to achieve it:
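$$E(n) = \frac{S(n)}{n} = \frac{T(1)}{n \, T(n)} \qquad (5.1)$$
In equation 5.1, let T(1) = 21.21 seconds, T(n) = 2.9 seconds and n = 8 to
process 934,159,360 bytes (~1 GB). The efficiency of PARC4-I is then
$E(8) = 21.21 / (8 \times 2.9) \approx 0.91$. PARC4-I achieved an
efficiency of about 0.9 as the number of cores and the data size increased.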
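The speedup and efficiency figures quoted in this chapter can be recomputed from the measured times with a few lines of C. The sketch below is illustrative only; the function names speedup and efficiency are assumptions, and the inputs are meant to be the overall times from Tables 5.1 to 5.5.

```c
/* Speedup S(n) = T(1) / T(n): serial time divided by parallel time. */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Efficiency E(n) = S(n) / n, following equation 5.1. */
double efficiency(double t_serial, double t_parallel, int cores)
{
    return speedup(t_serial, t_parallel) / (double)cores;
}
```

With the 1.0 GB rows of Table 5.1 and Table 5.5 (21.21058 s and 2.9055 s), speedup gives about 7.3 and efficiency about 0.91 on eight cores, in line with the values reported above.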
5.5.4 Scalability
As stated above, PARC4-I maintained an efficiency of about 0.9 as the
number of cores and the data size increased. This property indicates
that the proposed algorithm is scalable, because it keeps the efficiency
nearly fixed as the problem size and the number of processing elements grow.
5.6 Comparison between PARC4 and PARC4-I
The two implementations, PARC4 (Handa and Kapoor, 2014) and PARC4-I,
have been compared on the basis of the parameters below.
5.6.1 Parallel Run time
Consider 934,159,360 bytes (~1 GB) processed on eight processing elements
using PARC4 and PARC4-I. PARC4 takes 3.69 seconds, whereas PARC4-I takes
2.90 seconds, which is less than the time taken by PARC4.
See Figure 5.5 for a comparison across all data sizes.
Fig 5.5 Graphical representation of parallel run time of PARC4 vs PARC4-I on eight
cores
5.6.2 Speedup
Because PARC4-I uses the loop unrolling optimization technique for
additional speedup, it achieves a higher speedup than PARC4. Figure 5.6
shows the speedup comparison between the two approaches.
Fig 5.6 Comparison between PARC4 and PARC4-I for speedup
In the above figure, the speedup is calculated for the same input stream
size (934,159,360 bytes, that is, ~1 GB of data) using four, six and eight
cores. It is worth noting that on four and six cores PARC4-I gains
slightly more speedup over PARC4 than it does on eight cores. This is
because of the parallel overhead involved in distributing the task over
multiple cores. Thus it can be concluded that PARC4-I is most effective
relative to PARC4 when fewer cores are used.
5.6.3 Efficiency
There is only a slight difference between the efficiency achieved by
PARC4 and PARC4-I because both use the same approach to implement
parallelism. The only difference between them is the use of the
loop unrolling optimization. In PARC4, the loop has to execute 256 times
per 256-byte chunk, whereas in PARC4-I only 64 iterations are required.
Figure 5.7 shows this comparison.
Fig. 5.7 Comparison of PARC4 and PARC4-I for Efficiency
5.6.4 Loop overhead
As discussed under the efficiency metric, the loop overhead in PARC4-I
is one fourth of that in PARC4. This is because of the loop unrolling
method.
5.6.5 Throughput
Here, throughput is the total number of bytes the algorithm can process in
a given time period. It is calculated as the total number of input
bytes divided by the execution time. For 934,159,360 (~1 GB)
input bytes processed using eight processing cores, Figure 5.8
compares PARC4-I and PARC4 with respect to the
throughput achieved. It shows that PARC4-I has higher throughput than
PARC4.
Fig. 5.8 Comparison between PARC4 and PARC4-I algorithms for throughput
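The throughput definition above reduces to a single division. The following C sketch is illustrative; the function name throughput_mb_per_s is an assumption, as is the convention that 1 MB equals 10^6 bytes, which is the convention that reproduces the PARC4-I value listed in Table 5.6.

```c
#include <stdint.h>

/* Throughput in MB/s, taking 1 MB = 10^6 bytes (assumed convention). */
double throughput_mb_per_s(uint64_t bytes, double seconds)
{
    return ((double)bytes / 1.0e6) / seconds;
}
```

For 934,159,360 bytes processed in 2.90 seconds, the function returns approximately 322.12 MB/s.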
All the above parameters indicate that the PARC4-I algorithm is faster
than the PARC4 algorithm because of its lower loop overhead. Table 5.6
summarizes the comparison between the two algorithms.
Table 5.6 Comparison between PARC4 and PARC4-I
Parameters                               PARC4    PARC4-I
Parallel run time [1GB of data file]     25.8     21.2
Loop overhead                            High     Low
Efficiency                               0.89     0.91
Speedup with eight cores                 7        7.3
Throughput [MB/s]                        252.8    322.12
Conclusions
This chapter introduces PARC4-I, a parallel approach to the well-known
RC4A cipher algorithm. The basic idea behind this implementation is to use
the loop unrolling optimization technique along with the parallel
methodology to improve the performance gains. The implementation shows
promising results with the use of loop unrolling. The following
conclusions are drawn from the discussion of this implementation. By
using the PASCS framework together with the loop unrolling
optimization, we obtain better performance along with
gains in efficiency and throughput. In addition, the new algorithm is as
scalable as the algorithm that does not use the optimization.
CHAPTER 6
DESIGN OF PARALLEL INDEPENDENT FEISTEL
NETWORK
A Feistel network is a symmetric structure used in the construction of block
ciphers. Many popular block ciphers use this network, including the Data
Encryption Standard (DES) and Blowfish algorithms. The Feistel structure has
the advantage that encryption and decryption operations are very similar.
Therefore the size of the code or circuitry required to implement such a cipher is
nearly halved. This chapter focuses on the design of the Parallel Independent
Feistel Network, which has been used to develop a Feistel network based parallel
block cipher algorithm for faster execution.
6.1 Introduction
A block cipher encrypts or decrypts fixed-length units of data called
blocks. Block ciphers are deterministic algorithms, which means that for
a given input and key they always produce the same output. The majority
of block ciphers are based on two principles:
Substitution: Each plaintext element or group of elements is uniquely
replaced by a corresponding ciphertext element or group of
elements.
Permutation: A sequence of plaintext elements is replaced by a permutation
of that sequence. That is, no elements are added, deleted or replaced in
the sequence; rather, the order in which the elements appear in the sequence
is changed.
Based on these two principles, there are two types of structure used to
encrypt blocks:
1) Feistel Network
2) Substitution-Permutation Network (Heys and Tavares, 1995)
Numerous block ciphers are based on the Feistel framework (Choy et al.,
2009). Figure 6.1 depicts the concept of the Feistel structure. Such a
framework consists of a number of identical rounds of processing. In each
round, a substitution is performed on one half of the data being
processed, followed by a permutation that swaps the two halves. The
original key used in the enciphering process is expanded so that a
different sub-key is used for every round. With the complete process, the
blocks of plaintext are processed one by one in sequence, which makes the
whole encryption procedure very slow. In this chapter, a parallel
independent Feistel network is presented that encrypts multiple data
blocks concurrently for faster execution.
Fig. 6.1 Structure of Sequential Feistel Network (William, 2006)
The rest of the chapter is organized as follows: Section 6.1 gives a brief
introduction to block ciphers and their building blocks. Section
6.2 explains the motivation and the need for a parallel architecture. The
PIFN design is presented in Section 6.3. The chapter closes with a
discussion of the application areas of PIFN, followed by the conclusions.
6.2 Motivation for Parallel architecture
When data is encoded using block cipher algorithms, the entire process is
based on block-by-block encryption. As shown in Fig. 6.1, one block of
plaintext goes through the process to generate a ciphertext block of the
same length using the Feistel network. This sequential process does not
take advantage of today's multi-core processors with large shared
memory systems. The parallel Feistel network is designed
around the basic idea of data parallelism. For example, 16,777,216 bytes of
data with a block size of 64 bytes gives 16,777,216 / 64 = 262,144
blocks of data. These blocks can be processed concurrently to encrypt the
entire 16,777,216 bytes of data. Parallelization helps block ciphers
process multiple data blocks simultaneously and provides better results in
terms of execution time.
6.3 Design of Parallel Independent Feistel Network Structure
The essence of this approach is to develop a parallel Feistel network in
which multiple blocks of the same length can be processed on multiple
cores of a processor to produce the ciphertext. This network is based on
the Electronic Code Book (ECB) mode of operation, in which each block is
independent of the others, so that the blocks can be processed in
parallel. Fig. 6.2 shows the complete parallel process of encrypting
multiple fixed-size blocks using PIFNS.
Fig. 6.2 Parallel Independent Feistel Network Structure
According to Fig. 6.2, the input to the network is the complete message
that needs to be encrypted. The message is divided into fixed-length
blocks. Each plaintext block is then divided into two halves, left and
right. Both halves pass through n rounds of processing and are then
combined to produce ciphertext blocks of equal length. Each round has a
left-side and a right-side input derived from the previous round, as well
as a sub-key Ki derived from the overall key K. This parallel structure
can be used with any number of rounds N, where N is a positive integer. A
substitution is performed on the left half of the data. This is done by
applying a parallel round function F to the right half of the data and then
taking the exclusive-or of the result with the left half. The parallel
function F takes the right half of the block, of w bits, and a sub-key of
y bits, and produces an output value of w bits. Following this substitution, a
permutation is performed that consists of the interchange of the two halves
of the data.
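The round structure described above, together with the ECB-style independence of blocks, can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the PIFNS implementation: the round function toy_f is a hypothetical placeholder with no cryptographic strength, ROUNDS is fixed at 16 for illustration, the sub-keys are supplied by the caller, and blocks are 64 bits with 32-bit halves. It also shows why encryption and decryption are nearly identical: decryption is the same pass with the sub-keys taken in reverse order.

```c
#include <stddef.h>
#include <stdint.h>

#define ROUNDS 16 /* illustrative; the structure works for any N >= 1 */

/* Hypothetical round function F (NOT cryptographically secure). */
static uint32_t toy_f(uint32_t right, uint32_t subkey)
{
    uint32_t x = right + subkey;  /* wraps modulo 2^32 */
    return (x << 7) | (x >> 25);  /* rotate left by 7 bits */
}

/* One pass over all rounds; `reverse` selects the sub-key order. */
static uint64_t feistel_pass(uint64_t block, const uint32_t subkeys[ROUNDS],
                             int reverse)
{
    uint32_t left = (uint32_t)(block >> 32);
    uint32_t right = (uint32_t)block;

    for (int r = 0; r < ROUNDS; r++) {
        uint32_t k = subkeys[reverse ? ROUNDS - 1 - r : r];
        uint32_t new_right = left ^ toy_f(right, k); /* substitution */
        left = right;                                /* permutation: */
        right = new_right;                           /* swap halves  */
    }
    /* Undo the last swap so the same routine also decrypts. */
    return ((uint64_t)right << 32) | left;
}

uint64_t feistel_encrypt(uint64_t block, const uint32_t subkeys[ROUNDS])
{
    return feistel_pass(block, subkeys, 0);
}

uint64_t feistel_decrypt(uint64_t block, const uint32_t subkeys[ROUNDS])
{
    return feistel_pass(block, subkeys, 1);
}

/* ECB-style processing: blocks are independent, so they can be
   encrypted concurrently. Without OpenMP the loop runs serially. */
void pifn_encrypt_blocks(uint64_t *blocks, size_t nblocks,
                         const uint32_t subkeys[ROUNDS])
{
    long n = (long)nblocks;
    long b;
#ifdef _OPENMP
    #pragma omp parallel for schedule(static)
#endif
    for (b = 0; b < n; b++)
        blocks[b] = feistel_encrypt(blocks[b], subkeys);
}
```

One consequence of the ECB-style design is visible directly in the sketch: identical plaintext blocks always produce identical ciphertext blocks, because no state is carried from one block to the next.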
6.4 Application Area of PIFNS
All deterministic block ciphers can be parallelized with the PIFNS
framework. These ciphers can be used for password management as
well as for file/disk encryption. A significant use case of this
framework is the encryption/decryption of large data files, where
execution time can be a critical factor. In this thesis, the Blowfish
block cipher algorithm is parallelized using PIFNS and the resulting
algorithm is named PBlock, which is discussed in detail in the next chapter.
Conclusion
Feistel networks have been used extensively for encryption, but these
networks can be very slow due to the sequential nature of their processing.
Today's computing applications require encryption for tasks such as file
encryption and complete disk encryption. There is a great demand for a
parallel security structure that can process these tasks faster. PIFNS
has been developed for this purpose. PIFNS is a parallel framework in
which multiple data blocks can be encrypted or decrypted
concurrently to achieve faster execution. In the next chapter, a parallel
block cipher called PBlock, built on the PIFNS framework, is discussed.
CHAPTER 7
PBLOCK- PARALLEL APPROACH FOR BLOWFISH
CIPHER USING PIFN
In this chapter, we discuss and explain PBlock, a parallel block cipher algorithm.
The parallel implementation is based on the Parallel Independent Feistel Network
Structure discussed in Chapter 6. Various metrics have been applied to the
parallel algorithm to measure its efficiency, speedup, and cost optimality. The
results show that the proposed algorithm is faster and can be used in many
typical applications such as file and disk encryption and securing
communications over the Internet.
7.1 Introduction
There are many benefits of using parallel programming on symmetric
multiprocessors (Keckler et al., 2009). Increasing the number of cores on a
single chip can help improve performance and also reduce the energy
consumption of the system. The idea that motivates this research is to
find out whether complex cryptographic algorithms can be
restructured to allow efficient parallel implementations. As a result, higher
performance can be achieved, making these algorithms more applicable to
long-running processes such as full disk encryption or backup software for
networked computers. The focus of the study is to reduce the time taken by
the block cipher encryption process for large data and to redesign the
security algorithm so that it can utilize multiple cores if they are
available in the computing device. With the advent of the parallel
computing era, no extra hardware is required to benefit from parallelism.
Also, almost all smart phones have multiple cores (Gonzalez et al., 2009),
but to utilize all these processing elements effectively, a parallel
algorithm implementation is required.
Blowfish is a private-key block cipher that can be used
as an alternative to DES or IDEA (Schneier, 1994). It has a variable-length
key, ranging from 32 bits to 448 bits. It is suitable for both commercial
and domestic use. It was designed by Bruce Schneier in 1993 as a fast
substitute for existing security algorithms. Blowfish is used in many
commercial products (Schneier). This chapter focuses on one major
application area of Blowfish, namely file encryption.
The rest of the chapter is organized as follows: Section 7.2 explains the
PBlock implementation with the PIFN framework, the design of the parallel
Feistel network, and the parallel techniques used in the algorithm. In the
next section, a security analysis is carried out to verify that the modified
algorithm is as secure as the original one. The results on a large set of
data files, along with the speedup calculations, are given in Section 7.4. In
Section 7.5, a performance analysis using various metrics is presented.
PBlock is compared with a pipelined approach in Section 7.5, followed by
the conclusions from this work.
7.2 Implementation of PBlock using PIFNS
The Blowfish algorithm is made parallel using the PIFNS system. Apart from
the requirement of faster execution of the cipher, additional factors are
vital and serve as motivation for this research. Possibly the most important
of these is the ability of the memory system to feed data to the processor
at the required rate. “There is a mismatch between processor speed and DRAM latency
and is normally bridged by a hierarchy of successively faster memory
devices called caches that rely on locality of data reference to deliver higher
memory system performance (Kumar et al., 1994)”. Parallel platforms
typically yield better memory system performance than sequential ones.
The reason is that they offer larger aggregate caches and higher
aggregate bandwidth to the memory system, both typically linear in
the number of processors. This argument can be extended to disks, where
parallel platforms can be used to achieve high aggregate bandwidth to
secondary storage. The other benefit of parallelism is better energy
efficiency. In this section, a brief introduction to the parallel methodology
is given, along with the design of the parallel F function.
7.2.1 Parallel Methodology
The PBlock execution model is based on the data parallel model, which
can be easily mapped on to the PIFNS framework. According to the
model, “the tasks are statically mapped on to the processes and each task
performs similar operations on different data (Kumar et al., 1994)”. As all
tasks carry out a similar set of computations, the decomposition of the
problem into tasks is based on data partitioning, because a
uniform partitioning of data followed by a static mapping is sufficient to
guarantee load balance. Data parallel algorithms can be implemented in
the message passing paradigm as well as in the shared address space
paradigm. For this implementation, the shared address space paradigm has
been used. An important characteristic of the data parallel model is that,
for most problems, the degree of data parallelism increases with the size of
the problem, making it possible to use more processes/cores to solve
larger problems effectively. At the same time, reducing the interaction
overhead between concurrent tasks is important for an efficient parallel
program. The overhead that a parallel program incurs due to interaction
among its processes depends on many factors, such as the volume of data
exchanged during interactions, the frequency of interaction, and the
spatial and temporal pattern of interactions. To reduce interaction
overheads, the following techniques have been used (Kumar et al., 1994):
Maximizing Data Locality: The interaction overheads in a parallel program
can be reduced by using methods that support the use of local data or data
that have been recently fetched.
Minimize volume of data exchange: Another important technique for
reducing the interaction overhead is to minimize the overall amount of
shared data that needs to be accessed by parallel processes. This is similar
to maximizing the temporal data locality.
Minimize frequency of interactions: Minimizing interactions frequency is
important in reducing the interaction overheads in parallel programs
because there is a comparatively high startup cost related with each
interaction on most architectures. Interaction frequency can be
reduced by redesigning the algorithm such that the shared data is accessed
and used in large chunks.
Consider the algorithmic steps of the PBlock encryption process and the
parallel F function to discuss all the above factors in detail. For the
reader's reference, the parallel implementation of PBlock using OpenMP is
given in Appendix-E.
Procedure: Parallel Encryption
Input: Plaintext
Output: Cipher text
Declare: Y, LHalf, RHalf, i, j, Lblock, Rblock, p[1..18]
1. Plaintext and block size (as shared variables)
2. Y, LHalf, RHalf (as private variables, to increase data locality)
3. Divide the plaintext into n blocks of 64 bits each
4. ParBegin
   4.1 For i = 0 to n-1:
   4.2   Declare LHalf = Y, RHalf = Y+32 (to create a specific chunk size)
   4.3   Divide block i into 32-bit halves: Lblock, Rblock
         4.3.1 For j = 1 to 16:
         4.3.2   Lblock = Lblock XOR p[j]
         4.3.3   Rblock = F(Lblock) XOR Rblock [Figure 7]
         4.3.4   Swap Lblock and Rblock
   4.4 End For
5. ParEnd
6. For each block, swap Lblock and Rblock again to undo the last swap
7. Then Rblock = Rblock XOR p[17] and Lblock = Lblock XOR p[18]
8. Recombine Lblock and Rblock to get the cipher text of the block
9. Finally, recombine the output of all blocks to get the cipher text
Algorithm 7.1 Algorithmic steps for Encryption process in PBlock
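The block-level parallelism of Algorithm 7.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the OpenMP implementation from Appendix-E: the subkeys and S-boxes are placeholder values drawn from a seeded random generator rather than the Blowfish key schedule, the round function follows the grouping given in Algorithm 7.2, and the names `encrypt_block`, `encrypt_ecb` and `decrypt_ecb` are hypothetical. The thread pool only shows the structure; in CPython it is not expected to reproduce the speedups reported for OpenMP.

```python
import random
from concurrent.futures import ThreadPoolExecutor

MASK32 = 0xFFFFFFFF

# Placeholder key material (assumption: NOT the real Blowfish constants).
_rng = random.Random(2014)
P = [_rng.getrandbits(32) for _ in range(18)]                      # p1..p18
S = [[_rng.getrandbits(32) for _ in range(256)] for _ in range(4)]


def f(x):
    """Round function grouped as in Algorithm 7.2: two sums, then XOR."""
    a, b, c, d = (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF
    y1 = (S[0][a] + S[1][b]) & MASK32
    y2 = (S[2][c] + S[3][d]) & MASK32
    return y1 ^ y2


def encrypt_block(block, subkeys=P):
    """Steps 4.3 to 8 of Algorithm 7.1 for one 8-byte block."""
    left = int.from_bytes(block[:4], "big")
    right = int.from_bytes(block[4:], "big")
    for j in range(16):
        left ^= subkeys[j]
        right ^= f(left)
        left, right = right, left
    left, right = right, left        # undo the last swap
    right ^= subkeys[16]
    left ^= subkeys[17]
    return left.to_bytes(4, "big") + right.to_bytes(4, "big")


def decrypt_block(block):
    """The same Feistel network with the subkeys applied in reverse order."""
    return encrypt_block(block, P[::-1])


def _map_blocks(func, data, workers):
    """Apply func to each independent 64-bit block (ECB mode), in order."""
    if len(data) % 8:
        raise ValueError("data length must be a multiple of 8 bytes")
    blocks = [data[i:i + 8] for i in range(0, len(data), 8)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so the output
        # blocks can simply be concatenated.
        return b"".join(pool.map(func, blocks))


def encrypt_ecb(data, workers=4):
    return _map_blocks(encrypt_block, data, workers)


def decrypt_ecb(data, workers=4):
    return _map_blocks(decrypt_block, data, workers)
```

Because no block depends on any other, the loop over blocks needs no communication between workers apart from the final concatenation, which is the property the data-parallel design relies on.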
Procedure: Feistel Function
Input: 32-bit right half
Output: 32-bit data
1. Declare a, b, c, d, y1, y2 and Message (32-bit input data which is
   shared among all cores)
2. ParBegin
   2.1 Calculate a, b, c, d concurrently (private data to each core)
3. ParEnd
4. Synchronization constructs
5. ParBegin
   5.1 Calculate y1 = s[0][a] + s[1][b] and y2 = s[2][c] + s[3][d]
       concurrently (private data to each core)
6. ParEnd
7. Synchronization constructs
8. Calculate y3 = y1 ^ y2
Algorithm 7.2 Algorithmic steps for parallel F function
In Algorithm 7.1, line 1 declares two shared variables, the plaintext and
the block size. These are stored in the shared address space because every
core needs to access them; in particular, the block size must be known to
each core. Line 2 declares the private data members. Each core works on its
own iterations and needs its own local variables for its calculations, so
Y, LHalf and RHalf are declared private. If these variables were not
private to each core, the cores would overwrite one another's values and
data would be lost. Line 3 divides the plaintext into 64-bit blocks, and
from line 4 onwards multiple blocks are processed in parallel. The
implementation uses ECB mode, in which each block is independent of the
others. Synchronization constructs are used to coordinate the work done by
the cores. Similarly, in Algorithm 7.2, line 1 declares a, b, c, d, y1 and
y2 as private variables and Message as a variable shared by all cores, so
that inter-processor communication between the available cores is reduced.
Lines 2.1, 5.1 and 8 perform independent calculations on private data
members to increase data locality.
7.2.2 Design of Parallel F function
The function takes a 32-bit input, which is divided into four eight-bit
quarters. Each quarter is used as an index into one S-box, and each S-box
entry is a 32-bit value. The outputs of S-box 1 and S-box 2 are added by
one core, while the outputs of S-box 3 and S-box 4 are added by another.
Finally, the two sums are XORed to give the 32-bit output. In this method,
the whole process is executed in three instructions, compared with seven
instructions for the sequential F function. Figure 7.1 represents the
functionality of the parallel F function.
Fig. 7.1 Graphical representation of F function
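The dependency structure described above can be made concrete with a short sketch. It is an illustration only: the function names and the `s` parameter (a list of four 256-entry tables) are hypothetical. `f_grouped` follows Algorithm 7.2, where the two additions do not depend on each other and can therefore be computed concurrently. `f_chained` shows, for comparison, the order in which the published Blowfish specification combines the four lookups, where every operation waits for the previous one. The two groupings generally produce different values, so the sketch should not be read as showing that they are interchangeable.

```python
MASK32 = 0xFFFFFFFF


def split_bytes(x):
    """Split a 32-bit word into four 8-bit quarters a, b, c, d."""
    return (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF


def f_grouped(x, s):
    """Grouping of Algorithm 7.2: (s0[a] + s1[b]) XOR (s2[c] + s3[d])."""
    a, b, c, d = split_bytes(x)
    y1 = (s[0][a] + s[1][b]) & MASK32   # does not depend on y2
    y2 = (s[2][c] + s[3][d]) & MASK32   # does not depend on y1
    return y1 ^ y2


def f_chained(x, s):
    """Sequential chain: ((s0[a] + s1[b]) XOR s2[c]) + s3[d]."""
    a, b, c, d = split_bytes(x)
    t = (s[0][a] + s[1][b]) & MASK32
    t ^= s[2][c]                        # needs the sum above
    return (t + s[3][d]) & MASK32       # needs the XOR above
```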
7.3 Security Analysis using Avalanche effect
A single change in the plaintext or key should generate a change in
numerous bits of the cipher text. This property is called the avalanche
effect (Webster and Tavares, 1986). If only a few bits change, it may
provide a way to reduce the size of the key space or plaintext space to be
searched, which makes cryptanalysis much easier. A cryptographic algorithm
can therefore be regarded as secure only if it exhibits a strong avalanche
effect. For this reason the thesis uses the avalanche effect to check that
the parallel implementation has not compromised the security of the
existing Blowfish algorithm. If a single change is made in the plaintext,
two bits of the cipher text are affected with the existing algorithm, and
the same happens with PBlock because the security architecture of both
algorithms is the same. The text and tables below show that the avalanche
effects of Blowfish and PBlock closely match.
Modified plaintext and corresponding Cipher text generated by Blowfish:
Plaintext: “It is soon posted on the sci.crypt newsgroup, and from there to
many sites on the Internet. The leaked code was confirmed to b”
Cipher text:
”cc13b58d468422cfa4d491d475c8d78996b5db84a6a7b4be87469124801b2
edbbba75cc059712e6d5c10157aa52440ce85c6c9828de1581cd59d5c0d76c4
6826d616e79207369746573206f6e2074686520496e7465726e65742e20546
865206c65616b656420636f64652077617320636f6e6669726d656420746f2
062a0”
After changing 4 bit positions in plaintext: i.e “It is soon hosted on the
sci.crypt newsgroup, and from there to many sites on the Internet. The
peaked mode was confirmed to d”
Following changes are there in Cipher text :
“cc13b58d468422cfa4d4917575c8d78996b5db84a6a7b4be87469124801b2
edbbba75cc059712e6d5c1015a6a52440ce85c6c9828de1581cd59d5c0d76c4
6826d616e79207369746573206f6e2074686520496e7465726e65742e20546
865207065616b6564206d6f64652077617320636f6e6669726d656420746f2
064a0”
After changing 11 bit positions in plaintext: i.e “It is very hosted in the
sci.crypt newsgroup, and home there to nany sites on the Internet. The
peaked mode was confirmed to d”
Following changes are there in Cipher text :
cc13b58d4684b575581f917575c8d78996b542d49697b4be87469124801b2e
dbbba75cc059713abe871315a6a52440ce85cea107c881581cd59d5c0d76c46
826e616e79207369746573206f6e2074686520496e7465726e65742e205468
65207065616b6564206d6f64652077617320636f6e6669726d656420746f20
64a0
Similarly for PBlock:
Plaintext: “It is soon posted on the sci.crypt newsgroup, and from there to
many sites on the Internet. The leaked code was confirmed to b”
Cipher text:
497420697320736f6f6e20706f73746564206f6e20746865207363692e63727
97074206e65777367726f75702c20616e642066726f6d20746865726520746
f206d616e79207369746573206f6e2074686520496e7465726e65742e20546
865206c65616b656420636f64652077617320636f6e6669726d656420746f2
062a0”
After changing 4 bit positions in plaintext: i.e “It is soon hosted on the
sci.crypt newsgroup, and from there to many sites on the Internet. The
peaked mode was confirmed to d”
Following changes are there in Cipher text :
“497420697320736f6f6e20686f73746564206f6e20746865207363692e6372
797074206e65777367726f75702c20616e642066q26f6d2074686572652074
6f206d616e79207369746573206f6e2074686520496e7465726e65742e2054
6865207065616b6564206d6f64652077617320636f6e6669726d656420746f
2064a0”
After changing 11 bit positions in plaintext: i.e “It is very hosted in the
sci.crypt newsgroup, and home there to nany sites on the Internet. The
peaked mode was confirmed to d”
The Ciphertext is:
“4974206973207665727920686f7374656420696e20746865207363692e637
2797074206e65777367726f75702c20616e6420686f6d65207468657265207
46f206e616e79207369746573206f6e2074686520496e7465726e65742e205
46865207065616b6564206d6f64652077617320636f6e6669726d656418746
f2064a0”
Table 7.1 Avalanche effect in Blowfish and PBlock: change in plaintext
Bits changed in plaintext | Bits changed in cipher text (Blowfish) | Bits changed in cipher text (PBlock)
4 | 5 | 5
11 | 31 | 32
19 | 66 | 68
Table 7.2 Avalanche effect in Blowfish and PBlock: change in key
Bits changed in key | Bits changed in cipher text (Blowfish) | Bits changed in cipher text (PBlock)
4 | 5 | 5
11 | 31 | 32
19 | 66 | 68
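Bit counts of the kind listed in Tables 7.1 and 7.2 can be obtained by taking the Hamming distance between two equal-length byte strings. The following is a minimal sketch; the function names are hypothetical and it is not the measurement code used for the thesis.

```python
def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # XOR leaves a 1 in every position where the inputs differ.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche_ratio(a: bytes, b: bytes) -> float:
    """Fraction of all bits that differ between the two inputs."""
    if not a:
        raise ValueError("inputs must not be empty")
    return bit_difference(a, b) / (8 * len(a))
```

The same function can be applied to a pair of plaintexts, to count how many input bits were changed, and to the corresponding pair of cipher texts, for example after decoding the hexadecimal strings with `bytes.fromhex`.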
7.4 Experimental Results
To study the performance improvements achieved by the PBlock algorithm,
the sequential Blowfish cryptographic algorithm was first executed to
measure its execution time in the given environment. The same compiler and
the same configuration options have been used for all runs. To assess the
parallel framework, all tests use the text files from the t5-Corpus
(Roussev). Tables 7.3-7.7 show the time taken by Blowfish and PBlock when
executing on different numbers of cores. The first column of each table
shows the size of the input file in GB used for encryption and decryption.
The second and third columns show the execution time [in seconds] of the
encryption and decryption processes, and the last column shows the overall
time taken by both processes. All tests have been run on a server with the
following configuration:
AMD FX(tm)-8320 eight-core processor running at 3500 MHz, 64-bit
operating system with 8 GB RAM.
Table 7.3 Time taken by Blowfish to encrypt/decrypt large data files by single processor
File size [GB] Encryption time [s] Decryption time [s] Overall time [s]
0.1 20.06685 20.11274 40.17959
0.2 40.06112 40.18968 80.25080
0.3 60.19072 60.87597 121.0667
0.4 80.11484 81.17861 161.2935
0.5 100.14973 101.49825 201.6480
0.6 120.16994 121.77476 241.9447
0.7 140.20940 142.06767 282.2771
0.8 160.52404 161.84168 322.3657
0.9 180.31254 182.75779 363.0703
1.0 200.59846 201.02998 401.6284
Table 7.4 Time taken by PBlock to encrypt/decrypt large data files using 2 cores
File size [GB] Encryption time [s] Decryption time [s] Overall time [s]
0.1 10.5349 10.5549 21.0898
0.2 19.5527 19.5727 39.1254
0.3 30.16668 30.36668 60.53335
0.4 40.30338 40.34338 80.64675
0.5 50.312 50.512 100.824
0.6 60.47618 60.49618 120.9724
0.7 71.05928 71.07128 142.1386
0.8 80.58143 80.60143 161.1829
0.9 90.25758 90.28758 180.5352
1.0 99.8871 99.9271 199.8142
Table 7.5 Time taken by PBlock to encrypt/decrypt large data files using 4 cores
File size [GB] Encryption time [s] Decryption time [s] Overall time [s]
0.1 5.51086 5.53086 11.04172
0.2 11.00007 11.00607 22.00614
0.3 16.47954 16.49954 32.97908
0.4 22.22653 22.26653 44.49307
0.5 27.39367 27.41367 54.80734
0.6 32.95798 32.99798 65.95596
0.7 38.44913 38.48913 76.93826
0.8 44.02625 44.04625 88.07250
0.9 50.24433 50.26433 100.50866
1.0 54.85922 54.86122 109.72044
Table 7.6 Time taken by PBlock to encrypt/decrypt bits of input stream using 6 cores
File size [GB] Encryption time [s] Decryption time [s] Overall time [s]
0.1 3.82568 3.84568 7.67136
0.2 7.689945 7.709945 15.3999
0.3 11.50644 11.52644 23.0329
0.4 15.27489 15.29489 30.5698
0.5 19.12053 19.14053 38.2611
0.6 22.9221 22.9421 45.8642
0.7 26.66044 26.68044 53.3409
0.8 30.76716 30.78716 61.5543
0.9 34.45764 34.47764 68.9353
1.0 38.09243 38.11243 76.2049
Table 7.7 Time taken by PBlock to encrypt/decrypt bits of input stream using 8 cores
File size [GB] Encryption time [s] Decryption time [s] Overall time [s]
0.1 3.014415 3.016415 6.03083
0.2 5.981855 5.997855 11.9837
0.3 9.03536 9.05536 18.0907
0.4 12.15148 12.17148 24.323
0.5 15.23598 15.25598 30.492
0.6 18.22181 18.24181 36.4636
0.7 21.31759 21.33759 42.6552
0.8 24.25144 24.27144 48.5229
0.9 27.12253 27.14253 54.2651
1.0 30.0599 30.0799 60.1398
The results on different numbers of cores show clear performance gains:
adding cores reduces the execution time substantially, for example from
about 401.6 seconds on a single core to about 60.1 seconds on eight cores
for the 1 GB file. The encrypted and decrypted text using PARC4 has
been given in Appendix-D for the reader's reference. The OpenMP
implementation for the same has been given in Appendix G.
7.5 Performance and Scalability Analysis
The following metrics have been used to examine the benefits of
parallelism in the proposed algorithm.
7.5.1 Speedup
From Tables 7.3 to 7.7, it can be observed that PBlock achieves a
6.6X speedup using eight cores and 5.2X using six cores. The following
figure shows the execution time taken by PBlock with different numbers
of cores.
Fig. 7.2 Speedup comparison of PBlock using multiple cores
It is clear from the graph that the execution time increases as the file
size increases. On the other hand, for a constant data size, the speedup
tends to saturate as the number of cores increases and starts decreasing
after a specific point. In line with Amdahl's law, once speedup tends to
saturate, efficiency drops; the point at which this happens depends on
the type of parallel model used to execute the problem, the size of the
data associated with the problem and the number of cores used to process
that data. Figure 7.3 shows that the execution time decreases as the
number of cores increases but saturates at a specific point. For the 1 GB
data file, at most 16 cores are sufficient; beyond that, the additional
reduction in execution time is very small and does not justify the cost
of the extra cores.
Fig 7.3: For constant file size speedup tends to saturate at specific point
7.5.2 Efficiency
The same formula that has been used to measure efficiency for PARC4 and
PARC4-I is used to measure the efficiency of the PBlock algorithm. Let
T(1) = 401.8 seconds, T(n) = 60.6 seconds and n = 8 to process 1 GB of
data. The efficiency of PBlock is
$E = \frac{T(1)}{n \cdot T(n)} = \frac{401.8}{8 \times 60.6} = 0.828\ldots$,
reported here as 0.82. Since efficiency lies in the range $0 < E \le 1$,
PBlock achieves an efficiency within that range, as $0 < 0.82 \le 1$.
Table 7.8 shows that when the number of cores increases for a constant
file size the efficiency drops, but when the problem size increases and
the number of cores remains constant, the efficiency remains nearly
constant.
Table 7.8 Efficiency vs. number of processing elements for different file sizes
File size [GB] P=4 P=6 P=8
0.6 0.91 0.88 0.83
0.7 0.91 0.88 0.83
0.8 0.91 0.88 0.83
0.9 0.91 0.87 0.82
1.0 0.91 0.87 0.82
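Speedup and efficiency can be recomputed directly from the overall times for the 1 GB file in Tables 7.3 to 7.7. The sketch below uses the standard definitions S(p) = T(1)/T(p) and E(p) = S(p)/p; the dictionary and function names are hypothetical. Small differences from Table 7.8 in the second decimal place are to be expected, since the text works with rounded timings.

```python
# Overall time in seconds for the 1 GB file, keyed by number of cores
# (last rows of Tables 7.3, 7.4, 7.5, 7.6 and 7.7).
OVERALL_1GB = {1: 401.6284, 2: 199.8142, 4: 109.72044, 6: 76.2049, 8: 60.1398}


def speedup(times, cores):
    """Ratio of single-core time to the time on `cores` cores."""
    return times[1] / times[cores]


def efficiency(times, cores):
    """Speedup divided by the number of cores used."""
    return speedup(times, cores) / cores


if __name__ == "__main__":
    for p in sorted(OVERALL_1GB):
        print(p, round(speedup(OVERALL_1GB, p), 2),
              round(efficiency(OVERALL_1GB, p), 2))
```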
7.5.3 Complexity and Cost optimality
If there are n data blocks, where n is any positive number, and p cores,
where n > p, then n/p steps are required to process the n blocks in
parallel. Thus the overall parallel execution time for PBlock is
$\Theta(n/p)$, where n is the number of blocks and p is the number of
processing elements. Therefore, the parallel execution time can be
defined as:
$T_p = \Theta(n/p)$ (7.1)
Consequently, its cost is:
$\text{Cost} = p \times T_p$ (7.2)
$= p \times \Theta(n/p)$ (7.3)
$= \Theta(n)$ (7.4)
The above equations show that the cost of the parallel algorithm is
$\Theta(n)$, which is the same as the serial run time. Hence, PBlock is
cost optimal.
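The cost argument can be checked with a small counting example. This is a sketch in units of block-steps, with hypothetical function names; it assumes that every block takes one step and that a core idles when it has no block left.

```python
def parallel_steps(n_blocks, cores):
    """Steps needed when `cores` cores each process one block per step."""
    return (n_blocks + cores - 1) // cores   # ceiling of n / p


def cost(n_blocks, cores):
    """Cost = number of cores multiplied by the parallel run time."""
    return cores * parallel_steps(n_blocks, cores)


# A 1 GB input (2**30 bytes) contains this many 64-bit (8-byte) blocks.
BLOCKS_PER_GB = (1 << 30) // 8
```

When p divides n the cost equals n exactly; otherwise it exceeds n by fewer than p block-steps, so it stays proportional to the serial run time as long as n > p.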
7.5.4 Scalability
The ability to maintain efficiency at a fixed value by simultaneously
increasing the number of cores and the size of the problem is exhibited
by many SMPs. Such systems are called scalable parallel systems (Kumar et
al., 1994). It can be seen in Table 7.8 that if four symmetric cores
process ~1 GB of data, the efficiency is 0.91, whereas the same problem
processed using eight cores has an efficiency of 0.82. That means
efficiency drops as the number of cores increases. However, when the
problem size and the number of cores are increased together, the
efficiency remains almost the same. This indicates that the proposed
algorithm is scalable, as it keeps the efficiency fixed when the problem
size and the number of processing elements are increased simultaneously.
7.6 Comparative analysis of PBlock and Blowfish using Pipeline approach
As described in Chapter 1, Kamak Ebadi, Victor Pena and Chen Liu have
proposed a pipelined approach to parallelize the Blowfish block cipher.
The pipelined approach has been implemented on the Single-Chip Cloud
Computer (SCC), an experimental 48-core processor created by Intel Labs
as a platform for many-core software research. Although the two
approaches differ considerably because of the different platforms used,
they share a similar concept and a common goal, so the two techniques can
be compared on the basis of a few parameters:
Table 7.9 Comparison between PBlock and Pipelined approach
Parameter | PBlock | Pipelined approach
Parallel computing model | Data parallel model | Pipeline model
Processor type | Symmetric multiprocessor system | Single-chip cloud computer, a 48-core experimental processor
Communication overhead | No (each core performs the independent task assigned to it) | Yes (because the input data passes in turn through all the cores involved)
Suitability | For larger input data | For smaller files (due to communication overhead and latency associated with the model)
Speedup | Attains good speedup for large input data due to domain decomposition | Achieves ample speedup for very small files due to functional decomposition
Conclusion
This chapter introduced PBlock, the data-parallel approach to Blowfish,
and concluded that the parallel approach is much faster than the
sequential method, giving a 6.6X speedup using eight symmetric cores. The
parallel algorithm has also been shown to be cost optimal, since its cost
has the same asymptotic complexity as the sequential run time. The
approach proposed in this research involves no communication overhead.
Thus, it is suitable for the encryption of large data files.
CHAPTER 8
ANALYSIS OF ENERGY CONSUMPTION BY PROPOSED
PARALLEL ALGORITHMS
In today's mobile computing era, a large number of battery-operated embedded
systems such as cell phones, smart cards, and health monitoring devices are used
to access, store, and manipulate complex and confidential data. Security
concerns in such systems range from user identification to secure software
execution and secure information storage. To implement security techniques in
these systems, cryptographic algorithms have been used extensively. But as
discussed in previous chapters, these cryptographic algorithms perform
compute-intensive calculations to encrypt/decrypt data and consume a lot of
energy as a result. This chapter presents a detailed analysis of the energy
consumption of the serial and parallel algorithms. Some encouraging results on
the reduction of energy over the sequential algorithms have been achieved
through experiments on an eight-core parallel machine, with energy estimated
using Joulemeter (Microsoft's Research Tool).
8.1 Introduction
Energy costs have become increasingly important to computing, since
they directly impact the power provisioning cost for computing
infrastructures, the operating expenses for both data centers and
enterprise infrastructures, and the battery life of laptops and all
other mobile devices. Cryptographic algorithms are well known for
performing large and complex computations to protect important data files
from illegal or unauthorized access. Because of the intensive computation
intrinsic to encryption/decryption algorithms, they tend to consume a
significant amount of energy. As explained by Krishnamurthy (2003),
encrypting only 13.6 kilobytes of data using the Blowfish block cipher
on a mobile device drains about 75% of the battery power. Many
researchers have contributed to this area of key significance.
Various power management techniques, such as power gating, adaptive
voltage and frequency scaling, and active body-bias, have been discussed
in the literature to address power consumption issues (Kapoor and
Verma, 2011). In the article “Computational and Energy Costs of
Cryptographic Algorithms on Handheld Devices”, the authors carried out
an extensive analysis of the costs of private and public key
infrastructure based algorithms and hash functions, and compared them
with the costs of basic operating system functions. Their results show
that cryptographic energy costs are high and that such operations should
be limited in time (Rifa-Pous and Herrera-Joancomartí, 2011).
The rest of the chapter is organized as follows: Section 8.2 describes
the motivation for the study. The tools and techniques used for the
research are listed in Section 8.3. Joulemeter, developed by Microsoft,
has been used to measure application-level energy, and Section 8.4
describes how the tool works. Experimental results are given in Section
8.5, followed by the discussion in the next section and then the
conclusion.
8.2 Motivation
There is a fundamental relationship between power and frequency
(Korthikanti and Agha, 2009). The shift to multicore processors is a result
of increasing power consumption in microprocessors. In a multicore
architecture, each core can be operated at a lower frequency, so that the
power usually given to a single core is divided among the cores while the
overall power consumption is reduced. This is because the operating
voltage can also be lowered when the frequency is reduced, and power
consumption has a quadratic dependence on the supply voltage
(Chandrakasan and Brodersen, 1995, Chandrakasan et al., 1992). Symmetric
multi-processor architectures have been proposed as a method to increase
computation cycles with less energy consumption. As the relationship
between the power and frequency of a core is non-linear, on a
uniprocessor energy consumption can be reduced by lowering the frequency
at which it operates. However, lowering the frequency of a single
processor decreases the performance of the algorithm. A parallel
algorithm consists of a few serial sub-computations, parallel
computations, and communication between the parallel sub-computations.
Thus the performance and energy cost of a parallel algorithm depend on
two factors: the number of cores together with the frequency at which
each core operates, and the structure of the parallel algorithm. In the
previous chapters, three different parallel algorithms have been
proposed. This chapter therefore focuses on the study and analysis of
reducing the energy consumed by compute-intensive processes/applications
by lowering the frequency. This change, however, affects the speed of the
application. According to (Krishnamurthy, 2003), “if we increase clock
frequency by 20 % of a single core, it can provide a 13% performance
increase, but it requires approximately 73% greater power. On the other
hand, if we decrease clock frequency by 20% the reduction of power can be
up to 49% but causes only 13% performance loss. If another core is added
into the single core design, it results in a dual-core processor that at
20% reduced clock frequency; this design can provide approximately 73%
more performance while using the same power as a single core processor at
maximum frequency” (Mani and Jee, 2007). This research also shows that
energy consumption can be reduced while providing optimal performance to
the systems.
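The percentages in the quotation are consistent with a simple cubic model of power against frequency. The sketch below assumes that dynamic power is proportional to V²f and that the supply voltage is scaled in proportion to the frequency, which gives power proportional to f³; static power is ignored. These modelling assumptions and the function names are not taken from the thesis, and the 13% performance loss is simply the figure from the quotation.

```python
def relative_power(freq_scale):
    """Relative dynamic power when frequency and voltage are both scaled
    by `freq_scale` (P ~ V^2 * f with V ~ f, hence P ~ f^3)."""
    return freq_scale ** 3


def dual_core_tradeoff(freq_scale=0.8, perf_loss_per_core=0.13):
    """Power and performance of two slowed-down cores, relative to one
    core at full frequency. Assumes the work splits evenly across cores."""
    power = 2 * relative_power(freq_scale)
    performance = 2 * (1.0 - perf_loss_per_core)
    return power, performance
```

Under this model, raising the frequency by 20% costs about 73% more power, lowering it by 20% saves about 49%, and two cores at 80% frequency draw roughly the power of one core at full frequency while delivering about 1.74 times its performance.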
8.3 Tools and Techniques used for energy measurement
To measure and compare the energy consumption of the proposed parallel
algorithms, the following tools have been used (as discussed in Chapter 2):
Operating system: Windows 7
Framework: .NET (Service Pack 1)
Compiler: MinGW
Editor: Code::Blocks
Tools: Joulemeter
8.4 How Joulemeter works to measure energy
In Joulemeter, power data is shown for the computer as a whole as well as
for the key hardware components (Bekaroo et al.). Power data for a
specific application can also be tracked using this software tool. The data
can be saved periodically to a file if desired.
Fig 8.1-Power metering interface exposed by Joulemeter
Joulemeter estimates power usage through a power model that relates
the computer's resource usage and hardware power state (processor
frequency, processor utilization, screen intensity, monitor on/off state,
disk usage) to the power drawn. This power model is derived using a
process called calibration. On laptops, calibration can be performed
without any external power meter. For desktops, a Watts UP PRO power
meter is required; if such a meter is not available, only approximate
power data can be monitored. In this thesis no external power meter is
used to measure the energy of the proposed parallel algorithms, so the
machine is calibrated with the values shown in Table 8.4.
After setting these values, we specify the application's name on the
Power usage tab and then start the application. We then execute the
program using Code::Blocks and check Joulemeter to see the energy used by
the application at each timestamp. By summing the energy values (in
joules) recorded at all timestamps, we can compute the energy consumed by
the application over a period of time. Figure 8.2 shows the Excel file
generated by Joulemeter for PARC4.
Fig 8.2: Data file of PARC4 consisting joules consumed at each time stamp
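The summation described above, and the derived per-byte figures used later in the chapter, can be sketched as follows. The input format is an assumption: the samples are taken to be (timestamp, joules) pairs already read from the log file, and the actual column layout of the Joulemeter output is not reproduced here. The function names are hypothetical, and one MB is taken as 2^20 bytes.

```python
def total_energy_joules(samples):
    """Sum the energy values of (timestamp, joules) samples."""
    return sum(joules for _, joules in samples)


def microjoules_per_byte(total_joules, n_bytes):
    """Energy cost per processed byte, in microjoules per byte."""
    return total_joules * 1e6 / n_bytes


def throughput_mb_per_s(n_bytes, seconds):
    """Processing rate in MB/s, with 1 MB taken as 2**20 bytes."""
    return n_bytes / (1024 * 1024) / seconds
```

For example, samples of 0.5 J, 1.5 J and 2.0 J add up to 4.0 J; for an input of 4,000,000 bytes this corresponds to 1.0 µJ/B.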
8.5 Energy Measurement
This section covers the comprehensive results and detailed analysis of
the energy consumption of the proposed algorithms (as discussed in
Chapters 4, 5 and 7) using the described experimental setup.
8.5.1 Result and Analysis
To measure the energy cost of the proposed algorithms versus the existing
algorithms, the following test environments have been used:
Platform 1: A laptop with an Intel Core 2 Duo T5270 CPU at 1.40 GHz,
2 GB RAM and the 32-bit Windows 7 operating system.
Platform 2: A desktop with an AMD FX(tm)-8320 eight-core processor at
3.5 GHz, 8 GB RAM and the 64-bit Windows 7 operating system.
On both test environments, Windows 7 with Joulemeter has been
installed to measure application-level energy consumption with the
system's default settings for frequency and voltage. Table 8.1 specifies
the calibrated and non-calibrated states of each system.
Table 8.1: Calibrated and Non-calibrated specification
State | Laptop | Desktop
Default / non-calibrated | 1.2 GHz frequency and 1.2 V | 3.5 GHz frequency and 1.332 V
Calibrated [operate processor by using low frequency] | 1.5 GHz frequency and 0.9 V | 2.3 GHz frequency and 0.9 V
The thesis proposes three different parallel algorithms for block ciphers
and stream ciphers. The energy characteristics of all of these algorithms
are given in the following tables.
Table 8.2: Energy consumed by Blowfish and PBlock with system's default frequency and voltage
Algorithm | Platform 1 µJ/B | Platform 1 MB/s | Platform 2 µJ/B | Platform 2 MB/s
Blowfish | 12.21875 | 1.103448 | 0.1546875 | 1.855072464
PBlock | 12.53125 | 1.939394 | 0.28515625 | 5.333333333
Table 8.3: Energy consumed by existing and proposed parallel algorithms for stream cipher technique using system's default frequency and voltage
Algorithm | Platform 1 µJ/B | Platform 1 MB/s | Platform 2 µJ/B | Platform 2 MB/s
RC4 | 0.283008 | 24.7343 | 0.036914063 | 32
PARC4 | 0.353125 | 40.99608 | 0.058789063 | 128
RC4A | 0.256953 | 30.08226 | 0.031640625 | 39.38461538
PARC4-I | 0.345371 | 51.71717 | 0.067773438 | 131.2820513
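The trade-off between energy per byte and throughput in Tables 8.2 and 8.3 can be examined by taking ratios between each parallel algorithm and its sequential counterpart. The sketch below only rearranges the tabulated values; the data structure and the function name `compare` are hypothetical.

```python
# (energy in µJ/B, throughput in MB/s) copied from Tables 8.2 and 8.3.
TABLE = {
    "Platform 1": {
        "Blowfish": (12.21875, 1.103448), "PBlock": (12.53125, 1.939394),
        "RC4": (0.283008, 24.7343), "PARC4": (0.353125, 40.99608),
        "RC4A": (0.256953, 30.08226), "PARC4-I": (0.345371, 51.71717),
    },
    "Platform 2": {
        "Blowfish": (0.1546875, 1.855072464), "PBlock": (0.28515625, 5.333333333),
        "RC4": (0.036914063, 32), "PARC4": (0.058789063, 128),
        "RC4A": (0.031640625, 39.38461538), "PARC4-I": (0.067773438, 131.2820513),
    },
}

# Each sequential algorithm paired with its parallel version.
PAIRS = [("Blowfish", "PBlock"), ("RC4", "PARC4"), ("RC4A", "PARC4-I")]


def compare(platform, serial, parallel):
    """Return (energy ratio, throughput ratio) of parallel over serial."""
    e_serial, t_serial = TABLE[platform][serial]
    e_parallel, t_parallel = TABLE[platform][parallel]
    return e_parallel / e_serial, t_parallel / t_serial


if __name__ == "__main__":
    for platform in TABLE:
        for serial, parallel in PAIRS:
            energy, rate = compare(platform, serial, parallel)
            print(platform, parallel, round(energy, 2), round(rate, 2))
```

An energy ratio above 1 means the parallel version spends more energy per byte at the default settings, and a throughput ratio above 1 means it processes data faster.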
Tables 8.2 and 8.3 give the results obtained with the system's default
frequency and voltage, i.e., in the non-calibrated states. From these
results, it can be inferred that PBlock provides a 1.5X speedup over the
serial version on Platform 1 and approximately 2.5X on Platform 2, but at
the same time the parallel algorithms consume more energy than the
sequential algorithms. Similarly, PARC4 and PARC4-I are much faster than
the existing sequential algorithms but consume more µJ/B than their
sequential versions. The results for Platform 2 follow the same pattern as
those for Platform 1. The only difference is that the processor in
Platform 1 runs at a low frequency and voltage, whereas the processor in
Platform 2 runs at a much higher frequency. That is why the results for
Platform 2 are better than those for Platform 1 even in the non-calibrated
states. Serial algorithms are slower and consume less energy, while
parallel algorithms execute faster at the cost of higher energy
consumption. With parallel computing, however, each core can be operated
at a low frequency to reduce energy consumption while keeping performance
at the same level as the sequential methods. Thus, all experiments have
also been carried out for the calibrated states of a processor. Table 8.4
specifies the low power states for Platform 2.
Table 8.4: Low power states of AMD-8320 processor

| Voltage (V) | Frequency (MHz) |
|---|---|
| 1.3 | 2900 |
| 1.1875 | 2300 |
| 1.0625 | 1700 |
| 0.95 | 1400 |

After calibrating the system with the values given in Table 8.4, the
energy consumption of the parallel algorithms dropped sharply. Figure 8.4
shows that when the PBlock algorithm runs on multiple cores, each
operating at low frequency, the transmission rate is 1.18 MB/s and the
energy consumption is 7.1125 µJ/B.
Fig 8.3: Comparison of serial, parallel and parallel with calibration for energy
consumption using platform 1
Figure 8.3 shows that, at the system’s default frequency and voltage, the
Blowfish algorithm consumes less energy than the parallel PBlock
algorithm, although PBlock is much faster. When the frequency is reduced,
however, the energy consumed by PBlock drops sharply at the same
performance level. Similarly, Fig. 8.4 shows both algorithms executed on
Platform 2 after scaling down the frequency. Again, the results show that
at low frequency PBlock consumes less energy at the same performance
level as the sequential implementation.
Fig 8.4: Comparison of serial, parallel and parallel with calibration Blowfish and
PBlock for energy consumption using platform 2
Fig 8.5: Serial and Parallel algorithms for stream ciphers technique with default and
calibrated frequency using platform 1
Fig 8.6: Serial and Parallel algorithms for stream ciphers technique with default and
calibrated frequency using platform 2
Figures 8.5 and 8.6 show that, at a lower frequency, PARC4 and PARC4-I
consume less energy while still providing high throughput on both
platforms. Thus, adding cores reduces the computation carried out at each
core, which improves execution time. If the frequency is then lowered,
that gain in time is traded for a gain in energy. In other words, at the
same performance level the parallel algorithm consumes less energy than
the sequential algorithm.
Conclusion
This chapter described and compared the energy cost of the proposed
parallel algorithms PARC4, PARC4-I, and PBlock and the serial algorithms
RC4, RC4A, and Blowfish. The results show that the parallel algorithms
are much faster than the serial algorithms but consume more energy at the
default settings. SMPs provide the option of reducing the frequency and
voltage of the machine through the dynamic voltage and frequency scaling
technique. By using this mechanism, parallel algorithms can become more
energy efficient. The analysis shows that the PBlock parallel algorithm
consumes 58% less energy than the Blowfish algorithm and, similarly,
PARC4 and PARC4-I consume 63% and 54% less energy than RC4 and RC4A,
respectively. On the other hand, some speed has to be given up: the gain
in time is traded for a gain in energy. Overall, the study concludes that
SMPs operated at low frequency give promising results and can contribute
significantly towards green computing and, ultimately, towards society.
Our results also indicate that block ciphers consume more energy than
stream ciphers while executing: faster algorithms operate at an elevated
level of power for less time and therefore consume less energy, and stream
ciphers are faster than block ciphers.
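The link between power, time, and energy used in this argument can be sketched from the two quantities reported in the tables: energy per byte multiplied by throughput gives average power, and energy is power multiplied by time. The helper names below are illustrative, and the conversion assumes 1 MB = 10^6 bytes, which the tables do not state.

```python
# Assumption: 1 MB = 1,000,000 bytes (the tables do not say which convention is used).

def average_power_watts(uj_per_byte, mb_per_s, bytes_per_mb=1_000_000):
    """Average power (W) = energy per byte (J/B) * throughput (B/s)."""
    return (uj_per_byte * 1e-6) * (mb_per_s * bytes_per_mb)

def energy_and_time(uj_per_byte, mb_per_s, n_bytes, bytes_per_mb=1_000_000):
    """Return (energy in J, time in s) needed to process n_bytes."""
    energy_j = uj_per_byte * 1e-6 * n_bytes
    time_s = n_bytes / (mb_per_s * bytes_per_mb)
    return energy_j, time_s

# Platform 1 values at default frequency, copied from Tables 8.2 and 8.3
platform1 = {"Blowfish": (12.21875, 1.103448), "RC4": (0.283008, 24.7343)}

for name, (uj_per_byte, mb_per_s) in platform1.items():
    watts = average_power_watts(uj_per_byte, mb_per_s)
    energy_j, time_s = energy_and_time(uj_per_byte, mb_per_s, n_bytes=10_000_000)
    print(f"{name}: ~{watts:.2f} W, {energy_j:.2f} J in {time_s:.2f} s for 10 MB")
```

For a fixed amount of data, an algorithm that finishes sooner accumulates energy over a shorter interval, which is the effect the conclusion refers to.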
CHAPTER 9
CONCLUSIONS AND FUTURE SCOPE
9.1 Thesis Contribution
Information security is the practice of protecting information from
unauthorized access, use, disclosure, disruption, modification, perusal,
inspection, recording, or destruction. The rapid growth and prevalent use
of electronic data processing and of electronic business conducted
through the Internet, along with numerous cyber-attacks, have fueled the
need for better methods of protecting computers and the information they
store, process, and transmit. It is sensible to assume that anyone's
communication can be captured or altered over the network. Thus, to
safeguard data from unauthorized use over the channel, an
encryption/decryption process is used.
Different encryption algorithms have been used to secure information
communicated over the network. Along with their security, two other
aspects of these algorithms are important:
1. Speed
Speed of encryption and decryption is an important aspect of security
algorithms. A slow cryptographic algorithm can slow down an application
and reduce its effectiveness.
2. Energy consumption
Apart from speed, the energy consumption of these cryptographic
algorithms is another crucial aspect due to the prevalent usage of mobile
devices today. As discussed in Chapter 8, encrypting only 13.6 kilobytes
of data using the Blowfish block cipher algorithm on a mobile device uses
up about 75% of its battery power.
This thesis has developed parallel algorithms for different symmetric
cryptographic algorithms. First, a parallel framework was designed for
both stream ciphers and block ciphers, which enables parallel algorithms
to be written without impairing their security. Next, PARC4, a parallel
implementation of the well-known RC4 stream cipher, was developed and
executed on an eight-core machine to measure its performance gains.
PARC4 proved much faster than RC4, with approximately a 7X speedup over
the sequential implementation. RC4A was then implemented in parallel, and
the resulting algorithm is named PARC4-I. This implementation uses loop
unrolling as a key optimization technique and proved better than PARC4.
It provides up to a 7.3X speedup and yields larger percentage gains than
PARC4 when fewer cores are used.
From the category of block ciphers, Blowfish was chosen for parallel
implementation because it is one of the latest cipher techniques using the
Feistel structure. The parallel algorithm is termed PBlock, short for
Parallel Blocks. PBlock provides up to a 6.6X speedup when using eight
symmetric cores.
Power consumption for all parallel algorithms was measured using
Microsoft’s Joulemeter tool, and it was observed that, at the same
performance level, the parallel algorithms are more energy efficient than
the sequential algorithms.
9.2 Conclusions
The following conclusions are drawn:
The RC4 stream cipher can be parallelized with the help of the data
parallel model; to achieve this, the PRGA must be modified accordingly.
The parallel algorithm PARC4 results in a 7X speedup and has been shown
to be as secure as RC4. PARC4 uses extra space compared to RC4, which is
necessary to implement the parallel algorithm. It should be applied to
large data tasks to take full benefit of parallelism.
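For reference, the sequential RC4 that PARC4 starts from consists of a key-scheduling step (KSA) followed by the PRGA, whose state updates depend on one another from byte to byte. The sketch below is the standard textbook RC4, not the PARC4 modification, and the function names are illustrative. RC4 is shown only because it is the algorithm studied here; it is no longer considered secure for new designs.

```python
def rc4_ksa(key: bytes) -> list:
    """Key-scheduling algorithm: permute S = [0..255] under the key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s

def rc4_prga(state: list, n: int) -> bytes:
    """Pseudo-random generation algorithm: emit n keystream bytes.

    Each iteration swaps two entries of S, so iteration k depends on the
    state left by iteration k-1. This serial dependency is what a parallel
    variant has to work around.
    """
    s = state[:]  # work on a copy so the caller's state is untouched
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)

def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the keystream."""
    keystream = rc4_prga(rc4_ksa(key), len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))

ciphertext = rc4_crypt(b"Key", b"Plaintext")
print(ciphertext.hex())
```

Because encryption and decryption are the same XOR operation, calling `rc4_crypt` again with the same key recovers the plaintext.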
PARC4-I is faster than PARC4 but takes up more space because it uses
additional lookup tables. Thus, applications that have no memory
constraint and can benefit from the performance gains can use PARC4-I.
PBlock is the parallel approach for the Blowfish block cipher algorithm.
Blowfish is used in many products, for example for password encryption
and for file and disk encryption. In this research, the parallel approach
was tested in a file encryption scenario and achieved good speedup:
PBlock provides a 6.6X speedup.
Code optimization techniques have proved helpful in gaining additional
speedup. For example, in the implementation of PARC4-I, loop unrolling
worked well in conjunction with the data parallel model because it
reduces loop overhead.
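As a small illustration of what loop unrolling means, the sketch below applies a keystream to data once with a plain loop and once with the loop body repeated four times per iteration, followed by a cleanup loop for the leftover bytes. This is a generic example with illustrative function names, not the PARC4-I code; it is written in Python only to show the structure, while the overhead savings matter mainly in compiled implementations.

```python
def xor_simple(data: bytes, keystream: bytes) -> bytes:
    """One loop iteration (and one loop-condition check) per byte."""
    out = bytearray(len(data))
    for i in range(len(data)):
        out[i] = data[i] ^ keystream[i]
    return bytes(out)

def xor_unrolled4(data: bytes, keystream: bytes) -> bytes:
    """Same result, with the loop body unrolled by a factor of four."""
    n = len(data)
    out = bytearray(n)
    limit = n - (n % 4)  # largest multiple of 4 that fits in n
    for i in range(0, limit, 4):
        out[i] = data[i] ^ keystream[i]
        out[i + 1] = data[i + 1] ^ keystream[i + 1]
        out[i + 2] = data[i + 2] ^ keystream[i + 2]
        out[i + 3] = data[i + 3] ^ keystream[i + 3]
    for i in range(limit, n):  # cleanup loop for the remaining 0-3 bytes
        out[i] = data[i] ^ keystream[i]
    return bytes(out)

data = bytes(range(10))
keystream = bytes([0xAA] * 10)
print(xor_simple(data, keystream) == xor_unrolled4(data, keystream))
```

The unrolled version performs a quarter as many loop-control steps for the same work, which is the overhead reduction referred to above.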
It has been observed in each implementation that execution time is
directly proportional to the file size: as the file size increases, the
execution time increases as well. On the other hand, in parallel systems,
if the number of processing cores is increased along with the problem
size, the execution time decreases and speedup increases. However, if the
problem size is constant and cores are added, the performance gains are
limited by the amount of sequential computation present in the problem
being addressed. The benefits of parallel computation have a ceiling, as
stated by Amdahl’s Law.
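Amdahl's Law can be written as S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the work that can run in parallel and N is the number of cores. The sketch below inverts this formula to estimate the parallel fraction implied by the eight-core speedups reported in this thesis. It assumes the measured speedups follow the idealized model exactly, which ignores synchronization and memory effects, so the numbers are indicative only; the function names are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's Law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

def implied_parallel_fraction(speedup: float, cores: int) -> float:
    """Invert Amdahl's Law: the parallel fraction that yields this speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)

# Speedups on eight cores as reported in this chapter
reported = {"PARC4": 7.0, "PARC4-I": 7.3, "PBlock": 6.6}

for name, speedup in reported.items():
    p = implied_parallel_fraction(speedup, cores=8)
    ceiling = 1.0 / (1.0 - p)  # limit of the speedup as the core count grows
    print(f"{name}: implied parallel fraction {p:.4f}, speedup ceiling {ceiling:.1f}X")
```

Under this model, even a small sequential fraction caps the achievable speedup regardless of how many cores are added.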
When a parallel algorithm is executed on SMPs, energy can also be reduced
while operating at a performance level similar to that of the sequential
implementation. To reduce energy on a multicore architecture, the
processor cores have to be calibrated to a low frequency and low voltage
using the dynamic voltage and frequency scaling technique. The research
concluded that the PBlock parallel algorithm consumes 58% less energy
than the Blowfish algorithm and, similarly, PARC4 and PARC4-I consume 63%
and 54% less energy than their sequential versions at the same
performance level.
9.3 Future Scope
This thesis has concentrated on developing parallel cryptographic
algorithms for the symmetric-key approach. The parallel algorithms for
symmetric-key based security techniques developed here have proven much
faster and more energy-efficient than the existing sequential algorithms.
Some suggestions for future work follow:
The thesis has used the domain decomposition technique for almost all of
the parallel algorithms to divide the problem into subtasks and assign
them to different processes. Domain decomposition, or data decomposition,
normally works well either on large data sets or on data where similar
operations must be performed on the complete data. In cases where the
data set is not that large and the type of operation differs for each
task, instruction level parallelism can perform better. To exploit
instruction level parallelism, however, the algorithms will in many cases
have to be redesigned completely.
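The domain decomposition idea described above can be sketched as follows: the input is split into contiguous chunks aligned to the cipher's block size, each chunk is processed by a separate worker, and the outputs are concatenated in order. The per-block function `toy_encrypt_block` is a hypothetical XOR stand-in, not Blowfish and not secure, and the helper names are illustrative. Python threads are used only to show the decomposition; they do not give a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8  # bytes; Blowfish operates on 64-bit blocks

def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for a block cipher: XOR with the key (NOT secure)."""
    return bytes(b ^ k for b, k in zip(block, key))

def encrypt_chunk(chunk: bytes, key: bytes) -> bytes:
    """Process every block of one chunk independently of the other chunks."""
    return b"".join(
        toy_encrypt_block(chunk[i:i + BLOCK_SIZE], key)
        for i in range(0, len(chunk), BLOCK_SIZE)
    )

def split_into_chunks(data: bytes, workers: int) -> list:
    """Split data into at most `workers` chunks, each a whole number of blocks."""
    n_blocks = -(-len(data) // BLOCK_SIZE)            # ceiling division
    blocks_per_chunk = max(1, -(-n_blocks // workers))
    step = blocks_per_chunk * BLOCK_SIZE
    return [data[i:i + step] for i in range(0, len(data), step)]

def encrypt_parallel(data: bytes, key: bytes, workers: int = 8) -> bytes:
    """Data-parallel processing: one chunk per worker, results joined in order."""
    chunks = split_into_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lambda c: encrypt_chunk(c, key), chunks))

key = bytes(range(1, 9))
message = bytes(range(100))
print(encrypt_parallel(message, key, workers=4) == encrypt_chunk(message, key))
```

The decomposition works here because every block is transformed independently; algorithms with dependencies between blocks or bytes need restructuring before they can be split this way.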
Only PARC4, PARC4-I, and PBlock have been implemented using the PASCS
and PIFN frameworks. More security algorithms fall under the same
category and can also be implemented and analyzed. Moreover, in this
cloud computing era, almost every electronic device, including commonly
used smart phones, has a multicore processor. These parallel algorithms
can therefore be applied to make such devices more efficient in terms of
speed and energy.
Finally, the measurement of energy itself is an active area of research.
More accurate energy measurement techniques can be applied for further
analysis as well as for measuring energy issues at a finer grain.
![Page 146: PARALLEL ALGORITHMS FOR SYMMETRIC KEY ...shodhganga.inflibnet.ac.in/bitstream/10603/46758/1/dh...CERTIFICATE BY THE SUPERVISOR This is to certify that the thesis entitled “Parallel](https://reader036.vdocuments.net/reader036/viewer/2022071110/5fe548f66b828024ec342cce/html5/thumbnails/146.jpg)
126
10
11
References
![Page 147: PARALLEL ALGORITHMS FOR SYMMETRIC KEY ...shodhganga.inflibnet.ac.in/bitstream/10603/46758/1/dh...CERTIFICATE BY THE SUPERVISOR This is to certify that the thesis entitled “Parallel](https://reader036.vdocuments.net/reader036/viewer/2022071110/5fe548f66b828024ec342cce/html5/thumbnails/147.jpg)
127
12 REFERENCES
Almasi GS & Gottlieb A (1988) Highly parallel computing. CA:
Benjamin/Cummings Publishing Company
Auth C, Allen C, Blattner A, Bergstrom D, Brazier M, Bost M, Buehler M,
Chikarmane V, Ghani T and Glassman T (2012) "A 22nm high
performance and low-power CMOS technology featuring fully-depleted
tri-gate transistors, self-aligned contacts and high density MIM
capacitors" An IEEE Symposium on VLSI Technology (VLSIT), pp.131-
135
Babb RG (1984) Parallel processing with large-grain data flow techniques.
Computer, 7:17, pp. 55-61
Barney B (2010) Introduction to parallel computing. Lawrence Livermore
National Laboratory[online], Available from:
(https://computing.llnl.gov/tutorials/parallel_comp/?ref=driverlayer.com/
web)
Bekaroo G, Bokhoree C and Pattinson C "Power Measurement of Computers:
Analysis of the Effectiveness of the Software Based Approach" Int. J.
Emerg. Technol. Adv. Eng. 4:5, pp.755-762
Bekenstein JD (1973) Black holes and entropy, Physical Review D, 7:8, DOI:
http://dx.doi.org/10.1103/PhysRevD.7.2333
Bellare M. and Yee B (2003) "Forward-security in private-key cryptography",
Topics in Cryptology—CT-RSA 2003, pp.1-18, Springer Berlin
Heidelberg.
Bo H (2009) Parallel Computing and Data Mining. China
Chandra R (2001) Parallel programming in OpenMP, CA: Academic Press.
Morgan Kaufmann Publishers. ISBN:1-55860-671-8.
Chandrakasan AP and Brodersen RW (1995) "Minimizing power consumption
in digital CMOS circuits", Proceedings of the IEEE international
Conference, pp.498-523
![Page 148: PARALLEL ALGORITHMS FOR SYMMETRIC KEY ...shodhganga.inflibnet.ac.in/bitstream/10603/46758/1/dh...CERTIFICATE BY THE SUPERVISOR This is to certify that the thesis entitled “Parallel](https://reader036.vdocuments.net/reader036/viewer/2022071110/5fe548f66b828024ec342cce/html5/thumbnails/148.jpg)
128
Chandrakasan AP, Sheng S and Brodersen RW (1992) Low-power CMOS
digital design. IEICE Transactions on Electronic, 75, pp.371-382
Chapman B, Jost G and Van R (2008) Using OpenMP: Portable Shared Memory
Parallel Programming, MIT Press.Vol 10
Choy J, Chew G, Khoo K and Yap H (2009) "Cryptographic properties and
application of a generalized unbalanced Feistel network structure",
Proceedings of 14th Australian Conference on Information Security and
Privacy, LNCS, vol. 5594. pp. 73–89. Springer
Cusick TW, Ding C and Renvall AR (2004) Stream ciphers and number theory,
Revised Edition,Elsevier
Dagum L and Menon R (1998) "OpenMP: an industry standard API for shared-
memory programming", Proceedings of the IEEE international
Conference on Computational Science & Engineering, 5, pp.46-55
Davari B, Dennard RH and Shahidi GG (1995) "CMOS scaling for high
performance and low power-the next ten years", Proceedings of the IEEE
international Conference, 83, pp.595-606
Diffie W and Hellman ME (1976) New directions in cryptography, IEEE
Transactions on Information Theory, 22, pp. 644-654
Elminaam D S A, Abdual-Kader H M and Hadhoud MM (2010) Evaluating
The Performance of Symmetric Encryption Algorithms, International
Journal of Network Security, 10, pp.216-222
Fenlason J and Stallman R (1988) GNU gprof. GNU binutils.[Online].
Available from: (http://www. gnu. org/software/binutils)
Fluhrer S, Mantin I and Shamir A (2001) "Weaknesses in the key scheduling
algorithm of RC4", Proceedings of the Selected areas in cryptography,
vol: 2259 of LNCS, pp. 1-24, Springer-Verlag
Fontaine C (2011) Synchronous Stream Cipher. Encyclopedia of Cryptography
and Security, 1,pp.1274-1275
G.N, P.K (2007) Performance Enhancement of Blowfish Algorithm by
Modifying its function. In: Innovative Algorithms and
Techniques,Industrial Electronics and Telecommunications, springer,
pp.241-244
![Page 149: PARALLEL ALGORITHMS FOR SYMMETRIC KEY ...shodhganga.inflibnet.ac.in/bitstream/10603/46758/1/dh...CERTIFICATE BY THE SUPERVISOR This is to certify that the thesis entitled “Parallel](https://reader036.vdocuments.net/reader036/viewer/2022071110/5fe548f66b828024ec342cce/html5/thumbnails/149.jpg)
129
G.N, P. K (2008) Performance enhancement of Blowfish and CAST-128
algorithms and Security analysis of improved Blowfish algorithm using
Avalanche effect, International Journal of Computer Science and
Network Security, 8, pp.244-250
Geer D (2005) Chip makers turn to multicore processors, Computer, 38:5 ,
pp.11-13
Gepner P and Kowalik MF (2006) "Multi-core processors: New way to achieve
high system performance", International IEEE Symposium on Parallel
Computing in Electrical Engineering, pp. 9-13
Gonzalez ME, Bilgic A, Lackorzynski A, Tudor D, Matus E and Badr I
(2009) ICT-eMuCo. "An innovative solution for future smart phones'',
Proceedings of the IEEE International Conference on Multimedia and
Expo., pp.1821-1824
Graham SL, Kessler PB and Mckusick MK (2004) Gprof: A call graph
execution profiler, ACM SIGPLAN Notice, 39, pp.49-57
Handa D and Kapoor B (2014) "PARC4: High performance implementation of
RC4 cryptographic algorithm using parallelism" Proceedings of IEEE
International Conference on Optimization, Reliabilty, and Information
Technology (ICROIT), Faridabad, pp. 286-289
Heys HM, and Tavares SE (1994) "On the security of the CAST encryption
algorithm", Proceedings of the Canadian Conference on Electrical and
Computer Engineering, pp.332-335
Heys HM and Tavares SE (1995) Avalanche characteristics of substitution-
permutation encryption networks, IEEE Transactions on Computers, 44,
pp.1131-1139
Hill MD and Marty MR (2008) Amdahl's law in the multicore era, Computer,
pp.33-38
Jin HQ, Frumkin M and Yan J (1999) The OpenMP implementation of NAS
parallel benchmarks and its performance, NAS technical report, Available
from: (https://www.nas.nasa.gov/assets/pdf/techreports/1999/nas-99-
011.pdf)
![Page 150: PARALLEL ALGORITHMS FOR SYMMETRIC KEY ...shodhganga.inflibnet.ac.in/bitstream/10603/46758/1/dh...CERTIFICATE BY THE SUPERVISOR This is to certify that the thesis entitled “Parallel](https://reader036.vdocuments.net/reader036/viewer/2022071110/5fe548f66b828024ec342cce/html5/thumbnails/150.jpg)
130
Kamak Ebadi, V. P, Chen Liu (2012) "High-Performance Implementation and
Evaluation of Blowfish Cryptographic Algorithm on Single-Chip Cloud
Compute: A Pipelined Approach", Proceedings of the International
Conference on Applied and Theoretical Information Systems Research,
Taiwan, pp. 27-29
Kanda M. (2001) Practical security evaluation against differential and linear
cryptanalyses for Feistel ciphers with SPN round function, Selected
Areas in Cryptography, Springer, pp.324-338
Kapoor B. and Verma S. (2011) Power Management Design and Verification,
Journal of Low Power Electronics, 7, pp.41-48
Keckler SW, Olukotun OA and Hofstee HP (2009) Multicore processors and
systems, Springer, ISBN: 978-1-4419-0262-7
Kholidy H and Alghathbar K ( 2009) Adapting and accelerating the stream
cipher algorithm “RC4” using “ultra gridsec” and “HIMAN” and use it to
secure “HIMAN” data, Journal of information assurance and security, 4,
pp.474-483
Koch G. (2005) Discovering Multi-Core: Extending the Benefits of Moore’s
Law, Technology Intel Magazine, Intel Corporation, Tech. Report
Korthikanti VA and Agha G (2009) "Analysis of parallel algorithms for energy
conservation in scalable multicore architectures", Proceedings of the IEEE
International Conference on Parallel Processing, pp. 212-219
Krall A. and Lelait S. (2000) Compilation techniques for multimedia processors,
International Journal of Parallel Programming, 28, pp.347-361
Kumar V, Grama A, Gupta A and Karypis G (1994) Introduction to parallel
computing: design and analysis of algorithms, 2nd edition. CA:
Benjamin/Cummings Publishing Company Redwood City
Leighton FT (1992) Introduction to parallel algorithms and architectures,
Morgan Kaufmann San Francisco publishers
Li C, Wu H, Chen S, Li X, Guo D (2009) "Efficient implementation for MD5-
RC4 encryption using GPU with CUDA", Proceedings of the 3rd
International IEEE Conference on Anti-counterfeiting, Security, and
Identification in Communication, pp.167-170
![Page 151: PARALLEL ALGORITHMS FOR SYMMETRIC KEY ...shodhganga.inflibnet.ac.in/bitstream/10603/46758/1/dh...CERTIFICATE BY THE SUPERVISOR This is to certify that the thesis entitled “Parallel](https://reader036.vdocuments.net/reader036/viewer/2022071110/5fe548f66b828024ec342cce/html5/thumbnails/151.jpg)
131
Liu C. (2012) Critical Path based hardware Acceleration for Cryptosystems,
Journal of information processiong system(JIPS), 8, pp.133-144
Madson C, and Doraswamy N (1998) The ESP DES-CBC cipher algorithm with
explicit IV. RFC 2405 [Online] Available from: (http://www.rfc-
editor.org/info/rfc2405)
Mani K and Jee B (2007) On the Edge: A Comprehensive Guide to Blade
Server Technology, 1st edition, John Wiley & Sons.
Mao W (2003) Modern cryptography: theory and practice, 1st edition, Prentice
Hall Professional Technical Reference
Mead C and Conway L (1980) Introduction to VLSI systems, Reading, MA:
Addison-Wesley
Menezes, Alfred J, Paul C and Scott A (1996) Handbook of applied
cryptography, CRC press.
Moore GE (1965) Cramming More Components onto Integrated Circuits,
Electronics, 38, pp.114-117
Noman AA (2009) Hardware Implementation of RC4A Stream Cipher,
International Journal of Cryptology Research, pp.224-233
P. Karthigai Kumar , K. B (2010) Partially pipelined vlsi implementation of
blowfish encryption/decryption algorithm, International Journal of Image
and Graphics, 10:03, pp.327-341
P. Karthigai Kumara , K. B (2010) An ASIC implementation of low power and
high throughput blowfish crypto algorithm, Microelectronics Journal, 41,
pp.347-355
Palnitkar S (2003) Verilog HDL: a guide to digital design and synthesis, Prentice
Hall Professional
Patidar V, Pareek N and Sud K (2009) A new substitution–diffusion based
image cipher using chaotic standard and logistic maps. Communications
in Nonlinear Science and Numerical Simulation, 14, pp.3056-3075
Paul S and Preneel B (2004) A New Weakness in the RC4 Keystream Generator
and an Approach to Improve the Security of the Cipher, Fast Software
Encryption, Springer, pp.245-259
![Page 152: PARALLEL ALGORITHMS FOR SYMMETRIC KEY ...shodhganga.inflibnet.ac.in/bitstream/10603/46758/1/dh...CERTIFICATE BY THE SUPERVISOR This is to certify that the thesis entitled “Parallel](https://reader036.vdocuments.net/reader036/viewer/2022071110/5fe548f66b828024ec342cce/html5/thumbnails/152.jpg)
132
Peters C, Van Der Heijden J and Khan M (2010) MinGW: Minimalist GNU for
Windows[online], Available from : (http://www. mingw. org)
POWER, IBM. (2010) Multi-Core Processors.
Quinn MJ (1994) Parallel computing: theory and practice, 2nd edition, McGraw-
Hill New York, ISBN:0-07-051294-9
Rifa-Pous H and Herrera-Joancomartí J (2011) Computational and energy costs
of cryptographic algorithms on handheld devices. Future internet, 3,
pp.31-48
Rivest RL (1992) The RC4 encryption algorithm. RSA Data Security Inc.
Robshaw MJ (1995) Stream ciphers, RSA Laboratories, a division of RSA Data
Security, Inc.
Roussev, Available from: (http://roussev.net/t5/t5.html)
Roussev V (2011) An evaluation of forensic similarity hashes, digital
investigation, 8, pp.S34-S41.
Roy A, Jingye Xu and Chowdhury M (2008) "Multi-core processors: A new way
forward and challenges", Proceedings of the IEEE international
Conference on Microelectronics, pp. 454-457
Rrnyi A (1961) "On measures of entropy and information", Fourth Berkeley
symposium on mathematical statistics and probability, pp.547-561
Salomaa A (1996) Public-key cryptography, 1st edition, Springer Science &
Business Media
Salomon D (2003) Introduction: Data Privacy and Security, 1st edition. Springer.
Sato J, Imai M, Hakata T, Alomary AY and Hikichi N (1991) "An integrated
design environment for application specific integrated processor",
Proceedings of the IEEE International Conference on Computer Design:
VLSI in Computers and Processors, pp. 414-417
Schneier B [online] Available from: (https://www.schneier.com/blowfish-
products.html)
Schneier B (1994) Description of a new variable-length key, 64-bit block cipher
(Blowfish), In Fast Software Encryption, Springer, pp.191-204
Schneier B (2008) Applied cryptography: protocols, algorithms, and source
code in C, Wiley India.
![Page 153: PARALLEL ALGORITHMS FOR SYMMETRIC KEY ...shodhganga.inflibnet.ac.in/bitstream/10603/46758/1/dh...CERTIFICATE BY THE SUPERVISOR This is to certify that the thesis entitled “Parallel](https://reader036.vdocuments.net/reader036/viewer/2022071110/5fe548f66b828024ec342cce/html5/thumbnails/153.jpg)
133
Schoen I and Boberski M (2002) Secure PKI proxy and method for instant
messaging clients, Patent No : US 20030204741 A1,US
Shannon CE (1949) Communication theory of secrecy systems. Bell system
technical journal, 28, pp.656-715
Shannon CE (1951) Prediction and entropy of printed English, Bell system
technical journal, 30, pp.50-64
Shannon CE (2001) A mathematical theory of communication, ACM
SIGMOBILE Mobile Computing and Communications Review, 5, pp. 3-55
Tsoi KH, Lee KH, Leong PHW (2002) "A massively parallel RC4 key search
engine", 10th Annual IEEE Symposium, pp.13-21
Vajda A and Stenström P (2012) Multi-core processors, Patent No: WO
2012136766 A1
Walker J and Ent A (1998) A pseudorandom number sequence test program
[Online] Available from: (http://www.fourmilab.ch/random/)
Webster A and Tavares SE (1986) "On the design of S-boxes", Proceedings of
the Advances in Cryptology, Springer, pp.523-534
Weerasinghe T (2014) Improving throughput of RC4 algorithm using
multithreading techniques in multicore processors, IACR Cryptology
ePrint Archiv, pp.180-184
Weerasinghe TDB (2012) Improving throughput of RC4 algorithm using
multithreading techniques in multicore processors, International Journal
of Computer Applications, 51:22,pp.102-109
William S (2006) Cryptography And Network Security, 4th edition, Pearson.
Yu S, Wang C, Ren K and Lou W (2010) "Achieving secure, scalable, and
fine-grained data access control in cloud computing", Proceedings of
INFOCOM, pp.1-9.
Zeidman B (1999) "An Introduction to FPGA design", Proceedings of the
Embedded Systems Conference, Europe
![Page 154: PARALLEL ALGORITHMS FOR SYMMETRIC KEY ...shodhganga.inflibnet.ac.in/bitstream/10603/46758/1/dh...CERTIFICATE BY THE SUPERVISOR This is to certify that the thesis entitled “Parallel](https://reader036.vdocuments.net/reader036/viewer/2022071110/5fe548f66b828024ec342cce/html5/thumbnails/154.jpg)
134
APPENDIX-A
//256 key bytes generated for a set of 256 plaintext bytes by RC4 algorithm
208 1 45 122 39 140 218 179 112 99
94 190 89 150 207 224 212 241 48 78
178 160 197 40 80 5 167 32 105 107
37 21 225 182 42 234 42 103 221 254
5 124 20 140 106 38 48 163 224 206
247 120 191 139 73 3 0 146 244 251
32 142 139 84 250 14 138 20 41 55
124 172 185 30 38 210 25 252 160 98
170 68 143 42 215 150 93 207 55 93
210 216 209 3 23 196 173 222 18 148
181 207 192 255 183 93 219 108 7 42
121 236 148 210 71 88 105 174 163 84
147 73 215 24 61 87 145 95 122 109
137 177 182 3 214 121 41 186 146 190
62 42 90 255 131 77 139 156 173 71
53 75 90 164 19 219 209 83 65 174
35 222 155 165 66 14 143 138 180 151
44 87 238 132 9 150 247 108 113 35
63 131 102 214 232 150 21 251 50 161
218 192 135 197 106 167 175 133 141 79
87 95 113 105 35 87 85 159 92 71
39 196 86 53 45 54 194 178 6 188
195 160 89 135 249 73 238 120 172 62
255 63 15 167 76 210 13 109 10 149
22 11 202 15 51 69 76 139 86 126
4 51 218 57 79 34
APPENDIX-B
//256 key bytes generated for a set of 256 plaintext bytes by PARC4
algorithm using PASCS framework
142 185 197 27 138 226 50 117 168 132
49 234 170 133 90 182 108 96 155 215
240 22 2 162 79 82 30 176 131 199
163 204 36 77 135 155 199 122 191 72
224 195 227 216 139 245 102 249 237 124
243 28 16 134 96 43 12 237 181 48
200 71 73 123 112 242 182 188 22 162
57 129 240 243 248 122 123 104 249 185
59 70 59 226 53 156 10 172 91 690
189 61 249 87 210 178 102 132 210 113
254 53 153 163 224 145 54 8 199 223
1 155 139 219 206 12 78 17 9 228
217 222 151 37 150 62 234 187 242 213
133 197 207 226 143 158 247 60 97 114
115 28 61 121 149 90 168 165 66 61
27 135 121 244 200 198 137 202 16 137
125 190 194 108 47 36 70 82 130 49
195 39 186 115 150 19 32 202 230 118
148 182 25 156 205 79 190 203 116 118
157 52 44 115 87 166 155 22 37 160
152 143 11 32 40 59 40 163 52 206
166 140 32 77 208 29 47 73 59 213
86 134 15 187 144 153 140 98 125 52
77 182 20 126 66 18 251 118 147 152
112 121 231 83 71 25
APPENDIX-C
// After the initial 256 iterations are complete, i starts again from 1. Each step
computes a new value of j, and the pair (i, j) determines which entries of the S
array are swapped.
(i=1,j=183) (i=2,j=208) (i=3,j=17) (i=4,j=176) (i=5,j=239)
(i=6,j=195) (i=7,j=16) (i=8,j=133) (i=9,j=103) (i=10,j=250)
(i=11,j=154) (i=12,j=222) (i=13,j=114) (i=14,j=245) (i=15,j=157)
(i=16,j=234) (i=17,j=43) (i=18,j=81) (i=19,j=60) (i=20,j=223)
(i=21,j=86) (i=22,j=1) (i=23,j=44) (i=24,j=226) (i=25,j=25)
(i=26,j=96) (i=27,j=191) (i=28,j=221) (i=29,j=95) (i=30,j=139)
(i=31,j=219) (i=32,j=185) (i=33,j=5) (i=34,j=225) (i=35,j=83)
(i=36,j=77) (i=37,j=205) (i=38,j=143) (i=39,j=237) (i=40,j=72)
(i=41,j=228) (i=42,j=139) (i=43,j=204) (i=44,j=247) (i=45,j=6)
(i=46,j=133) (i=47,j=57) (i=48,j=126) (i=49,j=135) (i=50,j=2)
(i=51,j=131) (i=52,j=32) (i=53,j=91) (i=54,j=10) (i=55,j=49)
(i=56,j=191) (i=57,j=115) (i=58,j=47) (i=59,j=137) (i=60,j=116)
(i=61,j=128) (i=62,j=220) (i=63,j=69) (i=64,j=111) (i=65,j=190)
(i=66,j=179) (i=67,j=241) (i=68,j=150) (i=69,j=255) (i=70,j=9)
(i=71,j=185) (i=72,j=20) (i=73,j=70) (i=74,j=57) (i=75,j=181)
(i=76,j=62) (i=77,j=56) (i=78,j=79) (i=79,j=102) (i=80,j=209)
(i=81,j=247) (i=82,j=58) (i=83,j=172) (i=84,j=201) (i=85,j=197)
(i=86,j=60) (i=87,j=200) (i=88,j=224) (i=89,j=171) (i=90,j=197)
(i=91,j=0) (i=92,j=86) (i=93,j=159) (i=94,j=5) (i=95,j=135)
(i=96,j=206) (i=97,j=83) (i=98,j=116) (i=99,j=106) (i=100,j=27)
(i=101,j=228) (i=102,j=251) (i=103,j=221) (i=104,j=188) (i=105,j=146)
(i=106,j=136) (i=107,j=112) (i=108,j=51) (i=109,j=108) (i=110,j=254)
(i=111,j=40) (i=112,j=16) (i=113,j=167) (i=114,j=59) (i=115,j=239)
(i=116,j=16) (i=117,j=86) (i=118,j=57) (i=119,j=75) (i=120,j=214)
(i=121,j=39) (i=122,j=23) (i=123,j=50) (i=124,j=50) (i=125,j=228)
(i=126,j=41) (i=127,j=77) (i=128,j=89) (i=129,j=6) (i=130,j=57)
(i=131,j=186) (i=132,j=129) (i=133,j=0) (i=134,j=172) (i=135,j=46)
(i=136,j=36) (i=137,j=126) (i=138,j=246) (i=139,j=157) (i=140,j=159)
(i=141,j=119) (i=142,j=116) (i=143,j=54) (i=144,j=239) (i=145,j=244)
(i=146,j=202) (i=147,j=61) (i=148,j=30) (i=149,j=76) (i=150,j=241)
(i=151,j=81) (i=152,j=87) (i=153,j=44) (i=154,j=204) (i=155,j=61)
(i=156,j=43) (i=157,j=210) (i=158,j=58) (i=159,j=60) (i=160,j=171)
(i=161,j=146) (i=162,j=22) (i=163,j=57) (i=164,j=145) (i=165,j=162)
(i=166,j=150) (i=167,j=45) (i=168,j=171) (i=169,j=0) (i=170,j=149)
(i=171,j=19) (i=172,j=191) (i=173,j=70) (i=174,j=16) (i=175,j=77)
(i=176,j=236) (i=177,j=243) (i=178,j=32) (i=179,j=21) (i=180,j=182)
(i=181,j=50) (i=182,j=211) (i=183,j=138) (i=184,j=115) (i=185,j=35)
(i=186,j=164) (i=187,j=145) (i=188,j=112) (i=189,j=6) (i=190,j=85)
(i=191,j=1) (i=192,j=73) (i=193,j=156) (i=194,j=134) (i=195,j=90)
(i=196,j=8) (i=197,j=34) (i=198,j=186) (i=199,j=5) (i=200,j=145)
(i=201,j=174) (i=202,j=132) (i=203,j=214) (i=204,j=118) (i=205,j=246)
(i=206,j=61) (i=207,j=83) (i=208,j=108) (i=209,j=215) (i=210,j=126)
(i=211,j=31) (i=212,j=195) (i=213,j=154) (i=214,j=236) (i=215,j=87)
(i=216,j=82) (i=217,j=44) (i=218,j=231) (i=219,j=55) (i=220,j=147)
(i=221,j=117) (i=222,j=185) (i=223,j=92) (i=224,j=116) (i=225,j=80)
(i=226,j=6) (i=227,j=248) (i=228,j=170) (i=229,j=186) (i=230,j=111)
(i=231,j=42) (i=232,j=96) (i=233,j=107) (i=234,j=184) (i=235,j=216)
(i=236,j=42) (i=237,j=136) (i=238,j=248) (i=239,j=177) (i=240,j=31)
(i=241,j=196) (i=242,j=18) (i=243,j=25) (i=244,j=30) (i=245,j=161)
(i=246,j=33) (i=247,j=71) (i=248,j=183) (i=249,j=176) (i=250,j=67)
(i=251,j=90) (i=252,j=70) (i=253,j=144) (i=254,j=34) (i=255,j=139)
(i=0,j=224) (i=1,j=140) (i=2,j=7) (i=3,j=38) (i=4,j=156)
(i=5,j=231) (i=6,j=157) (i=7,j=24) (i=8,j=198) (i=9,j=208)
(i=10,j=127) (i=11,j=63) (i=12,j=207) (i=13,j=35) (i=14,j=34)
(i=15,j=68) (i=16,j=14) (i=17,j=205) (i=18,j=27) (i=19,j=153)
(i=20,j=244) (i=21,j=233) (i=22,j=109) (i=23,j=93) (i=24,j=216)
(i=25,j=223) (i=26,j=73) (i=27,j=151) (i=28,j=149) (i=29,j=132)
(i=30,j=137) (i=31,j=247) (i=32,j=36) (i=33,j=164) (i=34,j=163)
(i=35,j=247) (i=36,j=36) (i=37,j=89) (i=38,j=120) (i=39,j=201)
(i=40,j=243) (i=41,j=56) (i=42,j=138) (i=43,j=120) (i=44,j=82)
(i=45,j=233) (i=46,j=107) (i=47,j=39) (i=48,j=138) (i=49,j=177)
(i=50,j=45) (i=51,j=240) (i=52,j=213) (i=53,j=176) (i=54,j=114)
(i=55,j=194) (i=56,j=7) (i=57,j=42) (i=58,j=146) (i=59,j=38)
(i=60,j=40) (i=61,j=111) (i=62,j=248) (i=63,j=184) (i=64,j=187)
(i=65,j=93) (i=66,j=153) (i=67,j=44) (i=68,j=78) (i=69,j=19)
(i=70,j=255) (i=71,j=37) (i=72,j=146) (i=73,j=252) (i=74,j=117)
(i=75,j=135) (i=76,j=181) (i=77,j=242) (i=78,j=20) (i=79,j=113)
(i=80,j=77) (i=81,j=173) (i=82,j=135) (i=83,j=157) (i=84,j=112)
(i=85,j=191) (i=86,j=5) (i=87,j=112) (i=88,j=152) (i=89,j=205)
(i=90,j=228) (i=91,j=180) (i=92,j=87) (i=93,j=249) (i=94,j=69)
(i=95,j=78) (i=96,j=132) (i=97,j=72) (i=98,j=51) (i=99,j=240)
(i=100,j=170) (i=101,j=70) (i=102,j=31) (i=103,j=61) (i=104,j=169)
(i=105,j=120) (i=106,j=2) (i=107,j=132) (i=108,j=157) (i=109,j=33)
(i=110,j=217) (i=111,j=32) (i=112,j=139) (i=113,j=232) (i=114,j=170)
(i=115,j=147) (i=116,j=171) (i=117,j=36) (i=118,j=196) (i=119,j=156)
(i=120,j=107) (i=121,j=21) (i=122,j=164) (i=123,j=165) (i=124,j=192)
(i=125,j=137) (i=126,j=48) (i=127,j=223) (i=128,j=92) (i=129,j=35)
(i=130,j=6) (i=131,j=206) (i=132,j=80) (i=133,j=139) (i=134,j=117)
(i=135,j=79) (i=136,j=173) (i=137,j=118) (i=138,j=217) (i=139,j=20)
(i=140,j=192) (i=141,j=140) (i=142,j=116) (i=143,j=13) (i=144,j=87)
(i=145,j=227) (i=146,j=80) (i=147,j=57) (i=148,j=212) (i=149,j=210)
(i=150,j=198) (i=151,j=20) (i=152,j=60) (i=153,j=120) (i=154,j=79)
(i=155,j=194) (i=156,j=154) (i=157,j=179) (i=158,j=246) (i=159,j=109)
(i=160,j=56) (i=161,j=187) (i=162,j=204) (i=163,j=203) (i=164,j=90)
(i=165,j=91) (i=166,j=153) (i=167,j=19) (i=168,j=130) (i=169,j=238)
(i=170,j=176) (i=171,j=200) (i=172,j=86) (i=173,j=180) (i=174,j=209)
(i=175,j=245) (i=176,j=183) (i=177,j=222) (i=178,j=123) (i=179,j=148)
(i=180,j=242) (i=181,j=32) (i=182,j=177) (i=183,j=115) (i=184,j=51)
(i=185,j=119) (i=186,j=135) (i=187,j=10) (i=188,j=31) (i=189,j=204)
(i=190,j=64) (i=191,j=143) (i=192,j=59) (i=193,j=78) (i=194,j=193)
(i=195,j=101) (i=196,j=5) (i=197,j=94) (i=198,j=82) (i=199,j=184)
(i=200,j=208) (i=201,j=33) (i=202,j=48) (i=203,j=47) (i=204,j=220)
(i=205,j=17) (i=206,j=217) (i=207,j=105) (i=208,j=129) (i=209,j=158)
(i=210,j=156) (i=211,j=121) (i=212,j=20) (i=213,j=249) (i=214,j=152)
(i=215,j=158) (i=216,j=25) (i=217,j=225) (i=218,j=16) (i=219,j=116)
(i=220,j=33) (i=221,j=119) (i=222,j=158) (i=223,j=77) (i=224,j=162)
(i=225,j=106) (i=226,j=0) (i=227,j=140) (i=228,j=163) (i=229,j=59)
(i=230,j=146) (i=231,j=221) (i=232,j=58) (i=233,j=209) (i=234,j=16)
(i=235,j=119) (i=236,j=50) (i=237,j=116) (i=238,j=224) (i=239,j=137)
(i=240,j=70) (i=241,j=7) (i=242,j=101) (i=243,j=143) (i=244,j=234)
(i=245,j=14) (i=246,j=81) (i=247,j=165) (i=248,j=46) (i=249,j=19)
(i=250,j=177) (i=251,j=133) (i=252,j=239) (i=253,j=163) (i=254,j=189)
(i=255,j=169) (i=0,j=63)
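The trace above appears to follow the usual RC4 pattern: a 256-step key-scheduling pass, then output steps in which i advances from 1 and j selects the swap partner. A minimal sketch of that textbook scheme is given below. It is standard RC4, not the PARC4 variant, and the names rc4_state, rc4_init and rc4_next are illustrative assumptions that do not come from the thesis code.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical container for the S array and the two indices.
typedef struct {
    unsigned char s[256];
    int i, j;
} rc4_state;

static void rc4_swap(unsigned char *s, int a, int b)
{
    unsigned char t = s[a];
    s[a] = s[b];
    s[b] = t;
}

// Key scheduling: 256 iterations that permute S under control of the key.
void rc4_init(rc4_state *st, const unsigned char *key, size_t keylen)
{
    assert(keylen > 0);
    for (int k = 0; k < 256; k++)
        st->s[k] = (unsigned char)k;
    int j = 0;
    for (int k = 0; k < 256; k++) {
        j = (j + st->s[k] + key[k % keylen]) % 256;
        rc4_swap(st->s, k, j);
    }
    // Both indices restart at 0, so the first output step uses i = 1.
    st->i = 0;
    st->j = 0;
}

// One output step: advance i, derive j, swap S[i] and S[j], emit one key byte.
unsigned char rc4_next(rc4_state *st)
{
    st->i = (st->i + 1) % 256;
    st->j = (st->j + st->s[st->i]) % 256;
    rc4_swap(st->s, st->i, st->j);
    return st->s[(st->s[st->i] + st->s[st->j]) % 256];
}
```

With the three-byte key "Key", the first two bytes returned by rc4_next are 0xEB and 0x9F, and i equals 1 after the first call. Printing st->i and st->j after each call would give a list of (i, j) pairs in the same form as the appendix.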
APPENDIX-D
// Encrypted and Decrypted text using PARC4
4aedd1ab7ee3874055cce14283cdeb3fd24c22f7b887706bd127727fc3a3a6
83ab4123e2e9a61692776092b38cabeed57958a13817564eed72c84db44ad2
92d1e14d2d7cf36c357a1919f8c1f9698cd52715ec241f32ad833637adc5a9
039b59222c181519f5bfd8386e3536de7b067bbffb3ab2c3e756686b5bbfa5
6b65f84dfd2b6eaabbc96fdff9e52151537d4eafa39c1c436587fa7e9dbcae
a9a564e118cce2d43436820eb45ab4a9420f5694ebb8f13e6967df9be26d9a
51112bd76401c20c0f2654d80f9fc214185c4dcd51bcc7e0d3da56f5f2648b
076e763dcffebe516155961967d1036779512f6fc5011947326b50ccdc969f
e1ab7bfbd3b079ab975c56dded57cbcae132d9a41d89da23762cd347379c3e
1ad84a2436efcf4ad5bb07a7983a881b4ffd66948a5d97727ff4155e6c87d8
45ba682bf29cd4d476d75fe6d18390f144995ce546d4ec357f16cc22e6e2fd
653d4bb0db9e0ba178d16fec192fb5067ad8422dee0b3b86b267b62c5bbacb
847b7588ed296a4e7aafd93e5ed88521116547c521df929c0d22b5330a8139
bbda2e4af74ee5edeb04d514b2832ef46a15ad515e36b57bac73fa976ef5ab
70deab1c19f9156931b87fd794281ffe267449534f8441a7c2ad513cbe7c5c
2f51a577e67ed8b1f9e2dd413cd67c113760dc16e7ea1e1281353cb9458ec6
8a8ffce674e4d3a26fac855810c5f45a98c4ea72ebdaf3bc9b7d7684353153
dbe2a7c0aa6de5f2be1aa87a7fc2a496faecd7f9e811b877f7ea512587c9fd
841af692a11159bcad76ccc56afd29c9481849fde59295c9717ff628e3c642
8da4d9538a0dd1ee3f476f17efcd87b3596da08f64bdc998832e31676fc6bd
92fc42b82f8a9d80bfe2e7a196fec2955be7557f511fb73fc5c9374b6fe812
98bebbabad61f810a3adf45436d74f35ab1419c1aec6b4792c819fadf74eca
625a8a0582bfc5e401030cdef3843c7e9f8644c4f59ae75abfdfe84b3cba3f
50381db678ef74d3f7e9ef852463aca66ef3771dd19f9fd1d0802121b0e8fd
b8a88f0a978f4d8e87ffec554168be65899c8fa33d81bae8bfd36d6a85263d
4493e7adcfa14070cedb995da3647cc3a094beebd869ad0b9e7a33f911416d
9cdd5eb86a2d1017d5dad077d648f6929f959d4b9dcf4d655d8c59eb68cb7e
7e34b428dc35a1ddcf5f5156d5ff3d48ae05166e4b167f6e0bbb86a232a688
2af88fd46fc25cdfc99b7ffaebb96fa819b4c441456782512fd2fcac468306
bf8c98b8acacac7febefb3ac1b5a4a2675ec5ab24fd312fc603809a59f6d73
cfebc26a4bc1918d45e59137addf1395ccbfeb2654442504c867394faaf423
eb70543e5bbd79e6749ee7f4ea45f0453fdb7be42866de18f8ff5310853839
aa51d1da8b89e5e87ce1cea67ce787421cc8a84287dae17ed54cfaa9d47b6a
c13c244bc0efa689834b39e5faa21bb12b7d81b48ebbe19ceddb3ec985772e
317473090ca14ad6b2a1a1a93c8a1768b4df0868793d1a4c9cd158644d8f4d
c8639c7f6330b524933fb3d24ae9a5556357bdc688e15f7ea58271f6f9baf2
643d2b7980b8c2fa4ef32c899fb29cdae9b288f7829d32013574f275ded36c
989363370e9119da1b9adaf7ae9a1a1a91e405c7937ef5bb3589e16e9764b8
1831becdf7980a63ab6a75815d5505557bcbf37d46d6f5e76d4926c4fcf56b
7c3f55671b044402458b73fee33fdf8f2e70e84b399a79f16677df1df2f211
68a3b66af51c3c4898df2e97fffd9a552e6935f1781a95f81dbef31df14180
bd863b66c974364fc1edbec6b05d3bfac5ecbab2c6e8dadc8bcf793599d2c5
bd7235eab543194c850a86c21e65583d3a43bd74ce997949812f34d9fd0242
c238854d27bc316231b631de3da4c44eabe17775ffa8892ff4f65ec8b72c8f
592f3703f6b619bb797f447b5108ca899bfe3afa6c7f0c68f281191d45623c
f630dacc2e3148ac1f8aedaaaea870e0e7b1a61d19597925f759bb499f19f7
614a9182ff1d63281bb3ab7b5171fc75d11192f8bc175ac4f1e06357432159
c14fb68be54a72a14a516e21bd32e573e0d1dbc247e357369b7ff122722014
febd11d89783c924ad0d98292ebea
//Decrypted text
another algorithm. Schneier designed Blowfish as a general-
purpose algorithm, intended as an alternative to the aging DES
and free of the problems and constraints associated with other
algorithm. Schneier designed Blowfish as a general-purpose
algorithm, intended as an alternative to the aging DES and
free of the problems and constraints associated with other
algorithm. Schneier designed Blowfish as a general-purpose
algorithm, intended as an alternative to the aging DES and
free of the problems and constraints associated with other
algorithm. Schneier designed Blowfish as a general-purpose
algorithm, intended as an alternative to the aging DES and
free of the problems and constraints associated with other
algorithm. Schneier designed Blowfish as a general-purpose
algorithm, intended as an alternative to the aging DES and
free of the problems and constraints associated with other
algorithm. Schneier designed Blowfish as a general-purpose
algorithm, intended as an alternative to the aging DES and
free of the problems
APPENDIX-E
// Encrypted and Decrypted text using PARC4-I
Encrypted text:
b6589fc6ab0dc82cf12099d1c2d40ab994e8410cddfe163345d338193ac2bd
c183f8e9dcff904b43c4a2d99bc28d236098a095277b7eb0718d6be068c4b5
c86bd577da3d93fea7c89cba61c78b48e58911904a4e8b77f6242e2d288705
023adad00a9310fdf8bc5814536f66012884e146a8887a44709a56b7ed0881
90c204b31cd71484e6a1c538986b5f77ccaa8d8dcc7d030cd6a6768db81f90
d0ef976c3d9a7149a5a7786bb368e06d08c5d77774eb43a49e87acec17cd9d
cd20a716cc2cf67417b71c8a70167ed10e4a589c87f9e6a85c22e4b0c38ecf
5f50595dd23b67eb79211cfdddad518279291b117971d3d7c997c777b174bb
05faa82799526f12
Decrypted text:
The incredible growth of the Internet has excited businesses and consumers alike with
its promise of changing the way we live and work. It's extremely easy to buy and sell
goods all over the world while sitting in front of a laptop. But security is a major
concern on the Internet, especially when you're using it to send sensitive information
between parties. Information security is provided on computers and over the Internet
by a variety of methods. A simple but straightforward security method is to only keep
sensitive information on removable storage media like portable flash memory drives or
external hard drives.
APPENDIX-D
// Encrypted and Decrypted text using PBlock
//Encrypted text:
45303030303034367c412d317c41317ca45303030303035307c412e412e4d2
e442e7c41414d447ca45303030303038317c414944532072656c6174656420
636f6d706c65787c414944532d72656c6174656420636f6d706c65787ca453
03030303039387c414e4c4c7c414e4c7ca45303030303134387c416365636c
6964696e7c416365636c6964696e657ca45303030303135347c6163747c416
3747ca45303030303135357c61637469766520766572746963616c20636f72
726563746f727c41637469766520566572746963616c20436f72726563746f
727ca45303030303135387c4164616d732053746f6b6573206469736561736
57c4164616d732d53746f6b657320646973656173657ca4530303030313633
7c61647269616d7963696e7c41647269616d7963696e7ca453030303031363
47c61647269616d7963696e6f6c7c41647269616d7963696e6f6c7ca453030
30303138317c416e63697374726f646f6e7c41676b697374726f646f6e7ca4
5303030303230327c616c6369616e20626c75657c416c6369616e20626c756
57ca45303030303231327c416c6578616e646572205472616c6c69616e7573
7c416c6578616e646572206f66205472616c6c65737ca45303030303232357
c416c6b2e2050686f732e7c416c6b2e2070686f732e7ca4530303030323235
7c616c6b2e2070686f732e7c416c6b2e2070686f732e7ca453030303032333
47c416c75497c416c7520497ca45303030303233397c616c7a6865696d6572
2d747970657c416c7a6865696d65722d747970657ca45303030303335337c4
16d65726963616e2d747970652063756c7475726520636f6c6c656374696f6
e7c416d65726963616e20747970652063756c7475726520636f6c6c6563746
96f6e7ca45303030303336307c416e616261656e617c416e6162656e617ca4
5303030303337397c616e676f72617c416e676f72617ca4530303030333831
7c616e746172637469637c416e746172637469637ca45303030303338327c6
16e746172637469637c416e746172637469637ca45303030303338397c6170
617274686569647c4170617274686569647ca45303030303430317c4170726
573736f6c696e657c41707265736f6c696e657ca45303030303432337c4172
61417c41726120417ca45303030303432337c61726120417c41726120417ca
45303030303432337c6172612d417c41726120417ca45303030303432337c6
17261417c41726120417ca45303030303432387c61726368626973686f707c
41726368626973686f707ca45303030303433327c6172656368696e657c417
2656368696e657ca45303030303433377c417267796c6c2d526f6265727473
6f6e20707570696c7c417267796c6c2052
// Decrypted text:
E0000046|A-1|A1|
E0000050|A.A.M.D.|AAMD|
E0000081|AIDS related complex|AIDS-related complex|
E0000098|ANLL|ANL|
E0000148|Aceclidin|Aceclidine|
E0000154|act|Act|
E0000155|active vertical corrector|Active Vertical Corrector|
E0000158|Adams Stokes disease|Adams-Stokes disease|
E0000163|adriamycin|Adriamycin|
E0000164|adriamycinol|Adriamycinol|
E0000181|Ancistrodon|Agkistrodon|
E0000202|alcian blue|Alcian blue|
E0000212|Alexander Trallianus|Alexander of Tralles|
E0000225|Alk. Phos.|Alk. phos.|
E0000225|alk. phos.|Alk. phos.|
E0000234|AluI|Alu I|
E0000239|alzheimer-type|Alzheimer-type|
E0000353|American-type culture collection|American type culture collection|
E0000360|Anabaena|Anabena|
E0000379|angora|Angora|
E0000381|antarctic|Antarctic|
E0000382|antarctic|Antarctic|
E0000389|apartheid|Apartheid|
E0000401|Apressoline|Apresoline|
E0000423|AraA|Ara A|
E0000423|ara A|Ara A|
E0000423|ara-A|Ara A|
E0000423|araA|Ara A|
E0000428|archbishop|Archbishop|
E0000432|arechine|Arechine|
E0000437|Argyll-Robertson pupil|Argyll R
// Text encryption using PARC4
//Encrypted text (256 bytes)
7bb88cfd22ec81414cab4558992b739d05d52abeec97233c17d362a80e2a59
af51028bfaff64adc637186a6d2eeb8c652cc894fc02421bf534a6fdfd754a
b76714816cad38572c15fe7c9c3cc184f5b9a8d5a355fdbcfe699e6362688e
bc96eead356bde743c92fb9485f363ff6e765ade9befe3a76643d86ecbdaf1
3f45adf8cc5b1e4f6abdbbcf8925d56114b255e1bf46c99c675579b34191fd
feb0fb21b04d8aa3584a1c6565b557f5158e41f33344ecde4ea3862cacff7c
dfaf1512ad44d4a6497ab7041c6a0ed621510a1c967f8c0ba167de12f1771f
b067b239daa8a1b4551c0798124472327cd14a4fd1440df6b76e210ce82d2c
ef1f62becdef12dbfdb7139fe43889eb5638e5557feed972034c033327e89e
0f0c0ae4679ecacfb4ba36c2080f2d3efb4925c2df1bcd2d75b744a34d9805
2fa2c731d46c2838476965bb5c6cccf191e5acc825938a876a5689838306c8
99ce3db18754e4e74b9aeadc582f75438ff8366a5e8bfab3f7f693dd4ecbcf
b14f459dd8bc7e7b1a4e7d2f3a799a5242422a5b1ea76bcc9222b24b119cdf
ea1b3ae27eb468aa05f2403c31b351f710de17f13747ead742ac827ef9fc63
ddfa4212fc214466996ff7142c5fdf166141841b93cfd9ebc437fe02997ddb
264e327dfa8a3ba049279d5244877719e47a1ff1010d0667fe5189b83d59da
5f4
//Decrypted text
b6589fc6ab0dc82cf12099d1c2d40ab994e8410cddfe163345d338193ac2bd
c183f8e9dcff904b43c4a2d99bc28d236098a095277b7eb0718d6be068c4b5
c86bd577da3d93fea7c89cba61c78b48e58911904a4e8b77f6242e2d288705
023adad00a9310fdf8bc5814536f66012884e146a8887a44709a56b7ed0881
90c204b31cd71484e6a1c538986b5f77ccaa8d8dcc7d030cd6a6768db81f90
d0ef976c3d9a7149a5a7786bb368e06d08c5d77774eb43a49e87acec17cd9d
cd20a716cc2cf67417b71c8a70167ed10e4a589c87f9e6a85c22e4b0c38ecf
5f50595dd23b67eb79211cfdddad518279291b117971d3d7c997c777b174bb
05faa82799526f12
// Text encryption using PARC4-I (256 bytes)
//Encrypted text
2da3d9f37cba820429dbf78f99b23edda6a8ef9779679d6064428bb0fd93fe
122eb0fef04da06d21dbf781bcee95e9f8e18972822bd571b3e8cd21f27128
4d11918c8c7b975ee5c5c19e1f44fc38b83b9d552ae359c3e31648c59ce3bb
78cdb2e4423455ab97dbaa03faa8467aabfbbe46c61316984e3d5ab12a823d
98ec4e2b5f3e182a294cb85b151726594aaf6e9b9771592eb14ccef9feb9aa
23ea4ef8f4577426f64e751f718dd45f93a47d1db4ff38e2eabad68a5f9424
fa8017163fccfe2a1393a3a967411ea12c650e0cbec412ee47b47aaee6cb12
7dbabf9b75746502d8921b621219c1ba2fe45118a617ee51ace8586c9aca32
ef485f929bfdc348c9b068f9bbd3b8e5750a7b8ca7d6497666012d5b5af9fa
c1764b8fcae1ba13e7a80f3d3eab89dacbc149977976b06163a93894f8287a
4f4793df867f908b4c1c49d491bdc58c76e5cd2bab6b993d3f39a673c569b7
8058e6e51532cffc5d1f15d6eafd131fab6bfe039673d3886bf80a849a970d
edcc0bdbca9ead1f7c4c85e141132f261da03bcfcc777929ee1a91f9feeefe
72eab1f8a15b51173f6ce7df349d941fb331bd8da18a08f24aae167f4f6164
e9114b4739caa72f4dc6a8a43210485c199f4b09ab61461e219b7fabd3ab17
689f1a9b85de354328b20b320219c1daff94646d36373ec12d3d581c5a6a9
// Decrypted text
b6589fc6ab0dc82cf12099d1c2d40ab994e8410cddfe163345d338193ac2bd
c183f8e9dcff904b43c4a2d99bc28d236098a095277b7eb0718d6be068c4b5
c86bd577da3d93fea7c89cba61c78b48e58911904a4e8b77f6242e2d288705
023adad00a9310fdf8bc5814536f66012884e146a8887a44709a56b7ed0881
90c204b31cd71484e6a1c538986b5f77ccaa8d8dcc7d030cd6a6768db81f90
d0ef976c3d9a7149a5a7786bb368e06d08c5d77774eb43a49e87acec17cd9d
cd20a716cc2cf67417b71c8a70167ed10e4a589c87f9e6a85c22e4b0c38ecf
5f50595dd23b67eb79211cfdddad518279291b117971d3d7c997c777b174bb
05faa82799526f12
APPENDIX-E
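// Modified encryption function for PARC4
unsigned char rc4_output()
{
    // Standard RC4 step: advance i, derive j, swap S[i] and S[j].
    i1 = (i1 + 1) % 256;
    j1 = (j1 + s[i1]) % 256;
    swap(s, i1, j1);
    // PARC4 modification: advance both indices once more before the output
    // lookup. They are wrapped here so that they stay inside the S array,
    // which is assumed to hold 256 entries.
    i1 = (i1 + 1) % 256;
    j1 = (j1 + 1) % 256;
    return s[((s[i1] + s[j1]) % 256)];
}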
// Parallel region for encryption in PARC4
// All variables are shared by default; i is declared private so that each
// thread keeps its own index into the key stream s2.
#pragma omp parallel for default(shared) private(i)
for (int x = 0; x < block; x++)
{
    i = 0;                           // restart the key index for every block
    int y = x * 256, end = y + 256;  // block x covers bytes [y, end)
    while (y < end)
    {
        // XOR each plaintext byte with a key byte offset by the block number x,
        // so that each block is combined with a different stream.
        enblock[y] = (memblock[y] ^ ((s2[i++] + x) % 256));
        y++;
    }
}
// End of parallel region
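The loop above can be exercised in isolation. The following is a minimal sketch, assuming a 256-byte key stream that is reused for every block with the block number x added to each key byte, as in the listing. The names xor_blocks, BLOCK_BYTES and ks are illustrative assumptions, and the _OPENMP guard is only there so that the code also builds when OpenMP is switched off.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_BYTES 256

// XOR nblocks blocks of BLOCK_BYTES bytes with the key stream ks.
// Block x uses (ks[i] + x) % 256 as its i-th key byte, mirroring the listing.
void xor_blocks(unsigned char *out, const unsigned char *in,
                int nblocks, const unsigned char ks[BLOCK_BYTES])
{
    assert(nblocks >= 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int x = 0; x < nblocks; x++) {
        // base and i are declared inside the loop, so each thread has its own copy.
        size_t base = (size_t)x * BLOCK_BYTES;
        for (int i = 0; i < BLOCK_BYTES; i++) {
            unsigned char k = (unsigned char)((ks[i] + x) % 256);
            out[base + i] = in[base + i] ^ k;
        }
    }
}
```

Because XOR is its own inverse, calling xor_blocks a second time with the same key stream restores the input, so one routine serves for both encryption and decryption. Declaring the per-block variables inside the loop body removes the need for a private clause.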
APPENDIX-F
// modified encryption method to return four distinct bytes using PARC4-I
unsigned char * PARC4I_output()
{
    // Each of the four state arrays s1..s4 takes one RC4-style step.
    // The index i is common to all four and advances once per state array.

    // State array 1
    i = (i + 1) % 256;
    j1 = (j1 + s1[i]) % 256;
    swap_s1(s1, i, j1);
    V1 = (s1[i] + s1[j1]) % 256;
    index1[0] = V1;

    // State array 2
    i = (i + 1) % 256;
    j2 = (j2 + s2[i]) % 256;
    swap_s2(s2, i, j2);
    V2 = (s2[i] + s2[j2]) % 256;
    index1[1] = V2;

    // State array 3
    i = (i + 1) % 256;
    j3 = (j3 + s3[i]) % 256;
    swap_s3(s3, i, j3);
    V3 = (s3[i] + s3[j3]) % 256;
    index1[2] = V3;

    // State array 4
    i = (i + 1) % 256;
    j4 = (j4 + s4[i]) % 256;
    swap_s4(s4, i, j4);
    V4 = (s4[i] + s4[j4]) % 256;
    index1[3] = V4;

    // Advance every j by one; the next call reduces it modulo 256 again.
    j1++;
    j2++;
    j3++;
    j4++;
    return index1;  // four bytes, one per state array
}
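The four near-identical steps in the function above can also be written as a loop over an array of state tables. The sketch below is one possible restructuring, assuming the same update rule as the listing, including the fact that the listing stores the value V itself rather than s[V]. The names parc4i_state, parc4i_next4 and PARC4I_LANES are hypothetical and are not part of the thesis code.

```c
#include <assert.h>

#define PARC4I_LANES 4

// Hypothetical container: four S arrays, one j per array, one common i.
typedef struct {
    unsigned char s[PARC4I_LANES][256];
    int j[PARC4I_LANES];
    int i;
} parc4i_state;

// Produce four bytes per call, one from each state array.
void parc4i_next4(parc4i_state *st, unsigned char out[PARC4I_LANES])
{
    assert(st->i >= 0 && st->i < 256);
    for (int k = 0; k < PARC4I_LANES; k++) {
        unsigned char *s = st->s[k];
        st->i = (st->i + 1) % 256;               // i advances once per array
        st->j[k] = (st->j[k] + s[st->i]) % 256;
        unsigned char t = s[st->i];              // swap S[i] and S[j]
        s[st->i] = s[st->j[k]];
        s[st->j[k]] = t;
        // As in the listing, the stored byte is (S[i] + S[j]) % 256 itself.
        out[k] = (unsigned char)((s[st->i] + s[st->j[k]]) % 256);
    }
    // Extra increment of every j, wrapped so that it stays in 0..255.
    for (int k = 0; k < PARC4I_LANES; k++)
        st->j[k] = (st->j[k] + 1) % 256;
}
```

Starting from four identity tables with i and every j set to 0, the first call returns the bytes 2, 4, 6 and 8 and leaves i at 4. Wrapping j immediately gives the same sequence as the listing, which applies the modulo on the next call instead.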
// Parallel region of PARC4-I to encrypt multiple data blocks simultaneously
#pragma omp parallel default(shared) private(i)
{
    #pragma omp for
    for (int x = 0; x < block; x++)
    {
        i = 0;                           // each thread restarts its private key index
        int y = x * 256, end = y + 256;  // block x covers bytes [y, end)
        while (y < end)
        {
            // Four bytes per pass, each XORed with a key byte offset by x
            enblock[y]   = (memblock[y]   ^ (((temp[i])   + x) % 256));
            enblock[y+1] = (memblock[y+1] ^ (((temp[i+1]) + x) % 256));
            enblock[y+2] = (memblock[y+2] ^ (((temp[i+2]) + x) % 256));
            enblock[y+3] = (memblock[y+3] ^ (((temp[i+3]) + x) % 256));
            i = i + 4;
            y = y + 4;
        }
    }
}
APPENDIX-G
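// Parallel region for encryption/decryption in PBlock
// The clause list continues on the next line, so the pragma needs a trailing backslash.
#pragma omp parallel default(none) \
        shared(ctx,enblock,memblock,block,size)
{
    // Encryption: each iteration handles one block of memblock
    #pragma omp for
    for (int x = 0; x < block; x++)
    {
        int y = x * 64;
        int L1 = y, R1 = y + 32;  // start offsets of the left and right halves
        Blowfish_Encrypt(&ctx, L1, R1, memblock);
    }
    // The omp for above already ends with an implicit barrier, so every block
    // is encrypted before decryption starts; the explicit barrier is redundant
    // but harmless.
    #pragma omp barrier
    // Decryption: the same block layout, processed in parallel
    #pragma omp for
    for (int x = 0; x < block; x++)
    {
        int y = x * 64;
        int L1 = y, R1 = y + 32;
        Blowfish_Decrypt(&ctx, L1, R1, memblock);
    }
}
// Parallel region ends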
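The block-parallel pattern of this region can be tried without the Blowfish sources. The sketch below is only an illustration under stated assumptions: toy_encrypt and toy_decrypt are a small four-round Feistel stand-in, not Blowfish and not secure, and pblock_apply, toy_f and the four-word key are hypothetical names introduced here. Each 64-bit block is held as two 32-bit words and processed independently, which is what makes the loop safe to parallelise.

```c
#include <assert.h>
#include <stdint.h>

// Placeholder round function. This is NOT Blowfish; it only gives the
// Feistel structure something to mix with.
static uint32_t toy_f(uint32_t half, uint32_t k)
{
    return (half * 2654435761u) ^ k;
}

// Four Feistel rounds on one block: (L, R) -> (R, L ^ F(R, k)).
static void toy_encrypt(uint32_t *l, uint32_t *r, const uint32_t key[4])
{
    for (int round = 0; round < 4; round++) {
        uint32_t t = *r;
        *r = *l ^ toy_f(*r, key[round]);
        *l = t;
    }
}

// Inverse: the same rounds undone in reverse key order.
static void toy_decrypt(uint32_t *l, uint32_t *r, const uint32_t key[4])
{
    for (int round = 3; round >= 0; round--) {
        uint32_t t = *l;
        *l = *r ^ toy_f(*l, key[round]);
        *r = t;
    }
}

// Encrypt (decrypt == 0) or decrypt (decrypt != 0) nblocks blocks in place.
// Block x occupies words[2*x] and words[2*x + 1]; blocks do not overlap.
void pblock_apply(uint32_t *words, int nblocks, const uint32_t key[4], int decrypt)
{
    assert(nblocks >= 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int x = 0; x < nblocks; x++) {
        uint32_t *l = &words[2 * x];
        uint32_t *r = &words[2 * x + 1];
        if (decrypt)
            toy_decrypt(l, r, key);
        else
            toy_encrypt(l, r, key);
    }
}
```

Calling pblock_apply once with decrypt set to 0 and once with it set to 1 returns the buffer to its original contents. Splitting the two phases into separate calls replaces the barrier between the two loops of the listing, since the first call has finished before the second begins.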