A VHDL Scalable Encryption Algorithm

Abstract

SEA is a scalable encryption algorithm targeted at small embedded applications. It was initially designed for software implementations in controllers, smart cards, or processors. In this letter, we investigate its performance in recent field-programmable gate array (FPGA) devices. For this purpose, a loop architecture of the block cipher is presented. Beyond its low cost, a significant advantage of the proposed architecture is its full flexibility for any parameter of the scalable encryption algorithm, achieved through generic VHDL coding. The letter also carefully describes the implementation details that allow us to keep area requirements small. Finally, a comparative performance discussion of SEA with the Advanced Encryption Standard Rijndael and ICEBERG (a cipher designed for efficient FPGA implementations) is presented. It illustrates the interest of platform/context-oriented block cipher design and, as far as SEA is concerned, its low area requirements and reasonable efficiency.

The scalable encryption algorithm (SEA) is a parametric block cipher for resource-constrained systems (e.g., sensor networks, RFIDs) that was introduced in [1]. It was initially designed as a low-cost encryption/authentication routine (i.e., with small code size and memory) targeted for processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation, and modular addition). Additionally, and contrary to most recent block ciphers (e.g., the DES [2] and AES Rijndael [3], [4]), the algorithm takes the plaintext, key, and bus sizes as parameters and can therefore be straightforwardly adapted to various implementation contexts and/or security requirements. Compared to older solutions for low-cost encryption such as the tiny encryption algorithm (TEA) [5] or Yuval’s proposal [6], SEA also benefits from a stronger security analysis, derived from recent advances in block cipher design/cryptanalysis.

In practice, SEA has proven to be an efficient solution for embedded software applications using microcontrollers, but its hardware performance has not yet been investigated. Consequently, and as a first step towards hardware performance analysis, this letter explores the features of a low-cost field-programmable gate array (FPGA) encryption/decryption core for SEA. In addition to the performance evaluation, we show that the algorithm’s scalability can be turned into a fully generic VHDL design, so that any text, key, and bus size can be straightforwardly reimplemented with standard synthesis and implementation tools, without any modification of the hardware description.


CONTENTS

CHAPTER 1: Introduction to VLSI

1.1 Introduction
1.2 VLSI Design Style
1.3 VLSI Design Flow
1.4 VLSI Features

CHAPTER 2: Introduction to VHDL

2.1 Introduction
2.2 Capabilities
2.3 Abstraction levels of VHDL
2.4 Basic Terminology
2.5 Modeling Techniques for VHDL
2.6 Process Statements
2.7 Conditional Statements
2.8 Active HDL Overview
2.9 Macro language
2.10 Compilation
2.11 Simulation
2.12 Xilinx

CHAPTER 3: Introduction to SEA

3.1 Specifications
3.2 Design properties
3.3 Overall Structure
3.4 Security Analysis
3.5 Performance Analysis

CHAPTER 4: An Exposition Of SEA

4.1 Overview of SEA

CHAPTER 5: SEA Architecture

5.1 Key Generation
5.2 Encryption
5.3 Decryption

Appendix-I Simulation Results

Appendix-II Synthesis Reports

Appendix-III Implementation

Appendix-IV Advantages

Appendix-V Conclusion

Appendix-VI Bibliography

CHAPTER 1: INTRODUCTION TO VLSI

The first digital circuits were designed using electronic components such as vacuum tubes and transistors. Later, integrated circuits (ICs) were invented, allowing a designer to place digital circuits on a chip. An IC containing fewer than 10 gates is classed as SSI (Small Scale Integration). With the advent of new fabrication techniques, designers could place more than 100 gates on an IC, a scale called MSI (Medium Scale Integration). At this level, one can create digital sub-blocks (adders, multiplexers, counters, registers, etc.) on an IC. The next level is LSI (Large Scale Integration); using this scale of integration, designers succeeded in building digital subsystems (microprocessors, I/O peripheral devices, etc.) on a chip.

At this point the design process became very complicated: manual conversion from schematic level to gate level, or from gate level to layout level, was becoming lengthy, and verifying the functionality of digital circuits at the various levels became critical. This created new challenges for digital designers as well as circuit designers, who felt the need to automate these processes. Meanwhile, rapid advances in software technology and the development of new higher-level programming languages took place. This made it possible to develop CAD/CAE (Computer Aided Design/Computer Aided Engineering) tools for designing electronic circuits with the assistance of software programs. Functional verification and logic verification of a design can be done with CAD simulation tools with greater efficiency, so it became much easier for a designer to verify the functionality of a design at various levels.

With the advent of CMOS (Complementary Metal Oxide Semiconductor) process technology, one can fabricate a chip containing more than a million gates. At this scale the design process became even more critical because of the manual conversion of the design from one level to another. The latest CAD tools address this problem: with logic synthesis tools, a design engineer can easily translate a higher-level design description to lower levels. This way of designing (using CAD tools) is certainly a revolution in the electronics industry, and it is leading to the development of sophisticated electronic products for both consumer and business use. Designing systems in hardware generally gives better results than software in terms of speed, reliability, and performance. Using the CMOS VLSI design methodology, a designer can design and fabricate ICs in much less time than with the traditional way of designing.

1.2 TYPICAL IC DESIGN FLOW:

[Figure: typical IC design flow. Block labels: Specifications; Behavioral Description; Behavioral Simulation; Behavioral Synthesis; RTL Description; Functional Simulation; Logic Synthesis; Constraints; Library; Gate-Level Netlist; Logic Simulation; Automatic P&R; Constraints; Layout; Layout Management; Fabrication.]

1.3 MICRON TECHNOLOGY

Process technology can be classified into four categories, evolving from micron technology and extending down to VDSM.

Micron Technology: Technology with feature sizes down to 1 µm (10^-6 m) is micron technology.

Submicron Technology: Technology below 1 µm is known as submicron technology. It generally extends down to 0.36 µm.

DSM (Deep Sub Micron technology): Technology extending down to 0.18 µm is DSM.

VDSM (Very Deep Sub Micron technology): The presently used technology is VDSM. It extends down to 0.09 µm.

1.4 FEATURES:

[Figure: features of the Micron, SM, DSM, and VDSM technologies.]

2.1 INTRODUCTION TO VHDL

VHDL is an acronym for VHSIC Hardware Description Language, and VHSIC is an acronym for Very High Speed Integrated Circuits. It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level.

The VHDL language can be regarded as an integrated amalgamation of the following languages:

➢ Sequential language

➢ Concurrent language

➢ Net-list language

➢ Timing specifications

➢ Waveform generation language

The language defines not only the syntax but also very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. A subset of the language is usually sufficient to model most applications. The complete language, however, has sufficient power to capture descriptions ranging from the most complex chips to a complete electronic system.

HISTORY:

The requirements for the language were first generated in 1981 under the VHSIC program of the U.S. Department of Defense (DoD). Reprocurement and reuse were also big issues. Thus, a need arose for a standardized hardware description language for the design, documentation, and verification of digital systems. The IEEE standardized the VHDL language in December 1987; this version of the language is known as IEEE Std 1076-1987. The official language description appears in the IEEE Standard VHDL Language Reference Manual, available from the IEEE. The language has also been recognized as an American National Standards Institute (ANSI) standard.

According to IEEE rules, an IEEE standard has to be reballoted every 5 years so that it may remain a standard. Consequently, the language was upgraded with new features, the syntax of many constructs was made more uniform, and many ambiguities present in the 1987 version of the language were resolved. This new version of the language is known as IEEE Std 1076-1993.

2.2 CAPABILITIES:

The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages.

The language can be used as exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers.

The language can be used as a communication medium between different CAD and CAE tools.

The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components; each component, in turn, can be modeled as a set of interconnected subcomponents.

The language supports flexible design methodologies: top-down, bottom-up, or mixed. It supports both synchronous and asynchronous timing models.

Various digital modeling techniques, such as finite-state machine descriptions and Boolean equations, can be modeled using the language.

The language is publicly available, human-readable, and machine-readable.

The language supports three basic description styles: structural, dataflow, and behavioral.

It supports a wide range of abstraction levels ranging from abstract behavioral descriptions to very precise gate-level descriptions.

Arbitrarily large designs can be modeled using the language, and there are no limitations imposed by the language on the size of the design.


2.3 HARDWARE ABSTRACTION:

VHDL is used to describe a model for a digital hardware device. This model specifies the external view of the device and one or more internal views. The internal view of the device specifies functionality or structure, while the external view specifies the interface of the device through which it communicates with the other modules in the environment. In VHDL each device model is treated as a distinct representation of a unique device, called an Entity. The Entity is thus a hardware abstraction of the actual hardware device. Each Entity is described using one model, which contains one external view and one or more internal views.

2.4 Basic terminology:

VHDL is a hardware description language that can be used to model a digital system. A hardware abstraction of this digital system is called an entity. An entity X, when used in another entity Y, becomes a component for the entity Y. To describe an entity, VHDL provides five different types of primary constructs, called design units. They are:

1. Entity declaration
2. Architecture body
3. Configuration declaration
4. Package declaration
5. Package body

1. An entity is modeled using an entity declaration and at least one architecture body. The entity declaration describes the external view of the entity, for example, the input and output signal names.

2. The architecture body contains the internal description of the entity, for example, as a set of interconnected components that represents the structure of the entity, or as a set of concurrent or sequential statements that represents the behavior of the entity.

3. A configuration declaration is used to create a configuration for an entity. It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity. It may also specify the bindings of the components used in the selected architecture body to other entities. An entity may have any number of configurations.

4. A package declaration encapsulates a set of related declarations, such as type declarations, subtype declarations, and subprogram declarations, which can be shared across two or more design units.

5. A package body contains the definitions of the subprograms declared in a package declaration.
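As a minimal sketch of the last two design units, the following package declares a subtype and a function, and its package body supplies the function definition. The names EXAMPLE_PKG, NIBBLE, and PARITY are hypothetical and chosen only for illustration; they do not come from the SEA design.

```vhdl
-- Hypothetical package: EXAMPLE_PKG, NIBBLE and PARITY are illustrative
-- names only, not part of the SEA design described in this report.
package EXAMPLE_PKG is
  subtype NIBBLE is BIT_VECTOR(3 downto 0);      -- a subtype declaration
  function PARITY (V : BIT_VECTOR) return BIT;   -- a subprogram declaration
end package EXAMPLE_PKG;

package body EXAMPLE_PKG is
  -- Definition of the function declared above: XOR of all bits of V.
  function PARITY (V : BIT_VECTOR) return BIT is
    variable RESULT : BIT := '0';
  begin
    for I in V'range loop
      RESULT := RESULT xor V(I);
    end loop;
    return RESULT;
  end function PARITY;
end package body EXAMPLE_PKG;
```

Another design unit could then share these declarations by placing `use WORK.EXAMPLE_PKG.all;` before its entity declaration, assuming the package has been compiled into the working library.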


Once an entity has been modeled, it needs to be validated by a VHDL system. A typical VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks.

The language is case insensitive; that is, lowercase and uppercase characters are treated alike. The language is also free format. Comments are specified in the language by preceding the text with two consecutive dashes (--).

Entity Declaration:

The entity declaration specifies the name of the entity being modeled and lists the set of interface ports. Ports are signals through which the entity communicates with other models in its external environment.

EXAMPLE:

The entity declaration for the half-adder circuit is:

entity HALF_ADDER is
  port (A, B : in BIT; SUM, CARRY : out BIT);
end HALF_ADDER;

The entity, called HALF_ADDER, has two input ports, A and B, and two output ports, SUM and CARRY. BIT is a predefined type of the language.

Architecture Body:

An architecture body specifies the internal details of an entity using any of the following modeling styles:

1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent data flow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three.

2.5 Structural style of modeling:

In this style, an entity is described as a set of interconnected components. Such a model for the HALF_ADDER entity is described in an architecture body:

architecture HA of HALF_ADDER is
  component XOR2
    port (X, Y : in BIT; Z : out BIT);
  end component;
  component AND2
    port (L, M : in BIT; N : out BIT);
  end component;
begin
  X1 : XOR2 port map (A, B, SUM);
  A1 : AND2 port map (A, B, CARRY);
end HA;

The name of the architecture body is HA. The entity declaration for HALF_ADDER specifies the interface ports for this architecture body. The architecture body is composed of two parts: the declarative part and the statement part. Two component declarations are present in the declarative part of the architecture body.

The declared components are instantiated in the statement part of the architecture body using component instantiation. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by the position.

DATAFLOW STYLE OF MODELING:

In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
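The two concurrent signal assignment statements mentioned above can be sketched as follows. The entity is repeated so that the example is self-contained, and the architecture name HA_DATAFLOW is an assumption made for illustration.

```vhdl
-- Entity repeated here so the sketch is self-contained.
entity HALF_ADDER is
  port (A, B : in BIT; SUM, CARRY : out BIT);
end HALF_ADDER;

-- HA_DATAFLOW is an illustrative architecture name.
architecture HA_DATAFLOW of HALF_ADDER is
begin
  -- Both statements are concurrent: each is re-evaluated whenever
  -- an event occurs on A or B, regardless of textual order.
  SUM   <= A xor B;
  CARRY <= A and B;
end HA_DATAFLOW;
```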

BEHAVIORAL STYLE OF MODELING:

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. This set of sequential statements, which is specified inside a process statement, does not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.

MIXED STYLE OF MODELING:

It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body, we could use component instantiation statements, concurrent signal assignment statements and process statements.
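A sketch of a mixed-style architecture is given below for a one-bit full adder. It combines one component instantiation, one process statement, and one concurrent signal assignment. The entity and architecture names (XOR2, FULL_ADDER, FA_MIXED) and the internal signal S1 are assumptions for illustration; the helper gate XOR2 is defined only so that the example is self-contained.

```vhdl
-- Helper gate, defined here only so the sketch is self-contained.
entity XOR2 is
  port (X, Y : in BIT; Z : out BIT);
end XOR2;

architecture DATAFLOW of XOR2 is
begin
  Z <= X xor Y;
end DATAFLOW;

entity FULL_ADDER is
  port (A, B, CIN : in BIT; SUM, COUT : out BIT);
end FULL_ADDER;

architecture FA_MIXED of FULL_ADDER is
  component XOR2
    port (X, Y : in BIT; Z : out BIT);
  end component;
  signal S1 : BIT;                       -- internal net: A xor B
begin
  -- Structural part: component instantiation.
  X1 : XOR2 port map (A, B, S1);

  -- Behavioral part: process statement computing the carry.
  process (A, B, CIN)
    variable T1, T2, T3 : BIT;
  begin
    T1 := A and B;
    T2 := B and CIN;
    T3 := A and CIN;
    COUT <= T1 or T2 or T3;
  end process;

  -- Dataflow part: concurrent signal assignment.
  SUM <= S1 xor CIN;
end FA_MIXED;
```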

MODEL ANALYSIS:

Once an entity is described in VHDL, it can be validated using an analyzer and a simulator that are part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.

There is a design library with the logical name STD predefined by the VHDL language environment. This library contains two packages: STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There also exists an IEEE standard package called STD_LOGIC_1164, which defines a nine-value logic type (STD_ULOGIC), its associated subtypes, overloaded operator functions, and other useful utilities. This standard is called IEEE Std 1164-1993.

SIMULATION:

For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:

1. An entity declaration and an architecture body pair.

2. A configuration

Preceding the actual simulation are two major steps:

1. Elaboration phase: In this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.

2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.

Simulation commences by advancing time to that of the next event. Values that are scheduled to be assigned to signals at this time are assigned. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs, depending on the implementation of the VHDL system, or when the maximum time as defined by the language is reached.

Entity Declaration:

An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their modes, and the types of the ports. The syntax for an entity declaration is:

entity entity-name is
  [generic (list-of-generics-and-their-types);]
  [port (list-of-interface-port-names-and-their-types);]
  [entity-item-declarations]
[begin
  entity-statements]
end [entity] [entity-name];

The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:

1. in: The value of an input port can only be read within the entity model.

2. out: The value of an output port can only be updated within the entity model.

3. inout: The value of a bidirectional port can be read and updated within the entity model.

4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration.
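The entity syntax above, including a generic and ports of modes in and out, can be illustrated with a small width-parameterized register. This is only a sketch: the names WORD_REG, WIDTH, CLK, LOAD, D, and Q are hypothetical, and the default width of 8 is arbitrary.

```vhdl
-- Hypothetical width-parameterized register (names are illustrative).
entity WORD_REG is
  generic (WIDTH : POSITIVE := 8);                  -- arbitrary default
  port (CLK  : in  BIT;                             -- mode in: read only
        LOAD : in  BIT;
        D    : in  BIT_VECTOR(WIDTH - 1 downto 0);
        Q    : out BIT_VECTOR(WIDTH - 1 downto 0)); -- mode out: updated only
end entity WORD_REG;

architecture BEHAVIOR of WORD_REG is
begin
  process (CLK)
  begin
    -- On a rising clock edge, capture D when LOAD is asserted.
    if CLK'event and CLK = '1' then
      if LOAD = '1' then
        Q <= D;
      end if;
    end if;
  end process;
end architecture BEHAVIOR;
```

Because the port widths depend only on the generic, the same description can be instantiated for any word size by writing, for example, `generic map (WIDTH => 16)` in the instantiating design.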

ARCHITECTURE BODY:

An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity.

architecture <architecture name> of <entity name> is
begin
  concurrent statements, such as:
    process statement
    block statement
    concurrent signal assignment statement
    component instantiation statement
    generate statement
end [architecture] [architecture name];

The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow and sequential behavior.

Here we describe an entity by using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.

2.6 PROCESS STATEMENT:

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:

[process-label:] process [(sensitivity-list)] [is]
begin
  sequential statements; these are:
    variable-assignment-statement
    signal-assignment-statement
    wait-statement
    if-statement
    case-statement
    loop-statement
    null-statement
    exit-statement
    next-statement
    assertion-statement
    report-statement
    procedure-call-statement
    return-statement
end process [process-label];

The set of signals to which the process is sensitive is defined by the sensitivity list. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
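As a minimal sketch of a process with a sensitivity list, the half adder can be written in the behavioral style as follows. The entity is repeated for self-containment and the architecture name HA_BEHAVIOR is an assumption.

```vhdl
-- Entity repeated here so the sketch is self-contained.
entity HALF_ADDER is
  port (A, B : in BIT; SUM, CARRY : out BIT);
end HALF_ADDER;

-- HA_BEHAVIOR is an illustrative architecture name.
architecture HA_BEHAVIOR of HALF_ADDER is
begin
  -- The process is executed each time an event occurs on A or B,
  -- runs its statements in order, then suspends until the next event.
  process (A, B)
  begin
    SUM   <= A xor B;
    CARRY <= A and B;
  end process;
end HA_BEHAVIOR;
```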

VARIABLE ASSIGNMENT STATEMENT:

Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement that typically has the form

variable-object := expression;

The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.

A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.

SIGNAL ASSIGNMENT STATEMENT:

Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:

signal-object <= expression [after delay-value];

A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.

When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other statements that appear within the process.
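The difference between variable assignment (immediate) and sequential signal assignment (takes effect only after the process suspends) can be seen in the following sketch. The entity ASSIGN_DEMO and all of its port, signal, and variable names are hypothetical.

```vhdl
-- Hypothetical demonstration entity (all names are illustrative).
entity ASSIGN_DEMO is
  port (CLK       : in  BIT;
        COUNT_VAR : out INTEGER;
        COUNT_SIG : out INTEGER);
end ASSIGN_DEMO;

architecture DEMO of ASSIGN_DEMO is
  signal S : INTEGER := 0;
begin
  process (CLK)
    variable V : INTEGER := 0;
  begin
    if CLK'event and CLK = '1' then
      V := V + 1;       -- variable: V holds the new value immediately
      S <= S + 1;       -- signal: S keeps its old value until the process suspends
      COUNT_VAR <= V;   -- uses the already incremented V
      COUNT_SIG <= S;   -- uses the not yet incremented S
    end if;
  end process;
end DEMO;
```

In this sketch COUNT_SIG therefore lags COUNT_VAR by one count after each rising clock edge.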

2.7 CONDITIONAL STATEMENTS:

IF STATEMENT: An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:

if boolean-expression then
  sequential-statements
{elsif boolean-expression then
  sequential-statements}
[else
  sequential-statements]
end if;

The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is executed. An if statement is also a sequential statement.
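Because the conditions are checked in order until the first true one is found, an if statement naturally describes priority logic. The following sketch of a three-input priority encoder illustrates this; the names PRIORITY_ENC, REQ, and GRANT and the output encoding are assumptions made for the example.

```vhdl
-- Hypothetical priority encoder (names and encoding are illustrative).
entity PRIORITY_ENC is
  port (REQ   : in  BIT_VECTOR(2 downto 0);
        GRANT : out BIT_VECTOR(1 downto 0));
end PRIORITY_ENC;

architecture BEHAVIOR of PRIORITY_ENC is
begin
  process (REQ)
  begin
    -- Conditions are tested top to bottom; only the statements of the
    -- first true condition are executed.
    if REQ(2) = '1' then
      GRANT <= "11";
    elsif REQ(1) = '1' then
      GRANT <= "10";
    elsif REQ(0) = '1' then
      GRANT <= "01";
    else
      GRANT <= "00";    -- no request active
    end if;
  end process;
end BEHAVIOR;
```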

CASE STATEMENT:

The format of a case statement is:

case expression is
  when choices => sequential-statements
  when choices => sequential-statements
end case;

The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or a one-dimensional array type. Choices may be expressed as single values, as a range of values, as several values separated by a vertical bar (|), or by using the others clause. The others clause can be used as a choice to cover the “catch-all” values and, if present, must be the last branch in the case statement.
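A short sketch of a case statement, using single-value choices and a final others branch, is the four-to-one multiplexer below. The names MUX4, SEL, D, and Y are hypothetical.

```vhdl
-- Hypothetical 4-to-1 multiplexer (names are illustrative).
entity MUX4 is
  port (SEL : in  BIT_VECTOR(1 downto 0);
        D   : in  BIT_VECTOR(3 downto 0);
        Y   : out BIT);
end MUX4;

architecture BEHAVIOR of MUX4 is
begin
  process (SEL, D)
  begin
    case SEL is
      when "00"   => Y <= D(0);
      when "01"   => Y <= D(1);
      when "10"   => Y <= D(2);
      when others => Y <= D(3);   -- catch-all branch, must come last
    end case;
  end process;
end BEHAVIOR;
```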

LOOP STATEMENTS:

A loop statement is used to iterate through a set of sequential statements. The syntax for the loop statement is:

[loop-label:] iteration-scheme loop
  sequential-statements
end loop [loop-label];
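As an illustration of the loop syntax with a for iteration scheme, the following sketch counts the bits set to '1' in an 8-bit word. The names ONES_COUNT, DATA, COUNT, and the loop label COUNT_BITS are assumptions for the example.

```vhdl
-- Hypothetical ones counter (names are illustrative).
entity ONES_COUNT is
  port (DATA  : in  BIT_VECTOR(7 downto 0);
        COUNT : out NATURAL);
end ONES_COUNT;

architecture BEHAVIOR of ONES_COUNT is
begin
  process (DATA)
    variable N : NATURAL;
  begin
    N := 0;
    -- "for I in DATA'range" is the iteration scheme; the loop body
    -- is executed once for each index of DATA.
    COUNT_BITS : for I in DATA'range loop
      if DATA(I) = '1' then
        N := N + 1;
      end if;
    end loop COUNT_BITS;
    COUNT <= N;
  end process;
end BEHAVIOR;
```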


2.8 Active HDL Overview:

Active-HDL is an integrated environment designed for the development of VHDL, Verilog, EDIF and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.

Standards Supported

VHDL:

The VHDL simulator implemented in Active-HDL supports the IEEE Std. 1076-1993 standard.

Verilog:

The Verilog simulator implemented in Active-HDL supports the IEEE Std. 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.

EDIF:

Active-HDL supports Electronic Design Interchange Format version 2 0 0.

VITAL:

The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.

WAVES:

Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0 May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification and procurement of hardware systems.

2.9 ACTIVE-HDL Macro Language:

All operations in Active-HDL can be performed using Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).

1. HDL Editor:

HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging source code. The keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.

2. Block Diagram Editor:

Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.

3. State Diagram Editor:

State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.

4. Waveform Editor:

Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.

5. Design Browser:

The Design Browser window displays the contents of the current design, that is:

➢ Resource files attached to the design.

➢ The contents of the default-working library of the design.

➢ The structure of the design unit selected for simulation.

➢ VHDL, Verilog, or EDIF objects declared within a selected region of the current design.

6. Console window:

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to Console.


2.10 Compilation:

Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:

• VHDL file (.vhd)

• Verilog file (.v)

• EDIF net list file

• State diagram file (.asf)

• Block diagram file (.bde)

In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing HDL code (or net list) generated from the diagram.

A net list is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.

Active-HDL provides three compilers, respectively for VHDL, Verilog, and EDIF. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.

2.11 Simulation:

The purpose of simulation is to verify that the circuit works as desired.

The Active-HDL simulator provides two simulation engines.

➢ Event-Driven Simulation

➢ Cycle-Based Simulation

The simulator supports hybrid simulation – some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven.

2.12 XILINX:

Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.


ISE enables you to start your design with any of a number of different source types, including:

• HDL (VHDL, Verilog HDL, ABEL)

• Schematic design files

• EDIF

• NGC/NGO

• State Machines

• IP Cores

From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.

Design Entry:

• ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.

• Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.

• CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders, to system-level building blocks such as filters, transforms, FIFOs, and memories.

• Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.

• PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.

• StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.

Implementation:

• Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file.

• Map - The Map program maps a logical design to a Xilinx FPGA.

• Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.


• Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA, and to view and modify the placed design.

• FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.

• Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.

• Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file.

• Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macro cell details, equations, and pin assignments.

Device Download and Program File Formatting

• BitGen - The BitGen program receives the placed and routed design and produces a bitstream for Xilinx device configuration.

• iMPACT - The iMPACT tool generates various programming file formats, and subsequently allows you to configure your device.

• XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.

• Integration with ChipScope Pro.

CH 3: Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, and allows efficient combination of encryption/decryption and “on-the-fly” key derivation. Its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.

In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. This yields design criteria such as low memory requirements, small code size and a limited instruction set. In addition, we propose flexibility as another unusual design principle. SEAn,b is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). By contrast, SEAn,b makes it possible to obtain a small encryption routine targeted at any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were desired, including the efficient combination of encryption and decryption and the ability to derive keys “on the fly”.

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEAn,b makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.

For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers, promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEAn,b constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.

3.1 Specifications:

Parameters and Definitions:

SEAn,b operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:

– $n$: plaintext size, key size.
– $b$: processor (or word) size.
– $n_b = \frac{n}{2b}$: number of words per Feistel branch.
– $n_r$: number of block cipher rounds.

As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48, 96, 144, ...-bit block ciphers, respectively denoted as SEA48,8, SEA96,8, SEA144,8, ... Let $x$ be an $\frac{n}{2}$-bit vector. In the following, we will consider two representations:

– Bit representation: $x_b = x(\frac{n}{2}-1)\; x(\frac{n}{2}-2) \ldots x(2)\; x(1)\; x(0)$.
– Word representation: $x_W = x_{n_b-1}\; x_{n_b-2} \ldots x_2\; x_1\; x_0$.
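The parameter constraint can be checked mechanically. The following is a minimal sketch; the function name `sea_parameters` is hypothetical and not part of any SEA reference code.

```python
def sea_parameters(n: int, b: int) -> dict:
    """Derive n_b = n / (2b) for SEA_{n,b}; n must be a multiple of 6b."""
    if n % (6 * b) != 0:
        raise ValueError(f"n={n} is not a multiple of 6b={6 * b}")
    return {"n": n, "b": b, "n_b": n // (2 * b)}


# Block sizes below 150 bits that are valid for an 8-bit word size.
sizes_8bit = [n for n in range(8, 150) if n % (6 * 8) == 0]
print(sizes_8bit)
print(sea_parameters(96, 8))
```

The factor 6 comes from having two Feistel branches, each split into groups of three words for the substitution box.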

Basic Operations

Due to its simplicity constraints, SEAn,b is based on a limited number of elementary operations (selected for their availability in any processing device) denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.

These operations are formally defined as follows:

1. Bitwise XOR:

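The bitwise XOR is defined on $\frac{n}{2}$-bit vectors:

$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le \frac{n}{2} - 1$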

2. Substitution Box S:

SEAn,b uses the following 3-bit substitution table, in C-like notation:

$S_T := \{0, 5, 6, 7, 4, 3, 1, 2\}$

For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:

$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i},$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1},$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le \frac{n_b}{3} - 1,$

where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
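The recursive definition can be checked against the table with a short script. This is a sketch only: it assumes words are held in a Python list with index 0 as word $x_0$, and it reads "recursive" as meaning that each assignment uses the words already updated by the previous ones. The helper names are hypothetical.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def sbox(words):
    """Bitslice S-box: processes the words three at a time, in sequence."""
    x = list(words)
    if len(x) % 3 != 0:
        raise ValueError("the number of words must be a multiple of 3")
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]      # uses the updated x[i]
        x[i + 2] ^= x[i] | x[i + 1]      # uses both updated words
    return x


def table_from_bitslice():
    """Pack the inputs 0..7 into bit positions 0..7 of three words, apply
    the S-box once, and read the eight 3-bit outputs back."""
    w = [sum(((v >> k) & 1) << v for v in range(8)) for k in range(3)]
    y = sbox(w)
    return [sum(((y[k] >> v) & 1) << k for k in range(3)) for v in range(8)]


print(table_from_bitslice() == ST)
```

One call on three 8-bit words evaluates eight table lookups in parallel, which is why the definition needs only a handful of logic instructions per word.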

3. Word Rotation R:

The word rotation is defined on $n_b$-word vectors:

$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \quad y_0 = x_{n_b-1}$

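4. Bit Rotation r:

The bit rotation is defined on $n_b$-word vectors:

$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le \frac{n_b}{3} - 1,$

where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.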
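The two rotations can be sketched as follows, assuming a list of words with index 0 as word $x_0$. The function names are hypothetical.

```python
def word_rot(x):
    """R: y[i+1] = x[i] for 0 <= i <= n_b - 2, and y[0] = x[n_b - 1]."""
    return [x[-1]] + x[:-1]


def word_rot_inv(x):
    """R^{-1}: undoes word_rot."""
    return x[1:] + [x[0]]


def bit_rot(x, b=8):
    """r: in every group of three words, rotate word 3i right by one bit,
    leave word 3i+1 unchanged, and rotate word 3i+2 left by one bit."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y


print(word_rot([1, 2, 3]))
print([hex(w) for w in bit_rot([0x01, 0x55, 0x80])])
```

R moves whole words between positions, while r moves bits inside a word; the cipher relies on the combination of the two for diffusion.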

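5. Addition mod $2^b$ $\boxplus$:

The mod $2^b$ addition is defined on $n_b$-word vectors:

$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$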
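A minimal sketch of the word-wise addition, with the hypothetical name `add_mod`, shows that carries propagate inside a word but never from one word to the next:

```python
def add_mod(x, y, b=8):
    """Word-wise addition mod 2^b of two equal-length word vectors."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]


# 0xFF + 0x01 wraps to 0x00 without disturbing the neighbouring word.
print([hex(w) for w in add_mod([0xFF, 0x01, 0x80], [0x01, 0x01, 0x80])])
```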

The Round and Key Round

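Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in FIG 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that:

$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r(S(R_i \boxplus K_i)), \quad L_{i+1} = R_i$

$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}(L_i \oplus r(S(R_i \boxplus K_i))), \quad L_{i+1} = R_i$

$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R(r(S(KR_i \boxplus C_i))), \quad KL_{i+1} = KR_i$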
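The three round functions can be sketched directly from these definitions. The code below is an illustrative model only, not the VHDL design: it assumes branches are lists of b-bit words with index 0 as the least significant word, and all function names are hypothetical.

```python
def sbox(x):
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def word_rot(x):
    return [x[-1]] + x[:-1]


def word_rot_inv(x):
    return x[1:] + [x[0]]


def bit_rot(x, b):
    m = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & m
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & m
    return y


def add_mod(x, y, b):
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def f_enc(L, R, K, b=8):
    """F_E: returns (L_next, R_next)."""
    return R, xor(word_rot(L), bit_rot(sbox(add_mod(R, K, b)), b))


def f_dec(L, R, K, b=8):
    """F_D: returns (L_next, R_next)."""
    return R, word_rot_inv(xor(L, bit_rot(sbox(add_mod(R, K, b)), b)))


def f_key(KL, KR, C, b=8):
    """F_K: returns (KL_next, KR_next)."""
    return KR, xor(KL, word_rot(bit_rot(sbox(add_mod(KR, C, b)), b)))


L0, R0, K0 = [0x01, 0x23, 0x45], [0x67, 0x89, 0xAB], [0x10, 0x32, 0x54]
L1, R1 = f_enc(L0, R0, K0)
# Feeding the swapped halves to the decrypt round undoes the encrypt round.
print(f_dec(R1, L1, K0) == (R0, L0))
```

The check at the end illustrates why decryption can reuse the same datapath with $R^{-1}$ in place of $R$: the nonlinear part $r(S(\cdot \boxplus K))$ is recomputed identically and cancels under XOR.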

FIG 3.1 Encrypt/decrypt round and key round

The Complete Cipher:

The cipher iterates an odd number nr of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.

C = SEAn,b(P, K)
{
  % initialization:
  L0 & R0 = P;
  KL0 & KR0 = K;

  % key scheduling:
  for i in 1 to ⌊nr/2⌋
    [KLi, KRi] = FK(KLi−1, KRi−1, C(i));
  switch KL⌊nr/2⌋, KR⌊nr/2⌋;
  for i in ⌈nr/2⌉ to nr − 1
    [KLi, KRi] = FK(KLi−1, KRi−1, C(r − i));

  % encryption:
  for i in 1 to ⌈nr/2⌉
    [Li, Ri] = FE(Li−1, Ri−1, KRi−1);
  for i in ⌈nr/2⌉ + 1 to nr
    [Li, Ri] = FE(Li−1, Ri−1, KLi−1);

  % final:
  C = Rnr & Lnr;
  switch KLnr−1, KRnr−1;
},

where & is the concatenation operator, KR⌊nr/2⌋ is taken before the switch and C(i) is an nb-word vector of which all the words have value 0 except the least significant word (LSW), which equals i. Decryption is exactly the same, using the decrypt round FD.
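The pseudo-code can be turned into a small executable model. The sketch below makes several assumptions that are not stated in the text: branches are Python lists of b-bit words with index 0 as the least significant word, the index r in C(r − i) is read as the number of rounds nr (the reading under which the second half of the key schedule undoes the first), and every round constant i fits in one word. All function names are hypothetical.

```python
def sbox(x):
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def word_rot(x):
    return [x[-1]] + x[:-1]


def word_rot_inv(x):
    return x[1:] + [x[0]]


def bit_rot(x, b):
    m = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & m
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & m
    return y


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def nonlinear(x, k, b):
    """r(S(x [+] k)) with word-wise addition mod 2^b."""
    m = (1 << b) - 1
    return bit_rot(sbox([(u + v) & m for u, v in zip(x, k)]), b)


def f_enc(L, R, K, b):
    return R, xor(word_rot(L), nonlinear(R, K, b))


def f_dec(L, R, K, b):
    return R, word_rot_inv(xor(L, nonlinear(R, K, b)))


def f_key(KL, KR, C, b):
    return KR, xor(KL, word_rot(nonlinear(KR, C, b)))


def round_keys(K, n_r, b=8):
    """Return the n_r round keys and the key state after the final switch."""
    KL, KR = K
    half = n_r // 2                                  # floor(n_r / 2), n_r odd
    const = lambda i: [i] + [0] * (len(KL) - 1)      # C(i): LSW = i
    rks = [KR]                                       # round 1 uses KR_0
    for i in range(1, half + 1):
        KL, KR = f_key(KL, KR, const(i), b)
        rks.append(KR)                               # KR_i, before the switch
    KL, KR = KR, KL                                  # switch at mid-schedule
    for i in range(half + 1, n_r):
        KL, KR = f_key(KL, KR, const(n_r - i), b)
        rks.append(KL)                               # round i+1 uses KL_i
    return rks, (KR, KL)                             # final switch


def sea(P, K, n_r, b=8, decrypt=False):
    """P and K are pairs (left branch, right branch) of word lists."""
    L, R = P
    rnd = f_dec if decrypt else f_enc
    for k in round_keys(K, n_r, b)[0]:
        L, R = rnd(L, R, k, b)
    return R, L                                      # C = R_nr & L_nr


P = ([0x01, 0x23, 0x45], [0x67, 0x89, 0xAB])
K = ([0x10, 0x32, 0x54], [0x76, 0x98, 0xBA])
C = sea(P, K, 51)
print(sea(C, K, 51, decrypt=True) == P)
```

In this model the round-key sequence is a palindrome and the key state after the final switch equals the master key, which is the behaviour the text relies on for sharing one key schedule between encryption and decryption.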

3.2 Design Properties of the Components

Substitution Box S:

The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:

– λ-parameter: 1/2.
– δ-parameter: 1/4.
– Maximum nonlinear order, namely 2.
– Recursive definition.
– Minimum number of instructions.

Remark that, if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.

Bit and Word Rotations r and R:

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48, b = 8, $n_b$ = 3. Looking at the figure, it can be seen that SEAn,b divides its data in $\frac{2 n_b}{3}$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b$ = 3, which yields 3 rounds). In the second phase, once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.

Addition mod $2^b$ $\boxplus$:

Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.

3.3 Overall Structure:

The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEAn,b, namely the key schedule and the position of R, $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption. Namely, we have:

$K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$

As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.

3.4 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis:

From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29]:

Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ, then the differential and linear parameters Δ, Λ of the 3-round cipher are respectively smaller than $\delta^2$ and $\lambda^2$.

Since our nonlinear function S has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEAn,b have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, Δ and Λ are respectively required to fall below $2^{-n}$ and $2^{-n/2}$ to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:

$\delta^{2 n_r/3} = (2^{-2})^{2 n_r/3} < 2^{-n}$ and $\lambda^{2 n_r/3} = (2^{-1})^{2 n_r/3} < 2^{-n/2}$. (1)

In both cases, the required number of rounds is: $n_r \ge \frac{3n}{4}$. We note that we used a hybrid approach, between the provable security against linear and differential attacks that consists in bounding the parameter of the best differential/hull, like in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
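The step from Equation (1) to the round bound can be spelled out by comparing exponents. The derivation below only restates the two inequalities of Equation (1); the numerical example for n = 96 is added for illustration.

```latex
\begin{align*}
\left(2^{-2}\right)^{2 n_r/3} < 2^{-n}
  &\iff \frac{4 n_r}{3} > n
   \iff n_r > \frac{3n}{4}, \\
\left(2^{-1}\right)^{2 n_r/3} < 2^{-n/2}
  &\iff \frac{2 n_r}{3} > \frac{n}{2}
   \iff n_r > \frac{3n}{4}.
\end{align*}
% Example: for n = 96, both conditions require n_r > 72.
```

Both the differential and the linear condition therefore lead to the same threshold, which is why a single bound on $n_r$ is quoted.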

Extensions of Linear and Differential Cryptanalysis:

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31] and rectangle [8] attack.

However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: “For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC.”

It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAn,b a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks.

As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.

A Dedicated Related-Key Attack Against a Modified Version:

For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by a bits of each of the $n_b$ words of x. The non-linear and diffusion layers have the following properties:

– $S(x \ll a) = S(x) \ll a$
– $r(x \ll a) = r(x) \ll a$
– $R(x \ll a) = R(x) \ll a$

Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F_E^{\oplus}$ and the key round $F_K$ satisfy:

$F_E^{\oplus}(L \ll a, R \ll a, K \ll a) = F_E^{\oplus}(L, R, K) \ll a$
$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$

These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. A distinguisher on the modified cipher follows immediately; it requires 2 chosen encryption queries under 2 related keys K and $K \ll a$. In the actual SEAn,b, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability p, which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for 1 < a < b − 1. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. “all zeroes”) yield no carry propagation at all. However, the design properties of the key schedule prevent SEAn,b from having such weak keys.

Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from K and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEAn,b.
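The probability p that rotation commutes with addition mod $2^b$ can be measured exhaustively for a single small word. The sketch below counts, over all pairs (X, K) of b-bit words, how often the property holds; the function names are hypothetical.

```python
def rotl(w, a, b):
    """Cyclic left rotation of a b-bit word by a bits."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)


def commute_probability(a, b):
    """Fraction of pairs (X, K) for which
    (X <<< a) + (K <<< a) == (X + K) <<< a  (all additions mod 2^b)."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return hits / (1 << (2 * b))


for a in (1, 2, 4):
    print(a, commute_probability(a, 8))
```

For a = 1 the property holds exactly when the two most significant bits are not both set and no carry enters the most significant bit, which gives $\frac{3}{4} \cdot \frac{2^{b-1}+1}{2^b}$ and tends to 3/8 as b grows.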

Square Attacks:

We explored square attacks [16] on SEA48,8. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure, where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times), and the input to the other substitution boxes is constant. The other branch is also constant. Therefore the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the same when different parameters n and b are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for ⊕-SEAn,b, the use of addition mod $2^b$ provides better resistance against square attacks.

Truncated and Impossible Differentials:

As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.

Interpolation Attacks:

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression, and the diffusion layer to permit the composition of these expressions. In the case of SEAn,b, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.


Slide Attacks:

The sequence of round keys of SEAn,b is the same as the one of ICEBERG. Therefore the analysis done in [30] is still valid. Namely, the non periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, of which the vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.

Related-Key Attacks:

The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function f which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEAn,b, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEAn,b key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to “counter” the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEAn,b against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.

Complementation Properties:

The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \iff \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEAn,b algorithm prevent this property. We are not aware of any other similar structural feature in the design.

Algebraic Attacks:

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEAn,b has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point a topic for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines.


Suggested Number of Rounds:

From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $\frac{3n}{4} + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. $n_r$ has to be odd: we add one if it is even. We also assume a minimum word size b ≥ 8 bits.
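The rule can be written as a small helper. This is a sketch under one stated assumption: the bracket around b/2, which is damaged in the source, is read as a floor (for even b the choice makes no difference). The function name is hypothetical.

```python
def suggested_rounds(n: int, b: int) -> int:
    """3n/4 + 2 * (n_b + floor(b/2)), incremented by one if even."""
    if n % (6 * b) != 0 or b < 8:
        raise ValueError("need n to be a multiple of 6b and b >= 8")
    n_b = n // (2 * b)
    n_r = 3 * n // 4 + 2 * (n_b + b // 2)
    return n_r if n_r % 2 == 1 else n_r + 1


for n, b in [(48, 8), (96, 8), (144, 8)]:
    print(n, b, suggested_rounds(n, b))
```

For SEA96,8 the three contributions are 72 rounds for linear/differential resistance and 2 × 10 rounds for diffusion, giving 92, which is then made odd.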

3.5 Performance Analysis:

SEAn,b is targeted at low-cost processors, with little code size and a small instruction set. However, SEAn,b’s simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with “on the fly” key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and registers usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):

– Arithmetic and logic operators: ∨, ∧, ⊕, ⊞, ≫, ≪.

– Branch instructions: goto, subroutine call and return.

– Comparison, load RAM in register, store register in RAM.

According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5 n_b + 3$. Then, the code size and implementation time (both expressed in ops.) are evaluated by summing the values given in the appendix. For the code size, it directly yields $31 n_b + 36$ ops. For the implementation time, the round and key round respectively require $12 n_b + 11$ ops. and $10 n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12 n_b + 11 + 10 n_b + 11 + 7) + (12 n_b + 11) + 8 n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory as, at the end of an encryption/decryption, we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
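These estimates are simple enough to evaluate for any parameter set. The sketch below only transcribes the formulas quoted above; the function name and dictionary keys are hypothetical, and the number of rounds is supplied by the caller.

```python
def sea_cost(n: int, b: int, n_r: int) -> dict:
    """Software cost estimates for SEA_{n,b}, in words and in operations."""
    n_b = n // (2 * b)
    round_ops = 12 * n_b + 11
    key_round_ops = 10 * n_b + 11
    return {
        "ram_words_and_registers": 5 * n_b + 3,
        "code_size_ops": 31 * n_b + 36,
        "time_ops": (n_r - 1) * (round_ops + key_round_ops + 7)
                    + round_ops + 8 * n_b + 7,
        "registers": n_b + 3,
    }


print(sea_cost(96, 8, 93))
```

The time estimate grows with the product of $n_r$ and $n_b$, while memory and code size grow only linearly in $n_b$, which matches the space-for-time trade described in the text.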


For illustration purposes, we implemented SEAn,b on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEAn,b and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:

– SEAn,b designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEAn,b implementations (i.e. $5 n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEAn,b is generally lower than for other block ciphers implemented on similar platforms.

The flexibility of SEAn,b also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is in the number of cycles required for the encryption (i.e. SEAn,b trades space for time, which may be relevant given present processor speeds). Looking at the code size - cycles product, the efficiency of SEAn,b remains similar to the one of Rijndael (encryption only), which is well known for its efficient smart card implementations.


CH 4: AN EXPOSITION OF THE SEA ALGORITHM

The Schoof-Elkies-Atkin algorithm, which shares the acronym SEA with the scalable encryption algorithm but is otherwise unrelated to it, is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given. Let p be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide $4 a_4^3 + 27 a_6^2$, E can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of E over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.

The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.

Further improvements were later proposed by Dewaghe and Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Müller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].

Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm implemented on a typical personal computer takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where p has 200 bits.

It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where t is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.


The algorithm works by calculating t modulo several small auxiliary primes ℓ. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes given below. For each candidate ℓ, a calculation has to be carried out to generate a certain polynomial associated with ℓ that is necessary for further calculations with this ℓ. These polynomials do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows. Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and ℓ is 1/2). For those curves where the algorithm applies, we can determine t modulo ℓ. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining t.
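The final recombination step can be illustrated on a toy curve. The sketch below is not the SEA algorithm: it obtains t by naive point counting over a small prime and then reduces it modulo small primes, only to show how the Hasse bound and the Chinese Remainder Theorem pin down t once the product of the primes exceeds $4\sqrt{p}$. The curve, the prime and the function names are all chosen for illustration.

```python
from math import prod


def count_points(a4, a6, p):
    """Naive O(p) count of points on y^2 = x^3 + a4*x + a6 over F_p,
    including the point at infinity."""
    squares = {}
    for y in range(p):
        squares[y * y % p] = squares.get(y * y % p, 0) + 1
    return 1 + sum(squares.get((x ** 3 + a4 * x + a6) % p, 0) for x in range(p))


def crt_trace(residues, primes, p):
    """Recover t from the values t mod l, for distinct primes l whose
    product M exceeds 4*sqrt(p)."""
    M = prod(primes)
    assert M * M > 16 * p, "product of auxiliary primes is too small"
    t = 0
    for r, l in zip(residues, primes):
        Ml = M // l
        t += r * Ml * pow(Ml, -1, l)       # modular inverse (Python 3.8+)
    t %= M
    return t - M if t > M // 2 else t      # representative in (-M/2, M/2]


p, a4, a6 = 101, 1, 1                      # toy example, not a secure size
N = count_points(a4, a6, p)
t = p + 1 - N
primes = [2, 3, 5, 7]                      # product 210 > 4*sqrt(101)
print(crt_trace([t % l for l in primes], primes, p) == t)
```

Because t lies in an interval of width $4\sqrt{p}$ centred on zero, a modulus larger than that width leaves exactly one candidate, which is why the product of the auxiliary primes only has to exceed $4\sqrt{p}$ rather than p.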

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over $\mathbb{F}_p$, with the intention of finding an E with $\#E(\mathbb{F}_p) = xr$, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily and the pair (E, P) could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, El Gamal encryption, etc. If we use 200-bit primes for p and require x ≤ 32, then the probability that $\#E = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.

The Algorithm

4.1 Overview:

The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ associated with ℓ. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store). We start out with a given prime p and an elliptic curve $E : y^2 = x^3 + a_4 x + a_6$.


CH 5: SEA Architecture Block Diagram

[Block diagram. Labeled blocks: Key Computational Block (ModSBox, BR, WR, XOR, Round Reg; KeyReg0[95:0] to KeyReg9[95:0]), Encryption Computational Block (ModSBox, BR, WR, XOR, Round Reg; Cipher data Register), Decryption Computational Block (ModSBox, BRI, WR, XOR, Round Reg; Plain text data Register), SMC. Labeled signals: KeyI[95:0], KeyLd, Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, DataO[95:0], DataRd, Ena, E/D, Ovr, Clk, Rst.]

FIG: 5.1

5.1 KEY GENERATION

Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.

Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key, and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.

In computer cryptography keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
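Both ways of obtaining a key can be sketched with the Python standard library. The 96-bit width below is taken from the KeyI[95:0] port label in FIG: 5.1; the use of PBKDF2-HMAC-SHA1 as the passphrase-based key generation algorithm, the iteration count and the function names are assumptions made for this example, not choices stated in the text.

```python
import hashlib
import secrets

KEY_BITS = 96  # assumed from the KeyI[95:0] port in the block diagram


def random_key(bits: int = KEY_BITS) -> bytes:
    """Key drawn from the operating system's entropy source."""
    return secrets.token_bytes(bits // 8)


def key_from_passphrase(passphrase: str, salt: bytes,
                        bits: int = KEY_BITS,
                        iterations: int = 100_000) -> bytes:
    """Key derived from a passphrase with PBKDF2-HMAC-SHA1 (an assumed
    choice of key generation algorithm)."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode("utf-8"),
                               salt, iterations, dklen=bits // 8)


salt = secrets.token_bytes(16)
print(random_key().hex())
print(key_from_passphrase("correct horse battery staple", salt).hex())
```

A derived key is only as unpredictable as the passphrase behind it, so the randomly generated variant is preferable whenever the key does not have to be memorized.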

The simplest method to read encrypted data is a brute force attack—simply attempting every number, up to the maximum length of the key.


Therefore, it is important to use a sufficiently long key: each additional key bit doubles the number of candidate keys, so longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
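The exponential growth of the search effort can be made concrete with a short calculation. The sketch below is only illustrative: the attacker speed of 10^9 keys per second is a hypothetical figure, and the function names are not from the original design.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def key_space(key_bits: int) -> int:
    """Number of distinct keys of the given length."""
    return 2 ** key_bits


def years_to_search(key_bits: int, keys_per_second: float) -> float:
    """Time needed to try every key at a fixed rate, in years."""
    return key_space(key_bits) / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e9  # hypothetical attacker speed in keys per second
    for bits in (56, 96, 128):
        print(f"{bits:3d}-bit key: about {years_to_search(bits, rate):.3g} years")
```

Adding one bit to the key doubles the result, which is why a modest increase in key length defeats a large increase in attacker speed.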

Cryptography:

Cryptography is the art and science of secret writing. The term is derived from the Greek words

• kryptos - secret

• graphein - to write

5.2 ENCRYPTION

Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.

Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help protect the privacy of an everyday person. Until the early 1920s, the term cryptography also covered the breaking of encrypted messages; the separate term cryptanalysis then came into use, and the field is now practically an art and science of its own.

The two main methods of cryptography are the code and the cipher.

A code replaces complete words or phrases with code words or numbers.

A cipher works on the principle of replacing individual letters with other letters or numbers.
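The difference between the two methods can be illustrated with a toy example. The codebook entries and the letter shift below are hypothetical choices made only for this sketch; neither is secure, and neither is related to SEA.

```python
import string

# Hypothetical codebook: whole words are swapped for code words.
CODEBOOK = {"attack": "apple", "dawn": "river"}


def encode_with_code(text: str) -> str:
    """Code: replace complete words using the codebook; other words pass through."""
    return " ".join(CODEBOOK.get(word, word) for word in text.split())


def encipher_with_shift(text: str, shift: int) -> str:
    """Cipher: replace each lowercase letter by the letter `shift` places later."""
    letters = string.ascii_lowercase
    k = shift % 26
    table = str.maketrans(letters, letters[k:] + letters[:k])
    return text.translate(table)


if __name__ == "__main__":
    message = "attack at dawn"
    print(encode_with_code(message))
    secret = encipher_with_shift(message, 3)
    print(secret)
    print(encipher_with_shift(secret, -3))  # shifting back recovers the message
```

The code works on words and needs a shared codebook, while the cipher works on letters and needs only the shift value as its key.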

Cryptographic algorithms all perform the same basic function: they take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (plaintext) form.

FIG 5.2 ENCRYPTION BLOCK

FIG 5.3 Encryption Operation

5.3 DECRYPTION

Decryption is the process of decoding data that has been encrypted into a secret format. It requires a secret key or password.

It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message.[22]
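A minimal sketch of the one-time pad, under the conditions listed above, is given here. The function names are hypothetical; the pad is drawn from the operating system random source, which is assumed to be an adequate stand-in for truly random key material in this illustration.

```python
import secrets


def otp_keygen(length: int) -> bytes:
    """Generate a fresh pad of `length` random bytes; it must never be reused."""
    return secrets.token_bytes(length)


def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR the data with the pad; the same call encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


if __name__ == "__main__":
    message = b"SEA core"
    pad = otp_keygen(len(message))
    ciphertext = otp_xor(message, pad)
    print(ciphertext.hex())
    print(otp_xor(ciphertext, pad))
```

The length check enforces one of the stated conditions; secrecy and single use of the pad cannot be enforced by code and remain the responsibility of the user.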

Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.

In such cases, effective security could be achieved if it is proven that the effort required (i.e., "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.

There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.

Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts.[10] Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations.[23] This is a considerable improvement on brute force attacks.
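The size of this improvement follows directly from the figures quoted for DES (a 56-bit key, with about 2^43 operations for linear cryptanalysis). The short calculation below only restates that arithmetic; the variable names are illustrative.

```python
DES_KEY_BITS = 56

# Brute force: on average half of the 2^56 keys are tried.
brute_force_decryptions = 2 ** (DES_KEY_BITS - 1)

# Linear cryptanalysis, figures as quoted in the text.
linear_known_plaintexts = 2 ** 43
linear_des_operations = 2 ** 43

work_ratio = brute_force_decryptions // linear_des_operations

if __name__ == "__main__":
    print(f"brute force : 2^{brute_force_decryptions.bit_length() - 1} decryptions")
    print(f"linear      : 2^{linear_des_operations.bit_length() - 1} DES operations")
    print(f"work ratio  : {work_ratio}")
```

The ratio is 2^12, so the linear attack needs several thousand times fewer cipher operations, at the price of collecting 2^43 known plaintexts instead of one.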

Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).

For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves, which were invented in the mid-1980s, have become popular.

While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
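The password or PIN case mentioned above can be illustrated with two comparison routines. This is a sketch only: `leaky_equal` is a hypothetical example of the kind of code that leaks timing information, and `hmac.compare_digest` from the Python standard library is used as one common countermeasure.

```python
import hmac


def leaky_equal(supplied: bytes, secret: bytes) -> bool:
    """Early-exit comparison: the run time reveals the position of the first wrong byte."""
    if len(supplied) != len(secret):
        return False
    for a, b in zip(supplied, secret):
        if a != b:
            return False  # returns sooner when the mismatch is near the start
    return True


def constant_time_equal(supplied: bytes, secret: bytes) -> bool:
    """Comparison designed so that timing does not depend on where the bytes differ."""
    return hmac.compare_digest(supplied, secret)


if __name__ == "__main__":
    pin = b"4929"
    for guess in (b"0000", b"4000", b"4929"):
        print(guess, leaky_equal(guess, pin), constant_time_equal(guess, pin))
```

Both functions return the same answers; they differ only in how much an observer can learn from how long the answer takes.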

An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis,[24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering, and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture, ...) may be the most productive attacks of all.

FIG: 5.4 DECRYPTION BLOCK

FIG: 5.5 Decryption Operation

SIMULATION RESULTS

Key Generation Results

Encryption Results

Decryption Results

SYNTHESIS REPORTS

KEY INPUT:

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT:

Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file "c:/xilinx/bin/vasu/keyreg.ngc" ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file "keyreg.ngd" ...

Writing NGDBUILD log file "keyreg.bld"...

Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  *See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB

MAPPING REPORT:

Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'

Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009


Placing & Routing Report:

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  *See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%


KEY REGISTER:

Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
* Synthesis Options Summary *
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
* HDL Compilation *
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
* HDL Analysis *
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
* HDL Synthesis *
=========================================================================

Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
# Registers : 1
  96-bit register : 1

=========================================================================

=========================================================================
* Advanced HDL Synthesis *
=========================================================================

=========================================================================
* Low Level Synthesis *
=========================================================================

Optimizing unit <keyreg> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
* Final Report *
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 195

Macro Statistics :
# Registers : 1
#   96-bit register : 1

Cell Usage :
# BELS : 1
#   LUT1 : 1
# FlipFlops/Latches : 96
#   FDCE : 96
# Clock Buffers : 1
#   BUFGP : 1
# IO Buffers : 194
#   IBUF : 98
#   OBUF : 96
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 2s15cs144-6

 Number of Slices: 55 out of 192 28%
 Number of Slice Flip Flops: 96 out of 384 25%
 Number of 4 input LUTs: 1 out of 384 0%
 Number of bonded IOBs: 194 out of 90 215% (*)
 Number of GCLKs: 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
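The warning can be traced to a single resource by recomputing the percentages of the utilization summary. The sketch below uses the figures reported for keyreg; the function names are hypothetical, and truncating the percentage to a whole number is an assumption chosen because it reproduces the printed values.

```python
def utilization_percent(used: int, available: int) -> int:
    """Percentage of a device resource in use, truncated to a whole number."""
    return 100 * used // available


# (used, available) pairs copied from the keyreg device utilization summary.
KEYREG_RESOURCES = {
    "Slices": (55, 192),
    "Slice Flip Flops": (96, 384),
    "4 input LUTs": (1, 384),
    "bonded IOBs": (194, 90),
    "GCLKs": (1, 4),
}


def overused(resources: dict) -> list:
    """Names of the resources whose usage exceeds what the device offers."""
    return [name for name, (used, avail) in resources.items() if used > avail]


if __name__ == "__main__":
    for name, (used, avail) in KEYREG_RESOURCES.items():
        print(f"{name:18s} {used:4d} out of {avail:4d} {utilization_percent(used, avail):4d}%")
    print("over 100%:", overused(KEYREG_RESOURCES))
```

Only the bonded IOBs exceed the device capacity: the 96-bit key input and output buses are all brought out to pins when the register is synthesized as a stand-alone top-level module.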

=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
      FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
      GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6

   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock 'Clk'
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                          Gate    Net
  Cell:in->out   fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------  ------------
  IBUF:I->O        96     0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                 0.886           Dreg_0
  ----------------------------------------
  Total                   7.962ns (1.662ns logic, 6.300ns route)
                                  (20.9% logic, 79.1% route)
-------------------------------------------------------------------------

Timing constraint: Default OFFSET OUT AFTER for Clock 'Clk'
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising

  Data Path: Dreg_95 to KeyO<95>
                          Gate    Net
  Cell:in->out   fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------  ------------
  FDCE:C->Q         1     1.085   1.035   Dreg_95 (Dreg_95)
  OBUF:I->O               4.668           KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------
  Total                   6.788ns (5.753ns logic, 1.035ns route)
                                  (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->


SBOX:

RTL SCHEMATIC

GATE LEVEL

=========================================================================
* Synthesis Options Summary *
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
* HDL Compilation *
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
* HDL Analysis *
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated

=========================================================================
* HDL Synthesis *
=========================================================================

Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
* Advanced HDL Synthesis *
=========================================================================

=========================================================================
* Low Level Synthesis *
=========================================================================

Optimizing unit <sbox8x3> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
* Final Report *
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 7

Cell Usage :
# BELS : 3
#   LUT4 : 3
# IO Buffers : 7
#   IBUF : 4
#   OBUF : 3
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 2s15cs144-6

 Number of Slices: 2 out of 192 1%
 Number of 4 input LUTs: 3 out of 384 0%
 Number of bonded IOBs: 7 out of 90 7%


TRANSLATION REPORT:




FLOOR PLANNING

MAPPING REPORT:

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB

Placing & Routing Report:

Device utilization summary:

 Number of External IOBs 7 out of 86 8%
 Number of LOCed External IOBs 0 out of 7 0%

 Number of SLICEs 2 out of 192 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707

KEY GENERATION:

RTL SCHEMATIC

GATE LEVEL

=========================================================================
* Synthesis Options Summary *
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

TRANSLATION REPORT:

Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file "c:/xilinx/bin/vasu/keygenblock.ngc" ...
Reading component libraries for design expansion...




Writing NGD file "keygenblock.ngd" ...

Writing NGDBUILD log file "keygenblock.bld"...

MAPPING REPORT:

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1,016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1,066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1,016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1,060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17,572
Additional JTAG gate count for IOBs: 50,928
Peak Memory Usage: 72 MB

ENCRYPTION:

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT:

=========================================================================
* Synthesis Options Summary *
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

Translation Report:



DECRYPTION:

GATE LEVEL

SYNTHESIS REPORT:

=========================================================================
* Synthesis Options Summary *
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

Translation Report:



ADVANTAGES

➢ SEA is parametric in text, key and processor size.

➢ It is a low-cost encryption routine targeted at processors with a limited instruction set.

➢ It is a small encryption routine that can be targeted to any given processor, the security of the cipher being adapted as a function of its key size.

➢ It suits applications where the same constrained device has to perform both encryption and decryption.

APPLICATIONS

➢ This is a low-cost encryption routine basically designed for processors with a limited instruction set.

➢ In wireless communication and mobile computing and networking systems.

➢ For the encryption of JPEG2000 images.

➢ In scalable video coding.

➢ In sensor networks and RFIDs.

CONCLUSION

SEA(n,b) is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEA(n,b) allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption or decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA(n,b) do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs as well as further cryptanalysis efforts and security evaluations.
