custom fpga cryptographic blocks for reconfigurable ... and diffie-hellman ... hellman key exchange...

42
Acta Electrotechnica et Informatica No. 2 , Vol. 4, 2004 33 CUSTOM FPGA CRYPTOGRAPHIC BLOCKS FOR RECONFIGURABLE EMBEDDED NIOS PROCESSOR Miloš DRUTAROVSKÝ, Martin ŠIMKA Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Park Komenského 13, 041 20 Košice, Slovak Republic, E-mail: {Milos.Drutarovsky, Martin.Simka}@tuke.sk SUMMARY This paper introduces two custom blocks for Nios reconfigurable embedded processor implemented on Altera Field Programmable Gate Arrays (FPGAs). When operations like modular multiplication and modular exponentiation of long integers or other complex algebraic functions are performed on a general-purpose processor they usually consume a lot of processor resources and execution times are not satisfactory. A solution of this problem lies in development of custom coprocessors. The algebraic coprocessor for Montgomery Multiplication (MM) makes possible a fast execution of modular multiplication with large numbers that can be used in several public key cryptographic algorithms. A True Random Number Generator (TRNG) enhances an application of the Nios processor in cryptographic protocols. Until now only few implementations of TRNG on FPGA have been presented in literature. We describe a custom TRNG implementation based on a recently proposed method that reliably extracts intrinsic randomness from low-jitter clock signals synthesized by on-chip FPGA analog PLLs. Both peripheral blocks are connected to the Nios processor through Altera Avalon bus that directly supports scalable connection and different sources of the clock signal. In this way we can optimally use the resources of Altera FPGAs and implement designs customisable with regard to available area resources and desired level of security or timing constraints. Proposed solutions significantly improve security and computational power of System on a Chip (SoC) embedded cryptographic applications based on the Nios processor. Keywords: cryptographic algorithm, SoC, embedded coprocessor, TRNG, IP blocks, Montgomery multiplication 1. INDRODUCTION The recent rapid increase of the devices complexity and decreasing time-to-market are forcing new ways of system design. Capable solution of this problem lies in application of System on a Chip (SoC) based on pre-designed Intellectual Property (IP) cores and building blocks. IP blocks optimised for target devices, and with standardised interfaces offer fast solution without need for long- term development of often-used parts of designs. A cryptographic IP block either executes whole cryptographic algorithms, or, as it is in our case, offers significant improvement of computational power or security level of the system. The coprocessor for Montgomery Multiplication (MM) enables a faster execution of large numbers of multiplication needed for several cryptographic public key algorithms. The True Random Number Generator (TRNG) enhances an application of the Nios processor [1] in cryptographic protocols. Almost all cryptographic protocols require generation and use of secret values that must be unknown to attackers [2]. For example, TRNGs are required to generate public/private key pairs for asymmetric (public key) algorithms including RSA, DSA, and Diffie-Hellman [2]. Unfortunately, standard processors (including synthesizable Nios processor from Altera) are not able to generate true random numbers, as they are deterministic systems. The only way how to get true random numbers, hence true security for cryptographic systems, is to build a generator based on a random physical phenomenon. Current modern Field Programmable Gate Arrays (FPGAs) provide an alternative hardware platform even for system-level integration of embedded symmetric and asymmetric cryptographic algorithms, but not for high quality TRNGs. Most hardware TRNGs follow unpredictable natural processes, such as thermal (resistance or shoot) noise or nuclear decay. Since these principles of random numbers generation require additional external devices, they cannot be embedded completely inside the FPGAs. Such TRNGs are not compatible with modern FPGAs and do not provide a SoC solution. This paper describes a custom TRNG IP block based on a recently proposed method [3] that uses on-chip analog PLLs included in Altera APEX FPGAs [4]. The proposed method reliably extracts intrinsic randomness from low-jitter clock signals synthesized by on-chip APEX analog PLLs. The TRNG is developed as a custom IP building block optimised for Nios processor and provides significantly higher system level security for complete embedded cryptographic SoC designs. Public key algorithms, such as RSA, Diffie- Hellman key exchange algorithm or Elliptic curve cryptography use modular multiplication and modular exponentiation of long integers (up to 2048 bits for RSA). When these operations are performed on a general-purpose processor they consume a lot of time and processor resources. A solution of this problem lies in development of a custom coprocessor for modular long integer multiplication. Chosen MM algorithm provides certain advantages in the implementation of modular multiplication.

Upload: lytuyen

Post on 10-Jun-2018

240 views

Category:

Documents


0 download

TRANSCRIPT

  • Acta Electrotechnica et Informatica No. 2 , Vol. 4, 2004 33

    CUSTOM FPGA CRYPTOGRAPHIC BLOCKS FOR RECONFIGURABLE EMBEDDED NIOS PROCESSOR

    Milo DRUTAROVSK, Martin IMKA Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics,

    Technical University of Koice, Park Komenskho 13, 041 20 Koice, Slovak Republic, E-mail: {Milos.Drutarovsky, Martin.Simka}@tuke.sk

    SUMMARY This paper introduces two custom blocks for Nios reconfigurable embedded processor implemented on Altera Field

    Programmable Gate Arrays (FPGAs). When operations like modular multiplication and modular exponentiation of long integers or other complex algebraic functions are performed on a general-purpose processor they usually consume a lot of processor resources and execution times are not satisfactory. A solution of this problem lies in development of custom coprocessors. The algebraic coprocessor for Montgomery Multiplication (MM) makes possible a fast execution of modular multiplication with large numbers that can be used in several public key cryptographic algorithms. A True Random Number Generator (TRNG) enhances an application of the Nios processor in cryptographic protocols. Until now only few implementations of TRNG on FPGA have been presented in literature. We describe a custom TRNG implementation based on a recently proposed method that reliably extracts intrinsic randomness from low-jitter clock signals synthesized by on-chip FPGA analog PLLs. Both peripheral blocks are connected to the Nios processor through Altera Avalon bus that directly supports scalable connection and different sources of the clock signal. In this way we can optimally use the resources of Altera FPGAs and implement designs customisable with regard to available area resources and desired level of security or timing constraints. Proposed solutions significantly improve security and computational power of System on a Chip (SoC) embedded cryptographic applications based on the Nios processor.

    Keywords: cryptographic algorithm, SoC, embedded coprocessor, TRNG, IP blocks, Montgomery multiplication 1. INDRODUCTION

    The recent rapid increase of the devices complexity and decreasing time-to-market are forcing new ways of system design. Capable solution of this problem lies in application of System on a Chip (SoC) based on pre-designed Intellectual Property (IP) cores and building blocks. IP blocks optimised for target devices, and with standardised interfaces offer fast solution without need for long-term development of often-used parts of designs.

    A cryptographic IP block either executes whole cryptographic algorithms, or, as it is in our case, offers significant improvement of computational power or security level of the system. The coprocessor for Montgomery Multiplication (MM) enables a faster execution of large numbers of multiplication needed for several cryptographic public key algorithms. The True Random Number Generator (TRNG) enhances an application of the Nios processor [1] in cryptographic protocols.

    Almost all cryptographic protocols require generation and use of secret values that must be unknown to attackers [2]. For example, TRNGs are required to generate public/private key pairs for asymmetric (public key) algorithms including RSA, DSA, and Diffie-Hellman [2]. Unfortunately, standard processors (including synthesizable Nios processor from Altera) are not able to generate true random numbers, as they are deterministic systems. The only way how to get true random numbers, hence true security for cryptographic systems, is to build a generator based on a random physical phenomenon.

    Current modern Field Programmable Gate Arrays (FPGAs) provide an alternative hardware platform even for system-level integration of embedded symmetric and asymmetric cryptographic algorithms, but not for high quality TRNGs. Most hardware TRNGs follow unpredictable natural processes, such as thermal (resistance or shoot) noise or nuclear decay. Since these principles of random numbers generation require additional external devices, they cannot be embedded completely inside the FPGAs. Such TRNGs are not compatible with modern FPGAs and do not provide a SoC solution.

    This paper describes a custom TRNG IP block based on a recently proposed method [3] that uses on-chip analog PLLs included in Altera APEX FPGAs [4]. The proposed method reliably extracts intrinsic randomness from low-jitter clock signals synthesized by on-chip APEX analog PLLs. The TRNG is developed as a custom IP building block optimised for Nios processor and provides significantly higher system level security for complete embedded cryptographic SoC designs.

    Public key algorithms, such as RSA, Diffie-Hellman key exchange algorithm or Elliptic curve cryptography use modular multiplication and modular exponentiation of long integers (up to 2048 bits for RSA). When these operations are performed on a general-purpose processor they consume a lot of time and processor resources. A solution of this problem lies in development of a custom coprocessor for modular long integer multiplication. Chosen MM algorithm provides certain advantages in the implementation of modular multiplication.

    40Acta Electrotechnica et Informatica No. 2, Vol. 4, 2004

    Acta Electrotechnica et Informatica No. 2 , Vol. 4, 200433

    CUSTOM FPGA CRYPTOGRAPHIC BLOCKS FOR RECONFIGURABLE EMBEDDED NIOS PROCESSOR

    Milo Drutarovsk, Martin imka

    Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics,

    Technical University of Koice, Park Komenskho 13, 041 20 Koice, Slovak Republic,

    E-mail: {Milos.Drutarovsky, Martin.Simka}@tuke.sk

    SUMMARY

    This paper introduces two custom blocks for Nios reconfigurable embedded processor implemented on Altera Field Programmable Gate Arrays (FPGAs). When operations like modular multiplication and modular exponentiation of long integers or other complex algebraic functions are performed on a general-purpose processor they usually consume a lot of processor resources and execution times are not satisfactory. A solution of this problem lies in development of custom coprocessors. The algebraic coprocessor for Montgomery Multiplication (MM) makes possible a fast execution of modular multiplication with large numbers that can be used in several public key cryptographic algorithms. A True Random Number Generator (TRNG) enhances an application of the Nios processor in cryptographic protocols. Until now only few implementations of TRNG on FPGA have been presented in literature. We describe a custom TRNG implementation based on a recently proposed method that reliably extracts intrinsic randomness from low-jitter clock signals synthesized by on-chip FPGA analog PLLs. Both peripheral blocks are connected to the Nios processor through Altera Avalon bus that directly supports scalable connection and different sources of the clock signal. In this way we can optimally use the resources of Altera FPGAs and implement designs customisable with regard to available area resources and desired level of security or timing constraints. Proposed solutions significantly improve security and computational power of System on a Chip (SoC) embedded cryptographic applications based on the Nios processor.

    Keywords: cryptographic algorithm, SoC, embedded coprocessor, TRNG, IP blocks, Montgomery multiplication

    1. INDRODUCTION

    The recent rapid increase of the devices complexity and decreasing time-to-market are forcing new ways of system design. Capable solution of this problem lies in application of System on a Chip (SoC) based on pre-designed Intellectual Property (IP) cores and building blocks. IP blocks optimised for target devices, and with standardised interfaces offer fast solution without need for long-term development of often-used parts of designs.

    A cryptographic IP block either executes whole cryptographic algorithms, or, as it is in our case, offers significant improvement of computational power or security level of the system. The coprocessor for Montgomery Multiplication (MM) enables a faster execution of large numbers of multiplication needed for several cryptographic public key algorithms. The True Random Number Generator (TRNG) enhances an application of the Nios processor [1] in cryptographic protocols.

    Almost all cryptographic protocols require generation and use of secret values that must be unknown to attackers [2]. For example, TRNGs are required to generate public/private key pairs for asymmetric (public key) algorithms including RSA, DSA, and Diffie-Hellman [2]. Unfortunately, standard processors (including synthesizable Nios processor from Altera) are not able to generate true random numbers, as they are deterministic systems. The only way how to get true random numbers, hence true security for cryptographic systems, is to build a generator based on a random physical phenomenon.

    Current modern Field Programmable Gate Arrays (FPGAs) provide an alternative hardware platform even for system-level integration of embedded symmetric and asymmetric cryptographic algorithms, but not for high quality TRNGs. Most hardware TRNGs follow unpredictable natural processes, such as thermal (resistance or shoot) noise or nuclear decay. Since these principles of random numbers generation require additional external devices, they cannot be embedded completely inside the FPGAs. Such TRNGs are not compatible with modern FPGAs and do not provide a SoC solution.

    This paper describes a custom TRNG IP block based on a recently proposed method [3] that uses on-chip analog PLLs included in Altera APEX FPGAs [4]. The proposed method reliably extracts intrinsic randomness from low-jitter clock signals synthesized by on-chip APEX analog PLLs. The TRNG is developed as a custom IP building block optimised for Nios processor and provides significantly higher system level security for complete embedded cryptographic SoC designs.

    Public key algorithms, such as RSA, Diffie-Hellman key exchange algorithm or Elliptic curve cryptography use modular multiplication and modular exponentiation of long integers (up to 2048 bits for RSA). When these operations are performed on a general-purpose processor they consume a lot of time and processor resources. A solution of this problem lies in development of a custom coprocessor for modular long integer multiplication. Chosen MM algorithm provides certain advantages in the implementation of modular multiplication. The coprocessor has a scalable structure and uses effectively the resources of FPGA. Thanks to the implementation on FPGA we can obtain a very flexible and secure solution.

    The paper is organised as follows: in Section 2 we present briefly the Nios processor and its features important to interface of the coprocessor and TRNG. In Section 3 we give a brief description of the TRNG principle and in Section 4 the MM algorithm is presented. In the Section 5 we deal with a way of an implementation of the blocks and their interface to the Nios processor, improvements of the MM coprocessor's structure, and speed and area results of implementations in Altera FPGA devices. In the Section 6 we present our conclusions and propose possible ways of future development.

    2. AN OVERVIEW OF EMBEDDED NIOS PROCESSOR

    The Nios CPU [1] is a pipelined general-purpose RISC processor that is generated by proprietary Altera VHDL generator (SOPC Builder) and can be synthesised and embedded in all recent Altera FPGAs. The Nios supports both 32-bit and 16-bit architectural variants. Both variants use 16-bit instructions. The principal features of the Nios instruction set architecture are:

    Large, windowed register file Nios implementations can include up to 512 internal general-purpose registers. The compiler uses the internal registers to accelerate subroutine calls and local variable access.

    Simple, complete instruction set both 32-bit and 16-bit Nios variants use 16-bit wide instructions. 16-bit instructions reduce code size and instruction-memory bandwidth.

    Powerful addressing modes the Nios instruction set includes Load and Store instructions that the compiler uses to accelerate structure access and local-variable (stack) access.

    Extensibility users can incorporate custom logic directly into the Nios arithmetic logic unit. The automatically-generated software development kit includes macros for accessing custom instruction hardware for C and assembly-language programs.

    Existing Nios peripherals (e.g. UART, Timer...) as well as new custom peripherals can be connected through an Avalon bus [5]. Avalon is a simple bus architecture designed for connecting on-chip processor(s) and peripheral together into a SoC. The principal features of the Avalon bus are:

    Simplicity provides an easy to understand protocol with a short learning curve.

    Optimized resource utilization conserves Logic Elements (LEs) inside the FPGAs.

    Synchronous operation integrates well with other user logic that coexists on the same FPGA, while avoiding complex timing analysis issues.

    An example of a SoC structure with user-defined custom peripherals is shown in Fig. 1. Custom peripherals can be included in a system block generated by SOPC Builder what brings optimised implementation of the whole system, or they can be fed to Avalon bus as pre-synthesised blocks. The Fig. 1 depicts a simplified clock distribution, where

    is a primary clock signal used for clocking of the Nios and all of peripherals, and

    2

    F

    denotes additional clock signals whose function is explained later.

    Fig. 1 Example of a system module integrated with custom peripherals into an Altera FPGA

    3. BASIC PRINCIPLE OF TRNG IP BLOCK

    Several well-known TRNGs require additional off-chip component which decreases a security level of the system. In addition, some practical implementations based on free running oscillators are not secure enough for cryptographic applications as the entropy increase is not sufficiently high, so that the output of the generator can be predicted [6]. In next we shortly describe an implementation completely embedded in Altera FPGAs [3].

    Modern Altera FPGAs use reconfigurable analog on-chip PLLs to increase performance and to provide on-chip clock-frequency synthesis. The basic principle behind our method is an extraction of a randomness from the jitter of the clock signal synthesized in the embedded analog PLL [3] (illustrated in Fig. 2).

    CLK

    PLL

    D

    CLK

    Q

    CLJ

    x(nT

    Q

    )

    XOR

    Decimator

    (K

    D

    )

    q(nT

    CLK

    )

    Fig. 2 Basic principle of randomness extraction from the low-jitter clock signal

    The jitter is detected by the sampling of a reference clock signal

    CLK

    (

    1

    F

    in Fig. 1) with frequency

    CLK

    F

    using rationally related clock signal

    CLJ

    (

    2

    F

    in Fig. 1) synthesized in the PLL with the frequency:

    M

    CLJCLKCLK

    D

    K

    m

    FFF

    Knk

    ==

    , MACROBUTTON MTPlaceRef \* MERGEFORMAT (1)

    where

    m

    and

    k

    can range from 2 to 160, and

    n

    ranges from 1 to 16 [4]. Let values of multiplication factor

    M

    K

    and division factor

    D

    K

    be relative primes, so

    (

    )

    GCD,1

    MD

    KK

    =

    , MACROBUTTON MTPlaceRef \* MERGEFORMAT (2)

    where

    GCD

    is an abbreviation of Greatest Common Divisor. Equation (2)

    ensures that the maximum guaranteed distance between the closest edges of GOTOBUTTON ZEqnNum289648 \* MERGEFORMAT and

    CLJ

    (denoted as

    (

    )

    min

    MAX

    T

    D

    ) over the period:

    QDCLKMCLJ

    TKTKT

    ==

    MACROBUTTON MTPlaceRef \* MERGEFORMAT (3)

    is equal to [3]:

    (

    )

    (

    )

    (

    )

    min

    GCD2,

    MAX

    4

    GCD2,

    .

    4

    MD

    CLK

    M

    MD

    CLJ

    D

    KK

    TT

    K

    KK

    T

    K

    D==

    =

    MACROBUTTON MTPlaceRef \* MERGEFORMAT (4)

    By proper choosing of

    M

    K

    ,

    D

    K

    and

    CLK

    T

    it is possible to guarantee that

    (

    )

    min

    MAX

    jit

    T

    s

    D

    , where div class="embedded" id="_1136794016"/

    pjit/p

    ps/p

    is the RMS (root mean square) value of PLL intrinsic jitter. According to [7] and [8], an intrinsic jitter can be approximated by Gaussian distribution and div class="embedded" id="_1136794031"/

    p15ps/p

    pjit/p

    ps/p

    p/p

    . Moreover, the proposed method is insensitive to an overall jitter characteristic as far as an intrinsic PLL jitter is included. /p

    p class="normln"In general, the obtained bits are statistically biased random bits. Acceptable TRNG should produce output bits with equal probability. A common way how to reduce the statistical bias is to use a XOR corrector. Non-overlapped pairs of bits are XORed together. In this way, a uniform distribution of generated true random bits with period div class="embedded" id="_1136794044"/

    pQ/p

    pT/p

    (3) is guaranteed [3]./p

    p class="nadpis_1"4. SCALABLE WORD-BASED MONTGOMERY MULTIPLICATION ALGORITHM/p

    p class="normln"Basic mathematical operation used by RSA is modular exponentiation [2]:/p

    p class="mTDisplayEquation"div class="embedded" id="_1136794056"/

    pmod/p

    pE/p

    pZXM/p

    p=/p

    ,(5)/p

    p class="normln"that a binary or general div class="embedded" id="_1136794064"/

    pm/p

    -ary method can break into a series of modular multiplications. All of these computations have to be performed with large div class="embedded" id="_1136794074"/

    pk/p

    -bit integers (typically div class="embedded" id="_1136794081"/

    p{/p

    p}/p

    p1024,2048/p

    pk/p

    p/p

    pK/p

    )./p

    p class="normln"The well-known MM algorithm [9] speeds-up modular multiplication and squaring required for exponentiation (5). It computes the MM product for div class="embedded" id="_1136794107"/

    pk/p

    -bit integers div class="embedded" id="_1136794115"/

    pX/p

    , div class="embedded" id="_1136794123"/

    pY/p

    as/p

    p class="mTDisplayEquation"div class="embedded" id="_1136794131"/

    p(/p

    p)/p

    p1/p

    p,mod/p

    pMMXYXYRM/p

    p-/p

    p=/p

    ,(6)/p

    p class="normln"where div class="embedded" id="_1136794139"/

    p2/p

    pk/p

    pR/p

    p=/p

    , and div class="embedded" id="_1136794146"/

    pM/p

    is an integer in the range div class="embedded" id="_1136794152"/

    p1/p

    p22/p

    pkk/p

    pM/p

    p-/p

    p/p

    such that div class="embedded" id="_1136794161"/

    p(/p

    p)/p

    pGCD,1/p

    pRM/p

    p=/p

    ./p

    p class="normln"Basic MM (6) can be used for efficient computation of (5) by the standard Montgomery exponentiation algorithm [2]. The starting point of this algorithm is the MM. The faster MM is performed, the faster exponentiation process will be accomplished./p

    p class="normln"For implementation of MM on FPGAs we use a modified version of MM Multiple Word Radix-2 Montgomery Multiplication (MWR2MM) algorithm [10]. MWR2MM with word width div class="embedded" id="_1136794176"/

    pw/p

    performs bit-level computations and produces word-level outputs. The scalability and word-orientation of the algorithm makes possible to implement the scalable MM coprocessor. For operands with a div class="embedded" id="_1136794107"/

    pk/p

    -bit precision, div class="embedded" id="_1136794218"/

    pekw/p

    p/p

    p=/p

    p/p

    words are required. MWR2MM algorithm scans word-wise operand div class="embedded" id="_1136794240"/

    pY/p

    (multiplicand) and div class="embedded" id="_1136794247"/

    pM/p

    , and bit-wise operand div class="embedded" id="_1136794250"/

    pX/p

    (multiplier). Depending on timing requirements, area limitations and connected version of Nios processor we can choose the word width div class="embedded" id="_1137240436"/

    p8,16,32,64/p

    pw/p

    pK/p

    p=/p

    /p

    p class="normln"For description of the MWR2MM algorithm we use the following denominations:/p

    p class="mTDisplayEquation"div class="embedded" id="_1144390707"/

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p110/p

    p110/p

    p110/p

    p,,,,/p

    p,,,,/p

    p,,,./p

    pe/p

    pe/p

    pk/p

    pMMMM/p

    pYYYY/p

    pXxxx/p

    pK/p

    pK/p

    pK/p

    p-/p

    p-/p

    p-/p

    p=/p

    p=/p

    p=/p

    (7)/p

    p class="normln"The concatenation of vectors div class="embedded" id="_1136827709"/

    pA/p

    and div class="embedded" id="_1136827720"/

    pB/p

    is represented as div class="embedded" id="_1136827734"/

    p(/p

    p)/p

    p,/p

    pAB/p

    . A particular range of bits in a vector div class="embedded" id="_1136827751"/

    pA/p

    from position div class="embedded" id="_1136827761"/

    pi/p

    to position div class="embedded" id="_1136827770"/

    pj/p

    is represented as div class="embedded" id="_1136827787"/

    p../p

    pji/p

    pA/p

    . The bit position of the div class="embedded" id="_1136827867"/

    pth/p

    pk/p

    word of div class="embedded" id="_1136827933"/

    pA/p

    is represented as div class="embedded" id="_1136827963"/

    p(/p

    p)/p

    pk/p

    pi/p

    pA/p

    . Then a fragment of the MWR2MM algorithm can be described as:/p

    p class="mTDisplayEquation"div class="embedded" id="_1137241394"/

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p(/p

    p)/p

    p000/p

    p0/p

    p0/p

    p000/p

    p11/p

    p01..1/p

    p11/p

    p1..1/p

    pfor0to1do/p

    p,/p

    pif1then/p

    p,/p

    pfor1to1do/p

    p,/p

    p,/p

    p,/p

    pi/p

    pjjjj/p

    pi/p

    pjjj/p

    pw/p

    pee/p

    pw/p

    pik/p

    pCSxYS/p

    pS/p

    pCSCSM/p

    pje/p

    pCSCxYMS/p

    pSSS/p

    pSCS/p

    p--/p

    p-/p

    p--/p

    p-/p

    p=-/p

    p=+/p

    p=/p

    p=++/p

    p=-/p

    p=+++/p

    p=/p

    p=/p

    /p

    p class="normln"The computations are performed with precision div class="embedded" id="_1136794176"/

    pw/p

    bits, what means that they are independent on the operands precision div class="embedded" id="_1137242411"/

    pk/p

    . What varies is the number of loop iterations needed to compute the result of modular multiplication. /p

    p class="normln"Due to the mutual dependency of the operations inside the div class="embedded" id="_1137242498"/

    pj/p

    loop the parallel execution is not possible. This is not the case of the instructions in different div class="embedded" id="_1137242701"/

    pi/p

    loops which can be executed in parallel. This fact makes possible to implement even more flexible and faster scalable pipelined architecture./p

    p class="nadpis_1"5. HARDWARE MAPPING TO THE ALTERA FPGA/p

    p class="normln"The Altera Nios Embedded Processor Development Board with APEX EP20K200 [11] has been used for the implementation of the proposed IP blocks. The board and the provided software form powerful development tools for implementation and testing of the designs. Thanks to the fact that the custom blocks are described by VHDL code we obtained parameterised designs that can be modified according to the desired time and area limits. All presented results have been obtained by using the Altera Quartus II ver.2.2, Altera Nios 2.2, and FPGA Advantage 5.3 tools./p

    p class="nadpis_8"5.1 TRNG/p

    p class="zkladn_text,Prv_odsek"The same Nios development board was also used in [8] for reference PLL measurements so we can expect that jitter characteristics presented in [8] can be directly applied to our design. The configuration of the TRNG block is depicted in Fig. 3. The board features a PLL-capable APEX EP20K200-2X with four on-chip analog PLLs. We used the same PLL organisation as described in the previous design of TRNG [3], the values of parameters div class="embedded" id="_1136793972"/

    pM/p

    pK/p

    and div class="embedded" id="_1136793984"/

    pD/p

    pK/p

    were chosen according to the connection of the TRNG to the Nios processor. The two on-chip PLLs shown in Fig. 3 have been used for generating div class="embedded" id="_1136794290"/

    pCLJ/p

    and div class="embedded" id="_1136794300"/

    pCLK/p

    signals./p

    p class="obrazok"div class="embedded" id="_1137245530"/

    pF/p

    pEXT/p

    p= 33.570 MHz/p

    pPLL4/p

    pclk1/p

    pclk0/p

    pPLL2/p

    pclk1/p

    pclk0/p

    pinclk/p

    pinclk/p

    pF/p

    pEXT/p

    p14/p

    p101/p

    p= 4.616 MHz/p

    pF/p

    pEXT/p

    p /p

    p= 33.3 MHz/p

    pF/p

    pEXT/p

    p14/p

    p14/p

    p= 33.3 MHz/p

    p 80*14/p

    p11*101/p

    p12/p

    p4 Dedicated/p

    pClocks/p

    pG4 G2 G1 G3/p

    p4.616 MHz/p

    pCLK/p

    p:3/p

    pCLJ/p

    /p

    p class="obrazok"Fig. 3 Actual PLL configuration used in the proposed TRNG IP block/p

    p class="zkladn_text,Prv_odsek"The external clock source was div class="embedded" id="_1136794313"/

    p33.3MHz/p

    pEXT/p

    pF/p

    p=/p

    , on-chip synthesized clocks were div class="embedded" id="_1137331712"/

    p33.3MHz/p

    pCLK/p

    pF/p

    p=/p

    and div class="embedded" id="_1137483759"/

    p(/p

    p)/p

    p1120/3333/p

    pCLJCLK/p

    pFF/p

    p=/p

    div class="embedded" id="_1144582219"/

    p11.9MHz/p

    (can be modified according to other requirements). According to (4)

    these parameters ensure that GOTOBUTTON ZEqnNum880351 \* MERGEFORMAT . The output bit-rate of the TRNG is div class="embedded" id="_1136794347"/

    p1/10000bits/s/p

    pQ/p

    pT/p

    p/p

    . /p

    p class="normln"Example of resource requirements of 16-bit and 32-bit Nios with corresponding 16-bit and 32-bit TRNG implementations are shown in Tab.1 (LE Logic Element is the basic building block of Altera FPGAs). /p

    p class="normln"Name of the block/p

    p class="normln"# LEs/p

    p class="normln"% of /p

    p class="normln"total capacity/p

    p class="normln"Nios-16/p

    p class="normln"Nios-32/p

    p class="normln"1140/p

    p class="normln"1480/p

    p class="normln"14/p

    p class="normln"18/p

    p class="normln"UART/p

    p class="normln"170/p

    p class="normln"2/p

    p class="normln"TRNG-16 /p

    p class="normln"TRNG-32/p

    p class="normln"150/p

    p class="normln"200/p

    p class="normln"2/p

    p class="normln"2.5/p

    p class="normln"ALL /p

    p class="normln"ALL + interface logic/p

    p class="normln"1669/p

    p class="normln"2143/p

    p class="normln"20/p

    p class="normln"26/p

    p class="normln"Tab. 1 Results of TRNG mapping to Altera APEX EP20K200-2X/p

    p class="nadpis_8"5.1.1 TRNG interface/p

    p class="normln"The TRNG peripheral is connected to Avalon bus as a standard memory mapped peripheral. TRNG can be accessed from the Nios processor through three custom registers. Data (read only) and Control/Status registers (write/read access) are mapped into two memory locations shown in Fig. 4. /p

    p class="normln"div class="embedded" id="_1144261947"/

    pVX X X ... X X X/p

    pC/p

    pTRNG Dataread access15 ... 2 1 0TRNG_base+0TRNG_base+1(data register)/p

    p(status register)/p

    pwrite accessIE (control register)X X X ... X X X 15 ... 2 1 0V =1/0 - valid/invalid dataC =1/0 - proper/improper clocksIE=1/0 - interrupt enable/disable/p

    pX - undefined/p

    /p

    p class="normln"Fig. 4 Data and Control/Status registers of 16-bit TRNG/p

    p class="normln"TRNG can be accessed by the standard memory pooling as well as by using an interrupt service request that can be enabled by an application program. The exact position of a TRNG_base address and TRNG_IRQ can be configured by the SOPC Builder./p

    p class="nadpis_8"5.1.2 TRNG statistical testing/p

    p class="normln"Statistical testing is very important for random number generators dedicated to usage in cryptographic algorithms. More detailed description of applying NIST statistical test suite [12] on presented TRNG can be found in [3]. /p

    p class="normln"While in [3] the generator is the only one block implemented on the device, in this paper we present configuration with TRNG connected to the Nios processor. More complex design and logic clocked by different clock sources can affect the jitter distribution. The aim of tests was to detect possible deviations of statistical characteristics. After testing we have obtained similar results of statistical tests as in [3]. This confirms the fact that the reliability of TRNG and the randomness of its output are independent on the design complexity, FPGA configuration, and internal circuit activity./p

    p class="nadpis_8"5.2 Scalable MM coprocessor/p

    p class="normln"The implemented radix-2 MM coprocessor is a unit realizing the MWR2MM algorithm. Input values (div class="embedded" id="_1136794365"/

    pX/p

    ,div class="embedded" id="_1136794371"/

    pY/p

    , and div class="embedded" id="_1136794374"/

    pM/p

    ) as well as the result div class="embedded" id="_1136794379"/

    pS/p

    obtained by the coprocessor are stored in coprocessor's memory. Since input and output values can have high precision (1024 bits or more), we have decided to store inputs and intermediate results in Embedded Memory Blocks (EMBs). EMBs can implement various types of memory with one block size 2048 bits [4]. Therefore, the coprocessor has been split into two blocks: a Radix-2 Montgomery multiplier and a dual port data memory./p

    p class="normln"In order to reduce storage and arithmetic hardware complexity, data path of the MM coprocessor uses div class="embedded" id="_1136794399"/

    pX/p

    ,div class="embedded" id="_1136794403"/

    pY/p

    , and div class="embedded" id="_1136794407"/

    pM/p

    in a standard non-redundant form. The internal sum div class="embedded" id="_1136794411"/

    pS/p

    is received and generated in the redundant Carry-Save form [13]. Therefore, the bit resolution of the sum div class="embedded" id="_1136794422"/

    pS/p

    is effectively doubled and we use denomination div class="embedded" id="_1136879002"/

    p1/p

    pS/p

    and div class="embedded" id="_1136879066"/

    p2/p

    pS/p

    for it. Then div class="embedded" id="_1137329419"/

    p(/p

    p)/p

    p12/p

    pj/p

    pS/p

    represents the second bit of the div class="embedded" id="_1136879295"/

    pth/p

    pj/p

    word of the first part of div class="embedded" id="_1136794422"/

    pS/p

    (see Fig. 5). The advantage of Carry-Save form is faster addition process without using the carry logic especially for div class="embedded" id="_1137399089"/

    p16/p

    pw/p

    p/p

    ./p

    p class="normln"The data path design is based on the structure presented in [10]. The MM unit consists of two layers of carry-save adders as it is shown in Fig. 5 for div class="embedded" id="_1136794431"/

    p3/p

    pw/p

    p=/p

    . Input div class="embedded" id="_1136794442"/

    pc/p

    represents latched value div class="embedded" id="_1136879337"/

    pt/p

    that is the least significant bit of the value div class="embedded" id="_1136794453"/

    p(/p

    p)/p

    p(/p

    p)/p

    p00/p

    pi/p

    pSxY/p

    p+/p

    and is used to control the addition of div class="embedded" id="_1144262441"/

    p./p

    pM/p

    /p

    p class="normln"/p

    p class="normln"Fig. 5 Structure of the MM unit for div class="embedded" id="_1136794474"/

    p3/p

    pw/p

    p=/p

    (FA Full Adder)/p

    p class="normln"The most important parameter influencing the overall multiplier speed is the memory access time. Since during one cycle previous results div class="embedded" id="_1136879408"/

    p1/p

    pS/p

    and div class="embedded" id="_1136879066"/

    p2/p

    pS/p

    have to be read and current results from the last stage have to be written to the same memory, we have chosen the EMB configuration as a dual port RAM using the Altera-specific function lpm_ram_dp from the Library of Parameterized Modules (LPM). The memory with registered input/output data has been found as the faster implementation./p

    p class="normln"The pipelined version of the coprocessor has been implemented using changes published in [14]. The data path is organized as a pipelined structure of several MM units shown in Fig. 6. A distribution of signal div class="embedded" id="_1136794489"/

    pX/p

    to the units has been found as a critical path in the time analysis. Therefore this part has been redesigned by adding an 1-bit register for div class="embedded" id="_1136794494"/

    pi/p

    px/p

    into the MM unit (see Fig. 6). Thanks to this modification there are no limitations for a number of units as in [15]./p

    p class="normln"/p

    p class="normln"Fig. 6 Block diagram of the MM coprocessor data path/p

    p class="normln"The maximum degree of parallelism that can be obtained with this architecture is found as:/p

    p class="mTDisplayEquation"div class="embedded" id="_1144390726"/

    pmax/p

    p./p

    p2/p

    pe/p

    pn/p

    p/p

    p/p

    p=/p

    p/p

    p/p

    (8)/p

    p class="normln"The number 2 in denominator expresses the number of clock cycles after which the output of the MM unit is valid. When less than div class="embedded" id="_1136794518"/

    pmax/p

    pn/p

    MM units are available, the total execution time will increase, but on the other hand the area occupation of the coprocessor can be changed according to the area constraints of a target device. The computation time of the single MM operation when div class="embedded" id="_1136794529"/

    pmax/p

    pnn/p

    p/p

    MM units are used is:/p

    p class="mTDisplayEquation"div class="embedded" id="_1144390743"/

    p2/p

    pMM/p

    p1/p

    p2./p

    pMM/p

    pclk/p

    pk/p

    pTn/p

    pfwn/p

    p/p

    p/p

    p/p

    p/p

    p=+/p

    p/p

    p/p

    p/p

    p/p

    p/p

    p/p

    (9)/p

    p class="normln"The MM coprocessor has 3 main parameters (div class="embedded" id="_1137313046"/

    pw/p

    ,div class="embedded" id="_1137313062"/

    pe/p

    , and div class="embedded" id="_1137313111"/

    pn/p

    ) that can be changed according to the required area of the implemented coprocessor and the required timings for MM computations (div class="embedded" id="_1137313125"/

    pn/p

    ,div class="embedded" id="_1137313134"/

    pw/p

    ) or the security level (div class="embedded" id="_1137313144"/

    pe/p

    ). This approach gives high flexibility to the processor and coprocessor design./p

    p class="normln"Area occupation expressed in LEs and maximal possible FPGA clock frequency for Altera APEX 20K200E are given in Table 2 for precisions div class="embedded" id="_1137495315"/

    p1024/p

    pk/p

    p=/p

    , and div class="embedded" id="_1137495354"/

    p2048/p

    bits. The MM coprocessor has the maximal clock frequency significantly higher than the Nios processor (div class="embedded" id="_1137479828"/

    pNios/p

    p33.33MHz/p

    pclk/p

    pf/p

    p=/p

    ). To achieve higher performance of the system, we use one clock signal (div class="embedded" id="_1137311344"/

    pNios/p

    pclk/p

    pf/p

    ) for the Nios processor, and another one (div class="embedded" id="_1137311366"/

    pMM/p

    pclk/p

    pf/p

    ), with higher frequency, for the MM coprocessor./p

    p class="normln"The computational time (in s) of the single MM operation (according to the equation 9) is presented in Table 3 for frequency value of the MM coprocessor clock signal div class="embedded" id="_1137582046"/

    pMM/p

    p99.99MHz/p

    pclk/p

    pf/p

    p=/p

    ./p

    p class="nadpis_8"5.2.1 MM coprocessor interface/p

    p class="normln"The way in which the MM coprocessor is connected to the embedded processor is important for the computing process control (the process starting and the processor status checking), and for an exchange of processed data./p

    p class="normln"In the previous solution [15] the status register has been used for checking the actual status of the coprocessor and the computational process. The current version uses the communication over an interrupt. This solution is more suitable for software control of coprocessors and for a configuration with several MM coprocessors that can be used for parallel computation of MM. After the computation of MM the interrupt signal of the processor is asserted. This state persists until the results are read within the interrupt routine by the processor./p

    p class="normln"/

    p class="normln"div class="embedded" id="_1136794610"/

    p1024/p

    pk/p

    p=/p

    /p

    p class="normln"div class="embedded" id="_1136794619"/

    p2048/p

    pk/p

    p=/p

    /p

    p class="normln"/

    p class="normln"LEs/p

    p class="normln"div class="embedded" id="_1136794625"/

    pMM/p

    pclk/p

    pf/p

    /p

    p class="normln"LEs/p

    p class="normln"div class="embedded" id="_1136794632"/

    pMM/p

    pclk/p

    pf/p

    /p

    p class="normln"div class="embedded" id="_1136794579"/

    p1/p

    pn/p

    p=/p

    /p

    p class="normln"542/p

    p class="normln"107.22/p

    p class="normln"551/p

    p class="normln"105.83/p

    p class="normln"div class="embedded" id="_1136794586"/

    p2/p

    pn/p

    p=/p

    /p

    p class="normln"1100/p

    p class="normln"110.43/p

    p class="normln"1136/p

    p class="normln"106.96/p

    p class="normln"div class="embedded" id="_1136794592"/

    p3/p

    pn/p

    p=/p

    /p

    p class="normln"1621/p

    p class="normln"108.34/p

    p class="normln"1644/p

    p class="normln"104.39/p

    p class="normln"div class="embedded" id="_1136794600"/

    p4/p

    pn/p

    p=/p

    /p

    p class="normln"1943/p

    p class="normln"106.67/p

    p class="normln"1980/p

    p class="normln"103.85/p

    p class="normln"Tab. 2 Area occupation (LEs)/max div class="embedded" id="_1136794639"/

    pMM/p

    pclk/p

    pf/p

    (MHz) of the MM coprocessor (div class="embedded" id="_1136794646"/

    p32/p

    pw/p

    p=/p

    , div class="embedded" id="_1137436307"/

    p1..4/p

    pn/p

    p=/p

    )/p

    p class="normln"/

    p class="normln"div class="embedded" id="_1136794655"/

    p1024/p

    pk/p

    p=/p

    /p

    p class="normln"div class="embedded" id="_1136794662"/

    p2048/p

    pk/p

    p=/p

    /p

    p class="normln"div class="embedded" id="_1136794670"/

    p1/p

    pn/p

    p=/p

    /p

    p class="normln"327.7/p

    p class="normln"1310.72/p

    p class="normln"div class="embedded" id="_1136794671"/

    p2/p

    pn/p

    p=/p

    /p

    p class="normln"163.88/p

    p class="normln"655.38/p

    p class="normln"div class="embedded" id="_1136794672"/

    p3/p

    pn/p

    p=/p

    /p

    p class="normln"109.26/p

    p class="normln"436.94/p

    p class="normln"div class="embedded" id="_1136794673"/

    p4/p

    pn/p

    p=/p

    /p

    p class="normln"82.00/p

    p class="normln"327.76/p

    p class="normln"Tab. 3 Computational time div class="embedded" id="_1136794681"/

    pMM/p

    pT/p

    (s) for div class="embedded" id="_1137311801"/

    p32/p

    pw/p

    p=/p

    , div class="embedded" id="_1137582133"/

    p99.99MHz/p

    pMM/p

    pclk/p

    pf/p

    p=/p

    , div class="embedded" id="_1137436316"/

    p1..4/p

    pn/p

    p=/p

    /p

    p class="normln"Thereafter new operands can be loaded into the memory and the whole process starts again./p

    p class="normln"The second area where the most suitable solution needs to be found is the interface between the coprocessor's memory block and the bus of the embedded processor. Since there is a need for faster clocking of the coprocessor, we have assumed the case with two clock signals (see Fig. 7). /p

    p class="normln"/p

    p class="normln"Fig. 7 Block diagram of the interface between the MM coprocessor and the Nios processor/p

    p class="normln"The slower one is connected to the processor, and through the bus and the interface to the coprocessor. All the processes concerning the operations between the processor's bus and the coprocessor's memory block are clocked by this signal (div class="embedded" id="_1137311429"/

    pNios/p

    pclk/p

    pf/p

    ). The operations inside the MM coprocessor are clocked by the external faster clock signal (div class="embedded" id="_1137311416"/

    pMM/p

    pclk/p

    pf/p

    ). Thanks to this clock signals organisation almost three times higher performance has been obtained compared to the previous implementation [15]./p

    p class="nadpis_1"6. CONCLUSIONS AND FUTURE DEVELOPMENT/p

    p class="normln"The paper has demonstrated the possibility of custom IP blocks development for cryptographic applications in FPGAs. Advancements in FPGAs provide new options for design engineers. FPGAs maintain the advantages of custom functionality, like an ASIC, but avoid high development costs and the inability to make design modifications after productions./p

    p class="normln"Implemented IP blocks enhance the applicability of the Nios processor in a cryptographic SoC. Both blocks have scalable structure and hence create together with the parameterisable Nios processor very flexible solution. A support for faster modular multiplication and a need of true random numbers are functions required for several currently used cryptographic protocols and algorithms /p

    p class="normln"Thanks to the changes in implemented MM coprocessor we obtained faster solution with optimised features of the interface. The fact that TRNG can be implemented inside the FPGAs represents significant security and system advantage in embedded cryptographic applications. /p

    p class="normln"The future development should include the implementations of other cryptographic algorithms and the improvements of the presented blocks. /p

    p class="nadpis_6"REFERENCES/p

    p class="zhlav"[1]/p

    p class="normln"NIOS 2.2 CPU. Data Sheet, September 2002, ver. 1.3, pp. 1-14, www.altera.com/nios./p

    p class="normln"[2]/p

    p class="normln"J.A. Menezes, P.C. Oorschot, and S.A. Vanstone: Handbook of Applied Cryptography, CRC Press, New York, 1997./p

    p class="normln"[3]/p

    p class="normln"V. Fischer and M. Drutarovsk: True Random Number Generator Embedded in Reconfigurable Hardware, Proceedings of the Workshop on Cryptographic Hardware and Embedded Systems CHES2002, Redwood Shores, California, USA, August 2002, pp. 415-430./p

    p class="normln"[4]/p

    p class="normln"APEX 20K Programmable Logic Device Family. Data Sheet, February 2002, ver. 4.3, pp. 1-116, www.altera.com./p

    p class="normln"[5]/p

    p class="normln"Avalon Bus Specification, Altera Reference Manual, January 2003, ver. 2.1, pp. 1-108, www.altera.com/nios./p

    p class="normln"[6]/p

    p class="normln"M. Dichtl: How to Predict the Output of a Hardware Random Number Generator, In C.D. Walter, C. K. Koc, Ch. Paar (Eds.): Cryptographic Hardware and Embedded Systems 2003, LNCS 2779, Springer, Berlin, 2003, pp. 181-188./p

    p class="normln"[7]/p

    p class="normln"Jitter comparison analysis: APEX 20KE PLL vs. Virtex-E DLL, Technical Brief 70, January 2001, ver.1.1, pp.1-7, www.altera.com./p

    p class="normln"[8]/p

    p class="normln"Superior Jitter management with DLLs. Virtex Tech Topic VTT013(v1.2), January 21, 2002, pp. 1-6, www.xilinx.com./p

    p class="normln"[9]/p

    p class="normln"C.K. Koc, T. Acar: Analyzing and Comparing Montgomery Multiplication Algorithms, IEEE Micro, (16) 3, pp. 26-33, June 1996./p

    p class="normln"[10]/p

    p class="normln"A.F. Tenca, C.K. Koc: A Scalable Architecture for Modular Multiplication Based on Montgomerys Algorithm, IEEE Transactions on Computers, 52(9), pp. 1215-1221, September 2003./p

    p class="normln"[11]/p

    p class="normln"p1/p

    pF/p

    Nios Embedded Processor Development Board, Altera Data Sheet, v.2.1, April 2002, pp.1-22, www.altera.com/nios./p

    p class="normln"[12]/p

    p class="normln"A. Rukhin, et al.: A statistical test suite for random and pseudorandom number generators for cryptographic applications, NIST Special Publication 800-22, May 15, 2001, pp. 1-153, cs-www.ncsl.nist.gov/rng./p

    p class="normln"[13]/p

    p class="normln"C.K. Koc: RSA Hardware Implementation, www.rsa.com, pp.1-28, August 1995./p

    p class="normln"[14]/p

    p class="normln"M. Drutarovsk, V. Fischer: Implementation of Scalable Montgomery Multiplication Co-processor in Altera Reconfigurable Hardware, Proceedings of 5th International Scientific Conference DSP MCOM 2001, Koice, Slovakia, pp. 132-135, November 2001./p

    p class="normln"[15]/p

    p class="normln"M. imka, V. Fischer: Montgomery Multiplication Coprocessor for Altera Nios Embedded Processor, Proceedings of the 5th International Scientific Conference on Electronic Computers and Informatics 2002, Koice, Slovakia, pp. 206-211, October 2002./p

    p class="zkladn_text,Prv_odsek"BIOGRAPHY/p

    p class="zkladn_text,Prv_odsek"Milo Drutarovsk was born in 1965 in Preov, Slovak Republic. He received the MSc degree and PhD degree in radioelectronics from the Faculty of Electrical Engineering and Informatics, Technical University of Koice, in 1988 and 1995, respectively. He defended his habilitation work - Digital Signal Processors in Digital Signal Processing in 2000. He is currently working as an associated professor at the Department of Electronics and Multimedia Communications, Technical University of Koice. Main areas of his research are applied cryptography, digital signal processing, and algorithms for embedded cryptographic architectures./p

    p class="zkladn_text,Prv_odsek"Martin imka was born in 1979 in Koice. He received his MSc. degree in Electronics and Telecommunications in 2002 after defending his Master's Thesis - Conception of connection of embedded processor to arithmetic coprocessor in SOPC Altera. Currently he is aPhD student at the Department of Electronics and Multimedia Communications, Technical University of Koice and his main research area is an implementation of cryptographic blocks on FPGAs./p

    p class="nzev"VOLTAGE IN ELECTRIC POWER SYSTEM WITH HIGH PHOTO VOLTAIC CELLS PENETRATION/p

    p class="normln"Pavla Hejtmnkov, Jan korpil/p

    p class="normln"Department of Power Engineering and Ecology, Faculty of Electrical Engineering,/p

    p class="normln"West Bohemia University in Plze, Sady Ptatctnk 14, 304 16 Plze, Czech Republic, tel.: 00420 37763 4321,/p

    p class="normln"E-mail: [email protected], [email protected]/p

    p class="normln"SUMMARY/p

    p class="normln"Small, distributed generation (DG) technologies such as micro-turbines, photovoltaic (PV) and fuel cells are gaining wide interest because of rapid advances in technologies. The deployment of these generation units on distribution networks could potentially lower the cost of power delivery by placing energy sources nearer to the demand centres. The capacity of the devices ranges from 1 kW to 2 MW in power level./p

    p class="normln"PV systems can either be stand-alone, or grid-connected. The main difference between these two basic types of systems is that in the latter case the PV system produces power in parallel with the electrical utility and can feed power back into the utility grid if the onsite load requirement does not need all of the output from the PV system installed. The instantaneous power production from PV in the near future will often exceed the instantaneous power consumption in residential areas with a high concentration of PV systems. The power flow backwards through the MV/LV transformers, i.e. the power flows from the LV network to the medium voltage (MV) network will increase. There is need to know the impact of the backward power flow to the Electric power system (EPS) and to set the upper limits to of PV amount that can be fed into a power system without causing problems to the power systems and find the possibilities of stretching the limits./p

    p class="zkladn_text_odsazen"Keywords: Photovoltaic power generation, Grid interconnection, Utility distribution system, Penetration level, Voltage limits, Distributed generation/p

    p class="nadpis_1"7. INDRODUCTION/p

    p class="zkladn_text_odsazen_2"Distributed energy resources (DER) refers to a variety of small, modular electricity-generating or storage technologies that are located close to the load they serve. There are many types of DER [4] which can be used for power supply heat or electric. If we assume as a final product electric power from DER, the produced electricity can covers the whole costumers load or the power output can be the higher or less one. In these cases, the power output has to be combined with a outside power EPS supplying possibility. The role of PV technology, as one of the perspective type of DG, is growing up. In order we can decide which PV operation state and interconnection to EPS is suitable [2], there is need to know the PV properties and its influence on EPS operation./p

    p class="nadpis_1"8. LOAD PLACEING/p

    p class="zkladn_text_odsazen_2"Regarding to the load place and the interconnection to the power supply system, the load (supply) or DG systems based on PV generators can be divided to:/p

    p class="zkladn_text_odsazen_2" Stand alone (SAPV systems) grid off/p

    p class="zkladn_text_odsazen_2" Grid on (IPV) interconnected to EPS/p

    p class="zkladn_text_odsazen_2"Stand-alone PV systems are usually installed where it is more economic to use PV than any other form of power supply./p

    p class="zkladn_text_odsazen_2"These types of PV generations can be separate to a few applications (Fig.1) in depending on the utilisation purpose:/p

    p class="zkladn_text_odsazen_2" Small SAPV DC systems/p

    p class="zkladn_text_odsazen_2" SAPV DC-AC system/p

    p class="zkladn_text_odsazen_2" SAPV DC-AC battery system/p

    p class="zkladn_text_odsazen_2"They can provide the power for domestic or other purposes./p

    p class="zkladn_text_odsazen_2"pPV generator/p

    pJunction box/p

    pAC circuit/p

    pBreaker/p

    pString/p

    pdiodes/p

    pString 1/p

    pString 2/p

    pOvervoltage/p

    pprotection/p

    pAC/p

    pLoad/p

    pDC/p

    pLoad/p

    pConnector/p

    pEarth/p

    p=/p

    p~/p

    pCharge/p

    pcontroler/p

    pBatery/p

    pInverter/p

    pDC circuit/p

    pBreaker/p

    /p

    p class="zkladn_text_odsazen_2"Figure 1. Grid on PV systems/p

    p class="zkladn_text_odsazen_2"Off-grid domestic systems provide electricity to households and villages that are not connected to the utility grid. They provide electricity for lighting, refrigeration and other low power loads and have been installed worldwide, particularly in developing countries, where they are often the most appropriate technology to meet the energy demands of off-grid communities. Off-grid domestic systems generally offer an economic alternative to extending the electricity distribution grid at distances of more than 1 or 2 km from existing power lines./p

    p class="zkladn_text_odsazen_2"Off-grid non-domestic installations were the first commercial application for terrestrial PV systems. They provide power for a wide range of professional applications, such as telecommunication, water pumping, vaccine refrigeration, navigational aids, aeronautical warning lights and meteorological recording equipment. These are applications where small amounts of electricity have a high value, thus making PV commercially cost competitive with other small generating sources. There are the large market possibilities for not only for developing countries but for industrialised countries too./p

    p class="zkladn_text_odsazen_2"The possibility of SAPV using is the economy question. The electric power consumers prefer the cheapest power source, if it possible. They compare various possibilities of the power supplying. In the case of the possibility choosing between the power supplying from EPS or from SAPV, this problem is concentrate on investment and operation costs of the power supplying system. It means that it depends on investment SAPV costs, EPS investment and operation costs (which are the function of the distance from distribution grid Fig.2)/p

    p class="zkladn_text_odsazen_2"p0/p

    p0,5/p

    p1/p

    p1,5/p

    p2/p

    p2,5/p

    p3/p

    p3,5/p

    p4/p

    p0/p

    p0,5/p

    p1/p

    p1,5/p

    p2/p

    p2,5/p

    p0/p

    p0,5/p

    p1/p

    p1,5/p

    p2/p

    p2,5/p

    p3/p

    p3,5/p

    p4/p

    p0/p

    p0,5/p

    p1/p

    p1,5/p

    p2/p

    p2,5/p

    pSAPV module/p

    pConnnection to grid/p

    pPV instalation/p

    pcosts decreasing /p

    pS/p

    pu/p

    pp/p

    pp/p

    pl/p

    pa/p

    py/p

    pi/p

    pn/p

    pg/p

    p /p

    po/p

    pu/p

    pt/p

    pp/p

    pu/p

    pt/p

    p /p

    p[/p

    pk/p

    pW/p

    p]/p

    pDistance from grid [km]/p

    /p

    p class="zkladn_text_odsazen_2"Figure2. SAPV economy/p

    p class="zkladn_text_odsazen_2"In SAPV systems, special attention must pay to malfunction or failure. Start-up power peaks, or reactive power and harmonic distortion can cause system signal instability and protective devices will close the system down./p

    p class="zkladn_text_odsazen_2"A well-matched load together with a carefully selected choice of appliances can lead to significant savings in terms of reduced need for PV and electricity storage capacity. Conversely, inefficient appliances and processes, standby loads and inappropriate loads will increase the requirement for expensive PV and storage capacity./p

    p class="zkladn_text_odsazen_2"Grid-connected distributed PV systems (Fig.3) are a relatively recent application where a PV system is installed to supply power to a building or other load that is also connected to the utility grid. These systems are increasingly integrated into the built environment and are likely to become commonplace. They are used to supply electricity to residential dwellings, commercial and industrial buildings, and are typically between 0,4 kW and 100 kW in size. The systems typically feed electricity back into the utility grid when the on-site generation exceeds the building loads. These systems offer a number of advantages: distribution losses are reduced because the systems are installed at the point of use, no extra land is required for the PV systems, costs for mounting systems can be reduced, and the PV array itself can be used as a cladding or roofing material as building integrated PV (BIPV). Compared to an off-grid installation, system costs are lower as energy storage is not generally required a factor that also improves system efficiency and decreases the environmental impact./p

    p class="zkladn_text_odsazen_2"pPV generator/p

    pJunction box/p

    pDC circuit/p

    pBreaker/p

    pInverter/p

    pString/p

    pdiodes/p

    pString 1/p

    pString 2/p

    pOvervoltage/p

    pprotection/p

    pGrid/p

    pLoad/p

    pElectricity/p

    pmeter/p

    pPV/p

    pelectricity/p

    pMeter/p

    pConnector/p

    pEarth/p

    pMain/p

    pDC cable/p

    pWh/p

    pWh/p

    pWh/p

    p=/p

    p~/p

    /p

    p class="zkladn_text_odsazen_2"Figure 3. Grid on PV systems/p

    p class="zkladn_text_odsazen_2"Three types of grid-connected photovoltaic systems are considered in the Grid-Connected Photovoltaic System Design Review and Approval process. These include:/p

    p class="zkladn_text_odsazen_2" Grid-Connected PV Systems without Battery Storage,/p

    p class="zkladn_text_odsazen_2" Grid-Connected PV Systems with Battery Storage,/p

    p class="zkladn_text_odsazen_2" Grid-connected centralized systems./p

    p class="zkladn_text_odsazen_2"Systems without battery are designed to operate in parallel with and interconnected to the electric utility grid./p

    p class="zkladn_text_odsazen_2"Systems with battery under normal circumstances operate in a grid-connected mode, supplementing the on-site loads or sending excess power back onto the grid while keeping the battery fully charged. In the event the grid becomes de-energized, control circuitry in the inverter opens the connection with the utility through a bus transfer mechanism, and operates the inverter from the battery to supply power to the dedicated critical load circuits only./p

    p class="zkladn_text_odsazen_2"Grid-connected centralized systems have been installed for two main purposes:/p

    p class="zkladn_text_odsazen_2" an alternative to conventional centralized power generation,/p

    p class="zkladn_text_odsazen_2" strengthening the utility distribution system./p

    p class="zkladn_text_odsazen_2"Utilities in a number of countries are investigating the feasibility of these types of power plants. Demonstration plants have been installed in Germany, Italy, Japan, Spain, Switzerland and the USA, generating reliable power for utility grids and providing experience in the construction, operation and performance of such systems. /p

    p class="zkladn_text_odsazen_2"In IPV (except centralised) systems, the power which flows from PV cells can fully or partly covers the power consumption. In the case of the backward power flow to the EPS, the PV power cause electric power parameters changes. These changes are caused by invertors converting DC to AC. Inverters fall into two-main categories:/p

    p class="zkladn_text_odsazen_2" self-commutated, /p

    p class="zkladn_text_odsazen_2" line-synchronised./p

    p class="zkladn_text_odsazen_2"The first can operate independently, being activated solely by the input power source; the line-synchronised inverters are triggered directly from the grid. Utilities require that inverters connected to the grid must contain suitable control and protection to ensure that systems are installed safely and do not adversely affect the power quality./p

    p class="zkladn_text_odsazen_2"A major issue related to interconnection of distributed resources onto the power grid which can have the potential impacts on the quality of power provided to other customers connected to the grid can be described by the attributes which define power quality and include:/p

    p class="zkladn_text_odsazen_2" Voltage regulation - The maintenance of the voltage at the point of delivery to each customer within an acceptable range./p

    p class="zkladn_text_odsazen_2" Flicker - The repetitive and rapid changes of voltage, which has the effect of causing unacceptable variations in light output and other effects on power consumers and their equipment./p

    p class="zkladn_text_odsazen_2" Voltage imbalance - The grid voltage does not have identical voltage magnitude on each phase, and a 120 phase separation between each pair of phases./p

    p class="zkladn_text_odsazen_2" Harmonic distortion - The injection of currents having frequency components which are multiples of the fundamental frequency./p

    p class="zkladn_text_odsazen_2" Direct current injection - A situation which can cause saturation and heating of transformers and motors, and can also cause these passive devices to produce unacceptable harmonic./p

    p class="zkladn_text_odsazen_2"The maintenance of the voltage in supplying points in the definite limits s the problems related with the growing number connected PV generators to EPS. This is the main reason which limits penetration of PV systems in electric power networks./p

    p class="nadpis_1"9. VOLTAGE IN ELECTRIC POWER SYSTEMS/p

    p class="normln"The supply voltage at the delivery point (IJ, Fig.4) must lie within 10% of the nominal voltage for European LV networks according to EN50160. Stricter limits may apply nationally, both in Europe and elsewhere. Accepted voltages typically lie between 90% and 106% of the nominal voltage. Stricter limits may apply nationally, both in Europe and elsewhere. Accepted voltages typically lie between 90% and 106% of the nominal voltage./p

    p class="nadpis_9"3.1 The classic electric power system (centralised production)/p

    p class="normln"In a classic EPS, the generated power of large centralised power generators supply the power network with electrical energy. The generators are connected to the high voltage power network at the highest voltage level (A) and the power is consumed usually at the lowest voltage level (I-J) (see Fig.4). The power flow causes the voltages to drop through the power system from the highest voltage to the lowest voltage./p

    p class="normln"div class="embedded" id="_1145449475"/

    pVHV/VH/p

    pV/p

    pVHV/HV/p

    pHV/HV/p

    pHV/LV/p

    p110/p

    p105/p

    p100/p

    p95/p

    p90/p

    p85/p

    pUPPER LIMIT/p

    pLOWER LIMIT/p

    pA/p

    pB/p

    pC/p

    pD/p

    pF/p

    pG/p

    pH/p

    pI/p

    pJ/p

    pE/p

    /p

    p class="normln"Figure4. Voltage drops in the central supply EPS/p

    p class="normln"The voltage changes caused by the loads switching on or off are regulate by the automatic tap changers on the secondary side of the transformers (except HV/LV) as close as possible to the nominal system voltage. The maximum deviation from the nominal system voltage is only one step in tap changer position (typically 1,5%)./p

    p class="nadpis_9"3.2 The decentralised electric power system with high PV penetration/p

    p class="normln"In a decentralised EPS the power sources are interconnected to the EPS typically in the points of MV or LV levels. For the PV cells it is usually LV level due to their low power output./p

    p class="normln"As regards to EPS connection, usually, the MV/LV transformers are evenly distributed along an open MV ring (Fig.5) and the PV cells can be interconnected to the system:/p

    p class="zkladn_text_odsazen_2" from a single LV line (A) (in the order of 30-80 kWp power output),/p

    p class="zkladn_text_odsazen_2" from all the LV lines connected to a single MV/LV transformer (B) (in the order of 200-400 kWp),/p

    p class="zkladn_text_odsazen_2" penetration from all the MV/LV transformers connected to MV ring (C) (in the order of 1-2 MWp)./p

    p class="normln"In each of these cases, the maximal penetration of PV cells is acceptable in depending on the voltage droop./p

    p class="normln"div class="embedded" id="_1145457163"/

    pHV/p

    pEQVIVALENT/p

    pHV/p

    pRIN/p

    pG/p

    pHV/LV/p

    pA/p

    pB2/p

    pC1/p

    pC2/p

    pD1/p

    pD2/p

    pB1/p

    p0,4/p

    pkV/p

    /p

    p class="normln"Figure5. PV connection to EPS/p

    p class="normln"The computation of PV penetration can be simplified and limited to the MV and LV due to automatic voltage regulation and therefore approximately 100% voltage values in point A. The results are on Fig.6. As the computation data, the typical values for the EPS scheme on Fig.5. were assumed - five LV lines per MV/LV transformer, 10 MV/LV transformers per ring, average values for line impedances, line lengths and transformer sizes. The minimum load is set to 25% of the maximum load./p

    p class="normln" SHAPE \* MERGEFORMAT

    /p

    p class="normln"Figure6. Maximal penetration PV in EPS/p

    p class="normln"The results show that no PV penetration is acceptable at minimum load. However, the excess voltages are rather limited for PV penetrations up to the minimum load. In addition, only a small increase in the load from the minimum load opens up for aconsiderable amount of PV, especially if PV power only penetrates from a single LV line./p

    p class="nadpis_1"10. POSIBILITIES OF THE PV PENETRATION LIMITS STRETCHING/p

    p class="normln"The purpose of PV cells interconnecting to EPS can follow two possibilities:/p

    p class="normln" utilisation of the sun power as the renewable source as much as possible,/p

    p class="normln" maximal sun power utilisation on costumer sides with the acceptable economy and easy interconnection to EPS./p

    p class="normln"Maximal sun power concentration can be situated only to places with very high power density and their higher power output is interconnected to MV EPS./p

    p class="normln"In the case of PV interconnection on LV, usually, measures to increase the amount of PV penetration are only necessary if large power production from PV coincides with minimum load situations. This is the case during the summer holidays. The restrictions at minimum load are only a problem if the power generation from the PV systems coincides with minimum load situations./p

    p class="nadpis_9"4.1 Separate PV reception networks/p

    p class="normln"The separate reception networks for PV systems (i.e. PV networks separated from the consumer networks) is the solution of over-voltages from PV penetration only for suitable places with good possibility to EPS interconnection. Such measures are often seen in the open country with high concentrations of wind power. The power networks are, however, not very often well established to begin within the open country. On the contrary, one of the big advantages of PV is in fact to connect to the existing power networks./p

    p class="nadpis_9"4.2 Controllable PV systems/p

    p class="normln"Another way to avoid over-voltages could be to let PV systems reduce their power generation in case of over-voltage. Such a measure would add unacceptable costs to PV systems, especially if the measure is not needed in the majority of cases. In addition, over-voltage monitoring systems would make PV more sensitive to short-term disturbances in the network./p

    p class="normln"Next, PV penetration can be assumed as one of distributed generation resources to the low voltage grids, such as, combined heat and power (CHP), micro-turbines, small wind turbines in certain areas and fuel cells. The number of these sources in the near future is constantly increasing and altering the traditional operating principle of the grids. A particularly promising aspect, related to the proliferation of small-scale decentralized generation, is the possibility for parts of the network comprising sufficient generating resources to operate in isolation from the main grid, in a deliberate and controlled way. Grid portions with such a capability are called Micro grids. Acritical factor in order to exploit the potential offered by the micro grid concept is the presence of micro sources, with fast-acting power electronics interfaces to regulate voltage and frequency and ensure proper load sharing among the various sources, when operating in isolated mode. In interconnected mode, the micro-source and central micro-grid controllers regulate the power exchange with the grid, monitor grid conditions and ensure proper separation./p

    p class="nadpis_9"4.3 Adjusting the transformer tap changer position/p

    p class="normln"A more feasible solution to allow for more PV penetration at minimum load during summer days is to change the MV/LV transformer tap changer positions in the summer period when maximum load situations are not expected. Such a measure would require an interruption of the electricity supply twice a year for adjustment of the offload tap changers./p

    p class="nadpis_9"4.4 Customer initiatives/p

    p class="normln"One of the most efficient measures to allow for more PV is perhaps the customers' own changes of behaviour. Experience with PV systems shows that the customers change their consumption behaviour when at the same time they become power producers. Many customers who have PV installed endeavour to move their consumption to moments with high power production from PV. This tendency is especially noticeable if the customers experience different buying and selling prices of electricity./p

    p class="nadpis_1"11. CONCLUSION/p

    p class="zkladn_text_odsazen_2"The limits to the amount of PV power that can be fed into a power system are least critical if the PV power penetrates from only a single LV line and they are more severe if PV power penetrates from multiple LV lines connected to the same MV/LV transformer or even multiple MV/LV transformers. The most severe limits occur in minimum load situations, especially if the power system beforehand is operated to its design limits. However, an amount of PV penetration equal to the minimum load only causes slight excess voltages in the power network./p

    p class="zkladn_text_odsazen_2"In the longer term, it is expected that more flexible and accommodated consumption will remove the barriers of limits to PV penetration. It is essential that PV - together with other elements of distributed generation is must be considered in the future network planning./p

    p class="zkladn_text_odsazen_2"The work was supported by the GAR (Czech Science Foundation), research project GAR No. 102/02/0949./p

    p class="nadpis_6"REFERENCES/p

    p class="zhlav"[1]/p

    p class="normln"Impacts of Power Penetration from Photovoltaic Power Systems in Distribution Networks Report IEA-PVPS T5-10: 2002./p

    p class="normln"[2]/p

    p class="normln"Grid-connected photovoltaic power systems, Report IEA PVPS T5-03: 1999./p

    p class="normln"[3]/p

    p class="normln"Dvorsk, E., Hejtmnkov, P., Mhlbacher, J.: Influence of photovoltaic cells interconection to distribution power networks. 13th International expert meeting Power Engineering, Maribor, Slovenia, 2004, ISBN 86-435-0617-6./p

    p class="normln"[4]/p

    p class="normln"Dvorsk, Hejtmnkov, Tma: Optimisation and reliability CHP units utilisation, International workshop The Effective Use of Physical Theories of Conversion of Energy, Pernink, June n 2002./p

    p class="normln"/p

    p class="normln"/

    p class="normln"[5]/p

    p class="normln"Dvorsk, Hejtmnkov, Tesaov: Introduction of Distributed Generation Technologies, 12th International Expert Meeting "Power Engineering" Maribor, 2003./p

    p class="normln"[6]/p

    p class="normln"Mhlbacher, J., Noh, K., Nohov, L.: Distributed power systems: 12th International Expert Meeting "Power Engineering" Maribor, 2003./p

    p class="normln"BIOGRAPHY/p

    p class="normln"Jan korpil was born on 18. 8. 1941. In the 1964 he received the M.Sc. (Ing.) degree with distinction at the Technical College in Plze, at the Department of Power machinery engineering. Then he worked as a plant engineer at the power and heating station of Chemical plant in Zlu. Since 1966 to 1971 he was a senior lector at the department of Power machinery engineering at the Technical College in Plze. Since 1971 to 1981 was a special worker for teaching technology, since 1981 he works at Power engineering department of Electric Engineering Faculty of the West Bohemia University in Plze. He finished PhD (CSc) study in the field of technical teaching in 1989. Since 1991 he is associate professor (Doc), his thesis title was from area of Power engineering. His branch area is power station equipment, renewable energy sources and environmental protection./p

    p class="normln"Pavla Hejtmnkov finished her university study at 1987 in the specialisation of Electric Power Engineering. Then she worked as a assistant at Department of Power Engineering of Electric Engineering Faculty of West Bohemia University in Plze. In 1992 to 1995 continued in Ph.D. study in the field of Electric distribution network control.. She finished her Ph.D. study in 1996 and since she works as senior lecture on the same department. Now her research activity is concentrate to the Decentralised Sources utilisation./p

    p class="nzev"EFFECTIVE WAY OF OVERRIDING C++ OPERATORS FOR MATRIX OPERATIONS/p

    p class="normln"Jan CVEJN/p

    p class="normln"Technical University of Liberec, Faculty of Mechatronics and Interdisciplinary Engineering Studies/p

    p class="normln"Hlkova 6, 461 17 Liberec, Czech Republic, E-mail: [email protected]/p

    p class="normln"SUMMARY/p

    p class="normln"The article describes an effective way how to create a library for manipulating matrices in a similar manner as if they were numbers in programming language C++. Although overriding arithmetic operators for manipulating classes such as vectors and matrices seems to be a standard procedure in C++, there occur serious problems when performance aspects are taken into account. The main disadvantage consists in frequent allocating and deallocating memory, which can significantly slow down evaluating arithmetic expressions. In the article an implementation is described that eliminates redundant memory allocations and enables using overridden operators for effective evaluating matrix expressions. /p

    p class="normln"Keywords: C++ programming, overloading operators, object-oriented programming/p

    p class="nadpis_1"12. INTRODUCTION /p

    p class="normln"Creating classes to simplify operations with dynamic structures is one of key concepts of object-oriented programming in general and implementing such a class is a common procedure in any objectoriented language. In C++, it is however possible to go further and extend applicability of arithmetic operators to manipulate with structured objects. This would be especially useful at work with objects that naturally participate in arithmetic expressions, such as vectors and matrices. For example, let us consider the following matrix expression:/p

    p class="normln"div class="embedded" id="_1152089600"/

    p1/p

    pT/p

    pCBABDE/p

    p-/p

    p=+/p

    /p

    p class="normln"A general approach to creating a class to simplify programming such expressions in an object-oriented language would be defining methods for matrix addition, multiplication, transposition and inversion. The expression above would be then transcribed as follows:/p

    p class="prog"Z.inv(B);/p

    p class="prog"Y.mul(B,A);/p

    p class="prog"X.mul(Y,Z); /p

    p class="prog"W.tr(D);/p

    p class="prog"V.mul(W,E);/p

    p class="prog"C.add(X,V);/p

    p class="normln"In this example it is needed to define help matrix variables Z, Y, X, W, V. However, which is worse, from the lines above it is not very obvious what they actually do. In C++ we can, under some conditions, which are discussed later, override operators to enable writing the expression in natural form as follows:/p

    p class="prog"C = B*A*B.inv()+D.tr()*E; /p

    p class="normln"In this way matrix expressions can be computed in a similar manner as in some high-level matrix language while keeping high performance and full control of evaluating. Since such expressions are usually parts of complex numerical algorithms, effectivity is very important. /p

    p class="nadpis_1"13. BASIC IMPLEMENTATION/p

    p class="normln"To start, let us define a simple C++ class for matrix operations and describe briefly some elementary steps. The memory for floating-point data is allocated dynamically. /p

    p class="prog"#define NUM double/p

    p class="prog"class Matrix/p

    p class="prog"{/p

    p class="prog"NUM *pdata; //buffer for data/p

    p class="prog"int dataLen; //length of /p

    p class="prog"//allocated buffer/p

    p class="prog"int m,n; //dimensions of matrix /p

    p class="prog"public:/p

    p class="prog"Matrix()/p

    p class="prog"{/p

    p class="prog"

    pdata=NULL;/p

    p class="prog"dataLen=m=n=0;}/p

    p class="prog"Matrix(int m,int m) /p

    p class="prog"{ /p

    p class="prog"

    pdata=NULL;/p

    p class="prog"dataLen=m=n=0;redim(m,n);/p

    p class="prog"}/p

    p class="prog"~Matrix() { delete[] pdata; }/p

    p class="prog"void copy(const Matrix &A);/p

    p class="prog"void redim(int _m, int _n); //method /p

    p class="prog"//for (re)allocating memory/p

    p class="prog"Matrix(const Matrix &A) { copy(A);}/p

    p class="prog"Matrix &operator=(const Matrix &A) /p

    p class="prog"{ /p

    p class="prog"

    delete[] pdata;/p

    p class="prog"

    copy(A); return *this; }/p

    p class="prog"void setValue(NUM value); /p

    p class="prog"void checkBounds(int i, int j) /p

    p class="prog"{ assert(0=i && im /p

    p class="prog"

    && 0=j && jn); }/p

    p class="prog"NUM &operator()(int i, int j) /p

    p class="prog"{ checkBounds(i,j); /p

    p class="prog"

    return pdata[i*n+j]; }

    /p

    p class="prog"NUM getAt(int i, int j) /p

    p class="prog"{ checkBounds(i,j); /p

    p class="prog"

    return pdata[i*n+j]; }/p

    p class="prog"void setAt(int i, int j, NUM v) /p

    p class="prog"{ checkBounds(i,j); pdata[i*n+j]=v;}/p

    p class="prog"};/p

    p class="normln"Using this class we can create a matrix variable A of dimensions m x n as follows:/p

    p class="prog"Matrix A(m,n);/p

    p class="normln"To access a particular element of the matrix, the overridden operator () can be used:/p

    p class="prog"A(0,0)=1;/p

    p class="normln"or /p

    p class="prog"A.setAt(0,0,1);/p

    p class="normln"Methods for accessing elements use standard debugging macro assert() to check for correct values of indexes. This macro does not affect performance since it is automatically omitted in the release build of the program. All these methods are written as inline for eliminating overhead of calling functions. /p

    p class="normln"Since the copy constructor and operator = are overridden, we can write /p

    p class="prog"Matrix B(A),C;/p

    p class="prog".../p

    p class="prog"C=B;/p

    p class="normln"Note that operator= has to deallocate data unlike the copy constructor. /p

    p class="normln"There remain several things to write method redim() used for allocating and reallocating the memory, method setValue() used for initializing data and method copy() for copying the data./p

    p class="prog"void Matrix::setValue(NUM value)/p

    p class="prog"{/p

    p class="prog"for(int i=0;im*n;i++)/p

    p class="prog"

    pdata[i]=value;}/p

    p class="prog"void Matrix::redim(int _m, int _n)/p

    p class="prog"{/p

    p class="prog"if(m!=_m || n!=_n)/p

    p class="prog"{/p

    p class="prog"

    if(_m*_ndataLen)/p

    p class="prog"

    {/p

    p class="prog"

    dataLen=_m*_n;/p

    p class="prog"delete[] pdata;/p

    p class="prog"

    pdata=new NUM[dataLen];/p

    p class="prog"

    }/p

    p class="prog"

    m=_m;/p

    p class="prog"

    n=_n;/p

    p class="prog"}/p

    p class="prog"}/p

    p class="prog"void Matrix::copy(const Matrix &A)/p

    p class="prog"{/p

    p class="prog"redim(A.m, A.n);/p

    p class="prog"memcpy(pdata,A.pdata,/p

    p class="prog" m*n*sizeof(NUM));/p

    p class="prog"}/p

    p class="normln"Note that dataLen is greater or equal to m.n. The buffer is reallocated only if it is necessary, i.e. if it would be m.n dataLen. This is one of the most important features of this class implementation. If any global or static object of class Matrix is used for storing results, its data buffer is reallocated only once, or possibly a few times if the variable is used at more places of the program for storing matrices with different dimensions. /p

    p class="normln"Internal implementation of Matrix could be different. For example, it would be possible to define it as a dynamic array of pointers to dynamic arrays that are allocated separately. In this case, the access to an element may be on some machines little faster, since there is no need of multiplication:/p

    p class="prog"class Matrix/p

    p class="prog"{/p

    p class="prog"public:/p

    p class="prog"NUM **pdata; /p

    p class="prog"int m, n; //dimensions of /p

    p class="prog"

    //the matrix /p

    p class="prog".../p

    p class="prog"NUM& operator()(int i,int j) /p

    p class="prog"

    //obtain ref. to A(i,j) elem./p

    p class="prog"{ /p

    p class="prog"

    return pdata[i][j]; /p

    p class="prog"}/p

    p class="prog"};/p

    p class="normln"However, in this case, reallocating the data buffers is more complicated. Another advantage of the first implementation is the possibility of iterating through data in a single loop, which is faster (see implementation of methods setValue() and copy() above). /p

    p class="nadpis_1"14. OVERRIDING OPERATORS/p

    p class="normln"Now, some operators for manipulating matrices can be defined. At first, two simple and commonly used ways are described with their disadvantages. /p

    p class="normln"The operators +, += can be defined as follows:/p

    p class="prog"class Matrix/p

    p class="prog"{/p

    p class="prog".../p

    p class="prog"void operator+=(const Matrix &A);/p

    p class="prog"Matrix operator+(const Matrix &A); /p

    p class="prog"};/p

    p class="prog"void Matrix::operator+=(const Matrix &A)/p

    p class="prog"{/p

    p class="prog"assert(m==A.m && n==A.n); /p

    p class="prog"//+ defined only for matrices of //the same dimensions/p

    p class="prog"for(int i=0;im*n;i++) /p

    p class="prog"pdata[i]+=A.pdata[i];/p

    p class="prog"} /p

    p class="prog"Matrix Matrix::operator+(const Matrix &A)/p

    p class="prog"{/p

    p class="prog"assert(m==A.m && n==A.n);/p

    p class="prog"Matrix B(m,n);/p

    p class="prog"for(int i=0;im*n;i++) /p

    p class="prog"

    B.pdata[i]=pdata[i]+A.pdata[i];/p

    p class="prog"return B;}/p

    p class="normln"Operator += can be overridden in this way without any problems. Operator + implementation requires some discussion. The previous simple way to define operator + is correct, but unfortunately ineffective. Each time it is called, it has to allocate memory for the local object B that it returns. However, when B is returned, a new temporary object is created to hand over the data. This needs one more memory allocation. Finally, assignment operator = used in expression C=A+B may require one more call of redim(). Here the reallocation is not done if the matrix C has already sufficiently large data buffer. The right-hand arguments of the operators are handed over using references to save more memory allocations. Nevertheless, the operating system is called two times at minimum to allocate memory at each time the + operator is invoked. This allocation can be much more demanding than pure copying and summing the data. /p

    p class="normln"Let us look at a different approach to defining operators. To omit unnecessary copying of data, arithmetic operators are often defined to return the reference to a local static object:/p

    p class="prog"class Matrix/p

    p class="prog"{/p

    p class="prog".../p

    p class="prog"Matrix &operator+(const Matrix &A); /p

    p class="prog"Matrix &operator*(const Matrix &A); /p

    p class="prog"};/p

    p class="prog"Matrix &Matrix::operator+(const Matrix &A)/p

    p class="prog"{/p

    p class="prog"assert(m==A.m && n==A.n);/p

    p class="prog"static Matrix B; //!/p

    p class="prog"B.redim(m,n);/p

    p class="prog"for(int i=0;im*n;i++) /p

    p class="prog"B.pdata[i]=pdata[i]+A.pdata[i];/p

    p class="prog"return B;} /p

    p class="prog"Matrix &Matrix::operator*(const Matrix &A)/p

    p class="prog"{/p

    p class="prog"assert(n==A.m); /p

    p class="prog"static Matrix B; /p

    p class="prog"B.redim(m,A.n);/p

    p class="prog"for(int i=0;im;i++)/p

    p class="prog"{/p

    p class="prog"

    for(int j=0;jA.n;j++)/p

    p class="prog"

    {/p

    p class="prog"

    NUM sum=0;/p

    p class="prog"for(int k=0;kn;k++)/p

    p class="prog"sum+=getAt(i,k)*A.getAt(k,j);/p

    p class="prog"

    B.setAt(i,j,sum);/p

    p class="prog"

    }/p

    p class="prog"}/p

    p class="prog"return B;/p

    p class="prog"}/p

    p class="normln"In this case, the expression A+B will return reference to the static variable, defined in the operator + body, which holds the result. Expressions C=A+B and D=A+B+C will be evaluated correctly. However, let us look at an expression like this:/p

    p class="prog"E=A*B+C*D;/p

    p class="normln"C++ language keeps priorities of operators even if they are overridden for user classes. In the case of expression above, products A*B and C*D have to be held in two separate places before summation. However, there is only one help static variable in operator * to hold the temporary result. Therefore, the result cannot be correct. /p

    p class="normln"Obviously, both the approaches are not suitable either ineffective, or incorrect in some cases. /p

    p class="normln"Note that an operator function has to return either Matrix or reference to Matrix so that more complicated expressions can be created. /p

    p class="nadpis_1"15. EFFECTIVE IMPLEMENTATION OF OPERATORS /p

    p class="normln"One approach to correct and yet effective implementation is described further. The main ideas are the following:/p

    p class="normln"1. There is needed a static array of objects to hold the temporary results that occur while evaluating complex expressions. It can be called array of results ant its items result objects. The array has to be static because in the opposite case it would be allocated and deallocated each time any expression is evaluated. /p

    p class="normln"2. Each time a result object is needed, the array will be searched for the first free result object. This means, each item of the array has to have a flag signalizing that it is free or not./p

    p class="normln"3. After evaluating the expression the flags of all the result objects have to be set to the free state so that the objects can be used again for evaluating next expressions./p

    p class="normln"Since the result objects can be repeatedly used for matrices of different dimensions, before they are used, redim() method has to be called. Here one of the key ideas of this implementation is used - method redim() allocates memory only if the data buffer needs to grow. In consequence, after a small number of evaluations of expressions no more memory reallocation will be done. /p

    p class="normln"In the beginning, the objects in the array of results have not allocated memory. The memory will be allocated the first time the object is used (calling method redim()). Since the array is static, memory will be automatically destroyed when the program ends. /p

    p class="normln"The previous paragraphs imply that after many different expressions were evaluated, the number of allocated result objects will correspond to the number of operators in the most complex expression and the size of the largest memory block held by the array of results is the same as the size of the largest matrix that occurred in any expression. /p

    p class="normln"Each operator could now ask the Matrix class for a free result object, perform the operation and return the reference to the result object. In this case, it is not yet clear how to realize step 3. The result objects will not be informed in any way that the whole expression was evaluated and cannot re