
Applied Mathematics and Computation 166 (2005) 291–298

www.elsevier.com/locate/amc

An algorithm for multiple-precision floating-point multiplication

Daisuke Takahashi

Institute of Information Sciences and Electronics, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

Abstract

We present an algorithm for multiple-precision floating-point multiplication. The conventional algorithms based on the fast Fourier transform (FFT) multiply two n-bit numbers to obtain a 2n-bit result. In multiple-precision floating-point multiplication, however, we need only a result whose precision equals that of the multiple-precision floating-point operands. We show that the overall arithmetic operation count of FFT-based multiple-precision floating-point multiplication is reduced by decomposing the full-length multiplication into shorter-length multiplications.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Program derivation; Multiplication; Multiple-precision arithmetic; Fast Fourier transform

0096-3003/$ - see front matter

doi:10.1016/j.amc.2004.04.034

E-mail address: [email protected]

1. Introduction

Many multiple-precision multiplication algorithms have been well studied [1–6]. Multiple-precision multiplication of n-bit numbers requires $O(n^2)$ bit operations using the ordinary multiplication algorithm [1]. Karatsuba's algorithm [2] is known to reduce the number of operations to $O(n^{\log_2 3})$.

Multiple-precision multiplication of n-bit numbers can be performed in $O(n \log n \log\log n)$ bit operations by using the Schönhage–Strassen algorithm [3], which is based on the fast Fourier transform (FFT) [7]. However, the Schönhage–Strassen algorithm may not be able to take advantage of computers with fast floating-point hardware, and it needs binary-to-decimal radix conversion for the final result. Bailey used the discrete Fourier transform (DFT) with three prime modulo computations, followed by reconstruction through the Chinese Remainder Theorem, for his calculation of π to 29 million decimal digits [8].

In addition, multiple-precision multiplication using a floating-point real FFT is known as another fast multiplication algorithm [9,10].

These conventional FFT-based multiplication algorithms multiply two n-bit numbers to obtain a 2n-bit result. In multiple-precision floating-point multiplication, however, we need only a result whose precision equals that of the multiple-precision floating-point operands. We will call this result the “short product” here. Algorithms for multiple-precision floating-point multiplication are given in [4,6]; they use the ordinary $O(n^2)$ multiplication algorithm or Karatsuba's $O(n^{\log_2 3})$ algorithm. However, for multiple-precision multiplication of several thousand decimal digits or more, FFT-based multiplication is the fastest. We show that the overall arithmetic operation count of FFT-based multiple-precision floating-point multiplication is reduced by decomposing the full-length multiplication into shorter-length multiplications.

For simplicity, in this paper we use a short product that does not provide exact rounding. However, it is not hard to extend multiple-precision floating-point multiplication to exact rounding [4,6]. We also restrict ourselves to computing the mantissa of floating-point numbers.

2. Multiple-precision multiplication based on FFT

We deal here with real FFT-based multiple-precision multiplication.

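Let us consider the product Z of two integers X and Y of length n in base B:

$$ X = \sum_{i=0}^{2n-1} x_i B^i, \qquad Y = \sum_{i=0}^{2n-1} y_i B^i, $$

where $0 \le x_i < B$ and $0 \le y_i < B$ for $0 \le i \le n-1$, and $x_i = y_i = 0$ for $i \ge n$. Then,

$$ Z = X \cdot Y = \left( \sum_{i=0}^{2n-1} x_i B^i \right) \cdot \left( \sum_{j=0}^{2n-1} y_j B^j \right) = \sum_{k=0}^{2n-1} \left( \sum_{j=0}^{2n-1} x_j y_{k-j} \right) B^k = \sum_{k=0}^{2n-1} z_k B^k. $$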

Thus, $z_k$ can be written as follows:

$$ z_k = C_k(x, y) = \sum_{j=0}^{2n-1} x_j y_{k-j}, \quad \text{for } 0 \le k \le 2n-2, \quad \text{and} \quad z_{2n-1} = 0, $$

where the subscript $k-j$ is interpreted as $k-j+2n$ if $k-j$ is negative.

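We note that each $z_k$ is at most $nB^2$, since it is a sum of at most n nonzero products, each smaller than $B^2$. Each $z_k$ must therefore be normalized (by propagating carries) so that $0 \le z_k < B$.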
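For reference, the convolution and normalization just described can be evaluated directly with $O(n^2)$ digit operations. The Python sketch below is illustrative only: the function names and the little-endian digit order ($x_0$ first) are our assumptions, not part of the paper. It zero-pads both inputs to $2n$ digits, evaluates $z_k$ with the cyclic index $(k - j) \bmod 2n$, checks the bound $z_k < nB^2$, and propagates carries.

```python
def cyclic_convolution(x, y):
    """Return z with z_k = sum_j x_j * y_{(k - j) mod N}, where N = len(x)."""
    size = len(x)
    return [sum(x[j] * y[(k - j) % size] for j in range(size)) for k in range(size)]


def normalize(z, base):
    """Propagate carries so that every output digit satisfies 0 <= digit < base."""
    digits, carry = [], 0
    for zk in z:
        carry, digit = divmod(zk + carry, base)
        digits.append(digit)
    while carry:  # extends the result if a carry remains (not needed after 2n padding)
        carry, digit = divmod(carry, base)
        digits.append(digit)
    return digits


def schoolbook_product(x_digits, y_digits, base):
    """Multiply two n-digit numbers given as little-endian digit lists (x_0 first)."""
    n = len(x_digits)
    # Zero-pad to 2n digits (x_i = y_i = 0 for i >= n) so the wrap-around adds nothing.
    x = list(x_digits) + [0] * n
    y = list(y_digits) + [0] * n
    z = cyclic_convolution(x, y)
    assert max(z) < n * base ** 2          # the bound z_k < n B^2 noted above
    return normalize(z, base)
```

For example, `schoolbook_product([4, 3, 2, 1], [8, 7, 6, 5], 10)` returns `[2, 5, 6, 6, 0, 0, 7, 0]`, the little-endian digits of 1234 × 5678 = 7006652.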

The DFT and the inverse DFT are given by

$$ F_k(x) = \sum_{j=0}^{N-1} x_j e^{-2\pi i jk/N}, \tag{1} $$

$$ F_k^{-1}(x) = \frac{1}{N} \sum_{j=0}^{N-1} x_j e^{2\pi i jk/N}, \tag{2} $$

where $N = 2n$ and $i = \sqrt{-1}$.

Let $C(x, y)$ denote the convolution of the sequences x and y. Then the convolution theorem for discrete sequences states that

$$ F[C(x, y)] = F(x)F(y), \quad \text{that is,} \quad C(x, y) = F^{-1}[F(x)F(y)]. $$

We can use the FFT to compute the DFTs in (1) and (2). The arithmetic operation count of the N-point FFT is $O(N \log N)$ [7]. The data length N is assumed to be a power of two. Let the arithmetic operation count of the N(= 2n)-point FFT be $c_{\mathrm{fft1}} N \log_2 N + c_{\mathrm{fft2}} N$, and let the arithmetic operation count for the point-by-point product of the N-point Fourier coefficients be $c_{\mathrm{prod}} N$.

In the conventional FFT-based multiplication algorithm, two N-point forward FFTs and one N-point inverse FFT are needed to multiply two n-bit numbers. The overall arithmetic operation count $T_{\mathrm{conv}}$ of multiple-precision floating-point multiplication is as follows:

$$ T_{\mathrm{conv}} = 3\left(c_{\mathrm{fft1}} N \log_2 N + c_{\mathrm{fft2}} N\right) + c_{\mathrm{prod}} N. \tag{3} $$
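A minimal NumPy sketch of this conventional scheme follows, assuming little-endian base-B digit lists and a length n for which N = 2n is a power of two. It is not the paper's implementation: `fft_product` is a hypothetical name, NumPy's `rfft`/`irfft` stand in for the real FFT, and rounding with `np.rint` is exact only while $nB^2$ stays well within double precision.

```python
import numpy as np


def fft_product(x_digits, y_digits, base):
    """Full 2n-digit product of two n-digit little-endian digit lists.

    Uses two forward real FFTs and one inverse real FFT of length N = 2n,
    matching the three transforms counted in (3).
    """
    n = len(x_digits)
    size = 2 * n                            # N = 2n, assumed to be a power of two
    fx = np.fft.rfft(x_digits, size)        # forward FFT of x, zero-padded to N
    fy = np.fft.rfft(y_digits, size)        # forward FFT of y
    z = np.fft.irfft(fx * fy, size)         # point-by-point product, then inverse FFT
    z = np.rint(z).astype(np.int64)         # round away floating-point error
    digits, carry = [], 0
    for zk in z.tolist():                   # carry normalization: 0 <= digit < base
        carry, digit = divmod(zk + carry, base)
        digits.append(digit)
    return digits
```

With B = 10^4 and n = 4 the convolution values stay below $4 \cdot 10^8$, so rounding recovers them exactly; much larger n or B would call for a more careful error analysis.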

3. Multiple-precision floating-point multiplication using the splitting algorithm

Let us consider multiple-precision floating-point multiplication of n-bit numbers based on FFT-based multiplication. First, we show FFT-based multiple-precision floating-point multiplication with 2-way splitting of X and Y, both of which are n-bit numbers with base B. Here we deal with multiple-precision floating-point multiplication in the sense of multiplying two n-bit numbers to obtain an n-bit result:

$$ X = x_1 B^{n/2} + x_0 \quad (0 \le x_i < B^{n/2},\ 0 \le i \le 1), $$

$$ Y = y_1 B^{n/2} + y_0 \quad (0 \le y_j < B^{n/2},\ 0 \le j \le 1). $$

The short product $Z \approx X \cdot Y$, which keeps the upper half, is as follows:

$$ X \cdot Y \approx Z = F^{-1}[F(x_1)F(y_1)] B^n + F^{-1}[F(x_1)F(y_0) + F(x_0)F(y_1)] B^{n/2} = (z_1 B^{n/2} + z_0) B^{n/2}. $$

Fig. 1 illustrates the short product with the 2-way splitting algorithm.

When the partial products $x_i y_j$ ($0 \le i \le 1$, $0 \le j \le 1$, $1 \le i + j \le 2$) are computed, it is necessary to compute the forward FFTs of $x_i$ ($0 \le i \le 1$) and $y_j$ ($0 \le j \le 1$).

If we preserve the Fourier coefficients $F(x_i)$ ($0 \le i \le 1$) and $F(y_j)$ ($0 \le j \le 1$), the total number of forward FFTs is 4.

If the inverse-transformed results are summed at the same place after the inverse FFTs are computed, the total number of inverse FFTs is 3 ($= \sum_{i=1}^{2} i$). However, we can exploit the linearity of the DFT: we first sum the Fourier coefficients that belong to the same place and then compute the inverse FFTs of these sums. This reduces the number of inverse FFTs to 2.
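The 2-way short product, with the forward coefficients reused and the coefficients summed before the inverse transform, can be sketched in NumPy as follows. This is a hedged illustration under our own assumptions (little-endian digit lists, even n, hypothetical function name), not the paper's code. It forms only $x_1y_1$, $x_1y_0$ and $x_0y_1$, uses four forward and two inverse n-point (that is, (N/2)-point) real FFTs, and returns the upper n digits of Z.

```python
import numpy as np


def short_product_2way(x_digits, y_digits, base):
    """Approximate upper n digits of X*Y with 2-way splitting.

    Digits are little-endian. X = x1 B^(n/2) + x0 and Y = y1 B^(n/2) + y0;
    the low partial product x0*y0 is never formed.
    """
    n = len(x_digits)
    h = n // 2                                   # n even; transforms have N/2 = n points
    x0, x1 = x_digits[:h], x_digits[h:]
    y0, y1 = y_digits[:h], y_digits[h:]
    # Four forward FFTs; the coefficients are kept and reused.
    fx0, fx1 = np.fft.rfft(x0, n), np.fft.rfft(x1, n)
    fy0, fy1 = np.fft.rfft(y0, n), np.fft.rfft(y1, n)
    # Two inverse FFTs: coefficients for the same place are summed first (linearity).
    z1 = np.rint(np.fft.irfft(fx1 * fy1, n)).astype(np.int64)
    z0 = np.rint(np.fft.irfft(fx1 * fy0 + fx0 * fy1, n)).astype(np.int64)
    # Z = z1 B^n + z0 B^(n/2), accumulated digit-wise, then carry-normalized.
    acc = [0] * (2 * n)
    for k in range(n):
        acc[h + k] += int(z0[k])
        acc[n + k] += int(z1[k])
    digits, carry = [], 0
    for a in acc:
        carry, digit = divmod(a + carry, base)
        digits.append(digit)
    return digits[n:]                            # the n-digit short product
```

Because $x_0y_0 < B^n$ is never formed, the returned digits can fall one unit in the last place below the exact upper half of $X \cdot Y$, which is consistent with a short product that does not provide exact rounding.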

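We now consider the arithmetic operation count of multiple-precision multiplication with 2-way splitting. The total arithmetic operation count $T_{\mathrm{fft}}$ for the (N/2)-point FFTs is as follows:

$$ T_{\mathrm{fft}} = 6\left(c_{\mathrm{fft1}} \frac{N}{2} \log_2 \frac{N}{2} + c_{\mathrm{fft2}} \frac{N}{2}\right) = 3\left\{c_{\mathrm{fft1}} N (\log_2 N - 1) + c_{\mathrm{fft2}} N\right\}. \tag{4} $$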

Fig. 1. Illustration of the short product with the 2-way splitting algorithm.
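The factor 6 in (4) counts the four forward and two inverse (N/2)-point FFTs of the 2-way scheme. Writing out the step against (3) shows where the saving comes from:

$$
\begin{aligned}
T_{\mathrm{fft}} &= (4 + 2)\left(c_{\mathrm{fft1}} \frac{N}{2} \log_2 \frac{N}{2} + c_{\mathrm{fft2}} \frac{N}{2}\right)
= 3\left\{c_{\mathrm{fft1}} N (\log_2 N - 1) + c_{\mathrm{fft2}} N\right\}
\quad \text{since } \log_2 \tfrac{N}{2} = \log_2 N - 1, \\
T_{\mathrm{fft}} - 3\left(c_{\mathrm{fft1}} N \log_2 N + c_{\mathrm{fft2}} N\right) &= -3\, c_{\mathrm{fft1}} N.
\end{aligned}
$$

The six half-length transforms therefore cost $3c_{\mathrm{fft1}}N$ fewer operations than the three full-length transforms of (3), which is the $-3c_{\mathrm{fft1}}$ term in (7).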


The total arithmetic operation count $T_{\mathrm{prod}}$ for the point-by-point product of the Fourier coefficients is as follows:

$$ T_{\mathrm{prod}} = \left(\sum_{i=1}^{2} i\right) c_{\mathrm{prod}} \frac{N}{2} = \frac{3}{2} c_{\mathrm{prod}} N. \tag{5} $$

Let the arithmetic operation count for the partial sum of the N-point coefficients be $c_{\mathrm{sum}} N$. Then the total arithmetic operation count $T_{\mathrm{sum}}$ for the partial sum of the Fourier coefficients at the same place is as follows:

$$ T_{\mathrm{sum}} = \left(\sum_{i=1}^{1} i\right) c_{\mathrm{sum}} \frac{N}{2} = \frac{1}{2} c_{\mathrm{sum}} N. \tag{6} $$

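From (3)–(6), the overall arithmetic operation count $T_{\mathrm{split}}$ of multiple-precision floating-point multiplication with 2-way splitting is as follows, where $T_{\mathrm{norm}} = c_{\mathrm{norm}} N$ denotes the operation count for normalization:

$$
\begin{aligned}
T_{\mathrm{split}} &= T_{\mathrm{fft}} + T_{\mathrm{prod}} + T_{\mathrm{sum}} + T_{\mathrm{norm}} \\
&= 3\left\{c_{\mathrm{fft1}} N (\log_2 N - 1) + c_{\mathrm{fft2}} N\right\} + \frac{3}{2} c_{\mathrm{prod}} N + \frac{1}{2} c_{\mathrm{sum}} N + c_{\mathrm{norm}} N \\
&= T_{\mathrm{conv}} + \left(-3 c_{\mathrm{fft1}} + \frac{1}{2} c_{\mathrm{prod}} + \frac{1}{2} c_{\mathrm{sum}}\right) N.
\end{aligned} \tag{7}
$$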

It follows from (7) that the overall arithmetic operation count of multiple-precision floating-point multiplication with 2-way splitting is less than that of the conventional algorithm when

$$ -3 c_{\mathrm{fft1}} + \frac{1}{2} c_{\mathrm{prod}} + \frac{1}{2} c_{\mathrm{sum}} < 0. \tag{8} $$

From (8), we conclude that the 2-way splitting algorithm requires fewer arithmetic operations than the conventional algorithm if $(c_{\mathrm{prod}} + c_{\mathrm{sum}})/(6 c_{\mathrm{fft1}}) < 1$.

In the same way, we can formulate FFT-based multiple-precision multiplication with d-way splitting. The d-way splitting algorithm is shown in Fig. 2.
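Fig. 2 is not reproduced in the text, so the following NumPy sketch rests on an assumption: that the d-way scheme generalizes the 2-way case by keeping only the pieces $x_iy_j$ with $i + j \ge d - 1$, summing their Fourier coefficients per place $s = i + j$, and applying one inverse FFT per place. Under that assumption it performs $2d$ forward and $d$ inverse (N/d)-point FFTs, $\sum_{i=1}^{d} i$ coefficient products and $\sum_{i=1}^{d-1} i$ coefficient sums, which are the counts used in (9)–(11). The function name and the operation counters are hypothetical.

```python
import numpy as np


def short_product_dway(x_digits, y_digits, base, d):
    """Approximate upper n digits of X*Y with d-way splitting.

    Digits are little-endian, d must divide n = len(x_digits), and the
    transform length 2n/d (= N/d) is assumed to be a power of two.
    Returns (upper_digits, counts), where counts tallies the operations.
    """
    n = len(x_digits)
    m = n // d                                   # digits per piece
    size = 2 * m                                 # (N/d)-point transforms
    counts = {"fwd": 0, "inv": 0, "prod": 0, "sum": 0}
    fx, fy = [], []
    for i in range(d):                           # 2d forward FFTs, kept for reuse
        fx.append(np.fft.rfft(x_digits[i * m:(i + 1) * m], size))
        fy.append(np.fft.rfft(y_digits[i * m:(i + 1) * m], size))
        counts["fwd"] += 2
    acc = [0] * (2 * n)
    # Keep only pieces x_i y_j with i + j >= d - 1, grouped by place s = i + j.
    for s in range(d - 1, 2 * d - 1):
        coeff = None
        for i in range(s - d + 1, d):            # then 0 <= j = s - i < d
            term = fx[i] * fy[s - i]
            counts["prod"] += 1
            if coeff is None:
                coeff = term
            else:
                coeff = coeff + term             # sum coefficients at the same place
                counts["sum"] += 1
        z = np.rint(np.fft.irfft(coeff, size)).astype(np.int64)
        counts["inv"] += 1                       # one inverse FFT per place
        for k in range(size):
            acc[s * m + k] += int(z[k])
    digits, carry = [], 0                        # carry normalization
    for a in acc:
        carry, digit = divmod(a + carry, base)
        digits.append(digit)
    return digits[n:], counts
```

With d = 1 the sketch reduces to the conventional full product; for larger d, the dropped low-order pieces can make the result up to about d units in the last place smaller than the exact upper half.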

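We now consider the arithmetic operation count of multiple-precision floating-point multiplication with d-way splitting. The total arithmetic operation count $T_{\mathrm{fft}}$ for the (N/d)-point FFTs is as follows:

$$ T_{\mathrm{fft}} = 3d\left(c_{\mathrm{fft1}} \frac{N}{d} \log_2 \frac{N}{d} + c_{\mathrm{fft2}} \frac{N}{d}\right) = 3\left\{c_{\mathrm{fft1}} N (\log_2 N - \log_2 d) + c_{\mathrm{fft2}} N\right\}. \tag{9} $$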

Fig. 2. The d-way splitting algorithm.

The total arithmetic operation count $T_{\mathrm{prod}}$ for the point-by-point product of the Fourier coefficients is as follows:

$$ T_{\mathrm{prod}} = \left(\sum_{i=1}^{d} i\right) c_{\mathrm{prod}} \frac{N}{d} = \frac{d+1}{2} c_{\mathrm{prod}} N. \tag{10} $$

Then the total arithmetic operation count $T_{\mathrm{sum}}$ for the partial sum of the Fourier coefficients at the same place is as follows:

$$ T_{\mathrm{sum}} = \left(\sum_{i=1}^{d-1} i\right) c_{\mathrm{sum}} \frac{N}{d} = \frac{d-1}{2} c_{\mathrm{sum}} N. \tag{11} $$

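From (9)–(11), the overall arithmetic operation count $T_{\mathrm{split}}$ of multiple-precision floating-point multiplication with d-way splitting is as follows:

$$
\begin{aligned}
T_{\mathrm{split}} &= T_{\mathrm{fft}} + T_{\mathrm{prod}} + T_{\mathrm{sum}} + T_{\mathrm{norm}} \\
&= 3\left\{c_{\mathrm{fft1}} N (\log_2 N - \log_2 d) + c_{\mathrm{fft2}} N\right\} + \frac{d+1}{2} c_{\mathrm{prod}} N + \frac{d-1}{2} c_{\mathrm{sum}} N + c_{\mathrm{norm}} N \\
&= T_{\mathrm{conv}} + \left(-3 c_{\mathrm{fft1}} \log_2 d + \frac{d-1}{2} c_{\mathrm{prod}} + \frac{d-1}{2} c_{\mathrm{sum}}\right) N.
\end{aligned} \tag{12}
$$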

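It follows from (12) that the overall arithmetic operation count of the splitting algorithm is less than that of the conventional algorithm when

$$ -3 c_{\mathrm{fft1}} \log_2 d + \frac{d-1}{2} c_{\mathrm{prod}} + \frac{d-1}{2} c_{\mathrm{sum}} < 0. \tag{13} $$

From (13), we conclude that for $d \ge 2$ the d-way splitting algorithm requires fewer arithmetic operations than the conventional algorithm if

$$ \frac{c_{\mathrm{prod}} + c_{\mathrm{sum}}}{6 c_{\mathrm{fft1}}} < \frac{\log_2 d}{d-1}, $$

where d is independent of N.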
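Condition (13) does not depend on N, so it can be checked for each candidate d on its own. The sketch below (hypothetical helper names) evaluates the coefficient of N in (12) for powers of two d, since N/d must remain a power of two, using the values $c_{\mathrm{fft1}} = 2$, $c_{\mathrm{prod}} = 3$ and $c_{\mathrm{sum}} = 1$ that are assumed later in this section.

```python
from math import log2


def split_gain(d, c_fft1, c_prod, c_sum):
    """Coefficient of N in T_split - T_conv from (12); negative means d-way splitting wins."""
    return -3 * c_fft1 * log2(d) + (d - 1) / 2 * c_prod + (d - 1) / 2 * c_sum


def profitable_splits(c_fft1, c_prod, c_sum, max_d=64):
    """Powers of two 2 <= d <= max_d that satisfy condition (13)."""
    candidates = [2 ** k for k in range(1, int(log2(max_d)) + 1)]
    return [d for d in candidates if split_gain(d, c_fft1, c_prod, c_sum) < 0]


good = profitable_splits(c_fft1=2, c_prod=3, c_sum=1)
best = min(good, key=lambda d: split_gain(d, 2, 3, 1))
print(good, best)   # [2, 4, 8] 4
```

Under these constants the coefficient is -4, -6 and -4 for d = 2, 4 and 8 and becomes positive at d = 16, so d = 4 gives the largest saving among these candidates.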

In order to evaluate the effectiveness of the d-way splitting algorithm, we compare the arithmetic operation counts of the conventional algorithm and the splitting algorithm.

Fig. 3. Ratio of arithmetic operation count of splitting algorithm and conventional algorithm. [Plot: $T_{\mathrm{conv}}/T_{\mathrm{split}}$ (vertical axis, 0 to 1.2) against d = 1, 2, 4, 8, 16, 32, 64 (horizontal axis), one curve each for $N = 2^8, 2^{12}, 2^{16}, 2^{20}$.]

Prior to our comparison, we determine the values of $c_{\mathrm{fft1}}$, $c_{\mathrm{fft2}}$, $c_{\mathrm{prod}}$ and $c_{\mathrm{sum}}$. Since the arithmetic operation count of the N-point real split-radix FFT is $2N \log_2 N - 4N$ [11], we assume that $c_{\mathrm{fft1}} = 2$ and $c_{\mathrm{fft2}} = -4$. In general, a complex multiplication is done with four real multiplications and two real additions. Since we can utilize the symmetric property of the real FFT (only about N/2 complex coefficients need to be multiplied), we assume that $c_{\mathrm{prod}} = 3$. We also assume that $c_{\mathrm{sum}} = 1$.

Fig. 3 shows the ratio $T_{\mathrm{conv}}/T_{\mathrm{split}}$ under the above assumptions when d and N (= 2n) are varied. This ratio shows the improvement that the splitting algorithm provides over conventional FFT-based multiple-precision multiplication. We can see that the arithmetic operation count of the splitting algorithm is less than that of the conventional algorithm for $2 \le d \le 8$. In particular, the 4-way splitting algorithm improves on the conventional algorithm by about 18% ($T_{\mathrm{conv}}/T_{\mathrm{split}} \approx 1.18$) for $N = 2^8$.
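The ratios in Fig. 3 can be recomputed from (3) and (9)–(11) with the constants above. The following Python sketch is our reconstruction, not the author's script: the paper gives no value for $c_{\mathrm{norm}}$, so the normalization term is left out of both counts, an assumption that reproduces the quoted ratio.

```python
from math import log2

# Constants assumed in the text: c_fft1 = 2, c_fft2 = -4, c_prod = 3, c_sum = 1.
C_FFT1, C_FFT2, C_PROD, C_SUM = 2, -4, 3, 1


def t_conv(n_pts):
    """Eq. (3): three N-point FFTs plus one N-point point-by-point product."""
    return 3 * (C_FFT1 * n_pts * log2(n_pts) + C_FFT2 * n_pts) + C_PROD * n_pts


def t_split(n_pts, d):
    """Eqs. (9)-(11) summed; the normalization term c_norm N is left out of both counts."""
    t_fft = 3 * (C_FFT1 * n_pts * (log2(n_pts) - log2(d)) + C_FFT2 * n_pts)
    t_prod = (d + 1) / 2 * C_PROD * n_pts
    t_sum = (d - 1) / 2 * C_SUM * n_pts
    return t_fft + t_prod + t_sum


for k in (8, 12, 16, 20):
    n_pts = 2 ** k
    ratios = [t_conv(n_pts) / t_split(n_pts, d) for d in (1, 2, 4, 8, 16, 32, 64)]
    print(f"N = 2^{k}: " + "  ".join(f"{r:.3f}" for r in ratios))
```

For N = 2^8 and d = 4 the computed ratio is 9984/8448 ≈ 1.18, the improvement of about 18% quoted above; d = 1 gives exactly 1, since the formulas then coincide with (3).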

4. Conclusion

In this paper, we proposed an algorithm for multiple-precision floating-point multiplication.

We showed that the overall arithmetic operation count of FFT-based multiple-precision floating-point multiplication is reduced by decomposing the full-length multiplication into shorter-length multiplications.

In comparison with the conventional algorithm, the 4-way splitting algorithm improves the arithmetic operation count by about 18% ($T_{\mathrm{conv}}/T_{\mathrm{split}} \approx 1.18$) for $N = 2^8$.

We conclude that, under the operation-count assumptions of Section 3, 4-way splitting is optimal in terms of the total arithmetic operation count of multiple-precision multiplication.

References

[1] D.E. Knuth, The Art of Computer Programming, vol. 2: Seminumerical Algorithms, third ed., Addison-Wesley, Reading, MA, 1997.

[2] A. Karatsuba, Y. Ofman, Multiplication of multidigit numbers on automata, Doklady Akad. Nauk SSSR 145 (1962) 293–294.

[3] A. Schönhage, V. Strassen, Schnelle Multiplikation grosser Zahlen, Computing (Arch. Elektron. Rechnen) 7 (1971) 281–292.

[4] W. Krandick, J.R. Johnson, Efficient multiprecision floating point multiplication with optimal directional rounding, in: Proc. 11th IEEE Symposium on Computer Arithmetic, 1993, pp. 228–233.

[5] D. Zuras, More on squaring and multiplying large integers, IEEE Trans. Comput. 43 (1994) 899–908.

[6] T. Mulders, On short multiplications and divisions, Appl. Algebra Eng. Commun. Comput. 11 (2000) 69–88.

[7] J.W. Cooley, J.W. Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comput. 19 (1965) 297–301.

[8] D.H. Bailey, The computation of π to 29,360,000 decimal digits using Borweins' quartically convergent algorithm, Math. Comput. 50 (1988) 283–296.

[9] J.M. Borwein, P.B. Borwein, D.H. Bailey, Ramanujan, modular equations, and approximations to pi or how to compute one billion digits of pi, Am. Math. Mon. 96 (1989) 201–219.

[10] D.H. Bailey, Algorithm 719: Multiprecision translation and execution of FORTRAN programs, ACM Trans. Math. Softw. 19 (1993) 288–319.

[11] H.V. Sorensen, D.L. Jones, M.T. Heideman, C.S. Burrus, Real-valued fast Fourier transform algorithms, IEEE Trans. Acoust. Speech Signal Processing ASSP-35 (1987) 849–863.