ma3265 introduction to number theory - dept of maths, nuschanhh/notes/ma3265.pdf · 1 divisibility...

MA3265 Introduction to

Number Theory

Notes by Chan Heng Huat

Contents

References page 1

1 Divisibility 2

1.1 Introduction 2

1.2 Division algorithm and congruences 5

1.3 Greatest Common Divisors 8

1.4 Least common multiple 12

1.5 Euclid’s Lemma 13

1.6 Fundamental Theorem of Arithmetic 14

1.7 Euclid’s Lemma and the linear Diophantine equation 15

1.8 Appendix : The cyclic group (Z,+) 17

2 More on Congruences 19

2.1 Fermat’s Theorem and Euler’s Theorem 19

2.2 An extension of Fermat’s Little Theorem: The Euler Theorem 21

2.3 Finite abelian groups and Euler’s Theorem 23

2.4 The Chinese Remainder Theorem 25

2.5 Multiplicative functions and even perfect numbers 27

2.6 The Euler ϕ-function is multiplicative 28

2.7 An Application to Cryptography 30

2.8 Appendix : Group Action and Fermat’s Little Theorem 32

3 Solutions of congruences 35

3.1 The Chinese Remainder Theorem revisited 35

3.2 Prime power moduli 36

3.3 Prime moduli 39

3.4 Wolstenholme’s congruence 43

4 Primitive roots 44

4.1 Order of an element in a group 44

4.2 Integers m for which (Z/mZ)∗ is cyclic 45

4.3 Integers m for which (Z/mZ)∗ is not cyclic 49

5 Quadratic Reciprocity Law 50

iv Contents

5.1 Primitive roots and solutions of congruences 50

5.2 The Legendre Symbol 51

5.3 Gauss Lemma 52

5.4 Proofs of Gauss’ Quadratic Reciprocity Law 53

5.5 The Jacobi Symbol 56

5.6 The Transfer 59

5.7 Gauss Lemma via the transfer 62

5.8 The Legendre Symbol and primes of the form x2 + y2 62

5.9 Appendix : Primes of the form x2 + y2, a second approach 66

6 Binary Quadratic forms 68

6.1 Fermat-Euler Theorem 68

6.2 Representations of integers by binary quadratic forms 70

6.3 Equivalence of binary quadratic forms 74

6.4 Reduced forms 75

7 Form Class groups 80

7.1 The set C(d) 80

7.2 C(d) is a finite abelian group 82

7.3 The operation • is well defined 83

8 Continued fractions and Pell’s equations 89

8.1 Pell’s equations 89

8.2 Continued fractions 90

8.3 Infinite continued fraction 90

8.4 A simple Lemma and primes of the form x2 + y2 92

8.5 Solutions to Pell’s equations 94

8.6 The Pell equation x2 − dy2 = 1 is solvable with xy 6= 0 97

9 Euler’s identities and Jacobi Triple Product Identity 105

9.1 Jacobi’s Triple Product Identity 105

9.2 The terminating q-binomial Theorem 106

9.3 Two identities of Euler 108

9.4 Andrews’ proof of Jacobi’s Triple Product Identity 109

9.5 Number of representations of n as sums of two squares 111

9.6 The partition function p(n) 113

9.7 Proof of p(5n+ 4) ≡ 0 (mod 5) 116

10 Lattices and Minkowski’s Theorem 118

10.1 Lattices in Rn 118

10.2 Translations of fundamental parallelopiped 119

10.3 Spheres and bounded sets in Rn 120

10.4 Minkowski’s Theorem 120

10.5 Primes of the form x2 + y2, the geometric approach 122

Contents v

11 Elliptic curves 124

11.1 Rational Points on curves 125

11.2 The Geometry of cubic curves 127

11.3 Explicit form of the group law 130

11.4 The Pollard’s p− 1 method 130

11.5 Appendix 133

References

The main reference is “An introduction to the theory of numbers” by I. Niven,

H.S. Zuckerman and H.L. Montgomery. Most of the tutorial problems in this

course can be found in the book.

Other references are “A course in arithmetic” by J.P. Serre, “Primes of the

form x2 + ny2” by D.A. Cox, “Introduction to Analytic Number Theory” by

T.M. Apostol, “Introduction to Number Theory” by D. Flath and “Elementary

Number Theory” by D. Burton.

1 Divisibility

1.1 Introduction

Number Theory is a branch of mathematics that study problems leading to a

better understanding of the additive and multiplicative properties of integers

Z = {0,±1,±2,±3, · · · }.

We list several examples of such problems.

example 1.1

An integer s is called a perfect square if it is of the form s = m2 for some

integer m. One may ask when is an integer a sum of two perfect squares?

example 1.2

What are the most general solutions to the equation x2 + y2 = z2 where x, y, z ∈Z?

Remark 1.1 The triplet (x, y, z) satisfying x2+ y2 = z2 is called a Pythagorean

triplet.

example 1.3

Let n ≥ 3. Are there integers x, y, z with xyz 6= 0 such that xn + yn = zn?

Remark 1.2 This is the Fermat Last Theorem. It was proved by A. Wiles and

R. Taylor in 1995.

example 1.4

How many squares do we need to represent all positive integers?

definition 1.3 Let a and b be two integers such that ab 6= 0. We say that a

is a divisor of b if b = aq for some integer q. We use the notation a|b to mean a

divides b. We also say that a is a factor of b and that b is a multiple of a.

definition 1.4 A prime number is a positive integer with exactly two positive

divisors. An integer n > 1 which is not a prime is called a composite number.

1.1 Introduction 3

Prime numbers are the “atoms” of integers when we study their multiplicative

structure. There are many interesting problems (most of which are unsolved)

that are related to prime numbers. We list a few here.

example 1.5

Are there infinitely many primes of the form n2 + 1?

example 1.6

Are there infinitely many primes p such that p+ 2 is also a prime?

Remark 1.5 A breakthrough in this problem was discovered in 2013 by an

unknown mathematician Y.T. Zhang. He showed that there exists an even integer

a less than 70 million such that there are infinitely many primes p with the

property that p+a is also a prime. The number 70 million has since been reduced

to 270, thanks to the effort of J. Maynard and Polymath headed by T. Tao.

example 1.7

(Goldbach) Is an even positive integer greater than 2 always a sum of two primes?

example 1.8

(Goldbach) Is every odd positive integer greater than 6 a sum of three primes?

Remark 1.6 This problem was settled completely in 2013 by H. Helfgott. Before

Helfgott’s work, I.M. Vinogradov was able to show the truth of the statement for

sufficiently large odd positive integer. It can be shown that “sufficiently large”

odd integers are integers greater than 3315

).

example 1.9

For every positive integer n > 1, is there always a prime between n and 2n?

Remark 1.7 This is the Bertrand’s Postulate. Note that Bertrand’s Postulate

implies that there are infinitely many primes since there are infinitely many

positive integers. This fact was first proved by P. Chebyshev. Other proofs were

given independently by S. Ramanujan and P. Erdos.

definition 1.8 We say that two integers a and b are relatively prime if there

does not exist an integer c > 1 such that c|a and c|b.

example 1.10

Suppose a and b are two relatively prime positive integers. Are there infinitely

many primes of the form an+ b?

Remark 1.9 This is the Dirichlet Theorem of primes in arithmetic progression.

definition 1.10 The symbol π(x) denote the number of primes less than x.

4 Divisibility

example 1.11

Is it true that

limx→∞

π(x)

(x/ lnx)= 1?

Remark 1.11 This is the Prime Number Theorem. It was first observed by C.F.

Gauss around 1791 and A.M. Legendre around 1798. It was proved independently

around 1896 by J. Hadamard and Charles de la Vallee Poussin using complex

analysis. Elementary proofs of this result without using complex analysis were

later given by A. Selberg and P. Erdos. Both proofs required the Selberg identity.

Since the appearance of the proofs by Selberg and Erdos, other elementary proofs

were discovered. Some of these proofs were independent of Selberg’s identity and

one of the most elegant proofs was given by A. Hildebrand using the theory of

large sieve.

definition 1.12 A positive integer n is called a perfect number if it is the

sum of all its divisors d 6= n.

There are many even perfect numbers. The first two examples are 6 and 28.

Remark 1.13 An even integer is a perfect number if and only if it is of the form

2k−1(2k − 1) where 2k − 1 is a prime. (Try proving this statement.) A prime of

the form 2k − 1 is called a Mersenne prime.

One can ask the following question :

example 1.12

Does odd perfect number exist?

definition 1.14 An integer n is called a power if n = mk for some integer

k > 1.

example 1.13

It is clear that 23, 32 are consecutive powers. Are there any other consecutive

powers?

Remark 1.15 This is the Catalan conjecture and it was proved in 2002 by P.

Mihailescu..

There are many other problems besides those stated here. Some of these prob-

lems are solved while others remained open.

In this course, we will study some basic results in various areas of number

theory. Hopefully by the end of this course, students will be able to appreciate the

relations among different areas of mathematics through solving number theoretic

problems and that they will be able to appreciate the beauty of this topic.


1.2 Division algorithm and congruences

We have seen in Definition 1.3 that if a is a divisor of b, we write a|b. When a is

not a divisor of b, we use the notation a - b.

theorem 1.16 Let a, b, c, x,m, y be integers.

(a) a|b implies that a|bc for any integer c;

(b) a|b and b|c imply a|c;(c) a|b and a|c imply a|(bx+ cy) for any integers x and y;

(d) a|b and b|a imply a = ±b;(e) a|b, a > 0, b > 0, imply a ≤ b;

(f) Let m be a nonzero integer. Then a | b implies that am | bm; if am | bm,then a | b.

Proof

We prove (c). Suppose a | b and a | c , then b = ak and c = al for some integers

k and l. Now,

bx+ cy = a(kx+ ly).

Hence a|(bx+ cy).

theorem 1.17 (The Division Algorithm) Given any integers a and b with

a > 0, there exist unique integers q and r such that b = qa + r, 0 ≤ r < a. If

a - b, then r satisfies the stronger inequalities 0 < r < a.

To prove the above result, we will first need the least integer axiom, which

says that if S is the nonempty subset of non-negative integers, then there is an

element r ∈ S such that r ≤ s for all s ∈ S. This principle cannot be proved

but it is certainly plausible. Suppose 0 is not in S, then 0 is not the smallest

element in S. If 1 is not in S, then 1 is not the smallest element in S. Since S is

nonempty, then there must be an integer r that happens to be the first integer

in S. This integer r is the smallest integer in S.

Remark 1.18 One can show that the least integer axiom is equivalent to the

mathematical induction.

We are now ready to prove the Division Algorithm.

Proof

Let

S = {y : y = b− ax, x is an integer, y ≥ 0}.

Note that since a ≥ 1 and

b− a · (−|b|) = |b|(b

|b| + a

)≥ 0,

6 Divisibility

we see that S is non-empty. By the least integer axiom, it has a smallest member,

say r = b− aq. Then b = aq + r and r ≥ 0. Now we show that r < a.

Suppose r ≥ a. Then 0 ≤ r−a = b−a(q+1) and hence r−a ∈ S. Since r−a < r,

this contradicts the minimality of r. Hence, r < a. The pair q, r is unique, for

if there is another pair q′, r′ such that b = aq′ + r′, then a(q′ − q) = r − r′. If

r = r′, then q = q′ and we complete the proof of the Theorem. Suppose r 6= r′.

Then |r − r′| > 0 and a| |r − r′|. By Theorem 1.16(e), we find that

|r − r′| ≥ a. (1.1)

But 0 ≤ r′ < a implies that −a < −r′ ≤ 0. Together with 0 ≤ r < a, we

conclude that

−a < r − r′ < a,

or

|r − r′| < a.

This contradicts (1.1).

Remark 1.19 If we let a = 2 in Theorem 1.17, then our conclusion is that every

integer is of the form 2k or 2k+1. If we let a = 4, then we find that every integer

is of the form 4k, 4k + 1, 4k + 2 or 4k + 3.

We now introduce congruences.

definition 1.20 Let n > 0 be an integer. We say that “a is congruent to b

modulo n” and write

a ≡ b (mod n)

if

n|(a− b).

With the above notation, we can replace the sentence “Every integer n is of

the form 2k or 2k + 1.” by “Every integer n is such that n ≡ 0 or 1 (mod 2).”.

We now list a few basic properties of congruences.

theorem 1.21 Let a, b, c, d, n be integers with n > 0. Then

(a) For all integers k, k ≡ k (mod n).

(b) If a ≡ b (mod n) then b ≡ a (mod n).

(c) If a ≡ b (mod n) and b ≡ c (mod n) then a ≡ c (mod n).

(d) If a ≡ b (mod n) and c ≡ d (mod n) then a + c ≡ b + d (mod n) and

ac ≡ bd (mod n).

We will only supply the proof of Theorem 1.21 (d).


Proof of Theorem 1.21 (d)

Since a ≡ b (mod n) and c ≡ d (mod n), we deduce that n|(a−b) and n|(c−d).Hence, a − b = nk and c − d = nj. Therefore, a + c = b + d + n(k + j), which

implies that n|(a + c) − (b + d), i.e., a + c ≡ b + d (mod n). Now, a = nk + b

and c = nj + d also implies that ac = bd + n(kd + bj + nkj). Hence ac ≡ bd

(mod n).

The use of congruences allow us to have shorter presentations of solutions to

the next few problems.

example 1.14

Show that the product of two consecutive integers is always even.

Solutions. Let n be the integer. Then by Theorem 1.17, n ≡ 0 or 1 (mod 2).

Hence n(n+1) ≡ 0 · 1 or 1 · (1+1) (mod 2). In both cases, we have n(n+1) ≡ 0

(mod 2) and hence, n(n+ 1) is even.

example 1.15

Show that 3 divides a(a2 + 2) for any integer a ∈ Z.

Solutions. Let a be any integer. Then a ≡ 0, 1 or 2 (mod 3). This implies that

a(a2 + 1) ≡ 0 · 1, 1 · 3 or 2 · 3 (mod 3).

In all cases, a(a2 + 2) ≡ 0 (mod 3) and hence 3|a(a2 + 2).

example 1.16

Show that if n is of the form 4k + 3 then n is not a perfect square.

Solutions. Let m be any integer. Then m ≡ 0, 1, 2 or 3 (mod 4). This implies

that m2 ≡ 0 or 1 (mod 4) and therefore a square can only be of the form 4k or

4k + 1.

example 1.17

Prove that no integer in the sequence 11, 111, 1111, 11111, · · · is a perfect square.

Solutions. First, note that 11 ≡ 3 (mod 4) is not a square. The rest of the

integers are of the form

10k + 10k−1 + · · ·+ 100 + 10 + 1

where k ≥ 2. But these integers can be written as

100(10k−2 + · · ·+ 1) + 11 ≡ 3 (mod 4)

and so it is of the form 4N + 3, and hence cannot be a square.

example 1.18

Show that if p is an odd prime of the form 4k + 3 then p is not a sum of two

squares.

Solutions. We have seen that a square is congruent to 0 and 1 modulo 4. Hence

8 Divisibility

a sum of two squares must be congruent to 0,1 or 2 modulo 4. This means that

any integer of the form 4k + 3 cannot be a sum of two squares. In particular, a

prime of the form 4k+3 is not a sum of two squares. This completes our solution

of the problem.

Now, since an odd prime is either of the form 4k+1 or 4k+3, it is natural to

determine if primes of the form 4k + 1 were a sum of two squares. It turns out

that this is always true. We will see proofs of this result in subsequent chapters.

example 1.19

Show that 41|(220 − 1).

Solutions. 25 = 32 ≡ −9 (mod 41). Therefore, 220 = (25)4 ≡ (−9)4 (mod 41).

Now, (−9)4 ≡ 812 ≡ 1 (mod 41). Hence 220 − 1 ≡ 0 (mod 41).

example 1.20

Find the remainder of 1! + 2! + · · ·+ 100! when this number is divided by 12.

Solutions. Note that 4! ≡ 0 (mod 12) and n! ≡ 0 (mod 12) when n > 4.

Therefore 1! + 2! + · · ·+100! ≡ 1! + 2!+ 3! ≡ 1+ 2+ 6 ≡ 9 (mod 12). Therefore

the remainder is 9.

1.3 Greatest Common Divisors

definition 1.22 A common divisor of integers a and b is an integer c with

c|a and c|b.

definition 1.23 A greatest common divisor of integers a and b is a number

d with the following properties :

(a) The integer d is nonnegative.

(b) The integer d is a common divisor of a and b.

(c) If e is any common divisor of a and b, then e|d.Note that if d and d′ are both greatest common divisors of a and b, then d is

a common divisor of a and b and d′ is a greatest common divisor, we note that

d′|d using Definition 1.23 (c). Similarly, since d′ is a common divisor and d is a

greatest common divisor, d|d′. By Theorem 1.16 (d), |d| = |d′| and by Definition

1.23 (a), we deduce that d = d′. This shows that the greatest common divisor of

a and b is unique.

definition 1.24 The greatest common divisor of a and b is denoted by (a, b).

Remark 1.25 When a and b are zeros, then (0, 0) = 0. If a = 0 and b is

non-zero, then (0, b) = b.

We will next show that the greatest common divisor of two integers exists. By

Remark 1.25, it suffices to consider the case when both a and b are nonzero.


theorem 1.26 Let a and b be nonzero integers. Then the smallest positive

integer in the set

P := {sa+ tb|s, t ∈ Z and sa+ tb > 0}

is (a, b).

Proof

If a is positive then a ∈ P since

a = 1 · a+ 0 · b.

Similarly, if b is positive, then b ∈ P . Suppose a and b are both negative. Then

0 · a+(−1) · b ∈ P . Hence that P is nonempty. By the least integer axiom, there

is a smallest positive integer, say d, in P . We must show that d = (a, b).

We observe that d ≥ 1 since we have assumed that both a and b are nonzero.

Next, since d ∈ P ,

d = xa+ yb (1.2)

for some integers x, y ∈ Z. We first show that d is a common divisor of a and b.

We claim that d|a. Suppose not. By Theorem 1.17, we may suppose

a = dq + r, 0 < r < d.

Then

r = a− dq = a− (xaq + ybq) = a(1− xq) + byq.

Therefore, r ∈ P and it is smaller than d. But d is the smallest element in P .

Hence, we conclude that d | a. In a similar way, we find that d | b. This shows

that d is a common divisor of a and b.

Finally, if c|a and c|b then a = cu and b = cv. This implies, by (1.2), that

xa+ yb = c(ux+ vy) = d

and hence, c|d. This shows that d satisfies Definition 1.23 and we conclude that

d = (a, b).

Remark 1.27 Note that the above theorem says that if d = (a, b) then there

exists integers x and y such that

d = ax+ by.

Remark 1.28 The condition in Definition 1.23 (c) can be replaced by

(c)’ If e|a and e|b then e ≤ d.

This condition explains the word “greatest” used in the definition of (a, b).

We recall that two integers a and b are relatively prime if their only common

divisor is 1. Now, (a, b) is a common divisor and this means that (a, b) = 1

since there is only one common divisor. In other words, we find that a and b are

relatively prime if and only if (a, b) = 1.

10 Divisibility

theorem 1.29 Let a and b be integers. Then (a, b) = 1 (or a and b are

relatively prime) if and only if 1 = ax+ by for some integers x and y.

Proof

Note that if (a, b) = 1, then 1 = ax+ by for some integers x and y.

Conversely, if 1 = ax+by, then (a, b)|a and (a, b)|b, and therefore (a, b)|1. Thisimplies that (a, b) = 1.

example 1.21

Show that if (m,n) = 1 then m|c and n|c implies that (mn)|c.

Solutions. Since m|c and n|c, we have c = mh and c = nk for some integers h

and k. Now, the fact that (m,n) = 1 implies that

1 = mu+ nv

for some integers u and v. Hence,

c = mcu+ ncv = m(nk)u+ n(mh)v = mn(ku+ hv).

This implies that (mn)|c.

We now list down some basic properties of the greatest common divisor of two

integers.

theorem 1.30 Let a, b and c be integers. Then

(a) (a, b) = (b, a) (commutative law),

(b) (a, (b, c)) = ((a, b), c) (associative law),

(c) (ac, bc) = |c|(a, b), and

(d) (a, 1) = (1, a) = 1. If a is non-zero, then (a, 0) = (0, a) = |a|.

Proof

We will prove only (c) and leave the proofs of the other statements as exercises.

Let d = (ac, bc) and d′ = |c|(a, b). By Theorem 1.26, d = acx+ bcy for some

integers x and y. Hence,

d =c

|c| (a · |c| · x+ b · |c| · y) .

Now, d′ = |c|(a, b) and since (a, b)|a and (a, b)|b, we find that d′ is a common

divisor of a · |c| and b · |c| and therefore, by (1.3), d′|d.Next, since d′/|c| = (a, b), by Theorem 1.26,

d′

|c| = au+ bv

for some integers u and v. This implies that

d′ = a · |c| · u+ b · |c| · v =|c|c(acu+ bcv) .

But d is a common divisor of ac and bc and hence d|d′. Since d′|d and d|d′, we


conclude by Theorem 1.16 (d) that |d| = |d′|. Since both d and d′ are positive,

we deduce that d = d′.

In the next Theorem, we prove The Euclidean Algorithm. This allows us to

compute the greatest common divisor of two numbers.

theorem 1.31 (The Euclidean Algorithm) Given positive integers a and b,

where a - b. Let r0 = b, r1 = a, and apply the division algorithm repeatedly to

obtain a set of remainders r2, r3, ..., rn, rn+1 defined successively by the relations

r0 = r1q1 + r2, 0 < r2 < r1

r1 = r2q2 + r3, 0 < r3 < r2

...

rn−2 = rn−1qn−1 + rn, 0 < rn < rn−1

rn−1 = rnqn + rn+1, rn+1 = 0.

Then rn, the last nonzero remainder in this process is (a, b), the greatest common

divisor of a, and b.

We need a simple lemma.

lemma 1.32 Let a, b, q, and r be integers such that b = aq + r, then (a, b) =

(a, r).

Proof

Let d = (a, b) and d′ = (a, r). Note that since d|a and d|b, we find that d|(b−aq)by Theorem 1.16 (c). Hence, d|r and d is a common divisor of b and r. By

Definition 1.23 (c), d|d′ since d′ = (a, r). Similarly, d′|b and d′|r implies that

d′|(aq + r) by Theorem 1.16 (c) and consequently, d′|b. By Definition 1.23 (c),

d′|d since d = (a, b). Therefore, by Theorem 1.16 (d), d = d′.

We now complete the proof of Theorem 1.31.

Proof of Theorem 1.31

There is a stage at which rn+1 = 0 because the ri are decreasing and nonnegative.

Next, applying Lemma 1.32, we find that

(a, b) = (r0, r1) = (r1, r2) = · · · = (rn, rn+1) = (rn, 0) = rn.

This completes the proof of Theorem 1.31.

example 1.22

Find (196884, 576).

12 Divisibility

Solutions.

196884 = 341 · 576 + 468

576 = 1 · 468 + 108

468 = 4 · 108 + 36

108 = 3 · 36 + 0.

Therefore (196884, 576) = 36.

example 1.23

Find integers x and y satisfying

43x+ 64y = 1.

Solutions.

64 = 43 · 1 + 21

43 = 21 · 2 + 1.

Working backwards, we have

1 = 43− 21 · 2 = 43− 2 · (64− 43) = 43 · 3 + 64 · (−2).

1.4 Least common multiple

definition 1.33 A common multiple of two integers a and b is a positive

integer m with the property that a|m and b|m.

definition 1.34 The least common multiple of two integers a and b is an

integer m such that

(a) m > 0,

(b) a|m and b|m, and

(c) If c is a common multiple of a and b then m|c.

Note that [a, b] = [|a|, |b|] and so we will consider [a, b] when a and b are

positive. We first establish the existence of the least common multiple of a and

b. Consider the set

S := {c ∈ Z+ | a|c and b|c}.

By least integer axiom, there exists a common multiple m of a and b such that

m ≤ c if c is any common multiples of a and b. We need to show that m|c.Suppose m - c for some common multiple c of a and b. Write

c = mq + r, 0 < r < m.

1.5 Euclid’s Lemma 13

Since a|c and a|m, we find that a|r. Similarly, b|r and hence r ∈ S. But r < m,

contradicting the minimality of m. Hence, m|c.We now give two properties of the least common multiple of a and b. Note

that part (a) of the following result is an analogue of (ka, kb) = |k|(a, b).

theorem 1.35 Let a, b,m be positive integers. Then

(a) [ma,mb] = m[a, b], and

(b) [a, b] · (a, b) = ab.

Proof

Let H = [ma,mb] and h = [a, b]. Then a|h and b|h. Hence, ma|mh and mb|mh.This implies that H |mh.Next, ma|H and mb|H . Hence, a|(H/m) and b|(H/m) and so h|(H/m). Hence

mh|H. Therefore H = mh.

We may assume that a, b are both positive since (a,−b) = (a, b) and [a,−b] =[a, b]. Our first step is to show that if (h, k) = 1 then [h, k] = hk.

Suppose (h, k) = 1 and that h|c and k|c. Then by Example 1.21, we conclude

that hk|c. This implies that hk = [h, k] since all common multiples of h and k

are less than or equal to hk.

Now, let a and b be two integers and d = (a, b). Then [a, b] = d[a/d, b/d].

Since (a/d, b/d) = 1, we conclude that [a/d, b/d] = ab/d2. Therefore [a, b] =

d(ab/d2) = ab/d, or (a, b)[a, b] = ab.

1.5 Euclid’s Lemma

We know that if c 6= 0 then ca = cb implies that a = b. This is the law of

cancelation for equality. The law is not true in general if we replace “=” by ≡.

For example, 15 ≡ 3 (mod 12) but 5 6≡ 1 (mod 12). This shows that the law of

cancellation does not hold in general for congruences.

The next result (Euclid’s Lemma) shows that the law of cancelation holds if

we impose a condition on the integer c.

theorem 1.36 If ca ≡ cb (mod n) and (c, n) = 1, then a ≡ b (mod n).

Proof

Recall from Theorem 1.26 that if (c, n) = 1 then there exist integers x and y

such that cx+ ny = 1. Multiplying a and b yields

acx+ any = a

and

bcx+ bny = b,

respectively. This implies that

(ac− bc)x+ n(ay − by) = a− b.

14 Divisibility

Since n|(ac− bc) and n|n, we conclude that n|(a− b).

Theorem 1.36 can be used to prove the following result of Euclid.

corollary 1.37 Let p be a prime. If p|(ab), then p|a or p|b.Proof

For any integer n, (n, p) = 1 or p since p has only two divisors. Suppose p - a.

Then (p, a)=1. By Theorem 1.36, the relation

ab ≡ 0 (mod p)

then implies that

b ≡ 0 (mod p).

By induction, we have the following:

corollary 1.38 Let p be a prime. If p|(a1a2 · · · am) then p|ak for some k.

1.6 Fundamental Theorem of Arithmetic

theorem 1.39 (Fundamental Theorem of Arithmetic) Every positive integer

n > 1 can be expressed as a product of primes; this representation is unique

apart from the order in which the factors occur.

Proof

We first show that n can be expressed as a prime or a product of primes. We

use induction on n. The statement is clearly true for n = 2 since 2 is a prime.

Suppose m is a prime or a product of primes for 2 ≤ m ≤ n− 1. If n is a prime

then we are done. Suppose n is composite then n = ab, where 1 < a, b < n. By

induction each of the a and b is either a prime or a product of primes. Hence,

n = ab is a product of primes. By mathematical induction, every positive integer

n > 1 is a prime or a product of primes.

To prove uniqueness, we use induction on n again. If n = 2 then the repre-

sentation of n as a product of primes is clearly unique. Assume, then that it is

true for all integers greater than 1 and less than n. We shall prove that it is also

true for n. If n is prime, then there is nothing to prove. Assume, then, that n is

composite and that n has two factorizations, say,

n = p1p2 · · · ps = q1q2 · · · qt. (1.3)

Since p1 divides the product q1q2 · · · qt, it must divide at least one factor. Relabel

q1, q2, ..., qt so that p1|q1. Then p1 = q1 since both p1 and q1 are primes. In (1.3),

we may cancel p1 on both sides to obtain

n/p1 = p2 · · · ps = q2 · · · qt.

1.7 Euclid’s Lemma and the linear Diophantine equation 15

Now the induction hypothesis implies that the two factorizations of n/p1 must

be the same, apart from the order of the factors. Therefore, s = t and the

factorizations in (1.3) are also identical, apart from order. This completes the

proof.

example 1.24

Let

a =∏

p

pαp and b =∏

p

pβp .

Let

d =∏

p

pmin(α0,βp).

Show that d = (a, b). Show that if

` =∏

p

pmax(αp,βp),

then ` = [a, b].

1.7 Euclid’s Lemma and the linear Diophantine equation

In Example 1.23, we give a particular solution to

43x+ 64y = 1.

In general, an equation of the type

ax+ by = m

is called a linear Diophantine equation. We now discuss how one can obtain

general solutions of linear Diophantine equation.

Suppose (a, b) = d. If d - m then the equation has no solution. Suppose d|m.

Let x0 and y0 be a particular solution of ax+ by = m. Let X and Y be general

solution of the equation. Then

a(X − x0) = b(y0 − Y ).

Hence,

a

d(X − x0) =

b

d(y0 − Y ).

Since (a/d, b/d) = 1, we conclude from Theorem 1.36 that

y0 − Y =a

dk and X − x0 =

b

dk.

The general solution is therefore

X =b

dk + x0 and Y = −a

dk + y0.

16 Divisibility

example 1.25

When Mr Smith cashed a check at his bank, the teller mistook the number of

cents for the number of dollars and vice versa. Unaware of this, Mr Smith spent

68 cents and then noticed to his surprise that he had twice the amount of the

original check. Determine the smallest value for which the check could have been

written.

Solutions. Let x be the number of dollars and y be the number of cents. From

the information given, we set up the equation

98y − 199x = 68.

Using Euclidean algorithm, we find that

1 = 33 · 199− 98 · 67,

This implies a particular solution for

98y − 199x = 68

is

y0 = −67 · 68 and x0 = 33 · 68.

So the general solution for

98y − 199x = 68

is

y = 199k − 67 · 68 and x = 98k + 33 · 68.

Now, 0 < y < 100 implies that 22 < k ≤ 23 and hence, k = 23. Thus, y = 21

and x = 10.

We now apply the solutions of linear Diophantine equation to solve linear

congruences. We will prove that

theorem 1.40 Given integers a, b such that (a, b) = d and d|N , the number

of solutions satisfying

ax ≡ N (mod b) (1.4)

is exactly d.

Proof

Suppose ax ≡ N (mod b). Then

ax = N − by. (1.5)

If x0 and y0 are integers satisfying (1.5), then the general solution of (1.5) is

X =b

dk + x0 and Y = −a

dk + y0.

1.8 Appendix : The cyclic group (Z,+) 17

This implies that X is a solution of (1.4) for any integer k. We claim that

modulo b there are exactly d solutions. Let k = dq + r with 0 ≤ r < d, and

b

d(dq + r) ≡ b

dr (mod b).

So any solution with k > d can be identified with a solution r < d. Next, no

two solutions from the set of solutions

{ bdk + x0, 0 ≤ k < d}

are the same modulo b. Suppose

b

di+ x0 ≡ b

dj + x0 (mod b).

Thenb

d(i − j) ≡ 0 (mod b).

This implies that d|(i− j). Since i and j are both less than d, i = j. Hence there

are exactly d solutions to (1.4).

Remark 1.41 Theorem 1.40 will be needed for determining the number of so-

lutions of congruences of the type

xm ≡ c (mod n).

1.8 Appendix : The cyclic group (Z,+)

Let G be a nonempty set and ∗ be function from G×G to G satisfying

(a) There exists an e ∈ G such that for all g ∈ G, e ∗ g = g ∗ e = g.

(b) For all g ∈ G, there exists a g′ ∈ G such that g ∗ g′ = g′ ∗ g = e.

(c) For all g, h, k ∈ G, g ∗ (h ∗ k) = (g ∗ h) ∗ k.

The set G together with the binary operation ∗ is called a group.

If H is a nonempty subset of G and (H, ∗) is a group, we say that H is a

subgroup of G and the notation is H ≤ G. One can use a simple criterion to

check if H were a subgroup of G. This is given by H ≤ G if and only if for all

h, k ∈ H , hk−1 ∈ H .

One of the simplest example of a group is (Z,+). The identity is 0 and the

inverse of n is −n. This is a cyclic group (that is, a group that is generated by one

element, namely 1). Using Division Algorithm, we can show that any subgroup

of a cyclic group is cyclic. Now, if we define

T = {au+ bv|u, v ∈ Z},

then we can easily check that T is a subgroup of Z since au+ bv− (au′ + bv′) =

a(u − u′) + b(v − v′) ∈ T. Since (Z,+) is cyclic, we conclude that T is cyclic

18 Divisibility

and hence T = dZ, that is, T is generated by one element d. We may choose d

to be positive. Now, note that both a and b are in T . Hence d|a and d|b. Sinced ∈ T , d = au + bv for some integers u, v. Suppose c|a and c|b. Then from the

representation of d, we conclude that c|d. Hence d is the greatest common divisor

of a and b.

2 More on Congruences

2.1 Fermat’s Theorem and Euler’s Theorem

definition 2.1 Let m be a positive integer. A set

S = {x0, x1, · · · , xm−1|xi ∈ Z}

is called a complete residue system if xi 6≡ xj (mod m) whenever i 6= j.

Let C = {0, 1, · · · ,m− 1}. Then C is a complete residue system modulo m. If

we define

S = {0 ≤ r ≤ m− 1|r ≡ s (mod m) for some s ∈ S},

then we find that S = C. To see this, define a function ψ(s) = s′ where s′ is

between 0 and m − 1 and s ≡ s′ (mod m). The condition on complete residue

system for s ∈ S shows that ψ is an injection and hence, a bijection since

|S| = |C|.The complete residue system modulo m is not unique. For example, when

n = 5, the set {0, 1, 2, 3, 4} is a set of complete residues modulo 5. The set

{5, 11, 22, 33, 14} is also a set of complete residues modulo 5.

Note that the set {1, 11, 12, 13, 14} is not complete since 1 ≡ 11 (mod 5).

We now use the concept of residues modulo p to prove a famous Theorem of

Fermat.

theorem 2.2 (Fermat’s Little Theorem) If p is a prime and p - a then

ap−1 ≡ 1 (mod p),.

Remark 2.3 Fermat’s Little Theorem is sometimes written as: For a fixed prime

p,

ap ≡ a (mod p)

for all integer a. Note that when written in this form, the condition p - a can be

removed since ap ≡ a ≡ 0 (mod p) when p|a.

Proof


Let

S = {0, 1, 2, · · · , p− 1}.Suppose p - a. Let T = {a · 0, a · 1, · · · , a · (p − 1)}. Note that no two elements

in T are congruent to each other modulo p. For if ka ≡ ja (mod p), then k ≡ j

(mod p), since (a, p) = 1 and cancelation law applies. Hence, T is a complete

residue system and

T = S.

where

T = {0 ≤ r ≤ m− 1|r ≡ s (mod m) for some s ∈ S}.

Hence, the product of all elements in T must be congruent to the product of

all elements in S. In other words, if we multiply all the elements in T − {0},the product must be congruent to the product of all the elements in S − {0}.Therefore,

ap−1(p− 1)! ≡ (p− 1)! (mod p).

Since p - (p − 1)!, (p, (p − 1)!) = 1 and cancelation law holds again. Therefore,

ap−1 ≡ 1 (mod p).

Fermat’s Little Theorem can be a labor-saving device in certain calculations.

example 2.1

Prove that 18351910 + 19862061 ≡ 0 (mod 7).

Solutions. We have 1835 ≡ 1 (mod 7) and 1986 ≡ 5 (mod 7). Therefore

18351910 + 19862061 ≡ 1 + 52061 (mod 7).

Now, 2061 ≡ 3 (mod 6) and therefore, 2061 = 6q+3, for some integer q. Hence,

52061 ≡ 56q+3 ≡ (56)q · 53 ≡ 53 (mod 7)

since 56 ≡ 1 (mod 7), by Theorem 1.1. The congruence thus reduces to

1 + 52061 ≡ 1 + 53 ≡ 1 + 6 ≡ 0 (mod 7).

Fermat’s Little Theorem may also be used as a tool for primality testing. For

example, for a given n if we can find an a such that (a, n) = 1 and an−1 6≡ 1

(mod n) then we conclude immediately that a is not relatively prime to n. This

implies that n is not a prime. The following example illustrate this point.

example 2.2

Show that 91 is not a prime.

Solutions. We have

290 ≡ 210·9 ≡ 10249 ≡ 239 ≡ 643 ≡ 64 (mod 91).

Since 290 6≡ 1 (mod 91), 91 is composite.

2.2 An extension of Fermat’s Little Theorem: The Euler Theorem 21

Of course this test is very inefficient for several reasons. Firstly, one might

need to select a number a several times before arriving at a conclusion that n

is composite. Secondly, computing powers of a number is very time consuming.

Thirdly, it may be inconclusive even if an−1 ≡ 1 (mod n) for all a < n as the

following example shows.

example 2.3

Show that a561 ≡ a (mod 561) for a < 561.

Solutions. If a is prime to 561, then a2 ≡ 1 (mod 3), a10 ≡ 1 (mod 11) and

a16 ≡ 1 (mod 17). Now,

a560 ≡ (a2)280 ≡ 1 (mod 3)

a560 ≡ (a10)56 ≡ 1 (mod 11)

a560 ≡ (a16)35 ≡ 1 (mod 17)

Therefore a560 ≡ 1 (mod 561), or a561 ≡ a (mod 561). If a is not relatively to

561, the above still holds. For if N = 3 · 11 · 17 and (a, 561) = 561, then

a561 ≡ a ≡ 0 (mod 561).

If N = 3, 11, 17, 3 ·11, 3 ·17, or 11 ·17 and (a, 561) = N , then since N is square

free, (a, 561/N) = 1 and hence, using Fermat’s Little Theorem, we have

a561 ≡ a (mod 561/N).

Since N |a, we also have

a561 ≡ a ≡ 0 (mod N).

Hence,

a561 ≡ a (mod 561).

A number which satisfies the property that an ≡ a (mod n) for all a is called

a Carmichael number. It has been a conjecture for nearly 90 years that there are

infinitely many Carmichael numbers. This result has only recently been proved

by W.R. Alford, A. Granville, C. Pomerance (see Ann. of Math. 140 (1994),

703-722).

2.2 An extension of Fermat’s Little Theorem: The Euler Theorem

Fermat’s Little Theorem is false when n is composite. For example, 55 ≡ 5

(mod 6), i.e., 56 6≡ 5 (mod 6). In this Section we will learn an extension of

Fermat’s Little Theorem. But first let us define a very important function.


definition 2.4 The function ϕ(n) is defined to be the number of elements

1 ≤ x < n which are relatively prime to n. The function ϕ(n) is usually called

Euler’s ϕ function.

example 2.4

There are altogether 2 numbers less than 6 which are relatively prime to 6. These

are 1 and 5. Therefore, ϕ(6) = 2. For n = 12, ϕ(12) = 4 and these four numbers

are 1, 5, 7, 11.

example 2.5

Note that when p is prime, all the numbers 1 ≤ x < p are relatively prime to p

and so,

ϕ(p) = p− 1. (2.1)

For example, ϕ(7) = 6 and the numbers that are relatively prime to 7 are

1, 2, · · · , 6.

We are now ready to state Euler’s Theorem:

theorem 2.5 (Euler’s Theorem) Suppose (a, n) = 1. Then

aϕ(n) ≡ 1 (mod n).

Note that when n is prime, by (2.1), ϕ(p) = p− 1. So Theorem 2.5 reduces to

Fermat’s Little Theorem, since

aϕ(p) ≡ ap−1 ≡ 1 (mod p).

Proof

The idea of the proof is the same as that of Fermat’s Little Theorem. Consider

the set

S := {a1, a2, · · · , aϕ(n)},

where S contains all the positive integers x < n which are relatively prime to n.

(Note by definition, there are ϕ(n) of them). Suppose a is relatively prime to n.

We define

T = {a · a1, a · a2, · · · , a · aϕ(n).}.

Once again, no two elements in T are congruent modulo n. For if

a · ai ≡ a · aj (mod n),

then

ai ≡ aj (mod n)

since (a, n) = 1. Hence each element in T is congruent modulo n to exactly one

2.3 Finite abelian groups and Euler’s Theorem 23

element in S. Now, taking the product of all the elements of T , we conclude

that

aϕ(n)

ϕ(n)∏

k=1

ak ≡ϕ(n)∏

k=1

ak (mod n).

Once again since (ai, n) = 1 we have

ϕ(n)∏

k=1

ak, n

= 1

and applying cancelation law, we obtain

aϕ(n) ≡ 1 (mod n).

example 2.6

Find the remainder of 44444444 when it is divided by 9.

Solutions. Observe that 4444 ≡ −2 (mod 9) and ϕ(9) = 6 (the positive integers

< 9 and relatively prime to 9 are 1, 2, 4, 5, 7, 8). Since a6 ≡ 1 (mod 9) when

(a, 9) = 1 by Theorem 2.5. It is natural to write the exponent in form 6q + r.

Now, 4444 = 6q + 4 and so,

44444444 ≡ (−2)6q+4 ≡ ((−2)6)q · (−2)4 ≡ (−2)4 ≡ 7 (mod 9).

Hence, the remainder is 7.

2.3 Finite abelian groups and Euler’s Theorem

The division algorithm states that if n is a fixed positive integer then any integer

N can be written in the form qn+ r where 0 ≤ r < n. In other words, the set Z

can be partitioned into n distinct sets, denoted by [r]n, 0 ≤ r < n, where

[r]n := {k ∈ Z|k ≡ r (mod n)}.

Let

Z/nZ = {[r]n|0 ≤ r < n}

and define the binary operation

[a]n+[b]n = [a+ b]n.

Then the set Z/nZ forms a group under the operation +. This group is “similar”

to the additive group (Z,+). Of course, the set Z together with the binary

operation · is not a group. Can we construct a group for Z/nZ with operation

the analogue of ·? The answer is yes but with a “twist”.


We let

(Z/nZ)∗= {[r]n|(r, n) = 1, 0 ≤ r < n}.

Define

[a]n·[b]n = [a · b]n.

Then the set (Z/nZ)∗together with · is a multiplicative group.

The Lagrange Theorem states that for any finite group G, the order of any

subgroup H of G divides the order of the group G. To see this, we first construct

G/H := {gH |g ∈ G}

where the coset gH is defined by

gH = {gh|h ∈ H}.

Note first that if G = H , then there is one element in G/H . If G 6= H and

gH, g′H ∈ G/H , we note that

gH ∩ g′H = φ.

For otherwise, if x ∈ gH∩g′H , then x = gh = g′h′, or g−1g′ ∈ H and gH ⊂ g′H .

Similarly, g′H ⊂ gH and we have gH = g′H. Since gH and g′H are disjoint for

distinct cosets, we conclude that there are finitely many distinct sets gH in G/H .

Furthermore, each g ∈ G must be in one of these cosets. This implies that

G = ∪gH.

Using the map

ϕ : H → gH,

we find that

|H | = |gH |.

Therefore,

|G| =k∑

j=1

|gH | = k|H |.

Hence, |H | | |G| and this is Lagrange’s Theorem for subgroups of finite groups.

Now, one can see that for any element g ∈ G, the set

H = {gm|m ∈ Z}

forms a finite group of order d, where d is the least positive integer such that

gd = 1. Since H is a subgroup of G with |H | = d, by Lagrange’s Theorem,

d | |G|, or |G| = d`. Now, gd = 1, where 1 is the identity of the group G. Hence

g|G| = (gd)` = 1. We have thus shown that

g|G| = 1

2.4 The Chinese Remainder Theorem 25

for all g ∈ G. Applying this to our group (Z/nZ)∗, we find that for any integer

a such that (a, n) = 1,

aϕ(n) ≡ 1 (mod n),

since ϕ(n) is the number of elements of (Z/nZ)∗.

Remark 2.6 Let G = (Z/nZ)∗. The elementary proof of Euler’s Theorem

follows from the fact that the multiplication of a fixed element g of G induces a

permutation of the elements in G. In other words, if we define a map

ξ : G→ G

by ξ(g1) = g · g1, then ξ is a bijection of G.

Remark 2.7 Euler’s Theorem can be used to solve Linear congruences such as

ax ≡ c (mod m)

when (a,m) = 1. One simply multiplies aϕ(m)−1 to both sides of the congruence

to obtain

aϕ(m)x ≡ aϕ(m)−1c (mod m).

2.4 The Chinese Remainder Theorem and solving linearcongruence equations

theorem 2.8 Let m and n be such that (m,n) = 1. Let

A(t) = Z/tZ.

Then the map

ξ : A(mn) → A(m)×A(n)

defined by

ξ([k]mn) = ([k]m, [k]n)

is a bijection.

Proof

First, note that ξ is a well defined map. For, if [j]mn = [k]mn then ξ([j]mn) =

([j]m, [j]n). But j ≡ k (mod mn) implies that j ≡ k (mod m) and j ≡ k

(mod n). In other words, ξ([j]mn) = ([k]m, [k]n) and the map is well defined.

Note that the number of elements in A(mn) and A(m) × A(n) are equal. It

suffices to show that ξ is an injection. Suppose

([`]m, [`]n) = ([k]m, [k]n).

Then m|(` − k) and n|(` − k). This implies, by Example 1.21 that mn|(` − k)

since (m,n) = 1. Hence, [`]mn = [k]mn and the map ξ is injective and hence the

map ξ is a bijection.


Theorem 2.8 is known as the Chinese Remainder Theorem. It says that given

integers m and n such that (m,n) = 1, the simultaneous congruence equations

x ≡ h (mod m) and x ≡ k (mod n) (2.2)

can be solved. In other words, there exist an integer x satisfying (2.2).

The Chinese Remainder Theorem can be extended to more than two positive

integersm and n. More precisely, we can show that ifm1,m2, · · · , ms are pairwise

relatively prime, then the system of equations

x ≡ a1 (mod m1)

x ≡ a2 (mod m2)

...

x ≡ as (mod ms)

is solvable.

The usual way to prove the Chinese Remainder Theorem is by constructing

an x that is the solution of

x ≡ a (mod m) and x ≡ b (mod n).

To do so, one sets

x = nta+msb,

where ms ≡ 1 (mod n) and nt ≡ 1 (mod m). Note that with this choice of x,

x ≡ nta+msb ≡ a (mod m)

and

x ≡ b (mod n).

The only difficulty now is to determine the value s and t that satisfy

ms ≡ 1 (mod n) and nt ≡ 1 (mod m).

This amounts to solving linear congruence of the form

aX ≡ 1 (mod k). (2.3)

Note that this linear congruence is solvable only if (a, k) = 1. In this case, there

exists u and v such that

1 = au+ kv.

Hence X = u is a solution to the linear congruence (2.3).

example 2.7

Using the second remark above, find the least positive integer x such that x ≡ 5

(mod 7), x ≡ 7 (mod 11) and x ≡ 3 (mod 13).

2.5 Multiplicative functions and even perfect numbers 27

2.5 Multiplicative functions and even perfect numbers

definition 2.9 An arithmetical function f is a function from N to C.

definition 2.10 An arithmetical function f is said to be multiplicative if

f(1) = 1 and when (m,n) = 1,

f(mn) = f(m)f(n).

definition 2.11 An arithmetical function f is said to be completely multi-

plicative if f(1) = 1 and for all positive integers m and n,

f(mn) = f(m)f(n).

example 2.8

The functions N(n) = n and u(n) = 1 are completely multiplicative.

definition 2.12 The Mobius function µ(n) defined by µ(1) = 1 and for

n =∏m

k=1 pαk

k ,

µ(n) =

{(−1)m if αi = 1, 1 ≤ i ≤ m

0 otherwise..

Note that µ(n) is a multiplicative function.

definition 2.13 Let σ(1) = 1 and σ(n) be the sum of all divisors of n. We

write

σ(n) =∑

d|n

d.

example 2.9

Show that σ(n) is multiplicative and deduce that an even integer N is perfect if

and only if N = 2k−1(2k − 1) where 2k − 1 is prime.

Solutions. Suppose k, ` are relatively prime. Then each divisor d of k` can be

expressed in the form d1 and d2 where d1|k and d2|`. We thus write

σ(k`) =∑

d|k`

d =∑

d1d2|k`

d1d2

=∑

d1|k

d1∑

d2|`

d2 = σ(k)σ(`).

Next, we observe N is a perfect number if and only if

σ(N) = 2N.

Now, if N = 2k−1(2k − 1) with 2k − 1 a prime, then the divisors of N are

2j, 0 ≤ j ≤ k − 1 and 2j(2k − 1), 1 ≤ j ≤ k − 1. The sum of divisors is

2k − 1 + (2k − 1)(2k − 1) = 2k(2k − 1) = 2N.


Hence, N is perfect.

Conversely, if N is even and perfect. Write N = 2k−1m, k ≥ 2 and m odd.

Since σ(n) is multiplicative and (2k−1,m) = 1, we conclude that

σ(N) = σ(2k−1)σ(m) =(1 + 2 + · · ·+ 2k−1

)σ(m) = (2k − 1)σ(m). (2.4)

But N is perfect and this implies that

σ(N) = 2N = 2km. (2.5)

From (2.4) and (2.5), we deduce that

(2k − 1)σ(m) = 2km.

Since (2k − 1, 2k) = 1, by Euclid’s Lemma, we deduce that

(2k − 1)|m. (2.6)

By (2.6), we may write

m = (2k − 1)s, (2.7)

with s ≥ 1. With this expression for m, we find using (2.4) and (2.5) that

σ(m) = 2ks. (2.8)

If s > 1 then 1, s and (2k − 1)s are all divisors of m. Hence,

σ(m) ≥ 1 + s+ (2k − 1)s > s+ (2k − 1)s = 2ks.

This contradicts (2.8). Therefore s = 1, m = 2k − 1 and σ(m) = 2k. But this

means that 1 and 2k − 1 are the only divisors of m and hence m = 2k − 1 must

be a prime.

Remark 2.14 The method we used to prove that σ(n) is multiplicative can be

modified to show that ∑

d|n

f(d)g(nd

)

is multiplicative if f and g are multiplicative. The expression∑

d|n

f(d)g(nd

),

denoted by (f ∗ g)(n) is called the Dirichlet product of f and g.

2.6 The Euler ϕ-function is multiplicative

We show next that ϕ(n) is also a multiplicative function.

theorem 2.15 For positive integers m and n such that (m,n) = 1,

ϕ(mn) = ϕ(m)ϕ(n).

2.6 The Euler ϕ-function is multiplicative 29

We will prove a fact.

Let

M(t) := (Z/tZ)∗.

Note that |M(t)| = ϕ(t). Consider the map

ψ :M(mn) →M(m)×M(n)

defined by

ψ([k]mn) = ([k]m, [k]n).

Note that ψ = ξ∣∣M(mn)

. From the proof of Theorem 2.8, we see that given

([u]m, [v]n) ∈M(m)×M(n), there exists [h]mn ∈ A(mn) such that

ξ([h]mn) = ([u]m, [v]n).

We need to show that [h]mn ∈M(mn). This follows from the fact that if [h]m ∈M(m) and [h]n ∈ M(n), then [h]mn ∈ M(mn). In other words, if (h,m) = 1

and (h, n) = 1 then (h,mn) = 1. This proves that the map is surjective. Now

the map ψ is injective because ξ is injective. We therefore conclude that ψ is a

bijection and

ϕ(mn) = ϕ(m)ϕ(n).

corollary 2.16 Let n =∏

p pαp . Then

ϕ(n) =∏

p

ϕ(pαp).

We end this section with another proof that ϕ(n) is multiplicative. We begin

with the following theorem:

theorem 2.17 Let µ(n) be the Mobius function. Then

∑

d|n

µ(d) =

{1 if n = 1,

0 otherwise.(2.9)

Proof

First, we note that the left hand side (2.9) is µ ∗ u where u(n) = 1 for all

positive integers n. From Remark 2.14, we know that µ ∗ u is multiplicative.

Hence µ ∗ u(1) = 1. To determine µ ∗ u(n) for n > 1, it suffices to determine the

values of µ ∗ u at prime powers. Now,∑

d|pα

µ(d) = 1− 1 = 0,

if α ≥ 1. Hence µ ∗ u(n) = 0 for n > 1. Note that the right hand side of (2.9)

can be written as [1/n] where, [x] is the integer part of x.


We now give another proof that ϕ(n) is multiplicative. Note that

ϕ(n) =n∑

k=1(k,n)=1

1 =n∑

k=1

[1

(k, n)

]

=n∑

k=1

∑

`|(k,n)

µ(`) =∑

`|n

µ(`)n∑

k=1`|k

1

=∑

`|n

µ(`)n

`.

Hence ϕ(n) is the Dirichlet product of the multiplicative functions µ(n) and

N(n) and by Remark 2.14, ϕ(n) is multiplicative.

2.7 An Application to Cryptography

The purpose of studying Cryptography is to protect information transmitted

through public communication networks. In the language of cryptography, the

information to be concealed is called plaintext. The message in secret form is

called ciphertext. The purpose of converting plaintext to ciphertext is said to be

encrypting while the reverse is called decrypting.

In 1977, R. Rivest, A. Shamir and L. Adleman proposed a public key cryptosys-

tem, which uses only elementary ideas from Number Theory. The enciphering

system is now called RSA. Its security depends on the assumption that in the

current state of computer technology, the factorization of composite numbers

with large prime factors is prohibitively time-consuming.

Each user of RSA system chooses a pair of distinct primes p, q large enough

so that the factorization of n = pq (enciphering modulus) is beyond the current

computational capabilities. For example one may select p and q with 200 digits

and n would have 400 digits. Having selected n, the user then chooses a random

positive integer k such that (k, ϕ(n)) = 1. The pair (n, k) is placed in a public

file, as user’s personal encryption key. This will allow anyone else to encrypt and

send a message to that individual. Note that while n is openly revealed the listed

public key does not mention the factors p and q of n.

The encryption begins with the conversion of the message into digital alpha-

bets. For example,

A = 01, B = 02, · · ·Z = 26, Space = 00.

Under this scheme, the message ”The brown fox is quick” is transformed into

M = 20080500021815231400061524000919001721090311.

We assume that M < n. If M > n we can break M into blocks of digits

2.7 An Application to Cryptography 31

M1,M2, · · · ,Mr, Mi < n. To encrypt, raised M to kth power (mod n), i.e.,

Mk ≡ r (mod n)

and send r.

To decrypt, first recall that k is chosen such that (k, ϕ(n)) = 1. Therefore

there exist j such that

kj ≡ 1 (mod ϕ(n)).

This implies that

rj ≡M jk ≡Mϕ(n)·m+1 (mod n).

If M is coprime to n, then

Mϕ(n) ≡ 1 (mod n).

Therefore

rj ≡M (mod n),

and we retrieve the message. Note that the number j, called the recovery ex-

ponent, can only be calculated by someone who knows ϕ(n) = (p − 1)(q − 1).

Therefore, j is secure from an illegitimate third party whose knowledge is limited

to the public key (n, k).

We have assumed above that M is relatively prime to n. Suppose M is not

relatively prime to n = pq. Then (M,pq) = p, q, or pq. If the is pq then since

M < pq, M = 0. Suppose without loss of generality that (M,pq) = p. Then

kj ≡ 1 (mod ϕ(n))

implies that

kj ≡ 1 (mod q − 1),

rj ≡Mkj ≡M (mod q)

and

0 ≡Mkj ≡M (mod p).

Hence,

rj ≡M (mod pq)

and we retrieve the message.

example 2.10

Let p = 29, q = 53, pq = 1537. Note that

ϕ(n) = 1456.


Take k = 47. Then j = 31 since 31 ·47 ≡ 1 (mod 1456). To encrypt the message

”No Way”, we have

M = 141500230125.

We divide M into blocks of 3:

141 500 230 125.

Raised to the power of 47 modulo 1537, we have

0658 1408 1250 1252.

This is the ciphertext. Note that to retrieve we raise to the power of 31:

65831 ≡ 141 (mod 1537).

2.8 Appendix : Group Action and Fermat’s Little Theorem

definition 2.18 If X is a set and G is a group, then G acts on X if there

exists a function α : G×X → X , called an action such that

(i) for g, h ∈ G, then αg ◦ αh = αgh.

(ii) α1 = 1SXthe identity function.

From now on, we will write the action of g on x as g · x instead of αg(x). The

above conditions are then replaced by gh · x = g · (h · x) and 1G · x = x.

definition 2.19 If G acts on a set X , then the orbit of x (x ∈ X), denoted

by O(x), is the subset of X given by

O(x) = {g · x}.

The stabilizer of x, denoted by Gx, is the subgroup of G given by

Gx = {g ∈ G|g · x = x}.

We recall the following Theorems:

theorem 2.20 If G acts on X, then X is a disjoint union of the orbits. If X

is finite, then |X | =∑i |Oxi|, where one xi is chosen from each orbits.

Proof

This follows from the fact that if Ox ∩ Oy 6= φ, then Ox = Oy. Let zOx ∩ Oy.

Then z = g ·x = g′ ·y. This implies that x = g−1g′ ·y and so Ox ⊂ Oy. Similarly,

Oy ⊂ Ox and hence, Ox = Oy.

2.8 Appendix : Group Action and Fermat’s Little Theorem 33

theorem 2.21 If G acts on a set X and x ∈ X, then

|Ox| = |G : Gx|.

Proof

This follows by defining the map

ξ : G/Gx → Ox

with

ξ(gGx) = g · x.

One must check that this map is well defined and bijective. We leave this as

an exercise.

lemma 2.22 (i) Let a group G act on a set X . If x ∈ X and σ ∈ G then

Gσx = σGxσ−1.

(ii) If a finite group G acts on a finite set X and if x and y lie in the same

orbit, then |Gy| = |Gx|.

Proof

We again leave the above as exercises.

theorem 2.23 Let G act on a finite set X. If N is the number of orbits, then

N =1

|G|∑

g∈G

|Fix(g)|,

where Fix(g) is the set of elements x ∈ X fixed by σ.

Proof

To prove the above, we consider∑

g∈G,x∈Xg·x=x

1 =∑

g∈G

∑

x∈Xg·x=x

1

=∑

g∈G

|Fix(g)|.

On the other hand,∑

g∈G,x∈Xg·x=x

1 =∑

x∈X

∑

g∈Gg·x=x

1

=∑

x∈X

|Gx|

=

m∑

j=1

∑

x∈O(j)

|Gx|,


where O(j), 1 ≤ j ≤ m are the distinct orbits. Note that O(j) = Oxjfor some

xj ∈ X . Since |Gx| = |Gy| for any x, y ∈ O(j), we conclude that∑

x∈O(j)

|Gx| = |G|/|O(j)| · |O(j)| = |G|.

Therefore,

m =1

|G| =∑

g∈G

|Fix(g)|.

We are now ready to give another proof of Fermat’s Little Theorem. Consider

a p-gon where p is a prime. Suppose we want to color the vertices of the p-gon

using n colors. How many ways can we do this?

Let X contains the p-tuples (a1, a2, · · · , ap) where ai takes one of the n colors.

Let G be the cyclic group generated by σ := (1, 2, · · · , p). Let G acts on X by

σj · (a1, a2, · · · , ap) = (aσj(1), aσj2, · · · , aσjp).

Note that under this action, the orbit containing (a1, a2, · · · , ap) are

{(ai+1, ai+2, · · · , ap, a1, · · · , ai), 0 ≤ i ≤ p− 1}.

This means that if a colored polygon can be obtained from another colored

polygon via rotation, we only count the polygon once. As such, the number of

orbits is the number of polygons with distinct colorings. Hence the number of

orbits is1

p

(|Fix((1)(2) · · · (p))| + |Fix(σ)|+ · · ·+ |Fix(σp−1)|

).

Each σi, i 6= 0, is a p-cycle and so the total number of x fixed by σi is n. The

number of x fixed by (1)(2) · · · (p) is np. Since the number of orbits is an integer,

we find that

np ≡ n (mod p),

which is Fermat’s Little Theorem.

3 Solutions of congruences

3.1 The Chinese Remainder Theorem revisited

Let f(x) be a polynomial with coefficients in Z. A congruence equation is of the

form

f(x) ≡ 0 (mod m),

where m is a positive integer. In this Chapter, we will study such equations.

theorem 3.1 Let f(x) be a fixed polynomial with integral coefficients, and for

any positive integerm letN(m) denote the number of solutions of the congruence

f(x) ≡ 0 (mod m).

If m = m1m2 where (m1,m2) = 1, then N(m) = N(m1)N(m2). If m =∏

p pαp

is the canonical factorization of m, then

N(m) =∏

p

N(pαp).

Proof

Let m be a positive integer. Suppose (m1,m2) = 1 and that there are N(m1)

solutions, say {a1, · · · , aN(m1)}, to

f(x) ≡ 0 (mod m1)

and N(m2) solutions, say {b1, · · · , bN(m2)}, to the equation

f(x) ≡ 0 (mod m2).

To each pair of solutions (ai, bj) there exists an integer cij modulo m such that

f(cij) ≡ 0 (mod m1)

and

f(cij) ≡ 0 (mod m2).

The existence of cij is guaranteed by the Chinese Remainder Theorem. This

implies that

f(cij) ≡ 0 (mod m)


and therefore

N(m1)N(m2) ≤ N(m).

Next, if

f(x) ≡ 0 (mod m)

and m = m1m2, (m1,m2) = 1, then by Chinese Remainder Theorem, there are

N(m) pairs (a mod m1, b (mod m2)) such that

f(a) ≡ 0 (mod m1)

and

f(b) ≡ 0 (mod m2).

This implies that

N(m) ≤ N(m1)N(m2).

Hence,

N(m1m2) = N(m1)N(m2).

example 3.1

Let f(x) = x2 + x+ 3. Find all roots of the congruence f(x) ≡ 0 (mod 15).

Solutions. The solutions for f(x) ≡ 0 (mod 15) are 3, 6, 8, 11. The solutions for

f(x) ≡ 0 (mod 3) are 0 and 2 and for f(x) ≡ 0 (mod 5) are 1 and 3. Note that

N(15) = N(3)N(5).

3.2 Prime power moduli

Theorem 3.1 shows that in order to solve

f(x) ≡ 0 (mod m),

it suffices to solve

f(x) ≡ 0 (mod pα) (3.1)

for pα‖m where p is a prime.

We now show that in order to solve (3.1), if suffices to find the solutions of

f(x) ≡ 0 (mod p).

By Taylor’s series expansion,

f(a+ tpj) = f(a) + tpjf ′(a) + t2p2jf ′′(a)/2 + · · ·+ tnpnjf (n)(a)/n!, (3.2)

3.2 Prime power moduli 37

where n is the degree of f(x). Observe that if

f(x) =

n∑

r=0

crxr,

then

f (k)(x)

k!=

n∑

r=k

cr

(r

k

)xr−k.

Therefore,

f (k)(a)

k!

is an integer for 0 ≤ k ≤ n. We now suppose that

f(a) ≡ 0 (mod pj).

Since

tkpkjf (k)(a)

k!≡ 0 (mod pj+1)

for k > 1 (2j > j for j ≥ 1), we conclude from (3.2) that

f(a+ tpj) ≡ f(a) + tpjf ′(a) (mod pj+1). (3.3)

We now return to (3.3). We see that we need to choose

tf ′(a) ≡ −f(a)pj

(mod p). (3.4)

We now split our investigation into cases:

Case 1. p - f ′(a).

In this case, there is a unique solution t for (3.4) and hence we obtain

theorem 3.2 (Hensel’s Lemma) Suppose that f(x) is a polynomial with in-

tegral coefficient. If f(a) ≡ 0 (mod pj) and f ′(a) 6≡ 0 (mod p), then there is a

unique t (mod p) such that f(a+ tpj) ≡ 0 (mod pj+1).

Case 2.

If p|f ′(a) but p -f(a)

pjthen (3.4) is not solvable.

Case 3.

If p|f ′(a) and p|f(a)pj

then there are p solutions for t in (3.4).


Remark 3.3 Note that if u is solution of

f(x) ≡ 0 (mod pj+1), (3.5)

then by the fact that we can always write u = a + tpj for some a and t where

0 ≤ a < pj and 0 ≤ t ≤ p, we conclude that all solutions of (3.5) are of the form

a+ tpj, with

f(a) ≡ 0 (mod pj), (3.6)

and so, we have exhausted all possible solutions to (3.5) by considering solutions

of (3.6).

example 3.2

Solve x3 − 2x2 + 3x+ 9 ≡ 0 (mod 33).

Solutions. The solutions to the congruence

f(x) ≡ 0 (mod 3)

are x = 0 and 2.

Now,

f ′(x) = 3x2 − 4x+ 3.

Since f ′(2) 6≡ 0 (mod 3), we see that there is a unique solution arising for x = 2.

We need to solve

7t ≡ −15/3 (mod 3)

and it turns out that t = 1. The solution for the congruence f(x) ≡ 0 (mod p)

arising from x = 2 is therefore 2 + 3 = 5.

Now

f ′(0) = 3 and f(0) = 9.

Hence there are three solutions for

f(x) ≡ 0 (mod 9)

corresponding to the solution x = 0 from

f(x) ≡ 0 (mod 3).

These are x = 0, 3 and 6.

We now have four solutions for

f(x) ≡ 0 (mod 32).

We check that f(0) = 9 and f ′(0) = 3 and so

f(x) ≡ 0 (mod 27)

has no solution arising from x = 0.

Next, f(3) = 27 and f ′(3) = 18 and so, there are three solutions arising from

x = 3. These are x = 3, 12 and 21.

3.3 Prime moduli 39

For x = 6 we have f(6) = 171 and f ′(6) = 81. But 3 - (171/9) and hence,

there are no solutions arising from x = 6.

For x = 5 we find that f(5) = 99 and f ′(5) = 58 and we need to solve the

congruence

58t ≡ 11 (mod 3).

The solution is t = 1 and so 14 is the solution for the congruence.

In conclusion the solutions to the congruence

f(x) ≡ 0 (mod 27)

are 3,12,14 and 21.

3.3 Prime moduli

In the previous section, we have seen that we can reduce the problem of finding

solutions for f(x) ≡ 0 (mod pα) to finding solutions for f(x) ≡ 0 (mod p).

We will first prove a result for polynomials over a field F, that is an analogue

for the Division Algorithm for integers.

theorem 3.4 Given the polynomials f(x), g(x) ∈ F[x], where deg g(x) > 0,

there exist polynomials q(x), r(x) ∈ F[x] such that

f(x) = g(x)q(x) + r(x),

with either r(x) = 0 or 0 ≤ deg r(x) < deg g(x).

Proof

We proceed by induction on deg f(x). First fix the polynomial g(x). Suppose

f(x) = 0 or deg f(x) < deg g(x), then f(x) = 0 · g(x)+ f(x). So we may assume

that deg f(x) ≥ deg g(x).

Suppose the statement is true for any polynomials with degrees less than or

equal to n− 1. Let f(x) be a polynomial of degree n. Write

f(x) = a0 + a1x+ · · ·+ anxn,

and

g(x) = b0 + b1x+ · · ·+ bmxm,

with an 6= 0 and bm 6= 0 and n ≥ m. Since F is a field, b−1m exists and the

polynomial

P (x) := f(x)− b−1m anx

n−mg(x) (∗)

has degree n− 1. Hence, there exist q(x) and r(x) such that

P (x) = q(x)g(x) + r(x),


with deg r(x) = 0 or deg r(x) < deg g(x). But by (*),

f(x) = (q(x) + b−1m anx

n−m)g(x) + r(x),

hence the result.

We now let F be the finite field Z/pZ. Let g(x) be the polynomial xp − x and

suppose the degree of f(x) is n ≥ p. By Theorem 3.4, we conclude that

f(x) = (xp − x)q(x) + r(x),

with q(x), r(x) ∈ (Z/pZ) [x] and 0 ≤ deg r(x) < p.

Suppose u is such that

f(u) ≡ 0 (mod p).

By Fermat’s Little Theorem, we find that up−u ≡ 0 (mod p). Therefore, r(u) ≡0 (mod p). Conversely, if r(u) ≡ 0 (mod p), then f(u) ≡ 0 (mod p). This shows

that to study f(x) ≡ 0 (mod p), it suffices to consider those polynomial f(x)

with degree of f(x) is less than p.

theorem 3.5 (Lagrange) Let f(x) be a polynomial of degree n < p and

an 6≡ 0 (mod p). Then the congruence

f(x) ≡ 0 (mod p) (3.7)

has at most n solutions.

Proof

We prove this by induction on the degree of f(x). If n = 0, a0 6≡ 0 (mod p)

implies that there are no solution to (3.7). If the degree of f(x) is 1, then we see

that we are solving

a1x+ a2 ≡ 0 (mod p).

This is solvable since a1 6≡ 0 (mod p). Therefore the number of solution is 1.

Suppose the result holds for all polynomials of degree less than n. Let f(x) be a

polynomial of degree n. If f(x) ≡ 0 (mod p) has no solution, then we are done.

Next, suppose u be a root of the nth degree polynomial f . Then by the division

algorithm for Q[x],

f(x) = (x− u)g(x) + r0

for some polynomial g(x) of degree n− 1. Hence,

r0 ≡ 0 (mod p)

since

f(u) ≡ 0 (mod p) and (u− u)g(u) ≡ 0 (mod p).

Therefore,

f(x) ≡ (x− u)g(x) (mod p).

3.3 Prime moduli 41

If b is a root of f(x) ≡ 0 (mod p), then either p|(b − u) or p|g(b), by Euclid’s

lemma. If p|g(b), then g(b) ≡ 0 (mod p) and by induction there are at most n−1

possible values for b. Therefore, we conclude that f(x) can have at most n roots.

The above result is false if p is not a prime. For example the congruence

equation

x2 ≡ 1 (mod 8)

has four solutions x = 1, 3, 5, 7.

corollary 3.6 If d|(p− 1) then the congruence xd ≡ 1 (mod p) has exactly

d solutions.

Proof

By previous theorem, the congruence cannot have more than d solutions. For

the converse, let p− 1 = de. Note that

xp−1 − 1 = (xd − 1)(xd(e−1) + xd(e−2) + · · ·+ xd + 1).

By the previous theorem, the polynomial xd(e−1)+xd(e−2)+ · · ·+xd+1 cannot

have more than de − d or p − 1 − d solutions. Hence, if xd − 1 has less than d

solutions then xp−1 − 1 must have less than p − 1 − d + d = p − 1 solutions.

But there are exactly p− 1 solutions to the equation xp−1 − 1 by Fermat’s little

theorem. This contradiction shows that xd−1 must have exactly d solutions.

example 3.3

The congruence x10 ≡ 1 (mod 811) has ten solutions and these are

1, 212, 241, 311, 339, 472, 500, 570, 599 and 810.

Remark 3.7 In general, if d - (p−1), the number of solutions for the congruence

xd ≡ 1 (mod p)

is (d, p− 1). To see this, we need the fact that the multiplicative group of Z/pZ

is cyclic, say, generated by g. This fact will be proved in the next Chapter. Now,

if u is a solution to

ud ≡ 1 (mod p),

then u = gy for some integer y since g generates (Z/pZ)∗. Therefore,

gyd ≡ 1 (mod p).

But since the order of g is p− 1, we conclude that p− 1 divides yd. This implies

that

yd ≡ 0 (mod p− 1)

and the number of solutions to this congruence is (d, p− 1) by Theorem 1.40.


theorem 3.8 (Wilson) If p is prime then

(p− 1)! ≡ −1 (mod p).

Proof

If p = 2, then 1 ≡ −1 (mod 2). Let p be an odd prime. By Fermat’s little

theorem, we know that the polynomial

xp−1 − 1− (x− 1)(x− 2) · · · (x − (p− 1))

has at least p− 1 roots. Since the degree of the above polynomial is p− 2 and

such polynomial can have at most p− 2 roots, we conclude that the polynomial

must be identically 0 modulo p. Setting x = 0 we conclude that

(p− 1)! ≡ −1 (mod p)

since p is odd.

Remark 3.9 Wilson’s Theorem is false when the prime p is replaced by a com-

posite number m. Suppose m is composite. Let

m =∏

k

pαk

k .

If m is not a prime power, then

m− 1 = pα11 m′ − 1 > pα1

1

and so

(m− 1)! ≡ 0 (mod pα11 ).

If m is a prime power, then let m = pα. If α > 2, then m − 1 > pα−1 and p

and pα−1 are distinct divisors of (m− 1)! and hence

(m− 1)! ≡ 0 (mod pα).

It remains to show the result for α = 2, or when m = p2. In this case m− 1 =

p2−1 > p. The only point we have to worry is that m−1 < 2p. For if m−1 ≥ 2p

then both p and 2p are distinct divisors of (m− 1)! and hence,

(m− 1)! ≡ 0 (mod p2).

But if p2 − 1 < 2p we have p(p − 2) < 1 and this is only possible if p = 2. We

have thus prove that

(m− 1)! ≡ 0 (mod m)

if m > 4 and composite. For p = 2 and α = 2 we have 3! ≡ 2 6≡ −1 (mod 4).

Hence, we find that

(m− 1)! ≡ −1 (mod m)

if and only if m is a prime.

3.4 Wolstenholme’s congruence 43

3.4 Wolstenholme’s congruence

Let

F (x) = (x− 1)(x− 2) · · · (x− (p− 1)).

Write

F (x) = xp−1 − σ1xp−2 + σ2x

p−3 − · · ·+ σp−3x2 − σp−2x+ σp−1,

where σj denotes the sum of all products of j distinct roots of F (x). In the

proof of Theorem 3.8, we have seen that the polynomial

f(x) = xp−1 − 1− (x− 1)(x− 2) · · · (x− (p− 1)) = xp−1 − 1− F (x)

is the zero polynomial in (Z/pZ)[x]. This means that in (Z/pZ)[x],

xp−1 − 1 = F (x) = (x − 1)(x− 2) · · · (x− (p− 1))

= xp−1 − σ1xp−2 + σ2x

p−3 − · · ·+ σp−3x2 − σp−2x+ σp−1.

Comparing the coefficients of xj on both sides of the above polynomials in

(Z/pZ)[x], we conclude that

σj ≡ 0 (mod p), 1 ≤ j ≤ p− 2 (3.8)

We now prove a result stronger that (3.8) when j = p− 2.

theorem 3.10 (Wolstenholme’s congruence) For prime p ≥ 5,

σp−2 ≡ 0 (mod p2).

Proof

Note that

F (p) = (p− 1)! = pp−1 − σ1pp−2 + · · · − σp−2p+ (p− 1)!.

Now,

pp−2 − σ1pp−3 + · · ·+ σp−3p− σp−2 = 0.

By (3.8),

σp−3 ≡ 0 (mod p)

and since p ≥ 5, we find that

σp−2 ≡ 0 (mod p2).

4 Primitive roots

4.1 Order of an element in a group

Recall from the theory of finite groups that if G is a group and g ∈ G then the

smallest positive integer for which gh = 1 is called the order of g in G.

Let G = (Z/mZ)∗. Let [a]m ∈ G. We say that h is the order of [a]m modulo

m if h is the least positive integer such that

ah ≡ 1 (mod m).

theorem 4.1 If a has order h modulo m and ak ≡ 1 (mod m) for some

positive integer k, then h|k.

Proof

Write k = hq + r, r < h. If r 6= 0 then gr ≡ 1 (mod p) and r < h. This

contradicts the minimality of h and hence h|k.

corollary 4.2 If (a,m) = 1, then the order of a modulo m divides ϕ(m).

Proof

From Euler’s Theorem,

aϕ(m) ≡ 1 (mod m).

By the previous theorem, h|ϕ(m).

lemma 4.3 If a has order h modulo m, then ak has order h/(h, k) modulo m.

Proof

Let d = (h, k). Let r be the order of ak. Then ark ≡ 1 (mod m) and hence h|rk.Hence,

h

d

∣∣∣∣(r · k

d

).

Since (h/d, k/d) = 1, we conclude that (h/d)|r.Next,

a(h/d)k = (ak)(h/d) ≡ 1 (mod m).

4.2 Integers m for which (Z/mZ)∗is cyclic 45

Hence, r|(h/d). This implies that

r = h/d = h/(h, k).

lemma 4.4 If a has order h modulo m, b has order k modulo m, and if (h, k) =

1, then ab has order hk modulo m.

Proof

Let the order of ab be r. Note that (ab)hk ≡ 1 (mod m). Hence r|hk. Next,(ab)r ≡ 1 (mod m). We have

ar ≡ b−r (mod m).

Raising both sides to the power of h, we conclude that

brh ≡ 1 (mod m).

Since k is the order of b, we conclude that k|rh. But (k, h) = 1 and hence k|r.In a similar way, we deduce that h|r. Finally, since (h, k) = 1, hk|r.

4.2 Integers m for which (Z/mZ)∗ is cyclic

definition 4.5 If g has order ϕ(m) in (Z/mZ)∗ then g is called a primitive

root modulo m.

Remark 4.6 When a primitive exists for (Z/mZ)∗, we see that (Z/mZ)∗ is a

cyclic group of order ϕ(m).

We first turn to the number of primitive roots in (Z/mZ)∗, assuming that

M(m) := (Z/mZ)∗ is cyclic.

Let g be a primitive root ofM(m). Then every element inM(m) is of the form

gk. By Lemma 4.3, the order of gk is ϕ(m)/(ϕ(m), k) and so gk is a primitive

root if and only if (ϕ(m), k) = 1. We have thus shown that

theorem 4.7 If M(m) is cyclic, then there are precisely ϕ(ϕ(m)) primitive

roots modulo m.

We will next show that M(m) is cyclic (or in another words, primitive roots

modulo m exists) if and only if m = 1, 2, 4, pα and 2pα, where p is an odd prime

and α ≥ 1.

We will first establish the fact that if m = 1, 2, 4, pα and 2pα, where p is an

odd prime, then M(m) is cyclic.

We are therefore left with the cases m = 1, 2, 4, pα and 2pα.

There is nothing to prove for m = 1 and 2. For m = 4 we pick g = 3 and this

is a primitive root modulo 4.

We now prove that if m = p where p is an odd prime then M(m) is cyclic.

46 Primitive roots

definition 4.8 We will use ∑

d|n

f(d)

to denote the sum of f(d) over all divisors d of n.

Note that we have already used the above notation for the definition of σ(n).

lemma 4.9 If n is an integer ≥ 1, then

n =∑

d|n

ϕ(d).

Proof

We partitioned the set {k|1 ≤ k ≤ n} into disjoint sets {(n, `) = d|1 ≤ ` ≤n}.

We now return to the fact that n = p− 1. Suppose y is an element of order d.

Then by Corollary 3.6, there are exactly d elements satisfying

xd ≡ 1 (mod p)

and these elements must be

{1, y, · · · , yd−1}.

In particular, by Lemma 4.3, there will be exactly ϕ(d) elements of order d in

M(p) if there were an element of order d in M(p). In other words, the number

of elements of order d is either 0 or ϕ(d).

Now, each element in M(p) has an order and we let f(d) be the number of

elements having order exactly d. By Lagrange’s Theorem, an order of an element

divides order of M(p) and hence we have

M(p) =⋃

d|(p−1)

{u ∈M(p)|u has order d.}

and this implies that

p− 1 =∑

d|(p−1)

f(d) ≤∑

d|(p−1)

ϕ(d).

By the above lemma, we conclude that f(d) = ϕ(d) and in particular, ϕ(p−1) 6=0 and hence M(p) is cyclic. We have proved that

theorem 4.10 The group M(p) is cyclic.

We now let α > 1 and show that M(m) is cyclic when m = pα, with p an odd

prime.

lemma 4.11 Let p be an odd prime. There exists a primitive root g modulo p

such that

gp−1 6≡ 1 (mod p2).

4.2 Integers m for which (Z/mZ)∗is cyclic 47

Proof

Let h be a primitive root modulo p. If hp−1 6≡ 1 (mod p2) then there is nothing

to prove. Suppose

hp−1 ≡ 1 (mod p2). (4.1)

Then g = h+ p is another primitive root modulo p and by (4.1),

(h+ p)p−1 ≡ hp−1 + (p− 1)php−2 ≡ 1 + (p− 1)php−2 (mod p2).

Note that if

(p− 1)php−2 ≡ 0 (mod p2),

then p|h and h cannot be a primitive root modulo p. Hence,

gp−1 ≡ hp−1 + (p− 1)php−2 6≡ 1 (mod p2).

lemma 4.12 Let g be a primitive root modulo p such that

gp−1 6≡ 1 (mod p2).

Then for every α ≥ 2, we have

gϕ(pα−1) 6≡ 1 (mod pα).

Proof

For α = 2, the conclusion follows from our choice of g, which is constructed

from the previous lemma. We now establish our result by induction. Suppose

gϕ(pα−2) 6≡ 1 (mod pα−1). (4.2)

We want to show that

gϕ(pα−1) 6≡ 1 (mod pα).

By Euler’s Theorem,

gϕ(pα−2) ≡ 1 (mod pα−2)

and hence,

gϕ(pα−2) = 1 + spα−2,

for some integer s. By (4.2), we conclude that p - s.

Since p - s, we find that

gpϕ(pα−2) = gϕ(pα−1

) ≡ 1 + tpα−1 6≡ 1 (mod pα).

This completes the proof of the theorem.

theorem 4.13 The group M(pα) is cyclic.

48 Primitive roots

Proof

From Lemma 4.12, we know that there exists a primitive g modulo p such that

gϕ(pα−1) 6≡ 1 (mod pα).

We will show that the order of this element is ϕ(pα). Let ` be the order of

gϕ(pα−1). By Euler’s Theorem,(gϕ(pα−1)

)p≡ gϕ(pα) ≡ 1 (mod pα)

since

ϕ(pα) = pϕ(pα−1).

Hence `|p and ` = p since ` 6= 1. Let k be the order of g. Then by Lemma 4.3,

the order of gϕ(pα−1) is

k

(ϕ(pα−1), k)= ` = p.

This implies that

k = p(ϕ(pα−1), k

)= (ϕ(pα), kp)

or

1 =

(pα−1(p− 1)

k, p

).

In other words,

k = pα−1d

for some d|(p− 1). Now,

gpα−1d ≡ gk ≡ 1 (mod pα)

implies that

gpα−1d ≡ 1 (mod p)

and this shows that

gd ≡ 1 (mod p)

since gp ≡ g (mod p). Since g is a primitive root modulo p, (p− 1)|d. Therefore,k = pα−1d = pα−1(p− 1) = ϕ(pα)

and this shows that g is a primitive root modulo pα.

Since

M(2pα) 'M(pα),

we conclude that M(2pα) is also cyclic.

We have thus shown that

theorem 4.14 Let p be an odd prime. The group M(m) is cyclic if

m = 1, 2, 4, pα and 2pα.

4.3 Integers m for which (Z/mZ)∗is not cyclic 49

4.3 Integers m for which (Z/mZ)∗ is not cyclic

We will now prove that M(m) is not cyclic for m 6= 1, 2, 4, pα and 2pα.

We first prove that M(2β) is not cyclic for β ≥ 3. This follows from the

following lemma:

lemma 4.15 If β ≥ 3 then M(2β) is not cyclic.

Proof

The elements in M(2β) are of the form [x]2β , with x being an odd integer. If we

can show that all elements of M(2β) has order strictly less than 2β−1, then we

can conclude that M(2β) is not cyclic. Hence it suffices to show that for all odd

integers x,

x2β−2 ≡ 1 (mod 2β). (4.3)

When β = 3 then x2 ≡ 1 (mod 8) for all odd integers x. This can be verified

directly. We now prove by induction that M(2β) is not cyclic for all positive

integer β ≥ 3. Suppose (4.3) is true for positive integers less than β. Then this

means that

x2β−3 ≡ 1 (mod 2β−1)

for all odd integers x. Therefore,

x2β−3

= 1 + 2β−1t.

Squaring both sides of the above, we find that

x2β−2

= 1 + 2βt′.

Hence,

x2β−2 ≡ 1 (mod 2β),

and this completes our proof that M(2β) is not cyclic.

We next suppose that 2αqβ divides m, where q is an odd prime with α > 1

and β > 0. Then by application of Chinese Remainder Theorem, we know that

M(m) 'M(2α)⊕M(qβ) · · · .

This would imply thatM(m) contains a Klein four group and hence it has more

than one subgroup of order 2. Since a cyclic group contains a unique subgroup

of order d when d is a divisor of the order of the group, we conclude that M(m)

is not cyclic.

Next suppose p and q are odd primes such that pαqβ divides m, where α > 0

and β > 0. Then since ϕ(p) and ϕ(q) are divisible by 2, M(m) again contains a

Klein four group.

This completes our proof that M(m) is not cyclic if m 6= 1, 2, 4, pα and 2pα.

5 Quadratic Reciprocity Law

5.1 Primitive roots and solutions of congruences

In Corollary 3.6, we see that if d|(p− 1), where p is a prime, then

xd ≡ 1 (mod p)

is solvable and has exactly d solutions. In this section, we study an extension of

this result.

theorem 5.1 Suppose m = 1, 2, 4, pα or 2pα, where p is an odd prime. Let

(a,m) = 1 and let d = (ϕ(m), n). The congruence

xn ≡ a (mod m) (5.1)

is solvable if and only if

aϕ(m)/d ≡ 1 (mod m).

In the case when the congruence is solvable, there are exactly (n, ϕ(m)) solutions.

Proof

We may assume that n ≤ ϕ(m) by Euler’s Theorem. If

xn ≡ a (mod m)

has a solution, then let gk be its solution where g is a primitive root modulo m.

Hence

aϕ(m)/d ≡ gkn(ϕ(m)/d) ≡(gϕ(m)

)kn/d≡ 1 (mod m).

Conversely, suppose

aϕ(m)/d ≡ 1 (mod m).

Let a = g`. Note that from our assumption and the fact that g is a primitive

root, we must have

ϕ(m)|(`ϕ(m)/d).

Hence d|`.Now consider the linear congruence

nu ≡ ` (mod ϕ(m)).

5.2 The Legendre Symbol 51

Note that since d = (n, ϕ(m)) divides `, the above congruence is solvable by

Theorem 1.40. Now, let s = gu. Then

sn = gnu = gnu+ϕ(m)v ≡ g` ≡ a (mod m)

and hence

xn ≡ a (mod m)

is solvable.

By specifying n = 2 and m = p we see that

theorem 5.2 (Euler’s Criterion) The congruence equation

x2 ≡ a (mod p)

is solvable if and only if

a(p−1)/2 ≡ 1 (mod p).

5.2 The Legendre Symbol

Let a be any integer relatively prime to p, where p is an odd prime. The Legendre

symbol is defined by

(a

p

)=

{1 if x2 ≡ a (mod p) is solvable,

−1 otherwise..

From the Euler Criterion, we see immediately that

a(p−1)/2 ≡(a

p

)(mod p).

This is because ap−1 ≡ 1 (mod p) and so a(p−1)/2 ≡ ±1 (mod p).

When p|n we simply define (n

p

)= 0.

From Euler’s Criterion, we obtain immediately that the Legendre symbol(

np

)

is a completely multiplicative function of n, i.e.,(mn

p

)=

(m

p

)(n

p

).

In the subsequent sections, we will learn an algorithm in computing the Leg-

endre symbol, thereby allowing us to determine the solvability of the congruence

x2 ≡ a (mod p).


definition 5.3 For (a, p) = 1, we say that a is a quadratic residue modulo p if

x2 ≡ a (mod p) is solvable. Otherwise, we say that a is a quadratic non-residue

modulo p.

5.3 Gauss Lemma

In this section, we give an elementary proof of the Gauss Lemma.

theorem 5.4 Let p be a prime and let a be an integer such that (a, p) = 1.

Consider

S := {a, 2a, · · · , p− 1

2a}

and let

T := {s (mod p)|s ∈ S},

with elements in T between 0 and p − 1. Suppose there are m elements in T

which are greater than p/2, then(a

p

)= (−1)m.

Proof

Let r1, r2, · · · , rm be elements in T exceeding p/2. Let s1, s2, · · · , sk be the

remaining elements in T . Note that

m+ k =p− 1

2.

Now, rj > p/2 implies that 0 < p− rj < p/2.

Note that since (Z/pZ)∗is a group, multiplication by a induces a bijection on

this group and hence, no two si’s coincide and no two p− rj coincide. It remains

to show that no p− rj coincides with si. Let

rj ≡ ρa (mod p) and si ≡ σa (mod p),

with ρ < p/2, σ < p/2 and ρ 6= σ. If

−rj ≡ si (mod p),

then

a(ρ+ σ) ≡ 0 (mod p).

This implies that p|(ρ+ σ). Since |ρ+ σ| < p, this is impossible.

Now, since m+ k = (p− 1)/2, p− ri, 1 ≤ i ≤ n and sj , 1 ≤ j ≤ k, are distinct

and less than p/2 and that they are all less than p/2, we conclude that

{p− r1, p− r2, · · · , p− rm, s1, s2, · · · , sk} = {1, 2, · · · , p− 1

2}.


Multiplying the elements on both sides, we find that

(−r1)(−r2) · · · (−rm) · s1 · · · sk ≡(p− 1

2

)! (mod p).

But

r1 · r2 · · · rm · s1 · · · sk ≡ (−1)ma(p−1)/2

(p− 1

2

)! (mod p)

as the ri’s and sj ’s are congruent to elements in S, we conclude that

a(p−1)/2 ≡ (−1)m (mod p),

and hence, (a

p

)= (−1)m.

5.4 Proofs of Gauss’ Quadratic Reciprocity Law

Let a = −1 then m = (p−1)/2 since as 6∈ S for all s ∈ S. From Euler’s Criterion

or Gauss Lemma (Theorem 5.4), we deduce that(−1

p

)= (−1)(p−1)/2.

This is the first part of Gauss’ quadratic reciprocity law.

Let a = 2 and p ≡ 1 (mod 4). Set p = 4` + 1. We have S = {1, 2, · · · , 2`}.Therefore 2S = {2, 4, · · · , 2`, 2`+ 2, · · · , 4`}. Hence m = `. In other words

(2

p

)=

{1 if p = 8`+ 1

−1 if p = 8`+ 5.

Similarly, we have(2

p

)=

{1 if p = 8`+ 7

−1 if p = 8`+ 3.

In short, we may write (2

p

)= (−1)(p

2−1)/8.

We now prove the final part of Gauss’ quadratic reciprocity law. First, we need

a lemma.

lemma 5.5 For odd positive integer m, we have

sinmx

sinx= (−4)(m−1)/2

∏

1≤j≤(m−1)/2

(sin2 x− sin2

2πj

m

).


Proof

It is known that

cosmx+ i sinmx = (cos x+ i sinx)m =m∑

k=0

Ak (cosx)m−k (sinx)k ik.

Comparing the imaginary parts of both sides, we conclude that

sinmx =

(m−1)/2∑

`=0

A2`+1 sin2`+1 x cosm−(2`+1) x(−1)`.

Hencesinmx

sinx

is a polynomial in sin2 x of degree (m− 1)/2. We note that the function

sinmx

sinx

vanishes when x = 2πj/m for 1 ≤ j ≤ (m− 1)/2. Hence

sinmx

sinx= C

∏

1≤j≤(m−1)/2

(sin2 x− sin2

2πj

m

).

The constant C can be found by letting x tends to 0. The left hand side is m

and the right hand side becomes

C(−1)(m−1)/2∏

1≤j≤(m−1)/2

sin22πj

m.

Using the fact that

sin(π − x) = sinx,

we find that

∏

1≤j≤(m−1)/2

sin22πj

m=

m−1∏

k=1

sinkπ

m.

Butm−1∏

k=1

sinkπ

m=

1

2m−1im−1

m−1∏

k=1

e−ikπ/m(ωk − 1

),

where ω = e2πi/m. Now,

m−1∏

k=1

e−ikπ/m = i−(m−1),

and using the equation

xm − 1 =

m∏

k=1

(x − ωk),


we deduce that

m(−1)m−1 =m−1∏

k=1

(ωk − 1).

Hence,

m−1∏

k=1

sinkπ

m=

m

2m−1.

This implies that C = (−4)(m−1)/2 and the proof is complete.

The main part of Gauss’ quadratic reciprocity law states that for odd distinct

primes p, q(p

q

)(q

p

)= (−1)(p−1)(q−1)/4.

Let S = {1, 2, · · · , (p− 1)/2}. Write

[qs]p = [es(q)sq ]p,

where sq ∈ S and

es(q) =

{1 if qs = sq ∈ S

−1 if qs 6∈ S..

Note that es(q) = −1 precisely when qs exceeds p/2. Hence, if m is the number

of elements in qS = {1 ≤ qs (mod p) ≤ p|s ∈ S} that exceeds p/2, then(q

p

)= (−1)m =

∏

s∈S

es(q).

Now, define

F ([m]p) = sin2πm

p.

Note that F is well defined on (Z/pZ)∗ since F ([m + kp]p) = F ([m]p) because

sin(2π(m+ kp)/p) = sin(2πm/p). Therefore, the relation

[qs]p = [es(q)sq]p

yields

sin2π

pqs = F ([qs]p) = F ([es(q)sq]p) = es(q) sin

2π

psq.

Since s→ sq is a bijection of S, we conclude that

(q

p

)=∏

s∈S

es(q) =∏

s∈S

sin2π

pqs

sin2π

psq

=∏

s∈S

sin2π

pqs

sin2π

ps.


Applying Lemma 5.5, we deduce that(q

p

)=∏

s∈S

(−4)(q−1)/2∏

t∈T

(sin2

2πs

p− sin2

2πt

q

),

where T = {1, 2, · · · , (q − 1)/2}. Hence(q

p

)= (−4)(q−1)(p−1)/4

∏

s∈S

∏

t∈T

(sin2

2πs

p− sin2

2πt

q

).

Interchanging the role of p and q, we deduce that(p

q

)= (−4)(q−1)(p−1)/4

∏

t∈T

∏

s∈S

(sin2

2πt

q− sin2

2πs

p

).

Therefore, (p

q

)and

(q

p

)

agrees up to sign and the sign is given by (−1)(p−1)(q−1)/4 by looking at the

products

∏

s∈S

∏

t∈T

(sin2

2πs

p− sin2

2πt

q

)and

∏

t∈T

∏

s∈S

(sin2

2πt

q− sin2

2πs

p

).

We have thus completed the proof of the quadratic reciprocity law and we

summarize the result as follow:

theorem 5.6 Let p and q be distinct primes. Then we have(−1

p

)= (−1)(p−1)/2

(2

p

)= (−1)(p

2−1)/8

(p

q

)(q

p

)= (−1)(p−1)(q−1)/4.

example 5.1

Show that x2 ≡ 10 (mod 89) is solvable.

Solutions. We compute(10

89

)=

(2

89

)(5

89

)= (−1)(89

2−1)/8

(5

89

)=

(89

5

)= 1.

5.5 The Jacobi Symbol

In this section, we discuss an extension of the Legendre symbol.

5.5 The Jacobi Symbol 57

definition 5.7 Let Q be a positive odd integer so that

Q = q1q2 · · · qswhere qi are not necessarily distinct. The Jacobi symbol is defined by

(P

Q

) s∏

j=1

(P

qj

)

where the expressions on the right hand side involving qj are the Legendre sym-

bols.

We observe that if Q is prime then the Jacobi symbol is simply the Legendre

symbol. We also note that if (P,Q) > 1, then(P

qj

)= 0

for some j and hence (P

Q

)= 0.

Now if P is a quadratic residue modulo Q, then

x2 ≡ P (mod qj)

is solvable and hence (P

Q

)= 1.

The converse, however, is not true. Although(

2

15

)= 1,

the congruence

x2 ≡ 2 (mod 15)

is not solvable.

From the definition of the Jacobi symbol and the properties of the Legendre

symbol, it is immediate that(P

Q

)(P

Q′

)=

(P

QQ′

)

(P

Q

)(P ′

Q

)=

(PP ′

Q

)

(P 2

Q

)= 1 and

(P

Q2

)= 1

Finally, we observe that if P ≡ P ′ (mod Q), then(P

Q

)=

(P ′

Q

).


The main result we want to show in this section is

theorem 5.8 If Q is odd positive integer, then

(−1

Q

)= (−1)

Q−12

(2

Q

)= (−1)

Q2−1)8

(P

Q

)(Q

P

)= (−1)

P−12 ·Q−1

2 .

Proof

To prove the first equality, we observe that

ab− 1

2−(a− 1

2+b− 1

2

)=

(a− 1)(b− 1)

2.

The numerator of the right hand side is divisible by 4 if both a and b are odd.

Hence,

ab− 1

2≡(a− 1

2+b− 1

2

)(mod 2).

Hence if Q = q1q2 · · · qs, thens∑

j=1

(qj − 1

2

)≡(Q− 1

2

)(mod 2).

This implies that

(−1

Q

)=

s∏

j=1

(−1

qj

)= (−1)

∑sj=1(qj−1)/2 = (−1)

Q−12 .

The proof of the second equality is similar except that we used the relation

a2b2 − 1

8−(a2 − 1

8+b2 − 1

8

)=

(a2 − 1)(b2 − 1)

8

and observe that the numerator of the right hand side is divisible by 64.

The proof of the last equality follows in exactly the same way as the proof of

the first equality.

With the Jacobi symbol, we can now calculate Legendre symbol without having

to factorize integers (except for factoring −1 and 2).

In the next two sections, we will give another proof of the Gauss Lemma using

results from group theory.

5.6 The Transfer 59

5.6 The Transfer

Let G be a finite group and H be a subgroup of G. Let X = G/H and fix a set

of representatives for X , namely,

R = {x1, · · · , xm}where m = |X |. Write the elements in X as [xj ] instead of xjH .

The group G acts on X via

g · [xi] = [gxi].

Now, because we insist that the representative for the cosets in X to be from R

we must write [gxi] as [xj ] for some 1 ≤ j ≤ m. Since [gxi] = [xj ] for some j we

could view G as acting on {1, 2, · · · ,m} and we write

g ◦ i = j.

In other words, we have

[gxi] = [xg◦i].

From the above, we conclude that

gxi = xg◦ihg,[xi]

for some hg,[xi] ∈ H. We define the transfer of g to be

Ver(g) =∏

[x]∈X

hg,[x] (mod [H,H ])

where where [H,H ] is the commutator subgroup of H , namely, the group gen-

erated by {h1h2h−11 h−1

2 |h1, h2 ∈ H}.It appears that Ver(g) depends on our choice of set of representatives R. We

will show that this is not the case.

theorem 5.9 The map Ver : G → H/[H,H ] is independent of the choice of

representatives of x ∈ X and it is a homomorphism.

Proof

Let R∗ = {x∗1, x∗2, · · · , x∗m} be another fixed set of coset representatives and

suppose we have

[x∗i ] = [xi].

This means that

x∗i = xih[xi]

for some h[xi] ∈ H. Now,

g[x∗i ] = [gx∗i ] = [gxi] = [xg◦i] = [x∗g◦i].

If we write

gx∗i = x∗g◦ih∗g,[x∗

i]


for some h∗g,[x∗

i] ∈ H , then the corresponding transfer is defined by

Ver∗(g) =∏

[x]∈X

h∗g,[x] (mod [H,H ]).

Note that

gx∗i = gxih[xi] = xg◦ihg,[xi]h[xi] = x∗g◦ih−1[xg◦i]

hg,[xi]h[xi].

This implies that

h∗g,[x∗

i] = h−1

[xg◦i]hg,[xi]h[xi].

Therefore,

Ver∗(g) =∏

[xi]∈X

h−1[xg◦i]

hg,[xi]h[xi] (mod [H,H ]) =∏

[x]∈X

hg,[xi] (mod [H,H ]) = Ver(g)

since

ab = abb−1a−1ba = ba (mod [H,H ])

and∏

[xi]∈X

h−1[xg◦i]

h[xi] = 1 (mod [H,H ]).

Therefore the transfer is independent of the coset representatives.

Next,

st[xi] = [stxi] = [xst◦i].

Hence,

stxi = xst◦ihst,[xi].

Now,

s(txi) = s(xt◦i)ht,[xi] = xst◦ihs,[xt◦i]ht,[xi].

Therefore,

Ver(st) =∏

[xi]∈X

hst,[xi] (mod [H,H ]) =∏

[xi]∈X

hs,[xt◦i]ht,[xi] (mod [H,H ])

=∏

[xi]∈X

hs,[xt◦i]

∏

[xi]∈X

ht,[xi] (mod [H,H ]) = Ver(s)Ver(t).

Hence Ver is a homomorphism.

Our next task to compute Ver(s) for a single element s ∈ G. We consider the

cyclic subgroup C generated by s and let C acts on X , namely,

sj[x] = [sjx].

5.6 The Transfer 61

Note that X is a disjoint union of orbits Oα under the action of C. If |Oα| = fαand [xα] ∈ Oα then the elements in the orbit are

{[sjxα]|j = 0, 1, · · · , fα − 1}.

We can now choose our coset representatives R to contain

{sjxα|j = 0, 1, · · · , fα − 1}

for each orbit Oα. Note that if 0 ≤ j ≤ fα − 2,

s[sjxα] = [sj+1xα]

and since sj+1xα ∈ R, we have

s(sjxα) = sj+1xαhs,[sjxα]

with hs,[sjxα] = 1. When j = fα − 1, we find that

s(sfα−1xα) = sfαxα = xαhs,[sfα−1xα]. (5.2)

Therefore,

Ver(s) =∏

[x]∈X

hs,[x] =∏

α

fα−1∏

j=0

hs,[sjxα] =∏

α

hs,[sfα−1xα] =∏

α

x−1α sfαxα,

where we have used (5.2).

Hence we have

theorem 5.10 Let s ∈ G and let Oα be the orbits of X under the action of

the cyclic subgroup generated by s. Suppose [xα] is an element in Oα, then

Ver(s) =∏

α

hα =∏

α

(xα)−1sfαxα (mod [H,H ]).

We have already seen that the map Ver is a homomorphism fromG toH/[H,H ].

If G is abelian then [H,H ] is trivial and we have a homomorphism from G to H

given by

Ver(s) =∏

α

sfα = s∑

αfα = sm,

since∑

α

fα = m

where m = |X |.


5.7 Gauss Lemma via the transfer

Let G = (Z/pZ)∗and H = {[±1]p}. Let the coset representatives of H in G be

S := {1, 2, · · · , (p− 1)/2}. Now, for a ∈ G,

Ver([a]p) = [a](p−1)/2p .

We now compute Ver from its definition. Note that for [s]p ∈ S and [a]p ∈ G,

[as]p = [saes(a)]p

where

es(a) =

{1 if [as]p = [sa]p for some sa ∈ S,

−1 otherwise.

Hence,

Ver([a]p) = [−1]mp

where m is the number of elements s such that [as]p 6∈ S. Comparing with the

previous computation of Ver(s), we conclude that[(

a

p

)]

p

≡ [a](p−1)/2p = Ver([a]p) = [−1]mp ,

where m is the number of elements s such that [as]p 6= [s′]p for all s′ ∈ S.

In other words, m is the number of elements in S such that [as]p = [−sa]p =

[p − sa]p, or the number of elements in S such that the least positive residues

of as (mod p) exceeds p/2. This proves Gauss Lemma and shows that Gauss

Lemma corresponds to computing Ver([a]p) in two different ways.

5.8 The Legendre Symbol and primes of the form x2 + y2

In this section, we give an explicit form of x and y that satisfy

x2 + y2 = p

where p is a prime, p ≡ 1 (mod 4).1 This would certainly imply that

theorem 5.11 Let p be an odd prime such that p ≡ 1 (mod 4). Then p is a

sum of two squares.

We first need two lemmas.

lemma 5.12 Let p be an odd prime. We have

∑

m(mod p)

(m

p

)= 0.

1 Adapted from “Number Theory” by G.E. Andrews


Proof

Note that using primitive roots modulo p, we see that if g is a primitive root

modulo p, then the even powers of g are quadratic residues and the odd powers

of g are quadratic non-residues. Therefore there are (p− 1)/2 quadratic residues

and (p− 1)/2 quadratic non-residues. Therefore in the sum

∑

m(mod p)

(m

p

),

there are (p− 1)/2 terms which take the value 1 and (p− 1)/2 terms which take

the value −1. In other words,

∑

m(mod p)

(m

p

)= 0.

lemma 5.13 Let p be an odd prime. We have

∑

m(mod p)

((m− a)(m− b)

p

)=

{p− 1 if a ≡ b (mod p)

−1 if a 6≡ b (mod p)..

Proof

Note first by replacing m by m+ a, we find that

∑

m(mod p)

((m− a)(m− b)

p

)=

∑

m(mod p)

((m)(m− (b − a))

p

)

=∑

m(mod p)m 6≡0(mod p)

((m)(m− (b − a))

p

)

since(

mp

)= 0 when p|m. Let m′ be such that m′m ≡ 1 (mod p). Then

∑


(m(m− (b− a))

p

)=

∑


(m′2

p

)(m(m− (b− a))

p

)

=∑


(m′m

p

)(m′m−m′(b− a)

p

)

=∑

m(mod p)m′ 6≡0(mod p)

(1−m′(b− a)

p

)

=∑

m′(mod p)

(1−m′(b− a)

p

)− 1

= −1


by Lemma 5.12. Note that number -1 in the second last line is added so that we

could sum over complete residues m′ (mod p).

Let

S(m) =

p∑

n=1

(n(n2 −m)

p

).

Note that S(0) = 0 by Lemma 5.12 and S(m+Np) = S(m) for any integer N .

Let k 6≡ 0 (mod p). Then

S(m) =∑

n(mod p)

(k4

p

)(n(n2 −m)

p

)

=∑

n(mod p)

(k2n

p

)(k2n2 − k2m

p

)

=

(k

p

) ∑

n(mod p)

(kn

p

)((kn)2 − k2m

p

)

=

(k

p

)S(k2m).

Hence, we have

S2(k2m) = S2(m). (5.3)

If u is a quadratic residue modulo p, then u ≡ k2 (mod p) for some integer k

and

S2(u) = S2(k2 · 1) = S2(1) (5.4)

by setting m = 1 in (5.3).

If v is a non-residue, then v = `2j` where ` is a fixed primitive root modulo

p. This is because all quadratic non-residues are odd powers of primitive roots

modulo p. Then by (5.3),

S2(v) = S2(`(`j)2) = S2(`). (5.5)

We note that we can replace ` by any quadratic non-residue modulo p.

There are (p − 1)/2 quadratic residues and (p − 1)/2 quadratic non-residues

and hence by (5.4) and (5.5), we deduce that

∑

m(mod p)

S2(m) = S2(0) +p− 1

2

(S2(1) + S2(`)

)=p− 1

2

(S2(1) + S2(`)

),

since S2(0) = 0.


Now,

∑

m(mod p)

S2(m) =∑

m(mod p)

∑

s(mod p)t(mod p)

(s(s2 −m)

p

)(t(t2 −m)

p

)

=∑

m,s,t

(st

p

)((m− s2)(m− t2)

p

)

=∑

s,t

(st

p

)∑

m

((m− s2)(m− t2)

p

).

By Lemma 5.13, we deduce that

p− 1

2

(S2(1) + S2(`)

)=

∑

s,ts2≡t2(mod p)

(st

p

)(p− 1) +

∑

s,ts2 6≡t2(mod p)

(st

p

)(−1)

=∑

s,ts2≡t2(mod p)

(st

p

)(p− 1)−

∑

s,t

(st

p

)−

∑

s,ts2≡t2(mod p)

(st

p

)

= p∑

s,ts2≡t2(mod p)

(st

p

),

where we have used Lemma 5.12 in the last equality. Next,

∑

s,ts2≡t2(mod p)

(st

p

)=

∑

s,ts≡t(mod p)

(st

p

)+

∑

s,ts≡−t(mod p)

(st

p

)

=∑

t(mod p)

(t2

p

)+

∑

t(mod p)

(−t2p

)= 2(p− 1)

Hence we conclude that

(S(1)

2

)2

+

(S(`)

2

)2

= p.

Now, if 2|S(m) then we would have found integers x and y such that

x2 + y2 = p.


To show that 2|S(m) for all integers m, we observe that

S(m) =

(p−1)/2∑

n=1

(n(n2 −m)

p

)+

p−1∑

n=(p+1)/2

(n(n2 −m)

p

)

=

(p−1)/2∑

n=1

(n(n2 −m)

p

)+

(p−1)/2∑

n=1

((p− n)((p− n)2 −m)

p

)

= 2

(p−1)/2∑

n=1

(n(n2 −m)

p

),

where the last equality follows for the fact that p ≡ 1 (mod 4).

Remark 5.14 Recently, H.H. Chan, L. Long and Y.F. Yang showed the follow-

ing observation:

Let p ≡ 1 (mod 6). Suppose a is any integer such that x3 ≡ a (mod p) is not

solvable. Then

3p = x2 + xy + y2,

with

x =

p∑

α=1

(α3 + 1

p

)and y =

(a

p

) p∑

α=1

(α3 + a

p

).

5.9 Appendix : Primes of the form x2 + y2, a second approach

In this section, we give another proof of Theorem 5.11. This proof is due to D.

Zagier and is motivated by the work of R. Heath-Brown, who is in turn motivated

by Liouville.

Consider the set

S = {(x, y, z) ∈ Z+|x2 + 4yz = p}.

Note that S is non-empty. This is because if p = 4k+1, then (1, 1, k) ∈ S. Define

the map on S by

α : (x, y, z) →

(x+ 2z, z, y− x− z) if x < y − z,

(2y − x, y, x− y + z) if y − z < x < 2y,

(x− 2y, x− y + z, y) if x > y.

.

We can check that this is a map on S by checking that each expression in the

image satisfies x2 + 4yz = p. For example,

(x + 2z)2 + 4z(y − x− z) = p

if x2 + 4yz = p.

5.9 Appendix : Primes of the form x2 + y2, a second approach 67

We next check that α is an involution. This again can be checked by considering

the three cases as listed in the definition of α. For example, if x < y − z, then

α(x, y, z) = (x+ 2z, z, y− x− z) = (x′, y′, z′)

and x+ 2z > 2z or x′ > 2y′. Hence

α(x′, y′, z′) = (x′ − 2y′, x′ − y′ + z′, y′) = (x, y, z).

The rest of the cases can be verified in a similar way.

Our next step is to show that the only fixed point of α is (1, 1, k) where

p = 1 + 4k. If we use the definition of α, we find that the only possible case

where a fixed point exists for α is when y − z < x < 2y since x, y, z are all

positive integers. This implies that the fixed point must be of the form (u, u, w).

But this would mean that u2 +4uw = p and that u|p, or u = 1 or p. Now, u 6= p

for otherwise, the left hand side would be greater than the right hand side. Hence

u = 1 and w = k and the only fixed point of α is (1, 1, k). For any involution β,

we find that the number of elements in S is the sum of the number of elements

in

{s, β(s)|β(s) 6= s}

and

{t|β(t) = t}.

Since there is only one fixed point for α, we conclude that the number of elements

in S is odd.

We next consider a simple involution

σ(x, y, z) = (x, z, y).

Since the number of elements in S is odd, σ must have a fixed point, say

(x0, y0, z0). But this is a fixed point of σ means that y0 = z0 and thus, we

conclude that

x20 + 4y20 = p

and this completes the proof of Theorem 5.11.

6 Binary Quadratic forms

6.1 Fermat-Euler Theorem

A binary quadratic form is an expression of the form

f(x, y) = ax2 + bxy + cy2

where a, b, c ∈ Z.

Representation of an integer by a binary quadratic form has attracted attention

of many mathematicians in the past. For example, Fermat observed that

theorem 6.1 An odd prime p can be represented by a sum of two squares if

and only if p ≡ 1 (mod 4).

We have seen two proofs of Theorem 6.1 in the last Chapter. We now give

Euler’s proof of this result using Fermat’s method of descent.

Proof

If p ≡ 3 (mod 4) then it cannot be a sum of two squares since a sum of two

squares is either congruent to 1,2, or 0 modulo 4. Suppose p ≡ 1 (mod 4) then(−1

p

)= 1

implies that

x2 ≡ −1 (mod p)

is solvable. Let u be an integer with 1

|u| ≤ p− 1

2(6.1)

such that

u2 ≡ −1 (mod p).

This implies that

u2 + 1 = kp (6.2)

1 We use the remainders from −(p − 1)/2 to (p − 1)/2 instead of 0 to p− 1.

6.1 Fermat-Euler Theorem 69

for some positive integer k. Note that by (6.1),

u2 + 1 < p2,

and hence 0 < k < p. Now since p - (u · 1), we conclude that if

S := {m ∈ Z+|x2 + y2 = mp, p - xy,m < p.},

then by (6.2) and our discussion so far, we conclude that k ∈ S and hence S is

a non-empty subset of positive integers.

By the least integer axiom, S contains a minimal positive m0 satisfying the

conditions. If m0 = 1 then we are done.

Suppose m0 > 1. We first note that m0 cannot divide both x and y. If this

were the case, then m0 would divide p. Since m0 6= 1 and m0 < p, this is not

possible. Therefore, if we write

x = x1 +m0r and y = y1 +m0s

with |x1| < m0/2 and |y1| < m0/2, x1 and y1 cannot be both zero. This implies

that

x21 + y21 > 0.

Now,

0 < x21 + y21 ≤ m20/2 < m2

0 (6.3)

by our choice of the range of |x1| and |y1|. Hence,

x21 + y21 = m0(p− 2rx− 2sy +m0r2 +m0s

2) = m0m1

with m1 < m0 by (6.3).

Next, since

(x− iy)(x1 + iy1) = (xx1 + yy1 + i(xy1 − x1y),

we find that

m20m1p = (x2 + y2)(x21 + y21) = (xx1 + yy1)

2 + (xy1 − x1y)2.

Now, observe that

X = xx1 + yy1 = x(x −m0r) + y(y −m0s) = m0w,

for some integer w. Similarly,

Y = xy1 − x1y = m0v.

Therefore,

m20m1p = X2 + Y 2 = m2

0(w2 + v2).

This implies that

w2 + v2 = m1p. (6.4)

Now, if p|(wv), then p|w or p|v. Without loss of generality, we may assume p|u.


Now, (6.4) implies that p|v2, or p|v. Again from (6.4), we find that p|m1, which

is impossible because 0 < m1 < p. Hence, p - (wv). The above discussion implies

that m1 ∈ S. But this contradicts the minimality of m0 in S. This implies that

m0 = 1 in the first place and we are done.

From Theorem 6.1, we see that primes of the form x2 + y2 can be determined

by the Legendre condition

(−1

p

)= 1. It turns out primes of the form x2 + 2y2

and x2 + 3y2 are determined completely by the conditions

(−2

p

)= 1 and

(−3

p

)= 1, respectively. A natural question to ask is therefore, can all primes

of the form x2 + ny2 determined by a single condition

(−np

)= 1? The answer

is no.

For example, one can show that

p = x2 + 5y2

if and only if(−1

p

)= 1 and

(−5

p

)= 1.

So two conditions are needed. In fact if we only have one condition(−5

p

)= 1,

then we can only conclude either

p = x2 + 5y2

or

p = 2x2 + 2xy + 3y2.

The binary quadratic forms x2+5y2 and 2x2+2xy+3y2 both have discriminant

b2− 4ac = −20. Furthermore these binary quadratic forms are “genuinely differ-

ent”. In the following sections, we will explain the meaning of binary quadratic

forms being “genuinely different.”

6.2 Representations of integers by binary quadratic forms

definition 6.2 Given a binary quadratic form ax2 + bxy + cy2 the discrim-

inant (denoted usually by d or ∆) of the quadratic form is defined to be the

number b2 − 4ac.


theorem 6.3 Let d be an integer. There exists at least one binary quadratic

form with integral coefficients and discriminant d, if and only if d ≡ 0 or 1

(mod 4).

Proof

Let f(x, y) = ax2+bxy+cy2 be a binary quadratic form of discriminant d. Since

b2 ≡ 0 or 1 (mod 4), d = b2 − 4ac ≡ 0 or 1 (mod 4). Suppose d ≡ 0 (mod 4).

Then the form

x2 − d

4y2

has discriminant d. If d ≡ 1 (mod 4), then

x2 + xy −(d− 1

4

)y2

has discriminant d.

definition 6.4 A form f(x, y) is called indefinite if it takes on both positive

and negative values. The form is called called positive semidefinite (or negative

semidefinite) if f(x, y) ≥ 0 ( or f(x, y) ≤ 0) for all integers x, y. A semidefinite

form is called definite if in addition the only integers x, y for which f(x, y) = 0

are x = 0, y = 0.

The form f(x, y) = x2 − 2y2 is indefinite. The form f(x, y) = (x − y)2 is

semidefinite and the form f(x, y) = x2 + y2 is positive definite form.

example 6.1

Let d be the discriminant of the form

f(x, y) = ax2 + bxy + cy2.

If d < 0 and a > 0, show that f(x, y) is positive definite.

Solutions. We see that

f(x, y) = a

(x+

by

2a

)2

+4ac− b2

4a2y2 = a

(x+

by

2a

)2

− d

4ay2.

This immediately shows that f(x, y) is positive definite.

definition 6.5 We say that a quadratic form f(x, y) represents an integer

n if there exists integers k, ` such that

f(k, `) = n.

We say that the representation is proper if (k, `) = 1; otherwise, it is improper.

In the first section, we have seen that if p ≡ 1 (mod 4) then p is properly

represented by x2 + y2. Our aim in this chapter is to continue our investigation

of integers properly represented by a particular quadratic forms.


theorem 6.6 Let n > 0 and d be given integers with n 6= 0. There exists a

binary quadratic form of discriminant d that represents n properly if and only if

the congruence

x2 ≡ d (mod 4n)

has a solution.

Proof

Suppose b is a solution of

x2 ≡ d (mod 4n).

Then b2 = d+ 4nc, for some integer c. The form

f(x, y) = nx2 + bxy + cy2

has discriminant d and properly represents n since f(1, 0) = n.

Next, suppose f(k, `) = n and (k, `) = 1. We claim that we can find a quadratic

form g(x, y) that takes the form

nx2 +Bxy + Cy2.

In order to prove this, we rewrite a general binary quadratic form in the form

(x y

)( a b/2

b/2 c

)(x

y

).

We note the discriminant may be defined as −4det(M), where

M =

(a b/2

b/2 c

).

So in general if F (x, y) = f(αx + γy, βx + δy), then the discriminant of F is

given by

−4 · det(a b/2

b/2 c

)· det

(α β

γ δ

)2

.

Suppose f(x, y) = ax2 + bxy + cy2. Since (k, `) = 1, there exist integers u, v

such that ku− `v = 1. We let

g(x, y) = f(kx+ vy, `x+ uy).

Then

g(x, y) = nx2 +Bxy + Cy2.

Since

det

(k `

v u

)= 1,

by the above discussion, we find that the discriminant of g(x, y) is given by

B2 − 4nC = d.


This immediately implies that

x2 ≡ d (mod 4n)

has a solution.

corollary 6.7 Suppose d ≡ 0 or 1 (mod 4). If p is an odd prime, then there

is a binary quadratic form of discriminant d that represents p if and only if

(d

p

)= 1.

Proof

By Theorem 6.6, we conclude that

x2 ≡ d (mod 4p)

is solvable if p can be represented by a binary quadratic form of discriminant d.

This implies that

x2 ≡ d (mod p)

is solvable, or(d

p

)= 1.

Conversely, if

(d

p

)= 1 then

x2 ≡ d (mod p)

is solvable. Note that d ≡ 0 or 1 (mod 4), then

x2 ≡ d ≡ 0 (mod 4)

or

x2 ≡ d ≡ 1 (mod 4)

respectively. Hence,

x2 ≡ d (mod 4)

is solvable. By Chinese Remainder Theorem, we conclude that

x2 ≡ d (mod 4p)

is solvable and by Theorem 6.6, p is represented by a binary quadratic form of

discriminant d.


6.3 Equivalence of binary quadratic forms

We have seen in the proof of Theorem 6.6 that matrices play a role in the study

of quadratic forms. In fact, we observe that

(α β

γ δ

)is 1, then the discriminant

of g(x, y) that is obtained from f(x, y) under the transformation

(x y

)→(x y

)(α β

γ δ

)

is equal to the discriminant of f(x, y). This observation also shows that an

integer n can be properly represented by f(x, y) if and only if it can be properly

represented by g(x, y). In other words, the forms f(x, y) and g(x, y) are “similar”.

This leads us to the next definition.

definition 6.8 We say that two quadratic forms f(x, y) = ax2 + bxy + cy2

and g(x, y) = Ax2 +Bxy+Cy2 are related and write f ∼ g if there are integers

α, β, γ, δ ∈ Z with αδ − βγ = 1, such that

g(x, y) = f(αx+ γy, βx+ δy).

theorem 6.9 The relation ∼ on the set of binary quadratic forms is an equiv-

alence relation.

Proof

Since f(x, y) = f(x, y), f ∼ f . Suppose f ∼ g, then we can write

g(x, y) = Ax2 +Bxy + Cy2 =(x y

)( A B/2

B/2 C

)(x

y

)

=(x y

)(α γ

β δ

)(a b/2

b/2 c

)(α β

γ δ

)(x

y

).

Comparing the representations above (check that the (1, 2)-entry and (2, 1)-entry

of the product of three matrices in the last expression are the same), we conclude

that (A B/2

B/2 C

)=

(α γ

β δ

)(a b/2

b/2 c

)(α β

γ δ

).

With the above representation, we see that

(a b/2

b/2 c

)=

(α γ

β δ

)−1(A B/2

B/2 C

)((α γ

β δ

)−1)T

and hence g ∼ f.

Lastly, to check transitivity, we suppose that

h(x, y) = A′x2 +B′xy + C′y2.


Suppose f ∼ g via

(α β

γ δ

)and g ∼ h via

(α′ β′

γ′ δ′

). Then

(A′ B′/2

B′/2 C′

)=

(α′ γ′

β′ δ′

)(α γ

β δ

)(a b/2

b/2 c

)(α β

γ δ

)(α′ β′

γ′ δ′

).

This computation shows that

f ∼ h.

Remark 6.10 If f ∼ g, then

g(x, y) = f(αx+ γy, βx+ δy).

Note that since αδ − βγ = 1, the discriminant of f and g are the same and

so the relation ∼ is an equivalence relation on the set of binary quadratic forms

with the same discriminant.

From now on, we will concentrate on the set of positive definite binary quadratic

forms (the discriminant is negative and that its absolute value is not a perfect

square). We will also introduce the notion of reduced forms and show that in

each equivalence class under ∼, there exists one and only one reduced form which

serves as a representative of that class. We will then show that given a fixed dis-

criminant, there are finitely many equivalence classes and this number is equal

to the number of reduced forms.

6.4 Reduced forms

definition 6.11 Let f(x, y) = ax2+bxy+cy2 be a positive definite binary

quadratic form whose discriminant d < 0 with ac 6= 0. We call f reduced if

−|a| < b ≤ |a| < |c|

or if

0 ≤ b ≤ |a| = |c|.

We will first show the following:

theorem 6.12 Let d be a given negative integer.Each equivalence class of

positive definite binary quadratic forms of discriminant d contains at least one

reduced form.

Proof

Suppose

f(x, y) = ax2 + bxy + cy2.


Since ac 6= 0, a 6= 0 and c 6= 0. Let

F := {A ∈ Z+|Ax2 +Bxy + Cy2 ∼ f(x, y)}.

Note that F is non-empty and because a 6= 0 and f ∼ f . By the least integer

axiom, there exists a smallest positive integer u in the set F .

Let

g(x, y) = ux2 + vxy + wy2

such that g(x, y) ∼ f(x, y).

Suppose the second coefficient of g, namely, v does not lie in (−|u|, |u|]. Thenby Division Algorithm, there exists a unique integer m such that

−|u| < 2um+ v ≤ |u|.

The form

g(x+my, y) = ux2 + (2um+ v)xy + w′y2

will then have the middle coefficient lying in (−|u|, |u|]. Note that if w′ < u,

then by applying

S =

(0 −1

1 0

),

we arrive at an equivalent form

w′x2 − (2um+ v)xy + uy2,

which contradicts the minimality of u. Hence, w′ ≥ u.

If w′ = u and 2um+ v = 0 then u = 1 since ux2 + uy2 has to be primitive. If

2um+ v > 0, then

ux2 + (2um+ v)xy + uy2

is reduced. If 2um+ v < 0, then since

ux2 + (2um+ v)xy + uy2 ∼ ux2 − (2um+ v)xy + uy2,

and ux2 − (2um + v)xy + uy2 is reduced, we observe that f is equivalent to a

reduced form.

lemma 6.13 Let f(x, y) = ax2+ bxy+ cy2 be a reduced positive definite form.

If for some pair of integers x and y we have (x, y) = 1 and f(x, y) ≤ c, then

f(x, y) = a or c and the point (x, y) is one of the six points

±(1, 0),±(0, 1),±(1,−1).

Moreover, the number of proper representations of a by f is

2 if a < c,

4 if 0 ≤ b < a = c, and

6 if a = b = c.


Proof

Suppose that |y| ≥ 2. Then

4af(x, y) = (2ax+ by)2 − dy2 ≥ −dy2

≥ −4d = 16ac− 4b2 > 8ac− 4b2

≥ 4a2 − 4b2 + 4ac ≥ 4ac.

Thus, f(x, y) > c if |y| ≥ 2. This implies that if f(x, y) ≤ c then |y| ≤ 1.

Now, suppose that |y| = 1 and |x| ≥ 2. Then

|2ax+ by| ≥ |2ax| − |by| ≥ 4a− |b| ≥ 3a.

Therefore,

4af(x, y) = (2ax+ by)2 − dy2 ≥ 9a2 − dy2

= 9a2 − d = 9a2 + 4ac− b2 > a2 − b2 + 4ac ≥ 4ac.

Thus, f(x,±1) > c.

Next, if y = 0 then since (x, y) = 1, we conclude that x = ±1.

Collecting what we obtain so far, we conclude that if f(x, y) ≤ c, then |y| ≤ 1

and |x| ≤ 1. There are altogether eight points ((0, 0) is excluded since the greatest

common divisor of 0 and 0 is not 1) (x, y) satisfying the above inequalities. These

are

±(1, 0)± (0, 1),±(1,−1) and ± (1, 1).

Next, since b > −a, we find that

f(±1,±1) = a+ b+ c > c.

Hence, (1, 1) and (−1,−1) are excluded as well.

We have thus arrived at the six possible solutions. The last assertion follows

on observing that

f(1, 0) = a, f(0, 1) = c and f(1,−1) = a− b + c.

We observe that from the above computations, a and c are the smallest positive

integer and second smallest positive integer represented by f respectively.

theorem 6.14 Let f(x, y) = ax2 + bxy+ cy2 and g(x, y) = Ax2 +Bxy+Cy2

be reduced positive definite quadratic forms. If f ∼ g then f = g.

Proof

From the previous lemma, we find that a is the smallest integer that is properly

represented by f(x, y). Since f is equivalent to g, we may write

f(x, y) = g(αx+ γy, βx+ δy).


Now, two solutions to f(x, y) = a are (±1, 0). Thus,

g(±α,±β) = a.

Since

αδ − βγ = 1,

we conclude that (α, β) = 1 and the representation of a by g(x, y) is proper.

Therefore a ≥ A since A is the smallest positive integer represented by g. In-

terchanging the roles of f and g and using similar arguments, we conclude that

A ≥ a. Hence, a = A.

Suppose a < c. Then f(x, y) = a has only two solutions, namely, (±1, 0).

If C = A = a, then by previous lemma, g(x, y) = a would have more than 2

solutions with (x, y) = 1. Hence, we conclude that C > a.

Observe that c is the second smallest integer properly represented by f(x, y)

and the solutions f(x, y) = c are (0,±1). This implies that g(x, y) properly repre-

sents c. Hence c ≥ C since C is the second smallest integer properly represented

by g(x, y). Interchanging the roles of f and g, we conclude that C ≥ c and hence,

C = c.

To show that b = B suppose g(x, y) = f(α′x+ γ′y, β′x+ δ′y). Note that

g(x, y) = ax2 +Bxy + cy2

= (aα′2 + bα′β′ + cβ′2)x2 +B′xy + (aγ′2 + bγ′δ′ + cδ′2)y2.

Comparing the coefficients of x2, we find that

f(α′, β′) = a.

The only two solutions to the above are (±1, 0). Therefore, α′ = ±1 and β′ = 0.

Now, by considering solutions to f(x, y) = c we conclude that (γ′, δ′) = (0,±1).

We then conclude that

g(x, y) = f(−x,−y) or f(x, y)

and b = B.

Suppose a = c. Then f(x, y) = a has at least 4 proper representations by f .

Therefore the same is true for g. Hence, C = a = c. Now, B2 − 4AC = b2 − 4ac

and the fact that b ≥ 0, B ≥ 0 yields b = B and our proof is complete.

theorem 6.15 Let f be a reduced positive definite binary quadratic form

whose discriminant d is not a perfect square. Then 0 < a ≤√−d/3. The number

of reduced forms of a given nonsquare discriminant d is finite.

Proof

We see that

−d = 4ac− b2 ≥ 4a2 − a2 = 3a2.


Hence,

a ≤√−d/3.

Now −a < b ≤ a and b is determined by the equation

b2 − 4ac = d.

This implies that there are finitely many reduced forms.

There is a formula to compute the number of reduced binary quadratic form of

discriminant d < 0. For example if d ≡ 1 (mod 4) and |d| > 3, then the number

is−1

|d|∑

(n,d)=11<n<|d|

n

(n

|d|

).

For example, if |d| = 11, we have

1

11{1 + 4 + 9 + 5 + 3− 2− 6− 7− 8− 10} = 1.

7 Form Class groups

In the last Chapter, we introduce the concept of reduced binary (positive defi-

nite) quadratic forms. In this chapter, we will show that given discriminant d, a

group can be constructed from the set of reduced binary quadratic forms with

discriminant d.

7.1 The set C(d)

Let d < 0 be such that d ≡ 1 (mod 4) (respectively 0 (mod 4)) and |d| (respec-tively |d|/4) is squarefree. This will guarantee that the positive definite binary

quadratic forms

f(x, y) = ax2 + bxy + cy2

with discriminant d are primitive (i.e., the (a, b, c) = 1). In this section, we will

use

[a, b, c]

to represent the quadratic form

f(x, y) = ax2 + bxy + cy2.

For

A =

(α γ

β δ

)∈ SL2(Z) and f(x, y) = ax2 + bxy + cy2,

we define

A(f(x, y)) =(x y

)A

(a b/2

b/2 c

)At

(x

y

).

Note that (1 0

m 1

)([a, b, c]) = [a, 2am+ b, am2 + bm+ c]

and (0 −1

1 0

)([a, b, c]) = [c,−b, a].

Let C(d) be the set of equivalence classes of positive definite binary quadratic

7.1 The set C(d) 81

forms. We have seen that each class contains a unique reduced form and hence,

|C(d)| is finite. We wish to obtain a binary operation • on C(d) and obtain a

group structure on C(d). Let C1, C2 ∈ C(d). Suppose C1 and C2 contains the

forms [a,B,Ca′] and [a′, B, Ca] respectively. We define

[a,B,Ca′] ◦ [a′, B, Ca] := [aa′, B, C]

and that 1

C1 • C2 := [[aa′, B, C]].

Our operation ◦ is motivated by the following identity of Gauss 2:

(a1x21 + bx1y1 + a2cy

21)(a2x

22 + bx2y2 + a1cy

22) = a1a2x

2 + bxy + cy2

where

x = x1x2 − cy1y2, y = a1x1y2 + a2x2y1 + by1y2.

This identity of Gauss-Dirichlet can be proved by observing that

f(x, y) = ax2 + bxy + cy2 =1

4aN(2ax+ by +

√dy)

with d = b2 − 4ac and N(u+ v√d) = u2 − dv2.

The main difficulty in showing that (C(d), •) is a group is to show that • is

well-defined. But first, we will show that given C1 and C2, we can always find

pairs of forms in C1 and C2 of the form [a,B,Ca′] and [a′, B, Ca] for some integers

B and C. To do this, we need the following theorem:

theorem 7.1 Suppose f is a primitive positive definite binary quadratic form

and k is a nonzero integer, then there exists an integer n properly represented

by f with the property that (n, k) = 1.

Proof

Let k =∏m

j=1 pαj

j .Note that for each prime pj |k, one of the numbers f(1, 0), f(1, 1)

and f(0, 1) is relatively prime to pj . For if this were not the case, then there will

be a number d that divides f(1, 0) = a, f(1, 1) = a+ b+ c and f(0, 1) = c. This

number d would divide b as well, which implies that f is not primitive.

Let (xj , yj) denote the pair of integers for which f(xj , yj) is relatively prime to

pj or any power of pj . Now, by Chinese Remainder Theorem, there are integers

u, v satisfying the congruences

x ≡ xj (mod pαj

j ) and y ≡ yj (mod pαj

j ), 1 ≤ j ≤ m.

Note that f(u, v) is the number relatively prime to k.

1 The choice of representatives [a,B, Ca′] and [a′, B, Ca] is so that C can be computed asC = (B2

− d)/(4aa′).2 This approach is due to Dirichlet.

82 Form Class groups

Let C1 = [f ] and C2 = [g]. Suppose f = [a, b, c]. By Theorem 7.1, there exists a′

represented by g such that (2a, a′) = 1. We replace g by [a′, b′, c′]. Using Chinese

Remainder Theorem, there exists a unique B modulo 2aa′ such that

B ≡ b (mod 2a) and B ≡ b′ (mod a′).

Note that

B = b+ 2aK and B = b′ + a′K ′ (7.1)

implies that

B2 ≡ d (mod 4a) and B2 ≡ d (mod a′).

Hence, 4aa′ divides B2−d. Let C = (B2−d)/(4aa′) and we find that [aa′, B, C]

is a quadratic form of discriminant d. Note that

[a, b, c] ∼ [a, b+ 2aK, c1] = [a,B,Ca′].

Now, b2 − 4ac = (b′)2 − 4a′c′ implies that b and b′ have the same parity. Next,

(7.1) implies that

B = b′ + a′K ′ ≡ b+ 2aK (mod 2).

This shows that K ′ is even and we have

[a′, b′, c′] ∼ [a′, B, Ca].

We have thus shown that C1 and C2 contains forms of the form [a,B,Ca′] and

[a′, B, Ca] and • is defined as

C1 • C2 = [[aa′, B, C]].

The above observation that b and b′ having the same parity allows us to replace

(7.1) by

B = b+ 2aK and B = b′ + 2a′L (7.2)

whenever a′ is odd.

7.2 C(d) is a finite abelian group

We will show later that • is well defined. But first, assuming that • is well defined,we want to show that (C(d), •) is a finite abelian group.

To simplify our treatment, we assume d ≡ 0 (mod 4). The case for d ≡ 1

(mod 4) is similar. Let C0 = [[1, 0,−d/4]]. We will show that C0 is an identity for

(C(d), •). Take C1 = [[a, b, c]]. Now,

[1, 0,−d/4] ∼ [1, b, c′]

since b is even. Therefore,

C0 • C1 = [[a, b, c]] = C1


and C0 is the identity.

Next, let C1 = [[a, b, c]]. We claim that C2 = [[a,−b, c]] is the inverse of C1.

[a,−b, c] ∼ [c, b, a]

and

[a, b, c] ◦ [c, b, a] = [ac, b, 1].

Now,

[ac, b, 1] ∼ [1,−b, ac] ∼ [1, 0,−d/4].

Hence,

C1 • C2 = C0.

To show associativity, we let C1, C2, C3 be three elements in C(d). Using The-

orem 7.1, we can find a, a′, a′′ such that a, a′, a′′, 2 are pairwise relatively prime.

We construct B such that

B ≡ b (mod 2a), B ≡ b′ (mod a′) and B ≡ b′′ (mod a′′).

Note that

[[a,B, c1]] • [[a′, B, c2]] = [[aa′, B, C]]

and

[[aa′, B, C]] • [[a′′, B, c3]] = [[aa′a′′, B, C1]].

Similarly,

C1 • (C2 • C3) = [[aa′a′′, B, C2]].

Note that C1 = C2 and we are done.

7.3 The operation • is well defined

The most difficult part in showing (C(d), •) is a group is to show that • is well

defined. In other words, if we have

[u, V,Wu′] ∼ [a,B,Ca′]

and

[u′, V,Wu] ∼ [a′, B, Ca],

we want to show that

[uu′, V,W ] ∼ [aa′, B, C].

We will complete this task in a few steps. Let

f1 = [a,B,Ca′], f2 = [u, V,Wu′]


and

g1 = [a′, B, Ca], g2 = [u′, V,Wu]

be such that

f1 ∼ f2 and g1 ∼ g2.

Step 1

Suppose f1 = f2, (a, u′) = 1 and V = B. Since

f1 = f2,

we deduce that

Wu′ = Ca′.

Our aim is to show that if

g1 ∼ g2,

then

f1 ◦ g1 ∼ f1 ◦ g2.

This is equivalent to showing that

[aa′, B, C] ∼ [au′, B,W ].

Now, since g1 ∼ g2, there is a matrix(α β

γ δ

)∈ SL2(Z)

such that(α β

γ δ

)(a′ B/2

B/2 Ca

)=

(u′ B/2

B/2 Wa

)(δ −γ−β α

).

The identity

αB/2 + βCa = −u′γ + αB/2

shows that

βCa = −u′γ.

By our choice of u, u′, we know that (u, u′) = 1 and hence,

γ/a ∈ Z.

Now, observe that(α βa

γ/a δ

)(aa′ B/2

B/2 C

)=

(au′ B/2

B/2 W

)(δ −γ/a

−βa α

).

Hence,

[aa′, B, C] ∼ [au′, B,W ].


Step 2

Suppose (a, u′) = 1 and B = V . We have seen from Step 1 that f1◦g1 ∼ f1◦g2.Applying Step 1 again, g2 ◦ f1 ∼ g2 ◦ f2. Hence,

f1 ◦ g1 ∼ f1 ◦ g2 = g2 ◦ f1 ∼ g2 ◦ f2 = f2 ◦ g2.

Step 3

We now drop the assumption that B = V . Assume that (aa′, uu′) = 1. If aa′

and uu′ are both odd, then there is a solution to

x ≡ B (mod 2aa′) and x ≡ V (mod uu′).

If either aa′ or uu′ were even, then we may assume without loss of generality

that aa′ is even. By Chinese Remainder Theorem, there is a solution to

x ≡ B (mod 2aa′) and x ≡ V (mod uu′).

Let this solution be B∗ and we observe that

B∗ = B + 2aa′n1 = V + uu′m.

By (7.2), we conclude that m is even and write m = 2n2. In other words, we

have

B∗ = B + 2aa′n1 = V + 2uu′n2.

Let

F1 =

(1 0

a′n1 1

)([a,B,Ca′]) = [a,B∗, C1]

and

G1 =

(1 0

an1 1

)([a′, B, Ca]) = [a′, B∗, C′

1].

Note that

f1 ∼ F1 and g1 ∼ G1

and

f1 ◦ g1 = [aa′, B, C]

and

F1 ◦G1 = [aa′, B∗, C∗1 ].

Now,(

1 0

n1 1

)(f1 ◦ g1) =

(1 0

n1 1

)([aa′, B, C])

= [aa′, B∗, C∗1 ]

implies that

f1 ◦ g1 ∼ F1 ◦G1.


Using similar argument with f1, g1, n1 replaced by f2, g2, n2, we deduce that

f2 ◦ g2 = F2 ◦G2

where

F2 =

(1 0

u′n2 1

)([u, V,Wu′]) = [u,B∗, C2]

and

G2 =

(1 0

un2 1

)([u′, V,Wu]) = [u′, B∗, C′

2].

Since the coefficient of the xy-term in F1, F2, G1, G2 is B∗, by Step 2, we

deduce that

F1 ◦G1 ∼ F2 ◦G2.

But

f1 ◦ g1 ∼ F1 ◦G1

and

f2 ◦ g2 ∼ F2 ◦G2.

Combining the three relations, we conclude that

f1 ◦ g1 ∼ f2 ◦ g2.

Step 4

Let f1 = [a,B,Ca′], f2 = [u, V,Wu′] and g1 = [a′, B, Ca], g2 = [u′, V,Wu].

Choose s such that (s, aa′uu′) = 1 with s representable by f1. Moreover, we can

find a form F such that F ∼ f1 and F ∼ f2 with the property that

f = [s,B′, C′].

Choose s′ such that (s′, aa′uu′s) = 1 with s′ representable by g1. Note that there

is a form G such that G ∼ g1 and G ∼ g2 with

g = [s′, B′′, C′′].

We remark here that the conditions on the gcd’s are to guarantee that (s, s′) = 1,

(ss′, aa′) = (ss′, uu′) = 1. Since (s, s′) = 1, we can find F,G with F ∼ f and

G ∼ g and

F = [s,B∗, C∗1 ] and G = [s′, B∗, C∗

2 ],

with

B∗ ≡ B′ (mod 2s) and B∗ ≡ B′′ (mod s′).

Since (ss′, aa′) = 1, by Step 3, we conclude that

F ◦G ∼ f1 ◦ g1.


Similarly, since (ss′, uu′) = 1, by Step 3, we find that

F ◦G ∼ f2 ◦ g2.Hence, we conclude that

f1 ◦ g1 ∼ F ◦G ∼ f2 ◦ g2.This shows that • is a well defined operation and this completes the proof that

(C(d), •) is a finite abelian group.

Remark 7.2 One can show that (C(d), •) is an abelian group by identifying

it with ideal class group associated with imaginary quadratic fields. Let d < 0

be a discriminant of a quadratic field and let d be the product of (−1) and the

distinct primes dividing d. For example, if d = −4, then we choose d = −1.

Let K = Q(√d). An element in K is an algebraic integer α if its minimal

polynomial is of the form

x2 + ax+ b, a, b ∈ Z.

It is a fact that

Theorem 7.3 The set of algebraic integers, denoted by OK , in Q(√d) is

OK =

Z[√d] if d 6≡ 1 (mod 4)

Z

[1 +

√d

2

]if d ≡ 1 (mod 4).

An ideal I of a ring R is an additive subgroup of R satisfying the condition

that for all r ∈ R and i ∈ I,

ir = ri ∈ I.

It turns out that any ideal I in OK is of the form

[a, b+ c√d] := aZ+ (b + c

√d)Z.

One can define an equivalence relation on the set of non-zero ideals: We say

that I is equivalent to J if I = αJ for some non-zero α ∈ K∗. If PK denotes the

set of principal ideals in OK then the set of equivalence classes under ∼ is given

by

CK = IK/PK ,

where IK is the set of ideals in OK .

We now define a map and its inverse from C(d) to C(K). Given a class in C(d)

of the form [f(x, y)] with f(x, y) = ax2 + bxy + cy2, we let τ be the solution of

f(τ, 1) = 0 with Im τ > 0. We define ψ by

ψ([f ]) = [a[1, τ ]].


We define the inverse ϕ : C(K) → C(d) via

ϕ([[α, β]]) =N(αx − βy)

N ([α, β]),

where N (I) = |OK/I| and N(u) is the product of u and its conjugate. These

are inverse of each other and ψ is a bijection. To show that C(d) is a group under

the Dirichlet composition •, we observe that under ψ,

ψ([f ] • [g]) = [I] · [J ] = [IJ ],

where ψ([f ]) = [I] and ψ([g]) = [J ]. More precisely, given f = ax2 + bxy + cy2

and g = a′x2 + b′xy + c′y2, we have

F = aa′x2 +Bxy +B2 −D

4aa′y2

if F = f ◦ g. Corresponding to these three forms, we have respectively three

ideals

[a, (−b+√d)/2], [a′, (−b′ +

√d)/2] and [aa′, (−B +

√d)/2].

Let ∆ = (−B +√d)/2. Then we can rewrite the above three ideals as

[a,∆], [a′,∆] and [aa′,∆].

Note that we would have

ψ([f ] • [g]) = [I] · [J ].

It is known that (CK , ·) is an abelian group and so (C(d), •) is an abelian

group.

8 Continued fractions and Pell’sequations

8.1 Pell’s equations

We often encounter situations in which we need to find solutions of an equation

with integral or rational values of the variables. We refer to such equations Dio-

phantine equations. In Example 1.23, we found integral solutions to the linear

diophantine equation

43x+ 64y = 1.

In the previous chapter, we determine those primes represented by x2 + y2.

In this chapter, we will study another type of equation, known as Pell’s equa-

tion.

definition 8.1 Let d > 0. A diophantine equation x2 − dy2 = 1 is known as

Pell’s equation.

Solving Pell’s equation is related to finding generating element for the group

of units in real quadratic fields.

An algebraic integer a satisfying the property that a−1 is also an algebraic

integer is known as a unit. It is known that if d > 0 then there are infinitely

many units in Q(√d). For example, (

√2 + 1)m,m ∈ Z are units in Q(

√2).

It is also known that these are of the form wεm where w is a finite root of

unity and ε is a unit. In order find this special unit, one needs to solve the Pell

equation. For example, if a+ b√d were a unit, then

1

a+ b√d=a− b

√d

a2 − db2

is also an algebraic integer. In many cases, an algebraic integer is of the form

u+ v√d and so in order for a+ b

√d, a2 − db2 = 1. This explains the connection

between Pell’s equation and units in real quadratic fields.

Before we show that the Pell equation is always solvable, we need to study

continued fractions.

90 Continued fractions and Pell’s equations

8.2 Continued fractions

Given a rational fraction u0/u1 such that (u0, u1) = 1 and u1 > 0, we may write

using the Euclidean algorithm,

u0 = u1a0 + u2, 0 < u2 < u1,

u1 = u2a1 + u3, 0 < u3 < u2,

......

uj−1 = ujaj−1 + uj+1, 0 < uj+1 < uj,

uj = uj+1aj .

Let

ξi =uiui+1

.

By using the above equations, we find that

ξ0 = a0 +1

ξ1= · · · = a0 +

1

a1 +1

.. . +1

aj−1 +1

aj

.

This is a continued fraction expansion of ξ0 or u0/u1. We use the notation

< a0, a1, · · · , aj >

to represent the above continued fraction. Note that

< x0, x1, · · · , xj >= x0 +1

< x1, · · · , xj >=

⟨x0, x1, · · · , xj−2, xj−1 +

1

xj

⟩.

8.3 Infinite continued fraction

Let a0, a1, a2, · · · be an infinite sequence of integers, all positive except perhaps

a0. We define two sequences of integers {hn} and {kn} inductively as follows :

h−2 = 0, h−1 = 1, hi = aihi−1 + hi−2 for i ≥ 0

k−2 = 1, k−1 = 0, ki = aiki−1 + ki−2 for i ≥ 0.

theorem 8.2 Let x be a positive real number and a0 = x. For positive integer

n ≥ 1,

< a0, a1, · · · , an−1, x >=xhn−1 + hn−2

xkn−1 + kn−2.

8.3 Infinite continued fraction 91

Proof The case n = 1 is easily verified. We now prove the theorem by induction.

The induction step follows from the following computations:

< a0, a1, · · · , an, x > =

⟨a0, a1, · · · , an−1, an +

1

x

⟩

=(an + 1/x)hn−1 + hn−2

(an + 1/x)kn−1 + kn−2

=x(anhn−1 + hn−2) + hn−1

x(ankn−1 + kn−2) + kn−1

=xhn + hn−1

xkn + kn−1.

If we let x = an in Theorem 8.2, then we immediately obtain the following:

theorem 8.3 Let n ≥ 1 be an integer. If rn =< a0, a1, · · · , an >, then

rn =hnkn.

definition 8.4 The expression

ri =hiki

is called the n-th convergent of the continued fraction < a0, a1, · · · , an > .

theorem 8.5 For i ≥ −1,

hiki−1 − hi−1ki = (−1)i−1 (8.1)

For i ≥ 2,

ri − ri−1 =(−1)i−1

kiki−1(8.2)

hiki−2 − hi−2ki = (−1)iai (8.3)

and

ri − ri−2 =(−1)iaikiki−2

. (8.4)

Proof Note that h−1k−2 − h−2k−1 = 1. Continuing the proof by induction, we

suppose that

hi−1ki−2 − hi−2ki−1 = (−1)i−2.

Then

hiki−1 − hi−1ki = (aihi−1 + hi−2)ki−1 − hi−1(aiki−1 + ki−2)

= −(hi−1ki−2 − hi−2ki−1) = (−1)i−1,


and this completes the proof of (8.1).

Identity (8.2) follows by dividing (8.1) by kiki−1.

Identity (8.3) can be derived directly as follow:

hiki−2 − hi−2ki = (aihi−1 + hi−2)ki−2 − hi−2(aiki−1 + ki−2)

= ai(hi−1ki−2 − hi−2ki−1) = (−1)iai.

Dividing the above identity by kiki−2 we obtain (8.4).

From (8.4), we find that that

r2k−1 > r2k+1

and

r2k+2 > r2k.

From (8.2), we find that for a fixed positive k, the following inequalities hold:

r2 < r4 < · · · < r2k < r2k−1 < r2k−3 < · · · < r3.

This shows that the sequence {r2k}∞k=1 is bounded above by r3 and the se-

quence {r2k−1}∞k=1 is bounded below by r2. These facts led us to the following

theorem:

theorem 8.6 The sequence {r2n} is monotonic increasing, bounded above by

r1, and the sequence {r2n−1} is monotonic decreasing and bounded below by r0.

The limit limn→∞ rn exists.

The existence of the limits of {r2n} and {r2n−1} follows from the Monotone

sequence theorem. The fact that these limits are equal follow from (8.2).

definition 8.7 An infinite sequence a0, a1, · · · of integers, all positive exceptperhaps for a0, determines an infinite simple continued fraction< a0, a1, a2, · · · >.The value of < a0, a1, a2, · · · > is defined to be

limn→∞

< a0, a1, a2, · · · , an > .

Note that this limit is the same as limn→∞ rn.

8.4 A simple Lemma and primes of the form x2 + y2

We now establish a simple inequality and apply it to the study of primes of the

form x2 + y2.

lemma 8.8 Let

ξ =< a0, a1, · · · , an, · · · > .

8.4 A simple Lemma and primes of the form x2 + y2 93

Then

|ξ − rn| < |rn − rn+1| .Proof

If n = 2k, then

r2k+1 − r2k > 0.

Now {r2k}∞k=1 is increasing, and {r2k+1}∞k=1 is decreasing, we conclude that

0 < ξ − r2k < r2k+1 − r2k

and

ξ − r2k > 0 > r2k − r2k+1.

Hence,

|ξ − r2k| < |r2k − r2k+1| .

The argument is similar for odd integer 2k + 1. This completes the proof of the

Lemma.

Now, let p ≡ 1 (mod 4) be an odd prime. Then there exists a positive u such

that

u2 ≡ −1 (mod p).

Letu

p=< a0, a1, · · · , a` > .

Let j be such that

kj <√p < kj+1. (8.5)

By Lemma 8.8, we conclude that∣∣∣∣u

p− rj

∣∣∣∣ < |rj+1 − rj | <1

kjkj+1<

1

kj√p.

This implies that

|ukj − hjp| <√p. (8.6)

Now, let x = kj and y = ukj − hjp. By (8.5) and (8.6), we find that

0 < x2 + y2 < |kj |2 + |ukj − hjp|2 < 2p.

Furthermore,

x2 + y2 ≡ k2j + u2k2j ≡ 0 (mod p)

since

u2 ≡ −1 (mod p).

Hence,

p = x2 + y2.


This gives another proof of the fact that if p ≡ 1 (mod 4) is an odd prime, it is

a sum of two squares. This proof is due to C. Hermite.

8.5 Solutions to Pell’s equations

Let ξ be an irrational number and let < a0, a1, · · · > be the continued fraction

representation of ξ. We have

theorem 8.9 If a/b is a rational number with positive denominator such that

∣∣∣ξ − a

b

∣∣∣ <∣∣∣∣ξ −

hnkn

∣∣∣∣

for some n ≥ 1, then b > kn. In fact, if

|ξb − a| < |ξkn − hn|

for some n ≥ 0, then b ≥ kn+1.

Note that this result says that the convergents are the best rational approxi-

mations to ξ if we want to minimize the denominator of the fraction.

Proof We will first show that the first assertion follows from the second. Assume

that b ≤ kn < kn+1. Using the fact that b/kn ≤ 1, we conclude that

|ξb− a| < b

kn≤ |ξkn − hn|

and b ≤ kn ≤ kn+1. This is a contradiction to the second assertion.

To prove the second assertion, suppose

|ξb − a| < |ξkn − hn|

for some n ≥ 0 and b < kn+1. Consider the equations in x and y:

xkn + ykn+1 = b, xhn + yhn+1 = a.

Since the determinant of the coefficients is ±1, we conclude that x, y are integers.

Note that if x = 0, then

y =b

kn+1> 0,

or y ≥ 1. This implies that b ≥ kn+1, a contradiction.

If y = 0, then a = xhn and b = xkn. Hence,

|ξb − a| = |ξxkn − xhn| = |x||ξkn − hn| ≥ |knξ − hn|

since |x| ≥ 1. This is a contradiction.

Next, we claim that xy < 0. If y < 0 then

xkn = b− ykn+1

8.5 Solutions to Pell’s equations 95

shows that x > 0. If y > 0, then b < kn+1 implies that

b < ykn+1.

So xkn < 0 and therefore, x < 0.

Since the even convergents increase to ξ and the odd convergents decrease to

ξ, we conclude that

ξkn − hn and ξkn+1 − hn+1

have opposite signs. Hence

x(ξkn − hn) and y(ξkn+1 − hn+1)

have the same sign. From the equations defining x and y, we get

ξb− a = x(ξkn − hn) + y(ξkn+1 − hn+1).

Now, if A and B are of the same sign then

|A+B| = |A|+ |B|.

Hence,

|ξb− a| = |x(ξkn − hn) + y(ξkn+1 − hn+1)| ≥ |x(ξkn − hn)| ≥ |ξkn − hn|,

which is a contradiction.

theorem 8.10 Let ξ denote any irrational number. If there is a rational num-

ber a/b with b > 1 and (a, b) = 1 such that

∣∣∣ξ − a

b

∣∣∣ < 1

2b2,

then a/b equals one of the convergents of the simple continued fraction of ξ.

Proof Suppose a/b is not a convergent. Let n be such that kn ≤ b < kn+1. For

this n, we find by the previous theorem that

|ξkn − hn| ≤ |ξb− a| < 1

2b.

This implies that∣∣∣∣ξ −

hnkn

∣∣∣∣ <1

2bkn(8.7)

and∣∣∣ξ − a

b

∣∣∣ < 1

2b2. (8.8)

Since

a

b6= hnkn,


the expression bhn − akn is a non-zero integer and we deduce that using (8.7)

and (8.8) that

1

bkn≤ |bhn − akn|

bkn=

∣∣∣∣hnkn

− a

b

∣∣∣∣ ≤∣∣∣∣ξ −

hnkn

∣∣∣∣+∣∣∣ξ − a

b

∣∣∣ < 1

2bkn+

1

2b2.

This implies that b < kn, which is a contradiction.

Suppose d is not a square and d > 0. We will show in this section that if

x2 − dy2 = 1

is solvable, then the solutions can be obtained from the continued fraction ex-

pansion of√d. 1

Suppose

p2 − dq2 = 1.

Then

p2 = 1 + dq2 > dq2.

Hence,

p > q and p+√dq ≥ 2q

since√d > 1.

Next,

p−√dq =

1

p+√dq

≤ 1

2q.

Therefore, we find that

p

q−√d ≤ 1

2q2.

By Theorem 8.10, we conclude that p/q must be a convergent of the continued

fraction of√d.

Remark 8.11 By studying periodic continued fraction, one can show that there

exists non-zero positive integers x and y such that x2 − dy2 = 1. Moreover, if x1and y1 are the smallest non-zero positive integers satisfying

x2 − dy2 = 1,

then all solutions with x, y positive are of the form

(x1 +√dy1)

n, n ∈ Z+.

1 It can be shown that the Pell equations can be solved if d is positive and squarefree.


8.6 The Pell equation x2 − dy2 = 1 is solvable with xy 6= 0

In the last section, we see that all solutions to the Pell equation

x2 − dy2 = 1

can be obtained from the continued fraction expansion of√d if the equation is

solvable with xy 6= 0. In this section, we will prove that the equation is indeed

solvable.

definition 8.12 A continued fraction is purely periodic if there exists a pos-

itive integer m such that

am+k = ak,

for all integers k ≥ 0.

In other words, a purely periodic continued fraction appears in the form

< a0, a1, · · · , am−1, a0, a1, · · · , am−1, · · · >,

where ai are all positive integers. We denote this as

< a0, a1, · · · , am−1 > .

theorem 8.13 Let ξ be a real number. The continued fraction expansion of ξ

is purely periodic if and only if ξ is a real quadratic irrational number satisfying

ξ > 1 and −1 < ξ′ < 0, where

ξ′ = a− b√N if ξ = a+ b

√N .

Proof We will first prove that if ξ > 1 and −1 < ξ′ < 0, then ξ is purely

periodic.

We will divide our proof into several steps. The continued fraction expansion

of ξ can be constructed inductively via

ai = [ξi]

and

ξi+1 =1

ξi − ai. (8.9)

where we set ξ0 = ξ.

Step 1. We first show that there exist positive integers k, j such that ξk = ξj .

We first note that if

α =a+

√b

c,

then multiplying α by |c|/|c|, we deduce that

α =±ac+

√bc2

±c2 .


In other words, if we start with

ξ =a+ b

√N

c,

we can write ξ in the form

ξ = ξ0 =m0 +

√M

q0,

where

q0|(M −m20).

Now, write

ξ0 = a0 +1

ξ1. (8.10)

Suppose we write

ξ1 =m1 +

√M

q1.

Note that q1 may not be an integer. From (8.10), we conclude that

m1 +√M

q1=

q0

m0 +√M − a0q0

.

This implies that

m1m0 +m0

√M +m1

√M +M − a0q0m1 − a0q0

√M = q0q1.

Comparing coefficients on both sides, we conclude that

m1 = a0q0 −m0

and

q0q1 =M −m21.

Note that m1 is an integer. Since q0|(M −m20) and

q1 =M − (a0q0 −m0)

2

q0

=M −m2

0

q0− (a20q0 − 2m0a0),

q1 is also an integer. Note that this implies that q1|(M −m21).

By induction, we conclude that we can write ξi as

ξi =mi +

√M

qi,

with integers qi and mi. Furthermore, we find that

mi+1 = aiqi −mi


and

qi+1 =M −m2

i+1

qi. (8.11)

The computations are exactly the same as the case i = 0.

Step 2.

We claim that qi > 0. First, we prove that ξ′i is negative.

Let σ be the automorphism of Q(√M) such that

σ(√M) = −

√M.

Applying σ to (8.9), we deduce that

1

ξ′i+1

= ξ′i − ai. (8.12)

Now, a0 ≥ 1 and ξ′0 = ξ′ < 0. This implies that

ξ′0 − a0 < −1,

or1

ξ′1< −1.

This implies that

−1 < ξ′1 < 0.

Now, each ai ≥ 1 and so, by induction and (8.12), we conclude that each

−1 < ξ′i < 0. (8.13)

The argument used for showing that −1 < ξ′i−1 < 0 implies that −1 < ξ′i < 0 is

exactly the same for the case i = 1.

Next, the fact that ξ′i is negative implies that ξi − ξ′i > 0 and

ξi − ξ′i = 2

√M

qi> 0.

This means that

0 < qi < qiqi+1 =M −m2i+1 ≤M,

by (8.11). Furthermore

0 < m2i+1 < m2

i+1 + qiqi+1 =M.

Since M is fixed, we can only have finitely many possible pairs of values for

(mi, qi). Therefore, there exist k and j with k 6= j such that mj = mk and

qj = qk. This implies that

ξj = ξk

for some j 6= k.


Step 3. We now show that the continued fraction expansion of ξ is purely

periodic. From (8.12), we find that for i ≥ 0,

ξ′i = ai +1

ξ′i+1

,

where

0 < − 1

ξ′i+1

< 1.

The last inequality follows from (8.13). From (8.12),

ai − ξ′i = − 1

ξ′i+1

=

[− 1

ξ′i+1

]+

{− 1

ξ′i+1

}

and noting from (8.13) that

0 < −ξ′i < 1,

we conclude that

ai =

[− 1

ξ′i+1

]. (8.14)

Now, ξj = ξk implies that

ξ′k = ξ′j

and

aj−1 =

[− 1

ξ′j

]=

[− 1

ξ′k

]= ak−1.

Hence

ξj−1 = aj−1 +1

ξj= ak−1 +

1

ξk= ξk−1.

Hence, ξj = ξk implies that ξj−1 = ξk−1. If k > j, then a j-fold iteration allows

us to conclude that

ξ =< a0, a1, · · · , ak−j−1 >

and hence the result.

To prove the converse, we assume that the continued fraction expansion of ξ

is purely periodic, say,

ξ =< a0, a1, · · · , an−1 > .

Hence,

ξ =ξhn−1 + hn−2

ξkn−1 + kn−2.

We remark here that since ξ has an infinite continued fraction expansion, ξ

cannot be rational. To see this, assume rn and rn+1 are the n-th and (n+ 1)-th


convergents of the continued fraction expansion of ξ, then by Lemma 8.8, we

find that

|ξ − rn| < |rn − rn+1| = (knkn+1)−1.

If ξ were rational, say u/v then

|ukn − vhn| <v

kn+1.

Since kn is increasing, we conclude that |ukn−vhn| is eventually less than 1 and

greater than 0 and hence ξ cannot be rational.

Now ξ satisfies a quadratic equation and it must therefore be a real quadratic

number.

Let

f(x) = x2kn−1 + x(kn−2 − hn−1)− hn−2.

Then f(ξ) = 0 and ξ > 1 since a0 is positive by definition of purely periodic

continued fraction expansion. Clearly ξ′ is another root of f(x). It suffices to

show that −1 < ξ′ < 0. By intermediate value theorem, we need only to show

that f(0) and f(−1) have opposite signs. Now f(0) = −hn−2 < 0.

f(−1) = kn−1 − kn−2 + hn−1 − hn−2.

But this is positive since both {hn} and {kn} are increasing.

corollary 8.14 The number√d+ [

√d] is purely periodic.

Proof The results follow immediately from Theorem 8.13 by checking that√d+ [

√d] > 1

and

−1 < [√d]−

√d < 0.

Suppose√d+ [

√d] =< a0, a1, · · · , ar−1 > .

Note that a0 = 2[√d] and thus,

√d =< [

√d], a1, a2, · · · , 2[

√d] > .

Now, write√d+ [

√d] =< a0, a1, · · · , ξn >,

and√d =< [

√d], a1, · · · , ηn > .


Now, ξ0 is already of the form

ξ0 =m0 +

√d

q0

with m0 = [√d] and q0 = 1 and

ξ1 =1√

d− [√d]. (8.15)

We have seen from Theorem 8.13 with M = N = d that the subsequent ξn can

be written as

ξn =mn +

√d

qn.

Next, we note that

η0 =√d =

s0 +√d

t0,

with s0 = 0 and t0 = 1. But

η0 = [√d] +

1

η1.

By (8.15), we find that

η1 =1√

d− [√d]

= ξ1.

By induction, we conclude that

ξn = ηn

for all n ≥ 1. We thus conclude that

theorem 8.15 If√d+ [

√d] =< a0, a1, · · · , ξn >,

and √d =< [

√d], a1, · · · , ηn >,

then

ξn = ηn

for all integers n ≥ 1.

theorem 8.16 If d is a positive integer that is not a perfect square, then

h2n − dk2n = (−1)n−1qn+1

for all integers n ≥ −1, where

ξn+1 =mn+1 +

√d

qn+1.


Proof We first note that

√d =

ξn+1hn + hn−1

ξn+1kn + kn−1=

(mn+1 +√d)hn + qn+1hn−1

(mn+1 +√d)kn + qn+1kn−1

,

where we have used Theorem 8.15 to express ηn in terms of ξn, which we write

as

ξn =mn +

√d

qn.

Simplifying, we get√d(mn+1kn + qn+1kn−1) + dkn =

√dhn + (mn+1hn + qn+1hn−1).

Comparing coefficients, we find that

hn = mn+1kn + qn+1kn−1,

and

dkn = mn+1hn + qn+1hn−1.

Therefore,

h2n − dk2n = (−1)n−1qn+1,

since

kn−1hn − hn−1kn = (−1)n−1,

by (8.1).

Now, if√d+ [

√d] =< a0, a1, a2, · · · , ar−1 >,

with a0 = 2[√d] then

ξnr = ξr .

But for m > 1, ξm = ηm and hence

ηnr = ηr

for n ≥ 1.

Note also that qr = 1. Hence qnr = 1 for all integers n ≥ 1. Therefore, we

conclude that

h22nr−1 − dk22nr−1 = 1.

This shows that the Pell equation is always solvable in x and y with xy 6= 0.

example 8.1 The continued fraction expansion of√3 + [

√3] is

< 2, 1 > .


Here r = 2. Hence we only need to look at the first convergent of the con-

tinued fraction expansion of√3. The continued fraction expansion of

√3 is

< 1, 1, 2, 1, 2, · · · > . The first convergent is therefore given by

1 + 1/1 =2

1.

Note that

22 − 3 · 12 = 1.

Hence we obtain a solution for the Pell equation

x2 − 3y2 = 1.

9 Euler’s identities and Jacobi TripleProduct Identity

9.1 Jacobi’s Triple Product Identity

Let q and z be complex numbers such that |q| < 1. One of the most elegant

identities involving q and z is perhaps Jacobi’s Triple Product Identity given by

∞∑

j=−∞

qj2

zj =

∞∏

j=1

(1 + zq2j−1)(1 + z−1q2j−1)(1− q2j). (9.1)

There are several proofs of (9.1). In this chapter, we will present an elementary

proof due to G.E. Andrews. Before we proceed with the proof of (9.1), we give

a few interesting results that follow from this identity.

example 9.1 Let z = 1 in (9.1). We immediately obtain a product represen-

tation of the generating function for squares, namely,

∞∑

j=−∞

qj2

=

∞∏

j=1

(1 + q2j−1)2(1− q2j).

We remark here that the squares j2 = 1, 4 and 9 for j = 1, 2 and 3 can be viewed

as the number of dots in the following diagram:

j = 1 j = 2 j = 3

Figure 9.1

example 9.2 Let z = q in (9.1), followed by replacing q by q1/2. We find that

∞∑

j=−∞

qj(j+1)/2 = 2

∞∏

j=1

(1 + qj)2(1− qj).

The integers j(j + 1)/2 = 1, 3, 6 for j = 1, 2 and 3 can be viewed as the number

of dots in the following diagram:

106 Euler’s identities and Jacobi Triple Product Identity

j = 1 j = 2 j = 3

Figure 9.2

example 9.3 Let q be replaced by q3/2, followed by letting z = −q1/2. We

immediately deduce from (9.1) that

∞∑

j=−∞

(−1)jqj(3j+1)/2 =

∞∏

j=1

(1− q3j−1)(1 − q3j−2)(1 − q3j) =

∞∏

j=1

(1− qj), (9.2)

where the last equality follows from the fact that a positive integer must be of the

form 3j − r, with r = 0, 1 or 2. Note that by replacing j by −j, we may write

the left hand of the above series as

∞∑

j=−∞

(−1)jqj(3j−1)/2.

The integers of the form ω(j) = j(3j−1)/2, with j ≥ 1, are known as pentagonal

numbers. For j = 1, 2 and 3, w(j) = 1, 5 and 12. These correspond respectively

to the number of dots in the following diagrams:

j = 1 j = 2 j = 3

Figure 9.3

9.2 The terminating q-binomial Theorem

Let n ≥ 1 be an integer. The binomial theorem states that

(1 + x)n =

n∑

k=0

(n

k

)xk (9.3)

where (n

k

)=

n!

k!(n− k)!. (9.4)

There is a q-analogue of (9.3) and to describe this analogue, we first define the

q-analogue of (9.4).

9.2 The terminating q-binomial Theorem 107

definition 9.1 Let (q)0 = 1 and let

(q)n =n∏

j=1

(1 − qj)

and [n

k

]

n

=(q)n

(q)k(q)n−k. (9.5)

We also let

(q)∞ =

∞∏

k=1

(1− qk).

The expression in (9.5) is a q-analogue of (9.4) since

limq→1−

[n

k

]

q

= limq→1−

(1− q)(1− q2) · · · (1 − qn)

(1− q)n·(1− q)k

(q)k·(1− q)n−k

(q)n−k

=

(n

k

).

Observe further that the expressions for (9.4) and (9.5) motivate us to view the

q-analogue of n! as (q)n, even though (q)n does not approach n! as q approaches

1−.

The q-binomial coefficient has several interesting properties. For example, one

can verify that[n

k

]

q

= qk[n− 1

k

]

q

+

[n− 1

k − 1

]

q

(9.6)

and[n

k

]

q

=

[n− 1

k

]

q

+ qn−k

[n− 1

k − 1

]

q

. (9.7)

When q approaches 1−, both (9.6) and (9.7) reduce to the well known relation(n

k

)=

(n− 1

k

)+

(n− 1

k − 1

).

By (9.6), (9.7) and mathematical induction, we deduce that

[n

k

]

q

is a polynomial

in q. This is analogous to the fact that

(n

k

)is an integer.

We are now ready to establish the following analogue of (9.3):

theorem 9.2 Let n be a positive integers, x and q be independent variables.

Then

(1 + x)(1 + xq) · · · (1 + xqn−1) =

n∑

k=0

[n

k

]

q

qk(k−1)/2xk. (9.8)


Proof We follow the approach of G. Polya and G.L. Alexanderson Denote

f(x) = (1 + x)(1 + xq) · · · (1 + xqn−1).

Note that

(1 + x)f(xq) = (1 + x)(1 + xq) · · · (1 + xqn−1)(1 + xqn)

= f(x)(1 + xqn).

If we write

f(x) =

n∑

k=0

Qkxk,

where Qk is a function of q, then

(1 + x)

(n∑

k=0

qkQkxk

)= (1 + xqn)

(n∑

k=0

Qkxk

).

Comparing the coefficient of xk on both sides, we deduce that

qkQk + qk−1Qk−1 = Qk + qnQk−1,

which implies that

(1− qk)Qk = qk−1(1− qn−k+1)Qk−1.

Hence, we find that

Qk = qk−1 (1 − qn−k+1)

(1− qk)Qk−1. (9.9)

Observing that Q0 = 1 and iterating (9.9), we deduce that

Qk = q(k−1)+(k−2)+···2+1(1− qn−k+1)(1− qn−k+2) · · · (1− qn)

(1 − qk)(1− qk−1) · · · (1− q2)(1 − q)Q0

= qk(k−1)/2

[n

k

]

q

.

This completes the proof of (9.8).

9.3 Two identities of Euler

In this section, we derive two identities due to L. Euler. These identities may be

viewed as analogues of the identity

ex =

∞∑

k=0

xk

k!.

9.4 Andrews’ proof of Jacobi’s Triple Product Identity 109

We first rewrite (9.8) as

(1 + x)(1 + xq) · · · (1 + xqn−1) =

n∑

k=0

(q)n

(q)k(q)n−kqk(k−1)/2xk.

We may let n→ ∞ and deduce that 1

∞∏

k=1

(1 + xqk−1) =

∞∑

k=0

qk(k−1)/2

(q)kxk. (9.10)

This identity is due to Euler. Note that if we take (q)n as the analogue of n! and

allow the numerator qk(k−1)/2 to approach 1, we could view the right hand side

of (9.10) as an analogue of the power series expansion of ex.

Next, set x = −1 in (9.8) and multiply both sides by (−1)n/(q)n to deduce

that

0 =

n∑

k=0

(−1)n−1

(q)n−k· q

k(k−1)/2

(q)k, n ≥ 1. (9.11)

Note that the right hand side is of the formn∑

k=0

akbn−k, which is the coefficient

of the xk in the expansion of A(x)B(x) where

A(x) =

∞∑

m=0

(−1)mxm

(q)mand B(x) =

∞∑

m=0

qm(m−1)/2

(q)mxm.

By (9.11), we have

A(x)B(x) = 1. (9.12)

Using (9.10), we find that

∞∑

m=0

(−1)mxm

(q)m=

∞∏

k=1

1

(1 + xqk−1). (9.13)

Identity (9.13) is also due to Euler. By taking (q)n to be the analogue of n!, we

find that the left hand side of (9.13) is the analogue of the power series expansion

of e−x. This shows that (9.12) is the analogue of

ex · e−x = 1.

9.4 Andrews’ proof of Jacobi’s Triple Product Identity

In this section, we will present Andrews’ proof of Jacobi’s Triple Product Identity.

We first replace q by q2 in (9.10), we find that

∞∏

k=1

(1 + xq2k−2) =

∞∑

k=0

qk(k−1)

(q2)kxk. (9.14)

1 This process can be justified and it is known as Tannery’s Theorem.


Replacing x by xq in (9.14), we find that

∞∏

k=1

(1 + xq2k−1) =

∞∑

k=0

qk2

(q2)kxk. (9.15)

Observing that

1

(q2)k=

∞∏

m=1

1− q2m+2k

1− q2m,

we rewrite (9.15) as

∞∏

k=1

(1 + xq2k−1) =

∞∑

k=0

∞∏

m=1

(1− q2m+2k)

(1 − q2m)qk

2

xk

=

∞∏

m=1

1

(1 − q2m)

∞∑

k=0

∞∏

m=1

(1 − q2m+2k)qk2

xk

=

∞∏

m=1

1

(1 − q2m)

∞∑

k=−∞

∞∏

m=1

(1 − q2m+2k)qk2

xk,

where in the last equality, we used the fact that for k < 0,

∞∏

m=1

(1 − q2m+2k) = 0.

This implies that

∞∏

k=1

(1 + xq2k−1)(1− q2k) =∞∑

k=−∞

∞∏

m=1

(1− q2m+2k)qk2

xk. (9.16)

Next, using (9.14) with x = −q2k+2, we find that

∞∏

m=1

(1− q2m+2k) =

∞∑

`=0

(−1)`q`

2+`+2k`

(q2)`.

9.5 Number of representations of n as sums of two squares 111

Substituting this expression into the right hand side of (9.16), we deduce that

∞∑

`=0

(−1)`q`

2+`+2k`

(q2)`−

∞∑

k=−∞

qk2

xk∞∑

`=0

(−1)`q`

2+`+2k`

(q2)`

=

∞∑

`=0

∞∑

k=−∞

(−q)`q(k+`)2

(q2)`xk

=∞∑

`=0

(∞∑

k=−∞

q(k+`)2xk+`

)(−q/x)`(q2)`

=

(∞∑

k=−∞

qk2

xk

)(∞∑

`=0

(−q/x)`(q2)`

)

=

(∞∑

k=−∞

qk2

xk

)∞∏

m=1

1

(1 + x−1q2m−1), (9.17)

where we have used

∞∑

`=0

(−q/x)`(q2)`

=

∞∏

m=1

1

(1 + x−1q2m−1)(9.18)

in the last equality. Identity (9.18) follows from (9.13) with q replaced by q2,

followed by replacing x by −q/x. Combining (9.16) and (9.17), we conclude the

proof of Jacobi’s Triple Product Identity.

Remark 9.3 Andrew’s proof was motivated by Bellman’s remark that there

was no elementary proof of the Jacobi Triple Product identity. Andrews’ proof

is perhaps one of the first elementary proofs of the identity. Around the same

time, P.K. Menon independently discovered a proof similar to that of Andrews.

9.5 Number of representations of n as sums of two squares

We already know that p ≡ 1 (mod 4) if and only if p is a sum of two squares.

One way to see this explicitly is to use the formula

r2(n) = 4

∑

d|nd≡1 (mod 4)

1−∑

d|nd≡3 (mod 4)

1

, (9.19)

where r2(n) is the number of ways of writing n as a sum of two squares. Now if

p ≡ 1 (mod 4) then r2(p) = 8. If p ≡ 3 (mod 4) then r2(p) = 0.

In this section, we present M.D. Hirschhorn’s proof of (9.19). Replace z by


−x2q and q2 by q in (9.1) to deduce that

(x− x−1)∞∏

k=1

(1− xqk)(1− x−2qk)(1− qk) =∞∑

k=−∞

(−1)kx2k+1qk(k+1)/2. (9.20)

We now set k = 2n and k = 2n+1 in the sum on the right hand side and deduce

that

(x − x−1)

∞∏

k=1

(1− x2qk)(1− x−2qk)(1− qk)

=

∞∑

n=−∞

x4n+1q2n2+n −

∞∑

n=−∞

x4n−1q2n2−n

= x

∞∏

n=1

(1 + x4q4n−1)(1 + x−4q4n−3)(1 − q4n)

− x−1∞∏

n=1

(1 + x4q4n−3)(1 + x−4q4n−1)(1 − q4n),

where the last equality is obtained using (9.1).

Differentiating 2 the last equality with respect to x and setting x = 1, we

deduce that

∞∏

n=1

(1− qn)3 =

∞∏

n=1

(1 + q4n−3)(1 + q4n−1)(1 − q4n) (9.21)

×(1− 4

∞∑

n=1

(q4n−3

1 + q4n−3− q4n−1

1 + q4n−1

)).

This implies that (exercise)

∞∏

n=1

((1− q2n

) (1− q2n−1

)2)2= 1− 4

∞∑

n=1

(q4n−3

1 + q4n−3− q4n−1

1 + q4n−1

). (9.22)

Note that by replacing q by −q, the left hand side of (9.22) can be identified

using (9.1) as(

∞∑

k=−∞

qk2

)2

which is∞∑

n=0

r2(n)qn.

Comparing the coefficients of the last series with the right hand side of (9.22)

(after replacing q by −q), we arrive at (9.19).

2 To differentiate an infinite product, one uses the idea of logarithmic differentiation.


9.6 The partition function p(n)

In this section, we will illustrate the use of Jacobi Triple Product Identity in the

study partition function p(n).

definition 9.4 A partition of a positive integer n is a representation of n as

a sum of non-decreasing positive integers. The partition function p(n) is defined

as the number of partitions of n.

example 9.4 The integer 4 has the following partitions:

4 = 1 + 3 = 2 + 2 = 1 + 1 + 2 = 1 + 1 + 1 + 1.

The number of partitions of 4 is 5 and we write p(4) = 5.

Let

F (q) =∞∑

n=0

p(n)qn,

where p(0) := 1. One of the most important identity associated with p(n) is the

following:

theorem 9.5 For |q| < 1, we have

∞∏

k=1

1

(1− qk)=

∞∑

n=0

p(n)qn. (9.23)

Proof First let q be real and that 0 < q < 1. Consider the product

Fm(q) =

m∏

k=1

1

(1− qk).

If we expand the individual term in the product, we find that

m∏

k=1

1

(1− qk)= (1 + q1 + q1+1 + q1+1+1 + · · · )(1 + q2 + q2+2 + q2+2+2 + · · · ) · · ·

· · · (1 + qm−1 + q(m−1)+(m−1) + · · · )(1 + qm + qm+m + · · · ).

We now expand the products on the right hand side and denote the power series

as∞∑

n=0

pm(n)qn,

where pm(0) = 1. For a fixed n ≥ 1, we note that pm(n) counts the number

of representations of n as a partition with parts (i.e. the integers used in the

sum) not exceeding m. For example, when m = 2 then p2(3) = 2 since the

representations of 3 with parts not exceeding 2 are 2 + 1 and 1 + 1 + 1 + 1.

Now, observe that

pm(n) ≤ p(n)


and if n ≤ m,

pm(n) = p(n).

When m approaches ∞, pm(n) approaches p(n). Therefore,

1 +

m∑

n=1

p(n)qn ≤ Fm(q) ≤ F (q),

where

F (q) =

∞∏

k=1

1

(1− qk).

Hence,

1 +

∞∑

n=1

p(n)qn

converges for 0 < q < 1 uniformly in m. Now,

∞∑

n=0

pm(n)qn ≤∞∑

n=0

p(n)qn ≤ F (q)

since p(n) ≥ pm(n). Hence,

Fm(q) ≤∞∑

n=0

p(n)qn ≤ F (q)

and by letting m→ ∞, we complete the proof of the theorem.

Letting m approaches ∞, we conclude that

∞∏

k=1

1

(1− qk)=

∞∑

n=0

p(n)qn.

We then extend the above identity by analytic continuation to the unit disk

|q| < 1.

Now, observe from Theorem 9.5 that

1 =

(∞∑

n=0

p(n)qn

)·(

∞∏

k=1

(1− qk)

).

By Example 9.3, we conclude that(

∞∑

n=0

p(n)qn

)·(1 +

∞∑

n=1

(−1)n(qω(n) + qω(−n))

)= 1

where ω(n) = n(3n− 1)/2. Hence we deduce that for N ≥ 1,∑

n+ω(±m)=N

p(n)(−1)m = 0.


In other words,

p(N) =∑

n+ω(±m)=Nn6=N

(−1)m+1p(n) =N−1∑

m=1

(−1)m+1p(N − ω(±m)).

Major Macmahon tabulated p(n) up to n = 200 using the above recurrence

formula for p(n). From this table, S. Ramanujan observed the following amazing

congruences

p(5n+ 4) ≡ 0 (mod 5) (9.24)

p(7n+ 5) ≡ 0 (mod 7) (9.25)

and

p(11n+ 6) ≡ 0 (mod 11). (9.26)

In the next section, we will give a proof of (9.24).

We end this section by giving another formula for computing p(n). This for-

mula relates p(n) and σ(n) defined by

σ(n) =∑

`|n

`.

By logarithmic differentiating (9.23), we find that

∞∑

n=0

np(n)qn =

(∞∑

n=1

nqn

1− qn

)·(

∞∑

n=0

p(n)qn

),

where we have used the fact that∞∑

n=1

nqn

1− qn=

∞∑

n=1

∞∑

m=1

nqmn =

∞∑

s=1

σ(s)qs.

This immediately gives the recurrence formula for p(n), namely,

np(n) =∑

`+m=n

σ(`)p(m).

Next, instead of using (9.23), we start with

G(q) =

∞∏

n=1

(1− qn),

then

qG′(q)

G(q)=

∞∑

n=1

σ(n)qn. (9.27)

Using (9.3), we may write

G′(q) = 1 +

∞∑

`=1

(−1)`ω(±`)qω(±`).


Furthermore, by using (9.23), we conclude from (9.27) that

σ(n) = −∞∑

`=0

(−1)`(`(3`+ 1)

2

)p

(n− `(3`+ 1)

2

)(9.28)

−∞∑

`=0

(−1)`+1

((`+ 1)(3`+ 2)

2

)p

(n− (` + 1)(3`+ 2)

2

).

The above formula allows us to compute σ(n) even if we do not know the prime

factorization of n. In particular, if σ(n) = n+ 1 then n is a prime.

example 9.5 When n = 19, we note that p(17) = 297, p(12) = 77, p(4) =

5, p(18) = 385, p(14) = 135 and p(7) = 15. By (9.28), we conclude that

σ(19) = 2 · 297− 7 · 77 + 15 · 5 + 385− 5 · 135 + 12 · 15 = 20.

Note that σ(19) = 20 and hence we conclude that 19 is a prime.

We have shown above that values of p(n) are useful in testing if an integer N

were a prime. It is therefore very useful if one has an explicit formula for p(n).

Such a formula which arises from a divergent series was first discovered by G.H.

Hardy and S. Ramanujan around 1918 and independently by J.V. Uspensky in

1920. In 1937, H. Rademacher discovered an exact formula of p(n) in terms of a

convergent series. A. Selberg had also independently discovered the same formula

around the same time. For more details, the readers are encouraged to read A.

Selberg’s aritcle.

9.7 Proof of p(5n+ 4) ≡ 0 (mod 5)

In this section, we give Ramanujan’s proof of (9.24). We will need another iden-

tity of Jacobi.

First, recall (9.20), namely,

(x − x−1)

∞∏

k=1

(1− xqk)(1− x−2qk)(1− qk) =

∞∑

k=−∞

(−1)kx2k+1qk(k+1)/2.

We write the right hand side of the above identity as

∞∑

k=0

(−1)k(x2k+1 − x−(2k+1))qk(k+1)/2.

Dividing both sides of the identity by (x−x−1) and letting x tends to 1, we find

that

∞∏

n=1

(1− qn)3 =

∞∑

n=0

(−1)n(2n+ 1)qn(n+1)/2. (9.29)

9.7 Proof of p(5n+ 4) ≡ 0 (mod 5) 117

Ramanujan’s original proof for the congruence for p(5n+4) depends (9.2) and

(9.29). Write

q

∞∏

n=1

(1− qn)4 =

∞∑

r=−∞

∞∑

s=0

(−1)r+s(2s+ 1)q1+r(3r+1)/2+s(s+1)/2.

When 1 + r(3r + 1)/2 + s(s+ 1)/2 is a multiple of 5, we find that

2(r + 1)2 + (2s+ 1)2 ≡ 0 (mod 5).

But this can only happen when 2s+1 is divisible by 5. Therefore, the coefficient

a5n of the series expansion

q

∞∏

n=1

(1− qn)4 =

∞∑

k=0

akqk

is divisible by 5.

Hence, the coefficient of q5N of

q

∞∏

k=1

(1− q5k)

(1− qk)

is also 0 (mod 5) and therefore,

p(5n+ 4) ≡ 0 (mod 5).

10 Lattices and Minkowski’s Theorem

In this Chapter, we present a proof characterizing primes of the form x2 + y2

using Minkowski’s Theorem.

10.1 Lattices in Rn

definition 10.1 Let V be an n−dimensionalR vector space, with V equipped

with an inner product. Suppose u1,u2, · · · ,un is an orthonormal basis. Volumes

of regions in V are computed from the fundamental computations of the volume

of a rectangular parallelopiped.

If

C = {r1u1 + · · ·+ rnun|0 ≤ ri ≤ ai, 1 ≤ i ≤ n}

then the volume of C is

vol(C) = a1a2 · · · an.

If

C′ = {r1v1 + · · ·+ rnvn|0 ≤ ri ≤ ai, 1 ≤ i ≤ n}

and for each i,

vi =∑

j

αijuj ,

then

vol(C′) = a1a2 · · · an|det(αij)|.

definition 10.2 An abelian subgroup L of a real vector space V is called a

lattice if L = Zv1 + · · ·+ Zvr for some linearly independent vectors v1, · · · ,vr.

We say L is a full lattice if r is the dimension of V over R.

If L is the full lattice as in the definition we refer to the basis {v1, · · · ,vn} of

V as a basis of L. Of course a basis of L differs from the other by a change of

matrix with integer coefficients and determinant equal to ±1.

10.2 Translations of fundamental parallelopiped 119

Let L be a full lattice with basis {v1, · · · ,vn} . The set

T = {r1v1 + · · ·+ rnvn|0 ≤ ri < 1, 1 ≤ i ≤ n}

is called a fundamental parallelopiped for L. T depends on the choice of vi’s but

the volume is independent of the choice of basis. The volume of the fundamental

parallelopiped is called the volume of L.

10.2 Translations of fundamental parallelopiped

The following Lemma shows that if V is a vector space of dimension n over R

and T is a fundamental parallelopiped, then the union of the sets in

{λ+ T |λ ∈ L}

is disjoint and “covers” V .

lemma 10.3 Let L be a full lattice in V and T a fundamental parallelopiped

of L. Thenλ+ T, λ ∈ L

are pairwise disjoint and cover all of V .

Proof

Suppose λ1 + T and λ2 + T have a common point. Let

λ1 + t1 = λ2 + t2.

Since

t1 =∑

i

rivi and t2 =∑

i

r′ivi,

with 0 ≤ ri < 1 and 0 ≤ r′i < 1, with 1 ≤ i ≤ n, we conclude that

|ri − r′i| < 1.

Now,

λ1 − λ2 = t2 − t1

implies that t2 − t1 is a lattice point and hence ri − r′i are integers. But since

|ri − r′i| < 1, ri − r′i is an integer only when ri − r′i = 0. Hence,

λ1 = λ2.

For any v =∑

i sivi ∈ V , with si ∈ R, we may write

si = ni + ri


with ni ∈ Z and 0 ≤ ri < 1. Then

v =∑

i

nivi +∑

i

rivi

expresses v as a sum of an element of L and an element of T . Hence the translates

cover all of V .

10.3 Spheres and bounded sets in Rn

A sphere of radius m in V is a set

U(m) = {r1u1 + · · ·+ rnun|r21 + · · ·+ r2n ≤ m2},

where {u1, · · · ,un} is some orthonormal basis of V . If {v1, · · · ,vn} is another

basis for V , then we define

U∗(m) = {r1v1 + · · ·+ rnvn|r21 + · · ·+ r2n ≤ m2}.

Note that for a given positive real number m, there exists a positive real number

m′ such that U(m) ⊂ U∗(m′).

definition 10.4 A subset W in V is bounded if it is contained in some

sphere.

theorem 10.5 Let L be a lattice. Then every sphere contains only a finite

number of points of L

Proof

Suppose L is a lattice with basis {v1, · · · ,vr}. If r < n extend this to a basis

{v1, · · · ,vn} of V and replace L by∑n

i=1 Zvi. Any sphere is contained in a

sphere

U∗(m) = {riv1 + · · ·+ rnvn|r21 + · · ·+ r2n ≤ m2}.

So it suffices to show that L ∩ U(m) is finite for each m.

If∑

i

nivi ∈ L ∩ U∗(m),

then each ni ∈ Z and |ni| ≤ m. Hence there are finitely many possible ni’s.

10.4 Minkowski’s Theorem

We now prove one of the most important result in the subject known as “Geom-

etry of Numbers”.

10.4 Minkowski’s Theorem 121

definition 10.6 A set X is convex if x,y ∈ X implies thatx+ y

2∈ X . A

set X is centrally symmetric if x ∈ X implies −x ∈ X.

lemma 10.7 A centrally symmetric convex set is one which contains (x−y)/2

whenever it contains x and y.

Proof Now, x ∈ X,y ∈ X implies that x ∈ X,−y ∈ X . This implies that

(x− y)/2 ∈ X by convexity.

theorem 10.8 Let L be a full lattice in the vector space V of dimension n

over R and let X be a bounded, centrally symmetric, convex subset of V . If

vol(X) > 2nvol(L) then X contains a nonzero point of L.

Remark 10.9 This says that if X is large enough then X contains a lattice

point.

Proof Let T denote a fundamental parallelopiped of L. The key to the proof

will be to show that the volume condition imposes a restriction on the translates.

Assertion: If Y is a bounded subset of V such that λ + Y , λ ∈ L are disjoint,

then vol(T ) ≥ vol(Y ).

Proof of Assertion. Let V = Zv1 + · · · + Zvn. Since Y is bounded, there exists

a positive integer m such that

Y ⊂ U∗(m),

where

U∗(m) = {r1v1 + · · ·+ rnvn|r21 + · · ·+ r2n ≤ m2}.

Now, if (λ + T ) ∩ U∗(m) is non-empty and λ + t ∈ (λ + T ) ∩ U∗(m) write

λ+ t =∑

i(ai + ri)vi, with ai ∈ Z and 0 ≤ ri < 1. Since λ+ t ∈ U∗(m), we find

that |ai + ri| ≤ m. Now,

|ai| − |ri| ≤ |ai + ri| ≤ m

and this implies that

|ai| ≤ m+ 1

since 0 ≤ ri < 1. In other words, we see that if (λ+T )∩U∗(m) is non-empty, then

λ ∈ U∗(√n(m+1)). There are finitely many lattice points in U∗(

√n(m+1)) by

Theorem 10.5. Hence, there are finitely many (λ+T ) such that (λ+T )∩U∗(m)

is non-empty. Therefore, there are finitely many (λ + T ) such that (λ + T ) ∩ Yis non-empty.

By Lemma 10.3 the nonempty intersections (λ + T ) ∩ Y are pairwise disjoint

and cover Y . Thus

vol(Y ) =∑

λ∈L

vol((λ+ T ) ∩ Y ).


Note that this sum is finite. Now,

(λ + T ) ∩ Y = (T ∩ (Y − λ)) + λ.

For if x ∈ (λ + T ) ∩ Y,x = λ + t, t ∈ (T ∩ (Y − λ)) + λ. Conversely, x ∈(T ∩ (Y − λ)) + λ, x = t+ λ, t = y − λ implying that

x = y = t+ λ ∈ Y ∩ (T + λ).

Now volume is not changed by translations and so,

vol((λ+ T ) ∩ Y ) = vol((T ∩ (Y − λ)) + λ) = vol(T ∩ (Y − λ))

and

vol(Y ) =∑

λ∈L

vol(T ∩ (Y − λ)).

The sets T∩(Y −λ) are disjoint ( by assumption that translates of Y are disjoint)

but they might not cover T . Thus

vol(T ) ≥∑

λ∈L

vol(T ∩ (Y − λ)) = vol(Y ).

This proves the assertion.

Now we turn to the set X in the Theorem. Let

1

2X =

{1

2x|x ∈ X

}.

We have

vol(1

2X) = 2−nvol(X) > vol(T ) = vol(L).

In view of the assertion just proved, the translates λ + 12X cannot be disjoint.

So there exist λ1 6= λ2 in L such that

1

2x+ λ1 =

1

2y + λ2, x, y ∈ X.

Thenx− y

2∈ X

by assumption on X . Therefore λ2 − λ1 is a nonzero element of L in X .

10.5 Primes of the form x2 + y2, the geometric approach

After all these preparations, we are now ready to prove that if p ≡ 1 (mod 4)

then p is a sum of two squares.

By Wilson’s Theorem, we know that there exists an element u such that u2 ≡−1 (mod p). Let

L = {(a, b)|b ≡ ua (mod p)}.

10.5 Primes of the form x2 + y2, the geometric approach 123

The volume of L is p. Pick r2 = 3p/2 and we have a circle S with volume

3πp/2 > 4p.

This circle must contain a nonzero lattice point (a, b) ∈ L. Now,

a2 + b2 ≤ r2 = 3p/2 < 2p

and

a2 + b2 ≡ a2 + u2a2 ≡ 0 (mod p).

Hence,

a2 + b2 = p.

Of course, Minkowski’s Theorem has more applications than just classifying

primes of the form x2 + y2. It is used to show that the ideal class group of a

finite extension of Q (not just imaginary quadratic) is finite.

11 Elliptic curves

In Number Theory, we are often interested in knowing integral or rational solu-

tions of certain equations. We call these equations Diophantine equations. There

are various methods of solving Diophantine equations since these equations come

in all “shapes and sizes”. One of the most famous Diophantine equation is per-

haps the Fermat’s equation

xn + yn = zn

where n ≥ 3. It is proved by A. Wiles that there are no integral solutions to the

above equation when n ≥ 3.

We now describe another less well known Diophantine equation. A positive

rational number r is called a congruent number if it is the area of some right

triangle with rational sides. So for example the number 157 is a congruent number

because it is the area of a right triangle with two sides forming the right angle

given by

6803298487826435051217540

411340519227716149383203and

411340519227716149383203

21666555693714761309610.

The problem of finding congruent numbers is reduced to the study of the Dio-

phantine equation

y2 = x3 − n2x.

It turns out that if there is a rational point (x, y) on the curve

y2 = x3 − n2x

and x satisfies the conditions that

(i) it is the square of a rational number,

(ii) its denominator is even,

(iii) its numerator has no common factor with n,

then n is a congruent number.

Both the congruent number and the Fermat’s last theorem are related to

finding rational solutions of equation of the form

y2 = f(x)

where f(x) has degree 3. Points satisfying y2 = f(x) contribute to a special curve

known as the elliptic curve.

11.1 Rational Points on curves 125

In this chapter, we will study some properties of rational points on ellip-

tic curves. Although elliptic curves are extremely useful in solving Diophantine

equations, we will not discuss this aspect here. Instead, we will describe Lentra’s

algorithm of factoring integers using elliptic curves.

11.1 Rational Points on curves

Let f(x, y) be a polynomial of two variables. The set of points (x, y) in the plane

for which

f(x, y) = 0

constitutes an algebraic curve and we denote it as Cf (R).

A point (x, y) is called a rational point if x and y are rational. The set of

rational points satisfying f(x, y) = 0 will be denoted by Cf (Q). This is the set

we will study in this chapter.

example 11.1 Let f(x, y) = x2 + y2 − 3. Then to find elements in Cf (Q) we

may assume that x = X/Z and y = Y/Z 1 Then we have

X2 + Y 2 − 3Z2 = 0.

We may also assume that (X,Y, Z) = 1. Now modulo 3, we have

X2 + Y 2 ≡ 0 (mod 3).

But if X and Y are integers then the possible outcome for

X2 + Y 2

is 2. Hence, Cf (Q) is empty.

example 11.2 Find all rational points on the curve x2 + 5y2 = 1.

Solution. Observe that (1, 0) is a rational point of this curve. If (x1, y1) is a

second rational point on this curve, then the slope joining these two points is a

rational number, for

m =y1

x1 − 1.

Conversely if m is a rational number, the line through (1, 0) has the equation

y = m(x− 1).

To find the other intersection of this line with the ellipse, we set y = m(x − 1)

in the formula for the ellipse. This gives

(x− 1)((5m2 + 1)x− (5m2 − 1)) = 0.

1 If x = u/v and y = s/t, then take Z = vt,X = ut and Y = sv.

126 Elliptic curves

Hence the other rational point has x-coordinate

x1 =5m2 − 1

5m2 + 1.

The corresponding y-coordinate is

y1 = − 2m

5m2 + 1.

Hence, we see that

m =y1

x1 − 1,

x1 =5m2 − 1

5m2 + 1

y1 = − 2m

5m2 + 1

determine a one-to-one correspondence between rational numbersm and rational

points (x1, y1) on the ellipse

x2 + 5y2 = 1,

apart from the point (1, 0) that we started with. If we express m = r/s then

x1 =5r2 − s2

5r2 + s2, y1 = − 2rs

5r2 + s2.

As a consequence, if (X,Y, Z) are solutions to

X2 + 5Y 2 = Z2

then the point (X/Z, Y/Z) lies on the ellipse and hence, there exist r and s such

that

(5r2 − s2,−2rs, 5r2 + s2)

is proportional to (X,Y, Z).

We do not necessarily obtain all primitive triples2 (X,Y, Z) that satisfy

X2 + 5Y 2 = Z2

in this way. For example (2, 3, 7) is a solution that cannot be expressed as (5r2−s2,−2rs, 5r2 + s2), for some integers r and s. (Do you know why is this solution

not captured by the above method?)

Note that in both examples, we find rational solutions of f(x, y) = 0 by finding

integer solution for

Znf(X/Z, Y/Z) = 0,

where n is the maximum of the sum of the degrees of x and y. The expression

Znf(X/Z, Y/Z) = 0

2 That is, the gcd(X, Y,Z) = 1.


is the homogeneous form of

f(x, y) = 0.

For example the homogeneous form of x3 + y3 = 1 is X3 + Y 3 = Z3.

11.2 The Geometry of cubic curves

Let

f(x, y) = ax3 + bx2y + cxy2 + dy3 + ex2 + fxy + gy2 + hx+ iy + j = 0

be the equation of a general cubic, namely, one such that f(x, y) has degree 3.

In our discussion we will assume that our curve is non-singular.

Suppose we can find two rational points P and Q on the curve, then we can

generally find a third one. What we need to do is to draw a line joining the two

rational points on the curve. This line is of the form y = mx+ c, where m and

c are rational numbers. Substituting y = mx + c into f(x, y) = 0 we obtain an

cubic equation

p(x) = 0.

Since two of the roots of p(x) = 0 are rational and that p(x) ∈ Q[x], we conclude

that the third point of intersection must also be a rational point. We will denote

this point as P ∗ Q.We note that if we only have one rational point, we can still construct new

rational points on the curve by drawing a line tangent to the curve at the point.

If we continue with this operation, we may get many new rational points on the

cubic curve.

Although ∗ is a binary operation on Cf (Q), it is not the binary operation we

use to turn Cf (Q) into a group. To obtain the right group operation, we first fix

a rational point O of Cf (C) (assuming this point exists). We define the group

law by

P⊕ Q := O ∗ (P ∗ Q).

We will show next that Cf(Q) under ⊕ is a group. But before proceeding, we

record one of the most important theorems in the theory of elliptic curves,

namely, the Mordell Theorem. Mordell’s theorem states that if Cf (C) is non-

singular rational curve (i.e, a, b, c, d, e, f, g, h, i, j ∈ Q), then there is a finite set

of rational points

{P1, P2, · · · , Pk}

such that for any Q ∈ Cf(Q),

Q = n1P1 ⊕ n2P2 ⊕ · · · ⊕ nkPk.

We now show that (Cf (Q),⊕) is an abelian group.

128 Elliptic curves

First, we observe that

P ⊕Q = Q⊕ P

for P,Q ∈ Cf (Q).

Next, we observe that O acts as an identity. Note that

O⊕ P = P.

The reason is clear. The line joining P and O meets the curve at a third point

Q. Now the line joining O and Q must then meet the curve again at P.

We now come to the “inverse” of a point Q under this group law. First, draw

a tangent line at O and denote the “third” point on the curve by S. Given Q, we

claim that T = Q ∗ S is the inverse of Q.

Note that T ⊕ Q = O ∗ (T ∗ Q). This is clearly O since S must be T ∗ Q and

O ∗ S is again O. This shows that T is the inverse of Q and we denote T by Q.

To prove that the set of rational points forms a group with the group law, it

remains to show that the group law is associative. Let P, Q and R be three points

on the curve. We want to show that

(P⊕ Q)⊕ R = P⊕ (Q⊕ R).

Rewriting the above expression, we find that

((P⊕ Q) ∗ R) ∗ O = ((Q⊕ R) ∗ P) ∗ O.

Hence, it suffice to show that

(P⊕ Q) ∗ R = (Q⊕ R) ∗ P.

We now identify nine points : O,P,Q,R,P ∗ Q,P ⊕ Q,Q ∗ R,Q ⊕ R and the

intersection of the two lines, one joining P⊕ Q and R and the other joining Q⊕ R

and P. Our aim is to show that the ninth point lies on the curve.

To achieve this, we need a result from Geometry.

theorem 11.1 Let C,C1 and C2 be three cubic curves. Suppose C goes through

eight of the nine intersection points of C1 and C2, then C goes through the ninth

intersection point.

Here C1 and C2 are cubic curves and so they have nine intersection points by

Bezout’s theorem (I have to be vague here. We are also assuming that these curves

are defined over C).

Assuming the above theorem, we let C1 be the three lines 3

[P,Q,P ∗ Q], [O,Q ∗ R,Q⊕ R] and [R,P⊕ Q,R ∗ (P⊕ Q)].

Let C2 be the three lines

[P,Q⊕ R,P ∗ (Q⊕ R)], [R,Q,R ∗ Q] and [O,P ∗ Q,P⊕ Q].

3 I use [A,B, C] to denote the line through the points A,B and C.


Note that these two cubics intersect at the nine points we have identified. We denote

the last point by T and this is the intersection of

[P,Q⊕ R,P ∗ (Q⊕ R)] and [R,P⊕ Q,R ∗ (P⊕ Q)].

Since our curve passes through eight of these nine points, it must pass through T.

This implies that

T = P ∗ (Q⊕ R) = R ∗ (P⊕ Q).

Hence the result.

Remark 11.2 It appears that our operation ⊕ is dependent on the choice of

O. It is not difficult to show that if we begin with another point O′ and define

a corresponding group law +′, we will obtain a group isomorphic to the group

arising from O.

The Mordell Theorem can now be stated as saying that the group (Cf (Q),⊕) is

a finitely generated abelian group.

We end this section by saying a few words about Theorem 11.1.

To define a cubic, we need ten coefficients. But we can multiply by a constant

and still obtain the same curve and hence, we really need only nine coefficients.

The set of cubics that goes through one given point is therefore eight dimensional.

Continuing this way, we see that the set of cubics that goes through eight given

points of intersection of two curves C1 and C2 is one dimensional.

Let F1(x, y) = 0 and F2(x, y) = 0 be the equations of the two curves C1 and C2.

The curve Ct given by the equation F1 + tF2 = 0 passes through the eight points

of intersections of C1 and C2. The collection of curves {Ct|t ∈ C} is clearly one

dimensional and it must therefore be the set of cubics that passes the eight given

points. In other words, C must coincide with Ct for some t. Since F1 and F2 vanishes

at the ninth point, the curve Ct, which is also the curve C, must pass through the

ninth point.

Theorem 11.1 can also be explained as follow:

We will denote C1 · C2 to mean the intersections of C1 and C2 and we write

C1 · C2 =k∑

j=1

Pj,

where Pj are the points of intersection. We use C1C2 to denote the curve defined by

f(x, y)g(x, y) = 0 if C1 is defined by f(x, y) = 0 and C2 is defined by g(x, y) = 0.

By Bezout’s Theorem,

C1 · C2 =

9∑

j=1

Pj.

Let

C · C1 =

8∑

j=1

Pj + Q

130 Elliptic curves

and let L be a line passing through P9. Then

L · C1 = P9 + R+ S

and

LC · C1 =

9∑

j=1

Pj + R+ S+ Q = C2 · C1 + R + S+ Q.

Now, R, S and Q are points on C1. Hence there must be a line L′ such that L′ ·C1 =

R+S+Q and this line has to be L since both lines contain R and S. Hence Q = P9.

11.3 Explicit form of the group law

It is a fact that any cubic with a rational point can be transformed into a certain

special form called the Weierstrass normal form given by

y2 = x3 + ax2 + bx+ c.

The homogeneous form of this equation is given by

Y 2Z = X3 + aX2Z + bXZ2 + cZ3.

What is the intersection of this curve with Z = 0? It is the line X = 0 and the

cubic meets this line at infinity in a point three times! This point can be shown

to be non-singular and it is counted as a rational point. We will take our O in our

discussion of the group law to be this point at infinity. With this choice of identity,

we will compute P⊕ Q.

Let P = (x1, x2) and Q = (x2, y2), with P 6= Q. Then P⊕ Q = (x3, y3). One can

show that

x3 =

(y2 − y1x2 − x1

)2

− a− x1 − x2, y3 = −y1 −y2 − y1x2 − x1

(x3 − x1).

If P = Q and y1 6= 0 then

x3 =

(3x21 + 2ax1 + b

2y1

)2

− a− 2x1, y3 = −y1 −(3x21 + 2ax1 + b

2y1

)(x3 − x1).

11.4 The Pollard’s p− 1 method

Let n be composite and suppose n happens to have a prime factor p such that p− 1

is a product of small primes. From Fermat’s Theorem, we know that

ap−1 ≡ 1 (mod p).

This means that p will divide (ap−1 − 1, n).

Of course, we do not know what p is and so we choose an integer

k = 2α1 · 3α2 · · · pαrr ,

11.4 The Pollard’s p− 1 method 131

where pj is the j-th prime and αj are small positive integers. We then compute

(ak− 1, n). Note that if we are lucky that n has a prime factor p such that (p− 1)|kthen p|(ak − 1). If (ak − 1, n) 6= n, then we would obtain a non-trivial factor for

n since (ak − 1, n) > 1. If the gcd is n we choose a new a but if the gcd is 1, we

choose a larger k.

The Lenstra’s Elliptic curve algorithm is motivated by Pollard’s p− 1 method.

Step 1. Choose random integers b, x1, y1 between 1 and n. Let c = y21 − x31 − bx1(mod n) and let C be the curve defined by

y2 = x3 + bx+ c (mod n).

Let P = (x1, y1).

Step 2. Check that (4b3 +27c2, n) = 1. If not, pick a new b. If it is strictly between

1 and n, then we have found a new factor of n.

Step 3. Choose k as in Pollard’s method.

Step 4. Compute

kP =

(akd2k,bkd3k

).

Step 5. Compute D = (dk, n). If 1 < D < n then we have found a factor for n. If

D = 1 then either change k or change a curve. If D = n, then decrease k.

Let us look at why the algorithm works. Suppose that we are lucky and we have

chosen a curve C given by f(x, y) = 0 and a number k so that for some prime

p dividing n, we have |Cf(Fp)| divides k. Then every element in Cf (Fp) has order

dividing k. In particular taking P ∈ Cf (Q) and reduce modulo p, we know that

kP = O.

In other words, kP (mod p) must be the point at infinity and hence p must divide

dk.

132 Elliptic curves

example 11.3 Let N = 2433943 and k = 24 · 32 · 5 · 7 · 11 · 13 · 17. Choose the

curve with equation

y2 = x3 + 337x− 337.

Then we have the following table of points 2kP (mod N).

2P (28898,2389338)

22P (1920153,1597958)

23P (1828557,767520)

24P (2100822,566900)

25P (801535,1529630)

26P (2271592,1481641)

27P (1601924,1866581)

28P (1168060,2433281)

29P (59132,2344314)

210P (429492,1979097)

211P (962483,1377053)

212P (2297424,1684259)

213P (127907,578944)

214P (2219139,2354694)

215P (832844,2040952)

216P (300449,839254)

217P (1446442,2156245)

218P (2250971,2425040)

219P (1935828,2047880)

220P (2316360,333373)

221P (1168755,1049467)

222P (333607, 905175)

223P (662422,2225670)We have

k = 24 + 26 + 210 + 212 + 213 + 214 + 215 + 217 + 219 + 220 + 221 + 223.

We now record all the computations modulo N :

(24 + 26)P = (1089595, 2238715)

(210 + 212)P = (1524824, 77394)

(213 + 214)P = (1678371, 202532)

(215 + 217)P = (1782098, 1792826)

(219 + 220)P = (345528, 66256)

(221 + 223)P = (1704240, 1187779)

Let

α = {(24 + 26 + 210 + 212)}P = (1099133, 402310),

11.5 Appendix 133

and

β = {(213 + 214 + 215 + 217 + 219 + 220 + 221 + 223)}P = (1041049, 1824988).

Now α+ β is not defined. A quick check shows that

gcd(1099133− 1041049, N) = 1117.

Hence we have

2433943 = 1117 · 2179.

11.5 Appendix

Elliptic curves can be “parametrized” by the Weierstrass elliptic function defined by

℘(z;L) :=1

z2+

∑

ω∈L\{0}

(1

(z − ω)2− 1

ω2

),

where

L = Z⊕ Zτ,

with τ ∈ C such that Imτ > 0. The function ℘(z) (we drop the L when the lattice

is fixed) is doubly periodic with periods 1 and τ . Also the points (℘(z), ℘′(z)) are

on the curve

C : y2 = 4x3 − g2x− g3,

where

g2 = 60∑

ω∈L\{0}

1

ω4and g3 = 140

∑

ω∈L\{0}

1

ω6.

Furthermore,

℘(z + w) = −℘(z)− ℘(w) +1

4

(℘′(z)− ℘′(w)

℘(z)− ℘(w)

)2

,

and

℘′(z + w) = −℘(z)− ℘′(w) − ℘′(z)

℘(w)− ℘(z)(℘(z + w)− ℘(z)) .

This shows that if P = (℘(z), ℘′(z)) and Q = (℘(w), ℘′(w)) then

P+Q = (℘(z+ w), ℘′(z+ w))

and hence associativity law for+ becomes very clear since it translates to associativity

for addition in C.

ma3265 introduction to number theory - dept of maths, nuschanhh/notes/ma3265.pdf · 1 divisibility...

Documents