c1 - insertion sort

Insertion Sort Ing. Juan Ignacio Zamora M. MSc. | Universidad Latinoamericana de Ciencia y Tecnología

Uploaded by juan-zamora-msc-mba on 15-Jul-2015.


Page 1: C1 - Insertion Sort


Page 2

The legend of Gauss

- Once upon a time there was a German boy named Carl Friedrich Gauss. In 1787, when he was ten years old, his schoolteacher was annoyed because the pupils were misbehaving, so he set little Carl and his classmates a math problem.

- The children had to add up all the numbers from 1 to 100: 1 + 2 = 3, 3 + 3 = 6, 6 + 4 = 10, 10 + 5 = 15, 15 + 6 = 21, and so on until all 100 numbers had been added.

- The teacher sat down in his chair to read the newspaper, confident that the children would need hours to add up all the numbers…

Page 3

- Gauss solved it in 5 minutes…

Page 4

How did he do it?

- Let S be the sum of an arithmetic progression a, b, c, …, l, m, u whose common difference is r.

- Then S = a + b + c + ⋯ + l + m + u.

- Also S = u + m + l + ⋯ + c + b + a.

- Adding the two lines term by term gives 2S = (a + u) + (b + m) + (c + l) + ⋯ + (l + c) + (m + b) + (u + a).

- Every binomial above equals (a + u). Moving one position inward adds r to the left term and subtracts r from the right term. For example, b + m = (a + r) + (u − r) = a + u. Recall that a is the first term and u is the last.

- So 2S is (a + u) taken n times, where n is the number of terms, i.e. (a + u)n. We divide this by 2 because every term was counted twice. Therefore:

$$S = \frac{(a+u)\,n}{2}$$
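As a quick sanity check of S = (a + u)n/2, the sketch below compares the formula with a plain brute-force sum on a few arithmetic progressions. The function name `gauss_sum` and the sample progressions are illustrative choices, and integer terms are assumed. Because (a + u)n equals 2S, it is always even, so integer division is exact.

```python
def gauss_sum(a: int, u: int, n: int) -> int:
    """Sum of an arithmetic progression with first term a, last term u and n terms."""
    # (a + u) * n is exactly 2S, so it is even and // loses nothing.
    return (a + u) * n // 2


# Each tuple is (first term a, common difference r, number of terms n).
for a, r, n in [(1, 1, 100), (1, 2, 4), (4, 3, 15), (10, -2, 7)]:
    terms = [a + k * r for k in range(n)]          # a, a + r, a + 2r, ...
    assert gauss_sum(terms[0], terms[-1], len(terms)) == sum(terms)

print(gauss_sum(1, 100, 100))  # 5050
```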

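Page 5

Let's test the theorem

$$\begin{aligned}
S &= 1 + 2 + 3 + \cdots + 98 + 99 + 100 \\
S &= 100 + 99 + 98 + \cdots + 3 + 2 + 1 \\
2S &= 101 + 101 + 101 + \cdots + 101 + 101 + 101
\end{aligned}$$

$$S = \frac{(a+u)\,n}{2} = \frac{(1+100)\cdot 100}{2} = \frac{101 \cdot 100}{2} = 5050$$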
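The pairing trick on this slide can be reproduced literally. The sketch below writes the sum forwards and backwards, adds the two rows column by column, and halves the total. The variable names are illustrative.

```python
forward = list(range(1, 101))     # S = 1 + 2 + ... + 100
backward = forward[::-1]          # S = 100 + 99 + ... + 1

# Add the two rows column by column.
pairs = [x + y for x, y in zip(forward, backward)]
assert all(p == 101 for p in pairs)   # every column adds up to 101

two_s = sum(pairs)                # 2S = 101 * 100
s = two_s // 2                    # each term was counted twice
print(two_s, s)                   # 10100 5050
```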

Page 6

Arithmetic progressions

- An arithmetic progression is any sequence in which each term after the first is obtained by adding a constant amount to the previous term.

- Example: S = 1, 3, 5, 7, …, where the common difference r (also written d) is 2, since 3 − 1 = 2. In general, the common difference r is any term minus the term just before it.
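The definition above translates directly into a check: subtract each term from the one after it and confirm that every difference is the same. The sketch below does this. The name `common_difference` is hypothetical. Returning `None` for sequences with fewer than two terms is a design choice made here, because a single term does not determine r.

```python
from typing import Optional, Sequence


def common_difference(seq: Sequence[int]) -> Optional[int]:
    """Return the common difference r if seq is an arithmetic progression, else None."""
    if len(seq) < 2:
        return None                      # one term alone does not determine r
    r = seq[1] - seq[0]
    # Every term minus the previous one must give the same r.
    for prev, cur in zip(seq, seq[1:]):
        if cur - prev != r:
            return None
    return r


print(common_difference([1, 3, 5, 7]))   # 2
print(common_difference([1, 2, 4, 8]))   # None (not arithmetic)
```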

Page 7

Deriving the formula for the nth term

- Let S = a, b, c, d, …, u be a progression in which u is the nth term and r is the common difference.

- Then we have:
  - b = a + r
  - c = b + r = (a + r) + r = a + 2r
  - d = c + r = (a + 2r) + r = a + 3r

- So each term equals the first term of the progression plus the common difference, added once for every term that precedes it.

- We know that the first term is a and that the nth term u is preceded by (n − 1) terms, each step adding r. We can therefore conclude that

$$u = a + (n-1)\,r$$
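The derivation builds each term by repeatedly adding r. The sketch below does exactly that and checks that it agrees with the closed form u = a + (n − 1)r. Both function names are hypothetical, and n counts from 1.

```python
def nth_term(a: int, r: int, n: int) -> int:
    # The nth term is preceded by n - 1 terms, and each step adds r.
    return a + (n - 1) * r


def nth_term_by_steps(a: int, r: int, n: int) -> int:
    # Build the term the slow way: start at a and add r, (n - 1) times.
    term = a
    for _ in range(n - 1):
        term += r
    return term


for a, r in [(1, 2), (5, -3), (0, 7)]:
    for n in range(1, 20):
        assert nth_term(a, r, n) == nth_term_by_steps(a, r, n)

print(nth_term(1, 2, 4))  # 7, the 4th term of 1, 3, 5, 7
```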

Page 8

Deriving the formula for the nth term

- Return to young Gauss's example, where S = 5050. The first term a is 1, the common difference r is 1 (the numbers go up one at a time), and the number of terms n is 100. The last term is therefore

$$u = a + (n-1)\,r = 1 + (100-1)\cdot 1 = 100$$

- This agrees with the values used in Gauss's theorem.

- Try it yourself:
  - Find the 15th term of the sequence 4, 7, 10, …
  - Note that the 15th term does not represent the sum of the terms, only the value of that term.
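You can check your answer to the exercise without writing out every term by hand. One option is to let Python's built-in `range` generate the progression and compare its last element with the formula. The helper name `check_nth_term` is hypothetical. It requires r ≠ 0, because `range` rejects a zero step.

```python
def check_nth_term(a: int, r: int, n: int) -> int:
    # range(a, a + n*r, r) yields exactly n terms: a, a + r, ..., a + (n - 1)r.
    terms = list(range(a, a + n * r, r))
    formula = a + (n - 1) * r
    assert len(terms) == n and terms[-1] == formula
    return formula


print(check_nth_term(1, 1, 100))   # 100, the last term in Gauss's sum
# print(check_nth_term(4, 3, 15))  # uncomment to check the exercise
```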

Page 9

Algorithm 1: Insertion Sort

2.1 Insertion sort 17

Figure 2.1 Sorting a hand of cards using insertion sort.

…reading our algorithms. What separates pseudocode from “real” code is that in pseudocode, we employ whatever expressive method is most clear and concise to specify a given algorithm. Sometimes, the clearest method is English, so do not be surprised if you come across an English phrase or sentence embedded within a section of “real” code. Another difference between pseudocode and real code is that pseudocode is not typically concerned with issues of software engineering. Issues of data abstraction, modularity, and error handling are often ignored in order to convey the essence of the algorithm more concisely.

We start with insertion sort, which is an efficient algorithm for sorting a small number of elements. Insertion sort works the way many people sort a hand of playing cards. We start with an empty left hand and the cards face down on the table. We then remove one card at a time from the table and insert it into the correct position in the left hand. To find the correct position for a card, we compare it with each of the cards already in the hand, from right to left, as illustrated in Figure 2.1. At all times, the cards held in the left hand are sorted, and these cards were originally the top cards of the pile on the table.
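The card-hand picture can be sketched directly. Start with an empty left hand, pick up cards one at a time, and slide each new card leftward past every larger card already held. To mirror the analogy, this version builds a separate `hand` list instead of sorting in place. `sort_hand` is a hypothetical name, and the card values are arbitrary.

```python
from typing import List


def sort_hand(table: List[int]) -> List[int]:
    hand: List[int] = []                 # the left hand starts empty
    for card in table:                   # take the top card from the pile
        pos = len(hand)
        # Compare with the cards already in the hand, from right to left.
        while pos > 0 and hand[pos - 1] > card:
            pos -= 1
        hand.insert(pos, card)           # the hand stays sorted after every insertion
    return hand


print(sort_hand([7, 4, 10, 2, 5]))  # [2, 4, 5, 7, 10]
```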

We present our pseudocode for insertion sort as a procedure called INSERTION-SORT, which takes as a parameter an array A[1..n] containing a sequence of length n that is to be sorted. (In the code, the number n of elements in A is denoted by A.length.) The algorithm sorts the input numbers in place: it rearranges the numbers within the array A, with at most a constant number of them stored outside the array at any time. The input array A contains the sorted output sequence when the INSERTION-SORT procedure is finished.

Page 10

Pseudocode :: InsertionSort

18 Chapter 2 Getting Started

Figure 2.2 The operation of INSERTION-SORT on the array A = ⟨5, 2, 4, 6, 1, 3⟩. Array indices appear above the rectangles, and values stored in the array positions appear within the rectangles. (a)–(e) The iterations of the for loop of lines 1–8. In each iteration, the black rectangle holds the key taken from A[j], which is compared with the values in shaded rectangles to its left in the test of line 5. Shaded arrows show array values moved one position to the right in line 6, and black arrows indicate where the key moves to in line 8. (f) The final sorted array.
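Figure 2.2's panels can be regenerated as text. The sketch below records the array at the start of each for-loop iteration and once more at the end. It is a 0-indexed Python translation of the insertion sort logic, and the name `insertion_sort_trace` is hypothetical.

```python
from typing import List


def insertion_sort_trace(a: List[int]) -> List[List[int]]:
    a = list(a)                          # work on a copy of the input
    snapshots = []
    for j in range(1, len(a)):           # 0-indexed: j = 1 .. n-1
        snapshots.append(list(a))        # panels (a)-(e): state before inserting a[j]
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]              # shift a larger value one slot to the right
            i -= 1
        a[i + 1] = key                   # drop the key into its place
    snapshots.append(list(a))            # panel (f): the final sorted array
    return snapshots


for label, state in zip("abcdef", insertion_sort_trace([5, 2, 4, 6, 1, 3])):
    print(f"({label})", *state)
```

Running it prints the six states (a) 5 2 4 6 1 3, (b) 2 5 4 6 1 3, (c) 2 4 5 6 1 3, (d) 2 4 5 6 1 3, (e) 1 2 4 5 6 3, (f) 1 2 3 4 5 6.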

INSERTION-SORT(A)
1  for j = 2 to A.length
2      key = A[j]
3      // Insert A[j] into the sorted sequence A[1..j-1].
4      i = j - 1
5      while i > 0 and A[i] > key
6          A[i + 1] = A[i]
7          i = i - 1
8      A[i + 1] = key
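Here is a minimal Python translation of INSERTION-SORT. Python lists are 0-indexed, so the loop starts at j = 1 and the while test uses `i >= 0` instead of `i > 0`. Otherwise each statement corresponds to one numbered pseudocode line. The function name `insertion_sort` is our own choice. As in the pseudocode, only `key`, `i` and `j` live outside the array, so the sort is in place.

```python
from typing import List


def insertion_sort(A: List[int]) -> None:
    """Sort A in place, mirroring lines 1-8 of INSERTION-SORT (0-indexed)."""
    for j in range(1, len(A)):            # line 1
        key = A[j]                        # line 2
        # line 3: insert A[j] into the sorted sequence A[0..j-1]
        i = j - 1                         # line 4
        while i >= 0 and A[i] > key:      # line 5
            A[i + 1] = A[i]               # line 6: shift the larger value right
            i -= 1                        # line 7
        A[i + 1] = key                    # line 8


data = [5, 2, 4, 6, 1, 3]
insertion_sort(data)
print(data)  # [1, 2, 3, 4, 5, 6]
```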

Loop invariants and the correctness of insertion sort

Figure 2.2 shows how this algorithm works for A = ⟨5, 2, 4, 6, 1, 3⟩. The index j indicates the “current card” being inserted into the hand. At the beginning of each iteration of the for loop, which is indexed by j, the subarray consisting of elements A[1..j-1] constitutes the currently sorted hand, and the remaining subarray A[j+1..n] corresponds to the pile of cards still on the table. In fact, elements A[1..j-1] are the elements originally in positions 1 through j-1, but now in sorted order. We state these properties of A[1..j-1] formally as a loop invariant:

At the start of each iteration of the for loop of lines 1–8, the subarray A[1..j-1] consists of the elements originally in A[1..j-1], but in sorted order.
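The loop invariant can be made concrete by checking it at run time. At the start of each iteration, the 0-indexed prefix A[0:j] should equal the sorted version of the original prefix. The sketch below adds that assertion to the sorting loop. It is a testing aid, not a proof, and `insertion_sort_checked` is a hypothetical name.

```python
from typing import List


def insertion_sort_checked(A: List[int]) -> None:
    original = list(A)                    # remember the input to compare against
    for j in range(1, len(A)):
        # Invariant at the start of each iteration (0-indexed):
        # A[0:j] holds exactly the elements originally in A[0:j], in sorted order.
        assert A[:j] == sorted(original[:j]), f"invariant broken at j={j}"
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
    # At termination j = n, so the invariant says the whole array is sorted.
    assert A == sorted(original)


data = [5, 2, 4, 6, 1, 3]
insertion_sort_checked(data)
print(data)  # [1, 2, 3, 4, 5, 6]
```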

We use loop invariants to help us understand why an algorithm is correct. We must show three things about a loop invariant:


MIT, Chapter 2, p. 18

Page 11

Running Time of InsertionSort

26 Chapter 2 Getting Started

INSERTION-SORT(A)                                           cost   times
1  for j = 2 to A.length                                    c1     n
2      key = A[j]                                           c2     n - 1
3      // Insert A[j] into the sorted sequence A[1..j-1].   0      n - 1
4      i = j - 1                                            c4     n - 1
5      while i > 0 and A[i] > key                           c5     $\sum_{j=2}^{n} t_j$
6          A[i + 1] = A[i]                                  c6     $\sum_{j=2}^{n} (t_j - 1)$
7          i = i - 1                                        c7     $\sum_{j=2}^{n} (t_j - 1)$
8      A[i + 1] = key                                       c8     n - 1

The running time of the algorithm is the sum of running times for each statement executed. A statement that takes c_i steps to execute and executes n times will contribute c_i n to the total running time.⁶ To compute T(n), the running time of INSERTION-SORT on an input of n values, we sum the products of the cost and times columns, where t_j denotes the number of times the while-loop test in line 5 is executed for that value of j. This gives

$$T(n) = c_1 n + c_2(n-1) + c_4(n-1) + c_5\sum_{j=2}^{n} t_j + c_6\sum_{j=2}^{n}(t_j - 1) + c_7\sum_{j=2}^{n}(t_j - 1) + c_8(n-1).$$
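To see where the "times" column comes from, the sketch below runs a 0-indexed translation of the algorithm and counts how often each pseudocode line executes. The expected counts are n for line 1 (the loop test runs once more to exit), n − 1 for lines 2, 4 and 8, Σt_j for line 5, and Σ(t_j − 1) for lines 6 and 7. Line 3 is a comment and is not counted. The loops are written as explicit `while True` loops so that every test is counted. This assumes n ≥ 1, and `count_line_executions` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def count_line_executions(A: List[int]) -> Tuple[Counter, List[int]]:
    """Run insertion sort on a copy of A, counting executions of pseudocode lines."""
    A = list(A)
    counts: Counter = Counter()
    t: List[int] = []                 # t_j values for j = 2..n (1-indexed)
    j = 1                             # 0-indexed position of the current key
    while True:
        counts[1] += 1                # line 1: for-loop test, including the final failing one
        if j >= len(A):
            break
        key = A[j]
        counts[2] += 1
        i = j - 1
        counts[4] += 1
        tj = 0
        while True:
            counts[5] += 1            # line 5: while-loop test
            tj += 1
            if not (i >= 0 and A[i] > key):
                break
            A[i + 1] = A[i]
            counts[6] += 1
            i -= 1
            counts[7] += 1
        A[i + 1] = key
        counts[8] += 1
        t.append(tj)
        j += 1
    return counts, t


A = [5, 2, 4, 6, 1, 3]
n = len(A)
counts, t = count_line_executions(A)
expected = {1: n, 2: n - 1, 4: n - 1, 5: sum(t),
            6: sum(tj - 1 for tj in t), 7: sum(tj - 1 for tj in t), 8: n - 1}
assert all(counts[line] == times for line, times in expected.items())
print(t)  # [2, 2, 1, 5, 4]
```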

Even for inputs of a given size, an algorithm’s running time may depend on which input of that size is given. For example, in INSERTION-SORT, the best case occurs if the array is already sorted. For each j = 2, 3, …, n, we then find that A[i] ≤ key in line 5 when i has its initial value of j - 1. Thus t_j = 1 for j = 2, 3, …, n, and the best-case running time is

$$\begin{aligned}
T(n) &= c_1 n + c_2(n-1) + c_4(n-1) + c_5(n-1) + c_8(n-1) \\
&= (c_1 + c_2 + c_4 + c_5 + c_8)\,n - (c_2 + c_4 + c_5 + c_8).
\end{aligned}$$

We can express this running time as an + b for constants a and b that depend on the statement costs c_i; it is thus a linear function of n.
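The best-case claim t_j = 1 can be observed directly. On an already sorted input, the line-5 test fails on its first evaluation for every key. The sketch below counts those tests. `line5_tests` is a hypothetical helper that returns the list of t_j values for j = 2..n in the document's 1-indexed terms.

```python
from typing import List


def line5_tests(A: List[int]) -> List[int]:
    """Return t_j for j = 2..n: how many times the line-5 test runs per key."""
    A, t = list(A), []                   # sort a copy, 0-indexed internally
    for j in range(1, len(A)):
        key, i = A[j], j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
        # i dropped by one per shift, so shifts = (j - 1) - i;
        # the test ran once per shift plus one final failing test.
        t.append(j - i)
    return t


for n in (10, 100, 1000):
    t = line5_tests(list(range(n)))      # already sorted input
    assert t == [1] * (n - 1)            # best case: t_j = 1 for every j
    print(n, sum(t))                     # n - 1 tests in total: linear in n
```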

If the array is in reverse sorted order (that is, in decreasing order), the worst case results. We must compare each element A[j] with each element in the entire sorted subarray A[1..j-1], and so t_j = j for j = 2, 3, …, n. Noting that

⁶ This characteristic does not necessarily hold for a resource such as memory. A statement that references m words of memory and is executed n times does not necessarily reference mn distinct words of memory.


Here, the running time T for an input of size n is given by the formula for T(n) above.

Page 12

Running Time of InsertionSort

- Best-case running time

- In short, this equation behaves as a linear function of n.

 


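$$T(n) = an + b$$

The best case is linear, and no input of size n takes less time than the best case, so the running time of insertion sort is $\Omega(n)$.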

Page 13

Running Time of InsertionSort

- Worst-case running time

- This function grows quadratically in n.

 

2.2 Analyzing algorithms 27

$$\sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1$$

and

$$\sum_{j=2}^{n} (j-1) = \frac{n(n-1)}{2}$$

(see Appendix A for a review of how to solve these summations), we find that in the worst case, the running time of INSERTION-SORT is

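$$\begin{aligned}
T(n) &= c_1 n + c_2(n-1) + c_4(n-1) + c_5\left(\frac{n(n+1)}{2} - 1\right) + c_6\left(\frac{n(n-1)}{2}\right) + c_7\left(\frac{n(n-1)}{2}\right) + c_8(n-1) \\
&= \left(\frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2}\right)n^2 + \left(c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8\right)n - (c_2 + c_4 + c_5 + c_8).
\end{aligned}$$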

We can express this worst-case running time as an² + bn + c for constants a, b, and c that again depend on the statement costs c_i; it is thus a quadratic function of n.
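The worst-case claims can be checked numerically. For a reverse-sorted input, each key moves past the entire sorted prefix, so t_j = j, and the two summation identities above give the totals. The sketch below repeats the hypothetical `line5_tests` helper so that the block runs on its own.

```python
from typing import List


def line5_tests(A: List[int]) -> List[int]:
    """Return t_j for j = 2..n: how many times the line-5 test runs per key."""
    A, t = list(A), []                   # sort a copy, 0-indexed internally
    for j in range(1, len(A)):
        key, i = A[j], j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
        t.append(j - i)                  # shifts (j - 1 - i) plus the final failing test
    return t


for n in (5, 50, 500):
    t = line5_tests(list(range(n, 0, -1)))              # reverse-sorted input
    assert t == list(range(2, n + 1))                    # t_j = j for j = 2..n
    assert sum(t) == n * (n + 1) // 2 - 1                # sum of j from 2 to n
    assert sum(tj - 1 for tj in t) == n * (n - 1) // 2   # sum of (j - 1)
    print(n, sum(t))                                     # grows quadratically in n
```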

Typically, as in insertion sort, the running time of an algorithm is fixed for a given input, although in later chapters we shall see some interesting “randomized” algorithms whose behavior can vary even for a fixed input.

Worst-case and average-case analysis

In our analysis of insertion sort, we looked at both the best case, in which the input array was already sorted, and the worst case, in which the input array was reverse sorted. For the remainder of this book, though, we shall usually concentrate on finding only the worst-case running time, that is, the longest running time for any input of size n. We give three reasons for this orientation.

- The worst-case running time of an algorithm gives us an upper bound on the running time for any input. Knowing it provides a guarantee that the algorithm will never take any longer. We need not make some educated guess about the running time and hope that it never gets much worse.

- For some algorithms, the worst case occurs fairly often. For example, in searching a database for a particular piece of information, the searching algorithm’s worst case will often occur when the information is not present in the database. In some applications, searches for absent information may be frequent.
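The database example can be illustrated with a plain linear search. When the target is absent, all n records must be examined, which is exactly the worst case. The record list and the name `linear_search` below are hypothetical, and a real database would typically use an index instead.

```python
from typing import List, Optional, Tuple


def linear_search(records: List[str], target: str) -> Tuple[Optional[int], int]:
    """Return (index of target or None, number of comparisons made)."""
    comparisons = 0
    for idx, rec in enumerate(records):
        comparisons += 1
        if rec == target:
            return idx, comparisons
    return None, comparisons             # absent: every record was compared


records = ["ana", "luis", "marta", "pedro"]
print(linear_search(records, "ana"))     # (0, 1): best case
print(linear_search(records, "zoe"))     # (None, 4): worst case, target absent
```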


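$$T(n) = an^2 + bn + c$$

The worst case is quadratic, and no input of size n takes longer than the worst case, so the running time of insertion sort is $O(n^2)$.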

Best vs. Worst