Testing and Debugging (Debugging in Haskell)

Complements the previous sections

Original authorized by: John Hughes

http://www.cs.chalmers.se/~rjmh/

Adapted by: Claudio Cesar de Sá

What’s the Difference?

Testing means trying out a program which is believed to work, in an effort to gain confidence that it is correct. If no errors are revealed by thorough testing, then, probably, relatively few errors remain.

Debugging means observing a program which is known not to work, in an effort to localise the error. When a bug is found and fixed by debugging, testing can be resumed to see if the program now works.

This lecture describes recently developed tools to help with each activity.

Debugging

Here’s a program with a bug:

median xs = isort xs !! (length xs `div` 2)

isort = foldr insert []

insert x [] = [x]
insert x (y:ys) | x<y  = x:y:ys
                | x>=y = y:x:ys

Median> median [8,4,6,10,2,7,3,5,9,1]
2
Median> isort [8,4,6,10,2,7,3,5,9,1]
[1,8,4,6,10,2,7,3,5,9]

A test reveals median doesn’t work.

We start trying functions median calls.

isort doesn’t work either.

Debugging Tools

The Manual Approach

We choose cases to try, and manually explore the behaviour of the program, by calling functions with various (hopefully revealing) arguments, and inspecting their outputs.

The Automated Approach

We connect a debugger to the program, which lets us observe internal values, giving us more information to help us diagnose the bug.

The Haskell Object Observation Debugger

Provides a function

observe :: String -> a -> a

which collects observations of its second argument, tagged with the String, and returns the argument unchanged.

Think of it as like connecting an oscilloscope to the program: the program's behaviour is unchanged, but we see more.

(You need to import the library which defines observe in order to use it: add

import Observe

at the start of your program.)
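Where the Observe library is not available, a rough stand-in can be sketched with Debug.Trace from base. The name observeT is hypothetical and this is not how HOOD is implemented: it needs a Show instance, and it prints the whole value as soon as the result is demanded, so it cannot show which parts of a value were actually used.

```haskell
import Debug.Trace (trace)

-- Hypothetical stand-in for observe (not the HOOD implementation):
-- prints the tagged value to stderr when the result is demanded,
-- then returns the value unchanged.
observeT :: Show a => String -> a -> a
observeT tag x = trace (tag ++ " " ++ show x) x

-- Probe each square before it is summed.
sumOfSquares :: Int
sumOfSquares = sum [observeT "n*n" (n * n) | n <- [1 .. 4]]
```

The probe does not change the result: sumOfSquares is still 30.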

Listas> :l listas_haskell.hs
Reading file "listas_haskell.hs":
Reading file "/usr/share/hugs/lib/exts/Observe.lhs":
Reading file "listas_haskell.hs":

Hugs session for:
/usr/share/hugs/lib/Prelude.hs
/usr/share/hugs/lib/exts/Observe.lhs
listas_haskell.hs
Listas>

Making sure that Observe.lhs was loaded ...

What Do Observations Look Like?

Median> sum [observe "n*n" (n*n) | n <- [1..4]]
30

>>>>>>> Observations <<<<<<

n*n 1 4 9 16

We add a "probe" to the program.

The values observed are displayed, titled with the name of the observation.

Observing a List

Median> sum (observe "squares" [n*n | n <- [1..4]])
30

squares (1 : 4 : 9 : 16 : [])

Now there is just one observation, the list itself.

Observing the entire list lets us see the order of values also.

Lists are always observed in "cons" form.

Observing a Pipeline

Median> (sum . observe "squares" . map (\x->x*x)) [1..4]
30

squares (1 : 4 : 9 : 16 : [])

We can add observers to "pipelines" -- long compositions of functions -- to see the values flowing between them.

Observing Counting Occurrences

countOccurrences = map (\ws -> (head ws, length ws))
                 . observe "after groupby"
                 . groupBy (==)
                 . observe "after sort"
                 . sort
                 . observe "after words"
                 . words

Add observations after each stage.

Observing Counting Occurrences

Main> countOccurrences "hello clouds hello sky"
[("clouds",1),("hello",2),("sky",1)]


after groupby (("clouds" : []) : ("hello" : "hello" : []) : ("sky" : []) : [])

after sort ("clouds" : "hello" : "hello" : "sky" : [])

after words ("hello" : "clouds" : "hello" : "sky" : [])

Observing Consumers

An observation tells us not only what value flowed past the observer -- it also tells us how that value was used!

Main> take 3 (observe "xs" [1..10])
[1,2,3]

xs (1 : 2 : 3 : _)

The _ is a "don't care" value -- certainly some list appeared here, but it was never used!

Observing Length

Main> length (observe "xs" (words "hello clouds"))
2

xs (_ : _ : [])

The length function did not need to inspect the values of the elements, so they were not observed!
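The same laziness can be seen without any debugger. A small sketch using only the Prelude (the names are illustrative, not from the slides): if a consumer never inspects part of a value, that part may even be undefined.

```haskell
-- A list whose elements would crash the program if anyone looked at them.
unusedElements :: [Int]
unusedElements = [undefined, undefined]

-- length only walks the spine (the cons cells), so the elements are never forced.
spineOnly :: Int
spineOnly = length unusedElements

-- take 3 inspects only the first three cells; the rest of the list is never evaluated.
firstThree :: [Int]
firstThree = take 3 ([1, 2, 3] ++ undefined)
```

Here spineOnly is 2 and firstThree is [1,2,3], matching the observations (_ : _ : []) and (1 : 2 : 3 : _).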

Observing Functions

We can even observe functions themselves!

Main> observe "sum" sum [1..5]
15

sum { \ (1 : 2 : 3 : 4 : 5 : []) -> 15 }

observe "sum" sum is a function, which is applied to [1..5].

We see arguments and results, for the calls which actually were made!

Observing foldr

Recall that : foldr (+) 0 [1..4] = 1 + (2 + (3 + (4 + 0)))

Let’s check this, by observing the addition function.

Main> foldr (observe "+" (+)) 0 [1..4]
10


+ { \ 4 0 -> 4 , \ 3 4 -> 7 , \ 2 7 -> 9 , \ 1 9 -> 10 }

Observing foldl

We can do the same thing to observe foldl, which behaves as

foldl (+) 0 [1..4] = (((0 + 1) + 2) + 3) + 4

Main> foldl (observe "+" (+)) 0 [1..4]
10


+ { \ 0 1 -> 1 , \ 1 2 -> 3 , \ 3 3 -> 6 , \ 6 4 -> 10 }
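The two call sequences can also be reproduced in plain Haskell, as a sketch that threads a log of calls through the fold instead of using observe. The names Call, foldrCalls and foldlCalls are made up for this illustration.

```haskell
-- One recorded call of (+): (first argument, second argument, result).
type Call = (Int, Int, Int)

-- foldr (+) 0, with the running result paired with a log of calls, oldest first.
foldrCalls :: [Int] -> (Int, [Call])
foldrCalls = foldr step (0, [])
  where
    step x (acc, calls) = let r = x + acc in (r, calls ++ [(x, acc, r)])

-- The same idea for foldl; the accumulator is now the first argument of (+).
foldlCalls :: [Int] -> (Int, [Call])
foldlCalls = foldl step (0, [])
  where
    step (acc, calls) x = let r = acc + x in (r, calls ++ [(acc, x, r)])
```

For [1..4], foldrCalls logs (4,0,4), (3,4,7), (2,7,9), (1,9,10) and foldlCalls logs (0,1,1), (1,2,3), (3,3,6), (6,4,10), the same calls that observe reported.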

How Many Elements Does takeWhile Check?

takeWhile isAlpha "hello clouds hello sky" == "hello"

takeWhile isAlpha selects the alphabetic characters from the front of the list.

How many times does takeWhile call isAlpha?

How Many Elements Does takeWhile Check?

Main> takeWhile (observe "isAlpha" isAlpha) "hello clouds hello sky"
"hello"

isAlpha { \ ' ' -> False , \ 'o' -> True , \ 'l' -> True , \ 'l' -> True , \ 'e' -> True , \ 'h' -> True }

takeWhile calls isAlpha six times -- the last call tells us it’s time to stop.
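The count can be cross-checked without observe. The sketch below is a hand-written variant of takeWhile (the name takeWhileCount is hypothetical) that also returns how many times the predicate was called.

```haskell
import Data.Char (isAlpha)

-- Like takeWhile, but also returns the number of predicate calls made.
takeWhileCount :: (a -> Bool) -> [a] -> ([a], Int)
takeWhileCount _ [] = ([], 0)
takeWhileCount p (x : xs)
  | p x = let (ys, n) = takeWhileCount p xs in (x : ys, n + 1)
  | otherwise = ([], 1) -- the failing call is counted too

helloCount :: (String, Int)
helloCount = takeWhileCount isAlpha "hello clouds hello sky"
```

helloCount is ("hello", 6): five successful calls and the final call on the space.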

Observing Recursion

fac 0 = 1
fac n | n>0 = n * fac (n-1)

Main> observe "fac" fac 6
720

fac { \ 6 -> 720 }

We observe this use of the function.

We did not observe the recursive calls!

Observing Recursion

fac = observe "fac" fac'

fac' 0 = 1
fac' n | n>0 = n * fac (n-1)

Main> fac 6
720

fac { \ 6 -> 720 , \ 5 -> 120 , \ 4 -> 24 , \ 3 -> 6 , \ 2 -> 2 , \ 1 -> 1 , \ 0 -> 1 }

We observe all calls of the fac function.

Debugging median

median xs = observe "isort xs" (isort xs) !! (length xs `div` 2)

Main> median [4,2,3,5,1]
2

isort xs (1 : 4 : 2 : 3 : 5 : [])

Wrong answer: the median is 3.

Wrong (unsorted) result from isort.

Debugging isort

isort :: Ord a => [a] -> [a]
isort = foldr (observe "insert" insert) []

insert { \ 1 [] -> 1 : [] , \ 5 (1 : []) -> 1 : 5 : [] , \ 3 (1 : 5 : []) -> 1 : 3 : 5 : [] , \ 2 (1 : 3 : 5 : []) -> 1 : 2 : 3 : 5 : [] , \ 4 (1 : 2 : 3 : 5 : []) -> 1 : 4 : 2 : 3 : 5 : [] }

All well, except for the last case (inserting 4).

Debugging insert

insert x [] = [x]
insert x (y:ys) | x<y  = observe "x<y" (x:y:ys)
                | x>=y = observe "x>=y" (y:x:ys)

x>=y (1 : 5 : []) (1 : 3 : 5 : []) (1 : 2 : 3 : 5 : []) (1 : 4 : 2 : 3 : 5 : [])

Observe the results from each case.

Only the second case was used!

The Bug!

I forgot the recursive call…

insert x [] = [x]
insert x (y:ys) | x<y  = x:y:ys
                | x>=y = y:insert x ys


Bug fixed!

The right answer
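For reference, the corrected program as one self-contained sketch. The type signatures are additions, and otherwise is used in place of the x>=y guard, which is equivalent here.

```haskell
-- Insert x into an already sorted list, keeping it sorted.
insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y : ys)
  | x < y = x : y : ys
  | otherwise = y : insert x ys -- the recursive call that was missing

-- Insertion sort: insert the elements one by one into an empty list.
isort :: Ord a => [a] -> [a]
isort = foldr insert []

-- The middle element of the sorted list (undefined for an empty list).
median :: Ord a => [a] -> a
median xs = isort xs !! (length xs `div` 2)
```

With this version, median [4,2,3,5,1] is 3 and isort [8,4,6,10,2,7,3,5,9,1] is [1,2,3,4,5,6,7,8,9,10].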

Summary

➢ The observe function provides us with a wealth of information about how programs are evaluated, with only small changes to the programs themselves

➢ That information can help us understand how programs work (foldr, foldl, takeWhile etc.)

➢ It can also help us see where bugs are.

Testing

Testing means trying out a program which is believed to work, in an effort to gain confidence that it is correct.

Testing accounts for more than half the development effort on a large project (I’ve heard figures anywhere from 50% to 80%).

Fixing a bug in one place often causes a failure somewhere else -- so the entire system must be retested after each change. At Ericsson, this can take three months!

"Hacking" vs Systematic Testing

"Hacking"

• Try some examples until the software seems to work.

Systematic testing

• Record test cases, so that tests can be repeated after a modification (regression testing).

• Document what has been tested.

• Establish criteria for when a test is successful -- requires a specification.

• Automate testing as far as possible, so you can test extensively and often.

QuickCheck: A Tool for Testing Haskell Programs

Based on formulating properties, which

•can be tested repeatedly and automatically

•document what has been tested

•define what is a successful outcome

•are a good starting point for proofs of correctness

Properties are tested by selecting test cases at random!

Random Testing?

Is random testing sensible? Surely carefully chosen test cases are more effective?

"By taking 20% more points in a random test, any advantage a partition test might have had is wiped out."

D. Hamlet

✔ QuickCheck can generate 100 random test cases in less time than it takes you to think of one!

✔ Random testing finds common (i.e. important!) errors effectively.

A Simple QuickCheck Property

prop_Sort :: [Int] -> Bool
prop_Sort xs = ordered (sort xs)

Check that the result of sort is ordered.

Random values for xs are generated.

Main> quickCheck prop_Sort
OK, passed 100 tests.

The tests were passed.
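The slides use ordered without defining it. The sketch below assumes one plausible definition and, to stay within base, checks prop_Sort over every small list instead of over random ones. That is exhaustive small-case checking, a swapped-in technique, not what quickCheck does; smallLists and checkSort are names invented here.

```haskell
import Data.List (sort)

-- Assumed definition: the slides use ordered without defining it.
ordered :: [Int] -> Bool
ordered xs = and (zipWith (<=) xs (drop 1 xs))

prop_Sort :: [Int] -> Bool
prop_Sort xs = ordered (sort xs)

-- All lists of length 0..3 over the values -1, 0, 1 (40 lists in total).
smallLists :: [[Int]]
smallLists = concatMap (\k -> sequence (replicate k [-1, 0, 1])) [0 .. 3]

-- Deterministic stand-in for quickCheck prop_Sort.
checkSort :: Bool
checkSort = all prop_Sort smallLists
```

The property itself is unchanged, so it can be handed to quickCheck as is once the QuickCheck library is imported.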

Some QuickCheck Details

import QuickCheck

prop_Sort :: [Int] -> Bool
prop_Sort xs = ordered (sort xs)

Main> quickCheck prop_Sort
OK, passed 100 tests.

We must import the QuickCheck library.

The type of a property must not be polymorphic.

quickCheck is an (overloaded) higher order function!

We give properties names beginning with "prop_" so we can easily find and test all the properties in a module.

A Property of insert

prop_Insert :: Int -> [Int] -> Bool

prop_Insert x xs = ordered (insert x xs)

Main> quickCheck prop_Insert
Falsifiable, after 4 tests:
-2
[5,-2,-5]

Whoops! This list isn’t ordered!

A Corrected Property of insert

prop_Insert :: Int -> [Int] -> Property

prop_Insert x xs = ordered xs ==> ordered (insert x xs)

Main> quickCheck prop_Insert
OK, passed 100 tests.

Result is no longer a simple Bool.

Read ==> as "implies": if xs is ordered, then so is (insert x xs).

Discards test cases which are not ordered.
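The logical content of the corrected property can be sketched with a plain Boolean implication. The names implies, prop_InsertBool and relevant are hypothetical, ordered is an assumed definition, and insert is taken from Data.List on the assumption that it behaves like a correct insert. The difference from ==> is that a false precondition simply counts as a pass here, whereas QuickCheck discards such a test case and generates another.

```haskell
import Data.List (insert)

-- Assumed definition: the slides use ordered without defining it.
ordered :: [Int] -> Bool
ordered xs = and (zipWith (<=) xs (drop 1 xs))

-- Plain Boolean implication: a false precondition makes the whole thing True.
implies :: Bool -> Bool -> Bool
implies p q = not p || q

prop_InsertBool :: Int -> [Int] -> Bool
prop_InsertBool x xs = ordered xs `implies` ordered (insert x xs)

-- The test cases on which the precondition holds (the ones ==> would keep).
relevant :: [[Int]] -> [[Int]]
relevant = filter ordered
```

The earlier counterexample no longer fails: prop_InsertBool (-2) [5,-2,-5] is True, because the precondition does not hold for that list.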

Using QuickCheck to Develop Fast Queue Operations

What we’re going to do:

•Explain what a queue is, and give slow implementations of the queue operations, to act as a specification.

•Explain the idea behind the fast implementation.

•Formulate properties that say the fast implementation is "correct".

•Test them with QuickCheck.

What is a Queue?

Leave from the front

Join at the back

Examples

• Files to print

• Processes to run

• Tasks to perform

What is a Queue?

A queue contains a sequence of values. We can add elements at the back, and remove elements from the front.

We’ll implement the following operations:

•empty :: Queue a -- an empty queue

•isEmpty :: Queue a -> Bool -- tests if a queue is empty

•add :: a -> Queue a -> Queue a -- adds an element at the back

•front :: Queue a -> a -- the element at the front

•remove :: Queue a -> Queue a -- removes an element from the front

The Specification: Slow but Simple

type Queue a = [a]

empty = []

isEmpty q = q==empty

add x q = q++[x]

front (x:q) = x

remove (x:q) = q

Addition takes time depending on the number of items in the queue!

The Idea: Store the Front and Back Separately

Old: a b c d e f g h i j

Fast to remove

Slow to add

New:

Front: a b c d e

Back: j i h g f

Fast to add

Fast to remove

Periodically move the back to the front.

The Fast Implementation

type Queue a = ([a],[a])

flipQ ([],b) = (reverse b,[])
flipQ (x:f,b) = (x:f,b)

emptyQ = ([],[])
isEmptyQ q = q==emptyQ
addQ x (f,b) = (f,x:b)
removeQ (x:f,b) = flipQ (f,b)
frontQ (x:f,b) = x

Make sure the front is never empty when the back is not.

Relating the Two Implementations

What list does a "double-ended" queue represent?

retrieve :: Queue a -> [a]
retrieve (f, b) = f ++ reverse b

What does it mean to be correct?

✔ retrieve emptyQ == empty

✔ isEmptyQ q == isEmpty (retrieve q)

✔ retrieve (addQ x q) == add x (retrieve q)

✔ retrieve (removeQ q) == remove (retrieve q)

and so on.

Using Retrieve Guarantees Consistent Results

Example

frontQ (removeQ (addQ 1 (addQ 2 emptyQ)))

== front (retrieve (removeQ (addQ 1 (addQ 2 emptyQ))))

== front (remove (retrieve (addQ 1 (addQ 2 emptyQ))))

== front (remove (add 1 (retrieve (addQ 2 emptyQ))))

== front (remove (add 1 (add 2 (retrieve emptyQ))))

== front (remove (add 1 (add 2 empty)))
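The last line of the chain uses only the specification, so it can be evaluated directly. A small sketch restating the slow operations; the explicit error cases for empty queues are an addition, since the slides leave those cases undefined.

```haskell
-- The list-based specification: the head of the list is the front of the queue.
type Queue a = [a]

empty :: Queue a
empty = []

add :: a -> Queue a -> Queue a
add x q = q ++ [x]

front :: Queue a -> a
front (x : _) = x
front [] = error "front: empty queue"

remove :: Queue a -> Queue a
remove (_ : q) = q
remove [] = error "remove: empty queue"

-- The final expression of the chain: 2 joins first, so 2 also leaves first.
example :: Int
example = front (remove (add 1 (add 2 empty)))
```

Step by step: add 2 empty is [2], add 1 [2] is [2,1], remove gives [1], and front gives 1.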

QuickChecking Properties

prop_Remove :: Queue Int -> Bool
prop_Remove q = retrieve (removeQ q) == remove (retrieve q)

Main> quickCheck prop_Remove
4
Program error: {removeQ ([],sized_v1740 (instArbitrary_v1…

Removing from an empty queue!

Correcting the Property

prop_Remove :: Queue Int -> Property
prop_Remove q = not (isEmptyQ q) ==> retrieve (removeQ q) == remove (retrieve q)

Main> quickCheck prop_Remove
0
Program error: {removeQ ([],[Arbitrary_arbitrary instArbitrary…

How can this be?

Making Assumptions Explicit

We assumed that the front of a queue will never be empty if the back contains elements!

Let’s make that explicit:

goodQ :: Queue a -> Bool
goodQ ([],[]) = True
goodQ (x:f,b) = True
goodQ ([],x:b) = False

prop_Remove q = not (isEmptyQ q) && goodQ q ==> retrieve (removeQ q) == remove (retrieve q)

NOW IT WORKS!

How Do We Know Only Good Queues Arise?

Queues are built by add and remove:

addQ x (f,b) = (f,x:b)
removeQ (x:f,b) = flipQ (f,b)

New properties:

prop_AddGood x q = goodQ q ==> goodQ (addQ x q)

prop_RemoveGood q = not (isEmptyQ q) && goodQ q ==> goodQ (removeQ q)

Whoops!

Main> quickCheck prop_AddGood
Falsifiable, after 0 tests:
2
([],[])

addQ x (f,b) = (f,x:b)
removeQ (x:f,b) = flipQ (f,b)

See the bug?

Whoops!

Main> quickCheck prop_AddGood
Falsifiable, after 0 tests:
2
([],[])

Adding to the empty queue puts the new element in the back and leaves the front empty, so addQ must also call flipQ:

addQ x (f,b) = flipQ (f,x:b)
removeQ (x:f,b) = flipQ (f,b)
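Putting the pieces together, here is a self-contained sketch of the fast queue with the corrected addQ, checked on a sample of queues built only with addQ and removeQ. The sample (reachable) and the two checks are assumptions made for this illustration; they are deterministic stand-ins, not the QuickCheck properties themselves, and the explicit error case in removeQ is an addition.

```haskell
type Queue a = ([a], [a])

-- Restore the invariant: the front is empty only if the back is empty too.
flipQ :: Queue a -> Queue a
flipQ ([], b) = (reverse b, [])
flipQ q = q

emptyQ :: Queue a
emptyQ = ([], [])

addQ :: a -> Queue a -> Queue a
addQ x (f, b) = flipQ (f, x : b)

removeQ :: Queue a -> Queue a
removeQ (_ : f, b) = flipQ (f, b)
removeQ ([], _) = error "removeQ: empty (or bad) queue"

retrieve :: Queue a -> [a]
retrieve (f, b) = f ++ reverse b

goodQ :: Queue a -> Bool
goodQ ([], _ : _) = False
goodQ _ = True

-- A sample of queues: add 1..n to the empty queue, then remove k items.
reachable :: [Queue Int]
reachable =
  [ iterate removeQ (foldl (flip addQ) emptyQ [1 .. n]) !! k
  | n <- [0 .. 5], k <- [0 .. n] ]

-- Stand-ins for prop_AddGood and for the retrieve property of addQ.
allGood, addAgrees :: Bool
allGood = all goodQ reachable && all (goodQ . addQ 0) reachable
addAgrees = all (\q -> retrieve (addQ 0 q) == retrieve q ++ [0]) reachable
```

Both allGood and addAgrees evaluate to True on this sample. With the original addQ, the very first step, addQ 1 emptyQ, would already give the bad queue ([],[1]).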

Looking Back

•Formulating properties let us define precisely how the fast queue operations should behave.

•Using QuickCheck found a bug, and revealed hidden assumptions which are now explicitly stated.

•The property definitions remain in the program, documenting exactly what testing found to hold, and providing a ready made test-bed for any future versions of the Queue library.

•We were forced to reason much more carefully about the program’s correctness, and can have much greater confidence that it really works.

Summary

• Testing is a major part of any serious software development.

• Testing should be systematic, documented, and repeatable.

• Automated tools can help a lot.

• QuickCheck is a state-of-the-art testing tool for Haskell.

The remaining slides discuss an important subtlety when using QuickCheck.

Testing the Buggy insert

prop_Insert :: Int -> [Int] -> Property

prop_Insert x xs = ordered xs ==> ordered (insert x xs)

Main> quickCheck prop_Insert
Falsifiable, after 51 tests:
5
[-3,4]

Yields [-3,5,4]

Why so many tests?

Observing Test Data

prop_Insert :: Int -> [Int] -> Property
prop_Insert x xs = ordered xs ==> collect (length xs) (ordered (insert x xs))

Main> quickCheck prop_Insert
OK, passed 100 tests.
43% 0.
37% 1.
11% 2.
8% 3.
1% 4.

Collect values during testing.

Distribution of length xs.

Random lists which happen to be ordered are likely to be short!
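A rough way to see why, as a sketch that counts instead of sampling: enumerate every list of a given length over a tiny value range and count how many are ordered. This is exhaustive enumeration, not what QuickCheck does, and ordered is an assumed definition; listsOfLength, orderedShare and shares are names invented here.

```haskell
-- Assumed definition: the slides use ordered without defining it.
ordered :: [Int] -> Bool
ordered xs = and (zipWith (<=) xs (drop 1 xs))

-- All lists of length k over the values 0, 1, 2.
listsOfLength :: Int -> [[Int]]
listsOfLength k = sequence (replicate k [0, 1, 2])

-- (number of ordered lists, number of all lists) of length k.
orderedShare :: Int -> (Int, Int)
orderedShare k = (length (filter ordered ls), length ls)
  where
    ls = listsOfLength k

shares :: [(Int, Int)]
shares = map orderedShare [0 .. 4]
```

shares is [(1,1),(3,3),(6,9),(10,27),(15,81)]: every list of length 0 or 1 is ordered, but only 15 of the 81 lists of length 4 are, so a precondition like ordered xs filters out most of the longer test cases.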

A Better Property

prop_Insert :: Int -> Property
prop_Insert x = forAll orderedList
  (\xs -> collect (length xs) (ordered (insert x xs)))

Main> quickCheck prop_Insert2
OK, passed 100 tests.
22% 2.
15% 0.
14% 1.
8% 6.
8% 5.
8% 4.
8% 3.
4% 8.
3% 9.
3% 11.
2% 12.
2% 10.
1% 7.
1% 30.
1% 13.

Read this as: ∀ xs ∈ orderedList. …

What is forAll?

A higher order function!

forAll :: (Show a, Testable b) => Gen a -> (a -> b) -> Property

Gen a: a generator for test data of type a.

(a -> b): a function which, given a generated a, produces a testable result.

forAll orderedList
  (\xs -> collect (length xs) (ordered (insert x xs)))

What is orderedList?

A test data generator:

orderedList :: Gen [Int]

Gen a, a "generator for a", behaves like IO a, "a command producing a".

Some primitive generators

arbitrary :: Arbitrary a => Gen a
oneof :: [Gen a] -> Gen a
frequency :: [(Int,Gen a)] -> Gen a

Defining orderedList

orderedList :: Gen [Int]
orderedList = do n <- arbitrary
                 listFrom n
  where listFrom n = frequency
          [(1, return []),
           (4, do m <- arbitrary
                  ns <- listFrom (n + abs m)
                  return (n:ns))]

We can use the do syntax to write generators, like IO, but we cannot mix Gen and IO!

Choose an n, and make a list of elements >= n.

20% of the time, just stop.

80% of the time, generate another list ns of elements >= n, and return n:ns.

Choose a number >= n.
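The deterministic core of listFrom can be sketched without Gen: given the starting value and the increments that arbitrary happened to produce, the resulting list is a running sum. The function orderedFrom is hypothetical and only mirrors the arithmetic of the generator, not its random choices; ordered is an assumed definition.

```haskell
-- n is the first element; each m is one increment the generator might have
-- drawn (its sign is ignored via abs). One increment is consumed per element,
-- so the list has as many elements as there are increments.
orderedFrom :: Int -> [Int] -> [Int]
orderedFrom n ms = init (scanl (\a m -> a + abs m) n ms)

-- Assumed definition: the slides use ordered without defining it.
ordered :: [Int] -> Bool
ordered xs = and (zipWith (<=) xs (drop 1 xs))
```

For example, orderedFrom 3 [-2,0,5] is [3,5,5]. Because every step adds a non-negative amount, the result is ordered by construction, which is why forAll orderedList never has to discard a test case.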
