the mysteries of dropbox€¦ · bug in the test. we shouldn’t generatenonsense tests that abuse...

49
The Mysteries of Dropbox John Hughes

Upload: others

Post on 17-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

The Mysteries ofDropbox

John Hughes

Page 2: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

File Synchronizer Usage

400 million (June 2015)

240 million (Oct 2014)

250 million (Nov 2014)

Page 3: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

What do they do?

How can we test them?

Are they trustworthy?

Page 4: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

TESTING

Hand written test cases

Generated test cases

Page 5: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

QuickCheck

1999—invented by Koen Claessen and myself, for Haskell

2006—Quviq founded marketing Erlang version

Many extensions

Finding deep bugs for Ericsson, Volvo Cars, Basho, etc…

Page 6: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Why Generate Tests?

•Much wider variety!

•More confidence!

•Less work!

Page 7: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Example: Testing a Queue withQuickCheck

• API:• q:new(Size) – create a queue

• q:put(Q, N) – put N into the queue

• q:get(Q) – remove and return the firstelement

Page 8: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

A Generated Failing Testq:new(1) -> {ptr,...}q:put({ptr,...}, 1) -> okq:get({ptr,...}) -> 1q:put({ptr,...}, -1) -> okq:get({ptr,...}) -> -1q:put({ptr,...}, 0) -> okq:put({ptr,...}, 1) -> okq:put({ptr,...}, 0) -> okq:put({ptr,...}, -1) -> okq:get({ptr,...}) -> -1

Reason:Post-condition failed:-1 /= 0

Quitelong and boring!

Page 9: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

A Shrunk Failing Test

q:new(1) -> {ptr, ...}q:put({ptr, ...}, 0) -> okq:put({ptr, ...}, 1) -> okq:get({ptr, ...}) -> 1

Reason:Post-condition failed:1 /= 0

We made a queue of size 1…

…and put TWO things into it!

Page 10: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Bug in the code

We should have got an exception

Bug in the test

We shouldn’tgenerate nonsensetests that abuse the API

Page 11: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

QuickCheck

API under test

A minimal failingexample

Page 12: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell
Page 13: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

How can we tell if a test passed?

State transitions

Postconditions

Page 14: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Modelling a queue

new(1)

put(…,0)

put(…,1)

get(…)

Model

[]

[0]

[0,1]

[1]

Postcondition

result/=NULL

result==0

Page 15: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

But what about Dropbox?

VM

VM

VM

Laptop

Dropbox service

Read and write files

Page 16: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Goals

• A simple model of what a file synchronizer does, that the user can understand without reference to implementation details

• Ideally, a model that works for many different synchronizers

Page 17: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

What’s the model?

• Contents of each file?

• Contents of each file on each node?

write(”a”)

read()missing

write(”a”)

read()”a”

Backgroundaction

0.5s

1.0s

Page 18: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

A new approach

Observatione.g. read() ”a”

State transitione.g. write(”b”)

Initial state

All possible background actions

Page 19: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

No matchingobservations means the test fails

Explanation

Page 20: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

What background operations?

write(”a”)

read()”a”

write(”b”)

write(”c”)

read()”b”

Page 21: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

What background operations?

write(”a”)

read()”a”

write(”b”)

write(”c”)

read()”b”

Server

Page 22: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Conflicts

write(”a”)

read()”a”

write(”b”)

”b” will appear in a conflict file

Page 23: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Conflicts

write(”a”)

read()”b”

write(”b”)

”a” may or may not appear in a conflict file

”a”

Observe the value overwritten

will not

Page 24: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Conflicts

write(”a”)

read()”b”

write(”b”)

”a” may or may not appear in a conflict file

missing

will

Page 25: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Observing conflict files

• Conflict files do not appear immediately!Wait for a stable state to check for them

stabilize() (V,C)

The final value in the file(that all nodes converge to)

The set of values in conflict files

Page 26: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

stabilize() (V,C)How long should we wait?

• Until all nodes agree on file contents (V) and conflict files (C)

• Until the Dropbox daemon on each node claims to be idle

• Until we see what we expect!

BUT NOT TOO

LONG!!!

Page 27: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Our model

• Global value• Global conflict set

• For each node:• Local value• ”Fresh” or ”Stale”• ”Clean” or ”Dirty”

On the ”server”

”Stale” meansneeds to downloadfrom the server

”Dirty” meansneeds to upload to the server

Page 28: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

read() V

Observes:Local value is V

State transition:None

Page 29: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

write(Vnew) Vold

Observes:Local value is Vold

State transition:Local value becomes VnewThis node becomes dirty

Page 30: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

stabilize() (V,C)

Observes:Global value is VConflict set is CAll nodes are fresh and clean

State transition:None

Page 31: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

download()

Observes:This node is stale and clean

State transition:Local value becomes global valueThis node becomes fresh

Page 32: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

upload()

Observes:This node is dirty

State transition:This node becomes cleanif this node is freshthen Global value becomes local value

All other nodes become staleelse Local value is added to conflicts

First upload wins

Page 33: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Does the model match reality?

write(”a”) missingwrite(”a”) missing

stabilize() (”a”,{})

in conflict

Where is it?

Page 34: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

A value does not conflict with itself

Page 35: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

upload()Observes:

This node is dirty

State transition:This node becomes cleanif local value /= global value then

if this node is freshthen Global value becomes local value

All other nodes become staleelse Local value is added to conflicts

Page 36: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Another inconsistency

Wr(”a”)●

Rd()”a””a”

Wr(●)”a”

Rd()●●

Wr(”b”)”a”

in conflict

Rd()”b””b”

But ”b” should be in a conflict file!

Page 37: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

’missing’ loses everyconflict

Page 38: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

upload()Observes:

This node is dirty

State transition:This node becomes cleanif local value /= global value then

if this node is fresh or global value is missingthen Global value becomes local value

All other nodes become staleelse Local value is added to conflicts

Page 39: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

So far…

• We’re fitting the model to the implementation

Why?• Because Dropbox have thought harder about

synchronization than we have!

For each inconsistency:• Ask ”Is this the intended behaviour?”

Page 40: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Surprises

Page 41: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Dropbox can delete a newlycreated file

Wr(”a”)●

Wr(●)”a”

Wr(”b”)”a”

Wr(”c”)●

Rd()●

It’s gone!!

We’d expect”b” or ”c”!!

Page 42: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Dropbox can recreate deleted files

Wr(”a”)●

Wr(●)”a”

Rd()”a”

What??

stabilize() (”a”,{})

Page 43: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Dropbox can lose data completely

Wr(”a”)●Wr(”b”)”a”Rd()”b”

stabilize() (”b”,{}) (”a”,{})

Dirty, butbehaves as clean

Page 44: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Dropbox can lose data completely

Wr(”a”)●Wr(”b”)”a”Rd()”b”

stabilize() (”c”,{})

Wr(”c”)”a”Lostaltogether!!

Page 45: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

What did we do?

• Tested a non-deterministic system by searching for explanations using a model with hidden actions

• Used QuickCheck’s minimal failing tests to refinethe model, until it matched the intended behaviour

• Now minimal failing tests reveal unintended system behaviour

Page 46: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

What do Dropbox say?

• The synchronization team has reproduced the buggy behaviours

• They’re rare failures which occur under very special circumstances

• They’re developing fixes

Page 47: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Synchronization is subtle!

• There’s much more to do…

• Add directories!• Directories and files with the same names• Conflicts between deleting a directory and writing a file

in it• …

• More file synchronizers!

Page 48: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

Coming out in April,

IEEE International Conference on Software Testing,

Chicago

Page 49: The Mysteries of Dropbox€¦ · Bug in the test. We shouldn’t generatenonsense tests that abuse the API. QuickCheck. API under test. A minimal failing example. How can we tell

www.quviq.com