Exploratory data analysis

Karl Broman

Biostatistics & Medical Informatics, UW–Madison

kbroman.org

github.com/kbroman

@kwbroman

Slides: kbroman.org/BMI773/eda.pdf

What is exploratory data analysis?

Tukey: Looking at data to see what it seems to say.

It is important to understand what you can do before you learn to measure how well you seem to have done it.

Uses of EDA

▶ Get a sense of things
▶ Data diagnostics (quality control)
▶ Hoping for an “a-ha” moment
▶ Following up “huh” moments

Data diagnostics: principles

▶ What might have gone wrong?
▶ How could it be revealed?
▶ Make lots of plots
  – scatterplots
  – plots against time
  – consider taking logs
▶ Check consistency between files
▶ Re-calculate derived variables and check that they match
▶ Outliers
  – Real or error?
  – Are the results affected?
▶ Don’t trust anyone, including yourself
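One of the principles above, re-calculating derived variables and checking that they match, can be sketched in a few lines of Python. This is only an illustration: the records, the column names (wt_6wk, wt_10wk, gain) and the tolerance are made up for the example and do not come from the slides.

```python
import math

# Made-up records; the column names (wt_6wk, wt_10wk, gain) are assumptions.
# 'gain' is a derived column that should equal wt_10wk - wt_6wk.
records = [
    {"id": "M1", "wt_6wk": 18.3, "wt_10wk": 24.1, "gain": 5.8},
    {"id": "M2", "wt_6wk": 20.0, "wt_10wk": 27.5, "gain": 7.5},
    {"id": "M3", "wt_6wk": 19.2, "wt_10wk": 25.0, "gain": 8.5},  # stored value is off
]

def mismatched_gain(rows, tol=1e-6):
    """Return the ids whose stored gain differs from the recomputed one."""
    bad = []
    for row in rows:
        recomputed = row["wt_10wk"] - row["wt_6wk"]
        if not math.isclose(recomputed, row["gain"], abs_tol=tol):
            bad.append(row["id"])
    return bad

bad_ids = mismatched_gain(records)
print(bad_ids)
```

If the stored values were rounded before saving, the tolerance would need to be loosened to match that rounding.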

Batch effect

[Figure: scatterplot of IL3 (y-axis, 0 to 600) against mouse index (x-axis, 0 to 600).]

[Figure: the same data on a log scale, log10 IL3 (y-axis, −1 to 2) against mouse index.]
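A numeric companion to plotting a measurement against index on a log scale is to compare medians of log10 values over consecutive blocks of the measurement order. The sketch below assumes positive values and a block size that roughly matches the batches; both the data and the block size of four are invented for illustration.

```python
import math
from statistics import median

def block_medians(values, block_size):
    """Median of log10(value) in consecutive blocks of the measurement order."""
    logs = [math.log10(v) for v in values]  # requires strictly positive values
    return [median(logs[i:i + block_size])
            for i in range(0, len(logs), block_size)]

# Made-up measurements in index order; the middle block of four is shifted upward.
il3 = [10, 12, 9, 11, 100, 120, 90, 110, 10, 11, 12, 9]
meds = block_medians(il3, 4)
print([round(m, 2) for m in meds])
```

A block whose median sits far from its neighbours is worth a closer look; it does not by itself establish that a batch effect is present.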

Messed up units

[Figure: scatterplot of adipose weight (mg) (y-axis, 0 to 1400) against mouse index (x-axis, 0 to 500).]
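One way a units mix-up might be flagged numerically is to look for values that sit orders of magnitude away from the median. The following is a rough sketch with invented values and an arbitrary factor of 100; it is not the check used for the slide.

```python
from statistics import median

def suspect_units(values, factor=100.0):
    """Indices of positive values at least `factor` times below or above the median."""
    med = median(values)
    return [i for i, v in enumerate(values)
            if v > 0 and (v * factor <= med or v >= med * factor)]

# Made-up adipose weights, nominally in mg; two look as if they were entered in g.
adipose_mg = [812.0, 950.0, 1.02, 1104.0, 0.87, 760.0]
flagged = suspect_units(adipose_mg)
print(flagged)
```

Flagged values still need to be checked against the original records before anything is rescaled.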

Outliers

[Figure: scatterplot of 10 wk body weight (g) (y-axis, 15 to 30) against 6 wk body weight (g) (x-axis, 15 to 25).]

Weird stuff I’ve seen

▶ A 500-worksheet Excel file where the middle 100 worksheets have the variables arranged in a different order
▶ Weird rounding patterns
▶ Missing values that shouldn’t be, because derived values are not missing
▶ Categorical data with inconsistent categories
▶ Missing value codes that weren’t mentioned and that could be real values (e.g., 999)
▶ OMG dates
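Unannounced missing-value codes such as 999 often show up as a value at the extreme of the range that recurs suspiciously often. A minimal sketch of that check follows; the data and the threshold of three repeats are assumptions made for the example.

```python
from collections import Counter

def repeated_extremes(values, min_count=3):
    """Values equal to the min or max that recur often: possible missing-value codes."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    return {v: counts[v] for v in (lo, hi) if counts[v] >= min_count}

# Made-up measurements in which 999 stands in for "missing".
glucose = [312, 298, 999, 405, 999, 350, 999, 287]
codes = repeated_extremes(glucose)
print(codes)
```

Whether a flagged value is a code or a real measurement has to be settled by asking whoever produced the data.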

Weird rounding

Identifiers

▶ Are the subject IDs unique?
▶ Are there subject or gene IDs that don’t fit the typical pattern?
  – 1e5 vs 100000
  – hyphens turned into periods
  – IDs that became dates
▶ Subjects in one file but not in another and vice versa
  – Real, or messed up IDs?
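The identifier checks listed above (uniqueness, conformity to a pattern, and presence in both files) can be bundled into one small report. This is a sketch under assumptions: the ID lists are invented, and the pattern `Mouse` followed by four digits is a guess at a typical format, not something specified in the slides.

```python
import re
from collections import Counter

def id_report(ids_a, ids_b, pattern=r"Mouse\d{4}"):
    """Summarize duplicate, oddly formatted, and unmatched IDs across two files."""
    dup = sorted(i for i, n in Counter(ids_a).items() if n > 1)
    odd = sorted(i for i in set(ids_a) | set(ids_b)
                 if not re.fullmatch(pattern, i))
    only_a = sorted(set(ids_a) - set(ids_b))
    only_b = sorted(set(ids_b) - set(ids_a))
    return {"duplicates": dup, "odd_pattern": odd,
            "only_in_a": only_a, "only_in_b": only_b}

# Made-up ID columns from two hypothetical files.
pheno_ids = ["Mouse3051", "Mouse3052", "Mouse3052", "Mouse3053", "3-Mar"]
geno_ids = ["Mouse3051", "Mouse3052", "Mouse3054", "Mouse.3053"]
report = id_report(pheno_ids, geno_ids)
print(report)
```

Here "Mouse3053" in one file and "Mouse.3053" in the other would both appear as unmatched, which is exactly the "real, or messed up IDs?" question.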

Missing values

▶ As intended?
▶ Below detection limit?
▶ Telling you something about sample quality?
▶ Introducing bias?

Fitting a model can be useful

[Figure: body weight (g) (y-axis, 0 to 35) plotted against week (x-axis, 1 to 15).]

Biggest change vs 2nd difference

[Figure: scatterplot of max absolute 2nd difference (y-axis, 2 to 10) against max absolute change (x-axis, 2 to 8).]
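The two summaries named on this slide can be computed per growth curve as follows. This is a sketch that assumes equally spaced weekly measurements with no missing values; the example weights are invented.

```python
def max_abs_change(y):
    """Largest absolute difference between consecutive measurements."""
    return max(abs(b - a) for a, b in zip(y, y[1:]))

def max_abs_second_diff(y):
    """Largest absolute second difference, |y[i-1] - 2*y[i] + y[i+1]|."""
    return max(abs(a - 2 * b + c) for a, b, c in zip(y, y[1:], y[2:]))

# Made-up weekly body weights (g); the fourth value looks out of line.
weights = [12.0, 15.0, 18.0, 26.0, 22.0, 24.0]
change = max_abs_change(weights)
second = max_abs_second_diff(weights)
print(change, second)
```

A single bad value produces a jump up followed by a jump back, so it inflates the second difference more than the first difference, which is why plotting one against the other helps to separate errors from genuine rapid growth.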

Fit a smooth curve

[Figure: body weight (g) (y-axis, 0 to 35) plotted against week (x-axis, 1 to 15), with a smooth curve.]

Residuals

[Figure: absolute value of relative residual (%) (y-axis, 5 to 25) against index (x-axis, 0 to 1200).]
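The absolute relative residual plotted here can be illustrated with any smoother. The sketch below uses a running median purely as a stand-in, since the slides do not say which smoother was fitted; the weights are invented and are assumed to be positive.

```python
from statistics import median

def running_median(y, half_width=1):
    """Stand-in smoother: median over a window centred on each point."""
    n = len(y)
    return [median(y[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]

def abs_relative_residuals(y, half_width=1):
    """100 * |observed - fitted| / fitted at each point (fitted must be positive)."""
    fitted = running_median(y, half_width)
    return [100.0 * abs(obs - fit) / fit for obs, fit in zip(y, fitted)]

# Made-up weekly body weights (g); the fourth value looks out of line.
weights = [12.0, 15.0, 18.0, 26.0, 22.0, 24.0]
resid = abs_relative_residuals(weights)
worst = max(range(len(resid)), key=resid.__getitem__)
print(worst, round(resid[worst], 1))
```

Sorting all measurements by this quantity gives a short list of points to inspect by eye.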

Follow up artifacts

They might be the most interesting results

Attie project

∼500 B6 × BTBR intercross mice, all ob/ob

▶ Genotypes at 2057 SNPs (Affymetrix arrays)
▶ Gene expression in six tissues (Agilent arrays)
  – adipose
  – gastrocnemius muscle
  – hypothalamus
  – pancreatic islets
  – kidney
  – liver
▶ Numerous clinical phenotypes (e.g., body weight, insulin and glucose levels)

Intercross

[Diagram: parental strains P1 and P2 are crossed to give the F1 generation; F1 × F1 gives the F2 generation.]

Sex and the X chr

[Diagram: the BTBR × B6 intercross (F1, F2), with females and males distinguished.]

Strong eQTL

[Figure: LOD score (y-axis, 0 to 150) by chromosome (1 to 19, X) for probe 499541 (on chr 1) and probe 10002916257 (on chr 13).]

E vs G

[Figure: expression of 499541 (y-axis, −1.0 to 0.0) against genotype at rs13476158 (BB, BR, RR).]

kNN classifier

[Figure: expression of 499541 (y-axis, −1.0 to 0.0) against genotype at rs13476158 (BB, BR, RR), as on the previous slide.]

E vs G

[Figure: expression of 10004035488 (y-axis, −0.6 to 0.4) against expression of 518187 (x-axis, −1.0 to 0.0), with points marked by genotype at rs6244221 (BB, BR, RR).]

Basic scheme

[Diagram: three matrices with mice as rows: expression traits (mice × transcripts), observed eQTL genotypes (mice × eQTL), and inferred eQTL genotypes (mice × eQTL).]

Prop’n mismatches

[Figure: proportion of mismatches (color scale, 0.0 to 1.0) for each pair of mRNA sample (y-axis) and DNA sample (x-axis), shown for samples 1 to 100 and for samples 201 to 300.]
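The scheme on the preceding slides (classify eQTL genotypes from expression with kNN, then compare inferred and observed genotypes for every pair of mRNA and DNA samples) might be sketched as below. It is a toy version under assumptions: a one-dimensional classifier with k = 3, invented expression values and genotypes, and no handling of missing data.

```python
from collections import Counter

def knn_predict(train_x, train_labels, x, k=3):
    """Majority genotype among the k training expression values closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

def mismatch_matrix(observed, inferred):
    """Entry [m][d]: proportion of eQTL where mRNA sample m and DNA sample d disagree."""
    return [[sum(a != b for a, b in zip(inf, obs)) / len(obs) for obs in observed]
            for inf in inferred]

# Step 1 (toy): infer one genotype from one expression value.
expr = [-1.0, -0.9, -0.5, -0.45, 0.0, 0.05]
geno = ["BB", "BB", "BR", "BR", "RR", "RR"]
call = knn_predict(expr, geno, -0.48)

# Step 2 (toy): observed genotypes per DNA sample, inferred per mRNA sample.
# The mRNA rows for samples 1 and 2 are deliberately swapped.
observed = [["BB", "BR", "RR", "BB"],
            ["BR", "RR", "BB", "BR"],
            ["RR", "BB", "BR", "RR"]]
inferred = [observed[0], observed[2], observed[1]]
props = mismatch_matrix(observed, inferred)
best = [row.index(min(row)) for row in props]
print(call, best)
```

When the labels are correct, the smallest proportion in each row falls on the diagonal; a row whose minimum lies elsewhere points to the DNA sample that the mRNA sample probably belongs to.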

Genotype mix-ups

[Figure: layouts of seven 96-well plates (1628 to 1634; rows A to H, columns 1 to 12), with wells labeled B6, BTBR, and F1.]

Plate 1631

Plates 1632 and 1630

E vs E

[Diagram: two mice × transcripts matrices, expression in islet and expression in liver.]

[Figures: liver expression against islet expression, one panel each for transcripts 497973, 512831, and 507042.]

[Figures: liver expression against islet expression, one panel each for Mouse3280 and Mouse3598.]

[Figures: Mouse3599 liver vs Mouse3598 islet, and Mouse3598 liver vs Mouse3599 islet.]
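The comparison of expression between tissues can be turned into a search for mix-ups by correlating each mouse's islet profile with every mouse's liver profile and reporting the best match. The sketch below uses invented expression values for four transcripts; the mouse IDs are borrowed from the slides only as labels, and the liver labels for Mouse3598 and Mouse3599 are swapped on purpose.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_matches(islet, liver):
    """For each islet sample, the liver sample with the highest correlation."""
    return {m: max(liver, key=lambda other: pearson(islet[m], liver[other]))
            for m in islet}

# Made-up expression values across four transcripts.
islet = {"Mouse3598": [0.9, -1.2, 0.3, 0.1],
         "Mouse3599": [-0.8, 0.7, -0.2, 1.1],
         "Mouse3280": [0.2, 0.4, -1.0, -0.5]}
liver = {"Mouse3598": [-0.7, 0.8, -0.1, 1.0],
         "Mouse3599": [1.0, -1.1, 0.2, 0.2],
         "Mouse3280": [0.1, 0.5, -0.9, -0.6]}
matches = best_matches(islet, liver)
print(matches)
```

In practice this would likely be restricted to transcripts whose expression is well correlated between the two tissues, since uninformative transcripts only add noise.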

Expression mix-ups

[Figure: diagrams of the sample mix-ups in each tissue; the mouse IDs involved are listed below.]

adipose: 3583 and 3584; 3187, 3188, and 3200
gastroc: 3655 and 3659
hypo: 3179 and 3188; 3208 and 3210; 3347 and 3348; 3367 and 3369; 3381 and 3382; 3449 and 3451; 3452 and 3454; 3589 and 3590; 3592 and 3594
islet: 3295 and 3296; 3598 and 3599
kidney: 3484 and 3503?; 3510 and 3523
liver: 3136 and 3141; 3142 and 3143

Another example

kbroman.org/blog/2012/04/25/microarrays-suck

What the heck?

Dense box plots

Follow up artifacts

They might be the most interesting results

Eucalypt genetic map

[Figure: published genetic linkage map, with marker names and map positions listed for each linkage group.]

Byrne et al., Theor Appl Genet 91:869–875, 1995

Meiosis

CEPH pedigrees

[Figure: pedigree diagrams for eight CEPH families (1331, 1332, 1347, 1362, 1413, 1416, 884, and 102), with individuals numbered within each family.]

Crossover locations

Broman and Weber, Am J Hum Genet 66:1911–1926, 2000

Crossover interference

Broman and Weber, Am J Hum Genet 66:1911–1926, 2000

Maternal chr 8

Apparent triple XOs

Broman et al., In: Science and Statistics: A Festschrift for Terry Speed, 2003

Chr 8p inversion

Broman et al., In: Science and Statistics: A Festschrift for Terry Speed, 2003

Capturing EDA

▶ What were you trying to do?
▶ What were you thinking about?
▶ What did you observe?
▶ What did you conclude, and why?

Avoid

▶ “How did I create this plot?”
▶ “Why did I decide to omit those six samples?”
▶ “Where (on the web) did I find these data?”
▶ “What was that interesting gene?”

Basic principles

Step 1: slow down and document.
Step 2: have sympathy for your future self.
Step 3: have a system.

Capturing EDA

▶ Copy-and-paste from a script
▶ Grab code from the log (e.g., .Rhistory)
▶ Write an informal report (R Markdown or Jupyter)
▶ Write code for use with the knitr function spin()
  – Comments like #' This will become text
  – Chunk options like so: #+ chunk_label, echo=FALSE

If you torture the data long enough, it will confess to anything.

– Tukey