Welcome to Intro to Bioinformatics

Upload: hortense-woods

Post on 28-Dec-2015

TRANSCRIPT

Page 1: Welcome to Intro to Bioinformatics. Intergalactic Border Patrol Bioinformatics in Space Tribbles Warning! Highly dangerous! Trogs Cute and harmless

Welcome to Intro to Bioinformatics

Page 2:

Intergalactic Border Patrol
Bioinformatics in Space

Tribbles

Warning! Highly dangerous!

Trogs

Cute and harmless.

Page 3:

Intergalactic Border Patrol
Bioinformatics in Space

Tribbles

Warning! Highly dangerous!

Trogs

Cute and harmless.

Page 4:

Welcome to the Intergalactic Detention Center

Please answer the following questions

1. Like broccoli

2. Floss every brushing

3. Enjoy ballet

4. Always pair socks

5. Liked Moby Dick

6. Eat the Maraschino cherry

Response scale: 1 to 10

Page 5:

Responses to questionnaire

Question                  T1   T2   T3   T4   T5   T6   T7   . . .
1. Broccoli               9.2  1.6  4.0  5.2  2.2  9.1  1.0  . . .
2. Floss                  2.2  1.9  1.0  4.6  7.6  9.8  1.0  . . .
3. Ballet                 8.3  3.1  2.4  6.1  9.3  9.2  1.0  . . .
4. Pair socks             9.6  5.5  1.3  8.4  9.8  9.0  1.0  . . .
5. Moby Dick              4.2  2.1  1.0  4.1  5.2  4.4  1.0  . . .
6. Maraschino             6.4  8.9  7.1  3.3  1.9  2.0  1.0  . . .
. . .
6817. MacArthur’s Park    1.2  1.5  5.1  3.4  1.1  1.7  9.9  . . .

You need a plan

Page 6:

A Plan

• Release all Tribbles / Trogs

• Note outcome for each individual

• Deduce identities

• Integrate identities into results

• Figure out which questions/answers are informative

Page 7:

Responses to questionnaire

Individuals labeled by class: Tribbles, Trogs

Question                  T1   T2   T3   T4   T5   T6   T7   . . .
1. Broccoli               9.2  1.6  4.0  5.2  2.2  9.1  1.0  . . .
2. Floss                  2.2  1.9  1.0  4.6  7.6  9.8  1.0  . . .
3. Ballet                 8.3  3.1  2.4  6.1  9.3  9.2  1.0  . . .
4. Pair socks             9.6  5.5  1.3  8.4  9.8  9.0  1.0  . . .
5. Moby Dick              4.2  2.1  1.0  4.1  5.2  4.4  1.0  . . .
6. Maraschino             6.4  8.9  7.1  3.3  1.9  2.0  1.0  . . .
. . .
6817. MacArthur’s Park    1.2  1.5  5.1  3.4  1.1  1.7  9.9  . . .

(what now?)

Page 8:

Responses to questionnaire

                                                             Mean      Mean
Question                  T1   T2   T3   T4   T5   T6   T7   Tribbles  Trogs
1. Broccoli               9.2  1.6  4.0  5.2  2.2  9.1  1.0  6.4       2.2
2. Floss                  2.2  1.9  1.0  4.6  7.6  9.8  1.0  6.0       1.3
3. Ballet                 8.3  3.1  2.4  6.1  9.3  9.2  1.0  8.2       2.2
4. Pair socks             9.6  5.5  1.3  8.4  9.8  9.0  1.0  9.2       2.6
5. Moby Dick              4.2  2.1  1.0  4.1  5.2  4.4  1.0  4.4       1.4
6. Maraschino             6.4  8.9  7.1  3.3  1.9  2.0  1.0  4.4       3.7
. . .
6817. MacArthur’s Park    1.2  1.5  5.1  3.4  1.1  1.7  9.9  1.8       5.5

Page 9:

Which questions are informative? Which can be used to predict class?

The responses to which questions are correlated with class?

[Figure: two response scales running from 1 to 10, each annotated with Δμ]

\[ \text{Correlation of question with class} = \frac{\Delta\mu}{\sigma_{\text{Tribble}} + \sigma_{\text{Trog}}} \]

Page 10:

Which questions are informative? Which can be used to predict class?

Strategy

\[ \text{Correlation} = \frac{\Delta\mu}{\sigma_{\text{Tribble}} + \sigma_{\text{Trog}}} \]

• Calculate correlation for each question

• Look for questions with largest correlations with class

Implementation

μ = (Σ s ) / N


Page 11:

Which questions are informative? Which can be used to predict class?

Strategy

\[ \text{Correlation} = \frac{\Delta\mu}{\sigma_{\text{Tribble}} + \sigma_{\text{Trog}}} \]

• Calculate correlation for each question

• Look for questions with largest correlations with class

Implementation

\[ \sigma^2 = \frac{\sum (s - \mu)^2}{N - 1}, \qquad \sigma = \sqrt{\sigma^2} \]

Page 12:

Which questions are informative? Which can be used to predict class?

Strategy

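\[ \text{Correlation} = \frac{\Delta\mu}{\sigma_{\text{Tribble}} + \sigma_{\text{Trog}}} \]

• Calculate correlation for each question

• Look for questions with largest correlations with class

Implementation

\[
\frac{\Delta\mu}{\sigma_{\text{Tribble}} + \sigma_{\text{Trog}}}
= \frac{\dfrac{\sum s_{\text{Tribble}}}{N_{\text{Tribble}}} - \dfrac{\sum s_{\text{Trog}}}{N_{\text{Trog}}}}
       {\sqrt{\dfrac{\sum (s_{\text{Tribble}} - \mu_{\text{Tribble}})^2}{N_{\text{Tribble}} - 1}}
        + \sqrt{\dfrac{\sum (s_{\text{Trog}} - \mu_{\text{Trog}})^2}{N_{\text{Trog}} - 1}}}
\]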

Page 13:

Which questions are informative? Which can be used to predict class?

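\[
\text{Correlation} = \frac{\Delta\mu}{\sigma_{\text{Tribble}} + \sigma_{\text{Trog}}}
= \frac{\dfrac{\sum s_{\text{Tribble}}}{N_{\text{Tribble}}} - \dfrac{\sum s_{\text{Trog}}}{N_{\text{Trog}}}}
       {\sqrt{\dfrac{\sum (s_{\text{Tribble}} - \mu_{\text{Tribble}})^2}{N_{\text{Tribble}} - 1}}
        + \sqrt{\dfrac{\sum (s_{\text{Trog}} - \mu_{\text{Trog}})^2}{N_{\text{Trog}} - 1}}}
\]

Implementation

    Read_Responses_To_Question();                                   # load the scores for one question
    $numerator = Mean(@tribble_scores) - Mean(@trog_scores);        # difference of class means
    $denominator = StDev(@tribble_scores) + StDev(@trog_scores);    # sum of class standard deviations
    $correlation = $numerator / $denominator;
    push @question_info, [$question_number, $correlation];          # remember the result per question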

Page 14:

Which questions are informative? Which can be used to predict class?

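\[
\text{Correlation} = \frac{\Delta\mu}{\sigma_{\text{Tribble}} + \sigma_{\text{Trog}}}
= \frac{\dfrac{\sum s_{\text{Tribble}}}{N_{\text{Tribble}}} - \dfrac{\sum s_{\text{Trog}}}{N_{\text{Trog}}}}
       {\sqrt{\dfrac{\sum (s_{\text{Tribble}} - \mu_{\text{Tribble}})^2}{N_{\text{Tribble}} - 1}}
        + \sqrt{\dfrac{\sum (s_{\text{Trog}} - \mu_{\text{Trog}})^2}{N_{\text{Trog}} - 1}}}
\]

Implementation

    while (<INPUT>) {                                                   # repeat for every question
        Read_Responses_To_Question();                                   # load the scores for one question
        $numerator = Mean(@tribble_scores) - Mean(@trog_scores);        # difference of class means
        $denominator = StDev(@tribble_scores) + StDev(@trog_scores);    # sum of class standard deviations
        $correlation = $numerator / $denominator;
        push @question_info, [$question_number, $correlation];          # remember the result per question
    }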
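The loop above can be fleshed out into a runnable sketch. Everything about the input is an assumption made for illustration: the line format (question number followed by one score per individual), the made-up scores, and the made-up class labels in @is_tribble do not come from the slides. Only the formula (difference of means over sum of standard deviations) is taken from them.

```perl
use strict;
use warnings;

sub Mean {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum / @_;                       # (sum of scores) / N
}

sub StDev {
    my $mean = Mean(@_);
    my $sq_dev_sum = 0;
    $sq_dev_sum += ($_ - $mean) ** 2 for @_;
    return sqrt($sq_dev_sum / (@_ - 1));    # sample standard deviation
}

# Correlation of one question with class: (mean difference) / (sum of StDevs)
sub Correlation {
    my ($tribble_scores, $trog_scores) = @_;
    return (Mean(@$tribble_scores) - Mean(@$trog_scores))
         / (StDev(@$tribble_scores) + StDev(@$trog_scores));
}

# Made-up input: question number, then one score per individual
my $input = <<'END';
1 9 8 9 2 1 2
2 5 6 4 5 4 6
END

# Made-up class labels, one per score column (1 = Tribble, 0 = Trog)
my @is_tribble = (1, 1, 1, 0, 0, 0);

my @question_info;
open my $INPUT, '<', \$input or die "Cannot read input: $!";
while (my $line = <$INPUT>) {
    my ($question_number, @scores) = split ' ', $line;
    my (@tribble_scores, @trog_scores);
    for my $i (0 .. $#scores) {
        if ($is_tribble[$i]) { push @tribble_scores, $scores[$i] }
        else                 { push @trog_scores,    $scores[$i] }
    }
    push @question_info,
        [$question_number, Correlation(\@tribble_scores, \@trog_scores)];
}
close $INPUT;

# Largest correlations first
for my $info (sort { $b->[1] <=> $a->[1] } @question_info) {
    printf "%s\t%.2f\n", @$info;
}
```

In this toy input, question 1 separates the two made-up classes cleanly while question 2 does not, so question 1 is listed first.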

Page 15:

Which questions are informative? Which can be used to predict class?

Implementation

sub Mean {
    my @scores = @_;        # Grab Tribble or Trog scores
    my $s_sum = 0;          # Start Σ at 0
    my $N = 0;              # Need to count N
    foreach my $score (@scores) {
        $s_sum = $s_sum + $score;
        $N = $N + 1;
    }
    return $s_sum / $N;     # mean = (Σ s)/ N
}
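The slides call StDev but never show it. A minimal sketch of what it could look like follows, assuming the sample formula σ = sqrt(Σ (s - μ)² / (N-1)) given earlier in the deck; the guard for fewer than two scores is an addition here, not something the slides specify.

```perl
use strict;
use warnings;

# Sample standard deviation: sigma = sqrt( sum((s - mu)^2) / (N - 1) )
sub StDev {
    my @scores = @_;                    # Grab Tribble or Trog scores
    my $N = scalar @scores;
    die "StDev needs at least two scores\n" if $N < 2;   # added guard (assumption)

    my $sum = 0;
    $sum += $_ for @scores;
    my $mean = $sum / $N;               # mu = (sum of scores) / N

    my $sq_dev_sum = 0;
    $sq_dev_sum += ($_ - $mean) ** 2 for @scores;

    return sqrt($sq_dev_sum / ($N - 1));
}

printf "%.3f\n", StDev(2, 4, 4, 4, 5, 5, 7, 9);   # prints 2.138
```

The N-1 divisor matches the formula on the slides; dividing by N instead would give the population standard deviation.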

Page 16:

Which questions are informative? Which can be used to predict class?

Results

Question   Correlation
3497       1.76
281        1.72
1114       1.71
…          …

Are these questions good predictors of class?

Suppose there are NO good predictors of class…

Page 17:

(Interlude)

NEWS!

Precinct in Harrisonburg has voted for the winning senatorial candidate every time for the past ten elections!

\[ P(\text{by chance}) = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} \cdots = \left(\tfrac{1}{2}\right)^{10} = \tfrac{1}{1024} \approx \tfrac{1}{1000} \]

Suppose there are 1000 precincts in Virginia…

(BLAST from the past) E = (probability) · (number of combinations)
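Plugging the slide's own numbers into that expectation gives a quick worked example. It assumes, as the slide's 1/2 per election does, that each precinct picks the winner like a fair coin toss, independently of the others.

```latex
\[
E = (\text{probability}) \cdot (\text{number of combinations})
  = \left(\tfrac{1}{2}\right)^{10} \cdot 1000
  = \tfrac{1000}{1024}
  \approx 0.98
\]
```

So about one precinct with a perfect ten-election record is expected by chance alone.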

Beware the fallacy of the unlikely result!

Page 18:

Which questions are informative? Which can be used to predict class?

Results

Question   Correlation
3497       1.76
281        1.72
1114       1.71
…          …

Are these questions good predictors of class?

Suppose there are NO good predictors of class…

… what would be the expected correlation?

Page 19:

? ? ?

Which questions are informative? How to test class predictors?

Choice #1

Rerun time with the different (?) reality that Tribbles are no different from Trogs

Choice #2

Use random data

Page 20:

T1 T2 T3 T4 T5 T6 T7 . . .

Random responses to questionnaire

1. Broccoli

2. Floss

3. Ballet

4. Pair socks

5. Moby Dick

6. Maraschino

. . .

9.2   -1600   33 1/3   99   3.14159   -0   1.0   . . .

6817. MacArthur’s Park

Random doesn’t mean crazy

Page 21:

Random responses to questionnaire

Question                  T1   T2   T3   T4   T5   T6   T7   . . .
1. Broccoli               9.2  1.6  4.0  5.2  2.2  9.1  1.0  . . .
2. Floss                  2.2  1.9  1.0  4.6  7.6  9.8  1.0  . . .
3. Ballet                 8.3  3.1  2.4  6.1  9.3  9.2  1.0  . . .
4. Pair socks             9.6  5.5  1.3  8.4  9.8  9.0  1.0  . . .
5. Moby Dick              4.2  2.1  1.0  4.1  5.2  4.4  1.0  . . .
6. Maraschino             6.4  8.9  7.1  3.3  1.9  2.0  1.0  . . .
. . .
6817. MacArthur’s Park    1.2  1.5  5.1  3.4  1.1  1.7  9.9  . . .

Maybe but…

Page 22:

Random responses to questionnaire

Question                  T1   T2   T3   T4   T5   T6   T7   . . .
1. Broccoli               9.2  1.6  4.0  5.2  2.2  9.1  1.0  . . .
2. Floss                  2.2  1.9  1.0  4.6  7.6  9.8  1.0  . . .
3. Ballet                 8.3  3.1  2.4  6.1  9.3  9.2  1.0  . . .
4. Pair socks             9.6  5.5  1.3  8.4  9.8  9.0  1.0  . . .
5. Moby Dick              4.2  2.1  1.0  4.1  5.2  4.4  1.0  . . .
6. Maraschino             6.4  8.9  7.1  3.3  1.9  2.0  1.0  . . .
. . .
6817. MacArthur’s Park    1.2  1.5  5.1  3.4  1.1  1.7  9.9  . . .

Keep the data, shuffle the players

Page 23:

Which questions are informative? How to test class predictors?

Choice #1

Rerun time with the different (?) reality that Tribbles are no different from Trogs

Choice #2

Use random data

Choice #3

Shuffle data
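Choice #3 ("keep the data, shuffle the players") can be sketched as follows. The scores, the class labels, and the number of trials are made up for illustration, and using shuffle from List::Util to reassign labels is one way to do it, not necessarily the way the course did.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

sub Mean {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum / @_;
}

sub StDev {
    my $mean = Mean(@_);
    my $sq_dev_sum = 0;
    $sq_dev_sum += ($_ - $mean) ** 2 for @_;
    return sqrt($sq_dev_sum / (@_ - 1));
}

# Split scores by class label, then compute (mean difference) / (sum of StDevs)
sub Correlation {
    my ($scores, $labels) = @_;
    my (@tribble_scores, @trog_scores);
    for my $i (0 .. $#$scores) {
        if ($labels->[$i]) { push @tribble_scores, $scores->[$i] }
        else               { push @trog_scores,    $scores->[$i] }
    }
    return (Mean(@tribble_scores) - Mean(@trog_scores))
         / (StDev(@tribble_scores) + StDev(@trog_scores));
}

# Made-up scores for one question and made-up class labels (1 = Tribble, 0 = Trog)
my @scores     = (9.1, 8.2, 9.5, 2.3, 1.4, 2.8);
my @is_tribble = (1, 1, 1, 0, 0, 0);

my $actual = Correlation(\@scores, \@is_tribble);

# Keep the scores, shuffle which individual carries which label
my $trials = 1000;
my @shuffled_correlations;
for (1 .. $trials) {
    my @shuffled_labels = shuffle @is_tribble;
    push @shuffled_correlations, Correlation(\@scores, \@shuffled_labels);
}

# Count the shuffles whose correlation is at least as large as the real one
my $as_good = grep { $_ >= $actual } @shuffled_correlations;
printf "actual %.2f; %d of %d shuffles did at least as well\n",
    $actual, $as_good, $trials;
```

Shuffling the labels keeps every score and the size of each class unchanged, so the shuffled correlations show what to expect when class carries no information about the answers.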

Page 24:

Which questions are informative? How to test class predictors?

[Plot: x-axis "Correlation" (2.0 down to -0.5); y-axis "# of questions with better correlations" (ticks 0, 10, 100, 1000, 10000). Curve shown: 5% of shuffled responses.]

Page 25:

Which questions are informative? How to test class predictors?

[Plot: x-axis "Correlation" (2.0 down to -0.5); y-axis "# of questions with better correlations" (ticks 0, 10, 100, 1000, 10000). Curves shown: 1% of shuffled responses; actual responses.]

Page 26:

Which questions are informative? How to test class predictors?

[Plot: x-axis "Correlation" (2.0 down to -0.5); y-axis "# of questions with better correlations" (ticks 0, 10, 100, 1000, 10000). Curves shown: 1% of shuffled responses; actual responses. Annotations: "If class predictors don’t work" and "If class predictors are valid".]