Computer Programming for Biologists

Class 7

Nov 27th, 2014

Karsten Hokamp

http://bioinf.gen.tcd.ie/GE3M25/programming

Hash Variables

Description

associative arrays

list of key/value pairs

keys and values are scalars

access values by key names

Great for look-ups!
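A minimal sketch of the points above in Perl. The hash %aa_name and its contents are illustrative assumptions, not taken from the slides.

```perl
use strict;
use warnings;

# Illustrative hash (not from the slides): one-letter amino acid codes
# as keys, full names as values. Keys and values are both scalars.
my %aa_name = (
    'K' => 'lysine',
    'N' => 'asparagine',
    'L' => 'leucine',
    'F' => 'phenylalanine',
);

# Access a value by its key name.
my $name = $aa_name{'K'};
print "K stands for $name\n";

# Number of key/value pairs in the hash.
my $count = scalar keys %aa_name;
print "$count entries in the look-up table\n";
```

Note the change of sigil: the whole hash is written %aa_name, while a single value is a scalar and is written $aa_name{'K'}.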

Hash Variables

Look-up Table

Look-up table in real life for translation (genetic code):

Codon    Amino acid
AAA      K
AAC      N
AAG      K
AAU      N
…
UUG      L
UUU      F

In Perl use a hash variable:

    %genetic_code = (
        'AAA' => 'K',
        'AAC' => 'N',
        'AAG' => 'K',
        'AAU' => 'N',
        …
        'UUG' => 'L',
        'UUU' => 'F',
    );

Keys are unique!
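As a rough sketch of how such a look-up table might be used, the following translates a short RNA string codon by codon. Only the six codons shown on the slide are included; the input sequence, the 'X' placeholder for unknown codons and the %demo hash are assumptions made for illustration.

```perl
use strict;
use warnings;

# Only the six codons shown on the slide; a full table has 64 entries.
my %genetic_code = (
    'AAA' => 'K', 'AAC' => 'N', 'AAG' => 'K', 'AAU' => 'N',
    'UUG' => 'L', 'UUU' => 'F',
);

# Hypothetical input, chosen so that every codon is in the partial table.
my $rna = 'AAAUUUAACUUG';

my $protein = '';
# Walk through the sequence in steps of three nucleotides.
for (my $i = 0; $i + 3 <= length($rna); $i += 3) {
    my $codon = substr($rna, $i, 3);
    # Use 'X' (an arbitrary placeholder) for codons missing from the table.
    $protein .= exists $genetic_code{$codon} ? $genetic_code{$codon} : 'X';
}
print "$protein\n";

# Keys are unique: if the same key is given twice, the later value
# replaces the earlier one and only one entry remains.
my %demo   = ('AAA' => 'wrong', 'AAA' => 'K');
my $n_keys = scalar keys %demo;
print "$n_keys key, value $demo{'AAA'}\n";
```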

Hash Variables

Examples

    %bases = ('a', 'purine', 'c', 'pyrimidine', 'g', 'purine', 't', 'pyrimidine');

    %complement = ('a' => 't', 'c' => 'g', 'g' => 'c', 't' => 'a');

    %letters = (1, 'a', 2, 'b', 3, 'c', 4, 'd');

Hashes: lists with a special relationship between each pair of elements!
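The sketch below illustrates that a plain comma-separated list and the '=>' notation build the same hash, and then uses %complement for a reverse complement. The variable names %bases_list and %bases_arrow and the example sequence are assumptions for illustration.

```perl
use strict;
use warnings;

# The same hash written twice: once as a plain list, once with '=>'.
my %bases_list  = ('a', 'purine', 'c', 'pyrimidine', 'g', 'purine', 't', 'pyrimidine');
my %bases_arrow = ('a' => 'purine', 'c' => 'pyrimidine', 'g' => 'purine', 't' => 'pyrimidine');

# Compare the two hashes key by key.
my $same = 1;
foreach my $base (keys %bases_list) {
    $same = 0 unless $bases_arrow{$base} eq $bases_list{$base};
}
print "identical content\n" if $same;

# Use the %complement hash from the slide to build a reverse complement.
my %complement = ('a' => 't', 'c' => 'g', 'g' => 'c', 't' => 'a');
my $dna     = 'aacgt';   # hypothetical example sequence
my $revcomp = join '', map { $complement{$_} } reverse split //, $dna;
print "$dna -> $revcomp\n";
```

The '=>' form is usually easier to read because it makes the pairing of each key with its value visible.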

Hash Variables

Storing Data

    # count frequency of nucleotides:
    my $As = 0; my $Cs = 0; my $Gs = 0; my $Ts = 0;

    foreach my $nuc (split //, $dna) {
        if ($nuc eq 'A') {
            $As++;
        } elsif ($nuc eq 'C') {
            $Cs++;
        } elsif ($nuc eq 'G') {
            $Gs++;
        } elsif ($nuc eq 'T') {
            $Ts++;
        }
    }

Hash Variables

Storing Data

    # count frequency of nucleotides:
    my %freq = ();

    foreach my $nuc (split //, $dna) {
        $freq{$nuc}++;
    }

Hash Variables

Storing Data

    # count frequency of nucleotides:
    my %freq = ();

    foreach my $nuc (split //, 'ACTTGGGT') {
        $freq{$nuc}++;
    }

key    value
A      1
C      1
G      3
T      3

Keys are stored in no specific order.

Auto-initialisation with '' or 0.
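A complete, runnable version of the counting loop might look like the sketch below. It adds a sorted print-out, which is not on the slide, so that the output order is predictable.

```perl
use strict;
use warnings;

my $dna  = 'ACTTGGGT';   # sequence from the slide
my %freq = ();

foreach my $nuc (split //, $dna) {
    # The first time a nucleotide is seen its key does not exist yet.
    # Perl treats the missing value as 0, so the increment sets it to 1.
    $freq{$nuc}++;
}

# Keys come back in no specific order, so sort them for stable output.
foreach my $nuc (sort keys %freq) {
    print "$nuc\t$freq{$nuc}\n";
}
```

Compared with the version that uses four separate scalars, the hash version needs no if/elsif chain and also counts any unexpected character (for example N) without extra code.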

Hash Variables

Scalar vs Hash

Statement    Variable    Value
$As = 0;     As          0
$Cs = 0;     Cs          0
$Gs = 0;     Gs          0
$Ts = 0;     Ts          0

Hash Variables

Scalar vs Hash

Statements          Variable    Value
$As = 0; $As++;     As          1
$Cs = 0; $Cs++;     Cs          1
$Gs = 0; $Gs++;     Gs          1
$Ts = 0; $Ts++;     Ts          1

Hash Variables

Scalar vs Hash

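Four separate scalars:

Statements          Variable    Value
$As = 0; $As++;     As          1
$Cs = 0; $Cs++;     Cs          1
$Gs = 0; $Gs++;     Gs          1
$Ts = 0; $Ts++;     Ts          1

One hash:

    %freq = ();
    $freq{'Gs'}++;

[Diagram: a single hash, freq, with slots Cs, As, Gs and Ts; value shown: 1]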

Computer Programming for Biologists

Exercises

Practical:

http://bioinf.gen.tcd.ie/GE3M25/programming/class7

Hash Variables

Accessing Elements

General: $value = $hash{$key};

    # get complement of a base
    my $new_base = $complement{$base};

    # get amino acid for a codon
    my $aa = $genetic_code{$codon};

Special functions: keys and values

    # list all the aa's that occurred (loop through all keys)
    foreach my $aa (keys %list) {
        print "$aa was found!\n";
    }
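The two special functions can be tried out with a short sketch such as the following, which reuses the %complement hash from the earlier examples. The array names @bases and @partners are assumptions for illustration.

```perl
use strict;
use warnings;

my %complement = ('a' => 't', 'c' => 'g', 'g' => 'c', 't' => 'a');

# keys returns all keys, values returns all values.
# Both come back in no specific order, so they are sorted here.
my @bases    = sort keys %complement;
my @partners = sort values %complement;

print "keys:   @bases\n";
print "values: @partners\n";

# Loop through all keys and look up the value for each one.
foreach my $base (@bases) {
    print "$base pairs with $complement{$base}\n";
}
```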

Hash Variables

Retrieving a key/value pair

    $freq = $freq{'Gs'};
    print "Gs: $freq\n";

Output:

    Gs: 3

[Diagram: hash %freq with keys Cs, As, Gs and Ts; value shown: 3]

Hash Variables

Retrieving a key/value pair

    $nuc = 'Gs';
    print "$nuc: $freq{$nuc}\n";

Output:

    Gs: 3

[Diagram: hash %freq with keys Cs, As, Gs and Ts; value shown: 3]

Hash Variables

Retrieving a key/value pair

    foreach my $nuc (keys %freq) {
        print "$nuc: $freq{$nuc}\n";
    }

Output (keys in no specific order):

    Cs: 1
    Ts: 3
    Gs: 3
    As: 1

[Diagram: hash %freq with keys Cs, As, Gs and Ts; value shown: 3]

Hash Variables

Retrieving a key/value pair

    foreach my $nuc (sort keys %freq) {
        print "$nuc: $freq{$nuc}\n";
    }

Output (keys sorted alphabetically):

    As: 1
    Cs: 1
    Gs: 3
    Ts: 3

[Diagram: hash %freq with keys Cs, As, Gs and Ts; value shown: 3]
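Sorting by key is what the slides show. As a possible extension, not covered in the slides, the keys can also be sorted by their values by passing a comparison block to sort. The counts below are those from the slide output; the array name @by_count is an assumption.

```perl
use strict;
use warnings;

# Counts as in the slide output.
my %freq = ('As' => 1, 'Cs' => 1, 'Gs' => 3, 'Ts' => 3);

# Sort keys by descending count; ties are broken alphabetically by key.
my @by_count = sort { $freq{$b} <=> $freq{$a} or $a cmp $b } keys %freq;

foreach my $nuc (@by_count) {
    print "$nuc: $freq{$nuc}\n";
}
```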

Hash Variables

Checking for keys/values

    # does the key exist?
    if (exists $hash{$key}) {}

    # does the key have a defined value?
    if (defined $hash{$key}) {}

    # does the key have a true value (not undefined, '' or 0)?
    if ($hash{$key}) {}
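The three checks give different answers for the same key, which the following sketch tries to make visible. The hash contents and the %report hash are assumptions for illustration: 'G' is stored with the value 0, 'N' is stored without a defined value, and 'T' is not stored at all.

```perl
use strict;
use warnings;

# Hypothetical counts: 'G' is present with value 0, 'N' is present but undefined.
my %freq = ('A' => 2, 'G' => 0, 'N' => undef);

my %report;
foreach my $key ('A', 'G', 'N', 'T') {
    my $exists  = exists  $freq{$key} ? 'yes' : 'no';
    my $defined = defined $freq{$key} ? 'yes' : 'no';
    my $true    = $freq{$key}         ? 'yes' : 'no';
    $report{$key} = "$exists/$defined/$true";
    print "$key: exists=$exists defined=$defined true=$true\n";
}
```

For counting tasks, exists is usually the safest test, because a key with a count of 0 fails the plain truth test even though it is present in the hash.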

Computer Programming for Biologists

Exercises

Use hashes in your sequence analysis tool for:

- reporting frequencies of nucleotides or amino acids
- reporting the GC content
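One possible starting point for these exercises is sketched below. It is not the course solution: the hard-coded sequence, the variable names and the output format are assumptions, and a real tool would read the sequence from a file or from user input.

```perl
use strict;
use warnings;

my $dna  = 'ACTTGGGT';   # example sequence; replace with real input
my %freq = ();

# Count each nucleotide, ignoring upper/lower case.
$freq{$_}++ foreach split //, uc $dna;

# Report nucleotide frequencies in alphabetical order.
foreach my $nuc (sort keys %freq) {
    print "$nuc: $freq{$nuc}\n";
}

# GC content = (G + C) / total length, as a percentage.
# '|| 0' covers sequences that contain no G or no C at all.
my $gc      = ($freq{'G'} || 0) + ($freq{'C'} || 0);
my $total   = length $dna;
my $gc_perc = $total ? 100 * $gc / $total : 0;
printf "GC content: %.1f%%\n", $gc_perc;
```

The same counting loop works unchanged for amino acid frequencies if $dna is replaced by a protein sequence.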