biopython, doctest and makefiles

Upload: giovanni-dallolio

Post on 16-Apr-2017

Barcelona Python Developers Seminars

biopython, doctest and makefiles

This is me

Giovanni

PhD student in a Population Genetics lab

Not a biopython dev

(that might not be my real photo)

Intro

BioPython -> a collection of standard python modules for bioinformatics

Advantages of using open source libraries in science:

more reproducibility

easier to compare results

fewer errors

less time spent

BioPython: some use cases

The human genome sequencing project (2001):

TCCATGGCCTCCCGGCAAGCCTAAGCTAGCGCAATTGTCAGACGCACAGGACCGGTCTGGGGAGACCAATGTGTTCAGACAACGATTCCCAGCTAGTACCACTGTTTGACTCGGAAGATGTGTACAACTATTGTAGCGACTGTGTCCCATCATTGCATTCAAACCCAAGTAATTGATGGATCAACAAAGGATACACTCCAAAAGTCGCACAGAGATTGGTCATCTTAACGCGAGATTAAACATGCGTCTATACGCCCGTGTTAAGTTCGGCCGCCATCGTACAAATAAGCGAGNNNNTATCAATCTAATCTTAAACCGGCTCTTGAGAAGGGCTAGCGGCGTTAGGACCCGCTGCCGGCCGTGAGCGTGCGTTCACTCTGAACAGCGCCATCGATGGGTCGCTTGTGTAGCTATTTTAAGGACGCGACATAGGCCCTGGGGCAGTTACTGGGGCATGCCCACTATATCCGCGGGCAAGTTGGTATTCAGCTATGTTTATCTCTCGCCCAATGCGTGAAAGCGCCAAACGTGGGTAGAGGACTTAGCAATTTGGGGCATGCCCTGCTCTTTTAGATCTGTTAAGCAATCCGCGCGTAGGGCTCGCTGCGTCGTAAATGTGAGCGCAAGTCACCGACGCAGTGGTAATATACGTGTAACTGATCATCNNNNNNTCCCGAACCATGCCTTCTAACAGGAGATGCCCAAGGTCGAGGGTCACCGCCAACGACCGGCTGATCCCTGTTGGTGAGGATTTATGGAGGTGGACTGTCAGGTAGGCAAGAACTCTGGGTGAATTTGCGAGCGCTATCTCTAAGTTACACGCTTTACTGGGGCATGCCCGGGCCGTAGAAGTTACTGGGGCATGCCCCACGTAATAGGTTTTCATGAGGAGATGTTTGGTCTGATTCTCGAGATTGTGGCTAAGTATTGAGTCAGACTTACTGGGGCATTTACTGGGGCATGCCCGCCCTGCTCTTTTAGATCTGTTAAGCAATCCGCGCGTAGGGCTCGCTGCGTCGTAAATGTGAGCGCAAGTCACCGACGCAGTGGTAATATACGTGTAACTGATCATCTTCATGATTCCCGAACCATGCCTTCTAACAGGAGATGCCCAAGGTCGAGGGTCACCGCCAACGACCGGCTGATTTACTGGGGCATGCCCCCCNNNNNGAGGATTTNNNNTGGAGCCTATCTCACATTTTAAACTTCAATCATCATAACACGTGCGCACTTTTTCCGCGCTTGACGGCGAAGTGACTGGCCACTTCCTGCTCCCTGTTTTTCCCAATACCTGACAAGTGTGGCATCTGTCCCCCTGAAGAGGACTAGAGTATCATTACGGGGGGCTTGACACTTACCTTCATAGG.............

Up to ~3*10^9 characters

Lots of regexes (Perl-ists like them)

Could be obtained for

>>> help(say_hello)

Help on function say_hello in module __main__:

say_hello(name)
    print hello to the screen

    example:
    >>> say_hello('Albert Einstein')
    hello Albert Einstein!!!

doctest: how does it work

#!/usr/bin/env python

# note: this shadows the built-in sum(); fine for a demo
def sum(x, y):
    '''
    sums two numbers

    example:
    >>> print(sum(1, 2))
    3
    '''
    return x + y

if __name__ == '__main__':
    import doctest
    doctest.testmod()

doctest.testmod() scans the docstrings for lines beginning with '>>>' and executes each one as a Python command

The output is compared with the lines that follow (the expected output). If they differ, a failure is reported.

If 'print(sum(1, 2))' doesn't print 3, a failure is reported
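To make the comparison step concrete, here is a minimal sketch that runs the doctests of a single function and reports how many examples failed. It uses the standard library's doctest.DocTestParser and doctest.DocTestRunner; the names add, broken_add and run_doctests are made up for this illustration.

```python
import doctest


def add(x, y):
    """Return the sum of two numbers.

    >>> add(1, 2)
    3
    """
    return x + y


def broken_add(x, y):
    """Deliberately wrong, to show what a failing doctest looks like.

    >>> broken_add(1, 2)
    3
    """
    return x - y


def run_doctests(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The examples only need to see the function itself.
    globs = {func.__name__: func}
    test = parser.get_doctest(func.__doc__, globs, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # 'out' receives the failure report; here it is silently discarded.
    result = runner.run(test, out=lambda text: None)
    return result.failed, result.attempted


if __name__ == "__main__":
    print(run_doctests(add))         # no failures expected
    print(run_doctests(broken_add))  # one failure expected
```

Note that a mismatch is counted and reported as a failure; the script itself keeps running.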

doctest - examples

BioPython - SeqIO.parse

doctest file parsing example

In bioinformatics there are many file formats with very similar names

ped, tped, bed, tmap, pdb, fasta...

It is useful to put an example input file in the docstring of every parser function
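A sketch of what this could look like for a FASTA-like input. The parse_fasta function below is a toy written for this example, not BioPython's SeqIO.parse; it assumes header lines start with '>' and that a sequence may span several lines.

```python
def parse_fasta(lines):
    """Parse FASTA-formatted lines into a list of (name, sequence) tuples.

    Example input: header lines start with '>', and a sequence may be
    split over several lines.

    >>> example = [
    ...     ">seq1 first toy sequence",
    ...     "ACGT",
    ...     "TTGA",
    ...     ">seq2",
    ...     "GGCC",
    ... ]
    >>> parse_fasta(example)
    [('seq1 first toy sequence', 'ACGTTTGA'), ('seq2', 'GGCC')]
    """
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

Someone who has never seen the format can read the docstring and know both what the input looks like and what comes out.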

Choose good examples

Write the doctest together with whoever will use the script (e.g. a fellow scientist)

Ask them: 'How is this function supposed to behave in this example?'

Simplify: round all numbers to multiples of 100, add comments

Doctest Pros and Cons

Pros:

docs always up to date

Usage examples

Quick tests when you are coding

Cons:

Functions that read files (StringIO? NamedTemporaryFile?)

Still need to write a unittest

Can't use lines longer than 80 characters (PEP8)

Random generators / statistics / rounding
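Some of these cons have common workarounds. The sketch below shows three of them under simple assumptions: an in-memory io.StringIO standing in for a file, round() to keep floating-point output stable, and a fixed seed for random sampling. The function names are invented for this example.

```python
import io
import random


def count_records(handle):
    r"""Count FASTA records in an open file handle.

    An in-memory io.StringIO object stands in for a real file:

    >>> handle = io.StringIO(">seq1\nACGT\n>seq2\nGGCC\n")
    >>> count_records(handle)
    2
    """
    return sum(1 for line in handle if line.startswith(">"))


def gc_fraction(sequence):
    """Return the fraction of G and C characters in a sequence.

    Rounding in the example keeps the expected output short and stable:

    >>> round(gc_fraction("ACG"), 3)
    0.667
    """
    return sum(1 for base in sequence if base in "GC") / len(sequence)


def sample_individuals(names, k, seed):
    """Pick k names at random, reproducibly for a given seed.

    The example checks reproducibility instead of hard-coding a result:

    >>> names = ["ind1", "ind2", "ind3", "ind4"]
    >>> first = sample_individuals(names, 2, seed=42)
    >>> first == sample_individuals(names, 2, seed=42)
    True
    """
    return random.Random(seed).sample(names, k)


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

These tricks keep the examples readable, but they do not remove the need for separate unit tests on edge cases.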

Bioinformatics: a different approach

Programming software and programming experiments call for different approaches:

Testing has different dimensions (biological meaning, reproducibility)

Usually you write numerous scripts, each one carrying out a small task, and glue them together with a pipeline / wrapper script / makefile / automated build tool / XML-described workflow / insert others here

I am a makefile guy

What is a makefile?

GNU Make is a utility for building C/C++ programs.

It can be used to save shell commands (...) with their options and re-execute them at will.

Example:
$ make all
python retrieve_data.py --option1 --option2
perl convert_format.pl --input inputfile --option3
perl convert_format.pl --inputfile inputfile2

Simplest Makefile example

$ cat Makefile

help:
	echo 'execute make all to carry out the whole analysis'

get_data:
	python retrieve_data.py --database ensembl --specie Human --output sequences.fasta

calculate_results:
	perl calculate_results.pl --option1 --option2 --input sequences.fasta --output results.txt

all: get_data calculate_results

Makefiles Pros

Conditional execution

If there is no need to execute a command, it is skipped (checks if the expected output file already exists and is up-to-date)

Chaining commands

You can define the order in which commands must be executed (download sequences first, then read them)

Support for clusters

Syntax is ugly, but standard
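The first two points can be sketched in a few lines of Python. This is only an illustration of the idea, not how Make is implemented: it assumes every dependency already exists on disk, and the names needs_rebuild and run_pipeline are hypothetical.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency.

    Assumes all dependencies exist; Make itself would try to build them.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def run_pipeline(steps):
    """Run the steps in the given order, skipping up-to-date targets.

    steps is a list of (target, dependencies, action) tuples, where
    action is a zero-argument callable expected to create the target.
    Returns the list of targets that were actually rebuilt.
    """
    rebuilt = []
    for target, dependencies, action in steps:
        if needs_rebuild(target, dependencies):
            action()
            rebuilt.append(target)
    return rebuilt
```

Running the same pipeline twice in a row rebuilds nothing the second time; touching an input file triggers only the steps that depend on it.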

Make - Cons

GNU Make has a very ugly syntax

Really, I hate its syntax

I am looking for substitutes in python:

scons

paver

waf (google summer of code project)

Still haven't started using them

Implement something in biopython?

A more complicated Makefile

Variables like %, $@, $<

Modifiers like -, @

addprefix, addsuffix ??

Triple parentheses ??
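For readers more at home in Python, here is a rough sketch of what some of these features mean, assuming the usual GNU Make semantics: '%' matches a non-empty stem, $@ is the target, $< is the first prerequisite, $* is the stem, and addprefix/addsuffix act on each whitespace-separated word. The Python functions are illustrative only and are not part of any Make API.

```python
def addprefix(prefix, names):
    """Mimic $(addprefix prefix,names): prepend prefix to every word."""
    return " ".join(prefix + name for name in names.split())


def addsuffix(suffix, names):
    """Mimic $(addsuffix suffix,names): append suffix to every word."""
    return " ".join(name + suffix for name in names.split())


def match_stem(pattern, target):
    """Return the stem that '%' matches in target, or None if no match."""
    prefix, _, suffix = pattern.partition("%")
    long_enough = len(target) > len(prefix) + len(suffix)
    if target.startswith(prefix) and target.endswith(suffix) and long_enough:
        return target[len(prefix):len(target) - len(suffix)]
    return None


def automatic_variables(target_pattern, prereq_pattern, target):
    """Show what $@, $< and $* would hold for a rule like '%.txt: %.fasta'."""
    stem = match_stem(target_pattern, target)
    if stem is None:
        raise ValueError("target does not match the pattern")
    return {
        "$@": target,                                # the target name
        "$<": prereq_pattern.replace("%", stem, 1),  # first prerequisite
        "$*": stem,                                  # the part matched by %
    }


if __name__ == "__main__":
    print(addprefix("data/", "human.fasta chimp.fasta"))
    print(addsuffix(".fasta", "human chimp"))
    print(automatic_variables("%.txt", "%.fasta", "results.txt"))
```

The prefixes - and @ have no equivalent here: in a recipe, - tells Make to ignore a failing command and @ stops it from echoing the command before running it.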

Thanks for the attention!

Did you like the talk?

BioPython use cases

Single Nucleotide Polymorphisms (SNPs) are positions in the genome that tend to vary between individuals

We are working with data on 650,000 SNPs in 1000 individuals

We need to organize the data into objects (SNPs, Genotypes, Individuals, Populations), use a database for support, and calculate statistics on them
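A minimal sketch of how such objects might look, with one simple statistic. The Individual class and allele_frequencies function are hypothetical names for this illustration, not BioPython classes, and genotypes are assumed to be two-letter strings such as "AG".

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Individual:
    """One sampled individual and its genotypes (hypothetical layout)."""
    name: str
    population: str
    # Maps a SNP identifier to a genotype string such as "AG".
    genotypes: dict = field(default_factory=dict)


def allele_frequencies(individuals, snp_id):
    """Return {allele: frequency} for one SNP across the individuals.

    Individuals with no genotype for this SNP are skipped.
    """
    counts = Counter()
    for individual in individuals:
        genotype = individual.genotypes.get(snp_id)
        if genotype:
            counts.update(genotype)  # counts each allele letter
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


if __name__ == "__main__":
    people = [
        Individual("ind1", "pop1", {"snp1": "AA"}),
        Individual("ind2", "pop1", {"snp1": "AG"}),
        Individual("ind3", "pop2", {}),  # missing genotype
    ]
    print(allele_frequencies(people, "snp1"))
```

With hundreds of thousands of SNPs the genotypes would normally live in a database or an array rather than in per-individual dictionaries; the dictionary is only for readability.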

Doctest: a closer look

#!/usr/bin/env python

def say_hello(name):
    '''
    print hello (name) to the screen

    example:
    >>> say_hello('Albert Einstein')
    hello Albert Einstein!!!
    '''
    print('hello ' + name + '!!!')

if __name__ == '__main__':
    import doctest
    doctest.testmod()

new function definition

normal doc

example of function usage

expected output

body of the function

call to the doctest module
