“How Perl Saved the Human Genome Project”



Page 1: “How Perl Saved the Human Genome Project”

“How Perl Saved the Human Genome Project”

DATE: Early February, 1996

LOCATION: Cambridge, England, in the conference room of the largest DNA sequencing center in Europe.

OCCASION: A high level meeting between the computer scientists of this center and the largest DNA sequencing center in the United States.

THE PROBLEM: Although the two centers use almost identical laboratory techniques, almost identical databases, and almost identical data analysis tools, they still can't interchange data or meaningfully compare results.

THE SOLUTION: Perl.

Lincoln Stein, TPJ Vol 1 #2 Summer 1996

Page 2: “How Perl Saved the Human Genome Project”

“How Perl Saved the Human Genome Project”

Perl solved issues of:

a rapidly changing situation

text manipulation to convert between data formats

building pipelines to glue data analysis programs together

Page 3: “How Perl Saved the Human Genome Project”

10 years on

Page 4: “How Perl Saved the Human Genome Project”

Obligatory tenuous coding analogy

The genome is the source of a program to build and run a human

Page 5: “How Perl Saved the Human Genome Project”

Obligatory tenuous coding analogy

But: the author is not available for comment

Page 6: “How Perl Saved the Human Genome Project”

Obligatory tenuous coding analogy

It’s 3GB in size

Page 7: “How Perl Saved the Human Genome Project”

Obligatory tenuous coding analogy

Due to constant forking, there are about 7 billion different versions

Page 8: “How Perl Saved the Human Genome Project”

Obligatory tenuous coding analogy

It’s full of copy-and-paste and cruft

Page 9: “How Perl Saved the Human Genome Project”

Obligatory tenuous coding analogy

And it’s completely undocumented

Page 10: “How Perl Saved the Human Genome Project”

Obligatory tenuous coding analogy

Q: How do you debug it?

Page 11: “How Perl Saved the Human Genome Project”

Obligatory tenuous coding analogy

A: Diff a working copy and a broken copy
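
The "diff" analogy can be made concrete with a toy sketch. It is written in Python purely for illustration (the talk itself is about Perl), and it assumes two sequences that are already aligned and of equal length. The function name `diff_sequences` and both sample sequences are invented for this example. Real variant detection works on large numbers of short reads mapped to a reference and has to cope with insertions, deletions, and sequencing errors, none of which this handles.

```python
def diff_sequences(working, broken):
    """Return (position, working_base, broken_base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 1-based.
    """
    if len(working) != len(broken):
        raise ValueError("sequences must be the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(working, broken), start=1)
        if a != b
    ]


# Hypothetical sequences, invented for illustration only.
working_copy = "GATTACAGATTACA"
broken_copy = "GATTACAGCTTACA"

for pos, ref, alt in diff_sequences(working_copy, broken_copy):
    print(f"position {pos}: {ref} -> {alt}")
```

With the sample input above, the single difference reported is at position 9, where A has become C.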

Page 12: “How Perl Saved the Human Genome Project”

Same as it ever was

We still have the same problems as in 1996

a rapidly changing situation

text manipulation to convert between data formats

building pipelines to glue data analysis programs together

Page 13: “How Perl Saved the Human Genome Project”

A rapidly changing situation

MR Stratton et al. Nature 458, 719-724 (2009)

Page 14: “How Perl Saved the Human Genome Project”

Many data formats

“a sea of incompatible data formats”

“[for each new piece of software] you could always count on it to sport its own idiosyncratic user interface and data format.”

Lincoln Stein, TPJ Vol 1 #2 Summer 1996
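
As a rough illustration of the text manipulation that converting between formats involves, here is a minimal sketch that turns FASTA-style records into tab-separated lines. It is written in Python for illustration rather than in Perl. The function name `fasta_to_tsv`, the choice of a two-column output, and the sample input are assumptions made for this example, not something taken from the talk.

```python
def fasta_to_tsv(fasta_text):
    """Convert FASTA text to tab-separated 'id<TAB>sequence' lines."""
    records = []  # each entry is [identifier, sequence-so-far]
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the identifier is the first word after '>'.
            fields = line[1:].split()
            records.append([fields[0] if fields else "", ""])
        elif records:
            # Sequence line: append to the most recent record.
            records[-1][1] += line
    return "\n".join(f"{name}\t{seq}" for name, seq in records)


# Made-up input for illustration.
example = """>seq1 first example
GATTACA
GATTACA
>seq2
ACGT
"""
print(fasta_to_tsv(example))
```

Sequence lines that belong to the same record are joined, so `seq1` comes out as a single 14-base string. Anything after the first word of a header line is discarded in this sketch; a real converter would need to decide what to do with that description text.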

Page 15: “How Perl Saved the Human Genome Project”

Building pipelines

Initial data QC

Data QC

Submission to public archives

Sample reception

Library prep

Sequence ordering

Sequencing

Tracking

Genotype check

Library QC

Recalibration

Mapping to reference

Merging libraries

To collaborators

SNP calling

Structural variants

Filtering

Build release BAM files

Collaborator data

Visualization

Downstream analysis
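
The idea of gluing stages together can be sketched in a few lines. The example is in Python for illustration rather than Perl, and it chains plain functions instead of external programs. The stage names `quality_filter` and `deduplicate`, the helper `run_pipeline`, and the sample reads are all hypothetical placeholders; they are not the stages in the diagram above.

```python
from functools import reduce


def quality_filter(reads):
    """Placeholder QC step: drop reads containing an unknown base 'N'."""
    return [r for r in reads if "N" not in r]


def deduplicate(reads):
    """Placeholder step: remove exact duplicates, keeping first-seen order."""
    return list(dict.fromkeys(reads))


def run_pipeline(data, stages):
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda current, stage: stage(current), stages, data)


# Made-up reads for illustration.
reads = ["ACGT", "ACNT", "ACGT", "TTGA"]
print(run_pipeline(reads, [quality_filter, deduplicate]))
```

Because every stage takes a list and returns a list, stages can be added, removed, or reordered without touching the others, which is the property a glue layer needs when the situation keeps changing.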

Page 16: “How Perl Saved the Human Genome Project”

In conclusion

“Although it's not perfect, Perl fills the needs of the genome centers remarkably well, and is usually the first tool we turn to when we have a problem to solve.”