allbio and eu codefest 2014

16
/ | Phd Student @ Bioinformatics and Population Genomics Supervisor: Yannick Wurm | Before: AllBio EU CodeFest bmpvieira.com/allbio14 Bruno Vieira @bmpvieira @yannick__ © 2014 Bruno Vieira CC-BY 4.0

Upload: bruno-vieira

Post on 24-Jun-2015

204 views

Category:

Science


3 download

DESCRIPTION

Lightning talk for TGAC - AllBio: Open Science & Reproducibility Best Practice Workshop (first 4 slides). Request talk at the end of EU CodeFest (all slides)

TRANSCRIPT

Page 1: AllBio and EU CodeFest 2014

 /  |  

Phd Student @Bioinformaticsand Population GenomicsSupervisor:Yannick Wurm |    Before: 

AllBio EU CodeFestbmpvieira.com/allbio14

Bruno Vieira @bmpvieira

@yannick__

 © 2014 Bruno Vieira CC-BY 4.0

Page 2: AllBio and EU CodeFest 2014

Some problems I facedduring my research:

Difficulty getting relevant descriptionsand datasets from NCBI API using bio* libsFor web projects, needed to implementthe same functionality on browser andserverDifficulty writing scalable, reproducibleand complex bioinformatic pipelines

Page 3: AllBio and EU CodeFest 2014

 - Modular and universal bioinformatics

Pipeable UNIX command line tools and

JavaScript / Node.js APIs for bioinformatic

analysis workflows on the server and browser.

Collaborates with   - Represent biological data on the web

 - Build data pipelinesProvides a streaming interface between every file

format and data storage backend. "git for data" |    |  

Bionode.io

BioJS

Dat

dat-data.com @maxogden @mafintosh

Page 4: AllBio and EU CodeFest 2014

 ExamplesBASH

JavaScript

bionode.io (online shell)

bionode-ncbi urls assembly Solenopsis invicta | grep genomic.fna

http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG/ GCA_000188075.1_Si_gnG_genomic.fna.gz

bionode-ncbi download sra arthropoda | bionode-sra

bionode-ncbi download gff bacteria

var ncbi = require('bionode-ncbi') ncbi.urls('assembly', 'Solenopsis invicta'), gotData) function gotData(urls) { var genome = urls[0].genomic.fna download(genome) })

Page 5: AllBio and EU CodeFest 2014

Difficulty getting relevant description anddatasets from NCBI API using bio* libsPython example

Solution: 

import xml.etree.ElementTree as ET from Bio import Entrez Entrez.email = "[email protected]" esearch_handle = Entrez.esearch(db="assembly", term="Achromyrmex") esearch_record = Entrez.read(esearch_handle) for id in esearch_record['IdList']: esummary_handle = Entrez.esummary(db="assembly", id=id) esummary_record = Entrez.read(esummary_handle) documentSummarySet = esummary_record['DocumentSummarySet'] document = documentSummarySet['DocumentSummary'][0] metadata_XML = document['Meta'].encode('utf-8') metadata = ET.fromstring('<root>' + metadata_XML + '</root>') for entry in Metadata[1]: print entry.text

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG

bionode-ncbi

Page 6: AllBio and EU CodeFest 2014

Need to reimplement the same code onbrowser and server.Solution: JavaScript everywhere

 is converting parsers toBionode

AfraSequenceServerGeneValidatorBioJS

Biodalliance

Page 7: AllBio and EU CodeFest 2014

Difficulty writing scalable, reproducible andcomplex bioinformatic pipelines.Solution: Node.js   everywhereStreams var ncbi = require('bionode-ncbi') var tool = require('tool-stream') var through = require('through2') var fork1 = through.obj() var fork2 = through.obj()

ncbi .search('sra', 'Solenopsis invicta') .pipe(fork1) .pipe(dat.reads)

fork1 .pipe(tool.extractProperty('expxml.Biosample.id')) .pipe(ncbi.search('biosample')) .pipe(dat.samples)

fork1 .pipe(tool.extractProperty('uid')) .pipe(ncbi.link('sra', 'pubmed'))

Page 8: AllBio and EU CodeFest 2014

     

Benefit from other JSprojects

Dat BioJS NoFlo

Page 9: AllBio and EU CodeFest 2014
Page 10: AllBio and EU CodeFest 2014
Page 11: AllBio and EU CodeFest 2014

Reusable, small and testedmodules

Page 12: AllBio and EU CodeFest 2014

Some users and Contributors:

 (UC San Diego)Michael LovciOlga Botvinnik

Soon:

DatBiodallianceBioJSYeo Lab

AfraGeneValidator

DNADigest

Page 14: AllBio and EU CodeFest 2014

Why Node.js / JavaScript applies well to Bioinformatics

Easy to write   for Streams

Same language everywhere (JavaScript)Package Manager that works ( )Huge number modules ( )Use other JS projects ( ,  ,  )Possible to write 

StreamsCLI wrappers

Reusable, small and tested modules

NPM93327, 199/day

Dat BioJS NoFloDesktop GUI apps in JS

Page 15: AllBio and EU CodeFest 2014

Module counts

Page 16: AllBio and EU CodeFest 2014

Package Manager that works

Not only for JavaScript, C/C++ too:

npm install bionodenpm install bionode -gnpm testnpm startnpm run test-browsernpm run build-docsnpm initnpm publish

Node.js style C/C++ modulesNative C/C++ running in Google V8