BLAST and Bioinformatics Applications on Purdue’s DiaGrid. May 3, 2012. Brian Raub, Purdue University, braub@purdue.edu. Condor Week 2012.


BLAST and Bioinformatics Applications on Purdue’s DiaGrid

May 3, 2012

Brian Raub
Purdue University
braub@purdue.edu

Condor Week 2012


Where were we?

• Over 37 kilocores across campus
  – Three community clusters (Steele, Coates, Rossmann)
  – Two “ownerless” clusters (Radon, Miner)
  – CMS Tier-2 cluster
  – Other small clusters
  – Instructional labs and academic departments


… and what about now?


• Nearly 50 kilocores across campus!
  – Two new community clusters
    • Hansen – Dell nodes w/ four 12-core AMD Opteron 6176 processors
    • Carter – HP nodes w/ two 8-core Intel Xeon-E5 processors (Sandy Bridge)
  – Carter ranks 54th in the latest Top500.org list for fastest supercomputers
  – Carter is the nation’s fastest campus supercomputer


DiaGrid?

• A large, high-throughput, distributed computing system
• Using Condor to manage jobs and resources
• Purdue leading a partnership of 10 campuses and institutions
  – University of Wisconsin, Notre Dame and Indiana University to name a few
• Including all Purdue (and other campus) clusters, lab computers, department computers, desktop, totaling 60,000+ cores


Ok, cool… Now what?


Basic Local Alignment Search Tool

• Comparing nucleotide or protein sequences
  – String and substring pattern matching
• National Center for Biotechnology Information (NCBI)


Why remake something?

• Input file size limitations (5MB, 10MB, etc.)
• # of sequences for comparison
• Timeliness
• Ease of use


BLAST and DiaGrid

• BLAST is highly parallelizable
  – No one sequence result depends on another (GREAT!!!)
  – Split input file with trusty friend AWK
  – Build a Condor DAG to maintain all jobs
    • Never more than 1500 individual jobs
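
The splitting step above can be sketched as follows. The talk used AWK for this; the sketch below uses Python instead, and it assumes the input is a FASTA file in which every record starts with a ">" header line. The function names and the near-equal chunking strategy are illustrative assumptions, not the BLASTer implementation. Only the cap of 1500 jobs comes from the slide.

```python
from typing import List

MAX_JOBS = 1500  # cap on individual jobs, as stated on the slide


def read_fasta_records(text: str) -> List[str]:
    """Return each FASTA record (header plus sequence lines) as one string."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # A new ">" header closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


def split_records(records: List[str], max_jobs: int = MAX_JOBS) -> List[List[str]]:
    """Group records into at most max_jobs chunks of near-equal size."""
    if not records:
        return []
    n_chunks = min(max_jobs, len(records))
    per_chunk = -(-len(records) // n_chunks)  # ceiling division
    return [records[i:i + per_chunk] for i in range(0, len(records), per_chunk)]
```

Because no sequence result depends on another, each chunk can be written to its own file and searched as a separate job, and the per-chunk outputs can then be concatenated in chunk order.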


BLAST and DiaGrid

[Diagram with the labels “Input File” and “Results”]


BLASTer  


BLASTer  


BLASTer  


Big Benefits? We think so!

• Rick Westerman
  – Bioinformatics Specialist at the Purdue University Genomics Facility


Development Hurdles

• DiaGrid disk quota per user
  – Default 1GB -> NOT ENOUGH SPACE!!!
• Condor job failure
  – Set retry flag (We use 20 to be safe)
• Need more features!
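
One way the retry setting could look in a Condor DAG is sketched below. It assumes the "retry flag" refers to DAGMan's RETRY keyword, which the slide does not confirm. The job names, the submit file name blast.sub, and the chunk variable are hypothetical. Only the retry count of 20 comes from the slide.

```python
from typing import List


def build_dag(n_chunks: int, submit_file: str = "blast.sub", retries: int = 20) -> str:
    """Return DAGMan input text with one independent job per input chunk."""
    lines: List[str] = []
    for i in range(n_chunks):
        job = f"blast{i}"  # hypothetical job name
        lines.append(f"JOB {job} {submit_file}")
        # Pass the chunk index to the submit file as the macro $(chunk).
        lines.append(f'VARS {job} chunk="{i}"')
        # Ask DAGMan to rerun a failed job up to `retries` times.
        lines.append(f"RETRY {job} {retries}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_dag(2), end="")
```

The sketch declares no PARENT/CHILD lines because the jobs are independent, so DAGMan may run them in any order.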


To the Future!


BLASTer Plans

• Custom Databases
  – Nearly all researchers want this feature
  – Concern: Database permissions
• More output viewing options
  – Integrated HTML viewer
  – Blast2Go
• Better file management


DiaGrid Plans

• R (programming language) statistical computing
  – Landscape Ecology & Biodiversity Department
• Cryo-Electron Microscopy Tools (Cryo-EM)
  – Single particle reconstruction (EMAN2 and similar tools)
  – Department of Biological Sciences


Questions?
