Modelling proteins and proteomes using Linux clusters
Ram Samudrala
University of Washington
Examples of biological problems
Protein structure prediction/docking simulations: need to run different trajectories that sometimes talk with each other
Molecular dynamics simulations: need more tightly coupled parallelisation
Polarisable force fields: need true parallelisation
Bioinformatics searches/exploration: trivially parallelisable
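The bioinformatics case is the easiest to exploit. Below is a minimal Python sketch of the pattern, assuming a database small enough to hold in memory; all names are hypothetical and the toy score simply counts identical positions (it is not a real alignment method). The database is split into independent chunks, each chunk is searched with no communication, each worker writes its hits to a file on disk, and a final gather step merges the files. Threads stand in for separate CPUs or nodes.

```python
"""Sketch: trivially parallel search; workers share nothing and
communicate only through result files on disk (hypothetical names)."""
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def score(query, entry):
    # Hypothetical toy score: number of identical positions, no alignment.
    return sum(a == b for a, b in zip(query, entry))


def search_chunk(query, chunk, chunk_id, outdir):
    # Each chunk is independent; its only output is a file on disk.
    hits = [(name, score(query, seq)) for name, seq in chunk]
    path = os.path.join(outdir, f"chunk_{chunk_id}.json")
    with open(path, "w") as fh:
        json.dump(hits, fh)
    return path


def parallel_search(query, database, n_chunks, outdir, top=3):
    items = sorted(database.items())
    chunks = [items[i::n_chunks] for i in range(n_chunks)]  # round-robin split
    # Threads stand in for separate CPUs/nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        paths = list(pool.map(search_chunk, [query] * n_chunks, chunks,
                              range(n_chunks), [outdir] * n_chunks))
    # Gather step: read every result file back and merge the hits.
    hits = []
    for path in paths:
        with open(path) as fh:
            hits.extend(tuple(h) for h in json.load(fh))
    hits.sort(key=lambda h: (-h[1], h[0]))  # best score first, then by name
    return hits[:top]


if __name__ == "__main__":
    db = {"seqA": "MKVLA", "seqB": "MKVIA", "seqC": "GGGGG", "seqD": "MKALA"}
    with tempfile.TemporaryDirectory() as d:
        print(parallel_search("MKVLA", db, n_chunks=2, outdir=d))
        # → [('seqA', 5), ('seqB', 4), ('seqD', 4)]
```

Because the chunks never talk to each other, the same code scales by adding workers; the only shared resource is the disk used for the gather step.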
Computational issues
Need efficient methods to start/stop jobs
Need a load-balancing queuing system
Need fast communication at times
Need stability (uptimes of months to years)
Need low maintenance/management overhead
Need low installation overhead
Need to be cheap!
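One simple way to meet the load-balancing requirement is a pull-based queue: idle workers fetch the next job instead of having jobs pushed to them, so faster CPUs automatically take on more work. The following is a single-machine sketch under that assumption; `run_queue`, the job costs and the worker speeds are hypothetical, and threads with `time.sleep` stand in for real nodes running real jobs.

```python
"""Sketch of a pull-based load-balancing queue (hypothetical, single
machine): idle workers fetch the next job, so faster workers take more."""
import queue
import threading
import time


def run_queue(jobs, worker_speeds):
    """jobs: list of (job_id, cost); worker_speeds: {worker_name: speed}.
    Returns {job_id: name of the worker that ran it}."""
    todo = queue.Queue()
    for job in jobs:
        todo.put(job)
    done = {}
    lock = threading.Lock()

    def worker(name, speed):
        while True:
            try:
                job_id, cost = todo.get_nowait()  # pull the next job when idle
            except queue.Empty:
                return  # queue drained: this worker stops
            time.sleep(cost / speed)  # stand-in for the real computation
            with lock:
                done[job_id] = name

    threads = [threading.Thread(target=worker, args=(n, s))
               for n, s in worker_speeds.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    jobs = [(i, 0.01) for i in range(20)]
    done = run_queue(jobs, {"fast": 4.0, "slow": 1.0})
    print(sum(w == "fast" for w in done.values()), "jobs ran on the fast worker")
```

No central scheduler has to know the speed of each CPU; the balance emerges from workers asking for work, which also keeps the management overhead low.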
Hardware and operating system
256 AMD and Intel CPUs (1-2.5 GHz)
0.5–1 GB RAM, 100–200 GB hard disks, dual-processor motherboards
100 Mbps Ethernet connectivity for each set of 64 processors
White boxes are good but take up space; 1U rack-mount units are ideal
Minimal Linux installation: create a clone “CD” and copy it onto all machines
Our solution
No single solution: each user implements their own
Completely decentralised
Analyse the problem and determine its parallelisable parts
Implementation specific to each problem
Use local scratch space for computation
Redundant storage of data for faster access
Limit the problem space to specific problems
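The local-scratch and redundant-storage points can be sketched as follows, assuming a shared filesystem for inputs and results plus a node-local directory for replicas (`local_replica`, `run_in_scratch` and the directory layout are hypothetical). The input is replicated to local disk once, intermediate I/O happens in a per-job scratch directory, and only the final result goes back over the network.

```python
"""Sketch: read a node-local replica of the input, compute in per-job
scratch space, and copy only the result back to shared storage."""
import os
import shutil
import tempfile


def local_replica(shared_path, replica_dir):
    """Return a node-local copy of shared_path, copying it only once."""
    os.makedirs(replica_dir, exist_ok=True)
    local = os.path.join(replica_dir, os.path.basename(shared_path))
    if not os.path.exists(local):
        shutil.copy2(shared_path, local)
    return local


def run_in_scratch(shared_input, replica_dir, shared_outdir, compute):
    data_path = local_replica(shared_input, replica_dir)  # reads hit local disk
    # Per-job scratch directory; it is deleted when the block exits.
    with tempfile.TemporaryDirectory() as scratch:
        scratch_out = os.path.join(scratch, "result.txt")
        with open(data_path) as src, open(scratch_out, "w") as dst:
            dst.write(compute(src.read()))
        os.makedirs(shared_outdir, exist_ok=True)
        final = os.path.join(shared_outdir,
                             os.path.basename(shared_input) + ".out")
        shutil.copy2(scratch_out, final)  # single transfer back to shared storage
    return final
```

Repeated jobs on the same node reuse the replica, so the network is touched once per input rather than once per job.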
Problem specific implementation
MCSA/GA: socket-based communication of trajectories; multiple trajectories on different CPUs
Docking: sample different ligands/regions of the protein on different CPUs
MD: pairwise force fields are additive, so the interaction terms can be split across CPUs
PFF: ?
Bioinformatics: trivial parallelisation; communication by disk
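The MD point relies on the energy being a sum over atom pairs, E = sum over i<j of V(r_ij), so any partition of the pair list yields partial sums that add up to the total. A minimal sketch, using a Lennard-Jones potential in reduced units purely as a stand-in pair term; the function names and the round-robin split are illustrative assumptions, not the author's code.

```python
"""Sketch: a pairwise-additive energy split across CPUs; the partial
energies from each share of the pair list are summed at the end."""
import itertools
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    # Toy pair potential in reduced units, used only for illustration.
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def partial_energy(coords, pairs):
    # The work one CPU would do: the energy of its share of the pairs.
    return sum(lennard_jones(math.dist(coords[i], coords[j])) for i, j in pairs)


def decomposed_energy(coords, n_cpus):
    pairs = list(itertools.combinations(range(len(coords)), 2))
    shares = [pairs[k::n_cpus] for k in range(n_cpus)]  # round-robin split
    partials = [partial_energy(coords, s) for s in shares]  # parallel in practice
    return sum(partials)  # the reduction step
```

In a real MD run the coordinates change every time step, so the partial energies and forces have to be combined at every step; that per-step reduction is what makes MD more tightly coupled than the trivially parallel searches.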
Semi-exhaustive segment-based folding
Example sequence: EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
generate: fragments from database; 14-state model
minimise: Monte Carlo with simulated annealing; conformational space annealing; GA
filter: all-atom pairwise interactions; bad contacts; compactness; secondary structure
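The minimisation step can be illustrated with a small Monte Carlo simulated annealing sketch. Assumptions: each residue takes one of 14 discrete states (echoing the 14-state model on the slide), the energy function is supplied by the caller, and the periodic exchange in which the worst trajectory restarts from the best conformation found so far is an in-process stand-in for socket-based communication between trajectories. Conformational space annealing and the GA are not shown.

```python
"""Sketch: Monte Carlo with simulated annealing over a discrete-state
conformation, run as several trajectories that periodically share results."""
import math
import random

N_STATES = 14  # assumed: one of 14 discrete states per residue


def anneal(energy, length, n_traj=4, steps=2000, t0=5.0, t_end=0.05,
           exchange_every=200, seed=0):
    rng = random.Random(seed)
    trajs = [[rng.randrange(N_STATES) for _ in range(length)]
             for _ in range(n_traj)]
    energies = [energy(c) for c in trajs]
    best_i = min(range(n_traj), key=energies.__getitem__)
    best, best_e = list(trajs[best_i]), energies[best_i]
    alpha = (t_end / t0) ** (1.0 / steps)  # geometric cooling schedule
    t = t0
    for step in range(1, steps + 1):
        for k in range(n_traj):  # on a cluster: one trajectory per CPU
            trial = list(trajs[k])
            trial[rng.randrange(length)] = rng.randrange(N_STATES)  # one-residue move
            e = energy(trial)
            d = e - energies[k]
            if d <= 0 or rng.random() < math.exp(-d / t):  # Metropolis criterion
                trajs[k], energies[k] = trial, e
                if e < best_e:
                    best, best_e = list(trial), e
        if step % exchange_every == 0:
            # Stand-in for socket communication: worst trajectory adopts the best.
            worst = max(range(n_traj), key=energies.__getitem__)
            trajs[worst], energies[worst] = list(best), best_e
        t *= alpha
    return best, best_e


if __name__ == "__main__":
    target = [3, 7, 7, 1, 12, 0, 5, 9]  # hypothetical toy target

    def toy_energy(conf):
        return sum((s - x) ** 2 for s, x in zip(conf, target))

    print(anneal(toy_energy, len(target)))
```

Sharing only the best conformation keeps communication small, which suits the loosely coupled trajectories described for structure prediction.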
Ab initio prediction at CASP
T170/sfrp3: 4.8 Å for all 69 aa
Comparative modelling at CASP
T182 – 1.0 Å (249 aa; 41% id)
Prediction of SARS-CoV proteinase inhibitors
Ekachai Jenwitheesuk
Bioverse – S. typhimurium protein-protein interaction network
Jason McDermott
Bioverse – H. sapiens protein-protein interaction network
Jason McDermott
Future directions
Network connections using multiple Ethernet cards, configured based on traffic analysis
Gigabit Ethernet (switches are still expensive)
Better network filesystems