Is there an app for that?
Challenges in scalable analysis for life sciences. Nirav Merchant, UA BioComputing + iPlant, Arizona Research Laboratories, University of Arizona, http://bcf.arl.arizona.edu/. Topic coverage: formula for success (and failure), flavors of bio-information.
![Page 1: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/1.jpg)
Is there an app for that?
Challenges in scalable analysis for life sciences

Nirav Merchant
UA BioComputing + iPlant
Arizona Research Laboratories
University of Arizona
http://bcf.arl.arizona.edu/
![Page 2: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/2.jpg)
Topic Coverage
• Formula for success (and failure)
• Flavors of Bio-information
• What is iPlant?
• Typical Non-NGS workflow
• Data life cycle issues (some)
• Application life cycle issues (some)
• Why “app”?
![Page 3: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/3.jpg)
+ =
Simple Formula
![Page 4: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/4.jpg)
The Reality
+ +
PERL, Python, Java, Ruby, Fortran, C, C#, C++, R, Matlab, etc.
Amazon, Azure, Rackspace, Campus HPC, XSEDE, etc.
and lots of glue…
![Page 5: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/5.jpg)
+ =
Simple Formula
![Page 6: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/6.jpg)
Life science: Going across scales
![Page 7: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/7.jpg)
Putting it all to work
Wayne Stayskal, The Tampa Tribune
![Page 8: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/8.jpg)
The iPlant Collaborative
Cyberinfrastructure for the Plant Sciences
• The iPlant CI is designed as infrastructure.
• This means it is a platform upon which other projects can build.
• Use of the iPlant infrastructure can take one of several forms:
  • Storage
  • Computation
  • Hosting
  • Web Services
  • Scalability
![Page 9: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/9.jpg)
For a challenge as broad as “plant science,” a focus on specific applications or tools is a moving target and is never enough.
It is most important to build a platform that can support diverse and constantly evolving needs. “Cyberinfrastructure” is, in fact, infrastructure. The platform can lift all the apps, not select winners and losers.
“The useful lifetime of our analysis toolchains is now 6 months”
- Matthew Trunnell, Broad Institute
![Page 10: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/10.jpg)
End Users
Computational Users
TeraGrid / XSEDE
![Page 11: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/11.jpg)
BioInformation :: Data Flavors
• Sequences
• Structures
• Images
• Video
• Audio
• Pathways (graphs)
• Text (Publications)
• Traces
• Combination (e.g. Video & Traces)
• And much more …
![Page 12: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/12.jpg)
Life scientist :: Data Wrestler
• Volume of data is increasing
• Resolution of data is increasing
• Number of data repositories is increasing
• Ever-increasing analysis options
• Demands to share and collaborate on data (team science)
• Do you know where your data is? (and your collaborators' data!)
![Page 13: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/13.jpg)
[Figure labels: Systems Biology, Genomics, Functional Genomics, Metabolomics, Proteomics, Pharmacogenomics, Modeling, Clinical, Pathways]
![Page 14: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/14.jpg)
X prize for sequencing
The 2012 guidelines are different; this graphic is dated.
![Page 15: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/15.jpg)
X prize for analyzing it ?
?
![Page 16: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/16.jpg)
The Lifecycle
• Data Acquisition and Modeling
• Collaboration and Visualization
• Analysis and Data Mining
• Dissemination and Sharing
• Archive and Presentation
The Fourth Paradigm: Data-Intensive Scientific Discovery
![Page 17: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/17.jpg)
![Page 18: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/18.jpg)
![Page 19: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/19.jpg)
Why is this hard when we have …
• Pegasus
• Taverna
• Kepler
• Condor (DAGMan)
• Gearman
• Makeflow
• myExperiment
• Science pipes
• We have X (take your pick)
![Page 20: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/20.jpg)
What did the scientists do?
• Used the “parametric launcher”
• Essentially it's a very functional “submit” script!
• Why use it?
  • Directory full of files and one executable
  • Simple linear flow (no branching)
  • Needed results “yesterday” for a conference/working group
  • Needs to be run ONCE every year
• Not sexy but functional
• Serial runs are important
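
The pattern described on this slide (one executable, a directory full of input files, a linear serial flow) can be sketched in a few lines. This is a minimal illustration in Python, not the parametric launcher itself; the function names `build_commands` and `run_serial` and the convention of passing the input path as the last argument are assumptions made for the example.

```python
import subprocess
from pathlib import Path


def build_commands(executable, input_dir, pattern="*"):
    """Return one command (argument list) per matching input file, sorted by name.

    `executable` is a list such as ["./analyze", "--fast"] (hypothetical);
    each input path is appended as the final argument.
    """
    files = sorted(p for p in Path(input_dir).glob(pattern) if p.is_file())
    return [[*executable, str(path)] for path in files]


def run_serial(commands):
    """Run the commands one after another and collect (command, exit code)."""
    results = []
    for cmd in commands:
        completed = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, completed.returncode))
    return results
```

A call such as `run_serial(build_commands(["./analyze"], "inputs", "*.fa"))` (hypothetical names) would process every matching file in order with no branching, which is all this use case needs.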
![Page 21: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/21.jpg)
Python in HPC : OMG
![Page 22: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/22.jpg)
Data issues
![Page 23: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/23.jpg)
DLM: Issues
• Most “pipelines/analysis” are data intensive
  • Sadly, data originates from slow desktops, external hard drives, and file servers using FTP, HTTP, etc. (and ends up there)
• Hard to stage data to begin computation!
  • No place to bring things together (quickly)
• Data needs substantial pre- and post-processing
  • Metadata is usually not adequate
• RDBMS are part of workflows
  • Do you need better indexing of flat files?
• It does not have to be this way!
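
As one possible reading of “better indexing of flat files”, the sketch below builds a byte-offset index over a FASTA-style sequence file so that a single record can be fetched with a seek instead of a full scan. The file layout (header lines starting with `>`) and the function names `build_offset_index` and `fetch_record` are assumptions for illustration; the slides do not prescribe a particular indexing scheme.

```python
def build_offset_index(handle):
    """Map each record name to the byte offset of its '>' header line.

    `handle` must be opened in binary mode so offsets are true byte positions.
    """
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:  # ignore headers that carry no name
                index[fields[0].decode("ascii")] = offset
    return index


def fetch_record(handle, index, name):
    """Seek straight to one record and return its sequence as a single string."""
    handle.seek(index[name])
    handle.readline()  # skip the header line
    parts = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # next record begins
        parts.append(line.strip().decode("ascii"))
    return "".join(parts)
```

The index is small enough to keep next to the flat file, which avoids loading the data into an RDBMS purely to gain random access.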
![Page 24: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/24.jpg)
![Page 25: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/25.jpg)
Data Lifecycle: Our effort
![Page 26: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/26.jpg)
What can users do ?
![Page 27: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/27.jpg)
![Page 28: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/28.jpg)
But I don’t get throughput
Networking is a huge BLACK BOX, and there is too much finger pointing
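
A first step out of the black box is to measure what a transfer actually achieves. The sketch below times a plain file copy and reports megabytes per second; it is only an illustration, the function name `timed_copy` is hypothetical, and a local copy exercises disks rather than the network. The same timing pattern could be wrapped around whichever transfer tool is really in use.

```python
import shutil
import time
from pathlib import Path


def timed_copy(src, dst):
    """Copy src to dst and return (bytes_copied, seconds, megabytes_per_second)."""
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    size = Path(dst).stat().st_size
    # Guard against a zero reading from the timer on very small files.
    rate = size / elapsed / 1e6 if elapsed > 0 else float("inf")
    return size, elapsed, rate
```

Recording these numbers for the same file across several paths (desktop to file server, file server to compute) gives concrete figures to discuss instead of impressions.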
![Page 29: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/29.jpg)
Compute Issues: Cloud
![Page 30: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/30.jpg)
What is cloud computing ?
http://geekandpoke.typepad.com/geekandpoke/2009/03/let-the-clouds-make-your-life-easier.html
![Page 31: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/31.jpg)
The application lifecycle
![Page 32: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/32.jpg)
The iPlant Collaborative
iPlant Discovery Environment
• A rich web client
  • Provides a consistent interface to a range of bioinformatics tools
  • Provides a portal to users not wishing to interact with lower-level infrastructure
• An integrated, extensible system of applications and services
  • Provides additional intelligence above low-level APIs – Provenance, Collaboration, etc.
![Page 33: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/33.jpg)
The iPlant Collaborative
Project Atmosphere™: Custom Cloud Computing
• API-compatible implementation of Amazon EC2/S3 interfaces
• Virtualize the execution environment for applications and services
• Get up to 12-core / 48 GB instances
• Access to Cloud Storage + EBS
• 1008 users
• 167 users launched 657 instances (May 2012)
  • 227 were terminated outside of Atmosphere due to idleness (per user's request)
  • For 430 instances, the average run time was 1 day, 16 hours, and 13 minutes. The longest ran for 30 days
• Run servers, CloudBurst desktop use cases. Big data and the desktop are co-local again!
• >60 hosted applications in Atmosphere today, including users from USDA, Forest Service, data providers, etc.
• 30+ private images for postdocs and grad students for training classes
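
The May 2012 usage figures on this slide can be checked with a little arithmetic. The snippet below uses only numbers from the slide; reading the 430 instances as the 657 launched minus the 227 terminated for idleness is an assumption (the subtraction matches), and the instance-hours total is a derived estimate, not a figure reported in the talk.

```python
from datetime import timedelta

# Figures taken from the slide (May 2012).
users_launching = 167
instances_launched = 657
terminated_for_idleness = 227
average_runtime = timedelta(days=1, hours=16, minutes=13)

# Assumption: the 430 instances are those not terminated for idleness.
remaining = instances_launched - terminated_for_idleness
instances_per_user = instances_launched / users_launching
total_instance_hours = remaining * average_runtime.total_seconds() / 3600

print(remaining)                     # 430
print(round(instances_per_user, 2))  # 3.93
print(round(total_instance_hours))   # 17293
```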
![Page 34: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/34.jpg)
Atmosphere: Collaboration
iPlant Data Store
![Page 35: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/35.jpg)
Lifecycle
![Page 36: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/36.jpg)
How to Connect
![Page 37: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/37.jpg)
Different Ways to Log in to VMs
![Page 38: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/38.jpg)
Steps to get started !
![Page 39: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/39.jpg)
My wish list for CCL (parrot)
• Improved performance for iRODS transfers (parallel transfers?)
• File permission calls (iRODS ACL)*
• Ability to provide throughput/transfer stats
• Thanks for updating iRODS support to 3.1
![Page 40: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/40.jpg)
My wish list for CCL (makeflow)
• *Bundle dependencies along with script and binaries, e.g. CDE: Automatically create portable Linux applications (http://www.pgbovine.net/cde.html)
• Progress reporting and profiling of performance, e.g. the equivalent of a progress bar

*Not a makeflow issue but a good feature
![Page 41: Is there an app for that ?](https://reader035.vdocuments.net/reader035/viewer/2022070422/568164e9550346895dd75484/html5/thumbnails/41.jpg)
The iPlant Collaborative

Metadata, Data, Tools, Workflows, Viz

Executive Team: Steve Goff, Dan Stanzione

Faculty Advisors & Collaborators: Ali Akoglu, Greg Andrews, Kobus Barnard, Sue Brown, Thomas Brutnell, Michael Donoghue, Casey Dunn, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, Dan Kliebenstein, Jim Leebens-Mack, David Lowenthal, Robert Martienssen, B.S. Manjunath, Nirav Merchant, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Ann Stapleton, Lincoln Stein, Val Tannen, Todd Vision, Doreen Ware, Steve Welch, Mark Westneat

Staff: Greg Abram, Sonali Aditya, Roger Barthelson, Brad Boyle, Todd Bryan, Gordon Burleigh, John Cazes, Mike Conway, Karen Cranston, Rion Doodey, Andy Edmonds, Dmitry Fedorov, Michael Gatto, Utkarsh Gaur, Cornel Ghiban, Michael Gonzales, Hariolf Häfele, Matthew Hanlon, Anthony Heath, Barbara Heath, Matthew Helmke, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Eun-Sook Jeong, Logan Johnson, Chris Jordan, B.D. Kim, Kathleen Kennedy, Mohammed Khalfan, Seung-jin Kim, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Aruna Lakshmanan, Sue Lauter, Tina Lee, Andrew Lenards, Zhenyuan Lu, Eric Lyons, Naim Matasci, Sheldon McKay, Robert McLay, Angel Mercer, Dave Micklos, Nathan Miller, Steve Mock, Martha Narro, Praveen Nuthulapati, Shannon Oliver, Shiran Pasternak, William Peil, Titus Purdin, J.A. Raygoza Garay, Dennis Roberts, Jerry Schneider, Bruce Schumaker, Sriramu Singaram, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Kris Urie, Peter Van Buren, Hans Vasquez-Gross, Matthew Vaughn, Fusheng Wei, Jason Williams, John Wregglesworth, Weijia Xu, Jill Yarmchuk

Postdocs: Barbara Banbury, Jamie Estill, Bindu Joseph, Christos Noutsos, Brad Ruhfel, Stephen A. Smith, Chunlao Tang, Lin Wang, Liya Wang, Norman Wickett

Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, Ya-Di Chen, John Donoghue, Steven Gregory, Yekatarina Khartianova, Monica Lent, Amgad Madkour, Aniruddha Marathe, Kurt Michaels, Dhanesh Prasad, Andrew Predoehl, Jose Salcedo, Shalini Sasidharan, Gregory Striemer, Jason Vandeventer, Kuan Yang