INDIANA UNIVERSITY
HPC@IDC April 2002
Implementing advanced IT facilities for the Indiana Genomics Initiative
Craig A. Stewart
HPC@IDC meeting
April 23-24, 2002, HPC User Forum meeting, Santa Fe, New Mexico
License terms
• Please cite as: Stewart, C.A. Implementing advanced IT facilities for the Indiana Genomics Initiative. 2002. Presentation. Presented at: HPC User Forum (Santa Fe, New Mexico, 23 Apr 2002). Available from: http://hdl.handle.net/2022/15220
• Except where otherwise noted, by inclusion of a source url or some other note, the contents of this presentation are © by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.
Indiana University’s Goals
• IT Goal: “To be a leader in absolute terms in information technology.” IU president Myles Brand, 1996
• Goals of the Indiana Genomics Initiative: To advance understanding of life’s processes, develop new therapies for human diseases, improve the quality of human health in Indiana, and enhance the strength of the central Indiana high-tech economy
IU in a nutshell
• Founded in 1820
• $2B Annual Budget
• 8 campuses
• Campuses are well connected; in particular, IUB, IUPUI, and Purdue’s West Lafayette campus are linked by I-light
• IU operates TransPAC and GlobalNOC
IT@IU in a nutshell
• Academic programs in IT through computer science, library and information sciences, engineering and technology, and most notably the new School of Informatics
• CIO: Vice President Michael A. McRobbie
• ~$100M annual budget
• Technology services offered university-wide
• Pervasive Technology Labs
School of Medicine in a nutshell
• 2nd largest School of Medicine in the US
• IU Cancer Center nationally recognized leader
• Regenstrief Institute longstanding leader in medical informatics
• National leader in optical and tomographic imaging
• Longstanding leader in research on genetically influenced diseases, including Huntington’s disease (Conneally) and alcoholism (Li); currently the lead institution in a national study of bipolar disorder
INGEN
• Created by a $105M grant from the Lilly Endowment to Indiana University
• Involves IU School of Medicine (IUPUI), Departments of Biology and Chemistry (IUB), Center for Genomics and Proteomics (IUB), and University Information Technology Services
• Composed of “Programs” (central research areas) and “Cores” (supporting units that are generally also research areas)
IT and INGEN
• INGEN’s IT core is a critical part of the infrastructure for the initiative as a whole
– Networking (using I-light facility)
– Supercomputing
– Massive Data Storage
– Visualization
– Support
• IT is one of the paths by which INGEN should enhance the Indiana economy
Supercomputing - Oct 17 IU/IBM announcement
• IU tripled the capacity of its IBM SP to more than 1 TFLOPS (one trillion floating-point operations per second)
• IU’s SP is very large by the standards of supercomputers owned by individual universities
• A large part of this acquisition was made possible by funding from INGEN
• IU and IBM also announced a partnership in developing new supercomputer applications for the life sciences
Photo: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer.
Sun E10000
• IU is a Sun “Center of Excellence” and is pursuing collaborative research with Sun in the area of Chemical Informatics
Photo: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer.
AVIDD
• Analysis & Visualization of Instrument-Driven Data
• Large, distributed Intel-compatible Linux cluster
• Distributed data storage/data staging
• Distributed visualization
• Education is a key component of this initiative: distributed courses (IUB, IUPUI, IUN) taught via Access Grid at the advanced undergraduate/beginning graduate level
Massive Data Storage
• IU has a massive data storage system based on IBM and STK robotic tape systems
• The system runs HPSS (High Performance Storage System), which provides excellent security
• More than 300 TB current capacity
• Mirrored storage in Indianapolis and Bloomington should provide safety in data storage
• IU was the first installation to implement remote HPSS movers over long-haul networks
Photo: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer.
Advanced Visualization
• UITS, IU School of Medicine, and IUPUI Computer & Information Science have already collaborated to create 3-DIVE (3-D Interactive Volume Explorer)
• CAVE
• ImmersaDesk
• IU-designed passive 3D environments (4-ft square screen, 5-ft square footprint)
Accomplishments & Challenges
• Past accomplishments
– fastDNAml
– 3-DIVE
• Challenges
– Broader engagement with life scientists
– Data heterogeneity
– New application areas
fastDNAml
Building Phylogenetic Trees
• Goal: an objective means by which phylogenetic trees can be estimated in tolerable amounts of wall-clock time, producing phylogenetic trees with measures of their uncertainty
Why is tree-building an HPC problem?
• The number of bifurcating unrooted trees for n taxa is $\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
• For 100 taxa the number of possible trees is approximately $10^{182}$
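The tree count can be checked numerically. The following minimal Python sketch is an illustration only, not code from fastDNAml or PHYLIP, and its function names are hypothetical. It evaluates the closed form with exact integer arithmetic and cross-checks it against the stepwise-addition argument that explains the formula. An unrooted bifurcating tree on k-1 taxa has 2k-5 branches, so taxon k can be attached in 2k-5 ways, and the total is the product $3 \cdot 5 \cdots (2n-5)$.

```python
from math import factorial, log10


def count_unrooted_trees(n: int) -> int:
    """Closed form: (2n-5)! / (2^(n-3) * (n-3)!) for n >= 3 taxa."""
    if n < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 taxa")
    # Integer division is exact here; the quotient is always an integer.
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def count_by_stepwise_addition(n: int) -> int:
    """Same count built taxon by taxon: a tree on k-1 taxa has 2k-5
    branches, and taxon k may be attached to any one of them."""
    if n < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 taxa")
    total = 1  # there is exactly one unrooted tree on 3 taxa
    for k in range(4, n + 1):
        total *= 2 * k - 5
    return total


if __name__ == "__main__":
    for n in (4, 5, 10, 50, 100):
        count = count_unrooted_trees(n)
        assert count == count_by_stepwise_addition(n)
        print(f"{n:>3} taxa: about 10^{log10(count):.1f} trees")
```

For 10 taxa, both functions return 2,027,025 trees. For 100 taxa, the count has 183 decimal digits, which matches the slide's figure of roughly $10^{182}$. Scoring that many trees exhaustively by maximum likelihood is infeasible, so tree-building depends on heuristic searches and large amounts of computing power.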
fastDNAml
• Developed by Gary Olsen
• Derived from Felsenstein’s PHYLIP programs
• One of the more commonly used maximum likelihood (ML) methods
• The first phylogenetic software to be implemented as a parallel program (at Argonne National Laboratory, using the P4 libraries)
• Olsen, G.J., et al. 1994. fastDNAml: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Computer Applications in the Biosciences 10: 41-48
• An MPI version is now available from IU (development supported by an IBM SUR grant)
Performance of fastDNAml
[Figure: speedup versus number of processors (0 to 70) for 50, 101, and 150 taxa, plotted against perfect scaling]
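The speedup on this slide is conventionally defined as $S(p) = T(1)/T(p)$. This is the single-processor wall-clock time divided by the wall-clock time on p processors. "Perfect scaling" is the line $S(p) = p$, and parallel efficiency is $S(p)/p$. The following minimal Python sketch computes both values from a table of timings. It assumes a single-processor baseline time is available; the slide does not say how its baseline was measured. The function name and example timings are hypothetical and are not the fastDNAml measurements behind the plot.

```python
from typing import Dict, Tuple


def scaling_table(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map processor count p -> (speedup, efficiency) from wall-clock times.

    Speedup is T(1) / T(p); efficiency is speedup / p. A point on the
    "perfect scaling" line has speedup == p and efficiency == 1.0.
    """
    if 1 not in times:
        raise ValueError("a single-processor baseline time T(1) is required")
    baseline = times[1]
    table: Dict[int, Tuple[float, float]] = {}
    for p, t in sorted(times.items()):
        if p < 1 or t <= 0:
            raise ValueError(f"invalid entry: p={p}, time={t}")
        speedup = baseline / t
        table[p] = (speedup, speedup / p)
    return table


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only; these are NOT
    # measurements from the fastDNAml runs plotted on the slide.
    example = {1: 6400.0, 8: 850.0, 32: 230.0, 64: 125.0}
    for p, (s, e) in scaling_table(example).items():
        print(f"p={p:>2}  speedup={s:5.1f}  efficiency={e:.2f}")
```

Reporting efficiency next to speedup makes departures from perfect scaling easy to see. With efficiency, these departures appear as values below 1.0 rather than as a gap between two curves.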
Current projects
• Data integration
• Gamma knife
• Pedigree analysis
• PET scan analysis
• Protein families
• AMASS – shotgun sequence assembly
• Data, data, data
HPC and life sciences
• The HPC hardware and software market is set to expand dramatically thanks to the life sciences
• The HPC and life sciences communities don’t share a common language
• Biomedical researchers are no more conservative than anyone else
• Biomedical researchers are not alone in creating bad code
• Both communities have lots to offer each other, but at present it seems up to the HPC community to reach out (when was the last time an astronomer saved your life?)
• The HPC community has been slow to take advantage of the opportunities offered by collaboration with life scientists
• This will be like the dot-com bust, sort of. The key question is: how great will the similarities be?
Challenges: creating collaborations with life scientists
• Need to challenge “I can do it on my desktop” mentality when appropriate
• Go for the low-hanging fruit
• Remember that physics, astronomy, and other traditional HPC codes have a head start of many years
• Need to recognize the complexity of the life sciences
Current approaches @ IU
• Really clever batch scripts… then portals
• Appropriate documentation
• Door-to-door consulting
• Proof of concept projects
• Contributions to open source/community code efforts
Keys to success in partnerships @ IU
• Long history of openness, diversity in HPC uses
• Accountability and service philosophy
• Supercomputing time and programming support provided as baseline services
• Central computing center staff hired from several disciplines (including biology)
• Computer scientists who actually care about applications
• History and a certain amount of luck
Summary
• IU has thus far been very successful in implementing advanced IT infrastructure for life scientists
• Reaching out has been essential to formation of partnerships
• Industry partnerships have been essential to success
• So far, so good…
Acknowledgements
• IBM research relationships & SUR grants
• Sun and Center of Excellence relationships
• Compaq relationship
• Computer scientists at IU (esp. Randall Bramley, Dennis Gannon, Shiaofen Fang)
• State of Indiana
• Lilly Endowment
Important URLs
• University Information Technology Services: www.indiana.edu/~uits/
• UITS Research & Academic Computing Division www.indiana.edu/~uits/rac
• InGen IT Core: www.indiana.edu/~rac/bioinformatics/ingen.html
• IU Teraflop SP announcement: www.indiana.edu/~rac/outreach.html
• IT@IU: it.iu.edu