Indiana University, HPC@IDC April 2002. Implementing advanced IT facilities for the Indiana Genomics Initiative. Craig A. Stewart, stewart@iu.edu. HPC@IDC meeting, April 23-24, 2002, HPC User Forum meeting, Santa Fe, New Mexico.


Implementing advanced IT facilities for the Indiana

Genomics Initiative

Craig A. Stewart

stewart@iu.edu

HPC@IDC meeting

April 23-24, 2002, HPC User Forum meeting, Santa Fe, New Mexico

License terms

• Please cite as: Stewart, C.A. Implementing advanced IT facilities for the Indiana Genomics Initiative. 2002. Presentation. Presented at: HPC User Forum (Santa Fe, New Mexico, 23 Apr 2002). Available from: http://hdl.handle.net/2022/15220

• Except where otherwise noted, by inclusion of a source url or some other note, the contents of this presentation are © by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Indiana University’s Goals

• IT Goal: “To be a leader in absolute terms in information technology.” IU president Myles Brand, 1996

• Goals of the Indiana Genomics Initiative: To advance understanding of life’s processes, develop new therapies for human diseases, improve the quality of human health in Indiana, and enhance the strength of the central Indiana high-tech economy

IU in a nutshell

• Founded in 1820

• $2B Annual Budget

• 8 campuses

• Campuses well connected; esp. IUB, IUPUI, and Purdue’s campus at W. Lafayette connected by I-light

• IU Operates TransPAC, GlobalNOC

IT@IU in a nutshell

• Academic programs in IT through computer science, library and information sciences, engineering and technology, and most notably through new School of Informatics

• CIO: Vice President Michael A. McRobbie

• ~$100M annual budget

• Technology services offered university-wide

• Pervasive Technology Labs

School of Medicine in a nutshell

• 2nd largest School of Medicine in the US

• IU Cancer Center nationally recognized leader

• Regenstrief Institute longstanding leader in medical informatics

• National leader in optical and tomographic imaging

• Longstanding leader in genetically influenced diseases including Huntington’s (Conneally), alcoholism (Li); currently lead institution in national study of bipolar disorder

INGEN

• Created by $105 M grant from the Lilly Endowment to Indiana University

• Involves IU School of Medicine (IUPUI), Departments of Biology and Chemistry (IUB), Center for Genomics and Proteomics (IUB), and University Information Technology Services

• Comprised of “Programs” (central research areas) and “Cores” (supporting units that are also generally research areas)

IT and INGEN

• INGEN’s IT core is a critical part of the infrastructure for the initiative as a whole

– Networking (using I-light facility)

– Supercomputing

– Massive Data Storage

– Visualization

– Support

• IT is one of the paths by which INGEN should enhance the Indiana Economy

Supercomputing - Oct 17 IU/IBM announcement

• IU tripled the capacity of its IBM SP, to > 1 TFLOPS (a trillion mathematical operations per second).

• IU’s SP is very large when considered within the set of supercomputers owned by individual universities

• Large part of this acquisition made possible via funding from INGEN

• IU and IBM also announced a partnership in developing new supercomputer applications for the life sciences

Photo: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer.

Sun E10000

• IU is a Sun “Center of Excellence” and is pursuing collaborative research with Sun in the area of Chemical Informatics

Photo: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer.

AVIDD

• Analysis & Visualization of Instrument-Driven Data

• Large, distributed Intel-compatible Linux cluster

• Distributed data storage/data staging

• Distributed visualization

• Education a key component of this initiative – distributed education (IUB, IUPUI, IUN) taught via Access Grids at advanced undergrad/beginning grad level

Massive Data Storage

• IU has a massive data storage system based on IBM and STK tape robotic systems.

• IU’s massive data storage system is based on HPSS (High Performance Storage System), which provides for excellent security.

• >300 TB current capacity

• Mirrored storage in Indianapolis and Bloomington should provide safety in data storage

• IU was the first installation to implement remote HPSS movers over long haul networks

Photo: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer.

Advanced Visualization

• UITS, IU School of Medicine, and IUPUI Computer & Information Science have already collaborated to create 3-DIVE (3-D Interactive Volume Explorer)

• CAVE

• Immersadesk

• IU-designed passive 3D environments (4’ sq screen, 5’ sq footprint)

Accomplishments & Challenges

• Past accomplishments

– fastDNAml

– 3DIVE

• Challenges

– Broader engagement with life scientists

– Data heterogeneity

– New application areas

fastDNAml

Building Phylogenetic Trees

• Goal: an objective means by which phylogenetic trees can be estimated in tolerable amounts of wall-clock time, producing phylogenetic trees with measures of their uncertainty

Why is tree-building a HPC problem?

• The number of bifurcating unrooted trees for n taxa is

$$\frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$

• For 100 taxa the number of possible trees is ~$10^{182}$
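The growth implied by this formula can be checked directly. The following is a minimal sketch, not part of the original talk; the function name `unrooted_tree_count` is an assumption introduced here for illustration. It evaluates the formula with exact integer arithmetic and reports the order of magnitude for 100 taxa.

```python
from math import factorial


def unrooted_tree_count(n: int) -> int:
    """Count the distinct bifurcating unrooted trees on n labelled taxa."""
    if n < 3:
        raise ValueError("the formula applies to n >= 3 taxa")
    # (2n-5)! / (2^(n-3) * (n-3)!); the quotient is always an integer,
    # so exact integer division is safe here.
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


if __name__ == "__main__":
    for n in (4, 5, 10):
        print(f"{n} taxa: {unrooted_tree_count(n)} trees")
    # For 100 taxa, report only the power of ten (number of digits minus one).
    exponent = len(str(unrooted_tree_count(100))) - 1
    print(f"100 taxa: on the order of 10^{exponent} trees")
```

Because exhaustive enumeration is out of the question at this scale, tree-building methods have to search the space heuristically, which is where the compute time goes.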

fastDNAml

• Developed by Gary Olsen

• Derived from Felsenstein’s PHYLIP programs

• One of the more commonly used ML methods

• The first phylogenetic software implemented in a parallel program (at Argonne National Laboratory, using P4 libraries)

• Olsen, G.J., et al. 1994. fastDNAml: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Computer Applications in the Biosciences 10: 41-48

• MPI version available from IU now (development supported by IBM SUR grant)

Performance of fastDNAml

[Figure: speedup versus number of processors (both axes 0 to 70) for fastDNAml. Series: Perfect Scaling, 50 Taxa, 101 Taxa, 150 Taxa.]
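For readers unfamiliar with the quantity plotted, speedup on p processors is conventionally the single-processor wall-clock time divided by the time on p processors, and perfect scaling means speedup equal to p. The sketch below only illustrates that definition; the function names and the timings are made up for the example and are not measurements from this talk.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup = time on one processor / time on p processors."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Parallel efficiency = speedup / p; 1.0 corresponds to perfect scaling."""
    return speedup(t_serial, t_parallel) / processors


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only to illustrate the arithmetic.
    t1 = 100.0
    for p, tp in ((4, 25.0), (8, 25.0)):
        print(p, speedup(t1, tp), efficiency(t1, tp, p))
```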

Current projects

• Data integration

• Gamma knife

• Pedigree analysis

• PET scan analysis

• Protein families

• AMASS – shotgun sequence assembly

• Data, data, data

HPC and life sciences

• HPC hardware and software market set to dramatically expand thanks to life sciences

• HPC and life sciences communities don’t share common language

• Biomedical researchers are no more conservative than anyone else

• Biomedical researchers not alone in creating bad code

• Both communities have lots to offer each other, but it seems at present up to the HPC community to reach out (when was the last time an astronomer saved your life?)

• HPC community has been slow to take advantage of opportunities offered via collaboration with life scientists

• This will be like the dot-com bust – sort of. The key question is: how great will be the similarities?

Challenges: creating collaborations with life scientists

• Need to challenge “I can do it on my desktop” mentality when appropriate

• Go for the low hanging fruit

• Remember that physics, astronomy, and other traditional HPC codes have a head start of many years

• Need to recognize the complexity of the life sciences

Current approaches @ IU

• Really clever batch scripts…. then portals

• Appropriate documentation

• Door to door consulting

• Proof of concept projects

• Contributions to open source/community code efforts

Keys to success in partnerships @ IU

• Long history of openness, diversity in HPC uses

• Accountability and service philosophy

• Supercomputing time and programming support are baseline services

• Central computing center staff hired from several disciplines (including biology)

• Computer scientists who actually care about applications

• History and a certain amount of luck

Summary

• IU has thus far been very successful in implementing advanced IT infrastructure for life scientists

• Reaching out has been essential to formation of partnerships

• Industry partnerships have been essential to success

• So far, so good……

Acknowledgements

• IBM research relationships & SUR grants

• Sun and Center of Excellence relationships

• Compaq relationship

• Computer scientists at IU (esp. Randall Bramley, Dennis Gannon, Shiaofen Fang)

• State of Indiana

• Lilly Endowment

Important URLs

• University Information Technology Services: www.indiana.edu/~uits/

• UITS Research & Academic Computing Division: www.indiana.edu/~uits/rac

• InGen IT Core: www.indiana.edu/~rac/bioinformatics/ingen.html

• IU Teraflop SP announcement: www.indiana.edu/~rac/outreach.html

• IT@IU: it.iu.edu