Cyberinfrastructure Day 2010: Applications in Biocomputing

Jeremy Yang, Software Systems Manager, Division of Biocomputing, Dept. of Biochemistry & Molecular Biology, UNM School of Medicine. Cyberinfrastructure Day -- April 22, 2010

Upload: jeremy-yang

Post on 12-Nov-2014


DESCRIPTION

UNM Cyberinfrastructure Day 2010 presentation: Applications in Biocomputing. Covers cyberinfrastructure issues in biomedical and cheminformatics research computing.

TRANSCRIPT

Page 1: Cyberinfrastructure Day 2010: Applications in Biocomputing

Jeremy Yang, Software Systems Manager, Division of Biocomputing, Dept. of Biochemistry & Molecular Biology, UNM School of Medicine

Cyberinfrastructure Day -- April 22, 2010

Page 2: Cyberinfrastructure Day 2010: Applications in Biocomputing

I. What is Biocomputing?
II. Cyber Revolution (~1980-2010+)
III. Cyberinfrastructure (To be or not to be?)
IV. Super Computing, Redefined

Page 3: Cyberinfrastructure Day 2010: Applications in Biocomputing

Division of Biocomputing http://biocomp.health.unm.edu/

Department of Biochemistry & Molecular Biology School of Medicine

Also affiliated with the NIH Roadmap-funded UNM Center for Molecular Discovery

Page 4: Cyberinfrastructure Day 2010: Applications in Biocomputing

• Biomolecular screening informatics
• Cheminformatics
• Bioinformatics
• Genomics
• Virtual screening
• Molecular modeling
• SAR (Structure-Activity Relationship)
• Data mining, machine learning
• 3D visualization
• Public data integration
• Collaborations in chemistry, biology, medicine, comp sci
• BIOMED 505 course
• Software development, management, deployment & support

Page 5: Cyberinfrastructure Day 2010: Applications in Biocomputing

Larry Sklar, et al., UNMCMD (NIH Roadmap)

~$20M NIH awarded to date

Page 6: Cyberinfrastructure Day 2010: Applications in Biocomputing

• 32-CPU Linux cluster
• 32GB RAM server
• Linux: OpenSUSE, CentOS, RedHat, Fedora, Ubuntu
• SGI/IRIX
• Windows, Mac OS X
• Automated integration with NIH databases
• 2+ Oracle instances
• PostgreSQL, MySQL
• Stereo graphics workstation
• 25+ scientific software packages
• Supported in-house applications

We are cyberinfrastructure users and providers!

Page 7: Cyberinfrastructure Day 2010: Applications in Biocomputing
Page 8: Cyberinfrastructure Day 2010: Applications in Biocomputing

Virtual chemistry: property prediction, chemspace navigation, computer-aided molecular design, graph theory, databases

Page 9: Cyberinfrastructure Day 2010: Applications in Biocomputing

• Nucleotide and protein sequence analysis
• Genomics, proteomics
• Merging with chemical biology, etc.

Page 10: Cyberinfrastructure Day 2010: Applications in Biocomputing

• Computational search for likely biological actives
• Database may be real or virtual compounds
• 2D and 3D methods
• 2D similarity search
• 3D similarity search (shape, pharmacophore)
• Docking (3D, protein binding site)

Example: 3D shape search, Prozac & Paxil (c/o OpenEye ROCS)
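The 2D similarity search named on this slide is commonly scored with the Tanimoto coefficient over structural fingerprints. The sketch below is a minimal illustration under stated assumptions: the fingerprints are hypothetical toy sets of integers (each integer stands for one "on" bit), and the names `tanimoto`, `similarity_search`, `cmpd_A`, etc. are invented here. It does not reproduce any specific toolkit used by the Division.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprint bit sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query, database, threshold=0.5):
    """Return (name, score) pairs at or above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy fingerprints, not derived from real structures.
database = {
    "cmpd_A": frozenset({1, 2, 3, 4}),
    "cmpd_B": frozenset({1, 2, 3, 8, 9}),
    "cmpd_C": frozenset({7, 8, 9}),
}
query = frozenset({1, 2, 3, 4, 5})

results = similarity_search(query, database, threshold=0.4)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

In a real screen the bit sets would come from a cheminformatics toolkit, and the threshold would be tuned to the fingerprint type.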

Page 11: Cyberinfrastructure Day 2010: Applications in Biocomputing

atoms, bonds, surfaces, fields, interactions, stereo

serotonin

hemoglobin

Page 12: Cyberinfrastructure Day 2010: Applications in Biocomputing

Computational models for protein-ligand binding

Interaction potential sites: hydrophobic (green), hydrophilic (purple), H-bond acceptors (red)

Gleevec is a leukemia drug known to bind to Abl kinase.

Abl kinase (1iep.pdb)

Gleevec in binding site

Page 13: Cyberinfrastructure Day 2010: Applications in Biocomputing

Jmol interactive DNA modeling demo: http://chemapps.stolaf.edu/pe/protexpl/htm/top.htm?id=1d66&&&chpa=true

PyMOL movie: http://video.google.com/videoplay?docid=-5859274887925224981#

Expert users can advance understanding via rich, dynamic, visual interfaces.

(Watch movies...)

Page 14: Cyberinfrastructure Day 2010: Applications in Biocomputing

E.g., Searching NIH PubChem for non-selectivity

Page 15: Cyberinfrastructure Day 2010: Applications in Biocomputing

Many biomedical data sources worldwide

SLIDE 15 (15 MIN?)

Page 16: Cyberinfrastructure Day 2010: Applications in Biocomputing

Division of Biocomputing in 2008

Page 17: Cyberinfrastructure Day 2010: Applications in Biocomputing

• Rapid change, challenge and opportunity
• Learning from history, trends (new not enough)
• Winners and losers
• Science, experts have led and followed.
• ~1980-2010 covers 3σ (99.7%)
• And evolution...
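The "3σ (99.7%)" figure is the familiar normal-distribution rule of thumb. A minimal check, assuming a normal distribution (the slide does not say what quantity is distributed, so that model is an assumption, and `coverage` is a name invented here):

```python
import math


def coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"{k} sigma: {coverage(k):.4f}")
```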

Page 18: Cyberinfrastructure Day 2010: Applications in Biocomputing


Page 19: Cyberinfrastructure Day 2010: Applications in Biocomputing

1977: Atari 2600
1978: Space Invaders
1981: IBM-PC (MS-DOS)
1983: cellphone
1983: GNU Project
1984: Neuromancer, William Gibson, “cyberspace”
1984: Apple Mac, mouse, windows & icons

Page 20: Cyberinfrastructure Day 2010: Applications in Biocomputing

1985: Oracle 5 (client-server)
1989: Intel 486 (1M transistors, 50MHz)
1990: MS Windows 3.0
1990: WWW (Berners-Lee)
1991: High Perf Comp & Comm Act (Al Gore)
1991: Linux (Linus Torvalds)
1991: AOL
1991: ETrade

Page 21: Cyberinfrastructure Day 2010: Applications in Biocomputing

1993: Jurassic Park (via SGI)
1993: NCSA Mosaic
1994: Netscape Navigator
1994: “Good Times” hoax
1994: Match.com
1995: “Concept” virus (Word)
1995: Internet Explorer
1995: Apache project
1995: Yahoo!

Page 22: Cyberinfrastructure Day 2010: Applications in Biocomputing

1995: Amazon.com
1995: My mother gets email
1997: Google
1997: eBay
1999: Melissa virus (Outlook)
1999: Napster (p2p)
2000: MS convicted
2000: 3M USA broadband*
2000: dot-com bubble pops

*Fixed non dial-up internet connections >56k (FCC).

Page 23: Cyberinfrastructure Day 2010: Applications in Biocomputing

2000: 802.11b wireless
2001: Apple iPod
2001: Apple iTunes
2001: Wikipedia
2003: Skype
2005: YouTube
2005: Rio power grid hacked
2005: NSA domestic surveillance
2006: Facebook

Page 24: Cyberinfrastructure Day 2010: Applications in Biocomputing

2006: Amazon Cloud
2007: DOD hacked
2008: 70M USA broadband*
2009: Cyberdefense USA priority
2009: Twitter role in Iran election protests
2010: UAVs are SOPs
2011: Cyber terrorism?

*Fixed non dial-up internet connections >56k (FCC).

Page 25: Cyberinfrastructure Day 2010: Applications in Biocomputing

The dotted line keeps moving...

Case study: database cheminformatics in pharma research, 1990→2000.

Page 26: Cyberinfrastructure Day 2010: Applications in Biocomputing

• In 1990, high-speed chemical searching was beyond standard capabilities.
• Research groups managed local servers in their labs and specialized DB engines (e.g. Daylight Inc.).
• By 2000, this function had moved to IT as part of the corporate informatics infrastructure (via Oracle cartridges, etc.).
• The transition was not smooth, but very beneficial.

Page 27: Cyberinfrastructure Day 2010: Applications in Biocomputing

Chemical searching

Standard functions: substructure, similarity, identity

Structures shown: imidazoles, cocaine
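Two of the standard functions listed here, substructure and identity search, can be sketched as graph matching. The example below is a toy illustration under loud assumptions: molecules are hypothetical heavy-atom graphs that ignore bond orders, aromaticity and hydrogens, and the helper names (`graph`, `has_substructure`, `is_identical`) are invented here. Production engines, such as the cartridge-based systems mentioned on the previous slide, are far more elaborate and typically prefilter candidates before any atom-by-atom matching.

```python
def graph(atoms, pairs):
    """Build a toy molecular graph: element symbols plus undirected bonds."""
    return list(atoms), {frozenset(p) for p in pairs}


def has_substructure(pattern, target):
    """True if every pattern atom and bond can be mapped into the target."""
    p_atoms, p_bonds = pattern
    t_atoms, t_bonds = target

    def extend(mapping):
        i = len(mapping)  # index of the next pattern atom to place
        if i == len(p_atoms):
            return True
        for j, element in enumerate(t_atoms):
            if element != p_atoms[i] or j in mapping:
                continue
            # Each pattern bond from atom i to an already placed atom k
            # must also exist between their images in the target.
            bonds_ok = all(
                frozenset({j, mapping[k]}) in t_bonds
                for k in range(i)
                if frozenset({i, k}) in p_bonds
            )
            if bonds_ok and extend(mapping + [j]):
                return True
        return False

    return extend([])


def is_identical(a, b):
    """Identity search: equal atom and bond counts plus an embedding."""
    return (len(a[0]) == len(b[0]) and len(a[1]) == len(b[1])
            and has_substructure(a, b))


# Hypothetical toy molecules, not taken from the slides.
library = {
    "ethanol": graph("CCO", [(0, 1), (1, 2)]),
    "acetic_acid": graph("CCOO", [(0, 1), (1, 2), (1, 3)]),
    "dimethyl_ether": graph("COC", [(0, 1), (1, 2)]),
}
query = graph("CCO", [(0, 1), (1, 2)])  # a C-C-O fragment

hits = [name for name, mol in library.items() if has_substructure(query, mol)]
exact = [name for name, mol in library.items() if is_identical(query, mol)]
print("substructure hits:", hits)
print("identity hits:", exact)
```

Backtracking like this is exponential in the worst case, which is one reason high-speed chemical searching needed specialized engines in 1990.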

Page 28: Cyberinfrastructure Day 2010: Applications in Biocomputing

(1) office equipment
(2) lab equipment
(3) experimental apparatus
(4) the experiment
(5) a commodity
(6) custom configured experimental vehicle for exploration
(7) all of the above

Page 29: Cyberinfrastructure Day 2010: Applications in Biocomputing


Page 30: Cyberinfrastructure Day 2010: Applications in Biocomputing

• Scientific software
• Computational science
• Commodity software
• Engineering enables science
• Science requires agile development, high performance, experimentation, risk taking, play.
• Cyberinfrastructure users and developers/maintainers

SLIDE 30 (30 MIN?)

Page 31: Cyberinfrastructure Day 2010: Applications in Biocomputing

• Scientific research
• Computational research
• High performance computing as a research tool
• High performance infrastructure as a productivity tool

• Scientific software for experts
• Enabling software for scientists
• Commoditization (e.g. cloud computing)
• Plumbing vs. experimental apparatus
• Appropriate tiers and domains

Page 32: Cyberinfrastructure Day 2010: Applications in Biocomputing

IT: “Poorly managed computers and needy ill-trained users put the system at risk.”

Research: “We need power, flexibility and access and not another lame PC.”

Page 33: Cyberinfrastructure Day 2010: Applications in Biocomputing

And with other cyberfolks too. And with great results.

Page 34: Cyberinfrastructure Day 2010: Applications in Biocomputing

• In ~5 yrs, super → un-super
• Super computing? Define computer.
• Advances from unexpected places:
  ◦ gaming, movies (graphics -- vs. AI)
  ◦ social networking (crowdsourcing)
  ◦ even business (web standards, UIs, security)
• Super computing is pushing the current limits
• But where are the key frontiers?

Page 35: Cyberinfrastructure Day 2010: Applications in Biocomputing

Advances from unexpected places...

Page 36: Cyberinfrastructure Day 2010: Applications in Biocomputing

Colossus code breaking computer, UK.

Page 37: Cyberinfrastructure Day 2010: Applications in Biocomputing

ENIAC computer, Univ. of Pennsylvania.

Page 38: Cyberinfrastructure Day 2010: Applications in Biocomputing

Cray computer

Page 39: Cyberinfrastructure Day 2010: Applications in Biocomputing
Page 40: Cyberinfrastructure Day 2010: Applications in Biocomputing

SLIDE 40 (40 MIN?)

Page 41: Cyberinfrastructure Day 2010: Applications in Biocomputing
Page 42: Cyberinfrastructure Day 2010: Applications in Biocomputing

High performance (super) computing is pushing the current limits.

Page 43: Cyberinfrastructure Day 2010: Applications in Biocomputing
Page 44: Cyberinfrastructure Day 2010: Applications in Biocomputing
Page 45: Cyberinfrastructure Day 2010: Applications in Biocomputing
Page 46: Cyberinfrastructure Day 2010: Applications in Biocomputing

This is what a “computer” looks like.

Page 47: Cyberinfrastructure Day 2010: Applications in Biocomputing
Page 48: Cyberinfrastructure Day 2010: Applications in Biocomputing

“The network is the computer.” - John Gage (Sun, NetDay founder)

Page 49: Cyberinfrastructure Day 2010: Applications in Biocomputing

Corollaries:
• The network is the (semantic) database
• The network is cyberspace
• The network is us too

Page 50: Cyberinfrastructure Day 2010: Applications in Biocomputing

• Super users → super computing
• Blackbox AI/monolith paradigm limiting
• Human/computer co-evolution

Cytoscape biological network visualizer with drug-target interactions

Page 51: Cyberinfrastructure Day 2010: Applications in Biocomputing
Page 52: Cyberinfrastructure Day 2010: Applications in Biocomputing

“Super Computers” @ Division of Biocomputing
• Tudor Oprea
• Cristian Bologa
• Stephen Mathias
• Oleg Ursu
• Jerome Abear
• Ramona Curpan
• Liliana Halip
• Andrei Leitao

Jeremy Yang [email protected]

Cyberinfrastructure Day -- April 22, 2010

Happy Earth Day!