cyberinfrastructure and molecular structure determination · cyberinfrastructure and molecular...

60
Russ Miller State University of New York at Buffalo Hauptman-Woodward Med Res Inst Cyberinfrastructure and Molecular Structure Determination NSF, NIH, DOE, NIMA, NYS, HP

Upload: others

Post on 07-Jan-2020

19 views

Category:

Documents


0 download

TRANSCRIPT

Russ MillerState University of New York at BuffaloHauptman-Woodward Med Res Inst

Cyberinfrastructure and Molecular Structure Determination

NSF, NIH, DOE, NIMA, NYS, HP

Russ Miller ([email protected]) University of Miami 9/23/2006

Academia in the 21st Century

Embrace digital data-driven societyEmpower students to compete in knowledge-based economySupport HPC infrastructure, research, and applications Support education, outreach, and training Deliver high-end cyberinfrastructure to enable efficient

Collection of data Management/Organization of dataDistribution of dataAnalysis of dataVisualization of data

Russ Miller ([email protected]) University of Miami 9/23/2006

Organization of CSNY(Cyberinstitute of SUNY-Buffalo)

CSNY

HPC (CCR)•Computing•Data•Visualization•Networking

CSE•MultiScale•Sciences•Engineering•Life Sciences•Media

CI•Scheduling•Monitoring•Virtual Reality

Enabling•Programmers•GUI Design•Integration

Abilene

MAN LAN

Buffalo

RochesterAlbany

32 AoA

AbileneHEAnet

CA*net

Syracuse

OC-12GigE

legend

NYSERNet PoP

10 GigE

R&E Network

DWDM

NLR

ESnet

Courtesy of NYSERNet

Russ Miller ([email protected]) University of Miami 9/23/2006

BioACE: Bioinformatics System

Sun V880 (3), Sun 6800Sun 280R (2), Intel PIIIsSun 3960: 7 TB Disk Storage

EMC SAN35 TB Disk, 190 TB Tape

Founding Director1998-2006Peak of ~25 TFPeak of ~600 TB StoragePeak of 20/30 StaffROI: $7M → ~$300M @ UBROI: ~$500M to WNY

Dell Linux Cluster (10TF peak)1600 Xeon EM64T Processors (3.2 GHz)2 TB RAM; 65 TB DiskMyrinet / Force1030 TB EMC SAN

Dell Linux Cluster (3TF peak)600 P4 Processors (2.4 GHz)600 GB RAM; 40 TB Disk; Myrinet

SGI Altix3700 (0.4TF peak)64 Processors (1.3GHz ITF2)256 GB RAM2.5 TB Disk

Center for Computational Research

Russ Miller ([email protected]) University of Miami 9/23/2006

CCR Visualization Resources

Tiled-Display Wall20 NEC projectors: 15.7M pixelsScreen is 11’×7’Dell PCs with Myrinet2000

Access Grid Nodes (2)Group-to-Group CommunicationCommodity components

3D Passive Stereo Display VisDuo ceiling mounted system

Russ Miller ([email protected]) University of Miami 9/23/2006

CCR Research & ProjectsArchaeologyBioinformatics/Protein FoldingComputational ChemistryComputational Fluid DynamicsData Mining/DatabaseEarthquake EngineeringEnviron Modeling & SimulationGrid ComputingMolecular Structure DeterminationPhysics

Videos: MTVUrban Simulation and Viz

StreetScenesI-90 Toll BarrierMedical CampusPeace Bridge

Accident ReconstructionScientific Viz

DentalSurgeryMRI/CT ScanConfocal MicroscopyCrystallization WellsCollaboratories

Real-Time Visualization

Russ Miller ([email protected]) University of Miami 9/23/2006

Accurate local landmarks: Bridges, Street Signs, Business, HomesCan be viewed from driver’s perspectiveReal-Time NavigationWorks with

CorsimSynchro

Generate AVI & MOVMultiple Simultaneous

Traffic LoadsSimulationVarying POV

StreetScenes: Real-Time3D Traffic Simulation

Animation & Simulation

Rendered Scenes

Russ Miller ([email protected]) University of Miami 9/23/2006

Williamsville Toll Barrier Improvement Project

Initial Photo Match incorporating real and computer-generated components

Russ Miller ([email protected]) University of Miami 9/23/2006

Peace Bridge Visualization:Animation & Simulation

International CrossingThe Problem

75 year old bridge3 lanes – poor capacityExisting US plaza: small and poor design

Proposed OptionsRelocate US plazaBuild a 3-lane companion span

& rehab existing bridgeBuild a six lane signature span

Russ Miller ([email protected]) University of Miami 9/23/2006

Song: I’m OK (I Promise)Band: Chemical Romance

Gaming Environment: Death Jr.MTV

IBC Digital & CCR

Scientific Visualization

Russ Miller ([email protected]) University of Miami 9/23/2006

Multiple Sclerosis Project

Collaboration with Buffalo Neuroimaging Analysis Center (BNAC)

Developers of Avonex, drug of choice for treatment of MS

MS Project examines patients and compares scans to healthy volunteers

Russ Miller ([email protected]) University of Miami 9/23/2006

3D Medical Visualization App

Collaboration with Children’s Hospital

Leading miniature access surgery center

Application reads data output from a CT ScanVisualize multiple surfaces and volumesExport images, movies or CAD representation of model

Science & Engineering

Russ Miller ([email protected]) University of Miami 9/23/2006

Groundwater Flow ModelingRegional-scale modeling of groundwater flow and contaminant transport (Great Lakes Region)Ability to include all hydrogeologicfeatures as independent objectsCurrent work is based on Analytic Element MethodKey features:

High precisionHighly parallelObject-oriented programmingIntelligent user interfaceGIS facilitates large-scale regional applications

Utilized 10,661 CPU days (32 CPU years) of computing in past year on CCR’s commodity clusters

Russ Miller ([email protected]) University of Miami 9/23/2006

Geophysical Mass Flow Modeling

Modeling of Volcanic Flows, Mud flows (flash flooding), and AvalanchesIntegrate information from several sources

Simulation resultsRemote sensingGIS data

Develop realistic 3D models of mass flows Present information at appropriate level

Shake-and-Bake

Molecular Structure Determinationfrom X-Ray Crystallographic Data

University at Buffalo The State University of New York CCRCenter for Computational Research

Objective: Provide a 3-D mapping of the atoms in a crystal.Procedure:

1. Isolate a single crystal.2. Perform the X-Ray diffraction experiment.

3. Determine molecular structure that agrees with diffration data.

X-Ray Crystallography

University at Buffalo The State University of New York CCRCenter for Computational Research

Experiment yields reflections and associated intensities.Underlying atomic arrangement is related to the reflections by a 3-D Fourier transform.Phase angles are lost in experiment.Phase Problem: Determine the set of phases corresponding to the reflections.

X-Ray Data Molecular Structure

FFT

FFT-1

X-Ray Data & Corresponding Molecular Structure

Reciprocal or “Phase” Space Real Space

University at Buffalo The State University of New York CCRCenter for Computational Research

FFTTrial

Phases

Solutions

?PhaseRefinement

DensityModification

(Peak Picking)

TangentFormula

Reciprocal Space Real Space

Conventional Direct Methods

University at Buffalo The State University of New York CCRCenter for Computational Research

Shake-and-Bake Method:Dual-Space Refinement

FFTTrial

Phases

Solutions

?PhaseRefinement

TangentFormula

Reciprocal Space Real Space“Shake” “Bake”

PhaseRefinement

FFT-1ParameterShift

DensityModification

(Peak Picking)(LDE)

Trial Structures Shake-and-Bake

StructureFactors

University at Buffalo The State University of New York CCRCenter for Computational Research

Shake-and-Bake

A Direct Methods Flowchart

University at Buffalo The State University of New York CCRCenter for Computational Research

Useful Relationships for Multiple Trial Phasing

||2 :Weights

0 :Invariantsshells resolutionin normalized || || where

)()(cos1)(

)cos(||

)sin(||tan

2/1

,

2

0

1

,

KHKHHKHK

KHKHHK

HH

KH HK

HKHKHK

KHHK

KKHKKHK

KKHKKHK

H

EEENAW

FE

WIWIW

WR

EE

EE

−−−

−−

−−−−

−−−−

==

≈++=Φ∝

⎟⎟⎠

⎞⎜⎜⎝

⎛−Φ=

+

+−=

∑∑

∑∑

φφφ

φ

φφ

φφφTangent

Formula

Parameter ShiftOptimization

University at Buffalo The State University of New York CCRCenter for Computational Research

Ph8755: SnB Histogram

University at Buffalo The State University of New York CCRCenter for Computational Research

Number of Atoms in Structure0 100 1,000 10,000 100,000

Conventional Direct Methods

Shake-and-Bake

Multiple Isomorphous Replacement

Se-Met

Se-Met with Shake-and-Bake

Vancomycin

567 kDa (160 Se)

?

?

Phasing and Structure Size

Grid Computing

Russ Miller ([email protected]) University of Miami 9/23/2006

Grid Computing

Russ Miller ([email protected]) University of Miami 9/23/2006

Grid Computing Overview

Coordinate Computing Resources, People, Instruments in Dynamic Geographically-Distributed Multi-Institutional EnvironmentTreat Computing Resources like Commodities

Compute cycles, data storage, instruments Human communication environments

No Central Control; No Trust

Imaging Instruments Large-Scale Databases

Data Acquisition AnalysisAdvanced Visualization

Computational ResourcesLHC

Russ Miller ([email protected]) University of Miami 9/23/2006

ACDC-Grid Collaborations IHigh-Performance Networking InfrastructureGrid3+ CollaborationiVDGL Member

Only External MemberOpen Science Grid

Organizational CommitteeBlueprint CommitteeSecurity Working GroupData Working GroupGRASE VO

Grid-Lite: Campus GridHP Labs Collaboration

Innovative Laboratory PrototypeDell Collaboration

Russ Miller ([email protected]) University of Miami 9/23/2006

ACDC-Grid Collaborations IINYS Grid

BrookhavenCanisius CollegeColumbia Hauptman-Woodward Inst.Niagara UniversityNYURITRPISUNY-AlbanySUNY-BinghamtonSUNY-BuffaloSUNY-GeneseoSUNY-Stony BrookSyracuseUniv of Rochester

GRASE VO: Grid Resources for Advanced Science and Engineering Virtual Organization

(Non-Physics Research)Structural BiologyGroundwater ModelingEarthquake EngineeringComputational ChemistryGIS/BioHazards

Russ Miller ([email protected]) University of Miami 9/23/2006

ACDC-Grid Cyber-Infrastructure

Integrated Data GridAutomated Data File Migration based on profiling users.

Lightweight Grid Monitor (Dashboard)Predictive Scheduler

Define quality of service estimates of job completion, by better estimating job runtimes by profiling users.

Dynamic Resource AllocationDevelop automated procedures for dynamic computational resource allocation.

High-Performance Grid-Enabled Data RepositoriesDevelop automated procedures for dynamic data repository creation and deletion.

University at Buffalo The State University of New York CCRCenter for Computational Research

ACDC-Grid Data Grid

Browser view of “miller”group files published by

user “rappleye”

University at Buffalo The State University of New York CCRCenter for Computational Research

ACDC-Grid Data Grid Functionality

Basic file management functions are accessible via a platform-independent web interface.User-friendly menus/interface.File Upload/Download to/from the Data Grid Portal.Simple Web-based file editor.Efficient search utility.Logical display of files (user/ group/ public).Ability to logically display files based on metadata (file name, size, modification date, etc.)

University at Buffalo The State University of New York CCRCenter for Computational Research

Predictive Scheduler

Build profiles based on statistical analysis of logs of past jobs

Per User/Group Per Resource

Use these profiles to predict runtimes of new jobsMake use of these predictions to determine

Resources to be utilizedAvailability of Backfill

University at Buffalo The State University of New York CCRCenter for Computational Research

Small number (40) of CPUs were dedicated at nightAn additional 400 CPUs were dynamically allocated during the dayNo human intervention was requiredGrid applications were able to utilize the resources and surpassed the Grid3 goals

ACDC-Grid Dynamic Resource Allocation at SC03 with Grid3

University at Buffalo The State University of New York CCRCenter for Computational Research

GigE and Myrinet connection

GigE connection

73 GB hard drive

292 – Dell 2650production nodes

4 node Dell 2650 PVFS server (1096 GB)

1 node Dell 2650 NFS server (342 GB)

Dell 2650 backup front-end

Dell 6650 4-wayfront-end

Dell 6650 4-way(ACDC)

Dell 6650 4-way(GRID)

Dell 6650 4-way(EAGLES)

Joplin ConfigurationDiagram

Node scratch space (120 GB)

ACDC-Grid Dynamic Resource Allocation

ACDC-Grid Monitoring:

The ACDC-GridDASHBOARD

http://osg.ccr.buffalo.edu

University at Buffalo The State University of New York CCRCenter for Computational Research

ACDC-Grid Administration

University at Buffalo The State University of New York CCRCenter for Computational Research

Structural BiologySnB and BnP for Molecular Structure Determination/Phasing

Groundwater ModelingOstrich: Optimization and Parameter Estimation ToolPOMGL: Princeton Ocean Model Great Lakes for Hydrodynamic CirculationSplit: Modeling Groundwater Flow with Analytic Element Method

Earthquake EngineeringEADR: Evolutionary Aseismic Design and Retrofit; Passive Energy Dissipation System for Designing Earthquake Resilient Structures

Computational ChemistryQ-Chem: Quantum Chemistry Package

Geographic Information Systems & BioHazardsTitan: Computational Modeling of Hazardous Geophysical Mass Flows

Grid-Enabling Application Templates (GATs)

Startup Screen for ACDC-Grid Job Submission

Instructions and Description for Running a Job on ACDC-Grid

Software Package Selection

Full Structure / Substructure Template Selection

Default Parameters Based on Template

Default Parameters (cont’d)

Generating Reflections (Drear)

Invariant Generation

SnB Setup

SnB Setup (cont’d)

SnB Review (Grid job ID: 447)

Graphical Representation of Intermediate Job Status

Histogram of Completed Trial Structures

Status of Jobs

User starts up – default image of structure.

Heterogeneous Back-End Interactive Collaboratory

Molecule scaled, rotated, and labeled.

Russ Miller ([email protected]) University of Miami 9/23/2006

Acknowledgments

Mark GreenCathy RubyAmin GhadersohiNaimesh ShahSteve GalloJason RappleyeJon BednaszSam GuercioMartins InnusCynthia Cornelius

George DeTittaHerb HauptmanCharles WeeksSteve Potter

Phil GlickRohit Bakshi

Alan RabideauIgor JanckovicMichael Sheridan Abani PatraMatt Jones

IBC DigitalTVGABergmann AssociatesPeace Bridge Authority

Bruce HolmJanet Penksa

NSF, NIH, NYS, NIMA, NTA, Oishei, Wendt, DOE

Russ Miller ([email protected]) University of Miami 9/23/2006

[email protected]