sustainable software for computational chemistry and materials modeling

SUSTAINABLE SOFTWARE FOR COMPUTATIONAL CHEMISTRY AND MATERIALS MODELING Beverly Sanders, University of Florida

Upload: softwarepractice

Post on 01-Dec-2014


DESCRIPTION

Presented at the 1st Workshop on Maintainable Software Practices in e-Science, Chicago, 9 October 2012. Co-located with e-Science 2012.

TRANSCRIPT

Page 1: Sustainable Software for Computational Chemistry and Materials Modeling

SUSTAINABLE SOFTWARE FOR COMPUTATIONAL CHEMISTRY AND MATERIALS MODELING

Beverly Sanders, University of Florida

Page 2: Sustainable Software for Computational Chemistry and Materials Modeling

Outline

Overview

Challenges and Current State

Overcoming the Barriers—technical and cultural

Page 3: Sustainable Software for Computational Chemistry and Materials Modeling

Computational Chemistry

Long history – over 50 years

Underpins broad array of scientific applications (Grand Challenges)

Efficient combustion systems

Drug design

Understanding biological systems

Semiconductor design

Water sustainability

CO2 Sequestration

Efficient lighting (quantum dots)

Full partner with experiments: "computational experiments" may be more reliable than lab measurements

Page 4: Sustainable Software for Computational Chemistry and Materials Modeling

Scientific Software Innovation Institute for Computational Chemistry and Materials Modeling (S2I2C2M2)

Collaboration between computational chemists, computer scientists, applied mathematicians, and computer engineers

Goal: overcome obstacles of algorithms and culture and change the nature of computational chemistry software development.

Year-long conceptualization phase has been funded by NSF

First meeting scheduled for January 2013

Page 5: Sustainable Software for Computational Chemistry and Materials Modeling

People

Daniel Crawford (Virginia Tech)

Robert Harrison (Tennessee, ORNL)

Anna I. Krylov (U.S.C.)

Theresa Windus (Iowa State)

Emily Carter (Princeton)

Edmund Chow (Georgia Tech)

Erik Deumens (Florida)

Mark Gordon (Iowa State)

Martin Head-Gordon (Berkeley)

Todd Martinez (Stanford)

David McDowell (Georgia Tech)

Vijay Pande (Stanford)

Manish Parashar (Rutgers)

Ram Ramanujam (LSU)

Beverly Sanders (Florida)

Bernhard Schlegel (Wayne State)

David Sherrill (Georgia Tech)

Lyudmila Slipchenko (Purdue)

Masha Sosonkina (Iowa State)

Edward Valeev (Virginia Tech)

Ross Walker (San Diego Supercomputer Center)

+ others

Page 6: Sustainable Software for Computational Chemistry and Materials Modeling

Current State of Computational Chemistry

Long history--legacies of modern molecular dynamics and quantum chemistry packages span decades

Both open source and commercial

Amalgam of programming languages

Domain specific methods

Multi-dimensional integral engines

General purpose

Davidson method for computing eigenvalues of large matrices (ranks in tens of billions)
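The Davidson method named above can be illustrated with a minimal sketch, assuming a small dense symmetric matrix that is close to diagonal. Production codes never store the matrix at the sizes quoted on this slide; they supply only matrix-vector products. The function name `davidson_lowest`, the tolerance, and the test matrix are illustrative choices, not taken from the talk.

```python
import numpy as np

def davidson_lowest(A, tol=1e-8, max_iter=100):
    """Lowest eigenpair of a symmetric, nearly diagonal matrix (sketch)."""
    n = A.shape[0]
    diag = np.diag(A)
    # Initial guess: unit vector on the smallest diagonal element.
    V = np.zeros((n, 1))
    V[np.argmin(diag), 0] = 1.0
    for _ in range(max_iter):
        # Rayleigh-Ritz step: project A onto the current subspace.
        AV = A @ V
        evals, evecs = np.linalg.eigh(V.T @ AV)
        theta, s = evals[0], evecs[:, 0]
        x = V @ s                      # Ritz vector
        r = AV @ s - theta * x         # residual
        if np.linalg.norm(r) < tol:
            break
        # Davidson correction: diagonal preconditioner applied to the residual.
        denom = theta - diag
        denom[np.abs(denom) < 1e-12] = 1e-12
        t = r / denom
        # Orthogonalize against the subspace (two passes for stability).
        for _ in range(2):
            t = t - V @ (V.T @ t)
        norm = np.linalg.norm(t)
        if norm < 1e-12:
            break
        V = np.hstack([V, (t / norm)[:, None]])
    return theta, x

# Illustrative test matrix: dominant diagonal plus small symmetric noise.
rng = np.random.default_rng(0)
n = 50
noise = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1)) + 0.01 * (noise + noise.T)
theta, x = davidson_lowest(A)
```

The subspace stays far smaller than the matrix when the diagonal dominates, which is why the approach suits very large problems.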

Page 7: Sustainable Software for Computational Chemistry and Materials Modeling

Software is extremely complex

Example: Modern ab initio quantum chemistry simulations

Computations scale as O(N^7) or higher

Where N represents size of molecular system (number of atoms, electrons, or basis functions)

Code complexity arises naturally from problems, but

is an obstacle to long-term sustainability

is an obstacle to exploitation of (ever changing) HPC hardware

hinders education of next generation of scientists

only a handful of very senior students can make a contribution
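To make the seventh-power scaling concrete, here is a small arithmetic sketch. The function names are hypothetical, and the exponent 7 is the lower bound quoted on this slide.

```python
def cost_ratio(size_factor: float, exponent: int = 7) -> float:
    """Factor by which the work grows when the system size grows by size_factor."""
    return size_factor ** exponent

def size_gain(speedup: float, exponent: int = 7) -> float:
    """Factor by which the system size can grow given a machine that is speedup times faster."""
    return speedup ** (1.0 / exponent)

doubling_cost = cost_ratio(2.0)    # 2**7 = 128 times more work
tenfold_machine = size_gain(10.0)  # roughly 1.39 times larger system
```

Doubling the molecule costs 128 times the work, while a tenfold faster machine buys only about a 39% larger system. Faster hardware alone does not go far against this scaling.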

Page 8: Sustainable Software for Computational Chemistry and Materials Modeling

Much recent software development focused on exploiting parallel architectures

Varying degrees of success

With a few exceptions, still not fully exploiting available systems

Utilizing exascale will require rethinking of approach

Desperate need for tools to generate high-performance massively parallel code from high-level specifications

Page 9: Sustainable Software for Computational Chemistry and Materials Modeling

Developers

Mostly grad students and post-docs

Training in software engineering left to individual research groups

Extent to which this is done varies

Large burden for small groups: community approach has potential benefits for both software and the students

Education tends to be narrow: students learn about software their advisors are involved with

Page 10: Sustainable Software for Computational Chemistry and Materials Modeling

Science Drivers

Catalysis

Catalysts facilitate control of chemical reactions by raising the rates at which chemical bonds are formed or broken

Improve selectivity and control over unwanted byproducts

Decreased energy consumption

Reduction of waste stream

Rational design of catalysts for a specific application is one of the Holy Grails of modern chemistry and chemical engineering

Requires quantitative information about transition states

Intermediates have low concentrations and short lifetimes, which thwart experimental evaluation

Will require state-of-the-art computation combined with experiments

Page 11: Sustainable Software for Computational Chemistry and Materials Modeling

Science Drivers

Organic photovoltaic cells

Potential applications: thin-film transistors, LEDs, solar cells, optical switches

Advantages

Devices can be flexible

Inexpensive to produce

Limitations

Reduced power conversion efficiency

But the process leading to current generation is not well understood

Page 12: Sustainable Software for Computational Chemistry and Materials Modeling

Overcoming the Barriers

1-year conceptualization phase funded by NSF

First meeting Jan 2013

3 working groups

Highest priority

Portable parallel infrastructure

General-purpose tensor algebra algorithms

Protocols for information exchange and code interoperability

Education and training

Page 13: Sustainable Software for Computational Chemistry and Materials Modeling

Portable parallel infrastructure

Technology trends

Massive concurrency on a chip

Massive number of sockets in largest supercomputers

Heterogeneity (CPU + GPU)

Deep, complex memory hierarchies

Memory and communication bandwidth limited

Bleeding edge applications may need to

Coordinate over 10^9 threads

Tolerate faults

Explicitly manage energy consumption

Page 14: Sustainable Software for Computational Chemistry and Materials Modeling

Sustainability of large and widely distributed chemistry codes

Enable most computations in chemistry

Likely will run on leadership class machines

Working group will include computational chemists, parallel programming experts, and reps from major tech providers (NVIDIA, Intel, IBM)

Page 15: Sustainable Software for Computational Chemistry and Materials Modeling

Sustainability of software developed by smaller research groups

Need to understand programming models and tools

Need to understand how both community and software can be better organized

Accelerate testing of new ideas at sufficient scale to determine their worth

Key: being able to write code and integrate into existing software.

Currently, new developments take months or years to migrate from developers' software to other packages

Page 16: Sustainable Software for Computational Chemistry and Materials Modeling

General-purpose tensor algebra algorithms

Tensor algebra ubiquitous in science and engineering

Need new approaches for computing with high-dimensional tensors

Current software: 8 or fewer dimensions

Emerging methods require 3N dimensions, where N (the number of electrons) may be O(100) or more.

Need common framework of reusable software elements.
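A rough storage count shows why new approaches are needed. The sketch below assumes a dense tensor with the same extent along every dimension. The extents (100 and 2) and the separated-representation rank (50) are illustrative numbers, not from the talk. A separated (sum-of-products) representation is shown only as one example of a compressed form; the slides do not name a specific technique.

```python
def dense_entries(dimensions: int, extent: int) -> int:
    """Number of values in a dense tensor with `extent` points per dimension."""
    return extent ** dimensions

def separated_entries(dimensions: int, extent: int, rank: int) -> int:
    """Number of values in a rank-`rank` sum of products of one-dimensional factors."""
    return rank * dimensions * extent

eight_dim = dense_entries(8, 100)             # 10**16 values, 80 petabytes in double precision
many_dim = dense_entries(300, 2)              # 2**300 values for N = 100 electrons
compressed = separated_entries(300, 2, 50)    # 30,000 values
```

Even with only two points per dimension, a dense tensor in 3N = 300 dimensions has more than 10^90 entries, so dense storage is ruled out.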

Page 17: Sustainable Software for Computational Chemistry and Materials Modeling

Challenges for high-rank tensors

Challenges

Develop robust implementations of algorithms

Standardize data structures and algorithms, APIs, software elements, and frameworks

Automate the derivation, transformation, and implementation of tensor expressions.

Will require cross-disciplinary collaborations

Infrastructure will include DSLs, runtimes, and compilers, as well as static and dynamic algorithm analyses
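As a small illustration of implementing a tensor expression from a high-level specification, the sketch below uses NumPy's `einsum` index notation as a stand-in for a domain-specific language and compares it with hand-written loops. The contraction, the array names, and the orbital counts are hypothetical examples. This is not the infrastructure the working group proposes.

```python
import numpy as np

rng = np.random.default_rng(1)
nv, no = 4, 3  # hypothetical numbers of virtual and occupied orbitals
V = rng.standard_normal((nv, nv, nv, nv))
T = rng.standard_normal((nv, nv, no, no))

def contract_loops(V, T):
    """Hand-written version of R[a,b,i,j] = sum over c,d of V[a,b,c,d] * T[c,d,i,j]."""
    R = np.zeros((nv, nv, no, no))
    for a in range(nv):
        for b in range(nv):
            for i in range(no):
                for j in range(no):
                    for c in range(nv):
                        for d in range(nv):
                            R[a, b, i, j] += V[a, b, c, d] * T[c, d, i, j]
    return R

# The same contraction written as a one-line specification.
R_spec = np.einsum("abcd,cdij->abij", V, T)
R_loops = contract_loops(V, T)
```

The one-line form leaves loop order, blocking, and parallelization to the library, which is the kind of freedom an automated tool chain would exploit.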

Page 18: Sustainable Software for Computational Chemistry and Materials Modeling

Protocols for Information Exchange and Code Interoperability

Historically, culture has been competitive rather than collaborative

Sharing of code and data limited

Theoretical methods driving code towards greater size and complexity

Currently, progress may require substantial duplication of effort: code that could be reused is not reused, for lack of standards

Impedes new science, wastes human labor

Page 19: Sustainable Software for Computational Chemistry and Materials Modeling

Information Sharing

Standards for data shared between codes

Standards (or methods to convert between standards) for stored data to facilitate mining.

Data provenance

Cannot expect all code to use the same format, but transformation leads to errors, computational inefficiencies, complex interfaces
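One way to picture a shared record with explicit units and provenance is the sketch below. The record layout, field names, and the `to_bohr` helper are hypothetical and do not follow any existing standard. The conversion factor uses the CODATA 2018 Bohr radius.

```python
import json

ANGSTROM_TO_BOHR = 1.0 / 0.529177210903  # CODATA 2018 Bohr radius in angstrom

def to_bohr(record: dict) -> dict:
    """Return a copy of the record with coordinates in bohr and the step logged."""
    geom = record["geometry"]
    if geom["units"] == "bohr":
        return record
    if geom["units"] != "angstrom":
        raise ValueError(f"unsupported units: {geom['units']}")
    coords = [[c * ANGSTROM_TO_BOHR for c in atom] for atom in geom["coords"]]
    step = {"step": "unit_conversion", "from": "angstrom", "to": "bohr"}
    return {
        **record,
        "geometry": {**geom, "units": "bohr", "coords": coords},
        "provenance": record.get("provenance", []) + [step],
    }

# Hypothetical record: H2 with a bond length of 0.74 angstrom.
record = {
    "geometry": {
        "symbols": ["H", "H"],
        "units": "angstrom",
        "coords": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]],
    },
    "provenance": [{"step": "input", "source": "hand-written example"}],
}
converted = to_bohr(record)
round_trip = json.loads(json.dumps(converted))
```

Stating units in the data and logging each transformation makes conversion errors easier to trace, which addresses part of the concern raised above.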

Page 20: Sustainable Software for Computational Chemistry and Materials Modeling

Code Sharing

Need to establish level of interoperability

Coarse-grained

Hartree-Fock code from one app, MP2 code from another

Fine-grained

Calculate most of the one-electron contributions to a Fock matrix in one program, relativistic and solvent terms in another

Need architecture
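Fine-grained sharing of this kind could rest on a small common interface. The sketch below is one possible shape: every class, method name, and number is hypothetical, and the "solvent" term is placeholder arithmetic rather than real chemistry.

```python
from typing import Protocol, Sequence
import numpy as np

class FockTerm(Protocol):
    """Anything that can add a contribution to the Fock matrix."""
    def fock_contribution(self, density: np.ndarray) -> np.ndarray: ...

class CoreHamiltonian:
    """Stands in for one-electron terms computed by program A."""
    def __init__(self, h_core):
        self.h_core = np.asarray(h_core, dtype=float)
    def fock_contribution(self, density):
        return self.h_core

class ToySolventTerm:
    """Stands in for a solvent term computed by program B (placeholder formula)."""
    def __init__(self, strength: float):
        self.strength = strength
    def fock_contribution(self, density):
        return self.strength * np.diag(np.diag(density))

def assemble_fock(terms: Sequence[FockTerm], density: np.ndarray) -> np.ndarray:
    """Sum the contributions of independently developed terms."""
    fock = np.zeros_like(density, dtype=float)
    for term in terms:
        fock += term.fock_contribution(density)
    return fock

density = np.eye(2)
terms = [CoreHamiltonian([[-1.0, 0.2], [0.2, -0.5]]), ToySolventTerm(0.1)]
fock = assemble_fock(terms, density)
```

The assembling code depends only on the interface, so a term from another package can be swapped in without touching the rest.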

Page 21: Sustainable Software for Computational Chemistry and Materials Modeling

Education and Training

Mastering existing codes is daunting task for grad students

Chemical education culture worse than many STEM fields

PhD only requires modest coursework

OK for most fields of chemistry where undergrad training is adequate preparation for hands-on lab research

Most students unprepared for research in computational chemistry

Ad hoc training by individual research groups is inefficient

How should students be prepared for 2020 and beyond?

Programming models and tools

Multidisciplinary foundation to computational science

Reasoning about software

Manipulating software with confidence and facility

Page 22: Sustainable Software for Computational Chemistry and Materials Modeling

Summer school

Summer school for grad students supported by community

Fundamental algorithms of computational chemistry

Software best practices

Page 23: Sustainable Software for Computational Chemistry and Materials Modeling

If S2I2C2M2 is successful

Open access to new software tools and infrastructure

Training and educational opportunities for grad students and post-docs in

Algorithms

Code standards

Software best practices

NOT the goal to produce monolithic computational chemistry package to replace existing ones

Healthy competition is good

Sets of robust and properly validated software components that can be shared will benefit the entire community