software tools to facilitate materials science research

Software tools to facilitate materials science research Anubhav Jain Energy Technologies Area Lawrence Berkeley National Laboratory Berkeley, CA S2I2 Workshop, Feb 2017 Slides (already) posted to http://www.slideshare.net/anubhavster

Upload: anubhav-jain

Post on 19-Feb-2017


TRANSCRIPT

Page 1: Software tools to facilitate materials science research

Software tools to facilitate materials science research

Anubhav Jain Energy Technologies Area

Lawrence Berkeley National Laboratory Berkeley, CA

S2I2 Workshop, Feb 2017

Slides (already) posted to http://www.slideshare.net/anubhavster

Page 2: Software tools to facilitate materials science research

What we work on

•  We don’t develop or debut the new and fashionable computational methods

•  We adopt methods, standardize the parts that are ready for mass reproduction, and execute them over thousands of materials


Page 3: Software tools to facilitate materials science research

Our research interests as materials scientists


High-throughput calculations (each point is a possible battery cathode)

Discovery of new functional materials (e.g., new bulk thermoelectrics)

Page 4: Software tools to facilitate materials science research

A user’s perspective of materials simulation


“something”

Results

PI

What is the GGA-PBE elastic tensor of GaAs?

Page 5: Software tools to facilitate materials science research

A user’s perspective of materials simulation


“something” = student/postdoc

Results

PI

What is the GGA-PBE elastic tensor of GaAs?

Input file flags, queue format, how to fix ZPOTRF?

Page 6: Software tools to facilitate materials science research

Why this system?

•  It works!
•  Many aspects of running simulations seem tailor-made for assigning to students/postdocs
    –  requires specialized knowledge
    –  labor intensive
    –  helpful to have a high pain threshold
•  But there are also disadvantages…


Nicola Marzari’s “Middle Age Workshop” analogy

Page 7: Software tools to facilitate materials science research

Staff specialization can get out of control

Because of the steep learning curve to computational methods, there is often a single group member assigned to a technique


“Alice knows how to do charged defect calculations.”

“Bob is the one who can properly converge GW runs.”

“Olga has all the scripts for phonon calculations.”

Page 8: Software tools to facilitate materials science research

Errors are all too common

Let’s take a look at two alternate universes:

•  Universe 1: student has coffee, copies files from previous simulation, edits 5 lines, runs simulation, delivers report
•  Universe 2: student forgets coffee, copies files from previous simulation, edits 4 lines (forgets LHFCALC=F), delivers report; it looks fine at first, but in a month you discover it was wrong

Which universe are you in? Are you sure?

Page 9: Software tools to facilitate materials science research

Takes too long to get results

•  Calculations are labor intensive!
    –  set up the structure coordinates
    –  write input files, double-check all the flags
    –  copy to supercomputer
    –  submit job to queue
    –  deal with supercomputer headaches
    –  monitor job
    –  fix error jobs, resubmit to queue, wait again
    –  repeat process for subsequent calculations in workflow
    –  parse output files to obtain results
    –  copy and organize results, e.g., into Excel


Page 10: Software tools to facilitate materials science research

There is a lot of back-and-forth in the analysis

•  Student/postdoc presents Powerpoint/Excel of the results

•  PI wants to know certain details or follow up based on the data, which are missing from the Powerpoint/Excel

•  Student/postdoc says “I will get back to you”, goes back to office, re-processes the data, and prepares a revised report within a few days

•  Repeat…


Page 11: Software tools to facilitate materials science research

What would be a better way?


“something” = a computer!

Results

PI

What is the GGA-PBE elastic tensor of GaAs?

Page 12: Software tools to facilitate materials science research

Reduce specialization

All past and present knowledge, from everyone in the group, everyone previously in the group, and outside collaborators, about how to run calculations


Page 13: Software tools to facilitate materials science research

Reduce errors and improve efficiency

•  Computers can’t forget to set an input flag

•  Computers (in theory) can create, correct, submit, parse, and deliver the results of calculations much faster than even the fastest student
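
One way to picture "computers can't forget to set an input flag" is to generate every input file from a single reviewed set of defaults, with per-run overrides applied on top. The following is a minimal sketch, assuming a simple KEY = VALUE input format; the function name `render_input`, the `DEFAULTS` table, and the values in it are illustrative placeholders, not a recommended parameter set or any package's API.

```python
# Hypothetical baseline settings, reviewed once and reused for every run.
# Tag names follow the INCAR style used by VASP; values are placeholders.
DEFAULTS = {
    "ENCUT": 520,
    "EDIFF": 1e-5,
    "LHFCALC": False,
}


def render_input(overrides=None):
    """Merge per-run overrides onto the defaults and render KEY = VALUE lines."""
    settings = {**DEFAULTS, **(overrides or {})}
    lines = []
    for key in sorted(settings):
        value = settings[key]
        if isinstance(value, bool):
            # Render booleans as Fortran-style logicals
            value = ".TRUE." if value else ".FALSE."
        lines.append(f"{key} = {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Only the cutoff is changed; every other flag is still written out.
    print(render_input({"ENCUT": 600}))
```

Because every flag is always written, a flag such as LHFCALC cannot silently go missing when someone edits a copied file by hand.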


Page 14: Software tools to facilitate materials science research

Improve analytics / visualization

•  Excel and Powerpoint work for a curated view of the results

•  But online analytics would allow you to do things like:
    –  view crystal structures on demand
    –  generate the plot you want


Page 15: Software tools to facilitate materials science research

So this is the vision we want – is it achievable?


“something” = a computer!

Results

PI

What is the GGA-PBE elastic tensor of GaAs?

Page 16: Software tools to facilitate materials science research

Yes! – and it is available on Materials Project


•  Input generation (parameter choice)
•  Workflow mapping
•  Supercomputer submission / monitoring
•  Error handling
•  File transfer
•  File parsing / DB insertion

Custom material

Submit!

www.materialsproject.org “Crystal Toolkit”: anyone can find, edit, and submit (suggest) structures

Currently, this feature is available for:
•  structure optimization
•  band structures
•  elastic tensors

Page 17: Software tools to facilitate materials science research

Software technologies to enable automatization


Base packages:
•  pymatgen (materials analysis framework)
•  custodian (calculation error recovery)
•  FireWorks (workflow framework and supercomputer interface)

Derived package:
•  atomate (automatic materials science workflows)

These are all open-source:

•  pymatgen and custodian are led by Prof. Ong’s group (UC San Diego)
•  Developed in coordination with the Materials Project and Persson group

Page 18: Software tools to facilitate materials science research

pymatgen – object-oriented materials analysis


www.pymatgen.org

Ong, S. P., Richards, W. D., Jain, A., Hautier, G., Kocher, M., Cholia, S., Gunter, D., Chevrier, V. L., Persson, K. A. & Ceder, G. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).

Page 19: Software tools to facilitate materials science research

pymatgen – examples of analyses


•  phase diagrams
•  Pourbaix diagrams
•  diffusivity from MD
•  band structure analysis

Page 20: Software tools to facilitate materials science research

pymatgen - many useful tools made accessible


Structure Matcher: analyzes if two periodic structures are equivalent, even if they are in different settings or have minor distortions

Order-disorder: resolve partial or mixed occupancies into a fully ordered crystal structure (e.g., mixed oxide-fluoride site into separate oxygen/fluorine)

Many other tools, such as:
•  Bond-valence sums to determine valence
•  Voronoi coordination as well as 3D coordination polyhedron analysis
•  Automatically find and insert interstitial sites
•  Diffraction pattern modeling
•  Simple cost and materials availability estimators

Page 21: Software tools to facilitate materials science research

custodian – fixing job errors

•  Custodian can wrap around an executable (e.g., VASP)
    –  i.e., run custodian instead of directly running VASP
•  During execution, custodian will monitor output files and detect errors / problems
    –  If so, it can change input files and rerun the job
    –  e.g., if ZPOTRF error detected, rerun with ISYM=0
    –  ever-expanding library of fixes
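
The monitor, fix, and rerun loop described on this slide can be sketched in plain Python. This is not custodian's API: `run_with_recovery`, `FIXES`, and `fake_job` are hypothetical names used only for illustration, and the only rule taken from the slide is "ZPOTRF error, rerun with ISYM=0".

```python
# Map an error signature found in the job output to a change of input settings.
# Only the ZPOTRF -> ISYM = 0 rule comes from the slide.
FIXES = {
    "ZPOTRF": {"ISYM": 0},
}


def run_with_recovery(run_job, settings, max_attempts=3):
    """Run a job, scan its output for known errors, patch inputs, and rerun."""
    settings = dict(settings)  # work on a copy, leave the caller's dict alone
    for attempt in range(1, max_attempts + 1):
        output = run_job(settings)
        matched = [name for name in FIXES if name in output]
        if not matched:
            return settings, attempt  # clean run: report final settings
        for name in matched:
            settings.update(FIXES[name])
    raise RuntimeError(f"job still failing after {max_attempts} attempts")


def fake_job(settings):
    """Stand-in for a real executable: reports an error unless ISYM is 0."""
    if settings.get("ISYM") != 0:
        return "LAPACK: Routine ZPOTRF failed!"
    return "reached required accuracy"


if __name__ == "__main__":
    final_settings, attempts = run_with_recovery(fake_job, {"ISYM": 2})
    print(final_settings, attempts)
```

Keeping the error-to-fix table as data rather than code is what makes an "ever-expanding library of fixes" cheap to grow: a new rule is one more entry.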


Page 22: Software tools to facilitate materials science research

FireWorks – scientific workflow software

•  FireWorks is an open-source scientific workflow software
•  Materials Project, JCESR, and other projects manage their runs with FireWorks
    –  >1 million jobs
    –  >100 million CPU-hours
    –  multiple computing clusters
•  You can write any workflow
    –  e.g., FireWorks is used for graphics processing, machine learning, document processing, and protein folding
    –  #1 Google hit for “Python workflow software”, top 5 for general scientific workflow software

•  Detailed tutorials are available
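
To make "workflow" concrete: a workflow can be viewed as a set of tasks plus dependencies, executed in an order that respects those dependencies. The sketch below uses only the Python standard library and is not the FireWorks API; the task names are illustrative, loosely following the manual steps listed earlier in the talk.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each task maps to the set of tasks that must finish before it can start.
# Task names are illustrative.
DEPENDENCIES = {
    "write_inputs": set(),
    "run_calculation": {"write_inputs"},
    "parse_outputs": {"run_calculation"},
    "insert_into_db": {"parse_outputs"},
}


def execution_order(dependencies):
    """Return one valid execution order for the task graph."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for task in execution_order(DEPENDENCIES):
        print("running", task)
```

A real workflow system adds what this sketch leaves out, such as queue submission, job state tracking, and reruns across multiple computing clusters.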


Jain, A., Ong, S. P., Chen, W., Medasani, B., Qu, X., Kocher, M., Brafman, M., Petretto, G., Rignanese, G.-M., Hautier, G., Gunter, D. & Persson, K. A. FireWorks: a dynamic workflow system designed for high-throughput applications. Concurr. Comput. Pract. Exp. 27, 5037–5059 (2015).

www.pythonhosted.org/FireWorks

Page 23: Software tools to facilitate materials science research

FireWorks – screenshot of jobs status

Live version at http://fireworks.dash.materialsproject.org

Page 24: Software tools to facilitate materials science research

atomate – our newest code (redesigns our older codes)


translate PI-style (minimal) specifications into well-defined FireWorks workflows

(FireWorks handles all the execution and job management details)

What is the GGA-PBE elastic tensor of GaAs?
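
The idea of expanding a minimal, PI-style request into an explicit list of steps can be sketched as follows. This is a toy illustration, not atomate's interface: the `Request` type, `build_steps`, and `RECIPES` are hypothetical, and the step sequences are an assumption about how such calculations might be broken down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A PI-style question: which property, for which material, with which method."""
    material: str
    method: str
    prop: str


# Hypothetical recipes: property name -> ordered calculation steps.
RECIPES = {
    "elastic tensor": [
        "structure optimization",
        "static calculations on deformed structures",
        "fit elastic tensor from stresses",
    ],
    "band structure": [
        "structure optimization",
        "static calculation",
        "non-self-consistent calculation along k-path",
    ],
}


def build_steps(request):
    """Expand a request into ordered, fully labeled step descriptions."""
    try:
        recipe = RECIPES[request.prop]
    except KeyError:
        raise ValueError(f"no recipe for property: {request.prop!r}") from None
    return [f"{request.material} / {request.method} / {step}" for step in recipe]


if __name__ == "__main__":
    for step in build_steps(Request("GaAs", "GGA-PBE", "elastic tensor")):
        print(step)
```

The point of the design is that the person asking only supplies the three fields of the request; everything else lives in a shared, reviewed recipe.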

Page 25: Software tools to facilitate materials science research

atomate – what’s available?


•  band structure
•  spin-orbit coupling
•  hybrid functional calcs
•  elastic tensor
•  piezoelectric tensor
•  Raman spectra
•  GIBBS method
•  QH thermal expansion
•  AIMD
•  FEFF method
•  LAMMPS MD

All past and present knowledge, from everyone in the group, everyone previously in the group, and outside collaborators, about how to run calculations

K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong

Page 26: Software tools to facilitate materials science research

Further resources

•  The GitHub web sites
    –  www.github.com/materialsproject
    –  www.github.com/hackingmaterials
•  Software carpentry
    –  https://software-carpentry.org


Page 27: Software tools to facilitate materials science research

Needed: better way to learn methods

•  It can take many months, and perhaps even an internship in a group with relevant expertise, to learn to use a new method
•  Workshops are one way to speed the process
•  However, self-serve ways to learn new methods would be wonderful
    –  e.g., web tutorials that mix together theory and practice
•  Consider: what fraction of people could learn to correctly use your code/method given only a single web link and no direct communication with anyone? (they are allowed to find and use other web resources based on the initial link)
    –  Example: https://www.youtube.com/user/MaterialsProject


Page 28: Software tools to facilitate materials science research

Needed: curation of tools and methods

•  A place to kick-start discovery and learning of new codes and tools:
    –  “Too basic” example: http://materials.sh (Shyue Ping Ong, UCSD)
    –  “Too complex/messy” example: Nanohub


Page 29: Software tools to facilitate materials science research

Needed: standardizing data *containers*

•  Different codes will have different inputs and outputs, so obviously data organization will vary
•  But the “container” of the data organization can be consistent. e.g., you can represent arrays within:
    –  JSON
    –  YAML
    –  XML
    –  HDF5
    –  but don’t invent your own format to represent an array!

•  Some of these container formats are human-readable, i.e., easy to edit in a text editor

•  No more “code parses custom input file format to produce custom output file format”
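
As a small illustration of the "standard container" idea, the sketch below stores an array in JSON using only the Python standard library, so any JSON-aware tool can read it back without a custom parser. The field names and the numerical values are placeholders, not real results.

```python
import json

# Placeholder record: a 3x3 array stored as nested lists.
record = {
    "material": "GaAs",
    "quantity": "example_tensor",
    "units": "GPa",
    "values": [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ],
}

text = json.dumps(record, indent=2)  # human-readable, editable in a text editor
restored = json.loads(text)          # round trip with no custom parser

print(text)
```

The data organization (which keys exist, what they mean) is still specific to the code that wrote it, but the container itself is one that every language already knows how to read.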


Page 30: Software tools to facilitate materials science research

Needed: other ways to improve accuracy


DFT band gap = cheap lens

Some kind of super accurate post-Bethe-Salpeter method

How to improve image quality? Strategy 1

Page 31: Software tools to facilitate materials science research

Needed: other ways to improve accuracy


Computer algorithms improve image

How to improve image quality? Strategy 2

Software corrects for cheap lens. e.g., distortion, two images to create depth of field

Page 32: Software tools to facilitate materials science research

Needed: other ways to improve accuracy


correct and mix cheap/simple calculations to improve output quality

Jain, A., Hautier, G., Ong, S. P., Moore, C. J., Fischer, C. C., Persson, K. A. & Ceder, G. Formation enthalpies by mixing GGA and GGA+U calculations. Phys. Rev. B 84, 045115 (2011).

Page 33: Software tools to facilitate materials science research

Needed: other ways to improve accuracy


Correcting DFT is necessary for getting decent phase diagrams

Almost everyone who is practicing new materials design does some flavor of post-correction (e.g., gas phase energies)

More effort into comparing, developing, and validating such methods is needed.

Jain, A., Hautier, G., Ong, S. P., Moore, C. J., Fischer, C. C., Persson, K. A. & Ceder, G. Formation enthalpies by mixing GGA and GGA+U calculations. Phys. Rev. B 84, 045115 (2011).

Page 34: Software tools to facilitate materials science research

Questions?

Slides (already) posted to http://www.slideshare.net/anubhavster

Page 35: Software tools to facilitate materials science research

Some lessons learned (1)

•  In the beginning, strong central coordination from authority was needed to develop these
    –  require that people contribute to common code, e.g. pymatgen, and not write their own detached scripts
•  Once a code was “established”, less authority was needed
    –  people voluntarily contributed improvements rather than writing their own code because this benefited them
•  Today the process is almost completely decentralized
    –  culture has changed
    –  even for new codes, people rally around it rather than build independent things


Page 36: Software tools to facilitate materials science research

Some lessons learned (2)

•  It is helpful to have a strong BDFL (benevolent dictator for life) for each codebase

•  Requirements for the BDFL:
    –  very detail-oriented
    –  cares about the code itself, not just the application
    –  cares more about the code quality than about offending teammates, i.e., will not accept poor quality contributions
    –  at the same time, able to rally support from people and convince them to contribute or clean up code
    –  willing to work overtime to do things like write detailed docs, advocate for the code, review commits, etc.
    –  derives joy from building and deploying things!


Page 37: Software tools to facilitate materials science research

Some lessons learned (3)

•  Spending time to do things like improving code cleanliness, writing unit tests, writing documentation, etc. is not as “noble” and “self-sacrificing” an act as people make it out to be
    –  I’ve referred to my own documentation many times
    –  I’ve saved myself from a world of trouble by previously writing unit tests to detect bugs
    –  I’ve been able to write and build large code much faster due to previous commitments to code cleanliness (and been slowed down in my progress when I’ve relaxed these constraints)

•  We don’t like to admit this, but a lack of attention to detail in the past has easily cost us tens of thousands of dollars in wasted computing and countless labor hours – but some of this is inevitable with large projects


Page 38: Software tools to facilitate materials science research

Some lessons learned (4)

•  Computer scientists are useful for staying up to date in the fast-moving world of software
    –  2006: I took a graduate class in databases at MIT; all SQL, not a single mention of “NoSQL”
    –  2011: We are designing the framework for Materials Project; I have lots of experience with SQL; a computer scientist casually mentions NoSQL, its growing prominence, and its potential applicability to our problem
    –  2017: We do almost everything in NoSQL
•  Lesson: software moves fast! Much faster than materials science knowledge or methods. Don’t use data from 5 years ago to inform your decision.
