Reverse engineering metadata for the Materials Project

Shyam Dwaraknath, Research Scientist, Lawrence Berkeley Labs

SHARED METADATA AND DATA FORMATS FOR BIG-DATA DRIVEN MATERIALS SCIENCE: A NOMAD-FAIRDI WORKSHOP



Page 2: What is Materials Project

• Free and open source database of computed materials properties
• Applications to explore the data set through "materials science" means
• Design computational workflows based on experimental ground truth
• Focus on property diversity over number of structures
• All the software bells and whistles that go with a large web data project

Page 3: The good old days

[Diagram: boxes labeled Structure, Workflow, Data, Determine Material, Analysis, and Website; the Structure and Workflow boxes appear several times.]

Page 4: What is metadata?

A set of data that describes and gives information about other data.

• Descriptive
• Structural
• Administrative

Page 5: What is metadata?

DFT Calculation

• Descriptive: DFT Code, Composition, Volume Change
• Structural: Where are the inputs, Where are the outputs, Units
• Administrative: Who computed, Where computed, Who gets to access
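The three categories on this slide can be sketched as a small record type. This is only an illustration: the class names, field names, and example values below are hypothetical, and the assignment of items to categories is read off the slide layout rather than stated by the speaker.

```python
from dataclasses import dataclass, asdict


@dataclass
class Descriptive:
    dft_code: str
    composition: str
    volume_change: float  # assumed here to be a fractional change


@dataclass
class Structural:
    input_location: str   # where the inputs are
    output_location: str  # where the outputs are
    units: dict


@dataclass
class Administrative:
    computed_by: str      # who computed
    computed_on: str      # where computed
    access: list          # who gets to access


@dataclass
class CalculationMetadata:
    descriptive: Descriptive
    structural: Structural
    administrative: Administrative


# All values are placeholders, not real Materials Project records.
record = CalculationMetadata(
    descriptive=Descriptive(dft_code="VASP", composition="Fe2O3", volume_change=0.02),
    structural=Structural(
        input_location="calc-0001/inputs",
        output_location="calc-0001/outputs",
        units={"energy": "eV", "length": "angstrom"},
    ),
    administrative=Administrative(
        computed_by="example-user",
        computed_on="example-cluster",
        access=["public"],
    ),
)

# Nested dict form, e.g. for storing the record in a document database.
record_dict = asdict(record)
```

Keeping the three groups as separate types makes it easy to see which fields a workflow fills in as a side effect of running and which ones someone has to decide on deliberately.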

Page 6: Some metadata are constructed implicitly

DFT Calculation

• Structural: Where are the inputs, Where are the outputs, Units
• Administrative: Who computed, Where computed, Who gets to access

Page 7: Some metadata are constructed explicitly

DFT Calculation

• Descriptive: DFT Type, DFT Code, Composition, Intent
• Administrative

Page 8: How do you determine the intent of a calculation?

• The purpose of the workflow
• The purpose of the database it was in
• The purpose of the computer it ran on

Why not determine the intent as the need arises?
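Determining intent "as the need arises" could look like the sketch below: rather than stamping an intent on each calculation when the workflow runs, a function reads the stored inputs and classifies them on demand. The slide does not name a DFT code or a rule set, so the VASP-style input tags, the `infer_intent` name, the category labels, and the example records are all assumptions made for illustration.

```python
def infer_intent(incar):
    """Guess the purpose of a calculation from VASP-style input tags (assumed format)."""
    # ICHARG = 11 reuses a fixed charge density: a non-self-consistent run.
    if incar.get("ICHARG") == 11:
        return "non-self-consistent"
    # LEPSILON requests a dielectric response calculation.
    if incar.get("LEPSILON"):
        return "dielectric"
    nsw = incar.get("NSW", 0)
    # VASP defaults IBRION to 0 when ionic steps are requested, otherwise to -1.
    ibrion = incar.get("IBRION", 0 if nsw > 0 else -1)
    # Ionic steps with IBRION 1, 2 or 3 mean the geometry is being relaxed.
    if nsw > 0 and ibrion in (1, 2, 3):
        return "structure optimization"
    # Ionic steps with IBRION = 0 mean molecular dynamics.
    if nsw > 0 and ibrion == 0:
        return "molecular dynamics"
    return "static"


# Hypothetical stored calculations; only the raw inputs are kept.
stored_calculations = [
    {"id": "calc-1", "incar": {"NSW": 99, "IBRION": 2}},
    {"id": "calc-2", "incar": {"ICHARG": 11}},
    {"id": "calc-3", "incar": {}},
]

# Intent is derived when it is needed, so the rules can change without rerunning anything.
intents = {c["id"]: infer_intent(c["incar"]) for c in stored_calculations}
```

If the classification rules turn out to be wrong or incomplete, only `infer_intent` changes; the stored calculations stay untouched.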

Page 9: Metadata is technical debt

[Diagram, Before: Structures, one Calculate step, MongoDB Database, Material]

[Diagram, After: Structures, five Calculate steps, MongoDB Database, Material]

Decoupling metadata from the workflow enabled more agile workflow development.

Page 10: Decoupling metadata makes it declarative

[Diagram: Structures, Compute, Band structure path. "I changed my path!?" Structures, Compute, New band structure path.]

[Diagram: Structures, Computation, Band structure path, Build band structure]

Presenter notes: "Told it to do" vs. "wanted it to do"

Page 11: Declarative metadata gives us connectivity

[Diagram: Build band structure, Structures, Computation, Find key points in Calculations, Group Calculations by Structure, Path Definition, Find key paths in Calculations, Band structure Path]
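The steps named in this diagram (group calculations by structure, find the key points they cover, and build a band structure against a separate path definition) might be sketched as follows. This is a toy illustration, not Materials Project code: the record layout, the k-point labels, and the function names are hypothetical.

```python
from collections import defaultdict

# The path definition lives on its own, outside any workflow (labels are illustrative).
PATH_DEFINITION = ["GAMMA", "X", "M", "GAMMA"]

# Stored calculations: each records its structure and the labelled k-points it sampled.
calculations = [
    {"id": "calc-1", "structure_id": "s1", "kpoints": {"GAMMA", "X"}},
    {"id": "calc-2", "structure_id": "s1", "kpoints": {"X", "M", "GAMMA"}},
    {"id": "calc-3", "structure_id": "s2", "kpoints": {"GAMMA"}},
]


def group_by_structure(calcs):
    """Group calculation records by the structure they were run on."""
    groups = defaultdict(list)
    for calc in calcs:
        groups[calc["structure_id"]].append(calc)
    return dict(groups)


def build_band_structure(calcs, path):
    """Pick a calculation for each path segment; return None if a segment is not covered."""
    segments = []
    for start, end in zip(path, path[1:]):
        match = next((c for c in calcs if {start, end} <= c["kpoints"]), None)
        if match is None:
            return None
        segments.append((start, end, match["id"]))
    return segments


groups = group_by_structure(calculations)
band_structures = {
    sid: build_band_structure(calcs, PATH_DEFINITION) for sid, calcs in groups.items()
}

# Changing the path is a rebuild over stored data, not a new workflow run.
short_path = build_band_structure(groups["s1"], ["GAMMA", "X"])
```

Because the builder only declares what a band structure needs, a gap shows up as a `None` result that can be turned into a request for a new calculation.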

Page 12: Reverse Engineering Provenance

[Diagram: Structures from ICSD, User Submissions, and MP; Structures; Equivalent Structures; Structure]

Page 13: Define process metadata in one spot

[Diagram: two Structure boxes, each listing Spacegroup, Lattice, Composition, and Atoms, compared property by property ("=", three of them "within tolerance") to decide Equivalent Structures.]
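One way to read "define process metadata in one spot" is a single equivalence test with its tolerances kept in one object, reused wherever structures are compared. The sketch below assumes the spacegroup is compared exactly and the other three properties within a tolerance; the slide does not say which comparison is exact. All names and tolerance values are hypothetical, and the atom comparison is deliberately naive (same ordering, no periodic images); a production matcher such as pymatgen's `StructureMatcher` handles those cases.

```python
import math
from dataclasses import dataclass


@dataclass
class Structure:
    spacegroup: int
    lattice: tuple      # (a, b, c, alpha, beta, gamma)
    composition: dict   # element -> atomic fraction
    atoms: tuple        # ((element, (x, y, z)), ...) in fractional coordinates


@dataclass(frozen=True)
class Tolerances:
    lattice: float = 0.05      # relative tolerance on lattice parameters
    composition: float = 0.01  # absolute tolerance on atomic fractions
    position: float = 0.02     # absolute tolerance on fractional coordinates


def is_equivalent(a, b, tol=Tolerances()):
    """Compare two structures property by property."""
    if a.spacegroup != b.spacegroup:
        return False
    if not all(math.isclose(x, y, rel_tol=tol.lattice) for x, y in zip(a.lattice, b.lattice)):
        return False
    if a.composition.keys() != b.composition.keys():
        return False
    if any(abs(a.composition[el] - b.composition[el]) > tol.composition for el in a.composition):
        return False
    if len(a.atoms) != len(b.atoms):
        return False
    for (el_a, pos_a), (el_b, pos_b) in zip(a.atoms, b.atoms):
        if el_a != el_b or any(abs(p - q) > tol.position for p, q in zip(pos_a, pos_b)):
            return False
    return True


def group_equivalent(structures_with_source):
    """Merge equivalent structures and remember which sources each group came from."""
    groups = []
    for source, structure in structures_with_source:
        for group in groups:
            if is_equivalent(group["structure"], structure):
                group["sources"].append(source)
                break
        else:
            groups.append({"structure": structure, "sources": [source]})
    return groups


atoms = (("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5)))
comp = {"Na": 0.5, "Cl": 0.5}
s_a = Structure(225, (5.64, 5.64, 5.64, 90, 90, 90), comp, atoms)
s_b = Structure(225, (5.69, 5.69, 5.69, 90, 90, 90), comp, atoms)
s_c = Structure(221, (3.50, 3.50, 3.50, 90, 90, 90), comp, atoms)

groups = group_equivalent([("ICSD", s_a), ("MP", s_b), ("User Submissions", s_c)])
```

The list of sources attached to each group is the provenance: it records after the fact where every merged structure came from.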

Page 14: Define process metadata in one spot

[Diagram: a Structure and a Calculation Input Structure, each listing Spacegroup, Lattice, Composition, and Atoms, compared property by property ("=") to decide Equivalent Structures.]

Page 15: Codify human processes

[Diagram: Disordered Structures (ICSD), Enumeration Library, Choose "Best", Ordered Equivalent]

[Diagram: Ordered Equivalent, Compare Compositions, Disordered Structures (ICSD), Compare Disordered Spacegroups, Compare Anonymous Structures, Disordered Equivalent]
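The second diagram reads like a chain of comparisons that narrows ICSD candidates down to the disordered entry an ordered structure came from. A minimal sketch of that chain, assuming each comparison can be reduced to equality on a precomputed label, is shown below. The field names, identifiers, and labels are hypothetical placeholders, and real comparisons of spacegroups and anonymous structures would be more involved.

```python
def same_composition(ordered, candidate):
    return ordered["composition"] == candidate["composition"]


def same_disordered_spacegroup(ordered, candidate):
    # Compare against the spacegroup of the disordered parent, not the ordered cell.
    return ordered["parent_spacegroup"] == candidate["spacegroup"]


def same_anonymous_structure(ordered, candidate):
    # "Anonymous" here means the structure with species identities ignored.
    return ordered["prototype"] == candidate["prototype"]


# The human decision process, written down once as an ordered list of checks.
CHECKS = [same_composition, same_disordered_spacegroup, same_anonymous_structure]


def find_disordered_equivalent(ordered, candidates, checks=CHECKS):
    """Apply each check in turn, keeping only candidates that pass all of them."""
    for check in checks:
        candidates = [c for c in candidates if check(ordered, c)]
    return candidates


ordered = {"composition": "A0.5BO2", "parent_spacegroup": 166, "prototype": "layered"}
icsd_disordered = [
    {"id": "entry-A", "composition": "A0.5BO2", "spacegroup": 166, "prototype": "layered"},
    {"id": "entry-B", "composition": "A0.5BO2", "spacegroup": 227, "prototype": "spinel"},
    {"id": "entry-C", "composition": "ABO2", "spacegroup": 166, "prototype": "layered"},
]

matches = [c["id"] for c in find_disordered_equivalent(ordered, icsd_disordered)]
```

Writing the checks as data makes the process inspectable: adding, removing, or reordering a step is a one-line change to `CHECKS`.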

Page 16: Can we generalize this to more than computation?

Page 17: Propnet – Connecting Materials Models

Page 18: Too many models, now what?

Page 19: Metadata lets us augment properties
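The idea of connecting models so that known properties can be augmented with derived ones can be illustrated with a tiny forward-chaining loop. This is not Propnet's API; the `MODELS` table and `augment` function are hypothetical, and the two relations used are the standard isotropic linear-elasticity formulas linking bulk modulus K, shear modulus G, Young's modulus E, and Poisson's ratio.

```python
# Each model declares the properties it needs, the property it produces, and how.
MODELS = [
    # Poisson's ratio from Young's modulus and shear modulus: nu = E / (2G) - 1
    (("youngs_modulus", "shear_modulus"), "poisson_ratio",
     lambda e, g: e / (2 * g) - 1),
    # Young's modulus from bulk and shear modulus: E = 9KG / (3K + G)
    (("bulk_modulus", "shear_modulus"), "youngs_modulus",
     lambda k, g: 9 * k * g / (3 * k + g)),
]


def augment(properties, models=MODELS):
    """Keep applying models until no new property can be derived."""
    known = dict(properties)
    changed = True
    while changed:
        changed = False
        for inputs, output, func in models:
            if output not in known and all(name in known for name in inputs):
                known[output] = func(*(known[name] for name in inputs))
                changed = True
    return known


# Illustrative input values in GPa.
augmented = augment({"bulk_modulus": 160.0, "shear_modulus": 80.0})
```

The declared inputs and outputs are the metadata that makes this work: the order of the models in the list does not matter, because each one fires as soon as its inputs are known.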

Page 20: Reverse engineering metadata for experiments

Page 21: Questions?