Post on 31-May-2020

Reverse engineering metadata for the Materials Project

Shyam Dwaraknath, Research Scientist

Lawrence Berkeley Labs

SHARED METADATA AND DATA FORMATS FOR BIG-DATA DRIVEN MATERIALS SCIENCE: A NOMAD-FAIRDI WORKSHOP

What is the Materials Project?

• Free and open source database of computed materials properties

• Applications to explore the data set in "materials science" terms

• Design computational workflows based on experimental ground truth

• Focus on property diversity over number of structures

• All the software bells and whistles that go with a large web data project

The good old days

[Diagram labels: Structure → Workflow (repeated four times), Determine Material, Data, Analysis, Website]

What is metadata?

A set of data that describes and gives information about other data.

Three types: Descriptive, Structural, Administrative

What is metadata?

Example: DFT Calculation

• Descriptive: DFT code, composition, volume change
• Structural: where the inputs are, where the outputs are, units
• Administrative: who computed, where computed, who gets to access
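The three categories above can be sketched as a small record type. This is only an illustration of the grouping on the slide: the class names, field names, and example values (including the choice of "VASP" as the DFT code) are assumptions, not the Materials Project schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class DescriptiveMetadata:
    dft_code: str          # which DFT code produced the result
    composition: str       # chemical composition of the structure
    volume_change: float   # fractional cell-volume change (hypothetical field)


@dataclass
class StructuralMetadata:
    input_path: str        # where the inputs are
    output_path: str       # where the outputs are
    units: dict            # units used for stored quantities


@dataclass
class AdministrativeMetadata:
    computed_by: str       # who computed
    computed_on: str       # where computed
    access: list           # who gets to access


@dataclass
class CalculationMetadata:
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata


# Placeholder values, chosen only to make the example runnable.
example = CalculationMetadata(
    descriptive=DescriptiveMetadata("VASP", "Fe2O3", 0.02),
    structural=StructuralMetadata(
        "calc_0001/inputs", "calc_0001/outputs",
        {"energy": "eV", "length": "angstrom"},
    ),
    administrative=AdministrativeMetadata(
        "example_user", "example_cluster", ["public"]
    ),
)

# asdict() recurses into nested dataclasses, giving a document-style record.
record = asdict(example)
```

Flattening to a nested dictionary keeps the three categories visible in whatever document store holds the record.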

Some metadata are constructed implicitly

DFT Calculation

• Structural: where the inputs are, where the outputs are, units
• Administrative: who computed, where computed, who gets to access

Some metadata are constructed explicitly

DFT Calculation

• Descriptive: DFT type, DFT code, composition, intent

Administrative

How do you determine the intent of a calculation?

• The purpose of the workflow
• The purpose of the database it was in
• The purpose of the computer it ran on

Why not determine the intent as the need arises?
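Determining intent as the need arises could look like the sketch below, which guesses a label from stored input parameters instead of relying on a label written by the workflow. `NSW` and `ICHARG` are real VASP input tags, but the use of VASP here, the three labels, and the rule order are assumptions made for illustration and are not the Materials Project's actual rules.

```python
def infer_intent(incar: dict) -> str:
    """Guess what a calculation was for from its input parameters."""
    # ICHARG >= 10 keeps the charge density fixed: a non-self-consistent run.
    if incar.get("ICHARG", 0) >= 10:
        return "non-self-consistent"
    # NSW > 0 requests ionic steps (for example, a relaxation).
    if incar.get("NSW", 0) > 0:
        return "ionic-steps"
    # Otherwise treat it as a single-point (static) calculation.
    return "static"


# Hypothetical stored calculations; only the input parameters are inspected.
calculations = [
    {"id": "calc-1", "incar": {"NSW": 99, "IBRION": 2}},
    {"id": "calc-2", "incar": {"NSW": 0}},
    {"id": "calc-3", "incar": {"ICHARG": 11, "NSW": 0}},
]

intents = {c["id"]: infer_intent(c["incar"]) for c in calculations}
```

Because the rule lives in one function, changing what counts as a given intent means re-running the inference, not re-running the calculations.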

Metadata is technical debt

[Diagram] Before: Structures → Calculate → MongoDB Database → Material

[Diagram] After: Structures → several separate Calculate steps → MongoDB Database → Material

Decoupling metadata from the workflow enabled more agile workflow development

Decoupling metadata makes it declarative

[Diagram] Structures → Compute → Band structure path

[Diagram] "I changed my path!?": Structures → Compute → New band structure path

[Diagram] Structures + Computation + Band structure path → Build band structure

Presenter notes: Told it to do vs. wanted it to do

Declarative metadata gives us connectivity

[Diagram] Build band structure, connecting:

• Structures
• Computation
• Group Calculations by Structure
• Find key points in Calculations
• Path Definition
• Find key paths in Calculations
• Band structure Path
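A minimal sketch of such a build step follows, assuming each stored calculation records which labelled k-points it sampled. The record layout, the function names, and the example cubic path are hypothetical; the point is that the path definition is plain data consulted at build time, so changing it does not require recomputing anything.

```python
from collections import defaultdict

# Declarative path definition: ordered high-symmetry labels per lattice type.
PATH_DEFINITION = {"cubic": ["G", "X", "M", "G", "R"]}


def group_by_structure(calcs):
    """Group calculation records by the structure they were run on."""
    groups = defaultdict(list)
    for calc in calcs:
        groups[calc["structure_id"]].append(calc)
    return dict(groups)


def covers_path(calc, path):
    """True if the calculation sampled every labelled point on the path."""
    return set(path) <= set(calc["kpoint_labels"])


def build_band_structures(calcs, lattice_types, path_definition):
    """Pair each structure with the calculations that cover its path."""
    built = {}
    for sid, group in group_by_structure(calcs).items():
        path = path_definition[lattice_types[sid]]
        usable = [c["id"] for c in group if covers_path(c, path)]
        if usable:
            built[sid] = {"path": path, "calculations": usable}
    return built


calcs = [
    {"id": "c1", "structure_id": "s1", "kpoint_labels": ["G", "X", "M", "R"]},
    {"id": "c2", "structure_id": "s1", "kpoint_labels": ["G", "X"]},
    {"id": "c3", "structure_id": "s2", "kpoint_labels": ["G", "X", "M"]},
]
lattice_types = {"s1": "cubic", "s2": "cubic"}

first = build_band_structures(calcs, lattice_types, PATH_DEFINITION)

# Changing the path only means rebuilding from the same stored calculations.
shorter_path = {"cubic": ["G", "X", "M"]}
second = build_band_structures(calcs, lattice_types, shorter_path)
```

With the original path only `s1` can be built; with the shorter path `s2` becomes buildable as well, using calculations that already exist.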

Reverse Engineering Provenance

[Diagram] Structures from three sources (ICSD, User Submissions, MP) → Structures → Equivalent Structures → Structure

Define process metadata in one spot

[Diagram] Structure vs. Structure, compared field by field:

• Spacegroup
• Lattice
• Composition
• Atoms

Each of the four fields is compared with "="; three of the four comparisons are marked "within tolerance".

Result: Equivalent Structures

Define process metadata in one spot

[Diagram] Structure vs. Calculation Input Structure, compared field by field with "=":

• Spacegroup
• Lattice
• Composition
• Atoms

Result: Equivalent Structures
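The field-by-field comparison can be sketched with all tolerances gathered in a single object, so that every part of the pipeline shares one definition of "equivalent". The classes, the tolerance values, and the assumption that the spacegroup is the one exact comparison are illustrative. Real structure matching also has to handle site permutations and periodic images, which this sketch ignores.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerances:
    """The one place where process metadata for matching is defined."""
    lattice: float = 0.01       # relative tolerance on lattice parameters
    composition: float = 1e-6   # absolute tolerance on element amounts
    atoms: float = 0.01         # absolute tolerance on fractional coordinates


@dataclass
class Structure:
    spacegroup: int
    lattice: tuple       # (a, b, c, alpha, beta, gamma)
    composition: dict    # element -> amount
    atoms: list          # [(species, (x, y, z)), ...], same site order assumed


def equivalent(s1, s2, tol=Tolerances()):
    """Compare spacegroup exactly and the other fields within tolerance."""
    if s1.spacegroup != s2.spacegroup:
        return False
    if not all(math.isclose(p, q, rel_tol=tol.lattice)
               for p, q in zip(s1.lattice, s2.lattice)):
        return False
    if s1.composition.keys() != s2.composition.keys():
        return False
    if not all(math.isclose(s1.composition[el], s2.composition[el],
                            abs_tol=tol.composition)
               for el in s1.composition):
        return False
    if len(s1.atoms) != len(s2.atoms):
        return False
    for (sp1, xyz1), (sp2, xyz2) in zip(s1.atoms, s2.atoms):
        if sp1 != sp2:
            return False
        if any(abs(u - v) > tol.atoms for u, v in zip(xyz1, xyz2)):
            return False
    return True


a = Structure(225, (4.20, 4.20, 4.20, 90, 90, 90), {"Na": 1, "Cl": 1},
              [("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))])
b = Structure(225, (4.21, 4.21, 4.21, 90, 90, 90), {"Na": 1, "Cl": 1},
              [("Na", (0.0, 0.0, 0.0)), ("Cl", (0.501, 0.5, 0.5))])

same_default = equivalent(a, b)
same_strict = equivalent(a, b, Tolerances(lattice=1e-4))
```

Passing a different `Tolerances` instance changes the outcome without touching the comparison code, which is the practical benefit of defining process metadata in one spot.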

Codify human processes

[Diagram] Disordered Structures (ICSD) → Enumeration Library → Choose "Best" → Ordered Equivalent

[Diagram] Ordered Equivalent + Disordered Structures (ICSD) → Compare Compositions → Compare Disordered Spacegroups → Compare Anonymous Structures → Disordered Equivalent
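The comparison chain above can be written as a sequence of filters. This is a rough sketch under several assumptions: records are plain dictionaries, the ordered record already carries a precomputed `disordered_spacegroup` field, and an "anonymous" comparison means comparing site positions while ignoring which species occupy them. None of these names come from the Materials Project code.

```python
def atomic_fractions(comp):
    """Normalize element amounts so that they sum to one."""
    total = sum(comp.values())
    return {el: amt / total for el, amt in comp.items()}


def same_composition(a, b, tol=1e-3):
    fa, fb = atomic_fractions(a), atomic_fractions(b)
    return fa.keys() == fb.keys() and all(
        abs(fa[el] - fb[el]) <= tol for el in fa
    )


def anonymous_sites(sites, ndigits=3):
    """Site positions only; the occupying species are dropped."""
    return sorted(tuple(round(x, ndigits) for x in xyz) for _species, xyz in sites)


def find_disordered_equivalent(ordered, disordered_entries):
    """Return the id of the first entry that passes all three comparisons."""
    for entry in disordered_entries:
        if not same_composition(ordered["composition"], entry["composition"]):
            continue
        if ordered["disordered_spacegroup"] != entry["spacegroup"]:
            continue
        if anonymous_sites(ordered["sites"]) != anonymous_sites(entry["sites"]):
            continue
        return entry["id"]
    return None


# Hypothetical records: an ordered CuZn cell and two disordered entries.
mixed = {"Cu": 0.5, "Zn": 0.5}
ordered = {
    "composition": {"Cu": 1, "Zn": 1},
    "disordered_spacegroup": 229,
    "sites": [("Cu", (0.0, 0.0, 0.0)), ("Zn", (0.5, 0.5, 0.5))],
}
disordered_entries = [
    {"id": "entry-1", "composition": {"Cu": 3.0, "Zn": 1.0}, "spacegroup": 225,
     "sites": [(mixed, (0.0, 0.0, 0.0))]},
    {"id": "entry-2", "composition": {"Cu": 1.0, "Zn": 1.0}, "spacegroup": 229,
     "sites": [(mixed, (0.0, 0.0, 0.0)), (mixed, (0.5, 0.5, 0.5))]},
]

match = find_disordered_equivalent(ordered, disordered_entries)
```

Ordering the filters from cheapest to most expensive mirrors how a person would rule out candidates by hand, which is the sense in which the human process is codified.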

Can we generalize this to more than computation?

Propnet – Connecting Materials Models

Too many models, now what?

Metadata lets us augment properties

Reverse engineering metadata for experiments

Questions?
