bigdata analytics in materials science · pearson, k. "on lines and planes of closest fit to...

61
Big-Data Analytics in Materials Science Luca M. Ghiringhelli Fritz Haber Institute Hands-on workshop density-functional theory and beyond: First-principles simulations of molecules and materials Berlin, Germany, July 13 - July 23, 2015

Upload: others

Post on 10-Apr-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Big­Data Analytics in 

Materials Science

Luca M. GhiringhelliFritz Haber Institute

Hands­on workshop density­functional theory and beyond:First­principles simulations of molecules and materials

Berlin, Germany, July 13 ­ July 23, 2015

Page 2: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Data, data, data: big data

Big­data challenge, four­V:Volume  (amount of data)Variety  (heterogeneity, of form and meaning of data)Veracity  (uncertainty of data quality)Velocity

High­throughput screening: query and read out what was stored

Shouldn't we do more?

Analysis

­ Identify (so far) hidden correlations­ Identify which materials should be studied next as most promising candidates­ Identify anomalies

Page 3: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

We have a dream

From the periodic table of the elements to a chart of materials:Organize materials according to their properties and functions, e.g.

­ figure of merit of thermoelectrics (as function of T )

­ turn­over frequency of catalytic materials (as function of T and p)

­ efficiency of photovoltaic systems

Page 4: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Training setCalculate properties and 

functions P, for many materials, iDensity­Functional Theory

Fast PredictionCalculate properties 

and functions for new values of d (new materials)

Big Data Analysis

DescriptorFind the appropriate 

descriptor di, build a table: | i | di | Pi | 

LearningFind the function PSL(d) for the table; 

do cross validation.Statistical learning

Page 5: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

(Orbital period)² = C (orbit's major axis)³

Learning   Discovery→

Suppose to know the trajectories of all planets in the solar system, from accurate observations (experiment)orby numerically integrating general relativity equations (calculations at the highest level of theory)

Page 6: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

http://www.bbdc.berlin

Ingredients of a Big­Data project

Page 7: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Databases, platforms

“Just Databases”ICSD Inorganic Crystal Structure DB http://icsd.fiz­karlsruhe.deCOD Crystallography Open DB  http://www.crystallography.net/ESP Electronic Structure Project http://gurka.fysik.uu.se/ESP/CCCBDB Comp. Chemistry Comparison and Benchmark DB 

http://cccbdb.nist.gov/

Databases + analytic toolsMaterials project http://www.materialsproject.orgAFLOW Atomatic Flow for Materials Discovery  http://aflowlib.orgAiiDA Automated interactive infrastructure and DB for Atomistic Simulations 

http://www.aiida.netOQMD Open Quantum Materials DB http://oqmd.org/

Novel Materials Discovery http://nomad­repository.euNoMaD http://nomad­coe.eu 

Data from many different codes   need for a conversion layer→

Page 8: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

“Just Databases”ICSD Inorganic Crystal Structure DB http://icsd.fiz­karlsruhe.deCOD Crystallography Open DB  http://www.crystallography.net/ESP Electronic Structure Project http://gurka.fysik.uu.se/ESP/CCCBDB Comp. Chemistry Comparison and Benchmark DB 

http://cccbdb.nist.gov/

Databases + analytic toolsMaterials project http://www.materialsproject.orgAFLOW Atomatic Flow for Materials Discovery  http://aflowlib.orgAiiDA Automated interactive infrastructure and DB for Atomistic Simulations 

http://www.aiida.netOQMD Open Quantum Materials DB http://oqmd.org/

Novel Materials Discovery http://nomad­repository.euNoMaD http://nomad­coe.eu 

Data from many different codes   need for a conversion layer→

Databases, platforms

Page 9: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

http://www.bbdc.berlin

Ingredients of a Big­Data project

Page 10: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Outline

Descriptors and fingerprints

Back to machine learning: Automatic descriptor search

Linear and non­linear dimensionality reduction

Feature selection

Some words on causal descriptor­property relationship

Page 11: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Descriptors

Can we predict an optimal material for a complex process (e.g. heterogenous catalysis) 

by looking to a simple (set of) descriptor(s) ?

Page 12: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Many parameters enter into the kinetics of a given reaction:­ The energy of all intermediates ­ and of the transition states separating them 

Even for simple reactions, this easily gives 10–20 energy variables for a specific catalytic reaction.

Are they all uncorrelated?

Descriptor for adsorption energies

CH

CH2

CH3

Abild­Pedersen, …,  and J. K. Nørskov PRL 99, 016105 (2007)Nørskov, Abild­Pedersena, Studta, Bligaard PNAS 108, 937 (2011)

Page 13: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

CO + 3H2   →CH4 + H2O

(dissociative adsorption of CO)

Descriptor for turnover frequency (TOF)

C   CH→ x

O   OH→C and O   CO(ads)→CO(ads)   CO(TS) →C   CH→ x(TS)O   OH(TS)→

Stepped 221 surfaces

Page 14: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Selective hydrogenation of acetylene (C2H2) in the presence of an excess of ethylene (C2H4) on 111 surfaces.Unwanted: ethane, C2H6

Descriptor based search for a new catalyst

Page 15: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

(Genetic­like) fingerprint

Gaussian Kernel Ridge Regression

Data: 175 linear 4­blocks periodic polymers. 7 blocks:  CH2, SiF2, SiCl2, GeF2, GeCl2, SnF2, SnCl2, 

Descriptor: 20 dimensions [# building blocks of type i, of ii pairs, of iii triplets]

Pilania, Wang, …, and Ramprasad, Scientific Reports 3, 2810 (2013). DOI: 10.1038/srep02810

Page 16: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Isayev, …, and Curtarolo, Chemistry of Materials 27, 735 (2015)

(Genetic­like) fingerprint

Page 17: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Isayev, …, and Curtarolo, Chemistry of Materials 27, 735 (2015)

(Genetic­like) fingerprint

Page 18: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Supervised learning

d   → P mapping Support vector machinesNeural networksDecision treesGenetic programming   (symbolic regression)Kernel ridge regressionCompressed sensing

Unsupervised learning

d   → d'Find patterns / trends

Principal­components analysisNon­linear dim. reduction  Sketch mapClustering

Machine learning: (yet) a(nother) classification

Page 19: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Principal component analysis

Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901)

Orthonormal transformation of coordinates, converting a set of (possibly) linearly correlated coordinates into a new set of linearly uncorrelated (called principal or normal) components, such that the first component has the largest variance and each subsequent has the largest variance constrained to being orthogonal to all the preceding components

From 8: rs, rp, Es/Zv, Ep/Zv , for A and B

(Linear) dimensionality reduction: principal components

Saad, …, Chelikowsky, and Andreoni, PRB  85, 104104 (2012)

1 2 3Arb

. (lin

ear)

 sca

le

Components

Page 20: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Principal component analysis

Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901)

From 8: rs, rp, Es/Zv, Ep/Zv , for A and B

(Linear) dimensionality reduction: principal components

Saad, …, Chelikowsky, and Andreoni, PRB  85, 104104 (2012)

What's on the axes?

Linear combination of (possibly all) the initial dimensions

1 2 3Arb

. (lin

ear)

 sca

le

Components

Page 21: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

(Non­linear) dimensionality reduction

Page 22: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

(Non­linear) dimensionality reduction

Page 23: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

(Non­linear) dimensionality reduction

Page 24: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Proximity matchingProximity matching

Page 25: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Sketch­map algorithm

Page 26: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Minimization of the stress function (for a set of landmarks points)

Sketch­map algorithm

Page 27: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Sketch map of folded landscape of Ala12

Page 28: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Is point­wise (relative) free energy invariant upon dimensionality reduction?No, only for regions:

Sketch map of folded landscape of Ala12

Page 29: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Sketch map of folded landscape of Ala12

Page 30: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

From clusters to defects in bulk

Page 31: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

From clusters to defects in bulk

Page 32: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

From clusters to defects in bulk

Page 33: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Dimensionality reduction with meaningful axes?

What's on the axes?

Page 34: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

What about having a dimensionality reduction, or call it feature selection, 

i.e., such that the (best) low­dimensional representation is selected 

among (many many) given candidates?

It is time for: compressed sensing

Reference:LMG, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, 

Phys. Rev. Lett. 114, 105503 (2015)Don't overlook the Supplementary Information!

Page 35: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

A famous example: classification of octet binaries

Page 36: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

The chemical space

Page 37: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

A famous example: classification of octet binaries

Eh depends on the first nearest neighbour distance.C depends on the electrical conductivity.Eh²+ C² is related to the (average) band gap.

Both are properties of the binary compound.

Note: more than classification, the distance from the dividing line has a meaning.

Page 38: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Figure of merit to be optimized:

Regularization (prefer “lower complexity” in the solution)

(Linear) ridge regression

Mathematical formulation of the problem

Page 39: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Inadequacy of the “natural” descriptor

Page 40: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Inadequacy of the “natural” descriptor

   = 1E­4 ;   = 0.1λ σ

Gaussian Kernel Ridge Regression

Page 41: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Wish list for a descriptor

Page 42: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Figure of merit to be optimized:

Regularization (prefer “lower complexity” in the solution)

A more complex regularization:

(Linear) ridge regression

NP – hard !!!

Mathematical formulation of the problem

Page 43: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Mathematical formulation of the problem: sparsity

LASSO: convex problem, equivalent to the NP-hard if features (columns of D) are uncorrelated

Page 44: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

LASSO, compressed/ive sensing in Materials Science

Page 45: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

23 primary features

­  Find a descriptor AND an accurate  evaluation  for the difference  in energy between RS and ZB crystal structures for all (82) AB octet semiconductors.ΔE = ΔE ( d )

­  Possibly  identify  a  2D  descriptor  which  gives  a “nice” representation of the materials in a plane

The task

Page 46: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

KS 

leve

ls [

eV]

Valence p

Valence sRadial probability densities 

[Å]

Primary (atomic) features

Radius @ maxAverage radiusTurning point

example: Sn (Tin)

Valence p (HOMO)

Valence s

KS lev els [eV

]

LUMO

Page 47: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

23 primary features

Only linear combination of primary features: not enough.

“Best” performance: 21 selected  features (RMSE = 0.10 eV).

  

LASSO:

LASSO

Page 48: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

+

Radius 1 Radius 2

+

KS level 1 KS level 2

Systematic construction of the feature space

Page 49: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Radius 1 Radius 2 KS level 1 KS level 2

Systematic construction of the feature space

| x ­ y | | x ­ y |

Page 50: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

KS level 1 KS level 2

+

Radius 1 Radius 2

/

| x ­ y |

Systematic construction of the feature space

+

Radius 1 Radius 2 KS level 1 KS level 2

| x ­ y |

Page 51: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

KS level 1 KS level 2

+

Radius 1 Radius 2

/

| x ­ y |

Systematic construction of the feature space

+

Radius 1 Radius 2 KS level 1 KS level 2

| x ­ y |

exp(x)

(x)^n

In practice: formalism borrowed form symbolic regression

Page 52: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Systematic construction of the feature space: EUREQA

Descriptor (candidates: 242)a The largest distance between a H atom and its nearest Si neighborb The shortest distance between a Si atom and its sixth­nearest Si neighborc The maximum bond valence sum on a Si atomd The smallest value for the fifth­smallest relative bond length around a Si atome The fourth­shortest distance between a Si atom and its eighth­nearest neighborf The second­shortest distance between a Si atom and its fifth­nearest neighborg The third­shortest distance between a Si atom and its sixth­nearest neighborh The H­Si nearest­neighbor distance for the hydrogen atom with the fourth­smallest difference between the distances to the two Si atoms nearest to a H atom

T. Müller et al. PRB 89 115202 (2014):Data: ~1000 amorphous structures of 216 Si atoms (saturated)

Property: hole trap depth

EUREQA: genetic programming software. Global optimization (genetic algorithm).Schmidt M., Lipson H., Science, Vol. 324, No. 5923, (2009)

Page 53: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Total ~ 10000 features

Systematic construction of the feature space

Primary features

Page 54: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

1D

2D

3D

“Extended” LASSO : features are correlated, so the first 25-30 features selected by lasso when scanning from large to low λ are selected and all single features, all pairs, all triplets... are separately tested via linear regression (the NP-hard problem, but only with 25-30 features)

1D 2D 3D

Finding the descriptor

Page 55: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Performance of the descriptors: accuracy, validation

ε

!

“Complexity”

Erro

r

Training err.

Validation err.

Leave 10% out cross validation

Errors are energies, in eV

Max Absolute Error

Convergence with dimensionality of the descriptor

Page 56: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Performance of the descriptors: accuracy

1D 2D

5D3D

Page 57: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Two­dimensional descriptor

0 0.2 eV 0.45 eV 1.0 eV-0.2 eV

Page 58: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Few words on causality

There are four possibilities (types of causality relationship) behind P(d):

1. d → P : P “listens” to d

2. P → d : d “listens” to P

3. A → d and A → P : There is no direct connection between d and P, but d and P both “listen” to a third “actuator”

4. There is no direct connection between d and P, but they have a common effect (Berkson paradox)

...that listens to both and screams: “I occurred” [Judea Pearl]

[If the admission criteria to a certain graduate school call for either high grades as an undergraduate or special musical talents, then these two attributes will be found to be correlated (negatively) in the student population of that school, even if these attributes are uncorrelated in the population at large (selection bias). Indeed, students with low grades are likely to be exceptionally gifted in music, which explains their admission to graduate school.]

Page 59: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Few words on causality

We are not able to write down a scientific law that connects the descriptor

directly with the total-energy difference between RS and ZB structures

However, ZA, ZB determine these descriptors, and ZA, ZB determine the many-body Hamiltonians and the total-energy difference.

Quantitative analysis: effect of noise

Page 60: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Features multiplied by Gaussian noise

The same 2D descriptor is found:

Property perturbed by adding uniform noise

The same 2D descriptor is found:

The effect of noise

Page 61: BigData Analytics in Materials Science · Pearson, K. "On Lines and Planes of Closest Fit to Systems of Points in Space". Philosophical Magazine 2, 559 (1901) Orthonormal transformation

Big­data for Materials Science: Infrastructures

Descriptors and fingerprints

Automatic descriptor search:Linear and non­linear dimensionality reduction

Principal component analysisSketch map

Feature selectionLASSO (compressed sensing)Symbolic regression

Some words on causal descriptor­property relationshipCross­validation and Stability analysis

Summary