
Machine learning on the electron-boson mechanism in superconductors

Wan-Ju Li,1, 2 Ming-Chien Hsu,1 and Shin-Ming Huang1, ∗

1 Department of Physics, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
2 Institute of Physics, Academia Sinica, Taipei 11529, Taiwan

(Dated: July 16, 2020)

To unravel the pairing mechanism of a superconductor from limited, indirect experimental data is always a difficult task. It is common, but sometimes dubious, to explain such data with a theoretical model containing a few tuning parameters. In this work, we propose that machine learning can infer the pairing mechanism from observables such as the superconducting gap function. For superconductivity within the Migdal-Eliashberg theory, we perform supervised learning between superconducting gap functions and electron-boson spectral functions. For simple spectral functions, the neural network easily captures the correspondence and predicts perfectly. For complex spectral functions, an autoencoder is utilized to reduce the complexity of the spectral functions so that it becomes compatible with that of the gap functions. After this complexity-reduction process, the relevant information of the spectral function is extracted and good performance is restored. Our proposed method extracts the relevant information from data and can be applied to general function-to-function mappings with asymmetric complexities, in physics or in other fields.

I. INTRODUCTION

The mechanism of superconductivity has been one of the hottest topics for more than a century, ever since the first experimental evidence of superconductivity in mercury was observed by H. K. Onnes in 1911 [1]. After half a century, the seminal work by Bardeen, Cooper and Schrieffer [2], called the BCS theory, elucidated the inevitable pairing attraction arising from the electron-phonon coupling (EPC) and successfully resolved the long-standing puzzle of the origin of superconductivity. However, materials with strong EPC could not be incorporated in this scope. The strong-EPC issue in normal metals was solved by Migdal, who used the field-theoretical Green's function approach to show that the perturbation series of the EPC converges quickly because the vertex correction is negligible compared to the self-energy [3,4]. The work was further extended to the superconducting state by Eliashberg on the basis of the collection of states introduced by Bogoliubov [5,6]. The resulting Eliashberg theory, also known as the Migdal-Eliashberg (ME) theory, is a modification of the BCS theory that takes the EPC, and thus the retardation effect, into account more realistically [7].

The success of the EPC mechanism for superconductivity demonstrates the huge impact of the collaboration between fermions (electrons) and bosons (phonons). Since then, scientists have kept searching for possible bosonic modes behind superconductivity, including in unconventional superconductors such as the high-temperature cuprates [8] and the iron-based superconductors [9,10]. Possible bosonic modes include the spin wave, the spin fluctuation and so on [11], each of which has a specific spectrum. In this way, the original BCS theory has been widely generalized to describe superconductivity based on the interactions between electronic and bosonic modes and has been applied to various superconducting systems, even when their pairing glues are expected to be unconventional. Knowing the bosonic spectrum relevant to the superconducting pairing thus gives great insight into the mechanism of superconductivity.

Theoretically, it is straightforward to calculate gap functions based on the spectrum of the bosonic glue of superconductivity. In contrast, the bosonic spectrum is material dependent and requires case-by-case ab initio calculations. Since the electrons and bosons contribute to superconductivity through the overall effective electron-boson spectral function (EBSF), without the need for many material details, inferring the EBSF from superconducting properties is possible and can stand as convincing evidence for the mechanism. Our goal is to infer the EBSF from the gap function using machine learning techniques, and this strategy may be extended to other mechanisms, including unconventional superconductivity.

Since the beginning of this century, machine learning (ML) techniques have been intensively developed in the fields of computer vision and natural language processing [12-15]. Among the methods of machine learning, supervised and unsupervised learning have been widely applied to fields beyond computer science. In supervised learning (SL), labeled data are used to train the machine so that the well-trained machine can predict correct labels for data it has not seen before. For instance, if two functions FA and FB have some unknown one-to-one correspondence, we can train an SL machine to predict FB for a given FA, or vice versa. In unsupervised learning (USL), a machine is trained by feeding data without labels to learn the distribution of the data, with or without dimension reduction. For instance, restricted Boltzmann machines can learn the distribution of data without dimension reduction [16], while autoencoders (AEs) can learn the distribution in a latent space, giving a compressed representation in reduced dimensions [17].

Over the past few years, modern ML has gradually become a relevant tool for solving different types of problems in physics [18-24]. Recently, the detection of phase transitions has started to employ ML as an alternative method [25-33]. Since then, many follow-up efforts have been devoted to applying ML to phase transitions of diverse theoretical models [30,34-43]. Beyond phase-transition detection, ML techniques have also been applied to various tasks, such as guiding materials discovery [44-47], preparing special quantum states [48-52] and extracting information from measurements in X-ray experiments [53-55].

For superconductivity, efforts have mainly been devoted to predicting the transition temperature. These include a similarity search using encoded fingerprints [56], learning representations of the elements in the periodic table to predict superconductivity [57,58] and estimating the transition temperature via disparate models, such as support vector machines [59,60], a gradient boosted model [61], regression models [62,63], self-learned descriptors combined with atom table convolutional neural networks (ATCNN) [64], an equation-based model [65], convolutional gradient boosting decision trees [66], a variational Bayesian neural network [67] and models using text-mining-generated training datasets [68]. The effects of training data and improved measures have also been investigated for materials discovery [69]. However, compared to the prediction of superconducting transition temperatures, the mechanism of superconductivity has attracted less attention. One representative work is the estimation of the normal and anomalous self-energies of superconductors from experimental ARPES data [70].

In this work, we realize a machine that can propose the possible EBSF given a gap function as input. Through the relations learned by the machine, we can obtain the spectral function relevant to the gap function without performing ab initio calculations. For simple EBSFs the SL performance is good, and we investigate what the machine learns during the training process. SL alone, however, can hardly achieve good performance for complex, randomly generated EBSFs. To resolve this issue, we use an AE to evaluate the complexities of both the EBSFs and the gap functions, and find that the EBSFs have a much higher complexity. Therefore, the complexity of the EBSFs is reduced by the AE to be compatible with that of the gap functions; in this way, smoothed EBSFs are generated that contain the key information relevant to the input data. Using the AE-smoothed EBSFs as the new labels and the AE-transformed gap functions as the new inputs, the performance is greatly improved, which indicates that the new EBSFs preserve the essential information relevant to the new gap functions so that a one-to-one correspondence exists between them. This approach thus provides a complexity-reduction process that can generally extract the key information of the data relevant to the input functions, and it can be utilized to improve SL tasks in physics or other fields. The process is summarized in Fig. 1. Note that we also study the cases where the density of states (DOS) is the input. The results are similar to those for the gap functions, so only results for the gap functions are presented unless otherwise specified.


FIG. 1: Flowchart of the training process. We start at step 1 by training a supervised learning (SL) machine with a dataset. If the performance is good, we can use the well-trained machine for predictions (step 2). If the performance is not good, we check whether the latent dimension of the output functions, dLo, is larger than that of the input functions, dLi (step 3). If it is not, we expect the performance can be improved by changing the hyper-parameters of the neural network, such as the number of layers and the number of neurons in each layer (go back to step 1); see also the discussion section. If dLo > dLi, we first train an autoencoder (AE) with latent dimension dLi using the original dataset and generate new datasets from the well-trained AE (step 4). At step 5, we train a new SL machine with the newly generated dataset, and good performance is expected (step 6).

The outline of the rest of this article is as follows. In Sec. II, we introduce the ME formalism, describing the relation between the EBSFs and the gap functions, and the preparation of the training datasets. In Sec. III, we describe the details and results of the training with simple (Sec. III A) and complex EBSFs (Sec. III B). We discuss the relation between the performance and the compatibility of complexities, as well as potential applications of our work, and then conclude in Sec. IV.


II. MIGDAL-ELIASHBERG FORMALISM

The key feature of superconductivity is the Cooper-pair condensation, initiated by pairs of states (k↑, −k↓) being occupied coherently. Therefore, the Nambu-Gor'kov formalism utilizing the two-component spinor ψ†k = (c†k↑ c†−k↓) was introduced [71-73]. The Green function at momentum k and (imaginary) Matsubara frequency iωn is

[G(k, iωn)]⁻¹ = iωn I − εk σ3 − Σ(k, iωn),   (1)

where εk is the normal-state energy dispersion and Σ is the self-energy. The ME theory considers a self-energy Σ that comes from both the EPC and the electron-electron Coulomb interaction. In particular, the vertex corrections are argued to be small, so that a bare vertex is a good approximation; this means the EPC is truncated at order ωD/EF, the ratio of the Debye frequency to the Fermi energy. This leads to the self-energy

Σ(k, iωn) = −(1/β) ∑_{k′,n′,ν} σ3 G(k′, iωn′) σ3 [|g_{k,k′,ν}|² Dν(k − k′, iωn − iωn′) + VC(k − k′)],   (2)

where VC(k − k′) is the screened Coulomb potential and g_{k,k′,ν} is the screened EPC strength, which describes the scattering between electron states k and k′ through a phonon with wave vector q = k′ − k and frequency ωqν. The phonon propagator is Dν(q, iωn) = 2ωqν/[(iωn)² − ω²qν], and the self-energy takes the form

Σ(k, iωn) = iωn[1 − Z(k, iωn)] I + χ(k, iωn) σ3 + φ(k, iωn) σ1,   (3)

where Z(k, iωn), χ(k, iωn) and φ(k, iωn) are functions at the Matsubara frequencies to be determined. Here, the phase of the pairing potential φ(k, iωn) is gauged so as to exclude the σ2 component. Therefore, the Dyson equation becomes

[G(k, iωn)]⁻¹ = iωn Z I − (εk + χ) σ3 − φ σ1.   (4)

The functions Z(k, iωn), χ(k, iωn) and φ(k, iωn) can then be determined.

These functions depend on k, which heavily burdens the computational task. A standard approximation is applied: the DOS is taken to be a constant NF around the Fermi energy EF, since EF is much larger than the pairing energy. Hence the energy shift χ is a constant that can be omitted. Furthermore, for conventional superconductors the gap-function anisotropy is usually weak or smeared out by impurities [73]. Therefore, an isotropic approximation is adopted, averaging k over the Fermi surface. By defining the electron-phonon (electron-boson) spectral function (EBSF)

α²F(ω) = (1/NF) ∑_{k,k′} ∑_ν |g_{k,k′,ν}|² δ(εk′) δ(εk) δ(ω − ωqν),   (5)

which is a positive-definite function, we finally arrive at the ME equations to be solved self-consistently:

Z(iωn) = 1 + (πT/ωn) ∑_{n′} [ωn′ / R(iωn′)] λ(n′ − n),   (6)

Z(iωn) ∆(iωn) = πT ∑_{n′} [∆(iωn′) / R(iωn′)] [λ(n′ − n) − µ∗ θ(ωc − |ωn′|)],   (7)

where R(iωn) ≡ √(ω²n + ∆²(iωn)), ∆ ≡ φ/Z is the gap function, and

λ(n′ − n) = ∫ dω 2ω α²F(ω) / [(ωn − ωn′)² + ω²],   (8)

is the interaction kernel function. Here T is the temperature, and the reduced Planck constant ħ and the Boltzmann constant kB are set to unity for convenience.

The electron-electron interaction is renormalized and then taken into account via the Morel-Anderson pseudopotential µ∗ for frequencies below ωc. The value of µ∗ is empirical and lies in the range 0.1-0.2 for most conventional superconductors [73]; its value usually does not change the behaviour of the results much. The summation over Matsubara frequencies is truncated in the numerical calculations. Once Z(iωn) and ∆(iωn) are calculated at the Matsubara frequencies, we perform the analytic continuation iωn → ω+ = ω + iΓ, where ω is the real frequency and Γ → 0⁺ is an infinitesimal positive number. Numerically this can be performed with the Padé approximation [74]. In Appendix C, we give the details of the numerical solution of the ME equations and the preparation of the input (gap function ∆(ω)) and output (EBSF α²F(ω)) parts of the training datasets.
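As an illustration of how Eq. (8) enters the numerical solution, the following is a minimal sketch of evaluating the kernel λ(n′ − n) from a sampled α²F(ω); the frequency grid, the temperature and the example spectral function are our own illustrative assumptions, not values from the paper.

# A minimal sketch of Eq. (8): the kernel lambda(n' - n) computed from a
# sampled alpha^2 F(omega). Grid, temperature and the example EBSF below
# are illustrative assumptions only.
import numpy as np

def lambda_kernel(m, omega, a2F, T):
    """lambda(m) with m = n' - n, so that omega_n - omega_n' = -2*pi*T*m
    (fermionic Matsubara frequencies, k_B = hbar = 1)."""
    d_nu = 2.0 * np.pi * T * m                      # bosonic frequency difference
    integrand = 2.0 * omega * a2F / (d_nu**2 + omega**2)
    return np.trapz(integrand, omega)               # Eq. (8) by the trapezoid rule

# Hypothetical usage: a single Gaussian-like EBSF peaked at 5 meV.
omega = np.linspace(1e-3, 10.0, 1000)               # meV, avoiding omega = 0
a2F = np.exp(-((omega - 5.0) / 1.0) ** 2)
lam0 = lambda_kernel(0, omega, a2F, T=0.1)          # lambda(0), the usual EPC constant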

III. INFERRING THE SPECTRAL FUNCTIONS BY SUPERVISED LEARNING

We build an SL machine using Keras on the TensorFlow backend, taking ∆(ω) as the input function and predicting the EBSF as the output function. The network contains an input layer (300 neurons), an output layer (1000 neurons) and two hidden layers (500 and 800 neurons, respectively), as sketched in Fig. 2. ∆(ω) is fed into the input layer and, after passing through the two hidden layers, the predicted EBSF α²F(ω) is produced in the output layer. The activation functions for the first three layers are ReLU, while a linear function is used in the output layer. We choose the mean-square error (MSE) as the loss function, minimized by the ADAM optimizer [75]. We divide our dataset into three subsets: 10% of the data for the testing set, 9% for the validation set and the rest for the training set.
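A minimal sketch of such a network is given below, assuming the layer sizes, activations, loss and optimizer quoted above; the random arrays and the training options (epochs, batch size) are our own placeholders, not the values used in the paper.

# A minimal sketch of the supervised-learning network of Sec. III.
# Layer sizes and activations follow the text; the random arrays and the
# fit options are placeholders for illustration only.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_sl_model(n_in=300, n_out=1000):
    model = keras.Sequential([
        layers.Input(shape=(n_in,)),                 # gap function Delta(omega) on a grid
        layers.Dense(500, activation="relu"),        # first hidden layer
        layers.Dense(800, activation="relu"),        # second hidden layer (the "NTO" layer)
        layers.Dense(n_out, activation="linear"),    # predicted EBSF alpha^2 F(omega)
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Stand-in data: rows are samples, columns are frequency points.
gaps = np.random.rand(1800, 300)
ebsfs = np.random.rand(1800, 1000)
model = build_sl_model()
model.fit(gaps, ebsfs, validation_split=0.1, epochs=50, batch_size=32)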


[Fig. 2 sketch: ∆(ω) → input layer → two hidden layers (the layer next to the output is the NTO layer) → output layer → EBSF.]

FIG. 2: The neural network for the supervised learning that predicts EBSFs from the gap functions.

A. Good performance for simple bosonic spectrum

In this subsection, we study the performance of the machine when the complexity of the data is low. We prepare a dataset (1800 samples in total) in which every EBSF is constructed as a combination of Gaussian functions. Some EBSFs are shown in Fig. 3(a) and some gap functions in Fig. 3(b). Our machine predicts the EBSF nicely, as shown by the close overlap between the real (dashed orange) and predicted (solid blue) functions in Fig. 3(a).

Given the good performance, we would like to gain insight into what the machine has learned from the training dataset. The details are given in Appendix B.

B. AE-aided supervised learning for asymmetric complexities

In this part of the work, we prepare diverse EBSFs (10000 samples in total) by randomly generating the value at each frequency of an EBSF and applying some smoothing (averaging) schemes to slightly suppress the fluctuations. The performance after training is shown in Fig. 4. The predicted EBSFs (p-EBSFs) approximately match the main profile of the ground-truth EBSFs (gt-EBSFs) but miss the fine structures. We propose that this discrepancy is caused by the different complexities of the EBSFs and the gap functions, so that they do not have a one-to-one correspondence with each other. Explicitly speaking, the mapping is close to many (EBSFs) to one (gap function), as will be discussed in Appendix A. Therefore, restoring good performance requires a complexity adjustment, during which the relevant information of the EBSFs is extracted. In the following, we quantify the complexity by an AE and show explicitly that the complexity of the EBSFs is much higher than that of the gap functions ∆(ω).

An AE, as illustrated in Fig. 5, is usually used to reduce the dimensionality of data and to represent each datum as a latent vector in the latent space.


FIG. 3: (a) The labels (orange dashed curves) and predictions (blue solid curves) for EBSFs. (b) The real (blue solid curves) and imaginary (orange dashed curves) parts of the gap functions.

We train an AE to first take in a datum at the input layer and, after passing through the whole AE network, reconstruct the same datum at the output layer. The middle layer of an AE is called the bottleneck; its number of neurons defines the dimension of the latent space, the latent dimension dL.


FIG. 4: The labels (orange dashed curves) and predictions (blue solid curves) for complex EBSFs. The predictions only match the main profile of the labels because the complexity of the EBSFs (output) is much higher than that of the gap functions (input).



FIG. 5: Sketch of an autoencoder (AE). An AE contains one encoder and one decoder. The encoder is the part from the input layer to the middle layer (the left three layers), and the decoder is the part from the middle layer to the output layer (the right three layers). Thus the encoder maps a datum into a latent vector and the decoder generates a datum from a given latent vector. We train the AE to reconstruct the input in the output layer.

Our strategy is to concatenate each gap function (vector) and the corresponding gt-EBSF (vector) into a new, combined vector and then use these combined vectors to train the AE.

As a first trial, we set dL = 2; the result is shown in Fig. 6(b). The 1400-dimensional combined vector is composed of the EBSF (the first 1000 dimensions) and the gap function (the remaining 400 dimensions), where each dimension is a discrete frequency point of the corresponding function. The gap function is well reconstructed while the EBSF is not. This result shows that a two-dimensional latent space is enough to represent the gap functions but not the complex EBSFs. Next, in order to find the latent-space dimension required to represent both the EBSFs and the gap functions well, we study the latent dimensions dL = 1, 2, 4, 8, 32, and 64. The results are shown in Fig. 6. Both the EBSFs and the gap functions are well reconstructed for dL ≥ 32, which indicates that the latent dimension of the EBSFs is roughly 32 (the minimal required latent dimension lies between 24 and 32 according to our simulations).
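A minimal sketch of such an AE is given below, assuming the 1400-dimensional concatenation of EBSF and gap function described above; the hidden-layer widths and training options are our own illustrative choices, since the text does not specify them.

# A minimal sketch of the autoencoder used to probe and reduce complexity.
# The 1000-point EBSF and the 400-point gap function are concatenated into
# one 1400-dimensional vector; hidden-layer sizes and fit options are
# illustrative assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(n_total=1400, d_latent=2):
    inp = keras.Input(shape=(n_total,))
    h = layers.Dense(512, activation="relu")(inp)          # encoder hidden layer
    z = layers.Dense(d_latent, activation="linear")(h)     # bottleneck (latent space)
    h2 = layers.Dense(512, activation="relu")(z)           # decoder hidden layer
    out = layers.Dense(n_total, activation="linear")(h2)   # reconstruction
    ae = keras.Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    return ae

# Stand-in combined dataset: each row is [alpha^2 F | Delta].
v = np.random.rand(10000, 1400)
ae = build_autoencoder(d_latent=2)
ae.fit(v, v, epochs=50, batch_size=64, validation_split=0.1)

new_v = ae.predict(v)                                      # AE-transformed dataset
new_ebsf, new_gap = new_v[:, :1000], new_v[:, 1000:]       # smoothed EBSFs, co-transformed gaps

The last two lines correspond to the generation of the new dataset {vn} described in Appendix F.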

The two complexities (or dL's) of the EBSFs and the gap functions are so different that many distinct EBSFs may map onto similar gap functions, consistent with the proposed many-to-one mapping. In this case, the machine has difficulty learning the mapping and possibly ends up outputting an averaged EBSF. This is consistent with what is observed in Fig. 4, where the prediction curves appear as smoothed versions of the real curves. It also explains the poor training performance in Fig. 4: the randomly generated EBSFs contain much information, most of which is not relevant to the gap function, so the performance of the SL machine is reduced.


FIG. 6: The original (orange dashed curves) and reconstructed (blue solid curves) functions of AEs with different latent dimensions dL = 1 (a), 2 (b), 4 (c), 8 (d), 32 (e) and 64 (f). The first 1000 components (left) of each curve are the EBSF, while the last 400 components (right) are the gap function. When dL ≥ 32, both the EBSF and the gap function can be well reconstructed.

A question arises: can the simplified p-EBSF be a good prediction? It is possible that the simplified p-EBSFs already contain enough information to reproduce all the physics relevant to the gap functions, so that we do not really need the original, complex EBSFs to understand the mechanism behind the superconductivity. Perhaps all we need are EBSFs whose complexity is similar to that of the gap functions.

In order to clarify the ideas mentioned above, we calculate two gap functions, one from the gt-EBSF and one from the p-EBSF. If both the simplified p-EBSF and the complex gt-EBSF contain the relevant information, the resulting gap functions should look similar. Our results, shown in Fig. 7, support this idea. In each subfigure, the first panel is the EBSF and the following two panels show the calculated gap-related functions. Although the gt-EBSF has some small-scale fluctuations that do not exist in the p-EBSF, the resulting two functions look similar. This demonstrates that the extra complexity (for instance, those small-scale fluctuations) is not physically relevant to the gap functions. Therefore, our SL machine does in fact learn the physically relevant part of the EBSFs even though it does not fully reproduce the gt-EBSFs. Given that both the p-EBSFs and the gt-EBSFs contain the information relevant to the gap functions, we examine what this key information looks like in the following.

Our belief is that if a simplified EBSF contains mostly the key information relevant to the gap functions, the complexity of that simplified EBSF has to be of the same degree as that of the gap functions. Furthermore, an SL machine trained on the simplified EBSFs and the corresponding gap functions should restore the good performance. In order to systematically control the complexities, we use an AE to generate new EBSFs as well as gap functions with the desired complexity (or latent dimension) for our SL trainings.



FIG. 7: The results based on the p-EBSF (a) and the gt-EBSF (b). The first panel of each subplot is the EBSF and the following two panels are the gap-related functions ∆(ω) and Z(ω). The difference between the two EBSFs is mainly small-scale fluctuations, which turn out not to be physically relevant, as the resulting gap-related functions from the two EBSFs are similar.

The way to generate the new datasets is given in Appendix F. As an example, we train an AE to generate a new dataset {v2} with dL = 2 (the blue curve in Fig. 6(b) is one sample). The {v2} dataset, containing the gap functions {g2} and the EBSFs {f2}, is then used to train a new SL machine; the results are shown in Fig. 8(b).

The performance is good for two reasons. First, the latent dimension of the EBSFs is reduced to two, similar to that of the gap functions. The second and most important reason is that the EBSFs and the gap functions are co-transformed, so that the relevant connection between them is kept by the AE. If the two functions are transformed separately by the AE, the performance is lowered. In other words, via the AE the latent dimension of the EBSFs is reduced from 32 to 2, and the co-transformation with the gap functions makes the 2-D latent space of the EBSFs coincide with the 2-D latent space of the gap functions. We also check that the AE-transformed EBSFs really lead, through the self-consistent calculations, to the new, co-transformed gap functions for any latent dimension. That is, an AE can preserve the connection between the gap functions and the EBSFs and transfer this connection to the new dataset regardless of the chosen latent dimension.


FIG. 8: The labels (orange dashed curves) and predictions (blue solid curves) for the SL trainings using data generated from the AE with different latent dimensions dL = 1 to 32. The performance of the SL training increases with decreasing dL: the difference in complexity between the gap functions and the EBSFs decreases with decreasing dL, leading to improved SL training performance.

In Appendix E, we provide evidence that the EBSFs predicted by our AE-followed-by-SL machine do contain the information relevant to the gap functions for all latent dimensions. In Appendix A, we use the connection preserved by the AE to numerically demonstrate the many-to-one mapping between EBSFs and gap functions claimed above.

Given that all the newly generated datasets {vn}, for any latent dimension n, satisfy the ME equations, we have to determine which dL is the appropriate one for constructing the SL machine used for predictions. The key is to find the latent dimension of the gap functions, dLg. From inspection of Fig. 6, it is easy to find that dLg = 2, which is the minimal dimension at which the AE-transformed gap functions g2 look similar to the original gap functions g0. In other words, the phase space of {g2}, the collection of gap functions generated by a trained AE with dL = 2, may contain most of the phase space of {g0}. In contrast, {g1} may contain a smaller portion of the phase space of {g0}, as can be seen, for instance, in Fig. 6(a), where the reconstructed gap function shows some mismatch with the original one. In Appendix D, we evaluate the portion of the {g0} phase space occupied by {gn} for several different n to quantitatively determine dLg = 2.

For the cases using the DOS as the input function, similar calculations and analysis have been performed, and the appropriate latent dimension is one instead of two. This is consistent with our observations: the difference between DOS curves lies mainly in the positions of the coherence peaks, while gap functions can differ in many more ways.

Given the good performance shown above, we can summarize our process (Fig. 1) for improving the performance of a general function-mapping SL as follows. First, we check the latent dimensions (or complexities) of the input and output functions, dLi and dLo respectively, using AEs with different dL. Usually, the bad performance occurs when dLo > dLi, as discussed next. Second, we train an AE with the latent dimension dLi; during this step, the complexities of the input and output functions are adjusted to be the same, and a new dataset is generated by the trained AE. As the final step, we use the newly generated datasets to train the SL machine and use this well-trained machine to make predictions for gap functions from experiments.

IV. DISCUSSIONS AND CONCLUSIONS

The bad performance usually occurs when dLo > dLi. In this case, the machine has to learn to infer remarkably different output functions (EBSFs in this work) from similar input functions (gap functions in this work). This is a difficult task, and the machine usually ends up predicting the "coarse-grained" functions mentioned above. In the opposite situation, where dLo < dLi, remarkably different input functions are used to infer similar output functions, which is a relatively easier task. Consider an extreme example where two different input functions (q1 and q2) correspond to the same output function (r1). Then the machine only needs to learn to infer r1 no matter which of q1 and q2 is encountered. As a consistency check, we reverse the roles of the input and output functions (that is, we use EBSFs to predict gap functions via an SL machine) and the performance of the SL machine is good (not shown), consistent with our expectations.

In order to obtain good performance, the latent dimension of the AE has to be chosen appropriately if the two latent dimensions of the input and output functions are initially very different. For the simpler case shown in the first part of this work, the two complexities (latent dimensions) are compatible with each other, so the performance is good and no extra AE is needed. For the complex case shown in the second part of this work, the latent dimension of the output functions (dLo = 32) is much larger than that of the input functions (dLi = 2 for the gap functions and dLi = 1 for the DOS functions). Therefore, the best performance occurs when the latent dimension of the AE is chosen to be two (one) if the gap functions (DOS functions) are chosen as the input functions.

It is usually difficult to obtain a dataset large enough for training an SL machine directly from experimental inputs. Therefore, we theoretically generate the datasets for training. In order to include all possibilities encountered in the real physical world, the data are generated with maximum randomness. However, this process can create extra complexity that is irrelevant to the physical properties of interest, leading to the bad performance discussed above. Our complexity-reduction process serves as a way to extract the physically relevant information of the randomly generated functions.

In this work, our method is applied mainly to BCS-type superconductivity, but the extension to other types of superconducting order parameters, such as d-wave or other unconventional ones, should be straightforward as long as a similar theoretical formalism connecting the gap functions and the bosonic glue is established. Furthermore, the method can be applied generally to any function-to-function supervised-learning task with asymmetric complexities to improve the performance and to gain more understanding of the data.

In summary, we predict the EBSF of a superconductor based on measured gap functions via supervised learning aided by an autoencoder. When the latent dimensions of the input and output functions are compatible, the performance is good. When the two latent dimensions are remarkably different (dLo > dLi), we have to reduce the latent dimension of the complex one to be compatible with the simple one, so that the good performance is restored. This reduction can extract the physically relevant information. Therefore, this method paves the way to improving the performance of general function mappings with asymmetric complexities and to extracting meaningful information from data in physics or other fields.

The authors thank Prof. Ting-Kuo Lee for fruitful discussions. This work is supported by the Ministry of Science and Technology (Grants No. MOST 109-2112-M-110-010 and MOST 108-2112-M-110-013-MY3) and National Sun Yat-sen University.


Appendix A: Many-to-one mapping

In order to demonstrate the many-to-one mapping, we statistically examine the difference between two EBSFs (say, the original f1 and the complexity-reduced f2) and that between the two corresponding gap functions (say, g1 and g2). As the first step we randomly generate f1 and obtain, via the ME self-consistent calculations, the gap function g1. We then choose an AE-followed-by-SL machine with latent dimension dL = n and feed it with g1; the output is f2. As the next step, we obtain g2 self-consistently from f2. Note that both pairs, (f1, g1) and (f2, g2), were shown in the main text to satisfy the ME equations. This completes the generation of one data point (f1, f2, g1, g2). In the following we define and evaluate two differences, ∆f for the EBSFs and ∆g for the gap functions, using the Pearson correlation coefficient (PCC) [82].

For each data point, the PCC of f1 and f2 and that of g1 and g2 are calculated as pf and pg, respectively. The PCC equals 1 if the two compared functions are exactly the same. The two differences are then defined as ∆f = 1 − pf and ∆g = 1 − pg. We calculate the averages of ∆f and ∆g over the dataset for each latent dimension n. As shown in Table I, ∆f is roughly two orders of magnitude larger than ∆g, which means that remarkably different EBSFs correspond to similar gap functions: a many-to-one mapping.
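A minimal sketch of this comparison is given below; the PCC is computed with NumPy, and the function names and the random stand-in data are our own.

# A minimal sketch of the Delta_f / Delta_g evaluation of Appendix A.
# f1, f2 are two EBSFs and g1, g2 the corresponding gap functions; the
# random arrays below are stand-ins for illustration only.
import numpy as np

def pcc(a, b):
    """Pearson correlation coefficient of two sampled functions."""
    return np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1]

def differences(f1, f2, g1, g2):
    """Return (Delta_f, Delta_g) = (1 - PCC of the EBSFs, 1 - PCC of the gaps)."""
    return 1.0 - pcc(f1, f2), 1.0 - pcc(g1, g2)

rng = np.random.default_rng(0)
f1, f2 = rng.random(1000), rng.random(1000)
g1, g2 = rng.random(400), rng.random(400)
delta_f, delta_g = differences(f1, f2, g1, g2)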

TABLE I: The differences in the gap functions (∆g) and the EBSFs (∆f) in terms of the Pearson correlation coefficient (PCC). The difference in the EBSFs is much larger than that in the gap functions, indicating the many-to-one mapping.

dL      2        4        8        16       32
∆g   0.00336  0.00451  0.0019   0.00234  0.002
∆f   0.107    0.084    0.0624   0.0623   0.0562

Appendix B: What the machine has learned from simple datasets

First of all, we analyze how the performance is influenced by the number of neurons in the layer next to the output layer (the NTO layer; see Fig. 2). This neuron number turns out to be the number of basis functions whose combinations produce the output functions. We extract the weights between the NTO and output layers to find these bases. Suppose the output function is f(ωi) = ∑_j wij aj, where aj is the value of the j-th neuron of the NTO layer. The bias vector is ignored in this discussion as it does not influence the performance. Then wij is the i-th element of the j-th basis. Note that the weights between the last two layers can be directly interpreted as bases because the activation function of the output layer is linear, instead of a nonlinear function such as ReLU.
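A minimal sketch of this extraction is shown below, assuming model is a trained Keras network like the one sketched in Sec. III; on a Dense layer, get_weights() returns the kernel and bias.

# A minimal sketch of reading off the basis functions from the weights
# between the NTO layer and the linear output layer (Appendix B).
# `model` is assumed to be a trained Keras model as sketched in Sec. III.
import numpy as np

def extract_bases(model):
    kernel, bias = model.layers[-1].get_weights()   # kernel shape: (n_NTO, n_output)
    # Row j of the kernel is the j-th basis w_{ij} sampled on the output
    # frequency grid; the bias is ignored, as in the text.
    return [kernel[j, :] for j in range(kernel.shape[0])]

# bases = extract_bases(model)
# Smooth curves are "proper" bases; noise-like curves indicate an impotent
# neuron (cf. Fig. 9).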


FIG. 9: The extracted basis functions for machines with different numbers of neurons in the NTO layer. Smooth curves are proper basis functions, while noise-like curves indicate that the machine failed to find a proper basis. The performance increases with the number of proper bases, so the performance of the four-neuron case (b) is better than that of the six-neuron case (c).

We demonstrate cases with different numbers of neurons in the NTO layer in Fig. 9, where each curve corresponds to one basis function the machine has learned from the data. Smooth curves are proper bases found by the machine, while noise-like curves, such as the lower-left panel in Fig. 9(c), indicate that the corresponding neuron is impotent. The more proper bases the machine finds, the better the performance. For instance, in the four-neuron case the machine finds four proper bases, so the performance is better than in the six-neuron case, where the machine only finds three proper bases, but worse than in the eight-neuron case, where the machine finds six proper bases. The connection between the performance and the number of proper bases can be further visualized from the training history. The four-neuron case is shown in Fig. 10: each time a basis is found the loss function drops, and in total four proper bases (see Fig. 9(b)) are found.

Appendix C: Numerical Calculation of the Eliashberg equation

The effects of the EPC are manifest in the mass renormalization factor Z(ω) and the gap function ∆(ω).



FIG. 10: The loss function for the four-neuron case during the training process. The loss function drops each time the machine finds one basis function.

These functions can be determined theoretically from the material-dependent EBSF α²F(ω). To have datasets large enough for the machine to learn, we generate various EBSFs randomly, which suffices for our data needs. The ME equations (6) and (7) are then solved self-consistently with these generated EBSFs as input. Other quantities such as T and µ∗ are input parameters that must be specified to solve the equations.

The generated EBSF must be continuous and positive definite. We first create random functions along the one-dimensional ω axis by geometric Brownian motion, which is often used for financial market price simulations [76,77]. The functions are then shifted to guarantee positive definiteness, and are further enforced to vanish at zero frequency and at large frequencies above the Debye cutoff. The generated functions are then numerically smoothed to make them continuous. Finally, the functions are mapped to the chosen low-frequency region of interest.

Throughout this work, we map the generated EBSF to the frequency range 0-10 meV and set the temperature T = 1.16 K. The dimensionless empirical parameter µ∗ is fixed to 0.1 for simplicity, since its value does not change the results much. Note that the energy can be used as the common unit, with other dimensional quantities scaled correspondingly. For example, we obtain the same gap function and renormalization if the frequency range is mapped to 0-50 meV while T is simultaneously chosen to be 5.8 K. Some randomly generated EBSFs do not produce superconductivity at the chosen temperature; we keep only the data with nonzero ∆ for training and testing.
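A minimal sketch of this generation procedure is given below; the drift, volatility, grid size, cutoff and smoothing window are our own illustrative choices, since the paper does not quote them.

# A minimal sketch of the random EBSF generation of Appendix C: a
# geometric-Brownian-motion path, shifted to be non-negative, forced to
# vanish at zero frequency and above the cutoff, then smoothed. All
# numerical parameters here are illustrative assumptions.
import numpy as np

def random_ebsf(n_omega=1000, mu=0.0, sigma=0.3, n_smooth=10, cutoff=0.9, seed=None):
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_omega
    # GBM increments: log S_{i+1} - log S_i = (mu - sigma^2/2) dt + sigma sqrt(dt) xi
    steps = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_omega)
    path = np.exp(np.cumsum(steps))
    f = path - path.min()                       # shift to a non-negative function
    f[0] = 0.0                                  # vanish at zero frequency
    f[int(cutoff * n_omega):] = 0.0             # vanish above the "Debye" cutoff
    kernel = np.ones(n_smooth) / n_smooth       # simple moving-average smoothing
    return np.convolve(f, kernel, mode="same")

# alpha2F = random_ebsf(seed=1)   # one sample, to be mapped onto the 0-10 meV grid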

In Eqs. (6) and (7), Z and ∆ at all Matsubara frequencies are coupled together. In the numerical calculations of the coupled equations, the terms are truncated at high frequencies, since Z and ∆ decay to fixed values at high Matsubara frequencies. The summation over the Matsubara frequencies ωn is truncated to n ∈ [−(M + 1), M]. The value of M determines the maximum Matsubara frequency included, which should be large enough to ensure convergent results and, especially, sufficient accuracy for the analytic continuation performed later.

There always exists the solution ∆ = 0, which corresponds to the normal state. In the numerical calculation we must be careful in choosing the initial values and the convergence criteria to find the superconducting solutions (∆ ≠ 0) of interest. To solve the coupled equations self-consistently, we solve them iteratively. However, the simplest and most straightforward iteration, feeding a previous output in as the next input, usually converges poorly; typically the trivial solution ∆ = 0 is obtained, especially when the non-trivial solution is small. To perform the numerical iteration properly, we adopt Broyden's method, a quasi-Newton method for finding roots of a set of nonlinear equations f(x) = 0 by updating the Jacobian matrix [78]. Here x = {∆(iωn), n = −(M + 1), . . . , M} is the root to be found. The Jacobian matrix J, with (i, j) entry Jij = ∂fi/∂xj, specifies the first-order partial derivatives for each small displacement. The n-th iteration is written in the form

Jn δxn ≃ δfn,   (C1)

where fn = f(xn), δfn = fn − fn−1, and δxn = xn − xn−1. The inverse of the Jacobian matrix is updated directly as

Jn⁻¹ = Jn−1⁻¹ + [(δxn − Jn−1⁻¹ δfn) / (δxnᵀ Jn−1⁻¹ δfn)] δxnᵀ Jn−1⁻¹,   (C2)

so as to minimize the Frobenius norm ‖Jn − Jn−1‖F, as suggested by Broyden [78]. With J⁻¹ determined, the roots of the equations are then improved by proceeding in the Newton direction,

xn+1 = xn − Jn⁻¹ f(xn),   (C3)

until the solution converges.
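A minimal sketch of this Broyden iteration for a generic root-finding problem is given below; the identity initialization of J⁻¹ and the toy target function are our own choices, and in the actual calculation x would be the vector {∆(iωn)}.

# A minimal sketch of Broyden's inverse-update method, Eqs. (C1)-(C3),
# for a generic f(x) = 0. The identity initial J^{-1} and the toy usage
# below are illustrative assumptions.
import numpy as np

def broyden_solve(f, x0, tol=1e-10, max_iter=200):
    x = np.asarray(x0, float)
    fx = f(x)
    J_inv = np.eye(len(x))                     # initial guess for the inverse Jacobian
    for _ in range(max_iter):
        dx = -J_inv @ fx                       # Newton step, Eq. (C3)
        x_new = x + dx
        fx_new = f(x_new)
        if np.linalg.norm(fx_new) < tol:
            return x_new
        df = fx_new - fx
        u = dx - J_inv @ df                    # numerator of the update, Eq. (C2)
        denom = dx @ (J_inv @ df)              # scalar dx^T J^{-1} df
        J_inv = J_inv + np.outer(u, dx @ J_inv) / denom
        x, fx = x_new, fx_new
    return x

# Toy usage: solve x^2 - 2 = 0 component-wise.
# root = broyden_solve(lambda x: x**2 - 2.0, np.array([1.0, 1.5]))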

Once Z and ∆ are solved at the Matsubara frequencies, the analytic continuation of these functions to the real axis is executed. We adopt the Padé approximation, which requires sufficient and accurate data at N points (here 2M + 2 points) on the imaginary axis [74]. In practice, the truncation M must be large enough that the maximum Matsubara frequency is several times the real frequency range under discussion [5]. With our choice of energy unit, the real frequency ω of interest is about 0-60 meV, above which Z and ∆ saturate to 1 and 0, respectively. Only the low-frequency parts of Z and ∆ below 35 meV are used as the input for the subsequent machine learning. Our choice M = 350 corresponds to a Matsubara frequency of about 220 meV, roughly 22 times the EBSF domain range and several times the input frequency range, sufficient to resolve the low-frequency information. If Z(iωn), ∆(iωn) or φ(iωn) does not converge for the chosen M, M is doubled until convergence. After these quantities are analytically continued to the real axis, we also compare ∆(ω) and φ(ω)/Z(ω) as a consistency check. The two should be mathematically equal but may differ due to the Padé approximation or numerical errors. We set criteria on their difference and on their ratio at each frequency point, and only data passing both criteria up to ω = 35 meV are kept, since only Z(ω) and ∆(ω) below 35 meV are used in the later machine learning to find reasonable relations among the functions.

Appendix D: Determination of the appropriate latent dimension

For gap functions in the AE-generated dataset {gn}, the performance (of predicting the EBSF) is determined solely by the loss function of the SL machine Sn trained on {gn}. What about gap functions outside the dataset {gn}? We expect that for gap functions close enough to some member of {gn}, the performance is also governed by the loss function of Sn when Sn is used for prediction. To quantify the closeness of two datasets, we define a characteristic distance ln within {gn} and declare a gap function close enough to {gn} when its closest distance to {gn} is less than ln. For each member gni of {gn}, we find its closest partner, with smallest distance lni, where the distance is defined as the normalized absolute difference between two functions summed over all frequencies, ∑_ω |gni − gnj| / ∑_ω |gni|. Then ln is defined as the maximum of {lni}. We can then expect that a gap function not belonging to {gn} shares the performance of Sn if the distance between this gap function and its closest member of {gn}, lon, is less than ln. We thus estimate the correct rate of the prediction by the portion of the original {go} satisfying lon ≤ ln, to further evaluate the performance of Sn on {go}. In Table II, we show this portion as well as the optimized value of the loss function of Sn for different dL = n.
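A minimal sketch of this distance check is given below; the function names are ours, and {gn} is assumed to be stored as a 2D array of samples by frequency points.

# A minimal sketch of the characteristic-distance criterion of Appendix D.
# `gn` is the AE-generated dataset {g_n} (samples x frequency points);
# `g` is a single gap function. Names are our own.
import numpy as np

def pair_distance(gi, gj):
    """Normalized absolute difference, sum_w |g_i - g_j| / sum_w |g_i|."""
    return np.sum(np.abs(gi - gj)) / np.sum(np.abs(gi))

def characteristic_distance(gn):
    """l_n: the maximum over i of the distance to the closest partner j != i."""
    closest = []
    for i in range(len(gn)):
        d = [pair_distance(gn[i], gn[j]) for j in range(len(gn)) if j != i]
        closest.append(min(d))
    return max(closest)

def is_covered(g, gn, l_n):
    """True if g lies within l_n of some member of {g_n} (l_on <= l_n)."""
    return min(pair_distance(g, gj) for gj in gn) <= l_n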

TABLE II: The portion of the original dataset covered by the newly generated datasets with different latent dimensions dL. The optimized value of the loss function for each dL is also shown. For dL ≥ 2, the domains of the newly generated datasets cover almost all of the original dataset, which ensures a high correct rate for the predictions.

dL        1        2        4        8        16       32
portion   63.71%   99.39%   99.97%   99.97%   99.98%   99.99%
loss      0.00001  0.0001   0.0038   0.015    0.025    0.029

Please see Fig. 8 to visualize the performance; the better the performance, the smaller the loss function. The correct rate is close to 100% for dL ≥ 2 and the performance decreases with increasing dL. Therefore, the best choice is dL = 2, consistent with the naive inspection of Fig. 6.


FIG. 11: The relative difference in various moments (p) between the old (original) and the newly SL-predicted EBSFs. We study the cases with dL = 1, 2, 4, 8, 16, 32 as well as the p-EBSF. The error bars are standard deviations over the whole dataset. Statistically, the differences in these moments between the original and new EBSFs can be regarded as zero, indicating that the original and new EBSFs share the key information about the gap functions.

Appendix E: Relevant properties shared by original and newly generated datasets

Since superconducting properties, such as the critical temperature, can often be estimated from a few moments of the EBSF without diving into the details of the spectrum [79-81], we calculate the p-th moments µp = ∫ ωᵖ f(ω) dω, which carry the relevant information, for comparison; here f(ω) = α²F(ω) and p = 2, 1, 0 and −1 are considered. In this way we can check whether the original and SL-predicted EBSFs share the same properties. We calculate µp,o and µp,n, the p-th moments of the old (original) and new (SL-predicted, dL = n) EBSFs, respectively. The new EBSFs are obtained in three steps. First, we obtain AE-transformed gap functions and EBSFs (see Appendix F). Second, the transformed gap functions and EBSFs are used to train an SL machine Sn. Third, we feed the old gap functions into Sn to predict the new EBSFs and compare them to the old EBSFs. We then take the mean and standard deviation of the relative difference (µp,o − µp,n)/µp,o over the whole dataset. The results, shown in Fig. 11, demonstrate that the old and new EBSFs do share the relevant properties. We also compare the old EBSFs to the p-EBSFs, which is consistent with Fig. 7.
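A minimal sketch of this moment comparison is given below; the trapezoid-rule integration and the requirement ω > 0 (to keep the p = −1 moment finite) are our own implementation choices.

# A minimal sketch of the moment comparison of Appendix E. `omega` is the
# frequency grid (assumed strictly positive so that the p = -1 moment is
# finite), and f_old, f_new are an original and an SL-predicted EBSF.
import numpy as np

def moment(omega, f, p):
    """mu_p = integral of omega^p * f(omega) d omega (trapezoid rule)."""
    return np.trapz(omega**p * f, omega)

def relative_moment_differences(omega, f_old, f_new, ps=(2, 1, 0, -1)):
    """(mu_{p,o} - mu_{p,n}) / mu_{p,o} for each power p."""
    return {p: (moment(omega, f_old, p) - moment(omega, f_new, p))
               / moment(omega, f_old, p) for p in ps}

# Example with stand-in data on a 0.01-10 meV grid.
omega = np.linspace(0.01, 10.0, 1000)
f_old = np.exp(-((omega - 5.0) / 1.0) ** 2)
f_new = np.exp(-((omega - 5.1) / 1.1) ** 2)
diffs = relative_moment_differences(omega, f_old, f_new)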

Appendix F: Generating new functions

In this appendix, we provide the detailed procedure for generating the new EBSFs and gap functions for a given dL = n, which are used to train an SL machine. Consider an original gap function go and EBSF fo as two vectors, and concatenate them into one combined vector vo, whose first part is the EBSF fo and whose second part is the gap function go. We then train an AE with dL = n, denoted An, using the dataset {vo} until the reconstruction error is optimized (minimized). After the training is completed, we feed {vo} into this trained AE again and obtain the output {vn}. The first part of each vector vn is the new EBSF fn and the second part is the new gap function gn. We can then use this dataset, {gn} and {fn}, to perform the SL training described in the main text.

∗ Electronic address: [email protected]
1 H. K. Onnes, Comm. Leiden. April 28, (1911); Comm. Leiden. May 27, (1911); Comm. Leiden. Nov. 25, (1911).
2 J. Bardeen, L. N. Cooper, and J. R. Schrieffer, Phys. Rev. 106, 162 (1957); Phys. Rev. 108, 1175 (1957).
3 A. B. Migdal, Sov. Phys. JETP 34, 996 (1958).
4 F. Marsiglio and J. P. Carbotte, Electron-Phonon Superconductivity, in K. H. Bennemann and J. B. Ketterson (eds.), Superconductivity (Springer, 2008).
5 J. P. Carbotte, Rev. Mod. Phys. 62, 1027 (1990).
6 G. M. Eliashberg, Sov. Phys. JETP 11, 696 (1960); Sov. Phys. JETP 12, 1000 (1961).
7 P. Morel and P. W. Anderson, Phys. Rev. 125, 1263 (1962).
8 J. G. Bednorz and K. A. Mueller, Z. Phys. B 64, 189 (1986).
9 Y. Kamihara, H. Hiramatsu, M. Hirano, R. Kawamura, H. Yanagi, T. Kamiya, and H. Hosono, J. Am. Chem. Soc. 128, 10012 (2006).
10 Y. Kamihara, T. Watanabe, M. Hirano, and H. Hosono, J. Am. Chem. Soc. 130, 3296 (2008).
11 K. H. Bennemann and J. B. Ketterson, Superconductivity: Conventional and Unconventional Superconductors, Volume 1 (2008).
12 Y. LeCun et al., Nature 521, 436 (2015).
13 A. Dey, IJCSIT 7, 1174 (2016).
14 M. R. Minar and J. Naher, arXiv:1807.08169.
15 A. Shrestha and A. Mahmood, "Review of Deep Learning Algorithms and Architectures," IEEE Access 7, 53040-53065 (2019).
16 G. Montufar, arXiv:1806.07066.
17 G. Dong et al., "A Review of the Autoencoder and Its Variants: A Comparative Perspective from Target Recognition in Synthetic-Aperture Radar Images," IEEE Geoscience and Remote Sensing Magazine 6, no. 3, 44-68 (2018).
18 G. Carleo et al., Rev. Mod. Phys. 91, 045002 (2019).
19 S. Das Sarma et al., Physics Today 72, 3, 48 (2019).
20 P. Mehta et al., Physics Reports 810, 1 (2019).
21 L.-F. Arsenault et al., Phys. Rev. B 90, 155136 (2014).
22 G. Torlai and R. G. Melko, Phys. Rev. B 94, 165134 (2016).
23 A. Lopez-Bezanilla and O. Anatole von Lilienfeld, Phys. Rev. B 89, 235411 (2014).
24 F. A. Faber et al., Phys. Rev. Lett. 117, 135502 (2016).
25 J. Carrasquilla and R. G. Melko, Nat. Phys. 13, 431 (2017).
26 T. Ohtsuki and T. Ohtsuki, J. Phys. Soc. Jpn. 85, 123706 (2016).
27 S. S. Schoenholz et al., Nature Physics 12, 469-471 (2016).
28 L. Wang, Phys. Rev. B 94, 195105 (2016).
29 E. P. L. van Nieuwenburg et al., Nature Physics 13, 435-439 (2017).
30 S. J. Wetzel, Phys. Rev. E 96, 022140 (2017).
31 K. Ch'ng et al., Phys. Rev. X 7, 031038 (2017).
32 K. Ch'ng et al., Phys. Rev. E 97, 013306 (2018).
33 F. Schindler et al., Phys. Rev. B 95, 245134 (2017).
34 A. Tanaka and A. Tomiya, J. Phys. Soc. Jpn. 86, 063001 (2017).
35 A. Morningstar et al., Journal of Machine Learning Research 18, 1-17 (2018).
36 P. Huembeli et al., Phys. Rev. B 97, 134109 (2018).
37 Y.-H. Liu et al., Phys. Rev. Lett. 120, 176401 (2018).
38 Y. Zhang et al., Phys. Rev. B 96, 245119 (2017).
39 Y. Zhang and Eun-Ah Kim, Phys. Rev. Lett. 118, 216401 (2017).
40 E. van Nieuwenburg et al., Phys. Rev. B 98, 060301(R) (2018).
41 B. S. Rem et al., arXiv:1809.05519.
42 P. Zhang et al., Phys. Rev. Lett. 120, 066401 (2018).
43 N. Sun et al., Phys. Rev. B 98, 085402 (2018).
44 Q. Zhou et al., Proc. Natl. Acad. Sci. 115 (28), E6411-E6417 (2018).
45 V. Tshitoyan et al., Nature 571, 95-98 (2019).
46 Y. Zhang et al., Nature Communications 10, 5260 (2019).
47 C. Ma et al., npj Computational Materials 6, 40 (2020).
48 A. A. Melnikov et al., PNAS 115 (6), 1221-1226 (2018).
49 M. Bukov et al., Phys. Rev. X 8, 031086 (2018).
50 J. M. Arrazola et al., Quantum Sci. Technol. 4, 024004 (2019).
51 M. Bukov, Phys. Rev. B 98, 224305 (2018).
52 R. Nichols et al., Quantum Sci. Technol. 4, 045012 (2019).
53 J. Timoshenko et al., Phys. Rev. Lett. 120, 225502 (2018).
54 M. R. Carbone et al., Phys. Rev. Lett. 124, 156401 (2020).
55 M. R. Carbone et al., Phys. Rev. Materials 3, 033604 (2019).
56 O. Isayev et al., Chem. Mater. 27, 735 (2015).
57 T. Konno et al., arXiv:1812.01995.
58 S. Li et al., Symmetry 12, 262 (2020).
59 Y. Liu et al., MTAEC9 52(5), 639 (2018).
60 T. O. Owolabi et al., Applied Soft Computing 43, 143-149 (2016).
61 K. Hamidieh, Computational Materials Science 154, 346-354 (2018).
62 V. Stanev et al., npj Computational Materials 4, 29 (2018).
63 K. Matsumoto and T. Horide, Applied Physics Express 12, 073003 (2019).
64 S. Zeng et al., npj Computational Materials 5, 84 (2019).
65 S. R. Xie et al., Phys. Rev. B 100, 174513 (2019).
66 Y. Dan et al., IEEE Access 8, 57868-57878 (2020).
67 T. D. Le et al., IEEE Transactions on Applied Superconductivity 30, no. 4, 1-5, Art. no. 8600105 (2020).
68 C. J. Court and J. M. Cole, npj Computational Materials 6, 18 (2020).
69 B. Meredig et al., Mol. Syst. Des. Eng. 3, 819 (2018).
70 Y. Yamaji et al., arXiv:1903.08060.
71 L. P. Gorkov, Zh. Eksp. Teor. Fiz. 34, 735 (1958) [Sov. Phys. JETP 7, 505 (1958)].
72 Y. Nambu, Phys. Rev. 117, 648 (1960).
73 E. R. Margine and F. Giustino, Phys. Rev. B 87, 024505 (2013).
74 H. J. Vidberg and J. W. Serene, J. Low Temp. Phys. 29, 179 (1977); C. R. Leavens and D. S. Ritchie, Solid State Communications 53, 137 (1985).
75 D. P. Kingma and J. Ba, arXiv:1412.6980.
76 K. Reddy and V. Clinton, Aus. Acc. Bus. Fin. J. 10, 3 (2016).
77 S. N. Z. Abidin and M. M. Jaffar, App. Math. Info. Sci. 8, 107 (2014).
78 C. G. Broyden, Math. Comp. 19, 577 (1965).
79 P. B. Allen and R. C. Dynes, Phys. Rev. B 12, 905 (1975).
80 R. C. Dynes, Solid State Commun. 10, 615 (1972).
81 W. L. McMillan, Phys. Rev. 167, 331 (1968).
82 K. Pearson, "Notes on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London 58, 240-242 (1895).