
Score-Based Models
LECTURE 14
CS236: Deep Generative Models

Summary

Score-based models
• A model sθ(x) that represents the score function ∇ₓ log p(x).
• Score estimation: training the score-based model from datapoints.
• Score matching: not scalable for training deep score-based models (see the objective below).
[Figure: a score network sθ maps an input x = (x₁, x₂, x₃) to a score estimate sθ(x) = (sθ,1, sθ,2, sθ,3), one output per input dimension.]

Denoising score matching
• Pros: much more scalable than score matching; connection to optimal denoising.
• Con: only estimates the score of the noise-perturbed data distribution, not of the original data distribution.
The noise-perturbed data distribution arises by convolving the data distribution p_data(x) with a perturbation kernel q_σ(x̃ | x):
q_σ(x̃) = ∫ q_σ(x̃ | x) p_data(x) dx
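Concretely, with a Gaussian kernel q_σ(x̃ | x) = N(x̃; x, σ²I), the denoising score matching objective is

J_DSM(θ) = ½ E_{p_data(x)} E_{q_σ(x̃|x)}[ ‖sθ(x̃) − ∇_x̃ log q_σ(x̃ | x)‖² ]

where the target ∇_x̃ log q_σ(x̃ | x) = −(x̃ − x)/σ² points from the noisy sample back toward the clean one, hence the connection to optimal denoising.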

Sliced score matching
• Projection distribution can be Gaussian or Rademacher.
• Pros: much more scalable than score matching; estimates the true data score.
• Con: slower than denoising score matching.
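For comparison, sliced score matching compares scores along random directions v drawn from the projection distribution:

J_SSM(θ) = E_{p_v} E_{p_data(x)}[ vᵀ ∇ₓ sθ(x) v + ½ (vᵀ sθ(x))² ]

The Jacobian-vector product vᵀ ∇ₓ sθ(x) v costs roughly one extra backpropagation, which makes the method scalable but still slower per step than denoising score matching.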

Score-based generative modeling
[Figure: pipeline. Data samples → score matching → Scores → Langevin dynamics → New samples.]
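A minimal numpy sketch of the Langevin dynamics sampler; the names langevin_dynamics and score_fn are illustrative, with score_fn standing in for the trained model sθ:

import numpy as np

def langevin_dynamics(score_fn, x0, step_size=1e-4, n_steps=1000, rng=None):
    # Unadjusted Langevin dynamics:
    #   x_{t+1} = x_t + (step_size / 2) * score_fn(x_t) + sqrt(step_size) * z_t,  z_t ~ N(0, I)
    rng = rng if rng is not None else np.random.default_rng()
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * z
    return x

# The score of N(0, I) is -x, so this produces approximate standard normal samples.
sample = langevin_dynamics(lambda x: -x, x0=np.zeros(2))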

Pitfalls
• Manifold hypothesis: data concentrate on a low-dimensional manifold, so the data score is undefined in the ambient space.
• Score matching fails in low data density regions.
• Langevin dynamics fail to weight different modes correctly.
[Figure: data points on a low-dimensional manifold, with arrows depicting ∇ₓ log p_data(x).]

Gaussian perturbation
• The solution to all pitfalls: Gaussian perturbation!
• Manifold + noise: perturbed data no longer lie on a low-dimensional manifold.
• Score matching on noisy data.
[Figure: CIFAR-10 images vs. noisy CIFAR-10 images perturbed with N(0, 0.0001) Gaussian noise.]

Challenge in low data density regions
[Figure: estimated scores are inaccurate in the low-density regions between modes.]
Song and Ermon. “Generative Modeling by Estimating Gradients of the Data Distribution.” NeurIPS 2019.

Adding noise to data
[Figure: after perturbing the data with noise, score estimates are accurate over much larger regions.]
Song and Ermon. “Generative Modeling by Estimating Gradients of the Data Distribution.” NeurIPS 2019.
Scores of the noise-perturbed distribution provide useful directional information for Langevin MCMC.

Multi-scale Noise Perturbation
• Trade-off: more noise improves score estimation in low-density regions but distorts the data more.
• Multi-scale noise perturbations: σ₁ > σ₂ > ⋯ > σ_{L−1} > σ_L

Trading off Data Quality and Estimation Accuracy
[Figure: with increasing noise, data quality gets worse while score estimation gets better; red encodes estimation error.]

Using multiple noise scales

Annealed Langevin Dynamics: Joint Scores to Samples
• Sample using σ₁, σ₂, …, σ_L sequentially with Langevin dynamics (see the sketch below).
• Anneal down the noise level.
• Samples used as initialization for the next level.
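A minimal numpy sketch of annealed Langevin dynamics, assuming a noise conditional score model with the (hypothetical) interface score_fn(x, sigma); the step-size schedule follows the αᵢ = ε·σᵢ²/σ_L² rule from the paper:

import numpy as np

def annealed_langevin(score_fn, sigmas, x0, eps=2e-5, n_steps_each=100, rng=None):
    # sigmas: decreasing noise scales sigma_1 > ... > sigma_L.
    rng = rng if rng is not None else np.random.default_rng()
    x = np.array(x0, dtype=float)
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2  # anneal the step size with the noise level
        for _ in range(n_steps_each):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
        # x now serves as the initialization for the next (smaller) noise level.
    return x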

Annealed Langevin dynamics

Comparison to the vanilla Langevin dynamics
Langevin dynamics Annealed Langevin dynamics

Joint Score Estimation via Noise Conditional Score Networks
[Figure: the Noise Conditional Score Network (NCSN) sθ(x, σ) takes the input x = (x₁, x₂) together with the noise level σ, and outputs a score estimate (sθ,1, sθ,2) for each of the noise scales σ₁, σ₂, σ₃.]

Training noise conditional score networks
• Which score matching loss? Sliced score matching? Denoising score matching?
• Denoising score matching is naturally suitable, since the goal is to estimate the score of perturbed data distributions.
• Weighted combination of denoising score matching losses (written out below).
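Written out, the training objective combines one denoising score matching loss per noise scale:

L(θ) = (1/L) Σ_{i=1..L} λ(σᵢ) E_{p_data(x)} E_{x̃ ∼ N(x, σᵢ²I)}[ ‖sθ(x̃, σᵢ) − ∇_x̃ log q_{σᵢ}(x̃ | x)‖² ]

with a weighting function λ(σ) discussed a few slides below.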

Choosing noise scales
• Maximum noise scale: should be roughly as large as the maximum pairwise distance between training datapoints, so that the most perturbed distribution covers the data.
• Minimum noise scale: should be sufficiently small to control the noise in final samples.

Choosing noise scales
• Key intuition: adjacent noise scales should have sufficient overlap to facilitate transitioning across noise scales in annealed Langevin dynamics.
• A geometric progression with sufficient length (see the sketch below).
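A one-line numpy sketch of the geometric schedule; the specific endpoints and length are illustrative, not prescribed by the slide:

import numpy as np

def geometric_sigmas(sigma_max, sigma_min, L):
    # Geometric progression from sigma_max down to sigma_min with L terms.
    return np.exp(np.linspace(np.log(sigma_max), np.log(sigma_min), L))

sigmas = geometric_sigmas(sigma_max=50.0, sigma_min=0.01, L=200)  # illustrative values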

Choosing the weighting function
• Weighted combination of denoising score matching losses.
• How to choose the weighting function λ(σ)?
• Goal: balance the magnitudes of the different score matching losses in the sum → a common choice is λ(σ) = σ².
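To see why λ(σ) = σ² balances the terms: write the perturbed sample as x̃ = x + σz with z ∼ N(0, I), so the target score is −z/σ and

λ(σ) ‖sθ(x̃, σ) + z/σ‖² = ‖σ·sθ(x̃, σ) + z‖²

whose magnitude is governed by ‖z‖ and is therefore roughly independent of σ.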

Training noise conditional score networks
• Sample a mini-batch of datapoints.
• Sample a mini-batch of noise scale indices.
• Sample a mini-batch of Gaussian noise.
• Estimate the weighted mixture of score matching losses.
• Stochastic gradient descent.
• As efficient as training one single non-conditional score-based model (see the sketch below).
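A minimal PyTorch sketch of one loss evaluation, assuming a (hypothetical) model interface score_model(x, sigma) that returns a score estimate of the same shape as x:

import torch

def ncsn_loss(score_model, x, sigmas):
    # Sample one noise-scale index per datapoint in the mini-batch.
    idx = torch.randint(0, len(sigmas), (x.shape[0],), device=x.device)
    sigma = sigmas[idx].view(x.shape[0], *([1] * (x.dim() - 1)))
    z = torch.randn_like(x)          # mini-batch of Gaussian noise
    x_tilde = x + sigma * z          # perturbed datapoints
    score = score_model(x_tilde, sigma)
    # With lambda(sigma) = sigma^2, the weighted DSM residual is ||sigma*s + z||^2.
    return ((sigma * score + z) ** 2).flatten(1).sum(dim=1).mean()

# Usage: loss = ncsn_loss(model, batch, sigmas); loss.backward(); optimizer.step()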

Experiments: Sampling

Experiments: Sampling

High Resolution Image Generation

Using an infinite number of noise scales
• t ∈ [0, 1]: continuous index of the perturbed distributions.

Compact representation of infinite distributions
• Stochastic process → marginal probability densities {p_t(x)}.
• Stochastic differential equation:
dx = f(x, t) dt + g(t) dw
where f(x, t) dt is the deterministic drift and dw is infinitesimal white noise.
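One concrete instance from the cited paper is the variance-exploding SDE,

dx = √(d[σ²(t)]/dt) dw

which recovers the multi-scale Gaussian perturbations above in the limit of infinitely many noise scales, with σ(t) interpolating from σ_min to σ_max.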

Score-based generative modeling via SDEs

Score-based generative modeling via SDEs
Time reversal of the diffusion is again an SDE, and its drift involves the score function:
dx = [f(x, t) − g(t)² ∇ₓ log p_t(x)] dt + g(t) dw̄
where dw̄ is reverse-time white noise.

Score-based generative modeling via SDEs
• Time-dependent score-based model sθ(x, t) ≈ ∇ₓ log p_t(x).
• Training: a weighted combination of denoising score matching losses across time.
• Sampling: plug sθ(x, t) into the reverse-time SDE and solve it numerically.
• Euler-Maruyama (analogous to Euler's method for ODEs); see the sketch below.
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.
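A minimal numpy sketch of Euler-Maruyama integration of the reverse-time SDE, with (hypothetical) callables f(x, t), g(t), and score_fn(x, t) for the drift, diffusion coefficient, and trained score model:

import numpy as np

def euler_maruyama_sampler(score_fn, f, g, x1, n_steps=1000, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    x = np.array(x1, dtype=float)     # start from a sample of the prior at t = 1
    dt = -1.0 / n_steps               # negative step: integrate backward in time
    for i in range(n_steps):
        t = 1.0 + i * dt
        drift = f(x, t) - g(t) ** 2 * score_fn(x, t)   # reverse-SDE drift
        z = rng.standard_normal(x.shape)
        x = x + drift * dt + g(t) * np.sqrt(abs(dt)) * z
    return x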

Predictor-Corrector sampling methods
• Predictor: a numerical SDE solver step (e.g., Euler-Maruyama).
• Corrector: score-based MCMC (e.g., Langevin dynamics); see the sketch after the figure.
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.
[Figure: predictor and corrector steps alternating along the reverse-time sampling trajectory.]
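A minimal numpy sketch of one Predictor-Corrector loop; the signal-to-noise-ratio step size in the corrector is one heuristic from the paper, and all interfaces (score_fn, f, g) are assumptions:

import numpy as np

def pc_sampler(score_fn, f, g, x1, n_steps=1000, snr=0.16, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    x, dt = np.array(x1, dtype=float), -1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 + i * dt
        # Predictor: one Euler-Maruyama step of the reverse-time SDE.
        drift = f(x, t) - g(t) ** 2 * score_fn(x, t)
        x = x + drift * dt + g(t) * np.sqrt(abs(dt)) * rng.standard_normal(x.shape)
        # Corrector: one Langevin MCMC step at the new time t + dt.
        grad = score_fn(x, t + dt)
        z = rng.standard_normal(x.shape)
        eps = 2 * (snr * np.linalg.norm(z) / (np.linalg.norm(grad) + 1e-12)) ** 2
        x = x + eps * grad + np.sqrt(2 * eps) * z
    return x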

Results on predictor-corrector sampling
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.

High-Fidelity Generation for 1024x1024 Images
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.

Probability flow ODE: turning the SDE into an ODE
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.
• Probability flow ODE (an ordinary differential equation with the same marginal densities p_t(x) as the SDE):
dx = [f(x, t) − ½ g(t)² ∇ₓ log p_t(x)] dt
where ∇ₓ log p_t(x) is the score function.

Solving the Reverse ODE for Sampling
• More efficient samplers via black-box ODE solvers (see the sketch below).
• Exact likelihood, even though the models are trained with score matching.
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.
NFE = number of score function evaluations.
• Predictor-Corrector: > 1000 NFE.
• ODE solver: ≈ 100 NFE.
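A minimal sketch of black-box ODE sampling with scipy, again with hypothetical callables f, g, and score_fn; the adaptive solver chooses its own step sizes, which is where the NFE savings come from:

import numpy as np
from scipy.integrate import solve_ivp

def ode_sampler(score_fn, f, g, x1, rtol=1e-5, atol=1e-5):
    shape = x1.shape

    def ode_fn(t, x_flat):
        x = x_flat.reshape(shape)
        dx = f(x, t) - 0.5 * g(t) ** 2 * score_fn(x, t)  # probability flow ODE
        return dx.ravel()

    # Integrate backward from t = 1 to (just above) t = 0.
    sol = solve_ivp(ode_fn, (1.0, 1e-3), x1.ravel(), rtol=rtol, atol=atol)
    # sol.nfev reports the NFE count quoted on the slide.
    return sol.y[:, -1].reshape(shape)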

Solving the Reverse ODE for Sampling
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.
[Figure: models trained with score matching → black-box ODE solvers for sampling.]

Probability flow ODE: latent space manipulation
[Figure: temperature scaling via latent space manipulation.]
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.

Probability flow ODE: uniquely identifiable encoding
• No trainable parameters in the probability flow ODE!
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.
[Figure: flow models, VAEs, etc. (Model 1 vs. Model 2) yield different encodings, while score-based models encoding via the probability flow ODE (Model 1 vs. Model 2) yield the same encoding.]

Controllable Generation
• Conditional reverse-time SDE via unconditional scores. By Bayes' rule, the conditional score decomposes as
∇ₓ log p_t(x | y) = ∇ₓ log p_t(x) + ∇ₓ log p_t(y | x)
where y is the control signal, the first term is the unconditional score (trained without y), and the second term is trained separately or specified with domain knowledge.
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.

Controllable Generation: class-conditional generation
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.
• y is the class label.
• ∇ₓ log p_t(y | x) comes from a time-dependent classifier.

Controllable Generation: inpainting
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.
• y is the masked image.
• ∇ₓ log p_t(y | x) can be approximated without training.

Controllable Generation: colorization
Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.
• y is the gray-scale image.
• ∇ₓ log p_t(y | x) can be approximated without training.

Controllable Generation: colorization
Resolution: 1024x1024

Controllable generation: text-guided generation
• Example prompts: “An abstract painting of computer science”; “A painting of the starry night by van Gogh”.
https://colab.research.google.com/drive/1FuOobQOmDJuG7rGsMWfQa883A9r4HxEO?usp=sharing

Conclusion
• Gradients of distributions (scores) can be estimated easily.
• Flexible architecture choices: no need to be normalized or invertible.
• Stable training: no minimax optimization.
• Better or comparable sample quality to GANs.
• State-of-the-art performance on CIFAR-10 and others.
• Scalable to 1024x1024 resolution for image generation.
• Exact likelihood computation.
• State-of-the-art likelihood on CIFAR-10.
• Sample editing via manipulating latent codes.
• Uniquely identifiable encoding.