-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 1 of 61
A Hierarchy of Limitations in Machine Learning
Momin M. Malik, Data Science Postdoctoral Fellow, Berkman Klein Center for Internet & Society at Harvard University
03 December 2019, Microsoft Research New England
-
Objective
What are all the ways in which machine learning can fail*?
How can we address these failure points?
* Fail = be unreliable, or result in unanticipated harm
Failures will form a hierarchy
Introduction
Meaning and measurement
Central tendency
Causality
Capturing variability
Cross-validation
Reflection
Future steps
References
-
The basic bargain of ML
So long as machine learning establishes external validity (generalizability) from a given input, it can arguably ignore…
– All other validity questions
– All the problems that plague statistics
[Diagram, after Breiman (2001): y ← nature ← x, versus y ← unknown ← x fit with decision trees and neural nets]
Breiman, 2001, Stat. Sci. See also Jones, 2018, Hist. Stud. Nat. Sci.
-
Can focus on one goal
Random variable, $X$
Mean, $E(X) = \mu$
Estimator, $\hat{\mu}$
Expected value, $E(\hat{\mu})$
Standard error, $\sqrt{V(\hat{\mu})}$
Standard deviation, $\sqrt{V(X)} = \sigma$
Estimator, $\hat{\sigma}$
Standard error, $\sqrt{V(\hat{\sigma})}$
Expected value, $E(\hat{\sigma})$
See also: Robert Tibshirani, “Recent Advances in Post-Selection Inference” (2015)
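The quantities on this slide can be made concrete with a small simulation. This is a minimal sketch, not part of the talk: it assumes a normal population with hypothetical values mu = 10 and sigma = 2, and the function name `simulate_standard_error` is made up for illustration. It draws many samples, computes the estimator (the sample mean) on each, and compares the spread of those estimates with the textbook standard error sigma / sqrt(n).

```python
import math
import random
import statistics


def simulate_standard_error(mu=10.0, sigma=2.0, n=50, reps=4000, seed=0):
    """Draw `reps` samples of size n and summarise the sample means.

    mu, sigma, n, reps and seed are illustrative values, not from the talk.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return {
        "mean_of_estimates": statistics.fmean(means),  # approximates E(mu-hat)
        "empirical_se": statistics.stdev(means),       # approximates sqrt(V(mu-hat))
        "theoretical_se": sigma / math.sqrt(n),        # sigma / sqrt(n)
    }


result = simulate_standard_error()
print(result)
```

The empirical and theoretical standard errors should roughly agree, and the mean of the estimates should sit close to mu, which is what it means for the sample mean to be unbiased.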
-
Kinds of Validity
– Construct Validity (measurement)
  – “Translation”: Face, Content
  – Criterion: Predictive, Concurrent, Convergent, Discriminant
– Inference Validity (studies): Internal, External
Everything else can be ignored
Kass, 2011, Stat. Sci. Adapted from Borgatti, 2012
-
…or can it?
-
Outline
Problems with/Failures of:
– Meaning and measurement
– Central tendency
– Causality
– Capturing variability
– Cross-validation
Reflection
More specific
More general; problems percolate downward in this hierarchy
ML
STS
-
Meaning and measurement
-
Meaning-making
“During the writing of this book, my first grandchild was born. The hospital records document her weight, height, health[;] the mother’s condition, length of labor, time of birth, and hospital stay… These are physiological and institutional metrics. When aggregated across many babies and mothers, they provide trend data about the beginning of life—birthing.”
-
Meaning-making
“But nowhere in the hospital records will you find anything about what the birth of Calla Quinn means. Her existence is documented but not what she means to our family, what decision-making process led up to her birth, the experience and meaning of the pregnancy, the family experience of the birth process, and the familial, social, cultural, political, and economic context…” (Patton, 2015)
-
Experience
“A white woman can say that a neighborhood is ‘sketchy’ and most people will smile and nod. She felt unsafe, and we automatically trust her opinion. A black man can tell the world that every day he lives in fear of the police, and suddenly everyone demands statistical evidence to prove that his life experience is real.”
-
Validating measurements
Kinds of Validity
– Construct Validity (measurement)
  – “Translation”: Face, Content
  – Criterion: Predictive, Concurrent, Convergent, Discriminant
– Inference Validity (studies): Internal, External
Adapted from Borgatti, 2012
-
Solutions
See quantitative research as building on what is qualitatively known, not replacing it
Think about measurement! Work with experts in validation
Integrate qualitative research for:
– Needs assessments prior to ML
– Annotation for training data
– Evaluating implementations of ML systems
-
Central tendency
-
The world as a data matrix
“it is striking how absolutely these assumptions contradict those of the major theoretical traditions of sociology. Symbolic interactionism rejects the assumption of fixed entities and makes the meaning of a given occurrence depend on its location — within an interaction, within an actor's biography, within a sequence of events.
“Both the Marxian and Weberian traditions deny explicitly that a given property of a social actor has one and only one set of causal implications… Marx, Weber, and work deriving from them in historical sociology all approach social causality in terms of stories, rather than in terms of variable attributes.” (Abbott, 1988)
-
“Flaw of averages”
Source: Todd Rose. Illustration: Future for Learning.
-
Solutions
No matter how small the bins are, the prediction for each bin is still a central tendency (e.g., mean, median, majority class)
– No longer a central tendency when the bins have an n of 1… but then ML and stats can do nothing but restate that datum
Recognize that planning to the central tendency punishes outliers (Keyes, 2018): plan for this!
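The point about bins can be illustrated with a toy predictor that returns the mean of each bin. The data and the helper `bin_mean_predictions` below are hypothetical and chosen only for illustration: five typical cases and one outlier. With one coarse bin the outlier is planned for as if it were average; with bins of n = 1 the "model" only restates each datum.

```python
from collections import defaultdict
from statistics import fmean


def bin_mean_predictions(xs, ys, width):
    """Predict each y by the mean of all y values whose x falls in the same bin."""
    bins = defaultdict(list)
    for x, y in zip(xs, ys):
        bins[int(x // width)].append(y)
    return [fmean(bins[int(x // width)]) for x in xs]


# Hypothetical data: five typical cases and one outlier (the last one).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 11.0, 9.0, 10.0, 11.0, 30.0]

coarse = bin_mean_predictions(xs, ys, width=10.0)  # one bin holds everyone
fine = bin_mean_predictions(xs, ys, width=1.0)     # every bin has n = 1

print(coarse[-1])  # prediction for the outlier under the coarse bin: 13.5
print(fine == ys)  # True: with n = 1 per bin the predictions restate the data
```

Under the coarse bin the outlier's true value of 30 is predicted as 13.5, an error of 16.5 that no typical case comes close to.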
-
Causality
-
Sometimes, people want causality
“A project I worked on in the late 1970s was the analysis of delay in criminal cases in state court systems... A large decision tree was grown, and I showed it on an overhead and explained it to the assembled Colorado judges. One of the splits was on District N which had a larger delay time than the other districts. I refrained from commenting on this. But as I walked out I heard one judge say to another, ‘I knew those guys in District N were dragging their feet.’” (Breiman, 2001)
-
“Predictions” are correlations
[Figure: scatterplot by country of Nobel laureates per 10 million population (y-axis, 0 to 35) against chocolate consumption in kg/yr/capita (x-axis, 0 to 15); r = 0.791]
Messerli, 2012, NEJM
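One way a strong correlation can arise without one variable causing the other is a shared common cause. The sketch below is purely illustrative and makes no claim about the chocolate data: it assumes a hypothetical unobserved variable z that drives both x and y, shows that x then "predicts" y well, and shows that the association vanishes once x is set independently of z. All names and numbers are assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
n = 5000
# Hypothetical common cause z drives both x and y; x has no effect on y.
z = [rng.gauss(0, 1) for _ in range(n)]
x_observed = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

# "Intervention": x is set at random, independent of z; y is generated as before.
x_intervened = [rng.gauss(0, 1) for _ in range(n)]

r_observed = pearson(x_observed, y)
r_intervened = pearson(x_intervened, y)
print(round(r_observed, 2), round(r_intervened, 2))
```

The observed correlation is high, so x is useful for prediction, yet changing x does nothing to y. Prediction in the correlational sense and intervention are different questions.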
-
Not an obvious usage of “predict”
-
Creates two types of modeling!
Informative models may not fit well
– Breiman, 2001: Information
– Shmueli, 2010: Explanation
– Kleinberg et al., 2015: Rain dance
– Mullainathan & Spiess, 2017: β-hat
Correlations may “predict” well
– Breiman, 2001: Prediction
– Shmueli, 2010: Prediction
– Kleinberg et al., 2015: Umbrella
– Mullainathan & Spiess, 2017: y-hat
-
“True” model can predict worse!
[Figure: density of mean squared (test) error over 1,000 runs; x-axis: MSE, y-axis: density; models compared: True, Underspecified, All-subset, Stepwise, Ridge, Lasso]
Simulation of Shmueli, 2010, Stat. Sci
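The effect can be reproduced in a few lines. The sketch below is not the simulation from the slide: the coefficients, correlation, sample sizes and seed are assumptions chosen for illustration, and only two of the six models (true form and underspecified) are compared. The data come from y = b1*x1 + b2*x2 + noise with a weak b2 and highly correlated predictors. Fitting the correct form means estimating two coefficients from 20 rows; dropping x2 adds a little bias but removes more variance.

```python
import math
import random
from statistics import fmean

# All values below are illustrative assumptions, not Shmueli's settings.
BETA1, BETA2 = 1.0, 0.1   # true coefficients; x2 has a real but weak effect
RHO = 0.9                 # correlation between x1 and x2
SIGMA = 1.0               # noise standard deviation
N_TRAIN, N_TEST, RUNS = 20, 100, 1000


def draw(rng, n):
    """Draw n rows (x1, x2, y) from the true model y = b1*x1 + b2*x2 + noise."""
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = RHO * x1 + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1)
        y = BETA1 * x1 + BETA2 * x2 + rng.gauss(0, SIGMA)
        rows.append((x1, x2, y))
    return rows


def fit_true_form(rows):
    """Least squares on x1 and x2 (no intercept), via the 2x2 normal equations."""
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    s22 = sum(x2 * x2 for _, x2, _ in rows)
    s1y = sum(x1 * y for x1, _, y in rows)
    s2y = sum(x2 * y for _, x2, y in rows)
    det = s11 * s22 - s12 * s12
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)


def fit_underspecified(rows):
    """Least squares on x1 only; x2 is wrongly left out."""
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s1y = sum(x1 * y for x1, _, y in rows)
    return (s1y / s11, 0.0)


def mean_squared_test_error(coefs, rows):
    """Mean squared prediction error of the fitted coefficients on held-out rows."""
    b1, b2 = coefs
    return fmean((y - b1 * x1 - b2 * x2) ** 2 for x1, x2, y in rows)


rng = random.Random(42)
mse_true, mse_under = [], []
for _ in range(RUNS):
    train, test = draw(rng, N_TRAIN), draw(rng, N_TEST)
    mse_true.append(mean_squared_test_error(fit_true_form(train), test))
    mse_under.append(mean_squared_test_error(fit_underspecified(train), test))

avg_true, avg_under = fmean(mse_true), fmean(mse_under)
print(f"true form: {avg_true:.3f}  underspecified: {avg_under:.3f}")
```

Under these assumptions the underspecified model has the lower average test error. Raising BETA2 or N_TRAIN should eventually reverse the ordering, which is the sense in which the result depends on the problem rather than on which model is "right".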
-
Solution: Determine type of problem
By “prediction”, we mean correlation (Caruana et al., 2015; Doshi-Velez & Kim, 2017): communicate this!!
If not a “prediction policy problem”, then machine learning may not be appropriate (whether explainable/interpretable or not!)
Then: causal modeling, or statistical modeling of data-generating process to get “explanations” (causality lite)
Other benefits to causal knowledge:
– Makes predictions robust to distribution shifts (argument of causal learning literature; Spirtes & Zhang, 2016)
– Allows us to intervene
-
Capturing variability
-
When data don’t capture key variability (Google Flu Trends)
[Figure: ILI percentage (0 to 5) by week (40 to 19), in four panels: data available as of 4 February 2008, 3 March 2008, 31 March 2008, and 12 May 2008]
Ginsberg et al., 2012, Nature; Santillana et al., 2014, Am. J. Prev. Med.
-
Real-world testing of ML results
van’t Veer et al. (2002) found 70 genes correlated with developing breast cancer
Of course the correlations were optimal, post-hoc. But did it generalize?
[Figure: heatmap of gene expression (scale –1 to 1), tumours by genes, for sporadic breast tumour patients]
-
Implementation testing
[Diagram: 2×2 grid of “clinical” risk (low/high) against risk via correlations with gene expression (low/high)]
– Both tests agree, high risk: treat with chemo
– Both tests agree, low risk: don’t treat with chemo
– Model says treat, doctor says don’t: ???
– Doctor says treat, model says don’t: ???
Cardoso et al., 2016, NEJM
-
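Implementation testing
[Diagram: 2×2 grid of “clinical” risk (low/high) against risk via correlations with gene expression (low/high)]
– Both tests agree, high risk: treat with chemo
– Both tests agree, low risk: don’t treat with chemo
– High risk via gene expression, low “clinical” risk: chemotherapy is worse!
– Low risk via gene expression, high “clinical” risk: chemotherapy is similar
– ???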
Cardoso et al., 2016, NEJM
-
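Implementation testing
[Diagram: 2×2 grid of “clinical” risk (low/high) against risk via correlations with gene expression (low/high)]
– Both tests agree, high risk: treat with chemo
– Both tests agree, low risk: don’t treat with chemo
– High risk via gene expression, low “clinical” risk: chemotherapy is worse!
– Low risk via gene expression, high “clinical” risk: chemotherapy is similar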
Cardoso et al., 2016, NEJM
Finding: Machine learning alone would make things worse. But as a secondary diagnosis, on average it catches false positives and avoids unhelpful chemo!
-
Implementation testing: Details
Before experiment (training data)
High model risk, low clinical risk: randomize. Chemo worse!
Low model risk, high clinical risk: chemo makes no difference
[Figure: three survival plots; y-axis: survival without distant metastasis (%), x-axis: year (0 to 9)]
– Baseline survival (no chemotherapy): low clinical, high genomic; low clinical and genomic; high clinical, low genomic; high clinical and genomic
– Clinical says high risk, model says low risk: chemotherapy vs. no chemotherapy
– Clinical says low risk, model says high risk: chemotherapy vs. no chemotherapy
(Note: still limitations in how experimental subjects may be unrepresentative.)
-
Solutions to capturing variability
GFT is a prediction policy problem: the problem wasn’t overfitting or causality (although causality may have helped), but that there was key variability (non-winter flu) that had not yet been observed
Rhetoric: emphasize that cross-validation does not give performance guarantees; it is only suggestive of performance until testing is done
Real-world testing reveals how a system can be used! (Also: experimental data can introduce the full amount of variability, even if not doing causal inference)
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 33 of 61
Cross validation
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 34 of 61
Generalizability through CV
Generalizability is shown (at the very least) through cross-validation
CV can go wrong in known ways:
– improper splitting
– publication bias (Gayo-Avello, 2012)
– overfitting to the test set (Dwork et al., 2015; Park, 2012)
Not systematically acknowledged: dependencies among observations
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 35 of 61
Classic argument for CV
Training:
$$\mathrm{err}(\hat{\mu}) = \frac{1}{n} E_f \|Y - \hat{Y}\|_2^2 = \frac{1}{n}\left[\operatorname{tr}\Sigma + \|\mu - E(\hat{Y})\|_2^2 + \operatorname{tr}\operatorname{Var}_f(\hat{Y}) - 2\operatorname{tr}\operatorname{Cov}_f(Y, \hat{Y})\right]$$
Testing:
$$\mathrm{Err}(\hat{\mu}) = \frac{1}{n} E_f \|Y^* - \hat{Y}\|_2^2 = \frac{1}{n}\left[\operatorname{tr}\Sigma + \|\mu - E(\hat{Y})\|_2^2 + \operatorname{tr}\operatorname{Var}_f(\hat{Y}) - 2\operatorname{tr}\operatorname{Cov}_f(Y^*, \hat{Y})\right]$$
The difference is the optimism (Efron, 2004; Rosset & Tibshirani, 2018):
$$\mathrm{Opt}(\hat{\mu}) = \mathrm{Err}(\hat{\mu}) - \mathrm{err}(\hat{\mu}) = \frac{2}{n}\operatorname{tr}\operatorname{Cov}_f(Y, \hat{Y})$$
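As a worked special case of the optimism formula, consider a linear smoother $\hat{Y} = HY$ where $H$ is a projection of rank $p$ (as in OLS), with independent, equal-variance noise. The symbols $H$, $p$, and $\sigma^2 I$ are assumptions introduced here for illustration. The sketch below shows why the test-set covariance term drops out and what the optimism reduces to:

```latex
\begin{align*}
\operatorname{Cov}_f(Y^*, \hat{Y}) &= 0
  && \text{($Y^*$ is drawn independently of $Y$)} \\
\operatorname{tr}\operatorname{Cov}_f(Y, \hat{Y})
  &= \operatorname{tr}\operatorname{Cov}_f(Y, HY)
   = \operatorname{tr}\!\left(H\,\sigma^2 I\right)
   = \sigma^2 \operatorname{tr} H
   = \sigma^2 p \\
\mathrm{Opt}(\hat{\mu}) &= \frac{2}{n}\,\sigma^2 p
\end{align*}
```

Under these assumptions the optimism grows with the number of fitted parameters and shrinks with the sample size, and a held-out test set removes it only because the first line holds.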
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 36 of 61
Apply this to non-iid data
Imagine we have, for $\Sigma_{ij} = \rho\sigma^2$, $i \neq j$, and $\Sigma_{ii} = \sigma^2$:
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} X \\ X \end{bmatrix} \beta,\ \begin{bmatrix} \Sigma & \rho\sigma^2 \mathbf{1}\mathbf{1}^T \\ \rho\sigma^2 \mathbf{1}\mathbf{1}^T & \Sigma \end{bmatrix} \right)$$
Then, optimism in the training set is:
$$\frac{2}{n}\operatorname{tr}\operatorname{Cov}_f(Y_1, \hat{Y}_1) = \frac{2}{n}\operatorname{tr}\operatorname{Cov}_f(Y_1, HY_1) = \frac{2}{n}\operatorname{tr} H \operatorname{Var}_f(Y_1) = \frac{2}{n}\operatorname{tr} H\Sigma$$
But test set also has nonzero optimism!
$$\frac{2}{n}\operatorname{tr}\operatorname{Cov}_f(Y_2, \hat{Y}_1) = \frac{2}{n}\operatorname{tr}\operatorname{Cov}_f(Y_2, HY_1) = \frac{2\rho\sigma^2}{n}\operatorname{tr} H\mathbf{1}\mathbf{1}^T = 2\rho\sigma^2$$
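The two optimism expressions can be evaluated exactly for a small case. The sketch below is an illustration under stated assumptions, not the talk's own code: it assumes a straight-line OLS fit (intercept plus one predictor `x`), builds the hat matrix $H$ in closed form, and computes $\frac{2}{n}\operatorname{tr} H\Sigma$ and $\frac{2\rho\sigma^2}{n}\operatorname{tr} H\mathbf{1}\mathbf{1}^T$. The function names and the choice of design are hypothetical.

```python
def hat_matrix(x):
    """Hat matrix H = X (X'X)^{-1} X' for the design [1, x] (intercept + slope).

    Uses the closed form h_ij = 1/n + (x_i - xbar)(x_j - xbar) / Sxx.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [[1.0 / n + (xi - xbar) * (xj - xbar) / sxx for xj in x] for xi in x]


def optimism_under_equicorrelation(x, rho, sigma2):
    """Return (training optimism, test-set optimism) for equicorrelated errors."""
    n = len(x)
    H = hat_matrix(x)
    # Sigma has sigma2 on the diagonal and rho * sigma2 everywhere else.
    Sigma = [[sigma2 if i == j else rho * sigma2 for j in range(n)]
             for i in range(n)]
    # tr(H Sigma) = sum over i, j of H_ij * Sigma_ji
    tr_H_Sigma = sum(H[i][j] * Sigma[j][i] for i in range(n) for j in range(n))
    # tr(H 1 1^T) = 1^T H 1, i.e. the sum of all entries of H
    tr_H_ones = sum(sum(row) for row in H)
    train_opt = 2.0 / n * tr_H_Sigma
    test_opt = 2.0 * rho * sigma2 / n * tr_H_ones
    return train_opt, test_opt
```

Because the design includes an intercept, $H\mathbf{1} = \mathbf{1}$ and so $\operatorname{tr} H\mathbf{1}\mathbf{1}^T = n$, which is why the test-set optimism collapses to $2\rho\sigma^2$ regardless of `x`. For example, calling `optimism_under_equicorrelation([i / 19 for i in range(20)], 0.5, 1.0)` evaluates both quantities for 20 equally spaced points.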
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 37 of 61
One draw as an example
[Figure: one simulated draw, showing y and y − µ against x (0 to 1). Legend: training set, test set, new draw, true mean.]
Correlation between observations can pull training and test observations close to one another, but potentially far from an independent draw
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 38 of 61
Simulated MSE
[Figure: density of MSE (x-axis: MSE, y-axis: density). Legend: training error, test set error, out-of-sample (true) error.]
Mean training error: 0.40
Mean test set error: 0.61
Mean true error: 1.61 (also, long tail!)
Matches theory!
– Irreducible error: 1
– Estimator variance: 0.61
– Expected bias: 0 (OLS is unbiased)
– Expected training optimism: 1.21
– Expected test set optimism: 1
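A simulation of this kind can be sketched in a few lines. The transcript does not give the settings behind the slide's numbers, so everything below is a hypothetical setup chosen for illustration: n = 20 points, a straight-line OLS fit, σ² = 1, ρ = 0.5, and an arbitrary true mean. Equicorrelated errors are generated with a random effect shared by the training and test sets, while the "new draw" gets its own independent effect.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns the fitted values."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return [a + b * xi for xi in x]


def mse(y, fitted):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / len(y)


def simulate(n=20, rho=0.5, sigma=1.0, reps=10000, seed=0):
    """Return mean (training, test-set, out-of-sample) MSE over `reps` draws."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]
    mu = [1.0 + 2.0 * xi for xi in x]      # hypothetical true mean
    shared_sd = sigma * rho ** 0.5         # sd of the shared random effect
    own_sd = sigma * (1.0 - rho) ** 0.5    # sd of the per-observation noise

    def draw(shared):
        return [m + shared + own_sd * rng.gauss(0.0, 1.0) for m in mu]

    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        z = shared_sd * rng.gauss(0.0, 1.0)
        y_train, y_test = draw(z), draw(z)              # correlated with each other
        y_new = draw(shared_sd * rng.gauss(0.0, 1.0))   # independent of both
        fitted = fit_line(x, y_train)
        totals[0] += mse(y_train, fitted)
        totals[1] += mse(y_test, fitted)
        totals[2] += mse(y_new, fitted)
    return tuple(t / reps for t in totals)
```

For these assumed settings the formulas give an estimator variance of 0.55, so theory predicts mean errors of about 0.45 (training), 0.55 (test set), and 1.55 (new draw). The ordering is the same as on the slide, though the values differ because the settings do.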
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 39 of 61
Solution: Split by dependencies
Study covariance, and design data splitting around it
– But can’t estimate both mean and the covariance structure, have to assume one (Opsomer et al., 2001)
– (For covariance, no amount of data is ever enough!)
Examples in literature:
– Temporal block cross-validation (Bergmeir et al., 2018)
– Leave-one-subject-out cross-validation (Hammerla & Plötz, 2015)
– Network cross-validation
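Two of the splitting schemes listed above can be sketched with the standard library alone. This is a minimal illustration of the general idea, not the procedures from the cited papers; the function names and the `groups` encoding (one subject label per observation) are assumptions made here.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per distinct group.

    groups[i] is the subject (or cluster) that observation i belongs to, so
    all observations from one subject land on the same side of the split.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    for g, test_idx in by_group.items():
        train_idx = [i for i, gi in enumerate(groups) if gi != g]
        yield g, train_idx, test_idx


def blocked_folds(n, k):
    """Yield (train_idx, test_idx) where each test fold is a contiguous block."""
    for fold in range(k):
        start, stop = fold * n // k, (fold + 1) * n // k
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n) if i < start or i >= stop]
        yield train_idx, test_idx


if __name__ == "__main__":
    subjects = ["a", "a", "b", "c", "c", "c"]
    for held_out, train_idx, test_idx in leave_one_group_out(subjects):
        print(held_out, train_idx, test_idx)
    for train_idx, test_idx in blocked_folds(10, 3):
        print(train_idx, test_idx)
```

Both splitters keep dependent observations together rather than scattering them at random across folds. In temporally ordered data, observations adjacent to a block boundary may still be correlated with the test block; handling that (for example with a gap between blocks) is left out of this sketch.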
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 40 of 61
Reflection
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 41 of 61
About me
[Logos: "at Harvard University"; "Machine Learning Department"; "Data Science for Social Good Summer Fellowship"]
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 42 of 61
Background
“Methods are like people: if you focus only on what they can’t do, you will always be disappointed.” (Shapiro, 2014)
Trivially, all models are wrong because they aren’t the thing itself. But specifically, when, why, and how does it matter that ML is “wrong”?
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 43 of 61
Encountering social science
[Slide shows the first pages of two articles.]

Viewpoint
Computational Social Science ≠ Computer Science + Social Data
The important intersection of computer science and social science.
Hanna Wallach
Communications of the ACM, March 2018, Vol. 61, No. 3, p. 42 (Viewpoints). DOI:10.1145/3132698. Image by Evannovostro.

THIS VIEWPOINT IS about differences between computer science and social science, and their implications for computational social science. Spoiler alert: The punchline is simple. Despite all the hype, machine learning is not a be-all and end-all solution. We still need social scientists if we are going to use machine learning to study social phenomena in a responsible and ethical manner.

I am a machine learning researcher by training. That said, my recent work has been pretty far from traditional machine learning. Instead, my focus has been on computational social science – the study of social phenomena using digitized information and computational and statistical methods.

For example, imagine you want to know how much activity on websites such as Amazon or Netflix is caused by recommendations versus other factors. To answer this question, you might develop a statistical model for estimating causal effects from observational data such as the numbers of recommendation-based visits and numbers of total visits to individual product or movie pages over time.9

Alternatively, imagine you are interested in explaining when and why senators’ voting patterns on particular issues deviate from what would be expected from their party affiliations and ideologies. To answer this question, you might model a set of issue-based adjustments to each senator’s ideological position using their congressional voting history and the corresponding bill text.4,8

Finally, imagine you want to study the faculty hiring system in the U.S. to determine whether there is evidence of a hierarchy reflective of systematic social inequality. Here, you might model the dynamics of hiring relationships between universities over time using the placements of thousands of tenure-track faculty.3

Unsurprisingly, tackling these kinds of questions requires an interdisciplinary approach – and, indeed, computational social science sits at the intersection of computer science, statistics, and social science.

For me, shifting away from traditional machine learning and into this interdisciplinary space has meant that I have needed to think outside the algorithmic black boxes often associated with machine learning, focusing instead on the opportunities and challenges involved in developing and using machine learning methods to analyze real-world data about society.

This Viewpoint constitutes a reflection on these opportunities and challenges. I structure my discussion here
Annu. Rev. Sociol. 2004. 30:243–70
doi: 10.1146/annurev.soc.30.020404.104342
Copyright © 2004 by Annual Reviews. All rights reserved
First published online as a Review in Advance on March 9, 2004

THE “NEW” SCIENCE OF NETWORKS
Duncan J. Watts
Department of Sociology, Columbia University, New York, NY 10027; Santa Fe Institute, Santa Fe, New Mexico 97501; email: [email protected]

Key Words graph theory, mathematical models, network data, dynamical systems

Abstract In recent years, the analysis and modeling of networks, and also networked dynamical systems, have been the subject of considerable interdisciplinary interest, yielding several hundred papers in physics, mathematics, computer science, biology, economics, and sociology journals (Newman 2003c), as well as a number of books (Barabasi 2002, Buchanan 2002, Watts 2003). Here I review the major findings of this emerging field and discuss briefly their relationship with previous work in the social and mathematical sciences.

INTRODUCTION

Building on a long tradition of network analysis in sociology and anthropology (Degenne & Forse 1994, Scott 2000, Wasserman & Faust 1994) and an even longer history of graph theory in discrete mathematics (Ahuja et al. 1993, Bollobas 1998, West 1996), the study of networks and networked systems has exploded across the academic spectrum in the past five years. Spurred by the rapidly growing availability of cheap yet powerful computers and large-scale electronic datasets, researchers from the mathematical, biological, and social sciences have made substantial progress on a number of previously intractable problems, reformulating old ideas, introducing new techniques, and uncovering connections between what had seemed to be quite different problems. The result has been called the “new science of networks” (Barabasi 2002, Buchanan 2002, Watts 2003) – a label that may strike many sociologists as misleading, given the familiarity (to social network analysts) of many of its central ideas. Nevertheless, the label does capture the sense of excitement surrounding what is unquestionably a fast developing field – new papers are appearing almost daily – and also the unprecedented degree of synthesis that this excitement has generated across the many disciplines in which network-related problems arise.

In reviewing the main results of this field, I have tried to strike a balance between the historical progression of the ideas and their logical order. To this end, the first section describes the main network modeling approaches, regarding local structure, global connectivity, searchability, and highly skewed degree distributions.
“machine learning is not a be-all and end-all solution.”
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 44 of 61
Which social science?
There is a “hard” version of social science compatible with modeling (econometrics, political science, quantitative sociology, biological anthropology, etc.)
It enriches ML and helps avoid pitfalls, but is not the most valuable and profound offering
Critical sociology, cultural anthropology, critical race studies, feminist studies, STS: not just about the world, but about the very ways we see and interact with the world
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 45 of 61
The original
Toward a Critical Technical Practice: Lessons Learned Trying to Reform AI
Philip E. Agre
University of California, San Diego

Every technology fits, in its own unique way, into a far-flung network of different sites of social practice. Some technologies are employed in a specific site, and in those cases we often feel that we can warrant clear cause-and-effect stories about the transformations that have accompanied them, either in that site or others. Other technologies, such as electric lighting and the telephone, are so ubiquitous-found contributing to the evolution of the activities and relations of so many distinct sites of practice-that it requires considerable effort to understand their effects on society, assuming that such a global notion of "effects" even makes sense (Hughes, 1983; Latour, 1987).

Computers fall in this latter category of ubiquitous technologies. In fact, from an analytical standpoint, computers are worse than that. Computers are representational artifacts, and the people who design them often start by constructing representations of the activities found in the sites where they will be used. This is the purpose of systems analysis, for example, and of the systematic mapping of conceptual entities and relations in the early stages of database design. A computer, does not have an instrumental use in a site of practice; the computer is frequently about that site in its very design. In this sense and others, computing has been constituted as a kind of imperialism; it aims to reinvent virtually every other site of practice in its own image.

As a result, the institutional relations between the computer world and the rest of the world can be tremendously complicated-much more
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 46 of 61
What is “critical”?
“I finally comprehended the difference between critical thinking and its opposite. Technical people are not dumb, quite the contrary, but technical curricula rarely include critical thinking in the sense I have in mind. Critical thinking means that you can, so to speak, see your glasses. You can look at the world, or you can back up and look at the framework of concepts and assumptions and practices through which you look at the world.” (Agre, 2000)
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 47 of 61
Backing up in ML
Back up and ask, what is the framework of concepts and assumptions and practices through which ML looks at the world?
Identify this to understand the limits, and proper use, of ML
Start from the very beginning: quantification. End at ML-specific model validation.
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 48 of 61
Larger goals of this work
Communication oriented to solutions
– “Burn it all down” on one side; totalitarian quantification + optimization on the other
Critics can be specific about what they object to, not just “ML is quantitative therefore evil”
ML can see how critiques show points of potential failure that can be addressed, rather than dismissing critiques
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 49 of 61
Pitfalls
The fallacy of alternatives (Agre, 1997): “Their stance was: if your alternative is so good then you will use it to write programs that solve problems better than anybody else's, and then everybody will believe you.”
Too limited to say, “social science can help solve a problem better”
Or even: “social science can help you identify the right problems”
Critical social science will reframe what even counts as a problem!
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 50 of 61
Future steps
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 51 of 61
Future steps
Many (but not all) of the problems can be alleviated (but not solved) through mixed methods: incorporating alternatives (qualitative research, statistical modeling, experimental design, etc.)
But mixed methods are hard!
We will need to develop both intellectual frameworks for mixed methods ML (e.g., purposeful division of labor), as well as cultural practices within ML for actually doing it.
Future work, guided by this attempt to comprehensively map key points of failure!
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 52 of 61
Limitations
Focused on research, not industry; some things (especially implementation, which industry may know lots about) may not be meaningful critiques
Still developing what the intellectual framework would be, and what the cultural practices would be!
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 53 of 61
Thank you!
-
A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 54 of 61
References (1/8)
Abbott, Andrew. “Transcending General Linear Reality.” Sociological Theory 6, no. 2 (1988): 169–186. https://dx.doi.org/10.2307/202114.
Agre, Philip E. “Toward a Critical Technical Practice: Lessons Learned from Trying to Reform AI.” In Social Science, Technical Systems, and Cooperative Work: Beyond the Great Divide, edited by Geoffrey C. Bowker, Susan Leigh Star, Will Turner, and Les Gasser, 131–158. Lawrence Erlbaum Associates, 1997. https://web.archive.org/web/20040203070641/http://polaris.gseis.ucla.edu/pagre/critical.html.
Agre, Philip E. “Notes on critical thinking, Microsoft, and eBay, along with a bunch of recommendations and some URL's.” Red Rock Eater Newsletter, 12 July 2000. https://pages.gseis.ucla.edu/faculty/agre/notes/00-7-12.html.
Bailey, David H., Jonathan M. Borwein, Marcos López de Prado, and Qiji Jim Zhu. “Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance.” Notices of the AMS 61, no. 5 (2014): 458–471. https://dx.doi.org/10.1090/noti1105.
Bergmeir, Christoph, Rob J. Hyndman, and Bonsoo Koo. “A Note on the Validity of Cross-Validation for Evaluating Autoregressive Time Series Prediction.” Computational Statistics & Data Analysis 120 (2018): 70–83. https://dx.doi.org/10.1016/j.csda.2017.11.003.
-
References (2/8)
Borgatti, Steve. “Types of Validity.” BA 762: Research Methods. Gatton College of Business & Engineering, University of Kentucky, 2019. https://sites.google.com/site/ba762researchmethods/materials/handouts/typesofvalidity.
Box, George E. P. “Robustness in the Strategy of Scientific Model Building.” Technical Report #1954. Mathematics Research Center, University of Wisconsin-Madison, 1979.
Breiman, Leo. “Statistical Modeling: The Two Cultures.” Statistical Science 16, no. 3 (2001): 199–231. https://dx.doi.org/10.1214/ss/1009213726.
Cardoso, Fatima, Laura J. van’t Veer, Jan Bogaerts, Leen Slaets, Giuseppe Viale, Suzette Delaloge, Jean-Yves Pierga, Etienne Brain, Sylvain Causeret, Mauro DeLorenzi, Annuska M. Glas, Vassilis Golfinopoulos, Theodora Goulioti, Susan Knox, Erika Matos, Bart Meulemans, Peter A. Neijenhuis, Ulrike Nitz, Rodolfo Passalacqua, Peter Ravdin, Isabel T. Rubio, Mahasti Saghatchian, Tineke J. Smilde, Christos Sotiriou, Lisette Stork, Carolyn Straehle, Geraldine Thomas, Alastair M. Thompson, Jacobus M. van der Hoeven, Peter Vuylsteke, René Bernards, Konstantinos Tryfonidis, Emiel Rutgers, and Martine Piccart. “70-Gene Signature as an Aid to Treatment Decisions in Early-Stage Breast Cancer.” New England Journal of Medicine 375, no. 8 (2016): 717–729. https://dx.doi.org/10.1056/NEJMoa1602253.
-
References (3/8)
Chatfield, Chris. “Model Uncertainty, Data Mining and Statistical Inference.” Journal of the Royal Statistical Society. Series A (Statistics in Society) 158, no. 3 (1995): 419–466. https://dx.doi.org/10.2307/2983440.
Cox, David R. “Role of Models in Statistical Analysis.” Statistical Science 5, no. 2 (May 1990): 169–174. https://dx.doi.org/10.1214/ss/1177012165.
Doshi-Velez, Finale and Been Kim. “Towards a Rigorous Science of Interpretable Machine Learning.” 2017. https://arxiv.org/abs/1702.08608.
Dwork, Cynthia, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. “The Reusable Holdout: Preserving Validity in Adaptive Data Analysis.” Science 349, no. 6248 (2015): 636–638. https://dx.doi.org/10.1126/science.aaa9375.
Efron, Bradley. “The Estimation of Prediction Error: Covariance Penalties and Cross-Validation.” Journal of the American Statistical Association 99, no. 467 (2004): 619–632. https://dx.doi.org/10.1198/016214504000000692.
Fisher, R. A. “On the Mathematical Foundations of Theoretical Statistics.” Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 222 (1922): 309–368. https://dx.doi.org/10.1098/rsta.1922.0009.
-
References (4/8)
Gayo-Avello, Daniel. “‘I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper’: A Balanced Survey on Election Prediction using Twitter Data.” 2012. https://arxiv.org/abs/1204.6441.
Ginsberg, Jeremy, Matthew H. Mohebbi, Rajan S. Patel, Lynnette Brammer, Mark S. Smolinski, and Larry Brilliant. “Detecting Influenza Epidemics Using Search Engine Query Data.” Nature 457 (2009): 1012–1015. https://dx.doi.org/10.1038/nature07634.
Hammerla, Nils Y., and Thomas Plötz. “Let’s (Not) Stick Together: Pairwise Similarity Biases Cross-Validation in Activity Recognition.” In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15), 1041–1051. https://dx.doi.org/10.1145/2750858.2807551.
Jones, Matthew L. “How We Became Instrumentalists (Again): Data Positivism since World War II.” Historical Studies in the Natural Sciences 48, no. 5 (2018): 673–684. https://dx.doi.org/10.1525/hsns.2018.48.5.673.
Kass, Robert E. “Statistical Inference: The Big Picture.” Statistical Science 26, no. 1 (2011): 1–9. https://dx.doi.org/10.1214/10-STS337.
Keyes, Os. “The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition.” In Proceedings of the ACM on Human-Computer Interaction (CSCW), 2, 88:1–88:22, 2018.
-
References (5/8)
Kleinberg, Jon, Jens Ludwig, Sendhil Mullainathan, and Ziad Obermeyer. “Prediction Policy Problems.” American Economic Review 105, no. 5 (2015): 491–495. https://dx.doi.org/10.1257/aer.p20151023.
Lanius, Candice. “Fact Check: Your Demand for Statistical Proof is Racist.” Cyborgology blog, January 12, 2015. https://thesocietypages.org/cyborgology/2015/01/12/fact-check-your-demand-for-statistical-proof-is-racist/.
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343, no. 6176 (2014): 1203–1205. https://dx.doi.org/10.1126/science.1248506.
Lipton, Zachary C. “The Myth of Model Interpretability.” KDnuggets 15, no. 13 (April 2015). https://www.kdnuggets.com/2015/04/model-interpretability-neural-networks-deep-learning.html.
Lipton, Zachary C. and Jacob Steinhardt. “Troubling Trends in Machine Learning Scholarship.” 2018. https://arxiv.org/abs/1807.03341.
Messerli, Franz H. “Chocolate Consumption, Cognitive Function, and Nobel Laureates.” New England Journal of Medicine 367 (2012): 1562–1564. https://dx.doi.org/10.1056/NEJMon1211064.
Mullainathan, Sendhil and Jann Spiess. “Machine Learning: An Applied Econometric Approach.” Journal of Economic Perspectives 31, no. 2 (2017): 87–106. https://dx.doi.org/10.1257/jep.31.2.87.
-
References (6/8)
Opsomer, Jean, Yuedong Wang, and Yuhong Yang. “Nonparametric Regression with Correlated Errors.” Statistical Science 16, no. 2 (2001): 134–153. https://dx.doi.org/10.1214/ss/1009213287.
Park, Greg. “The Dangers of Overfitting: A Kaggle Postmortem.” 2012. http://gregpark.io/blog/Kaggle-Psychopathy-Postmortem/.
Patton, Michael Quinn. “The Nature, Niche, Value, and Fruit of Qualitative Inquiry.” In Qualitative Research & Evaluation Methods: Integrating Theory and Practice, 4th edition, 2–44. SAGE Publications, Inc., 2014. https://uk.sagepub.com/sites/default/files/upm-binaries/64990_Patton_Ch_01.pdf.
Rescher, Nicholas. Predicting the Future: An Introduction to the Theory of Forecasting. State University of New York Press, 1998.
Rose, Todd. The End of Average: How We Succeed in a World That Values Sameness. New York: HarperOne, 2016. See excerpt at https://www.thestar.com/news/insight/2016/01/16/when-us-air-force-discovered-the-flaw-of-averages.html. Animated video: https://vimeo.com/237632676.
Rosset, Saharon, and Ryan J. Tibshirani. “From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation.” Journal of the American Statistical Association (2019). https://dx.doi.org/10.1080/01621459.2018.1424632.
-
References (7/8)
Santillana, Mauricio, Wendong Zhang, Benjamin M. Althouse, and John W. Ayers. “What Can Digital Disease Detection Learn from (an External Revision to) Google Flu Trends?” American Journal of Preventive Medicine 47, no. 3 (2014): 341–347. https://dx.doi.org/10.1016/j.amepre.2014.05.020.
Shapiro, Ian. “Methods are like people: If you focus only on what they can’t do, you will always be disappointed.” In Field Experiments and Their Critics: Essays on the Uses and Abuses of Experimentation in the Social Sciences, edited by Dawn Langan Teele, 228–241. Yale University Press, 2014.
Shmueli, Galit. “To Explain or to Predict?” Statistical Science 25, no. 3 (2010): 289–310. https://dx.doi.org/10.1214/10-STS330.
Spirtes, Peter and Kun Zhang. “Causal Discovery and Inference: Concepts and Recent Methodological Advances.” Applied Informatics 3, no. 3 (2016): 1–28. https://dx.doi.org/10.1186/s40535-016-0018-x.
Tibshirani, Robert. “Recent Advances in Post-Selection Inference.” Breiman Lecture, NIPS 2015, 9 December 2015. http://statweb.stanford.edu/~tibs/ftp/nips2015.pdf.
-
References (8/8)
van’t Veer, Laura J., Hongyue Dai, Marc J. van de Vijver, Yudong D. He, Augustinus A. M. Hart, Mao Mao, Hans L. Peterse, Karin van der Kooy, Matthew J. Marton, Anke T. Witteveen, George J. Schreiber, Ron M. Kerkhoven, Chris Roberts, Peter S. Linsley, René Bernards, and Stephen H. Friend. “Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer.” Nature 415, no. 6871 (2002): 530–536. https://dx.doi.org/10.1038/415530a.
Wallach, Hanna. “Computational Social Science ≠ Computer Science + Social Data.” Communications of the ACM 61, no. 3 (2018): 42–44. https://dx.doi.org/10.1145/3132698.
Wasserman, Larry A. “Rise of the Machines.” In Past, Present, and Future of Statistical Science, 525–536. Boca Raton, FL: Chapman and Hall/CRC, 2013. http://www.stat.cmu.edu/~larry/Wasserman.pdf.
Watts, Duncan J. “The ‘New’ Science of Networks.” Annual Review of Sociology 30 (2004): 243–270. https://dx.doi.org/10.1146/annurev.soc.30.020404.104342.
Wu, Shaohua, T. J. Harris, and K. B. McAuley. “The Use of Simplified or Misspecified Models: Linear Case.” The Canadian Journal of Chemical Engineering 85, no. 4 (2007): 386–398. https://dx.doi.org/10.1002/cjce.5450850401.
-
Backup slides
-
“True” models predict worse
A linear data-generating process: [equation not recoverable from the slide export]
Wu et al. (2007): Fitting only X_p has lower expected MSE than fitting the model that generated the data when: [condition not recoverable from the slide export]
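The claim on this slide can be checked numerically. What follows is a minimal sketch under stated assumptions, not the setup or notation of Wu et al. (2007): a fixed design with one retained predictor (`xp`) and one omitted predictor (`xq`), no intercept, Gaussian noise, and error measured as the mean squared distance between fitted values and the true conditional mean. The function name `simulate` and all parameter values are illustrative. Under these assumptions, the usual bias-variance accounting suggests the reduced fit should win whenever the squared norm of the part of `beta_q * xq` that `xp` cannot explain is smaller than the noise variance.

```python
import random


def simulate(beta_q, n=50, sigma=1.0, reps=2000, seed=0):
    """Monte Carlo error of fitted means: reduced fit (xp only) vs. full fit."""
    rng = random.Random(seed)
    # Fixed design (an assumption): predictors are drawn once, then held constant.
    xp = [rng.gauss(0, 1) for _ in range(n)]
    xq = [rng.gauss(0, 1) for _ in range(n)]
    beta_p = 1.0  # illustrative value
    mu = [beta_p * a + beta_q * b for a, b in zip(xp, xq)]  # true conditional mean

    # Cross-products of the design, reused in every replication.
    spp = sum(a * a for a in xp)
    sqq = sum(b * b for b in xq)
    spq = sum(a * b for a, b in zip(xp, xq))
    det = spp * sqq - spq * spq

    err_reduced = err_full = 0.0
    for _ in range(reps):
        y = [m + rng.gauss(0, sigma) for m in mu]
        spy = sum(a * v for a, v in zip(xp, y))
        sqy = sum(b * v for b, v in zip(xq, y))
        # Reduced model: least squares of y on xp alone.
        bp_red = spy / spp
        # Full (data-generating) model: solve the 2x2 normal equations.
        bp_full = (sqq * spy - spq * sqy) / det
        bq_full = (spp * sqy - spq * spy) / det
        err_reduced += sum((bp_red * a - m) ** 2 for a, m in zip(xp, mu)) / n
        err_full += sum((bp_full * a + bq_full * b - m) ** 2
                        for a, b, m in zip(xp, xq, mu)) / n
    return err_reduced / reps, err_full / reps


weak = simulate(beta_q=0.05)   # omitted effect is small relative to the noise
strong = simulate(beta_q=1.0)  # omitted effect is large relative to the noise
print("weak effect   (reduced, full):", weak)
print("strong effect (reduced, full):", strong)
```

With a weak omitted effect, the reduced fit trades a small bias for not having to estimate a second coefficient; with a strong omitted effect, the bias dominates and the full model does better.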
-
Proposal: Precise language
Predict the likelihood: Calculate the likelihood
Predict the risk, predict the probability: Estimate the risk, estimate the probability
Prediction, predicted: Fitted value, fitted
We predict: We detect, we classify, we model
X predicts Y: X is correlated with Y
X predicts Y, ceteris paribus (partial correlation): X is associated with Y
-
Proposal: Alternative language
Retrodiction
Backtesting (retrodiction for testing)
Hindcasting (backtesting for forecasting)
In-sample vs. out-of-sample
Interpolation vs. extrapolation
Diagnosis vs. prognosis
Retrospective vs. prospective
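To make the interpolation vs. extrapolation pair concrete, here is a small illustrative sketch. The data, the quadratic ground truth, and every name in it are assumptions chosen for illustration, not material from the talk. A straight line is fit to points from y = x² on [0, 1], then scored on held-out points inside that range (interpolation) and on points beyond it (extrapolation).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def mse(a, b, xs, ys):
    """Mean squared error of the line a + b * x on the given points."""
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def truth(x):
    return x * x  # hypothetical ground truth, chosen only for illustration


grid = [i / 20 for i in range(21)]              # x in [0, 1]
train_x = grid[0::2]                            # points used for fitting
interp_x = grid[1::2]                           # held out, inside the training range
extrap_x = [1 + i / 20 for i in range(1, 21)]   # held out, beyond the training range

a, b = fit_line(train_x, [truth(x) for x in train_x])
interp_mse = mse(a, b, interp_x, [truth(x) for x in interp_x])
extrap_mse = mse(a, b, extrap_x, [truth(x) for x in extrap_x])
print("interpolation MSE:", interp_mse)
print("extrapolation MSE:", extrap_mse)
```

Both test sets are out-of-sample, but in this toy setup only the second one asks the model about a region it never saw, which is why the two errors differ so sharply.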