a hierarchy of limitations in machine learning · a hierarchy of limitations in machine learning...

72
machine learning A Hierarchy of Limitations in Machine Learning Momin M. Malik, Data Science Postdoctoral Fellow, Berkman Klein Center for Internet & Society at Harvard University > 03 December 2019, Microsoft Research New England

Upload: others

Post on 24-Jan-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 1 of 61

    machine learn

    ing

    A Hierarchy of Limitations in Machine Learning

    Momin M. Malik, Data Science Postdoctoral Fellow, Berkman Klein Center for Internet & Society at Harvard University > 03 December 2019, Microsoft Research New England

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 2 of 61

     Objective   What are all the ways in which machine learning can fail*?

      How can we address these failure points?

    * Fail = be unreliable, or result in unanticipated harm

      Failures will form a hierarchy

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 3 of 61

     The basic bargain of ML So long as machine learning establishes external validity (generalizability) from a given input, it can arguably ignore…   All other validity

    questions   All the problems that

    plague statistics

    y xunknown

    decision treesneural nets

    y xnature

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Breiman, 2001, Stat. Sci. See also Jones, 2018, Hist. Stud. Nat. Sci.

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 4 of 61

     Can focus on one goal Random variable, X

    AAAMgnicpZbfb9s2EMfVrts6N1vT7W17YRcYGArPkJwG7cMCtFkKrA8tsixp0sZGQFGUTYQUNZJKrAgG9tfsdft39t/sKCuhTGXdLwWBz/e5+4p3onWMc860CcM/bt3+4M6HH31895PevbVPP7u//uDzN1oWitBDIrlUxzHWlLOMHhpmOD3OFcUi5vQoPvve8qNzqjST2YEpczoReJqxlBFswHW6/uXY0Lmp9nGWSIHOsWIYUgdogY5P1zfCYVhfqGtEjbERNNfe6YN7e+NEkkLQzBCOtT6JwtxMKqwMI5wueuNC0xyTMzylVRwLzwHf+y3HCZgZFlRPqrrMBeqDJ0GpVPCfGVR7VySw0LoUMUQKbGbaZ9Z5EzspTPp0UrEsLwzNyPJGacGRkcj2DCVMUWJ4CQYmikE1iMywwsRAZ3tjRTN6QaQQ0MRxigXjZUJTXHBTjXXamIveynKKfKooPVttgV2Z0qnueo2U3HNTkc/oz6s9q/K5bY32boaVwiVku2U+qsawK0yM1aIaq4LTk28jOp9U4XArN4tqNNyi84WfMZPqspVhYyZNaJMI911pLBGTSqd139ta1RirqWAZKFkm8+WnElXjX9wQjuc3h4PfDwcJJtgl7SZck04Knv9VyhXxU6ImNk6ryGcvHXvps9Cx0Gf7Vyyu9n32wrEXlrW3XTXec3TPz2RZAjSnKh8/rP+s6cUcLK4rPvDzn7sVP/fZjmM7Ptt1bNdnrxx75bMjx458duzYsc/eOvbWZ+8ce+ezC8cufDZ3bO6z0rHSZ5eOXfosmbtmJz4krxtIMK9e+zTGPJ9hiIgF/Cxq24+IqbkKqE2fi6KhYPiM5ppxmTUBMA6uHH7gwczdZWn7EbuUX0csbT/iJwa/3iZiafsRSUsjuVHDtNZhmnWsvvwMO7tcvjmtxVmssCqrXGpmJyDLpgM9wznVg4QSqeqxqAcxJE+VLLLEe+vm0zTn0nje2I6D+hUPHHQ4NI7jkqrKCS26EEYZbUNYYg10K20gMMsGrUiEUL+PtBQU4SSpi8AcLfPq4ZhQIXudgmE3kYGBmaMH7dphNMgL3e0PLoyETYjbwd2oWq/rbnVymMNOFlLlMyuA+iiTTJdo2fJOYsqMrW959RF8NZDWRNtxTKRUCcuwuSG5/cwgOVH4wibDrkAOIZzC0LbOHnrf1Ueu5e/bPN3iYbHJdUvhobb8uW0WPBobrk3JYbaXmUzoYvuEMEXsucvMGDkboHpOFQJpGDzb4fAJEYO6oO2YQykPn4aTtoqMNVXnNPnXOlEYDqDLnDffR6u6HPqcmf+7utzKgIg9Q+Fs2tJhWQbPAroEIqOt7tpWZGxf6T/Tevw3UsaecV1ZTQMKCvUPvO442Sg3XuWb9nTb1iWcLfvllvdftaOR1e7BGTzyT9xd481oGG0ORz8+3ni205zG7wZfBV8H3wRR8CR4FvwQ7AWHAQl+CX4Nfgt+X7uz9mgtWttcht6+1eR8Eaxca9/9Cbb7oiE=

    Mean, E(X ) = µAAAMgXicpZZtb9s2EMfVdg+dm23p9mrYG3aBgXZQDdlt0AJDgDZLgPVFiqxLm7SREVASZRPhg0ZSiR3B2KfZ2+3z7NvsKCuhTGXdk4LA5/vd/cU70TomBaPaRNEfN27e+uDDjz6+/Unvztqnn32+fveLN1qWKiWvU8mkOkqwJowK8tpQw8hRoQjmCSOHyen3lh+eEaWpFAdmXpAxxxNBc5piA66T9a9iQ2am2iNYhGiB4t37Rw/QFop5ebK+EQ2i+kJdY9gYG0Fz7Z/cvbMfZzItOREmZVjr42FUmHGFlaEpI4teXGpS4PQUT0iVJNxzwPd+y3EMpsCc6HFVV7lAffBkKJcK/oVBtXdFAnOt5zyBSI7NVPvMOq9jx6XJn44rKorSEJEub5SXDBmJbMtQRhVJDZuDgVNFoRqUTrHCqYHG9mJFBDlPJedYZHGOOWXzjOS4ZKaKdd6Yi97Kcspiogg5XW2BXZnSue56jZTMcxNeTMnPqz2ripltjfZuhpXCc8h2y/y2imFTmASrRRWrkpHjh0MyG1fRYLMwi2o02CSzhZ8xleqilWFjxk1okwj3XWlsyseVzuu+t7WqGKsJpwKULJPF8lPxqvEvrgnHs+vDwe+HgwTl9IJ0E65IJwXP/irlkvgpwyY2yauhz1449sJnkWORz15dsqR65bNdx3Yta2+7Kt53dN/PpCIDWhBVxPfqP2t6MQeLq4oP/PznbsXPfbbt2LbPdhzb8dmeY3s+O3Ts0GdHjh357K1jb332zrF3Pjt37NxnM8dmPps7NvfZhWMXPstmrtmZD9OXDUwxq176NMGsmGKISDj8LGrbj0iIuQyoTZ/zsqFg+IwUmjIpmoAzrC4dfuDB1N1lafsRO4RdRSxtP+InCr/eJmJp+xFZSyO7VsO01mGaday+/Aw9vVi+Oa3FaKKwmleF1NQOQComoZ7igugwI6lU9VTUYQLJEyVLkXlv3WKSF0waz5vYcVC/4oGDDoPGMTwnqnJCiy6EUUbaEJZYA91KCzmmImxFIoT6faQlJwhnWV0EZmiZVw/HjHDZ6xQMuykNDcwcHbZrh9Egz3W3P7g0EjYhbgd3o2q9rrvVyUEBO5lLVUytAOojIameo2XLO4k5Nba+5dVH8NVAWhNtx3EqpcqowOaa5PYzg+RM4XObDLsCOYRwDkPbOnvofVcfuZa/b/N0i4fFZlcthYfa8he2WfBobLg2cwazfS5kRhZbxylVcEQKYWE0PQ1RPadKjjQMnq1o8CTlYV3QVsKglHtPo3FbRSaaqDOS/WudYRSF0GXGmu+jVV0GfRbm/66usDIgYs9QWExaOlQIeBbQJRAZbXbXtiJj+0r+mdbjv5EyGE7HrqymASWB+kOvO052WBiv8kf2dNvWTRld9sst779qD0dWuwdn8KF/4u4ab0aD4aPB6MfHG8+2m9P47eDr4JvgfjAMngTPgh+C/eB1kAa/BL8GvwW/r91ae7AWrY2WoTdvNDlfBivX2nd/Aj2goH4=

    Estimator, bµAAAMiHicpZZbb9s2FMfV7ta52ZZuj3thFxgYBs2Q3QbpHgK0WQKsDy2yLmnSRkZAUZRNhBI5koqtCH7dp9nr9l32bXZ0SShTWXdTEPj4/M75i+eI1mEkOdMmCP64c/e99z/48KN7Hw/ub3zy6WebDz5/rUWuCD0mggt1GmFNOcvosWGG01OpKE4jTk+ii+8rfnJJlWYiOzKFpNMUzzKWMIINuM43UWjo0pQH2rAUG6F8tELhgsV0jk0ZpvnqfHMrGAX1hfrGuDW2vPY6PH9w/zCMBclTmhnCsdZn40CaaYmVYYTT1SDMNZWYXOAZLaModRzwfdhxnIGZ4ZTqaVkXu0JD8MQoEQr+M4Nq75oETrUu0ggioaK5dlnlvI2d5SZ5Mi1ZJnNDM9LcKMk5MgJVnUMxU5QYXoCBiWJQDSJzrDAx0N9BqGhGF0SkKc7iMMEp40VME5xz6KNOWnM1WFtOLmeK0ov1FlQrUzrRfa8Rgjtumso5/Xm9Z6VcVq3Rzs2wUriAbLvMb8oQ9oaJsFqVoco5Pft2TJfTMhhtS7MqJ6Ntuly5GXOhrjoZVcy0DW0T4b5rjSXptNRJ3feuVhliNUtZBkoVE7L5VGnZ+le3hOPl7eHgd8NBgqXsivYTbkgvBS//KuWauCnjNjZKyrHLnlv23GWBZYHLXl2zqHzlsgPLDirW3XZleGjpoZvJshiopEqGD+u/ynRijlY3FR+5+c/sip+5bM+yPZftW7bvsheWvXDZiWUnLju17NRlbyx747K3lr112cKyhcuWli1dVlhWuOzKsiuXxUvb7NiF5GULCeblS5dGmMs5hogohZ9FbbsRETXXAbXpcni1NxQMl1GpGRdZG3CJ1bXDDTya27s0thuxT/lNRGO7ET8x+PW2EY3tRsQdjfhWDdNZh2nXsf7yM+ziqnlzVhZnkcKqKKXQrJqDLJv5eo4l1X5MiVD1cNR+BMkzJfIsdt66cpZILozjjapxUL/igYMOh8ZxXFBVWqFVH8Ioo10IS6yB7qT5KWaZ34lECA2HSIuUIhzHdRGYoyavHo4xTcWgVzDsJuIbmDna79YOo0EsdL8/ODcCNiHuBvejar2+u9PJkYSdnAol55UAGqJMMF2gpuW9xISZqr7mGiL4aiCtja7GMRFCxSzD5pbk7jOD5FjhRZUMuwJZhHACQ7tyDtC7riGyLX/X5ukXD4uNb1oKD7Xjl1Wz4NFU4doUHGZ7kYmYrnbPCFNwRPJhYYxc+KieU3mKNAye3WC0Q1K/Lmg34lDKwyfBtKsiIk3VJY3/tc44CHzoMuft98m6Loc+Z+b/rk5WMiBSnaFwNuvosCyDZwFdApHJdn9tazJVX+k/03r8N1IGwyHZltU2IKdQv+90x8qOpXEqf1Sdbru6hLOmX3Z5/1V7PKm0B3AGH7sn7r7xejIaPxpNfny89XSvPY3f8770vvK+9sbejvfU+8E79I494v3i/er95v2+MdgINnY2vmtC795pc77w1q6NvT8B8BKlbg==

    Expected value, E(bµ)AAAMkXicpZbdbts2FMfV7qtzvTVdL3fDLjDQDpohuw3aXQRIswRYMbTIurRJGxsBJVE2EUrkSCq2IugZ9jS73Z5jb7NDWQllKuu+FAQ+Pr9z/uI5onUYCkaVDoI/btz84MOPPv7k1qe92/3PPr+zcfeLN4rnMiKvI864PA6xIoxm5LWmmpFjIQlOQ0aOwrPvDD86J1JRnh3qQpBpimcZTWiENbhONx5ONFnqcn8pSKRJjM4xy4mPKjTZfzBZ0JjMsS4naV49PN3YDIZBfaGuMWqMTa+5Dk7v3j6YxDzKU5LpiGGlTkaB0NMSS00jRqreJFdE4OgMz0gZhqnjgO+DluMEzAynRE3LuuoKDcATo4RL+M80qr1rEjhVqkhDiEyxniuXGed17CTXydNpSTORa5JFqxslOUOaI9NCFFMJzWIFGDiSFKpB0RxLDA2UoCRJRhYRT1OcxZMEp5QVMUlwzqCRKmnMqre2nFzMJCFn6y0wK5MqUV2v5pw5bpKKOfl5vWelWJrWKOdmWEpcQLZd5tflBDaJDrGsyonMGTn5ZkSW0zIYbgldlePhFllWbsacy4tWhomZNqFNItx3rbFROi1VUve9rVVOsJylNAMlw7hYfcq0bPzVNeF4eX04+N1wkKApvSDdhCvSScHLv0q5JG7KqIkNk3LksueWPXdZYFngsleXLCxfuWzfsn3D2tuunBxYeuBm0iwGKogUk/v1nzGdmMPqquJDN/+ZXfEzl+1atuuyPcv2XPbCshcuO7LsyGXHlh277K1lb132zrJ3LltYtnDZ0rKlywrLCpddWHbhsnhpmx27MHrZwAiz8qVLQ8zEHENEmMLPorbdiJDoy4DadDm821cUDJcRoSjjWRNwjuWlww08nNu7rGw3Yo+wq4iV7Ub8ROHX20SsbDcibmnE12ro1jp0s471l5+mZxerN6exGA0llkUpuKJmINJs5qs5FkT5MYm4rKek8kNInkmeZ7Hz1hWzRDCuHW9oxkH9igcOOgwax3BBZGmFqi6EUUbaEJZYA9VK81NMM78ViRAaDJDiKUE4jusiMEOrvHo4xiTlvU7BsJsiX8PMUX67dhgNfKG6/cG55rAJcTu4G1Xrdd2tTg4F7OSUSzE3AmiAMk5VgVYt7yQmVJv6VtcAwVcNaU20GccR5zKmGdbXJLefGSTHEi9MMuwKZBHCCQxt4+yh910DZFv+vs3TLR4WG1+1FB5qyy9Ms+DRmHClCwazvch4TKrtk4hKOCL5sDAanfmonlN5ihQMnu1g+CRK/bqg7ZBBKfefBtO2Cg8Vkeck/tc6oyDwocuMNd/H67oM+pzp/7s6YWRAxJyhcDZr6dAsg2cBXQKR8VZ3bWsypq/kn2k9/hspjeG0bMtqGpATqN93umNlR0I7lT8yp9u2bsToql92ef9VezQ22j04g4/cE3fXeDMejh4Nxz8+3tzZbU7jt7wvva+8B97Ie+LteN97B95rL/J+8X71fvN+79/rf9vf6TexN280Ofe8tav/w58IWahx

    Standard error,p

    V(bµ)AAAMoHicpZZLb9w2EMeV9JVu3MZpj70wMRZIC3Wh3cSILwYS1wGSQ1In8SuxFgYlUbuEKVEhKe/Kgr5KP02v7b3fpkNJNrWUm75kGDs7v5kR5y+uhkHGqFSe98eNm598+tnnX9z6cnB77auv76zf/eZQ8lyE5CDkjIvjAEvCaEoOFFWMHGeC4CRg5Cg4+0nzo3MiJOXpvioyMk3wLKUxDbEC1+n6lq/IUpVvFU4jLCJEhODCRRXy5QehSj/Bah4E5WH1wF/QiMyx9uXV99Xp+oY38uoL9Y1xa2w47bV3evf2nh/xME9IqkKGpTwZe5mallgoGjJSDfxckgyHZ3hGyiBILAd8H3YcJ2CmOCFyWtYiVGgIngjFXMB/qlDtXSmBEymLJIBI3ZS0mXZex05yFW9NS5pmuSJp2NwozhlSHGlFUUQFCRUrwMChoNANCudY4FCB7gNfkJQsQp4koLAf44SyIiIxzhkoKePWrAYry8mzmSDkbFUCvTIhY9n3Ks6Z5SZJNicfVjUrs6WWRlo3w0LgArLNMn8ofdgzKsCiKn2RM3Ly45gsp6U32sxUVU5Gm2RZ2RlzLi46GTpm2oa2iXDfFWHDZFrKuNa9W6v0sZglNK2a3cez5lMkZeuvrgnHy+vDwW+HQwma0AvST7givRS8/KuUS2KnjNvYIC7HNnth2AubeYZ5NntTXf0g39jsmWHPNOtuu9LfM3TPzqRpBDQjIvPv1X/atGL2q6uO9+38p2bFT222Y9iOzXYN27XZS8Ne2uzIsCObHRt2bLN3hr2z2XvD3ttsYdjCZkvDljYrDCtsdmHYhc2ipRE7smH4qoUhZuUrmwaYZXMMEUECP4vatiMCoi4DatPm8HJvKBg2I5mkjKdtwDkWlw47cH9u7tLYdsQuYVcRjW1HvKXw620jGtuOiDo1omtrqM46VLuO1ZefomcXzZtTW4wGAouizLikej7SdObKOc6IdCMSclEPTekGkDwTPE8j662bzeKMcWV5Az0O6lc8cKjDQDiGCyJKU6jqQxhlpAthiTWQnTQ3wTR1O5EIoeEQSZ4QhKOobgIz1OTVwzEiCR/0GobdFLoKZo50u73DaOAL2dcH54rDJsTd4H5UXa/v7ig5ymAnJ1xkc10ADVHKqSxQI3kvMaZK99dcQwRfFaS10Xoch5yLiKZYXZPcfWaQHAm80MmwK5BBCMcwtLVzgD52DZGR/GObp988LDa6khQeasefabHg0ehwqQoGs71IeUSq7ZOQCjgiubAwGp65qJ5TeYIkDJ5tb/Q4TNy6oe2AQSv3trxptwoPJBHnJPrXdcae54LKjLXfJ6t1Geicqv+7ukyXgSL6DIXTWacOTVN4FqASFJls9te2UkbrSv5ZrUd/U0phODybtloBcgL9u5Y6puw4U1bnD/Xptls3ZLTRyyzvv9YeT3TtAZzBx/aJu28cTkbjh6PJ60cbT3ba0/gt5zvnvvPAGTuPnSfOc2fPOXBC5xfnV+c35/e1+2vP135ee92E3rzR5nzrrFxr7/8EnVGvqA==

    Standard deviation,p

    V(X ) = �AAAMoXicpZbfb9s2EMfV7lfnZlu6Pe6FXWCgGzTDdhu0ewjQZimwPLTw0qRJaxkBJVE2EVJUSSqxIuhv2V+z1+15/82OshLKVNb9UhD4fJ+7E+8rWscwY1Tp4fCPW7c/+PCjjz+582nv7sZnn3+xee/L10rkMiJHkWBCnoRYEUZTcqSpZuQkkwTzkJHj8OxHw4/PiVRUpIe6yMiM43lKExphDa7TzR8CTZa6fKVxGmMZo5ic0xr5qEKBeid1GXCsF2FYvq4enHxboR1w0znHp5tbw8GwvlDXGDXGltdck9N7dydBLKKck1RHDCs1HQ0zPSux1DRipOoFuSIZjs7wnJRhyB0HfO+3HFMwU8yJmpW1ChXqgydGiZDwn2pUe9dKYK5UwUOINB0plxnnTWya6+TJrKRplmuSRqsbJTlDWiAjKYqpJJFmBRg4khS6QdECSxxpEL4XSJKSi0hwDhIHCeaUFTFJcM5AWpU0ZtVbW06ezSUhZ+sSmJVJlaiuVwvBHDfh2YK8W9eszJZGGuXcDEuJC8i2y/yuDGDT6BDLqgxkzsj0+xFZzsrhYDvTVTkebJNl5WYshLxsZZiYWRPaJMJ914SN+KxUSa17u1YZYDnnNK1WW09kq0/Jy8Zf3RCOlzeHg98NhxKU00vSTbgmnRS8/KuUK+KmjJrYMClHLtu3bN9lQ8uGLjuorn+NBy57btlzw9rbrgwmlk7cTJrGQDMis+B+/WdMJ+awuu740M1/Zlf8zGW7lu26bM+yPZe9sOyFy44tO3bZiWUnLntj2RuXvbXsrcsuLLtw2dKypcsKywqXXVp26bJ4acWOXRi9bGCEWfnSpSFm2QJDRMjhZ1HbbkRI9FVAbbqc5w0Fw2UkU5SJtAk4x/LK4QYeLuxdVrYbsUfYdcTKdiNemSnTRKxsNyJu1YhvrKFb69DNOtZffpqeXa7enMZiNJRYFmUmFDVTkKZzXy1wRpQfk0jIejQqP4TkuRR5Gjtv3WyeZExoxxuacVC/4oFDHQbCMVwQWdpCVRfCKCNtCEusgWql+RzT1G9FIoT6faQEJwjHcd0EZmiVVw/HmHDR6zQMuynyNcwc5bd7h9EgLlRXH5xrAZsQt4O7UXW9rrul5CCDncyFzBamAOqjVFBVoJXkncSEatPf6uoj+KohrYk24zgSQsY0xfqG5PYzg+RY4guTDLsCWYRwAkPbOHvofVcfWcnft3m6zcNi42tJ4aG2/JkRCx6NCVe6YDDbi1TEpNqZRlTCEcmHhdHozEf1nMo5UjB4doaDxxH364Z2Qgat3H8ynLWriFAReU7if11nNBz6oDJjzffxel0GOqf6/64uM2WgiDlD4XTeqkPTFJ4FqARFxtvdta2VMbqSf1br0d+U0hhOz7atRoCcQP++o44tO8q00/lDc7pt140YXelll/dfa4/GpnYPzuAj98TdNV6PB6OHg/HPj7ae7jan8Tve19433gNv5D32nno/eRPvyIu8X7xfvd+83ze2NvY3JhsHq9Dbt5qcr7y1a2P6J36mrxU=

    Estimator, b�AAAMi3icpZbdbts2FMfV7qtz3a3dLnfDLjAwDJphOw1aDAvQZgmwXrTIurRJGxkFJVE2EVLUSCq2IugB9jS73R5lb7NDSQllKuu+FAQ5Or//OeI5YnQYZowqPZn8cePme+9/8OFHtz4e3B7e+eTTu/c+e6VELiPyMhJMyJMQK8JoSl5qqhk5ySTBPGTkODz73vDjcyIVFemRLjIy53iR0oRGWIPr7d2tQJO1Lg+UphxrIX1UoWBFY7LEugwUXXBcgWoyntQX6hvT1tjy2uvw7b3bh0EsopyTVEcMK3U6nWR6XmKpacRINQhyRTIcneEFKcOQOw64H3Ucp2CmmBM1L+t6KzQCT4wSIeE31aj2bqTAXKmCh6CEopbKZcZ5HTvNdfJoXtI0yzVJo+ZBSc6QFsg0D8VUkkizAgwcSQrVoGiJJY40tHgQSJKSVSQ4x2kcJJhTVsQkwTkzrUxasxpsLCfPFpKQs80WmJVJlai+VwvBHDfh2ZL8vNmzMlub1ijnYVhKXEC0XebXZQDbQ4dYVmUgc0ZOv5mS9bycjHcyXZWz8Q5ZV27EUsiLToTRzFtpGwjP3WhsxOelSuq+d3OVAZYLTlPIZJjImr+Sl62/ukaO19fLwe/KIQXl9IL0A65ILwSv/yrkkrgh01YbJuXUZU8te+qyiWUTl724ZGH5wmUHlh0Y1t12ZXBo6aEbSdMYaEZkFtyvf4zpaI6qq4qP3PgndsVPXLZn2Z7L9i3bd9kzy5657NiyY5edWHbisteWvXbZG8veuGxl2cpla8vWLissK1x2YdmFy+K1bXbswuh5CyPMyucuDTHLlhgUIYd/i9p2FSHRl4LadDnPWwqGy0imKBNpKzjH8tLhCo+W9imN7Sr2CbtSNLar+KkeM42isV1F3MkRX5tDd9ah23Vsfvw0PbtovpzGYjSUWBZlJhQ1o5CmC18tcUaUH5NIyHo+Kj+E4IUUeRo7X91skWRMaMcbmnFQf+KBQx4GjWO4ILK0iao+hFFGuhCWWAPVCfM5pqnfUSKERiOkBCcIx3FdBGaoiauHY0y4GPQKht0U+RpmjvK7tcNoECvV7w/OtYBNiLvivqrO13d3OjnOYCdzIbOlSYBGKBVUFahpeS8wodrU11wjBLcawlq1GceREDKmKdbXBHffGQTHEq9MMOwKZBHCCQxt4xygd10jZFv+rs3TLx4WG1+1FF5qx5+ZZsGrMXKlCwazvUhFTKrd04hKOCL5sDAanfmonlM5RwoGz+5k/DDifl3QbsiglPuPJvNuFhEqIs9J/K/zTCcTH7rMWHs/28zLoM+p/r+ry0waSGLOUDhddPLQNIV3AV2CJLOd/to20pi+kn+W68HfpNIYzsm2rLYBOYH6fac7Nu00007l2+Z0280bMdr0yy7vv+aezkzuAZzBp+6Ju2+8mo2n2+PZjw+2Hu+1p/Fb3hfel95X3tR76D32fvAOvZde5P3i/er95v0+vDPcHn47/K6R3rzRxnzubVzDgz8B3symuw==

    Standard error,pV(b�)

    AAAMo3icpZZtb9s2EMfV7qlzvTXdXu4Nu8BAN2iG7TZogSFAm6XAiqFFliZN2sgIKImyiZCiSlKxHUEfZp9mb7eX+zY7SkooU1n3pCDw+X53J95ftI5hxqjSo9EfN25+8OFHH39y69Pe7f5nn9/ZuPvFayVyGZHDSDAhj0OsCKMpOdRUM3KcSYJ5yMhRePaD4UfnRCoq0gO9ysiU41lKExphDa7Tje8DTZa6eKVxGmMZIyKlkD4qUaDeSV0EHOt5GBavy/vBgsZkjsGn6Izj8pvydGNzNBxVF+oa48bY9Jpr7/Tu7b0gFlHOSaojhpU6GY8yPS2w1DRipOwFuSIZjs7wjBRhyB0HfB+0HCdgppgTNS0qHUo0AE+MEiHhP9Wo8q6VwFypFQ8h0vSlXGac17GTXCePpwVNs1yTNKpvlOQMaYGMqCimkkSarcDAkaTQDYrmWOJIg/S9QJKULCLBOYgcJJhTtopJgnNmxEwas+ytLSfPZpKQs3UJzMqkSlTXq4VgjpvwbE7erWtWZEsjjXJuhqXEK8i2y/y2CGDb6BDLsghkzsjJd2OynBaj4Vamy2Iy3CLL0s2YC3nRyjAx0ya0SYT7rgkb8Wmhkkr3dq0iwHLGaVrWG1Bk9afkReMvrwnHy+vDwe+GQwnK6QXpJlyRTgpe/lXKJXFTxk1smBRjlz237LnLRpaNXLZfXv0m9132zLJnhrW3XRHsWbrnZtI0BpoRmQX3qj9jOjEH5VXHB27+U7vipy7bsWzHZbuW7brshWUvXHZk2ZHLji07dtkby9647K1lb122sGzhsqVlS5etLFu57MKyC5fFSyt27MLoZQMjzIqXLg0xy+YYIkIOP4vKdiNCoi8DKtPlPG8oGC4jmaJMpE3AOZaXDjfwYG7vUttuxC5hVxG17Ua8qgZNHVHbbkTcqhFfW0O31qGbday//DQ9u6jfnMZiNJRYropMKGpGJE1nvprjjCg/JpGQ1dxUfgjJMynyNHbeutksyZjQjjc046B6xQOHOgyEY3hFZGELlV0Io4y0ISyxAqqV5nNMU78ViRAaDJASnCAcx1UTmKE6rxqOMeGi12kYdlPka5g5ym/3DqNBLFRXH5xrAZsQt4O7UVW9rrul5DCDncyFzOamABqgVFC1QrXkncSEatNffQ0QfNWQ1kSbcRwJIWOaYn1NcvuZQXIs8cIkw65AFiGcwNA2zh563zVAVvL3bZ5u87DY+EpSeKgtf2bEgkdjwpVeMZjtq1TEpNw+iaiEI5IPC6PRmY+qOZVzpGDwbI+GjyLuVw1thwxaufd4NG1XEaEi8pzE/7rOeDTyQWXGmu+T9boMdE71/11dZspAEXOGwumsVYemKTwLUAmKTLa6a1srY3Ql/6zWw78ppTGcn21bjQA5gf59Rx1bdpxpp/MH5nTbrhsxWutll/dfa48npnYPzuBj98TdNV5PhuMHw8nPDzef7DSn8VveV97X3n1v7D3ynng/enveoRd5v3i/er95v/cH/Z/6+/2DOvTmjSbnS2/t6k//BKjlsPU=

    Expected value, E(b�)AAAMlHicpZbdbts2FMfV7qtzvS3dgN3shl1goBs0Q3YbtBcL0KYJsF40yLqkSRsbASVRNhFS5EgqtiP4JfY0u93eYm+zQ1kJZSrrvhQEOTq/c/7iOWJ0GEtGtYmiP27dfu/9Dz786M7HnbvdTz79bOPe56+1KFRCjhLBhDqJsSaM5uTIUMPIiVQE85iR4/j8ueXHF0RpKvJDs5BkzPEkpxlNsAHX2UY4MmRuyr25JIkhKbrArCAhWqLR3oPRjKZkik050nTC8fKbs43NqB9VF2obg9rYDOrr4Oze3YNRKpKCk9wkDGt9OoikGZdYGZowsuyMCk0kTs7xhJRxzD0H3PcajlMwc8yJHpdV4UvUA0+KMqHgNzeo8q5JYK71gscQybGZap9Z503stDDZk3FJc1kYkierB2UFQ0Yg20WUUgX9YgswcKIoVIOSKVYYeqhASZGczBLBOc7TUYY5ZYuUZLhgtpdZbS47a8sp5EQRcr7eArsypTPd9hohmOcmXE7Jz+s9K+XctkZ7D8NK4QVku2V+W45gn5gYq2U5UgUjp98NyHxcRv0taZblsL9F5ks/YyrUZSPDxozr0DoRnrvW2ISPS51VfW9qlSOsJpzmoGSZkKu/ipe1f3lDOJ7fHA5+PxwkKKeXpJ1wTVopeP5XKVfETxnUsXFWDnz2wrEXPosci3z26orF5Suf7Tm2Z1lz25WjA0cP/Eyap0AlUXJ0v/qxphdzuLyu+NDPf+ZW/MxnO47t+GzXsV2fvXTspc+OHTv22YljJz5749gbn7117K3PZo7NfDZ3bO6zhWMLn106dumzdO6anfow2a9hglm579MYMznFEBFz+LeobD8iJuYqoDJ9zouaguEzIjVlIq8DLrC6cviBh1P3lJXtR+wSdh2xsv2In6o5s4pY2X5E2tBIb9QwjXWYeh3rHz9Dzy9XX05rMRorrBalFJramUjzSainWBIdpiQRqhqUOowheaJEkafeV1dOMsmE8byxHQfVJx446DBoHMMLokontGxDGGWkCWGJFdCNtJBjmoeNSIRQr4e04AThNK2KwAyt8qrhmBIuOq2CYTcloYGZo8Nm7TAaxEy3+4MLI2AT4mZwO6rSa7sbnexL2MlcKDm1AqiHckH1Aq1a3krMqLH1ra4eglsDaXW0HceJECqlOTY3JDffGSSnCs9sMuwK5BDCGQxt6+ygd1095Fr+rs3TLh4Wm163FF5qwy9ts+DV2HBtFgxm+yIXKVlunyZUwREphIXR5DxE1ZwqONIweLaj/uOEh1VB2zGDUu4/icZNFRFroi5I+q91BlEUQpcZq++H67oM+pyb/7s6aWVAxJ6hcD5p6NA8h3cBXQKR4VZ7bWsytq/kn2k9+hspg+HA7MqqG1AQqD/0uuNkB9J4lT+0p9umbsLoql9uef9VezC02h04gw/8E3fbeD3sDx72hz8+2ny6U5/G7wRfBV8HD4JB8Dh4GvwQHARHQRL8Evwa/Bb83v2y+333eXdvFXr7Vp3zRbB2dff/BP/pqb4=

    See also: Robert Tibshirani, “Recent Advances in Post-Selection Inference” (2015)

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 5 of 61

    Kinds of Validity

    Construct Validity(measurement)

    Inference Validity(studies)

    “Translation”

    Face Content Predictive Concurrent Convergent Discriminant

    Criterion Internal External

     Everything else can be ignored Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Kass, 2011, Stat. Sci. Adapted from Borgatti, 2012

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 6 of 61

     …or can it?

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 7 of 61

     Outline Problems with/Failures of:   Meaning and measurement   Central tendency   Causality   Capturing variability   Cross-validation

    Reflection More specific

    More general; problems percolate downward in this hierarchy

    ML

    STS

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 8 of 61

     Meaning and measurement

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 9 of 61

     Meaning-making “During the writing of this book, my first grandchild was born. The hospital records document her weight, height, health[;] the mother’s condition, length of labor, time of birth, and hospital stay… These are physiological and institutional metrics. When aggregated across many babies and mothers, they provide trend data about the beginning of life—birthing.”

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 10 of 61

     Meaning-making “But nowhere in the hospital records will you find anything about what the birth of Calla Quinn means. Her existence is documented but not what she means to our family, what decision-making process led up to her birth, the experience and meaning of the pregnancy, the family experience of the birth process, and the familial, social, cultural, political, and economic context…” (Patton, 2015)

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 11 of 61

     Experience

      “A white woman can say that a neighborhood is ‘sketchy’ and most people will smile and nod. She felt unsafe, and we automatically trust her opinion. A black man can tell the world that every day he lives in fear of the police, and suddenly everyone demands statistical evidence to prove that his life experience is real.”

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 12 of 61

     Validating measurements

    Kinds of Validity

    Construct Validity(measurement)

    Inference Validity(studies)

    “Translation”

    Face Content Predictive Concurrent Convergent Discriminant

    Criterion Internal External

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Adapted from Borgatti, 2012

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 13 of 61

     Solutions   See quantitative research as building on what is qualitatively known, not replacing it   Think about measurement! Work with experts in validation   Integrate qualitative research for: – Needs assessments prior to ML – Annotation for training data –  Evaluating implementations of ML systems

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 14 of 61

     Central tendency

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 15 of 61

     The world as a data matrix “it is striking how absolutely these assumptions contradict those of the major theoretical traditions of sociology. Symbolic interactionism rejects the assumption of fixed entities and makes the meaning of a given occurrence depend on its location — within an interaction, within an actor's biography, within a sequence of events. “Both the Marxian and Weberian traditions deny explicitly that a given property of a social actor has one and only one set of causal implications… Marx, Weber, and work deriving from them in historical sociology all approach social causality in terms of stories, rather than in terms of variable attributes.” (Abbott, 1988)

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 16 of 61

     “Flaw of averages”

    Source: Todd Rose. Illustration: Future for Learning.

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 17 of 61

     Solutions   No matter how small the bins are, is still a

    central tendency (i.e.: mean, median, majority class, etc.) –  No longer a central tendency when the bins have an n

    of 1… but then ML and stats can do nothing but restate that datum

      Recognize that planning to the central tendency punishes outliers (Keyes 2018): plan for this!

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 18 of 61

     Causality

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 19 of 61

      Sometimes, people want causality   “A project I worked on in the late 1970s was the

    analysis of delay in criminal cases in state court systems... A large decision tree was grown, and I showed it on an overhead and explained it to the assembled Colorado judges. One of the splits was on District N which had a larger delay time than the other districts. I refrained from commenting on this. But as I walked out I heard one judge say to another, ‘I knew those guys in District N were dragging their feet.’” (Breiman, 2001)

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 20 of 61

    35

    30

    25

    20

    10

    5

    15

    0

    0 5 10 15

    Chocolate Consumption (kg/yr/capita)

    Nob

    el L

    aure

    ates

    per

    10

    Mill

    ion

    Popu

    latio

    n

    Poland

    SwitzerlandSweden

    Norway

    China Brazil

    GreecePortugal

    United States

    Germany

    France

    Finland

    Italy

    Australia

    The Netherlands

    CanadaBelgium

    United Kingdom

    Ireland

    Spain

    Austria

    Denmark

    r=0.791

    Japan

     “Predictions” are correlations

    Messerli, 2012, NEJM

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 21 of 61

     Not an obvious usage of “predict” Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 22 of 61

     Creates two types of modeling! Informative models may not fit well

    Breiman 2001: Information Shmueli 2010: Explanation

      Kleinberg et al 2015: Rain dance Mullainathan & Spiess, 2017: β-hat

    Correlations may “predict” well

    Breiman, 2001: Prediction Shmueli, 2010: Prediction

      Kleinberg et al., 2015: Umbrella Mullainathan & Spiess, 2017: y-hat

    AAAB+HicbZBNS8NAEIY39avWj0Y9egkWwVNJRNBj0YvHCvYD2lA2m0m7dLMJuxOlhv4SLx4U8epP8ea/cdvmoK0vLDy8M8PMvkEquEbX/bZKa+sbm1vl7crO7t5+1T44bOskUwxaLBGJ6gZUg+ASWshRQDdVQONAQCcY38zqnQdQmifyHicp+DEdSh5xRtFYA7vaf+QhjCjm/QCQTgd2za27czmr4BVQI4WaA/urHyYsi0EiE1Trnuem6OdUIWcCppV+piGlbEyH0DMoaQzaz+eHT51T44ROlCjzJDpz9/dETmOtJ3FgOmOKI71cm5n/1XoZRld+zmWaIUi2WBRlwsHEmaXghFwBQzExQJni5laHjaiiDE1WFROCt/zlVWif1z3Ddxe1xnURR5kckxNyRjxySRrkljRJizCSkWfySt6sJ+vFerc+Fq0lq5g5In9kff4APIGTcw==

    AAAB8nicbZDLSsNAFIYn9VbrrerSzWARXJVEBF0W3bisYC+QhjKZTNqhk5kwc6KE0Mdw40IRtz6NO9/GaZuFtv4w8PGfc5hz/jAV3IDrfjuVtfWNza3qdm1nd2//oH541DUq05R1qBJK90NimOCSdYCDYP1UM5KEgvXCye2s3ntk2nAlHyBPWZCQkeQxpwSs5Q+eeMTGBIp8Oqw33KY7F14Fr4QGKtUe1r8GkaJZwiRQQYzxPTeFoCAaOBVsWhtkhqWETsiI+RYlSZgJivnKU3xmnQjHStsnAc/d3xMFSYzJk9B2JgTGZrk2M/+r+RnE10HBZZoBk3TxUZwJDArP7scR14yCyC0QqrndFdMx0YSCTalmQ/CWT16F7kXTs3x/2WjdlHFU0Qk6RefIQ1eohe5QG3UQRQo9o1f05oDz4rw7H4vWilPOHKM/cj5/ANfhkZs=

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 23 of 61

      “True” model can predict worse!

    100 150 200

    0.000

    0.005

    0.010

    0.015

    0.020

    Mean Squared (test) Error over 1,000 runs

    MSE

    Density

    TrueUnderspecifiedAll−subsetStepwiseRidgeLasso

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Simulation of Shmueli, 2010, Stat. Sci

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 24 of 61

      Solution: Determine type of problem

      By “prediction”, we mean correlation (Caruana et al., 2015; Doshi-Velez & Kim, 2017): communicate this!!

      If not a “prediction policy problem”, then machine learning may not be appropriate (whether explainable/interpretable or not!)

      Then: causal modeling, or statistical modeling of data-generating process to get “explanations” (causality lite)

      Other benefits to causal knowledge: –  Makes predictions robust to distributions shifts (argument of

    causal learning literature; Spirtes & Zhang, 2016) –  Allows us to intervene

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 25 of 61

     Capturing variability

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 26 of 61

     When data don’t capture key variability (Google Flu Trends)

    2.5

    5

    0

    2.5

    5

    0

    2.5

    5

    0

    2.5

    ILI p

    erce

    ntag

    e

    0

    5Data available as of 4 February 2008

    Data available as of 3 March 2008

    Data available as of 31 March 2008

    Data available as of 12 May 2008

    40 43 47 51 3Week

    7 11 15 19

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Ginsberg et al., 2012, Nature Santillana et al., 2014, Am. J. Prev. Med.

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 27 of 61

     Real-world testing of ML results van’t Veer et al. (2002) found 70 genes correlated with developing breast cancer   Of course the

    correlations were optimal, post-hoc. But did it generalize?

    10

    20

    30

    40

    50

    60

    70

    Tum

    ours

    5

    10

    15

    Tum

    ours

    AL08

    0059

    Con

    tig 6

    3649

    RC

    LOC

    5120

    3C

    ontig

    462

    18R

    CC

    ontig

    382

    88R

    CAA

    5550

    29R

    CC

    ontig

    285

    52R

    CFL

    T1M

    MP9

    DC

    13EX

    T1AL

    1377

    18PK

    428

    HEC

    ECT2

    GM

    PSC

    ontig

    321

    85R

    CU

    CH

    37C

    ontig

    352

    51R

    CKI

    AA10

    67G

    NAZ

    SER

    F1A

    OXC

    TO

    RC

    6LL2

    DTL

    PRC

    1AF

    0521

    62C

    OL4

    A2KI

    AA01

    75R

    AB6B

    Con

    tig 5

    5725

    RC

    DC

    KC

    ENPA

    SM20

    MC

    M6

    AKAP

    2C

    ontig

    564

    57R

    CR

    FC4

    DKF

    ZP56

    4D04

    62SL

    C2A

    3M

    P1C

    ontig

    408

    31R

    CC

    ontig

    242

    52R

    CFL

    J111

    90C

    ontig

    514

    64R

    CIG

    FBP5

    IGFB

    P5C

    CN

    E2ES

    M1

    Con

    tig 2

    0217

    RC

    NM

    ULO

    C57

    110

    Con

    tig 6

    3102

    RC

    PEC

    IAP

    2B1

    CFF

    M4

    PEC

    ITG

    FB3

    Con

    tig 4

    6223

    RC

    Con

    tig 5

    5377

    RC

    HSA

    2508

    39G

    STM

    3BB

    C3

    CEG

    P1C

    ontig

    483

    28R

    CW

    ISP1

    ALD

    H4

    KIAA

    1442

    Con

    tig 3

    2125

    RC

    FGF1

    8–1 0 1

    –1 0 1

    Sporadic breast tumourspatients

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 28 of 61

    High

    Both tests agree, high

    risk

    Both tests agree, low

    risk

    “Clinical” risk Low

    High

    Low

    Model says treat,

    doctor says don’t

    Doctor says treat, model says don’t

    Risk via correlations

    with gene expression

    Treat with chemo

      Implementation testing

    Don’t treat with chemo

    ???

    Cardoso et al., 2016, NEJM

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 29 of 61

    High

    Both tests agree, high

    risk

    Both tests agree, low

    risk

    Low

    High

    Low

    Chemo-therapy is

    worse!

    Chemo-therapy is

    similar

    Risk via correlations

    with gene expression

      Implementation testing “Clinical” risk

    Treat with chemo

    Don’t treat with chemo

    ???

    Cardoso et al., 2016, NEJM

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 30 of 61

    High

    Both tests agree, high

    risk

    Both tests agree, low

    risk

    Low

    High

    Low

    Chemo-therapy is

    worse!

    Chemo-therapy is

    similar

    Risk via correlations

    with gene expression

      Implementation testing “Clinical” risk

    Treat with chemo

    Don’t treat with chemo

    Cardoso et al., 2016, NEJM

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Finding: Machine learning alone would make things worse. But as a secondary diagnosis, on average it catches false positives and avoids unhelpful chemo!

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 31 of 61

      Implementation testing: Details

      Before experiment (training data)

      High model risk, low clinical risk: randomize. Chemo worse!

      Low model risk, high clinical risk: chemo makes no difference

    Baseline Survival (no chemotherapy)S

    urv

    iva

    l w

    ith

    ou

    t D

    ista

    nt

    Me

    tasta

    sis

    (%

    )

    100

    80

    90

    70

    60

    40

    30

    10

    50

    20

    0

    0 2 431 5 6 7 98

    Year

    Low clinical, high genomic

    Low clinical and genomic

    High clinical,

    low genomic

    High clinical and genomic

    Clinicial says high risk, Model says low risk

    100

    80

    90

    70

    60

    40

    30

    10

    50

    20

    0

    0 1 2 3 4 5 6 7 8 9

    Year

    Chemotherapy

    No chemotherapy

    90 1 2 3 4 5 6 7 8

    80

    0

    85

    90

    95

    100

    Year

    Clinicial says low risk, Model says high risk

    100

    80

    90

    70

    60

    40

    30

    10

    50

    20

    0

    0 1 2 3 4 5 6 7 8 9

    Chemotherapy

    No chemotherapy

    90 1 2 3 4 5 6 7 8

    85

    0

    95

    80

    90

    100

    Baseline Survival (no chemotherapy)S

    urv

    iva

    l w

    ith

    ou

    t D

    ista

    nt

    Me

    tasta

    sis

    (%

    )

    100

    80

    90

    70

    60

    40

    30

    10

    50

    20

    0

    0 2 431 5 6 7 98

    Year

    Low clinical, high genomic

    Low clinical and genomic

    High clinical,

    low genomic

    High clinical and genomic

    Clinicial says high risk, Model says low risk

    100

    80

    90

    70

    60

    40

    30

    10

    50

    20

    0

    0 1 2 3 4 5 6 7 8 9

    Year

    Chemotherapy

    No chemotherapy

    90 1 2 3 4 5 6 7 8

    80

    0

    85

    90

    95

    100

    Year

    Clinicial says low risk, Model says high risk

    100

    80

    90

    70

    60

    40

    30

    10

    50

    20

    0

    0 1 2 3 4 5 6 7 8 9

    Chemotherapy

    No chemotherapy

    90 1 2 3 4 5 6 7 8

    85

    0

    95

    80

    90

    100

    Baseline Survival (no chemotherapy)S

    urv

    iva

    l w

    ith

    ou

    t D

    ista

    nt

    Me

    tasta

    sis

    (%

    )

    100

    80

    90

    70

    60

    40

    30

    10

    50

    20

    0

    0 2 431 5 6 7 98

    Year

    Low clinical, high genomic

    Low clinical and genomic

    High clinical,

    low genomic

    High clinical and genomic

    Clinicial says high risk, Model says low risk

    100

    80

    90

    70

    60

    40

    30

    10

    50

    20

    0

    0 1 2 3 4 5 6 7 8 9

    Year

    Chemotherapy

    No chemotherapy

    90 1 2 3 4 5 6 7 8

    80

    0

    85

    90

    95

    100

    Year

    Clinicial says low risk, Model says high risk

    100

    80

    90

    70

    60

    40

    30

    10

    50

    20

    0

    0 1 2 3 4 5 6 7 8 9

    Chemotherapy

    No chemotherapy

    90 1 2 3 4 5 6 7 8

    85

    0

    95

    80

    90

    100

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    (Note: still limitations in how experimental subjects may be unrepresentative.)

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 32 of 61

      Solutions to capturing variability   GFT is a prediction policy problem: problem wasn’t

    overfitting, or causality (although causality may have helped), but that there was key variability (non-winter flu) that had not yet been observed

      Rhetoric: Emphasize that cross-validation does not give performance guarantees, only suggestive of performance until testing is done

      Real-world testing reveals how system can be used!   (Also: experimental data can introduce full amount

    of variability, even if not doing causal inference)

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 33 of 61

     Cross validation

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 34 of 61

     Generalizability through CV   Generalizability is shown (at the very least)

    through cross validation   CV can go wrong in known ways: –  improper splitting –  publication bias (Gayo-Avello, 2012) –  overfitting to the test set (Dwork et al. 2015, Park

    2012)   Not systematically acknowledged: dependencies

    among observations

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 35 of 61

     Classic argument for CV Training:

    Testing: The difference is the optimism (Efron, 2004; Rosset & Tibshirani, 2018):

    Err(µ̂) = 1nEf kY⇤ � bY k22

    = 1n

    htr⌃+ kµ� E(bY )k22 + tr Varf (bY )� 2 tr Covf (Y ⇤, bY )

    i

    AAAO9HicpVdbb9s2FHa6W+etWbo97oVd4aEX15DcZM1LgDZNihZoi6xNc6nlBpRE2UQoUSOpxI7qf7K3Ya/7P9uv2aEkhzLldTcZBo7O952P5xxSIuWnjErlOL+vXPno408+/ezq5+0vvry2+tXa9a8PJM9EQN4EnHFx5GNJGE3IG0UVI0epIDj2GTn0Tx9r/PCMCEl5sq+mKRnGeJTQiAZYgetk7Q9vV4hb3hir3Iuz2W30/RbyVCRwkLuzPJkhb/ckQt57dPzuDrqHvHMaEk0+nnnv3/VP+sjz2o0QRiI1AJ9A3ms6ijG6qxVAXyvs3qqJ3K5U7pbsAyy8hDMaUyVPogUihPZL0mN+VidBYl20ICnoaKyG6GTtptNzigs1Dbcybraqa+/k+rUrXsiDLCaJChiWcuA6qRrmWCgaMDJre5kkKQ5O8Yjkvh9bDrjv1BwDMBMcEznMi2maoQ54QhRxAf9EocK7IIFjKaexD8wYq7G0Me1chg0yFW0Oc5qkmSJJUA4UZQwpjvSco5AKEig2BQMHgkI1KBhjmDAFK6PtCZKQ84DHMU5CL8IxZdOQRDhjsCZkVJmz9kI6WToShJwutkBnJmQkm17FObPcJE7H5KfFnuXpRLdGWoNhIfAUok2ad3IPVrXysZjlnsgYGdxzyWSYO72NVM3yfm+DTGZ2xJiLi1qE5gwrahVojTspJ67tQQ/g+Sru9GTCMn+6/+L5LN/ZdLd/cJqE/pzg7m5u7m40CffnhCdPHrvOgyZhfU5wnPWddWcxsUEQD3MZFeuhXmOuH2Yor5iHONc3Fk7qOGniWIximlQUnl5SK/8yOp4sp4PfpoMEPLUXpBlwiTRC8OSvQuaIHaJEkww+m+ZWLD+C15aFPTPYMxtzDObY2Ks55uevbGzXYLs2doCX5KydNhHefU2idmpi/TnOvT0z3p4tQ5MQ0JSI1LtR/LRpcfbNQtm34x+ZHjyysW2DbdvYjsF2bMx/bsDnNvjCYC9s7NBghzZ2ZLAjGzs22LGNvTXYWxs7N9i5jU0MNrGxqcGmNnZhsAsbCydmJkIbDF5WYIBZ/rLRU8zSMQaGH8PDWtg2wydqTihMGyeppIwnFeUMi7nDJsLpoeSAYWNSnwIquLRtxv7Y5FHaNmOHsEtGaduM17VRXi8dJaxphEs1VC0PtTSPEY4vRylteDkvbBuKnl6Ue522GPUFFtM85ZLqMxdNRl05ximR3RDe96I4iMmuD8EjwbMktPbJdBSljCvL6+sNvNiUAQcdBhPD8JSI3AjNmiAcPkgdhBQLQNbCujGmSbfGRAh1OkjymCAchkURmKEyrjjOhCTm7UbBsCKDroJTguzWa4fNnJ/LZn9wpjgsZFwnN1mFXtNd62Qvhach5iIdawHUQQmncorKljcCI6p0feXVQXCrIKxi6wNUwLkIaYLVkuD6nEFwKPC5DoZ1gwyEcATHLO1sow9dHWRa/qHF0ywekg0vWwqTWvOnulkwNZou1ZTBaWya8JDMtgYBFXCo7UJiNDjtomIHzmIkYUvdcnoPgrhbFLTlMyjlxqYzrKtwXxJxRsJ/reM6The6zFh131/UZdDnRP3f7FItAyL61IuTUU2HJgnMBXQJRPobzdwWZHRfyT/TWv8bKYXhg8yUVTUgI1B/1+qOkXVTZVV+X3+P1HUDRst+mfT+q7bb19pt+Gpy7W+kpnHQ77n3exs/rt98uF19P11tfdv6rnWr5bYetB62nrb2Wm9awcrDlWiFr6SrZ6s/r/6y+mtJvbJSxXzTWrhWf/sT5pOQXg==

    Opt(µ̂) = Err(µ̂)� err(µ̂) = 2n tr Covf (Y , bY )AAAOiHicpZddb9s2FIaVtts6b03b7XI37AID7eAakpss2UWANh9FC7Rd1qZN0sgIKImyiVCiRlKJHUH/cjf7K7vaoSyHMuV1XzIMHJ3nPYfkISVSQcaoVK77+8qNm7c++/yL2192vvr6zurde/e/+SB5LkLyPuSMi+MAS8JoSt4rqhg5zgTBScDIUXC+q/nRBRGS8vRQTTMyTPAopTENsQLX2b3UT7Aai6T4OVPlQ3+MVeEnefkIbSN/X4im5zHyyaIHNCoWOCwGZZGWcCOQv8sv/JQzmlAlz+KHJz3kX9KI6KCT8tHZvTW371YXahtebaw59XVwdv/ODT/iYZ6QVIUMS3nquZkaFlgoGjJSdvxckgyH53hEiiBILAfcdxuOUzBTnBA5LKrSlagLngjFXMA/VajyLqTAiZTTJAClLpS0mXYuY6e5ireGBU2zXJE0nDUU5wwpjvQ8oIgKEio2BQOHgsJoUDjGUEwFs9XxBUnJZciTBKeRH+OEsmlEYpwzqL2Ma7PsLHQnz0aCkPPFElTTK2PZ9irOmeUmSTYmvy7WrMgmujTSagwLgacQbbr5Q+HDSlMBFmXhi5yR08cemQwLt78Ba6sY9DfIpLQjxlxcNSK0ZlhL60Cr3cls4jo+1ADWfHWnJ9MrixeHr1+Vxd6Wt/Oj2xYM5gJvf2trf6MteDIXPH++67mbbcH6XOC663vr7mLHTsNkWMi4Wg/NMRb6MYLh1Y+ZvrE4aXLS5liMEprWEp5dS2v/MjmeLJeD35ZDCnhWr0g74Jq0QvDkr0LmxA5Roi0Gny3zalUQF57NXhr20mauYa7N3s5ZULy12b5h+zb7gJf0WTttIbzx2kLt1MLmc1z4B6a9AzsNTSOgGRGZ/6D6adPSHJqFcmjHPzM1eGazHcN2bLZn2J7NglcGvrLha8Ne2+zIsCObHRt2bLMTw05s9tGwjza7NOzSZhPDJjabGja12ZVhVzaLJmYmIhuGb2oYYla8adUUs2yMQREk8LBWtq0IiJoLKtPmJJOU8bSWXGAxd9hC2KVnGjBsJim8D2o8s23F4dj0Y2bbij3CrhUz21a8a7TybmkrUSNHtDSHavRDLe3HCCfXrcxseDkvbBuKnl/N9jptMRoILKZFxiXV5yCajnpyjDMiexG870V1OJK9AIJHgudpZO2T2SjOGFeWN9AbeLUpA4c8DCaG4SkRhUlUtiEcPkgTQhcrIBthvQTTtNdQIoS6XSR5QhCOomoQmKFZXHWciUjCO60Bw4oMewpOCbLXHDts5vxStuuDc8VhIeOmuK2q8rXdjUr2M3gaEi6ysU6AuijlVE7RrOStwJgqPb7Z1UVwqyCsVusDVMi5iGiK1ZLg5pxBcCTwpQ6GdYMMQjiGY5Z2dtCnri4yJf/U4mkPHjobXZcUJrXhz3SxYGq0XKopg9PYNOURKbdPQyrgUNuDjtHwvIeqHThPkIQtddvtb4ZJrxrQdsBgKA+23GEzCw8kERck+td5PNftQZUZq+8Hi3kZ1DlV/7d3mU4DSfSpF6ejRh6apjAXUCVIMtho920hja4r+We51v8mlcLwkWSGVRcgJzD+nlUdk9bLlDXyJ/p7pJk3ZHRWL9O9/5rbG+jcHfhq8uxvpLbxYdD3nvQ3fllfe7pTfz/ddr5zvnceOp6z6Tx1XjgHznsndH5z/li5uXJrtbPqrm6u/jST3lipY751Fq7VnT8B93hrRQ==

    err(µ̂) = 1nEf kY � bY k22

    = 1n

    htr⌃+ kµ� E(bY )k22 + tr Varf (bY )� 2 tr Covf (Y , bY )

    i

    AAAO8HicpVdbb9s2FHa6W+etWbs97oVd4aHdXENykzUPC9CmcdECbZG1aS61vICSKJsIJWokldhR/T/2Nux1P2jA/s0OJTmUKa+7yTBwdL7vfDznkBIpP2VUKsf5Y+3Ke+9/8OFHVz9uf/LptfXPrt/4/EDyTATkdcAZF0c+loTRhLxWVDFylAqCY5+RQ//0kcYPz4iQlCf7apaSUYzHCY1ogBW4Tq7/7hEhbnsTrHIvzuZ30NfbyFORwEHuzvNkjrzBSYS8t+gY3UXeOQ2Jph7Pvbc/9k/6yPPajQBGIjUEn0DeKzqOMfpWx4O6VhjcroncqVS+LdkHWHgJZzSmSp5ES0QI7ZekR/ysTjruoiWeJ+h4okYn1285Pae4UNNwK+NWq7r2Tm5cu+KFPMhikqiAYSmHrpOqUY6FogEj87aXSZLi4BSPSe77seWA+07NMQQzwTGRo7yYojnqgCdEERfwTxQqvEsSOJZyFvvAjLGaSBvTzlXYMFPR1iinSZopkgTlQFHGkOJIzzcKqSCBYjMwcCAoVIOCCYbpUrAq2p4gCTkPeBzjJPQiHFM2C0mEMwbrQUaVOW8vpZOlY0HI6XILdGZCRrLpVZwzy03idEJ+Wu5Znk51a6Q1GBYCzyDapPlN7sGKVj4W89wTGSPDuy6ZjnKnt5mqed7vbZLp3I6YcHFRi9CcUUWtAq1xp+XEtT3oATxbxZ2eTFjkT/afP5vnu1vuzndOk9BfENzB1tZgs0m4tyA8fvzIde43CRsLguNs7G44y4kNg3iUy6hYD/Uac28gdHnFPMS5vrFwUsdJE8diHNOkovD0klr5V9HxdDUd/DYdJOCZvSDNgEukEYKnfxWyQOwQJZpk8Nk0t2L5Eby0LOypwZ7amGMwx8ZeLjA/f2ljA4MNbOwAr8hZO20ivPmaRO3UxPpznHt7Zrw9W4YmIaApEal3s/hp0+Lsm4Wyb8c/ND14aGM7BtuxsV2D7dqY/8yAz2zwucGe29ihwQ5t7MhgRzZ2bLBjG3tjsDc2dm6wcxubGmxqYzODzWzswmAXNhZOzUyENhi8qMAAs/xFo6eYpRMMDD+Gh7WwbYZP1IJQmDZOUkkZTyrKGRYLh02Ek0PJAcPGpD4DVHBp24z9icmjtG3GLmGXjNK2Ga9qo7xaOUpY0whXaqhaHmplHmMcX45S2vByXto2FD29KPc6bTHqCyxmecol1ectmoy7coJTIrshvO9FcQiTXR+Cx4JnSWjtk+k4ShlXltfXG3ixKQMOOgwmhuEZEbkRmjdBOHyQOggpFoCshXVjTJNujYkQ6nSQ5DFBOAyLIjBDZVxxnAlJzNuNgmFFBl0FpwTZrdcOmzk/l83+4ExxWMi4Tm6yCr2mu9bJXgpPQ8xFOtECqIMSTuUMlS1vBEZU6frKq4PgVkFYxdYHqIBzEdIEqxXB9TmD4FDgcx0M6wYZCOEIjlna2UbvujrItPxdi6dZPCQbXrYUJrXmT3WzYGo0XaoZg9PYLOEhmW8PAyrgUNuFxGhw2kXFDpzFSMKWuu307gdxtyho22dQys0tZ1RX4b4k4oyE/1rHdZwudJmx6r6/rMugz4n6v9mlWgZE9KkXJ+OaDk0SmAvoEoj0N5u5LcnovpJ/prXxN1IKw8eYKatqQEag/q7VHSPrpsqq/J7+HqnrBoyW/TLp/Vdtt6+12/DV5NrfSE3joN9z7/U2f9i49WCn+n662vqy9VXrdstt3W89aD1p7bVet4K179f8tdM1ti7Wf17/Zf3XknplrYr5orV0rf/2J09mj0Y=

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 36 of 61

     Apply this to non-iid data   Imagine we have, for and

      Then, optimism in the training set is:

      But test set also has nonzero optimism!

    Y1Y2

    �⇠ N

    ✓XX

    ��,

    ⌃ ⇢�211T

    ⇢�211T ⌃

    �◆

    AAAO6XicpZfdbts2FMed7qvz1qzdLnfDrnDRDZ4hucmSmwBtmhQt0BZZmzZJLTegJMomQooaSSV2BD/E7obd7oF2vbfZoayYMuV1XwoSHJ7f/xySh1RIhRmjSnveH2vXPvjwo48/uf5p+7PPb6x/cfPWl2+UyGVEXkeCCXkcYkUYTclrTTUjx5kkmIeMHIVnjww/OidSUZEe6mlGhhyPUprQCGtwnd78PQjJiKZFyLGWdDI7OfVREKCT0z4KSBov/ChQlKMgeoECRhJ9DzlxKDg2cebvclgYEo27TXn4io44RndRIMfCZIfWO+gUuvffHZa5VoK7i1CnI0lHY/3t6c07Xs8rH9Q0/Mq406qeg9NbN64FsYhyTlIdMazUwPcyPSyw1DRiZNYOckUyHJ3hESnCkDsOaHdqjgGYKeZEDYtybWaoA54YJULCb6pR6V1KgblSUx6CEqYyVi4zzlVskOtke1jQNMs1SaN5R0nOkBbILDSKqSSRZlMwcCQpzAZFYyxxpGE7tANJUnIRCc5xGgcJ5pRNY5LgnOkiUEllztpLw8mzkSTkbLkEZmRSJarp1UIwx014NiY/LdesyCamNMrpDEuJpxBth/ldEcBW1iGWsyKQOSOD730yGRZebzPTs6Lf2ySTmRsxFvKyFmE0w0paBTr9TuYL1w6gBvBSlS2zmP6seHL4/Nms2Nv2d3/wmoL+lcDf397e32wK7l8JHj9+5HtbTcHGlcDzNvY2vOWBDSI+LFRS7of6HItgX5rplevAC9NwOKlz0uRYjjhNK4nIFtLKv0qOJ6vl4HflkIJyekmaAQvSCMGTvwq5Im6Ilk0x+FyZX6nCpPBd9tSypy7zLPNc9vKKhcVLl+1btu+yN3jFmI3TFT4S502hcRph/T0uggPb34GbhqYx0IzILLhd/hjT0RzajXLoxj+0NXjosl3Ldl22Z9mey8JnFj5z4XPLnrvsyLIjlx1bduyyE8tOXPbWsrcuu7DswmUTyyYum1o2ddmlZZcuiyd2JWIXRi8qGGFWvGjUFLNsjEERcnhZS9tVmDO5EpSmy0mmKBNpJTnH8srhCnleacBwWXlyV3huu4rDsR3H3HYVe4QtFHPbVbyq9fJqZS9xLUe8MoeujUOvHMcI80Uvcxv+OS8dG5qeXc7POmMxGkosp0UmFDUXLZqOumqMM6K6Mfy/l+XtS3VDCB5Jkaexc05moyRjQjve0Bzg5aEMHPIwWBiGp0QWNtGsCeHyQeoQhlgCVQvrckzTbk2JEOp0kBKcIBzH5SQwQ/O48joTEy7ajQnDjoy6Gm4JqlufOxzm4kI164NzLWAj47q4qSrzNd21SvYyeBu4kNnYJEAdlAqqpmhe8kZgQrWZ3/zpIGhqCKvU5gIVCSFjmmK9Iri+ZhAcS3xhgmHfIIsQTuCaZZxt9L6ng2zJ37d5mpOHwcaLksKi1vyZKRYsjZErPWVwG5umIiaznUFEJVxquzAwGp11UXkC5xwpOFJ3vN5WxLvlhHZCBlO5ve0N61lEqIg8J/G/zuN7XheqzFjV7i/nZVDnVP/f0WUmDSQxt16cjmp5aJrCWkCVIEl/szm2pTSmruSf5dr4m1Qaw1eYnVZVgJzA/LtOdWxaP9POzO+b75F63ojReb3s8P5rbr9vcrfhq8l3v5Gaxpt+z7/f2/xx486D3er76Xrr69Y3rXstv7XVetB60jpovW5Fa5trg7V4jayfrf+8/sv6r3PptbUq5qvW0rP+25/Fmo3e

    2n tr Covf (Y1,

    bY1) = 2n tr Covf (Y1,HY1) =2n trHVarf (Y1) =

    2n trH⌃

    AAAO03icpZfdbts2FMed7qvz1izdLncxdoWBbnANyU3W3ARo06RogbbL2nw2MgJKomwilKiRVGJH0M2w2z3QHmVvs0NJDmXK7bpNRZGj8/ufo8NDyqT8lFGpHOevlRsfffzJp5/d/Lz7xZe3Vr9au/31oeSZCMhBwBkXxz6WhNGEHCiqGDlOBcGxz8iRf/5E86MLIiTlyb6apWQU43FCIxpgBa6ztT89FQkc5MMiTwrkKYG8J/zCSzijMVXyLLp3cub2kXdJQzLBKj8pztwf0Bb6wDA/zp8V6ORdMSX2DrGwIt+jhj9v6DjG6GztrjNwygu1Dbc27nbqa+/s9q0bXsiDLCaJChiW8tR1UjXKsVA0YKToepkkKQ7O8Zjkvh9bDrjvNRynYCY4JnKUl7NQoB54QhRxAf8ThUrvQgocSzmLfVDGWE2kzbRzGTvNVLQ5ymmSZookQfWgKGNIcaSnFIVUkECxGRg4EBRGg4IJhuYpmPiuJ0hCLgMexzgJvQjHlM1CEuGMqdyTUW0W3YVysnQsCDlfbIGuTMhItr2Kc2a5SZxOyK+LPcvTqW6NtB6GhcAziDZl/ph7sGiVj0WReyJj5PS+S6aj3BlspKrIh4MNMi3siAkXV40IrRnV0jrQeu60mriuBz2A16e805PpFvmz/Zcvinxn093+yWkLhnOBu7u5ubvRFjyYC54+feI6D9uC9bnAcdZ31p3Fwk6DeJTLqFwPzTHm3q7QwyvnIc71jcVJk5M2x2Ic06SW8PRaWvuXyfF0uRz8thxSwDt8RdoB16QVgqfvCpkTO0SJthh8tsytVX6UuzZ7bthzmzmGOTZ7PWd+/tpmu4bt2uwQL6lZO20h/Ia2hdqphc33OPf2zPP27DQ0CYGmRKTenfKfNi3Nvlko+3b8Y9ODxzbbNmzbZjuG7djMf2HgCxu+NOylzY4MO7LZsWHHNjsx7MRmbw17a7NLwy5tNjVsarOZYTObXRl2ZbNwamYitGHwqoYBZvmrVk8xSycYFLAzepVtK3yi5oLStDlJJWU8qSUXWMwdtjDOag0YNpN6Q65xZduK/Ympo7JtxQ5h14rKthVvGk95s/QpYSNHuDSHatShltYxxvH1UyobfpwXtg1Fz6+qvU5bjPoCi1meckn1kYom476c4JTIfgi/96I8Z8m+D8FjwbMktPbJdByljCvL6+sNvNyUgUMeBhPD8IyI3CQq2hAOH6QJocQSyEZYP8Y06TeUCKFeD0keE4TDsBwEZqiKK48zIYl5tzVgWJFBX8EpQfabY4fNnF/Kdn9wpjgsZNwUt1Vlvra70clBCm9DzEU60QlQDyWcyhmqWt4KjKjS46uuHoJbBWG1Wh+gAs5FSBOslgQ35wyCQ4EvdTCsG2QQwhEcs7Szi9539ZBp+fsWT3vwUGx43VKY1IY/1c2CqdFyqWYMTmOzhIek2DoNqIBDbR8Ko8F5H5U7cBYjCVvqljN4GMT9ckBbPoOh3Nl0Rs0s3JdEXJDwX+dxHacPXWasvh8u5mXQ50T93+pSnQaS6FMvTsaNPDRJYC6gS5BkuNGubSGN7iv5sFzr/5BKYfjeMsOqG5ARGH/f6o5J66bKGvkD/T3SzBswWvXLlPdfc7tDnbsLX02u/Y3UNg6HA/fBYOOX9buPtuvvp5udbzvfd+513M7DzqPOs85e56ATrHy3srvyauXn1YPVfPW31d8r6Y2VOuabzsK1+sffAg+GZw==

    2n tr Covf (Y2,

    bY1) = 2n tr Covf (Y2,HY1) =2⇢�2

    n trH11T = 2⇢�2

    AAAOy3icpZfdbts2FMed7qvz1qzdLnaxG3aFgW5wDclNltwEaPPRtUBTZG3aJI3cgJIomwglaiSV2NF0uQfa4+xtdijJoUx5XbcpCHx0fv9zSB5SIuWnjErlOH+u3Pjo408+/ezm590vvry1+tXtO1+/kTwTAXkdcMbFsY8lYTQhrxVVjBynguDYZ+TIP9/R/OiCCEl5cqhmKRnFeJzQiAZYgevs9h+eigQO8mGRJwXylEDeDr/wEs5oTJU8i+6fnA37yLukIZlglZ8UZ+4PaAt9YJgf508LdLIYgzwx4ciTdBzjd40MlVj/uPOfd4cQN1zQn92+5wyc8kJtw62Ne536Oji7c+uGF/Igi0miAoalPHWdVI1yLBQNGCm6XiZJioNzPCa578eWA+57DccpmAmOiRzlZfUL1ANPiCIu4D9RqPQupMCxlLPYB2WM1UTaTDuXsdNMRZujnCZppkgSVA1FGUOKIz2VKKSCBIrNwMCBoDAaFEwwlFjBhHc9QRJyGfA4xknoRTimbBaSCGdM5Z6MarPoLnQnS8eCkPPFEuieCRnJtldxziw3idMJ+XWxZnk61aWRVmNYCDyDaNPNH3MPFqvysShyT2SMnD5wyXSUO4P1VBX5cLBOpoUdMeHiqhGhNaNaWgda7U6riet6UAN4bMo7PZlukT893H9e5Lub7vZPTlswnAvcvc3NvfW24OFc8OTJjutstAVrc4HjrO2uOYsdOw3iUS6jcj00x5h7e0IPr5yHONc3FidNTtoci3FMk1rC02tp7V8mx9PlcvDbckgBT/0VaQdck1YInv5dyJzYIUq0xeCzZW6t8iN4fVjsmWHPbOYY5tjs5Zz5+Uub7Rm2Z7M3eEmftdMWwruzLdROLWw+x7l3YNo7sNPQJASaEpF6d8s/bVqaQ7NQDu34x6YGj222bdi2zXYN27WZ/9zA5zbcN2zfZkeGHdns2LBjm50YdmKzt4a9tdmlYZc2mxo2tdnMsJnNrgy7slk4NTMR2jB4UcMAs/xFq6aYpRMMCtgjvcq2FT5Rc0Fp2pykkjKe1JILLOYOWxhntQYMm5U7co0r21YcTkw/KttW7BJ2rahsW/Gq0cqrpa2EjRzh0hyq0Q+1tB9jHF+3Utnwcl7YNhQ9v6r2Om0x6gssZnnKJdVHKZqM+3KCUyL7IbzvRXm+kn0fgseCZ0lo7ZPpOEoZV5bX1xt4uSkDhzwMJobhGRG5SVS0IRw+SBNCF0sgG2H9GNOk31AihHo9JHlMEA7DchCYoSquPM6EJObd1oBhRQZ9BacE2W+OHTZzfinb9cGZ4rCQcVPcVpX52u5GJQcpPA0xF+lEJ0A9lHAqZ6gqeSswokqPr7p6CG4VhNVqfYAKOBchTbBaEtycMwgOBb7UwbBukEEIR3DM0s4uet/VQ6bk71s87cFDZ8PrksKkNvypLhZMjZZLNWNwGpslPCTF1mlABRxq+9AxGpz3UbkDZzGSsKVuOYONIO6XA9ryGQzl7qYzambhviTigoT/Oo/rOH2oMmP1/XAxL4M6J+r/9i7VaSCJPvXiZNzIQ5ME5gKqBEmG6+2+LaTRdSUflmvtH1IpDN9ZZlh1ATIC4+9b1TFp3VRZI3+ov0eaeQNGq3qZ7v3X3O5Q5+7CV5NrfyO1jTfDgftwsP7L2r1H2/X3083Od53vO/c7bmej86jztHPQed0JVr5d2Vp5svLz6v6qXL1a/a2S3lipY77pLFyrv/8FcxCCYg==

    ⌃ij = ⇢�2, i 6= j

    AAAORHicpZfdbts2FMfV7Kvz5q3dLnfDrjAwFJohucmamwBtmhQt0BRZmzZpIy+gJMpmQ4kqSSV2BL/Dnma329XeYe+wu2G3ww5lOZQpr/tSEODo/P7n6PCQMqkwZ1Qqz/vlyto77773/gdXP+x89HH3k0+vXf/sheSFiMjziDMujkIsCaMZea6oYuQoFwSnISOH4el9zQ/PiJCUZwdqmpNhikcZTWiEFbhOrt0Kwmd0lOKTkr6eoS0UiDEPpPZ8N3CDNwWOEUVBRt6g1yfXbnp9r7pQ2/Br46ZTX/sn17trQcyjIiWZihiW8tj3cjUssVA0YmTWCQpJchyd4hEpwzC1HHDfaziOwcxwSuSwrIY9Qz3wxCjhAv4zhSrvUgqcSjlNQ1CmWI2lzbRzFTsuVLI5LGmWF4pk0fxBScGQ4kj3EMVUkEixKRg4EhRGg6IxFjhS0OlOIEhGziOepjiLgwSnlE1jkuCCqTKQSW3OOkvlFPlIEHK63AJdmZCJbHsV58xykzQfkzfLPSvziW6NtB6GhcBTiDZl3ioDWCUqxGJWBqJg5Phrn0yGpdffyNWsHPQ3yGRmR4y5uGhEaM2wltaB1nMn84nrBNADWK/VnZ5Mf1Y+PNh7PCt3Nv3tb7y2YLAQ+Lubm7sbbcHtheDBg/u+d6ctWF8IPG99Z91bLuw4SoelTKr10BxjGewKPbxqHtJS31icNDlpcyxGKc1qCc8vpbV/lRxPVsvBb8shBU3pBWkHXJJWCJ78VciC2CFKtMXgs2V+rQqT0rfZI8Me2cwzzLPZ0wULy6c22zVs12Yv8IqatdMW3udnbaF2amHzPS6DffO8fTsNzWKgORF5cKP606alOTAL5cCOv2d6cM9m24Zt22zHsB2bhY8NfGzDPcP2bHZo2KHNjgw7stlLw17a7JVhr2x2bti5zSaGTWw2NWxqswvDLmwWT8xMxDaMntQwwqx80uopZvkYgyJM4WWtbFsRErUQVKbNSS4p41ktOcNi4bCFaVFrwLBZtUHXeG7bioOxqWNu24odwi4Vc9tWPGs85dnKp8SNHPHKHKpRh1pZxwinl0+Z2/DjvLRtKHp6Md/rtMVoKLCYljmXVJ9haDZy5RjnRLox/N6L6mAj3RCCR4IXWWztk/koyRlXljfUG3i1KQOHPAwmhuEpEaVJNGtDOHyQJoQSKyAbYW6KaeY2lAihXg9JnhKE47gaBGZoHlcdZ2KS8k5rwLAiI1fBKUG6zbHDZs7PZbs/uFAcFjJuituqKl/b3ehkP4e3IeUiH+sEqIcyTuUUzVveCkyo0uObXz0EtwrCarU+QEWci5hmWK0Ibs4ZBMcCn+tgWDfIIIQTOGZpZwe97eoh0/K3LZ724KHY+LKlMKkNf66bBVOj5VJNGZzGphmPyWzrOKICDrUuFEajUxdVO3CRIglb6pbXvxOlbjWgrZDBUG5sesNmFh5KIs5I/K/z+J7nQpcZq+8Hy3kZ9DlT/7e6XKeBJPrUi7NRIw/NMpgL6BIkGWy0a1tKo/tK/lmu9b9JpTB84Jhh1Q0oCIzftbpj0vq5skZ+W3+PNPNGjM77Zcr7r7n9gc7dga8m3/5GahsvBn3/dn/j2/Wbd7fr76erzhfOl85Xju/cce46D51957kTOd87Pzg/Oj91f+7+2v2t+/tcunaljvncWbq6f/wJ8WBWeA==

    ⌃ii = �2

    AAAOMXicpZfPb9s2FMfV7lfnzVu7HXbYhV1hYBg8Q3KTNZcAbZoULdAWWZs2aSMvoCTKJkKKGkkldgT/Nbtup/01vQ277p/YoyyHMuV1vxQEeHqf73t6fKRMKsoZVdr331y5+s67773/wbUPOx993P3k0+s3PnupRCFj8iIWTMijCCvCaEZeaKoZOcolwTxi5DA6vW/44RmRiorsQM9yMuJ4nNGUxliD6+T6F2H0nI45PikpnaNtFCpz98Pw5Potf+BXF2obQW3c8upr/+RG92qYiLjgJNMxw0odB36uRyWWmsaMzDthoUiO41M8JmUUcccB972G4xjMDHOiRmU1xjnqgSdBqZDwn2lUeVdSYK7UjEeg5FhPlMuMcx07LnS6NSpplheaZPHiQWnBkBbINAwlVJJYsxkYOJYURoPiCZY41tDWTihJRs5jwTnOkjDFnLJZQlJcMF2GKq3NeWelnCIfS0JOV1tgKpMqVW2vFoI5bsLzCflxtWdlPjWtUc7DsJR4BtG2zG/KEJaEjrCcl6EsGDn+NiDTUekPNnM9L4eDTTKduxETIS8aEUYzqqV1oPPc6WLiOiH0ABZndWcmM5iXDw+ePJ6Xu1vBznd+WzBcCoK9ra29zbbg9lLw4MH9wL/TFmwsBb6/sbvhrxZ2HPNRqdJqPTTHWIZ70gyvmgdemhuHkyYnbY7lmNOsloj8Ulr718nxdL0c/K4cUlBOL0g74JK0QvD0r0KWxA3Rsi0GnysLalWUloHLHln2yGW+Zb7Lni1ZVD5z2Z5ley57idfUbJyu8L44awuN0wib73EZ7tvn7btpaJYAzYnMw5vVnzEdzYFdKAdu/D3bg3su27Fsx2W7lu26LHps4WMXPrHsicsOLTt02ZFlRy57Zdkrl7227LXLzi07d9nUsqnLZpbNXHZh2YXLkqmdicSF8dMaxpiVT1s9xSyfYFBEHF7WynYVEdFLQWW6nOSKMpHVkjMslw5XyItaA4bLqh25xgvbVRxMbB0L21XsEnapWNiu4nnjKc/XPiVp5EjW5tCNOvTaOsaYXz5lYcOP88q2oenpxWKvMxajkcRyVuZCUXNgodm4ryY4J6qfwO+9rE4xqh9B8FiKIkucfTIfpzkT2vFGZgOvNmXgkIfBxDA8I7K0ieZtCIcP0oRQYgVUI6zPMc36DSVCqNdDSnCCcJJUg8AMLeKq40xCuOi0BgwrMu5rOCWofnPssJmLc9XuDy60gIWMm+K2qsrXdjc6OcjhbeBC5hOTAPVQJqiaoUXLW4Ep1WZ8i6uH4FZDWK02B6hYCJnQDOs1wc05g+BE4nMTDOsGWYRwCscs4+ygt109ZFv+tsXTHjwUm1y2FCa14c9Ns2BqjFzpGYPT2CwTCZlvH8dUwqG2D4XR+LSPqh244EjBlrrtD+7EvF8NaDtiMJSbW/6omUVEisgzkvzrPIHv96HLjNX3w9W8DPqc6f9bXW7SQBJz6sXZuJGHZhnMBXQJkgw327WtpDF9Jf8s18bfpNIYvmbssOoGFATG33e6Y9MGuXZGftt8jzTzxowu+mXL+6+5g6HJ3YGvpsD9RmobL4eD4PZg8/uNW3d36u+na96X3lfe117g3fHueg+9fe+FF3tz7yfvZ++X7q/dN93fur8vpFev1DGfeytX948/Ae8xTw4=

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 37 of 61

     One draw as an example

    010

    2030

    4050

    y

    Training setTest setNew drawTrue mean

    0.0 0.2 0.4 0.6 0.8 1.0

    −3−2

    −10

    12

    3

    x

    y−

    µ

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Correlation between observations can pull training and test observations close to one another, but potentially far from an independent draw

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 38 of 61

     Simulated MSE

    0 5 10 15

    01

    23

    45

    6

    MSE

    Den

    sity

    Training errorTest set errorOut−of−sample (true) error

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Mean training error: 0.40 Mean test set error: 0.61 Mean true error: 1.61 (also, long tail!)

    Matches theory! Irreducible error: 1 Estimator variance: 0.61 Expected bias: 0 (OLS is unbiased) Expected training optimism: 1.21 Expected test set optimism: 1

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 39 of 61

     Solution: Split by dependencies   Study covariance, and design data splitting around it –  But can’t estimate both mean and the covariance

    structure, have to assume one (Opsomer et al., 2001) –  (For covariance, no amount of data is ever enough!)   Examples in literature: –  Temporal block cross-validation (Bergmeir et al., 2018) –  Leave-one-subject-out cross-validation (Hammerla &

    Plötz, 2015) –  Network cross-validation

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 40 of 61

     Reflection

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 41 of 61

     About me    

    at Harvard University

    MACHINE LEARNINGD E P A R T M E N T

    Data Science For Social GoodSummer Fellowship

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 42 of 61

     Background   “Methods are like people: if you focus only on what they can’t do, you will always be disappointed.” (Shapiro, 2014)

      Trivially, all models are wrong because they aren’t the thing itself. But specifically, when, why, and how does it matter that ML is “wrong”?

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 43 of 61

     Encountering social science

    42 COMMUNICATIONS OF THE ACM | MARCH 2018 | VOL. 61 | NO. 3

    V viewpoints

    IM

    AG

    E B

    Y E

    VAN

    NO

    VO

    ST

    RO

    tional social science sits at the inter-section of computer science, statistics, and social science.

    For me, shifting away from tradi-tional machine learning and into this interdisciplinary space has meant that I have needed to think outside the algo-rithmic black boxes often associated with machine learning, focusing in-stead on the opportunities and chal-lenges involved in developing and us-ing machine learning methods to analyze real-world data about society.

    This Viewpoint constitutes a reflec-tion on these opportunities and chal-lenges. I structure my discussion here

    THIS VIEWPOINT IS about differ-ences between computer sci-ence and social science, and their implications for compu-tational social science. Spoiler alert: The punchline is simple. Despite all the hype, machine learning is not a be-all and end-all solution. We still need so-cial scientists if we are going to use ma-chine learning to study social phenomena in a responsible and ethical manner.

    I am a machine learning researcher by training. That said, my recent work has been pretty far from traditional machine learning. Instead, my focus has been on computational social sci-ence—the study of social phenomena using digitized information and com-putational and statistical methods.

    For example, imagine you want to know how much activity on websites such as Amazon or Netflix is caused by recommendations versus other fac-tors. To answer this question, you might develop a statistical model for estimating causal effects from observa-tional data such as the numbers of rec-ommendation-based visits and num-bers of total visits to individual product or movie pages over time.9

    Alternatively, imagine you are inter-ested in explaining when and why sen-ators’ voting patterns on particular is-sues deviate from what would be expected from their party affiliations and ideologies. To answer this ques-tion, you might model a set of issue-

    based adjustments to each senator’s ideological position using their con-gressional voting history and the corre-sponding bill text.4,8

    Finally, imagine you want to study the faculty hiring system in the U.S. to determine whether there is evidence of a hierarchy reflective of systematic so-cial inequality. Here, you might model the dynamics of hiring relationships between universities over time using the placements of thousands of tenure-track faculty.3

    Unsurprisingly, tackling these kinds of questions requires an interdisciplin-ary approach—and, indeed, computa-

    DOI:10.1145/3132698 Hanna Wallach

    Viewpoint Computational Social Science ≠ Computer Science + Social Data The important intersection of computer science and social science.

    3 Jun 2004 17:3 AR AR219-SO30-12.tex AR219-SO30-12.sgm LaTeX2e(2002/01/18) P1: IBC10.1146/annurev.soc.30.020404.104342

    Annu. Rev. Sociol. 2004. 30:243–70doi: 10.1146/annurev.soc.30.020404.104342

    Copyright c⃝ 2004 by Annual Reviews. All rights reservedFirst published online as a Review in Advance on March 9, 2004

    THE “NEW” SCIENCE OF NETWORKS

    Duncan J. WattsDepartment of Sociology, Columbia University, New York, NY 10027; Santa Fe Institute,Santa Fe, New Mexico 97501; email: [email protected]

    Key Words graph theory, mathematical models, network data, dynamical systems

    ■ Abstract In recent years, the analysis and modeling of networks, and also net-worked dynamical systems, have been the subject of considerable interdisciplinaryinterest, yielding several hundred papers in physics, mathematics, computer science,biology, economics, and sociology journals (Newman 2003c), as well as a number ofbooks (Barabasi 2002, Buchanan 2002, Watts 2003). Here I review the major findingsof this emerging field and discuss briefly their relationship with previous work in thesocial and mathematical sciences.

    INTRODUCTION

    Building on a long tradition of network analysis in sociology and anthropology(Degenne & Forse 1994, Scott 2000, Wasserman & Faust 1994) and an even longerhistory of graph theory in discrete mathematics (Ahuja et al. 1993, Bollobas 1998,West 1996), the study of networks and networked systems has exploded acrossthe academic spectrum in the past five years. Spurred by the rapidly growingavailability of cheap yet powerful computers and large-scale electronic datasets,researchers from the mathematical, biological, and social sciences have madesubstantial progress on a number of previously intractable problems, reformulatingold ideas, introducing new techniques, and uncovering connections between whathad seemed to be quite different problems. The result has been called the “newscience of networks” (Barabasi 2002, Buchanan 2002, Watts 2003)—a label thatmay strike many sociologists as misleading, given the familiarity (to social networkanalysts) of many of its central ideas. Nevertheless, the label does capture the senseof excitement surrounding what is unquestionably a fast developing field—newpapers are appearing almost daily—and also the unprecedented degree of synthesisthat this excitement has generated across the many disciplines in which network-related problems arise.

    In reviewing the main results of this field, I have tried to strike a balance betweenthe historical progression of the ideas and their logical order. To this end, the firstsection describes the main network modeling approaches, regarding local struc-ture, global connectivity, searchability, and highly skewed degree distributions.

    0360-0572/04/0811-0243$14.00 243

    “machine learning is not a be-all and end-all solution.”

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 44 of 61

     Which social science?   Is a “hard” version of social science compatible with

    modeling (econometrics, political science, quantitative sociology, biological anthropology, etc.)

      Enriches ML and avoid pitfalls, but is not the most valuable and profound offering

      Critical sociology, cultural anthropology, critical race studies, feminist studies, STS: not just about the world, but about the very ways we see and interact with the world

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 45 of 61

     The original

    Toward a Critical Technical Practice: Lessons Learned Trying to Reform AI

    Philip E. Agre University of California, San Diego

    Every teclmology fits, in its own unique way, into a far-flung network of different sites of social practice. Some technologies are employed in a specific site, and in those cases we often feel that we can warrant clear cause-and-effect stories about the transformations that have accompanied them, either in that site or others. Other teclmologies, such as electric lighting and the telephone, are so ubiquitous-found contributing to the evolution of the activities and relations of so many distinct sites of prac-tice-that it requires considerable effort to understand their effects on society, assuming that such a global notion of "effects" even makes sense (Hughes, 1983; Latour, 1987).

    Computers fall in this latter category of ubiquitous technologies. In from an analytical standpoint, computers are worse than that. Com-

    puters are representational artifacts, and the people who design them often start by constructing representations of the activities found in the sites where they will be used. This is the purpose of systems analysis, for example, and of the systematic mapping of conceptual entities and rela-tions in the early stages of database design. A computer, does not

    have an instrumental use in a site of practice; the computer is frequently about that site in its very design. In this sense and others, computing has been constituted as a kind of imperialism; it aims to reinvent virtually every other site of practice in its own image.

    As a result, the institutional relations between the computer world and the rest of the world can be tremendously complicated-much more

    131

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 46 of 61

     What is “critical”?   “I finally comprehended the difference between

    critical thinking and its opposite. Technical people are not dumb, quite the contrary, but technical curricula rarely include critical thinking in the sense I have in mind. Critical thinking means that you can, so to speak, see your glasses. You can look at the world, or you can back up and look at the framework of concepts and assumptions and practices through which you look at the world.” (Agre, 2000)

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 47 of 61

     Backing up in ML   Back up and ask, what is the framework of concepts and assumptions and practices through which ML looks at the world?

      Identify this to understand the limits, and proper use, of ML

      Start from the very beginning: quantification. End at ML-specific model validation.

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 48 of 61

     Larger goals of this work   Communication oriented to solutions –  “Burn it all down” on one side; totalitarian

    quantification + optimization on the other

      Critics can be specific about what they object to, not just “ML is quantitative therefore evil”

      ML can see how critiques show points of potential failure that can be addressed, rather than dismissing critiques

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 49 of 61

     Pitfalls   The fallacy of alternatives (Agre, 1997): “Their stance

    was: if your alternative is so good then you will use it to write programs that solve problems better than anybody else's, and then everybody will believe you.”

      Too limited to say, “social science can help solve a problem better”

      Or even: “social science can help you identify the right problems”

      Critical social science will reframe what even counts as a problem!

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 50 of 61

     Future steps

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 51 of 61

     Future steps   Many (but not all) of the problems can be alleviated

    (but not solved) through mixed methods: incorporating alternatives (qualitative research, statistical modeling, experimental design, etc.)

      But mixed methods are hard!   We will need to develop both intellectual frameworks

    for mixed methods ML (e.g., purposeful division of labor), as well as cultural practices within ML for actually doing it. Future work, guided by this attempt to comprehensively map key points of failure!

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 52 of 61

     Limitations   Focused on research, not industry; some things (especially implementation, which industry may know lots about) may not be meaningful critiques

      Still developing what the intellectual framework would be, and what the cultural practices would be!

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 53 of 61

     Thank you!

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 54 of 61

     References (1/8) Abbott, Andrew. “Transcending General Linear

    Reality.” Sociological Theory 6, no. 2 (1988): 169–186. https://dx.doi.org/10.2307/202114.

    Agre, Philip E. “Towards a Critical Technical Practice: Lessons Learned from Trying to Reform AI.” In Social Science, Technical Systems, and Cooperative Work: Beyond the Great Divide, edited by Geoffrey C. Bowker, Susan Leigh Star, Will Turner, and Les Gasser, 131–158. Lawrence Erlbaum Associates, 11997. https://web.archive.org/web/20040203070641/http://polaris.gseis.ucla.edu/pagre/critical.html.

    Agre, Philip E. “Notes on critical thinking, Microsoft, and eBay, along with a bunch of recommendations and some URL's.” Red Rock Eater Newsletter, 12 July 2000.

    https://pages.gseis.ucla.edu/faculty/agre/notes/00-7-12.html.

    Bailey, David H., Jonathan M. Borwein, Marcos López de Prado, and Qiji Jim Zhu. “Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance.” Notices of the AMS 61, no. 5 (2014): 458–471. https://dx.doi.org/10.1090/noti1105.

    Bergmeir, Christoph, Rob J. Hyndman, and Bonsoo Koo. “A Note on the Validity of Cross-Validation for Evaluating Autoregressive Time Series Prediction.” Computational Statistics & Data Analysis 120 (2018): 70–83. https://dx.doi.org/10.1016/j.csda.2017.11.003.

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 55 of 61

     References (2/8) Borgatti, Steve. “Types of Validity.” BA 762: Research

    Methods. Gatton College of Business & Engineering, University of Kentucky, 2019. https://sites.google.com/site/ba762researchmethods/materials/handouts/typesofvalidity.

    Box, George E. P. “Robustness in the Strategy of Scientific Model Building.” Technical Report #1954. Mathematics Research Center, University of Wisconsin-Madison, 1979.

    Breiman, Leo. “Statistical Modeling: The Two Cultures.” Statistical Science 16, no. 3 (2001): 199–231. https://dx.doi.org/10.1214/ss/1009213726

    Cardoso, Fatima, Laura J. van’t Veer, Jan Bogaerts, Leen Slaets, Giuseppe Viale, Suzette Delaloge, Jean-Yves Pierga, Etienne Brain, Sylvain

    Causeret, Mauro DeLorenzi, Annuska M. Glas, Vassilis Golfinopoulos, Theodora Goulioti, Susan Knox, Erika Matos, Bart Meulemans, Peter A. Neijenhuis, Ulrike Nitz, Rodolfo Passalacqua, Peter Ravdin, Isabel T. Rubio, Mahasti Saghatchian, Tineke J. Smilde, Christos Sotiriou, Lisette Stork, Carolyn Straehle, Geraldine Thomas, Alastair M. Thompson, Jacobus M. van der Hoeven, Peter Vuylsteke, René Bernards, Konstantinos Tryfonidis, Emiel Rutgers, and Martine Piccart. “70-Gene Signature as an Aid to Treatment Decisions in Early-Stage Breast Cancer.” New England Journal of Medicine 375, no. 8 (2016): 717–729. https://dx.doi.org/10.1056/NEJMoa1602253.

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 56 of 61

     References (3/8) Chatfield, Chris. “Model Uncertainty, Data Mining

    and Statistical Inference.” Journal of the Royal Statistical Society. Series A (Statistics in Society) 158, no. 3 (1995): 419–466. https://dx.doi.org/10.2307/2983440.

    Cox, David R. “Role of Models in Statistical Analysis.” Statistical Science 5, no. 2 (May 1990): 169–174. https://dx/doi.org/10.1214/ss/1177012165

    Doshi-Velez, Finale and Been Kim. Towards a Rigorous Science of Interpretable Machine Learning. 2017. https://arxiv.org/abs/1702.08608.

    Dwork, Cynthia, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. “The Reusable Holdout: Preserving Validity in Adaptive Data Analysis.” Science 349, no. 6248 (2015): 636–638.

    https://dx.doi.org/10.1126/science.aaa9375. Efron, Bradley. “The Estimation of Prediction Error:

    Covariance Penalties and Cross-Validation.” Journal of the American Statistical Association 99, no. 467 (2004): 619–632. https://dx.doi.org/10.1198/016214504000000692.

    Fisher, R. A. “On the Mathematical Foundations of Theoretical Statistics.” Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 222 (1922): 309–368. https://dx/doi.org/10.1098/rsta.1922.0009

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 57 of 61

     References (4/8) Gayo-Avello, Daniel. “‘I Wanted to Predict Elections

    with Twitter and all I got was this Lousy Paper’: A Balanced Survey on Election Prediction using Twitter Data.” 2012. https://arxiv.org/abs/1204.6441.

    Ginsberg, Jeremy, Matthew H. Mohebbi, Rajan S. Patel, Lynnette Brammer, Mark S. Smolinski, and Larry Brilliant. “Detecting Influenza Epidemics Using Search Engine Query Data.” Nature 457 (2009): 1012–1015. https://dx.doi.org/10.1038/nature07634.

    Hammerla, Nils Y., and Thomas Plötz. “Let’s (Not) Stick Together: Pairwise Similarity Biases Cross-Validation in Activity Recognition.” In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’15), 1041–1051.

    https://dx.doi.org/10.1145/2750858.2807551. Jones, Matthew L. “How We Became

    Instrumentalists (Again): Data Positivism since World War II.” Historical Studies in the Natural Sciences 48, no. 5 (2018): 673–684. https://dx.doi.org/10.1525/hsns.2018.48.5.673.

    Kass, Robert E. “Statistical Inference: The Big Picture.” Statistical Science 26, no. 1 (2011): 1–9. https://dx.doi.org/10.1214/10-STS337.

    Keyes, Os. “The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition.” In Proceedings of the ACM on Human-Computer Interaction (CSCW), 2, 88:1–88:22, 2018.

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 58 of 61

     References (5/8) Kleinberg, Jon, Jens Ludwig, Sendhil Mullainathan,

    and Ziad Obermeyer. “Prediction Policy Problems.” American Economic Review 105, no. 5 (2015): 491–495. https://dx.doi.org/10.1257/aer.p20151023.

    Lanius, Candice. “Fact Check: Your Demand for Statistical Proof is Racist.” Cyborgology blog, January 15, 2015. https://thesocietypages.org/cyborgology/2015/01/12/fact-check-your-demand-for-statistical-proof-is-racist/.

    Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343, no. 6176 (2014): 1203–1205. https://dx.doi.org/10.1126/science.1248506.

    Lipton, Zachary C. “The Myth of Model Interpretability.” KDnuggets 15, no. 13 (April 2015). https://www.kdnuggets.com/2015/04/model-interpretability-neural-networks-deep-learning.html.

    Lipton, Zachary C. and Jacob Steinhardt. “Troubling trends in machine learning scholarship.” 2018. https://arxiv.org/abs/1807.03341.

    Messerli, Franz H. “Chocolate Consumption, Cognitive Function, and Nobel Laureates.” The New England Journal of Medicine, 367 (2012): 1562–1564. doi:10.1056/NEJMon1211064.

    Mullainathan, Sendhil and Jann Spiess. “Machine Learning: An Applied Econometric Approach.” Journal of Economic Perspectives 31, no. 2 (2017): 87–106. https://dx.doi.org/10.1257/jep.31.2.87.

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 59 of 61

     References (6/8) Opsomer, Jean, Yuedong Wang, and Yuhong Yang.

    “Nonparametric Regression with Correlated Errors.” Statistical Science 16, no. 2 (2001): 134–153. https://dx.doi.org/10.1214/ss/1009213287.

    Park, Greg. “The Dangers of Overfitting: A Kaggle Postmortem.” 2012. http://gregpark.io/blog/Kaggle-Psychopathy-Postmortem/.

    Patton, Michael Quinn. “The Nature, Niche, Value, and Fruit of Qualitative Inquiry.” In Qualitative Research & Evaluation Methods: Integrating Theory and Practice, 4th edition, 2–44. SAGE Publications, Inc., 2014. https://uk.sagepub.com/sites/default/files/upm-binaries/64990_Patton_Ch_01.pdf.

    Rescher, Nicholas. Predicting the Future: An Introduction to the Theory of Forecasting. State University of New York Press, 1998.

    Rose, Todd. The End of Average: How We Succeed in a World That Values Sameness. New York: HarperOne, 2016. See excerpt at https://www.thestar.com/news/insight/2016/01/16/when-us-air-force-discovered-the-flaw-of-averages.html. Animated video: https://vimeo.com/237632676.

    Rosset, Saharon, and Ryan J. Tibshirani. “From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation.” Journal of the American Statistical Association (2019). https://dx.doi.org/10.1080/01621459.2018.1424632.

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 60 of 61

     References (7/8) Santillana, Mauricio, Wendong Zhang, Benjamin M.

    Althouse, and John W. Ayers. “What Can Digital Disease Detection Learn from (an External Revision to) Google Flu Trends?” American Journal of Preventive Medicine 47, no. 3 (2014): 341–347. http://dx.doi.org/10.1016/j.amepre.2014.05.020.

    Shapiro, Ian. “Methods are like people: If you focus only on what they can’t do, you will always be disappointed.” In Field Experiments and Their Critics: Essays on the Uses and Abuses of Experimentation in the Social Sciences, edited by Dawn Langan Teele, 228–241. Yale University Press, 2014.

    Shmueli, Galit. “To Explain or to Predict?” Statistical Science 25, no. 3 (2010): 289–310. https://dx.doi.org/10.1214/10-STS330.

    Spirtes, Peter and Kun Zhang. “Causal Discovery and Inference: Concepts and Recent Methodological Advances.” Applied Informatics 3, no. 3 (2016): 1–28. https://dx.doi.org/10.1186/s40535-016-0018-x.

    Tibshirani, Robert. “Recent Advances in Post-Selection Inference.” Breiman Lecture, NIPS 2015 (9 December 2015) http://statweb.stanford.edu/~tibs/ftp/nips2015.pdf

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 61 of 61

     References (8/8) van’t Veer, Laura J., Hongyue Dai, Marc J. van de

    Vijver, Yudong D. He, Augustinus A. M. Hart, Mao Mao, Hans L. Peterse, Karin van der Kooy, Matthew J. Marton, Anke T. Witteveen, George J. Schreiber, Ron M. Kerkhoven, Chris Roberts, Peter S. Linsley, René Bernards, and Stephen H. Friend. “Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer.” Nature 415, no. 6871 (2002): 530–536. https://dx.doi.org/10.1038/415530a.

    Wallach, Hanna. “Computational Social Science ≠ Computer Science + Social Data.” Communications of the ACM 61, no. 3 (2018): 42–44. https://dx.doi.org/10.1145/3132698.

    Wasserman, Larry A. “Rise of the Machines.” In Past, Present, and Future of Statistical Science, 525–536. Boca Raton, FL: Chapman and Hall/CRC, 2013. http://www.stat.cmu.edu/~larry/Wasserman.pdf

    Watts, Duncan J. “The ‘New’ Science of Networks.” Annual Review of Sociology 30 (2004): 243–270. https://dx.doi.org/10.1146/annurev.soc.30.020404.104342.

    Wu, Shaohua, T. J. Harris, and K. B. McAuley, “The Use of Simplified or Misspecified Models: Linear Case.” The Canadian Journal of Chemical Engineering 85, no. 4 (2007): 386–398. https://dx.doi.org/10.1002/cjce.5450850401.

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 62 of 61

     Backup slides

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Backup slides

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 63 of 61

     “True” models predict worse   A linear data-generating process.

      Wu et al. (2007): Fitting only Xp has lower expected MSE than fitting the model that generated the data when:

    AAACYHicbVFNa9tAEF0pTeo6X056ay9LTCGhJUih0EIvIbm0l5JCHRssV4zWI3vJrqTsjhqM8J/MLYde+ku6Ulzjxh1YePPezOzs26RQ0lIQPHj+xrPNreetF+3tnd29/c7B4bXNSyOwJ3KVm0ECFpXMsEeSFA4Kg6AThf3k5rLW+z/RWJln32lW4EjDJJOpFECOijt3kQaaJmk1m/PISs2bXICqvjpCYUrH/G9JlCDBPC6WxKBO3vKGj29X6dt3PPpUD5xo+HG2VL64mUZOpnQSd7rBadAEXwfhAnTZIq7izn00zkWpMSOhwNphGBQ0qsCQFArn7ai0WIC4gQkOHcxAox1VjUFz/sYxY57mxp2MeMOudlSgrZ3pxFXWm9qnWk3+TxuWlH4cVTIrSsJMPF6UlopTzmu3+VgaFKRmDoAw0u3KxRQMCHJ/0nYmhE+fvA6uz05Dh7+9755fLOxosdfsiB2zkH1g5+wzu2I9Jtgvb8Pb8Xa9337L3/cPHkt9b9Hzkv0T/qs/yia2NA==

    AAACXXicbZFNSwMxEIaz61dbq1Y9ePASLIIelF0R9OCh6KXeKvQLum3JptkazGa3yaxQlv2T3vTiXzH9Ets6EHjnmUwyeePHgmtwnE/L3tjc2t7J5Qu7xb39g9LhUVNHiaKsQSMRqbZPNBNcsgZwEKwdK0ZCX7CW//Y0qbfemdI8knUYx6wbkqHkAacEDOqXwAsJvPpB6vkMSNZPR1kvrWcL2v4lFwv0bJDMrhZp1aRxdrncsH4ofsAj7Gk+DEnvpl8qO9fONPC6cOeijOZR65c+vEFEk5BJoIJo3XGdGLopUcCpYFnBSzSLCX0jQ9YxUpKQ6W46dSfD54YMcBApsyTgKf3bkZJQ63Hom52TsfVqbQL/q3USCO67KZdxAkzS2UVBIjBEeGI1HnDFKIixEYQqbmbF9JUoQsF8SMGY4K4+eV00b65do19uy5XHuR05dIrO0AVy0R2qoCqqoQai6MtCVt4qWN/2ll2092dbbWvec4yWwj75AX7DuJc=

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Backup slides

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 64 of 61

     Proposal: Precise language Predict the likelihood: Calculate the likelihood Predict the risk, predict the probability: Estimate the risk, estimate the probability Prediction, predicted: Fitted value, fitted We predict: We detect, we classify, we model X predicts Y: X is correlated with Y X predicts Y, ceteris paribus (partial correlation): X is associated with Y

    Introduction

    Meaning and measurement

    Central tendency

    Causality

    Capturing variability

    Cross-validation

    Reflection

    Future steps

    References

    Backup slides

  • A Hierarchy of Limitations of ML https://MominMalik.com/msrne2019.pdf 65 of 61

     Proposal: Alternative language Retrodiction Backtesting (retrodiction for testing) Hindcasting (backtesting for forecasting)

      In-sample vs.   Interpolation vs.   Diagnosis vs.   Retrospective vs.

      Out of-sample   Extrapolation   Prognosis   Prospective

    Introduction

    Meaning and measurement

    Central tendency

    Causality