Safe and Fair Machine Learning · 2019-10-06
TRANSCRIPT
Safe and Fair Machine Learning
Philip S. Thomas
College of Information and Computer Sciences, UMass Amherst
MSR Reinforcement Learning Day, October 3, 2019
Andy Barto, Blossom Metevier, Stephen Giguere, Bruno Castro da Silva, Emma Brunskill, Yuriy Brun, Georgios Theocharous, Mohammad Ghavamzadeh, Ari Kobren, Sarah Brockman
Machine learning algorithms should avoid undesirable behaviors.
Undesirable behavior of ML algorithms is causing harm.
Supervised Learning (Classification and Regression)
Reinforcement Learning
Can we create algorithms that allow their users to more easily control their behavior?
Desiderata
• Easy for users to define undesirable behavior.
• Guarantee that the algorithm will not produce this undesirable behavior.
Mean Time Hypoglycemic vs. Weighted Mean Time Hypoglycemic
Learn to predict loan repayment, and don’t discriminate.
Data, D → Algorithm, a → Solution, θ (a classifier over features X1 and X2, separating Repay from Default)
Learn to predict job aptitude, and don’t discriminate.
Data, D → Algorithm, a → Solution, θ (a classifier over features X1 and X2, separating Hire from Decline)
Learn to predict landslide severity, but don’t under-estimate.
Data, D → Algorithm, a → Solution, θ (a regressor from features X to predicted Severity)
Learn how much insulin to inject, but don’t increase the prevalence of hypoglycemia.
Data, D → Algorithm, a → Solution, θ
injection = (current − target)/θ1 + (meal size)/θ2
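Written as code, this two-parameter solution form is just the following (the parameter and input values below are purely illustrative, not clinical guidance):

```python
# The bolus-calculator solution form above: theta1 (correction factor) and
# theta2 (carbohydrate ratio) are the parameters the algorithm learns.
def injection(current, target, meal_size, theta1, theta2):
    return (current - target) / theta1 + meal_size / theta2

# Illustrative values only: glucose 180 mg/dL, target 100, 60 g meal.
print(injection(current=180, target=100, meal_size=60, theta1=40, theta2=15))
```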
Predict the average human height, but do not over-estimate.
Data, D → Algorithm, a → Solution, θ: a prediction, e.g., 1.7 meters
Achieve [main goal] but do not produce [undesirable behavior].
Data, D; Probability, δ → Algorithm, a → Solution, θ (a classifier over features X1 and X2 with labels +1 and −1)
Learn to predict loan repayment, and don’t discriminate.
Data, D; Probability, δ = 0.01 → Algorithm, a → Solution, θ (a Repay/Default classifier over X1 and X2; data from four people)
Learn how much insulin to inject, but don’t ever allow blood glucose to deviate from optimal by more than 1.2 mg/dL.
Data, D; Probability, δ = 0.05 → Algorithm, a → Solution, θ
injection = (current − target)/θ1 + (meal size)/θ2
Achieve [main goal] but do not produce [undesirable behavior].
Data, D; Probability, δ → Algorithm, a → Solution, θ OR No Solution Found (a classifier over X1 and X2 with labels +1 and −1)
Desiderata
• Interface for defining undesirable behavior.
• User-specified probability, δ.
• Guarantee that the probability of a solution that produces undesirable behavior is at most δ.
Notation
• Let D be all of the training data; D is a random variable.
• Let Θ be the set of all possible solutions the algorithm can return.
• Let f : Θ → ℝ be the primary objective function.
• Let a be a machine learning algorithm, a : 𝒟 → Θ; a(D) is the solution returned by the algorithm when run on data D.
• Let g : Θ → ℝ be a function that measures undesirable behavior: g(θ) ≤ 0 if and only if θ does not produce undesirable behavior, and g(θ) > 0 if and only if θ produces undesirable behavior.
• Let NSF ∈ Θ with g(NSF) = 0.
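The contract these definitions describe can be sketched with a deliberately trivial toy example (the constraint and the algorithm below are illustrative only, and this sketch shows the return contract, not the high-confidence guarantee itself):

```python
# Toy sketch of the Seldonian contract: an algorithm a maps data D to either
# a solution theta or the sentinel NSF ("No Solution Found"), with g(NSF) = 0.
NSF = "NSF"

def g(theta):
    # Illustrative constraint: any solution above 1.0 counts as undesirable.
    return 0.0 if theta == NSF else theta - 1.0

def seldonian_algorithm(D):
    # Trivial stand-in algorithm: return the data mean only if g <= 0.
    theta = sum(D) / len(D)
    return theta if g(theta) <= 0 else NSF

print(seldonian_algorithm([0.2, 0.4]))  # a solution satisfying the constraint
print(seldonian_algorithm([2.0, 4.0]))  # NSF: no safe solution found
```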
Desiderata
• Provide the user with an interface for defining undesirable behavior (i.e., defining g).
• Attempt to optimize a (possibly user-provided) objective f.
• Guarantee that
Pr(g(a(D)) ≤ 0) ≥ 1 − δ.
• The probability that the algorithm returns a solution that does not produce undesirable behavior is at least 1 − δ.
• The probability that the algorithm returns a solution that produces undesirable behavior is at most δ.
• We need a name for algorithms that provide this guarantee.
Pr(g(a(D)) ≤ 0) ≥ 1 − δ
• An algorithm that provides this guarantee is safe.
• An algorithm that provides this guarantee is Seldonian.
• Quasi-Seldonian: relies on reasonable but false assumptions (e.g., appeals to the central limit theorem).
Seldonian Framework
• A framework for designing machine learning algorithms.
• Provide the user with an interface for defining undesirable behavior (i.e., defining g).
• Attempt to optimize a (possibly user-provided) objective f.
• Guarantee that Pr(g(a(D)) ≤ 0) ≥ 1 − δ.
• This guarantee does not depend on any hyperparameter settings.
• I am not promoting a specific algorithm.
• The algorithms I am going to discuss are extremely simple examples; they show feasibility.
• I am promoting the framework.
Example Usage
• X is a vector of features describing a person convicted of a crime.
• Y is 1 if the person committed a subsequent violent crime and 0 otherwise.
• Find a solution, θ, such that ŷ(X, θ) is a good estimator of Y.
• g(θ) = Pr(ŷ(X, θ) = 1 | White) − Pr(ŷ(X, θ) = 1 | Not White) − ε
  • “Demographic Parity”
  • g(θ) ≤ 0 iff Pr(ŷ(X, θ) = 1 | White) ≈ Pr(ŷ(X, θ) = 1 | Not White)
• g(θ) = Pr(FP | White) − Pr(FP | Not White) − ε
  • “Predictive Equality”
  • g(θ) ≤ 0 iff Pr(FP | White) ≈ Pr(FP | Not White)
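As a rough illustration, a plug-in sample estimate of the demographic-parity constraint might look like the following (the function and group names are hypothetical, and this is the simple ratio estimate, not necessarily the exact estimator used in any particular Seldonian algorithm):

```python
# Plug-in estimate of g(theta) = Pr(yhat=1 | White) - Pr(yhat=1 | Not White) - eps,
# computed from predictions yhat and group membership labels.
def demographic_parity_g_hat(yhat, group, eps):
    white = [p for p, a in zip(yhat, group) if a == "white"]
    other = [p for p, a in zip(yhat, group) if a != "white"]
    rate_white = sum(white) / len(white)
    rate_other = sum(other) / len(other)
    return rate_white - rate_other - eps

yhat  = [1, 0, 1, 1, 0, 0]
group = ["white", "white", "white", "other", "other", "other"]
print(demographic_parity_g_hat(yhat, group, eps=0.1))  # positive: constraint violated
```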
Minimize classification loss, using g(θ) = Pr(FP | White) − Pr(FP | Not White) − ε
Data, D; Probability, δ → Algorithm, a → Solution, θ OR No Solution Found (a classifier over X1 and X2 with labels +1 and −1)
• Provide code for g:
Minimize classification loss, using g(θ) = Pr(FP | White) − Pr(FP | Not White) − ε
• Provide code for unbiased estimates of g:
Minimize classification loss, using g(θ) = Pr(FP | White) − Pr(FP | Not White) − ε
• Write an equation for g:
g(θ) = Pr(FP | White) − Pr(FP | Not White) − ε
• Users can use any of the common variables already provided:
  • Classification: FP, FN, TP, TN, conditional FP, accuracy, probability positive, etc.
  • Regression: MSE, ME, conditional MSE, conditional ME, mean prediction, etc.
  • Reinforcement learning: expected return, conditional expected return.
• Users can use any other variable for which they can provide unbiased estimates from data.
• Users can use any supported operator: +, −, /, abs, min, max, etc.
Minimize classification loss, using g(θ) = Pr(FP | White) − Pr(FP | Not White) − ε
Minimize classification loss, using g(θ) = Pr(FP | White) − Pr(FP | Not White) − ε
Data, D; Probability, δ → Algorithm, a → Solution, θ OR No Solution Found (a classifier over X1 and X2 with labels +1 and −1)
Algorithm 1
The data D is partitioned into D_candidate and D_safety. Candidate Selection runs on D_candidate and produces a candidate solution, θ_c. The Safety Test runs on D_safety and returns one of {θ_c, NSF}.
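The partition step can be sketched as follows (the 60/40 split, the fixed seed, and the function name are illustrative assumptions, not a prescription):

```python
import random

# Split D into D_candidate and D_safety for the two-stage algorithm above.
def partition(D, safety_frac=0.4, seed=0):
    idx = list(range(len(D)))
    random.Random(seed).shuffle(idx)        # deterministic shuffle for the demo
    n_safety = int(len(D) * safety_frac)
    safety_idx = set(idx[:n_safety])
    D_safety = [D[i] for i in sorted(safety_idx)]
    D_candidate = [D[i] for i in sorted(set(idx) - safety_idx)]
    return D_candidate, D_safety

D_cand, D_safe = partition(list(range(10)))
print(len(D_cand), len(D_safe))  # 6 4
```

How to choose this split is itself an open question (it reappears under "How to partition data?" in the future-directions slide).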
Safety Test
• Consider the earlier example: g(θ) = Pr(FP | White) − Pr(FP | Not White) − ε.
• Given θ_c and D_safety, output either θ_c or NSF.
(Figure: confidence intervals on the base variables, e.g. Pr(FP | White) ∈ [0, 1], are propagated up through the expression tree for g (subtraction, abs, ε) to obtain a high-confidence bound on g(θ_c).)
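A minimal quasi-Seldonian safety test can be sketched as follows, using a normal (CLT) approximation in place of an exact concentration inequality, in the spirit of the "appeals to the central limit theorem" variant above; the function name and the one-sided bound are illustrative assumptions:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Given unbiased per-sample estimates g_i of g(theta_c) computed on D_safety,
# form a one-sided (1 - delta) upper confidence bound on g(theta_c); return
# theta_c only if the bound is <= 0, otherwise NSF.
def safety_test(theta_c, g_hats, delta):
    n = len(g_hats)
    z = NormalDist().inv_cdf(1 - delta)            # normal quantile (CLT appeal)
    upper = mean(g_hats) + z * stdev(g_hats) / sqrt(n)
    return theta_c if upper <= 0 else "NSF"

# Estimates concentrated well below zero pass; estimates near zero do not.
print(safety_test("theta_c", [-0.5, -0.4, -0.6, -0.5], delta=0.05))
print(safety_test("theta_c", [0.1, -0.1, 0.2, 0.0], delta=0.05))
```

A fully Seldonian version would replace the normal quantile with a distribution-free bound (e.g., Hoeffding's inequality) at the cost of needing more data.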
Candidate Selection
• Use D_candidate to pick the solution, θ_c, predicted to:
  • optimize the primary objective, f, and
  • pass the subsequent safety test.
Reinforcement Learning
• Historical data, D, is data collected from running some current policy, π_cur.
• A solution, θ, is a policy or policy parameters.
• The user can define multiple objectives (reward functions), and can require improvement (or limit degradation) with respect to all of them.
• g(θ) = E[∑_{t=0}^∞ γ^t R_t | π_cur] − E[∑_{t=0}^∞ γ^t R_t | θ]
Pr(E[∑_{t=0}^∞ γ^t R_t | π_cur] ≤ E[∑_{t=0}^∞ γ^t R_t | a(D)]) ≥ 1 − δ
• Monte Carlo returns are unbiased estimates of E[∑_{t=0}^∞ γ^t R_t | π_cur].
• Use importance sampling to obtain unbiased estimates of E[∑_{t=0}^∞ γ^t R_t | θ_c].
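The per-trajectory importance sampling estimator mentioned above can be sketched as follows (the trajectory encoding, with per-step action probabilities under both policies, is an illustrative assumption):

```python
# Importance-weighted return: an unbiased estimate of the expected return of a
# candidate policy pi_theta, computed from a trajectory generated by pi_cur.
# Each step is a tuple (prob of action under pi_cur, prob under pi_theta, reward).
def importance_weighted_return(trajectory, gamma=1.0):
    weight, ret, discount = 1.0, 0.0, 1.0
    for p_cur, p_theta, r in trajectory:
        weight *= p_theta / p_cur   # likelihood ratio of the observed actions
        ret += discount * r
        discount *= gamma
    return weight * ret

# One trajectory where pi_theta would take each observed action twice as often.
traj = [(0.5, 1.0, 1.0), (0.5, 1.0, 1.0)]
print(importance_weighted_return(traj))
```

Averaging this quantity over many trajectories from π_cur gives the unbiased estimates the safety test consumes, though the variance of the weights grows with trajectory length.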
Reinforcement Learning
• The ability to require improvement w.r.t. multiple objectives makes objective specification easier.
• Try to change the current policy to one that reaches the goal quicker in expectation.
• Do not increase the probability that the agent steps in the water.
(Figure: gridworld navigation task from Start to Goal.)
A Powerful Interface for Reinforcement Learning
• Have the user label trajectories with a value L ∈ {0, 1}:
  • undesirable event: 1
  • no undesirable event: 0
• E[L] is the probability that the undesirable event will occur.
• Let g(θ) = E[L | θ] − E[L | π_cur], so that
Pr(E[L | a(D)] ≤ E[L | π_cur]) ≥ 1 − δ.
• The probability that the policy will be changed to one that increases the probability of an undesirable event is at most δ.
• The user need only be able to identify undesirable events!
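A sketch of estimating this label-based g from data, assuming importance-weighted labels for the candidate policy have already been computed (all names here are illustrative):

```python
# Estimate g(theta) = E[L | theta] - E[L | pi_cur] from trajectory labels:
# labels_cur are the user's 0/1 labels on trajectories from pi_cur, and
# weighted_labels are the same labels multiplied by importance weights,
# giving an off-policy estimate of the event probability under theta.
def g_hat_event_rate(labels_cur, weighted_labels):
    e_cur = sum(labels_cur) / len(labels_cur)
    e_theta = sum(weighted_labels) / len(weighted_labels)
    return e_theta - e_cur  # g <= 0 means the event probability did not increase

labels_cur = [1, 0, 0, 0]               # pi_cur: observed event rate 0.25
weighted = [0.5, 0.0, 0.0, 0.0]         # importance-weighted labels for theta
print(g_hat_event_rate(labels_cur, weighted))  # negative: estimated rate fell
```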
Example: Type 1 Diabetes Management
• Undesirable event: the person experienced hypoglycemia during the day.
• Try to keep blood glucose close to ideal levels (primary reward function), but guarantee with probability 0.95 that the probability of a hypoglycemic event will not increase.
Example: Classification (GPA, Disparate Impact)
Example: Classification (GPA, Demographic Parity)
Example: Classification (GPA, Equalized Odds)
Example: Bandit (Tutoring)
Example: Bandit (Tutoring, Skewed Proportions)
Example: Bandit (Loan Approval, Disparate Impact)
Example: Bandit (Recidivism, Statistical Parity)
Example: RL (Type 1 Diabetes Management)
Example: RL (Type 1 Diabetes Management)
Example: RL (Type 1 Diabetes Management)
Example: RL (Type 1 Diabetes Management)
Example: RL (HCPI, Mountain Car)
Example: RL (Daedalus)
Example: RL (Require significant improvement)
Future Research Directions
• How to partition data?
• Why NSF? (Not enough data? Conflicting constraints? Failed internal prediction? Available δ?)
• How to divide up δ among base variables and solutions / intelligent interval propagation?
• How to trade off the primary objective and the predicted safety test in candidate selection in a principled way?
• Secure Seldonian algorithms.
• Combine with reward machines (specification for g, and perhaps f)?
• Multi-agent RL (with different constraints on different agents)?
• Extend Fairlearn to be Seldonian / to settings other than classification?
• Improved off-policy estimators for RL safety tests.
• Sequential Seldonian algorithms.
• Efficient optimization in candidate selection.
• Actual HCI interface (natural language?).
• Better concentration inequalities (sequences?).
Watch For:
• High Confidence Policy Improvement (ICML 2015)
• Offline Contextual Bandits with High Probability Fairness Guarantees (NeurIPS 2019)
• On Ensuring that Intelligent Machines are Well-Behaved (2017 arXiv paper, updated soon!)