Data mining, privacy and (non-)discrimination
Bettina Berendt, KU Leuven
Knowledge and the Web /
Privacy and Big Data courses 2015. Last updated 9 December 2015
Agenda
• Motivation: concepts and current cases
• (Classical) discrimination-aware data mining
• Exploratory discrimination-aware data mining; evaluation
• (Some) limitations + outlook
Privacy and non-discrimination
Two fundamental rights. In ICT and data mining:
• Violations may result from the use of certain information
• Protection may result from changing processing w.r.t. this information (e.g. "features")
→ "privacy-preserving data mining/publishing"
→ "discrimination-aware data mining"
Is this discrimination?
https://www.wonga.com analyses, among other things, your social-media data to determine your creditworthiness. Assume (cf. examples from last week) that it generates patterns that deny a loan to:
1. People who like Converse sneakers
2. People who like Oil of Olay
Assume that this is because people who ... in the past very rarely paid back their loans.
(from Martijn Van Otterlo's presentation in Privacy and Big Data 2015)
PS: China's Social Credit Score (1) (from the Los Angeles Times)
"[I]n China, government authorities are hard at work devising their own e-database to rate each and every one of the nation's 1.3 billion citizens by 2020 using metrics that include whether they pay their bills on time, plagiarize schoolwork, break traffic laws or adhere to birth-control regulations."
PS: China's Social Credit Score (2)
China — largely atheist and lacking a strong civil society sector — has struggled for years to find a way to incentivize and reward moral and responsible behavior. It has launched appeals for citizens to uphold "traditional Chinese values" and […]
But the country continues to be shocked by incidents of callous, dishonest and immoral behavior, such as pedestrians refusing to help seniors who have fallen down (because they fear being sued by elderly extortionists), and motorists who accidentally strike pedestrians intentionally hitting them again to ensure they're dead (otherwise, the motorist would have to pay lifelong compensation for injuries).
The Social Credit System, the State Council says, offers hope of addressing this: "Only if there is mutual sincere treatment between members of society, and only if sincerity is fundamental, will it be possible to create harmonious and amicable interpersonal relationships. ... and realize social harmony, stability and a long period of peace and order."
Data and discrimination
E.g. a credit scoring & loan granting system
• uses/shares a person's personal data
• makes loan decisions depend on personal data
= differential treatment
Differential treatment is unlawful discrimination if it is based on "unjust grounds" (e.g., gender).
Attention! This is a preliminary definition in the legal sense!
"Discrimination is forbidden"
In many areas, including labour, loans, and insurance.
The protected-by-law grounds differ by area, but usually include gender, disability, age, sexual orientation, and cultural, religious and linguistic beliefs/affiliation.
A short intro: (Naudts, 2015) – PaBD lecture #6
“You may no longer ...“
European Court of Justice (2011) Case C-236/09, Association Belge des Consommateurs Test-Achats ASBL and Others v Conseil des ministres:
(18) The use of actuarial factors related to sex is widespread in the provision of insurance and other related financial services. In order to ensure equal treatment between men and women, the use of sex as an actuarial factor should not result in differences in individuals’ premiums and benefits. To avoid a sudden readjustment of the market, the implementation of this rule should apply only to new contracts concluded after the date of transposition of this Directive.
Historical examples: only { rich | white | male } people get to vote
Data mining (DM) and discrimination (D) (1)
"DM avoids D." E.g. in the domain of predictive policing:
• Dave Eggers, The Circle: start-up pitch (warning: satire)
• Chicago police "heat list"
• Relapse prediction and parole decisions
From The Economist, 2014
“The data that matter include the prisoner’s age at first arrest, his education, the nature of his crime, his behaviour in prison, his friends’ criminal records, the results of psychometric tests and even the sobriety of his mother while he was in the womb. The software estimates the probability that an inmate will relapse by comparing his profile with many others. The American version of LS/CMI, for example, holds data on 135,000 (and counting) parolees.
It is better to be guided by software than one’s gut, says Olivia Craven, head of the Idaho Commission of Pardons and Parole. Donna Sytek of the New Hampshire Parole Board agrees. Unaided, parole board members rely too much on their personal experiences and make inconsistent decisions, she says.”
What's right about this? What's wrong with this? (Reflection question)
Recommended reading: a legal view of predictive policing and Big Data: (Ferguson, 2015). More CS thinking: (Berendt, 2015).
DM and D (2)
"DM can lead to D, but ... hm ... maybe there's something to it?"
Cf. Laurens Naudts' remarks on the rational basis test in law and the assumptions of rationality concerning statistics and data mining.
Cf. "It is better to be guided by software than one's gut" above.
What's right about this? What's wrong with this? (Reflection question)
DM and D (3)
"DM can lead to D, but modifying the algorithm can fix it."
Classical discrimination-aware data mining
What's right about this? What's wrong with this? (Part of today's lecture)
Recommended reading: sources and critique in (Berendt & Preibusch, 2014).
DM and D (4)
"The point of DM is D. (And so is much of human civilization?!) DM can lead to D, but making the workings of the algorithm transparent can help make this more visible and encourage reflection and, ultimately, corrective action."
Exploratory discrimination-aware data mining
What's right about this? What's wrong with this? (Part of today's lecture; reflection question)
Recommended reading: (Berendt & Preibusch, 2014).
Agenda
• Motivation: concepts and current cases
• (Classical) discrimination-aware data mining
• Exploratory discrimination-aware data mining; evaluation
• (Some) limitations + outlook
Pedreschi, Ruggieri, & Turini (2008)
PD and PND items: potentially discriminatory / potentially non-discriminatory
• Goal: detect & block mined rules such as
  purpose=new_car & gender=female → credit=no
• Measures of the discriminatory power of a rule include
  elift(B & A → C) = conf(B & A → C) / conf(B → C),
  where A is a PD item and B a PND item
Note: two uses/tasks of data mining here:
• Descriptive: "In the past, women who got a loan for a new car often defaulted on it."
• Prescriptive: (Therefore) "Women who want a new car should not get a loan."
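To make the measure concrete, here is a minimal Python sketch of conf and elift, assuming a toy encoding of records as sets of "attribute=value" item strings. The records, item names, and numbers are invented for illustration; they are not from the paper.

```python
# A minimal sketch of confidence and elift (Pedreschi, Ruggieri & Turini, 2008).
# Records are encoded as sets of "attribute=value" items; all data below is invented.

def conf(records, premise, conclusion):
    """Confidence of the rule premise -> conclusion, i.e. P(conclusion | premise)."""
    covered = [r for r in records if premise <= r]
    if not covered:
        return 0.0
    return sum(1 for r in covered if conclusion <= r) / len(covered)

def elift(records, pd_items, pnd_items, conclusion):
    """elift(B & A -> C) = conf(B & A -> C) / conf(B -> C),
    where A is a set of PD items and B a set of PND items."""
    denom = conf(records, pnd_items, conclusion)
    if denom == 0.0:
        return float("inf")
    return conf(records, pd_items | pnd_items, conclusion) / denom

# How much does adding gender=female "lift" the denial rate among new-car applicants?
records = [
    {"gender=female", "purpose=new_car", "credit=no"},
    {"gender=female", "purpose=new_car", "credit=no"},
    {"gender=male", "purpose=new_car", "credit=yes"},
    {"gender=male", "purpose=new_car", "credit=no"},
]
print(elift(records, {"gender=female"}, {"purpose=new_car"}, {"credit=no"}))  # 1.0 / 0.75 ≈ 1.33
```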
Why not just "delete" PD attributes?
• If the focus is detection: this prevents detection.
• If the focus is prevention: this may reproduce indirect discrimination ... and that indirect discrimination will also not be detected! (See the toy example below.)
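A toy illustration of the prevention problem, reusing conf from the sketch above. The data is invented so that a PND attribute (district) is highly correlated with the deleted PD attribute (gender) and acts as a proxy.

```python
# Gender has been deleted before mining, but district D1 is (say) mostly female,
# so the miner rediscovers the same discriminatory pattern via the proxy attribute.
records_without_gender = [
    {"district=D1", "credit=no"}, {"district=D1", "credit=no"}, {"district=D1", "credit=no"},
    {"district=D2", "credit=yes"}, {"district=D2", "credit=yes"}, {"district=D2", "credit=no"},
]
# The rule district=D1 -> credit=no still comes out with confidence 1.0, denying
# credit to (mostly) the same women as before, and with gender gone this indirect
# discrimination can no longer be detected from the data alone.
print(conf(records_without_gender, {"district=D1"}, {"credit=no"}))  # 1.0
```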
DADM: Examples and DCUBE output
Three points of intervention for DADM – algorithmic / "classical"
• Post-processing: as a filter on the mining results (e.g. DCUBE); a filtering sketch follows this list.
• Pre-processing: similar to the distortion-based techniques for privacy-preserving association-rule mining; e.g. Hajian et al. 2013ff.
• In-processing: e.g. Kamiran et al. 2010 change the tree-learning algorithm: at each node, the good split is the one that achieves high purity with respect to the class label (e.g. credit good/bad) but low purity with respect to the sensitive attribute (e.g. gender).
Many algorithms also avoid indirect discrimination (as formally defined via correlations / probabilistic implication).
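As a sketch of the post-processing idea (DCUBE does this far more thoroughly, directly on the database): filter out mined rules whose elift reaches a threshold α. The rule list, the precomputed confidences, and the threshold value are all invented for illustration.

```python
# Post-processing sketch: block rules that are alpha-discriminatory, i.e. whose
# elift = conf(B & A -> C) / conf(B -> C) reaches a chosen threshold alpha.
rules = [
    {"premise": "gender=female & purpose=new_car", "conclusion": "credit=no",
     "conf_ab": 0.80, "conf_b": 0.30},   # elift ≈ 2.67
    {"premise": "savings=low & purpose=new_car", "conclusion": "credit=no",
     "conf_ab": 0.60, "conf_b": 0.55},   # elift ≈ 1.09
]
alpha = 2.0

def is_alpha_discriminatory(rule, alpha):
    return rule["conf_ab"] / rule["conf_b"] >= alpha

blocked = [r for r in rules if is_alpha_discriminatory(r, alpha)]
kept = [r for r in rules if not is_alpha_discriminatory(r, alpha)]
print(len(blocked), len(kept))  # 1 1
```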
Recall: Example weather data
NoTrueHighMildRainy
YesFalseNormalHotOvercast
YesTrueHighMildOvercast
YesTrueNormalMildSunny
YesFalseNormalMildRainy
YesFalseNormalCoolSunny
NoFalseHighMildSunny
YesTrueNormalCoolOvercast
NoTrueNormalCoolRainy
YesFalseNormalCoolRainy
YesFalseHighMildRainy
YesFalseHighHot Overcast
NoTrueHigh Hot Sunny
NoFalseHighHotSunny
PlayWindyHumidityTempOutlook
Recall: Decision tree learning for classification / prediction
In which weather will someone play (tennis etc.)?
Result: a decision tree (shown on the slide, learned from the WEKA weather data); but how do we get there?
Recall: Which attribute to select?
Based on the highest purity of the class attribute in the new nodes (measured by entropy / information gain). A small calculation sketch follows.
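A minimal sketch of this selection step on the 14 rows above (the standard WEKA weather data); entropy and information gain are computed exactly as in the textbook treatment.

```python
from collections import Counter
from math import log2

# The 14 rows of the weather table above, as (Outlook, Temp, Humidity, Windy, Play).
DATA = [
    ("Rainy","Mild","High",True,"No"), ("Overcast","Hot","Normal",False,"Yes"),
    ("Overcast","Mild","High",True,"Yes"), ("Sunny","Mild","Normal",True,"Yes"),
    ("Rainy","Mild","Normal",False,"Yes"), ("Sunny","Cool","Normal",False,"Yes"),
    ("Sunny","Mild","High",False,"No"), ("Overcast","Cool","Normal",True,"Yes"),
    ("Rainy","Cool","Normal",True,"No"), ("Rainy","Cool","Normal",False,"Yes"),
    ("Rainy","Mild","High",False,"Yes"), ("Overcast","Hot","High",False,"Yes"),
    ("Sunny","Hot","High",True,"No"), ("Sunny","Hot","High",False,"No"),
]
COLS = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
rows = [dict(zip(COLS, r)) for r in DATA]

def entropy(labels):
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in counts.values())

def info_gain(rows, attribute, target="Play"):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    gain = entropy([r[target] for r in rows])
    n = len(rows)
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

for a in ["Outlook", "Temp", "Humidity", "Windy"]:
    print(a, round(info_gain(rows, a), 3))
# Outlook wins (≈ 0.247), so it becomes the root split.
```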
Extending the weather data
Goal: learn a classifier that does not discriminate by gender

Gender  Outlook   Temp  Humidity  Windy  Play
M       Rainy     Mild  High      True   No
F       Overcast  Hot   Normal    False  Yes
M       Overcast  Mild  High      True   Yes
M       Sunny     Mild  Normal    True   Yes
F       Rainy     Mild  Normal    False  Yes
M       Sunny     Cool  Normal    False  Yes
F       Sunny     Mild  High      False  No
M       Overcast  Cool  Normal    True   Yes
F       Rainy     Cool  Normal    True   No
F       Rainy     Cool  Normal    False  Yes
M       Rainy     Mild  High      False  Yes
M       Overcast  Hot   High      False  Yes
F       Sunny     Hot   High      True   No
M       Sunny     Hot   High      False  No
Assume this "pattern" in the new weather data
(the slide repeats the extended table above, with a pattern in it highlighted)
Which attribute to select now?
Based on the highest purity of the class attribute in the new nodes (measured by entropy / information gain), AND each node being low in purity w.r.t. gender (~ half/half)!
(Of course, in general, this need not lead to the selection of the same attribute!)
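A minimal sketch of such a discrimination-aware split criterion, in the spirit of Kamiran, Calders & Pechenizkiy (2010): reward purity w.r.t. the class label and penalize purity w.r.t. the sensitive attribute. The simple subtraction IG_class − IG_sensitive is only one of the variants they study; rows and info_gain come from the sketch above, extended here with the Gender column.

```python
# Attach the Gender column from the extended table to the weather rows.
GENDERS = ["M","F","M","M","F","M","F","M","F","F","M","M","F","M"]
for r, g in zip(rows, GENDERS):
    r["Gender"] = g

def dadm_split_score(rows, attribute, target="Play", sensitive="Gender"):
    """Higher is better: informative about Play, uninformative about Gender."""
    return info_gain(rows, attribute, target) - info_gain(rows, attribute, sensitive)

# The tree learner then picks, at each node, the attribute maximizing this score
# instead of maximizing information gain w.r.t. the class alone.
best = max(["Outlook", "Temp", "Humidity", "Windy"],
           key=lambda a: dadm_split_score(rows, a))
print(best)
```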
Agenda
• Motivation: concepts and current cases
• (Classical) discrimination-aware data mining
• Exploratory discrimination-aware data mining; evaluation
• (Some) limitations + outlook
Decision making: DM only?
But are (e.g. loan) decisions made fully automatically?
Cf. EU Privacy Directive, Article 15(1): "Member States shall grant the right to every person not to be subject to a decision which produces legal effects concerning him or significantly affects him and which is based solely on automated processing of data intended to evaluate certain personal aspects relating to him, such as his performance at work, creditworthiness, reliability, conduct, etc."
Four points of intervention for DADM – algorithmic & beyond
• Pre-processing
• In-processing
• Post-processing: as a filter on the mining results (e.g. DCUBE); hiding "bad patterns"
• In the interaction of a decision-support system (Berendt & Preibusch): hiding or highlighting "bad patterns"
Limitations of classical DADM: constraint-oriented vs. exploratory DADM

Detection:
• Constraint-oriented DADM can only detect discrimination by pre-defined features / constraints. Ex.: PD(female), PND(has-children), but discrimination of mothers goes undetected.
• Exploratory DADM: exploratory data analysis supports feature construction and new feature analyses.

Avoidance of creation:
• Constraint-oriented DADM, fully automatic decision making: cannot implement the legal concept of "treat equal things equally and different things differently" (AI-hard).
• Constraint-oriented DADM, semi-automated decision support: sanitized rules → sanitized minds?
• Exploratory DADM, fully automatic decision making: ?
• Exploratory DADM, semi-automated decision support: salience, awareness, reflection → better decisions?
How to do exploratory DADM?
• Patterns that characterize classes
• Patterns that characterize rules
• Items, itemsets
• Interestingness measures
• Visualisation, exploration, interactivity
Exploratory DADM: DCUBE-GUI (screenshots on slides)
• Left: rule count (size) vs. PD/non-PD (colour)
• Right: rule count (size) vs. AD-measure (rainbow-colours scale)
• DCUBE-GUI: co-occurrences of items in rule premises
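A hypothetical sketch of this kind of rule-scatter view using matplotlib: each point is a mined rule, sized by coverage and coloured by its discrimination measure. rules_df, its columns, and the numbers are assumptions for illustration, not DCUBE-GUI's actual data model or code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented summary of mined rules: coverage, confidence, and elift per rule.
rules_df = pd.DataFrame({
    "support": [120, 45, 300, 80],      # how many cases each rule covers
    "confidence": [0.90, 0.60, 0.75, 0.50],
    "elift": [2.4, 1.1, 0.9, 3.0],      # discrimination measure
})

plt.scatter(rules_df["support"], rules_df["confidence"],
            s=rules_df["support"],                 # point size ~ rule coverage
            c=rules_df["elift"], cmap="rainbow")   # colour ~ elift ("rainbow-colours scale")
plt.colorbar(label="elift")
plt.xlabel("support")
plt.ylabel("confidence")
plt.title("Rules: size ~ coverage, colour ~ discrimination measure")
plt.show()
```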
Evaluating DADM
• Algorithm-centric, automated measures
• User studies
Evaluation: Comparing cDADM & eDADM
(see the comparison table above; in addition:)
• Constraint-oriented DADM: "hiding bad patterns", black box
• Exploratory DADM: "highlighting bad patterns", white box
A more accurate definition of unlawful discrimination
Equality and discrimination are two sides of the same coin: "The principle of equality requires that equal situations are treated equally and unequal situations differently. Failure to do so will amount to discrimination unless an objective and reasonable justification exists" (Explanatory memorandum, Protocol 12 to the ECHR).
Differential/unequal treatment vs. discrimination:
• Differential treatment: neutral; tells us nothing about the legal acceptability of a given measure.
• Discrimination: refers to unacceptable differential treatment (from a legal perspective).
Whether or not differential treatment is unacceptable, and thus amounts to discrimination, is determined by the choices of law makers and judicial review. However: differential treatment may be perceived as unfair/unjust even if tolerated by law.
An important example of European non-discrimination law
European Convention on Human Rights, Art. 14, Prohibition of Discrimination: "The enjoyment of the rights and freedoms set forth in this Convention shall be secured without discrimination on any ground such as sex, race, colour, language, religion, political or other opinion, national or social origin, association with a national minority, property, birth or other status."
Limitations (1): DADM's simple view of unlawful discrimination
1. A given differentiation in treatment may or may not be unlawful discrimination
   • depending on the agent
   • if based on "innocuous" reasons (indirect discrimination)
   • depending on whether situations are comparable ("treat equal things equally and unequal things unequally"); NOT differentiating by a protected attribute may constitute discrimination!
   • depending on aims and proportionality of means, e.g. "genuine occupational requirement"
   • depending on the changing social & legal environment
2. A fixed set of attributes makes it impossible to detect new forms of discrimination.
Data mining for loan decision support
(pipeline diagram on slide: Data → Algorithm → Pattern → Decision)
• Data: loan defaults; demographics, loan purposes
• Algorithm: DM, cDADM, eDADM
• Pattern: positive/negative risk factors; graphical presentation; with/without discrimination
• Decision: grant/deny loan, justify; assessed for actionability and decision quality
Online experiment with 215 US mTurkers
• Framing – Prevention: bank; Detection: agency
• Payment: $6.00 show-up fee; $0.25 performance bonus per assessed task
• Tasks: 3 exercise tasks, 6 assessed tasks
• Questionnaire: demographics, quant/bank job, experience with discrimination
Decision-making scenario
"Dabiku is a Kenyan national. She is single and has no children. She has been employed as a manager for the past 10 years. She now asks for a loan of $10,000 for 24 months to set up her own business. She has $100 in her checking account and no other debts. There have been some delays in paying back past loans."
Task structure
• Vignette, describing applicant and application
• Rules: positive/negative risks, flagged
• Decision and motivation, optional comment
Required competencies (a toy aggregation sketch follows this list)
• Discard discrimination-indexed rules
• Aggregate rule certainties
• Justify decision by categorising risk factors
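One plausible reading of these competencies as a sketch: discard rules flagged as discriminatory, aggregate the certainties of the remaining applicable rules, and grant the loan if the aggregate is positive. The rule set, certainty values, and sum-then-threshold aggregation are invented for illustration, not the study's exact scoring.

```python
# Hypothetical rules shown to a participant, each with a signed certainty.
rules = [
    {"certainty": +0.50, "reason": "stable employment", "discriminatory": False},
    {"certainty": -0.30, "reason": "delays on past loans", "discriminatory": False},
    {"certainty": -0.67, "reason": "applicant is female", "discriminatory": True},
]

# Competency 1: discard discrimination-indexed rules.
usable = [r for r in rules if not r["discriminatory"]]
# Competency 2: aggregate the remaining rule certainties (here: a simple sum).
score = sum(r["certainty"] for r in usable)
# Competency 3: derive and justify a decision from the aggregate.
decision = "grant" if score > 0 else "deny"
print(round(score, 2), decision)  # 0.2 grant
```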
Rule visualisation by treatment (screenshots on slide; example rule features: savings, residence, foreigner)
• Constrained DADM: hide bad features (prevention scenario)
• Exploratory DADM: flag bad features (detection scenario)
• (not DA)DM: neither flagged nor hidden
Actionability and decision quality
Decisions and motivations: DM versus DADM
• More correct decisions in DADM
• More correct motivations in DADM
• No performance impact
Relative merits:
• Constrained DADM better for prevention
• Exploratory DADM better for detection
Berendt, B., & Preibusch, S. (2014). Better decision support through exploratory discrimination-aware data mining. Artificial Intelligence and Law.
Biases: discrimination persistent in cDADM
"I dropped the −.67 number a little bit because it included her being a female as a reason."
Agenda
• Motivation: concepts and current cases
• (Classical) discrimination-aware data mining
• Exploratory discrimination-aware data mining; evaluation
• (Some) limitations + outlook
Limitations (1), recap: DADM's simple view of unlawful discrimination
A given differentiation in treatment may or may not be unlawful discrimination
• depending on the agent
• if based on "innocuous" reasons (indirect discrimination)
• depending on whether situations are comparable ("treat equal things equally and unequal things unequally")
• depending on aims and proportionality of means, e.g. "genuine occupational requirement"
• depending on the changing social & legal environment
Claim: The eDADM white-box approach can accommodate (some of) these complexities:
• provide more flexibility for detecting and avoiding discrimination by positioning itself as a decision-support system
• support awareness and reflection
• increase transparency
• increase accountability
Limitations (2) / Outlook: social / critical theories of discrimination
• New discrimination grounds (see the "mother" example)
• Further patterns related to discrimination: intersectionality; + and − of hiding / showing features
• The hidden assumptions (and effects!) of DM:
  – Ontological status of features? DM creates new features and new forms of discrimination.
  – Notion of social justice underlying allocation?
Outlook: Evaluating these claims in practice
(see the cDADM vs. eDADM comparison table above)
Outlook: Developing the automated parts of eDADM further
(see the cDADM vs. eDADM comparison table above)
Thank you!
References
• Makinen, J. (2015). China prepares to rank its citizens on 'social credit'. Los Angeles Times, 15 November 2015. http://www.latimes.com/world/asia/la-fg-china-credit-system-20151122-story.html
• The Economist (2014). Parole and technology: Prison breakthrough. 19 April 2014. http://www.economist.com/news/united-states/21601009-big-data-can-help-states-decide-whom-release-prison-prison-breakthrough
• Ferguson, A. G. (2015). Big data and predictive reasonable suspicion. University of Pennsylvania Law Review, 163(2), 327-410. http://scholarship.law.upenn.edu/cgi/viewcontent.cgi?article=9464&context=penn_law_review
• Berendt, B. (2015). Big Capta, Bad Science? http://people.cs.kuleuven.be/~bettina.berendt/Reviews/BigData.pdf
• Berendt, B., & Preibusch, S. (2014). Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence. Artificial Intelligence and Law, 22(2), 175-209. http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_preibusch_2014.pdf
• Pedreschi, D., Ruggieri, S., & Turini, F. (2008). Discrimination-aware data mining. In Proceedings of KDD'08, pp. 560-568. ACM. http://www.di.unipi.it/~ruggieri/Papers/kdd2008.pdf
• Ruggieri, S., Pedreschi, D., & Turini, F. (2010). DCUBE: Discrimination discovery in databases. In Proceedings of SIGMOD'10, pp. 1127-1130. http://www.di.unipi.it/~ruggieri/Papers/dcube.pdf (and further papers by the same team)
• Hajian, S., & Domingo-Ferrer, J. (2013). A methodology for direct and indirect discrimination prevention in data mining. IEEE Transactions on Knowledge and Data Engineering, 25(7), 1445-1459. http://crises2-deim.urv.cat/docs/publications/journals/684.pdf
• Hajian, S., Domingo-Ferrer, J., & Farràs, O. (2014). Generalization-based privacy preservation and discrimination prevention in data publishing and mining. Data Mining and Knowledge Discovery, 28(5-6), 1158-1188. http://crises2-deim.urv.cat/docs/publications/journals/813.pdf
• Kamiran, F., Calders, T., & Pechenizkiy, M. (2010). Discrimination aware decision tree learning. In ICDM 2010, pp. 869-874. http://wwwis.win.tue.nl/~tcalders/pubs/TR10-13.pdf
• "EU Privacy Directive": Directive 95/46/EC of the European Parliament and of the Council of 24.10.1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data (O.J. L 281, 23.11.1995)
• Bygrave, L. A. (2001). Minding the machine: Article 15 of the EC Data Protection Directive and automated profiling. Computer Law & Security Report, 17, 17-24. http://folk.uio.no/lee/oldpage/articles/Minding_machine.pdf
• Gao, B., & Berendt, B. (2011). Visual data mining for higher-level patterns: discrimination-aware data mining and beyond. In Proceedings of BENELEARN 2011. http://www.liacs.nl/~putten/benelearn2011/Benelearn2011_Proceedings.pdf