Tight Bounds for Strategyproof Classification

Jeff Rosenschein
School of Computer Science and Engineering
Hebrew University

Joint work with: Reshef Meir, Shaull Almagor, Assaf Michaely, Ariel Procaccia
Strategy-Proof Classification

• An example
• Motivation
• Our model and some previous results
• Filling the gap: proving a lower bound
• The weighted case
• Some generalizations
![Page 3: Tight Bounds for Strategyproof Classification](https://reader036.vdocuments.net/reader036/viewer/2022062310/568162b1550346895dd3392c/html5/thumbnails/3.jpg)
The Motivating Questions

• Do “strategyproof” considerations apply to machine learning?
• If agents have an incentive to lie, what can we do about it?
  – Approximation
  – Randomization
Strategic labeling: an example

(figure: a dataset with the ERM classifier, which makes 5 errors)
There is a better classifier! (for me…)

(figure: the same dataset with the classifier one agent would prefer)
If I just change the labels…

(figure: after the agent flips its labels, the ERM shifts and now makes 2+5 = 7 errors on the true labels)
Classification

The supervised classification problem:
– Input: a set of labeled data points {(x_i, y_i)}_{i=1..m}
– Output: a classifier c from some predefined concept class C (e.g., functions of the form f : X → {-,+})
– We usually want c not just to classify the sample correctly, but to generalize well, i.e., to minimize R(c) ≡ E_{(x,y)~D}[ c(x) ≠ y ], the expected error w.r.t. the distribution D (the 0/1 loss function)
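The 0/1 risk above can be estimated empirically by sampling. A minimal sketch (the distribution, labeling rule, and threshold classifier here are illustrative assumptions, not from the talk):

```python
import random

def empirical_risk(classifier, samples):
    """Fraction of labeled samples the classifier gets wrong (the 0/1 loss)."""
    return sum(1 for x, y in samples if classifier(x) != y) / len(samples)

# Illustrative setup: points uniform on [0, 1], true label '+' iff x > 0.3,
# and a threshold classifier that predicts '+' iff x > 0.5.
random.seed(0)
samples = [(x, '+' if x > 0.3 else '-') for x in (random.random() for _ in range(10_000))]
c = lambda x: '+' if x > 0.5 else '-'

# c errs exactly on points in (0.3, 0.5], i.e., on roughly 20% of the mass.
print(empirical_risk(c, samples))
```

With enough samples the empirical estimate concentrates around the true risk, which is what lets ERM-style algorithms generalize.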
Classification (cont.)

• A common approach is to return the ERM (Empirical Risk Minimizer), i.e., the concept in C that best fits the given samples (has the lowest number of errors)
• This generalizes well under some assumptions on the concept class C (e.g., linear classifiers tend to generalize well)

With multiple experts, we can’t trust our ERM!
Where do we find “experts” with incentives?

Example 1: A firm learning purchase patterns
– Information gathered from local retailers
– The resulting policy affects them: “the best policy is the policy that fits my pattern”
Example 2: Internet polls / polls of experts

(diagram: users → reported dataset → classification algorithm → classifier)
Motivation from other domains

• Aggregating partitions
• Judgment aggregation
• Facility location (on the n-dimensional binary cube)

| Agent | A | B | A & B | A ∨ ¬B |
|-------|---|---|-------|--------|
| 1     | T | F | F     | T      |
| 2     | F | T | F     | F      |
| 3     | F | F | F     | T      |
Input: Example

(figure: agents 1, 2, and 3 each assign their own +/- labels to the same set of points)

X ∈ X^m
Y_1 ∈ {-,+}^m,  Y_2 ∈ {-,+}^m,  Y_3 ∈ {-,+}^m

S = S_1, S_2, …, S_n = (X,Y_1), …, (X,Y_n)
Mechanisms

• A mechanism M receives a labeled dataset S and outputs c = M(S) ∈ C
• Private risk of i: R_i(c,S) = |{k : c(x_ik) ≠ y_ik}| / m_i  (the fraction of errors on S_i)
• Global risk: R(c,S) = |{i,k : c(x_ik) ≠ y_ik}| / m  (the fraction of errors on S)
• No payments
• We allow non-deterministic mechanisms
  – Measure the expected risk
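The two risk measures can be written directly from their definitions. A small sketch (the toy dataset is an illustrative assumption):

```python
def private_risk(c, S_i):
    """R_i(c, S): the fraction of agent i's own points that c mislabels."""
    return sum(1 for x, y in S_i if c(x) != y) / len(S_i)

def global_risk(c, S):
    """R(c, S): the fraction of all points, over all agents, that c mislabels."""
    points = [p for S_i in S for p in S_i]
    return sum(1 for x, y in points if c(x) != y) / len(points)

# Toy data: two agents with three points each.
S = [
    [(0, '+'), (1, '+'), (2, '-')],  # agent 1
    [(3, '-'), (4, '-'), (5, '-')],  # agent 2
]
all_negative = lambda x: '-'
print(private_risk(all_negative, S[0]))  # agent 1: 2 of its 3 points are '+'
print(global_risk(all_negative, S))      # 2 errors among all 6 points
```

Note that an agent's private risk is always measured against its own (true) labels, which is what creates the incentive to misreport.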
ERM

We compare the outcome of M to the ERM:
c* = ERM(S) = argmin_{c ∈ C} R(c, S)
r* = R(c*, S)

Can our mechanism simply compute and return the ERM?
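For a finite concept class, the ERM is just a minimization over C. A minimal sketch (the two constant classifiers and the toy dataset are illustrative):

```python
def erm(C, S):
    """Return the concept in C with the lowest global risk on S (ties arbitrary)."""
    points = [p for S_i in S for p in S_i]
    return min(C, key=lambda c: sum(1 for x, y in points if c(x) != y))

# A tiny class, as in the talk's simple case: the two constant classifiers.
def all_pos(x): return '+'
def all_neg(x): return '-'

S = [[(0, '+'), (1, '+')], [(2, '-')]]       # two agents, three points in total
print(erm([all_pos, all_neg], S).__name__)   # all_pos makes 1 error, all_neg makes 2
```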
Requirements (the most important slide)

1. Good approximation: ∀S, R(M(S), S) ≤ α·r*
2. Strategy-proofness (SP): ∀i, S, S_i′: R_i(M(S_-i, S_i′), S) ≥ R_i(M(S), S), i.e., no lie S_i′ gives agent i a lower private risk than the truth S_i
3. No monetary transfers

• ERM(S) is 1-approximating but not SP
• ERM(S_1) is SP but gives a bad approximation

Are there any mechanisms that guarantee both SP and good approximation?
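The claim that the plain ERM is not SP can be seen on a toy instance, in the spirit of the strategic-labeling example earlier in the deck (the concrete numbers are illustrative assumptions): an agent with mixed labels gains by reporting all of its points as '+'.

```python
from collections import Counter

def erm_constant(reports):
    """ERM over C = {all '+', all '-'}: the majority label over all reported points."""
    counts = Counter(y for S_i in reports for _, y in S_i)
    return '+' if counts['+'] > counts['-'] else '-'

def private_risk(label, S_i):
    return sum(1 for _, y in S_i if y != label) / len(S_i)

# Agent 1 truly has 3 '+' and 2 '-' points; agent 2 has 4 '-' points.
truth_1 = [(x, '+') for x in range(3)] + [(x, '-') for x in range(3, 5)]
truth_2 = [(x, '-') for x in range(5, 9)]

honest = erm_constant([truth_1, truth_2])     # '-' wins 6 to 3
lie_1 = [(x, '+') for x in range(5)]          # agent 1 reports all its points as '+'
manipulated = erm_constant([lie_1, truth_2])  # now '+' wins 5 to 4

print(private_risk(honest, truth_1))          # 0.6 when truthful
print(private_risk(manipulated, truth_1))     # 0.4 after the beneficial lie
```

The lie lowers agent 1's true private risk from 0.6 to 0.4, so the ERM violates condition 2.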
Related work

• A study of SP mechanisms in regression learning
  – O. Dekel, F. Fischer and A. D. Procaccia, SODA (2008), JCSS (2009) [supervised learning]
• No SP mechanisms for clustering
  – J. Perote-Peña and J. Perote, Economics Bulletin (2003) [unsupervised learning]
• Characterization of SP aggregation rules
Previous work: a simple case

• Tiny concept class: |C| = 2
• Either “all positive” or “all negative”

Theorem:
• There is an SP 2-approximation mechanism
• There is no SP α-approximation mechanism for any α < 2

R. Meir, A. D. Procaccia and J. S. Rosenschein, Strategyproof Classification under Constant Hypotheses: A Tale of Two Functions, AAAI 2008
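One mechanism in the spirit of the AAAI-08 result can be sketched as follows (this is a hedged reconstruction, not necessarily the exact mechanism from the paper): project each agent onto the constant classifier it prefers, then take a point-weighted majority over those projections.

```python
def sp_two_constants(reports):
    """Project each agent onto its preferred constant classifier, then take a
    point-weighted majority. An agent's report matters only through its own
    majority label, so exaggerating mixed labels changes nothing."""
    votes = {'+': 0, '-': 0}
    for S_i in reports:
        labels = [y for _, y in S_i]
        preferred = '+' if labels.count('+') >= labels.count('-') else '-'
        votes[preferred] += len(S_i)  # the agent "votes" with the weight of its points
    return '+' if votes['+'] > votes['-'] else '-'

S = [
    [(x, '+') for x in range(3)] + [(x, '-') for x in range(3, 5)],  # 3 '+', 2 '-'
    [(x, '-') for x in range(5, 9)],                                 # 4 '-'
]
print(sp_two_constants(S))  # agent 1's 5 points all count as '+': 5 votes to 4
```

Since truthfully reporting your majority label is already the most extreme useful report, the manipulation that broke the ERM in the previous example has no effect here.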
Proof sketch of the lower bound

C = {“all positive”, “all negative”}

(figure: the lower-bound construction for the two constant functions)
Results

Previous work: general concept classes

Theorem: Selecting a dictator at random is SP and guarantees a 3 - 2/n approximation

True for any concept class C

Question #1: Are there better mechanisms?
Question #2: What if agents are weighted?
Question #3: Does this generalize to every distribution?

Meir, Procaccia and Rosenschein, IJCAI 2009
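The random-dictator mechanism is easy to state in code. A sketch, sticking with the two constant classifiers for concreteness (the toy instance is an illustrative assumption):

```python
import random

def best_constant(S_i):
    """A single agent's ERM over the two constant classifiers."""
    labels = [y for _, y in S_i]
    return '+' if labels.count('+') >= labels.count('-') else '-'

def random_dictator(reports, rng):
    """Pick one agent uniformly at random and fit its data alone. This is SP:
    an agent's report never affects the outcome chosen for another dictator,
    and when chosen, its truthful report is fitted exactly."""
    return best_constant(rng.choice(reports))

def expected_global_risk(reports):
    """Average the global risk over the n equally likely dictators."""
    points = [p for S_i in reports for p in S_i]
    risks = []
    for S_i in reports:
        c = best_constant(S_i)
        risks.append(sum(1 for _, y in points if y != c) / len(points))
    return sum(risks) / len(reports)

S = [
    [(0, '+'), (1, '+'), (2, '-')],  # agent 1 leans '+'
    [(3, '-'), (4, '-'), (5, '-')],  # agent 2 is all '-'
]
# Expected risk 1/2 vs. the optimal risk 1/3: ratio 1.5 <= 3 - 2/n = 2 for n = 2.
print(expected_global_risk(S))
```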
A lower bound

Theorem: There is a concept class C (where |C| = 3) for which any SP mechanism has an approximation ratio of at least 3 - 2/n

Main result from our most recent work:
o Matches the upper bound from IJCAI-09
o Proof by a careful reduction to a voting scenario
o Proof sketch below
Proof sketch

Gibbard [’77] proved that every (randomized) SP voting rule for 3 candidates must be a lottery over dictators.*

We define X = {x, y, z}, and C as follows:

|     | x | y | z |
|-----|---|---|---|
| c_x | + | - | - |
| c_y | - | + | - |
| c_z | - | - | + |

We also restrict the agents, so that each agent can have mixed labels on just one point

(figure: the agents’ label profiles on x, y, z)

* not exactly…
Proof sketch (cont.)

Suppose that M is SP…
Proof sketch (cont.)

Suppose that M is SP. Then:
1. M must be monotone on the mixed point
2. M must ignore the mixed point
3. M is a (randomized) voting rule over the induced preferences (e.g., c_z > c_y > c_x for one agent and c_x > c_z > c_y for another)
Proof sketch (cont.)

4. By Gibbard [’77], M is a random dictator
5. We construct an instance, with agents split between the preference orders c_z > c_y > c_x and c_x > c_z > c_y, on which random dictators perform poorly
Weighted agents

• We must select a dictator randomly
• However, the selection probability may depend on the weights
• Naïve approach: pr(i) = w_i
  o Only gives a 3-approximation
• An optimal SP algorithm: pr(i) ∝ w_i(1 + w_i)
  o Matches the lower bound of 3 - 2/n
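The two selection rules can be sketched as probability vectors over the agents. Note that the optimal rule's formula did not survive extraction from the original slide, so the form pr(i) ∝ w_i(1 + w_i) used below is a reconstruction and should be treated as an assumption:

```python
def naive_selection(weights):
    """Naive SP rule from the slide: dictator i is chosen with probability w_i
    (weights sum to 1). Per the slide, this only gives a 3-approximation."""
    return list(weights)

def smoothed_selection(weights):
    """Reconstructed optimal rule (an assumption; the slide's formula was
    garbled): pick i with probability proportional to w_i * (1 + w_i).
    This boosts heavier agents and reduces to the uniform 1/n choice
    when all weights are equal."""
    scores = [w * (1 + w) for w in weights]
    total = sum(scores)
    return [s / total for s in scores]

print(smoothed_selection([0.25, 0.25, 0.25, 0.25]))  # uniform under equal weights
```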
Generalization and learning

• So far, we have only compared our results to the ERM, i.e., to the data at hand
• We want learning algorithms that generalize well from sampled data
  – with minimal strategic bias
  – Can we ask for SP algorithms?
Generalization (cont.)

• There is a fixed distribution D_X on X
• Each agent holds a private function Y_i : X → {+,-} labeling the entire input space
  – Possibly non-deterministic
• The algorithm is allowed to sample from D_X and ask agents for their labels
• We evaluate the result vs. the optimal risk, averaging over all agents, i.e.,

r_opt := inf_{c ∈ C} (1/n) Σ_i E_{x ~ D_X} Pr[ c(x) ≠ Y_i(x) ]
Generalization Mechanisms

Our mechanism is used as follows:
1. Sample m data points i.i.d.
2. Ask agents for their labels
3. Use the SP mechanism on the labeled data, and return the result

• Does it work?
  – Depends on our game-theoretic and learning-theoretic assumptions
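The three steps above can be sketched as a small pipeline. Everything concrete here (the uniform distribution, the two agents, and the stand-in dictatorship mechanism) is an illustrative assumption:

```python
import random

def generalization_mechanism(sample_point, agents, sp_mechanism, m, rng):
    """The three steps from the slide: (1) draw m i.i.d. points from D_X,
    (2) ask every agent for its labels, (3) run the SP mechanism on the result."""
    xs = [sample_point(rng) for _ in range(m)]
    reports = [[(x, agent(x)) for x in xs] for agent in agents]
    return sp_mechanism(reports)

def first_agent_majority(reports):
    """Stand-in SP mechanism: a dictatorship of agent 1 over the two constants."""
    labels = [y for _, y in reports[0]]
    return '+' if labels.count('+') >= labels.count('-') else '-'

# Illustrative instance: D_X uniform on [0, 1]; agent 1 labels everything '+',
# agent 2 thresholds at 0.5.
agents = [lambda x: '+', lambda x: '+' if x > 0.5 else '-']
c = generalization_mechanism(lambda r: r.random(), agents,
                             first_agent_majority, 100, random.Random(0))
print(c)  # '+': the dictator, agent 1, labels every sampled point '+'
```

Any of the SP mechanisms discussed earlier can be plugged in as `sp_mechanism`; the question the next slides address is how large m must be for the sampled approximation guarantee to carry over to D_X.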
Generalization (cont.)

(figure: the distribution D_X with the agents’ labeling functions Y_1, Y_2, Y_3)
The “truthful approach”

• Assumption A: Agents do not lie unless they gain at least ε
• Theorem: W.h.p. the following occurs:
  – There is no ε-beneficial lie
  – The approximation ratio (if no one lies) is close to 3 - 2/n
• Corollary: With enough samples, the expected approximation ratio is close to 3 - 2/n
• The number of required samples is polynomial in n and 1/ε

R. Meir, A. D. Procaccia and J. S. Rosenschein, Strategyproof Classification with Shared Inputs, IJCAI 2009
The “rational approach”

• Assumption B: Agents always pick a dominant strategy, if one exists
• Theorem: With enough samples, the expected approximation ratio is close to 3 - 2/n
• The number of required samples is polynomial in 1/ε (and does not depend on n)
Future work

• Alternative assumptions on the structure of data
• Other models of strategic behavior
• Better understanding of the relation between our model and other domains, such as judgment aggregation
• Better characterization results for special cases
• Other concept classes
• Other loss functions (linear loss, quadratic loss, …)
• …
Talk Based on the Following Papers:

• Strategyproof Classification Under Constant Hypotheses: A Tale of Two Functions, Reshef Meir, Ariel D. Procaccia, and Jeffrey S. Rosenschein. The Twenty-Third National Conference on Artificial Intelligence (AAAI 2008), Chicago, Illinois, July 2008, pages 126-131.
• Strategyproof Classification with Shared Inputs, Reshef Meir, Ariel D. Procaccia, and Jeffrey S. Rosenschein. The Twenty-First International Joint Conference on Artificial Intelligence (IJCAI 2009), Pasadena, California, July 2009, pages 220-225.
• On the Limits of Dictatorial Classification, Reshef Meir, Ariel D. Procaccia, and Jeffrey S. Rosenschein. The Ninth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), Toronto, May 2010.
• Tight Bounds for Strategyproof Classification, Reshef Meir, Shaull Almagor, Assaf Michaely, and Jeffrey S. Rosenschein. The Tenth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), Taipei, Taiwan, May 2011, pages 319-326.