Dynamic Matching Models

Ana Busic

Inria Paris - Rocquencourt
CS Department of Ecole normale superieure

joint work with Varun Gupta, Jean Mairesse and Sean Meyn

3rd Workshop on Cognition and Control
January 16 & 17, 2015, University of Florida

1 / 29

Bipartite matching model Model description

Dynamic Bipartite Matching Model

Static model: long history in economics. Finding Stable Matches: 2012 Nobel Prize awarded to L. S. Shapley.

Dynamic model introduced by Caldentey, Kaplan, and Weiss (2009). Multiclass queueing model: Supply/Demand play symmetric roles.

Discrete time queueing model with two types of arrival: “supply” and “demand”.

Arrival of Supply/Demand is i.i.d., with

|A^D(t)| = |A^S(t)| for all t

Instantaneous matchings according to a bipartite matching graph.

Supply/Demand that cannot be matched are stored in a buffer.

2 / 29

Matching in Health-care

Matching Kidneys and Donors

Who can join this program?

For recipients: If you are eligible for a kidney transplant and are receiving care at a transplant center in the United States, you can join ... You must have a living donor who is willing and medically able to donate his or her kidney ...

For donors: You must also be willing to take part ...

United Network for Organ Sharing: Talking About Transplantation

3 / 29

Matching Policies

Model specified by 1) Matching graph, 2) Joint probability measure µ for arrivals of Supply/Demand, and 3) A matching policy.

Admissible policies

State feedback: Decision U(t) depends only on buffers Q(t) and immediate arrivals A(t),

Q(t + 1) = Q(t) − U(t) + A(t)

Match according to graph, maintaining

|U^D(t)| = |U^S(t)| and |Q^D(t)| = |Q^S(t)| for all t

Q is a discrete time Markov chain. Stability = positive recurrence of Q.

4 / 29

Bipartite matching model Necessary conditions

Necessary stability conditions

For a matching graph (D, S, E) we denote:

D(s) = {d ∈ D : (d, s) ∈ E}, S(d) = {s ∈ S : (d, s) ∈ E}.

Necessary conditions: If the model is stable then the marginals of µ satisfy

NCond:
µ_D(U) < µ_S(S(U)), ∀U ⊊ D, U ≠ ∅
µ_S(V) < µ_D(D(V)), ∀V ⊊ S, V ≠ ∅

Prop. Given [(D, S, E), µ], there exists an algorithm of time complexity O((|D| + |S|)^3) to decide if NCond is satisfied.
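To make the definition of NCond concrete, here is a minimal brute-force sketch that enumerates all non-empty proper subsets U and V. It is not the cubic-time algorithm of the proposition (that one is flow-based); it is exponential and only meant for very small graphs. The function names and the three-class example graph are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def nonempty_proper_subsets(items):
    """Yield every subset of items that is neither empty nor the full set."""
    items = sorted(items)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            yield set(combo)


def satisfies_ncond(D, S, E, mu_D, mu_S):
    """Brute-force NCond check (exponential in |D| + |S|)."""
    for U in nonempty_proper_subsets(D):
        S_of_U = {s for (d, s) in E if d in U}  # S(U): supply classes adjacent to U
        if not sum(mu_D[d] for d in U) < sum(mu_S[s] for s in S_of_U):
            return False
    for V in nonempty_proper_subsets(S):
        D_of_V = {d for (d, s) in E if s in V}  # D(V): demand classes adjacent to V
        if not sum(mu_S[s] for s in V) < sum(mu_D[d] for d in D_of_V):
            return False
    return True


# Hypothetical three-class example (not read off the slides' figures).
D = [1, 2, 3]
S = ["1'", "2'", "3'"]
E = {(1, "2'"), (1, "3'"), (2, "1'"), (2, "2'"), (3, "1'")}
weights = [Fraction(1, 3), Fraction(2, 5), Fraction(4, 15)]
mu_D = dict(zip(D, weights))
mu_S = dict(zip(S, weights))
print(satisfies_ncond(D, S, E, mu_D, mu_S))
```

Exact rationals are used so that strict inequalities are not blurred by floating-point rounding.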

5 / 29

Proof using network flow arguments:

N = (D ∪ S ∪ {i, f}, E ∪ {(i, d), d ∈ D} ∪ {(s, f), s ∈ S})

Capacities: µ_D(d) for (i, d), µ_S(s) for (s, f), ∞ for (d, s).

Lemma.

1. There exists a flow of value 1 in N iff µ satisfies NCond≤ (< replaced by ≤ in NCond).

2. There exists a flow T of value 1 such that T(d, s) > 0 for all (d, s) ∈ E iff µ satisfies NCond.

6 / 29

Proof of Lemma 2

⇒ Follows easily from connectivity of the matching graph.

⇐ Fix η such that 0 < η < 1/|E|. A strictly positive flow of value |E|η:

T_η(x, y) = η for (x, y) = (d, s) ∈ E
T_η(x, y) = |S(d)| η for (x, y) = (i, d)
T_η(x, y) = |D(s)| η for (x, y) = (s, f).

Define: µ̃_D(d) = (µ_D(d) − |S(d)| η) / (1 − |E| η), µ̃_S(s) = (µ_S(s) − |D(s)| η) / (1 − |E| η).

For η small enough, µ̃_D, µ̃_S are probability measures satisfying NCond.

For µ̃_D, µ̃_S there exists a flow T̃ of value 1.

A strictly positive flow of value 1: T = T_η + (1 − |E| η) T̃.

7 / 29

Verification algorithm

The pair (µ_D, µ_S) satisfies NCond iff the pair (µ̃_D, µ̃_S) satisfies NCond for η strictly positive and small enough.

Run MaxFlow on the input (N, µ̃_D, µ̃_S) by considering η as a formal parameter “as small as needed”.

Quantities of type: x + yη for x, y ∈ R.
Addition: (x1 + y1η) + (x2 + y2η) = (x1 + x2) + (y1 + y2)η.
Comparisons:
[x1 + y1η = x2 + y2η] ⇐⇒ [x1 = x2, y1 = y2]
[x1 + y1η < x2 + y2η] ⇐⇒ [(x1 < x2) or (x1 = x2, y1 < y2)]

On any given input, MaxFlow stops in finite time. A posteriori, assign to η a value which is small enough not to reverse any strict inequality.
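The formal-parameter arithmetic above can be sketched as a small value type whose ordering is lexicographic in (x, y). This is only an illustration: the class name `EtaNumber` is hypothetical, and subtraction is added as an assumption (a max-flow routine needs it for residual capacities, although the slide lists only addition and comparison).

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True, order=True)
class EtaNumber:
    """A quantity x + y*eta, with eta a formal, arbitrarily small positive parameter.

    order=True compares the field tuples (x, y) lexicographically, which is
    exactly the comparison rule stated for quantities of type x + y*eta.
    """

    x: Fraction
    y: Fraction = Fraction(0)

    def __add__(self, other):
        return EtaNumber(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        # Assumption of this sketch: not listed on the slide, but closed in the same way.
        return EtaNumber(self.x - other.x, self.y - other.y)


# A capacity of the form mu_D(d) - |S(d)| * eta, with mu_D(d) = 1/3 and |S(d)| = 2.
capacity = EtaNumber(Fraction(1, 3), Fraction(-2))
flow = EtaNumber(Fraction(1, 3), Fraction(-3))
print(flow < capacity)            # same x part, so the eta coefficients decide
print(min(capacity, flow))        # built-in min works through the generated ordering
print(capacity - flow)            # residual capacity 0 + 1*eta
```

Because comparisons never look at a numeric value of η, a max-flow routine written against this type takes the same branches as it would for any sufficiently small positive η.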

8 / 29

Example

[Figure: matching graph (left) and arrival graph (right), each on demand classes 1, 2, 3, 4 and supply classes 1′, 2′, 3′, 4′.]

Consider any µ with supp(µ) = F . We have

µS({1′, 2′}) = µ(3, 1′) + µ(4, 2′) ≤ µD({3, 4}) ,

which contradicts NCond for U = {3, 4}.

9 / 29

Bipartite matching model Connectivity properties

Connectivity properties

Consider a bipartite matching structure (D, S, E, F). Associated directed graph: the nodes are D ∪ S and the arcs are

d −→ s, if (d, s) ∈ E, s −→ d, if (d, s) ∈ F.

[Figure: matching graph (D, S, E), arrival graph (D, S, F), and the associated directed graph, on demand classes 1, 2, 3, 4 and supply classes 1′, 2′, 3′, 4′.]

10 / 29

Thm. For a bipartite matching structure (D, S, E, F) the following properties are equivalent:

1. There exists µ such that supp(µ) = F, supp(µ_D) = D, supp(µ_S) = S and µ satisfies NCond.

2. The associated directed graph is strongly connected.

Thm. If the associated directed graph of (D, S, E, F) is strongly connected, then any bipartite matching model [(D, S, E, F), µ, Pol] has a unique strongly connected component with all states leading to it.

11 / 29

Bipartite matching model Sufficient conditions

State space decomposition

The state space can be decomposed into facets, defined only by the non-empty classes.

Def. A facet is an ordered pair (U, V) such that: U ⊂ D, V ⊂ S and U × V ⊂ (D × S − E).

[Figure: facet ({3}, {3′}) and facet ({2, 3}, {3′}), shown on a graph with supply classes 2′, 1′, 3′.]

For a facet F = (U, V), define:

D•(F) = U, D♢(F) = D(V), D◦(F) = D − (D•(F) ∪ D♢(F))

S•(F) = V, S♢(F) = S(U), S◦(F) = S − (S•(F) ∪ S♢(F)).

12 / 29

Sufficient conditions

Conditions SCond:

µ_D(D♢(F)) + µ_S(S♢(F)) > 1 − µ(E ∩ D◦(F) × S◦(F)), ∀F ≠ (∅, ∅)

Prop. (Sufficient conditions) A bipartite model with probability µ satisfying SCond is stable under any admissible matching policy.

Proof. Variation of the linear Lyapunov function (number of unmatched customers), by class of the arriving customer (columns) and of the arriving server (rows):

        D♢    D◦        D•
S♢      −1    0         0
S◦      0     0 or 1    1
S•      0     1         1

13 / 29

Sufficient conditions

Def. A facet F is called saturated if D◦(F) = ∅ or S◦(F) = ∅.

SCond =⇒ NCond (considering only the saturated facets).

[Figure: a non-saturated facet and a saturated facet, shown on a graph with supply classes 2′, 1′, 3′.]

For the NN graph: SCond = {NCond ∩ (µ_D(1) + µ_S(1′) > 1 − µ(2, 2′))}.

For µ = µ_D × µ_S and µ_D = µ_S = (x, y, 1 − x − y):

NCond: x < 0.5 and 2x + y > 1

SCond: NCond and 2x + y^2 > 1
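The closed-form region above can be cross-checked by enumerating facets directly. The sketch below makes two assumptions that are not spelled out on the slides: the NN edge set is reconstructed from the stated conditions (the figure itself is not available), and only facets with both U and V non-empty are enumerated (the buffers always hold equally many demand and supply items). Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def nonempty_subsets(items):
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def facets(D, S, E):
    """Non-zero facets (U, V): both non-empty (assumption), no edge of E between U and V."""
    for U in nonempty_subsets(D):
        for V in nonempty_subsets(S):
            if not any((d, s) in E for d in U for s in V):
                yield U, V


def satisfies_scond(D, S, E, mu):
    """mu maps each arrival pair (d, s) to its probability."""
    for U, V in facets(D, S, E):
        D_dia = {d for (d, s) in E if s in V}   # D(V): can be matched with stored supply
        S_dia = {s for (d, s) in E if d in U}   # S(U): can be matched with stored demand
        D_circ = set(D) - U - D_dia
        S_circ = set(S) - V - S_dia
        mass_D = sum(p for (d, s), p in mu.items() if d in D_dia)
        mass_S = sum(p for (d, s), p in mu.items() if s in S_dia)
        free = sum(p for (d, s), p in mu.items()
                   if (d, s) in E and d in D_circ and s in S_circ)
        if not mass_D + mass_S > 1 - free:
            return False
    return True


# NN graph edge set, reconstructed from the conditions on the slide (assumption).
D = [1, 2, 3]
S = ["1'", "2'", "3'"]
E = {(1, "2'"), (1, "3'"), (2, "1'"), (2, "2'"), (3, "1'")}


def product_measure(x, y):
    w = [x, y, 1 - x - y]
    return {(d, s): w[i] * w[j] for i, d in enumerate(D) for j, s in enumerate(S)}


for x, y in [(Fraction(9, 20), Fraction(3, 10)), (Fraction(9, 20), Fraction(2, 5))]:
    closed_form = x < Fraction(1, 2) and 2 * x + y * y > 1
    print(x, y, satisfies_scond(D, S, E, product_measure(x, y)), closed_form)
```

On this edge set the enumeration produces six non-zero facets, of which only ({3}, {3′}) is non-saturated; it is the one that contributes the extra condition 2x + y^2 > 1.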

14 / 29

Policy-specific results Match the Longest has maximal stability region

Match the Longest has maximal stability region

Match the Longest (ML) policy: a newly arriving customer of class c is matched to a server in S(c) with the largest buffer (similarly for newly arriving server).

Thm. For any bipartite graph, ML has a maximal stability region.

Proof:

Quadratic Lyapunov function: L(x, y) = ∑_{d∈D} x_d^2 + ∑_{s∈S} y_s^2

ML minimizes the value of this Lyapunov function at each step.

Facet-dependent randomized policy. In a non-zero facet F: a server s ∈ S♢(F) is matched to d ∈ D•(F) ∩ D(s) with probability P^F_{sd}. These probabilities can be chosen such that:

∀d ∈ D•, ∑_{s∈S(d)} µ_S(s) P^F_{sd} > µ_D(d).

(symmetrically for customers)

For this randomized policy stability can be shown using the Foster-Lyapunov criterion.
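A minimal simulation sketch of Match the Longest follows. Several choices are assumptions of the sketch rather than statements from the talk: one customer and one server arrive per step, their classes are drawn independently (product measure), the customer is processed before the server, and the example graph and all names are hypothetical.

```python
import random


def simulate_ml(D, S, E, mu_D, mu_S, steps, seed=0):
    """Simulate Match the Longest and return the final buffers (q_d, q_s)."""
    rng = random.Random(seed)
    q_d = {d: 0 for d in D}
    q_s = {s: 0 for s in S}
    for _ in range(steps):
        d = rng.choices(D, weights=[mu_D[c] for c in D])[0]
        s = rng.choices(S, weights=[mu_S[c] for c in S])[0]

        # Arriving customer: match with the compatible server class holding the
        # longest buffer, otherwise wait in the buffer.
        partners = [t for t in S if (d, t) in E and q_s[t] > 0]
        if partners:
            q_s[max(partners, key=lambda t: q_s[t])] -= 1
        else:
            q_d[d] += 1

        # Arriving server: symmetric rule (it may take the customer just stored).
        partners = [c for c in D if (c, s) in E and q_d[c] > 0]
        if partners:
            q_d[max(partners, key=lambda c: q_d[c])] -= 1
        else:
            q_s[s] += 1
    return q_d, q_s


# Hypothetical three-class example.
D = [1, 2, 3]
S = ["1'", "2'", "3'"]
E = {(1, "2'"), (1, "3'"), (2, "1'"), (2, "2'"), (3, "1'")}
mu_D = {1: 0.45, 2: 0.40, 3: 0.15}
mu_S = {"1'": 0.45, "2'": 0.40, "3'": 0.15}
q_d, q_s = simulate_ml(D, S, E, mu_D, mu_S, steps=10_000)
print(q_d, q_s)
```

Two invariants are worth checking on any run: the total demand and supply buffers stay equal, and no compatible pair is ever left waiting together.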

15 / 29

Policy-specific results Priorities are not always stable

Priorities and Match the Shortest are not always stable

Prop. NN model with either the MS (Match the Shortest) policy or the PR (priority) policy such that customers of class 1 (resp. servers of class 1′) give priority to servers of class 2′ (resp. to customers of class 2):

[Figure: NN graph, with supply classes 2′, 1′, 3′.]

For both policies, the stability region is not maximal.

Consider µ_D = (1/3, 2/5, 4/15), µ_S = µ_D, and µ = µ_D × µ_S. NCond is satisfied, but the Markov chain is transient (for MS or PR as above).

16 / 29

Policy-specific results Priorities are not always stable

Stability region for Match the shortest

17 / 29

Workload Stabilizability

Optimization

Cost function c on buffer levels.

Average-cost:

η = lim sup_{N→∞} (1/N) ∑_{t=0}^{N−1} E[c(Q(t))]

Queue dynamics: Q(t + 1) = Q(t) − U(t) + A(t), t ≥ 0

Input process U represents the sequence of matching activities. Input space:

U⋄ = { ∑_{e∈E} n_e u^e : n_e ∈ Z+ }, with u^e = 1^i + 1^j for e = (i, j) ∈ E.

X(t) = Q(t) + A(t) is the state process of the MDP model.

X(t + 1) = X(t) − U(t) + A(t + 1)

The state space X⋄ = {x ∈ Z^ℓ_+ : ξ^0 · x = 0} with ξ^0 = (1, . . . , 1, −1, . . . , −1).

18 / 29

Workload

For any D ⊂ 𝒟 (a subset of the demand classes), the corresponding workload vector ξ^D is defined so that

ξ^D · x = ∑_{i∈D} x^D_i − ∑_{j∈S(D)} x^S_j

Necessary and sufficient condition for a stabilizing policy:

NCond: δ_D := −ξ^D · α > 0 for each D, where α = E[A(t)] is the arrival rate vector.

Why is this workload? Consistent with routing/scheduling models:

Fluid model, d/dt x(t) = −u(t) + α

The minimal time to reach the origin from x(0) = x: T*(x) = max_D (ξ^D · x) / δ_D

Heavy-traffic: δ_D ∼ 0 for one or more D
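As a worked illustration of the workload vectors, the sketch below tabulates δ_D for every non-empty proper subset D of demand classes and evaluates the draining-time formula T*(x) = max_D (ξ^D · x) / δ_D. The example graph, the arrival rates and the function names are assumptions; restricting to proper subsets is also an assumption (for the full set, δ_D = 0 since demand and supply arrive at equal total rate).

```python
from fractions import Fraction
from itertools import combinations


def supply_neighbours(subset, E):
    """S(D): supply classes that can be matched with some demand class in subset."""
    return {s for (d, s) in E if d in subset}


def delta_table(D, E, alpha_D, alpha_S):
    """delta_D = -xi^D . alpha = alpha_S(S(D)) - alpha_D(D), for proper non-empty D."""
    table = {}
    for size in range(1, len(D)):
        for subset in combinations(D, size):
            rate_out = sum(alpha_S[s] for s in supply_neighbours(subset, E))
            rate_in = sum(alpha_D[d] for d in subset)
            table[subset] = rate_out - rate_in
    return table


def minimal_draining_time(x_D, x_S, E, table):
    """T*(x) = max over D of (xi^D . x) / delta_D; requires every delta_D > 0."""
    ratios = []
    for subset, delta in table.items():
        workload = sum(x_D[d] for d in subset) - sum(
            x_S[s] for s in supply_neighbours(subset, E)
        )
        ratios.append(Fraction(workload) / delta)
    return max(ratios)


# Hypothetical three-class example.
D = [1, 2, 3]
E = {(1, "2'"), (1, "3'"), (2, "1'"), (2, "2'"), (3, "1'")}
alpha_D = {1: Fraction(1, 3), 2: Fraction(2, 5), 3: Fraction(4, 15)}
alpha_S = {"1'": Fraction(1, 3), "2'": Fraction(2, 5), "3'": Fraction(4, 15)}
table = delta_table(D, E, alpha_D, alpha_S)
x_D = {1: 0, 2: 0, 3: 2}
x_S = {"1'": 0, "2'": 0, "3'": 2}
print(table[(3,)], minimal_draining_time(x_D, x_S, E, table))
```

In this example demand class 3 can only be served by supply class 1′, so the set D = {3} has the smallest slack and determines the draining time.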

19 / 29

Workload Workload Relaxation

Workload Dynamics

Fix one workload vector ξ^D; denote (ξ, δ) for (ξ^D, δ_D).

Workload W(t) = ξ · X(t) can be positive or negative.

Dynamics as in other queueing models,

E[W(t + 1) − W(t) | X(t), U(t)] ≥ −δ

Achieved ⇐⇒ S(D) matches with D only.

Workload relaxation: take this as the model for control.

20 / 29

Relaxations

A workload relaxation takes this as the model for control:

One Dimensional Workload relaxation,

W(t + 1) = W(t) − δ + I(t) + N(t + 1), with idleness I(t) ≥ 0 and N(t + 1) zero mean.

Effective cost c̄ : ℝ → ℝ+: Given a cost function c for Q,

c̄(w) = min{c(x) : ξ · x = w}

piecewise linear if c is linear

Conclusions

Control of the relaxation = inventory model of Clark & Scarf

Hedging policy, with threshold r: Idling is not permitted unless W(t) < −r

Heavy-traffic: For average-cost optimal control, r ∼ (1/2) (σ^2_∆ / δ) log(1 + c+/c−)
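A quick numerical sanity check of the heavy-traffic threshold, reading the formula as r ≈ (1/2)(σ²/δ) log(1 + c+/c−). The ratio of two thresholds computed for the same arrival process does not depend on σ²/δ, so it can be compared with the two values of r* reported in the examples of this talk (14.9 for c+ = c− = 4, and 7.2 for c+ = 2, c− = 5). The value σ²/δ = 43 used below is backed out from those numbers and is an assumption, not a figure given on the slides.

```python
import math


def hedging_threshold(sigma2_over_delta, c_plus, c_minus):
    """r = (1/2) * (sigma^2 / delta) * log(1 + c_plus / c_minus)."""
    return 0.5 * sigma2_over_delta * math.log(1.0 + c_plus / c_minus)


# Ratio of thresholds for the two cost structures; sigma^2/delta cancels.
predicted_ratio = math.log(1 + 4 / 4) / math.log(1 + 2 / 5)
reported_ratio = 14.9 / 7.2
print(round(predicted_ratio, 3), round(reported_ratio, 3))

# With the assumed value sigma^2/delta = 43, both reported thresholds are recovered
# to within rounding.
print(round(hedging_threshold(43, 4, 4), 1), round(hedging_threshold(43, 2, 5), 1))
```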

21 / 29

Workload Examples

Tracking the Relaxation

[Figure: matching graph with edges e1, e2, e3, e4, e5 and buffer levels x^D_i, x^S_j.]

ξ^1 = (0, 0, 1, −1, 0, 0)^T

W(t) = Q^D_3(t) − Q^S_1(t)

Relaxation: Matching of Supply 1 and Demand 2 allowed only if W(t) < −r

Example 1

Cost: c(x) = x^D_1 + 2x^D_2 + 3x^D_3 + 3x^S_1 + 2x^S_2 + x^S_3

=⇒ Effective Cost: c̄(w) = 4|w|

Example 2

Cost: c(x) = 3x^D_1 + 2x^D_2 + x^D_3 + 3x^S_1 + 2x^S_2 + x^S_3

=⇒ Effective Cost: c̄(w) = max(2w, −5w)
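The two effective costs can be checked by brute force from the definition: minimise c(x) over states with ξ · x = w. The sketch below searches a small integer grid and, as an assumption, also imposes the balance constraint ξ^0 · x = 0 of the state space (total demand equals total supply). The coordinate order (x^D_1, x^D_2, x^D_3, x^S_1, x^S_2, x^S_3) and the function name are assumptions as well.

```python
from itertools import product

XI = (0, 0, 1, -1, 0, 0)          # workload vector xi^1
BALANCE = (1, 1, 1, -1, -1, -1)   # xi^0: total demand minus total supply

COST_EXAMPLE_1 = (1, 2, 3, 3, 2, 1)
COST_EXAMPLE_2 = (3, 2, 1, 3, 2, 1)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def effective_cost(cost, w, max_level=3):
    """Minimum of cost . x over integer states in {0..max_level}^6 with xi . x = w."""
    best = None
    for x in product(range(max_level + 1), repeat=6):
        if dot(BALANCE, x) != 0 or dot(XI, x) != w:
            continue
        value = dot(cost, x)
        if best is None or value < best:
            best = value
    return best


for w in (-2, -1, 1, 2):
    print(w, effective_cost(COST_EXAMPLE_1, w), effective_cost(COST_EXAMPLE_2, w))
```

The grid bound `max_level` only needs to be at least |w|, since the cheapest state places |w| items in one demand class and |w| in one supply class.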

22 / 29

Workload Examples

Tracking the Relaxation, W(t) = Q^D_3(t) − Q^S_1(t)

Example 1: c(x) = x^D_1 + 2x^D_2 + 3x^D_3 + 3x^S_1 + 2x^S_2 + x^S_3

Matching of Supply 1 and Demand 2 allowed only if W(t) < −r

Workload Relaxation:

Q^S_1(t) = Q^S_2(t) = 0 if W(t) > 0

Q^D_2(t) = Q^D_3(t) = 0 if W(t) < 0

[Figure: Average Cost Estimated in Simulation, as a function of the threshold r from 0 to 30; r* = 14.9.]

[Figure: Average Cost Comparisons for the Priority, MaxWeight and Threshold (15) policies; time axis up to 5 × 10^6.]

Simulation with r* = 14.9

23 / 29

Workload Examples

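Example 2: c(x) = 3x^D_1 + 2x^D_2 + x^D_3 + 3x^S_1 + 2x^S_2 + x^S_3

Workload relaxation:

Q^S_1(t) = Q^S_2(t) = 0 if W(t) > 0

Q^D_1(t) = Q^D_3(t) = 0 if W(t) < 0

[Figure: average cost as a function of the threshold r from 0 to 14; r* = 7.2.]

[Figure: comparison of the Priority, MaxWeight and Threshold (7) policies over time T.]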

24 / 29

h-MaxWeight and Approximate Optimality

h-MaxWeight

h-MaxWeight Policy: U(t) = φ^MW(Q(t))

φ^MW(x) = argmin_u E[∇h(x) · ∆(t + 1) | X(t) = x, U(t) = u]

where ∆(t + 1) = X(t + 1) − X(t) = −U(t) + A(t + 1)

Average drift: −φ^MW(x) + α = E[∆(t + 1) | X(t) = x] = E[−U(t) + A(t + 1) | X(t) = x]

For average cost optimality, this means,

E[h(Q(t + 1)) − h(Q(t)) | Q(t) = x] = ∇h(x) · [−φ^MW(x) + α] + (1/2) E[∆(t + 1)^T ∇^2 h(X̄) ∆(t + 1)]

where the first term is ∼ −c(x) and the second term is bounded.
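Since E[∇h(x) · ∆(t + 1) | x, u] = −∇h(x) · u + ∇h(x) · α, the argmin over u is the same as maximising ∇h(x) · u over feasible matching decisions. The sketch below does this by brute force on a tiny example. It rests on assumptions that are not from the talk: h(x) = (1/2)‖x‖² (so ∇h(x) = x), the feasibility constraint that no more items are matched than are present, the example graph, and all function names.

```python
from itertools import product


def h_maxweight_action(x_D, x_S, E, grad_D, grad_S):
    """Brute-force argmax of grad_h(x) . u over feasible match counts n_e per edge."""
    edges = sorted(E)
    bounds = [min(x_D[d], x_S[s]) for d, s in edges]
    best_counts, best_value = None, None
    for counts in product(*(range(b + 1) for b in bounds)):
        used_D = dict.fromkeys(x_D, 0)
        used_S = dict.fromkeys(x_S, 0)
        for (d, s), n in zip(edges, counts):
            used_D[d] += n
            used_S[s] += n
        feasible = all(used_D[d] <= x_D[d] for d in x_D) and all(
            used_S[s] <= x_S[s] for s in x_S
        )
        if not feasible:
            continue
        value = sum(grad_D[d] * used_D[d] for d in x_D) + sum(
            grad_S[s] * used_S[s] for s in x_S
        )
        if best_value is None or value > best_value:
            best_counts, best_value = counts, value
    return dict(zip(edges, best_counts))


# Hypothetical example: a single unit of supply 1' can serve demand 2 or demand 3.
E = {(1, "2'"), (1, "3'"), (2, "1'"), (2, "2'"), (3, "1'")}
x_D = {1: 0, 2: 3, 3: 1}
x_S = {"1'": 1, "2'": 0, "3'": 3}
action = h_maxweight_action(x_D, x_S, E, grad_D=x_D, grad_S=x_S)  # grad h(x) = x
print(action)
```

With the quadratic choice of h, the contested unit of supply 1′ goes to the demand class with the larger buffer, which is the usual MaxWeight behaviour.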

Hope that h approximates the solution to a dynamic programming equation.

25 / 29

Asymptotic optimality

Family of arrival processes {A^δ(t)} parameterized by δ ∈ [0, δ•]. Additional assumptions:

(A1) For one set D ⊊ 𝒟 we have ξ^D · α^δ = −δ, where α^δ denotes the mean of A^δ(t). Moreover, there is a fixed constant δ̲ > 0 such that ξ^{D′} · α^δ ≤ −δ̲ for any D′ ⊊ 𝒟, D′ ≠ D, and δ ∈ [0, δ•].

(A2) The distributions are continuous at δ = 0, with linear rate: For some constant b,

E[‖A^δ(t) − A^0(t)‖] ≤ bδ. (1)

(A3) The sets E and F do not depend upon δ, and the graph associated with E is connected. Moreover, there exists i0 ∈ S(D), j0 ∈ D^c, and ε_I > 0 such that

P{A^δ_{i0}(t) ≥ 1 and A^δ_{j0}(t) ≥ 1} ≥ ε_I, 0 ≤ δ ≤ δ•. (2)

26 / 29

Asymptotic optimality

There is a function h such that, under Assumptions (A1)–(A3), for sufficiently large κ > 0, β > 0, and sufficiently small δ+ > 0 (each independent of δ), the average cost η under the h-MaxWeight policy satisfies,

η̂* ≤ η* ≤ η ≤ η̂* + O(1)

where η* is the optimal average cost for the MDP model, η̂* is the optimal average cost for the workload relaxation, and the constant O(1) does not depend upon δ.

The average cost for the relaxation satisfies the uniform bound,

η̂* = η̂** + O(1)

where η̂** is the optimal cost for the diffusion approx. for the relaxation:

η̂** = (1/Θ) c̄− log(1 + c̄+/c̄−), where Θ = 2δ/σ^2_∆

27 / 29

Final remarks

Final remarks/related work

Performance bounds?

Approximate optimal control for relaxations in higher dimensions?

More general arrival assumptions. Admission control?

Non-bipartite matching? Networks?

Applications in energy systems and/or healthcare?

28 / 29

References

Related models

Tassiulas, Ephremides, IEEE TAC, 1992.
McKeown, Mekkittikul, Anantharam, Walrand, IEEE Trans. Comm., 1999.

Bipartite matching model

Caldentey, Kaplan, Weiss, Adv. Appl. Probab., 2009.
Adan & Weiss, Operations Research, 2012.
Busic, Gupta, Mairesse, Stability of the bipartite matching model. Adv. Appl. Probab., 2013.
Busic, Meyn, Optimization of Dynamic Matching Models. arXiv:1411.1044, 2014.
Adan, Busic, Mairesse, Weiss, Reversibility of the FCFS bipartite matching model. In preparation.

Workload relaxations

Meyn, Stability and asymptotic optimality of generalized MaxWeight policies. SIAM J. Control Optim., 2009.
Meyn, Control Techniques for Complex Networks. Cambridge University Press, 2007.

29 / 29
