Regret to the Best vs.
Regret to the Average
Eyal Even-Dar, Michael Kearns, Yishay Mansour, Jennifer Wortman
UPenn + Tel Aviv Univ. Slides: Csaba
Motivation
Expert algorithms are designed to control the regret to the return of the best expert.
What about the regret to the average return? The generic analysis gives the same $O(\sqrt{T})$ bound. Isn't that weak?
EW: $w_{i,1}=1$, $w_{i,t}=w_{i,t-1}\,e^{\eta g_{i,t}}$, $p_{i,t}=w_{i,t}/W_t$, $W_t=\sum_i w_{i,t}$

E1: 1 0 1 0 1 0 1 0 1 0 …
E2: 0 1 0 1 0 1 0 1 0 1 …

$G_{A,T}=T/2-c\sqrt{T}$, while $G^+_T = G^-_T = G^0_T = T/2$

$R^+_T \le c\sqrt{T}$ and $R^0_T \le c\sqrt{T}$; on this sequence both are exactly $c\sqrt{T}$, so EW really does lose $\Theta(\sqrt{T})$ to the average.
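To see the effect concretely, here is a minimal simulation sketch of EW on the alternating sequence above; the horizon and the usual $\eta \sim 1/\sqrt{T}$ tuning are our choices, not from the slides. The gap to the average comes out on the order of $\sqrt{T}$:

```python
import numpy as np

# EW on the alternating example: expert 1 gains on odd steps, expert 2
# on even steps. Horizon and step size are our choices.
T = 10_000
eta = 1.0 / np.sqrt(T)

G = np.zeros(2)                      # cumulative gains G_{i,t}
gain_A = 0.0                         # cumulative gain G_{A,T} of EW
for t in range(T):
    w = np.exp(eta * G)              # w_{i,t} = e^{eta G_{i,t}}
    p = w / w.sum()                  # p_{i,t} = w_{i,t} / W_t
    g = np.array([1.0, 0.0]) if t % 2 == 0 else np.array([0.0, 1.0])
    gain_A += p @ g                  # g_{A,t} = sum_i p_{i,t} g_{i,t}
    G += g

print(f"G^0_T = {G.mean():.1f}, G_A,T = {gain_A:.1f}, "
      f"gap = {G.mean() - gain_A:.1f} (sqrt(T) = {np.sqrt(T):.0f})")
```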
Notation - gains
$g_{i,t} \in [0,1]$ – gains
$g=(g_{i,t})$ – sequence of gains
$G_{i,T}(g)=\sum_{t=1}^{T} g_{i,t}$ – cumulative gain
$G^0_T(g)=\frac{1}{N}\sum_i G_{i,T}(g)$ – average gain
$G^-_T(g)=\min_i G_{i,T}(g)$ – worst gain
$G^+_T(g)=\max_i G_{i,T}(g)$ – best gain
$G^D_T(g)=\sum_i D_i\,G_{i,T}(g)$ – weighted average gain ($D$ a distribution over experts)
Notation - algorithms
$w_{i,t}$ – unnormalized weights
$p_{i,t}=w_{i,t}/W_t$, $W_t=\sum_i w_{i,t}$ – normalized weights
$g_{A,t}=\sum_i p_{i,t}\,g_{i,t}$ – gain of $A$ at time $t$
$G_{A,T}(g)=\sum_t g_{A,t}$ – cumulative gain of $A$
Notation - regret
Regret to the…
$R^+_T(g) = \max\{G^+_T(g) - G_{A,T}(g),\,1\}$ – best
$R^-_T(g) = \max\{G^-_T(g) - G_{A,T}(g),\,1\}$ – worst
$R^0_T(g) = \max\{G^0_T(g) - G_{A,T}(g),\,1\}$ – average
$R^D_T(g) = \max\{G^D_T(g) - G_{A,T}(g),\,1\}$ – distribution $D$
(Regrets are floored at 1 so that the products of regrets appearing below are meaningful.)
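For concreteness, a small sketch (function and variable names are ours) that computes these gain and regret quantities from a $T\times N$ gain matrix and the algorithm's weight matrix:

```python
import numpy as np

def regrets(g, p, D=None):
    """g: (T, N) expert gains in [0, 1]; p: (T, N) the algorithm's
    weights p_{i,t}; D: optional distribution over the N experts."""
    G = g.sum(axis=0)                     # G_{i,T}
    G_A = (p * g).sum()                   # G_{A,T} = sum_t p_t . g_t
    floor = lambda x: max(x, 1.0)         # regrets are floored at 1
    out = {"R+": floor(G.max() - G_A),    # to the best
           "R-": floor(G.min() - G_A),    # to the worst
           "R0": floor(G.mean() - G_A)}   # to the average
    if D is not None:
        out["RD"] = floor(D @ G - G_A)    # to the distribution D
    return out
```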
Goal
Algorithm $A$ is “nice” if
$R^+_{A,T} \le O(T^{1/2})$ and $R^0_{A,T} \le 1$.

Program:
Examine existing algorithms (“difference algorithms”): a lower bound
Show “nice” algorithms
Show that no substantial further improvement is possible
“Difference” algorithms
Def: $A$ is a difference algorithm if, for $N=2$ and $g_{i,t}\in\{0,1\}$, $p_{1,t} = f(d_t)$, $p_{2,t} = 1-f(d_t)$, where $d_t = G_{1,t}-G_{2,t}$.

Examples:
EW: $w_{i,t} = e^{\eta G_{i,t}}$
FPL: choose $\arg\max_i \left(G_{i,t}+Z_{i,t}\right)$
Prod: $w_{i,t} = \prod_s (1+\eta g_{i,s}) = (1+\eta)^{G_{i,t}}$
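To see why EW and Prod fit the definition, note that for $N=2$ their normalized weights depend on the history only through $d_t$. A sketch of the corresponding $f$'s ($\eta$ is a free parameter of our choosing):

```python
import numpy as np

# For N = 2 and binary gains, EW's and Prod's normalized weights are
# functions of d_t = G_{1,t} - G_{2,t} alone:
def f_ew(d, eta=0.1):
    # p_1 = e^{eta G_1} / (e^{eta G_1} + e^{eta G_2})
    return 1.0 / (1.0 + np.exp(-eta * d))

def f_prod(d, eta=0.1):
    # p_1 = (1+eta)^{G_1} / ((1+eta)^{G_1} + (1+eta)^{G_2})
    return 1.0 / (1.0 + (1.0 + eta) ** (-d))

# FPL satisfies the definition in expectation over the perturbations Z.
```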
A lower bound for difference algorithms
Theorem: If $A$ is a difference algorithm, then there exist gain sequences $g, g'$ (tuned to $A$) such that
$R^+_{A,T}(g)\,R^0_{A,T}(g') \ge R^+_{A,T}(g)\,R^-_{A,T}(g') = \Omega(T)$.

Writing $R^+_{A,T} = \max_g R^+_{A,T}(g)$, $R^-_{A,T} = \max_g R^-_{A,T}(g)$, $R^0_{A,T} = \max_g R^0_{A,T}(g)$, this gives
$R^+_{A,T}\,R^0_{A,T} \ge R^+_{A,T}\,R^-_{A,T} = \Omega(T)$.
Proof
Assume $T$ is even and $p_{1,1} \le 1/2$.
$g$: expert 1 gains 1 every step, expert 2 gains 0:
E1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 …
E2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 …
Let $\tau$ be the first time $t$ at which $p_{1,t} \ge 2/3$. Before $\tau$, $A$ keeps weight $\ge 1/3$ on expert 2, so $R^+_{A,T}(g) \ge \tau/3$.
Since $p_{1,t}$ climbs from $\le 1/2$ to $\ge 2/3$ within $\tau$ steps, by pigeonhole $\exists\,\ell \in \{2,3,\dots,\tau\}$ s.t. $p_{1,\ell}-p_{1,\ell-1} \ge 1/(6\tau)$.
Proof/2
Recall $p_{1,\ell}-p_{1,\ell-1} \ge 1/(6\tau)$.
$g'$: expert 1 gains 1 on the first $\ell$ steps, the experts then alternate for $T-2\ell$ steps, and the final $\ell$ steps mirror the first ones (expert 2 gains 1):
E1: 1 1 1 1 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0
E2: 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1 1 1 1 1
Then $G^+_T = G^-_T = G^0_T = T/2$.
During the alternation, $d_t$ oscillates between $d_\ell$ and $d_{\ell-1}$, so $p_{1,t}$ alternates between $p_{1,\ell}$ and $p_{1,\ell-1}$: each pair of steps yields $A$ at most $1-(p_{1,\ell}-p_{1,\ell-1}) \le 1-1/(6\tau)$, while every expert gains exactly 1.
By the difference property, $p_{1,t}=p_{1,T-t}$ on the mirrored tail, so each matched first/last pair of steps yields $A$ exactly $p_{1,t}+(1-p_{1,t})=1$.
Hence $G_{A,T}(g') \le \ell + \frac{T-2\ell}{2}\left(1-\frac{1}{6\tau}\right)$, so $R^-_{A,T}(g') \ge \frac{T-2\ell}{12\tau}$, and therefore $R^+_{A,T}(g)\,R^-_{A,T}(g') \ge \frac{T-2\ell}{36}$.
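The construction can be checked numerically. The sketch below instantiates $A$ as EW (our choice of $\eta$ and $T$; $f$ is EW's difference function from the earlier sketch) and evaluates the bounds from this slide:

```python
import numpy as np

# Numerical check of the lower-bound construction for A = EW.
eta, T = 0.05, 2000
f = lambda d: 1.0 / (1.0 + np.exp(-eta * d))

# On g (expert 1 always gains 1), d_t = t - 1, so p_{1,t} = f(t - 1).
p1 = f(np.arange(T, dtype=float))
tau = int(np.argmax(p1 >= 2 / 3)) + 1       # first t with p_{1,t} >= 2/3
R_best_g = (1 - p1[:tau]).sum()             # regret to best on g, ~ tau/3

# Pigeonhole: some step l <= tau with p_{1,l} - p_{1,l-1} >= 1/(6 tau).
ell = int(np.argmax(np.diff(p1[:tau]))) + 1
gap = p1[ell] - p1[ell - 1]                 # >= 1/(6 tau)

# On g', A gains at most 1 per matched first/last pair (ell pairs) and
# at most 1 - gap per alternating pair ((T - 2 ell)/2 pairs), so:
G_A_upper = ell + (T - 2 * ell) / 2 * (1 - gap)
R_worst_g2 = T / 2 - G_A_upper              # >= (T - 2 ell) / (12 tau)
print(f"tau={tau}, ell={ell}, product/T = {R_best_g * R_worst_g2 / T:.3f}")
```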
Tightness
We know that for difference algorithms
$R^+_{A,T}\,R^0_{A,T} \ge R^+_{A,T}\,R^-_{A,T} = \Omega(T)$.
Can a (difference) algorithm achieve this tradeoff?
Theorem: EW = EW($\eta$), with appropriately tuned $\eta = \eta(\alpha)$, $0 \le \alpha \le 1/2$, has
$R^+_{EW,T} \le T^{1/2+\alpha}(1+\ln N)$
$R^0_{EW,T} \le T^{1/2-\alpha}$
Breaking the frontier
What's wrong with the difference algorithms? They are designed to find the best expert quickly, with low regret, but they pay no attention to the average gain and how it compares with the best gain.
BestWorst(A)
$G^+_T-G^-_T$: the spread of the cumulative gains.
Idea: stay with the average until the spread becomes large; then switch to learning (using algorithm $A$).
Once the spread is large enough, $G^0_T = G_{BW(A),T} \gg G^-_T$, so there is “nothing” to lose by switching.
Spread threshold: $NR$, where $R=R_{T,N}$ is a bound on the regret of $A$ to the best expert.
Theorem: $R^+_{BW(A),T} = O(NR)$ and $G_{BW(A),T} \ge G^-_T$.
Proof: At the time of the switch, $G_{BW(A)} = G^0 \ge (G^+ + (N-1)G^-)/N$. Since $G^+ \ge G^- + NR$, this gives $G_{BW(A)} \ge G^- + R$; this slack of $R$ then absorbs $A$'s regret after the switch.
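A runnable sketch of BestWorst(A). The expert-algorithm interface (reset/weights/update) and the minimal EW stand-in are our assumptions, not from the paper:

```python
import numpy as np

class EW:
    """Minimal exponential-weights algorithm, our stand-in for A."""
    def __init__(self, N, eta):
        self.N, self.eta = N, eta
        self.reset()
    def reset(self):
        self.G = np.zeros(self.N)
    def weights(self):
        w = np.exp(self.eta * (self.G - self.G.max()))   # stabilized
        return w / w.sum()
    def update(self, g):
        self.G += g

def best_worst(A, gains, R):
    """Play the uniform average until the spread G+ - G- reaches N*R,
    then hand over to A (assumed to have regret at most R to the best)."""
    T, N = gains.shape
    G, total, switched = np.zeros(N), 0.0, False
    for t in range(T):
        if not switched and G.max() - G.min() >= N * R:
            switched = True                  # spread is large: start learning
        p = A.weights() if switched else np.full(N, 1.0 / N)
        total += p @ gains[t]
        G += gains[t]
        if switched:
            A.update(gains[t])
    return total
```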
PhasedAggression(A, R, D)

for k = 1 : log2(R) do
    eps := 2^(k-1)/R
    A.reset(); s := 0                  // local time, new phase
    while (G+_s - GD_s < 2R) do
        q_s := A.getNormedWeights(g_{s-1})
        p_s := eps q_s + (1-eps) D
        s := s + 1
    end
end
A.reset(); run A until time T
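In Python, assuming the same expert-algorithm interface (reset/weights/update) as in the BestWorst sketch above, this becomes:

```python
import numpy as np

def phased_aggression(A, gains, R, D):
    """PA(A, R, D): phase k mixes A's weights with weight eps = 2^{k-1}/R
    into the fixed distribution D; a phase ends when the best expert is
    2R ahead of the D-average of the gains seen in that phase."""
    T, N = gains.shape
    total, t = 0.0, 0
    for k in range(1, int(np.log2(R)) + 1):
        eps = 2 ** (k - 1) / R
        A.reset()
        G = np.zeros(N)                    # gains since the phase began
        while t < T and G.max() - D @ G < 2 * R:
            p = eps * A.weights() + (1 - eps) * D
            total += p @ gains[t]
            G += gains[t]
            A.update(gains[t])
            t += 1
    A.reset()                              # last phase: run A unmixed
    while t < T:
        total += A.weights() @ gains[t]
        A.update(gains[t])
        t += 1
    return total
```

With the uniform $D$, $G^D$ is the average gain, so the guarantee $R^D_{PA,T}\le 1$ becomes constant regret to the average.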
PhasedAggression(A, R, D) – Theorem

Theorem: Let $A$ be any algorithm with regret $R = R_{T,N}$ to the best expert, and let $D$ be any distribution. Then, for PA = PA($A,R,D$),
$R^+_{PA,T} \le 2R(\log R+1)$
$R^D_{PA,T} \le 1$
Proof: Consider local time $s$ during phase $k$; $D$ and $A$ share the gains and the regret.
During the phase,
$G^+_s - G_{PA,s} < \frac{2^{k-1}}{R}\cdot R + \left(1-\frac{2^{k-1}}{R}\right)\cdot 2R < 2R$
$G^D_s - G_{PA,s} \le \frac{2^{k-1}}{R}\cdot R = 2^{k-1}$
What happens at the end of the phase? There $G^+_s - G^D_s \ge 2R$, so
$G_{PA,s} - G^D_s \ge \frac{2^{k-1}}{R}\left(G^+_s - G^D_s - R\right) \ge \frac{2^{k-1}}{R}\cdot R = 2^{k-1}$.
What if PA ends in phase $k$ at time $T$? Each phase contributes less than $2R$ to the regret to the best, so
$G^+_T - G_{PA,T} \le 2Rk \le 2R(\log R+1)$,
while the surpluses of $2^{j-1}$ banked at the ends of the earlier phases offset the current phase's deficit:
$G^D_T - G_{PA,T} \le 2^{k-1} - \sum_{j=1}^{k-1} 2^{j-1} = 2^{k-1} - (2^{k-1}-1) = 1$.
General lower bounds
Theorem:
$R^+_{A,T}=O(T^{1/2}) \;\Rightarrow\; R^0_{A,T}=\Omega(T^{1/2})$
$R^+_{A,T} \le (T\log T)^{1/2}/10 \;\Rightarrow\; R^0_{A,T}=\Omega(T^{\alpha})$, where $\alpha \ge 0.02$

Compare this with
$R^+_{PA,T} \le 2R(\log R+1)$, $R^D_{PA,T} \le 1$, where $R=(T\log N)^{1/2}$.
Conclusions
Achieving constant regret to the average is a reasonable goal.
“Classical” (difference) algorithms do not have this property; they satisfy $R^+_{A,T}\,R^0_{A,T} = \Omega(T)$.
Modification: learn only when it makes sense, i.e. when the best is much better than the average.
PhasedAggression achieves the optimal tradeoff.
Open question: can we remove the dependence on $T$?