Group Testing and Coding Theory
Atri Rudra University at Buffalo, SUNY
Or, A Theoretical Computer Scientist’s (Biased) View of Group Testing
Group testing overview
Test a soldier for a disease
WWII example: syphilis
Group testing overview
Test an army for a disease
WWII example: syphilis
What if only one soldier has the disease?
Can we do better?
Communicating with my 2 year old
x → C(x) → y = C(x) + error → x, or give up
"Code" C: "Akash English"
C(x) is a "codeword"
The setup
x → C(x) → y = C(x) + error → x, or give up
Mapping C: error-correcting code, or just code
Encoding: x → C(x)
Decoding: y → x
C(x) is a codeword
The fundamental tradeoff
Correct as many errors as possible with as little redundancy as possible
Can one achieve the "optimal" tradeoff with efficient encoding and decoding?
The main message
Coding Theory
Group Testing
Asymptotic view
n!
10n^2
n^2
O() notation
≤ is O with glasses
poly(n) is O(n^c) for some fixed c
Group testing overview
Test an army for a disease
WWII example: syphilis
What if only one soldier has the disease?
Can pool blood samples and check if at least one soldier has the disease
Group testing
Set of items: (unknown) vector x in {0,1}^n
At most d positives: |x| ≤ d
Tests: a subset S of {1,...,n}
Result of a test: OR of the x_i's such that i in S
Goal 1: Figure out x
Goal 2: Minimize the number of tests t
Non-adaptive tests: all tests are fixed a priori
[Figure: t × n binary test matrix, rows = tests 1..t, columns = items 1..n; example rows 1 0 0 1 ..., 0 0 1 0 ..., 0 0 0 1 ..., 1 1 1 0 ...]
Output: the positive items
t = O(d^2 log n) is possible
Tons of applications
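A minimal Python sketch of the test model on this slide: each test reports the OR of the items it contains. The function name `run_tests` and the 4 × 4 matrix are illustrative assumptions (the matrix borrows the four example rows from the figure, truncated to four items), not something the talk specifies.

```python
from typing import List


def run_tests(M: List[List[int]], x: List[int]) -> List[int]:
    """Return r, where r[j] is the OR of x[i] over all items i in test j."""
    return [int(any(m and xi for m, xi in zip(row, x))) for row in M]


# Hypothetical example: 4 tests (rows) over 4 items (columns).
M = [
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
x = [0, 0, 1, 0]      # unknown in practice: only the third item is positive
r = run_tests(M, x)   # observed test results
print(r)
```

All rows of `M` are fixed before any result is seen, which is what makes the scheme non-adaptive.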
The decoding step
[Figure: test matrix M (t × n, to be designed) times the unknown vector x = (x1, ..., xn) gives the observed result vector r = (r1, ..., rt)]
How fast can this step be done?
An application: heavy hitters
Stream items are numbers in the range {1,…,n}
Output all items that occur at least a 1/d fraction of the time
One pass, poly log space, poly log update, poly log report time
Cormode-Muthukrishnan idea
Use group testing: maintain counters for each test
Heavy tail property: Total frequency of non-heavy items < 1/d
[Figure: t × n test matrix M with one counter per test, c1, ..., ct]
Maintain count of items in tests
Maintain total count m
r_i = 1 iff c_i ≥ m/d
x_j = 1 iff j is a heavy item (|x| ≤ d)
r = M × x
Reporting the heavy items is just decoding!
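The counter idea can be sketched in Python as follows. This is a toy under stated assumptions, not the Cormode-Muthukrishnan implementation: the 9 × 9 matrix from `build_matrix` is a hypothetical small 2-disjunct matrix (built from degree-at-most-1 polynomials over Z_3), items are numbered 0..8, and the update loop touches every counter, so it does not achieve the poly log update time the slide asks for.

```python
from typing import Iterable, List

Q = 3        # hypothetical tiny field size
N = Q * Q    # items 0..8; item i stands for the pair (a, b) = (i // Q, i % Q)


def build_matrix() -> List[List[int]]:
    """Test (y, s) contains item (a, b) iff a + b*y = s (mod Q)."""
    return [[int((i // Q + (i % Q) * y) % Q == s) for i in range(N)]
            for y in range(Q) for s in range(Q)]


def heavy_hitters(stream: Iterable[int], d: int) -> List[int]:
    M = build_matrix()
    counters = [0] * len(M)   # c_j: number of stream items falling in test j
    m = 0                     # total count
    for item in stream:       # single pass over the stream
        m += 1
        for j, row in enumerate(M):
            counters[j] += row[item]
    r = [int(d * c >= m) for c in counters]   # r_j = 1 iff c_j >= m/d
    # Reporting is decoding: drop every item that sits in a test with r_j = 0.
    alive = [True] * N
    for j, row in enumerate(M):
        if r[j] == 0:
            for i in range(N):
                if row[i]:
                    alive[i] = False
    return [i for i in range(N) if alive[i]]


print(heavy_hitters([4] * 6 + [0, 1, 7, 8], d=2))
```

The example stream satisfies the heavy tail property for d = 2: the non-heavy items account for 4 of 10 occurrences, which is below m/d = 5.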
Requirements from group testing
[Figure: t × n test matrix with one counter per test, c1, ..., ct]
Non-adaptiveness is crucial
Minimize t (space)
Strongly explicit matrix
Minimize decoding time (report time)
An overview of results
# tests (t)        Decoding time
O(d^2 log n)       poly(t)        [INR10, NPR11]
O(d^2 log n)       O(nt)          [DR82], [PR08]
O(d^4 log n)       O(t)           [GI04]
O(d^2 log^2 n)     poly(t)        [GI04, implicit]
d is O(log n)
Big savings
Tackling the first row
# tests (t)        Decoding time
O(d^2 log n)       poly(t)        [INR10, NPR11]
O(d^2 log n)       O(nt)          [DR82], [PR08]
O(d^4 log n)       O(t)           [GI04]
O(d^2 log^2 n)     poly(t)        [GI04, implicit]
d-disjunct matrices
Sufficient condition for group testing
[Figure: d columns plus one disjoint column; there exists a row with a 1 in the disjoint column and 0s in all d columns]
True for every subset of d columns and a disjoint column
When the d columns are the set of positives, the test result for that row is 0
Every non-positive column has one 0 test result
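The definition can be checked by brute force on small matrices. The sketch below is only meant to make the condition concrete; the function name `is_d_disjunct` is an assumption, and the running time is exponential in d, so it is not something one would use at scale.

```python
from itertools import combinations
from typing import List


def is_d_disjunct(M: List[List[int]], d: int) -> bool:
    """Check: for every set of d columns and every other column i, some row
    has a 1 in column i and 0s in all d chosen columns."""
    t, n = len(M), len(M[0])
    for cols in combinations(range(n), d):
        # covered[j] is True when test j contains at least one chosen column
        covered = [any(M[j][c] for c in cols) for j in range(t)]
        for i in range(n):
            if i in cols:
                continue
            if not any(M[j][i] and not covered[j] for j in range(t)):
                return False
    return True


identity = [[int(i == j) for j in range(4)] for i in range(4)]
print(is_d_disjunct(identity, 2))
print(is_d_disjunct([[1, 1], [0, 1]], 1))
```

The identity matrix is the trivial case (one test per item); the second matrix fails because its first column is contained in its second.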
Naïve decoder for d-disjunct matrices
[Figure: d-disjunct matrix with the set of positives (d columns) marked]
If r_j = 0 then for every column i that is in test j, set x_i = 0
If x_i = 1 then all tests column i participates in will have a 1
O(nt) time on all n columns; O(Lt) time when restricted to L columns
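A short Python sketch of the naïve decoder described on this slide. The 4 × 6 matrix is a hypothetical 1-disjunct example (its columns are the six 2-element subsets of the four tests); the name `naive_decode` is likewise an assumption.

```python
from typing import List


def naive_decode(M: List[List[int]], r: List[int]) -> List[int]:
    """Start with every item marked positive; a test with result 0 clears
    every item it contains. Runs in O(nt) time for a t x n matrix."""
    n = len(M[0])
    x = [1] * n
    for row, r_j in zip(M, r):
        if r_j == 0:
            for i, entry in enumerate(row):
                if entry:
                    x[i] = 0
    return x


# Hypothetical 1-disjunct matrix: column i is the indicator of a 2-subset of tests.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
r = [0, 1, 1, 0]          # results when only item 3 (tests 1 and 2) is positive
print(naive_decode(M, r))
```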
What is known
[Figure: d-disjunct matrix with result vector r = (r1, ..., rt); the set of positives (d columns) is recovered in O(nt) time]
Strongly explicit d-disjunct matrix with t = O(d^2 log^2 n) [Kautz-Singleton 1964]
Randomized d-disjunct matrix with t = O(d^2 log n) [Dyachkov-Rykov 1982]
Deterministic d-disjunct matrix with t = O(d^2 log n) [Porat-Rothschild 2008]
Lower bound of Ω(d^2 log n / log d) [Dyachkov-Rykov 1982]
Error-correcting codes
x → C(x) → y → x, or give up
Mapping C : Σ^k → Σ^m
Dimension k, block length m, m ≥ k
Rate R = k/m ≤ 1
Decoding time complexity: efficient means polynomial in m
Noise model
Errors are worst case (Hamming): error locations, arbitrary symbol changes
Limit on total number of errors
Hamming's 60 yr old observation
[Figure: codewords at distance ≥ D from each other, each surrounded by a ball of radius D/2]
Large "distance" is good
All you need to remember about Reed-Solomon codes, Part I
q is a prime power
q^(q/(d+1)) vectors from [q]^q where every two agree in < q/(d+1) positions
How do we get binary codes?
Concatenation of codes [Forney 66]
C1: ({0,1}^k)^K → ({0,1}^k)^M (Outer code)
C2: {0,1}^k → {0,1}^m (Inner code)
C1 ∘ C2: {0,1}^(kK) → {0,1}^(mM)
Typically k = O(log M)
x = (x1, x2, ..., xK)
C1(x) = (w1, w2, ..., wM)
C1 ∘ C2(x) = (C2(w1), C2(w2), ..., C2(wM))
Disjunct matrices from RS codes
n = q^(q/(d+1))
Column i gets the ith codeword
Code concatenation: each symbol x in [q] becomes the length-q binary vector 0 0 1 ... 0 with its single 1 in position x (q rows per symbol)
t = q^2 = O(d^2 log^2 n)
d-disjunct matrix [Kautz, Singleton]
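The construction on this slide can be sketched in a few lines of Python. The sketch assumes q is prime (so arithmetic mod q is a field; the slide allows any prime power), evaluates polynomials at the points 0..q-1, and takes the number of message symbols k as a parameter, where the slide uses k = q/(d+1). The function name `kautz_singleton` is hypothetical.

```python
from itertools import product
from typing import List


def kautz_singleton(q: int, k: int) -> List[List[int]]:
    """Binary q^2 x q^k matrix: column = Reed-Solomon codeword of a polynomial
    of degree < k over Z_q, with each symbol replaced by a unit vector."""
    messages = list(product(range(q), repeat=k))
    # Outer code: evaluate each message polynomial at every point of Z_q.
    codewords = [
        [sum(c * pow(y, e, q) for e, c in enumerate(msg)) % q for y in range(q)]
        for msg in messages
    ]
    # Inner (identity) code: row (y, s) has a 1 in a column iff its symbol at y is s.
    return [[int(cw[y] == s) for cw in codewords]
            for y in range(q) for s in range(q)]


M = kautz_singleton(3, 2)
print(len(M), len(M[0]))
```

With q = 3 and k = 2, distinct columns share a 1 in at most one row and every column has three 1s, so any two columns leave at least one 1 of a third column uncovered.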
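A q=3 example
[Figure: RS codewords over {0,1,2} written as columns, with symbol rows 0 0 0 1 1 1 2 2 2 and 0 1 2 1 2 0 2 0 1. Each symbol is replaced by a length-3 unit vector (0 → 100, 1 → 010, 2 → 001), giving the blocks 100 100 100 010 010 010 001 001 001 and 100 010 001 010 001 100 001 100 010.]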
1-Agreement between two columns
[Figure: the q=3 example again, with two columns highlighted; they agree in ≤ 1 position]
Agreement in binary = Agreement among RS codewords < q/(d+1)
d-disjunctness of Kautz-Singleton
[Figure: one column compared against d other columns; it shares a 1 with each of them in < q/(d+1) rows]
Rows where the column has a 1 and all d columns have a 0: > q - q·d/(d+1) > 0
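The counting argument behind this slide can be written out as follows. It uses the fact that every Kautz-Singleton column has exactly q ones (one per Reed-Solomon position); the notation agr(c, c_i) for the number of positions where two codewords agree is introduced here for convenience.

```latex
\[
\#\{\text{rows where } c \text{ has a 1 and } c_1,\dots,c_d \text{ all have 0}\}
\;\ge\; q - \sum_{i=1}^{d} \mathrm{agr}(c, c_i)
\;>\; q - d\cdot\frac{q}{d+1}
\;=\; \frac{q}{d+1}
\;>\; 0 .
\]
```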
The basic idea
[Figure: test matrix M times the unknown vector x = (x1, ..., xn) gives the observed result vector r = (r1, ..., rt); every column of M is a codeword]
Show that recovering x from r is the same as "decoding" the code
n = # codewords = exp(m)
t = poly(m)
Decoding
C(x) sent, y received
x ∈ Σ^k, y ∈ Σ^m
How much of y must be correct to recover x?
At least k symbols must be correct
At most (m-k)/m = 1-R fraction of errors
1-R is the information-theoretic limit
ρ: the fraction of errors the decoder can handle
Information-theoretic limit implies ρ ≤ 1-R
[Figure: x → C(x) → y, with R = k/m]
Can we get to the limit, ρ = 1-R?
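Not if we always want to uniquely recover the original message
Limit for unique decoding: (1-R)/2
[Figure: codewords c1 and c2 at relative distance 1-R, with a received word r at distance (1-R)/2 from each; the codeword is split into an R fraction and a 1-R fraction]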
List decoding [Elias57, Wozencraft58]
Always insisting on a unique codeword is restrictive
The "pathological" cases are rare
"Typical" received word can be decoded beyond (1-R)/2
Better error-recovery model
Output a list of answers
List decoding
Example: spell checker
[Figure: balls of radius (1-R)/2 around codewords] Almost all the space in higher dimension. All but an exponential (in m) fraction
ρ < 1 - R: information-theoretic limit
Can handle twice as many errors
[Plot: fraction of errors (ρ) versus rate (R), with curves for unique decoding and the information-theoretic limit]
Achievable by random codes. NOT ALGORITHMIC!
Other applications of list decoding
Cryptography
Cryptanalysis of certain block-ciphers [Jakobsen98]
Efficient traitor tracing scheme [Silverberg, Staddon, Walker 03]
Complexity Theory
Hardcore predicates from one way functions [Goldreich, Levin 89; Impagliazzo 97; Ta-Shma, Zuckerman 01]
Worst-case vs. average-case hardness [Cai, Pavan, Sivakumar 99; Goldreich, Ron, Sudan 99; Sudan, Trevisan, Vadhan 99; Impagliazzo, Jaiswal, Kabanets 06]
Other algorithmic applications
IP Traceback [Dean, Franklin, Stubblefield 01; Savage, Wetherall, Karlin, Anderson 00]
Guessing Secrets [Alon, Guruswami, Kaufman, Sudan 02; Chung, Graham, Leighton 01]
Algorithmic list decoding results
ρ = 1 - R - ε, for any ε > 0: Folded RS codes [Guruswami, R. 06]
[Plot: fraction of errors (ρ) versus rate (R), with curves for unique decoding, Guruswami-Sudan 98, Parvaresh-Vardy 05, Folded RS, and the information-theoretic limit]
Concatenated codes
Concatenation of codes [Forney 66]
C1: ({0,1}^k)^K → ({0,1}^k)^M (Outer code)
C2: {0,1}^k → {0,1}^m (Inner code)
C1 ∘ C2: {0,1}^(kK) → {0,1}^(mM)
Typically k = O(log M)
x = (x1, x2, ..., xK), C1(x) = (w1, w2, ..., wM), C1 ∘ C2(x) = (C2(w1), C2(w2), ..., C2(wM))
Brute force decoding for inner code
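List decoding C1 ∘ C2
Received blocks y1, y2, ..., yM, each in {0,1}^m
Inner decoding turns them into lists S1, S2, ..., SM of elements in {0,1}^k
How do we "list decode" from lists?
List recovery
Input: lists S1, S2, S3, ..., SM, each Si a subset of [q] with |Si| ≤ d
Output: codewords (c1, c2, c3, ..., cM) with ci in Si for every i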
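List recovery can be illustrated by brute force over a tiny Reed-Solomon code. This sketch is exponential in the message length and is only meant to pin down the input/output behaviour; it is not the poly(q) time algorithm the talk relies on. It assumes q is prime and evaluation points 0..q-1, and the names `rs_codewords` and `list_recover` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple


def rs_codewords(q: int, k: int) -> List[Tuple[int, ...]]:
    """All evaluations at 0..q-1 of polynomials of degree < k over Z_q (q prime)."""
    return [
        tuple(sum(c * pow(y, e, q) for e, c in enumerate(msg)) % q for y in range(q))
        for msg in product(range(q), repeat=k)
    ]


def list_recover(codewords: Sequence[Tuple[int, ...]],
                 lists: Sequence[Set[int]]) -> List[Tuple[int, ...]]:
    """Return every codeword c with c[i] in lists[i] for all positions i."""
    return [cw for cw in codewords if all(c in S for c, S in zip(cw, lists))]


# Hypothetical input lists for q = 3, one list per codeword position.
lists = [{0, 2}, {2}, {1, 2}]
recovered = list_recover(rs_codewords(3, 2), lists)
print(recovered)
```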
All you need to remember about (Reed-Solomon) codes, Part II
q is a prime power
q^(q/(d+1)) vectors from [q]^q where every two agree in < q/(d+1) positions
poly(q) time algorithm for list recovery
[Figure: lists S1, S2, S3, ..., Sq, each Si a subset of [q] with |Si| ≤ d, and a codeword (c1, c2, c3, ..., cq)]
Back to the example
[Figure: the q=3 example matrix (RS codewords as columns, each symbol replaced by a length-3 unit vector)]
Result vector blocks (as extracted): 101, 001, 011
Symbol lists read off from the blocks (as extracted): {1,2}, {2}, {0,2}
+ items
All you ever needed to know about (Reed-Solomon) codes… at least for this talk
q is a prime power
q^(q/(d+1)) vectors from [q]^q where every two agree in < q/(d+1) positions
poly(q) time algorithm for list recovery
[Figure: lists S1, S2, S3, ..., Sq, each Si a subset of [q] with |Si| ≤ d, and a codeword (c1, c2, c3, ..., cq)]
What does this imply?
[Figure: KS matrix with result vector r = (r1, ..., rt); d^2 candidate columns are found in poly(t) time, and the set of positives (d columns) is recovered from them in O(d^2 t) time]
t = O(d^2 log^2 n)
Implicit in [Guruswami-Indyk 04]
Filter-evaluate decoding paradigm
[Figure: a "filtering" matrix with results y1, ..., yt' narrows the columns down to L candidate columns in poly(t') time; a d-disjunct matrix with results r1, ..., rt then recovers the set of positives (d columns) from those L columns in O(Lt) time]
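The two-stage paradigm can be sketched on a toy instance. Everything here is an assumption made for illustration: items are pairs (a, b) over Z_3 standing for the polynomial a + bY, the "filtering" matrix uses only evaluation points 0 and 1 (so any pair of observed symbols determines a candidate), and the full three-point matrix plays the role of the d-disjunct matrix. The real filtering matrices in [INR10, NPR11] are different objects.

```python
from typing import List, Sequence, Tuple

Q = 3  # hypothetical tiny field size
Item = Tuple[int, int]


def column(item: Item, points: Sequence[int]) -> List[int]:
    """Binary column for item (a, b): one block of Q bits per evaluation point."""
    a, b = item
    return [int((a + b * y) % Q == s) for y in points for s in range(Q)]


def run_tests(points: Sequence[int], positives: Sequence[Item]) -> List[int]:
    """OR together the columns of the positive items."""
    cols = [column(p, points) for p in positives]
    return [int(any(bits)) for bits in zip(*cols)]


def filter_step(y_vec: Sequence[int]) -> List[Item]:
    """Read symbol lists S0, S1 off the filtering results and return every
    item consistent with them (at most |S0| * |S1| candidates)."""
    S0 = [s for s in range(Q) if y_vec[s]]
    S1 = [s for s in range(Q) if y_vec[Q + s]]
    return [(s0, (s1 - s0) % Q) for s0 in S0 for s1 in S1]


def evaluate_step(r: Sequence[int], candidates: Sequence[Item]) -> List[Item]:
    """Naive decoder restricted to the candidates: O(L t) time."""
    return [c for c in candidates
            if all(r_j for bit, r_j in zip(column(c, range(Q)), r) if bit)]


positives = [(0, 0), (1, 1)]
y_vec = run_tests([0, 1], positives)       # filtering tests
r = run_tests([0, 1, 2], positives)        # d-disjunct tests
candidates = filter_step(y_vec)
print(candidates, evaluate_step(r, candidates))
```

In this run the filter returns four candidates, two of them bogus, and the evaluate step discards the bogus ones without ever touching the remaining five columns.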
So all we need to do
o(d^2 log n / log d) tests
[Indyk, Ngo, R. 10]
[Ngo, Porat, R. 11]
Overview of the results
# tests (t)        Decoding time
O(d^2 log n)       poly(t)        [INR10, NPR11]
O(d^2 log n)       O(nt)          [DR82], [PR08]
O(d^4 log n)       O(t)           [GI04]
O(d^2 log^2 n)     poly(t)        [GI04, implicit]
The main message
Coding Theory
Group Testing
Open Questions
Close the gap between upper and lower bounds
Other applications of group testing? Complexity Theory?
Strongly explicit construction of optimal disjunct matrices?
More on Coding Theory
http://www.cse.buffalo.edu/~atri/courses/coding-theory/book/index.html
Questions?
The filtering matrix
New* object: (d,L)-list disjunct matrix
[Figure: the set of positives (d columns) inside a set of d+L columns]
Running the naïve decoder returns ≤ L bogus columns
Independently considered by [Cheraghchi 09]
(d,d)-list disjunct matrices exist with O(d log n) tests
Reed-Solomon codes
Message: (x0, x1, …, x_(k-1)) ∈ F^k
View as poly. f(Y) = x0 + x1 Y + … + x_(k-1) Y^(k-1)
Encoding, RS(f) = (f(1), f(2), …, f(m)), F = {1, 2, …, m}
[Figure: codeword positions f(1), f(2), f(3), f(4), ..., f(m)]
Alphabet size is at least m
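A minimal Python sketch of this encoding map, assuming the field is Z_p for a prime p larger than m so that the evaluation points 1..m are distinct field elements. The function name `rs_encode` and the parameter names are assumptions.

```python
from typing import List, Sequence


def rs_encode(msg: Sequence[int], m: int, p: int) -> List[int]:
    """Evaluate f(Y) = msg[0] + msg[1]*Y + ... at Y = 1..m over Z_p (p prime)."""
    if not len(msg) <= m < p:
        raise ValueError("need k <= m < p")

    def f(y: int) -> int:
        acc = 0
        for coeff in reversed(msg):   # Horner's rule
            acc = (acc * y + coeff) % p
        return acc

    return [f(y) for y in range(1, m + 1)]


print(rs_encode([1, 2], m=4, p=5))   # f(Y) = 1 + 2Y over Z_5
```

Two distinct polynomials of degree below k agree on fewer than k points, which is where the small pairwise agreement between codewords comes from.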
Revisiting the decoding algorithm
[Figure: result vector r split into q blocks, one per RS position j = 1, ..., q; running the naïve decoder on block j with a d-disjunct inner matrix gives a list Sj with |Sj| ≤ d]
Works but hits a d^3 barrier
Connection to List Recovery
[Figure: result vector r split into q blocks; block j has a 1 in row x exactly when symbol x is in Sj, giving lists S1, S2, ..., Sq with |Sj| ≤ d]
Decoding: Output all codewords that match the test results
List recover from S1, …, Sq to get the positive codewords
Revisiting the decoding algorithm-II
[Figure: result vector r split into q blocks; running the naïve decoder on block j with a (d,d)-list disjunct inner matrix gives a list Sj with |Sj| ≤ 2d]
Need to change the parameters of the Reed-Solomon codes a bit.
http://www.impawards.com/2007/are_we_done_yet.html
How we get our hands on…
[Figure: each of the q symbols of an RS codeword is encoded by a (d,d)-list disjunct matrix with d log q rows]
n ~ q^(q/d)
t = q × (d log q)
~ (d × log n / log q) × (d log q)
= d^2 log n
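The step from n ~ q^(q/d) to the final count can be spelled out as below. This only fills in the intermediate algebra, with constants suppressed as on the slide.

```latex
\[
n \sim q^{q/d}
\;\Longrightarrow\;
\log n \sim \frac{q}{d}\,\log q
\;\Longrightarrow\;
q \sim \frac{d \log n}{\log q},
\qquad
t \;=\; q \cdot (d \log q)
\;\sim\; \frac{d \log n}{\log q}\cdot d \log q
\;=\; d^{2} \log n .
\]
```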
Solution 1 [Indyk, Ngo, R. 10]
[Figure: (d,d)-list disjunct inner matrix with d log q rows, one copy per RS position 1, ..., q]
Pick "inner" codes at random
Solution 2 [Ngo, Porat, R. 10]
[Figure: (d,d)-list disjunct inner matrix with d log q rows, one copy per RS position 1, ..., q]
Use explicit expanders!
Some comments:
Left degree of the expander not important
d^(1+o(1)) log q rows possible [GUV 07, Cheraghchi 09]
Use PV codes instead of RS codes