![Page 1: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/1.jpg)
Algorithms and Abstractions for Stream-and-Sort
![Page 2: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/2.jpg)
Announcements
Thursday:
• there will be a short quiz
• the quiz will close at midnight Thursday, but it is probably best to do it soon after class
![Page 3: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/3.jpg)
Recap – Last week
• Algorithms that use only stream and sort primitives:
  – Naïve Bayes training (event counting)
  – Inverted indices, preprocessing for distributional clustering, …
  – Naïve Bayes classification (with event counts out of memory)
• Phrase finding task
  – Score all the phrases in a corpus for “phrasiness” and “informativeness”
  – Statistics based on BLRT, Pointwise KL-divergence
![Page 4: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/4.jpg)
Today
• Phrase finding task – the implementation
  – Score all the phrases in a corpus for “phrasiness” and “informativeness”
  – Statistics based on BLRT, Pointwise KL-divergence
• Phrase finding algorithm
• Abstractions for Map-Reduce
• Guest lecture: Manik Varma, MSR India
![Page 5: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/5.jpg)
Phrase Finding
William W. Cohen
![Page 6: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/6.jpg)
scoring
![Page 7: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/7.jpg)
“Phraseness”1 – based on BLRT
Phrase x y: W1=x ^ W2=y
– Define
  • $p_i = k_i / n_i$, $p = (k_1 + k_2)/(n_1 + n_2)$
  • $L(p,k,n) = p^k (1-p)^{n-k}$

| | count | comment |
| --- | --- | --- |
| k1 | C(W1=x ^ W2=y) | how often bigram x y occurs in corpus C |
| n1 | C(W1=x) | how often word x occurs in corpus C |
| k2 | C(W1≠x ^ W2=y) | how often y occurs in C after a non-x |
| n2 | C(W1≠x) | how often a non-x occurs in C |

Does y occur at the same frequency after x as in other positions?
![Page 8: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/8.jpg)
“Informativeness”1 – based on BLRT
Phrase x y: W1=x ^ W2=y and two corpora, C and B
– Define
  • $p_i = k_i / n_i$, $p = (k_1 + k_2)/(n_1 + n_2)$
  • $L(p,k,n) = p^k (1-p)^{n-k}$

| | count | comment |
| --- | --- | --- |
| k1 | C(W1=x ^ W2=y) | how often bigram x y occurs in corpus C |
| n1 | C(W1=* ^ W2=*) | how many bigrams in corpus C |
| k2 | B(W1=x ^ W2=y) | how often x y occurs in background corpus |
| n2 | B(W1=* ^ W2=*) | how many bigrams in background corpus |

Does x y occur at the same frequency in both corpora?
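The likelihood ratio itself is not spelled out in the text above, only L(p,k,n) and the four counts. As a rough sketch, assuming the usual binomial likelihood-ratio form 2·log[ L(p1,k1,n1)·L(p2,k2,n2) / (L(p,k1,n1)·L(p,k2,n2)) ], the same function could serve both phraseness and informativeness by plugging in the counts from the matching table. The function names are hypothetical, and the computation is done in log space to avoid underflow.

```python
import math


def log_likelihood(p, k, n):
    """log L(p,k,n) = k*log(p) + (n-k)*log(1-p), treating 0*log(0) as 0."""
    def xlogy(x, y):
        return 0.0 if x == 0 else x * math.log(y)
    return xlogy(k, p) + xlogy(n - k, 1.0 - p)


def blrt(k1, n1, k2, n2):
    """Assumed form of the statistic: 2 * log of the likelihood ratio.

    Numerator: each sample gets its own rate p_i = k_i / n_i.
    Denominator: both samples share the pooled rate p.
    """
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    return 2.0 * (
        log_likelihood(p1, k1, n1) + log_likelihood(p2, k2, n2)
        - log_likelihood(p, k1, n1) - log_likelihood(p, k2, n2)
    )
```

A score near zero means the two rates look the same (y is no more likely after x than elsewhere, or x y is no more frequent in C than in B); larger scores mean the rates differ.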
![Page 9: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/9.jpg)
The breakdown: what makes a good phrase
– To compare distributions, use KL-divergence
“Pointwise KL divergence”
Phraseness: difference between bigram and unigram language model in foreground
Bigram model: P(x y)=P(x)P(y|x)
Unigram model: P(x y)=P(x)P(y)
![Page 10: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/10.jpg)
The breakdown: what makes a good phrase
– To compare distributions, use KL-divergence
“Pointwise KL divergence”
Informativeness: difference between foreground and background models
Bigram model: P(x y)=P(x)P(y|x)
Unigram model: P(x y)=P(x)P(y)
w would be a phrase here
![Page 11: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/11.jpg)
The breakdown: what makes a good phrase
– To compare distributions, use KL-divergence
“Pointwise KL divergence”
Combined: difference between foreground bigram model and background unigram model
Bigram model: P(x y)=P(x)P(y|x)
Unigram model: P(x y)=P(x)P(y)
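The three scores on these slides can be sketched with one helper, assuming “pointwise KL divergence” means the single-event term p·log(p/q) of KL(p‖q). The function and argument names are hypothetical, and all probabilities are assumed to be smoothed so that none is zero.

```python
import math


def pointwise_kl(p, q):
    """Contribution of a single event to KL(p || q): p * log(p / q)."""
    return p * math.log(p / q)


def phrase_scores(fg_xy, fg_x, fg_y, bg_xy, bg_x, bg_y):
    """Return (phraseness, informativeness, combined) for a bigram x y.

    fg_* are foreground probabilities and bg_* background probabilities;
    the unigram model scores the bigram as P(x) * P(y).
    """
    phraseness = pointwise_kl(fg_xy, fg_x * fg_y)   # fg bigram vs fg unigram
    informativeness = pointwise_kl(fg_xy, bg_xy)    # fg bigram vs bg bigram
    combined = pointwise_kl(fg_xy, bg_x * bg_y)     # fg bigram vs bg unigram
    return phraseness, informativeness, combined
```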
![Page 12: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/12.jpg)
Implementation
• Request-and-answer pattern
  – Main data structure: tables of key-value pairs
    • key is a phrase x y
    • value is a mapping from attribute names (like phraseness, freq-in-B, …) to numeric values
  – Keys and values are just strings
  – We’ll operate mostly by sending messages to this data structure and getting results back, or else streaming through the whole table
  – For really big data: we’d also need tables where the key is a word and the value is a set of attributes of the word (freq-in-B, freq-in-C, …)
![Page 13: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/13.jpg)
Generating and scoring phrases: 1
• Stream through the foreground corpus and count events “W1=x ^ W2=y” the same way we do in training naive Bayes: stream-and-sort and accumulate deltas (a “sum-reduce”)
  – Don’t bother generating “boring” phrases (e.g., ones that cross a sentence boundary, contain a stopword, …)
• Then stream through the output and convert to phrase, attributes-of-phrase records with one attribute: freq-in-C=n
• Stream through the foreground corpus and count events “W1=x” in a (memory-based) hashtable….
• This is enough* to compute phrasiness:
  – ψp(x y) = f( freq-in-C(x), freq-in-C(y), freq-in-C(x y))
• …so you can do that with a scan through the phrase table that adds an extra attribute (holding word frequencies in memory).
* actually you also need the total # of words and the total # of phrases….
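The first two bullets (emit bigram events, sort, sum-reduce, convert to phrase records) might look roughly like this. It is a small in-memory sketch: `sorted` stands in for an external sort, the stopword list is a made-up placeholder, and each input string is assumed to be one sentence so that no bigram crosses a sentence boundary.

```python
from itertools import groupby

STOPWORDS = {"the", "a", "an", "of", "in"}  # hypothetical stopword list


def bigram_events(sentences):
    """Emit (phrase, 1) for every adjacent word pair, skipping 'boring' ones."""
    for sentence in sentences:
        words = sentence.lower().split()
        for x, y in zip(words, words[1:]):
            if x in STOPWORDS or y in STOPWORDS:
                continue
            yield (f"{x} {y}", 1)


def sum_reduce(sorted_events):
    """Accumulate the deltas of consecutive events that share a key."""
    for key, group in groupby(sorted_events, key=lambda kv: kv[0]):
        yield key, sum(delta for _, delta in group)


def phrase_table(sentences):
    """Build (phrase, attributes) records with one attribute: freq-in-C."""
    events = sorted(bigram_events(sentences))  # stand-in for an external sort
    return [(phrase, {"freq-in-C": n}) for phrase, n in sum_reduce(events)]
```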
![Page 14: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/14.jpg)
Generating and scoring phrases: 2
• Stream through the background corpus and count events “W1=x ^ W2=y” and convert to phrase, attributes-of-phrase records with one attribute: freq-in-B=n
• Sort the two phrase-tables, freq-in-B and freq-in-C, and run the output through another “reducer” that
  – appends together all the attributes associated with the same key, so we now have elements like
![Page 15: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/15.jpg)
Generating and scoring phrases: 3
• Scan through the phrase table one more time and add the informativeness attribute and the overall quality attribute

Summary, assuming the word vocabulary nW is small:
• Scan foreground corpus C for phrases: O(nC), producing mC phrase records – of course mC << nC
• Compute phrasiness: O(mC)
• Scan background corpus B for phrases: O(nB), producing mB phrase records
• Sort together and combine records: O(m log m), m = mB + mC
• Compute informativeness and combined quality: O(m)

Assumes word counts fit in memory
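The “sort together and combine records” step could be sketched as below. This is an illustrative in-memory version, assuming each table is a list of (phrase, attribute-dict) pairs; the function name is hypothetical and `sorted` again stands in for the external sort.

```python
from itertools import groupby


def combine_tables(table_c, table_b):
    """Sort both phrase tables together, then merge attributes per phrase."""
    records = sorted(table_c + table_b, key=lambda kv: kv[0])
    for phrase, group in groupby(records, key=lambda kv: kv[0]):
        merged = {}
        for _, attrs in group:
            merged.update(attrs)  # append the attributes that share this key
        yield phrase, merged
```

A phrase that appears in only one corpus keeps just the attribute it has, so the later scoring pass has to decide how to treat the missing count (for example by smoothing).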
![Page 16: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/16.jpg)
Is there a stream-and-sort analog of this request-and-answer pattern?
Test data:
id1 found an aardvark in zynga’s farmville today!
id2 …
id3 ….
id4 …
id5 …..

Counter records (record of all event counts for each word):

| w | Counts associated with W |
| --- | --- |
| aardvark | C[w^Y=sports]=2 |
| agent | C[w^Y=sports]=1027,C[w^Y=worldNews]=564 |
| … | … |
| zynga | C[w^Y=sports]=21,C[w^Y=worldNews]=4464 |

Requests (from the classification logic):
found ~ctr to id1
aardvark ~ctr to id2
…
today ~ctr to idi
…

Counter records + requests → Combine and sort
Recap
![Page 17: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/17.jpg)
A stream-and-sort analog of the request-and-answer pattern…
Counter records (record of all event counts for each word):

| w | Counts |
| --- | --- |
| aardvark | C[w^Y=sports]=2 |
| agent | … |
| … | |
| zynga | … |

Requests:
found ~ctr to id1
aardvark ~ctr to id1
…
today ~ctr to id1
…

Counter records + requests → Combine and sort:

| w | Counts |
| --- | --- |
| aardvark | C[w^Y=sports]=2 |
| aardvark | ~ctr to id1 |
| agent | C[w^Y=sports]=… |
| agent | ~ctr to id345 |
| agent | ~ctr to id9854 |
| … | ~ctr to id345 |
| agent | ~ctr to id34742 |
| … | |
| zynga | C[…] |
| zynga | ~ctr to id1 |

→ Request-handling logic
Recap
![Page 18: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/18.jpg)
A stream-and-sort analog of the request-and-answer pattern…
Request-handling logic
• previousKey = somethingImpossible
• For each (key,val) in input:
  • If key==previousKey
    • Answer(recordForPrevKey,val)
  • Else
    • previousKey = key
    • recordForPrevKey = val

define Answer(record,request):
  • find id where “request = ~ctr to id”
  • print “id ~ctr for request is record”
Recap
![Page 19: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/19.jpg)
A stream-and-sort analog of the request-and-answer pattern…
Output:
id1 ~ctr for aardvark is C[w^Y=sports]=2
…
id1 ~ctr for zynga is ….
…
Recap
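The request-handling logic above can be written as a short generator. This is a sketch under a few assumptions: the input is an already-sorted stream of (key, value) pairs, requests always start with the literal prefix `~ctr to `, and counter records sort before requests for the same key because `~` comes late in ASCII. Emitting an empty record for a word that has no counter record is an addition that the slide pseudocode does not cover.

```python
REQUEST_PREFIX = "~ctr to "


def answer_requests(sorted_pairs):
    """Scan sorted (key, value) pairs and answer each request for a key."""
    previous_key = None
    record_for_prev_key = None
    for key, val in sorted_pairs:
        if not val.startswith(REQUEST_PREFIX):
            # A counter record: remember it for the requests that follow.
            previous_key, record_for_prev_key = key, val
            continue
        request_id = val[len(REQUEST_PREFIX):]
        # Assumption: a word with no counter record gets an empty answer.
        record = record_for_prev_key if key == previous_key else ""
        yield f"{request_id} ~ctr for {key} is {record}"
```

Each output line starts with the requesting id, so one more sort groups the answers with the test document that asked for them.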
![Page 20: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/20.jpg)
A stream-and-sort analog of the request-and-answer pattern…
Request-handling logic
Output:
id1 ~ctr for aardvark is C[w^Y=sports]=2
…
id1 ~ctr for zynga is ….
…

Test data:
id1 found an aardvark in zynga’s farmville today!
id2 …
id3 ….
id4 …
id5 …..

Combine and sort ????
Recap
![Page 21: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/21.jpg)
What we ended up with:

| Key | Value |
| --- | --- |
| id1 | found aardvark zynga farmville today |
| | ~ctr for aardvark is C[w^Y=sports]=2 |
| | ~ctr for found is C[w^Y=sports]=1027,C[w^Y=worldNews]=564 |
| | … |
| id2 | w2,1 w2,2 w2,3 …. |
| | ~ctr for w2,1 is … |
| … | … |
Recap
![Page 22: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/22.jpg)
Ramping it up – keeping word counts out of memory
• Goal: records for xy with attributes freq-in-B, freq-in-C, freq-of-x-in-C, freq-of-y-in-C, …
• Assume I have built phrase tables and word tables…. how do I incorporate the word attributes into the phrase records?
• For each phrase xy, request the necessary word frequencies:
  – Print “x ~request=freq-in-C,from=xy”
  – Print “y ~request=freq-in-C,from=xy”
• Sort all the word requests in with the word tables
• Scan through the result and generate the answers: for each word w, a1=n1,a2=n2,….
  – Print “xy ~request=freq-in-C,from=w”
• Sort the answers in with the xy records
• Scan through and augment the xy records appropriately
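The request and answer steps might be sketched as follows. The sketch assumes the word table holds (word, "freq-in-C=n") string pairs, that phrases are two space-separated words, and that sorting puts a word's record ahead of its requests because `~` sorts after letters. The `~answer=` tag and the function names are illustrative choices, not taken from the slide.

```python
REQUEST_TAG = "~request="


def word_requests(phrases):
    """For each phrase x y, request the frequency of x and of y."""
    for phrase in phrases:
        x, y = phrase.split()
        yield (x, f"{REQUEST_TAG}freq-in-C,from={phrase}")
        yield (y, f"{REQUEST_TAG}freq-in-C,from={phrase}")


def answer_word_requests(word_table, requests):
    """Sort requests in with the word records, then scan and answer them."""
    merged = sorted(list(word_table) + list(requests))
    current_word, current_attrs = None, None
    for word, val in merged:
        if not val.startswith(REQUEST_TAG):
            current_word, current_attrs = word, val  # the word's own record
        elif word == current_word:
            phrase = val.split("from=", 1)[1]
            # The answer is keyed by the phrase so it sorts next to the xy record.
            yield (phrase, f"~answer={current_attrs},from={word}")
```

The remaining two bullets repeat the same trick: sort these answers in with the phrase records and scan once more to fold the word frequencies into each phrase's attributes.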
![Page 23: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/23.jpg)
Generating and scoring phrases: 3
Summary
1. Scan foreground corpus C for phrases, words: O(nC), producing mC phrase records, vC word records
2. Scan phrase records producing word-freq requests: O(mC), producing 2mC requests
3. Sort requests with word records: O((2mC + vC) log(2mC + vC)) = O(mC log mC) since vC < mC
4. Scan through and answer requests: O(mC)
5. Sort answers with phrase records: O(mC log mC)
6. Repeat 1-5 for background corpus: O(nB + mB log mB)
7. Combine the two phrase tables: O(m log m), m = mB + mC
8. Compute all the statistics: O(m)
![Page 24: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/24.jpg)
ABSTRACTIONS FOR STREAM AND SORT AND MAP-REDUCE
![Page 25: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/25.jpg)
Abstractions On Top Of Map-Reduce
• We’ve decomposed some algorithms into a map-reduce “workflow” (series of map-reduce steps)
  – naïve Bayes training
  – naïve Bayes testing
  – phrase scoring
• How else can we express these sorts of computations? Are there some common special cases of map-reduce steps we can parameterize and reuse?
![Page 26: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/26.jpg)
Abstractions On Top Of Map-Reduce
• Some obvious streaming processes:
  – for each row in a table
    • Transform it and output the result
    • Decide if you want to keep it with some boolean test, and copy out only the ones that pass the test

Example: stem words in a stream of word-count pairs:
(“aardvarks”,1) → (“aardvark”,1)

Proposed syntax:
table2 = MAP table1 TO λ row : f(row)        f(row) → row’

Example: apply stop words
(“aardvark”,1) → (“aardvark”,1)
(“the”,1) → deleted

Proposed syntax:
table2 = FILTER table1 BY λ row : f(row)        f(row) → {true,false}
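In Python, the proposed MAP and FILTER operations could be approximated with generators over a stream of rows. This is only a sketch: `map_table`, `filter_table`, the toy `stem` function, and the one-word stopword set are hypothetical names introduced here.

```python
def map_table(table, f):
    """MAP table TO f: transform each row and output the result."""
    for row in table:
        yield f(row)


def filter_table(table, keep):
    """FILTER table BY keep: copy out only the rows that pass the test."""
    for row in table:
        if keep(row):
            yield row


def stem(word):
    """Toy stemmer (assumption): strip one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


STOPWORDS = {"the"}  # hypothetical stop list

rows = [("aardvarks", 1), ("the", 1)]
stemmed = map_table(rows, lambda row: (stem(row[0]), row[1]))
kept = list(filter_table(stemmed, lambda row: row[0] not in STOPWORDS))
```

Because both helpers are generators, they can be chained without holding the whole table in memory, which matches the streaming setting.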
![Page 27: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/27.jpg)
Abstractions On Top Of Map-Reduce
• A non-obvious(?) streaming process:
  – for each row in a table
    • Transform it to a list of items
    • Splice all the lists together to get the output table (flatten)

Example: tokenizing a line
“I found an aardvark” → [“i”, “found”, ”an”, ”aardvark”]
“We love zymurgy” → [“we”, ”love”, ”zymurgy”]
…but the final table is one word per row:
“i”
“found”
“an”
“aardvark”
“we”
“love”
…

Proposed syntax:
table2 = FLATMAP table1 TO λ row : f(row)        f(row) → list of rows
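A generator version of the proposed FLATMAP could look like this. The names `flatmap_table` and `tokenize` are hypothetical, and the tokenizer is a deliberately crude one that lowercases and keeps runs of ASCII letters.

```python
import re


def flatmap_table(table, f):
    """FLATMAP table TO f: f maps a row to a list; splice the lists together."""
    for row in table:
        yield from f(row)


def tokenize(line):
    """Toy tokenizer (assumption): lowercase, keep runs of letters a-z."""
    return re.findall(r"[a-z]+", line.lower())


words = list(flatmap_table(["I found an aardvark", "We love zymurgy"], tokenize))
```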
![Page 28: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/28.jpg)
Abstractions On Top Of Map-Reduce
• Another example from the Naïve Bayes test program…
![Page 29: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/29.jpg)
NB Test Step
Event counts (a table of events, each with its count):
X=w1^Y=sports
X=w1^Y=worldNews
X=..
X=w2^Y=…
X=…
…

How:
• Stream and sort:
  • for each C[X=w^Y=y]=n
    • print “w C[Y=y]=n”
  • sort and build a list of values associated with each key w
Like an inverted index
w Counts associated with W
aardvark C[w^Y=sports]=2
agent C[w^Y=sports]=1027,C[w^Y=worldNews]=564
… …
zynga C[w^Y=sports]=21,C[w^Y=worldNews]=4464
![Page 30: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/30.jpg)
NB Test Step
The general case:
We’re taking rows from a table
• In a particular format (event,count)
Applying a function to get a new value
• The word for the event
And grouping the rows of the table by this new value

→ Grouping operation: a special case of a map-reduce

Proposed syntax:
GROUP table BY λ row : f(row)        f(row) → field
Could define f via: a function, a field of a defined record structure, …
![Page 31: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/31.jpg)
NB Test Step
The general case:
We’re taking rows from a table
• In a particular format (event,count)
Applying a function to get a new value
• The word for the event
And grouping the rows of the table by this new value

→ Grouping operation: a special case of a map-reduce

Proposed syntax:
GROUP table BY λ row : f(row)        f(row) → field
Could define f via: a function, a field of a defined record structure, …
Aside: you guys know how to implement this, right?
1. Output pairs (f(row),row) with a map/streaming process
2. Sort pairs by key – which is f(row)
3. Reduce and aggregate by appending together all the values associated with the same key
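The three steps of the aside map onto a few lines of Python. In this sketch, `group_table` and `word_of` are hypothetical names, events are assumed to be strings of the form `X=word^Y=label`, and `sorted` stands in for the external sort between the map and reduce phases.

```python
import re
from itertools import groupby


def group_table(table, f):
    """GROUP table BY f, implemented as map, sort, reduce."""
    # 1. Map: output pairs (f(row), row).
    pairs = ((f(row), row) for row in table)
    # 2. Sort the pairs by key, which is f(row).
    pairs = sorted(pairs, key=lambda kv: kv[0])
    # 3. Reduce: append together all the values that share a key.
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, [row for _, row in group]


def word_of(row):
    """Extract the word from an (event, count) row; event format is assumed."""
    event, _count = row
    return re.match(r"X=(\w+)\^", event).group(1)
```

Grouping the event-count table by `word_of` produces one record per word holding all of its counters, which is the inverted-index-like table used in the NB test step.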
![Page 32: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/32.jpg)
Abstractions On Top Of Map-Reduce
• And another example from the Naïve Bayes test program…
![Page 33: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/33.jpg)
Request-and-answer
Test data:
id1 w1,1 w1,2 w1,3 …. w1,k1
id2 w2,1 w2,2 w2,3 ….
id3 w3,1 w3,2 ….
id4 w4,1 w4,2 …
id5 w5,1 w5,2 …
...

Record of all event counts for each word:
w Counts associated with W
aardvark C[w^Y=sports]=2
agent C[w^Y=sports]=1027,C[w^Y=worldNews]=564
… …
zynga C[w^Y=sports]=21,C[w^Y=worldNews]=4464
Step 2: stream through and for each test case
idi wi,1 wi,2 wi,3 …. wi,ki
request the event counters needed to classify idi from the event-count DB, then classify using the answers
Classification logic
![Page 34: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/34.jpg)
Request-and-answer
• Break down into stages
  – Generate the data being requested (indexed by key, here a word)
    • E.g. with group … by
  – Generate the requests as (key, requestor) pairs
    • E.g. with flatmap … to
  – Join these two tables by key
    • Join defined as (1) cross-product and (2) filter out pairs with different values for keys
    • This replaces the step of concatenating two different tables of key-value pairs, and reducing them together
  – Postprocess the joined result
![Page 35: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/35.jpg)
| w | Counters |
| --- | --- |
| aardvark | C[w^Y=sports]=2 |
| agent | C[w^Y=sports]=1027,C[w^Y=worldNews]=564 |
| … | … |
| zynga | C[w^Y=sports]=21,C[w^Y=worldNews]=4464 |

| w | Counters | Requests |
| --- | --- | --- |
| aardvark | C[w^Y=sports]=2 | ~ctr to id1 |
| agent | C[w^Y=sports]=… | ~ctr to id345 |
| agent | C[w^Y=sports]=… | ~ctr to id9854 |
| agent | C[w^Y=sports]=… | ~ctr to id345 |
| … | C[w^Y=sports]=… | ~ctr to id34742 |
| zynga | C[…] | ~ctr to id1 |
| zynga | C[…] | … |

| w | Request |
| --- | --- |
| found | ~ctr to id1 |
| aardvark | ~ctr to id1 |
| … | |
| zynga | ~ctr to id1 |
| … | ~ctr to id2 |
![Page 36: Algorithms and Abstractions for Stream-and-Sort. Announcements Thursday: there will be a short quiz quiz will close at midnight Thursday, but probably](https://reader035.vdocuments.net/reader035/viewer/2022070403/56649f315503460f94c4c265/html5/thumbnails/36.jpg)
| w | Counters |
| --- | --- |
| aardvark | C[w^Y=sports]=2 |
| agent | C[w^Y=sports]=1027,C[w^Y=worldNews]=564 |
| … | … |
| zynga | C[w^Y=sports]=21,C[w^Y=worldNews]=4464 |

| w | Counters | Requests |
| --- | --- | --- |
| aardvark | C[w^Y=sports]=2 | id1 |
| agent | C[w^Y=sports]=… | id345 |
| agent | C[w^Y=sports]=… | id9854 |
| agent | C[w^Y=sports]=… | id345 |
| … | C[w^Y=sports]=… | id34742 |
| zynga | C[…] | id1 |
| zynga | C[…] | … |

| w | Request |
| --- | --- |
| found | id1 |
| aardvark | id1 |
| … | |
| zynga | id1 |
| … | id2 |
Proposed syntax:
JOIN table1 BY λ row : f(row), table2 BY λ row : g(row)
Examples:
JOIN wordInDoc BY word, wordCounters BY word --- if word(row) defined correctly
JOIN wordInDoc BY lambda (word,docid):word, wordCounters BY lambda (word,counters):word --- using python syntax for functions
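One way to sketch the proposed JOIN in Python is shown below. It uses an in-memory hash index on the second table rather than the literal cross-product-then-filter definition, but it yields the same pairs. At scale the join would be done with the sort-based request-and-answer workflow described earlier. The name `join_tables` and the sample rows are illustrative.

```python
from collections import defaultdict


def join_tables(table1, f, table2, g):
    """JOIN table1 BY f, table2 BY g: yield (row1, row2) where f(row1) == g(row2)."""
    index = defaultdict(list)
    for row2 in table2:
        index[g(row2)].append(row2)  # index table2 by its join key
    for row1 in table1:
        for row2 in index.get(f(row1), []):
            yield row1, row2


word_in_doc = [("found", "id1"), ("aardvark", "id1"), ("zynga", "id1")]
word_counters = [
    ("aardvark", "C[w^Y=sports]=2"),
    ("zynga", "C[w^Y=sports]=21,C[w^Y=worldNews]=4464"),
]
joined = list(join_tables(word_in_doc, lambda row: row[0],
                          word_counters, lambda row: row[0]))
```

Here "found" has no counter record, so it drops out of the result, as it would in an inner join.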