![Page 1: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/1.jpg)
Finding Needles in the Internet Haystack
Ron K. Cytron
Washington University in Saint Louis
Department of Computer Science
http://www.cs.wustl.edu/~cytron/
Century Club May 2002
Roger Chamberlain, Mark Franklin, Ron Indeck, John Lockwood, George Varghese (UCSD)
Mahesh Jayaram
Thanks: Ben Brodie
Center for Distributed Object Computing
Department of Computer Science
Washington University
![Page 7: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/7.jpg)
Outline
• Computers have come a long way
• Today’s computers are never lonely
• Volumes and volumes of data
• Fast searching of magnetic media
• Internet packet filtering
• Conclusion
![Page 8: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/8.jpg)
A Grandchild’s Gift
• 1966: Cost: $60, Memory: ½ char, Speed: 1 cycle/s, Fails: 10 seconds
• 1999: Cost: $35, Memory: 16 M chars, Speed: 16 M cycles/s, Fails: 5 years
![Page 9: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/9.jpg)
If cars improved that much in 30 years …
• $4000
• 60,000 miles per hour
• Seats 10,000 people
• Gets 20,000 miles per gallon
• Breaks every 70 years
![Page 10: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/10.jpg)
The Haystack
• The Internet is large and growing
• Content on the Internet is growing even faster
• A haystack sits still, but the Internet….
![Page 11: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/11.jpg)
Growth of the Internet
(why computers aren’t lonely anymore)
[Chart: interconnected computers, 0 to 3,500,000, by year, 1969 to 1994]
Y2K Problem (?):
More computers sold than TVs
![Page 12: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/12.jpg)
Growth of Internet Content
(volumes and volumes of data)
[Chart: articles per day, 0 to 30,000, by year, 1979 to 1993]
Anybody can publish
Problem is how to find what you want
![Page 13: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/13.jpg)
Page 6B
What can tech companies do? Some say they're at a loss, but others offer budding solutions
By Kevin Maney
On July 7, 1940, as the nation edged toward World War II, IBM put out a statement that made headlines. The company offered all its facilities for national defense, ready to convert to making anything the government needed.
Other leaders in the electro-mechanical technology of the day -- Ford Motor, General Motors, General Electric -- also threw their weight into defense efforts. They switched from making cars and washing machines to building tanks, aircraft engines and machine guns.
So here we are in 2001, readying for another war. The U.S. technology industry is the best and most innovative in the world. It is the nation's pride and joy.
Shouldn't it do something?
9/17/2001
![Page 14: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/14.jpg)
. . .
One possibility is in data-mining technology. Data mining is a way to collect millions of pieces of information in a computer system, sift through that data, make sense of them and come up with something useful. ''We (the U.S. tech industry) are experts at data mining and have vast resources of data to mine,'' says Tom Evslin, CEO of Internet communications company ITXC. ''We have used it to target advertising. We can probably use it to identify suspicious activity or potential terrorists.''
. . .
![Page 15: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/15.jpg)
Fast searching of magnetic media
with
Roger Chamberlain, Mark Franklin, Ron Indeck, John Lockwood
![Page 16: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/16.jpg)
Enabling Technology: Disk Drives
Magnetic disk storage areal density vs. year of IBM product introduction
(From D. A. Thompson)
Almost 10,000,000x increase in 45 years!
![Page 17: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/17.jpg)
Cost per Megabyte
Price history of hard disk product vs. year of product introduction
(From D. A. Thompson)
Cost decreasing 3% per week!
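To put the weekly figure on a yearly scale, here is a small arithmetic sketch. It assumes the 3% decline is constant and compounds weekly, which the slide does not state; the second helper applies the same compounding idea to the 10,000,000x areal density increase over 45 years quoted on the previous slide.

```python
# Assumption: a constant 3% drop every week, compounded over 52 weeks.
WEEKLY_DROP = 0.03
WEEKS_PER_YEAR = 52


def remaining_fraction(weekly_drop, weeks):
    """Fraction of the starting cost left after `weeks` of compounding."""
    return (1.0 - weekly_drop) ** weeks


def annual_growth_factor(total_factor, years):
    """Constant yearly factor that compounds to `total_factor` over `years`."""
    return total_factor ** (1.0 / years)


if __name__ == "__main__":
    left = remaining_fraction(WEEKLY_DROP, WEEKS_PER_YEAR)
    print(f"Cost after one year: {left:.1%} of the starting cost")

    yearly = annual_growth_factor(10_000_000, 45)
    print(f"Areal density growth: about {yearly - 1:.0%} per year")
```

Under these assumptions the cost falls to roughly one fifth of its starting value each year.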
![Page 18: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/18.jpg)
Massive Storage & Data
• Storage industry will ship 4,000,000,000,000,000,000 Bytes this year
• FedEx generated 14 Terabytes of data last year
• US intelligence collects data equaling the printed collection of the US library every day!
![Page 19: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/19.jpg)
Massive Data Sets
• Employee records
• Consumer information
• Maps/mission/intelligence data
• Genome maps
Data sets now measured in Terabytes, and are dynamic!
![Page 20: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/20.jpg)
Genome Application
• Genome maps expanded daily
– Wash U sequencing center
– Each of us has 80,000 genes found among 3 billion characters of DNA (A,C,G,T)
• Look for matches
– Identify function
– Disease: understand, diagnose, detect, medicine, therapy
– Biofuels, warfare, toxic waste
– Understand evolution
– Forensics, organ donors, authentication
– More effective crops, disease resistance
![Page 21: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/21.jpg)
DNA String Matching
• Looking for CACGTTAGT…TAGC
• Interested in matches and near matches
• Search human genome and other gene oceans
– Need to search entire data sets
![Page 22: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/22.jpg)
Bio Computation Problem
[Diagram: *BIG* genome databases; a DNA pattern and a DNA sequence (A C G T G, T A C A G) are compared: Match?]
Approximate matches are just as useful
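One simple way to picture "matches and near matches" is a sliding window that counts mismatched characters. The sketch below is only an illustration: it handles substitutions but not insertions or deletions, and the sample sequence, the pattern, and the mismatch budget are made up for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def near_matches(sequence, pattern, max_mismatches):
    """Return (offset, mismatches) for every window close enough to pattern."""
    m = len(pattern)
    hits = []
    for offset in range(len(sequence) - m + 1):
        distance = hamming(sequence[offset:offset + m], pattern)
        if distance <= max_mismatches:
            hits.append((offset, distance))
    return hits


if __name__ == "__main__":
    sequence = "ACGTGTACAGACGTC"   # hypothetical DNA sequence
    pattern = "ACGTG"              # hypothetical DNA pattern
    # One exact match at offset 0 and one near match (1 mismatch) at offset 10.
    print(near_matches(sequence, pattern, max_mismatches=1))
```

Every window of the sequence is examined, which is why the whole data set has to be searched.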
![Page 23: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/23.jpg)
Finding a needel in a heystuck
• DNA and live text can contain errors
• We often seek an approximate match, for example needle
• No match? Try 2-transpositions: enedle, needle, nedele, neelde, needel
• No match? Try 1-deletions: eedle, nedle, nedle, neele, neede, needl
• No match? Try insertions, larger edits, …
• An exponential number of possibilities
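The candidate spellings listed on this slide can be generated mechanically. This is a minimal sketch that reads "2-transposition" as swapping two adjacent characters and "1-deletion" as dropping one character; the function names are hypothetical.

```python
def transpositions(word):
    """Every string made by swapping one pair of adjacent characters."""
    return [
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
    ]


def deletions(word):
    """Every string made by removing exactly one character."""
    return [word[:i] + word[i + 1:] for i in range(len(word))]


if __name__ == "__main__":
    print(transpositions("needle"))
    # ['enedle', 'needle', 'nedele', 'neelde', 'needel']
    print(deletions("needle"))
    # ['eedle', 'nedle', 'nedle', 'neele', 'neede', 'needl']
```

Applying these edits repeatedly, or adding insertions over a whole alphabet, makes the candidate list grow very quickly, which is the point of the last bullet.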
![Page 24: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/24.jpg)
How is this done today?
• Think of every way a word can be misspelled
• Present each misspelling to the computer for an exact match
[Diagram: each candidate (enedle, needle, nedele, neelde, needel) is tested for an exact match, answering Yes or No]
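The approach described here amounts to one exact-match pass over the data per candidate spelling. The sketch below uses the five candidates shown on the slide; the sample text is an assumption made for the example.

```python
def exact_match_passes(text, candidates):
    """Scan `text` once per candidate; return the per-candidate answers
    and the number of passes made over the text."""
    answers = {}
    passes = 0
    for candidate in candidates:
        passes += 1                      # one full scan per spelling
        answers[candidate] = candidate in text
    return answers, passes


if __name__ == "__main__":
    text = "finding a needel in a haystack"   # hypothetical data
    candidates = ["enedle", "needle", "nedele", "neelde", "needel"]
    answers, passes = exact_match_passes(text, candidates)
    print(answers)   # only 'needel' is found
    print(passes)    # 5 passes for 5 candidates
```

The number of passes grows with the number of misspellings considered, so the cost follows the candidate list rather than the size of the data alone.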
![Page 25: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/25.jpg)
How can we do better?
• Data is present on magnetic media
• Hardware at the disk is
– Already fault tolerant (more on this later)
– Distributed across all surfaces
[Diagram labels: needel, needle]
We win if the number of misspellings is large and the number of false hits is small
![Page 26: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/26.jpg)
Another Application: Intelligence Data
• Lots of data
• Changing constantly
• Many perturbations
– Tzar, tsar, czar, . . .
• Don’t know what we want to look for beforehand
![Page 27: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/27.jpg)
Google Search Engine
• Crawls the web once per month
• Caches web pages
• Fast, exact text-based search (see how soon)
[Diagram labels: needle, needel]
![Page 28: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/28.jpg)
Image Database Applications
• Challenging database
• Unstructured
• Massive data sets
• Don’t know what we need to look for in each picture
![Page 29: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/29.jpg)
Satellite Data
• Low-orbit fly-over every 90 minutes
• Look for differences in images
– Large objects
– Troops
– Changes to landscape
• Flag, transmit these differences immediately
• National Reconnaissance Office
• City assessors . . .
![Page 30: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/30.jpg)
Washington University
Hilltop Campus
![Page 31: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/31.jpg)
How do we find what we’re looking for?!
![Page 32: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/32.jpg)
Conventional Structured Database
Documents (Did: Document)
• 1: Agent James Bond
• 2: Agent mobile computer
• 3: James Madison movie
• 4: James Bond movie
Inverted list (Word: pointers)
• agent: <1,2>
• Bond: <1,4>
• computer: <2>
• James: <1,3,4>
• Madison: <3>
• mobile: <2>
• movie: <3,4>
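The inverted list on this slide can be rebuilt from the four documents. In this sketch the document ids 1 to 4 are assigned so that the result agrees with the slide's pointers (for example James in <1,3,4>), and words are lowercased, which is a choice made for the example rather than something the slide specifies.

```python
from collections import defaultdict

# Document ids chosen to agree with the pointers shown on the slide.
DOCUMENTS = {
    1: "Agent James Bond",
    2: "Agent mobile computer",
    3: "James Madison movie",
    4: "James Bond movie",
}


def build_inverted_index(documents):
    """Map each lowercased word to the sorted ids of documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(documents):
        for word in set(documents[doc_id].lower().split()):
            index[word].append(doc_id)
    return dict(index)


def search_all(index, *words):
    """Ids of the documents that contain every one of `words`."""
    postings = [set(index.get(word.lower(), ())) for word in words]
    return sorted(set.intersection(*postings)) if postings else []


if __name__ == "__main__":
    index = build_inverted_index(DOCUMENTS)
    print(index["james"])                      # [1, 3, 4]
    print(search_all(index, "James", "Bond"))  # [1, 4]
```

The lookup is fast, but the index has to exist before the query arrives and must be maintained as documents change.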
![Page 33: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/33.jpg)
Challenges in Searching Massive Databases
Know what to search for
– need to build index beforehand
– maintain index as it changes
Do not know what to search for
– need to search the whole database!
![Page 34: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/34.jpg)
Conventional Search
[Diagram labels: hard drive, I/O bus, memory, memory bus, processor]
![Page 35: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/35.jpg)
Conventional Search
[Diagram labels: hard drive, I/O bus, memory, memory bus, processor; request: find ….]
![Page 36: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/36.jpg)
Conventional Search
[Diagram labels: hard drive, I/O bus, memory, memory bus, processor; contents; results: yes, no, no, yes, yes ….]
![Page 37: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/37.jpg)
Conventional Approach
![Page 38: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/38.jpg)
WUSTL’s Approach
![Page 39: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/39.jpg)
Streaming Approach
[Diagram labels: hard drive, reconfigurable hardware, memory/processing, I/O bus, memory, memory bus, processor]
![Page 40: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/40.jpg)
Streaming Approach
[Diagram labels: hard drive, reconfigurable hardware, memory/processing, I/O bus, memory, memory bus, processor; request: find]
![Page 42: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/42.jpg)
Streaming Approach
[Diagram labels: hard drive, reconfigurable hardware, memory/processing, I/O bus, memory, memory bus, processor; request: find; results: yes, no, no, yes, yes]
Parallelism through each transducer and drive
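A toy software model can contrast the two data paths. It is only a sketch under stated assumptions: records are short strings, the "bus traffic" is counted in characters, and a yes/no answer is modeled as one unit of traffic. It does not represent the actual reconfigurable hardware.

```python
def conventional_search(records, key):
    """Ship every record to main memory, then test it on the processor."""
    traffic = 0
    results = []
    for record in records:
        traffic += len(record)        # the full record crosses the buses
        results.append(key in record)
    return results, traffic


def streaming_search(records, key):
    """Test each record next to the drive; only the answer crosses the buses."""
    traffic = 0
    results = []
    for record in records:
        hit = key in record           # decided by hardware at the disk
        traffic += 1                  # assumed cost of one yes/no flag
        results.append(hit)
    return results, traffic


if __name__ == "__main__":
    records = ["needle in hay", "just hay", "more hay", "a needle", "needle again"]
    print(conventional_search(records, "needle"))
    print(streaming_search(records, "needle"))
```

Both functions give the same yes/no answers; they differ only in how much data is counted as moving toward the processor.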
![Page 43: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/43.jpg)
Magnetic Recording Channel Schematic
[Schematic labels: Input User Data, Encoder, Channel Bits, Head, Disk, Analog Readback, Detector, Decoder, Decoded User Data, To Bus or Cache; points A, B, C]
![Page 44: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/44.jpg)
Key streaming over Data
![Page 45: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/45.jpg)
Disk Level Implementation
100-bit-key matching through a pseudo-random binary series
[Plot labels: score, matches]
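The idea of a key streaming over data and producing a score can be imitated in software. The sketch below is an assumption-laden illustration: it scores each offset by the number of agreeing bits, plants a slightly corrupted copy of a 100-bit key in pseudo-random bits, and uses a threshold of 90, none of which is taken from the slides beyond the 100-bit key length.

```python
import random


def match_scores(data, key):
    """Score at each offset = number of bit positions where key and data agree."""
    n = len(key)
    return [
        sum(d == k for d, k in zip(data[i:i + n], key))
        for i in range(len(data) - n + 1)
    ]


def make_demo(seed=0, data_len=2000, key_len=100, offset=700, flips=3):
    """Build pseudo-random data containing a copy of the key with `flips` bit errors."""
    rng = random.Random(seed)
    key = [rng.getrandbits(1) for _ in range(key_len)]
    data = [rng.getrandbits(1) for _ in range(data_len)]
    noisy = key[:]
    for pos in rng.sample(range(key_len), flips):
        noisy[pos] ^= 1                      # simulate a read or typing error
    data[offset:offset + key_len] = noisy
    return data, key, offset


if __name__ == "__main__":
    data, key, offset = make_demo()
    scores = match_scores(data, key)
    hits = [i for i, score in enumerate(scores) if score >= 90]
    print("planted at", offset, "score there", scores[offset])
    print("offsets at or above threshold:", hits)
```

Because the score tolerates a few disagreeing bits, the planted copy is still reported even though it is not an exact match.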
![Page 46: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/46.jpg)
Status: Prototype in progress
[Block diagram labels: FPX (NID, RAD); hard drive; Host; ATAPI Controller; IDE bus; Tap: 16-bit Data, 15-bit CTRL; custom PCB for electrical termination & 5V to 3.3V conversion; 32 RAD test pins; Loopback module; IDE_to_ATM module; setup reused from FPX]
![Page 47: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/47.jpg)
Internet Packet Filtering
with
Mahesh Jayaram and George Varghese
![Page 48: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/48.jpg)
Finding Needles in a Moving Haystack
![Page 49: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/49.jpg)
As technology improves, transmission time decreases but latency stays the same
[Chart: cost of an Internet request by year, split into latency and transmission time]
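The claim can be written as a one-line cost model: request time = latency + size / bandwidth. The numbers below (50 ms latency, a 1-megabit reply, bandwidths from 1 Mbit/s to 1 Gbit/s) are illustrative assumptions, not figures from the talk.

```python
def request_time(latency_s, size_bits, bandwidth_bps):
    """Total time for one request: fixed latency plus transmission time."""
    return latency_s + size_bits / bandwidth_bps


if __name__ == "__main__":
    latency = 0.05            # assumed: 50 ms, fixed by distance
    size = 1_000_000          # assumed: a 1-megabit reply
    for bandwidth in (1e6, 1e7, 1e8, 1e9):
        total = request_time(latency, size, bandwidth)
        print(f"{bandwidth:>13,.0f} bit/s -> {total:.3f} s")
```

As bandwidth grows the total approaches the latency term and never drops below it.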
![Page 50: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/50.jpg)
Example: Garden Hose
Water Supply
Latency (first drop) ~ distance
Bandwidth ~ hose diameter
Fire department and gardener suffer the same wait
![Page 51: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/51.jpg)
Example: Hot Shower
You want this water
Latency (time to get hot water) ~ distance
![Page 52: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/52.jpg)
Latency-Free Hot Shower
Convection circuit continuously circulates hot water
Latency ~ 0
![Page 53: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/53.jpg)
Better to receive than to give
• Cable broadcast
• Radio broadcast
• TV guide channel
• Gate connection announcements in flight
• Winning lottery number
Modern name: push technology
![Page 54: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/54.jpg)
Better to receive than to give
![Page 55: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/55.jpg)
How do you get what you want?
![Page 56: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/56.jpg)
Packet Filters
Filter F(Weather)
![Page 58: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/58.jpg)
Existing Approach
IBM Quote
Weather
Flight Schedule
![Page 59: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/59.jpg)
Our approach
IBM Quote, Weather, Flight Schedule
Composite filter makes just one pass
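The difference between running each filter separately and running one composite filter can be sketched by counting how many times packets are examined. This is not the grammar-based engine of the talk: a dictionary lookup stands in for it, and the packet layout and tags are hypothetical.

```python
from collections import defaultdict

# Hypothetical tags that each filter is interested in.
FILTERS = {"IBM Quote": "quote:IBM", "Weather": "weather", "Flight Schedule": "flights"}


def separate_filters(packets, filters):
    """Existing approach (as modeled here): one pass over the packets per filter."""
    examined = 0
    delivered = defaultdict(list)
    for name, tag in filters.items():
        for packet in packets:
            examined += 1
            if packet["tag"] == tag:
                delivered[name].append(packet["body"])
    return dict(delivered), examined


def composite_filter(packets, filters):
    """Composite approach (as modeled here): a single pass with one lookup per packet."""
    by_tag = {tag: name for name, tag in filters.items()}
    examined = 0
    delivered = defaultdict(list)
    for packet in packets:
        examined += 1
        name = by_tag.get(packet["tag"])
        if name is not None:
            delivered[name].append(packet["body"])
    return dict(delivered), examined


if __name__ == "__main__":
    packets = [
        {"tag": "weather", "body": "sunny"},
        {"tag": "quote:IBM", "body": "IBM price update"},
        {"tag": "sports", "body": "final score"},
        {"tag": "flights", "body": "gate change"},
    ]
    print(separate_filters(packets, FILTERS))   # 12 examinations
    print(composite_filter(packets, FILTERS))   # 4 examinations
```

Both versions deliver the same packets; in this model only the separate version's work grows with the number of filters.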
![Page 60: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/60.jpg)
How we do it
IBM Quote
Weather
Flight Schedule
Grammar 1
Grammar 2
Grammar 3
Parsing Engine
![Page 61: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/61.jpg)
TCPConnHeader : EtherType IPHeader TCPPortPair
EtherType : #IP_TYPE
IPHeader : Vers HlenPlusRest
Vers : HalfByte
HlenPlusRest : 0 1 0 1 FixedRest
| 0 1 1 0 FixedRest OneIPOption
| 0 1 1 1 FixedRest TwoIPOption
| 1 0 0 0 FixedRest ThreeIPOption
| 1 0 0 1 FixedRest FourIPOption
| 1 0 1 0 FixedRest FiveIPOption
| 1 0 1 1 FixedRest FiveIPOption OneIPOption
| 1 1 0 0 FixedRest FiveIPOption TwoIPOption
| 1 1 0 1 FixedRest FiveIPOption ThreeIPOption
| 1 1 1 0 FixedRest FiveIPOption FourIPOption
| 1 1 1 1 FixedRest FiveIPOption FiveIPOption
FixedRest : ServiceType TotalLength Identification Flags
FragmentOffset TimeToLive Protocol HeaderChecksum IPAddrPair
ServiceType : Byte
TotalLength : TwoByte
Identification : TwoByte
Flags : bit bit bit
FragmentOffset : bit Byte HalfByte
TimeToLive : Byte
Protocol : #TCP_PROTOCOL
HeaderChecksum : TwoByte
IPAddrPair : #IP_SRC_DST_PAIR
FiveIPOption : ThreeIPOption TwoIPOption
FourIPOption : TwoIPOption TwoIPOption
ThreeIPOption : TwoIPOption OneIPOption
TwoIPOption : OneIPOption OneIPOption
OneIPOption : Option Padding
Option : ThreeByte
Padding : Byte
TCPPortPair : #TCP_PORT_PAIR
FourByte : TwoByte TwoByte
ThreeByte : TwoByte Byte
TwoByte : Byte Byte
Byte : HalfByte HalfByte
HalfByte : bit bit bit bit
bit : 0
| 1
Sample grammar for TCP packet
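What the grammar recognizes can also be read as a hand-written byte-level check. The sketch below is an illustration, not the parsing engine from the talk. It assumes the input starts at the EtherType field, as the grammar's first rule does, and uses the standard values 0x0800 for IPv4 and 6 for TCP. The header-length nibble selects how many 4-byte option words follow the 20 fixed bytes, mirroring the HlenPlusRest alternatives. The helper `make_sample` is hypothetical and exists only to build test input.

```python
import struct

ETHERTYPE_IP = 0x0800   # standard EtherType for IPv4
PROTOCOL_TCP = 6        # standard IP protocol number for TCP


def parse_tcp_conn_header(data):
    """Return (src_ip, dst_ip, src_port, dst_port), or None if data does not fit."""
    if len(data) < 2 + 20:
        return None
    (ether_type,) = struct.unpack_from("!H", data, 0)
    if ether_type != ETHERTYPE_IP:
        return None
    ip = 2                                  # IP header starts after EtherType
    hlen_words = data[ip] & 0x0F            # low nibble: header length in 4-byte words
    if hlen_words < 5:                      # the grammar starts at 0 1 0 1 (no options)
        return None
    if data[ip + 9] != PROTOCOL_TCP:
        return None
    src_ip, dst_ip = struct.unpack_from("!4s4s", data, ip + 12)
    tcp = ip + 4 * hlen_words               # skip (hlen_words - 5) option words
    if len(data) < tcp + 4:
        return None
    src_port, dst_port = struct.unpack_from("!HH", data, tcp)
    dotted = lambda raw: ".".join(str(b) for b in raw)
    return dotted(src_ip), dotted(dst_ip), src_port, dst_port


def make_sample(hlen_words, src_port, dst_port, protocol=PROTOCOL_TCP):
    """Hypothetical helper: build a minimal EtherType + IP header + TCP port pair."""
    header = bytearray(4 * hlen_words)
    header[0] = (4 << 4) | hlen_words       # version 4, header length nibble
    header[9] = protocol
    header[12:16] = bytes([10, 0, 0, 1])
    header[16:20] = bytes([10, 0, 0, 2])
    return struct.pack("!H", ETHERTYPE_IP) + bytes(header) + struct.pack("!HH", src_port, dst_port)


if __name__ == "__main__":
    print(parse_tcp_conn_header(make_sample(5, 1234, 80)))   # no IP options
    print(parse_tcp_conn_header(make_sample(7, 1234, 80)))   # two option words
```

The port pair is found at a different offset in the two samples, which is the case analysis that the eleven HlenPlusRest alternatives spell out.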
![Page 62: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/62.jpg)
Results
The more things you want, the slower existing approaches get
Our performance doesn’t degrade
![Page 63: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/63.jpg)
Conclusions
• The Internet and its content are growing explosively
• Disk storage is abundant, cheap, reliable
• Technology must provide fast, inexact searching of text and images
• As more data is hurled at and past us, fast filtering of Internet traffic is a must
![Page 64: Finding Needles in the Internet Haystack Ron K. Cytron Washington University in Saint Louis Department of Computer Science cytron](https://reader031.vdocuments.net/reader031/viewer/2022032204/56649e415503460f94b3283b/html5/thumbnails/64.jpg)
Questions?