Speech and Language Processing:
Where have we been and where are we going?
Kenneth Ward Church
AT&T Labs-Research
www.research.att.com/~kwc
Eurospeech 2003 2
Where have we been? How To Cook A Demo
(After Dinner Talk at TMI-1992 & Invited Talk at TMI-2002)
• Great fun!
• Effective demos
  – Theater, theater, theater
  – Production quality matters
  – Entertainment >> evaluation
  – Strategic vision >> technical correctness
• Success/Catastrophe
  – Warning: demos can be too effective
  – Dangerous to raise unrealistic expectations
Message for After Dinner Talk
Message for After Breakfast Talk

Let’s go to the video tape! (Lesson: manage expectations)
• Lots of predictions
  – Entertaining in retrospect
  – Nevertheless, many of these people went on to very successful careers: president of MIT, Microsoft exec, etc.
1. Machine Translation (1950s) video
  – Classic example of a demo embarrassment in retrospect
2. Translating telephone (late 1980s) video
  – Pierre Isabelle pulled a similar demo because it was so effective
  – The limitations of the technology were hard to explain to the public
    • Though well understood by the research community
3. Apple (~1990) video
  – Still having trouble setting appropriate expectations
  – Factoid: the day of this demo, speech recognition deployed at scale in AT&T network – with significant lasting impact – but little media
4. Andy Rooney (~1990): reset expectations video

Outline: Where have we been and where are we going?
1. Consistent progress over decades
  • Moore’s Law, Speech Coding, Error Rate
2. History repeats itself
  • Empiricism: 1950s
  • Rationalism: 1970s
  • Empiricism: 1990s
  • Rationalism: 2010s (?)
3. Discontinuities: Fundamental changes that invalidate fundamental assumptions
  • Petabytes: $2,000,000 → $2,000
  • Can demand keep up with supply?
  • If not → Tech meltdown
  • New priorities: Search >> Compression & Dictation
Managing Expectations

Charles Wayne’s Challenge: Demonstrate Consistent Progress Over Time
• Controversial in 1980s
  – But not in 1990s
  – Though, grumbling
• Benefits
  1. Agreement on what to do
  2. Limits endless discussion
  3. Helps sell the field
    • Manage expectations
    • Fund raising
• Risks (similar to benefits)
  1. All our eggs are in one basket (lack of diversity)
  2. Not enough discussion
    • Hard to change course
  3. Methodology Burden
Managing Expectations

Where have we been and where are we going? Moore’s Law: Ideal Answer
Why different slopes?
1. Progress limited by physics
  – Disk seek: 10 years (normal inflation)
  – Disk capacity: 1 year (hyper-inflation)
2. Progress limited by investment
  – Case history: PCs improved faster than supercomputers (Cray)
    • PCs: larger market → more R&D
  – Irony: “Dis-economy of Scale”
  – Danny Hillis (Thinking Machines)
    • Computing is better (cheaper & faster) on smaller machines
  – PCs >> big iron
  – LAN routers >> 5ESS (big phone switch)
  – Economies of scale depend on size of market, not size of machine
    • Market: PC >> big iron (Economist View)
    • Machine: PC << big iron (CS View)
Physics & Investment → Rate of Progress in Speech & Language (and everything)
[Chart labels: Normal Inflation; Hyper-Inflation]
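
As a rough illustration of why the two slopes diverge so sharply, the sketch below compounds each rate over one decade. It assumes that the "10 years" and "1 year" figures on the slide are doubling times; the function name `growth_factor` is made up for this example.

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Improvement factor after `years`, given a fixed doubling time."""
    return 2.0 ** (years / doubling_time_years)


decade = 10
# Assumption: the slide's "10 years" and "1 year" are doubling times.
disk_seek = growth_factor(decade, doubling_time_years=10)     # normal inflation
disk_capacity = growth_factor(decade, doubling_time_years=1)  # hyper-inflation

print(f"Disk seek after {decade} years:     {disk_seek:.0f}x")
print(f"Disk capacity after {decade} years: {disk_capacity:.0f}x")
```

Under that reading, a one-year doubling time gives roughly a thousandfold change per decade, the same order as the petabyte price drop from $2,000,000 to $2,000 quoted elsewhere in the talk.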

Evolution of Speech Coder Performance
[Figure: speech quality (Bad, Poor, Fair, Good, Excellent) versus bit rate (kb/s), with 1980, 1990 and 2000 profiles; annotations: ITU Recommendations, Cellular Standards, Secure Telephony, North American TDMA]
Borrowed Slide: Rich Cox

Speech Coding (Telephony)
• More complicated than Moore’s Law
  – Many Dimensions: Bit Rate, Quality, Complexity and Delay
  – Quality ceiling (imposed by telephone standards)
    • Easy to reach the ceiling at high bit rates (≥ 8 kb/s)
    • More room for progress at low bit rates (≤ 8 kb/s)
• Moore’s Law Time Constant
  – Bit rates halve every decade (≤ 8 kb/s)
  – Relatively slow by Moore’s Law standards (not hyper-inflation)
    • Performance doubles every decade
    • Like disk seek or money in the bank (normal inflation)
  – Limited more by physics than investment
• Potential compression opportunity
  – At most 10x: 8 kb/s → 2 kb/s → 1 kb/s (?) → 50 bits per sec (??)
    • Entropy: 50 bits per sec (Roger Moore)
    • Speech (2 kb/s) >> text (2 bits/char): 10-1000 times more bits
  – Speech coding will not close this gap for foreseeable future
[Chart label: Ceiling]
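
The arithmetic behind these bullets can be sketched as follows. The bit rates and the 2 bits/char figure come from the slide; the speaking rate and characters per word are assumptions for illustration only, and the helper name `decades_to_reach` is hypothetical.

```python
import math


def decades_to_reach(start_kbps: float, target_kbps: float,
                     halving_decades: float = 1.0) -> float:
    """Decades needed to go from start to target if the rate halves every `halving_decades`."""
    return halving_decades * math.log2(start_kbps / target_kbps)


# Bit rates from the slide: 8 kb/s, with 2 kb/s and 1 kb/s as possible targets.
for target in (2.0, 1.0):
    print(f"8 kb/s -> {target:g} kb/s: {decades_to_reach(8.0, target):.0f} decades")

# Speech versus text.
WORDS_PER_MINUTE = 150   # assumption: conversational speaking rate
CHARS_PER_WORD = 6       # assumption: includes the trailing space
TEXT_BITS_PER_CHAR = 2   # from the slide

chars_per_sec = WORDS_PER_MINUTE * CHARS_PER_WORD / 60
text_bps = chars_per_sec * TEXT_BITS_PER_CHAR

for speech_bps in (8000, 2000, 1000):
    ratio = speech_bps / text_bps
    print(f"speech at {speech_bps} b/s is about {ratio:.0f}x text at {text_bps:.0f} b/s")
```

With these assumed speaking rates, every coded-speech rate on the slide stays well inside the 10-1000x range, even after two or three more decades of halving.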

Where have we been and where are we going?
1. Consistent progress over decades
  • Moore’s Law
  • Speech Coding
  • Reducing Speech Recognition Error Rates
2. History repeats itself
  • Empiricism: 1950s
  • Rationalism: 1970s
  • Empiricism: 1990s
  • Rationalism: 2010s (?)
3. Discontinuities: Fundamental changes that invalidate fundamental assumptions
  • Petabytes: $2,000,000 → $2,000
  • Can demand keep up with supply?
  • If not → Tech meltdown
  • New priorities: Search >> Compression & Dictation

[Figure: error rate versus date (15 years)]
Moore’s Law Time Constant:
• 10x improvement per decade
• Limited by R&D Investment
  • (Not Physics)
Borrowed Slide: Audrey Le (NIST)

Milestones in Speech and Multimodal Technology Research
[Figure: timeline by year, 1962 to 2002 in 5-year steps, showing successive stages]
• Small Vocabulary, Acoustic Phonetics-based
  – Isolated Words
  – Filter-bank analysis; Time-normalization; Dynamic programming
• Medium Vocabulary, Template-based
  – Isolated Words; Connected Digits; Continuous Speech
  – Pattern recognition; LPC analysis; Clustering algorithms; Level building
• Large Vocabulary, Statistical-based
  – Connected Words; Continuous Speech
  – Hidden Markov models; Stochastic Language modeling
• Large Vocabulary; Syntax, Semantics
  – Continuous Speech; Speech Understanding
  – Stochastic language understanding; Finite-state machines; Statistical learning
• Very Large Vocabulary; Semantics, Multimodal Dialog, TTS
  – Spoken dialog; Multiple modalities
  – Concatenative synthesis; Machine learning; Mixed-initiative dialog
Borrowed Slide
Consistent improvement over time, but unlike Moore’s Law, hard to extrapolate (predict future)

Speech-Related Technologies: Where will the field go in 10 years?
Niels Ole Bernsen (ed)
2003  Useful speech recognition-based language tutor
2003  Useful portable spoken sentence translation systems
2003  First pro-active spoken dialogue with situation awareness
2004  Satisfactory spoken car navigation systems
2005  Small-vocabulary (> 1000 words) spoken conversational systems
2006  Multiple-purpose personal assistants (spoken dialog, animated characters)
2006  Task-oriented spoken translation systems for the web
2006  Useful speech summarization systems in top languages
2008  Useful meeting summarization systems
2010  Medium-size vocabulary conversational systems

Where have we been and where are we going? Consistent Progress over Time
[Figure: $ versus t (2002, 2003, 2004); regions labeled "Extrapolation/Prediction is Applicable" and "Extrapolation/Prediction is Not Applicable"; labels: Physics and Investment, Investment, Physics]
Manage Expectations

Where have we been and where are we going?
1. Consistent progress over decades
  • Moore’s Law, Speech Coding, Error Rate
2. History repeats itself
  • Empiricism: 1950s
  • Rationalism: 1970s
  • Empiricism: 1990s
  • Rationalism: 2010s (?)
3. Discontinuities: Fundamental changes that invalidate fundamental assumptions
  • Petabytes: $2,000,000 → $2,000
  • Can demand keep up with supply?
  • If not → Tech meltdown
  • New priorities: Search >> Compression & Dictation

It has been claimed that
Recent progress made possible by Empiricism
Progress (or Oscillating Fads)?
• 1950s: Empiricism was at its peak
  – Dominating a broad set of fields
    • Ranging from psychology (Behaviorism)
    • To electrical engineering (Information Theory)
  – Psycholinguistics: Word frequency norms (correlated with reaction time, errors)
    • Word association norms (priming): bread and butter, doctor / nurse
  – Linguistics/psycholinguistics: focus on distribution (correlate of meaning)
    • Firth: “You shall know a word by the company it keeps”
    • Collocations: Strong tea v. powerful computers
• 1970s: Rationalism was at its peak
  – with Chomsky’s criticism of ngrams in Syntactic Structures (1957)
  – and Minsky and Papert’s criticism of neural networks in Perceptrons (1969).
• 1990s: Revival of Empiricism
  – Availability of massive amounts of data (popular arg, even before the web)
    • “More data is better data”
    • Quantity >> Quality (balance)
  – Pragmatic focus:
    • What can we do with all this data?
    • Better to do something than nothing at all
  – Empirical methods (and focus on evaluation): Speech → Language
• 2010s: Revival of Rationalism (?)
Consistent progress?
• Periodic signals are continuous
• Support extrapolation/prediction
• Progress? Consistent progress?
Extrapolation/Prediction: Applicable?

Speech → Language: Has the pendulum swung too far?
• What happened between TMI-1992 and TMI-2002 (if anything)?
• Have empirical methods become too popular?
  – Has too much happened since TMI-1992?
• I worry that the pendulum has swung so far that
  – We are no longer training students for the possibility
    • that the pendulum might swing the other way
• We ought to be preparing students with a broad education including:
  – Statistics and Machine Learning
  – as well as Linguistic Theory
• History repeats itself:
  – 1950s: empiricism
  – 1970s: rationalism (empiricist methodology became too burdensome)
  – 1990s: empiricism
  – 2010s: rationalism (empiricist methodology is burdensome, again)
Plays well at Machine Translation conferences
Mark Twain; bad idea then and still a bad idea now

                            Rationalism                      Empiricism
Well-known advocates        Chomsky, Minsky                  Shannon, Skinner, Firth, Harris
Model                       Competence Model                 Noisy Channel Model
Contexts of Interest        Phrase-Structure                 N-Grams
Goals                       All and Only                     Minimize Prediction Error (Entropy)
                            Explanatory                      Descriptive
                            Theoretical                      Applied
Linguistic Generalizations  Agreement & Wh-movement          Collocations & Word Associations
Parsing Strategies          Principle-Based, CKY (Chart),    Forward-Backward (HMMs),
                            ATNs, Unification                Inside-outside (PCFGs)
Applications                Understanding                    Recognition
                            Who did what to whom             Noisy Channel Applications

Where have we been and where are we going?
1. Consistent progress over decades
  • Moore’s Law, Speech Coding, Error Rate
2. History repeats itself
  • Empiricism: 1950s
  • Rationalism: 1970s
  • Empiricism: 1990s
  • Rationalism: 2010s (?)
3. Discontinuities: Fundamental changes that invalidate fundamental assumptions
  • Petabytes: $2,000,000 → $2,000
  • Can demand keep up with supply?
  • If not → Tech meltdown
  • New priorities: Search >> Compression & Dictation

Meeting Demand for Petabytes
Bet: Speech >> Text
(because we aren’t going to solve all “speech” problems)
• Moore’s Law → More and More Supply
  – Disks, Memory, Network Bandwidth, everything…
  – Petabytes are coming: $2,000,000 (today) → $2,000 (in 10 years)
• Can demand keep up?
  – If not, revenues will collapse → tech meltdown
  – Much worse than the Dot-Bomb…
• Ans1: no problem
  – Demand has always kept up
  – Pundits have never been able to explain why
    • Thomas J. Watson (1943): I think there is a world market for maybe five computers www.wikipedia.org/wiki/Thomas+J.+Watson
  – But if you build it, they will come
• Ans2: big problem (prices for PCs & Networks are collapsing)
  – Demand is everything
  – Anyone (even a dot-com) can build a network,
  – But the challenge is to sell it
  – Need a killer app (more minutes on the network)
Discontinuity

How much is a Petabyte? (10^15 bytes)
• Question from execs:
  – How do I explain to a lay audience
    • How much is a petabyte
    • And why everyone will buy lots of them
• Wrong answer:
  – 10^6 is a million (a floppy disk/email msg)
  – 10^9 is a billion (a billion here, a billion there…)
  – 10^12 is a trillion (the US debt)
  – 10^15 is a zillion (= ∞, an unimaginably large #)

How much is a Petabyte? Some more wrong answers
• Goal: create demand for a petabyte/lifetime
  – ≈ 10^15 bytes/100 years ≈ 18 megabytes/minute
  – Text: 18,000 pages/min
  – Speech: 317 telephone channels for 100 years per capita
• Text won’t do it
  – Speech probably won’t either, but it is closer
  – DVD video will (1.8 gigabytes/hour = 1.6 petabytes/lifetime), but
    • Too much opportunity for compression
    • Not enough demand for Picture Phone (privacy concerns)
• Bank on speech recognition not working too well
  – Can’t afford big improvements in compression:
    • Speech rates → Text rates
Fortunately, that won’t happen
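
The per-lifetime figures on this slide can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: a 100-year lifetime of 365.25-day years, binary megabytes, and an 8 kb/s telephone channel (the rate used in the speech coding slides). None of these conventions is stated on the slide itself.

```python
PETABYTE = 10**15                      # bytes
LIFETIME_YEARS = 100
SECONDS = LIFETIME_YEARS * 365.25 * 24 * 3600
MINUTES = SECONDS / 60
HOURS = SECONDS / 3600

# A petabyte spread evenly over a lifetime.
bytes_per_min = PETABYTE / MINUTES
mib_per_min = bytes_per_min / 2**20    # assumption: binary megabytes

# Speech: how many telephone channels fill a petabyte in a lifetime?
CHANNEL_BYTES_PER_SEC = 8000 / 8       # assumption: 8 kb/s per channel
channels = PETABYTE / SECONDS / CHANNEL_BYTES_PER_SEC

# DVD video running non-stop for a lifetime.
DVD_BYTES_PER_HOUR = 1.8e9             # 1.8 gigabytes/hour, from the slide
dvd_petabytes = DVD_BYTES_PER_HOUR * HOURS / PETABYTE

print(f"{mib_per_min:.1f} megabytes/minute")
print(f"{channels:.0f} telephone channels")
print(f"{dvd_petabytes:.2f} petabytes of DVD video per lifetime")
```

The text figure of 18,000 pages/min then corresponds to roughly one kilobyte per page.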

New Research Challenges
• New Priorities
  – Increase demand for space >> Data entry
• New Killer Apps
  – Search >> Dictation
    • Speech Google!
  – Data mining
• Old Priorities
  – Dictation application dates back to days of dictation machines
  – Speech recognition has not displaced typing
    • Speech recognition has improved
    • But typing skills have improved even more
      – My son will learn typing in 1st grade
      – Secretaries rarely take dictation
  – Dictation machines are history
    • My son may never see one
    • Museums have slide rules and steam trains
      – But dictation machines?
Data Mining & Call Centers: An Intelligence Bonanza
• Some companies are collecting information with technology designed to monitor incoming calls for service quality.
• Last summer, Continental Airlines Inc. installed software from Witness Systems Inc. to monitor the 5,200 agents in its four reservation centers.
• But the Houston airline quickly realized that the system, which records customer phone calls and information on the responding agent's computer screen, also was an intelligence bonanza, says André Harris, reservations training and quality-assurance director.

Personal 100 GB today
The Personal Petabyte (someday)
• It’s coming (2M$ today…2K$ in 10 years)
• Today the pack rats have ~ 10-100GB
  – 1-10 GB in text (eMail, PDF, PPT, OCR…)
  – 10GB – 50GB tiff, mpeg, jpeg,…
  – Some have 1TB (voice + video).
• Video can drive it to 1PB.
• Online PB affordable in 10 years.
• Get ready: tools to capture, manage, organize, search, display will be big app.
Borrowed Slide
Text won’t do it; Speech won’t either

Hotmail / Yahoo
300 TB (cooked)
• Clone front ends ~10,000 @ hotmail.
• Application servers
  – ~100 @ hotmail
  – Get mail box
  – Get/put mail
  – Disk bound
• ~30,000 disks
• ~20 admins
Borrowed Slide
Cost of storage: People
Per Capita Demand: Tiny

AOL (msn)
(1PB?)
• 10 B transactions per day (10% of that)
• Huge storage
• Huge traffic
• Lots of eye candy
• DB used for security/accounting.
• GUESS AOL is a petabyte
  – (40M x 10MB = 400 x 10^12)
Borrowed Slide
Per Capita Demand: Tiny

Google
1.5PB as of last spring
• 8,000 no-name PCs
  – Each 1/3U, 2 x 80 GB disk, 2 cpu, 256MB ram
• 1.4 PB online.
• 2 TB ram online
• 8 TeraOps
• Slice-price is 1K$ so 8M$.
• 15 admins (!) (== 1/100TB).
Borrowed Slide
Per Capita Demand: Tiny
2001
Cost of storage: People

Digital Immortality: Gordon Bell & Jim Gray (2000)
Estimated Lifetime Storage Requirements
Data-types                            Per day   Per Lifetime
email, papers, text                   0.5 MB    15 GB
photos                                2 MB      150 GB
speech                                40 MB     1.2 TB
music                                 60 MB     5.0 TB
video-lite (200 Kb/s)                 1 GB      100 TB
DVD video (4.3 Mb/s = 1.8 GB/hour)    20 GB     1 PB

Future of Tech Industry Depends On…
• Supply running into a (physical) limit
  – Moore’s Law breaking down
  – And little progress on compression
• Demand keeping up
  – If we build it, they will come…
• Bell & Gray underestimating demand by a lot
  – Everyone wanting lots and lots of speech
  – Everyone wanting lots of video
  – A miracle (the fat lady might sing…)
  – Big progress on searching speech & video
Best Bet!
Not Likely
Not Likely
Not Optimistic

Bait and Switch Strategy
www.elsnet.org
• Bait: public Internet
  – Large, sexy, available, rich hypertext structure
• Switch: as large as the web is
  – There are larger & more valuable private repositories
    • Private Intranets & telephone networks
  – Exclusivity → Value
    • No one cares about data that everyone can have
    • Just as Groucho Marx doesn’t want to be in a club that…
• Strategy: Use the public Internet to develop, test and socialize new ways to extract value from large linguistic repositories
  – Value to society: Port solutions to private repositories
Switch: How Large is Large?
• Web → Renewed Excitement
  – Large, rich hypertext structure & publicly available
  – Ngram freqs: Google = 1000 * BNC
    • Google: 100 Billion Words
    • British National Corpus (BNC): 100 Million Words
• It is often said that the web is the largest repository but…
  – Changes to copyright laws could unlock vast resources: www.lexisnexis.com
• Private Intranets and telephone networks >> Public Web
  – American Telephone Network (FCC): 1 line/person
    • Usage: 1 hour/day/line
    • Assume 1 sec ≈ 1 word → 10 Google collections/day
  – Currently, Intranets (data) ≈ telephones (voice)
    • But data is growing faster than voice
  – AT&T networks: 1 PB/day
    • Worldwide networks: tens of PB/day
1 TB (ngram freqs) or 1 PB (Gray)?
A lot of speech, but not PB per capita
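The "10 Google collections/day" estimate can be checked with back-of-the-envelope arithmetic. The sketch below is only a rough check: the line count of about 290 million is an assumption (an approximate 2003 US population at 1 line/person) and does not appear on the slide; the other figures are the slide's own.

```python
# Back-of-the-envelope check of "10 Google collections/day".
# ASSUMPTION (not on the slide): about 290 million US residents in 2003,
# hence about 290 million lines at 1 line/person.
LINES = 290_000_000
SECONDS_PER_LINE_PER_DAY = 60 * 60   # slide: 1 hour/day/line
WORDS_PER_SECOND = 1                 # slide: 1 sec ≈ 1 word
GOOGLE_WORDS = 100_000_000_000       # slide: Google ≈ 100 billion words
BNC_WORDS = 100_000_000              # slide: BNC ≈ 100 million words


def spoken_words_per_day(lines: int = LINES) -> int:
    """Words carried by the telephone network in one day."""
    return lines * SECONDS_PER_LINE_PER_DAY * WORDS_PER_SECOND


def google_collections_per_day(lines: int = LINES) -> float:
    """Daily telephone speech, measured in Google-sized collections."""
    return spoken_words_per_day(lines) / GOOGLE_WORDS


if __name__ == "__main__":
    print(f"Google / BNC = {GOOGLE_WORDS // BNC_WORDS}x")
    print(f"words/day = {spoken_words_per_day():.2e}")
    print(f"Google collections/day = {google_collections_per_day():.1f}")
```

Under these assumptions the network carries roughly a trillion words per day, which is the order of magnitude the slide quotes.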
Eurospeech 2003 60
Privacy Concerns: Private Data is Private (Exclusivity Value)
• Data on private intranets cannot be distributed
  – And most telephone conversations cannot even be recorded
    • let alone distributed
• But attitudes are changing
  – It used to be considered rude to have an answering machine
  – Now it is considered rude not to have one
• Between answering machines and call centers, perhaps 10% of telephone traffic can be recorded (≈ 1 PB/day)
  – Customer expectation: call centers can retrieve recordings of previous calls based on content
• New capabilities → new public policy
  – Video recording:
    • Expected in banks (ATMs)
    • Prohibited in rest rooms (except children’s YMCA locker room)
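The "≈ 1 PB/day" figure for recordable traffic is plausible under simple assumptions. The sketch below is a rough check only: the line count (about 290 million) and the 64 kb/s uncompressed telephone bit rate are assumptions that are not stated on the slide; the 1 hour/day/line usage and the 10% fraction come from the talk.

```python
# Rough check of "perhaps 10% of telephone traffic can be recorded (≈ 1 PB/day)".
# ASSUMPTIONS (not on the slide): ~290 million lines and uncompressed
# 64 kb/s telephone-quality audio (8,000 bytes per second).
LINES = 290_000_000
SECONDS_PER_LINE_PER_DAY = 3600      # talk: 1 hour/day/line
BYTES_PER_SECOND = 64_000 // 8       # assumed 64 kb/s
RECORDABLE_FRACTION = 0.10           # slide: perhaps 10%
PETABYTE = 10**15


def recordable_pb_per_day() -> float:
    """Petabytes per day of telephone speech that could be recorded."""
    total_bytes = LINES * SECONDS_PER_LINE_PER_DAY * BYTES_PER_SECOND
    return RECORDABLE_FRACTION * total_bytes / PETABYTE


if __name__ == "__main__":
    print(f"recordable speech: {recordable_pb_per_day():.2f} PB/day")
```

With a lower assumed bit rate (for example a compressed 8 kb/s coder) the estimate shrinks by the same factor, so the result is sensitive to the coding assumption.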
Eurospeech 2003 64
In the past, recording all this data would have been prohibitively expensive
• Thanks to Moore’s Law
  – Storage costs have been falling faster than transport
  – And will continue to do so for some time
• Even at current prices, transport >> storage
  – Transport: Long-distance telephone calls: 5 cents per minute of speech
  – Storage: Disk space: ½ cent per minute of speech
• If I am willing to pay for a call
  – I might as well keep the speech online forever
• Similar comments hold for data (web pages)
  – If I am willing to pay to fetch a web page
    • I might as well cache it for a long time
    • Why flush a page if there is any chance that it might be requested again?
  – Web caches → crawlers
    • Go find the pages that I might ask for and keep them forever
• Storage is cheap (compared to transport)
Eurospeech 2003 70
Bait: Use Web to Establish Excitement: More data is better data
• Shocking at TMI-1992 (Bob Mercer)
  – but less so a decade later (Eric Brill)
  – Many researchers are finding that performance improves with corpus size, over full range of sizes that are available.
• EMNLP-2002 Best paper (& CL): Using the Web to Overcome Data Sparseness, Keller et al
  – For many tasks:
    – Language modelling
    – Predicting psycholinguistic judgements
  • Larger corpora (100B Google) >> Smaller corpora (100M BNC)
  – Collecting more data is better than tricks for not collecting data
    • Smoothing, balance, etc.
• Tricks have limited power:
  – Collecting xx data with tricks ≈ collecting 10xx data without tricks
• Wish list: more papers measuring power of various tricks
  – Was balancing BNC (British National Corpus) worth the effort?
    • Should a corpus be balanced? (Oxford Debate, 1991)
• The rising tide of data will lift all boats!
  1. TREC Question Answering
  2. Collocations: http://labs1.google.com/sets
My spin
Google is displacing BNC just as PCs displaced Crays
Still find papers on “tiny” corpora
Larger market share → More $$ for R&D → Better Moore’s Law Time Constant
Eurospeech 2003 71
The rising tide of data will lift all boats!
TREC Question Answering & Google:
What is the highest point on Earth?
Eurospeech 2003 75
The rising tide of data will lift all boats!
Acquiring Lexical Resources from Data: Dictionaries, Ontologies, WordNets, Language Models, etc.
http://labs1.google.com/sets

Cat         cat      England     Japan
Dog         more     France      China
Horse       ls       Germany     India
Fish        rm       Italy       Indonesia
Bird        mv       Ireland     Malaysia
Rabbit      cd       Spain       Korea
Cattle      cp       Scotland    Taiwan
Rat         mkdir    Belgium     Thailand
Livestock   man      Canada      Singapore
Mouse       tail     Austria     Australia
Human       pwd      Australia   Bangladesh
Eurospeech 2003 76
Rising Tide of Data Lifts all Boats
• More data → better results
  – TREC Question Answering
    • Remarkable performance: Google and not much else
      – Norvig (ACL-02)
      – AskMSR (SIGIR-02)
  – Lexical Acquisition
    • Google Sets
      – We tried similar things
        » but with tiny corpora
        » which we called large
Switch: port these ideas to private repositories
Bait: use public web to create & socialize new ideas
Eurospeech 2003 77
Recommendations
Bait and Switch Strategy
• Strategy: Use the public Intranet to develop, test and socialize new ways to extract value from large linguistic repositories
  – Value to society: Port solutions to private repositories
• Research papers:
  – Keep up the good work!
  – There is already considerable interest in evaluation of new ideas on corpora (public repositories)
  – There will be more interest in
    • How well methods port to new corpora
    • How well performance scales with size
      – Hopefully corpus size helps
• But of course, all the data in the world
  – Will not solve all the world’s problems
  – Need to understand when more data will help
    • And when it is better to do something else
  – Revival of Rationalism (Linguistics)
Switch
Bait
Eurospeech 2003 78
More Recommendations
Bait and Switch Strategy
• Infrastructure
  – In addition to traditional public repositories (large)
    • Web data, data collection efforts such as LDC
  – We ought to think more about private repositories (even larger)
• Most of us do not keep voice mail for long
  – But I have been using Scanmail to copy my voice mail to email
  – And like many, I keep email online for a long time
• Private repositories would be much larger if
  – It was more convenient to capture private data
  – and there was obvious value in doing so.
• Currently, tools for public repositories (e.g., Google)
  – are better than comparable tools for private data (e.g., searching email)
• Better search tools (email, speech & video) → Larger private repositories
• New priorities (consume space) → new killer apps
  – Search (consumes space) >> Dictation (data entry) & Compression
Switch
Bait
Eurospeech 2003 79
Summary: Where have we been and where are we going?
• 1970s: Hot debate: knowledge v. data intensive methods
  – People think about what they can afford to think about
  – Data was expensive
    • Only the richest industrial labs could play
    • Beyond the reach of most universities
    • Victor Zue dreams of having an hour of speech online (with annotations)
• 1990s: Revival of Empiricism: More data is better data!
  – Everyone can afford to play (but still expensive)
  – Linguistic Data Consortium (LDC) → Web
  – Evaluation, evaluation, evaluation demonstrates consistent progress over time, but not as convincingly as Moore’s Law
  – Data intensive: method of choice
    • Pendulum swings (too) far
    • Is this progress, or is the pendulum about to swing back the other way?
• 2010s: Petabytes everywhere (be careful what you ask for)
  – Big problem: Supply >> Demand → tech meltdown (??)
  – No problem: Demand has always kept up → new killer apps
    • Search (consumes space) >> dictation (data entry) & compression
    • Video >> Speech >> Text
Demonstrate consistent progress over time
Oscillations
Discontinuities
More realistic expectations
Don’t see how to consume PB per capita
Eurospeech 2003 80
Where have we been and where are we going?
1. Consistent progress over decades
   • Moore’s Law, Speech Coding, Error Rate
   • Time constant limited by: physics and/or R&D investment
2. History repeats itself:
   • Mark Twain; bad idea then and still a bad idea now
   • Empiricism: 1950s
   • Rationalism: 1970s
   • Empiricism: 1990s
   • Rationalism: 2010s (?)
3. Discontinuities:
   • Fundamental changes that invalidate fundamental assumptions
   • Petabytes: $2,000,000 → $2,000
   • Can demand keep up with supply?
   • If not → Tech meltdown
   • New priorities: data entry → create demand for petabytes
     – New Killer Apps: Search (creates demand) >> Compression & Dictation
Eurospeech 2003 82
Speech → Language
Shannon’s Noisy Channel Model
• I → Noisy Channel → O
• I′ ≈ ARGMAX_I Pr(I|O) = ARGMAX_I Pr(I) Pr(O|I)
  – Pr(I): Language Model (application independent)
  – Pr(O|I): Channel Model

Language Model
Word        Rank   More likely alternatives
We          9      The This One Two A Three Please In
need        7      are will the would also do
to          1
resolve     85     have know do…
all         9      The This One Two A Three Please In
of          2      The This One Two A Three Please In
the         1
important   657    document question first…
issues      14     thing point to

Channel Model
Application                           Input        Output
Speech Recognition                    writer       rider
OCR (Optical Character Recognition)   all          a1l
Spelling Correction                   government   goverment
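The decoding rule on this slide, choosing the input I that maximizes Pr(I) Pr(O|I), can be sketched for the spelling-correction row of the table. This is a minimal illustration only: the candidate words and every probability below are hypothetical values invented for the example, not figures from the talk.

```python
import math

# Toy language model Pr(I): HYPOTHETICAL unigram probabilities.
LANGUAGE_MODEL = {"government": 1e-4, "governments": 2e-5, "garment": 5e-6}

# Toy channel model Pr(O|I): HYPOTHETICAL probabilities that the intended
# word I comes out of the channel as the observed string O.
CHANNEL_MODEL = {
    ("goverment", "government"): 1e-2,
    ("goverment", "governments"): 1e-4,
    ("goverment", "garment"): 1e-5,
}


def decode(observed: str) -> str:
    """Return argmax_I Pr(I) * Pr(O|I), computed in log space."""
    best, best_score = None, -math.inf
    for intended, prior in LANGUAGE_MODEL.items():
        likelihood = CHANNEL_MODEL.get((observed, intended), 0.0)
        if likelihood == 0.0:
            continue  # this input cannot produce the observation
        score = math.log(prior) + math.log(likelihood)
        if score > best_score:
            best, best_score = intended, score
    # Fall back to the observation when no candidate explains it.
    return best if best is not None else observed


if __name__ == "__main__":
    print(decode("goverment"))
```

Only the channel table changes from one application to another (acoustics, optics, typos); the language model term is shared, which is the point of the "application independent" label.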
Eurospeech 2003 83
Speech → Language
Using (Abusing) Shannon’s Noisy Channel Model: Part of Speech Tagging and Machine Translation
• Speech
  – Words → Noisy Channel → Acoustics
• OCR
  – Words → Noisy Channel → Optics
• Spelling Correction
  – Words → Noisy Channel → Typos
• Part of Speech Tagging (POS):
  – POS → Noisy Channel → Words
• Machine Translation: “Made in America”
  – English → Noisy Channel → French
Eurospeech 2003 84
I am going to try to avoid making predictions like these because…
• Too falsifiable
• Appearance of conflicts of interest
  – Sound like you are trying to raise money for your favorite stuff
• Committees do what committees do
  – Union of all (represented) positions = no position
  – Advocate what the members are currently working on
    • Rarely establish new strategic direction
• Boring (too obviously correct)
Eurospeech 2003 85
Predictions: Where are we going?
Change the subject (engage in meta discussion)
• Set unrealistic expectations (plenty of examples)
  – Sound like you are trying to raise money for your favorite stuff
    • And that you have lost touch with reality
• Come up short (fewer examples)
• Sound like you’re over the hill (old fogies session at Coling)
  – Kids these days don’t get it
  – Everyone should still be working on
    • what we thought was important when we were kids
  – Dress up old-style thinking (empiricism/rationalism)
    • with current fashion (web)
Meta discussion: consistent progress, history repeating itself, discontinuities
Come up with a new angle: bounds
• Lower bound: we will solve such and such (x)
  – Extrapolations based on Moore’s Law
• Upper bound: we won’t solve x (soon/ever)
  – e.g., pass Turing Test, compress speech down to text rates
  – And you can bank on it → good apps based on assumption x can’t be done
Eurospeech 2003 86
Breaking Through Automation Barriers (illustrative; borrowed slide)
[Figure: axis labels Complexity of Services and Complexity of User Interaction, with Extent of Automation; regions labeled Traditional IVR, Word Spotting, Agents, Advanced ASR, Natural Language Dialog]
Eurospeech 2003 87
Past, Present, Future…. (borrowed slide)
MATCH: Multimodal Access To City Help
• 1990+: Keyword spotting, handcrafted grammars, no dialogue
  – Applications: Directory Assistance, VRCP
  – Constrained speech; minimal data collection; manual design
• 1995+: Medium size ASR, handcrafted grammars, system initiative
  – Applications: Airline reservation, Banking
  – Constrained speech; moderate data collection; some automation
• 2000+: Large size ASR, limited NLU, mixed-initiative
  – Applications: Call centers, E-commerce
  – Spontaneous speech; extensive data collection; semi-automation
• 2005+: Unlimited ASR, deeper NLU, adaptive systems
  – Applications: Multimodal, Multilingual Help Desks, E-commerce
  – Spontaneous speech/pen; fully automated systems
Eurospeech 2003 88
Example of Upper Bound: Reverse Turing Test
(Kochanski et al., ICSLP-2002)
• Assume: won’t pass Turing Test (any time soon)
• Assumptions you can bank on
  – Liberace: cry all the way to the bank
• Good apps for crummy (limited) technology
  – “Good Applications for Crummy Machine Translation”
    • Church & Hovy (1993)
• Reverse Turing Test
  – Owner of web site wants to grant access to people but not to spiders
  – Task: distinguish friend from foe, man from beast
  – Solution: assume there is a class of problems (AI-complete) that any person can do and no machine can.
• Currently deployed Reverse Turing Applications
  – Assume OCR is AI-complete
  – User is given a degraded image and asked to enter text into a form
  – Easy for people but challenging for machines
• Problem: OCR is not challenging enough for machines
• Proposal: Speech recognition with noise is more challenging
  – We can bank on not solving the cocktail party effect any time soon
Eurospeech 2003 89
Where have we been and where are we going?
1. Consistent progress over decades
   • Moore’s Law, Speech Coding, Error Rate
2. History repeats itself
   • Empiricism: 1950s
   • Rationalism: 1970s
   • Empiricism: 1990s
   • Rationalism: 2010s (?)
3. Discontinuities: Fundamental changes that invalidate fundamental assumptions
   • Petabytes: $2,000,000 → $2,000
   • Can demand keep up with supply?
   • If not → Tech meltdown
   • New priorities: Search >> Compression & Dictation
Eurospeech 2003 90
Statistical MT: IBM Models 1-5
• E → Noisy Channel → F
• E′ = ARGMAX_E Pr(E) Pr(F|E)
• Language Model, Pr(E):
  – Trigram model (borrowed from speech recog)
• Channel Model, Pr(F|E):
  – Based on aligned parallel corpora
  – Models 1-5: alignment
• Mercer & Church (Computational Linguistics, 1993)
  – Statistical MT may fail for reasons advanced by Chomsky
  – Regardless of its ultimate success or failure,
  – There is a growing community of researchers in corpus-based linguistics who believe it will produce valuable lexical resources
    • Bilingual concordances
    • Translation tools
    • Training & testing material for word sense disambig (senseval)
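To make the channel model Pr(F|E) concrete, here is a minimal sketch of the simplest member of the family, IBM Model 1, trained with EM on aligned sentence pairs. It is a simplification for illustration, not the models as published: it omits the NULL word, covers none of Models 2-5, and the three-sentence corpus and the function name `train_model1` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# HYPOTHETICAL toy parallel corpus of (English, foreign) sentence pairs.
CORPUS = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]


def train_model1(corpus, iterations=10):
    """Estimate t[(f, e)] ≈ Pr(f | e) with EM (IBM Model 1, no NULL word)."""
    e_vocab = {e for es, _ in corpus for e in es}
    f_vocab = {f for _, fs in corpus for f in fs}
    # Start from a uniform translation table.
    t = {(f, e): 1.0 / len(f_vocab) for f, e in product(f_vocab, e_vocab)}
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) links
        total = defaultdict(float)   # expected count of e being linked
        for es, fs in corpus:
            for f in fs:
                # E-step: share one unit of f among the English words.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts into probabilities.
        t = {(f, e): count[(f, e)] / total[e] for (f, e) in t}
    return t


if __name__ == "__main__":
    table = train_model1(CORPUS)
    for e in ("the", "house", "book", "a"):
        best = max((f for f, e2 in table if e2 == e), key=lambda f: table[(f, e)])
        print(e, "->", best, round(table[(best, e)], 3))
```

Even on three sentence pairs, co-occurrence alone lets EM pull "Haus" toward "house", because "das" is better explained by "the". The resulting word-level table is the kind of lexical resource the slide lists (bilingual concordances, translation tools).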
Eurospeech 2003 91
Word Sense Disambiguation
• Knowledge Acquisition Bottleneck
  – Bar-Hillel (1960)
  – Expert systems don’t scale
  – Sense-tagged text: expensive
  – Parallel text!
• Translation = sense-tagged text
  – Sentence (judicial sense) → peine
  – Sentence (syntactic sense) → phrase
• Yarowsky: bilingual → monolingual
• One sense per discourse
• Machine Learning: early example of co-training (EM alg)
Eurospeech 2003 92
TMI-02 Keynote (similar subject)
The organizers asked me…
• What's changed since TMI-92 (if anything)?
  – TMI-92: great excitement over the use of aligned parallel corpora to help human translators (translation tools)
  – Also, much controversy over IBM Models 1-5
• Have IBM Models 1-5 failed to solve all the world’s problems?
• So what's happened (if anything) since 1992?
  – Empiricism has come of age
    • Textbooks: Charniak, Jelinek, Manning & Schütze, Jurafsky & Martin
    • Textbooks → courses in many universities around the world
  – What used to be considered radical is now accepted practice
    • Evaluation is practically required for publication
  – Mercer’s fighting words: More data is better data!
    • Aren’t as shocking when Brill makes the case a decade later
  – The new field of Machine Learning has absorbed many good (and formerly controversial) ideas including
    • IBM Models 1-5
    • Yarowsky's Word Sense Disambiguation
      – Grew out of Machine Translation,
      – But is now widely cited in Machine Learning as an early example of co-training
Eurospeech 2003 93
What has happened to the IBM-Approach to Machine Translation?
• Support for human translators
  – Terminology: translators don’t need help with the easy vocabulary and the easy grammar
  – Translation Memory: translators are often asked to translate the same material again and again (e.g., revisions of manuals)
  – Alignment
• Fully automatic
  – CLIR: cross-language information retrieval
  – Translating web pages
• Academic fields
  – Machine Learning: most important contribution
  – Corpus-based Lexicography: spreading into lots of other fields
Eurospeech 2003 94
Revival of Empiricism: A Personal Perspective
• As a student at MIT, I was solidly opposed to empiricism
  – But that changed soon after moving to AT&T Bell Labs (1983)
• Letter-to-Sound Rules (speech synthesis)
  – Names: Letter stats → Etymology → Pronunciation video
  – NetTalk: Neural Nets video
    • Demo: great theater → unrealistic expectations
    • Self-organizing systems v. empiricism
    • Machine Learning v. Corpus-based Linguistics
    • I did it, I did it, I did it, but…
• Part of Speech Tagging (1988)
• Word Associations (Hanks)
  – Mutual info → collocations & word associations
    • Collocations: Strong tea v. powerful computers
    • Word Associations: bread and butter, doctor/nurse
• Good-Turing Smoothing (Gale)
• Aligning Parallel Corpora (inspired by MT)
• Word Sense Disambiguation
  – Bilingual → Monolingual
• Even if IBM’s approach fails for MT → lasting benefit (tools, linguistic resources, academic contributions to machine learning)
Eurospeech 2003 95
Speech Coding
(Telephony)
• More complicated than Moore’s Law
  – Many Dimensions: Bit Rate, Quality, Complexity and Delay
  – Quality ceiling (imposed by telephone standards)
    • Easy to reach the ceiling at high bit rates (≥ 8 kb/s)
    • More room for progress at low bit rates (≤ 8 kb/s)
• Moore’s Law Time Constant
  – Bit rates halve every decade (≤ 8 kb/s)
  – Relatively slow by Moore’s Law standards (not hyper-inflation)
    • Performance doubles every decade
    • Like disk seek or money in the bank (normal inflation)
  – Limited more by physics than investment
• Potential compression opportunity
  – At most 10x: 8 kb/s → 2 kb/s → 1 kb/s (?)
• Speech (2 kb/s) >> text (2 bits/char): 100-1000 times more bits
  – Speech coding will not close this gap for foreseeable future
Ceiling
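The "100-1000 times more bits" gap between coded speech and text can be checked with a short calculation. The sketch is a rough check only: the speaking rate of about 1 word per second reuses the talk's earlier rule of thumb, and the 6 characters per word (including the space) is an assumption that does not appear on the slide.

```python
# Rough check of "Speech (2 kb/s) >> text (2 bits/char): 100-1000 times more bits".
# ASSUMPTIONS (not on this slide): about 1 spoken word per second and
# about 6 characters per word, counting the trailing space.
WORDS_PER_SECOND = 1.0
CHARS_PER_WORD = 6
BITS_PER_CHAR = 2          # slide: text at 2 bits/char


def text_bits_per_second() -> float:
    """Bit rate needed to transmit the words of speech as text."""
    return WORDS_PER_SECOND * CHARS_PER_WORD * BITS_PER_CHAR


def speech_to_text_ratio(speech_bits_per_second: float) -> float:
    """How many times more bits coded speech uses than the equivalent text."""
    return speech_bits_per_second / text_bits_per_second()


if __name__ == "__main__":
    for rate in (8000, 2000, 1000):  # bit rates named on the slide, in b/s
        print(f"{rate} b/s speech is {speech_to_text_ratio(rate):.0f}x text")
```

Under these assumptions text needs on the order of ten bits per second, so even a 1 kb/s coder leaves a gap of nearly two orders of magnitude.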