the unreasonable effectiveness of data
Post on 24-Feb-2016
The Unreasonable Effectiveness of Data
Alon Halevy, Peter Norvig and Fernando Pereira
Google
2011. 10. 24
Eun-Sol Kim
• The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve.
- Eugene Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences
• Essentially, all models are wrong, but some are useful.
- George Box
Two approaches to AI
• GOFAI (Good Old-Fashioned Artificial Intelligence)
– Based on logic
– Symbolic AI
• SML (Statistical Machine Learning)
– Based on empirical data (sensor data or databases)
– Inductive inference based on data: generalize data to rules, predict on future data
Learning from Text at Web Scale
• Brown Corpus
– 1 million English words
– Complete sentences, no spelling errors, no grammatical errors
• Google trillion-word corpus
– One million times larger than the Brown Corpus
– Frequency counts for all sequences up to 5 words long
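As a rough illustration of what "frequency counts for all sequences up to 5 words long" means, the sketch below counts every contiguous word sequence of length 1 to 5 in a token list. The function name ngram_counts and the toy sentence are assumptions made for this example; this is not how the Google corpus was actually built.

```python
from collections import Counter

def ngram_counts(tokens, max_n=5):
    """Count every contiguous word sequence of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy input (hypothetical); a real corpus would be streamed, not held in memory.
tokens = "the cat sat on the mat".split()
counts = ngram_counts(tokens)
print(counts[("the",)], counts[("the", "cat")])
```

At web scale the same counting would have to be distributed across many machines, but the quantity being computed is the same.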
Some lessons of web-scale learning
1. Use available large-scale data rather than annotated data
– We can find useful semantic relationships automatically from the statistics of search queries and the corresponding results, or from the accumulated evidence of web-based text patterns, without annotated data.
2. Memorization is a good policy
- Memorizing specific phrases is more effective than relying on general patterns.
- Machine translation example: large memorized phrase tables that give candidate mappings between specific source- and target-language phrases.
- For many tasks, words and word combinations provide all the representational machinery we need to learn from text.
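The phrase-table idea can be sketched as a lookup that prefers the longest memorized source phrase. Everything here is an assumption for illustration: the table entries, the scores, and the greedy longest-match strategy. Real statistical translation systems score many competing candidates rather than choosing greedily.

```python
# Hypothetical toy phrase table: source phrase -> [(target phrase, score), ...]
PHRASE_TABLE = {
    ("guten", "morgen"): [("good morning", 0.9)],
    ("guten",): [("good", 0.7)],
    ("morgen",): [("tomorrow", 0.6), ("morning", 0.4)],
}

def translate(tokens, table, max_len=4):
    """Greedy longest-match lookup in a memorized phrase table."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest source phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                best, _score = max(table[key], key=lambda c: c[1])
                out.append(best)
                i += n
                break
        else:
            # No memorized phrase starts here: pass the word through.
            out.append(tokens[i])
            i += 1
    return " ".join(out)

print(translate("guten morgen".split(), PHRASE_TABLE))
```

The longer entry wins over the word-by-word entries, which is the sense in which memorized specific phrases beat a more general rule.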
Two conventional approaches to NLP
• Deep approach
– Hand-coded grammars and ontologies
– Complex networks of relations
• Statistical approach
– Learning n-gram statistics from large corpora
New approaches to NLP
• Combination of the two conventional approaches
– Statistical relational learning
  • Represent relations between objects with rules (first-order logic)
  • Model built by statistical learning
Semantic interpretation
• Semantic web
– A convention for formal representation languages that lets software services interact with each other
• Semantic interpretation
– Imprecise, ambiguous natural languages.
– Embodied in human cognitive and cultural processes whereby linguistic expression elicits expected responses and expected changes in cognitive states
The challenges for achieving accurate semantic interpretation
• Interpreting the content
– Methods to infer relationships between column headers or mentions of entities in the world.
• Web-scale data might be an important part of the solution.
– Hundreds of millions of independently created tables.
– Tables represent structured data.
– With tables, we can resolve semantic heterogeneity.
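One possible way to infer relationships between column headers from many independent tables is to compare the other headers each one appears with. The sketch below is only an assumption about how this could be done: the header lists are made up, and the Jaccard overlap of co-occurring headers is a technique chosen for this example, not one named on the slides.

```python
from collections import defaultdict

def header_contexts(tables):
    """Map each header to the set of other headers it appears alongside."""
    contexts = defaultdict(set)
    for headers in tables:
        for h in headers:
            contexts[h].update(x for x in headers if x != h)
    return contexts

def context_similarity(a, b, contexts):
    """Jaccard overlap of the two headers' contexts, ignoring a and b themselves."""
    ca = contexts[a] - {b}
    cb = contexts[b] - {a}
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0

# Hypothetical header rows from independently created tables.
TABLES = [
    ["company", "price", "volume"],
    ["company name", "price", "volume"],
    ["company", "ceo", "price"],
    ["city", "population"],
]

contexts = header_contexts(TABLES)
print(context_similarity("company", "company name", contexts))
print(context_similarity("company", "city", contexts))
```

Headers that never share a table but keep the same company, such as "company" and "company name" here, get a high score and become candidates for being treated as the same attribute.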