Alberta Council of Technologies, “When machines decide” November 3, 2010
Essential concepts of machine learning: (really) not the whole story
Randy [email protected]
Co-conspirators
Alberta Ingenuity Centre for Machine Learning
www.aicml.ca
Overview
• Foundational Ideas
• Machine Learning Components
• Learning Architectures
• Summary
Déjà vu
“One system ... universal, interdependent, intercommunicating, like the highway system of the country, extending from every door to every other door, affording electrical communication of every kind, from every one at every place to every one at every other place.”
— Theodore Vail, President, AT&T, 1907
Déjà vu
“...In addition to the technology and the system, the success of the telephone required people to think about communications in a new way. The telephone was at first a scientific curiosity, without obvious business use.”
— Steven Lubar, Infoculture: The Smithsonian Book of Information Age Inventions
What computers can’t do?
• Change a baby’s diaper
• Identify ideological bias
• “Give me 2-3 video clips of Stephen Harper contradicting himself…”
What humans can’t do?
• Manage large data volumes
• Manage transaction time frames
• Prioritize possible hypotheses/trends on data
ML for data analytics
• Google search crawler uses 850 TB of information
• Analytics uses 220 TB stored in two tables: 200 TB for the raw data and 20 TB for the summaries.
• Google Earth uses 70.5 TB: 70 TB for the raw imagery and 500 GB for the index data.
From http://zonixsoft.wordpress.com/2008/06/23/how-big-is-googles-database/
ML for data analytics
• Volume of data being accumulated
• No simple precise definition of analytics goals
• Managing models for identification of trends
• AI closes the loop: taking action based on trends
Foundational Ideas
• Algorithmic versus procedural
• Heuristic programming: compiling in knowledge
• Machine Learning: when there is too much data
Constructive search
• systematic search for a completed solution, in a sparsely populated space (constructive search)
• classical AI weak method
• best solutions exploit “intelligence” of what is the best next piece of a partial solution
Iterative Improvement
• systematic search for improvements, in a densely populated space (iterative search)
• neo-classical AI weak method
• best solutions exploit “intelligence” of where to look for better solutions
Representation
[Figure: a list of rules of the form “If … then …”]
Modeling, Prediction, Control
[Figure: diagram with three labelled parts: Modeling, Prediction, Control]
Technology Push vs. Market Pull
• Push: AI & ML can accomplish anything
• Pull: business models should shape priorities for what to accomplish
287kg/632lbs
Machine Learning (ML)
• Data Compression
• Vocabulary Bias
• Method Selection
• Data Selection/Preparation
• Evaluation
Data Compression
• average is one of the simplest forms of compression (aggregation, generalization)
• “lossy” in that the original cannot be recomputed
{1, 2, 3, 4, 5} → 3
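A minimal sketch of why averaging is lossy, written in Python (the slides name no language, so that choice is an assumption). The second data set is made up for illustration: it has the same mean as the slide's example, so the mean alone cannot say which data set it came from.

```python
from statistics import mean

original = [1, 2, 3, 4, 5]   # the slide's example
other = [3, 3, 3, 3, 3]      # hypothetical second data set, not from the slide

summary = mean(original)

# Two different inputs share one summary, so the summary alone
# cannot be used to recompute the input: the compression is lossy.
same_summary = mean(other) == summary
```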
Data Compression
• Lossy compression (e.g., JPEG)
From http://www.widearea.co.uk/designer/ducks.html
Data Compression
Primary Structure (sequence of amino acids)
MVKQIESKTA FQEALDAAGD KLVVVDFSAT WCGPCKMIKP FFHSLSEKYS NVIFLEVDVD DCQDVASECE VKCMPTFQFF KKGQKVGEFS GANKEKLEAT INELV
Secondary Structure (alpha Helix, Beta strand, random Coil)
CBBBBCCHHH HHHHHHHCCC CBBBBBBBCC CCHHHHHHHH HHHHHHHHCC CBBBBBBBCC CCHHHHHHCC CCCCCBBBBB BCCBBBBBBB CCCHHHHHHH HHHCC
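One way to read this slide as compression, sketched in Python under my own assumptions: the three-letter secondary-structure string can be collapsed further into segments and class counts. The segment and composition summaries are illustrations added here, not something the slide computes.

```python
from collections import Counter
from itertools import groupby

primary = (
    "MVKQIESKTA FQEALDAAGD KLVVVDFSAT WCGPCKMIKP FFHSLSEKYS NVIFLEVDVD "
    "DCQDVASECE VKCMPTFQFF KKGQKVGEFS GANKEKLEAT INELV"
).replace(" ", "")
secondary = (
    "CBBBBCCHHH HHHHHHHCCC CBBBBBBBCC CCHHHHHHHH HHHHHHHHCC CBBBBBBBCC "
    "CCHHHHHHCC CCCCCBBBBB BCCBBBBBBB CCCHHHHHHH HHHCC"
).replace(" ", "")

# Collapse the per-residue labels into (label, run_length) segments.
segments = [(label, len(list(run))) for label, run in groupby(secondary)]

# Coarser, lossy summary: how many residues fall in each class (H, B, C).
composition = Counter(secondary)
```

The segment list still allows the label string to be rebuilt, while the class counts do not, which mirrors the lossy averaging example.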
Vocabulary Bias
• The language in which ML abstraction is rendered
• Control policy (e.g., robot control program)
• Classifier/Classification structure (e.g., medical ontologies)
• Inductive hypotheses (e.g., biological hypotheses)
Vocabulary bias: Healthlink
Example
CHIEF COMPLAINT/QUESTION:fever,sore behind her ears, sore throat
PRIORITY SYMPTOMS:39 C 1 hr ago, denied Sob, denied chest pain, denied rash
When began? started while still in Mexico on Wednesday April 22. St
How is child now? She has hx of asthma; now coughing fits- no vom
Mapping retrieved named entities
Chief complaint: fever, sore behind her ears, sore throat
Priority symptoms: sex F; age child; temperature 39C
Travel history: Mexico
Onset time: Wednesday April 22
Duration: (blank)
Other concerns: Asthma, coughing
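A rough Python sketch of how such a mapping might be produced from the Healthlink text. The regular expressions, the `KNOWN_PLACES` list, and the field names are all hypothetical; the slides do not describe the actual extraction method, and a real system would more likely use a gazetteer or a trained tagger.

```python
import re

note = (
    "CHIEF COMPLAINT/QUESTION:fever,sore behind her ears, sore throat\n"
    "PRIORITY SYMPTOMS:39 C 1 hr ago, denied Sob, denied chest pain, denied rash\n"
    "When began? started while still in Mexico on Wednesday April 22. St\n"
    "How is child now? She has hx of asthma; now coughing fits- no vom"
)

KNOWN_PLACES = ["Mexico"]  # hypothetical gazetteer

def extract_entities(text):
    """Pull a few named fields out of a free-text triage note."""
    entities = {}
    complaint = re.search(r"CHIEF COMPLAINT/QUESTION:(.*)", text)
    if complaint:
        entities["chief_complaint"] = [
            part.strip() for part in complaint.group(1).split(",")
        ]
    # A two-digit number followed by "C" is taken to be a temperature.
    temperature = re.search(r"(\d{2}(?:\.\d)?)\s*C\b", text)
    if temperature:
        entities["temperature_c"] = float(temperature.group(1))
    # Crude keyword rules for sex and age group.
    if re.search(r"\b(she|her)\b", text, re.IGNORECASE):
        entities["sex"] = "F"
    if re.search(r"\bchild\b", text, re.IGNORECASE):
        entities["age"] = "child"
    for place in KNOWN_PLACES:
        if place in text:
            entities["travel_history"] = place
    return entities

entities = extract_entities(note)
```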
Vocabulary Bias: RLAI Critterbot
Time Motor0_Command Motor0_Speed Motor0_Current Motor1_Command Motor1_Speed Motor1_Current Motor2_Command Motor2_Speed Motor2_Current AccelX AccelY AccelZ RotationVel IR0 IR1 IR2 IR3 IR4 IR5 IR6 IR7 IR8 IR9 Light0 Light1 Light2 Light3
1232746809.240 0 0 0 0 0 0 0 0 0 -32 -32 976 16 2 2 2 2 2 18 2 63 2 10 308 172 120 120
1232746809.250 0 0 0 0 0 0 0 0 0 -32 -32 992 4 2 3 3 3 5 17 4 64 3 19 308 168 124 120
1232746809.260 0 0 0 0 0 0 0 0 0 -32 -32 992 8 3 2 4 5 3 17 2 65 2 19 308 172 124 120
1232746809.270 0 0 0 0 0 0 0 0 0 -32 -32 992 4 2 2 2 2 2 17 2 66 5 21 312 172 120 120
1232746809.280 0 0 0 0 0 0 0 0 0 -32 -32 992 12 3 2 2 2 2 22 3 65 2 17 304 172 124 120
1232746809.290 0 0 0 0 0 0 0 0 0 -32 -32 976 0 2 2 2 2 2 21 3 66 3 18 312 168 120 124
...
http://www.cs.ualberta.ca/~sokolsky/critterbot/index.php
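The log above is a whitespace-separated table with 28 columns, and that column vocabulary is what a learner gets to work with. A small Python sketch of reading one line into named fields follows; the function name `parse_line` is an illustrative choice, not part of the Critterbot software.

```python
COLUMNS = (
    ["Time"]
    + [f"Motor{m}_{field}" for m in range(3)
       for field in ("Command", "Speed", "Current")]
    + ["AccelX", "AccelY", "AccelZ", "RotationVel"]
    + [f"IR{i}" for i in range(10)]
    + [f"Light{i}" for i in range(4)]
)

def parse_line(line):
    """Turn one log line into a dict keyed by column name."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    # The timestamp is fractional; every other reading is an integer.
    return {
        name: float(value) if name == "Time" else int(value)
        for name, value in zip(COLUMNS, values)
    }

row = parse_line(
    "1232746809.240 0 0 0 0 0 0 0 0 0 -32 -32 976 16 "
    "2 2 2 2 2 18 2 63 2 10 308 172 120 120"
)
```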
Vocabulary Bias: Data Stream Mining
Data stream: an ordered sequence of transactions
Mining frequent patterns: identify all subsets of items whose current frequency exceeds sN
Hui Yang (1), Hongyan Liu (2), and Jun He (1)
(1) Information School, Renmin University of China, {huiyang,hejun}@ruc.edu.cn
(2) School of Economics and Management, Tsinghua University
Advanced Data Mining and Applications, Third International Conference, ADMA 2007, Harbin, China, August 6-8, 2007, Proceedings
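To make the frequent-pattern definition concrete, here is a brute-force Python sketch. It assumes that s is a minimum-support fraction and N is the number of transactions seen so far, which is the usual reading of "sN" but is not spelled out on the slide. It counts every subset up to `max_size` exhaustively, so it illustrates the definition only and is not the stream-mining algorithm of the cited paper.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, s, max_size=2):
    """Return itemsets whose count exceeds s * N, by exhaustive counting."""
    counts = Counter()
    n = 0
    for transaction in transactions:
        n += 1
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    threshold = s * n
    return {itemset: c for itemset, c in counts.items() if c > threshold}

# Hypothetical stream of four transactions.
stream = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
frequent = frequent_itemsets(stream, s=0.5)
```

With s = 0.5 and N = 4 the threshold is 2, so only itemsets seen at least three times qualify.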
Vocabulary Bias: Biological Hypotheses
• Gabriel Synnaeve, Andrei Doncescu, and Katsumi Inoue. Kinetic Models for Logic-Based Hypothesis Finding in Metabolic Pathways. The 19th International Conference on Inductive Logic Programming (ILP 2009), 2009, July 2-4
• Discretization of E. coli metabolic pathways, to enable machine management of hypotheses on dynamics of compound concentrations
Method Selection
• Wiki’s list
• WEKA, RapidMiner, RL Toolkit, …
• Analytics goals drive method selection
Wiki list of ML Methods
…how do you choose?
Modeling: Producing a classifier
Temp.  Press.  Sore Throat  …  Colour  diseaseX
35     95      Y            …  Pale    No
22     110     N            …  Clear   Yes
:      :       :            :  :       :
10     87      N            …  Pale    No

data → classifier
Stages: Data capture/sampling → Attribute selection → Learning → Prediction
Prediction: using a classifier
[Figure: the Learner takes the training table (Temp., Press., Sore Throat, …, Colour, diseaseX), shown as + and - points on Temperature vs. Pressure axes, and produces a Classifier]
New instance: Temp 32, Press. 90, Sore Throat N, …, Color Pale
Classifier output: diseaseX = No
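The modeling and prediction steps can be sketched in Python with a one-nearest-neighbour rule over the three visible training rows. The choice of nearest neighbour, and of using only Temp. and Press. as attributes, are assumptions made here for brevity; the slides do not say which learner produced the classifier.

```python
import math

# Visible training rows from the slide: (Temp., Press.) -> diseaseX
training = [
    ((35, 95), "No"),
    ((22, 110), "Yes"),
    ((10, 87), "No"),
]

def classify(instance):
    """Return the label of the closest training row (1-nearest neighbour)."""
    def distance(row):
        features, _ = row
        return math.hypot(features[0] - instance[0], features[1] - instance[1])
    _, label = min(training, key=distance)
    return label

# New instance from the slide: Temp 32, Press. 90
prediction = classify((32, 90))
```

On this instance the sketch agrees with the slide's classifier output, since the closest row is (35, 95) with diseaseX = No.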
Control: creating intervention decision
[Figure: data → predictor → intervention decision. The training table (Temp., Press., Sore Throat, …, diseaseX) yields a predictor; a new instance (Temp 32, Press. 90, Sore Throat N, …, Color Pale) is passed through the predictor to produce an intervention decision]
From: xkcd.com
Selecting data to make decisions?
Keeping a pole balanced
http://www-clmc.usc.edu/~jrpeters/pmwiki.php/Main/PublicationsByTopic?id=1785
Peters J, Vijayakumar S, Schaal S (2003) Reinforcement learning for humanoid robotics. In: Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots, Karlsruhe, Germany, Sept.29-30
Noise in patient chart
Spelling errors
• Doctors’ dictations contain many misspellings, acronyms, and abbreviations.
– UNSTABLE ANIGINA (ANGINA)
– GYNECOLOGY ABORTION SPETIC SPONTANEOUS (SEPTIC)
– PYLONOPHRITIS 6 WEEKS PREGNANT (PYELONEPHRITIS)
– QUERY MENIGITIS IRRITABILTY (MENINGITIS, IRRITABILITY)
– SURGERY APPENDICITIS UNPSECIFIED SURGERY (UNSPECIFIED)
– ACTUE MANIA (ACUTE)
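One common way to clean this kind of noise is fuzzy matching against a controlled vocabulary. The Python sketch below uses the standard library's `difflib.get_close_matches`. The vocabulary holds only the corrected terms from this slide, and the 0.75 cutoff is an arbitrary illustrative value, so this is not a description of how the chart data was actually cleaned.

```python
import difflib

# Hypothetical controlled vocabulary: just the corrected terms from the slide.
VOCABULARY = [
    "ANGINA", "SEPTIC", "PYELONEPHRITIS", "MENINGITIS",
    "IRRITABILITY", "UNSPECIFIED", "ACUTE",
]

def correct(word, cutoff=0.75):
    """Replace a word with its closest vocabulary entry, if one is close enough."""
    matches = difflib.get_close_matches(word.upper(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_phrase(phrase):
    return " ".join(correct(word) for word in phrase.split())

cleaned = correct_phrase("UNSTABLE ANIGINA")
```

Words with no sufficiently close vocabulary entry, such as UNSTABLE, pass through unchanged.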
Types of noise in annotated data
1. Inconsistent annotation
   1. Early treatment of <DISEASE> gestational diabetes </DISEASE> reduces the rate of <DISEASE> fetal macrosomia </DISEASE>
   2. We conclude that to reduce the rate of macrosomic infants in <DISEASE> gestational diabetes cases </DISEASE> , <TREATMENT> good glycemic control </TREATMENT> should be initiated before 34 completed gestational weeks .
   3. Although improved <TREATMENT> glycemic control </TREATMENT>, maintenance of normal blood pressure , and use of …
[Slide callouts mark two of these annotations as inconsistent]
Sogou query logs
Query logs can be analyzed to induce the behavior of users and improve the performance of search engines.
Sogou: a popular Chinese search engine.
Sogou query logs of 2007: 31 files, one for each day; 1,415,651 logs by 378,303 users.
“Noise” in Query Logs
The assumption is that some URLs in query logs are better than others in terms of users’ goals for a specific query.
But analysis shows this is not always true: users can have different objectives for the same query.
Ambiguity in Query Logs
For example, one user entered the query “无业良民” (unemployed people) and clicked URLs that the search engine ranked below the top 40.
After analyzing the web pages, we found that the user was searching for articles by an author named “无业良民,” rather than articles on that topic.
Evaluation
• Classification performance
• Information retrieval
• Scientific leverage
• Physical performance
Did the intervention kill the patient?
[Figure: the full pipeline, labelled modeling → prediction → control: the training table (Temp., Press., Sore Throat, …, diseaseX) yields a predictor, whose output feeds an intervention decision]
Evaluating classifiers
Confusion matrix
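A confusion matrix counts how often each actual class is predicted as each class. The Python sketch below builds one for a two-class problem and derives accuracy, precision, and recall from it. The labels and predictions are made-up illustrative data, not results from the talk.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))

# Hypothetical ground truth and classifier output for six patients.
actual    = ["Yes", "No", "No",  "Yes", "No", "Yes"]
predicted = ["Yes", "No", "Yes", "No",  "No", "Yes"]

matrix = confusion_matrix(actual, predicted)
tp = matrix[("Yes", "Yes")]  # true positives
fp = matrix[("No", "Yes")]   # false positives
fn = matrix[("Yes", "No")]   # false negatives
tn = matrix[("No", "No")]    # true negatives

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```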
Evaluating classifiers
Q. Zhang, Using Multiple Detectors for Artist Classification, M.Sc. Dissertation, 2005, Computing Science, University of Alberta, Figure 5.6: Receiver Operating Characteristic curves
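For readers unfamiliar with ROC curves, here is a Python sketch of how the points on such a curve are computed: sweep a decision threshold over the classifier's scores and record the false positive rate and true positive rate at each step. The scores and labels are invented for illustration and are unrelated to the cited figure.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical classifier scores and true labels (1 = positive).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]

points = roc_points(scores, labels)
auc = area_under_curve(points)
```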
Evaluating classifiers
Q. Zhang, Using Multiple Detectors for Artist Classification, M.Sc. Dissertation, 2005, Computing Science, University of Alberta
Evaluating IR Performance
Mi-Young Kim, Qing Dou, Osmar R. Zaiane, Randy Goebel, Mapping of Sentences to UMLS Disease Concepts based on Information Retrieval Model and Clustering, AICML (under submission)
Evaluating IR Performance
Shane Bergsma, Dekang Lin, Randy Goebel Discriminative Learning of Selectional Preference from Unlabeled Text, Empirical Methods in NLP, 2008.
Evaluating Scientific Leverage
Evaluating Physical Performance
Peters J, Vijayakumar S, Schaal S (2003) Reinforcement learning for humanoid robotics. In: Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots, Karlsruhe, Germany, Sept.29-30
Learning Architecture Choices
• Data Compression
• Vocabulary Bias
• Method Selection
• Data Selection/Preparation
• Evaluation
Summary
• Machine learning can provide methods for modeling, prediction, and control, based on goal-directed analysis of large data volumes
• Learning architectures are required to provide a framework for building adaptive systems
• Industrial application must be driven by anticipating business model value