Alberta Council of Technologies, “When machines decide” November 3, 2010

Essential concepts of machine learning: (really) not the whole story

Randy Goebel

Co-conspirators

Alberta Ingenuity Centre for Machine Learning

www.aicml.ca

Overview

• Foundational Ideas
• Machine Learning Components
• Learning Architectures
• Summary

Déjà vu

“One system ... universal, interdependent, intercommunicating, like the highway system of the country, extending from every door to every other door, affording electrical communication of every kind, from every one at every place to every one at every other place.”

— Theodore Vail, President, AT&T, 1907

Déjà vu

“...In addition to the technology and the system, the success of the telephone required people to think about communications in a new way. The telephone was at first a scientific curiosity, without obvious business use.”

— Steven Lubar, Infoculture: The Smithsonian Book of Information Age Inventions

What can’t computers do?

• Change a baby’s diaper
• Identify ideological bias
• “Give me 2-3 video clips of Stephen Harper contradicting himself…”

What can’t humans do?

• Manage large data volumes
• Manage transaction time frames
• Prioritize possible hypotheses/trends on data

Google trends

ML for data analytics

• Google search crawler uses 850 TB of information

• Analytics uses 220 TB stored in two tables: 200 TB for the raw data and 20 TB for the summaries.

• Google Earth uses 70.5 TB: 70 TB for the raw imagery and 500 GB for the index data.

From http://zonixsoft.wordpress.com/2008/06/23/how-big-is-googles-database/

ML for data analytics

• Volume of data being accumulated
• No simple precise definition of analytics goals
• Managing models for identification of trends
• AI closes the loop: taking action based on trends

Foundational Ideas

• Algorithmic versus procedural
• Heuristic programming: compiling in knowledge
• Machine Learning: when there is too much data

Constructive search

• systematic search for a completed solution, in a sparsely populated space (constructive search)

• classical AI weak method

• best solutions exploit “intelligence” of what is the best next piece of a partial solution

Iterative Improvement

• systematic search for improvements, in a densely populated space (iterative search)
• neo-classical AI weak method
• best solutions exploit “intelligence” of where to look for better solutions

Representation

[Figure: representation as a list of “If ... then ...” rules]

Modeling, Prediction, Control

[Figure: Modeling, Control, Prediction]

Technology Push vs. Market Pull

• Push: AI & ML can accomplish anything

• Pull: business models should shape priorities for what to accomplish

[Image caption: 287 kg / 632 lbs]

Machine Learning (ML)

• Data Compression
• Vocabulary Bias
• Method Selection
• Data Selection/Preparation
• Evaluation

Data Compression

• average is one of the simplest forms of compression (aggregation, generalization)

• “lossy” in that the original cannot be recomputed

{1, 2, 3, 4, 5} → 3
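A minimal Python sketch of the point above, using the slide's own numbers. The second list is a hypothetical data set added only for illustration: it has the same average, which is why the original values cannot be recovered from the summary.

```python
from statistics import mean

original = [1, 2, 3, 4, 5]   # data from the slide
other = [3, 3, 3, 3, 3]      # hypothetical data set, not from the talk

# The "compressed" form of the data is a single number.
summary = mean(original)

# Two different inputs map to the same summary, so the step is lossy.
same_summary = mean(other) == summary

print(summary, same_summary)
```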

Data Compression

• Lossy compression (e.g., JPEG)

From http://www.widearea.co.uk/designer/ducks.html

Data Compression

Primary Structure (sequence of amino acids):
MVKQIESKTA FQEALDAAGD KLVVVDFSAT WCGPCKMIKP FFHSLSEKYS NVIFLEVDVD DCQDVASECE VKCMPTFQFF KKGQKVGEFS GANKEKLEAT INELV

Secondary Structure (H = alpha helix, B = beta strand, C = random coil):
CBBBBCCHHH HHHHHHHCCC CBBBBBBBCC CCHHHHHHHH HHHHHHHHCC CBBBBBBBCC CCHHHHHHCC CCCCCBBBBB BCCBBBBBBB CCCHHHHHHH HHHCC
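For contrast with the lossy average, the secondary-structure string can itself be compressed without loss. The sketch below uses run-length encoding, a technique chosen here purely for illustration and not something the talk describes; the function names are hypothetical.

```python
from itertools import groupby

# Secondary-structure string from the slide, with the grouping spaces removed.
SECONDARY = (
    "CBBBBCCHHH HHHHHHHCCC CBBBBBBBCC CCHHHHHHHH HHHHHHHHCC CBBBBBBBCC "
    "CCHHHHHHCC CCCCCBBBBB BCCBBBBBBB CCCHHHHHHH HHHCC"
).replace(" ", "")

def run_length_encode(text):
    """Collapse each run of identical symbols into a (symbol, length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]

def run_length_decode(runs):
    """Rebuild the original string from (symbol, length) pairs."""
    return "".join(symbol * count for symbol, count in runs)

runs = run_length_encode(SECONDARY)
print(len(SECONDARY), "symbols stored as", len(runs), "runs")
```

Unlike the average, decoding the runs gives back the exact original string, so nothing is lost.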

Vocabulary Bias

• The language in which ML abstraction is rendered

• Control policy
  – E.g., robot control program
• Classifier/Classification structure
  – E.g., medical ontologies
• Inductive hypotheses
  – E.g., biological hypotheses

Vocabulary bias: Healthlink

Example

CHIEF COMPLAINT/QUESTION:fever,sore behind her ears, sore throat

PRIORITY SYMPTOMS:39 C 1 hr ago, denied Sob, denied chest pain, denied rash

When began? started while still in Mexico on Wednesday April 22. St

How is child now? She has hx of asthma; now coughing fits- no vom

Mapping retrieved named entities

Chief complaint: fever, sore behind her ears, sore throat
Priority symptoms: sex F; age child; temperature 39C
Travel history: Mexico
Onset time: Wednesday April 22
Duration: (blank)
Other concerns: Asthma, coughing
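As a rough illustration of how free-text fields might be mapped into such a fixed vocabulary, here is a small Python sketch using regular expressions. The field names, patterns, and the function `extract_entities` are assumptions made for this example; the talk does not say how the actual Healthlink mapping was done, and inferring sex or age from the text is not attempted here.

```python
import re

# Two fields of the triage note shown on the Healthlink slide.
note = {
    "chief_complaint": "fever,sore behind her ears, sore throat",
    "priority_symptoms": "39 C 1 hr ago, denied Sob, denied chest pain, denied rash",
}

def extract_entities(note):
    """Pull a few structured values out of the free text (illustrative only)."""
    symptoms = note["priority_symptoms"]
    # A number followed by "C" is read as a temperature in Celsius.
    temperature = re.search(r"(\d+(?:\.\d+)?)\s*C\b", symptoms)
    # Everything after "denied" up to the next comma is a negated symptom.
    denied = re.findall(r"denied\s+([^,]+)", symptoms)
    complaints = [c.strip() for c in note["chief_complaint"].split(",") if c.strip()]
    return {
        "temperature_c": float(temperature.group(1)) if temperature else None,
        "denied": [d.strip().lower() for d in denied],
        "complaints": complaints,
    }

entities = extract_entities(note)
print(entities)
```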

Vocabulary Bias: RLAI Critterbot

Time Motor0_Command Motor0_Speed Motor0_Current Motor1_Command Motor1_Speed Motor1_Current Motor2_Command Motor2_Speed Motor2_Current AccelX AccelY AccelZ RotationVel IR0 IR1 IR2 IR3 IR4 IR5 IR6 IR7 IR8 IR9 Light0 Light1 Light2 Light3

1232746809.240 0 0 0 0 0 0 0 0 0 -32 -32 976 16 2 2 2 2 2 18 2 63 2 10 308 172 120 120

1232746809.250 0 0 0 0 0 0 0 0 0 -32 -32 992 4 2 3 3 3 5 17 4 64 3 19 308 168 124 120

1232746809.260 0 0 0 0 0 0 0 0 0 -32 -32 992 8 3 2 4 5 3 17 2 65 2 19 308 172 124 120

1232746809.270 0 0 0 0 0 0 0 0 0 -32 -32 992 4 2 2 2 2 2 17 2 66 5 21 312 172 120 120

1232746809.280 0 0 0 0 0 0 0 0 0 -32 -32 992 12 3 2 2 2 2 22 3 65 2 17 304 172 124 120

1232746809.290 0 0 0 0 0 0 0 0 0 -32 -32 976 0 2 2 2 2 2 21 3 66 3 18 312 168 120 124

...

http://www.cs.ualberta.ca/~sokolsky/critterbot/index.php
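The sensor log above is the robot's "vocabulary": 28 named columns per time step. A minimal Python sketch of reading such a log into named records follows. It assumes that the values line up with the column names in the order listed, which matches the column count but is not confirmed by the slide; `parse_log` is a hypothetical helper.

```python
# Column names as listed on the slide.
HEADER = (
    "Time Motor0_Command Motor0_Speed Motor0_Current "
    "Motor1_Command Motor1_Speed Motor1_Current "
    "Motor2_Command Motor2_Speed Motor2_Current "
    "AccelX AccelY AccelZ RotationVel "
    "IR0 IR1 IR2 IR3 IR4 IR5 IR6 IR7 IR8 IR9 "
    "Light0 Light1 Light2 Light3"
).split()

# First two rows of the log shown on the slide.
LOG = """\
1232746809.240 0 0 0 0 0 0 0 0 0 -32 -32 976 16 2 2 2 2 2 18 2 63 2 10 308 172 120 120
1232746809.250 0 0 0 0 0 0 0 0 0 -32 -32 992 4 2 3 3 3 5 17 4 64 3 19 308 168 124 120
"""

def parse_log(text, header=HEADER):
    """Turn whitespace-separated rows into dicts keyed by column name."""
    records = []
    for line in text.strip().splitlines():
        values = line.split()
        if len(values) != len(header):
            raise ValueError(f"expected {len(header)} values, got {len(values)}")
        records.append(dict(zip(header, map(float, values))))
    return records

records = parse_log(LOG)
print(records[0]["AccelZ"], records[1]["IR7"])
```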

Vocabulary Bias: Data Stream Mining

Data stream: ordered sequence of transactions

Mining frequent patterns: identify all subsets of items whose current frequency exceeds sN

Hui Yang (1), Hongyan Liu (2), and Jun He (1)

(1) Information School, Renmin University of China, {huiyang,hejun}@ruc.edu.cn

(2) School of Economics and Management, Tsinghua University

Data Mining and Applications, Third International Conference, ADMA 2007, Harbin, China, August 6-8, 2007, Proceedings
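The definition above can be sketched in a few lines of Python. This is a naive exact counter, not the algorithm from the cited paper, and it assumes the usual reading of sN: s is a support threshold between 0 and 1 and N is the number of transactions seen so far. The transactions, the `max_size` limit, and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(stream, s, max_size=2):
    """Count every itemset up to max_size and keep those seen more than s*N times."""
    counts = Counter()
    n = 0
    for transaction in stream:
        n += 1
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    frequent = {itemset: c for itemset, c in counts.items() if c > s * n}
    return frequent, n

# Hypothetical stream of four transactions.
stream = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

frequent, n = frequent_itemsets(stream, s=0.5)
print(n, frequent)
```

Counting every subset exactly does not scale to a real stream, which is why stream-mining methods approximate these counts in bounded memory.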

Vocabulary Bias: Biological Hypotheses

• Gabriel Synnaeve, Andrei Doncescu, and Katsumi Inoue. Kinetic Models for Logic-Based Hypothesis Finding in Metabolic Pathways. The 19th International Conference on Inductive Logic Programming (ILP 2009), July 2-4, 2009.

• Discretization of E. coli metabolic pathways, to enable machine management of hypotheses on dynamics of compound concentrations

Method Selection

• Wiki’s list
• WEKA, RapidMiner, RL Toolkit, …
• Analytics goals drive method selection

Wiki list of ML Methods

…how do you choose?

Modeling: Producing a classifier

Training data:

Temp.  Press.  Sore Throat  …  Colour  diseaseX
35     95      Y            …  Pale    No
22     110     N            …  Clear   Yes
:      :       :               :       :
10     87      N            …  Pale    No

data → classifier

Stages: Data capture/sampling, Attribute selection, Learning, Prediction

Prediction: using a classifier

New instance:

Temp.  Press.  Sore Throat  …  Color
32     90      N            …  Pale

Classifier output: diseaseX = No

[Figure: a Learner produces the Classifier from the training data; scatter plot of positive (+) and negative (-) examples with axes Temperature and Pressure]
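The learn-then-predict loop on these slides can be sketched in Python with the slide's own numbers: three training rows (temperature, pressure, diseaseX) and the new instance with temperature 32 and pressure 90. The learner used here is 1-nearest-neighbour, picked only because it is short; the talk does not say which learner produced its classifier, and the function names are hypothetical.

```python
import math

# Training rows from the slide: (temperature, pressure) -> diseaseX label.
TRAINING = [
    ((35.0, 95.0), "No"),
    ((22.0, 110.0), "Yes"),
    ((10.0, 87.0), "No"),
]

def learn(training):
    """Return a classifier; for 1-NN, learning is just storing the examples."""
    examples = list(training)

    def classify(instance):
        # Predict the label of the closest stored example (Euclidean distance).
        nearest = min(examples, key=lambda example: math.dist(example[0], instance))
        return nearest[1]

    return classify

classifier = learn(TRAINING)
prediction = classifier((32.0, 90.0))  # new instance from the slide
print(prediction)
```

With these three rows the closest example is (35, 95), so the sketch agrees with the slide's answer of No, though a real classifier would also use the remaining attributes.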

Control: creating intervention decision

[Figure: data → predictor → intervention decision. A new instance (Temp 32, Press. 90, Sore-Throat N, …, Color Pale) is passed to the predictor, which yields an intervention decision.]

From: xkcd.com

Selecting data to make decisions?

Keeping a pole balanced

http://www-clmc.usc.edu/~jrpeters/pmwiki.php/Main/PublicationsByTopic?id=1785

Peters J, Vijayakumar S, Schaal S (2003) Reinforcement learning for humanoid robotics. In: Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots, Karlsruhe, Germany, Sept. 29-30

Noise in patient chart

Spelling errors

• Doctors’ dictations contain many misspellings, acronyms, and abbreviations.
  – UNSTABLE ANIGINA (ANGINA)
  – GYNECOLOGY ABORTION SPETIC SPONTANEOUS (SEPTIC)
  – PYLONOPHRITIS 6 WEEKS PREGNANT (PYELONEPHRITIS)
  – QUERY MENIGITIS IRRITABILTY (MENINGITIS, IRRITABILITY)
  – SURGERY APPENDICITIS UNPSECIFIED SURGERY (UNSPECIFIED)
  – ACTUE MANIA (ACUTE)
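One simple way to clean this kind of noise is to match each word against a reference vocabulary and accept the closest entry. The Python sketch below uses the standard library's `difflib.get_close_matches`; the small vocabulary, the cutoff of 0.75, and the function name `correct` are assumptions for illustration, not the method used in the work the talk reports.

```python
import difflib

# Hypothetical reference vocabulary; a real system would draw on a medical lexicon.
VOCABULARY = [
    "ACUTE", "ANGINA", "IRRITABILITY", "MENINGITIS",
    "PYELONEPHRITIS", "SEPTIC", "UNSPECIFIED",
]

def correct(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest vocabulary entry, or the word unchanged if none is close."""
    matches = difflib.get_close_matches(word.upper(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Three misspellings from the slide plus one word that should be left alone.
corrected = [correct(w) for w in ["ANIGINA", "SPETIC", "ACTUE", "FEVER"]]
print(corrected)
```

The cutoff trades missed corrections against false ones: set too low, correctly spelled words outside the vocabulary get rewritten into something else.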

Types of noise in annotated data

1. Inconsistent annotation

(1) Early treatment of <DISEASE> gestational diabetes </DISEASE> reduces the rate of <DISEASE> fetal macrosomia </DISEASE>

(2) We conclude that to reduce the rate of macrosomic infants in <DISEASE> gestational diabetes cases </DISEASE> , <TREATMENT> good glycemic control </TREATMENT> should be initiated before 34 completed gestational weeks .

(3) Although improved <TREATMENT> glycemic control </TREATMENT>, maintenance of normal blood pressure , and use of …

[Callouts on the slide mark two of these annotations as inconsistent.]
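One plausible kind of inconsistency in such data is a disagreement about span boundaries, where the same concept is tagged with and without a neighbouring word. The Python sketch below flags same-tag spans where one contains the other as whole words. This heuristic and the function names are assumptions; the talk does not state which inconsistencies its callouts refer to or how they were found.

```python
import re

# The three annotated sentences from the slide (the third is truncated in the source).
SENTENCES = [
    "Early treatment of <DISEASE> gestational diabetes </DISEASE> reduces the rate of "
    "<DISEASE> fetal macrosomia </DISEASE>",
    "We conclude that to reduce the rate of macrosomic infants in "
    "<DISEASE> gestational diabetes cases </DISEASE> , "
    "<TREATMENT> good glycemic control </TREATMENT> should be initiated "
    "before 34 completed gestational weeks .",
    "Although improved <TREATMENT> glycemic control </TREATMENT>, maintenance of "
    "normal blood pressure , and use of …",
]

TAGGED = re.compile(r"<(\w+)>\s*(.*?)\s*</\1>")

def tagged_spans(sentences):
    """Return (tag, text) for every annotated span, in reading order."""
    return [(tag, text) for s in sentences for tag, text in TAGGED.findall(s)]

def boundary_conflicts(spans):
    """Flag same-tag pairs where one span contains the other as whole words."""
    conflicts = []
    for i, (tag_a, a) in enumerate(spans):
        for tag_b, b in spans[i + 1:]:
            if tag_a != tag_b or a == b:
                continue
            short, long_ = sorted((a, b), key=len)
            if f" {short} " in f" {long_} ":
                conflicts.append((tag_a, long_, short))
    return conflicts

conflicts = boundary_conflicts(tagged_spans(SENTENCES))
print(conflicts)
```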

Sogou query logs

Query logs can be analyzed to infer user behavior and improve the performance of search engines.

Sogou: a popular Chinese search engine.

Sogou query logs of 2007: 31 files, one for each day; 1,415,651 logs by 378,303 users.

“Noise” in Query Logs

The assumption is that some URLs in query logs are better than others in terms of users’ goals for a specific query.

But analysis shows this is not always true: users can have different objectives for the same query.

Ambiguity in Query Logs

For example, one user entered the query “无业良民” (unemployed people) and clicked URLs that the search engine ranked below the top 40.

After analyzing the web pages, we found the user was searching for articles by an author named “无业良民,” rather than articles on that topic.

Evaluation

• Classification performance
• Information retrieval
• Scientific leverage
• Physical performance

Did the intervention kill the patient?

[Figure: data → predictor → intervention decision, with the stages labelled modeling, prediction, and control]

Evaluating classifiers

Confusion matrix
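A confusion matrix tabulates how often each true class was predicted as each class. The Python sketch below builds one for a two-class problem and derives accuracy, precision, and recall from it. The label lists are hypothetical and are not results from the talk.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = Counter({"TP": 0, "FP": 0, "FN": 0, "TN": 0})
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            counts["TP" if guess == positive else "FN"] += 1
        else:
            counts["FP" if guess == positive else "TN"] += 1
    return counts

# Hypothetical true labels and classifier outputs for eight patients.
actual = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"]
predicted = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No"]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
precision = cm["TP"] / (cm["TP"] + cm["FP"])
recall = cm["TP"] / (cm["TP"] + cm["FN"])
print(dict(cm), accuracy, precision, recall)
```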

Evaluating classifiers

Q. Zhang, Using Multiple Detectors for Artist Classification, M.Sc. Dissertation, 2005, Computing Science, University of Alberta, Figure 5.6: Receiver Operation Characteristic curves
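The curves in the cited figure plot true positive rate against false positive rate as the decision threshold varies. A minimal Python sketch of computing such points, and the area under the curve, is given below. The scores and labels are hypothetical and unrelated to the cited thesis, and the function names are illustrative.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; each step admits more predictions as positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores with true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]

points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points, auc)
```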

Evaluating classifiers

Q. Zhang, Using Multiple Detectors for Artist Classification, M.Sc. Dissertation, 2005, Computing Science, University of Alberta

Evaluating IR Performance

Mi-Young Kim, Qing Dou, Osmar R. Zaiane, Randy Goebel, Mapping of Sentences to UMLS Disease Concepts based on Information Retrieval Model and Clustering, AICML (under submission)

Evaluating IR Performance

Shane Bergsma, Dekang Lin, Randy Goebel, Discriminative Learning of Selectional Preference from Unlabeled Text, Empirical Methods in NLP, 2008.

Evaluating Scientific Leverage

Evaluating Physical Performance

Peters J, Vijayakumar S, Schaal S (2003) Reinforcement learning for humanoid robotics. In: Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots, Karlsruhe, Germany, Sept. 29-30

Learning Architecture Choices

• Data Compression
• Vocabulary Bias
• Method Selection
• Data Selection/Preparation
• Evaluation

Summary

• Machine learning can provide methods for modeling, prediction, and control, based on goal-directed analysis of large data volumes

• Learning architectures are required to provide a framework for building adaptive systems

• Industrial application must be driven by anticipating business model value