TRANSCRIPT
An Analysis of Machine- and Human-Analytics in Classification
Authors:
1. Gary K. L. Tam (Swansea University)
2. Vivek Kothari (University of Oxford)
3. Min Chen (University of Oxford)
Presented by:
Subhashis Hazarika
(Ohio State University)
Major Contribution
• An information-theoretic model that explains why a human-driven visual analytics approach to classification performs better than a purely machine-learning approach.
Overview
• Consider two classification case studies.
• Create a decision tree classifier using standard ML algorithms.
• Create a decision tree classifier using visual analytics guided by the “soft knowledge” of a human model developer.[1]
• Use information theory to explain why the human-centric approach performs better than the ML approach.
• Quantify the “soft knowledge” that the human-centric approach takes advantage of.
[1]: G. K. L. Tam, H. Fang, A. J. Aubrey, P. W. Grant, D. Marshall, M. Chen. “Visualization of Time-Series Data in Parameter Space for Understanding Facial Dynamics”. EuroVis 2011.
Case Study A (Facial Dynamics Data)
• Input and feature extraction:
– 68 raw facial videos, each classified as one of four expressions (smile, sadness, surprise, anger).
– For each video, 14 time series representing different temporal facial features were extracted.
– For each time series, 23 quantitative measures were obtained.
– This results in 14 × 23 = 322 attributes/features per video.
• Create a decision tree using a parallel-coordinates-based visual analytics system.
• Create a decision tree with standard ML algorithms (C4.5 or CART).
Case Study B (Visualization Image Classification)
• Input and feature extraction:
– 4 × 49 JPEG images, classified into four chart types (bubble chart, treemap, parallel coordinates, bar graph).
– For each image, 222 features were extracted using various image-classification and clustering techniques.
• Create a decision tree using a parallel-coordinates-based visual analytics system.
• Create a decision tree with standard ML algorithms (C4.5 or CART).
The Team
• Case Study A:
– Conducted by 7 researchers with expertise in vision, visual analytics, computer graphics, and machine learning.
– The human-centric decision tree was constructed by a researcher who specialized in graphics and acquired knowledge of computer vision and visual analytics during the project.
• Case Study B:
– Conducted by 2 researchers with expertise in image processing and visual analytics.
– The human-centric decision tree was constructed by a researcher with 8 months of experience in visual analytics.
But Why? Some Empirical Observations
• O1: Overview and Axis Distribution.
– A machine-centric approach examines many cut positions on all the axes and greedily picks the cut with the highest quality measure.
– A human model developer usually first obtains a general overview of the data and identifies important axes with promising patterns before paying detailed attention to those axes.
• O2: General Agreement among Statistics.
– ML algorithms use only one metric to determine the cut.
– The human-centric approach can evaluate several statistics to decide the cut.
• O3: Look-ahead.
– Humans’ insight into the consequences of a cut often influences the current decision.
– Humans’ look-ahead ability enables multi-step judgement, while ML algorithms focus only on the current decision.
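Observations O1 and O2 can be made concrete with a minimal sketch of the machine-centric procedure: a greedy, single-metric search that scans every candidate cut on every axis and keeps the one with the highest information gain. This is an illustrative reimplementation, not the C4.5/CART code used in the studies, and the data in the usage note is made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_cut(points, labels):
    """Greedy, single-metric split: scan every candidate cut position on
    every axis and return the (axis, threshold, gain) triple with the
    maximal information gain."""
    n_axes = len(points[0])
    base = entropy(labels)
    best = (None, None, -1.0)
    for axis in range(n_axes):
        values = sorted({p[axis] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [y for p, y in zip(points, labels) if p[axis] <= t]
            right = [y for p, y in zip(points, labels) if p[axis] > t]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(labels)
            if gain > best[2]:
                best = (axis, t, gain)
    return best
```

For example, with points [(0, 5), (1, 6), (8, 5.5), (9, 6.5)] labelled ["smile", "smile", "anger", "anger"], the best cut is on axis 0 at 4.5 with gain 1.0; the algorithm never steps back to reconsider it, which is exactly the single-step behaviour O3 contrasts with human look-ahead.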
But Why? Some Empirical Observations
• O4: Outliers.
– If possible, model developers avoid axes with outliers, as outliers may be unreliable.
– Such reasoning is not available to the ML algorithms.
• O5: Cut Positions on an Axis.
– Humans look for a cut (or cuts) that would allow each class to expand beyond the current instances in the training set.
– ML algorithms place cuts at the very edges of a particular class.
• O6: Human (Domain) Knowledge.
– Humans incorporate their domain knowledge into the model-construction process.
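The contrast in O5 can be illustrated with a toy one-axis sketch. The helper name and data are hypothetical, and the sketch assumes every class-A value lies below every class-B value on the axis.

```python
def cuts_on_axis(class_a, class_b):
    """Return (edge_cut, margin_cut) for two classes on one axis.

    edge_cut mimics the ML behaviour noted in O5: the threshold hugs
    the last training instance of class A. margin_cut mimics the human
    choice: place the cut midway, so each class can expand beyond the
    instances seen in training."""
    a_max, b_min = max(class_a), min(class_b)
    return a_max, (a_max + b_min) / 2

# An unseen class-A variant at 2.0 is misclassified by the edge cut
# (2.0 > 1.4) but handled correctly by the margin cut (2.0 <= 2.2).
edge, margin = cuts_on_axis([0.2, 0.9, 1.4], [3.0, 3.6])
```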
Information Theoretic Analysis
• Estimated world population: 7.4 billion.
• Assume each person has 5 variations of each of the 4 expressions.
• The number of possible scenarios to capture: 148 billion.
• The maximal entropy is 37.1 bits.
• We know only 68 cases (the raw training videos).
• That is 1.7 × 10⁻⁸ bits (a drop in the ocean).
• [ML] Optimistically, assume the categorization retains 50% of the mutual information. That leaves 8.5 × 10⁻⁹ bits of information.
Information Theoretic Analysis
• [VA] The model developer may know some 200 people reasonably well and can recall their 5 variations of the 4 expressions with ease. Conservatively, that is equivalent to 4068 videos instead of 68, representing 1.0 × 10⁻⁶ bits of known information.
• [VA] Given an arbitrary facial image, the developer can also reconstruct an expression using imagination, e.g., at least 1 variation per expression. This ability amounts to 29.6 billion videos, representing 7.4 bits of known information. (It shows up in determining outliers.)
• 7.4 bits vs. 8.5 × 10⁻⁹ bits: roughly 871 million times more information content.
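The slide’s arithmetic can be reproduced directly. The figures follow the talk’s own back-of-the-envelope model, where known information is pro-rated against the maximal entropy; it is an estimate, not a rigorous derivation.

```python
import math

POPULATION = 7.4e9   # estimated world population
VARIATIONS = 5       # variations per expression per person
EXPRESSIONS = 4      # smile, sadness, surprise, anger

scenarios = POPULATION * VARIATIONS * EXPRESSIONS   # 148 billion
max_entropy = math.log2(scenarios)                  # ~37.1 bits

def known_bits(n_videos):
    # Information carried by n_videos, pro-rated against the maximal entropy.
    return max_entropy * n_videos / scenarios

ml_bits = 0.5 * known_bits(68)  # ~8.5e-9 bits after 50% mutual-information loss
va_recall = known_bits(68 + 200 * VARIATIONS * EXPRESSIONS)  # 4068 videos -> ~1.0e-6 bits
va_imagine = known_bits(POPULATION * 1 * EXPRESSIONS)        # 29.6e9 videos -> ~7.4 bits
ratio = va_imagine / ml_bits    # ~871 million
```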
Soft Knowledge and Soft Models
• Soft Knowledge: Information known to the human model developer but not captured in the data, and hence unavailable to the machine-centric approach.
• Soft Model: A model that makes decisions based on soft knowledge.
• Examples:
1. Given a facial photo (input), imagine how the person would smile (output).
2. Given a video (input), determine whether it is an outlier (output).
3. Given a set of points on an axis (input), decide how many cuts to make and where (output).
Conclusion
• There is an overwhelming amount of information available to the human-centric approach, in the form of soft knowledge, that cannot be utilized by a machine-centric approach.
• It is necessary to understand and quantify the information flow in both machine- and human-centric approaches, to help design a mixed model that performs much better.
• The human model developer can never be cast aside.