Simplicity, Induction, and Scientific Discovery. School of Computing Science, Simon Fraser University.

Slide 1: Simplicity, Induction, and Scientific Discovery
School of Computing Science, Simon Fraser University, Vancouver, Canada

Slide 2: Outline
- Mind Change Minimization and Simplicity.
- Introductory examples: existence inquiry; the Riddle of Induction.
- Necessary and sufficient condition for mind change bounds: connection with point-set topology.
- Advanced examples: learning conservation laws in particle physics; constraint-based learning of Bayes nets.

Slide 3: Learning With Mind Change Bounds

Slide 4: Simplicity and Steady Progress
Both simple and complex routes may get us to the same place, but favoring simplicity gets us there more efficiently.
Kelly, K.T. "How Simplicity Helps You Find the Truth Without Pointing at It." In Induction, Algorithmic Learning Theory, and Philosophy, Springer, 2007.

Slide 5: Learning and Steady Convergence
Minimizing mind changes can be seen as an objective function for learning, like a loss function.
- Standard loss function: loss(hypothesis, true model).
- Mind changes: loss(sequence of hypotheses).
[Figure: output hypothesis plotted against learning time; a trajectory with few mind changes is good, one with many is worse.]

Slide 6: Mind Change Bound Example
Is a certain reaction possible? E.g. r = n + n -> p + p + e- + e-.
Rules:
- Learner makes the conjecture "yes" or "no".
- Adversary shows experimental outcomes (r observed or not).
- Learner pays for abandoning "yes" or "no".
[Diagram: possible data sequences alternating between "r observed" and "no r".]

Slide 7: The New Riddle of Induction
Goodman (1983): "grue" applies to all things examined before t just in case they are green, but to other things just in case they are blue.
Rules:
- Learner projects a generalization (e.g. "all green").
- Adversary chooses the color of the next emerald.
- Learner pays for mistaken predictions.
[Diagram: candidate hypotheses all green, all blue, all grue_1, all grue_2, ...]

Slide 8: Description Length, Green and Grue
One of the reasons philosophers are interested in the Riddle of Induction is that it illustrates how descriptive simplicity can depend on the choice of vocabulary.

Basic predicates: green, blue
  Hypothesis  | Definition
  all green   | green
  all grue_t  | green up to time t, blue thereafter

Basic predicates: grue_t, bleen_t
  Hypothesis  | Definition
  all green   | grue_t up to time t, bleen_t thereafter
  all grue_t  | grue_t

Here grue_t = green up to time t, blue thereafter; bleen_t = blue up to time t, green thereafter. Relative to the green/blue vocabulary, "all green" has the shorter definition; relative to the grue/bleen vocabulary, "all grue_t" does.

Slide 9: Topology and Mind Change Bounds

Slide 10: Convergence in the Limit
A learning problem consists of:
- a hypothesis space H;
- a space D of possible complete data sequences;
- a correctness notion that specifies which hypothesis H is correct for which data sequence D.
A learner outputs a hypothesis on every finite (partial) data sequence; it may also output "?" for "no conclusion yet".
A learner converges to a correct hypothesis in the limit if, on every data sequence D in D, after some finite time the learner's conjecture is always correct for D.
[Figure: hypothesis space (all green, all grue_1, all grue_2, ...) and output hypothesis over learning time.]
Putnam, H. "Degree of Confirmation and Inductive Logic." In The Philosophy of Rudolf Carnap, 1963.
Gold, E. "Language Identification in the Limit." Information and Control, 10, 1967.
Kelly, K. The Logic of Reliable Inquiry. Oxford: Oxford University Press, 1996.
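
A minimal Python sketch (not from the slides) of what limit convergence with a mind change bound looks like for the Riddle of Induction: the learner below conjectures "all green" until a blue emerald appears and then switches once to the matching grue hypothesis. The data encoding, the function names, and the exact indexing of grue_t are illustrative assumptions.

def learner(observations):
    """Conjecture 'all green' until a blue emerald is observed, then switch
    once to the matching grue hypothesis and never change again.
    (Indexing assumption: grue_t = green up to time t, blue thereafter.)"""
    for time, colour in enumerate(observations, start=1):
        if colour == "blue":
            return "all grue_{}".format(time - 1)
    return "all green"

def count_mind_changes(data_sequence):
    """Replay the learner on every finite prefix and count conjecture changes."""
    changes, previous = 0, None
    for m in range(1, len(data_sequence) + 1):
        conjecture = learner(data_sequence[:m])
        if previous is not None and conjecture != previous:
            changes += 1
        previous = conjecture
    return changes

print(count_mind_changes(["green"] * 5))                       # 0 mind changes
print(count_mind_changes(["green", "green", "blue", "blue"]))  # 1 mind change

On any data sequence generated by one of the hypotheses (all green or some all grue_t), this learner converges to the correct hypothesis and changes its mind at most once, matching the kind of bound defined on the next slide.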
Slide 11: Mind Change Bounds
A learner makes a mind change on a data sequence at time m+1 if its conjecture at time m is in H and is different from the conjecture at time m+1.
A learning problem (H, D) is solvable with at most k mind changes if there is a learner that converges to a correct hypothesis and changes its mind at most k times before convergence.
[Diagram: mind-change bounded learners are a subset of convergent (reliable) learners, which are a subset of all learners.]
Putnam, H. "Trial and Error Predicates and the Solution to a Problem of Mostowski." The Journal of Symbolic Logic, 30(1): 49-57, 1965.

Slide 12: Topology on a Hypothesis Space
A hypothesis H is an isolated point in hypothesis space H if there is a finite data sequence such that H is the only hypothesis in H consistent with the data. Write H' for the set of isolated points of H.
Successively eliminate isolated points:
1. H_0 = H - H'.
2. H_{i+1} = H_i - H_i'.
The accumulation order of H is the least index i such that H_{i+1} = H_i.
Cantor, G. Grundlagen einer allgemeinen Mannigfaltigkeitslehre, 1883.
Apsitis, K. "Derived Sets and Inductive Inference." ALT 1994.
Luo, W. and Schulte, O. "Mind Change Efficient Learning." COLT 2005.

Slide 13: Examples
[Diagrams: the existence problem, with data sequences alternating "r observed" / "no r", and the Riddle of Induction hypothesis space with all grue_1, all grue_2, ..., all green, all blue.]

Slide 14: Topology and Mind-Change Bounds
Theorem (Luo and Schulte 2005). A learning problem (H, D) is solvable with k mind changes if and only if:
1. the accumulation order of H is at most k, and
2. H_k is the empty set.

Slide 15: Topology and Inductive Simplicity
The simplicity rank of a hypothesis H is the last stage at which H is eliminated. Greater simplicity rank means greater inductive simplicity.
[Diagram: all grue_1, all grue_2, ... have rank 0; all green has rank 1.]

Slide 16: Mind-Change Optimality
Suppose we add convergence-time admissibility (Gold 1967) to mind-change optimality. Then there is a unique mind-change optimal learner (Schulte, Luo, Greiner 2007, 2010):
- Is there a unique simplest hypothesis consistent with the data?
- If yes, output the simplest hypothesis; if no, output "?" (no conclusion).
[Diagram: convergent + mind-change bounded + time-admissible learners.]
Schulte, O., Luo, W., and Greiner, R. "Mind-change optimal learning of Bayes net structure from dependency and independency data." Information and Computation, 208: 63-82, 2010.

Slide 17: Related Work and Extensions
- Mind change bounds are related to mistake bounds in statistical learning theory (Jain and Sharma 1999).
- Mind-change solvability requires logical (in)consistency with the data; this is relaxed by Kelly, including for statistical applications (Kelly and Mayo-Wilson 2010).
- Simplicity rank is a kind of degree of falsifiability.
Jain, S. and Sharma, A. "On a Generalized Notion of Mistake Bounds." COLT 1999.
Kelly, K.T. and Mayo-Wilson, C. "Causal Conclusions that Flip Repeatedly and Their Justification." UAI 2010.

Slide 18: Learning Conservation Laws in Particle Physics

Slide 19: Example: Particle Physics
Reactions and quantities are represented as vectors (Aris 1969; Valdés-Pérez 1994, 1996).
For i = 1, ..., n entities: r(i) = # of entity i among reagents minus # of entity i among products.
A quantity is conserved in a reaction if and only if the corresponding vectors are orthogonal.
A reaction is possible iff it conserves all quantities.
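
A small Python/numpy sketch of the vector representation on Slide 19; the four-particle list, the example reaction, and the charge values are toy assumptions, not the slides' dataset.

import numpy as np

particles = ["n", "p", "e-", "anti-nu_e"]   # assumed toy list of detectable particles

def reaction_vector(reagents, products):
    """r(i) = count of particle i among reagents minus count among products."""
    r = np.zeros(len(particles), dtype=int)
    for side, sign in ((reagents, +1), (products, -1)):
        for particle in side:
            r[particles.index(particle)] += sign
    return r

# Beta decay: n -> p + e- + anti-nu_e
r = reaction_vector(["n"], ["p", "e-", "anti-nu_e"])

# Example quantity vector: electric charge of each particle.
charge = np.array([0, +1, -1, 0])

# A quantity is conserved in the reaction iff its vector is orthogonal to r.
print(np.dot(r, charge) == 0)   # True: electric charge is conserved here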
Slide 20: Conserved Quantities in the Standard Model
The Standard Model is based on Gell-Mann's quark model (1964). Full set of particles: n = 193.
Each conserved quantity corresponds to a particle family (cluster).

Slide 21: The Learning Task (Toy Example)
Given:
1. a fixed list of known detectable particles;
2. input reactions.
Learning maps the observed reactions (reaction matrix R) to an output quantity matrix Q. The columns of Q are conserved, so RQ = 0.
Not given:
1. the number of quantities;
2. the interpretation of the quantities.

Slide 22: Inductive Simplicity of Conservation Laws
Simplicity rank of a set of conserved quantities = number of independent quantities (in the sense of linear algebra).
Therefore the mind-change optimal method chooses a maximum-rank conservation matrix consistent with the data (Schulte 2001). It can be computed as a basis for the nullspace of the observed reaction matrix.
This is the least generalization: it rules out as many unobserved reactions as possible.
[Diagram: observed reactions R; the linear span of R is the smallest generalization of the observed reactions; larger generalizations admit more unobserved allowed reactions.]
Schulte, O. "Inferring Conservation Principles in Particle Physics: A Case Study in the Problem of Induction." The British Journal for the Philosophy of Science, 51: 771-806, 2001.

Slide 23: System for Finding a Maximally Strict Set of Selection Rules
1. Read in observed reactions (from a database, using a conversion utility).
2. Convert them to a list of vectors, the reaction matrix R.
3. Compute a basis Q for the nullspace of R (Maple function nullspace).

Slide 24: Comparison with the Standard Model
Dataset: the complete set of 193 particles (antiparticles listed separately); see Excel.
Included the most probable decay for each unstable particle (182 reactions), plus some others from textbooks, for a total of 205 reactions. See Demo.
Matches the Standard Model!

Slide 25: Extensions
- Mind-change optimal learning for simultaneous discovery of conservation laws and hidden particles (neutrinos), Schulte 2009.
- The same algorithm can be used to find the molecular structure of chemical substances (e.g., water is H2O), Schulte and Drew 2010.
Schulte, O. "Simultaneous Discovery of Conservation Laws and Hidden Particles With Smith Matrix Decomposition." IJCAI 2009.
Schulte, O. and Drew, M.S. "Learning Conservation Laws Via Matrix Search." Discovery Science 2010.

Slide 26: Learning Simple Conservation Laws in Particle Physics

Slide 27: Empirically Equivalent Conservation Matrices
There are many bases for the nullspace of an observed reaction set. All are empirically equivalent: consistent with exactly the same reactions (non-identifiability), and all have the same inductive simplicity rank. How to choose?
1. Minimize description length / maximize parsimony.
2. Choose conservation matrices with a simpler ontology.

Slide 28: Description Length / Parsimony
The L1-norm |M| of a matrix M is the sum of the absolute values of its entries. Prefer conservation matrices with a smaller L1-norm (Valdés-Pérez and Erdmann 1994).
Valdés-Pérez, R. and Erdmann, M. "Systematic Induction and Parsimony of Phenomenological Conservation Laws." Computer Physics Communications, 83, 1994.
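
A minimal sympy sketch of the computation behind Slides 21-28: build a toy observed-reaction matrix R, take a nullspace basis as the maximum-rank conservation matrix Q, and measure its L1-norm. The two reactions, the six-particle ordering, and the use of sympy (in place of the Maple nullspace function mentioned on Slide 23) are assumptions for illustration.

import sympy as sp

# Toy reaction matrix: rows = observed reactions, columns = particles
# in the assumed order n, p, e-, anti-nu_e, mu-, nu_mu.
R = sp.Matrix([
    [1, -1, -1, -1, 0,  0],   # n   -> p + e- + anti-nu_e
    [0,  0, -1, -1, 1, -1],   # mu- -> e- + anti-nu_e + nu_mu
])

# Maximum-rank conservation matrix consistent with the data:
# its columns are a basis for the nullspace of R.
Q = sp.Matrix.hstack(*R.nullspace())

assert R * Q == sp.zeros(R.rows, Q.cols)   # every column of Q is conserved (RQ = 0)

# Parsimony criterion from Slide 28: L1-norm = sum of absolute entries.
l1_norm = sum(abs(entry) for entry in Q)
print(Q.shape, l1_norm)   # four conserved quantities for this toy data

Any other nullspace basis is empirically equivalent (Slide 27); the L1-norm and the ontological criterion of the following slides are what single out one of them.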
Slide 29: Ontological Simplicity
Recall that conserved quantities define groups or families of particles. Prefer quantities that induce the smallest number of disjoint families.
The fewer kinds of things a theory introduces, the ontologically simpler it is (homogeneity). The less overlap between kinds, the greater the ontological simplicity.
[Diagram: ontological simplicity as a function of the number of kinds.]

Slide 30: Parsimony Meets Ontology
Theorem (Schulte 2008). Let R be a reaction data matrix. If there is a nullspace basis conservation matrix Q with disjoint entity clusters, then:
1. the clusters (families) are uniquely determined;
2. there is a unique nullspace basis Q* that minimizes the L1-norm (up to sign).
[Table: example particles grouped into the families carrying Baryon#, Electron#, Muon#, Tau# (Quantities #1 through #4); any alternative set of four quantities with disjoint carriers determines the same families.]
Schulte, O. "The Co-Discovery of Conservation Laws and Particle Families." Studies in History and Philosophy of Modern Physics, 2008.

Slide 31: Implementation
The theorem implies that minimizing the L1-norm will discover the unique set of particle families determined by the data.
Minimization problem: minimize the L1-norm |Q|, subject to the nonlinear constraint that the columns of Q are a basis for the nullspace of R. Algorithm by Schulte and Drew (2010).
If electric charge is fixed as input, the method recovers exactly the laws of the Standard Model!
Schulte, O. and Drew, M.S. "Learning Conservation Laws Via Matrix Search." Discovery Science 2010.

Slide 32: Big Picture: Simplicity in Learning Conservation Laws
In the particle physics problem:
1. Maximize topological/inductive simplicity first.
2. Maximize ontological simplicity and parsimony to break ties.
[Diagram: facets of simplicity: mind changes/topology, ontology, parsimony/description length.]

Slide 33: Another Application: Learning Bayes Nets
Learn Bayes nets from observed correlations (constraint-based). The simplicity rank of a Bayes net G is the number of edges not in G.
Is there a unique minimum-edge graph for a given set of observed correlations? Deciding this is NP-hard.
[Diagram: two Bayes nets over Measles, Allergy, Spots; the one with fewer edges is simpler, the other more complex.]
Schulte, O., Luo, W., and Greiner, R. "Mind Change Optimal Learning of Bayes Net Structure." COLT 2007.

Slide 34: Summary: Theory
- Mind-change optimal learning: converge to a correct hypothesis with a minimum number of theory changes.
- Mind-change complexity is characterized by the topological concept of accumulation order, which also yields a topological concept of simplicity rank for a hypothesis.
- There is a mind-change optimal method that conjectures the uniquely topologically simplest hypothesis if there is one, and otherwise outputs "?" for no conclusion.
- Topological simplicity does not indicate truth, but maximizing it leads to efficient convergence.

Slide 35: Summary: Examples
Examples of the mind-change optimal method:
- Existence problem: conjecture "reaction not possible" until it is observed.
- Riddle of Induction: conjecture "all emeralds are green" until a blue one is observed.
- Learning conservation laws: conjecture the maximum-rank conservation matrix consistent with the data; this matches the predictions of the particle physics Standard Model.
- Refining the selection with the L1-norm matches the quantities in the Standard Model exactly and recovers the particle families (ontology).

Slide 36: The End
Thank you!