learning to learn programs from examples: going ?· learning to learn programs from examples:...

Download Learning to Learn Programs from Examples: Going ?· Learning to Learn Programs from Examples: Going…

Post on 12-Sep-2018




0 download

Embed Size (px)


  • Learning to Learn Programs from Examples: Going Beyond Program Structure

    Content areas: program synthesis, machine learning, programming by examples

    AbstractProgramming-by-example technologies let endusers construct and run new programs by providingexamples of the intended program behavior. But,the few provided examples seldom uniquely deter-mine the intended program. Previous approachesto picking a program used a bias toward shorter ormore naturally structured programs. Our work heregives a machine learning approach for learning tolearn programs that departs from previous work byrelying upon features that are independent of theprogram structure, instead relying upon a learnedbias over program behaviors, and more gener-ally over program execution traces. Our approachleverages abundant unlabeled data for semisuper-vised learning, and incorporates simple kinds ofworld knowledge for common-sense reasoning dur-ing program induction. These techniques are eval-uated in two programming-by-example domains,improving the accuracy of program learners.

    1 IntroductionBillions of people own computers, yet vanishingly few knowhow to program. Imagine an end user wishing to extract theyears from a table of data, like in Table 1. What would be atrivial regular expression for a coder is impossible for the vastmajority of computer users. But anyone can show a com-puter what to do by giving examples an observation thathas motivated a long line of work on the problem of pro-gramming by examples (PBE), a paradigm where end usersgive examples of intended behavior and the system respondsby inducing and running a program [Lieberman, 2001]. Acore problem in PBE is determining which single programthe user intended within the vast space of all programs con-sistent with the examples. Users would like to provide onlyone or a few examples, leaving the intended behavior highlyambiguous. Consider a user who provides just the first in-put/output example in Table 1. Did they mean to extract thefirst number of the input? The last number? The first numberafter a comma? Or did they intend to just produce 1993for each input? In real-world scenarios we could encounteron the order of 10100 distinct programs consistent with theexamples. Getting the right program from fewer examples

    Input table Desired output tableMissing page numbers, 1993 199364-67, 1995 19951992 (1-27) 1992

    Table 1: An everyday computer task trivial for programmersbut inaccessible for nonprogrammers: given the input table ofstrings, automatically extract the year to produce the desiredoutput table on the right.

    means less effort for users and more adoption of PBE tech-nology. This concern is practical: Microsoft refused to shipthe recent PBE system Flash Fill [Gulwani, 2011] until com-mon scenarios were learned from only one example.

    We develop a new inductive bias for resolving the ambi-guity that is inherent when learning programs from few ex-amples. Prior inductive biases in PBE use features of theprograms syntactic structure, picking either the smallest pro-gram consistent with the examples, or the one that looks themost natural according to some learned criterion [Liang etal., 2010; Menon et al., 2013; Singh and Gulwani, 2015;Lin et al., 2014]. In contrast, we look at the outputs and ex-ecution traces of a program, which we will show can some-times predict program correctness even better than if we couldexamine the program itself. Intuitively, we ask, what do typ-ically intended programs compute? rather than what dotypically intended programs look like? Returning to Table 1,we prefer the program extracting years because its outputslook like an intended behavior, even though extracting thefirst number is a shorter program.

    We apply our technique in two different PBE domains: astring transformation domain, which enriches Flash Fill-styleproblems (eg Table 1) with semantic transformations, likethe ability to parse and transform times and dates [Singh andGulwani, 2012a] and numbers [Singh and Gulwani, 2012b];and a text extraction domain, where the goal is to learn aprogram that extracts structured tables out of a log file [Leand Gulwani, 2014]. Flash Fill, now a part of MicrosoftExcel, motivated a series of other PBE systems, which co-alesced into a software library called PROSE [Polozov andGulwani, 2015]. In PROSE, one provides the hypothesisspace (programming language) and gets a PBE tool for free.But PROSE does not solve the ambiguity problem, instead

  • using a hand-engineered inductive bias over programs. Ourwork integrates into PROSE and provides a better inductivebias. Although we worked with existing PROSE implemen-tations of the string transformation and text extraction do-mains, the broad approach is domain-agnostic. We take asa goal to improve PROSEs inductive bias, and use the phrasePROSE to refer to the current PROSE implementations ofthese domains, in contrast to our augmented system.

    1.1 Our contribution: picking the correct programWe develop two new contributions to PBE technology:

    Predictive featuresPredicting program correctness based on its syntactic struc-ture is perhaps the oldest and most successful idea in pro-gram induction [Solomonoff, 1964]. This general family ofapproaches use what we call program features to bias thelearner. But the correctness of a program goes beyond its ap-pearance. We develop two new classes of features that areinvariant to program structure:Output features. Some sets of outputs are apriori more likelyto be produced from valid programs. In PBE scenarios theuser typically labels few inputs by providing outputs but hasmany unlabeled inputs; the candidate outputs on the unla-beled inputs give a semisupervised learning signal that lever-ages the typically larger set of unlabeled data. See Table 2and 3. In Table 2, the system considers programs that eitherappend a bracket (a simple program) or ensure correct brack-eting (a complex program). PROSE opts for the simple pro-gram, but our system notices that program predicts an outputtoo dissimilar from the labeled example. Instead we preferthe program without this outlier in its outputs.Execution trace features. Going beyond the final outputsof a candidate program, we show how to consider the entireexecution trace. Our model learns a bias over sequences ofcomputations, which allows us to disprefer seemingly naturalprograms with pathological behavior on the provided inputs.

    Input Output (PROSE) Output (ours)[CPT-00350 [CPT-00350] [CPT-00350][CPT-00340 [CPT-00340] [CPT-00340][CPT-115] [CPT-115]] [CPT-115]

    Table 2: Learning a program from one example (top row) andapplying it to other inputs (bottom rows, outputs italicized).Our semisupervised approach let us get the last row correct.

    Machine learning frameworkWe develop a framework for learning to learn programs froma corpus of problems that improves on prior work as follows:Weak supervision. We require no explicitly providedground-truth programs, in contrast with eg [Menon et al.,2013; Liang et al., 2010]. This helps automate the engineer-ing of PBE systems because the engineer need not manuallyannotate solutions to potentially hundreds of problems.The modeling paradigm. We introduce a discriminativeprobabilistic model, in contrast with [Liang et al., 2010;Menon et al., 2013; Singh and Gulwani, 2015]. A discrim-inative approach leads to higher predictive accuracy, while a

    Input Output Output(PROSE) (ours)

    Brenda Everroad Brenda BrendaDr. Catherine Ramsey Catherine CatherineJudith K. Smith Judith K. JudithCheryl J. Adams and Cheryl J. Adams Cheryl

    Binnie Phillips and Binnie

    Table 3: Learning a program from one example (top row)and applying it to other inputs (bottom rows, outputs itali-cized). Our semisupervised approach uses simple commonsense reasoning, knowing about names, places, words, dates,etc, letting us get the last two rows correct.

    probabilistic framing lets us learn with simple and tractablegradient-guided search.

    1.2 NotationWe consider PBE problems where the program, written p, isdrawn from a domain specific language (DSL), written L. Wehave one DSL for string transformation and a different DSLfor text extraction. DSLs are described using a grammar thatconstrains the ways in which program components may becombined. We learn a p L consistent with L labeled in-put/output examples, with inputs {xi}Li=1 (collectively XL)and user labeled outputs {yi}Li=1 (collectively YL). We writep(x) for the output of p on input x, so consistency with thelabeled examples means that yi = p(xi) for 1 i L.We write N for the total number of inputs on which the userintends to run the program, so that means N L. All ofthese inputs are written {xi}Ni=1 (collectively X). When aprogram p L is clear from context, we write {yi}Ni=1 (col-lectively Y ) for the outputs p predicts on the inputs X . Wewrite {yi}Ni=L+1 for the predictions of p on the unlabeled in-puts (collectively YU ).

    For each DSL L, we assume a simple hand-crafted scoringfunction that assigns higher scores to shorter or simpler pro-grams. PROSE can enumerate the top K programs under thisscoring function, where K is large but manageable (for K upto 104). We call these top K programs the frontier, and wewrite FK(X,YL) to mean the frontier of size K for inputs Xand labeled outputs YL (so this means that if p FK(X,YL)then p(xiL) = yiL). The existing PROSE approach is topredict the single program in F1(X,YL). We write () tomean some kind of feature extractor, and use the variable tomean weights placed on those features.

    For ease of exposition we will draw our examples fromstring transformation, where the goal is to learn a programthat takes as input a vector of strings (so xi is a vector ofstrings) and produces as output a string (so yi is a string).

    2 Extracting predictive features2.1 Features of program structureA common intuition in t


View more >