Schema-based Program Synthesis and the AutoBayes System
Part II
Johann Schumann
SGT, NASA Ames
Example
• Generate a program that finds the maximum value of a function f(x): max f(x) wrt x
[Figures: a univariate and a multivariate example]
Note: the function might be given as a formula or a vector of data
Schemas for univariate optimization
    schema(max F wrt X, C) :- ... as before

    schema(max F wrt X, C) :-
        length(X, 1),
        % F is a vector of data points F(0..n)
        C = let(sequence([
                  assign(mymax, 0),
                  for(idx(I,0,n),
                      if(select(F,I) > mymax,
                         assign(mymax, select(F,I)),
                         skip)...
                ]),
                comment(['The maximum is found by iterating...']),
                mymax).

    schema(max F wrt X, C) :-
        length(X, 1),
        % instantiate numeric solution algorithm
        % e.g., golden section search
        C = ...

    schema(max F wrt X, C) :- ...
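The two non-symbolic cases above (a scan over a data vector, and a numeric search such as golden section) can be sketched in C, the language of the generated code shown later in the deck. This is a hand-written illustration, not AutoBayes output: the function names and the test function `example_parabola` are hypothetical. Unlike the schema above, the scan starts from the first data point rather than 0, so it also works when all values are negative.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Data case: F is a vector f[0..n-1]; returns its largest entry. */
double max_of_data(const double *f, size_t n)
{
    assert(n > 0);
    double mymax = f[0];            /* start from the first point, not from 0 */
    for (size_t i = 1; i < n; i++) {
        if (f[i] > mymax)
            mymax = f[i];
    }
    return mymax;
}

/* Formula case: golden section search for the maximizer of a
 * function f that is unimodal on [a, b]. */
double golden_section_max(double (*f)(double), double a, double b, double tol)
{
    assert(a < b && tol > 0.0);
    const double invphi = (sqrt(5.0) - 1.0) / 2.0;   /* 1/phi, about 0.618 */
    double c = b - invphi * (b - a);
    double d = a + invphi * (b - a);
    double fc = f(c), fd = f(d);
    while (b - a > tol) {
        if (fc > fd) {              /* maximum lies in [a, d] */
            b = d; d = c; fd = fc;
            c = b - invphi * (b - a);
            fc = f(c);
        } else {                    /* maximum lies in [c, b] */
            a = c; c = d; fc = fd;
            d = a + invphi * (b - a);
            fd = f(d);
        }
    }
    return 0.5 * (a + b);
}

/* Hypothetical test function with its maximum at x = 2. */
double example_parabola(double x)
{
    return -(x - 2.0) * (x - 2.0) + 3.0;
}
```

Calling `golden_section_max(example_parabola, 0.0, 5.0, 1e-6)` returns a value close to 2. Each iteration reuses one of the two interior function values, which is the reason the golden ratio is used for the split.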
Schema for univariate optimization
1. build the derivative: df/dx
2. set it to 0: 0 = df/dx
3. solve that equation for x
4. the solution is the desired maximum
    schema(max F wrt X, C) :-     % INPUT (Problem), OUTPUT (Code fragment)
        % guards
        length(X, 1),
        % calculate the first derivative
        simplify(deriv(F, X), DF),
        % solve the equation
        solve(true, x, 0 = DF, S),
        % possibly more checks
        % is that really a maximum?
        simplify(deriv(DF, X), DDF),
        (  solve(true, x, 0 > DDF, _)
        -> true
        ;  writeln('Proof obligation not solved automatically')
        ),
        XP = ['The maximum for', expr(F), 'is calculated ...'],
        V = pv_fresh,
        % assign the symbolic solution S to the fresh program variable V
        C = let(assign(V, S, [comment(XP)]), V).
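As a worked instance of the four steps, take the objective f(x) = -x^2 + 4x. This function is an assumption chosen for illustration and does not appear in the slides. The last line corresponds to the proof obligation 0 > DDF in the schema.

```latex
\begin{align*}
f(x) &= -x^2 + 4x \\
\frac{df}{dx} &= -2x + 4 \\
0 &= -2x + 4 \;\Rightarrow\; x = 2 \\
\frac{d^2 f}{dx^2} &= -2 < 0
  \quad\text{so } x = 2 \text{ is a maximum, with } f(2) = 4
\end{align*}
```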
Demo
• Generation of multiple programs
  – -maxprog
  – -maxprog N -fastest (coarse approximation)
• Control for numeric solvers
  – pragma schema_control_arbitrary_init_values
  – pragma schema_control_use_generic_optimize
• Tracing pragmas
• The necessity of constraints
Multivariate Optimization
• Task: minimize function F(X) wrt X
• Algorithm (C-like pseudocode):
    double* minimize(F) {
        double* x0 = pick_start();
        int converging = 1;
        while (converging) {
            double step_length = 0.1;
            double step_dir = -gradient(F, x0);
            x1 = x0 + step_length * step_dir;
            if (fabs(F(x1) - F(x0)) < 0.001)
                converging = 0;
            else
                x0 = x1;
        }
        return x0;
    }
• start somewhere
• go down along the steepest slope
• when you come to a flat area, return that (local) minimum
• Many design decisions
  • where to start?
  • how to move?
  • when to stop?
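The algorithm on this slide can be made concrete for a fixed dimension. The sketch below is one possible set of answers to the three design questions (start point supplied by the caller, fixed step length along the negative gradient, stop when the objective value barely changes). The objective function, its gradient, and all names are hypothetical and chosen only so the example is self-contained.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define DIM 2

/* Hypothetical objective: F(x) = (x0 - 1)^2 + 2*(x1 + 3)^2, minimum at (1, -3). */
double objective(const double x[DIM])
{
    return (x[0] - 1.0) * (x[0] - 1.0) + 2.0 * (x[1] + 3.0) * (x[1] + 3.0);
}

/* Analytic gradient of the objective above. */
void gradient(const double x[DIM], double g[DIM])
{
    g[0] = 2.0 * (x[0] - 1.0);
    g[1] = 4.0 * (x[1] + 3.0);
}

/* Steepest descent with a fixed step length.
 * On entry x holds the start point, on exit the last iterate.
 * Returns the number of iterations used, or -1 if max_iter was reached
 * without the change in F dropping below tol. */
int minimize(double x[DIM], double step_length, double tol, int max_iter)
{
    assert(step_length > 0.0 && tol > 0.0 && max_iter > 0);
    double g[DIM], x1[DIM];
    for (int it = 0; it < max_iter; it++) {
        gradient(x, g);
        for (size_t i = 0; i < DIM; i++)
            x1[i] = x[i] - step_length * g[i];     /* move against the gradient */
        double change = fabs(objective(x1) - objective(x));
        for (size_t i = 0; i < DIM; i++)
            x[i] = x1[i];
        if (change < tol)                          /* flat area reached */
            return it + 1;
    }
    return -1;
}
```

Starting from (0, 0) with `minimize(x, 0.1, 1e-12, 1000)`, the iterate approaches (1, -3). A fixed step length is the simplest choice; too large a value makes the iteration diverge, which is one reason step size is a separate design decision.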
Multivariate Optimization
    schema(max F wrt X, C) :-     % IN, OUT
        % guards: here none
        length(X,Y), Y > 1,
        % divide and solve subproblems
        schema(getStartValue(F,X), C_Start),     % recursive schema calls
        schema(getStepDirection(F,X), C_Dir),
        schema(getStepSize(F,X), C_Size),
        % assemble code segment
        X0 = pvar_new(X),     % get a new PROGRAM variable
        C = block([local(X0,double)],
              series([
                assign(X0, C_Start),
                while_converging(X0,
                  assign(X0, +([X0, *([C_Dir, C_Size])])))
              ])).
Multivariate optimization II
• The schemas generate code in an intermediate language
• procedural elements
• local variables, lambda blocks
• sum(..), while_converging(..) --> loops
    X0 = pvar_new(X),
    C = block([local(X0,double)],
          series([
            assign(X0, C_Start),
            while_converging(X0,
              assign(X0, +([X0, *([C_Dir, C_Size])])))
          ])).

    double v_0;
    double E;
    double y;
    v_0 = -99;
    E = 1e10;
    while (E > 0.001) {
        y = sin(v_0);
        v_0 = v_0 - cos(v_0) * 0.01;
        E = fabs(y - sin(v_0));
    }
generated code for max sin(v) wrt v
Important: variables in specification or program are NOT Prolog variables
Why schema-based synthesis?
Multiple algorithm variants can be automatically constructed
The “best” one is chosen by the user or selected via constraints
some possibilities for getStepDir
AB Schema Hierarchies
• Schemas to break down statistical problems
  – Bayesian independence theorems -- work on Bayesian graphs
• Schemas to solve complex statistical problems
  – instantiate (iterative) clustering algorithms
  – handling of time series problems
• Schemas to solve atomic problems
  – instantiate PDF and maximize (symbolically)
  – instantiate numerical solvers (see last slides)
• auxiliary schemas
  – initialization of clustering algorithms
  – data pre-processing (e.g., [0..1] normalization)
AB Schema Hierarchy
• Static tree structure
• AB uses two kinds of schemas
  – schemas for probabilistic problems
  – schemas for formulas
Schemas and AB Model
• The AB schemas have to use all information from the input specification, which is stored in the Prolog data base (AB model)
• Problem: schemas can modify the model, which must be undone during backtracking
  – add new statistical variables
  – remove dependencies for subproblems
• Solutions:
  – add model as parameters: schema(Prob, C, M_in, M_out) and everywhere else
  – keep a model stack (similar to the dynamic calling environments in procedural languages) and use backtrackable asserts/retracts
Backtrackable Global Stuff
• Global data in Prolog are handled using assert/retract or flags. All other data are local to each clause
    p(X) :- q(X,Z), r(Z).     % X, Z local to clause
• Asserts are not backtrackable
    p(X) :- assert(keep(X)), ..., fail.
  The “keep(X)” is kept in the data base even after backtracking
• Work-around: add global variables as parameters to all predicates (impractical)
    p(X, GL_in, GL_out) :- GL_out = [keep(X)|GL_in], ...
• Backtrackable bassert/bretract requires some additional low-level C programs (but has clean semantics)
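The idea behind backtrackable global state can be illustrated with a trail: every destructive update logs the old value, and backtracking replays the log in reverse down to a saved mark. The C sketch below shows only this general technique for a small array of integer flags. It is not the actual bassert/bretract implementation, and all names (`bt_mark`, `bt_set_flag`, `bt_undo_to`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NFLAGS 8
#define TRAIL_CAP 128

static int flags[NFLAGS];                       /* global flags, initially 0 */
static struct { size_t idx; int old; } trail[TRAIL_CAP];
static size_t trail_top = 0;

/* Remember the current trail height; plays the role of a choice point. */
size_t bt_mark(void)
{
    return trail_top;
}

/* Set a flag and log its old value so the change can be undone later. */
void bt_set_flag(size_t idx, int value)
{
    assert(idx < NFLAGS && trail_top < TRAIL_CAP);
    trail[trail_top].idx = idx;
    trail[trail_top].old = flags[idx];
    trail_top++;
    flags[idx] = value;
}

int bt_get_flag(size_t idx)
{
    assert(idx < NFLAGS);
    return flags[idx];
}

/* Undo every change made since the mark, newest first; plays the role
 * of backtracking to the choice point. */
void bt_undo_to(size_t mark)
{
    assert(mark <= trail_top);
    while (trail_top > mark) {
        trail_top--;
        flags[trail[trail_top].idx] = trail[trail_top].old;
    }
}
```

A caller takes a mark before trying an alternative, updates flags freely, and calls `bt_undo_to(mark)` if the alternative fails. In a real Prolog extension the undo step would be driven by the engine's own backtracking rather than called by hand.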
Schema Control
• schema applicability is controlled via guards
• order of application: order in Prolog file
• How to enforce/avoid certain schemas
  – autobayes pragmas, but that’s not really fun
  – doesn’t work for nested applications:
    • inner loop: symbolic solutions only
    • outer loop: enable numeric loop
  – generate them all and decide later or pick “fastest”
• schema control language is a research topic
  – extend declarative AB language
  – how to talk about selection of iterative algorithm in a purely declarative language?
The AB Infrastructure
• term utilities
• rewriting engine
• symbolic system:
  – simplifier
  – abstraction (range, sign, definedness)
  – solver
• pretty printer (code, intermediate language)
• comment generation
Term utilities
• many functional-programming-style predicates, implemented on top of Prolog, for
  – lists, sets, bags, relations
  – terms, AC-terms
• operations
  – term_substitute, subsumption, differences between term sets
• ...
Rewriting Engine
• A lot of stuff in AB is done using rewriting (but not all)
• small rewriting engine implemented in Prolog
  – rewriting rules are Prolog clauses
  – conditional rewriting, AC-style rewriting
  – Evaluation:
    • eager: apply first top-down
    • lazy: apply bottom up
  – continuation: pure bottom-up or dove-tailing
  – handle for attachment of prover/constraint solver
  – compilation of rewriting rules for higher efficiency
Rewriting Rules
• Can combine pure rewriting with Prolog programming in the body of the rewrite rule
    % NAME, STRATEGY, PROVER, ASSUMPTIONS, IN, OUT
    trig_simplify('sin-of-0', [eval=lazy|_], _, _, sin(0), 0) :- !.
    trig_simplify('sin-of-pi-over-6', [eval=lazy|_], _, _, sin(*([1/6, pi])), 1/2) :- !.
    trig_simplify('cos^2+sin^2', [eval=eager|_], _, _, +(Args), +([1|Args3])) :-
        select(cos(X)**2, Args, Args2),
        select(sin(X)**2, Args2, Args3),
        !.
Compilation and Rewriting
• Group and compile rewrite rules (statically)
    ?- rwr_compile(my_simplifications, [trig_simplify, remove_const_rules]).
• Call the rewriting engine
    rwr_cond(my_simplifications, true, S, T).
• Calling with time-out
Symbolic System
• Symbolic system implemented on top of the rewriting engine + Prolog code for solvers, etc
• assumption-based rewriting
  – X/Y -- (not(Y = 0)) --> X
• simplification (lots of rules)
• calculation of derivatives (deriv(F,X) as operator)
• Taylor-series expansion, ...
• equation solver
  – polynomial solver
  – Gauss-elimination for sets of linear equations
  – sequentialization of equation systems
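For readers unfamiliar with the Gauss elimination mentioned in the solver list, the following C sketch shows the numeric version for a small dense system: forward elimination with partial pivoting followed by back substitution. AutoBayes works on symbolic equations, so this only illustrates the underlying method. The function name `gauss_solve`, the fixed size `NEQ`, and the singularity threshold are assumptions made for the example.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define NEQ 3

/* Solves a * x = b for a NEQ x NEQ system.
 * a and b are overwritten during elimination.
 * Returns 0 on success, -1 if the matrix is (numerically) singular. */
int gauss_solve(double a[NEQ][NEQ], double b[NEQ], double x[NEQ])
{
    assert(a != NULL && b != NULL && x != NULL);
    for (size_t col = 0; col < NEQ; col++) {
        /* partial pivoting: pick the row with the largest entry in this column */
        size_t piv = col;
        for (size_t r = col + 1; r < NEQ; r++)
            if (fabs(a[r][col]) > fabs(a[piv][col]))
                piv = r;
        if (fabs(a[piv][col]) < 1e-12)
            return -1;
        if (piv != col) {
            for (size_t c = 0; c < NEQ; c++) {
                double t = a[col][c]; a[col][c] = a[piv][c]; a[piv][c] = t;
            }
            double t = b[col]; b[col] = b[piv]; b[piv] = t;
        }
        /* eliminate the column below the pivot */
        for (size_t r = col + 1; r < NEQ; r++) {
            double m = a[r][col] / a[col][col];
            for (size_t c = col; c < NEQ; c++)
                a[r][c] -= m * a[col][c];
            b[r] -= m * b[col];
        }
    }
    /* back substitution on the upper-triangular system */
    for (size_t i = NEQ; i-- > 0; ) {
        double s = b[i];
        for (size_t c = i + 1; c < NEQ; c++)
            s -= a[i][c] * x[c];
        x[i] = s / a[i][i];
    }
    return 0;
}
```

For the system 2x + y - z = 8, -3x - y + 2z = -11, -2x + y + 2z = -3, the routine yields x = 2, y = 3, z = -1 up to rounding.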
The AB Intermediate language
• strict separation between synthesis and code generation
• small procedural intermediate language with some extensions
  – sum(..), prod(..), simul_assign(..), while_converging(...)
  – Annotations for comments, and pre/post/inv formulas
• code generator for different languages/targets
  – C++/Octave
  – C/Matlab, C/standalone
  – ADA/SparkADA, Java (both “unsupported/in work/bad shape”)
• Pretty-printer to ASCII, HTML, LaTeX
Extending AutoBayes
• some extensions are straight-forward: add text-book formulas
• additional symbolic simplification rules might be required
• adding schemas requires substantial work
  – “hard-coded” schema as first step
  – applicability constraints and control
  – functional mechanisms to handle scalar/vector/matrix cases are available
  – support for documentation generation
  – no schema language, Prolog syntax used
Non-Gaussian PDF
• Data characteristics are modeled using probability density functions (PDFs)
• Example: Gaussians, exponential, ...
• AB contains a number of built-in PDFs, which can be extended (hands-on demo)
• Having multiple PDFs adds a lot of power over libraries
Example
• For clustering, a Gaussian distribution of the data is often used.
• How about angles: 0 == 360
• you get 5 clusters
• A different distribution (vonMises-Fisher) automatically solves this problem
• In AutoBayes: just replace the “gauss” by “vonmises1” -- no programming required
• multiple PDFs in one spec
Sample Generation
• We have used:
  – MODEL ---> P ---(data)--> parameters
• The model can be read the other way round: generate me random data, which are consistent with the model
  – MODEL ---> P ---(parameters)--> data
• Very useful for
  – model debugging/development
  – debugging and assessment of synthesized algorithms
AutoBayes and Correctness
• practical synthesis: forget about correct-by-construction, but
• detailed math derivations, which can be checked externally (e.g., by Mathematica)
• literature references in documentation/comments
• generation of test harness and sample data
• checking of safety properties (“AutoCert”)
[Cade2002 slide set]
AutoBayes as a Prolog Program
• AutoBayes is a pretty large program
  – ~180 Prolog files, 100,000 LoC (with AutoFilter)
• Heavy use of
  – meta-programming (call, etc.)
  – rewriting (using an engine implemented in Prolog)
  – functional programming elements for all sorts of list/vector/array handling
  – backtracking and backtrackable global data structures
  – procedural (non-logical) elements, e.g., file I/O, flags, etc.
• no use of modules but naming conventions
• everything SWI Prolog + a few C extensions to handle backtrackable global counters and flags
AutoBayes Weak Points
• The input parser is very inflexible (uses Prolog operators)
• Very bad error messages, often just “no”
• no “schema language”: AutoBayes can only be extended by someone who is both a Prolog and a domain specialist
• Only primitive control of schema selection: need for a schema-selection mechanism
• not all schemas are fully documented
• large code-base, which needs to be maintained
Summary
• AutoBayes is suitable for a wide range of data analysis tasks
• AutoBayes generates customized algorithms
• AutoBayes: schema-based program synthesis + symbolic
• logic + functional + procedural elements used
• AutoBayes extension: easy to very hard
• AutoBayes debugging: a pain, but explanations and LaTeX output are very helpful
• AutoBayes is NASA OpenSource: bugfixes/extensions always welcome
• AutoBayes has a 160+ page user's manual
• AutoBayes is useful for anything from classroom projects to PhD projects