Tools for the Toolmakers Designing a Programming Language

Upload: caleb-callaway

Post on 15-Apr-2017



Please Ask Questions (hold the rotten fruit)
- Tried to make an accessible presentation
- Don't know what the audience doesn't know

0. About Me (and My Perspective)
- Back-to-school senior
- 4 years in industry with a large-scale C#/WPF code base, developing computational geometry, CAD, and CAM applications and solving optimization problems
- Rails and Android apps here and there

Pain Points
- Programming languages are a pain sometimes
  - "Industry-strength" languages: C++/Java/C#
  - "Scripting" languages: Python, Ruby, Bash
  - Functional programming languages: Haskell
- All of the above are brilliant, useful, and viable languages; this presentation does not intend to denigrate or belittle them

Pain Points: Syntactic Verbosity

Java (5 LOC, 120 chars)

    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println("Hello, World");
        }
    }

Python 3.5 (1 LOC, 22 chars)

    print("Hello, World")

Note: sometimes verbosity aids readability

Pain Points: Executive Verbosity

Java

    $ javac HelloWorld.java
    $ java HelloWorld

Python 3.5

    $ python3 hello_world.py

Build tools (e.g. make) mitigate this problem somewhat, but introduce their own complexity

Pain Points: Runtime Type Failures

Python

    >>> 1 in ["1", "2", "3"]
    False

- The issue is fairly obvious here, but in production code it is obfuscated by function calls and modules
- A YouTuber I talked with confirmed that one frequently ends up re-inventing a type system with unit tests
- The Python community is working on this (see type hints: https://www.python.org/dev/peps/pep-0484/)
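The membership test above can be sketched with PEP 484 annotations. The function name `contains_id` and the sample data are hypothetical. The interpreter does not enforce annotations, so catching the mismatch before runtime assumes a separate static checker (mypy is one such tool).

```python
from typing import List


def contains_id(needle: int, haystack: List[int]) -> bool:
    """Return True if needle is present in haystack."""
    return needle in haystack


# Well-typed call: both sides are ints.
ok = contains_id(1, [1, 2, 3])

# Ill-typed call: the list holds strings. The interpreter runs it anyway and
# quietly answers False; a static checker would reject the argument type.
ids_from_text = ["1", "2", "3"]
silently_wrong = contains_id(1, ids_from_text)
```

The annotated signature documents the expected types in one place, which is the part that unit tests otherwise end up re-checking by hand.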

Pain Points: More Runtime Type Failures

Ruby

    def is_thing(arg)
      arg == thing
    end

    #the next statement evaluates to false: our function didn't check the type of its argument
    #As with Python, this issue is obfuscated by layers of indirection
    is_thing [thing]

    def do_thing(a)
      return a.specific_member #runtime failure if the member doesn't exist
    end

Pain Points: Immutable Boilerplate

Immutability and constness are recommended defaults, but performing transforms on immutable record types can be very verbose:

    class Point {
    public:
        Point(const int x, const int y);
        const Point* WithX(const int x) const { return new Point(x, m_y); }
        const Point* WithY(const int y) const { return new Point(m_x, y); }

    private:
        const int m_x;
        const int m_y;
    };

Pain Points: Separate Declaration and Definition

Point.h

    class Point {
    public:
        Point(const int x, const int y);
        const Point* WithX(const int x) const;
        const Point* WithY(const int y) const;

    private:
        const int m_x;
        const int m_y;
    };

Pain Points: Separate Declaration and Definition

Point.cpp

    #include "Point.h"

    Point::Point(const int x, const int y) : m_x(x), m_y(y) { }
    const Point* Point::WithX(const int x) const { return new Point(x, m_y); }
    const Point* Point::WithY(const int y) const { return new Point(m_x, y); }

- Multiple signatures that must be kept in sync
- Tooling can help (Eclipse's Implement Method feature is a life-saver), but we shouldn't rely on tooling if we don't have to
- Somewhat poetically, the contents of both files couldn't fit on a single slide

Pain Points: Functional Language Syntax

String indexing in Haskell:

    ghci> "Steve Buscemi" !! 6
    'B'

Snippet from my xmonad.hs (is that a TIE fighter?):

    main = do
        xmonad $ gnomeConfig {
            layoutHook = smartBorders $ layoutHook gnomeConfig
            , workspaces = myWorkspaces
            , manageHook = myManageHook <+> manageHook defaultConfig
            } `additionalKeys` myKeys
            `removeKeys` disabledKeys

2. Principles and Observations

As creators of software, automation is what we do.

Examples of things that aren't automated:
- Having to remember the types of function parameters or look them up in the docs
- Writing code that validates the types of function arguments
- Writing unit tests to do type validation

Principles and Observations, the Continuation
- Tighter feedback loops allow us to deliver better solutions faster
  - "Feedback loop" encompasses everything necessary to create a solution and be confident that it works as intended
    - Authoring code
    - Semantic analysis
    - Unit testing, user testing, integration testing, etc.
  - Requires succinct syntax and a succinct build interface
  - Must identify defects as early as possible
- Languages like Python initially give very tight feedback loops, then the loops get long
  - The first iteration of code is easy to write, but many failures don't occur until runtime
  - Non-trivial problems can't be kept in mind all at once. If you can't keep the whole problem in your head at once, it gets hard to reason about a codebase
  - Even "simple" scripts frequently grow past the non-trivial point as requirements change

Principles and Observations
- Conventions are important, but it's important they're not astonishing
  - Astonishing behavior is inhumane, inducing stress and frustration
  - "Non-astonishing" usually means "familiar"
  - Follow established conventions where possible
- Writing boilerplate code is frustrating and error-prone
  - Tools that generate code snippets are treating the symptoms, not the cause

Let's End the Pain
- We can do better!
- Or can we?
  - Ideas that sound good in principle can fall over when implemented
  - Ideas that are simple to describe in English can be hard to describe in code

Validation

Can we create a language that:
- Consistently provides tight feedback loops
- "Fails fast" and fails loudly
- Has succinct syntax and grammar
- Is readable and unastonishing
- Doesn't violate existing conventions except where necessary

Failing Fast: Basic Definitions
- Type
  - A shorthand for the properties of something in memory (e.g. an object) and the manner in which we may interact with it.
- Dynamically Typed Language
  - Variables are not associated with a type, and may be assigned a value of any type. Values are generally typed.
- Statically Typed Language
  - Variables are associated with a type, and may only be assigned values of that type or of its subtypes. Typecasting is a hole in a statically typed language (widening conversions are also questionable).

Failing Fast: Syntax, Grammar, Semantics
- Symbol
  - A sequence of characters
- Syntax
  - Legal sequences of symbols, usually described by a formal grammar
- Semantics
  - The meaning of the syntax
  - Frequently analyzed without mathematical formality; type validation is part of semantic analysis

Failing Fast: Semantic Analysis

    Source Code ->
    Lexer ->
    Parser ->
    Abstract Syntax Tree ->
    Code Emission (or Interpretation) ->
    Testing (hopefully) ->
    Production

- "Failing fast": push detection of defects as far up this stack as possible
- Dynamically typed languages can't do semantic analysis until the Testing stage
- Statically typed languages can do semantic analysis on the AST, so we'll use a statically typed language for our validation
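A minimal sketch of what "semantic analysis on the AST" can look like, assuming a toy expression language with only literals and addition. The node classes and the `type_of`, `evaluate`, and `run` helpers are hypothetical and are not taken from newt's implementation. The analysis pass collects every type error it finds, and evaluation only happens when that list is empty.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: Union[int, str]
    line: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"
    line: int


Expr = Union[Literal, Add]


def type_of(node: Expr, errors: List[str]) -> str:
    """Return the static type name of node, recording mismatches in errors."""
    if isinstance(node, Literal):
        return type(node.value).__name__
    left = type_of(node.left, errors)
    right = type_of(node.right, errors)
    if "error" in (left, right):
        return "error"  # already reported further down the tree
    if left != right:
        errors.append(f"line {node.line}: cannot add {left} and {right}")
        return "error"
    return left


def evaluate(node: Expr) -> Union[int, str]:
    """Interpret the tree; only called on trees that passed analysis."""
    if isinstance(node, Literal):
        return node.value
    return evaluate(node.left) + evaluate(node.right)


def run(node: Expr) -> Tuple[Optional[Union[int, str]], List[str]]:
    """Analyze first; execute only if analysis found no errors."""
    errors: List[str] = []
    type_of(node, errors)
    if errors:
        return None, errors
    return evaluate(node), errors


good = Add(Literal(1, 1), Literal(2, 1), 1)
bad = Add(Literal(1, 2), Literal("2", 2), 2)
```

Here `run(good)` evaluates the tree, while `run(bad)` returns the collected error and never reaches `evaluate`, so the defect surfaces before anything executes.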

Readability
- Sometimes conflicts with succinctness (see Game of Life example)
- Sometimes in tension with verbosity (see Java's "Hello, World!")
- Existing conventions shouldn't be violated without good reason
  - Example: parentheses are deeply associated with function invocation

The Language: newt
- Derived from gpl, the domain-specific language (DSL) developed in Dr. Tyson Henry's Compilers course
  - No geometry types or animation blocks
  - By extension, no tests for geometry types; all other tests are left intact, and are passing
  - No statement terminators (that is, no semicolons)
  - More detailed error reporting, including column numbers
  - Generalized build system
  - Explicit errors
- Representative--not exhaustive--set of features implemented for purposes of validation
  - Would like to do more

Implemented Functionality
- Primitive Types (bool, int, double, string)
- Arrays (dynamically sized, multi-dimensional)
- Record Types ("structs")
  - Can be marked read-only
- Basic Flow Control and Logic Operators
- Functions
  - Recursive functions not implemented due to time constraints, but no implementation roadblocks are known
  - Support for first-class and higher-order functions
- Type Inference
- No memory management (e.g. garbage collection or delete)

Design Philosophy
- Favor immutability and constness wherever possible
  - Diverged from this by allowing identifiers to be mutable by default; would like to change this
- Syntax should require as little text entry as possible, without affecting readability
  - No significant whitespace, however, because it makes code refactoring harder to automate
  - Function declaration syntax sacrifices succinctness to aid readability
- Errors are serious business
  - Nothing is executed until semantic analysis is complete
  - Nothing is executed if semantic analysis yields any errors
  - Stop execution on the first runtime error
  - Don't yield an error unless there's a very good reason

Notable Implementation Details
- Object-oriented, written in C++
  - Mostly C++98, with a few C++11 constructs like auto
  - Favor immutability and constness, all the way down
- Reentrant parser
- Information about statements and expressions stored in the corresponding AST for error reporting
- No implicit state (e.g. global variables) in the runtime
  - Vital for keeping execution state organized, particularly during function calls
  - Execution state captured in ExecutionContext objects that are passed around as needed
- Every type has a default value
  - Including compound types (e.g. structs)
  - Even functions!
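The idea of passing execution state explicitly, rather than keeping it in globals, can be sketched as follows. Only the name `ExecutionContext` comes from the slides; the fields, methods, `DEFAULTS` table, and `call_function` helper are assumptions made for illustration and do not describe newt's actual C++ classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical default values for the primitive types listed in the talk.
DEFAULTS: Dict[str, object] = {"bool": False, "int": 0, "double": 0.0, "string": ""}


@dataclass
class ExecutionContext:
    """Variables visible in one scope; parent is the enclosing scope, if any."""

    parent: Optional["ExecutionContext"] = None
    symbols: Dict[str, object] = field(default_factory=dict)

    def declare(self, name: str, type_name: str) -> None:
        """Every declaration starts out holding its type's default value."""
        self.symbols[name] = DEFAULTS[type_name]

    def lookup(self, name: str) -> object:
        """Search this scope, then each enclosing scope in turn."""
        context: Optional["ExecutionContext"] = self
        while context is not None:
            if name in context.symbols:
                return context.symbols[name]
            context = context.parent
        raise KeyError(name)


def call_function(caller: ExecutionContext) -> Tuple[object, object]:
    """Run a stand-in function body in a fresh child context."""
    local = ExecutionContext(parent=caller)
    local.declare("count", "int")
    return local.lookup("count"), local.lookup("name")


root = ExecutionContext()
root.declare("name", "string")
root.symbols["name"] = "newt"
```

Because the callee receives its own child context, its local `count` never leaks into `root`, while names from the enclosing scope stay reachable through the parent link.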

More Notable Implementation Details
- Build interface is simple (requires Flex and Bison):
      $ make all
- Automated testing framework is simple:
      $ make test
- Very thin parser file compared to gpl
  - 913 LOC, reduced from 1865
- Semantic analysis is done in ordinary C++ code
  - Decision motivated primarily by tooling: Bison files aren't well-supported by my IDE of choice
  - Also a good separation of concerns: semantic analysis and parsing aren't the same thing and don't mix well
- Explicit communication channels for errors inside the runtime
  - Throwing exceptions isn't a great way to aggregate messages

Variables
- Strongly typed
- Distinct declaration and assignment operators:

      a :int= 42 #declaration
      a = 13 #assignment

- Syntax motivated by "fail early" philosophy, variable shadowing, and first-class functions (more on this later)
- Type can be inferred:

      name := "John Doe" #"name" will be of type 'string'
      age := 42.5 #"age" will be of type 'double'
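One way a distinct declaration operator supports failing early is sketched below: because declarations and assignments are different statement kinds, an analysis pass can flag an assignment to a name that was never declared, or to a value of the wrong type, before anything runs. The statement encoding and the `analyze` function are hypothetical and are not newt's implementation.

```python
from typing import Dict, List, Tuple

# Each statement is (kind, name, type_name); kind is "declare" or "assign".
Statement = Tuple[str, str, str]


def analyze(statements: List[Statement]) -> List[str]:
    """Return every error found; an empty list means the program may run."""
    declared: Dict[str, str] = {}
    errors: List[str] = []
    for number, (kind, name, type_name) in enumerate(statements, start=1):
        if kind == "declare":
            declared[name] = type_name
        elif name not in declared:
            errors.append(f"statement {number}: '{name}' assigned before declaration")
        elif declared[name] != type_name:
            errors.append(
                f"statement {number}: '{name}' is {declared[name]}, not {type_name}"
            )
    return errors


program: List[Statement] = [
    ("declare", "a", "int"),    # a :int= 42
    ("assign", "a", "int"),     # a = 13
    ("assign", "b", "int"),     # typo: b was never declared
    ("assign", "a", "string"),  # wrong type for a
]
report = analyze(program)
```

In a language where a single `=` both creates and updates variables, the third statement would silently introduce a new variable instead of being reported.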

Flow Control

    a := 15
    if (a > 12) {
        print("The world is flat.")
    } else {
        print("The rules no longer apply.")
    }

Built to be as un-astonishing as possible to a C/C++/Java programmer

Arrays

    arr:int[] #array of ints

- Brackets go with base type (departure from C/C++)
- Dynamically sized
  - Static sizes seem possible because declaration operator is distinct from assignment operator (this syntax is a recent change)
  - Inserting beyond the end of the array autofills any non-existent indices with default values (this feels kind of weird)
- Multi-dimensional
- Array literal syntax not implemented, but it's quite feasible to do so
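The autofill behavior described above can be sketched like this, assuming an int array whose default value is 0 and a non-negative index. The function name `insert_at` is hypothetical, and returning a copy rather than mutating in place is a choice made for the sketch.

```python
from typing import List


def insert_at(values: List[int], index: int, value: int, default: int = 0) -> List[int]:
    """Return a copy of values with value stored at index, padding with default."""
    result = list(values)
    if index >= len(result):
        # Fill every missing slot up to and including index with the default.
        result.extend([default] * (index - len(result) + 1))
    result[index] = value
    return result


padded = insert_at([1, 2], 4, 9)
in_range = insert_at([1, 2], 0, 7)
```

Writing at index 4 of a two-element array yields a five-element array whose indices 2 and 3 hold the default, which is the part the slide describes as feeling weird.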

Record Types ("structs")

    readonly struct Point {
        x:int
        y:int
    }

    p1 :Point= @Point with { x = 20, y = 30 }
    p2 := p1 with { y = 56 + fun_call() }
    p3 := p2 #create an alias for p2

- Declaration maps identifiers to types; an instance maps identifiers to values
- Not stored contiguously in memory
- "with" syntax generates instances, either from a default value or another instance (reminder: all types have default values)
- "with" syntax always generates a new instance, never aliases
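For comparison, the semantics of "with" can be approximated in Python using a frozen dataclass and `dataclasses.replace`. This is an analogy rather than a description of how newt implements records; the field defaults of 0 stand in for newt's per-type default values, and the `fun_call()` term from the slide is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int = 0  # defaults stand in for newt's per-type default values
    y: int = 0


p1 = replace(Point(), x=20, y=30)  # like: @Point with { x = 20, y = 30 }
p2 = replace(p1, y=56)             # like: p1 with { y = 56 }
p3 = p2                            # plain assignment only creates an alias
```

As with "with", `replace` always builds a new instance and leaves the original untouched, so the hand-written `WithX` and `WithY` methods from the C++ example are not needed.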

Functions
- First-class and higher-order functions ("functions as data")
  - Functions are assignable to variables
  - Functions can take functions as arguments
- True first-class functions that close over their contexts
  - Not just function pointers
- No statement terminators in combination with the chosen function type specifier syntax gives rise to some complexity: fun := (a:int, b:double) -> int {} instead of int fun(int a, double b) {}
  - 6 additional characters
  - Could be reduced to 3 additional characters by removing "->" and comma separator syntax, but at the possible expense of readability:
        fun := (a:int b:double) int {}

Why First-class and Higher-order Functions?
- They're shiny
- Necessary for functional programming
- Useful for
  - callbacks
  - generic sorting
  - map/reduce operations
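The uses listed above, plus the closure behavior that distinguishes first-class functions from plain function pointers, can be illustrated in Python. All names here (`make_counter`, `do_op`, `words`) are made up for the sketch.

```python
from functools import reduce
from typing import Callable, List


def make_counter(step: int) -> Callable[[], int]:
    """Return a function that closes over its own running total."""
    total = 0

    def counter() -> int:
        nonlocal total
        total += step
        return total

    return counter


def do_op(value: int, callback: Callable[[int], str]) -> str:
    """Higher-order function: hands its result to a caller-supplied callback."""
    return callback(value * 2)


words: List[str] = ["pear", "fig", "banana"]
by_length = sorted(words, key=len)                   # generic sorting
lengths = list(map(len, words))                      # map
total_length = reduce(lambda a, b: a + b, lengths)   # reduce

count = make_counter(5)
message = do_op(3, lambda n: f"Good job {n}!")       # anonymous callback
```

Each call to `count()` advances state captured when `make_counter` returned, which a bare function pointer could not carry with it.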

Function Examples

    #My First Function
    add := (a:int, b:int) -> int {
        return a + b
    }
    result := add(2, 3)

    #Higher order function example (callback)
    do_op := (value:int, callback: (int) -> string) -> int {
        #do some processing here, possibly in the background
        callback(value)
    }

    #some anonymous function action thrown in for free
    result := do_op(3, (a:int) -> string { print("Good job " + a + "!") })

Automated Testing
- Extensive automated test suite (~330 individual tests as of this writing)
  - Derived from gpl's test suite
  - For every bug, a test
- Useful for testing semantics that cannot be expressed in the C++ type system
- Proved vital when developing new features in a way that didn't break existing code

Stop
Demo Time

It's Better
- I believe newt offers strong evidence that statically typed languages with lightweight, fail-fast syntax and grammar are viable
- Loss of succinctness in favor of tighter feedback loops as projects scale
- Room for improvement here
  - Function declaration syntax
  - Function overloading and polymorphism
- Distinct declaration and assignment syntax will probably stub toes (it still stubs mine)
- Dogfooding required

Caveats
- Project has been active for 6 months
  - Compared to years for Ruby and Python and decades for Perl, Bash, and C++
  - Comparing the number of person-hours is even more amusing
- Rome wasn't built in a day, and programming languages aren't built in half a year

Lessons
- Large projects are collaborative efforts. You can't do it alone (but nobody will do it for you either)
- Build a good foundation. Get the tooling right. Get the environment right. Automate all the things, and don't repeat yourself (without damn good reason)
- The following tools are non-optional:
  - Version control. You cannot maintain the entire project history in your head, and it's a waste of cognitive resources to try
  - Automated regression testing. You cannot maintain the entire state of a non-trivial project in your head, so you cannot completely validate a change or enhancement without automated testing

Lessons
- The complexity in any nontrivial system is a result of the interaction of its parts
  - Example: C-style declaration syntax doesn't mix with first-class functions that share symbols with function invocation
- Separation of concerns is a real thing
  - Doing semantic analysis in the parser can make it difficult to reason about the code
- The implementation of a well-designed language is so regular it's almost boring. Patterns are everywhere, but exploiting those patterns can be time-consuming

Lessons
- Statement terminators (e.g. semicolons) make parsing easier
  - But I'm glad newt doesn't have them, because they're not very succinct
- Using enums for type specifiers isn't very extensible
- dynamic_cast is usually a code smell, but sometimes necessary
- Get the parser working correctly before writing the implementation
- Linked lists are an interesting way of expressing immutable ASTs, but they aren't always simple
  - For example, newt generates a linked list of parameters, but it has to be reversed before processing
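The linked-list point can be sketched as follows. The slides do not say why newt's parameter list comes out reversed; the sketch assumes one common cause, namely that an immutable list is cheapest to build by prepending each new item as it is parsed. The `Node` class and helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """One immutable cell of a singly linked list."""

    head: str
    tail: Optional["Node"] = None


def prepend(head: str, tail: Optional[Node]) -> Node:
    """O(1) and non-destructive: the existing list is shared, not modified."""
    return Node(head, tail)


def reverse(node: Optional[Node]) -> Optional[Node]:
    """Build a new list with the cells in the opposite order."""
    result: Optional[Node] = None
    while node is not None:
        result = Node(node.head, result)
        node = node.tail
    return result


def to_list(node: Optional[Node]) -> List[str]:
    """Flatten a linked list into a Python list for inspection."""
    items: List[str] = []
    while node is not None:
        items.append(node.head)
        node = node.tail
    return items


# Simulate a parser that sees parameters a, b, c in source order.
params: Optional[Node] = None
for name in ["a", "b", "c"]:
    params = prepend(name, params)
```

After the loop the list reads c, b, a, so a consumer that needs source order has to call `reverse(params)` first, which is the extra step the slide mentions.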

Statistics

    $ cloc $(git ls-files)
         806 text files.
         794 unique files.
         665 files ignored.

    http://cloc.sourceforge.net v 1.60  T=2.72 s (52.2 files/s, 4806.0 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    C++                             62           1137            925           5398
    C/C++ Header                    74           1100           1044           2360
    yacc                             1             91             58            763
    lex                              1             26             14             88
    make                             1             11              8             44
    YAML                             1              0              0              9
    Bourne Shell                     2              0              0              6
    -------------------------------------------------------------------------------
    SUM:                           142           2365           2049           8668
    -------------------------------------------------------------------------------

More, Please

newt is free (as in freedom) software, licensed under GPL v3. It is available at http://github.com/cqcallaw/newt

Acknowledgements
- Dr. Tyson Henry
  - Advising
- Lao Akili
  - Bottomless fount of ideas and Chief Rubber Duck
- Dr. Kevin Buffardi
  - Consultation and usability brainstorming
- All y'all

Notable Challenges
- Minimizing the build system
- Basic Assignment (widening conversions are painful)
- Arrays
- Generalizing type specifiers
- Making functions first-class

Wishlist
- Collections (sets, array literals, etc.)
- Polymorphic functions
- Option types
- Syntactic sugar for structs
- Syntax highlighter and code formatter
- More versatile test suite