1 parametric polymorphism for popular programming languages andrew kennedy microsoft research...

1

Parametric Polymorphism for

Popular Programming Languages

Andrew Kennedy, Microsoft Research Cambridge

2

Or:

Forall for all

Andrew Kennedy, Microsoft Research Cambridge

(Joint work with Don Syme)

3

Curriculum Vitae for FOOLs

http://research.microsoft.com/~akenn

4

Parametric polymorphism

Parameterize types and code by types
Concept: Strachey (1967)
Language: ML (Milner, 1975), Clu (Liskov, 1975)
Foundations: System F (Girard, 1971), Polymorphic lambda calculus (Reynolds, 1974)
Engineering benefits are well-known (code re-use & strong typing)
Implementation techniques are well-researched

5

Polymorphic Programming Languages

Standard ML

O’Caml

Eiffel

Ada

GJ

C++

Mercury

Miranda

Pizza

Haskell

Clu

6

Widely-used Polymorphic Programming Languages

C++

7

Widely-used Strongly-typed Polymorphic Programming Languages

8

In 2004?

C#

Visual Basic?

Java

Cobol, Fortran, …?

9

This talk

The .NET “generics” project:
  What was challenging?
  What was surprising?
  What’s left?

10

What is the .NET CLR (Common Language Runtime)?

For our purposes, the CLR:
  Executes MS-IL (Intermediate Language) programs using just-in-time or way-ahead-of-time compilation
  Provides an object-oriented common type system
  Provides managed services: garbage collection, stack-walking, reflection, persistence, remote objects
  Ensures security through type-checking (verification) and code access security (permissions + stack inspection)
  Supports multiple source languages and interop between them

11

Themes

Design: Can multiple languages be accommodated by a single design? What were the design trade-offs?
Implementation: How can run-time types be implemented efficiently?
Theory: How expressive is it?
Practice: Would you like to program in it?
Future: Have we done enough?

12

Timeline of generics project

May 1999    Don Syme presents proposal to C# and CLR teams
Feb 2000    Initial prototype of extension to CLR
Jan 2002    Our code is integrated into the product team’s code base
Nov 2002    Anders Hejlsberg announces generics at OOPSLA’02
late 2004?  Product release of .NET v1.2 with generics

Feb 2001    Product release of .NET v1.0

13

Design

14

Design for multiple languages

C++: Can I write class C<T> : T
ML: Functors are cool!
Visual Basic: Don’t touch my language!
C++: Give me template specialization
C++: And template meta-programming
Java: Run-time types please
Scheme: Why should I care?
C#: Just give me decent collection classes
Haskell: Rank-n types? Existentials? Kinds? Type classes?
Eiffel: All generic types covariant please

15

Some design goals

Simplicity
  Don’t surprise the programmer with odd restrictions
Consistency
  Fit with the object model of .NET
Separate compilation
  Type-check once, instantiate anywhere

16

Non-goals

C++ style template meta-programming
  Leave this to source-language compilers
Higher-order polymorphism, existentials
  Hey, let’s get the basics right first!

17

What’s in the design?

Type parameterization for all declarations
  classes, e.g. class Set<T>
  interfaces, e.g. interface IComparable<T>
  structs, e.g. struct HashBucket<K,D>
  methods, e.g. static void Reverse<T>(T[] arr)
  delegates (“first-class methods”), e.g. delegate void Action<T>(T arg)

18

What’s in the design (2)?

Bounds on type parameters
  single class bound (“must extend”), e.g. class Grid<T> where T : Control
  multiple interface bounds (“must implement”), e.g. class Set<T> where T : IComparable<T>
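As a minimal sketch of what an interface bound buys, the generic method below can call CompareTo on values of type T only because of the where clause. The names Bounds and Max are hypothetical, chosen for illustration; IComparable<T> here is the standard System interface.

```csharp
using System;

static class Bounds
{
    // The bound "T : IComparable<T>" is what makes a.CompareTo(b) type-check.
    public static T Max<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) >= 0 ? a : b;
    }

    static void Main()
    {
        Console.WriteLine(Bounds.Max(3, 7));           // T = int, a value type
        Console.WriteLine(Bounds.Max("pear", "fig"));  // T = string, a reference type
    }
}
```

Without the bound, the body would be rejected when the definition is checked, not when it is instantiated.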

19

Simplicity => no odd restrictions

    interface IComparable<T> { int CompareTo(T other); }

    class Set<T> : IEnumerable<T> where T : IComparable<T>
    {
        private TreeNode<T> root;
        public static Set<T> empty = new Set<T>();
        public void Add(T x) { … }
        public bool HasMember(T x) { … }
    }

    Set<Set<int>> s = new Set<Set<int>>();

Callouts:
  Type arguments can be value or reference types
  Even statics can use type parameter
  Bounds can reference type parameter (“F-bounded polymorphism”)
  Interfaces and superclass can be instantiated

20

Consistency => preserve types at run-time

Type-safe serialization:

    Object obj = formatter.Deserialize(file);
    LinkedList<int> list = (LinkedList<int>) obj;

Interop with legacy code:

    // Just wrap existing Stack until we get round to re-implementing it
    class GStack<T> {
        Stack st;
        public void Push(T x) { st.Push(x); }
        public T Pop() { return (T) st.Pop(); }
        …

Reflection:

    object obj; …
    Type ty = obj.GetType().GetGenericArguments()[0];
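The reflection and cast fragments above can be tried out with a small program such as the following sketch. It uses the standard LinkedList<T> collection; the class name RuntimeTypes and the helper FirstTypeArgument are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class RuntimeTypes
{
    // Returns the first type argument of the object's exact run-time type.
    public static Type FirstTypeArgument(object obj)
    {
        return obj.GetType().GetGenericArguments()[0];
    }

    static void Main()
    {
        object obj = new LinkedList<int>();
        Console.WriteLine(FirstTypeArgument(obj) == typeof(int)); // True
        Console.WriteLine(obj is LinkedList<int>);                // True
        Console.WriteLine(obj is LinkedList<string>);             // False
    }
}
```

Because the instantiation is part of the object's run-time type, the cast in the serialization example can fail cleanly when the file held a different instantiation.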

21

Separate compilation => restrict generic definitions

No dispatch through a type parameter:

    class C<T> { void meth() { T.othermeth(); } }  // don’t know what’s in T

No inheritance from a type parameter:

    class Weird<T> : T { … }  // don’t know what’s in T

22

Implementation

23

Compiling polymorphism, as was

Two main techniques:

Specialize code for each instantiation
  C++ templates, MLton & SML.NET monomorphization
  good performance
  code bloat

Share code for all instantiations
  Either use a single representation for all types (ML, Haskell)
  Or restrict instantiations to “pointer” types (Java)
  no code bloat
  poor performance (extra boxing operations required on primitive values)

24

Compiling polymorphism in the Common Language Runtime

Polymorphism is built-in to the intermediate language (IL) and the execution engine
CLR performs “just-in-time” type specialization
Code sharing avoids bloat
Performance is (almost) as good as hand-specialized code

25

Code sharing

Rule:
  share field layout and code if type arguments have same representation

Examples:
  Representation and code for methods in Set<string> can also be used for Set<object> (string and object are both 32-bit pointers)
  Representation and code for Set<long> is different from Set<int> (int uses 32 bits, long uses 64 bits)

26

Exact run-time types

We want to support

    if (x is Set<string>) { ... }
    else if (x is Set<Component>) { ... }

But representation and code is shared between compatible instantiations, e.g. Set<string> and Set<Component>
So there’s a conflict to resolve…
…and we don’t want to add lots of overhead to languages that don’t use run-time types (ML, Haskell)

27

Object representation in the CLR

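[Figure: two object layouts. Normal object representation: a vtable pointer followed by the fields; the type is the vtable pointer. Array representation: a vtable pointer, the element type, the number of elements, and the elements; the type is stored inside the object.]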

28

Object representation for generics

Array-style: store the instantiation directly in the object?
  extra word (possibly more for multi-parameter types) per object instance
  e.g. every list cell in ML or Haskell would use an extra word
Alternative: make vtable copies, store instantiation info in the vtable
  extra space (vtable size) per type instantiation
  expect no. of instantiations << no. of objects
  so we chose this option

29

Object representation for generics

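[Figure: objects x : Set<string> and y : Set<object>, each consisting of a vtable pointer and fields. Each object points to its own vtable, one recording the instantiation string and the other object. Both vtables have slots for Add, HasMember and ToArray, and corresponding slots point to the same shared code for Add, HasMember and ToArray.]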

30

Type parameters in shared code

Run-time types with embedded type parameters, e.g.

    class TreeSet<T>
    {
        void Add(T item) { ..new TreeNode<T>(..).. }
    }

Q: Where do we get T from if code for Add is shared?
A: It’s always obtainable from instantiation info in this object

Q: How do we look up type rep for TreeNode<T> efficiently at run-time?
A: We keep a “dictionary” of such type reps in the vtable for TreeSet<T>
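The observable effect of this scheme can be checked from C# itself. In the sketch below, one generic definition allocates TreeNode<T> and T[], and each allocation carries the exact instantiation at run time, whether or not the underlying machine code is shared. MakeNode and MakeArray are hypothetical members added for illustration; the dictionary mechanism itself is internal to the runtime and not visible here.

```csharp
using System;

class TreeNode<T> { }

class TreeSet<T>
{
    // Allocations whose run-time type depends on the type parameter T.
    public object MakeNode() { return new TreeNode<T>(); }
    public T[] MakeArray(int n) { return new T[n]; }
}

static class Program
{
    static void Main()
    {
        object n1 = new TreeSet<string>().MakeNode();
        object n2 = new TreeSet<object>().MakeNode();
        Console.WriteLine(n1 is TreeNode<string>);  // True
        Console.WriteLine(n2 is TreeNode<string>);  // False
        Console.WriteLine(new TreeSet<string>().MakeArray(2).GetType() == typeof(string[])); // True
    }
}
```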

31

Dictionaries in action

    class Set<T> {
        …
        public void Add(T x) { … …new TreeNode<T>()… }
        public T[] ToArray() { … …new T[]… }
    }

    Set<string> s = new Set<string>();
    s.Add("a");
    Set<Set<string>> ss = new Set<Set<string>>();
    ss.Add(s);
    Set<string>[] ssa = ss.ToArray();
    string[] sa = s.ToArray();

32


[Figure: vtable for Set<string>, holding the instantiation (string), the vtable slots, and a dictionary that starts out empty]

33

[Figure: after s.Add("a"), the dictionary in the vtable for Set<string> gains an entry for TreeNode<string>]

34

[Figure: new Set<Set<string>>() adds the vtable for Set<Set<string>>, holding the instantiation (Set<string>), the vtable slots, and an empty dictionary]

35

[Figure: after ss.Add(s), the dictionary for Set<Set<string>> gains an entry for TreeNode<Set<string>>]

36

[Figure: after ss.ToArray(), the dictionary for Set<Set<string>> gains an entry for Set<string>[]]

37

[Figure: after s.ToArray(), the dictionary for Set<string> gains an entry for string[]]

38

x86 code for new TreeNode<T>

    ; Retrieve dictionary entry from vtable
    mov  ESI, dword ptr [EDI]
    mov  EAX, dword ptr [ESI+24]
    mov  EAX, dword ptr [EAX]
    add  EAX, 4
    mov  dword ptr [EBP-0CH], EAX
    mov  EAX, dword ptr [EBP-0CH]
    mov  EBX, dword ptr [EAX]
    ; If non-null then skip
    test EBX, EBX
    jne  SHORT G_M003_IG06
G_M003_IG05:
    ; Look up handle the slow way
    push dword ptr [EBP-0CH]
    push ESI
    mov  EDX, 0x1b000002
    mov  ECX, 0x903ea0
    call @RuntimeHandle
    jmp  SHORT G_M003_IG07
G_M003_IG06:
    mov  EAX, EBX
G_M003_IG07:
    ; Create the object with run-time type
    mov  ECX, EAX
    call @newClassSmall

39

Is it worth it?

With no dictionaries, just run-time look-up:
  new Set<T>() is 10x to 100x slower than normal object creation
With lazy dictionary look-up:
  new Set<T>() is ~10% slower than normal object creation

40

Shared code for polymorphic methods

Polymorphic methods:
  Specialize per instantiation on demand
  Again share code between instantiations where possible
  Run-time types issue solved by “dictionary-passing” style
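A polymorphic method needs its type argument at run time whenever it allocates something whose type mentions T. The sketch below (Arrays and Replicate are hypothetical names) shows the observable behaviour: the array created inside the method has the exact element type of each call, even for instantiations that could share code.

```csharp
using System;

static class Arrays
{
    // "new T[n]" requires the run-time representation of T for this call.
    public static T[] Replicate<T>(T x, int n)
    {
        T[] result = new T[n];
        for (int i = 0; i < n; i++) result[i] = x;
        return result;
    }

    static void Main()
    {
        Console.WriteLine(Replicate("a", 3).GetType() == typeof(string[]));         // True
        Console.WriteLine(Replicate<object>("a", 3).GetType() == typeof(object[])); // True
        Console.WriteLine(Replicate(1.5, 2).GetType() == typeof(double[]));         // True
    }
}
```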

41

Performance

Non-generic quicksort:

    void Quicksort(object[] arr, IComparer comp)

Generic quicksort:

    void GQuicksort<T>(T[] arr, GIComparer<T> comp)

Compare on element types int, string, double
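For readers who want to reproduce the comparison, here is one possible body for the generic signature above. It is a sketch only: the benchmarked code is not shown in the slides, and this version assumes the standard IComparer<T> from System.Collections.Generic in place of the slide's GIComparer<T>. With T = int, elements are compared and swapped without boxing, which is where the generic version is expected to gain over the object[] version.

```csharp
using System;
using System.Collections.Generic;

static class Sorting
{
    public static void GQuicksort<T>(T[] arr, IComparer<T> comp)
    {
        Sort(arr, 0, arr.Length - 1, comp);
    }

    // In-place quicksort using a middle pivot and two converging indices.
    static void Sort<T>(T[] a, int lo, int hi, IComparer<T> comp)
    {
        if (lo >= hi) return;
        T pivot = a[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j)
        {
            while (comp.Compare(a[i], pivot) < 0) i++;
            while (comp.Compare(a[j], pivot) > 0) j--;
            if (i <= j)
            {
                T tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++; j--;
            }
        }
        Sort(a, lo, j, comp);
        Sort(a, i, hi, comp);
    }

    static void Main()
    {
        int[] xs = { 3, 1, 2 };
        GQuicksort(xs, Comparer<int>.Default);
        Console.WriteLine(string.Join(",", xs)); // 1,2,3
    }
}
```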

42

Performance

[Chart: Quicksort on 1,000,000 elements, times in seconds (vertical axis 0 to 4). Horizontal axis: element type (int, double, string (length)). Series: Generic and Non-generic (object). Bar values are not recoverable from this transcript.]

43

Theory

44

Transposing F to C#

As musical keys, F and C♯ are far apart
As programming languages, (System) F and (Generic) C♯ are far apart
But:
  Polymorphism in Generic C♯ is as expressive as polymorphism in System F

45

System F and C♯

System F                                       Generic C♯
Structural equivalence for types               Name equivalence for types
No subtyping                                   Subtyping & inheritance
First-class functions                          Virtual methods
Quantified types (“first-class polymorphism”)  Parameterized classes & polymorphic methods

46

System F into C♯

Despite the differences, we can formalize a translation from System F into (Generic) C♯ that
  is fully type-preserving (no loss of information)
  is sound (preserves program behaviour)
  makes crucial use of the fact that: polymorphic virtual methods express first-class polymorphism
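To make the last point concrete, here is a small hand-written sketch (not the formal translation itself) of how a System F value of type ∀X. X → X can be represented as an object with a polymorphic virtual method. The names IPolyFunc, Identity and UseTwice are hypothetical. Because the object is an ordinary value, it can be passed to a method that then applies it at two different types, which is a rank-2 use of polymorphism.

```csharp
using System;

// Represents the System F type  ∀X. X → X.
interface IPolyFunc
{
    T Apply<T>(T x);
}

// The polymorphic identity function  ΛX. λx:X. x.
class Identity : IPolyFunc
{
    public T Apply<T>(T x) { return x; }
}

static class Program
{
    // Takes a polymorphic function as an argument and instantiates it twice.
    public static string UseTwice(IPolyFunc f)
    {
        return f.Apply<int>(42) + f.Apply<string>("!");
    }

    static void Main()
    {
        Console.WriteLine(UseTwice(new Identity())); // 42!
    }
}
```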

47

Polymorphic virtual methods

Define an interface or abstract class:

    interface Sorter { void Sort<T>(T[] a, IComparer<T> c); }

Implement the interface:

    class QuickSort : Sorter { ... }
    class MergeSort : Sorter { ... }

Use instances at many type instantiations:

    void TestSorter(Sorter s, int[] ia, string[] sa) {
        s.Sort<int>(ia, IntComparer);
        s.Sort<string>(sa, StringComparer);
    }

    TestSorter(new QuickSort(), ...);
    TestSorter(new MergeSort(), ...);

48

Compare:

Define an SML signature:

    signature Sorter =
    sig
      val Sort : 'a array * ('a * 'a -> order) -> unit
    end

Define structures that match the signature:

    structure QuickSort :> Sorter = ...
    structure MergeSort :> Sorter = ...

Use structures at many type instantiations:

    functor TestSorter(S : Sorter) =
    struct
      fun test (ia, sa) =
        (S.Sort(ia, Int.compare); S.Sort(sa, String.compare))
    end

    structure TestQS = TestSorter(QuickSort); TestQS.test(...);
    structure TestMS = TestSorter(MergeSort); TestMS.test(...);

49

Or (Russo first-class modules):

Define an SML signature:

    signature Sorter =
    sig
      val Sort : 'a array * ('a * 'a -> order) -> unit
    end

Define structures that match the signature:

    structure QuickSort :> Sorter = ...
    structure MergeSort :> Sorter = ...

Use a function to test the structures:

    fun TestSorter (s, ia, sa) =
      let structure S as Sorter = s
      in (S.Sort(ia, Int.compare); S.Sort(sa, String.compare))
      end

    TestSorter ([structure QuickSort as Sorter], ...);
    TestSorter ([structure MergeSort as Sorter], ...);

50

Observations

Translation from System F to C# is global
  generates new class names for (families of) polymorphic types
The generics design for Java (GJ) also supports polymorphic virtual methods
C++ has “template methods” but not virtual ones
  for good reason: it compiles by expansion
Distinctiveness of polymorphic virtual methods shows up in (type-passing) implementations (e.g. CLR)
  requires execution-time type application

51

Practice

52

Type inference?

ML and Haskell have type inference
C# programs must be explicitly-typed
Is this a problem in practice?
  not for the most-frequent application: collection classes
  but try parser combinators in C#...

53

Parser combinators (Sestoft)

    class SeqSnd<T,U> : Parser<U> {
        Parser<T> tp;
        Parser<U> up;

        public SeqSnd(Parser<T> tp, Parser<U> up) { this.tp = tp; this.up = up; }

        public Result<U> Parse(ISource src) {
            Result<T> tr = tp.Parse(src);
            if (tr.Success) {
                Result<U> ur = up.Parse(tr.Source);
                if (ur.Success)
                    return new Succ<U>(ur.Value, ur.Source);
            }
            return new Fail<U>();
        }
    }

54

On the other hand…

.NET generics are supported by
  debugger
  profiler
  class browser
  GUI development environment

55

Try it!

Rotor = shared-source release of CLR and C#
  http://msdn.microsoft.com/NET/sscli
Generics + Rotor = Gyro
Gyro extends Rotor with generics support in CLR and C#
  http://research.microsoft.com/projects/clrgen

56

Future

57

Extension: Variance

Should we add variance? e.g.

    IEnumerator<Button> <: IEnumerator<Component>
    IComparer<Component> <: IComparer<Button>

Can even use this to support “broken” Eiffel:

    // invariant in T
    class Cell<T> {
        T val;
        void Set(T newval) { val = newval; }
        T Get() { return val; }
    }

    // covariant in T
    class Cell<+T> {
        T val;
        void Set(object newval) { val = (T) newval; }  // Run-time check
        T Get() { return val; }
    }
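The two subtyping examples above can be tried in later C# releases: from C# 4.0 onward, interfaces and delegates may declare variance, and IEnumerable<out T> and IComparer<in T> are declared this way in the standard library. The sketch below uses hypothetical Component and Button classes and a hypothetical ByIdComparer. It does not use the Cell<+T> syntax from the slide, which is not C#; instead it shows array covariance, an existing CLR feature that is likewise unsound statically and guarded by a run-time check on stores.

```csharp
using System;
using System.Collections.Generic;

class Component { public int Id; }
class Button : Component { }

// Hypothetical comparer that orders any components by Id.
class ByIdComparer : IComparer<Component>
{
    public int Compare(Component a, Component b) { return a.Id.CompareTo(b.Id); }
}

static class VarianceDemo
{
    // Covariance: a sequence of buttons can be read as a sequence of components.
    public static int CountComponents(IEnumerable<Component> cs)
    {
        int n = 0;
        foreach (Component c in cs) n++;
        return n;
    }

    // Array covariance type-checks, but every store is checked at run time.
    public static bool StoreIsRejected()
    {
        Component[] arr = new Button[1];
        try { arr[0] = new Component(); return false; }
        catch (ArrayTypeMismatchException) { return true; }
    }

    static void Main()
    {
        List<Button> buttons = new List<Button> { new Button { Id = 2 }, new Button { Id = 1 } };
        Console.WriteLine(CountComponents(buttons));  // 2
        IComparer<Button> cmp = new ByIdComparer();   // contravariance
        buttons.Sort(cmp);
        Console.WriteLine(buttons[0].Id);             // 1
        Console.WriteLine(StoreIsRejected());         // True
    }
}
```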

58

Extension: Parameterize by superclass

Can type-check given sufficient constraints:

    class D {
        virtual void m1() { … }
        virtual void m2() { … }
    }

    class C<T> : T where T : D                // T must extend D
    {
        int f;
        override void m2(T x) { …x.m1()… }    // Override method D.m2; know m1 exists because of constraint on T
        new virtual void m3() { … }           // New method, name can clash with method from T
    }

59

Extension: Parameterized by superclass (2)

Provides a kind of “mixin” facility
Unfortunately, implementation isn’t easy
We’d like to share rep & code for C<P> and C<Q> for reference types P and Q, but it may be the case that
  object size of C<P> ≠ size of C<Q>
  field offset of C<P>.f ≠ offset of C<Q>.f
  vtable slot of C<P>.m3 ≠ slot of C<Q>.m3
=> abandon sharing, or do more run-time lookup

60

Open problem

Most widely used polymorphic library is probably C++ STL (Standard Template Library)
STL gets expressivity and efficiency from checking and compiling instantiations separately
  Really: ML functors can’t match it
How can we achieve the same expressivity and efficiency with compile-time-checked parametric polymorphism?

61

Questions?
