cint function stub removal root team meeting cern leandro franco (joint work with diego marcos)...

CInt Function Stub Removal

ROOT Team Meeting, CERN

Leandro Franco (Joint work with Diego Marcos)

18-06-07

Modifying CInt? ...

A.K.A.: The Piñata paradigm

[Figure labels: CInt; Newbie; Experienced Programmers]

Goal: Obtain the candies from the piñata... without breaking anybody's head.

Simple Idea

● The dictionaries are big: around 52% of the total library size.

● Why don't we just wipe them off the face of the earth?

● Short answer: we can't do it yet, but we will try.

● Long answer: the whole topic of these slides ;)

First steps

● One good way to shrink the dictionaries is to remove the stub functions.

● Such functions exist because CInt needs a generic way to call a function, and because it cannot do proper name mangling to find that function's symbol (i.e. CInt must behave like a compiler but doesn't have the means to do so).

Stub Functions

● To solve the name mangling problem, a traditional approach was taken:

“Any problem in computer science can be solved with another layer of indirection”

Wheeler's Law

Stub Functions

[Diagram: Compiler/CInt → library function (mangling at compile time). CInt → dictionary → library function (pseudo-mangling at run time, then mangling at compile time).]

The dictionary can be seen as a bijective function that maps C++ function declarations to strings (the string that the compiler associates with each symbol).
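To make the extra layer of indirection concrete, here is a minimal sketch of what a stub does. It is not the code that the dictionary generator actually emits: the class `A`, the `Stub` signature, `Stub_A_Add`, `Dictionary` and `CallByName` are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy class standing in for a class that lives in a library.
struct A {
    int value = 0;
    int Add(int x) { value += x; return value; }
};

// One uniform signature shared by every stub, so the interpreter only
// ever has to know how to call this single function type.
using Stub = long (*)(void* self, const std::vector<long>& args);

// Hand-written stand-in for a generated stub: it unpacks the generic
// arguments and leaves the real (mangled) call to the compiler.
long Stub_A_Add(void* self, const std::vector<long>& args) {
    assert(args.size() == 1);
    return static_cast<A*>(self)->Add(static_cast<int>(args[0]));
}

// Hypothetical dictionary: declaration string -> stub.
const std::map<std::string, Stub>& Dictionary() {
    static const std::map<std::string, Stub> dict = {
        {"A::Add(int)", &Stub_A_Add},
    };
    return dict;
}

// Run-time lookup by declaration, followed by the indirect call.
// Throws std::out_of_range if the declaration is unknown.
long CallByName(const std::string& decl, void* self,
                const std::vector<long>& args) {
    return Dictionary().at(decl)(self, args);
}
```

Every callable function needs one such stub compiled into the dictionary, which is where the size comes from.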

Stub Functions

● The idea is to avoid that layer of indirection.

– We still don't know how to do the mangling.

● But we know how to do the demangling (or at least, we know who to call to do it ;) ).

function header (X)    library (Y)
A::A()                 _ZN1AC1Ev
A::HiA()               _ZN1A3HiAEv

Instead of going from set X to set Y for a given x in X, go from set Y to set X for all y in Y.
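The Y → X direction can be sketched with `abi::__cxa_demangle` from `<cxxabi.h>`, which GCC and Clang provide. The helper names `Demangle` and `BuildReverseMap` are made up for this example, and the symbol list is passed in by hand; reading the symbols out of a real library is left out.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <map>
#include <string>
#include <vector>

// Returns the demangled name, or an empty string if `mangled` is not
// a valid mangled C++ symbol.
std::string Demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return std::string();
    std::string result(out);
    std::free(out);  // the buffer is malloc'ed by __cxa_demangle
    return result;
}

// Walks over ALL symbols (set Y) and records the function header
// (set X) that each one demangles to.
std::map<std::string, std::string>
BuildReverseMap(const std::vector<const char*>& symbols) {
    std::map<std::string, std::string> headerToSymbol;
    for (const char* sym : symbols) {
        const std::string header = Demangle(sym);
        if (!header.empty()) headerToSymbol[header] = sym;
    }
    return headerToSymbol;
}
```

With the two symbols from the table, `BuildReverseMap({"_ZN1AC1Ev", "_ZN1A3HiAEv"})` yields the entries `A::A()` and `A::HiA()`.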

Stub Functions

● This approach sets its biggest side effect in stone:

– We will need to demangle ALL the symbols in a library just to be able to call one function.

● The demangling process might not be too expensive but what happens when we have thousands and thousands of symbols in a library?

Efficiency

● Since we have to demangle all the symbols from the library at least once, we could cache the result.

– Expensive approach: libCore has 21000 symbols with an average length of 46 characters when demangled (i.e. 614 KB in cache).

● Demangle as little as possible: don't do it more than once or twice, and don't even try if the symbols have already been registered.

● I'm not even mentioning the parsing needed between the demangling and the registering.
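The "don't even try if the symbols have been registered" rule could look roughly like the sketch below. `LazyRegistry` and its members are hypothetical, and the demangler is injected as a callback so the bookkeeping can be shown without a real library; the parsing step between demangling and registering is reduced to a prefix check.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry that decides when demangling is worth doing.
class LazyRegistry {
public:
    using Demangler = std::function<std::string(const char*)>;

    LazyRegistry(std::vector<const char*> symbols, Demangler demangle)
        : fSymbols(std::move(symbols)), fDemangle(std::move(demangle)) {
        assert(fDemangle && "a demangler is required");
    }

    // Registers every function of `cls` and returns how many symbols
    // had to be demangled to do so.
    std::size_t RegisterClass(const std::string& cls) {
        // Already registered: skip the whole library scan.
        if (!fRegistered.insert(cls).second) return 0;
        const std::string prefix = cls + "::";
        std::size_t demangled = 0;
        for (const char* sym : fSymbols) {
            const std::string name = fDemangle(sym);
            ++demangled;
            if (name.compare(0, prefix.size(), prefix) == 0)
                fFunctions.emplace_back(name, sym);
        }
        return demangled;
    }

    std::size_t FunctionCount() const { return fFunctions.size(); }

private:
    std::vector<const char*> fSymbols;
    Demangler fDemangle;
    std::set<std::string> fRegistered;
    std::vector<std::pair<std::string, const char*>> fFunctions;
};
```

The first request for a class still pays for a full scan of the library; only repeated requests for the same class become free.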

Are we winning the fight?

● CVS version of ROOT

– Libs size: 74.67 MB

– Objects size (dictionaries): 47.71 MB

– Source size (dictionaries): 50.37 MB

● Current status of pre-experimental version

– Libs size: 65.46 MB ( -9.21 MB, 12%)

– Objects size (dictionaries): 36.42 MB (-11.29 MB, 24%)

– Source size (dictionaries): 37.25 MB (-13.12 MB, 26%)
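For reference, the figures in parentheses are the absolute and relative reductions with respect to the CVS version. A small helper (the names `Reduction` and `ComputeReduction` are made up here) reproduces them:

```cpp
#include <cassert>
#include <cmath>

struct Reduction {
    double savedMB;  // before - after
    double percent;  // saved space as a percentage of the original size
};

Reduction ComputeReduction(double beforeMB, double afterMB) {
    assert(beforeMB > 0.0);
    const double saved = beforeMB - afterMB;
    return {saved, 100.0 * saved / beforeMB};
}
```

For the library sizes, `ComputeReduction(74.67, 65.46)` gives 9.21 MB saved, which is about 12% of the original size.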

In every war, sacrifices must be made: space and time overhead

Let's start with a “normal” session

Real time: 0.37 s

Real time: 21.72 s, Rootmarks: 341.97

First Algorithm: be stupid.

Initial attempt: demangle all the symbols in a library for every used class

Real time: 0.76 s

Real time: 38.95 s, Rootmarks: 184

Spikes due to the silliness of the algorithm: first demangle everything and then register it.

Second Algorithm: don't be so stupid

At least remember the classes that have already been registered


Spikes due to the silliness of the algorithm: first demangle everything and then register it.

Third Algorithm: use the RAM

Demangle the symbols once and keep them in a cache


Fourth Algorithm: Axel's idea

Keep a pointer to the mangled name and demangle twice (when needed)
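One possible reading of this idea, sketched under assumptions: the index stores only pointers to the mangled names (which already live in the library) instead of cached demangled strings. A symbol is demangled once, transiently, to decide whether it belongs to the requested class, and a second time only when a specific function is looked up. `PointerIndex` and its members are hypothetical, and the demangler is injected as a callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class PointerIndex {
public:
    using Demangler = std::function<std::string(const char*)>;

    explicit PointerIndex(Demangler demangle)
        : fDemangle(std::move(demangle)) {
        assert(fDemangle && "a demangler is required");
    }

    // First demangling: the result is used for the class check and
    // then thrown away; only the pointer is kept.
    void AddIfMember(const std::string& cls, const char* mangled) {
        const std::string name = fDemangle(mangled);
        const std::string prefix = cls + "::";
        if (name.compare(0, prefix.size(), prefix) == 0)
            fEntries.push_back(mangled);
    }

    // Second demangling: only when a particular function is requested.
    // Returns the mangled symbol, or nullptr if nothing matches.
    const char* Find(const std::string& header) const {
        for (const char* mangled : fEntries)
            if (fDemangle(mangled) == header) return mangled;
        return nullptr;
    }

    std::size_t Size() const { return fEntries.size(); }

private:
    Demangler fDemangle;
    std::vector<const char*> fEntries;  // pointers only, no string copies
};
```

The trade-off is memory against time: each entry costs one pointer instead of a demangled string, at the price of demangling again on lookup.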


Fifth Algorithm: some tuning

A bit of optimization with the structures


Algorithms Comparison

How much are we willing to pay for this feature?

Demangling takes 15% of the time at startup (100 ms).

Which means there is still some room for improvement.

Problems so far... a plethora

● Easy ones

– ellipsis

– default parameters

– free standing functions

– weird types like va_list

– many more...

● Not so easy:

– virtual functions... a real pain in the neck

– constructors, destructors (in-charge, deleting, etc)

– inline functions

– non-member operators

– ...
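One reason constructors and destructors are awkward can be shown directly: under the Itanium C++ ABI used by GCC, a single constructor or destructor may be emitted as several symbols (`C1`/`C2` for constructors, `D0`/`D1`/`D2` for destructors, `D0` being the deleting destructor), and they all demangle to the same header. The sketch below uses `abi::__cxa_demangle`; the helper names are made up for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled name, or an empty string on failure.
std::string DemangleSymbol(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return std::string();
    std::string result(out);
    std::free(out);
    return result;
}

// True when two distinct symbols cannot be told apart once demangled,
// so the mapping from symbols back to headers is no longer one-to-one.
bool CollideAfterDemangling(const char* a, const char* b) {
    const std::string da = DemangleSymbol(a);
    return !da.empty() && std::string(a) != b && da == DemangleSymbol(b);
}
```

For class `A`, `_ZN1AC1Ev` and `_ZN1AC2Ev` both come back as `A::A()`, so the demangled name alone does not say which variant to call.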

Work to be done

● Certain stub functions are not out of the dictionary yet:

– Constructors and destructors (Diego is working on it)

– Non-member operators

– Certain cases for std templates

● Without stubs we can also take the setup_memfunc calls out of the dictionary.

● What else can we take out?

– Shadow classes? Show members? Streamers?

– Class Inheritance info? typedef? data members info? ...?

Future is always bright (dict source)

● CVS version: 50.37 MB

● Current status: 37.25 MB (-13.12 MB, 26.0%)

● No constructors, destructors: 30.09 MB (-20.28 MB, 40.2%)

– Should be there soon enough.

● No memfuncs: 17.40 MB (-32.97 MB, 65.4%)

– We still need the info (in a ROOT file, for instance).

● No memvars: 14.72 MB (-35.65 MB, 70.7%)

● No inline issue: 13.89 MB (-36.47 MB, 72.4%)

Conclusions

● We have gained a better understanding of C++.

● As my mother used to say:

– He who knows not the way, walks with desperation.

(fortunately, we finally have an idea of what we are doing and where we want to go)

● A lot of tuning is being done to bring times and memory down to something acceptable.

● We need a considerable amount of time to deal with a myriad of small (and not so small) issues.
