Domain-Specific Development Tools

Oscar Nierstrasz

Software Composition Group

scg.unibe.ch

Rennes — 2015-12-03

Developers spend more time reading than writing code

Especially with OO, where the code reflects run-time behaviour poorly, developers spend more time reading than writing: they need to understand the code and the impact of a change. IDEs support this reading activity poorly, since they focus on programming-language concepts such as editing, compiling and debugging, not on architectural constraints or user features.

Roadmap

Agile Modeling

Moldable Tools

Architectural Monitoring

Moldable Tools

Build a new assessment tool in ten minutes

Challenge

Custom analyses require custom tools. Building a tool should be as easy as writing a query in SQL or a form-based interface.

Conventional debuggers just offer an interface to the run-time stack.

Specific Models

Mind the abstraction gap

Generic Debugger

Domain-specific Debuggers

The Moldable Debugger

DebuggingWidget

DebuggingAction

*

ActivationPredicate

Andrei Chis et al. The Moldable Debugger: A Framework for Developing Domain-Specific Debuggers. SLE 2014. DOI: 10.1007/978-3-319-11245-9_6

Classical development tools like browsers, debuggers and inspectors are generic and do not address the needs of specific domains. The Moldable Debugger can be easily adapted to different domains, such as event-driven computation, GUI construction and parser generation.

Moldable Tools

PetitParser

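identifier: letter , (letter / digit) *

[Diagram: the parser as a tree of objects; a sequence (,) of letter and a repetition (*) of a choice (/) between letter and digit]

PetitParser is a PEG-based framework for developing parsers composed of objects.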
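To make "parsers composed of objects" concrete, here is a minimal sketch in Python rather than in PetitParser's own Smalltalk. The class names (Parser, Pred, Seq, Choice, Star) and the use of + and | in place of , and / are assumptions made for illustration; they are not PetitParser's API.

```python
class Parser:
    """A parser is an object. parse_at returns the position after a match, or None."""

    def parse_at(self, text, pos):
        raise NotImplementedError

    def __add__(self, other):
        # Sequence: written "," on the slide.
        return Seq(self, other)

    def __or__(self, other):
        # Ordered choice: written "/" on the slide.
        return Choice(self, other)

    def star(self):
        # Zero or more repetitions: written "*" on the slide.
        return Star(self)

    def matches(self, text):
        # True only if the whole input is consumed.
        return self.parse_at(text, 0) == len(text)


class Pred(Parser):
    """Consumes one character that satisfies a test."""

    def __init__(self, test):
        self.test = test

    def parse_at(self, text, pos):
        if pos < len(text) and self.test(text[pos]):
            return pos + 1
        return None


class Seq(Parser):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def parse_at(self, text, pos):
        middle = self.first.parse_at(text, pos)
        return None if middle is None else self.second.parse_at(text, middle)


class Choice(Parser):
    def __init__(self, first, second):
        self.first, self.second = first, second

    def parse_at(self, text, pos):
        end = self.first.parse_at(text, pos)
        return end if end is not None else self.second.parse_at(text, pos)


class Star(Parser):
    def __init__(self, inner):
        self.inner = inner

    def parse_at(self, text, pos):
        while True:
            nxt = self.inner.parse_at(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt


letter = Pred(str.isalpha)
digit = Pred(str.isdigit)
# The grammar from the slide: letter , (letter / digit) *
identifier = letter + (letter | digit).star()
```

Because the grammar is an ordinary object graph, a tool can walk it at run time, which is what a domain-specific debugger needs in order to tell which parser object is active.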

IdentifierParser new parse: 'aLong32Identifier'

The conventional debugger knows nothing about the parsing domain.

A moldable PP debugger knows which objects are parsers, knows where we are in the input, and can show us which parser object is currently active.

Domain-specific extensions

DebuggingWidget

DebuggingView

DebuggingAction

DebuggingSession

DebuggingPredicate

PrimitivePredicate

HighLevelPredicate

***

ActivationPredicate

Moldable debuggers are built up from debugging widgets and debugging actions. The moldable debugger uses activation predicates to know which debuggers can currently be activated, allowing the developer to switch between debuggers without starting a new session.
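The role of activation predicates can be sketched in a few lines of Python. This is only an illustration of the idea in the note above: Frame, Debugger and available_debuggers are hypothetical names, the predicates are invented examples, and none of this is the Moldable Debugger's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """One entry of the run-time stack (hypothetical, heavily simplified)."""
    receiver_class: str
    method: str


@dataclass
class Debugger:
    name: str
    # Activation predicate: inspects the current stack and decides
    # whether this debugger is applicable right now.
    activation: Callable[[List[Frame]], bool]


def available_debuggers(debuggers, stack):
    """Names of all debuggers whose activation predicate holds for this stack."""
    return [d.name for d in debuggers if d.activation(stack)]


generic = Debugger("Generic", lambda stack: True)
parser_debugger = Debugger(
    "PetitParser",
    lambda stack: any(f.receiver_class.endswith("Parser") for f in stack),
)
test_debugger = Debugger(
    "SUnit",
    lambda stack: any(f.method.startswith("test") for f in stack),
)

# A stack as it might look while a parser is running (invented example).
stack = [Frame("IdentifierParser", "parseOn:"), Frame("UndefinedObject", "DoIt")]
print(available_debuggers([generic, parser_debugger, test_debugger], stack))
```

The predicates are evaluated against the same live session, so the list of applicable debuggers can change as execution proceeds without the session being restarted.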

Next production

Next parser

Production(aproduction)

Next failure

Stream position(anInteger)

Stream position changed

Debugging widgets

Debugging actions

The parts from which the PP debugger is built.

PetitParser, Events, SUnit, Glamour

Moldable debuggers have been built for several different domains already.

New debuggers are cheap

Although some expertise is required to build a new debugger, the development effort for a new debugger is tiny.

The Moldable Inspector

The moldable inspector extends these ideas to object inspectors. Here is a moldable inspector for PostgreSQL databases. We are exploring other kinds of moldable tools …

Demo

Agile Modeling

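[Figure: Moose architecture. Importers for Smalltalk, Java, C++, Python, … feed a model repository built on an extensible meta model. The Smalltalk-based platform offers navigation, metrics, querying and grouping. Tools built on top include Roassal, Orion, DSM, …, BugMap.]

Moose is a powerful tool once we have a model …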

Nierstrasz et al. The Story of Moose. ESEC/FSE 2005. DOI: 10.1145/1095430.1081707

Moose is a platform for software and data analysis, but the bottleneck is the development of importers for different languages to the FAMIX metamodel. Development can take weeks or months.

Load the model in the morning, analyze it in the afternoon

Challenge

The key bottleneck to assessment is creating a suitable model for analysis. If a tool does not already exist, it can take days, weeks or months to parse source files and generate models.

Ideas

Grammar Stealing

Hooking into an existing tool

of this phase will be a model of the Ruby software system. As the meta-model is FAME compliant, also the model will be. Information about the ClassLoader, an instance responsible for loading Java classes, is covered in section 4.7.

The Fame framework automatically extracts a model from an instance of an Eclipse AST. This instance corresponds to the instance of the Ruby plugin AST representing the software system. Automation is possible due to the fact that we defined the higher level mapping. Figure 2.1 reveals the need for the higher mapping to be restored. In order to implement the next phase independently from the environment used in this phase we extracted the model into an MSE file.

Figure 2.1: The dotted lines correspond to the extraction of a (meta-)model. The other arrows between the model and the software system hierarchy show which Java tower level corresponds to which meta-model tower element.

2.3 Model Mapping by Example phase

Our previously extracted model still contains platform dependent information and thus is not a domain specific model for reverse engineering. It could be used by very specific or very generic reverse engineering tools, as it contains the concrete syntax tree of the software system only. However such tools do not exist. In the Model Mapping by Example phase we want to transform the model into a FAMIX compliant one. With such a format it will be easier to use in several software engineering tools.

The idea behind this approach relies on Parsing by Example [3]. Parsing by Example presents a semi-automatic way of mapping source code to domain

Recycling Trees

Parsing by Example

Evolutionary Grammar Generation

Since biological evolution starts from an existing population of species, we need to bootstrap an initial population before we can begin evolving it. This initial population is generally a number of random individuals. These initial individuals usually don’t perform well, although some will already be a tad better than others. That is exactly what we need to get evolution going.

The final part is reproduction, i.e. to generate a new generation from the surviving previous generation. For that purpose an evolutionary algorithm usually uses two types of genetic operators: point mutation and crossover (we will refer to point mutations as mutations, although crossover is technically also a mutation). Mutations change an individual in a random location to alter it slightly, thus generating new information. Crossover¹, however, takes at least two individuals and cuts out part of one of them, to put it in the other individual(s). By only moving around information, crossover does not introduce new information. Be aware that every modification of an individual has to result in a new individual that is valid. Validity is very dependent on the search space: it generally means that the fitness function as well as the genetic operators should be applicable to a valid individual. A schematic view is shown in fig. 3.1.

[Flowchart: generate new random population → select most fit individuals → fit enough? If not, generate new population with genetic operators (mutation, crossover) and select again.]

Figure 3.1: Principles of an Evolutionary Algorithm
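The loop in the figure can be sketched as follows. This is a generic toy instance, not the grammar-generation system from the excerpt: individuals are bit lists, the fitness function simply counts ones, and every name and parameter (fitness, mutate, crossover, evolve, the population size, the seed) is an assumption chosen for illustration.

```python
import random


def fitness(individual):
    # Toy fitness: the number of 1 bits (assumption for this sketch).
    return sum(individual)


def mutate(individual, rng):
    # Point mutation: flip one randomly chosen bit, creating new information.
    i = rng.randrange(len(individual))
    child = list(individual)
    child[i] = 1 - child[i]
    return child


def crossover(a, b, rng):
    # One-point crossover: only moves existing information around.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(length=20, pop_size=30, max_generations=200, seed=0):
    rng = random.Random(seed)
    # Bootstrap: generate a new random population.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(max_generations):
        # Select the most fit individuals.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if fitness(population[0]) == length:  # fit enough?
            break
        survivors = population[: pop_size // 2]
        # Generate the new population with the genetic operators.
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return max(population, key=fitness), history


best, history = evolve()
print(fitness(best), len(history))
```

Every bit list of the right length is a valid individual here, so the validity requirement mentioned above is satisfied trivially; for grammars, the operators would have to be designed so that their results remain well-formed.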

There are alternatives to rejecting a certain number of badly performing individuals per generation. To compute the new generation, one can generate new individuals from all individuals of the old generation. This would not result in an improvement since the selection is completely random. Hence the parent individuals are selected

¹ Crossover in biology is the process of two parental chromosomes exchanging parts of genes in the meiosis (cell division for reproduction cells)

Grammar Stealing was introduced by Verhoef and Lämmel. We have tried the other approaches, but they all have limitations …

Agile Modeling Lifecycle

Build a coarse model

Build a custom analysis

Refine the model

We don’t need a full parser; just enough to start analysis. Then we can refine the model to extract more details.

Idea: use island grammars to extract coarse models

'class' ID (method / . {avoid})* 'end'

[Diagram labels on the sample below: method?, method, . {avoid}]

class Shape
  int x;
  int y;

  method draw() … end
end

method main() … end

Island grammars allow us to extract just parts of the information from source code that interest us at the moment.
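A minimal sketch of this idea in Python, using a hand-written recognizer over tokens instead of a PEG framework. The function names and the toy tokenizer are assumptions for illustration. The rule for water is deliberately naive: anything that is not a method is skipped one token at a time, and a method body is assumed to run to the first 'end'.

```python
import re

# Toy tokenizer (assumption): identifiers, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\S")

SOURCE = """
class Shape
  int x;
  int y;
  method draw() ... end
end

method main() ... end
"""


def parse_method(tokens, i):
    """Island: 'method' ID ... 'end'. Returns (name, next index) or None."""
    if i + 1 < len(tokens) and tokens[i] == "method":
        j = i + 2
        while j < len(tokens) and tokens[j] != "end":
            j += 1  # the method body is treated as water
        if j < len(tokens):
            return tokens[i + 1], j + 1
    return None


def parse_class(tokens, i):
    """'class' ID (method / water)* 'end'. Returns ((name, methods), next) or None."""
    if not (i + 1 < len(tokens) and tokens[i] == "class"):
        return None
    name, methods, j = tokens[i + 1], [], i + 2
    while j < len(tokens) and tokens[j] != "end":
        method = parse_method(tokens, j)
        if method:
            methods.append(method[0])
            j = method[1]
        else:
            j += 1  # water: skip one token we are not interested in
    if j >= len(tokens):
        return None
    return (name, methods), j + 1


def extract_classes(source):
    """Coarse model: class names with the names of their methods."""
    tokens = TOKEN.findall(source)
    classes, i = [], 0
    while i < len(tokens):
        found = parse_class(tokens, i)
        if found:
            classes.append(found[0])
            i = found[1]
        else:
            i += 1  # top-level water, including methods outside classes
    return classes


print(extract_classes(SOURCE))
```

Field declarations and the top-level method are never parsed; they are simply skipped. The price is that the skipping rules are hard-wired to this particular choice of islands.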

Problem: island grammars lead to shipwrecks

class Shape

method end

'class' ID (method / !'end' !method)* 'end'

method?

Tweaking island grammars till they work is not an option …

Unfortunately island grammars are very difficult to get right because the rules for water depend on the islands. If the islands of interest change, the water must change too.

A Bounded Sea searches for an island in a bounded scope

'class' ID (~method~)* 'end'

[Diagram labels: method?, ~method~, method, ~method~]

Bounded seas essentially eliminate the need to write special rules for water, since the boundaries are inferred from the islands. We are now starting to explore how well this works for real languages …

Jan Kurš, et al. Bounded Seas. Computer Languages, Systems & Structures 44, 2015. DOI: 10.1016/j.cl.2015.08.002
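The following Python sketch illustrates the core of the idea under a strong simplification: the boundary of each sea is passed in by hand, whereas the approach described above infers it from the grammar. All names (bounded_sea, method_island, is_end, never) and the toy tokenizer are assumptions, not the implementation from the paper.

```python
import re

# Toy tokenizer (assumption): identifiers, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\S")


def bounded_sea(island, boundary, tokens, i):
    """~island~: skip water until the island matches, but never cross the boundary.

    Returns (value, next index), or None if the boundary or the end of input
    is reached before an island is found.
    """
    j = i
    while j < len(tokens) and not boundary(tokens, j):
        found = island(tokens, j)
        if found:
            value, k = found
            # Trailing water: stop at the boundary or at the next island.
            while (k < len(tokens) and not boundary(tokens, k)
                   and not island(tokens, k)):
                k += 1
            return value, k
        j += 1
    return None


def method_island(tokens, i):
    """Island: 'method' ID ... 'end'. Returns (name, next index) or None."""
    if i + 1 < len(tokens) and tokens[i] == "method":
        j = i + 2
        while j < len(tokens) and tokens[j] != "end":
            j += 1
        if j < len(tokens):
            return tokens[i + 1], j + 1
    return None


def is_end(tokens, i):
    return i < len(tokens) and tokens[i] == "end"


def never(tokens, i):
    return False


def class_island(tokens, i):
    """'class' ID (~method~)* 'end', with 'end' as the boundary of the sea."""
    if not (i + 1 < len(tokens) and tokens[i] == "class"):
        return None
    name, methods, j = tokens[i + 1], [], i + 2
    while True:
        found = bounded_sea(method_island, is_end, tokens, j)
        if not found:
            break
        methods.append(found[0])
        j = found[1]
    # Simplification of this sketch: in a class without any method,
    # everything up to the boundary counts as water.
    while j < len(tokens) and not is_end(tokens, j):
        j += 1
    return ((name, methods), j + 1) if is_end(tokens, j) else None


def extract_classes(source):
    tokens = TOKEN.findall(source)
    classes, i = [], 0
    while True:
        # The same sea construct is reused at the top level, unbounded.
        found = bounded_sea(class_island, never, tokens, i)
        if not found:
            return classes
        classes.append(found[0])
        i = found[1]
```

Note that bounded_sea contains no knowledge of fields, annotations or any other construct: the only inputs are the island and the boundary, which is why changing the islands does not require rewriting the water.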

Further experiments …

Keyword heuristics

Exploit structure

Classify languages

Architectural Monitoring

Challenge

“What will my code change impact?”

Large software systems are so complex that one can never be sure until integration whether certain changes can have catastrophic effects at a distance.

Ideas: Tracking Software Architecture; exploiting Big Software Data

Problems

Diverse views of SA

SA is not in the code

What is SA in the Wild?

Andrea Caracciolo, et al. How Do Software Architects Specify and Validate Quality Requirements? Software Architecture 2014. DOI: 10.1007/978-3-319-09970-5_32

The theory seems to suggest that SA is mainly about structure and dependencies. Our experience with actual projects suggested that the truth might be different. We carried out a couple of empirical studies, first a qualitative one to understand what is SA in the wild, and then a second, quantitative one to see to what extent various kinds of constraints appear in practice.

Impact of SA constraints

constraint                  Impact (1-5)
availability                4.2
response-time               4.0
authorization               3.9
authentication              3.6
communication               3.4
throughput                  3.4
signature                   3.4
software infrastructure     3.3
data integrity              3.3
recoverability              3.1
dependencies                3.1
visual design               3.0
data retention policy       3.0
hardware infrastructure     2.9
system behavior             2.9
data structure              2.9
event handling              2.9
code metrics                2.7
meta-annotation             2.6
naming conventions          2.6
file location               2.5
accessibility               2.5
software update             2.2

In the quantitative study we asked developers how important different kinds of architectural constraints were for their projects. Interestingly, in the top ten, there were significantly more user constraints, like availability (in green) than developer constraints (in blue). Dependencies were only halfway down the list.

Automated Validation is not Prevalent

[Bar chart: share of QRs validated automatically, per constraint type, on an axis from 0% to 100%. Constraint types in chart order: naming conventions, file location, hardware infrastructure, software update, recoverability, dependencies, signature, software infrastructure, data structure, event handling, availability, communication, accessibility, meta-annotation, code quality, visual design, data integrity, authentication, data retention policy, response-time, throughput, authorization.]

Avg: 40%

As we see, on average, QRs are automatically tested only 40% of the time.

Formalization is not Prevalent

[Bar chart: share of QRs specified formally, per constraint type, on an axis from 0% to 100%. Constraint types in chart order: software update, hardware infrastructure, accessibility, recoverability, software infrastructure, authentication, data retention policy, throughput, response-time, availability, file location, code metrics, visual design, communication, data integrity, authorization, event handling, naming conventions, meta-annotation, data structure, signature, dependencies.]

Avg: 20%

Formalisms: ER, UML + profile, Regex, BNF, annotations, …

On average, QRs are formally specified only 20% of the time. Practitioners use different formalisms, from UML + profile to regex.

Architectural Rules

“Repository interfaces can only declare methods named find..()”

“Only Service classes are allowed to throw AppException”

“The rendering operation has to be completed in less than 4ms”

Naming Conventions

Dependencies

Performance

AC: “One year ago I had the chance to talk to various professionals working in the area where I study. What I noticed was that part of an architectural specification consists of constraints and guidelines on how a system should behave and be implemented”

Rule Validation

xml

java

uml

Limited functionality

Poor usability

What we typically do is test these rules using the most appropriate tool.

Dicto — a unified ADSL

Andrea Caracciolo, et al. Dicto: A Unified DSL for Testing Architectural Rules. ECSAW '14. DOI: 10.1145/2642803.2642824

We have a single unified specification language that can be used to define a wide range of rules and can exploit off-the-shelf tools to verify them. We call this language Dicto.

Dicto Rules

…

MyService : Website with url="http://www.abc.com/api"

MyService must HandleLoadFrom("10 users")

MyService cannot HaveResponseTimeLessThan("1000 ms")

MyService can only HandleSOAPMessages()

…

Dicto looks like this. There are two types of statements:

1. entity definitions, used to identify concrete elements of the system
2. rules, which express the condition that we want to test through one of the supported tools
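The two statement types can be told apart mechanically. The sketch below is a guess at a minimal reader for the four example lines on this slide only: the regular expressions, the dictionary layout and the function name parse_statement are assumptions, and the real Dicto grammar may well differ.

```python
import re

# Entity definition, e.g.  MyService : Website with url="http://www.abc.com/api"
ENTITY = re.compile(r'^(\w+)\s*:\s*(\w+)\s+with\s+(\w+)\s*=\s*"([^"]*)"$')

# Rule, e.g.  MyService must HandleLoadFrom("10 users")
# Only the three modifiers that appear on the slide are recognised.
RULE = re.compile(r'^(\w+)\s+(must|cannot|can only)\s+(\w+)\((?:"([^"]*)")?\)$')


def parse_statement(line):
    """Classify one statement as an entity definition or a rule."""
    line = line.strip()
    match = ENTITY.match(line)
    if match:
        name, kind, attribute, value = match.groups()
        return {"type": "entity", "name": name, "kind": kind,
                "attributes": {attribute: value}}
    match = RULE.match(line)
    if match:
        subject, modifier, predicate, argument = match.groups()
        return {"type": "rule", "subject": subject, "modifier": modifier,
                "predicate": predicate, "argument": argument}
    raise ValueError(f"not a recognised statement: {line!r}")


statements = [
    'MyService : Website with url="http://www.abc.com/api"',
    'MyService must HandleLoadFrom("10 users")',
    'MyService cannot HaveResponseTimeLessThan("1000 ms")',
    'MyService can only HandleSOAPMessages()',
]
for statement in statements:
    print(parse_statement(statement))
```

Once statements are in this form, each rule's predicate name could be used to pick the external tool that checks it, which is the division of labour the note above describes.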

Rule Examples

Website response time

Website load testing

Dependencies

Code clones

Deadlock freeness

File Content grep

At the moment we have a working implementation that supports these kinds of rules …

Evaluation

Medium size company: various, Java EE / .NET; 100 employees

Open source project: LMS, PHP (1.8M LOC); 12 service providers, 900’000+ users

Large size company: B2B, Java EE (50K LOC); 1’000 employees

Conclusion

Current IDEs offer developers only primitive support for software assessment

Developers need support for moldable tools, agile modeling, and architectural monitoring