domain-specific development tools
Oscar Nierstrasz
Software Composition Group
scg.unibe.ch
Rennes — 2015-12-03
Domain-Specific Development Tools
Developers spend more time reading than writing code
Especially with OO, where the code does not reflect run-time behaviour well, developers spend more time reading than writing, both to understand the code and to understand the impact of a change. IDEs do not support this reading activity well, since they focus on programming-language concepts, like editing, compiling and debugging, rather than on architectural constraints or user features.
Roadmap
Agile Modeling
Moldable Tools
Architectural Monitoring
Moldable Tools
Build a new assessment tool in ten minutes
Challenge
Custom analyses require custom tools. Building a tool should be as easy as writing a query in SQL or a form-based interface.
Conventional debuggers just offer an interface to the run-time stack.
Specific Models
Mind the abstraction gap
Generic Debugger
Domain-specific Debuggers
The Moldable Debugger
[Class diagram: DebuggingWidget, DebuggingAction (*), ActivationPredicate]
Andrei Chis et al. The Moldable Debugger: A Framework for Developing Domain-Specific Debuggers. SLE 2014. DOI: 10.1007/978-3-319-11245-9_6
Classical development tools like browsers, debuggers and inspectors are generic and do not address the needs of specific domains. The Moldable Debugger can be easily adapted to different domains, such as event-driven computation, GUI construction and parser generation.
Moldable Tools
PetitParser
identifier
letter , (letter / digit) *
[Diagram: the parser object graph for identifier, with nodes labelled letter, *, ",", /, letter, digit]
PetitParser is a PEG-based framework for developing parsers composed of objects.
IdentifierParser new parse: 'aLong32Identifier'
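To make "parsers composed of objects" concrete, here is a minimal sketch of the identifier grammar as composable parsers. It is written in Python rather than Smalltalk and is not PetitParser's API; the names char_where, seq, choice, star and parse_identifier are invented for this illustration.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def char_where(predicate: Callable[[str], bool]) -> Parser:
    """Match a single character satisfying the predicate."""
    def parse(text, pos):
        if pos < len(text) and predicate(text[pos]):
            return pos + 1
        return None
    return parse

def seq(*parsers: Parser) -> Parser:
    """Sequence (',' in the grammar): every parser must match, in order."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers: Parser) -> Parser:
    """Ordered choice ('/' in the grammar): the first matching alternative wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def star(parser: Parser) -> Parser:
    """Repetition ('*' in the grammar): match zero or more times, greedily."""
    def parse(text, pos):
        while True:
            result = parser(text, pos)
            if result is None or result == pos:
                return pos
            pos = result
    return parse

letter = char_where(str.isalpha)
digit = char_where(str.isdigit)
# identifier = letter , (letter / digit) *
identifier = seq(letter, star(choice(letter, digit)))

def parse_identifier(text: str) -> Optional[str]:
    """Return the identifier at the start of text, or None if there is none."""
    end = identifier(text, 0)
    return None if end is None else text[:end]
```

Each grammar operator becomes an object (here, a closure) that can be inspected at run time, which is what a domain-specific debugger can exploit.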
The conventional debugger knows nothing about the parsing domain.
A moldable PP debugger knows which objects are parsers, knows where we are in the input, and can show us which parser object is currently active.
Domain-specific extensions
[Class diagram: DebuggingWidget, DebuggingView, DebuggingAction, DebuggingSession, DebuggingPredicate, PrimitivePredicate, HighLevelPredicate, ActivationPredicate]
Moldable debuggers are built up from debugging widgets and debugging actions. The moldable debugger uses activation predicates to know which debuggers can currently be activated, allowing the developer to switch between debuggers without starting a new session.
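The role of activation predicates can be sketched as follows. This is an illustrative Python model, not the Moldable Debugger's actual implementation; Frame, DomainDebugger, stack_contains and the class names "PPParser" and "TestCase" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Frame:
    """Hypothetical stand-in for one stack frame: the class of its receiver."""
    receiver_class: str

@dataclass
class DomainDebugger:
    """A debugger extension plus the predicate that decides when it applies."""
    name: str
    activation_predicate: Callable[[List[Frame]], bool]

def stack_contains(class_name: str) -> Callable[[List[Frame]], bool]:
    """Build a predicate that holds when some frame's receiver has this class."""
    return lambda stack: any(f.receiver_class == class_name for f in stack)

def available_debuggers(debuggers, stack):
    """Names of all debuggers whose activation predicate holds for this stack."""
    return [d.name for d in debuggers if d.activation_predicate(stack)]

# Assumed registry: the generic debugger is always applicable.
DEBUGGERS = [
    DomainDebugger("Generic", lambda stack: True),
    DomainDebugger("PetitParser", stack_contains("PPParser")),
    DomainDebugger("SUnit", stack_contains("TestCase")),
]
```

Because the predicates are evaluated against the live session, the list of applicable debuggers can be recomputed at every step, so switching debuggers does not require a new session.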
Next production
Next parser
Production(aproduction)
Next failure
Stream position(anInteger)
Stream position changed
Debugging widgets
Debugging actions
The parts from which the PP debugger is built.
PetitParser
Events
SUnit
Glamour
Moldable debuggers have been built for several different domains already.
New debuggers are cheap
Although some expertise is required to build a new debugger, the development effort for a new debugger is tiny.
The Moldable Inspector
The moldable inspector extends these ideas to object inspectors. Here is a moldable inspector for PostgreSQL databases. We are exploring other kinds of moldable tools …
Demo
Agile Modeling
Smalltalk
Navigation
Metrics
Querying
Grouping
Smalltalk
Java
C++
Python
…
Extensible meta model
Model repository
Moose is a powerful tool once we have a model …
Roassal
Orion, DSM, ..., BugMap
Nierstrasz et al. The Story of Moose. ESEC/FSE 2005. DOI: 10.1145/1095430.1081707
Moose is a platform for software and data analysis, but the bottleneck is the development of importers for different languages to the FAMIX metamodel. Development can take weeks or months.
Load the model in the morning, analyze it in the afternoon
Challenge
The key bottleneck to assessment is creating a suitable model for analysis. If a tool does not already exist, it can take days, weeks or months to parse source files and generate models.
Ideas
Grammar Stealing
Hooking into an existing tool
… of this phase will be a model of the Ruby software system. As the meta-model is FAME compliant, also the model will be. Information about the ClassLoader, an instance responsible for loading Java classes, is covered in section 4.7.
The Fame framework automatically extracts a model from an instance of an Eclipse AST. This instance corresponds to the instance of the Ruby plugin AST representing the software system. Automation is possible due to the fact that we defined the higher level mapping. Figure 2.1 reveals the need for the higher mapping to be restored. In order to implement the next phase independently from the environment used in this phase we extracted the model into an MSE file.
Figure 2.1: The dotted lines correspond to the extraction of a (meta-)model. The other arrows between the model and the software system hierarchy show which Java tower level corresponds to which meta-model tower element.
2.3 Model Mapping by Example phase
Our previously extracted model still contains platform dependent information and thus is not a domain specific model for reverse engineering. It could be used by very specific or very generic reverse engineering tools, as it contains the concrete syntax tree of the software system only. However such tools do not exist. In the Model Mapping by Example phase we want to transform the model into a FAMIX compliant one. With such a format it will be easier to use in several software engineering tools.
The idea behind this approach relies on Parsing by Example [3]. Parsing by Example presents a semi-automatic way of mapping source code to domain …
Recycling Trees
Parsing by Example
Evolutionary Grammar Generation
CHAPTER 3. GENETIC PROGRAMMING
Since biological evolution starts from an existing population of species, we need to bootstrap an initial population before we can begin evolving it. This initial population is generally a number of random individuals. These initial individuals usually don’t perform well, although some will already be a tad better than others. That is exactly what we need to get evolution going.
The final part is reproduction, i.e. to generate a new generation from the surviving previous generation. For that purpose an evolutionary algorithm usually uses two types of genetic operators: point mutation and crossover (We will refer to point mutations as mutations, although crossover is technically also a mutation). Mutations change an individual in a random location to alter it slightly, thus generating new information. Crossover¹ however, takes at least two individuals and cuts out part of one of them, to put it in the other individual(s). By only moving around information, Crossover does not introduce new information. Be aware that every modification of an individual has to result in a new individual that is valid. Validity is very dependent on the search space - it generally means that fitness function as well as the genetic operators should be applicable to a valid individual. A schematic view is shown in fig. 3.1.
[Flowchart: generate new random population → select most fit individuals → fit enough? → generate new population with genetic operators (mutation, crossover)]
Figure 3.1: Principles of an Evolutionary Algorithm
There are alternatives to rejecting a certain number of badly performing individuals per generation. To compute the new generation, one can generate new individuals from all individuals of the old generation. This would not result in an improvement since the selection is completely random. Hence the parent individuals are selected …
¹ Crossover in biology is the process of two parental chromosomes exchanging parts of genes in the meiosis (cell division for reproduction cells)
Grammar Stealing was introduced by Verhoef and Lämmel. We have tried the other approaches ourselves, but they all have limitations …
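The evolutionary loop described in the excerpt (random initial population, selection of the fittest, reproduction by mutation and crossover) can be sketched on a toy problem. The sketch below evolves bit lists rather than grammars; the function names, parameters and the bit-list encoding are assumptions chosen to keep the example short.

```python
import random

def mutate(individual, rng):
    """Point mutation: flip one randomly chosen bit (adds new information)."""
    child = list(individual)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child

def crossover(a, b, rng):
    """One-point crossover: head of one parent, tail of the other (moves information)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(fitness, genome_length=20, population_size=30, survivors=10,
           generations=50, target=None, seed=0):
    rng = random.Random(seed)
    # Bootstrap: a population of random individuals.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep only the fittest individuals as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:survivors]
        if target is not None and fitness(parents[0]) >= target:
            break  # fit enough
        # Reproduction: refill the population using the genetic operators.
        children = []
        while len(parents) + len(children) < population_size:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(parents), rng))
            else:
                children.append(crossover(rng.choice(parents), rng.choice(parents), rng))
        population = parents + children
    return max(population, key=fitness)
```

For example, evolve(sum) searches for a bit list with as many ones as possible. Both operators always produce a list of the same length, which is the validity requirement the excerpt mentions; for grammars, keeping individuals valid is considerably harder.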
Agile Modeling Lifecycle
Build a coarse model
Build a custom analysis
Refine the model
We don’t need a full parser; just enough to start analysis. Then we can refine the model to extract more details.
Idea: use island grammars to extract coarse models
'class' ID (method / . {avoid})* 'end'
[Diagram labels: method?, method, . {avoid}]
class Shape
  int x;
  int y;
  method draw() … end
end
method main() … end
Island grammars allow us to extract just parts of the information from source code that interest us at the moment.
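A minimal sketch of the idea, in Python and on a token list, assuming the toy language of the slide (classes and methods closed by 'end', no nested 'end' inside method bodies). The function names and the dictionary-based model are invented for this illustration.

```python
import re

def tokenize(source):
    """Split source into identifiers/keywords and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\S", source)

def parse_method(tokens, i):
    """Island: 'method' ID <water> 'end'. Returns (name, next_index) or None."""
    if i + 1 < len(tokens) and tokens[i] == "method":
        name = tokens[i + 1]
        j = i + 2
        while j < len(tokens) and tokens[j] != "end":
            j += 1  # water inside the method body
        if j < len(tokens):
            return name, j + 1
    return None

def parse_class(tokens, i):
    """'class' ID (method / water)* 'end'. Returns (model, next_index) or None."""
    if not (i + 1 < len(tokens) and tokens[i] == "class"):
        return None
    model = {"class": tokens[i + 1], "methods": []}
    i += 2
    # The hand-written water rule: skip any token except 'end'.
    while i < len(tokens) and tokens[i] != "end":
        method = parse_method(tokens, i)
        if method:
            model["methods"].append(method[0])
            i = method[1]
        else:
            i += 1  # water: a token that is not part of any island
    return (model, i + 1) if i < len(tokens) else None

def extract_classes(source):
    """Coarse model: every class found in the source, with its method names."""
    tokens, i, classes = tokenize(source), 0, []
    while i < len(tokens):
        result = parse_class(tokens, i)
        if result:
            classes.append(result[0])
            i = result[1]
        else:
            i += 1
    return classes
```

Fields such as int x; are never parsed: they are water. Note that the water rule had to be told about 'end' explicitly, which is exactly the kind of coupling between water and islands that makes such grammars fragile.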
Problem: island grammars lead to shipwrecks
class Shape
method end
'class' ID (method / !'end' !method)* 'end'
method?
Tweaking island grammars till they work is not an option …
Unfortunately island grammars are very difficult to get right because the rules for water depend on the islands. If the islands of interest change, the water must change too.
A Bounded Sea searches for an island in a bounded scope
'class' ID (~method~)* 'end'
[Diagram labels: method?, ~method~, method, ~method~]
Bounded seas essentially eliminate the need to write special rules for water, since the boundaries are inferred from the islands. We are now starting to explore how well this works for real languages …
Jan Kurš, et al. Bounded Seas. Computer Languages, Systems & Structures 44, 2015. DOI: 10.1016/j.cl.2015.08.002
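A simplified sketch of a bounded sea on a token list. Two simplifications are assumed here: the boundary is passed in explicitly, whereas in Bounded Seas it is inferred from what follows the sea in the grammar, and only the water before the island is skipped. The names literal, method, sea and methods_in_class are invented for this illustration.

```python
def literal(word):
    """Parser matching one exact token; returns the next index or None."""
    def parse(tokens, i):
        return i + 1 if i < len(tokens) and tokens[i] == word else None
    return parse

def method(tokens, i):
    """Island: 'method' ID 'end' (bodies are omitted to keep the sketch short)."""
    if tokens[i:i + 1] == ["method"] and tokens[i + 2:i + 3] == ["end"]:
        return i + 3
    return None

def sea(island, boundary):
    """~island~: skip water until the island matches, but never past the boundary."""
    def parse(tokens, i):
        while i < len(tokens):
            end = island(tokens, i)
            if end is not None:
                return end
            if boundary(tokens, i) is not None:
                return None  # reached the boundary: no island in this scope
            i += 1           # water
        return None
    return parse

def methods_in_class(tokens):
    """'class' ID (~method~)* 'end': count the methods inside the first class."""
    if tokens[:1] != ["class"]:
        return None
    bounded_method = sea(method, literal("end"))
    i, count = 2, 0
    while True:
        nxt = bounded_method(tokens, i)
        if nxt is None:
            return count
        i, count = nxt, count + 1
```

With the input class Empty end method main end, the search for a method inside the class stops at the class's 'end', so main is not mistaken for a method of Empty. No separate water rule is written; the sea is defined only by its island and its boundary.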
Further experiments …
Keyword heuristics
Exploit structure
Classify languages
Architectural Monitoring
Challenge
“What will my code change impact?”
Large software systems are so complex that one can never be sure until integration whether certain changes can have catastrophic effects at a distance.
Ideas: Tracking Software Architecture; exploiting Big Software Data
Problems
Diverse views of SA
SA is not in the code
What is SA in the Wild?
Andrea Caracciolo, et al. How Do Software Architects Specify and Validate Quality Requirements? Software Architecture 2014. DOI: 10.1007/978-3-319-09970-5_32
The theory seems to suggest that SA is mainly about structure and dependencies. Our experience with actual projects suggested that the truth might be different. We carried out a couple of empirical studies, first a qualitative one to understand what is SA in the wild, and then a second, quantitative one to see to what extent various kinds of constraints appear in practice.
Impact of SA constraints
Constraint                 Impact (1-5)
availability               4.2
response-time              4.0
authorization              3.9
authentication             3.6
communication              3.4
throughput                 3.4
signature                  3.4
software infrastructure    3.3
data integrity             3.3
recoverability             3.1
dependencies               3.1
visual design              3.0
data retention policy      3.0
hardware infrastructure    2.9
system behavior            2.9
data structure             2.9
event handling             2.9
code metrics               2.7
meta-annotation            2.6
naming conventions         2.6
file location              2.5
accessibility              2.5
software update            2.2
In the quantitative study we asked developers how important different kinds of architectural constraints were for their projects. Interestingly, in the top ten, there were significantly more user constraints, like availability (in green) than developer constraints (in blue). Dependencies were only halfway down the list.
Automated Validation is not Prevalent
[Bar chart, axis 0% to 100%, one bar per constraint: naming conventions, file location, hardware infrastructure, software update, recoverability, dependencies, signature, software infrastructure, data structure, event handling, availability, communication, accessibility, meta-annotation, code quality, visual design, data integrity, authentication, data retention policy, response-time, throughput, authorization]
Avg: 40%
As we see, on average, quality requirements (QRs) are automatically tested only 40% of the time.
Formalization is not Prevalent
[Bar chart, axis 0% to 100%, one bar per constraint: software update, hardware infrastructure, accessibility, recoverability, software infrastructure, authentication, data retention policy, throughput, response-time, availability, file location, code metrics, visual design, communication, data integrity, authorization, event handling, naming conventions, meta-annotation, data structure, signature, dependencies]
Avg: 20%
Formalisms: ER, UML + profile; Regex, BNF; annotations; …
On average, QRs are formally specified only 20% of the time. Practitioners use different formalisms, ranging from UML + profile to regular expressions.
Architectural Rules
Naming Conventions: “Repository interfaces can only declare methods named find..()”
Dependencies: “Only Service classes are allowed to throw AppException”
Performance: “The rendering operation has to be completed in less than 4ms”
AC: “One year ago I had the chance to talk to various professionals working in the area where I study. What I noticed was that part of an architectural specification consists of constraints and guidelines on how a system should behave and be implemented”
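A rule like the naming convention on this slide can be checked mechanically. The following is a minimal sketch in Python, assuming the rule is read as "every public method a repository declares must start with find"; naming_violations and CustomerRepository are hypothetical names, and this is not how any particular validation tool works.

```python
import inspect

def naming_violations(cls, required_prefix="find"):
    """Names of public methods declared by cls that lack the required prefix."""
    return sorted(
        name for name, member in vars(cls).items()
        if inspect.isfunction(member)
        and not name.startswith("_")
        and not name.startswith(required_prefix)
    )

class CustomerRepository:  # hypothetical repository interface
    def find_by_id(self, customer_id): ...
    def find_all(self): ...
    def delete(self, customer_id): ...  # violates the naming rule
```

Calling naming_violations(CustomerRepository) reports the offending method names, which could then be turned into a failing test. Dependency and performance rules need entirely different checkers, which is why rule validation in practice ends up spread across many tools.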
Rule Validation
xml
java
uml
Limited functionality
Poor usability
What we typically do is test each rule with the tool best suited to it.
Dicto — a unified ADSL
Andrea Caracciolo, et al. Dicto: A Unified DSL for Testing Architectural Rules. ECSAW '14. DOI: 10.1145/2642803.2642824
We have a single, unified specification language that can be used to define a wide range of rules and that exploits off-the-shelf tools to verify them. We call this language Dicto.
Dicto Rules
…
MyService : Website with url="http://www.abc.com/api"
MyService must HandleLoadFrom("10 users")
MyService cannot HaveResponseTimeLessThan("1000 ms")
MyService can only HandleSOAPMessages()
…
Dicto looks like this. There are two types of statements:
1. entity definitions, used to identify concrete elements of the system
2. rules, which express the condition that we want to test through one of the supported tools
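To illustrate the two statement types, here is a small Python sketch that splits statements of the shape shown on this slide into their parts. The patterns are inferred only from the four example lines and are not the real Dicto grammar; parse_statement and the dictionary keys are invented for this illustration.

```python
import re

# Entity definition: <name> : <kind> with <key>="<value>"
ENTITY = re.compile(r'^(\w+)\s*:\s*(\w+)\s+with\s+(\w+)\s*=\s*"([^"]*)"$')
# Rule: <subject> <modality> <Predicate>("<optional argument>")
RULE = re.compile(r'^(\w+)\s+(must|cannot|can only)\s+(\w+)\((?:"([^"]*)")?\)$')

def parse_statement(line):
    """Classify one statement as an entity definition or a rule."""
    # Normalise typographic quotes, which slides tend to introduce.
    line = line.strip().replace("“", '"').replace("”", '"')
    m = ENTITY.match(line)
    if m:
        name, kind, key, value = m.groups()
        return {"type": "entity", "name": name, "kind": kind,
                "attributes": {key: value}}
    m = RULE.match(line)
    if m:
        subject, modality, predicate, argument = m.groups()
        return {"type": "rule", "subject": subject, "modality": modality,
                "predicate": predicate, "argument": argument}
    raise ValueError(f"not a recognised statement: {line!r}")
```

In such a design, the predicate name (for example HandleLoadFrom) would select the off-the-shelf tool that evaluates the rule, and the entity definition would supply that tool's inputs.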
Rule Examples
Website response time
Website load testing
Dependencies
Code clones
Deadlock freeness
File Content grep
At the moment we have a working implementation that supports these kinds of rules …
Evaluation
Medium size company: various - Java EE / .NET; 100 employees
Open source project: LMS - PHP (1.8M LOC); 12 service providers, 900’000+ users
Large size company: B2B - Java EE (50K LOC); 1’000 employees
Conclusion
Current IDEs offer developers only primitive support for software assessment
Developers need support for moldable tools, agile modeling, and architectural monitoring