Hands-on Refactoring with Wrangler
Simon Thompson, Huiqing Li, Xingdong Bian
University of Kent
Overview
What is refactoring?
Examples
The process of refactoring
Tool building and infrastructure
What is in Wrangler … demo
Latest advances: data, processes, erlide.
Introducing refactoring
Soft-ware
There’s no single correct design …
… different options for different situations.
Maintain flexibility as the system evolves.
Refactoring
Refactoring means changing the design or structure of a program … without changing its behaviour.
Refactor
Modify
Examples
Generalisation
-module(test).
-export([f/1]).
add_one([H|T]) -> [H+1 | add_one(T)];
add_one([]) -> [].
f(X) -> add_one(X).
-module(test).
-export([f/1]).
add_one(N, [H|T]) -> [H+N | add_one(N,T)];
add_one(N, []) -> [].
f(X) -> add_one(1, X).
-module(test).
-export([f/1]).
add_int(N, [H|T]) -> [H+N | add_int(N,T)];
add_int(N, []) -> [].
f(X) -> add_int(1, X).
Generalisation and renaming
Generalisation
-export([printList/1]).
printList([H|T]) -> io:format("~p\n",[H]), printList(T);
printList([]) -> true.
printList([1,2,3])
-export([printList/2]).
printList(F,[H|T]) -> F(H), printList(F, T);
printList(F,[]) -> true.
printList( fun(H) -> io:format("~p\n", [H]) end, [1,2,3]).
Generalisation
-export([printList/1]).
printList([H|T]) -> io:format("~p\n",[H]), printList(T);
printList([]) -> true.
-export([printList/1]).
printList(F,[H|T]) -> F(H), printList(F, T);
printList(F,[]) -> true.
printList(L) -> printList( fun(H) -> io:format("~p\n", [H]) end, L).
Asynchronous to synchronous
Pid ! {self(), msg}
{Parent, msg} -> body
Pid ! {self(), msg}, receive {Pid, ok} -> ok end
{Parent, msg} -> Parent ! {self(), ok}, body
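The synchronous form can be sketched as a small runnable module. This is only an illustration under assumptions: the module name sync_sketch, the helper handle/1 (standing in for "body") and the one-second timeout are not from the slides.

```erlang
-module(sync_sketch).
-export([start/0, call/2]).

%% Hypothetical server process for the sketch.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {Parent, Msg} when is_pid(Parent) ->
            Parent ! {self(), ok},   % acknowledgement added by the refactoring
            handle(Msg),             % stands for "body" on the slide
            loop()
    end.

%% Placeholder for the server's real work (an assumption).
handle(_Msg) -> ok.

%% Synchronous send: block until this particular server acknowledges.
%% The timeout is an addition so that the sketch cannot hang.
call(Pid, Msg) ->
    Pid ! {self(), Msg},
    receive
        {Pid, ok} -> ok
    after 1000 ->
        {error, timeout}
    end.
```

Matching on {Pid, ok} rather than a bare ok means that only an acknowledgement from the server just contacted is accepted.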
Refactoring
Refactoring = Transformation + Condition
Transformation
Ensure change at all those points needed.
Ensure change at only those points needed.
Condition
Is the refactoring applicable?
Will it preserve the semantics of the module? the program?
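A small sketch of a transformation whose condition fails, using variable renaming. The module and function names are hypothetical. In Erlang, renaming a variable to a name that is already bound turns a binding into a match, so the behaviour changes.

```erlang
-module(rename_sketch).
-export([original/1, captured/1]).

%% Original: Y and Z are distinct variables, each bound once.
original(X) ->
    Y = X + 1,
    Z = X * 2,
    {Y, Z}.

%% Naive renaming of Z to Y without checking the condition:
%% the second line is now a match against the existing Y,
%% so it raises badmatch unless X + 1 =:= X * 2.
captured(X) ->
    Y = X + 1,
    Y = X * 2,
    {Y, Y}.
```

A tool that checks the condition first would reject this renaming instead of producing captured/1.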
Transformations
full stop one
Condition > Transformation
Renaming an identifier
"The existing binding structure should not be affected. No binding for the new name may intervene between the binding of the old name and any of its uses, since the renamed identifier would be captured by the renaming. Conversely, the binding to be renamed must not intervene between bindings and uses of the new name."
Which refactoring exactly?
Generalise f by making 23 a parameter of f:
f(X) -> Con = 23, g(X) + Con + 23.
• This one occurrence?
• All occurrences (in the body)?
• Some of the occurrences … to be selected.
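The first two choices give different functions, as the following sketch shows. The helper g/1 is not defined on the slide, so a placeholder is assumed here, and the names f_one and f_all are hypothetical.

```erlang
-module(gen_sketch).
-export([f/1, f_one/2, f_all/2]).

%% Placeholder for g/1 (an assumption; the slide does not define it).
g(X) -> X * 2.

%% Original function from the slide.
f(X) ->
    Con = 23,
    g(X) + Con + 23.

%% Generalise over one selected occurrence: the literal in the sum.
f_one(N, X) ->
    Con = 23,
    g(X) + Con + N.

%% Generalise over all occurrences of 23 in the body.
f_all(N, X) ->
    Con = N,
    g(X) + Con + N.
```

Both versions agree with f/1 when called with 23, but they differ for any other argument, so the tool has to ask which occurrences are meant.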
Compensate or crash?
-export([oldFun/1, newFun/1]).
oldFun(L) -> newFun(L).
newFun(L) -> … … .
or ?
-export([newFun/1]).
newFun(L) -> … … .
Refactoring tools
Tool support
Bureaucratic and diffuse.
Tedious and error prone.
Semantics: scopes, types, modules, …
Undo/redo
Enhanced creativity
Semantic analysis
Binding structure
• Dynamic atom creation, multiple binding occurrences, pattern semantics etc.
Module structure and projects
• No explicit projects for Erlang; cf Erlide / Emacs.
Type and effect information
• Need effect information for e.g. generalisation.
Erlang refactoring: challenges
Multiple binding occurrences of variables.
Indirect function call or function spawn: apply (lists, rev, [[a,b,c]])
Multiple arities … multiple functions: rev/1
Concurrency
Refactoring within a design library: OTP.
Side-effects.
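A sketch of why indirect calls and multiple arities complicate renaming. The module name and the functions direct/0, via_apply/0 and via_atom/0 are hypothetical; rev/1 and rev/2 are defined locally for the example.

```erlang
-module(indirect_sketch).
-export([rev/1, rev/2, direct/0, via_apply/0, via_atom/0]).

%% Same name, different arities: these are two distinct functions.
rev(L) -> rev(L, []).

rev([H|T], Acc) -> rev(T, [H|Acc]);
rev([], Acc) -> Acc.

%% Direct call: name and arity are visible in the syntax.
direct() -> rev([a,b,c]).

%% Indirect call: the function name is an ordinary atom argument,
%% and the arity is only implied by the length of the argument list.
via_apply() -> apply(?MODULE, rev, [[a,b,c]]).

%% The atom is manufactured at run time, so a static search
%% for the atom rev in the source text does not find this use.
via_atom() ->
    F = list_to_atom("re" ++ "v"),
    apply(?MODULE, F, [[a,b,c]]).
```

Renaming rev/1 has to update direct/0 and via_apply/0 but leave rev/2 alone, and via_atom/0 cannot in general be handled by a static tool.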
Static vs dynamic
Aim to check conditions statically.
Static analysis tools possible … but some aspects intractable: e.g. dynamically manufactured atoms.
Conservative vs liberal.
Compensation?
Architecture of Wrangler
Wrangler in Emacs
Refactorings in Wrangler
• Renaming variable, function, module, process
• Function generalisation
• Move function between modules.
• Function extraction
• Fold against definition
• Introduce and fold against macros.
• Tuple function arguments together
• Register a process
• From function to process
• Add a tag to messages
All these refactorings work across multiple-module projects and respect macro definitions.
Wrangler demo
Tool building
Wrangler and RefactorErl
Lightweight.
Better integration with interactive tools (e.g. emacs).
Undo/redo external?
Ease of implementing conditions.
Higher entry cost.
Better for a series of refactorings on a large project.
Transaction support.
Ease of implementing transformations.
Duplicate Code Detection
Especially for Erlang/OTP programs.
Report syntactically well-formed code fragments that are identical after consistent renaming of variables …
… ignoring differences in literals and layout.
Integrated with the refactoring environment.
Code Inspection Support
• Variable use/binding information.
• Caller functions.
• Caller/callee modules.
• Case/if/receive expressions nested more than a specified level.
• Long function/modules.
• Non tail-recursive servers.
• Non-flushed unknown messages
• . . .
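Two of the inspections above, non tail-recursive servers and non-flushed unknown messages, can be illustrated with a sketch. The module, the message formats and the function names are assumptions made for the example, not code reported by Wrangler.

```erlang
-module(server_sketch).
-export([start/0, loop_bad/1, loop_good/1, value/1]).

start() -> spawn(?MODULE, loop_good, [0]).

%% Flagged style: the recursive call is not the last expression,
%% so the stack grows with every message handled. There is also
%% no catch-all clause, so unknown messages pile up in the mailbox.
loop_bad(N) ->
    receive
        {inc, K} ->
            loop_bad(N + K),
            ok
    end.

%% Preferred style: every clause ends in a tail call, and the
%% final clause flushes messages the server does not understand.
loop_good(N) ->
    receive
        {inc, K} -> loop_good(N + K);
        {value, From} -> From ! {self(), N}, loop_good(N);
        _Unknown -> loop_good(N)
    end.

%% Client helper; the timeout is there so the sketch cannot hang.
value(Pid) ->
    Pid ! {value, self()},
    receive
        {Pid, N} -> N
    after 1000 ->
        timeout
    end.
```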
Integration … with IDEs
Back to the future? Programmers' preference for emacs and gvim …
… though some IDE interest: Eclipse, NetBeans …
Issue of integration with multiple IDEs: building common interfaces.
Integration … with tools
Test data sets and test generation.
Makefiles, etc.
Working with macros e.g. QuickCheck uses Erlang macros …
… in a particular idiom.
APIs … programmer / user
API in Erlang to support user-programmed refactorings:
• declarative, straightforward and complete
• but relatively low-level.
Higher-level combining forms?
• OK for transformations, but need a separate condition language.
Verification and validation
Possible to write formal proofs of correctness:
• check conditions and transformations
• different levels of abstraction
• possibly name-binding substitution for renaming etc.
• more abstract formulation for e.g. data type changes.
Use of Quviq QuickCheck to verify refactorings in Wrangler.
Clone detection
The Wrangler Clone Detector
Uses syntactic and static semantic information.
Syntactically well-formed code fragments
… identical after consistent renaming of variables,
… with variations in literals, layout and comments.
Integrated within the refactoring environment.
The Wrangler Clone Detector
Make use of token stream and annotated AST.
Token-based approaches
• Efficient.
• Report non-syntactic clones.
AST-based approaches.
• Report syntactic clones.
• Checking for consistent renaming is easier.
The Wrangler Clone Detector
Source Files
→ Tokenisation → Token Stream
→ Normalisation → Normalised Token Stream
→ Suffix Tree Construction → Suffix tree
→ Clone Collector → Initial Clones
→ Clone Filter → Filtered Initial Clones
→ Clone Decomposition (with Annotated ASTs from Parsing + Static Analysis) → Syntactic Clones
→ Consistent Renaming Checking → Clones to report
→ Formatting → Reported Code Clones
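The tokenisation and normalisation stages can be sketched with the standard erl_scan module. This is an illustration of the idea only, not Wrangler's implementation: the module name, the placeholder atoms var and lit, and the function same_shape/2 are assumptions, and the suffix tree and consistent renaming stages are not covered.

```erlang
-module(clone_norm_sketch).
-export([normalise/1, same_shape/2]).

%% Tokenise a code fragment, then replace every variable and
%% every literal by a placeholder and drop position information.
normalise(Source) ->
    {ok, Tokens, _End} = erl_scan:string(Source),
    [norm(T) || T <- Tokens].

norm({var, _, _})     -> var;
norm({integer, _, _}) -> lit;
norm({float, _, _})   -> lit;
norm({string, _, _})  -> lit;
norm({char, _, _})    -> lit;
norm({atom, _, A})    -> {atom, A};   % function and atom names are kept
norm({Sym, _})        -> Sym;         % punctuation and keywords
norm(Other)           -> element(1, Other).

%% Two fragments are clone candidates if their normalised
%% token streams are equal.
same_shape(A, B) ->
    normalise(A) =:= normalise(B).
```

Equal normalised streams only make two fragments candidates. For instance "A = foo(A)." and "X = foo(Y)." have the same shape here, which is why a later consistent renaming check is needed.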
Clone detection demo
Support for clone removal
Refactorings to support clone removal.
Function extraction.
Generalise a function definition.
Fold against a function definition.
Move a function between modules.
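How these refactorings combine to remove a clone can be sketched on a made-up example. All names below are hypothetical. The two "before" functions differ only in a literal; extraction plus generalisation yields greet/2, and each clone instance is then folded against it.

```erlang
-module(clone_removal_sketch).
-export([greet_en/1, greet_fr/1, greet_en2/1, greet_fr2/1]).

%% Before: two fragments that are identical up to a literal.
greet_en(Names) -> ["Hello, " ++ N || N <- Names].
greet_fr(Names) -> ["Bonjour, " ++ N || N <- Names].

%% After function extraction and generalisation over the literal.
greet(Prefix, Names) -> [Prefix ++ N || N <- Names].

%% After folding each clone instance against greet/2.
greet_en2(Names) -> greet("Hello, ", Names).
greet_fr2(Names) -> greet("Bonjour, ", Names).
```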
Case studies
Applied the clone detector to Wrangler itself with thresholds of 30 (minimum clone size in tokens) and 2 (minimum number of duplicates).
36 final clone classes were reported … 12 are across modules, and 3 are duplicated function definitions.
Without syntactic checking and consistent variable renaming checking, 191 would have been reported.
Applied to a third-party code base (32k loc, 89 modules), 109 clone classes reported.
Data-oriented refactorings
-module(tup1).
-export([gcd/1]).
gcd({X,Y}) ->
    if X>Y -> gcd({X-Y,Y});
       Y>X -> gcd({Y-X,X});
       true -> X
    end.
Tupling parameters
-module(tup1).
-export([gcd/2]).
gcd(X,Y) ->
    if X>Y -> gcd(X-Y,Y);
       Y>X -> gcd(Y-X,X);
       true -> X
    end.
2
-module(rec1).
-record(rec,{f1, f2}).
g(#rec{f1=A, f2=B})-> A + B.
h(X, Y)-> g(#rec{f1=X,f2=X}), g(#rec{ f1=element(1,Y), f2=element(2,Y)}).
Introduce records …
-module(rec1).
g({A, B})-> A + B.
h(X, Y)-> g({X, X}), g(Y).
f1 f2
Introduce records in a project
Need to replace other expressions …
• Replace tuples with record
• Record update expression
• Record access expression
Chase dependencies across functions … and across modules.
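A sketch of the kinds of expression that have to be replaced, reusing the record rec with fields f1 and f2 from the earlier slide. The module and function names are hypothetical.

```erlang
-module(rec_sketch).
-export([bump_tuple/1, new/2, bump_rec/1, fields/1]).
-record(rec, {f1, f2}).

%% Before: the pair is a plain tuple, accessed by position.
bump_tuple(P) ->
    setelement(1, P, element(1, P) + 1).

%% After: tuple construction becomes a record expression ...
new(X, Y) -> #rec{f1 = X, f2 = Y}.

%% ... setelement/element become record update and record access ...
bump_rec(R) ->
    R#rec{f1 = R#rec.f1 + 1}.

%% ... and tuple patterns become record patterns.
fields(#rec{f1 = A, f2 = B}) -> {A, B}.
```

Every function that receives or returns the value has to change together, which is why the dependencies must be chased across functions and modules.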
Refactoring and Concurrency
Wrangler and processes
Refactorings which address processes
• Register a process.
• Rename a registered process.
• From function to process.
• Add tags to messages sent / received.
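As an illustration of the first of these, a before and after sketch of registering a process. The module, the registered name echo_server and the message formats are assumptions made for the example.

```erlang
-module(reg_sketch).
-export([start_before/0, ping_before/1, start_after/0, ping_after/0]).

loop() ->
    receive
        {From, ping} -> From ! pong, loop();
        stop -> ok
    end.

%% Before: every client has to be handed the pid.
start_before() -> spawn(fun loop/0).

ping_before(Pid) ->
    Pid ! {self(), ping},
    receive pong -> pong after 1000 -> timeout end.

%% After "register a process": the process is registered under a
%% name, and sends to the pid are replaced by sends to that name.
start_after() ->
    Pid = spawn(fun loop/0),
    true = register(echo_server, Pid),
    Pid.

ping_after() ->
    echo_server ! {self(), ping},
    receive pong -> pong after 1000 -> timeout end.
```

The tool has to find every place the pid flows to before it can replace it by the name, which is where the implicit pid-to-process links become a problem.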
Challenges to implementation
Data gathering is a challenge because
• Processes are syntactically implicit.
• Pid to process links are implicit.
• Communication structure is implicit.
• Side effects.
Underlying analysis
Analyses include
• Annotation of the AST, using call graph.
• Forward program slicing.
• Backwards program slicing.
Wrangler and Erlide
Erlide is an Eclipse plugin for Erlang.
• Distribution simplified.
• Integration with the edit undo history.
• Notion of project.
• Refactoring API in the Eclipse LTK.
Ongoing support for Erlide from Ericsson.
Issues on integration
LTK has a fixed workflow for interactions.
• New file vs set of diffs as representation.
• Fold and generalise interaction pattern.
• Cannot support rename / create file.
Other refactorings involve search … a different API.
Conclusions
Future work
Concurrency: continue work.
Refactoring within a design library: OTP.
Working with Erlang Training and Consulting.
Continue integration with Eclipse + other IDEs.
Test and property refactoring in .
Clone detection: fuller integration.
Acknowledgements
Wrangler development funded by EPSRC.
The developers of syntax-tools, distel and Erlide.
George Orosz and Melinda Toth.
Zoltan Horvath and the RefactorErl group at Eotvos Lorand Univ., Budapest.
Property discovery in Wrangler
Clone detection … and elimination.
Find code that is similar … common abstraction … accumulate the instances.
Examples:
Test code from Ericsson: different medium and codec.
Clone removal example: 2.6k to 2.0k and counting.
Other Wrangler developments
Fully integrated into Eclipse … keeps the reviewers happy!
User experience: preview the changes, code inspector.
Respecting test code in e.g. EUnit.
Multi-version: Erlang, OS, Java, Eclipse.
Windows installer.
Next steps
Refine the notion of similarity … to take account of insert/delete in sequences of commands.
Support property extraction from 'free' and EUnit tests.
Refactorings of tests and properties themselves.
Further integration into Erlide: allow use of the contextual menu.
Case study with Lambda Stream.
Case Studies
Applied the clone detector to Wrangler itself and other Erlang applications with the thresholds of 30 for the minimum size of the clone (in tokens) and 2 for the minimum number of duplicates.
                      Wrangler   Mnesia   Yaws
No. of files          44         38       68
Size (K Loc)          30.9       28.2     26.9
Time (Min)            <6         <3       <3
No. Clones            53         43       66
Inter-module clones   35         5        18
Clearly a clone
From the Dialyzer user interface.
Less clearly worth replacing
OkButton = gs:button(WinPacker, [{label, {text, "Ok"}}, {pack_xy, {2,3}}]),
CancelButton = gs:button(WinPacker, [{label, {text, "Cancel"}}, {pack_xy, {3,3}}]),
Also from the dialyzer GUI … would it be clearer to have an intervening common function call?
Related Work
Existing clone detection approaches:
Program text-based.
Token-based.
AST-based.
PDG-based.
Hybrid approaches.
Language dependent or independent?
Future Work
Use visualization techniques to improve the presentation of clone results.
Extend the current approach to find “similar” code fragments.
How to automate or semi-automate the work-flow of clone detection and removal.
Conclusions
The Wrangler clone detector
- Relatively efficient
- No false positives
Refactorings support interactive removal of clones.
Integrated in the development environment.
Questions?
http://www.protest-project.eu
http://www.cs.kent.ac.uk/projects/wrangler/
Installation: Mac OS X and Linux
Requires R11B-5, 12B, 13B + Emacs
Download Wrangler from http://www.cs.kent.ac.uk/projects/wrangler/
./configure, make, sudo make install
Add to .emacs file:
(add-to-list 'load-path "/usr/local/share/wrangler/elisp")
(require 'wrangler)
Installation: Windows
Requires R11B-5, 12B, 13B + Emacs
Download installer from http://www.cs.kent.ac.uk/projects/wrangler/
Requires no other actions.
Installation: Eclipse + ErlIDE
Requires Erlang R11B-5 or later; install it if it isn't already present on your system.
On Windows systems, use a path with no spaces in it.
Install Eclipse 3.4, if you haven't already.
All the details at
http://erlide.sourceforge.net/
Starting Wrangler in Emacs
Open emacs, and open a .erl file.
M-x erlang-refactor-on or ... C-c, C-r
New menus: Refactor and Inspector
Customise for dir
Undo C-c, C-_
Preview Feature
Preview changes before confirming the change
Emacs ediff is used.
Stopping Wrangler in Emacs
M-x erlang-refactor-off to stop Wrangler
Shortcut C-c, C-r
Hands On
Check out sample code from: svn co https://svn.erlangsystems.com/protest/branches/refa
Or use your own project code
Feedback: [email protected] [email protected]