Department of Computer Science
A Static Program Analyzer to increase software reuse
Ramakrishnan Venkitaraman and Gopal Gupta
Why do we need a software standard?
Lack of software reuse because of a lack of software standards
Non-availability of a rich set of COTS components
Time to market for new products measured in years rather than months
Incompatibilities make integrating software from multiple vendors impossible
The discussion focuses on DSP software, but the same problems arise in software development in general
TI TMS320 DSP Algorithm Standard
Contains 34 rules and 15 guidelines
Intended to enable a rich COTS marketplace and to significantly reduce the time to market for new products
Will allow system integrators to combine compliant algorithms from multiple vendors into a single system
Reduces time to market and increases software quality and software reuse
General Programming Rules
No tool currently exists to check for compliance
Programs must be relocatable:
No hard coded data memory locations
No hard coded program memory locations
Programs must be reusable:
Algorithms must be re-entrant
Hard Coded Addresses
Generally bad programming practice, except when writing device drivers
Results in non-relocatable code
Results in non-reusable code
A pointer variable is said to be NOT hard coded if:
a) the address is derived from a call to a memory allocation routine such as “malloc” or “calloc”
b) the address is derived as a function of the “stack pointer”
c) the address is derived from another pointer that is legitimate
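The three conditions can be illustrated at the C level. This is a minimal sketch, not code from the standard: `FIXED_ADDR` and all function names are hypothetical. Only `read_hard_coded` would be flagged; it is never called here, because dereferencing an arbitrary literal address on a desktop host would crash.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical literal address, used only to illustrate a violation. */
#define FIXED_ADDR ((int *)0x00008000)

/* NOT compliant: the pointer is a hard coded data memory location. */
int read_hard_coded(void)
{
    int *p = FIXED_ADDR;
    return *p;
}

/* a) Legitimate: the address comes from malloc. */
int read_heap(void)
{
    int *p = malloc(4 * sizeof *p);
    if (p == NULL)
        return -1;
    p[0] = 7;
    int v = *p;
    free(p);
    return v;
}

/* b) Legitimate: a local variable's address is typically stack-pointer relative. */
int read_stack(void)
{
    int local = 11;
    int *p = &local;
    return *p;
}

/* c) Legitimate: q is derived from another legitimate pointer (base). */
int read_derived(void)
{
    int buf[4] = {1, 2, 3, 4};
    int *base = buf;      /* stack-derived, so legitimate */
    int *q = base + 2;    /* derived from base, so also legitimate */
    assert(q >= buf && q < buf + 4);
    return *q;
}
```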
Static Program Analysis
Static program analysis (or static analysis for brevity) is defined as any analysis of a program carried out without completely executing the program
The traditional data-flow analysis found in compiler back-ends is an example of static analysis
Another example of static analysis is abstract interpretation, in which a program's data and operations are approximated and the program abstractly executed
Basic Blocks and Flow Graphs
A “Basic Block” is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halting or possibility of branching except at the end.
The basic blocks form the nodes of a directed graph called the “Control Flow Graph”. This graph lets us visualize and enumerate all possible paths along which program control could flow at runtime. All such paths must be analyzed for compliance.
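Basic blocks are commonly found with the classic "leader" rule: the first instruction, every branch target, and every instruction right after a branch or return starts a new block. The sketch below applies that rule to a hypothetical four-kind instruction set (`PLAIN`, `JUMP`, `CJUMP`, `RET`) standing in for real DSP disassembly, then adds fall-through and branch edges to form the flow graph. Array sizes and names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy instruction set; the real tool works on DSP disassembly. */
typedef enum { PLAIN, JUMP, CJUMP, RET } Kind;   /* CJUMP = conditional branch */
typedef struct { Kind kind; int target; } Insn;  /* target = index of branch destination */

#define MAXN 64

/* Leaders: the first instruction, every branch target, and every
   instruction that immediately follows a branch or return. */
static void find_leaders(const Insn *code, int n, bool leader[])
{
    for (int i = 0; i < n; i++)
        leader[i] = (i == 0);
    for (int i = 0; i < n; i++) {
        if (code[i].kind == JUMP || code[i].kind == CJUMP) {
            assert(code[i].target >= 0 && code[i].target < n);
            leader[code[i].target] = true;
        }
        if (code[i].kind != PLAIN && i + 1 < n)
            leader[i + 1] = true;
    }
}

/* A basic block runs from one leader up to (not including) the next.
   Stores each instruction's block number; returns the number of blocks. */
int build_blocks(const Insn *code, int n, int block_of[])
{
    bool leader[MAXN];
    assert(n > 0 && n <= MAXN);
    find_leaders(code, n, leader);
    int b = -1;
    for (int i = 0; i < n; i++) {
        if (leader[i])
            b++;
        block_of[i] = b;
    }
    return b + 1;
}

/* Flow graph edges: succ[b][0] and succ[b][1] are the successors of
   block b (-1 means no successor). */
void build_cfg(const Insn *code, int n, const int block_of[], int nblocks, int succ[][2])
{
    for (int b = 0; b < nblocks; b++)
        succ[b][0] = succ[b][1] = -1;
    for (int i = 0; i < n; i++) {
        if (i + 1 < n && block_of[i + 1] == block_of[i])
            continue;                        /* not the last instruction of its block */
        int b = block_of[i];
        int fall = (i + 1 < n) ? block_of[i + 1] : -1;
        switch (code[i].kind) {
        case PLAIN: succ[b][0] = fall; break;
        case JUMP:  succ[b][0] = block_of[code[i].target]; break;
        case CJUMP: succ[b][0] = block_of[code[i].target]; succ[b][1] = fall; break;
        case RET:   break;
        }
    }
}
```

For a loop such as `PLAIN; PLAIN; CJUMP 5; PLAIN; JUMP 1; RET`, this yields four blocks, with a back edge from the block ending in `JUMP 1` to the block starting at instruction 1.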
Overview of our approach
Input: Object code of the algorithm
Output: Compliant / Not Compliant status
Activity Diagram for our Static Analyzer
Our Algorithm for Static Analysis
1) Get the disassembled code from the input object code
2) From the disassembled code, identify the basic blocks and construct the flow graph
3) Analyze the flow graph and check for dereferences of pointer variables
4) For each such dereference, scan backwards to find where the pointer got its value (this involves building the unsafe sets explained later)
• If the original source of this pointer is hard coded, declare the algorithm not compliant (“unsafe”)
• If the original source of this pointer is legitimate, declare the dereference safe
5) The algorithm is declared safe if and only if all such pointer dereferences are safe
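Steps 3 to 5 can be sketched for a single straight-line block. This is a simplified illustration, not the authors' tool: the seven-opcode IR is hypothetical, registers are limited to 32 so that an unsafe set fits in one `unsigned` bitmask, and two conservative choices are assumptions of the sketch (a pointer loaded from memory is not trusted, and a register never defined in the scanned code is treated as unsafe). Branches and loops are not handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy IR standing in for disassembled DSP code. */
typedef enum {
    OP_CONST,    /* dst = literal constant            */
    OP_COPY,     /* dst = src1                        */
    OP_ADD,      /* dst = src1 + src2                 */
    OP_ADD_IMM,  /* dst = src1 + literal offset       */
    OP_MALLOC,   /* dst = pointer returned by malloc  */
    OP_STACK,    /* dst = stack pointer + offset      */
    OP_LOAD      /* dst = *src1 (pointer dereference) */
} Op;

typedef struct { Op op; int dst, src1, src2; } Insn;
typedef unsigned RegSet;                  /* bit r set: register r is in the unsafe set */

static RegSet bit(int r) { assert(r >= 0 && r < 32); return 1u << r; }

/* Step 4 (phase 2): scan backwards from the dereference at index k. */
static bool deref_is_safe(const Insn *code, int k)
{
    RegSet unsafe = bit(code[k].src1);    /* start with the dereferenced register */
    for (int i = k - 1; i >= 0 && unsafe != 0; i--) {
        const Insn *in = &code[i];
        if (!(unsafe & bit(in->dst)))
            continue;                     /* does not define a tracked register */
        unsafe &= ~bit(in->dst);          /* replace dst by its sources */
        switch (in->op) {
        case OP_CONST:   return false;    /* origin is hard coded */
        case OP_COPY:
        case OP_ADD_IMM: unsafe |= bit(in->src1); break;
        case OP_ADD:     unsafe |= bit(in->src1) | bit(in->src2); break;
        case OP_MALLOC:
        case OP_STACK:   break;           /* legitimate origin */
        case OP_LOAD:    return false;    /* assumption: pointers loaded from memory are not trusted */
        }
    }
    /* Assumption: registers never defined in this code (e.g. arguments) count as unsafe. */
    return unsafe == 0;
}

/* Steps 3 and 5: phase 1 scans downwards for dereferences; the code is
   compliant if and only if every dereference is safe. */
bool is_compliant(const Insn *code, int n)
{
    for (int k = 0; k < n; k++)
        if (code[k].op == OP_LOAD && !deref_is_safe(code, k))
            return false;                 /* analysis stops at the first unsafe dereference */
    return true;
}
```

For example, `r1 = malloc(); r2 = r1 + 8; r3 = *r2` is reported compliant, while `r4 = 0x8000; r5 = r4; r6 = *r5` is not.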
Phases in Static Analysis of the Flow Graph
Phase 1: The analyzer detects statements in the disassembled code which correspond to the dereferencing of pointer variables by scanning downwards in the flow graph
Phase 2: The analyzer checks whether each dereference detected in phase 1 is safe by scanning upwards in the flow graph
Building Unsafe Sets
The “Unsafe Set” is the set of registers that may contain hard coded addresses
The first element is added to the unsafe set when phase 1 detects a pointer dereference
Example: If we find “*Reg” in the analyzed code, the unsafe set is initialized to {Reg}
Note: Most examples in this presentation use the C programming language for readability, while the actual analysis is performed at the assembly language level.
Building unsafe sets (continued)
Phase 2 populates the unsafe set by “scanning backwards”
For example, if we find Reg = Reg1 + Reg2, the element “Reg” is deleted from the unsafe set and the elements “Reg1” and “Reg2” are inserted into it
The unsafe set now becomes {Reg1, Reg2}
The backward scan then continues, searching for the definitions of both “Reg1” and “Reg2”
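The backward step for an addition can be written as a small set operation. This sketch assumes, for illustration only, that register numbers are below 32 so an unsafe set fits in one `unsigned` bitmask; `reg` and `step_back_add` are hypothetical names.

```c
#include <assert.h>

typedef unsigned RegSet;   /* bit r set: register r may hold a hard coded address */

static RegSet reg(int r) { assert(r >= 0 && r < 32); return 1u << r; }

/* Backward step over "dst = src1 + src2": if dst is in the unsafe set,
   delete it and insert both operands; otherwise the set is unchanged. */
RegSet step_back_add(RegSet unsafe, int dst, int src1, int src2)
{
    if (!(unsafe & reg(dst)))
        return unsafe;
    return (unsafe & ~reg(dst)) | reg(src1) | reg(src2);
}
```

Starting from `reg(Reg)`, the set phase 1 creates for `*Reg`, one call to `step_back_add(s, Reg, Reg1, Reg2)` yields `reg(Reg1) | reg(Reg2)`, i.e. {Reg1, Reg2}. An addition whose destination is not being tracked leaves the set untouched.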
Analysis Stops when…
All pointer dereferences in the program are declared “safe” (not hard coded)
OR at least one pointer dereference in the program is declared “unsafe” (hard coded)
Handling Loops
Complex because the number of iterations of the loop may not be known until runtime
We scan and cycle through the loop until the unsafe set reaches a “Fixed Point”
A fixed point is reached when:
• the unsafe set repeats itself at the same point in the loop during successive iterations
• no new information is added to the unsafe set during successive iterations
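The "no new information" criterion can be sketched as follows. Assumptions, not taken from the slides: the loop body is straight-line code in a hypothetical four-opcode IR, registers are numbered below 32, and the sets reached at the loop head after 0, 1, 2, ... backward passes are accumulated by union. Because the set only grows and there are finitely many registers, the iteration must terminate.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OP_CONST, OP_COPY, OP_ADD, OP_MALLOC } Op;   /* hypothetical toy IR */
typedef struct { Op op; int dst, src1, src2; } Insn;
typedef unsigned RegSet;                                     /* bit r: register r is tracked */

static RegSet reg(int r) { assert(r >= 0 && r < 32); return 1u << r; }

/* One backward pass over the loop body. Returns false if a hard coded
   value (OP_CONST) defines a tracked register. */
static bool scan_body_back(const Insn *body, int n, RegSet *set)
{
    for (int i = n - 1; i >= 0; i--) {
        const Insn *in = &body[i];
        if (!(*set & reg(in->dst)))
            continue;
        *set &= ~reg(in->dst);
        switch (in->op) {
        case OP_CONST:  return false;
        case OP_COPY:   *set |= reg(in->src1); break;
        case OP_ADD:    *set |= reg(in->src1) | reg(in->src2); break;
        case OP_MALLOC: break;                  /* legitimate origin */
        }
    }
    return true;
}

/* Cycle through the loop body until the accumulated unsafe set at the
   loop head stops growing (a fixed point). *head receives that set. */
bool loop_fixed_point(const Insn *body, int n, RegSet start, RegSet *head, int *passes)
{
    RegSet acc = start;
    *passes = 0;
    for (;;) {
        RegSet next = acc;
        (*passes)++;
        if (!scan_body_back(body, n, &next))
            return false;                       /* hard coded value found inside the loop */
        if ((acc | next) == acc)
            break;                              /* no new registers: fixed point reached */
        acc |= next;
    }
    *head = acc;
    return true;
}
```

For the body `r1 = r2; r2 = r3` and a dereference of `r1` after the loop, the set at the loop head grows from {r1} to {r1, r2} to {r1, r2, r3} and is stable on the third pass; the regular backward scan would then continue before the loop with that set.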
Handling Function Calls
A function call is treated like a branch statement
It marks the beginning and end of basic blocks
Recursive function calls are handled as if they were looping constructs
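The slides do not say how recursive calls are recognized. One standard option, shown here as an assumption rather than the authors' method, is a depth-first search of the call graph: a call to a function that is still on the DFS stack is a back edge, and every recursive cycle contains at least one such edge, which can then be handled like a loop. The adjacency-matrix call graph and `MAXF` bound are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXF 16
enum { WHITE, GREY, BLACK };   /* unvisited, on the DFS stack, finished */

/* Depth-first search over a hypothetical call graph: calls[f][g] is true
   when function f calls function g. A call to a function still on the
   DFS stack closes a cycle, i.e. it is recursive. */
static void dfs(bool calls[][MAXF], int n, int f, int color[], bool recursive[][MAXF])
{
    color[f] = GREY;
    for (int g = 0; g < n; g++) {
        if (!calls[f][g])
            continue;
        if (color[g] == GREY)
            recursive[f][g] = true;   /* back edge: handle like a loop */
        else if (color[g] == WHITE)
            dfs(calls, n, g, color, recursive);
    }
    color[f] = BLACK;
}

/* Marks every call f -> g that is a back edge of the call graph. */
void find_recursive_calls(bool calls[][MAXF], int n, bool recursive[][MAXF])
{
    int color[MAXF];
    assert(n > 0 && n <= MAXF);
    for (int f = 0; f < n; f++) {
        color[f] = WHITE;
        for (int g = 0; g < n; g++)
            recursive[f][g] = false;
    }
    for (int f = 0; f < n; f++)
        if (color[f] == WHITE)
            dfs(calls, n, f, color, recursive);
}
```

Which edge of a cycle gets marked depends on the order in which functions are visited, but each cycle is guaranteed to be broken at some call.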
Handling Parallelism
The || characters signify that an instruction is to execute in parallel with the previous instruction
Example: instructions A, B, and C are executed in parallel
Instruction A
|| Instruction B
|| Instruction C
During phase 2, parallel instructions are skipped until an instruction from the previous cycle is found
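The skipping rule can be sketched with a per-instruction flag that records a leading `||`. The sketch assumes that every instruction in an execute packet reads its operands before any instruction in that packet writes a result, and it ignores latencies longer than one cycle. `packet_start`, `prev_cycle_insn`, `find_def`, and the `parallel`/`dst` arrays are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* parallel[i] is true when instruction i is written with a leading "||",
   i.e. it executes in the same cycle as instruction i - 1. */

/* Index of the first instruction of the execute packet containing i. */
int packet_start(const bool parallel[], int i)
{
    assert(i >= 0);
    while (i > 0 && parallel[i])
        i--;
    return i;
}

/* Phase 2 helper: skip the instructions that run in parallel with i and
   return the last instruction of the previous cycle (-1 if none). */
int prev_cycle_insn(const bool parallel[], int i)
{
    return packet_start(parallel, i) - 1;
}

/* Nearest instruction in an earlier cycle than i that writes register r
   (dst[j] is the register written by instruction j); -1 if none. */
int find_def(const bool parallel[], const int dst[], int i, int r)
{
    for (int j = prev_cycle_insn(parallel, i); j >= 0; j--)
        if (dst[j] == r)
            return j;
    return -1;
}
```

With `0: r1 = ...`, `1: r2 = ...`, `2: || r1 = ...`, `3: || load via r1`, the search for the `r1` read by instruction 3 skips instruction 2 (same cycle) and returns instruction 0.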
Current Work
Current work includes fine-tuning the handling of loops and extending our system to the remaining rules
The development and testing of the tool is currently in progress
The system is being developed using the ‘C’ programming language
Related Work and Conclusion
Unlike dynamic analysis, which observes only the paths exercised by particular test inputs, static analysis considers every path through the flow graph, so its verdict covers all possible executions rather than only the tested ones
Our work so far can be regarded as an attempt to demonstrate the efficacy of static analysis to perform these checks and aid in software reuse
References
Ramakrishnan Venkitaraman and Gopal Gupta, “Static Program Analysis to Detect Hard Coded Addresses and its Application to TI's DSP Processor”, CS department technical report UTD CS-23-03