
1

CS 410 / 510
Mastery in Programming

Chapter 3
Program and Language Complexity

Herbert G. Mayer, PSU CS
Status 7/4/2013

2

Syllabus

Thoughts on Complexity
Hard to Understand Code?
Program Complexity
Complex vs. Hard
Halstead Program Metrics
McCabe Cyclomatic Number
Cyclomatic Number Samples
References

3

Thoughts on Complexity

‘Complexity’ as used in this class:
  Refers to the number of different paths of execution through a given program, dictated by flow of control; synonym: convoluted
  Or refers to the degree of difficulty of expressing some algorithm via a string of symbols, i.e. the source program; synonym: hard
  Some hard to compute functions are easy to code and understand, once invented
    E.g. R. E. Tarjan’s SCC algorithm [11], or Newton’s square-root formula

Complexity, as used here, does not mean: “intractable to compute”, such as NP-complete problems requiring too much compute power to ever terminate in human time

Complexity also does not mean: “hard to understand”, as may be the case with obfuscated programming styles or poorly written code
  Synonym for such a type of “complex” may be: difficult to read

4

Hard to Understand C Code?

#include <stdio.h>
int a[ 1 ]; // just to have an array to index
int p( char arg )
{ // p
    printf( "%c", arg );
    return 0; // no array bounds violation!
} //end p

int main( )
{ // main
    a[ p( 'a' ) ] = a[ p( 'b' ) ] = a[ p( 'c' ) ] = a[ p( 'd' ) ];
    printf( "\n" );
    return 0;
} //end main

5

Hard to Understand Code?

Output using PSU Unix C compiler is: a b c d

Is this correct? If not, what should the output be?

Is the assignment-statement rule, to execute the right-hand side first, respected in the C++ implementation used?

Are other outputs feasible, according to the rules of C++ or Java or C#?

6

Hard to Understand, Not Complex

#include <stdio.h>

#define MAX 7 // 7 redundant? Discuss!
int a[ MAX ] = { 0, 1, 2, 3, 4, 5, 6 };

void p()
{ // p
    for( int i = 0; i < MAX; i++ ) {
        printf( " a[%d] = %d\n", i, a[ i ] );
    } //end for
    printf( "\n" );
} //end p

int main()
{ // main
    int x = 99;

    p();
    a[ x = 3 ] = a[ x = 5 ] = x = 6;
    p();
} //end main

7

Hard to Understand, Not Complex

 a[0] = 0
 a[1] = 1
 a[2] = 2
 a[3] = 3
 a[4] = 4
 a[5] = 5
 a[6] = 6

 a[0] = 0
 a[1] = 1
 a[2] = 2
 a[3] = 6
 a[4] = 4
 a[5] = 6
 a[6] = 6

x ends up being = 6 on [most] C++ run-time systems

8

Program Complexity

Some computable problems are hard, NP-hard, complex, or hard-to-understand!
  Assuming an experienced designer and programmer:
  Some problems are laborious to solve; they are “complex” due to amount of work
  Others are hard, due to elusiveness of a solution; just try to find a better SCC!!!
  Yet others are not solvable; e.g. non computable functions, e.g. Halting Problem [10]

What is program complexity?
  Is a large program complex, i.e. one with many lines of code (LOC)?
  More complicated code? Spaghetti code? Labels? Computable labels? Gotos? Poor naming conventions? Recursive functions?

What unit-of-measure does complexity have?
  Time to run?
  Number of different paths through control-flow graph?
  Space for memory locations needed to run?
  Number of processors needed to solve computation?
  Number of iterations for suitable solution? E.g. number of digits for π
  Degree of “mental hardness” to identify a solution? E.g. in the chess game?
  V(G) by McCabe is a stab at a unit of complexity. But will it be universally acceptable?

9

Program Complexity

Programmatic solution for “chess” is hard or complex or both?
  Safely: A complete and correct chess program is hard to code
  Yet the rules are simple and relatively few
  And it has been programmed to play at the grand-master level
  Kasparov lost game 1 to “Deep Blue” in 1996 but won that match; Deep Blue won the rematch in 1997 [8]

Degree of difficulty for finding a solution quantifies complexity!
  For example, solving Sudoku?

Some problems seem not hard, yet the number of special cases renders a solution virtually intractable
  E.g. US tax code [9]; contains about 9,800 different sections; ~75,000 pages
  Could be simpler and fairer, even equally applicable to all citizens
  But instead is highly complex, due to “special cases”, and requires experts to give definitive answers; has exceptions for individual tax payers!

Numerous CS attempts to formalize complexity, unit, computability
  We cover 2 very briefly: Halstead’s and McCabe’s

10

Complex vs. Hard

Complex is to be interpreted as “Mathematically difficult to find a correct algorithm!”
  E.g. find an algorithm to identify all strongly-connected components in a graph: SCC

Hard is to be interpreted as “Very much work to compute the solution”, with the algorithm being not hard
  E.g. compute the shortest path for a Travelling Salesman’s n stopping points
  Might take so long that we are no longer interested in the solution
  Instead: use heuristic provably no worse than x times the best solution

An incorrect solution is always easy to compute

11

Halstead Program Metrics

Measures a specific program’s complexity

Metrics developed by the late Maurice Halstead
  To directly quantify complexity of any given source program
  Solely from operators, operands used in source

Halstead introduced measures in 1977
  Early formal program complexity measures [1], [2], [3]

Not formally derived, but postulated
  Halstead metrics carry an element of arbitrariness
  Lack scientific proof! No formal derivation of the rules!

12

Halstead Program Metrics

Halstead’s metrics count operators and operands in source code of program being analyzed
  number of unique (distinct) operators (n1)
  number of unique (distinct) operands (n2)
  total number of operators (N1)
  total number of operands (N2)
  Number of unique operators and operands (n1 and n2) as well as the total number of operators and operands (N1 and N2) are calculated during lexical analysis of source program

Other Halstead measures are derived from these 4 units
  but without proof or scientific derivation!
  intuition of developer was used as the basis for deriving the measures
  Halstead intended to provide formal proofs; but he died!

13

Halstead Program Metrics

Operands

Literals, AKA constants; e.g. 0, 1000, “hello”

User defined identifiers for values, AKA symbolic constants, e.g. MAX is an operand in: #define MAX 5

Reserved keywords that denote value, e.g. NIL

Declarations like #define MAX 5 less obvious

Depending on language, some language-defined type specifiers are treated as operands, e.g. in C++ char, int, double

14

Halstead Program Metrics

Operators
  Common arithmetic symbols, e.g. + - / * ^ %
  Other arithmetic symbols, e.g. ( and )
  Symbols for boolean operations, e.g. > >= < <= != && ||
  Symbols for all kinds of operations, including cat for concatenation in some languages
  Reserved keywords, e.g. or, or else, and, and then, xor
  Function names, e.g. add( a, 8 ), sin( 45 ), sqrt( 3 )
  Reserved operations, e.g. try, catch, throw
  Type qualifiers, e.g. const, volatile
  Scope specifiers, e.g. extern, static

15

Halstead Program Metrics

Operators that are control constructs:
  if ( ... ) plus then-clause and optional else-clause
  while ( ... ) do ...
  for( ; ; ) ...
  catch()
  return ...
  switch { ... }

16

Halstead Program Metrics

Program length N, vocabulary size n, program volume V:

Program length N is the sum of total number of operators and operands in the program analyzed:
  N = N1 + N2

Vocabulary size n is the sum of the number of unique operators and operands:
  n = n1 + n2

Program volume V: information contents of program:
  V = N * log2 n

17

Halstead Program Metrics

Difficulty level D, AKA degree of error-proneness:

Level of difficulty D of program is proportional to number of unique operators n1 in program

And proportional to the total number of operands N2

But with scale-factors applied to both

D is postulated to be:
  D = ( n1 / 2 ) * ( N2 / n2 )
  Interestingly, total number of operators N1 is not part of the formula for the difficulty level D

18

Halstead Program Metrics

Program level L:

Program level L is inverse of error-proneness
  i.e. a low level program is more prone to errors than a corresponding high level program for the same computable function
  L = 1 / D

19

Halstead Program Metrics

Other measures, for you to elaborate in your paper
  Effort to implement
  Time to implement
  Number of bugs delivered
  Etc.

20

Cyclomatic Number

Goal of McCabe’s Cyclomatic Numbers:
  To have a measure of source program complexity
  To manage complexity, rather than dealing with an unknown
  See [4], [6]

Builds on:
  Graph theory
  E.g. [7] Berge: “Graphs and Hypergraphs”

Fundamental units:
  Graph G, not necessarily connected!
  Number of edges: e
  Number of nodes: n
  Number of connected components: p
  i.e. if ( p > 1 ) then G is not connected

21

Cyclomatic Number V

Cyclomatic number V of a graph G is called V(G)

If:
  e = number of edges
  n = number of nodes, AKA vertices in other literature
  p = number of connected components
then:
  V(G) = e - n + 2 * p

22

Cyclomatic Number Samples

Sequence of 2 statements
  e = 1, n = 2, p = 1
  V(G) = 1 - 2 + 2 * 1 = 1

If Statement with Then- and Else-Clause
  e = 4, n = 4, p = 1
  V(G) = 4 - 4 + 2 * 1 = 2

Sequence of 4 statements
  e = 3, n = 4, p = 1
  V(G) = 3 - 4 + 2 * 1 = 1

23

Cyclomatic Number of While

While Loop
  e = 3, n = 3, p = 1
  V(G) = 3 - 3 + 2 * 1 = 2

24

Cyclomatic Number of Program

Multiple-Module program with no cross-module vertices
  Main Program = M
  Module A = A()
  Module B = B()

V(G) = V( M U A U B ) = V(M) + V(A) + V(B)

M: V(M) = 2 - 3 + 2 = 1
A: V(A) = 4 - 4 + 2 = 2
B: V(B) = 6 - 5 + 2 = 3

V(G) = 12 - 12 + 2 * 3 = 6

25

References

1. Halstead metrics: http://www.verifysoft.com/en_halstead_metrics.html
2. Halstead’s book: Maurice Halstead, “Elements of Software Science”, Elsevier, 1977, ISBN 0444002057
3. Detail on Halstead: http://www.horst-zuse.homepage.t-online.de/halstead.html
4. Wiki page on Cyclomatic numbers: http://en.wikipedia.org/wiki/Cyclomatic_complexity
5. Program complexity: http://www.acis.pamplin.vt.edu/faculty/tegarden/wrk-pap/DSS.PDF
6. Thomas J. McCabe, “A Complexity Measure”, IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, December 1976
7. C. Berge: “Graphs and Hypergraphs”, North-Holland, Amsterdam 1973
8. Deep Blue Info: http://www.research.ibm.com/deepblue/
9. Tax code info: http://www.fourmilab.ch/ustax/ustax.html
10. Halting Problem: http://www.comp.nus.edu.sg/~cs5234/FAQ/halt.html
11. Robert E. Tarjan: "Depth-First Search and Linear Graph Algorithms". SIAM J. Computing, Vol. 1, No. 2, June 1972