
CP – Cost Analytics and Parametric Estimation Directorate

UNCLASSIFIED

UNCLASSIFIED Approved for Public Release 15-MDA-8479 (10 November 15)

My Dad Is Bigger Than Your Dad: Analysis Of Productivity Ranges In Popular Software Cost Estimation Models Using Stratified Data

Dan Strickland
Software SME, Research Division

16 Nov 2015

DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited.



Overview

• Simple Model Calibration With Data
• Aerospace Database
• Putnam Model
  - Performance Overall
  - Performance With Boundaries
• SEER/Sage Model
  - Performance Overall
  - Performance With Boundaries
• COCOMO II Model
  - Performance Overall
  - Performance With Boundaries
• Paired Data Results
• What Does It All Mean?
• Future Research



Software Estimating Models

• In the Feb 2006 CrossTalk article “Software Estimating Models: Three Viewpoints”, three popular software cost estimation models (Sage/SEER, Putnam, COCOMO II) are described in their base mathematical forms

• All three models calculate effort using size and productivity

• Two models (Putnam, SEER) also use development time as a factor in calculating effort

• Productivity is expressed as software output over software input, usually in SLOC/hr or SLOC/PM; in calibration, estimators are looking for productivity

• Putnam model on calculating productivity: “From historic projects, we know the size, effort, and schedule…just put in a consistent set of historic numbers and calculate a Process Productivity Parameter.”

• Is it really that simple to calculate Productivity?

Given a database of completed projects and the three default models, can reliable productivity ranges be developed in different development strata?



Aerospace Database

• Published in 2004 by Gayek, Long, et al, the Aerospace “Software Cost and Productivity Model” is an analysis of over 450 completed Aerospace, DoD, and space software programs

• Data was normalized for size, effort, and duration and stratified by Operating Environment, Application Domain, Programming Language, etc.

• When normalized, there are 181 records of Computer Software Configuration Item (CSCI) level software development with effort (in hours) and duration (in months) representing Preliminary Design through Integration and Testing



Aerospace Database Strata

Operating Environments

• Military Ground: Ground-based, mission critical software; usually a fixed site
• Military Mobile: In a vehicle such as a truck, trailer, or ship
• Mil-Spec Avionics: Operation of an aircraft or similar vehicle
• Unmanned Space: Similar to Avionics, but may be satellite or UAV

Application Domains

• Command & Control: Network monitoring, network control, mission control, command processing
• Database: Collects, stores, and organizes information; database generation and management
• Mission Planning: Supports mission planning like aircraft mission planning, scenario generation, route planning
• Operating System/Executive: Controls basic hardware functions and serves as the platform for other applications to run
• Signal Processing: Enhances, transforms, filters, and converts data signals; complex algorithms for sonar, radar, and flight systems
• Simulation: Environment simulation, system simulation, network simulation, system reliability
• Support: Software used to aid the development, testing, and support of other applications
• Test: Software used specifically for testing and evaluating software and hardware systems



Putnam Model

• Basic Equation: $\text{Size} = (\text{Effort} / \beta)^{1/3} \times \text{Schedule}^{4/3} \times PP$
  - Size in SLOC
  - Effort is software development effort in Person-Years
  - Beta ($\beta$) is a skills factor based on size and ranging from 0.16 to 0.39
  - Schedule is the development time in years
  - PP is the Process Productivity Parameter and has been observed to range from 1,974 to 121,393

• Solve for Productivity: $PP = \text{Size} / \left( (\text{Effort} / \beta)^{1/3} \times \text{Schedule}^{4/3} \right)$
  - As PP increases, effort (cost) decreases

• Using the Aerospace database, solve for PP and observe stratified results

• Can a Productivity value be developed that produces accurate results?
  - $\text{Effort (PY)} = 15 \times \beta \times t_{d,\min}^{3}$ (Putnam’s effort equation)
  - $t_{d,\min}\ \text{(years)} = 0.68 \times (\text{Size} / PP)^{0.43}$ (Putnam’s minimum development time equation)

Example database record:

| Operating Environment | Application Domain | Lang | ESLOC | Labor Effort (DevMos) | Develop Schedule (Months) |
|---|---|---|---|---|---|
| Unmanned Sp | Signal Proc | C++ | 2,200 | 11 | 6 |

• Assume ESLOC = SLOC
• Effort (PY) = Labor Effort / 12
• Schedule (yrs) = Develop Schedule / 12
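The calibration step above can be sketched in a few lines of Python. This is a minimal illustration, not the author's tooling: the function name `putnam_pp` is hypothetical, and because the slide gives only the range of Beta (0.16 to 0.39) and not the size-to-Beta mapping, the value `beta=0.16` is an assumption chosen for the small example record.

```python
def putnam_pp(size_sloc, effort_dev_months, schedule_months, beta):
    """Solve Putnam's basic equation for the Process Productivity Parameter.

    PP = Size / ((Effort / beta)^(1/3) * Schedule^(4/3)),
    with Effort in person-years and Schedule in years.
    """
    effort_py = effort_dev_months / 12.0    # Labor Effort (DevMos) -> person-years
    schedule_yrs = schedule_months / 12.0   # Develop Schedule (months) -> years
    denominator = (effort_py / beta) ** (1.0 / 3.0) * schedule_yrs ** (4.0 / 3.0)
    return size_sloc / denominator


# Example record from the slide: 2,200 ESLOC, 11 DevMos, 6 months.
# beta=0.16 is an ASSUMED skills factor (low end of the stated range).
pp = putnam_pp(size_sloc=2200, effort_dev_months=11, schedule_months=6, beta=0.16)
print(f"Calculated PP: {pp:.1f}")
```

Running the same function over every record in a stratum and taking the mean or median would give a per-stratum PP to calibrate against.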



Putnam Model – Results

Calculated PP by stratum:

| Op Env | App Type | Language | Records | Min | Max | Mean | Median |
|---|---|---|---|---|---|---|---|
| All | All | All | 181 | 450.9 | 19303.5 | 3346.0 | 2811.9 |
| All | Cmd/Ctrl | All | 50 | 450.9 | 7865.1 | 2865.1 | 2904.4 |
| All | Database | All | 11 | 1812.8 | 4533.8 | 3145.1 | 3005.8 |
| All | Mission Planning | All | 18 | 1667.5 | 8777.6 | 4166.5 | 4222.6 |
| All | OS/Exec | All | 18 | 549.9 | 4159.0 | 1904.5 | 1855.1 |
| All | Signal Processing | All | 39 | 615.2 | 12755.8 | 2786.8 | 2011.5 |
| All | Simulation | All | 8 | 1477.9 | 5814.6 | 3209.7 | 2118.7 |
| All | Support | All | 21 | 754.8 | 19303.5 | 6177.8 | 4264.2 |
| All | Test | All | 16 | 936.1 | 7294.2 | 3400.0 | 3072.9 |
| All | All | C/C++ | 61 | 615.2 | 12755.8 | 3729.3 | 3018.3 |
| All | All | Ada | 79 | 450.9 | 7077.9 | 2734.9 | 2284.4 |
| All | All | Java | 5 | 8473.4 | 19303.5 | 13046.9 | 11255.8 |
| Unmanned Space | All | All | 27 | 610.5 | 4286.6 | 2540.2 | 2318.6 |
| Mil-Spec Avionics | All | All | 22 | 549.9 | 6720.5 | 2122.4 | 1412.6 |
| Military Mobile | All | All | 73 | 615.2 | 6740.4 | 2806.7 | 2793.2 |
| Military Ground | All | All | 59 | 450.9 | 19303.5 | 4838.4 | 3982.7 |

Several PP values fall below the minimum value observed by Putnam

Example (Mission Plan, Median PP):
$\text{Effort (PY)} = 15 \times \beta \times t_{d,\min}^{3}$
$t_{d,\min}\ \text{(years)} = 0.68 \times (\text{Size} / 4222.6)^{0.43}$
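As a rough sketch of how the Mission Planning example could be evaluated, the two Putnam equations can be chained as below. The function names are hypothetical, and both the 2,200 SLOC project size and `beta=0.16` are assumed inputs for illustration; only the median PP of 4222.6 comes from the results table.

```python
def putnam_min_schedule_years(size_sloc, pp):
    """Putnam's minimum development time: td_min = 0.68 * (Size / PP)^0.43."""
    return 0.68 * (size_sloc / pp) ** 0.43


def putnam_effort_py(size_sloc, pp, beta):
    """Putnam's effort equation: Effort (PY) = 15 * beta * td_min^3."""
    td_min = putnam_min_schedule_years(size_sloc, pp)
    return 15.0 * beta * td_min ** 3


MEDIAN_PP_MISSION_PLANNING = 4222.6  # from the results table

# Size and beta are ASSUMED values for a hypothetical project.
effort_py = putnam_effort_py(size_sloc=2200, pp=MEDIAN_PP_MISSION_PLANNING, beta=0.16)
print(f"Estimated effort: {effort_py:.3f} PY ({effort_py * 12:.1f} PM)")
```

The estimate produced this way is what gets compared against each record's actual effort in the accuracy slides.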



Putnam Model – Accuracy

• Two measures are used to compare estimates against actuals from the Aerospace data
• Mean Magnitude of Relative Error (MMRE) measures the average relative error of all the predictions to their actuals, independent of scale and sign – lower is better
• Prediction Level (PRED) measures the percentage of all the predictions that fall within a defined error bound of the actual; here we used PRED(30), or within 30% – higher is better
• Overall, the Putnam model using either the mean or median productivity values is only accurate in a few strata (Database, Unmanned Space)
• What about those data points that had calculated PP outside of the published limits for Putnam? What happens when those data points are thrown out?

| Op Env | App Type | Language | Records | Mean PP MMRE | Mean PP PRED(30) | Median PP MMRE | Median PP PRED(30) |
|---|---|---|---|---|---|---|---|
| All | All | All | 181 | 0.66 | 25.4% | 0.75 | 36.5% |
| All | Cmd/Ctrl | All | 50 | 0.49 | 52.0% | 0.49 | 52.0% |
| All | Database | All | 11 | 0.34 | 63.6% | 0.34 | 63.6% |
| All | Mission Planning | All | 18 | 0.54 | 22.2% | 0.53 | 27.8% |
| All | OS/Exec | All | 18 | 0.81 | 22.2% | 0.84 | 22.2% |
| All | Signal Processing | All | 39 | 0.79 | 25.6% | 1.11 | 35.9% |
| All | Simulation | All | 8 | 1.25 | 0.0% | 2.20 | 50.0% |
| All | Support | All | 21 | 0.50 | 23.8% | 0.63 | 23.6% |
| All | Test | All | 16 | 0.60 | 31.3% | 0.61 | 31.3% |
| All | All | C/C++ | 61 | 0.60 | 29.5% | 0.67 | 44.3% |
| All | All | Ada | 79 | 0.78 | 27.8% | 0.95 | 25.3% |
| All | All | Java | 5 | 0.52 | 0.0% | 0.42 | 20.0% |
| Unmanned Space | All | All | 27 | 0.31 | 70.4% | 0.36 | 40.7% |
| Mil-Spec Avionics | All | All | 22 | 1.12 | 13.6% | 1.85 | 18.2% |
| Military Mobile | All | All | 73 | 0.52 | 37.0% | 0.52 | 38.4% |
| Military Ground | All | All | 59 | 0.64 | 18.6% | 0.77 | 18.6% |
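The two accuracy measures described above can be sketched as follows. The function names and the four sample estimate/actual pairs are made up for illustration and are not taken from the Aerospace data.

```python
def mmre(estimates, actuals):
    """Mean Magnitude of Relative Error: average of |estimate - actual| / actual."""
    errors = [abs(e - a) / a for e, a in zip(estimates, actuals)]
    return sum(errors) / len(errors)


def pred(estimates, actuals, level=0.30):
    """Prediction Level: fraction of estimates within `level` of their actuals."""
    hits = sum(1 for e, a in zip(estimates, actuals) if abs(e - a) / a <= level)
    return hits / len(actuals)


# HYPOTHETICAL effort values (any consistent unit), for illustration only.
actuals = [100.0, 200.0, 50.0, 80.0]
estimates = [120.0, 150.0, 90.0, 80.0]

print(f"MMRE     = {mmre(estimates, actuals):.4f}")
print(f"PRED(30) = {pred(estimates, actuals):.1%}")
```

In this made-up sample one estimate is off by 80%, which pulls MMRE above 0.30 even though three of the four estimates fall inside the PRED(30) bound; the two measures can therefore disagree on the same stratum.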




Putnam Model – In-Bounds Results

[Chart: Productivity Parameter (IB)]

Removed 55 data points with calculated PP values below threshold, aka “out of bounds”



Putnam Model – In-Bounds Accuracy

• Removing the data points with “out of bounds” Productivity calculations has little effect on the accuracy
• Some values go up, some go down
• Database and Unmanned Space continue to have good performance

Delta is the change in PRED(30), in percentage points, from the full data set.

| Op Env | App Type | Language | Records | Mean PP MMRE | Mean PP PRED(30) | Mean PP Delta | Median PP MMRE | Median PP PRED(30) | Median PP Delta |
|---|---|---|---|---|---|---|---|---|---|
| All | All | All | 126 | 0.58 | 19.0% | -6.4%▼ | 0.66 | 32.5% | -4.0%▼ |
| All | Cmd/Ctrl | All | 34 | 0.48 | 35.3% | -16.7%▼ | 0.49 | 41.2% | -10.8%▼ |
| All | Database | All | 9 | 0.28 | 66.7% | 3.1%▲ | 0.33 | 44.4% | -19.2%▼ |
| All | Mission Planning | All | 17 | 0.49 | 35.3% | 13.1%▲ | 0.47 | 41.2% | 13.4%▲ |
| All | OS/Exec | All | 8 | 0.47 | 50.0% | 27.8%▲ | 0.57 | 37.5% | 15.3%▲ |
| All | Signal Processing | All | 20 | 0.79 | 5.0% | -20.6%▼ | 0.98 | 20.0% | -15.9%▼ |
| All | Simulation | All | 6 | 1.21 | 0.0% | 0.0%▼ | 1.30 | 0.0% | -50.0%▼ |
| All | Support | All | 19 | 0.45 | 26.3% | 2.5%▲ | 0.60 | 31.6% | 8.0%▲ |
| All | Test | All | 13 | 0.59 | 23.1% | -8.2%▼ | 0.60 | 30.8% | -0.5%▼ |
| All | All | C/C++ | 48 | 0.60 | 14.6% | -14.9%▼ | 0.65 | 29.2% | -15.1%▼ |
| All | All | Ada | 48 | 0.66 | 25.0% | -2.8%▼ | 0.76 | 25.0% | -0.3%▼ |
| All | All | Java | 5 | 0.52 | 0.0% | 0.0%▼ | 0.42 | 20.0% | 0.0%▼ |
| Unmanned Space | All | All | 18 | 0.16 | 77.8% | 7.4%▲ | 0.17 | 77.8% | 37.1%▲ |
| Mil-Spec Avionics | All | All | 9 | 0.71 | 22.2% | 8.6%▲ | 0.76 | 22.2% | 4.0%▲ |
| Military Mobile | All | All | 50 | 0.50 | 28.0% | -9.0%▼ | 0.51 | 30.0% | -8.4%▼ |
| Military Ground | All | All | 49 | 0.55 | 18.4% | -0.2%▼ | 0.73 | 24.5% | 5.9%▲ |



SEER Model

• Basic Equation: $C_{te} = \text{Size} / (K^{1/2} \times \text{Schedule})$
  - Size in ESLOC
  - K is software life-cycle effort in person-years
  - Schedule is the development time in years
  - Cte is the Effective Technology Constant, which ranges from 2.7 to 22,184.1
  - As Cte increases, effort (cost) decreases
  - Software development effort is 0.3945 of the total life-cycle effort (K)

• Using the Aerospace database, solve for Cte and observe stratified results

• Can a Productivity value be developed that produces accurate results?
  - $\text{Effort (PM)} = (\text{Size} / (C_{te} \times \text{Schedule}))^{2} \times 0.3945 \times 12$

• Size = ESLOC
• K (PY) = Labor Effort / (12 × 0.3945)
• Schedule (yrs) = Develop Schedule / 12
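A minimal Python sketch of the SEER calibration and estimation equations above follows. The function names are hypothetical. Reusing the example record from the Putnam slide (2,200 ESLOC, 11 DevMos, 6 months) is an assumption made for illustration, since the deck does not show a worked SEER example for that record.

```python
import math

DEV_FRACTION = 0.3945  # development effort as a share of life-cycle effort K


def seer_cte(size_esloc, effort_dev_months, schedule_months):
    """Solve Cte = Size / (sqrt(K) * Schedule) for one completed project."""
    k_py = effort_dev_months / (12.0 * DEV_FRACTION)  # life-cycle effort, person-years
    schedule_yrs = schedule_months / 12.0
    return size_esloc / (math.sqrt(k_py) * schedule_yrs)


def seer_effort_pm(size_esloc, cte, schedule_months):
    """Effort (PM) = (Size / (Cte * Schedule))^2 * 0.3945 * 12."""
    schedule_yrs = schedule_months / 12.0
    k_py = (size_esloc / (cte * schedule_yrs)) ** 2
    return k_py * DEV_FRACTION * 12.0


# ASSUMED example record (borrowed from the Putnam slide).
cte = seer_cte(size_esloc=2200, effort_dev_months=11, schedule_months=6)
print(f"Calculated Cte: {cte:.1f}")
print(f"Round-trip effort: {seer_effort_pm(2200, cte, 6):.2f} PM")
```

Feeding a record's own Cte back into the effort equation returns its actual effort, which is a quick check that the two forms are algebraically consistent.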



SEER Model – Results

All calculated Cte values are “in-bounds” with model thresholds

[Chart: Cte]

Example (Mission Plan, Median Cte): $\text{Effort (PM)} = (\text{Size} / (2695 \times \text{Schedule}))^{2} \times 0.3945 \times 12$



SEER Model – Accuracy

• Using the mean or median Cte values, the SEER model performs poorly for most strata
• Performs well in the Mission Planning Application Domain and Java-based developments
• Difficult to remove any data points as all the calculated Cte values are “in-bounds”
• Is there another value at work in the SEER model that can indicate possible candidates for removal?

[Table: accuracy by stratum for Mean Cte and Median Cte]



SEER Model – Calculating D

• The Staffing Complexity factor, D, represents the difficulty in terms of the rate at which staff can be added to a software product

• D ranges from 4 to 28, where higher values equate to very complex software that is difficult to staff (missile algorithms) and lower values equate to simple software that can be broken up and staffed easily (data entry)

• D interacts with the schedule and effort

• Formula: $D = K / \text{Schedule}^{3}$

• Calculate D for the database and remove the values that are out of bounds
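The bounding step could look something like the sketch below. The function names and the three (K, Schedule) records are hypothetical; only the formula and the 4 to 28 range come from the slide.

```python
D_MIN, D_MAX = 4.0, 28.0  # Staffing Complexity range stated on the slide


def staffing_complexity(k_py, schedule_yrs):
    """D = K / Schedule^3, with K in person-years and Schedule in years."""
    return k_py / schedule_yrs ** 3


def is_in_bounds(d):
    """True when D lies inside the published 4 to 28 range."""
    return D_MIN <= d <= D_MAX


# HYPOTHETICAL records as (K in person-years, schedule in years).
records = [(2.3236, 0.5), (10.0, 2.0), (40.0, 1.0)]

kept = [r for r in records if is_in_bounds(staffing_complexity(*r))]
for k_py, schedule_yrs in records:
    d = staffing_complexity(k_py, schedule_yrs)
    status = "in bounds" if is_in_bounds(d) else "out of bounds"
    print(f"K={k_py:.2f} PY, Schedule={schedule_yrs:.2f} yr -> D={d:.2f} ({status})")
```

Because the schedule is cubed, small changes in recorded duration move D a long way, so records with unusually long or short schedules are the likeliest to be dropped.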



SEER Model – In-Bounds Accuracy

• Staffing Complexity boundaries remove 42 data points
• Removing the data points with “out of bounds” Staffing Complexity values has little effect on the accuracy
• Some values go up, some go down; Median predictions trend upward
• Mission Planning continues to perform well, but Java data points are completely out of bounds

| Op Env | App Type | Language | Records | Mean Cte MMRE | Mean Cte PRED(30) | Mean Cte Delta | Median Cte MMRE | Median Cte PRED(30) | Median Cte Delta |
|---|---|---|---|---|---|---|---|---|---|
| All | All | All | 139 | 0.81 | 23.7% | 4.4%▲ | 0.83 | 22.3% | 7.4%▲ |
| All | Cmd/Ctrl | All | 36 | 0.54 | 22.2% | 4.2%▲ | 0.49 | 27.8% | 9.8%▲ |
| All | Database | All | 10 | 0.57 | 30.0% | -6.4%▼ | 0.50 | 40.0% | 3.6%▲ |
| All | Mission Planning | All | 17 | 0.62 | 58.8% | -2.3%▼ | 0.67 | 64.7% | -2.0%▼ |
| All | OS/Exec | All | 14 | 0.93 | 42.9% | 15.1%▲ | 1.05 | 35.7% | 13.5%▲ |
| All | Signal Processing | All | 26 | 1.02 | 11.5% | -3.9%▼ | 1.51 | 26.9% | 9.0%▲ |
| All | Simulation | All | 6 | 0.99 | 0.0% | -12.5%▼ | 1.18 | 0.0% | -25.0%▼ |
| All | Support | All | 15 | 0.86 | 33.3% | 23.8%▲ | 0.97 | 46.7% | 13.4%▲ |
| All | Test | All | 15 | 0.69 | 26.7% | -4.6%▼ | 0.75 | 26.7% | 1.7%▲ |
| All | All | C/C++ | 48 | 0.88 | 27.1% | 7.4%▲ | 1.02 | 27.1% | 7.4%▲ |
| All | All | Ada | 55 | 0.66 | 23.6% | -1.7%▼ | 0.67 | 23.6% | -5.5%▼ |
| All | All | Java | 0 | - | 0.0% | -60.0%▼ | - | 0.0% | -60.0%▼ |
| Unmanned Space | All | All | 24 | 1.05 | 4.2% | -3.2%▼ | 1.91 | 33.3% | -14.8%▼ |
| Mil-Spec Avionics | All | All | 18 | 1.25 | 22.2% | 4.0%▲ | 1.72 | 16.7% | -1.5%▼ |
| Military Mobile | All | All | 57 | 0.46 | 29.8% | 7.9%▲ | 0.40 | 40.4% | 18.5%▲ |
| Military Ground | All | All | 40 | 0.78 | 30.0% | 8.0%▲ | 0.94 | 40.0% | 9.5%▲ |



COCOMO II Model

• Basic Equation: $\text{Effort (PM)} = 2.94 \times EAF \times \text{Size}^{E}$
  - Size in KESLOC
  - EAF is the Effort Adjustment Factor, used to calculate productivity
  - E is the exponential scaling factor; default value of 1.0997
  - EAF ranges from 0.0569 to 80.8271
  - As EAF increases, effort (cost) increases

• Using the Aerospace database, solve for EAF and observe stratified results

• Size = ESLOC / 1000
• Effort = Labor Effort
• Schedule is not used in the equation
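Solving the COCOMO II equation for EAF, and then reusing a calibrated EAF to estimate effort, might be sketched as below. The function names are hypothetical, and the example record (2,200 ESLOC, 11 person-months) is borrowed from the Putnam slide as an assumption for illustration.

```python
A = 2.94            # COCOMO II multiplicative constant from the slide
E_DEFAULT = 1.0997  # default scaling exponent from the slide
EAF_MIN, EAF_MAX = 0.0569, 80.8271


def cocomo_eaf(size_esloc, effort_pm, exponent=E_DEFAULT):
    """Solve Effort = 2.94 * EAF * Size^E for EAF (Size converted to KESLOC)."""
    size_kesloc = size_esloc / 1000.0
    return effort_pm / (A * size_kesloc ** exponent)


def cocomo_effort_pm(size_esloc, eaf, exponent=E_DEFAULT):
    """Effort (PM) = 2.94 * EAF * Size^E, with Size in KESLOC."""
    size_kesloc = size_esloc / 1000.0
    return A * eaf * size_kesloc ** exponent


# ASSUMED example record (borrowed from the Putnam slide).
eaf = cocomo_eaf(size_esloc=2200, effort_pm=11)
print(f"Calculated EAF: {eaf:.3f}, in bounds: {EAF_MIN <= eaf <= EAF_MAX}")
print(f"Round-trip effort: {cocomo_effort_pm(2200, eaf):.2f} PM")
```

Since schedule plays no part in this equation, two records with the same size and effort always calibrate to the same EAF regardless of how long they took.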



COCOMO II – Results

All calculated EAF values are “in-bounds” with model thresholds

[Chart: EAF]

Example (Mission Plan, Median EAF): $\text{Effort} = 2.94 \times 1.88 \times \text{Size}^{1.0997}$




COCOMO II – Accuracy

• COCOMO II performs average to good for many strata using mean and median EAF values
• Performs well in the Command and Control and Database Application Domains, Java-based developments, and Military Mobile Operating Environment
• Difficult to remove any data points as all the calculated EAF values are “in-bounds”
• Is there another value in the COCOMO II model that can indicate possible candidates for removal?

[Table: accuracy by stratum for Mean EAF and Median EAF]



COCOMO II Exponent

• The effort equation’s scaling exponent is also used in the COCOMO II Schedule Equation
  - $\text{Schedule} = 3.67 \times \text{Effort}^{F}$
  - $F = 0.28 + 0.2 \times (E - 0.91)$

• Although the COCOMO II effort equation does not use schedule as an input, the schedule and effort data in the Aerospace database can be used to solve for E
  - $E = \left[ \ln(\text{Schedule} / 3.67) / \ln(\text{Effort}) - 0.098 \right] / 0.2$

• E ranges from 0.91 to 1.2262

• Calculate E from the Schedule and Effort values of the Aerospace database and remove values that are out of bounds
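The derivation of E from schedule and effort can be sketched as follows. The function names are hypothetical, and the 6-month, 11 person-month record is an assumed example (the record shown on the Putnam slide), not a result reported in the deck.

```python
import math

E_MIN, E_MAX = 0.91, 1.2262  # scaling exponent range stated on the slide


def cocomo_exponent(schedule_months, effort_pm):
    """Back out E from Schedule = 3.67 * Effort^F and F = 0.28 + 0.2 * (E - 0.91).

    Rearranged: E = (ln(Schedule / 3.67) / ln(Effort) - 0.098) / 0.2.
    Undefined when effort_pm == 1, because ln(1) = 0.
    """
    f = math.log(schedule_months / 3.67) / math.log(effort_pm)
    return (f - 0.098) / 0.2


def is_in_bounds(e):
    """True when E lies inside the published 0.91 to 1.2262 range."""
    return E_MIN <= e <= E_MAX


# ASSUMED example record: 6-month schedule, 11 person-months of effort.
e = cocomo_exponent(schedule_months=6, effort_pm=11)
print(f"Calculated E: {e:.3f}, in bounds: {is_in_bounds(e)}")
```

For this assumed record the calculated exponent falls below 0.91, which illustrates how a short project with a schedule longer than the COCOMO II schedule equation predicts ends up out of bounds.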



COCOMO II Model – In-Bounds Accuracy

• Scaling Exponent boundaries remove 72 data points
• Calculating the Scaling Exponent from Schedule and Effort and removing the data points with “out of bounds” values eliminates almost all accuracy
• Prediction values almost universally drop to unusable values

| Op Env | App Type | Language | Records | Mean EAF MMRE | Mean EAF PRED(30) | Mean EAF Delta | Median EAF MMRE | Median EAF PRED(30) | Median EAF Delta |
|---|---|---|---|---|---|---|---|---|---|
| All | All | All | 109 | 0.57 | 8.3% | -20.4%▼ | 0.57 | 8.3% | -39.2%▼ |
| All | Cmd/Ctrl | All | 30 | 0.56 | 6.7% | -17.3%▼ | 0.56 | 6.7% | -57.3%▼ |
| All | Database | All | 9 | 0.62 | 0.0% | -72.7%▼ | 0.63 | 0.0% | -63.6%▼ |
| All | Mission Planning | All | 14 | 0.50 | 21.4% | -28.6%▼ | 0.50 | 21.4% | -34.2%▼ |
| All | OS/Exec | All | 11 | 0.61 | 0.0% | -27.8%▼ | 0.62 | 0.0% | -44.4%▼ |
| All | Signal Processing | All | 18 | 0.58 | 5.6% | -35.4%▼ | 0.58 | 5.6% | -38.0%▼ |
| All | Simulation | All | 5 | 0.59 | 20.0% | 20.0%▲ | 0.60 | 20.0% | -17.5%▼ |
| All | Support | All | 10 | 0.54 | 0.0% | -42.9%▼ | 0.54 | 0.0% | -38.1%▼ |
| All | Test | All | 12 | 0.60 | 16.7% | -33.3%▼ | 0.60 | 16.7% | -33.3%▼ |
| All | All | C/C++ | 40 | 0.57 | 10.0% | -14.6%▼ | 0.57 | 10.0% | -31.0%▼ |
| All | All | Ada | 41 | 0.59 | 9.8% | -25.6%▼ | 0.59 | 9.8% | -35.8%▼ |
| All | All | Java | 0 | - | 0.0% | -100.0%▼ | - | 0.0% | -100.0%▼ |
| Unmanned Space | All | All | 19 | 0.70 | 0.0% | -48.1%▼ | 0.69 | 0.0% | -59.3%▼ |
| Mil-Spec Avionics | All | All | 10 | 0.60 | 10.0% | -26.4%▼ | 0.60 | 10.0% | -26.4%▼ |
| Military Mobile | All | All | 45 | 0.57 | 6.7% | -34.4%▼ | 0.57 | 6.7% | -56.3%▼ |
| Military Ground | All | All | 35 | 0.50 | 14.3% | -33.2%▼ | 0.50 | 14.3% | -33.2%▼ |



Paired Data

• A popular method with this type of database is stratifying by Operating Environment and Application Domain together to analyze data

• As more strata are introduced, fewer values are available in each

• What is the accuracy in terms of PRED(30) of the mean/median productivity calculations of paired data for both full and “in-bounds” data sets?

• Key assumption: Need at least 5 data points in a paired stratum for it to be applicable
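The pairing rule above can be sketched as a simple grouping step. The function name, the record layout, and all sample values below are hypothetical; only the two-way stratification and the five-record minimum come from the slide.

```python
from collections import defaultdict
from statistics import mean, median

MIN_RECORDS = 5  # key assumption from the slide


def paired_productivity(records):
    """Group productivity values by (Operating Environment, Application Domain).

    Returns {pair: (mean, median)} only for pairs with at least MIN_RECORDS
    records; smaller pairs are treated as N/A and omitted.
    """
    groups = defaultdict(list)
    for op_env, app_domain, productivity in records:
        groups[(op_env, app_domain)].append(productivity)
    return {
        pair: (mean(values), median(values))
        for pair, values in groups.items()
        if len(values) >= MIN_RECORDS
    }


# HYPOTHETICAL records: (operating environment, application domain, productivity).
records = [
    ("Military Mobile", "Cmd/Ctrl", 1000.0),
    ("Military Mobile", "Cmd/Ctrl", 2000.0),
    ("Military Mobile", "Cmd/Ctrl", 3000.0),
    ("Military Mobile", "Cmd/Ctrl", 4000.0),
    ("Military Mobile", "Cmd/Ctrl", 5000.0),
    ("Unmanned Space", "Database", 1500.0),
    ("Unmanned Space", "Database", 2500.0),
]

paired = paired_productivity(records)
for pair, (avg, med) in paired.items():
    print(pair, f"mean={avg:.1f}", f"median={med:.1f}")
```

Pairs that fall under the threshold correspond to the N/A cells in the paired-data tables.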



Putnam Paired Data Performance

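PRED(30) by Operating Environment and Application Domain pair

Unbounded:

| Op Env | PP | Cmd/Ctrl | Database | Mission Planning | OS/Exec | Signal Processing | Simulation | Support | Test |
|---|---|---|---|---|---|---|---|---|---|
| Military Ground | Mean | 11.1% | N/A | N/A | N/A | 0.0% | 0.0% | 35.3% | 33.3% |
| Military Ground | Median | 0.0% | N/A | N/A | N/A | 11.1% | 0.0% | 35.3% | 44.4% |
| Military Mobile | Mean | 65.5% | N/A | 15.4% | N/A | 38.1% | N/A | N/A | N/A |
| Military Mobile | Median | 65.5% | N/A | 46.2% | N/A | 42.9% | N/A | N/A | N/A |
| Mil-Spec Avionics | Mean | N/A | N/A | N/A | 37.5% | N/A | N/A | N/A | N/A |
| Mil-Spec Avionics | Median | N/A | N/A | N/A | 37.5% | N/A | N/A | N/A | N/A |
| Unmanned Space | Mean | 10.0% | N/A | N/A | N/A | 57.1% | N/A | N/A | N/A |
| Unmanned Space | Median | 10.0% | N/A | N/A | N/A | 28.6% | N/A | N/A | N/A |

In-Bounds:

| Op Env | PP | Cmd/Ctrl | Database | Mission Planning | OS/Exec | Signal Processing | Simulation | Support | Test |
|---|---|---|---|---|---|---|---|---|---|
| Military Ground | Mean | 40.0% | N/A | N/A | N/A | 12.5% | N/A | 43.8% | 25.0% |
| Military Ground | Median | 40.0% | N/A | N/A | N/A | 12.5% | N/A | 18.8% | 37.5% |
| Military Mobile | Mean | 34.8% | N/A | 25.0% | N/A | 0.0% | N/A | N/A | N/A |
| Military Mobile | Median | 47.8% | N/A | 50.0% | N/A | 42.9% | N/A | N/A | N/A |
| Mil-Spec Avionics | Mean | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Mil-Spec Avionics | Median | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Unmanned Space | Mean | 100.0% | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Unmanned Space | Median | 100.0% | N/A | N/A | N/A | N/A | N/A | N/A | N/A |

• Little difference in overall effect
• Some pairs accurate in unbounded data and some pairs accurate in “in-bounds” data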



SEER Paired Data Performance

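PRED(30) by Operating Environment and Application Domain pair

Unbounded:

| Op Env | Cte | Cmd/Ctrl | Database | Mission Planning | OS/Exec | Signal Processing | Simulation | Support | Test |
|---|---|---|---|---|---|---|---|---|---|
| Military Ground | Mean | 11.1% | N/A | N/A | N/A | 11.1% | 20.0% | 5.9% | 22.2% |
| Military Ground | Median | 22.2% | N/A | N/A | N/A | 33.3% | 40.0% | 23.5% | 22.2% |
| Military Mobile | Mean | 27.6% | N/A | 61.5% | N/A | 33.3% | N/A | N/A | N/A |
| Military Mobile | Median | 27.6% | N/A | 76.9% | N/A | 38.1% | N/A | N/A | N/A |
| Mil-Spec Avionics | Mean | N/A | N/A | N/A | 0.0% | N/A | N/A | N/A | N/A |
| Mil-Spec Avionics | Median | N/A | N/A | N/A | 50.0% | N/A | N/A | N/A | N/A |
| Unmanned Space | Mean | 30.0% | N/A | N/A | N/A | 0.0% | N/A | N/A | N/A |
| Unmanned Space | Median | 50.0% | N/A | N/A | N/A | 42.9% | N/A | N/A | N/A |

In-Bounds:

| Op Env | Cte | Cmd/Ctrl | Database | Mission Planning | OS/Exec | Signal Processing | Simulation | Support | Test |
|---|---|---|---|---|---|---|---|---|---|
| Military Ground | Mean | N/A | N/A | N/A | N/A | 0.0% | N/A | 27.3% | 37.5% |
| Military Ground | Median | N/A | N/A | N/A | N/A | 50.0% | N/A | 54.5% | 25.0% |
| Military Mobile | Mean | 60.9% | N/A | 61.5% | N/A | 50.0% | N/A | N/A | N/A |
| Military Mobile | Median | 60.9% | N/A | 76.9% | N/A | 58.3% | N/A | N/A | N/A |
| Mil-Spec Avionics | Mean | N/A | N/A | N/A | 60.0% | N/A | N/A | N/A | N/A |
| Mil-Spec Avionics | Median | N/A | N/A | N/A | 60.0% | N/A | N/A | N/A | N/A |
| Unmanned Space | Mean | 50.0% | N/A | N/A | N/A | 0.0% | N/A | N/A | N/A |
| Unmanned Space | Median | 25.0% | N/A | N/A | N/A | 0.0% | N/A | N/A | N/A |

• Unbounded data has a few productive pairs
• Using “in-bounds” data improves overall accuracy with more data pairs having higher PRED(30) scores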



COCOMO II Paired Data Performance

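PRED(30) by Operating Environment and Application Domain pair

Unbounded:

| Op Env | EAF | Cmd/Ctrl | Database | Mission Planning | OS/Exec | Signal Processing | Simulation | Support | Test |
|---|---|---|---|---|---|---|---|---|---|
| Military Ground | Mean | 44.4% | N/A | N/A | N/A | 33.3% | 60.0% | 58.8% | 44.4% |
| Military Ground | Median | 66.7% | N/A | N/A | N/A | 44.4% | 60.0% | 58.8% | 55.6% |
| Military Mobile | Mean | 89.7% | N/A | 53.8% | N/A | 52.4% | N/A | N/A | N/A |
| Military Mobile | Median | 86.2% | N/A | 63.5% | N/A | 57.1% | N/A | N/A | N/A |
| Mil-Spec Avionics | Mean | N/A | N/A | N/A | 50.0% | N/A | N/A | N/A | N/A |
| Mil-Spec Avionics | Median | N/A | N/A | N/A | 50.0% | N/A | N/A | N/A | N/A |
| Unmanned Space | Mean | 80.0% | N/A | N/A | N/A | 28.6% | N/A | N/A | N/A |
| Unmanned Space | Median | 80.0% | N/A | N/A | N/A | 57.1% | N/A | N/A | N/A |

In-Bounds:

| Op Env | EAF | Cmd/Ctrl | Database | Mission Planning | OS/Exec | Signal Processing | Simulation | Support | Test |
|---|---|---|---|---|---|---|---|---|---|
| Military Ground | Mean | N/A | N/A | N/A | N/A | 16.7% | N/A | 0.0% | 12.5% |
| Military Ground | Median | N/A | N/A | N/A | N/A | 16.7% | N/A | 0.0% | 12.5% |
| Military Mobile | Mean | 10.5% | N/A | 10.0% | N/A | 0.0% | N/A | N/A | N/A |
| Military Mobile | Median | 10.5% | N/A | 10.0% | N/A | 0.0% | N/A | N/A | N/A |
| Mil-Spec Avionics | Mean | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Mil-Spec Avionics | Median | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Unmanned Space | Mean | 0.0% | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Unmanned Space | Median | 0.0% | N/A | N/A | N/A | N/A | N/A | N/A | N/A |

• Unbounded data has many productive pairs with high PRED(30) values
• Using “in-bounds” data eliminates accuracy with no data pairs having high PRED(30) values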



What Does It All Mean?

• Putnam Model:
  – Default model has some accuracy in certain areas using mean/median Productivity values
  – Bounding the data by Productivity thresholds has little effect
  – Paired Operating Environments and Application Domains are limited in accuracy

• SEER Model:
  – Default model has high prediction accuracy in only a few areas
  – Bounding the data by Staffing Complexity has limited effect
  – Paired, bounded data has the highest overall prediction accuracy for the model

• COCOMO II Model:
  – Default model has the best overall prediction accuracy
  – Bounding the data by Schedule-calculated Scaling Exponent produces poor results
  – Paired, unbounded data has some excellent prediction accuracy values

• Caveats:
  – No calibration tools were used and no regression analysis was performed
  – No “fine-tuning” of the Productivity values was possible

• Calculating the average productivity variable for a model with stratification does not always produce credible results, but in certain cases, quick estimates can be made with limited information

• Pairing strata is very beneficial in some models

• Calculated productivity ranges and resulting performance can be used to develop and evaluate software cost models



Future Research

• Software Resources Data Report (SRDR) performance in models with/without boundaries

• Performance against tool-calibrated model changes (Calico, SEER calibration, etc.)

• Optimal schedule vs. actual schedule performance



Questions
