TRANSCRIPT
Estimating Cost

size (LoC, fp)
difficulty
effort (ideal mm)
productivity
work (mm)
rate ($/mm)
cost ($)
time
Modeling
• Simple functional shape
  – e.g. effort = c × size^exponent (scaling)
  – Based on general observations
• Very many parameters
• Calibration based on past experience
• Speculative due to high uncertainty
• Not very accurate
So how should we measure (and estimate) the size of a software project?
LoC
• What is a line? What is code?
• Typically count statements
• Exclude comments
• Exclude blank lines
• What about macros?
• What about temporary code, e.g. for testing?
• Depends on language
  – handled in another part of the model
• How do you estimate LoC from requirements?
Function Points (fp)
• Estimate the functional content of the project
  – Based on requirements
  – Mainly external aspects of the system (black box)
  – Can be used early in the life cycle
• Invented at IBM in the 1970s
• Geared towards information systems
• Several variants, e.g. the one used by SPR (Capers Jones's company)
• Translation of fp to LoC depends on language
  – fp itself is independent of language
Function Points (fp)
How is it done?
• Based on counting 4-7 parameters
• Multiplying them by weighting factors
• Summing up the weighted counts
• Multiplying by a complexity adjustment factor
fp Spreadsheet
Parameter             Count   Weight    Result
Number of inputs      _____   x 4     = _____
Number of outputs     _____   x 5     = _____
Number of queries     _____   x 4     = _____
Number of data files  _____   x 10    = _____
Number of interfaces  _____   x 7     = _____
Unadjusted total                        _____
Complexity adjustment factor            _____
Adjusted fp total                       _____
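The spreadsheet arithmetic can be sketched directly in code. A minimal Python version, using the weights from the table above; the counts below are invented for illustration:

```python
# Weights from the fp spreadsheet above; counts are made-up illustrations.
WEIGHTS = {"inputs": 4, "outputs": 5, "queries": 4, "data_files": 10, "interfaces": 7}

def adjusted_fp(counts, complexity_adjustment=1.0):
    """Sum the weighted counts, then apply the complexity adjustment factor (0.65-1.35)."""
    unadjusted = sum(WEIGHTS[p] * n for p, n in counts.items())
    return unadjusted * complexity_adjustment

counts = {"inputs": 10, "outputs": 6, "queries": 4, "data_files": 3, "interfaces": 2}
print(adjusted_fp(counts))        # unadjusted total: 10*4 + 6*5 + 4*4 + 3*10 + 2*7 = 130
print(adjusted_fp(counts, 1.2))   # adjusted: 130 * 1.2 = 156
```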
Callouts on the slide annotate the parameters with examples: controls (what to do), screens, records/fields, internal files (e.g. an index), and interfaces with other programs.
Can have different weights for "simple", "average", or "complex"
Complexity Adjustment
Data communications
Distributed functions
Performance objectives
Heavily used configuration
Transaction rate
On-line data entry
End-user efficiency
On-line update
Complex processing
Reusability
Installation ease
Operational ease
Multiple sites
Facilitate change
• Rate each on a scale of 0 to 5
• Sum them up
• Divide by 100
• Add 0.65
• This gives a factor in the range 0.65-1.35 (35%)
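The four steps above amount to a one-liner. A minimal sketch in Python (the 14 ratings are made up):

```python
def complexity_adjustment(ratings):
    """ratings: the 14 complexity adjustment factors, each rated 0-5."""
    assert len(ratings) == 14 and all(0 <= r <= 5 for r in ratings)
    return 0.65 + sum(ratings) / 100

print(complexity_adjustment([0] * 14))  # 0.65: no influence anywhere
print(complexity_adjustment([5] * 14))  # 1.35: strong influence everywhere
print(complexity_adjustment([3] * 14))  # ~1.07: average influence everywhere
```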
0 = no influence
1 = insignificant influence
2 = moderate influence
3 = average influence
4 = significant influence
5 = strong influence
COCOMO
• Stands for COnstructive COst MOdel
• Published by Barry Boehm in 1981 for waterfall
• COCOMO II update for modern methodologies published in 2000
• Actually three models, with many parameters
  – Early prototype: checking high-risk issues stage
  – Early design: architecture development stage
  – Post-architecture: code development to delivery stage, very detailed
Size in COCOMO II
• Start with function points
• Translate to KLoC based on language
Language     LoC/fp
Assembly     320
C            128
Fortran 77   105
Lisp         64
Java         53
Visual C++   34
Perl         27
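The table translates directly into a lookup. A sketch (the fp count of 500 is made up):

```python
# LoC per function point, from the table above.
LOC_PER_FP = {"Assembly": 320, "C": 128, "Fortran 77": 105, "Lisp": 64,
              "Java": 53, "Visual C++": 34, "Perl": 27}

def kloc_estimate(fp, language):
    """Translate a function-point count into thousands of lines of code."""
    return fp * LOC_PER_FP[language] / 1000

print(kloc_estimate(500, "C"))     # 64.0 KLoC
print(kloc_estimate(500, "Java"))  # 26.5 KLoC -- same functionality, much less code
```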
The Model
• MM = man-months of effort
• KLoC = lines of code ('000)
  – Includes model for taking reuse into account
  – Also estimates a factor of increase due to changes
• SFj = set of scaling factors
• EMi = set of effort multipliers
MM = 2.94 × KLoC^E × Π(i=1..16) EMi
where E = 0.91 + Σ(j=1..5) SFj
time = 3.67 × MM^(0.28 + 0.2 × Σ(j=1..5) SFj)

staff = MM / time

Note: not the other way around!!!
Calibration
• The model includes two "top-level" constants
  – Average productivity of 2.94 and exponent of 0.91
• Also dozens of parameters in scaling factors and effort multipliers
• These are all derived by calibrating the model to data about 161 specific projects from the late 1990s using a Bayesian approach
• Users should calibrate to their own data
The Exponent
LoC^(0.91 + Σ(j=1..5) SFj)
• <1 reflects economy of scale
  – Uncommon
  – Possible due to fixed startup costs in small projects
• >1 reflects diseconomy of scale
  – Due to growth of inter-person communication needs
  – Due to growth of integration overhead
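A quick numeric illustration of diseconomy of scale, using the nominal productivity constant 2.94 and an exponent of 1.15 (the value used in the example later), with all effort multipliers at 1.00:

```python
def effort(kloc, exponent=1.15):
    """Nominal COCOMO II effort in man-months, all effort multipliers = 1.00."""
    return 2.94 * kloc ** exponent

one_big = effort(100)       # one 100-KLoC project
two_small = 2 * effort(50)  # two independent 50-KLoC projects of the same total size
print(round(one_big), round(two_small))  # 587 vs 529: the big project costs ~11% more
```

With an exponent below 1 the comparison would flip, which is what "economy of scale" means here.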
Scaling Factors
• The project is similar to previous ones
• Flexibility in achieving goals
• Risks have been resolved
• Team is cohesive
• Process is mature (based on 18-item questionnaire)
• Each factor has several levels
• Each level has a score
  – Scores go down from ~0.07 to 0
Example:
Thoroughly unprecedented = 0.0620
Largely unprecedented = 0.0496
Somewhat unprecedented = 0.0372
Generally familiar = 0.0248
Largely familiar = 0.0124
Thoroughly familiar = 0.0000
Effort Multipliers
• Each has a scale of possible values
• All scales include 1.00 as the default
• Some have only higher values
• Others have both higher and lower
Required reliability
Database size
Product complexity
Intended reusability
Suitability of documentation
Execution time constraints
Main storage constraint
Platform volatility
Analyst capabilities
Programmer capabilities
Personnel continuity
Applications experience
Platform experience
Language/tools experience
Use of tools
Multi-site development
Required schedule
Example:
Very low = 0.73
Low = 0.87
Nominal = 1.00
High = 1.17
Very high = 1.34
Extra high = 1.74
Selection based on a table with examples for control, data, computation, devices, and user interface.
Example:
Available storage used   Rating       Multiplier
50%                      Nominal      1.00
70%                      High         1.05
85%                      Very high    1.17
95%                      Extra high   1.46
Highest impact on productivity, as assessed by max/min factor:
• Staff capabilities: 3.53
• Project complexity: 2.38
• Time constraint: 1.63
• All others: range of 1.26 to 1.54
Example
• Assume an estimated size of 100 KLoC
• Average large project: exponent = 1.15
• Average project: all effort multipliers = 1.00
MM = 2.94 × 100^1.15 ≈ 587
time = 3.67 × 587^0.33 ≈ 29.7
staff = 587 / 29.7 ≈ 19.75
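The worked example can be checked in code. A sketch of the model equations as given above; the five scale-factor values are chosen (arbitrarily) so that they sum to 0.24, which yields the exponent 1.15:

```python
from math import prod

def cocomo(kloc, scale_factors, effort_multipliers):
    """COCOMO II effort, schedule, and staffing (scale factors already /100, as on the slides)."""
    sf = sum(scale_factors)
    mm = 2.94 * kloc ** (0.91 + sf) * prod(effort_multipliers)
    time = 3.67 * mm ** (0.28 + 0.2 * sf)
    return mm, time, mm / time   # staff = MM / time, not the other way around

mm, time, staff = cocomo(100, [0.048] * 5, [1.00])  # sum of SFs = 0.24 -> exponent 1.15
print(round(mm), round(time, 1), round(staff, 1))   # ~587, ~29.7, ~19.8
```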
Use-Case Points
• Function points are based on information system concepts like queries and transactions
• Modern systems are not characterized by the same attributes
• But they can be characterized in terms of use-cases
• Which are also known in an early phase of the lifecycle
Use-Case Points

UCP = (UC + A) × TECH × ENV

UC = sum of weights for simple, average, and complex use cases

Type      Steps   Classes   Weight
Simple    ≤3      ≤5        5
Average   4-7     5-10      10
Complex   >7      >10       15
A = sum of weights for simple, average, and complex actors

Type      Characteristics               Weight
Simple    Programmatic using API        1
Average   Programmatic using protocol   2
Complex   Human using GUI               3
TECH = technical complexity factor

TECH = 0.6 + Σ(i) Wi × Fi

Each factor Fi is scored from 0 to 5 and multiplied by its weight Wi from the table.
Range: 0.6-1.3

Distributed system          0.02
Response time requirement   0.01
End-user efficiency         0.01
Complex processing          0.01
Reusable code               0.01
Easy to install             0.005
Easy to use                 0.005
Portability                 0.02
Maintenance                 0.01
Concurrent/parallel         0.01
Security features           0.01
Access by third party       0.01
End-user training           0.01
ENV = environmental complexity factor

ENV = 1.4 − Σ(i) Wi × Fi

Each factor Fi is scored from 0 to 5 and multiplied by its weight Wi from the table.
Range: 0.42-1.70

Familiarity with process        0.045
Application experience          0.015
OO experience                   0.03
Lead analyst capability         0.015
Team motivation                 0.03
Requirements stability          0.06
Part-time staff                 -0.03
Difficult programming language  -0.03
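Putting the four pieces together: a sketch with the weight tables from the slides; the use-case counts, actor counts, and factor scores are invented for illustration:

```python
# Weight tables from the slides (13 technical factors, 8 environmental factors).
TECH_WEIGHTS = [0.02, 0.01, 0.01, 0.01, 0.01, 0.005, 0.005,
                0.02, 0.01, 0.01, 0.01, 0.01, 0.01]
ENV_WEIGHTS = [0.045, 0.015, 0.03, 0.015, 0.03, 0.06, -0.03, -0.03]

def use_case_points(uc, a, tech_scores, env_scores):
    """UCP = (UC + A) * TECH * ENV; each score is 0-5 per factor."""
    tech = 0.6 + sum(s * w for s, w in zip(tech_scores, TECH_WEIGHTS))
    env = 1.4 - sum(s * w for s, w in zip(env_scores, ENV_WEIGHTS))
    return (uc + a) * tech * env

uc = 2 * 5 + 4 * 10 + 1 * 15   # 2 simple, 4 average, 1 complex use case = 65
a = 1 * 1 + 0 * 2 + 3 * 3      # 1 simple, 0 average, 3 complex actors = 10
print(round(use_case_points(uc, a, [3] * 13, [3] * 8), 1))  # ~76.1
```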
Agile
• Approach is to guarantee quality and schedule at the expense of features
• User stories broken into tasks of 1-3 “ideal days”
• Measure velocity = how many ideal days correspond to a real day
• Plan user stories for next iteration taking velocity into account
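A minimal sketch of velocity-based iteration planning (all numbers are invented):

```python
# Last iteration: 24 ideal days of tasks completed in 40 person-days of real time.
velocity = 24 / 40            # 0.6 ideal days per real day
capacity = 40 * velocity      # ideal days that fit in the next iteration: 24.0

# Hypothetical backlog of user stories, estimates in ideal days, in priority order.
stories = [("story A", 8), ("story B", 7), ("story C", 6), ("story D", 5)]
planned, used = [], 0
for name, estimate in stories:   # greedily fill the iteration
    if used + estimate <= capacity:
        planned.append(name)
        used += estimate
print(planned, used)  # ['story A', 'story B', 'story C'] 21 -- story D waits
```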
Schedule
Common approach:
• Manager decides on schedule
• Subordinates tell him it will be OK
  – SNAFU principle: accurate communication is only possible between equals
• Engineers don't have a say
• Schedule slippage is discovered too late
• Overwork, hysteria, reduced quality, and late delivery
Brooks’s Law
From “The Mythical Man-Month”
Adding manpower to a late software project makes it later
Men and months are not interchangeable
• Attributed to
  – The need to train new personnel
  – Communication overhead within a larger team
  – Serial tasks like design, debugging, and integration
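The communication-overhead point is often illustrated by counting pairwise communication paths, which grow quadratically with team size. A tiny sketch (an illustration of the growth, not an exact cost model):

```python
def communication_paths(n):
    """Number of pairwise communication paths in a team of n people: n*(n-1)/2."""
    return n * (n - 1) // 2

for n in (2, 5, 10, 20):
    print(n, communication_paths(n))  # 2->1, 5->10, 10->45, 20->190
```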
Schedule
Alternative approach:
• Manager decides on schedule
• Subordinates provide estimates
  – Based on best available input from engineers
  – Managers may not change this
• Manager creates feedback loop: disappointing estimates lead to revised resources or tasks for engineers
• Schedule affects the manager's performance rating but not the engineers'
Technical Debt
Oftentimes you have two options of how to do something:
A clean design OR quick and dirty
The difference between them is the technical debt:
• It burdens future development (harder to make progress)
• It accrues interest (you will pay with harder work)
• You can (and should?) pay off the principal (refactor to achieve a clean design)
Examples
• Not using classes to separate concerns
• Partial/no unit testing
• Using an inefficient algorithm because it's simpler
• Not checking inputs
• Using bad identifier names or not naming constants
• Using constants instead of settable parameters
• Skipping documentation
• Cryptic error messages, if any
Technical debt is a form of risk (albeit risk introduced intentionally).
Managing technical debt is risk management.
Considerations
• Take on debt to exploit a business opportunity
  – e.g. make a release to gain market share
• Pay off debt to avoid paying interest
  – Will make future progress more efficient, but...
  – Incurs "wasted" time in which we don't make progress
• Continue paying interest if it is low enough
  – e.g. if dirty code is peripheral
Refactoring
• High level
  – Create new abstractions (+ information hiding)
  – Move methods from/to super/sub class
• Low level
  – Changing names
  – Extracting methods
  – Supported by tools
• Need to include in schedule
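A low-level refactoring example in Python (the names and the validity rule are invented): renaming and extracting a method without changing behavior.

```python
# Before: cryptic name, inlined condition.
def proc(vals):
    total = 0
    for v in vals:
        if v >= 0:
            total += v
    return total

# After: an intention-revealing name and an extracted, reusable helper.
def is_valid(value):
    """Hypothetical validity rule: only non-negative values count."""
    return value >= 0

def sum_valid_values(values):
    return sum(v for v in values if is_valid(v))

# Behavior is unchanged -- the property a tool-supported refactoring guarantees.
print(proc([3, -1, 4]) == sum_valid_values([3, -1, 4]))  # True
```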