Observations on Cost Modeling and Performance Measurement of Long Term Archives Kathy Fontaine NASA Goddard Space Flight Center Earth Science Data Systems Working Groups Greg Hunolt, Bud Booth, Mel Banks SGT, Inc. PV2007 Conference October 9 - 11, 2007 DLR Oberpfaffenhofen - Munich - Germany CEOS WGISS October 15 - 19, 2007 DLR Oberpfaffenhofen - Munich - Germany



Observations on Cost Modeling and Performance Measurement of Long Term Archives

Kathy Fontaine
NASA Goddard Space Flight Center

Earth Science Data Systems Working Groups

Greg Hunolt, Bud Booth, Mel Banks
SGT, Inc.

PV2007 Conference
October 9 - 11, 2007
DLR Oberpfaffenhofen - Munich - Germany

CEOS WGISS
October 15 - 19, 2007
DLR Oberpfaffenhofen - Munich - Germany


Agenda

• Review - What is the Cost Estimation Toolkit?

– Goal and Approach of the CET (Cost Estimation Tool) Development

– High Level Description of the Data Activity Reference Model

• Experience / Lessons Learned - Building and Maintaining the Comparables Database (CDB)

• High Level Description of the Cost Estimating Tool

• Application of the CET to Long-Term Archives

• Summary

• Next Steps


Goal of the CET (Cost Estimation Tool) Development

• NASA has always used cost estimating models for planning Earth and space science flight projects
– For estimating costs of instrument packages, spacecraft, mission control centers, etc.

• NASA had no tool for estimating life cycle costs of science ground data handling capabilities, whether stand-alone or within a flight project.

• The goal of the CET development was to see whether that gap could be filled -
– Project was begun in 2002.
– CET prototypes were tested and evaluated in 2003 and 2004.
– ‘Operational Beta’ versions were completed in 2005, 2006, and 2007.
– Initial testing at GSFC and LaRC in 2005, 2006, and 2007 was successful.
– CET ‘operational beta’ is being evaluated for addition to GSFC’s Integrated Development Center’s package of tools, and was made available as a NASA Open Source item in 2007.

• - for the PI planning a new Data Activity
– To help the PI consider the full range of items that will contribute to the life cycle cost of a new data activity and to produce an estimate that the PI can compare to estimates produced by other means.


CET Approach

• Cost Estimation by Analogy Method was Adopted
– Decision based on benchmark study and internal testing of existing tools (PRICE, SEER, COCOMO, and others);
– At the time, did not find other acceptable parametric methods for estimating life cycle data costs for implementation and maintenance/operations costs;
– Ensures that estimates will be based on experience with existing science ground data handling activities;
– Requires assembly of information about existing activities, and…
– Mapping of that information to a common reference model, so that information from multiple activities with multiple data providers can be normalized and used together in the estimating process.

• Comparables Database (CDB)
– The database of information from many existing data activities mapped to the common reference model.

• Data Activity Reference Model
– Based on reference model developed during 2001 comparative analysis of 19 U.S. and international data activities.
– Includes a set of development, operational and support functions / areas of cost, and descriptors for each.

• CET Estimation by Analogy Implementation
– The CET uses adaptive regression curve fitting for estimating staffing levels and parametric techniques (e.g. cost curves) for non-staff cost items.


Data Activity Reference Model

• A ‘Data Activity’ is
– An entity that performs data handling functions that may include ingest, product generation, storage/archive, distribution, and support functions (see below).
– A data activity’s life cycle includes implementation and a period of operations (when data activity is performing data handling functions) that may overlap.
– A data activity can be a ‘stand-alone’ organization or embedded within a flight project or other science or applications project. A ‘data center’ can include more than one distinct data activity.

• Data Activity Reference Model
– Functions with Descriptors for each…
– Operating Functions: Ingest, Product Generation, Archive, Search and Order, Access and Distribution, User Support.
– Support Functions: Documentation, Implementation, Sustaining Engineering, Engineering Support, Management, Technical Coordination, Facility/Infrastructure.
– See paper for more detail on functions, example of descriptors.
– Compatible with OAIS where the models overlap, see paper for more detail.


Concepts: Reference Model, CDB and CET

[Diagram. Elements shown:]

• General Data Activity Reference Model: Functions; Descriptors for each Function; Data Activity Template.
• Information on Existing Data Activities: DAACs, ESIPS, SIPSs, Space Science Data Centers, etc.
• CDB Building & Maintenance Tool: Map Data Activity Information to Reference Model.
• Comparables Database (CDB): Mapped Data Activity Information, Year by Year, Function by Function.
• Version 2.1 - 29 Data Activities.
• PI User Input: Specify Mission Schedule, etc. Select from Menu of Functions. Provide Descriptors for each selected function.
• Cost Estimation Toolkit (CET): Cost Estimation by Analogy: Function by Function; Staff – Adaptive Regression Curve Fitting; Non-Staff – Parametric.
• CET Output: Life-cycle costs and staffing levels.
• Graphs; Sensitivity Analysis.


High Level Description of CET

• Excel-based, uses Visual Basic for Applications, two workbooks, one for CET, the other holds the CDB; runs on PC or Macintosh platforms.

• Use the CET to

– Describe a new Data Activity: Menu Driven Sequence of Forms for Selecting Functions and Entering Descriptors (example to follow).

– Produce a life cycle estimate: year by year, functional breakdown, staffing profile and costs, costs for non-staff items (example to follow).

– Run a ‘what-if’: vary one or more inputs, re-run, produce new estimate and comparison with original estimate.

– Test sensitivity of estimate to a range of variation of a selected descriptor.

– Produce graphs: select from a number of options (examples to follow).

– Review and edit/tailor the estimate… tool offers hints such as:
• Adjust staffing levels to smooth out ups and downs that track workload changes but would be impractical to implement;
• Delete costs for items included in loaded labor rates;
• Adjust for re-use of existing resources.


Data Activity Reference Model: Functions and Descriptors

Data Activity Function      No. of Descriptors to Describe Each Function
Ingest                      9
Product Generation          21
Documentation               4
Archive                     16
Distribution                31
User Support                6
Management                  5
Sustaining Engineering      4
Engineering Support         3
Implementation              8


Data Activity Reference Model Ingest Function Descriptors (Example)

Total Ingest FTE

Ingest Technical FTE

Ingest Operations FTE

Ingest Function Level of Service (LOS)

External Ingest Interfaces

Product Types Ingested per Year

Ingest Automation LOS

Number of Products Ingested per year

Ingest Volume per Year


CET - Sample Ingest Descriptor Input Form


CET Screen Shot – Archive Form


CET Screen Shot – Processing Form


CET Screen Shot – Sample Output Table


CET - Sample Life Cycle Cost Estimate Output


CET - Graph Example 1

3. Sample Activity - Total Mission Life Staffing by Labor Cost Category

[Chart. Legend: Admin Support; Development / Engineering; Management; Operations; Technical / Science. Data labels: 1%, 50%, 9%, 33%, 7%.]


CET - Graph Example 2

5. Sample Activity - Average Annual Staffing by Function FTE - Operations Period

[Chart. Legend: Archive; Development; Distribution; Documentation; Eng Support; Ingest; Management; Processing; Sustaining Eng; Tech Coord; User Support. Data labels: 0, 0, 4.11, 0.76, 0.35, 3.87, 0.42, 1.79, 1.43, 2.53, 0.97.]


CET – Graph Example 3

7. Sample Activity - Total Estimated Staff Costs

[Chart. Legend: Development / Engineering; Management / Admin; Operations; Technical / Science. Data labels: 49%, 10%, 34%, 7%.]


Application of the CET to Long-Term Archives

• CET and CDB currently do not directly support Long Term Archives
– No such NASA requirement currently exists for Earth science data, but they could be extended to do so…

• Step 1 – Extend Data Activity Reference Model:
– Analyze OAIS model (especially Preservation Planning and aspects of Ingest, Archival Storage, and Data Management) and existing Long Term Archives
– Identify specific functions or aspects of functions associated with long term archiving that go beyond what the model now includes.

• Step 2 – Extend the Comparables Database:
– Collect information from a number of existing Long Term Archives
– Map to the extended Data Activity Reference Model
– Populate the CDB

• Step 3 – Extend the CET:
– Add estimation of new factors particular to Long Term Archives

• Extended CET / CDB could then be used to estimate staffing / costs for a New Long Term Archive, and could be used to support management of existing Long Term Archives.


Observations

• Yes, the gap could be filled
– The CET is proving to be a valuable tool for estimating the life cycle costs of scientific data processing, archive, and distribution activities.
– The information collected for the CDB can also be used by such activities to monitor their performance.
– The CET and its database are capable of being extended to encompass long term archives, thus providing a quantitative tool for both planning their development and monitoring their performance.

• However,
– Cost estimation by analogy requires, among other things,
• lots of analogies [many data activities of similar sizes, for instance]
• lots of maintenance [information must be updated to maintain currency and relevancy]
• lots of security [data activity information must not be labeled or otherwise identifiable]
– All of the above would require a good, solid set of requirements, a project plan, and other necessary management and review structure.

• And so…


Next Steps

• NASA is preparing to do an in-depth evaluation of the tool
– NASA’s evolving data systems present a different overall picture than was present at the beginning of this process;
– It is now time to determine whether ‘it should continue to be done,’ and if so, which pieces and how.
– Existing user feedback is being incorporated, and will continue to be critical to this process.

http://opensource.gsfc.nasa.gov



Thank You for Your Attention!

Questions?

Further questions or comments: [email protected]



Backup Charts


CET Effort Estimation Process

[Flow diagram. Elements shown:]

• Activity Dataset - Describes a New Activity. Workload, LOS Parameters: Single New Activity, Year by Year, Function by Function, Stream by Stream.
• Compute: Parameters – Year by Year, Summed over Streams.
• Comparables DB – Describes Existing Activities. Effort and Workload Parameters: Multiple Activities, Year by Year, Function by Function.
• Compute: Annual Averages, Workload and Effort Parameters for Each CDB Activity.
• Compute: Annual Averages, Workload and Effort Parameters across CDB Activities.
• Generate: Effort Estimating Relationships for each Workload parameter.
• Compute: Set of year by year effort estimates for each workload parameter.
• Compute: Year by Year Effort Estimates: correlation weighted average over workload parameters, apply levels of service.
• Overall Effort Estimate.
• Stage labels: Intermediate Parameters; Parameter by Parameter; Effort Estimation.

Form of effort estimate computation: Effort[new activity] = f(Workload[new activity]), where f is a function based on the CDB activities’ effort-workload data, developed using the “Curve-Fit” approach.


Cost Estimating Approach

• Method is Cost Estimation by Analogy – the data activities in the CDB are assumed to be analogs for a new data activity to be estimated.

• Year by year staff effort for new data activity is estimated from mission and expected year by year workload (using “effort estimating relationships” – see next chart), then user’s projected local labor rates are applied to produce estimates of staff costs.

• Estimating of effort is done function by function, so CDB comparison is with data for separate functions rather than with whole data activities.

• Non-staff items are currently based on CDB history, use inflation normalization, parametric approaches, ‘cost curves’ etc., for projections.
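
The staff-cost step described above (estimated FTE per year multiplied by the user's projected labor rates) can be sketched as follows. This is only an illustration, not the CET's own Visual Basic code: the function name `staff_costs`, the single blended labor rate, and the fixed annual escalation are all assumptions made for the example.

```python
def staff_costs(fte_by_year, base_rate, escalation=0.03):
    """Return a list of yearly staff costs.

    fte_by_year: estimated FTE for each year of the life cycle
    base_rate:   loaded cost of one FTE in the first year (hypothetical value)
    escalation:  assumed annual growth of the labor rate (not from the slides)
    """
    costs = []
    for year, fte in enumerate(fte_by_year):
        # Project the labor rate forward from the first year.
        rate = base_rate * (1.0 + escalation) ** year
        costs.append(fte * rate)
    return costs


if __name__ == "__main__":
    profile = [2.0, 4.0, 4.0]  # hypothetical year by year FTE estimates
    yearly = staff_costs(profile, 100_000.0, escalation=0.0)
    print(yearly, sum(yearly))
```

A real estimate would apply separate rates per labor category, which this sketch leaves out for brevity.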


CET’s Effort Estimating Process

• Compute averages of annual workload parameters and staffing levels for each functional area for each CDB activity.

• Compute “Effort Estimating Relationships”, i.e. equations for FTE as function of workload parameters

– Using regression-based curve fitting (see next chart) for operating functions (ingest, processing, archive, distribution) and for implementation and sustaining engineering, system purchase cost (normalized to base year then projected);

– Using a ‘base plus delta’ approach for other non-operating functions – CDB averages as base, delta based on comparison of new activity LOS’s with CDB averages.

• Compute year by year staffing for the functional area for the new activity by

– Using the equations to compute a set of FTE estimates, each based on a specific workload parameter, and…

– Computing a weighted average for each functional area’s staffing categories, weighted by curve fit correlation for each workload parameter,

– Using applicable Level of Service parameter(s) to bump up the estimate if the new activity’s LOS is higher than the CDB average, or decrease it if lower.
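
The last two steps (correlation-weighted averaging and the Level of Service adjustment) might look like the sketch below. The slides do not give the exact formulas, so the linear LOS adjustment, its coefficient `coeff`, and both function names are assumptions chosen only to make the idea concrete.

```python
def weighted_fte(estimates, correlations):
    """Combine per-parameter FTE estimates into one value.

    Each estimate is weighted by the absolute curve-fit correlation of the
    workload parameter it was derived from.
    """
    total_weight = sum(abs(r) for r in correlations)
    if total_weight == 0:
        raise ValueError("at least one correlation must be non-zero")
    weighted_sum = sum(e * abs(r) for e, r in zip(estimates, correlations))
    return weighted_sum / total_weight


def apply_los(fte, new_los, cdb_avg_los, coeff=0.1):
    """Raise or lower an FTE estimate according to Level of Service.

    Assumed form: a linear adjustment proportional to the difference between
    the new activity's LOS and the CDB average. The coefficient is a
    hypothetical tuning control, not a value taken from the CET.
    """
    return fte * (1.0 + coeff * (new_los - cdb_avg_los))


if __name__ == "__main__":
    combined = weighted_fte([2.0, 4.0], [0.9, 0.6])  # hypothetical inputs
    print(apply_los(combined, new_los=3, cdb_avg_los=2))
```

Weighting by correlation lets workload parameters that track staffing well in the CDB dominate the estimate, while weakly related parameters contribute little.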


Regression-based Curve Fitting - Detail

Curve Fitting is used to develop a relationship between workload parameters and FTE for the CDB data activities.

1. CET takes a set of data, i.e. values for a workload parameter and corresponding operational or technical FTE values, and performs “cluster” outlier screening.

2. CET computes a set of eight curves, using regression: linear, quadratic, exponential, logarithmic, power, root, linear-exponential, linear-logarithmic.

3. CET eliminates those curves which drive estimated FTE negative or introduce double values (i.e. two FTE estimates for one workload value).

4. CET computes the Pearson correlation coefficient for each remaining curve.

5. CET checks all remaining curves for outliers (points whose departure from the curve exceeds a threshold multiple of the standard deviation) and eliminates one outlier point (the “worst”).

6. CET re-computes the curves without the outlier and makes sure each curve’s correlation is not worse.

7. CET repeats steps 5 and 6 until outliers are gone or the outlier toss limit is reached.

8. CET selects the re-computed curve with the best correlation value.

9. CET uses a limited linear projection if ADS (Activity Data Set) workload exceeds CDB range.

10. CET uses the final curve’s equation/coefficients to make year by year estimates of FTE.
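
A simplified sketch of steps 2 to 4 and 8 follows: fit several curve families by least squares, discard any that predict negative FTE, and keep the one with the best Pearson correlation. It is an illustration under stated assumptions rather than the CET's algorithm: only four of the eight families are included, the outlier handling of steps 1 and 5 to 7 is omitted, workload and FTE values are assumed to be strictly positive, and all function names are hypothetical.

```python
import math


def _linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pearson(us, vs):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    su = math.sqrt(sum((u - mu) ** 2 for u in us))
    sv = math.sqrt(sum((v - mv) ** 2 for v in vs))
    return cov / (su * sv) if su and sv else 0.0


def fit_candidates(xs, ys):
    """Fit four curve families; return {name: predictor function}."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    curves = {}
    a, b = _linear_fit(xs, ys)
    curves["linear"] = lambda x, a=a, b=b: a + b * x
    a, b = _linear_fit(log_x, ys)
    curves["logarithmic"] = lambda x, a=a, b=b: a + b * math.log(x)
    a, b = _linear_fit(log_x, log_y)          # ln y = a + b ln x
    curves["power"] = lambda x, a=a, b=b: math.exp(a) * x ** b
    a, b = _linear_fit(xs, log_y)             # ln y = a + b x
    curves["exponential"] = lambda x, a=a, b=b: math.exp(a + b * x)
    return curves


def best_curve(xs, ys):
    """Return (name, correlation, predictor) for the best admissible curve."""
    best = None
    for name, f in fit_candidates(xs, ys).items():
        preds = [f(x) for x in xs]
        if min(preds) < 0:                    # reject negative FTE estimates
            continue
        r = pearson(preds, ys)
        if best is None or r > best[1]:
            best = (name, r, f)
    return best


if __name__ == "__main__":
    workload = [1.0, 4.0, 9.0, 16.0, 25.0]    # hypothetical CDB values
    fte = [2.0, 4.0, 6.0, 8.0, 10.0]
    name, r, predict = best_curve(workload, fte)
    print(name, round(r, 4), round(predict(36.0), 2))
```

Fitting the power and exponential families through a log transform keeps every fit a plain linear regression, which is one common way to implement such curves; the CET may fit them differently.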


Calibration / Tuning of the CET

• The premise of the cost estimation by analogy approach is that if the CET is calibrated against existing data activities, i.e. tuned to produce the best possible overall results for the known existing data activities, it will produce a good life cycle cost estimate for a new data activity.

• The CDB contains information for twenty-nine data activities that can be used as test subjects, since CDB information includes mission information, workload, staffing, etc.

• Calibration / tuning is accomplished by adjusting CET controls: parameter weighting, outlier removal limits, LOS adjustment coefficients, until the ‘best’ overall performance for the set of CDB data activities is achieved.

• The accuracy of the CET for existing CDB data activities is measured by independent testing…


Independent Testing Process

• The data activity used as a test subject is not allowed to influence its own test results.

• In the independent testing process

– The objective is to measure the error of the estimate of a data activity’s staffing profile (estimate based on its mission and workload).

– A CDB data activity is selected to be a test subject.

– An Activity Data Set is prepared for the data activity, which contains the mission and workload information a CET user would enter.

– That activity is removed from the CDB.

– The CET reads the Activity Data Set, accesses the CDB, and produces an estimate for the data activity.

– The estimated staffing profile is compared with the actual staffing profile to determine the error, function by function and for the activity as a whole.

– The process is repeated for the set of CDB data activities.

– When all activities have been processed, overall errors across the data activities are computed: e.g. overall average absolute error and percentage, and overall bias.
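
The procedure above is a leave-one-out test. A minimal sketch is given below, assuming each activity is reduced to a single (workload, actual FTE) pair and using a deliberately simple stand-in estimator; the real CET works function by function and year by year, and the names `leave_one_out` and `mean_ratio_estimator` are hypothetical.

```python
def leave_one_out(activities, estimator):
    """Run an independent test over a set of activities.

    activities: list of (workload, actual_fte) pairs
    estimator:  callable(training_pairs, workload) -> estimated FTE
    Returns (average absolute error, average signed error, i.e. bias).
    """
    errors = []
    for i, (workload, actual) in enumerate(activities):
        # Remove the test subject so it cannot influence its own estimate.
        training = activities[:i] + activities[i + 1:]
        errors.append(estimator(training, workload) - actual)
    n = len(errors)
    avg_abs_error = sum(abs(e) for e in errors) / n
    bias = sum(errors) / n
    return avg_abs_error, bias


def mean_ratio_estimator(training, workload):
    """Stand-in estimator (an assumption, not the CET method).

    Scales the workload by the average FTE-per-unit-workload of the
    remaining activities.
    """
    ratio = sum(fte / w for w, fte in training) / len(training)
    return workload * ratio


if __name__ == "__main__":
    cdb = [(10.0, 5.0), (20.0, 12.0), (40.0, 18.0)]  # hypothetical data
    print(leave_one_out(cdb, mean_ratio_estimator))
```

Reporting the average absolute error alongside the signed average separates typical error size from overall bias, since positive and negative errors cancel only in the latter.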


Independent Testing Results

• Results are based on testing with 28 CDB sites

• Test Results for the September 2006 version 2.1 of the CET

– The typical annual error of estimate is 2.46 FTE (average absolute error, so positive and negative errors don’t cancel). The average typical error % of actual is 22.9%.

– The overall annual average error across the 29 sites is –0.03 FTE, which is –0.3%, showing very little overall bias.

– For the individual estimates for the 29 data activities: 13 have errors less than 20%; 18 have errors less than 30%; 21 have errors less than 50%; and overall, smaller activities have greater errors (see next chart).

– For the CDB activities, the average standard deviation of FTE for a function, weighted by the number of activities having the function, is 2.57. This is a rough measure of the variability of the information in the CDB.

– The standard deviation of the typical error for the Version 2.1 CET, 1.66, is well within the range of variability of the information in the CDB.


Independent Testing Results, Continued

Actual Staff Size vs ATE (Average Typical Error) Percentage

[Chart. X axis: Actual Staff Size, FTE (Averaged over Activity Life), 0.00 to 30.00. Y axis: ATE Err %, 0.0% to 300.0%.]

If the actual size of an activity was 10 FTE or greater, 14 out of 15, or 93%, had an ATE of less than 30%.

If the actual size of an activity was less than 10 FTE, 4 out of 13, or 31%, had an ATE of less than 30%.


Progress with CET Independent Testing Performance

Improving Average Typical Error (ATE) for CETs

[Bar chart. Vertical axis: 0.00 to 7.00.]

CET Version                      ATE (FTE)
Working Prototype, May 2003      6.08
IOC, September 2003              4.89
Beta Test, May 2004              3.29
Version 1, Sept 2004             2.78
Version 2, Oct 2005              2.47
Version 2.1, Sept 2006           2.46