
8th. World Congress on Computational Mechanics (WCCM8)

5th. European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS 2008)

June 30 – July 5, 2008

Venice, Italy

Comparison of Validation Metrics Using Repeated Full-scale Automobile Crash Tests

* Malcolm H. Ray¹, Marco Anghileri² and Mario Mongiardini³

¹ Worcester Polytechnic Institute, Dept. of Civil and Environmental Eng., 100 Institute Road, Worcester, MA 01609, USA, [email protected]

² Politecnico di Milano, Dept. of Aerospace Eng., Via La Masa 34, 20156 Milan, Italy, [email protected]

³ Worcester Polytechnic Institute, Dept. of Civil and Environmental Eng., 100 Institute Road, Worcester, MA 01609, USA, [email protected]

Key Words: Verification and Validation, Metrics, Finite Element, Roadside safety, Crash Tests.

ABSTRACT

Qualitatively comparing the shapes of time histories in order to validate computational simulations against experiments is a common technique in both general computational mechanics and computational roadside safety. Qualitative comparisons, however, are subjective and open to interpretation. A variety of quantitative metrics are also available for comparing time history data, but developing acceptance criteria for these methods often relies on equally imprecise engineering judgment.

This paper presents the results of time-history comparisons of 10 essentially identical full-scale vehicle redirectional crash tests with a vertical concrete wall. Five of the crash tests used exactly the same type of vehicle, whereas the other five used a similar vehicle that was within the EN 1317 test vehicle specifications for that class of vehicle. A variety of quantitative shape comparison metrics were calculated for each set of repeated crash test cases and the results are presented. The results are compared and contrasted as to the utility of each metric and its diagnostic value in assessing the degree of agreement between the repeated crash test time histories.

Since the crash test experiments are as identical as can be achieved experimentally, the values of the quantitative metrics represent the reasonable range of each metric for matched experiments. Statistical analysis of the data is also performed to assess the typical residual errors that can be expected in full-scale roadside safety crash tests. Finally, recommendations for the use of specific metrics are provided.


INTRODUCTION

Comparing the correspondence between curves obtained from physical experiments and mathematical models is a very common and important technique used by scientists and engineers to determine whether the mathematical models adequately represent physical phenomena. Two common reasons for comparing shapes are the verification or validation of computational results and the assessment of the repeatability of experimental tests. In the former case, an experimental and a numerical curve are compared in order to assess how well the numerical model predicts a physical phenomenon, while in the latter case two or more experimental curves are compared in order to assess whether they represent the same or similar physical events.

A traditional technique has been to visually compare curves by matching peaks, oscillations, common shapes, etc. Although this kind of comparison gives an impression of how similar two curves are, it is based on a purely subjective judgment which can vary from one analyst to another. Approval decisions need to be based as much as possible on quantitative criteria that are unambiguous and mathematically precise. In order to minimize subjectivity, it is necessary to define objective comparison criteria based on computable measures. Comparison metrics, which are mathematical measures that quantify the level of agreement between simulation outcomes and experimental outcomes, can accomplish this goal.

Recently, several comparison metrics have been developed in different engineering domains [2-14]. Metrics can be grouped into two main categories: (i) deterministic metrics and (ii) stochastic metrics. Deterministic metrics do not specifically address the probabilistic variation of either the experiments or the calculations (i.e., for deterministic metrics the calculated results are the same every time given the same input), while stochastic metrics involve computing the likely variation in both the simulation and the experimental response due to parameter variations. Deterministic metrics found in the literature can be further classified into two main types: (a) domain-specific metrics and (b) shape comparison metrics. Domain-specific metrics are quantities specific to a particular application. For example, the axial crush of a railroad car in a standard crash test might be a metric that is useful in designing rolling stock but has no relevance to other applications.

Shape comparison metrics, on the other hand, involve a comparison of curves from a numerical simulation and a physical experiment. The curves may be time histories, force-deflection plots, stress-strain plots, etc. Shape comparison metrics assess the degree of similarity between any two curves in general and, therefore, do not depend on the particular application domain.

In roadside safety, comparisons between several tests, or between test and simulation results, have mainly used domain-specific metrics (e.g., occupant severity indexes, changes in velocity, 10-msec average accelerations, maximum barrier deflection, etc.) [1]. The main advantage of this approach is that the same domain-specific metrics already used to evaluate experiments can also be used to compare test and simulation results. Although the comparison of domain-specific metrics can give an idea of how close two tests, or a test and a simulation, are, shape comparison metrics are a more precise tool since they directly evaluate the basic response of the structures, such as the acceleration and velocity time histories. In roadside safety, the domain-specific metrics are all derived from the acceleration time histories, so if the time history information is valid, any metric derived from the time history data will also be valid.

Once a particular deterministic shape comparison metric is chosen, it is necessary to establish an acceptance criterion for deciding whether the comparison is acceptable. One approach is to set the acceptance criterion arbitrarily, but this is neither satisfying nor precise. A better approach is to determine the realistic variation of the deterministic shape comparison metrics for identical physical experiments and use that variation as an acceptance criterion. For example, if a series of physical experiments results in a shape comparison metric that is within some specific range, a mathematical model of the same phenomenon need only fall within that same range.


OBJECTIVE

The purpose of this paper is to evaluate the response of the most common shape comparison metrics found in the literature for the case of ten essentially identical full-scale crash tests. The experiments consist of two groups of five. The first five used identical vehicles (i.e., the same model and year), whereas the second set of five used similar though not identical vehicles. For each of the two data sets, the five experimental curves were compared in pairs, for a total of four different comparisons, and the obtained metric values were then used to define acceptance criteria and the expected range of values based on a probabilistic approach.

The curves used in this study represent the time history of the lateral acceleration measured at the center of gravity of the vehicles during the impact. A similar effort was undertaken for the longitudinal and vertical accelerations, but those results are not presented here for the sake of brevity. In this type of collision, the lateral acceleration is generally the best measure of the stiffness of the barrier response and is most directly related to measures of occupant impact severity. Acceleration time histories were used rather than velocity or displacement time histories because the accelerations are the experimentally observed quantities, whereas the velocity and displacement are calculated by integrating and double-integrating the acceleration response, respectively. If the acceleration responses compare well, the velocities and displacements will also compare well since they are simply a mathematical operation on the same source data [2].

METRICS

A brief description of the metrics evaluated in this work is presented in this section. All fourteen metrics considered in this paper are deterministic shape comparison metrics. Details about the mathematical formulation of each metric can be found in the cited literature. Conceptually, the metrics evaluated can be classified into three main categories: (i) magnitude-phase-composite (MPC) metrics, (ii) single-value metrics and (iii) analysis of variance (ANOVA) metrics.

MPC

MPC metrics treat the curve magnitude and phase separately using two different measures (i.e., M and P, respectively). The M and P metrics are then combined into a single-value comprehensive metric, C. The following MPC metrics were used: (a) Geers (original formulation and two variants), (b) Russell and (c) Knowles and Gear [4-9]. Table 1 shows the analytical definition of each metric. In this and the following sections, the terms m_i and c_i refer to the measured and computed quantities, respectively, with the subscript i indicating a specific instant in time.

Table 1: Definition of MPC metrics. The table gives the magnitude (M), phase (P) and comprehensive (C) components of the integral comparison metrics (Geers, Geers CSA, Sprague & Geers and Russell) and of the point-to-point Knowles & Gear metric, together with the associated normalization terms; the analytical expressions are those given in the cited references [4-9].
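As a reference point only, a sketch of the commonly published forms of the Geers, Sprague & Geers and Russell components (cf. [4], [6], [7] and the survey in [8]) is given below; the exact variants tabulated above may differ slightly, and the sums run over the sampled instants i.

```latex
% Sketch of commonly published MPC formulations (cf. [4], [6], [7], [8]);
% m_i = measured value, c_i = computed value at instant i.
\begin{align*}
M_{G}  &= \sqrt{\tfrac{\sum_i c_i^2}{\sum_i m_i^2}} - 1, &
P_{G}  &= 1 - \frac{\sum_i m_i c_i}{\sqrt{\sum_i m_i^2 \, \sum_i c_i^2}}, &
C_{G}  &= \sqrt{M_G^2 + P_G^2}, \\[4pt]
M_{SG} &= M_{G}, &
P_{SG} &= \frac{1}{\pi}\cos^{-1}\!\left(\frac{\sum_i m_i c_i}{\sqrt{\sum_i m_i^2 \, \sum_i c_i^2}}\right), &
C_{SG} &= \sqrt{M_{SG}^2 + P_{SG}^2}, \\[4pt]
M_{R}  &= \operatorname{sign}(\mu)\,\log_{10}(1 + |\mu|),
\;\; \mu = \tfrac{\sum_i c_i^2 - \sum_i m_i^2}{\sqrt{\sum_i m_i^2 \, \sum_i c_i^2}}, &
P_{R}  &= P_{SG}, &
C_{R}  &= \sqrt{\tfrac{\pi}{4}\left(M_R^2 + P_R^2\right)}.
\end{align*}
```

The components are usually multiplied by 100 and reported as percentages, which is how they appear in Tables 5 through 9.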

In all MPC metrics the phase component (P) should be insensitive to magnitude differences but sensitive to differences in phasing or timing between the two time histories. Similarly, the magnitude component (M) should be sensitive to differences in magnitude but relatively insensitive to differences in phase. These characteristics of MPC metrics allow the analyst to identify the aspects of the curves that do not agree. For each component of the MPC metrics, zero indicates that the two curves are identical. Each of the MPC metrics differs slightly in its mathematical formulation. The different variations of the MPC metrics are primarily distinguished by the way the phase metric is computed, how it is scaled with respect to the magnitude metric and how it deals with synchronizing the phase. In particular, the Sprague and Geers metric [6] uses the same phase component as the Russell metric [7]. Also, the magnitude component of the Russell metric is peculiar in that it is based on a base-10 logarithm, and it is the only MPC metric among those considered in this paper that is symmetric (i.e., the order of the two curves is irrelevant). The Knowles and Gear metric [8,9] is the most recent variation of MPC-type metrics. Unlike the previously discussed MPC metrics, it is based on a point-to-point comparison. In fact, this metric requires that the two compared curves first be synchronized in time based on the so-called time of arrival (TOA), which represents the time at which a curve reaches a certain percentage of its peak value. In this work the percentage of the peak value used to evaluate the TOA was 5%, which is the typical value found in the literature. Once the curves have been synchronized using the TOA, it is possible to evaluate the magnitude metric. Also, in order to avoid creating a gap between time histories characterized by a large magnitude and those characterized by a smaller one, the magnitude component M has to be normalized using the normalization factor Q_S.
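As an illustration of how such an MPC metric can be evaluated numerically, the following is a minimal MATLAB sketch of a Sprague & Geers-style calculation for two equally sampled time histories; the function name and the assumption of equal sampling are illustrative only and are not taken from the paper.

```matlab
function [M, P, C] = spragueGeers(m, c)
% Minimal sketch of a Sprague & Geers-type comparison between a measured
% history m and a computed history c, assumed to be vectors sampled at the
% same instants.  The common sampling interval cancels out of the ratios.
mm = sum(m.^2);                    % ~ integral of m^2
cc = sum(c.^2);                    % ~ integral of c^2
mc = sum(m.*c);                    % ~ integral of m*c
M  = sqrt(cc/mm) - 1;              % magnitude component (0 for identical curves)
P  = acos(mc/sqrt(mm*cc))/pi;      % phase component (0 for identical curves)
C  = sqrt(M^2 + P^2);              % comprehensive metric
end
```

Multiplying M, P and C by 100 gives percentage values of the kind reported in the tables below.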

Single-value metrics

Single-value metrics give a single numerical value that represents the agreement between the two curves. Seven single-value metrics were considered in this work: (1) the correlation coefficient metric, (2) the NARD correlation coefficient metric (NARD), (3) the Zilliacus error metric, (4) the RSS error metric, (5) Theil's inequality metric, (6) Whang's inequality metric and (7) the regression coefficient metric [10-13]. The first two metrics are based on integral comparisons while the others are based on point-to-point comparisons. The definition of each metric is shown in Table 2.

Table 2: Definition of single-value metrics. The table gives the integral comparison metrics (correlation coefficient and NARD correlation coefficient) and the point-to-point comparison metrics (Zilliacus error, RSS error, Theil's inequality, Whang's inequality and regression coefficient); the analytical expressions are those given in the cited references [10-13].
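Again as a reference point only, the commonly published forms of several of the point-to-point error measures (cf. [8], [12]) are sketched below; the exact expressions used in this work are those of the cited references.

```latex
% Sketch of commonly published point-to-point error measures (cf. [8], [12]);
% m_i = measured value, c_i = computed value at instant i.
\begin{align*}
\varepsilon_{\mathrm{Zilliacus}} &= \frac{\sum_i |c_i - m_i|}{\sum_i |m_i|}, &
\varepsilon_{\mathrm{RSS}} &= \frac{\sqrt{\sum_i (c_i - m_i)^2}}{\sqrt{\sum_i m_i^2}}, \\[4pt]
\varepsilon_{\mathrm{Whang}} &= \frac{\sum_i |c_i - m_i|}{\sum_i |m_i| + \sum_i |c_i|}, &
\varepsilon_{\mathrm{Theil}} &= \frac{\sqrt{\sum_i (c_i - m_i)^2}}{\sqrt{\sum_i m_i^2} + \sqrt{\sum_i c_i^2}}.
\end{align*}
```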

ANOVA metric

ANOVA metrics are based on the assumption that if two curves do, in fact, represent the same event, then any differences between the curves must be attributable only to random experimental error. The analysis of variance (i.e., ANOVA) is a standard statistical test that assesses whether the variance between two curves can be attributed to random error [2,3]. When two time histories represent the same physical event, both should be identical such that the mean residual error, ē, and the standard deviation of the residual errors, σ, are both zero. Of course, this is never the case in practical situations (e.g., experimental errors cause small variations between tested responses even in identical tests). The conventional T statistic provides an effective method for testing the assumption that the observed ē is close enough to zero to represent only random errors. Ray proposed a method in which the residual error and its standard deviation are normalized with respect to the peak value of the test curve and arrived at the following acceptance criteria based on six repeated frontal full-scale crash tests [2]:

The average residual error normalized by the peak response, ē_r, should be less than five percent.

The standard deviation of the normalized residuals, σ_r, should be less than 20 percent.

A paired two-tail t-test on the distribution of the normalized residuals should not reject the null hypothesis that the mean of the residuals is zero at the five-percent level, t_0.05 (i.e., the 90th percentile); the conventional statistic is T = ē_r·√n / σ_r, where n is the number of residual samples.
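A minimal MATLAB sketch of this residual check is given below, assuming two equally sampled histories with m taken as the "true" curve; the function name and the choice of normalizing by the peak of the true curve follow the description above and are illustrative only.

```matlab
function [eBar, sigmaR, T] = anovaResiduals(m, c)
% Sketch of the ANOVA-style residual criteria described above.  m is the
% "true" curve and c the curve being compared; both are vectors sampled at
% the same instants.  Residuals are normalized by the peak of the true curve.
r      = (c - m) / max(abs(m));   % normalized residual history
eBar   = mean(r);                 % mean normalized residual (criterion: |eBar| < 0.05)
sigmaR = std(r);                  % std of normalized residuals (criterion: sigmaR < 0.20)
n      = numel(r);
T      = eBar*sqrt(n)/sigmaR;     % conventional t statistic for the zero-mean hypothesis
end
```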

REPEATED FULL-SCALE CRASH TESTS

A series of five crash tests with new Peugeot 106 vehicles (model year 2000) and a rigid concrete barrier was performed as a part of the ROBUST project (Figure 1) [15]. The tests were independently carried out by five different test laboratories in Europe, herein called laboratories one through five, with the purpose of assessing the repeatability of crash tests. As the main intent was to see whether experimental curves representing the same test result in similar responses, a rigid barrier was intentionally chosen in order to limit the scatter of the results, which is typically greater in the case of deformable barriers. A second series of five tests was performed using the same barrier but with vehicles of different brands and models. All the vehicles used in the series, however, corresponded to the standard small test vehicle specified in the European crash test standard, EN 1317 [16]. The second set of tests was performed to investigate the influence of different vehicle models on the repeatability of crash tests. In all cases, the three components of acceleration, including the lateral acceleration used in this paper, were measured at the center of gravity of the vehicles.

Preprocessing

Page 6: Comparison of Validation Metrics Using Repeated Full-scale ...€¦ · mario@wpi.edu Key Words: Verification and Validation, Metrics, Finite Element, Roadside safety, Crash Tests

6

In order to correctly compare the different time histories, it was necessary to prepare them by performing the following operations: (a) filtering, (b) re-sampling, (c) synchronizing and (d) trimming. All the preprocessing operations and the subsequent metric evaluations were performed using Matlab® [17]. Initially, all the time histories were filtered using an SAE J211 CFC 60 compliant filter [18], and the initial vertical shift typical of most experimental data was eliminated by shifting each curve by a value equal to the average of its first ten data points.
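The offset removal is a one-line operation; as a sketch, with a assumed to be one filtered acceleration vector:

```matlab
% Remove the initial vertical shift by subtracting the mean of the first
% ten samples, as described above (a is a filtered acceleration vector).
a = a - mean(a(1:10));
```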

Figure 1. Crash test of one of the new Peugeot vehicles.

The residuals between two time histories are by definition the differences at each instant in time between the two curves, one of which is considered to be the "true" or correct curve. For each of the two sets of tests, the "true" curve was chosen to be the time history considered closest to the average response of the specific set (i.e., tests Lab#1 (Set 1) and Lab#1 (Set 2)). In order to compute the residuals, the different time histories have to be sampled at the same rate and start at a common point. As the original sampling rate was not the same for all the time histories, it was necessary to resample each curve to a common rate, which was chosen to be the highest sampling rate among all the tests (i.e., 20 kHz).

Also, as the curves did not all start at exactly the same time (i.e., the impact did not occur at the same instant in each record), it was necessary to synchronize each test such that the most probable impact point was matched in each curve. Although each curve could be considered independently, for the sake of simplicity it was decided to use the "true" curve of the first data set as the reference for the synchronization. The method used to synchronize each time history pair was based on the minimization of the area between the two curves (i.e., the absolute area of the residual time history). A Matlab routine was implemented which could shift either one or the other curve and evaluate the area between the curves in the new configuration. A loop was then implemented to search for the shift value corresponding to the minimum residual error between the two curves. Once the shift values corresponding to each time history from both sets of tests were evaluated, each curve was shifted by the maximum value and trimmed to account for the difference between the specific shift value required for that curve and the common shift value used for all the curves. Also, the curves were cut at the tail in order to guarantee the same final time (i.e., the same length of the data vectors).
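A minimal MATLAB sketch of this shift-search loop is shown below. The search window of +/-2000 samples (0.1 s at 20 kHz) is an assumed value for illustration, and m and c are assumed to be column vectors that have already been filtered and resampled to the common rate.

```matlab
% Sketch of the synchronization step: find the integer sample shift that
% minimizes the absolute area between a candidate curve c and the reference
% ("true") curve m.
maxShift  = 2000;          % assumed search window: +/-0.1 s at 20 kHz
bestArea  = Inf;
bestShift = 0;
for s = -maxShift:maxShift
    if s >= 0
        mOv = m(1+s:end);  cOv = c(1:end-s);   % overlap when c is shifted forward
    else
        mOv = m(1:end+s);  cOv = c(1-s:end);   % overlap when c is shifted backward
    end
    n    = min(numel(mOv), numel(cOv));
    area = sum(abs(mOv(1:n) - cOv(1:n)));      % absolute area of the residual history
    if area < bestArea
        bestArea = area;  bestShift = s;
    end
end
% bestShift now gives the sample offset that best aligns c with m; the curves
% are then shifted and trimmed to a common length as described above.
```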

Residuals statistics

Once all the curves had the same sampling rate and were synchronized with respect to the time of initial impact, the average and standard deviation of the residuals were evaluated for each time history of both sets of tests. The specific values and their averages for the two sets are summarized in Table 3, while the values obtained considering all the tests together are summarized in Table 4.

Table 3. Residual errors for crash test Sets #1 and #2.

Comparison vs. true curve [Lab #1 (Set 1)]
Set #1:                Lab #2   Lab #3   Lab #4   Lab #5   Mean      Std
Average                 0.00    -0.01    -0.01     0.00   -0.005    0.00577
Standard deviation      0.19     0.24     0.23     0.20    0.215    0.02380

Comparison vs. true curve [Lab #1 (Set 2)]
Set #2:                Lab #2   Lab #3   Lab #4   Lab #5   Mean      Std
Average                -0.01     0.00     0.00    -0.02   -0.0075   0.00957
Standard deviation      0.21     0.22     0.31     0.25    0.2475   0.04500

Table 4. Residual errors for crash tests for Sets #1 and #2 combined.

                     Lab#2   Lab#3   Lab#4   Lab#5   Lab#1   Lab#2   Lab#3   Lab#4   Lab#5   Mean    Std
                     (Set 1) (Set 1) (Set 1) (Set 1) (Set 2) (Set 2) (Set 2) (Set 2) (Set 2)
Average               0.00   -0.01   -0.01    0.00   -0.01   -0.01   -0.01   -0.01   -0.03   -0.01   0.00866
Standard deviation    0.19    0.24    0.23    0.20    0.26    0.25    0.26    0.32    0.21    0.24    0.03937

Since the time histories for all the crash tests represent essentially identical physical events, the residuals for each curve should be attributable only to random experimental error or noise. Statistically speaking, this means that the residuals should be normally distributed around a mean error equal to zero. As shown by the cumulative distribution functions in Figure 2, the shape of the residual acceleration distribution is typical of a normal distribution for both sets of crash tests, whether taken separately or combined. Since the cumulative distribution is an "S"-shaped curve centered on zero, the distribution of the residuals is consistent with random experimental error, as would be expected in this series of repeated crash tests.

Figure 2. Cumulative distribution function of the residual accelerations for Set #1 (top left, true curve: Lab #1 (Set 1)), Set #2 (top right, true curve: Lab #1 (Set 2)) and the combination of Sets #1 and #2 (bottom, true curve: Lab #1 (Set 1)).

RESULTS

Once the time histories were preprocessed, each was compared to the "true" curve by evaluating all fourteen comparison metrics previously described. All the calculations were performed using Matlab. Initially, the two sets of tests, Set #1 with the same new vehicle model and Set #2 with similar vehicles, were considered separately, using the response from test Lab #1 of the respective set as the "true" curve.

For both sets, the average of the standard deviations of the residuals between each curve of the set and the respective "true" curve was evaluated. The average standard deviation for each set was then used to evaluate the 90th percentile envelope for that set by adding to and subtracting from the respective "true" curve the average of the standard deviations of the residuals for that specific set of tests multiplied by 1.6449, i.e., the 95th percentile value of the standard normal distribution, which gives a two-sided 90 percent envelope (Figure 3).
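As a sketch, with mTrue the "true" acceleration vector and sigmaAvg the average residual standard deviation for the set (both assumed variable names), the envelope is simply:

```matlab
% 90th percentile envelope: true curve +/- 1.6449 times the average residual
% standard deviation for the set (1.6449 is the 95th percentile z-value).
z90      = 1.6449;
envUpper = mTrue + z90*sigmaAvg;
envLower = mTrue - z90*sigmaAvg;
```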

Figure 3. 90th percentile envelope and acceleration time histories for Set #1 and Set #2.

Table 5. Values of the comparison metrics for Set#1.

Lab#2 (Set 1) Lab#3 (Set 1) Lab#4 (Set 1) Lab#5 (Set 1) Mean STD

MPC Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]

Geers Magnitude -23 -21.4 -14.4 -25.8 -21.2 4.85 -29.2 -13.1

Geers Phase 22.6 36.3 31.2 24.9 28.8 6.21 18.4 39.1

Geers Comprehensive 32.3 42.1 34.3 35.8 36.1 4.23 29.1 43.2

Geers CSA Magnitude -23 -21.4 -14.4 -25.8 -21.2 4.85 -29.2 -13.1

Geers CSA Phase 22.6 36.3 31.2 24.9 28.8 6.21 18.4 39.1

Geers CSA Comprehensive 32.3 42.1 34.3 35.8 36.1 4.23 29.1 43.2

Sprague-Geers Magnitude -23 -21.4 -14.4 -25.8 -21.2 4.85 -29.2 -13.1

Sprague-Geers Phase 21.8 28 25.8 22.9 24.6 2.81 20.0 29.3

Sprague-Geers Comprehensive 31.7 35.2 29.6 34.5 32.8 2.59 28.5 37.0

Russell Magnitude -18.5 -17.2 -11.8 -20.6 -17.0 3.75 -23.3 -10.8

Russell Phase 21.8 28 25.8 22.9 24.6 2.81 20.0 29.3

Russell Comprehensive 25.3 29.1 25.2 27.3 26.7 1.86 23.6 29.8

Knowles-Gear Magnitude 55.2 60.6 48.6 56.8 55.3 5.01 47.0 63.6

Knowles-Gear Phase 40.9 79.9 79.9 67.9 67.2 18.39 36.6 97.7

Knowles-Gear Comprehensive 53.1 64.2 55.1 58.8 57.8 4.88 49.7 65.9

Single Value Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]

Whang's inequality metric 40.9 53.2 46.3 46 46.6 5.05 38.2 55.0

Theil's inequality metric 35.8 43.9 40.1 37.9 39.4 3.46 33.7 45.2

Zilliacus error metric 70.8 92.2 87.1 77.9 82.0 9.53 66.2 97.8

RSS error metric 63.3 78.5 74.4 66 70.6 7.10 58.8 82.3

WIFac_Error 53.3 61.5 58.2 55.5 57.1 3.54 51.3 63.0

Regression Coefficient 72.6 52.4 58.9 69.8 63.4 9.43 47.8 79.1

Correlation Coefficient 72.7 55.3 62 69.8 65.0 7.86 51.9 78.0

Correlation Coefficient(NARD) 77.4 63.7 68.8 75.1 71.3 6.21 60.9 81.6

ANOVA Metrics Value Value Value Value Value Value min max

Average 0 -0.01 -0.01 0 -0.01 0.01 -0.01 0.00

Std 0.19 0.24 0.23 0.2 0.22 0.02 0.18 0.25

T-test -1.51 -1.87 -2.56 1.31

(The min [%] and max [%] columns give the acceptable value range for each metric.)


Tables 5 and 6 show the values of the comparison metrics obtained for each of the two sets of test data. The values of the metrics for each of the laboratories (i.e., two through five) are shown in each table along with the mean and standard deviation of each metric. If it is assumed that the distribution of a metric is normal, then 90 percent of its values should be within 1.66 standard deviations of the mean. Possible acceptance criteria, listed in the last two columns, were obtained by calculating this 90th percentile limit of the observed variation of each metric.
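In other words, the min and max columns of the tables can be reproduced from the tabulated mean and standard deviation of each metric; a one-line sketch, with metricMean and metricStd as assumed variable names:

```matlab
% Possible acceptance range for a metric, assuming a normal distribution:
% 90 percent of the observed values fall within 1.66 standard deviations.
accMin = metricMean - 1.66*metricStd;
accMax = metricMean + 1.66*metricStd;
```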

Table 6. Values of the comparison metrics for tests of Set#2.

Lab#2 (Set 2) Lab#3 (Set 2) Lab#4 (Set 2) Lab#5 (Set 2) Mean STD

MPC Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]

Geers Magnitude -2.6 -8.2 35.6 -9.3 3.9 21.35 -31.6 39.3

Geers Phase 21.3 24.5 30.2 33.2 27.3 5.39 18.4 36.2

Geers Comprehensive 21.5 25.9 46.7 34.5 32.2 11.10 13.7 50.6

Geers CSA Magnitude -2.6 -8.2 35.6 -9.3 3.9 21.35 -31.6 39.3

Geers CSA Phase 21.3 24.5 30.2 33.2 27.3 5.39 18.4 36.2

Geers CSA Comprehensive 21.5 25.9 46.7 34.5 32.2 11.10 13.7 50.6

Sprague-Geers Magnitude -2.6 -8.2 35.6 -9.3 3.9 21.35 -31.6 39.3

Sprague-Geers Phase 21.2 22.8 25.4 26.7 24.0 2.49 19.9 28.2

Sprague-Geers Comprehensive 21.3 24.2 43.8 28.3 29.4 10.02 12.8 46.0

Russell Magnitude -2.3 -6.9 20.9 -7.8 1.0 13.50 -21.4 23.4

Russell Phase 21.2 22.8 25.4 26.7 24.0 2.49 19.9 28.2

Russell Comprehensive 18.9 21.1 29.2 24.6 23.5 4.49 16.0 30.9

Knowles-Gear Magnitude 54.2 60.7 97.3 68.4 70.2 19.01 38.6 101.7

Knowles-Gear Phase 38 74 16.7 100 57.2 37.07 -4.4 118.7

Knowles-Gear Comprehensive 51.9 63.2 89.1 74.6 69.7 15.91 43.3 96.1

Single Value Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]

Whang's inequality metric 37.7 42.2 43.8 43.2 41.7 2.76 37.1 46.3

Theil's inequality metric 32.7 35.3 41.3 41 37.6 4.26 30.5 44.7

Zilliacus error metric 69.3 77 97.3 79.5 80.8 11.84 61.1 100.4

RSS error metric 64.5 67.6 97.2 78.1 76.9 14.76 52.3 101.4

WIFac_Error 49.6 54 61.1 56.8 55.4 4.83 47.4 63.4

Regression Coefficient 67.1 62.8 0 43.9 43.5 30.67 -7.5 94.4

Correlation Coefficient 71.2 66.8 63.9 53 63.7 7.75 50.9 76.6

Correlation Coefficient(NARD) 78.7 75.4 69.8 66.8 72.7 5.37 63.8 81.6

ANOVA Metrics Value Value Value Value Value Value min max

Average -0.01 0 0 -0.02 -0.01 0.01 -0.02 0.01

Std 0.21 0.22 0.31 0.25 0.25 0.05 0.17 0.32

T-test -2.3 0.66 -0.23 -6.27

(The min [%] and max [%] columns give the acceptable value range for each metric.)

Next, all ten tests from both sets were compared together, considering the response of test Lab#1 from Set 1 as the "true" curve. As in the case of the two separate sets, the 90th percentile envelope was first evaluated (Figure 4). Table 7 shows the results obtained in this case.


Figure 4. 90th percentile envelope and acceleration time histories considering all tests.

Table 7. Values of the comparison metrics considering all tests.

Lab#1 (Set 2) Lab#2 (Set 1) Lab#2 (Set 2) Lab#3 (Set 1) Lab#3 (Set 2) Lab#4 (Set 1) Lab#4 (Set 2) Lab#5 (Set 1) Lab#5 (Set 2) Mean STD

MPC Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]

Geers Magnitude -16.7 -23 -18.9 -21.4 -23.5 -14.4 13 -25.8 -24.5 -17.2 11.94 -37.1 2.6

Geers Phase 41.3 22.6 38.3 36.3 42.6 31.2 47.5 24.9 29.2 34.9 8.44 20.9 48.9

Geers Comprehensive 44.6 32.3 42.7 42.1 48.6 34.3 49.2 35.8 38.1 40.9 6.10 30.7 51.0

Geers CSA Magnitude -16.7 -23 -18.9 -21.4 -23.5 -14.4 13 -25.8 -24.5 -17.2 11.94 -37.1 2.6

Geers CSA Phase 41.3 22.6 38.3 36.3 42.6 31.2 47.5 24.9 29.2 34.9 8.44 20.9 48.9

Geers CSA Comprehensive 44.6 32.3 42.7 42.1 48.6 34.3 49.2 35.8 38.1 40.9 6.10 30.7 51.0

Sprague-Geers Magnitude -16.7 -23 -18.9 -21.4 -23.5 -14.4 13 -25.8 -24.5 -17.2 11.94 -37.1 2.6

Sprague-Geers Phase 30 21.8 28.8 28 30.5 25.8 32.4 22.9 25 27.2 3.60 21.3 33.2

Sprague-Geers Comprehensive 34.4 31.7 34.5 35.2 38.5 29.6 34.9 34.5 34.9 34.2 2.45 30.2 38.3

Russell Magnitude -13.6 -18.5 -15.3 -17.2 -18.8 -11.8 9.5 -20.6 -19.5 -14.0 9.26 -29.4 1.4

Russell Phase 30 21.8 28.8 28 30.5 25.8 32.4 22.9 25 27.2 3.60 21.3 33.2

Russell Comprehensive 29.2 25.3 28.9 29.1 31.8 25.2 29.9 27.3 28.1 28.3 2.13 24.8 31.8

Knowles-Gear Magnitude 63.3 55.2 63.9 60.6 78.9 48.6 83.4 56.8 59.1 63.3 11.16 44.8 81.8

Knowles-Gear Phase 62.3 40.9 123.9 79.9 182.4 79.9 35.2 67.9 100 85.8 45.42 10.4 161.2

Knowles-Gear Comprehensive 63.1 53.1 77.2 64.2 103.6 55.1 77.5 58.8 67.6 68.9 15.60 43.0 94.8

Single Value Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]

Whang's inequality metric 51.6 40.9 53.3 53.2 52.7 46.3 53.3 46 44.9 49.1 4.66 41.4 56.9

Theil's inequality metric 46.2 35.8 44.8 43.9 47.6 40.1 49 37.9 40.3 42.8 4.54 35.3 50.4

Zilliacus error metric 98.3 70.8 93.6 92.2 92 87.1 112.1 77.9 79.1 89.2 12.33 68.8 109.7

RSS error metric 84.7 63.3 81.1 78.5 84 74.4 104.4 66 70.8 78.6 12.30 58.2 99.0

WIFac_Error 65.9 53.3 63.9 61.5 63.7 58.2 69.5 55.5 58 61.1 5.23 52.4 69.7

Regression Coefficient 39.5 72.6 47.6 52.4 41 58.9 0 69.8 64 49.5 22.02 13.0 86.1

Correlation Coefficient 49.3 72.7 52.6 55.3 47.2 62 44.6 69.8 64.6 57.6 10.13 40.7 74.4

Correlation Coefficient(NARD) 58.7 77.4 61.7 63.7 57.4 68.8 52.5 75.1 70.8 65.1 8.44 51.1 79.1

ANOVA Metrics Value Value Value Value Value Value Value Value Value Value Value min max

Average -0.01 0 -0.01 -0.01 -0.01 -0.01 -0.01 0 -0.03 -0.01 0.01 -0.02 0.00

Std 0.26 0.19 0.25 0.24 0.26 0.23 0.32 0.2 0.21 0.24 0.04 0.17 0.31

T-test -1.84 -1.51 -3.45 -1.87 -1.41 -2.56 -1.68 1.31 -7.99

(The min [%] and max [%] columns give the acceptable value range for each metric.)

Table 8 shows the average of the mean values and of the standard deviations evaluated for each of the three cases previously analyzed: Set#1 vs. Lab#1 (Set 1), Set#2 vs. Lab#1 (Set 2) and all tests vs. Lab#1 (Set 1). The last columns of Table 8 also show the acceptable range of values obtained by considering these averages from the three above-mentioned cases.


Table 8. Average of the means, the standard deviations and the corresponding acceptance criteria.

Set#1 Set#2 All Set#1 Set#2 All

MPC Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]

Geers Magnitude -21.2 21.4 -17.2 4.85 39.32 11.94 -5.7 18.71 -36.7 25.4

Geers Phase 28.8 5.4 34.9 6.21 36.24 8.44 23.0 16.96 -5.2 51.2

Geers Comprehensive 36.1 11.1 40.9 4.23 50.58 6.10 29.4 20.30 -4.3 63.1

Geers CSA Magnitude -21.2 21.4 -17.2 4.85 39.32 11.94 -5.7 18.71 -36.7 25.4

Geers CSA Phase 28.8 5.4 34.9 6.21 36.24 8.44 23.0 16.96 -5.2 51.2

Geers CSA Comprehensive 36.1 11.1 40.9 4.23 50.58 6.10 29.4 20.30 -4.3 63.1

Sprague-Geers Magnitude -21.2 21.4 -17.2 4.85 39.32 11.94 -5.7 18.71 -36.7 25.4

Sprague-Geers Phase 24.6 2.5 27.2 2.81 28.15 3.60 18.1 11.52 -1.0 37.2

Sprague-Geers Comprehensive 32.8 10.0 34.2 2.59 46.03 2.45 25.7 17.02 -2.6 53.9

Russell Magnitude -17.0 13.5 -14.0 3.75 23.38 9.26 -5.8 12.13 -26.0 14.3

Russell Phase 24.6 2.5 27.2 2.81 28.15 3.60 18.1 11.52 -1.0 37.2

Russell Comprehensive 26.7 4.5 28.3 1.86 30.91 2.13 19.8 11.63 0.5 39.2

Knowles-Gear Magnitude 55.3 19.0 63.3 5.01 101.70 11.16 45.9 39.29 -19.3 111.1

Knowles-Gear Phase 67.2 37.1 85.8 18.39 118.71 45.42 63.3 60.84 -37.6 164.3

Knowles-Gear Comprehensive 57.8 15.9 68.9 4.88 96.11 15.60 47.5 38.86 -17.0 112.1

Single Value Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]

Whang's inequality metric 46.6 2.8 49.1 5.05 46.31 4.66 32.8 18.67 1.8 63.8

Theil's inequality metric 39.4 4.3 42.8 3.46 44.65 4.54 28.8 17.55 -0.3 58.0

Zilliacus error metric 82.0 11.8 89.2 9.53 100.43 12.33 61.0 40.76 -6.6 128.7

RSS error metric 70.6 14.8 78.6 7.10 101.36 12.30 54.6 40.25 -12.2 121.4

WIFac_Error 57.1 4.8 61.1 3.54 63.40 5.23 41.0 24.05 1.1 80.9

Regression Coefficient 63.4 30.7 49.5 9.43 94.36 22.02 47.9 41.94 -21.7 117.5

Correlation Coefficient 65.0 7.8 57.6 7.86 76.60 10.13 43.4 31.53 -8.9 95.8

Correlation Coefficient(NARD) 71.3 5.4 65.1 6.21 81.59 8.44 47.2 32.08 -6.0 100.5

ANOVA Metrics Value Value Value Value Value Value Value Value min max

Average -0.01 0.01 -0.01 0.01 0.01 0.01 0.0 0.01 -0.01 0.01

Std 0.22 0.05 0.24 0.02 0.32 0.04 0.2 0.13 -0.05 0.38

T-test

(The last columns of Table 8 give the average mean, the average STD and the acceptable value range, min and max, equal to the average mean plus or minus 1.66 times the average STD.)

RESULTS USING DERIVED TIME HISTORIES

As previously mentioned, the evaluation of metrics should always be performed using directly measured time histories, not time histories derived by integration or differentiation. For example, if accelerations are measured experimentally, accelerations should be the basis of comparison. Velocities and displacements obtained by integrating the acceleration curve will accumulate error. As an example of this situation, the comparison metrics were evaluated using the velocity time histories obtained by integrating the acceleration time histories from Set#1 (Figure 5).


Figure 5. 90th percentile envelope and velocity time histories from Set#1.

As shown in Table 9, all comparison metrics except the ANOVA metrics gave better results using velocities instead of accelerations. The integration process is essentially a second low-pass filter which further smooths the time history data. The exception of the ANOVA metrics confirms that the residuals accumulate because of the integration operation. Figure 6 shows the residual distribution for the velocity time histories. As can be seen, the distribution for the four tests is more spread out than in the case of the acceleration time histories, and the mean residual (i.e., the residual error corresponding to the 50th percentile) is no longer zero for three of the four comparisons. While the acceptance criteria for velocity-based comparisons would be smaller, the smaller values would provide an incorrect estimation of how similar the curves are, since the actual variation of the metrics in the units in which the data were measured is greater, as shown in Table 5.
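For completeness, a sketch of how the velocity histories compared here can be obtained from the accelerations, assuming the common 20 kHz sampling rate used above:

```matlab
% Velocity by numerical (cumulative trapezoidal) integration of the
% acceleration history a, sampled at 20 kHz.
dt = 1/20000;               % sampling interval [s]
v  = cumtrapz(a)*dt;        % velocity time history
```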

Figure 6. Cumulative distribution function of the residual velocities for Set #1.


Table 9. Values of the comparison metrics using velocity time histories (Set#1).

Lab#2 (Set 1) Lab#3 (Set 1) Lab#4 (Set 1) Lab#5 (Set 1) Mean STD

MPC Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]

Geers Magnitude 0.5 5.1 4.5 -4 1.5 4.21 -5.5 8.5

Geers Phase 0.2 0.6 0.5 0.4 0.4 0.17 0.1 0.7

Geers Comprehensive 0.5 5.1 4.5 4 3.5 2.07 0.1 7.0

Geers CSA Magnitude 0.5 5.1 4.5 -4 1.5 4.21 -5.5 8.5

Geers CSA Phase 0.2 0.6 0.5 0.4 0.4 0.17 0.1 0.7

Geers CSA Comprehensive 0.5 5.1 4.5 4 3.5 2.07 0.1 7.0

Sprague-Geers Magnitude 0.5 5.1 4.5 -4 1.5 4.21 -5.5 8.5

Sprague-Geers Phase 2 3.5 3.1 2.8 2.9 0.64 1.8 3.9

Sprague-Geers Comprehensive 2 6.2 5.4 4.9 4.6 1.83 1.6 7.7

Russell Magnitude 0.4 4.1 3.6 -3.4 1.2 3.46 -4.6 6.9

Russell Phase 2 3.5 3.1 2.8 2.9 0.64 1.8 3.9

Russell Comprehensive 1.8 4.8 4.2 3.9 3.7 1.30 1.5 5.8

Knowles-Gear Magnitude 6.2 12.3 10.2 8.5 9.3 2.59 5.0 13.6

Knowles-Gear Phase 36.2 52 43.4 48 44.9 6.78 33.6 56.2

Knowles-Gear Comprehensive 15.8 24 20 21.1 20.2 3.40 14.6 25.9

Single Value Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]

Whang's inequality metric 2.7 5.1 4.7 4.4 4.2 1.06 2.5 6.0

Theil's inequality metric 3.1 6 5.4 4.8 4.8 1.25 2.8 6.9

Zilliacus error metric 5.3 10.3 9.4 8.5 8.4 2.18 4.8 12.0

RSS error metric 6.2 12.3 11 9.5 9.8 2.63 5.4 14.1

WIFac_Error 6.1 11.5 10.3 9.3 9.3 2.32 5.5 13.1

Regression Coefficient 98.7 94.9 96 97 96.7 1.61 94.0 99.3

Correlation Coefficient 99 96.9 97.7 98.1 97.9 0.87 96.5 99.4

Correlation Coefficient(NARD) 99.8 99.4 99.5 99.6 99.6 0.17 99.3 99.9

ANOVA Metrics Value Value Value Value Value Value min max

Average 0 -0.02 -0.02 0.04 0.00 0.03 -0.05 0.05

Std 0.05 0.09 0.08 0.06 0.07 0.02 0.04 0.10

T-test 4.99 -16.38 -14.17 44

(The min [%] and max [%] columns give the acceptable value range for each metric.)

DISCUSSION

The data in Tables 5 through 8 show several interesting characteristics. Since the Geers, Geers CSA and Sprague-Geers metrics all use the same formulation for the magnitude comparison, it is not surprising that they all yield essentially the same values. These magnitude components are all less than 25.8 for Set 1 and 35.6 for Set 2. The phase components of the Geers, Geers CSA and Russell MPC metrics also share the same formulation, and the values are less than 36.3 for Set 1 and 33.2 for Set 2. In general, the Geers, Geers CSA, Sprague-Geers and Russell metrics all result in similar values, with maximum magnitude components of 35.6, phase components of 33.2 and combined values of 46.7. Interestingly, while the mean values are the same for these metrics for both Sets 1 and 2, the standard deviation of the magnitude components of these MPC metrics is about five times greater for Set 2 than for Set 1, while the standard deviation of the phase metric is similar in both sets. This may be an indication that the use of slightly different vehicles in Set 2 introduces additional variability.

Table 10 and Table 11 show the ranking of the full-scale crash tests according to each metric. As shown in the first columns of Tables 10 and 11, the Geers, Geers CSA, Sprague-Geers and Russell metrics all rate the results from Lab#4 as the best match and Lab#2 as the worst for magnitude, Lab#5 as the best and Lab#1 as the worst for phase, and Lab#4 as the best and Lab#2 as the worst for the combined metric.

The Knowles-Gear metric scales much differently than the other MPC metrics. The maximum magnitude component is 60.6 for Set #1 and 97.3 for Set #2. Similarly, the maximum phase component is 28.0 for Set #1 and 100 for Set #2. The Knowles-Gear metric rates Lab#5 as the best magnitude match, Lab#3 as the best phase match and Lab#5 as the best combined match, while Lab#2 gives the worst magnitude match and Lab#1 the worst phase match. These results are slightly different from those of the other MPC metrics. Both the standard deviation of the magnitude component and that of the phase component increased significantly between Set #1 and Set #2, indicating, again, that the use of similar but not identical vehicles may increase the variability of the metric.

Table 10. Ranking of the full-scale crash tests according to each evaluated metric (Set 1 and Set 2). For each set, the columns give the best, 2nd, 3rd and worst matching laboratory.

MPC Metrics

Geers Magnitude Lab#4 Lab#3 Lab#2 Lab#5 Lab#2 Lab#3 Lab#5 Lab#4

Geers Phase Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5

Geers Comprehensive Lab#2 Lab#4 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

Geers CSA Magnitude Lab#4 Lab#3 Lab#2 Lab#5 Lab#2 Lab#3 Lab#5 Lab#4

Geers CSA Phase Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5

Geers CSA Comprehensive Lab#2 Lab#4 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

Sprague-Geers Magnitude Lab#4 Lab#3 Lab#2 Lab#5 Lab#2 Lab#3 Lab#5 Lab#4

Sprague-Geers Phase Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5

Sprague-Geers Comprehensive Lab#4 Lab#2 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

Russell Magnitude Lab#4 Lab#3 Lab#2 Lab#5 Lab#2 Lab#3 Lab#5 Lab#4

Russell Phase Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5

Russell Comprehensive Lab#4 Lab#2 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

Knowles-Gear Magnitude Lab#4 Lab#2 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

Knowles-Gear Phase Lab#2 Lab#5 Lab#4 Lab#2 Lab#3 Lab#5

Knowles-Gear Comprehensive Lab#2 Lab#4 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

Single Value Metrics

Whang's inequality metric Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

Theil's inequality metric Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

Zilliacus error metric Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

RSS error metric metric Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

WIFac_Error Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

Regression Coefficient Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

Correlation Coefficient Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5

Correlation Coefficient(NARD) Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5

ANOVA Metrics

Average Lab#2 Lab#5

Std Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4

T-test Lab#5 Lab#2 Lab#3 Lab#4 Lab#4 Lab#3 Lab#2 Lab#5

(Tied laboratories, e.g. Lab#2 & Lab#5 or Lab#3 & Lab#4, share a single ranking cell; the left column group refers to Set 1 and the right column group to Set 2.)

All the single-value metrics rate the results from Lab#5 as the best and either Lab#1 or Lab#2 as the worst. As shown in Table 2, the correlation coefficient, NARD correlation coefficient, RSS error and regression coefficient all have similar formulations, and an identical match is indicated by unity. The poorest correlation was 55.3 for Lab#3 and the best was 75.1 for Lab#5. Like the MPC metrics, the standard deviations of these metrics generally doubled in the second set of experiments.

The average residual error component of the ANOVA metric was generally very close to zero for all the crash tests in both sets of data. The standard deviation of the residual errors was as high as 31 percent and, like all the other metrics, the standard deviation of the standard deviation of the residuals doubled in the second set of tests. Only two tests in each set (Lab#3 and Lab#5 for Set #1 and Lab#3 and Lab#4 for Set #2) passed the t-test at the 90-percent confidence level.

In evaluating the ANOVA metrics for a series of six identical frontal rigid-pole impacts, Ray proposed an acceptance criterion of a mean residual error less than five percent of the peak and a standard deviation of less than 20 percent of the peak test acceleration [2]. As shown in Tables 5 through 7, most of the tests in these two test series had mean residual errors very close to zero, but the standard deviations tended to be somewhat higher than 20 percent, resulting in T statistics just outside the 90th percentile acceptance range. The reason for this is probably that the redirectional test examined in this paper is much less repeatable than the frontal impact originally studied by Ray. The t-test in some of these impacts can fail because the result of the test was in fact somewhat different. For example, in a redirectional test, a slight difference in the suspension or steering system could cause a slightly different orientation of the front wheels, which would in turn change the redirection angle and lateral forces. While the impact conditions may well be essentially identical, the result of the test is still not identical due to other uncontrollable variations in the experiment. The t-test aspect of the ANOVA metrics is, therefore, an extremely difficult test to pass, although when the T statistic is in the passing range, the resulting comparison will be very good.

Table 11. Ranking of the full-scale crash tests according to each evaluated metric (all tests, Set 1 + Set 2). The columns give the best through worst matching test; the number in parentheses indicates the set.

MPC Metrics

Geers Magnitude Lab#4 (2) Lab#4 (1) Lab#1 (2) Lab#2 (2) Lab#3 (1) Lab#2 (1) Lab#3 (2) Lab#5 (2) Lab#5 (1)

Geers Phase Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)

Geers Comprehensive Lab#2 (1) Lab#4 (1) Lab#5 (1) Lab#5 (2) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)

Geers CSA Magnitude Lab#4 (2) Lab#4 (1) Lab#1 (2) Lab#2 (2) Lab#3 (1) Lab#2 (1) Lab#3 (2) Lab#5 (2) Lab#5 (1)

Geers CSA Phase Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)

Geers CSA Comprehensive Lab#2 (1) Lab#4 (1) Lab#5 (1) Lab#5 (2) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)

Sprague-Geers Magnitude Lab#4 (2) Lab#4 (1) Lab#1 (2) Lab#2 (2) Lab#3 (1) Lab#2 (1) Lab#3 (2) Lab#5 (2) Lab#5 (1)

Sprague-Geers Phase Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)

Sprague-Geers Comprehensive Lab#4 (1) Lab#2 (1) Lab#1 (2) Lab#3 (1) Lab#3 (2)

Russell Magnitude Lab#4 (2) Lab#4 (1) Lab#1 (2) Lab#2 (2) Lab#3 (1) Lab#2 (1) Lab#3 (2) Lab#5 (2) Lab#5 (1)

Russell Phase Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)

Russell Comprehensive Lab#4 (1) Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#2 (2) Lab#3 (1) Lab#1 (2) Lab#4 (2) Lab#3 (2)

Knowles-Gear Magnitude Lab#4 (1) Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#3 (1) Lab#1 (2) Lab#2 (2) Lab#3 (2) Lab#4 (2)

Knowles-Gear Phase Lab#4 (2) Lab#2 (1) Lab#1 (2) Lab#5 (1) Lab#5 (2) Lab#2 (2) Lab#3 (2)

Knowles-Gear Comprehensive Lab#2 (1) Lab#4 (1) Lab#5 (1) Lab#1 (2) Lab#3 (1) Lab#5 (2) Lab#2 (2) Lab#4 (2) Lab#3 (2)

Single Value Metrics

Whang's inequality metric Lab#2 (1) Lab#5 (2) Lab#5 (1) Lab#4 (1) Lab#1 (2) Lab#3 (2) Lab#3 (1)

Theil's inequality metric Lab#2 (1) Lab#5 (1) Lab#4 (1) Lab#5 (2) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)

Zilliacus error metric Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (2) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#4 (2)

RSS error metric metric Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#3 (2) Lab#1 (2) Lab#4 (2)

WIFac_Error Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#3 (2) Lab#2 (2) Lab#1 (2) Lab#4 (2)

Regression Coefficient Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#3 (2) Lab#1 (2) Lab#4 (2)

Correlation Coefficient Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)

Correlation Coefficient(NARD) Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)

ANOVA Metrics

Average Lab#5 (2)

Std Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#4 (2)

T-test Lab#5 (1) Lab#3 (2) Lab#2 (1) Lab#4 (2) Lab#1 (2) Lab#3 (1) Lab#4 (1) Lab#2 (2) Lab#5 (2)

(Tied tests, e.g. Lab#2 (2) & Lab#5 (1), Lab#4 (2) & Lab#5 (2), Lab#3 (1) & Lab#4 (1), Lab#2 (2) & Lab#4 (2), Lab#2 (1) & Lab#5 (1), or Lab#1 (2) & Lab#3 (2), share a single ranking cell; all tests, Set 1 + Set 2.)

CONCLUSIONS

This paper investigated the use of shape comparison metrics to quantitatively compare the similarity of the time histories of two sets of five repeated, essentially identical full-scale crash tests. Ten full-scale crash tests with a vertical concrete wall were considered: five of them were performed using the same type of vehicle and the other five using similar vehicles, all complying with the EN 1317 test specifications for that class of vehicle.

The original raw time histories from the ten tests were filtered, re-sampled and synchronized so that they could be correctly compared with each other. The statistics derived from the analysis of the residuals confirmed the hypothesis that the errors were normally distributed and could, therefore, be attributed to normal random experimental error.


A total of three groupings of tests were analyzed, and a variety of quantitative shape comparison metrics were calculated for each set of repeated crash test cases. The comparison of the results showed the utility of each metric and its diagnostic value in assessing the degree of agreement between the repeated crash test time histories. Possible acceptance criteria were also suggested for each evaluated metric according to the values obtained from the analysis of these nearly identical tests.

Acknowledgments:

The authors are grateful to Dr. Leonard Schwer, Mr. David Moorcroft and the other members of the ASME PTC-60 committee for providing helpful information about verification and validation metrics. The authors also thank Dr. Chiara Silvestri for her help in finalizing and revising the manuscript. This work was made possible by the support of the National Cooperative Highway Research Program as a part of Project NCHRP 22-24.


REFERENCES

[1] C.A. Plaxico, M.H. Ray and K. Hiranmayee, "Impact Performance of the G4(1W) and G4(2W) Guardrail Systems: Comparison Under NCHRP Report 350 Test 3-11 Conditions", Transportation Research Record 1720, Transportation Research Board, Washington, D.C., pp. 7-18, (2000).

[2] M.H. Ray, "Repeatability of Full-Scale Crash Tests and a Criteria for Validating Finite Element Simulations", Transportation Research Record, Vol. 1528, pp. 155-160, (1996).

[3] W.L. Oberkampf and M.F. Barone, "Measures of Agreement Between Computation and Experiment: Validation Metrics", Journal of Computational Physics, Vol. 217, No. 1 (Special issue: Uncertainty quantification in simulation science), pp. 5-36, (2006).

[4] T.L. Geers, "An Objective Error Measure for the Comparison of Calculated and Measured Transient Response Histories", The Shock and Vibration Bulletin, The Shock and Vibration Information Center, Naval Research Laboratory, Washington, D.C., Bulletin 54, Part 2, pp. 99-107, (June 1984).

[5] Comparative Shock Analysis (CSA) of Main Propulsion Unit (MPU), Validation and Shock Approval Plan, SEAWOLF Program: Contract No. N00024-90-C-2901, 9200/SER: 03/039, September 20, 1994.

[6] M.A. Sprague and T.L. Geers, "Spectral elements and field separation for an acoustic fluid subject to cavitation", J. Comput. Phys., Vol. 184, pp. 149-162, (2003).

[7] D.M. Russell, "Error Measures for Comparing Transient Data: Part I: Development of a Comprehensive Error Measure", Proceedings of the 68th Shock and Vibration Symposium, pp. 175-184, (2006).

[8] L.E. Schwer, "Validation Metrics for Response Time Histories: Perspective and Case Studies", Engng. with Computers, Vol. 23, Issue 4, pp. 295-309, (2007).

[9] C.P. Knowles and C.W. Gear, "Revised validation metric", unpublished manuscript, 16 June 2004 (revised July 2004).

[10] J. Cohen, P. Cohen, S.G. West and L.S. Aiken, Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd ed., Lawrence Erlbaum, Hillsdale, NJ, (2003).

[11] S. Basu and A. Haghighi, "Numerical Analysis of Roadside Design (NARD) Vol. III: Validation Procedure Manual", Report No. FHWA-RD-88-213, Federal Highway Administration, Virginia, (1988).

[12] B. Whang, W.E. Gilbert and S. Zilliacus, Two Visually Meaningful Correlation Measures for Comparing Calculated and Measured Response Histories, Carderock Division, Naval Surface Warfare Center, Bethesda, Maryland, Survivability, Structures and Materials Directorate, Research and Development Report CARDEROCKDIV-U-SSM-67-93/15, September 1993.

[13] H. Theil, Economic Forecasts and Policy, North-Holland Publishing Company, Amsterdam, (1975).

[14] D.M. Russell, "Error Measures for Comparing Transient Data: Part II: Error Measures Case Study", Proceedings of the 68th Shock and Vibration Symposium, pp. 185-198, (2006).

[15] ROBUST Project.

[16] European Committee for Standardization, "European Standard EN 1317-1 and EN 1317-2: Road Restraint Systems", CEN, (1998).

[17] Matlab: User Guide, The MathWorks Inc., (1994-2008).

[18] SAE J211-1 (R) Instrumentation for Impact Test, Part 1: Electronic Instrumentation, SAE International, Dearborn, MI, USA, July 1, 2007.