8th. World Congress on Computational Mechanics (WCCM8)
5th. European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS 2008)
June 30 – July 5, 2008
Venice, Italy
Comparison of Validation Metrics Using Repeated Full-scale Automobile Crash Tests
* Malcolm H. Ray¹, Marco Anghileri² and Mario Mongiardini³
¹ Worcester Polytechnic Institute, Dept. of Civil and Environmental Eng., 100 Institute Road, Worcester, MA 01609, USA
² Politecnico di Milano, Dept. of Aerospace Eng., Via La Masa 34, 20156 Milan, Italy
³ Worcester Polytechnic Institute, Dept. of Civil and Environmental Eng., 100 Institute Road, Worcester, MA 01609, USA
Key Words: Verification and Validation, Metrics, Finite Element, Roadside safety, Crash Tests.
ABSTRACT
Qualitatively comparing the shapes of time histories to validate experiments with computational
simulations is a common technique in both general computational mechanics as well as computational
roadside safety. Qualitative comparisons, however, are subjective and open to interpretation. A variety of
quantitative metrics are available for comparing time history data as well but developing acceptance criteria
for these methods often relies on equally imprecise engineering judgment.
This paper presents the results of time-history comparisons of 10 essentially identical full-scale
vehicle re-directional crash tests with a vertical concrete wall. Five of the crash tests used exactly the same
type of vehicle whereas the other five used a similar vehicle that was within the EN1317 test vehicle
specifications for that class of vehicle. A variety of quantitative shape comparison metrics were calculated
for each set of repeated crash test cases and the results are presented. The results are compared and
contrasted as to the utility of each metric and its diagnostic value in assessing the degree of comparison
between the repeated crash test time histories.
Since the crash test experiments are as identical as can be achieved experimentally, the values of the
quantitative metrics represent the reasonable range for the metric corresponding to matched experiments.
Statistical analysis of the data is also performed to assess the typical residual errors that can be
expected in full-scale roadside safety crash tests. Finally, recommendations for the use of specific metrics
are provided.
INTRODUCTION
Comparing the correspondence between curves from physical experiment and mathematical models
is a very important and common technique used by scientists and engineers to determine if the mathematical
models adequately represent physical phenomena. Two common reasons for which shapes are compared are
the verification or validation of computational results and the assessment of the repeatability of experimental
tests. In the former case, an experimental and a numerical curve are compared in order to assess how well
the numerical model predicts a physical phenomenon; in the latter case, two or more experimental
curves are compared in order to assess whether they represent the same or similar physical events.
A traditional technique has been to visually compare curves by matching peaks, oscillations,
common shapes, etc. Although this kind of comparison gives an impression of how similar two
curves are, it rests on purely subjective judgment that can vary from one analyst to another.
Approval decisions need to be based as much as possible on quantitative criteria that are unambiguous and
mathematically precise. In order to minimize the subjectivity, it is necessary to define objective comparison
criteria based on computable measures. Comparison metrics, which are mathematical measures that
quantify the level of agreement between simulation outcomes and experimental outcomes, can accomplish
this goal.
Recently, several comparison metrics have been developed in different engineering domains. [2-14]
Metrics can be grouped into two main categories: (i) deterministic metrics and (ii) stochastic metrics.
Deterministic metrics do not specifically address the probabilistic variation of either the experiments or the
calculations (i.e., for deterministic metrics the calculation results are the same every time given the same
input), while stochastic metrics involve computing the likely variation in both the simulation and the
experiment responses due to parameter variations. Deterministic metrics found in the literature can be further
classified into two main types: (a) domain-specific metrics and (b) shape comparison metrics. The domain-
specific metrics are quantities specific to a particular application. For example, the axial crush of a railroad
car in a standard crash test might be a metric that is useful in designing rolling stock but has no relevance to
other applications.
On the other hand, shape comparison metrics involve a comparison of curves from a numerical
simulation and a physical experiment. The curves may be time histories, force-deflection plots, stress-strain
plots, etc. Shape comparison metrics assess the degree of similarity between any two curves in general and,
therefore, do not depend on the particular application domain.
In roadside safety, comparisons between several tests or test and simulation results have mainly used
domain-specific metrics (e.g. occupant severity indexes, changes in velocity, 10-msec average accelerations,
maximum barrier deflection, etc.) [1]. The main advantage of this approach is that the same
domain-specific metrics already used to evaluate experiments can also be used to compare test and
simulation results. Although the comparison of domain-specific metrics can give an idea of how close two
tests or a test and a simulation are, shape-comparison metrics would be a more precise tool since they can be
used to directly evaluate the basic response of the structures, like acceleration and velocity time histories.
In roadside safety, domain-specific metrics are all derived from the acceleration time histories, so if the
time history information is valid, any metric derived from the time history data will also be valid.
Once a particular deterministic shape comparison metric is chosen, it is necessary to establish an
acceptance criterion for deciding if the comparison is acceptable. One approach is to arbitrarily set an
acceptance criterion, but this is neither precise nor satisfying. A better approach is to determine the
realistic variation of the deterministic shape comparison metrics for identical physical experiments and use
that variation as an acceptance criterion. For example, if a series of physical experiments results in a shape
comparison metric that is within some specific range, a mathematical model of the same phenomena need
only fall within that same range.
OBJECTIVE
The purpose of this paper is to evaluate the response of the most common shape comparison metrics
found in literature for the case of ten essentially identical full-scale crash tests. The experiments consist of
two groups of five. The first five use identical vehicles (i.e., the same model and year) whereas the second
set of five use similar though not identical vehicles. For each of the two data sets, the five experimental
curves were compared in pairs for a total of four different comparisons, and the obtained metric values
were then used to define acceptance criteria and the expected range of values based on a probabilistic
approach.
The curves used in this study represent the time history of the lateral acceleration measured at the
center of gravity of the vehicles during the impact. A similar effort was undertaken for the longitudinal and
vertical accelerations but they are not presented here for the sake of brevity. In this type of collision, the
lateral acceleration is generally the best measure of the stiffness of the barrier response and is most directly
related to measures of occupant impact severity. Acceleration time histories were used rather than velocity
or displacement time histories because the accelerations are the experimentally observed quantities whereas
the velocity and displacement are calculated by integrating and double integrating the acceleration response,
respectively. If the acceleration response compares well, the velocity and displacement will also compare
well since they are simply a mathematical operation on the same source data [2].
METRICS
A brief description of the metrics evaluated in this work is presented in this section. All fourteen
metrics considered in this paper are deterministic shape-comparison metrics. Details about the mathematical
formulation of each metric can be found in the cited literature. Conceptually, the metrics evaluated can be
classified into three main categories: (i) magnitude-phase-composite (MPC) metrics, (ii) single-value
metrics and (iii) analysis of variance (ANOVA) metrics.
MPC
MPC metrics treat the curve magnitude and phase separately using two different metrics (i.e., M and P,
respectively). The M and P metrics are then combined into a single value comprehensive metric, C. The
following MPC metrics were used: (a) Geers (original formulation and two variants), (b) Russell and (c)
Knowles and Gear [4-9]. Table 1 shows the analytical definition of each metric. In this and the following
sections, the terms m_i and c_i refer to the measured and computed quantities, respectively, with the
subscript i indicating a specific instant in time.
Table 1: Definition of MPC metrics. Here ψ_ab = Σ a_i·b_i denotes the discrete inner product of two
sampled curves a and b over the common time interval.

Integral comparison metrics:
  Geers:            M = √(ψ_cc/ψ_mm) − 1;  P = 1 − ψ_mc/√(ψ_mm·ψ_cc);  C = √(M² + P²)
  Geers CSA:        same magnitude and phase components as Geers;  C = √(M² + P²)
  Sprague & Geers:  M = √(ψ_cc/ψ_mm) − 1;  P = (1/π)·arccos[ψ_mc/√(ψ_mm·ψ_cc)];  C = √(M² + P²)
  Russell:          M = sign(m̃)·log₁₀(1 + |m̃|), where m̃ = (ψ_cc − ψ_mm)/√(ψ_mm·ψ_cc);
                    P = (1/π)·arccos[ψ_mc/√(ψ_mm·ψ_cc)];  C = √[(π/4)·(M² + P²)]

Point-to-point comparison metrics:
  Knowles & Gear:   M is a weighted point-to-point root-mean-square difference between the two
                    curves, evaluated after synchronizing them on their times of arrival (TOA) and
                    normalized by the factor QS (with 0 < QS ≤ 1); P is based on the difference
                    between the TOAs of the two curves; C is a weighted combination of M and P.
In all MPC metrics the phase component (P) should be insensitive to magnitude differences but
sensitive to differences in phasing or timing between the two time histories. Similarly, the magnitude
component (M) should be sensitive to differences in magnitude but relatively insensitive to differences in
phase. These characteristics of MPC metrics allow the analyst to identify the aspects of the curves that do
not agree. For each component of the MPC metrics, zero indicates that the two curves are identical. Each of
the MPC metrics differs slightly in its mathematical formulation. The different variations of the MPC
metrics are primarily distinguished in the way the phase metric is computed, how it is scaled with respect to
the magnitude metrics and how it deals with synchronizing the phase. In particular, the Sprague and Geers
metric [6] uses the same phase component as the Russell metric [7]. Also, the magnitude component of the
Russell metric is peculiar as it is based on a base-10 logarithm, and it is the only MPC metric among those
considered in this paper that is symmetric (i.e., the order of the two curves is irrelevant). The Knowles and
Gear metric [8,9] is the most recent variation of MPC-type metrics. Unlike the previously discussed MPC
metrics, it is based on a point-to-point comparison. In fact, this metric requires that the two compared
curves are first synchronized in time based on the so-called Time of Arrival (TOA), which represents the
time at which a curve reaches a certain percentage of its peak value. In this work, the percentage of the peak
value used to evaluate the TOA was 5%, which is the typical value found in the literature. Once the curves have
been synchronized using the TOA, it is possible to evaluate the magnitude metric. Also, in order to avoid
creating a gap between time histories characterized by a large magnitude and those characterized by a
smaller one, the magnitude component M has to be normalized using the normalization factor QS.
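By way of illustration, the following Matlab fragment sketches how the integral-based components of the
Sprague and Geers metric can be computed for two equally sampled curves; the function and variable names
are illustrative only, and the formulation follows the definitions in Table 1.

  function [M, P, C] = sprague_geers(m, c)
  % Sketch of the Sprague & Geers MPC metric for two equally sampled
  % curves m (measured) and c (computed); multiply by 100 for percent.
  psi_mm = sum(m.^2);    % discrete equivalent of the integral of m^2
  psi_cc = sum(c.^2);    % discrete equivalent of the integral of c^2
  psi_mc = sum(m.*c);    % cross term
  M = sqrt(psi_cc/psi_mm) - 1;              % magnitude component
  P = acos(psi_mc/sqrt(psi_mm*psi_cc))/pi;  % phase component
  C = sqrt(M^2 + P^2);                      % comprehensive metric
  end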
Single-value metrics
Single-value metrics give a single numerical value that represents the agreement between the two
curves. Seven single-value metrics were considered in this work: (1) the correlation coefficient metric,
(2) the NARD correlation coefficient metric (NARD), (3) Zilliacus error metric, (4) RSS error metric, (5)
Theil's inequality metric, (6) Whang's inequality metric and (7) the regression coefficient metric. [10-13]
The first two metrics are based on integral comparisons while the others are based on a point-to-point
comparison. The definition of each metric is shown in Table 2.
Table 2: Definition of single-value metrics (m_i measured, c_i computed; ψ_ab = Σ a_i·b_i as in Table 1).

Integral comparison metrics:
  Correlation coefficient:        ρ = Σ(m_i − m̄)(c_i − c̄) / √[Σ(m_i − m̄)²·Σ(c_i − c̄)²]
  Correlation coefficient (NARD): ρ_NARD = ψ_mc / √(ψ_mm·ψ_cc)

Point-to-point comparison metrics:
  Zilliacus error:        Σ|c_i − m_i| / Σ|m_i|
  RSS error:              √[Σ(c_i − m_i)²] / √(Σ m_i²)
  Theil's inequality:     √[Σ(c_i − m_i)²] / [√(Σ c_i²) + √(Σ m_i²)]
  Whang's inequality:     Σ|c_i − m_i| / (Σ|c_i| + Σ|m_i|)
  Regression coefficient: based on the residual sum of squares Σ(m_i − c_i)² normalized by the
                          variance of the measured curve
ANOVA metric
ANOVA metrics are based on the assumption that if two curves do, in fact, represent the same event, then
any differences between the curves must be attributable only to random experimental error. The analysis of
variance (i.e., ANOVA) is a standard statistical test that assesses whether the variance between two curves
can be attributed to random error [2,3]. When two time histories represent the same physical event, both
should be identical such that the mean residual error, ē, and the standard deviation of the residual errors, σ,
are both zero. Of course, this is never the case in practical situations (e.g., experimental errors cause small
variations between tested responses even in identical tests). The conventional T statistic provides an
effective method for testing the assumption that the observed ē is close enough to zero to represent only
random errors. Ray proposed a method where the residual error and its standard deviation are normalized
with respect to the peak value of the test curve and came to the following acceptance criteria based on six
repeated frontal full-scale crash tests [2]:
- The average residual error normalized by the peak response, ē_r, should be less than five percent.
- The standard deviation of the normalized residuals, σ_r, should be less than 20 percent.
- A paired two-tail t-test on the distribution of the normalized residuals, using the conventional
statistic T = ē_r·√n/σ_r, should not reject the null hypothesis that the mean value of the residuals
is zero at the five-percent level, t_0.05 (i.e., the 90th percentile).
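A sketch of this procedure in Matlab, assuming m is the "true" curve and c the curve being compared
(both equally sampled); the thresholds are those listed above:

  % Sketch of the ANOVA metric of Ray [2]: residuals normalized by the
  % peak of the "true" curve m, then checked against the criteria above.
  r     = (c - m) / max(abs(m));      % normalized residuals
  e_bar = mean(r);                    % mean residual (criterion: < 0.05)
  s_r   = std(r);                     % std of residuals (criterion: < 0.20)
  n     = numel(r);
  T     = e_bar * sqrt(n) / s_r;      % conventional t statistic
  pass  = abs(e_bar) < 0.05 && s_r < 0.20;  % plus the two-tail t-test on T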
REPEATED FULL SCALE CRASH TESTS
A series of five crash tests with new Peugeot 106 vehicles (model year 2000) and a rigid concrete
barrier were performed as a part of the ROBUST project (Figure 1).[15] The tests were independently
carried out by five different test laboratories in Europe, herein called laboratories one through five, with the
purpose of assessing the repeatability of crash tests. As the main intent was to see if experimental curves
representing the same test result in similar responses, a rigid barrier was intentionally chosen in order to
limit the scatter of the results which is typically greater in the case of deformable barriers. A second series
of five tests was performed using the same barrier but with vehicles of different brands and models. All the
vehicles used in the series, however, corresponded to the standard small test vehicle specified in the European
crash test standard EN 1317 [16]. The second set of tests was performed to investigate influences arising
from different vehicle models on the repeatability of crash tests. In all cases, the three components of
acceleration, including the lateral acceleration used in this paper, were measured at the center of gravity of
the vehicles.
Preprocessing
In order to correctly compare the different time histories, it was necessary to prepare them properly
by performing the following operations: (a) filtering, (b) re-sampling, (c) synchronizing and (d) trimming.
All the preprocessing operations and the following metrics evaluation were performed using Matlab® [17].
Initially, all the time histories were filtered using a SAE J211 CFC 60 compliant filter [18] and the initial
vertical shift typical of most experimental data was eliminated by shifting each curve by a value equal to the
average of the first ten data points.
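A minimal sketch of these two operations, assuming the raw acceleration a is sampled at fs Hz. The exact
four-pole phaseless CFC 60 filter is specified in SAE J211 [18]; a two-pole Butterworth with a 100 Hz
cut-off applied forward and backward is used here only as an approximation:

  % Approximate CFC 60 filtering and removal of the initial vertical shift.
  fs = 20000;                       % common sampling rate used here [Hz]
  [b, a2] = butter(2, 100/(fs/2));  % ~CFC 60 cut-off (approximation only)
  a_f = filtfilt(b, a2, a);         % phaseless forward-backward filtering
  a_f = a_f - mean(a_f(1:10));      % subtract average of first ten points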
Figure 1. Crash test of one of the new Peugeot vehicles.
The residuals between two time histories are by definition the difference at each instant in time
between two curves, one of which is considered to be the “true” or correct curve. For each of the two sets of
tests, the “true” curve was chosen to be the time history which was considered to be closest to the average
response of the specific set (i.e., tests Lab#1 (Set 1) and Lab#1 (Set 2)). In order to compute the residuals,
the different time histories have to be sampled at the same rate and start at a common point. As the original
sampling rate was not the same for all the time histories, it was necessary to resample each curve to a
common rate which was chosen to be the highest sampling rate among all the tests (i.e., 20 kHz).
Also, as the curves did not always start at exactly the same time (i.e., the impact did not occur at the
same instant in each test record), it was necessary to synchronize each test such that the most probable impact point
was matched in each curve. Although each curve could be considered independently, for the sake of
simplicity, it was decided to use the “true” curve of the first data set as the reference for the synchronization.
The method used to synchronize each time history pair was based on the minimization of the area between
the two curves (i.e., the absolute area of the time history of the residuals). A Matlab routine was implemented
which could shift either one or the other curve and evaluate the area between the curves in the new
configuration. A loop was then implemented to search for the shift value which corresponded to the
minimum residual error between the two curves. Once the shift values corresponding to each time history
from both sets of tests were evaluated, each curve was shifted by the maximum value and trimmed to
account for the difference between the specific shift value required for that curve and the common
shift value used for all the curves. The curves were also cut at the tail in order to guarantee the same final
time (i.e., the same length of the data vectors).
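The shift search can be sketched as follows, assuming two equally sampled curves m (reference) and c of
the same length; the window size is illustrative:

  % Sketch: find the sample shift of c that minimizes the absolute area
  % between the two curves (mean absolute residual over the overlap).
  max_shift = 2000;                    % search window, e.g. 0.1 s at 20 kHz
  best_err  = Inf;  best_k = 0;
  for k = -max_shift:max_shift
      if k >= 0                        % c delayed by k samples
          seg_m = m(1+k:end);  seg_c = c(1:end-k);
      else                             % c advanced by -k samples
          seg_m = m(1:end+k);  seg_c = c(1-k:end);
      end
      err = mean(abs(seg_c - seg_m));  % normalized by overlap length
      if err < best_err
          best_err = err;  best_k = k; % retain best shift found so far
      end
  end
  c_sync = circshift(c, best_k);       % apply the shift (then trim the ends)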
Residuals statistics
Once all the curves had the same sampling rate and were synchronized with respect to the time of initial
impact, the average and standard deviation of the residuals were evaluated for each time history of both
sets of tests. The specific values and the average of the standard deviations for the two sets are
summarized in Table 3, while the values obtained considering all the tests are summarized in Table 4.
Table 3. Residual errors for crash test Sets #1 and #2.

Comparison vs. true curve [Lab #1 (Set 1)]
Set #1                 Lab #2    Lab #3    Lab #4    Lab #5    Mean      Std
Average                0         -0.01     -0.01     0         -0.005    0.00577
Standard deviation     0.19      0.24      0.23      0.2       0.215     0.02380

Comparison vs. true curve [Lab #1 (Set 2)]
Set #2                 Lab #2    Lab #3    Lab #4    Lab #5    Mean      Std
Average                -0.01     0         0         -0.02     -0.0075   0.00957
Standard deviation     0.21      0.22      0.31      0.25      0.2475    0.04500
Table 4. Residual errors for crash tests for Sets #1 and #2 combined.

                       Lab #2   Lab #3   Lab #4   Lab #5   Lab #1   Lab #2   Lab #3   Lab #4   Lab #5   Mean    Std
                       (Set 1)  (Set 1)  (Set 1)  (Set 1)  (Set 2)  (Set 2)  (Set 2)  (Set 2)  (Set 2)
Average                0        -0.01    -0.01    0        -0.01    -0.01    -0.01    -0.01    -0.03    -0.01   0.00866
Standard deviation     0.19     0.24     0.23     0.2      0.26     0.25     0.26     0.32     0.21     0.24    0.03937
Since the time histories for all the crash tests represented essentially identical physical events, the
residuals for each curve should be attributable only to random experimental error or noise. Statistically
speaking, this means that the residuals should be normally distributed around a mean error equal to zero. As
shown in the cumulative density function in Figure 2, the shape of the residual accelerations distribution is
typical of a normal distribution for both sets of crash tests when taken separately or combined. Since the
cumulative distribution is an “S” shaped curve centered on zero, the distribution of the residuals is consistent
with random experimental error as would be expected in these series of repeated crash tests.
Figure 2. Cumulative distribution function of the residual accelerations for Set #1 (top left), Set #2 (top right)
and the combination of Sets #1 and #2 (bottom); the “true” curves are Lab #1 (Set 1), Lab #1 (Set 2)
and Lab #1 (Set 1), respectively.
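The normality check illustrated in Figure 2 can be sketched by plotting the empirical cumulative
distribution of the residuals r against a normal distribution with the same mean and standard deviation:

  % Sketch: empirical CDF of the residuals vs. a fitted normal CDF.
  r_s   = sort(r(:));                              % ordered residuals
  F_emp = (1:numel(r_s))' / numel(r_s);            % empirical CDF
  F_fit = 0.5*(1 + erf((r_s - mean(r_s)) / (std(r_s)*sqrt(2))));  % normal CDF
  plot(r_s, F_emp, r_s, F_fit, '--');
  xlabel('residual'); ylabel('cumulative probability');
  legend('empirical', 'normal fit');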
RESULTS
Once the time histories were preprocessed, each was compared to the “true” curve by evaluating all
fourteen comparison metrics previously described. All the calculations were performed using Matlab.
Initially, the two sets of tests, Set #1 with the same new vehicle and Set#2 with similar vehicles, were
considered separately using the response from the test Lab #1 belonging to the respective set as the “true”
curve.
For both sets, the average of the standard deviations of the residuals between each curve of the set
and the respective “true” curve was evaluated. The average value of the standard deviation for each set was
then used to evaluate the 90th percentile envelope for each set by adding to and subtracting from the
respective “true” curve the average of the standard deviations of the residuals for that set multiplied by
1.6449 (Figure 3).
Figure 3. 90th percentile envelope and acceleration time histories for Set #1 (left) and Set #2 (right).
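The construction of the envelope shown in Figure 3 is straightforward; a sketch, with m_true the reference
curve, t the time vector and sigma_avg the average standard deviation of the residuals for the set (expressed
in the same units as m_true), all names being illustrative:

  % Sketch: 90th percentile envelope about the "true" curve.
  z90    = 1.6449;                      % 90% two-sided normal quantile
  env_hi = m_true + z90 * sigma_avg;    % upper envelope
  env_lo = m_true - z90 * sigma_avg;    % lower envelope
  plot(t, m_true, t, env_hi, '--', t, env_lo, '--');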
Table 5. Values of the comparison metrics for Set#1.
Lab#2 (Set 1) Lab#3 (Set 1) Lab#4 (Set 1) Lab#5 (Set 1) Mean STD
MPC Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]
Geers Magnitude -23 -21.4 -14.4 -25.8 -21.2 4.85 -29.2 -13.1
Geers Phase 22.6 36.3 31.2 24.9 28.8 6.21 18.4 39.1
Geers Comprehensive 32.3 42.1 34.3 35.8 36.1 4.23 29.1 43.2
Geers CSA Magnitude -23 -21.4 -14.4 -25.8 -21.2 4.85 -29.2 -13.1
Geers CSA Phase 22.6 36.3 31.2 24.9 28.8 6.21 18.4 39.1
Geers CSA Comprehensive 32.3 42.1 34.3 35.8 36.1 4.23 29.1 43.2
Sprague-Geers Magnitude -23 -21.4 -14.4 -25.8 -21.2 4.85 -29.2 -13.1
Sprague-Geers Phase 21.8 28 25.8 22.9 24.6 2.81 20.0 29.3
Sprague-Geers Comprehensive 31.7 35.2 29.6 34.5 32.8 2.59 28.5 37.0
Russell Magnitude -18.5 -17.2 -11.8 -20.6 -17.0 3.75 -23.3 -10.8
Russell Phase 21.8 28 25.8 22.9 24.6 2.81 20.0 29.3
Russell Comprehensive 25.3 29.1 25.2 27.3 26.7 1.86 23.6 29.8
Knowles-Gear Magnitude 55.2 60.6 48.6 56.8 55.3 5.01 47.0 63.6
Knowles-Gear Phase 40.9 79.9 79.9 67.9 67.2 18.39 36.6 97.7
Knowles-Gear Comprehensive 53.1 64.2 55.1 58.8 57.8 4.88 49.7 65.9
Single Value Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]
Whang's inequality metric 40.9 53.2 46.3 46 46.6 5.05 38.2 55.0
Theil's inequality metric 35.8 43.9 40.1 37.9 39.4 3.46 33.7 45.2
Zilliacus error metric 70.8 92.2 87.1 77.9 82.0 9.53 66.2 97.8
RSS error metric 63.3 78.5 74.4 66 70.6 7.10 58.8 82.3
WIFac_Error 53.3 61.5 58.2 55.5 57.1 3.54 51.3 63.0
Regression Coefficient 72.6 52.4 58.9 69.8 63.4 9.43 47.8 79.1
Correlation Coefficient 72.7 55.3 62 69.8 65.0 7.86 51.9 78.0
Correlation Coefficient(NARD) 77.4 63.7 68.8 75.1 71.3 6.21 60.9 81.6
ANOVA Metrics Value Value Value Value Value Value min max
Average 0 -0.01 -0.01 0 -0.01 0.01 -0.01 0.00
Std 0.19 0.24 0.23 0.2 0.22 0.02 0.18 0.25
T-test -1.51 -1.87 -2.56 1.31
Tables 5 and 6 show the values of the comparison metrics obtained for each of the two sets of test data. The
values of the metrics for each of the laboratories (i.e., two through five) are shown in each table along with
the mean and standard deviation of each metric. If it is assumed that the distribution of a metric is normal, then
90 percent of its values should fall within 1.66 standard deviations of the mean. Possible acceptance criteria
are listed in the last two columns, obtained by calculating the 90th percentile limits of the observed
variation of each metric.
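These limits follow directly from the normality assumption; for a vector vals holding the values of one
metric across the four comparisons (an illustrative name):

  % Sketch: 90th percentile acceptance limits for one metric.
  k90     = 1.66;                           % factor used in this study
  lim_min = mean(vals) - k90 * std(vals);   % lower acceptance limit
  lim_max = mean(vals) + k90 * std(vals);   % upper acceptance limit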
Table 6. Values of the comparison metrics for tests of Set#2.
Lab#2 (Set 2) Lab#3 (Set 2) Lab#4 (Set 2) Lab#5 (Set 2) Mean STD
MPC Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]
Geers Magnitude -2.6 -8.2 35.6 -9.3 3.9 21.35 -31.6 39.3
Geers Phase 21.3 24.5 30.2 33.2 27.3 5.39 18.4 36.2
Geers Comprehensive 21.5 25.9 46.7 34.5 32.2 11.10 13.7 50.6
Geers CSA Magnitude -2.6 -8.2 35.6 -9.3 3.9 21.35 -31.6 39.3
Geers CSA Phase 21.3 24.5 30.2 33.2 27.3 5.39 18.4 36.2
Geers CSA Comprehensive 21.5 25.9 46.7 34.5 32.2 11.10 13.7 50.6
Sprague-Geers Magnitude -2.6 -8.2 35.6 -9.3 3.9 21.35 -31.6 39.3
Sprague-Geers Phase 21.2 22.8 25.4 26.7 24.0 2.49 19.9 28.2
Sprague-Geers Comprehensive 21.3 24.2 43.8 28.3 29.4 10.02 12.8 46.0
Russell Magnitude -2.3 -6.9 20.9 -7.8 1.0 13.50 -21.4 23.4
Russell Phase 21.2 22.8 25.4 26.7 24.0 2.49 19.9 28.2
Russell Comprehensive 18.9 21.1 29.2 24.6 23.5 4.49 16.0 30.9
Knowles-Gear Magnitude 54.2 60.7 97.3 68.4 70.2 19.01 38.6 101.7
Knowles-Gear Phase 38 74 16.7 100 57.2 37.07 -4.4 118.7
Knowles-Gear Comprehensive 51.9 63.2 89.1 74.6 69.7 15.91 43.3 96.1
Single Value Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]
Whang's inequality metric 37.7 42.2 43.8 43.2 41.7 2.76 37.1 46.3
Theil's inequality metric 32.7 35.3 41.3 41 37.6 4.26 30.5 44.7
Zilliacus error metric 69.3 77 97.3 79.5 80.8 11.84 61.1 100.4
RSS error metric 64.5 67.6 97.2 78.1 76.9 14.76 52.3 101.4
WIFac_Error 49.6 54 61.1 56.8 55.4 4.83 47.4 63.4
Regression Coefficient 67.1 62.8 0 43.9 43.5 30.67 -7.5 94.4
Correlation Coefficient 71.2 66.8 63.9 53 63.7 7.75 50.9 76.6
Correlation Coefficient(NARD) 78.7 75.4 69.8 66.8 72.7 5.37 63.8 81.6
ANOVA Metrics Value Value Value Value Value Value min max
Average -0.01 0 0 -0.02 -0.01 0.01 -0.02 0.01
Std 0.21 0.22 0.31 0.25 0.25 0.05 0.17 0.32
T-test -2.3 0.66 -0.23 -6.27
Next, all ten tests from both sets were compared together considering the response of test Lab#1
from Set 1 as the “true” curve. Similarly to the previous case of the two separate sets, the 90th percentile
envelope was first evaluated (Figure 4). Table 7 shows the results obtained in this case.

Figure 4. 90th percentile envelope and acceleration time histories considering all tests.
Table 7. Values of the comparison metrics considering all tests.
Lab#1 (Set 2) Lab#2 (Set 1) Lab#2 (Set 2) Lab#3 (Set 1) Lab#3 (Set 2) Lab#4 (Set 1) Lab#4 (Set 2) Lab#5 (Set 1) Lab#5 (Set 2) Mean STD
MPC Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]
Geers Magnitude -16.7 -23 -18.9 -21.4 -23.5 -14.4 13 -25.8 -24.5 -17.2 11.94 -37.1 2.6
Geers Phase 41.3 22.6 38.3 36.3 42.6 31.2 47.5 24.9 29.2 34.9 8.44 20.9 48.9
Geers Comprehensive 44.6 32.3 42.7 42.1 48.6 34.3 49.2 35.8 38.1 40.9 6.10 30.7 51.0
Geers CSA Magnitude -16.7 -23 -18.9 -21.4 -23.5 -14.4 13 -25.8 -24.5 -17.2 11.94 -37.1 2.6
Geers CSA Phase 41.3 22.6 38.3 36.3 42.6 31.2 47.5 24.9 29.2 34.9 8.44 20.9 48.9
Geers CSA Comprehensive 44.6 32.3 42.7 42.1 48.6 34.3 49.2 35.8 38.1 40.9 6.10 30.7 51.0
Sprague-Geers Magnitude -16.7 -23 -18.9 -21.4 -23.5 -14.4 13 -25.8 -24.5 -17.2 11.94 -37.1 2.6
Sprague-Geers Phase 30 21.8 28.8 28 30.5 25.8 32.4 22.9 25 27.2 3.60 21.3 33.2
Sprague-Geers Comprehensive 34.4 31.7 34.5 35.2 38.5 29.6 34.9 34.5 34.9 34.2 2.45 30.2 38.3
Russell Magnitude -13.6 -18.5 -15.3 -17.2 -18.8 -11.8 9.5 -20.6 -19.5 -14.0 9.26 -29.4 1.4
Russell Phase 30 21.8 28.8 28 30.5 25.8 32.4 22.9 25 27.2 3.60 21.3 33.2
Russell Comprehensive 29.2 25.3 28.9 29.1 31.8 25.2 29.9 27.3 28.1 28.3 2.13 24.8 31.8
Knowles-Gear Magnitude 63.3 55.2 63.9 60.6 78.9 48.6 83.4 56.8 59.1 63.3 11.16 44.8 81.8
Knowles-Gear Phase 62.3 40.9 123.9 79.9 182.4 79.9 35.2 67.9 100 85.8 45.42 10.4 161.2
Knowles-Gear Comprehensive 63.1 53.1 77.2 64.2 103.6 55.1 77.5 58.8 67.6 68.9 15.60 43.0 94.8
Single Value Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]
Whang's inequality metric 51.6 40.9 53.3 53.2 52.7 46.3 53.3 46 44.9 49.1 4.66 41.4 56.9
Theil's inequality metric 46.2 35.8 44.8 43.9 47.6 40.1 49 37.9 40.3 42.8 4.54 35.3 50.4
Zilliacus error metric 98.3 70.8 93.6 92.2 92 87.1 112.1 77.9 79.1 89.2 12.33 68.8 109.7
RSS error metric 84.7 63.3 81.1 78.5 84 74.4 104.4 66 70.8 78.6 12.30 58.2 99.0
WIFac_Error 65.9 53.3 63.9 61.5 63.7 58.2 69.5 55.5 58 61.1 5.23 52.4 69.7
Regression Coefficient 39.5 72.6 47.6 52.4 41 58.9 0 69.8 64 49.5 22.02 13.0 86.1
Correlation Coefficient 49.3 72.7 52.6 55.3 47.2 62 44.6 69.8 64.6 57.6 10.13 40.7 74.4
Correlation Coefficient(NARD) 58.7 77.4 61.7 63.7 57.4 68.8 52.5 75.1 70.8 65.1 8.44 51.1 79.1
ANOVA Metrics Value Value Value Value Value Value Value Value Value Value Value min max
Average -0.01 0 -0.01 -0.01 -0.01 -0.01 -0.01 0 -0.03 -0.01 0.01 -0.02 0.00
Std 0.26 0.19 0.25 0.24 0.26 0.23 0.32 0.2 0.21 0.24 0.04 0.17 0.31
T-test -1.84 -1.51 -3.45 -1.87 -1.41 -2.56 -1.68 1.31 -7.99
Table 8 shows the average of the mean values and of the standard deviations evaluated for each of the
three cases previously analyzed: Set#1 vs. Lab#1 (Set 1), Set#2 vs. Lab#1 (Set 2) and all tests vs. Lab#1 (Set
1). The last two columns of Table 8 also show the acceptable range of values based on these averages
obtained from the three above-mentioned cases considered in this study.
Table 8. Average of the means, the standard deviations and the corresponding acceptance criteria.
                              Average Mean               Average STD             Overall          Acceptable value
                              Set#1   Set#2   All        Set#1   Set#2   All     Mean    STD      min [%]   max [%]
MPC Metrics                   (all values in %)
Geers Magnitude -21.2 21.4 -17.2 4.85 39.32 11.94 -5.7 18.71 -36.7 25.4
Geers Phase 28.8 5.4 34.9 6.21 36.24 8.44 23.0 16.96 -5.2 51.2
Geers Comprehensive 36.1 11.1 40.9 4.23 50.58 6.10 29.4 20.30 -4.3 63.1
Geers CSA Magnitude -21.2 21.4 -17.2 4.85 39.32 11.94 -5.7 18.71 -36.7 25.4
Geers CSA Phase 28.8 5.4 34.9 6.21 36.24 8.44 23.0 16.96 -5.2 51.2
Geers CSA Comprehensive 36.1 11.1 40.9 4.23 50.58 6.10 29.4 20.30 -4.3 63.1
Sprague-Geers Magnitude -21.2 21.4 -17.2 4.85 39.32 11.94 -5.7 18.71 -36.7 25.4
Sprague-Geers Phase 24.6 2.5 27.2 2.81 28.15 3.60 18.1 11.52 -1.0 37.2
Sprague-Geers Comprehensive 32.8 10.0 34.2 2.59 46.03 2.45 25.7 17.02 -2.6 53.9
Russell Magnitude -17.0 13.5 -14.0 3.75 23.38 9.26 -5.8 12.13 -26.0 14.3
Russell Phase 24.6 2.5 27.2 2.81 28.15 3.60 18.1 11.52 -1.0 37.2
Russell Comprehensive 26.7 4.5 28.3 1.86 30.91 2.13 19.8 11.63 0.5 39.2
Knowles-Gear Magnitude 55.3 19.0 63.3 5.01 101.70 11.16 45.9 39.29 -19.3 111.1
Knowles-Gear Phase 67.2 37.1 85.8 18.39 118.71 45.42 63.3 60.84 -37.6 164.3
Knowles-Gear Comprehensive 57.8 15.9 68.9 4.88 96.11 15.60 47.5 38.86 -17.0 112.1
Single Value Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]
Whang's inequality metric 46.6 2.8 49.1 5.05 46.31 4.66 32.8 18.67 1.8 63.8
Theil's inequality metric 39.4 4.3 42.8 3.46 44.65 4.54 28.8 17.55 -0.3 58.0
Zilliacus error metric 82.0 11.8 89.2 9.53 100.43 12.33 61.0 40.76 -6.6 128.7
RSS error metric 70.6 14.8 78.6 7.10 101.36 12.30 54.6 40.25 -12.2 121.4
WIFac_Error 57.1 4.8 61.1 3.54 63.40 5.23 41.0 24.05 1.1 80.9
Regression Coefficient 63.4 30.7 49.5 9.43 94.36 22.02 47.9 41.94 -21.7 117.5
Correlation Coefficient 65.0 7.8 57.6 7.86 76.60 10.13 43.4 31.53 -8.9 95.8
Correlation Coefficient(NARD) 71.3 5.4 65.1 6.21 81.59 8.44 47.2 32.08 -6.0 100.5
ANOVA Metrics Value Value Value Value Value Value Value Value min max
Average -0.01 0.01 -0.01 0.01 0.01 0.01 0.0 0.01 -0.01 0.01
Std 0.22 0.05 0.24 0.02 0.32 0.04 0.2 0.13 -0.05 0.38
T-test
RESULTS USING DERIVED TIME HISTORIES
As previously mentioned, the evaluation of metrics should always be performed using time histories
directly measured and not derived using either integration or differentiation. For example, if accelerations
are measured experimentally, accelerations should be the basis of comparison. Velocities and displacements
obtained by integrating the acceleration curve will accumulate error. As an example of this situation, the
comparison metrics were evaluated using the velocity time histories obtained by integrating the acceleration
time histories from Set#1 (Figure 5).
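The velocity histories were obtained by direct numerical integration of the filtered accelerations, e.g.:

  % Sketch: velocity by trapezoidal integration of the acceleration.
  dt = 1/20000;               % sampling interval at 20 kHz
  v  = cumtrapz(a_f) * dt;    % integration acts as a further low-pass
                              % filter, but residual errors accumulate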
Figure 5. 90th percentile envelope and velocity time histories from Set#1.
As shown in Table 9, all comparison metrics except the ANOVA metric gave better results using velocities
instead of accelerations. The integration process is essentially a second low-pass filter which further
smooths the time history data. The exception of the ANOVA metric confirms that residual errors
accumulate through the integration operation. Figure 6 shows the residual distribution in the case of the
velocity time histories. As can be seen, the distribution for the four tests is more spread out than in the case
of the acceleration time histories, and the mean residual (i.e., the residual error corresponding to the 50th
percentile) is no longer zero for three of the four comparisons. While the acceptance criteria for velocity-
based comparisons would be smaller, the smaller values would provide an incorrect estimate of how similar
the curves are, since the actual variation of the metrics in the units in which the data were measured is
greater, as shown in Table 5.
Figure 6. Cumulative distribution function of the residual velocities for Set #1.
Table 9. Values of the comparison metrics using velocity time histories (Set#1).
Lab#2 (Set 1) Lab#3 (Set 1) Lab#4 (Set 1) Lab#5 (Set 1) Mean STD
MPC Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]
Geers Magnitude 0.5 5.1 4.5 -4 1.5 4.21 -5.5 8.5
Geers Phase 0.2 0.6 0.5 0.4 0.4 0.17 0.1 0.7
Geers Comprehensive 0.5 5.1 4.5 4 3.5 2.07 0.1 7.0
Geers CSA Magnitude 0.5 5.1 4.5 -4 1.5 4.21 -5.5 8.5
Geers CSA Phase 0.2 0.6 0.5 0.4 0.4 0.17 0.1 0.7
Geers CSA Comprehensive 0.5 5.1 4.5 4 3.5 2.07 0.1 7.0
Sprague-Geers Magnitude 0.5 5.1 4.5 -4 1.5 4.21 -5.5 8.5
Sprague-Geers Phase 2 3.5 3.1 2.8 2.9 0.64 1.8 3.9
Sprague-Geers Comprehensive 2 6.2 5.4 4.9 4.6 1.83 1.6 7.7
Russell Magnitude 0.4 4.1 3.6 -3.4 1.2 3.46 -4.6 6.9
Russell Phase 2 3.5 3.1 2.8 2.9 0.64 1.8 3.9
Russell Comprehensive 1.8 4.8 4.2 3.9 3.7 1.30 1.5 5.8
Knowles-Gear Magnitude 6.2 12.3 10.2 8.5 9.3 2.59 5.0 13.6
Knowles-Gear Phase 36.2 52 43.4 48 44.9 6.78 33.6 56.2
Knowles-Gear Comprehensive 15.8 24 20 21.1 20.2 3.40 14.6 25.9
Single Value Metrics Value [%] Value [%] Value [%] Value [%] Value [%] Value [%] min [%] max [%]
Whang's inequality metric 2.7 5.1 4.7 4.4 4.2 1.06 2.5 6.0
Theil's inequality metric 3.1 6 5.4 4.8 4.8 1.25 2.8 6.9
Zilliacus error metric 5.3 10.3 9.4 8.5 8.4 2.18 4.8 12.0
RSS error metric 6.2 12.3 11 9.5 9.8 2.63 5.4 14.1
WIFac_Error 6.1 11.5 10.3 9.3 9.3 2.32 5.5 13.1
Regression Coefficient 98.7 94.9 96 97 96.7 1.61 94.0 99.3
Correlation Coefficient 99 96.9 97.7 98.1 97.9 0.87 96.5 99.4
Correlation Coefficient(NARD) 99.8 99.4 99.5 99.6 99.6 0.17 99.3 99.9
ANOVA Metrics Value Value Value Value Value Value min max
Average 0 -0.02 -0.02 0.04 0.00 0.03 -0.05 0.05
Std 0.05 0.09 0.08 0.06 0.07 0.02 0.04 0.10
T-test 4.99 -16.38 -14.17 44
DISCUSSION
The data in Tables 5 through 8 show several interesting characteristics. Since the Geers,
Geers CSA and Sprague-Geers metrics all use the same formulation for the magnitude comparison, it is not
surprising that they all yield essentially the same values. These magnitude components are all less
than 25.8 for Set 1 and 35.6 for Set 2 in absolute value. The phase components of the Geers and Geers CSA
metrics also share the same formulation, and the values for Set 1 are less than 36.3 and 33.2 for Set 2. In
general, the Geers, Geers CSA, Sprague-Geers and Russell metrics all result in similar values, with maximum
magnitude components of 35.6, phase components of 33.2 and combined values of 46.7. Interestingly,
while the mean values of these metrics are similar for both Sets 1 and 2, the standard deviation of the
magnitude components of these MPC metrics is about five times greater for Set 2 than Set 1, while the
standard deviation of the phase components is similar in both sets. This may be an indication that the use of
slightly different vehicles in Set 2 introduces additional variability.
Table 10 and Table 11 show the ranking of the full-scale crash tests according to each metric. As
shown in the first columns of Tables 10 and 11, the Geers, Geers CSA, Sprague-Geers and Russell metrics all rate
the results from Lab#4 as the best match and Lab#2 as the worst for magnitude, Lab#5 as the best and Lab#1
as the worst for phase, and Lab#4 as the best and Lab#2 as the worst for the combined metric.
The Knowles-Gear metric scales much differently than the other MPC metrics. The maximum
magnitude component is 60.6 for Set #1 and 97.3 for Set #2. Similarly, the maximum phase component is
79.9 for Set #1 and 100 for Set #2. The Knowles-Gear metric rates Lab#5 as the best magnitude match,
Lab#3 as the best phase match and Lab#5 as the best combined. Lab#2 results in the worst magnitude
match and Lab#1 the worst phase match. These results are slightly different from those of the other MPC
metrics. Both the standard deviations of the magnitude and phase components increased significantly between
Set #1 and Set #2, indicating, again, that the use of similar but not identical vehicles may increase the
variability of the metric.
Table 10. Ranking of the full-scale crash tests according to each evaluated metric (Set 1 and Set 2).

                                  Set 1                                     Set 2
                                  Best     2nd      3rd      Worst         Best     2nd      3rd      Worst
MPC Metrics
Geers Magnitude Lab#4 Lab#3 Lab#2 Lab#5 Lab#2 Lab#3 Lab#5 Lab#4
Geers Phase Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5
Geers Comprehensive Lab#2 Lab#4 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
Geers CSA Magnitude Lab#4 Lab#3 Lab#2 Lab#5 Lab#2 Lab#3 Lab#5 Lab#4
Geers CSA Phase Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5
Geers CSA Comprehensive Lab#2 Lab#4 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
Sprague-Geers Magnitude Lab#4 Lab#3 Lab#2 Lab#5 Lab#2 Lab#3 Lab#5 Lab#4
Sprague-Geers Phase Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5
Sprague-Geers Comprehensive Lab#4 Lab#2 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
Russell Magnitude Lab#4 Lab#3 Lab#2 Lab#5 Lab#2 Lab#3 Lab#5 Lab#4
Russell Phase Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5
Russell Comprehensive Lab#4 Lab#2 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
Knowles-Gear Magnitude Lab#4 Lab#2 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
Knowles-Gear Phase Lab#2 Lab#5 Lab#3 & Lab#4 (tie) Lab#4 Lab#2 Lab#3 Lab#5
Knowles-Gear Comprehensive Lab#2 Lab#4 Lab#5 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
Single Value Metrics
Whang's inequality metric Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
Theil's inequality metric Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
Zilliacus error metric Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
RSS error metric Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
WIFac_Error Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
Regression Coefficient Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
Correlation Coefficient Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5
Correlation Coefficient(NARD) Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#4 Lab#5
ANOVA Metrics
Average Lab#2 & Lab#5 (tie) Lab#3 & Lab#4 (tie) Lab#3 & Lab#4 (tie) Lab#2 Lab#5
Std Lab#2 Lab#5 Lab#4 Lab#3 Lab#2 Lab#3 Lab#5 Lab#4
T-test Lab#5 Lab#2 Lab#3 Lab#4 Lab#4 Lab#3 Lab#2 Lab#5
All the single-value metrics rated the results from Lab#5 as the best and either Lab#1 or
Lab#2 as the worst. As shown in Table 2, the correlation coefficient, NARD correlation coefficient, RSS
error and regression coefficient all have similar formulations, and an identical match is indicated by unity.
The poorest correlation was 55.3 for Lab#3 and the best was 75.1 for Lab#5. Like the MPC metrics, the
standard deviations of these metrics generally doubled in the second set of experiments.
The average residual error component of the ANOVA metric was generally very close to zero for all
the crash tests in both sets of data. The standard deviation of the residual errors was as high as 31 percent
and, like all the other metrics, the standard deviation of the standard deviation of the residuals doubled in the
second set of tests. Only two tests in each set (Lab#3 and Lab#5 for Set #1 and Lab#3 and Lab#4 for
Set #2) passed the T test at the 90-percent confidence level.
In evaluating the ANOVA metrics for a series of six identical frontal rigid pole impacts, Ray proposed an
acceptance criterion of a mean residual error less than five percent of the peak and a standard deviation of
less than 20 percent of the peak test acceleration [2]. As shown in Tables 5 through 7, most of the tests in
these two test series had mean residual errors very close to zero, but the standard deviations tended to be
somewhat higher than 20 percent, resulting in T statistics just outside the 90th percentile acceptance
range. The reason for this is probably that the redirectional test examined in this paper is much less
repeatable than the frontal impact originally studied by Ray. The T test in some of these impacts can fail
because the result of the test was, in fact, somewhat different. For example, in a redirectional test, a
slight difference in the suspension or steering system could cause a slightly different orientation of the front
wheels, which would in turn change the redirection angle and the lateral forces. While the impact conditions
may well be essentially identical, the result of the test is still not identical due to other uncontrollable
variations in the experiment. The T-test aspect of the ANOVA metrics is, therefore, an extremely difficult
test to pass, although when the T statistic is in the passing range the resulting comparison will be very
good.
Table 11. Ranking of the full-scale crash tests according to each evaluated metric (all tests, Set 1 + Set 2).

                                  Best    2nd    3rd    4th    5th    6th    7th    8th    Worst
MPC Metrics
Geers Magnitude Lab#4 (2) Lab#4 (1) Lab#1 (2) Lab#2 (2) Lab#3 (1) Lab#2 (1) Lab#3 (2) Lab#5 (2) Lab#5 (1)
Geers Phase Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)
Geers Comprehensive Lab#2 (1) Lab#4 (1) Lab#5 (1) Lab#5 (2) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)
Geers CSA Magnitude Lab#4 (2) Lab#4 (1) Lab#1 (2) Lab#2 (2) Lab#3 (1) Lab#2 (1) Lab#3 (2) Lab#5 (2) Lab#5 (1)
Geers CSA Phase Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)
Geers CSA Comprehensive Lab#2 (1) Lab#4 (1) Lab#5 (1) Lab#5 (2) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)
Sprague-Geers Magnitude Lab#4 (2) Lab#4 (1) Lab#1 (2) Lab#2 (2) Lab#3 (1) Lab#2 (1) Lab#3 (2) Lab#5 (2) Lab#5 (1)
Sprague-Geers Phase Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)
Sprague-Geers Comprehensive Lab#4 (1) Lab#2 (1) Lab#1 (2) Lab#2 (2) & Lab#5 (1) (tie) Lab#4 (2) & Lab#5 (2) (tie) Lab#3 (1) Lab#3 (2)
Russell Magnitude Lab#4 (2) Lab#4 (1) Lab#1 (2) Lab#2 (2) Lab#3 (1) Lab#2 (1) Lab#3 (2) Lab#5 (2) Lab#5 (1)
Russell Phase Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)
Russell Comprehensive Lab#4 (1) Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#2 (2) Lab#3 (1) Lab#1 (2) Lab#4 (2) Lab#3 (2)
Knowles-Gear Magnitude Lab#4 (1) Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#3 (1) Lab#1 (2) Lab#2 (2) Lab#3 (2) Lab#4 (2)
Knowles-Gear Phase Lab#4 (2) Lab#2 (1) Lab#1 (2) Lab#5 (1) Lab#3 (1) & Lab#4 (1) (tie) Lab#5 (2) Lab#2 (2) Lab#3 (2)
Knowles-Gear Comprehensive Lab#2 (1) Lab#4 (1) Lab#5 (1) Lab#1 (2) Lab#3 (1) Lab#5 (2) Lab#2 (2) Lab#4 (2) Lab#3 (2)
Single Value Metrics
Whang's inequality metric Lab#2 (1) Lab#5 (2) Lab#5 (1) Lab#4 (1) Lab#1 (2) Lab#3 (2) Lab#3 (1) Lab#2 (2) & Lab#4 (2) (tie)
Theil's inequality metric Lab#2 (1) Lab#5 (1) Lab#4 (1) Lab#5 (2) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)
Zilliacus error metric Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (2) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#4 (2)
RSS error metric Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#3 (2) Lab#1 (2) Lab#4 (2)
WIFac_Error Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#3 (2) Lab#2 (2) Lab#1 (2) Lab#4 (2)
Regression Coefficient Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#3 (2) Lab#1 (2) Lab#4 (2)
Correlation Coefficient Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)
Correlation Coefficient(NARD) Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) Lab#3 (2) Lab#4 (2)
ANOVA Metrics
Average Lab#2 (1) & Lab#5 (1) (tie) Lab#1 (2), Lab#2 (2), Lab#3 (1), Lab#3 (2), Lab#4 (1) & Lab#4 (2) (tie) Lab#5 (2)
Std Lab#2 (1) Lab#5 (1) Lab#5 (2) Lab#4 (1) Lab#3 (1) Lab#2 (2) Lab#1 (2) & Lab#3 (2) (tie) Lab#4 (2)
T-test Lab#5 (1) Lab#3 (2) Lab#2 (1) Lab#4 (2) Lab#1 (2) Lab#3 (1) Lab#4 (1) Lab#2 (2) Lab#5 (2)
CONCLUSIONS
This paper investigated the use of shape comparison metrics to quantitatively compare the similarity
of the time histories of two sets of five repeated identical full-scale crash tests. Ten essentially identical full-
scale crash tests with a vertical concrete wall were considered, five of them performed using the same type
of vehicle and the other five using similar vehicles but still complying with the EN1317 test specifications
for that class of vehicle.
The original raw time histories from the 10 tests were filtered, re-sampled and synchronized in order
to be correctly compared to each other. The statistics derived from the analysis of the residuals confirmed
the hypothesis that the errors were normally distributed and could, therefore, be attributed to normal random
experimental error.
A total of three sets of tests were analyzed and a variety of quantitative shape comparison metrics
were then calculated for each set of repeated crash test cases. The comparison of the results showed the
utility of each metric and its diagnostic value in assessing the degree of comparison between the repeated
crash test time histories. Also, possible acceptance criteria were suggested for each evaluated metric
according to the values obtained from the analysis of these nearly identical tests.
Acknowledgments:
The authors are grateful to Dr. Leonard Schwer, Mr. David Moorcroft and other members of the
ASME PTC-60 committee for providing helpful information about verification and validation metrics.
Also, the authors are thankful to Dr. Chiara Silvestri for her help in finalizing and revising the manuscript.
This work was made possible thanks to the support of the National Cooperative Highway Research Program
as a part of Project NCHRP 22-24.
REFERENCES
[1] C.A. Plaxico, M.H. Ray and K. Hiranmayee, “Impact Performance of the G4(1W) and G4(2W)
Guardrail Systems Comparison Under NCHRP Report 350 Test 3-11 Conditions”, Transportation Research
Record 1720, Transportation Research Board, Washington, D.C., pp. 7-18, (2000).
[2] M.H. Ray, “Repeatability of Full-Scale Crash Tests and a Criteria for Validating Finite Element
Simulations”, Transportation Research Record, Vol. 1528, pp. 155-160, (1996).
[3] W.L. Oberkampf and M.F. Barone, “Measures of Agreement Between Computation and Experiment:
Validation Metrics,” Journal of Computational Physics Vol. 217, No. 1 (Special issue: Uncertainty
quantification in simulation science) pp 5–36, (2006).
[4] T.L Geers, “An Objective Error Measure for the Comparison of Calculated and Measured Transient
Response Histories”, The Shock and Vibration Bulletin, The Shock and Vibration Information Center, Naval
Research Laboratory, Washington, D.C., Bulletin 54, Part 2, pp. 99-107, (June 1984).
[5] Comparative Shock Analysis (CSA) of Main Propulsion Unit (MPU), Validation and Shock Approval
Plan, SEAWOLF Program: Contract No. N00024-90-C-2901, 9200/SER: 03/039, September 20, 1994.
[6] M.A. Sprague and T.L. Geers, “Spectral elements and field separation for an acoustic fluid subject to
cavitation”, Journal of Computational Physics, Vol. 184, pp. 149–162, (2003).
[7] D.M. Russell, “Error Measures for Comparing Transient Data: Part I: Development of a Comprehensive
Error Measure”, Proceedings of the 68th Shock and Vibration Symposium, pp. 175–184, (2006).
[8] L.E. Schwer, “Validation Metrics for Response Time Histories: Perspective and Case Studies”, Engng.
with Computers, Vol. 23, Issue 4, pp. 295–309, (2007).
[9] C.P. Knowles and C.W. Gear, “Revised validation metric”, unpublished manuscript, 16 June 2004
(revised July 2004).
[10] J. Cohen, P. Cohen, S.G. West and L.S. Aiken, Applied multiple regression/correlation analysis for the
behavioral sciences, Hillsdale, NJ: Lawrence Erlbaum, (3rd ed.), 2003.
[11] S. Basu and A. Haghighi, “Numerical Analysis of Roadside Design (NARD) vol. III: Validation
Procedure Manual”, Report No. FHWA-RD-88-213, Federal Highway Administration, Virginia, 1988.
[12] B. Whang, W.E. Gilbert and S. Zilliacus, Two Visually Meaningful Correlation Measures for
Comparing Calculated and Measured Response Histories, Carderock Division, Naval Surface Warfare
Center, Bethesda, Maryland, Survivability, Structures and Materials Directorate, Research and Development
Report, CARDEROCKDIV-U-SSM-67-93/15, September, 1993.
[13] H. Theil, Economic Forecasts and Policy, North-Holland Publishing Company, Amsterdam, 1975.
[14] D.M. Russell, “Error Measures for Comparing Transient Data: Part II: Error Measures Case Study”,
Proceedings of the 68th Shock and Vibration Symposium, pp. 185–198, (2006).
[15] ROBUST PROJECT
[16] European Committee of Standardization, “European Standard EN 1317-1 and EN 1317-2: Road
Restraint Systems”, CEN, 1998.
[17] Matlab: User Guide, The MathWorks Inc, (1994-2008).
[18] SAE J211-1 (R) Instrumentation for Impact Test—Part 1—Electronic Instrumentation, SAE
International, Dearborn, MI, USA, Jul 1, 2007.