

Reliability of Three-Dimensional Facial Landmarks Using Multivariate Intraclass Correlation

Abstract

The intraclass correlation coefficient (ICC) is widely used in many fields, including orthodontics,

as a measure of reliability of quantitative data. The main goals of this study were to determine

the multivariate intraclass correlation, a measure of intra-rater reliability, for the selection of six

3D mid-facial soft tissue landmarks and compare the multivariate ICC with other measures of

landmark reliability. 3D stereophotogrammetric images of 15 randomly selected subjects were

landmarked twice by the same rater using 3dMD software. The multivariate ICC was found to

be as high as or higher than the individual coordinate measures in each of the landmarks

examined in this study. The reliability of all six soft-tissue landmarks in this study was excellent

(ICC >0.98). When the within-coordinate and Euclidean distance criteria were applied, the

reliability of the landmarks was found to be acceptable. However, the two selections of the

landmark pronasion were found to be significantly different in the X coordinate. The

multivariate ICC identified this same landmark as its lowest estimate of ICC. This indicates that

the multivariate ICC may be important in quantifying reliability and warrants investigation in

future craniofacial landmark reliability studies.

Introduction

The intraclass correlation coefficient is widely used in many fields, including orthodontics, as a

measure of reliability of quantitative data. With the availability of technical software capable of


recording two and three-dimensional landmark coordinate data for cone beam computerized

tomography (CBCT) and digital cephalometric data, the intraclass correlation coefficient is used

to establish intra- and inter-rater reliability or reliability of landmarks across software or

machines. Recent applications of the intraclass correlation to assess reliability of landmark

placement include studies of soft tissue landmarks using CBCT (Fourie, 2010), hard

tissue landmarks using CBCT (Lagravere, 2009), and cephalometry (Chien, 2009). The most

commonly used method of reporting reliability is a summary measure of the distribution (mean

or range) of intraclass correlation coefficients calculated within each coordinate. The estimates

of the intraclass correlation coefficient are often obtained using a one-way or two-way random

effects ANOVA model (Fisher, 1958; Haggard, 1958; Shrout and Fleiss, 1979) from which

confidence intervals and significance values can also be computed.
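As a rough illustration of the one-way random effects approach mentioned above, the following Python sketch estimates the coefficient for a single coordinate from the between-subject and within-subject mean squares. The function name `icc_oneway` and the sample coordinates are hypothetical and are not taken from the study; confidence intervals and significance tests are not included.

```python
from statistics import mean


def icc_oneway(measurements):
    """Estimate ICC(1,1) from a one-way random effects ANOVA.

    measurements: one list per subject, each holding the k repeated
    measurements of a single coordinate (k = 2 for dual selections).
    """
    n = len(measurements)
    k = len(measurements[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need at least 2 subjects, each with the same k >= 2 measurements")

    grand_mean = mean([x for row in measurements for x in row])
    subject_means = [mean(row) for row in measurements]

    # Between-subject mean square, n - 1 degrees of freedom.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square, n * (k - 1) degrees of freedom.
    msw = sum(
        (x - m) ** 2
        for row, m in zip(measurements, subject_means)
        for x in row
    ) / (n * (k - 1))

    denominator = msb + (k - 1) * msw
    if denominator == 0:
        raise ValueError("no variation in the data; the ICC is undefined")
    return (msb - msw) / denominator


# Hypothetical X coordinates (mm) for four subjects, two selections each.
x_coords = [[10.1, 10.3], [14.8, 14.6], [12.0, 12.1], [17.5, 17.4]]
print(round(icc_oneway(x_coords), 4))
```

When the two selections agree exactly for every subject, the within-subject mean square is zero and the estimate equals 1.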

However, statistical methods for calculating a multivariate estimate of intraclass correlation

exist and have been used extensively in the fields of biology, epidemiology, and genetics to

estimate the degree of resemblance between members of a family with respect to several

characteristics (Mian, 1997). In the study of reliability of craniofacial landmark selection, an

estimate of multivariate intraclass correlation would represent how closely a “family” of

measures (usually 2 measures) resemble each other with respect to the X, Y, and Z coordinate

information. The unified estimate across coordinates enables a global measure, rather than a

summary of multiple estimates, on which to assess reliability of landmark selection.

A measure of multivariate intraclass correlation was first introduced by Rao (1945, 1953), who

established its asymptotic distribution and methods of significance testing in the case


that each group had the same number of members. This early work was expanded by Donner

and Koval (1980) who proposed an analysis of variance estimator. Srivastava (1984) developed

non-iterative methods based on weighted sums of squares to obtain unbiased estimators of

covariance matrices for a single characteristic and their asymptotic properties for the case in

which groups have a different number of members. Other estimates based on maximum

likelihood methods have been proposed (Rosner, 1977); however, simulation studies have shown

that inferences based on non-iterative methods perform as favorably as or more favorably than

those based on maximum likelihood estimators (Rosner, 1979; Konishi,

1982, 1985; Donner and Eliasziw, 1988). Non-iterative approaches are free of the possible

pitfalls of methods of estimation based on the maximization of implicit functions of multiple

parameters, which are sensitive to local extrema and not guaranteed to converge. Konishi and

Khatri (1991) generalized Srivastava’s estimator and the Donner and Koval (1980) estimator to

the multivariate situation of more than one characteristic and proposed a unified estimator

based on the maximum canonical correlation (interclass) and eigenvalue (intraclass) of the

covariance matrix as measures of degree of resemblance (Konishi, 1991). This method assumes

multivariate normal distribution for asymptotic properties of the estimates.

The main goals of this study were to determine the multivariate intraclass correlation, a

measure of intra-rater reliability, for the selection of six three-dimensional mid-facial soft tissue

landmarks (nasion, pronasion, right and left alare, labiale superius, and labiale inferius; subnasale

was used for superimposition; see Figure 1), and to compare the multivariate intraclass

correlation coefficient with other measures of landmark agreement. This study is innovative in

the application of these methods of calculating the estimate of multivariate intraclass


correlation coefficient to assess the reliability of 3D landmark selection and in the comparison

of this estimate to conventional within-coordinate intraclass correlation coefficients.

Materials and Methods

Three-dimensional stereophotogrammetric images of 15 randomly selected subjects were

landmarked using 3dMD software (Atlanta, GA) and coordinates of each landmark were

selected twice by the same rater with approximately 2 weeks between landmark selections. Six

facial landmarks were selected for this study because they represent the nasal and mouth areas

of the face, which are pronounced three-dimensional facial features. Coordinates were

exported and the sets of landmarks were superimposed by translating each subject's landmarks so that

the first selection of subnasale lay at the origin. The sets of landmarks were then standardized

with respect to rotation (X into Y holding Z constant) and facial tilt (Y into Z holding X constant)

using rotation matrices with the coordinates of the first measure of nasion located on the Y-axis

for all subjects after adjustment (Lele and Richtsmeier, 2001). Standardization of head position

and coordinate location was not done in the imaging process, but is important in assessing

reliability to accurately reflect the variation of each landmark, which affects the intraclass

correlation coefficient. Therefore the head position was standardized using rotation matrices

after the coordinates were exported. Conventional within-coordinate intraclass correlation

coefficients, confidence intervals, and significance probabilities, obtained from a one-way

ANOVA model, were computed for the X, Y, and Z planes for each landmark. Since the intraclass

correlation coefficient is a parametric procedure, the residuals from each model were

examined for normality using the Shapiro-Wilk test. Evidence of non-normality was found only for

left alare in the Y coordinate (Shapiro-Wilk p=0.0172). However, since the one-way ANOVA


model is generally robust against minor departures from normality, the use of ICC for this

coordinate was deemed appropriate. The intraclass correlation coefficient gives the proportion

of variance attributable to between-group differences, and the null hypothesis for significance

testing is that this coefficient is equal to zero. The ICC ranges from 0 to 1, with 1 indicating

perfect agreement. A commonly adopted minimum acceptable univariate intraclass correlation

coefficient is 0.80 (Shrout and Fleiss, 1979), with 0.90 generally considered excellent

agreement. However, since the estimate of the intraclass coefficient is subject to sampling

error, Lee (1989) proposes that the lower bound of the 95% confidence interval of the estimate

be at least 0.75. A method of multivariate intraclass correlation (Konishi, 1991) was used to

determine the level of agreement across X, Y, and Z coordinates. Distributional assumptions

were validated using the Shapiro-Wilk test within each dimension. This multivariate method, as

discussed above, produces a unified estimator of intra-rater agreement between the dual

landmark selections with respect to X, Y, and Z coordinate information.
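The superimposition and head-position standardization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function name `make_standardizer`, the dictionary layout, and the sample coordinates are hypothetical, and the two rotations are assumed to be applied in the order given in the text (about the Z axis, then about the X axis). The transform is built from the first selection of subnasale and nasion and then applied to both selections.

```python
import math


def make_standardizer(subnasale, nasion):
    """Build a rigid transform from one subject's first landmark selection.

    The transform moves subnasale to the origin and then rotates so that
    nasion lies on the positive Y axis. Both arguments are (x, y, z) tuples.
    """
    nx, ny, nz = (a - b for a, b in zip(nasion, subnasale))
    r_xy = math.hypot(nx, ny)   # length of nasion projected onto the X-Y plane
    r = math.hypot(r_xy, nz)    # distance from subnasale to nasion
    if r == 0:
        raise ValueError("subnasale and nasion coincide")

    # Rotation about Z (X into Y, Z held constant): brings nasion into the Y-Z plane.
    c1, s1 = (ny / r_xy, nx / r_xy) if r_xy > 0 else (1.0, 0.0)
    # Rotation about X (Y into Z, X held constant): brings nasion onto the Y axis.
    c2, s2 = r_xy / r, nz / r

    def transform(point):
        x, y, z = (a - b for a, b in zip(point, subnasale))
        x1 = c1 * x - s1 * y
        y1 = s1 * x + c1 * y
        y2 = c2 * y1 + s2 * z
        z2 = -s2 * y1 + c2 * z
        return (x1, y2, z2)

    return transform


# Hypothetical coordinates (mm) for one subject, first and second selections.
first = {"subnasale": (1.0, 2.0, 3.0), "nasion": (2.0, 6.0, 5.0), "pronasion": (1.5, 2.5, 6.0)}
second = {"subnasale": (1.1, 2.1, 2.9), "nasion": (2.1, 5.9, 5.2), "pronasion": (1.7, 2.4, 6.1)}

transform = make_standardizer(first["subnasale"], first["nasion"])
first_std = {name: transform(p) for name, p in first.items()}
second_std = {name: transform(p) for name, p in second.items()}
```

Because the transform is a translation followed by two rotations, distances between points are unchanged; only the position and orientation of the coordinate system differ.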

Lee (1989) proposes a three-pronged approach for establishing acceptable agreement. In

addition to 1) a lower bound of the intraclass correlation coefficient 95% confidence interval of

at least 0.75, there must be 2) no systematic bias between the measures, and 3) no significant

differences between the measures. Estimation of intraclass correlation does not distinguish

between systematic and random error. To identify any possible bias and detect significant

differences in the sets of landmarks, the difference was calculated for each set of landmark

coordinates (first-second) within each plane. The normality of each within-coordinate

difference variable was assessed using the Shapiro-Wilk test under the null hypothesis that the

distribution is normal. Difference variables shown to be normal were tested using the paired t-


test under the null hypothesis that the mean difference between the two measures was equal

to zero. For variables in which the distributions of the difference were shown to be non-

normal, a Wilcoxon Signed-Rank test was used to determine if the median difference between

the coordinates from the two landmark selections was equal to zero (assuming symmetry).
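The within-coordinate difference and the paired t statistic described above can be sketched as follows. The function name `paired_t_statistic` and the sample values are hypothetical. The sketch stops at the test statistic and its degrees of freedom; obtaining the p-value requires the Student t distribution with n - 1 degrees of freedom, which is not part of the Python standard library.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (mean difference, SD of differences, t statistic, degrees of freedom).

    Differences are taken as first - second, matching the convention in the text.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if s_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


# Hypothetical X coordinates (mm) from two selections of one landmark.
first_x = [1.0, 2.0, 3.0, 4.0]
second_x = [1.5, 2.0, 3.5, 4.5]
d_bar, s_d, t, df = paired_t_statistic(first_x, second_x)
```

A negative mean difference, as in this example, means that the second selection tends to be larger than the first.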

To assess differences in landmark placement that may be of clinical importance, the Euclidean

distance between the two sets of 3D coordinates was calculated. Descriptive statistics are

given to summarize the actual distance in space between the two sets of landmarks.
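A minimal sketch of this calculation, assuming each selection is stored as a list of (x, y, z) tuples with one entry per subject; the function names and sample coordinates are hypothetical.

```python
import math
from statistics import mean, median, stdev


def euclidean_distances(first, second):
    """Distance in 3D space between paired landmark selections, one per subject."""
    if len(first) != len(second):
        raise ValueError("both selections must cover the same subjects")
    return [math.dist(p, q) for p, q in zip(first, second)]


def summarize(distances):
    """Descriptive statistics for a list of at least two distances."""
    return {
        "n": len(distances),
        "mean": mean(distances),
        "st_dev": stdev(distances),
        "median": median(distances),
        "min": min(distances),
        "max": max(distances),
    }


# Hypothetical coordinates (mm) for two subjects.
first_sel = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
second_sel = [(3.0, 4.0, 0.0), (1.0, 1.0, 3.0)]
summary = summarize(euclidean_distances(first_sel, second_sel))
```

Unlike the within-coordinate differences, the Euclidean distance is never negative, so it describes the size of the disagreement but not its direction.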

Calculation of the multivariate intraclass correlation coefficient is not available in standard

commercially available statistical software. An R program was created by the author to

perform the calculations of the estimates for the multivariate ICC reported in this study and is

available upon request.

Analysis was done using SAS Enterprise Guide 4.2 and R statistical software, with a specified

Type I error of 0.05.

Results

Within-coordinate reliability analysis was performed on the selection of these six soft tissue

landmarks. The intraclass correlation coefficients for the landmarks are shown in Table 1. The

within-coordinate coefficients ranged from 0.9841 in the Y coordinate of left alare, to 0.9999 in

the Y coordinate of nasion and the X coordinate of right alare. No lower bounds of the 95%

confidence intervals were below the 0.75 standard proposed by Lee, with all lower bounds

>0.95. In addition, all coordinates and all landmarks had intraclass coefficients significantly

greater than zero (p<0.0001 for all). The mean ICC for all coordinates within a landmark was also

calculated; this value ranged from 0.9912 in left alare to 0.9997 in nasion. It is worth noting

here that the first measure of nasion was a landmark used to standardize the rotation and tilt of

the set of facial landmarks.
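As a worked check of the per-landmark mean, using the left alare coefficients reported in Table 1:

```latex
\overline{\mathrm{ICC}}_{\text{left alare}}
  = \frac{0.9997 + 0.9841 + 0.9899}{3}
  = \frac{2.9737}{3}
  \approx 0.9912
```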

The multivariate intraclass correlation (Konishi, 1991), encompassing information for all three

dimensions, was calculated for each landmark. These unified landmark values ranged from

0.9991 for pronasion to 0.9999 for nasion, right alare, and left alare.

The differences between the first and second landmark selections (first – second) were

calculated for each dimension and each landmark. These differences were tested using the

Student’s t-test or Wilcoxon signed-rank test to determine whether the mean or median, respectively, was

equal to zero. The p-values for these tests are given in Table 2. A p value <0.05 indicates that

there is a significant difference in the two landmark selections in that coordinate. A significant

difference occurred between the two selections of pronasion in the X coordinate

(mean=-0.22mm, t-test p=0.0014). Note that no adjustment was made

for multiple comparisons.

The Euclidean distance (in mm) was calculated for each landmark to assess the actual distance

between the two landmark selections in three-dimensional space. The summary of the

Euclidean distances is given in Table 3. The largest mean Euclidean distances were found to

exist in the left alare landmark (mean=0.84mm, st.dev=0.73mm) followed by the labiale

superius landmark (mean=0.78mm, st.dev=0.51mm). Pronasion (mean=0.49mm,

st.dev=0.24mm) and right alare (mean=0.48mm, st.dev=0.29mm) had the smallest mean

Euclidean distances between landmark selections. However, when medians were considered


the largest distance occurred in the labiale superius landmark (median=0.70mm). The left alare

landmark had the largest variance in the Euclidean distance of the landmarks examined in this

study.

Discussion

This study applied a technique of calculating multivariate intraclass correlation coefficients, a

widely used measure in other scientific fields, to measure the reliability of two sets of 3D soft-

tissue facial landmark selections. The multivariate measure was then compared to the

conventional methods used currently in reliability studies in the field of craniofacial

landmarking, specifically within-coordinate intraclass correlation coefficients and summary

measures by landmark. In contrast with these summary measures of coordinate-specific

intraclass correlations, the multivariate measure was as high as or higher than the individual

coordinate measures in each of the landmarks examined in this study. As the maximum

eigenvalue of the variance-covariance matrix, the multivariate estimate takes into account both

the variances of all three dimensions and their interrelationships simultaneously, which is a

possible explanation of the differences between the multivariate and univariate procedures.

This does raise the question of direct numeric comparability of the measures and, further, the

interpretability of the multivariate measures in terms of a threshold value that would indicate a

clinical need for re-calibration of landmark selection.

The reliability of all six soft-tissue landmarks in this study was excellent (ICC >0.98). Lagravère

(2010) suggests that landmarks with a mean difference of 1-2mm are clinically acceptable;

however, landmarks with mean differences >2mm should be used with caution. All the


landmarks examined in this reliability study met this clinically acceptable criterion with the

maximum mean Euclidean distance of 0.84mm (left alare). When Lee’s (1989) criteria were

applied, no landmark had the lower bound of the 95% confidence interval of the ICC below the

0.75 threshold. However, a significant difference existed between the two landmark selections

in the X dimension of pronasion (p=0.0014). Moreover, the difference was in the

negative direction, indicating a possible bias of the second X coordinate being systematically

greater than the first. Using Lee’s criteria, the reliability of the landmark pronasion may not be

acceptable and may warrant review of selection protocol and re-calibration. Left alare had the

largest variance of the Euclidean distance between selections of all the landmarks in the study.

The lowest mean within-coordinate ICC was found in the left alare landmark, driven by two

of the lower ICCs in the Y and Z coordinates; left alare also had the highest mean Euclidean

distance between landmark selections. However, the lowest multivariate ICC was found in the

pronasion landmark which was the only landmark with a significant difference between the

coordinates (X dimension). It is possible that the univariate and multivariate ICCs are sensitive

to different characteristics within the landmark data. Further work is needed to determine the

relationship and comparability of the within-coordinate measures of agreement and the unified

multivariate estimator of agreement as well as the interpretability and clinically acceptable

level of the multivariate intraclass correlation coefficient.


References

Adams, G. L., Gansky, S. A., Miller, A. J., Harrell, W. E., & Hatcher, D. C. (2004). Comparison between traditional 2-dimensional cephalometry and a 3-dimensional approach on human dry skulls. American Journal of Orthodontics and Dentofacial Orthopedics, 126(4), 397-409. doi:10.1016/j.ajodo.2004.03.023

Bland, J. M., & Altman, D. G. (1990). A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Computers in Biology and Medicine, 20(5), 337-340. doi:10.1016/0010-4825(90)90013-F

Chien, P., Parks, E., Eraso, F., Hartsfield, J., Roberts, W., & Ofner, S. (2009). Comparison of reliability in anatomical landmark identification using two-dimensional digital cephalometrics and three-dimensional cone beam computed tomography in vivo. Dentomaxillofacial Radiology, 38(5), 262-273. doi:10.1259/dmfr/81889955

Donner, A., & Bull, S. (1984). A comparison of significance-testing procedures for parent-child correlations computed from family data. Journal of the Royal Statistical Society. Series C (Applied Statistics), 33(3), 278-284.

Donner, A., & Eliasziw, M. (1991). Methodology for inferences concerning familial correlations: A review. Journal of Clinical Epidemiology, 44(4-5), 449-455. doi:10.1016/0895-4356(91)90084-M

Fourie, Z., Damstra, J., Gerrits, P. O., & Ren, Y. (2010). Evaluation of anthropometric accuracy and reliability using different three-dimensional scanning systems. Forensic Science International, in press, corrected proof. doi:10.1016/j.forsciint.2010.09.018

Haggard, E. A. (1958). Intraclass correlation and the analysis of variance. New York: Dryden.

Konishi, S. (1982). Asymptotic properties of estimators of interclass correlation from familial data. Springer Netherlands. doi:10.1007/BF02481048

Konishi, S. (1985). Testing hypotheses about interclass correlations from familial data. Biometrics, 41(1), 167-176.

Lagravère, M. O., Gordon, J. M., Guedes, I. H., Flores-Mir, C., Carey, J. P., Heo, G., & Major, P. W. (2009). Reliability of traditional cephalometric landmarks as seen in three-dimensional analysis in maxillary expansion treatments. The Angle Orthodontist, 79(6), 1047-1056. doi:10.2319/010509-10R.1

Lagravère, M. O., Low, C., Flores-Mir, C., Chung, R., Carey, J. P., Heo, G., & Major, P. W. (2010). Intraexaminer and interexaminer reliabilities of landmark identification on digitized lateral cephalograms and formatted 3-dimensional cone-beam computerized tomography images. American Journal of Orthodontics and Dentofacial Orthopedics, 137(5), 598-604. doi:10.1016/j.ajodo.2008.07.018

Lee, J., Koh, D., & Ong, C. N. (1989). Statistical evaluation of agreement between two methods for measuring a quantitative variable. Computers in Biology and Medicine, 19(1), 61-70. doi:10.1016/0010-4825(89)90036-X

Lele, S., & Richtsmeier, J. (2001). An invariant approach to statistical analysis of shapes. Boca Raton: Chapman & Hall.

Mian, I. U. H., & Shoukri, M. M. (1997). Statistical analysis of intraclass correlations from multiple samples with applications to arterial blood pressure data. Statistics in Medicine, 16(13), 1497-1514. doi:10.1002/(SICI)1097-0258(19970715)16:13<1497::AID-SIM569>3.0.CO;2-7

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428. doi:10.1037/0033-2909.86.2.420


Figure 1. Facial Soft Tissue Landmarks

Table 1. Intraclass Correlation Coefficients for Each Landmark by Coordinate and Multivariate Estimate (Adjusted Measures)

Landmark            Coordinate  ICC     95% CI            P value  Mean ICC  Multivariate ICC
1 Left Alare        X           0.9997  (0.9991, 0.9999)  <0.0001  0.9912    0.9999
                    Y           0.9841  (0.9546, 0.9945)  <0.0001
                    Z           0.9899  (0.9712, 0.9965)  <0.0001
2 Labiale Superius  X           0.9960  (0.9885, 0.9986)  <0.0001  0.9962    0.9994
                    Y           0.9972  (0.9918, 0.9990)  <0.0001
                    Z           0.9954  (0.9868, 0.9984)  <0.0001
3 Nasion            X           0.9996  (0.9987, 0.9998)  <0.0001  0.9997    0.9999
                    Y           0.9999  (0.9996, 0.9999)  <0.0001
                    Z           0.9996  (0.9987, 0.9998)  <0.0001
4 Pronasion         X           0.9978  (0.9936, 0.9992)  <0.0001  0.9953    0.9991
                    Y           0.9976  (0.9931, 0.9992)  <0.0001
                    Z           0.9905  (0.9923, 0.9991)  <0.0001
5 Right Alare       X           0.9999  (0.9997, 0.9999)  <0.0001  0.9980    0.9999
                    Y           0.9968  (0.9909, 0.9989)  <0.0001
                    Z           0.9972  (0.9919, 0.9990)  <0.0001
6 Labiale Inferius  X           0.9986  (0.9960, 0.9995)  <0.0001  0.9990    0.9996
                    Y           0.9996  (0.9988, 0.9999)  <0.0001
                    Z           0.9988  (0.9965, 0.9996)  <0.0001

All p values <0.0001; these are significance probabilities associated with the test of the null hypothesis that the intraclass correlation is equal to zero.

[Figure 1 labels: Subnasale, Nasion, Pronasion, Labiale Superius, Labiale Inferius, Right Alare, Left Alare]


Table 2. Summary of Within-coordinate Differences (mm) Between Dual Landmark Selections

Landmark            Coordinate  Mean   St. Dev.  Min    Max   Median  P-value (t-test or Wilcoxon*)
1 Left Alare        X            0.04  0.27      -0.70  0.48   0.07   0.5838
                    Y            0.04  0.85      -1.22  2.47  -0.02   0.7615*
                    Z           -0.01  0.70      -1.34  1.43  -0.17   0.9409
2 Labiale Superius  X           -0.19  0.47      -0.98  0.64  -0.18   0.3806
                    Y           -0.11  0.75      -2.01  0.88  -0.09   0.2826
                    Z            0.03  0.28      -0.66  0.55   0.06   0.3608
3 Nasion            X           -0.10  0.35      -0.73  0.55  -0.05   0.2713
                    Y           -0.04  0.50      -0.94  0.79   0.03   0.7843
                    Z            0.01  0.10      -0.25  0.15   0.02   0.6548
4 Pronasion         X           -0.22  0.22      -0.51  0.14  -0.29   0.0014
                    Y            0.13  0.39      -0.61  0.69   0.18   0.2029
                    Z            0.04  0.19      -0.32  0.52   0.02   0.3911
5 Right Alare       X            0.01  0.16      -0.27  0.30   0.00   0.8771
                    Y            0.16  0.39      -0.40  0.97   0.06   0.1279
                    Z            0.09  0.34      -0.46  0.56  -0.03   0.3415
6 Labiale Inferius  X           -0.09  0.40      -0.77  0.63  -0.12   0.9242
                    Y            0.14  0.45      -0.72  0.96   0.19   0.4622
                    Z           -0.01  0.18      -0.38  0.22   0.03   0.1946
7 Subnasale         X           -0.17  0.50      -1.12  0.59  -0.08   0.2101
                    Y            0.10  0.54      -0.73  1.07   0.02   0.5081
                    Z            0.30  0.81      -1.11  1.68   0.31   0.1749

Significance probabilities associated with the t-test of the null hypothesis that the mean is equal to zero, or the Wilcoxon* signed rank test of the null hypothesis that the median difference is equal to zero (assuming symmetry).

Table 3. Summary of Euclidean Distances (mm) Between Dual Landmark Selections

Landmark          N   Mean  St. Dev.  Median  Minimum  Maximum
Left Alare        15  0.84  0.73      0.51    0.11     2.85
Nasion            15  0.53  0.31      0.52    0.10     1.19
Pronasion         15  0.49  0.24      0.45    0.08     0.91
Right Alare       15  0.48  0.29      0.52    0.13     1.12
Labiale Inferius  15  0.58  0.24      0.51    0.30     1.12
Labiale Superius  15  0.78  0.51      0.70    0.18     2.33
