graphical markov models - chalmers · survival analysisx. also, further criteria are de-veloped to...

17
281 GR{PHICAL }IARKO\- },IODELS second edition appeared in 1952 [13]; seealso ref. [18]. It was a standard textbookin Canada and the USA and was translated into many languages. Goulden's career as a part-time teacher at the University of Manitoba continueduntil 1948, when he moved to Ottawa to become Chief of the Cereal Crops Division in what is now Agriculture Canada. This move was another turning point in Goulden's career; he became more absorbed in administrative duties and less involved in day-to-day research activities. In 1955 he was appointed Director of the Experi- mental Farms Service of Agriculture Canada and four years later was promoted to Assis- tant Deputy Minister with responsibilityfor the newly formed Research Branch of Agricultural Canada.As an administratorGoulden brought about several organizationalimprovements to the research arm of Agriculture Canada. He re- tired from the civil service in 1962 but contin- ued to be active, designingseveralexhibits for "Man the Provider" for the international exhi- bition, Expo 67, held in Montreal. Cyril Goulden was the recipient of many honors. He was an electedfellow of the Royal Society of Canada(1941), an electedfellow of the American Statistical Associationx (1952), and an honorary member of the StatisticalSo- ciety of Canada* (1981). In 1958he served as president of the Biometrics Societyx. He was awarded an honorary LL.D. from the Univer- sity of Saskatchewan (1954) and an honorary D.Sc. from the University of Manitoba (1964). In 1970 he receivedan Outstanding Achieve- ment Award from the University of Minnesota, and ten years later was named to the Canadian Agricultural Hall of Fame. References tll Bartlett, M. S. (1939). Statistics in theory and practice. Nature. 144. 199-800. l2l Box, J. F. (1978). R. A. Fisher: The Life of a Scientist. Wiley, New York. l3l Cochran, W. G. (1940). Review of Methods of Statistical Analysis. J. R. Statist. Soc. A, 103, 250-251. l4l Cochran, W. G. and Cox, G. M. (1950). Experi- mental Designs. Wiley, New York. t5l Goulden, C. H. (1927). Some applications of biometryto agronomic experiments. Sci.Agric., 7, 365-3',76. t6l Goulden,C. H. (1929). Statistical Methodsin Agronomic Research. . . a Report Presented at the AnnualConference of PlantBreeders, June, 1929. Canadian Seed Growers' Association, Ottawa. 171 Goulden, C. H. (1931). Modern methods of field experimentation. Sci. Agric., 11, 681-701. tSl Goulden,C. H. (1936). Methodsof Statistical Analysis. Burgess, Minneapolis. l9l Goulden,C. H. (1931). Methodsof Statistical Analysis, rev. ed. Burgess, Minneapolis. tl0l Goulden, C. H. (1937). Efficiency in field trials of pseudo-factorial incomplete randomized block methods. Can. J. Res.. 15. 231-241. tl1l Goulden, C. H. (1939). Methods of Statistical Analysis. Wiley, New York. Ll21 Goulden, C. H. (1944). A uniform method of analysis for square lattice experiments. Sci. Agric. 23, 115-136. t13l Goulden,C. H. (1952). Methodsof Statistical Analysis, 2nd ed. Wiley, New York. ll4l Goulden, C. H. and Elders, A. T. (1926). A sta- tisticalstudyof the characters of wheat varieties influencing yield.Sci. Agric., 6, 331-345. t15l Goulden, C. H., Neatby, K.W., andWelsh, J. N. (192$. The inheritance of resistance to Puccinia graminis tritici in a cross between two varieties of Triticum vulgare. Phytopathology, 18, 631 -658. tl6l Johnson, T. (1967). TheDominion Rust Research Laboratory,Winnipeg, Manitoba, 1925-195'/. Unpublished report seencourtesy of the Cana- dian Agriculture Research Station, Winnipeg. llll Kirk, L. E. and Goulden, C. H. (1925). Some statistical observations on a yield test of potato varieties. Sci. Agric.,6, 89-91. t18l Owen,A. R. G. (1953). Reviewof Methods of Statistical Analysis,2nd ed. J. R. Statist. Soc. A, 126,204-205. t19l Student (1926). Mathematicsand agronomy. J. Amer. Soc. Agron., 18,703-'719. L20l Yates, F. (1936). A new method of arranging variety trials involving a large number of varieties. J. Agric. Sci.,26, 424-455. DRvrnBplr-House GRAPHICAL MARKOV MODELS Graphical Markov models provide a flexible tool for formulating, analyzing, and interpreting relations among many variables. The models

Upload: others

Post on 17-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

281 GR{PHICAL }IARKO\- },IODELS

second edition appeared in 1952 [13]; see alsoref. [18]. It was a standard textbook in Canadaand the USA and was translated into manylanguages.

Goulden's career as a part-time teacher at theUniversity of Manitoba continued until 1948,when he moved to Ottawa to become Chiefof the Cereal Crops Division in what is nowAgriculture Canada. This move was anotherturning point in Goulden's career; he becamemore absorbed in administrative duties and lessinvolved in day-to-day research activities. In1955 he was appointed Director of the Experi-mental Farms Service of Agriculture Canadaand four years later was promoted to Assis-tant Deputy Minister with responsibility for thenewly formed Research Branch of AgriculturalCanada. As an administrator Goulden broughtabout several organizational improvements tothe research arm of Agriculture Canada. He re-tired from the civil service in 1962 but contin-ued to be active, designing several exhibits for"Man the Provider" for the international exhi-bition, Expo 67, held in Montreal.

Cyril Goulden was the recipient of manyhonors. He was an elected fellow of the RoyalSociety of Canada (1941), an elected fellow ofthe American Statistical Associationx (1952),and an honorary member of the Statistical So-ciety of Canada* (1981). In 1958 he served aspresident of the Biometrics Societyx. He wasawarded an honorary LL.D. from the Univer-sity of Saskatchewan (1954) and an honoraryD.Sc. from the University of Manitoba (1964).In 1970 he received an Outstanding Achieve-ment Award from the University of Minnesota,and ten years later was named to the CanadianAgricultural Hall of Fame.

References

tll Bartlett, M. S. (1939). Statistics in theory andpractice. Nature. 144. 199-800.

l2l Box, J. F. (1978). R. A. Fisher: The Life of aScientist. Wiley, New York.

l3l Cochran, W. G. (1940). Review of Methods ofStatistical Analysis. J. R. Statist. Soc. A, 103,250-251.

l4l Cochran, W. G. and Cox, G. M. (1950). Experi-mental Designs. Wiley, New York.

t5l Goulden, C. H. (1927). Some applications ofbiometry to agronomic experiments. Sci. Agric.,7, 365-3',76.

t6l Goulden, C. H. (1929). Statistical Methods inAgronomic Research. . . a Report Presented at theAnnual Conference of Plant Breeders, June, 1929.Canadian Seed Growers' Association, Ottawa.

171 Goulden, C. H. (1931). Modern methods of fieldexperimentation. Sci. Agric., 11, 681-701.

tSl Goulden, C. H. (1936). Methods of StatisticalAnalysis. Burgess, Minneapolis.

l9l Goulden, C. H. (1931). Methods of StatisticalAnalysis, rev. ed. Burgess, Minneapolis.

tl0l Goulden, C. H. (1937). Efficiency in field trialsof pseudo-factorial incomplete randomized blockmethods. Can. J. Res.. 15. 231-241.

tl1l Goulden, C. H. (1939). Methods of StatisticalAnalysis. Wiley, New York.

Ll21 Goulden, C. H. (1944). A uniform method ofanalysis for square lattice experiments. Sci. Agric.23, 115-136.

t13l Goulden, C. H. (1952). Methods of StatisticalAnalysis, 2nd ed. Wiley, New York.

ll4l Goulden, C. H. and Elders, A. T. (1926). A sta-tistical study of the characters of wheat varietiesinfluencing yield. Sci. Agric., 6, 331-345.

t15l Goulden, C. H., Neatby, K.W., and Welsh, J. N.(192$. The inheritance of resistance to Pucciniagraminis tritici in a cross between two varieties ofTriticum vulgare. Phytopathology, 18, 631 -658.

tl6l Johnson, T. (1967). The Dominion Rust ResearchLaboratory, Winnipeg, Manitoba, 1925-195'/.Unpublished report seen courtesy of the Cana-dian Agriculture Research Station, Winnipeg.

lll l Kirk, L. E. and Goulden, C. H. (1925). Somestatistical observations on a yield test of potatovarieties. Sci. Agric., 6, 89-91.

t18l Owen, A. R. G. (1953). Review of Methods ofStatistical Analysis,2nd ed. J. R. Statist. Soc. A,126,204-205.

t19l Student (1926). Mathematics and agronomy.J. Amer. Soc. Agron., 18,703-'719.

L20l Yates, F. (1936). A new method of arrangingvariety trials involving a large number ofvarieties. J. Agric. Sci.,26, 424-455.

DRvrn Bplr-House

GRAPHICAL MARKOV MODELS

Graphical Markov models provide a flexibletool for formulating, analyzing, and interpretingrelations among many variables. The models

beruflich
In: Encyclopedia of Statistical Sciences. S. Kotz, C. Read and D. Banks (eds). New York: Wiley, Second Update Volume, 284-300.
Page 2: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

combine and generallze at least three dif-ferent concepts developed at the turn of thelast century: using graphs, in which variablesare represented by nodes, to characterize andstudy processes by which joint distributionsmay have been generated (Sewell Wright [60,611), simplifying a joint distribution with thehelp of conditional independences (AndreiMarkov* [36]), and specifying associationsonly for those variables which are in somesense nearest neighbors* in a graph (WillardGibbs t21l).

Graphical Markov models are used now inmany different areas, such as in expert sys-tems* (Pearl [41], Neapolitan [38], Spiegel-halter et al. l45l), in decision analysis (Oliverand Smith [39]), for extensions of the notionof probability (Almond [1]), for attempts tomodel causal relations (Spirtes et al. [47]), andin multivariate statistics to set out and de-rive properties of structures (Lauritzen [30],Whittaker [57]) or to explain and summarizeobserved relations (Cox and Wermuth [15],Edwards [19]). We emphasize here their use-fulness in observational studies*, where dataare obtained on a considerable number ofvariables for each individual under investi-gation, where the isolation of relations betweenthese variables is of main concern, and whereavailable subject-matter knowledge is to beintegrated well into model formulation, analy-sis, and interpretation. We illustrate in particularhow graphical Markov models can aid

f . in setting up a first ordering of the variablesunder study to reflect knowledge about re-sponse variables of primary and secondaryinterest, about one or more levels of inter-mediate variables and purely explanatoryvariables,

2. in specifying hypotheses resulting fromprevious investigations,

3. in providing an overuiew of the analysesto be carried out,

4. in summarizing and interpreting the com-pleted analysis, and

5. in predicting results in related investiga-tions involving the same variables or a se-lection of the same variables.

GRAPHICAL MARKOV MODELS 285

The last important feature can also be madeavailable to a number of traditional statisticalmodels if they can be viewed [56] as specialcases of graphical Markov models.

Most of the general concepts are introducedin the next two sections with the help of twospecific research problems. A brief historicalview is given first.

The geneticist Sewell Wright used directedgraphs, which he called path diagrams, to de-scribe hypothesized linear generating processes;he suggested estimating corresponding path co-efficients and judging the goodness of fit ofthe process by comparing observed marginalcorrelations with those he derived as impliedby the hypothesized process (see PATH ANALY-SIS). It was not until much later that his es-timated coefficients were identified by Tukey

[50] as least squares regression coefflcients ofvariables standardized to have mean zero andvariance one, and his goodness-of-flt criterionwas derived by Wermuth [52] as Wilks' likeli-hood ratio test [58] for Gaussian variables, pro-vided the process can be represented by whatis now called a decomposable concentrationgraph. These tests were not yet improved byBartlett's adjustment* lI2).The same type ofpath analysis models were propagated in econo-metrics by Wold [59] and extended to discretevariables by Goodman [24] to be used in soci-ology and political science.

The notion of having a generating processwhich admits a causal interpretation was givenup in extensions of path analysis aimed atmodeling proper joint responses: extensions tosimultaneous equation models 1261, to linearstructural equations* l21l and to chain graphmodels [32]. Similarities and distinctions be-tween the different approaches were derivedmuch later [53].

The notion of conditional independence*,which had been used by the probabilist Markovto simplify seemingly complex processes, wasstudied in detail by Dawid [17] and related bySpeed [43] and Darroch et al. [16] to the undi-rected graphs which Gibbs had used to deter-mine the total energy in a system of particlessuch as atoms of a gas.

Measures of association in graphical Markovmodels depend on the type of variables in-

Page 3: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

GRAPHICAL MARKOV MODELS

volved in a system. For instance, for Gaussianvariables they are linear regression or corre-lation coefficients, for discrete variables theyare odds ratios* studied early by Yule [62] andBartlett [5], and for mixed variables they arebased on different measures discussed by Olkinand Tate [40] and Cox [10].

Models with several variables which arenow seen as special cases of graphical Markovmodels were developed quite separately, forinstance, as linear models for contingency ta-bles* [6, 23, l I , ] l and as covariance se-lection models for Gaussian variables byDempster i181. Analogies between indepen-dence interpretations and between likelihoodratio tests in these two model classes wererecognized later by Wermuth [51]. Similarly,linear structure models in covariances 12, 27)which admit an independence interprerationwere integrated into graphical Markov modelsby Cox and Wermuth [13] only many years af-ter they had flrst been proposed, as were manyof the generalized linear models* introducedby McCullagh and Nelder [37].

Of special importance for sparse data aremodels which permit explicit maximum like-lihood estimation of parameters and testing ofgoodness of fit by considering only subsets ofvariables. Such decompositions of joint distri-butions were first studied by Haberman [25] andSundberg [48] for contingency tables, by Speedand Kiiveri l14l for covariance selection, andby Lauritzen and Wermuth l32l for models withboth discrete and Gaussian variables. Efficientalgorithms for deciding on this property of amodel from its graph have been designed [49,33, 34).

Models for the same set of variables maydiffer with respect to their defining parameters.Then they typically correspond to differentgraphical representations, but they may never-theless imply the same set of independencestatements. This agreement has been termedindependence equivalence of models and is oneimportant aspect of the stronger requirement ofequivalence in distributions. Early first resultsby Wermuth and Lauritzen 152, 55, 321 weregeneralized by Frydenberg [20J.

This last is one active research area. Anotheris to incorporate latent variables into the model

formulations and to integrate time series andsurvival analysisx. Also, further criteria are de-veloped to read off a graph which conditionalindependences and which conditional associa-tions are implied. Some of the available cr-iteriaare described here in the next-to-last section.

AN INTERVENTION STUDY

A first illustration of the use of graphicalMarkov models in observational studies is anintervention study involving 85 chronic painpatients and observations before and after threeweeks of stationary treatment [35, l5]. Therewere three main objectives. We wanted to seewhether results reported in earlier studies canbe replicated, whether it is worthwhile to studymeasures of well-being of a chronic pain pa-tient other than self-reported treatment success,and whether a psychotherapeutic placebo inter-vention shows sizeable effects.

This placebo intervention, offered to aboutone-third of the patients, consisted of telephonecontacts, solely involving general informationand interest in the patient's well-being, offeredby one physician over a period of three monthsafter stationary treatment, together with the op-tion for the patient of phoning this physicianagain.

The available knowledge about the variablesunder study and decisions about potential rele-vant explanatory variables lead to the orderingof the variables in Fig. 1. Such an ordering iscalled a dependence chain, because the vari-ables are arranged in sequence, in a chain ofboxes. One variable is binary, the additional in-tervention, A; all25 other variables are quanti-tative measurements.

There are three response variables of primaryinterest, measuring different aspects of the pa-tient's well-being three months after the patienthas left the hospital: the patient's own judge-ment of success of the stationary treatment, f;the self-reported typical type of pain, Zo; and adepression score, Xo. Whenever there are sev-eral variables that we want to treat on an equalfooting in the sense of not specifying one vari-able as a response and another as an explana-tory variable, then we list them in the same

Page 4: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

Well-being,aftertreatment:

zo,type of painY,success oftreatmentxo,depression

GRAPHICAL MARKOV MODELS 287

Personali tytraits:

Anxiety,V1

Anger,V2, V3, V4

Med ica lbackgroundvs, vo

Demo-graphicalbackgroundV7, Vg, Vg

Figure I First ordering of types of variable for chronic-pain patients: patient's well-being threeweeks after stationary treatment (box a) and before (box d)', an intervention (box b) interrnediate forthe former and response to the latter. The stage of chronic pain, U, and the coping strategies Siare stacked to display hypothesized independence given pretreatment well-being (box d), personalitycharacteristics, and medical and demographic background variables (box e). The double-lined boxindicates relations taken as siven.

(e)

box. This is to imply that we investigate aspectsof their joint (conditional) distribution given allvariables listed in boxes to the right.

Several variables are intermediate, becausethey are taken as potentially explanatory forsome variables in the system (listed in boxesto the left) and as response to others (listedin boxes to the right). Depression before treat-ment, Xa,, for instance, is considered here asexplanatory for chronicity of pain, U, but asa response to certain patient characteristics,Vt,, . . . ,Vg, and, as another measure of well-being of the patient before treatment, on anequal footing with typical intensity of parn, Za.

In a previous investigation it had beenclaimed that certain strategies to cope withpain Sr, . . . , Sr0 are independent of chronic-ity of pain, U, given pretreatment conditions(2a., X), personality characteristics, and medi-cal and demographic information about the pa-t ient Vt, . . . ,Vg. The two stacked boxes indicatethis hypothesized conditional independence.

The set of variables to be taken as purelyexplanatory is listed to the far right end of thesequence, in a doubly lined box, to indicate thattheir relations are taken as given without beinganalyzed in much detail.

We illustrate two typical steps of a full analy-sis for these data: checks for possible interactiveeffects and plots showing the direction and sizeof some of the estimated conditional interac-tions. For a full analysis of joint responses a de-

cision has to be made whether other responseson an equal footing are to be included in regres-sion equations as additional regressors or not.If prediction of responses is an aim, the lat-ter appears often most appropriate, and it leadsto what we call a multivariate regression* ap-proach. In this, each of the joint responses istaken separately in turn, to decide which of thepotentially important explanatory variables arein fact directly important. Next it is checkedwhether any important symmetric associationsremain among the responses after the regres-sions. To summarrze such analyses, relations toimportant regressors enter as arrows betweenboxes and important symmetric associations aslines within boxes to give what is called in itsmost general form a joint response chain graph.

Plots to check systematically for nonlinearrelations have recently been proposed by Coxand Wermuth ll4l.Two such plots, displaying/-statistics for interactive effects from trivari-ate marginal distributions, are shown in Fig. 2.Most /-statistics, even if they are considerablylarger than 2, he along the line of unit slope inFrg. 2a and may therefore be interpreted as be-ing well compatible with random variation. Afew statistics deviate from the unit line. Theyindicate the presence of interactions, which areto be identified. Figure 2b shows that these mar-ginal interactions do not involve chronicity ofpain, U, coping strategies, or earlier variables(listed in boxes c, d,, e),, since the largest t-

Wel l -being,beforetreatment

Zd,type ofpa in

Xd,depression

Page 5: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

0rdered f-statistic

GRAPHICAL MARKOV MODELS

- 22

Expected normal order statistic( a J

values in this plot are small, that is, they arein absolute value near 2.

Two estimated interactive effects are dis-played in Fig. 3. All other directly importantexplanatory variables are there kept at their av-erage level, that is, mean values are insertedfor all remaining regressors in the selected re-gression equation. Figure 3a shows that thepsychotherapeutic placebo intervention, A, hasessentially a positive effect on reported treat-ment success for patients with low pain inten-sity 7a - s., but a negative effect for patientswith typically high intensity of pain. Figure 3bshows that the hypothesized conditional inde-pendence between stage of chronic pain andstrategies to cope has to be rejected: the riskfor higher chronicity decreases with incleasinguse of one of these strategies by patients withlow pain intensity, but it increases with use of

Y. success of treatment

-22

Expected normal order statistic(b)

the same strategy by patients with high painintensity.

In this study the additional intervention isimportant for each of the three responses, theparticular effect type depending on specificcharacteristics of the patient and on the cop-ing strategies of the patient. The estimated re-lations, however, appear to be fairly complexrelative to the sample size. For instance, forthe self-reported treatment success there arethree sizable interactive effects, and for typicalpain intensity there are two alternative regres-sion equations, which fit the data equally welland which both permit plausible interpretations.Hence it appears best to try to replicate someof the results with a reanalysis of previous dataand in a study with more and different patients.instead of attempting to summarize results witha graph.

5101520Ss, resignation

(b)

Figure 2 Normal probability plot of t-statistics of cross-product terms in trivariate regressions forvariables on chronic pain. (a) Piot for all variables. (b) PIot for stage ofchronic pain, U, coping strategiesS;, and their potential explanatory variables; each triple includes U.

iIIsres.

o . . . - -

"Q 24 - s

o-- - - - - - - - - - - - - - - - -a 7a

26,Iypicalpain

n0 yes

A, intervention(4.)

Figure 3 Piots to interpret interactive effects on two responses in treatment of chronic-pain data.Remaining directly explanatory variables are fixed at level I or at mean. (a) Response is Y, (b) responseis U. Standard deviation of residuals in the regression is denoted by s.",; observed mean and standarddeviation of a variable X are denoted by -, s, respectively.

Ordered t-statistic

U, chronici ty of pain

Page 6: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

A COHORT STUDY

The second example is a cohort study 122,l5l,in which the main aim is to identify impor-tant developments in a school and a studentcareer which might increase the risk that a stu-dent stops studying without having received adegree.

Complete records were available for 2339high school students on their average perfor-mance in high school and on questions regard-ing school career and demographic background.During their first year at university, responsesto psychological questionnaires were obtained,and it was finally recorded whether studentssuccessfully completed their studies or droppedout of university.

Figure 4 shows the first ordering of the ob-served variables, based largely on the timesequence involved; it implies that relations be-tween the variables are to be studied in five con-ditional distributions. For the variable in box a,all other variables are taken as potentially ex-planatory. The variables in box b are treatedon an equal footing and are considered condi-tionally given those in boxes c, d, e,/. For thevariable in box c, those in boxes d, e, f maybe explanatory, but not those listed in boxes a,b, and so on. Finally, the variables in box/aretreated as purely explanatory with associationsnot to be further specified by any model.

Checks for nonlinearities gave no indica-tion of any strong interactions or nonlinear de-

GRAPHICAL MARKOV MODELS 289

pendences among the ten variables, of whichthree are quantitative measurements (I, X, Dand seven are binary variables (A, B, . . . , F).The procedure followed to study the relationsamong the variables via separate logistic andlinear regression, as well as log-linear con-tingency table analyses, is summarized in thegraphs of Figs. 5 and 6.

In studying the relations among the joint

responses, X, Y, Z, we did not intend to iden-tify good predictors, so that a block regres-sion approach has been used. In it the othercomponents of a response vector serve asadditional regressors for each component con-sidered in turn as a single response. This leadstypically to simpler independence structures,that is, to larger sets of independences im-plied by a system. The reason is that fewerof the potentially important regressors becomedirectly important if information on other re-sponses on an equal footing is available in aregression equation.

Even with as many as ten variables, itwould be possible to summarrze the resultsof the analyses with a single chain graph, butwe present separate graphs for the universityvariables given earlier variables and for thehigh school variables given the demographicbackground information in Figs. 7 and 8.

This not only is clearer, but illustrates oneof the features of chain graph representations:separate analyses may be integrated into largergraphs in different ways to emphasize special

Figure 4 First ordering of variables for university students. Variable A, dropout at university (box a), isthe response variable of primary interest; for instance, B, change of high school (box d), is an intermediatevariable, potentially explanatory for dropping out at university (box a), f.or the student's attitudes towardshis study situation (box b), and for grades at high school (box c), and also a potential response to otherschool-career and demographic variables (boxes e,/). Variables are treated on an equal footing in boxesb, e because no direction of dependence is specified-in box / because variables are purely explanatory.

X,motivation

Y

expectedach ievement

Z,integrationinto studentgroup

U,averagegrade,last threeyears ofhigh school

C,integrationinto high-school class

D ,high-schoolclassrepeated

E ,change ofprimaryschool

F,educationof father

Page 7: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

A-*

a<-.

Xu cF2"ooor oo oOz D E

GRAPHICAL MARKOV MODELS

Selected regression models

A:Y+Z+U

Figure 5 Regression graphs showing results of analyses for university variables in cohort study.

Selected regressions

U: C+D+F

Figure 6 Regression graphs andvariables in cohort study.

B: D+E+F

concentration graph showing

Selected log-l inear model

CD,DE,EF

results of analyses for high school

Figure 7 Independence graph for university variables given high school and background variables.

Figure 8 Independence graph for high-school vari-ables given demographic background variables.

aspects, here the university components and thehigh school career.

A short qualitative interpretation of thegraphs is that dropout, A, is directly influ-enced by the attitudinal variables Y, Z and byacademic performance in the final years of highschool as measured by the average mark, U,

and that these variables in turn depend on someof the earlier properties.

A next step is to interpret the regression re-sults more quantitatively, in particular the logis-tic regression* for A. One way of appreciatingthe sizes of the individual logistic regressioncoefficients is via Fig. 9, which shows the es-timated probability for the three important ex-planatory variables, again in each plot holdingthe other two variables fixed at their means.

It is a matter of judgement when and how tointroduce directed relations between the compo-nents of the psychological questionnaire scoresX, Y, Z, but, partly because X is not directly ex-planatory for the primary response A and partlybecause of the simpiification achieved, we have

B Z 5

p-r E

Page 8: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

GRAPHICAL MARKOV MODELS

A, Estimated probability of dropout A, Estimated probability of dropout A, Estimated probability of dropout

291

48

Z, integration into student group

48

{ self-expected achievement

24

U, average mark, years 11-13

graphs later, but give some formal definitionsnext.

SOME GRAPH TERMINOLOGY

For graphical Markov models, p nodes V :

{1,.. . , p} rn a graph denote random variablesYr,.. . ,Yr. There is at most one edge i, j be-tween each pair of nodes i and j. Edges rep-resent conditional association parameters in thedistribution of Yy. An edge may be directed andthen drawn as an arrow, or it may be undirectedand then drawn as a line. If a graph has onlylines, then it is an undirected graph. The graphis fully directed if all its edges are arrows, andpartially directed if some of its edges are linesand some are arrows.

An edge 1,7 has no orientation rf it is a line;it has one of two possible orientatiorzs if it isan affow, either pointing from 7 to i or point-ing from i to j. A path of length n - 1 is asequence of nodes (1r , . . . , in) wi th successiveedges \ i , , i ,+1) present in the graph; this is ir-respective of the orientation of the edges. Thegraph obtained from any given one by ignor-

Figure 9 Fitted probabiiity of university dropout, A, versus the directiy explanatory variables in turnln each plot two other variables are fixed at the mean; lower marks correspond to higher achievement.

explored treating Y as a response to Z and to U,ignoring X, that is, marginahzing over it. Simi-larly, we have marginalized over the variable C.This leads to the chain of Fig. 10, which showsonly univariate responses.

An indirect path like the path from E to Dto B to Z to A may be interpreted as follows:change of primary school, E, increases the riskthat a high school class will have to be repeated,D, which in turn increases the risk that thestudent will change high school at least onceduring his school career, B. Once a high schoolchange has been experienced, it becomes lesslikely that a student integrates well into his laterstudent group (Q, and this in turn is a direct riskfactor for A, leaving university without havingobtained a degree. The overall effect of sucha path may be moderate, however, if some ofthe relations along such a path are of moderatestrength. This does not show in the graph, butis the case here for the edge DE.' The possibility of using the graph to infer

additional independences and induced depen-dences is an additional appealing feature ofgraphical Markov models. We describe these inmore detail for univariate-recursive resression

Figure 10 Independence graph for subset of variables concerning university dropout. Consistent withthe results of Fies. 7 and 8 after marsinalizins over variables X and C.

Page 9: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

292 GRAPHICAL MARKOV MODELS

ing type and orientation of edges is called itsskeleton.

Two nodes i, j are said to be adjacent orneighbors if they are connected by an edge, andthey have a common neighbor r if r is adjacentto both i and j. Three types of common neighbornodes t of i,7 can be distinguished in a directedpath as shown in Fig. lla to c. Two anowspoint to i and j from a source node t (Fig. 11a);a transition node t has one incoming arrow, sayfrom7, and one outgoing arrow (Fig. 11b); anda sink node t has two arrows pointing at it fromeach of i and j (Fig. 11c). Because two affowsmeet head on at a sink node. it is also calleda collision node.

A graph constructed from a given one bykeeping nodes and edges present within a se-lected subset S ofnodes is an induced subgraph.If the induced subgraph of three nodes i, t, j hasexactly two edges, it is a Y-configuration, andif it is one of the paths of Fig. IIa, b, c, rtrs source-, transition-, ot sink-oriented, respec-tively. Similarly, a subgraph as in Fig. ild iscalled a sink-oriented U -confi guration.

A path containing a collision node is a col-lision path, and a path is said to be collision-/ess otherwise. A path of arrows leading to IfromT via transition nodes is called a direction-preserving path. In such a path I rs a descendantof node,/, and node 7 LS an ancestor of i.

A cycle is a path leading from a node backto itself and a directed cycle is a (partially)oriented cycle without a sink-oriented V- or U-configuration along it.

Induced subgraphs which are complete in thesense of having all nodes joined but whichbecome incomplete if even one other node isadded are the cliques of the graph. The set ofcliques of a graph without arrows and only fulllines, points to the set of minimal sufficient sta-tistics for a corresponding exponential family

model. This special role of nearest neighborsis closely related to simplifying log-linear con-tingency table analyses and to the Hammers-ley-Clifford theorem* [16].

JOINT.RESPONSE GRAPHS ANDGAUSSIAN SYSTEMS

The arrangement of the variables for a joint-response chain graph is, here, into boxes fromleft to right, starting with responses of pri-mary interest. There are in general severalnodes within each box corresponding to jointresponses, i.e., to variables to be considered onequal footing. This corresponds to an orderedpartition of the set y of all p nodes into subsetsas V : (a ,b,c , . . . ) , to the dependence chain.Different types of edge [15] add flexibility informulating distinct structures.

Arrows pointing to any one box are eitherall dashed or all full arrows. Dashed arrows toa node i indicate that regression of the singlevariable I; on variables in boxes to the right of iis considered, whereas full arrows mean that theregression is taken both on variables in boxesto the right of i and on the variables in the samebox as L In this way dashed arrows indicate amultivariate regression and full arrows point toa block regression approach.

Within a box there is either a full-line graphor a dashed-line graph, each considered condi-tionally given variables in boxes to the right.For instance, a dashed line for a pair (i,7) inbox b means the presence of conditional asso-ciation between Yi and Y; given Y.,.. . ; a fullline for a pair (i,7) in box b means the pres-ence of conditional association between I; andY; given Ya\ri . ir , Y., . . .

" \ L ' ! J i

Block regressions are combined with full-line response boxes, while for a multivariateregression there is a choice between a full-

, \ r / , ' \

/ '

, \ / , L l l it ll lVV\ . . . - -t

(a) (b)

t

(cl (d)

Figure 11 Some V-configurations with I (a) a source node, (b) a transition node, (c) a sink or collisionnode, (d) a sink-oriented U-confisuration.

Page 10: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

and a dashed-line graph for the correspond-ing joint responses. Figure 12 shows two joint-response graphs having the same skeleton butcorresponding to different models.

To relate formally the correspondingparametrization in a joint Gaussian distribu-tion it is convenient to think of the p X 1vector variable Y as having mean zero andbeing partitioned into component columnvectors Yo,Yb, . . . , o f d imensions po X l ,Pu x 1 , . " .

For a dependence chain of three elements letthe covariance matrix I and the concentrationmatrix I-l be partitioned accordingly as

/ s5s \I - a a - a D a a c \

s - l s s I- l - |

2 b b z ' b c l ,

\ . 2 , , /

/ S a a S a b \ a c \l -

1 L l \

s - l - | s b b S b c I- - t - 2 t .

t l

\ >"/A zero off-diagonal element in >a marginal independence; a zero off-diagonalelement in I-l implies a conditional indepen-dence statement given all other remaining vari-ables [51; 15, p. 69]. Independence statementswith other conditioning sets arise as zero re-gression coefficients in different regressions asfollows.

From regressions of each component of Y6on Y., regression coefficients are obtained (inthe position of Ia. by sweeping > on c orresweeping I- l on (a,b) in a p6 X p. matrixBa l . :

Bbb : I r . I _ ' : - (2bb .a ) - ' 2u , . " .

GRAPHICAL MARKOV MODELS 293

In this each row corresponds to regression co-efficients of Y6 in the regression of one of thecomponents of Y, on Y6 ' 2)bb a,Zbr'o denotesubmatrices of the concentration matrix in themarginal joint distribution of Y6, Y..

From regressions of each component of Yoon both Y6 and Y. we get (in the positions of\ob,2o, by sweeping I on b, c or resweepingI-1 on a) the po x (po + p,) matrix

Bolb, : (Bob.r ,Bol r .u)

: (2or,,>";(1ol 3r. )- '\ 4 c b a c c /

: - (>aa)-r (z"u,>, . ) .

Then, the parameters of a pure multivariate-regression chain of only dashed edges resultfrom the transformation Z : AY, and the pa-rameters of a pure block-regression chain ofonly full edges, such as shown in Fig. l2a, re-sult from the further transformation X : DZ,where D : T-1 is a block-diagonal matrixcontaining concentration matrices of the distri-butions of Y, given Yr and Y., of Y6 givenY., and of Y., and

( 1 " . - B o l a . , - B , t . r \

A:10 raa -Br t . I ,

\0 0 r , , /

/Zoo.u, 0 0 \T:l o 2uu, o l .

\ o o >,,)(>" 0 0\

D: l s 2 luua o l .\ O

g S i c c ' a u f

(b)(a)

Figure 12. Two joint-response chain graphs with the same arrows and lines but different typesof edges, implying different independence statements. For instance, absence of an edge tor (yt,yz)means in (a ) Iy t tYz l (Y2,X1, . . . ,U : ) , in (b ) Y t t tYz l (Xr , . . . ,U) ; absence o f an edge fo r (Xr ,U: )means in (a) x1 tt uzl(Xz,xs,ur,u), in (b) xr tL uzl(ut, uz); absence of an edge for (xz,x) meansX2 x X3l(Xr ,Ur,Uz, U3) in both.

Page 11: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

GRAPHICAL MARKOV MODELS

Here identity block matrices are denoted by I.The regression coefficients of a multivariate-regression chain are in A; those of a full block-regression chain are obtained from the ma-trix DA after dividing each row by the corre-sponding diagonal element [53]. The varianceof Z is Ar>A : T; the variance of X is Dand coincides with the diasonal block matricesof DA.

The parameters in a mixed dashed- and full-edge response graph coincide with those ofa block-regression chain for full lines andwith those of a multivariate regression chainfor dashed edges. For instance, the model ofFig. l2b is like a multivariate regression chain,except that for the distribution of Y6 givenY. the inverse covariance matrix is consideredinstead of the covariance matrix. i.e..

I - l - / t , , - s , ; - l s , \ - l- b b . c \ - D o a D C - c c - c D I

- 9 b b . a - t b b _ t b o 1 5 a a 1 - l y a b

The transformation from covariance to con-centration parameters introduces no complica-tions, since mean and concentration parametersvary independently. This property holds moregenerally for mixed parametrrzations with mo-ment and canonical parameters in exponentialfamilies (Barndorff-Nielsen 14, p. 122)). Forjoint Gaussian distributions the numbers of pa-rameters obtained with complete joint-responsemodels, i.e., with unrestricted regression chains,coincide in the three different approachesdescribed.

Full-edge joint-response graphs for discreteand continuous variables may correspond tojoint conditional Gaussian distributions [32],in which null values of interaction parametersindicate the independences specified with thegraph. With nonlinear and higher-order interac-tive effects permitted, the individual regressionsspecified with a graph capture independences insome non-Gaussian joint distribution.

Subclasses of joint-response chain graphsarise as follows. If the set of nodes, % is notpartitioned, so that there is just one box andno alrows, the dashed-line graph is a covari-ance graph and the full-line graph a concentra-tion graph, because their edges correspond fora ioint Gaussian distribution to elements in the

overall covariance and concentration matrix, re-spectively. One main distinction is that concen-tration graphs with some edges missing point tosets of minimal sufficient statistics which sim-plify estimation, while this is not so for covari-ance graphs.

If V is partitioned into just two elements andthe distribution of the explanatory variables isregarded as fixed by drawing a double-lined boxfor them, then the joint-response chain graph isreduced to a regression graph. Finally, if thepartitioning is into single nodes, so that thereare no joint responses (i.e., there are as manyboxes as responses), then this is a univariate-recursive regression graph as in Fig. 10.

Some results are available on when twogeneral joint-response graphs with identicalskeletons imply the same set of independencestatements. Such a condition for independenceequivalence is that for any two full-edge graphs

[20] the sink-oriented V- and U-configurationscoincide. It explains, for instance, why the sub-graph of nodes C, D, E, F rn Fig. 8 has thesame independence interpretation as the graphfor these nodes in Fig. 6. For full-line, dashed-affow graphs another criterion involving V- andU-configurations has been given by Andersonet al. [3].

Joint-response graphs arranged in boxes donot contain directed cycles. Graphs with di-rected cycles, i.e., cyclic graphs, were studiedand related to some linear structural equationsby Spirtes [46] and Koster 1291. In them anedge may be missing but not correspond to anindependence statement.

TYPES OF FULLY DIRECTEDINDEPENDENCE GRAPHS

In a directed acyclic graph G[ur, all edges aredirected and there is no direction-preservingpath from a node back to itself. Given the setof ancestors adjacent to any node i, this graphdefines Y; to be conditionally independent ofthe remaining ancestors of i.

In the contexts we are concerned with, theorder of the variables is specified from subject-matter knowledge about the variables and, foreach response in turn, by decisions about sets of

Page 12: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

variables considered to be potentially explana-tory. A recursive ordering is indicated by draw-ing a cha in o f boxes a round nodes 1 , . . . , p .The order need not be complete, in the sensethat sets of several responses may be condition-ally independent given ali variables in boxes tothe right; then the nodes of these sets are drawnrn stacked boxes.If in Fig. 10 the edge E, F isoriented to point to E, then the fully directedgraph obtained by deleting all boxes is directedacyclic.

Given such an order, we have a univariate re-cursive regression system, that is, a sequence ofconditional distributions with fl regressed onYz , . . . ,Y r ; Yz reg ressed on Y3 , . . . ,Y p ; and soon up to Yr-1 regresSed on Yr. Each responseY; has then potentially explanatory variablesYi+1, . . . ,Yp, and we assume that i ts condi -tional dependence on Y1 given its remainingpotentially explanatory variables can be cap-tured by a set of parameters, the null values ofwhich imply the corresponding independencestatement, denoted by Yi -tL YjlYl;+r,...,r)\{j},for I { 7. That the independence structures ofa directed acyclic graph and of a univariate-recursive regression graph coincide followsfrom what has been called the equivalence oflocal and pairwise Markov properties (Lau-ritzen et al. [31]).

The graphical representation of such a sys-tem, the univariate recursive regression graph,is a directed acyclic graph with two additionalfeatures: each edge present represents a spe-cific nonvanishing conditional dependence, andeach edge absent represents one particular con-ditional independence statement for the variablepair involved. We then say that the joint distri-bution is generated over the given Gju,, or thatGju* is a generating graph, because it is to rep-resent a process by which the data could havebeen generated.

By considering Gju, as a generating graph,we mean that it is to have the same propertiesas a univariate recursive regression graph butwithout drawing the boxes; the order is thenoften indicated by a numbering of the nodes(1, . . . , p) whereby a set of r stacked nodes canobtain any one of the r! numberings possiblewithout affecting the independence structure ofthe system.

GRAPHICAL MARKOV MODELS 295

RELATIONS INDUCED BY DIRECTEDACYCLIC GENERATING GRAPHS

The possibility of reading all independencestatements directly off a graph has been madeavailable by so-called separation criteria for agraph. Such have been given to date for thetwo types of undirected graphs [16, 28], forpartially directed full-line graphs [20, 8], andfor directed acycl ic graphs f4l, 42,311. Theypermit one to conclude, under fairly generalconditions on the joint distributions, that Y4is independent of Ys flrven Y6, provided thatthe criterion is satisfied. The simplest crite-rion is for concentration graphs: C separatesA from B in Gf,"" for a nondegenerate jointdistribution if every path from A to B has anode in C.

The separation criterion for directed acyclicgraphs is more complex. To present it in a formwhich permits one to read induced associationsdirectly off the graph, we define a path to beactive whenever it is correlation-inducing inGaussian systems. More precisely, in a.jointGaussian distribution generated over Gju, anactive path between i and 7 relative to a setC introduces a nonzero component to pii.c ifthe path is stable, that is, if correlations toeach edge along it are strictly nonzero givenC. Such edges may either be present in Gju" orbe generated after conditioning on C.

A path between nodes i and j, i < j, in adirected acyclic graph Gju* is active relative toC if either

f. it is collisionless with everv node alons itoutside C, or

2. a collisionless path wholly outside C is gen-erated from it by completing with a line thenonadjacent nodes of every sink-orientedV-configuration having a descendant in C.

The definition is illustrated with Fig. 13.Note that a path is collisionless if I is a de-

scendant of 7 @ig. l3a) or if there is a sourcenode r which is an ancestor to both i andj (Fig. 13b). The contribution of such a pathto the conditional association between i and jgiven C is obtained by margrnalizing over allnodes along it. Similarly, the contribution of

Page 13: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

GRAPHICAL MARKOV MODELS

i+<- . . .< -+ - j i

(a)

i < - . . .+k+ .+ j

Figure 13 Active paths for a pair (l,7) relative to C: (a) collisionless active path with i descendantof j,(b) collisionless active path with i andT descendants of t (both have all nodes along it outside Q,and (c) active collision path: a collisionless path outside C is generated from it by conditioning on C,i.e., a path not touching the collision nodes c1 and / results after joining with a line the nonadjacentnodes in every sink-oriented V-configuration having a descendant in C.

(b)

an active collision path results by marginaliz-ing over all nodes along the collisionless path,which only gets generated after conditioning onnodes in C.

Separation effects in directed acyclic graphscan be specified for disjoint subsets A, B, C ofV as follows: Xa Jr YnlYc if in Gj"* there is noactive path between A and B relative to C.

A stronger result of asserting dependence ispossible whenever the joint distribution is non-degenerate and satisfies a condition which hasbeen called lack of parametric cancellatiorz. Westress that often representing dependences is asimportant as representing independences.

An association is induced by a generatinggraph Gf,u, for nondegenerate systems withoutparametric cancellation relative to C as follows:Yi andYl are conditionally dependent given 16'if in Gju, there is an active path between i and

7 relative to C.The stronger result applies in essence to other

than Gaussian systems provided they are quasi-linear, that is, any dependence present has alinear component such that the vanishing ofthe least squares regression coefficient impliesor closely approximates an independence state-ment. Excluded thereby are situations in whichdependences are so curved, or involve such aspecial high-order interaction, that they corre-spond to a vanishing correlation.

A parametric cancellation is a very specialconstellation among parameters such that anindependence statement holds even though itis not implied by the generating graph, thatis, even though it cannot be derived from theseparation criterion. Thus the specific numericalvalues of the parameters are such that an inde-

pendence arises that does not hold in generalfor structures associated with the given graph.

In a stable Gaussian system there is paramet-ric cancellation only if the contributions to pij.cof several active paths between i, 7 relative toC add up to give zero 1541. Examples are givenhere with the simple system of Fig. 14.

The graph of Fig. 14 specifies two inde-pendences: Xr x Xql(Xz,X3) and Xz )L XtlXq.The separation criterion tells us that the graphimplies no marginal independence for the pairXt,Xq and no conditional independence forXz,Xz given X1. Nevertheless, each of theseadditional independences may hold even ina stable Gaussian system whenever effectsof different paths cancel each other. Withthe special constellation of correlation coeffi-c i en t s g i ven by p t r : pB : p24 : p34 : p ,p23 : p2, and pA : 2p'10 + p2), it followsthat py.1 : 0, because then the contributionto pzt.t from the collisionless path (2,4,3)and from the collision path (2, 1,3) cancel.If, instead, the correlation coefficients satisfy

PtzPz+ : - PBp3q, i t fol lows that ps : 0,because the nonzero contribution to ps of thepath (1,2,4) cancels the nonzero contributionof the path (1,3,4) .

Figure 14 A simple stable Gaussian system in whicheffects of two paths may cancel: for ps those of paths(1,2,4) and (1,3,4), and for p23.1 those of paths (2,1,3)and (2,4,3).

Page 14: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

PREDICTION OF RESULTS INRELATED INVESTIGATIONS

We now illustrate the use of the results stated inthe preceding section for deriving consequencesof a given hypothesized generating process.

The simple generating graph in five nodes ofFig. l5a consists just of one path, so that therecan be no parametric cancellation in a corre-sponding stable system. This path is a collisionpath with node 1 being the single collision nodein the system. Thus, in this graph a collisionpath between any pair of nodes i,7 is active rela-tive to C whenever C includes node 1. Then thegenerating graph does not imply Yi tL Y jl16 forgeneral joint distributions, and it implies I; de-pendent on Y1 given Yg for a nondegeneratestable Gaussian system and for systems that be-have similarly, such as the following system ofmain-effect regressions in mixed variables.

Let the joint distribution be generated in stan-dardized variables Y, X, Z corresponding tonodes I,2,5 and in binary variables A, B, eachtaking values -1 and 1 with probability 0.5,such that the following set of regressions de-fines the joint distribution:

E {y l (X ,A ) } : pyxx * pyo i ,

varir l(X,A)) : | - p1, - p1o,

GRAPHICAL MARKOV MODELS 297

Then the. joint distribution has as generatinggraph Gju, of Fig. l5a, and it is quasilinear.

If we choose C to be the empty set and drawa dashed-line edge for any pair for which mar-ginal independence is not implied, the over-all induced covariance graph of Fig. 15c isobtained, Gf,"". Similarly, if for each i, 7 wetake C to be the set of all remaining nodes(C : y\{t,7}), and draw a full-line edge when-ever Yi tt Ylllv\{;,,,} is not implied, then theoverall induced concentration graph Gf,"" ofFig. 15c results.

The following construction criteria for thesegraphs can be formulated. A given generatinggraph Gju* induces an edge i, 7 in the overallcovariance graph GX", if and only if in thegenerating graph there is a collisionless pathbetween the two nodes, and it induces an edgei, j rn the overall concentration graph GJ"" ifand only if in the generating graph either i, 7is an edge or nodes I and 7 have a commoncollision node.

More generally, for any selected subset S ofall nodes..V (where S includes i, j ) a generaringgraph Gju. induces a dashed-line edge in theconditional covariance graph GSf for nodes l,

7 if and only if in the generating graph thereis an active path between i and j relative to Cand it induces a full-line edge in the conditionalconcentration graph GSf for nodes i, j if and,only if in the generating graph either i,7 is anedge or there is an active path between I and 7relative to S U C (Wermuth and Cox l54l).

Figure 16 shows a conditional covariancegraph and a marginal concentration graph in-duced by the generating graph of Fig. 15a.These are not induced subgraphs of the coffe-sponding overall graphs. For a stable Gaussian

AIX- / A l x ) \ T t l rl os i t ( n i i " ' l : los - :- - o - - \ - ' r t x

/ ' " D A l x

T -t lx

2Po*x

| - p 2 o " '

, . / B lx ) \lO9tt [ ?7' i r - I-

\ r r ^ /

2p 0,,l -P3"

Here p denetes a strictly , nonzero correla-tion coefficient and,

".g., nfl{ the conditional

probability that A takes value 1 given X : x.

.2r . . '5c.'- - -

t a \ . - . t t t r '

t . ' a 3 r '\ t ,aV

4

Figure 15induced by

(b)

(a) A generating graph Gf,," with fiveit; and (c) the overall covariance graph

nodes; (b) the overallGjo, induced by it.

(c.)

concentration graph Gj""

Page 15: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

2l r

^ l t t . 5

3,; L - -'.j'v ' ll r t' r .

, /,!r,

4

298 GRAPHICAL MARKOV MODELS

(a) (b)

Figure 16 Further graphs induced by the generaringgraph of Fig. 15a: (a) the covariance graph Gj;f withS : i2.3,4.5) given C : {1}, and (b) rhe concenrra-t ion graph G$" with . ! : {1,2,4,5).

system they show the zero and nonzero entriesin Iss.c with S : {2,3,4,5}, C : {1} and intr# with S : {1 ,2,4,5}.

These results may be used to predict impli-cations of a hypothesized generating system fordifferent analyses with the same set of data. Pre-diction of results in related investigations maybe illustrated with the help of Fig. 10, whichsummarizes some aspects of the study in thesection "A Cohort Study."

Suppose, for instance, there is a study of aca-demics, that is, of persons who have receiveda university degree, so that we condition onlevel I of variable A. Suppose further that in-formation on the other seven variables shownin Fig. 10 is available. Then we may predict(for instance, with the induced covariance graphof seven nodes given A) which variable pairsshould be marginally independent in this sub-population, provided the graph of Fig. 10 isan adequate description of how the data aregenerated.

Suppose as a further example for A at level 1,we consider the relation between U and B givenD, E, F. Figure 10 specif ies U x Bl(O,F), butan active path between U and B via Z relatleto A. This implies, for distributions that are likestable Gaussian systems without cancellation ofpath effects, that U and B are dependent givenA, D, F. Indeed, strong conditional associationis typically obtained between U and B given D,F in similar studies of German academics.

Such possibilities of deriving consequencesimplied by a particular model mean that the im-portant general principle of making a hypothe-sis elaborate, discussed in detail by Cochran [9],can be applied to these multivariate structures.

References

Almond, R. (1995). Graphical Belief Modeltins.Chapman & Hall, London.Anderson, T. W. (1973). Asymptotically efficientestimation of covariance matrices with linearstructure. Ann. Statist., 1, 135- 141.

Andersson, S. A., Madigan, D., and Perlman,M. D. (1996). An alternative Markov property forchain graphs. Proc. l2th Conf. Uncertainty in Ar-tif. Intell., E. Horvitz and F. Jensen, eds. MorsanKaufmann, San Mateo, Calif., pp. 40-48.

Barndorff-Nielsen, O. (1978). Information andExponential Families in Statistical Theont. Wi-Iey, Chichester.

Bartlett, M. S. (1935). Contingenoy rable inrerac-tions. Suppl. J. R. Statist. 9oc.,2,248-252.Birch, M. W. (1963). Maximum likelihood inthree-way contingency tables. J. R. Statist. Soc.8 .25 . 220-233.

Bishop, Y. M. M., Fienberg, S. E., and Holland,P. W. (19"75). Discrete Multivariate Analysis:Theory and Practice. MIT Press.Bouckaert, R. and Studeny, M. (1995). Chaingraphs: semantics and expressiveness. In Sym-bolic and Qualitative Approaches to Reasoningand Uncertainty, C. Froidevaux and J. Kohlas,eds. Springer-Verlag, Berlin, pp. 69-76.Cochran, W. G. (1965). The planning of observa-tional studies of human populations. J. R. Statist.Soc . A .128.234-265.

Cox, D. R. (1966). Some procedures connectedwith the logistic qualitative response curve. InResearch Papers in Statistics: Essays in Honourof J. Neyman's 70th Birthday, F. N. David, ed.Wiley, London, pp. 55-71.Cox, D. R. (1912). The analysis of mulrivariatebinary data. Appl. Statisr.,2l, l l3-120.Cox, D. R. (1997). Barlett's adjusrment. In Ency-clopedia of the Statistical Sciences Update, 1, S.Kotz, C. B. Read and D. L. Banks, eds. Wiley,New York, pp.43-45.

Cox, D. R. and Wermuth, N. (1993). Linear de-pendencies represented by chain graphs (with dis-cussion). Statist. Sci., 8, 204-218, 247 -211.

Cox, D. R. and Wermuth, N. (1994). Tests oflinearity, multivariate normality and the adequacyof linear scores. Appl. Statist., 43, 347 -355.

Cox, D. R. and N. Wermuth (1996). MultivariateDependencies: Models, Analysis, and Interpreta-tion. Chapman and Hall, London.Darroch, J. N., Lauritzen, S. L., and Speed, T. P.(1980). Markov fields and log-linear models forcontingency tables. Ann. Statist., 8, 522-539.Dawid, A.P. (1979). Conditional independencein statistical theory. J. R. Statist. Soc. B, 41, | -31.

t l l

t2l

t-?t

l4l

t-51

t6l

ttl

t8l

tel

t10l

t11 l

lr2l

t13 l

tr4l

I 1 5 tL I J J

i l 6 l

I17l

Page 16: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

t18l Dempster, A. P. (1972). Covariance selection.Biometrics, 28, 157 -175.

t19l Edwards, D. (1995). Intoduction to GraphicalModelling. Springer-Verlag, New York.

120) Frydenberg, M. (1990). The chain graph Markovproperty. Scand. J. Statist., 17, 333-353.

I2ll Gibbs, W. (1902). Elementary Principles of Sta-tistical Mechanics. Yale University Press, NewHaven.

l22l Giesen, H., Bcihmeke, W., Effler, M., Hum-mer, A., Jansen, R., K0tter, B., Kr2imer, H.-J.,Rabenstein, E., and Werner, R. R. (1981). VozSchiiLer zum Studenten. BildungsLebensLciufe imUingsschnitt, Monografien zur PiidagogischenPsychologie 7, Reinhardt, Munchen.

l23l Goodman, L. A. (1970). The multivariate analy-sis of qualitative data: interaction among mul-tiple classifications. J. Amer. Statist. Ass., 65,226-256.

l24l Goodman, L. A. (1913). The analysis of multidi-mensional contingency tables when some vari-ables are posterior to others: a modified pathanalysis approach. Biometika, 60, 119-192.

l25l Haberman, S. (1974). The Analysis of FrequencyData.lJniversity of Chicago Press. Chicago.

126) Haavelmo, T. (1943). The statistical implicationsof a system of simultaneous equations. Econo-metr ica, l l , l -12.

l27l J6reskog, K. G. (1981). Analysis of covariancestructures. Scand. J. Statist., 8, 65-92.

t28l Kauermann, G. (1996). On a dualization ofgraphical Gaussian models. Scand. J. Statist.,2 3 , 1 0 5 - 1 1 6 .

t29) Koster, J. (1996). Marko.r properties of non-recursive causal models. Ann. Statist., 24.2148-2177.

t30l Lauritzen, S. L. (1996). Graphical Models. Ox-ford University Press.

t31l Lauritzen, S. L., Dawid, A.P., Larsen, B., andLeimer, H.-G. (1990). Independence properties ofdirected Markov fields. Networks. 20.491-505.

t32l Lauritzen, S. L. and Wermuth, N. (1989). Graphi-cal models for associations between variables,some of which are qualitative and some quan-titative. Ann. Statist.. 17. 3l-54.

t33l Leimer, H.-G. (1989). Triangulated graphs withmarked vertices. In Graph Theory in Memoryof G. A. Dirac, L. D. Andersen, et al., eds.Ann. Discrete Math. 41, Elsevier, Amsterdam,pp.99-123.

t34l Leimer, H.-G. (1993). Optimal decomposition byclique separators. Discrete Math., ll3, 99 -123 .

l35l Leber, M. (1996). Die Effekte einer poststa-tiond.ren telefonischen Nachbetreuung auf dasBefinden chronisch Schmerzkranker. Disserta-tion, Medical School, Universitdt Mainz.

GRAPHICAL MARKOV MODELS 299

t36l Markov, A. A. (1912). Wahrscheinlichkeitsrech-nung. Teubner, Leipzig. (German translation of2nd Russian ed., 1908).

131) McCullagh, P. and Nelder, J. A. (1989). Genera-lized Linear Models, 2nd ed. Chapman & Hall,London.

t38l Neapolitan, R. E. (1990). Probabilistic Reasoningin Expert Systems. Wiley, New York.

l39l Oliver, R. E. and Smith, J. Q. (1990). InfluenceDiagrams, Belief Nets and Decision Analysis.Wi-ley, London.

t40l Olkin, L and Tate, R. F. (1961). Multivariate cor-relation models with mixed discrete and continu-ous variables. Ann. Math. Statist.. 32. 448-465.

t41l Pearl, J. (1988). Probabilistic Reasoning in Intel-ligent Systems: Networks of Plausible Inference.Morgan Kaufmann, San Mateo, Calif.

l42l Pearl, J. and Verma, T. (1988). The logic of rep-resenting dependencies by directed graphs. Proc.6th Conf. Amer. Ass. Artif. Intell., Seattle, Wash.,pp .374-379.

t43l Speed, T. P. (1919). A note on nearest-neighbourGibbs and Markov distributions over graphs.Sankhyd A, 41,184-197.

l44l Speed, T. P. and Kiiveri, H. T. (1986). Gauss-ian Markov distributions over finite graphs. Ann.Statist.. 14. 138- 150.

t45l Spiegelhalter, D. J., Dawid, A. P., Lauritzen,S. L., and Cowell, R. G. (1993). Bayesian analy-sis in expert systems (with discussion). Starisr.Sci.. 8. 219-283.

146l Spirtes, P. (1995). Directed cyclic graphicalrepresentations of feedback models. Proc. IIthConf. Uncertainty in Artif. Intell., P. Besnard andS. Hanks, eds. Morgan Kaufmann, San Mateo,Calif.

l47l Spirtes, P., Glymour, C., and Scheines, R. (1993).Causati,on, Prediction, and Search. Springer-Verlag, New York.

t48l Sundberg, R. (1975). Some results about de-composable (or Markov-type) models for mul-tidimensional contingency tables: distributionof marginals and partitioning of tesIs. Scand.J. Statist., 2, 11-19.

t49l Tarjan, R. E. (1985). Decomposition by cliqueseparators. Discrete Math., 55, 221 -232.

l50l Tukey, i. W. (1954). Causation, regression, andpath analysis. In Srarisrics and Mathematics inBiology, O. Kempthorne et al., eds. Iowa StateCollege Press, Ames, pp. 35-66.

t51l Wermuth, N. (1976). Analogies between multi-plicative models in contingency tables and co-variance selection. Biometrics, 32, 95- 108.

l52l Wermuth, N. (1980). Linear recursive equations,covariance selection, and path analysis. J. Amer.Statist. Ass.. 75. 963-991.

Page 17: GRAPHICAL MARKOV MODELS - Chalmers · survival analysisx. Also, further criteria are de-veloped to read off a graph which conditional independences and which conditional associa-tions

3OO GROUPING NORMAL MEANS

i53l Wermuth, N. (1992). On block-recursive re-gression equations (with discussion). Brazil J.Probab. Statist. (Rev. Brasil. Probab. e Estatist.),6 , 1 -56 .

t54l Wermuth, N. and Cox, D. R. (1997). On associ-ation models defined over independence graphs.BernouLli, to appear.

t55l Wermuth, N. and Lauritzen, S. L. (1983), Graphi-cal and recursive models for contingency tables.Biometrika. 70. 53'7 -552.

156l Wermuth, N. and Lauritzen, S. L. (1990). Onsubstantive research hypotheses, conditional in-dependence graphs and graphical chain models(with discussion). J. R. Stat ist . Soc. 8,52,21-72.

l51l Whittaker, J. L. (1990). Graphical Models inAp-plied Muhivariate Statislics. Wiley, New York.

l58l Wilks, S. S. (1938). The large sample distribu-tion of the likelihood ratio for testing compositehypotheses. Ann. Math. Statist.,9, 60-62.

t59l Wold, H. O. (1954). Causality and economerrics,Econometrica, 22, 162-117.

t60l Wright, S. (1923). The theory of path coefficienrs:a reply to Niles' criticism. Genetics,8,239-255.

t6il Wright, S. (1934). The merhod of path coeffi-cients. Ann. Math. Statist..5. 161-215.

L62l Yule, G. U. (1902). Notes on the rheory of as-sociation of attributes in statistics. Biometrika.2.t 2 l - t 3 4 .

(CAUSATIONCONDITIONAL INDEPENDENCEEXPERT SYSTEMS" PROBABILISTICGRAPH THEORYPATH ANALYSIS)

NRNNy WEnuurH

GROUPING NORMAL MEANS

The comparison of k normal means

Fr, F2, . . . , Fk is typ ica l ly presented inthe context of analysis of variance*. The hy-pothesis Ho : Lr,t : ltz : ... : ltk is testedby an F-testx with k - 1 and z degrees offreedom (df ), where lu represents the df for theerror mean square, sz say, in the ANovA.

When the groups (treatments) whose meansare to be compared are qualitative and unstruc-tured, commonly used procedures to investi-gate the more specific question of which meansare different are multiple comparison proce-dures* (MCPs), multiple range tests* (MRTs),

and simultaneous testing procedures* (STPs).The basic idea of such procedures is to ar-range the observed means in ascending order,saY) r<Y2<" '<servations, by testing sequentially hypothesesof the form

H p g I l t r p + t : F p + 2 : . . . : F p * q

f o t p :0 , 1 , . . . , k - e , e :2 ,3 , . . . , f t . Th i sis done in at most k - I stages. At stage/, hypotheses of the form Hrp-r+ry for p :

0, 1,.. . ,1 - 1 are tested [5]. More specif ical ly,the hypothesis Ho,, is rejected if

! p *q -Vp* r > , t . , t l J i ,

where a is the familywise,error rate and c].,is an appropriate critical value.

One of the drawbacks of these proceduresis that often they lead to overlapping clustersof homogeneous groups (means). This may beunsatisfactory from a practical point of view.To alleviate this problem, alternative proce-dures have been proposed which are embeddedin STPs or make use of MCPs and clusteringmethods. All of these procedures are sequentialand lead to nonoverlapping clusters of homo-geneous groups.

Basically, the procedures can be classified intwo ways: hierarchical (H) versus nonhierarchi-cal (NH), and agglomerative (A) versus divi-sive (D). Here, hierarchical means that once agroup is assigned to a cluster in one step of theprocedure, then it will remain in that cluster atsubsequent steps. Agglomerative means that ateach step two clusters will be combined to forma new cluster, and divisive means that at eachstep more clusters are being formed than existedin the preceding step. Methods by several au-thors fall into the various categories mentionedabove (see Table 1).

Thble 1

Calinski and Corsten (1) tll,Jolliffe [5]

Scott andKnott [6]Calinski and

Corsten (2) t l l