
Upload: cory-bruce

Post on 25-Dec-2015


Xuhua Xia

Smoking and Lung Cancer

This chest radiograph demonstrates a large squamous cell carcinoma of the right upper lobe.

This is a larger squamous cell carcinoma in which a portion of the tumor demonstrates central cavitation, probably because the tumor outgrew its blood supply. Squamous cell carcinomas are among the more common primary malignancies of the lung and are most often seen in smokers.


Smoking and Lung Cancer

The number of smokers and non-smokers sampled from the population

                  Smoker   Non-smoker
Lung Cancer          105            3
No Lung Cancer     99895        99996
Sub-total         100000       100000


Sickness and Medication

Association between being sick and taking medicine:

             Taking medicine   Not taking medicine
Sick                     990                   111
Healthy                   10                   889
Sub-total               1000                  1000

Biological and statistical questions

“Taking medicine” is strongly associated with “Sick”. Can we say that “Sick” is caused by “Taking medicine”?


Simpson’s paradox

                  Treatment A      Treatment B
Kidney stones     78% (273/350)    83% (289/350)
Small Stones      93% (81/87)      87% (234/270)
Large Stones      73% (192/263)    69% (55/80)

C. R. Charig et al. 1986. Br Med J (Clin Res Ed) 292: 879–882

Treatment A: all open procedures

Treatment B: percutaneous nephrolithotomy

Question: which treatment is better?

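The conclusion changes when a new dimension (stone size) is added: Treatment B looks better overall, but Treatment A is better for both small and large stones.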
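The reversal in the kidney-stone table can be checked directly from the counts. The following is a minimal Python sketch, not part of the original slides; the names `data`, `rate` and `pooled` are illustrative choices.

```python
# (successes, patients) per stone size and treatment, copied from the table above.
data = {
    "Small Stones": {"A": (81, 87), "B": (234, 270)},
    "Large Stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    """Success rate as a fraction."""
    return successes / total

def pooled(treatment):
    """Add up successes and patients over both stone sizes."""
    s = sum(arms[treatment][0] for arms in data.values())
    n = sum(arms[treatment][1] for arms in data.values())
    return s, n

# Within each stone size, Treatment A has the higher success rate.
for group, arms in data.items():
    print(f"{group}: A = {rate(*arms['A']):.1%}, B = {rate(*arms['B']):.1%}")

# After pooling, Treatment B has the higher success rate.
print(f"Pooled: A = {rate(*pooled('A')):.1%}, B = {rate(*pooled('B')):.1%}")
```

The pooled comparison is driven by how patients were allocated: most Treatment A patients had large stones, which are harder to treat under either treatment.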


What is a Contingency Table?

• A contingency table: a table of counts cross-classified according to categorical variables.

• A contingency table has r rows and c columns, and is referred to as an r × c contingency table.

• The simplest contingency table is a 2 × 2 table.

• The most typical null hypothesis: the row classification is independent of the column classification.


Contingency Tables and χ²-Test

• Chi-Square test is based on the χ² distribution.

• Chi-Square test is typically used in tests for goodness of fit, i.e., how well the observed values fit the expected values.

• The SAS procedure FREQ can be used to output Chi-Square statistics.

• Chi-square test and Yates correction for continuity.


What is a Contingency Table?

               Response
Sex        Favour   Oppose
Male           61       34
Female         43       52

               Response
Sex        Favour   Oppose   Total
Male          n11      n12     n1.
Female        n21      n22     n2.
Total         n.1      n.2     n..

Each nij is a cell. n1. and n2. are marginal totals (row totals), n.1 and n.2 are marginal totals (column totals), and n.. is the total.


What is a Contingency Table?

               Response
Sex        Favour   Oppose
Male           61       34
Female         43       52

The null hypothesis: The response is independent of sex (i.e., the response is the same for both sexes).

Another way of stating the null hypothesis is that the sex ratio is the same for each response category.

The null hypothesis can be tested with the Chi-square test of goodness-of-fit.


X²-test of a Contingency Table

               Response
Sex        Favour   Oppose   Total
Male           61       34      95
Female         43       52      95
Total         104       86     190

• Marginal totals

• Expected frequencies (the test should be done on counts, not on proportions).

• Degrees of freedom

• X² value: 0 if the data are perfectly consistent with the null hypothesis.

• p: the probability of obtaining an X² value at least as large as the observed one, given that the null hypothesis is true, i.e., p(X²|H0).


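X²-test of a Contingency Table

               Response
Sex        Favour   Oppose   Total
Male           61       34      95
Female         43       52      95
Total         104       86     190

\[ \hat{n}_{ij} = \frac{n_{i.}\, n_{.j}}{n_{..}}, \quad \text{e.g.,} \quad \hat{n}_{11} = \frac{n_{1.}\, n_{.1}}{n_{..}} = \frac{95 \times 104}{190} = 52 \]

\[ X^2 = \sum_{i}\sum_{j} \frac{(n_{ij} - \hat{n}_{ij})^2}{\hat{n}_{ij}} \]

Expected frequencies: 52 (Favour) and 43 (Oppose) for Male; 52 (Favour) and 43 (Oppose) for Female.

• Do hand-calculation of X².

• What is the df associated with the test?

• df = (r-1)(c-1)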
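The hand calculation requested above can be cross-checked with a few lines of code. This is a minimal Python sketch, not part of the original slides; the function name `pearson_x2` is an illustrative choice.

```python
def pearson_x2(table):
    """Return (X2, df, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    expected = []
    for i, row in enumerate(table):
        exp_row = []
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # expected count under independence
            exp_row.append(e_ij)
            x2 += (n_ij - e_ij) ** 2 / e_ij
        expected.append(exp_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df, expected

# Rows: Male, Female; columns: Favour, Oppose.
x2, df, expected = pearson_x2([[61, 34], [43, 52]])
print(expected)          # expected counts for each cell
print(round(x2, 3), df)  # X2 statistic and its degrees of freedom
```

Each of the four cells deviates from its expected count by 9, so X² is 81/52 + 81/43 + 81/52 + 81/43.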


Chi-square Distribution

[Figure: χ² density f(x) plotted against x (0 to 20) for ν = 2, ν = 4 and ν = 8.]

The χ² distribution is a special case of the gamma distribution with α = ν/2 and β = 2.

\[ p(x; \alpha, \beta) = \frac{x^{\alpha - 1} e^{-x/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)} \]

\[ p(x; \nu/2, 2) = \frac{x^{\nu/2 - 1} e^{-x/2}}{2^{\nu/2}\, \Gamma(\nu/2)} \]

The p value in chi-square test:

\[ p = 1 - \int_0^{x} p(t; \nu/2, 2)\, dt \]

In EXCEL, p = chidist(x,DF) = 1-gammadist(x,DF/2,2,true)
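The relation between the χ² p value and the gamma distribution can be sketched with the standard library only. This is a minimal Python sketch, not part of the original slides: `gamma_cdf` and `chi2_p` are illustrative names, and the lower tail is evaluated with the usual power series for the regularized incomplete gamma function, which is adequate for moderate x.

```python
import math

def gamma_cdf(x, alpha, beta):
    """Lower tail of the gamma distribution with shape alpha and scale beta."""
    if x <= 0:
        return 0.0
    z = x / beta
    term = 1.0 / alpha        # k = 0 term of the series
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (alpha + k)
        total += term
    # Multiply by z**alpha * exp(-z) / Gamma(alpha), computed on the log scale.
    return total * math.exp(-z + alpha * math.log(z) - math.lgamma(alpha))

def chi2_p(x, df):
    """Upper-tail p value, mirroring 1-gammadist(x,DF/2,2,true)."""
    return 1.0 - gamma_cdf(x, df / 2.0, 2.0)

print(round(chi2_p(6.883, 1), 3))   # p for the 2 x 2 example with df = 1
```

For df = 2 the result reduces to exp(-x/2), which gives a quick sanity check of the series.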


Categorical Data & Associated Tests

2 by 2 contingency table

Sex      | Response
---------+--------+--------+
         |Favour  |Oppose  |
---------+--------+--------+
male     |     61 |     34 |
---------+--------+--------+
female   |     43 |     52 |
---------+--------+--------+

/* wt holds the cell count; @@ reads several observations per data line */
Data BigIssue;
  input gender $ response $ wt @@;
  cards;
Male Favour 61  Female Favour 43
Male Oppose 34  Female Oppose 52
;
proc freq;
  table gender*response / chisq;   /* chisq requests the tests below */
  weight wt;
run;

Request X²-test and measures of association.


SAS Output

GENDER     RESPONSE
Frequency|
Percent  |
Row Pct  |
Col Pct  |Favour  |Oppose  |  Total
---------+--------+--------+
Female   |     43 |     52 |     95
         |  22.63 |  27.37 |  50.00
         |  45.26 |  54.74 |
         |  41.35 |  60.47 |
---------+--------+--------+
Male     |     61 |     34 |     95
         |  32.11 |  17.89 |  50.00
         |  64.21 |  35.79 |
         |  58.65 |  39.53 |
---------+--------+--------+
Total         104       86      190
            54.74    45.26   100.00


SAS Output

Statistic                      DF    Value     Prob
------------------------------------------------------
Chi-Square                      1    6.883     0.009
Likelihood Ratio Chi-Square     1    6.927     0.008
Continuity Adj. Chi-Square      1    6.139     0.013
Mantel-Haenszel Chi-Square      1    6.847     0.009
Fisher's Exact Test (Left)                     0.997
                    (Right)                 6.50E-03
                    (2-Tail)                   0.013
Phi Coefficient                      0.190
Contingency Coefficient              0.187
Cramer's V                           0.190

---------+--------+--------+
         |Favour  |Oppose  |
---------+--------+--------+
male     |     61 |     34 |
---------+--------+--------+
female   |     43 |     52 |
---------+--------+--------+

Likelihood ratio chi-square:

\[ G = 2\left[\sum_{i=1}^{N_r}\sum_{j=1}^{N_c} f_{ij}\ln(f_{ij}) - \sum_{i=1}^{N_r} R_i\ln(R_i) - \sum_{j=1}^{N_c} C_j\ln(C_j) + n\ln(n)\right] \]

Continuity-adjusted chi-square (2 × 2 table):

\[ \chi^2_c = \frac{n\left(\left|f_{11}f_{22} - f_{12}f_{21}\right| - \frac{n}{2}\right)^2}{R_1 R_2 C_1 C_2} \]
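The likelihood ratio and continuity-adjusted values in the SAS output can be reproduced from the standard formulas for these two statistics. This is a minimal Python sketch, not part of the original slides; `g_statistic` and `yates_x2` are illustrative names.

```python
import math

def xlogx(v):
    """v * ln(v), with the convention 0 * ln(0) = 0."""
    return v * math.log(v) if v > 0 else 0.0

def g_statistic(table):
    """Likelihood ratio chi-square from cell counts, row totals and column totals."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    cells = sum(xlogx(f) for row in table for f in row)
    return 2.0 * (cells - sum(map(xlogx, rows)) - sum(map(xlogx, cols)) + xlogx(n))

def yates_x2(table):
    """Continuity-adjusted chi-square; defined for 2 x 2 tables only."""
    (f11, f12), (f21, f22) = table
    r1, r2 = f11 + f12, f21 + f22
    c1, c2 = f11 + f21, f12 + f22
    n = r1 + r2
    return n * (abs(f11 * f22 - f12 * f21) - n / 2) ** 2 / (r1 * r2 * c1 * c2)

table = [[61, 34], [43, 52]]   # rows: Male, Female; columns: Favour, Oppose
print(round(g_statistic(table), 3), round(yates_x2(table), 3))
```

The continuity adjustment always lowers the statistic, which is why its p value in the output is larger than that of the unadjusted chi-square.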


Formulas for different statistics

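Statistics for significance tests:

\[ X^2 = \sum_i \sum_j \frac{(n_{ij} - m_{ij})^2}{m_{ij}}, \quad \text{where } m_{ij} = \frac{n_{i.}\, n_{.j}}{n} \]

\[ G^2 = 2 \sum_i \sum_j n_{ij} \ln\left(\frac{n_{ij}}{m_{ij}}\right) \]

\[ Q_{MH} = (n - 1)\, r^2 \]

r: correlation between the two categorical variables coded in binary.

Measures of association: note that Phi can be used only with 2 × 2 contingency tables; otherwise the value may be greater than 1.

\[ \phi = \frac{n_{11} n_{22} - n_{12} n_{21}}{\sqrt{n_{1.}\, n_{2.}\, n_{.1}\, n_{.2}}} \ \text{for } 2 \times 2 \text{ tables}; \qquad \phi = \sqrt{X^2 / n} \ \text{otherwise} \]

\[ P = \sqrt{\frac{X^2}{X^2 + n}} \]

\[ V = \sqrt{\frac{X^2 / n}{\min(R - 1,\ C - 1)}} \]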


χ² and Measures of Association

               Response
Sex        Favour   Oppose   Total
Male            1        3       4
Female          3        1       4
Total           4        4       8

The same pattern as above, except that the sample size is doubled:

               Response
Sex        Favour   Oppose   Total
Male            2        6       8
Female          6        2       8
Total           8        8      16

Should the two data sets have the same measure of association? Should they yield the same X² value?
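One way to explore the two questions is to compute both quantities for both tables. This is a minimal Python sketch, not part of the original slides; `x2_and_phi` is an illustrative name, and Phi is computed as the square root of X²/n.

```python
import math

def x2_and_phi(table):
    """Pearson X2 and the Phi coefficient sqrt(X2 / n) for a table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    x2 = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )
    return x2, math.sqrt(x2 / n)

small = [[1, 3], [3, 1]]   # n = 8
large = [[2, 6], [6, 2]]   # same proportions, n = 16

for name, table in (("n = 8", small), ("n = 16", large)):
    x2, phi = x2_and_phi(table)
    print(f"{name}: X2 = {x2:.3f}, Phi = {phi:.3f}")
```

With the proportions held fixed, X² scales with the sample size while Phi does not, which is the distinction between a test statistic and a measure of association.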


Sex and Hair Color

GENDER     COLOR
         | Black  | Blond  | Brown  | Red    |  Total
---------+--------+--------+--------+--------+
Female   |     55 |     64 |     65 |     16 |    200
---------+--------+--------+--------+--------+
Male     |     32 |     16 |     43 |      9 |    100
---------+--------+--------+--------+--------+
Total          87       80      108       25      300

Write a SAS program to test the association between Gender and Hair Color.


SAS Output

Statistic                      DF    Value     Prob
------------------------------------------------------
Chi-Square                      3    8.987     0.029
Likelihood Ratio Chi-Square     3    9.512     0.023
Mantel-Haenszel Chi-Square      1    0.459     0.498
Phi Coefficient                      0.173
Contingency Coefficient              0.171
Cramer's V                           0.173

Sample Size = 300

The Mantel-Haenszel statistic is appropriate only when the two classification variables are on an ordinal scale (e.g., poor, average, good, excellent).

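Likelihood ratio chi-square:

\[ G = 2\left[\sum_{i=1}^{N_r}\sum_{j=1}^{N_c} f_{ij}\ln(f_{ij}) - \sum_{i=1}^{N_r} R_i\ln(R_i) - \sum_{j=1}^{N_c} C_j\ln(C_j) + n\ln(n)\right] \]

Continuity-adjusted chi-square (2 × 2 table):

\[ \chi^2_c = \frac{n\left(\left|f_{11}f_{22} - f_{12}f_{21}\right| - \frac{n}{2}\right)^2}{R_1 R_2 C_1 C_2} \]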
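The chi-square value and the measures of association for the Gender by Hair Color table can be cross-checked outside SAS. This is a minimal Python sketch, not part of the original slides; the variable names are illustrative, and the measures use the standard definitions of Phi, the contingency coefficient and Cramér's V.

```python
import math

# Rows: Female, Male; columns: Black, Blond, Brown, Red.
hair = [[55, 64, 65, 16],
        [32, 16, 43, 9]]

rows = [sum(r) for r in hair]
cols = [sum(c) for c in zip(*hair)]
n = sum(rows)

# Pearson X2 summed over all eight cells.
x2 = 0.0
for i, row in enumerate(hair):
    for j, observed in enumerate(row):
        expected = rows[i] * cols[j] / n
        x2 += (observed - expected) ** 2 / expected

df = (len(rows) - 1) * (len(cols) - 1)
phi = math.sqrt(x2 / n)
contingency = math.sqrt(x2 / (x2 + n))
cramers_v = math.sqrt((x2 / n) / min(len(rows) - 1, len(cols) - 1))

print(round(x2, 3), df, round(phi, 3), round(contingency, 3), round(cramers_v, 3))
```

Because the table has only two rows, min(R-1, C-1) is 1 and Cramér's V coincides with Phi here.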


Why Are There More Blondes?

• An evolutionary explanation

• A genetic explanation

• A simple chemical explanation

• The limitation of statistics


Log-linear model

• Preferred statistical tool for analyzing multi-way contingency tables

• Use likelihood ratio test to choose the best model

• Main effects and interactions can be interpreted in a similar manner as ANOVA

Two-way table:

\[ G = 2\left[\sum_{i=1}^{N_r}\sum_{j=1}^{N_c} f_{ij}\ln(f_{ij}) - \sum_{i=1}^{N_r} R_i\ln(R_i) - \sum_{j=1}^{N_c} C_j\ln(C_j) + n\ln(n)\right] \]

Three-way table:

\[ G = 2\left[\sum_{i=1}^{N_r}\sum_{j=1}^{N_c}\sum_{k=1}^{N_t} f_{ijk}\ln(f_{ijk}) - \sum_{i=1}^{N_r} R_i\ln(R_i) - \sum_{j=1}^{N_c} C_j\ln(C_j) - \sum_{k=1}^{N_t} T_k\ln(T_k) + 2n\ln(n)\right] \]


Log-linear model

            Disease present     Disease absent
            Loc1     Loc2       Loc1     Loc2
Race1         44       12         38       10
Race2         28       22         20       18

/* Nested loops assign Race, Disease and Loc to each count read into wt */
data Disease;
  do Race = 1 to 2;
    do Disease = 1 to 2;
      do Loc = 1 to 2;
        input wt @@;
        output;
      end;
    end;
  end;
datalines;
44 12 38 10
28 22 20 18
;
proc catmod;
  weight wt;
  model Race*Disease*Loc=_response_ / noparm pred=freq;
  loglin Race|Disease|Loc @ 2;   /* main effects and all 2-way interactions */
quit;

1. Do two races distribute similarly in the two locations?

2. Do races differ in their susceptibility to the disease?

3. Is the disease more prevalent in one location than the other?

4. Significant 3-way interactions (e.g., one race is more susceptible to disease in one location but less susceptible to disease in the other location)?

Run and explain


Log-linear model

data YeastBPS;
  input S1 $ S2 $ S3 $ S4 $ S5 $ S6 $ S7 $ wt;
datalines;
U A C U A A C 212
A A C U A A C 11
A A C U A A U 5
C A C U A A C 8
G A C U A A C 8
U A C U A A U 4
U A C U G A C 2
U A U U A A C 3
U G C U A A C 3
C G C U A A C 1
;
proc catmod;
  weight wt;
  model S1*S2*S3*S5*S7=_response_ / noparm pred=freq;
  loglin S1|S2|S3|S5|S7 @ 3;
run;


Goodness of fit tests

• Deviation of sex ratio from 1:1

• Deviation from Mendelian 3:1 ratio

• Deviation from Mendelian 9:3:3:1 ratio
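A goodness-of-fit test against a fixed ratio compares observed counts with the counts the ratio predicts. This is a minimal Python sketch, not part of the original slides. The counts are an assumption chosen for illustration (the figures commonly quoted for Mendel's dihybrid pea cross), and `goodness_of_fit` is an illustrative name.

```python
def goodness_of_fit(observed, ratio):
    """Pearson X2 and df for observed counts against an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1   # no parameters are estimated from the data
    return x2, df, expected

# Illustrative counts for four phenotype classes (assumed, not from the slides).
observed = [315, 101, 108, 32]
x2, df, expected = goodness_of_fit(observed, (9, 3, 3, 1))
print(expected)
print(round(x2, 3), df)
```

The same function handles the 1:1 and 3:1 cases by passing `(1, 1)` or `(3, 1)` as the ratio with two observed counts.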


Spatial Statistics

The spatial distribution of animals and plants has been described as random, contagious and even. We will learn some basic statistical techniques to detect these spatial patterns.


Starfish Bay


Quadrat Sampling


Three Distribution Patterns

Random Even Contagious


Quadrat Sampling

Quadrat    N
1          2
2          2
3          3
4          0
5          6
.          .
.          .
100        1
Mean
Variance


Three Distribution Patterns

Random: σ² = μ     Even: σ² < μ     Contagious: σ² > μ


Three Probability Distributions

• Poisson distribution (random distribution): σ² = μ

• Binomial distribution (even distribution): σ² < μ

• Negative binomial distribution (contagious distribution): σ² > μ


Random Distribution

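\[ P(x) = \frac{\mu^{x} e^{-\mu}}{x!} \]

x      N(x)   C1*C2   P(x)     N(x)      X2
0      14     0       0.1395   13.9457   0.0002
1      27     27      0.2747   27.4730   0.0081
2      27     54      0.2706   27.0609   0.0001
3      18     54      0.1777   17.7700   0.0030
4      9      36      0.0875   8.7517    0.0070
5      4      20      0.0345   3.4482    0.0883
6      1      6       0.0155   1.5505    0.1955
7      0      0
8      0      0
9      0      0
Sum    100    197     1        100       0.3023
Mean   1.97                    P         0.999486

x: number of individuals in a quadrat. First N(x): number of quadrats observed with x individuals. C1*C2: x × N(x). Second N(x): number of quadrats expected under the Poisson distribution, 100 × P(x). The P(x) value in the row for x = 6 is the probability of 6 or more individuals, so that the P(x) column sums to 1.

Var = [14*(0-1.97)² + 27*(1-1.97)² + 27*(2-1.97)² + 18*(3-1.97)² + 9*(4-1.97)² + 4*(5-1.97)² + 1*(6-1.97)²]/(100-1) = 1.91 < Mean.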

Does the distribution deviate significantly from Poisson?

Conclusion: The spatial distribution of the species does not deviate significantly from random distribution.
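The Poisson fit in the table can be reproduced step by step. This is a minimal Python sketch, not part of the original slides; the variable names are illustrative, and the last class is treated as "6 or more", which is an assumption inferred from the P(x) column summing to 1.

```python
import math

# Observed number of quadrats containing x individuals (x = 0..6).
observed = {0: 14, 1: 27, 2: 27, 3: 18, 4: 9, 5: 4, 6: 1}

n = sum(observed.values())
mean = sum(x * count for x, count in observed.items()) / n
var = sum(count * (x - mean) ** 2 for x, count in observed.items()) / (n - 1)

def poisson(x, mu):
    """Poisson probability of exactly x individuals in a quadrat."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

# Probabilities for x = 0..5, then the remaining tail for x >= 6.
probs = [poisson(x, mean) for x in range(6)]
probs.append(1.0 - sum(probs))

expected = [n * p for p in probs]
x2 = sum((observed[x] - e) ** 2 / e for x, e in enumerate(expected))

print(round(mean, 2), round(var, 2), round(x2, 4))
```

A variance close to the mean together with a small X² is what the slide's conclusion of a random spatial distribution rests on.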


Contagious Distribution

x      N(x)   N(x)'   C1*C2   SS        P(x)    N(x)     X2
0      14     30      0       126.075   0.129   12.873   22.785
1      27     20      20      22.050    0.264   26.391   1.548
2      27     15      30      0.037     0.271   27.050   5.368
3      18     14      42      12.635    0.185   18.484   1.088
4      9      9       36      34.223    0.095   9.473    0.024
5      4      4       20      34.810    0.039   3.884    0.003
6      1      3       18      46.808    0.018   1.844    0.725
7      0      2       14      49.005
8      0      2       16      70.805
9      0      1       9       48.303
Sum           100     205     444.750   1       100      31.541
Mean                  2.05                      P        0.0000
Var                   4.492

Lump the last four categories to increase n.

Compare the two columns headed N(x) and N(x)'. The first, N(x), is from the previous slide and fits closely to a Poisson distribution. N(x)' is for another species. Is the distribution in this species more contagious or more even?

If you are still not sure, then look at the mean and the variance. The variance is more than twice as large as the mean. Does this indicate a contagious or even distribution? Does the distribution really deviate significantly from the Poisson?

Conclusion: The spatial distribution of the species is not random. Because var >> mean, the distribution is contagious.


Even Distribution

x      N(x)   N(x)'   C1*C2   SS       P(x)    N(x)     X2
0      14     0       0       0        0.016   1.608    1.608
1      27     0       0       0        0.066   6.642    6.642
2      27     0       0       0        0.137   13.716   13.716
3      18     0       0       0        0.189   18.883   18.883
4      9      90      360     1.521    0.195   19.496   254.959
5      4      7       35      5.298    0.161   16.104   5.147
6      1      3       18      10.491   0.111   11.085   5.897
7      0      0       0       0        0.065   6.540    6.540
8      0      0       0       0        0.034   3.376    3.376
9      0      0       0       0        0.025   2.549    2.549
Sum           100     413     17.31    1       100      319.318
Mean                  4.13                     P        0.0000
Var                   0.175

Compare again the two columns headed N(x) and N(x)'. The first, N(x), fits closely to a random distribution. Is the distribution in the second species, N(x)', more contagious or more even?

If you are still not sure, then look at the mean and the variance. The variance is smaller than the mean. Does this indicate a contagious or even distribution? Does the distribution really deviate significantly from the Poisson?

Conclusion: The spatial distribution of the species is not random. Because var << mean, the distribution is even.