EHA: More On Plots and Interpreting Hazards
Sociology 229A: Event History Analysis
Class 9
Copyright © 2008 by Evan Schofer
Do not copy or distribute without permission
Announcements
• Final paper assignment due next week
• Questions?
• Class topics:
  • More on interpreting hazard & cumulative hazard functions
  • More multilevel models…
Hazard Plots: Smoothing
• Issue: Stata heavily smooths hazard plots
  • “Raw” hazard plots are very spiky… smoothing can help with interpretation
• Issue: Too much smoothing obscures the detail within your data
  – Simplest way to control smoothing:
    • Set the “width” of the kernel smoother in Stata
    • EX: sts graph, haz width(3)
    • Lower width = less smoothing; try different values.
Hazard Smoothing
• Environmental Law Data: Default smoothing
[Figure: Smoothed hazard estimate. X-axis: analysis time, 1970 to 2000. Y-axis: 0 to .1]
Hazard Smoothing
• Environmental Law Data: width (1)
[Figure: Smoothed hazard estimate. X-axis: analysis time, 1970 to 2000. Y-axis: 0 to .2]
Hazard Smoothing
• Environmental Law Data: width (.2)
[Figure: Smoothed hazard estimate. X-axis: analysis time, 1970 to 2000. Y-axis: 0 to .3]
Hazard Smoothing
• Don’t make width too small!: width (.001)
[Figure: Smoothed hazard estimate. X-axis: analysis time, 1970 to 2000. Y-axis: 0 to 50]
Stata’s default smoother amplifies peaks in data if width is too small!
Hazard Smoothing: Remarks
• Stata default smoothing is quite aggressive
  • Obscures detail in your data
  – Stata default smoothing “width” is ~4 in this case
    • Smoothing of 1-2 works much better
• In addition to removing detail, smoothing also lowers the peaks…
  • Highest peak = .1 (width 4)
  • Highest peak = .3 (width .2)
  • Also: REALLY narrow width exaggerates peaks
    – Highest peak = 50 (width .001)
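Why the peaks move with the width can be seen in a small sketch. It assumes the smoothed hazard is a kernel-weighted sum of the Nelson-Aalen jumps, using an Epanechnikov kernel and no boundary correction. The event times and risk-set size are made up for illustration (they are not the environmental law data), and Stata's own estimator may differ in its details.

```python
# Sketch of a kernel-smoothed hazard:
#   h(t) = (1/width) * sum_j K((t - t_j) / width) * dH_j
# Assumptions: Epanechnikov kernel, no boundary correction, hypothetical data.

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def smoothed_hazard(t, event_times, jumps, width):
    """Kernel-weighted sum of cumulative-hazard jumps around time t."""
    return sum(epanechnikov((t - tj) / width) * dh
               for tj, dh in zip(event_times, jumps)) / width

# Hypothetical data: one event per year for 10 years with 100 cases at risk
# throughout, so every Nelson-Aalen jump is 1/100.
event_times = [1970 + k for k in range(10)]
jumps = [1.0 / 100] * len(event_times)

def peak(width):
    """Highest smoothed hazard on a fine grid that includes the event times."""
    grid = [1970 + k / 100 for k in range(0, 901)]
    return max(smoothed_hazard(t, event_times, jumps, width) for t in grid)

for width in (4, 1, 0.2, 0.001):
    print(f"width {width:>6}: peak hazard = {peak(width):.4f}")
```

Once the window is narrower than the gap between events, each peak is a single jump spread over the window, 0.75 × jump / width in this sketch, so it grows without limit as the width shrinks. Wide windows average over many events and flatten the peaks instead.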
Survival Plot Problem: noorigin
• Issue: Stata always likes to include t=0…
[Figure: Kaplan-Meier survival estimate. X-axis: analysis time, 0 to 2000. Y-axis: 0.00 to 1.00]
Survival Plot Problem: noorigin
• Solution: sts graph, noorigin
[Figure: Kaplan-Meier survival estimate. X-axis: analysis time, 1970 to 2000. Y-axis: 0.00 to 1.00]
Plots: Confidence Intervals
• Confidence intervals are a good idea
  • Especially useful when comparing groups
  – Stata
    • sts graph, ci
    • sts graph, haz ci
  – Issue: Adding CIs tends to compress the Y axis to make room for the confidence bands
    • Makes the hazard look less variable over time
    • Watch for that…
  – Issue: CIs can make charts “busy” / hard to read.
Hazard Plot with 95% CI
[Figure: Smoothed hazard estimate with 95% CI band. X-axis: analysis time, 1970 to 2000. Y-axis: 0 to .3]
Hazard plot with 95% CI
[Figure: Smoothed hazard estimates with 95% CI bands, by group (west2 = 0, west2 = 1). X-axis: analysis time, 1970 to 2000. Y-axis: 0 to .4]
Survivor plot with 95% CI
[Figure: Kaplan-Meier survival estimates with 95% CI bands, by group (west2 = 0, west2 = 1). X-axis: analysis time, 1970 to 2000. Y-axis: 0 to 1]
Other sts graph options
• Options to show # of lost, entered, or censored cases
  • Lost: puts a number above plots showing cases lost
  • Atrisk: shows # of cases at risk
    – Actually, it shows risk per interval
    – EX: if unit = nation, it shows nation-years in an interval
  • Censored: shows number of cases censored
Sts graph: atrisk
[Figure: Kaplan-Meier survival estimate with at-risk counts printed above the curve. X-axis: analysis time, 1970 to 2000. Y-axis: 0.00 to 1.00]
Interpreting Hazard & Cumulative Hazard
• The survivor plot has a clear interpretation: The proportion of cases that have not experienced the event
  • Assuming non-repeated events
    – If events repeat frequently, survivor falls to 0, stays there…
  • Assuming the risk-set stays more-or-less constant
    – Survivor never goes back up, even if more cases enter the risk set…
• But, hazard rates & cumulative hazard rates are harder to understand intuitively…
  • So, I made some illustrative examples
Hazard Example 1
• Start with 10 people
• Put them all in the risk set at once: all cases start at time t=0
• One case fails at each point in time

Start  End  Failed?
0      1    1
0      2    1
0      3    1
0      4    1
0      5    1
0      6    1
0      7    1
0      8    1
0      9    1
0      10   1
Example 1: Survivor Plot
[Figure: Kaplan-Meier survival estimate. X-axis: analysis time, 0 to 10. Y-axis: 0.00 to 1.00]
Example 1: Hazard Plot
[Figure: Smoothed hazard estimate. X-axis: analysis time, 0 to 10. Y-axis: .1 to .5]
Events occur at an even interval… but rate goes up because the risk set dwindles…
Example 1: Integrated Hazard
[Figure: Nelson-Aalen cumulative hazard estimate. X-axis: analysis time, 0 to 10. Y-axis: 0.00 to 3.00]
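The Example 1 curves can be rebuilt by hand. The sketch below assumes the textbook Kaplan-Meier product and Nelson-Aalen sum, with one event divided by the number of cases still at risk at each failure time; the variable names are illustrative and this is not Stata's own code.

```python
# Example 1 by hand: 10 cases enter at t=0 and one fails at each of t=1..10.
from fractions import Fraction

failure_times = list(range(1, 11))
at_risk = len(failure_times)

survivor = Fraction(1)      # Kaplan-Meier estimate, starts at 1
cum_hazard = Fraction(0)    # Nelson-Aalen estimate, starts at 0
rows = []
for t in failure_times:
    jump = Fraction(1, at_risk)     # events / cases at risk at time t
    survivor *= 1 - jump            # Kaplan-Meier product step
    cum_hazard += jump              # Nelson-Aalen sum step
    rows.append((t, at_risk, float(jump), float(survivor), float(cum_hazard)))
    at_risk -= 1                    # the failed case leaves the risk set

for t, n, jump, s, h in rows:
    print(f"t={t:2d}  at risk={n:2d}  hazard step={jump:.3f}  S(t)={s:.2f}  H(t)={h:.3f}")
```

The hazard step climbs from 1/10 to 1/1 even though events arrive at an even pace, and the final cumulative hazard is 1/10 + 1/9 + … + 1/1, about 2.93, which matches the top of the integrated hazard plot.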
Example 2
• Let’s figure out what’s really going on…
• Again, start with 10 people
• Imagine each enters the risk set sequentially, and fails after 1 time unit
  – So only 1 case at risk in any period of time
  – And, 1 event per each point in time

Start  End  Failed?
0      1    1
1      2    1
2      3    1
3      4    1
4      5    1
5      6    1
6      7    1
7      8    1
8      9    1
9      10   1
Example 2: Survivor Plot
[Figure: Kaplan-Meier survival estimate. X-axis: analysis time, 0 to 10. Y-axis: 0.00 to 1.00]
Survivor drops to zero when first case fails… doesn’t go back up when additional cases enter
NOT very informative…
Example 2: Hazard Plot
[Figure: Smoothed hazard estimate. X-axis: analysis time, 0 to 10. Y-axis: .9 to 1]
Hazard basically sits at 1.0. Variations = due to smoothing issues…
That’s because for every time unit at risk there is one event
Interpreting Hazards
• Let’s run an exponential model
  • We’ll estimate the constant only… the baseline hazard

. streg , dist(exponential) nohr

Exponential regression -- log relative-hazard form

No. of subjects =           10                     Number of obs   =        10
No. of failures =           10
Time at risk    =           10
                                                   LR chi2(0)      =      0.00
Log likelihood  =    5.1044126                     Prob > chi2     =         .

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |          0   .3162278     0.00   1.000     -.619795     .619795
------------------------------------------------------------------------------
Why is the base rate zero?
Answer: We need to exponentiate!
Exp(0) = 1
The model estimates the baseline hazard to be 1.0!
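The constant can be checked by hand. The sketch below assumes the standard result for a constant-only exponential model: the estimated rate is failures divided by time at risk, the reported coefficient is its log, and the large-sample standard error of the log rate is 1/sqrt(failures). The function name is illustrative, not part of Stata.

```python
import math

def exponential_baseline(n_failures, time_at_risk):
    """Constant-only exponential model: rate, log rate, std. error, 95% CI."""
    rate = n_failures / time_at_risk      # events per unit of time at risk
    coef = math.log(rate)                 # the _cons reported with nohr
    se = 1 / math.sqrt(n_failures)        # large-sample std. error of log rate
    ci = (coef - 1.959964 * se, coef + 1.959964 * se)
    return rate, coef, se, ci

# Values taken from the output header: 10 failures over 10 units of time at risk.
rate, coef, se, ci = exponential_baseline(n_failures=10, time_at_risk=10)
print(f"rate={rate:.3f}  _cons={coef:.4f}  se={se:.7f}  ci=({ci[0]:.6f}, {ci[1]:.6f})")
```

Exponentiating the coefficient recovers the rate, so a constant of 0 corresponds to one event per time-unit at risk.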
Example 2: Integrated Hazard
[Figure: Nelson-Aalen cumulative hazard estimate. X-axis: analysis time, 0 to 10. Y-axis: 0.00 to 10.00]
Integrated Hazard reaches 10
Same number of events as previous example… but less time-at-risk… so overall cumulated risk was higher
Example 3
• Let’s keep those same cases but add 10 more
• Each in risk for 1 time-unit; all of which are censored

Start  End  Failed?
0      1    1
1      2    1
2      3    1
3      4    1
4      5    1
5      6    1
6      7    1
7      8    1
8      9    1
9      10   1
0      1    0
1      2    0
2      3    0
3      4    0
4      5    0
5      6    0
6      7    0
7      8    0
8      9    0
9      10   0
Example 3: Hazard Plot
[Figure: Smoothed hazard estimate. X-axis: analysis time, 0 to 10. Y-axis: .45 to .5]
The risk set is doubled, but # events stays the same…
So, hazard drops by half… to .5
Interpreting Hazards
• Let’s run an exponential model
  • We’ll estimate the constant only… the baseline hazard

. streg , dist(exponential) nohr

Exponential regression -- log relative-hazard form

No. of subjects =           20                     Number of obs   =        20
No. of failures =           10
Time at risk    =           20
                                                   LR chi2(0)      =      0.00
Log likelihood  =   -1.8270592                     Prob > chi2     =         .

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -.6931472   .3162278    -2.19   0.028    -1.312942   -.0733521
------------------------------------------------------------------------------
Exp(-.693) = .5
The baseline hazard rate is .5…
Example 3: Integrated Hazard
[Figure: Nelson-Aalen cumulative hazard estimate. X-axis: analysis time, 0 to 10. Y-axis: 0.00 to 5.00]
Likewise, integrated hazard is only half as big…
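The integrated hazards for Examples 2 and 3 can be reproduced from the Start/End/Failed? tables. The sketch below assumes the usual Nelson-Aalen rule (events divided by cases at risk, summed over failure times) and the common convention that a case censored at time t is still at risk for failures at t. The function is illustrative, not Stata's implementation.

```python
def nelson_aalen(records):
    """Nelson-Aalen cumulative hazard from (start, end, failed) spells.

    A spell is at risk at failure time t if start < t <= end, so a case
    censored at t still counts in the risk set for failures at t.
    """
    failure_times = sorted({end for _, end, failed in records if failed})
    total, path = 0.0, []
    for t in failure_times:
        events = sum(1 for _, end, failed in records if failed and end == t)
        at_risk = sum(1 for start, end, _ in records if start < t <= end)
        total += events / at_risk
        path.append((t, at_risk, total))
    return path

# Example 2: each case enters as the previous one fails.
example2 = [(t, t + 1, 1) for t in range(10)]
# Example 3: the same ten spells plus ten censored spells over the same intervals.
example3 = example2 + [(t, t + 1, 0) for t in range(10)]

for name, data in (("Example 2", example2), ("Example 3", example3)):
    t, at_risk, total = nelson_aalen(data)[-1]
    print(f"{name}: at risk at t={t} is {at_risk}, cumulative hazard = {total:.2f}")
```

With one case at risk per failure, each step adds 1/1; with two at risk, each step adds 1/2. Ten steps give 10 and 5 respectively.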
Example 4
• What about when events occur in clumps?
• Example: two dense clusters of events
  – Between times 1-2 and 4-5

Start  End   Failed?
0      1     1
0      1.25  1
0      1.5   1
0      1.75  1
0      2     1
0      3     1
0      4     1
0      4.25  1
0      4.5   1
0      4.75  1
0      5     1
Example 4: Survivor Plot
[Figure: Kaplan-Meier survival estimate. X-axis: analysis time, 0 to 5. Y-axis: 0.00 to 1.00]
Here we see the two “clumps” of events…
Example 4: Hazard Plot
[Figure: Smoothed hazard estimate. X-axis: analysis time, 1 to 5. Y-axis: .2 to .8]
Second “clump” has much higher hazard because the risk set is much smaller…
Default smoothing pretty much wipes out the first clump
Example 4: Hazard Plot, less smoothing
[Figure: Smoothed hazard estimate. X-axis: analysis time, 1 to 5. Y-axis: 0 to 2]
Hazard with “width(.3)”
Now both clumps of events are clearly visible…
Example 4: Integrated Hazard
[Figure: Nelson-Aalen cumulative hazard estimate. X-axis: analysis time, 1 to 5. Y-axis: 0.00 to 3.00]
Note how events with small risk set affect the cumulative hazard more (2nd clump)…
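How much each clump adds to the cumulative hazard can be tallied directly from the Example 4 failure times. The sketch assumes the Nelson-Aalen step of one event divided by the cases still at risk; the grouping into clumps by time window is just for illustration.

```python
# Example 4 by hand: 11 cases enter at t=0 and fail at the listed times.
failure_times = [1, 1.25, 1.5, 1.75, 2, 3, 4, 4.25, 4.5, 4.75, 5]

at_risk = len(failure_times)
clump1 = clump2 = other = 0.0
for t in failure_times:
    step = 1 / at_risk          # Nelson-Aalen jump: one event / cases at risk
    if 1 <= t <= 2:
        clump1 += step          # first cluster, risk set of 11 down to 7
    elif 4 <= t <= 5:
        clump2 += step          # second cluster, risk set of 5 down to 1
    else:
        other += step           # the lone event at t=3
    at_risk -= 1                # the failed case leaves the risk set

print(f"first clump adds  {clump1:.3f}")
print(f"second clump adds {clump2:.3f}")
print(f"total             {clump1 + other + clump2:.3f}")
```

Both clumps contain five events, but the second adds roughly four times as much cumulative hazard because each of its events is divided by a much smaller risk set.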
Interpreting Hazards
• The hazard rate reflects the rate of events per unit time at risk
• A constant hazard of .1 for one time-unit means that roughly 10% of at-risk cases will have events
  – But, things are often more complex than that when hazards are computed in continuous time
    • The rate may vary within the interval depending on how the events are concentrated
    • The risk set may change over the interval… esp. if cases leave the risk set.
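A short calculation shows how close the "10%" reading is for a constant hazard of .1 over one time-unit. It assumes a fixed group of at-risk cases, non-repeated events, and no new entries, so that the share still event-free is exp(-hazard × time); the variable names are illustrative.

```python
import math

hazard = 0.1      # constant rate per time-unit
duration = 1.0    # length of the interval

# With a fixed group of at-risk cases and no new entries, the share still
# event-free after `duration` is exp(-hazard * duration).
share_with_event = 1 - math.exp(-hazard * duration)
print(f"share with an event: {share_with_event:.4f}")

# Splitting the interval into many short steps shows where the gap from .10
# comes from: each step's events are taken from a slightly smaller risk set.
steps = 1000
remaining = 1.0
for _ in range(steps):
    remaining -= remaining * hazard * duration / steps
print(f"stepwise approximation: {1 - remaining:.4f}")
```

The share comes out a little under 10% because cases that fail early in the interval leave the risk set, so later events are drawn from fewer cases.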
Interpreting Integrated Hazards
• Integrated hazards represent the total amount of risk that has accumulated
• If the hazard is constant at .1, the integrated hazard would reach 10 after one hundred time-units…