epidemiologic study design and data analysis


  • 8/3/2019 Epidemiologic Study Design and Data Analysis

    Epidemiologic Study Design and Data Analysis

    Shelley Harris, Ph.D.
    Associate Professor
    Department of Epidemiology and Community Health &
    Center for Environmental Studies
    [email protected]

    BIOINFORMATICS TECHNOLOGIES
    LFSC520

    Outline

    Introduction to epidemiology
    Study designs and measures of risk in epidemiology
    Statistical analysis programs
    Epidemiological analysis programs
    An example calculation
    Some cautions

    Digitally signed by Shelley A. Harris
    DN: CN = Shelley A. Harris, C = US, O = VCU
    Reason: I am the author of this document
    Date: 2005.02.02 16:42:43 -05'00'

    Introduction to Epidemiology

    Epidemiology is the study of patterns of disease occurrence and other health-related conditions in human populations and of the factors that influence these occurrences and conditions.

    Leading Causes of Death

    [Bar chart: Percent of Total Deaths by cause. Bar labels as printed: Heart Diseases, Accidents, Cerebrovascular Diseases, Cancer, Diabetes, Chronic Obstructive Lung Diseases, Pneumonia & Influenza, Suicide, Cirrhosis of Liver, HIV Infection, Homicide. Bar values as printed: 33.5%, 23.5%, 6.7%, 4.3%, 4.0%, 3.7%, 2.2%, 1.4%, 1.2%, 1.2%, 1.2%.]

    US data/Adapted from Cancer Journal for Clinicians, 1994.

    Male Cancer Statistics

    Site                     Estimated incidence   Estimated deaths
    Melanoma of skin                  3%                  2%
    Oral                              3%                  2%
    Lung                             16%                 33%
    Pancreas                          2%                  4%
    Stomach                           2%                  3%
    Colon & Rectum                   12%                 10%
    Prostate                         32%                 13%
    Urinary                           9%                  5%
    Leukemia & Lymphomas              7%                  8%
    All others                       14%                 20%

    US data/Adapted from Cancer Journal for Clinicians, 1994.

    Female Cancer Statistics

    Site                     Estimated incidence   Estimated deaths
    Melanoma of skin                  3%                  1%
    Oral                              2%                  1%
    Breast                           32%                 18%
    Lung                             13%                 23%
    Pancreas                          2%                  5%
    Colon & Rectum                   13%                 11%
    Ovary                             4%                  5%
    Uterus                            8%                  4%
    Urinary                           4%                  3%
    Leukemia & Lymphomas              6%                  8%
    All others                       13%                 21%

    US data/Adapted from Cancer Journal for Clinicians, 1994.

    A report from the National Cancer Institute (NCI) estimates that about 1 in 8 women in the United States (approximately 12.8 percent) will develop breast cancer during their lifetime.

    Descriptive Epidemiologic Studies

    correlational or ecologic studies
    used to determine patterns of disease or disability across different populations, geographical areas and time

    Breast Cancer
    International incidence rates (per 100,000 women)

    [Bar chart. Bar labels, in order: Sweden, US, Italy, Netherlands, United Kingdom, France, Germany, Spain, Japan. Bar values, in order: 129.5, 108.8, 108.6, 106.8, 94.3, 84.3, 76.4, 60.1, 37.0.]

    Adapted from International Opportunities in Cancer Management, SRI International, 1994.

    Breast Cancer Rates in American Women

    Breast Cancer Rates

    [Bar chart: breast cancer rates by country; axis from 0 to 35, labeled Rate/10000. Countries shown: Thailand, China, Japan, Hong Kong, USSR, Singapore, Spain, France, Australia, United States, Canada, New Zealand, Denmark, England & Wales.]

    Ecological or Correlational Studies

    generally inexpensive and quick
    substantial international and temporal variations
    relate observed differences in the morbidity or mortality to the spatial and temporal distribution of risk factors
    living habits, genetic composition of groups, occupational or environmental exposures

    Ecological Studies - Breast Cancer?

    Country          Rate of Breast Cancer   Per Capita Consumption of Beer
    Australia                  25                          50
    Germany                    20                          40
    Holland                    18                          40
    England                    17                          35
    France                     15                          30
    Canada                     10                          26
    United States               7                          24

    Ecological Studies - Heart Disease

    Country          Rate of Heart Disease   Per Capita Consumption of Beer
    Australia                   7                          50
    Germany                    10                          40
    Holland                    15                          40
    England                    17                          35
    France                     18                          30
    Canada                     20                          26
    United States              25                          24
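    The pattern in the two beer-consumption charts can be summarized with a correlation coefficient. The sketch below is an illustration only: the helper name pearson_r is ours, and the figures are the seven country values shown on the slide. It computes the group-level correlation of beer consumption with each disease rate.

```python
import math

# Values as shown on the slide, in the order Australia, Germany, Holland,
# England, France, Canada, United States.
beer = [50, 40, 40, 35, 30, 26, 24]
breast_cancer = [25, 20, 18, 17, 15, 10, 7]
heart_disease = [7, 10, 15, 17, 18, 20, 25]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r_breast = pearson_r(beer, breast_cancer)
r_heart = pearson_r(beer, heart_disease)
print(f"beer vs breast cancer: r = {r_breast:.2f}")
print(f"beer vs heart disease: r = {r_heart:.2f}")
```

    The same exposure series correlates strongly and positively with one outcome and strongly and negatively with the other. Both are country-level summaries, so neither says anything about whether individual beer drinkers have more or less disease.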

    Sperm Counts over time (years)

    [Scatter plot with fitted trend line. x-axis: Year (0 to 8); y-axis: Sperm Counts (x 10^6), 0 to 10; R² = 0.9793.]

    Popular Hypotheses:

    Exposure to synthetic environmental estrogens is related to:
    1) increases of breast and prostate cancer over time
    2) differences in rates of breast and prostate cancer between counties
    3) decreases in sperm quality/quantity observed over the last 50 years

    Exposure to natural environmental estrogens accounts for:
    1) differences in rates of breast and prostate cancer between counties and differences in heart disease

    An ecological fallacy is defined as:
    The bias that may occur because an association observed between variables on a group level does not necessarily represent the association that exists at an individual level.
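    A small numerical sketch can make the ecological fallacy concrete. All counts below are hypothetical and chosen only for illustration; the group names, the variable names, and the helper pearson_r are ours. Across the three groups, exposure prevalence and disease rate rise together, yet inside every group exposed and unexposed people have the same risk.

```python
import math

# Hypothetical counts per group:
# (exposed people, exposed cases, unexposed people, unexposed cases)
groups = {
    "A": (20, 2, 80, 8),
    "B": (50, 10, 50, 10),
    "C": (80, 24, 20, 6),
}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


exposure_prevalence = []
disease_rate = []
within_group_rr = {}
for name, (exp_n, exp_cases, unexp_n, unexp_cases) in groups.items():
    total = exp_n + unexp_n
    exposure_prevalence.append(exp_n / total)
    disease_rate.append((exp_cases + unexp_cases) / total)
    # Individual-level comparison inside the group: risk in exposed / risk in unexposed
    within_group_rr[name] = (exp_cases / exp_n) / (unexp_cases / unexp_n)

group_level_r = pearson_r(exposure_prevalence, disease_rate)
print(f"group-level correlation: {group_level_r:.2f}")
for name, rr in within_group_rr.items():
    print(f"group {name}: within-group relative risk = {rr:.2f}")
```

    Here the group-level association is as strong as it can be while the individual-level relative risk is 1 in every group, which is exactly the gap the definition warns about.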

    Geographic Information Systems

    An organized collection of computer hardware, software, geographic data, and personnel designed to efficiently capture, store, manipulate, analyze, and display all forms of geographically referenced data

    Breast Cancer - Sociodemographic and Lifestyle

    sociodemographic
    SES               1.1-2x   (high / low)
    Marital status    1.1-2x   (never / ever)
    Residence         1.1-2x   (urban / rural)
    Race/ethnicity    1.1-2x   (white, Hispanic / Asian)
    Religion          1.1-2x   (Jewish / Mormon)

    lifestyle factors
    Diet (high fat) ?
    Obesity/body shape ??
    high alcohol consumption ??
    low physical activity ??
    Smoking, passive smoke ???

    Established/Suspected Breast Cancer Risk Factors

    Age > 50: 4x
    Family history of disease: 1 relative premenopausal or 2 relatives with any form: 4x
    History of benign breast disease: 4x
    BRCA1 or BRCA2 mutation: 4x
    Ionizing radiation (esp. b/w puberty and 20 yrs): 2-4x
    Lifetime exposure to estrogen
    Lifestyle factors?
    Environmental factors?

    Lifetime exposure to estrogen

    early menarche:            1.1-2x   (11 / 15)
    Nulliparity:               1.1-3x   (nulliparous / parous)
    late age of first birth:   1.1-3x   (>30 / 30 /

    Suspected Environmental/Occupational Causes of Breast Cancer

    1) Low level ionizing radiation
    2) Solvent exposures
    3) Electromagnetic fields (EMF)
    4) Organochlorine compounds
    5) Pesticides

    Analytic Studies

    To test hypotheses it is necessary to conduct analytic epidemiological studies. Analytic studies can be divided into two main types:
    1) Experimental studies -> Clinical trials
    2) Observational Studies

    The Epidemiologic Study

    Controlled Assignment: Experimental Studies
        Non-Randomized Assignment: Community Trials
        Randomized Assignment: Clinical Trials (Efficacy, Effectiveness)

    Uncontrolled Assignment: Observational Studies
        Sampling with Regard to Disease or Effect: Cross-sectional and/or Retrospective Studies
            Exposure or Characteristic at Time of Study: Cross-sectional Studies
            History of Exposure or Characteristic Prior to Time of Study: Retrospective Studies (Case-control)
        Sampling with Regard to Exposure, Characteristic, or Cause: Prospective Studies (cohort, case-cohort)

    Case-Control Studies

    subjects are selected into a study based on their disease status.
    sometimes called retrospective studies or case-referent studies
    most common type of epidemiologic study

    Case-Control Studies

    [Timeline figure: 1960 ... 2001 ... 2030, with arms labeled Retrospective and Prospective.]
    In 2001 select case and control groups
    Follow back in time to determine exposure status

    Measures of Risk or Association

                   Cases   Controls   Totals
    Exposed          a        b         M1
    Not exposed      c        d         M2
    Totals           N1       N2        N

    Odds Ratio (OR) = (a/b) / (c/d)

    Odds of being a case if you are exposed = a/b
    Odds of being a case if you are not exposed = c/d
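    Dividing the two odds gives the cross-product form of the odds ratio. This is a standard algebraic rearrangement rather than something stated on the slide, and it uses the same cell labels a, b, c, d as the table:

```latex
\mathrm{OR} = \frac{a/b}{c/d} = \frac{a\,d}{b\,c}
```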

    Self-reported              Breast Cancer
    pesticide exposure       Yes      No    Totals
    Yes                      499     462       961
    No                        19      56        75
    Totals                   518     518      1036

    Odds Ratio (OR) = (499 / 462) / (19 / 56) = 3.18
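    The pesticide calculation can be checked in a few lines. The sketch below reproduces the slide's odds ratio from the four cell counts. The 95% confidence interval is an addition of ours, not part of the slide; it uses Woolf's logit method, and the function name odds_ratio is hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table with an approximate 95% CI (Woolf's logit method).

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    estimate = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Cell counts from the self-reported pesticide exposure table
or_est, ci_low, ci_high = odds_ratio(499, 462, 19, 56)
print(f"OR = {or_est:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

    The interval is worth reporting alongside the point estimate because the unexposed row is small (75 women), which makes the estimate imprecise.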

    Cohort Studies

    considered a natural experiment
    called follow-up studies, incidence studies, or longitudinal studies
    1) Prospective cohort
    2) Retrospective cohort

    Cohort Studies

    [Timeline figure: 1960 ... 2005 ... 2030, with arms labeled Retrospective and Prospective. Each arm carries the label "In 2005 select exposed and non-exposed groups".]

    Measures of Risk

                   Disease   No Disease   Totals
    Exposed           a          b          M1
    Not exposed       c          d          M2
    Totals            N1         N2         N

    Rate of disease in exposed = a/(a+b) = a/M1
    Rate of disease in non-exposed = c/(c+d) = c/M2
    Relative risk = (a/M1) / (c/M2)

                           Breast cancer
    [PCBs] in blood      Yes       No     Totals
    High                  20     9980      10000
    Low                    5     9995      10000
    Totals                25    19975      20000

    Relative Risk (RR) = (20 / 10,000) / (5 / 10,000) = 4.0
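    The PCB example can be reproduced the same way. This is a minimal sketch: the function name relative_risk is hypothetical, and the confidence interval (the usual log-based approximation) is our addition, not part of the slide. The odds ratio from the same table is computed for comparison.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """Relative risk for a cohort 2x2 table with an approximate 95% CI.

    a = exposed with disease, b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    m1, m2 = a + b, c + d
    estimate = (a / m1) / (c / m2)
    # Standard error of log(RR)
    se_log = math.sqrt(1 / a - 1 / m1 + 1 / c - 1 / m2)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Cell counts from the PCB table
rr_est, rr_low, rr_high = relative_risk(20, 9980, 5, 9995)
or_from_cohort = (20 * 9995) / (9980 * 5)
print(f"RR = {rr_est:.1f}, 95% CI {rr_low:.2f} to {rr_high:.2f}")
print(f"OR from the same table = {or_from_cohort:.3f}")
```

    Because the disease is rare in both groups here, the odds ratio from the same table is almost identical to the relative risk, which is why case-control odds ratios are often read as approximate relative risks.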

    Established Environmental Causes of Breast Cancer

    1) Ionizing radiation

    Some Statistical Analysis Software

    SAS
    SPSS
    SYSTAT
    SPLUS

    SAS is a large, general-purpose package descended from an original program that was designed to run on mainframe computers in a "batch" mode, i.e. by the user submitting a batch of commands and then getting a pile of results in a separate output file (or window, now that Windows and Mac versions are available). Along with a slightly complicated approach to data management, this makes the program harder to learn and compared with SPSS there is less capability to learn by experiment using menus. On the other hand, the data processing capabilities are extremely powerful and the range of statistical procedures wide.

    University of Melbourne

    SPSS is a well-known package particularly popular in the social sciences and psychology. It is a very large and somewhat cumbersome program but also very powerful and capable of performing almost all the standard methods of analysis. Recent Windows versions have a convenient user interface, but it can still be hard to keep track of exactly what you've done. The menu-based interface makes it relatively easy to learn, at least for simple applications.

    University of Melbourne

    S-PLUS is a program for specialist statisticians only. It is an interactive, object-oriented system, with both a wide range of built-in functions and complete programming capabilities for extending these. Probably its most useful feature for us is an extremely powerful and relatively easy-to-use capacity for graphics.

    University of Melbourne

    Epidemiological Analysis Software

    Epi Info (free) http://www.cdc.gov/epiinfo
    PEPI (not so free)
    EGRET (not free)

    Epi Info

    Latest Version: Epi Info Version 3.3

    With Epi Info and a personal computer, epidemiologists and other public health and medical professionals can rapidly develop a questionnaire or form, customize the data entry process, and enter and analyze data. Epidemiologic statistics, tables, graphs, and maps are produced with simple commands such as READ, FREQ, LIST, TABLES, GRAPH, and MAP. Epi Map displays geographic maps with data from Epi Info.

    A new version, Epi Info for Windows, retains many features of the familiar Epi Info for DOS, while offering Windows ease-of-use strengths such as point-and-click commands, graphics, fonts, and printing.

    http://www.cdc.gov/epiinfo/

    Egret

    Egret software is a statistical package that specializes in offering modeling and graphics capabilities to investigators conducting epidemiological and biomedical studies. Egret is a user-friendly statistical package for epidemiologists.

    Comprehensive Set of Models: Many Not Available Elsewhere
    Contingency Tables
    Logistic Regression
    Conditional Logistic Regression*
    Logistic Regression with Random Effects
    Beta-Binomial Regression
    Poisson Regression
    Weibull Regression
    Exponential Regression
    Cox Proportional Hazards Regression
    Cox Regression with Time-Dependent Covariates
    Kaplan-Meier Analysis and Plots
    Extensive Post-Fit Analysis with Plots, Including Delta-Betas, and Hazard Functions
    Plus a new spreadsheet-based data editor and a statistical scratchpad

    Unlike other epidemiology software, Egret permits the case/control ratio to vary over strata without using an approximation for the conditional likelihood function.

    Cancer: Surveillance, Epidemiology and End Results

    The SEER Cancer Statistics reports, publications, public-use data and analysis software are available at the National Cancer Institute web site:
    http://SEER.Cancer.Gov/

    SEER - The small print

    PC Software to Calculate Statistics from SEER and Other Data Sources

    SEER*Stat statistical software can be used to view individual cancer records or calculate incidence, mortality, survival, and prevalence statistics from SEER and other cancer-related databases. All variables in the SEER public-use data are available for analysis. Statistics calculated in SEER*Stat can be viewed, printed, or exported for further analysis using other statistical software, including those described below.

    Joinpoint is statistical software for the analysis of trends using models with several different lines that are connected at the "joinpoints." The software takes trend data (e.g., cancer rates) and fits the simplest joinpoint model that the data allow. Joinpoint is often used to analyze trends in rates calculated by SEER*Stat.

    DevCan software uses lifetable methods to compute the lifetime and age-conditioned probability of developing cancer and dying of cancer in the general population. Input data for the computations include cancer incidence and mortality rates as well as all-cause mortality rates. Data sets are supplied to estimate risks of developing and dying of cancer for over 20 cancer sites by race and sex. In addition, DevCan can be used to calculate the lifetime risk using rates calculated in SEER*Stat and exported for use in DevCan.

    Class exercise - see handouts

    Sample size calculation using:
    1) Hand calculation
    2) Web-based free program (find your own)
    3) Examples in EXCEL and SAS

                           True Difference
    Conclusion of
    Statistical Test       Present                        Absent
    Different              Correct (true positive)        Incorrect: Type I (α) error
                           (1 - β = Power)                (false positive)
    Not Different          Incorrect: Type II (β) error   Correct (true negative)
                           (false negative)

    Power

    [Figure: distribution curves labeled Ho, drawn for n=100 and n=1000, with the alpha and beta regions marked.]
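    The effect of sample size on power can be sketched numerically. The example below is illustrative only: the proportions 0.30 and 0.20 are hypothetical, the function name is ours, and the calculation uses the usual normal approximation for a two-sided test of two proportions with equal group sizes.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the normal approximation with a pooled variance and ignores the
    negligible probability of rejecting in the wrong direction.
    """
    p_bar = (p1 + p2) / 2
    se = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p2) / se - z_alpha)


# Hypothetical true proportions of 30% and 20%, at the two sample sizes
# shown in the figure
power_100 = power_two_proportions(0.30, 0.20, 100)
power_1000 = power_two_proportions(0.30, 0.20, 1000)
print(f"n = 100 per group:  power = {power_100:.2f}")
print(f"n = 1000 per group: power = {power_1000:.2f}")
```

    With the same true difference and the same alpha, the larger study has a much smaller beta, so it is far more likely to land in the "true positive" cell of the table.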

    Sample Size

    $$ n = \left(1 + \frac{1}{k}\right) \frac{\bar{p}\,\bar{q}\,\left(Z_{\alpha/2} + Z_{\beta}\right)^{2}}{\left(p_{1} - p_{2}\right)^{2}} $$
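    The sample size formula can be turned into a short function. This is a sketch under stated assumptions, since the slide does not define its symbols: we read n as the number of cases, k as the number of controls per case, p1 and p2 as the exposure proportions in the two groups, p-bar as the weighted average (p1 + k*p2)/(1 + k) with q-bar = 1 - p-bar, and Z as standard normal quantiles for a two-sided alpha and the chosen power. The function name and the example proportions are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_cases(p1, p2, k=1.0, alpha=0.05, power=0.80):
    """Number of cases needed, with k controls per case.

    n = (1 + 1/k) * p_bar * q_bar * (Z_alpha/2 + Z_beta)^2 / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Assumption: p_bar is the average of p1 and p2 weighted by group size
    p_bar = (p1 + k * p2) / (1 + k)
    q_bar = 1 - p_bar
    n = (1 + 1 / k) * p_bar * q_bar * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)


# Hypothetical exposure proportions: 30% among cases, 20% among controls
n_one_control = sample_size_cases(0.30, 0.20, k=1)
n_two_controls = sample_size_cases(0.30, 0.20, k=2)
print(f"1 control per case:  {n_one_control} cases")
print(f"2 controls per case: {n_two_controls} cases")
```

    Raising k reduces the number of cases required, which is useful when cases are scarce and controls are cheap to recruit.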

    CRAP Detector #1

    Beware the large sample size.
    Effects can be statistically significant and biologically inconsequential

    CRAP: Circular Reasoning or Anti-intellectual Pomposity

    CRAP Detector #2

    Beware the small sample size
    It is hard to find significant differences and no difference means nothing.
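    Both sample-size cautions can be seen with a simple two-proportion z-test. The counts below are hypothetical and the function name is ours. The first comparison has a tiny difference in a huge sample; the second has a large difference in a small sample.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Huge sample, trivial difference: 10.1% vs 10.0% in a million people each
p_large_sample = two_proportion_p_value(101_000, 1_000_000, 100_000, 1_000_000)

# Small sample, sizable difference: 30% vs 20% in 100 people each
p_small_sample = two_proportion_p_value(30, 100, 20, 100)

print(f"large sample, 0.1-point difference: p = {p_large_sample:.3f}")
print(f"small sample, 10-point difference:  p = {p_small_sample:.3f}")
```

    The trivial difference comes out statistically significant at the 0.05 level and the sizable one does not, so the p-value alone says little about biological importance or about the absence of an effect.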

    Some Thoughts

    garbage in garbage out.
    consult the biostatistician and the epidemiologist
    we charge by the hour