![Page 1: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/1.jpg)
Lecture 4
• Ways to get data into SAS
• Some practice programming
• Review of statistical concepts
![Page 2: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/2.jpg)
Getting data into SAS
• DATALINES statement– Data is contained within a data step
• INFILE statement– Data contained in separate file
• PROC IMPORT– Data contained in separate file
![Page 3: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/3.jpg)
* List Directed Input: Reading data values separated by spaces.;
DATA bp;
  INFILE DATALINES;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
C 84 138 93 143
D 89 150 91 140
A 78 116 100 162
A . . 86 155
C 81 145 86 140
;
RUN;

TITLE 'Data Separated by Spaces';
PROC PRINT DATA=bp;
RUN;
Obs  clinic  dbp6  sbp6  dbpbl  sbpbl
  1  C         84   138     93    143
  2  D         89   150     91    140
  3  A         78   116    100    162
  4  A          .     .     86    155
  5  C         81   145     86    140
![Page 4: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/4.jpg)
* List Directed Input: Reading data values separated by commas;
DATA bp;
  INFILE DATALINES DLM = ',';
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,.,.,86,155
C,81,145,86,140
;
RUN;

TITLE 'Data separated by a comma';
PROC PRINT DATA=bp;
RUN;
![Page 5: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/5.jpg)
* List Directed Input: Reading data values from a .csv type file;
DATA bp;
  INFILE DATALINES DLM = ',' DSD;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140
;

TITLE 'Reading in Data using the DSD Option';
PROC PRINT DATA=bp;
RUN;
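To see what DSD changes for the row with missing values, the sketch below reads the same comma-delimited line with and without the option. The data set names nodsd and withdsd are made up for this illustration, and MISSOVER is added to the first step only so that SAS does not look for the missing fields on a following line.

```sas
* Without DSD, consecutive commas are treated as a single delimiter;
* so 86 and 155 shift left into dbp6 and sbp6;
DATA nodsd;
  INFILE DATALINES DLM = ',' MISSOVER;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
A,,,86,155
;
RUN;

* With DSD, every comma ends a field, so the empty fields are read as missing;
DATA withdsd;
  INFILE DATALINES DLM = ',' DSD;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
A,,,86,155
;
RUN;

PROC PRINT DATA=nodsd;
RUN;
PROC PRINT DATA=withdsd;
RUN;
```

Comparing the two printouts shows why a .csv file with empty fields is normally read with DSD.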
![Page 6: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/6.jpg)
* List Directed Input: Reading data values separated by tabs (.txt files);
* The data lines use tab characters as separators. In row 4 the two missing values appear as consecutive tabs;
DATA bp;
  INFILE DATALINES DLM = '09'x DSD;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
C	84	138	93	143
D	89	150	91	140
A	78	116	100	162
A			86	155
C	81	145	86	140
;

TITLE 'Reading in Data separated by a tab';
PROC PRINT DATA=bp;
RUN;
![Page 7: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/7.jpg)
* Reading data from an external file;
DATA bp;
  INFILE '/home/ph5415/data/bp.csv' DSD FIRSTOBS = 2;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
RUN;

TITLE 'Reading in Data from an External File';
PROC PRINT DATA=bp;
RUN;
Content of bp.csv:
clinic,dbp6,sbp6,dbpbl,sbpbl
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140
![Page 8: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/8.jpg)
*Using PROC IMPORT to read in data ;
PROC IMPORT DATAFILE='/home/ph5415/data/bp.csv'
  OUT = bp
  DBMS = csv REPLACE;
  GETNAMES = yes;
RUN;

TITLE 'Reading in Data Using PROC IMPORT';
PROC PRINT DATA=bp;
RUN;
PROC CONTENTS DATA=bp;
RUN;
![Page 9: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/9.jpg)
The CONTENTS Procedure
Data Set Name: WORK.BP                              Observations:         5
Member Type:   DATA                                 Variables:            5
Engine:        V8                                   Indexes:              0
Created:       18:15 Tuesday, January 25, 2005      Observation Length:   40
Last Modified: 18:15 Tuesday, January 25, 2005      Deleted Observations: 0
Protection:                                         Compressed:           NO
Data Set Type:                                      Sorted:               NO
Label:

-----Alphabetic List of Variables and Attributes-----

#  Variable  Type  Len  Pos
---------------------------
1  clinic    Char    8   32
2  dbp6      Num     8    0
4  dbpbl     Num     8   16
3  sbp6      Num     8    8
5  sbpbl     Num     8   24
![Page 10: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/10.jpg)
Some Definitions
• Statistics: The art and science of collecting, analyzing, presenting, and interpreting numerical data.
• Data: Facts and figures that are analyzed
• Dataset: All the data collected for a study
• Elements: Units on which data are collected
– People, companies, schools, households
• Variables: Characteristics measured on elements
– People (height, weight)
– Company (number of employees)
– Schools (percentage of students who graduate in 5 years)
– Households (number of computers owned)
![Page 11: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/11.jpg)
Informal Definition
• Statistics: Gaining information, in a scientific way, about something you do not know
![Page 12: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/12.jpg)
Start With Research Question
• What is the proportion of persons without health insurance in Minnesota?
• Do newer blood pressure (BP) medications prevent heart disease better than older medications?
• What is the relationship between grade point average and SAT scores?
• Do persons who eat more fruits and vegetables (F&V) have a lower risk of developing colon cancer?
• Does the program DARE reduce the risk of young persons trying drugs?
![Page 13: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/13.jpg)
Statistics
1) Start with question
2) Design study and collect data
3) Compute summary data to assess question
4) Make conclusions (inference)
![Page 14: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/14.jpg)
Statistical Inference
• Estimation (Chapter 4)
• Hypothesis Testing (Chapter 5)
– Comparing population proportions (Chap 6)
– Comparing population means (Chap 7)
![Page 15: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/15.jpg)
Common Parameters to Estimate
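| Parameter | Parameter Description |
| --- | --- |
| $\mu$ | Mean of population |
| $p$ | Proportion with a certain trait |
| $\rho$ | Correlation between 2 variables |
| $\mu_1 - \mu_2$ | Difference between 2 means |
| $p_1 - p_2$ | Difference between 2 proportions |
| $\sigma$ | Population standard deviation |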
![Page 16: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/16.jpg)
Statistical Inference
1) Population with mean $\mu$ = ?
2) A simple random sample of n elements is selected from the population.
3) The sample data provide a value for the sample mean $\bar{x}$.
4) The value of $\bar{x}$ is used to make inferences about the value of $\mu$.
![Page 17: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/17.jpg)
Sampling
• Sample: a subset of the target population (usually a simple random sample, in which each possible sample has an equal probability of being selected)
• Different samples yield different estimates
• Trying to understand the population parameter (the “true value”)
– It’s usually not possible to measure the population value
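As a rough illustration of how different samples yield different estimates, the sketch below draws a simple random sample from the bp data set created on the earlier slides and computes the sample mean of baseline diastolic BP. The output name bp_sample, the sample size of 3, and the seed value are arbitrary choices made for this example.

```sas
* Draw a simple random sample of 3 of the 5 observations in bp;
PROC SURVEYSELECT DATA=bp OUT=bp_sample METHOD=SRS SAMPSIZE=3 SEED=5415;
RUN;

* Estimate the mean of dbpbl from the sampled observations only;
PROC MEANS DATA=bp_sample N MEAN;
  VAR dbpbl;
RUN;
```

Rerunning with a different SEED value selects a different sample and will generally give a different sample mean.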
![Page 18: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/18.jpg)
Point Estimate
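| Parameter | Point Estimate | Description |
| --- | --- | --- |
| $\mu$ | $\bar{x}$ | Sample mean |
| $p$ | $\hat{p}$ | Sample proportion |
| $\rho$ | $r$ | Sample correlation |
| $\mu_1 - \mu_2$ | $\bar{x}_1 - \bar{x}_2$ | Difference between 2 sample means |
| $p_1 - p_2$ | $\hat{p}_1 - \hat{p}_2$ | Difference between 2 sample proportions |
| $\sigma$ | $s$ | Sample standard deviation |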
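Several of these point estimates can be obtained from standard procedures. The sketch below assumes the bp data set from the earlier slides is available; which variables to summarize is an arbitrary choice for illustration.

```sas
* Sample mean and sample standard deviation;
PROC MEANS DATA=bp N MEAN STD;
  VAR sbp6;
RUN;

* Sample proportions: the percent of observations in each clinic;
PROC FREQ DATA=bp;
  TABLES clinic;
RUN;

* Sample correlation between baseline and 6-month systolic BP;
PROC CORR DATA=bp;
  VAR sbpbl sbp6;
RUN;
```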
![Page 19: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/19.jpg)
Interval Estimation
In general, confidence intervals are of the form:
$$\text{estimate} \pm 1.96 \times SE$$
SE = standard error of your estimate
Estimate = mean, proportion, regression coefficient, odds ratio...
1.96 = multiplier for a 95% CI based on the normal distribution
![Page 20: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/20.jpg)
Estimation
“What is the average total cholesterol level for MN residents?”
Random sample of cholesterol levels
sample mean = sum of values / number of observations
$$\bar{X} = \frac{\sum X}{n}$$
Estimates the population mean: $\mu$
![Page 21: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/21.jpg)
Estimation
“What is the average total cholesterol level for MN residents?”
sample standard deviation:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
$s$ estimates the population standard deviation: $\sigma$
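In SAS these summary statistics do not need to be computed by hand. A minimal sketch, assuming the bp data set from the earlier slides (the lecture's cholesterol data are not shown, so blood pressure stands in here):

```sas
* N, sample mean, sample standard deviation, and standard error of the mean;
* By default STD uses the n - 1 divisor, matching the formula for s;
PROC MEANS DATA=bp N MEAN STD STDERR;
  VAR sbpbl dbpbl;
RUN;
```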
![Page 22: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/22.jpg)
Confidence Interval Example
Suppose a sample of n = 100
mean = 215 mg/dL, standard deviation = 20 mg/dL
95% CI = $\bar{X} \pm 1.96 \, s/\sqrt{n}$
= (215 - 1.96*20/10, 215 + 1.96*20/10), approximately (211, 219)
$s/\sqrt{n}$ = standard error of the mean
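The same arithmetic can be checked in a DATA step. This is only a sketch: the variable names are chosen for illustration, and the numbers are those of the example above.

```sas
DATA _NULL_;
  n    = 100;   * sample size;
  xbar = 215;   * sample mean in mg/dL;
  s    = 20;    * sample standard deviation;
  se    = s / SQRT(n);        * standard error of the mean;
  lower = xbar - 1.96 * se;   * lower 95 percent limit;
  upper = xbar + 1.96 * se;   * upper 95 percent limit;
  PUT se= lower= upper=;      * writes se=2 lower=211.08 upper=218.92 to the log;
RUN;
```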
![Page 23: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/23.jpg)
Properties of Confidence Intervals
• As sample size increases, the CI gets narrower
– If you could sample the whole population, $\bar{X} = \mu$
• Can use different levels of confidence
– 90, 95, 99% common
– More confidence means a wider interval, so a 90% CI is narrower than a 99% CI
• Changes with population standard deviation
– A more variable population means a wider interval
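These properties can be seen numerically. The sketch below tabulates the width of a normal-based interval, 2 × z × s/√n, for a few sample sizes and confidence levels. The data set name ci_width and the values s = 20 and n = 25, 100, 400 are assumptions made for illustration.

```sas
DATA ci_width;
  s = 20;                                          * assumed standard deviation;
  DO n = 25, 100, 400;
    DO level = 0.90, 0.95, 0.99;
      z = QUANTILE('NORMAL', 1 - (1 - level) / 2); * normal multiplier, about 1.96 for 95 percent;
      width = 2 * z * s / SQRT(n);                 * full width of the interval;
      OUTPUT;
    END;
  END;
RUN;

PROC PRINT DATA=ci_width;
RUN;
```

In the printout, width shrinks as n grows and grows as level rises. Doubling s would double every width.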
![Page 24: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/24.jpg)
Caution with Confidence Intervals
– Data should be from a random sample
– More complicated sampling requires different methods
• Example: multistage or stratified sampling
– Outliers can cause problems
– Non-normal data can change the confidence level
• Skewed data a big problem
– Bias not accounted for
• Non-responders
• Target and sampled population different
![Page 25: Lecture 4](https://reader036.vdocuments.net/reader036/viewer/2022070410/5681455b550346895db22c4f/html5/thumbnails/25.jpg)
95% Confidence Intervals with SAS
1) Construct from output
estimate +/- 1.96*SE
2) Provided automatically by some procedures
PROC MEANS DATA = STUDENTS LCLM UCLM;
  VAR AGE;
RUN;
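A slightly fuller request, sketched below, prints the estimate and its standard error alongside the limits and changes the confidence level. It assumes the same STUDENTS data set and AGE variable as above. Note that PROC MEANS bases its confidence limits on the t distribution, so for small samples they are somewhat wider than estimate +/- 1.96*SE.

```sas
* CLM requests both confidence limits; ALPHA=0.10 gives a 90 percent interval;
PROC MEANS DATA = STUDENTS N MEAN STDERR CLM ALPHA=0.10;
  VAR AGE;
RUN;
```

Omitting ALPHA= leaves the default of 0.05, which gives a 95% interval.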