combining data vertically
DESCRIPTION
Combining Data VerticallyTRANSCRIPT
3-32Chapter 3 Combining Data Vertically
3.3 Solutions to Exercises3-33
Chapter 3 Combining Data Vertically
3-33.1Appending Raw Data Files
3.2Appending SAS Data Sets3-243.3Solutions to Exercises3-33
3.1 Appending Raw Data Files
(Only the FILENAME statement and the FILEVAR= option are discussed in this section.
The FILEREF= option is an alias to the FILE= option.
In addition to viewing the values of the raw data file, you need to obtain the file layout for the fields in the raw data file.
PROC FSLIST (Full Screen List) is part of Base SAS beginning in Version 7. XE "FSLIST procedure" Selected commands and menu options:
HEXdisplays data in hexadecimal.
COLSdisplays column numbers.
NUMSdisplays row numbers.
PROC FSLIST displays a raw data file in browse mode. To display a raw data file in edit mode within the SAS interactive windowing mode, select File ( Open from the Program Editor or Enhanced Editor window. To remove the contents of the raw data file from the Program Editor or Enhanced Editor window, select Edit ( Clear All.
For z/OS (OS/390):
proc fslist file = '.prog3.rawdata(month1)';
run;
Use multiple INFILE statements to read a record from one raw data file, a record from the second raw data file, a record from the third raw data file, and so on (similar to an interleave).
Use the FILENAME statement to concatenate multiple raw data files whose names can be hard-coded.
Use the FILEVAR= option to conditionally concatenate multiple raw data files.
A FILENAME statement can associate a fileref with multiple physical external files. XE "DESCRIBE statement:DATA step view"
XE "DATA step views:DESCRIBE statement"
XE "FILENAME statement:syntax"
Under z/OS (OS/390):
filename Q1 ('.prog3.rawdata(month1)' '.prog3.rawdata(month2)' '.prog3.rawdata(month3)');
Log
filename Q1 ('month1.dat' 'month2.dat' 'month3.dat');
data firstq;
infile Q1;
input Flight $ Origin $ Dest $ Date : date9. RevCargo : comma15.2;
run;
NOTE: The infile Q1 is:
File Name=c:\workshop\winsas\prog3\month1.dat,
File List=('c:\workshop\winsas\prog3\month1.dat'
'c:\workshop\winsas\prog3\month2.dat'
'c:\workshop\winsas\prog3\month3.dat'),
RECFM=V,LRECL=256
NOTE: The infile Q1 is:
File Name=c:\workshop\winsas\prog3\month2.dat,
File List=('c:\workshop\winsas\prog3\month1.dat'
'c:\workshop\winsas\prog3\month2.dat'
'c:\workshop\winsas\prog3\month3.dat'),
RECFM=V,LRECL=256
NOTE: The infile Q1 is:
File Name=c:\workshop\winsas\prog3\month3.dat,
File List=('c:\workshop\winsas\prog3\month1.dat'
'c:\workshop\winsas\prog3\month2.dat'
'c:\workshop\winsas\prog3\month3.dat'),
RECFM=V,LRECL=256
NOTE: 2299 records were read from the infile Q1.
The minimum record length was 33.
The maximum record length was 37.
NOTE: 2090 records were read from the infile Q1.
The minimum record length was 33.
The maximum record length was 37.
NOTE: 2297 records were read from the infile Q1.
The minimum record length was 33.
The maximum record length was 37.
NOTE: The data set WORK.FIRSTQ has 6686 observations and 5 variables.
NOTE: DATA statement used:
real time 0.31 seconds
cpu time 0.12 seconds
The value of a FILEVAR= variable option is a character string that contains the physical filename of the raw data file to be read.
When the next INPUT statement executes, it reads from the new file that the FILEVAR= variable option specifies.
Like automatic variables, the FILEVAR= variable option is not written to the data set.
The FILEVAR= variable option can read raw data files conditionally. You can construct the names of the raw data files programatically. XE "INFILE statement:FILEVAR= option"
XE "FILEVAR= option:INFILE statement"
The placeholder must be eight characters or fewer, and must begin with an alpha character or underscore, followed by alphanumeric characters or underscores.
If the characters-to-remove option is omitted, the COMPRESS function removes blanks from the source.
The STOP statement is needed to prevent an infinite loop of the DATA step.
Log
data movingq;
do i = 11,10,9;
NextFile = "month"!!put(I,2.)!!".dat";
NextFile = compress(NextFile,' ');
infile xxx filevar=NextFile;
input Flight $ Origin $ Dest $ Date : date9. RevCargo : comma15.2;
output;
end;
stop;
run;
NOTE: The infile XXX is:
File Name=c:\workshop\winsas\prog3\month11.dat,
RECFM=V,LRECL=256
NOTE: The infile XXX is:
File Name=c:\workshop\winsas\prog3\month10.dat,
RECFM=V,LRECL=256
NOTE: The infile XXX is:
File Name=c:\workshop\winsas\prog3\month9.dat,
RECFM=V,LRECL=256
NOTE: 1 record was read from the infile XXX.
The minimum record length was 37.
The maximum record length was 37.
NOTE: 1 record was read from the infile XXX.
The minimum record length was 37.
The maximum record length was 37.
NOTE: 1 record was read from the infile XXX.
The minimum record length was 37.
The maximum record length was 37.
NOTE: The data set WORK.MOVINGQ has 3 observations and 6 variables.
NOTE: DATA statement used:
real time 0.15 seconds
cpu time 0.01 seconds
The END= variable option is not written to the SAS data set.
(Obtain the month number of todays date to begin the rolling quarter range.
(Calculate the month numbers of the two months prior to todays month number.
See the SAS documentation for more information about the INTNX function.
Reading Multiple Raw Data Filesc03s1d4data movingq;
drop MonNum MidMon LastMon I;
MonNum=month(today());
MidMon=month(intnx('month',today(),-1));
LastMon=month(intnx('month',today(),-2));
do i=MonNum, MidMon, LastMon;
NextFile="month"!!put(i,2.)!!".dat";
NextFile=compress(NextFile,' ');
do until (LastObs);
infile xxx filevar=NextFile end=LastObs;
input Flight $ Origin $ Dest $ Date : date9.
RevCargo : comma15.2;
output;
end;
end;
stop;
run;
For z/OS (OS/390):
NextFile = '.prog3.rawdata(month'!!put(i,2.)!!')';
Log
data movingq;
drop MonNum MidMon LastMon I;
MonNum=month(today());
MidMon=month(intnx('month',today(),-1));
LastMon=month(intnx('month',today(),-2));
do i=MonNum, MidMon, LastMon;
NextFile="month"!!put(i,2.)!!".dat";
NextFile=compress(NextFile,' ');
do until (LastObs);
infile xxx filevar=NextFile end=LastObs;
input Flight $ Origin $ Dest $ Date : date9.
RevCargo : comma15.2;
output;
end;
end;
stop;
run;
NOTE: The infile XXX is:
File Name=c:\workshop\winsas\prog3\month2.dat,
RECFM=V,LRECL=256
NOTE: The infile XXX is:
File Name=c:\workshop\winsas\prog3\month1.dat,
RECFM=V,LRECL=256
NOTE: The infile XXX is:
File Name=c:\workshop\winsas\prog3\month12.dat,
RECFM=V,LRECL=256
NOTE: 2090 records were read from the infile XXX.
The minimum record length was 33.
The maximum record length was 37.
NOTE: 2299 records were read from the infile XXX.
The minimum record length was 33.
The maximum record length was 37.
NOTE: 2190 records were read from the infile XXX.
The minimum record length was 33.
The maximum record length was 37.
NOTE: The data set WORK.MOVINGQ has 6579 observations and 5 variables.
NOTE: DATA statement used:
real time 0.48 seconds
cpu time 0.14 seconds
This program was run in February.
Reference Information
Storing the Raw Data Filenames in a SAS Data Set
If raw data files that are to be read are in the SAS data set ia.rawdata, you can use the following code:
ia.rawdata
Obs ReadIt
1 route1.dat
2 route2.dat
3 route3.dat
4 route4.dat
5 route5.dat
c03ref1.sasdata route1_5;
set ia.rawdata;
infile in filevar = ReadIt end = LastFile;
do while(LastFile = 0);
input @1 RouteID $7.
@8 Origin $3.
@11 Dest $3.
@14 Distance 5.
@19 Fare1st 4.
@23 FareBusiness 4.
@27 FareEcon 4.
@31 FareCargo 5.;
output;
end;
run; XE "raw data file names:storing in a SAS data set"
XE "DATALINES statement" Storing the Raw Data Filenames in an External File
If the raw data files to be read are in the external file rawfiles.dat, you can use the following code:
rawfiles.dat
route1.dat
route2.dat
route3.dat
route4.dat
route5.datc03ref2.sasdata route1_5;
infile 'rawfiles.dat';
input ReadIt $ 10.;
infile in filevar = ReadIt end = LastFile;
do while(LastFile = 0);
input @1 RouteID $7.
@8 Origin $3.
@11 Dest $3.
@14 Distance 5.
@19 Fare1stclass 4.
@23 FareBusiness 4.
@27 FareEcon 4.
@31 FareCargo 5.;
output;
end;
run;
XE "SAS data sets:creating from multiple raw data files" \r "rdfrm"
XE "raw data files:reading multiple" \r "rdfrm"
XE "multiple raw data files:reading" \r "rdfrm"
XE "reading multiple raw data files" \r "rdfrm"
XE "raw data file names:storing in an external file" Exercises1. Concatenate the companys annual raw data files for the current year and previous two years using the FILEVAR= option. Create a SAS data set named last3. The raw data files use the following naming convention:
For directory based:y2002.dat
For z/OS (OS/390):'.prog3.rawdata(y2002)'
Use the following INPUT statement:
input Flight $ Date : date9. Depart : time5.;
Partial Output
The SAS System
Obs Flight Date Depart
1 IA10800 01JAN2004 6:35
2 IA10801 01JAN2004 9:35
3 IA10802 01JAN2004 12:35
4 IA10803 01JAN2004 15:35
5 IA10804 01JAN2004 18:35
6 IA10805 01JAN2004 21:35
7 IA10800 01JAN2004 6:35
8 IA10801 01JAN2004 9:35
9 IA10802 01JAN2004 12:35
10 IA10803 01JAN2004 15:35
11 IA10804 01JAN2004 18:35
12 IA10805 01JAN2004 21:35
13 IA10800 01JAN2004 6:35
14 IA10801 01JAN2004 9:35
15 IA10802 01JAN2004 12:35
3.2 Appending SAS Data Sets
(Only PROC APPEND is discussed in this section.
PROC APPEND only reads the data in the DATA= SAS data set, not the BASE= SAS data set.
The FORCE option forces PROC APPEND to concatenate data sets when the DATA= data set contains variables that either
are not in the BASE= data set.
do not have the same type as the variables in the BASE= data set. For variables with a type mismatch, missing values are assigned in the appended observations when the FORCE option is used.
are longer than the variables in the BASE= data set.
The work.moredata data set has 19 variables. The work.partdata data set has 15 variables.
The FORCE option was required because of the variable type mismatch for RouteID. The new observations receive missing values for RouteID.
The FORCE option can cause loss of data due to truncation or dropping variables.
The work.moredata data set has 19 variables. The work.partdata data set has 15 variables.
In addition to the type mismatch for RouteID, the additional variables present in work.moredata require the use of the FORCE option.
Appending Variables with Different Attributes
c03s2d1proc contents data = airports;
run;Partial Output
The CONTENTS Procedure
Data Set Name WORK.AIRPORTS Observations 9397
Member Type DATA Variables 4
Alphabetic List of Variables and Attributes
# Variable Type Len Label
2 City Char 50 City Where Airport is Located
1 Code Char 3 Airport Code 3 Country Char 40 Country Where Airport is Located
4 Name Char 50 Airport Name
proc contents data = acities;
run;
Partial Output
The CONTENTS Procedure
Data Set Name WORK.ACITIES Observations 52
Member Type DATA Variables 4
Alphabetic List of Variables and Attributes
# Variable Type Len Label
1 City Char 30 City Where Airport is Located
2 Code Char 3 Start Point 4 Country Char 40 Country Where Airport is Located
3 Name Char 50 Airport Name
proc append data = airports base = acities force;
run;
proc contents data = acities;
run;
Log
proc append data=airports base=acities force;
run;
NOTE: Appending WORK.AIRPORTS to WORK.ACITIES.
WARNING: Variable City has different lengths on BASE and DATA files
(BASE 30 DATA 50).
NOTE: FORCE is specified, so dropping/truncating will occur.
NOTE: There were 9397 observations read from the data set WORK.AIRPORTS.
NOTE: 9397 observations added.
NOTE: The data set WORK.ACITIES has 9449 observations and 4 variables.
NOTE: PROCEDURE APPEND used:
real time 0.04 seconds
cpu time 0.04 secondsPartial Output
The CONTENTS Procedure
Data Set Name WORK.ACITIES Observations 9449
Member Type DATA Variables 4
Alphabetic List of Variables and Attributes
# Variable Type Len Label
1 City Char 30 City Where Airport is Located
2 Code Char 3 Start Point 4 Country Char 40 Country Where Airport is Located
3 Name Char 50 Airport Name
proc contents data = allemps;
run;
Output
The CONTENTS Procedure
Data Set Name WORK.ALLEMPS Observations 550
Member Type DATA Variables 5
Alphabetic List of Variables and Attributes
# Variable Type Len
5 Division Char 30
1 EmpID Char 6
2 LastName Char 15
4 Location Char 13
3 Phone Char 4
proc contents data = pilots;
run;
Output
The CONTENTS Procedure
Data Set Name WORK.PILOTS Observations 31
Member Type DATA Variables 5
Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat Label
2 Division Char 30 $30. $30. Division
5 EmpID Char 6 $6. $6. Employee Identification Number
3 LastName Char 32 $32. $32. Employee Last Name
4 Location Char 16 $16. $16. Employee Office Location
1 Phone Num 8proc append base = allemps data = pilots force;
run;
Log
proc append base=allemps data=pilots force;
run;
NOTE: Appending WORK.PILOTS to WORK.ALLEMPS.
WARNING: Variable LastName has different lengths on BASE and DATA files
(BASE 15 DATA 32).
WARNING: Variable Phone not appended because of type mismatch.
WARNING: Variable Location has different lengths on BASE and DATA files
(BASE 13 DATA 16).
NOTE: FORCE is specified, so dropping/truncating will occur.
NOTE: There were 31 observations read from the data set WORK.PILOTS.
NOTE: 31 observations added.
NOTE: The data set WORK.ALLEMPS has 581 observations and 5 variables.
NOTE: PROCEDURE APPEND used:
real time 0.01 seconds
cpu time 0.01 seconds
proc print data = allemps;
var phone;
run;
Output
Obs Phone
547 1003
548 1028
549 1070
550 1016
551
552
553
554proc contents data = allemps;
run;
Output
The CONTENTS Procedure
Data Set Name WORK.ALLEMPS Observations 581
Member Type DATA Variables 5
Alphabetic List of Variables and Attributes
# Variable Type Len
5 Division Char 30
1 EmpID Char 6
2 LastName Char 15
4 Location Char 13
3 Phone Char 4
Summary
DATA= data set contains variables thatForce Required?Consequences
Are not in the BASE= data setyesExtra DATA= data set variables are dropped.
Do not have the same type as the variables in the BASE= data setyesData not appended. Missing values are assigned to mismatched data.
Are longer than the variables in the BASE= data setyesDATA = data set variable values truncated.
Are in the BASE= data set, but BASE= data set has more variablesnoMissing values are assigned to extra BASE= data set variables.
Exercises2. Append ia.quarter4 to ia.y2000. First, determine if the data sets have the same variables. The resulting data set should be ia.y2000 data with the additional observations from ia.quarter4.
Partial Output: Added Observations
ia.y2000 with Quarter4 Data
Obs CrgoRev1 CrgoRev2 CrgoRev3 CrgoRev4
265 $3,281,364 $558,698 $2,094,261 $1,814,348
266 $3,296,780 $534,094 $2,403,148 $1,803,004
267 $3,317,456 $567,020 $2,155,557 $1,822,840
268 $3,279,250 $526,076 $1,893,366 $1,801,768
269 $3,260,316 $552,722 $2,133,225 $1,834,500
270 $3,243,090 $559,722 $2,337,188 $1,849,388
271 $3,293,606 $531,262 $2,132,043 $1,824,242
272 $3,268,782 $553,850 $2,114,361 $1,828,158
273 $3,227,646 $545,726 $2,369,204 $1,825,288
274 $3,287,060 $549,280 $2,132,679 $1,817,324
275 $3,281,134 $555,670 $1,917,524 $1,769,740
276 $3,270,620 $572,136 $2,102,609 $1,775,210
277 $3,296,466 $592,800 $2,352,088 $1,797,826
278 $3,299,664 $542,860 $2,102,151 $1,846,074
279 $3,283,118 $538,246 $2,135,697 $1,795,390
280 $3,212,646 $528,154 $2,403,092 $1,800,462
Obs CrgoRev5 CrgoRev6 Date
265 $216,498 $1,229,390 22SEP2000
266 $233,466 $975,811 23SEP2000
267 $217,542 $943,923 24SEP2000
268 $219,428 $967,185 25SEP2000
269 $214,046 $985,297 26SEP2000
270 $212,828 $949,119 27SEP2000
271 $223,846 $943,461 28SEP2000
272 $219,926 $1,194,524 29SEP2000
273 $219,114 $974,305 30SEP2000
274 $219,792 $972,585 01OCT2000
275 $225,944 $984,625 02OCT2000
276 $215,386 $981,231 03OCT2000
277 $216,650 $941,179 04OCT2000
278 $217,364 $980,101 05OCT2000
279 $225,754 $1,211,148 06OCT2000
280 $213,560 $969,143 07OCT20003.3 Solutions to Exercises1. Concatenate the companys annual raw data files for the current year and previous two years using the FILEVAR= option. Create a SAS data set named last3. The raw data files use the following naming convention:
For directory based:y2002.dat
For z/OS (OS/390):'.prog3.rawdata(y2002)'
Use the following INPUT statement:
input Flight $ Date : date9. Depart : time5.;
Save your SAS program.
For directory based:
ch3ex1.sas
For z/OS (OS/390):
'.prog3.sascode(ch3ex1)'
data last3(drop=year thisyear);
thisyear=year(today());
do year=thisyear to thisyear-2 by -1;
NextFile="y"!!put(year,4.)!!".dat";
do until(Last);
infile xxx filevar=NextFile end=Last;
input Flight $ Date : date9. Depart : time5.;
output;
end;
end;
stop;
run;
proc print data=last3;
format Date date9. Depart time5.;
run;
2. Append ia.quarter4 to ia.y2000. First, determine if the data sets have the same variables. The resulting data set should be ia.y2000 data with the additional observations from ia.quarter4.
proc append data = ia.quarter4 base = ia.y2000 force;
run;