
Importing Excel into SAS: A Robust Approach for Difficult-To-Read Worksheets

Name of event: TASS

Location of event: Toronto

Presenter’s name: Bill Sukloff

Branch name: Science & Technology

Date of event: September 25, 2015

Page 2 – November-11-15

Overview of this Presentation

• Background

• Issues encountered when importing Excel Workbooks

• Requirements

• Approach and implementation

• Conclusions

Background

• Air Quality Measurements and Analysis Section, Environment Canada

• Mandate:

– Monitor atmospheric pollutants at multiple sites across Canada

– Data analysis and reporting

• Obtain data from Canada and the U.S. to analyze and report on:

– the chemical composition of the atmosphere,

– atmospheric processes,

– spatial and temporal patterns,

– source-receptor relationships,

– and long-range and transboundary transport of air pollutants.

Canadian Air and Precipitation Monitoring Network

Background

• Data sources

– Laboratories

– Scientists

– External organizations

• Data are often provided in Excel workbooks

Example of a Difficult-to-Read Worksheet

Issues

• Column header names are not always on the first row

• Column header names sometimes span two rows

• Worksheet descriptions often appear in the first few rows

• Characters in numeric fields

– Characters with a special meaning, e.g., “<” indicates a value below the detection limit

– Data entry errors such as 0..01

– Inadvertently repeating column header names

• PROC IMPORT often produces inconsistent variable attributes across files

– i.e., variable types (numeric, character, date, time) and formats
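A DATA step can flag the character problems listed above. It converts each raw value with the INPUT function and the ?? modifier, which returns a missing value instead of writing an invalid-data note. This is a minimal sketch, not the presenter's program. The dataset and variable names are hypothetical. The sample values are made up to mirror the list, including header text (“Sulphate”) repeated in a numeric column.

```sas
/* Classify raw character values that are meant to be numeric.
   All names and sample values are hypothetical. */
data work.numeric_check;
  length raw_value $12 issue $30;
  input raw_value :$12.;
  /* ?? suppresses the invalid-data note; invalid text becomes missing */
  num_value = input(raw_value, ?? 32.);
  if missing(num_value) then do;
    /* a leading "<" is treated as a below-detection-limit flag */
    if substr(left(raw_value), 1, 1) = '<' then issue = 'below detection limit';
    else issue = 'not a valid number';
  end;
datalines;
0.52
<0.01
0..01
Sulphate
;
run;
```

Keeping the two failure kinds apart matters here. A “<” value carries meaning the data originator intended, while “0..01” or stray header text is an entry error to report back.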

Requirements

• Requirements

– No manipulation of original worksheets

– Column headers may appear on any row and may span two rows

– Variable attributes are consistent across multiple worksheets

– A report showing values that are inconsistent with the specified attributes

▪ Characters in a numeric variable

▪ Extra decimal places

▪ Invalid dates and times

▪ The same column header name appearing in more than one column

– SAS/ACCESS Interface to PC Files
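Two of the listed report checks can be sketched as follows: header names shared by more than one column, and values with more decimal places than allowed. The dataset names, the header list, and the two-decimal limit are hypothetical. In a real run, the names would come from the header row(s) and the limit from the variable definition worksheet.

```sas
/* Check 1: the same header name used by more than one column
   (compared case-insensitively). Header list is hypothetical. */
data work.headers;
  length header header_uc $32;
  input column_no header $;
  header_uc = upcase(header);
datalines;
1 Site
2 Date
3 Sulphate
4 Nitrate
5 SULPHATE
;
run;

proc sql;
  create table work.dup_headers as
  select header_uc, count(*) as n_columns
  from work.headers
  group by header_uc
  having count(*) > 1;
quit;

/* Check 2: more decimal places than allowed (2 is an assumed limit) */
data work.decimal_check;
  length raw_value $16;
  input raw_value $;
  allowed_decimals = 2;
  point = index(raw_value, '.');
  /* digits after the point; list input leaves raw_value left-aligned */
  if point > 0 then n_decimals = lengthn(raw_value) - point;
  else n_decimals = 0;
  extra_decimals = (n_decimals > allowed_decimals);
datalines;
1.25
0.125
3
;
run;
```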

Approach

• A variable definition worksheet that specifies:

1. Column name

2. Variable type

▪ Character

▪ Numeric

▪ Date

▪ Time

3. Format

▪ Character: number of characters

▪ Numeric: number of digits before and after the decimal point

▪ Date: format, e.g., yyyy-mm-dd, mm-dd-yyyy

▪ Time: format, e.g., hh:mm, hh:mm:ss
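Once imported, the variable definition worksheet is just a small table. The sketch below translates each row into a SAS informat/format name. It assumes a “3.2”-style notation for numeric formats (3 digits before the decimal point, 2 after). Both the notation and the mapping are assumptions, because the transcript does not show how the original program interprets the worksheet.

```sas
/* Hypothetical variable definitions and an assumed mapping to SAS
   informat/format names (sas_format). */
data work.vardefs;
  infile datalines dsd dlm=',' truncover;
  length column_name $32 var_type $9 spec $10 sas_format $12;
  input column_name $ var_type $ spec $;
  select (upcase(var_type));
    when ('CHARACTER') sas_format = cats('$', spec, '.');
    when ('NUMERIC') do;
      before = input(scan(spec, 1, '.'), 8.);
      after  = input(scan(spec, 2, '.'), ?? 8.);
      if missing(after) then after = 0;
      /* width = sign + integer digits + point (only if decimals) + decimals */
      sas_format = cats(1 + before + (after > 0) + after, '.', after);
    end;
    when ('DATE') do;
      if upcase(spec) = 'YYYY-MM-DD' then sas_format = 'yymmdd10.';
      else if upcase(spec) = 'MM-DD-YYYY' then sas_format = 'mmddyy10.';
    end;
    when ('TIME') do;
      if upcase(spec) = 'HH:MM' then sas_format = 'time5.';
      else if upcase(spec) = 'HH:MM:SS' then sas_format = 'time8.';
    end;
    otherwise;
  end;
datalines;
Site,Character,8
Date,Date,yyyy-mm-dd
Time,Time,hh:mm
Sulphate,Numeric,3.2
;
run;
```

For example, the Sulphate row yields 7.2. That width covers a sign, three digits, the decimal point, and two decimals.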

Approach (continued)

• Import worksheets with all columns in character format

– PROC IMPORT option GETNAMES=NO;

• A DATA step converts the PROC IMPORT results to the attributes specified for each variable

• Creation of a report showing conversion errors
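The conversion step can be sketched as follows. The slide does not explain how GETNAMES=NO leads to character columns. A plausible reason is that the header text then sits in the same column as the data. However, how PROC IMPORT assigns types depends on the DBMS= engine, so this sketch simulates the character result with DATALINES instead of reading a workbook. The column names A to D, the informats, and the %log_error macro are hypothetical.

```sas
/* Simulated PROC IMPORT output (GETNAMES=NO, header rows already dropped).
   In practice work.raw would come from something like:
     proc import datafile="<workbook>" out=work.raw dbms=xlsx replace;
       sheet="<sheet>"; getnames=no;
     run;
   Column names A-D and all values are hypothetical. */
data work.raw;
  infile datalines dsd dlm=',' truncover;
  length A B C D $20;
  input A B C D;
datalines;
ON01,2015-03-02,08:30,0.52
ON01,2015-03-03,08:30,<0.01
ON01,2015-02-30,n/a,0..01
;
run;

/* Write one error row whenever a non-blank raw value fails to convert */
%macro log_error(typed, raw, colname, target);
  if missing(&typed) and not missing(&raw) then do;
    column_name = "&colname";
    raw_value   = &raw;
    target      = "&target";
    output work.conv_errors;
  end;
%mend log_error;

data work.airdata(keep=obs site sample_date sample_time sulphate)
     work.conv_errors(keep=obs column_name raw_value target);
  set work.raw;
  length site $8 column_name $32 raw_value $20 target $12;
  obs = _n_;
  /* informats as they might be taken from the variable definitions */
  site        = A;
  sample_date = input(B, ?? yymmdd10.);
  sample_time = input(C, ?? time5.);
  sulphate    = input(D, ?? 32.);
  format sample_date yymmdd10. sample_time time5. sulphate 7.2;
  output work.airdata;
  %log_error(sample_date, B, Date, yymmdd10.)
  %log_error(sample_time, C, Time, time5.)
  %log_error(sulphate, D, Sulphate, 7.2)
run;
```

Numbers are read with the wide 32. informat rather than 7.2, so a long value such as 1234.567 is not silently cut to 1234.56. Decimal places can be checked separately.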

Variable Definition Worksheet

Code

Implementation

• Open the input worksheet in Excel

• Record the following information:

– Workbook name

– Worksheet name

– Row number of the column headers

– Row number of the second column-header row, if any

– Row number where the data start

– Output SAS dataset name

▪ The WORK library is emptied with each run of the program, so use two-level names (libref.dataset, e.g., archive.airdata) when importing multiple worksheets

– Directory name for the data conversion report (PDFs)
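The recorded information maps onto macro variables in the program's header section. The parameter names below are hypothetical, since the transcript does not show the original program's. The sketch uses the three row numbers to pull the header rows out of a GETNAMES=NO import, join a two-row header into one name per column, and keep only the data rows. It assumes that worksheet row N arrives as observation N. That holds only if the import keeps the description rows, so check it against your own imports.

```sas
/* Hypothetical names for the information recorded above */
%let workbook   = airdata.xlsx;     /* used by PROC IMPORT; not in this sketch */
%let sheet      = Sheet1;           /* used by PROC IMPORT; not in this sketch */
%let hdr_row1   = 3;                /* row of the column headers */
%let hdr_row2   = 4;                /* second header row; leave empty if none */
%let data_row   = 5;                /* first row of data */
%let outds      = archive.airdata;  /* two-level output name; not used here */
%let report_dir = reports;          /* PDF report directory; not used here */

/* Simulated GETNAMES=NO import: worksheet row N is observation N (assumed) */
data work.raw;
  infile datalines dsd dlm=',' truncover;
  length A B C $40;
  input A B C;
datalines;
Precipitation chemistry 2015,,
Provisional data,,
Site,Sample,SO4
,Date,mg/L
ON01,2015-03-02,0.52
ON01,2015-03-03,<0.01
;
run;

/* Keep the header row(s); the IN list still works if hdr_row2 is empty */
data work.hdr_rows;
  set work.raw;
  if _n_ in (&hdr_row1 &hdr_row2);
run;

/* One observation per column; part1 and part2 hold the two header cells */
proc transpose data=work.hdr_rows out=work.hdr_t prefix=part;
  var A B C;
run;

data work.headers(keep=_name_ header);
  set work.hdr_t;
  length header $80;
  header = catx(' ', of part:);
run;

/* Data rows start at the recorded row */
data work.data_rows;
  set work.raw(firstobs=&data_row);
run;
```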

Implementation (continued)

• Enter the required information into the macro parameters in the header section of the SAS program

• Alternatively, when you have many worksheets, move the macro parameters to a new SAS program and “%include” the SAS conversion program

– Repeat the macro parameters and %include statement for each worksheet

• Copy the header row(s) into a new worksheet

– Select “transpose” when pasting

– Insert the three variable definition headings on the top row

– Enter the variable attributes for each column heading

Output SAS Dataset

Conversion Report
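The transcript does not reproduce the report itself. One way to produce such a PDF is to wrap PROC PRINT in ODS PDF. The sketch assumes an error table with one row per value that failed to convert. The column names, labels, and the report_dir value (set to the WORK path here so the example runs anywhere) are hypothetical.

```sas
/* Sketch: write conversion errors to a PDF report.
   Table layout and report_dir are assumptions. */
%let report_dir = %sysfunc(pathname(work));

data work.conv_errors;
  infile datalines dsd dlm=',' truncover;
  length column_name $32 raw_value $20 target $12;
  input obs column_name raw_value target;
  label obs = 'Data row' column_name = 'Column'
        raw_value = 'Value in worksheet' target = 'Expected format';
datalines;
2,Sulphate,<0.01,7.2
3,Date,2015-02-30,yymmdd10.
3,Sulphate,0..01,7.2
;
run;

ods pdf file="&report_dir/conversion_report.pdf";
title 'Values that could not be converted';
proc print data=work.conv_errors noobs label;
run;
title;
ods pdf close;
```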

Conclusions

• Keys to a robust method for difficult-to-read Excel worksheets:

– Variable attributes are specified in a variable definition worksheet

– Import worksheets as character and programmatically convert to numeric, date, and time variables

• Combining data from multiple worksheets is easy because variable attributes are forced to be consistent

• Invalid numeric entries are reported to data originators

• For a copy of the SAS program, please contact [email protected]