axio research e-compare a tool for data review bill coar

28
Axio Research E-Compare A Tool for Data Review Bill Coar

Upload: esmeralda-bliss

Post on 01-Apr-2015

219 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Axio Research E-Compare A Tool for Data Review Bill Coar

Axio ResearchAxio Research

E-Compare

A Tool for Data Review

Bill Coar

Page 2: Axio Research E-Compare A Tool for Data Review Bill Coar

MotivationMotivation

• Consider the case when programming with near final data

• Begin running some standard validation checks• Identify problem records and request changes• Desire to know all changes are made, and no

unexpected changes occurred

Page 3: Axio Research E-Compare A Tool for Data Review Bill Coar

MotivationMotivation

• Consider the case where you receive accumulating data throughout the life of a project

• In each iteration, some data has already been reviewed and queried

• For subsequent reviews– Wish to know the requested changes were made– Only review data that is new

Goal is to develop a tool using SAS to assist in these areas of data review

Page 4: Axio Research E-Compare A Tool for Data Review Bill Coar

OutlineOutline

• Identify the goals of the tool (E-compare)• Introduction and steps of E-compare• Look at some examples• Extension to comparing many datasets• Final remarks

Page 5: Axio Research E-Compare A Tool for Data Review Bill Coar

GoalsGoals

• Based on needs of data management group and clinical scientists– Identify new records– Identify which records were changed

• Review new values versus old values– Identify records that did not change– Identify records that were deleted

Page 6: Axio Research E-Compare A Tool for Data Review Bill Coar

Proc CompareProc Compare

• Compares (two) datasets (based on key variables)– Base versus compare

• Identify attributes that differ• Identify variables\records in one but not the other• Allows for variable names to differ but values be

compared• Can set tolerances for defining what is really

“different”• Many other procedure options to assist

Page 7: Axio Research E-Compare A Tool for Data Review Bill Coar

Basics of Proc CompareBasics of Proc Compare

Proc compare base=basedata compare=compddata listvar listobs;

id key variables;

var var1 var2 var3;

with ovar1 ovar2 ovar3;

Run;

In preparing for this presentation, I found the TRANSPOSE option that might help!

Page 8: Axio Research E-Compare A Tool for Data Review Bill Coar

Proc CompareProc Compare

• Pros– Displays a lot of relevant information– Fairly straightforward

• Cons– Not always easy to read

• Amount of text that gets displayed for differences– Non-SAS users seem to be intimidated by it

Page 9: Axio Research E-Compare A Tool for Data Review Bill Coar

Introduction to E-compareIntroduction to E-compare

• Idea originated from talking with data managers and clinical scientists

• Different group with different needs• Many not comfortable working within SAS

– Excel– Review listings

• Desire for repeatability• Extend to many datasets

– D-compare

Page 10: Axio Research E-Compare A Tool for Data Review Bill Coar

Introduction to E-compareIntroduction to E-compare

• Parameters:– Base data, compare data, key variables, variables to

compare (optional), output data, debugging indicator

• Assumes the same data structure, and that the key variables exist

• Uniqueness identified by key variables• Output is a SAS dataset with essentially the same

structure as the input datasets– One additional flag to identify the results of the compare

Page 11: Axio Research E-Compare A Tool for Data Review Bill Coar

Steps in E-CompareSteps in E-Compare

• Sorting and creating working copies of input datasets• Check for uniqueness based on key variables

– First. and last. on the last key variable– Check both the base and compare datasets

• If there are records with duplicate key variables– Print a message in the output and log– Goto the end of the macro to stop execution

%goto NOEXEC;..

%NOEXEC:%mend;

Page 12: Axio Research E-Compare A Tool for Data Review Bill Coar

Steps in E-CompareSteps in E-Compare

• Merge on key variables, create 3 datasets– NEW records (zz_newrecs)– DELETED records (zz_delrecs)– Records in BOTH datasets needed to identify differences

(zz_both)

• Perform proc compare– ID key variables– Default compares all variables– Obtain the output dataset using OUT= and OUTNOEQUAL

options

Page 13: Axio Research E-Compare A Tool for Data Review Bill Coar

Steps in E-CompareSteps in E-Compare

Straight-forward merge…

data zz_newrecs zz_delrecs zz_both;

merge zz_comp(in=a keep=&keyvar)

zz_base(in=b keep=&keyvar);

by &keyvar;

if a and ^b then output zz_newrecs;

if b and ^a then output zz_delrecs;

if (a and b) then output zz_both;

run;

Page 14: Axio Research E-Compare A Tool for Data Review Bill Coar

Steps in E-CompareSteps in E-Compare

Straight-forward proc compare

proc compare base=zz_base compare=zz_comp out=zz_cout noprint outnoequal;

id &keyvar;

%if &compvar ne ALL %then %do;

var &compvar;

%end;

run;

Page 15: Axio Research E-Compare A Tool for Data Review Bill Coar

Steps in E-CompareSteps in E-Compare

• If a record changed, it is in the output data (zz_cout) from proc compare due to the OUTNOEQUAL option

• Merge various datasets on key variables• Identify records that did not change

– Remerge ZZ_COUT with ZZ_BOTH to obtain records that did not change

• For records that did change– Remerge ZZ_COUT with ZZ_BASE to obtain old values– Remerge ZZ_COUT with ZZ_COMP to obtain new values

Page 16: Axio Research E-Compare A Tool for Data Review Bill Coar

Steps in E-CompareSteps in E-Compare

• Set 5 datasets together and define flags using the in= option– 1 - No change– 2 - Change from– 3 - Change to– 4 - New record– 5 - Deleted record

• Clean up work space by deleting interim data, unless– DEBUG option is specified to be TRUE

Page 17: Axio Research E-Compare A Tool for Data Review Bill Coar

Steps in E-CompareSteps in E-Compare

Basic set statement…

data &out;

set zz_nodiff(in=a) zz_diffbase(in=b) zz_diffcomp(in=c) zz_newcomp(in=e) zz_delbase(in=f) ;

by &keyvar;

length zz_compflg $15;

if a then zz_compflg='1 - No Change';

else if b then zz_compflg='2 - Change From';

else if c then zz_compflg='3 - Change To';

else if d then zz_compflg=‘4 - Rec Added';

else if e then zz_compflg=‘5 - Rec Deleted';

label zz_compflg='Per record comparison';

Run;

Page 18: Axio Research E-Compare A Tool for Data Review Bill Coar

Steps in E-CompareSteps in E-Compare

Some cleaning up of the work space…

%if &debug=F %then %do;

proc datasets library=work nodetails nolist;

delete zz_: / memtype=data;

quit;

%end;

Page 19: Axio Research E-Compare A Tool for Data Review Bill Coar

Steps in E-CompareSteps in E-Compare

• Note about DEBUG– If macro does not execute because of non-uniqueness in

key variables, set DEBUG=TRUE– This does not delete the working datasets– Allows one to identify the problem records using a viewtable

Page 20: Axio Research E-Compare A Tool for Data Review Bill Coar

E-compareE-compare

• What E-compare does not do:– Does not identify the variable that changed– Does not indicate if the attributes of a variable change– Does not actually generate a report

• Generation of a report can be added, but…– This component was considered in extending E-compare to

all corresponding datasets in two libraries allowing for a single output

– Proc report or export to Excel– This part is defined by the needs of the users

Page 21: Axio Research E-Compare A Tool for Data Review Bill Coar

E-compare Example OutputE-compare Example Output

• Creation of RTF via Proc Report and ODS• Creation of Excel file via SAS Access to PC File form

ats or ODBC

Consider repeating E-compare on all datasets in two libraries

Page 22: Axio Research E-Compare A Tool for Data Review Bill Coar

Schematic of D-compare with Excel OutputSchematic of D-compare with Excel Output

• Use proc contents output to obtain information about datasets in each

• Identify mismatches (in one library but not the other)• Subset using a list of datasets to exclude• Obtain a list of datasets for looping

Page 23: Axio Research E-Compare A Tool for Data Review Bill Coar

Schematic of D-compare with Excel OutputSchematic of D-compare with Excel Output

• Check if the Excel file exists (may need to delete)• For each iteration, identify key variables from a proc

format and %sysfunc

• For each iteration, perform E-compare• For each iteration, update the Excel file

– Select records to include– SAS\Access to PC File Formats– SAS\Access to ODBC

%let kvars=%sysfunc(putc(&&MEM&I,$fmtname.));

Page 24: Axio Research E-Compare A Tool for Data Review Bill Coar

D-compare with Excel OutputD-compare with Excel Output

• Proc export– Requires SAS\Access to PC File Formats– Specify the SHEET to have the name of the dataset being

compared

– Appends to the excel file if it exists

proc export data=zz_fnl outfile="&OUTFILE" DBMS=excel;sheet="&&MEM&I."; run;

Page 25: Axio Research E-Compare A Tool for Data Review Bill Coar

D-compare with Excel OutputD-compare with Excel Output

• Export using a data step and ODBC– Requires SAS\Access to ODBC– libname prior to iteration through each dataset

– Data step to append within each iteration

LIBNAME _lbxls odbc NOprompt= "dsn=Excel Files; Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}; dbq=&OUTFILE";

DATA _lbxls.&&MEM&I; SET zz_fnl;

run;

Page 26: Axio Research E-Compare A Tool for Data Review Bill Coar

E-compare Example OutputE-compare Example Output

• Creation of Excel file via SAS Access to PC File formats or ODBC

Page 27: Axio Research E-Compare A Tool for Data Review Bill Coar

ConclusionsConclusions

• E-compare is just a different way of looking at Proc Compare results

• Provides the ability to monitor data as changes are applied to the central database

• Reports can be printed or saved to assist in documentation

• Strict data structures allow for simplification across studies

Page 28: Axio Research E-Compare A Tool for Data Review Bill Coar

Any Question?

ConclusionsConclusions