Copyright © 2004, SAS Institute Inc. All rights reserved.
Paul Kent
VP, SAS Platform Research & Development
Forthcoming Changes in SAS
Where do I come from?
New Hill, North Carolina
Y’all
Johannesburg, South Africa
Julle
Fareham, England
???
R & D :: Loyal Employees
R & D groups, and where I come from
Platform
Clients
Solutions
• With Analytics
What do we programmers do?
Gather Data
Organise Data
Arrange Data for consumption
Facilitate said consumption
Create understanding of Data
Promote understanding of said Data
Value
Who do we programmers do it for?
Audience Continuum
[Figure: Information Delivery Framework. Audience labels: Information Consumers, Domain Experts, Power User, Business Analyst, InfoTech (Large % to Small %). Activity labels: Web Report Viewing, Web Reporting, Power Reporting, Analytic Reporting. Other label: Value.]
Forthcoming Improvements in the SAS Foundation
ODS (and the new ODS statistical graphics)
SAS Database Storage capabilities
The Data Step and Proc SQL
Grid Computing Capabilities
Bits and Pieces
ODS Statistical Graphics
Survival Plot Using PROC LIFETEST in SAS 8
J. Zhou, NESUG 2002
Three-page SAS program with macros
Use GPLOT and GREPLAY for graphics
Statistical Metadata
Overlaid Curves
Statistical Graphics
Essential for modern data analysis
Difficult to create in SAS prior to SAS 9
• Context lost when statistical procedure terminates
• Programmer must recreate context, metadata
Statistical procedures should automatically create graphics
Follow the 80-20 rule – 20% of these might need further tweaking, but for the most part…
Life Is Easier in SAS 9 …
ods graphics on;
ods html file="lifetest.htm";
proc lifetest data=surv;
   time surv*censor(1);
   survival plots=(survival hwb);
   strata trt;
   id patient;
run;
ods html close;
ods graphics off;
LIFETEST Procedure – Survival Plot
LIFETEST Procedure – HWB plot
Usage of ODS Statistical Graphics in SAS 9
Experimental in 30 SAS/STAT and SAS/ETS procedures in SAS 9.1
Automates creation of commonly used graphical displays for a particular analysis
Production in SAS 9.2
[Slides showing example ODS Statistical Graphics output from the following procedures]
PROC GLM
PROC GLM (ANCOVA)
GAM Procedure
HPF Procedure
KDE Procedure (two slides)
LOGISTIC Procedure
MIXED Procedure (two slides)
PHREG Procedure
PLS Procedure
PRINCOMP Procedure
REG Procedure
TIMESERIES Procedure
UCM Procedure
Integration with ODS Styles
Over 30 different styles
New style elements for statistical graphics
• Fitted line
• Confidence lines and bands
• Prediction Lines
• Outliers
• Classification groups
Style Demonstration
ods html file="robustreg.htm" style=journal;
ods graphics on;
title "Journal Style";
proc robustreg data=mydata plot=all;
   model y = x1 x2 x3;
run;
ods html close;
Styles shown: Journal, Analysis, Default, Statistical
(only Summary Statistics and Residual Histogram output shown)
Summary
Goal is to automate creation of graphics by statistical procedures
• Minimum work for user
• Maximum built-in functionality
Experimental in SAS 9.1
Production in SAS 9.2
SAS Transactional Storage (aka SAS Database Capabilities)
Demo Time
1. Color_table
• Remember to start your TableServer
2. Customers
• Remember to start your AppServer (tomcat5)
SAS Transactional Storage (aka SAS Database Capabilities)
A more traditional Database Capability
From SAS (not Oracle, IBM, or Microsoft)
Based on the open-source Firebird database
Real Datatypes – INT, MONEY, VARCHAR
Real Connectors – JDBC, ODBC, SAS Libname
Real Transactions – Rollback and Commit
MultiUser Server
What’s New in SAS Grid Automation
Cheryl Doninger, R&D Director, Grid Development
Roger Thompson, Relationship Manager
Merry Rabb, Product Manager, Grid
Grid Computing Market Size & Growth
Rapid Adoption of Grid Computing Based on Benefits
Grid Adoption is Increasing
A high percentage of firms using analytical applications are considering grid
Two-thirds of firms surveyed are using or considering grid technology
Benefits of Grid Computing
Faster results
More executions – more data
Time to recover from errors
Better use of resources
Virtualize resources
Incremental IT spend
Types of Applications Suitable for Grid
Long running
Many replicate runs of same fundamental task
• simulation (what-if analysis)
• optimization (testing lots of scenarios)
• BY GROUP processing
• data segmentation
Independent tasks running against large data sources
• scoring – risk analysis
• multiple procedures and data steps
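The BY GROUP case above can be sketched outside SAS. The following Python fragment is only an illustration of the idea, not SAS grid code: threads stand in for grid nodes, and the field names (`region`, `value`) and the per-group task are made up for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_group(rows, key):
    """Partition rows into independent groups, similar in spirit to BY-group processing."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def summarize(group_rows):
    """Hypothetical per-group task: the mean of the 'value' field."""
    values = [r["value"] for r in group_rows]
    return sum(values) / len(values)


def run_groups(rows, key, max_workers=4):
    """Run the per-group task concurrently and collect results keyed by group."""
    groups = split_by_group(rows, key)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(summarize, part) for name, part in groups.items()}
        return {name: fut.result() for name, fut in futures.items()}


rows = [
    {"region": "east", "value": 10.0},
    {"region": "west", "value": 4.0},
    {"region": "east", "value": 20.0},
    {"region": "west", "value": 6.0},
]
results = run_groups(rows, "region")
```

Because no group needs data from another group, the tasks can run in any order or on any worker, which is the property that makes this kind of job a good grid candidate.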
SAS Grid Strategy
Infrastructure benefits SAS applications
• large data / complex algorithms
Focus areas
• Development
• Run-time
• System management
Incremental Releases
SAS Grid Roadmap Phase I
SAS 8.2 functionality
• %Distribute
• SAS/CONNECT
• SAS log
SAS Grid Success Stories
Texas Tech University
Statistics Canada
Large Pharmaceutical Company
SAS Grid Roadmap Phase II
SAS 9.1.3 Q3/2005 functionality
• smarter engines for SAS IDEs
• SAS/Platform integration
• SASMC monitoring
Business Analytics - Enterprise Miner on SMP
Business Analytics - Enterprise Miner on Grid
Data Integration – ETL Studio on SMP/Grid (two slides)
Business Intelligence – Enabled on SMP/Grid
[Figure labels: SAS Stored Process, SAS Program, ETL Studio, Enterprise Miner, Web Services]
Grid Manager Plugin – job view
Grid Manager Plugin – host view
SAS 9 Grid Computing Components
Multiple Components Working Together to Provide Grid Computing
[Figure labels: SAS Applications; Enterprise Miner; Stored Processes; Data Integration; SAS Connect; Multi-Processor SAS; Piping; Distribution; Session Spawning; Grid Enabled Code Generation; SAS 9 Grid Computing; NEW September 2005; Grid Manager Plug-in; Platform Suite for SAS; Grid Monitoring; Grid Management; Job Termination; Dynamic Load Balancing; Job, Queue & Host Management]
General Layout of a SAS Grid
[Figure labels: Client Machine; SAS ETL; SAS EM; SAS Foundation; Metadata Server; Grid Control Machine; Grid Mgr plugin; Platform Suite for SAS; SAS Grid; Grid Node (repeated, … n); LSF (repeated)]
Grid Work Flow
[Figure labels: ETL Studio; Enterprise Miner; SAS MC; SAS Metadata; SASServers (Metadata Server, Workspace Server, Connect Client); SASMain – Server Context; Platform Server Component; sas -noobjectserver; LSF; Node1, Node2, Node3, … n]
LSF Cluster File
    Node1 ! ! 1 () (SASMain)
    Node2 ! ! 1 () ()
    Node3 ! ! 1 () (SASMain)
    …
Session options
    session  resource  sascmd               wl options
    -------  --------  -------------------  ----------
    p1       SASMain   sas -noobjectserver
grdsvc_enable(p1, "resource=SASMain");
signon p1;
Partitioning the Grid
[Figure labels: ETL Studio; Enterprise Miner; SAS MC; SAS Metadata; SASServers (Metadata Server, Workspace Server, Connect Client); SASMain – Server Context; Platform Server Component; sas -noobjectserver; EM, ETL; LSF; EM grid; ETL grid; Node1, Node2, Node3, … n]
LSF Cluster File
    Node1 ! ! 1 () (SASMain,EM)
    Node2 ! ! 1 () (SASMain,EM,ETL)
    Node3 ! ! 1 () (SASMain,ETL)
    …
Session options
    session  resource  sascmd               wl options
    -------  --------  -------------------  ----------
    p1       SASMain   sas -noobjectserver  ETL
grdsvc_enable(p1, "resource=SASMain, workload=ETL");
signon p1;
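To make the partitioning idea concrete, here is a small Python sketch of the tag matching implied by the cluster file above. It is an assumption-laden illustration: the real node selection is done by LSF, and the dictionary and the `eligible_nodes` helper are hypothetical names, not part of any SAS or Platform API.

```python
# Hypothetical in-memory copy of the resource tags listed in the LSF cluster file.
NODE_RESOURCES = {
    "Node1": {"SASMain", "EM"},
    "Node2": {"SASMain", "EM", "ETL"},
    "Node3": {"SASMain", "ETL"},
}


def eligible_nodes(resource, workload=None, nodes=NODE_RESOURCES):
    """Return the nodes whose tags include the resource and, if given, the workload."""
    required = {resource}
    if workload:
        required.add(workload)
    # A node qualifies only when it carries every required tag.
    return sorted(name for name, tags in nodes.items() if required <= tags)


etl_nodes = eligible_nodes("SASMain", workload="ETL")  # the "ETL grid"
em_nodes = eligible_nodes("SASMain", workload="EM")    # the "EM grid"
all_nodes = eligible_nodes("SASMain")                  # no workload restriction
```

Node2 carries both workload tags, so it belongs to both partitions; that overlap is how one physical grid can serve two applications without being split into separate clusters.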
Grid Provides: Speed and Efficiency
Analytics are working, so people…
Build more models
• For successively refined segments of customers
Use more data in those models
Integrate the results into operational systems
• <near real time>
A SAS 9.2 DATA step movie
Implications
More multithreading within SAS
Yes, even the DATA STEP
Saved Programs
Multi-Threaded Server Capabilities
• Same model, parallel data for throughput
• Many models, same data – one-off scores in operational systems
Model management can deploy models to "score servers" without restarting them
Bits and Pieces
Reverse Engineer SAS jobs
Checkpoint and Restart SAS jobs
Encode (and protect) your SAS jobs
ZIP functions
CRC …
Protect your IP
PROC SCRAMBLE
    file='myfile.sas'
    outfile='secret.sas' <expire=> <site=> …
    ;
Send secret.sas to your customers
%include 'secret.sas';
• Implies nosource; your macros can reset NOMPRINT…
Checkpoint/Restart and Parallelization Features in the Core Supervisor
Rick Langston, Core Systems Department
Checkpoint/Restart
Craig R.’s request as per user community
Job fails – want to restart where it left off
ETL Studio also wanted a restart facility
A simple solution
Record a checkpoint number, save it in WORK
If restarting, skip PROC / DATA steps to there
Tokenize everything
Execute all global statements
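The "record a checkpoint number, skip to there on restart" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the SAS supervisor's implementation: the `run_job` function, the JSON checkpoint file, and the step names are all invented for illustration, and global statements are not modeled.

```python
import json
import os
import tempfile


def run_job(steps, checkpoint_path, restart=False):
    """Run steps in order, recording the number of the last successful step.

    On restart, steps up to and including the recorded checkpoint are skipped.
    """
    last_done = 0
    if restart and os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            last_done = json.load(fh)["last_done"]
    executed = []
    for number, (name, func) in enumerate(steps, start=1):
        if number <= last_done:
            continue  # completed in an earlier run
        func()  # an exception here leaves the checkpoint at the previous step
        executed.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump({"last_done": number}, fh)
    return executed


fail_once = {"armed": True}


def flaky():
    """Fails on the first call only, to simulate a job that abends once."""
    if fail_once["armed"]:
        fail_once["armed"] = False
        raise RuntimeError("simulated abend")


steps = [("temp1", lambda: None), ("temp2", lambda: None),
         ("check", flaky), ("temp4", lambda: None)]

workdir = tempfile.mkdtemp()  # a directory that survives between the two runs
ckpt = os.path.join(workdir, "checkpoint.json")

try:
    first_run = run_job(steps, ckpt)
except RuntimeError:
    first_run = None  # the job failed at step 3
second_run = run_job(steps, ckpt, restart=True)
```

The checkpoint only helps if the place it is stored, and the data the earlier steps produced, outlive the failed run.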
To set up for checkpointing
Use NOWORKINIT, NOWORKTERM
Have WORK refer to a permanent directory
Use the CHECKPOINT option
Subsequent restarting
Again use NOWORKINIT, NOWORKTERM
Again point WORK to the permanent directory
Use the RESTART option
Job will restart as of the last successful step
Is this what users want?
We can’t do this without the user being proactive
"data temp; set temp;" issues (a step that overwrites its own input)
skipped steps may need to be executed
Output files (flat files – DISP=MOD, databases…)
EXECUTE_ALWAYS
CHECKPOINT / EXECUTE_ALWAYS;
Use it for a step that must be executed
For example, SYMPUT and CALL EXECUTE
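Why a skipped step may still need to run can be shown with a small Python model. It is a sketch only: the `always` flag plays the role of EXECUTE_ALWAYS, the `state` dictionary plays the role of macro variables that do not survive between runs, and every name here is hypothetical.

```python
def run_with_restart(steps, last_done):
    """Re-run a job after `last_done` steps succeeded in an earlier run.

    Each step is (name, func, always). Steps flagged `always` run even when
    they fall before the checkpoint, because later steps depend on state
    they set up.
    """
    state = {}      # in-memory state (think macro variables), lost between runs
    executed = []
    for number, (name, func, always) in enumerate(steps, start=1):
        if number <= last_done and not always:
            continue
        func(state)
        executed.append(name)
    return executed, state


def set_cutoff(state):
    state["cutoff"] = 100  # plays the role of CALL SYMPUT


def load(state):
    state["loaded"] = True


def report(state):
    state["report"] = "rows above %d" % state["cutoff"]  # needs the cutoff


steps = [
    ("set_cutoff", set_cutoff, True),   # must always execute
    ("load", load, False),
    ("report", report, False),
]

# Restart after the first two steps succeeded in a previous run.
executed, state = run_with_restart(steps, last_done=2)
```

If `set_cutoff` were not flagged, the restarted `report` step would fail because the value it depends on was never set in the new session.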
Example
Using options debug='checkpoint-implicit';
Option names still to be decided
data temp1; x=1; run;
data temp2; x=2; run;
data temp3; x=3; run;
data _null_;
if "&sysparm."="1"
then abort abend 999;
run;
data temp4; x=4; run;
Invoke once with checkpoint-implicit
Then reinvoke with restart-implicit
Additional info
Planned for 9.2
Option names still being decided
Wanting additional input
Parallelization Efforts
Reading in arbitrary SAS code
Producing metadata in comments
This could be post-processed by ETL Studio
This could be post-processed by Grid Computing
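One way a post-processor might use such dependency metadata is to group steps into "waves" that can run side by side. The Python sketch below assumes the metadata has already been parsed into a mapping of step name to inputs and outputs; that mapping, the step names, and the `parallel_waves` function are illustrative assumptions, since the talk does not specify a format.

```python
def parallel_waves(steps):
    """Group steps into waves; steps in one wave do not depend on each other.

    `steps` maps a step name to (inputs, outputs). A step depends on another
    step when it reads something that step writes. Simplification: each data
    set is assumed to have a single producer.
    """
    producer = {}
    for name, (_, outputs) in steps.items():
        for out in outputs:
            producer[out] = name
    deps = {
        name: {producer[i] for i in inputs if i in producer and producer[i] != name}
        for name, (inputs, _) in steps.items()
    }
    waves, done = [], set()
    while len(done) < len(steps):
        # A step is ready once everything it depends on has finished.
        ready = sorted(n for n in steps if n not in done and deps[n] <= done)
        if not ready:
            raise ValueError("circular dependency between steps")
        waves.append(ready)
        done.update(ready)
    return waves


steps = {
    "extract_sales": ([], ["sales"]),
    "extract_customers": ([], ["customers"]),
    "join": (["sales", "customers"], ["joined"]),
    "report": (["joined"], []),
}
waves = parallel_waves(steps)
```

Here the two extract steps land in the same wave, so a grid scheduler could dispatch them to different nodes, while `join` and `report` have to wait for their inputs.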
Parallelization Efforts
Researching so far
Hooks in dependency opens
Catalogs, flat files, SAS data sets, etc.
Emitting info in comments
Example of use
Exposure to User
New option, such as DEPMETA=fileref
SAS program with comments written to this file
Questions/comments?
Ideas for the Future!
How can the software learn?
So the user doesn’t have to learn about the software; they can learn the business!
Some future ETL Studio job
• Remembers data volumes from last week's run
• Uses that memory to choose a better strategy
Your Turn!!
You tell me next time SAS forgets something it should have remembered
And why remembering that would help SAS improve next time
Thanks for listening!