RStat: Release 1.2
Ali-Zain Rahim, Strategic Product Manager
March 18, 2010
TRANSCRIPT
Agenda:
Differentiators and Benefits
Review 1.2 Enhancements
Survival Analysis demo - Child welfare
Questions
RStat: Differentiators & Benefits
Based on the open-source R Project
  Maintained by a worldwide consortium of universities, scientists, government-funded research organizations, and statisticians
  Over 2,000 packages
RStat is a GUI to R
  Intuitive, guided approach to modeling
  Simple model evaluation
  Intended for both business analysts and advanced modelers
Single BI and Predictive Modeling Environment
  Re-use metadata and queries
  Perform data manipulation and sampling
  Build scoring applications
Unique Deployment Method for Scoring Solutions
  Scoring models are built directly into WF metadata
  Deployment on any platform and operating system: Windows, Unix, Linux, z/OS, and i Series
RStat 1.2 Enhancements:
New Modeling Technique: Survival Analysis
Two techniques: Cox Regression and Parametric Time Regression
  Cox Regression – risk scoring routine
  Parametric Regression – time scoring routine
What survival analysis does and when to use it
  Survival analysis encompasses a wide variety of methods for analyzing the timing of events with censored data
  (Censoring: nearly every sample contains some cases that do not experience an event)
How to study the causes of
  Births and Deaths
  Marriages and Divorces
  Arrests and Convictions
  Job Changes and Promotions
  Bankruptcies and Mergers
  Wars and Revolutions
  Residence Changes
  Consumer Purchases
  Adoption of Innovations
  Hospitalizations
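To make the idea of censoring concrete, here is a minimal sketch of a Kaplan-Meier survival estimate in plain Python. Kaplan-Meier is not one of the two RStat techniques named above (Cox Regression and Parametric Time Regression); it is used here only because it is the simplest standard way to show how censored cases still contribute information. The function name and the data are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: how long each case was followed
    observed:  True if the event happened at that time, False if censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # every case leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Censored cases are counted in at_risk up to their exit time,
            # but they never count as events.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical data: follow-up time per case; False marks a censored case
durations = [2, 3, 3, 5, 6, 6, 8]
observed = [True, True, False, True, False, True, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(t, round(s, 3))
```

Dropping the censored cases instead would shrink the risk set and bias the estimate, which is why survival methods treat them explicitly.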
RStat 1.2 Enhancements – cont’d
New Scoring Routines
  Neural Network model with comprehensive output – Enables users to compile NNET models into WebFOCUS functions for creation of applications.
  Transformation capabilities for scoring routines – Allows for data manipulation within the RStat tool. Some methods are: Imputation, Scaling, and Remapping.
Enhanced statistical output
  Indicators for Regression models and an ANOVA table to show significance – Enables users to determine the variables that are significant to the model.
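The three transformation methods named above (imputation, scaling, and remapping) can be sketched in a few lines of plain Python. This is an illustration of the general ideas only; the slides do not say which variants RStat implements, so mean imputation and min-max scaling are assumptions, and all function names and data are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing values (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def scale_min_max(values):
    """Rescale values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


def remap(values, mapping, default=None):
    """Replace each category code with its mapped label."""
    return [mapping.get(v, default) for v in values]


# Hypothetical input columns
ages = [4, None, 10, 16]
imputed = impute_mean(ages)
scaled = scale_min_max(imputed)
regions = remap(["N", "S", "N"], {"N": "North", "S": "South"})
print(imputed, scaled, regions)
```

Whatever transformation is applied when the model is built has to be applied identically when new records are scored, which is why it helps to keep it inside the scoring routine.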
Performance and Usability optimization
  Auto sampling for faster visualization of large data sets in the KMeans model – Enables more efficient resource usage when displaying Cluster model statistics and data plots.
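As a rough illustration of why sampling speeds up visualization, the sketch below draws a fixed-size random subset of rows before plotting. The slides do not describe how RStat chooses its sample, so simple random sampling without replacement is an assumption here, and the function name, the point limit, and the data are hypothetical.

```python
import random


def sample_for_plot(rows, max_points, seed=0):
    """Return at most max_points rows, chosen at random without replacement."""
    if len(rows) <= max_points:
        return list(rows)  # small data sets are plotted in full
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return rng.sample(rows, max_points)


# Hypothetical data set: 100,000 two-column rows
rows = [(i, i % 7) for i in range(100_000)]
subset = sample_for_plot(rows, max_points=1_000)
print(len(subset))
```

Only the plot uses the subset in this sketch; the point is that drawing cost then depends on the limit rather than on the size of the full data set.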
Performance and Usability optimization
  Model optimization – Includes only the variables used to create the model in the exported C file. [In RStat 1.1, all variables selected by the user were included in the model.]
  Enhanced Log functionality – Allows users to create R scripts for use with other applications, such as a Dialogue Manager application.
  Process Cancellation capability – Allows users to cancel a long-running process from within RStat.
  Special characters functionality – Enables efficient handling of data with special characters.
  Timestamp within the RConsole and Log Textview – Enables users to match the log with any errors received, allowing for easier troubleshooting.
Demo: Child Welfare Use Case
Goal: identify the children who will stay in Child Welfare programs and the age at which they will leave the programs – a time-to-event analysis
Foster Care Analytical Framework: Background and Optimization Goals
Half a million children in foster care
Managed by county departments and the private agencies who train families
It is a team effort to find a child a permanent home
Severe consequence of bad foster care:
  Youth who leave the system are more likely to be homeless, incarcerated, unemployed, and unskilled.
Foster Care Analytical Framework: Goals & Benefits
Provide all parties involved in the process with a better understanding of the factors that contribute to better foster care
Provide a standardized analytic and reporting system
Match children with better foster parents
Optimize child foster care duration
Thank you!
"...if you are serious about statistics as a career, you need to become familiar with R because it is the most powerful and flexible language available, and may become the lingua franca of statistical programming in the near future."
Source: "Statistics in a Nutshell" by Sarah Boslaugh published by O'Reilly