sas enterprise miner 13support.sas.com/.../enterprise-miner/pdfs/em131.pdfdeveloped using sas rapid...
TRANSCRIPT
Turning increasing volumes of data into useful information is a challenge for most organizations. Relationships and answers that identify key opportunities lie buried somewhere in all of that data.
• Whichcustomerswillpurchasewhatproducts and when?
• Whichcustomersareleavingandwhatcan be done to retain them?
• Howshouldinsuranceratesbesettoensure profitability?
• Howcanyoupredictfailures,reduceunnecessary maintenance and increase uptime to optimize asset performance?
To get answers to complex questions and gain an edge in today’s competitive market,powerfuladvancedanalyticsolutions are required. Discovering previously unknown patterns can help decision makers across your enterprise create effective strategies. Those who choose to implement SAS® data mining into their business processes will be able to stay competitive in today’s fast-moving markets.
SAS® Enterprise Miner™ 13.1Create highly accurate analytical models that enable you to predict with confidence
What does SAS® Enterprise Miner™ do?It streamlines the data mining process so you can create accurate predictive and descriptive analytical models using vast amounts of data. Our customers use this soft-waretodetectfraud,minimizerisk,anticipateresourcedemands,reduceassetdown-time,increaseresponseratesformarketingcampaignsandcurbcustomerattrition.
Why is SAS® Enterprise Miner™ important?It offers state-of-the-art predictive analytics and data mining capabilities that enable organizationstoanalyzecomplexdata,findusefulinsightsandactconfidentlytomakefact-based decisions.
For whom is SAS® Enterprise Miner™ designed?It’s designed for those who need to analyze increasing volumes of data to identify and solve critical business or research issues – and help others make well-informed deci-sions.Thisincludesdataminers,statisticians,marketinganalysts,databasemarketers,riskanalysts,fraudinvestigators,engineers,scientistsandbusinessanalysts.
› Fact Sheet
Benefits• Understand key relationships and develop models intuitively and quickly. The graphi-
cal user interface makes it easy for analytic professionals to interact with information at any point in the modeling cycle. Both analytical professionals and business analysts enjoyacommon,easy-to-interpretvisualviewofthedataminingprocessandcancol-laborate to solve the toughest challenges.
• Build better models more efficiently with a versatile data mining workbench. An interactive self-documenting process flow diagram environment shortens model devel-opment time. It efficiently maps the data mining process to produce the best possible results.
• Easily derive insights in a self-sufficient and automated manner. The SAS Rapid Predictive Modeler enables business analysts and subject-matter experts with limited statistical skills to automatically generate models and act on them quickly. Analytic results are provided in easy-to-understand charts for improved decision making.
• Enhance the accuracy of predictions to ensure the right decisions are made and that best actions are taken. Better-performing models enhance the stability and accuracy ofpredictions,whichcanbeverifiedeasilybyvisualmodelassessmentandvalidationmetrics. Model profiling is also supported to provide an understanding of how the predictor variables contribute to the outcome being modeled.
• Ease model deployment and scoring processes for faster results. SAS® Enterprise Miner™ automates the tedious process of scoring new data and provides complete scoring code for all stages of model development. The scoring code can be deployed in a variety of real-time or batch environments. This saves time and helps you achieve accurate results so you can make decisions that result in the most value.
Product Overview
Everyone can benefit from incorporating analytics in a secure and scalable manner. But this requires collaboration across the organization and calls for a powerful,multipurposedataminingsolution that can be tailored to meet different needs.
Whileoneanalyticalapproachmayworkfineononedatacollection,itmaynotperform well with new data sources or be able to answer new business questions. This makes it crucial to have a wide selection of analysis tools at hand. Different tools produce different models,andonlywhenyoucomparemodels side by side can you see which data mining approach produces the best “fit.” If you start with a workbench thathaslimitedanalyticaltools(e.g.,onlyregressionoronlydecisiontrees),the end result could be a model with limited predictive value.
SAS Enterprise Miner is delivered as a distributed client/server system. This provides an optimized architecture so data miners and business analysts can work more quickly to create accurate predictiveanddescriptivemodels,andproduce results that can be shared and incorporated into business processes. Toenhancethedataminingprocess,thissoftware is designed to work seamlessly withotherSAStechnologies,suchasdataintegration,analyticsandreporting.
An integrated, complete view of your dataData mining is most effective when it is part of an integrated information delivery strategy – one that includes data gath-ered from hugely diverse enterprise sources.Callcenterlogs,surveyresults,customerfeedbackforms,webdata,timeseries data and transactional point-of-sale data can all be combined and analyzed with the industry’s most sophisticated
data mining package. Adding SAS Text Miner lets you analyze structured and unstructured data together for more accurate and complete results.
Easy-to-use GUIAneasy-to-use,drag-and-dropinterfaceis designed to appeal to analytic profes-sionals. The advanced analytic algorithms are organized under core tasks that are performed in any successful data mining endeavor. The SAS data mining process encompasses five primary steps: sam-pling,exploration,modification,modelingandassessment(SEMMA).Ineachstep,you perform an array of actions as the data mining project develops. By deploying nodes from the SEMMA toolbar,youcanapplyadvancedstatis-tics,identifythemostsignificantvariables,transform data elements with expression builders,developmodelstopredictoutcomes,validateaccuracyandgener-ate a scored data set with predicted values to deploy into your operational applications.
A quick, easy and self-sufficient way to generate models SAS Rapid Predictive Modeler automati-cally steps nontechnical users through a workflowofdataminingtasks(e.g.,transformingdata,selectingvariables,
fitting a variety of algorithms and assessing models) to quickly generate predictive models for a wide range of business problems. SAS Rapid Predictive Modeler is a SAS® Enterprise Guide® or SAS Add-In for Microsoft Office (Microsoft Excel only) task and uses prebuilt SAS Enterprise Miner modeling steps. A collaborative approach allows models developed using SAS Rapid Predictive Modeler to be customized by analytic professionals using SAS Enterprise Miner.
Both classical and modern modeling techniquesSAS Enterprise Miner provides superior analyticaldepthwithasuiteofstatistical,data mining and machine learning algorithms.Decisiontrees,baggingandboosting,timeseriesdatamining,neuralnetworks,memory-basedreasoning,hierarchicalclustering,linearandlogisticregression,associations,sequenceandweb path analysis are all included. And more. The breadth of analytical algo-rithms extends to industry-specific algorithmssuchascreditscoring,andstate-of-the-art methods such as gradient boosting and least angular regression splines.
Figure1:Performprincipalcomponentanalysisfordimensionreduction,afrequentintermediatestep in the data mining process.
Sophisticated data preparation, summarization and exploration Preparing data is a time-consuming aspect of all data mining endeavors. A powerful set of interactive data prepara-tion tools is available for addressing missingvalues,filteringoutliersanddeveloping segmentation rules. Core data preparation tools include file importingandappending,andmergingand dropping variables. Extensive descriptive summarization features and interactive exploration tools let even novice users examine large amounts of dataindynamicallylinked,multidimen-sional plots. This produces quality data mining results tailored and optimally suited to specific business problems.
Business-based model compari-sons, reporting and managementAssessment features let you compare models to identify the ones that produce the best lift and overall ROI. Models generated with different algorithms can be evaluated consistently using a highly visual assessment interface. Data miners can discuss results with business domain experts for improved collaboration and better results. An innovative Cutoff node examines posterior probability distribu-tions to define the optimal actions for solving the business problem at hand.
Open, extensible design provides flexibilityThe customizable environment of SAS Enterprise Miner provides the ability to add tools and include personalized SAS code. Existing SAS models developed outside of the SAS Enterprise Miner environment can be integrated easily into the process flow environment while maintaining full control of each syntax statement. The Extension node includes interactive editor features for training and score codes. Users can edit and submit code interactively while viewing the log and output listings. Default selection lists
can be extended with custom-developed toolswrittenwithSAScodeorXMLlogic,which opens the entire world of SAS to data miners.
Open Source Integration nodeYou can now easily integrate R language code inside of a SAS Enterprise Miner process flow diagram. This enables you to perform data transformation and exploration as well as training and scoring supervised and unsupervised models in R. You can then seamlessly integratetheresults,assessyourRmodel and compare it to models generated by SAS Enterprise Miner.
In-database and in-Hadoop scoring delivers faster resultsScoring is the process of regularly applying a model to new data for implementation into an operational environment.Thiscanbetedious,especially when it entails manually rewritingorconvertingcode,whichdelays model implementation and can introduce potentially costly errors. SAS Enterprise Miner automatically generates scorecodeinSAS,C,JavaandPMML.The scoring code can be deployed in a variety of real-time or batch environ-mentswithinSAS,ontheweb,ordirectlyinrelationaldatabasesorHadoop.
Combined with a SAS Scoring Accelera-tor(availableforHadoop,PivotalGreenplum,DB2,IBMNetezza,Oracle,Teradata and SAS Scalable Performance DataServer),SASEnterpriseMinermodels can be published as database-specific scoring functions for execution directly in the database. Results can be passed to other SAS solutions for deployment of data mining results into real-time operational environments.
Parallelized grid-enabled workbenchScale from a single-user system to very largeenterprisesolutionswiththeJavaclient and SAS server architecture. Powerful servers can be dedicated to computing,whileusersmovefromofficeto home to remote sites without losing access to mining projects or services. Manyprocess-intensivetasks,suchasdatasorting,summarization,variableselectionandregressionmodeling,aremultithreaded,andprocessescanberunin parallel for distribution and workload balancing across a grid of servers or scheduled for batch processing.
Distributable data mining system suited for enterprisesSAS Enterprise Miner is deployable via a thin-client web portal for distribution to multiple users with minimal maintenance oftheclients.Alternatively,thecompletesystem can be configured on a stand-alone PC. SAS Enterprise Miner supports WindowsserversandUNIXplatforms,making it the software of choice for organizations with large-scale data mining projects. Model result packages can be created and registered to the SAS Metadata Server for promotion to SAS ModelManager,SASDataIntegrationStudio (a component of SAS Data Integration) and SAS Enterprise Guide.
High-performance data mining A select set of high-performance data mining nodes is included in SAS Enterprise Miner. Depending on the data and complexityofanalysis,usersmayfindperformance gains in a single-machine SMPmode.Inthefuture,asyouneedtoprocessbigdatafaster,aseparatelicensableproduct,SASHigh-PerformanceDataMining,letsyoudeveloptimelyandaccuratepredictivemodels.High-perfor-mance data mining procedures are available for those who prefer a coding environment. Many options are provided for complete customization of your data miningprograms.Formoredetails,visitsas.com/hpdatamining.
Intuitive interfaces• Easy-to-useGUIforbuildingpro-
cess flow diagrams:• Buildmore,bettermodelsfaster.• Deliverableviatheweb.• AccesstheSASprogramming
environment.• ProvidesXMLdiagramexchange.• Reusediagramsastemplatesfor
other projects or users. • Directlyloadaspecificdatamin-ingprojectordiagram,orchoosefromaProjectNavigatortreethatcontains the most recent projects or diagrams.
• Batchprocessing(programdevel-opment interface):• Encapsulatesallfeaturesofthe
GUI.• SASmacrobased.• Embedtrainingandscoringpro-
cesses into customized applica-tions.
Scalable processing• Server-basedprocessing.• Gridcomputing,in-databaseand
in-memory processing options.• Asynchronousmodeltraining.• Abilitytostopprocessingcleanly.• Parallelprocessing–runmultiple
tools and diagrams concurrently. • Multithreadedpredictivealgorithms.• Allstoragelocatedonservers.
Accessing and managing data• Accessandintegratestructuredandunstructureddatasources,includingtimeseriesdata,marketbaskets,webpathsandsurveydataas candidate predictors.
• FileImportnodeforeasyaccesstoMicrosoftExcel,comma-delimitedfiles,SASandothercommonfileformats.
• Supportforvariableswithspecial characters.
• SASLibraryExplorerandLibrary Assignment wizard.
Key Features
Figure2:WithintheSASEnterpriseMinerGUI,theprocessflowdiagramisaself-documentingtemplate that can be easily updated or applied to new problems and shared with modelers or other analysts.
• EnhancedExplorerwindowtoquickly locate and view table listings or develop a plot using interactive graph components.
• DropVariablesnode.• MergeDatanode.• Appendnode.• Filteroutliers:• Applyvariousdistributional
thresholds to eliminate extreme interval values.
• Combineclassvalueswithfewerthan n occurrences.
• Interactivelyfilterclassand numeric values.
• Metadatanodeformodifyingcolumnsmetadatasuchasrole,measurement level and order.
• IntegratedwithSASDataIntegrationStudio,SASEnterpriseGuide,SASModel Manager and SAS Add-In for Microsoft Office through SAS Metadata Server:• Buildtrainingtablesformining.• Deployscoringcode.
Sampling• Simplerandom.• Stratified.• Weighted.
• Cluster.• Systematic.• FirstN.• Rareeventsampling.• Stratifiedandevent-levelsampling
in Teradata 13.
Data partitioning• Createtraining,validationandtest
data sets.• Ensuregoodgeneralizationofyour
models through use of holdout data.• Defaultstratificationbytheclass
target.• Balancedpartitioningbyanyclass
variable.• OutputSAStablesorviews.
Transformations• Simple:log,log10,squareroot,inverse,square,exponentialandstandardized.
• Binning:bucketed,quantileandoptimalbinning for relationship to target.
• Bestpower:maximizenormality, maximize correlation with target and equalize spread with target levels.
• Interactionseditor:definepolynomialand nth degree interaction effects.
Figure3:DevelopcustomizedtransformationsusingtheinteractiveTransformVariablesnode Expression Builder.
• Interactivelydefinetransformations:• Definecustomizedtransformations
using the Expression Builder or SAS code editor.
• Comparethedistributionofthenewvariable with the original variable.
• Predefineglobaltransformationcode for reuse.
Interactive variable binning• Quantileorbucket.• Ginivariableselection.• Handlemissingvaluesasseparate
group. • Fineandcoarseclassingdetail.• Profilebinsbytarget.• Modifygroupsinteractively.• Savebinningdefinitions.
Rules Builder node• Createadhocdata-drivenrulesand
policies.• Interactivelydefinethevalueofthe
outcome variable and paths to the outcome.
Data replacement• Measuresofcentrality.• Distribution-based.• Treeimputationwithsurrogates.• Mid-mediumspacing.• RobustM-estimators.• Defaultconstant.• ReplacementEditor:• Specifynewvaluesforclass
variables.• Assignreplacementvaluesfor
unknown values.• Interactivelycapextremeinterval
values to a replacement threshold.
Descriptive statistics• Univariatestatisticsandplots:• Intervalvariables:n,mean,median,min,max,standarddeviation,scaleddeviation and percent missing.
• Classvariables:numberofcatego-ries,counts,mode,percentmodeand percent missing.
• Distributionplots.
• Statisticsbreakdownforeachlevel of the class target.
• Bivariatestatisticsandplots:• OrderedPearsonandSpearman
correlation plot.• Orderedchi-squareplotwithop-
tion for binning continuous inputs into nbins.
• Coefficientofvariationplot.• Variableselectionbylogworth.• Otherinteractiveplots:• Variableworthplotrankinginputs
based on their worth with the target.
• Classvariabledistributionsacrossthe target and/or the segment variable.
• Scaledmeandeviationplots.
Graphs/visualization• Batchandinteractiveplots:scatter,matrix,box,constellation,contour,
needle,lattice,densityandmultidi-mensionalplots;3-D,pieandareabar charts; and histograms.
• Segmentprofileplots:• Interactivelyprofilesegmentsof
data created by clustering and modeling tools.
• Easilyidentifyvariablesthatdetermine the profiles and the differences between groups.
• Easy-to-useGraphicsExplorerwizard and Graph Explore node:• Createtitlesandfootnotes.• ApplyaWHEREclause.• Choosefromcolorschemes.• Easilyrescaleaxes.• Surfacetheunderlyingdatafrom
standard SAS Enterprise Miner results to develop customized graphics.
• Plotsandtablesareinteractivelylinked,supportingtaskssuchasbrushing and banding.
Figure 4: Use link analysis to evaluate relationships between nodes to visually discover new patterns.
• Dataandplotscanbeeasilycopiedand pasted into other applications or saved as BMP files.
• Interactivegraphsareautomaticallysaved in the Results window of the node.
Clustering and self-organizing maps• Clustering:• Userdefinedorautomatically
chooses the best clusters.• Severalstrategiesforencoding
class variables into the analysis.• Handlesmissingvalues.• Variablesegmentprofileplots
show the distribution of the inputs and other factors within each cluster.
• Decisiontreeprofileusestheinputs to predict cluster member-ship.
• PMMLscorecode.• Self-organizingmaps:• BatchSOMswithNadaraya- Watsonorlocal-linearsmoothing.
• Kohonennetworks.• Overlaythedistributionofother
variables onto the map.• Handlesmissingvalues.
Market basket analysis• Associationsandsequencediscovery:• Gridplotoftherulesorderedby
confidence.• Expectedconfidenceversus
confidence scatter plot.• Statisticslineplotofthelift,confi-dence,expectedconfidenceandsupport for the rules.
• Statisticshistogramofthefre-quency counts for given ranges of support and confidence.
• Rulesdescriptiontable.• Networkplotoftherules.
• Interactivelysubsetrulesbasedonlift,confidence,support,chainlength,etc.
• Seamlessintegrationofruleswithother inputs for enriched predictive modeling.
Dimension reduction• Variableselection:• Removevariablesunrelatedto
target based on a chi-square or R2selectioncriterion.
• Removevariablesinhierarchies.• Removevariableswithmany
missing values.• Reduceclassvariableswitha
large number of levels.• Bincontinuousinputstoidentify
nonlinear relationships.• Detectinteractions.
• LeastAngleRegression(LARS) variable selection:• AIC,SBC,MallowsC(p),cross-
validation and other selection criteria.
• Plotsinclude:parameteresti-mates,coefficientpaths,iterationplot,scorerankingsandmore.
• GeneralizestosupportLASSO(least absolute shrinkage and selection operator).
• Supportsclassinputsandtargetsas well as continuous variables.
• Scorecodegeneration.• Principalcomponents:• CalculateEigenvaluesand
Eigenvectors from correlation and covariance matrices.
• Hierarchicalassociations:• Deriverulesatmultiplelevels.• Specifyparentandchildmappings
for the dimensional input table.
Web path analysis• Scalableandefficientminingofthe
most frequently navigated paths from clickstream data.
• Minefrequentconsecutivesubse-quences from any type of sequence data.
Link analysis• Convertsdataintoasetofintercon-
nected linked objects that can be visualized as a network of effects.
• Providesavisualmodelofhowtwovariables’ levels in relational data or between two items’ conoccurrence in transactional data are linked.
• Providescentralitymeasuresandcommunity information to understand linkage graphs.
• Providesweightedconfidencestatis-tics to provide next-best offer informa-tion.
• Generatesclusterscoresfordatareduction and segmentation.
Figure5:IntegratecustomizedSAScodetocreatevariabletransformations,incorporate SASprocedures,developnewnodes,augmentscoringlogic,tailorreportsandmore.
• Plotsinclude:principalcom-ponentscoefficients,principalcomponentsmatrix,Eigenvalue,Log Eigenvalue and Cumulative Proportional Eigenvalue.
• Interactivelychoosethenumberof components to be retained.
• Mineselectedprincipalcompo-nents using predictive modeling techniques.
• Variableclustering:• Dividevariablesintodisjointor
hierarchical clusters. • Eigenvalueorprincipalcompo-
nents learning.• Includesclassvariablesupport.• Dendrogramtreeoftheclusters.• Selectedvariablestablewith
cluster and correlation statistics. • ClusternetworkandR-square
plot. • Interactiveuseroverrideof
selected variables.• Timeseriesmining:• Reducetransactionaldatainto
a time series using several accumulation methods and transformations.
• Analysismethodsincludeseasonal,trend,timedomain,andseasonal decomposition.
• Minethereducedtimeseriesusingclustering and predictive modeling techniques.
SAS Code node• WriteSAScodeforeasy-to-com-
plex data preparation and transfor-mation tasks.
• IncorporateproceduresfromotherSAS products.
• Developcustommodels.• CreateSASEnterpriseMinerexten-
sion nodes.• Augmentscorecodelogic.• SupportforSASprocedures.• Batchcodeusesinputtablesof
different names and locations.•Batchcodenowintegratesproj-
ect-start code that you can use to define libraries and options.
•Easy-to-useprogramdevelop-ment interface:
• Macrovariablestoreferencedatasources,variables,etc.
• Interactivecodeeditorandsubmit.• Separatelymanagetraining,scor-
ing and reporting code. • SASOutputandSASLOG.
• Creategraphics.
Consistent modeling features• Selectmodelsbasedoneitherthetraining,validation(default)ortestdata using several criteria such as profitorloss,AlC,SBC,averagesquareerror,misclassificationrate,ROC,Gini,orKS(Kolmogorov-Smirnov).
• Incorporatepriorprobabilitiesintothe model development process.
• Supportsbinary,nominal,ordinaland interval inputs and targets.
• Easyaccesstoscorecodeandall partitioned data sources.
• Displaymultipleresultsinonewin-dow to help better evaluate model performance.
• Decisionsnodeforsettingtargetevent and defining priors and profit/loss matrices.
Regression• Linearandlogistic.• Stepwise,forwardandbackward
selection.
• Equationtermsbuilder:polynomi-als,generalinteractions,andeffecthierarchy support.
• Cross-validation.• Effecthierarchyrules.• Optimizationtechniquesinclude:ConjugateGradient,DoubleDogleg,Newton-RaphsonwithLineSearchorRidging,Quasi-Newtonand Trust Region.
• DmineRegressionnode:• Fastforwardstepwiseleast
squares regression.• Optionalvariablebinningto
detect nonlinear relationships. • Optionalclassvariablereduction.
• Includeinteractionterms.• In-databasemodelingforTera-
data 13. • PMMLscorecode.
Decision trees• Methodologies:• CHAID,classificationandregressiontrees,baggingandboosting,gradientboosting,and bootstrap forest.
• Treeselectionbasedonprofitor lift objectives and prune accordingly.
• K-foldcross-validation.• Splittingcriterion:ProbChi-squaretest,ProbF-test,Gini,Entropyandvariance reduction.
Figure6:FithighlycomplexnonlinearrelationshipsusingtheNeuralNetworknode.
• Switchtargetsfordesigningmulti-objective segmentation strategies.
• AutomaticallyoutputleafIDsasinputs for modeling and group processing.
• DisplaysEnglishrules.• Calculatesvariableimportancefor
preliminary variable selection and model interpretation.
• Displayvariableprecisionvaluesinthe split branches and nodes.
• Uniqueconsolidatedtreemaprepresentation of the tree diagram.
• Interactivetreecapabilities:• Interactivegrowing/pruning
of trees; expand/collapse tree nodes.
• Incorporatesvalidationdatatoevaluate tree stability.
• Definecustomizedsplitpoints,including binary or multiway splits.
• Splitonanycandidatevariable.• Copysplit.• Tablesandplotsaredynamically
linked to better evaluate the tree performance.
• Easy-to-printtreediagramsona single page or across multiple pages.
• Interactivesubtreeselection.• User-specifieddisplayoftextand
statistics in the Tree node.• User-controlledsamplesizewithin
interactive trees.• BasedonthefastARBORETUM
procedure.• PMMLscorecode.
Neural networks• NeuralNetworknode:• Flexiblenetworkarchitectures
with combination and activation functions.
• 10trainingtechniques.• Preliminaryoptimization.• Automaticstandardizationof
inputs.• Supportsdirectionconnections.
• AutoneuralNeuralnode:• Automatedmultilayerper-
ceptron building searches for optimal configuration.
• Typeandactivationfunctionselected from four different types of architectures.
• PMMLscorecode.• DMNeuralnode:• Modelbuildingwithdimension
reduction and function selection.• Fasttraining;linearandnonlinear
estimation.
Partial Least Squares node• Especiallyusefulforextracting
factors from a large number of potential correlated variables.
• Performsprincipalcomponentsregression and reduced rank regression.
• Userorautomatedselectionofthe number of the factors.
• Choosefromfivecross-validationstrategies.
• Supportsvariableselection.
Rule induction• Recursivepredictivemodeling
technique. • Especiallyusefulformodelingrare
events.
Two-stage modeling• Sequentialandconcurrentmodel-
ing for both the class and interval target.
• Chooseadecisiontree,regressionor neural network model for each stage.
• Controlhowtheclasspredictionisapplied to the interval prediction.
• Accuratelyestimatecustomervalue.
Memory-based reasoning• k-nearest neighbor technique to
categorize or predict observations. • PatentedReducedDimensionality
Tree and Scan.
Model ensembles• Combinemodelpredictionsto
form a potentially better solution. • Methodsinclude:Averaging,Vot-
ing and Maximum.
Open Source Integration node• WritecodeintheRlanguage
inside of SAS Enterprise Miner.
Figure 7: Analyze time series data using classical seasonal decomposition.
• SASEnterpriseMinerdataandmetadata are available to your R code with R results returned to SAS Enterprise Miner.
• Trainandscoresupervisedandunsupervised R models. The node allows for data transformation and exploration.
• GeneratemodelcomparisonsandSAS score code for supported models.
Incremental response/net lift models• Nettreatmentvs.controlmodels.• Binaryandintervaltargets.• Stepwiseselection.• Fixedorvariablerevenuecalcula-
tions. • Netinformationvaluevariable
selection.• Usercanspecifythetreatment
level of the treatment variable.• Usercanspecifyacostvariablein
addition to a constant cost.• PenalizedNetInformationValue(PNIV)forvariableselection.
• Separatemodelselectionoptionsavailable for an incremental sales model.
Time series data mining• Timeseriesdatapreparation:• Aggregate,transformand
summarize transactional and sequence data.
• Automaticallytransposethetime series to support similarity analysis,clusteringandpredictivemodeling.
• ProcessdatawithorwithoutTimeID variables.
• Similarityanalysis:• Usefulfornewproductforecast-ing,patternrecognitionandshortlifecycle forecasting.
• Computessimilaritymeasuresbetween the target and input se-ries,oramonginputtimeseries.
• Similaritymatrixforallcombina-tions of the series.
• Hierarchicalclusteringusingthesimilarity matrix with dendro-gram results.
• Constellationplotforevaluatingthe clusters.
• Exponentialsmoothing:• Controlweightsdecayusingone
or more smoothing parameters.• Best-fittingsmoothingmethod(simple,double,linear,dampedtrend,seasonalorWinters’method) is selected automatically.
• Dimensionreduction:• Supportsfivetimeseriesdimen-
sion reduction techniques: DiscreteWaveletTransform,DiscreteFourierTransform, SingularValueDecomposition,Line Segment Approximation withtheMean,andLineSeg-ment Approximation with the Sum.
• Cross-correlation:• Providesautocorrelationand
cross-correlation analysis for time-stamped data.
• TheTimeSeriesCorrelationnode outputs time-domain statistics based on whether auto-correlation or cross-correlation is performed.
• Seasonaldecomposition.
Survival analysis• Discrete time to event regression
with additive logistic regression.• Event probability for time effect is
modeled using cubic splines.• Userscannowenterthecubic
spline basis functions as part of the stepwise variable selection procedure in addition to the main effects.
• User-definedtimeintervalsforspecifying how to analyze the data and handle censoring.
• Automaticallyexpandsthedatawith optional sampling.
• Supportsnon-timevaryingcovariates.• Computessurvivalfunctionwith
holdout validation.• Generates competing risks or sub-
hazards.• Scorecodegenerationwithmean
residual life calculation. • Userscanenterthecubicspline
basis functions as part of the step-wise variable selection procedure in addition to the main effects.
• Incorporatetime-varyingcovariatesinto the analysis with user-specified dataformats,includingstandard,change-time and fully expanded.
• Userscanspecifyleft-truncationand censor dates.
Figure 8: Model the incremental impact of a marketing treatment in order to maximize the return on investment.
Group processing with the Start and End Groups nodes• Repeatprocessingoverasegment
of the process flow diagram. • Usesincludestratifiedmodeling,baggingandboosting,multipletargets,andcross-validation.
SAS® Rapid Predictive Modeler customized task in SAS® Enter-prise Guide® or SAS® Add-In for Microsoft Office (Excel only)• Automaticallygeneratespredic-
tive models for a variety of busi-ness problems.
• Modelscanbeopened,augment-ed and modified in SAS Enter-prise Miner.
• Producesconcisereports,includ-ingvariableimportanceliftcharts,ROC charts and model scorecards for easy consumption and review.
• Abilitytoscorethetrainingdatawith option to save the scored data set.
High-performance data mining procedures• Multithreadedproceduresexecute
concurrently and take advantage of all available cores on your existing symmetric multiprocess-ing (SMP) server to speed up processing:• HPBIN(high-performance
binning).• HPBNET(high-performance
Bayesian networks).• HPCLUS(high-performance
clustering).• HPCORR(high-performance
correlation).• HPDECIDE(high-performance
decide).• HPDMDB(high-performance
data mining database).• HPDS2(high-performanceDS2).• HPFOREST(high-performance
random forests).• HPIMPUTE(high-performance
imputation).
• HPNEURAL(high-performanceneural networks).
• HPREDUCE(high-performancevariable reduction).
• HPSAMPLE(high-performancesampling).
• HPSUMMARY(high-perfor-mance data summarization).
• HPSVM(high-performance SupportVectorMachine).
• HP4SCORE(high-performance4Score).
• High-performance-enabled SAS Enterprise Miner nodes:• HPCluster.• HPDataPartition.• HPExplore.• HPForest.• HPGLM.• HPImpute.• HPNeuralNetwork.• HPPrincipalComponents.• HPRegression.• HPSVM.• HPTree.• HPTransform.• HPVariableSelection.
Model Import node• RegisterSASEnterpriseMiner
models for reuse in other dia-grams and projects.
• Importandevaluateexternalmodels.
Model evaluation• ModelComparisonnodecom-
pares multiple models in a single framework for all holdout data.
• Automaticallyselectsthebestmodel based on the user-defined model criterion.
• Supportsuseroverride.• Extensivefitanddiagnostics
statistics.• Liftcharts;ROCcurves.• Intervaltargetscorerankingsand
distributions.• Profitandlosschartswithdeci-
sion selection; confusion (classifi-cation) matrix.
• Classprobabilityscoredistribu-tion plot; score ranking matrix plots.
• Cutoffnodetodetermineprob-ability cutoff point(s) for binary targets.
• Useroverridefordefaultselec-tion.
• MaxKSStatistic.• MinMisclassificationCost.• MaximumCumulativeProfile.• MaxTruePositiveRate.• MaxEventPrecisionfromTraining
Prior.• EventPrecisionEqualRecall.
Figure 9: Automatically generate predictive models for a variety of business problems using the SAS Rapid Predictive Modeler task in SAS Enterprise Guide or SAS Add-In for Microsoft Office (Microsoft Excel only).
Figure10:Buildarandomforestmodel,whichconsistsofensemblingseveraldecisiontrees.Throughmultipleiterations,randomlyselectvariablesforsplittingwhilereducingthe dependence on sample selection. Use out-of-bag samples to form predictions.
Reporter node• UsesSASOutputDeliverySystem
to create a PDF or RTF of a pro-cess flow.
• Helpsdocumenttheanalysispro-cess and facilitate results sharing.
• Documentcanbesavedandincluded in SAS Enterprise Miner results packages.
• Includesimageoftheprocessflowdiagram.
• User-definednotesentry.
Save Data node• Savetraining,validation,test,score
or transaction data from a node to either a previously defined SAS library or a specified file path.
• ExportJMP®,Excel2010,CSVandtab delimited files. Default options are designed so that the node can be deployed in SAS Enterprise Miner batch programs without user input.
• Canbeconnectedtoanynodeina SAS Enterprise Miner process flowdiagramthatexportstraining,validation,test,scoreortransac-tion data.
Scoring• Scorenodeforinteractivescoring
in the SAS Enterprise Miner GUI.• Optimizedscorecodeiscreatedbydefault,eliminatingunused variables.
• AutomatedscorecodegenerationinSAS,C,JavaandPMML(Version4.0).
• SAS,CandJavascoringcodecapturesmodeling,clustering,transformations and missing value imputation code.
• ScoreSASEnterpriseMinermod-elsdirectlyinsideAsterData,DB2,Greenplum,Hadoop,IBMNetezza,Oracle and Teradata databases with SAS Scoring Accelerator.
To learn more about SAS Enter-prise Miner system requirements, download white papers, view screenshots and see other related material, please visit sas.com/enterpriseminer.
Figure 11: Evaluate multiple models together in one easy-to-interpret framework using the Model Comparison node.
Model registration and management• Registersegmentation,classifi-
cation or predictive models to the SAS Metadata Server. Input variables,outputvariables,targetvariables,miningfunction,trainingdata and SAS score code are regis-tered to the metadata. • RegisterModelnodeconsoli-
dates registration steps and pro-vides a registration mechanism that can run in SAS Enterprise Miner batch code.
• RegistrationofmodelsinSASMetadata Server enables:• IntegrationwithSASModel
Manager for complete lifecycle management of models.
• IntegrationwithSASEnterpriseGuide and SAS Data Integration Studio for scoring models.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names aretrademarksoftheirrespectivecompanies.Copyright©2014,SASInstituteInc.Allrightsreserved.101369_S114796.0414
Figure12:Developdecisiontreesinteractivelyorinbatch.Numerousassessmentplotshelpgauge overall tree stability.