Reference Implementation for Application Development on HANA
By TIP D&NA Data Management, June 10th, 2011
© 2011 SAP AG. All rights reserved. 2
Agenda
• Motivations
• Languages available for application development
• Demo: CarShop V1
• CarShop V2
• Resource
We would like to make this session an open discussion rather than a presentation; your feedback is very welcome.
Motivations
• What’s this?
It is a sample application built on HANA. It leverages HANA to provide features such as “Analysis”, “Forecast”, “What-if Planning”, and “Sales promotion” (e.g. cross-selling).
• Why are we doing this?
HANA has great features that other DBMSs lack, such as column-based modeling, in-memory computing, a built-in business library, a built-in predictive library, and R integration. The official HANA documentation cannot cover every detail, and sample code in particular is scarce. We built this app so that other developers can use it as a reference and quickly develop new apps on HANA.
• What are the benefits for SAP?
Other HANA content/app developers can quickly master advanced HANA features such as column-based modeling, in-memory computing, the built-in business library, the predictive library, and R integration. Through the sample code, other teams and developers can learn how to build 2-tier and 3-tier applications based on HANA.
[Diagram: CarShop features (Analysis, What-if, Cross-selling, Forecast) built on SQL, BFL and R]
Motivations
Project Definition
Just as Sun provides “petStore” for the Java EE platform, the HANA Reference Implementation Application, named “carShop”, is designed to illustrate how HANA can be used to develop a complete application. By studying this project, learners can gain hands-on experience with:
• SQLScript
• SQLScript V2
• L
• IMSL
• R
• BFL/PAL
• .NET/Java frontends
Target Audience: anyone interested in developing new applications on HANA
Virtual Business Scenario
This project is based on a virtual car sales scenario. A company has many salesmen in different cities selling cars. The company uses the carShop system to analyze historical sales data, forecast and plan future sales, set KPIs for salesmen based on the plan data, calculate volume-driven bonuses, analyze information about potential customers, cluster those customers, and find selling opportunities.
“Languages”
The following languages can be used to access HANA functionality:
• IMSL (International Math & Statistics Lib)
• R
• BFL (Business Function Lib) / PAL (Predictive Analysis Lib)
• L
• SQLScript V1 and V2
• calEngine
• a few others (e.g. logic/inference)
[Diagram: HANA at the center, accessed through BFL/PAL, SQLScript, IMSL, L and R]
IMSL
The IMSL Numerical Libraries have been the cornerstone of high-performance and desktop computing applications in science, technical and business environments for well over three decades.
It is developed by Visual Numerics, which has an OEM agreement with SAP to embed the IMSL C Numerical Library into the TREX component, offering advanced analytics for SAP applications.
[Figure: IMSL C Math and IMSL C Statistics libraries]
IMSL
Functional areas included in the IMSL Numerical Libraries:
Mathematics:
• Matrix Operations
• Linear Algebra
• Eigensystems
• Interpolation & Approximation
• Numerical Quadrature
• Differential Equations
• Nonlinear Equations
• Optimization
• Special Functions
• Finance & Bond Calculations
• Genetic Algorithm
Statistics:
• Basic Statistics
• Time Series & Forecasting
• Nonparametric Tests
• Correlation & Covariance
• Data Mining
• Regression
• Analysis of Variance
• Transforms
• Goodness of Fit
• Distribution Functions
• Random Number Generation
• Neural Networks
IMSL
The IMSL sample code to access HANA
Note: Currently, IMSL functions are only available in the DEV branch of NewDB
IMSL
Benefits of Embedding IMSL:
• Accelerate development
• Develop better software applications
• Develop flexible software applications
• Improve quality and reduce uncertainty
• Reduce costs (?)
• Comparable or better results than other packages
Limitations of IMSL:
• OpenMP-based parallelism is not compatible with NewDB
• Does not work with partitioned tables
• Governance issue: its memory usage and threading cannot be monitored
What is R?
• An open source implementation of the S language, released under the GNU GPL
• Project home: http://www.r-project.org/
• Available on Windows, Linux, and Mac OS
• Latest version: 2.12.1 (released 16/12/2010)
• Core team of about 19 people
• Support for multiple languages
• CRAN: a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R
• The R Journal: a refereed journal of the R project for statistical computing
• Some well-known weaknesses:
  - not particularly efficient at handling large data sets
  - rather slow when executing a large number of for loops
  - a steeper learning curve than point-and-click software
R Integration in NewDB (available with HANA 1.0 GA)
[Architecture diagram: NewDB space and open source R space. Components shown: Calc. Engine; Join OP, ROP, OLAP OP; other plug-ins (forecasting, parallelism, statistics, etc.); SHM channel plug-in; TCP/IP channel plug-in; RClient; R runtime; proposed “REngine” (parser, runtime, operators). Paths: (1) SHM solution, single server; (2) TCP/IP solution, different servers; (3) “REngine”.]
NOTE:
1. SHM (shared memory) solution: we use LGPL to address the potential IP issue. NewDB and R need to be on the same server.
2. TCP/IP solution: NewDB and R can be deployed on different machines.
3. “REngine”: in discussion.
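To illustrate why the SHM solution requires NewDB and R to run on the same server, here is a minimal Python sketch (not the NewDB or R implementation) of handing a numeric column from a producer to a consumer through a named shared-memory segment instead of serializing it through JDBC or CSV. The helper names publish_column and read_column are hypothetical; the sketch assumes Python 3.8+ for multiprocessing.shared_memory, and a real channel would also need synchronization and a way to pass the segment name and row count to the other side.

```python
from array import array
from multiprocessing import shared_memory

ITEMSIZE = array("d").itemsize  # bytes per double on this platform


def publish_column(values):
    """Producer (the "NewDB" side in this sketch): copy doubles into a new shared segment."""
    data = array("d", values)
    nbytes = len(data) * ITEMSIZE
    # SharedMemory rejects size 0, so allocate at least one byte
    shm = shared_memory.SharedMemory(create=True, size=max(1, nbytes))
    shm.buf[:nbytes] = data.tobytes()
    return shm


def read_column(name, count):
    """Consumer (the "R" side in this sketch): attach by name and read the doubles back."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # Both views are released on exit; open views would make close() fail
        with shm.buf[: count * ITEMSIZE] as raw, raw.cast("d") as view:
            return view.tolist()
    finally:
        shm.close()


if __name__ == "__main__":
    segment = publish_column([1.5, 2.5, 3.0])
    try:
        print(read_column(segment.name, 3))  # prints [1.5, 2.5, 3.0]
    finally:
        segment.close()
        segment.unlink()  # the producer owns the segment and removes it
```

The segment is addressed by a name that only processes on the same host can open, which is the constraint the SHM note describes; crossing machines needs a network transport such as the TCP/IP channel.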
Milestones
1. May 2010 – The NewDB team had JDBC- and CSV-based versions of the R integration, but they were very slow. The D&NA team joined to develop better solutions.
2. Oct 2010 – Checked the SHM solution into the NewDB standard build, gaining at least a 50X performance improvement over the old solution.
3. March 2011 – Checked in parallelism handling for data transfer between NewDB and R, gaining at least another 3X improvement.
4. (Planned) HANA 1.5 release – Release an LGPL version of the SHM solution to reduce possible IP issues.
5. (Planned) HANA 1.5 release – Release the TCP/IP solution to support multi-server requirements.
Internal Customers
1. Oct 2010 – DNA for sales forecasting
2. Jan 2011 – EPM SBC for spend analysis
3. Mar 2011 – PIO for personal financial analysis
4. April 2011 – IDDC PA for predictive analysis
Language and tools
Packages
Languages
SQLScript + R: fitting a Poisson regression model
-- Define an R-language function that fits a Poisson regression of SUCC_PREC on CHANGE_FREQ
CREATE FUNCTION LR(IN input1 SUCC_PREC_TYPE, OUT output0 R_COEF_TYPE) LANGUAGE RLANG AS '''
CHANGE_FREQ <- input1$CHANGE_FREQ;
SUCC_PREC <- input1$SUCC_PREC;
coefs <- coef(glm(SUCC_PREC ~ CHANGE_FREQ, family = poisson));
INTERCEPT <- coefs["(Intercept)"];
CHANGEFREQ <- coefs["CHANGE_FREQ"];
names(INTERCEPT) <- NULL;
names(CHANGEFREQ) <- NULL;
result <- as.data.frame(cbind(INTERCEPT, CHANGEFREQ))
''';
-- Clear previous results, run the fit, and inspect the coefficients
TRUNCATE TABLE r_coef_tab;
CALL LR(SUCC_PREC_tab, r_coef_tab);
SELECT * FROM r_coef_tab;
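R's glm(..., family = poisson) fits log(E[SUCC_PREC]) = b0 + b1 · CHANGE_FREQ by iteratively reweighted least squares (IRLS), which for the canonical log link coincides with Newton-Raphson. To make what the LR function returns more concrete, here is a minimal pure-Python sketch of that fit for a single predictor; the function name, starting values and stopping rule are our own assumptions and are not part of HANA or R.

```python
import math


def poisson_regression(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson / IRLS (Poisson family, log link).

    Returns (b0, b1), analogous to INTERCEPT and CHANGEFREQ in the R code above.
    Assumes x is not constant and y has at least one positive count.
    """
    # Start from the intercept-only fit: b0 = log(mean(y)), b1 = 0
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]  # fitted means
        # Score vector X^T (y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information X^T diag(mu) X (symmetric 2x2)
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system for the Newton step
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

On the same data, the returned pair should agree with INTERCEPT and CHANGEFREQ in r_coef_tab up to convergence tolerance; glm additionally reports standard errors and deviance, which this sketch omits.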
Business Function Library
The Business Function Library (BFL) is the calculation library for applications built on top of NewDB. It resides in the NewDB Calculation Engine, consists of many business functions executing at the NewDB layer, and is written in C++.
Design Goals
Significant performance improvements for SAP apps:
1. Utilizing new hardware (e.g. multi-core CPUs, built-in vector engines)
2. Massively parallel main-memory processing
3. Changing the boundaries between the application server and the data management layer
Simplification of the application programming model:
1. Usage of extended SQL (SQLScript)
2. Rich functionality in the Calculation Engine
3. Quick app delivery
BFL Wiki
BFL Governance
Adam Their, Ralf Ehret, Wen-Syan Li, Volkmar Soehner (LiveCache, Planning Eng.)
Kai Stammerjohann, Nico Bohnsack, Volkmar Soehner, Peter Goertz, Thorsten Glebe, Franz Faerber, Daniel Booß, Andrei Suvernev
Volkmar Soehner (LiveCache, Planning Eng., …), Wen-Syan Li (BFL)
BFL Framework
BFL Framework:
Core Service + Runtime Environment: will reside in NewDB and can configure, plug in, and invoke BFL. With the core service, application teams can build BFL without the whole NewDB code base.
Future Release
As one proposal, we plan to develop a BDK (BFL Development Kit) as the BFL development environment, and a BRE (BFL Runtime Environment) for BFL runtime services, including memory allocation, error handling, and so on.
BDK plus BRE is the future BFL framework. With the new framework, clients do not need to interact directly with the NewDB development environment.
Support stateful execution of each function.
L Language
“L” is a language tailored to NewDB by SAP.
The programming language L is intended as a robust, low-level, high-performance programming language inside NewDB.
“L” can be described as a safe subset of C++ with NewDB data types and additional support for processing table-like data.
“L” provides direct access to the table and column objects used in the Calculation Engine.
L Language
Llang: The L Programming Language
L Language
Type Mappings
SQL Type | Column Store Type | L Null Type | L Non-Null Type | L Raw Type | Notes on L Type
 | | NullBool | Bool | | Size
TINYINT | INT | NullInt32 | Int32 | |
SMALLINT | INT | NullInt32 | Int32 | |
INTEGER | INT | NullInt32 | Int32 | |
BIGINT | FIXED8 | NullFixed8<0> | Fixed8<0> | | default 8 bytes length
REAL | FLOAT | NullFloat | Float | RawFloat |
DOUBLE | DOUBLE | NullDouble | Double | RawDouble |
DATE | DAYDATE | NullDate | Date | |
CHAR(a) | FIXEDSTRING(a) | NullFixedString<a> | FixedString<a> | |
… | … | … | … | … | …
L Language
Embedding L code in SQLScript
SQLScript
SQL is the main interface to applications. NewDB supports standard SQL with a set of NewDB-specific extensions.
SQLScript is a new language for processing application-specific code in the database layer. The main goal of SQLScript is to allow data-intensive calculations to be executed inside NewDB.
The main concept in SQLScript is the function. SQLScript functions can have multiple input and output parameters. They are composed of calls to other functions and of SQL queries.
Intermediate results can be assigned to variables that are local to the function. Basic control flow is possible via if/else clauses, and error handling is supported via try/catch blocks.
Recursion (direct or indirect) is not allowed. An SQLScript function is free of side effects: it computes the values of its output parameters but modifies no other data, so DELETE, UPDATE, and INSERT statements are not allowed inside SQLScript functions. These restrictions ensure that two function calls that are not connected via data flows can be executed in parallel.
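Because SQLScript functions have no side effects, a scheduler needs only the data-flow graph to decide which calls may run concurrently. The following Python sketch (an illustration under that assumption, not the NewDB calculation engine) groups calls into stages so that calls in the same stage share no data flow; the call names in the example are hypothetical, loosely based on the carShop scenario.

```python
def parallel_stages(deps):
    """Group side-effect-free function calls into stages that may run in parallel.

    deps maps every call name to the set of call names whose outputs it reads.
    Calls in the same stage are not connected by any data flow.
    """
    remaining = {name: set(inputs) for name, inputs in deps.items()}
    done, stages = set(), []
    while remaining:
        # A call is ready once every call it reads from has finished
        ready = sorted(name for name, inputs in remaining.items() if inputs <= done)
        if not ready:
            # Only a cycle (or an unknown input name) can block progress;
            # SQLScript rules out the cycle case by forbidding recursion
            raise ValueError("cyclic or unresolved dependency")
        stages.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return stages


if __name__ == "__main__":
    deps = {
        "load_sales": set(),
        "load_customers": set(),
        "sales_by_city": {"load_sales"},
        "customer_clusters": {"load_customers"},
        "report": {"sales_by_city", "customer_clusters"},
    }
    print(parallel_stages(deps))
    # prints [['load_customers', 'load_sales'], ['customer_clusters', 'sales_by_city'], ['report']]
```

The two load calls and the two intermediate calls each form an independent pair, so each pair can run side by side; only the final report has to wait for both branches.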
SQLScript
Datatype Extension
SQLScript’s datatype extension also allows the definition of table types. These table types are used to define parameters for functions.
A table type is created using the CREATE TABLE TYPE statement.
Functional Extension
The functional extension allows users to describe complex data-flow logic using side-effect-free table functions.
Functions can be created using CREATE FUNCTION and dropped using DROP FUNCTION.
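As a rough analogy only (Python, not SQLScript syntax), a table type plays the role of a typed row layout, and a side-effect-free table function reads an input table of one type and returns a new table of another. The row classes below borrow the SUCC_PREC_TYPE column names from the SQLScript + R example; their column types, the AvgSuccPrecRow output type, and the avg_by_freq function are hypothetical.

```python
from typing import NamedTuple, Tuple


class SuccPrecRow(NamedTuple):
    """Row layout standing in for SUCC_PREC_TYPE (column types assumed)."""
    CHANGE_FREQ: int
    SUCC_PREC: int


class AvgSuccPrecRow(NamedTuple):
    """Hypothetical output table type: one row per CHANGE_FREQ value."""
    CHANGE_FREQ: int
    AVG_SUCC_PREC: float


def avg_by_freq(input1: Tuple[SuccPrecRow, ...]) -> Tuple[AvgSuccPrecRow, ...]:
    """Table function without side effects: reads input1, returns a new table."""
    totals = {}  # CHANGE_FREQ -> (sum of SUCC_PREC, row count); local state only
    for row in input1:
        s, n = totals.get(row.CHANGE_FREQ, (0, 0))
        totals[row.CHANGE_FREQ] = (s + row.SUCC_PREC, n + 1)
    # input1 is an immutable tuple, so the caller's table cannot be modified
    return tuple(AvgSuccPrecRow(f, s / n) for f, (s, n) in sorted(totals.items()))
```

Using immutable tuples for tables mirrors the rule that a function may only produce its outputs, never change its inputs or other data.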
SQLScript
Functional Extension
Built-in Functions
There are different categories of built-in functions:
• Tracing and debugging
• Data source access
• Relational operators
SQLScript
SQLScript version 2
Coming soon (in about a week)
Support for looping (flow control) statements
…
Comparisons
Criterion | IMSL | R | BFL | L | SQLScript
Open source? | No | Yes | No | No | No
Directly called by clients | Via L, via SQLScript, Excel (soon) | Via SQLScript, via R console, Excel (soon) | Via SQLScript/L, Excel | No | Yes
“Known” limitations | Does not comply with IM-DB governance | Not particularly efficient in handling large data sets | Limited availability | Predefined input and output | No flow control
Parallelism: Limited via OpenMP; Limited via OpenMP etc.; Yes; No; Partially; Yes?
Our suggestions
1. Use SQLScript as much as possible, because:
   - it is reasonably safer than C
   - you control the development process independently of NewDB
   - it is good for reporting and simple aggregation
2. Use R if:
   - you need to develop algorithms and need to interact with the data
   - you need quick prototyping, a PoC, or analysis of a small data set
   - you want flow control, a GUI, and debugging tools
3. Use IMSL if the computation is complex.
4. Use BFL/PAL if the computation is complex, the data set is large, and the algorithms need customization; also if product-level quality is needed, or if support for partitioned tables and clusters is needed.
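The suggestions above can be read as a simple decision rule. The Python sketch below is one possible encoding; the parameter names and the precedence (BFL/PAL criteria checked first, SQLScript as the fallback) are our own assumptions, since the slide does not say how overlapping cases should be resolved.

```python
def suggest_language(complex_computation=False, large_data=False,
                     needs_customization=False, product_quality=False,
                     partitioned_or_cluster=False, algorithm_development=False):
    """Rule of thumb paraphrasing the suggestions; the order of checks is assumed."""
    # 4. BFL/PAL: complex + large + customized algorithms, or product-level
    #    quality, or partitioned tables / cluster support
    if ((complex_computation and large_data and needs_customization)
            or product_quality or partitioned_or_cluster):
        return "BFL/PAL"
    # 3. IMSL: other complex computations
    if complex_computation:
        return "IMSL"
    # 2. R: algorithm development and interactive analysis of small data sets
    if algorithm_development and not large_data:
        return "R"
    # 1. Default: SQLScript as much as possible
    return "SQLScript"
```

For example, a complex computation on a large data set without customized algorithms falls through to IMSL under this ordering; a team with different priorities could reorder the checks.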
Demo
CarShop V1
CarShop, V2
In the next version of CarShop, we will consider the following features to enrich its functionality and make it more useful for internal SAP users:
• SQLScript V2 (control flow)
• Transaction support
• Planning capability via the planning engine
• Reduced memory footprint during execution
• Map/reduce support on clusters (HANA 1.5)
• “Best practices” for selecting the right languages to implement applications
• Testing-related features? Cancel flag, profiling, …
Q / A ?