kettle palo


Upload: sakchai-padungpattanodom

Post on 25-May-2017


Page 1: Kettle Palo

Contents

Introduction
Palo connections
Palo input step
Palo output step
Palo engine step
    Palo engine step usage and configuration
    Working with rules repository
Engine rules
    Rule execution
    Rule Node
    Parameters and output stream
    Assignment Node
    Call Node
    Condition Node
    Enumeration Node
    Immediate and buffered rule execution
    Nested rules
Expression Syntax
    Numeric operations and functions
    String operations and functions
    Logical operations and functions
    Sets of values
    Dimension element operations and functions
    Cube cells operations
    Aggregate function
    Working with input and output streams
    Database operations
    Logging functions
Error messages description
    Compiler errors
    Evaluating errors
    Engine errors
    Rule reading errors
Examples
    Exporting cube structure to stream
    Create cube by given structure
    Currency exchange example
    Cube examples
    Creating and filling Germany cube
    Creating and filling cube for other countries
    Creating and filling global cube
    Creating and filling analytical cube

Introduction

This document describes in detail how to process data in Palo cubes using Kettle transformations. The processing is performed by dedicated Kettle extensions. These software modules allow connecting to Palo servers, reading data from them, and writing data to them. The software is well integrated with other Kettle tools, so it can be combined with existing Kettle steps (for example, loading data from a flat file). Finally, it supports more complex data manipulation through the Palo Rule Engine (PRE) in Kettle transformations. This document covers the following topics:

- How to connect to Palo servers in Kettle transformations.
- How to get data from Palo servers in Kettle transformations.
- How to put data to Palo servers in Kettle transformations.
- How to use PRE in Kettle transformations and write PRE rules for complex data processing in Kettle transformations.

Palo connections

To process Palo data in a Kettle transformation, you must describe each connection to a Palo server (or to several servers). There is no significant difference between a Palo connection and a connection to any other database server (for example, Oracle). You can therefore create a new connection with Kettle’s Connection Wizard or enter the connection parameters in the Kettle connection dialog. Naturally, you can create several Palo connections in the same Kettle transformation. To configure a connection to one Palo server, specify the following:

- Connection name. This field does not affect the Palo connection itself; it only identifies this connection among other connections. For example, the connection name is used in the configuration of Kettle transformation steps (see below).
- Palo connection type.
- Network address and port, because the Palo server may be a remote server.
- Database name, because one Palo server can manage several databases. Each database can contain several data cubes, and all of them can be processed through the same Palo connection.
- Authentication parameters (user name and password).

The following figure illustrates creating (or modifying) a Palo connection.

Fig 1. Palo connection dialog

After setting the common, mandatory parameters, you may set Palo-specific parameters on the Palo tab page of that dialog. In some cases setting them correctly is a strict requirement, so please be careful with them.

Fig 2. Palo-specific parameters

Palo 1.0 supports only one interface, called ‘Legacy’. Palo 1.5 additionally supports a new HTTP-based API. Each interface is served by its own access driver, so switching between access methods may improve performance. In most cases the HTTP connection interface is the faster one. The connection dialog contains buttons for working with an already configured connection:

- The “Test” button opens a connection to the specified database server. If the connection succeeds, a confirmation dialog is shown. If the connection cannot be established, the module shows a detailed description of the error that occurred.
- The “Explore” button allows viewing the database contents. This is helpful when the user wants information about a Palo database. The following figure illustrates this.

Fig 3. Palo database explorer

The dialog window that appears contains detailed information about database cubes, dimensions and elements. However, this window does not show cube contents; other software must be used for that task.

- The “Feature List” button provides more structured information about the connection. The following figure illustrates this.

Fig 4. Palo connection feature list

Palo input step

The Palo input step reads data from a Palo cube so that it can be processed in a Kettle transformation. The step scans the cells of the Palo cube and generates one data row for each non-empty cell. Each row is constructed as follows:

1. If the cube has N dimensions, each output row has N+1 fields.
2. The first N fields correspond to the cube dimensions and contain the names of elements from those dimensions. These fields have string type, because element names are strings.
3. The last field contains the value of the cube cell. This value can be of string or numeric type, depending on the elements in each dimension. Moreover, a cube can contain both string and numeric values, in which case the type of the value field cannot be specified unambiguously.
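The row layout described above can be sketched in Python. This is only an illustration, not the step's actual implementation: the cube is modelled as a plain dictionary keyed by element-name tuples, an empty cell is assumed to be represented by `None`, and the cube contents are hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple, Union

CellValue = Optional[Union[float, str]]

def cube_rows(cells: Dict[Tuple[str, ...], CellValue]) -> Iterator[tuple]:
    """Yield one row per non-empty cell: N element names followed by the cell value."""
    for coordinates, value in cells.items():
        if value is None:  # assumption: None marks an empty cell
            continue
        yield (*coordinates, value)

# Hypothetical 2-dimensional cube (Product x Month); the names are illustrative only.
cube = {
    ("Desktop", "Jan"): 120.0,
    ("Desktop", "Feb"): None,
    ("Notebook", "Jan"): "n/a",
}

rows = list(cube_rows(cube))
for row in rows:
    print(row)
```

With N = 2 dimensions every row has three fields, and the value field may be numeric in one row and a string in another, which is why its type cannot always be fixed in advance.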

The generated stream can be processed by other Kettle steps; for example, it can be saved to a comma-separated flat file. The following figure illustrates the Palo input step configuration window.

Fig 5. Palo input step configuration

The configuration window allows specifying the following parameters:

1. Step name. This value is used only to identify the Palo input step among other transformation steps.
2. Palo connection and cube name. The Palo input step reads data from this Palo cube. Two buttons next to the connection combo box allow configuring an existing connection or creating a new one.
3. Field names in the generated output stream. A name must be specified for each cube dimension (in the table) and for the value field.

The Palo input step automatically estimates the size of the output stream by multiplying the element counts of all cube dimensions and shows the result as the “Combination factor”. In practice this value is much larger than the real output stream size, because only non-empty cells produce rows. However, it indicates exactly how many cube cells will be scanned and so allows estimating the duration of the operation.

The scan time can be reduced by specifying a single element in any dimension. In that case the Palo input step does not go through all elements of that dimension. Either a simple element or a consolidated element may be specified. In the first case the Palo input step processes only that single element of the dimension; in the second case it processes all children of the specified element. Click in the elements column to open a window for selecting an element from the corresponding dimension.

Fig 6. Selecting element from dimension

The user can select any element in the displayed dimension tree or click the “Clean” button. Clicking this button clears the element selection for the dimension, so the Palo input step will scan all elements of that dimension. Finally, the user can click the ‘refresh’ button to refresh the contents of the displayed tree if required.
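The combination factor and the effect of fixing one dimension can be illustrated with a small calculation. The following Python sketch assumes a hypothetical cube; the dimension names and element counts are invented for the example.

```python
from math import prod

def combination_factor(element_counts):
    """Number of cube cells scanned: the product of the element counts per dimension."""
    return prod(element_counts)

# Hypothetical cube with three dimensions; names and sizes are illustrative only.
full_scan = {"Products": 20, "Regions": 10, "Months": 12}
print(combination_factor(full_scan.values()))

# Fixing "Months" to one simple element scans a single element of that dimension.
one_month = dict(full_scan, Months=1)
print(combination_factor(one_month.values()))

# Fixing "Months" to a consolidated element with 3 children scans those 3 children.
one_quarter = dict(full_scan, Months=3)
print(combination_factor(one_quarter.values()))
```

Restricting a single dimension shrinks the number of scanned cells in proportion to that dimension's size, which is why selecting an element can shorten the scan considerably.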

Palo output step

The Palo output step writes data from the input data stream to a Palo server. It takes rows from the input Kettle data stream and treats them as follows. One field is treated as the cell field and contains the cell data. The other fields are treated as dimension elements. Thus each row of the input stream represents the data of a single cube cell. If a specified element does not exist, the Palo output step can either create it or skip the whole row. The following figure illustrates the Palo output step configuration dialog.

Fig 7. Palo output step configuration

This dialog allows specifying the following parameters:

1. Transformation step name.
2. Palo connection. Two buttons next to the connection combo box allow configuring an existing connection or creating a new one.
3. The mapping between input row fields and cube dimensions.
4. How to handle an element that does not exist in its dimension. There are three possible actions: add numeric elements, add string elements, or skip the whole row. A cube cell contains a numeric value only if all of its corresponding elements are numeric. Otherwise, if any element is a string element, all corresponding cells contain string values.
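The three ways of handling a missing element can be sketched as follows. This is a simplified model under stated assumptions, not the step's real code: dimensions are plain dictionaries mapping element names to an element kind, the cube is a dictionary of cells, and all names (`MissingElement`, `write_rows`) are hypothetical.

```python
from enum import Enum

class MissingElement(Enum):
    ADD_NUMERIC = "add numeric elements"
    ADD_STRING = "add string elements"
    SKIP_ROW = "skip whole row"

def write_rows(rows, dimensions, cells, policy):
    """Write rows (N element names + value) into cells; return the number of rows written."""
    written = 0
    for *names, value in rows:
        # Elements named in the row that are absent from their dimension.
        missing = [(dim, name) for dim, name in zip(dimensions, names) if name not in dim]
        if missing and policy is MissingElement.SKIP_ROW:
            continue
        kind = "numeric" if policy is MissingElement.ADD_NUMERIC else "string"
        for dim, name in missing:
            dim[name] = kind  # create the missing element
        cells[tuple(names)] = value
        written += 1
    return written

products = {"Desktop": "numeric"}
months = {"Jan": "numeric"}
rows = [("Desktop", "Jan", 10.0), ("Tablet", "Jan", 5.0)]

cells = {}
print(write_rows(rows, [products, months], cells, MissingElement.SKIP_ROW))
```

With the skip policy only the first row is written, because "Tablet" does not exist in the product dimension; with either add policy the element would be created and both rows written.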

Palo engine step

Palo engine step usage and configuration

The Palo input and Palo output steps provide basic tools for processing data in Palo cubes. However, these steps can only copy data and cannot perform even simple data modifications. The Palo engine step supports more complex data modification, for example converting prices from one currency to another. This step takes a row from the input, processes it with the specified rule, generates another row and puts it into the output stream. Palo rules can have additional parameters, and a value must be specified for each of them. Finally, a Palo rule can read data from a Palo cube and write data to a Palo cube directly, not through the input or output streams. This may be necessary in more complex data transformations. In this case the Palo rule refers to cubes by aliases, and a mapping between these aliases and real Palo connections must be set up. The following figures illustrate the Palo engine step dialog.

Fig 8. Palo engine step dialog. Structure of output stream


Fig 9. Palo engine step dialog. Rule parameters

Fig 10. Palo engine step dialog. Connections with Palo cubes

This dialog window has elements for specifying:

1. The Kettle transformation step name.
2. The rule that analyses the data of the input row and calculates the fields of the output row. The ‘Browse’ button allows viewing the contents of the rules repository and selecting a rule (see below).
3. The ‘Fields’ tab shows the structure of the output stream: the number of fields and the name and type of each field.
4. The ‘Parameters’ tab allows viewing the rule parameters and specifying their values.
5. The ‘Connections’ tab allows viewing connections and setting up the mapping between cube aliases and real connections to Palo servers.

Working with rules repository

When the user clicks the ‘Browse’ button in the Palo engine step dialog, the rules repository dialog opens and shows the contents of the repository.

Fig 11. Rules repository

New rules can be created, and existing rules or functions can be deleted or modified. Buttons allow selecting a rule, cancelling the selection, or refreshing the window contents. When the user creates a new rule or modifies an existing one, Kettle opens the rule modification dialog (see next figure).

Fig 12. Rule modification window

The rule name, description and content can be modified. The rule content is an XML document that describes how data is modified. The syntax and meaning of this XML document are described in the following parts. The window also has buttons for checking the rule and for viewing the fields of the input stream.

Engine rules

As described above, the rules repository contains user-defined rules that can be used for data processing in Palo engine transformation steps. Palo rules can get data from the input stream, from rule configuration parameters and directly from Palo cubes. Similarly, rules can put data to the output stream or directly to a Palo cube.

Rule execution

A rule has a name that identifies it in the repository and may have a description containing any text. This text can hold additional comments on the rule that are helpful for the user. Each rule can be represented as a multi-branch tree with a single root node. Each node has one of several types and may have additional parameters that depend on the node type. Executing a node performs some actions, which depend on the node type and on the additional node parameters. In most cases the additional node parameters are expressions that calculate output or intermediate data. Executing a rule means executing its root node. A rule is easily represented as an XML document. Each rule node is represented as an XML element with specific attributes and child elements. The element name depends on the node type. All currently supported node types are described below.

Rule Node

The root node of a rule is always a node of type rule. The corresponding XML element is named ‘rule’ and has the following format:

<rule name="...." description="..." immediately="true/false">
    ...
</rule>

It always has the attribute ‘name’, which contains the name of the rule, and can have two additional attributes. The first, ‘description’, contains a free-form user comment for the rule. The second, ‘immediately’, defines how the rule stores data to Palo cubes or to the output stream. If this attribute has the value ‘true’, the rule engine stores data immediately. Otherwise the Palo engine caches data modifications and stores the cache contents only when rule execution has finished. Immediate and buffered execution are described in a later section. Executing a rule node consists of executing each child node in order, from the first to the last.

Parameters and output stream

A Palo rule can take additional input parameters and generate an output stream. Each input parameter must be declared by a special XML element:

<param name="..." type="...."/>

Each parameter has a unique name and a type. The parameter type can be “numeric”, “string” or “boolean”. In addition, if a rule works with Palo cubes, every cube that the rule can use must be specified. For this purpose each cube must be declared as a parameter with the special type “connection”. Similarly, if a rule generates an output stream, the structure of the output stream must be declared. Each stream field must be declared by a special XML element:

<output name="..." type="...."/>

The output field type can be “numeric”, “string” or “boolean”.

Assignment Node

This node type sets values in the output stream or in Palo cubes. In more familiar terms, it implements the assignment operator. In the XML document this node type has the following structure:

<set cell="..." expression="..." description="..." />

This node type has two additional parameters. The first, ‘cell’, contains an expression that determines which cell (or output value) will be modified. This expression can refer to the output stream, a cube cell, a consolidation factor and so on. In most cases it uses special functions that can be used only in this expression. These functions are described in the following sections. The second parameter, ‘expression’, contains an expression that calculates the new value of the modified data. Both expressions must have the same value type. Finally, an assignment node can have a description attribute for user comments. An assignment node cannot have child nodes. Executing an assignment node consists of evaluating the source and destination expressions. After that, if the rule stores data immediately, the node writes the data to the cube or output stream. If the rule does not store data immediately, the node stores the calculated value in the cache. Immediate and buffered execution are described in a later section.

Call Node

This node type calls functions that perform operations such as creating a new Palo cube. In the XML document this node type has the following structure:

<call expression="..." description="..." />

The first parameter (expression) holds an expression that refers to a function and describes how to evaluate its parameter values. The second parameter holds the description attribute of this rule node. A call node cannot have child nodes. Executing a call node consists of evaluating the given expression, that is, calculating the function parameters and executing the specified function. A call node can perform the specified action in immediate or buffered mode.

Condition Node

This node type defines execution branches in a rule. In the XML document a condition node has the following format:

<if condition="..." description="...">
    ...
</if>
<elseif condition="...">
</elseif>
<else>
</else>

A condition node always has a condition expression, which is stored in the corresponding attribute and must be a logical expression. Like other node types, a condition node can have a description attribute with user comments. Executing a condition node involves evaluating the condition. If the condition expression is true, the condition node executes its child nodes in order. If the condition expression is false, the condition node does nothing. The elseif branches and the else branch are checked in sequence when the if condition is false.

Enumeration Node

This node enumerates all items of a specified collection and performs actions for each value in it. In the XML document an enumeration node has the following format:

<foreach name="..." in="..." description="">
    ...
</foreach>

The first parameter (name) contains the name of a variable. This variable holds the current item of the collection. The second parameter (in) specifies the collection of elements; it is an expression with one of the following result types: set, vector, rule result, cell enumeration. An enumeration node contains child nodes, which describe the actions performed on the collection items. An enumeration node may be empty (without child nodes), but such a node does nothing.
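Because a rule is an XML tree, its structure can be inspected with any XML parser. The following Python sketch walks a skeleton rule and lists its nodes in execution order. The rule itself is hypothetical: its names are invented and every expression is left as a "..." placeholder, so it only shows how the node types nest and is not a working rule.

```python
import xml.etree.ElementTree as ET

# Skeleton rule for illustration only; expressions are placeholders, not valid PRE syntax.
RULE_XML = """
<rule name="demo" description="illustrative skeleton" immediately="false">
    <param name="p" type="numeric"/>
    <output name="o" type="numeric"/>
    <if condition="...">
        <set cell="..." expression="..."/>
    </if>
    <foreach name="item" in="...">
        <call expression="..."/>
    </foreach>
</rule>
"""

def walk(node, depth=0):
    """Yield (depth, tag) for the node and its children, first child to last."""
    yield depth, node.tag
    for child in node:
        yield from walk(child, depth + 1)

root = ET.fromstring(RULE_XML)
outline = list(walk(root))
for depth, tag in outline:
    print("  " * depth + tag)
```

Note that the XML attributes use straight double quotes; typographic quotes copied from a formatted document would make the rule unparsable.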

Immediate and buffered rule execution

The sections above describe call nodes and assignment nodes, which can be performed in immediate mode or buffered mode. The special attribute immediately can be specified on these XML nodes, and it must be specified on the rule node. The value of this attribute determines how the action is performed. If the value is “true”, actions are performed immediately. If the value is “false”, actions are buffered. Buffering makes some operations faster; for example, creating dimension elements is faster in buffered mode. Finally, if the immediately attribute is not specified on a call node or assignment node, the node follows the setting of the whole rule.

All buffered actions are performed after rule execution, but a rule can force all buffered actions to be performed earlier by calling the special function “processActions”. This is very helpful in some cases; for example, the database creation rule uses it (see Examples).
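The difference between the two modes can be modelled with a small write buffer. The Python sketch below is an assumption-laden illustration, not the engine's implementation: the class `ActionBuffer` and its methods are hypothetical, the target cube is a plain dictionary, and `process_actions` merely stands in for the engine function “processActions”.

```python
class ActionBuffer:
    """Applies writes at once or queues them, depending on the 'immediately' setting."""

    def __init__(self, target, immediately):
        self.target = target            # stands in for a cube or output stream
        self.immediately = immediately  # rule-level default
        self.pending = []

    def set_cell(self, key, value, immediately=None):
        # A node-level setting overrides the rule-level default when given.
        mode = self.immediately if immediately is None else immediately
        if mode:
            self.target[key] = value
        else:
            self.pending.append((key, value))

    def process_actions(self):
        """Apply all buffered writes in the order they were queued."""
        for key, value in self.pending:
            self.target[key] = value
        self.pending.clear()

cube = {}
rule = ActionBuffer(cube, immediately=False)
rule.set_cell(("Desktop", "Jan"), 1.0)                    # buffered by rule default
rule.set_cell(("Desktop", "Feb"), 2.0, immediately=True)  # node-level override
print(cube)
rule.process_actions()
print(cube)
```

Before the flush only the overridden write is visible in the cube; after it both cells are present. A rule that needs earlier results to be visible to later nodes would flush in the same way.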

Nested rules

Nested rules are rules that are defined inside another rule. They calculate something and can be executed several times from different node positions. They are very helpful in the following cases:

1. The same actions must be performed in several places.
2. Something must be calculated and then processed, but the calculation requires a complex node. The engine allows declaring variables but does not allow modifying their values. Modifying a variable would be convenient when its value is calculated by complex logic and depends on several conditions. In the Palo engine this can be programmed as follows: a special nested rule calculates the value, and the variable is declared with this value.
3. The rule is very complex, and it helps to split it into several simpler rules.

Nested rules have the same format as normal rules. They have a name, a description and input parameters, and they can generate an output stream. A nested rule can be called in any expression like any built-in engine function, but the result of a nested rule is a sequence of tuples. This sequence can be stored in a variable or processed by an enumeration node. Two special functions are available inside nested rules. The first writes data to the rule execution result and is named “RuleName_OutputRow”. The second restarts the nested rule with other parameter values and is named “RuleName_Restart”. The following simple example demonstrates the calculation of a factorial.

<rule name="F" immediately="true" description="">
    <param name="N" type="numeric"/>
    <output name="V" type="numeric"/>
    <if condition="N=1">
        <call expression="F_OutputRow(1)"/>
    </if>
    <if condition="N&gt;1">
        <!-- FComplex calculates N1! * V1 -->
        <rule name="FComplex" immediately="true" description="">
            <param name="N1" type="numeric"/>
            <param name="V1" type="numeric"/>
            <output name="result" type="numeric"/>
            <if condition="N1=1">
                <call expression="FComplex_OutputRow(V1)"/>
            </if>
            <if condition="N1&gt;1">
                <call expression="FComplex_Restart(N1-1,V1*N1)"/>
            </if>
        </rule>
        <!-- Copy the single result row of FComplex to the result of F -->
        <foreach name="value" in="FComplex(N,1)">
            <call expression="F_OutputRow(value.getNumeric(0))"/>
        </foreach>
    </if>
</rule>

The main rule F has one input parameter N and generates a stream with one field V. This stream always contains one tuple holding the factorial of the given N. If the given N equals one, the rule writes one to the result. Otherwise it uses the additional nested rule FComplex, which takes two parameters N1 and V1 and calculates N1!*V1. The internal logic of FComplex is very simple: if N1=1, the result of FComplex is V1; otherwise the result of FComplex equals (N1-1)!*V1*N1. Finally, rule F takes the result of FComplex and writes it to its own result. It is also possible to store the result of a nested rule in a variable and check it with the function “isEmpty”. This function returns true if the rule result is an empty sequence.
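The restart logic of FComplex can be traced with a short Python sketch. It assumes that a restart simply re-runs the nested rule with new parameter values, which is modelled here as a loop; the function names `f` and `f_complex` are stand-ins for the rules and are not part of the engine.

```python
def f_complex(n1, v1):
    """Model of FComplex: computes N1! * V1 and returns it as a one-row result."""
    while n1 > 1:
        # Restart with (N1-1, V1*N1), as in the description of FComplex.
        n1, v1 = n1 - 1, v1 * n1
    return [(v1,)]  # N1 = 1: emit V1 as the single result tuple

def f(n):
    """Model of rule F for N >= 1: a result sequence with one tuple holding N!."""
    if n == 1:
        return [(1,)]
    # Copy each row of the nested rule's result into F's own result.
    return [(row[0],) for row in f_complex(n, 1)]

print(f(5))
```

Each restart multiplies the accumulator by the current N1 and decreases N1 by one, so after N-1 restarts the accumulator holds N!.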


Expression Syntax

Most rule nodes work with expressions that determine how to analyze input data and calculate output data. The Palo rule engine (PRE) supports the data types listed below:

- Numeric data. A value of this type is a floating-point number.
- String data. A value of this type is a string. There are no restrictions on string content or length.
- Logic data (Boolean). A value of this type is a boolean flag.
- Single dimension elements. A value of this type is a single element from a specified dimension in a specified Palo database.
- Sets of dimension elements. A value of this type is a set of elements from a specified dimension in a specified Palo database.
- Sets of cube cells. A value of this type is a set of cells from a specified cube. In other words, this is a cube projection.

PRE expressions are strongly typed, so every expression has a definite type. This type depends on the operations and functions used in the expression. More importantly, it can always be determined during expression parsing and compilation.

PRE supports a standard set of operations for manipulating values, for example the standard mathematical operations written in infix form. The syntax and usage of these operations depend on the operand types, but some operations are independent of value types and have a fixed syntax: brackets can be used to raise the priority of an operation, and functions can be called. A function call has the following syntax:

function-name( arg1,arg2,....,argN )

A function name need not be unique, but a function is unambiguously identified by its name together with the count and types of its parameters. The function result depends on the parameter values and may also depend on external data that is not explicitly specified in the function arguments. For example, a function can get additional information from Palo cubes, the input stream and so on. In contrast, the function result type is determined during expression compilation. It depends on the argument count, the argument types and, possibly, the argument values if those values can be calculated during compilation.
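The way a function is identified by its name, parameter count and parameter types can be sketched as a lookup table. The Python code below is a hypothetical model and not the engine's implementation: the registry SIGNATURES, the type labels and the assumption that the digits and maxlen arguments of toString are numeric are all introduced for illustration only.

```python
# Hypothetical registry: (function name, argument types) -> result type.
SIGNATURES = {
    ("toString", ("numeric",)): "string",
    ("toString", ("set",)): "string",
    # Assumption: digits and maxlen are numeric arguments.
    ("toString", ("numeric", "numeric", "numeric")): "string",
    ("Abs", ("numeric",)): "numeric",
}


def result_type(name, arg_types):
    """Pick the overload by name, count and types; return its result type."""
    key = (name, tuple(arg_types))
    if key not in SIGNATURES:
        # A real compiler would report an error such as "Invalid function arguments".
        raise LookupError(f"no function {name} with argument types {key[1]}")
    return SIGNATURES[key]


print(result_type("toString", ["set"]))  # string
```

Since the lookup needs only the argument types, the result type is known before any value is calculated, which is what makes compile-time type checking possible.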

Numeric operations and functions

PRE supports the basic mathematical operations on numbers: addition, subtraction, multiplication and division. PRE also supports the standard mathematical functions listed in the following table.

Function and arguments    Description
Abs(x)                    Returns the absolute value of the given argument.
Int(x)                    Returns the integer value of the given argument.
Round(x)                  Rounds the value of the given argument.
Mod(x,y)                  Returns the remainder of dividing x by y.


String operations and functions

PRE supports basic operations for manipulating strings. First of all, PRE supports string concatenation, which has the following syntax:

A + B + C ...

Expressions A, B, C, ... must be string expressions, and the result of the whole expression is a string.

Secondly, PRE supports string comparison operations that compare the given values lexicographically (alphabetically). These operations have the following syntax:

A Operation B

Operation may be “<”, “<=”, “>”, “>=”, “=” or “!=”. All of these operations return a logical result.

PRE has several built-in functions for working with string values. These functions are listed in the following table.

Function and arguments            Description
isNumeric(str)                    Checks the given string and returns true if it represents a number. For example, isNumeric(‘123.45’)=true.
toNumeric(str)                    Converts the given string to a number, or generates an error if the string doesn’t represent a number.
toString(numval)                  Converts the given numeric argument to its string representation.
toString(setval)                  Converts the given Set argument to its string representation.
toString(numval,digits,maxlen)    Converts the given numeric argument to its string representation with the given precision.
toStringSet(set)                  Converts the given Set to its string representation, which is built from the string representations of each item of the argument.
toStringVector(str)               Converts the given Vector to its string representation, which is built from the string representations of each item of the argument.
toUpper(str)                      Converts the given string to upper case.
toLower(str)                      Converts the given string to lower case.
trimLeft(str)                     Trims leading spaces in the given string.
trimRight(str)                    Trims trailing spaces in the given string.
trimAll(str)                      Trims leading and trailing spaces in the given string.
getLength(str)                    Returns the length of the given string.
getMiddle(str,i,length)           Returns a part of the given string. Parameter ‘i’ is the start character of the part and parameter ‘length’ is its length. The function can generate an error.
getLeft(str,length)               Returns the left part of the given string. ‘length’ is the length of the required part. The function generates an error if ‘length’ is greater than the length of the string.
getRight(str,length)              Returns the right part of the given string. ‘length’ is the length of the required part. The function generates an error if ‘length’ is greater than the length of the string.
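The error condition of getLeft and getRight can be illustrated with a small model. The Python functions below are hypothetical stand-ins and not part of PRE; the exception type and the error message text are assumptions, and only the rule that a length greater than the string length is an error comes from the table.

```python
def get_left(text, length):
    """Model of getLeft(str,length): the first `length` characters."""
    if length > len(text):
        raise ValueError("length is greater than the length of the string")
    return text[:length]


def get_right(text, length):
    """Model of getRight(str,length): the last `length` characters."""
    if length > len(text):
        raise ValueError("length is greater than the length of the string")
    # Slice from an explicit start index so that length 0 gives an empty string.
    return text[len(text) - length:]


print(get_left("Kettle", 3), get_right("Kettle", 3))  # Ket tle
```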


Logical operations and functions

PRE supports the standard logical operations. Their syntax is similar to that of common programming languages, including C, C++, C# and Java.

Logical “and” has the following syntax:

A & B & C & ...

Expressions A, B, C must be logical expressions.

Logical “or” has the following syntax:

A | B | C ...

Expressions A, B, C must be logical expressions. Logical “and” has higher priority than logical “or”, so the expression “A | B & C” is compiled and calculated as “A | (B & C)”.

Logical “not” has the following syntax:

!A

Expression A must be a logical expression. Logical “not” has higher priority than logical “or” and logical “and”, so the expression “!A & B” is compiled and calculated as “(!A) & B”.

The condition operation selects one of two branches of expression calculation. It has the following syntax:

C ? A : B

Expression C must be a logical expression. Expressions A and B may be of any type, but they must have the same result type. For example, it is not allowed for expression A to give a string value while expression B gives a numeric value. The whole condition expression has the type of expression A.
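The priority rules matter because the alternative grouping gives a different answer for some inputs. The following Python sketch is an illustrative model only: the function names and the type labels are hypothetical, the groupings are written with explicit brackets, and the type check mirrors the rule stated above for the condition operation.

```python
from itertools import product


def pre_or_and(a, b, c):
    """Documented grouping of "A | B & C": "and" binds tighter than "or"."""
    return a or (b and c)


def pre_not_and(a, b):
    """Documented grouping of "!A & B": "not" binds tightest."""
    return (not a) and b


# Inputs for which the other possible grouping would give a different result.
or_and_diff = [(a, b, c) for a, b, c in product([False, True], repeat=3)
               if pre_or_and(a, b, c) != ((a or b) and c)]
not_and_diff = [(a, b) for a, b in product([False, True], repeat=2)
                if pre_not_and(a, b) != (not (a and b))]


def conditional_type(cond_type, a_type, b_type):
    """Result type of "C ? A : B" under the rules stated in the text."""
    if cond_type != "logical":
        raise TypeError("condition expression isn't a logical expression")
    if a_type != b_type:
        raise TypeError("branches of the condition have different types")
    return a_type


print(or_and_diff)   # [(True, False, False), (True, True, False)]
print(not_and_diff)  # [(False, False), (True, False)]
```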

Sets of values

PRE provides several operations for manipulating sets of values. Sets do not differ from other values, so it is possible to create sets of sets of any values. The set operations are described below.

The set construction operator has the following syntax:

{ item1, item2, item3 ,... }

In this expression all items must have the same value type, but expressions may be used instead of constant values. The result of the whole expression is the set of the listed values.

The set union operation has the same syntax as numeric addition:

A + B + C ...

Expressions A, B, C must have set type. The result of the whole expression is the union of the given arguments.

The set intersection operation has the same syntax as numeric multiplication:

A * B * C ...

Expressions A, B, C must have set type. The result of the whole expression is the intersection of the given arguments.

The set subtraction operation has the same syntax as numeric subtraction:

A - B - C ...

Expressions A, B, C must have set type. The result of the whole expression is the difference of the given arguments.

Finally, a single element can be taken from a set by using the following syntax:

A[index]

In this expression A is a function or variable that returns a set, and index is the numeric index of the element.

All set items can be enumerated with an enumeration node. Moreover, PRE has several functions for working with sets of values. These functions are listed in the following table.

Function and arguments           Description
getSize(set)                     Returns the count of elements in the given set.
isEmpty(set)                     Checks the given set and returns true if it is empty.
contains(set,elem)               Checks the given set and returns true if it contains the given element.
includes(set,subset)             Checks the given set and returns true if it contains each element of the given subset.
getEmpty(value)                  Creates an empty set that can contain values of the same type as the given value.
indexOf(set,value)               Returns the index of the given value in the given set.
getNumericRange(start,finish)    Returns the set of values in the range (start, finish).
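The set operators can be compared with the set type of a general-purpose language. The Python sketch below is only an analogy under stated assumptions: Python's "|", "&" and "-" stand in for the PRE operators "+", "*" and "-", and because Python sets are unordered, indexing (A[index]) and indexOf are not modeled here.

```python
a = {1, 2, 3}
b = {3, 4}
c = {2, 3}

union = a | b | c         # PRE: a + b + c
intersection = a & b & c  # PRE: a * b * c
difference = a - b - c    # PRE: a - b - c

size = len(a)             # PRE: getSize(a)
is_empty = not a          # PRE: isEmpty(a)
has_two = 2 in a          # PRE: contains(a, 2)
has_subset = c <= a       # PRE: includes(a, c)

print(sorted(union), sorted(intersection), sorted(difference))
# [1, 2, 3, 4] [3] [1]
```

Subtraction is applied from left to right, so a - b - c first removes the elements of b from a and then the elements of c from that result.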

Dimension element operations and functions

PRE allows cube dimension elements to be used in expressions. The following syntax refers to a single element:

Database.Dimension.Element

Similarly, the set of all elements of a dimension can be created with the following expression:

Database.Dimension

PRE has several built-in functions for working with dimension elements. These functions are listed in the following table.

Function and arguments                                   Description
getElementName(elem)                                     Returns the name of the given element.
getElementType(elem)                                     Returns the type of the given element. The type is a string value and may be one of the following: consolidated, numeric, string, rule.
getChildren(element)                                     Returns the set of children of the dimension element.
getParents(elem)                                         Returns the set of all consolidated elements that contain the specified element.
getDimensionElements(connection, dimension)              Returns the set of all elements of the dimension.
getWholeDimension(elem)                                  Returns the set of all elements of the dimension identified by one of its elements or by a set of its elements.
getDimensionElement(connection, dimension, element)      Returns the specified element of the dimension.
checkDimensionElement(connection, dimension, element)    Checks whether the specified element exists in the dimension.

Cube cells operations

PRE allows working with single cube cells and with sets of cube cells. A set of cube cells serves as a cube projection with one or more elements fixed on each cube dimension. So, a set of cube cells can be the whole cube, a subset of cube cells, a single cube cell or an empty projection. The following expression gives the set of all cube cells:

Database.Cube

Use the indexing operator to get a cube projection. It has the following syntax:

Projection[elem1,elem2,elem3,...]

Projection must be an existing cube projection (the whole cube, a variable or a function result). The elements must be single elements from the same cube dimension. The result of the expression is a cube projection that contains only those cells of the source projection that correspond to the specified dimension elements.

PRE supports the following functions for working with cube projections.

Function and arguments               Description
getCube(connectionName, cubeName)    Returns the cube with the given name using the given connection.
getCubeName(projection)              Returns the cube name of the given projection.
getDatabaseName(projection)          Returns the database name of the given projection.
getDimensionCount(projection)        Returns the count of dimensions in the cube of the given projection.
getDimensionName(cell,dim)           Returns the dimension name for the given projection and dimension index.
getDimensionElements(cell,dim)       Returns the set of fixed elements in the given projection for the given dimension index.
isSingleCell(projection)             Checks the projection and returns true if it contains a single element.
isNumericCells(projection)           Checks the projection and returns true if it contains only numeric elements.
isStringCells(projection)            Checks the projection and returns true if it contains only string elements.
NumericValue                         Returns the numeric value of the first projection element. This function can be used in set-cell engine items.
StringValue                          Returns the string value of the first projection element. This function can be used in set-cell engine items.
Cache(cell)                          Caches the values of the given cube or sub-cube.
Cache(cell,maxsize)                  Caches the values of the given cube or sub-cube, but limits the cache size.
PutValueToCache(cell,value)          Puts the value of the specified cell into the cache.
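The idea of narrowing a projection by fixing elements of one dimension can be sketched with an in-memory model. Everything in the Python code below is hypothetical: the cube with the dimensions Year and Region, its values, and the helpers project and is_single_cell are introduced for illustration and say nothing about how PRE stores or reads cells.

```python
# Hypothetical cube with dimensions (Year, Region); cells keyed by element tuples.
DIMENSIONS = {"Year": 0, "Region": 1}
CELLS = {
    ("2009", "East"): 10.0,
    ("2009", "West"): 20.0,
    ("2010", "East"): 30.0,
    ("2010", "West"): 40.0,
}


def project(cells, dimension, elements):
    """Model of Projection[elem1,elem2,...] with elements of one dimension."""
    position = DIMENSIONS[dimension]
    return {key: value for key, value in cells.items()
            if key[position] in elements}


def is_single_cell(cells):
    """Model of isSingleCell(projection)."""
    return len(cells) == 1


year_2009 = project(CELLS, "Year", {"2009"})      # two cells remain
one_cell = project(year_2009, "Region", {"East"})  # one cell remains
print(is_single_cell(year_2009), is_single_cell(one_cell))  # False True
```

Each indexing step only filters the cells of its source, so the same model covers the whole cube, a subset of cells, a single cell and an empty projection.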

Aggregate function

PRE supports the aggregate functions listed below. Each function takes one or three parameters. With one parameter, the parameter must be a set of values or a cube projection, and the function calculates its result from the values of the set or of the cube cells. With three parameters, the first must be a set of values or cube cells, the second must be a string constant that specifies a variable name, and the third must be a string constant that specifies an expression, which can use the variable named by the second parameter. The aggregate function binds each value from the given set or cube projection to the specified variable, calculates the specified expression for that value, and finally aggregates the calculated values into a single result.

Function and arguments    Description
Min(set)                  Calculates the minimum value.
Min(set,varname,exp)      Calculates the minimum of the expression values.
Max(set)                  Calculates the maximum value.
Max(set,varname,exp)      Calculates the maximum of the expression values.
Avg(set)                  Calculates the average value.
Avg(set,varname,exp)      Calculates the average of the expression values.
Sum(set)                  Calculates the sum of all values.
Sum(set,varname,exp)      Calculates the sum of the expression values.
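The three-parameter form can be read as "map, then aggregate". The Python sketch below models that reading under clear assumptions: the helper aggregate is hypothetical, Python expression syntax stands in for PRE expression syntax, and the expression string is evaluated with Python's eval purely for illustration.

```python
def aggregate(func, values, varname=None, exp=None):
    """Apply func to the values, or to exp calculated for each value."""
    if exp is not None:
        # Bind each value to the named variable and calculate the expression.
        values = [eval(exp, {"__builtins__": {}}, {varname: value})
                  for value in values]
    return func(list(values))


def avg(values):
    """Average of a non-empty list of numbers."""
    return sum(values) / len(values)


data = [1, 2, 3]
print(aggregate(sum, data))                # 6   like Sum(set)
print(aggregate(sum, data, "x", "x * x"))  # 14  like Sum(set,'x','x*x')
print(aggregate(max, data, "x", "10 - x")) # 9   like Max(set,'x','10-x')
print(aggregate(avg, data))                # 2.0 like Avg(set)
```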

Working with input and output streams

PRE can process data from Kettle streams and generate Kettle streams. For this purpose PRE has the functions listed below.

Function and arguments          Description
Input(fieldName)                Returns the value of the specified field in the row being processed of the default input stream.
Input(streamName, fieldName)    Returns the value of the specified field in the row being processed of the specified input stream.
Output(fieldName)               Must be used in a set-cell engine item; writes data to the specified field of the output row.
OutputRow(f1,f2,...fN)          Generates an additional output row with the given field values.
IsFirstRow()                    Checks whether the current row is the first row of the default input stream.
IsFirstRow(streamName)          Checks whether the current row is the first row of the specified input stream.
IsLastRow()                     Checks whether the current row is the last row of the default input stream.
IsLastRow(streamName)           Checks whether the current row is the last row of the specified input stream.
IsFinished()                    Checks whether the default input stream is finished.
IsFinished(streamName)          Checks whether the specified input stream is finished.
ProcessNext(streamName)         Tries to read the next row from the specified input stream.
CacheRows(streamName)           Allows caching of rows in the specified input stream.
FinishProcess(streamName)       Marks the stream as read.
Rows(streamName)                Reads all remaining rows of the specified stream.

Database operations

PRE provides several functions for working with the cubes and dimensions of a database.

Function and arguments
    Description

getDatabaseCubes(connect)
    Returns the set of cube names in the given Palo database. The database must be specified by the name of a rule connection.

createDatabaseCube(connect,name,dims)
    Creates a new cube from the given connection name, cube name and set of dimension names. This function can be used in set-call engine items only.

getDatabaseDimensions(connect)
    Returns the set of dimension names in the given Palo database. The database must be specified by the name of a rule connection.

createDatabaseDimension(connect,name)
    Creates a new dimension from the given connection name and dimension name. This function can be used in set-call engine items only.

getCubeDimensions(connect,cube)
    Returns the set of dimension names in the given Palo cube. This function is similar to getDatabaseDimensions, but returns only the dimension names that are used in the given cube. The function receives the connection name and the cube name as arguments.

getDimensionElementsByName(connect,name)
    Returns the set of element names in the specified dimension. The function receives the connection name and the dimension name as arguments.

createDimensionElement(connect,dim,name,type)
    Creates a new element. The function receives the connection name, the dimension name, the new element name and the element type. The last argument (type) must be one of the following values: numeric, string, consolidated. This function can be used in set-call engine items only.

createConsolidation(connect,dim,elem,elemParent,factor)
    Consolidates elements. The function receives the connection name, the dimension name, the child element name, the parent element name and the consolidation factor. After the function is executed, the specified child element becomes a child of the specified parent element. This function can be used in set-call engine items only.

consolidationFactor(elem,elemParent)
    Works with an already consolidated element. The function returns the consolidation factor if it is used in an expression, and sets the consolidation factor if it is used in a set-call engine item.

Logging functions

PRE supports several functions for writing messages to the log and controlling data processing.

Function and arguments    Description
logMinimal(args)          Writes a message to the log. The message is constructed by concatenating the string representations of the given arguments and is visible at the minimal logging level.
logBasic(args)            Writes a message to the log. The message is constructed by concatenating the string representations of the given arguments and is visible at the basic logging level.
logError(args)            Writes an error message to the log. The message is constructed by concatenating the string representations of the given arguments.
logDetailed(args)         Writes a detailed message to the log. The message is constructed by concatenating the string representations of the given arguments.
logDebug(args)            Writes a debug message to the log. The message is constructed by concatenating the string representations of the given arguments.
stop()                    Stops data processing.
processActions()          Processes delayed (buffered) actions.
incrementInputRows        Increments the count of read rows.
incrementOutputRows       Increments the count of written rows.
InputRowsCount            Returns the count of read rows.
OutputRowsCount           Returns the count of written rows.


Error messages description

PRE generates an error if it is unable to execute the specified rule. There are several possible causes:

- a syntax error occurs if an expression can’t be compiled;
- an evaluation error occurs if the value of an expression can’t be calculated, for example, if a specified index is out of range;
- an engine error occurs if all expressions are correct, but the specified action can’t be performed;
- a reader error occurs if the XML representation of the rule contains an error.

The following sections describe each possible error in detail.

Compiler errors

Code  Description
1   Syntax error. (Can’t parse expression)
    Occurs if the expression contains an evident syntax error, for example a missing closing bracket. In most cases the error description contains more detailed information about the error.
2   Internal parser error.
    An unknown error that occurs in unexpected cases.
3   Attempt to incorrect add.
    Occurs if the user attempts to add values that don’t support this operation. For example, the user tries to add a logic value to another logic value.
4   Attempt to incorrect sub.
    Occurs if the user attempts to subtract values that don’t support this operation. For example, the user tries to subtract a logic value from another logic value.
5   Add different types.
    Occurs if the user tries to add values of different types. For example, the user tries to add a numeric value to a string value.
6   Attempt to incorrect multiplication.
    Occurs if the user attempts to multiply values that don’t support this operation. For example, the user tries to multiply a logic value by another logic value.
7   Attempt to multiply different types.
    Occurs if the user tries to multiply values of different types. For example, the user tries to multiply a numeric value by a set of numeric values.
8   Attempt to and different types.
    Occurs if the user tries to perform logical “and” on values of different types, for example on a logic value and a numeric value.
9   Attempt incorrect and.
    Occurs if the user attempts to perform logical “and” on values that don’t support this operation.
10  Attempt incorrect not.
    Occurs if the user attempts to perform logical “not” on a value that doesn’t support this operation.
11  Attempt to or different types.
    Occurs if the user tries to perform logical “or” on values of different types, for example on a logic value and a numeric value.
12  Attempt incorrect or.
    Occurs if the user attempts to perform logical “or” on values that don’t support this operation.
13  Can’t compare values.
    Occurs if the user attempts to compare values that don’t support the specified comparison operation.
14  Constructed set contains values of different types.
    The user tries to create a set of values, but the specified set members have different types. For example, the user tries to put a numeric value and a string value into the same set.
15  Index is not supported for this type.
    The user tries to index a value whose type doesn’t support the indexing operation. For example, “i[10]” is invalid if “i” is a numeric variable.
16  Different indexing operations.
    The user has specified values of different types in the same index. For example, elements from different dimensions are written in the same index.
17  Result of condition expression has different types.
    The expression contains a condition operator, but the result type of this operator can’t be determined. For example, the user has written “(i>j)?1:{1}”. If i>j, the expression result is a numeric value, but if i<=j, the result is a set of numeric values.
18  Condition expression isn’t logical expression.
    The condition expression in a condition operator doesn’t give a boolean result. For example, “i?1:0” is an invalid expression if “i” isn’t a logical value.
19  Function or variable not found.
    Occurs if the user has specified an unknown function or variable name.
20  Unknown database object identifier...
    Occurs if the user has specified a database object that the compiler can’t resolve. For example, the user has written “A.B.C”, but “A” is not a connection name or “B” isn’t a database object name.
21  Invalid function arguments.
    The expression calls an existing function, but the function has a different parameter count or different parameter types.

Evaluating errors

Code  Description
1   Division by zero.
2   Dimension element not found.
3   Database not found.
4   Dimension not found.
5   Cube not found.
6   Can’t set value.
    The specified expression doesn’t represent any stored value, so a new value can’t be set.
7   Invalid function argument.
    The function has received an invalid argument value.
8   Unable to setup constant.
    Specified expression always is a constant or isn’t constant.
9   Operation not supported.
10  Can’t compile string.
    An aggregate function can’t compile the given expression.
11  Elements are not consolidated.
    The specified dimension elements are not consolidated.
12  Calculation error that doesn’t depend on the evaluator (some external error).
13  Can’t calculate function.
14  Can’t perform action.


Engine errors

Code  Description
1   Can’t compile left part of assignment.
    Can occur during engine initialization if the expression in the “cell” attribute of an assignment item can’t be compiled. The error description contains detailed information about the compiler error.
2   Can’t compile right part of assignment.
    Can occur during engine initialization if the expression in the “expression” attribute of an assignment item can’t be compiled. The error description contains detailed information about the compiler error.
3   Can’t calculate assigned value.
    Can occur during rule execution if the expression in an assignment item can’t be calculated. The error description contains detailed information about the evaluator error.
4   Can’t set value.
    Can occur during rule execution if a new value can’t be set for the specified expression (see the “cell” attribute of the assignment item). The error description contains detailed information about the evaluator error.
5   Can’t compile condition expression.
    Can occur during engine initialization if the expression in the “condition” attribute of a condition item can’t be compiled. The error description contains detailed information about the compiler error.
6   Can’t calculate condition.
    Can occur during rule execution if the expression in a condition item can’t be calculated. The error description contains detailed information about the evaluator error.
7   Invalid parameter name.
    Attempt to set the value of an unknown additional rule parameter.
8   Invalid parameter type.
    Attempt to set an incorrect value for an additional rule parameter.
9   Condition expression isn’t logic expression.
    Occurs if the specified expression in a condition item doesn’t return a logical value.
10  Can’t compile enumeration expression.
    Can occur during engine initialization if the expression in the “in” attribute of an enumeration item can’t be compiled. The error description contains detailed information about the compiler error.
11  Can’t calculate result of enumeration expression.
    Can occur during engine execution if the expression in the “in” attribute of an enumeration item can’t be calculated. The error description contains detailed information about the evaluator error.
12  Result of enumeration expression isn’t set of elements.
    Can occur during engine initialization. The expression in the “in” attribute of an enumeration item must return a set of values.
13  Can’t compile variable value.
    Can occur during engine initialization if the expression in the “expression” attribute of a variable item can’t be compiled. The error description contains detailed information about the compiler error.
14  Can’t calculate variable value.
    Can occur during engine initialization if the variable value in a variable item can’t be calculated. The error description contains detailed information about the evaluator error.
15  Variable already declared.
    Can occur during engine initialization if a variable with the specified name is already declared.
16  Can’t compile function call.
    Can occur during engine initialization if the expression in the “expression” attribute of a call item can’t be compiled. The error description contains detailed information about the compiler error.
17  Can’t make call.
    Occurs because of an internal function error. In most cases the error description gives detailed information about the cause.

Rule reading errors

Code  Description
1   Can’t read file.
    Occurs if the file can’t be read or its content isn’t a valid XML document.
2   Can’t write file.
3   Index out of range.
    Invalid index of a rule or function in the repository.
4   Not found.
    Can’t find a rule or function with the specified name.
5   Invalid node type.
    Attempt to create an unknown rule node type.
6   Call element can’t have children elements.
    In the XML representation, a <call> element can’t have child elements.
7   Assignment element can’t have children elements.
    In the XML representation, a <set-cell> (or <set>) element can’t have child elements.
8   Variable declaration element can’t have children elements.
    In the XML representation, a <set-var> (or <declare>) element can’t have child elements.
9   Unable to declare child rule.
10  There is no variable name.
    In the XML representation, a <set-var> (or <declare>) element has no “name” attribute.
11  There is no variable value.
    In the XML representation, a <set-var> (or <declare>) element has no “expression” attribute.
12  There is no cell addressing expression.
    In the XML representation, a <set-cell> (or <set>) element has no “cell” attribute.
13  There is no cell value.
    In the XML representation, a <set-cell> (or <set>) element has no “expression” attribute.
14  There is no rule name.
    In the XML representation, a <rule> element has no “name” attribute.
15  There is no parameter or output rule name.
    In the XML representation, a <param> or <output> element has no “name” attribute.
16  There is no function name.
    In the XML representation, a <function> element has no “name” attribute.
17  Invalid XML element.
    There is an unknown element in the XML representation.
18  There is no condition expression.
    In the XML representation, an <if> element has no “condition” attribute.
19  There is no enumeration variable name.
    In the XML representation, a <foreach> element has no “in” attribute.
20  There is no enumeration value.
    In the XML representation, a <foreach> element has no “name” attribute.
22  Can’t connect to repository.
23  There is no rule description.
    In the XML representation, a <rule> element has no “description” attribute.
24  There is no function description.
    In the XML representation, a <function> element has no “description” attribute.
25  There is no parameter type.
    In the XML representation, a <param> or <output> element has no “type” attribute.
26  Invalid parameter type.
    In the XML representation, a <param> or <output> element contains an invalid value in the “type” attribute.


Examples

Exporting cube structure to stream

This example demonstrates how to represent a cube structure in a Kettle stream and how to generate this stream with a Palo Rule step. For this, the Kettle transformation must contain a Palo Rule step that gets data from the Palo server and generates a data stream. This step must use a special rule that takes a Palo connection and a Palo cube as input parameters. The generated output stream may be directed to a flat file or to any other Kettle step. For example, it may be directed to another Palo Rule step that generates a new cube from the given description. The stream generation rule is described after the detailed description of the cube structure stream.

The cube structure description includes information about each cube dimension and each element in the dimensions. Each row of the structure stream has six fields and represents information about a single dimension element or about an element consolidation. In addition, a special row contains information about all dimensions of the cube. This information is needed for cube creation, and this row must be the first row in the stream. The following table describes the stream fields.

Field        Description
dimension    Dimension name. If this field is empty, the row contains information about all dimensions of the cube.
element      Element name, or the list of cube dimensions (if the dimension field of the row is empty).
parent       Parent element name, if the row contains information about an element consolidation.
type         Element type. The field must contain one of the following values: numeric, string, consolidation.
factor       Consolidation factor, if the row contains information about an element consolidation.

The cube structure stream described above is generated by the following PRE rule:

<rule description="" name="ExportStructureToFile">
  <!-- The rule takes two additional input parameters: a Palo connection and the name of a cube in the specified Palo database -->
  <param name="connect" type="connection"/>
  <param name="cubename" type="string"/>

  <!-- The rule generates an output stream. This stream has five fields -->
  <output name="dimension" type="string"/>
  <output name="parent" type="string"/>
  <output name="element" type="string"/>
  <output name="type" type="string"/>
  <output name="factor" type="numeric"/>

  <!-- First, declare a variable for convenient access to the cube -->
  <declare expression="getCube('connect',cubename)" name="c"/>

  <!-- Second, write a special row to the output stream. This row contains information about the cube dimensions -->
  <call expression="OutputRow('',getCubeDimensions('connect',cubename).toString,'','',0)"/>

  <!-- Iterate over the cube dimensions -->
  <foreach in="getNumericRange(0,c.getDimensionCount-1)" name="dimnum">
    <!-- Get the dimension name and store it in a variable -->
    <declare expression="c.getDimensionName(dimnum)" name="dimname"/>

    <!-- Iterate over the elements of the dimension -->
    <foreach in="c.getDimensionElements(dimnum)" name="elem">
      <!-- Write information about each element -->
      <call expression="OutputRow(c.getDimensionName(dimnum),'',elem.getElementName,elem.getElementType,0)"/>

      <!-- Write information about consolidated elements -->
      <foreach in="elem.getChildren" name="child">
        <!-- Get the consolidation factor and store it in a variable -->
        <declare expression="consolidationFactor('connect',dimname,child.getElementName,elem.getElementName)" name="f"/>
        <!-- Output the row -->
        <call expression="OutputRow(dimname,elem.getElementName,child.getElementName,elem.getElementType,f)"/>
      </foreach>
    </foreach>
  </foreach>
</rule>
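To make the row layout of the structure stream easier to picture, the following Python sketch builds the same kind of rows from a small in-memory description of a cube. It is only an illustration: the `cube` dictionary, the function `structure_rows` and the comma used to join dimension names are assumptions of this sketch and are not part of the Palo or Kettle API. The sketch records the child's own type in consolidation rows, which is also an assumption.

```python
# Hypothetical in-memory stand-in for a Palo cube; not the Palo or Kettle API.
# Each dimension maps an element name to (type, {child name: factor}).
cube = {
    "Years": {
        "All Years": ("consolidation", {"2002": 1, "2003": 1}),
        "2002": ("numeric", {}),
        "2003": ("numeric", {}),
    },
    "Measures": {
        "Units": ("numeric", {}),
    },
}


def structure_rows(cube):
    """Return rows as (dimension, parent, element, type, factor) tuples."""
    # Special first row: empty dimension field, dimension list in the second field.
    # Joining the names with a comma is an assumption of this sketch.
    rows = [("", ",".join(cube), "", "", 0)]
    for dimname, elements in cube.items():
        for name, (etype, children) in elements.items():
            # One row per element, without a parent.
            rows.append((dimname, "", name, etype, 0))
            # One row per consolidated child, carrying the consolidation factor.
            for child, factor in children.items():
                child_type = elements[child][0]
                rows.append((dimname, name, child, child_type, factor))
    return rows


rows = structure_rows(cube)
for row in rows:
    print(row)
```

Each printed tuple corresponds to one stream row with the fields dimension, parent, element, type and factor, in the order in which the rule declares its outputs.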

Create cube by given structure

This example demonstrates how to create a cube with a Palo rule. The rule reads the cube description from the cube structure stream and takes two additional parameters. The first parameter is the connection to the Palo database. The second parameter is the cube name. The cube is created in the specified Palo database during rule execution. The following listing shows this Palo rule.

<rule description="" name="ImportStructureFromFile">
  <!-- The rule takes two additional input parameters: a Palo connection and the name of a cube in the specified Palo database -->
  <param name="connect" type="connection"/>
  <param name="cubename" type="string"/>

  <!-- If the current row describes the whole cube -->
  <if condition="Input('dimension')=''">
    <!-- Get the set of cube dimension names -->
    <declare expression="toStringSet(Input('parent'))" name="dims"/>
    <!-- Create each cube dimension -->
    <foreach in="dims" name="dim">
      <call expression="createDatabaseDimension('connect',dim)"/>
    </foreach>
    <!-- Create the cube itself if necessary -->
    <call expression="createDatabaseCube('connect',cubename,dims)"/>
  </if>

  <!-- If the current row describes a single dimension element or a consolidation -->
  <if condition="Input('dimension')!=''">
    <!-- If the cube does not have the specified dimension, log an error message and stop the process -->
    <if condition="!(getCubeDimensions('connect',cubename).contains(Input('dimension')))">
      <call expression="logError('Does not contains dimension',Input('dimension'))"/>
      <call expression="stop"/>
    </if>

    <!-- Create a new dimension element if necessary -->
    <call expression="createDimensionElement('connect',Input('dimension'),Input('element'),Input('type'))"/>

    <!-- If the row describes an element consolidation, create the consolidation with the specified factor -->
    <if condition="Input('parent')!=''">
      <call expression="createConsolidation('connect',Input('dimension'),Input('element'),Input('parent'),Input('factor'))"/>
    </if>
  </if>
</rule>
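The control flow of this rule can be sketched in Python as a function that consumes structure rows one by one. This is a minimal illustration under stated assumptions: `build_cube`, the `dims` and `links` containers, the comma-separated dimension list and the sample rows are hypothetical and do not come from the Palo or Kettle API.

```python
def build_cube(rows):
    """Rebuild dimensions from (dimension, parent, element, type, factor) rows."""
    dims = {}   # dimension name -> {element name: type}
    links = []  # (dimension, parent, child, factor) consolidations
    for dimension, parent, element, etype, factor in rows:
        if dimension == "":
            # Whole-cube row: the second field lists the dimension names
            # (comma-separated here, which is an assumption of this sketch).
            for name in parent.split(","):
                dims.setdefault(name, {})
            continue
        if dimension not in dims:
            # Plays the role of the logError + stop branch of the rule.
            raise ValueError("Does not contain dimension " + dimension)
        # Create the element only if it does not exist yet.
        dims[dimension].setdefault(element, etype)
        if parent != "":
            # The row also describes a consolidation with the given factor.
            links.append((dimension, parent, element, factor))
    return dims, links


# Hypothetical sample rows in the structure stream layout.
sample_rows = [
    ("", "Years,Measures", "", "", 0),
    ("Years", "", "All Years", "consolidation", 0),
    ("Years", "", "2002", "numeric", 0),
    ("Years", "All Years", "2002", "numeric", 1),
    ("Measures", "", "Units", "numeric", 0),
]
dims, links = build_cube(sample_rows)
print(dims)
print(links)
```

As in the rule, a row with an empty dimension field sets up the dimensions, and every other row creates an element and, when the parent field is filled, a consolidation.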


Currency exchange example

This section describes a more realistic example. It fills a cube that has three dimensions and contains information about currency exchanges. The first dimension is named ‘Period’ and stores the date. The second dimension contains the currency name, and the third dimension contains the type of currency exchange information: actual value, variance and forecast and next period.

Cube examples

The following examples work on the same cubes and demonstrate how the Palo Engine can manipulate data in a more complex case. This section describes the structure of each cube; the rules themselves are described in the following sections. All examples are based on one scenario. A company works in Europe and has two subsidiaries. The first of them works in Germany, the second works in another country. Each subsidiary has its own cube, which contains information about sales. This information is analogous to the Palo Demo database, with some differences. The Germany cube has no region dimension. The other subsidiary's cube has this dimension, but has no “Germany” element in it. Both cubes are filled with data from the Palo database; the global company cube is filled with data from the Germany cube and the other cube. The following figure illustrates this.

Fig 10. Data transformation (boxes: Palo Demo, Germany, Other, Global, Analytics)

Finally, the global company has a special cube that contains analytical information based on the global cube data. The following sections describe the implementation of each arrow in the figure.

Creating and filling Germany cube

The Germany cube is analogous to the Demo cube, but it has no ‘Regions’ dimension, because the cube contains information about sales in Germany only. The input stream is generated by the cube structure output example, and the structure of this stream is described above. The rule that generates the Germany cube is similar to the cube creation rule listed above, but it does not create the Regions dimension. The following rule reads the structure of the Demo cube and creates the Germany cube.

<rule description="" name="ImportSingleRegionStructureFromFile">
  <!-- The rule takes two additional input parameters: a Palo connection and the name of a cube in the specified Palo database -->
  <param name="connect" type="connection"/>
  <param name="cubename" type="string"/>

  <!-- If the row contains the cube description, create the cube without the Regions dimension -->
  <if condition="Input('dimension')=''">
    <declare expression="toStringSet(Input('parent'))" name="dims"/>
    <foreach in="dims" name="dim">
      <!-- Create the dimension if it is not the Regions dimension -->
      <if condition="dim!='Regions'">
        <call expression="createDatabaseDimension('connect',dim)"/>
      </if>
    </foreach>
    <!-- Create the Germany cube -->
    <call expression="createDatabaseCube('connect',cubename,dims-{'Regions'})"/>
  </if>

  <!-- If the row describes a dimension element that does not belong to the Regions dimension, create it -->
  <if condition="Input('dimension')!=''">
    <if condition="Input('dimension')!='Regions'">
      <if condition="!(getCubeDimensions('connect',cubename).contains(Input('dimension')))">
        <call expression="logError('Does not contains dimension ',Input('dimension'))"/>
        <call expression="stop"/>
      </if>
      <call expression="createDimensionElement('connect',Input('dimension'),Input('element'),Input('type'))"/>
      <if condition="Input('parent')!=''">
        <call expression="createConsolidation('connect',Input('dimension'),Input('element'),Input('parent'),Input('factor'))"/>
      </if>
    </if>
  </if>
</rule>

The rule that fills the Germany cube is simpler. It takes the Demo cube stream (which may be generated by the Palo input step), reads data from the input stream and checks the region value. If the region value is “Germany”, the rule writes the data to the Germany cube. The rule is listed below.

<rule description="" name="ImportGermanyData">
  <!-- The rule takes two additional input parameters: a Palo connection and the name of a cube in the specified Palo database -->
  <param name="connect" type="connection"/>
  <param name="cubename" type="string"/>

  <!-- The rule works with the specified cube only and stores it in a variable -->
  <declare expression="getCube('connect',cubename)" name="c"/>

  <!-- Check the region value -->
  <if condition="Input('Regions')='Germany'">
    <!-- If the region is Germany, get the element of each dimension and store the value in the cube -->
    <declare expression="getDimensionElement('connect','Products',Input('Products'))" name="product"/>
    <declare expression="getDimensionElement('connect','Months',Input('Months'))" name="month"/>
    <declare expression="getDimensionElement('connect','Years',Input('Years'))" name="year"/>
    <declare expression="getDimensionElement('connect','Datatypes',Input('Datatypes'))" name="datatype"/>
    <declare expression="getDimensionElement('connect','Measures',Input('Measures'))" name="measure"/>
    <set cell="NumericValue(c[product][month][year][datatype][measure])" expression="Input('Value')"/>
  </if>
</rule>


Creating and filling cube for other countries

The structure of the other subsidiary's cube is similar to the Demo cube, so this cube can be created by the rule listed above (see Create cube by given structure). Filling the cube needs one extra check, because rows from the Demo cube that describe sales in Germany must not be processed. This is done by a special rule. The rule is similar to the previous one, but it writes data into a more complex cube that has the ‘Regions’ dimension.

<rule description="" name="ImportOtherData">
  <!-- The rule takes two additional input parameters: a Palo connection and the name of a cube in the specified Palo database -->
  <param name="connect" type="connection"/>
  <param name="cubename" type="string"/>

  <!-- The rule works with the specified cube only and stores it in a variable -->
  <declare expression="getCube('connect',cubename)" name="c"/>

  <if condition="Input('Regions')!='Germany'">
    <!-- If the region is not Germany, get the element of each dimension and store the value in the cube -->
    <declare expression="getDimensionElement('connect','Products',Input('Products'))" name="product"/>
    <declare expression="getDimensionElement('connect','Regions',Input('Regions'))" name="region"/>
    <declare expression="getDimensionElement('connect','Months',Input('Months'))" name="month"/>
    <declare expression="getDimensionElement('connect','Years',Input('Years'))" name="year"/>
    <declare expression="getDimensionElement('connect','Datatypes',Input('Datatypes'))" name="datatype"/>
    <declare expression="getDimensionElement('connect','Measures',Input('Measures'))" name="measure"/>
    <set cell="NumericValue(c[product][region][month][year][datatype][measure])" expression="Input('Value')"/>
  </if>
</rule>
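The two filling rules split one stream of Demo cube rows by the value of the Regions field. The Python sketch below shows that routing on plain dictionaries. It is an illustration only: `split_by_region`, the two dictionaries that stand in for cubes and the sample rows are hypothetical, and only the field names are taken from the rules above.

```python
# Coordinate order used for each target cube, following the <set cell=...> expressions.
GERMANY_DIMS = ("Products", "Months", "Years", "Datatypes", "Measures")
OTHER_DIMS = ("Products", "Regions", "Months", "Years", "Datatypes", "Measures")


def split_by_region(rows):
    """Route Demo cube rows into a Germany cube and a cube for other regions."""
    germany, other = {}, {}  # cell coordinates -> value
    for row in rows:
        if row["Regions"] == "Germany":
            # The Germany cube has no Regions dimension, so the field is dropped.
            key = tuple(row[d] for d in GERMANY_DIMS)
            germany[key] = row["Value"]
        else:
            # The other cube keeps the Regions coordinate.
            key = tuple(row[d] for d in OTHER_DIMS)
            other[key] = row["Value"]
    return germany, other


# Hypothetical sample rows of the Demo cube stream.
demo_rows = [
    {"Products": "Desktop L", "Regions": "Germany", "Months": "Jan",
     "Years": "2002", "Datatypes": "Actual", "Measures": "Units", "Value": 10.0},
    {"Products": "Desktop L", "Regions": "France", "Months": "Jan",
     "Years": "2002", "Datatypes": "Actual", "Measures": "Units", "Value": 7.0},
]
germany, other = split_by_region(demo_rows)
print(germany)
print(other)
```

In the transformation itself the same effect is obtained by sending the Demo stream to both rules, since each rule ignores the rows that belong to the other cube.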

Creating and filling global cube

The global cube is analogous to the Palo Demo cube and can be created by the rule listed above (see Create cube by given structure). This cube is filled with data from the Germany cube and from the other subsidiary's cube. The second operation does not require a special rule, because the cubes have identical structures and the data can be transferred with the Palo input and Palo output Kettle steps. Transferring data from the Germany cube is similar and is also based on the Palo input and Palo output steps. The Palo input step reads data from the Germany cube and generates a Kettle stream. This stream is routed through a special rule (listed below), which adds an additional ‘Regions’ column to each stream row, and the modified rows are then sent to the Palo output step.

<rule description="" name="ImportGermanyDataToGlobal">
  <!-- The rule has no additional parameters or connections, but it generates an output stream -->
  <output name="Regions" type="string"/>
  <output name="Datatypes" type="string"/>
  <output name="Measures" type="string"/>
  <output name="Months" type="string"/>
  <output name="Products" type="string"/>
  <output name="Years" type="string"/>
  <output name="Value" type="numeric"/>

  <!-- The rule copies data from the input stream to the output stream and adds the ‘Regions’ value -->
  <set cell="Output('Regions')" expression="'Germany'"/>
  <set cell="Output('Datatypes')" expression="Input('Datatypes')"/>
  <set cell="Output('Measures')" expression="Input('Measures')"/>
  <set cell="Output('Months')" expression="Input('Months')"/>
  <set cell="Output('Products')" expression="Input('Products')"/>
  <set cell="Output('Years')" expression="Input('Years')"/>
  <set cell="Output('Value')" expression="Input('Value')"/>
</rule>

Creating and filling analytical cube

The structure of the analytical cube differs from that of the global cube. It has five dimensions. Three of them are identical to dimensions of the global cube: “Measures”, “Products” and “Regions”. The analytical cube has a “Periods” dimension instead of the “Years” and “Months” dimensions. This dimension contains elements like “2002 Jan”, “2002 Feb”, “2002 Mar” grouped under “2002 Qtr.1”, and the four quarter elements are in turn grouped under the element “2002”. Finally, the analytical cube has a “Shares” dimension, which contains the elements “Value”, “In qtr”, “In year”, “In all time”, “In subregion”, “In all regions”, “In product group” and “In all products”. The rule listed below generates the structure of the analytical cube and writes it to the output stream. The structure of this stream is similar to the structure of the stream in the first example, so the stream may be used to create a new cube with the second example.

<rule description="" name="CreateAnalyticstructure">
  <!-- The rule gets the cube name as a parameter and works with the Palo database directly -->
  <param name="connect" type="connection"/>
  <param name="cubename" type="string"/>

  <!-- The rule generates an output stream with five fields -->
  <output name="dimension" type="string"/>
  <output name="parent" type="string"/>
  <output name="element" type="string"/>
  <output name="type" type="string"/>
  <output name="factor" type="numeric"/>

  <!-- Write the row that describes the cube structure -->
  <declare name="names" expression="{'Regions','Products','Measures','Periods','Shares'}"/>
  <call expression="OutputRow('',names.toString,'','',0)"/>

  <!-- Copy the dimensions that are the same as in the global cube -->
  <foreach in="{'Regions','Products','Measures'}" name="dimname">
    <!-- Output information about each element -->
    <foreach in="getDimensionElementsByName('connect',dimname)" name="elem">
      <call expression="OutputRow(dimname,'',elem.getElementName,elem.getElementType,0)"/>
      <foreach in="elem.getChildren" name="child">
        <declare expression="consolidationFactor('connect',dimname,child.getElementName,elem.getElementName)" name="f"/>
        <call expression="OutputRow(dimname,elem.getElementName,child.getElementName,elem.getElementType,f)"/>
      </foreach>
    </foreach>
  </foreach>

  <!-- Generate the Periods dimension -->
  <foreach in="getDimensionElementsByName('connect','Years')" name="elemYear">
    <call expression="OutputRow('Periods','',elemYear.getElementName,'consolidated',0)"/>
    <foreach in="getDimensionElementsByName('connect','Months')" name="elemMonth">
      <if condition="elemMonth.getElementName!='Year'">
        <declare name="elemName" expression="elemYear.getElementName + ' ' + elemMonth.getElementName"/>
        <call expression="OutputRow('Periods','',elemName,elemMonth.getElementType,0)"/>
        <if condition="elemMonth.getChildren.getSize!=0">
          <call expression="OutputRow('Periods',elemYear.getElementName,elemName,elemMonth.getElementType,1)"/>
        </if>
        <foreach in="elemMonth.getChildren" name="child">
          <declare name="childName" expression="elemYear.getElementName + ' ' + child.getElementName"/>
          <declare expression="consolidationFactor('connect','Months',child.getElementName,elemMonth.getElementName)" name="f"/>
          <call expression="OutputRow('Periods',elemName,childName,child.getElementType,f)"/>
        </foreach>
      </if>
    </foreach>
  </foreach>

  <!-- Generate the Shares dimension -->
  <call expression="OutputRow('Shares','','Value','numeric',0)"/>
  <call expression="OutputRow('Shares','','In qtr','numeric',0)"/>
  <call expression="OutputRow('Shares','','In year','numeric',0)"/>
  <call expression="OutputRow('Shares','','In all time','numeric',0)"/>
  <call expression="OutputRow('Shares','','In subregion','numeric',0)"/>
  <call expression="OutputRow('Shares','','In all regions','numeric',0)"/>
  <call expression="OutputRow('Shares','','In product group','numeric',0)"/>
  <call expression="OutputRow('Shares','','In all products','numeric',0)"/>
</rule>
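The least obvious part of this rule is how the “Periods” dimension is assembled from the “Years” and “Months” dimensions. The Python sketch below reproduces that loop on a small excerpt of the two dimensions. It is a sketch under assumptions: `period_rows`, the `months` dictionary and the element names and types in it are hypothetical sample data, not values read from a Palo database.

```python
# Hypothetical excerpt of the Years dimension.
years = ["2002"]

# Hypothetical excerpt of the Months dimension:
# element name -> (type, {child name: factor}). 'Year' is the top element.
months = {
    "Year": ("consolidated", {"Qtr.1": 1}),
    "Qtr.1": ("consolidated", {"Jan": 1, "Feb": 1, "Mar": 1}),
    "Jan": ("numeric", {}),
    "Feb": ("numeric", {}),
    "Mar": ("numeric", {}),
}


def period_rows(years, months):
    """Return Periods rows as (dimension, parent, element, type, factor)."""
    rows = []
    for year in years:
        # Each year becomes a consolidated element of the Periods dimension.
        rows.append(("Periods", "", year, "consolidated", 0))
        for name, (etype, children) in months.items():
            if name == "Year":
                # The top element of Months is replaced by the year itself.
                continue
            elem = year + " " + name
            rows.append(("Periods", "", elem, etype, 0))
            if children:
                # Elements that have children are attached to the year.
                rows.append(("Periods", year, elem, etype, 1))
            for child, factor in children.items():
                # Children keep their factor and get the year as a prefix.
                rows.append(("Periods", elem, year + " " + child,
                             months[child][0], factor))
    return rows


rows = period_rows(years, months)
for row in rows:
    print(row)
```

With this sample data the quarter is attached to the year and the three months are attached to the quarter, which matches the grouping described in the text.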

Moving data from the global cube to the analytic cube consists of two steps, although they can be combined into a single step based on a transformation rule. The first step copies data from the global cube to the analytic cube, and the second calculates the analytic cells in the cube. Both steps are based on Palo rules. The data is moved from the global cube to the analytic cube by a rule that reads data from the stream and writes it into the Palo cube.

<rule name="ImportAnalyticFromGlobal" description="">
  <!-- The rule gets the cube name as a parameter and works with the Palo database directly -->
  <param name="connect" type="connection"/>
  <param name="cubename" type="string"/>

  <!-- Get the cube -->
  <declare name="c" expression="getCube('connect',cubename)"/>

  <!-- Get the element of each dimension -->
  <declare name="m" expression="getDimensionElement('connect','Measures',Input('Measures'))"/>
  <declare name="p" expression="getDimensionElement('connect','Products',Input('Products'))"/>
  <declare name="r" expression="getDimensionElement('connect','Regions',Input('Regions'))"/>
  <declare name="pername" expression="Input('Years')+' ' + Input('Months')"/>
  <declare name="per" expression="getDimensionElement('connect','Periods',pername)"/>
  <declare name="s" expression="getDimensionElement('connect','Shares','Value')"/>

  <!-- Write the value to the cube cell -->
  <set cell="NumericValue(c[m][p][r][per][s])" expression="Input('Value')"/>
</rule>