Kettle Manual


  • 01. Installing Kettle

    Installing Kettle

    You can download PDI from Sourceforge.net. At the time of this writing, the newest released version is 3.0.3, so the file you have to download is Kettle-3.0.3.GA-nnnn.zip.

    Prerequisites

    Kettle requires the Sun Java Runtime Environment (JRE) version 1.5 (also called 5.0 in some naming schemes) or newer. You can obtain a JRE for free from http://java.sun.com/.
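    To confirm that a suitable JRE is available, you can check the version from a terminal (a quick sanity check, assuming java is on your PATH):

    java -version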

    Installation

    PDI does not require installation unless you download the Windows .exe file, which needs no specific installation instructions. For all other platforms, simply unpack the zip file into a folder of your choice. On Unix-like operating systems, you will need to make the shell scripts executable by using the chmod command:

    cd Kettle
    chmod +x *.sh



  • 02. Spoon Introduction

    Spoon Introduction

    Spoon is the graphical tool with which you design and test every PDI process. The other PDI components execute the processes designed with Spoon, and are executed from a terminal window.

    Repository and files

    In Spoon, you build Jobs and Transformations. PDI offers two methods to save them:

    Database repository
    Files

    If you choose the repository method, the repository has to be created the first time you execute Spoon. If you choose the files method, the Jobs are saved in files with the kjb extension, and the Transformations are in files with the ktr extension. In this tutorial you'll work with the second method.

    Starting Spoon

    Start Spoon by executing spoon.bat on Windows, or spoon.sh on Unix-like operating systems. As soon as Spoon starts, a dialog window appears asking for the repository connection data. Click the No Repository button.
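    From the folder where you unpacked Kettle, the launch looks like this (a minimal sketch; the folder name depends on where you extracted the zip):

    cd Kettle
    ./spoon.sh      # Unix-like systems
    spoon.bat       # Windows, from a command prompt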

    The next thing you'll see is a welcome window. Go to the Edit menu and click Options... A window will come up that enables you to change various general and visual characteristics. If you change something, it will be necessary to restart Spoon in order to see the changes applied.


  • 03. Hello World Example

    Hello World Example

    Although this will be a simple example, it will introduce you to some of the fundamentals of PDI:

    Working with the Spoon tool
    Transformations
    Steps and Hops
    Predefined variables
    Previewing and executing from Spoon
    Executing Transformations from a terminal window with the Pan tool

    Overview

    Let's suppose that you have a CSV file containing a list of people, and want to create an XML file containing greetings for each of them.

    If this were the content of your CSV file:

    last_name,name
    Suarez,Maria
    Guimaraes,Joao
    Rush,Jennifer
    Ortiz,Camila
    Rodriguez,Carmen
    da Silva,Zoe

    This would be the output in your XML file (shown here with the XML Output step's default Rows/Row enclosing elements):

    <Rows>
      <Row><msg>Hello, Maria!</msg></Row>
      <Row><msg>Hello, Joao!</msg></Row>
      <Row><msg>Hello, Jennifer!</msg></Row>
      <Row><msg>Hello, Camila!</msg></Row>
      <Row><msg>Hello, Carmen!</msg></Row>
      <Row><msg>Hello, Zoe!</msg></Row>
    </Rows>

    The creation of the file with greetings from the flat file will be the goal for your first Transformation.

    A Transformation is made of Steps linked by Hops. These Steps and Hops form paths through which data flows. Therefore it's said that a Transformation is data-flow oriented.

    Preparing the environment


    Before starting a Transformation, create a Tutorial folder in the installation folder or some other convenient place. There you'll save all the files for this tutorial. Then create a CSV file like the one shown above, and save it in the Tutorial folder as list.csv.
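    On a Unix-like system, one way to prepare this from a terminal is shown below (a minimal sketch; the Tutorial folder can live anywhere you like):

    cd Kettle                      # or any other convenient place
    mkdir Tutorial
    cat > Tutorial/list.csv <<'EOF'
    last_name,name
    Suarez,Maria
    Guimaraes,Joao
    Rush,Jennifer
    Ortiz,Camila
    Rodriguez,Carmen
    da Silva,Zoe
    EOF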

    Transformation walkthrough

    The proposed task will be accomplished in three subtasks:

    Creating the Transformation
    Constructing the skeleton of the Transformation using Steps and Hops
    Configuring the Steps in order to specify their behavior

    Creating the Transformation

    1. Click New, then select Transformation. Alternatively you can go to the File menu, then select New, then Transformation. You can also just press Ctrl-N.
    2. In the navigator, click View, then Transformation 1, then click Settings. Or right-click the diagram and click Transformation Settings. Or use the Ctrl+T shortcut.
    3. A window appears where you can specify Transformation properties. In this case, just write a name and a description, then click Save.
    4. Save the Transformation in the Tutorial folder with the name hello. This will create a hello.ktr file.

    Constructing the skeleton of the Transformation using Steps and Hops

    A Step is the minimal unit inside a Transformation. A wide variety of Steps are available, grouped into categories like Input and Output, among others. Each Step is designed to accomplish a specific function, such as reading a parameter or normalizing a dataset.

    A Hop is a graphical representation of data flowing between two Steps, with an origin and a destination. The data that flows through that Hop constitutes the Output Data of the origin Step, and the Input Data of the destination Step. A Hop has only one origin and one destination, but more than one Hop could leave a Step. When that happens, the Output Data can be copied or distributed to every destination. Likewise, more than one Hop can reach a Step. In those instances, the Step has to have the ability to merge the Input from the different Steps in order to create the Output.

    A Transformation has to do the following:

    Read the CSV file
    Build the greetings
    Save the greetings in the XML file

    For each of these items you'll use a different Step, according to the next diagram:

    In this example, the correspondence between tasks and Steps is one-to-one because the Transformation is very simple. It isn't always that way, though.

    Here's how to start the Transformation:

    1. To the left of the workspace is the Steps Palette. Select the Input category.
    2. Drag the CSV file input icon onto the workspace on the right.
    3. Select the Scripting category.


    4. Drag the Modified JavaScript Value icon to the workspace.
    5. Select the Output category.
    6. Drag the XML Output icon to the workspace.

    Now you will link the CSV file input with the Modified JavaScript Value by creating a Hop:

    1. Select the first Step.
    2. Hold the Shift key and drag the icon onto the second Step.
    3. Link the Modified JavaScript Value with the XML Output via this same process.

    Specifying Step behavior

    Every Step has a configuration window. These windows vary according to the functionality of the Steps and the category to which they belong. However, Step Name is always a representative name inside the Transformation - this doesn't change among Step configurations. Step Description allows you to clarify the purpose of the Step.

    Configuring the CSV file input Step

    1. Double-click on the CSV file input Step. The configuration window belonging to this kind of Step will appear. Here you'll indicate the location, format and content of the input file.
    2. Replace the default name with one that is more representative of this Step's function. In this case, type in list.
    3. In the Filename field, type the name and location of the input file.

    Note: Just to the right of the text box is a symbol with a red dollar sign. This means that you can use variables as well as plain text in that field. A variable can be written manually as ${name_of_the_variable} or selected from the variable window, which you can access by pressing Ctrl-Spacebar. This window shows both predefined and user-defined variables, but since you haven't created any variables yet, right now you'll only see the predefined ones. Among those, select:


    ${Internal.Transformation.Filename.Directory}

    After the name of the variable, type a slash and the name of the file you created:

    ${Internal.Transformation.Filename.Directory}/list.csv

    At runtime the variable will be replaced by its value, which will be the path where the Transformation was saved. The Transformation will search for the list.csv file in that location.

    4. Click Get Fields to add the list of column names of the input file to the grid. By default, the Step assumes that the file has headers (the Header row present checkbox is checked).

    Note: The Get Fields button is present in most Steps' configuration windows. Its purpose is to load a grid with data from external sources or previous Steps. Even when the fields can be written manually, this button gives you a shortcut when there are many available fields and you want to use all or almost all of them.

    The grid now has the names of the columns of your file, last_name and name, and should look like this:

    5. Switch lazy conversion off.
    6. Click Preview to ensure that the file will be read as expected. A window showing data from the file will appear.
    7. Click OK to finish defining the CSV file input Step.

    Configuring the Modified JavaScript Value Step

    1. Double-click on the Modified JavaScript Value Step. The Step configuration window will appear. This is different from the previous Step config window in that it allows you to write JavaScript code. You will use it to build the "Hello, " message concatenated with each of the names.
    2. Name this Step Greetings.
    3. The main area of the configuration window is for coding. To the left, there is a tree with a set of available functions that you can use in the code. In particular, the last two branches have the input and output fields, ready to use in the code. In this example there are two fields: last_name and name. Write the following code:

    var msg = 'Hello, ' + name.getString() + "!";

    Note: The text name.getString() can be written manually, or by double-clicking on its entry in the function tree.

    4. At the bottom you can type any variable created in the code. In this case, you have created a variable named msg. Since you need to send this message to the output file, you have to write the variable name in the grid. This should be the result:


    Warning: Don't confuse these variables with PDI variables - they are not the same.

    Note: Modified is not an adjective for JavaScript, but for the Step. You are not dealing with a variant of JavaScript - it is the Step itself that is modified. It is an enhanced version of the original Step, which you found in previous versions of PDI.

    5. Click OK to finish configuring the Modified JavaScript Value Step.
    6. Select the Step you just configured. In order to check that the new field will leave this Step, you will now look at the Input and Output Fields.

    Input Fields are the data columns that reach a Step. Output Fields are the data columns that leave a Step. There are Steps that simply transform the input data; in this case, the input and output fields are usually the same. There are Steps, however, that add fields to the Output - Calculator, for example. There are other Steps that filter or combine data, causing the Output to have fewer fields than the Input - Group by, for example.

    7. Right-click the Step to bring up a context menu.
    8. Select Show Input Fields. You'll see that the Input Fields are last_name and name, which come from the CSV file input Step.
    9. Select Show Output Fields. You'll see that not only do you have the existing fields, but also the new msg field.

    Configuring the XML Output Step

    1. Double-click the XML Output Step. The configuration window for this kind of Step will appear. Here you're going to set the name and location of the output file, and establish which of the fields you want to include. You may include all or some of the fields that reach the Step.
    2. Name the Step File with Greetings.
    3. In the Filename box write:

    ${Internal.Transformation.Filename.Directory}/Hello.xml

    4. Click Get Fields to fill the grid with the three input fields. In the output file you only want to include the message, so delete name and last_name.

    5. Save the Transformation again.

    How does it work?

    When you execute a Transformation, almost all Steps are executed simultaneously. The Transformation executes asynchronously; the rows of data flow through the Steps at their own pace. Each processed row flows to the next Step without waiting for the others. In real-world Transformations, forgetting this characteristic can be a significant source of unexpected results.

    At this point, Hello World is almost completely configured. The Transformation reads the input file, creates a message for each row via the JavaScript code, and sends the messages to the output file. This is a small example with very few rows of names, so it is difficult to notice the asynchronous execution in action. Keep in mind, however, that it's possible that at the same time a name is being written to the output file, another is leaving the first Step of the Transformation.

    Verify, preview and execute

    1. Before executing the Transformation, check that everything is properly configured by clicking Verify. Spoon will verify that the Transformation is syntactically correct, and look for unreachable Steps and nonexistent connections. If everything is in order (it should be if you followed the instructions), you are ready to preview the output.
    2. Select the JavaScript Step and then click the Preview button. The following window will appear:

    3. As you can see, Spoon suggests that you preview the selected Step. Click Quick Launch. After that, you will see a window with a sample of the output of the JavaScript Step. If the output is what you expected, you're ready to execute the Transformation.
    4. Click Run.
    5. Spoon will show a window where you can set, among other information, the parameters for the execution and the logging level. Click Launch.
    6. A new window tab will appear in the Job window. This is the log tab, which contains a log of the current execution.

    The log tab has two sections: an upper part and a lower part.

    In the upper part you can see the executed operations for each Step of the Transformation. In particular, pay attention to these:

    Read: the number of rows coming from previous Steps.
    Written: the number of rows leaving from this Step toward the next.
    Input: the number of rows read from a file or table.
    Output: the number of rows written to a file or table.
    Errors: errors in the execution. If there are errors, the whole row will become red.

    In the lower portion of the window, you will see the execution step by step. The detail will depend on the log level established. If you pay attention to this detail, you will see the asynchronicity of the execution. The last line of the text will be:

    Spoon - The transformation has finished!!

    If there weren't error messages in the text, open the newly generated Hello.xml file and check its content.
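    If you prefer the terminal, a quick way to inspect the result (assuming the Tutorial folder is /home/PentahoUser/Tutorial, as in the Pan example below):

    cat /home/PentahoUser/Tutorial/Hello.xml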

  • Pan

    Pan allows you to execute Transformations from a terminal window. The script is pan.bat on Windows, or pan.sh on other platforms, and it's located in the installation folder. If you run the script without any options, you'll see a description of the pan command with a list of available options.

    To execute your Transformation, try the simplest command:

    Pan /file <Tutorial folder path>/Hello.ktr /norep

    /norep is an option that asks Pan not to connect to the repository.
    /file precedes the name of the file that contains the Transformation.
    <Tutorial folder path> is the full path to the Tutorial folder, for example:

    C:/Pentaho/Tutorial

    or

    /home/PentahoUser/Tutorial

    The other options take default values.

    After you enter this command, the Transformation will be executed in the same way it did inside Spoon. In this case, the log will be written to the terminal unless you specify a file to write to. The format of the log text will vary a little, but the information will be basically the same as you saw in the graphical environment.
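    For example, with the Tutorial folder locations shown above, the command could look like this (a hedged sketch; adjust the path to your own folder):

    # Unix-like systems
    ./pan.sh /file /home/PentahoUser/Tutorial/Hello.ktr /norep
    # Windows
    Pan.bat /file C:/Pentaho/Tutorial/Hello.ktr /norep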


  • 04. Refining Hello World

    Refining Hello World

    Now that the Transformation has been created and executed, the next task is enhancing it.

    Overview

    These are the improvements that you'll make to your existing Transformation:

    You won't look for the input file in the same folder, but in a new one, independent of the folder where the Transformations are saved.
    The name of the input file won't be fixed; the Transformation will receive it as a parameter.
    You will validate the existence of the input file (exercise: execute the Transformation you created, setting as the name of the file one that doesn't exist, and see what happens!).
    The name of the output file will depend on the name of the input file.

    Here's what happens:

    Get the parameter
    Create the output file with greetings
    Check if the parameter is null; if it is, abort
    Check if the file exists; if not, abort

    This will be accomplished via a Job, which is a component made of Job Entries linked by Hops. These Entries and Hops are arranged according to the expected order of execution. Therefore it is said that a Job is flow-control oriented.

    A Job Entry is a unit of execution inside a Job. Each Job Entry is designed to accomplish a specific function, ranging from verifying the existence of a table to sending an email.

    From a Job it is possible to execute a Transformation or another Job; that is, Jobs and Transformations are also Job Entries.

    A Hop is a graphical representation that identifies the sequence of execution between two Job Entries.

    Even though a Hop has only one origin and one destination, a particular Job Entry can be reached by more than one Hop, and more than one Hop can leave any particular Job Entry.

    This is the process:

    Getting the parameter will be handled by a new Transformation.
    The parameter will be verified through the result of the new Transformation, which conditions the execution of the next entries.
    The file's existence will be verified by a Job Entry.
    The main task of the Job will be carried out by a variation of the Transformation you made in the first part of this tutorial.

    Graphically it's represented like this:


    Preparing the Environment

    In this part of the tutorial, the input and output files will be in a new folder called Files - go ahead and create it now. Copy the list.csv file to this new directory.

    In order to avoid writing the full path each time you need to reference the folder or the files, it makes sense to create a variable containing this information. To do this, edit the kettle.properties configuration file, located in the C:\Documents and Settings\<username>\.kettle folder on Windows XP/2000, the C:\Users\<username>\.kettle folder on Windows Vista, or the ~/.kettle directory on other platforms. Put this line at the end of the file, changing the path to the one specific to the Files directory you just created:

    FILES=/home/PentahoUser/Files

    Spoon reads this file when it starts, so for this change to take effect, you must restart Spoon.
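    On a Unix-like system you could also create the folder, copy the file, and add the variable from a terminal (a minimal sketch, assuming the default ~/.kettle location and the example paths used in this tutorial):

    mkdir -p /home/PentahoUser/Files
    cp Tutorial/list.csv /home/PentahoUser/Files/
    echo 'FILES=/home/PentahoUser/Files' >> ~/.kettle/kettle.properties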

    Now you are ready to start. This process involves three stages:

    Create the Transformation
    Modify the Transformation
    Build the Job

    Creating the Transformation

    1. Create a new Transformation the same way you did before. Name this Transformation get_file_name.
    2. Drag the following Steps to the workspace, name them, and link them according to the diagram:


    Get System Info (Input category)
    Filter Rows (Flow category)
    Abort (Flow category)
    Set Variable (Job category)

    3. Configure the Steps as explained below:

    Configuring the Get System Info Step (Input category)

    This Step captures information from sources outside the Transformation, like the system date or parameters entered in the command line. In this case, you will use the Step to get the first and only parameter. The configuration window of this Step has a grid. In this grid, each row you fill will become a new column containing system data.

    1. Double-click the Step.
    2. In the first cell, below the Name column, write my_file.
    3. When you click the cell below Type, a window will show up with the available options. Select command line argument 1.
    4. Click OK.

    Configuring the Filter Rows Step (Flow category)

    This Step divides the output in two, based upon a condition. Those rows for which the condition evaluates to true follow one path in the diagram, the others follow another.

    1. Double-click the Step.
    2. Write the condition: in Field select my_file, and replace the = with IS NULL.
    3. In the drop-down list next to Send 'true' data to Step, select Abort.
    4. In the drop-down list next to Send 'false' data to Step, select Set Variable.
    5. Click OK.

    Now a NULL parameter will reach the Abort Step, and a NOT NULL parameter will reach the Set Variable Step.

    Configuring the Abort Step (Flow category)

    You don't have anything to configure in this Step. If a row of data reaches this Step, the Transformation aborts, then fails, and you will use that result in the main Job.

    Configuring the "Set Variable" Step ("Job" category)This Step allows you to create variables and put the content of some of the input fields into them. The configuration window of the Step has a grid.Each row in this grid is meant to hold a new variable.

    Now you'll create a new variable to use later:

    1. Double-click the Step.
    2. Click Get Fields. The only existing field will appear: my_file. The default variable name is the name of the selected field in upper case: MY_FILE. Leave the default intact.
    3. Click OK.


    Execution

    1. To test the Transformation, click Run.
    2. Within the run dialog, you will find a grid titled "Arguments" on the bottom left. Delete whatever arguments are already inside, and instead type list as the first argument value. This will be transferred to the Transformation as the command line argument.
    3. Click Launch.
    4. In the Logging pane, you'll see a message like this:

    Set Variables.0 - Set variable MY_FILE to value [list]

    5. Click Run again, and clear the value of the first argument. This time, when you hit Launch you'll see this:

    Abort.0 - Row nr 1 causing abort : []
    Abort.0 - Aborting after having seen 1 rows.

    In the Step Metrics pane, you'll see the Abort Step line highlighted in red, which indicates that an error occurred and that the Transformation failed (as expected).

    Modifying the Transformation

    Now it's time to modify the Hello Transformation in order to match the names of the files to their corresponding parameters. If the command line argument to the Job were foo, this Transformation should read the file foo.csv and create the file foo_with_greetings.xml. It would also be helpful to add a filter to discard the empty rows in the input file.

    1. Open the Transformation Hello.ktr.
    2. Open the CSV file input Step configuration window.
    3. Delete the content of the Filename text box, and press Ctrl-Spacebar to see the list of existing variables. You should see the FILES variable you added to kettle.properties. Select it and add the name of the variable you created in the previous Transformation. The text becomes:

    ${FILES}/${MY_FILE}.csv

    4. Click OK.
    5. Open the XML Output Step configuration window.
    6. Replace the content of the Filename text box with this:

    ${FILES}/${MY_FILE}_with_greetings

    7. Click Show Filename(s) to view the projected XML filename. It should replace the FILES variable with your Files directory and look like this (depending on the location specified for FILES):

    /home/Pentaho/files/${MY_FILE}_with_greetings.xml

    8. Click OK.
    9. Drag a Filter Rows step into the Transformation.
    10. Drag the Filter Rows step onto the Hop that leaves CSV file input and reaches Modified JavaScript Value. When you see that the Hop line becomes emphasized (thicker), release the mouse button. You have now linked the new step into the sequence of existing steps.
    11. Select name for the Field, and IS NOT NULL for the comparator.
    12. Leave Send 'true' data to Step and Send 'false' data to Step blank. This makes it so only the rows that fulfill the condition (rows with non-null names) follow on to the next Step. This is similar to an earlier Step.
    13. Click OK.
    14. Click Save As and name this Transformation Hello_with_parameters.

    Executing the Transformation

    To test the changes you made, you need to make sure that the MY_FILE variable exists and has a value. Because this Transformation is independent of the Transformation that creates the variable, in order to execute it you'll have to create the variable manually.


    1. In the Edit menu, click Set Environment Variables. A list of variables will appear.
    2. At the bottom of the list, type in MY_FILE as the variable name; as the content, type the name of the file without its extension.
    3. Click OK.
    4. Click Run.
    5. In the list of variables, you'll see the one you just created. Click Launch to execute the Transformation.
    6. Lastly, verify the existence and content of the output file.

    Building the main Job

    The last task in this part of the tutorial is the construction of the main Job:

    1. Create the Job:
    a. Click New, then Job. The Job workspace, where you can drop Job Entries and Hops, will come up.
    b. Click Job, then Settings. A window in which you can specify some Job properties will come up. Type in a name and a description.
    c. Click Save. Save the Job in the Tutorial folder, under the name Hello.

    2. Build the skeleton of the Job with Job Entries and Hops:

    To the left of the workspace there is a palette of Job Entries.

    Now build the Job:

    a. Drag the following steps into the workspace: one General->Start step, two General->Transformation steps, and one File Exists step.
    b. Link them in the following order: Start, Transformation, File Exists, Transformation.
    c. Drag two General->Abort steps to the workspace. Link one of them to the first Transformation step and the other to the File Exists step. The newly created Hops will turn red.

    3. Configure the Steps:

    a. Double-click the first Transformation step. The configuration window will come up.
    b. In the Transformation filename field, type the following:

    ${Internal.Job.Filename.Directory}/get_file_name.ktr

    This will work since transformations and jobs reside in the same folder.
    c. Click OK.

    4. Configure the second of the two Transformation Job Entries:
    a. Double-click the entry. The configuration window will come up.
    b. Type the name of the other Transformation in the Transformation filename field:

    ${Internal.Job.Filename.Directory}/Hello_with_parameters.ktr

    c. Click OK.
    5. Configure the File Exists Job Entry:

    a. Double-click the entry to bring up the configuration window.
    b. In the Filename field, put the complete path of the file whose existence you want to verify. The name is the same as the one you wrote in the modified Hello Transformation:

    ${FILES}/${MY_FILE}.csv

    Note: Remember that the ${FILES} variable was defined in the kettle.properties file, and the ${MY_FILE} variable was created in the Job Entry that is going to be executed before this one.

    6. Configure the Abort step connected to the get_file_name Transformation entry:
    a. In the Message textbox write: The file name argument is missing

    7. Configure the Abort step connected to the File Exists step:
    a. In the Message textbox write this text:

    The file ${FILES}/${MY_FILE}.csv does not exist

    Note: At runtime, the tool will replace the variable names with their values, showing, for example: "The file c:/Pentaho/Files/list.csv does not exist". If you place your mouse pointer over the Message textbox, Spoon will display a tooltip showing the projected output.


    Configuring the Hops

    A Job Entry can be executed unconditionally (it is always executed), only when the previous Job Entry was successful, or only when the previous Job Entry failed. This is represented by different colors in the Hops: a black Hop indicates that the following Job Entry is always executed; a green Hop indicates that the following Job Entry is executed only if the previous Job Entry was successful; and a red Hop indicates that the following Job Entry is executed only if the previous Job Entry failed.

    As a consequence of the order in which the Job Entries of your Job were created and linked, all of the Hops took the right color; that is, the Job Entries will execute as you need:

    The first Transformation entry will always be executed (the Hop that goes from Start toward this entry is black).
    If the Transformation that gets the parameter doesn't find a parameter (that is, the Transformation failed), control goes through the red Hop toward the Abort Job Entry.
    If the Transformation is successful, control goes through the green Hop toward the File Exists entry.
    If the file doesn't exist, that is, the verification of its existence fails, control goes through the red Hop toward the second Abort Job Entry.
    If the verification is successful, control goes through the green Hop toward the main Transformation entry.

    If you wanted to change the condition for the execution of a Job Entry, the steps to follow would be:

    1. Select the Hop that reaches this Job Entry.
    2. Right-click to bring up a context menu.
    3. Click Evaluation, then one of the three available conditions.

    How it works

    When you execute a Job, the execution is tied to the order of the Job Entries, the direction of the Hops, and the condition under which an entry is or is not executed. The execution follows a sequence: the execution of a Job Entry cannot begin until the execution of the Job Entries that precede it has finished.

    In real-world situations, a Job can be a solution to problems related to the sequencing of tasks across Transformations. If you need a part of a Transformation to finish before another part begins, a solution could be to divide the Transformation into two independent Transformations, and execute them from a Job, one after the other.

    Executing the Job

    To execute the Job, you first must supply a parameter. Because the only place where the parameter is used is in the get_file_name Transformation (after that you only use the variable where the parameter is saved), write the parameter as follows:

    1. Double-click the get_file_name Transformation entry.
    2. The ensuing window has a grid named Arguments. In the first row type list.
    3. Click OK.
    4. Click the Run button, or from the menu select Job->Run.
    5. A window will appear with general information related to the execution of the Job.
    6. Click Launch.
    7. The execution results pane at the bottom should display the execution results.

    Within the execution results pane, the Job Metrics tab shows the Job Entries of your Job. For each executed Job Entry, you'll see, among other data, the result of the execution. The execution of the entries follows a sequence. As a result, if an entry fails, you won't see the entries that follow because they never start.

    In the Logging tab you can see the log detail, including the starting and ending time of the Job Entries. In particular, when an Entry is a Transformation, the log corresponding to the Transformation is also included.

    You'll know the new file has been created when you see this at the end of the log text:

    Spoon - Job has ended.

    If the input file was list.csv, then the output file should be list_with_greetings.xml and should be in the same folder. Find it and check its content.
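    From a terminal, you can confirm the output quickly (assuming the FILES location used earlier in this tutorial):

    cat /home/PentahoUser/Files/list_with_greetings.xml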

    Now change the parameter by replacing it with a nonexistent file name and execute the Job again. You'll see that the Job aborts, and the log shows a message like the following (where the file path is built from the parameter you supplied):

    Abort - The file does not exist

    Now try deleting the parameter and executing the Job one more time. In this case the Job aborts as well, and in the log you can see this message, as expected:

    Abort - The file name is missing

    Kitchen

    Kitchen is the tool used to execute Jobs from a terminal window. The script is kitchen.bat on Windows, and kitchen.sh on other platforms, and you'll find it in the installation folder. If you execute it without any options, you'll see a description of the command with a list of the available options.

    To execute the Job, try the simplest command:

    kitchen /file <Tutorial folder path>/Hello.kjb /norep <parameter>

    /norep is an option that asks Kitchen not to connect to the repository.
    /file precedes the name of the file corresponding to the Job to be executed.
    <Tutorial folder path> is the full path of the Tutorial folder, for example:

    c:/Pentaho/Tutorial (Windows)

    or

    /home/PentahoUser/Tutorial

    <parameter> is the parameter that the Job is waiting for. Remember that the expected parameter is the name of the input file, without the .csv extension.
    The other options (e.g. log level) take default values.

    After you enter this command, the Job will be executed in the same way it did inside Spoon. In this case, the log will be written to the terminal unless you redirect it to a file. The format of the log text will vary a little, but the information will be basically the same as in the graphical environment.

    Try to execute the Job without parameters, with an invalid parameter (a nonexistent file), and with a valid parameter, and verify that everything works as expected. Also experiment with Kitchen, changing some of the options, such as the log level.
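    On a Unix-like system the three cases could look like this (a hedged sketch; it assumes that arguments placed after the options are passed to the Job as command line arguments, and uses the example Tutorial path above):

    # valid parameter: should create list_with_greetings.xml in the Files folder
    ./kitchen.sh /file /home/PentahoUser/Tutorial/Hello.kjb /norep list

    # nonexistent file: the Job should abort with the "does not exist" message
    ./kitchen.sh /file /home/PentahoUser/Tutorial/Hello.kjb /norep no_such_file

    # missing parameter: the Job should abort with the "file name argument is missing" message
    ./kitchen.sh /file /home/PentahoUser/Tutorial/Hello.kjb /norep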

  • Pentaho Data Integration (Kettle) Tutorial

    Written by María Carina Roldán, Pentaho Community Member, BI consultant (Assert Solutions), Argentina. This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.

    Introduction

    Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for the Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes:

    Migrating data between applications or databases
    Exporting data from databases to flat files
    Loading data massively into databases
    Data cleansing
    Integrating applications

    PDI is easy to use. Every process is created with a graphical tool where you specify what to do without writing code to indicate how to do it; because of this, you could say that PDI is metadata oriented.

    PDI can be used as a standalone application, or it can be used as part of the larger Pentaho Suite. As an ETL tool, it is the most popular open source tool available. PDI supports a vast array of input and output formats, including text files, data sheets, and commercial and free database engines. Moreover, the transformation capabilities of PDI allow you to manipulate data with very few limitations.

    Through a simple "Hello world" example, this tutorial will show you how easy it is to work with PDI and get you ready to create your own more complex Transformations.