Hello, and welcome to this online, self-paced lesson entitled “ORE Embedded R Scripts: SQL
Interface.” This session is part of an eight-lesson tutorial series on Oracle R Enterprise.
My name is Brian Pottle. I will be your guide for the next 45 minutes of interactive lectures and
review in this lesson.
ORE Embedded R Scripts: SQL Interface - 1
Before we begin, now might be a good time to take a look at some of the features of this
Flash-based course player. Feel free to skip this slide and start the lecture if you’ve attended
similar Oracle Self Study courses in the past.
To your left, you will find a hierarchical course outline. This course enables and even
encourages you to go at your own pace, which means you are free to skip over topics you
already feel confident about, jump right to a feature that really interests you, or go back and
review topics that were already covered. Simply click on a course section to expand its
contents and then select an individual slide. However, note that by default we will
automatically walk you through the entire course without requiring you to use the outline.
Standard Flash player controls are also found at the bottom of the player, including pause,
previous, and next buttons. There is also an interactive progress bar to fast forward or rewind
the current slide. Interactive slides may have additional controls and buttons along with
instructions on how to use them.
Also found at the bottom of the player is a panel containing any additional reference notes for
the current slide. Feel free to read these reference notes at the conclusion of the course, in
which case you can minimize this panel and restore it later. Or, if you prefer, you can pause
and read them as we go along.
Various handouts may be available from the Attachments button, including the audio
narration scripts for this course.
The course will now pause, so feel free to take some time and explore the interface. Then
when you’re ready to continue, click the next button below or alternatively click the Module 1
slide in the course outline at left.
“Embedded R Scripts: SQL Interface” is the seventh lesson of eight self-study sessions on
Oracle R Enterprise.
ORE enables database server execution of R code to facilitate embedding R in operational
systems. Two interfaces support this feature.
As you learned in the previous lesson, the R interface allows users to interactively test R
scripts before putting them into production in a database application. In this lesson, you learn
how to use the SQL interface, which is typically used for embedding R script execution in
production database applications.
Let’s take a look at the topics.
This lesson includes three topics:
• First, you’ll learn about the functions that are available to enable embedded R execution
by using the SQL interface.
• Next, you’ll examine each function individually, with examples.
• Finally, you’ll learn how to enable parallel execution in the database for embedded R
execution.
So first, an overview of the SQL interface functions.
Like the R interface, the SQL interface for embedded R script execution allows users to
execute their R scripts on the database server machine. However, unlike the R interface, the
functions associated with the SQL interface must be stored in the database R repository and
referenced by name in SQL API functions, as you’ll see in subsequent code examples.
The SQL interface for embedded R execution in the database consists of the functions shown
in the slide. Each of these functions provides capabilities intended for different situations.
• rqEval() invokes a stand-alone R script in the database. A stand-alone R script does not
need data passed in.
• rqTableEval() invokes an R script with a full table as input. This table may be an input
cursor or the result of a query, and it must fit in the available memory of the database R
engine.
• rqGroupEval() partitions the data according to the values of a specified column and
invokes the R script on each partition. It requires additional SQL specification and is
provided here as a virtual function.
• rqRowEval() invokes an R script on one row at a time or on multiple rows in chunks. The
function is invoked multiple times until all data is processed.
The last two functions are used to create or drop an R script in the database.
• sys.rqScriptCreate() creates a named R script in the database repository for use by name
in the other embedded R script functions, both in the R interface and in the SQL interface.
• sys.rqScriptDrop() removes a named R script from the database repository.
Only users who are granted the RQADMIN role are allowed to execute these functions. Privileges
and roles are discussed at the end of this lesson.
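As a rough analogy only, here is a plain-Python sketch of the create, drop, and look-up-by-name contract that the repository provides. The ScriptRepository class and its methods are hypothetical and are not part of ORE; storing each function as a source string mirrors how sys.rqScriptCreate() receives the R function as a string. Rejecting duplicate names is a design choice of this sketch, not documented ORE behavior.

```python
from typing import Callable, Dict


class ScriptRepository:
    """Hypothetical stand-in for the database R script repository."""

    def __init__(self) -> None:
        self._scripts: Dict[str, str] = {}

    def create(self, name: str, source: str) -> None:
        """Store a function definition, given as a string, under a name."""
        if name in self._scripts:
            raise ValueError(f"script {name!r} already exists")
        self._scripts[name] = source

    def drop(self, name: str) -> None:
        """Remove a named script; raises KeyError if it does not exist."""
        del self._scripts[name]

    def lookup(self, name: str) -> Callable:
        """Compile the stored source on demand; callers only know the name."""
        return eval(self._scripts[name], {})


repo = ScriptRepository()
repo.create("Double", "lambda x: x * 2")
print(repo.lookup("Double")(21))
repo.drop("Double")
```

Keeping callers limited to names, rather than function objects, is what lets SQL code invoke a script it never defined itself.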
This table provides more detail on the SQL interface functions for embedded R execution.
The rqTableEval(), rqGroupEval(), and rqRowEval() functions, shown in red, all take a table
cursor for their input data, which is required.
The rqEval() function, shown in blue, takes no input cursor; instead, its data can be
• Internally generated
• Loaded from a file
• Pulled from the database by using ore.pull()
• Made available through the Transparency Layer
The output definition for the first four functions can take one of three values:
• NULL, which returns the result as a serialized BLOB
• A table signature that is specified in a SELECT statement, which returns results as a
table from the rq function
• XML, which returns both structured data and graph images in an XML string, where the
structured components are provided first, followed by the base 64 encoding of the PNG
representation of the image
Arguments are optional; if provided, they are passed through a select statement.
In addition, the rqEval(), rqTableEval(), rqGroupEval(), and rqRowEval() functions must specify
an R script by the name under which it is stored in the R script repository.
Special options for each function will be illustrated in the examples.
The sys.rqScriptCreate() and sys.rqScriptDrop() functions are used to create and drop R
scripts for execution in the database, respectively. When using sys.rqScriptCreate(), you must
also supply the R closure, that is, the R function definition passed as a string.
All of the rq*Eval functions, including rqEval(), rqTableEval(), rqGroupEval(), and rqRowEval(),
share the same general syntax, consisting of a few basic parameters.
In this slide, each parameter is color coded, with a summary at the top and the general syntax
format in the code box.
Each of the rq*Eval functions may take the following parameters:
• First, the Input Cursor in red specifies the data to be provided to the R function. Data
preparation depends on the type of table function that is invoked, as you’ll see in the
examples to follow. As mentioned previously, rqEval() takes no input data; it just
executes the script.
• Second, the Parameters Cursor in blue specifies other parameters to be passed to the R
function. This parameter must be defined as a cursor, and its values can be read from a table or from DUAL.
• Third, the Output Table Definition in green specifies the return value, or result. This
definition corresponds to the tabular output that can be returned from an R function in the
form of a data.frame. If this parameter is NULL, the result is returned as a serialized BLOB.
• The optional fourth parameter, in black, may include a group name or the number of rows.
- For rqGroupEval(), a group name specifies the column on which to partition the data.
- For rqRowEval(), this parameter defines the number of rows to provide to the
function as a chunk.
• Lastly, the R Closure in purple specifies the name of the stored R script to execute.
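To make the parameter order concrete, the sketch below models one rq*Eval-style call in Python: an optional list of input rows (the input cursor), a dictionary of named arguments (the parameters cursor), an optional column-to-type mapping (the output table definition), and a script name. The function rq_eval_like and the SCRIPTS dictionary are hypothetical, pickle stands in for BLOB serialization, and the optional fourth parameter (group column or rows per chunk) is omitted for brevity.

```python
import pickle
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]

# Hypothetical registry: script name -> function (the "R Closure" by name).
SCRIPTS: Dict[str, Callable[..., Any]] = {}


def rq_eval_like(input_rows: Optional[List[Row]],
                 params: Optional[Dict[str, Any]],
                 output_def: Optional[Dict[str, type]],
                 script_name: str) -> Any:
    """Sketch of the shared rq*Eval parameter order; not ORE's API."""
    func = SCRIPTS[script_name]          # look the script up by name
    kwargs = dict(params or {})          # parameters cursor -> named arguments
    if input_rows is None:
        result = func(**kwargs)          # like rqEval(): no input data
    else:
        result = func(input_rows, **kwargs)
    if output_def is None:
        return pickle.dumps(result)      # no output definition: serialized bytes
    for row in result:                   # otherwise rows must match the definition
        if set(row) != set(output_def):
            raise ValueError(f"row {row} does not match {sorted(output_def)}")
        for col, typ in output_def.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"column {col} is not of type {typ.__name__}")
    return result


SCRIPTS["Scale"] = lambda rows, factor=1: [
    {"ID": r["ID"], "RES": r["ID"] * factor} for r in rows
]
table = rq_eval_like([{"ID": 1}, {"ID": 2}], {"factor": 10},
                     {"ID": int, "RES": int}, "Scale")
blob = rq_eval_like([{"ID": 1}], None, None, "Scale")
print(table, pickle.loads(blob))
```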
Next, you’ll learn how to use the SQL interface for embedding R execution in the database by
examining code examples.
First, let’s look at a simple example that uses sys.rqScriptCreate() and rqEval(). Here, we
create an R script that generates its own data, and then invoke the R script by name.
At the top of the code example, sys.rqScriptCreate() is used to create a named R script as part of a PL/SQL procedure. The Example1 R script:
• Defines a vector of integers from 1 to 10 in the variable ID
• Defines a data.frame named “res” that contains two columns: RES and ID
• Returns the data.frame as the result
Then, the rqEval() function is invoked as part of a SELECT statement.
Within the rqEval() table function:
• The first parameter, the parameters cursor, is NULL, because no arguments are passed to
the Example1 script.
• The next parameter defines the output table format by using a select statement. The example
uses the number 1 to indicate NUMBER type columns for ID and RES, selected from DUAL.
• The last parameter provides the name of the R script to execute: Example1.
The output of the select statement is shown on the right.
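For comparison, the Example1 logic can be written as a Python function that generates its own data and returns it as rows of a two-column table. The lesson only names the RES and ID columns, so the RES = ID / 100 values below are an assumption made for illustration.

```python
from typing import Dict, List, Union

Number = Union[int, float]


def example1() -> List[Dict[str, Number]]:
    """Generate data internally (no input) and return it as table rows."""
    ids = range(1, 11)                       # ID: integers 1 to 10
    # Assumed values: the lesson does not say how RES is computed.
    return [{"RES": i / 100, "ID": i} for i in ids]


for row in example1():
    print(row["ID"], row["RES"])
```

Because every column is numeric, each returned row maps directly onto an output definition that declares two NUMBER columns.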
In the lesson that covered the R interface for embedded R scripts, we showed how to build a
linear regression model by using the ore.doEval() function. Here, we show the equivalent
approach by using the sys.rqScriptCreate() and rqTableEval() functions.
As with the previous lesson’s example, we use the R stats lm() function to build the model with data in the ONTIME_S data set.
At the top of the code example, sys.rqScriptCreate() is used to create an R script named Example2 as part of a PL/SQL procedure. The Example2 R script:
• Takes input data by using the dat argument for the function.
• Uses the lm() function to build a linear model that predicts ARRDELAY from DISTANCE and DEPDELAY in the input data.
• Returns the model, mod.
Then, the rqTableEval() function is invoked as part of a CREATE TABLE AS SELECT
statement.
In the rqTableEval() function:
• The first parameter defines an input cursor for the data that will be passed to the dat
argument when the Example2 R script is executed.
• The next two parameters (the parameters cursor and the output table definition) are both NULL.
• The last parameter provides the name of the R script to execute: Example2.
When the SQL code is executed, the output on the right is generated:
• The PL/SQL procedure creating Example2 is executed.
• The ontime_lm table is dropped.
• The linear model is generated and stored in the new ontime_lm table as a chunked
BLOB.
In the next slide, you will see how to score data with this model in batch mode.
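The model-building step can be sketched in Python as ordinary least squares with an intercept, predicting ARRDELAY from DISTANCE and DEPDELAY. This is a stand-in, not ORE or R code: fit_lm and solve are hypothetical helpers, the rows are synthetic (generated from a known formula so the recovered coefficients can be checked), and pickle.dumps() plays the role of serializing the model into a BLOB.

```python
import pickle
from typing import Dict, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                          # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lm(rows: Sequence[Dict[str, float]], y: str,
           xs: Sequence[str]) -> Dict[str, float]:
    """Ordinary least squares with an intercept via (X'X) beta = X'y."""
    design = [[1.0] + [float(r[x]) for x in xs] for r in rows]
    target = [float(r[y]) for r in rows]
    k = len(xs) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(k)]
    return dict(zip(["(Intercept)", *xs], solve(xtx, xty)))


# Synthetic rows generated from a known formula, so the fit can be checked.
rows = [
    {"DISTANCE": d, "DEPDELAY": p, "ARRDELAY": 2 + 0.01 * d + 0.9 * p}
    for d, p in [(500, 10), (1000, 0), (1500, 30), (800, 5), (1200, 45)]
]
mod = fit_lm(rows, "ARRDELAY", ["DISTANCE", "DEPDELAY"])
blob = pickle.dumps(mod)        # stand-in for storing the model as a BLOB
print(mod)
```

The normal equations are fine for three coefficients; R's lm() itself uses a QR decomposition, which is numerically more robust on larger problems.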
Now, using the rqTableEval() function, we score data by using the model that we just created.
At the top of the code example, sys.rqScriptCreate() creates an R script named Example3
as part of a PL/SQL procedure.
The Example3 R script defines the scoring process as follows:
• Takes input data by using the dat argument to the function, and takes a model by using
the mod argument to the function.
• Uses the R predict() function to score the data.
• Appends the prediction results to the input data as a column named PRED.
• Returns the resulting data set.
Below the PL/SQL procedure, the rqTableEval() function is invoked as part of a select
statement, which is used to perform the scoring process.
In the rqTableEval() function:
• The first parameter defines the input cursor for the subset of data in ontime_s that we want to score. This data is passed as the first argument in the Example3 script
function.
• The second parameter defines a cursor for the ontime_lm table, which contains the
linear model that was created on the previous slide. The model read through this cursor is passed as the second argument in the Example3 script function.
• The third parameter defines the output table definition in the form of a select
statement.
• The last parameter provides the name of the R script to execute, which is Example3.
When the select statement for this batch scoring process is executed, the score results on
the right are produced.
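The scoring step in Example3 amounts to computing a prediction for each input row and appending it as a PRED column. The following Python sketch shows that shape with a linear model stored as a coefficient dictionary; the coefficient values and the input row are hypothetical, not the model fitted in the lesson, and score is an illustrative name rather than an ORE function.

```python
from typing import Dict, List

Row = Dict[str, float]


def score(dat: List[Row], mod: Dict[str, float]) -> List[Row]:
    """Return the input rows with a PRED column appended (like predict())."""
    scored = []
    for row in dat:
        pred = mod["(Intercept)"] + sum(
            coef * row[name] for name, coef in mod.items() if name != "(Intercept)"
        )
        scored.append({**row, "PRED": pred})   # keep input columns, add PRED
    return scored


# Hypothetical coefficients and input row, for illustration only.
mod = {"(Intercept)": 2.0, "DISTANCE": 0.01, "DEPDELAY": 0.9}
dat = [{"DISTANCE": 1000.0, "DEPDELAY": 10.0, "ARRDELAY": 15.0}]
print(score(dat, mod))
```

Returning the original columns alongside PRED is what lets the output definition list both the input fields and the prediction.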
Let’s take a quick look at the contents of the ontime_lm table, which stores the linear
regression model. Because the model is somewhat large, it is stored in a series of chunks,
and reconstituted when needed.
Here, we display the first two chunks by using a select statement.
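Storing a large object as a series of chunks and reconstituting it later can be sketched in Python as below. The lesson does not describe ORE's chunk size or table layout, so the 16-byte CHUNK_SIZE, the (chunk number, bytes) pairs, and the helper names are hypothetical; pickle stands in for R serialization. Numbering the chunks matters because rows returned by a query have no guaranteed order.

```python
import pickle
from typing import List, Tuple

CHUNK_SIZE = 16  # bytes per chunk; an arbitrary value chosen for illustration


def to_chunks(obj: object, size: int = CHUNK_SIZE) -> List[Tuple[int, bytes]]:
    """Serialize obj and split it into numbered, fixed-size chunks."""
    blob = pickle.dumps(obj)
    return [(n, blob[i:i + size])
            for n, i in enumerate(range(0, len(blob), size), start=1)]


def from_chunks(chunks: List[Tuple[int, bytes]]) -> object:
    """Reconstitute obj by concatenating the chunks in chunk-number order."""
    return pickle.loads(b"".join(data for _, data in sorted(chunks)))


model = {"(Intercept)": 2.0, "DISTANCE": 0.01, "DEPDELAY": 0.9}
chunks = to_chunks(model)
for n, data in chunks[:2]:          # display the first two chunks
    print(n, data)
print(from_chunks(list(reversed(chunks))) == model)   # order restored by number
```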
You can perform real-time scoring in the database by making a slight modification to the
rqTableEval() function invocation.
Real-time scoring is useful in a number of situations, such as:
• In call centers, when data is collected from customers and must be evaluated in the
moment, or
• In what-if scenarios, where users can change certain input values to see the outcome.
As shown in the code example:
• The first parameter of the rqTableEval() function is the input cursor.
• We simply pass the input data record directly from dual, rather than reading it from
a table.
• We reuse the model that we created earlier, as well as the same output specification. Even the R function stored as Example4 is the same as the Example3 scoring function.
On the right, we see the results of real-time scoring for the input record. Despite a
departure delay of 45 minutes, the pilot must have made up some time. The actual arrival
delay was only 23 minutes compared with the predicted arrival delay of over 39 minutes.
In the R interface, using ore.groupApply() was quite straightforward, requiring no unusual
specification. However, in the SQL interface, the equivalent function, rqGroupEval(), requires
you to create two database objects: a package and a function.
As shown in the code example, you must first:
• Create a PL/SQL package that specifies the type of result to be returned. In this example, we want to return a cursor based on the ROWTYPE of the ontime_s table.
• Then, you create a function that does the following:
- Takes a cursor of the type defined in the package, in this case as the first argument.
- Is declared PIPELINED PARALLEL_ENABLE, with a partitioning clause indicating the
column on which you want to partition the data. In this example, we specify month as
the partitioning column.
We will use this ontimeGroupEval() function in the next example.
In the Example7 R script, we’re building a linear model by using the same ontime_s data set to predict arrival delay. The script looks almost identical to the Example2 script that we created previously, except that we are using the biglm() function from the biglm package rather than lm().
Then, the ontimeGroupEval() function that was created on the previous slide is invoked as part of a CREATE TABLE AS SELECT statement. This invocation is fairly simple, due to the setup work done previously. Here:
• The input cursor selects the entire ontime_s data set. This input data is passed to the dat function argument when the Example7 R script is executed.
• The next two parameters (parameters cursor and output table definition) are both NULL.
• The fourth parameter specifies the column name on which to partition the data, in this case MONTH.
• The last parameter provides the name of the R script to execute: Example7.
When the SQL code is executed:
• The PL/SQL procedure is executed.
• The linear models are generated for each month in the input data set, and the list of models is stored in the table ontime_lmg.
In the next slide, you will see how to score data with these models.
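The partition-then-fit pattern behind rqGroupEval() can be sketched in Python: split the rows by the MONTH value, then run the model-building function once per partition. This is an analogy, not ORE code. partition, fit_simple, and group_eval are hypothetical names, a one-predictor least-squares fit stands in for biglm(), a dictionary keyed by month stands in for the R list of models, and the rows are synthetic.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def partition(rows: List[Row], column: str) -> Dict[Any, List[Row]]:
    """Split rows by the value of one column (for example, MONTH)."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


def fit_simple(rows: List[Row]) -> Tuple[float, float]:
    """Least-squares fit of ARRDELAY = a + b * DEPDELAY within one partition."""
    xs = [r["DEPDELAY"] for r in rows]
    ys = [r["ARRDELAY"] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def group_eval(rows: List[Row], column: str) -> Dict[Any, Tuple[float, float]]:
    """Invoke fit_simple once per partition; results are keyed by group value."""
    return {key: fit_simple(part)
            for key, part in sorted(partition(rows, column).items())}


# Synthetic data: a different relationship for May (5) and June (6).
rows = ([{"MONTH": 5, "DEPDELAY": d, "ARRDELAY": 1 + d} for d in (0, 10, 20)]
        + [{"MONTH": 6, "DEPDELAY": d, "ARRDELAY": -2 + 0.5 * d} for d in (0, 10, 20)])
models = group_eval(rows, "MONTH")
print(models)
```

Because each partition is fitted independently, the per-group calls have no shared state, which is what makes this pattern a candidate for parallel execution.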
Here, we will use the models that were just created with the rqGroupEval() capability, and
score data by using the rqRowEval() function.
At the top of the code example, sys.rqScriptCreate() is used to create an R script named Example8 as part of a PL/SQL procedure.
The Example8 R script scores data with the model associated with a particular month. The
script performs the following:
• Takes input data by using the dat argument to the function, and takes the complete
model list by using the mod.list argument to the function.
• Gets the model associated with the month by using the month number as an index into the list, and stores it in mod.
• Scores the data for that month.
• Appends the predictions to the input data as a PRED column, so that the predictions
are returned with the rest of the data.
Then, we use the rqRowEval() function as part of a select statement to score the
data.
In the rqRowEval() function:
• The first parameter defines the input cursor for the subset of data in ontime_s that we
want to score. This includes May 2nd and June 2nd of 2003. The input data is passed as the first argument in the Example8 script function.
• The second parameter defines a cursor for the ontime_lmg table, which contains the
linear models for each month. (These models were created in the example on the
previous slide.) This complete model list is passed as the second argument in the Example8 script function.
• The third parameter defines the output table definition in the form of a select
statement.
• The last two parameters provide:
- The chunk size, which is set to 1 row, and
- The name of the R script to execute, which is Example8.
When the select statement for this scoring process is executed, the results on the right are
produced.
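The chunked invocation of rqRowEval() can be sketched in Python as follows. The helper names are hypothetical, the per-month models are made-up intercept and slope pairs, and a dictionary keyed by month number stands in for the R model list indexed by month. With a chunk size of 1, score_chunk is called once per row, as in the example above.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def chunks(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive chunks of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def score_chunk(dat: List[Row], mod_list: Dict[int, Dict[str, float]]) -> List[Row]:
    """Score each row with the model for its month and append PRED."""
    scored = []
    for row in dat:
        mod = mod_list[row["MONTH"]]        # month number selects the model
        scored.append({**row, "PRED": mod["a"] + mod["b"] * row["DEPDELAY"]})
    return scored


def row_eval(rows: List[Row], size: int,
             mod_list: Dict[int, Dict[str, float]]) -> List[Row]:
    """Call score_chunk once per chunk until all rows are processed."""
    results: List[Row] = []
    for chunk in chunks(rows, size):
        results.extend(score_chunk(chunk, mod_list))
    return results


# Hypothetical per-month models (intercept a, slope b) and input rows.
mod_list = {5: {"a": 1.0, "b": 1.0}, 6: {"a": -2.0, "b": 0.5}}
rows = [{"MONTH": 5, "DEPDELAY": 10.0}, {"MONTH": 6, "DEPDELAY": 10.0}]
print(row_eval(rows, 1, mod_list))
```

The chunk size changes only how many calls are made, not the predictions, so it is a tuning knob for call overhead versus memory per call.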
XML is also an output option for the rq*Eval functions.
This is useful in addressing two limitations of SQL output:
• Since R script output is often dynamic and doesn’t conform to predefined table
structures, XML provides a way to return complex R objects.
• In addition, R applications generate heterogeneous data results for statistics, new data,
and graphics. These may not be readily expressed as a SQL table function result for use
by applications that need one or all of these results.
ORE’s support for XML output enables applications to embed ORE in order to work with
complex statistical results, new data, and graphics that are produced by R. To accomplish
this, ORE wraps R objects in a generic and powerful framework, enabling ready integration of
R into any operational application, including OBIEE dashboards and BI Publisher documents,
by using the SQL interface for embedded R scripts.
Here, we show a simplistic example using a text string as the return value of our R script.
Although simple, this example illustrates the technique that also allows ORE to load graphs
generated in R to applications that read XML, such as Oracle BI Publisher and Oracle BI
Interactive Dashboards.
In the code example:
• We return the string “Hello World!” from our R script named Example5.
• Then, in the rqEval() function, invoked as part of a select statement, we specify ‘XML’
in the output format parameter.
When the script is executed, the XML output is generated. Notice the “Hello World!” value between the value tags in the XML output.
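To illustrate the idea of wrapping an R result in XML, here is a Python sketch using the standard xml.etree.ElementTree module. The lesson only mentions value tags, so the root element name and the overall layout are assumptions rather than ORE's actual XML format.

```python
import xml.etree.ElementTree as ET


def to_xml(result) -> str:
    """Wrap a scalar or list result in <value> elements (layout assumed)."""
    root = ET.Element("root")                 # root element name is assumed
    values = result if isinstance(result, (list, tuple)) else [result]
    for v in values:
        ET.SubElement(root, "value").text = str(v)
    return ET.tostring(root, encoding="unicode")


print(to_xml("Hello World!"))
```

Building the document with an XML library, rather than by string concatenation, means characters such as < and & in R output are escaped automatically.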
Using the same technique, we show an R script that generates graphical output in XML
format.
In the code example:
• The function in the Example6 script plots 100 random numbers, and returns a vector
with values 1 to 10.
• Then, in the rqEval() table function, invoked as part of a select statement, we specify
‘XML’ in the output definition format parameter.
When the script is executed, the XML output contains a base 64 encoding of the corresponding
PNG representation of this graph. This representation can be used by any application that can
parse XML and interpret graphics represented in this format.
On the next slide, we view the XML value that is returned by this function, which can be
consumed by applications such as Oracle BI Publisher and OBIEE.
Looking at the XML output at the bottom:
• You see the value tags for some of the index numbers from 1 to 10.
• These are followed by the image content.
As stated, these results can be easily incorporated into applications that read XML output
and, where images are involved, can decode the base 64 encoding of the PNG image.
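The structured-values-then-image layout can be sketched in Python with the standard base64 and xml.etree.ElementTree modules. The element names root, value, and image are assumptions, not ORE's schema, and the PNG bytes are a placeholder (only the 8-byte PNG signature), not a rendered plot. The consumer side shows how an application would recover the image bytes.

```python
import base64
import xml.etree.ElementTree as ET
from typing import List

# Placeholder bytes: only the 8-byte PNG signature, not a complete image.
PNG_BYTES = b"\x89PNG\r\n\x1a\n"


def to_xml_with_image(values: List[int], png: bytes) -> str:
    """Producer: structured values first, then the base64-encoded PNG."""
    root = ET.Element("root")                 # element names are assumptions
    for v in values:
        ET.SubElement(root, "value").text = str(v)
    ET.SubElement(root, "image").text = base64.b64encode(png).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def read_image(xml_text: str) -> bytes:
    """Consumer: find the image element and decode it back to PNG bytes."""
    return base64.b64decode(ET.fromstring(xml_text).find("image").text)


doc = to_xml_with_image(list(range(1, 11)), PNG_BYTES)
print(doc)
print(read_image(doc) == PNG_BYTES)
```

Base64 is used because raw PNG bytes are binary and cannot appear directly as XML text.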
As you learned in the previous lesson, because R allows access to the database server
machine at the operating system level, role-based privileges are warranted for security
purposes.
For this, ORE defines two database roles: RQADMIN and RQROLE. The roles and privileges
are shown in the table.
These roles apply not only to the R interface for embedded R execution that we covered in
the previous lesson, but also to the SQL interface for embedded R execution that you learned
about in this lesson.
One of the drawbacks of open-source R is its lack of support for parallel execution in base R
before release 2.14, which added the parallel package. Parallelism is possible with R through the use of additional open-source
packages; however, this normally requires explicitly writing code to leverage this parallelism
as well as being aware of the underlying hardware environment.
However, ORE uses Oracle Database to augment R with database-supported parallelism in
several ways. In this section, you’ll learn how to enable parallel execution in the database,
both for the Transparency Layer and with embedded R execution.
In the ORE Transparency Layer, parallelism may be enabled by virtue of having the database
perform overloaded R functions. Of course, this assumes that the database runs on a
machine capable of supporting parallelism and that the database is configured for parallelism.
The Transparency Layer is one way that ORE leverages the database as a compute engine
for both scalability and performance.
Therefore, the Transparency Layer is ideal for “bigger data,” where the supporting
functionality exists in the database.
In the previous lesson, you learned about the R interface for embedded R execution, and a
brief mention was made about support for parallelism. Here, you learn specifically how to
enable parallelism to support embedded R execution.
As you may recall, two of the ore.*Apply() functions support data-parallel computation:
ore.groupApply() and ore.rowApply().
Parallel execution is ideal for:
• Model building and data scoring on partitions of data
• Any data parallel operations
• Monte Carlo type simulations for ore.indexApply()
Although we haven’t shown its use yet, the ore.*Apply() functions take a “parallel” parameter,
which can be set to TRUE to allow multiple database R engines to be spawned.
If this parameter is set to FALSE, data parallelism is effectively turned off.
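The effect of a parallel flag can be sketched in Python with concurrent.futures. In ORE, the parameter controls whether multiple database R engines are spawned; here a thread pool merely stands in for those engines to show that the flag changes how the per-group work is scheduled, not what it computes. group_apply and its workers argument are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def group_apply(groups: Dict[Any, List[float]],
                func: Callable[[List[float]], Any],
                parallel: bool = False,
                workers: int = 4) -> Dict[Any, Any]:
    """Apply func to each group, optionally using several workers at once."""
    keys = sorted(groups)
    if not parallel:
        # parallel=False: a single worker handles the groups one after another.
        return {k: func(groups[k]) for k in keys}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # parallel=True: groups are handed to a pool of workers.
        results = pool.map(func, (groups[k] for k in keys))
        return dict(zip(keys, results))


groups = {6: [10.0, 20.0], 5: [1.0, 2.0, 3.0]}
print(group_apply(groups, sum, parallel=True))
```

A real deployment would use separate processes (as ORE uses separate R engines) to get true CPU parallelism; threads keep this sketch small and portable.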
In the SQL interface for embedded R execution, two of the rq*Eval() functions support
data-parallel computation: the rqGroupEval() capability and the rqRowEval()
function.
As with the R interface, data parallelism is ideal for model building and scoring as well as any
data parallel operations.
However, the SQL interface provides two additional features when leveraging parallelism:
• Enables lights-out execution of R scripts by having them included in production SQL scripts, including DBMS_SCHEDULER jobs, triggers, and so on.
• Supports parallel cursor hints when specifying the data input cursor for an rq*Eval()
function. You can leverage the database setting for the default degree of parallelism, or
you can specify your own setting for the specific cursor definition.
You’ll see a parallel cursor hint in the next example.
This code example shows an rqRowEval() function being used in a select statement. This
code is very similar to the example shown earlier in this lesson.
Here, we have added a parallel cursor hint to the input cursor definition. This hint, which is color coded in red, specifies parallel execution on the ontime_s table, using the default
degree of parallelism that is set for the table.
This slide provides a brief overview of the commands and features that are associated with
enabling parallelism in the database. These settings are database features, not features of
ORE. These settings may affect the parallel behavior of the Transparency Layer functions for
ORE users.
For additional information on parallelism, see the references in the next slide.
This slide provides a list of information resources on parallel execution in Oracle Database.
True or False quiz:
True response: Correct. Unlike the R interface, when using the SQL interface, the functions
must be stored in the database R repository and referenced by name in the SQL API
functions.
False response: Incorrect. Although the R interface does not require functions to be stored in
the database R repository, it is mandatory when using the SQL interface.
In this lesson, you learned about embedding R script execution by using the SQL interface.
• First, you learned about the functions that are available in the SQL interface.
• Second, you examined each function individually, with code examples.
• Finally, you learned how to enable parallel execution in the database for embedded R
scripts.
You’ve just completed “Embedded R Scripts: SQL Interface”. Please move on to the last
lesson in the series: “Using the Oracle R Connector for Hadoop”.