
IBM WebSphere Portal content publishing v4.2: Understanding the Feedback logging tables

By Andrei Malacinski
Software Engineer, IBM Corp.
April 2003

WebSphere Portal content publishing (WPCP) version 4.2 provides a rich set of logging functions. These include an automatic rule logging feature (referred to as Personalization Rule logging) as well as an extensible logging feature, an application programming interface (API) referred to as Logging Beans. Data logged by the WPCP logging subsystem is collected and stored in a set of database tables referred to as the Feedback database schema. Understanding this schema is one of the keys to tapping into this wealth of data. This paper describes a subset of the schema, specifically the tables that store personalization-specific data (referred to as the Personalization tables). These tables include:

• Hit_Facts
• HitParms
• Parms
• Key_Value_Combo
• Key_Value_Pair
• Key
• Value

WebSphere Portal content publishing logging subsystem

Figure 1 illustrates the WPCP logging subsystem. When a Web application uses Personalization Rules or Logging Beans, the WPCP Log Manager captures the data and dispatches it, in the form of Java event objects, to a set of registered listeners. One such listener, the WPCP Feedback Listener (sometimes referred to as the Shared Schema Listener), stores the data in the Feedback schema for subsequent user reporting. Another listener, the LikeMinds Listener, also stores data, but it writes to a set of recommendation tables outside the Feedback schema that are not designed for general reporting. In the configuration shown, the same database (labeled the Feedback Database) holds both the Feedback schema (containing the Personalization tables) and the LikeMinds schema (containing the recommendation tables).
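The dispatch model described above (capture, then asynchronous fan-out to registered listeners) can be sketched in Java as follows. This is a minimal illustration only: the class and method names (SimpleLogManager, LogListener, LogEvent) are hypothetical and are not the WPCP Log Manager API, which this paper does not show.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical event: the data captured by a rule or logging bean for one page request.
final class LogEvent {
    final String key;
    final String value;

    LogEvent(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical listener contract (stands in for the Feedback and LikeMinds listeners).
interface LogListener {
    void logged(LogEvent event);
}

// Hypothetical multicaster: queues each event and delivers it to every registered listener.
final class SimpleLogManager {
    private final List<LogListener> listeners = new CopyOnWriteArrayList<>();
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    void addListener(LogListener listener) {
        listeners.add(listener);
    }

    // Returns immediately; delivery happens later on the dispatcher thread, in submission order.
    void log(LogEvent event) {
        dispatcher.execute(() -> {
            for (LogListener listener : listeners) {
                listener.logged(event);
            }
        });
    }

    // Stops accepting events and waits (up to 5 seconds) for queued events to be delivered.
    void shutdown() {
        dispatcher.shutdown();
        try {
            dispatcher.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A listener playing the Feedback Listener's role would write each event to the Feedback schema rather than to memory; the single-thread executor keeps events in submission order while letting the page request return without waiting for the database.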


Figure 1: WebSphere Portal content publishing logging subsystem
[Diagram: a Web page (JavaServer Page) uses Rules (Personalization Rule, Query Prediction Rule) and Beans (Category, Action, Rating, CustomLog, and Page view beans). These feed the asynchronous Log Manager (multicaster/dispatcher), which passes events through the Event/Listener Interface to the Feedback Listener (interface to base tables) and the LikeMinds Listener (interface to LikeMinds tables). Both write to the Feedback Database, which contains the Feedback Schema (personalization tables) and the LikeMinds Schema (LikeMinds-specific recommendation tables). The diagram also marks the options to customize a listener or write a new custom listener.]

Feedback schema

The Feedback schema follows a traditional star schema model (or snowflake pattern), in which data is represented by a basic fact table around which dimensional data is linked. The main fact table in the Feedback schema is the Hit_Facts table. Each row of data in this table corresponds to a Web page hit, and the columns of the row hold the elements of that hit. These elements include the hit date/time stamp, the URI (resource plus query string, for example index.jsp?key=value), hit referral information, session information, and, most notably, personalization information. The personalization information relating to each hit is stored in a set of dimensional tables linked to the Hit_Facts table. Figure 2 depicts an abstract view of the star schema model of the Feedback schema. In addition to the Hit_Facts table, there is a Session_Facts table, around which session dimensional data is linked. Figure 3 shows the personalization tables and their relationship to the Hit_Facts table.
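To make the star (snowflake) layout concrete, the following in-memory Java sketch totals hits per user by following a Hit_Facts row's SESSION_ID to a Session_Facts row and from there to a user dimension. The reduced row class and both lookup maps are hypothetical stand-ins; in particular, the Session_Facts column that links to the users dimension is not named in this paper.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SnowflakeSketch {
    // Reduced Hit_Facts row: the row ID, the SESSION_ID foreign key, and the HITS measure.
    static final class HitFact {
        final int id;
        final int sessionId;
        final double hits;

        HitFact(int id, int sessionId, double hits) {
            this.id = id;
            this.sessionId = sessionId;
            this.hits = hits;
        }
    }

    // Sums HITS per user name by walking Hit_Facts -> Session_Facts -> users dimension.
    static Map<String, Double> hitsPerUser(List<HitFact> hitFacts,
                                           Map<Integer, Integer> sessionToUserId, // hypothetical Session_Facts link
                                           Map<Integer, String> userNames) {      // hypothetical users dimension
        Map<String, Double> totals = new TreeMap<>();
        for (HitFact hit : hitFacts) {
            Integer userId = sessionToUserId.get(hit.sessionId);
            String user = (userId == null) ? "unknown" : userNames.getOrDefault(userId, "unknown");
            totals.merge(user, hit.hits, Double::sum);
        }
        return totals;
    }
}
```

The two hops are the point of the example: the fact row never stores the user directly, so any per-user report must pass through the session fact first.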


Figure 2: WebSphere Portal content publishing Feedback schema
[Diagram: the Hits and Sessions fact tables and their dimensions, including Parameters, Date/Time, Personalization Data (Rules and Bean data), Referrer, Protocol, Entry resource, Exit resource, Client IP, User ID, Agent, and Browser/Platform; the measures shown include Hits, Page Views, Duration, and Bytes Transferred.]


Figure 3: The WebSphere Portal content publishing personalization tables

Feedback schema tables

Let's take a look at how it all links together.

Hit_Facts table: Each time a Web user browses a page of your Web site that contains a rule or a logging bean, WPCP logs the hit to that Web page as an entry in its Hit_Facts table. In a typical Web page implementation, a browser may make several HTTP requests to your server to render a single page. For example, there is a request for the page body itself, as well as separate requests for each of the page's embedded objects, such as images. Only the page request itself is recorded in the Hit_Facts table; the subsequent requests for the page's embedded objects are not recorded.

Figure 4 contains the source code for a simple JavaServer Page (JSP), referred to from this point on as the category example. This sample is instrumented with a logging bean, in this case the category bean (declared by the jsp:useBean element and invoked by the category.log call). Each time a user requests this page, WPCP logs a row in the Hit_Facts table.

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <HTML>
    <HEAD>
    <TITLE>Category Bean tester</TITLE>
    <META name="GENERATOR" content="IBM WebSphere Studio">
    </HEAD>
    <BODY>
    <jsp:useBean class="com.ibm.wcp.analysis.beans.Category" id="category" scope="session">
    </jsp:useBean>
    <%@ page import="java.util.*, com.ibm.wcp.analysis.event.*" %>
    <%
      // Set the user.
      if (session.getAttribute( "pzn.userName" ) == null)
        session.setAttribute( "pzn.userName", "TestUser" );

      // Log the category for this page request.
      try {
        category.log( request, "TestCategory" );
      } catch (Throwable t) {
        t.printStackTrace();
      }

      // Display each category name and its count.
      String[] categories = category.getCategoryNames( request );
      for (int i = 0; i < categories.length; i++) {
    %>
    Category "<%= categories[i] %>" count = <%= category.getCategoryCount( request, categories[i] ) %><br>
    <%
      }
    %>
    </BODY>
    </HTML>

Figure 4: A simple JSP page containing a WebSphere Portal content publishing logging bean.

The personalization-specific data is stored in key/value pair form in a series of tables linked to the Hit_Facts table through the HitParms table. The HitParms table is the gateway to this series of tables, referred to here as the personalization tables. Table 1 contains a brief description of each column of the Hit_Facts table.
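Several Hit_Facts columns described in Table 1 encode the hit time as IDs into the CALENDAR and TIMEOFDAY tables. The Java sketch below reproduces the worked examples given there (2907, 59002, 77002), assuming that CALENDAR IDs are numbered consecutively from 1 for January 1, 1995, and that a TIMEOFDAY ID is the number of seconds since midnight. Both numbering rules are inferred from those examples rather than taken from WPCP documentation, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper. Assumed numbering (inferred from the Table 1 examples):
// CALENDAR ID 1 = January 1, 1995, one ID per day; TIMEOFDAY ID = seconds since midnight.
final class HitTimeIds {
    private static final LocalDate CALENDAR_START = LocalDate.of(1995, 1, 1);

    // ID of the CALENDAR row for a given day (LOCALDATE_ID or GMTDATE_ID).
    static long calendarId(LocalDate day) {
        return ChronoUnit.DAYS.between(CALENDAR_START, day) + 1;
    }

    // ID of the TIMEOFDAY row for a given time (LOCALTIMEOFDAY_ID or GMTTIMEOFDAY_ID).
    static int timeOfDayId(LocalTime time) {
        return time.toSecondOfDay();
    }

    // Converts the server-local hit time to GMT before the GMT IDs are looked up.
    static LocalDateTime toGmt(LocalDateTime serverLocal, ZoneOffset serverOffset) {
        return serverLocal.atOffset(serverOffset)
                .withOffsetSameInstant(ZoneOffset.UTC)
                .toLocalDateTime();
    }

    // HITTIMESTMP holds milliseconds since the epoch; this turns it back into a GMT date/time.
    static LocalDateTime fromHitTimestamp(long millis) {
        return Instant.ofEpochMilli(millis).atOffset(ZoneOffset.UTC).toLocalDateTime();
    }
}
```

The sketch also shows why GMTDATE_ID can differ from LOCALDATE_ID: a hit at 8:30 PM Eastern Standard Time on December 16, 2002 falls on December 17 in GMT, so its GMT calendar ID is 2908 rather than 2907. A fixed offset of -5 hours stands in for Eastern Standard Time to keep the example independent of time-zone rules.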


The Hit_Facts table:

ID: The primary key that identifies the row in the table.

IMPORTHISTORY_ID: This column is not used by WPCP and is empty.

SESSION_ID: Foreign key pointer to the SESSION_FACTS table. SESSION_FACTS contains data related to the session, such as the session identifier, user ID, IP address, user agent, and referral information. The SESSION_FACTS table is itself a fact table that contains links to dimensional data. All user hits must belong to a session, so every HIT_FACTS entry points to a SESSION_FACTS entry.

HITTIMESTMP: The time stamp of the hit entry, stored as a numeric value giving the time of the hit in milliseconds since the epoch (January 1, 1970, 00:00:00 GMT).

LOCALDATE_ID: Foreign key pointer into the CALENDAR table. The CALENDAR table contains an entry for each day from January 1995 to the year 2030. This field stores the ID of the CALENDAR row that represents the day on which the hit occurred. This is the date of the actual hit, as recorded by the Web server, in the Web server's time zone. For example, if the hit occurred on December 16, 2002, this entry contains the value 2907.

LOCALTIMEOFDAY_ID: Foreign key pointer into the TIMEOFDAY table. The TIMEOFDAY table contains 86,400 entries, one for each second in a 24-hour day. This field stores the ID of the TIMEOFDAY row that represents the time at which the hit occurred, as recorded by the Web server in the Web server's time zone. For example, if the hit occurred at 4:23:22 PM, this entry contains the value 59002, which corresponds to the time stamp Hour: 16, Minute: 23, Seconds: 22.

GMTDATE_ID: Like LOCALDATE_ID, this is a foreign key pointer into the CALENDAR table. This field stores the ID of the CALENDAR row that represents the Greenwich Mean Time (GMT) day on which the hit occurred. The date/time stamp of the actual hit, recorded by the Web server in its own time zone, is converted to GMT, and it is the GMT date that is stored in this field.

GMTTIMEOFDAY_ID: Foreign key pointer into the TIMEOFDAY table. This field stores the ID of the TIMEOFDAY row that represents the GMT time at which the hit occurred. The date/time stamp of the actual hit, recorded by the Web server in its own time zone, is converted to GMT, and it is the GMT time that is stored in this field. For example, if the hit occurred at 4:23:22 PM and the server recorded it in Eastern Standard Time, this entry contains the value 77002, which corresponds to the GMT time stamp Hour: 21, Minute: 23, Seconds: 22.

NETWORK_ID: This column is not used and is empty. The network ID (or IP address) of the client making this request is stored in the NETWORKS table, but it is linked to this request through the SESSION_FACTS table, which contains a link to the NETWORKS table.

USER_ID: This column is not used and is empty. The user ID of the client making this request is stored in the USERS table, but it is linked to the HIT_FACTS table through the SESSION_FACTS table, which contains a link to the USERS table.

RESOURCE_ID: Foreign key pointer to the RESOURCES table, which contains the resource name of the URL of the HTTP request. For example, if the requested page was accessed through the URL http://localhost/wps/wcp/SimpleCategory.jsp, the RESOURCES table would contain an entry with a resource name of wps/wcp/SimpleCategory.jsp, and the RESOURCE_ID of this HIT_FACTS entry would contain the ID of that RESOURCES entry.

REFERRER_ID: Foreign key pointer to the REFERRER table, used to reference the URL that referred the user to the page represented by the HIT_FACTS entry. The REFERRER table contains pointers to the host name and resource name of the referring URL. For example, if the user clicked a link to the current page on a previous page, the address of that previous page is recorded as the referral, and the REFERRER_ID column of the HIT_FACTS entry contains the ID of the corresponding REFERRER entry.


PROTOCOL_ID: Foreign key pointer to the PROTOCOLS table. The PROTOCOL_ID represents the protocol that was used to transfer the page request.

REFPROTOCOL_ID: Foreign key pointer to the PROTOCOLS table. The REFPROTOCOL_ID represents the protocol designation of the page referral URL. For example, if the referring page was http://localhost/index.html, then http is the protocol designation of this URL, the NAME field of the PROTOCOLS table contains the value http, and the ID of that entry is the foreign key stored in the REFPROTOCOL_ID column.

HTTPVERSION_ID: Foreign key pointer to the HTTPVERSION table. If the HTTP version (for example, “HTTP/1.1” or “HTTP/1.0”) is not known, an ID of 99 is used; the ID 99 in the HTTPVERSION table corresponds to the NAME unknown.

RETURNCODE_ID: This column is not used by WPCP and is empty.

USERAGENT_ID: This column is not used by WPCP and is empty. The user agent identifying the client making this request is stored in the USERAGENTS table, but it is linked to the HIT_FACTS table through the SESSION_FACTS table, which contains a link to the USERAGENTS table.

STATUS_ID: This column is not used by WPCP. It is set to the value 99, which is a foreign key pointer to the RESETSTATUS table. The ID 99 in the RESETSTATUS table corresponds to the NAME unknown.

JS_ID: This column is not used by WPCP. It is set to the value 99, which is a foreign key pointer to the JAVASCRIPTSTATUS table. The ID 99 in the JAVASCRIPTSTATUS table corresponds to the NAME unknown.

COOKIESSTATUS_ID: This column is not used by WPCP. It is set to the value 99, which is a foreign key pointer to the COOKIESSTATUS table. The ID 99 in the COOKIESSTATUS table corresponds to the NAME unknown.

SERVER_ID: This column is not used by WPCP and is empty.


HITS: The number of HTTP hits represented by the HIT_FACTS entry. This column always has the value 1.0 for WPCP, because each HIT_FACTS entry represents a single page hit.

PAGEVIEWS: This column always has the value 1.0 for WPCP, because each HIT_FACTS entry represents a single page hit.

BYTES: This column is not used by WPCP. It always has a value of 0.0.

TIMETAKEN: This column is not used by WPCP. It always has a value of 0.0.

LASTUPDATED: A time stamp for the last time the HIT_FACTS entry was updated. When the HIT_FACTS entry is created, and each time it is updated, WPCP sets this field by having the database assign its current time stamp.

CORRELATIONKEY: This column is not used by WPCP and is empty.

RECORDTYPE: Identifies the type of the HIT_FACTS entry. All entries in the HIT_FACTS table created by WPCP 4.2 have the value 32786. This value helps distinguish entries created by WPCP from those created by other software, such as Tivoli WebSphere Site Analyzer, which would use this field of the HIT_FACTS table to identify the records it creates.

Table 1: The Hit_Facts table

To understand the relationships among the personalization tables, consider the category example. When a user requests this page, the personalization data to be recorded with the page hit is the category TestCategory. WPCP stores all personalization data in the database in key/value pair format, and it defines keys for specific types of data. In this case the key is WcpCategory, which identifies the corresponding value as category data. Other WPCP-defined keys include WcpAction and WcpRule. The key/value pair in this case is therefore WcpCategory/TestCategory. This pair is stored in the database and linked back to the Hit_Facts entry as follows.

Key and Value tables: Two simple tables, the Key table and the Value table, hold the data from each key/value pair. The Key table holds the key name (and an ID associated with the name), and the Value table holds the value (and an ID associated with it). For example, the logical key/value pair WcpCategory/TestCategory is stored as a row in the Key table with ID 1 for WcpCategory and a row in the Value table with ID 1 for TestCategory.

Key_Value_Pair table: The Key_Value_Pair table links keys and values together into pairs. It holds the IDs of the key and the value that make up each pair, and each pair itself has an ID. For example, the key/value pair WcpCategory/TestCategory is tied together by a Key_Value_Pair entry with an ID of 1, a KEY_ID of 1, and a VALUE_ID of 1, where the value 1 in the KEY_ID column is the ID of the Key table row for WcpCategory and the value 1 in the VALUE_ID column is the ID of the Value table row for TestCategory.

Parms table: The Parms table, used in conjunction with the Key_Value_Combo table, groups multiple key/value pairs together. WPCP groups key/value pairs when it needs several pairs to store all the data it collects about a single rule execution or bean invocation. The same page request (that is, the same Hit_Facts entry in the Feedback database) can produce multiple groups of key/value pairs; this is the case if multiple rules fire or multiple logging beans are called within the page.

In the category example, WPCP needs only one key/value pair (WcpCategory/TestCategory) to represent the personalization data associated with the page request. WPCP creates a row in the Parms table for each group of key/value pairs. For this page request there is one group, and it contains only one key/value pair, so only one row is added to the Parms table. The ID column of the Parms table uniquely identifies the row, KVCOUNT indicates the number of key/value pairs in the group, and WEBNODE_ID and PARMSSTRING are not used and remain empty. If WPCP needed multiple key/value pairs to store the personalization data for a bean invocation, the KVCOUNT in the Parms row would match the number of pairs. If, for example, the page request also called an additional logging bean, WPCP would need an additional key/value group, reflected by an additional row in the Parms table. For the category example, the Parms table therefore contains a single row with an ID of 1 and a KVCOUNT of 1.

To complete the group, all key/value pairs within the group must be linked together. This is done in the Key_Value_Combo table.

Key_Value_Combo table: All key/value pairs that are grouped together are associated with the same ID in the Parms table through the Key_Value_Combo table, which contains the ID of each key/value pair (from the Key_Value_Pair table) and the ID of the Parms table entry to which the pair belongs. In the category example, the key/value pair WcpCategory/TestCategory, whose pair ID is 1, is linked by a single Key_Value_Combo row to the row in the Parms table whose ID is 1.
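The bookkeeping described above can be modeled in memory to see how one group of key/value pairs is spread across the Key, Value, Key_Value_Pair, Parms, and Key_Value_Combo tables. This Java sketch rests on two assumptions the paper does not state: IDs are assigned sequentially from 1, and an existing Key, Value, or Key_Value_Pair row is reused when the same name, value, or pair is logged again. The class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the personalization tables (not WPCP code).
final class PersonalizationTablesSketch {
    final Map<String, Integer> keyTable = new HashMap<>();      // key name -> Key ID
    final Map<String, Integer> valueTable = new HashMap<>();    // value -> Value ID
    final Map<String, Integer> pairTable = new HashMap<>();     // "KEY_ID:VALUE_ID" -> Key_Value_Pair ID
    final Map<Integer, Integer> parmsKvCount = new HashMap<>(); // Parms ID -> KVCOUNT
    final List<int[]> keyValueCombo = new ArrayList<>();        // rows of {pair ID, PARMS_ID}

    // Returns the existing ID for a name, or assigns the next sequential ID (assumed reuse).
    private static int idFor(Map<String, Integer> table, String name) {
        Integer id = table.get(name);
        if (id == null) {
            id = table.size() + 1;
            table.put(name, id);
        }
        return id;
    }

    // Logs one group of pairs (one rule execution or bean invocation); returns its Parms ID.
    int logGroup(String[][] pairs) {
        int parmsId = parmsKvCount.size() + 1;
        parmsKvCount.put(parmsId, pairs.length);                 // KVCOUNT = pairs in the group
        for (String[] pair : pairs) {
            int keyId = idFor(keyTable, pair[0]);
            int valueId = idFor(valueTable, pair[1]);
            int pairId = idFor(pairTable, keyId + ":" + valueId);
            keyValueCombo.add(new int[] {pairId, parmsId});      // link the pair to its group
        }
        return parmsId;
    }
}
```

Logging the category example into an empty model yields Key ID 1, Value ID 1, pair ID 1, and Parms ID 1 with a KVCOUNT of 1, matching the rows described above. A second group that repeats WcpCategory/TestCategory and adds a hypothetical CustomKey/CustomValue pair gets Parms ID 2 with a KVCOUNT of 2 and, under the reuse assumption, shares pair ID 1.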


If multiple key/value pairs are grouped together, each pair ID is linked to the same row in the Parms table through the PARMS_ID column. For example, if the key/value pairs with IDs 50, 51, and 52 form one group and the pairs with IDs 100, 101, and 102 form another, there are two rows in the Parms table (with unique IDs such as 2 and 3), and the Key_Value_Combo table contains six rows: pairs 50, 51, and 52 each reference PARMS_ID 2, and pairs 100, 101, and 102 each reference PARMS_ID 3.

HitParms table: The HitParms table is the gateway from the Hit_Facts table, where each page request is logged, to the personalization tables, where the personalization data is logged. It links each group of key/value pairs (identified by an entry in the Parms table) to its page request (identified by an entry in the Hit_Facts table). The HIT_ID and PARMS_ID columns of the HitParms table are foreign key pointers to the Hit_Facts and Parms tables, respectively. In the category example, one row is added to the HitParms table.


The HIT_ID identifies the page request by containing the ID of the corresponding row in the Hit_Facts table. The PARMS_ID identifies the key/value pair group; in this example it is set to 1, which corresponds to the row in the Parms table with the ID of 1. The ORDERING column records the sequence in which multiple key/value pair groups are recorded for the same page. If only one group of key/value pairs is needed to record the personalization information for a page request, the ORDERING column of its row has a value of 0, indicating that it is the first (and, in this case, the only) group in the sequence. If multiple key/value pair groups are associated with the same page request (which can occur if multiple rules fire or multiple logging beans are called within the page), the HitParms entries for that page request have increasing ORDERING values: 0 for the first, 1 for the second, 2 for the third, and so on. The values increase only among rows with the same HIT_ID; the sequence starts again at 0 for each page request.
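The ORDERING rule amounts to a per-page-request counter. The Java sketch below is a hypothetical model, not WPCP code: it appends HitParms rows and gives each new group the next ORDERING value for its HIT_ID, starting at 0. The HIT_ID values used with it are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the HitParms table.
final class HitParmsSketch {
    // One HitParms row: links a Parms group to a Hit_Facts row.
    static final class Row {
        final int hitId;
        final int parmsId;
        final int ordering;
        final int parmType;

        Row(int hitId, int parmsId, int ordering, int parmType) {
            this.hitId = hitId;
            this.parmsId = parmsId;
            this.ordering = ordering;
            this.parmType = parmType;
        }
    }

    final List<Row> rows = new ArrayList<>();
    private final Map<Integer, Integer> nextOrdering = new HashMap<>(); // HIT_ID -> next ORDERING value

    // Records that group parmsId (of the given PARMTYPE) belongs to page request hitId.
    Row link(int hitId, int parmsId, int parmType) {
        int ordering = nextOrdering.getOrDefault(hitId, 0); // 0 for the first group of a hit
        nextOrdering.put(hitId, ordering + 1);
        Row row = new Row(hitId, parmsId, ordering, parmType);
        rows.add(row);
        return row;
    }
}
```

Linking groups 1, 2, and 4 to a hypothetical HIT_ID 500 yields ORDERING values 0, 1, and 2, while group 3 on HIT_ID 501 starts again at 0. The category example corresponds to a single call with PARMS_ID 1 and PARMTYPE 83.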


The PARMTYPE column classifies the type of data represented by each group of key/value pairs. Personalization data is collected and logged when rules or logging beans are used within a page, and data collected by a rule firing may need to be differentiated from data collected by a bean invocation. Although the key name in each key/value pair can generally identify the data, there are cases, such as custom data with custom keys, where additional differentiation is needed. The following PARMTYPE values are used for personalization data:

• 81 Rule Data: data recorded as a result of a rule execution.
• 82 Action Data: data recorded due to the use of an Action bean.
• 83 Category Data: data recorded due to the use of a Category bean.
• 84 Custom Data: data recorded due to the use of a Custom log bean.
• 85 Rating Data: data recorded due to the use of a Rating bean.
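When writing report code, the PARMTYPE codes listed above map naturally onto a Java enum. The type name ParmType, its member names, and the fromCode method are hypothetical; only the numeric codes and their meanings come from the list.

```java
// Hypothetical enum for the personalization PARMTYPE codes.
enum ParmType {
    RULE(81, "Rule execution"),
    ACTION(82, "Action bean"),
    CATEGORY(83, "Category bean"),
    CUSTOM(84, "Custom log bean"),
    RATING(85, "Rating bean");

    final int code;
    final String source;

    ParmType(int code, String source) {
        this.code = code;
        this.source = source;
    }

    // Maps a HitParms.PARMTYPE value to its meaning; rejects values outside 81 through 85.
    static ParmType fromCode(int code) {
        for (ParmType type : values()) {
            if (type.code == code) {
                return type;
            }
        }
        throw new IllegalArgumentException("Not a personalization PARMTYPE: " + code);
    }
}
```

Failing loudly on unknown codes keeps a report from silently misclassifying rows written by other software that shares the schema.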

Note: The use of a Page view bean does not result in key/value pair data, so there is no PARMTYPE to identify it. In the category example, the row added to the HitParms table contains a PARMTYPE value of 83, indicating that the key/value pair group represents category data.

Conclusion

An understanding of the tables that WPCP version 4.2 uses to log personalization data makes it possible to better exploit the WPCP logging features. For example, you could extend the WPCP default (out-of-the-box) reports by writing your own reports, each employing one or more custom SQL queries against the data. You can also leverage the features of third-party reporting tools, or use the Feedback schema as part of a data warehouse solution. Figure 5 depicts these possibilities.
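As an example of such a custom report, the following JDBC sketch counts how often each category was logged by the Category bean, walking from HitParms through Key_Value_Combo and Key_Value_Pair to the Key and Value tables. Several identifiers are assumptions because this paper does not name them: the physical table names "KEY" and "VALUE", the NAME column of those tables, the ID column of Key_Value_Pair, and the KVPAIR_ID column that Key_Value_Combo uses for the pair ID. Check them against your Feedback schema before use; only the PARMTYPE value 83, the WcpCategory key, and the columns named in the text come from this paper.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical custom report over the personalization tables.
final class CategoryReport {
    // Assumed physical names; verify against the actual Feedback schema.
    static final String KEY_TABLE = "\"KEY\"";
    static final String VALUE_TABLE = "\"VALUE\"";
    static final String NAME_COLUMN = "NAME";
    static final String COMBO_PAIR_COLUMN = "KVPAIR_ID";

    // Counts logged occurrences of each category value.
    static String categoryCountSql() {
        return "SELECT v." + NAME_COLUMN + " AS CATEGORY, COUNT(*) AS LOGGED"
                + " FROM HITPARMS hp"
                + " JOIN KEY_VALUE_COMBO kvc ON kvc.PARMS_ID = hp.PARMS_ID"
                + " JOIN KEY_VALUE_PAIR kvp ON kvp.ID = kvc." + COMBO_PAIR_COLUMN
                + " JOIN " + KEY_TABLE + " k ON k.ID = kvp.KEY_ID"
                + " JOIN " + VALUE_TABLE + " v ON v.ID = kvp.VALUE_ID"
                + " WHERE hp.PARMTYPE = ? AND k." + NAME_COLUMN + " = ?"
                + " GROUP BY v." + NAME_COLUMN
                + " ORDER BY LOGGED DESC";
    }

    // Runs the report on an open JDBC connection; returns category -> count, largest first.
    static Map<String, Integer> categoryCounts(Connection connection) throws SQLException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (PreparedStatement statement = connection.prepareStatement(categoryCountSql())) {
            statement.setInt(1, 83);               // PARMTYPE 83 = Category bean data
            statement.setString(2, "WcpCategory"); // key WPCP uses for category values
            try (ResultSet results = statement.executeQuery()) {
                while (results.next()) {
                    counts.put(results.getString("CATEGORY"), results.getInt("LOGGED"));
                }
            }
        }
        return counts;
    }
}
```

Joining HIT_FACTS on HitParms.HIT_ID would make the LOCALDATE_ID or GMTDATE_ID columns available, for example to restrict the same report to a date range.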

Figure 5: Exploiting the Feedback data
[Diagram: the Feedback Database, containing the Feedback Schema (personalization tables) and the LikeMinds Schema (LikeMinds-specific recommendation tables), supports two paths: 1) WPCP out-of-the-box reports (JSPs), extended with custom reports built on custom SQL; and 2) external solutions, such as third-party tools, data exploration and business intelligence, the WHM data warehouse and other data warehouses, OLAP, and mining.]

Trademarks

The following are trademarks of International Business Machines Corporation: IBM and WebSphere.

Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of the Open Group in the United States and other countries.

All other marks are the property of their respective owners.

© IBM Corporation 2003. All rights reserved.