spotfire recommendations in action

Examples of Spotfire Recommendations in Action

Easy dashboard setup for business users, dramatically faster creation of full-featured data analysis applications for analysts

The agile business intelligence market is growing rapidly. As Gartner points out, the transition is toward platforms that can be rapidly implemented and used by analysts and business users to find insights quickly, as well as by IT staff to quickly build analytics content that meets business requirements and delivers more timely business benefits.1 This drive for speed is about business value: accuracy and speed of interpretation for decision-making; authoring and development of data discovery applications; and task completion that lets developers implement their ideas quickly and obtain accurate insights.2

This paper describes a recommendation engine for the TIBCO Spotfire® interactive graphical analysis system. Spotfire Recommendations makes data discovery fast and easy for both analysts and business users. The system uses metadata typing and built-in graphics taxonomy to produce a collection of inherently sensible graphics choices applied to the data at hand. The user chooses from the suggestions, and the software builds a dashboard of linked, brushable, configurable graphics with supporting data filters and graphics controls that can be rapidly applied to the canvas.
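The idea of mapping column types to sensible chart choices can be illustrated with a small rule table. The sketch below is not Spotfire's implementation: the column kinds, the rules, and the `recommend` function are hypothetical, chosen only to resemble the kinds of suggestions described in this paper.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table: a sorted tuple of column kinds -> chart types.
# These rules are illustrative assumptions, not Spotfire's actual taxonomy.
RULES: Dict[Tuple[str, ...], List[str]] = {
    ("numeric",): ["histogram", "density plot", "table"],
    ("numeric", "time"): ["line plot", "bar chart", "treemap", "table"],
    ("geo", "numeric"): ["map", "bar chart", "treemap"],
    ("categorical", "numeric"): ["bar chart", "box plot", "cross table"],
}


def recommend(columns: Dict[str, str], selected: List[str]) -> List[str]:
    """Return chart suggestions for the selected columns.

    `columns` maps each column name to its kind (the metadata typing);
    `selected` lists the column names the user picked.
    """
    kinds = tuple(sorted({columns[name] for name in selected}))
    return RULES.get(kinds, ["table"])  # fall back to a plain table


# Made-up column metadata, for illustration only.
columns = {"Homeless": "numeric", "State": "geo", "YearDate": "time"}
print(recommend(columns, ["Homeless", "State"]))
```

Keying the rules on the set of column kinds, rather than on column names, is what lets the same table serve any dataset once its columns are typed.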

1 Rita L. Sallam, Bill Hostmann, Kurt Schlegel, Joao Tapadinhas, Josh Parenteau, Thomas W. Oestreich. Magic Quadrant for Business Intelligence and Analytics Platforms, Gartner. February 23, 2015.

2 Stephen Few. Information Dashboard Design, Analytics Press, CA. 2015.

TIME IS OF THE ESSENCE

“With a dashboard, every unnecessary piece of information results in time wasted trying to filter out what’s important, which is intolerable when time is of the essence.”2

WHITEPAPER | 2

For business users, Recommendations reduces the burden for initial setup of the dashboard, and for analysts, it dramatically speeds the creation of full-featured data analysis applications.

Following are two case studies showing Recommendations applied to datasets for consumer packaged goods manufacturing and for homeless populations in the United States.

CASE STUDY: GEOLOCATION ANALYSIS OF US HOMELESS

The Department of Housing and Urban Development (HUD) collects data on homelessness in the US and releases two annual reports to Congress: the Annual Homelessness Assessment Report (AHAR), Part 1 (footnote 3) and Part 2 (footnote 4). Part 1 contains information from the annual point-in-time counts (PIT) conducted by communities nationwide on a single night in January. Part 2 includes information obtained from homeless shelters throughout the course of a calendar year, the Housing Inventory Count (HIC). In March 2015, HUD released the 2013 AHAR Part 2; Part 1 was released in October 2014.

Raw data is available online at data.hud.gov. We obtained PIT and HIC data for 2007–2013. Estimates of homeless veterans are included beginning in 2011. HUD partners with the Veterans Administration on the Veterans Homelessness Prevention Demonstration Program.

The Housing Inventory Count and point-in-time data are yearly measures across ~400 spatial regions in the US using HUD’s Continuums of Care (CoC) regions. A shape file describing these regions is available at https://www.hudexchange.info/coc/gis-tools/.

Key variables in the beds data (HIC [Housing Inventory Count]) are: Shelter Type (ES [Emergency Shelter], TH [Transitional Housing], RRH [Rapid Re-Housing], SH [Safe Haven], PSH [Permanent Supportive Housing]) and Household Type (with children, without children, with only children). Key variables in the homeless data (PIT) are: Shelter Access (Sheltered, Unsheltered) and Family Situation (Individuals, Persons in Families).

We also included US Census data in the analysis for the period 2010–2013, including counts by state, county, and age group of total population, total male population, total female population, and male and female populations broken down by race.

3 https://www.hudexchange.info/resources/documents/2014-AHAR-Part1.pdf

4 https://www.hudexchange.info/onecpd/assets/File/2013-AHAR-Part-2.pdf

DATA PANEL

The analysis begins by loading the data, which results in data column names being organized in a data panel (Figure 1). To start the analysis, the user clicks the Recommendations icon and selects one or more columns of interest.

Figure 1. Spotfire open with data panel on the left showing homeless data. The user clicks the Recommended visualizations icon in the center of the canvas to start the analysis.

The numerical columns in this data panel are the homeless counts (PIT) and beds (HIC) by year. Other data includes a time variable (YearDate), location (by state and county) and categorical data relating to the Continuum of Care.

We select homeless and state data initially. Spotfire Recommendations suggests some maps, a bar chart, and a treemap of homeless by state (Figure 2). We add a map and a treemap by state to the canvas.

Figure 2. Recommendations panel for US homeless data, state selected.

HOMELESS, BEDS, AND YEAR

We next add beds and year. Recommendations responds by suggesting cross tables, a bar chart trellised by year, and a parallel coordinates plot (Figure 3). We choose the cross table (lower right) to add to the canvas.

Figure 3. Recommendations panel for US homeless data, state, beds, and year selected.

HOMELESS BY STATE

We now have a map of homeless by state, a treemap by state, and a cross table of homeless and beds by state. Recommendations has linked and arranged these three graphs on the canvas. With a few more mouse clicks to configure the graphs, we have an accurate, interactive summary of homeless in the US (Figure 4). Creating this initial dashboard took approximately 30 seconds.

Figure 4. Dashboard showing map of homeless and utilization of shelters by state.

HOMELESS SHELTERS

Using this dashboard as a starting point, we are now able to build a comprehensive analysis of homeless across the US. This enables us to assess whether there are enough shelters for the homeless on an ongoing regional basis.

Figure 5 shows such a dashboard including a map of homeless utilization by state, trends of homeless and available beds, beds by shelter type, top states for bed utilization, and tables of homeless and bed utilization by CoC. The dashboard addresses the question: Do we have enough shelters for the homeless? Relevant KPIs are shown across the top and visualizations are arranged for easy interpretation.

Figure 5. Completed dashboard providing a detailed analysis of homeless in the US during 2007–2013.

The dashboard in Figure 5 is rapidly assembled from that shown in Figure 4. We calculated bed utilization, configured colors on the map and bar chart, and added the trend charts and a slider for years at the top.
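The paper does not spell out how bed utilization is calculated. A minimal sketch, assuming utilization is the homeless count divided by the number of available beds for the same region and year; the function name, region labels, and counts below are hypothetical:

```python
from typing import Optional


def bed_utilization(homeless: int, beds: int) -> Optional[float]:
    """Ratio of homeless persons to available beds (assumed definition).

    Returns None when no beds are reported, to avoid dividing by zero.
    """
    if beds <= 0:
        return None
    return homeless / beds


# Made-up counts for two hypothetical regions, for illustration only.
records = [
    {"region": "A", "year": 2013, "homeless": 1200, "beds": 1000},
    {"region": "B", "year": 2013, "homeless": 450, "beds": 600},
]
for row in records:
    row["utilization"] = bed_utilization(row["homeless"], row["beds"])
    print(row["region"], row["utilization"])
```

Under this assumed definition, a value above 1 means a region has more homeless persons than beds.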

This dashboard is set up for drill-down into regions and times of interest. All the data is now in shape for continued analysis and for combining with additional data. We focus on Massachusetts and incorporate some weather data into the analysis. We fit contours to the temperature data (by zip code) and display precipitation by size of circle. Color of contour lines and circles indicates temperature (red is warmer and blue is colder).

Figure 6 shows this updated analysis for Massachusetts. Note that the pockets of high homeless utilization to the southeast of Boston coincide with milder temperatures and lower precipitation.

Figure 6. Completed dashboard with drill-down to homeless in Massachusetts. Weather data has been incorporated: contours are fit to the temperature data, and precipitation is shown in circles (larger circles indicate more precipitation). Color of contour lines and circles indicates temperature (red is warmer and blue is colder). Note that the pockets of high homeless utilization to the southeast of Boston coincide with milder temperatures and lower precipitation.

CASE STUDY: PAPER TOWEL MANUFACTURING

Paper towel manufacturing involves equipment including dryers, dyes, feeders, cutting and pattern machines, and a series of process steps. Product quality is assessed via measurements of quality characteristics like absorbency, strength, and softness. Large quantities of data are collected on machines and on process times for each batch at every process step.

The data under consideration in this simple example includes measurements of product quality and equipment operation and performance. One goal of the analysis is to assess effects of equipment on product quality.

DATA PANEL AND COLUMN NAMES

The analysis begins by loading the data, which results in data column names being organized in a data panel. To start the analysis, the user clicks the Recommendations icon and selects one or more columns of interest.

The numerical columns in this dataset are the measured product quality characteristics. Selecting these columns produces histograms, density plots, and tables in the Recommendations panel. Basic versions of the actual visualizations are displayed (not canned representations of generic chart types), so the user can see the shape of the distribution directly in the panel (Figure 7).

Figure 7. Recommendations panel with numeric columns selected. The absorbency histogram is shown in the lower right.

HISTOGRAMS

Paper towel softness appears to be normally distributed (upper right). Selecting each of the other product quality characteristics indicates that most of them are normally distributed as well. However, when absorbency is selected, the histogram shows a bimodal distribution (lower right). This is an interesting finding that needs an explanation. The absorbency histogram is selected and added to the analysis by clicking on it in the Recommendations display (Figure 7).

PROCESS DATES

To investigate whether the changes in absorbency correlate with any temporal effects, a column with process dates is now selected. In this case, a line plot, bar chart, treemap, table, and other graphics (not shown) are then displayed in the Recommendations panel (Figure 8). The line plot shows a dramatic change in absorbency over time (Figure 8, top left), so it is selected and added to the analysis.

Figure 8. Recommendations panel with absorbency numeric column and a time column selected in the Data Panel. Why does absorbency increase in the later part of July? The line plot is selected (clicked) to add it to the analysis.

Additional line plots are available from Recommendations when the numeric variables are chosen along with a time variable (Figure 9). Absorbency (lower left) stands out as being more affected by time (day of the month).

Figure 9. Recommendations panel with numeric columns and time (day of the month) selected. Absorbency is the red trace in the lower left.

MACHINE USE IN TWO PROCESS STEPS

To investigate whether the machines may be affecting absorbency at one or more process steps, additional categorical columns, each containing the machine used at a process step, are selected. The most relevant recommended graph is an absorbency line plot, colored by machines at one step and trellised by machines at the second step (Figure 10, top right). This is chosen and added to the analysis.

Figure 10. Recommendations panel with absorbency numeric column, a time column, and two categorical process machine columns selected.

DAY OF MONTH VS. LOW ABSORBENCY

The Recommendations panel is then closed to view the useful visualizations that have been added. Marking is set to correlate colors for corresponding points in the line plot and histogram (Figure 11). It is clear that low absorbency in the histogram corresponds to batches processed before July 17th, and high absorbency corresponds to batches processed on or after July 17th.
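The before-and-after comparison that the linked marking makes visible can also be checked numerically. The following is a rough sketch, not part of the Spotfire workflow: the `split_means` helper is hypothetical, the absorbency values are made up, and the year in the dates is an arbitrary placeholder because the paper gives only the day and month.

```python
from datetime import date
from statistics import mean
from typing import List, Tuple


def split_means(batches: List[Tuple[date, float]], cutoff: date) -> Tuple[float, float]:
    """Mean absorbency before the cutoff date and on or after it."""
    before = [a for d, a in batches if d < cutoff]
    after = [a for d, a in batches if d >= cutoff]
    return mean(before), mean(after)


# Made-up (process date, absorbency) pairs; the year is a placeholder.
batches = [
    (date(2014, 7, 10), 20.0),
    (date(2014, 7, 15), 22.0),
    (date(2014, 7, 17), 30.0),
    (date(2014, 7, 20), 32.0),
]
low, high = split_means(batches, date(2014, 7, 17))
print(low, high)
```

A clear gap between the two means is consistent with the two modes seen in the histogram, although it does not by itself identify the cause.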

Figure 11. Analysis with line plot and histogram added from the Recommendations panel.

ANALYSIS OF VARIANCE

To understand the paper towel absorbency variability, we use the Spotfire analysis of variance (ANOVA) function. This enables an assessment of the effects of the machine process steps on the measured product quality characteristics. In the ANOVA setup dialog (Figure 12), absorbency is selected as a response variable (Y), and the process step equipment (tool) columns are selected as explanatory variables (X).

Figure 12. ANOVA analysis setup dialog.

ANOVA results (Figure 13) are presented as a sorted summary table with one row per response-predictor pair (product quality characteristic and process step equipment). Each row contains a p-value indicating statistical significance. The most significant relationship is the effect of Dryer Tool on absorbency. Marking this row produces a drill-down box plot showing that Dryer Tool DY1 produces paper towel batches with significantly higher absorbency than Dryer Tools DY2 and DY3.
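For readers who want to see what lies behind such a table, the sketch below computes a one-way ANOVA F statistic for each predictor and ranks the predictors by it. It is an illustration under stated assumptions rather than the Spotfire ANOVA function: the absorbency values are made up, "Feeder Tool" and its machine labels are hypothetical, and the ranking uses the F statistic instead of a p-value. The two orderings agree here only because both predictors have the same degrees of freedom; in general a p-value would be taken from the F distribution, for example with `scipy.stats.f.sf`.

```python
from statistics import mean
from typing import Dict, List


def one_way_f(groups: Dict[str, List[float]]) -> float:
    """F statistic for a one-way ANOVA across the given groups."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up absorbency values grouped by machine, for illustration only.
# "Feeder Tool" and the FD labels are hypothetical names.
predictors = {
    "Dryer Tool": {
        "DY1": [30.0, 31.0, 32.0],
        "DY2": [20.0, 21.0, 22.0],
        "DY3": [21.0, 22.0, 23.0],
    },
    "Feeder Tool": {
        "FD1": [30.0, 20.0, 21.0],
        "FD2": [31.0, 21.0, 22.0],
        "FD3": [32.0, 22.0, 23.0],
    },
}
# Rank predictors by F statistic, largest first.
ranked = sorted(predictors, key=lambda name: one_way_f(predictors[name]), reverse=True)
for name in ranked:
    print(name, round(one_way_f(predictors[name]), 2))
```

In this toy data the dryer groups are well separated while the feeder groups overlap, so the dryer predictor ranks first.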

Figure 13. ANOVA results.

DRYER TOOLS

Since the ANOVA results indicate that Dryer Tool is responsible for the variation in absorbency, the linked line plot and histogram produced with Recommendations are then configured to show this clearly (Figure 14). The earlier dashboard graphics are colored to distinguish between the Dryer Tools, and the line plot X-axis is changed to the Dryer process date. This shows that only Dryer Tools DY2 and DY3 were in use prior to July 13th and only DY1 was used on or after July 13th. It is now clear that the increase in paper towel absorbency is related to the switch to Dryer DY1. Additional investigation is warranted to determine how DY2 and DY3 could be modified to match the superior absorbency results obtained with DY1.

Figure 14. Reconfigured discovery analysis for presentation; line plots and histograms colored according to Dryer Tool.

TIBCO Software Inc. is a global leader in infrastructure and business intelligence software. Whether it’s optimizing inventory, cross-selling products, or averting crisis before it happens, TIBCO uniquely delivers the Two-Second Advantage®: the ability to capture the right information at the right time and act on it preemptively for a competitive advantage. With a broad mix of innovative products and services, customers around the world trust TIBCO as their strategic technology partner. Learn more about TIBCO at www.tibco.com.

©2015, TIBCO Software Inc. All rights reserved. TIBCO, the TIBCO logo, TIBCO Software, and Spotfire are trademarks or registered trademarks of TIBCO Software Inc. or its subsidiaries in the United States and/or other countries. All other product and company names and marks in this document are the property of their respective owners and mentioned for identification purposes only.

05/27/15

Global Headquarters
3307 Hillview Avenue
Palo Alto, CA 94304
+1 650-846-1000 TEL
+1 800-420-8450
+1 650-846-1005 FAX
www.tibco.com

CONCLUSIONS

For business users, Recommendations reduces the burden for initial setup of the dashboard, and for analysts, the engine dramatically speeds the creation of full-featured data analysis applications. Recommendations enables rapid insights, but does not eliminate the need for the analyst and business user to know the data structure and understand the principles of sound graphical data and information display.

This whitepaper shows two examples: a KPI dashboard and analysis for the US homeless population and an analysis of a paper towel manufacturing process. In both cases, Recommendations enables rapid creation of initial dashboards, while also setting up the Spotfire workbook for continued advanced analysis to assess underlying correlations and root cause. The homeless analysis identifies homeless hotspots and relates these to weather patterns. The manufacturing analysis identifies issues with absorbency and relates these to introduction of a new dryer tool.

The speed of analysis and presentation is a common theme in these examples. The Spotfire environment enables users to move through data quickly, identify hotspots, and get to the underlying issues. Insights are generated rapidly and thoroughly from all the available data, and are then available for action on the TIBCO Fast Data Platform.

ACKNOWLEDGEMENTS

This whitepaper features the remarkable engineering work of the Spotfire Recommendations Engineering team: Jörgen Gustavsson, Gustav Karlberg, Erik Brandin, and Anders Fougstedt.

The manufacturing case study was prepared by Mike Alperin and Michael O’Connell from the Spotfire Analytics team. The homeless case study was based on analysis contributions from Andrew Berridge, Brad Hopper, Ujval Kamath, Jagrata Minardi, Eric Novik, Anna Nowakowska, Michael O’Connell and Peter Shaw from the Spotfire Analytics team.
