Driver Analysis and Product Optimization with Bayesian Networks

Tutorial on Driver Analysis and Product Optimization with BayesiaLab

Stefan Conrady, [email protected]

Dr. Lionel Jouffe, [email protected]

December 11, 2010

Revised: March 13, 2013





Table of Contents

Introduction
  BayesiaLab
  Acknowledgements
  Abstract
  Bayesian Networks
  Structural Equation Models
  Probabilistic Structural Equation Models

Tutorial
  Notation
  Model Development
  Dataset
  Consumer Research
  Data Import
  Unsupervised Learning
  Preliminary Analysis
  Variable Clustering
  Multiple Clustering
  Analysis of Factors
  Completing the PSEM
  Market Driver Analysis
  Product Driver Analysis
  Product Optimization
  Conclusion

Appendix: The Bayesian Network Paradigm
  Acyclic Graphs & Bayes’s Rule
  Compact Representation of the Joint Probability Distribution

References

Contact Information
  Bayesia USA
  Bayesia S.A.S.
  Bayesia Singapore Pte. Ltd.
  Copyright

www.bayesia.com | www.bayesia.us

Introduction

This tutorial is intended for new or prospective users of BayesiaLab. The example is taken from the field of marketing science and illustrates the capabilities of BayesiaLab with a real-world case study and actual consumer data. Beyond market researchers, we hope that analysts and researchers in many fields will find the proposed methodology valuable and intuitive. For this reason, many of the technical steps, such as data preparation and network learning, are outlined in great detail, as they apply to research with BayesiaLab in general, regardless of the domain.¹

BayesiaLab

Bayesia S.A.S., based in Laval, France, has been developing BayesiaLab since 1999, and it has emerged as the leading software package for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of market research, is highlighted by Bayesia’s strategic partnership with Procter & Gamble, which has deployed BayesiaLab globally since 2007.

Acknowledgements

We would like to express our gratitude to Ares Research (www.ares-etudes.com) for generously providing data from their consumer research for our case study.

Abstract

Market driver analysis and product optimization are among the central tasks in Product Marketing and are thus relevant to virtually all types of businesses. BayesiaLab provides a unified software platform, which can, based on consumer data,

1. provide deep understanding of the market preference structure

2. directly generate recommendations for prioritized product actions.

The proposed approach utilizes Probabilistic Structural Equation Models (PSEM), based on machine-learned Bayesian networks. PSEMs provide an efficient alternative to Structural Equation Models (SEM), which have been used traditionally in market research.



1 This tutorial is based on version 5.0 of BayesiaLab.






Bayesian Networks

A Bayesian network, or belief network, is a probabilistic graphical model that represents the joint probability distribution over a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.²
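To make the disease/symptom example concrete, the following minimal Python sketch applies Bayes's rule to a two-node network (Disease → Symptom). All probabilities are hypothetical values chosen for illustration only; they do not come from this study or from BayesiaLab.

```python
def posterior_disease(prior, p_symptom_given_disease, p_symptom_given_healthy):
    """Return P(disease | symptom observed) via Bayes's rule."""
    joint_disease = prior * p_symptom_given_disease          # P(disease, symptom)
    joint_healthy = (1.0 - prior) * p_symptom_given_healthy  # P(no disease, symptom)
    return joint_disease / (joint_disease + joint_healthy)


# Hypothetical numbers: 1% prevalence, 90% of patients show the symptom,
# 5% of healthy people show it as well.
p = posterior_disease(prior=0.01,
                      p_symptom_given_disease=0.90,
                      p_symptom_given_healthy=0.05)
print(f"P(disease | symptom) = {p:.3f}")
```

Even with a fairly reliable symptom, the posterior stays well below 50% here because the assumed prior is so low. This is the kind of inference a Bayesian network automates across many variables.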

Structural Equation Models

Structural Equation Modeling (SEM) is a statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions. This definition of SEM was articulated by the geneticist Sewall Wright (1921), the economist Trygve Haavelmo (1943) and the cognitive scientist Herbert Simon (1953), and formally defined by Judea Pearl (2000).

Structural Equation Models (SEM) allow both confirmatory and exploratory modeling, meaning they are suited to both theory testing and theory development.

Probabilistic Structural Equation Models

Traditionally, specifying and estimating an SEM required a multitude of manual steps, which are typically very time-consuming, often requiring weeks or even months of an analyst’s time. PSEMs are based on the idea of leveraging machine learning to generate a structural model automatically. As a result, creating PSEMs with BayesiaLab is extremely fast and can thus form an immediate basis for much deeper analysis and optimization.



2 See appendix for a brief introduction to Bayesian networks.





Tutorial

At the beginning of this tutorial, we want to emphasize the overarching objectives of this case study, so we do not lose sight of the “big picture” as we immerse ourselves in the technicalities of BayesiaLab and Bayesian networks.

In this study, we want to examine how product attributes perceived by consumers relate to purchase intention for specific products. Put simply, we want to understand the key drivers for purchase intent. Given the large number of attributes in our study, we also want to identify common concepts among these attributes in order to make interpretation easier and communication with managerial decision makers more effective.

Secondly, we want to utilize the generated understanding of consumer dynamics so product developers can optimize the characteristics of the products under study in order to increase purchase intent among consumers, which is our ultimate business objective.

Notation

In order to clearly distinguish between natural language, BayesiaLab-specific functions and study-specific variable names, the following notation is used:

• BayesiaLab functions, keywords, commands, etc., are shown in bold type.

• Variable names are capitalized and italicized.

Model Development

Dataset

Consumer Research

This study is based on a monadic³ consumer survey about perfumes, which was conducted in France. In this example, we use survey responses from 1,320 women, who evaluated a total of 11 fragrances on a wide range of attributes:

• 27 ratings on fragrance-related attributes, such as “sweet”, “flowery”, “feminine”, etc., measured on a 1-to-10 scale.

• 12 ratings on projected imagery related to someone who would be wearing the respective fragrance, e.g. “is sexy”, “is modern”, measured on a 1-to-10 scale.

• 1 variable for Intensity, a measure reflecting the level of intensity, measured on a 1-to-5 scale.⁴

• 1 variable for Purchase Intent, measured on a 1-to-6 scale.

• 1 nominal variable, Product, for product identification purposes.

3 A product test involving only one product, i.e. in our study each respondent evaluated only one perfume.

4 The variable Intensity is listed separately due to the a-priori knowledge of its non-linearity and the existence of a “just-about-right” level.

Data Import

To start the analysis with BayesiaLab, we first import the data set, which is formatted as a CSV file.⁵ With Data | Open Data Source | Text File, we start the Data Import wizard, which immediately provides a preview of the data file.

The table displayed in the Data Import wizard shows the individual variables as columns and the responses as rows. There are a number of options available, e.g. for sampling. However, this is not necessary in our example given the relatively small size of the database.

Clicking the Next button prompts a data type analysis, which provides BayesiaLab’s best guess regarding the data type of each variable.

Furthermore, the Information box provides a brief summary regarding the number of records, the number of missing values,⁶ filtered states, etc.

For this example, we will need to override the default data type for the Product variable, as each value is a nominal product identifier rather than a numerical scale value. We can change the data type by highlighting the Product variable and clicking the Discrete check box, which changes the color of the Product column to red.



5 CSV stands for “comma-separated values”, a common format for text-based data files.

6 There are no missing values in our database and filtered states are not applicable in this survey.





We will also define Purchase Intent and Intensity as discrete variables, as the default number of states of these variables is already adequate for our purposes.⁷

The next screen provides options as to how to treat any missing values. In our case, there are no missing values, so the corresponding panel is grayed out.

Clicking the small upside-down triangle next to the variable names brings up a window with key statistics of the selected variable, in this case Fresh.

The next step is the Discretization and Aggregation dialogue, which allows the analyst to determine the type of discretization, which must be performed on all continuous variables.⁸ For this survey, and given the number of observations, it is appropriate to reduce the number of states from the original 10 states (1 through 10) to a smaller number. One could, for instance, bin the 1-10 rating into low, mid and high, or apply any other arbitrary method deemed appropriate by the analyst.



7 The desired number of variable states is largely a function of the analyst’s judgment.

8 BayesiaLab requires discrete distributions for all variables.





The screenshot shows the dialogue for the Manual selection of discretization steps, which permits selecting binning thresholds by point-and-click.

For this particular example, we select Equal Distances with 5 intervals for all continuous variables. This was the analyst’s choice in order to be consistent with prior research.
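As an illustration of what equal-distance binning does to a 1-to-10 rating, the following Python sketch computes the four thresholds that split the range into 5 intervals of equal width. The function names are hypothetical and this is not BayesiaLab's implementation; it only reproduces the arithmetic. The outermost thresholds, 2.8 and 8.2, correspond to the state labels “<=2.8” and “>8.2” that appear in the Monitors in this tutorial.

```python
def equal_distance_thresholds(lo, hi, n_bins):
    """Return the n_bins - 1 inner thresholds of equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    # Rounding only removes floating-point noise such as 6.4000000000000004.
    return [round(lo + k * width, 10) for k in range(1, n_bins)]


def assign_bin(value, thresholds):
    """Return the 0-based index of the bin that contains value."""
    for index, threshold in enumerate(thresholds):
        if value <= threshold:
            return index
    return len(thresholds)


thresholds = equal_distance_thresholds(1, 10, 5)
print(thresholds)
print([assign_bin(r, thresholds) for r in range(1, 11)])
```

With integer ratings from 1 to 10, each of the five bins ends up containing exactly two of the original rating values.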

Clicking Select All Continuous followed by Finish completes the import process and the 49 variables (columns) from our database are now shown as blue nodes in the Graph Panel, which is the main window for network editing.

Note

For choosing discretization algorithms beyond this example, the following rule of thumb may be helpful:

• For supervised learning, choose Decision Tree.

• For unsupervised learning, choose, in the order of priority, K-Means, Equal Distances or Equal Frequencies.
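For comparison with equal distances, the sketch below shows how equal-frequency thresholds could be derived with Python's standard library. The ratings are hypothetical, and `statistics.quantiles` merely stands in for whatever algorithm BayesiaLab applies internally, which the tutorial does not describe.

```python
from statistics import quantiles

# Hypothetical ratings on a 1-to-10 scale, skewed toward the upper end.
ratings = [1, 2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10]

# Four cut points that divide the sample into 5 bins of roughly equal count.
cut_points = quantiles(ratings, n=5)
print(cut_points)
```

Because the thresholds follow the data, skewed ratings produce narrow bins where responses are dense and wide bins where they are sparse, in contrast to the fixed-width bins of Equal Distances.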







This initial view represents a fully unconnected Bayesian network.

For reasons that will become clear later, we will initially exclude two variables, Product and Purchase Intent. We can do so by right-clicking the nodes and selecting Properties | Exclusion. Alternatively, holding “x” while double-clicking the nodes performs the same exclusion function.







Unsupervised Learning

As the next step, we will perform the first unsupervised learning of a network by selecting Learning | Association Discovering | EQ.

The resulting view shows the learned network with all the nodes in their original position.







Needless to say, this view of the network is not very intuitive. BayesiaLab has numerous built-in layout algorithms, of which the Force Directed Layout is perhaps the most commonly used.







It can be invoked by View | Automatic Layout | Force Directed Layout or alternatively through the keyboard shortcut “p”. This shortcut is worth remembering, as it is one of the most commonly used functions.







The resulting network will look similar to the following screenshot.

To optimize the use of the available screen, clicking the Best Fit button in the toolbar “zooms to fit” the graph to the screen. In addition, rotating the graph with the Rotate Left and Rotate Right buttons helps to create a suitable view.

The final graph should closely resemble the following screenshot and, in this view, the properties of this first learned Bayesian network become immediately apparent. This network is now a compact representation of the 47 dimensions of the joint probability distribution of the underlying database.







It is very important to note that, although this learned graph happens to have a tree structure, this is not the result of an imposed constraint.

Preliminary Analysis

The analyst can further examine this graph by switching into the Validation Mode, which immediately opens up the Monitor Panel on the right side of the screen.







This panel is initially empty, but by clicking on any node (or multiple nodes) in the network, Monitors appear inside the Monitor Panel. The corresponding nodes are highlighted in yellow.







By default, the Monitors show the marginal distributions of all selected variables. This shows, for instance, that 9.7% of respondents rated their perfume at <=2.8 in terms of the Fresh attribute.

On this basis, one can start to experiment with the properties of this particular Bayesian network and query it. With BayesiaLab this can be done in an extremely intuitive way, i.e. by setting evidence (or observations) directly on the Monitors. For instance, we can compute the conditional probability distribution of Fresh, given that we have observed a specific value, i.e. a specific state, of Flowery. In formal notation, this would be

P(Fresh | Flowery)

We will now set Flowery to the state that represents the highest rating (>8.2), and we can immediately observe the conditional probability distribution of Fresh, i.e.

P(Fresh | Flowery = “>8.2”)







The gray arrows inside the bars indicate how the distributions have changed compared to the previous distributions. This means that respondents who have rated the Flowery attribute of a perfume at the top level have a 67% probability of also assigning a top rating to the Fresh attribute.

P(Fresh = “>8.2” | Flowery = “>8.2”) = 66.9%
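Setting evidence on a Monitor amounts to conditioning on an observed state. The Python sketch below computes such a conditional distribution directly from records. The records are a small hypothetical sample with only two states per variable, not the survey data, so the resulting number is illustrative and not the 66.9% reported above.

```python
from collections import Counter

# Hypothetical (Flowery state, Fresh state) pairs, one per respondent.
records = (
    [(">8.2", ">8.2")] * 6
    + [(">8.2", "<=8.2")] * 3
    + [("<=8.2", ">8.2")] * 2
    + [("<=8.2", "<=8.2")] * 9
)


def conditional_distribution(records, flowery_state):
    """Return P(Fresh | Flowery = flowery_state) estimated from the records."""
    counts = Counter(fresh for flowery, fresh in records if flowery == flowery_state)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


dist = conditional_distribution(records, ">8.2")
print(dist)
```

In a network with more than two nodes, BayesiaLab performs the equivalent computation through inference over the learned structure rather than by filtering records.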

Switching briefly back into the Modeling Mode and clicking on the Flowery node, one can see the probabilistic relationship between Flowery and Fresh in detail. By learning the network, BayesiaLab has automatically created a contingency table for every single direct relationship between nodes.

All contingency tables, together with the graph structure, thus encode the joint probability distribution of our original database.

Returning to the Validation Mode, we can further examine the properties of our network. Of great interest is the strength of the probabilistic relationships between the variables. In BayesiaLab this can be shown by selecting Analysis | Graphic | Arcs’ Mutual Information.

Note

The structure of our Bayesian network may be directed, but the directions of the arcs do not necessarily have to be meaningful.

For observational inference, it is only necessary that the Bayesian network correctly represents the joint probability distribution of the underlying database.
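The point of this note can be checked numerically: for two variables, the factorization A → B, i.e. P(A)·P(B | A), and the factorization B → A, i.e. P(B)·P(A | B), encode exactly the same joint distribution. The Python sketch below uses a hypothetical joint distribution and hypothetical helper names.

```python
# Hypothetical joint distribution over two binary variables (A, B).
joint = {("a0", "b0"): 0.3, ("a0", "b1"): 0.1,
         ("a1", "b0"): 0.2, ("a1", "b1"): 0.4}


def factorize(joint, parent_index):
    """Split the joint into P(parent) and P(child | parent)."""
    marginal = {}
    for states, p in joint.items():
        parent_state = states[parent_index]
        marginal[parent_state] = marginal.get(parent_state, 0.0) + p
    conditional = {states: p / marginal[states[parent_index]]
                   for states, p in joint.items()}
    return marginal, conditional


def rebuild(marginal, conditional, parent_index):
    """Recombine P(parent) and P(child | parent) into the joint."""
    return {states: marginal[states[parent_index]] * c
            for states, c in conditional.items()}


joint_a_to_b = rebuild(*factorize(joint, 0), 0)  # arc A -> B
joint_b_to_a = rebuild(*factorize(joint, 1), 1)  # arc B -> A
print(joint_a_to_b)
print(joint_b_to_a)
```

Both arc directions reproduce the original joint, which is why arc direction alone carries no causal meaning for observational inference.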







The thickness of the arcs is now proportional to the Mutual Information, i.e. the strength of the relationship between the nodes.

Intuitively, Mutual Information measures the information that X and Y share: it measures how much knowing one of these variables reduces our uncertainty about the other. For example, if X and Y are independent, then knowing X does not provide any information about Y and vice versa, so their mutual information is zero. At the other extreme, if X and Y are identical then all information conveyed by X is shared with Y: knowing X determines the value of Y and vice versa.

Formal Definition of Mutual Information

$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)$$
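The definition translates directly into a few lines of Python. This sketch assumes base-2 logarithms, so the result is in bits; the formula above leaves the base open. It checks the two extreme cases described in the text, independence and identity, using hypothetical joint distributions.

```python
import math


def mutual_information(joint):
    """Mutual Information in bits for a joint distribution {(x, y): p(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal p(x)
        py[y] = py.get(y, 0.0) + p   # marginal p(y)
    # Terms with p(x, y) = 0 contribute nothing and are skipped.
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


# X and Y independent: knowing X says nothing about Y.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
# X and Y identical: knowing X determines Y.
identical = {(0, 0): 0.5, (1, 1): 0.5}

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
print(mi_independent, mi_identical)
```

For the identical case, the Mutual Information equals the full entropy of one fair binary variable.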

We can also show the values of the Mutual Information on the graph by clicking on Display Arc Comments.







In the top part of the comment box attached to each arc, the Mutual Information of the arc is shown. Below, expressed as a percentage and highlighted in blue, we see the relative Mutual Information in the direction of the arc (parent node ➔ child node). And, at the bottom, we have the relative Mutual Information in the opposite direction of the arc (child node ➔ parent node).

Variable Clustering

The information about the strength of the relationships between the manifest variables can also be utilized for purposes of Variable Clustering. More specifically, a concept closely related to the Mutual Information, namely the Kullback-Leibler Divergence (K-L Divergence), is utilized for clustering.

For probability distributions P and Q of a discrete random variable, their K-L divergence is defined to be

$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}$$

In words, it is the average of the logarithmic difference between the joint probability distributions P(i) and Q(i), where the average is taken using the probabilities P(i).
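The following Python sketch implements the K-L divergence as defined above, again assuming base-2 logarithms. It also illustrates one well-known way in which the two measures are related: the Mutual Information of X and Y equals the K-L divergence between their joint distribution and the product of their marginals. The distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """K-L divergence D_KL(P || Q) in bits; p and q map outcomes to probabilities."""
    return sum(p_i * math.log2(p_i / q[i]) for i, p_i in p.items() if p_i > 0)


# Hypothetical joint distribution of two binary variables X and Y.
joint = {("x0", "y0"): 0.4, ("x0", "y1"): 0.1,
         ("x1", "y0"): 0.1, ("x1", "y1"): 0.4}

# Both marginals are uniform here, so the product of marginals is 0.25 everywhere.
product_of_marginals = {outcome: 0.25 for outcome in joint}

mi_as_kl = kl_divergence(joint, product_of_marginals)
print(f"{mi_as_kl:.4f} bits")
```

The divergence is zero only when the two distributions coincide, which for this construction means X and Y are independent.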







Such variable clusters will allow us to induce new latent variables, each of which represents a common concept among the manifest variables.⁹ From here on, we will make a very clear distinction between manifest variables, which are directly observed, such as the survey responses, and latent variables, which are derived. In traditional statistics, deriving such latent variables or factors is typically performed by means of Factor Analysis, e.g. Principal Components Analysis (PCA).

In BayesiaLab, this “factor extraction” can be done very easily via the Analysis | Graphics | Variable Clustering function, which is also accessible through the keyboard shortcut “s”.

The speed with which this is performed is one of the strengths of BayesiaLab, as the resulting variable clusters are presented instantly.



9 An alternative approach is to interpret the derived concept or factor as a hidden common cause.





In this case, BayesiaLab has identified 15 variable clusters and each node is color-coded according to its cluster membership. To interpret these newly-found clusters, we can zoom in and visually examine the structure in the Graph Panel.

To support the interpretation process, BayesiaLab can also display a Dendrogram, which allows the analyst to review the linkage of nodes into variable clusters.

The analyst may also choose a different number of clusters, based on his own judgement relating to the domain. A slider in the toolbar allows choosing various numbers of clusters, and the color association of the nodes will be updated instantly.
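To give a feel for how nodes are linked into clusters and how the number of clusters acts as a cut-off, the sketch below runs a simple agglomerative clustering in Python. It is not BayesiaLab's algorithm: it substitutes plain average linkage on a hypothetical pairwise similarity table for the K-L-based measure, and the similarity values are invented for illustration.

```python
from itertools import combinations


def average_similarity(a, b, similarity):
    """Average pairwise similarity between two clusters of variables."""
    pairs = [frozenset((x, y)) for x in a for y in b]
    return sum(similarity[pair] for pair in pairs) / len(pairs)


def cluster_variables(variables, similarity, n_clusters):
    """Merge the two most similar clusters until n_clusters remain."""
    clusters = [{v} for v in variables]
    while len(clusters) > n_clusters:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_similarity(clusters[ij[0]], clusters[ij[1]], similarity),
        )
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps index i valid
        del clusters[j]
    return clusters


variables = ["Fresh", "Flowery", "Sweet", "Sexy", "Modern"]
# Hypothetical similarities (e.g. some measure of arc strength).
scores = {("Fresh", "Flowery"): 0.50, ("Fresh", "Sweet"): 0.15,
          ("Flowery", "Sweet"): 0.20, ("Sexy", "Modern"): 0.40,
          ("Fresh", "Sexy"): 0.05, ("Fresh", "Modern"): 0.05,
          ("Flowery", "Sexy"): 0.05, ("Flowery", "Modern"): 0.05,
          ("Sweet", "Sexy"): 0.05, ("Sweet", "Modern"): 0.05}
similarity = {frozenset(pair): s for pair, s in scores.items()}

clusters = cluster_variables(variables, similarity, n_clusters=2)
print(clusters)
```

Calling the function with a different `n_clusters` corresponds to moving the slider: the merge order is the same, only the stopping point changes.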

By clicking the Validate Clustering button in the toolbar, the clusters are saved and the color codes will be formally associated with the nodes. A clustering report provides us with a formal summary of the new factors and their associated manifest variables.¹⁰



10 Variable cluster = derived concept = unobserved latent variable = hidden cause = extracted factor.





The analyst also has the option to use his domain knowledge to modify which manifest variables belong to specific factors. This can be done by right-clicking on the Graph Panel and selecting Class Editor.

Multiple Clustering

As our next step towards building the PSEM, we will introduce these newly-generated latent factors into our existing network and also estimate their probabilistic relationships with the manifest variables. This means we will create a new node for each latent factor, creating 15 new dimensions in our network. For this step, we will need to return to the Modeling Mode, because the introduction of the factor nodes into the network requires the learning algorithms.







More specifically, we select Learning | Multiple Clustering, which brings up the Multiple Clustering dialogue. There is a range of settings, but we will focus only on a subset. Firstly, we need to specify an output directory for the to-be-learned networks. Secondly, we need to set some parameters for the clustering process, such as the minimum and maximum number of states, which can be created during the learning process.

In our example, we select Automatic Selection of the Number of Classes, which will allow the learning algorithm to find the optimum number of factor states up to a maximum of five states. This means that each new factor will need to represent the corresponding manifest variables with up to five states.







The Multiple Clustering process concludes with a report, which shows details regarding the generated clustering. The top portion of the report is shown in the following screenshot.

The detail section of Factor_0, as it relates to the manifest variables, is worth highlighting. Here, we can see the strength of the relationship between the manifest variables, such as Trust, Bold, etc., and Factor_0. In a traditional Factor Analysis, this would be the equivalent of factor loading.

After closing the report, we will now see a new (unconnected) network, with 15 additional nodes, one for each factor, i.e. Factor_0 through Factor_14, highlighted in yellow in the screenshot.







Analysis of Factors

We can also further examine how the new factors relate to the manifest variables and how well they represent them. In the case of Factor_0, we want to understand how it can summarize our five manifest variables.

By going into our previously-specified output directory, using the Windows Explorer or the Mac Finder, we can see that 15 new networks (in BayesiaLab’s xbl format for networks) were generated. We open the specific network for Factor_0, either by directly double-clicking the xbl file or by selecting Network | Open. The factor-specific networks are identified by a suffix/extension of the format “_[Factor_#].xbl”, where “#” stands for the factor number. We then see a network that includes the manifest variables, with arcs going from the factor to each of the manifest variables.

Returning to the Validation Mode, we can see five states for Factor_0, labeled C1 through C5, as well as their marginal distribution. As Factor_0 is a target node by default, it automatically appears highlighted in red in the Monitor Panel.







Here, we can also study how the states of the manifest variables relate to the states of Factor_0. This can be done easily by setting observations to the monitors, e.g. setting C1 to 100%.

We now see that, given that Factor_0 is in state C1, the variable Active has a probability of approximately 75% of being in state <=2.8. Expressed more formally, we would state P(Active = “<=2.8” | Factor_0 = C1) = 74.57%. This means that respondents who have been assigned to C1 are likely to rate the Active attribute very low as well.

In the Monitor for Factor_0, in parentheses behind the cluster name, we find the expected mean value of the numeric equivalents of the states of the manifest variables, e.g. “C1 (2.08)”. That means that given the state C1 of Factor_0, we expect the mean value of Trust, Bold, Fulfilled, Active and Character to be 2.08.
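One plausible reading of this expected mean value can be sketched as follows: for each manifest variable, take the expected numeric value under its conditional distribution given the factor state, then average across the variables. The Python code below does this for two variables with hypothetical conditional distributions. The numeric equivalents of the states (1.9, 3.7, ...) are assumed to be bin midpoints, which the tutorial does not specify, so the output will not match the 2.08 shown above.

```python
def expected_value(distribution):
    """Expected numeric value of one variable: sum of value * probability."""
    return sum(value * prob for value, prob in distribution.items())


def factor_state_mean(conditionals):
    """Average the expected values of all manifest variables for one factor state."""
    return sum(expected_value(d) for d in conditionals.values()) / len(conditionals)


# Hypothetical P(variable | Factor_0 = C1), keyed by the numeric equivalent of each state.
given_c1 = {
    "Trust":  {1.9: 0.80, 3.7: 0.20},
    "Active": {1.9: 0.75, 3.7: 0.25},
}

c1_mean = factor_state_mean(given_c1)
print(round(c1_mean, 3))
```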







To go into even greater detail, we can actually look at every single respondent, i.e. every record in the database, and see what cluster they were assigned to. We select Inference | Interactive Inference, which will bring up a record selector in the toolbar.

With this record selector, we can now scroll through the entire database, review the actual ratings of the respondents and then see the estimate of the cluster to which each respondent belongs.







In our first case, record 0, we see the ratings of this respondent indicated by the manifest Monitors. In the highlighted Monitor for Factor_0, we read that this respondent, given her responses, has an 82% probability of belonging to Cluster 5 (C5) in Factor_0.







Moving to our second case, record 1, we see that the respondent belongs to Cluster 3 (C3) with a 96% probability.
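Because the arcs in the factor network run from the factor to each manifest variable, the manifest variables are conditionally independent given the factor state. Under that structure, the probability of cluster membership for one respondent follows from Bayes's rule, as the Python sketch below illustrates. It uses two clusters, two variables and two states per variable; all probabilities are hypothetical and unrelated to the 82% and 96% reported above.

```python
def cluster_posterior(prior, likelihoods, observation):
    """Return P(cluster | observed ratings) for a factor -> manifest structure."""
    unnormalized = {}
    for cluster, p_cluster in prior.items():
        p = p_cluster
        for variable, state in observation.items():
            p *= likelihoods[cluster][variable][state]  # P(variable = state | cluster)
        unnormalized[cluster] = p
    total = sum(unnormalized.values())
    return {cluster: p / total for cluster, p in unnormalized.items()}


# Hypothetical parameters of a two-state factor.
prior = {"C1": 0.5, "C2": 0.5}
likelihoods = {
    "C1": {"Trust": {"low": 0.8, "high": 0.2}, "Active": {"low": 0.7, "high": 0.3}},
    "C2": {"Trust": {"low": 0.1, "high": 0.9}, "Active": {"low": 0.2, "high": 0.8}},
}

posterior = cluster_posterior(prior, likelihoods, {"Trust": "high", "Active": "high"})
print(posterior)
```

A respondent with high ratings on both attributes is assigned mostly to the cluster whose conditional distributions favor high ratings.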

We can also evaluate the performance of our new network based on Factor_0 by selecting Analysis | Network Performance | Global.







This will return the log-likelihood density function as shown in the following screenshot.







Completing the PSEM

We are now returning to our main task and our principal network, which has been augmented by the 15 new factors.

Before we re-learn our network with the new factors, we need to include Purchase Intent as a variable and also impose a number of constraints in the form of Forbidden Arcs.







Being in the Modeling Mode, we can include Purchase Intent by right-clicking the node and unchecking Exclusion.

This makes the Purchase Intent variable available in the next stage of learning, which is also reflected visually in the node color and the icon.

Our desired SEM-type network structure stipulates that manifest variables be connected exclusively to the factors and that all the connections with Purchase Intent must also go through the factors. We achieve such a structure by imposing the following sets of forbidden arcs:

1. No arcs between manifest variables

2. No arcs from manifest variables to factors

3. No arcs between manifest variables and Purchase Intent
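The three rules above can be expressed as a simple predicate over node classes. The Python sketch below is only an illustration of the constraint logic; the class labels "manifest", "factor" and "target" and the function name are hypothetical and do not correspond to BayesiaLab's Forbidden Arc Editor.

```python
def is_forbidden(parent, child, node_class):
    """Return True if an arc parent -> child violates one of the three rules."""
    p, c = node_class[parent], node_class[child]
    if p == "manifest" and c == "manifest":   # rule 1: no arcs between manifest variables
        return True
    if p == "manifest" and c == "factor":     # rule 2: no arcs from manifest variables to factors
        return True
    if {p, c} == {"manifest", "target"}:      # rule 3: none between manifest and Purchase Intent
        return True
    return False


node_class = {"Fresh": "manifest", "Flowery": "manifest",
              "Factor_0": "factor", "Purchase Intent": "target"}

print(is_forbidden("Fresh", "Factor_0", node_class))
print(is_forbidden("Factor_0", "Fresh", node_class))
```

Arcs from factors to manifest variables, between factors, and between factors and Purchase Intent remain available to the learning algorithm.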







We can define these forbidden arcs by right-clicking anywhere on the Graph Panel, which brings up the following menu.

In BayesiaLab, all manifest variables and all factors are conveniently grouped into classes, so we can easily define which arcs are forbidden in the Forbidden Arc Editor.







Upon completing this step, we can proceed to learning our network again: Learning | Association Discovering | EQ







The initial result will resemble the following screenshot.







Using the Force Directed Layout algorithm (shortcut “p”), as before, we can quickly transform this network into a much more interpretable format.

Now we see the manifest variables “laddering up” to the factors, and we also see how the factors are related to each other. Most importantly, we can observe where the Purchase Intent node was attached to the network during the learning process. The structure conveys that Purchase Intent has the strongest link with Factor_2.

Now that we can see the big picture, it is perhaps appropriate to give the factors more descriptive names. For obvious reasons, this task is the responsibility of the analyst. In this case study, Factor_0 was given the name “Self-Confident”. We add this name into the node comments by double-clicking Factor_0 and scrolling to the right inside the Node Editor until we see the Comments tab.







We repeat this for all other nodes, and we can subsequently display the node comments for all factors by clicking the Display Node Comment icon in the toolbar or by selecting View | Display Node Comments from the menu.

Market Driver Analysis

Our Probabilistic Structural Equation Model is now complete, and we can use it to perform the actual analysis part of this exercise, namely to find out what “drives” Purchase Intent.

We return to the Validation Mode and right-click on Purchase Intent and then check Set As Target Node. Double-clicking the node while pressing “t” is a helpful shortcut.







This will also change the appearance of the node and literally give it the look of a target.

In order to understand the relationship between the factors and Purchase Intent, we want to tune out all the manifest variables for the time being. We can do so by right-clicking the Use of Classes icon in the bottom right corner of the screen. This will bring up a list of all classes. By default, all are checked and thus visible.







For our purposes, we want to deselect All and then only check the Factor class.

The resulting view has all the manifest variables grayed-out, so the relationship between the factors becomes more prominent. By deselecting the manifest variables, we also exclude them from subsequent analysis.







We will now right-click inside the (currently empty) Monitor Panel and select Monitors Sorted wrt Target Variable Correlations. The keyboard shortcut “x” will do the same.

This brings up the monitor for the target node, Purchase Intent, plus all the monitors for the factors, in the order of the strength of relationship with the Target Node.







This immediately highlights the order of importance of the factors relative to the Target Node, Purchase Intent. Another way of comprehensively displaying the importance is by selecting Reports | Target Analysis | Correlations With the Target Node

“Correlations” is more of a metaphor here, as BayesiaLab actually orders the factors by their Mutual Information relative to the target node, Purchase Intent.

By clicking Quadrants, we can obtain a type of opportunity graph, which shows the mean value of each factor on the x-axis and the relative Mutual Information with Purchase Intent on the y-axis. Mutual Information can be interpreted as importance in this context.







By right-clicking on the graph, we can switch between the display of the formal factor names, e.g. Factor_0, Factor_1, etc., and the factor comments, such as Adequacy and Seduction, which are much easier to interpret.

As in the previous views, it becomes very obvious that the factor Adequacy is most important with regard to Purchase Intent, followed by the factor Seduction. This is very helpful for understanding the overall market dynamics and for communicating the key drivers to managerial decision makers.

The lines dividing the graph into quadrants reflect the mean values for each axis. The upper-left quadrant highlights opportunities, as these particular factors are “above average” in importance, but “below average” in terms of their rating.
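The quadrant logic can be sketched in a few lines of Python: each factor is placed relative to the mean rating (x-axis) and the mean importance (y-axis). The factor names and values below are hypothetical placeholders, not results from this study.

```python
from statistics import mean


def quadrants(factors):
    """Map each factor to a quadrant; factors = {name: (mean rating, importance)}."""
    x_cut = mean(rating for rating, _ in factors.values())          # vertical dividing line
    y_cut = mean(importance for _, importance in factors.values())  # horizontal dividing line
    result = {}
    for name, (rating, importance) in factors.items():
        vertical = "upper" if importance > y_cut else "lower"
        horizontal = "right" if rating > x_cut else "left"
        result[name] = f"{vertical}-{horizontal}"
    return result


# Hypothetical (mean rating, relative Mutual Information with the target) pairs.
factors = {"Factor_A": (5.0, 0.30), "Factor_B": (7.0, 0.25),
           "Factor_C": (6.5, 0.05), "Factor_D": (5.5, 0.04)}

placement = quadrants(factors)
opportunities = [name for name, q in placement.items() if q == "upper-left"]
print(placement)
print(opportunities)
```

Factors in the "upper-left" quadrant are the ones the text describes as opportunities: important for the target but rated below average.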







Product Driver Analysis

Although this insight is relevant for the whole market, it does not yet allow us to work on improving specific products. For this we need to look at product-specific graphs. In addition, we may need to introduce constraints as to where we may not have the ability to impact any attributes. Such information must come from the domain expert, in our case from the perfumer, who will determine if and how odoriferous compounds can affect the consumers’ perception of the product attributes.

These constraints can be entered into BayesiaLab’s Cost Editor, which is accessible by right-clicking anywhere in the Graph Panel. Those attributes which cannot be changed (as determined by the expert) will be set to “Not Observable”. As we proceed with our analysis, these constraints will be extremely important when searching for realistic product scenarios.

On a side note, an example from the presumably more tangible auto industry may better illustrate such kinds of constraints. For instance, a vehicle platform may have an inherent wheelbase limitation, which thus sets a hard limit regarding the maximum amount of rear passenger legroom. Even if consumers perceived a need for improvement on this attribute, making such a recommendation to the engineers would be futile. As we search for optimum product solutions with our Bayesian network, this is very important to bear in mind and thus we must formally encode these constraints of our domain through the Cost Editor.

Product Optimization

We now return briefly to the Modeling Mode to include the Product variable, which has been excluded from our analysis thus far. Right-clicking the node and then unchecking Properties | Exclusion will achieve this.

At this time, we will also move beyond the analysis of factors and actually look at the individual product attributes, so we select Manifest from the Display Classes menu.







Back in the Validation Mode, we can perform a Multi Quadrant Analysis: Tools | Multi Quadrant Analysis

This tool allows us to look at the attribute ratings of each product and their respective importance, as ex-pressed with the Mutual Information. Thus, we pick Product as the Selector Node and choose Mutual In-formation for Analysis. In this case, we also want to check Linearize Nodes’ Values, Regenerate Values and specify an Output Directory, where the product-speci!c networks will be saved. In the process of generating the Multi Quadrant Analysis, BayesiaLab produces one Bayesian network for each Product. For all Prod-ucts the network structure will be identical to the network for the entire market, however, the parameters, i.e. the contingency tables, will be speci!c to each Product.

However, before we proceed to the product-specific networks, we will first look at the Multi Quadrant Analysis by Product. We can select each product’s graph simply by right-clicking and choosing the appropriate product identification number.

Please note that only the observable variables are visible on the chart, i.e. those variables which were not previously defined as “Not Observable” in the Cost Editor.







For Product No. 5, Personality is at the very top of the importance scale. But how does the Personality attribute compare in the competitive context? If we Display Scales by right-clicking on the graph, it appears that Personality is already at the best level among the competitors, i.e. at the far right of the horizontal scale. On the Fresh attribute, on the other hand, Product No. 5¹¹ marks the bottom end of the competitive range.



11 Any similarity between identifiers and actual product names is purely coincidental.





For a perfumer, it would thus be reasonable to assume that there is limited room for improvement with regard to Personality, and that Fresh perhaps offers a significant opportunity for Product No. 5.







To highlight the differences between products, we will also show Product No. 1 in comparison.

For Product No. 1, it becomes apparent that Intensity is highly important, but that its rating is towards the bottom end of the scale. The perfumer may thus conclude that a bolder version of the same fragrance will improve Purchase Intent.







Finally, by hovering over any data point in the opportunity chart, BayesiaLab can also display the position of competitors compared to the reference product for any attribute. The screenshot shows Product No. 5 as the reference and the position of competitors on the Personality attribute.

BayesiaLab also allows us to measure and save the “gap to best level” (= variations) for each product and each variable through the Export Variations function. This formally captures our opportunity for improvement.







Please note that these variations need to be saved individually by Product.
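The idea behind the “gap to best level” can be sketched in a few lines. The example below assumes that the variation is expressed as a percentage of the product’s own mean rating, which is consistent with the percentages shown in the Variation Editor but is not confirmed as BayesiaLab’s exact definition. The function name and the nested-dictionary layout of mean ratings are hypothetical.

```python
def gap_to_best(mean_ratings, product):
    """Return, per attribute, how far `product` trails the best-rated product.

    `mean_ratings` maps product -> {attribute: mean rating} (assumed layout).
    The gap is expressed in percent of the product's own mean rating
    (an assumption made for this sketch).
    """
    own = mean_ratings[product]
    gaps = {}
    for attribute, value in own.items():
        best = max(ratings[attribute] for ratings in mean_ratings.values())
        gaps[attribute] = 100.0 * (best - value) / value
    return gaps


# Made-up mean ratings for two products, for illustration only
example_ratings = {
    "A": {"Fresh": 4.0, "Personality": 5.0},
    "B": {"Fresh": 5.0, "Personality": 4.0},
}
example_gaps = gap_to_best(example_ratings, "A")
```

A product that already holds the best rating on an attribute has a gap of zero on that attribute, which matches the earlier observation that there is limited room for improvement where a product already leads the competitive range.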

By now we have all the components necessary for a comprehensive optimization of product attributes:

1. Constraints on “non-actionable” attributes, i.e. excluding those variables that cannot be affected through product changes.

2. A Bayesian network for each Product.

3. The current attribute rating of each Product and each attribute’s importance relative to Purchase Intent.

4. The “gap to best level” (variation) for each attribute and Product.

With the above, we are now in a position to search for realistic product configurations, based on the existing product, that would optimize Purchase Intent.

We proceed individually by Product, and for illustration purposes we use Product No. 5 again. We load the product-specific network, which was previously saved when the Multi Quadrant Analysis was performed.







One of the powerful features of BayesiaLab is Target Dynamic Profile, which we will apply here on this network to optimize Purchase Intent: Analysis | Report | Target Analysis | Target Dynamic Profile







The Target Dynamic Profile provides a number of important options:

• Profile Search Criterion: we intend to optimize the mean of the Purchase Intent.

• Criterion Optimization: maximization is the objective.







• Search Method: We select Mean and also click on Edit Variations, which allows us to manually stipulate the range of possible variations of each attribute. In our case, however, we had saved the actual variations of Product No. 5 versus the competition, so we load that data set, which subsequently displays the values in the Variation Editor. For example, Fresh could be improved by 10.7% before catching up to the highest-rated product in this attribute.

• Search Stop Criterion: We check Maximum Number of Evidence Reached and set this parameter to 4. This means that no more than the top-four attributes will be suggested for improvement.

Upon completion of all computations, we will obtain a list of product action priorities: Fresh, Fruity, Flowery and Wooded.

The highlighted Value/Mean column shows the successive improvement upon implementation of each action. From an initial value of 3.76, Purchase Intent improves to 3.92, which may seem like a fairly small step. However, the importance lies in the fact that this improvement is not based on utopian thinking, but rather on attainable product improvements within the range of competitive performance.
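The logic of such a constrained, stepwise search can be illustrated with a simple greedy procedure. The sketch below is not BayesiaLab’s algorithm, whose internals are not described here. It assumes a hypothetical `predict` function that returns the expected Purchase Intent for a given set of attribute means, and it raises one attribute per step by its full allowed variation, stopping after a maximum number of steps. All names and the toy linear model are assumptions made for illustration.

```python
def greedy_profile(predict, means, max_increase_pct, max_steps=4):
    """Greedily choose up to `max_steps` attributes to improve.

    predict          -- callable mapping {attribute: mean} to the target mean
                        (hypothetical stand-in for inference in the network)
    means            -- current attribute means
    max_increase_pct -- allowed improvement per attribute, in percent
    Returns a list of (attribute, target mean after this step).
    """
    current = dict(means)
    remaining = set(max_increase_pct)
    steps = []
    for _ in range(max_steps):
        baseline = predict(current)
        best = None
        for attr in sorted(remaining):
            trial = dict(current)
            trial[attr] = current[attr] * (1 + max_increase_pct[attr] / 100.0)
            gain = predict(trial) - baseline
            # Keep only strictly positive gains; ties go to the first name
            if gain > 0 and (best is None or gain > best[1]):
                best = (attr, gain, trial)
        if best is None:
            break  # no remaining attribute improves the target
        attr, _, current = best
        remaining.remove(attr)
        steps.append((attr, predict(current)))
    return steps


# Toy linear model with made-up weights, for illustration only
toy_weights = {"a": 1.0, "b": 2.0, "c": 0.0}


def toy_predict(attribute_means):
    return sum(toy_weights[k] * v for k, v in attribute_means.items())


toy_steps = greedy_profile(
    toy_predict,
    means={"a": 2.0, "b": 2.0, "c": 2.0},
    max_increase_pct={"a": 50.0, "b": 50.0, "c": 50.0},
)
```

In the toy model, attribute "b" is selected first because it carries the largest weight, followed by "a"; attribute "c" is never selected because changing it does not move the target. Capping each attribute at its allowed variation is what keeps the recommendations within the range of competitive performance.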







Initially, we have the marginal distribution of the attributes and the original mean value for Purchase Intent, i.e. 3.77.

To further illustrate the impact of our product actions, we will simulate their implementation step-by-step, which is available through Inference | Interactive Inference.

With the selector in the toolbar, we can go through each product action step-by-step in the order in which they were recommended.







Upon implementation of the first product action, we obtain the following picture, and Purchase Intent grows to 3.9. Please note that this is not a sea change in terms of Purchase Intent, but rather a realistic consumer response to a product change.

The second change results in further subtle improvement to Purchase Intent:







The third and fourth steps are analogous and bring us to the final value for Purchase Intent of 3.92.

Although BayesiaLab generates these recommendations effortlessly, they represent a major innovation in the field of marketing science. This particular optimization task has not been tractable with traditional methods.

Conclusion

The presented case study demonstrates how BayesiaLab can transform simple survey data into a deep understanding of consumers’ thinking and quickly provide previously inconceivable product recommendations. As such, BayesiaLab is a revolutionary tool, especially as the workflow shown here may take no more than a few hours for an analyst to implement. This kind of rapid and “actionable”¹² insight is clearly a breakthrough and creates an entirely new level of relevance of research for business applications.



12 The authors cringe at the inflationary use of “actionable”, but here, for once, it actually seems appropriate.





Appendix: The Bayesian Network Paradigm¹³

Acyclic Graphs & Bayes’s Rule

Probabilistic models based on directed acyclic graphs have a long and rich tradition, beginning with the work of geneticist Sewall Wright in the 1920s. Variants have appeared in many fields. Within statistics, such models are known as directed graphical models; within cognitive science and artificial intelligence, such models are known as Bayesian networks. The name honors the Rev. Thomas Bayes (1702-1761), whose rule for updating probabilities in the light of new evidence is the foundation of the approach.

Rev. Bayes addressed both the case of discrete probability distributions of data and the more complicated case of continuous probability distributions. In the discrete case, Bayes’ theorem relates the conditional and marginal probabilities of events A and B, provided that the probability of B does not equal zero:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

In Bayes’ theorem, each probability has a conventional name:

• P(A) is the prior probability (or “unconditional” or “marginal” probability) of A. It is “prior” in the sense that it does not take into account any information about B; however, the event B need not occur after event A. In the nineteenth century, the unconditional probability P(A) in Bayes’s rule was called the “antecedent” probability; in deductive logic, the antecedent set of propositions and the inference rule imply consequences. The unconditional probability P(A) was called “a priori” by Ronald A. Fisher.

• P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.

• P(B|A) is the conditional probability of B given A. It is also called the likelihood.

• P(B) is the prior or marginal probability of B, and acts as a normalizing constant.

Bayes’ theorem in this form gives a mathematical representation of how the conditional probability of event A given B is related to the converse conditional probability of B given A.

The initial development of Bayesian networks in the late 1970s was motivated by the need to model the top-down (semantic) and bottom-up (perceptual) combination of evidence in reading. The capability for bidirectional inferences, combined with a rigorous probabilistic foundation, led to the rapid emergence of Bayesian networks as the method of choice for uncertain reasoning in AI and expert systems, replacing earlier, ad hoc rule-based schemes.
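As a small numeric illustration of the theorem, the sketch below computes the posterior P(A|B) from the prior P(A) and the two likelihoods P(B|A) and P(B|not A), obtaining the normalizing constant P(B) via the law of total probability. The function name and the example numbers are made up for this illustration.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) for a binary event A via Bayes' theorem."""
    # Normalizing constant P(B) by the law of total probability
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if p_b == 0:
        raise ValueError("P(B) must not be zero")
    return p_b_given_a * prior_a / p_b


# Made-up numbers: P(A) = 0.01, P(B|A) = 0.9, P(B|not A) = 0.05
example_posterior = posterior(0.01, 0.9, 0.05)  # roughly 0.154
```

Even though B is far more likely under A than under its complement, the posterior remains modest because the prior P(A) is small. This is the kind of updating that the network performs whenever evidence is set on a node.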



13 Adapted from Pearl (2000), used with permission.





The nodes in a Bayesian network represent variables of interest (e.g. the temperature of a device, the gender of a patient, a feature of an object, the occurrence of an event) and the links represent statistical (informational) or causal dependencies among the variables. The dependencies are quantified by conditional probabilities for each node given its parents in the network. The network supports the computation of the posterior probabilities of any subset of variables given evidence about any other subset.

Compact Representation of the Joint Probability Distribution

“The central paradigm of probabilistic reasoning is to identify all relevant variables x1, . . . , xN in the environment [i.e. the domain under study], and make a probabilistic model p(x1, . . . , xN) of their interaction [i.e. represent the variables’ joint probability distribution].”

Bayesian networks are very attractive for this purpose as they can, by means of factorization, compactly represent the joint probability distribution of all variables.

“Reasoning (inference) is then performed by introducing evidence that sets variables in known states, and subsequently computing probabilities of interest, conditioned on this evidence. The rules of probability, combined with Bayes’ rule make for a complete reasoning system, one which includes traditional deductive logic as a special case.” (Barber, 2012)
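A tiny example may help to make factorization and inference tangible. In a Bayesian network, the joint distribution factorizes into one conditional distribution per node given its parents. The sketch below uses a hypothetical three-node network (Rain and Sprinkler as parents of WetGrass) with made-up probabilities, builds the joint probability from the three factors, and then computes a posterior by summing over the joint distribution. Enumeration is only practical for very small networks and is shown here purely for illustration.

```python
from itertools import product

# Made-up conditional probability tables for three binary variables
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
# P(WetGrass = True | Rain, Sprinkler)
P_WET_GIVEN = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability as the product of the three network factors."""
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)


def prob_rain_given_wet():
    """P(Rain = True | WetGrass = True), obtained by enumeration."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True)
                   for r, s in product((True, False), repeat=2))
    return numerator / evidence
```

The full joint table over three binary variables has eight entries, while the factorized form needs only the six numbers in the three tables above (one for Rain, one for Sprinkler, four for WetGrass). The savings grow quickly as more variables are added, provided each node has few parents.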







References

Barber, David. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.

Darwiche, Adnan. Modeling and Reasoning with Bayesian Networks. 1st ed. Cambridge University Press, 2009.

Heckerman, D. “A Tutorial on Learning with Bayesian Networks.” Innovations in Bayesian Networks (2008): 33–82.

Holmes, Dawn E., ed. Innovations in Bayesian Networks: Theory and Applications. Softcover reprint of hardcover 1st ed. 2008. Springer, 2010.

Kjaerulff, Uffe B., and Anders L. Madsen. Bayesian Networks and Influence Diagrams: A Guide to Construction and Analysis. Softcover reprint of hardcover 1st ed. 2008. Springer, 2010.

Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. 1st ed. The MIT Press, 2009.

Koski, Timo, and John Noble. Bayesian Networks: An Introduction. 1st ed. Wiley, 2009.

Mittal, Ankush. Bayesian Network Technologies: Applications and Graphical Models. Edited by Ankush Mittal and Ashraf Kassim. 1st ed. IGI Publishing, 2007.

Neapolitan, Richard E. Learning Bayesian Networks. Prentice Hall, 2003.

Pearl, Judea. Causality: Models, Reasoning and Inference. 2nd ed. Cambridge University Press, 2009.

———. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. 1st ed. Morgan Kaufmann, 1988.

Pearl, Judea, and Stuart Russell. Bayesian Networks. UCLA Cognitive Systems Laboratory, November 2000. http://bayes.cs.ucla.edu/csl_papers.html.

Pourret, Olivier, Patrick Naïm, and Bruce Marcot, eds. Bayesian Networks: A Practical Guide to Applications. 1st ed. Wiley, 2008.

Schafer, J.L., and M.K. Olsen. “Multiple Imputation for Multivariate Missing-data Problems: A Data Analyst’s Perspective.” Multivariate Behavioral Research 33, no. 4 (1998): 545–571.

Spirtes, Peter, and Clark Glymour. Causation, Prediction and Search. The MIT Press, 2001.








Contact Information

Bayesia USA

312 Hamlet’s End Way
Franklin, TN 37067
USA
Phone: +1 888-386-8383
[email protected]

Bayesia S.A.S.

6, rue Léonard de Vinci
BP 119
53001 Laval Cedex
France
Phone: +33(0)2 43 49 75 [email protected]

Bayesia Singapore Pte. Ltd.

20 Cecil Street
#14-01, Equity Plaza
Singapore 049705
Phone: +65 3158 [email protected]

Copyright

© 2013 Bayesia USA, Bayesia S.A.S. and Bayesia Singapore. All rights reserved.













http://www.bayesia.sg





