
Page 1: Page 1 . Page 2 Overview Brief Project History Hugin Expert A/S and Bayesian Technology Discussion Poulin Automation Tool Discussion

Page 1

www.poulinhugin.com


Page 2

Overview

• Brief Project History

• Hugin Expert A/S and Bayesian Technology Discussion

• Poulin Automation Tool Discussion


Page 3

Hugin Software?

• Product maturity and optimisation produce the world’s fastest Bayesian inference engine

• State-of-the-art capabilities based on internal and external research and development

• Practical experience and theoretical excellence combined form the basis of further product refinement

• High-performance and mission-critical systems in numerous areas are constructed using Hugin software


Page 4

Hugin Expert A/S?

• The market leader for more than a decade

• Highly skilled researchers and developers

• Has strategic cooperation internationally

• Has a clear strategy for maintaining its leadership as a tool and technology provider

• Part of one of the world’s largest Bayesian research groups

• Has experience from numerous large-scale international R&D projects


Page 5

Client List

USA: Hewlett-Packard, Intel Corporation, Dynasty, DrRed, Duke, Xerox, Lockheed Martin, NASA/Johnson Space Center, Boeing Computer Service, USDA Forest Service, Information Extraction & Transport Inc., Pacific Sierra Research, Price Waterhouse, Swiss Bank Corporation, Bellcore, ISX Corporation, Lam Research Corporation, Orincon Corporation, Integrate IT, Charles River Analytics, Northrop Grumman, CHI Systems Inc, Voyan Technology, Los Alamos National Laboratory, Rockwell Science Center, Citibank, Perkin Elmer Corporation, Inscom, Honeywell Software Initiative, Aragon Consulting Group, Raytheon Systems Company, Kana Communications, Sandia National Laboratories, GE Global Research, Westhollow Technology Center

Great Britain: Rolls-Royce Aerospace Group, Philips Research Laboratories, USB AG, Motorola, Defence Research Agency, Nuclear Electric Plc, Marconi Simulation, Lucas Engineering & Systems Ltd, Lloyd's Register, BT Laboratories, Brown & Root Limited, Silsoe Research Institute, Aon Risk Consultants, Railtrack, Shell Global Solutions

Germany: Siemens AG, Volkswagen AG, DaimlerChrysler AG, GSF Medis, Reutlingen Kinderklinik

France: PGCC Technologie Protectic, Objectif Technologies, Usinor

Canada: Decision Support Technologies

Italy: ENEA CRE Casassia, C.S.E.L.T.

Israel: IBM Haifa Research Laboratory

Australia: Department of Defence, DSTO; National Australian Bank

Netherlands: Shell International E&P

Japan: Sumitomo Metal Industries, Dentsu Inc.

Scandinavia: Defence Research Agency, Danish Defence Research Establishment, Aalborg Portland, Danish Agricultural Advisory Center, COWI, FLS Automation, Judex Datasystemer, AON Denmark, ABB, Nykredit, Swedpower

South Africa: CSIR


Page 6

Hugin Expert Bayesian Software


Page 7

Bayes’ Theorem

• Rev. Thomas Bayes (1702-1761), an 18th-century priest from England

• The theorem, as generalized by Laplace, is the basic starting point for inference problems that use probability theory as logic: it assigns degrees of belief to propositions

$$P(H \mid E, c) = \frac{P(H \mid c)\, P(E \mid H, c)}{P(E \mid c)}$$

$$\text{posterior} \propto \text{prior} \times \text{likelihood}$$
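The theorem can be applied numerically to a discrete hypothesis variable. The following minimal Python sketch is for illustration only. The hypothesis names and numbers (`rain`, `dry`, 0.2, 0.9, ...) are hypothetical and do not come from the slides.

```python
def posterior(prior, likelihood):
    """Bayes' theorem for a discrete hypothesis variable H.

    prior:      {h: P(h | c)}
    likelihood: {h: P(E | h, c)} for the observed evidence E
    Returns {h: P(h | E, c)}.
    """
    # Numerator: prior times likelihood for each hypothesis.
    joint = {h: prior[h] * likelihood[h] for h in prior}
    # Denominator P(E | c): sum of the numerators over all hypotheses.
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical numbers, for illustration only.
post = posterior({"rain": 0.2, "dry": 0.8}, {"rain": 0.9, "dry": 0.3})
print({h: round(p, 3) for h, p in post.items()})  # {'rain': 0.429, 'dry': 0.571}
```

The denominator is never supplied directly. It falls out of normalising the products, which is why "posterior ∝ prior × likelihood" is enough in practice.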


Page 8

Bayesian Technology

• Probabilistic graphical models

• Model-based approach to decision support

• Compact and intuitive graphical representation

• Sound and coherent handling of uncertainty

• Reasoning and decision making under uncertainty

• Bayesian networks and influence diagrams


Page 9

A Bayesian Network

• A Bayesian network $N = (G, P)$ consists of:

• A set of nodes and a set of directed edges between nodes

• The nodes together with the directed edges form a directed acyclic graph (DAG) $G$

• Each node has a finite set of states

• Attached to each node $X$ with parents $Y_1, \ldots, Y_n$ there is a conditional probability table $P(X \mid Y_1, \ldots, Y_n)$

• A knowledge representation for reasoning under uncertainty
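This definition can be made concrete with a small sketch. It uses a hypothetical three-node network in which Battery and Fuel are parents of Starts; the structure and all probabilities are invented for illustration. The joint distribution is the product of the CPTs (chain rule), and a query is answered by brute-force enumeration. This is exponential in the number of nodes. Engines such as Hugin's use junction-tree propagation instead.

```python
from itertools import product

# Hypothetical network Battery -> Starts <- Fuel; all probabilities are invented.
STATES = {"Battery": ["ok", "flat"], "Fuel": ["yes", "no"], "Starts": ["yes", "no"]}
PARENTS = {"Battery": [], "Fuel": [], "Starts": ["Battery", "Fuel"]}
# One CPT per node: parent configuration (tuple) -> distribution over the node's states.
CPT = {
    "Battery": {(): {"ok": 0.9, "flat": 0.1}},
    "Fuel": {(): {"yes": 0.95, "no": 0.05}},
    "Starts": {
        ("ok", "yes"): {"yes": 0.99, "no": 0.01},
        ("ok", "no"): {"yes": 0.0, "no": 1.0},
        ("flat", "yes"): {"yes": 0.05, "no": 0.95},
        ("flat", "no"): {"yes": 0.0, "no": 1.0},
    },
}


def joint(assignment):
    """Chain rule for Bayesian networks: P(U) = product over X of P(X | pa(X))."""
    p = 1.0
    for node, parents in PARENTS.items():
        key = tuple(assignment[q] for q in parents)
        p *= CPT[node][key][assignment[node]]
    return p


def query(target, evidence):
    """P(target | evidence) by brute-force enumeration of all configurations."""
    nodes = list(STATES)
    scores = {s: 0.0 for s in STATES[target]}
    for values in product(*(STATES[n] for n in nodes)):
        assignment = dict(zip(nodes, values))
        if all(assignment[n] == v for n, v in evidence.items()):
            scores[assignment[target]] += joint(assignment)
    total = sum(scores.values())
    return {s: p / total for s, p in scores.items()}


print(query("Battery", {"Starts": "no"}))  # P(flat) rises from 0.1 to about 0.64
```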


Page 10

Bayesian Expert Systems

• Induce the structure of the graphical representation: fusion of data and expert knowledge

• Estimate the parameters: fusion of data and expert knowledge

[Figure: generative distribution]


Page 11

Implementation

• Cause and effect relations are represented in an acyclic, directed graph

• Strengths of relations are encoded using probabilities

• Compute probabilities of events given observations on other events

• Fusion of data and domain knowledge

• Analyse results using techniques such as conflict and sensitivity analysis


Page 12

Example: Car Won’t Start


Page 13

Technology Summary

• A compact and intuitive graphical representation of causal relations

• Coherent and mathematically sound handling of uncertainty and decisions

• Construction and adaptation of Bayesian networks based on data sets

• Efficient solution of queries against the Bayesian network

• Analysis tools such as data conflict, explanation, sensitivity, and value of information analysis


Page 14

What Does This Do For You?

• Reasoning and decision making under uncertainty, supporting:
  • Diagnosis
  • Prediction
  • Process analysis and supervision
  • Filtering and classification
  • Control
  • Troubleshooting
  • Predictive maintenance
  • …


Page 15

Bayesian Applications

• Medicine – forensic identification, diagnosis of muscle and nerve diseases, antibiotic treatment, diabetes advisory system, triage (AskRed.com)

• Software – software debugging, printer troubleshooting, safety and risk evaluation of complex systems, help facilities in Microsoft Office products

• Information Processing – information filtering, display of information for time-critical decisions, fault analysis in aircraft control

• Industry – diagnosis and repair of on-board unmanned underwater vehicles, prediction of parts demand, control of centrifugal pumps, process control in wastewater purification.

• Economy – prediction of default, credit application evaluation, portfolio risk and return analysis

• Military – NATO Airborne Early Warning & Control Program, situation assessment

• Agriculture – blood typing and parentage verification of cattle, replacement of milk cattle, mildew management in winter wheat


Page 16

Hugin Products

• General purpose decision support
  • Hugin Explorer: Hugin graphical user interface
  • Hugin Developer: Hugin graphical user interface, Hugin decision engine, APIs (C, C++, Java) and ActiveX server

• Troubleshooting
  • Hugin Advisor: a suite of tools for troubleshooting

• Data mining
  • Hugin Clementine Link


Page 17

POULIN HUGIN Automation Tool v0.1


Page 18

Vision

• To create an application that provides automation for the Hugin Decision Engine

• Focus on the main Bayesian inference capabilities

• Build an automation-capable command-line tool

• Build a data parser for formatting structured and unstructured data

• Divide the problem space and build a meta-database

• Integrate with the Hugin GUI for human-based knowledge discovery


Page 19

Methodology

• The Naive Bayes Model
  • Structure, variables and states
  • Discretization using the Principle of Maximum Entropy
  • Parameter estimation using EM

• The Tree Augmented Naive Bayes Model
  • Interdependence relations between information variables based on mutual information (an extra step compared to the NBM)

• Model update by adding new nodes as in the NBM

• Value of information (variables and cases)

• Evidence sensitivity analysis (what-if)


Page 20

Functionality

• Data preparation

• Model construction: build a Naive Bayes Model or a Tree Augmented NBM

• Model update: add additional information variables

• Inference: compute the probability of the target given evidence

• What-if analysis: robustness of probabilities

• Value of information analysis
  • Which case is most informative
  • Which observation is most informative

[Figure: command-line interface linking Analysis, Model and Data]


Page 21

Features

• An application for construction of a Naive Bayes Model

• Updating a Naive Bayes Model

• Construction of a Tree Augmented Naive Bayes Model

• Inference based on a case

• What-if sensitivity analysis (a single piece of evidence)

• Value-of-information analysis (cases and observations)

• Error handling and tracing have been kept to a minimum.

• Implemented in C++ using Hugin Decision Engine 6.3

• Runs on Windows 2000, Red Hat Linux, and Sun Solaris.

• Program documentation in HTML.


Page 22

Tools

• The POULIN-HUGIN package consists of a set of tools for:
  • Data preparation: dat2hcs, class2net, class2dat, weather2dat, pull, struct2dat, ustruct2dat, ustruct2hcs
  • Model construction and update: ph
  • Inference: ph
  • Analysis: ph


Page 23

Data Sample

• Source data are from the Global Summary of the Day (GSOD) database archived by the National Climatic Data Center (NCDC).

• Used the average daily temperature (of 24 hourly temperature readings) in 145 US cities, measured from January 1, 1995 to December 29, 2003.

• The data set of 3,255 cases is split into subsets for learning, updating, cases, and case files:
  • learning: 2,000 cases
  • update: 1,000 cases
  • cases: 10 cases
  • case files: 245 cases

• 2,698 missing values out of 471,975 entries (145 cities × 3,255 cases): about 0.57% missing values.
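The counts on this slide can be cross-checked with simple arithmetic. The four subsets add up to the total number of cases. The number of entries is the number of cities times the number of cases. The missing-value rate is the ratio of missing values to entries.

```latex
\begin{aligned}
2000 + 1000 + 10 + 245 &= 3255 \\
145 \times 3255 &= 471\,975 \\
\frac{2698}{471\,975} &\approx 0.0057 \approx 0.57\%
\end{aligned}
```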


Page 24

Discretization

• Measures of average daily temperature are continuous by nature.

• Continuous variables can be represented as discrete variables through discretization.

• Determining the intervals: how many, what width, equally sized, . . . ?

• We discretize using the principle of maximum entropy, but equidistant (uniform-width) discretization is also easy to produce.


Page 25

Discretization

• Entropy can be considered a measure of information. The goal is to obtain the least informative distribution consistent with the current information.

• Principle of Maximum Entropy: by choosing to use the distribution with the maximum entropy allowed by our information, the argument goes, we are choosing the most uninformative distribution possible. To choose a distribution with lower entropy would be to assume information we do not possess; to choose one with a higher entropy would violate the constraints of the information we do possess. Thus the maximum entropy distribution is the only reasonable distribution.

• Discretize each variable so that its states have a uniform distribution over the data (equal-frequency intervals).
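One common way to make the states "uniform based on data" is equal-frequency binning. Cut points are chosen so that each interval holds about the same number of observations, which maximises the entropy of the empirical state distribution. The sketch below assumes this approach. It is not necessarily the exact rule used by `ph`, and the function names and temperatures are hypothetical.

```python
from bisect import bisect_right


def max_entropy_cut_points(values, n_states):
    """Cut points giving (approximately) equal-frequency intervals.

    When every interval holds about the same number of observations, the
    empirical distribution over the discrete states is as close to uniform,
    i.e. as close to maximum entropy, as the data allow.
    """
    data = sorted(v for v in values if v is not None)  # ignore missing values
    if not 1 <= n_states <= len(data):
        raise ValueError("need 1 <= n_states <= number of observed values")
    cuts = []
    for k in range(1, n_states):
        idx = k * len(data) // n_states
        # Place the cut halfway between two neighbouring observations.
        cuts.append((data[idx - 1] + data[idx]) / 2)
    return cuts


def discretize(value, cuts):
    """Index of the interval containing value (0 .. len(cuts))."""
    return bisect_right(cuts, value)


temps = [30, 35, 40, 45, 50, 55, 60, 65]  # hypothetical daily averages
cuts = max_entropy_cut_points(temps, 4)
print(cuts)                                  # [37.5, 47.5, 57.5]
print([discretize(t, cuts) for t in temps])  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With heavily tied data, two cut points can coincide. A production version would need to merge such intervals.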


Page 26

Model Specification

• A Bayesian network $N = (G, P)$ consists of a
  • qualitative part, the graph structure (DAG $G$), and a
  • quantitative part, the conditional probability distributions ($P$).

• Model specification therefore consists of two parts.

• A Bayesian network $N$ is minimal if and only if, for every node $X$ and for every parent $Y$ of $X$, $X$ is not independent of $Y$ given the other parents of $X$.


Page 27

Naive Bayes Model

• A well-suited model for classification tasks and for tasks of the following type:
  • An exhaustive set of mutually exclusive hypotheses $h_1, \ldots, h_n$ is of interest
  • Measures on indicators $I_1, \ldots, I_n$ are used to predict $h_i$

• The Naive Bayes Model:
  • $h_1, \ldots, h_n$ are represented as states of a hypothesis variable $H$
  • Information variables $I_1, \ldots, I_n$ are children of $H$

• The fundamental assumption is that $I_1, \ldots, I_n$ are independent of each other when $H$ is known.

• Computationally and representationally a very efficient model that provides good results in many cases.


Page 28

Naive Bayes Model

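• The Naive Bayes Model in more detail:
  • Let the possible hypotheses be collected into one hypothesis variable $H$ with prior $P(H)$.
  • For each information variable $I$, acquire $P(I \mid H) = L(H \mid I)$.
  • For any set of observations $i_1, \ldots, i_n$, calculate $L(H \mid i_1, \ldots, i_n) = \prod_{k=1}^{n} L(H \mid i_k)$.
  • The posterior is $P(H \mid i_1, \ldots, i_n) = \mu \, P(H) \, L(H \mid i_1, \ldots, i_n)$, where $\mu = 1 / P(i_1, \ldots, i_n)$ is a normalization constant.

• The conclusion may be misleading if the independence assumption does not hold.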
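The posterior computation above can be sketched in a few lines of Python. Logarithms are used so that many observed indicators do not underflow. The variable names MDWASHDC, MDBALTIM and TXAUSTIN are borrowed from the examples in this deck, but the two-state model and all probabilities are invented. The function name is hypothetical.

```python
import math


def nbm_posterior(prior, likelihoods, observations):
    """P(H | i_1, ..., i_n) in a Naive Bayes Model.

    prior:        {h: P(h)}
    likelihoods:  {var: {h: {state: P(var = state | h)}}}, i.e. L(H | I)
    observations: {var: observed state}; unobserved variables are left out.
    Assumes every probability used is strictly positive.
    """
    # log P(h) + sum_k log L(h | i_k); logs avoid underflow with many indicators.
    log_scores = {
        h: math.log(p_h) + sum(math.log(likelihoods[v][h][s]) for v, s in observations.items())
        for h, p_h in prior.items()
    }
    # Normalise (the factor mu), subtracting the maximum for numerical stability.
    top = max(log_scores.values())
    unnorm = {h: math.exp(s - top) for h, s in log_scores.items()}
    total = sum(unnorm.values())
    return {h: u / total for h, u in unnorm.items()}


# Hypothetical model: target MDWASHDC with indicators MDBALTIM and TXAUSTIN.
prior = {"low": 0.5, "high": 0.5}
likelihoods = {
    "MDBALTIM": {"low": {"low": 0.9, "high": 0.1}, "high": {"low": 0.15, "high": 0.85}},
    "TXAUSTIN": {"low": {"low": 0.7, "high": 0.3}, "high": {"low": 0.3, "high": 0.7}},
}
post = nbm_posterior(prior, likelihoods, {"MDBALTIM": "low", "TXAUSTIN": "high"})
print({h: round(p, 2) for h, p in post.items()})  # {'low': 0.72, 'high': 0.28}
```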


Page 29 [email protected], May 11th, 2004

NBM Model Construction

ph -nbm <data> <target> <states> <iterations> [-verbose]

• This command builds an NBM model from the data contained in <data> with <target> as the hypothesis variable.

• All variables will have a maximum of <states> states.

• As many as <iterations> iterations of the EM algorithm will be performed.

• The model constructed is saved in file "nbm.net", which can be loaded into the Hugin Graphical User Interface for inspection.

• Example: ph -nbm model.dat MDWASHDC 2 1


Page 30

Binary NBM Model Construction

ph -boolnbm <data> <target> <states> <iterations> [-verbose]

• This command builds a Boolean NBM model from the data contained in <data> with <target> as the hypothesis variable.

• All variables will be Boolean indicating the presence of a word (the word represented by a variable is equal to the label of the variable).

• As many as <iterations> iterations of the EM algorithm will be performed.

• The model constructed is saved in file "boolnbm.net", which can be loaded into the Hugin Graphical User Interface for inspection.

• Example: ph -boolnbm model.dat MDWASHDC 2 1


Page 31

Tree-Augmented NBM Model

• Let M be a Naive Bayes Model with hypothesis H and information variables $\mathcal{I} = \{I_1, \ldots, I_n\}$.

• We can use the conditional mutual information $I(I_i; I_j \mid H)$ to measure the conditional dependency between two information variables $I_i$ and $I_j$ given $H$.

• After computing $I(I_i; I_j \mid H)$ for all pairs $I_i, I_j$, we use Kruskal’s algorithm to find a maximum weight spanning tree $T$ on $\mathcal{I}$.

• The edges of $T$ are directed such that no variable has more than two parents ($H$ and one other information variable).

• The complexity of inference is polynomial in the number of information variables.
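The procedure on this slide can be sketched in Python in three steps. First, conditional mutual information is estimated from complete data. Second, Kruskal's algorithm with a union-find structure builds a maximum weight spanning tree. Third, the tree edges are directed away from a root. The choice of the first variable as root, the function names, and the toy data are assumptions for illustration. This is not the `ph` source.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(rows, a, b, h):
    """I(A; B | H) in nats, estimated from complete data rows (dicts)."""
    n = len(rows)
    c_abh = Counter((r[a], r[b], r[h]) for r in rows)
    c_ah = Counter((r[a], r[h]) for r in rows)
    c_bh = Counter((r[b], r[h]) for r in rows)
    c_h = Counter(r[h] for r in rows)
    # sum P(a,b,h) log[P(a,b|h) / (P(a|h) P(b|h))] = sum (n_abh/n) log[n_abh n_h / (n_ah n_bh)]
    return sum(
        (k / n) * math.log(k * c_h[vh] / (c_ah[(va, vh)] * c_bh[(vb, vh)]))
        for (va, vb, vh), k in c_abh.items()
    )


def tan_structure(rows, hypothesis, info_vars):
    """Directed edges between information variables (each also keeps H as a parent)."""
    weighted = sorted(
        ((conditional_mutual_information(rows, a, b, hypothesis), a, b)
         for a, b in combinations(info_vars, 2)),
        reverse=True,  # maximum weight spanning tree: heaviest edges first
    )
    parent = {v: v for v in info_vars}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = {v: [] for v in info_vars}
    for _, a, b in weighted:
        ra, rb = find(a), find(b)
        if ra != rb:  # edge (a, b) joins two components, so no cycle is created
            parent[ra] = rb
            tree[a].append(b)
            tree[b].append(a)
    # Direct edges away from the root: every variable gets at most one parent in I.
    root = info_vars[0]
    edges, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        for v in tree[u]:
            if v not in seen:
                seen.add(v)
                edges.append((u, v))
                stack.append(v)
    return edges


# Hypothetical data: B copies A, C varies independently of both.
rows = [{"H": h, "A": a, "B": a, "C": c} for h in "01" for a in "01" for c in "01"]
print(tan_structure(rows, "H", ["A", "B", "C"]))  # [('A', 'B'), ('B', 'C')]
```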


Page 32

TAN Model Construction

ph -tan <data> <target> <states> <iterations> [-verbose]

• This command builds a Tree-Augmented Naive Bayes model (TAN) from the data contained in <data> with <target> as the hypothesis variable.

• All variables will have a maximum of <states> states.

• As many as <iterations> iterations of the EM algorithm will be performed.

• The model constructed is saved in file "tan.net", which can be loaded into the Hugin Graphical User Interface for inspection.

• Example: ph -tan model.dat MDWASHDC 2 1


Page 33

Binary TAN Model Construction

ph -booltan <data> <target> <states> <iterations> [-verbose]

• This command builds a Tree-Augmented Boolean Naive Bayes model (TAN) from the data contained in <data> with <target> as the hypothesis variable.

• All variables will be Boolean indicating the presence of a word (the word represented by a variable is equal to the label of the variable).

• As many as <iterations> iterations of the EM algorithm will be performed.

• The model constructed is saved in file "booltan.net", which can be loaded into the Hugin Graphical User Interface for inspection.

• Example: ph -booltan model.dat MDWASHDC 2 1


Page 34

Model Updates

ph -update <data> <model> <target> <states> <iterations> [-verbose]

• This command updates a model with data contained in <data>. <target> is the hypothesis variable of the model stored in <model>.

• Variables in the data not represented in the original model will be added to the model as children of the hypothesis variable (no structure between information variables is added). The data file should contain measures on all variables (old and new).

• All new variables will have a maximum of <states> states.

• As many as <iterations> iterations of the EM algorithm will be performed.

• The updated model is saved in file "update.net", which can be loaded into the Hugin Graphical User Interface for inspection.

• Example: ph -update update.dat model.net MDWASHDC 2 1


Page 35

Parameter Estimation

• Parameter learning is the identification of the CPTs of the Bayesian network, based on theoretical considerations, a database of cases, or subjective estimates.

• The CPTs are constructed based on a database of cases $D = \{c_1, \ldots, c_N\}$. There may be missing values in some of the cases, indicated by N/A.

• The CPTs are learned by maximum likelihood estimation:

$$\hat{P}(X_i = k \mid \mathrm{pa}(X_i) = j) = \frac{n(X_i = k, \mathrm{pa}(X_i) = j)}{n(\mathrm{pa}(X_i) = j)}$$

• where $n(Y = y)$ is the (expected) number of cases for which $Y = y$.
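For complete cases, the maximum likelihood estimate is simply a ratio of counts. The Python sketch below computes it. Rows with N/A (`None`) in the child or a parent are skipped here. The slides instead use expected counts (EM) for such cases. The function name and data are hypothetical.

```python
from collections import Counter


def mle_cpt(rows, child, parents):
    """Maximum likelihood CPT: P(child = x | pa = j) = n(child = x, pa = j) / n(pa = j).

    rows are dicts of observed values; this sketch uses complete cases only
    (rows where the child or a parent is None are skipped). States never seen
    with a parent configuration are absent from the result (probability 0).
    """
    complete = [r for r in rows if all(r[v] is not None for v in [child, *parents])]
    n_xj = Counter((tuple(r[p] for p in parents), r[child]) for r in complete)
    n_j = Counter(tuple(r[p] for p in parents) for r in complete)
    cpt = {}
    for (j, x), n in n_xj.items():
        cpt.setdefault(j, {})[x] = n / n_j[j]
    return cpt


rows = [
    {"MDWASHDC": "low", "MDBALTIM": "low"},
    {"MDWASHDC": "low", "MDBALTIM": "low"},
    {"MDWASHDC": "low", "MDBALTIM": "high"},
    {"MDWASHDC": "high", "MDBALTIM": "high"},
    {"MDWASHDC": None, "MDBALTIM": "low"},  # N/A: skipped in this sketch
]
print(mle_cpt(rows, "MDBALTIM", ["MDWASHDC"]))
```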


Page 36

Parameter Estimation


Page 37

Parameter Estimation

• Prior (domain expert) knowledge can be exploited.
  • The experience count is the number of times $\mathrm{pa}(X_i) = j$ has been observed.
  • The experience count is a positive number (> 0).
  • It is also used to turn learning on or off.

• Prior knowledge is used both to speed up learning and to guide it in the search for a global optimum.

• Expected counts are used when values are missing, including for parameters not appearing in the data.

• The EM algorithm is an iterative procedure that uses the current estimate of the parameters as if it were the true values. In the first run, the initial content is used.
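The EM loop described above can be sketched for the simplest case this tool faces: a Naive Bayes Model in which some cases have the target value missing. The sketch makes several simplifying assumptions. Information variables are fully observed. The initial content is uniform. A fixed number of iterations replaces a convergence test. Experience counts are not modelled. The function name and data are hypothetical, and this is not the Hugin implementation.

```python
def em_naive_bayes(rows, target, info_vars, states, iterations=20):
    """EM for a Naive Bayes Model when some cases have the target missing (None).

    Returns (prior, cpts) with cpts[var][h][x] = P(var = x | target = h).
    Assumes the information variables are fully observed in every case.
    """
    hs = states[target]
    # "Initial content": uniform distributions.
    prior = {h: 1 / len(hs) for h in hs}
    cpts = {v: {h: {x: 1 / len(states[v]) for x in states[v]} for h in hs} for v in info_vars}
    for _ in range(iterations):
        # E-step: expected counts, treating the current parameters as the true values.
        n_h = {h: 0.0 for h in hs}
        n_xh = {v: {h: {x: 0.0 for x in states[v]} for h in hs} for v in info_vars}
        for r in rows:
            if r[target] is not None:
                w = {h: 1.0 if h == r[target] else 0.0 for h in hs}
            else:
                score = {h: prior[h] for h in hs}
                for v in info_vars:
                    for h in hs:
                        score[h] *= cpts[v][h][r[v]]
                z = sum(score.values())
                w = {h: s / z for h, s in score.items()} if z > 0 else dict(prior)
            for h in hs:
                n_h[h] += w[h]
                for v in info_vars:
                    n_xh[v][h][r[v]] += w[h]
        # M-step: maximum likelihood estimates from the expected counts.
        total = sum(n_h.values())
        prior = {h: n_h[h] / total for h in hs}
        cpts = {
            v: {h: {x: (n_xh[v][h][x] / n_h[h] if n_h[h] > 0 else 1 / len(states[v]))
                    for x in states[v]} for h in hs}
            for v in info_vars
        }
    return prior, cpts


rows = [
    {"H": "a", "X": "0"}, {"H": "a", "X": "0"},
    {"H": "b", "X": "1"}, {"H": "b", "X": "1"},
    {"H": None, "X": "0"},  # target missing (N/A)
]
prior, cpts = em_naive_bayes(rows, "H", ["X"], {"H": ["a", "b"], "X": ["0", "1"]})
print(prior)  # converges towards {'a': 0.6, 'b': 0.4}
```

In this toy data, X = "0" has only ever been seen with H = "a". Each iteration therefore pushes more of the missing case's weight onto "a".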


Page 38

Inference

ph -inference <model> <target> <case> [-verbose]

• This command performs inference in <model>, which has <target> as the hypothesis variable. The posterior distribution of <target> is displayed for the case stored in the file <case>.

• Example: ph -inference nbm.net MDWASHDC case.hcs


Page 39

VOI in Bayesian Networks

• How do we perform value of information analysis without specifying utilities?

• The reason for acquiring more information is to decrease the uncertainty about the hypothesis.

• The entropy is a measure of how much probability mass is scattered around on the states (the degree of chaos).

• Thus, the entropy of a variable $X$ is $H(X) = -\sum_{x} P(x) \log P(x)$, with the convention $0 \log 0 = 0$.

• Entropy is a measure of randomness: the more random a variable is, the higher the entropy of its distribution.
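The entropy definition translates directly into Python. The sketch measures entropy in bits (base 2). The slides do not fix a base, so that choice is an assumption. Only the ranking of distributions matters for value-of-information analysis, and the ranking does not depend on the base.

```python
import math


def entropy(dist, base=2.0):
    """H(X) = -sum_x P(x) log P(x), with the convention 0 log 0 = 0 (bits by default)."""
    return -sum(p * math.log(p, base) for p in dist.values() if p > 0)


print(round(entropy({"low": 0.5, "high": 0.5}), 3))  # 1.0: maximally uncertain binary variable
print(round(entropy({"low": 0.9, "high": 0.1}), 3))  # 0.469: much less uncertain
```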


Page 40

Value of Information

• If the entropy is to be used as a value function, then $V(P(T)) = -H(T) = \sum_{t} P(t) \log P(t)$.

• We want to minimize the entropy.


Page 41

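Variables Value of Information

• What is the expected most informative observation?

• The information gain $I(T, X) = H(T) - H(T \mid X)$ is a measure of the reduction of the entropy of $T$ given $X$.

• The conditional entropy is $H(T \mid X) = -\sum_{x} P(x) \sum_{t} P(t \mid x) \log P(t \mid x)$.

• Let $T$ be the target; now select the $X$ with maximum information gain.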
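Selecting the most informative observation can be sketched in Python, given the joint distribution $P(T, X)$ for each candidate $X$. In a Bayesian network these joint tables would come from inference under the current evidence. Here they are hypothetical tables, and the function names are illustrative rather than the `ph` implementation.

```python
import math
from collections import defaultdict


def entropy(dist):
    """H = -sum p log2 p, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def information_gain(joint):
    """I(T, X) = H(T) - H(T | X), from a joint table {(t, x): P(t, x)}."""
    p_t, p_x = defaultdict(float), defaultdict(float)
    for (t, x), p in joint.items():
        p_t[t] += p
        p_x[x] += p
    # H(T | X) = sum_x P(x) H(T | X = x)
    h_t_given_x = sum(
        px * entropy({t: joint.get((t, x), 0.0) / px for t in p_t})
        for x, px in p_x.items()
        if px > 0
    )
    return entropy(p_t) - h_t_given_x


def most_informative(joints):
    """Name of the candidate observation X with maximum information gain about T."""
    return max(joints, key=lambda name: information_gain(joints[name]))


# Hypothetical joint distributions P(T, X) for two candidate observations.
perfect = {("lo", "lo"): 0.5, ("hi", "hi"): 0.5}
useless = {(t, x): 0.25 for t in ("lo", "hi") for x in ("lo", "hi")}
print(most_informative({"TXAUSTIN": useless, "MDBALTIM": perfect}))  # MDBALTIM
```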


Page 42

Variables Value of Information

• Assume we are interested in B, i.e. B is the target:

• We are interested in observing variable Y with most information on B:

• We select to observe and compute:

• Thus,


Page 43

Variables VOI Command Line

ph -voivariables <model> <target> <case> [-verbose]

• This command performs a value-of-information analysis on each non-observed variable given the observations in <case> relative to <target>. That is, for each unobserved variable in <case>, a measure of how well the variable predicts <target> is displayed.

• Example: ph -voivariables nbm.net MDWASHDC case_2.hcs


Page 44

Case Value of Information

• Assume T is the target of interest and assume we have a database of cases $D = \{c_1, \ldots, c_N\}$.

• The uncertainty in T can be measured as $H(T) = -\sum_{t} P(t) \log P(t)$:
  • A high value of $H(T)$ indicates high uncertainty.
  • A low value of $H(T)$ indicates low uncertainty.

[Figure: entropy for the binary case]

• We compute $H(T \mid c)$ for all cases $c$. The case $c$ producing the lowest value of $H(T \mid c)$ is considered the most informative with respect to T.
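Case ranking can be sketched in Python once the posterior $P(T \mid c)$ of each case is known, for example from an inference engine. The posteriors and case names below are hypothetical. The sketch ranks cases by $H(T \mid c)$ and includes the binary entropy function that the figure on this slide refers to.

```python
import math


def entropy(dist):
    """H(T | c) for a posterior {t: P(t | c)}, in bits; 0 log 0 is taken as 0."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def binary_entropy(p):
    """Entropy of a binary variable with P(T = 1) = p (maximal, 1 bit, at p = 0.5)."""
    return entropy({1: p, 0: 1 - p})


def rank_cases(posteriors):
    """Case names sorted by H(T | c); the first is the most informative w.r.t. T."""
    return sorted(posteriors, key=lambda case: entropy(posteriors[case]))


# Hypothetical posteriors P(T | c), as an inference engine might return them.
posteriors = {
    "case_1": {"low": 0.55, "high": 0.45},
    "case_2": {"low": 0.95, "high": 0.05},
    "case_3": {"low": 0.20, "high": 0.80},
}
print(rank_cases(posteriors))  # ['case_2', 'case_3', 'case_1']
```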


Page 45

Case VOI Command Line

ph -voicase <model> <target> <case> [-verbose]

• This command performs a value-of-information analysis on the case stored in <case> relative to <target>. That is, a measure of how well the case predicts <target> is displayed.

• Example: ph -voicase tan.net MDWASHDC case_2.dat


Page 46

Evidence Sensitivity Analysis

• Let $\varepsilon = \{\varepsilon_1, \ldots, \varepsilon_n\}$ be a set of observations and assume a single hypothesis $h$ is of interest.

• What if the observation $\varepsilon_i$ had not been made, but instead $\varepsilon'_i$?
  • This involves computing $P(h \mid (\varepsilon \setminus \{\varepsilon_i\}) \cup \{\varepsilon'_i\})$ and comparing the results.
  • This kind of analysis helps you determine whether a subset of the evidence acts for or against a hypothesis.


Page 47

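What-If Analysis

• What happens to the temperature in Washington, DC if the temperature in Austin, TX changes?

• Assume evidence $\varepsilon = \{\varepsilon_1, \ldots, \varepsilon_n\}$ and let $\varepsilon_i$ be the measured temperature in Austin, TX.

• We compute $P(T = t \mid (\varepsilon \setminus \{\varepsilon_i\}) \cup \{\varepsilon'_i\})$ for all states $\varepsilon'_i$ of the Austin, TX variable.

• Myopic what-if analysis: change the finding on one information variable and monitor the change in the probability of $T$.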
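Myopic what-if analysis can be sketched in Python on a Naive Bayes Model. The finding on one variable is replaced by each of its possible states in turn, the other findings are held fixed, and the posterior of the target is recomputed each time. The model uses the deck's city codes, but its two-state structure and all probabilities are hypothetical. The function names are illustrative only.

```python
def nbm_posterior(prior, likelihoods, evidence):
    """P(H | evidence) in a Naive Bayes Model (direct product; fine for few findings)."""
    scores = {}
    for h, p in prior.items():
        for var, state in evidence.items():
            p *= likelihoods[var][h][state]
        scores[h] = p
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}


def what_if(prior, likelihoods, evidence, variable):
    """Myopic what-if: replace the finding on `variable` by each of its states in turn.

    For each alternative state e' this computes P(H | (evidence minus the old
    finding) plus {variable = e'}), keeping all other findings fixed.
    """
    any_h = next(iter(prior))
    results = {}
    for alt in likelihoods[variable][any_h]:
        changed = dict(evidence)
        changed[variable] = alt
        results[alt] = nbm_posterior(prior, likelihoods, changed)
    return results


# Hypothetical NBM: how does P(MDWASHDC) respond to the finding on TXAUSTIN?
prior = {"low": 0.5, "high": 0.5}
likelihoods = {
    "MDBALTIM": {"low": {"low": 0.9, "high": 0.1}, "high": {"low": 0.15, "high": 0.85}},
    "TXAUSTIN": {"low": {"low": 0.7, "high": 0.3}, "high": {"low": 0.3, "high": 0.7}},
}
evidence = {"MDBALTIM": "low", "TXAUSTIN": "high"}
for alt, post in what_if(prior, likelihoods, evidence, "TXAUSTIN").items():
    print(alt, round(post["low"], 3))  # prints "low 0.933", then "high 0.72"
```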


Page 48

What-If Analysis: Cases

ph -whatif <model> <target> <case>

• This command performs a what-if analysis on each instantiated variable in the case file <case> relative to <target>. That is, the posterior distribution of each hypothesis (each state of the target variable) is displayed for each possible value of the observed variables.

• Example: ph -whatif model.net MDWASHDC case.hcs


Page 49

What-If Analysis: Variables

ph -whatif <model> <target> <case> <variable>

• This command performs a what-if analysis on <variable> relative to <target>. That is, the posterior distribution of each hypothesis (each state of the target variable) is displayed for each possible value of the indicated variable.

• Example: ph -whatif model.net MDWASHDC case.hcs TXAUSTIN


Page 50

Help

ph -help

• This command displays a simple help message.


Page 51

Create Weather Data File

weather2dat <output> <input> …

• This command will create a HUGIN data file from the input files specified as arguments.

• Example: weather2dat model.data MDWASHDC.txt MDBALTIM.txt LAVIENTIN.txt


Page 52

Pull a Web Page

pull <url> <output>

• This command will save the web page specified in <url> to a file named <output>.

• Example: pull http://www.hugin.com hugin.html


Page 53

Parse Structured HTML

struct2dat <html> [<output>]

• This command will parse <html> and output the content of any tables contained in <html>.

• The content is either output to standard output or stored in a file named <output>.

• Example: struct2dat page.html model.dat


Page 54

Parse Unstructured HTML

ustruct2dat <html> [<output>]

• This command will parse <html>, removing all HTML tags.

• The content is either output to standard output or stored in a file named <output>.

• Example: ustruct2dat page.html model.dat


Page 55

Classification to Data File

class2dat <model> <target> <classification> <output>

• This command will create a HUGIN data set based on the variables stored in <model> and the data stored in <classification>.

• The resulting data set will be stored in a file named <output>.

• Example: class2dat variables.net class classification.txt model.dat


Page 56

Feature Selection

• The identification of predictor variables proceeds by statistical tests for independence.

• We test the strength of the dependence between the classification variable and each potential predictor variable using a chi-squared test between class variable X and predictor Y:
  • Hypothesis: X and Y are independent (Y is not a predictor).
  • Compute the test statistic $\chi^2 = \sum_{x,y} \frac{(n_{xy} - e_{xy})^2}{e_{xy}}$, where $n_{xy}$ is the observed count and $e_{xy} = n_{x\cdot}\, n_{\cdot y} / n$ is the count expected under independence.
  • If $\chi^2$ is sufficiently small, the hypothesis is not rejected (so Y is not included in the model).

• The hypothesis is rejected if the probability of obtaining a larger statistic is less than 5% (significance level), using a chi-squared distribution with $(|X| - 1)(|Y| - 1)$ degrees of freedom.
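The test can be sketched in Python without external libraries. The chi-squared tail probability is computed from the recurrence for the regularized upper incomplete gamma function, $Q(s+1, y) = Q(s, y) + y^s e^{-y} / \Gamma(s+1)$. The sketch assumes a contingency table of counts with no all-zero row or column. The function names and counts are hypothetical, and it is not claimed that `class2net` is implemented this way.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable with integer df >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        q, s = math.exp(-y), 1.0                # Q(1, y)
    else:
        q, s = math.erfc(math.sqrt(y)), 0.5     # Q(1/2, y)
    # Recurrence Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1), up to s = df / 2.
    while s < df / 2:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1
    return q


def chi2_independence(table):
    """Pearson chi-squared statistic and p-value for a table (list of rows of counts)."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, chi2_sf(stat, df)


def is_predictor(table, alpha=0.05):
    """Keep Y if the independence hypothesis is rejected at significance level alpha."""
    _, p = chi2_independence(table)
    return p < alpha


# Hypothetical 2x2 counts: rows = class (yes/no), columns = word present (yes/no).
print(is_predictor([[30, 10], [10, 30]]))  # True: strong dependence, keep the word
print(is_predictor([[20, 20], [20, 20]]))  # False: no dependence, drop it
```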


Page 57

Identify Predictor Variables

class2net <target> <classification> <output>

• This command will identify a set of predictor variables based on the data stored in <classification>.

• The resulting set of predictor variables is saved to a HUGIN network file named <output>.

• Each variable is Boolean indicating whether or not the word represented by the variable is present.

• The predictor variables are identified based on statistical tests for pair-wise independence between <target> and each potential predictor variable (currently the significance level is 5%).

• Example: class2net class classification.txt variables.net


Page 58

Parse Unstructured HTML

ustruct2hcs <model> <target> <unstruct> …

• This command will parse a sequence of unstructured HTML files, creating one HUGIN case file for each HTML file.

• The command identifies whether or not each of the variables (except <target>) in the model stored in <model> is present in the HTML file

• The case file is either output to standard output or stored in a file named <output>

• Example: ustruct2hcs boolnbm.net class page1.html page2.html


Page 59

Extract Case From Data File

dat2hcs <model> <target> <data> <index> [-verbose]

• This command will save the case with index <index> in the data file named <data> to a file named "case_<index>.hcs".

• Example: dat2hcs nbm.net MDWASHDC model.dat 2


Page 60

Contact Information

Anders L. Madsen

Hugin Expert A/S

Gasværksvej 5

9000 Aalborg

Denmark

www.hugin.com

Phone: +45 96 55 07 90

Fax: +45 96 55 07 99

www.poulinhugin.com

Chris Poulin

Poulin Holdings LLC

P.O. Box 969

Portsmouth, NH 03802

US

www.poulinholdings.com

Phone: +1 617 755 9049

Fax: +1 207 351 2509