
Belief Augmented Frames

14 June 2004

Colin Tan

[email protected]://www.comp.nus.edu.sg/~ctank

Motivation

• Primary Objective:
  – To study how uncertain and defeasible knowledge may be integrated into a knowledge base.

• Main Deliverable:
  – A system of theories and techniques that allow us to integrate new knowledge we have gained, and to use this knowledge to make better inferences.

Proposed Solution

• A frame-based reasoning system augmented with belief measures.
  – Frame-based system to structure knowledge and relations between entities.
  – Belief measures provide uncertain reasoning on the existence of entities and the relationships between them.

Why Belief Measures?

• Statistical Measures
  – Standard tool for modeling uncertainty.
  – Essentially, if the probability that a proposition E is true is p, then the probability that E is false is 1 - p.
    • P(E) = p
    • P(not E) = 1 - p


• This relationship between P(E) and P(not E) introduces a problem:
  – This relationship essentially leaves no room for ignorance. Either the proposition is true with a probability of p, or it is false with a probability of 1 - p.
  – This can be counter-intuitive at times.


• [Shortliffe75] cites a study in which, given a set of symptoms, doctors were willing to declare with certainty x that a patient was suffering from a disease D, yet were unwilling to declare with certainty 1-x that the patient was not suffering from D.


• To allow for ignorance our research focuses on belief measures.

• The ability to model ignorance is inherent in belief systems.
  – E.g. in Dempster-Shafer Theory [Dempster67], if our beliefs in E1 and E2 are 0.1 and 0.3 respectively, then the ignorance is 1 - (0.1 + 0.3) = 0.6.

Why Frames?

• Frames are a powerful form of representation.
  – Intuitively represent relationships between objects using slot-filler pairs.
    • Simple to perform reasoning based on relationships.
  – Hierarchical
    • Can perform generalizations to create general models derived from a set of frames.

Why Frames?

• Frames are a powerful form of representation:
  – Daemons
    • Small programs that are invoked when a frame is instantiated or when a slot is filled.

Combining Frames with Uncertainty Measures

• Augmenting slot-value pairs with uncertainty values.
  – Enhances the expressiveness of relationships.
  – Can now do reasoning using the uncertainty values.

• A Belief Augmented Frame (BAF) is a frame structure augmented with belief measures.

Example BAF

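[Figure: example BAF. Each label in the diagram is annotated with its belief masses ($\varphi^T$, $\varphi^F$), listed here in the order they appear.]

| Label | $\varphi^T$ | $\varphi^F$ |
|---|---|---|
| Alice | 1.0 | 0.0 |
| owns | 0.7 | 0.2 |
| walks | 0.9 | 0.1 |
| Donkey | 0.6 | 0.3 |
| color | 1.0 | 0.0 |
| Grey | 1.0 | 0.0 |
| Dog | 0.9 | 0.0 |
| color | 1.0 | 0.0 |
| location | 1.0 | 0.0 |
| Blue | 1.0 | 0.0 |
| Bay | 1.0 | 0.0 |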

Belief Representation in Belief Augmented Frames

• Beliefs are represented by two masses:
  – $\varphi^T$: Belief mass supporting a proposition.
  – $\varphi^F$: Belief mass refuting a proposition.
  – In general $\varphi^T + \varphi^F \neq 1$.
    • Room to model ignorance of the facts.

• Separate belief masses allow us to:
  – Draw $\varphi^T$ and $\varphi^F$ from different sources.
  – Have different chains of reasoning for $\varphi^T$ and $\varphi^F$.


• This ability to derive the refuting masses from different sources and chains of reasoning is unique to BAF.
  – In Probabilistic Argumentation Systems (the closest competitor to BAF), for example, p(not E) = 1 - p(E).
  – It is possible, though, to achieve this in Dempster-Shafer Theory through the underlying mechanisms generating m(E) and m(not E).


• BAFs, however, give a formal framework for deriving $\varphi^T$ and $\varphi^F$.
  – BAF-Logic, a complete reasoning system for BAFs.

• BAFs provide a formal framework for frame operations.
  – E.g. how to generalize from a given set of frames.

• BAF and DST can in fact be complementary:
  – BAF as a basis for generating masses in DST.

Degree of Inclination

• The Degree of Inclination is defined as:
  – $DI = \varphi^T - \varphi^F$

• DI is in the range [-1, 1].

• One possible interpretation of DI:

| DI | Interpretation |
|---|---|
| -1 | False |
| -0.75 | Most Probably False |
| -0.5 | Probably False |
| -0.25 | Likely False |
| 0 | Ignorant |
| 0.25 | Likely True |
| 0.5 | Probably True |
| 0.75 | Most Probably True |
| 1 | True |
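The scale above can be turned into a small lookup. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, and snapping a DI value to the nearest scale point is an assumption, since the slide only labels the nine marked points.

```python
def degree_of_inclination(phi_t: float, phi_f: float) -> float:
    """DI = phi_T - phi_F, which lies in [-1, 1]."""
    return phi_t - phi_f


# Scale points and labels as given in the interpretation of DI.
DI_SCALE = [
    (-1.0, "False"),
    (-0.75, "Most Probably False"),
    (-0.5, "Probably False"),
    (-0.25, "Likely False"),
    (0.0, "Ignorant"),
    (0.25, "Likely True"),
    (0.5, "Probably True"),
    (0.75, "Most Probably True"),
    (1.0, "True"),
]


def interpret_di(di: float) -> str:
    """Return the label of the nearest scale point (assumed rounding rule)."""
    return min(DI_SCALE, key=lambda point: abs(point[0] - di))[1]


# Masses (0.9, 0.1) give DI = 0.8, which is closest to the 0.75 mark.
label = interpret_di(degree_of_inclination(0.9, 0.1))
```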

Utility Value

• The Degree of Inclination DI can be re-mapped to the range [0, 1] through the Utility function:
  – $U = (DI + 1) / 2$
  – By normalizing U across all relevant propositions, it becomes possible to use U as a statistical measure.
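A minimal sketch of the utility mapping follows. The slide does not say how U is normalized, so dividing each utility by the sum over all propositions is an assumption here, and the function names are hypothetical.

```python
def utility(phi_t: float, phi_f: float) -> float:
    """U = (DI + 1) / 2, mapping DI from [-1, 1] onto [0, 1]."""
    return (phi_t - phi_f + 1.0) / 2.0


def normalized_utilities(masses: dict) -> dict:
    """Divide each utility by the total so the results sum to 1.

    `masses` maps a proposition name to its (phi_T, phi_F) pair.
    Sum-normalization is an assumed scheme, not taken from the slides.
    """
    raw = {name: utility(t, f) for name, (t, f) in masses.items()}
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("all utilities are zero; cannot normalize")
    return {name: u / total for name, u in raw.items()}


weights = normalized_utilities({"A": (0.7, 0.2), "B": (0.9, 0.1), "C": (0.2, 0.7)})
```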

Plausibility, Ignorance, Evidential Interval

• Plausibility pl is defined as:
  $pl = 1 - \varphi^F$

• Ignorance ig is defined as:
  $ig = pl - \varphi^T = 1 - (\varphi^T + \varphi^F)$

• The Evidential Interval EI is defined to be the range:
  $EI = [\varphi^T, pl]$

Interpreting the Evidential Interval

| Evidential Interval | Interpretation |
|---|---|
| [0, 1] | Complete ignorance. |
| [0, 0] | The evidence provided completely refutes the fact. |
| [1, 1] | The evidence provided completely supports the fact. |
| $[\varphi^T, Pl]$ with $0 < \varphi^T, Pl < 1$ and $Pl \geq \varphi^T$ | The evidence both supports and refutes the fact. |
| $[\varphi^T, Pl]$ with $0 < \varphi^T, Pl < 1$ and $Pl < \varphi^T$ | The evidence supporting the fact exceeds the plausibility of the fact, i.e. the evidence is contradictory. |
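The interval table can be read as a small decision procedure. The sketch below is illustrative only: the function names and the returned strings are hypothetical, and intervals that fall outside the five tabulated rows are reported as such rather than guessed.

```python
def evidential_interval(phi_t: float, phi_f: float) -> tuple:
    """EI = [phi_T, pl], where pl = 1 - phi_F."""
    return phi_t, 1.0 - phi_f


def ignorance(phi_t: float, phi_f: float) -> float:
    """ig = pl - phi_T = 1 - (phi_T + phi_F)."""
    return 1.0 - (phi_t + phi_f)


def interpret_interval(phi_t: float, phi_f: float) -> str:
    """Classify the evidential interval using the rows of the table."""
    low, pl = evidential_interval(phi_t, phi_f)
    if (low, pl) == (0.0, 1.0):
        return "complete ignorance"
    if (low, pl) == (0.0, 0.0):
        return "completely refuted"
    if (low, pl) == (1.0, 1.0):
        return "completely supported"
    if 0.0 < low < 1.0 and 0.0 < pl < 1.0:
        return "contradictory" if pl < low else "supported and refuted"
    return "not covered by the table"


verdict = interpret_interval(0.42, 0.12)  # interval [0.42, 0.88]
```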

Reasoning with BAFs

• Belief Augmented Frame Logic, or BAF-Logic, is used for reasoning with BAFs.

• Throughout the remainder of this presentation, we will consider two propositions A and B, with supporting and refuting masses $\varphi^T_A$, $\varphi^F_A$, $\varphi^T_B$, and $\varphi^F_B$.

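Reasoning with BAFs: AND, OR, NOT

• $A \wedge B$:
  $\varphi^T_{A \wedge B} = \min(\varphi^T_A, \varphi^T_B)$
  $\varphi^F_{A \wedge B} = \max(\varphi^F_A, \varphi^F_B)$

• $A \vee B$:
  $\varphi^T_{A \vee B} = \max(\varphi^T_A, \varphi^T_B)$
  $\varphi^F_{A \vee B} = \min(\varphi^F_A, \varphi^F_B)$

• $\neg A$:
  $\varphi^T_{\neg A} = \varphi^F_A$
  $\varphi^F_{\neg A} = \varphi^T_A$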
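The three connectives translate directly into code. This is a minimal sketch with a hypothetical `Belief` class; overloading `&`, `|` and `~` is a design choice made here for readability, not something the slides prescribe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """A proposition's supporting mass t and refuting mass f."""
    t: float
    f: float

    def __and__(self, other: "Belief") -> "Belief":
        # AND: min of supporting masses, max of refuting masses.
        return Belief(min(self.t, other.t), max(self.f, other.f))

    def __or__(self, other: "Belief") -> "Belief":
        # OR: max of supporting masses, min of refuting masses.
        return Belief(max(self.t, other.t), min(self.f, other.f))

    def __invert__(self) -> "Belief":
        # NOT: swap supporting and refuting masses.
        return Belief(self.f, self.t)


A = Belief(0.7, 0.2)
B = Belief(0.9, 0.1)
conjunction = A & B
disjunction = A | B
negation = ~A
```

Because the operators only use min, max and a swap, De Morgan's Theorem can be checked numerically: `~(A & B)` and `(~A) | (~B)` produce the same pair of masses.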

Default Reasoning in BAF

• When the truth of a proposition is unknown, we set the supporting and refuting masses to $\varphi^T_{DEF}$ and $\varphi^F_{DEF}$ respectively.
  – Conventionally, $\varphi^T_{DEF} = \varphi^F_{DEF} = 0$.

• Two special default values:
  $\varphi^T_{ONE} = 1$, $\varphi^F_{ONE} = 0$
  $\varphi^T_{ZERO} = 0$, $\varphi^F_{ZERO} = 1$

• Used for defining contradiction and tautology.

Default Reasoning in BAF

• Other default reasoning models are possible too.
  – E.g. categorical defaults:
    • : (A, $\varphi^T_A$, $\varphi^F_A$) (B, $\varphi^T_B$, $\varphi^F_B$) / (B, $\varphi^T_B$, $\varphi^F_B$)
    • Semantics:
      – Given a knowledge base KB.
      – If KB :- A and KB :-/- B, infer B with supporting and refuting masses $\varphi^T_B$ and $\varphi^F_B$.
  – Detailed study of this topic still to be made.

BAF and Propositional Logic

• BAF-Logic properties that are identical to Propositional Logic:
  – Associativity, Commutativity, Distributivity, Idempotency, Absorption, De Morgan's Theorem, - elimination.


• Other properties of Propositional Logic work slightly differently in BAF-Logic.
  – In particular, some of the properties hold true only if the constituent propositions are at least "probably true" or "probably false".
    • I.e. $|DI_P| \geq 0.5$


• For example, $P$ and $P \rightarrow Q$ must both be at least probably true for Q to not be false.
  – If $DI_P$ and $DI_{P \rightarrow Q}$ are less than 0.5, $DI_Q$ might end up < 0.

• For $\vee$-elimination, $P \vee Q$ must be probably true, and P must be probably false, before we can infer that Q is not false.


• This can lead to unexpected reasoning results.
  – E.g. $P$ and $P \rightarrow Q$ are not false, yet $DI_Q < 0$.

• A possible solution is to set $\{\varphi^T_Q = \varphi^T_{DEF}, \varphi^F_Q = \varphi^F_{DEF}\}$ when $DI_P$ and $DI_{P \rightarrow Q}$ are less than 0.5.
  • In actual fact, the magnitudes of $DI_P$ and $DI_{P \rightarrow Q}$ don't both have to be at least 0.5. Only their average magnitude must be at least 0.5.

Belief Revision

• Beliefs are not static. We need a mechanism to update beliefs [Pollock00].

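• To track the revision of belief masses, we add a subscript t to time-stamp the masses.
  – E.g. $\varphi^T_{P,0}$ is the value of $\varphi^T_P$ at time 0, $\varphi^T_{P,1}$ at time 1, etc.

• At time t, given a proposition P with masses $\varphi^T_{P,t}$ and $\varphi^F_{P,t}$, suppose we derive masses $\varphi^T_{P,*}$ and $\varphi^F_{P,*}$. Then the new belief masses at time t+1 are:
  $\varphi^T_{P,t+1} = \alpha \, \varphi^T_{P,t} + (1 - \alpha) \, \varphi^T_{P,*}$
  $\varphi^F_{P,t+1} = \alpha \, \varphi^F_{P,t} + (1 - \alpha) \, \varphi^F_{P,*}$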

Belief Revision

• Intuitively, this means that we give a credibility factor $\alpha$ to the existing masses, and $(1 - \alpha)$ to the derived masses.

• $\alpha$ therefore controls the rate at which beliefs are revised, given new evidence.
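A minimal sketch of the revision rule follows. The function name `revise` is hypothetical, and restricting the credibility factor to [0, 1] is an assumption made so that the result stays a weighted average of the old and derived masses.

```python
def revise(current: tuple, derived: tuple, alpha: float) -> tuple:
    """Blend existing masses with newly derived masses.

    current: (phi_T, phi_F) at time t
    derived: (phi_T, phi_F) obtained from new evidence
    alpha:   credibility given to the existing masses (assumed in [0, 1])
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    t_now, f_now = current
    t_new, f_new = derived
    return (
        alpha * t_now + (1.0 - alpha) * t_new,
        alpha * f_now + (1.0 - alpha) * f_new,
    )


# Starting from no belief either way, accept derived masses (0.7, 0.2)
# with credibility 0.4 for the old masses.
updated = revise((0.0, 0.0), (0.7, 0.2), alpha=0.4)
```

With a credibility factor of 1 the belief never changes, and with 0 the derived masses replace the old ones outright.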

An Example

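• Given the following propositions in your knowledge base:
  – $KB = \{(A, 0.7, 0.2), (B, 0.9, 0.1), (C, 0.2, 0.7), (A \wedge B \rightarrow R, \varphi^T_{ONE}, \varphi^F_{ONE}), ((A \rightarrow B) \rightarrow R, \varphi^T_{ONE}, \varphi^F_{ONE})\}$
  – We want to derive $\varphi^T_{R,1}$ and $\varphi^F_{R,1}$.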

An Example

• Combining our clauses regarding R, we obtain:
  – $R = (A \wedge B) \wedge (A \rightarrow B) = A \wedge B \wedge (\neg A \vee B)$

• With De Morgan's Theorem we can derive $\neg R$:
  – $\neg R = \neg A \vee \neg B \vee (A \wedge \neg B)$

An Example

$\varphi^T_{R,*} = \min(\varphi^T_A, \varphi^T_B, \max(\varphi^F_A, \varphi^T_B))$
  = min(0.7, 0.9, max(0.2, 0.9))
  = min(0.7, 0.9, 0.9)
  = 0.7

$\varphi^F_{R,*} = \max(\varphi^F_A, \varphi^F_B, \min(\varphi^T_A, \varphi^F_B))$
  = max(0.2, 0.1, min(0.7, 0.1))
  = max(0.2, 0.1, 0.1)
  = 0.2

An Example

• We begin with default values for R:
  $\varphi^T_{R,0} = \varphi^T_{DEF} = 0.0$
  $\varphi^F_{R,0} = \varphi^F_{DEF} = 0.0$

• This gives us the following attributes:

An Example

| Measure | Value |
|---|---|
| $DI_{R,0}$ | 0.0 |
| $Pl_{R,0}$ | 1.0 |
| $Ig_{R,0}$ | 1.0 |
| $EI_{R,0}$ | [0.0, 1.0] |

An Example

• Deriving the new belief values with $\alpha = 0.4$:
  $\varphi^T_{R,1}$ = 0.4 * 0.0 + (1.0 - 0.4) * 0.7 = 0.42
  $\varphi^F_{R,1}$ = 0.4 * 0.0 + (1.0 - 0.4) * 0.2 = 0.12

• This gives us:

An Example

| Measure | Value |
|---|---|
| $DI_{R,1}$ | 0.42 - 0.12 = 0.30 |
| $Pl_{R,1}$ | 1.0 - 0.12 = 0.88 |
| $Ig_{R,1}$ | 0.88 - 0.42 = 0.46 |
| $EI_{R,1}$ | [0.42, 0.88] |

An Example

• We see that with our new information about R, our ignorance falls from 1.0 (total ignorance) to 0.46. With more knowledge available about whether R is true, we also see the plausibility falling from 1.0 to 0.88.

• Further, suppose it is now known that:
  – $B \wedge C \rightarrow R$

An Example

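• Combining our clauses regarding R, we obtain:
  – $R = (A \wedge B) \wedge (B \wedge C) \wedge (A \rightarrow B) = A \wedge B \wedge C \wedge (\neg A \vee B)$

• With De Morgan's Theorem we can derive $\neg R$:
  – $\neg R = \neg A \vee \neg B \vee \neg C \vee (A \wedge \neg B)$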

An Example

$\varphi^T_{R,*} = \min(\varphi^T_A, \varphi^T_B, \varphi^T_C, \max(\varphi^F_A, \varphi^T_B))$
  = min(0.7, 0.9, 0.2, max(0.2, 0.9))
  = min(0.7, 0.9, 0.2, 0.9)
  = 0.2

$\varphi^F_{R,*} = \max(\varphi^F_A, \varphi^F_B, \varphi^F_C, \min(\varphi^T_A, \varphi^F_B))$
  = max(0.2, 0.1, 0.7, min(0.7, 0.1))
  = max(0.2, 0.1, 0.7, 0.1)
  = 0.7

An Example

• Updating the beliefs:
  $\varphi^T_{R,2}$ = 0.4 * 0.42 + (1.0 - 0.4) * 0.2 = 0.288
  $\varphi^F_{R,2}$ = 0.4 * 0.12 + (1.0 - 0.4) * 0.7 = 0.468

• This gives us:

An Example

| Measure | Value |
|---|---|
| $DI_{R,2}$ | 0.288 - 0.468 = -0.18 |
| $Pl_{R,2}$ | 1.0 - 0.468 = 0.532 |
| $Ig_{R,2}$ | 0.532 - 0.288 = 0.244 |
| $EI_{R,2}$ | [0.288, 0.532] |
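The two rounds of this worked example can be replayed end to end. The sketch below is one possible encoding, assuming masses are plain `(phi_T, phi_F)` tuples and using hypothetical helper names (`conj`, `disj`, `neg`, `revise`, `measures`); the numbers are the ones used in the example.

```python
def conj(*masses):
    """AND: min of supporting masses, max of refuting masses."""
    return min(t for t, _ in masses), max(f for _, f in masses)


def disj(*masses):
    """OR: max of supporting masses, min of refuting masses."""
    return max(t for t, _ in masses), min(f for _, f in masses)


def neg(mass):
    """NOT: swap supporting and refuting masses."""
    return mass[1], mass[0]


def revise(current, derived, alpha):
    """Weighted blend of existing and derived masses."""
    return tuple(alpha * c + (1.0 - alpha) * d for c, d in zip(current, derived))


def measures(mass):
    """DI, plausibility, ignorance and evidential interval for one pair."""
    t, f = mass
    return {"DI": t - f, "Pl": 1.0 - f, "Ig": 1.0 - (t + f), "EI": (t, 1.0 - f)}


A, B, C = (0.7, 0.2), (0.9, 0.1), (0.2, 0.7)
ALPHA = 0.4

r0 = (0.0, 0.0)                             # default masses for R
star1 = conj(A, B, disj(neg(A), B))         # R = A and B and (not A or B)
r1 = revise(r0, star1, ALPHA)
star2 = conj(A, B, C, disj(neg(A), B))      # after learning B and C -> R
r2 = revise(r1, star2, ALPHA)
m1, m2 = measures(r1), measures(r2)
```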

An Example

• Here the new evidence that $B \wedge C \rightarrow R$ fails to support R, because C is not true ($DI_C = -0.5$).

• Hence the plausibility of R falls from 0.88 to 0.532, while the truth value $DI_{R,2}$ enters the negative range.

Integrating Belief Measures with Frames

• Belief measures to quantify:
  – The existence of the object/concept represented by the frame.
  – The existence of relations between frames.

Frames with Belief Measures

[Figure: frame network annotated with belief measures. Labels appearing in the diagram: Kyle, Ike, Kenny, Stan, Camp, South Park, Baby, and the relations At, Brother, Friend, Likes, IsA, Kicked, Insults, Helped, StudiesAt. The belief mass pairs shown are (1.0, 0.0), (0.9, 0.1), (0.8, 0.2) and (0.75, 0.25).]


• Deriving Belief Values
  – BAF-Logic statements can be used to derive belief measures.

• For example, suppose we propose that:
  – Sam is Bob's son if Sam is male and Bob has a child.
  – Within our knowledge base, we have {(Sam is male, 0.6, 0.2), (Bob has child, 0.8, 0.1), (Sam is male $\wedge$ Bob has child $\rightarrow$ Sam is Bob's son, 0.7, 0.1)}


• Assuming that $\alpha = 0$, we can derive:
  $\varphi^T_{sam,son,bob}$ = min(0.6, 0.8, 0.7) = 0.6
  $\varphi^F_{sam,son,bob}$ = max(0.2, 0.1, 0.1) = 0.2
  $DI_{sam,son,bob}$ = 0.4
  $Pl_{sam,son,bob}$ = 0.8
  $Ig_{sam,son,bob}$ = 0.2


• Daemons
  – Can be activated based on belief masses, DI, EI, Ig and Pl values.
  – Can act on DI, EI, Ig, Pl values for further processing.
    • E.g. if it is likely that Sam is Bob's son, and if the ignorance is less than 0.2, create a new frame School, and set the (Sam, Student, School) relationship.
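A daemon of this kind could be sketched as a callback that runs whenever a slot is filled. Everything below is hypothetical scaffolding rather than the author's implementation: the `Frame` class, the `frames` registry, reading "likely" as DI of at least 0.25, the masses (1.0, 0.0) given to the new relation, and the input masses (0.7, 0.2) chosen so that the daemon fires.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Masses = Tuple[float, float]  # (supporting mass, refuting mass)


@dataclass
class Frame:
    name: str
    relations: Dict[Tuple[str, str], Masses] = field(default_factory=dict)
    daemons: List[Callable] = field(default_factory=list)

    def add_rel(self, relation: str, target: str, masses: Masses) -> None:
        """Fill a slot, then invoke every daemon attached to this frame."""
        self.relations[(relation, target)] = masses
        for daemon in self.daemons:
            daemon(self, relation, target, masses)


frames: Dict[str, Frame] = {}  # hypothetical global frame registry


def get_frame(name: str) -> Frame:
    """Return the named frame, creating it on first use."""
    if name not in frames:
        frames[name] = Frame(name)
    return frames[name]


def school_daemon(frame: Frame, relation: str, target: str, masses: Masses) -> None:
    """If a 'son' relation is likely and ignorance is low, enrol in School."""
    phi_t, phi_f = masses
    di = phi_t - phi_f
    ig = 1.0 - (phi_t + phi_f)
    if relation == "son" and di >= 0.25 and ig < 0.2:
        get_frame("School")
        frame.relations[("student", "School")] = (1.0, 0.0)  # assumed masses


sam = get_frame("Sam")
sam.daemons.append(school_daemon)
sam.add_rel("son", "Bob", (0.7, 0.2))  # DI = 0.5, ignorance = 0.1
```

The daemon writes the new relation directly into `relations` instead of calling `add_rel`, which keeps it from re-triggering itself.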

Frame Operations

• add_frame, del_frame, add_rel, etc.

• More interesting operations include abstract:
  – Given a set of frames:
  – Create a super-frame that is the parent of the set of frames.
  – Copy relations that occur in at least a given percentage of the set of frames to the super-frame.
  – Set the belief masses to be a composition of all the belief masses in the set for that relation.
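The abstract operation might look like the sketch below. The slide leaves the composition rule open, so combining masses conjunctively (min of supporting masses, max of refuting masses) is an assumption here, as are the function name, the dictionary layout, and the sample frames.

```python
def abstract(frames: dict, threshold: float) -> dict:
    """Build the relations of a super-frame from a set of frames.

    frames:    maps a frame name to {(relation, target): (phi_T, phi_F)}
    threshold: fraction of frames (0 to 1) that must share a relation
    Returns the relations copied to the super-frame, with masses combined
    by min/max (an assumed composition rule).
    """
    seen = {}
    for relations in frames.values():
        for key, masses in relations.items():
            seen.setdefault(key, []).append(masses)

    total = len(frames)
    return {
        key: (min(t for t, _ in found), max(f for _, f in found))
        for key, found in seen.items()
        if len(found) / total >= threshold
    }


# Hypothetical frames, invented for illustration only.
animals = {
    "Donkey": {("color", "Grey"): (1.0, 0.0), ("legs", "4"): (0.9, 0.1)},
    "Dog": {("legs", "4"): (0.8, 0.0)},
}
strict = abstract(animals, threshold=0.75)
loose = abstract(animals, threshold=0.5)
```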

Application Examples: Discourse Understanding

• Discourse can be translated to a machine understandable form before being cast as BAFs.

• Discourse Representation Structures (DRS) are particularly useful.
  – The algorithm to convert from DRS to BAF is trivial [Tan03].


• Setting Belief Masses
  – Initial belief masses may be set using fuzzy sets.
    • E.g. to model a person being helpful:
      – $S_{helpful}$ = {1.0/"invaluable", 0.75/"very helpful", 0.5/"helpful", 0.25/"unhelpful", 0.0/"uncooperative"}
    • If we say that Kenny is very helpful, we can set:
      $\varphi^T_{kenny\_helpful}$ = 0.75
      $\varphi^F_{kenny\_helpful}$ = 1.0 - 0.75 = 0.25
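As a minimal sketch, the fuzzy set can be stored as a mapping from linguistic term to membership grade, with the refuting mass taken as one minus the grade as in the Kenny example. The function name is hypothetical.

```python
# Membership grades for "helpful", as listed in the fuzzy set.
S_HELPFUL = {
    "invaluable": 1.0,
    "very helpful": 0.75,
    "helpful": 0.5,
    "unhelpful": 0.25,
    "uncooperative": 0.0,
}


def masses_from_fuzzy(term: str, fuzzy_set: dict) -> tuple:
    """Return (phi_T, phi_F) with phi_T = grade and phi_F = 1 - grade."""
    grade = fuzzy_set[term]
    return grade, 1.0 - grade


kenny_helpful = masses_from_fuzzy("very helpful", S_HELPFUL)
```

Masses initialised this way always sum to 1, so they start with zero ignorance; later rules and revisions can move them away from that.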


• Further propositions and rules may be inserted into the knowledge base to perform reasoning on the initial belief masses.

• Propositions and rules are modeled as Prolog clauses.

Application Examples: Text Classification

• Can model text classification as a BAF problem:
  – In BAF-Logic the jth document $D_{ij}$ in the document class $c_i$ is taken to be a conjunction of terms $t_k$:
    • $D_{ij} = t_{ij0} \wedge t_{ij1} \wedge \ldots \wedge t_{ij(n-1)}$
  – Each term and document is related by a set of relations:
    • $R_{ijk} = \{(D_{ij}, term, t_k, \varphi^T_{ijk}, \varphi^F_{ijk}) \mid t_k$ is a term in $D_{ij}\}$


• Given a set of documents D in class $c_i$, we apply the abstract operator to produce the set of relations characterizing $c_i$.
  – $v = (S_{i0}, S_{i1}, S_{i2}, \ldots, S_{i(m-1)})$

• Each $S_{ik}$ is the relation:
  – $S_{ik} = \{(c_i, term, t_k, \varphi^T_{ik}, \varphi^F_{ik}) \mid t_k$ occurs in at least a given percentage of documents $D_j$ in class $c_i\}$
  $\varphi^T_{ik} = \min_j \varphi^T_{ijk}$
  $\varphi^F_{ik} = \max_l \max_j \varphi^T_{ljk}, \quad l \neq i$

• $\varphi^T_{ik}$ is our belief that the term $t_k$ implies that the document belongs to class $c_i$.

• $\varphi^F_{ik}$ is our belief that the term $t_k$ implies that the document belongs to some other class $c_l$.
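The two class-level masses can be computed from per-document supporting masses as sketched below. This is an illustration under several assumptions: the nested-dictionary layout and function name are hypothetical, the occurrence threshold is ignored (every term that occurs is kept), the minimum is taken over the documents that contain the term, and the refuting mass defaults to 0.0 when no other class contains the term. The sample corpus is invented.

```python
def class_term_masses(term_masses: dict) -> dict:
    """Derive (phi_T_ik, phi_F_ik) for every class i and term k.

    term_masses[c][d][t] is the supporting mass of term t in document d
    of class c. phi_T_ik is the minimum over documents of class i;
    phi_F_ik is the maximum over all documents of every other class.
    """
    result = {}
    for ci, docs in term_masses.items():
        result[ci] = {}
        terms = {t for doc in docs.values() for t in doc}
        for tk in terms:
            phi_t = min(doc[tk] for doc in docs.values() if tk in doc)
            elsewhere = [
                doc[tk]
                for cl, other_docs in term_masses.items()
                if cl != ci
                for doc in other_docs.values()
                if tk in doc
            ]
            # Assumed default: no refuting evidence if the term is unseen elsewhere.
            result[ci][tk] = (phi_t, max(elsewhere, default=0.0))
    return result


# Hypothetical two-class corpus, invented for illustration.
corpus = {
    "sport": {"d1": {"ball": 0.8, "goal": 0.6}, "d2": {"ball": 0.5}},
    "tech": {"d3": {"ball": 0.1, "chip": 0.9}},
}
masses = class_term_masses(corpus)
```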

• Given an unseen document $D_u$, we derive the keyword terms $t_{unk,k}$. We can derive the following masses that support and refute the proposition that $D_u$ belongs to class $c_i$.


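$\varphi^T_{i,unk} = \min(\varphi^T_{i0}, \varphi^T_{i1}, \ldots, \max(\varphi^F_{i0}, \varphi^F_{i1}, \ldots))$

$\varphi^F_{i,unk} = \max(\varphi^F_{i0}, \varphi^F_{i1}, \ldots, \min(\varphi^T_{i0}, \varphi^T_{i1}, \ldots))$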

• From this we derive the degree of inclination using the standard definition:
  – $DI_{i,unk} = \varphi^T_{i,unk} - \varphi^F_{i,unk}$

• We choose the class with the largest DI as the winner.
  – $win = \arg\max_i DI_{i,unk}$
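The final selection step is a plain argmax over the per-class degrees of inclination. A minimal sketch, assuming the per-class masses for the unseen document have already been derived; the function name and the sample masses are hypothetical.

```python
def winning_class(class_masses: dict) -> str:
    """Pick the class whose (phi_T, phi_F) pair gives the largest DI."""
    di = {name: t - f for name, (t, f) in class_masses.items()}
    return max(di, key=di.get)


# Invented masses for an unseen document against two classes.
winner = winning_class({"sport": (0.6, 0.1), "tech": (0.3, 0.4)})
```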

Text Classification: Experiment I

• Corpus used: 20 Newsgroups
  – 20,000 USENET articles culled from 20 newsgroups.
  – 19,600 articles to train classifiers, 400 to test.
  – Relatively poor performance from classifiers due to the nature of USENET postings.

• Jeffreys-Perks Law used to smooth statistics.

Classification Results: Inside Testing

[Figure: "Text Classification - Inside Test". Classification accuracy (%) plotted against degree of abstraction (0%, 10%, 20%) for NBAYES, BAF and PAS.]

Classification Results: Outside Testing

[Figure: "Text Classification - Outside Test". Classification accuracy (%) at 0%, 10% and 20% for NBAYES, BAF and PAS.]

Classification Results: Overall

[Figure: "Classification Results - Overall". Classification accuracy (%) at 0%, 10% and 20% for NBAYES, BAF and PAS.]

Text Classification: Analysis

• Both BAF and Probabilistic Argumentation Systems (PAS) perform better than Naïve Bayes (NBAYES).

• BAF performs significantly better than PAS for unseen documents.

• However performance for seen documents is mixed. PAS and BAF appear to have similar performance.

Text Classification: Experiment II

• Corpus Used: Reuters Newswire articles
  – 2,000 articles in 25 categories for training.
  – 500 articles for testing.

• Results:
  – Similar to Experiment I
    • Compared with PAS, mixed performance for seen data.
    • Superior performance for unseen data.
    • PAS and BAF both have superior performance to Naïve Bayes.

Text Classification: Conclusions

• Both BAF and PAS perform better than Naïve Bayes.

• BAF and PAS have similar performance for seen data.

• BAF has better performance over PAS for unseen data.

Publications

C. K. Y. Tan, K. T. Lua, “Discourse Understanding with Discourse Representation Theory and Belief Augmented Frames”, 2nd International Conference on Computational Intelligence, Robotics and Autonomous Systems, Singapore, 2003.

C. K. Y. Tan, K. T. Lua, “Belief Augmented Frames for Knowledge Representation in Spoken Dialogue Systems”, 1st International Indian Conference on Artificial Intelligence, Hyderabad, India, 2003.

Publications

C. K. Y. Tan, “Text Classification using Belief Augmented Frames”, 8th Pacific Rim International Conference on Artificial Intelligence, Auckland, 2004.

C. K. Y. Tan, “Belief Augmented Frames”, Doctoral Thesis, Department of Computer Science, School of Computing, National University of Singapore, 2003.

Current and Future Work

• Currently:
  – Developing a BAF Reasoning Engine

• Future:
  – Dialog Management using BAFs
  – Automatic Text Classification
  – AI Engine for Game Playing

Conclusion

• Use of belief measures to quantify uncertainty.
  – Room for ignorance

• Use of Frames to organize knowledge.
  – Frames represent objects or ideas in the world.
  – Slot-filler pairs represent relations between frames.
  – Relations are weighted by belief measures.

References

• [Shortliffe75] E. H. Shortliffe, B. G. Buchanan, “A Model of Inexact Reasoning in Medicine”, Mathematical Biosciences Vol 23, pp 351-379, 1975.

• [Dempster67] A. P. Dempster, “Upper and Lower Probabilities Induced by a Multivalued Mapping”, The Annals of Mathematical Statistics Vol 38 No 2, pp 325-339, 1967.

References

• [Pollock00] J. L. Pollock, A. S. Gillies, “Belief Revision and Epistemology”, Synthese 122, pp 69-92, 2000.

• [Tan03] C. K. Y. Tan, K. T. Lua, “Discourse Understanding with Discourse Representation Theory and Belief Augmented Frames”, 2nd International Conference on Computational Intelligence, Robotics and Autonomous Systems, Singapore, 2003.
