A Probabilistic Relational Data Model for Uncertain Information. Nguyen Hoa and Tran Duc Hieu. IEEE 2013, the 3rd International Conference on Information Science and Technology (ICIST 2013), March 23-25, Yangzhou, Jiangsu, China & March 27-28, Phuket, Thailand. Reporter: Tran Duc Hieu, Department for Computational and Knowledge Engineering, Institute of Applied Mechanics and Informatics, Vietnam Academy of Science and Technology.


Page 1: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

A Probabilistic Relational Data Model for Uncertain Information

Nguyen Hoa and Tran Duc Hieu

IEEE 2013, the 3rd International Conference on Information Science and Technology (ICIST 2013)

March 23-25, Yangzhou, Jiangsu, China & March 27-28, Phuket, Thailand

Reporter: Tran Duc Hieu

Department for Computational and Knowledge Engineering

Institute of Applied Mechanics and Informatics

Vietnam Academy of Science and Technology

Page 2: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Contents

1. Introduction

2. Uncertain Attribute Values

3. Probabilistic Relational Data Base Model

4. Selection Operation

5. PRDB Management System

6. Conclusions and Future Works

March 28th, 2013 Nguyen Hoa & Tran Duc Hieu 2

Page 3: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Introduction

• Motivation

The traditional relational database (RDB) model is limited in representing and handling uncertain and imprecise information

Uncertain or imprecise information is important and common in daily life

Page 4: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Introduction

• Objectives

Build a new Probabilistic Relational Data Base (PRDB) model to represent and handle uncertain information in the real world

Build an initial PRDB-SQLite management system to demonstrate that PRDB can be applied and processed in practice

Page 5: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Some Probabilistic Combination Strategies

• Given prob(e1) = [L1, U1] and prob(e2) = [L2, U2], the intervals prob(e1 ∧ e2), prob(e1 ∨ e2), and prob(e1 ∧ ¬e2) are calculated as follows

| Strategy | Operator | Result |
| --- | --- | --- |
| Independence | [L1, U1] ⊗in [L2, U2] | [L1 · L2, U1 · U2] |
| Independence | [L1, U1] ⊕in [L2, U2] | [L1 + L2 - (L1 · L2), U1 + U2 - (U1 · U2)] |
| Independence | [L1, U1] ⊖in [L2, U2] | [L1 · (1 - U2), U1 · (1 - L2)] |
| Mutual Exclusion | [L1, U1] ⊗me [L2, U2] | [0, 0] |
| Mutual Exclusion | [L1, U1] ⊕me [L2, U2] | [min(1, L1 + L2), min(1, U1 + U2)] |
| Mutual Exclusion | [L1, U1] ⊖me [L2, U2] | [L1, min(U1, 1 - L2)] |
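The six combination rules can be sketched directly as interval functions. This is a minimal illustration, not the authors' implementation: the names `conj_in`, `disj_in`, `diff_in`, `conj_me`, `disj_me`, and `diff_me` are hypothetical, and an interval is assumed to be a plain `(lower, upper)` tuple.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound); representation is an assumption


def conj_in(a: Interval, b: Interval) -> Interval:
    """Conjunction under independence: [L1*L2, U1*U2]."""
    return (a[0] * b[0], a[1] * b[1])


def disj_in(a: Interval, b: Interval) -> Interval:
    """Disjunction under independence: [L1+L2-L1*L2, U1+U2-U1*U2]."""
    return (a[0] + b[0] - a[0] * b[0], a[1] + b[1] - a[1] * b[1])


def diff_in(a: Interval, b: Interval) -> Interval:
    """Difference under independence: [L1*(1-U2), U1*(1-L2)]."""
    return (a[0] * (1 - b[1]), a[1] * (1 - b[0]))


def conj_me(a: Interval, b: Interval) -> Interval:
    """Conjunction under mutual exclusion: always [0, 0]."""
    return (0.0, 0.0)


def disj_me(a: Interval, b: Interval) -> Interval:
    """Disjunction under mutual exclusion: [min(1, L1+L2), min(1, U1+U2)]."""
    return (min(1.0, a[0] + b[0]), min(1.0, a[1] + b[1]))


def diff_me(a: Interval, b: Interval) -> Interval:
    """Difference under mutual exclusion: [L1, min(U1, 1-L2)]."""
    return (a[0], min(a[1], 1 - b[0]))


# Illustrative inputs (not taken from the slides).
p1: Interval = (0.5, 0.75)
p2: Interval = (0.25, 0.5)
independent_and = conj_in(p1, p2)
exclusive_or = disj_me(p1, p2)
```

Passing the strategy as a function keeps later operations (equality, selection) independent of which dependence assumption is chosen.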

Page 6: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Uncertain Attribute Values

• In PRDB the value of each attribute A is a probabilistic triple ⟨V, α, β⟩

V: a set of values of the attribute A (V = {v1, v2, …, vk})

α, β: lower-bound and upper-bound probability distributions on V

The attribute A takes a value v in V with a probability that belongs to [α(v), β(v)]
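One possible in-memory representation of a probabilistic triple is sketched below. The class `ProbTriple` and the helper `uniform_triple` are hypothetical names. The helper also assumes that the symbol u used in the example relations denotes the uniform distribution on V (u(v) = 1/|V|), so that 0.8u and 1.2u scale it; the slides do not define u, so treat that reading as an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Tuple


@dataclass
class ProbTriple:
    """A probabilistic triple <V, alpha, beta>; V is the key set of both dicts."""

    alpha: Dict[Hashable, float]  # lower-bound distribution on V
    beta: Dict[Hashable, float]   # upper-bound distribution on V

    def __post_init__(self) -> None:
        if self.alpha.keys() != self.beta.keys():
            raise ValueError("alpha and beta must be defined on the same set V")
        for v in self.alpha:
            if not 0.0 <= self.alpha[v] <= self.beta[v] <= 1.0:
                raise ValueError(f"need 0 <= alpha(v) <= beta(v) <= 1 for {v!r}")

    @property
    def values(self) -> frozenset:
        """The value set V."""
        return frozenset(self.alpha)

    def interval(self, v: Hashable) -> Tuple[float, float]:
        """[alpha(v), beta(v)], or [0, 0] when v is not in V."""
        if v not in self.alpha:
            return (0.0, 0.0)
        return (self.alpha[v], self.beta[v])


def uniform_triple(values: Iterable[Hashable],
                   lo_scale: float = 1.0,
                   hi_scale: float = 1.0) -> ProbTriple:
    """Build <V, lo_scale*u, hi_scale*u>, assuming u is uniform on V."""
    vs = list(values)
    if not vs:
        raise ValueError("V must not be empty")
    u = 1.0 / len(vs)
    return ProbTriple(
        alpha={v: lo_scale * u for v in vs},
        beta={v: min(1.0, hi_scale * u) for v in vs},
    )


# <{lung cancer, tuberculosis}, 0.8u, 1.2u> under the uniform reading of u.
disease = uniform_triple(["lung cancer", "tuberculosis"], 0.8, 1.2)
```

Under that reading, each of the two diseases has a probability between 0.4 and 0.6.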

Page 7: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

PRDB Model

The PRDB model extends the RDB model by integrating uncertain attribute values

Each tuple of a relation is a list of probabilistic triples

t = (⟨V1, α1, β1⟩, ⟨V2, α2, β2⟩, …, ⟨Vk, αk, βk⟩)

Page 8: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Probabilistic Relations

A probabilistic relation r over a probabilistic relational schema R(A1, A2, …, Ak) is

r = {t | t = (⟨V1, α1, β1⟩, ⟨V2, α2, β2⟩, …, ⟨Vk, αk, βk⟩)}

Example

| PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION |
| --- | --- | --- | --- |
| ⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩ |
| ⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩ |
| ⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩ |

Page 9: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Probabilistic Functional Dependencies

The probabilistic measure for equal attribute values

prob(t1.A = t2.A) =
    [Σ_{v ∈ W} α(v), min(1, Σ_{v ∈ W} β(v))], if W ≠ ∅
    [0, 0], if W = ∅

where t1.A = ⟨V1, α1, β1⟩, t2.A = ⟨V2, α2, β2⟩ and

[α(v), β(v)] = [α1(v1), β1(v1)] ⊗ [α2(v2), β2(v2)], ∀v ∈ W = {(v1, v2) ∈ V1 × V2 | v1 = v2}
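The equality measure can be sketched as a short function. This is an illustration under stated assumptions: an attribute value is represented as a dict mapping each v in V to its interval (α(v), β(v)), the conjunction strategy ⊗ defaults to independence, and u is read as the uniform distribution on V. The names `prob_equal`, `conj_in`, and `uniform` are hypothetical.

```python
from typing import Callable, Dict, Hashable, Tuple

Interval = Tuple[float, float]
AttrValue = Dict[Hashable, Interval]  # v -> (alpha(v), beta(v)); representation is an assumption


def conj_in(a: Interval, b: Interval) -> Interval:
    """Conjunction under independence: [L1*L2, U1*U2]."""
    return (a[0] * b[0], a[1] * b[1])


def uniform(*values: Hashable) -> AttrValue:
    """<V, u, u>, assuming u is the uniform distribution on V."""
    n = len(values)
    return {v: (1.0 / n, 1.0 / n) for v in values}


def prob_equal(a1: AttrValue, a2: AttrValue,
               conj: Callable[[Interval, Interval], Interval] = conj_in) -> Interval:
    """prob(t1.A = t2.A): sum the combined intervals over W = {(v1, v2) | v1 = v2}."""
    lo = 0.0
    hi = 0.0
    for v1, interval1 in a1.items():
        if v1 in a2:  # the pair (v1, v1) belongs to W
            l, u = conj(interval1, a2[v1])
            lo += l
            hi += u
    # When W is empty both sums stay 0, which gives [0, 0].
    return (lo, min(1.0, hi))


# DISEASE values of the three example tuples.
d_pt0421 = {"lung cancer": (0.4, 0.6), "tuberculosis": (0.4, 0.6)}
d_pt3829 = uniform("hepatitis", "cirrhosis")
d_pt2938 = uniform("hepatitis")
```

With these inputs, `prob_equal(d_pt3829, d_pt2938)` combines [0.5, 0.5] with [1, 1] for the single shared value hepatitis, while `prob_equal(d_pt0421, d_pt3829)` has an empty W.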

Page 10: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Probabilistic Functional Dependencies

The probabilistic functional dependency in PRDB extends the functional dependency in RDB

∀t1, t2 ∈ r, prob(t1[X] = t2[X]) ≤ prob(t1[Y] = t2[Y])

Page 11: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Selection Expressions

x.A θ v: x ∈ X, A is an attribute in R, θ is a binary relation from {=, ≠, ≤, <, >, ≥}, and v is a value

x.A1 =⊗ x.A2: ⊗ is a probabilistic conjunction strategy for combining the probabilities of x.A1 = v1 and x.A2 = v2 such that v1 = v2

E1 ⊗ E2: E1 and E2 are selection expressions

E1 ⊕ E2: ⊕ is a probabilistic disjunction strategy

Page 12: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Selection Expressions

Example Relation DIAGNOSE

Selection expression

(x.DURATION 40) (x.COST 60)

Relation DIAGNOSE

| PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION | COST |
| --- | --- | --- | --- | --- |
| ⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩ | ⟨{300, 350}, u, u⟩ |
| ⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩ | ⟨{60, 70}, u, u⟩ |
| ⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩ | ⟨{60}, u, u⟩ |

Page 13: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Selection Conditions

(E)[L, U]: E is a selection expression and [L, U] is a probabilistic interval

¬φ, (φ ∧ ψ), (φ ∨ ψ): φ and ψ are selection conditions

Example

((x.DURATION 40) (x.COST 60))[0.4, 0.6])


Page 14: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Probabilistic Interpretation of Selection Expressions

prob_t(E) is the probability interval with which a tuple t satisfies the selection expression E

Page 15: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Probabilistic Interpretation of Selection Expressions

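prob_t(x.A θ d) = [Σ_{v ∈ W} α(v), min(1, Σ_{v ∈ W} β(v))], where t.A = ⟨V, α, β⟩ and W = {v ∈ V | v θ d}

prob_t(x.A1 =⊗ x.A2) = [Σ_{v ∈ W} α(v), min(1, Σ_{v ∈ W} β(v))], where t.A1 = ⟨V1, α1, β1⟩, t.A2 = ⟨V2, α2, β2⟩, W = {(v1, v2) ∈ V1 × V2 | v1 = v2}, and [α(v), β(v)] = [α1(v1), β1(v1)] ⊗ [α2(v2), β2(v2)]

prob_t(E1 ⊗ E2) = prob_t(E1) ⊗ prob_t(E2)

prob_t(E1 ⊕ E2) = prob_t(E1) ⊕ prob_t(E2)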
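The interpretation above can be sketched as a small recursive evaluator. Everything about the encoding is an assumption made for illustration: a tuple t is a dict from attribute name to a dict v -> (α(v), β(v)), an atomic expression is `("cmp", attribute, operator, d)`, a compound expression is `("combine", strategy, E1, E2)`, and u is read as the uniform distribution. The comparison operators in the sample expressions are chosen for this sketch only.

```python
import operator
from typing import Dict, Hashable, Tuple

Interval = Tuple[float, float]
ProbTuple = Dict[str, Dict[Hashable, Interval]]  # attribute -> {v: (alpha(v), beta(v))}

# The six binary relations allowed in x.A θ d.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def conj_in(a: Interval, b: Interval) -> Interval:
    """Conjunction under independence."""
    return (a[0] * b[0], a[1] * b[1])


def disj_in(a: Interval, b: Interval) -> Interval:
    """Disjunction under independence."""
    return (a[0] + b[0] - a[0] * b[0], a[1] + b[1] - a[1] * b[1])


def prob_expr(t: ProbTuple, expr: tuple) -> Interval:
    """prob_t(E) for atomic comparisons and strategy-combined expressions."""
    if expr[0] == "cmp":
        _, attr, op, d = expr
        # W = {v in V | v θ d}; sum the bounds over W and cap the upper sum at 1.
        w = [iv for v, iv in t[attr].items() if OPS[op](v, d)]
        lo = sum((a for a, _ in w), 0.0)
        hi = sum((b for _, b in w), 0.0)
        return (lo, min(1.0, hi))
    if expr[0] == "combine":
        _, strategy, e1, e2 = expr
        return strategy(prob_expr(t, e1), prob_expr(t, e2))
    raise ValueError(f"unknown expression kind: {expr[0]!r}")


# Tuple PT3829 of DIAGNOSE, restricted to two attributes, with u read as uniform.
t = {
    "DURATION": {30: (0.5, 0.5), 40: (0.5, 0.5)},
    "COST": {60: (0.5, 0.5), 70: (0.5, 0.5)},
}
e1 = ("cmp", "DURATION", "=", 40)
e2 = ("cmp", "COST", "=", 60)
both = prob_expr(t, ("combine", conj_in, e1, e2))
either = prob_expr(t, ("combine", disj_in, e1, e2))
```

The attribute-to-attribute form x.A1 =⊗ x.A2 is left out of this sketch; it would follow the same pattern with W built from pairs of equal values.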

Page 16: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Satisfaction of Selection Conditions

prob_t ⊨ (E)[L, U] if and only if prob_t(E) ⊆ [L, U]

prob_t ⊨ ¬φ if and only if prob_t ⊨ φ does not hold

prob_t ⊨ (φ ∧ ψ) if and only if prob_t ⊨ φ and prob_t ⊨ ψ

prob_t ⊨ (φ ∨ ψ) if and only if prob_t ⊨ φ or prob_t ⊨ ψ
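The four satisfaction rules translate into a recursive check. In this sketch the condition encoding is hypothetical: `("atom", E, (L, U))`, `("not", φ)`, `("and", φ, ψ)`, and `("or", φ, ψ)`. The interpretation prob_t is passed in as a function from an expression to its interval, so the checker does not depend on how expressions are evaluated.

```python
from typing import Callable, Hashable, Tuple

Interval = Tuple[float, float]


def satisfies(prob_t: Callable[[Hashable], Interval], cond: tuple) -> bool:
    """Decide whether prob_t satisfies a selection condition."""
    kind = cond[0]
    if kind == "atom":
        # (E)[L, U] holds when the interval prob_t(E) lies inside [L, U].
        _, expr, (low, high) = cond
        lo, hi = prob_t(expr)
        return low <= lo and hi <= high
    if kind == "not":
        return not satisfies(prob_t, cond[1])
    if kind == "and":
        return satisfies(prob_t, cond[1]) and satisfies(prob_t, cond[2])
    if kind == "or":
        return satisfies(prob_t, cond[1]) or satisfies(prob_t, cond[2])
    raise ValueError(f"unknown condition kind: {kind!r}")


# A toy interpretation with made-up intervals for two named expressions.
intervals = {"E1": (0.5, 0.5), "E2": (0.25, 0.75)}
prob_t = intervals.__getitem__

phi = ("atom", "E1", (0.4, 0.6))   # [0.5, 0.5] is inside [0.4, 0.6]
psi = ("atom", "E2", (0.4, 0.6))   # [0.25, 0.75] is not inside [0.4, 0.6]
```

Note that `("not", psi)` is satisfied here even though part of E2's interval overlaps [0.4, 0.6]: negation is defined on satisfaction of the condition, not on the interval itself.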

Page 17: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Selection Operation

The selection on a relation r with respect to a selection condition φ

σφ(r) = {t ∈ r | prob_{R,r,t} ⊨ φ}

Page 18: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Selection Operation

Example Relation DIAGNOSE

Selection operation on DIAGNOSE with the selection condition

(x.DISEASE = hepatitis in x.COST 70)[0.4, 0.6])

is t =

Relation DIAGNOSE

| PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION | COST |
| --- | --- | --- | --- | --- |
| ⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩ | ⟨{300, 350}, u, u⟩ |
| ⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩ | ⟨{60, 70}, u, u⟩ |
| ⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩ | ⟨{60}, u, u⟩ |
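A selection over DIAGNOSE can be sketched end to end as a filter. This is an illustration under several assumptions: u is read as the uniform distribution, only the attributes needed by the condition are modelled, the condition is a single atomic one of the form (E)[L, U], and the expression used here, x.DISEASE = hepatitis combined under independence with x.COST <= 70, uses a comparison operator and strategy chosen for this sketch. The helper names are hypothetical.

```python
import operator
from typing import Callable, Dict, Hashable, List, Tuple

Interval = Tuple[float, float]
ProbTuple = Dict[str, Dict[Hashable, Interval]]  # attribute -> {v: (alpha(v), beta(v))}

OPS = {"=": operator.eq, "<=": operator.le, ">=": operator.ge}


def uniform(*values: Hashable) -> Dict[Hashable, Interval]:
    """<V, u, u>, assuming u is the uniform distribution on V."""
    n = len(values)
    return {v: (1.0 / n, 1.0 / n) for v in values}


def prob_cmp(t: ProbTuple, attr: str, op: str, d) -> Interval:
    """prob_t(x.A θ d): sum the bounds over the values of t.A that satisfy θ d."""
    w = [iv for v, iv in t[attr].items() if OPS[op](v, d)]
    lo = sum((a for a, _ in w), 0.0)
    hi = sum((b for _, b in w), 0.0)
    return (lo, min(1.0, hi))


def conj_in(a: Interval, b: Interval) -> Interval:
    """Conjunction under independence."""
    return (a[0] * b[0], a[1] * b[1])


def select(relation: List[ProbTuple],
           expr_prob: Callable[[ProbTuple], Interval],
           bounds: Interval) -> List[ProbTuple]:
    """Keep the tuples t whose interval prob_t(E) lies inside [L, U]."""
    low, high = bounds
    result = []
    for t in relation:
        lo, hi = expr_prob(t)
        if low <= lo and hi <= high:
            result.append(t)
    return result


DIAGNOSE: List[ProbTuple] = [
    {"PATIENT_ID": uniform("PT0421"),
     "DISEASE": {"lung cancer": (0.4, 0.6), "tuberculosis": (0.4, 0.6)},
     "COST": uniform(300, 350)},
    {"PATIENT_ID": uniform("PT3829"),
     "DISEASE": uniform("hepatitis", "cirrhosis"),
     "COST": uniform(60, 70)},
    {"PATIENT_ID": uniform("PT2938"),
     "DISEASE": uniform("hepatitis"),
     "COST": uniform(60)},
]


def expr(t: ProbTuple) -> Interval:
    """Assumed expression: x.DISEASE = hepatitis (independence conjunction) x.COST <= 70."""
    return conj_in(prob_cmp(t, "DISEASE", "=", "hepatitis"),
                   prob_cmp(t, "COST", "<=", 70))


selected = select(DIAGNOSE, expr, (0.4, 0.6))
selected_ids = [next(iter(t["PATIENT_ID"])) for t in selected]
```

Under these assumptions the expression evaluates to [0, 0] for PT0421, [0.5, 0.5] for PT3829, and [1, 1] for PT2938, so only PT3829 falls inside [0.4, 0.6].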

Page 19: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

PRDB-SQLite Architecture


Page 20: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

PRDB-SQLite Schema


Page 21: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

PRDB-SQLite Relation


Page 22: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

PRDB-SQLite Query


Page 23: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Conclusions & Future Works

• Conclusions

Built a new PRDB model that extends the RDB model

Uncertain values in the PRDB model are represented by probabilistic triples

The notions of schema, relation, functional dependency, and selection are defined for PRDB

Implemented a simple visual management system for PRDB

Page 24: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Conclusions & Future Works

• Future Works

Build the remaining relational algebra operations on PRDB

Build a complete database management system for PRDB

Integrate fuzzy set values into attribute values to build a fuzzy and probabilistic relational database

Page 25: [ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

Any questions for us?

March 27-28th 2013 in Kathu, Phuket, Thailand Sea Pearl Villa Resort
