trio database system management  · web viewtrio system is more efficient, flexible, and powerful...

26
Table of contents: 1.0 Understand Trio new database system…………………................................................ ...............1 Trio System Architecture.................................................. ........................................................1 Uncertainty-Lineage Database (ULID) ……............................................................... ..................1 Trio: The Trio Query Language……....................................................... .....................................2 2.0 System requirements of installing Trio system........................................................... ............3 3.0 Source codes/packages for Trio system……………………………………….........................................3 4.0 Procedure of Trio system installation...................................................................................... 4 Python…….................................................... ............................................................ ...........4 Readline…................................................... ............................................................ ............4 Ctypes…….................................................... ............................................................ ............4 PostgreSQL……................................................ ............................................................ ........4 Graphviz………................................................. ............................................................ .........5 Easy_install................................................ ............................................................ .............5 PyGreSQL.................................................... ............................................................ ............5 Pylons………................................................... ................................................................. .....5 PLY……....................................................... ................................................................. .........5 PyParsing................................................... ................................................................. .........5

Upload: others

Post on 23-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Table of contents:1.0 Understand Trio new database system…………………...............................................................1

Trio System Architecture..........................................................................................................1Uncertainty-Lineage Database (ULID)…….................................................................................1Trio: The Trio Query Language……............................................................................................2

2.0 System requirements of installing Trio system.......................................................................33.0 Source codes/packages for Trio system……………………………………….........................................34.0 Procedure of Trio system installation......................................................................................4

Python……...........................................................................................................................4Readline…...........................................................................................................................4Ctypes……............................................................................................................................4PostgreSQL……....................................................................................................................4Graphviz………......................................................................................................................5Easy_install.........................................................................................................................5PyGreSQL............................................................................................................................5Pylons……….........................................................................................................................5PLY…….................................................................................................................................5PyParsing.............................................................................................................................

5PyDot..................................................................................................................................5Trio

API................................................................................................................................55.0 Configuration of Trio System.…………………...............................................................................6

Windows superuser authentication to access PostgreSQL.................................................6TrioExploerer…………………….………………...............................................................................6TrioPlus…………………….……………...........................................................................................6

6.0 Experiment Trio DBMS Using TrioExplorer and TrioPlus…………..............................................7TrioExplorer (Home, Schema, Script, and Help).................................................................7TrioPlus...............................................................................................................................9

7.0 Trio API Integrated Into Other Python Scripts.........................................................................9TrioCnx……………..................................................................................................................9TrioCursor…………………........................................................................................................9XTuple……………………………………………………………………..........................................................9Alternative………………….......................................................................................................9

8.0 Trio API and Translator (Python)...........................................................................................109.0 Trio Query Language (TriQL) Structure..................................................................................10

Current supported for TriQL.............................................................................................10Drop index/table...............................................................................................................11Create Trio table/index.....................................................................................................11TriQL language (uncertainty, Compute confidences).......................................................12Table of TriQL contents.....................................................................................................16

10.0 Advantage and Disadvantage of using Trio DBMS ............................................................17Advantage.........................................................................................................................17Disadvantage....................................................................................................................17

11.0 Documenting Trio system report/bugs………….....................................................................1812.0 Written Components………………………….……….......................................................................18

Turn-in documents............................................................................................................18Demo and presentation....................................................................................................19

13.0 References………………..…….…………………………........................................................................19

Page 2: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

1.0 Understanding of Trio new Database System:

Trio is a new kind of database system (DBMS), which was developed by Stanford University Lab at Dec, 2006. It is based on an extended relational model called Uncertainty Lineage Database (ULDB) [1], and also supports Trio’s query language Called TriQL [1]. This new database system technology handles structured data, uncertainty of data, and data lineage together in a fully integrated manner. In order to understand deeply and explicitly about this new Trio DBMS different than others, we are going to look at its architecture/design, ULDB data model, and TriQL.

1.1 Trio System architecture:

M. Mutsuzaki, M. Theobald, etc in [1] summarized in details of Trio-1.0 system architecture, design, and data structure handling. As seem in Fig 1-System architecture, there are four primary components comprised as Trio DBMS, including command-line client (TrioPlus), TrioExplorer, Trio API and translator (Python), and standard relational DBMS (PostgreSQL). Also standard relational DBMS (PostgreSQL) contained four essential elements – Encoded data table, Lineage table, Trio metadata, and Trio Stored Procedures. Clustering these four together becomes Uncertain Lineage Database (ULDB) [1] on handling uncertainty and lineage data during execution. Trio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts TriQL queries in addition to regular SQL, and query results may be x-tuples in the ULDB model as well as regular tuples.”

(Fig 1) System architecture [1]

1.2 Uncertainty-Lineage Database (ULDB)

ULDB, which called Uncertainty-Lineage Database, was mentioned in [2] as “the first database formalism to integrate uncertainty and lineage”. The solution of handling uncertainty and lineage is a primary key of developing Trio DBMS. What are uncertainty and lineage? As discovered by O. Benjelloun, A. Das Sarma , etc in [2], the definition of uncertainty is “…captured by tuples that may include several alternative possible values, with optional confidence values associated with each alternative.” And lineage is as “…associates with a data item information about its derivation.” After cleared out the concepts of ULDB, the next step focuses on ULDB data model which extended the standard SQL relational model into alternatives, maybe (‘?’) annotations, numerical confidence, and lineage [2]. In the following discussion and demo, I will use Drives (person, color, car) and Saw (witness, color, car) uncertainty tables with/no with confidence as an example. A table with blue color in TrioExplor is uncertainty and orange color is with computing confidence. The source code is listed in Learning Trio Query Language (TriQL) structure section.

1

Page 3: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

1.2.1 Alternatives

In ULDB, it is consisted with x-tuple (x-relation). Each x-tuple can have one or more alternatives. As defined in [2], alternatives are presenting uncertainty about the contents of a tuple. For example in Fig 3, persons called Frank and Jimmy are x-tuples which contains one or more alternatives by using annotation ‘||’. By clicking on the ‘||’, x-tuple will be split into two alternatives for both Frank and Jimmy in Fig 4. Each alternative is a regular tuple. The reason of combining regular tuples as x-tuple is because the possible instances have same values/relations in particular fields.

(Fig 2) uncertainty tables (Fig 3) Drives table (Fig 4) Drives table

1.2.2 ‘?’ (Maybe) Annotations

‘?’ annotation present the existence of a tuple on the x-tuple, also called maybe x-tuple [2]. For example in Fig 3, there is a ‘?’ on the right side of Billy’s record. In specific by the definition, Billy may or may not sure what he saw is a blue Honda. If x-tuple has a ‘?’, all of alternatives will have ‘?’.

1.2.3 Numerical Confidences

Numerical confidence also was considered as probability [2]. In Fig 5 is a confidence table. When we view the tuples of Drivers table, there is a number attached with each x-tuple indicated the confidence as in Fig 6; Frank has 1.0, Johnny/Jimmy has 1.0, and Billy has 0.9. By looking at Fig 7, there is a detailed probability for each tuple. The total confidences of x-tuple can be less or equal to 1.0 [2]. In this example, I inserted the data along with confidences into Drivers table. Optionally, TrioExplorer will compute the confidences by probability if no confidence available.

(Fig 5) w/confidence tables (Fig 6) Drives table (Fig 7) Drives table

1.2.4 Lineage

Lineage, which as defined in [2], is “recorded at the granularity of tuple alternatives: Lineage connects as x-tuple alternative to other x-tuple alternative.” Specifically, lineage is a function which is used to derive alternatives for a set of alternatives. For an example, when the SQL query ‘select person from Drives’ was executed in TrioExplorer, the returned result is in Fig 8. The blue arrow in the picture presents the lineage. By clicking on it, we can see that Billy record was derived from an entire Billy record. By inserting this function in DBMS, the users can capture derivation of SQL query for better understanding of data relationship.

2

Page 4: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

(Fig 8) Drives table (Fig 9) Drives table

1.3 TriQL: The Trio Query Language

TriQL was developed specifically for querying Trio DBMS. It contains two major parts: built-in functions and predicates for querying confidence values and lineage, and regular SQL syntax [2]. As mentioned in [1], TriQL, which used to query over ULDBs, are translated automatically to SQL queries over the encoded tables. Here I will not discuss about TriQL query language and structure, but will be covered in Learning Trio Query Language (TriQL) structure section by using demo.

2.0 System Requirement of Installing Trio System

To know the system requirement before the installation is a very important step. It may the installation of Trio system much easier and compatible. The requirement mainly concentrates on type of operation system, version of database server (PostgreSQL), and Python. In [3], the web page describes the operating systems in which was experimented successfully with Trio System, such as Linux, Mac OS X, and Win-32 (XP, Vista, and 32-bit Server). In PostgreSQL as database server which supported Trio system function has various versions available in www.postgresql.com, including 8.2.5, 8.1.10, 8.0.14, and 7.4.18 [4]. As mentioned in postgreSQL website, it only supports Linux and Win32. Optimal copy for Trio system is PostgreSQL 8.1.10. The last one is Python API. It can run on windows, Linux/Unix, Mac OS X, OS/2, and Amiga [5]. In the Trio system installation section, I used Windows XP professional as one of operating systems to install Trio system.

3.0 Source Codes/packages for Trio System

In http://dbpubs.stanford.edu:8011/doku.php/trio:installation page listed 12 packages for completing the installation of Trio system. Each package has its own website which contained different versions of source codes corresponding with installation instructions. This section is to list all the resources for users to make their lives easier and install properly. (Please use listed versions of packages)

3.1 Listing source codes:

1. Python 2.4 can be downloaded from http://www.python.org/ .2. Easy_install can be downloaded from http://peak.telecommunity.com/DevCenter/EasyInstall

and the file called ez_setup.py. 3. Readline 1.7.win32 can be downloaded from http://www.python.org/ .4. ctypes-1.0.2.win32-py2.4 can be download from http://www.python.org/ .5. PostgreSQL 8.1 can be downloaded from http://www.postgresql.org/ .6. Graphviz 2.14 is the only version compatible with Trio API. It is available in

http://infolab.stanford.edu/trio/code/graphviz-2.14.1.exe . 7. PyGreSQL can be downloaded from http://www.pygresql.org/ .8. Pylons 0.9.5 can be downloaded from http://pylonshq.com/ .9. PLY 2.2 can be downloaded from http://www.dabeaz.com/ply/ .

3

Page 5: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

10. PyParsing can be downloaded from http://pyparsing.wikispaces.com/ .11. PyDot can be downloaded from http://code.google.com/p/pydot/downloads/list .12. Trio API 1.0 can be downloaded from http://infolab.stanford.edu/~theobald/sources/TRIO.zip .

4.0 Procedure of Trio system installation

Since Trio is deployed on top of the above packages, following order below in installing them one by one will be much easier. Some packages are depending among the others packages. For convenience, most of the required Python modules (Pylons, PyGreSQL, etc.) can be installed via ''Easy Install'' for Windows platforms. Most of the procedures below were mainly from [3], but modified as needed.

4.1. Pythono Download Python 2.4 windows version (python-2.4.4.msi).

o Install Python in C:/Python directory

o Set path=c:/Python24; in environment variables

4.2. Readline - download Readline-1.7.win32-py2.4.exe and install into Python directory

4.3. Ctypes – download ctypes-1.0.2.win32-py2.4.exe and install into Python directory

4.4. PostgreSQL

o Download PostgreSQL 8.1 windows version (postgresql-8.1.msi )

o Install PostgreSQL 8.1 as following:

Language selection (Fig 10) – English

Introduction screen (Fig 11) – next

Welcome message and instructions (Fig 12) – next

Feature selection (Fig 13) – next

Service installation (Fig 14) – check install a service, input account name ‘postgres’ and password.

Initdb (Fig 15) – check initialize database cluster, super username and password.

Procedural languages (Fig 16) – Check PL/pgsql only.

Contrib modules (Fig 17) – check Admin81 only.

Next in (Fig 18, 19, 20, 21) to complete the installation.

o Set path C:\Program Files\PostgreSQL\8.1\bin; after completed the installation.

4

Page 6: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

(Fig 10) [4] (Fig 11) [4] (Fig 12) [4]

(Fig 13) [4] (Fig 14) [4] (Fig 15) [4]

(Fig 16) [4] (Fig 17) [4] (Fig 18) [4]

(Fig 19) [4] (Fig 20) [4] (Fig 21) [4]

4.5. Graphviz - Download Graphviz 2.14 version and install to your workstation and set path C:\PROGRA~1\ATT\Graphviz\bin; in environment variables after completed the installation

4.6. Easy_install - Download ez_setup.py in C:/ directory

5

Page 7: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

4.7. PyGreSQL – In command line, cd\ to c: directory, and run python ez_setup.py PyGreSQL to install components.

4.8. Pylons – In command line, cd\ to c: directory, and run python ez_setup.py Pylons==0.9.5 to install Pylons. Set path c:\python24\Scripts in environment variables.

4.9. PLY – In command line, cd\ to c: directory, and run python ez_setup.py Ply==2.2 or easy_install Ply==2.2.

4.10.PyParsing – In command line, cd\ to c: directory, and run python ez_setup.py PyParsing.

4.11.PyDot – download the source from website. Access to folder in command line, and then install manually by running ‘python setup.py install’.

4.12.Trio API

o Download source code in any directory

o Copy Trio-1.0\spi\triospi_win32.dll to PostgreSQL’s lib directory and renamed as triospi.dll

5.0 Configuration of Trio System

5.1.Windows superuser authentication to access PostgreSQLo Double click in start->all programs->PostgreSQL 8.1->pgadmin III

o Right click on Login Roles to create new login role (see in Fig 22)

Role name ‘myname’ (as same as windows login account)

Set password (Password can be any)

Check all role privileges and click ok.

o Right click on Database to create new database (see in Fig 23)

Database name ‘myname’ (as same as username)

Owner is ‘myname’ and click ok

o Initialize Trio schema information (see in Fig 24)

In Trio-1.0\setup, open setup.py with notepad to comment out the last three codes and put the following. (After complete the initialization, please change back to original).

os.system("psql %s %s < setup.sql" % (pgdbname, username)) os.system("psql %s %s < setup_triospi.sql" % (pgdbname, username)) os.system("psql %s %s < trio_get_conf.sql" % (pgdbname, username))

6

Page 8: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

Save the file, and at the command line, cd \Trio-1.0\setup, and run ‘python setup.py myname myname’ in command line.

Provide password to create schema and done.

(Fig 22) (Fig 23) (Fig 24)

5.2.TrioExplorer [3]o Make sure PostgreSQL is working.

o Running TrioExplorer – Ensure path ‘c:\python24\Scripts;’ in environment variables. And double click ‘start_te_server.bat’ under Trio-1.0\explorer.

o At the command line, you are now prompted for an admin user login to PostgreSQL, which should have been created along with your PostgreSQL installation and which will be used by TrioExplorer to create new user roles and database instances.

o TrioExplorer should now be reachable from your browser using http://localhost:8080/. For new users can now press ‘Create a new user’ and create their own Trio login and database instances, which are then managed by the PostgreSQL server.

5.3.TrioPlus [3] – Trioplus is to use to prepare for a new user role and database instances be installed manually which should be different from the default PostgreSQL instance, and which requires logging in onto the PostgreSQL server with admin privileges.

o Create new PostgreSQL user role and database instance

Run ‘createuser demo’

Run ‘createdb demo’, the name must be the same as username

o Initialize Trio schema information for new user by access as same as windows superuser authentication to access PostgreSQL. Use TrioExplorer will be easily just press ‘Create new role’ in Web.

o Connect to new Trio database using the command line clients by running ‘python trioplus.py –u demo –d demo –p’

7

Page 9: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

6.0 Experiment Trio DBMS Using TrioExplorer And TrioPlus

(Fig 25)

6.1 TrioExplorer:

Fig 25 described the interface of TrioExploer in home tab after login in using login username and password. On the left side of the web page listed all symbols which used to show the types of tables/records. In the body of the page, a text-field area uses to execute TriQL. After press ‘execute’, the information will show up under with correct result.

(Fig 26)

Fig 26 is the schema tab in TrioExplorer. After created the tables of database, the diagram described the relations of tables, type of tables, and attributes. On the left side also listed the symbols and meanings of the diagram.

(Fig 27)

Fig 27 is a sample tab. It uses to load the sample script which was provided by the application for testing purpose. After load the sample script data, the application will automatically execute the queries in DBMS. Also a copy of sample script will list on the body of the page.

8

Page 10: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

(Fig 28)

Fig 28 is a script tab in TrioExplorer which uses to load a script from a file in your workstation to the application to be executed. After loaded successfully, you can click on the script to put a single line of script to the text-field to be executed in DBMS.

(Fig 29)

Fig 29 is a help tab which helps users for common problems such as how to use all tabs, how to create database, and more.

(Fig 30)

6.2 TrioPlus

As seem in Fig 30, by accessing to Trio-1.0/ directory at the command line, just type ‘python trioplus.py –u myname –d myname –p’ to connect to Postgres server. Please aware that myname account has to be created in Trio DBMS first. After established the connection with postgreSQL, you can execute any TriQL query statement in ‘>’ as same as in the text-field of TrioExplorer.

7.0 Trio API Integrated Into Other Python Scripts

9

Page 11: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

Trio API contains many methods in the classes Triodb.py and xtyple.py under Trio-1.0 directory for integrating into other Python scripts. By importing both classes to Python scripts, it provides additional capability to handle tuple alternatives, computation of confidence, and lineage. As seem closely in the classes, the Trio API methods were designed to work very similar as Python DB-API interface [3]. There are several Trio API methods described detailed below was from [3].

7.1 TrioCnx7.1.1. TrioCnx(pgdb) – This constructor method creates a new Trio connection from a given

PyGreSQL connection pgdb, the default Python DB-API.7.1.2. cursor() – return a new TrioCursor object for the current connection.7.1.3. commint() – commits the current transaction.7.1.4. rollback() – performs a rollback for the current transaction.7.1.5. close() – closes the Trio connection (and the underlying pgdb connection)

7.2 TrioCursor7.2.1. execute(triql) - Executes a TriQL statement triql for this cursor object. 7.2.2. fetchone() - Fetches a single XTuple object from the current cursor position. 7.2.3. fetchall() - Fetches and returns a list of all XTuple objects beginning from the current cursor

position.

7.3 XTuple7.3.1. len() - Returns the number of Alternative objects contained in this XTuple object. 7.3.2. getAlternative(idx) - Returns the Alternative object at the designated index idx. 7.3.3. getConfidence() - Returns the confidence value (if any) of this XTuple object as the sum of its

Alternative objects' confidence values. 7.3.4. getQuestionMark() - Returns whether this XTuple object has a question mark or not.

7.4 Alternative7.4.1 getLineage() – returns a list of immediate lineage information as (source-table, source-aid)

pairs of this alternative.7.4.2 traceLineage() – performs a transitive lineage traversal for this alternative back to the base

data7.4.3 getConfidence() - Returns the confidence value (if any) of this alternative. 7.4.4 computeConfidence() - Computes the confidence value of this alternative based on the

traceLineage() function.

8.0 Trio API And Translator (Python)

Trio API and translator (Python) is one of important components in Trio system. Every Trio query language (TriQL), entered from either TrioExplorer or TrioPlus by users, will be translated into encoded data table, Lineage table, Trio metadata, and/or Trio Stored Procedures of ULDB, and be executed in relational database. Node.py in Trio-1.0 directory and lexer.py in Trio-1.0\trioparser directory contain classes, methods, TriQL keywords initiation for supporting and validating TriQL queries through Trio API.

TriQL built-in functions TriQL keywords

Aggregate(), Alternative(), As(), Avg(), BinaryExpression(), Brackets(), Cascade(), ColumnList(), Column(), Command(), ComputeConfidences(), Conf(), ConfUniform(),

'all', 'and', 'any', 'as', 'avg', 'by', 'cascade', ‘compute', 'conf', 'confidences', 'count', 'create', 'distinct', 'drop', 'eavg', 'ecount', 'emax', 'emin', 'esum', 'except', 'exists', ‘flatten', 'float', ‘float4',

10

Page 12: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

ConfScaled(), Count(), CreateIndex(), CreateTableAs(), CreateTable(), CreateTempTableAs(), CreateTempTable(), DataType(), Dot(), DropIndex(), DropTableList(), DropTable(), Eavg(), Ecount(), EcountStar(), Emax(), Emin(), Empty(), Esum(), FromClause(), GroupAlts(), GroupbyClause(), GroupByKey(), HavingClause(), HorizontalSelect(), IdentifierList(), InsertList(), Insert(), InsertSpec(), Lineage(), Literal(), Lsum(), Max(), Maybe(), Maybe(), Min(), OrderBy(), Question(), SelectClause(), Select(), SelectOptions(), SetOperator(), Star(), Sum(), UnaryExpression(), UncertainSet(), UniformTable(), ViewTable(), WhereClause(), WithConfidences()

'float8', 'from', 'group', 'groupalts', 'having','in', 'index', 'insert', 'int', 'int4', 'int8', 'intersect', 'into', 'is', 'like', 'lineage', #'lavg', #'lcount', #'lmax', #'lmin', #'lsum', 'max', 'maybe', 'min', 'noconf', 'nolineage', 'nomaybe', 'not', 'null', 'on', 'or', 'order', 'scaled', 'scaledbyexp', 'select', 'sum', 'table', 'temporary', 'trio', 'trio_aid', 'trio_xid', 'uncertain', 'uniform', 'union', 'merged', 'values', 'varchar', 'view', 'where', 'with'

9.0 Trio Query Language (TriQL) Structure

As mentioned in [3], “Trio is based on the ULDB model, an extension to the relational model that adds both uncertainty (U) and lineage (L) as first-class concepts.” Therefore, Trio Query Lanaguage (TriQL) is designed for querying and updating ULDB database using Encoded data table, Lineage table, Trio metadata, and Trio Stored Procedures. Also discussed earily, ULDB is extended over the standard SQL. So TriQL contains not only built-in functions to handle uncertainty, query/compute confidence values, derive data lineage, but also regular SQL queries.

9.1 Current supported for TriQL

Supported DDL and Insert Commands Supported Subset of TriQL Query Language

Create Table Create Trio Table Create Index Drop Table -

Including “Cascade” option to drop derived tables

Drop Index Insert Into T Values <data>

o With or without alternatives, confidence values, and ?

o T must be a base table Insert Into T <TriQL query>

o T must be a base table

Select-Project-Join queries Create Table T as <query> Select Distinct Union, Intersect, or Except of two

subqueries (duplicate-eliminating) Union All of two subqueries Merged, Flatten, and GroupAlts Horizontal subqueries in Where clause

o Including shortcuts and aggregation/group-by/having

Horizontal subqueries in Select clauseo Including shortcuts and aggregation

but not group-by/having Conf() function with any number of tables

or Conf(*) o except for Conf(*) in the SELECT

clause in conjunction with DISTINCT or MERGED

Maybe() predicate “Uniform <table-name>” in From clause “as conf” for query-defined result

confidence values

11

Page 13: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

o Including “Uniform as conf” and “Scaled as conf”

“Compute Confidences” at end of query Lineage() predicate

o Not in horizontal subquerieso “=⇒” abbreviation allowed

NoLineage, NoConf, and NoMaybe

Based on the sample of TriQL in http://infolab.stanford.edu/~widom/triql.html, I created the TriQL tables/indexes and inserted the uncertainty data and confidences for demo.

9.2 Drop index/table – TriQL can use ‘drop index/table’ statemetn to delete the table as same as the standard SQL statement over the DBMS.

drop index DRIVES_INDEX;drop table Drives;drop index SAW_INDEX;drop table Saw;

9.3 Create Trio table/index – To create TriQL table for uncertainty, the word ‘Trio’ must involve in the create statement ‘create trio…’ as below. Also a ‘uncertain()’ funciton in the same statement helps to define which field inside the table will be considerred as uncertainty. If the tail of the statement has ‘with confidences’, it means the table can be insertted with confidences and compute confidences during the query execution. ‘[…|…]’ means x-tuple. ‘?’ as mentioned above is alternative. ‘:number’ is the confidence.

----------------------------------------------------------------------- Creating tables and indexes, inserting data---------------------------------------------------------------------create trio table Saw(witness varchar(32), color varchar(32), car varchar(32), uncertain(color, car));create index SAW_INDEX on Saw(color, car);

insert into Saw values [('Amy','blue','Honda') | ('Amy','red','Toyota')] ?;insert into Saw values [('Betty','green','Mazda') | ('Betty','green','Toyota') | ('Betty','green',NULL)];insert into Saw values ('Cathy','red','Acura') ?;insert into Saw values [('Diane','red','Toyota') | ('Diane','blue','Toyota')];

create trio table Drives(person varchar(32), color varchar(32), car varchar(32), uncertain(person, color, car));create index DRIVES_INDEX on Drives(person,color, car);insert into Drives values [('Frank','red','Toyota') | ('Frank','blue','Toyota')];insert into Drives values ('Billy','blue','Honda') ?;insert into Drives values [('Jimmy','green','Mazda') | ('Johnny','green','Mazda')];

---------------------------------------------------------------------------------------------------------------- With confidence value---------------------------------------------------------------------------------------------------------------create trio table Saw(witness varchar(32), color varchar(32), car varchar(32), uncertain(color, car)) with confidences;create index SAW_INDEX on Saw(color, car);

12

Page 14: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

insert into Saw values [('Amy','blue','Honda'):0.4 | ('Amy','red','Toyota'):0.3];insert into Saw values [('Betty','green','Mazda'):0.5 | ('Betty','green','Toyota'):0.2 | ('Betty','green',NULL):0.3];insert into Saw values [('Cathy','red','Acura'):0.6];insert into Saw values [('Diane','red','Toyota'):0.2 | ('Diane','blue','Toyota'):0.8];

create trio table Drives(person varchar(32), color varchar(32), car varchar(32), uncertain(person, color, car)) with confidences;create index DRIVES_INDEX on Drives(person,color, car);insert into Drives values [('Frank','red','Toyota'):0.7 | ('Frank','blue','Toyota'):0.3];insert into Drives values [('Billy','blue','Honda'):0.9];insert into Drives values [('Jimmy','green','Mazda'):0.4 | ('Johnny','green','Mazda'):0.6];

9.4 TriQL language

(Fig 31)

9.4.1 uncertainty

To create Saw (witness, color, car) and Drives (person, color, car) in the DBMS without confidences, basically copy the first part of above script into the text-field area under home tab and press ‘execute’. The application will execute and create two tables, Drives and Saw as seem in Fig 31. All the data was successfully inserted into the table. The green color of the table means uncertainty.

(Fig 32)

Selection - In Fig 32 shows the results which selectted all reconrds from Saw table if its car is toyato while executing the TriQL ‘select * from Saw where car = ‘Toyato’. In the table, the blue arrow on the left side presents the lineage, and right side with ? presents the alternative. The 3D in the last recod contains two tuples (x-tuple).

(Fig 33)

Projection – Fig 33 shows the results of executing the query ‘Select color From Saw Where car = 'Mazda' or car = 'Toyota'. this query will find the colors of sighted Mazdas and Toyotas.

13

Page 15: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

(Fig 34)

Projection (Merged) – Fig 34 shows the results of executing the query ‘Select Merged color From Saw Where car = 'Mazda' or car = 'Toyota'. This query will find the colors of sighted Mazdas and Toyotas. Merged in query statement will eliminate the duplicated tuples in horizontal.

(Fig 35)

Join – Fig 35 shows the results of executing the query ‘Select S.witness, D.person as suspect, D.color, D.carFrom Drives D, Saw SWhere D.color = S.color and D.car = S.car’. the query joins Saw and Drives table based on the same color. Using merged fuction in query statement as ‘Select Merged D.person as suspect, D.colorFrom Drives D, Saw SWhere D.color = S.color’ will help to eliminate the duplicated tuples.

(Fig 36)

Duplicate-Elimination (Distinct) – Fig 36 shows the results of executing query ‘Select Distinct color, car from Saw’. The query scan the entire Saw and return color and car for each tuple. By using distince funciton will eleminiate any vertical duplicated tuples.

(Fig 37)

Flatten – Fig 37 shows the results of executing query ‘Select Flatten * From Saw’. Flatten is used to turn tuples with alternative values into regular tuples. The query converts table Saw into a list of sightings [3].

14

Page 16: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

(Fig 38)

GroupAlts - Fig 38 shows the reuslts of executing the TriQL ‘Select GroupAlts(color) * From Saw’. The query takes a list of alternatives, and reorganizes the Saw table so the data is ‘keyed’ on color and uncertainty about car and witness [3]. The GroupAlts funciton is used to create or restructure alternative values.

(Fig 39)

Horizontal Subqueries: The [ ] in the Where Clause – Fig 39 Shows the results of executing the query ‘Select * From SawWhere 2 <= [Select Count(*) From Saw Where car = 'Toyota']’. The ‘[ ]’ inside the statement was treated as a horizontal subqueries. The query finds all tuples with at least two alternatives involving Toyotas.

The [ ] in the select clause as ‘Select Merged witness, [Count(Distinct color)] as #colorsFrom Saw’

(Fig 40)

Syntactic Shortcuts in [ ] for implicit table-list – Fig 40 shows the results of executing the query ‘Select * From SawWhere color <=All [color]’. Color is one the attributes in table Saw. The query is doing lexicographic color comparisons. The shortcut in [ ] also can use for ‘select * ‘ as‘Select * From Saw Where Exists [car = 'Toyota']’.

(Fig 41)

9.4.2 Compute Confidences

To create Saw (witness, color, car) and Drives (person, color, car) in the DBMS with confidences, copy the second part of above script into text-field under home tab and press ‘execute’. The application will execute and create two tables, Drives and Saw with confidences as seem in Fig 41. The table color changed to orange, compute confidences. All the data was successfully inserted into the table.

15

Page 17: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

(Fig 42)

Built-in Function Conf() – Fig 42 shows the results of executing the query ‘Select S.witness, D.person as suspect, D.color, D.carFrom Drives D, Saw SWhere D.color = S.color and D.car = S.carAnd Conf(D) > 0.3 And Conf(S) > 0.3’. The conf() takes a single table name or variable from the ‘From’ clause as a parameter, and returns the confidence of the current alternative being evaluated from that table. The query join Drives and Saw tables to generate supects if confidence > 0.3.

Conf() for Multi-table: ‘Select S.witness, D.person as suspect, D.color, D.carFrom Drives D, Saw SWhere D.color = S.color and D.car = S.carAnd Conf(S) > 0.3 And Conf(*) > 0.2’ condition for result alternatives with confidence > 0.2.

(Fig 43)

Built-in function Maybe() – Fig 43 shows the results of executing query ‘Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw SWhere D.color = S.color and D.car = S.car And (Not Maybe(D)) And (Not Maybe(S))’. Maybe() takes a table name or variable from the ‘From’ clause as a parameter, and returns true if and only if the current alternative being evaluated from that table comes from a tuple with a ‘?’.

(Fig 44)

Uniform result confidence – Fig 44 shows the results of executing the query ‘Select *, uniform as conf From Saw’. The uniform used with ‘as conf’ assigns confidence values to a tuple as, if the tuple has n alternatives and no ‘?’, assign confidence 1/n to each alternative, or if the tuple has n alternatives and a ‘?’, assign confidence 1/(n+1) to each alternative. So the query computes and returns the confidences.

(Fig 45) Scaled result confidence – Fig 45 shows the results of executing the query ‘Select *, scaled as conf From Saw’. scaled used with ‘as conf’ assigns confidence values to a tuple, If the tuple has no confidence values and n alternatives, assign confidence 1/n to each alternative, or If the tuple has confidence values that sum to s, assign confidence value c/s to each alternative, where c is the existing confidence value for that alternative. So the query computes and returns the confidences.

16

Page 18: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

9.5 Table of all possible TriQL contents – in [3] about the TriQL Query Language (website http://infolab.stanford.edu/~widom/triql.html#options ), it listed all built-in functions for querying ULDB. For convinient, I collected all and put them in a table below.

ULDBs Uncertian attibutes, maybe annotations and confidence values

SQL over ULDBs Selection, projection, join, subqueries, duplicate-elimination, grouping and aggregation, aggregate variants, set operators, order by

Flatten and GroupAlts Flatten is used to turn tuples with alternative values into regular tuples, while GroupAlts is used to create or restructure alternative values

Horizontal subqeries: The [ ] construct [ ] in the where clause, [ ] with joins, Syntactics shortcuts in [ ], [ ] in the select clause, [ ] with Self-Joins

Builit-in Functions Conf() and Maybe() Multi-table conf()Result confidences Result confidence evaluation, uniform and scaled

result confidences, On-Demand confidence computation

Built-in Predicate Lineage() The Lineage() predicate lets queries filter joined tuples based on whether they are related via lineage

Options Nolineage, Noconf, and NoMaybe Indicate lineage, confidence values, and/or?'s should be omitted from query results

Data modificaiton Insert statement, delete statement, update statement

10.0 Advantage and Disadvantage of using Trio DBMS

10.1 Advantage

10.1.1 Open source and free support for any non-benefit users to experience new Trio DBMSTrio DBMS was created by Stanford University at Dec, 2006 and updated very often with new plug-in and Trio version. If uncounted any problems, the supporter in Stanford University, like Martin, Theobald, will provide solutions. Also there are many resources available for users to better understand about Trio DMBS in term of design/architecture, components, query structure, and more.

10.1.2 Advanced components in relational DBMSCurrent relational DBMSs, like Oracle, SQL, DB2, and more, can handle well-structure data easily, but not any uncertainty data and lineage. Hence, the Trio DBMS is more advanced. In Stanford Trio project, the ULDB was successfully built on top of postgreSQL. I believe later, it can be implemented on any relational DBMS as one of plug-in.

10.1.3 Computing confidences Computing Confidences is one of function in Trio System. It is very important for calculating the probability of each tuple in total. From the marketing perspective, the probability gives the better view of users to make decision, data analysis, and statistics.

10.1.4 Efficient, Convenient, safe, Multi-User storage of and access to, Massive, Persistent data [6]

17

Page 19: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

Persistent and convenient is because all data stored in simple tables (“relations”) and queries and updates via simple but powerful declarative language (SQL). In data transactions, it is safer, and also allows multi-user access. The DBMS can storage and indexing structures which makes it massive and efficient.

10.2 Disadvantage

10.2.1 Time cost for queryIn Trio DBMS query structure, the time costing to query uncertainty, lineage, and confidences, or regular SQL is more than regular relational DBMS because Trio needs to create super set of tuples which requires several extra steps. As below example, ULDB takes four steps and relational database takes only two steps. Therefore, relational database query is much faster than ULDB.

Using ‘SELECT attr-list FROM X1, X2, ..., Xn WHERE predicate’ as a query example in [6] for a comparison between relational database and ULDB.

Over standard relational database:For each tuple in cross-product of X1, X2, ..., Xn

1. Evaluate the predicate2. If true, project attr-list to create result tuple

Over ULDB:For each tuple in cross-product of X1, X2, ..., Xn

1. Create “super tuple” T from all combinations of alternatives2. Evaluate predicate on each alternative in T ; keep only the true ones3. Project attr-list on each alternative to create result tuple 4. Details: ‘?’, lineage, confidences

10.2.2 DependencyTrio DBMS is not a completed DBMS but depending on any relational DBMS, like postgreSQL. The TrioExplorer was built in using python and other packages. Later some packages was updated to new version, for instance, graphviz 2.16, it is not compatible with TrioExplorer which will cause problems. Therefore, in order to keep Trio DBMS running, it requires efforts to maintain, reconfigure, and update.

10.2.3 On development stage - Trio project has not been developed completely with all functionalities, such as TrioExplorer for DB admonition, DB security, and more.

11.0 Documenting Trio System report/bugs

During the experiment of Stanford Trio project, I went through the Trio instalation, TrioExplorer, TrioPlus and TriQL. There are several places with mistakes/not correct/imcomplete.

11.1.The install instruction in website http://dbpubs.stanford.edu:8011/doku.php/trio:installation , indicated unclearly the version of Graphviz for Trio system. In Graphviz website http://www.graphviz.org/ only has version 2.16 but not compatable except version 2.14. Graphviz version 2.14 is available for download in http://infolab.stanford.edu/trio/code/graphviz-2.14.1.exe .

11.2.The windows authentication supperuser needs to be created first in the PostgreSQL in order to connect to database. After established the connection, TrioExplorer and TrioPlus can use the

18

Page 20: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

supperuser’s login and password as windows authentiction to access to database system. However, it doesn’t mention at all in the installation procedure on how to create this typle of new user. The only way to solve it is to use PostgreSQL->pgadmin III manually.

11.3.After created the supperuser, I have to modify some codes in setup.py in Trio-1.0->setup directory in order to run ‘python setup.py –u myname –d myname –p’.

11.4.TriQL query statements in http://infolab.stanford.edu/~widom/triql.html#options, there are many samples queries not working properly as desired.

12.0 Written Components

12.1 Turn-in DocumentsTurn in Special interesting activity task plan summary, Activity deliverables, Standard, minimal documentation of hands-on work, and Electronic copy of activity.

12.2 Demo and Presentation

12.2.1 Presentation of Trio database system architecture Brief description of Trio database system architecture, ULDB data model, Python API, TriQL

query structure, features, installation step-by-step instruction, Python Sever Installation step-by-step instruction, and more

12.2.2 DemoLaunch Python server, PostgreSQL, TrioExplorer, TrioPlus, TriQL (Create tables, indexes, insert, select, update, more query statement).

12.3 Reflective written component

After spent some many hourses experimenting Trio project, I learned about python (server, coding, and executing programs), PostgreSQL (how different than other DBMS, installation, etc), Trio (installation, architecture, TriQL, and ULDB, TrioExplorer, TrioPlus), and more. during the experiment, I found out several things are very important to complete my goal such as support, resources, environment, and scale. The support means where to get help if encouted problems. Resources can provide a bais picture of the entire system, so I know what I am working on. Environment is where to experienment the project and what the operating system is, what the database server is, and more. The last one is the scale. To make sure to complete the project on time, I have to know how big the project is in order to manage my time. I felt that this is a greate experince to establish skill and ability in term of trouble shooting, problem solving, and adopting new technology.

13.0 References

[1] M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom, P. Agrawal, O. Benjelloun, A. Das Sarma, R. Murthy, and T. Sugihara. Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS. Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR '07), Pacific Grove, California, January 2007. Demonstration description.

[2] O. Benjelloun, A. Das Sarma, C. Hayworth, and J. Widom. An Introduction to ULDBs and the Trio System. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1):5-16, March 2006.

19

Page 21: Trio Database System Management  · Web viewTrio system is more efficient, flexible, and powerful than regular DBMS because of its goal as mentioned in [1] “The Trio API accepts

Trio Database System Management

[3] Trio: A System for integrated Management of Data, Uncertainty, and Lineage . Retrieved on November, 18, 2007 from http://infolab.stanford.edu/trio/ .

[4] PostgreSQL . Retrieved on November, 20, 2007 from http://www.postgresql.org/.

[5] Python . Retrieved on November, 15, 2007 from http://www.python.org/.

[6] Trio: A System for Data, Uncertainty, and Lineage. given by Jennifer at various venues, 2006-07. Ppt.

20