Design data retrieval and manipulation for subset of ‘Gombe’ database using QBE

Durga Gumaste
Advisor: Dr. Shashi Shekhar
June 10, 2003

Agenda
• Objective
• Background and motivation
• Related work and my contribution
• Porting of tables
• Query description
• Query optimization
• Summary
• Demo

Objective
• Design and implement queries to access and manipulate a subset of the ‘Gombe’ chimpanzee data, such that the queries can be modified by a user with no background in any Data Manipulation Language (DML)

Background and motivation
• Data about ‘Gombe’ chimpanzees
  • Collected since 1953
  • Behavioral and location data
  • 15 tables
  • Average size: 10-12 MB
• Dr. Jane Goodall has actively researched the ‘Gombe’ chimpanzees for the last 35 years
• Jane Goodall Institute's Center for Primate Studies at the University of Minnesota
• Data retrieval for analysis of the data
  • Frequent query modification
  • Ease of modification

Sample in ‘Gombe’ database

Name                  Description                                           No. of records
chimp                 Chimps observed by biologists                         212
Follow                Each time a chimp is observed by biologists           8463
Follow_arrival_new    Any chimp arriving with the focal chimp               230743
Food_bout             What and where a focal chimp eats during the follow   68871
Follow_map_position   Location of the focal chimp during the follow         393615

Relationship between tables

Related work
• Earlier implementations
  • Oracle
  • Paradox
• Limitations of earlier implementations
  • Ecologists not comfortable modifying PL/SQL queries in Oracle
  • Paradox is not licensed at the University of Minnesota

Present implementation
• Database: Microsoft Access 2000
  • Ecologists are familiar with the MS Access environment
  • Desktop database
  • Part of the Microsoft Office suite
  • University of Minnesota has a license for Microsoft Office
  • Provides QBE (Query By Example)

My contribution
• Ported the ‘Gombe’ database to MS Access
• Implemented new queries using MS Access
• Helped behavioral ecologists modify queries using MS Access QBE
• Optimized queries

Port tables to MS Access
• Apply primary key constraints
• Apply referential integrity constraints
• Create tables in design view
• Import tables using the import utility
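The two kinds of constraint named above can be illustrated outside Access. The following is a minimal sketch in Python with SQLite standing in for MS Access; the table layout, column names, and the helper names `open_database` and `try_insert` are assumptions made for illustration and are not the actual ‘Gombe’ schema.

```python
import sqlite3

# Hypothetical, heavily reduced schema: the real tables have more columns.
SCHEMA = """
CREATE TABLE chimp (
    chimp_id TEXT PRIMARY KEY
);
CREATE TABLE follow (
    follow_id TEXT PRIMARY KEY,
    focal     TEXT NOT NULL REFERENCES chimp(chimp_id),
    date      TEXT NOT NULL
);
"""


def open_database():
    """Create an in-memory database with both constraint types enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this sketch, inserting the same `chimp_id` twice fails on the primary key, and inserting a follow whose focal chimp is missing from `chimp` fails on referential integrity.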

Verification of porting
• Number of records present in .txt files
  • Follow: 8459 records
• Count of records by count query
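The verification step compares the number of records in each exported text file with the result of a count query on the ported table. A minimal sketch of that comparison follows, again using Python and SQLite as a stand-in; the file layout (comma-separated with one header line), the sample rows, and the helper names are assumptions.

```python
import io
import sqlite3


def count_data_lines(text_file, has_header=True):
    """Count non-empty data lines in an exported text file."""
    lines = [line for line in text_file if line.strip()]
    if has_header and lines:
        return len(lines) - 1
    return len(lines)


def count_rows(conn, table):
    """Run the count query on a ported table."""
    # Table names cannot be bound as parameters, so only known names are allowed.
    if table not in {"follow"}:
        raise ValueError(f"unexpected table: {table}")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Tiny stand-in for an exported .txt file (hypothetical layout and values).
export = io.StringIO("follow_id,focal\nAL740101,AL\nAO740102,AO\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follow (follow_id TEXT PRIMARY KEY, focal TEXT)")
rows = [line.split(",") for line in export.getvalue().splitlines()[1:]]
conn.executemany("INSERT INTO follow VALUES (?, ?)", rows)

# Porting is considered verified when both counts agree.
ported_ok = count_data_lines(export) == count_rows(conn, "follow")
```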

Queries
• Nested, join, range
  • Q1: Find all chimps arriving alone
  • Q2: Include mothers arriving with offspring in Q1
  • Q3: Include siblings in Q1
  • Q4: Include mothers and siblings in Q1
  • Q5: Find chimps arriving together with another chimp
• Single table, aggregate, point
  • Q6: Find the food count of food items in a particular month of a year (find % food counts)
  • Q7: Find the duration for which food items are eaten in a particular month of a year (find % food duration)
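Q6 is a single-table aggregate: count the bouts per food item in one month and express each count as a percentage of that month's total. The sketch below shows one way to phrase it, using Python with SQLite rather than Access; the `food_bout` columns (`date`, `food`), the ISO date format, and the sample rows are assumptions.

```python
import sqlite3

# Percentage of food bouts per food item within one month (given as 'YYYY-MM').
FOOD_PERCENT_QUERY = """
SELECT food,
       COUNT(*) AS bouts,
       100.0 * COUNT(*) / (SELECT COUNT(*) FROM food_bout
                           WHERE strftime('%Y-%m', date) = :month) AS percent
FROM food_bout
WHERE strftime('%Y-%m', date) = :month
GROUP BY food
ORDER BY food
"""


def food_percentages(conn, month):
    """Return (food, bouts, percent) rows for the given month."""
    return conn.execute(FOOD_PERCENT_QUERY, {"month": month}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE food_bout (date TEXT, food TEXT)")
conn.executemany(
    "INSERT INTO food_bout VALUES (?, ?)",
    [
        ("1974-01-03", "fig"),
        ("1974-01-09", "fig"),
        ("1974-01-15", "fig"),
        ("1974-01-20", "leaf"),
        ("1974-02-02", "fig"),  # different month, must not be counted
    ],
)
```

Q7 would follow the same shape with a sum over a duration column in place of `COUNT(*)`, assuming such a column exists.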

Implementation (Q1)

Chimps are said to be alone when the arrival time between 2 chimps is more than 5 minutes.

1. Inner join (self join) on follow_arrival, aliased as A and B
   Join conditions: A.date = B.date, A.follow = B.follow, A.chimp <> B.chimp

2. Select chimps having fa_time_start difference more than 5 minutes for a particular follow on a particular date
   Result set: arrival time difference between A.chimp and B.chimp > 5 minutes

3. Take location coordinates for such chimps from the follow_map_position table (F) by joining the follow_arrival table with the follow_map_position table
   Join conditions: A.date = F.date, A.follow = F.follow, A.chimp <> F.focal, A.seq = F.seq
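The steps for Q1 can be sketched as a single query. This is one reading of the slide, not the author's actual SQL: a chimp is treated as alone when no other chimp in the same follow on the same date arrived within 5 minutes of it, which is expressed here as a self join that keeps only unmatched rows. SQLite (via Python) stands in for Access, `fa_time_start` is assumed to be stored as minutes since midnight, and the `x`/`y` coordinate columns and sample rows are hypothetical.

```python
import sqlite3

GAP_MINUTES = 5  # threshold stated on the slide

# Hypothetical, reduced columns; fa_time_start is minutes since midnight.
SETUP = """
CREATE TABLE follow_arrival (
    date TEXT, follow TEXT, chimp TEXT, seq INTEGER, fa_time_start INTEGER);
CREATE TABLE follow_map_position (
    date TEXT, follow TEXT, focal TEXT, seq INTEGER, x INTEGER, y INTEGER);
INSERT INTO follow_arrival VALUES
    ('1974-01-01', 'AL740101', 'AO', 1, 600),
    ('1974-01-01', 'AL740101', 'AP', 1, 602),
    ('1974-01-01', 'AL740101', 'AR', 2, 630);
INSERT INTO follow_map_position VALUES
    ('1974-01-01', 'AL740101', 'AL', 1, 10, 20),
    ('1974-01-01', 'AL740101', 'AL', 2, 11, 21);
"""

ALONE_QUERY = """
SELECT a.date, a.follow, a.chimp, f.x, f.y
FROM follow_arrival AS a
-- Self join: B is any other chimp of the same follow arriving close in time.
LEFT JOIN follow_arrival AS b
       ON a.date = b.date
      AND a.follow = b.follow
      AND a.chimp <> b.chimp
      AND ABS(a.fa_time_start - b.fa_time_start) <= :gap
-- Location of the focal chimp at the same sequence number.
JOIN follow_map_position AS f
       ON a.date = f.date
      AND a.follow = f.follow
      AND a.seq = f.seq
      AND a.chimp <> f.focal
WHERE b.chimp IS NULL  -- keep arrivals with no close companion
ORDER BY a.chimp
"""


def alone_arrivals(conn, gap=GAP_MINUTES):
    """Return (date, follow, chimp, x, y) for chimps that arrived alone."""
    return conn.execute(ALONE_QUERY, {"gap": gap}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
```

In the sample data AO and AP arrive two minutes apart, so only AR is reported as arriving alone. Unlike a plain inner self join, this formulation also reports a chimp that is the only arrival of a follow; whether that matches the original query is not stated on the slide.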

MS Access implementation for Q1
• A few inner join conditions cannot be displayed in QBE
• MS Access uses dynasets
• Sub-query over base query
  • Base query in SQL (views)
  • Sub-query in QBE
• Sub-query is easy for ecologists to change
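The split between a base query and a sub-query can be illustrated with a view: the complex self join is written once, and the part an ecologist edits is a short query over that view with a single criterion. The sketch below uses Python with SQLite views as a stand-in for saved Access queries; the view name `arrival_pairs`, the column layout, and the sample rows are assumptions.

```python
import sqlite3

SETUP = """
CREATE TABLE follow_arrival (
    date TEXT, follow TEXT, chimp TEXT, fa_time_start INTEGER);
INSERT INTO follow_arrival VALUES
    ('1974-01-01', 'AL740101', 'AO', 600),
    ('1974-01-01', 'AL740101', 'AP', 602),
    ('1974-01-01', 'AL740101', 'AR', 630);

-- Base query: written once in SQL, holds the join conditions.
CREATE VIEW arrival_pairs AS
SELECT a.date, a.follow, a.chimp AS chimp, b.chimp AS other,
       ABS(a.fa_time_start - b.fa_time_start) AS gap
FROM follow_arrival AS a
JOIN follow_arrival AS b
  ON a.date = b.date AND a.follow = b.follow AND a.chimp <> b.chimp;
"""

# Sub-query: the part a user would adjust; only the criterion changes.
SUB_QUERY = """
SELECT chimp
FROM arrival_pairs
GROUP BY date, follow, chimp
HAVING MIN(gap) > ?
ORDER BY chimp
"""


def chimps_alone(conn, minutes):
    """Chimps whose nearest other arrival is more than `minutes` away."""
    return [row[0] for row in conn.execute(SUB_QUERY, (minutes,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
```

Changing the threshold from 5 to 1 minute only touches the criterion of the sub-query; the view with the join conditions stays untouched.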

Comparison - PL/SQL & QBE for Q1

Q1 extension
• Chimps arriving alone
  • Mothers arriving with offspring are counted as arriving alone (Q2)
  • Chimps that arrive with their siblings are counted as arriving alone (Q3)
  • Both Q2 and Q3 (Q4)
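The Q2 to Q4 variants differ only in which companions are ignored when deciding whether a chimp arrived alone. A minimal sketch of that rule in plain Python follows; the kinship data, arrival times (minutes since midnight), and function names are hypothetical, and siblings are assumed to mean chimps sharing a recorded mother.

```python
def related(a, b, mothers, mother_offspring, siblings):
    """True if companion b should be ignored for chimp a under the chosen rules."""
    mother_a, mother_b = mothers.get(a), mothers.get(b)
    if mother_offspring and (mother_a == b or mother_b == a):
        return True
    if siblings and mother_a is not None and mother_a == mother_b:
        return True
    return False


def arriving_alone(arrivals, mothers, mother_offspring=False, siblings=False, gap=5):
    """arrivals maps chimp -> arrival time in minutes for a single follow."""
    alone = []
    for chimp, time in arrivals.items():
        companions = [
            other
            for other, other_time in arrivals.items()
            if other != chimp
            and abs(time - other_time) <= gap
            and not related(chimp, other, mothers, mother_offspring, siblings)
        ]
        if not companions:
            alone.append(chimp)
    return sorted(alone)


# Hypothetical kinship: AO is the offspring of AL; AP and AT share the mother AR.
mothers = {"AO": "AL", "AP": "AR", "AT": "AR"}
arrivals = {"AL": 600, "AO": 602, "AP": 630, "AT": 633, "AR": 700}

q1 = arriving_alone(arrivals, mothers)
q2 = arriving_alone(arrivals, mothers, mother_offspring=True)
q3 = arriving_alone(arrivals, mothers, siblings=True)
q4 = arriving_alone(arrivals, mothers, mother_offspring=True, siblings=True)
```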

Derived table
• Group_composition_table, with inputs Follow_Arrival and Follow_map_time
• Columns: Follow, Date, Time, Map, Grpsize, and one column per chimp_id (AL, AO, AP, AR, AT, ...)
• Time interval adjustment: 10:03 → 10:00, 10:11 → 10:15
• Certainty (Follow_arrival) → value: 1 → 1, 0 → 0, not observed → blank
• Sum of certainties
• Sample row: AL740101 | 1/1/1974 | 10:00 AM | 2 1 1 0 13 (chimp columns AL AO AP AR)
• Written in VB
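The derivation of the group composition table can be sketched as follows. The original was written in VB; this Python version is only an illustration under several assumptions: the two time examples are read as rounding to the nearest 15 minutes, the group size is taken to be the sum of the certainty values in a row, unobserved chimps are represented by `None` for blank, and all function names and sample arrivals are hypothetical.

```python
def to_minutes(text):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def round_to_quarter_hour(minutes):
    """Round to the nearest 15-minute mark (assumed interval)."""
    return (minutes + 7) // 15 * 15


def build_group_rows(arrivals, chimp_ids):
    """arrivals: iterable of (follow, date, 'HH:MM', chimp, certainty)."""
    rows = {}
    for follow, date, time_text, chimp, certainty in arrivals:
        slot = round_to_quarter_hour(to_minutes(time_text))
        # One row per follow, date and adjusted time; blank (None) by default.
        cells = rows.setdefault((follow, date, slot), {c: None for c in chimp_ids})
        cells[chimp] = certainty
    result = []
    for (follow, date, slot), cells in sorted(rows.items()):
        grpsize = sum(value for value in cells.values() if value is not None)
        result.append({
            "follow": follow,
            "date": date,
            "time": f"{slot // 60:02d}:{slot % 60:02d}",
            "grpsize": grpsize,
            **cells,
        })
    return result


chimp_ids = ["AL", "AO", "AP", "AR", "AT"]
arrivals = [
    ("AL740101", "1/1/1974", "10:03", "AL", 1),
    ("AL740101", "1/1/1974", "09:58", "AO", 1),
    ("AL740101", "1/1/1974", "10:05", "AP", 0),
    ("AL740101", "1/1/1974", "10:11", "AR", 1),
]
group_rows = build_group_rows(arrivals, chimp_ids)
```

The first resulting row is for 10:00 with AL = 1, AO = 1, AP = 0 and a group size of 2; the arrival at 10:11 lands in a separate 10:15 row.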

Query optimization in MS Access
• Cost-based query optimization
  • MS Jet 4.0
  • Table statistics
• Rushmore optimization
  • Efficient use of indexes
  • Index intersection, union, minus
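The idea behind index intersection, union, and minus can be shown with sets of row numbers: each index maps a value to the rows containing it, and a compound criterion is answered by combining those sets before any row is read. The sketch below is a conceptual illustration in Python only; it does not reproduce how the Jet engine implements Rushmore internally, and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the set of row numbers holding it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[column]].add(row_id)
    return index


rows = [
    {"follow": "AL740101", "chimp": "AO"},
    {"follow": "AL740101", "chimp": "AP"},
    {"follow": "AO740102", "chimp": "AO"},
]
by_follow = build_index(rows, "follow")
by_chimp = build_index(rows, "chimp")

# follow = 'AL740101' AND chimp = 'AO'      -> index intersection
both = by_follow["AL740101"] & by_chimp["AO"]
# follow = 'AL740101' OR chimp = 'AO'       -> index union
either = by_follow["AL740101"] | by_chimp["AO"]
# follow = 'AL740101' AND NOT chimp = 'AO'  -> index minus
minus = by_follow["AL740101"] - by_chimp["AO"]
```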

Performance evaluation
Execution times (bar chart; x-axis: queries, y-axis: time in seconds)

Series      Time (s)
index       11.6   13.5   13.2   13.8   18
no index    200    260    240    290    450
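The improvement from indexing can be worked out from the chart's data labels. The short calculation below assumes that the five indexed times and the five non-indexed times are listed in the same query order, which the transcript does not confirm.

```python
# Data labels from the execution-time chart (seconds); pairing by position is assumed.
with_index = [11.6, 13.5, 13.2, 13.8, 18]
no_index = [200, 260, 240, 290, 450]


def improvement_percent(fast, slow):
    """Reduction in execution time relative to the slower run, in percent."""
    return 100.0 * (slow - fast) / slow


improvements = [
    round(improvement_percent(fast, slow), 1)
    for fast, slow in zip(with_index, no_index)
]
```

Under that pairing every query improves by roughly 94% to 96%, which is in line with the 90-95% figure quoted in the summary.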

Compacting database
• Compact the database using the Compact utility provided by MS Access
  • De-fragmentation
  • Reordering database pages
  • Reclaim unused space
• Original size: 1.1 GB; after compaction: 284 MB
• Flags queries needing recompilation

Summary
• Porting of data to MS Access
• Query modification using QBE
  • Ease of writing and modifying queries
  • GUI
• Queries over views
  • Base queries in SQL and sub-queries in QBE
  • Access uses dynasets
• Derived tables created in VB
  • Multiple queries
  • One-time queries
• Indexes on join attributes improve the performance by 90-95%

Future work
• Updating the group composition table if a new chimp gets added to the chimp table

Demo
• Query description
  • Find chimpanzees arriving alone (include relations)
• Base query (view) in SQL
• Sub-query in QBE
• Modifications in QBE
• Results of modifications

Demo screenshots (images not preserved in this transcript)
• Base query (view) in SQL
• Sub-query in QBE
• Modifications in query (1), Results (1)
• Modifications in query (2), Results (2)
• Modifications in query (3), Results (3)

Acknowledgements
• I would like to thank Prof. Shekhar for giving me this wonderful opportunity to work on this project and for his precious guidance from time to time.
• I would also like to thank Prof. Pusey, Carson Murray, and Ian Gilby of the Ecology Department for their help in understanding the Gombe database.
• I would like to thank Prof. Pusey and Prof. Srivastava for their time today.

Thank You!!