Dynamic Database Integration in a JDBC Driver

Terrence Mason and Dr. Ramon Lawrence
Iowa Database and Emerging Application Laboratory
University of Iowa

7th International Conference on Enterprise Information Systems
ICEIS 2005, Miami, Florida

Presentation

• Discuss the contributions of the JDBC driver
• Review the architecture
• Step through an example integration and query (partitioned TPC-H* dataset)
• Review the experimental results
• Demonstrate efficient database integration

*http://www.tpc.org/tpch/default.asp

Contributions to Database Integration

• Standard API for integration (JDBC)
• Automatic generation of a global view of integrated data sources
  – Annotation done locally
  – Common vocabulary (National Cancer Institute EVS)
  – Scalable to build a global schema
• Simple conceptual query language
• Automatic join determination for queries
• Allows evolution of data sources
• Detects inconsistent data across sources

Unity JDBC Driver Architecture

[Figure: architecture diagram. A Java application sends a semantic query to the Unity JDBC driver and receives results. The driver contains an embedded database engine and sends SQL through a JDBC connection to each source database (DB1, DB2, ..., DBn).]

Extending Standard JDBC API for Integration

• Standard Java interfaces for single-database JDBC connections extended to multiple databases
  – Connection
  – DriverManager
  – Statement
  – ResultSet

Java Code for JDBC Integration

    import java.sql.*;

    public class JDBCApplication {
        public static void main(String[] args) {
            String url = "jdbc:unity://sources.xml";
            Connection con;
            // Load UnityDriver class
            try {
                Class.forName("unity.jdbc.UnityDriver");
            } catch (java.lang.ClassNotFoundException e) {
                System.exit(1);
            }
            try {
                // Initiate connection
                con = DriverManager.getConnection(url);
                Statement stmt = con.createStatement();
                // Conceptual query: attributes only, no FROM clause
                ResultSet rst = stmt.executeQuery(
                    "SELECT Part.Name, LineItem.Quantity, Customer.Name "
                    + "WHERE Customer.Name='Customer_25'");
                System.out.println("Part , Quantity, Customer");
                while (rst.next()) {
                    System.out.println(rst.getString("Part.Name")
                        + "," + rst.getString("LineItem.Quantity")
                        + "," + rst.getString("Customer.Name"));
                }
                con.close();
            } catch (SQLException ex) {
                System.exit(1);
            }
        }
    }

XML File to Reference Data Sources

<SOURCES>
  <DATABASE>
    <URL>jdbc:microsoft:sqlserver://IDEALAB5.cs.uiowa.edu:1433;DatabaseName=TPC;User=terry;Password=xxxxx</URL>
    <DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER>
    <XSPEC>xspec/Order.xml</XSPEC>
  </DATABASE>
  <DATABASE>
    <URL>jdbc:microsoft:sqlserver://IDEALAB3.cs.uiowa.edu:1433;DatabaseName=TPC;User=terry;Password=yyyyyy</URL>
    <DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER>
    <XSPEC>xspec/Part.xml</XSPEC>
  </DATABASE>
</SOURCES>
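As a rough illustration of how a client could read a sources file of this shape, the sketch below parses the three elements of each DATABASE entry with the JDK's built-in DOM parser. The class `SourcesReader`, its methods, and the placeholder URLs in `SAMPLE` are hypothetical names chosen for this example; they are not part of the Unity driver, whose internal loading code is not shown in the slides.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not the Unity driver's own loader.
class SourcesReader {
    // One <DATABASE> entry: connection URL, driver class, and XSpec path.
    static final class Source {
        final String url;
        final String driver;
        final String xspec;
        Source(String url, String driver, String xspec) {
            this.url = url;
            this.driver = driver;
            this.xspec = xspec;
        }
    }

    // Sample document with placeholder URLs and no credentials (assumed values).
    static final String SAMPLE =
        "<SOURCES>"
        + "<DATABASE><URL>jdbc:example://host-a/TPC</URL>"
        + "<DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER>"
        + "<XSPEC>xspec/Order.xml</XSPEC></DATABASE>"
        + "<DATABASE><URL>jdbc:example://host-b/TPC</URL>"
        + "<DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER>"
        + "<XSPEC>xspec/Part.xml</XSPEC></DATABASE>"
        + "</SOURCES>";

    // Returns the trimmed text of the first descendant element with the given tag.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Parses a sources document and returns its entries in document order.
    static List<Source> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList dbs = doc.getElementsByTagName("DATABASE");
            List<Source> out = new ArrayList<>();
            for (int i = 0; i < dbs.getLength(); i++) {
                Element db = (Element) dbs.item(i);
                out.add(new Source(text(db, "URL"), text(db, "DRIVER"), text(db, "XSPEC")));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read sources document", e);
        }
    }

    public static void main(String[] args) {
        for (Source s : parse(SAMPLE)) {
            System.out.println(s.xspec + " via " + s.driver + " at " + s.url);
        }
    }
}
```

Each entry carries everything needed to open an ordinary JDBC connection to one source plus the path of the XSpec that describes it.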

<XSPEC>
  <databaseName>Order</databaseName>
  <databaseProductVersion>Microsoft SQL Server 2000</databaseProductVersion>
  <TABLE>
    <semanticTableName>Customer</semanticTableName>
    <tableName>CUSTOMER</tableName>
    <FIELD>
      <semanticFieldName>Customer.Id</semanticFieldName>
      <fieldName>C_CUSTKEY</fieldName>
      <dataTypeName>int</dataTypeName>
    </FIELD>
    <FIELD>
      <semanticFieldName>Customer.Name</semanticFieldName>
      <fieldName>C_NAME</fieldName>
      <dataTypeName>varchar</dataTypeName>
      <fieldSize>25</fieldSize>
    </FIELD>
    <FIELD>
      <semanticFieldName>Customer.Nation.Id</semanticFieldName>
      <fieldName>C_NATIONKEY</fieldName>
      <dataTypeName>int</dataTypeName>
    </FIELD>
    <PRIMARYKEY>
      <keyScope>4</keyScope>
      <keyScopeName>Organization</keyScopeName>
      <FIELDS>
        <fieldName>C_CUSTKEY</fieldName>
      </FIELDS>
    </PRIMARYKEY>
    <FOREIGNKEY>
      <keyScope>4</keyScope>
      <keyScopeName>Organization</keyScopeName>
      <FIELDS>
        <fieldName>C_NATIONKEY</fieldName>
      </FIELDS>
      <toTableName>NATION</toTableName>
    </FOREIGNKEY>
    <JOIN>
      <joinName>CUSTOMER->NATION</joinName>
      <fromKeyName>FK__CUSTOMER__C_NATI__7A672E12</fromKeyName>
      <fromTableName>CUSTOMER</fromTableName>
      <toKeyName>PK__NATION__6E01572D</toKeyName>
      <toTableName>NATION</toTableName>
      <joinType>3</joinType>
    </JOIN>

• Order.xml file (XSpec)
• Order Database
• Schema information
  – Table
  – Fields
  – Primary key
  – Foreign key
  – Join
• Annotation: semantic names
• Scope of keys (global joins)

XML document created semi-automatically

• Schema information: extracted automatically from the database
• Annotation and scopes: added semi-automatically
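To make the role of the semantic names concrete, the following sketch reads an XSpec-style fragment and builds a lookup from each semantic field name to its physical `TABLE.COLUMN`. It is only an illustration under stated assumptions: the class `XSpecMapper` and its method are invented for this example, and the sample fragment is a trimmed copy of the Customer table above with closing tags added so that it parses on its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not the Unity driver's own XSpec reader.
class XSpecMapper {
    // Trimmed XSpec fragment; the closing tags are added for this example.
    static final String SAMPLE =
        "<XSPEC><databaseName>Order</databaseName>"
        + "<TABLE><semanticTableName>Customer</semanticTableName>"
        + "<tableName>CUSTOMER</tableName>"
        + "<FIELD><semanticFieldName>Customer.Id</semanticFieldName>"
        + "<fieldName>C_CUSTKEY</fieldName></FIELD>"
        + "<FIELD><semanticFieldName>Customer.Name</semanticFieldName>"
        + "<fieldName>C_NAME</fieldName></FIELD>"
        + "</TABLE></XSPEC>";

    // Returns the trimmed text of the first descendant element with the given tag.
    static String first(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Maps each semantic field name to "TABLE.COLUMN" in the local database.
    static Map<String, String> semanticToPhysical(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> map = new LinkedHashMap<>();
            NodeList tables = doc.getElementsByTagName("TABLE");
            for (int i = 0; i < tables.getLength(); i++) {
                Element table = (Element) tables.item(i);
                String tableName = first(table, "tableName");
                NodeList fields = table.getElementsByTagName("FIELD");
                for (int j = 0; j < fields.getLength(); j++) {
                    Element field = (Element) fields.item(j);
                    map.put(first(field, "semanticFieldName"),
                            tableName + "." + first(field, "fieldName"));
                }
            }
            return map;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read XSpec document", e);
        }
    }

    public static void main(String[] args) {
        semanticToPhysical(SAMPLE)
            .forEach((concept, column) -> System.out.println(concept + " -> " + column));
    }
}
```

With a lookup like this, a concept such as Customer.Name in a query can be resolved to CUSTOMER.C_NAME without the user knowing the local schema.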

Build Global Schema on Local Database Annotations

Order Database
  customer(c_custkey, c_name, c_nationkey)
  orders(o_orderkey, o_custkey, o_orderdate)
  lineitem(l_orderkey, l_partkey, l_suppkey, l_linenum, l_qty)
  nation(n_nationkey, n_name, n_regionkey)
  region(r_regionkey, r_name)

Part Database
  part(p_partkey, p_name, p_mfgr)
  supplier(s_suppkey, s_name, s_nationkey)
  partsupp(ps_partkey, ps_suppkey)
  nation(n_nationkey, n_name, n_regionkey)
  region(r_regionkey, r_name)

Global Schema
  Part.Id, Part.Name, Part.Manufacturer
  Supplier.Id, Supplier.Name, Supplier.Nation.Id
  Order.Id, Order.Customer.Id, Order.Date
  LineItem.Linenumber, LineItem.Order.Id, LineItem.Quantity, LineItem.Part.Id, LineItem.Supplier.Id
  Customer.Id, Customer.Name, Customer.Nation.Id
  Nation.Id, Nation.Name, Nation.Region.Id
  Region.Id, Region.Name
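One simple way to picture how a global schema can fall out of local annotations is as the union of the semantic names each source declares. The sketch below does exactly that and also records which sources supply each concept. This is an assumption-laden illustration: `GlobalSchemaBuilder` is a made-up class, the sample lists hold only a few of the names from the schemas above, and the driver's real integration step (including key scopes and inconsistency detection) is not reproduced here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper illustrating a union of locally annotated names.
class GlobalSchemaBuilder {
    // Maps every semantic name to the set of sources that annotate it.
    static Map<String, Set<String>> build(Map<String, List<String>> annotationsBySource) {
        Map<String, Set<String>> global = new TreeMap<>();
        for (Map.Entry<String, List<String>> source : annotationsBySource.entrySet()) {
            for (String concept : source.getValue()) {
                global.computeIfAbsent(concept, k -> new TreeSet<>()).add(source.getKey());
            }
        }
        return global;
    }

    // A small subset of the annotations of the two example databases.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> bySource = new LinkedHashMap<>();
        bySource.put("Order", Arrays.asList(
            "Customer.Id", "Customer.Name", "LineItem.Part.Id", "Nation.Id", "Nation.Name"));
        bySource.put("Part", Arrays.asList(
            "Part.Id", "Part.Name", "Nation.Id", "Nation.Name"));
        return bySource;
    }

    public static void main(String[] args) {
        build(sample()).forEach((concept, sources) ->
            System.out.println(concept + " provided by " + sources));
    }
}
```

Because each source is annotated independently against a shared vocabulary, adding a source only adds entries to this map; no pairwise schema matching is needed in this simplified picture.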

Attribute Only SQL

• Query language on concepts in the global schema
• No FROM clause
  – Tables not specified
• Selection conditions on concepts
• Order by

Query:
SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer#000000025'

Query Processing Steps

• Parse semantic query
  – Validate concepts
  – Create parse tree
• Map concepts to fields in local databases
• Determine joins to relate attributes in each local database
• Build execution tree (relational algebra)
  – Execute a sub-query on each local database
  – Find global join or union to relate sub-queries
  – Combine sub-queries into a single result set

Conceptual Query and Parse Tree

Conceptual Query:
SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer#000000025'

Parse Tree:
SELECT
  Identifier: Part.Name
  Identifier: LineItem.Quantity
  Identifier: Customer.Name
WHERE
  Comparison_Op: =
    Identifier: Customer.Name
    String: 'Customer#000000025'
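The parse and validate steps can be sketched for the simplest form of conceptual query: a SELECT list of concepts and at most one comparison against a string literal. The class `ConceptualQueryParser` below is hypothetical and deliberately narrow (no ORDER BY, no AND/OR); it is a regular-expression sketch, not the parser used inside the Unity driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, deliberately minimal parser for attribute-only queries.
class ConceptualQueryParser {
    // Result of parsing: selected concepts plus an optional single condition.
    static final class Parsed {
        final List<String> select;
        final String whereField;
        final String whereOp;
        final String whereValue;
        Parsed(List<String> select, String whereField, String whereOp, String whereValue) {
            this.select = select;
            this.whereField = whereField;
            this.whereOp = whereOp;
            this.whereValue = whereValue;
        }
    }

    // SELECT <concepts> [WHERE <concept> <op> '<string>'], with no FROM clause.
    private static final Pattern QUERY = Pattern.compile(
        "(?is)^\\s*SELECT\\s+(.+?)"
        + "(?:\\s+WHERE\\s+([\\w.]+)\\s*(<=|>=|<>|=|<|>)\\s*'([^']*)')?\\s*$");

    static Parsed parse(String query) {
        Matcher m = QUERY.matcher(query);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a conceptual query: " + query);
        }
        List<String> select = new ArrayList<>();
        for (String concept : m.group(1).split(",")) {
            select.add(concept.trim());
        }
        return new Parsed(select, m.group(2), m.group(3), m.group(4));
    }

    // Rejects any concept that is not part of the global schema.
    static void validate(Parsed parsed, Set<String> globalSchema) {
        List<String> used = new ArrayList<>(parsed.select);
        if (parsed.whereField != null) {
            used.add(parsed.whereField);
        }
        for (String concept : used) {
            if (!globalSchema.contains(concept)) {
                throw new IllegalArgumentException("unknown concept: " + concept);
            }
        }
    }

    public static void main(String[] args) {
        Parsed p = parse("SELECT Part.Name, LineItem.Quantity, Customer.Name "
            + "WHERE Customer.Name = 'Customer#000000025'");
        System.out.println("SELECT");
        for (String concept : p.select) {
            System.out.println("  Identifier: " + concept);
        }
        System.out.println("WHERE");
        System.out.println("  Comparison_Op: " + p.whereOp);
        System.out.println("    Identifier: " + p.whereField);
        System.out.println("    String: '" + p.whereValue + "'");
    }
}
```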

Join Graph Construction

• Graph represents joins for each local database
• Edges directed as N:1 joins
• Automatically extracted into XSpec or added to the XSpec
• Used to calculate joins for each sub-query

Database Join Graphs

[Figure: one join graph per database. Part Database nodes: PartSupp, Part, Supplier, Nation, Region. Order Database nodes: Line Item, Order, Customer, Nation, Region.]

Map the Concepts in Query to Relations

[Figure: the same two join graphs (Part Database: PartSupp, Part, Supplier, Nation, Region; Order Database: Line Item, Order, Customer, Nation, Region), with the query concepts located in them.]

Query concepts: Part.Name, LineItem.Quantity, Customer.Name
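A minimal sketch of this mapping step: each query concept is looked up in a concept-to-location table, and the required tables are grouped by the database that holds them. The class `ConceptMapper` and its three hard-coded entries are assumptions made for this example; in the approach described on these slides, such a table would be derived from the XSpec annotations rather than written by hand.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical lookup from concepts to local relations.
class ConceptMapper {
    // concept -> {database, table, column}; entries cover the example query only.
    static final Map<String, String[]> MAPPING = new HashMap<>();
    static {
        MAPPING.put("Part.Name", new String[] {"Part", "PART", "P_NAME"});
        MAPPING.put("LineItem.Quantity", new String[] {"Order", "LINEITEM", "L_QUANTITY"});
        MAPPING.put("Customer.Name", new String[] {"Order", "CUSTOMER", "C_NAME"});
    }

    // Groups the tables needed by the given concepts under their database.
    static Map<String, Set<String>> tablesByDatabase(List<String> concepts) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String concept : concepts) {
            String[] location = MAPPING.get(concept);
            if (location == null) {
                throw new IllegalArgumentException("unknown concept: " + concept);
            }
            result.computeIfAbsent(location[0], k -> new TreeSet<>()).add(location[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {Order=[CUSTOMER, LINEITEM], Part=[PART]}
        System.out.println(tablesByDatabase(
            Arrays.asList("Part.Name", "LineItem.Quantity", "Customer.Name")));
    }
}
```

The grouping shows that the Order database must relate two tables that are not directly joined, which is the problem the local join determination step addresses.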

Determine Local Joins: Steiner Tree Approximation Algorithm

[Figure: the relations selected for the query. Order Database: Line Item, Order, Customer. Part Database: Part.]

Global join: LineItem.Part.Id is a foreign key to Part.Id

Semantic Query:
SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer#000000025'
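The slides name a Steiner tree approximation but do not say which one. As an illustration only, the sketch below uses a common shortest-path heuristic: start from one required table and repeatedly attach the nearest remaining required table by breadth-first search, adding every table on the connecting path. The class `JoinTreeFinder` is hypothetical, edges are treated as undirected for connectivity, and the Order database edges are assumed from the key names in the schema listed earlier.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical join-tree finder using a shortest-path Steiner heuristic.
class JoinTreeFinder {
    private final Map<String, Set<String>> adjacent = new HashMap<>();

    // Records an N:1 join; direction is ignored when searching for paths.
    void addJoin(String from, String to) {
        adjacent.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
        adjacent.computeIfAbsent(to, k -> new TreeSet<>()).add(from);
    }

    // Returns the tables of a tree that connects all required tables.
    Set<String> connect(List<String> required) {
        Set<String> tree = new LinkedHashSet<>();
        tree.add(required.get(0));
        Set<String> remaining = new LinkedHashSet<>(required);
        remaining.removeAll(tree);
        while (!remaining.isEmpty()) {
            // Breadth-first search started from every table already in the tree.
            Map<String, String> parent = new HashMap<>();
            Deque<String> queue = new ArrayDeque<>(tree);
            for (String t : tree) {
                parent.put(t, null);
            }
            String hit = null;
            while (!queue.isEmpty() && hit == null) {
                String current = queue.poll();
                for (String next : adjacent.getOrDefault(current, Collections.<String>emptySet())) {
                    if (parent.containsKey(next)) {
                        continue;
                    }
                    parent.put(next, current);
                    if (remaining.contains(next)) {
                        hit = next;
                        break;
                    }
                    queue.add(next);
                }
            }
            if (hit == null) {
                throw new IllegalStateException("tables cannot be joined: " + remaining);
            }
            // Walk back toward the tree, adding each table on the path.
            for (String n = hit; n != null && !tree.contains(n); n = parent.get(n)) {
                tree.add(n);
            }
            remaining.removeAll(tree);
        }
        return tree;
    }

    // Join graph of the Order database (edges assumed from its key columns).
    static JoinTreeFinder orderDatabase() {
        JoinTreeFinder g = new JoinTreeFinder();
        g.addJoin("LINEITEM", "ORDERS");
        g.addJoin("ORDERS", "CUSTOMER");
        g.addJoin("CUSTOMER", "NATION");
        g.addJoin("NATION", "REGION");
        return g;
    }

    public static void main(String[] args) {
        System.out.println(orderDatabase().connect(Arrays.asList("LINEITEM", "CUSTOMER")));
    }
}
```

For the example query, the required tables LINEITEM and CUSTOMER are connected through ORDERS, and NATION and REGION are left out of the sub-query.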

Build Execution Tree (Relational Algebra)

• Projection
  – Concepts in SELECT portion of conceptual query
  – Sub-query projections of required fields (global joins)
• Selection
  – WHERE conditions of conceptual query
• Joins
  – Determined from join graphs
  – Global joins identified by key scopes

Sub-queries Sent to Each Local Database (SQL through JDBC)

Part Database:
SELECT P.P_NAME, P.P_PARTKEY FROM PART AS P

Order Database:
SELECT L.L_QUANTITY, C.C_NAME, L.L_PARTKEY FROM LINEITEM AS L, CUSTOMER AS C, ORDERS AS O WHERE C.C_NAME = 'Customer#000000025' AND O.O_CUSTKEY = C.C_CUSTKEY AND L.L_ORDERKEY = O.O_ORDERKEY

• Local joins determined from join graphs
• Selection condition
• Elements added to queries in order for the global join to be executed in the Unity driver
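The extra key columns (P_PARTKEY and L_PARTKEY) are what allow the two result sets to be combined on the client. One plausible way to do that is an in-memory hash join, sketched below. The slides do not state which join algorithm the embedded engine uses, so treat the hash join as an assumption; the class `GlobalHashJoin` and the sample rows are invented for illustration and are not TPC-H data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side join of two sub-query results.
class GlobalHashJoin {
    // Equi-join on left[leftKey] = right[rightKey]; output is left columns then right columns.
    static List<Object[]> join(List<Object[]> left, int leftKey,
                               List<Object[]> right, int rightKey) {
        // Build phase: index the left rows by their join key.
        Map<Object, List<Object[]>> index = new HashMap<>();
        for (Object[] row : left) {
            index.computeIfAbsent(row[leftKey], k -> new ArrayList<>()).add(row);
        }
        // Probe phase: look up each right row and emit the concatenated matches.
        List<Object[]> out = new ArrayList<>();
        for (Object[] r : right) {
            for (Object[] l : index.getOrDefault(r[rightKey], Collections.<Object[]>emptyList())) {
                Object[] joined = new Object[l.length + r.length];
                System.arraycopy(l, 0, joined, 0, l.length);
                System.arraycopy(r, 0, joined, l.length, r.length);
                out.add(joined);
            }
        }
        return out;
    }

    // Made-up rows shaped like the Part sub-query: (P_NAME, P_PARTKEY).
    static List<Object[]> sampleParts() {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"widget", 1});
        rows.add(new Object[] {"gear", 2});
        rows.add(new Object[] {"bolt", 3});
        return rows;
    }

    // Made-up rows shaped like the Order sub-query: (L_QUANTITY, C_NAME, L_PARTKEY).
    static List<Object[]> sampleOrders() {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {5, "Customer#000000025", 2});
        rows.add(new Object[] {7, "Customer#000000025", 3});
        return rows;
    }

    public static void main(String[] args) {
        // Joined columns: P_NAME, P_PARTKEY, L_QUANTITY, C_NAME, L_PARTKEY.
        for (Object[] row : join(sampleParts(), 1, sampleOrders(), 2)) {
            // Final projection: Part.Name, LineItem.Quantity, Customer.Name.
            System.out.println(row[0] + "," + row[2] + "," + row[3]);
        }
    }
}
```

The key columns are dropped in the final projection, so the caller only sees the three concepts named in the conceptual query.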

Operator Execution Tree

[Figure: operator execution tree. Nodes: Idealab5 (database server), Idealab3 (database server), and Idealab1 (client running the Unity driver with the Unity embedded database engine).]

Experimental Results

• Dynamic integration is efficient and scalable
• Minimal overhead
• Multi-source query processing
  – Competitive with single-source execution
  – Possible to execute queries on a global schema

Schema Integration Results: Multiple Copies of TPC-H (Seconds)

Number of Schemas   Integrate Schemas   Connect   Parse Sub-queries   Total Time
2                   0.219               0.203     0.110               0.532
3                   0.235               0.023     0.125               0.594
10                  0.485               0.437     0.406               1.328
100                 1.375               1.719     5.063               8.157

• Schema integration time grows linearly with the number of schemas integrated.
• Integration and connection execute only once at start-up, not for each query.

Query Small Result Size (76 Tuples)

[Figure: bar chart of execution time in seconds. Unity TPC-H: 0.4227; JDBC TPC-H: 0.4553; Separate Computers: 1.3859.]

* Only 76 tuples transported over the network for a single sub-query
* Separate requires the entire Part table to be imported to Unity for the join

Query Large Result Size (6,000,215 Tuples)

[Figure: bar chart of execution time in seconds. Unity TPC-H: 59.35; JDBC TPC-H: 59.33; Separate Computers: 46.91.]

* Distributed execution of the queries on multiple computers executed faster than a single database server due to parallelism for this particular query.

Conclusions

• Integration is possible in a JDBC driver
• Local annotation permits scalable integration
• Minimal overhead to process queries
• Query multiple databases on a global view
  – No need to specify joins
  – No requirement to know underlying schemas

Future Work

• AutoJoin: scalable inference engine for join determination
• Improve global query inference
• Sophisticated global query optimizer
• Extend to support federated database queries
  – No global schema
  – Fully specified queries

Queries to Test Unity Performance (Data Labels on Charts)

• TPC-H - Conceptual query executed through the Unity driver against a single-source TPC-H database.
• JDBC TPC-H - SQL query equivalent to the conceptual query, executed directly through the SQL Server JDBC driver on a single-source TPC-H database.
• Partitioned on One Computer - Conceptual query executed on the TPC-H data set virtually partitioned into the Part and Order databases