Dynamic Database Integration in a JDBC Driver
Terrence Mason and Dr. Ramon Lawrence
Iowa Database and Emerging Application Laboratory
University of Iowa
7th International Conference on Enterprise Information Systems, ICEIS 2005, Miami, Florida
Presentation
Discuss the contributions of the JDBC driver
Review the architecture
Step through an example integration and query (partitioned TPC-H* dataset)
Review the experimental results
Demonstrate efficient database integration
*http://www.tpc.org/tpch/default.asp
Contributions to Database Integration
Standard API for integration (JDBC)
Automatic generation of a global view of integrated data sources
– Annotation done locally
– Common vocabulary (National Cancer Institute-EVS)
– Scalable to build a global schema
Simple conceptual query language
Automatic join determination for queries
Allows evolution of data sources
Detects inconsistent data across sources
Unity JDBC Driver Architecture
[Figure: a Java application sends a semantic query to the Unity JDBC Driver and receives results. The driver, with its embedded database engine, sends SQL through JDBC to each source database DB1, DB2, ..., DBn.]
Extending Standard JDBC API for Integration
Standard Java interfaces for single-database JDBC connections extended to multiple databases
– Connection
– Driver Manager
– Statement
– Result Set
Java Code for JDBC Integration
import java.sql.*;

public class JDBCApplication {
    public static void main(String[] args) {
        String url = "jdbc:unity://sources.xml";
        Connection con;
        // Load UnityDriver class
        try {
            Class.forName("unity.jdbc.UnityDriver");
        } catch (java.lang.ClassNotFoundException e) {
            System.exit(1);
        }
        try {
            // Initiate connection
            con = DriverManager.getConnection(url);
            Statement stmt = con.createStatement();
            // Attribute-only query: no FROM clause, no explicit joins
            ResultSet rst = stmt.executeQuery(
                "SELECT Part.Name, LineItem.Quantity, Customer.Name "
                + "WHERE Customer.Name='Customer_25'");
            System.out.println("Part, Quantity, Customer");
            while (rst.next()) {
                System.out.println(rst.getString("Part.Name")
                    + "," + rst.getString("LineItem.Quantity")
                    + "," + rst.getString("Customer.Name"));
            }
            con.close();
        } catch (SQLException ex) {
            System.exit(1);
        }
    }
}
XML File to Reference Data Sources
<SOURCES>
  <DATABASE>
    <URL>jdbc:microsoft:sqlserver://IDEALAB5.cs.uiowa.edu:1433;DatabaseName=TPC;User=terry;Password=xxxxx</URL>
    <DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER>
    <XSPEC>xspec/Order.xml</XSPEC>
  </DATABASE>
  <DATABASE>
    <URL>jdbc:microsoft:sqlserver://IDEALAB3.cs.uiowa.edu:1433;DatabaseName=TPC;User=terry;Password=yyyyyy</URL>
    <DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER>
    <XSPEC>xspec/Part.xml</XSPEC>
  </DATABASE>
</SOURCES>
<XSPEC>
<databaseName>Order</databaseName>
<databaseProductVersion>Microsoft SQL Server 2000</databaseProductVersion>
<TABLE>
<semanticTableName>Customer</semanticTableName>
<tableName>CUSTOMER</tableName>
<FIELD>
  <semanticFieldName>Customer.Id</semanticFieldName>
  <fieldName>C_CUSTKEY</fieldName>
  <dataTypeName>int</dataTypeName>
</FIELD>
<FIELD>
  <semanticFieldName>Customer.Name</semanticFieldName>
  <fieldName>C_NAME</fieldName>
  <dataTypeName>varchar</dataTypeName>
  <fieldSize>25</fieldSize>
</FIELD>
<FIELD>
  <semanticFieldName>Customer.Nation.Id</semanticFieldName>
  <fieldName>C_NATIONKEY</fieldName>
  <dataTypeName>int</dataTypeName>
</FIELD>
<PRIMARYKEY>
  <keyScope>4</keyScope>
  <keyScopeName>Organization</keyScopeName>
  <FIELDS>
    <fieldName>C_CUSTKEY</fieldName>
  </FIELDS>
</PRIMARYKEY>
<FOREIGNKEY>
  <keyScope>4</keyScope>
  <keyScopeName>Organization</keyScopeName>
  <FIELDS>
    <fieldName>C_NATIONKEY</fieldName>
  </FIELDS>
  <toTableName>NATION</toTableName>
</FOREIGNKEY>
<JOIN>
  <joinName>CUSTOMER->NATION</joinName>
  <fromKeyName>FK__CUSTOMER__C_NATI__7A672E12</fromKeyName>
  <fromTableName>CUSTOMER</fromTableName>
  <toKeyName>PK__NATION__6E01572D</toKeyName>
  <toTableName>NATION</toTableName>
  <joinType>3</joinType>
</JOIN>
• Order.xml file (XSpec) for the Order Database
• Schema information
– Table
– Fields
– Primary key
– Foreign key
– Join
• Annotation: semantic names
• Scope of keys (global joins)
XML document created semi-automatically
• Schema information: extracted automatically from the database
• Annotation and scopes: added semi-automatically
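The annotation in an XSpec ties each semantic name to a physical field. As a rough illustration, the sketch below reads an XSpec-like document with the standard Java DOM parser and builds that lookup. The class and method names are hypothetical and are not the Unity driver's own XSpec reader, and the sample document adds closing </TABLE> and </XSPEC> tags that the slide excerpt does not show.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of the Unity driver.
final class XSpecMapping {

    // Returns semantic field name -> "TABLE.COLUMN" for every FIELD element.
    static Map<String, String> semanticToPhysical(String xspecXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xspecXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("XSpec is not well-formed XML", e);
        }
        Map<String, String> mapping = new LinkedHashMap<>();
        NodeList tables = doc.getElementsByTagName("TABLE");
        for (int t = 0; t < tables.getLength(); t++) {
            Element table = (Element) tables.item(t);
            String tableName = firstText(table, "tableName");
            // "FIELD" matches only FIELD elements, not the FIELDS lists inside keys.
            NodeList fields = table.getElementsByTagName("FIELD");
            for (int f = 0; f < fields.getLength(); f++) {
                Element field = (Element) fields.item(f);
                mapping.put(firstText(field, "semanticFieldName"),
                        tableName + "." + firstText(field, "fieldName"));
            }
        }
        return mapping;
    }

    private static String firstText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xspec = "<XSPEC><TABLE>"
                + "<semanticTableName>Customer</semanticTableName><tableName>CUSTOMER</tableName>"
                + "<FIELD><semanticFieldName>Customer.Id</semanticFieldName>"
                + "<fieldName>C_CUSTKEY</fieldName></FIELD>"
                + "<FIELD><semanticFieldName>Customer.Name</semanticFieldName>"
                + "<fieldName>C_NAME</fieldName></FIELD>"
                + "</TABLE></XSPEC>";
        System.out.println(semanticToPhysical(xspec));
    }
}
```

With a lookup like this, a concept such as Customer.Name in a query can be rewritten to the column C_NAME of table CUSTOMER when the sub-query for that source is generated.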
Build Global Schema on Local Database Annotations
Order Database
customer(c_custkey, c_name, c_nationkey)
orders(o_orderkey, o_custkey, o_orderdate)
lineitem(l_orderkey, l_partkey, l_suppkey, l_linenum, l_qty)
nation(n_nationkey, n_name, n_regionkey)
region(r_regionkey, r_name)
Part Database
part(p_partkey, p_name, p_mfgr)
supplier(s_suppkey, s_name, s_nationkey)
partsupp(ps_partkey, ps_suppkey)
nation(n_nationkey, n_name, n_regionkey)
region(r_regionkey, r_name)
Global Schema
Part.Id, Part.Name, Part.Manufacturer
Supplier.Id, Supplier.Name, Supplier.Nation.Id
Order.Id, Order.Customer.Id, Order.Date
LineItem.Linenumber, LineItem.Order.Id, LineItem.Quantity, LineItem.Part.Id, LineItem.Supplier.Id
Customer.Id, Customer.Name, Customer.Nation.Id
Nation.Id, Nation.Name, Nation.Region.Id
Region.Id, Region.Name
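One way to picture how a global schema follows from local annotations: take the union of the semantic names that each source declares, and remember which sources supply each one. The sketch below does this with plain collections. The class name and the input shape (source name mapped to its list of semantic names) are assumptions made for illustration, not the driver's actual data structures.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the Unity driver.
final class GlobalSchemaBuilder {

    // Maps each concept to the sorted set of sources that annotate a field with it.
    static Map<String, Set<String>> build(Map<String, List<String>> annotationsBySource) {
        Map<String, Set<String>> global = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> source : annotationsBySource.entrySet()) {
            for (String concept : source.getValue()) {
                global.computeIfAbsent(concept, k -> new TreeSet<>()).add(source.getKey());
            }
        }
        return global;
    }

    public static void main(String[] args) {
        Map<String, List<String>> sources = new LinkedHashMap<>();
        sources.put("Order", Arrays.asList(
                "Customer.Id", "Customer.Name", "LineItem.Part.Id", "Nation.Id"));
        sources.put("Part", Arrays.asList("Part.Id", "Part.Name", "Nation.Id"));
        // Concepts such as Nation.Id are supplied by both sources.
        build(sources).forEach((concept, from) -> System.out.println(concept + " <- " + from));
    }
}
```

Because each source is annotated on its own against a shared vocabulary, adding a source only adds entries to this map; no pairwise schema matching is involved.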
Attribute Only SQL
Query language on concepts in the global schema
No FROM clause
– Tables not specified
Selection conditions on concepts
Order by
Query:
SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer#000000025'
Query Processing Steps
Parse semantic query
– Validate concepts
– Create parse tree
Map concepts to fields in local databases
Determine joins to relate attributes in each local database
Build execution tree (relational algebra)
– Execute a sub-query on each local database
– Find global join or union to relate sub-queries
– Combine sub-queries into a single result set
Conceptual Query and Parse Tree
Conceptual Query:
SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer#000000025'
Parse Tree:
SELECT
  Identifier: Part.Name
  Identifier: LineItem.Quantity
  Identifier: Customer.Name
WHERE
  Comparison_Op: =
    Identifier: Customer.Name
    String: 'Customer#000000025'
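To make the parsing and validation step concrete, here is a deliberately small sketch that splits an attribute-only query into its selected concepts and its raw WHERE text, and rejects selected concepts that are not in the global schema. It is not the Unity parser: the class name is hypothetical, ORDER BY is not handled, and the WHERE condition is kept as a string instead of being turned into a tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified parser for illustration; not part of the Unity driver.
final class ConceptualQuery {
    // SELECT <concepts> [WHERE <condition>], case-insensitive, no FROM clause.
    private static final Pattern SHAPE = Pattern.compile(
            "(?is)^\\s*SELECT\\s+(.+?)(?:\\s+WHERE\\s+(.+))?\\s*$");

    final List<String> selected = new ArrayList<>();
    final String condition; // raw WHERE text, or null when there is no WHERE

    ConceptualQuery(String query, Set<String> globalSchema) {
        Matcher m = SHAPE.matcher(query);
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected SELECT ... [WHERE ...]");
        }
        for (String part : m.group(1).split(",")) {
            String concept = part.trim();
            // Validation: every selected concept must exist in the global schema.
            if (!globalSchema.contains(concept)) {
                throw new IllegalArgumentException("Unknown concept: " + concept);
            }
            selected.add(concept);
        }
        condition = m.group(2) == null ? null : m.group(2).trim();
    }

    public static void main(String[] args) {
        Set<String> schema = new HashSet<>(Arrays.asList(
                "Part.Name", "LineItem.Quantity", "Customer.Name"));
        ConceptualQuery q = new ConceptualQuery(
                "SELECT Part.Name, LineItem.Quantity, Customer.Name "
                + "WHERE Customer.Name = 'Customer#000000025'", schema);
        System.out.println(q.selected);
        System.out.println(q.condition);
    }
}
```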
Join Graph Construction
Graph represents joins for each local database
Edges directed as N:1 joins
Automatically extracted into the XSpec or added to the XSpec
Used to calculate joins for each sub-query
[Figure: join graph with nodes Line Item, Order, Customer, Nation, Region]
Database Join Graphs
[Figure: join graphs for both sources. Part Database: PartSupp, Part, Supplier, Nation, Region. Order Database: Line Item, Order, Customer, Nation, Region.]
Map the Concepts in Query to Relations
Part.Name, LineItem.Quantity, Customer.Name
[Figure: the query concepts located in the join graphs of the Part Database (PartSupp, Part, Supplier, Nation, Region) and the Order Database]
Determine Local Joins
Steiner Tree Approximation Algorithm
[Figure: relations required by the query. Part Database: Part. Order Database: Line Item, Order, Customer.]
Global join: LineItem.Part.Id is a foreign key to Part.Id
Semantic Query: SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer#000000025'
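The slides name a Steiner tree approximation but do not say which one. As an illustration only, the sketch below uses a common shortest-path heuristic: start from one required table and repeatedly attach the nearest required table that is still missing, using breadth-first search over the join graph. The class name is hypothetical, and edge direction (N:1) is ignored when checking connectivity.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not the Unity driver's algorithm.
final class LocalJoinFinder {
    private final Map<String, Set<String>> adjacent = new HashMap<>();

    // Records a join between two tables; stored in both directions.
    void addJoin(String fromTable, String toTable) {
        adjacent.computeIfAbsent(fromTable, k -> new TreeSet<>()).add(toTable);
        adjacent.computeIfAbsent(toTable, k -> new TreeSet<>()).add(fromTable);
    }

    // Returns joins as "A-B" strings (oriented along the search) that connect
    // all required tables. The required list must not be empty.
    Set<String> connect(List<String> required) {
        Set<String> inTree = new LinkedHashSet<>();
        Set<String> joins = new LinkedHashSet<>();
        inTree.add(required.get(0));
        while (!inTree.containsAll(required)) {
            // Multi-source breadth-first search outward from the current tree.
            Map<String, String> parent = new HashMap<>();
            Deque<String> queue = new ArrayDeque<>(inTree);
            String found = null;
            while (!queue.isEmpty() && found == null) {
                String table = queue.poll();
                for (String next : adjacent.getOrDefault(table, new TreeSet<>())) {
                    if (inTree.contains(next) || parent.containsKey(next)) continue;
                    parent.put(next, table);
                    if (required.contains(next)) { found = next; break; }
                    queue.add(next);
                }
            }
            if (found == null) {
                throw new IllegalStateException("Required tables are not connected");
            }
            // Walk back to the tree, adding each table and join on the path.
            for (String t = found; !inTree.contains(t); t = parent.get(t)) {
                joins.add(parent.get(t) + "-" + t);
                inTree.add(t);
            }
        }
        return joins;
    }

    public static void main(String[] args) {
        LocalJoinFinder orderDb = new LocalJoinFinder();
        orderDb.addJoin("LineItem", "Order");
        orderDb.addJoin("Order", "Customer");
        orderDb.addJoin("Customer", "Nation");
        orderDb.addJoin("Nation", "Region");
        // The query needs LineItem and Customer; Order is pulled in to connect them.
        System.out.println(orderDb.connect(Arrays.asList("LineItem", "Customer")));
    }
}
```

For the Order Database this yields the two joins LineItem to Order and Order to Customer, which is why the Order relation appears in the local join even though the query names no Order concept.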
Build Execution Tree
Relational Algebra
Projection
– Concepts in SELECT portion of conceptual query
– Sub-query projections of required fields (global joins)
Selection
– WHERE conditions of conceptual query
Joins
– Determined from join graphs
– Global joins identified by key scopes
Sub-queries Sent to Each Local Database
SQL through JDBC
Part Database:
SELECT P.P_NAME, P.P_PARTKEY
FROM PART AS P
Order Database:
SELECT L.L_QUANTITY, C.C_NAME, L.L_PARTKEY
FROM LINEITEM AS L, CUSTOMER AS C, ORDERS AS O
WHERE C.C_NAME = 'Customer#000000025'
  AND O.O_CUSTKEY = C.C_CUSTKEY
  AND L.L_ORDERKEY = O.O_ORDERKEY
• Local joins determined from join graphs
• Selection condition
• Elements (P_PARTKEY, L_PARTKEY) added to the queries so that the global join can be executed in the Unity Driver
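Once both sub-query results reach the client, the global join pairs rows whose part keys match. The slides do not say which join algorithm the embedded engine uses; the sketch below assumes a simple in-memory hash join and represents rows as maps from column name to value. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the Unity embedded engine.
final class GlobalHashJoin {

    // Joins two row lists on build[buildKey] = probe[probeKey].
    static List<Map<String, Object>> join(List<Map<String, Object>> build, String buildKey,
                                          List<Map<String, Object>> probe, String probeKey) {
        // Build phase: index the (usually smaller) input by its join key.
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> row : build) {
            Object key = row.get(buildKey);
            if (key == null) continue; // NULL keys never match in an SQL equi-join
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        // Probe phase: look up each row of the other input and merge matches.
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : probe) {
            Object key = row.get(probeKey);
            List<Map<String, Object>> matches = key == null ? null : index.get(key);
            if (matches == null) continue;
            for (Map<String, Object> match : matches) {
                Map<String, Object> combined = new LinkedHashMap<>(match);
                combined.putAll(row);
                result.add(combined);
            }
        }
        return result;
    }

    // Convenience builder: row("COL1", value1, "COL2", value2, ...).
    static Map<String, Object> row(Object... columnsAndValues) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i < columnsAndValues.length; i += 2) {
            r.put((String) columnsAndValues[i], columnsAndValues[i + 1]);
        }
        return r;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for the two sub-query results.
        List<Map<String, Object>> parts = Arrays.asList(
                row("P_NAME", "sample part A", "P_PARTKEY", 1),
                row("P_NAME", "sample part B", "P_PARTKEY", 2));
        List<Map<String, Object>> items = Arrays.asList(
                row("L_QUANTITY", 7, "C_NAME", "Customer#000000025", "L_PARTKEY", 2));
        System.out.println(join(parts, "P_PARTKEY", items, "L_PARTKEY"));
    }
}
```

A final projection would then drop P_PARTKEY and L_PARTKEY, since they were fetched only to make the join possible.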
Operator Execution Tree
[Figure: operator execution tree across three machines. Idealab1: client (Unity Driver) running the Unity embedded database engine. Idealab5: database server. Idealab3: database server.]
Experimental Results
Dynamic integration is efficient and scalable
Minimal overhead
Multi-source query processing
– Competitive with single-source execution
– Possible to execute queries on a global schema
Schema Integration Results
Multiple Copies of TPC-H (Seconds)
Number of Schemas   Integrate Schemas   Connect   Parse Sub-queries   Total Time
2                   0.219               0.203     0.110               0.532
3                   0.235               0.023     0.125               0.594
10                  0.485               0.437     0.406               1.328
100                 1.375               1.719     5.063               8.157
• Schema integration time grows linearly with the number of schemas integrated.
• Integration and connection execute only once at startup, not for each query.
Query Small Result Size (76 Tuples)
[Chart: time in seconds. Unity TPC-H: 0.4227; JDBC TPC-H: 0.4553; Separate Computers: 1.3859]
* Only 76 tuples transported over the network for the single sub-query
* Separate requires the entire Part table to be imported to Unity for the join
Query Large Result Size (6,000,215 Tuples)
[Chart: time in seconds. Unity TPC-H: 59.35; JDBC TPC-H: 59.33; Separate Computers: 46.91]
* For this particular query, distributed execution on multiple computers was faster than a single database server because of parallelism.
Conclusions
Integration is possible in a JDBC driver
Local annotation permits scalable integration
Minimal overhead to process queries
Query multiple databases on a global view
– No need to specify joins
– No requirement to know underlying schemas
Future Work
AutoJoin – scalable inference engine for join determination
Improve global query inference
Sophisticated global query optimizer
Extend to support federated database queries
– No global schema
– Fully specified queries
Queries to Test Unity Performance (Data Labels on Charts)
TPC-H - Conceptual query executed through the Unity driver against a single-source TPC-H database.
JDBC TPC-H - SQL query equivalent to the conceptual query, executed directly through the SQL Server JDBC driver on a single-source TPC-H database.
Partitioned on One Computer - Conceptual query executed on the TPC-H data set virtually partitioned into the Part and Order databases