data virtualization


Upload: cameroon45

Post on 18-Nov-2014


Page 1: data virtualization

1


Periscope: Access to Enterprise Data

by Eric R. Broughton

TUSC developed Periscope (patent pending), a product that addresses a serious business problem affecting all companies today.

The problem: accessing real-time enterprise data in a cost-effective manner.

The solution: data virtualization, the ability to view data from disparate sources without knowing or caring where the data actually resides.

TUSC considers Periscope revolutionary because it lets users access enterprise data without expending a great amount of time and resources. Periscope helps customers access their enterprise data from disparate data sources; in other words, it virtualizes your data. It drastically reduces the cost of integrating multiple systems because information residing in different systems and data sources can be accessed, updated and moved transparently to users and developers. Because Periscope can access information from more than 100 JDBC data sources, it can rapidly assimilate systems without expensive and lengthy integration efforts. Data is also transparently accessible via SMTP, HTTP, HTTPS, XML and SOAP. In fact, any data can be virtualized with the Periscope Data Virtualization Toolkit.

As competition for customers and market share intensifies, companies are seeking ways to gain a competitive advantage. For the majority of firms, the advantage lies within their corporate data sources. The data is located deep within legacy systems, disparate databases and data sources. Gaining access to this data is a monumental task in and of itself. This is particularly true when attempting to achieve zero-latency data access.

Objective

The goal of this white paper is to share the technical aspects of Periscope and to show how easily it can be assimilated into a company’s existing infrastructure. For additional information on the business aspects of Periscope, visit our Web site at www.tusc.com.


Overview - What is Periscope?

Periscope is a tool that enables enterprise data access from disparate data sources. It uses a combination of JDBC drivers and proprietary TUSC technology to create a connection between an Oracle database and external data sources. The connection generates a virtual table for any data source that has a JDBC driver. Periscope provides the ability to view 360 degrees of data while submerged in the Oracle database!

Technology Stack

Periscope’s advantages
• Access to hundreds of unique data sources
• Real-time access to enterprise data
• Data is available to support the zero-latency decision-making process
• Significant reduction in total cost of ownership
• A truly open architecture is achieved
• Data can be placed where it is most appropriate (e.g. the database that is fastest, cheapest, has the lowest TCO, or has the most space)
• Data virtualization allows existing applications to access data transparently
• No modifications or special APIs are required to use Periscope data

Other benefits include:
• Follows the company’s current security model
• Provides the ability to query data from multiple non-Oracle data sources
• One tool to access all data sources; one-to-many relationship
• Ability to pump data to other data sources
• Leverages existing systems: adds functionality without requiring changes
• Ability to create materialized views (or snapshots, i.e. scheduled physical copies)
• Platform independent

Security

One question that often arises from our Periscope users and potential customers is “How will Periscope handle my security?” The answer is simple: Periscope uses the existing security architecture of the data sources that it accesses. The proprietary combination of technologies leverages the inherent security features of existing applications through the chosen JDBC driver, XML, HTTP, SOAP, and SMTP connections. The user chosen to access an external data source remains confined by all security rules that currently exist there. External data source administrators maintain total control over the privileges allowed. All database gateway passwords are stored in encrypted form at both the database and operating system levels.

Data Access

Periscope gives end users and an organization’s development teams access to external data sources without their even knowing that the data exists on another system. In other words, the data appears to “live” in the Oracle database. The data may reside in a database (Access, SQL Server, Informix), in a Web Service (via SOAP or XML/RPC), on a Web site, or even in an Excel spreadsheet. When an insert, update, delete, or select is performed, the user sees only what appears to be an Oracle database table.

In addition to traditional data manipulation transactions, TUSC has developed an auto-migration utility that lets you quickly select a group of tables and either create Periscope Tables for them or copy them directly into the Oracle database. This helps with the integration or migration of large databases. Any source of data can be virtualized.
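The idea of a table that appears local while its rows live elsewhere can be illustrated with a small, self-contained sketch. It is not Periscope code: it assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for both the Oracle database and the external source, and the names `remote`, `customers` and `customers_v` are hypothetical.

```python
import sqlite3

# Stand-in for the local (Oracle) database; purely illustrative.
local = sqlite3.connect(":memory:")

# Stand-in for an external data source, attached under the hypothetical name "remote".
local.execute("ATTACH DATABASE ':memory:' AS remote")
local.execute(
    "CREATE TABLE remote.customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
local.executemany(
    "INSERT INTO remote.customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Acme", "Chicago"), (2, "Globex", "Denver"), (3, "Initech", "Chicago")],
)

# The "virtual table": a view that stores no rows of its own.
# SQLite only lets TEMP views reference attached databases, hence TEMP here.
local.execute(
    "CREATE TEMP VIEW customers_v AS SELECT id, name, city FROM remote.customers"
)

# A client queries the view exactly as it would query a local table.
rows = local.execute(
    "SELECT name FROM customers_v WHERE city = ? ORDER BY name", ("Chicago",)
).fetchall()
print(rows)

# A change made at the source is visible through the view immediately.
local.execute(
    "INSERT INTO remote.customers (id, name, city) VALUES (4, 'Umbrella', 'Chicago')"
)
count_after = local.execute("SELECT COUNT(*) FROM customers_v").fetchone()[0]
print(count_after)
```

The client code never names the attached database; only the view definition knows where the rows actually reside.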
Through JDBC alone, over 100 distinct database technologies are supported. The Periscope data virtualization toolkit allows programmers to virtualize ANY source of data (as if it were a local database table).

Performance

Periscope is extremely fast. While fetching 100 records at a time, we queried 2,000 records (from a 1 GB, 8-million-row Microsoft Access table) in 7.5 seconds! 20,000 records were queried in about 78 seconds. Adjusting the table connection parameters to fetch 200 records at a time decreased that time to 76 seconds; fetching 10 records at a time increased it to 107 seconds. These metrics are based on everything (Oracle database, Periscope and Access) running on the same 1 GHz desktop machine, not a server.

Even performing a full table scan of the Microsoft Access table with more than 8 million rows yields favorable results: 960 rows queried using a non-indexed column in less than 42 seconds! It is important to note that the where clause is sent to the remote data source, so indexes on the remote database are used. Data is limited BEFORE it is sent back to Oracle. Also, the bulk of the processing is performed in the remote database, not in the Oracle database, so transactions are distributed. Oracle/Periscope simply receives the virtualized data, which limits the processing requirements.

The overhead for Periscope is minimal, varying between 10 and 150 milliseconds per query. In benchmarks, no noticeable overhead was found per row queried. This performance is possible because Periscope uses the inherent performance characteristics of the external data sources. When a user selects data from a Periscope virtualized table, only the relevant data is retrieved from the external data source. Processing and network time is optimized!

Requirements

What does Periscope require to run?

- Oracle database (Standard Edition, version 8.1.5 or later*)
- Oracle-certified operating system version and service pack level
- JDBC drivers for all external data sources that you wish to virtualize

The administration component for Periscope (the component that creates the Periscope Gateways and Table Connections as described below) has the following requirements:

- JSP-capable Web/application server (e.g. 9iAS, Apache/Tomcat)
- Standard browser

*9i is the preferred Oracle database platform because table functions perform better than the PL/SQL tables or arrays that V8 uses to pass data sets, which are very memory intensive.

Periscope Administration Component

Everything necessary for the configuration and maintenance of Periscope, following the installation, is performed via the chosen browser by means of JavaServer Pages. There is no modification required of existing systems! If those systems can perform queries against “Oracle tables,” they will be able to query Periscope Table Connections transparently.


As you can see in Figure 1, all administration begins at Periscope’s main menu. Whether you are defining the JDBC Drivers, building or testing Database Gateways or Table Connections, or migrating an entire database, the task is easily accomplished with the browser user interface.

Figure 1. Periscope’s Main Menu

Setting Up JDBC Drivers

Upon installation of Periscope, the necessary JDBC drivers are loaded into the Oracle database. These drivers must also be defined to Periscope. As you can see in Figure 2, configuring the Periscope JDBC drivers is easy: simply specify the name of the database the driver supports, the specific JDBC driver library and a sample JDBC connect string (both obtained from the respective JDBC driver documentation), and a complete description of the driver.

Figure 2. Setting up a JDBC Driver

Building the Database Gateway

As you can see in Figure 3, configuring a Periscope gateway is easy: simply choose the gateway name and JDBC driver (from the drop-down list of drivers defined in the JDBC driver step above), specify the connect string, user name and password, and click “Create Database Gateway”. Note that passwords are stored in the database and operating system in an encrypted format.
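The information collected by the driver and gateway screens can be pictured as two small records. The sketch below is only a model of the fields named in the text, not Periscope's storage format: the class names, field names and sample values are hypothetical, and password encryption is deliberately not modeled (the password is merely kept out of the printed representation).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JdbcDriver:
    """Fields the driver setup screen is described as collecting."""
    database_name: str          # database the driver supports
    driver_library: str         # JDBC driver class or library name
    sample_connect_string: str  # template taken from the driver documentation
    description: str


@dataclass(frozen=True)
class DatabaseGateway:
    """Fields the gateway screen is described as collecting."""
    name: str
    driver: JdbcDriver
    connect_string: str
    user: str
    password: str = field(repr=False)  # hidden from repr; encryption not modeled

    def __post_init__(self):
        # JDBC connect strings begin with the "jdbc:" scheme.
        if not self.connect_string.startswith("jdbc:"):
            raise ValueError(f"not a JDBC connect string: {self.connect_string!r}")


# Illustrative values only: the JDBC-ODBC bridge driver and a made-up data source.
odbc_bridge = JdbcDriver(
    database_name="Microsoft Access",
    driver_library="sun.jdbc.odbc.JdbcOdbcDriver",
    sample_connect_string="jdbc:odbc:<data-source-name>",
    description="JDBC-ODBC bridge for ODBC data sources",
)
gateway = DatabaseGateway("SALES_GW", odbc_bridge, "jdbc:odbc:sales", "report_user", "s3cret")
print(gateway)
```

Validating the connect string when the record is created mirrors the idea of catching a bad gateway definition before any table connection depends on it.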


Figure 3. Building the Database Gateway

In addition to the intuitive Database Gateway creation utility, Periscope also contains the gateway testing and editing functionality that you would expect in an Enterprise Application Integration product. The output of this test utility is shown in Figure 4. The connection is tested both from the application server (e.g. 9iAS) and from the database.


Figure 4. Testing the Database Gateway


Building a Table Connection

After the successful creation of a Database Gateway, you can immediately build a table connection between the Oracle database and the chosen data source. Figure 5 shows the options that the Table Connection utility gives for an effective remote data management strategy. The Table Connection builder provides the following:

- User-selected name of the logical table (this translates to the name of the view in the Oracle database)

- Database Gateway to be used (chosen from the drop-down list that results from the prior Database Gateway step)

- Select, group by, having, and order by statement parameters that can be used to manipulate the data pulled from the remote data source before it is returned to Oracle

- Default fetch size for data retrieval management: this is the number of rows fetched from the remote database in each cursor block. The higher the number, the more rows are queried per cursor fetch. If the number of rows being used from the query is small, this number should be small (e.g. 1-10). If the process is a large batch process, set this number higher (e.g. 100-200).

- Create DML Functionality for Table: to create data manipulation (insert, update and delete) functionality, check this box. Otherwise leave it unchecked.
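Two of the options above, the statement parameters and the fetch size, can be sketched together. The example assumes an in-memory SQLite database as a stand-in for the remote source; `build_remote_query` and `fetch_in_blocks` are hypothetical helpers, not Periscope functions. It shows the filter travelling to the source inside the SQL text, and the number of cursor blocks changing with the fetch size.

```python
import sqlite3


def build_remote_query(table, columns, where=None, order_by=None):
    """Assemble the SQL sent to the remote source, filter included.

    Clauses are interpolated as-is, so they must come from trusted configuration.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    if order_by:
        sql += f" ORDER BY {order_by}"
    return sql


def fetch_in_blocks(conn, sql, fetch_size):
    """Run sql at the source and pull rows back fetch_size at a time.

    Returns (rows, number_of_non_empty_blocks).
    """
    cursor = conn.execute(sql)
    rows, blocks = [], 0
    while True:
        block = cursor.fetchmany(fetch_size)
        if not block:
            break
        blocks += 1
        rows.extend(block)
    return rows, blocks


# Stand-in remote source with 2,000 rows, one quarter of them in region EAST.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")
remote.executemany(
    "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
    [(i, "EAST" if i % 4 == 0 else "WEST", i * 10) for i in range(1, 2001)],
)

# The where clause is part of the remote statement, so only matching rows come back.
filtered_sql = build_remote_query("orders", ["id", "amount"], where="region = 'EAST'", order_by="id")
filtered_rows, filtered_blocks = fetch_in_blocks(remote, filtered_sql, 100)

# Same 2,000-row scan with different fetch sizes: fewer rows per block means more blocks.
all_sql = build_remote_query("orders", ["id", "amount"])
_, blocks_at_100 = fetch_in_blocks(remote, all_sql, 100)
_, blocks_at_10 = fetch_in_blocks(remote, all_sql, 10)

print(len(filtered_rows), filtered_blocks, blocks_at_100, blocks_at_10)
```

Each block stands for one round trip to the source, which is why a batch job favors a larger fetch size while a lookup that uses only a few rows gains nothing from one.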


Figure 5. Building a Periscope Table Connection

If you don’t know the specific data that you want to pull over from your application, you can use the Periscope “Query Builder.” The Query Builder provides a complete listing of the Database Gateway’s data points (i.e. tables and views). After you choose the table to access (Figure 6), Periscope displays a choice of columns from the table (Figure 7).


Figure 6. Choose a Table or View

Figure 7. Choose the table/view columns to include in query


Regardless of the method used to build a Periscope Connection, when you are done configuring the Table Connection parameters, click the “Create Periscope Table Connection” button. In less than one minute, Periscope creates all the necessary PL/SQL packages, Java objects, and views in the Oracle database. As shown in Figure 8, upon completion of the Periscope Connection build, the chosen data source is queried and a sample set of the data is displayed for verification. Note that information about the specific JDBC driver used and the specific database queried is displayed. Column names that are Oracle reserved words are also modified (an underscore is added to the end of the column name).

Figure 8 – Built Periscope Table Connection
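The reserved-word handling mentioned above (an underscore appended to the column name) can be sketched in a few lines. This is an illustration of the rule as described, not Periscope's implementation: `rename_reserved` is a hypothetical helper, and the word list is a small subset of Oracle's reserved words chosen for the example.

```python
# A small, illustrative subset of Oracle reserved words (not the full list).
ORACLE_RESERVED_SUBSET = frozenset(
    {"COMMENT", "DATE", "GROUP", "LEVEL", "NUMBER", "ORDER", "SIZE", "USER"}
)


def rename_reserved(column_names, reserved=ORACLE_RESERVED_SUBSET):
    """Append an underscore to any column name that is a reserved word.

    The comparison ignores case; other names are returned unchanged.
    """
    return [name + "_" if name.upper() in reserved else name for name in column_names]


renamed = rename_reserved(["id", "date", "Level", "amount"])
print(renamed)
```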

Accessing the Data

The data source is now viewable from within the Oracle database and can be manipulated as if it were any other Oracle table. Changes (insert, update and delete operations) can be performed on the Periscope Table Connection if you chose “Create DML Functionality for Table?” AND your JDBC driver supports DML AND you have the proper security to modify the table’s data. Otherwise, you will have only read access (again, assuming you have permission to do so). Complex queries can be written; Oracle products such as Portal, Developer and the other 1500+ products that run on Oracle will recognize the table; and all the advanced features of the Oracle database can now be used in concert with the Periscope Table Connection.

Testing/Benchmarking Table Connectors

Testing or benchmarking Table Connectors is easy with Periscope. The JSP application queries rows (against Oracle using JDBC) and reports the performance statistics. Rows can be queried with or without displaying the records. As shown in Figure 9, it’s easy to choose how many rows to query (and the fetch size) and whether the table function (API) or the Oracle view is used.

Figure 9 – Periscope Test Table Connection

Migration Assistant

Migrating entire databases with Periscope is easy. Simply choose the Database Gateway to migrate, as shown in Figure 10, then pick the tables to migrate (or copy), and finally choose whether to create Periscope Table Connectors (virtual links to the remote database) or physical Oracle tables for each remote database table selected.


Figure 10 – Database Migration
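The choice between a virtual link and a physical copy can be made concrete with a short sketch. It assumes SQLite as a stand-in for both databases; `migrate`, the schema name `legacy` and the table names are hypothetical and do not reflect how the Migration Assistant is built.

```python
import sqlite3


def migrate(conn, remote_schema, tables, physical):
    """Create one local object per selected remote table.

    physical=True  -> copy the rows into a real local table
    physical=False -> create a view that reads the remote table on each query
    Names are interpolated directly, so they must come from trusted input.
    """
    for table in tables:
        source = f"{remote_schema}.{table}"
        if physical:
            conn.execute(f"CREATE TABLE main.{table} AS SELECT * FROM {source}")
        else:
            # SQLite only allows TEMP views to reference attached databases.
            conn.execute(f"CREATE TEMP VIEW {table} AS SELECT * FROM {source}")


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS legacy")
conn.execute("CREATE TABLE legacy.items (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE legacy.prices (item_id INTEGER, price REAL)")
conn.executemany("INSERT INTO legacy.items VALUES (?, ?)", [(1, "bolt"), (2, "nut")])
conn.executemany("INSERT INTO legacy.prices VALUES (?, ?)", [(1, 0.25), (2, 0.10)])

migrate(conn, "legacy", ["items"], physical=True)    # physical copy
migrate(conn, "legacy", ["prices"], physical=False)  # virtual link

# Later changes at the source reach the link but not the copy.
conn.execute("INSERT INTO legacy.items VALUES (3, 'washer')")
conn.execute("INSERT INTO legacy.prices VALUES (3, 0.05)")

copied = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
linked = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
print(copied, linked)
```

The copy is a snapshot taken at migration time, while the link always reflects the current state of the source; which one fits depends on whether the remote system stays in service after migration.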

Periscope Properties

As shown in Figure 11, all of Periscope’s properties can also be modified via the JSP user interface. Data type translations, directory mappings, executable names, the Oracle connection (within which Periscope objects are created), reserved words, and more are all maintained through the Periscope properties maintenance application.


Figure 11 – Periscope Properties

The Future of Periscope

When Periscope was conceived and first developed, it allowed access to data sources via JDBC drivers. While it was in pilot, customers began to ask for (and got) access to other data sources. It was at this point that TUSC realized the real potential behind virtualizing more than databases: specifically, virtualizing all data. Based on additional requests to read other data sources, TUSC is developing the functionality to allow access to many additional data sources. Periscope’s developer data virtualization toolkit allows ANY data to be virtualized.

As Periscope moves from a tier three (TUSC’s service-based solution) to a tier one (off-the-shelf software product) offering in the coming months, several changes will accompany the change in focus. We plan to work out relationships with the major JDBC vendors and provide an “approved” or “rated” vendor listing. Our data access reach will also be enhanced.


Periscope APIs

To support developers of Periscope applications, the following APIs are automatically generated by Periscope:

Gateway API Java - {gateway_name}GAPI.java
- {gateway_name}GAPI() - no parameters, opens connection
- {gateway_name}GAPI(sql) - open connection for SQL statement

Table API Java - {table_name}TAPI.java
- autoCommitOff() - turns auto commit off
- autoCommitOn() - turns auto commit on
- closeTable() - closes result set
- commit() - commit records
- fetchTable() - fetch records from the table
- openConnection() - call the gateway API to open the connection
- openTable() - create the initial SQL statement and result set
- rollback() - rollback records

Table API Package - {table_name}_TAPI
- auto_commit_off
- auto_commit_on
- commit_data
- rollback_data
- my_table (Rows to Fetch - input, Where clause - input, Group By - input, Having clause - input, Order by - input) - does open, fetch, close
- open_connection
- open_table (SQL statement - input)
- fetch_table (Rows to fetch - input, Table Data Type - output)
- close_table

Other Table Objects
- {table_name}_TABLE - user defined data type for PL/SQL table
- {table_name}_COLUMNS - user defined data type for columns in the table
- {table_name} - view using {table_name}_TAPI.my_table

Periscope allows the developer to programmatically create Database Gateways and Table Connectors using these APIs:
- BuildOpenDB.buildGateway()
- BuildTable.generateTable()
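The open, fetch, close sequence that `my_table` is described as performing can be modeled in a few lines. The class below is a hypothetical Python analogue written for illustration: it borrows the operation names from the package listing, but its signatures and behavior are assumptions, and it runs against an in-memory SQLite database rather than a Periscope gateway.

```python
import sqlite3


class TableApiSketch:
    """Hypothetical model of the listed table operations; not generated Periscope code."""

    def __init__(self, connect, table_name):
        self._connect = connect      # callable returning a DB-API connection (gateway role)
        self._table = table_name     # trusted table name, interpolated into SQL
        self._conn = None
        self._cursor = None

    def open_connection(self):
        if self._conn is None:
            self._conn = self._connect()

    def open_table(self, sql):
        self.open_connection()
        self._cursor = self._conn.execute(sql)

    def fetch_table(self, rows_to_fetch):
        return self._cursor.fetchmany(rows_to_fetch)

    def close_table(self):
        if self._cursor is not None:
            self._cursor.close()
            self._cursor = None

    def my_table(self, rows_to_fetch, where=None, group_by=None, having=None, order_by=None):
        """Open, fetch every block, then close, returning all rows."""
        sql = f"SELECT * FROM {self._table}"
        clauses = (("WHERE", where), ("GROUP BY", group_by),
                   ("HAVING", having), ("ORDER BY", order_by))
        for keyword, clause in clauses:
            if clause:
                sql += f" {keyword} {clause}"
        self.open_table(sql)
        try:
            rows = []
            while True:
                block = self.fetch_table(rows_to_fetch)
                if not block:
                    return rows
                rows.extend(block)
        finally:
            self.close_table()


# Stand-in data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE parts (id INTEGER, name TEXT)")
source.executemany("INSERT INTO parts VALUES (?, ?)", [(1, "bolt"), (2, "nut"), (3, "washer")])

api = TableApiSketch(lambda: source, "parts")
rows = api.my_table(2, where="id >= 2", order_by="id")
print(rows)
```

Wrapping the three steps in one call is what would let a plain view sit on top of the function, as the `{table_name}` view entry in the listing suggests.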


Summary

The vision for Periscope is clear: TUSC plans to virtualize data around the world. The zero-latency access to enterprise data will change the way companies view their data. The power of Periscope will change the world!