Post on 21-Dec-2015
CSE 636 Data Integration
Overview
Data Warehouse Architecture
[Figure: several data sources feed a relational database (the warehouse) through ETL (Extract-Transform-Load) tools and data cleaning. Users and applications work against the warehouse: OLAP / decision support, data cubes / data mining.]
Virtual Integration Architecture
• Leave the data in the sources
• When a query comes in:
– Determine the relevant sources to the query
– Break down the query into sub-queries for the sources
– Get the answers from the sources, filter them if needed and combine them appropriately
• Data is fresh
• Otherwise known as On Demand Integration
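The query-time steps above can be sketched as a small mediator loop. This is an illustration under stated assumptions, not the course's implementation: the `Source` class, its `covers` set, and the in-memory sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical source description: the global relations it can contribute to,
# and a callable that answers a sub-query (here just a lookup by title).
@dataclass
class Source:
    name: str
    covers: set
    answer: Callable[[str], list]

def on_demand_query(relation: str, title: str, sources: list) -> list:
    # 1. Determine the sources relevant to the query.
    relevant = [s for s in sources if relation in s.covers]
    # 2. Send each relevant source its sub-query.
    # 3. Filter the answers and combine them.
    combined = []
    for source in relevant:
        for row in source.answer(title):
            if row not in combined:  # drop exact duplicates while combining
                combined.append(row)
    return combined

# Illustrative in-memory sources; real ones would be databases, files, or web sites.
def book_lookup(title: str) -> list:
    return [("On the Road", 123, 10.86)] if title == "on the road" else []

store_a = Source("A", {"Books"}, book_lookup)
store_b = Source("B", {"Books", "CDs"}, book_lookup)
music_only = Source("C", {"CDs"}, lambda title: [("never asked", 0, 0.0)])

result = on_demand_query("Books", "on the road", [store_a, store_b, music_only])
print(result)
```

The data stays in the sources and is fetched only when the query arrives, which is why the answer is always fresh.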
Virtual Integration Architecture
Design-Time
[Figure: end users and applications work against a global schema; schema mappings relate the global schema to the local schema of each data source.]
Sources can be:
• Relational DBs
• Excel Files
• Web Sites
• Web Services
Schema Mappings
• Differences in:
– Names in schema
– Attribute grouping
– Coverage of databases
– Granularity and format of attributes

Inventory Database A
BooksAndMusic(Title, Author, Publisher, ItemID, ItemType, SuggestedPrice, Categories, Keywords)

Inventory Database B
Books(Title, ISBN, Price, DiscountPrice, Edition)
Authors(ISBN, FirstName, LastName)
BookCategories(ISBN, Category)
CDs(Album, ASIN, Price, DiscountPrice, Studio)
Artists(ASIN, ArtistName, GroupName)
CDCategories(ASIN, Category)
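To make the kinds of differences concrete, here is a minimal sketch of a row-level mapping from Inventory Database A to Inventory Database B. The attribute names come from the two schemas; the correspondences themselves (ItemID to ISBN, SuggestedPrice to Price, a "First Last" author format, comma-separated categories) and the sample category values are assumptions made for illustration.

```python
# Hypothetical mapping from one BooksAndMusic row (database A) to rows of
# database B. All correspondences below are assumed, not taken from the slides.
def map_a_row_to_b(row: dict) -> dict:
    if row["ItemType"] != "Books":
        return {}  # B keeps CDs in separate tables; not covered in this sketch
    # Granularity: A stores one Author string, B splits first and last name.
    first, _, last = row["Author"].partition(" ")
    # Attribute grouping: A packs categories into one field, B uses one row each.
    categories = [c.strip() for c in row["Categories"].split(",")]
    return {
        "Books": {"Title": row["Title"], "ISBN": row["ItemID"],
                  "Price": row["SuggestedPrice"]},
        "Authors": {"ISBN": row["ItemID"], "FirstName": first, "LastName": last},
        "BookCategories": [{"ISBN": row["ItemID"], "Category": c}
                           for c in categories],
    }

a_row = {
    "Title": "On the Road", "Author": "Jack Kerouac", "ItemID": 123,
    "ItemType": "Books", "SuggestedPrice": 10.86,
    "Categories": "Fiction, Classics",  # made-up sample value
}
b_rows = map_a_row_to_b(a_row)
print(b_rows)
```

Database B has attributes with no counterpart in A (DiscountPrice, Edition), so the mapped Books row is incomplete; that is the coverage difference from the list above.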
Issues for Schema Mappings
Design-Time
• What formalisms to express them?
• How to create them?
• Can we discover them somehow?
• How do we use them?
[Figure: design-time view of the virtual integration architecture, with the schema mappings between the global schema and each local schema highlighted.]
Virtual Integration Architecture
Run-Time
[Figure: a query against the global schema enters the mediator, which performs reformulation, optimization and execution; wrappers connect the mediator to the local schema of each data source, and the result is returned.]
Issues for Query Processing
Reformulation
[Figure: run-time architecture with the mediator's reformulation step highlighted.]
• User queries refer to the global schema
• Data is stored in the sources in a local schema
• Rewriting algorithms
Issues for Query Processing
Reformulation

Global Schema
Books(Title, ISBN, Price, DiscountPrice, Edition)

SELECT ISBN, Price
FROM Books
WHERE Title = 'on the road'

Local Schema A
BooksAndMusic(Title, Author, Publisher, ItemID, ItemType, SuggestedPrice, Categories, Keywords)

SELECT ItemID, SuggestedPrice
FROM BooksAndMusic
WHERE Title = 'on the road'
AND ItemType = 'Books'
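The rewriting in this example can be sketched as a lookup in a mapping table. The `MAPPING` dictionary and the `reformulate` helper below are hypothetical and handle only this one query shape; real rewriting algorithms are far more general. The second sample row (a "Music" item) is made up to show why the extra condition matters.

```python
import sqlite3

# Assumed mapping from the global Books relation to source A's BooksAndMusic.
MAPPING = {
    "table": "BooksAndMusic",
    "columns": {"ISBN": "ItemID", "Price": "SuggestedPrice", "Title": "Title"},
    "extra_condition": "ItemType = 'Books'",
}

def reformulate(select_cols: list, where_col: str) -> str:
    """Rewrite SELECT <cols> FROM Books WHERE <col> = ? for source A."""
    cols = ", ".join(MAPPING["columns"][c] for c in select_cols)
    return (f"SELECT {cols} FROM {MAPPING['table']} "
            f"WHERE {MAPPING['columns'][where_col]} = ? "
            f"AND {MAPPING['extra_condition']}")

local_sql = reformulate(["ISBN", "Price"], "Title")

# Run the rewritten query against an in-memory stand-in for source A.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BooksAndMusic "
             "(Title TEXT, ItemID INTEGER, ItemType TEXT, SuggestedPrice REAL)")
conn.executemany("INSERT INTO BooksAndMusic VALUES (?, ?, ?, ?)",
                 [("on the road", 123, "Books", 10.86),
                  ("on the road", 999, "Music", 7.50)])
rows = conn.execute(local_sql, ("on the road",)).fetchall()
print(local_sql)
print(rows)
```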
Issues for Query Processing
Query Translation
[Figure: run-time architecture with a wrapper highlighted, between the mediator and a data source.]
• Different query languages
Issues for Query Processing
Query Translation

Global Schema
Books(Title, ISBN, Price, DiscountPrice, Edition)

SELECT ISBN, Price
FROM Books
WHERE Title = 'on the road'

Local Source A
http://www.amazon.com/homepage.html?ItemType=Books&Title=on+the+road
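A wrapper for a web source has to express the query in the only "language" the source accepts, here a URL with query parameters. A minimal sketch, assuming the base URL and parameter names shown on the slide (they illustrate the idea and are not a real Amazon API); the `translate_to_url` helper is hypothetical.

```python
from urllib.parse import urlencode

# Base URL as it appears on the slide; used for illustration only.
BASE_URL = "http://www.amazon.com/homepage.html"

def translate_to_url(item_type: str, title: str) -> str:
    """Translate the selection conditions of the query into URL parameters."""
    # urlencode escapes the values; spaces become '+'.
    return BASE_URL + "?" + urlencode({"ItemType": item_type, "Title": title})

url = translate_to_url("Books", "on the road")
print(url)
```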
Issues for Query Processing
Data Translation
[Figure: run-time architecture with a wrapper highlighted, between the mediator and a data source.]
• Different data models
Issues for Query Processing
Data Translation

Local Result A
<table> <tr> <td> <a href=/details?isbn=123> <b>On the Road</b> </a> -- by Jack Kerouac; Paperback <br> <a href=/details?isbn=123> Buy new </a> :<b class=price>$10.86</b> </td> </tr></table>

Global Schema
Books(Title, ISBN, Price, DiscountPrice, Edition)

Title        ISBN  Price  …  …
On the Road  123   10.86  …  …
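The wrapper also translates the source's answer into tuples of the global schema. The sketch below extracts Title, ISBN and Price from the HTML fragment above with a regular expression. The pattern is tailored to this one fragment and is an assumption about the page layout; a production wrapper would need a real HTML parser and would break whenever the site changes.

```python
import re
from urllib.parse import urlparse, parse_qs

LOCAL_RESULT = (
    "<table> <tr> <td> <a href=/details?isbn=123> <b>On the Road</b> </a> "
    "-- by Jack Kerouac; Paperback <br> <a href=/details?isbn=123> Buy new </a> "
    ":<b class=price>$10.86</b> </td> </tr></table>"
)

# Layout-specific pattern: first link (carries the ISBN), bold title, bold price.
PATTERN = re.compile(
    r"<a href=(?P<href>[^>\s]+)>\s*<b>(?P<title>.*?)</b>"
    r".*?<b class=price>\$(?P<price>[\d.]+)</b>",
    re.S,
)

def translate(html: str) -> dict:
    """Turn one HTML result into a (partial) Books tuple of the global schema."""
    match = PATTERN.search(html)
    if match is None:
        raise ValueError("page layout not recognized")
    isbn = parse_qs(urlparse(match.group("href")).query)["isbn"][0]
    return {"Title": match.group("title"), "ISBN": isbn,
            "Price": float(match.group("price"))}

row = translate(LOCAL_RESULT)
print(row)
```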
Issues for Query Processing
Query Execution
[Figure: run-time architecture with the mediator's execution step and the wrappers highlighted.]
• Access as many data sources as needed
• Duplicate/redundant and irrelevant data
• Limited query capabilities
Issues for Query Processing
Limited Query Capabilities

Global Schema
Books(Title, ISBN, Price, DiscountPrice, Edition)

SELECT ISBN, Price, DiscountPrice
FROM Books
WHERE Title = 'on the road'

Local Schema A
BooksAndMusic(Title, Author, ItemID, ItemType, SuggestedPrice)
Query template:
SELECT ItemID, SuggestedPrice
FROM BooksAndMusic
WHERE Title = ?

Local Schema B
DiscountBooks(Title, Edition, ISBN, GreatPrice)
Query template:
SELECT GreatPrice
FROM DiscountBooks
WHERE ISBN = ?

A. Query sent to source A:
SELECT ItemID, SuggestedPrice
FROM BooksAndMusic
WHERE Title = 'on the road'

B. Answer from source A:
ItemID  SuggestedPrice
123     10.86

C. Query sent to source B:
SELECT GreatPrice
FROM DiscountBooks
WHERE ISBN = 123

D. Answer from source B:
GreatPrice
8.86

E. Combined answer:
ISBN  Price  DiscountPrice
123   10.86  8.86
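Because source B can only be queried by ISBN, the mediator must first ask source A and feed each returned ItemID into source B. A minimal sketch of that chaining, with each source modeled as a Python function exposing its single access pattern; the function names are hypothetical and the data is the example from the slide.

```python
# Each function models the only query form its source supports.
def source_a_by_title(title: str) -> list:
    """SELECT ItemID, SuggestedPrice FROM BooksAndMusic WHERE Title = ?"""
    data = {"on the road": [{"ItemID": 123, "SuggestedPrice": 10.86}]}
    return data.get(title, [])

def source_b_by_isbn(isbn: int) -> list:
    """SELECT GreatPrice FROM DiscountBooks WHERE ISBN = ?"""
    data = {123: [{"GreatPrice": 8.86}]}
    return data.get(isbn, [])

def answer(title: str) -> list:
    results = []
    # Steps A and B: query source A by title.
    for a_row in source_a_by_title(title):
        # Steps C and D: use each ItemID as the ISBN binding for source B.
        for b_row in source_b_by_isbn(a_row["ItemID"]):
            # Step E: assemble a tuple of the global schema.
            results.append({"ISBN": a_row["ItemID"],
                            "Price": a_row["SuggestedPrice"],
                            "DiscountPrice": b_row["GreatPrice"]})
    return results

final = answer("on the road")
print(final)
```

The order matters: the plan cannot start at source B, because the user query supplies no ISBN to bind its parameter.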
Issues for Query Processing
Query Answering
[Figure: run-time architecture with the query result returned by the mediator highlighted.]
• Combine the results and further process them if needed
• Mainly union and merge
• Inconsistencies
Issues for Query Processing
Query Answering (Union)

Result from source A
ItemID  SuggestedPrice
123     10.86

Result from source B
ISBN  GreatPrice
456   8.86

Union
ISBN  Price
123   10.86
456   8.86
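The union step can be sketched as: rename each source's columns to the global schema, then concatenate while dropping duplicates. The helper names below are hypothetical; the rows are the ones shown above.

```python
result_a = [{"ItemID": 123, "SuggestedPrice": 10.86}]
result_b = [{"ISBN": 456, "GreatPrice": 8.86}]

def to_global(row: dict, isbn_col: str, price_col: str) -> dict:
    """Rename one source row to the global (ISBN, Price) columns."""
    return {"ISBN": row[isbn_col], "Price": row[price_col]}

def union(*row_lists: list) -> list:
    """Concatenate row lists, keeping the first copy of each (ISBN, Price) pair."""
    seen, out = set(), []
    for rows in row_lists:
        for row in rows:
            key = (row["ISBN"], row["Price"])
            if key not in seen:
                seen.add(key)
                out.append(row)
    return out

unioned = union(
    [to_global(r, "ItemID", "SuggestedPrice") for r in result_a],
    [to_global(r, "ISBN", "GreatPrice") for r in result_b],
)
print(unioned)
```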
Issues for Query Processing
Query Answering (Merge)

ItemID (primary key)  Title
123                   On the Road

ISBN (primary key)  Edition  Price
123                 2nd      8.86

Merged on the primary key:
ISBN (primary key)  Title        Edition  Price
123                 On the Road  2nd      8.86
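Merging combines tuples from different sources that describe the same object, identified by the primary key. A minimal sketch, assuming that source A's ItemID and source B's ISBN identify the same book; the `merge_on_key` helper is hypothetical.

```python
result_a = [{"ItemID": 123, "Title": "On the Road"}]
result_b = [{"ISBN": 123, "Edition": "2nd", "Price": 8.86}]

def merge_on_key(rows_a: list, rows_b: list) -> list:
    """Join the two results on the key (ItemID in A, ISBN in B)."""
    b_by_isbn = {row["ISBN"]: row for row in rows_b}
    merged = []
    for a_row in rows_a:
        b_row = b_by_isbn.get(a_row["ItemID"])
        if b_row is not None:
            merged.append({"ISBN": b_row["ISBN"], "Title": a_row["Title"],
                           "Edition": b_row["Edition"], "Price": b_row["Price"]})
    return merged

merged = merge_on_key(result_a, result_b)
print(merged)
```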
Issues for Query Processing
Query Answering (Inconsistencies)

ItemID (primary key)  Title        Edition
123                   On the Road  1st

ISBN (primary key)  Edition  Price
123                 2nd      8.86

Merged on the primary key (the sources disagree on Edition):
ISBN (primary key)  Title        Edition  Price
123                 On the Road  ???      8.86
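When both sources supply the same attribute with different values, the merge has no obvious answer. The sketch below detects the conflict and reports it instead of silently picking one value. Leaving the attribute empty is one possible policy chosen for illustration; the slides do not prescribe a resolution strategy, and the `merge_with_conflicts` helper is hypothetical.

```python
result_a = {"ItemID": 123, "Title": "On the Road", "Edition": "1st"}
result_b = {"ISBN": 123, "Edition": "2nd", "Price": 8.86}

def merge_with_conflicts(a_row: dict, b_row: dict) -> tuple:
    """Merge two rows for the same key and report disagreeing attributes."""
    merged = {"ISBN": b_row["ISBN"], "Title": a_row["Title"],
              "Price": b_row["Price"]}
    conflicts = {}
    if a_row["Edition"] == b_row["Edition"]:
        merged["Edition"] = a_row["Edition"]
    else:
        merged["Edition"] = None  # the "???" in the table above
        conflicts["Edition"] = (a_row["Edition"], b_row["Edition"])
    return merged, conflicts

merged_row, conflicts = merge_with_conflicts(result_a, result_b)
print(merged_row)
print(conflicts)
```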
Peer-Based Integration
[Figure: a network of five peers, Peer 1 through Peer 5, with queries posed at individual peers.]
Peer-Based Integration
• No need for a central mediated schema
• Peers serve as mediators for other peers
• A peer can be both a server and a client
• Semantic relationships are specified locally (between small sets of peers)
• Queries are posed using the peer’s schema
• Answers come from anywhere in the system
• This is not P2P file sharing.
– Data has rich semantics