SQL: Subtleties for COUNT and JOINs OLTP, OLAP, SAP, and Data Warehouse Objectrelational Databases Data Mining
Informationslogistik Unit 8: OLTP, OLAP, SAP, Data Warehouse, and Object-relational Databases
Ronald Ortner
28. V. 2013
Outline
1 SQL: Subtleties for COUNT and JOINs
2 OLTP, OLAP, SAP, and Data Warehouse
  - OLTP and OLAP
  - SAP
  - Data Warehouse & OLAP
3 Object-relational Databases
  - An Example
  - Object-relational Databases
4 Data Mining
  - Classification
  - Association Rules
  - Clustering
Organization
- Final exam for the exercise class (UE) on June 18th: group 1: 11:30–12:30, group 2: 12:45–13:45 (?)
- Final exam for the lecture (VO) on June 25th: register via MU Online
SQL-Lesson
Today:
- Extensions I: counting with 0
- Extensions II: when to put conditions in the ON / WHERE part
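The slide only names the two topics. As a hedged illustration of what they usually refer to, the sketch below uses Python's built-in `sqlite3` module with two hypothetical tables, `customers` and `orders` (names and rows are made up). It shows how a LEFT JOIN together with COUNT on a column of the joined table yields counts of 0, and how moving a condition from ON to WHERE changes the result.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (c_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders    (o_id INTEGER PRIMARY KEY, c_id INTEGER, year INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Eve');
INSERT INTO orders    VALUES (1, 1, 2012), (2, 1, 2013), (3, 2, 2012);
""")

# Counting with 0: LEFT JOIN keeps customers without orders,
# and COUNT(o.o_id) ignores the NULLs produced for them.
all_counts = conn.execute("""
    SELECT c.name, COUNT(o.o_id)
    FROM customers c LEFT JOIN orders o ON o.c_id = c.c_id
    GROUP BY c.c_id, c.name ORDER BY c.name
""").fetchall()

# Condition in ON: orders are filtered before the outer join,
# so customers without a matching order still appear with count 0.
on_counts = conn.execute("""
    SELECT c.name, COUNT(o.o_id)
    FROM customers c LEFT JOIN orders o
         ON o.c_id = c.c_id AND o.year = 2013
    GROUP BY c.c_id, c.name ORDER BY c.name
""").fetchall()

# Condition in WHERE: rows are filtered after the join,
# so customers without a 2013 order disappear from the result.
where_counts = conn.execute("""
    SELECT c.name, COUNT(o.o_id)
    FROM customers c LEFT JOIN orders o ON o.c_id = c.c_id
    WHERE o.year = 2013
    GROUP BY c.c_id, c.name ORDER BY c.name
""").fetchall()

print(all_counts)
print(on_counts)
print(where_counts)
```

With these rows, only the WHERE variant loses the customers who have no order in 2013.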
OLTP vs. OLAP
OLTP: online transaction processing
- Database applications for ongoing work
- Examples: orders, bookings, etc.
- Current data is important
  → many updates and changes in the database
OLAP: online analytical processing
- Database applications for analysis and decision support
- Example: analysis of trends
- Historical data is important
  → lots of data, information is needed in aggregated form
SAP
SAP: software system, mainly for OLTP
SAP has three levels:
- a big relational database system in the background
- applications that work on the database system
- a graphical user interface
Access to the underlying database system:
- Some tables can also be accessed from outside SAP (using SQL). Usually only read access is sensible.
- Some other tables can be accessed only via SAP.
Writing applications with ABAP/4: access to databases with
- Native SQL (database-specific statements, passed directly to the underlying database)
- Open SQL (database-independent statements, translated by SAP's database interface)
OLTP vs. OLAP
The two workloads differ considerably: OLTP means many small updates on current data, while OLAP means large aggregating queries over historical data.
→ It is not a good idea to do OLTP and OLAP on the same database system.
Data Warehouse
Idea of a data warehouse:
- Do OLTP on the operational databases.
- Copy information from the operational databases into the data warehouse regularly (but not online!).
Database schema for a data warehouse:
- Star schema: central ‘fact’ table; the other tables are not normalized.
- Snowflake schema: central ‘fact’ table; the other tables are normalized (→ more joins necessary).
Roll Up and Drill Down
Queries on a data warehouse for analysis usually aggregate data (→ GROUP BY).
Drill down: more attributes in GROUP BY
Roll up: fewer attributes in GROUP BY
Data can be summarized in a cross table (data cube)
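A minimal sketch of roll up and drill down, using Python's built-in `sqlite3` module. The `sales` table, its columns, and its rows are hypothetical and only serve as an illustration.

```python
import sqlite3

# Hypothetical fact table; names and numbers are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, year INTEGER, product TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", 2012, "PC", 10),
        ("North", 2012, "Printer", 4),
        ("North", 2013, "PC", 7),
        ("South", 2012, "PC", 5),
        ("South", 2013, "Printer", 3),
    ],
)

# Starting point: one attribute in GROUP BY.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Drill down: one more attribute in GROUP BY gives a finer breakdown.
by_region_year = conn.execute(
    "SELECT region, year, SUM(amount) FROM sales "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()

# Roll up: fewer attributes in GROUP BY (here none) gives the grand total.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

print(by_region)
print(by_region_year)
print(total)
```

Each of the three results corresponds to one level of aggregation in the cross table.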
Relations for Aggregation & the Cube Operator
Creating the data cube:
- It is expensive to execute all the queries needed for creating the cube.
- One can store a relation for the data cube (using NULL values where aggregated).
- This is still elaborate and uncomfortable.
  → idea: new SQL operator CUBE
  Usage: GROUP BY CUBE(attr1, attr2, ...)
Other possibility:
- Store the maximally drilled-down table.
- Aggregate this table (cheaper than doing each aggregation from scratch).
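The following sketch emulates in plain Python the kind of relation that a query with GROUP BY CUBE(region, year) would return. It starts from a maximally drilled-down table and aggregates it for every subset of the grouping attributes, with `None` playing the role of NULL where an attribute has been aggregated away. The function name `cube`, the attribute names, and the numbers are assumptions for illustration, and the sketch assumes the data itself contains no `None` values.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, attrs, measure):
    """Sum `measure` for every subset of `attrs`.

    Attributes that are aggregated away are represented by None.
    """
    result = defaultdict(int)
    for k in range(len(attrs) + 1):
        for kept in combinations(attrs, k):
            for row in rows:
                key = tuple(row[a] if a in kept else None for a in attrs)
                result[key] += row[measure]
    return dict(result)


# Hypothetical maximally drilled-down table.
rows = [
    {"region": "North", "year": 2012, "amount": 14},
    {"region": "North", "year": 2013, "amount": 7},
    {"region": "South", "year": 2012, "amount": 5},
    {"region": "South", "year": 2013, "amount": 3},
]

data_cube = cube(rows, ["region", "year"], "amount")
for key, value in sorted(data_cube.items(), key=lambda kv: str(kv[0])):
    print(key, value)
```

The entry with key `(None, None)` is the grand total, and the entries with exactly one `None` are the margins of the cross table.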
Row Store vs. Column Store
Usually, tables are stored row-wise.
When there are many columns, it may be better to store column-wise:
- Most queries consider only a few columns.
- Column values can be compressed better, e.g. by using a dictionary table.
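A small sketch of the two ideas, under the assumption of a hypothetical three-column table: the rows are turned into one list per column, and a column with repeated values is compressed with a dictionary table. The helper names `dictionary_encode` and `dictionary_decode` are made up for this illustration.

```python
def dictionary_encode(column):
    """Replace each value by a small integer code; return (dictionary, codes)."""
    dictionary = []  # code -> value
    index = {}       # value -> code
    codes = []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Recover the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


# Row-wise storage: one tuple per record (hypothetical data).
rows = [(1, "Austria", 10), (2, "Germany", 7), (3, "Austria", 5), (4, "Austria", 3)]

# Column-wise storage: one list per attribute.
ids, countries, amounts = (list(col) for col in zip(*rows))

# A query on one column only touches that column.
total_amount = sum(amounts)

# Repeated values in a column compress well with a dictionary table.
dictionary, codes = dictionary_encode(countries)
print(dictionary, codes, total_amount)
```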
An Example: Storing Books
We introduce IDs for authors, books, and keywords.
Then:
- One table for each entity type (with the ID as primary key):
    authors: {[a_id, name]}
    books: {[b_id, title, publisher]}
    keywords: {[k_id, keyword]}
- One table for each relation:
    writes: {[a_id, b_id]}
    has_keyword: {[b_id, k_id, weight]}
→ We need five (!) tables for storing books (which are actually one entity).
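To make the cost of this design concrete, here is a sketch with Python's built-in `sqlite3` module that creates the five tables and collects the information about one book. The column types and all inserted rows are assumptions for illustration; only the table and column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors     (a_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books       (b_id INTEGER PRIMARY KEY, title TEXT, publisher TEXT);
CREATE TABLE keywords    (k_id INTEGER PRIMARY KEY, keyword TEXT);
CREATE TABLE writes      (a_id INTEGER REFERENCES authors(a_id),
                          b_id INTEGER REFERENCES books(b_id));
CREATE TABLE has_keyword (b_id INTEGER REFERENCES books(b_id),
                          k_id INTEGER REFERENCES keywords(k_id),
                          weight REAL);

-- Hypothetical sample data.
INSERT INTO authors     VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO books       VALUES (1, 'Some Database Book', 'Some Publisher');
INSERT INTO keywords    VALUES (1, 'SQL'), (2, 'OLAP');
INSERT INTO writes      VALUES (1, 1), (2, 1);
INSERT INTO has_keyword VALUES (1, 1, 0.8), (1, 2, 0.2);
""")

# Authors of the book: two joins.
book_authors = conn.execute("""
    SELECT b.title, a.name
    FROM books b
    JOIN writes w  ON w.b_id = b.b_id
    JOIN authors a ON a.a_id = w.a_id
    WHERE b.b_id = 1
    ORDER BY a.name
""").fetchall()

# Keywords of the book: two more joins.
book_keywords = conn.execute("""
    SELECT k.keyword, h.weight
    FROM books b
    JOIN has_keyword h ON h.b_id = b.b_id
    JOIN keywords k    ON k.k_id = h.k_id
    WHERE b.b_id = 1
    ORDER BY k.keyword
""").fetchall()

print(book_authors)
print(book_keywords)
```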
Object-oriented Extensions
→ extend relational database theory with more object-oriented ideas
Possible approaches:
- Include structure in the database system:
  → implicit joins (no join condition necessary)
- Give up first normal form (1NF) and allow structured information:
  → can store several authors/keywords in a set
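The second approach can be pictured with a plain Python object. This is only an analogy, not the syntax of any particular database system: the class `Book` and its sample values are hypothetical, and they show a single record holding set-valued attributes instead of references into separate tables.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    publisher: str
    authors: set = field(default_factory=set)      # set-valued attribute (not in 1NF)
    keywords: dict = field(default_factory=dict)   # keyword -> weight


# Hypothetical book with several authors and keywords stored in one record.
book = Book(
    "Some Database Book",
    "Some Publisher",
    authors={"Author A", "Author B"},
    keywords={"SQL": 0.8, "OLAP": 0.2},
)

# No join condition is needed: nested values are reached by navigating the object.
author_names = sorted(book.authors)
sql_weight = book.keywords["SQL"]
print(author_names, sql_weight)
```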
Object-relational Databases
In a proper object-relational database system one also has
- the possibility to define objects composed of other objects
- inheritance
- object methods
→ cf. object-oriented programming languages like Java
Data Mining
Idea:
- We have a large amount of data.
- We want to look for special patterns in that data.
We distinguish:
- Classification
- Association Rules
- Clustering
Classification
Setting:
- We want to predict a certain property.
- We look at past data (attributes) and generate a hypothesis.
Examples:
- Risk of an insurance claim (known attributes: sex, age, etc.)
- Risk that a given loan will not be paid back (known attributes: sex, age, family status, job, etc.)
- Prospect that a medical treatment is successful
- Is the incoming e-mail spam?
- Does this image contain a cat?
- Is this text/website about database design?
Machine Learning
There are various methods for learning from data to answer such questions:
- Neural Networks
- Decision Trees
- Support Vector Machines
- ...
If interested → lecture “Maschinelles Lernen” (Machine Learning) by Auer
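As a very small illustration of "look at past data and generate a hypothesis", the sketch below learns a decision tree of depth one (often called a decision stump) on a single numeric attribute. The slides do not prescribe this method; the function `train_stump`, the attribute (number of suspicious words in an e-mail), and the data are all assumptions made for the example.

```python
def train_stump(examples):
    """Choose the threshold that misclassifies the fewest past examples.

    examples: list of (value, label) pairs with label 0 or 1.
    The learned hypothesis predicts 1 if value >= threshold, else 0.
    """
    best_threshold, best_errors = None, len(examples) + 1
    for threshold in sorted({value for value, _ in examples}):
        errors = sum((value >= threshold) != bool(label) for value, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Hypothetical past data: (number of suspicious words, is_spam).
past = [(0, 0), (1, 0), (2, 0), (5, 1), (7, 1), (9, 1)]
threshold = train_stump(past)


def predict(value):
    """Apply the learned hypothesis to a new e-mail."""
    return int(value >= threshold)


print(threshold, predict(6), predict(1))
```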
Association Rules
Setting:
- We want to generate rules from a big database.
Example: If somebody buys a computer, (s)he also buys a printer.
These association rules do not always hold:
- confidence: In what percentage of cases is the rule true? (What percentage of the PC buyers bought a printer?)
- support: How many data records support the rule? (How many records indicate the purchase of a PC and a printer?)
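The two measures can be computed directly from a list of purchase records. The sketch below uses a handful of hypothetical transactions; the function names `support` and `confidence` simply mirror the terms on the slide.

```python
# Hypothetical purchase records; each record is the set of items bought together.
transactions = [
    {"pc", "printer", "paper"},
    {"pc", "printer"},
    {"pc"},
    {"printer", "paper"},
    {"pc", "printer", "toner"},
]


def support(itemset, transactions):
    """Number of records that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(left, right, transactions):
    """Fraction of the records containing `left` that also contain `right`."""
    return support(left | right, transactions) / support(left, transactions)


# Support of the rule pc => printer: records with both a PC and a printer.
s = support({"pc", "printer"}, transactions)

# Confidence of the rule pc => printer: share of PC buyers who bought a printer.
c = confidence({"pc"}, {"printer"}, transactions)

print(s, c)
```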
The A Priori Algorithm
→ We want all association rules with support $\ge s_{\min}$ and confidence $\ge c_{\min}$.
frequent itemset := set of items with support $\ge s_{\min}$
Algorithm for finding frequent itemsets:
1. Check for all single items $i$ whether $\{i\}$ is a frequent itemset.
2. Repeat: for each frequent itemset $F$ found so far and all single items $i \notin F$, check whether $F \cup \{i\}$ is a frequent itemset.
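The two steps above can be sketched as a level-wise search. One deviation from the slide is labelled in the code: itemsets are only extended by single items that are themselves frequent, which is safe because a set containing an infrequent item cannot be frequent. The function name `frequent_itemsets` and the transactions are assumptions for illustration.

```python
def frequent_itemsets(transactions, s_min):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = set().union(*transactions)

    # Step 1: check all single items.
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= s_min}
    # Assumption beyond the slide: extend only with frequent single items.
    singles = {next(iter(f)) for f in level}

    found = {}
    # Step 2: repeatedly extend each frequent itemset F by one item i not in F.
    while level:
        for f in level:
            found[f] = support(f)
        level = {
            f | {i}
            for f in level
            for i in singles - f
            if support(f | {i}) >= s_min
        }
    return found


# Hypothetical purchase records.
transactions = [
    {"pc", "printer", "paper"},
    {"pc", "printer"},
    {"pc"},
    {"printer", "paper"},
    {"pc", "printer", "toner"},
]

frequent = frequent_itemsets(transactions, s_min=2)
for itemset, count in frequent.items():
    print(sorted(itemset), count)
```

The loop terminates because the itemsets grow by one item in every round and cannot exceed the number of distinct items.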
Rules from Frequent Itemsets
When frequent itemsets are known, association rules can be derived:
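If $F$ is a frequent itemset and $F = L \cup R$ with $L \cap R = \emptyset$, then $L \Rightarrow R$ is an association rule with
$$\mathrm{confidence}(L \Rightarrow R) = \frac{\mathrm{support}(F)}{\mathrm{support}(L)}.$$
Skip rules with confidence $< c_{\min}$.
Example: If {printer, paper, toner} is a frequent itemset, then we have the association rule
$$\text{printer} \Rightarrow \text{paper, toner}$$
with confidence
$$\frac{\mathrm{support}(\{\text{printer, paper, toner}\})}{\mathrm{support}(\{\text{printer}\})}.$$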
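A sketch of this derivation step: given the supports of all frequent itemsets, every split of a frequent itemset into a left and a right side is tried, and rules below the minimum confidence are skipped. The function name `rules` and the support values are hypothetical; the sketch assumes the dictionary contains every frequent itemset, so that the support of each left side can be looked up.

```python
from itertools import combinations


def rules(supports, c_min):
    """Derive association rules (left, right, confidence) from frequent itemsets.

    supports: dict mapping every frequent itemset (frozenset) to its support.
    """
    result = []
    for f, support_f in supports.items():
        # Try every non-empty proper subset of F as the left side L; R = F - L.
        for k in range(1, len(f)):
            for left in combinations(sorted(f), k):
                left = frozenset(left)
                conf = support_f / supports[left]
                if conf >= c_min:  # skip rules with confidence < c_min
                    result.append((left, f - left, conf))
    return result


# Hypothetical supports of the frequent itemsets.
supports = {
    frozenset({"printer"}): 4,
    frozenset({"paper"}): 2,
    frozenset({"printer", "paper"}): 2,
}

found_rules = rules(supports, c_min=0.6)
for left, right, conf in found_rules:
    print(sorted(left), "=>", sorted(right), conf)
```

In this data, the rule from paper to printer is kept, while the rule from printer to paper has confidence 2/4 and is skipped.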
Increasing Confidence
Given two association rules
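$$L \Rightarrow R, \qquad L^{+} \Rightarrow R^{-}$$
for the same frequent itemset $F = L \cup R = L^{+} \cup R^{-}$ with $L \subseteq L^{+}$ and $R^{-} \subseteq R$, it holds that
$$\mathrm{confidence}(L^{+} \Rightarrow R^{-}) \ge \mathrm{confidence}(L \Rightarrow R).$$
Example:
$$\mathrm{conf}(\{\text{printer, paper}\} \Rightarrow \{\text{toner}\}) \ge \mathrm{conf}(\{\text{printer}\} \Rightarrow \{\text{paper, toner}\})$$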
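A short justification, using only the definition of confidence as support of the whole itemset divided by support of the left side: both rules have the same numerator, and every record that contains the larger left side also contains the smaller one, so the larger left side cannot have more support.

```latex
\[
L \subseteq L^{+}
\;\Longrightarrow\;
\mathrm{support}(L^{+}) \le \mathrm{support}(L)
\]
\[
\mathrm{confidence}(L^{+} \Rightarrow R^{-})
  = \frac{\mathrm{support}(F)}{\mathrm{support}(L^{+})}
  \ge \frac{\mathrm{support}(F)}{\mathrm{support}(L)}
  = \mathrm{confidence}(L \Rightarrow R)
\]
```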
Application: Recommendations
Example: recommendations at Amazon
This does not always work well:
Dear customer! Customers who were interested in “The Art of Chess Combination” by Eugene Znosko-Borovsky have ordered “Read the High Country: A Guide to Western Books and Films” by Mort. We would therefore like to inform you that “Read the High Country: A Guide to Western Books and Films” by Mort will be published shortly. Pre-order your copy now!
Clustering
Setting:
- Given: a large amount of data
- Find clusters of similar data records.
Example:
- Clusters in image data
  → images in the same cluster show similar objects
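The slides do not name a clustering method. As one common possibility, the sketch below runs k-means on a few hypothetical 2-D points: each point is assigned to its nearest center, and each center is then moved to the mean of its cluster. The function name `k_means`, the points, and the initial centers are assumptions for illustration.

```python
def k_means(points, centers, iterations=10):
    """Plain k-means on 2-D points, starting from the given initial centers."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical data records with two numeric attributes.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])

print(clusters)
print(centers)
```

Records in the same cluster are close to each other in the attribute space, which is the notion of similarity used here.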
An Example: Books that make you dumb
An Example: Music that makes you dumb
Data Privacy Protection
There may be information that you do not want to share with others (e.g. your employer, your insurance company, etc.):
- diseases you suffer from
- your shopping list (huge amounts of alcohol, regularly)
- books you read / music you listen to (see list above)
- your hobbies (parachuting)
While single pieces of data may look innocuous enough to share, their combination may contain information you would not share:
- You bought a chocolate bar yesterday morning.
- You logged into Facebook yesterday afternoon.
- You were reported sick at the same time.