C. Faloutsos 15-826
CMU SCS
15-826: Multimedia Databases and Data Mining
Lecture#1: Introduction Christos Faloutsos
CMU www.cs.cmu.edu/~christos
15-826 Copyright: C. Faloutsos (2019) 2
Outline
Goal: ‘Find similar / interesting things’
• Intro to DB
• Indexing - similarity search
• Data Mining
Problem
Given a large collection of (multimedia) records, or graphs, find similar/interesting things, i.e.:
• Allow fast, approximate queries, and
• Find rules/patterns
Q1: Applications, for ‘similar’?
Sample queries
• Similarity search
– Find pairs of branches with similar sales patterns
– Find medical cases similar to Smith's
– Find pairs of sensor series that move in sync
– Find shapes like a spark-plug
– (nn: ‘case-based reasoning’)
[Figure: stock prices for Alcoa, American Express, Boeing, Citi Group, …]
Q1: Examples, for ‘interesting’?
[Figure: time-series plot; legend: actual, mean, mean+freq12]
Sample queries –cont’d
• Rule discovery
– Clusters (of branches; of sensor data; ...)
– Forecasting (total sales for next year?)
– Outliers (e.g., unexpected part failures; fraud detection)
Example:
‘YahooWeb graph’: ~1B nodes (web sites), ~6B edges (http links)
U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. Net-Ray: Visualizing and Mining Billion-Scale Graphs. PAKDD 2014, Tainan, Taiwan.
Important Observation:
Finding similar things and finding interesting things are related:
- Similar things ->
  - clusters/patterns
  - outliers
- Similar past waves -> forecasting
[Figure: time-series plot; legend: actual, mean, mean+freq12]
Outline
Goal: ‘Find similar / interesting things’
• (crash) intro to DB
• Indexing - similarity search
• Data Mining
Detailed Outline
Intro to DB
• Relational DBMS - what and why?
– inserting, retrieving and summarizing data
– (views; security/privacy)
– (concurrency control and recovery)
How do DBs work?
We use sqlite3 as an example, from http://www.sqlite.org
linux% sqlite3 mydb # mydb: file
sqlite> create table student ( ssn fixed, name char(20) );
student
ssn  name
sqlite> insert into student values (123, 'Smith');
sqlite> select * from student;
student
ssn  name
123  Smith
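The same create / insert / select steps can also be driven from a program. The following is a minimal sketch using Python's standard sqlite3 module; the in-memory database (instead of the mydb file) and the use of `?` placeholders are choices made here for illustration, not something the slides prescribe.

```python
import sqlite3

# In-memory database, so nothing is written to disk
# (the slides use a file called mydb instead).
conn = sqlite3.connect(":memory:")

# Same schema as on the slide.
conn.execute("create table student (ssn fixed, name char(20))")

# Parameterized insert: the ? placeholders are filled from the tuple.
conn.execute("insert into student values (?, ?)", (123, "Smith"))

# Retrieve everything; each row comes back as a Python tuple.
rows = conn.execute("select * from student").fetchall()
print(rows)

conn.close()
```

Placeholders avoid the quoting problems that appear when SQL strings are pasted together by hand.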
sqlite> create table takes ( ssn fixed, c_id char(5), grade fixed );
takes
ssn  c_id  grade
How do DBs work - cont’d
More than one table - joins
student
ssn  name
takes
ssn  c_id  grade
sqlite> select name from student, takes where student.ssn = takes.ssn and takes.c_id = '15826';
Q: What does this do?
A: class roster
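To see the roster query produce an answer, here is a small sketch in Python's sqlite3 module. The table definitions follow the slides; the sample rows (a second student 'Jones', course '15415') are hypothetical and exist only to show that the join filters out students who are not enrolled in 15826.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (ssn fixed, name char(20))")
conn.execute("create table takes (ssn fixed, c_id char(5), grade fixed)")

# Hypothetical sample rows, for illustration only.
conn.executemany("insert into student values (?, ?)",
                 [(123, "Smith"), (234, "Jones")])
conn.executemany("insert into takes values (?, ?, ?)",
                 [(123, "15826", 4), (123, "15415", 3), (234, "15415", 3)])

# The join from the slide: match student and takes rows on ssn,
# then keep only the rows for course 15826.
roster = conn.execute(
    "select name from student, takes "
    "where student.ssn = takes.ssn and takes.c_id = ? "
    "order by name",
    ("15826",),
).fetchall()
print(roster)

conn.close()
```

Only Smith has a takes row for 15826 in this sample data, so only Smith appears in the roster.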
SQL-DML
General form:
select a1, a2, … an
from r1, r2, … rm
where P
[group by …]
[having …]
[order by …]
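A sketch of one query that uses every clause of the general form, run through Python's sqlite3 module. The three takes rows are the ones shown on the aggregation slides; the specific predicates (grade >= 3, at least two courses) are assumptions picked only to exercise each clause. Note that SQL expects group by and having before order by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table takes (ssn fixed, c_id char(5), grade fixed)")
conn.executemany("insert into takes values (?, ?, ?)",
                 [(123, "603", 4), (123, "412", 3), (234, "603", 3)])

# where    : filter individual rows
# group by : form one group per student
# having   : filter whole groups (here: students with >= 2 courses)
# order by : sort the final result
result = conn.execute(
    "select ssn, avg(grade) "
    "from takes "
    "where grade >= 3 "
    "group by ssn "
    "having count(*) >= 2 "
    "order by ssn"
).fetchall()
print(result)

conn.close()
```

Student 234 has a single course, so the having clause drops that group and only student 123 remains.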
Aggregation
Find ssn and GPA for each student
student
ssn  name
takes
ssn  c_id  grade
123  603   4
123  412   3
234  603   3
How many lines of python/C++/Java code?
sqlite> select ssn, avg(grade) from takes group by ssn;
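To make the "how many lines of code?" comparison concrete, the sketch below computes the per-student GPA twice: once with the one-line group by query, and once with a hand-written Python loop. The three rows are the ones on the slide; the variable names are made up for this example.

```python
import sqlite3
from collections import defaultdict

# The three takes rows shown on the slide: (ssn, c_id, grade).
rows = [(123, "603", 4), (123, "412", 3), (234, "603", 3)]

# Version 1: let the DBMS do the grouping and averaging.
conn = sqlite3.connect(":memory:")
conn.execute("create table takes (ssn fixed, c_id char(5), grade fixed)")
conn.executemany("insert into takes values (?, ?, ?)", rows)
gpa_sql = dict(conn.execute("select ssn, avg(grade) from takes group by ssn"))
conn.close()

# Version 2: the same computation written by hand.
totals = defaultdict(lambda: [0, 0])  # ssn -> [sum of grades, course count]
for ssn, _c_id, grade in rows:
    totals[ssn][0] += grade
    totals[ssn][1] += 1
gpa_manual = {ssn: total / count for ssn, (total, count) in totals.items()}

print(gpa_sql)
print(gpa_manual)
```

Both versions give the same averages; the SQL version states what is wanted and leaves the bookkeeping to the DBMS.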
Detailed Outline
Intro to DB
• Relational DBMS - what and why?
– inserting, retrieving and summarizing data
– views; security/privacy
– (concurrency control and recovery)
• What if slow?
• Conclusions
What if slow?
sqlite> select * from irs_table where ssn='123';
Q: What to do, if it takes 2 hours?
A: build an index
Q’: on what attribute? A: ssn
Q’’: what syntax? A: create index
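A sketch of the create index step in Python's sqlite3 module. The slides do not give the layout of irs_table, so the two columns, the generated rows, and the index name idx_irs_ssn are all assumptions. SQLite's `explain query plan` is used to compare the access path before and after the index is built; the exact wording of the plan text varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout and contents for irs_table.
conn.execute("create table irs_table (ssn char(9), name char(20))")
conn.executemany("insert into irs_table values (?, ?)",
                 [(str(i), f"person{i}") for i in range(1000)])

query = "select * from irs_table where ssn = '123'"

def plan(connection, sql):
    """Return the human-readable plan text (last column of each plan row)."""
    plan_rows = connection.execute("explain query plan " + sql).fetchall()
    return " | ".join(row[-1] for row in plan_rows)

plan_before = plan(conn, query)  # no index yet: expect a full table scan

# Build the index on the attribute used in the where clause.
conn.execute("create index idx_irs_ssn on irs_table (ssn)")

plan_after = plan(conn, query)   # expect a search that names idx_irs_ssn
result = conn.execute(query).fetchall()

print(plan_before)
print(plan_after)
print(result)

conn.close()
```

The query text is unchanged; only the access path the DBMS picks is different once the index exists.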
What if slow - #2?
sqlite> create table friends (p1, p2);
Q: Facebook-style: find the 2-step-away people
sqlite> select f1.p1, f2.p2 from friends f1, friends f2 where f1.p2 = f2.p1;
[Figure: joined table with columns f1.p1, f1.p2, f2.p1, f2.p2]
Q: too slow – now what?
A: ‘explain’:
sqlite> explain select ….
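A sketch of the 2-step-away self-join in Python's sqlite3 module, followed by a look at the query plan. The three edges and the index name idx_friends_p1 are hypothetical. In SQLite, plain `explain` lists low-level bytecode, while `explain query plan` gives a short summary of scans and index use, which is what the sketch prints. Whether the planner actually uses the index is its own decision, so no particular plan text is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table friends (p1, p2)")

# Hypothetical edges: 1 -> 2, 2 -> 3, 2 -> 4.
conn.executemany("insert into friends values (?, ?)",
                 [(1, 2), (2, 3), (2, 4)])

# Self-join from the slide: f1 is the first hop, f2 the second hop.
two_step = ("select f1.p1, f2.p2 from friends f1, friends f2 "
            "where f1.p2 = f2.p1")

pairs = sorted(conn.execute(two_step).fetchall())
print(pairs)

# An index on the join attribute of the second hop is one thing to try.
conn.execute("create index idx_friends_p1 on friends (p1)")

# Ask SQLite how it intends to run the join.
plan_rows = conn.execute("explain query plan " + two_step).fetchall()
plan_text = " | ".join(row[-1] for row in plan_rows)
print(plan_text)

conn.close()
```

With these edges, person 1 reaches persons 3 and 4 in two steps.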
Long answer:
• Check the query optimizer (see, say, Ramakrishnan + Gehrke, 3rd edition, chapter 15):
Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill 2002 (3rd ed).
Conclusions
• (relational) DBMSs: electronic record keepers
• customize them with create table commands
• ask SQL queries to retrieve info
Conclusions cont’d
Data mining practitioner’s guide:
• group by + aggregates
• If a query runs slowly:
– explain select – to see what happens
– create index – often speeds up queries
For more info:
• Sqlite3: www.sqlite.org - @ linux.andrew
• Ramakrishnan + Gehrke, 3rd edition
• 15-415/615 web page, e.g.,
– http://www.cs.cmu.edu/~christos/courses/dbms.F16
We assume known:
• B-tree indices
– www.cs.cmu.edu/~christos/courses/826.F19/FOILS-pdf/020_b-trees.pdf
• Hashing
– www.cs.cmu.edu/~christos/courses/826.F19/FOILS-pdf/030_hashing.pdf
• (also, [Ramakrishnan+Gehrke, ch. 10, ch. 11])