Advanced Topics in Databases
2019/April/05, Otto-von-Guericke University of Magdeburg
Gunter Saake
David Broneske, Gabriel Campero Durand, Bala Gurumurthy, Andreas Meister, Marcus Pinnecke, Roman Zoun
Gunter Saake | Introduction
Aim of the Course
● Familiarize students with current developments in database research
● Topics chosen:
○ First solutions currently making their way into database management systems and applications → practical relevance
○ Solutions not yet fully developed and where open problems still exist → research relevance
● Possible starting points for scientific work, e.g. master thesis, position in academia, Ph.D. thesis, industry R&D, etc.
Audience & Prerequisites
What you should know already
● Database introductory course (e.g., Database Concepts)
● Recommended: Database implementation techniques
What you’ll learn in this lecture
● Impact of modern hardware on main-memory database systems
○ Database operators
○ Query optimization
○ Index structures
● HTAP database management systems
● AI techniques for data management
● Analytics in document-stores
PART I
Motivation for this Course
Yesterday’s DBMS Landscape
Yesterday’s DBMS Hardware
● Small main memory (picture taken from [1])
● Disk-based systems (picture taken from [2])
Assumptions of Yesterday’s DBMSs
● Capacity of main memory <1% of the stored data
● Fixed block size based on the transfer unit between disks and main memory
● Central scheduler to schedule transactions
● No redundant data storage in main memory
● Pipelining is always beneficial (no storage of intermediate results)
● Compiling of SQL for one processor architecture → Reuse of compiled plan
Today’s Hot Topics
Today’s DBMS Infrastructure
● Large-scale query/data flow engines
● Stream-based query engines
● In-Memory Storage
● MPP DBs, cloud EDWs, GPU DBs
● NewSQL: Large-scale OLTP and HTAP DBs
● NoSQL: Column-families, graph data, key-value stores, documents, time series, etc.
● Specialized data transformation & integration tools
Today’s DBMS Analytics
● Statistical analysis and Data science workloads backed by DBs
● Interactive visual data exploration & BI tools
● Specialized ML systems with their own data solutions
● Search engines
● Web, Commerce, Social and Log analytics
● Speech and NLP
Today’s DBMS Hardware
● Large main memory (picture taken from [1])
● Solid state disks (picture taken from [4])
● Co-processors (picture taken from [5])
● Multi-core CPUs (picture taken from [3])
Future DBMSs
Yesterday’s assumptions (●) and what replaces or challenges them (○):
● Capacity of main memory <1% of the stored data
○ DB in main memory
● Fixed block size based on the transfer unit
○ Direct access of data on all devices
● Central scheduler to schedule transactions
○ Which processor should do the job?
● No redundant data storage in main memory
○ Redundant data at co-processors
● Pipelining is always beneficial
○ Co-processors like GPUs support massive parallelism
● Reuse of compiled plan
○ Load-balancing between co-processors requires different plans
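The pipelining assumption can be illustrated with a minimal Python sketch. It contrasts pipelined execution, where each tuple flows through all operators and no intermediate result is stored, with operator-at-a-time execution, where every operator materializes its full output before the next one starts (a bulk style that fits massively parallel devices such as GPUs more naturally). The operator functions, the `ITEMS` table, and the price predicate are assumptions made for illustration and are not part of the lecture material.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]

# Hypothetical sample table, used only for this illustration.
ITEMS: List[Row] = [
    {"id": 1, "price": 5},
    {"id": 2, "price": 20},
    {"id": 3, "price": 15},
]


def scan(table: List[Row]) -> Iterator[Row]:
    """Pipelined scan: hands out one tuple at a time."""
    for row in table:
        yield row


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pipelined selection: forwards matching tuples immediately."""
    for row in rows:
        if pred(row):
            yield row


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Pipelined projection: keeps only the requested columns."""
    for row in rows:
        yield {c: row[c] for c in columns}


def run_pipelined(table: List[Row]) -> List[Row]:
    """Tuple-at-a-time: no intermediate result is ever stored."""
    plan = project(select(scan(table), lambda r: r["price"] > 10), ["id"])
    return list(plan)


def run_materialized(table: List[Row]) -> List[Row]:
    """Operator-at-a-time: each step stores its complete output."""
    scanned = list(table)                                # intermediate result 1
    selected = [r for r in scanned if r["price"] > 10]   # intermediate result 2
    return [{"id": r["id"]} for r in selected]


if __name__ == "__main__":
    print(run_pipelined(ITEMS))
    print(run_materialized(ITEMS))
```

Both variants return the same rows. They differ only in whether intermediate results exist in memory, which is the trade-off the slide points at.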
The Goals of a “Databaser”
● Performance
● Performance
● Performance
How can we achieve more performance?
Picture taken from [6]
Are DBMSs written for yesterday’s hardware efficient on today’s hardware as well?
“30 years of Moore’s law has antiquated the disk-oriented relational architecture for OLTP applications” [Stonebraker et al.]
Data Access – Yesterday’s Bottleneck
Data Access – Today’s Bottleneck
The World of Co-Processors
Picture taken from [7]
What do we have to change in DBMSs’ architecture to exploit new hardware capabilities and to meet tomorrow’s challenges and applications?
PART I
Topic Outline
Topic Categorization
Chapter 1: Main-Memory Database Systems
2019/April/05
● Computer and Database Systems Architecture: Changes in hardware and their implications for database systems
● Cache Awareness: How do caches work and how to optimize for them?
● Processing Models: How do database systems execute an operation on a number of tuples?
● Storage Models: How to store a two-dimensional table in one-dimensional memory?
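The storage-model question can be made concrete with a small Python sketch. It linearizes the same table once row by row (row store) and once column by column (column store), and shows how a cell position maps to an offset in each layout. The function names and the sample table are assumptions chosen for illustration; real systems work on typed byte arrays rather than Python lists.

```python
from typing import Any, List, Sequence

# Hypothetical sample table: one tuple per inner list (id, name, age).
TABLE: List[List[Any]] = [
    [1, "Ann", 30],
    [2, "Bob", 25],
]


def to_row_store(table: Sequence[Sequence[Any]]) -> List[Any]:
    """Row store: all attributes of one tuple are adjacent in memory."""
    return [value for row in table for value in row]


def to_column_store(table: Sequence[Sequence[Any]]) -> List[Any]:
    """Column store: all values of one attribute are adjacent in memory."""
    n_cols = len(table[0]) if table else 0
    return [row[c] for c in range(n_cols) for row in table]


def row_store_offset(row: int, col: int, n_cols: int) -> int:
    """Offset of cell (row, col) in the row-wise layout."""
    return row * n_cols + col


def column_store_offset(row: int, col: int, n_rows: int) -> int:
    """Offset of cell (row, col) in the column-wise layout."""
    return col * n_rows + row


if __name__ == "__main__":
    rows = to_row_store(TABLE)
    cols = to_column_store(TABLE)
    print(rows)  # tuples stay together
    print(cols)  # attribute values stay together
    # Both layouts address the same cell, just at different offsets.
    print(rows[row_store_offset(1, 2, 3)], cols[column_store_offset(1, 2, 2)])
```

A scan over a single attribute touches a contiguous region in the column-wise layout, while reconstructing a whole tuple touches a contiguous region in the row-wise layout.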
Chapter 2: Parallel Join Ordering
2019/April/26
● Query Processing: Overview of the process of query processing
● Join ordering: Overview of join ordering
● Dynamic programming for join ordering: Discussion about sequential and dynamic programming variants
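A minimal sequential sketch of dynamic programming for join ordering, in Python, may help fix the idea: the best plan for a set of relations is assembled from the best plans of its subsets. This is not the lecture's algorithm or its parallel variant. The cost function (sum of estimated intermediate result sizes), the permission of cross products, and the sample cardinalities and selectivities in `CARD` and `SEL` are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple, Union

Plan = Union[str, tuple]  # a relation name or a (left, right) join pair

# Hypothetical statistics: base table sizes and join selectivities.
CARD: Dict[str, float] = {"A": 1000.0, "B": 100.0, "C": 10.0}
SEL: Dict[Tuple[str, str], float] = {("A", "B"): 0.01, ("B", "C"): 0.1}


def cardinality(rels: FrozenSet[str],
                card: Dict[str, float],
                sel: Dict[Tuple[str, str], float]) -> float:
    """Estimated result size: product of base sizes and applicable selectivities."""
    size = 1.0
    for r in rels:
        size *= card[r]
    for (a, b), s in sel.items():
        if a in rels and b in rels:
            size *= s
    return size


def best_join_order(card: Dict[str, float],
                    sel: Dict[Tuple[str, str], float]) -> Tuple[float, Plan]:
    """Return (cost, plan) of the cheapest bushy join tree over all relations."""
    names = sorted(card)
    # Base case: a single relation costs nothing to "join".
    best: Dict[FrozenSet[str], Tuple[float, Plan]] = {
        frozenset([r]): (0.0, r) for r in names
    }
    # Build optimal plans for subsets of growing size.
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            s = frozenset(subset)
            out = cardinality(s, card, sel)
            # Try every split of the subset into two non-empty parts.
            for i in range(1, k):
                for left in combinations(subset, i):
                    l = frozenset(left)
                    r = s - l
                    cost = best[l][0] + best[r][0] + out
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, (best[l][1], best[r][1]))
    return best[frozenset(names)]


if __name__ == "__main__":
    print(best_join_order(CARD, SEL))
```

With the sample statistics, joining B and C first keeps the intermediate result small, so the sketch picks the plan `("A", ("B", "C"))`. The table `best` is what makes this dynamic programming: each subset is solved once and reused by all larger subsets.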
Chapter 3: Hardware-Sensitive DBMS Operations
2019/May/10, 2019/May/17
● Hardware in DBMS: Overview of the different eras of hardware evolution and their capabilities
● CPU Code Optimization: Introduction to implementing hardware-sensitive DBMS operations
● GPU Accelerated Processing: Introduction to GPU architecture and kernel-based execution
Chapter 4: Index Structures for Main-Memory Database Systems
2019/May/24
● Query Processing Basics: Recap of the query optimizer and selections
● Accelerated Full-Table Scans: Tuning scans to the underlying hardware
● Tree-Based Index Structures for Main Memory: Hardware-sensitive tree-based index structures optimized for SIMD and cache consciousness
Chapter 5: HTAP Data Management
TBD
● DBMS Design for Main-Memory OLTP: Overview of organization choices, OLTP indexes, versioning
● Design Choices for HTAP: How do HTAP systems balance OLAP and OLTP designs? Illustrations from production DBMSs
● Beyond Static HTAP Designs: How can databases automatically adapt to shifting workloads?
Chapter 6: Physical Design for Document Store Analytics
2019/June/07, 2019/June/14, 2019/June/21
● Document Data Model and Document Stores: Get in touch with JSON, MongoDB, CouchDB, and what it means
● Document Store Storage Engine Internals: MongoDB/WiredTiger & CouchDB storage internals incl. records
● Columnar Binary-Encoded JSON (Carbon) Archives: Get conceptual (and low-level technical) insights into our research
● Overview of Current State and Your Points to Join: Get an overview of open projects (thesis, individual projects, ...)
Chapter 7: AI Techniques for Data Management
TBD
● How can developments from ML (machine learning) be used for next-gen database optimization problems?
○ Introduction to the nascent field of ML for data management
○ Overview of core problems being tackled
○ Examples of applications
● Background on ML techniques gaining interest
○ Introduction to deep reinforcement learning
PART I
Organization
Organization
13 Lectures (each with an exercise sheet)
● New exercise sheets: on Friday
● Exercises run from 2019/April/10 to 2019/July/03
12 Exercise Sheets
● Registration to tutorials: Groups of 4 students until 2019/April/12
● We expect you to be prepared before a tutorial starts.
Questions
● Ask your fellow students first > then your tutor > then the main organizer > then the professor
Points & Assignments
● Exercises are optional, but recommended for being successful in the exam
○ Presenting task by task
○ Discussing student solutions and alternative solutions
○ Short introductory exercise on 2019/April/10
● Each student team has to submit and successfully solve 2 out of 4 programming tasks
● Programming tasks will be presented at the end of April (including registration for them)
● Limited number of teams per task!
● Final submission: 2019/July/05
Programming Tasks
1. Extending Main-Memory Index Structures with Special Selection Capabilities: C/C++ framework
2. Improving a Deep Reinforcement Learning Index Advisor: Horizon Framework for Deep Reinforcement Learning, PostgreSQL
3. Single Column Selection in an Interpretation-Based System: C/C++ framework
4. Accelerating Analytics in CARBON: ANSI C, CARBON Framework
Additional Material
Elf code repository
● Our main-memory index structure for multi-column selection predicates
● https://git.iti.cs.ovgu.de/dbronesk/ICDE-elf
Libcarbon code repository
● A C library for creating, modifying and querying Columnar Binary-Encoded JSON (Carbon) files
● http://github.com/protolabs/libcarbon
Web Resources
● [1] http://commons.wikimedia.org/wiki/File:RAM_module_SDRAM_1GiB.jpg
● [2] http://commons.wikimedia.org/wiki/File:Hard_disks.jpg
● [3] http://www.flickr.com/photos/25757823@N07/2719552544
● [4] http://commons.wikimedia.org/wiki/File:Super_Talent_2.5in_SATA_SSD_SAM64GM25S.jpg
● [5] http://commons.wikimedia.org/wiki/File:Gtx260.jpg
● [6] http://commons.wikimedia.org/wiki/File:Travis_Race_car.jpg
● [7] http://www.flickr.com/photos/denieseclariz/7412854696
Summary
Andreas Meister
http://www.dbse.ovgu.de/Lehre/[email protected]
Have Fun and Good Luck!