Foundations of Database Systems
Class Introduction
G. Green
Agenda
• Introductions
• Seating Chart
• Course Overview
• Syllabus
• Case
• Database Development Overview
Foundations of Database Systems
Objectives
• Understand data-related activities of SDLC
• Implement data modeling, database design, and database implementation techniques
› CASE (Visio)
› Database (SQL Server)
Course Contents
• Lectures, Examples, In-Class Exercises
• Individual Assignments (3)
• Team Project* (3 parts)
• Quizzes (3)
• Exams (2)
*Can request teammates; see syllabus for Team Preferences deadline
Research
• Service Learning & Kolb’s Learning Cycle
› International and US
• Periodic Assessments
› Some NOT graded; others are
Learning
Participate:
› Prepare (read & reread book, notes) for each class
› Attend, listen, be attentive, engaged
› Ask and answer questions, & add to discussion
› Do each assignment completely & in a timely and professional manner
Take PLENTY of notes in class:
› Do NOT just rely on PowerPoint
Explore:
› Go beyond classroom material
Class Resources
Syllabus/Schedule, Grades, Attendance: http://canvas.baylor.edu
› Schedule also contains links to all lecture slides, study guides, assignments and project write-ups
Other Resources: http://blogs.baylor.edu/gina_green/mis-4340-resources/
› NOTE: the syllabus/schedule on this website will NOT contain the links described above
Syllabus…
Introduction to Databases
Chapter 1
Topics
• Chapter 1
› The Database Environment
› Database Development Process
• Chapter 9 (Pages 409 – 410)
› Big Data
• Chapter 10 (Pages 444 – 445, 446 – 447)
› Master Data Management
› Data Federation
• Chapter 11 (Pages 464 – 472, 486, 499 – 506)
› Database Personnel
› Metadata Management (e.g., Data Dictionaries)
› Backup Facilities
› Overview of Tuning the Database for Performance
Evolution of Database Technologies
[Timeline figure, 1960’s through 2000+: Traditional Files, Hierarchical, Network, Relational, Object, Object-Relational, Federated, MDDB, XML, NoSQL, …]
Figure 1-3 Old file processing systems: Example
Duplicate Data
Traditional File Processing Environment
Disadvantages:
› Program-data dependence = “structural” & “data”
› Limited data sharing = “islands of automation”
› Duplication of data = “redundancy”
› Lengthy development times
› Excessive program maintenance
The Database Environment
Advantages of Databases
• Program-data independence
• Improved data sharing
• Minimal data redundancy
• Improved data accessibility/responsiveness
• Improved data consistency
• Faster application development
• Enforcement of standards
• Improved data quality
• Reduced program maintenance
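As a rough illustration of program-data independence, the sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption; the course itself uses SQL Server). The customer table, its columns, and the customer_names function are hypothetical. Because the program asks for data by column name, a later structural change to the table does not break it.

```python
import sqlite3


def customer_names(conn):
    # The program requests data by column name, not by position in a file record.
    rows = conn.execute("SELECT name FROM customer ORDER BY customer_id")
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (customer_id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob")],
)

before = customer_names(conn)

# Structural change: a new column is added to the table.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")

# The same, unchanged program still works after the change.
after = customer_names(conn)
print(before == after)
```

In a traditional file-processing program, a comparable change to the record layout would typically require editing every program that reads the file.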
Data and Database Administration
Chapter 11
Traditional Administration Definitions
Data Administration: A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards
Database Administration: A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery
Data People Involved in SDLC
• Data Administrators
› Data(base) Analysts/Designers: requirements elicitation, design
› Business (Intelligence) Analyst: BI requirements, design
› Data Architects: strategy, governance
› Data Stewards: quality, metadata, MDM
› Business Analytics Engineer: data analytics, statistics, mining
› Data Mining Engineer; Big Data Engineer; Data Scientist …: “big data” specialists
• Database Administrators
› (System) DBAs: implementation/maintenance
› Application DBAs
› Procedural DBAs: stored code
› e-DBAs: web-enabled DBMSs
• Data Warehouse Administrators: ETL, DW implementation
Growing Skillset
• Relational database design, implementation
• Database programming
• ETL (extract, transform, load)
• Data warehousing design (star schema) and implementation (MDDB)
• Data analysis, reporting, and mining techniques
• Cloud database implementations
• Statistical modeling with tools such as R, SAS, or SPSS
• Data visualization tools
• Technologies for structured and unstructured data
• Hadoop (an Apache project that provides an open-source implementation of frameworks for reliable, scalable, distributed computing and data storage)
• NoSQL
• "NewSQL"
***See Big Data University for (mostly) free self-study training
Data Quality and Integration
Chapter 10
Metadata Management
• System Catalog
› Part of DBMS
› "Active" dictionary
• Data Dictionary
› Typically "passive"
› Extension of catalog metadata
• Information Repository (e.g., IRDS)
› Standards for data dictionaries
› Integrates dictionaries
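To make the idea of an "active" system catalog concrete, here is a minimal sketch that reads catalog metadata from SQLite through Python's sqlite3 module. SQLite is an assumption chosen because it ships with Python (SQL Server exposes its own catalog views), and the student table is hypothetical. The DBMS maintains the catalog itself: creating the table is enough to make it appear there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# sqlite_master is SQLite's catalog table; the DBMS updates it automatically
# whenever a table, view, or index is created.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(tables)
print(columns)
```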
Master Data Management
• "Ensuring the currency, meaning, and quality of reference data within and across various subject areas" (pg 444)
• Identify
› Common Data Subjects
› Common Data Elements
› Sources of "the truth"
• Cleanse
• Update applications to reference Master Data repository
• Ensures consistency of key data (not ALL data) throughout organization
Database Development Process
Systems Development Life Cycle
DB Activities in SDLC
SDLC for this class
SDLC phases: Planning, Analysis, Design, Implementation
DB activities:
• Enterprise Modeling*
• DB Scope, Requirements (Conceptual Data Model)
• DB Design (Logical DB Design)
• DB Design (Physical DB Design)
• DB Implementation (Load, Test, Eval, Op)
• DB Maintenance*
Enterprise Data Modeling
• Determine organizational data requirements
• Build enterprise data model
› outcome is a very high-level Entity-Relationship Diagram
› see: http://da.ks.gov/kito/ITPlans/data_maps06.ppt and http://www.tdan.com/view-articles/5205
Source: http://www.tdan.com/view-articles/5205
Conceptual Data Modeling
• Determine user data requirements
• Determine business rules
• Build conceptual data model
› outcome is an Entity-Relationship Diagram (conceptual schema)
Logical Database Design
• Select database model
› e.g., the Relational Model
• Transform conceptual (ERD) into logical (relational) data model
• Normalize data structures
› outcome is normalized, relational tables
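The normalization step can be sketched with a small, made-up example. The flat order rows below repeat the customer name on every order; splitting them into a customer table and a customer_order table stores each customer once. Table names, column names, and data are all hypothetical, and SQLite (via Python's sqlite3 module) is used only as a convenient stand-in for SQL Server.

```python
import sqlite3

# Unnormalized rows: (order_id, customer_id, customer_name, product).
# The customer name is duplicated for every order that customer places.
flat_rows = [
    (1, 10, "Ann", "Widget"),
    (2, 10, "Ann", "Gadget"),
    (3, 11, "Bob", "Widget"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        product     TEXT NOT NULL
    );
    """
)

for order_id, customer_id, name, product in flat_rows:
    # Each customer is stored once; repeat occurrences are skipped.
    conn.execute("INSERT OR IGNORE INTO customer VALUES (?, ?)", (customer_id, name))
    conn.execute(
        "INSERT INTO customer_order VALUES (?, ?, ?)",
        (order_id, customer_id, product),
    )

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM customer_order").fetchone()[0]
print(customer_count, order_count)
```

After the split, changing a customer's name means updating one row in customer rather than every order row that mentions it.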
Physical Database Design
• Select database product (e.g., SQL Server)
• Select storage device(s)
• Design fields, records, files (physical schema)
› outcomes are detailed, physical definitions for:
› fields (data dictionary)
› records (space requirements for physical structures)*
› files (access methods)
*Will not do in this class
Database Implementation
• Create database file/table structures
• Create views (external schema)
• Establish access rights
• Load test data
• Write/test programs that process data
• Install database (with production data) into production operations
› outcomes are secured database tables loaded with data
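The implementation steps above can be walked through in miniature. This is a sketch only: it uses SQLite through Python's sqlite3 module rather than SQL Server, and the employee table, employee_directory view, and test rows are hypothetical. SQLite has no GRANT/REVOKE statements, so the access-rights step appears only as a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Create table structures.
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        salary      REAL NOT NULL
    )
    """
)

# 2. Create a view (external schema) that hides the salary column.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT employee_id, name FROM employee"
)

# 3. Establish access rights: not shown, because SQLite has no GRANT/REVOKE.
#    A server DBMS would grant users access to the view rather than the table.

# 4. Load test data.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", 50000.0), (2, "Bob", 42000.0)],
)

# 5. A small program that processes data through the view.
cursor = conn.execute(
    "SELECT * FROM employee_directory ORDER BY employee_id"
)
view_columns = [col[0] for col in cursor.description]
directory = cursor.fetchall()
print(view_columns, directory)
```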
Database Maintenance
• Maintain database structures
• Storage/space management
• Performance, tuning
› I/O Contention
› CPU Usage
› Application Tuning
• Data availability
• DBMS upgrades, "fixes"
• Backup, recovery …
Database Maintenance, cont…
• Backup
› Full
› Incremental
› Differential
• Business Continuity
• Data Replication ("fallback")
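The three backup types differ in which changes they capture. Under the usual definitions, a full backup copies everything, a differential backup copies what changed since the last full backup, and an incremental backup copies what changed since the last backup of any kind. The sketch below illustrates only that selection rule; the function name, file names, and timestamps are hypothetical, and a real DBMS backup works on pages or log records rather than a dictionary of files.

```python
def files_to_back_up(kind, modified_at, last_full, last_any):
    """Return the sorted names of files a backup of the given kind would copy.

    modified_at: dict mapping file name -> last-modified time (a number)
    last_full:   time of the most recent full backup
    last_any:    time of the most recent backup of any kind
    """
    if kind == "full":
        cutoff = None  # copy everything
    elif kind == "differential":
        cutoff = last_full  # changes since the last full backup
    elif kind == "incremental":
        cutoff = last_any  # changes since the last backup of any kind
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(
        name for name, t in modified_at.items() if cutoff is None or t > cutoff
    )


# Hypothetical state: full backup at time 10, an incremental backup at time 15.
modified_at = {"orders.dat": 5, "customers.dat": 12, "products.dat": 18}

full = files_to_back_up("full", modified_at, last_full=10, last_any=15)
differential = files_to_back_up("differential", modified_at, last_full=10, last_any=15)
incremental = files_to_back_up("incremental", modified_at, last_full=10, last_any=15)
print(full, differential, incremental)
```

The trade-off is between backup size and restore effort: incremental backups are the smallest, but a restore needs the last full backup plus every incremental since, whereas a differential restore needs only the last full backup and the latest differential.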
Data and Database Administration
Chapter 11
Cloud Computing
• Business Model
› Computing resources on demand
› Need-based architectures
› Internet-based delivery
› Pay as you go
• History (VERY high-level and approximate)
[Timeline figure, 50's through 2000's: Time-sharing, Virtual Machines, Utility Computing, WWW, Personal Computers, Grid Computing, Cloud Computing]
Cloud Computing Services
• Impacts to Data(base) Administration
› See textbook page 469
Summary
• Evolution of Data Management
› Disadvantages of file processing
• Database Concepts
› Components of a DBMS Environment
› Database Advantages
• Database Development:
› Overall SDLC
› Database Activities in the SDLC
• Data Models/Schemas
› What they represent
• People Involved in SDLC (esp. DB)
› Traditional job divisions and responsibilities
› Newer job titles