Introduction to Databases
CSC343 Winter 2018
Michael Liut ([email protected])
Department of Mathematical and Computational Sciences
University of Toronto Mississauga
-
Administration
Instructor: Michael Liut
Office: DH–3097B
Course Website: https://www.michaelliut.ca/csc343.html
Textbook: Database Management Systems (3rd Ed.), Ramakrishnan & Gehrke.

Tutorial   Date        Time                Location
TUT01      Wednesday   11:00am - 12:00pm   DH-2020
TUT02      Wednesday   12:00pm - 1:00pm    DH-2020

Lecture    Date        Time                Location
LEC01      Monday      9:00am - 11:00am    KN-L1220
-
Teaching Assistants
1. Mohammed Hossain ([email protected])
2. Pankaj Agrawal ([email protected])
-
Evaluations
Examinations         Due Date              Weight
Midterm Exam         March 5th, 2018       20%
Final Exam           TBA                   40%

Group Assignments*   Due Date              Weight
Assignment 1         February 5th, 2018    ≈13.33%
Assignment 2         March 12th, 2018      ≈13.33%
Assignment 3         April 2nd, 2018       ≈13.33%
*Best 2 of 3, under the requirement that a minimum grade of 50% is achieved on each.
-
Group Assignments
• Assignments are to be completed in pairs (groups of 2).
• Groups must stay together for the duration of the semester.
• Please email me your pair selection before 9 AM on January 22nd.
• Assignments are posted on the course website.
-
Group Assignments
• Submissions must be completed on Blackboard.
• Late Policy:
  ◦ 20% will be docked per day of lateness.
  ◦ After four (4) days, the assignment will no longer be accepted.
• Re-marking requests:
  ◦ Please contact the grading TA first.
  ◦ Please feel free to contact the instructor if the TA cannot or does not resolve the issue.
-
Plagiarism
You are encouraged to discuss course content with your peers; however, submitted work and solutions must be formulated based on your own ideas and conclusions.
Plagiarism and cheating are serious academic offenses and will be handled accordingly.
When you submit a piece of assessment (e.g., an assignment or an examination), you are certifying that it is your work and that you alone generated the solution.
-
Plagiarism Detection
Turnitin - http://turnitin.com
  ◦ Automatically integrated into Blackboard.
  ◦ Checks against current/past submissions and all online resource databases.
MOSS - Measure Of Software Similarity
  ◦ Developed at Stanford, utilizing "Document Fingerprinting":
  ◦ http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
  ◦ Used for checking all programming submissions.
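The SIGMOD 2003 paper linked above describes winnowing, the document-fingerprinting scheme behind MOSS. Below is a minimal Python sketch of the core idea, not MOSS itself: hash every k-gram of the normalized text, then keep the smallest hash from each window of w consecutive hashes. The choice of `zlib.crc32` as the hash, the values k=5 and w=4, the whitespace/case normalization, and the Jaccard similarity score are all illustrative assumptions.

```python
import zlib

def kgram_hashes(text, k):
    """Hash every k-character substring (k-gram) of the normalized text."""
    # Normalize: lowercase and drop all whitespace, so reformatting alone changes nothing.
    s = "".join(text.lower().split())
    return [zlib.crc32(s[i:i + k].encode("utf-8")) for i in range(len(s) - k + 1)]

def winnow(text, k=5, w=4):
    """Return the set of fingerprint hashes chosen by winnowing.

    In every window of w consecutive k-gram hashes, the minimum is selected
    (the rightmost one on ties). (position, hash) pairs are collected so a hash
    that stays the minimum across overlapping windows is recorded once; the
    positions are then dropped for comparison."""
    hashes = kgram_hashes(text, k)
    chosen = set()  # (position, hash) pairs
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        low = min(window)
        offset = max(i for i, h in enumerate(window) if h == low)  # rightmost minimum
        chosen.add((start + offset, low))
    return {h for _, h in chosen}

def similarity(a, b, k=5, w=4):
    """Jaccard overlap of the two fingerprint sets: 1.0 means identical fingerprints."""
    fa, fb = winnow(a, k, w), winnow(b, k, w)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0

a = "SELECT name FROM Student WHERE sid = 123456"
b = "select  name\nfrom student\n  where sid = 123456"  # same text, reformatted
c = "the quick brown fox jumps over the lazy dog"
print(similarity(a, b))        # 1.0
print(similarity(a, c) < 0.5)  # True
```

Because fingerprints are taken from local windows, a copied passage keeps most of its fingerprints even when it is embedded in otherwise different text.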
-
Questions / Concerns / Issues / Do you just want to talk?
If something is unclear, please ask in class! I encourage feedback!
If something is concerning you, please let me know! I don't bite!
Office Hours: Monday from 1 PM to 2 PM
Office Location: DH–3097B
Open Door Policy! If my door is open, feel free to enter, even if it is just for a chat or a quick question. If it is closed, please knock.
-
Course Syllabus
READ IT! It is important!
  ◦ My Course Syllabus.
  ◦ UTM's Official Course Syllabus.
Both are found on the course website: https://www.michaelliut.ca/csc343.html
Blackboard and the course website will be used interchangeably. Both are to be checked on a regular basis.
-
Topics
• Relational Model
• ER Model
• SQL
• Aggregation and Joins
• Constraints and Triggers
• Relational Algebra
• Views and Indexes
• Database Design
• Transactions
• Concurrency
• Intro to NoSQL and MongoDB (time permitting)
• Hadoop vs. GFS vs. Cassandra (time permitting)
-
-
Big Data AND Trends
During a hurricane warning (i.e., prior to a hurricane occurring), Walmart found an increase in sales of:
Strawberry Pop-Tarts
7 times the "norm"
-
Big Data AND Trends
-
Data Science
Empirical Science: collect and systematize facts.
Theoretical Science: formulate theories and empirically test them.
Computational Science: run automatic proofs, run simulations.
Data Science: collect data and find patterns within the data. Think statistics meets mathematics meets computer science and programming.
-
How do databases run your life?
• Cloud Storage (e.g., Dropbox, Google Drive, iCloud, etc.)
  ◦ Where is the data? How is it categorized and made quickly accessible?
• Online Streaming Applications (e.g., Netflix, YouTube, HBO Now, etc.)
  ◦ Generating lists of videos based on searches and tracking users' preferences: "Recommended" videos.
• Finances (e.g., Chequing/Savings Accounts, Stock Market, Credit Cards, etc.)
  ◦ VISA processes an average of 150 million transactions per day.
-
How do databases run your life?
• Social Media (e.g., Instagram, Facebook, Twitter, etc.)
  ◦ Storing personal information and multimedia content.
  ◦ "Suggestions For You" or "People You May Know".
• E-commerce (e.g., Amazon, eBay, Alibaba, etc.)
  ◦ Online businesses that store and catalogue items.
  ◦ Organize their products' details, pricing information, and sellers.
  ◦ Store users' purchase history, payment information/preferences, and search history.
Huge area of Data Analytics!
-
Data Analytics
• The science of examining and interpreting raw data to find patterns and deduce conclusions.
• Applying algorithms, mathematical techniques, and mechanical processes to form a conclusion about the information being analyzed.
• By 2020, there will be over $200 billion spent annually in the 'Big Data and Business Analytics' market.
-
What is a Database?
Naively defined as…
  ◦ a collection of information that exists over a long period of time.
-
What is a Database?
DATABASE
A very large, integrated collection of data (i.e., records or files).
Models a real-world enterprise:
  ◦ Entities (e.g., teams, games)
  ◦ Relationships (e.g., Barack Obama received the Nobel Peace Prize)
  ◦ Constraints (e.g., at least one doctor on duty during off-hours)
DATABASE MANAGEMENT SYSTEM (DBMS)
A software system designed to store, manage, and facilitate access to data.
-
Is the WWW a DBMS? (WWW = World Wide Web)
Fairly sophisticated searches available:
  ◦ Web crawlers index pages
  ◦ Keyword-based search for pages
Currently, data is unstructured and untyped.
Search ONLY:
  ◦ Can't modify the data
  ◦ Can't get summaries or complex combinations of data
-
Is the WWW a DBMS?
Few (zero) guarantees are provided for:
  ◦ Freshness of data
  ◦ Consistency across data items
  ◦ Fault tolerance
Websites (e.g., e-commerce sites such as Amazon or eBay) typically have a DBMS in the background to provide these functions.
-
“Search” vs. “Query”
What if you wanted to look up all of the countries that are part of the European Union (EU)?
Try “countries in the eu” in a search engine (e.g., Google).
-
Search
Based on keyword matching
  ◦ Our search matches countries that belong to the European Union (EU)
  ◦ Results are ranked based on:
    ◦ Popularity
    ◦ Reputation
    ◦ Paid advertisements
  ◦ Web documents
  ◦ Limited structure
-
“Search” vs. “Query”
“Search” returns a document as is.
-
Query
A request for information from a database.
  ◦ In a DBMS, a specialized language (a query language) is used.
The ease with which this information can be obtained from a database often determines its value to a user.
The questions posed in a query are generally designed for a more specific result than those in a search.
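The difference between a search and a query can be sketched in a few lines of Python. The documents and the four-country table are invented for illustration, and the word-count ranking is a deliberately naive stand-in for a real search engine. Notice that the keyword search ranks highest a document about countries outside the EU, and in any case hands back whole documents to read; the SQL query returns exactly the matching rows.

```python
import re
import sqlite3

# Hypothetical free-text documents, standing in for web pages.
DOCS = {
    "travel-blog": "Backpacking through France and Germany, two EU countries with great trains.",
    "trade-news": "Norway and Switzerland are not EU countries, but both trade closely with the EU.",
    "recipe": "A simple bread recipe: flour, water, salt.",
}

def search(text_query):
    """Keyword search: rank whole documents by how many query words they contain."""
    terms = set(re.findall(r"[a-z]+", text_query.lower()))
    scores = {name: sum(word in terms for word in re.findall(r"[a-z]+", text.lower()))
              for name, text in DOCS.items()}
    return sorted((name for name in scores if scores[name] > 0),
                  key=lambda name: scores[name], reverse=True)

# The same question as a query over structured data (illustrative subset of countries).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (name TEXT PRIMARY KEY, in_eu INTEGER NOT NULL)")
conn.executemany("INSERT INTO Country VALUES (?, ?)",
                 [("France", 1), ("Germany", 1), ("Norway", 0), ("Switzerland", 0)])

def eu_members():
    """Query: return exactly the rows that satisfy the condition, nothing else."""
    return [row[0] for row in
            conn.execute("SELECT name FROM Country WHERE in_eu = 1 ORDER BY name")]

print(search("countries in the eu"))  # ['trade-news', 'travel-blog']
print(eu_members())                   # ['France', 'Germany']
```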
-
Query
Think of a university database; some questions asked may be:
1. What is the name of the student with student ID #123456?
2. How many students are enrolled in CSC343?
3. What fraction of students in CSC343 received a grade better than B?
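The three questions translate directly into SQL. The Python `sqlite3` sketch below uses a hypothetical two-table schema (`Student`, `Enrolled`) with made-up rows, and it assumes grades are stored as grade points on a 4.0 scale where a B is 3.0, so "better than B" becomes `grade_points > 3.0`.

```python
import sqlite3

# Hypothetical schema for illustration only: table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (
    sid  INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Enrolled (
    sid          INTEGER REFERENCES Student(sid),
    course       TEXT NOT NULL,
    grade_points REAL,              -- 4.0 scale; a B is assumed to be 3.0
    PRIMARY KEY (sid, course)
);
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(123456, "Ada"), (234567, "Alan"), (345678, "Grace"), (456789, "Edgar")])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [(123456, "CSC343", 4.0), (234567, "CSC343", 3.0),
                  (345678, "CSC343", 3.3), (456789, "CSC343", 2.7)])

# 1. Name of the student with student ID #123456
name = conn.execute("SELECT name FROM Student WHERE sid = ?", (123456,)).fetchone()[0]

# 2. Number of students enrolled in CSC343
count = conn.execute("SELECT COUNT(*) FROM Enrolled WHERE course = 'CSC343'").fetchone()[0]

# 3. Fraction of CSC343 students with a grade better than B (averaging a 1/0 flag)
fraction = conn.execute("""
    SELECT AVG(CASE WHEN grade_points > 3.0 THEN 1.0 ELSE 0.0 END)
    FROM Enrolled
    WHERE course = 'CSC343'
""").fetchone()[0]

print(name, count, fraction)  # Ada 4 0.5
```

Each answer is a precise value computed by the DBMS, rather than a list of documents that might contain the answer.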
-
Is a File a DBMS? Thought Experiment 1
You and a friend are both editing a file at the same time.
You and your friend both save the file at the exact same time.
Whose change survived?
A) Yours
B) Your friend's
C) Both
D) Neither
E) Not A, B, C, or D
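One way Thought Experiment 1 can play out is the "lost update": each editor loads the same version, edits a private copy, and saves the whole file back. The Python sketch below replays that ordering deterministically (there are no real threads, and the file name is hypothetical). In this ordering your friend's save lands last and silently overwrites yours. With real editors and operating systems the outcome depends on timing and tooling, which is part of why E is a defensible answer.

```python
import tempfile
from pathlib import Path

# A deterministic replay of one possible ordering (not real concurrency).
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "notes.txt"      # hypothetical shared file
    doc.write_text("line 1\n")

    yours = doc.read_text()            # you open the file
    friends = doc.read_text()          # your friend opens the same version

    yours += "your edit\n"             # each of you edits a private copy
    friends += "friend's edit\n"

    doc.write_text(yours)              # you save first...
    doc.write_text(friends)            # ...and your friend's save overwrites it

    final = doc.read_text()

print(repr(final))  # "line 1\nfriend's edit\n": your edit is gone
```

Nothing in the file system noticed that two people were working from the same starting version; detecting or preventing this is the job of concurrency control.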
-
Is a File a DBMS? Thought Experiment 2
You and a friend are updating a file.
The power goes out.
Whose change survived?
A) All
B) None
C) All since last save
D) Not A, B, or C
Q: How do you write programs over a subsystem when it promises you No Options?
A: VERY, VERY CAREFULLY!!
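For Thought Experiment 2, one common "very, very careful" technique for a single file is to never overwrite it in place: write the new contents to a temporary file in the same directory, force it to disk with `os.fsync`, then swap it in with `os.replace`, which is atomic when source and target are on the same file system. A crash then leaves either the old file or the new one, not a half-written mix. This is a sketch under those assumptions (the helper name `atomic_write` is hypothetical). It does not coordinate concurrent writers, does not cover updates spanning several files, and on some systems the directory itself must also be fsynced for the rename to be durable. Handling those cases is exactly what a DBMS's concurrency control and recovery components do for you.

```python
import os
import tempfile
from pathlib import Path

def atomic_write(path: Path, text: str) -> None:
    """Replace `path` so a crash leaves either the old or the new contents, never a mix."""
    # The temp file must live in the same directory (same file system) for the rename to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the new bytes to stable storage
        os.replace(tmp_name, path)   # atomic swap: old contents -> new contents
    except BaseException:
        # If anything fails before the swap, discard the partial temp file.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise

with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "balance.txt"   # hypothetical data file
    atomic_write(target, "100\n")
    atomic_write(target, "250\n")
    print(target.read_text())          # 250
    print(sorted(os.listdir(d)))       # ['balance.txt'] (no stray temp files)
```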
-
Why Use a DBMS?
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Concurrent access and recovery from crashes.
-
Why Study Databases?
Shift from computation to information
  ◦ Always true for corporate computing
  ◦ The Web made this point for personal computing
  ◦ More and more true for scientific computing
The need for DBMSs has exploded!
  ◦ Corporate: retail swipe/clickstreams, "customer relationship management", "supply chain management", "data warehouses", Big Data, etc.
  ◦ Scientific: digital libraries, Human Genome Project, Sloan Digital Sky Survey, physical sensors, etc.
DBMS encompasses much of CS and is a practical discipline
  ◦ OS, languages, theory, machine learning, logic
  ◦ Yet with a traditional focus on real-world apps
-
What is the Intellectual Content?
Representing Information
  ◦ Data modelling
Languages and Systems for Querying Data
  ◦ Complex queries with realistic semantics*
  ◦ Over massive data sets
Concurrency Control for Data Manipulation
  ◦ Controlling concurrent access
  ◦ Ensuring transactional semantics*
Reliable Data Storage
  ◦ Maintain data semantics* even if you pull the plug
*Semantics: the meaning, or relationship of meanings, of a sign or set of signs.
-
Describing Data: Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational data model is the most widely used model today.
  ◦ Main concept: the relation, basically a table with rows and columns.
  ◦ Every relation has a schema, which describes the columns, or fields.
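As a small illustration of "relation" and "schema", the Python `sqlite3` sketch below declares one relation and then reads its schema back. The `Students` table, its columns, and its rows are made up for illustration; `PRAGMA table_info` is SQLite's way of listing a table's columns and their declared types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: the relation's name plus the name and type of each column (field).
conn.execute("""
CREATE TABLE Students (
    sid   TEXT PRIMARY KEY,
    name  TEXT,
    login TEXT,
    age   INTEGER,
    gpa   REAL
)""")

# The instance: the rows (tuples) currently stored in the relation (hypothetical data).
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("1001", "Jones", "jones@cs", 18, 3.4),
    ("1002", "Smith", "smith@ee", 19, 3.2),
])

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) for each column.
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")]
rows = conn.execute("SELECT * FROM Students ORDER BY sid").fetchall()

print(schema)  # [('sid', 'TEXT'), ('name', 'TEXT'), ('login', 'TEXT'), ('age', 'INTEGER'), ('gpa', 'REAL')]
print(rows)
```

The schema stays fixed while the set of rows changes over time, which is the distinction between a relation's schema and its instance.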
-
Data Independence
Applications are insulated from how data is structured and stored.
Logical data independence
  • Protection from changes in the logical structure of data.
  • i.e., the ability to change the conceptual (logical) schema without changing the external schema (user view).
  • e.g., addition/removal of an entity or relationship.
Physical data independence
  • Protection from changes in the physical structure of data.
  • e.g., hardware-level considerations, system designs, etc.
Q: Why is this particularly important for DBMS?
A: The rate of change of DB applications is slow!
More generally: d(app)/dt << d(platform)/dt
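Logical data independence is often achieved with views. In the Python `sqlite3` sketch below (all table, view, index, and column names are hypothetical), the application's query only ever touches the `StudentInfo` view. Between version 1 and version 2 the underlying tables are reorganized and an index is added, yet the same query returns the same answer.

```python
import sqlite3

# The application's query is written once, against the external schema (the view).
APP_QUERY = "SELECT name, email FROM StudentInfo WHERE sid = ?"

V1 = """
CREATE TABLE Student (sid INTEGER PRIMARY KEY, name TEXT, email TEXT);
INSERT INTO Student VALUES (1, 'Jones', 'jones@example.com');
CREATE VIEW StudentInfo AS SELECT sid, name, email FROM Student;
"""

# Logical change: contact details move to their own table, and the view is
# redefined so it still exposes the same columns.
# Physical change: an index is added; it can affect speed, not answers.
V2 = """
CREATE TABLE Student (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Contact (sid INTEGER REFERENCES Student(sid), email TEXT);
CREATE INDEX contact_by_sid ON Contact(sid);
INSERT INTO Student VALUES (1, 'Jones');
INSERT INTO Contact VALUES (1, 'jones@example.com');
CREATE VIEW StudentInfo AS
    SELECT s.sid, s.name, c.email FROM Student AS s JOIN Contact AS c ON s.sid = c.sid;
"""

results = []
for script in (V1, V2):
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    results.append(conn.execute(APP_QUERY, (1,)).fetchone())
    conn.close()

print(results)  # [('Jones', 'jones@example.com'), ('Jones', 'jones@example.com')]
```

Only the view definition had to change; the application code did not.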
-
Concurrency Control
Concurrent execution of user programs is key to good DBMS performance.
  ◦ Disk accesses are frequent and relatively slow.
  ◦ Keep the CPU working on several programs concurrently.
Interleaving actions of different programs: trouble!
  ◦ e.g., an account transfer and a statement print at the same time.
The DBMS ensures such problems don't arise.
  ◦ Users/programmers can pretend they are using a single-user system ("Isolation").
  ◦ Thank goodness! You don't have to program "very, very carefully".
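The account-transfer/statement problem can be reproduced without any DBMS by replaying steps in a fixed order. In this plain-Python simulation (account names and amounts are hypothetical), the transfer is two steps, debit then credit, and the statement prints the combined balance. Either serial order prints $1000, but running the statement between the two transfer steps prints $900 for money that never left the bank. A DBMS running at a serializable isolation level only permits interleavings whose effect equals some serial order, so the $900 outcome cannot be observed there; weaker isolation levels, which some systems use by default, permit certain other anomalies.

```python
def run(schedule):
    """Execute the named steps in order on fresh data; return the total the statement printed."""
    accounts = {"chequing": 500, "savings": 500}   # hypothetical starting balances
    printed = []
    for step in schedule:
        if step == "debit":        # transfer, part 1: take $100 out of chequing
            accounts["chequing"] -= 100
        elif step == "credit":     # transfer, part 2: put $100 into savings
            accounts["savings"] += 100
        elif step == "print":      # statement: report the combined balance
            printed.append(accounts["chequing"] + accounts["savings"])
    return printed[0]

serial_1 = run(["debit", "credit", "print"])     # transfer, then statement
serial_2 = run(["print", "debit", "credit"])     # statement, then transfer
interleaved = run(["debit", "print", "credit"])  # statement runs mid-transfer

print(serial_1, serial_2, interleaved)  # 1000 1000 900
```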
-
Database Structure
Typically has a layered architecture.
The figure doesn't show:
  ◦ Concurrency Control
  ◦ Recovery Components
Each system has its own variations.
[Figure: layered DBMS architecture, top to bottom: Query Optimization and Execution → Relational Operators → Files and Access Methods → Buffer Management → Disk Space Management → DB]
These layers must consider concurrency control and recovery!
-
Why Don't We Always Use a DBMS?
1. Expensive/complicated to set up and maintain.
2. Cost and complexity must be offset by need.
3. General-purpose, not suited for special-purpose tasks (e.g., text search!).
-
The ACID Approach
1. Atomicity: all changes take effect, or none do.
2. Consistency: the database is transferred from one valid state to another valid state.
3. Isolation: the results of a transaction are invisible to other transactions until the transaction is complete.
4. Durability: once committed, the results of a transaction are permanent and survive future system and media failures.
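Atomicity and consistency can be seen in a small Python `sqlite3` sketch. The `Account` table, names, and amounts are hypothetical, and the `CHECK (balance >= 0)` rule stands in for "valid state". The transfer credits first and debits second on purpose, so a failed debit only leaves a correct database if the earlier credit is undone too. Isolation and durability are not exercised here (the database lives in memory).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (
    name    TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)   -- consistency rule: no overdrafts
);
INSERT INTO Account VALUES ('alice', 100), ('bob', 50);
""")

def transfer(src, dst, amount):
    """Move `amount` from src to dst as one transaction: both updates happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE Account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE Account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected the debit
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM Account ORDER BY name"))

ok = transfer("alice", "bob", 30)        # succeeds: alice 70, bob 80
failed = transfer("bob", "alice", 500)   # bob would go negative, so the whole transfer rolls back
print(ok, failed, balances())  # True False {'alice': 70, 'bob': 80}
```

Without the transaction, the second call would have left alice with an extra $500 that bob never paid.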
-
Databases Make These Folks Happy…
DBMS Vendors and Programmers
  ◦ Oracle, IBM, Microsoft, …
End-Users in many fields
  ◦ Business, Education, Science, …
Database Application Programmers
  ◦ Build enterprise applications on top of DBMSs
  ◦ Build web services that run off DBMSs
-
Databases Make These Folks Happy…
Database Administrators (DBAs)
  ◦ Handle security and authorization
  ◦ Data availability and crash recovery
  ◦ Database tuning as needs evolve
Data Scientists and Analysts
-
Summary
DBMS used to maintain and query large datasets.
  ◦ Can manipulate data and exploit semantics
Other benefits include:
  ◦ Data independence
  ◦ Quick application development
  ◦ Data integrity and security
  ◦ Recovery from system crashes
  ◦ Concurrent access
-
Summary
Levels of abstraction provide data independence.
  ◦ Key when d(app)/dt << d(platform)/dt
-
Citations, Images and Resources
Database Management Systems (3rd Ed.), Ramakrishnan & Gehrke
http://www.vcloudnews.com/every-day-big-data-statistics-2-5-quintillion-bytes-of-data-created-daily/
https://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
http://www.nytimes.com/2004/11/14/business/yourmoney/what-walmart-knows-about-customers-habits.html
https://www.kaggle.com/c/walmart-recruiting-sales-in-stormy-weather/forums/t/13299/predicting-strawberry-pop-tarts
http://truthaboutguns-zippykid.netdna-ssl.com/wp-content/uploads/2014/12/Strawberry_Pop_Tarts.jpg
http://gurupk.com/wp-content/uploads/2016/03/GTSEO.png
http://www.npr.org/sections/alltechconsidered/2016/06/24/480949383/britains-google-searches-for-what-is-the-eu-spike-after-brexit-vote
https://www.gov.uk/eu-eea
http://www.mjhaccountants.co.uk/wp-content/uploads/cartoon-filing-cabinet-l-e4b53be1891574f1.gif
http://www.cs.toronto.edu/~ryanjohn/teaching/cscc43-s12/lectures/c43-intro-v03.pdf
http://www.tenouk.com/ModuleV_files/image002.png
http://www.quoteslike.com/images/1480/love-girl-lyrics-and-leave-a-suggestion-at-the-bottom-of-the-page-SiPB6f-quote.jpg
https://www.simplilearn.com/data-science-vs-big-data-vs-data-analytics-article
Data science process flowchart from "Doing Data Science", Cathy O'Neil and Rachel Schutt, 2013