l02-06-sql - v180129 · adopt the sum as your new number ... •sqllite-most widely deployed...
TRANSCRIPT
1
L02-L06:SQL
CS3200 Databasedesign(sp18 s2)1/11/2018
2
L01:SQLintroduction
3
SQL overview
4
SQLIntroduction
• SQLisastandardlanguageforqueryingandmanipulatingdata
• SQLisaveryhigh-levelprogramminglanguage- Thisworksbecauseitisoptimizedwell!
• Manystandardsoutthere:- ANSISQL,SQL92(a.k.a.SQL2),SQL99(a.k.a.SQL3),….- Vendorssupportvarioussubsets
NB:Probablytheworld’smostsuccessfulparallelprogramminglanguage(multicore?)
SQL standsforStructuredQueryLanguage
5
SQLHasThreeMajorSub-Languages
• DataDefinitionLanguage(DDL)- Definearelationalschema(create,alter,anddroptables;establishconstraints
- Create/alter/droptablesandtheirattributes
• DataManipulationLanguage(DML)- Insert/delete/modifytuplesintables- Commandsthatmaintainandqueryadatabase(ourmainfocus!)
• DataControlLanguage(DCL)- Commandsthatcontroladatabase,includingadministeringprivilegesandcommittingdata
6
AnAlgorithm
• Standupandthinkofthenumber1• Pairoffwithsomeonestanding,addyournumberstogether,andadoptthesumasyournewnumber
• Oneofyoushouldsitdown;theothershouldgobacktostep2
7
Scalability
logn
n
sizeofproblem
timetoso
lve
8
Mostspectacularthesedays:theoreticpotentialforperfectscaling!
• perfectscaling- givensufficientresources,performancedoesnotdegradeasthedatabasebecomeslarger
• key:parallelprocessing
• cost:numberofprocessorspolynomialinthesizeoftheDB- rememberourin-classcountingexercise
• all(most)relationaloperatorshighlyparallelisable
9
Moore'slaw
Source:http://en.wikipedia.org/wiki/Moore's_law
Multi-cores
10
WhatisSQL?
• It'salanguage(likeEnglish,Spanish,German,…)
• Thereareonlyafewkeywordsthatyouhavetolearn– it’sfairlysimple
• It‘smajorpurposeistocommunicatewithadatabaseandaskadatabasefordata
• It‘sadeclarativelanguage(youdefinewhattodo)
• Simplicityhasit‘scost– itgetscomplexquickly- Imagineonlyhaving2verbs(go,put,wait)to
expressallyoudoinalifetime- It'seitherinfeasibleoryouhavetocombinea
lotbasicactionstoconstructamorecomplexaction(e.g.skydiving=putparachuteintobackpack,putthebackpackonyourback,goairplane,waituntilairplaneisat14kfeet,gotoopendoor,gooutsideairplane,…)
• Declarativeprogrammingisperceivedasnon-intuitive(well,decideforyourselfJ)
ThePositives TheChallenges
11
DifferentsymanticsbetweenExcelandDatabasetables
Excel
Database1
row
tuple/entity/record/row
column
attribute/field/column
table heading
attributename
Tablename
1ADatabase(DB)issimplyasystemthatholdsmultipletables(likeExcelhasmultiplesheets)
12
TablesinSQL
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Product
Attribute names Table name
Key
Tuple / row(Entity)
Attribute
13
DataTypesinSQL
• Atomictypes- Characterstrings:CHAR(20),VARCHAR(50)- Numbers:INT,BIGINT,SMALLINT,FLOAT- Others:MONEY,DATETIME,…
• Record(akatuple)- Everyattributemusthaveanatomictype
• Table(akarelation)- Asetoftuples(hencetablesareflat!)
14
TableSchemas
• Theschemaofatableisthetablename,itsattributes,andtheirtypes:
• Akeyisanattributewhosevaluesareunique;weunderlineakey
Product(Pname: string, Price: float, Category: string, Manufacturer: string)
Product(Pname: string, Price: float, Category: string, Manufacturer: string)
15
Basic SQL
16
SQLQuery
• Basicform(therearemanymanymorebellsandwhistles)
CallthisaSFW query.
SELECT <attributes>FROM <one or more relations>WHERE <conditions>
17
SimpleSQLQuery
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
SELECT *FROM ProductWHERE category='Gadgets'
Product
302Our friend here shows that you can follow along in SQLite. Just install the database from the text file "300 - ..." available in our sql folder
18
SimpleSQLQuery
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
SELECT *FROM ProductWHERE category='Gadgets'
Product
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSelection
302Our friend here shows that you can follow along in SQLite. Just install the database from the text file "300 - ..." available in our sql folder
19
Practicewithyourownlocaldatabases 302
If you are using Windows:1. Download the appropriate text files from our repository2. Open them with "Wordpad" (not "Notepad" which messes up the text!)3. Paste the SQL commands into your SQLite version, and execute
20
SimpleSQLQuery
PName Price ManufacturerSingleTouch $149.99 CanonMultiTouch $203.99 Hitachi
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
SELECT pName, price, manufacturerFROM ProductWHERE price > 100
Product
Selection& Projection
302
21
Selectionvs.Projection
PName PriceSingleTouch $149.99MultiTouch $203.99
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
SELECT pName, priceFROM ProductWHERE price > 100
ProductOne projects onto some attributes (columns)-> happens in the SELECT clause
One selects certain entires=tuples (rows)-> happens in the WHERE clause-> acts like a filter
302
22
AFewDetails
• SQLcommandsarecaseinsensitive:- SELECT =Select=select- Product =product
• Butvaluesarenot:- Different:'Boston','boston'- (Notice:ingeneral,butdefaultsettingswillvaryfromDBMStoDBMS.Justtobesafe,
alwaysassumevaluestobecasesensitive!)
• Usesinglequotesforconstants:- 'abc' - yes- "abc"- no
23
EliminatingDuplicates
CategoryGadgetsGadgets
PhotographyHousehold
CategoryGadgets
PhotographyHousehold
SELECT DISTINCT categoryFROM Product
SELECT categoryFROM Product
Set vs. Bag semantics
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowerGizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
Product
302
24
OrderingtheResults
- Ties in attribute price broken by attribute pname- Ordering is ascending by default. Descending:
SELECT pName, price, manufacturerFROM ProductWHERE category='Gadgets'
and price > 10ORDER BY price, pName
... ORDER BY price ASC, pname DESC
302
25
SELECT categoryFROM ProductORDER BY pName
SELECT DISTINCT categoryFROM ProductORDER BY category
SELECT DISTINCT categoryFROM ProductORDER BY pName
???
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
Product302
26
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
CategoryGadgets
HouseholdPhotography
CategoryGadgets
HouseholdGadgets
Photography
Syntax error on large DBMSs (Oracle, PostgreSQL, SQL server) / unpredictable results on others(MySQL, SQLite)
SELECT categoryFROM ProductORDER BY pName
SELECT DISTINCT categoryFROM ProductORDER BY category
SELECT DISTINCT categoryFROM ProductORDER BY pName
Product302
"ORDER BY items must appear in the select list if SELECT DISTINCT is specified."
27
L02:SQLBasics
28
Announcements!
• Microphone• IfyoustillhaveSQLitetrouble,pleaseaskDisha orPriyal forhelpduringlecture!
• Piazza:pleasebespecificonPiazzawithyourproblem,sowecanhelpyouremotely.- Compare:"Ican'tinstallSQLite.WhatshouldIdo?"(->cometoofficehours)vs."Iget
errormessageXYZwhenIdoZYX.Hereisascreenshot.WhatshouldIdo?"
• Piazza:pleasealsopostyourlessonslearned(e.g.,John'scommentonFF v56)• Textbooks• Homework#1willbereleasedbytonighttogetherwithPostgreSQLinstallationguide(youhave2weeks)
• Python,Jupyter
29
Some history
30
Some"birth-years"
• 2004:Facebook
• 1998:Google• 1995:Java,Ruby• 1993:WorldWideWeb• 1991:Python
• 1985:Windows
• 1974:SQL
31
SQL:DeclarativeProgramming
SQL
ProceduralLanguage:youhavetospecifyexactstepstogettheresult.
DeclarativeLanguage:yousaywhatyouwantwithouthavingtosayhowtodoit.
32
SQL:wasnottheonlyAttempt
SQL
Source:http://en.wikipedia.org/wiki/QUEL_query_languages
QUEL
Commercially not used anymore since ~1980
33
DisruptiveInnovation • Disruptiveinnovationsaregenerallynotacceptableforthemassmarketwhentheyareintroduced.Onlythefringesofthemarket pickuptheinnovationinthefirstiteration
• Itperformsworse inoneormoreareas,butistypicallysimpler,morereliable,ormoreconvenientthanexistingtechnologies.
• Itislessprofitablethanexistingtechnologies.Leadingfirms'mostprofitablecustomersgenerallycan'tuseitanddon'twantit.
• Astheinnovatorcontinuestorefinetheirproducttheutilityvaluetothemarketincreases
• Itsperformancetrajectoryissteeperthanthatofexistingtechnologies.
• Largeorganizationsarefundamentallyincapableofsuccessfullybringingittomarket.
Source:After“TheInnovator’sDilemma”byClaytonChristensen,1997.
Sustainingtechnology:ListentocustomersDisruptivetechnology:Notmarket-driven!
Performance
Time
34
iPhone:DisruptiveInnovationornot?
Source:http://www.youtube.com/watch?v=eywi0h_Y5_U ,http://www.asymco.com/2012/01/16/apple-is-the-top-personal-computer-vendor/
2:Laptops1:"BusinessPhones"Microsoftin2007
35
Whatkeyboardwithoutkeyscando...
Sources:https://en.wikipedia.org/wiki/Swype,https://en.wikipedia.org/wiki/SwiftKey
InFeb2016,SwiftKeywaspurchasedbyMicrosoft,for250M$
36
Thekeyboardofthefuture?
Sources:http://www.wired.com/2014/06/siri_ai/
37
38
Keyboardsandemails?
Sources:http://www.buzzfeed.com/benrosen/how-to-snapchat-like-the-teens
39
Whatisthis?
Source:http://pluggedin.kodak.com/pluggedin/post/?id=687843
(1975)
40
SQL:somehistory
• Dr.EdgarCodd(IBM)- CACMJune1970:"ARelationalModelof
DataforLargeSharedDataBanks"http://seas.upenn.edu/~zives/03f/cis550/codd.pdf
• Standardized- 1986byANSI:SQL1- 1992:Revised:SQL2
• Approx580pagedocumentdescribingsyntaxandsemantics- Revised:1999,2003,2008,...
• Players- Microsoft,IBM,RelationalSoftware(Oracle),….
• EveryvendorhasaslightlydifferentversionofSQL• Butthemaincommandsarestandardized
41
Codd's(disruptive?)innovation
Source:http://seas.upenn.edu/~zives/03f/cis550/codd.pdf
42
SQLandtherelationalmodelasstandard
Source:"InformationSystems:AManager’sGuidetoHarnessingTechnology(bookv1.4),"p.185,Gallaugher,2012.
43
Databases we are using
44
Client/ServerArchitecture
• Thereisasingleserverthatstoresthedatabase(calledDBMSorRDBMS):- Usuallyabeefysystem,e.g.IISQLSRV- Butcanbeyourowndesktop…- …orahugeclusterrunningaparalleldbms (laterassign.)
• ManyclientsrunappsandconnecttoDBMS- E.g.Microsoft’sManagementStudio- MorerealisticallysomeJava,Python,orC++program
• Clients“talk”toserverusingsomeprotocol
45
DBMSswewillworkwith
• SQLlite- mostwidelydeployeddatabaseengine- inparticularwithembeddedsystems,browsers,etc.,e.g.,Microsoft'sWindowsPhone8,Apple'siOS,Skype,Firefox
• PostgreSQL- popularandpowerfulopensourcedatabase(Microsoft)
46
SQLitevs.PostgreSQL
SQLlite• opensource&cross-platform• easytoinstall• hasnoserver("embedded")• idealforsingle-userapplication;has
limitationswhenitcomestoconcurrency/simultaneoustransactions(onewriteratatime)
• doesnotallowpartitioning;everythingisstoredinonesinglefile
• extrafunctionsarewritteninC/C++
PostgreSQL• commercial(Microsoft)• takesabitlongertoinstall• usesaserver• idealforsharedrepository;allows
concurrency(manysimultaneoustransactions),lockingandfine-grainedaccesscontrol
• scalesto>GBeasily;allowspartitioning(distributing)thedataacrossseveralfiles/nodes
• supportsuser-definedfunctions
47
SQL overview
48
Keyconstraints
• Akeyisanimplicitconstraintonwhichtuplescanbeintherelation
- i.e.iftwotuplesagreeonthevaluesofthekey,thentheymustbethesametuple!
1.Whichwouldyouselectasakey?2.Isakeyalwaysguaranteedtoexist?3.Canwehavemorethanonekey?
Akey isaminimalsubsetofattributes thatactsasauniqueidentifierfortuplesinarelation
Students(sid:string, name:string, gpa: float)
49
NULLandNOTNULL
• Tosay“don’tknowthevalue”weuseNULL- NULLhas(sometimespainful)semantics,moredetaillater
sid name gpa123 Bob 3.9143 Jim NULL Say,Jimjustenrolledinhisfirstclass.
InSQL,wemayconstrainacolumntobeNOTNULL,e.g.,“name”inthistable
Students(sid:string, name:string, gpa: float)
50
GeneralConstraints
• Wecanactuallyspecifyarbitraryassertions- E.g.“Therecannotbe25peopleintheDBclass”
• Inpractice,wedon’tspecifymanysuchconstraints.Why?- Performance!
Wheneverwedosomethingugly(oravoiddoingsomethingconvenient)it’sforthesakeofperformance
51
SummaryofSchemaInformation
• SchemaandConstraintsarehowdatabasesunderstandthesemantics(meaning)ofdata
• Theyarealsousefulforoptimization
• SQLsupportsgeneralconstraints:- Keysandforeignkeysaremostimportant- We’llgiveyouachancetowritetheothers
52
Basic SQL
53
SimpleSQLQuery
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
SELECT pNameFROM ProductWHERE manufacturer in ('Canon','Hitachi')
Product
302
54
SimpleSQLQuery
PNameSingleTouchMultiTouch
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
SELECT pNameFROM ProductWHERE manufacturer in ('Canon','Hitachi')
Product
Selection& Projection
302
55
WHERE...IN(...)cp.toExcel
Assume that there is a range defined forA10:C18 called"NameRange"
JFYI
56
WHERE...IN(...)cp.toExcel
Assume that there is a range defined for A10:C18 called "NameRange"
JFYI
57
LIKE
58
LIKE:SimpleStringPatternMatching
PNameGizmoPowergizmo
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
SELECT pNameFROM ProductWHERE pname LIKE '%izmo'
Product
% is a wildcard for any sequence of zero or more characters.
302
Moredetails:http://www.techonthenet.com/sql_server/like.php
59
LIKE:SimpleStringPatternMatching
PNameGizmo
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
SELECT pNameFROM ProductWHERE pname LIKE '_izmo'
Product
302
_ is a wildcard for exactly one character.
Moredetails:http://www.techonthenet.com/sql_server/like.php
60
Tableselectionusingcomparisonpredicates
= equalto< smaler than<= smallerorequalto> greaterthan>= greaterorequalto<> unequalto
Simplecomparators
Complexcomparators
Comparatorsthatworkacrosstypes
IN(value1,value2,…) anyvalueswithinthegivensetISNULL hasnovalueISNOTNULL hasavalue
Numbers
BETWEENvalue1 ANDvalue2anyvalueswithintherange
Text/Strings
= equalto(exactstring)
LIKE equalto(pattern)‘S%’ stringstartingwithS‘%S’ stringendingwithS‘%S%’ stringcontaininganS‘S_S’ stringwithSatbothendsandanycharacterinthemiddle
Note:CombinationsofmultiplepredicateswithAND &OR (usebrackets)
61
Date functions
62
Arithmeticexpressions
SELECT 3+2
SELECT (2*3 + 4*5) as name name26
(no column name)5
63
Datefunctionsaredatabase-specific
ThisishereSQLitesemanticsDatefunctionsaredifferentbetweendifferentdatabases.Inreallife,youmayneedtolookuphowyourDBhandlesdatefunctions:http://www.sqlite.org/lang_datefunc.html
Name BirthdateMax 1980-01-01Fred 1979-02-01Susan 1990-01-31Tilda 1988-01-01
Worker
age33342325
SELECT date('now')-date(birthdate) as ageFROM Worker
Wecanspecifytheoutputcolumnnames
JFYI
64
Joins
65
KeysandForeignKeys
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
Product
CompanyCName StockPrice CountryGizmoWorks 25 USACanon 65 JapanHitachi 15 Japan
302
Whatisaforeignkeyvs.akeyhere?
66
KeysandForeignKeys
PName Price Category ManufacturerGizmo $19.99 Gadgets GizmoWorksPowergizmo $29.99 Gadgets GizmoWorksSingleTouch $149.99 Photography CanonMultiTouch $203.99 Household Hitachi
Product
CompanyCName StockPrice CountryGizmoWorks 25 USACanon 65 JapanHitachi 15 Japan
Foreignkey
Key
Key
302
Whatisaforeignkeyvs.akeyhere?
67
ReferentialIntegrityPName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Product CompanyCName StockPrice Country
GizmoWorks 25 USA
Canon 65 Japan
Hitachi 15 Japan
Gizmo $14.99 Gadgets Hitachi
SuperTouch $249.99 Computer NewCom
violatesKeyconstraint
violate ForeignKeyconstraint
Keyconstraint:minimalsubsetofthefieldsofarelationisauniqueidentifierforatuple.
Delete fromCompanywhereCName='Canon';
Foreignkey:mustmatchfieldinarelationaltablethatmatchesacandidatekeyofanothertable
Insert intoProductvalues('Gizmo',14.99,'Gadgets','Hitachi');
Insert intoProductvalues('SuperTouch',249.99,'Computer','NewCom');
68
(RelationalDatabase)Schema
ProductPNamePriceCategoryManufacturer
CompanyCnameStockpriceCountry
Product(pname,price,category,manufacturer)Company(cname,stockprice,country)
Product.manufacturerisFKtoCompany
"Schema":describesthestructureofdataintermsoftherelationaldatamodel.
Aschemaincludestables,columns,PKs,FKs,andotherconstraints
69
Joins
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Product CompanyCName StockPrice Country
GizmoWorks 25 USA
Canon 65 Japan
Hitachi 15 Japan
302Product (pName, price, category, manufacturer)Company (cName, stockPrice, country)
Q:Findallproductsunder$200manufacturedinJapan;returntheirnamesandprices!
70
Joins
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Product CompanyCName StockPrice Country
GizmoWorks 25 USA
Canon 65 Japan
Hitachi 15 Japan
SELECT pName, price FROM Product, CompanyWHERE manufacturer=cName
and country='Japan'and price <= 200
302Product (pName, price, category, manufacturer)Company (cName, stockPrice, country)
Q:Findallproductsunder$200manufacturedinJapan;returntheirnamesandprices!
Joinb/wProductandCompany
PName Price
SingleTouch $149.99
71
SELECT pName, StockPriceFROM Product, CompanyWHERE manufacturer=cName
and country = 'USA'
Quiz
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Product CompanyCName StockPrice Country
GizmoWorks 25 USA
Canon 65 Japan
Hitachi 15 Japan
302Product (pName, price, category, manufacturer)Company (cName, stockPrice, country)
Whatdoesthequerybelowreturn?
72
SELECT pName, StockPriceFROM Product, CompanyWHERE manufacturer=cName
and country = 'USA'
Quiz
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Product CompanyCName StockPrice Country
GizmoWorks 25 USA
Canon 65 Japan
Hitachi 15 Japan
302Product (pName, price, category, manufacturer)Company (cName, stockPrice, country)
Whatdoesthequerybelowreturn?
PName StockPrice
Gizmo 25
Powergizmo 25
73
TableAlias(TupleVariables)
SELECT DISTINCT pName, addressFROM Person, UniversityWHERE works_for = uName
Person (pName, address, works_for)University (uName, address)
312
74
TableAlias(TupleVariables)
SELECT DISTINCT pName, addressFROM Person, UniversityWHERE works_for = uName
SELECT DISTINCT Person.pName, University.addressFROM Person, UniversityWHERE Person.works_for = University.uName
SELECT DISTINCT X.pName, Y.addressFROM Person as X, University YWHERE X.works_for = Y.uName
which address?Error!
Note that "as" is optional !!
Person (pName, address, works_for)University (uName, address)
312