![Page 1: Database design and implementationavid.cs.umass.edu/courses/645/s2017/lectures/08-StorageIndexing.pdf · DBMS vs. OS file system } Reason 1: Correctness } DBMS needs fine grained](https://reader035.vdocuments.net/reader035/viewer/2022081600/6027329558272e46830a85cc/html5/thumbnails/1.jpg)
Database design and implementation CMPSCI 645
Lecture 08: Storage and Indexing
Where is the data and how to get to it?
DBMS architecture
[Figure: DBMS components, from the query interface down to storage]
Query Parser
Query Rewriter
Query Optimizer
Query Executor
Access Methods
Buffer Manager
Disk Space Manager
Lock Manager, Log Manager
DB
Memory hierarchy
main memory: random access, fast, volatile
magnetic disk: random access, relatively slow, non-volatile
tape: sequential scan, non-volatile, long archiving
Disks and DBMS design
Databases are stored on disks.
[Figure: data moves between the DB on disk and RAM; read and write are expensive operations.]
Why not store everything in memory?
Volatility
Cost
Basics of disks
[Figure: disk anatomy, with labels: platters, spindle, disk head, arm assembly, arm movement]
Platters spin under the head.
Only one head reads and writes.
Retrieval time varies: seek time + rotational delay + transfer time.
Platters have tracks, forming an (imaginary) cylinder.
Each track has sectors. Blocks (pages) are multiples of sectors.
Accessing a disk page
} Time to access (read/write) a disk block:
  1. seek time (moving arms to position a disk head on a track)
  2. rotational delay (waiting for a block to rotate under the head)
  3. transfer time (actually moving data to/from the disk surface)
} Seek time and rotational delay dominate.
} Placement of pages on disk has a major impact on DBMS performance.
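As a rough illustration of why placement matters, the sketch below adds up the three components for a random read and for the next block on the same track. The drive parameters (9 ms average seek, 7200 RPM, 100 MB/s transfer rate, 8 KB pages) are illustrative assumptions, not figures from the lecture.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, page_kb, sequential=False):
    """Time to read one page: seek + rotational delay + transfer (all in ms)."""
    rotation_ms = 60_000 / rpm              # one full revolution
    rotational_delay_ms = rotation_ms / 2   # on average, wait half a revolution
    transfer_ms = page_kb / 1024 / transfer_mb_per_s * 1000
    if sequential:
        # Next block on the same track: assume no seek and no rotational wait.
        return transfer_ms
    return seek_ms + rotational_delay_ms + transfer_ms

# All four parameter values below are assumed for illustration.
random_ms = access_time_ms(9.0, 7200, 100.0, 8)
sequential_ms = access_time_ms(9.0, 7200, 100.0, 8, sequential=True)
print(f"random: {random_ms:.2f} ms, sequential: {sequential_ms:.3f} ms")
```

Under these assumptions the transfer itself is a small fraction of a random access, which is the sense in which seek time and rotational delay dominate.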
Arranging pages on disk
} Sequential page storage:
  } blocks on the same track, followed by
  } blocks on the same cylinder, followed by
  } blocks on an adjacent cylinder
} Pages in a file should be arranged sequentially on disk, to minimize seek and rotational delay.
  } A scan of the file is then a sequential scan.
Files of records
Fields are organized in a record.
A collection of records is organized in a page.
A collection of pages makes a file.
Unordered (Heap) Files
} The simplest file structure contains records in no particular order.
} As the file grows and shrinks, disk pages are allocated and de-allocated.
} To support record-level operations, we must:
  } keep track of the pages in a file
  } keep track of free space on pages
  } keep track of the records on a page
Heap File Using a Page Directory
} A page entry can include the number of free bytes on the page.
} The directory is a collection of pages; a linked-list implementation is just one alternative.
[Figure: a DIRECTORY, starting at the header page, whose entries point to Data Page 1, Data Page 2, ..., Data Page N]
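A minimal sketch of the directory idea, assuming each entry stores only the free bytes of its data page. The class name `PageDirectory`, the 4096-byte page size, and the first-fit placement rule are illustrative choices, not part of the lecture.

```python
PAGE_SIZE = 4096  # bytes per data page (assumed)

class PageDirectory:
    """Directory with one entry per data page: the number of free bytes on it."""

    def __init__(self):
        self.free_bytes = []  # free_bytes[page_id] -> free bytes on that page

    def allocate_page(self):
        self.free_bytes.append(PAGE_SIZE)
        return len(self.free_bytes) - 1

    def find_page_for(self, record_size):
        """Return the first page with enough room, allocating one if needed."""
        for page_id, free in enumerate(self.free_bytes):
            if free >= record_size:
                return page_id
        return self.allocate_page()

    def insert(self, record_size):
        if record_size > PAGE_SIZE:
            raise ValueError("record larger than a page")
        page_id = self.find_page_for(record_size)
        self.free_bytes[page_id] -= record_size
        return page_id

directory = PageDirectory()
placements = [directory.insert(size) for size in (3000, 2000, 1000, 500)]
print(placements, directory.free_bytes)
```

Because the free-byte counts live in the directory, finding room for a record only consults directory entries; no data page has to be read until the record is actually written.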
Page format
} How to store records on a page
} Consider a page as a collection of slots, one for each record.
} A record is identified by rid = <page id, slot #>.
} Record ids (rids) are used in indexes.
Page formats: fixed length records
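Moving records for free space management changes the rid; this may not be acceptable.
[Figure: two layouts for fixed-length records.
 PACKED: slots 1 to N are filled contiguously, followed by free space; the page stores N, the number of records.
 UNPACKED, BITMAP: slots 1 to M, with a bitmap (one bit per slot) marking which slots are occupied; the page stores M, the number of slots.]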
Page formats: variable length records
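Records can be moved on the page without changing the rid; so this format is attractive for fixed-length records too.
[Figure: Page i holds records with Rid = (i, 1), Rid = (i, 2), ..., Rid = (i, N). A SLOT DIRECTORY has one entry per slot (entries shown: 20, 16, 24), plus the number of slots (N) and a pointer to the start of free space.]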
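The sketch below illustrates why the slot directory keeps rids stable: compaction rewrites the (offset, length) entries, while slot numbers never change. The class `SlottedPage` and its methods are hypothetical, and for brevity the slot directory is kept in a Python list rather than inside the page bytes.

```python
class SlottedPage:
    """Variable-length records plus a slot directory of (offset, length) pairs."""

    def __init__(self, page_id, size=4096):
        self.page_id = page_id
        self.data = bytearray(size)
        self.slots = []        # slots[i] = (offset, length), or None if deleted
        self.free_start = 0    # pointer to the start of free space

    def insert(self, record: bytes):
        if self.free_start + len(record) > len(self.data):
            raise ValueError("not enough free space on page")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots) - 1)   # rid = <page id, slot #>

    def read(self, rid):
        offset, length = self.slots[rid[1]]
        return bytes(self.data[offset:offset + length])

    def delete(self, rid):
        self.slots[rid[1]] = None   # keep the slot so other slot numbers stay put

    def compact(self):
        """Slide live records together; only slot offsets change, never rids."""
        write_at = 0
        for slot_no, slot in enumerate(self.slots):
            if slot is None:
                continue
            offset, length = slot
            self.data[write_at:write_at + length] = self.data[offset:offset + length]
            self.slots[slot_no] = (write_at, length)
            write_at += length
        self.free_start = write_at

page = SlottedPage(page_id=7)
rid_a = page.insert(b"alice")
rid_b = page.insert(b"bob")
rid_c = page.insert(b"carol")
page.delete(rid_b)
page.compact()
print(rid_c, page.read(rid_c))
```

After compaction the record behind `rid_c` sits at a new offset, but an index that stored `rid_c` can still find it.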
Record formats: fixed length
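The number of fields and their types are stored in the system catalogs. Finding the i-th field does not require a scan of the record.
[Figure: fields F1, F2, F3, F4 with lengths L1, L2, L3, L4, starting at base address B; the address of F3 is B + L1 + L2.]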
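The address rule Address = B + L1 + L2 can be sketched with Python's `struct` module. The schema in `FIELDS` (names, types, lengths) is an assumed example standing in for what a system catalog would hold.

```python
import struct

# Assumed schema, as a catalog might describe it: (name, struct format, length).
# id: 4-byte int, age: 2-byte int, salary: 8-byte float, dept: 10-byte string.
FIELDS = [("id", "i", 4), ("age", "h", 2), ("salary", "d", 8), ("dept", "10s", 10)]

def field_offset(i):
    """Offset of the i-th field (0-based) from the record's base address."""
    return sum(length for _, _, length in FIELDS[:i])

def read_field(record: bytes, i):
    _, fmt, length = FIELDS[i]
    start = field_offset(i)
    # "<" = little-endian with no padding, so fields sit back to back.
    return struct.unpack("<" + fmt, record[start:start + length])[0]

record = struct.pack("<ihd10s", 6, 19, 50000.0, b"sales")
print(field_offset(2), read_field(record, 2))
```

The offset of the third field is the sum of the first two lengths, computed from the schema alone, without looking at the record's contents.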
Record formats: variable length
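[Figure, first format: fields delimited by special symbols. F1 $ F2 $ F3 $ F4 $; finding a field requires a scan of the record.]
[Figure, second format: array of field offsets. A header S1 S2 S3 S4 E4 precedes the fields F1 F2 F3 F4.]
The second choice offers direct access to the i-th field, with small directory overhead.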
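A small sketch of the array-of-field-offsets format, under the assumption that the header holds two-byte offsets: the start of each field (S1..Sn) and the end of the last one (En). The function names `encode` and `get_field` are made up for this example.

```python
import struct

def encode(fields):
    """Header of n+1 two-byte offsets (S1..Sn, En), followed by the field bytes."""
    header_size = 2 * (len(fields) + 1)
    offsets, position = [], header_size
    for field in fields:
        offsets.append(position)
        position += len(field)
    offsets.append(position)                 # end of the last field
    header = struct.pack(f"<{len(offsets)}H", *offsets)
    return header + b"".join(fields)

def get_field(record, i):
    """Direct access to the i-th field (0-based): read two offsets, then slice."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return record[start:end]

record = encode([b"006", b"Ann Lee", b"", b"50k"])
print(get_field(record, 1), get_field(record, 2))
```

An empty field costs nothing beyond its header entry, since its start and end offsets coincide, and no field has to be scanned to reach a later one.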
Question
} Consider the following query:
    SELECT S1.temp, S2.pressure
    FROM   TempSensor S1, PressureSensor S2
    WHERE  S1.location = S2.location
      AND  S1.time = S2.time
} How can the DBMS execute this query given
  } 1 GB of memory
  } 100 GB TempSensor and 10 GB PressureSensor
Buffer manager
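[Figure: page requests from higher-level code (files and access methods) go to the buffer pool manager, which sits above the disk space manager. The buffer pool in main memory consists of frames, each either holding a disk page or free, with a pin count and a dirty flag per frame. Arrows are labeled READ/WRITE and INPUT/OUTPUT. The choice of frame is dictated by the replacement policy.]
• 1 page corresponds to 1 disk block
• Disk = collection of blocks
• Data must be in RAM for the DBMS to operate on it!
• Buffer pool = table of <frame#, pageid> pairs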
When a page is requested...
} If the requested page is not in the pool (and the buffer is full):
  } Choose a frame for replacement
  } If the frame is dirty, write it to disk
  } Read the requested page into the chosen frame
} Pin the page and return its address.
If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several pages at a time!
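The request steps above can be sketched as a toy buffer pool. Everything here is illustrative: the class `BufferPool`, the dict standing in for the disk, the `unpin` call (implied by the pin count, but not spelled out on the slide), and the choice of LRU among unpinned frames as the replacement policy.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: frames with pin counts and dirty bits, LRU eviction."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk                 # dict: page_id -> page contents
        self.frames = OrderedDict()      # page_id -> frame, least recently used first
        self.reads = self.writes = 0

    def pin(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)     # now most recently used
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[page_id] = {"data": self.disk[page_id], "pin": 0, "dirty": False}
            self.reads += 1
        frame = self.frames[page_id]
        frame["pin"] += 1
        return frame

    def unpin(self, page_id, dirty=False):
        frame = self.frames[page_id]
        frame["pin"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        # Only frames with pin count 0 are candidates for replacement.
        victim_id = next((pid for pid, f in self.frames.items() if f["pin"] == 0), None)
        if victim_id is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim_id)
        if frame["dirty"]:
            self.disk[victim_id] = frame["data"]   # write back before reuse
            self.writes += 1

disk = {1: "p1", 2: "p2", 3: "p3"}
pool = BufferPool(num_frames=2, disk=disk)
frame = pool.pin(1)
frame["data"] = "p1-modified"
pool.unpin(1, dirty=True)
pool.pin(2)
pool.unpin(2)
pool.pin(3)   # pool is full: page 1 is unpinned and least recently used
print(disk[1], pool.reads, pool.writes)
```

The modified page reaches the disk only when its frame is chosen for replacement, which is why the dirty flag has to be tracked per frame.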
Buffer replacement policy
} A frame is chosen for replacement by a replacement policy:
  } Least-recently-used (LRU), Clock, MRU, etc.
} The policy can have a big impact on the number of I/Os; it depends on the access pattern.
} Sequential flooding: a nasty situation caused by LRU + repeated sequential scans.
  } # buffer frames < # pages in file means each page request causes an I/O. MRU is much better in this situation (but not in all situations, of course).
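Sequential flooding is easy to reproduce in a small simulation. The sketch below counts page I/Os for repeated scans of a file that is one page larger than the pool; the function `count_ios` and the sizes used (10 frames, 11 pages, 5 scans) are illustrative assumptions.

```python
def count_ios(policy, num_frames, num_pages, num_scans):
    """Page I/Os for repeated sequential scans under 'LRU' or 'MRU' replacement."""
    pool = []          # page ids, ordered from least to most recently used
    ios = 0
    for _ in range(num_scans):
        for page in range(num_pages):
            if page in pool:
                pool.remove(page)             # hit: re-appended below as most recent
            else:
                ios += 1                      # miss: one read from disk
                if len(pool) == num_frames:
                    pool.pop(0 if policy == "LRU" else -1)
            pool.append(page)
    return ios

lru = count_ios("LRU", num_frames=10, num_pages=11, num_scans=5)
mru = count_ios("MRU", num_frames=10, num_pages=11, num_scans=5)
print(lru, mru)   # prints "55 15"
```

With LRU every one of the 55 requests misses, because the page about to be needed is always the one that was just evicted. MRU keeps most of the file resident and misses only once per scan after the first.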
DBMS vs. OS file system
The OS does disk space and buffer management: why not let it manage these tasks?
} Reason 1: Correctness
  } The DBMS needs fine-grained control for transactions
  } It needs to force pages to disk for recovery purposes
} Reason 2: Performance
  } The DBMS may be able to anticipate access patterns
  } Hence, it may also be able to perform prefetching
  } It may select a better page replacement policy
Database file types
The data file can be one of:
} Heap file
  } Set of records, partitioned into blocks
  } Unsorted
} Sequential file
  } Sorted according to some attribute(s) called the (sort) key
Note: the (sort) key is different from "key"!
Index
} A (possibly separate) file that allows fast access to records in the data file, given a search key
} The index contains (key, value) pairs:
  } The key = an attribute value
  } The value = either a pointer to the record, or the record itself
Note: the search key is again different from "key"!
High-level overview: Indexes
id age salary other
006 19 50k ...
005 20 55k ...
004 25 50k ...
007 30 80k ...
002 35 75k ...
003 35 70k ...
001 40 65k ...
id age salary other
006 19 50k ...
004 25 50k ...
005 20 55k ...
001 40 65k ...
003 35 70k ...
002 35 75k ...
007 30 80k ...
First table (sorted on age): data file = index file, a clustered (primary) index.
Second table (sorted on salary): index file, an unclustered (secondary) index.
Index classification
} Clustered/unclustered
  } Clustered = records close in the index are close in the data
  } Unclustered = records close in the index may be far apart in the data
} Primary/secondary
  } Primary = is over attributes that include the primary key
  } Secondary = otherwise
} Organization: B+ tree or hash table
Clustered/Unclustered
} Clustered
  } The index determines the location of indexed records
  } Typically, a clustered index is one where the values are data records (but this is not necessary)
} Unclustered
  } The index cannot reorder data and does not determine data location
  } In these indexes: value = pointer to data record
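To make the difference concrete, the sketch below counts the distinct data pages a range query touches. The records are the (id, age, salary) rows from the overview slide, with salaries in thousands; the page size of 2 records per page and the helper `pages_touched` are assumptions for illustration.

```python
# Records (id, age, salary in k) from the overview example.
RECORDS = [("006", 19, 50), ("005", 20, 55), ("004", 25, 50), ("007", 30, 80),
           ("002", 35, 75), ("003", 35, 70), ("001", 40, 65)]
RECORDS_PER_PAGE = 2   # assumed, to keep the example small

# The data file is sorted on age, so an index on age is clustered
# and an index on salary is unclustered.
data_file = sorted(RECORDS, key=lambda r: r[1])
page_of = {r[0]: pos // RECORDS_PER_PAGE for pos, r in enumerate(data_file)}

def pages_touched(field, low, high):
    """Distinct data pages holding records with low <= record[field] <= high."""
    return sorted({page_of[r[0]] for r in RECORDS if low <= r[field] <= high})

age_pages = pages_touched(1, 19, 30)      # 4 matching records, range on age
salary_pages = pages_touched(2, 50, 65)   # 4 matching records, range on salary
print(age_pages, salary_pages)
```

Both ranges match four records, but the age range reads two adjacent pages while the salary range reads three pages spread across the file. On a file of realistic size the gap is far larger.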
Clustered index
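} The file is sorted on the index attribute
} Only one per table
[Figure: an Index File with entries 10, 20, 30, 40, 50, 60, 70, 80, each pointing to the matching record in the Data File, which is stored in the same order (10, 20, 30, 40, 50, 60, 70, 80).]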
Unclustered index
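} Several per table
[Figure: an Index File with sorted entries 10, 10, 20, 20, 20, 30, 30, 30, each pointing to a record in the Data File, which is stored in a different order (20, 30, 30, 20, 10, 20, 10, 30).]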
Clustered vs. unclustered index
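[Figure: two B+ trees side by side, CLUSTERED and UNCLUSTERED. In each, the data entries in the index file point to data records in the data file. In the clustered case the data records are stored in the same order as the data entries; in the unclustered case they are not.]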
Alternatives for data entry k* in index
} In a data entry k*, we can store:
  } Alternative 1: <k, data record with search key value k>
  } Alternative 2: <k, rid of a record with search key value k>
  } Alternative 3: <k, list of rids of records with search key k>
} The choice of an alternative for data entries is orthogonal to the indexing technique used.
  } Indexing techniques: B+ tree, hashing, ...
Cost model
We ignore CPU costs, for simplicity:
} B: the number of data pages
} R: the number of records per page
} D: the (average) time to read or write a disk page
} Measuring the number of page I/Os ignores the gains of pre-fetching a sequence of pages; thus, even the I/O cost is only approximated.
} Average-case analysis; based on several simplistic assumptions.
Comparing file organizations
} Heap files (random order)
} Sorted files, sorted on <age, sal>
} Clustered B+ tree file, Alternative (1), search key <age, sal>
} Heap file with unclustered B+ tree index on search key <age, sal>
} Heap file with unclustered hash index on search key <age, sal>
Operations to compare
} Scan: fetch all records from disk
} Equality search
} Range selection
} Insert a record
} Delete a record
Assumptions
} Heap files:
  } Equality selection on key; exactly one match.
} Sorted files:
  } Files compacted after deletions.
} Indexes:
  } Alt (2), (3): data entry size = 10% of the size of a record
  } Hash: no overflow buckets.
    } 80% page occupancy => file size = 1.25 × data size
  } Tree: 67% occupancy (this is typical).
    } Implies file size = 1.5 × data size
Assumptions (contd.)
} Scans:
  } Leaf levels of a tree index are chained.
  } Index data entries plus the actual file are scanned for unclustered indexes.
} Range searches:
  } We use tree indexes to restrict the set of data records fetched, but ignore hash indexes.
Cost of operations
File organization        Scan             Equality             Range
Heap file                BD               0.5 BD               BD
Sorted file              BD               D log2 B             D (log2 B + #match recs)
Clustered tree index     1.5 BD           D logF 1.5B          D (logF 1.5B + #pages with matched recs)
Unclustered tree index   BD (R + 0.15)    D (1 + logF 0.15B)   D (logF 0.15B + #pages with matched recs)
Unclustered hash index   BD (R + 0.125)   2D                   BD
Several assumptions underlie these (rough) estimates!
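The constants in the table follow from the occupancy assumptions: 1/0.8 = 1.25 and 1/0.67 ≈ 1.5 give the file sizes, and with data entries at 10% of a record the index sizes become 0.1 × 1.25 = 0.125 and 0.1 × 1.5 = 0.15 times B. The sketch below evaluates the scan and equality columns for one set of inputs. The values B = 1000, R = 100, D = 15 ms and F = 100 are assumptions chosen for illustration, and F, which the slides do not define, is taken here to be the fan-out of the tree.

```python
import math

def costs(B, R, D, F):
    """Scan and equality-search estimates (same time unit as D) per organization."""
    return {
        "heap":             {"scan": B * D,
                             "equality": 0.5 * B * D},
        "sorted":           {"scan": B * D,
                             "equality": D * math.log2(B)},
        "clustered tree":   {"scan": 1.5 * B * D,
                             "equality": D * math.log(1.5 * B, F)},
        "unclustered tree": {"scan": B * D * (R + 0.15),
                             "equality": D * (1 + math.log(0.15 * B, F))},
        "unclustered hash": {"scan": B * D * (R + 0.125),
                             "equality": 2 * D},
    }

# Assumed inputs: 1000 data pages, 100 records per page, 15 ms per page I/O, fan-out 100.
estimates = costs(B=1000, R=100, D=15, F=100)
for name, row in estimates.items():
    print(f"{name:18s} scan={row['scan']:12.1f} ms  equality={row['equality']:8.1f} ms")
```

The range column is left out because it also depends on how many records or pages match, which is a property of the query rather than of the file.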