ORDB Implementation Discussion
From RDB to ORDB
Issues to address when adding OO extensions to a DBMS
Layout of Data
Deal with large data types: ADTs/blobs
– Special-purpose file space for such data, with special access methods
Large fields in one tuple:
– One single tuple may not even fit on one disk page
– Must break into sub-tuples and link via disk pointers
Flexible layout:
– Constructed types may have flexibly sized sets, e.g., one attribute can be a set of strings
– Need to provide meta-data inside each type concerning layout of fields within the tuple
– Insertion/deletion will cause problems when contiguous layout of ‘tuples’ is assumed
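As a minimal sketch of breaking one large tuple into sub-tuples linked via disk pointers, the Python below models pages as a dictionary and chains the fragments by page number. The tiny `PAGE_SIZE`, the `pages` store, and both function names are illustrative assumptions, not taken from any particular DBMS.

```python
PAGE_SIZE = 8  # assumed (tiny) payload size per page, for illustration only

def store_tuple(pages, data):
    """Split `data` (bytes) into page-sized sub-tuples; return the first page number."""
    chunks = [data[i:i + PAGE_SIZE] for i in range(0, len(data), PAGE_SIZE)] or [b""]
    first = len(pages)  # assumes page numbers are allocated contiguously from 0
    for k, chunk in enumerate(chunks):
        page_no = first + k
        # The "disk pointer" to the next sub-tuple, or None at the end of the chain.
        next_page = page_no + 1 if k + 1 < len(chunks) else None
        pages[page_no] = (chunk, next_page)
    return first

def load_tuple(pages, page_no):
    """Follow the chain of pointers and reassemble the full tuple."""
    parts = []
    while page_no is not None:
        chunk, page_no = pages[page_no]
        parts.append(chunk)
    return b"".join(parts)

pages = {}
head = store_tuple(pages, b"a large field that spans pages")
```

Reading the tuple back costs one page access per sub-tuple, which is why clustering the fragments matters.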
Layout of Data
More layout design choices (clustering on disk):
– Lay out a complex object nested and clustered on disk (if it is nested rather than pointer-based)
– Where to store objects that are referenced (shared) by several other, possibly different, structures
– Many design options for objects that are in a type hierarchy with inheritance
– Constructed types such as arrays require novel methods, like array chunking into (4x4) subarrays for non-contiguous access
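The array chunking idea can be sketched as an address mapping: each cell of a 2-D array is assigned to a 4x4 chunk and to an offset inside that chunk. The function names and the row-major numbering of chunks are assumptions made for this illustration.

```python
CHUNK = 4  # chunk edge length, matching the (4x4) subarrays mentioned above

def chunk_address(i, j, n_cols):
    """Map cell (i, j) of a 2-D array to (chunk id, offset inside the chunk)."""
    chunks_per_row = (n_cols + CHUNK - 1) // CHUNK
    chunk_id = (i // CHUNK) * chunks_per_row + (j // CHUNK)
    offset = (i % CHUNK) * CHUNK + (j % CHUNK)
    return chunk_id, offset

def chunks_touched(rows, cols, n_cols):
    """Set of chunk ids that must be read to cover the given rows and columns."""
    return {chunk_address(i, j, n_cols)[0] for i in rows for j in cols}
```

With this layout, reading a column or a small square region touches a few chunks instead of one storage unit per row, which is the benefit for access patterns that do not follow row order.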
Why Identifiers?
Distinguish objects regardless of content and location
Evolution of object over time
Sharing of objects without copying
Continuity of identity (persistence)
Versions of a single object
Objects/OIDs/Keys
Relational keys: an RDB uses human-meaningful names (mixes data value with identity)
Variable names: a PL gives names to objects in a program (mixes addressability with identity)
Object identifiers: an ODB uses system-assigned, globally unique names (location- and data-independent)
OIDs
System generated
Globally unique
Logical identifier (not physical representation; flexibility in relocation)
Remains valid for lifetime of object (persistent)
OID Support
OID generation:
– Uniqueness across time and system
Object handling:
– Operations to test equality/identity
– Operations to manipulate OIDs for object merging and copying
– Deal with avoiding dangling references
OID Implementation
By address (physical)
– 32 bits; direct fast access like a pointer
By structured address
– E.g., page and slot number
– Contains both some physical and some logical information
By surrogates
– Purely logical OID
– Use some algorithm to assure uniqueness
By typed surrogates
– Contains both type id and object id
– Determine type of object without fetching it
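A typed surrogate can be sketched as a single integer that packs a type id and an object number. The 16/48 bit split, the in-memory counter, and the function names below are assumptions for illustration; a real system would need a persistent counter that is never reused.

```python
import itertools

TYPE_BITS = 16  # assumed: high 16 bits hold the type id
OBJ_BITS = 48   # assumed: low 48 bits hold the object number

_counter = itertools.count(1)  # stands in for a persistent, never-reused counter

def new_oid(type_id):
    """Generate a fresh typed surrogate for an object of the given type."""
    if not 0 <= type_id < (1 << TYPE_BITS):
        raise ValueError("type id out of range")
    return (type_id << OBJ_BITS) | next(_counter)

def type_of(oid):
    """Recover the type id without fetching the object."""
    return oid >> OBJ_BITS

def object_no(oid):
    """Recover the purely logical object number."""
    return oid & ((1 << OBJ_BITS) - 1)
```

Because the OID carries no page or slot information, the object can be relocated on disk without invalidating references to it.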
ADTs
– Type representation: size/storage
– Type access: import/export
– Type manipulation: special methods to serve as filter predicates and join predicates
– Special-purpose index structures: efficiency
ADTs
Mechanism to add index support along with ADT:
– External storage of index file outside DBMS
– Provide “access method interface” à la:
• Open(), close(), search(x), retrieve-next()
• Plus, statistics on external index
– Or, generic ‘template’ index structure
• Generalized Search Tree (GiST), user-extensible
• Concurrency/recovery provided
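The access method interface listed above could look roughly like the following Python sketch. The abstract class, the `statistics()` method name, and the toy `SortedListIndex` are all hypothetical; they only illustrate the open/close/search/retrieve-next calling pattern, not the API of any real DBMS.

```python
from abc import ABC, abstractmethod
import bisect

class AccessMethod(ABC):
    """Hypothetical interface a DBMS could call into for an external index."""
    @abstractmethod
    def open(self): ...
    @abstractmethod
    def close(self): ...
    @abstractmethod
    def search(self, x): ...
    @abstractmethod
    def retrieve_next(self): ...
    @abstractmethod
    def statistics(self): ...

class SortedListIndex(AccessMethod):
    """Toy index over (key, rid) pairs kept in sorted order."""
    def __init__(self, entries):
        self._entries = sorted(entries)
        self._pos = None
        self._key = None
        self._is_open = False

    def open(self):
        self._is_open = True

    def close(self):
        self._is_open = False
        self._pos = None

    def search(self, x):
        # Position the scan at the first entry whose key is >= x.
        if not self._is_open:
            raise RuntimeError("index not open")
        self._key = x
        self._pos = bisect.bisect_left(self._entries, (x,))

    def retrieve_next(self):
        # Return the next matching rid, or None when the scan is exhausted.
        if self._pos is None or self._pos >= len(self._entries):
            return None
        key, rid = self._entries[self._pos]
        if key != self._key:
            return None
        self._pos += 1
        return rid

    def statistics(self):
        # Simple figures an optimizer could use for costing.
        return {"entries": len(self._entries),
                "distinct_keys": len({k for k, _ in self._entries})}
```

The DBMS only sees the five calls, so the index file itself can live outside the database.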
Query Processing
Query Parsing:
– Type checking for methods
– Subtyping/Overriding
Query Rewriting:
– May translate path expressions into join operators
– Deal with collection hierarchies (UNION?)
– Indices or extraction out of collection hierarchy
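To illustrate the translation of a path expression into a join, the sketch below evaluates a path such as emp.dept.name in two ways: by chasing the reference per employee, and as a hash join on the referenced OID. The sample data, field names, and function names are hypothetical.

```python
# Hypothetical data: employees reference their department by OID.
departments = [{"oid": 10, "name": "Sales"}, {"oid": 20, "name": "R&D"}]
employees = [
    {"name": "Ann", "dept": 10},
    {"name": "Bob", "dept": 20},
    {"name": "Cy", "dept": 10},
]

def navigate(employees, departments):
    """Evaluate emp.dept.name by chasing the reference for each employee."""
    out = []
    for e in employees:
        d = next(d for d in departments if d["oid"] == e["dept"])  # pointer chase
        out.append((e["name"], d["name"]))
    return out

def as_join(employees, departments):
    """Same result as a hash join on employees.dept = departments.oid."""
    by_oid = {d["oid"]: d for d in departments}      # build phase
    return [(e["name"], by_oid[e["dept"]]["name"])   # probe phase
            for e in employees if e["dept"] in by_oid]
```

Once the path is expressed as a join, the optimizer can choose among its usual join algorithms and orderings.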
Query Optimization Core
– New algebra operators must be designed:
• such as nest, unnest, array-ops, values/objects, etc.
– Query optimizer must integrate them into optimization process:
• New rewrite rules
• New costing
• New heuristics
Query Optimization Revisited
– Existing algebra operators revisited: SELECT
– WHERE clause expressions can be expensive
– So SELECT pushdown may be a bad heuristic
Selection Condition Rewriting
EXAMPLE:
(tuple.attribute < 50)
– Only CPU time (on the fly)
(tuple.location OVERLAPS lake-object)
– Possibly complex CPU-heavy computations
– May involve both IO and CPU costs
State of the art:
– Consider reduction factor only
Now, we must consider both factors:
– Cost factor: dramatic variations
– Reduction factor: unrelated to cost factor
Operator Ordering
[Figure: operators op1 and op2]
Ordering of SELECT Operators
– Cost factor: can now vary dramatically
– Reduction factor: orthogonal to cost factor
– We want maximal reduction and minimal cost: Rank(operator) = (reduction) * (1/cost)
– Order operators by ‘rank’, applying the highest-ranked operator first
– High rank:
• (good) -> low in cost, and large reduction
– Low rank:
• (bad) -> high in cost, and small reduction
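The rank formula can be tried out with a small sketch. It assumes that "reduction" means the fraction of tuples a predicate eliminates (1 minus its selectivity), that predicates are independent, and that cost is a per-tuple figure in arbitrary units; the `Predicate` class and the example numbers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Predicate:
    name: str
    cost: float          # assumed per-tuple evaluation cost (arbitrary units)
    selectivity: float   # assumed fraction of tuples that pass, in [0, 1]

    @property
    def rank(self):
        # "reduction" is taken here as the fraction of tuples eliminated.
        return (1.0 - self.selectivity) / self.cost

def order_by_rank(preds):
    """Return the predicates with the best (highest-rank) one first."""
    return sorted(preds, key=lambda p: p.rank, reverse=True)

def expected_cost(preds):
    """Expected per-input-tuple cost of applying preds in the given order."""
    total, surviving = 0.0, 1.0
    for p in preds:
        total += surviving * p.cost      # only surviving tuples pay this cost
        surviving *= p.selectivity
    return total

cheap = Predicate("attribute < 50", cost=1.0, selectivity=0.5)
costly = Predicate("location OVERLAPS lake", cost=100.0, selectivity=0.1)
```

Here the expensive predicate filters more tuples, yet the cheap one ranks higher, so running it first lets the expensive predicate see only half of the input.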
Access Structures/Indices (on what?)
Indexes that are ADT specific
Indexes on navigation path
Indexes on methods, not just on columns
Indexes over collection hierarchies (trade-offs)
Indexes for new WHERE clause expressions: not just =, <, >, but also “overlaps”, “similar”
Registering New Index (to Optimizer)
What WHERE conditions it supports
Estimated cost for “matching tuple” (IO/CPU)
– Given by index designer (user?)
– Monitor statistics; even construct test plans
Estimation of reduction factors/join factors
– Register auxiliary function to estimate factor
– Provide simple defaults
Methods
Use ADT/methods in query specification
Achieve flexibility and extensibility
Methods
Extensibility: Dynamic linking of methods defined outside DB
Flexibility: Overriding methods for type hierarchy
Semantics:
– Use of “methods” with implied semantics?
– Incorporation of methods into query process may cause side-effects?
– Termination may not be guaranteed?
Methods
“Untrusted” methods:
– Methods may corrupt the server, or
– modify DB content (side effects)
Handling of “untrusted” methods:
– Restrict the language
– Interpret vs. compile
– Run in an address space separate from the DB server
Query Optimization with Methods
Estimation of “costs” of method predicates
– See earlier discussion
Optimization of method execution:
– Methods may be very expensive to execute
– Idea:
• Apply similar idea as handling correlated nested subqueries
• Recognize repetition and rewrite physical plan
• Provide some level of pre-computation and reuse
Strategies for Method Execution
– 1. If called on same input, cache that one result
– 2. If on full column, presort column first (group by)
– 3. Or, precompute results of the method for each possible value in the domain, and put them in a hash table: fct(val)
Look up val -> fct(val) in the hash table during query processing, or even join with it, instead of recomputing
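Strategies 1 and 3 above can be sketched in a few lines. The `expensive_method` stand-in, the call counter, and the sample column are assumptions for illustration; `functools.lru_cache` is used as one readily available way to cache by input value.

```python
from functools import lru_cache

calls = {"n": 0}  # counts real invocations, to make the saving visible

def expensive_method(val):
    """Stand-in for a costly user-defined method."""
    calls["n"] += 1
    return val * val

# Strategy 1: cache results so repeated inputs are computed only once.
cached_method = lru_cache(maxsize=None)(expensive_method)

# Strategy 3: precompute fct(val) for every value in a small, known domain.
def precompute(fct, domain):
    return {val: fct(val) for val in domain}

column = [3, 1, 3, 3, 1, 2]
via_cache = [cached_method(v) for v in column]       # 3 real calls for 6 rows
table = precompute(expensive_method, set(column))    # 3 more real calls
via_table = [table[v] for v in column]               # pure hash-table lookups
```

Precomputation only pays off when the domain is small relative to the number of rows that will probe the table.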
Query Processing
User-defined methods
User-defined aggregate functions:
– E.g., “second largest” or “brightest picture”
Distributive aggregates:
– Incremental computation
Incremental Computation: Query Processing
For incremental computation of distributive aggregates, provide:
– Initialize(): set up state space
– Iterate(): per tuple, update the state
– Terminate(): compute final result based on state, and clean up state
For example: “second largest”
– Initialize(): allocate 2 fields (the two largest values seen so far)
– Iterate(): per tuple, compare its value with the 2 fields and update them
– Terminate(): return the second-largest value and remove the 2 fields
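A minimal sketch of the “second largest” aggregate under the Initialize/Iterate/Terminate protocol follows. The class and the `run_aggregate` driver are hypothetical, and the sketch assumes that duplicate values count separately (so the second largest of 5, 5, 3 is 5).

```python
class SecondLargest:
    """User-defined aggregate following the Initialize/Iterate/Terminate protocol."""

    def initialize(self):
        # State: the two largest values seen so far (None = not seen yet).
        self.largest = None
        self.second = None

    def iterate(self, value):
        if self.largest is None or value > self.largest:
            # New maximum: the old maximum becomes the second largest.
            self.largest, self.second = value, self.largest
        elif self.second is None or value > self.second:
            self.second = value

    def terminate(self):
        result = self.second
        self.largest = self.second = None  # clean up state
        return result

def run_aggregate(agg, values):
    """Drive the aggregate the way a query executor would, one tuple at a time."""
    agg.initialize()
    for v in values:
        agg.iterate(v)
    return agg.terminate()
```

The state stays at two fields regardless of input size, which is what makes the computation incremental.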
Following Disk Pointers?
Complex object structures with object pointers may exist (~ disk pointers)
Navigate complex objects by following pointers
Long-running transactions, as in CAD design, may work with a complex object for a long duration
What to do about “pointers” between subobjects or related objects?
Following Disk Pointers?
Swizzle:
– Swizzle = replace OID references by in-memory pointers
– Unswizzle = convert back to disk pointers when flushing to disk
Issues:
– In-memory table of OIDs and their state
– Indicate in each object pointer, via a bit, whether it is swizzled
Different policies for swizzling:
– Never
– On access
– When the object is brought in
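The "on access" policy can be sketched as follows. The `Obj` and `ObjectCache` classes, the single-reference object model, and the dictionary standing in for the disk are all simplifying assumptions made for this illustration.

```python
class Obj:
    def __init__(self, oid, ref=None):
        self.oid = oid
        self.ref = ref              # either an OID (int) or an in-memory Obj
        self.ref_swizzled = False   # the "bit" recording the pointer's state

class ObjectCache:
    def __init__(self, disk):
        self.disk = disk        # hypothetical store: oid -> referenced oid (or None)
        self.resident = {}      # in-memory table: oid -> Obj

    def load(self, oid):
        """Bring an object into memory, leaving its reference unswizzled."""
        if oid not in self.resident:
            self.resident[oid] = Obj(oid, self.disk[oid])
        return self.resident[oid]

    def deref(self, obj):
        """Swizzle on access: replace the OID by a direct pointer on first use."""
        if obj.ref is None:
            return None
        if not obj.ref_swizzled:
            obj.ref = self.load(obj.ref)
            obj.ref_swizzled = True
        return obj.ref

    def flush(self, obj):
        """Unswizzle: turn the pointer back into an OID before writing out."""
        if obj.ref_swizzled:
            obj.ref = obj.ref.oid
            obj.ref_swizzled = False
        self.disk[obj.oid] = obj.ref
```

After the first dereference, later navigations follow the in-memory pointer directly and skip the OID table lookup.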
Persistence?
We may want both persistent and transient data
Why?
– Programming language variables
– Handle intermediate data
– May want to apply queries to transient data
Properties for Persistence?
Orthogonal to types:
– Data of any type can be persistent
Transparent to programmer:
– Programmer can treat persistent and non-persistent objects the same way
Independent from mass storage:
– No explicit read and write to persistent database
Models of Persistence
Different models of persistence for OODB implementations
Models of Persistence
Persistence by type
Persistence by call
Persistence by reachability
Models of Persistence
Parallel type systems:
– Persistence by type, e.g., int and dbint
– Programmer is responsible for making objects persistent
– Programmer must make the decision at object creation time
– Allow for user control by “casting” types
Models of Persistence
Persistence by explicit call:
– Explicit create/delete to persistent space
– E.g., objects must be placed into “persistent containers” such as relations in order to be kept around
– E.g., Insert object into Collection MyBooks;
– Could be rather dynamic control without casting
– Relatively simple to implement by DBMS
Models of Persistence
Persistence by reachability:
– Use global (or named) variables that refer to objects and structures
– Objects referenced by other objects that are reachable by the application are also persistent, by transitivity
– No explicit deletes; rather, garbage collection is needed to remove objects once they are no longer referenced
– Garbage collection techniques:
• mark&sweep: mark all objects reachable from persistent roots; then delete the others
• scavenging: copy all reachable objects from one space to the other; but may suffer in a disk-based environment due to IO overhead and destruction of clustering
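Mark&sweep over persistent roots can be sketched as below. The object store is modeled as a dictionary from OID to the list of OIDs it references; that representation and the function name are assumptions for illustration.

```python
def mark_and_sweep(store, roots):
    """store: oid -> list of referenced oids; roots: persistent root oids.
    Deletes unreachable objects in place and returns the set of deleted oids."""
    marked = set()
    stack = [r for r in roots if r in store]
    while stack:                        # mark phase: traverse from the roots
        oid = stack.pop()
        if oid in marked:
            continue
        marked.add(oid)
        stack.extend(ref for ref in store[oid] if ref in store)
    garbage = set(store) - marked       # sweep phase: everything left unmarked
    for oid in garbage:
        del store[oid]
    return garbage
```

Objects that reference each other in a cycle are still collected when no root reaches them, which is the case a simple reference count would miss.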
Tradeoffs: Persistent/Transient
Compare the three models (by type, by call, by reference) on:
– Orthogonal to type
– At creation time/any time
– Can objects dynamically switch (flexibility)
– Transparent to use; DB independent
– Explicit control by user
– DBMS implementation cost
Summary
A lot of work to get to OO support : From physical database design/layout issues up to logical query optimizer extensions
ORDB: Reuses the existing implementation base and incrementally adds new features on top (but the relation remains the first-class citizen)