
CS 222 – Fall 2011 Midterm Exam and Comprehensive Exam Part I

(Max. Points: 100)

Instructions:
- This exam is closed book and closed notes but open “cheat sheet” (which you should turn in along with your exam at the end of the exam period).
- The total time for the exam is 80 minutes, so budget your time accordingly.
- Be sure to answer each part of each question after reading the whole question carefully.
- If you don’t understand something, ask for clarification.
- If you still find ambiguities in a question, write down the interpretation you are taking and then answer the question based on that interpretation.
- The last two pages of this exam are blank; you can use them as scratch paper.

STUDENT NAME: STUDENT ID

QUESTION POINTS SCORE

1 15

2 25

3 25

4 20

5 15

TOTAL 100


SCORE: _________

Question 1: Metadata (15 points)

Consider a relational DBMS called YourSQL in which each database has a pair of catalog tables, a Tables table and a Columns table, wherein it keeps track of the metadata for all tables (including the catalog tables themselves, of course). Suppose that their schemas are as follows and that all of the catalog table columns contain mandatory information:

Tables (tabName: String, owner: String, numCols: Int)
Columns (tabName: String, colName: String, colType: String, colPosition: Int, nullOkay: Bool)

Catalog tables are owned by user “Root”. Now suppose that user “Joe” has created a table with the following schema for keeping track of employees in his company, and suppose that age is optional information for an employee:

Emps (empID: Int, name: String, salary: Float, age: Int)

(a) (5 points) Show the complete contents of the Tables table after Joe has created a database in which he has also created the Emps table, i.e., show what the query SELECT * FROM Tables would print:

(b) (10 points) Similarly, show what the query SELECT * FROM Columns would print at that point:
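To make the self-describing-catalog idea concrete, here is a minimal sketch of how such a DBMS might record its metadata, with the catalog describing itself before user tables are added. This is an illustration of the mechanism, not the graded exam answer; the helper name and bootstrap order are assumptions, and numCols is treated here as an Int (a column count).

```python
# Minimal sketch of a self-describing catalog: every table, including the
# two catalog tables themselves, gets rows in Tables and Columns.
Tables = []   # rows: (tabName, owner, numCols)
Columns = []  # rows: (tabName, colName, colType, colPosition, nullOkay)

def create_table(tab_name, owner, cols):
    """cols: list of (colName, colType, nullOkay) in declaration order."""
    Tables.append((tab_name, owner, len(cols)))
    for pos, (name, ctype, null_ok) in enumerate(cols, start=1):
        Columns.append((tab_name, name, ctype, pos, null_ok))

# Bootstrap: the catalog tables describe themselves (owned by "Root").
create_table("Tables", "Root",
             [("tabName", "String", False), ("owner", "String", False),
              ("numCols", "Int", False)])
create_table("Columns", "Root",
             [("tabName", "String", False), ("colName", "String", False),
              ("colType", "String", False), ("colPosition", "Int", False),
              ("nullOkay", "Bool", False)])

# Joe's table; age is optional, so nullOkay is True for that column only.
create_table("Emps", "Joe",
             [("empID", "Int", False), ("name", "String", False),
              ("salary", "Float", False), ("age", "Int", True)])

print(len(Tables), len(Columns))  # 3 tables, 3 + 5 + 4 = 12 column rows
```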


SCORE: _________

Question 2: Tree Structured Indexes (25 points)

Consider the following B+ tree index of order d=2.

(a) (6 points) Show the B+ tree index structure after inserting a new entry with a key value of 56.

(b) (5 points) Suppose that instead of a B+ tree, the picture at the top of this page was a picture of an ISAM index (with a slight optimization of not storing the leftmost child’s low key). Treating the original index as an ISAM index, show the index structure after inserting a new entry with a key value of 56 and explain in one sentence how/why your answer is different from your answer to (a).


SCORE: _________

Question 2: B+ Trees (continued)

(c) (6 points) Show the index after deleting the index entry with key value 39 from the original index.

(d) (8 points) Consider a very large B+ tree index of order d that is currently L levels deep, including the root and leaf levels, with a disk block size of B kilobytes and a total of N index entries. Facebook is currently trying to decide whether or not to add a B+ tree index on the e-mail address field of their Users table. N will be very large in their case, so they are nervous about performance and want to know how bad things might get when a new user signs up. Set their minds at ease by showing them the expressions for the best-case and worst-case logical I/O costs (i.e., ignoring buffering effects), using one or more of the parameters just mentioned, for an insert operation on such an index:

Best case insert cost: _______________ disk reads + _______________ disk writes

Worst case insert cost: ______________ disk reads + _______________ disk writes
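One common back-of-the-envelope analysis of these costs can be sketched as code; this is an illustration under the stated assumptions (one node per disk block, logical I/O only), not necessarily the intended answer key.

```python
# Logical I/O counts for an insert into an L-level B+ tree, assuming each
# node occupies one disk block (B and N do not enter the logical count).
def best_case_insert(L):
    # Root-to-leaf search, then rewrite the one leaf that has room.
    reads, writes = L, 1
    return reads, writes

def worst_case_insert(L):
    # Every node on the root-to-leaf path splits: each split writes two
    # nodes, and splitting the root additionally writes a brand-new root.
    reads = L
    writes = 2 * L + 1
    return reads, writes

print(best_case_insert(4), worst_case_insert(4))  # (4, 1) (4, 9)
```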


SCORE: _________

Question 3: Dynamic Hashing (25 points)

Consider the following extendible hashed file. Assume that the hash function is h(K) = K mod 32 (which is a BAD hash function!), that bits of h(K) are used in most-to-least order of significance like in lecture, and that only three records fit on a page. The figure below indicates the presence of records on a leaf page of the index by simply listing the keys of the records. (Suggestion: You should probably start by writing out all of the hashed key values as 5-bit binary numbers.)
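The suggested first step, writing each hashed key as a 5-bit binary number, can be scripted. The keys below are sample values drawn from the question text, since the figure itself is not reproduced here.

```python
# h(K) = K mod 32, shown as a 5-bit string; directory bits are consumed
# most-significant first, as in lecture.
def h5(key):
    return format(key % 32, "05b")

for k in (3, 15, 36, 41):   # sample keys mentioned in the question text
    print(k, "->", h5(k))
# e.g. 15 -> 01111, 36 -> 00100
```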

(a) (6 points) Show the result of inserting a new record with key value 15 into the file.

(b) (6 points) Redraw the figure to show the result of deleting the record with key value 3 from the original extendible hashed file. (Be sure to use the full dynamic deletion algorithm, reclaiming space if/where applicable.)


SCORE: _________

Question 3: Dynamic Hashing (continued)

Consider the following linear hashed file. When it was first created, the initial file size was N=4 pages. Currently level=0 and next=3. The initial hash function used was simply h0(K) = K mod 4. Only b=2 records will fit on a page of the file.
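The standard linear-hashing bucket-selection rule at this state (level=0, next=3, initial size N=4) can be sketched as follows; this is a reminder of the rule from lecture, offered as an illustration rather than the graded answer to (c).

```python
# Linear hashing bucket choice: apply h_level first; if the result falls
# in the already-split range [0, next), rehash with h_{level+1}.
def bucket(key, level=0, next_ptr=3, n0=4):
    b = key % (n0 * 2 ** level)            # h_level(K) = K mod 4 here
    if b < next_ptr:                       # this bucket has already split
        b = key % (n0 * 2 ** (level + 1))  # h_{level+1}(K) = K mod 8 here
    return b

print(bucket(36))  # 36 mod 4 = 0, which is < next, so use 36 mod 8 = 4
```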

(c) (6 points) Briefly list the steps involved in looking up a record with key 36 given the current state of the file.

(d) (7 points) Show the state of the file after inserting an index entry with a key value of 41. Assume an uncontrolled splitting policy is used, so whenever a new overflow page is added a split is to be triggered. (Be sure to include both level and next in your picture.)


SCORE: _________

Question 4: Multiple Choice Questions (20 points)

For each of the following questions, circle all of the answers that apply. (In some cases only one will be correct, but in some cases several of the answers will apply.)

(a) (2 points) A relational table with three fields can have at most how many clustered indexes?

0 1 2 3 7

(b) (2 points) System R from IBM included which of the following file and index types?

Heap files ISAM indexes Sorted files B+ tree indexes Hashed indexes

(c) (4 points) Which of the following features did System R support?

Locking JDBC API Query compilation PL/SQL procedures

Row-oriented storage Materialized views Recovery Recursive queries

(d) (2 points) The byte stream abstraction offered by the Unix file system is an excellent basis upon which to build a B+ tree index structure for a database system.

True False

(e) (4 points) The fanout F of the non-leaf nodes in a B+ tree file containing a total of N records significantly affects:

The height of the tree The number of comparisons in a search

The degree to which the tree is balanced The number of entries in each leaf node

(f) (4 points) B+ trees are superior to dynamic hashed indexes when it comes to:

Support for exact-match searches Support for range searches

Buffer space required for good performance Support for bulk data loading

(g) (2 points) Based on Graefe’s simple heuristic for picking a good B+ tree node size, the following node size would be very good for B+ trees for flash storage with an access time of 0.1 msec and a transfer rate of 100 MB/sec:

100 MB   100 KB   50 KB   10 KB   5 KB   2.5 KB   1 KB   100 B
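One simple reading of such a node-sizing heuristic balances access latency against transfer time; the arithmetic below illustrates that break-even calculation under the assumption that the stated rate means 100 MB per second, and is not an assertion of exactly what Graefe’s rule prescribes.

```python
# Rough break-even node size: the size at which access latency and
# transfer time contribute equally to the cost of reading one node.
access_time_s = 0.1e-3        # 0.1 msec access time
transfer_rate_bps = 100e6     # 100 MB/sec, in bytes per second

node_size_bytes = access_time_s * transfer_rate_bps
print(node_size_bytes)  # 10000.0 bytes, i.e., about 10 KB
```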


SCORE: _________

Question 5: Storage and Performance (15 points)

Consider a data file that initially contained 50,000 user records for an online gaming company. Suppose that the file is organized as a static hashed file containing the data records in order to support very fast exact-match queries; assume that the file has a block capacity of 100 records per page and that pages of the file are contiguous (with overflow pages being added at the end of the file). Further assume that the DBA who created and loaded the file was single-mindedly focused on minimizing disk storage costs, so when the file was created, it was sized to have an expected storage utilization of 100% (i.e., full pages). Finally, assume that the company has been successful since they launched their games, and they now have 99,000 users – and due to the 24x7 nature of their operation, they have just not found time yet to reorganize the static hashed file to better accommodate its new size. Answer the following questions based on this information; you may also assume that the hash function is a very good one, so it does close to a perfect job of evenly distributing the data records. We will only look at I/O cost; please assume that a random read or random write costs 10 msec and that a sequential read costs 2 msec.

(a) (5 points) Draw a sketch that conveys the structure of the file now that the company has 99,000 users.

(b) (2 points) What is the approximate disk I/O cost required to perform a successful exact-match search for a data record based on its key value?

(c) (2 points) What is the approximate disk I/O cost required to perform an unsuccessful search (i.e., to look up a non-existent user)?

(d) (2 points) What is the approximate disk I/O cost required to insert a new data record?

(e) (4 points) How would you suggest that full file scans be implemented, in terms of how they access the contents of the file, and what would be the approximate disk I/O cost of a full scan done your way?
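To make the scenario concrete, the file-sizing arithmetic implied by the setup can be sketched; this illustrates the current shape of the file, not the graded answers to the parts above.

```python
import math

# File was sized for 100% utilization at creation time.
records_initial = 50_000
records_now = 99_000
records_per_page = 100

primary_pages = records_initial // records_per_page  # pages at creation
records_per_bucket = records_now / primary_pages     # with a good hash
pages_per_bucket = math.ceil(records_per_bucket / records_per_page)

print(primary_pages, records_per_bucket, pages_per_bucket)
# 500 primary pages, ~198 records per bucket, so each bucket now has
# one full primary page plus (about) one overflow page at the file's end.
```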
