Chapter 10: 2-3-4 Trees and External Storage © John Urrutia 2014, All Rights Reserved

Upload: ashlee-stephens

Post on 27-Dec-2015

TRANSCRIPT

  • Slide 1
  • Chapter 10: 2-3-4 Trees and External Storage. John Urrutia 2014, All Rights Reserved
  • Slide 2
  • 2-3-4 Trees
      • Binary tree
          • Each parent node may have up to 2 children.
          • Each node holds exactly 1 data item.
      • Multiway tree (2-3-4)
          • Each non-leaf node must have 2 to 4 children.
          • The maximum number of children is called the order of the tree.
          • Each node holds at least 1 data item and can hold up to 3.
      • 2-3-4 trees are self-balancing, like red-black trees (an ordinary binary search tree is not).
  • Slide 3
  • 2-3-4 Trees (the Rules)
      • Leaf nodes have no children.
      • All leaf nodes are always at the same level.
      • Every leaf node must hold at least 1 data item and may hold as many as 3.
      • [Figure: example 2-3-4 tree. Root 50; internal nodes 30 and 60|70|80; leaves 10|20, 40, 55, 62|64|66, 75, 83|86.]
  • Slide 4
  • 2-3-4 Trees (the Rules)
      • Non-leaf nodes: the number of data items in the node dictates the number of children.
          • 1 data item: exactly 2 children
          • 2 data items: exactly 3 children
          • 3 data items: exactly 4 children
      • This relationship sets the structure of the tree.
      • Empty nodes are not allowed.
  • Slide 5
  • 2-3-4 Trees (the Rules)
      • A node with 2 links is called a 2-node.
      • A node with 3 links is called a 3-node.
      • A node with 4 links is called a 4-node.
      • Unlike binary trees, 2-3-4 trees do not have nodes with only 1 link.
  • Slide 6
  • 2-3-4 Tree Organization
      • The data items in a node are numbered 0, 1, 2 and are stored in ascending sequence.
      • The links to children are numbered 0, 1, 2, 3.
      • All data in the child at Link 0 is less than data item 0.
      • All data in the child at Link 1 is greater than data item 0 but less than data item 1.
      • All data in the child at Link 2 is greater than data item 1 but less than data item 2.
      • All data in the child at Link 3 is greater than data item 2.
  • Slide 7
  • 2-3-4 Tree Organization
      • [Figure: a 4-node holding 50|75|95 (data items 0 to 2) whose Links 0 to 3 lead to the children 30|35, 55, 78, and 100|105.]
  • Slide 8
  • 2-3-4 Tree Organization
      • All data in the child at Link 0 is less than data item 0.
      • All data in the child at Link 1 is greater than data item 0 but less than data item 1.
      • All data in the child at Link 2 is greater than data item 1 but less than data item 2.
      • All data in the child at Link 3 is greater than data item 2.
      • Duplicate values are normally not permitted.
  • Slide 9
  • Keys & Children
      • A node holds the keys A|B|C; its four children hold, from left to right:
          • Keys < A
          • A < Keys < B
          • B < Keys < C
          • Keys > C
  • Slide 10
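  • Searching 2-3-4 Trees
      • Search for the value (64) in the root node.
      • If it is not there, select the link whose range of values contains 64. Because 64 is greater than 50, the only data item in the root, that is Link 1.
      • Search the node at Link 1 and repeat as necessary until the value is found or a leaf node has been searched.
      • [Figure: the example tree with root 50; internal nodes 30 and 60|70|80; leaves 10|20, 40, 55, 62|64|66, 75, 83|86.]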
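The search steps on this slide can be sketched in a few lines of C#. This is a minimal illustration, not the deck's own `Node` and tree classes: `Node234`, `Search234`, `Contains`, and `BuildExample` are hypothetical names, and the tree that `BuildExample` constructs is a reading of the slide's figure (root 50), which is an assumption because the figure did not survive in the transcript.

```csharp
using System;

// Hypothetical minimal node: 1 to 3 ascending keys and, unless it is a
// leaf, exactly Keys.Length + 1 children.
class Node234
{
    public long[] Keys;
    public Node234[] Children;   // empty array for a leaf

    public Node234(long[] keys, params Node234[] children)
    {
        Keys = keys;
        Children = children;
    }

    public bool IsLeaf => Children.Length == 0;
}

static class Search234
{
    // Returns true if key occurs in the subtree rooted at node.
    public static bool Contains(Node234 node, long key)
    {
        while (true)
        {
            int link = 0;
            // Skip every key that is smaller than the search key.
            while (link < node.Keys.Length && key > node.Keys[link])
                link++;
            if (link < node.Keys.Length && node.Keys[link] == key)
                return true;             // found in this node
            if (node.IsLeaf)
                return false;            // searched a leaf without success
            node = node.Children[link];  // follow the selected link
        }
    }

    // Assumed reconstruction of the example tree in the slide figure.
    public static Node234 BuildExample()
    {
        var left = new Node234(new long[] { 30 },
            new Node234(new long[] { 10, 20 }),
            new Node234(new long[] { 40 }));
        var right = new Node234(new long[] { 60, 70, 80 },
            new Node234(new long[] { 55 }),
            new Node234(new long[] { 62, 64, 66 }),
            new Node234(new long[] { 75 }),
            new Node234(new long[] { 83, 86 }));
        return new Node234(new long[] { 50 }, left, right);
    }
}
```

Calling `Search234.Contains(Search234.BuildExample(), 64)` follows Link 1 from the root and then Link 1 again, and finds 64 in the leaf 62|64|66.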
  • Slide 11
  • Inserting 2-3-4 Trees
      • Insertion always occurs in a leaf node.
      • Starting at the root, compare the insert value with the node's data items and select the link in front of the first item that is greater than the insert value (or the last link if there is none).
      • If the node at that link is full, split it; if not, follow the link to the next level.
      • Repeat as necessary until the appropriate leaf is found.
      • If the leaf is full, split it into two and then insert the value; if not, insert the value.
  • Slide 12
  • Inserting 2-3-4 Trees
      • The simple process:
          • Find the leaf node that should contain the new value.
          • If the node isn't full, simply insert the value.
      • [Figure: a tree with root 28|55 in which 18 is inserted into the non-full leaf 13|23, giving 13|18|23.]
  • Slide 13
  • Inserting 2-3-4 Trees
      • Splitting a full node (insert 25): create a new, empty node.
      • [Figure: parent 10 above the full node 40|50|60, whose children are 39, 41, 52, and 63; a new empty node appears beside the full node.]
  • Slide 14
  • Inserting 2-3-4 Trees
      • Splitting a full node (insert 25): move the middle item, 50, up to the parent.
      • [Figure: the parent now holds 10|50; the split node holds 40 and 60 with an empty slot between them; the children 39, 41, 52, and 63 are unchanged.]
  • Slide 15
  • Inserting 2-3-4 Trees
      • Splitting a full node (insert 25): move 60 to the new node, together with its two children.
      • [Figure: the parent holds 10|50; the original node holds 40 with children 39 and 41; the new node holds 60 with children 52 and 63.]
  • Slide 16
  • Inserting 2-3-4 Trees
      • Splitting the root node:
          • Create 2 new nodes, one each for the left and right children.
          • The middle item becomes the new root.
      • [Figure: the full root 10|50|90 with children 9, 41, 52, and 91.]
  • Slide 17
  • Inserting 2-3-4 Trees
      • Splitting the root node:
          • Create 2 new nodes, one each for the left and right children.
          • The middle item becomes the new root.
      • [Figure: the full root 10|50|90 with children 9, 41, 52, and 91; the two new nodes receive 10 and 90.]
  • Slide 18
  • Inserting 2-3-4 Trees
      • Splitting the root node:
          • Create 2 new nodes, one each for the left and right children.
          • The middle item becomes the new root.
      • [Figure: the new root holds 50; its left child 10 has children 9 and 41; its right child 90 has children 52 and 91.]
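The insertion and split slides can be combined into one short sketch. It is not the deck's `Tree234` code (which is only listed by signature later on); `TNode`, `Tree234Sketch`, `LinkFor`, and `SplitChild` are hypothetical names, lists stand in for the fixed-size arrays, and duplicates are not rejected. One deliberate difference from the slides: when the root splits, this sketch reuses the old root as the left child and allocates a new root plus one sibling, which still yields two new nodes.

```csharp
using System;
using System.Collections.Generic;

class TNode
{
    public List<long> Keys = new List<long>();        // ascending, at most 3
    public List<TNode> Children = new List<TNode>();  // empty for a leaf
    public bool IsLeaf => Children.Count == 0;
    public bool IsFull => Keys.Count == 3;
}

class Tree234Sketch
{
    public TNode Root = new TNode();

    public void Insert(long value)
    {
        if (Root.IsFull)
        {
            // Root split: the middle key moves into a new root, so the
            // tree grows one level taller.
            var newRoot = new TNode();
            newRoot.Children.Add(Root);
            SplitChild(newRoot, 0);
            Root = newRoot;
        }
        TNode cur = Root;
        while (!cur.IsLeaf)
        {
            int i = LinkFor(cur, value);
            if (cur.Children[i].IsFull)
            {
                SplitChild(cur, i);        // split on the way down
                i = LinkFor(cur, value);   // parent gained a key: pick again
            }
            cur = cur.Children[i];
        }
        // Every full node on the path was split, so the leaf has room.
        cur.Keys.Insert(LinkFor(cur, value), value);
    }

    // Index of the first key that is not smaller than value; this is also
    // the number of the link to follow.
    static int LinkFor(TNode n, long value)
    {
        int i = 0;
        while (i < n.Keys.Count && value > n.Keys[i]) i++;
        return i;
    }

    // Splits the full node at parent.Children[i]: the middle key moves up
    // to the parent, and the largest key (with the two rightmost children)
    // moves to a new sibling.
    static void SplitChild(TNode parent, int i)
    {
        TNode full = parent.Children[i];
        var sibling = new TNode();
        sibling.Keys.Add(full.Keys[2]);
        if (!full.IsLeaf)
        {
            sibling.Children.Add(full.Children[2]);
            sibling.Children.Add(full.Children[3]);
            full.Children.RemoveRange(2, 2);
        }
        long middle = full.Keys[1];
        full.Keys.RemoveRange(1, 2);
        parent.Keys.Insert(i, middle);
        parent.Children.Insert(i + 1, sibling);
    }
}
```

Inserting 10, 50, and 90 fills the root; inserting 9 next splits it, leaving 50 in the new root with the leaves 9|10 and 90 beneath it.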
  • Slide 19
  • 2-3-4 DataItem Class
        class DataItem
        {
            public long dData;                  // the one data value held by this item

            public DataItem(long dd) { dData = dd; }

            // Writes the item as "/value", e.g. "/27"
            public void displayItem() { Console.Write("/" + dData); }
        }
  • Slide 20
  • 2-3-4 Node Class: Data
        class Node
        {
            private const int ORDER = 4;
            private int numItems;
            private Node parent;
            private Node[] childArray = new Node[ORDER];
            private DataItem[] itemArray = new DataItem[ORDER-1];
            //--------------------------------------------
  • Slide 21
  • 2-3-4 Node Class: Node Methods
        public void connectChild(int childNum, Node child)
        public Node disconnectChild(int childNum)
        public Node getChild(int childNum)
        public Node getParent()
  • Slide 22
  • 2-3-4 Node Class: Data Methods
        public DataItem getItem(int index)
        public int insertItem(DataItem newItem)
        public DataItem removeItem()
  • Slide 23
  • 2-3-4 Node Class: Utility Methods
        public Boolean isFull()
        public Boolean isLeaf()
        public int getNumItems()
        public int findItem(long key)
        public void displayNode()
  • Slide 24
  • 2-3-4 Tree Class
        private Node root = new Node();
        public int find(long key)
        public void insert(long dValue)
        public void split(Node thisNode)
        public Node getNextChild(Node theNode, long Value)
        public void displayTree()
        private void recDisplayTree(Node thisNode, int level, int childNumber)
  • Slide 25
  • 2-3-4 Tree Class: code walk-through
  • Slide 26
  • 2-3-4 Trees & Red-Black Trees
      • 2-3-4 trees don't look like red-black trees, or do they?
      • Red-black trees were developed after 2-3-4 trees.
      • Because the two are isomorphic, we can transform a 2-3-4 tree into a red-black tree using these rules:
          • Transform any 2-node in the 2-3-4 tree into a black node in the red-black tree.
          • Transform any 3-node into a black parent node and a red child node.
          • Transform any 4-node into a black parent and two red children.
  • Slide 27
  • 2-3-4 Trees & Red-Black Trees
      • [Figure: the 2-node 41 becomes the single node 41.]
  • Slide 28
  • 2-3-4 Trees & Red-Black Trees
      • [Figure: the 3-node 41|52 becomes either 41 as parent with 52 as its child, or 52 as parent with 41 as its child. Either is okay.]
  • Slide 29
  • 2-3-4 Trees & Red-Black Trees
      • [Figure: the 4-node 41|52|63 becomes 52 as parent with 41 and 63 as its children.]
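The three transformation rules can be written down as a small mapping from the keys of one 2-3-4 node to a red-black fragment. This is only a sketch: `RBNode` and `Convert234.FromKeys` are hypothetical names, the children of the 2-3-4 node are ignored, and for a 3-node the left-slanting form is chosen arbitrarily, since the slides note that either orientation is acceptable.

```csharp
using System;

class RBNode
{
    public long Key;
    public bool IsRed;
    public RBNode Left, Right;

    public RBNode(long key, bool isRed)
    {
        Key = key;
        IsRed = isRed;
    }
}

static class Convert234
{
    // keys: the 1 to 3 ascending keys of a single 2-3-4 node.
    // Returns the top (black) node of the equivalent red-black fragment.
    public static RBNode FromKeys(long[] keys)
    {
        switch (keys.Length)
        {
            case 1:
                // 2-node: a single black node.
                return new RBNode(keys[0], false);
            case 2:
            {
                // 3-node: black parent with one red child. The larger key
                // is made the parent here (left-slanting form).
                var parent = new RBNode(keys[1], false);
                parent.Left = new RBNode(keys[0], true);
                return parent;
            }
            case 3:
            {
                // 4-node: the middle key is the black parent; the other
                // two keys become its red children.
                var parent = new RBNode(keys[1], false);
                parent.Left = new RBNode(keys[0], true);
                parent.Right = new RBNode(keys[2], true);
                return parent;
            }
            default:
                throw new ArgumentException("a 2-3-4 node holds 1 to 3 keys");
        }
    }
}
```

For the 4-node 41|52|63, `FromKeys` returns a black 52 whose red children are 41 and 63, matching the figure.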
  • Slide 30
  • 2-3-4 Trees & Red-Black Trees
      • A color flip is the same as a 4-node split.
      • Rotations correspond to switching a 3-node between its two possible orientations.
          • A right rotation applies to the left-slanting form.
          • A left rotation applies to the right-slanting form.
      • Efficiency: with some slight differences, the two kinds of tree are roughly the same.
  • Slide 31
  • 2-3 Trees
      • Created by J. E. Hopcroft in 1970.
      • Similar to 2-3-4 trees, except that a node can hold at most 2 data items and can have at most 3 children.
      • The split process is similar, but it cannot happen on the way down to the insertion point.
      • After the insertion, splits percolate up the tree to maintain balance.
  • Slide 32
  • External Storage
      • Processor speed is rated in clock speed (gigahertz) or in instructions per second (MIPS or FLOPS).
          • 2.67 gigahertz = 2,670,000,000 ticks per second
          • Approx. 333,000,000 instructions per second
      • The most expensive operation a system performs is I/O.
          • Approx. 1,100,000 bytes per second
          • Transferring one byte takes about 300 times as long as an average instruction.
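The "300 times" figure follows from dividing the two rates quoted on this slide. The sketch below only restates that arithmetic; `IoCost` and its members are hypothetical names, and the two constants are the slide's own approximate numbers rather than measurements.

```csharp
static class IoCost
{
    public const double InstructionsPerSecond = 333000000.0; // from the slide
    public const double BytesPerSecond = 1100000.0;          // from the slide

    // Number of average instructions that could execute in the time it
    // takes to transfer one byte.
    public static double InstructionsPerByte()
    {
        return InstructionsPerSecond / BytesPerSecond;
    }
}
```

The quotient is a little over 300, which is where the slide's rounded ratio comes from.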
  • Slide 33
  • External Storage
  • Slide 34
  • Disk Organization
      • Data terms: Block, Buffer, Cylinder, Sector, Track, Partition
      • Operation terms: Seek, Read, Write, Transfer
  • Slide 35
  • External Storage: Data Terms
      • Block: the amount of data transferred in one I/O.
      • Buffer: RAM used to store one or more blocks of data, usually sized in multiples of the sector size (4, 8, 16, 32 KB).
      • Cluster: the set of blocks that matches the I/O buffer size and is read or written together.
      • Cylinder: the set of tracks simultaneously accessible by the read/write heads.
      • Sector: the physical area on a platter that holds one block.
      • Track: the circle scribed by the read/write head.
      • Partition: a logical division of a disk drive.
  • Slide 36
  • External Storage: Disk Organization, Data Terms
      • [Figure: a platter with a block/sector and a track labeled.]
  • Slide 37
  • External Storage: Disk Organization, Data Terms
      • [Figure: a cylinder labeled.]
  • Slide 38
  • External Storage: Operation Terms
      • Seek: the physical movement of the read/write head to a particular cylinder on the platter.
      • Read: the process of retrieving data from the drive.
      • Write: the process of storing data on the drive.
      • Transfer: the movement of data to or from the drive.
  • Slide 39
  • External Storage: Disk Specifications
      • Manufacturer: Seagate Technology
      • Model: ST9250410AS
      • Spindle speed: 7200 rpm
      • Avg. latency: 4.17 msec
      • I/O data transfer rate: 3.0 Gbits/sec (max)
      • T2T (track-to-track) seek time (read): 1.5 msec
      • Avg. seek (read): 11.0 msec
      • Avg. seek (write): 13.0 msec
  • Slide 40
  • External Storage: Disk Organization
      • Bytes/Sector: 512
      • Sectors/Track: 63
      • Size: 232.88 GB (250,056,737,280 bytes)
      • Total cylinders: 30,401
      • Total sectors: 488,392,065
      • Total tracks: 7,752,255
      • Tracks/Cylinder: 255 (7,752,255 tracks / 30,401 cylinders)
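The figures on this slide are tied together by simple multiplication, which makes them easy to cross-check. The sketch below uses only the slide's own numbers; `DiskGeometry` and its member names are hypothetical. Dividing the total tracks by the total cylinders gives 255 tracks per cylinder.

```csharp
static class DiskGeometry
{
    // Values quoted on the slide.
    public const long BytesPerSector = 512;
    public const long SectorsPerTrack = 63;
    public const long TotalCylinders = 30401;
    public const long TotalTracks = 7752255;

    // Tracks in each cylinder (the division is exact for these values).
    public static long TracksPerCylinder()
    {
        return TotalTracks / TotalCylinders;
    }

    // Every track holds the same number of sectors in this logical geometry.
    public static long TotalSectors()
    {
        return TotalTracks * SectorsPerTrack;
    }

    // Capacity in bytes.
    public static long TotalBytes()
    {
        return TotalSectors() * BytesPerSector;
    }
}
```

`TotalSectors()` and `TotalBytes()` reproduce the slide's 488,392,065 sectors and 250,056,737,280 bytes.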
  • Slide 41
  • External Storage: File System Organization
      • Sequential access
          • A stream of bytes blocked together.
          • Must be read in sequential order, beginning to end or vice versa.
          • Data can only be added at either end of the file.
          • Records can't be deleted without copying the entire file.
      • Direct (random) access
          • Data is organized into record blocks based on a key value.
          • Can be read sequentially or randomly by record.
          • Records can be added or deleted anywhere in the file, provided there is room.
  • Slide 42
  • B-Trees and I/O
      • We structure our B-tree so that the data in each node corresponds to the size of a disk cluster.
      • We use the key values to designate the cluster that contains the data.
      • This gives access to any record in about log_m N node reads, where N is the number of records in the dataset and m is the number of children of each node in the tree.
      • Each level in the tree requires 1 I/O when searching for a prospective record.
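To see why a large number of children per node matters, the number of levels (and therefore I/Os per search) can be estimated with a short loop. This is an idealized sketch, assuming every node is completely full and one level costs one I/O; `BTreeHeight` and `LevelsNeeded` are hypothetical names, and real B-trees with partly filled nodes need somewhat more levels.

```csharp
using System;

static class BTreeHeight
{
    // Smallest number of levels h with childrenPerNode^h >= records,
    // computed with integers to avoid floating-point rounding.
    // Assumes records is small enough that the running product does not
    // overflow a long.
    public static int LevelsNeeded(long records, int childrenPerNode)
    {
        if (records < 1 || childrenPerNode < 2)
            throw new ArgumentOutOfRangeException();
        int levels = 0;
        long reach = 1;   // records reachable with the current level count
        while (reach < records)
        {
            reach *= childrenPerNode;
            levels++;
        }
        return levels;
    }
}
```

Under these assumptions, 1,000,000 records need 20 levels with 2 children per node but only 3 levels with 100 children per node, which is the motivation for sizing a node to fill a whole disk cluster.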
  • Slide 43
  • Summary
      • A multiway tree has more keys and children than a binary tree.
      • A 2-3-4 tree is a multiway tree with up to three keys and four children per node.
      • In a multiway tree, the keys in a node are arranged in ascending order.
      • In a 2-3-4 tree, all insertions are made in leaf nodes, and all leaf nodes are on the same level.
  • Slide 44
  • Summary
      • Three kinds of nodes are possible in a 2-3-4 tree:
          • A 2-node has one key and two children.
          • A 3-node has two keys and three children.
          • A 4-node has three keys and four children.
      • There is no 1-node in a 2-3-4 tree.
      • In a search in a 2-3-4 tree, the keys are examined at each node. If the search key is not found, the next node will be:
          • Child 0 if the search key is less than key 0
          • Child 1 if the search key is between key 0 and key 1
          • Child 2 if the search key is between key 1 and key 2
          • Child 3 if the search key is greater than key 2
  • Slide 45
  • Summary
      • Insertion in a 2-3-4 tree requires that any full node be split on the way down the tree, during the search for the insertion point.
      • Splitting the root creates two new nodes.
      • Splitting any other node creates one new node.
      • The height of a 2-3-4 tree only increases when the root is split.
  • Slide 46
  • Summary
      • There is a one-to-one correspondence between a 2-3-4 tree and a red-black tree.
      • To transform a 2-3-4 tree into a red-black tree:
          • Make each 2-node into a black node.
          • Make each 3-node into a black parent with a red child.
          • Make each 4-node into a black parent with two red children.
  • Slide 47
  • Summary
      • When a 3-node is transformed into a parent and child, either node can become the parent.
      • Splitting a node in a 2-3-4 tree is the same as performing a color flip in a red-black tree.
      • A rotation in a red-black tree corresponds to changing between the two possible orientations (slants) when transforming a 3-node.
  • Slide 48
  • Summary
      • The height of a 2-3-4 tree is no more than log2(N + 1) levels, where N is the number of data items; that bound is reached only when every node is a 2-node.
      • Search times are proportional to the height.
      • The 2-3-4 tree wastes space because many nodes are not even half full.
  • Slide 49
  • Summary
      • A 2-3 tree is similar to a 2-3-4 tree, except that a node can have only one or two data items and at most three children.
      • Insertion in a 2-3 tree involves finding the appropriate leaf and then performing splits from the leaf upward, until a non-full node is found.
  • Slide 50
  • Summary
      • External storage means storing data outside of main memory, usually on a disk.
      • External storage is larger, cheaper (per byte), and slower than main memory.
      • Data in external storage is typically transferred to and from main memory a block at a time.
      • Data can be arranged in external storage in sequential key order. This gives fast search times but slow insertion (and deletion) times.
  • Slide 51
  • Summary
      • A B-tree is a multiway tree in which each node may have dozens or hundreds of keys and children.
      • There is always one more child than there are keys in a node.
      • For the best performance, a B-tree is typically organized so that a node holds one block of data.
      • If the search criteria involve many keys, a sequential search of all the records in a file may be the most practical approach.