A static index structure that is effective when the file is not frequently updated, but it is unsuitable for files that grow and shrink a lot.
What is a B+ tree?
A dynamic structure that adjusts to changes in the file gracefully. It is the most widely used structure because it adjusts well to changes and supports both equality and range queries.
What does ISAM stand for?
Indexed Sequential Access Method.
What are the three types of pages in ISAM?
1. Data pages 2. Index Pages 3. Overflow Pages
What can result if many insertions are made to the same leaf in ISAM?
Long overflow chains which can significantly affect the time to retrieve a record.
What is a significant benefit ISAM has with respect to locking?
Since we know that index-level pages are never modified, we can safely omit the locking step present when using a dynamic structure like the B+ tree.
Which of the two structures allocates pages sequentially? What is the alternative?
ISAM allocates pages sequentially. The alternative, which B+ tree uses, is to allocate pages however, and then link doubly using pointers.
What is the sequence set?
The sequence of leaf pages made up into a doubly linked list.
What are three properties of a B+ tree?
1. Operations on the tree keep it balanced. 2. A minimum occupancy of 50% is guaranteed for each node except the root. 3. Searching for a record just requires traversal from root to leaf. This length is called the height. Because of a high fan out, B+ trees are rarely over 3 or 4 in height.
What is the order of a B+ tree?
The order is d where every node contains m entries and d <= m <= 2d. This is the 50% minimum occupancy clause. For the root, 1 <= m <= 2d.
What is the typical space occupancy of a B+ tree?
Why choose B+ tree over Sorting?
B+ tree has all the benefits of Sorting while at the same time improves drastically on inserts and deletions.
Why choose B+ tree over ISAM?
Inserts are handled gracefully, avoiding overflow chains.
Why choose ISAM of B+ tree? There are two main reasons.
If the data is relatively static, ISAM improves over B+ tree in two ways: 1. Leaf pages are allocated in sequence, making scans over a large range more efficient. 2. The locking overhead of ISAM is lower than that for B+ trees.
What are the two options for insertions on a full leaf node in a B+ tree?
1. Check for redistribution possibilities in neighboring leaf nodes 2. If not possible, split and copy/push.
What are the two options for deletions which create nodes below the order?
1. Redistribution 2. If not possible, merge.
In what two possible ways are duplicate keys dealt with in B+ trees?
1. Overflow pages 2. Distinguish them with row id values
What is the height of a B+ tree proportional to?
LogF(# of data entries) where F is the fan-out. It is then clearly important to maximize fan-out to minimize I/O operations.
How can the fan out of a tree be increased?
Prefix key compression, i.e. minimizing key length by taking the minimal necessary amount of the key, e.g. David and Daniel as keys really only require Dav and Dan to be distinguished.
How is bulk loading done?
It's a left-to-right process which involves filling the root, splitting on overflow, and doing that until completed. Page 362 shows the more complicated steps, but simply the right-most node is split, promoting its right half to parent and redistributing its left half to the new right node.
What is the order concept typically replaced with? Why?
It is typically replaced with a simple physical space criterion. This is due to the nature of varying length of keys (strings), duplication, the fact that leaves contain data which index nodes do not, etc.
Want to see the other 21 Flashcards in Chapter 10: Tree-Structured Indexing?JOIN TODAY FOR FREE!