Super Simple Notes on B+ Trees and Hash Indexing
1. B+ Trees
- Definition:
A self-balancing tree data structure used for database indexing and file systems. It ensures efficient search, insert, and delete operations. - Key Features:
- All keys are stored in sorted order.
- Internal nodes guide the search; leaf nodes hold actual data.
- Height-balanced: Ensures all leaf nodes are at the same level.
- Structure:
- Internal Nodes: Contain keys and pointers to child nodes (used for navigation).
- Leaf Nodes: Contain keys and pointers to data/records.
- Advantages:
- Efficient Search: O(log N) complexity.
- Suitable for range queries (e.g., find values between X and Y).
- Sequential Access: Linked leaf nodes support easy traversal.
- Real-World Use:
Used in databases like MySQL, PostgreSQL, and file systems for indexing.
2. Hash Indexing
- Definition:
A data structure that uses hash functions to map keys to specific locations (buckets) for fast data retrieval. - Key Features:
- Uses a hash function to calculate the index.
- Keys are stored in buckets based on the hash value.
- Best for exact match queries (e.g., find a record with ID = 123).
- Structure:
- Hash Function: Transforms a key into a fixed-length hash value.
- Buckets: Hold data or pointers to records.
- Advantages:
- Fast Lookup: O(1) complexity in the best case.
- Simple Implementation.
- Limitations:
- Not suitable for range queries.
- Collisions may occur when two keys map to the same bucket.
- Collision Handling:
- Chaining: Store multiple keys in a list within the same bucket.
- Open Addressing: Probe for the next available bucket.
- Real-World Use:
Widely used in hash-based indexing (e.g., hash tables in Python, dictionaries) and databases for equality searches.
Quick Comparison: B+ Trees vs. Hash Indexing
Feature | B+ Trees | Hash Indexing |
---|---|---|
Query Type | Supports range queries | Exact match only |
Speed | Slower for exact match | Faster for exact match |
Structure | Hierarchical (tree-like) | Flat (buckets via hash func) |
Usage | Databases, file systems | Equality queries in databases |
Order | Keys stored in sorted order | No order among keys |
Last-Minute Learning Tips
- Focus on why each is used:
- B+ Trees: Range queries and sorted data.
- Hash Indexing: Fast exact lookups.
- Remember:
- B+ Trees = Tree + Sorted.
- Hash Indexing = Buckets + Hash Function.
- Practice with examples:
- Draw a small B+ tree with 2 levels.
- Hash a few keys into buckets using a simple modulo function.
MCQ
Topic 1: B+ Trees
- What is a key feature of a B+ Tree compared to a binary search tree?
A. Keys are unsorted in leaf nodes.
B. Data is stored only in internal nodes.
C. All leaf nodes are at the same level.
D. No balancing is required.
Answer: C - Which of the following operations in a B+ Tree has a time complexity of O(log N)?
A. Search
B. Insert
C. Delete
D. All of the above
Answer: D - In a B+ Tree, where are the actual data or records stored?
A. In the root node
B. In the internal nodes
C. In the leaf nodes
D. In a separate hash table
Answer: C - Why are B+ Trees suitable for database indexing?
A. They allow sequential access to records.
B. They have no height balancing.
C. They store duplicate keys in internal nodes.
D. They require less memory.
Answer: A - Which of the following is true for B+ Trees?
A. Internal nodes contain pointers to data records.
B. Leaf nodes are not linked.
C. Internal nodes store only keys and child pointers.
D. It is an unbalanced data structure.
Answer: C - What happens to a B+ Tree when an insertion causes a node to overflow?
A. The node is deleted.
B. The node is split, and the median key is promoted.
C. The tree becomes unbalanced.
D. All keys are shuffled.
Answer: B
Topic 2: Hash Indexing
- What is the primary use of hash indexing in databases?
A. Range queries
B. Exact match queries
C. Sorting data
D. Managing transactions
Answer: B - Which of the following is an example of a hash function?
A. Binary search
B. Modulo operation (key % number of buckets)
C. Depth-first traversal
D. Exponential function
Answer: B - What is the main problem addressed by collision handling in hash indexing?
A. Storing large datasets
B. Keys mapping to the same bucket
C. Searching within a bucket
D. Sorting keys within buckets
Answer: B - Which method is commonly used to handle collisions in hash indexing?
A. Balancing
B. Chaining
C. Promotion
D. Traversal
Answer: B - What is the worst-case time complexity for searching in hash indexing?
A. O(1)
B. O(log N)
C. O(N)
D. O(N log N)
Answer: C - Which of the following is a limitation of hash indexing?
A. It requires more storage than B+ Trees.
B. It does not support range queries.
C. It cannot handle duplicate keys.
D. It cannot be implemented in databases.
Answer: B
Topic 3: Comparison and Applications
- Which data structure is better for range queries in databases?
A. B+ Tree
B. Hash Table
C. Binary Search Tree
D. Graph
Answer: A - Why are hash indexes not used for range queries?
A. They are too slow for exact match queries.
B. They do not store data sequentially.
C. They use too much memory.
D. They require too much computation.
Answer: B - In which scenario would you prefer hash indexing over a B+ Tree?
A. Searching for all values in a range
B. Sorting data in a database
C. Performing exact match queries
D. Storing hierarchical data
Answer: C - What is a common use case for B+ Trees?
A. Equality-based lookups
B. Maintaining a cache
C. File system indexing
D. Graph traversal
Answer: C - If you need to store a large number of unique keys with frequent insertions and deletions, which indexing method should you choose?
A. Hash Indexing
B. B+ Tree
C. AVL Tree
D. Binary Heap
Answer: B - Which of the following is an advantage of B+ Trees over hash indexing?
A. Faster exact match queries
B. Ability to handle range queries efficiently
C. Lower memory usage
D. Simpler implementation
Answer: B
Topic 4: Practical Examples
- What would be the output of a hash function
key % 5
for key = 12?
A. 0
B. 2
C. 5
D. 7
Answer: B - If a B+ Tree has a branching factor of 4 and contains 3 levels (root, internal, and leaf), what is the maximum number of keys it can store?
A. 4
B. 64
C. 80
D. 85
Answer: C - Given a hash table with 10 buckets and a key-value pair (25, “A”), where would the key 25 be stored if the hash function is
key % 10
?
A. Bucket 0
B. Bucket 2
C. Bucket 5
D. Bucket 10
Answer: C