B+ Trees and Hash Indexing

Super Simple Notes on B+ Trees and Hash Indexing

1. B+ Trees

  • Definition:
    A self-balancing tree data structure used for database indexing and file systems. It ensures efficient search, insert, and delete operations.
  • Key Features:
  1. All keys are stored in sorted order.
  2. Internal nodes guide the search; leaf nodes hold actual data.
  3. Height-balanced: Ensures all leaf nodes are at the same level.
  • Structure:
  • Internal Nodes: Contain keys and pointers to child nodes (used for navigation).
  • Leaf Nodes: Contain keys and pointers to data/records.
  • Advantages:
  1. Efficient Search: O(log N) complexity.
  2. Suitable for range queries (e.g., find values between X and Y).
  3. Sequential Access: Linked leaf nodes support easy traversal.
  • Real-World Use:
    Used in databases like MySQL, PostgreSQL, and file systems for indexing.

2. Hash Indexing

  • Definition:
    A data structure that uses hash functions to map keys to specific locations (buckets) for fast data retrieval.
  • Key Features:
  1. Uses a hash function to calculate the index.
  2. Keys are stored in buckets based on the hash value.
  3. Best for exact match queries (e.g., find a record with ID = 123).
  • Structure:
  • Hash Function: Transforms a key into a fixed-length hash value.
  • Buckets: Hold data or pointers to records.
  • Advantages:
  1. Fast Lookup: O(1) complexity in the best case.
  2. Simple Implementation.
  • Limitations:
  1. Not suitable for range queries.
  2. Collisions may occur when two keys map to the same bucket.
  • Collision Handling:
  • Chaining: Store multiple keys in a list within the same bucket.
  • Open Addressing: Probe for the next available bucket.
  • Real-World Use:
    Widely used in hash-based indexing (e.g., hash tables in Python, dictionaries) and databases for equality searches.

Quick Comparison: B+ Trees vs. Hash Indexing

FeatureB+ TreesHash Indexing
Query TypeSupports range queriesExact match only
SpeedSlower for exact matchFaster for exact match
StructureHierarchical (tree-like)Flat (buckets via hash func)
UsageDatabases, file systemsEquality queries in databases
OrderKeys stored in sorted orderNo order among keys

Last-Minute Learning Tips

  • Focus on why each is used:
  • B+ Trees: Range queries and sorted data.
  • Hash Indexing: Fast exact lookups.
  • Remember:
  • B+ Trees = Tree + Sorted.
  • Hash Indexing = Buckets + Hash Function.
  • Practice with examples:
  • Draw a small B+ tree with 2 levels.
  • Hash a few keys into buckets using a simple modulo function.

MCQ

Topic 1: B+ Trees

  1. What is a key feature of a B+ Tree compared to a binary search tree?
    A. Keys are unsorted in leaf nodes.
    B. Data is stored only in internal nodes.
    C. All leaf nodes are at the same level.
    D. No balancing is required.
    Answer: C
  2. Which of the following operations in a B+ Tree has a time complexity of O(log N)?
    A. Search
    B. Insert
    C. Delete
    D. All of the above
    Answer: D
  3. In a B+ Tree, where are the actual data or records stored?
    A. In the root node
    B. In the internal nodes
    C. In the leaf nodes
    D. In a separate hash table
    Answer: C
  4. Why are B+ Trees suitable for database indexing?
    A. They allow sequential access to records.
    B. They have no height balancing.
    C. They store duplicate keys in internal nodes.
    D. They require less memory.
    Answer: A
  5. Which of the following is true for B+ Trees?
    A. Internal nodes contain pointers to data records.
    B. Leaf nodes are not linked.
    C. Internal nodes store only keys and child pointers.
    D. It is an unbalanced data structure.
    Answer: C
  6. What happens to a B+ Tree when an insertion causes a node to overflow?
    A. The node is deleted.
    B. The node is split, and the median key is promoted.
    C. The tree becomes unbalanced.
    D. All keys are shuffled.
    Answer: B

Topic 2: Hash Indexing

  1. What is the primary use of hash indexing in databases?
    A. Range queries
    B. Exact match queries
    C. Sorting data
    D. Managing transactions
    Answer: B
  2. Which of the following is an example of a hash function?
    A. Binary search
    B. Modulo operation (key % number of buckets)
    C. Depth-first traversal
    D. Exponential function
    Answer: B
  3. What is the main problem addressed by collision handling in hash indexing?
    A. Storing large datasets
    B. Keys mapping to the same bucket
    C. Searching within a bucket
    D. Sorting keys within buckets
    Answer: B
  4. Which method is commonly used to handle collisions in hash indexing?
    A. Balancing
    B. Chaining
    C. Promotion
    D. Traversal
    Answer: B
  5. What is the worst-case time complexity for searching in hash indexing?
    A. O(1)
    B. O(log N)
    C. O(N)
    D. O(N log N)
    Answer: C
  6. Which of the following is a limitation of hash indexing?
    A. It requires more storage than B+ Trees.
    B. It does not support range queries.
    C. It cannot handle duplicate keys.
    D. It cannot be implemented in databases.
    Answer: B

Topic 3: Comparison and Applications

  1. Which data structure is better for range queries in databases?
    A. B+ Tree
    B. Hash Table
    C. Binary Search Tree
    D. Graph
    Answer: A
  2. Why are hash indexes not used for range queries?
    A. They are too slow for exact match queries.
    B. They do not store data sequentially.
    C. They use too much memory.
    D. They require too much computation.
    Answer: B
  3. In which scenario would you prefer hash indexing over a B+ Tree?
    A. Searching for all values in a range
    B. Sorting data in a database
    C. Performing exact match queries
    D. Storing hierarchical data
    Answer: C
  4. What is a common use case for B+ Trees?
    A. Equality-based lookups
    B. Maintaining a cache
    C. File system indexing
    D. Graph traversal
    Answer: C
  5. If you need to store a large number of unique keys with frequent insertions and deletions, which indexing method should you choose?
    A. Hash Indexing
    B. B+ Tree
    C. AVL Tree
    D. Binary Heap
    Answer: B
  6. Which of the following is an advantage of B+ Trees over hash indexing?
    A. Faster exact match queries
    B. Ability to handle range queries efficiently
    C. Lower memory usage
    D. Simpler implementation
    Answer: B

Topic 4: Practical Examples

  1. What would be the output of a hash function key % 5 for key = 12?
    A. 0
    B. 2
    C. 5
    D. 7
    Answer: B
  2. If a B+ Tree has a branching factor of 4 and contains 3 levels (root, internal, and leaf), what is the maximum number of keys it can store?
    A. 4
    B. 64
    C. 80
    D. 85
    Answer: C
  3. Given a hash table with 10 buckets and a key-value pair (25, “A”), where would the key 25 be stored if the hash function is key % 10?
    A. Bucket 0
    B. Bucket 2
    C. Bucket 5
    D. Bucket 10
    Answer: C