Chapter 16: Disk Storage, Basic File Structures, Hashing, and Modern Storage Architectures
Advanced Database Management Systems - Final Term Elite Preparation
1. Introduction & Storage Hierarchy
Databases are typically massive and must be stored efficiently on magnetic disks or SSDs. The DBMS accesses this data using underlying physical database file structures.
Data Types Based on Lifespan
- Persistent Data: Outlives the execution of the program that created it. Represents the vast majority of database data, stored permanently on disk.
- Transient Data: Exists only temporarily during program execution (e.g., intermediate query results in RAM).
The Storage Hierarchy
Storage is categorized by a strict trade-off: as you go down the hierarchy, capacity increases and cost per byte decreases, but access time increases sharply.
- Primary Storage: CPU cache memory (SRAM) and main memory (DRAM). Fastest, volatile, and the most expensive per byte.
- Secondary (Mass) Storage: Magnetic disks (HDDs), Flash memory, Solid-state drives (SSDs), CD-ROMs, DVDs. Non-volatile, affordable mass storage.
- Tertiary Storage: Removable media, Tape Jukeboxes. Slowest, used for massive offline archives.
2. Secondary Storage Devices
Secondary storage dictates how records are physically placed and accessed.
Hard Disk Drives (HDD)
- Architecture: Disks (single or double-sided) contain concentric circles called Tracks. A Cylinder is the set of tracks with the same diameter across all platter surfaces.
- Sectors & Blocks: Each track is divided into smaller units: sectors, which are fixed by the hardware, or blocks, which are set during formatting.
- Formatting: The OS formatting process divides tracks into equal-sized disk blocks separated by interblock gaps.
- Data Transfer: Data is rigidly transferred between disk and RAM in units of Disk Blocks.
- Controllers: Hardware that interfaces the disk drive to the computer system using standard interfaces like SCSI, SATA, and SAS.
Solid State Devices (SSD / Flash Storage)
- Components: Uses a main controller and interconnected flash memory cards. There are zero moving parts.
- Advantages: Data is less likely to be fragmented. Faster access times compared to HDDs.
- DRAM-based SSDs: A specific ultra-fast variant that offers even quicker access times than standard flash, though at a higher cost.
Magnetic Tape
- Access Method: Exclusively Sequential access. To read a block, the read/write head must physically scan past all preceding blocks.
- Primary Function: Used strictly for Backup and Archiving.
3. Buffering & Efficient Data Access
Buffering minimizes the time the CPU spends waiting for slow disk I/O.
Double Buffering
A technique used to read a continuous stream of blocks. While the CPU processes data in Buffer A, the disk controller fills Buffer B. The two buffers then swap roles, so disk I/O and CPU processing overlap instead of alternating.
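The buffer swap can be sketched with two threads and exactly two reusable buffers. This is a minimal simulation rather than real disk I/O: the function names, the use of in-memory byte strings as "blocks", and the uppercase step standing in for CPU processing are all assumptions made for illustration.

```python
import queue
import threading

def read_blocks(blocks, filled, free):
    """Producer: plays the role of the disk controller filling buffers."""
    for block in blocks:
        buf = free.get()      # wait until one of the two buffers is empty
        buf[:] = block        # simulate transferring one disk block into it
        filled.put(buf)
    filled.put(None)          # sentinel: no more blocks to read

def process_stream(blocks):
    """Consumer: plays the role of the CPU and returns the processed blocks."""
    free, filled = queue.Queue(), queue.Queue()
    for _ in range(2):        # exactly two buffers: A and B
        free.put(bytearray())
    reader = threading.Thread(target=read_blocks, args=(blocks, filled, free))
    reader.start()
    results = []
    while (buf := filled.get()) is not None:
        results.append(bytes(buf).upper())  # hypothetical "processing" step
        free.put(buf)         # hand the buffer back so it can be refilled
    reader.join()
    return results

print(process_stream([b"ab", b"cd", b"ef"]))  # [b'AB', b'CD', b'EF']
```

While the consumer works on one buffer, the producer is free to fill the other, which is the overlap that double buffering is meant to provide.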
Buffer Management Metadata
- Pin Count: Tracks how many active processes are using a block. If Pin Count > 0, the block is "pinned" and the Buffer Manager cannot evict it.
- Dirty Bit: Set to '1' if the block has been modified in memory. If evicted, a dirty block must be safely written back to the disk. If '0', it can simply be overwritten.
Buffer Replacement Strategies
- LRU (Least Recently Used): Replaces the block that has sat unused for the longest time.
- Clock Policy: A more efficient, low-overhead approximation of LRU.
- FIFO (First-In-First-Out): Evicts the oldest block loaded into memory, regardless of usage.
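The interaction between pin counts, dirty bits, and LRU replacement can be sketched as a small buffer pool. This is an illustrative model only: the class name `BufferPool`, the dictionary standing in for the disk, and the frame layout are assumptions, not the design of any particular DBMS.

```python
from collections import OrderedDict

class BufferPool:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict: block_id -> data, stands in for the disk
        self.frames = OrderedDict()   # block_id -> frame, least recently used first

    def pin(self, block_id):
        """Bring a block into the pool (if needed) and mark it as in use."""
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # now the most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[block_id] = {"data": self.disk[block_id], "pins": 0, "dirty": False}
        frame = self.frames[block_id]
        frame["pins"] += 1
        return frame

    def unpin(self, block_id, modified=False):
        """Release one use of the block; remember whether it was changed."""
        frame = self.frames[block_id]
        frame["pins"] -= 1
        frame["dirty"] = frame["dirty"] or modified

    def _evict(self):
        """Evict the least recently used frame whose pin count is zero."""
        for block_id, frame in self.frames.items():
            if frame["pins"] == 0:
                if frame["dirty"]:
                    self.disk[block_id] = frame["data"]   # write back before reuse
                del self.frames[block_id]
                return
        raise RuntimeError("all frames are pinned")
```

A caller pins a block before using it and unpins it afterwards, passing `modified=True` if the contents changed. Replacing the `OrderedDict` ordering with insertion order that is never updated would turn the same sketch into FIFO.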
Beyond standard buffering, modern systems speed up access via:
- Read-ahead: Predicting and fetching data blocks before the CPU explicitly requests them.
- I/O Scheduling: Reordering disk requests to minimize the physical movement of the disk head.
- Log Disks: Using dedicated, high-speed disks to temporarily hold write operations.
- Flash for Recovery: Utilizing non-volatile SSDs to securely hold logs for rapid crash recovery.
4. Placing File Records on Disk
Records consist of fields. Common types include Numeric, String, Boolean, and Date/time. For unstructured objects (like images, audio, or large text documents), databases use BLOBs (Binary Large Objects).
Variable-Length Records
Records are often not uniform in size. Reasons for variable-length records include:
- Fields intrinsically have variable lengths (e.g., VARCHAR).
- A record contains repeating fields (arrays/lists).
- Fields are optional (frequent NULL values).
- The file contains mixed, differing record types.
Record Blocking: Spanned vs. Unspanned
- Unspanned Records: Records are strictly not allowed to cross block boundaries. If a record cannot fit in the remaining space of a block, it is moved to the next block, resulting in wasted space (internal fragmentation).
- Spanned Records: Records larger than a single block can cross boundaries. A pointer at the end of the first block links to the block containing the remainder of the record.
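Blocking Factor (bfr): The average number of records stored per disk block.
bfr = ⌊B / R⌋, where B = block size and R = record size, both in bytes. For unspanned records the result is rounded down, leaving B − (bfr × R) bytes unused in each block.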
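The bfr formula can be checked with a short calculation. The values of B, R, and the record count r below are illustrative assumptions, not figures from the chapter.

```python
import math

def blocking_factor(block_size, record_size):
    """bfr = floor(B / R) for fixed-length, unspanned records."""
    return block_size // record_size

def wasted_bytes_per_block(block_size, record_size):
    """Unused space per block: B - bfr * R."""
    return block_size - blocking_factor(block_size, record_size) * record_size

def blocks_needed(num_records, block_size, record_size):
    """Number of blocks b = ceil(r / bfr)."""
    return math.ceil(num_records / blocking_factor(block_size, record_size))

B, R, r = 512, 100, 30_000                 # assumed example values
print(blocking_factor(B, R))               # 5
print(wasted_bytes_per_block(B, R))        # 12
print(blocks_needed(r, B, R))              # 6000
```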
Allocating File Blocks on Disk
- Contiguous Allocation: Blocks are stored adjacently. Very fast sequential reads, but expanding the file is difficult and free space becomes heavily fragmented over time.
- Linked Allocation: Each block contains a pointer to the next block. Good for dynamic files, terrible for random access.
- Indexed Allocation: A dedicated index block contains pointers to all individual data blocks.
- File Header (Descriptor): Crucial metadata stored at the beginning of the file containing disk addresses and format descriptions required by system programs.
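The difference between linked and indexed allocation for random access can be sketched by counting block reads. The dictionaries below stand in for the disk, and the addresses and record names are made up for illustration.

```python
def read_linked(disk, first_addr, n):
    """Return (data, blocks_read) for the n-th block (0-based) of a linked file."""
    addr, reads = first_addr, 0
    while True:
        data, next_addr = disk[addr]   # each block stores its data and a next-pointer
        reads += 1
        if n == 0:
            return data, reads
        addr, n = next_addr, n - 1

def read_indexed(disk, index_addr, n):
    """Return (data, blocks_read) for the n-th block using an index block."""
    index = disk[index_addr]           # one read fetches the list of block addresses
    return disk[index[n]], 2           # plus one read for the data block itself

# Linked file: blocks scattered at addresses 7 -> 3 -> 9
linked_disk = {7: ("rec-A", 3), 3: ("rec-B", 9), 9: ("rec-C", None)}
# Indexed file: the index block at address 0 lists the same data blocks
indexed_disk = {0: [7, 3, 9], 7: "rec-A", 3: "rec-B", 9: "rec-C"}

print(read_linked(linked_disk, 7, 2))    # ('rec-C', 3)
print(read_indexed(indexed_disk, 0, 2))  # ('rec-C', 2)
```

With linked allocation the cost of reaching block n grows with n, while the indexed file reaches any block in two reads under this simplified single-index-block model.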
5. Basic File Organizations
Operations on files fall into two groups:
- Retrieval (No data change): Open, Find, Read, FindNext, Scan, Close.
- Update (Changes data): Insert, Delete, Modify.
1. Heap (Pile) Files - Unordered
- Insertion: Extremely efficient. Records are simply appended in the order of insertion at the end of the file.
- Searching: Highly inefficient. Requires a linear scan (reading block by block until found).
- Deletion: Requires reading, modifying, and rewriting the block, or using a "deletion marker" (tombstone) to virtually delete it.
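The three heap-file operations can be sketched in a few lines. This is a toy in-memory model: the class name `HeapFile`, the `id` field used as the search key, and the per-slot `deleted` flag (the deletion marker) are assumptions for illustration.

```python
class HeapFile:
    def __init__(self, bfr):
        self.bfr = bfr        # records per block (blocking factor)
        self.blocks = [[]]    # list of blocks, each a list of record slots

    def insert(self, record):
        """Append at the end of the file, starting a new block when the last is full."""
        if len(self.blocks[-1]) == self.bfr:
            self.blocks.append([])
        self.blocks[-1].append({"rec": record, "deleted": False})

    def find(self, key):
        """Linear scan; returns (record or None, number of blocks read)."""
        for blocks_read, block in enumerate(self.blocks, start=1):
            for slot in block:
                if not slot["deleted"] and slot["rec"]["id"] == key:
                    return slot["rec"], blocks_read
        return None, len(self.blocks)

    def delete(self, key):
        """Set the deletion marker instead of physically removing the record."""
        for block in self.blocks:
            for slot in block:
                if not slot["deleted"] and slot["rec"]["id"] == key:
                    slot["deleted"] = True
                    return True
        return False
```

Marked slots still occupy space, so a real system would periodically reorganize the file to reclaim them.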
2. Sorted (Sequential) Files - Ordered
- Records are physically sorted based on an ordering field (called the Ordering Key if the field is unique).
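- Advantages: Reading records sequentially in order of the ordering field is exceptionally fast, because no sorting is needed. Finding the "next" record usually requires no additional block access, since it is normally in the same block as the current record.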
- Searching: Highly efficient. Allows the use of Binary Search.
| File Organization | Search Method | Average Block Accesses (b = number of file blocks) |
|---|---|---|
| Heap (Unordered) | Linear Search | b / 2 |
| Ordered | Linear Scan | b / 2 |
| Ordered | Binary Search | log₂ b |
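To see how far apart these costs are, the two formulas can be evaluated for a concrete file. The block count b = 6000 is an assumed example value, and the binary search cost is rounded up to a whole number of block accesses.

```python
import math

def linear_search_cost(b):
    """Average block accesses for a linear search over b blocks."""
    return b / 2

def binary_search_cost(b):
    """Approximate block accesses for a binary search over b blocks."""
    return math.ceil(math.log2(b))

b = 6000                       # assumed number of file blocks
print(linear_search_cost(b))   # 3000.0
print(binary_search_cost(b))   # 13
```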
Aside from Heap and Sorted files, modern databases might use:
- Files of Mixed Records: Implements relationships physically using logical field references.
- B-Tree Data Structures: The industry-standard tree index for keeping data sorted and allowing fast searches/insertions.
- Column-Based Storage: Stores data column-by-column rather than row-by-row, massively optimizing read-heavy analytical queries.
6. Hashing Techniques
Used when a group of records is accessed exclusively by one specific field value (Hash Field, typically the primary key). The search condition is an equality condition (=).
- Hash Function (Randomizing Function): Applied to the hash field to mathematically calculate the physical disk block address (the Bucket) of the stored record.
- Bucket: A unit of storage consisting of one disk block or multiple contiguous blocks.
- Collision: Occurs when the hash function maps a newly inserted record to an address that already contains a different record.
- Collision Resolution: Handled via Open Addressing (probing for the next free slot), Chaining (creating a linked list of overflow blocks), or Multiple Hashing (applying a second function).
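A static hash file with chaining can be sketched as follows. The class name, the modulo hash function, and the integer keys are assumptions for illustration; each bucket is modelled as a chain of blocks whose first entry is the main block and whose later entries are overflow blocks.

```python
class StaticHashFile:
    def __init__(self, num_buckets, bucket_capacity):
        self.num_buckets = num_buckets
        self.capacity = bucket_capacity      # records per block
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _h(self, key):
        """Hash function: maps the hash field to a bucket number."""
        return key % self.num_buckets

    def insert(self, key, value):
        chain = self.buckets[self._h(key)]
        if len(chain[-1]) == self.capacity:
            chain.append([])                 # last block is full: chain an overflow block
        chain[-1].append((key, value))

    def search(self, key):
        """Return (value or None, blocks read) for an equality search."""
        reads = 0
        for block in self.buckets[self._h(key)]:
            reads += 1
            for k, v in block:
                if k == key:
                    return v, reads
        return None, reads
```

As more keys land in the same bucket, the chain grows and an equality search has to read more than one block, which is the degradation that dynamic schemes try to avoid.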
Dynamic File Expansion
Static hashing allocates a fixed number of buckets, leading to severe collisions as the database grows. Modern solutions include:
- Extendible Hashing: Uses a directory of pointers that can dynamically double in size. File performance does not degrade as it grows.
- Dynamic Hashing: Maintains a tree-structured directory for flexible growth.
- Linear Hashing: An advanced technique that allows the file to seamlessly expand and shrink buckets without needing to maintain a separate directory structure.
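Linear hashing's directory-free growth can be sketched with a split pointer and two hash functions. This is a simplified in-memory model under several assumptions: non-negative integer keys, the hash functions key mod M and key mod 2M, a split triggered when the load factor exceeds a hypothetical threshold, and Python list growth standing in for overflow chains.

```python
class LinearHashFile:
    def __init__(self, initial_buckets=2, bucket_capacity=2, max_load=0.8):
        self.M = initial_buckets   # bucket count at the start of the current round
        self.n = 0                 # split pointer: next bucket to split
        self.capacity = bucket_capacity
        self.max_load = max_load
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0

    def _address(self, key):
        addr = key % self.M
        if addr < self.n:                  # that bucket was already split this round
            addr = key % (2 * self.M)
        return addr

    def insert(self, key):
        self.buckets[self._address(key)].append(key)
        self.count += 1
        if self.count / (len(self.buckets) * self.capacity) > self.max_load:
            self._split()

    def _split(self):
        """Split bucket n into n and M + n, then advance the split pointer."""
        old = self.buckets[self.n]
        self.buckets.append([])            # the new bucket has address M + n
        self.buckets[self.n] = [k for k in old if k % (2 * self.M) == self.n]
        self.buckets[-1] = [k for k in old if k % (2 * self.M) != self.n]
        self.n += 1
        if self.n == self.M:               # every original bucket has been split
            self.M *= 2
            self.n = 0

    def contains(self, key):
        return key in self.buckets[self._address(key)]
```

Only the two integers M and n are needed to locate any record, which is why no directory is required.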
7. Parallelizing Disk Access Using RAID
RAID (Redundant Arrays of Independent Disks) aims to dramatically improve disk speed, access time, and system reliability.
Core Mechanics
- Data Striping: Splitting data (at the bit or block level) evenly across multiple disks to achieve higher, parallel transfer rates.
- Redundancy: Mirroring and shadowing techniques duplicate data to ensure availability during hardware failures.
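Block-level striping can be illustrated with a round-robin mapping from logical block numbers to disks. The function name and the four-disk array are assumptions; real controllers may use different stripe unit sizes and layouts.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk, offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

# Six consecutive logical blocks on an assumed 4-disk array
print([locate_block(i, 4) for i in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

Consecutive blocks land on different disks, so a large sequential read can be served by several disks in parallel.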
| RAID Level | Architecture & Mechanism | Use Case / Benefit |
|---|---|---|
| Level 0 | Data Striping across disks. No redundant data. | Max speed, zero fault tolerance. |
| Level 1 | Mirroring. Exact copies on two disks. | Ultimate reliability; rebuilding is easiest. |
| Level 2 | Memory-style redundancy using Hamming codes. | Heavy error detection and correction. |
| Level 3 | Bit-interleaved with a single parity disk. | Relies heavily on the disk controller. |
| Levels 4 & 5 | Block-level striping. RAID 4 keeps parity on one dedicated disk; RAID 5 distributes parity across all disks. | RAID 5 is the industry standard. Highly preferred for large volume storage. |
| Level 6 | Applies P+Q dual redundancy scheme. | Protects against up to TWO simultaneous disk failures. |
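The single-parity idea behind Levels 3 to 5 can be sketched with XOR: the parity block is the XOR of the data blocks, and any one lost block is the XOR of the survivors and the parity. The byte values below are arbitrary, and the second (Q) code used by Level 6 is not shown.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]   # blocks on three data disks
parity = xor_blocks(data)                         # block on the parity disk

# Simulate losing the second data disk and rebuilding its block
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```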
8. Modern Storage Architectures
- SAN (Storage Area Networks): Online storage peripherals configured as standalone nodes on a dedicated, high-speed network.
- NAS (Network-Attached Storage): Dedicated servers used purely for file sharing. Offers a high degree of scalability, reliability, and flexibility.
- iSCSI: Clients send SCSI commands to remote storage devices over standard IP networks.
- FCIP & FCoE: Fibre Channel over IP (translates FC codes to IP packets to connect distant SANs geographically) and Fibre Channel over Ethernet (similar to iSCSI but avoids IP overhead).
- Automated Storage Tiering: A smart system that automatically moves data between different storage types depending on need. Frequently-used data goes to SSDs; old data goes to HDDs.
- Object-Based Storage: Data is managed as "objects" rather than blocks. Objects carry rich metadata and a global identifier. Ideally suited for scalable storage of unstructured data (like AWS S3).
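An automated tiering policy can be sketched as a rule that maps how recently data was accessed to a storage tier. The thresholds (30 and 365 days), the tier names, and the function name are all hypothetical; real tiering products use their own heuristics.

```python
from datetime import date

def choose_tier(last_access, today, hot_days=30, warm_days=365):
    """Pick a storage tier from the number of days since the last access."""
    age = (today - last_access).days
    if age <= hot_days:
        return "ssd"     # frequently used data
    if age <= warm_days:
        return "hdd"     # older data
    return "tape"        # archival data

today = date(2024, 1, 10)
print(choose_tier(date(2024, 1, 1), today))   # ssd
print(choose_tier(date(2023, 6, 1), today))   # hdd
print(choose_tier(date(2019, 1, 1), today))   # tape
```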
🔥 Core Theory Q&A Preparation
Ensure you can answer these clearly. They validate your foundational knowledge.
Q: Explain the exact mechanism and purpose of Double Buffering.
A: Double buffering facilitates interleaved concurrency. By utilizing two independent memory buffers (A and B), the CPU can actively process the data residing in Buffer A. Simultaneously, the disk I/O controller fetches the next sequential block from the hard drive into Buffer B. Once the CPU finishes with A, it instantly switches to B, eliminating CPU idle time waiting for mechanical disk reads.
Q: When is 'Spanned' record organization mandatory, and what is its primary drawback?
A: Spanned organization is mandatory when a single logical database record's size exceeds the maximum size of a physical disk block. The drawback is increased access time: fetching a single spanned record requires the disk read/write head to access at least two separate disk blocks, effectively doubling the I/O cost for that record.
Q: How does Linear Hashing differ from Extendible Hashing in handling database growth?
A: Extendible Hashing manages growth by maintaining a separate, centralized array of pointers (a directory). When a bucket overflows, that bucket is split, and the directory doubles in size only if the overflowing bucket's local depth already equals the directory's global depth. Linear Hashing eliminates the need for this directory entirely. Instead, it keeps a split pointer and splits buckets one by one in linear order as the file's load factor increases.
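For comparison, the directory behaviour of extendible hashing can be sketched as follows. This is a simplified in-memory model that assumes distinct non-negative integer keys and indexes the directory by the low-order bits of the key; the class names and the bucket capacity are illustrative.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth    # local depth: number of bits this bucket distinguishes
        self.keys = []

class ExtendibleHashFile:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return key & ((1 << self.global_depth) - 1)   # low-order global_depth bits

    def contains(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.depth == self.global_depth:
            # Directory doubling: both halves initially point to the same buckets
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the full bucket on one additional bit
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        high_bit = 1 << (bucket.depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        pending, bucket.keys = bucket.keys + [key], []
        for k in pending:     # redistribute the old keys plus the new one
            self.insert(k)
```

Only the directory (an array of pointers) doubles; the data buckets themselves are split one at a time, and only the overflowing one.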
Q: What is a Hash Collision, and what are the three primary ways to resolve it?
A: A collision occurs when the hash function mathematically assigns two distinct records to the exact same disk block (bucket) address. It is resolved by:
1. Open Addressing: Sequentially probing the disk for the next available empty slot.
2. Chaining: Creating a linked list pointing to separate overflow blocks.
3. Multiple Hashing: Applying a secondary, completely different hash function to find a new address.
🏆 10-Mark Scenario Questions
These complex scenarios require synthesizing multiple concepts. They mirror exactly what you will face in a high-weight university final exam.
You are the DBA for a major hospital. The system requires storing millions of patient text records and massive MRI image files. The system must operate 24/7, cannot lose data, and must read data quickly. Hardware cost is a secondary concern. Design the physical storage layer specifying: (a) RAID Level, (b) Data Types, and (c) Storage Architecture. Justify each.
Elite Answer Formulation:
- (a) RAID Level: I will implement RAID Level 1 or RAID Level 6. Since the hospital cannot afford data loss and cost is secondary, RAID 1 (Mirroring) provides the easiest and fastest rebuilding of data. Alternatively, RAID 6 provides dual P+Q parity, ensuring the system survives up to two simultaneous disk failures.
- (b) Data Types: The standard patient text records will use standard Numeric, String, and Date types. However, the MRI images will explicitly be stored using BLOBs (Binary Large Objects), as they are massive unstructured data objects.
- (c) Storage Architecture: I will use a combination of SAN (Storage Area Network) for high-speed, block-level transaction processing of patient vitals, coupled with Object-based Storage. Object-based storage is perfectly suited for highly scalable, unstructured data like MRI BLOBs, as it manages them via metadata and global identifiers rather than rigid file blocks.
An E-commerce platform has a massive 'Products' table. 95% of queries are exact-match searches based on the ProductID (e.g., SELECT * WHERE ProductID = 104). The product list grows by 10,000 items daily.
Should the database store this table as a Heap file, a Sorted File, or using Hashing? Which specific hashing technique, if any? Justify your choices.
Elite Answer Formulation:
- Rejection of Heap & Sorted: A Heap file is rejected because an exact-match search needs a linear scan of b/2 blocks on average, which will be disastrously slow for an e-commerce site. A Sorted file provides fast binary search (about log₂ b block accesses), but inserting 10,000 new items daily would require constant, highly expensive block rewriting to maintain physical sort order.
- Optimal Choice: Hashing. Hashing provides ultra-fast access time (usually 1 disk access) when the search condition is an equality condition on a key field, which matches the exact-match ProductID query requirement.
- Specific Technique: Because the file expands rapidly (10,000 items daily), Static Hashing would degrade due to massive collisions. I will implement Extendible Hashing or Linear Hashing. Linear Hashing is highly recommended here, as it allows the hash file to expand smoothly, bucket by bucket, without the overhead of maintaining any directory.
A financial database is experiencing severe latency. The CPU is frequently idling, waiting for disk reads. Upon inspection, the system is using single buffering, a FIFO replacement strategy, and contiguous block allocation. Propose four specific, distinct architectural changes to optimize this I/O bottleneck.
Elite Answer Formulation:
- Upgrade to Double Buffering: Switch from single to double buffering so the disk controller can pre-load Buffer B while the CPU actively processes Buffer A, enabling parallel execution and eliminating CPU idle time.
- Change Buffer Replacement to LRU or Clock: FIFO is inefficient for databases because it evicts blocks based strictly on age, potentially evicting highly used index blocks. LRU (Least Recently Used) will keep frequently accessed financial data resident in RAM, and Clock approximates the same behavior with lower overhead.
- Implement Read-Ahead (Pre-fetching): Configure the hardware controller to read data ahead of explicit requests, anticipating sequential financial reporting queries.
- Migrate to Indexed Block Allocation: Contiguous allocation makes the file hard to expand and fragments free space over time as financial records are inserted/deleted. Indexed allocation uses a dedicated index block to point to scattered data blocks, so the file can grow without being relocated while any block stays reachable in two accesses (index block, then data block).
A social media app stores "Posts". A post can be a short 10-byte text, or a massive 15-Megabyte article with inline media. Currently, the system stores all posts on expensive SSDs using "Unspanned" record blocking. The CFO complains about storage costs, and the DBA complains about wasted disk space. How do you resolve both issues?
Elite Answer Formulation:
- Resolving Wasted Disk Space (Blocking Strategy): The system must immediately switch from Unspanned to Spanned Records. Post sizes are highly variable (10 bytes to 15 MB), and Unspanned blocking forbids a record from crossing a block boundary. Whenever a post does not fit in the remaining space of a block, that space is left unused (internal fragmentation), and a 15 MB post, which is far larger than a typical disk block, cannot be stored unspanned at all. Spanned records allow a post to cross block boundaries using pointers, so blocks can be filled almost completely.
- Resolving Storage Costs (Architecture Strategy): I will implement Automated Storage Tiering. It is a waste of money to store 5-year-old social media posts on expensive Flash/SSDs. Automated tiering will keep "Hot" (recent, frequently accessed) posts on the SSDs, and seamlessly migrate "Cold" (old, rarely viewed) data down to cheaper, slower HDDs or Tape archives without manual intervention.