CMU-Database-Notes

Topics

CHAPTER 12

  1. Physical Storage Systems
  2. Storage Interfaces
  3. Magnetic Disks
  4. Flash Memory
  5. Storage Types in DBMS
  6. System Design Goals
  7. Disk-Oriented DBMS Overview
  8. DBMS vs. OS

CHAPTER 13

  1. Database Storage Architecture
  2. File Organization
  3. Organization of Records in Files
  4. Database Pages
  5. Heap File
  6. Page Layout
  7. Tuple Layout

1. Physical Storage Systems

Several types of data storage exist in most computer systems, classified by:

Below are common storage media:

1. Cache

2. Main Memory

3. Flash Memory

4. Magnetic-Disk Storage (HDD)

5. Optical Storage (DVD)

6. Tape Storage

Hierarchy of Storage Media

The storage media are organized into a hierarchy based on:

Levels of the Hierarchy:

  1. Primary Storage: Fast and expensive (e.g., cache, main memory).
  2. Secondary Storage: Online storage (e.g., flash memory, magnetic disks).
  3. Tertiary Storage: Offline storage (e.g., tapes, optical-disk jukeboxes).

2. Storage Interfaces


3. Magnetic Disks

Overview

Magnetic disks are the primary form of secondary storage for modern computer systems. While capacities have increased steadily, large-scale applications still require massive amounts of storage. SSDs, with better performance, are increasingly competitive but remain more expensive than magnetic disks, which are preferred for storing large volumes of infrequently accessed data.

Physical Characteristics of Magnetic Disks

Performance Measures of Disks

Key Metrics


4. Flash Memory

Types of Flash Memory

There are two types of flash memory:

NAND Flash Characteristics

Solid-State Drives (SSDs)

Flash Memory Write Process

Limitations

Storage Class Memory

SSD Performance Metrics

  1. Random Block Reads: For 4 KB blocks, SSDs can handle around 10,000 IOPS (random reads per second), while NVMe PCIe SSDs can support over 350,000 random reads per second.
  2. Sequential Data Transfer Rate: For SATA SSDs, the rate is 400 to 500 MB/s, and for NVMe SSDs, it ranges from 2 to 3 GB/s.
  3. Random Block Writes: SSDs can support about 40,000 IOPS (without parallelism), and 100,000 IOPS with parallelism (QD-32).

Hybrid Drives


5. Storage Types in DBMS

Note => The DBMS assumes that the primary storage location of the database is on non-volatile disk.

The DBMS’s components manage the movement of data between non-volatile and volatile storage.

storage

Volatile Devices:


6. System Design Goals

design goals


7. Disk-Oriented DBMS Overview

Overview

In a disk-oriented Database Management System (DBMS), all data is stored on disk and organized into pages. The first page is designated as the directory page, which helps manage the data layout.

Disk-Oriented

Key Components

  1. Buffer Pool:
    • The buffer pool is responsible for managing the movement of data between disk and memory.
    • When the execution engine needs to operate on data, it requests specific pages from the buffer pool.
  2. Execution Engine:
    • The execution engine processes queries and relies on the buffer pool to provide access to the required pages.
    • It receives a pointer to the page in memory, allowing it to perform operations on that data.
  3. Data Management:
    • The buffer pool manager ensures that the requested page remains available in memory while the execution engine operates on it.

8. DBMS vs. OS

Overview

The design goal of a Database Management System (DBMS) is to efficiently support databases that exceed the available memory. Effective disk usage management is crucial since reading and writing to disk is expensive, and we want to minimize delays caused by fetching data from disk.

Key Points

  1. High-Level Design Goal:
    • The DBMS should process other queries while waiting for data from the disk, similar to the concept of virtual memory where the OS manages page retrieval from disk.
  2. Control and Performance:
    • The DBMS typically needs to maintain control over memory management to optimize data access and query processing.
    • The operating system may not always provide the best performance for DBMS operations.

1. Database Storage Architecture

1. Storing Records in Files

2. File Organization

3. Data Dictionary

4. Database Buffer

5. Column-Oriented Storage

6. Main-Memory Databases

7. Storage Class Memory


2. File Organization

In its most basic form, a DBMS stores a database as files on disk. Some may use a file hierarchy, while others may use a single file (e.g., SQLite). The operating system does not know anything about the contents of these files. Only the DBMS understands how to decipher their contents, as it is encoded in a way specific to the DBMS.

The DBMS’s storage manager is responsible for managing a database’s files. It represents the files as a collection of pages and keeps track of what data has been read and written to these pages, as well as how much free space exists in them.

Storagemanger

Key Concepts

1. File Structure

2. Fixed-Length Records

Handling Records

File Header

Insertion and Deletion

3. Variable-Length Records

Variable-length records are essential in database systems due to several factors, including the presence of variable-length fields (such as strings), record types with repeating fields (like arrays or multisets), and the inclusion of multiple record types within a file. This document outlines the key aspects of variable-length record representation, slotted-page file organization, and methods for storing large objects.

Key Problems

Any technique for implementing variable-length records must address two fundamental problems:

  1. Representation of a Single Record: Records consist of:
    • An initial part containing fixed-length information, uniform across all records of the same relation.
    • A variable-length part where attributes are stored.
  2. Storage within a Block: Efficiently storing variable-length records in a block requires a method that allows easy extraction of records.

Record Structure

4. Slotted-Page Structure

The slotted-page structure is a commonly used method for organizing records within a block:

Block Header

Each block contains a header with:

Records

Indirection

Pointers reference entries in the header instead of directly pointing to records. This allows records to be moved within the block, preventing fragmentation and simplifying space management.

5. Storing Large Objects

Databases often handle data that can exceed the size of a disk block, such as images, audio recordings, or videos.

Size Limitation

Typically, records must not exceed the size of a block. To manage large objects:

Performance Issues

Storing large objects within a database can affect:

File System Integration

Some databases support integration with the file system, ensuring:


3. Organization of Records in Files

In database systems, organizing records in files is crucial for efficient data access and management. This document outlines several methods for organizing records in files.

1. Heap File Organization

2. Sequential File Organization

3. Multitable Clustering File Organization

4. B+-Tree File Organization

5. Hashing File Organization

4. Database Pages

The DBMS organizes the database into fixed-size blocks of data known as pages. These pages can contain various types of data (e.g., tuples, indexes), and most systems do not mix these types within the same page. Some systems require that pages be self-contained, meaning all necessary information to read a page is contained within it.

dbpage

Page Identification

Each page has a unique identifier (page ID). In a single-file database, the page ID can be the file offset. The page ID can be unique to the DBMS instance, database, or table. Most DBMSs utilize an indirection layer to map a page ID to a file path and offset, allowing the storage manager to convert a requested page number into a specific file and offset.

Fixed-Size Pages

Most DBMSs use fixed-size pages to simplify management and avoid the complexities associated with variable-sized pages. For instance, variable-size pages can create holes in files that are difficult to fill when pages are deleted.

Types of Pages

There are three types of pages in a DBMS:

  1. Hardware Page: Typically 4 KB.
  2. OS Page: Usually 4 KB.
  3. Database Page: Ranges from 1 to 16 KB.

5. Heap File

Heap file organization is a method for locating pages on disk in a DBMS. A heap file is an unordered collection of pages where tuples are stored in a random order. There are two primary methods to find the location of a page given its page ID:

heapfile

1. Linked List

2. Page Directory

6. Page Layout

Every database page includes a header that records important metadata about its contents:

Data Layout Approaches

A basic approach to laying out data is to maintain a count of the tuples stored in a page and append new tuples at the end. However, issues can arise when tuples are deleted or when they have variable-length attributes.

There are two main approaches to organizing data within pages:

1. Slotted Pages

1. Slotted Pages

slottedpage

2. Log-Structured

7. Tuple Layout

A tuple is essentially a sequence of bytes that the DBMS interprets into attribute types and values. The layout consists of a header, data, and a unique identifier.

Tuple Header

The tuple header contains important metadata, including:

Note: The DBMS does not need to store metadata about the schema of the database in the tuple header.

Tuple Data

Unique Identifier

Denormalized Tuple Data