Chapter 16 - Disk Storage, Basic File Structures, Hashing, and Modern Storage Architectures | Data

Disk Storage Devices

Preferred secondary storage device for high storage capacity and low cost.
Data stored as magnetized areas on magnetic disk surfaces.

disk pack

contains several magnetic disks connected to a rotating spindle.

track

divided into smaller blocks or sectors

physical disk block (hardware) address

consists of:
a cylinder number (imaginary collection of tracks of same radius from all recorded surfaces)
the track number or surface number (within the cylinder) and block number (within track).

Double buffering

can be used to speed up the transfer of contiguous disk blocks.

Records

Fixed and variable length records
Records contain fields which have values of a particular type
E.g., amount, date, time, age
Fields themselves may be fixed length or variable length
Variable length fields can be mixed into one record

Blocking

Refers to storing a number of records in one block on the disk.

Blocking factor

refers to the number of records per block.
the (average) number of file records stored in a disk block.

Spanned Records

Refers to records that exceed the size of one or more blocks and hence span a number of blocks.

file

sequence of records, where each record is a collection of data values (or data items).
can have fixed-length records or variable-length records.
can be unspanned or spanned

file descriptor (or file header)

includes information that describes the file, such as the field names and their data types, and the addresses of the file blocks on dis

Unspanned

no record can span two blocks

Spanned

a record can be stored in more than one block

physical disk blocks

can be contiguous, linked, or indexed.

Unordered Files

Also called a heap or a pile file.
New records are inserted at the end of the file.
A linear search through the file records is necessary to search for a record.
This requires reading and searching half the file blocks on the average, and is hence quite e

Ordered Files

Also called a sequential file.
File records are kept sorted by the values of an ordering field.
Insertion is expensive: records must be inserted in the correct order.
It is common to keep a separate unordered overflow (or transaction) file for new records

External Hashing

Hashing for disk files

Collisions

occur when a new record hashes to a bucket that is already full.

Open addressing

From the hash address, check records in order until an unused (empty) position is found.. Coalesced start at end of file and work back to empty position

Chaining

The new record is placed in an unused overflow location and the pointer of the occupied hash address location is set to the address of that overflow location.

Multiple hashing

The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.

RAID

Redundant Arrays of Inexpensive Disks.
A large array of small independent disks acting as a single higher-performance logical disk.

main goal of RAID

to even out the widely different rates of performance improvement of disks against those in memory and microprocessors.

Raid level 0

has no redundant data and hence has the best write performance at the risk of data los

Raid level 1

uses mirrored disks.

Raid level 2

uses memory-style redundancy by using Hamming codes, which contain parity bits for distinct overlapping subsets of components. Level 2 includes both error detection and correction.

Raid level 3

uses a single parity disk relying

Raid Levels 4 and 5

use block-level data striping, with level 5 distributing data and parity information across all disks.

Raid level 6

applies the so-called P + Q redundancy scheme using Reed-Soloman codes to protect against up to two disk failures by using just two redundant disks.