RAID for Web Hosting


RAID is a concept introduced by Patterson, Gibson and Katz of the University of California, Berkeley in 1987 in the paper “A Case for Redundant Arrays of Inexpensive Disks (RAID)”. The basic idea was to combine multiple small, inexpensive disk drives into an array that appears to the computer as a single logical storage unit or drive, and that offers performance exceeding that of a Single Large Expensive Disk (SLED).
The acronym is now usually read as Redundant Arrays of Independent Disks, and RAID offers three key advantages: redundancy, increased performance and lower cost.
Redundancy is achieved by storing data on multiple hard disks, and increased performance by allowing input/output operations to overlap in a balanced way. Because a redundant array can keep running after a single drive fails, the mean time between failures (MTBF) of the array as a whole goes up, so fault tolerance is increased.
The Berkeley paper described five types of array architecture (RAID-1 through RAID-5), each providing disk fault tolerance and offering different trade-offs in features and performance. The list has since been expanded to nine, to include RAID-6, RAID-7, RAID-10 and RAID-53. This article discusses only the most common ones, including the non-redundant RAID-0.

  • RAID-0 – its main purpose is to improve speed, and it requires two or more physical drives. It does this through ‘striping’: an algorithm breaks each file into smaller fragments, called stripes, whose size is defined by the user. Each drive then receives one or more of these stripes to complete the write, which decreases the time required to write the file. The same is true in reverse for reading, since the drives read at the same time. RAID-0 stores no redundant data, so the failure of any one drive loses the data in the array.
  • RAID-1 – its main purpose is data protection, and it also requires at least two drives. Here mirroring is used, meaning that data is duplicated and written to two drives in the array. Fault tolerance is its special feature: if either drive fails, no data is lost. It offers little in terms of performance, though. When reading, the controller fetches data from whichever drive is less busy, but when writing there is overhead, as the controller must duplicate the data it is sent before passing it along to the drives.
  • RAID-5 – requires at least three, and usually five, disks for the array, and is best for multi-user systems in which performance is not critical or which do few write operations. Here you get the speed of striping together with redundancy comparable to mirroring: in a three-disk array, for each row of stripes two of the disks receive data stripes and the third receives a parity block for redundancy. The assignment of data and parity among the disks rotates from row to row, which avoids the random-write bottleneck of a single dedicated drive receiving all the parity information. RAID-5 is often implemented on hardware RAID controllers, so called because they carry a special chip to generate the parity; either way, there is overhead due to the parity calculation and writing.
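
The striping and parity ideas in the list above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any real controller or driver is implemented: the function names (`stripe`, `xor_blocks`, `parity_drive_for_row`), the 4-byte stripe size and the particular rotation order are all made up for the example, and real arrays work on disk blocks rather than Python byte strings.

```python
from functools import reduce


def stripe(data: bytes, num_drives: int, stripe_size: int) -> list[list[bytes]]:
    """RAID-0 idea: deal fixed-size stripes round-robin across the drives."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), stripe_size):
        target = (offset // stripe_size) % num_drives
        drives[target].append(data[offset:offset + stripe_size])
    return drives


def xor_blocks(blocks: list[bytes]) -> bytes:
    """RAID-5 idea: XOR equal-length blocks byte by byte to get a parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_drive_for_row(row: int, num_drives: int) -> int:
    """One possible rotation (an assumption): parity moves back one drive per row."""
    return (num_drives - 1 - row) % num_drives


# RAID-0: 16 bytes spread over two drives in 4-byte stripes.
drives = stripe(b"ABCDEFGHIJKLMNOP", num_drives=2, stripe_size=4)

# RAID-5: two data stripes plus one parity block in a three-disk row.
d0, d1 = b"ABCD", b"EFGH"
parity = xor_blocks([d0, d1])

# If the drive holding d1 fails, XOR of the survivors rebuilds it.
rebuilt = xor_blocks([d0, parity])
assert rebuilt == d1
```

In this sketch `drives[0]` ends up holding the first and third stripes and `drives[1]` the second and fourth, which is why both drives can work on the same file at once. The rebuild step works because XOR-ing a block with itself cancels out, leaving only the missing block.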

There are two possible approaches to RAID:

  • Hardware RAID – the RAID subsystem is managed independently from the host and presents only a single disk per RAID array to the host. Such systems are highly fault tolerant and come in two types: controller-based RAID and external SCSI RAID.
  • Software RAID – occupies host system memory, consumes CPU cycles, is OS dependent, and has performance that depends directly on CPU performance and load. Examples are the MD driver in the Linux kernel; Solstice DiskSuite and Veritas Volume Manager for Solaris; and Adaptec's AAA-RAID controllers.

* parity – (from the Latin paritas: equal or equivalent) refers to a technique for checking whether data has been lost or overwritten when it is moved from one place in storage to another or transmitted between computers.
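
As a rough illustration of the parity check described in this note, the sketch below computes a single even-parity bit over some data and uses it to detect a flipped bit. The helper names `parity_bit` and `is_intact` are hypothetical, chosen only for this example.

```python
def parity_bit(data: bytes) -> int:
    """Return 0 if the data contains an even number of 1 bits, otherwise 1."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def is_intact(data: bytes, stored_parity: int) -> bool:
    """Recompute the parity and compare it with the value stored alongside the data."""
    return parity_bit(data) == stored_parity


original = b"RAID"
stored = parity_bit(original)

# Simulate damage in transit by flipping one bit of the first byte.
corrupted = bytes([original[0] ^ 0b00000100]) + original[1:]

assert is_intact(original, stored)
assert not is_intact(corrupted, stored)
```

A single parity bit only notices an odd number of flipped bits; two flips cancel out and go undetected. It also cannot say which bit changed, which is why RAID-5 keeps a whole parity block per row of stripes rather than one bit.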