We’ve been using this simple diagram to illustrate the flow of information between an application and a persistent storage device. If the application is storing information on one disk drive, it often sends multiple requests to that disk drive. The diagram below shows an application sending four pieces of information (numbered “1” through “4”) down to a disk drive.
This diagram depicts a very common scenario on today’s desktop and laptop computers. A laptop, for example, comes equipped with an internal hard drive. When a user saves a file, the laptop’s CPU sends requests to the disk drive to store the information. The CPU will often have multiple requests outstanding at once, and the disk drive will typically queue these requests and handle them one at a time.
While this approach typically works well for laptops and desktops, large businesses are deploying applications that are generating huge amounts of data to store on disk. The CPU speeds in their enterprise-class computers will easily saturate a disk drive. Keep in mind that traditional (spinning) disk drives are mechanical devices; they spin and they seek. Spinning disk technology cannot keep pace with the speed of a CPU (never mind multiple CPUs)!
For this reason a disk array “virtualizes” the disk drive. The CPU believes it is sending requests to one very large disk. The software inside the disk array, however, aggregates a set of disk drives and spreads the data across them. This “data striping” technique increases the odds that some disk in the set will fail, simply because so many more disks are in use. To compensate, the disk array generates “parity” information that can be used to reconstruct data if a disk fails.
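To make striping and parity concrete, here is a minimal sketch in Python. It assumes byte-wise XOR parity, which is one common way to compute parity; the function names (xor_blocks, stripe_with_parity, rebuild_block) are made up for illustration and do not come from any disk array product. It spreads the four pieces of information across four data disks, adds one parity block, and then rebuilds the block that sat on a failed disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, num_data_disks, block_size):
    """Split data into one block per data disk and append an XOR parity block.

    Hypothetical helper: handles a single stripe only and zero-pads the data.
    """
    stripe_bytes = num_data_disks * block_size
    if len(data) > stripe_bytes:
        raise ValueError("data does not fit in one stripe")
    padded = data.ljust(stripe_bytes, b"\x00")
    blocks = [
        padded[i * block_size:(i + 1) * block_size]
        for i in range(num_data_disks)
    ]
    return blocks + [xor_blocks(blocks)]


def rebuild_block(stripe, failed_index):
    """Reconstruct the block on a failed disk by XOR-ing every surviving block."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Pieces "1" through "4" land on four data disks; a fifth disk holds parity.
stripe = stripe_with_parity(b"1234", num_data_disks=4, block_size=1)
recovered = rebuild_block(stripe, failed_index=2)  # pretend the third disk failed
```

The rebuild works because XOR-ing a value with itself cancels it out, so XOR-ing the parity block with the surviving data blocks leaves only the missing block. A single XOR parity block can recover from one failed disk per stripe, not two.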
Combining striping with this kind of redundancy is known as RAID (Redundant Array of Inexpensive/Independent Disks). It is often used by applications that perform a high number of read requests (for example, banking applications that continually analyze spending patterns). Some disk arrays set aside one disk’s worth of capacity for parity (known as RAID level 5), and some set aside two disks’ worth (known as RAID level 6).
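The cost of that parity is capacity. As a rough sketch, assuming every disk in the group is the same size and ignoring spares and metadata, the usable space is the total minus one disk (RAID 5) or two disks (RAID 6). The function name usable_capacity_tb and the eight-disk, 4 TB example are illustrative choices, not figures from any particular array.

```python
def usable_capacity_tb(num_disks, disk_size_tb, parity_disks):
    """Return usable capacity when parity consumes `parity_disks` disks' worth of space.

    parity_disks=1 models RAID 5 and parity_disks=2 models RAID 6.
    Assumes equal-size disks and ignores hot spares and metadata overhead.
    """
    if parity_disks not in (1, 2):
        raise ValueError("this sketch only models RAID 5 (1) and RAID 6 (2)")
    minimum_disks = parity_disks + 2  # RAID 5 needs at least 3 disks, RAID 6 at least 4
    if num_disks < minimum_disks:
        raise ValueError(f"need at least {minimum_disks} disks")
    return (num_disks - parity_disks) * disk_size_tb


raid5_tb = usable_capacity_tb(num_disks=8, disk_size_tb=4, parity_disks=1)
raid6_tb = usable_capacity_tb(num_disks=8, disk_size_tb=4, parity_disks=2)
```

With eight 4 TB disks, the second level of parity costs one more disk of capacity in exchange for surviving a second disk failure.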
Disk array manufacturers use many different software techniques to lay out data. Some manufacturers distribute data and parity across every disk in the system (this is known as wide striping). Others focus on a specific set of disks (as shown in the diagram above).
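One way to picture a layout is as a function from a logical block number to a physical disk. The sketch below assumes a simple single-parity scheme in which the parity block moves to a different disk on each stripe, so that parity is spread across every disk in the group. The rotation rule and the name locate_block are assumptions chosen for illustration; real arrays use their own, often proprietary, layouts.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (stripe, data_disk, parity_disk).

    Assumed layout: each stripe holds num_disks - 1 data blocks plus one
    parity block, and the parity block rotates one disk to the left on
    every new stripe.
    """
    if num_disks < 3:
        raise ValueError("need at least 3 disks for data plus rotating parity")
    data_per_stripe = num_disks - 1
    stripe, offset = divmod(logical_block, data_per_stripe)
    parity_disk = (num_disks - 1) - (stripe % num_disks)
    # Data blocks fill the remaining disks in ascending order.
    data_disks = [disk for disk in range(num_disks) if disk != parity_disk]
    return stripe, data_disks[offset], parity_disk


# Show where the first eight logical blocks land on a four-disk group.
for block in range(8):
    stripe, data_disk, parity_disk = locate_block(block, num_disks=4)
    print(f"block {block}: stripe {stripe}, data on disk {data_disk}, parity on disk {parity_disk}")
```

Running the same function with a larger num_disks shows the effect of widening the stripe: each stripe touches more disks, so more drives can work on a large request at once.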
Read up on all of these technologies. Understand how the needs of the application will often dictate the type of technique used within a disk array.
For your reference I’ve linked to some good starting points on Wikipedia for RAID and data striping.