
Tech Careers: Know the Layout

Published: Feb 02, 2010

When interviewing for a high-tech job, it helps to understand a common landing point for high-tech information: the disk array. In my last post I described a variety of alternatives and permutations for storing information within a disk array. In this post I will describe information layout techniques that are common in the industry. A working knowledge of these techniques (and why they are used) will make a strong impression during an interview.

We’ve been using this simple diagram to illustrate the flow of information between an application and a persistent storage device. If the application is storing information on one disk drive, it often sends multiple requests to that disk drive. The diagram below shows an application sending four pieces of information (numbered “1” through “4”) down to a disk drive.

[Diagram: an application stores data to a disk drive]

This diagram depicts a very common scenario on today’s desktop and laptop computers. A laptop, for example, comes equipped with an internal hard drive. When a user saves a file, the laptop’s CPU sends requests to store the information on the disk drive. The CPU will often have multiple requests outstanding, and the disk drive will often queue these requests and handle them one at a time.

While this approach typically works well for laptops and desktops, large businesses are deploying applications that generate huge amounts of data to store on disk. The CPUs in their enterprise-class computers can easily saturate a disk drive. Keep in mind that traditional (spinning) disk drives are mechanical devices; they spin and they seek. Spinning disk technology cannot keep pace with the speed of a CPU (never mind multiple CPUs)!
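To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. The 7,200 RPM spindle speed and the 8.5 ms average seek time are assumptions I picked for illustration, not measurements of any particular drive, and the function names are my own. Transfer time is ignored to keep the arithmetic simple.

```python
def avg_rotational_latency_ms(rpm):
    """On average the platter turns half a revolution before the data arrives."""
    ms_per_revolution = 60_000 / rpm
    return 0.5 * ms_per_revolution


def estimated_iops(rpm, avg_seek_ms):
    """Random requests per second when each one pays a seek plus rotational latency."""
    service_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000 / service_ms


def drain_time_ms(queued_requests, rpm, avg_seek_ms):
    """Time to empty the drive's queue when requests are handled one at a time."""
    return queued_requests * (avg_seek_ms + avg_rotational_latency_ms(rpm))


# Assumed drive: 7,200 RPM with an 8.5 ms average seek.
print(f"rotational latency: {avg_rotational_latency_ms(7200):.2f} ms")
print(f"random requests per second: {estimated_iops(7200, 8.5):.0f}")
print(f"time to drain 1,000 queued requests: {drain_time_ms(1000, 7200, 8.5) / 1000:.1f} s")
```

Under these assumptions a single spinning disk completes fewer than a hundred random requests per second, while a CPU can issue requests far faster than that.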

For this reason a disk array will choose to “virtualize” a disk drive. The CPU believes it is sending requests to one very large disk. The software inside the disk array, however, aggregates a set of disk drives and spreads the data across them, so that several disks can service requests in parallel. This “data striping” technique increases the odds of losing data to a disk failure, because so many more disks are in use and any one of them can fail. For this reason the disk array generates “parity” information that can be used to reconstruct data when a disk fails.
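Striping and parity are easier to explain in an interview if you have seen them in a few lines of code. The sketch below is a minimal illustration in Python, assuming XOR parity (the usual scheme for single-parity RAID) and a simple round-robin mapping. The function names and the four-byte blocks are made up for the example and do not describe any particular array's software.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def locate(logical_block, data_disks):
    """Round-robin striping: map a logical block to (disk index, stripe number)."""
    return logical_block % data_disks, logical_block // data_disks


def write_stripe(data_blocks):
    """Lay out one stripe: the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recompute the block on a failed disk by XOR-ing every surviving block."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Four pieces of information, one per data disk, plus parity on a fifth disk.
stripe = write_stripe([b"1111", b"2222", b"3333", b"4444"])

# Pretend the third disk (index 2) failed and rebuild its contents.
print(rebuild(stripe, 2))   # b'3333'
print(locate(5, 4))         # (1, 1): logical block 5 lives on disk 1, stripe 1
```

The rebuild works because XOR-ing a value with itself cancels out: combining the parity block with the surviving data blocks leaves only the missing block.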

[Diagram: RAID technology with parity]

This combination of striping and parity is a form of RAID (Redundant Array of Inexpensive/Independent Disks). It is often used by applications that perform a high number of read requests (for example, banking applications that continually analyze spending patterns). Some disk arrays keep one disk’s worth of parity in each stripe (RAID level 5, which survives one disk failure), and some keep two (RAID level 6, which survives two). In standard RAID 5 and RAID 6 the parity blocks are distributed across the member disks rather than kept on a dedicated disk.
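An interviewer may well ask what the extra protection of RAID 6 costs. The arithmetic is simple enough to sketch in Python. The eight-disk group and the 2 TB disk size are assumptions chosen for the example, and the helper names are my own.

```python
# Parity overhead per stripe, measured in disks' worth of capacity.
PARITY_PER_STRIPE = {5: 1, 6: 2}


def usable_capacity_tb(raid_level, disks, disk_tb):
    """Capacity left for data once parity overhead is subtracted."""
    parity = PARITY_PER_STRIPE[raid_level]
    if disks <= parity:
        raise ValueError("need more disks than parity blocks per stripe")
    return (disks - parity) * disk_tb


def failures_tolerated(raid_level):
    """Number of simultaneous disk failures the group can survive."""
    return PARITY_PER_STRIPE[raid_level]


# Assumed configuration: 8 disks of 2 TB each.
for level in (5, 6):
    print(f"RAID {level}: {usable_capacity_tb(level, 8, 2)} TB usable, "
          f"survives {failures_tolerated(level)} disk failure(s)")
```

With these assumed numbers RAID 5 leaves 14 TB for data and RAID 6 leaves 12 TB, so the second parity block costs one more disk's worth of capacity in exchange for surviving a second failure.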

Disk array manufacturers use many different software techniques to lay out data. Some manufacturers distribute data and parity across every disk in the system (this is known as wide striping). Others focus on a specific set of disks (as shown in the diagram above).
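One simple way to spread parity over every disk in a group is to rotate its position from stripe to stripe. The Python sketch below shows that idea only. The rotation rule is an illustrative assumption, not any manufacturer's actual layout, and real arrays use more elaborate placement schemes.

```python
def parity_disk(stripe, disks):
    """Disk holding parity for this stripe, shifting by one disk per stripe."""
    return (disks - 1 - stripe) % disks


def layout(stripes, disks):
    """Return one row per stripe, marking each disk as data (D) or parity (P)."""
    rows = []
    for stripe in range(stripes):
        p = parity_disk(stripe, disks)
        rows.append(["P" if disk == p else "D" for disk in range(disks)])
    return rows


# Five stripes across five disks: every disk ends up holding parity exactly once.
for row in layout(5, 5):
    print(" ".join(row))
```

Rotating the parity this way means no single disk absorbs every parity update, which is part of the appeal of spreading data and parity widely.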

Read up on all of these technologies. Understand how the needs of the application will often dictate the type of technique used within a disk array.

For your reference I’ve linked to some good starting points on Wikipedia for RAID and data striping.

Steve
http://stevetodd.typepad.com
Twitter: @SteveTodd
EMC Intrapreneur
