Hard disk drive specifications guide: What to look for when buying disk storage

Seek times, latency, data transfer rates, error correction codes and cache are important hard disk drive specifications. Learn the key features of hard disk drives so you can specify the right one for your data centre.

When buying a hard disk drive, UK storage pros will encounter many different product specifications. This article explains the key features of hard disk drives -- latency, typical seek times, rotational latency/rotational speed, data transfer rates, error correction codes and buffer/cache size -- so you can specify the right one for your data centre.

Seeks and latency

How quickly the disk can find and read a sector is determined in part by access time. Reading a particular sector consists of two steps. First, the head must be moved to the correct track. Then, once the head is over that track, the drive must wait for the sector to spin under the head before it can read it. Seek time is the time required for the head to position itself over a track. The latency period is how long it takes the desired sector to move under the head.

Moving the head takes a lot longer than waiting for the sector to come around. So low seek times (the time to move the head) are critical to good disk performance.

Access time (time to find a sector) equals seek time (time to move to the sector's cylinder) plus rotational latency period (time to wait for the sector to rotate around and appear under the heads).
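The relationship above can be written as a one-line calculation. The sketch below is illustrative only: the function name and the example figures are assumptions, not values from any particular drive's data sheet.

```python
def access_time_ms(seek_ms: float, rotational_latency_ms: float) -> float:
    """Access time = seek time + rotational latency, all in milliseconds."""
    return seek_ms + rotational_latency_ms


# Illustrative figures only (assumed, not taken from a real specification).
example_seek_ms = 8.0
example_latency_ms = 4.0

total_ms = access_time_ms(example_seek_ms, example_latency_ms)
print(f"Access time: {total_ms:.1f} ms")
```

When comparing two drives, plugging each vendor's quoted average seek time and average latency into a calculation like this gives a rough like-for-like access figure.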

Typical seek times

Of the seek time and the latency period, the seek time is usually the longer wait. Seek time is usually expressed in milliseconds (ms). It varies according to how many tracks the heads must traverse. A seek from one track to the next track is usually quick -- just a few milliseconds -- but most seeks aren't so convenient.

Remember, the lower the seek time, the better. Note that in current computing environments, a millisecond is a long period, considering that the measure for modern system memory is nanoseconds. This means the system may have to wait for the hard disk.

A common measure of an average seek is the time the heads need to travel one-third of the way across the disk. Most benchmark programs use this measurement. You might wonder, "Why not halfway across the disk, rather than one-third?" The reason is that most accesses are short seeks of just a few tracks.
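A complementary way to look at the one-third figure is to ask how far the heads travel on average if the starting track and the target track are each chosen uniformly at random. That uniform model is an assumption made here for illustration, not something stated above, and real workloads are usually not uniform. The hypothetical function below computes the exact average over every pair of tracks.

```python
def mean_seek_fraction(num_tracks: int) -> float:
    """Mean |start - target| over all track pairs, as a fraction of a full stroke.

    Assumes start and target tracks are independent and uniformly distributed.
    """
    # A distance of d tracks occurs for 2 * (num_tracks - d) ordered pairs.
    total_distance = sum(2 * d * (num_tracks - d) for d in range(1, num_tracks))
    mean_distance = total_distance / (num_tracks * num_tracks)
    full_stroke = num_tracks - 1
    return mean_distance / full_stroke


# With many tracks the result settles very close to one-third.
print(f"{mean_seek_fraction(1000):.3f}")
```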

Early hard drives had seek times of almost 100ms. Today, the average seek time on a new drive is between 5ms and 10ms. In general, how low the seek time is depends on what you're willing to spend on a drive, as seek time is built into a drive's design. There's no way for you to improve a drive's seek time short of getting a new drive.

Rotational latency/rotational speed

Once a head positions itself over a track, the job's still not done. Now the head has to wait for the correct sector to rotate to a position beneath it. How long this takes is a matter of luck. If you're lucky, the sector is already there; if you're really unlucky, you just missed it and will have to wait an entire revolution for it to come round again. As mentioned above, this waiting time, whether large or small, is the rotational latency period.

A common number cited is average latency period. This makes the simple assumption that, on average, the disk must make a half-revolution to get to your sector. Manufacturers calculate the latency period from the spindle speed. Latency, like seek time, is normally expressed in milliseconds.

Rotational latency is directly affected by rotational speed. Depending on the model, disk drives rotate between 3,600 rpm and 15,000 rpm. For a disk rotating at 3,600 rpm, one-half revolution takes 1/7,200 of a minute or 8.33ms. This contributes to the amount of time the system must wait for service (the rotational latency).

The higher the spindle's speed (the rpm), the lower the average latency. Calculate the average latency based on a half rotation of the disk; calculate the worst-case latency on a full rotation of the disk.
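The half-rotation and full-rotation rules translate directly into arithmetic. The following is a minimal sketch; the function names are made up for this example, and the spindle speeds in the loop are simply common values within the 3,600 rpm to 15,000 rpm range mentioned above.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution."""
    return rotation_time_ms(rpm) / 2


def worst_case_latency_ms(rpm: float) -> float:
    """Worst-case rotational latency: one full revolution."""
    return rotation_time_ms(rpm)


for rpm in (3_600, 5_400, 7_200, 10_000, 15_000):
    print(
        f"{rpm:>6} rpm: average {average_latency_ms(rpm):.2f} ms, "
        f"worst case {worst_case_latency_ms(rpm):.2f} ms"
    )
```

For 3,600 rpm this reproduces the 8.33ms average quoted earlier.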

Data transfer rate

This is how fast a disk can transfer data once it has been found. Specifically, the transfer rate is a measure of the amount of data that the system can access over a period of time (typically one second). It's determined by the external data transfer rate and the internal transfer rate.

The external data transfer rate is the speed of communication between the system memory and the internal buffer or cache built into the drive. The internal data transfer rate is the speed at which the hard disk can physically write or read data to or from the surface of the platter and then transfer it to the internal drive cache or read buffer. Transfer rates vary depending on the density of the data on the disk, how fast the disk is spinning and the location of the data.
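To see how density, spindle speed and data location interact, a rough upper bound on the internal transfer rate is the amount of data on one track multiplied by the number of revolutions per second. The sketch below is a simplification under stated assumptions: it ignores head switches, track-to-track seeks and formatting overhead, and the sector counts are hypothetical rather than taken from a real drive.

```python
def internal_rate_mb_per_s(
    sectors_per_track: int, bytes_per_sector: int, rpm: float
) -> float:
    """Rough peak media rate: bytes on one track times revolutions per second."""
    bytes_per_revolution = sectors_per_track * bytes_per_sector
    revolutions_per_second = rpm / 60.0
    return bytes_per_revolution * revolutions_per_second / 1_000_000


# Hypothetical drive: outer tracks hold more sectors than inner tracks.
outer = internal_rate_mb_per_s(sectors_per_track=1000, bytes_per_sector=512, rpm=7_200)
inner = internal_rate_mb_per_s(sectors_per_track=500, bytes_per_sector=512, rpm=7_200)

print(f"Outer track: {outer:.2f} MB/s")
print(f"Inner track: {inner:.2f} MB/s")
```

In this model, doubling either the data per track or the spindle speed doubles the rate, which is why the same drive can report different figures depending on where the data sits.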

Error correction code (ECC)

No electronic data transmission or storage system is perfect. Each system makes errors at a certain rate. Modern disks have built-in error detection and error correction mechanisms.

Disk systems are great as storage media, but the data on them isn't permanent. From the first second after you lay a piece of data on a disk, it starts to 'evaporate.' The magnetic domains on the disk that define the data slowly randomise until the data is unrecognisable. The disk itself and the media may be fine, but the data image can fade after some years.

Disk subsystems are aware of this and include a method of detecting and correcting minor data loss; major data loss can be detected but not corrected. To make this possible, the controller writes extra data, known as the error correction code, alongside the information it stores on the disk. When the data is later read back, the controller checks this redundant information to verify data integrity and, where the damage is small enough, to repair it.
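The write-then-verify idea can be sketched in a few lines. Note the substitution: this example uses a CRC-32 check value from Python's standard `zlib` module, which can only detect damage, not repair it, so it is simpler than the correcting codes a real drive uses. The `write_sector` and `read_sector` helpers are hypothetical names for illustration.

```python
import zlib


def write_sector(payload: bytes) -> bytes:
    """Store the payload followed by a 4-byte CRC-32 as redundant check data."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def read_sector(stored: bytes) -> bytes:
    """Recompute the check value on read and compare it with the stored one."""
    payload, check = stored[:-4], stored[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != check:
        raise ValueError("sector failed integrity check")
    return payload


sector = write_sector(b"example sector data")
print(read_sector(sector))  # An undamaged sector reads back normally.

damaged = bytearray(sector)
damaged[0] ^= 0x01  # Simulate a single faded bit.
try:
    read_sector(bytes(damaged))
except ValueError as err:
    print("Detected:", err)
```

A correcting code stores enough redundancy to work out which bits changed as well as that something changed, at the cost of more stored bits and more computation.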

ECC calculations are more complex than a simple checksum. The ECC that most manufacturers implement in hard disks (and CD-ROMs) uses the Reed-Solomon algorithm. The calculations take time, so there's a tradeoff; more complex ECCs can recover more damaged data, but they take more computation time. The number of bits associated with a sector for ECC is a design decision, and it determines the robustness of the error detection and correction. Quite a number of modern disks use more than 200 bits of code for each sector.

Some controllers let you use an x-bit ECC, where x refers to the number of consecutive bad bits the ECC can correct. The original ATA hard disk controller, for instance, could correct up to five bad consecutive bits. That meant it had a "maximum correctable error burst length" of 5 bits. Newer controllers can usually correct up to 11 bits. Some of the newest drives installed in the latest machines are using special high-speed controller hardware to do 70-bit error correction.

Buffer/cache size

Disk drives are slow. Your computer uses RAM that responds to requests in tens of nanoseconds, but the disk drive responds to requests in tens of milliseconds. That's six orders of magnitude difference in speed.

Whenever you're moving data between a faster medium and a slower one, adding a cache to hold recently used or anticipated data can improve performance by reducing the amount of data that needs to travel through the bottleneck area. A hard disk's performance can similarly be improved by caching. Many manufacturers refer to the cache as a buffer in their drive specifications.

A disk cache seeks to use the speed of memory to bolster the effective speed of the disk. The cache is held in memory chips and is usually one to a few megabytes in size. The operating system can access data previously placed in the disk cache on an as-needed basis. Using this disk cache can cut down on the number of physical seeks and transfers from the hard disk itself.
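The effect on physical reads can be shown with a toy model. The sketch below assumes a least-recently-used (LRU) eviction policy, which is one common choice and not necessarily what any given drive's firmware does; the `CachedDisk` class and its block numbering are hypothetical.

```python
from collections import OrderedDict


class CachedDisk:
    """Toy disk with a small LRU cache in front of the (simulated) platters."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self.cache: "OrderedDict[int, bytes]" = OrderedDict()
        self.physical_reads = 0

    def _read_from_platter(self, block: int) -> bytes:
        # Stands in for a slow seek plus transfer from the disk surface.
        self.physical_reads += 1
        return f"data-{block}".encode()

    def read(self, block: int) -> bytes:
        if block in self.cache:
            # Cache hit: mark the block as most recently used.
            self.cache.move_to_end(block)
            return self.cache[block]
        # Cache miss: go to the platter, then keep a copy.
        data = self._read_from_platter(block)
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # Evict the least recently used block.
        return data


disk = CachedDisk(capacity_blocks=4)
requests = [1, 2, 3, 1, 2, 3, 1, 2]
for block in requests:
    disk.read(block)

print(f"{len(requests)} requests, {disk.physical_reads} physical reads")
```

Here the repeated requests are served from memory, so only the first read of each block reaches the platters.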

Smart caching algorithms generally mean that there's no need to change the size of the disk cache. This cache buffer acts as a holding area for one or more tracks, or even a complete cylinder's worth of information in case you need it. This cache buffer can be effective in speeding up both throughput and access times.
