2016’s SSD (Solid State Drive) Reliability Report
We now live in a world where mechanical hard disk drives are slowly giving way to solid state drives (SSDs). Experts agree that SSDs are more likely to develop bad blocks than HDDs, but are less likely to fail completely. Previous assumptions that I/O causes wear-out and failure are confirmed, but not to the degree that was expected.
A recent USENIX paper confirmed that wear limits the lifespan of flash-based SSDs, because these drives rely on the endurance of their flash cells.
“As the endurance of flash cells is limited, raw bit error rates (RBER) are expected to grow with the number of program erase (PE) cycles, with rates that have previously been reported as exponential. The high correlation coefficients between RBER and PE.”
It’s safe to say SSDs wear out with use; however, the immediate performance boost may outweigh that concern until your next hardware refresh cycle. This depends entirely on how the machine is used.
“We observe that, as expected, RBER grows with the number of PE cycles, both in terms of median and 95th percentile RBER. However, the growth rate is slower than the commonly assumed exponential growth, and more closely resembles a linear increase. The second interesting observation is that the RBER rates under wear-out vary greatly across drive models, even for models that have very similar RBER rates for low PE cycles.”
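To make the difference between the two growth shapes concrete, here is a minimal Python sketch. The base rate, slope, and growth constant are hypothetical values chosen only for illustration; they are not figures from the paper, and the function names are our own.

```python
import math

# Hypothetical parameters, NOT taken from the paper: illustration only.
BASE_RBER = 1e-7   # assumed RBER of a fresh drive
SLOPE = 1e-10      # assumed RBER added per PE cycle (linear model)
RATE = 1e-3        # assumed growth constant per PE cycle (exponential model)


def rber_linear(pe_cycles, base=BASE_RBER, slope=SLOPE):
    """RBER that grows by a fixed amount per PE cycle."""
    return base + slope * pe_cycles


def rber_exponential(pe_cycles, base=BASE_RBER, rate=RATE):
    """RBER that grows by a fixed factor per PE cycle."""
    return base * math.exp(rate * pe_cycles)


if __name__ == "__main__":
    for pe in (0, 1000, 2000, 4000):
        print(f"PE={pe:>5}  linear={rber_linear(pe):.2e}  "
              f"exponential={rber_exponential(pe):.2e}")
```

Under the linear model, each additional 1,000 PE cycles adds the same amount of error rate; under the exponential model, it multiplies the rate by the same factor, so the two curves diverge quickly as the drive ages.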
As stated, quality varies across vendors and model line-ups. Choosing a reputable drive vendor is a wise decision, since RBER rates under wear-out ‘vary greatly across drive models.’
The most important question is how flash reliability compares to that of hard disk drives (HDDs), their main competitor. We find that when it comes to replacement rates, flash drives win:
- The annual replacement rates of hard disk drives have previously been reported to be 2-9%, which is high compared to the 4-10% of flash drives we see being replaced in a four-year period.
- However, flash drives are less attractive when it comes to their error rates. More than 20% of flash drives develop uncorrectable errors in a four-year period, 30-80% develop bad blocks and 2-7% of them develop bad chips.
- In comparison, previous work on HDDs reports that only 3.5% of disks in a large population developed bad sectors in a 32-month period – a low number when taking into account that the number of sectors on a hard disk is orders of magnitudes larger than the number of either blocks or chips on a solid state drive, and that sectors are smaller than blocks, so a failure is less severe.
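The HDD figure above is an annual rate, while the flash figure covers four years, so the two are not directly comparable as written. A rough way to put them on the same footing is to convert the four-year figure into an equivalent yearly rate. The sketch below assumes a constant, independent replacement probability each year, which is a simplification of ours and not something the paper states; `annualized_rate` is a hypothetical helper name.

```python
def annualized_rate(cumulative_rate, years):
    """Constant yearly rate that compounds to `cumulative_rate` over `years`.

    Assumes the same independent replacement probability every year
    (a simplifying assumption, not a claim from the paper).
    """
    survival = 1.0 - cumulative_rate          # fraction not replaced over the period
    return 1.0 - survival ** (1.0 / years)    # yearly rate giving the same survival


if __name__ == "__main__":
    for cumulative in (0.04, 0.10):
        yearly = annualized_rate(cumulative, 4)
        print(f"{cumulative:.0%} over 4 years is about {yearly:.2%} per year")
```

Under that assumption, 4-10% over four years works out to roughly 1% to 2.6% per year, compared with the 2-9% per year reported for hard disk drives.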
Overall, SSD flash drives experience significantly lower replacement rates (within their rated lifetime) than hard disk drives. The only catch is that they experience significantly higher rates of uncorrectable errors than hard disk drives.
Our engineer summarizes:
- While wear-out from usage is often the focus of attention, we note that independently of usage the age of a drive, i.e. the time spent in the field, affects reliability.
- SLC drives, which are targeted at the enterprise market and considered to be higher end, are not more reliable than the lower end MLC drives.
- While flash drives offer lower field replacement rates than hard disk drives, they have a significantly higher rate of problems that can impact the user, such as uncorrectable errors.
- Bad blocks and bad chips occur at a significant rate: depending on the model, 30-80% of drives develop at least one bad block and 2-7% develop at least one bad chip during the first four years in the field.
- Drives tend to either have less than a handful of bad blocks, or a large number of them, suggesting that impending chip failure could be predicted based on prior number of bad blocks (and maybe other factors). Also, a drive with a large number of factory bad blocks has a higher chance of developing more bad blocks in the field, as well as certain types of errors.
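The last point suggests a simple monitoring rule: since drives tend to have either very few bad blocks or a great many, a drive whose count climbs past "a handful" deserves attention. The sketch below is only an illustration of that idea. The threshold of 5, the drive names, and the counts are all hypothetical, and `flag_at_risk` is our own helper, not a method from the paper.

```python
def flag_at_risk(bad_block_counts, threshold=5):
    """Return the names of drives whose bad-block count exceeds `threshold`.

    `threshold` is a hypothetical cutoff for "more than a handful";
    tune it against your own fleet data.
    """
    return sorted(
        drive for drive, count in bad_block_counts.items() if count > threshold
    )


if __name__ == "__main__":
    # Made-up example fleet: drive name -> bad blocks reported so far.
    fleet = {"drive-a": 0, "drive-b": 2, "drive-c": 180, "drive-d": 4}
    print(flag_at_risk(fleet))
```

In practice the counts would come from whatever health reporting your drives expose, and a flagged drive would be a candidate for closer monitoring or early replacement rather than proof of imminent failure.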
When selecting new hardware, it is essential to choose the proper drive for your needs. A finely tuned balance between performance and redundant failover is essential for critical processes. At the end of the day availability is the only Your data is important; call (425) 274-1121 now for an expert to configure your new server.