Moving to Real World Benchmarks in SSD Reviews

Many of our readers embrace our "real world" approach to hardware reviews. We have not published an SSD review in almost two years while we have been working to revamp our SSD evaluation program. Today we want to give you some insight into how we learned to stop worrying and love the real world SSD benchmark.


What’s Wrong with Synthetic Benchmarks?

Having now gone through a reasonably rigorous benchmarking cycle, I confirmed to myself what most astute observers already know: synthetic benchmarks aren't worth much. Consumers have short attention spans, and it's the nature of buyers to want to comparison shop. As a result, the majority latches onto metrics that are less than telling. Remember the Megahertz Races? We're at the same point with SSDs, where every product is introduced with headline numbers of great purported significance. The rare consumer who probes more deeply will find how those figures were calculated buried somewhere in the fine print, and at that point most people simply feel reassured by the apparent transparency.

In reality, judging SSDs by the headline numbers of IOPS and read/write speeds is like comparing apples and orangutans. It just doesn't make sense.
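To see why a single headline number means so little on its own, consider the basic relationship between IOPS and throughput: MB/s is just IOPS multiplied by the transfer size. Here is a quick back-of-the-envelope sketch in Python; the figures are made up for illustration and don't come from any particular drive:

```python
def throughput_mib_s(iops, block_size_kib):
    """Throughput implied by an IOPS figure at a given transfer size (KiB/s -> MiB/s)."""
    return iops * block_size_kib / 1024

# Hypothetical spec-sheet numbers, purely for illustration.
print(throughput_mib_s(90_000, 4))    # "90K IOPS" at 4 KiB random  -> ~352 MiB/s
print(throughput_mib_s(4_300, 128))   # 128 KiB sequential transfers -> ~538 MiB/s
```

Both results could plausibly describe the very same drive; which one lands on the retail box depends entirely on which workload flatters the product.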

The numbers on the retail box are, in large part, cherry-picked values that have little to no meaning in the real world. Let's say that Manufacturer A and Manufacturer B produce SSDs that are priced similarly and positioned in the same space; they're competing for the same customers. If you follow the prescribed benchmarking steps that Manufacturer A and Manufacturer B each provide on their respective websites, both of their drives will look amazing. Remember that both of these drives are ostensibly aimed at the same people doing the same things with them. Now run Manufacturer A's drive on Manufacturer B's protocol, and vice-versa. An apparent train wreck ensues, and you realize that things aren't quite what they seem.

Synthetic benchmarks are mostly useful for a few narrowly-defined things: marketing, drilling into a very specific use case that you're able to effectively simulate, supporting a transparent aggregation of tests behind some sort of generalization about performance, and filling review pages with content. Making purchasing decisions with this data as a significant factor is difficult; even if the benchmarks are performed by a credible third party with a consistent methodology, do any of the tests actually matter for your use case?

To further complicate matters, compressible workloads are integral parts of some major synthetic benchmarks. Benchmarks that use compressible data make compression-heavy drives (like those powered by SandForce controllers) look artificially good.
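To get a feel for how lopsided that advantage can be, here is a rough illustration using zlib in Python. A SandForce controller doesn't literally run zlib, so treat this strictly as a stand-in for how much less data a compressing controller might actually have to write when fed the easy stuff:

```python
import os
import zlib

SAMPLE = 4 * 1024 * 1024  # 4 MiB sample buffers

zero_data = bytes(SAMPLE)          # the kind of data a zero-fill benchmark writes
random_data = os.urandom(SAMPLE)   # worst case for a compressing controller

for label, data in (("zero-fill", zero_data), ("random", random_data)):
    compressed = zlib.compress(data, level=1)
    ratio = len(compressed) / len(data)
    print(f"{label}: {ratio:.1%} of original size after compression")
```

On a typical run, the zero-fill buffer shrinks to well under one percent of its original size, while the random buffer stays essentially uncompressed. A benchmark built on the former tells you very little about how the drive handles the latter.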

Now that we have a better understanding of how synthetic benchmarks can be misleading or unhelpful, let’s look at some of the most widely-used examples.

Iometer


Iometer is an incredibly useful tool that's a common, credible synthetic benchmark when used correctly. The great thing about Iometer is that you can set it up to essentially fire arbitrary data into a drive according to your parameters (duration, block size, read/write workload, et cetera) and produce repeatable test cases. You can script complicated tests, run through multiple queue depths or other variations of your workload, and easily do longevity testing to produce a time series of performance data.
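As a rough illustration of the kind of scripted, repeatable loop we're describing, here is a toy Python harness that writes incompressible data in passes and logs throughput over time. It is emphatically not Iometer: it runs through the filesystem and its cache rather than hitting the raw device, and the file name, block size, and pass count are arbitrary placeholders.

```python
import os
import time

# Hypothetical parameters -- adjust to taste.
TEST_FILE = "ssd_test.bin"
BLOCK_SIZE = 128 * 1024        # 128 KiB writes
BLOCKS_PER_PASS = 2048         # 256 MiB written per pass
PASSES = 10                    # repeat passes to build a time series

block = os.urandom(BLOCK_SIZE)  # incompressible payload

results = []
for p in range(PASSES):
    start = time.perf_counter()
    with open(TEST_FILE, "wb") as f:
        for _ in range(BLOCKS_PER_PASS):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())    # push the data out to the drive before timing stops
    elapsed = time.perf_counter() - start
    mb_s = (BLOCK_SIZE * BLOCKS_PER_PASS) / (1024 * 1024) / elapsed
    results.append(mb_s)
    print(f"pass {p + 1}: {mb_s:.1f} MB/s")

os.remove(TEST_FILE)
```

Real tools issue direct, unbuffered I/O and control queue depth; the point here is only the shape of the exercise: fixed parameters, repeated passes, and a time series you can plot.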

Iometer has something of a learning curve, and it’s difficult to get anything useful out of it without having a strong understanding of storage technology. The same depth that allows it to be such a useful tool, unfortunately, also allows it to be used to produce misleading data. The example in the last section about Manufacturer A and Manufacturer B is based on a real case of completely contradictory manufacturer recommended Iometer settings for two directly-competing drives. Unless you appreciate the virtues of comparing fine print, it’s exactly the sort of thing that would likely escape your notice while shopping for a new SSD.

ATTO


ATTO uses easily-compressible data, which it writes sequentially. Consequently, it’s a good benchmark for…nothing, really. Users can configure the test data length (only up to 2GB), queue depth (only up to 10), transfer size, and not much else. I don’t want to be mean, so I guess I’d better end with something nice. It runs quickly?

AS-SSD


By default, AS-SSD is a quickie benchmark that uses incompressible data. There’s also an included "Copy Benchmark" feature that runs proxy tests for copying an ISO, a program, and a game, but it doesn’t offer much transparency into what those tests actually do. The same goes for the built-in Compression Benchmark; it’s just not explained very well.

CrystalDiskMark

[Image: CrystalDiskMark results with the default incompressible (random) data]

Another quickie performance summary. CrystalDiskMark uses data that isn’t compressible by default. Using the SandForce-powered ROG RAIDR Express again, we see here that the write performance gets smacked. Here’s the same run, but with easily-compressible zeroes used in place of the random data:

[Image: CrystalDiskMark results with easily-compressible zero-fill data]