Cluster Computing – Benchmarking Local Storage

Goal: A cluster computing rig has twenty-eight processors, each of which can have either USB 2.0 or microSD local flash storage. The USB bus is faster than the microSD bus (40MB/s vs. 25MB/s), but are threaded random reads on USB flash faster than on microSD cards? How does the power efficiency of each compare? First I explore which technology should be used, then which brand and model is the most performant.

Requirements

The random read speed of each compute module’s local storage must exceed 4.46MB/s for local flash storage to be advantageous over, say, a central NAS storage solution. This limit comes from the uplink’s gigabit bandwidth being shared by twenty-eight compute modules, which leaves a modest 4.46MB/s per module. The sooner data can be shuttled into a given node’s RAM, the sooner the node can join in on big data operations.
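As a quick check of the 4.46MB/s figure, here is a minimal Python sketch of the arithmetic. It assumes decimal units (1MB = 1,000,000 bytes) and ignores Ethernet and protocol overhead, so the real per-module share would be somewhat lower. The function and constant names are only illustrative.

```python
UPLINK_BITS_PER_S = 1_000_000_000  # gigabit uplink, from the text
MODULES = 28                       # compute modules sharing the uplink


def per_module_mb_per_s(uplink_bits_per_s: int = UPLINK_BITS_PER_S,
                        modules: int = MODULES) -> float:
    """Even share of the uplink per module, in MB/s (decimal megabytes)."""
    bytes_per_s = uplink_bits_per_s / 8          # bits -> bytes
    return bytes_per_s / modules / 1_000_000     # bytes/s -> MB/s per module


if __name__ == "__main__":
    print(f"{per_module_mb_per_s():.2f} MB/s per module")
```

Any local storage whose random read speed stays above this share beats pulling the same data over the shared uplink.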

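The following research is broken down into four parts:

  1. USB Flash vs. MicroSD Cards
  2. MicroSD Card Benchmarking
  3. MicroSD Card Benchmarking Redux
  4. Analysis & Conclusion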

Part One – USB Flash vs. MicroSD Cards

The SOPINE compute module runs its microSD bus at SDR25 (theoretically up to 25MB/s, but practically about 23MB/s), and the USB 2.0 bus has an upper limit of around 40MB/s. This is the best we can do with the buses available. On the surface, using a USB flash device on the faster bus is a no-brainer, but how well do various devices perform with small reads and writes?

I. CrystalDiskMark

Because the SOPINE compute modules only have a USB 2.0 bus, I first performed a CrystalDiskMark speed test (Windows) on each device in a USB 2.0 hub. They were all formatted with exFAT. The CrystalDiskMark results are below. The important figures are the random read and write speeds which give a hint at how the flash devices will perform in production.

Let’s get a sense of performance.

  1. LVCARDS 32GB Class 10 U1 microSD (no-name brand)
  2. Samsung EVO Plus 64GB Class 10 U3 microSD
  3. SanDisk Ultra Fit USB 3.0 64GB flash drive (in a USB 2.0 slot)
  4. SanDisk Ultra USB 3.0 32GB flash drive (in a USB 2.0 slot)
  5. Kingston DataTraveler 8GB USB 2.0 flash drive [1]
Flash devices under Phoronix test in SOPINE A64 modules

1. LVCARDS 32GB Class 10 U1 microSD (no-name brand)

Most of my inexpensive AliExpress LVCARDS microSD cards are defective or have a fake capacity, but a handful are working, albeit noticeably slowly. Here are the speed tests for one of those cards.

Inexpensive LVCARDS microSD 32GB USB 2.0 speed tests

Just to confirm the cards are good, all were tested with H2testw, the de facto gold standard for detecting counterfeit SD cards. The result below confirms this is a good card to test.
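For readers unfamiliar with how this kind of check catches fake-capacity cards, the general idea is to fill the card with data that can be regenerated later and then read everything back. The Python sketch below illustrates that write-then-verify idea on an ordinary file. It is not H2testw's actual implementation, and the block size, hashing scheme and function names are assumptions chosen for illustration.

```python
import hashlib
import os
import tempfile
from typing import List

BLOCK_SIZE = 4096  # bytes per test block (illustrative choice)


def pattern(index: int, size: int = BLOCK_SIZE) -> bytes:
    """Deterministic block contents that depend on the block index."""
    digest = hashlib.sha256(index.to_bytes(8, "little")).digest()
    return (digest * (size // len(digest) + 1))[:size]


def fill(path: str, blocks: int) -> None:
    """Write `blocks` index-dependent blocks and flush them to the device."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern(i))
        f.flush()
        os.fsync(f.fileno())


def verify(path: str, blocks: int) -> List[int]:
    """Return the indices of blocks that do not read back as written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK_SIZE) != pattern(i):
                bad.append(i)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "fill.bin")  # stand-in for a file on the card
        fill(target, 16)
        print("bad blocks:", verify(target, 16))
```

A card that reports more capacity than it has tends to wrap around or drop writes once the real capacity is exceeded, so blocks written early or late no longer match their expected pattern when read back.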

LVCARDS microSD 32GB H2testw

During writing the card reached a comfortable peak temperature of 40℃.

LVCARDS 32GB reached 40℃

2. Samsung EVO Plus 64GB Class 10 U3 microSD

This card is my favorite entering into this storage competition. It has the right ratings (Class 10, U3), and is a genuine Samsung EVO Plus. It has the best sequential write speed and decent random read and write speeds. It doesn’t get hot, or even warm, even under extended use.

Samsung EVO Plus 64GB microSD in USB 2.0 hub

The peak temperature reached during benchmarking was only 36.8℃ – body temperature.

Samsung EVO Plus 64GB reached 36.8℃

3. SanDisk Ultra Fit USB 3.0 64GB flash drive (in a USB 2.0 slot)

This USB flash device has good ratings and a low profile, but it is actually slower than its older sibling, the SanDisk Ultra (see below). This is one of the few low-profile USB flash devices available, so the real test is between this device and the Samsung EVO Plus.

SanDisk Ultra Fit 32GB flash in USB 2.0 hub

Unfortunately, a deal-breaker is that this drive gets very hot to the touch. It reached 46℃ at one point. In fact, for the CrystalDiskMark tests I needed to actively cool the device or else it would slow down or freeze. How’s that for ironic? It gets so hot it freezes. There are negative reviews about this [2].

SanDisk Ultra Fit 64GB USB flash temperature measurement

4. SanDisk Ultra USB 3.0 32GB flash drive (in a USB 2.0 slot)

This is a thumb drive I’ve had for a couple of years, and it is fast enough for my daily data courier needs.

Sandisk Ultra 32GB flash in USB 2.0 hub

It also reached 46℃, but it wasn’t as noticeable to the touch because it has a larger surface area and plastic cover. It continued working even after 15+ hours of prolonged stress testing.

SanDisk Ultra 32GB USB reached 46℃

5. Kingston DataTraveler 8GB USB 2.0 flash drive

This flash drive is nearly a decade old, but still works. It looks unbearably slow on random writes, but interestingly its random read speeds outperform all the other flash devices, which makes this flash drive somewhat of an anomaly. I’ve added it here just to give the competition an underdog.

Kingston DataTraveler 8GB USB 2.0 flash results

Preliminary Analysis

From the above we see that random write speeds vary wildly among devices (from 0.02MB/s to 4MB/s) with the EVO Plus leading the pack. Yet among the more crucial random read speeds, excluding the token Kingston device with deplorable random write speeds, the devices were in the same ballpark (from 3.9MB/s to 7.9MB/s) with the EVO Plus again leading the group. So far the EVO Plus is beating the USB devices in both random speeds and power efficiency (less heat).


II. Phoronix Test Suite

Next let’s see how the devices compare under prolonged stress tests. Each run of the Phoronix disk suite took between 15 hours and 2 days, or didn’t finish at all. I tested the EVO Plus three times in different SOPINE modules, and the Ultras twice in different SOPINEs as well. The Ultra Fit was only tested once because it essentially melted. The DataTraveler didn’t finish the tests.

The Kingston DataTraveler has a painful random write speed so it did not finish the Phoronix suite of tests.
The SanDisk Ultra Fit drive became permanently “frozen” in a protective read-only state, likely due to thermal stress, after two days of continuous use. SanDisk technical support said to return the drive for a new one. This is not what we want to have happen in a computing cluster.

Testing Methodology

First, I installed a stable Armbian OS image (Armbian 5.65 with Ubuntu 16.04) to an SD card, installed Phoronix Test Suite, and cached the pts/disk tests. Then I resized the microSD to 16GB with gnome-disks and created a backup image. Next, I cloned this image to the rest of the microSD cards, including the (painfully slow) LVCARDS microSD cards and the one Samsung EVO Plus microSD card.

SoPine A64 ready to run nand-sata-install

For the USB flash drives under test, the Armbian utility script nand-sata-install was run to move the root file system to the USB drive. This ensures all disk operations truly happen on the flash device [3]. I chose ext2, which has no journal, to increase disk throughput for the tests and to reduce USB wear.

Running nand-sata-install to move rootfs to USB

The above script was run on all three USB-flash-equipped SOPINE modules via the serial console [4], and then they were rebooted. Finally, using nmap and cssh, I ran phoronix-test-suite run pts/disk on all five compute modules simultaneously.

Here are the tests performed:

PostMark

This is a test of NetApp’s PostMark benchmark, designed to simulate small-file testing similar to the tasks endured by web and mail servers. This test profile sets PostMark to perform 25,000 transactions with 500 files simultaneously, with file sizes ranging between 5 and 512 kilobytes. The EVO is the clear winner, beating the Ultra series of USB flash drives tenfold.

Phoronix PostMark results

Asynchronous I/O

AIO-Stress is an asynchronous I/O benchmark created by SuSE. Currently this profile uses a 2048MB test file and a 64KiB record size. The asynchronous random write speeds of the devices were in the same order of magnitude, but the EVO instances edged out the rest.

Phoronix AIO-Stress results

GZip Compression

This test measures the time needed to archive/compress two copies of the Linux 4.13 kernel source tree using Gzip compression. The EVO instances were about 30% faster than the USBs at compressing the Linux source tree.

Phoronix Gzip Compression results

Unpacking the Linux Kernel

This test measures how long it takes to extract the .tar.xz Linux kernel package. The Ultras varied considerably between runs, but the EVO instances again excelled here, performing more than twice as fast as the USB devices.

Phoronix Unpacking the Linux Kernel results

Preliminary Analysis

From the above we again see that the EVO Plus (microSD) consistently outperforms the SanDisk flash devices (USB) under prolonged testing, despite the USB devices sitting on a faster bus. It looks like USB flash devices are just not designed for random file operations.


III. EVO Plus (MicroSD) vs. SSD (USB)

Just to overkill this competition, I added a WD Blue 500GB SSD with a SATA-to-USB adapter plugged into the SOPINE USB 2.0 bus. It had the same treatment as a USB flash device. I then compared it to the best storage device above: the EVO Plus.

SSD on the USB 2.0 bus for benchmarking

Interestingly, the random read speeds, which are what we are most interested in, were nearly as good as the EVO’s. This is peculiar because the USB bus is faster than the microSD bus, so the SSD should be faster, as it is in the following test. This tells me that even though both devices are fast, there is a bottleneck with the CPU or RAM, and a faster device like an SSD won’t improve this speed further.

Phoronix Gzip: SSD vs EVO

Here the SSD is twice as fast as the EVO, likely owing to its faster bus.

Phoronix PostMark: SSD vs EVO

Of course on random write operations the SSD is in a league of its own. Fortunately we are more excited about random reads in distributed database cluster computing.

Phoronix AIO-Stress: SSD vs EVO

Preliminary Analysis

We would expect the SSD to perform the best among all devices, but with random, threaded reading it is not exceptionally faster than the EVO Plus (microSD). From a cost/benefit standpoint, the microSD EVO Plus is still the clear winner.


IV. MicroSD in a USB Card Reader

A microSD card on the USB bus would be the best of both worlds.

Low-profile USB microSD card reader

We already know what the CrystalDiskMark speeds look like for the microSD cards in a USB 3.0 card reader in a USB 2.0 hub from the results in section I. We also learned that the EVO Plus microSD in the slower SDR25 bus (up to 25MB/s) outperforms the USB flash devices. Adding the microSD cards to the USB bus via card readers should theoretically improve the benchmark results further.

However, I found that the card reader became extremely hot in operation, reaching around 46℃ as well.

Preliminary Analysis

It is preferable to require only one microSD card per compute module, not two [5], and this configuration would increase the heat output and thus the power consumption. Besides, there may be a way to overclock the microSD bus. More on that later.


Part Two – MicroSD Card Benchmarking

Now that we’ve determined that the way forward is with microSD cards over USB flash devices, let’s determine how different makes of microSD cards compare with one another. I’ve gone out and obtained higher-end microSD cards.

Benchmark harness for several microSD cards

I. Flexible IO Tester – Phoronix Test Suite

Let’s perform the Phoronix Test Suite’s implementation of the Flexible IO Tester tests on the above seven microSD cards. We already know the EVO Plus beats the previous devices, so let’s drill down and see how the random reads and writes compare between the microSD cards. To (attempt to) confirm the results are consistent between identical cards, I’ve run the tests on multiple cards where available. The no-name LVCARDS is added again for a reference.

Random Reading Speeds

FIO is an advanced disk benchmark that depends upon the kernel’s AIO access library. Let’s compare the 4K random read speeds among the cards. We can see that the EVOs varied a bit between devices, but the Ultra A1s performed the best here. It is a little surprising that the A1s performed better than the A2.

Animation of FIO Tester results for several block sizes - random reads

At the 256KB-block level all the cards performed similarly with random read speeds at or above 20MB/s (near the microSD bus limit). However, at all the other block levels the A1s and the A2 outperformed the EVO Plus. It’s worth noting that not all three A1s performed the same – the third Ultra A1 was slower than the other two, so the agreeing A1s (1 and 2) are used in this analysis.

Random Writing Speeds

Writing speed is not nearly as important for a distributed database cluster, especially since operations are performed in memory and the results are sent to an upstream processor. As a point of interest let’s look at how well the cards handle random writes.

Animation of FIO Tester results for several block sizes - random writes

Random write speeds are all over the place and inconsistent. The EVOs look as if they failed from the 64KB-block level, and the third A1 was consistently writing at about half the speed of the other two A1s. The A1s and the A2 trade first and second place depending on the block size.

Preliminary Analysis

Not all cards are created equal, and performance varies even between cards of the same model, as the SanDisk Ultra A1s and the Samsung EVO Pluses show. This illuminates the peril of relying on third-party microSD card ranking sites that test only one card per model. The most that can be said so far is that, despite the variations, the SanDisk A1s and A2s outperform the EVO Plus in both random reading and random writing.


II. Flexible IO Tester – Multi-Threaded Tests

The Phoronix Test Suite implementation of the Flexible IO Tester is single-threaded. With the script below let’s manually run a batch of tests using four jobs (not one job as in the Phoronix tests above) to test the random IO when all four cores of the SOPINE are engaged in disk operations. Here is the test script.
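A minimal sketch of what such a four-job batch could look like is shown here in Python. It is not the script used for the results that follow. It only builds fio command lines from fio's standard options (--rw, --bs, --numjobs, --direct, --ioengine, --size, --runtime, --time_based, --group_reporting). The target file path, test file size and runtime are hypothetical values, and the function names are illustrative.

```python
import shlex
from typing import List

BLOCK_SIZES = ["4k", "16k", "32k", "64k", "128k", "256k"]  # block sizes reported in the tables
MODES = ["randread", "randwrite"]
JOBS = 4  # one fio job per SOPINE core


def fio_command(mode: str, block_size: str,
                target: str = "/tmp/fio-test.bin",  # hypothetical test file on the card
                size: str = "256m",                 # hypothetical test file size
                runtime_s: int = 60) -> List[str]:  # hypothetical runtime per test
    """Build one fio invocation for a given access mode and block size."""
    return [
        "fio",
        f"--name={mode}-{block_size}",
        f"--filename={target}",
        f"--rw={mode}",
        f"--bs={block_size}",
        f"--numjobs={JOBS}",
        "--direct=1",            # bypass the page cache so the card itself is measured
        "--ioengine=libaio",     # Linux asynchronous I/O engine
        f"--size={size}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",     # report the four jobs as one aggregate result
    ]


def build_matrix() -> List[List[str]]:
    """All mode and block-size combinations, reads first."""
    return [fio_command(mode, bs) for mode in MODES for bs in BLOCK_SIZES]


if __name__ == "__main__":
    for argv in build_matrix():
        print(shlex.join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try anywhere; on a node, each line could be executed in turn with the target pointed at a file on the card under test.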

Below are condensed tables of results.

Random Reading Speeds

The random reading speeds are consistent with the Phoronix results above when using four cores. The best speed is in gold and the runner up is in silver. One of the SanDisk A1s is the clear winner for all block sizes.

Block         Extreme A2  EVO Plus (CA)  EVO Plus (EU)  Ultra A1 (1)  Ultra A1 (2)  Ultra A1 (3)
4KB IOPS             382            308            381           408           449           433
4KB KB/s            6116           4929           6105          6536          7196          6933
16KB KB/s          12020          12292          12945         14540         15387         15105
32KB KB/s          14805          15693          14019         17998         18597         18417
64KB KB/s          20379          18880          17650         20411         20869         20728
128KB KB/s         21911          20992          20274         21932         22180         22101
256KB KB/s         22730          22246          21865         22695         22817         22788
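The read figures can also be checked programmatically. The sketch below, with the numbers copied from the table above, picks the fastest card per block size and compares the slowest 4KB result with the 4.46MB/s requirement from the start of the article. Treating 1MB as 1,000KB is an assumption, and the variable and function names are illustrative.

```python
from typing import Dict, List

CARDS = ["Extreme A2", "EVO Plus (CA)", "EVO Plus (EU)",
         "Ultra A1 (1)", "Ultra A1 (2)", "Ultra A1 (3)"]

# Four-job random read speeds in KB/s, one list per block size, in CARDS order.
READ_KB_S: Dict[str, List[int]] = {
    "4KB":   [6116, 4929, 6105, 6536, 7196, 6933],
    "16KB":  [12020, 12292, 12945, 14540, 15387, 15105],
    "32KB":  [14805, 15693, 14019, 17998, 18597, 18417],
    "64KB":  [20379, 18880, 17650, 20411, 20869, 20728],
    "128KB": [21911, 20992, 20274, 21932, 22180, 22101],
    "256KB": [22730, 22246, 21865, 22695, 22817, 22788],
}

REQUIRED_MB_S = 4.46  # per-module share of the gigabit uplink


def winners(table: Dict[str, List[int]]) -> Dict[str, str]:
    """Fastest card for each block size."""
    return {block: CARDS[speeds.index(max(speeds))]
            for block, speeds in table.items()}


def slowest_mb_s(table: Dict[str, List[int]]) -> float:
    """Slowest result anywhere in the table, converted to MB/s (1MB = 1,000KB assumed)."""
    return min(min(speeds) for speeds in table.values()) / 1000


if __name__ == "__main__":
    for block, card in winners(READ_KB_S).items():
        print(f"{block:>6}: {card}")
    worst = slowest_mb_s(READ_KB_S)
    print(f"slowest result: {worst:.2f} MB/s, requirement: {REQUIRED_MB_S} MB/s")
```

Even the slowest entry, the 4KB result for one of the EVO Pluses, stays above the requirement, so under four-job load every card in this table still beats the shared uplink.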

Random Writing Speeds

The random write speeds are chaotic like the Phoronix tests above. The SanDisk Extreme A2 was in the early lead with small block sizes, but the SanDisk Ultra A1 pulled ahead with larger block sizes. The Samsung EVO Pluses appeared to give up again.

BlockExtreme A2EVO Plus (CA)EVO Plus (EU)Ultra A1 (1)Ultra A1 (2)Ultra A1 (3)
4KB IOPS112869010010736
4KB KB/s18061381144216121722590
16KB KB/s451358795713476250301207
32KB KB/s686058075362759477602209
64KB KB/s83216940410721113883021
128KB KB/s12295192984714735147883736
256KB KB/s161093102170017195174413939

Preliminary Analysis

Surprisingly, the SanDisk Ultra A1, despite carrying a lower IOPS rating than the SanDisk Extreme A2, performed better than the A2 under multi-threaded testing, just as it did under single-threaded testing. The EVOs performed poorly at larger block sizes.


Some time goes by…


Part Three – MicroSD Card Benchmarking Redux

I’ve since added a Patriot A1 microSD card from Amazon and four more Samsung EVO Plus cards from AliExpress (which surprisingly work well). I tested everything several times in different compute modules to avoid bias. Here are the final FIO results.

Collection of microSD cards under FIO testing

Random Reading Speeds

The SanDisk A1s still lead the pack even though I tested the SanDisk A2 three times in different hardware. The good news is that from 128KB block sizes the random read speeds of all cards are nearly the same. And, thankfully the Samsung EVOs from AliExpress (CN1~4) all read consistently.

Animation of FIO Tester results for several block sizes - random reads

Random Writing Speeds

Both the A1s and A2s – including the Patriot A1 – write reasonably well. For small block writes the Patriot A1 shines, but is eventually overtaken by the SanDisk A1s and A2. The takeaway is to use either an A1 or an A2.

Animation of FIO Tester results for several block sizes - random writes

On the other hand, the Samsung EVO Pluses all seem to have trouble with direct writes past the 64KB-block size. This is especially pronounced on 256KB-block random writes (see below). The few I had before exhibited this behavior, but with the addition of the four AliExpress EVOs (which appear genuine) that is six Samsung EVO Pluses that are having trouble writing.

Samsung EVO Plus FIO 256KB block random writes

Part Four – Analysis & Conclusion

These storage benchmarking conclusions are for a cluster computing rig of SOPINE A64 modules with an SDR25 microSD bus and a USB 2.0 bus. The maximum speed of the non-overclocked microSD bus is about 23MB/s, so block-size testing stopped when most of the microSD cards reached this random read speed, at 128KB or 256KB. Random read speed is more important than random write speed in a distributed database that changes very slowly, as when analyzing historic stock data.

In part one I determined that microSD cards are preferred to USB flash devices even though the USB bus is faster. This is because microSD cards are better suited to random reads and writes, they don’t get as hot as USB flash devices, and it may be possible to overclock the microSD SDR25 bus.

In part two I looked at four kinds of microSD cards: A1, A2, ordinary, and low-cost. Overall, the A1 outperformed the rest at random reads and writes; the Samsung EVO Pluses beat the USB flash devices, but SanDisk A1s and A2s beat the EVOs at random reads and writes. For random writes there was little consistency, although the EVOs had consistent writing problems at higher block sizes. Anything with an A1 or A2 rating generally outperformed the ordinary and low-cost cards in both categories. These results were consistent under four-core multi-threaded testing as well.

In part three I tested even more microSD cards and retested previous cards. Of the A1 and A2 cards, some performed better at low block sizes and some better at higher block sizes. Again, they all generally performed better than the ordinary Samsung EVO Pluses which were my favorite in part one.

Most performant microSD cards
Conclusion: The SanDisk Ultra A1 and the Patriot A1 are the best value with high performance, and with these I will round out my cluster computing storage. The SanDisk Extreme A2 doesn’t give a performance boost for the higher rating and price tag. The Samsung EVO Pluses have fine reads, but poor random writes, so I will not add to the six I already have. The LVCARDS will be discarded, or possibly used to test an inevitable node failure. This took days of testing, but it was worth it to not make a CAD $700 mistake [6] on the wrong cards.

Next Steps

There are whispers on the web that some people have had success overclocking the microSD bus on competing SBCs [7]. Some have claimed an 80% throughput improvement. If this is accurate, then it may be possible to achieve a microSD bus speed of about 45MB/s with an overlay file system. This is my next area of research.
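The 45MB/s figure follows from simple arithmetic on the claimed improvement. Here is a small sketch, assuming the 80% gain applies directly to bus throughput (an unverified claim) and using the 25MB/s theoretical and 23MB/s practical SDR25 speeds quoted earlier. The function name is illustrative.

```python
def projected_speed(base_mb_s: float, improvement: float = 0.80) -> float:
    """Scale a bus speed by a fractional improvement (0.80 means +80%)."""
    return base_mb_s * (1 + improvement)


if __name__ == "__main__":
    for label, base in (("theoretical", 25.0), ("practical", 23.0)):
        print(f"{label}: {base:.0f} MB/s -> {projected_speed(base):.1f} MB/s")
```

Starting from the theoretical 25MB/s gives 45MB/s, while starting from the roughly 23MB/s seen in practice gives about 41MB/s, which would put the microSD bus on par with the USB 2.0 limit of around 40MB/s.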


Notes:

  1. I have this 8GB USB 2.0 flash drive lying around for a comparison.
  2. I found these negative reviews after I started having problems.
  3. There is a shortcut of moving the ~/.phoronix-test-suite to the flash device and symlinking to it, but moving the entire root folder is more elegant.
  4. Note the strange characters in the copy progress bar.
  5. With SPI flash we could possibly boot from the USB device, or even PXE, but this adds complexity and more points of failure.
  6. $25/card x 28 cards for the cluster computing rig
  7. Single Board Computers