
Linux dm-crypt Performance – Kernel 5.9+

Update 11.12.2021: Finally got a new test machine and started re-evaluating the current situation.

The tests were done on a Quad-Core i5 system, using a 24 GB RAM disk (32 GB RAM total).

linux # modprobe brd rd_nr=1 rd_size=$[ 24 * 1024 * 1024 ]
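Since rd_size is given in KiB, this creates a 24 GB /dev/ram0. The /dev/mapper/luksdev device used in the fio runs was then presumably set up on top of it, roughly like this (the device and mapping names are assumptions; the post does not show this step):

```shell
# Create a LUKS container on the RAM disk and open it as "luksdev"
# (names assumed to match the fio --filename used below).
linux # cryptsetup luksFormat /dev/ram0
linux # cryptsetup open /dev/ram0 luksdev
```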

System is Ubuntu 20.04 (x86_64). Basic tests were done using fio with variations of this command:

linux # fio --filename=/dev/mapper/luksdev --readwrite=read|write --bs=4k|64k|1M --direct=1 --name=plain --runtime=10 --time_based
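The `read|write` and `4k|64k|1M` notation is shorthand for running every combination; a small loop spells out the six concrete invocations:

```shell
# Expand the read|write and 4k|64k|1M shorthand into the six
# fio invocations used for the benchmarks.
for rw in read write; do
  for bs in 4k 64k 1M; do
    echo "fio --filename=/dev/mapper/luksdev --readwrite=$rw --bs=$bs --direct=1 --name=plain --runtime=10 --time_based"
  done
done
```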

The different benchmark scenarios are these:

“Plain Crypt”: Standard Ubuntu 20.04.3 installation, standard 5.4.0-91 kernel, standard cryptsetup

“Cloudflare”: Replaced dm-crypt.ko with a version including the Cloudflare patches. Added xtsproxy.ko from Cloudflare. Crypto mode needs to be changed manually in order to use Cloudflare optimizations.

“HWE”: Running with Ubuntu HWE kernel (5.11.0-41) instead

“HWE+NRQ” / “HWE+NWQ” / “HWE+NRWQ”: Running HWE kernel, but also disabled read/write/both workqueues (requires a newer version of cryptsetup (>= 2.3.4)).
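For the workqueue variants, the mapping was presumably opened with the new cryptsetup performance flags; a sketch (device and mapping names are assumptions matching the setup above):

```shell
# NRQ: disable the read workqueue only
linux # cryptsetup open --perf-no_read_workqueue /dev/ram0 luksdev
# NWQ: disable the write workqueue only
linux # cryptsetup open --perf-no_write_workqueue /dev/ram0 luksdev
# NRWQ: disable both
linux # cryptsetup open --perf-no_read_workqueue --perf-no_write_workqueue /dev/ram0 luksdev
```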

Read/Write Throughput

The basic throughput values show that a plain 5.4.0 kernel performs badly compared to the other options. The Cloudflare patches are still the fastest, but newer kernels with deactivated workqueues come very close to those values (and do not require patching kernel modules).

IOPs

When it comes to IOPs, comparing values in charts gets a little confusing, so here is how the displayed values were generated:

As the tests were run with 4k/64k/1M blocks, fio reports every request as one IOP. So a 1M request (which is 256 times bigger than a 4k one) takes much longer to complete. To still get comparable charts, the values displayed here were normalized to 4k blocks (64k values were multiplied by 16, 1M values by 256).
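The normalization amounts to multiplying each IOPS value by the block-size ratio relative to 4k; as a small sketch (the helper name is my own, not from fio):

```shell
# Normalize a fio IOPS value to its 4k equivalent.
# The factor is bs/4k: 1 for 4k, 16 for 64k, 256 for 1M (= 1024k).
normalize_iops() {  # args: <iops> <block size in KiB>
  echo $(( $1 * $2 / 4 ))
}

normalize_iops 1000 4     # 4k:  factor 1   -> 1000
normalize_iops 1000 64    # 64k: factor 16  -> 16000
normalize_iops 1000 1024  # 1M:  factor 256 -> 256000
```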

The smaller the request, the bigger the possible advantage.

Latency

As with the IOPs above, the latency graphs also need some special treatment to be useful. Here the values reported by fio were divided by the same factors to get comparable numbers.

Outdated …

New vanilla kernels (from 5.9.x on) contain dm-crypt performance enhancements (the ability to disable the workqueues). To run some performance tests, I installed the latest kernel of the day.

In order to use these enhancements, new dm-crypt flags were introduced:

DM_CRYPT_NO_READ_WORKQUEUE
DM_CRYPT_NO_WRITE_WORKQUEUE
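With plain dmsetup, these flags go into the optional-parameters section at the end of the crypt target line; a sketch with placeholder sizes and key (the leading `2` is the count of optional parameters):

```shell
# Show the current crypt table, then reload it with the two
# workqueue flags appended as optional parameters.
linux # dmsetup table luksdev --showkeys
0 50331648 crypt aes-xts-plain64 <key> 0 /dev/ram0 32768
linux # dmsetup reload luksdev --table "0 50331648 crypt aes-xts-plain64 <key> 0 /dev/ram0 32768 2 no_read_workqueue no_write_workqueue"
linux # dmsetup suspend luksdev && dmsetup resume luksdev
```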

Instead of using dmsetup to activate these modifications, you can also use the latest version of cryptsetup (see the release notes of version 2.3.4):

* Added support for new no_read/write_workqueue dm-crypt options (kernel 5.9).

  These performance options, introduced in kernel 5.9, configure dm-crypt
  to bypass read or write workqueues and run encryption synchronously.

  Use --perf-no_read_workqueue or --perf-no_write_workqueue cryptsetup arguments
  to use these dm-crypt flags.

  These options are available only for low-level dm-crypt performance tuning,
  use only if you need a change to default dm-crypt behavior.

  For LUKS2, these flags can be persistently stored in metadata with
  the --persistent option.
The cryptsetup man page describes the options as follows:

--perf-no_read_workqueue, --perf-no_write_workqueue
          Bypass dm-crypt internal workqueue and process read or write requests synchronously.  This option is only relevant for open action.

          NOTE: These options are available only for low-level dm-crypt performance tuning, use only if you need a change to default dm-crypt behaviour. Needs kernel 5.9 or later.
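For LUKS2, the flags can also be stored in the metadata so they are applied on every activation; a sketch (device and mapping names assumed as above):

```shell
# Store the workqueue flags persistently in the LUKS2 header.
linux # cryptsetup open --perf-no_read_workqueue --perf-no_write_workqueue --persistent /dev/ram0 luksdev
# The stored activation flags should show up under "Flags:" in the dump.
linux # cryptsetup luksDump /dev/ram0
```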

First performance tests show about the same speedup as the initial patches from Cloudflare.

Update 01.12.2020: I observed a server crash tonight. The system was running kernel 5.9.8 and started btrfs scrub jobs, which seem to cause almost instant kernel crashes. I could easily reproduce this by running backup jobs or even by compiling a new kernel on certain filesystems (stack: LVM, dm-crypt, btrfs).

The kernel reports errors and leaves the system unusable:

BUG: scheduling while atomic: kworker/...
BUG: scheduling while atomic: systemd-journal/...
BUG: scheduling while atomic: kworker/...
BUG: scheduling while atomic: swapper/...
bad: scheduling from the idle thread!
bad: scheduling from the idle thread!
bad: scheduling from the idle thread!
<...>

I’m not sure what causes this; I also tried kernel 5.9.10 with the same results. For now I’ll return to the Ubuntu default kernel and build the two extra modules.

For those interested in investigating: I was running btrfs on top of an LVM-based dm-crypt volume. Maybe this combination causes the mess. I’ll investigate further and report back. Stay tuned.
