Re: AF_ALG hardening
From: Simon Richter
Date: Sat May 02 2026 - 04:19:47 EST
Hi,
On 5/2/26 13:52, Demi Marie Obenour wrote:
> Of course, it'll also be a fair bit of work, and unfortunately I also
> expect pushback from people who (incorrectly, IMO) think that AF_ALG
> performance is important, even more so than security.
AF_ALG performance (time/power) is important in the sense that it is literally the only reason for its existence. If all it provides is extra overhead over a software implementation, then it makes no sense to keep it.
> If one cares about crypto offload performance, they would be better
> served by creating a better interface to it than AF_ALG. AF_ALG is
> a horrible API with (presumably) tons of overhead. I know the QAT
> driver and an Nvidia BlueField DPU accelerator driver both bypass it.
The API is designed to be zero-copy; that is why it is this horrible combination of a socket API and splice(). The general assumption here is that it does not make sense to offload small requests in the first place, and application programmers are aware of that.
The use case is "I have a file or pipe full of data and a device with a kernel driver that should process it, can we somehow avoid copying the data to userspace only to immediately copy it back to kernelspace?"
This copying is even more silly if the actual question I have in userspace is "what is the SHA256 checksum of this file?" or "what is the SHA256 checksum of the string 'blob 8794311528\0' followed by this file?" (where you can see why anyone would ask such a silly question and prefer to use the dedicated hardware that processes 24 GB/s over the CPU at 100 MB/s)
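Roughly, that looks like the sketch below (error handling omitted, and I wouldn't guarantee the MSG_MORE-plus-sendfile() interaction is identical on every kernel version; the "blob <size>" prefix is just the Git header example from above):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <linux/if_alg.h>

int main(int argc, char **argv)
{
        struct sockaddr_alg sa = {
                .salg_family = AF_ALG,
                .salg_type   = "hash",
                .salg_name   = "sha256",
        };
        unsigned char digest[32];
        struct stat st;
        char hdr[64];
        int hlen, tfm, req, fd;
        off_t off = 0;

        tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(tfm, (struct sockaddr *)&sa, sizeof(sa));
        req = accept(tfm, NULL, 0);

        fd = open(argv[1], O_RDONLY);
        fstat(fd, &st);

        /* Optional prefix, e.g. Git's "blob <size>\0" header; MSG_MORE
         * tells the kernel that more data follows before finalizing. */
        hlen = snprintf(hdr, sizeof(hdr), "blob %lld", (long long)st.st_size);
        send(req, hdr, hlen + 1, MSG_MORE); /* +1 includes the NUL */

        /* The file contents never pass through a userspace buffer:
         * sendfile()/splice() hands the page cache pages to the hash. */
        while (off < st.st_size)
                if (sendfile(req, fd, &off, st.st_size - off) <= 0)
                        break;

        /* Reading the result finalizes and returns the digest. */
        read(req, digest, sizeof(digest));
        for (size_t i = 0; i < sizeof(digest); i++)
                printf("%02x", digest[i]);
        putchar('\n');
        return 0;
}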
> Furthermore, AF_ALG only supports symmetric algorithms. These
> algorithms are inexpensive in software, so the cost of going to an
> accelerator and back is enormous compared to the cost of a single
> operation.
Yes, initial setup cost is high, so this only makes sense for large requests or batches (submitting individual requests is generally cheap; the difficulty is ensuring the data is accessible to the hardware).
That's also why there are no asymmetric algorithms: these aren't generally used on large amounts of data, so it's never worth it to offload these.
It would make sense to offload asymmetric algorithms if there was a secure key storage inside the device, but AFAIK the API does not support that, or even the notion of on-device contexts.
It is not a good API, and it sits on top of the ahash/acomp/acrypt interfaces which are also unfriendly to accelerator hardware.
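For reference, the in-kernel model underneath looks roughly like this (a sketch of a single async ahash request using the real ahash helpers as of recent kernels; it is illustrative only, and the per-request allocation plus callback is exactly the part that is awkward for deep hardware queues):

#include <crypto/hash.h>
#include <linux/scatterlist.h>

static int sha256_buffer(const void *buf, unsigned int len, u8 *digest)
{
        struct crypto_ahash *tfm;
        struct ahash_request *req;
        struct scatterlist sg;
        DECLARE_CRYPTO_WAIT(wait);
        int err;

        /* One transform object, one request object, one scatterlist per
         * operation; completion is delivered through a callback. */
        tfm = crypto_alloc_ahash("sha256", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        req = ahash_request_alloc(tfm, GFP_KERNEL);
        if (!req) {
                crypto_free_ahash(tfm);
                return -ENOMEM;
        }

        sg_init_one(&sg, buf, len);
        ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
                                   crypto_req_done, &wait);
        ahash_request_set_crypt(req, &sg, digest, len);

        /* Returns -EINPROGRESS when the request was queued to hardware;
         * crypto_wait_req() sleeps until the completion callback fires. */
        err = crypto_wait_req(crypto_ahash_digest(req), &wait);

        ahash_request_free(req);
        crypto_free_ahash(tfm);
        return err;
}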
> For offload to even a very fast accelerator to make sense,
> one must be able to deeply pipeline requests. However, this creates
> a huge amount of additional complexity for software.
Software that has requirements like that is already complex -- if I have a few thousand workload packets, I need a worker pool.
If I don't have these requirements, then indeed I am better off with a software-only solution in userspace, because it is not relevant from a performance standpoint.
> Asymmetric accelerators also don't have a better alternative in the
> form of inline encryption hardware.
Quite a number of architectures do not have inline encryption support, and these are more likely to use offload hardware even for smaller requests (e.g. for power saving).
> I think a high performance interface to hardware cryptography (and,
> more importantly, compression) would look much more like RDMA.
> There would be a kernel driver that did the bare minimum to provide
> isolation between userspace programs, and a userspace driver that
> was responsible for abstracting over the hardware.
Offload hardware comes in two flavours: the high-throughput kind, built into devices where no one cares about power, and the lower-power-than-the-CPU-doing-it kind.
The former can easily provide user contexts even in virtualized environments, but the latter is generally found in systems that do not even have an IOMMU. Either we have two distinct interfaces for these, or we need one that can handle either.
My feeling is that no one is happy with either AF_ALG or the asynchronous interfaces in general, so I think they should be removed completely, and there should be a separate "offload" SIG that creates new interfaces that are actually usable with current hardware.
> 1. Get rid of zero-copy support (splice()).
> 2. Get rid of AIO support.
> 3. Only allow software implementations.
That makes sense if we're forced to keep the interface for now, but it means that offload support through the crypto subsystem is completely dead, and anyone wanting to support offload hardware needs to go elsewhere. Can we get a definitive statement that this is intended?
Simon