On Wed, 28 Aug 2024 13:02:31 +0100
Robin Murphy <robin.murphy@xxxxxxx> wrote:
On 2024-08-22 7:37 pm, mhkelley58@xxxxxxxxx wrote:
From: Michael Kelley <mhklinux@xxxxxxxxxxx>
Background
==========
Linux device drivers may make DMA map/unmap calls in contexts that
cannot block, such as in an interrupt handler. Consequently, when a
DMA map call must use a bounce buffer, the allocation of swiotlb
memory must always succeed immediately. If swiotlb memory is
exhausted, the DMA map call cannot wait for memory to be released. The
call fails, which usually results in an I/O error.
Bounce buffers are usually used infrequently for a few corner cases,
so the default swiotlb memory allocation of 64 MiB is more than
sufficient to avoid running out and causing errors. However, recently
introduced Confidential Computing (CoCo) VMs must use bounce buffers
for all DMA I/O because the VM's memory is encrypted. In CoCo VMs
a new heuristic allocates ~6% of the VM's memory, up to 1 GiB, for
swiotlb memory. This large allocation reduces the likelihood of a
spike in usage causing DMA map failures. Unfortunately, for most
workloads, this insurance against spikes comes at the cost of
potentially "wasting" hundreds of MiB of the VM's memory, as swiotlb
memory can't be used for other purposes.
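
To put rough numbers on the sizing described above, here is a small
illustrative calculation. It is only a sketch: the text says "~6%", so
treating the fraction as exactly 6% is an assumption, and the function
and constant names are hypothetical, not kernel identifiers.

```python
MIB = 1 << 20
GIB = 1 << 30

# Default swiotlb size for ordinary (non-CoCo) systems, per the text above.
DEFAULT_SWIOTLB_BYTES = 64 * MIB


def coco_swiotlb_bytes(vm_mem_bytes, percent=6, cap_bytes=GIB):
    """Approximate the CoCo heuristic: ~6% of VM memory, capped at 1 GiB.

    'percent' is assumed to be exactly 6 for illustration only.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return min(vm_mem_bytes * percent // 100, cap_bytes)


if __name__ == "__main__":
    for gib in (4, 8, 16, 32, 64):
        size = coco_swiotlb_bytes(gib * GIB)
        extra = size - DEFAULT_SWIOTLB_BYTES
        print(f"{gib:3d} GiB VM: swiotlb ~{size / MIB:7.1f} MiB "
              f"({extra / MIB:+.1f} MiB vs. the 64 MiB default)")
```

Under these assumptions the cap is reached once the VM has roughly
17 GiB of memory or more; below that, the reservation scales linearly
with VM size.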
Approach
========
The goal is to significantly reduce the amount of memory reserved as
swiotlb memory in CoCo VMs, while not unduly increasing the risk of
DMA map failures due to memory exhaustion.
Isn't that fundamentally the same thing that SWIOTLB_DYNAMIC was already
meant to address? Of course the implementation of that is still young
and has plenty of scope to be made more effective, and some of the ideas
here could very much help with that, but I'm struggling a little to see
what's really beneficial about having a completely disjoint mechanism
for sitting around doing nothing in the precise circumstances where it
would seem most possible to allocate a transient buffer and get on with it.
This question can probably be best answered by Michael, but let me give
my understanding of the differences. First the similarity: Yes, one
of the key new concepts is that swiotlb allocation may block, and I
introduced a similar attribute in one of my dynamic SWIOTLB patches; it
was later dropped, but dynamic SWIOTLB would still benefit from it.
More importantly, dynamic SWIOTLB may deplete memory following an I/O
spike. I do have some ideas for how memory could be returned to the
allocator, but the code is not ready (unlike this patch series).
Moreover, it may still be a better idea to throttle the devices
instead, because returning DMA'able memory is not always cheap. In a
CoCo VM, this memory must be re-encrypted, and that requires a
hypercall that I'm told is expensive.
In short, IIUC it is faster in a CoCo VM to delay some requests a bit
than to grow the swiotlb.
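
To illustrate the distinction being discussed, here is a toy userspace
model (not kernel code) of a fixed-size bounce-buffer pool with two
allocation paths: one that must succeed or fail immediately, as in a
context that cannot block, and one that may wait for a slot to be
released, which is the throttling idea. The class and method names are
hypothetical and do not correspond to anything in the patch series.

```python
import threading


class BouncePool:
    """Toy model of a fixed-size pool of bounce-buffer slots."""

    def __init__(self, slots):
        self._free = slots
        self._cond = threading.Condition()

    def try_alloc(self):
        """Non-blocking path: succeed immediately or fail.

        Models a caller that cannot sleep (e.g. an interrupt handler);
        a False return corresponds to a failed DMA map call.
        """
        with self._cond:
            if self._free == 0:
                return False
            self._free -= 1
            return True

    def alloc_may_block(self, timeout=None):
        """Blocking path: wait until a slot is released.

        Models throttling: the request is delayed instead of failing
        or growing the pool. Returns False only if the timeout expires.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: self._free > 0, timeout):
                return False
            self._free -= 1
            return True

    def free(self):
        """Release one slot and wake one waiter, if any."""
        with self._cond:
            self._free += 1
            self._cond.notify()


if __name__ == "__main__":
    pool = BouncePool(slots=1)
    print("first non-blocking alloc:", pool.try_alloc())
    print("second non-blocking alloc:", pool.try_alloc())
    # Another thread releases the slot shortly; the blocking caller waits.
    threading.Timer(0.05, pool.free).start()
    print("blocking alloc:", pool.alloc_may_block(timeout=2.0))
```

The pool never grows in this model, so no memory has to be converted
and later handed back; the cost is that callers on the blocking path
are delayed during a spike.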