Re: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

From: Petr Tesařík
Date: Fri Apr 07 2023 - 06:57:07 EST


On Fri, 31 Mar 2023 21:45:00 -0700
Michael Kelley <mikelley@xxxxxxxxxxxxx> wrote:

> swiotlb currently reports the total number of slabs and the instantaneous
> in-use slabs in debugfs. But with increased usage of swiotlb for all I/O
> in Confidential Computing (coco) VMs, it has become difficult to know
> how much memory to allocate for swiotlb bounce buffers, either via the
> automatic algorithm in the kernel or by specifying a value on the
> kernel boot line. The current automatic algorithm generously allocates
> swiotlb bounce buffer memory, and may be wasting significant memory in
> many use cases.
>
> To support better understanding of swiotlb usage, add tracking of the
> high water mark usage of the default swiotlb bounce buffer memory
> pool. Report the high water mark in debugfs along with the other swiotlb
> metrics. Allow the high water mark to be reset to zero at runtime by
> writing to it.
>
> Signed-off-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
> ---
> Changes in v3:
> * Do high water mark accounting only when CONFIG_DEBUG_FS=y. As
> a result, add back the mem_used() function for the "swiotlb
> buffer is full" error message. [Christoph -- I didn't hear back
> whether this approach addresses your concern about one additional
> atomic operation when slots are allocated and again when freed. I've
> gone ahead with this new version, and we can obviously have further
> discussion.]
>
> * Remove unnecessary u64 casts. [Christoph Hellwig]
>
> * Track slot usage and the high water mark only for io_tlb_default_mem.
> Previous versions incorrectly included per-device pools. [Petr Tesarik]
>
> Changes in v2:
> * Only reset the high water mark to zero when the specified new value
> is zero, to prevent confusion about the ability to reset to some
> other value [Dexuan Cui]
>
> kernel/dma/swiotlb.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index d3d6be0..6587a3d 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -76,6 +76,9 @@ struct io_tlb_slot {
> static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
> static unsigned long default_nareas;
>
> +static atomic_long_t total_used = ATOMIC_LONG_INIT(0);
> +static atomic_long_t used_hiwater = ATOMIC_LONG_INIT(0);
> +
> /**
> * struct io_tlb_area - IO TLB memory area descriptor
> *
> @@ -594,6 +597,7 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
> unsigned long flags;
> unsigned int slot_base;
> unsigned int slot_index;
> + unsigned long old_hiwater, new_used;
>
> BUG_ON(!nslots);
> BUG_ON(area_index >= mem->nareas);
> @@ -663,6 +667,17 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
> area->index = 0;
> area->used += nslots;
> spin_unlock_irqrestore(&area->lock, flags);
> +
> + if (IS_ENABLED(CONFIG_DEBUG_FS) && mem == &io_tlb_default_mem) {

Yes, this works fine now, but why are total_used and used_hiwater
global variables? If you make them fields in struct io_tlb_mem
(possibly guarded with #ifdef CONFIG_DEBUG_FS), you can get rid of the
check. Of course, for instances other than io_tlb_default_mem these
fields would not be exported to userspace through debugfs, but if they
are really needed, I can at least find them in a crash dump (or read
them through /proc/kcore).
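
A rough userspace sketch of the per-instance idea, using C11 atomics in
place of the kernel's atomic_long_t API. The struct and function names
(tlb_pool_model, pool_inc_used, pool_dec_used) are made up for
illustration and are not taken from the patch; the real fields would
live in struct io_tlb_mem, possibly under #ifdef CONFIG_DEBUG_FS.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace model only: stands in for struct io_tlb_mem (hypothetical). */
struct tlb_pool_model {
	unsigned long nslabs;
	atomic_ulong total_used;	/* slots currently in use */
	atomic_ulong used_hiwater;	/* highest total_used seen so far */
};

/* Account for newly allocated slots and raise the high water mark. */
static void pool_inc_used(struct tlb_pool_model *mem, unsigned int nslots)
{
	unsigned long new_used, old_hiwater;

	assert(nslots > 0);
	new_used = atomic_fetch_add(&mem->total_used, nslots) + nslots;
	old_hiwater = atomic_load(&mem->used_hiwater);
	/*
	 * Retry until the mark is at least new_used. On failure the
	 * compare-exchange reloads old_hiwater, so the loop stops as soon
	 * as another thread has stored an equal or higher value.
	 */
	while (new_used > old_hiwater &&
	       !atomic_compare_exchange_weak(&mem->used_hiwater,
					     &old_hiwater, new_used))
		;
}

/* Account for freed slots; the high water mark is left untouched. */
static void pool_dec_used(struct tlb_pool_model *mem, unsigned int nslots)
{
	atomic_fetch_sub(&mem->total_used, nslots);
}
```

Because the counters belong to each instance, the allocation path needs
no comparison against io_tlb_default_mem; only the debugfs setup has to
decide which instance gets exported.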

Petr T