RE: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

From: Michael Kelley (LINUX)
Date: Fri Apr 07 2023 - 18:05:34 EST


From: Petr Tesařík <petr@xxxxxxxxxxx> Sent: Friday, April 7, 2023 3:56 AM
>
> On Fri, 31 Mar 2023 21:45:00 -0700
> Michael Kelley <mikelley@xxxxxxxxxxxxx> wrote:
>
> > swiotlb currently reports the total number of slabs and the instantaneous
> > in-use slabs in debugfs. But with increased usage of swiotlb for all I/O
> > in Confidential Computing (coco) VMs, it has become difficult to know
> > how much memory to allocate for swiotlb bounce buffers, either via the
> > automatic algorithm in the kernel or by specifying a value on the
> > kernel boot line. The current automatic algorithm generously allocates
> > swiotlb bounce buffer memory, and may be wasting significant memory in
> > many use cases.
> >
> > To support better understanding of swiotlb usage, add tracking of the
> > high water mark usage of the default swiotlb bounce buffer memory
> > pool. Report the high water mark in debugfs along with the other swiotlb
> > metrics. Allow the high water mark to be reset to zero at runtime by
> > writing to it.
> >
> > Signed-off-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
> > ---
> > Changes in v3:
> > * Do high water mark accounting only when CONFIG_DEBUG_FS=y. As
> > a result, add back the mem_used() function for the "swiotlb
> > buffer is full" error message. [Christoph -- I didn't hear back
> > whether this approach addresses your concern about one additional
> > atomic operation when slots are allocated and again when freed. I've
> > gone ahead with this new version, and we can obviously have further
> > discussion.]
> >
> > * Remove unnecessary u64 casts. [Christoph Hellwig]
> >
> > * Track slot usage and the high water mark only for io_tlb_default_mem.
> > Previous versions incorrectly included per-device pools. [Petr Tesarik]
> >
> > Changes in v2:
> > * Only reset the high water mark to zero when the specified new value
> > is zero, to prevent confusion about the ability to reset to some
> > other value [Dexuan Cui]
> >
> > kernel/dma/swiotlb.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index d3d6be0..6587a3d 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -76,6 +76,9 @@ struct io_tlb_slot {
> > static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
> > static unsigned long default_nareas;
> >
> > +static atomic_long_t total_used = ATOMIC_LONG_INIT(0);
> > +static atomic_long_t used_hiwater = ATOMIC_LONG_INIT(0);
> > +
> > /**
> > * struct io_tlb_area - IO TLB memory area descriptor
> > *
> > @@ -594,6 +597,7 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
> > unsigned long flags;
> > unsigned int slot_base;
> > unsigned int slot_index;
> > + unsigned long old_hiwater, new_used;
> >
> > BUG_ON(!nslots);
> > BUG_ON(area_index >= mem->nareas);
> > @@ -663,6 +667,17 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
> > area->index = 0;
> > area->used += nslots;
> > spin_unlock_irqrestore(&area->lock, flags);
> > +
> > + if (IS_ENABLED(CONFIG_DEBUG_FS) && mem == &io_tlb_default_mem) {
>
> Yes, this works fine now, but why are total_used and used_hiwater
> global variables? If you make them fields in struct io_tlb_mem
> (possibly guarded with #ifdef CONFIG_DEBUG_FS), you can get rid of the
> check. Of course, in instances other than io_tlb_default_mem these
> fields would not be exported to userspace through debugfs, but if really
> needed, I can at least find them in a crash dump (or read them through
> /proc/kcore).
>

Got it.

Your previous comments mentioned making them fields in struct io_tlb_mem,
and I missed your point. :-( I got focused on fixing the accounting for
DEBUG_FS so it didn't include the non-default pools, and didn't pick up on the
idea of doing the accounting for the non-default pools even though the values
aren't exposed in /sys. I'll fix this in the next version.
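
To make the per-pool idea concrete, here is a rough userspace sketch using
C11 atomics. It is only an illustration, not the next version of the patch:
struct pool_stats and the pool_*() helpers are hypothetical names standing in
for fields in struct io_tlb_mem, and the kernel code would use atomic_long_t
rather than <stdatomic.h>. With the counters living in each pool, the
"mem == &io_tlb_default_mem" check is no longer needed on the allocation path.

```c
/* Userspace sketch only; hypothetical names, not the kernel implementation. */
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for counters that would become fields in struct io_tlb_mem. */
struct pool_stats {
	atomic_ulong total_used;   /* slots currently in use in this pool */
	atomic_ulong used_hiwater; /* highest total_used observed so far */
};

/* Account for newly allocated slots and raise the high water mark if needed. */
static void pool_inc_used(struct pool_stats *p, unsigned long nslots)
{
	unsigned long new_used, old_hiwater;

	assert(nslots > 0); /* userspace stand-in for BUG_ON(!nslots) */

	new_used = atomic_fetch_add(&p->total_used, nslots) + nslots;
	old_hiwater = atomic_load(&p->used_hiwater);

	/*
	 * Retry until the recorded mark is at least new_used. A failed
	 * compare-exchange reloads old_hiwater with the current value.
	 */
	while (new_used > old_hiwater &&
	       !atomic_compare_exchange_weak(&p->used_hiwater,
					     &old_hiwater, new_used))
		;
}

/* Account for freed slots; the high water mark is left untouched. */
static void pool_dec_used(struct pool_stats *p, unsigned long nslots)
{
	atomic_fetch_sub(&p->total_used, nslots);
}

/* Reset the high water mark, as a debugfs write of zero would. */
static void pool_reset_hiwater(struct pool_stats *p)
{
	atomic_store(&p->used_hiwater, 0);
}
```

Each pool would carry its own instance of these counters, and only the
default pool's values would be wired up to debugfs.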

Michael