RE: [PATCH v2 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

From: Michael Kelley (LINUX)
Date: Tue Mar 28 2023 - 10:29:24 EST


From: Petr Tesařík <petr@xxxxxxxxxxx> Sent: Tuesday, March 28, 2023 6:50 AM
>
> On Tue, 28 Mar 2023 13:12:13 +0000
> "Michael Kelley (LINUX)" <mikelley@xxxxxxxxxxxxx> wrote:
>
> > From: Christoph Hellwig <hch@xxxxxxxxxxxxx> Sent: Monday, March 27, 2023 6:34 PM
> > >
> > > On Sat, Mar 25, 2023 at 10:53:10AM -0700, Michael Kelley wrote:
> > > > @@ -659,6 +663,14 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
> > > > area->index = wrap_area_index(mem, index + nslots);
> > > > area->used += nslots;
> > > > spin_unlock_irqrestore(&area->lock, flags);
> > > > +
> > > > + new_used = atomic_long_add_return(nslots, &total_used);
> > > > + old_hiwater = atomic_long_read(&used_hiwater);
> > > > + do {
> > > > + if (new_used <= old_hiwater)
> > > > + break;
> > > > + } while (!atomic_long_try_cmpxchg(&used_hiwater, &old_hiwater, new_used));
> > > > +
> > > > return slot_index;
> > >
> > > Hmm, so we're right in the swiotlb hot path here and add two new global
> > > atomics?
> >
> > It's only one global atomic, except when the high water mark needs to be
> > bumped. That results in an initial transient of doing the second global
> > atomic, but then it won't be done unless there's a spike in usage or the
> > high water mark is manually reset to zero. Of course, there's a similar
> > global atomic subtract when the slots are released.
> >
> > Perhaps this accounting should go under #ifdef CONFIG_DEBUG_FS? Or
> > even add a swiotlb-specific debugfs config option to cover all the swiotlb
> > debugfs code. From Petr Tesarik's earlier comments, it sounds like there
> > is interest in additional accounting, such as for fragmentation.
>
> For my purposes, it does not have to be 100% accurate. I don't really
> mind if it is off by a few slots because of a race window, so we could
> (for instance):
>
> - update a local variable and set the atomic after the loop,
> - or make it a per-cpu to reduce CPU cache bouncing,
> - or just about anything that is less heavy-weight than an atomic
> CMPXCHG in the inner loop of a slot search.
>

Perhaps I'm missing your point, but there's no loop here. The atomic
add is done once per successful slot allocation. If swiotlb_do_find_slots()
doesn't find any slots for the current area, it exits at the "not_found" label
and the atomic add isn't done.

When the high water mark does need to be bumped, the try_cmpxchg()
sits in a loop only to handle the race where another CPU raises the
high water mark first. It should rarely execute more than once, and
again, it runs only when the high water mark actually changes.
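
To make the pattern concrete, here's a small standalone userspace
sketch of the same accounting (plain C11 atomics instead of the
kernel's atomic_long_t helpers; the function names are mine, not the
patch's). It shows the single atomic add on the successful allocation
path, the sticky high water mark updated via a compare-exchange retry
loop, and the matching subtract on release:

#include <stdatomic.h>
#include <stdio.h>

/*
 * Illustration only, not the kernel code: total_used gets one atomic
 * add per successful allocation; used_hiwater moves only when a new
 * maximum is reached, and the compare-exchange loops only if another
 * thread raised the mark between our read and our update.
 */
static atomic_long total_used;
static atomic_long used_hiwater;

static void account_alloc(long nslots)
{
	long new_used = atomic_fetch_add(&total_used, nslots) + nslots;
	long old_hiwater = atomic_load(&used_hiwater);

	while (new_used > old_hiwater &&
	       !atomic_compare_exchange_weak(&used_hiwater, &old_hiwater,
					     new_used))
		;	/* old_hiwater was refreshed; recheck and retry */
}

static void account_free(long nslots)
{
	/* The high water mark is sticky; only total_used goes back down. */
	atomic_fetch_sub(&total_used, nslots);
}

int main(void)
{
	account_alloc(8);
	account_alloc(4);
	account_free(8);
	printf("used=%ld hiwater=%ld\n",
	       atomic_load(&total_used), atomic_load(&used_hiwater));
	return 0;
}

In steady state only the fetch-add and fetch-sub run; the
compare-exchange path fires only while usage is climbing to a new peak.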

I thought about tracking the high water mark on a per-CPU basis or a
per-area basis, but I don't think the resulting data is useful. Adding
up the individual high water marks likely over-estimates the true high
water mark significantly: if, say, CPU A peaks at 1000 slots while CPU B
is nearly idle, and later CPU B peaks at 1000 while CPU A is nearly
idle, the per-CPU sum reports about 2000 even though global usage never
went much above 1000. Is there a clever way to make this useful that
I'm not thinking about?

Tracking the global high water mark with non-atomic add and subtract
could be done, with reduced accuracy. But I want to track the high
water mark over an extended period (days, or even weeks), and I don't
have a feel for how much accuracy would be lost to non-atomic
arithmetic on the global high water mark over that kind of interval.
It would still be a shared cache line either way.
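
If the per-CPU direction were pursued anyway, I'd expect it to look
roughly like the sketch below (purely illustrative names, nothing I've
implemented or measured): the hot path touches only a local counter,
but the total -- and therefore any high water mark derived from it --
can only be approximated at the point where the debugfs file is read,
which is exactly the accuracy loss I'm concerned about over a
multi-week window.

#include <linux/cpumask.h>
#include <linux/percpu.h>

/* Hypothetical per-CPU variant: cheap on the hot path, but the global
 * total is only an approximation computed at read time, so a true
 * instantaneous high water mark can't be recovered from it.
 */
static DEFINE_PER_CPU(long, total_used_pcpu);

static void pcpu_account_alloc(unsigned int nslots)
{
	this_cpu_add(total_used_pcpu, nslots);	/* no cross-CPU cache bouncing */
}

static void pcpu_account_free(unsigned int nslots)
{
	this_cpu_sub(total_used_pcpu, nslots);	/* may go negative locally */
}

static long approx_total_used(void)
{
	long used = 0;
	int cpu;

	/* Racy by design: allocations in flight can be missed or counted twice. */
	for_each_possible_cpu(cpu)
		used += per_cpu(total_used_pcpu, cpu);
	return used;
}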

Regarding your other email about non-default io_tlb_mem instances,
my patch just extends what is already reported in debugfs, which
is only for the default io_tlb_mem. The non-default instances seemed
to me to be fairly niche cases that weren't worth the additional
complexity, but maybe I'm wrong about that.
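
For reference, exposing the high water mark through the existing
default-io_tlb_mem debugfs directory could look roughly like this
(illustrative names, not a verbatim excerpt from the patch); writing 0
to the file resets the mark so a new observation interval can start:

#include <linux/atomic.h>
#include <linux/debugfs.h>

static atomic_long_t used_hiwater;

static int used_hiwater_get(void *data, u64 *val)
{
	*val = atomic_long_read(&used_hiwater);
	return 0;
}

static int used_hiwater_set(void *data, u64 val)
{
	/* Only allow resetting the high water mark, not arbitrary values. */
	if (val)
		return -EINVAL;
	atomic_long_set(&used_hiwater, 0);
	return 0;
}

DEFINE_DEBUGFS_ATTRIBUTE(fops_io_tlb_hiwater, used_hiwater_get,
			 used_hiwater_set, "%llu\n");

/* ...and in the existing swiotlb debugfs init for the default io_tlb_mem:
 *
 *	debugfs_create_file("io_tlb_used_hiwater", 0600, debugfs_dir, NULL,
 *			    &fops_io_tlb_hiwater);
 */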

Michael