Re: [PATCH v2 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

From: Petr Tesařík
Date: Tue Mar 28 2023 - 11:11:41 EST


On Tue, 28 Mar 2023 14:29:03 +0000
"Michael Kelley (LINUX)" <mikelley@xxxxxxxxxxxxx> wrote:

> From: Petr Tesařík <petr@xxxxxxxxxxx> Sent: Tuesday, March 28, 2023 6:50 AM
> >
> > On Tue, 28 Mar 2023 13:12:13 +0000
> > "Michael Kelley (LINUX)" <mikelley@xxxxxxxxxxxxx> wrote:
> >
> > > From: Christoph Hellwig <hch@xxxxxxxxxxxxx> Sent: Monday, March 27, 2023 6:34 PM
> > > >
> > > > On Sat, Mar 25, 2023 at 10:53:10AM -0700, Michael Kelley wrote:
> > > > > @@ -659,6 +663,14 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
> > > > >  	area->index = wrap_area_index(mem, index + nslots);
> > > > >  	area->used += nslots;
> > > > >  	spin_unlock_irqrestore(&area->lock, flags);
> > > > > +
> > > > > +	new_used = atomic_long_add_return(nslots, &total_used);
> > > > > +	old_hiwater = atomic_long_read(&used_hiwater);
> > > > > +	do {
> > > > > +		if (new_used <= old_hiwater)
> > > > > +			break;
> > > > > +	} while (!atomic_long_try_cmpxchg(&used_hiwater, &old_hiwater, new_used));
> > > > > +
> > > > >  	return slot_index;
> > > >
> > > > Hmm, so we're right in the swiotlb hot path here and add two new global
> > > > atomics?
>[...]
> > For my purposes, it does not have to be 100% accurate. I don't really
> > mind if it is off by a few slots because of a race window, so we could
> > (for instance):
> >
> > - update a local variable and set the atomic after the loop,
> > - or make it a per-cpu to reduce CPU cache bouncing,
> > - or just about anything that is less heavy-weight than an atomic
> > CMPXCHG in the inner loop of a slot search.
> >
>
> Perhaps I'm missing your point, but there's no loop here. The atomic
> add is done once per successful slot allocation. If swiotlb_do_find_slots()
> doesn't find any slots for the current area, it exits at the "not_found" label
> and the atomic add isn't done.

My bad. I read the patch too quickly and thought that the update was
done for each searched area. I stand corrected.

>[...]
> I thought about tracking the high water mark on a per-CPU basis or
> per-area basis, but I don't think the resulting data is useful. Adding up
> the individual high water marks likely significantly over-estimates the
> true high water mark. Is there a clever way to make this useful that I'm
> not thinking about?

No, not that I'm aware of. Min/max cannot be easily split: each CPU
may hit its own peak at a different time, so summing per-CPU high
water marks over-estimates the true global peak.

>[...]
> Regarding your other email about non-default io_tlb_mem instances,
> my patch just extends what is already reported in debugfs, which
> is only for the default io_tlb_mem. The non-default instances seemed
> to me to be fairly niche cases that weren't worth the additional
> complexity, but maybe I'm wrong about that.

What I mean is that the values currently reported in debugfs only refer
to io_tlb_default_mem. Since restricted DMA pools also use
swiotlb_find_slots() and swiotlb_release_slots(), the global counters
now get updated for both io_tlb_default_mem and all restricted DMA
pools.

In short, this hunk is a change in behaviour:

 static int io_tlb_used_get(void *data, u64 *val)
 {
-	*val = mem_used(&io_tlb_default_mem);
+	*val = (u64)atomic_long_read(&total_used);
 	return 0;
 }

Before the change, it shows the number of used slots in the default
SWIOTLB; after the change, it shows the total number of used slots in
the default SWIOTLB plus all restricted DMA pools.
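One way to keep the old semantics would be to track the counters
per-io_tlb_mem instead of globally, roughly like the untested sketch
below. The total_used and used_hiwater fields and the helper are
hypothetical additions, not part of your patch, and each debugfs file
would then pass its own io_tlb_mem instance as "data":

struct io_tlb_mem {
	/* ... existing fields ... */
	atomic_long_t total_used;	/* slots currently in use */
	atomic_long_t used_hiwater;	/* high water mark of total_used */
};

static void inc_used_and_hiwater(struct io_tlb_mem *mem, unsigned int nslots)
{
	long new_used, old_hiwater;

	new_used = atomic_long_add_return(nslots, &mem->total_used);
	old_hiwater = atomic_long_read(&mem->used_hiwater);
	do {
		if (new_used <= old_hiwater)
			break;
	} while (!atomic_long_try_cmpxchg(&mem->used_hiwater,
					  &old_hiwater, new_used));
}

static int io_tlb_used_get(void *data, u64 *val)
{
	struct io_tlb_mem *mem = data;	/* instance this debugfs file describes */

	*val = (u64)atomic_long_read(&mem->total_used);
	return 0;
}

The files under the default debugfs directory would then keep
reporting io_tlb_default_mem only, and a restricted DMA pool could
expose its own counters if anyone cares.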

Petr T