Re: [RFC 1/1] mm: Add per-task struct tlb counters

From: Joe Damato
Date: Wed Sep 14 2022 - 10:24:04 EST


On Wed, Sep 14, 2022 at 01:58:27PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 14, 2022 at 12:40:55AM -0700, Dave Hansen wrote:
> > Why didn't the tracepoints work for you?
>
> This; perf should be able to get you per-task slices of those events.

Thanks for taking a look; I replied to Dave with a longer-form response,
but IMHO, tracepoints are helpful only in specific circumstances.

On a heavily loaded system with O(10,000) or O(100,000) tasks, tracepoints
can be difficult to use, especially if the TLB shootdowns are anomalous:
they arrive in large bursts at unknown intervals and are difficult to
reproduce.

IMHO, being able to periodically scrape /proc and see that a particular
process is hitting a large TLB shootdown storm would tell you when to
apply perf (and to which specific tasks) in order to debug the issue.
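To make the workflow concrete, here is a rough userspace sketch of such a
scraper. It is only an illustration under stated assumptions: this message
does not say which /proc file or field name the RFC exposes, so the
"tlb_shootdowns" key in /proc/<pid>/status is hypothetical, as are the
helper names and the rate threshold. It samples every task, diffs the
counts between two samples, and flags the PIDs whose counter grew faster
than the threshold, i.e. the tasks worth pointing perf at. (Per-thread
values, if exposed, would live under /proc/<pid>/task/<tid>/ instead.)

```python
import os
import time

# HYPOTHETICAL: the field name and its location in /proc/<pid>/status are
# assumptions for illustration; the RFC's actual interface may differ.
TLB_KEY = "tlb_shootdowns"


def parse_tlb_count(status_text):
    """Return the assumed TLB counter from status-style text, or None."""
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        if sep and key.strip() == TLB_KEY and parts and parts[0].isdigit():
            return int(parts[0])
    return None


def read_tlb_count(pid):
    """Read one task's counter; None if it exited or lacks the field."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_tlb_count(f.read())
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        return None


def sample_all():
    """Snapshot the counter for every numeric entry in /proc."""
    counts = {}
    for name in os.listdir("/proc"):
        if name.isdigit():
            value = read_tlb_count(int(name))
            if value is not None:
                counts[int(name)] = value
    return counts


def find_storms(prev, cur, interval_s, threshold_per_s):
    """Return sorted PIDs whose counter grew faster than threshold_per_s."""
    if interval_s <= 0:
        return []
    storms = []
    for pid, count in cur.items():
        before = prev.get(pid)
        if before is None or count < before:
            continue  # new task, or PID reused since the last sample
        if (count - before) / interval_s > threshold_per_s:
            storms.append(pid)
    return sorted(storms)


def monitor(interval_s=10.0, threshold_per_s=1000.0):
    """Loop forever, reporting tasks that look like they are in a storm."""
    prev = sample_all()
    while True:
        time.sleep(interval_s)
        cur = sample_all()
        for pid in find_storms(prev, cur, interval_s, threshold_per_s):
            print(f"pid {pid}: possible TLB shootdown storm; "
                  f"consider `perf record -p {pid}`")
        prev = cur
```

Calling monitor() runs until interrupted. Keeping the detection logic in
find_storms(), separate from the /proc I/O, means the threshold policy can
be tuned without touching the scraping code.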