Re: [PATCH v2] perf lock contention: Account contending locks too

From: Arnaldo Carvalho de Melo
Date: Thu Feb 29 2024 - 12:23:42 EST


On Wed, Feb 28, 2024 at 01:19:12PM -0800, Namhyung Kim wrote:
> On Wed, Feb 28, 2024 at 12:16 PM Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
> > On Wed, Feb 28, 2024 at 12:01:55PM -0800, Namhyung Kim wrote:
> > > On Wed, Feb 28, 2024 at 4:22 AM Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
> > > > On Tue, Feb 27, 2024 at 09:33:35PM -0800, Namhyung Kim wrote:
> > > > > Currently it accounts the contention using delta between timestamps in
> > > > > lock:contention_begin and lock:contention_end tracepoints. But it means
> > > > > the lock should see the both events during the monitoring period.

> > > > > Actually there are 4 cases that happen with the monitoring:

> > > > > monitoring period
> > > > > / \
> > > > > | |
> > > > > 1: B------+-----------------------+--------E
> > > > > 2: B----+-------------E |
> > > > > 3: | B-----------+----E
> > > > > 4: | B-------------E |
> > > > > | |
> > > > > t0 t1

> > > > > where B and E mean contention BEGIN and END, respectively. So it only
> > > > > accounts the case 4 for now. It seems there's no way to handle the case
> > > > > 1. The case 2 might be handled if it saved the timestamp (t0), but it
> > > > > lacks the information from the B notably the flags which shows the lock
> > > > > types. Also it could be a nested lock which it currently ignores. So
> > > > > I think we should ignore the case 2.

> > > > Perhaps have a separate output listing locks that were found to be with
> > > > at least tE - t0 time, with perhaps a backtrace at that END time?

> > > Do you mean long contentions in case 3? I'm not sure what do
> > > you mean by tE, but they started after t0 so cannot be greater

> > case 2

> > monitoring period
> > / \
> > | |
> > 2: B----+-------------E |
> > | | |
> > t0 tE t1
> >
> > We get a notification for event E, right? We don´t have one for B,
> > because it happened before we were monitoring.
>
> Ah, ok. But there should be too many events in case 2 and
> I don't think users want to see them all. And they don't have

So maybe a summary, something like:

N locks that were locked before this session started have been
released, no further info besides this histogram of in-session
durations:

0-N units of time: ++
N+1-M units of time: ++++
...

> flags. But maybe we can update the flag when it sees exactly
> the same callstack later.

The callstack, if going all the way to userspace may have the workload
targeted in the command line ( some pid, tid, CPU, etc) and thus would
point for things the user probably is interested than some other lock
that may affect it but indirectly.

- Arnaldo