Re: [PATCH v4 0/4] perf: Support PERF_SAMPLE_READ with inherit_stat

From: Ben Gainey
Date: Tue Apr 02 2024 - 06:18:49 EST


On Wed, 2024-03-27 at 13:34 -0700, Namhyung Kim wrote:
> Hello,
>
> On Mon, Mar 25, 2024 at 3:12 AM Ben Gainey <Ben.Gainey@xxxxxxx>
> wrote:
> >
> > On Fri, 2024-03-22 at 18:22 -0700, Namhyung Kim wrote:
> > > On Fri, Mar 22, 2024 at 9:42 AM Ben Gainey <ben.gainey@xxxxxxx>
> > > wrote:
> > > >
> > > > This change allows events to use PERF_SAMPLE READ with inherit
> > > > so
> > > > long
> > > > as both inherit_stat and PERF_SAMPLE_TID are set.
> > > >
> > > > Currently it is not possible to use PERF_SAMPLE_READ with
> > > > inherit.
> > > > This
> > > > restriction assumes the user is interested in collecting
> > > > aggregate
> > > > statistics as per `perf stat`. It prevents a user from
> > > > collecting
> > > > per-thread samples using counter groups from a multi-threaded
> > > > or
> > > > multi-process application, as with `perf record -e '{....}:S'`.
> > > > Instead
> > > > users must use system-wide mode, or forgo the ability to sample
> > > > counter
> > > > groups. System-wide mode is often problematic as it requires
> > > > specific
> > > > permissions (no CAP_PERFMON / root access), or may lead to
> > > > capture
> > > > of
> > > > significant amounts of extra data from other processes running
> > > > on
> > > > the
> > > > system.
> > > >
> > > > Perf already supports the ability to collect per-thread counts
> > > > with
> > > > `inherit` via the `inherit_stat` flag. This patch changes
> > >
> > > I'm not sure about this part. IIUC inherit and inherit_stat is
> > > not
> > > for
> > > per-thread counts, it only supports per-process (including
> > > children)
> > > events.
> >
> > Hi Namhyung
> >
> > Thanks for the comments...
> >
> > I don't think this is correct, if you compare the behaviour of
> >
> > perf record --no-inherit ... <some-forking-processes>
> > perf script -F pid,tid | sort -u
> > and
> > perf record --no-inherit ... <some-multithreaded-processes>
> > perf script -F pid,tid | sort -u
> >
> > vs
> >
> > perf record ... <some-forking-processes>
> > perf script -F pid,tid | sort -u
> > and
> > perf record .. <some-multithreaded-processes>
> > perf script -F pid,tid | sort -u
> >
> > The behaviour is consistent with the fact that no-inherit only
> > records
> > the primary thread of the primary process, whereas in the inherit
> > case
> > any child tasks (either threads or forked processes) is recorded.
>
> Right, I was talking about the counting behavior not sampling
> as inherit_stat is only for the counting. I think it'd return an
> error
> if event attr has both sample_freq and inherit_stat.
>
>
> > >
> > >
> > > > `perf_event_alloc` relaxing the restriction to combine
> > > > `inherit`
> > > > with
> > > > `PERF_SAMPLE_READ` so that the combination will be allowed so
> > > > long
> > > > as
> > > > `inherit_stat` and `PERF_SAMPLE_TID` are enabled.
> > >
> > > Anyway, does it really need 'inherit_stat'? I think it's only
> > > for
> > > counting use cases (e.g. 'perf stat') not for sampling.
> >
> >
> > I would be very happy to remove the inherit_stat requirement. When
> > I
> > first came to this it seemed like the logic was all there in
> > inherit_stat already, but now that I have to take a different path
> > in
> > `perf_event_context_sched_out` I suspect it should be trivial to
> > remove
> > the inherit_stat requirement.
>
> ok.

>
> > >
> > > Also technically, it can have PERF_SAMPLE_STREAM_ID instead
> > > of PERF_SAMPLE_TID to distinguish the counter values.
> >
> >
> > It looks like you are correct, but the ID given in the read_format
> > part
> > of PERF_SAMPLE_RECORD is the ID rather than STREAM_ID. (I had
> > incorrectly thought/stated it was the latter). Hence when
> > processing
> > the read_format values in the sample record, we either need to use
> > the
> > TID to uniquely identify the source, or we would need to modify the
> > read_format to (additionally) include the STREAM_ID.
> >
> > * The current approach in tools uses the ID+TID, which puts more
> > complexity in the tools but means there isn't an extra field in the
> > read_format data (for each value).
> > * Alternatively I could introduce a PERF_FORMAT_STREAM_ID; I would
> > expect that the user/tool would need to specify
> > PERF_FORMAT_ID|PERF_FORMAT_STREAM_ID as they would need to use the
> > ID
> > to lookup the correct perf_event_attr, but could use the STREAM_ID
> > to
> > uniquely identify the child event. This approach would add an extra
> > u64
> > per value in the read_format data but is possibly simpler/safer for
> > tools?
> >
> > Any preferences?
>
> I think it's better to use TID + ID. IIUC there's no way to track
> STREAM_ID for new children other than getting it from sample.
> As sample has TID already it'd be meaningless using STREAM_ID
> to distinguish events.


So i did implement a prototype version of this where:

PERF_FORMAT_STREAM_ID is added, and `perf record` is modified to
request that in addition to PERF_FORMAT_ID.

The PERF_FORMAT_ID is necessary to identify which perf_event_attr/evsel
the counter represents, and the PERF_FORMAT_STREAM_ID uniquely
identifies the child thread. It requires almost all the same plumbing
as the previous implementation... I just modified my
`perf_sample_id__get_period_storage` in patch 3/4 to take the u64
stream id instead of the u32 tid. I expect there could be some cleanup
there, but I guess it highlights the fact that they amount to the same
outcome.

I'll park that and push out a new patch set with just the inherit
requirement.

Thanks

Ben
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.