Re: [PATCH 2/4] perf: Align user space counter reading with code

From: Ian Rogers
Date: Wed Jun 08 2022 - 20:08:13 EST


On Wed, Jun 8, 2022 at 4:25 PM Rob Herring <robh@xxxxxxxxxx> wrote:
>
> On Wed, Jun 8, 2022 at 4:44 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > Align the user space counter reading documentation with the code in
> > perf_mmap__read_self. Previously the documentation was based on the perf
> > rdpmc test, but now general purpose code is provided by libperf.
>
> IMO, this copy of not quite right code should just be removed perhaps
> with a pointer to perf_mmap__read_self(). It will just get out of date
> again. For example, the read loop might get rewritten with restartable
> sequences.

Thanks. The intent with the rdpmc test and the header is they be in
sync. The test flakes when testing at scale, why I'm here. Having the
code in the header makes it clear this is a Linux API and subject to
the Linux syscall note. I like the code in the header but not that it
is out of sync with the code used currently to read it.

> > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > ---
> > include/uapi/linux/perf_event.h | 32 ++++++++++++++++-----------
> > tools/include/uapi/linux/perf_event.h | 32 ++++++++++++++++-----------
> > 2 files changed, 38 insertions(+), 26 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index d37629dbad72..3b84e0ad0723 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -538,9 +538,13 @@ struct perf_event_mmap_page {
> > *
> > * if (pc->cap_usr_time && enabled != running) {
> > * cyc = rdtsc();
>
> Kind of x86 specific.

There is a reference to it below and so I let it be, the name rdpmc is
also very x86 but something of a sailed ship.

> > - * time_offset = pc->time_offset;
> > * time_mult = pc->time_mult;
> > * time_shift = pc->time_shift;
> > + * time_offset = pc->time_offset;
> > + * if (pc->cap_user_time_short) {
> > + * time_cycles = pc->time_cycles;
> > + * time_mask = pc->time_mask;
> > + * }
>
> This still misses the need for READ_ONCE().

I'd elided that as it is something of.a kernel API. I can add it back.

> > * }
> > *
> > * index = pc->index;
> > @@ -548,6 +552,9 @@ struct perf_event_mmap_page {
> > * if (pc->cap_user_rdpmc && index) {
> > * width = pc->pmc_width;
> > * pmc = rdpmc(index - 1);
> > + * pmc <<= 64 - width;
> > + * pmc >>= 64 - width;
> > + * count += pmc;
> > * }
> > *
> > * barrier();
> > @@ -590,25 +597,24 @@ struct perf_event_mmap_page {
> > * If cap_usr_time the below fields can be used to compute the time
> > * delta since time_enabled (in ns) using rdtsc or similar.
> > *
> > - * u64 quot, rem;
> > - * u64 delta;
> > - *
> > - * quot = (cyc >> time_shift);
> > - * rem = cyc & (((u64)1 << time_shift) - 1);
> > - * delta = time_offset + quot * time_mult +
> > - * ((rem * time_mult) >> time_shift);
> > + * cyc = time_cycles + ((cyc - time_cycles) & time_mask);
> > + * delta = time_offset + mul_u64_u32_shr(cyc, time_mult, time_shift);
>
> For documentation purposes, the original code was easier to read and
> this is just an optimization. What does mul_u64_u32_shr() do exactly?
> It's not documented.

My concern with expanding the function is the header and the code
aren't in sync and so we're not testing what we're advertising.
Finding the definition is not a huge challenge imo.

Thanks,
Ian

> > *
> > * Where time_offset,time_mult,time_shift and cyc are read in the
> > * seqcount loop described above. This delta can then be added to
> > - * enabled and possible running (if index), improving the scaling:
> > + * enabled and possible running (if index) to improve the scaling. Due
> > + * to event multiplexing, running maybe zero and so care is needed to
> > + * avoid division by zero.
> > *
> > * enabled += delta;
> > - * if (index)
> > + * if (idx)
> > * running += delta;
> > *
> > - * quot = count / running;
> > - * rem = count % running;
> > - * count = quot * enabled + (rem * enabled) / running;
> > + * if (running != 0) {
> > + * quot = count / running;
> > + * rem = count % running;
> > + * count = quot * enabled + (rem * enabled) / running;
> > + * }
> > */
> > __u16 time_shift;
> > __u32 time_mult;