Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF
From: Namhyung Kim
Date: Mon Oct 28 2024 - 14:57:04 EST
On Wed, Oct 23, 2024 at 02:24:13PM -0700, Andrii Nakryiko wrote:
> On Wed, Oct 23, 2024 at 1:32 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > On Wed, Oct 23, 2024 at 12:13:31PM -0700, Andrii Nakryiko wrote:
> > > On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > >
> > > > Hello,
> > > >
> > > > On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > > > > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > As with software events, the BPF overflow handler can drop samples
> > > > > > by returning 0. Let's count the dropped samples here too.
> > > > > >
> > > > > > Acked-by: Kyle Huey <me@xxxxxxxxxxxx>
> > > > > > Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > > > > Cc: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > > > > Cc: Song Liu <song@xxxxxxxxxx>
> > > > > > Cc: bpf@xxxxxxxxxxxxxxx
> > > > > > Signed-off-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> > > > > > ---
> > > > > > kernel/events/core.c | 4 +++-
> > > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > > > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > > > > > --- a/kernel/events/core.c
> > > > > > +++ b/kernel/events/core.c
> > > > > > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > > > > > ret = __perf_event_account_interrupt(event, throttle);
> > > > > >
> > > > > > if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > > > > > - !bpf_overflow_handler(event, data, regs))
> > > > > > + !bpf_overflow_handler(event, data, regs)) {
> > > > > > + atomic64_inc(&event->dropped_samples);
> > > > >
> > > > > I don't see the full patch set (please CC the relevant people and the
> > > > > mailing list on each patch in the patch set), but do we really want to pay the
> > > >
> > > > Sorry, you can find the whole series here.
> > > >
> > > > https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@xxxxxxxxxx
> > > >
> > > > I thought it was mostly about the perf part, so I didn't CC the BPF
> > > > folks, but I'll do so in the next version.
> > > >
> > > >
> > > > > price of an atomic increment in what's the very typical situation of a
> > > > > BPF program returning 0?
> > > >
> > > > Is it typical for BPF_PROG_TYPE_PERF_EVENT? I guess TRACING programs
> > > > usually return 0, but PERF_EVENT programs should care about the return value.
> > > >
> > >
> > > Yeah, it's pretty much always `return 0;` for perf_event-based BPF
> > > profilers. It's rather unusual to return non-zero, actually.
> >
> > Ok, then it could be a local_t or a plain unsigned long, updated only
> > in the overflow path. Reads can be racy, but I think it's ok to miss
> > some counts. If someone really needs a precise count, they can read it
> > after disabling the event, IMHO.
> >
> > What do you think?
> >
>
> See [0] for a case where an unsynchronized increment absolutely killed
> performance due to cache line bouncing between CPUs. If the event is
> high-frequency and can be triggered across multiple CPUs in short
> succession, even an imprecise counter might be harmful.
Ok.
>
> In general, I'd say that if BPF attachment is involved, we probably
> should avoid maintaining unnecessary statistics. Things like this
> event->dropped_samples increment can be done very efficiently and
> trivially from inside the BPF program, if at all necessary.
Right, we can do that in the BPF program too.
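For example, something like the rough (untested) sketch below would count
the drops on the BPF side with a per-CPU map. The map and program names
here are made up just for illustration:

#include <linux/bpf.h>
#include <linux/bpf_perf_event.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} dropped_samples SEC(".maps");

SEC("perf_event")
int on_sample(struct bpf_perf_event_data *ctx)
{
	__u32 key = 0;
	__u64 *cnt;

	/* ... filtering logic that decides to drop this sample ... */

	cnt = bpf_map_lookup_elem(&dropped_samples, &key);
	if (cnt)
		(*cnt)++;	/* per-CPU value, no atomics needed */

	/* returning 0 makes the kernel drop the sample */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

Userspace can then sum the per-CPU values with bpf_map_lookup_elem() when
it wants the total.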
>
> Having said that, if it's unlikely to have perf_event bouncing between
> multiple CPUs, it's probably not that big of a deal.
Yeah, a perf_event is dedicated to a single CPU or a single task, and the
counter is updated only in the overflow handler. So I don't think it'd
cause cache line bouncing between CPUs.
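To make it concrete, here's roughly what I had in mind as an untested
sketch, assuming dropped_samples becomes a plain u64 instead of an
atomic64_t:

	ret = __perf_event_account_interrupt(event, throttle);

	if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
	    !bpf_overflow_handler(event, data, regs)) {
		/*
		 * Overflows for this event fire on its own CPU (or in its
		 * task context), so a plain increment is enough.  Readers
		 * may see a slightly stale value, which is acceptable.
		 */
		event->dropped_samples++;
		return ret;
	}

If a precise value is needed, it can be read after disabling the event as
mentioned above.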
Thanks,
Namhyung
>
>
> [0] https://lore.kernel.org/linux-trace-kernel/20240813203409.3985398-1-andrii@xxxxxxxxxx/
>
> > Thanks,
> > Namhyung
> >