Re: [PATCH bpf-next] bpf: warn against BPF_RB_NO_WAKEUP in bpf_ringbuf_discard()

From: Eyal Birger

Date: Tue Mar 31 2026 - 09:02:57 EST


Hi,

On Mon, Mar 30, 2026 at 5:27 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Mon, Mar 30, 2026 at 5:18 AM Eyal Birger <eyal.birger@xxxxxxxxx> wrote:
> >
> > Document that BPF_RB_NO_WAKEUP is not recommended for
> > bpf_ringbuf_discard().
> >
> > A discard done with BPF_RB_NO_WAKEUP can suppress a later adaptive
> > wakeup from a valid record, leaving an epoll-based userspace consumer
> > asleep even though data is available in the ring buffer.
> >
> > Scenario:
> >
> > epoll_wait(rb_fd); // blocks
> >
> > rec = bpf_ringbuf_reserve(&rb, ...);
> > bpf_ringbuf_discard(rec, BPF_RB_NO_WAKEUP);
> >
> > rec = bpf_ringbuf_reserve(&rb, ...);
> > bpf_ringbuf_submit(rec, 0); // valid record, but no wakeup
> >
> > This behavior is surprising in the context of bpf_ringbuf_discard()
> > as it seems natural not to want to wake userspace.
>
> why? discarded record still takes space in ringbuf and needs to be
> consumed by user-space. So all the same concerns apply here whether
> it's discarded or submitted ringbuf record.

Right, but sadly this isn't obvious to someone who hasn't carefully
inspected the implementation.

>
> >
> > Reported-by: Shmulik Ladkani <shmulik.ladkani@xxxxxxxxx>
> > Signed-off-by: Eyal Birger <eyal.birger@xxxxxxxxx>
> > ---
> > include/uapi/linux/bpf.h | 3 ++-
> > tools/include/uapi/linux/bpf.h | 3 ++-
> > 2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index c8d400b7680a..c46b06d45904 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -4645,7 +4645,8 @@ union bpf_attr {
> > * Description
> > * Discard reserved ring buffer sample, pointed to by *data*.
> > * If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
> > - * of new data availability is sent.
> > + * of new data availability is sent, which is not recommended as
> > + * it can suppress a later adaptive wakeup from a subsequent submit.
>
> NAK, it's not "not recommended". If you are using NO_WAKEUP you
> normally would use it with FORCE_WAKEUP as well. It's not a rule, but
> that's the most practical combination. But if you are fancy and know
> what you are doing, you are free to combine NO_WAKEUP, FORCE_WAKEUP
> and "adaptive wakeup" in any form or shape you want.

There are still possible corner cases - for example, multiple silent
discards might fill the ringbuf preventing a later placement of a
valid record - even if it's meant to use "force".
Point is, the user should be extra careful, that fact is documented
in other places, and imho should be noted here too.

>
>
> If anything, instead, it might maybe help to emphasize that
> bpf_ringbuf_discard() does live data in the ringbuf that needs to be
> consumed by user-space code (normally transparently in libbpf or other
> BPF libraries), and the only distinction between submitted and
> discarded record is whether application's user code sees that record
> or it's silently skipped.

Ack. Will send a v2 in that spirit.

Eyal.
>
> > * If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
> > * of new data availability is sent unconditionally.
> > * If **0** is specified in *flags*, an adaptive notification
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index 5e38b4887de6..96de37c3b896 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -4645,7 +4645,8 @@ union bpf_attr {
> > * Description
> > * Discard reserved ring buffer sample, pointed to by *data*.
> > * If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
> > - * of new data availability is sent.
> > + * of new data availability is sent, which is not recommended as
> > + * it can suppress a later adaptive wakeup from a subsequent submit.
> > * If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
> > * of new data availability is sent unconditionally.
> > * If **0** is specified in *flags*, an adaptive notification
> > --
> > 2.43.0
> >