Re: [PATCH v5 3/4] bpf: Add libbpf logic for user-space ring buffer
From: David Vernet
Date: Mon Sep 19 2022 - 17:00:29 EST
On Mon, Sep 19, 2022 at 03:22:09PM -0500, David Vernet wrote:
[...]
> > > + timeout_ns = timeout_ms * ns_per_ms;
> > > + do {
> > > + __u64 ns_remaining = timeout_ns - ns_elapsed;
> > > + int cnt, ms_remaining;
> > > + void *sample;
> > > + struct timespec curr;
> > > +
> > > + sample = user_ring_buffer__reserve(rb, size);
> > > + if (sample)
> > > + return sample;
> > > + else if (errno != ENOSPC)
> > > + return NULL;
> > > +
> > > + ms_remaining = timeout_ms == -1 ? -1 : ns_remaining / ns_per_ms;
> >
> > ok, so you've special-cased timeout_ms == -1 but still didn't do
> > max(0, ns_remaining). Can you prove that ns_elapsed will never be
> > bigger than timeout_ns due to various delays in waking up this thread?
> > If not, let's please have max(0) otherwise we can accidentally
> > epoll_wait(-1).
>
> Yes you're right, this was an oversight. Thanks for catching this!
Wait, actually, this can't happen because of the while check at the end of
the loop:
while (ns_elapsed <= timeout_ns)
So I don't think the max is necessary to prevent underflowing, but I do
think we need to have one more attempt to invoke
user_ring_buffer__reserve() at the end of the function to account for
wakeup delays after epoll_wait() returns. Otherwise, we might think we've
timed out when there's actually a sample left in the buffer. I also still
think your suggestion below for cleaning up makes sense, so I'll still add
it in v6, but I think I can leave off the max() call.
>
> > > + /* The kernel guarantees at least one event notification
> > > + * delivery whenever at least one sample is drained from the
> > > + * ring buffer in an invocation to bpf_ringbuf_drain(). Other
> > > + * additional events may be delivered at any time, but only one
> > > + * event is guaranteed per bpf_ringbuf_drain() invocation,
> > > + * provided that a sample is drained, and the BPF program did
> > > + * not pass BPF_RB_NO_WAKEUP to bpf_ringbuf_drain().
> > > + */
> > > + cnt = epoll_wait(rb->epoll_fd, &rb->event, 1, ms_remaining);
> > > + if (cnt < 0)
> > > + return NULL;
> > > +
> > > + if (timeout_ms == -1)
> > > + continue;
> > > +
> > > + err = clock_gettime(CLOCK_MONOTONIC, &curr);
> > > + if (err)
> > > + return NULL;
> > > +
> > > + ns_elapsed = ns_elapsed_timespec(&start, &curr);
> >
> > nit: if you move re-calculation of ms_remaining and ns_remaining to
> > here, I think overall loop logic will be even more straightforwad. You
> > can initialize ms_remaining to -1 if timeout_ms < 0 and never
> > recalculate it, right? Note that you can also do ns_elapsed conversion
> > to ms right here and then keep everything else in ms (so no need for
> > timeout_ns, ns_remaining, etc).
>
> Sounds good, let me give this a shot in v6.
>
> Thanks for another detailed review!