Re: Some -serious- BPF-related litmus tests

From: Joel Fernandes
Date: Thu May 28 2020 - 17:48:31 EST


On Mon, May 25, 2020 at 11:38:23AM -0700, Andrii Nakryiko wrote:
> On Mon, May 25, 2020 at 7:53 AM Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
> >
> > Hi Andrii,
> >
> > On Fri, May 22, 2020 at 12:38:21PM -0700, Andrii Nakryiko wrote:
> > > On 5/22/20 10:43 AM, Paul E. McKenney wrote:
> > > > On Fri, May 22, 2020 at 10:32:01AM -0400, Alan Stern wrote:
> > > > > On Fri, May 22, 2020 at 11:44:07AM +0200, Peter Zijlstra wrote:
> > > > > > On Thu, May 21, 2020 at 05:38:50PM -0700, Paul E. McKenney wrote:
> > > > > > > Hello!
> > > > > > >
> > > > > > > Just wanted to call your attention to some pretty cool and pretty serious
> > > > > > > litmus tests that Andrii did as part of his BPF ring-buffer work:
> > > > > > >
> > > > > > > https://lore.kernel.org/bpf/20200517195727.279322-3-andriin@xxxxxx/
> > > > > > >
> > > > > > > Thoughts?
> > > > > >
> > > > > > I find:
> > > > > >
> > > > > > smp_wmb()
> > > > > > smp_store_release()
> > > > > >
> > > > > > a _very_ weird construct. What is that supposed to even do?
> > > > >
> > > > > Indeed, it looks like one or the other of those is redundant (depending
> > > > > on the context).
> > > >
> > > > Probably. Peter instead asked what it was supposed to even do. ;-)
> > >
> > > I agree, I think smp_wmb() is redundant here. Can't remember why I thought
> > > that it's necessary, this algorithm went through a bunch of iterations,
> > > starting as completely lockless, also using READ_ONCE/WRITE_ONCE at some
> > > point, and settling on smp_load_acquire/smp_store_release, eventually. Maybe
> > > there was some reason, but might be that I was just over-cautious. See reply
> > > on patch thread as well ([0]).
> > >
> > > [0] https://lore.kernel.org/bpf/CAEf4Bza26AbRMtWcoD5+TFhnmnU6p5YJ8zO+SoAJCDtp1jVhcQ@xxxxxxxxxxxxxx/
> > >
> >
> > While we are at it, could you explain a bit on why you use
> > smp_store_release() on consumer_pos? I ask because IIUC, consumer_pos is
> > only updated at consumer side, and there is no other write at consumer
> > side that we want to order with the write to consumer_pos. So I fail
> > to find why smp_store_release() is necessary.
> >
> > I did the following modification on litmus tests, and I didn't see
> > different results (on States) between two versions of litmus tests.
> >
>
> This is needed to ensure that producer can reliably detect whether it
> needs to trigger poll notification.

Boqun's question is about the consumer side, though. Are you saying that, on
the consumer side, the loads prior to the smp_store_release() should have
been seen by the consumer? You are already using smp_load_acquire(), so that
should be satisfied already: smp_load_acquire() makes sure that the acquire
load happens before any later loads and stores.
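
To make the placement of the two primitives concrete, here is a minimal
one-slot sketch in plain C11. It is not the code from the patch: the struct,
the function names, and the single `data` slot are made up for illustration,
and C11 acquire/release only approximates smp_load_acquire() and
smp_store_release().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot model, not the ring-buffer code from the patch. */
struct mp_model {
	int data;			/* record payload, plain access */
	atomic_ulong producer_pos;
	atomic_ulong consumer_pos;
};

void mp_init(struct mp_model *m)
{
	m->data = 0;
	atomic_init(&m->producer_pos, 0);
	atomic_init(&m->consumer_pos, 0);
}

/* Producer: write the payload, then publish it with a release store. */
void mp_produce(struct mp_model *m, int value)
{
	unsigned long pos = atomic_load_explicit(&m->producer_pos,
						 memory_order_relaxed);

	m->data = value;
	atomic_store_explicit(&m->producer_pos, pos + 1, memory_order_release);
}

/* Consumer: returns true and fills *out if a record was available. */
bool mp_consume(struct mp_model *m, int *out)
{
	unsigned long cons = atomic_load_explicit(&m->consumer_pos,
						  memory_order_relaxed);
	/* Acquire load: the accesses below cannot be reordered before it. */
	unsigned long prod = atomic_load_explicit(&m->producer_pos,
						  memory_order_acquire);

	assert(cons <= prod);
	if (cons == prod)
		return false;

	*out = m->data;
	/* The consumer_pos update in question; the patch uses a release here. */
	atomic_store_explicit(&m->consumer_pos, cons + 1, memory_order_release);
	return true;
}
```

The acquire load of producer_pos is what keeps the read of `data` and the
consumer_pos update from moving ahead of it.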

> Basically, if the consumer catches up at about the same time as the
> producer commits a new record, we need to make sure that:
> - either the consumer sees updated producer_pos > consumer_pos, and thus
> knows that there is more data to consume (but the producer might not send
> a notification of new data in this case);
> - or the producer sees that the consumer already caught up (i.e.,
> consumer_pos == producer_pos before the currently committed record), and
> in such a case will definitely send a notification.

Could you set a variable on the producer side to emulate a notification, and
check that in the conditions at the end?
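
A rough sketch of that idea, in plain C11 rather than litmus syntax. All
names here are made up, `notified` is only a stand-in for the epoll wakeup,
and the default seq_cst atomics are used purely to keep the model simple;
that is stronger than acquire/release and is not a claim about what the ring
buffer itself needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; names are not taken from the patch. */
struct rb_model {
	atomic_ulong producer_pos;
	atomic_ulong consumer_pos;
	atomic_bool notified;	/* emulates "notification was sent" */
};

void rb_init(struct rb_model *rb, unsigned long prod, unsigned long cons)
{
	assert(cons <= prod);
	atomic_init(&rb->producer_pos, prod);
	atomic_init(&rb->consumer_pos, cons);
	atomic_init(&rb->notified, false);
}

/*
 * Producer: commit a record ending at new_pos, then set the flag if the
 * consumer had already caught up to the previous producer position.
 */
bool producer_commit(struct rb_model *rb, unsigned long new_pos)
{
	unsigned long old_pos = atomic_exchange(&rb->producer_pos, new_pos);

	if (atomic_load(&rb->consumer_pos) == old_pos) {
		atomic_store(&rb->notified, true);
		return true;
	}
	return false;
}

/*
 * Consumer: publish that everything up to pos is consumed, then re-check
 * producer_pos. Returns true if more data is visible.
 */
bool consumer_catch_up(struct rb_model *rb, unsigned long pos)
{
	atomic_store(&rb->consumer_pos, pos);
	return atomic_load(&rb->producer_pos) > pos;
}

/* The outcome to rule out: consumer saw nothing new and nobody notified. */
bool lost_wakeup(struct rb_model *rb, bool consumer_saw_data)
{
	return !consumer_saw_data && !atomic_load(&rb->notified);
}
```

In a litmus test, the `lost_wakeup()` condition would correspond to the
"exists" clause at the end, with the consumer's last read of producer_pos
and the flag as the observed values.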

thanks,

- Joel

>
> This is critical for correctness of epoll notifications.
> Unfortunately, the litmus tests don't test this notification aspect, as I
> originally couldn't figure out an invariant that could be defined to
> validate this. I'll give it another thought, though; maybe this time
> I'll come up with something.
>
> > Regards,
> > Boqun
> >
>
> [...]