Re: Do we need to correct barriering in circular-buffers.rst?

From: Peter Zijlstra
Date: Mon Sep 23 2019 - 10:49:39 EST


On Thu, Sep 19, 2019 at 02:59:06PM +0100, David Howells wrote:

> But I don't agree with this. You're missing half the barriers. There should
> be *four* barriers. The document mandates only 3 barriers, and uses
> READ_ONCE() where the fourth should be, i.e.:
>
> thread #1 thread #2
>
> smp_load_acquire(head)
> ... read data from queue ..
> smp_store_release(tail)
>
> READ_ONCE(tail)
> ... add data to queue ..
> smp_store_release(head)
>

Notably your READ_ONCE() pseudo code is lacking a conditional;
kernel/events/ring_buffer.c writes it like so:

* kernel user
*
* if (LOAD ->data_tail) { LOAD ->data_head
* (A) smp_rmb() (C)
* STORE $data LOAD $data
* smp_wmb() (B) smp_mb() (D)
* STORE ->data_head STORE ->data_tail
* }
*
* Where A pairs with D, and B pairs with C.
*
* In our case (A) is a control dependency that separates the load of
* the ->data_tail and the stores of $data. In case ->data_tail
* indicates there is no room in the buffer to store $data we do not.
*
* D needs to be a full barrier since it separates the data READ
* from the tail WRITE.
*
* For B a WMB is sufficient since it separates two WRITEs, and for C
* an RMB is sufficient since it separates two READs.

Where 'kernel' is the producer and 'user' is the consumer. This was
written before load-acquire and store-release came about (I _think_),
and I've so far resisted updating B to store-release because smp_wmb()
is actually cheaper than store-release on a number of architectures
(notably ARM).

C ought to be a load-aquire, and D really should be a store-release, but
I don't think the perf userspace has that (or uses C11).