Re: [PATCH tip/core/rcu 02/14] documentation: Fix control dependency and identical stores
From: Mathieu Desnoyers
Date: Wed Feb 24 2016 - 16:12:47 EST
----- On Feb 24, 2016, at 12:00 AM, Paul E. McKenney paulmck@xxxxxxxxxxxxxxxxxx wrote:
> The summary of the "CONTROL DEPENDENCIES" section incorrectly states that
> barrier() may be used to prevent compiler reordering when more than one
> leg of the control-dependent "if" statement start with identical stores.
> This is incorrect at high optimization levels. This commit therefore
> updates the summary to match the detailed description.
> Reported-by: Jianyu Zhan <nasa4836@xxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Documentation/memory-barriers.txt | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
> diff --git a/Documentation/memory-barriers.txt
> index 904ee42d078e..e26058d3e253 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -800,9 +800,13 @@ In summary:
> use smp_rmb(), smp_wmb(), or, in the case of prior stores and
> later loads, smp_mb().
> - (*) If both legs of the "if" statement begin with identical stores
> - to the same variable, a barrier() statement is required at the
> - beginning of each leg of the "if" statement.
> + (*) If both legs of the "if" statement begin with identical stores to
> + the same variable, then those stores must be ordered, either by
> + preceding both of them with smp_mb() or by using smp_store_release()
> + to carry out the stores. Please note that it is -not- sufficient
> + to use barrier() at beginning of each leg of the "if" statement,
> + as optimizing compilers do not necessarily respect barrier()
> + in this case.
Hrm, I really don't understand this one.
One caveat, as stated here, would be that optimizing compilers
can reorder instructions with respect to a barrier() placed at the
beginning of if/else legs that start with identical stores.
It goes on to state that "smp_mb() or smp_store_release()" should
be used rather than barrier() in those cases.
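For reference, the pattern under discussion looks roughly like the
sketch below. This is a user-space approximation, not kernel code:
READ_ONCE() and WRITE_ONCE() are stand-ins built on volatile accesses,
store_release() is a hypothetical stand-in for smp_store_release()
built on the GCC/Clang __atomic builtins, and the variable and
function names are mine.

```c
/* User-space stand-ins (assumptions, not the kernel definitions). */
#define barrier()           __asm__ __volatile__("" : : : "memory")
#define READ_ONCE(x)        (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)    (*(volatile __typeof__(x) *)&(x) = (v))
#define store_release(p, v) __atomic_store_n((p), (v), __ATOMIC_RELEASE)

static int a, b;
static int taken_leg;	/* records which leg of the "if" ran */

/* What the old summary text recommended: barrier() in each leg. */
static void legs_with_barrier(int p)
{
	int q = READ_ONCE(a);

	if (q) {
		barrier();
		WRITE_ONCE(b, p);	/* identical store, first leg */
		taken_leg = 1;
	} else {
		barrier();
		WRITE_ONCE(b, p);	/* identical store, second leg */
		taken_leg = 2;
	}
}

/* What the new summary text recommends: release stores in each leg. */
static void legs_with_release(int p)
{
	int q = READ_ONCE(a);

	if (q) {
		store_release(&b, p);
		taken_leg = 1;
	} else {
		store_release(&b, p);
		taken_leg = 2;
	}
}
```

Both variants produce the same result when run single-threaded; the
question is only what ordering between the load from "a" and the
store to "b" survives compilation.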
I don't get how, from a compiler optimization perspective,
barrier() is any different from smp_mb().
#define barrier() __asm__ __volatile__("": : :"memory")
#define mb() asm volatile("mfence":::"memory")
What the compiler would observe is a "memory" clobber in both cases.
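To make the comparison concrete, here is a minimal user-space sketch
of the two macros side by side. The non-x86 fallback using
__sync_synchronize() and the function and variable names are my own
assumptions for illustration, not anything from the kernel tree.

```c
/* Compiler-only barrier: emits no instruction, just a "memory" clobber. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Full barrier: mfence on x86-64; elsewhere an assumed builtin fallback. */
#if defined(__x86_64__)
#define mb() __asm__ __volatile__("mfence" : : : "memory")
#else
#define mb() __sync_synchronize()
#endif

static int x;

/* The "memory" clobber forces the compiler to reload x after barrier(). */
static int reload_across_barrier(void)
{
	int first = x;

	barrier();
	return first + x;
}

/* Same shape with mb(): the compiler sees the same "memory" clobber. */
static int reload_across_mb(void)
{
	int first = x;

	mb();
	return first + x;
}
```

I would expect "gcc -O2 -S" to generate the same loads for both
functions, with the mfence instruction as the only difference, which
is why I don't see how the compiler could treat them differently.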
Now if the stated cause of this issue had been
internal reordering of those identical stores within the
processor, I would understand that smp_mb() has an
effect which differs from the compiler barrier, but since
the paragraph begins by stating that this is purely for
compiler optimizations, I'm confused.
What am I missing here?
> (*) Control dependencies require at least one run-time conditional
> between the prior load and the subsequent store, and this