Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

From: Paul E. McKenney
Date: Sat Aug 17 2019 - 19:03:34 EST

On Sat, Aug 17, 2019 at 09:03:30PM +0100, Valentin Schneider wrote:
> Apologies to Steve for continuing this thread when all he wanted was moving
> an operation inside a mutex...
> On 17/08/2019 16:02, Mathieu Desnoyers wrote:
> [...]
> > However, if the state of "x" can be any pointer value, or a reference
> > count value, then not using "WRITE_ONCE()" to store a constant leaves
> > the compiler free to perform that store in more than one memory access.
> > Based on [1], section "Store tearing", there are situations where this
> > happens on x86 in the wild today when storing 64-bit constants: the
> > compiler is then free to decide to use two 32-bit immediate store
> > instructions.
> >
> That's also how I understand things, and it's also one of the points raised
> in the compiler barrier section of memory-barriers.txt
> Taking this store tearing, or the invented stores - e.g. the branch
> optimization pointed out by Linus:
> >     if (a)
> >         global_var = 1
> >     else
> >         global_var = 0
> >
> > then the compiler had better not turn that into
> >
> >     global_var = 0
> >     if (a)
> >         global_var = 1
> AFAICT nothing prevents this from happening inside a critical section (where
> the locking primitives provide the right barriers, but that's it). That's
> all fine when data is never accessed locklessly, but in the case of locked
> writes vs lockless reads, couldn't there be "leaks" of these transient
> states? In those cases we would want WRITE_ONCE() for the writes.
> So going back to:
> > But the reverse is not really true. All a READ_ONCE() says is "I want
> > either the old or the new value", and it can get that _without_ being
> > paired with a WRITE_ONCE().
> AFAIU it's not always the case, since a lone READ_ONCE() could get transient
> values.

Linus noted that he believes that compilers for architectures supporting
Linux can be trusted to avoid store-to-load transformations, invented
stores, and unnecessary store tearing. Should these appear, Linus would
report a bug against the compiler and expect it to be fixed.
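
For anyone who would rather not rely on that trust in the locked-write
vs. lockless-read case, here is a minimal userspace sketch. The
WRITE_ONCE() and READ_ONCE() macros below are simplified stand-ins built
on volatile casts, not the kernel's actual definitions, and global_var,
set_flag(), and peek_flag() are hypothetical names. It assumes GCC or
Clang (for __typeof__) and a naturally aligned int.

```c
#include <pthread.h>

/* Simplified stand-ins for the kernel macros (assumes GCC/Clang __typeof__). */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int global_var;	/* hypothetical shared variable */

/*
 * Locked writer.  Each volatile store must be emitted exactly as written,
 * so the compiler may not store 0 unconditionally and then fix it up,
 * and in practice it will not split an aligned int store.
 */
void set_flag(int a)
{
	pthread_mutex_lock(&lock);
	if (a)
		WRITE_ONCE(global_var, 1);
	else
		WRITE_ONCE(global_var, 0);
	pthread_mutex_unlock(&lock);
}

/* Lockless reader: a single volatile load, never fused or repeated. */
int peek_flag(void)
{
	return READ_ONCE(global_var);
}
```

With plain stores in set_flag(), the lock would still order the critical
section against other lock holders, but it would say nothing about what
a lockless peek_flag() might observe in the middle of it.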

> I'll be honest, it's not 100% clear to me when those optimizations can
> actually be done (maybe the branch thingy but the others are dubious), and
> it's even less clear when compilers *actually* do it - only that they have
> been reported to do it (so it's not made up).

There is significant unclarity inherent in the situation. The standard
says one thing, different compilers do other things, and developers
often expect yet a third thing. And sometimes things change over time,
for example, the ca. 2011 dictum against compilers inventing data races.
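
As a rough illustration of what the 2011 standards offer for this, the
sketch below uses C11 <stdatomic.h> relaxed accesses for a 64-bit value
of the kind discussed under "Store tearing". The names shared_val,
publish(), and observe() are hypothetical, and the sketch assumes a
target with native 64-bit atomics (for example x86-64); other targets
may fall back to a library implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t shared_val;	/* hypothetical shared variable */

/* Relaxed atomic store: no ordering implied, but the store is indivisible. */
void publish(uint64_t v)
{
	atomic_store_explicit(&shared_val, v, memory_order_relaxed);
}

/* Relaxed atomic load: returns some value that was stored in full. */
uint64_t observe(void)
{
	return atomic_load_explicit(&shared_val, memory_order_relaxed);
}
```

This is the standard's spelling of "either the old or the new value",
which is not the same thing as the kernel's volatile-based macros.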

Hey, they didn't teach me this aspect of software development in school,
either. ;-)

Thanx, Paul