Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

From: Will Deacon
Date: Wed Aug 21 2019 - 06:32:10 EST


On Tue, Aug 20, 2019 at 01:29:32PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 20, 2019 at 03:56:12PM +0200, Peter Zijlstra wrote:
> > On Sat, Aug 17, 2019 at 01:08:02AM -0700, Linus Torvalds wrote:
> >
> > > The data tearing issue is almost a non-issue. We're not going to add
> > > WRITE_ONCE() to these kinds of places for no good reason.
> >
> > Paulmck actually has an example of that somewhere; ISTR that particular
> > case actually got fixed by GCC, but I'd really _love_ for some compiler
> > people (both GCC and LLVM) to state that their respective compilers will
> > not do load/store tearing for machine word sized load/stores.
>
> I do very much recall such an example, but I am now unable to either
> find it or reproduce it. :-/
>
> If I cannot turn it up in a few days, I will ask the LWN editors to
> make appropriate changes to the "Who is afraid" article.
>
> > Without this written guarantee (which supposedly was in older GCC
> > manuals but has since gone missing), I'm loathe to rely on it.
> >
> > Yes, it is very rare, but it is a massive royal pain to debug if/when it
> > does do happen.
>
> But from what I can see, Linus is OK with use of WRITE_ONCE() for data
> races on any variable for which there is at least one READ_ONCE().
> So we can still use WRITE_ONCE() as we would like in our own code.
> Yes, you or I might be hit by someone else's omission of WRITE_ONCE(),
> it is better than the proverbial kick in the teeth.
>
> Of course, if anyone knows of a compiler/architecture combination that
> really does tear stores of 32-bit constants, please do not keep it
> a secret! After all, it would be good to get that addressed easily
> starting now rather than after a difficult and painful series of
> debugging sessions.

It's not quite what you asked for, but if you look at the following
silly code:

typedef unsigned long long u64;

struct data {
u64 arr[1023];
u64 flag;
};

void foo(struct data *x)
{
int i;

for (i = 0; i < 1023; ++i)
x->arr[i] = 0;

x->flag = 0;
}

void bar(u64 *x)
{
*x = 0xabcdef10abcdef10;
}

Then arm64 clang (-O2) generates the following for foo:

foo: // @foo
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
orr w2, wzr, #0x2000
mov w1, wzr
mov x29, sp
bl memset
ldp x29, x30, [sp], #16 // 16-byte Folded Reload
ret

and so the store to 'flag' has become part of the memset, which could
easily be bytewise in terms of atomicity (and this isn't unlikely given
we have a DC ZVA instruction which only guaratees bytewise atomicity).

GCC (also -O2) generates the following for bar:

bar:
mov w1, 61200
movk w1, 0xabcd, lsl 16
stp w1, w1, [x0]
ret

and so it is using a store-pair instruction to reduce the complexity in
the immediate generation. Thus, the 64-bit store will only have 32-bit
atomicity. In fact, this is scary because if I change bar to:

void bar(u64 *x)
{
*(volatile u64 *)x = 0xabcdef10abcdef10;
}

then I get:

bar:
mov w1, 61200
movk w1, 0xabcd, lsl 16
str w1, [x0]
str w1, [x0, 4]
ret

so I'm not sure that WRITE_ONCE would even help :/

It's worth noting that:

void baz(atomic_long *x)
{
atomic_store_explicit(x, 0xabcdef10abcdef10, memory_order_relaxed)
}

does the right thing:

baz:
mov x1, 61200
movk x1, 0xabcd, lsl 16
movk x1, 0xef10, lsl 32
movk x1, 0xabcd, lsl 48
str x1, [x0]
ret

Whilst these examples may be contrived, I do thing they illustrate that
we can't simply say "stores to aligned, word-sized pointers are atomic".

Will