Re: Debugging a TTY race condition on M1 (memory ordering dragons)
From: Hector Martin
Date: Mon Aug 15 2022 - 12:01:30 EST
(Resend, because I still can't use mail clients properly it seems...)
On 15/08/2022 22.47, Will Deacon wrote:
> As I mentioned in the thread you linked to, the architecture was undergoing
> review in this area. I should've followed back up, but in the end it was
> tightened retrospectively to provide the behaviour you wanted. This was
> achieved by augmenting the barrier-ordered-before relation with:
>
> * RW1 is a memory write effect W1 and is generated by an atomic instruction
> with both Acquire and Release semantics.
>
> You can see this in the latest Arm ARM.
>
> However, test_and_set_bit() is unordered on failure (i.e. when the bit is
> unchanged) and uses READ_ONCE() as a quick check before the RmW. See the
> "ORDERING" section of Documentation/atomic_bitops.txt.
Argh, I'd completely missed that early exit (and had stumbled on an
unofficial doc that said it was always ordered, which confused me).
Indeed, getting rid of the early exit it fixes the problem.
> I think you're missing the "shortcut" in test_and_set_bit():
>
> if (READ_ONCE(*p) & mask)
> return 1;
>
> old = arch_atomic_long_fetch_or(mask, (atomic_long_t *)p);
>
> so if the bit is already set (which I think is the 'ret == false' case)
> then you've only got a control dependency here and we elide writing to
> B.
Completely missed it. Ouch.
>
>>
>> CPU#2:
>> DMB ISHST
>> STR B
>
> Missing DMB ISH here from the smp_mb()?
Yup, my apologies, that was a brain fart while writing the email. I did
have it in the litmus tests (and they indeed completely fail without it).
> If that non-atomic store is hitting the same variable, then it cannot land
> in the middle of the atomic RmW. The architecture says:
>
> | The atomic instructions perform atomic read and write operations on a memory
> | location such that the architecture guarantees that no modification of that
> | memory location by another observer can occur between the read and the write
> | defined by that instruction.
>
> and the .cat file used by herd has a separate constraint for this (see the
> "empty rmw & (fre; coe) as atomic" line).
Ha, I was using G.a from Jan 2021 (back when I started working on this
M1 stuff), and it looks like that wording was added as an issue after
that (D17572) :-)
> There's never anything obvious when you're working with this sort of stuff,
> but my suggestion is that we work towards a litmus tests that we both agree
> represents the code and then take it from there. At the moment there's an
> awful lof of information, but you can see from my comments that I'm not
> up to speed with you yet!
I think you nailed it with the early exit, I'd completely missed that. I
think I can fairly confidently say that's the issue now. As for the
litmus test, indeed with the revised definitions of the memory model /
ARM my concerns no longer apply, hence why I couldn't reproduce them
(and the hardware, thankfully, seems to agree here).
Workqueues are broken. Yay! I'll send a patch.
- Hector