Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

From: Andy Lutomirski
Date: Fri Nov 22 2019 - 16:23:48 EST


On Fri, Nov 22, 2019 at 12:31 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > > When we use byte ops, we must consider the word as 4 independent
> > > variables. And in that case the later load might observe the lock-byte
> > > state from 3, because the modification to the lock byte from 4 is in
> > > CPU2's store-buffer.
> >
> > So we absolutely violate this with the optimization for constant arguments
> > to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
> >
> > So is code that does:
> >
> > set_bit(0, bitmap);
> >
> > on one CPU. While another is doing:
> >
> > set_bit(mybit, bitmap);
> >
> > on another CPU safe? The first operates on just one byte, the second on 8 bytes.
>
> It is safe if all you care about is the consistency of that one bit.
>

I'm still lost here. Can you explain how one could write code that
observes an issue? My trusty SDM, Vol 3 8.2.2 says "Locked
instructions have a total order." 8.2.3.9 says "Loads and Stores Are
Not Reordered with Locked Instructions." Admittedly, the latter is an
"example", but the section is very clear about the fact that a locked
instruction prevents reordering of a load or a store issued by the
same CPU relative to the locked instruction *regardless of whether
they overlap*.

So using LOCK to implement smp_mb() is correct, and I still don't
understand your particular concern.
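
To make the property concrete, here is a rough userspace sketch of the
store-buffering pattern that smp_mb() has to forbid. This is not kernel
code: atomic_thread_fence(memory_order_seq_cst) is a C11 stand-in for
smp_mb() (on x86, compilers typically lower it to MFENCE or a dummy
LOCK-prefixed RMW), and the names x, y, r0, r1, cpu0(), cpu1() and
count_forbidden_outcomes() are all made up for illustration. Build
with -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Two independent variables, one per "CPU" (hypothetical names). */
static atomic_int x, y;
/* Values observed by each thread; read by the parent only after join. */
static int r0, r1;

/* Stand-in for smp_mb(): a full barrier between the store and the load. */
static void full_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

static void *cpu0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	full_barrier();
	r0 = atomic_load_explicit(&y, memory_order_relaxed);
	return NULL;
}

static void *cpu1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	full_barrier();
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/*
 * Run the litmus test `iterations` times and count how often both
 * threads read 0, which the full barriers forbid.
 * Returns -1 if a thread could not be created.
 */
int count_forbidden_outcomes(int iterations)
{
	int forbidden = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t a, b;

		atomic_store(&x, 0);
		atomic_store(&y, 0);

		if (pthread_create(&a, NULL, cpu0, NULL) != 0)
			return -1;
		if (pthread_create(&b, NULL, cpu1, NULL) != 0) {
			pthread_join(a, NULL);
			return -1;
		}
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		if (r0 == 0 && r1 == 0)
			forbidden++;
	}
	return forbidden;
}
```

With the barriers in place, count_forbidden_outcomes() should return 0
no matter how many iterations are run. Dropping full_barrier() makes a
non-zero count possible on x86, although a given run is not guaranteed
to hit it.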

I understand that the CPU is probably permitted to optimize a LOCK RMW
operation such that it retires before the store buffers of earlier
instructions are fully flushed, but only if the store buffer and cache
coherency machinery work together to preserve the architecturally
guaranteed ordering.