RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

From: David Laight
Date: Thu Dec 12 2019 - 11:29:57 EST


From: Peter Zijlstra
> Sent: 12 December 2019 13:04
> On Thu, Dec 12, 2019 at 10:36:27AM +0000, David Laight wrote:
...
> > set_bit() (etc) include the 'lock' prefix (dunno why this decision was made...).
>
> Because it is the atomic set bit function, we have __set_bit() if you
> want the non-atomic one.

Horrid name, looks like part of the implementation...
I know _ prefixes get used for functions that don't acquire the obvious lock,
but they usually require the caller to hold the lock.

set_bit_nonatomic() and set_bit_atomic() wold be better names.

> Atomic bitops are (obviously) useful if you have concurrent changes to
> your bitmap.
>
> Lots of people seem confused on this though, as evidenced by a lot of
> the broken crap we keep finding (then again, them using __set_bit()
> would still be broken due to the endian thing).

Yep, quite a bit of code just wants x |= 1 << n;

> > For locked operations (including misaligned ones) that don't cross cache-line
> > boundaries the read operation almost certainly locks the cache line (against
> > a snoop) until the write has updated the cache line.
>
> Note your use of 'almost'. Almost isn't good enough. Note that other
> architectures allow the store from atomic operations to hit the store
> buffer. And I strongly suspect x86 does the same.
>
> Waiting for a store-buffer drain is *expensive*.

Right, the cpu doesn't need to wait for the store buffer to drain,
but the cache line needs to remain locked until it has drained.

...
> > This won't happen until the write 'drains' from the store buffer.
> > (I suspect that locked read requests act like write requests in ensuring
> > that no other cpu has a dirty copy of the cache line, and also marking it dirty.)
> > Although this will delay the response to the snoop it will only
> > stall the cpu (or other bus master), not the entire memory 'bus'.
>
> I really don't think so. The commit I pointed to earlier in the thread,
> that replaced MFENCE with LOCK ADD $0, -4(%RSP) for smp_mb(), strongly
> indicates LOCK prefixed instructions do _NOT_ flush the store buffer.

They don't need to.
It is only a remote cpu trying to gain exclusive access to the cache line
that needs to be stalled by the LOCK prefix write.
Once that write has escaped the store buffer the cache line can be released.

Of course the store buffer may be able to contain the write data for multiple
atomic operations to different parts of the same cache line.

...
> > (If you are doing concurrent locked and unlocked accesses to the same
> > memory location something is badly broken.)
>
> It is actually quite common.

Sorry I meant unlocked writes.

> > It really can't matter whether one access is a mis-aligned 64bit word
> > and the other a byte. Both do atomic RMW updates so the result
> > cannot be unexpected.
>
> Expectations are often violated. Esp when talking about memory ordering.

Especially on DEC Alpha :-)

> > In principle two separate 8 bit RMW cycles could be done concurrently
> > to two halves of a 16 bit 'flag' word without losing any bits or any reads
> > returning any of the expected 4 values.
> > Not that any memory system would support such updates.
>
> I'm thinking you ought to go read that paper on mixed size concurrency I
> referenced earlier in this thread. IIRC the conclusion was that PowerPC
> does exactly that and ARM64 allows for it but it hasn't been observed,
> yet.

CPU with shared L1 cache might manage to behave 'oddly'.
But they still need to do locked RMW cycles.

> Anyway, I'm not saying x86 behaves this way, I'm saying that I have lots
> of questions and very little answers. I'm also saying that the variant
> with non-overlapping atomics could conceivably misbehave, while the
> variant with overlapping atomics is guaranteed not to.
>
> Specifically smp_mb()/SYNC on PowerPC can not restore Sequential
> Consistency under mixed size operations. How's that for expectations?

Is that the spanish inquistion?

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)