Re: [PATCH v6 04/20] x86/split_lock: Align x86_capability to unsigned long to avoid split locked access

From: Thomas Gleixner
Date: Thu Apr 04 2019 - 14:12:10 EST


On Thu, 4 Apr 2019, Paolo Bonzini wrote:

> On 04/04/19 18:52, Thomas Gleixner wrote:
> > On Thu, 4 Apr 2019, David Laight wrote:
> >> From: David Laight Sent: 04 April 2019 15:45
> >>> From: Fenghua Yu Sent: 03 April 2019 22:22
> >>> That is not true.
> >>> The BTS/BTR instructions access the memory word that contains the
> >>> expected bit.
> >>> The 'operand size' only affects the size of the register used for the
> >>> bit offset.
> >>> If the 'operand size' is 16 bits wide (+/- 32k bit offset) the cpu might
> >>> do an aligned 16bit memory access, otherwise (32 or 64bit bit offset) it
> >>> might do an aligned 32 bit access.
> >>> It should never do a 64bit access and never a misaligned one (even if
> >>> the base address is misaligned).
> >>
> >> Hmmm... I may have misread things slightly.
> >> The accessed address is 'Effective Address + (4 * (BitOffset DIV 32))'.
> >> However nothing suggests that it ever does 64bit accesses.
> >>
> >> If it does do 64bit accesses when the operand size is 64 bits then the
> >> asm stubs ought to be changed to only specify 32bit operand size.
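> >>
> >> For illustration, a minimal sketch of the word address the quoted SDM
> >> formula describes, for 32bit vs. 64bit (REX.W) operand size. The
> >> function names are made up and non-negative bit offsets are assumed:
> >>
> >>	/* 32bit operand size: EA + 4 * (BitOffset DIV 32) */
> >>	static inline void *bt_word_32(void *base, unsigned long nr)
> >>	{
> >>		return base + 4 * (nr >> 5);
> >>	}
> >>
> >>	/* 64bit operand size: EA + 8 * (BitOffset DIV 64) */
> >>	static inline void *bt_word_64(void *base, unsigned long nr)
> >>	{
> >>		return base + 8 * (nr >> 6);
> >>	}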
> >
> > bitops operate on unsigned long arrays, so the RMW on the affected array
> > member has to be atomic vs. other RMW operations on the same array
> > member. If we make the bitops 32bit wide on x86/64 we break that.
> >
> > So selecting 64bit access (REX.W prefix) is correct and has to stay.
>
> Aren't bitops always atomic with respect to the whole cache line(s)? We
> regularly rely on cmpxchgl being atomic with respect to movb.

Yes, but if your long goes across a cacheline then you lose, due to the
requirement to lock both cachelines. It's the same problem as with the
bitops, and I'd rather catch all of those than just some.
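
As an illustration (a made-up layout, not the actual cpuinfo_x86 one), a
packed struct can place an unsigned long across a 64-byte cacheline, at
which point a LOCK'ed bitop on it needs both lines locked at once:

	struct example {
		char		pad[60];	/* bytes 0..59 of the first line */
		unsigned long	bits[1];	/* bytes 60..67: straddles the boundary */
	} __attribute__((packed, aligned(64)));

	/* set_bit(0, ex->bits) is now a split locked access */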

Thanks,

tglx