Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter

From: Andy Lutomirski
Date: Thu Nov 21 2019 - 14:01:44 EST

> On Nov 21, 2019, at 10:40 AM, Fenghua Yu <fenghua.yu@xxxxxxxxx> wrote:
> ïOn Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
>>> On Thu, Nov 21, 2019 at 9:43 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
>>> From: Ingo Molnar
>>>> Sent: 21 November 2019 17:12
>>>> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>> ...
>>>>> This feature MUST be default enabled, otherwise everything will
>>>>> be/remain broken and we'll end up in the situation where you can't use
>>>>> it even if you wanted to.
>>>> Agreed.
>>> Before it can be enabled by default someone needs to go through the
>>> kernel and fix all the code that abuses the 'bit' functions by using them
>>> on int[] instead of long[].
>>> I've only seen one fix go through for one use case of one piece of code
>>> that repeatedly uses potentially misaligned int[] arrays for bitmasks.
>> Can we really not just change the lock asm to use 32-bit accesses for
>> set_bit(), etc? Sure, it will fail if the bit index is greater than
>> 2^32, but that seems nuts.
>> (Why the *hell* do the bitops use long anyway? They're *bit masks*
>> for crying out loud. As in, users generally want to operate on fixed
>> numbers of bits.)
> We are working on a separate patch set to fix all split lock issues
> in atomic bitops. Per Peter Anvin and Tony Luck suggestions:
> 1. Still keep the byte optimization if nr is constant. No split lock.
> 2. If type of *addr is unsigned long, do quadword atomic instruction
> on addr. No split lock.
> 3. If type of *addr is unsigned int, do word atomic instruction
> on addr. No split lock.
> 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> the bit and operate on the bit. No split lock.
> Only small percentage of atomic bitops calls are in case 4 (e.g. 3%
> for set_bit()) which need a few extra instructions to re-calculate
> address but can avoid big split lock overhead.
> To get real type of *addr instead of type cast type "unsigned long",
> the atomic bitops APIs are changed to macros from functions. This change
> need to touch all architectures.

Isnât the kernel full of casts to long* to match the signature? Doing this based on type seems silly to me. I think itâs better to just to a 32-bit operation unconditionally and to try to optimize it using b*l when safe.