Re: [PATCH v5 1/7] bits: split the definition of the asm and non-asm GENMASK()
From: Vincent Mailhol
Date: Mon Mar 10 2025 - 06:47:20 EST
On 09/03/2025 at 19:23, David Laight wrote:
> On Sun, 9 Mar 2025 01:58:53 +0000
> David Laight <david.laight.linux@xxxxxxxxx> wrote:
>
>> On Fri, 7 Mar 2025 18:58:08 +0900
>> Vincent Mailhol <mailhol.vincent@xxxxxxxxxx> wrote:
>>
>>> On 07/03/2025 at 04:23, David Laight wrote:
>>>> On Thu, 06 Mar 2025 20:29:52 +0900
>>>> Vincent Mailhol via B4 Relay <devnull+mailhol.vincent.wanadoo.fr@xxxxxxxxxx> wrote:
>>>>
>>>>> From: Vincent Mailhol <mailhol.vincent@xxxxxxxxxx>
>>>>>
>>>>> In an upcoming change, GENMASK() and its friends will indirectly
>>>>> depend on sizeof() which is not available in asm.
>>>>>
>>>>> Instead of adding further complexity to __GENMASK() to make it work
>>>>> for both asm and non asm, just split the definition of the two
>>>>> variants.
>>>> ...
>>>>> +#else /* defined(__ASSEMBLY__) */
>>>>> +
>>>>> +#define GENMASK(h, l) __GENMASK(h, l)
>>>>> +#define GENMASK_ULL(h, l) __GENMASK_ULL(h, l)
>>>>
>>>> What do those actually expand to now?
>>>> As I've said a few times both UL(0) and ULL(0) are just (0) for __ASSEMBLY__
>>>> so the expansions of __GENMASK() and __GENMASK_ULL() contained the
>>>> same numeric constants.
>>>
>>> Indeed, in asm, the UL(0) and ULL(0) expands to the same thing: 0.
>>>
>>> But the two macros still expand to something different on 32 bits
>>> architectures:
>>>
>>> * __GENMASK:
>>>
>>> (((~(0)) << (l)) & (~(0) >> (32 - 1 - (h))))
>>>
>>> * __GENMASK_ULL:
>>>
>>> (((~(0)) << (l)) & (~(0) >> (64 - 1 - (h))))
>>>
>>> On 64 bits architecture these are the same.
>>
>> I've just fed those into godbolt (https://www.godbolt.org/z/Ter6WE9qE) as:
>> int fi(void)
>> {
>> int v;
>> asm("mov $(((~(0)) << (8)) & (~(0) >> (32 - 1 - (15)))),%0": "=r" (v));
>> return v -(((~(0u)) << (8)) & (~(0u) >> (32 - 1 - (15))));
>> }
>>
>> gas warns:
>> <source>:4: Warning: 0xffffffffff00 shortened to 0xffffff00
>>
>> unsigned long long fll(void)
>> {
>> unsigned long long v;
>> asm("mov $(((~(0)) << (8)) & (~(0) >> (64 - 1 - (15)))),%0": "=r" (v));
>> return v -(((~(0ull)) << (8)) & (~(0ull) >> (64 - 1 - (15))));
>> }
>>
>> (for other architectures you'll need to change the opcode)
>>
>> For x86 and x86-32 the assembler seems to be doing 64bit maths with unsigned
>> right shifts - so the second function (with the 64 in it) generates 0xff00.
>> I doubt a 32bit only assembler does 64bit maths, but the '>> 48' above
>> might get masked to a '>> 16' by the cpu and generate the correct result.
>>
>> So __GENMASK() is likely to be broken for any assembler that supports 64bits
>> when generating 32bit code.
>> __GENMASK_ULL() works (assuming all have unsigned >>) on 64bit assemblers
>> (even when generating 32bit code). It may work on some 32bit assemblers.>
> I've remembered my 'pi' has a 32bit userspace (on a 64bit kernel).
> I quick test of "mov %0,#(...)" and bits 11..8 gives the correct output
> for size '32' but the error message:
> /tmp/ccPB7bWh.s:26: Warning: shift count out of range (56 is not between 0 and 31)
> with size '64'.
>
> Assuming that part of the gnu assembler is consistent across architectures
> you can't use either GENMASK in asm for 32bit architectures.
>
> Any change (probably including removing the asm support for the uapi) isn't
> going to make things worse!
>
> David
>
>>
>> Since most uses in the header files will be GENMASK() I doubt (hope) no
>> asm code actually uses the values!
After reading your message, I wanted to understand where these
GENMASK*() were used in asm code. It happens that most of the
architectures simply don't use it!
I see these are using in aarch64, but that's about it.
I couldn't find any use of neither GENMASK() nor GENMASK_ULL() in x86,
arm 32 bits, m68k or powerpc asm code (I did not check the other
architectures).
aarch64 uses both the long and long long variants, but this being a 64
bits architecture, these are the same. So OK.
So, this goes into the same direction as you: I guess that the fact that
no one noticed the issue is that no one uses this on a 32 bits arch,
probably for historical reasons, i.e. the asm code was written before
the introduction of the GENMASK() macros.
>> The headers assemble - but that is about all that can be said.
>>
>> Bags of worms :-)
+1 (I will not open that bag)
I think that the asm and non asm variant should have used a different
implementation from the begining. By wanting to have a single variant
that fit both the asm and the C code, we have a complex result that is
hard to understand and maintain (c.f. the bug which you pointed out
which have been unnoticed for ever).
But now that it is in the uapi, I am not sure of what is the best
approach. I sincerely hope that we can just modify it with no userland
impact.
Well, just to make it clear, I am not touching the asm part. I
acknowledge the issue, but I do not think that I have the skill to fix it.
Yours sincerely,
Vincent Mailhol