Re: [PATCH v4 3/8] bits: introduce fixed-type genmasks

From: Vincent Mailhol
Date: Wed Mar 05 2025 - 11:51:12 EST


On 06/03/2025 at 00:47, Yury Norov wrote:
> On Wed, Mar 05, 2025 at 10:00:15PM +0900, Vincent Mailhol via B4 Relay wrote:
>> From: Yury Norov <yury.norov@xxxxxxxxx>
>>
>> Add __GENMASK_t() which generalizes __GENMASK() to support different
>> types, and implement fixed-types versions of GENMASK() based on it.
>> The fixed-type version allows more strict checks to the min/max values
>> accepted, which is useful for defining registers like implemented by
>> i915 and xe drivers with their REG_GENMASK*() macros.
>>
>> The strict checks rely on shift-count-overflow compiler check to fail
>> the build if a number outside of the range allowed is passed.
>> Example:
>>
>> #define FOO_MASK GENMASK_U32(33, 4)
>>
>> will generate a warning like:
>>
>> ../include/linux/bits.h:41:31: error: left shift count >= width of type [-Werror=shift-count-overflow]
>> 41 | (((t)~0ULL - ((t)(1) << (l)) + 1) & \
>> | ^~
>>
>> Signed-off-by: Yury Norov <yury.norov@xxxxxxxxx>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>> Acked-by: Jani Nikula <jani.nikula@xxxxxxxxx>
>> Signed-off-by: Vincent Mailhol <mailhol.vincent@xxxxxxxxxx>
>
> Co-developed-by?

OK. I will keep you as the main author and credit myself with a
Co-developed-by tag.

>> ---
>> Changelog:
>>
>> v3 -> v4:
>>
>> - The v3 is one year old. Meanwhile people started using
>> __GENMASK() directly. So instead of generalizing __GENMASK() to
>> support different types, add a new GENMASK_t().
>>
>> - replace ~0ULL by ~_ULL(0). Otherwise, __GENMASK_t() would fail
>> in asm code.
>>
>> - Make GENMASK_U8() and GENMASK_U16() return an unsigned int. In
>> v3, due to the integer promotion rules, these were returning a
>> signed integer. By casting these to unsigned int, at least the
>
> This comment will disappear when I'll apply the patch. Can you comment
> it in the code instead?

Ack. I will add the comment below in the code:

/*
* Because of the C integer promotion rules, the U8 and the U16
* variants would immediately become signed integers when used in
* expressions. Cast them to unsigned int so that, at least, the
* signedness is preserved.
*/

(unless you prefer to go back to the u8 and u16 casts, cf. below).
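
To illustrate the promotion issue, here is a minimal userspace sketch.
The macro names are made up for this example and are not part of the
patch, and it assumes a 32-bit int:

```c
#include <stdint.h>

/* Hypothetical stand-ins for this example only, not the kernel macros. */
#define MASK_U8_NARROW	((uint8_t)0xf0)			/* keeps type u8 */
#define MASK_U8_WIDE	((unsigned int)(uint8_t)0xf0)	/* cast to unsigned int */

/*
 * In ~MASK_U8_NARROW, the u8 operand is promoted to (signed) int before
 * the complement is applied, so the result is a negative int. With the
 * unsigned int cast, the complement stays unsigned.
 */
#define NOT_NARROW	(~MASK_U8_NARROW)
#define NOT_WIDE	(~MASK_U8_WIDE)
```

With these definitions, NOT_NARROW compares less than zero, whereas
NOT_WIDE remains an unsigned int.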

>> signedness is kept.
>> ---
>> include/linux/bitops.h | 1 -
>> include/linux/bits.h | 33 +++++++++++++++++++++++++++++----
>> 2 files changed, 29 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>> index c1cb53cf2f0f8662ed3e324578f74330e63f935d..9be2d50da09a417966b3d11c84092bb2f4cd0bef 100644
>> --- a/include/linux/bitops.h
>> +++ b/include/linux/bitops.h
>> @@ -8,7 +8,6 @@
>>
>> #include <uapi/linux/kernel.h>
>>
>> -#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
>> #define BITS_TO_LONGS(nr) __KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(long))
>> #define BITS_TO_U64(nr) __KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u64))
>> #define BITS_TO_U32(nr) __KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u32))
>> diff --git a/include/linux/bits.h b/include/linux/bits.h
>> index 5f68980a1b98d771426872c74d7b5c0f79e5e802..f202e46d2f4b7899c16d975120f3fa3ae41556ae 100644
>> --- a/include/linux/bits.h
>> +++ b/include/linux/bits.h
>> @@ -12,6 +12,7 @@
>> #define BIT_ULL_MASK(nr) (ULL(1) << ((nr) % BITS_PER_LONG_LONG))
>> #define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG)
>> #define BITS_PER_BYTE 8
>> +#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
>>
>> /*
>> * Create a contiguous bitmask starting at bit position @l and ending at
>> @@ -25,14 +26,38 @@
>>
>> #define GENMASK_INPUT_CHECK(h, l) BUILD_BUG_ON_ZERO(const_true((l) > (h)))
>>
>> -#define GENMASK(h, l) \
>> - (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>> -#define GENMASK_ULL(h, l) \
>> - (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
>> +/*
>> + * Generate a mask for the specified type @t. Additional checks are made to
>> + * guarantee the value returned fits in that type, relying on
>> + * shift-count-overflow compiler check to detect incompatible arguments.
>> + * For example, all these create build errors or warnings:
>> + *
>> + * - GENMASK(15, 20): wrong argument order
>> + * - GENMASK(72, 15): doesn't fit unsigned long
>> + * - GENMASK_U32(33, 15): doesn't fit in a u32
>> + */
>> +#define GENMASK_t(t, h, l) \
>
> Agree with Andy. This should be GENMASK_TYPE, or triple-underscored
> ___GENMASK() maybe. This _t thing looks misleading.

My preference goes to GENMASK_TYPE().

>> + (GENMASK_INPUT_CHECK(h, l) + \
>> + (((t)~ULL(0) - ((t)1 << (l)) + 1) & \
>> + ((t)~ULL(0) >> (BITS_PER_TYPE(t) - 1 - (h)))))
>
> Can you rebase it on top of -next? In this dev cycle I merge a patch
> that reverts the __GENMASK() back to:

Oh, I did not realize that. Do you mean a rebase on top of:

https://github.com/norov/linux/tree/bitmap-for-next

?

I will do so.

> #define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (BITS_PER_LONG - 1 - (h))))
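
For reference, a typed variant following the same shape as the
definition quoted above could look roughly like the userspace sketch
below. GENMASK_TYPE_SKETCH is a made-up name, the input check is left
out, and this is not necessarily what the rebased patch will contain:

```c
#include <stdint.h>

#define BITS_PER_BYTE		8
#define BITS_PER_TYPE(type)	(sizeof(type) * BITS_PER_BYTE)

/*
 * Hypothetical sketch: shift an all-ones value of type @t left by @l,
 * then mask off everything above bit @h. The outer cast brings the
 * result back to @t after the integer promotions.
 */
#define GENMASK_TYPE_SKETCH(t, h, l)				\
	((t)(((t)~0ULL << (l)) &				\
	     ((t)~0ULL >> (BITS_PER_TYPE(t) - 1 - (h)))))
```

For example, GENMASK_TYPE_SKETCH(uint32_t, 7, 4) evaluates to 0xf0. The
sizeof() hidden in BITS_PER_TYPE() is what prevents such a macro from
being used in asm code.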
>
>> +#define GENMASK(h, l) GENMASK_t(unsigned long, h, l)
>> +#define GENMASK_ULL(h, l) GENMASK_t(unsigned long long, h, l)
>
> This makes __GENMASK() and __GENMASK_ULL() unused in the kernel, other
> than in uapi. Or I misunderstand it?

Correct.

> Having, in fact, different implementations of the same macro for kernel
> and userspace is a source of problems. Can we move GENMASK_TYPE() to uapi,
> and implement __GENMASK() on top of them? If not, I'd prefer to keep
> GENMASK and GENMASK_ULL untouched.

This is something I tried to explain in the cover letter. I am not
comfortable declaring GENMASK_TYPE() in the uapi and exposing it to
userland. If we do so, any future change to its parameters would break
userspace. __GENMASK_U128() already looks like too much for the uapi to
me, and I am not keen to bloat it even more with GENMASK_TYPE().

On top of that, if we use GENMASK_TYPE() to generate the asm variant,
then we cannot rely on sizeof() in the definition, which makes
everything overcomplicated.

I acknowledge that not having a common denominator is not ideal, but I
see this as an acceptable tradeoff.

> Can you run bloat-o-meter and ensure there's no unwanted effects on
> code generation?

Ack, but that will be tomorrow :)

>> /*
>> * Missing asm support
>> *
>> + * __GENMASK_U*() depends on BITS_PER_TYPE() which would not work in the asm
>
> And there's no __GENMASK_U*(), right?

Yes, silly typo, sorry. Will fix in v5.

>> + * code as BITS_PER_TYPE() relies on sizeof(), something not available in
>> + * asm. Nevertheless, the concept of fixed width integers is a C thing which
>> + * does not apply to assembly code.
>> + */
>> +#define GENMASK_U8(h, l) ((unsigned int)GENMASK_t(u8, h, l))
>> +#define GENMASK_U16(h, l) ((unsigned int)GENMASK_t(u16, h, l))
>
> Typecast to the type that user provides explicitly? And maybe do
> in GENMASK_TYPE()

I have a slight preference for the cast to unsigned int for the reason
explained above. But that is not a deal breaker. If you believe that the
u8/u16 casts are better, let me know, I will be happy to change it :)

>> +#define GENMASK_U32(h, l) GENMASK_t(u32, h, l)
>> +#define GENMASK_U64(h, l) GENMASK_t(u64, h, l)
>
> OK, this looks good. But GENMASK_U128() becomes a special case now.
> The 128-bit GENMASK is unused, but it's exported in uapi. Is there any
> simple way to end up with a common implementation for all fixed-type
> GENMASKs?

What bothers me is that the 128-bit types are not available on all
architectures, cf. CONFIG_ARCH_SUPPORTS_INT128. So, I would need a
U128() equivalent to ULL() which does not break on architectures
that do not support 128-bit integers.

This is where I am stuck. If someone can guide me on how to write a
robust U128() macro, then I think the common implementation could be
feasible.
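
To make the obstacle concrete, here is a userspace sketch of the kind
of guard involved. It keys on the compiler's __SIZEOF_INT128__ macro
rather than on CONFIG_ARCH_SUPPORTS_INT128, and all the *_SKETCH names
are made up for this example. It does not solve the problem: when the
guard fails, there is simply no U128_SKETCH() at all, so a common
GENMASK implementation could not depend on it unconditionally:

```c
#include <stdint.h>

#ifdef __SIZEOF_INT128__
typedef unsigned __int128 u128_sketch;

/*
 * There is no literal suffix for 128-bit constants, so the value is
 * assembled from two 64-bit halves instead of being written directly.
 */
#define U128_SKETCH(hi, lo) \
	(((u128_sketch)(uint64_t)(hi) << 64) | (uint64_t)(lo))
#define HAVE_U128_SKETCH 1
#else
/* No 128-bit integer type: U128_SKETCH() is deliberately left undefined. */
#define HAVE_U128_SKETCH 0
#endif
```

Any user of U128_SKETCH() would itself have to sit behind a check of
HAVE_U128_SKETCH, which is exactly the special casing in question.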

>> +
>> +/*
>> * __GENMASK_U128() depends on _BIT128() which would not work
>> * in the asm code, as it shifts an 'unsigned __int128' data
>> * type instead of direct representation of 128 bit constants
>
> This comment is duplicated by the previous one. Maybe just join them?
> (Let's wait for a while for updates regarding GENMASK_U128 status before
> doing it.)

OK. I will wait for this one. I will probably send the v5 before we get
the answer, but I do not think this is an issue if we have two parallel
streams.


Yours sincerely,
Vincent Mailhol