Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()

From: H. Peter Anvin
Date: Tue Feb 25 2025 - 19:31:13 EST


On February 25, 2025 2:46:23 PM PST, David Laight <david.laight.linux@xxxxxxxxx> wrote:
>On Mon, 24 Feb 2025 13:55:28 -0800
>"H. Peter Anvin" <hpa@xxxxxxxxx> wrote:
>
>> On 2/24/25 07:24, Uros Bizjak wrote:
>> >
>> >
>> > On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:
>> >> Refactor parity calculations to use the standard parity8() helper. This
>> >> change eliminates redundant implementations and improves code
>> >> efficiency.
>...
>> Of course, on x86, parity8() and parity16() can be implemented very simply:
>>
>> (Also, the parity functions really ought to return bool, and be flagged
>> __attribute_const__.)
>>
>> static inline __attribute_const__ bool _arch_parity8(u8 val)
>> {
>> bool parity;
>> asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));
>> return parity;
>> }
>>
>> static inline __attribute_const__ bool _arch_parity16(u16 val)
>> {
>> bool parity;
>> asm("xor %h0,%b0" : "=@ccnp" (parity), "+Q" (val));
>> return parity;
>> }
>
>The same (with fixes) can be done for parity64() on 32bit.
>
>>
>> In the generic algorithm, you probably should implement parity16() in
>> terms of parity8(), parity32() in terms of parity16() and so on:
>>
>> static inline __attribute_const__ bool parity16(u16 val)
>> {
>> #ifdef ARCH_HAS_PARITY16
>> if (!__builtin_const_p(val))
>> return _arch_parity16(val);
>> #endif
>> return parity8(val ^ (val >> 8));
>> }
>>
>> This picks up the architectural versions when available.
>
>Not the best way to do that.
>Make the name in the #ifdef the same as the function and define
>a default one if the architecture doesn't define one.
>So:
>
>static inline parity16(u16 val)
>{
> return __builtin_const_p(val) ? _parity_const(val) : _parity16(val);
>}
>
>#ifndef _parity16
>static inline _parity16(u15 val)
>{
> return _parity8(val ^ (val >> 8));
>}
>#endif
>
>You only need one _parity_const().
>
>>
>> Furthermore, if a popcnt instruction is known to exist, then the parity
>> is simply popcnt(x) & 1.
>
>Beware that some popcnt instructions are slow.
>
> David
>
>>
>> -hpa
>>
>>
>

Seems more verbose than just #ifdef _arch_parity8 et al since the const and generic code cases are the same (which they aren't always.)

But that part is a good idea, especially since on at least *some* architectures like x86 doing:

#define _arch_parity8(x) __builtin_parity(x)

... etc is entirely reasonable and lets gcc use an already available parity flag should one be available.

The inline wrapper, of course, takes care of the type mangling.