Re: [PATCH v4 1/3] powerpc/bitops: Use immediate operand when possible

From: Christophe Leroy
Date: Tue Sep 21 2021 - 11:15:37 EST




Le 20/09/2021 à 23:23, Segher Boessenkool a écrit :
Hi!

On Mon, Sep 20, 2021 at 10:31:17AM +0200, Christophe Leroy wrote:
Today we get the following code generation for bitops like
set or clear bit:

c0009fe0: 39 40 08 00 li r10,2048
c0009fe4: 7c e0 40 28 lwarx r7,0,r8
c0009fe8: 7c e7 53 78 or r7,r7,r10
c0009fec: 7c e0 41 2d stwcx. r7,0,r8

c000d568: 39 00 18 00 li r8,6144
c000d56c: 7c c0 38 28 lwarx r6,0,r7
c000d570: 7c c6 40 78 andc r6,r6,r8
c000d574: 7c c0 39 2d stwcx. r6,0,r7

Most set bits are constant on lower 16 bits, so it can easily
be replaced by the "immediate" version of the operation. Allow
GCC to choose between the normal or immediate form.

You can also handle the second sixteen bits (the "shifted" half), by
using oris etc. The "%eN" output modifier prints an "s" for this:
/* If the low 16 bits are 0, but some other bit is set, write 's'. */
But this doesn't handle non-constant arguments, so you're likely better
off using what you have noe.

For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for
when all bits to be cleared are consecutive.

Or when all you want to keep are consecutive (you do handle that now :-) )

On 64 bits we don't have any equivalent single operation for clearing,
single bits or a few bits, we'd need two 'rldicl' so it is not
worth it, the li/andc sequence is doing the same.

You can use rlwinm whenever you want to clear all top 32 bits.

A sometimes nice idiom is ori x,x,N ; xori x,x,N to clear the bits N
(or oris/xoris). But it's two insns no matter what (but no spare
register is needed).

Could be a candidate for a follow-up change if someone want to focus on PPC64.


Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxxxxxx>

+static inline unsigned long test_and_clear_bits(unsigned long mask, volatile unsigned long *_p)
+{
+ unsigned long old, t;
+ unsigned long *p = (unsigned long *)_p;
+
+ if (IS_ENABLED(CONFIG_PPC32) &&
+ __builtin_constant_p(mask) && is_rlwinm_mask_valid(mask)) {

is_rlwinm_mask_valid(~mask)? So that test_and_clear_bits(0, ...) will
work with rlwinm, and test_and_clear_bits(0xffffffff, ...) will not make
gas scream bloody murder ("illegal bitmask"). Tha mask you pass to the
instruction is ~mask after all.

Ok, fixed in v5.


Looks great except that one nit. Thanks :-)

Reviewed-by: Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx>

Thanks

Christophe