RE: [PATCH v4 2/5] m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant expressions
From: David Laight
Date: Sun Jan 28 2024 - 14:02:31 EST
From: Vincent MAILHOL
> Sent: 28 January 2024 13:28
...
> Fair. My goal was not to point to the assembly code but to this sentence:
>
> However, for non constant expressions, the kernel's ffs() asm
> version remains better for x86_64 because, contrary to GCC, it
> doesn't emit the CMOV assembly instruction
The comment in the code is:
* AMD64 says BSFL won't clobber the dest reg if x==0; Intel64 says the
* dest reg is undefined if x==0, but their CPU architect says its
* value is written to set it to the same as before, except that the
* top 32 bits will be cleared.
I guess gcc isn't willing to act on hearsay!
>
> I should have been more clear. Sorry for that.
>
> But the fact remains, on x86, some of the asm produced more optimized
> code than the builtin.
>
> > I actually suspect the asm versions predate the builtins.
>
> This seems true. The __bultins were introduced in:
>
> generic: Implement generic ffs/fls using __builtin_* functions
> https://git.kernel.org/torvalds/c/048fa2df92c3
I was thinking of compiler versions not kernel source commits.
You'd need to be doing some serious software archaeology.
> when the asm implementation already existed in m68k.
>
> > Does (or can) the outer common header use the __builtin functions
> > if no asm version exists?
>
> Yes, this would be extremely easy. You just need to
>
> #include/asm-generic/bitops/builtin-__ffs.h
> #include/asm-generic/bitops/builtin-ffs.h
> #include/asm-generic/bitops/builtin-__fls.h
> #include/asm-generic/bitops/builtin-fls.h
>
> and remove all the asm implementations. If you give me your green
> light, I can do that change. This is trivial. The only thing I am not
> ready to do is to compare the produced assembly code and confirm
> whether or not it is better to remove asm code.
>
> Thoughts?
Not for me to say.
But the builtins are likely to generate reasonable code.
So unless the asm can be better (like trusting undocumented
x86 cpu behaviour) using them is probably best.
For the ones you are changing it may be best to propose using
the builtins all the time.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)