Re: [PATCH v2 4/4] __arch_xprod64(): make __always_inline when optimizing for performance
From: Arnd Bergmann
Date: Sun Jul 07 2024 - 15:38:31 EST
On Sun, Jul 7, 2024, at 21:14, Nicolas Pitre wrote:
> On Sun, 7 Jul 2024, Arnd Bergmann wrote:
>
>> On Sun, Jul 7, 2024, at 19:17, Nicolas Pitre wrote:
>> > From: Nicolas Pitre <npitre@xxxxxxxxxxxx>
>> >
>> > Recent gcc versions no longer systematically inline __arch_xprod64()
>> > and that has performance implications. Give the compiler the freedom to
>> > decide only when optimizing for size.
>> >
>> > Signed-off-by: Nicolas Pitre <npitre@xxxxxxxxxxxx>
>>
>> Seems reasonable. Just to make sure: do you know if the non-inline
>> version of xprod_64 ends up producing a more efficient division
>> result than the __do_div64() code path on arch/arm?
>
> __arch_xprod_64() is part of the __do_div64() code path. So I'm not sure
> of your question.
>
> Obviously, having __arch_xprod_64() inlined is faster but it increases
> binary size.

I meant whether calling __div64_const32->__arch_xprod_64() is
still faster for a constant base when the new __arch_xprod_64()
is out of line, compared to the __div64_32->__do_div64()
assembly code path we take for a non-constant base.
Arnd
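
For readers following the thread, here is a minimal sketch of the two kinds of code path being compared: dividing by a constant through a reciprocal multiplication built on a 64x64 cross product, versus a plain runtime division for a non-constant base. This is not the kernel's implementation. The names xprod_hi(), div_by_1000_recip() and div_generic() are hypothetical, the divisor 1000 is picked only for illustration, and the code assumes a 64-bit GCC or Clang target where the unsigned __int128 extension is available (which is not the case on 32-bit arm).

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a 64x64 cross product that keeps only the
 * high 64 bits of the 128-bit result. Assumes the GCC/Clang
 * unsigned __int128 extension.
 */
static inline uint64_t xprod_hi(uint64_t m, uint64_t n)
{
	return (uint64_t)(((unsigned __int128)m * n) >> 64);
}

/*
 * Constant-base path: n / 1000 without a division instruction.
 * 1000 = 8 * 125, so shift out the factor 8 first, then multiply by
 * m = ceil(2^68 / 125) and drop 68 bits (64 in xprod_hi(), 4 here).
 * The rounding error in m is small enough that the result is exact
 * for every 61-bit input (n >> 3).
 */
static uint64_t div_by_1000_recip(uint64_t n)
{
	/* 2^68 is not a multiple of 125, so floor + 1 equals the ceiling. */
	const uint64_t m =
		(uint64_t)((((unsigned __int128)1) << 68) / 125) + 1;

	return xprod_hi(m, n >> 3) >> 4;
}

/* Non-constant-base path: leave the division to the compiler/runtime. */
static uint64_t div_generic(uint64_t n, uint32_t base)
{
	return n / base;
}
```

Both functions return the same quotient for a base of 1000. Which one is faster when the cross product is not inlined depends on the target and the compiler, which is exactly the open question above; the sketch does not answer it.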