RE: [PATCH] x86/apic: Use div64_ul() instead of do_div()
From: David Laight
Date: Fri Mar 01 2024 - 03:53:12 EST
From: H. Peter Anvin
> Sent: 01 March 2024 01:02
>
> >>
> >> Change deltapm to unsigned long and replace do_div() with div64_ul()
> >> which doesn't implicitly cast the divisor and doesn't unnecessarily
> >> calculate the remainder.
> >
> >Eh? They are entirely different beasts.
> >
> >do_div() does a 64 by 32 divide (64-bit dividend, 32-bit divisor).
> >div64_ul() does a much more expensive 64 by 64 divide that
> >can generate a 64-bit quotient.
> >
> >The remainder is pretty much free in both cases.
> >If a CPU has a divide instruction, it will almost certainly
> >put the quotient in one register and the remainder in another.
> >
>
> Not on e.g. RISC-V.
If the remainder isn't used, the compiler should optimise
away any code used to generate it.
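A userspace sketch of the two interfaces being compared, assuming an LP64 target (64-bit unsigned long, as on x86-64). do_div_sketch() and div64_ul_sketch() are hypothetical stand-ins written for illustration, not the kernel macros: the first follows the do_div() contract (64-bit dividend updated in place with the quotient, 32-bit divisor, remainder returned), the second is a plain 64 by 64 divide returning only the quotient.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for do_div(n, base): 64-bit dividend, 32-bit
 * divisor. The quotient is written back through n and the remainder is
 * returned. A wider divisor is truncated when converted to uint32_t.
 */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/*
 * Hypothetical stand-in for div64_ul() on a 64-bit build (assumes LP64):
 * a full 64 by 64 divide that returns only the quotient.
 */
static unsigned long div64_ul_sketch(uint64_t dividend, unsigned long divisor)
{
	assert(divisor != 0);
	return (unsigned long)(dividend / divisor);
}
```

A caller that ignores the return value of do_div_sketch() leaves the '%' dead once the helper is inlined, so the compiler is free to drop it. Passing a deltapm that needs more than 32 bits, say 0x100000002, to do_div_sketch() silently divides by 2; that is the implicit cast of the divisor the patch description refers to.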
gcc is also generating rather suboptimal code.
On x86 it does only one divide for code that uses both 'a / b' and
'a % b', but for riscv it emits separate divide and remainder
instructions.
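The case being compared is code that wants both results of one division from the same operands. A minimal sketch; struct qr_u64 and divmod_u64() are made-up names, and the comment only restates the code generation described above rather than anything verified for a particular compiler version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: both results of one division. */
struct qr_u64 {
	uint64_t quot;
	uint64_t rem;
};

static struct qr_u64 divmod_u64(uint64_t a, uint64_t b)
{
	struct qr_u64 r;

	assert(b != 0);
	/*
	 * Same operands for '/' and '%'. On x86-64 a single DIV leaves the
	 * quotient in RAX and the remainder in RDX, so one divide can serve
	 * both; per the text above, gcc emits two instructions for riscv.
	 */
	r.quot = a / b;
	r.rem = a % b;
	return r;
}
```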
clang does a multiply and subtract for the remainder.
Compared to any form of divide, the extra multiply is noise.
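The multiply and subtract that clang is described as using rests on the identity a % b == a - (a / b) * b. A sketch of that rewrite in C; rem_via_mul() is a hypothetical name, and this shows the arithmetic, not clang's actual output.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: one divide, then the remainder as a - q * b
 * (a multiply and a subtract). For unsigned operands q * b never
 * exceeds a, so the subtraction cannot wrap and the result equals a % b.
 */
static uint64_t rem_via_mul(uint64_t a, uint64_t b, uint64_t *quot)
{
	uint64_t q;

	assert(b != 0);
	q = a / b;
	*quot = q;
	return a - q * b;
}
```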
gcc also pessimises attempts to calculate the remainder:
https://godbolt.org/z/Tojh1qcvs
Are the instruction weights set correctly for divide/remainder?
It is almost as though gcc thinks the remainder is cheap.
Actually, I suspect even the 64 by 32 divide is a software loop
on 32-bit riscv.
I haven't checked, but I suspect the implementations (especially FPGA
ones) won't allow the three ALU inputs such a divide needs (both
halves of the dividend plus the divisor).
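To illustrate the software-loop suspicion, and where the three inputs come from, a minimal sketch of bit-at-a-time (restoring) long division of a 64-bit dividend, passed as two 32-bit halves, by a 32-bit divisor. div64_32_loop() is a hypothetical name, and this is the generic textbook method, not the code any riscv32 kernel or libgcc build actually uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical bit-at-a-time restoring division. Three inputs: the high
 * and low halves of the dividend and the 32-bit divisor. Returns the
 * remainder and stores the 64-bit quotient in *quot. The 64-bit
 * temporaries stand in for register pairs on a 32-bit CPU.
 */
static uint32_t div64_32_loop(uint32_t hi, uint32_t lo, uint32_t base,
			      uint64_t *quot)
{
	uint64_t n = ((uint64_t)hi << 32) | lo;
	uint64_t q = 0;
	uint64_t rem = 0;	/* < 2 * base after each shift, never overflows */
	int i;

	assert(base != 0);
	for (i = 63; i >= 0; i--) {
		rem = (rem << 1) | ((n >> i) & 1);	/* bring down next bit */
		q <<= 1;
		if (rem >= base) {			/* divisor fits: subtract */
			rem -= base;
			q |= 1;
		}
	}
	*quot = q;
	return (uint32_t)rem;
}
```

Each of the 64 iterations is a shift, a compare and a conditional subtract, which is why a software 64 by 32 divide is likely to cost far more than a hardware one.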
David