Re: [PATCH 1/3] clk: microchip: core: update to use div64_ul() instead of do_div()

From: David Laight

Date: Tue Feb 24 2026 - 19:43:00 EST


On Tue, 24 Feb 2026 11:56:03 -0500
Brian Masney <bmasney@xxxxxxxxxx> wrote:

> Hi David,
>
> On Mon, Feb 23, 2026 at 09:09:48AM +0000, David Laight wrote:
> > On Sun, 22 Feb 2026 18:51:04 -0500
> > Brian Masney <bmasney@xxxxxxxxxx> wrote:
> >
> > > This driver is currently only compiled on 32-bit MIPS systems. When
> > > compiling on 64-bit systems, the build fails with:
> > >
> > > WARNING: do_div() does a 64-by-32 division, please consider using
> > > div64_ul instead.
> > >
> > > Let's update this to use div64_ul() in preparation for allowing this
> > > driver to be compiled on all architectures.
> >
> > There are a lot of 'long' in that code that hold clock frequencies.
> > I suspect they should be u32 (I think someone was scared that int might be 16bit).
>
> Instead of calling:
>
> do_div(frac, rate);
>
> Where frac is a u64, and rate is an unsigned long, I could just cast the
> rate to a u32 like this:
>
> do_div(frac, (u32) rate);
>
> Thoughts?

That cast is horrid :-)

On x86 (32bit or 64bit) I'm not sure it makes any difference whether
you use do_div() or div64_ul().
Other architectures will be different.

I originally thought that do_div() was a simpler wrapper on the x86
divide instruction - so required that both the quotient and remainder
be 32bits. But Linus corrected me saying it had always generated a
64bit quotient.
So on 32bit div64_ul() is pretty much the same code with the same timings.
The 'optimised' (and unusual) parameter rules are also pretty much
a waste of time: div takes 38/41 clocks on a '386 (I happen to have the
book on my desk!), so an extra register move wouldn't matter.
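
For anyone following along, the two calling conventions can be modelled
in user space roughly as below. This is only a sketch: sketch_do_div()
and sketch_div64_ul() are hypothetical stand-ins written for
illustration, not the kernel implementations, and the 32-bit divisor
type is chosen to match the '64-by-32' wording of the warning.

```c
#include <stdint.h>

/*
 * User-space stand-ins (hypothetical names), NOT the kernel code.
 *
 * sketch_do_div() mimics the do_div() convention: the 64-bit dividend
 * is overwritten with the quotient and the 32-bit remainder is returned.
 * The divisor is only 32 bits wide, so a wider value passed in would be
 * truncated by the caller's conversion.
 */
static inline uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/*
 * sketch_div64_ul() mimics the div64_ul() convention: the quotient is
 * returned and the dividend is left untouched. The divisor is a full
 * unsigned long.
 */
static inline uint64_t sketch_div64_ul(uint64_t dividend, unsigned long divisor)
{
	return dividend / divisor;
}
```

The practical difference for the patch is the one visible in the diff:
with the do_div() style the result lands in the first argument, with the
div64_ul() style it has to be assigned.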

Divide doesn't get much faster, 64 by 32 speeds up a bit, but you
have to get to cannon lake or zen3 to get a significant improvement.

zen3+ execute the 128 by 64 divide only slightly slower than 64 by 32.
64bit Intel is another matter entirely.
Even coffee lake has:
                                          reciprocal
          u-ops   -- ports --    latency  throughput
DIV r32   10 10   p0 p1 p5 p6    26       6
DIV r64   36 36   p0 p1 p5 p6    35-88    21-83
So the r64 (128 by 64) divide is a lot slower, and especially so
when it isn't really needed.

On 64bit x86 both do_div() and div64_ul() do a 128 by 64 divide,
even though do_div() would likely be faster using the 32bit sequence.
(Especially if the condition were fixed so that it used two divides
less often.)
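
A rough user-space model of what a '32bit sequence' looks like, for
illustration only. sketch_div_64_by_32() is a hypothetical name, and the
'high >= base' test is one reading of the 'fixed' condition mentioned
above, not a quote of any existing kernel code. On 32bit x86 the final
step would be a single DIV r32; here it is ordinary C.

```c
#include <stdint.h>

/*
 * Hypothetical model of a 64-by-32 divide built from 32-bit divide
 * steps. The dividend is replaced by the 64-bit quotient and the
 * 32-bit remainder is returned. base must be non-zero.
 */
static inline uint32_t sketch_div_64_by_32(uint64_t *n, uint32_t base)
{
	uint32_t high = (uint32_t)(*n >> 32);
	uint32_t low = (uint32_t)*n;
	uint32_t q_high = 0;
	uint64_t partial;

	/*
	 * The extra divide is only needed when the quotient does not fit
	 * in 32 bits, i.e. when the high word is at least the divisor.
	 */
	if (high >= base) {
		q_high = high / base;
		high %= base;
	}

	/* high < base here, so this quotient always fits in 32 bits. */
	partial = ((uint64_t)high << 32) | low;
	*n = ((uint64_t)q_high << 32) | (uint32_t)(partial / base);
	return (uint32_t)(partial % base);
}
```

When the high word is smaller than the divisor only one divide step
runs, which is the common case for clock-rate sized values.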

Still not sure of the numeric domain of 'frac'.
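
A small user-space model of the quoted arithmetic may help with both the
domain question and the earlier question about the cast. It assumes
'div' is a 32-bit unsigned value (not confirmed by the quoted hunk) and
that 'rate << 1' does not overflow; all function names are hypothetical.

```c
#include <stdint.h>

/* Cast applied after the shift: the shift is done in 32 bits and can wrap. */
static inline uint64_t shift_then_widen(uint32_t div)
{
	return (uint64_t)(div << 9);
}

/* Cast applied before the shift: the shift is done in 64 bits. */
static inline uint64_t widen_then_shift(uint32_t div)
{
	return (uint64_t)div << 9;
}

/*
 * Model of the hunk: frac = (parent_rate << 8) / rate - (div << 9),
 * with div = parent_rate / (rate << 1). rate must be non-zero.
 */
static inline uint64_t sketch_frac(uint64_t parent_rate, uint64_t rate)
{
	uint32_t div = (uint32_t)(parent_rate / (rate << 1));
	uint64_t frac = (parent_rate << 8) / rate;

	return frac - widen_then_shift(div);
}
```

Under those assumptions parent_rate / rate lies in [2 * div, 2 * div + 2),
so the result of sketch_frac() stays below 512. The two cast placements
only differ once div reaches 2^23, which clock dividers are unlikely to
do.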

David

>
> Brian
>
>
> >
> > >
> > > Reported-by: kernel test robot <lkp@xxxxxxxxx>
> > > Closes: https://lore.kernel.org/oe-kbuild-all/202601160758.bpkN4546-lkp@xxxxxxxxx/
> > > Signed-off-by: Brian Masney <bmasney@xxxxxxxxxx>
> > > ---
> > > drivers/clk/microchip/clk-core.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/clk/microchip/clk-core.c b/drivers/clk/microchip/clk-core.c
> > > index 692152b5094e00bf5acb19a67cf41e6c86b11f35..2e86ad846a66cd5487f5412c09ab0ad25ebe3f79 100644
> > > --- a/drivers/clk/microchip/clk-core.c
> > > +++ b/drivers/clk/microchip/clk-core.c
> > > @@ -341,7 +341,7 @@ static void roclk_calc_div_trim(unsigned long rate,
> > > div = parent_rate / (rate << 1);
> > > frac = parent_rate;
> > > frac <<= 8;
> > > - do_div(frac, rate);
> > > + frac = div64_ul(frac, rate);
> > > frac -= (u64)(div << 9);
> >
> > Is that cast in the right place?
> > I suspect 'div' can't be large enough to need it, but its presence makes
> > me wonder ...
> >
> > David
> >
> > >
> > > rodiv = (div > REFO_DIV_MASK) ? REFO_DIV_MASK : div;
> > >
> >
>