Re: [PATCH] drm/rcar-du: dsi: Clean up VCLK divider calculation
From: David Laight
Date: Thu Jan 01 2026 - 18:44:50 EST
On Fri, 2 Jan 2026 00:25:54 +0100
Marek Vasut <marek.vasut@xxxxxxxxxxx> wrote:
> On 1/2/26 12:13 AM, David Laight wrote:
> > On Thu, 1 Jan 2026 22:26:34 +0100
> > Marek Vasut <marek.vasut@xxxxxxxxxxx> wrote:
> >
> >> On 1/1/26 12:42 PM, David Laight wrote:
> >>
> >> Hello David,
> >>
> >>>> static void rcar_mipi_dsi_pll_calc(struct rcar_mipi_dsi *dsi,
> >>>> unsigned long fin_rate,
> >>>> unsigned long fout_target,
> >>>> - struct dsi_setup_info *setup_info)
> >>>> + struct dsi_setup_info *setup_info,
> >>>> + u16 vclk_divider)
> >>>
> >>> There is no need for this to be u16, unsigned int will generate better code.
> >>
> >> Can you please elaborate on the "better code" part ?
> >
> > Basically the compiler has to generate extra code to ensure the value
> > doesn't exceed 65535.
> > So there will be a mask of the function parameter (passes in a 32bit register).
> > Any arithmetic needs similar masking of the result.
> > You won't see the latter (as much) on x86 (or m68k) because the compiler
> > can use the C 'as if' rule and use the cpu's 8/16 bit registers and alu.
> > But pretty much no other cpu can do that, so extra instructions are needed
> > to stop the value (in a 32bit register) exceeding the limit for the variable.
> >
> > Remember that while C variables can be char/short the values they contain
> > are promoted to 'signed int' before (pretty much) anything is done with
> > the value, any calculated value is then truncated before being assigned back.
> > For on-stack values this is fine - but modern compilers was to keep values
> > in registers as much as possible.
> >
> >>
> >>>> {
> >>>> unsigned int best_err = -1;
> >>>> const struct rcar_mipi_dsi_device_info *info = dsi->info;
> >>>> @@ -360,7 +373,7 @@ static void rcar_mipi_dsi_pll_calc(struct rcar_mipi_dsi *dsi,
> >>>> if (fout < info->fout_min || fout > info->fout_max)
> >>>> continue;
> >>>>
> >>>> - fout = div64_u64(fout, setup_info->vclk_divider);
> >>>> + fout = div64_u64(fout, vclk_divider);
> >>>
> >>> Since vclk_divider is BIT_U32(div [+ 1]) this is just a shift right.
> >>> So pass the bit number instead.
> >>
> >> While I agree this is a shift in the end, it also makes the code harder
> >> to understand. I can add the following change into V2, but I would like
> >> input from Tomi/Laurent on whether we should really push the
> >> optimization in that direction:
> >
> > The shift really is a lot faster.
> > Even on 64bit x86 cpus it can be slow - 35-88 clocks prior to 'Cannon Lake'.
> > AMD cpu get better with Zen3, and using a 64bit divide with 32bits values
> > doesn't slow things down (it does on the Intel cpu).
> > Don't even think about what happens on 32bit cpus.
> > I don't know about other architectures.
> > Just comment as 'x >> n' rather than 'x / (1 << n)'
>
> Please note that this code is built primarily for arm64 , so discussing
> x86 or m68k optimizations doesn't make much sense here.
ARM definitely only has 32 and 64bit maths - so you will see masking
instructions for arithmetic on char/short values (unless the compiler
can tell that the values cannot get too large.)
Divide performance is cpu dependant - the only arm cpu I've used only
had software divide (but a fast barrel shifter).
If you think that Intel haven't thrown much silicon at integer divide
it isn't that likely that arm will have a divide that is much faster
than 1 clock for each bit in the quotient.
(Of course I might be wrong...)
David