Re: [PATCH] __div64_32: implement division by multiplication for 32-bit arches

From: Nicolas Pitre
Date: Thu Nov 05 2015 - 00:06:52 EST

On Thu, 5 Nov 2015, Vineet Gupta wrote:

> On Thursday 05 November 2015 05:18 AM, Nicolas Pitre wrote:
> > On Wed, 4 Nov 2015, Nicolas Pitre wrote:
> >
> >> On Fri, 30 Oct 2015, Måns Rullgård wrote:
> >>
> >>> Nicolas Pitre <nicolas.pitre@xxxxxxxxxx> writes:
> >>>
> >>>> I'm going to do it anyway given that I already have it for ARM. It'll
> >>>> be opt-in, so if your arch doesn't provide it then the current C
> >>>> implementation will be used by default.
> >>> Great. I'll try it out on MIPS once you've posted the patch.
> >> You should have seen the patches by now.
> >>
> >> I've put them along with a bunch of do_div() usage fixes here:
> >>
> >>
> > More precisely:
> >
> > div64
> Hi Nico,
> While we are currently on the topic I was wondering about another optimization in
> this area.
> The slowpath __div64_32() generates both quotient and remainder. The more general
> use case in kernel only cares about quotient.
> git grep "\sdo_div(" | wc -l
> 841
> git grep "=\sdo_div(" | wc -l
> 116
> Is it possible to optimize the code if the remainder is *not* needed explicitly? I
> understand that the hand divide will still need some sort of running tally of
> remainder, but can the code be any better in this case? That way we can introduce
> another API do_div_norem() and start proliferating it for the cases where
> remainder is not used.

I don't think you'll be able to optimize the code much. If you look at
the division loop, you always have the current remainder to process, as
you say, so when you can't subtract from the remainder anymore you
simply return that value. And on ARM we simply tell the calling code
which register contains the remainder if it wants it. Therefore on ARM
the code would be exactly the same in either case.
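
To illustrate the point, here is a rough sketch of a shift-and-subtract
division loop in plain C. It is a simplified illustration, not the ARM
assembly or the exact generic kernel code; the name sketch_div64_32() is
made up for this example and base is assumed to be non-zero.

```c
#include <stdint.h>

/*
 * Hypothetical helper for illustration only: divides *n by base,
 * stores the quotient back in *n and returns the remainder.
 * base is assumed to be non-zero.
 */
static uint32_t sketch_div64_32(uint64_t *n, uint32_t base)
{
	uint64_t rem = *n;	/* running remainder, needed by the loop itself */
	uint64_t b = base;	/* divisor, scaled up below */
	uint64_t res = 0;	/* quotient being accumulated */
	uint64_t d = 1;		/* quotient bit that matches the current b */

	/* Scale the divisor up until it reaches rem or its top bit is set. */
	while ((int64_t)b > 0 && b < rem) {
		b <<= 1;
		d <<= 1;
	}

	/* Subtract the scaled divisor wherever it still fits. */
	do {
		if (rem >= b) {
			rem -= b;
			res += d;
		}
		b >>= 1;
		d >>= 1;
	} while (d);

	*n = res;
	/* Nothing more can be subtracted: what is left is the remainder. */
	return (uint32_t)rem;
}
```

The rem variable has to be maintained on every iteration whether or not
the caller wants it, so a quotient-only variant of this loop would save
nothing but the final return of a value that is already there.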