Re: [PATCH] sched: Optimize __calc_delta.

From: Peter Zijlstra
Date: Wed Mar 03 2021 - 09:33:15 EST


On Tue, Mar 02, 2021 at 12:55:07PM -0800, Josh Don wrote:
> On Fri, Feb 26, 2021 at 1:03 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Feb 26, 2021 at 11:52:39AM -0800, Josh Don wrote:
> > > From: Clement Courbet <courbet@xxxxxxxxxx>
> > >
> > > A significant portion of __calc_delta time is spent in the loop
> > > shifting a u64 by 32 bits. Use a __builtin_clz instead of iterating.
> > >
> > > This is ~7x faster on benchmarks.
> >
> > Have you tried on hardware without such fancy instructions?
>
> Was not able to find any on hand unfortunately. Clement did rework the
> patch to use fls() instead, and has benchmarks for the generic and asm
> variations. All of which are faster than the loop. In my next reply,
> I'll include the updated patch inline.

Excellent; I have a vague memory of fls() ending up slower on some ARMs,
but I can't remember enough of it to even Google it :/
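
For illustration only, the idea under discussion can be sketched roughly as
below. This is not the patch itself: the helper names normalize_loop() and
normalize_clz() are hypothetical, and the sketch assumes a compiler that
provides __builtin_clz (GCC or Clang) and a 32-bit unsigned int. Both helpers
reduce a 64-bit factor until it fits in 32 bits and lower the shift count by
the number of bits dropped; the second one computes that number in one step,
which is what fls() on the upper word would return.

```c
#include <stdint.h>

/*
 * Hypothetical helper: shift 'fact' right one bit at a time until it
 * fits in 32 bits, decrementing '*shift' once per dropped bit.
 */
static inline uint64_t normalize_loop(uint64_t fact, int *shift)
{
	while (fact >> 32) {
		fact >>= 1;
		(*shift)--;
	}
	return fact;
}

/*
 * Hypothetical helper: same result without iterating. The number of
 * significant bits in the upper 32-bit word is 32 - clz(hi), i.e. the
 * value fls(hi) would give. __builtin_clz(0) is undefined, so the
 * zero case is checked first.
 */
static inline uint64_t normalize_clz(uint64_t fact, int *shift)
{
	uint32_t hi = (uint32_t)(fact >> 32);

	if (hi) {
		int excess = 32 - __builtin_clz(hi);

		fact >>= excess;
		*shift -= excess;
	}
	return fact;
}
```

On hardware without a count-leading-zeros instruction the builtin falls back
to a software routine, so the relative cost there would have to be measured
rather than assumed.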