Re: [PATCH] lib/int_sqrt.c: Optimize square root function

From: Linus Torvalds
Date: Thu Feb 05 2015 - 13:20:34 EST


On Tue, Feb 3, 2015 at 7:42 AM, Anshul Garg <aksgarg1989@xxxxxxxxx> wrote:
>
> I have done profiling of int_sqrt function using perf tool for 10 times.
> For this purpose i have created a userspace program which uses sqrt function
> from 1 to a million.

Hmm. I did that too, and it doesn't improve things for me. In fact, it
makes it slower.

[torvalds@i7 ~]$ gcc -Wall -O2 -DREDUCE int_sqrt.c ; time ./a.out

real 0m2.098s
user 0m2.095s
sys 0m0.000s

[torvalds@i7 ~]$ gcc -Wall -O2 int_sqrt.c ; time ./a.out

real 0m1.886s
user 0m1.883s
sys 0m0.000s

and the profile shows that 35% of the time is spent on that branch
back of the initial reduction loop.

In contrast, my suggested "reduce just once" does seem to improve things:

[torvalds@i7 ~]$ gcc -Wall -O2 -DONCE int_sqrt.c ; time ./a.out

real 0m1.436s
user 0m1.434s
sys 0m0.000s

but it's kind of hacky.

NOTE! This probably depends a lot on microarchitecture details,
including very much branch predictor etc. And I didn't actually check
that it gives the right result, but I do think that this optimization
needs to be looked at more if we want to do it.

I was running this on an i7-4770S, fwiw.

Attached is the stupid test-program I used to do the above. Maybe I
did something wrong.

Linus
/*
* Copyright (C) 2013 Davidlohr Bueso <davidlohr.bueso@xxxxxx>
*
* Based on the shift-and-subtract algorithm for computing integer
* square root from Guy L. Steele.
*/

#define BITS_PER_LONG (8*sizeof(long))

/**
* int_sqrt - rough approximation to sqrt
* @x: integer of which to calculate the sqrt
*
* A very rough approximation to the sqrt() function.
*/
unsigned long __attribute__((noinline)) int_sqrt(unsigned long x)
{
unsigned long b, m, y = 0;

if (x <= 1)
return x;

m = 1UL << (BITS_PER_LONG - 2);

#ifdef REDUCE
while (m > x)
m >>= 2;
#elif defined(ONCE)
{
unsigned long n = m >> (BITS_PER_LONG/2);
m = (n >= x) ? n : m;
}
#endif
while (m != 0) {
b = y + m;
y >>= 1;

if (x >= b) {
x -= b;
y += m;
}
m >>= 2;
}

return y;
}

int main(int argc, char **argv)
{
unsigned long i;

for (i = 0; i < 100000000; i++) {
unsigned long a = int_sqrt(i);
asm volatile("": : "r" (a));
}
return 0;
}