[PATCH] Optimize int_sqrt for small values for faster idle
From: Andi Kleen
Date: Thu Jan 28 2016 - 16:43:08 EST
From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
The menu cpuidle governor does at least two int_sqrt() each time
we go into idle in get_typical_interval to compute stddev
int_sqrts take 100-120 cycles each. Short idle latency is important
for many workloads.
I instrumented the function on my workstation and most values are
16bit only and most others 32bit (50% percentile is 122094,
75% is 3699533).
sqrt is implemented by starting with an initial estimation,
and then iterating. int_sqrt currently only uses a fixed
estimating which is good for 64bits worth of input.
This patch adds some checks at the beginning to start with
a better estimate for values fitting in 8, 16bit and 32bit.
This makes int_sqrt between 60+% faster for values in 16bit,
and still somewhat faster (between 10 and 30%) for larger values
upto 32bit. Full 64bit is slightly slower.
This optimizes the short idle calls and does not hurt the
long sleep (which probably do not care) much.
An alternative would be a full table drive approach, or
trying some inverted sqrt optimization, but this simple change
already seems to have a good payoff.
Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
---
lib/int_sqrt.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/lib/int_sqrt.c b/lib/int_sqrt.c
index 1ef4cc3..2479ccf 100644
--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -21,7 +21,15 @@ unsigned long int_sqrt(unsigned long x)
if (x <= 1)
return x;
- m = 1UL << (BITS_PER_LONG - 2);
+ if (x <= 0xffff) {
+ if (m <= 0xff)
+ m = 1UL << (8 - 2);
+ else
+ m = 1UL << (16 - 2);
+ } else if (x <= 0xffffffff)
+ m = 1UL << (32 - 2);
+ else
+ m = 1UL << (BITS_PER_LONG - 2);
while (m != 0) {
b = y + m;
y >>= 1;
--
2.4.3