delay_tsc(): inefficient delay loop (2.6.16-mm1)

From: Andreas Mohr
Date: Fri Mar 24 2006 - 12:01:47 EST


Hello all,

I discovered that the delay loop used there (which simply found a new place
in 2.6.16-mm1; it existed much longer) is inefficient,
since it does an unnecessary subtraction *in the main loop* which also hits
asm code:

00000024 <delay_tsc>:
24: 53 push %ebx
25: 89 c3 mov %eax,%ebx
27: 0f 31 rdtsc
29: 89 c1 mov %eax,%ecx
2b: f3 90 pause
2d: 0f 31 rdtsc
2f: 29 c8 sub %ecx,%eax
31: 39 d8 cmp %ebx,%eax
33: 72 f6 jb 2b <delay_tsc+0x7>
35: 5b pop %ebx
36: c3 ret

With such a patch:

--- linux-2.6.16-mm1/arch/i386/lib/delay.c.orig 2006-03-23 12:52:45.000000000 +0100
+++ linux-2.6.16-mm1/arch/i386/lib/delay.c 2006-03-24 12:53:01.000000000 +0100
@@ -44,10 +44,12 @@
unsigned long bclock, now;

rdtscl(bclock);
+ /* offset with bclock to have very simple comparison below */
+ loops += bclock;
do {
rep_nop();
rdtscl(now);
- } while ((now-bclock) < loops);
+ } while (now < loops);
}

/*

the result is:

00000024 <delay_tsc>:
24: 89 c1 mov %eax,%ecx
26: 0f 31 rdtsc
28: 01 c1 add %eax,%ecx
2a: f3 90 pause
2c: 0f 31 rdtsc
2e: 39 c8 cmp %ecx,%eax
30: 72 f8 jb 2a <delay_tsc+0x6>
32: c3 ret

Improvement: no unnecessary stuff after having hit the timer target value
(read: faster), better power saving, 4 bytes less opcodes.

The patch above could be considered weird since it fiddles with "loops"
directly. A possibly cleaner way would be to introduce a new variable
end = bclock + loops.

Comments? Anything that I'm missing here as to why it hasn't been done that
way before?

If this holds water then I'll submit a final patch soon.

Thanks!

Andreas Mohr
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/