Re: Solved; kernel freezes with egcs -O3 in pre-2.1.126

Colin Plumb (colin@nyx.net)
Sun, 18 Oct 1998 10:33:43 -0600 (MDT)


The problem is that gcc is less strict than it should be about clobbers.
In particular, under high register pressure it sometimes re-uses clobbered
registers rather than giving an error. (This is why you used to be able to
get away with specifying an input register that was clobbered. You never
*specify* an input register, just specify a one-register class that the
input must be assigned to. GCC should have (but didn't) complained that
every register in that class was already taken.)

What works reliably is using earlyclobbered dummy outputs with appropriate
constraints.

Here's a patch to pre-2.1.126 that is a minimal fix. Changing that
constant to an immediate avoids the problem entirely, and just leaving
the return value in %eax makes all kinds of sense.

--- time.c Sun Oct 18 08:17:30 1998
+++ time.c Sun Oct 18 08:21:55 1998
@@ -563,12 +563,11 @@
"movl %%eax, %%ecx\n\t"
"xorl %%eax, %%eax\n\t"
"movl %1, %%edx\n\t"
- "divl %%ecx\n\t" /* eax= 2^32 / (1 * TSC counts per microsecond) */
+ "divl %%ecx" /* eax= 2^32 / (1 * TSC counts per microsecond) */
/* Return eax for the use of fast_gettimeoffset */
- "movl %%eax, %0\n\t"
- : "=r" (retval)
- : "r" (5 * 1000020/HZ)
- : /* we clobber: */ "ax", "bx", "cx", "dx", "cc", "memory");
+ : "=&a" (retval)
+ : "I" (5 * 1000020/HZ)
+ : /* we clobber: */ "bx", "cx", "dx", "cc", "memory");
return retval;
}

Now, I can also make a more ambitious patch, letting gcc do more of the
work. (And changing the 7 space tab to a real tab while I'm at it.)

I also broke the block up into the time-critical part and the
non-time-critical part, saving a register by not remembering
the high 32 bits of the initial rdtsc.

I still think the 64-bit constant used in the last divide could be
computed more accurately. You want 2^32 * elapsed microseconds,
which should be

(1<<32)*5*LATCH*1000000/CLOCK_TICK_RATE

I.e. 214751964397123 = 0xC350D68D7E43.
The value you use is 214752659767296 = C35100000000.

The intermediate value before the CLOCK_TICK_RATE just barely fits
into 64 bits if you use CLOCK_TICK_FACTOR to scale it.

And finally, one last dumb question: why bother using a multiple of
LATCH for the time value? You could just count 65536 clock ticks
and adjust just the same. This would make it independent of
HZ. Currently it'll explode if HZ is decreased.

-- 
	-Colin

--- time.c Sun Oct 18 08:17:30 1998 +++ time.c Sun Oct 18 09:26:32 1998 @@ -508,68 +508,84 @@ __initfunc(static unsigned long calibrate_tsc(void)) { - unsigned long retval; + unsigned long retval; + unsigned long before, after; + unsigned long edx; /* Dummy variable */ + unsigned long long scaled_us; - __asm__( /* set the Gate high, program CTC channel 2 for mode 0 + __asm__(/* set the Gate high, program CTC channel 2 for mode 0 * (interrupt on terminal count mode), binary count, * load 5 * LATCH count, (LSB and MSB) * to begin countdown, read the TSC and busy wait. - * BTW LATCH is calculated in timex.h from the HZ value + * BTW LATCH is calculated in timex.h from the HZ value, + * and 5*LATCH is the most that will fit into 16 bits. */ - /* Set the Gate high, disable speaker */ - "inb $0x61, %%al\n\t" /* Read port */ - "andb $0xfd, %%al\n\t" /* Turn off speaker Data */ - "orb $0x01, %%al\n\t" /* Set Gate high */ - "outb %%al, $0x61\n\t" /* Write port */ - - /* Now let's take care of CTC channel 2 */ - "movb $0xb0, %%al\n\t" /* binary, mode 0, LSB/MSB, ch 2*/ - "outb %%al, $0x43\n\t" /* Write to CTC command port */ - "movb $0x0c, %%al\n\t" - "outb %%al, $0x42\n\t" /* LSB of count */ - "movb $0xe9, %%al\n\t" - "outb %%al, $0x42\n\t" /* MSB of count */ - - /* Read the TSC; counting has just started */ - "rdtsc\n\t" - /* Move the value for safe-keeping. */ - "movl %%eax, %%ebx\n\t" - "movl %%edx, %%ecx\n\t" - - /* Busy wait. Only 50 ms wasted at boot time. */ - "0: inb $0x61, %%al\n\t" /* Read Speaker Output Port */ - "testb $0x20, %%al\n\t" /* Check CTC channel 2 output (bit 5) */ - "jz 0b\n\t" - - /* And read the TSC. 5 jiffies (50.00077ms) have elapsed. */ - "rdtsc\n\t" - - /* Great. So far so good. Store last TSC reading in - * last_tsc_low (only 32 lsb bits needed) */ - "movl %%eax, last_tsc_low\n\t" - /* And now calculate the difference between the readings. */ - "subl %%ebx, %%eax\n\t" - "sbbl %%ecx, %%edx\n\t" /* 64-bit subtract */ - /* but probably edx = 0 at this point (see below). */ - /* Now we have 5 * (TSC counts per jiffy) in eax. We want - * to calculate TSC->microsecond conversion factor. */ - - /* Note that edx (high 32-bits of difference) will now be - * zero iff CPU clock speed is less than 85 GHz. Moore's - * law says that this is likely to be true for the next - * 12 years or so. You will have to change this code to - * do a real 64-by-64 divide before that time's up. */ - "movl %%eax, %%ecx\n\t" - "xorl %%eax, %%eax\n\t" - "movl %1, %%edx\n\t" - "divl %%ecx\n\t" /* eax= 2^32 / (1 * TSC counts per microsecond) */ - /* Return eax for the use of fast_gettimeoffset */ - "movl %%eax, %0\n\t" - : "=r" (retval) - : "r" (5 * 1000020/HZ) - : /* we clobber: */ "ax", "bx", "cx", "dx", "cc", "memory"); - return retval; + /* Set the Gate high, disable speaker */ + "inb $0x61, %%al\n\t" /* Read port */ + "andb $0xfd, %%al\n\t" /* Turn off speaker Data */ + "orb $0x01, %%al\n\t" /* Set Gate high */ + "outb %%al, $0x61\n\t" /* Write port */ + + /* Now let's take care of CTC channel 2 */ + "movb $0xb0, %%al\n\t" /* binary, mode 0, LSB/MSB, ch 2*/ + "outb %%al, $0x43\n\t" /* Write to CTC command port */ + "movb %3, %%al\n\t" + "outb %%al, $0x42\n\t" /* LSB of count */ + "movb %4, %%al\n\t" + "outb %%al, $0x42\n\t" /* MSB of count */ + + /* Read the TSC; counting has just started */ + "rdtsc\n\t" + /* Remeber the value for later (only 32 bits - see below) */ + "movl %%eax, %0\n" + + /* Busy wait. Only 50 ms wasted at boot time. */ + "0:\tinb $0x61, %%al\n\t" /* Read Speaker Output Port */ + "testb $0x20, %%al\n\t" /* Check CTC channel 2 output (bit 5) */ + "jz 0b\n\t" + + /* And read the TSC. 5 jiffies (50.00077ms) have elapsed. */ + "rdtsc" + : "=r" (before), "=&a" (after), "=d" (edx), + : "g" (unsigned char)(5*LATCH), + "g" (unsigned char)(5*LATCH/256), + : "cc"); + /* + * We only return 32 bits because the difference will fit as long + * as the CPU clock speed is less than 85 GHz. Moore's law says + * that this is likely to be true for the next 12 years or so. + * You will have to change this code to do a real 64-by-64 divide + * before that time's up. + */ + + /* Store last TSC reading in last_tsc_low (only 32 bits needed) */ + last_tsc_low = after; + + /* + * We want here 2^32 * elapsed microseconds, i.e. + * 2^32 * 5 * LATCH * (1000000/CLOCK_TICK_RATE). This can be + * computed to full precision if we use CLOCK_TICK_FACTOR, a factor + * of both 1000000 and CLOCK_TICK_RATE to reduce the size of the + * numbers so the numerator (barely!) fits into 64 bits. + * (It's B1CCF7C0 00000000) + */ + scaled_us = 5u*LATCH*(1000000/CLOCK_TICK_FACTOR) * (1ull<<32) / + (CLOCK_TICK_RATE/CLOCK_TICK_FACTOR); + + /* + * 64/32-bit divide to compute TSC->microseconds conversion factor. + * (GCC does it inefficiently, ugh, because it doesn't know the + * result won't overflow.) + */ + __asm__ ( + "divl %2" + : "=a" (retval), "=d" (edx) + : "g" (after-before), "a" ((unsigned)scaled_us), + "d" ((unsigned)(scaled_us>>32)) + : "cc"); + + return retval; } __initfunc(void time_init(void))

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/