Re: A new K6 bug

Benoit Poulot-Cazajous (poulot@sunchorus.france.sun.com)
01 Jun 1998 04:06:22 +0200


The problem reported by andreas@camus.xss.co.at with crashme is really caused
by a K6 bug. It can be reproduced at will on 2.0.xx kernels. It looks like
2.1.xx kernels hide the bug.

Here is how to reproduce it:

$ cat a.s
        .text
        .align 4096                     /* r1 */
        .globl _start
_start:
        movl _start, %edi               /* S1 */
        cmpb 0x80000000(%edi),%dl       /* r2, S2 */
        je nowhere                      /* r3 */
        ret
$ as -o a.o a.s
$ ld -defsym nowhere=0xc0000000 a.o
$ ./a.out
<lockup. hard reset required>

Remarks:
r1) _start must be aligned on a 4096-byte boundary, otherwise you get a
segfault instead of a lockup.
r2) Using movb instead of cmpb does not trigger the lockup.
r3) The je tries to escape the code segment. Before 2.1.43, the code segment
ended at bfffffff. From 2.1.43 on, escaping is not possible, because
the code segment covers the whole address space (reducing this segment
to 3.75 GB makes it possible to trigger the bug on 2.1.103).
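
To make r3 concrete, here is a small Python sketch of the limit check, assuming
a flat code segment with base 0. The helper name and the constant names are
hypothetical; the limits are the ones quoted in r3, and this models only the
"is the offset past the limit" test, not the real CPU behaviour.

```python
# Sketch only: models "offset <= segment limit" for a flat segment (base 0).
# inside_code_segment and the constant names are hypothetical helpers.

def inside_code_segment(offset: int, limit: int) -> bool:
    """Return True if a 32-bit offset does not exceed the segment limit."""
    return 0 <= offset <= limit

NOWHERE = 0xC0000000               # jump target given to ld via -defsym

LIMIT_BEFORE_2_1_43 = 0xBFFFFFFF   # code segment ended at bfffffff
LIMIT_FROM_2_1_43 = 0xFFFFFFFF     # code segment covers the whole 4 GB
LIMIT_3_75_GB = 0xF0000000 - 1     # 3.75 GB = 0xf0000000 bytes

for name, limit in [
    ("before 2.1.43", LIMIT_BEFORE_2_1_43),
    ("2.1.43 and later", LIMIT_FROM_2_1_43),
    ("3.75 GB segment", LIMIT_3_75_GB),
]:
    verdict = "inside" if inside_code_segment(NOWHERE, limit) else "outside"
    print(f"{name}: limit {limit:#010x}, target {NOWHERE:#010x} is {verdict}")
```

Under these assumptions, c0000000 is outside the segment only with the
pre-2.1.43 limit. With a 3.75 GB segment it is still inside, so a target
above efffffff would presumably be needed there; the remark does not say
which target was used on 2.1.103.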

Speculations:
S1) edi must be loaded with the address of something in a deep cache on the
CPU. _start works well.
S2) The cmpb tries to access an invalid address. This address should look
like an already cached address. If only the highest bits differ, it is
probably more difficult for the CPU to notice that the address is not
really cached. So using _start+0x80000000 works well.
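
The address arithmetic behind S2 can be sketched in Python. This assumes edi
holds the address of _start, as S1 describes; the value of START below is a
hypothetical page-aligned address chosen for illustration, not one taken from
the actual a.out.

```python
# Sketch only: shows that base + 0x80000000 differs from base in bit 31 alone.
# START is a hypothetical page-aligned address, not read from a real binary.

MASK32 = 0xFFFFFFFF

def displaced(base: int, disp: int = 0x80000000) -> int:
    """Effective address of disp(%edi) with 32-bit wraparound."""
    return (base + disp) & MASK32

START = 0x08049000            # assumed address of _start
target = displaced(START)
differing_bits = START ^ target

print(f"_start            = {START:#010x}")
print(f"_start+0x80000000 = {target:#010x}")
print(f"differing bits    = {differing_bits:#010x}")
```

Only the top bit changes, so the low 31 bits of the invalid address match
those of the address that was just accessed, which is the situation S2
speculates about.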

I don't know whether this bug is already fixed in recent revisions of the K6.
I was able to crash a K6 bought only a month ago, so AMD may not be aware
of the problem.

Is there anybody out there willing to propagate the 2.1.43 change to
pre-2.0.34?

-- Benoit

PS: How large is the code segment on NT ? ;-)
