Re: 2.0.30 (and pre-2.0.31) OOPS - repeatable, reliable, disgraceful :-)

Linus Torvalds (torvalds@transmeta.com)
30 May 1997 05:00:16 GMT


In article <199705300035.TAA00585@eliot213.wuh.wustl.edu>,
Evan Jeffrey <ejeffrey@eliot213.wuh.wustl.edu> wrote:
>>
>>May 29 14:54:45 ferret kernel: general protection: 0000
>...
>>Methinks this should be fixed.... patches cc: to this address please,
>>since we're about to reboot after a memory upgrade....
>
>I posted about this in 2.1.38, in which case it locks the system rock solid
>for approximately 5 minutes, at which point I get the segfault and
>everything continues normally. I can confirm that it still happens in
>2.1.41, and in addition, a very interesting property: the lock is VERY
>hard, ie my clock is now 10 minutes slow (I just did it twice...) I have no
>idea if this is normal or not, but I thought it was interesting.

Oops.

The lock happens because the later 2.1.x kernels have a delay loop in
the panic printout routine, and that delay loop is "kind of" long. It
was supposed to last something like 20 seconds on a 200MHz PPro, so I can
well imagine that it takes ten minutes on a slower machine.

Fix: just delete the two _long_ while-loops in arch/i386/kernel/traps.c
(die_if_kernel()). That was just a debugging aid that was never supposed
to be released but which I didn't notice in my diffs ;)

>May 29 19:22:31 eliot213 kernel: general protection: 0000
>May 29 19:22:31 eliot213 kernel: CPU: 0
>May 29 19:22:31 eliot213 kernel: EIP: 0010:[<c01095f6>]
>May 29 19:22:31 eliot213 kernel: EFLAGS: 00010282
>May 29 19:22:31 eliot213 kernel: eax: 00000000 ebx: 0805ba78 ecx: 0805baa8 edx: 00000000
>May 29 19:22:31 eliot213 kernel: esi: 0805baa8 edi: bffffd3a ebp: bffffc28 esp: c0329fec
>May 29 19:22:31 eliot213 kernel: ds: 002b es: 002b ss: 0018
>May 29 19:22:31 eliot213 kernel: Process vmlinux (pid: 429, process nr: 41, stackpage=c0329000)
>May 29 19:22:31 eliot213 kernel: Stack: c0100000 00000023 00000282 bffffbb4 0000002b
>May 29 19:22:31 eliot213 kernel: Call Trace: [<c0100000>]
>May 29 19:22:31 eliot213 kernel: Code: cf 89 f6 8d bc 27 00 00 00 00 f7 44 24 30 00 00 02 00 54 75
>
>>>EIP: c01095f6 <ret_with_reschedule+25/2f>
>Trace: c0100000 <startup_32>

This is a pretty harmless panic in itself - it's just the delay loop
which makes it bothersome. The return to user mode is returning with an
EIP that is out of range for the code segment. It's not pretty, but it
is totally harmless (it is caused by the fact that the "initial eip" for
the kernel image is outside the normal user space address space, and the
kernel doesn't actually check it in execve() because it knows the thing
will be caught later).

This patch should fix it (NOTE! Do NOT EVER apply this patch to the
2.0.x tree: it's a major security hole in the 2.0.x series. It only
works on 2.1.x due to the new user level access scheme).

Totally untested, of course,

Linus

----- Apply to 2.1.x ONLY -----
diff -u --recursive --new-file v2.1.42/linux/arch/i386/kernel/head.S linux/arch/i386/kernel/head.S
--- v2.1.42/linux/arch/i386/kernel/head.S Tue May 13 22:41:00 1997
+++ linux/arch/i386/kernel/head.S Thu May 29 21:52:23 1997
@@ -532,8 +532,8 @@
.quad 0x0000000000000000 /* not used */
.quad 0x00cf9a000000ffff /* 0x10 kernel 4GB code at 0x00000000 */
.quad 0x00cf92000000ffff /* 0x18 kernel 4GB data at 0x00000000 */
- .quad 0x00cbfa000000ffff /* 0x23 user 3GB code at 0x00000000 */
- .quad 0x00cbf2000000ffff /* 0x2b user 3GB data at 0x00000000 */
+ .quad 0x00cffa000000ffff /* 0x23 user 4GB code at 0x00000000 */
+ .quad 0x00cff2000000ffff /* 0x2b user 4GB data at 0x00000000 */
.quad 0x0000000000000000 /* not used */
.quad 0x0000000000000000 /* not used */
.fill 2*NR_TASKS,8,0 /* space for LDT's and TSS's etc */
diff -u --recursive --new-file v2.1.42/linux/arch/i386/kernel/traps.c linux/arch/i386/kernel/traps.c
--- v2.1.42/linux/arch/i386/kernel/traps.c Tue May 13 22:41:01 1997
+++ linux/arch/i386/kernel/traps.c Thu May 29 21:45:21 1997
@@ -191,8 +191,6 @@
spin_lock_irq(&die_lock);
printk("%s: %04lx\n", str, err & 0xffff);
show_registers(regs);
-do { int i=2000000000; while (i) i--; } while (0);
-do { int i=2000000000; while (i) i--; } while (0);
spin_unlock_irq(&die_lock);
do_exit(SIGSEGV);
}
----- Apply to 2.1.x ONLY -----