> Hi,
>
> could everyone involved with this discussion please check out:
>
> ftp://ftp.kernel.org/pub/linux/kernel/testing/
> pre-patch-2.1.99-1.gz
>
> which is a cleanup of my previous patches wrt irq handling, and also fixes
> a real bug (we used to ACK the io-apic outside the irq-controller lock,
> which meant that the ack's we did and "ipi_pending[]" might have gotten
> out of sync - which could certainly have resulted in bad behaviour).
>
> This also re-enables the code that replays interrupts in enable_irq(),
> because it should be ok now that the rest of the code is cleaned up.
> People that had the earlier problem with locking up with floppies, please
> test: if this re-introduces the lockup, please just #if 0 out all the code
> inside trigger_pending_irqs(), and send me a note telling me that that
> code still doesn't work.
Hello,
I have just rebuilt the kernel from a clean 2.1.98 + pre-patch-2.1.99-1 tree.
Only the printer driver and the PPP compressors are compiled as modules.
I am now running that kernel and looking hard for trouble.
My hardware is:
Tyan Tomcat IIID (Dual P200)
32 MB ram
80 MB swap
ide0(2xHD)
ide1(1xHD+1xCDROM)
ncr53c810(1xHD)
Audio Excel DSP 16 (MSS emulation)
4 serial (1xUPS control; 1xmouse; 1xinternal modem; 1xhandheld backup)
1 parallel (driver modularized, not loaded)
I'm running:
X11
KDE
kernel recompilation (make -j3)
Kmidi (midi2wav realtime converter and output to /dev/dsp)
pine (big mailbox, lot'o'memory consumed)
The compilation finished fine in 23 minutes and nothing went wrong.
Then I started to play with floppy disks, and the trouble began.
I mounted /dev/fd0 (1.44M) and /dev/fd1 (1.2M) and, with just a few bytes
written to fd0 ... *ka-boom* !!
The machine was totally frozen (hmm, but X11 hides the text console).
So I rebooted in text mode .. and tested _only_ the floppy driver.
Mounted fd0 and fd1 and .. after some writes .. *ka-boom* !!
But this time I could see that alt-sysrq-* works ... *ah-ehm* .. sort of:
sysrq-P works, and sysrq-B is okay (it rebooted the PC), but sysrq-U did not
work at all (it told me that the disks were remounted R/O, but my bcheckrc
still fsck'ed all the partitions).
sysrq-P told me that the kernel was trapped in an (I guess) infinite loop.
Let's see where.
Hmm, I have to admit I had already run make clean - stupid me - but I am
rebuilding the kernel just to regenerate System.map (I hope it will not
change between two successive 'make' runs with the same configuration).
Now a description of the deadly loop.
IPs caught with alt-sysrq-P: c01e2fcc / c01e2fd3 / c01e2fe7 / c01e2fef
From System.map:
...
c01e2fc0 T __lock_kernel
c01e2ffc T __delay
...
arch/i386/lib/locks.S::__lock_kernel() [disassembled]
locks.o: file format elf32-i386
Disassembly of section .text:
00000000 <__lock_kernel>:
0: f0 0f ba 2d 00 lock btsl $0x0,0x0
5: 00 00 00 00
9: 73 29 jae 34 <__lock_kernel+0x34>
b: fb sti
c: 0f a3 15 00 00 btl %edx,0x0
11: 00 00
13: 73 12 jae 27 <__lock_kernel+0x27>
15: f0 0f b3 15 00 lock btrl %edx,0x0
1a: 00 00 00
1d: 73 08 jae 27 <__lock_kernel+0x27>
1f: 50 pushl %eax
20: 0f 20 d8 movl %cr3,%eax
23: 0f 22 d8 movl %eax,%cr3
26: 58 popl %eax
27: 0f ba 25 00 00 btl $0x0,0x0
2c: 00 00 00
2f: 72 db jb c <__lock_kernel+0xc>
31: fa cli
32: eb cc jmp 0 <__lock_kernel>
34: 88 15 00 00 00 movb %dl,0x0
39: 00
3a: c3 ret
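To follow the branches in the listing, here is a toy trace in Python. It is
only a sketch: trace_lock_kernel, lock_held and flush_pending are names I
made up, and since the memory operands show up as unrelocated 0x0 in the
object file, which variables they really are is an assumption. The model
keeps the lock bit constant, i.e. no other CPU ever releases it.

```python
# Fall-through successors for the non-branching instructions in the listing.
NEXT = {0x00: 0x09, 0x0b: 0x0c, 0x0c: 0x13, 0x15: 0x1d, 0x1f: 0x20,
        0x20: 0x23, 0x23: 0x26, 0x26: 0x27, 0x27: 0x2f, 0x31: 0x32,
        0x34: 0x3a}


def trace_lock_kernel(lock_held, flush_pending, max_steps=16):
    """Return the instruction offsets visited, following the listing's branches.

    lock_held:     hypothetical state of the bit tested at 0x00 and 0x27
    flush_pending: hypothetical state of the bit tested at 0x0c and 0x15
    """
    visited = []
    pc = 0x00
    carry = False                      # models the CPU carry flag
    while pc is not None and len(visited) < max_steps:
        visited.append(pc)
        if pc in (0x00, 0x27):         # btsl / btl on the lock bit
            carry = lock_held
            pc = NEXT[pc]
        elif pc == 0x0c:               # btl on the flush bit
            carry = flush_pending
            pc = NEXT[pc]
        elif pc == 0x15:               # lock btrl: test, then clear the bit
            carry = flush_pending
            flush_pending = False
            pc = NEXT[pc]
        elif pc == 0x09:               # jae 34: taken when carry is clear
            pc = 0x0b if carry else 0x34
        elif pc == 0x13:               # jae 27
            pc = 0x15 if carry else 0x27
        elif pc == 0x1d:               # jae 27
            pc = 0x1f if carry else 0x27
        elif pc == 0x2f:               # jb c: taken when carry is set
            pc = 0x0c if carry else 0x31
        elif pc == 0x32:               # jmp 0
            pc = 0x00
        elif pc == 0x3a:               # ret
            pc = None
        else:                          # sti, cli, push/pop, cr3 reload, movb
            pc = NEXT[pc]
    return visited


if __name__ == "__main__":
    print(" ".join(f"{off:02x}" for off in trace_lock_kernel(True, False)))
```

With the lock bit stuck at 1 and nothing pending on the other bit, the trace
never leaves the inner spin once it has passed the sti.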
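The loop seems to be: 0c -> 13 -> 27 -> 2f
that is, the CPU spins with interrupts enabled while the bit tested at 27
stays set. Relative to c01e2fc0, the IPs I caught with sysrq-P are at
offsets 0c, 13, 27 and 2f, which matches, so I think the (rebuilt)
System.map is good and the kernel was stuck in __lock_kernel().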
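For anyone who wants to repeat the address check, a small Python sketch
follows. The resolve() helper and the two tables are names I made up for
illustration; only the addresses come from the System.map excerpt and the
sysrq-P output in this mail.

```python
import bisect

# The two adjacent System.map entries quoted in this mail: (address, symbol).
SYSTEM_MAP = [
    (0xc01e2fc0, "__lock_kernel"),
    (0xc01e2ffc, "__delay"),
]

# Instruction pointers sampled with alt-sysrq-P.
SAMPLED_IPS = [0xc01e2fcc, 0xc01e2fd3, 0xc01e2fe7, 0xc01e2fef]


def resolve(addr, symbols=SYSTEM_MAP):
    """Return (symbol, offset) for the last symbol starting at or before addr."""
    ordered = sorted(symbols)
    starts = [start for start, _ in ordered]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        raise ValueError("address is below the first known symbol")
    start, name = ordered[i]
    return name, addr - start


if __name__ == "__main__":
    for ip in SAMPLED_IPS:
        name, offset = resolve(ip)
        print(f"{ip:08x} -> {name}+0x{offset:02x}")
```

Run as a script, it prints each sampled IP as symbol+offset, which can be
compared directly with the offsets in the disassembly. With a complete
System.map one would load all entries instead of the two hard-coded here.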
Hope this helps.
Ciao,
Riccardo.