Interrupt sharing problem (kernel freeze)

From: Bernhard Mogens Ege (bme@vision.auc.dk)
Date: Thu Oct 05 2000 - 02:45:55 EST


I have for several month now had my kernel randomly freeze on me
without me getting anything in the log files. The other day I let the
machine stay in virtual console 1 (ascii mode) and let X run on
virtual console 7 (vc7). The machine still crashed, but I got the
kernel death cry. Here goes (after running it through ksymoops):

unable to handle kernel paging request at virtual address ffffffff
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
oops: 0000
cpu: 0
eip: 0010:[<ffffffff>]
eflags: 00210286
eax: 0000000f ebx: c7e471c0 ecx: 00000000 edx: 00000001
esi: c885b9bc edi: c022c3a4 ebp: c0247f4c esp: c0247f18
ds: 0018 es: 0018 ss: 0018
stack: 00000001 c7e471c0 c01127ae c7e471c0 00000001 c027dea0 00259eae c022c3a4
       00000001 c010ae3a 00000000 00000000 00000000 c0247f60 c01196d9 00000000
       c0246000 c010b19a 00000e00 c010ae60 00000000 c0246000 00000000 c0246000
call trace: [<c01127ae>] [<c010ae3a>] [<c01196d9>] [<c010b19a>] [<c010ae60>] [<c01088dd>] [<c0106000>]
            [<c0108900>] [<c010a06c>] [<c0106000>] [<c0106077>] [<c0106000>] [<c0100175>]
code: bad eip value.
Warning: trailing garbage ignored on Code: line
  Text: 'code: bad eip value.'
  Garbage: 'ip value.'
Oops_code_values invalid value 0xbad in Code line, not a multiple of 2 digits, value ignored
Oops_code_values invalid value 0xe in Code line, not a multiple of 2 digits, value ignored

>>EIP: ffffffff <END_OF_CODE+37649e23/???
Trace: c01127ae <timer_bh+2be/404>
Trace: c010ae3a <do_8259A_IRQ+9a/a8>
Trace: c01196d9 <do_bottom_half+49/70>
Trace: c010b19a <do_IRQ+3a/3c>
Trace: c010ae60 <common_interrupt+18/20>
Trace: c01088dd <cpu_idle+5d/6c>
Trace: c0106000 <get_options+0/70>
Trace: c0108900 <sys_idle+14/20>
Code: ffffffff <END_OF_CODE+37649e23/??? 00000000 <_EIP>: <===

aiee, killing interrupt handler
kernel panic: attempted to kill the idle task!
in swapper task - not syncing

1737 warnings and 5 errors issued. Results may not be reliable.

Looking at the trace, I can see the cpu wasn't actually doing
anything when it crashed. Well, an interrupt occured, and that
apparently is fatal on my machine. Now I wonder why. There is not much
to go on with, except debugging the interrupt handler, which I an NOT
qualified to do.

Out of interrest, I looked at /proc/interrupts:

           CPU0
  0: 200596 XT-PIC timer
  1: 4217 XT-PIC keyboard
  2: 0 XT-PIC cascade
  5: 0 XT-PIC MPU-401 UART
  8: 1 XT-PIC rtc
  9: 166867 XT-PIC usb-ohci, nvidia
 10: 0 XT-PIC Yamaha DS-XG
 11: 11417 XT-PIC eth0
 12: 18506 XT-PIC PS/2 Mouse
 13: 1 XT-PIC fpu
 14: 483885 XT-PIC ide0
 15: 16130 XT-PIC ide1
NMI: 0

Should I be worried about the nvidia driver/hardware sharing
interrupts with the USB port? That's apparently the only problem I can
see, but I have earlier had USB be shared with eth0 and the system
still crashed, though not that often. The NIC generates less
interrupts than the graphics card.

This is my general hardware setup:

AMD Athlon 500
Microstart 6167 motherboard (RedHat kernel 2.2.16-22)
Nvidia TNT2 (now GeForce2MX) (XFree86-4.0.1-1 with nvidia drivers)
Yamaha YMF724 (oss drivers)
3com 3c905b (now RealTek 8139) (3c59x/3c90x or rtl8139 drivers)
ISA scsi card (sym53c416, drivers not loaded).

/proc/interrupts
           CPU0
  0: 5994135 XT-PIC timer
  1: 8669 XT-PIC keyboard
  2: 0 XT-PIC cascade
  5: 0 XT-PIC MPU-401 UART
  8: 1 XT-PIC rtc
  9: 158473 XT-PIC nvidia
 10: 0 XT-PIC Yamaha DS-XG
 11: 64177 XT-PIC eth0
 12: 12952 XT-PIC PS/2 Mouse
 13: 1 XT-PIC fpu
 14: 359001 XT-PIC ide0
 15: 530370 XT-PIC ide1
NMI: 0

USB drivers was using irq 9. USB is now disabled. The log above was
with USB enabled, though.

lspci:

00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-751 [Irongate] System Controller (rev 23)
00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-751 [Irongate] AGP Bridge (rev 01)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-756 [Viper] ISA (rev 01)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-756 [Viper] IDE (rev 03)00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-756 [Viper] ACPI (rev 03)
00:08.0 Multimedia audio controller: Yamaha Corporation YMF-724F [DS-1 Audio Controller] (rev 03)
00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
01:05.0 VGA compatible controller: nVidia Corporation NV11 (rev a1)

USB would be listed here as the AMD usb interface (taken from /etc/sysconfig/hwconf):

"Advanced Micro Devices [AMD]|AMD-756 [Viper] USB"

I have had USB be shared with eth0 and nvidia, both leading to
freezes. I guess the kernel death cry would be that same as the one I
losted in this mail.

I realise that I am not using a standard kernel, and that you might
not use my bug report because of that, but I don't think RedHat would
change the interrupt handling of the kernel. They would rather add
features (such as USB support and AGP driver support). Anyway, I have
tried a standard kernel (2.2.16) with no success. Apparently interrupt
sharing is not stable with my chipset.

I hope I have given enough information to aid the debugging. If not,
please email me (as I am not a subscriber to this list).

regards,

Bernhard Ege
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Oct 07 2000 - 21:00:15 EST