Re: 2.2.12 + SMP + PDC20246 lockup -- OOPS included

Andre Hedrick (andre@suse.com)
Tue, 28 Sep 1999 19:14:09 -0700 (PDT)


I just verified your lock and I have a test now.
This was done on a Tyan TomCat with a Promise FastTrak running semi
native. Yes I did state that it does appear to be doing hardware RAID 1.
Now the question is who barfed and where.

Andre Hedrick
The Linux IDE guy

On Tue, 28 Sep 1999, Tom Livingston wrote:

> Hello,
>
> As You may remember I have been tracking down the very quick, repeatable
> lockups I have been experiencing with the PDC20246 and SMP. I was able to
> obtain an OOPS and decode it, so I am forwarding it at this point.
>
> As a refresher, my system is: RH60 base, 2.2.12 stock kernel, abit bp-6
> motherboard, dual celeron 366 cpu's, single pdc20246 with two drives, one
> connected to each channel.
>
> I've now experienced this lockup on three separate BP-6 based system (my
> only SMP systems available, unfortunately). One is all IDE, and has 3
> PDC20246 controllers and a total of 11 drives. Two others are normally scsi
> based systems (ncr53c8xx) but with the addition of a single pdc20246 card
> they will lockup as well.
>
> In tracking it down, it appears the basic cause is using both channels on a
> pdc20246 while in SMP mode. Two drives on one channel (with the other free)
> will not hang the system, nor will two drives on two controllers (with the
> 2nd channel on each controller). However, whenever two channels are used on
> one PDC20246, the systems lock up within seconds.
>
> I've previously included detailed boot logs, and much more detailed
> information about my hardware, so I won't use up the bandwidth again.
> However, I'd be happy to send any additional information required.
>
> To cause this behavior, boot the 2.2.12 kernel with SMP, and cause
> concurrent access on both drives. I do:
> 'hdparm -t /dev/hde & hdparm -t /dev/hdg'
>
> Note that I can also cause this lockup when using 2.2.10, 2.2.13pre12, and
> 2.2.12 + the ide patch. All cause the same behavior, so I chose to report
> this OOPS using 2.2.12 as the most 'vanilla'. This lockup does not occur at
> all with 2.2.6, the last stable revision I used. Any kernel version is
> stable in UP mode or with maxcpus=1.
>
> The system normally hangs without an OOPS, so to generate an OOPS I patched
> in ikd and enabled the NMI oopser with an IRQ source as 0.
>
> Note that this is the first time I have reported an OOPS, so if I have done
> something stupid, I'm sorry... please let me know. I am very motivated to
> help get this fixed, and am happy to rerun these tests or report more
> detailed information.
>
> Without further ado the OOPS (there are three in a row):
>
> CPU: 1
> EIP: 0010:[<c01d0e1d>]
> EFLAGS: 00000002
> eax: c0189f04 ebx: c7fec480 ecx: c0244aa4 edx: 0000000a
> esi: 24000001 edi: 00000082 ebp: c7ffbf18 esp: c7ffbf0c
> ds: 0018 es: 0018 ss: 0018
> Process swapper (pid: 0, process nr: 1, stackpage=c7ffb000)
> Stack: 00000004 c0196a3c c0244a88 c7ffbf38 c010ae17 0000000a c7fec480
> c7ffbf74
> 00000140 c7ff82a0 0000000a c7ffbf58 c0113517 0000000a c7ffbf74
> c7ff82a0
> c7ffbf74 c7ffa000 c0237720 c7ffbf6c c010af9b 0000000a c7ffbf74
> c7ffa000
> Call Trace: [<c0196a3c>] [<c010ae17>] [<c0113517>] [<c010af9b>] [<c0109a18>]
> [<c
> 0120018>] [<c0106fa5>]
> [<c01116fc>] [<c01116e4>] [<c01072a6>]
> Code: f6 03 01 75 fb e9 ed 90 fb ff f6 05 24 15 21 c0 01 75 f7 e9
>
> >>EIP: c01d0e1d <stext_lock+2525/4a08>
> Trace: c0196a3c <ide_dma_intr+0/90>
> Trace: c010ae17 <handle_IRQ_event+53/88>
> Trace: c0113517 <do_level_ioapic_IRQ+63/a4>
> Trace: c010af9b <do_IRQ+3b/5c>
> Trace: c0109a18 <common_interrupt+18/20>
> Code: c01d0e1d <stext_lock+2525/4a08> 00000000 <_EIP>: <===
> Code: c01d0e1d <stext_lock+2525/4a08> 0: f6 03 01
> testb $0x1,(%ebx) <===
> Code: c01d0e20 <stext_lock+2528/4a08> 3: 75 fb
> jne 0 <_EIP>
> Code: c01d0e22 <stext_lock+252a/4a08> 5: e9 ed 90 fb ff
> jmp c0189f14 <ide_intr+10/e4>
> Code: c01d0e27 <stext_lock+252f/4a08> a: f6 05 24 15 21 c0 01
> testb $0x1,0xc0211524
> Code: c01d0e2e <stext_lock+2536/4a08> 11: 75 f7
> jne c01d0e27 <stext_lock+252f/4a08>
> Code: c01d0e30 <stext_lock+2538/4a08> 13: e9 00 00 00 00
> jmp c01d0e35 <stext_lock+253d/4a08>
>
> Aiee, killing interrupt handler
> Kernel panic: Attempted to kill the idle task!
> CPU: 0
> EIP: 0010:[<c010aee0>]
> EFLAGS: 00000002
> eax: 00000140 ebx: 0000000a ecx: ffffd000 edx: c0220000
> esi: c0244be8 edi: c0244bcc ebp: c69f9c7c esp: c69f9c7c
> ds: 0018 es: 0018 ss: 0018
> Process hdparm (pid: 566, process nr: 43, stackpage=c69f9000)
> Stack: c69f9cc0 c01898ec 0000000a c7fec480 c69f8000 c44d0a80 c44d0a80
> c0244be8
> 00000000 c0244be8 00000000 00000283 00000022 c69f9cf0 c69f9cf0
> 00000046
> c0244bcc c69f9cdc c0189bc0 c7fec480 c69f9cd8 00000000 00000246
> 00000086
> Call Trace: [<c01898ec>] [<c0189bc0>] [<c0189c6f>] [<c018742c>] [<c012d02e>]
> [<c
> 01314da>] [<c012c160>]
> [<c0108828>] [<c010002b>]
> Code: 75 fa 8b 5d fc 89 ec 5d c3 8d 76 00 55 89 e5 57 56 53 8b 55
>
> >>EIP: c010aee0 <disable_irq+34/40>
> Trace: c01898ec <ide_do_request+2f0/548>
> Trace: c0189bc0 <do_hwgroup_request+48/5c>
> Trace: c0189c6f <do_ide3_request+17/2c>
> Trace: c018742c <unplug_device+48/5c>
> Trace: c012d02e <__wait_on_buffer+c6/140>
> Code: c010aee0 <disable_irq+34/40> 00000000 <_EIP>: <===
> Code: c010aee0 <disable_irq+34/40> 0: 75 fa
> jne c010aedc <disable_irq+30/40> <===
> Code: c010aee2 <disable_irq+36/40> 2: 8b 5d fc
> movl 0xfffffffc(%ebp),%ebx
> Code: c010aee5 <disable_irq+39/40> 5: 89 ec
> movl %ebp,%esp
> Code: c010aee7 <disable_irq+3b/40> 7: 5d
> popl %ebp
> Code: c010aee8 <disable_irq+3c/40> 8: c3
> ret
> Code: c010aee9 <disable_irq+3d/40> 9: 8d 76 00
> leal 0x0(%esi),%esi
> Code: c010aeec <enable_irq+0/74> c: 55
> pushl %ebp
> Code: c010aeed <enable_irq+1/74> d: 89 e5
> movl %esp,%ebp
> Code: c010aeef <enable_irq+3/74> f: 57
> pushl %edi
> Code: c010aef0 <enable_irq+4/74> 10: 56
> pushl %esi
> Code: c010aef1 <enable_irq+5/74> 11: 53
> pushl %ebx
> Code: c010aef2 <enable_irq+6/74> 12: 8b 55 00
> movl 0x0(%ebp),%edx
>
> CPU: 0
> EIP: 0010:[<c01121b9>]
> EFLAGS: 00000002
> eax: 00000000 ebx: 00000000 ecx: 00000032 edx: 00000000
> esi: 00000000 edi: c69f8000 ebp: c69f9b94 esp: c69f9b9c
> ds: 0018 es: 0018 ss: 0018
> Process hdparm (pid: 566, process nr: 43, stackpage=c69f9000)
> Stack: c69f9bf4 c010a87a 00000000 00000032 ffffffac 00000000 c69f8000
> c69f9bf4
> 00000030 c69f0018 c69f0018 ffffffff c0108998 00000010 00000246
> c011d971
> 00000010 00000246 c70588a0 00000100 c69f8000 c69f8000 c69f9c14
> c011dc0a
> Call Trace: [<c010a87a>] [<c0108998>] [<c011d971>] [<c011dc0a>] [<c0109383>]
> [<c
> 01d2240>] [<c01089b6>]
> [<c010aee0>] [<c01898ec>] [<c0189bc0>] [<c0189c6f>] [<c018742c>]
> [<c012d0
> [<c0108828>] [<c010002b>]
> Code: eb fd 90 eb fe 89 f6 55 89 e5 e8 cc ff ff ff 89 ec 5d c3 55
>
> >>EIP: c01121b9 <stop_this_cpu+25/2c>
> Trace: c010a87a <stop_cpu_interrupt+1a/20>
> Trace: c0108998 <nmi+0/30>
> Trace: c011d971 <exit_notify+24d/264>
> Trace: c011dc0a <do_exit+282/2c8>
> Trace: c0109383 <do_nmi+73/8c>
> Code: c01121b9 <stop_this_cpu+25/2c> 00000000 <_EIP>: <===
> Code: c01121b9 <stop_this_cpu+25/2c> 0: eb fd
> jmp c01121b8 <stop_this_cpu+24/2c> <===
> Code: c01121bb <stop_this_cpu+27/2c> 2: 90
> nop
> Code: c01121bc <stop_this_cpu+28/2c> 3: eb fe
> jmp c01121bc <stop_this_cpu+28/2c>
> Code: c01121be <stop_this_cpu+2a/2c> 5: 89 f6
> movl %esi,%esi
> Code: c01121c0 <smp_stop_cpu_interrupt+0/c> 7: 55
> pushl %ebp
> Code: c01121c1 <smp_stop_cpu_interrupt+1/c> 8: 89 e5
> movl %esp,%ebp
> Code: c01121c3 <smp_stop_cpu_interrupt+3/c> a: e8 cc ff ff ff
> call c0112194 <stop_this_cpu+0/2c>
> Code: c01121c8 <smp_stop_cpu_interrupt+8/c> f: 89 ec
> movl %ebp,%esp
> Code: c01121ca <smp_stop_cpu_interrupt+a/c> 11: 5d
> popl %ebp
> Code: c01121cb <smp_stop_cpu_interrupt+b/c> 12: c3
> ret
> Code: c01121cc <smp_call_function_interrupt+0/3c> 13: 55
> pushl %ebp
>
> Regards,
>
> Tom
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/