Re: 2.2.12 + SMP + PDC20246 lockup -- OOPS included

Andre Hedrick (andre@suse.com)
Tue, 28 Sep 1999 19:08:29 -0700 (PDT)


This is something I have been trying to generate.
Not the output, but the LOCKUP.

I am greatful for your crash, you may have helped pull my chestnuts out of
a fire storm. No I just have to figure out what to do with it.
Then FIX IT..............

Andre Hedrick
The Linux IDE guy

On Tue, 28 Sep 1999, Tom Livingston wrote:

> Hello,
>
> As You may remember I have been tracking down the very quick, repeatable
> lockups I have been experiencing with the PDC20246 and SMP. I was able to
> obtain an OOPS and decode it, so I am forwarding it at this point.
>
> As a refresher, my system is: RH60 base, 2.2.12 stock kernel, abit bp-6
> motherboard, dual celeron 366 cpu's, single pdc20246 with two drives, one
> connected to each channel.
>
> I've now experienced this lockup on three separate BP-6 based system (my
> only SMP systems available, unfortunately). One is all IDE, and has 3
> PDC20246 controllers and a total of 11 drives. Two others are normally scsi
> based systems (ncr53c8xx) but with the addition of a single pdc20246 card
> they will lockup as well.
>
> In tracking it down, it appears the basic cause is using both channels on a
> pdc20246 while in SMP mode. Two drives on one channel (with the other free)
> will not hang the system, nor will two drives on two controllers (with the
> 2nd channel on each controller). However, whenever two channels are used on
> one PDC20246, the systems lock up within seconds.
>
> I've previously included detailed boot logs, and much more detailed
> information about my hardware, so I won't use up the bandwidth again.
> However, I'd be happy to send any additional information required.
>
> To cause this behavior, boot the 2.2.12 kernel with SMP, and cause
> concurrent access on both drives. I do:
> 'hdparm -t /dev/hde & hdparm -t /dev/hdg'
>
> Note that I can also cause this lockup when using 2.2.10, 2.2.13pre12, and
> 2.2.12 + the ide patch. All cause the same behavior, so I chose to report
> this OOPS using 2.2.12 as the most 'vanilla'. This lockup does not occur at
> all with 2.2.6, the last stable revision I used. Any kernel version is
> stable in UP mode or with maxcpus=1.
>
> The system normally hangs without an OOPS, so to generate an OOPS I patched
> in ikd and enabled the NMI oopser with an IRQ source as 0.
>
> Note that this is the first time I have reported an OOPS, so if I have done
> something stupid, I'm sorry... please let me know. I am very motivated to
> help get this fixed, and am happy to rerun these tests or report more
> detailed information.
>
> Without further ado the OOPS (there are three in a row):
>
> CPU: 1
> EIP: 0010:[<c01d0e1d>]
> EFLAGS: 00000002
> eax: c0189f04 ebx: c7fec480 ecx: c0244aa4 edx: 0000000a
> esi: 24000001 edi: 00000082 ebp: c7ffbf18 esp: c7ffbf0c
> ds: 0018 es: 0018 ss: 0018
> Process swapper (pid: 0, process nr: 1, stackpage=c7ffb000)
> Stack: 00000004 c0196a3c c0244a88 c7ffbf38 c010ae17 0000000a c7fec480
> c7ffbf74
> 00000140 c7ff82a0 0000000a c7ffbf58 c0113517 0000000a c7ffbf74
> c7ff82a0
> c7ffbf74 c7ffa000 c0237720 c7ffbf6c c010af9b 0000000a c7ffbf74
> c7ffa000
> Call Trace: [<c0196a3c>] [<c010ae17>] [<c0113517>] [<c010af9b>] [<c0109a18>]
> [<c
> 0120018>] [<c0106fa5>]
> [<c01116fc>] [<c01116e4>] [<c01072a6>]
> Code: f6 03 01 75 fb e9 ed 90 fb ff f6 05 24 15 21 c0 01 75 f7 e9
>
> >>EIP: c01d0e1d <stext_lock+2525/4a08>
> Trace: c0196a3c <ide_dma_intr+0/90>
> Trace: c010ae17 <handle_IRQ_event+53/88>
> Trace: c0113517 <do_level_ioapic_IRQ+63/a4>
> Trace: c010af9b <do_IRQ+3b/5c>
> Trace: c0109a18 <common_interrupt+18/20>
> Code: c01d0e1d <stext_lock+2525/4a08> 00000000 <_EIP>: <===
> Code: c01d0e1d <stext_lock+2525/4a08> 0: f6 03 01
> testb $0x1,(%ebx) <===
> Code: c01d0e20 <stext_lock+2528/4a08> 3: 75 fb
> jne 0 <_EIP>
> Code: c01d0e22 <stext_lock+252a/4a08> 5: e9 ed 90 fb ff
> jmp c0189f14 <ide_intr+10/e4>
> Code: c01d0e27 <stext_lock+252f/4a08> a: f6 05 24 15 21 c0 01
> testb $0x1,0xc0211524
> Code: c01d0e2e <stext_lock+2536/4a08> 11: 75 f7
> jne c01d0e27 <stext_lock+252f/4a08>
> Code: c01d0e30 <stext_lock+2538/4a08> 13: e9 00 00 00 00
> jmp c01d0e35 <stext_lock+253d/4a08>
>
> Aiee, killing interrupt handler
> Kernel panic: Attempted to kill the idle task!
> CPU: 0
> EIP: 0010:[<c010aee0>]
> EFLAGS: 00000002
> eax: 00000140 ebx: 0000000a ecx: ffffd000 edx: c0220000
> esi: c0244be8 edi: c0244bcc ebp: c69f9c7c esp: c69f9c7c
> ds: 0018 es: 0018 ss: 0018
> Process hdparm (pid: 566, process nr: 43, stackpage=c69f9000)
> Stack: c69f9cc0 c01898ec 0000000a c7fec480 c69f8000 c44d0a80 c44d0a80
> c0244be8
> 00000000 c0244be8 00000000 00000283 00000022 c69f9cf0 c69f9cf0
> 00000046
> c0244bcc c69f9cdc c0189bc0 c7fec480 c69f9cd8 00000000 00000246
> 00000086
> Call Trace: [<c01898ec>] [<c0189bc0>] [<c0189c6f>] [<c018742c>] [<c012d02e>]
> [<c
> 01314da>] [<c012c160>]
> [<c0108828>] [<c010002b>]
> Code: 75 fa 8b 5d fc 89 ec 5d c3 8d 76 00 55 89 e5 57 56 53 8b 55
>
> >>EIP: c010aee0 <disable_irq+34/40>
> Trace: c01898ec <ide_do_request+2f0/548>
> Trace: c0189bc0 <do_hwgroup_request+48/5c>
> Trace: c0189c6f <do_ide3_request+17/2c>
> Trace: c018742c <unplug_device+48/5c>
> Trace: c012d02e <__wait_on_buffer+c6/140>
> Code: c010aee0 <disable_irq+34/40> 00000000 <_EIP>: <===
> Code: c010aee0 <disable_irq+34/40> 0: 75 fa
> jne c010aedc <disable_irq+30/40> <===
> Code: c010aee2 <disable_irq+36/40> 2: 8b 5d fc
> movl 0xfffffffc(%ebp),%ebx
> Code: c010aee5 <disable_irq+39/40> 5: 89 ec
> movl %ebp,%esp
> Code: c010aee7 <disable_irq+3b/40> 7: 5d
> popl %ebp
> Code: c010aee8 <disable_irq+3c/40> 8: c3
> ret
> Code: c010aee9 <disable_irq+3d/40> 9: 8d 76 00
> leal 0x0(%esi),%esi
> Code: c010aeec <enable_irq+0/74> c: 55
> pushl %ebp
> Code: c010aeed <enable_irq+1/74> d: 89 e5
> movl %esp,%ebp
> Code: c010aeef <enable_irq+3/74> f: 57
> pushl %edi
> Code: c010aef0 <enable_irq+4/74> 10: 56
> pushl %esi
> Code: c010aef1 <enable_irq+5/74> 11: 53
> pushl %ebx
> Code: c010aef2 <enable_irq+6/74> 12: 8b 55 00
> movl 0x0(%ebp),%edx
>
> CPU: 0
> EIP: 0010:[<c01121b9>]
> EFLAGS: 00000002
> eax: 00000000 ebx: 00000000 ecx: 00000032 edx: 00000000
> esi: 00000000 edi: c69f8000 ebp: c69f9b94 esp: c69f9b9c
> ds: 0018 es: 0018 ss: 0018
> Process hdparm (pid: 566, process nr: 43, stackpage=c69f9000)
> Stack: c69f9bf4 c010a87a 00000000 00000032 ffffffac 00000000 c69f8000
> c69f9bf4
> 00000030 c69f0018 c69f0018 ffffffff c0108998 00000010 00000246
> c011d971
> 00000010 00000246 c70588a0 00000100 c69f8000 c69f8000 c69f9c14
> c011dc0a
> Call Trace: [<c010a87a>] [<c0108998>] [<c011d971>] [<c011dc0a>] [<c0109383>]
> [<c
> 01d2240>] [<c01089b6>]
> [<c010aee0>] [<c01898ec>] [<c0189bc0>] [<c0189c6f>] [<c018742c>]
> [<c012d0
> [<c0108828>] [<c010002b>]
> Code: eb fd 90 eb fe 89 f6 55 89 e5 e8 cc ff ff ff 89 ec 5d c3 55
>
> >>EIP: c01121b9 <stop_this_cpu+25/2c>
> Trace: c010a87a <stop_cpu_interrupt+1a/20>
> Trace: c0108998 <nmi+0/30>
> Trace: c011d971 <exit_notify+24d/264>
> Trace: c011dc0a <do_exit+282/2c8>
> Trace: c0109383 <do_nmi+73/8c>
> Code: c01121b9 <stop_this_cpu+25/2c> 00000000 <_EIP>: <===
> Code: c01121b9 <stop_this_cpu+25/2c> 0: eb fd
> jmp c01121b8 <stop_this_cpu+24/2c> <===
> Code: c01121bb <stop_this_cpu+27/2c> 2: 90
> nop
> Code: c01121bc <stop_this_cpu+28/2c> 3: eb fe
> jmp c01121bc <stop_this_cpu+28/2c>
> Code: c01121be <stop_this_cpu+2a/2c> 5: 89 f6
> movl %esi,%esi
> Code: c01121c0 <smp_stop_cpu_interrupt+0/c> 7: 55
> pushl %ebp
> Code: c01121c1 <smp_stop_cpu_interrupt+1/c> 8: 89 e5
> movl %esp,%ebp
> Code: c01121c3 <smp_stop_cpu_interrupt+3/c> a: e8 cc ff ff ff
> call c0112194 <stop_this_cpu+0/2c>
> Code: c01121c8 <smp_stop_cpu_interrupt+8/c> f: 89 ec
> movl %ebp,%esp
> Code: c01121ca <smp_stop_cpu_interrupt+a/c> 11: 5d
> popl %ebp
> Code: c01121cb <smp_stop_cpu_interrupt+b/c> 12: c3
> ret
> Code: c01121cc <smp_call_function_interrupt+0/3c> 13: 55
> pushl %ebp
>
> Regards,
>
> Tom
>
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/