Re: PROBLEM: 2.2.5 unstable on Dell PC, 2.0.36 is stable

Richard Black (rjb@dcs.gla.ac.uk)
Wed, 04 Aug 99 11:22:50 +0100


--==_Exmh_817944993040
Content-Type: text/plain; charset="us-ascii"

Still getting problems with 2.2.10 and its aix7x patches on Dell WS 400. I was
not getting serial console messages (due to crash within interrupt routine?)
so I modified the 2.2.10 kernel so I could get the registers on the console
irrespective of the length of the stack output (patches enclosed) by putting
the registers at the end (also reduced the amount of junk printed around hex
so one could get more information on the screen). With that change I get the
following OOPS message/dump.

Call Trace: c0117cf9 c01180e5 c0107f8e c0107f93 c010f23c c01e302e
c0107b71 c0111786 c01e3196 c0117c9f c01180e5 c0107f93 c010f23c c01e302e
c0107b71 c0111786 c01e3196 c0117c9f c01180e5 c0107f93 c010f23c c01e302e
c0107b71 c0111786 c01e3196 c0117c9f c01180e5 c0107f8e c0107f93 c010f23c
c01e302e c0107b71 c0111786 c01e3196 c0117cf9 c01180e5 c0107f8e c0107f93
c010f23c c01e302e c0107b71 c0111786 c01e3196 c0117cf9 c01180e5 c0107f93
c010f23c c01e302e c0107b71 c0111786 c01e3196 c0117cf9 c01180e5 c0107f93
c010f23c c01e302e c0107b71 c0111786 c01e3196 c0117cf9 c01180e5 c0107f93
c010f23c c01e302e c0107b71 c0111786 c01e3196 c0117cf9 c01180e5 c0107f93
c0107f93 c010f23c c01e302e c0107b71 c0111786 c01e3196 c0117cf9 c01180e5
c0107f93 c010f23c c01e302e c0107b71 c0111786 c01e3196 c0117cf9 c01180e5
c0107fe8 c0107f93 c010f23c c01e302e c0107b71 c0111786 c01e3196 c0117cf9
c01180e5 c0107f93 c010f23c c01e302e c0107b71 c0111786 c01e3196 c0117cf9
c01180e5 c0107f93 c010f23c c01e302e c0107b71 c0111786 c01e3196 c0117cf9
Code: c7 05 00 00 00 00 00 00 00 00 8d 65 d8 5b 5e 5f 89 ec 5d c3
CPU: 0
EIP 0010:[<c0111786>]
EFLAGS: 00010286
eax 00000018 ebx cfaea000 ecx 00000001 edx c9eb0000
esi 00101000 edi cfaea000 ebp cfaea608 esp cfaea5e4
ds 0018 es 0018 ss 0018
Process rc5des (pid 2905, process nr:60, stackpage = cfaeb000)Aieee, killing
interrupt handler

If I look in System.map I get:

c01113d8 T schedule
c011179c T __wake_up

around the failing address.

Looks like it hit line 865 in kernel/sched.c

> scheduling_in_interrupt:
> printk("Scheduling in interrupt\n");
> *(int *)0 = 0;
> return;
> }

Note that this machine has multiprocessor capable motherboard but with
only one processor. Here is appropriate blurb from system startup:

Error: only one processor found.
enabling symmetric IO mode... ...done.
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC pin 0, 2, 20, 21, 22, 23 not connected.
...trying to set up timer as ExtINT... .. (found pin 0) ... works.
number of MP IRQ sources: 37.
number of IO-APIC registers: 24.
testing the IO APIC.......................
.... register #00: 00000000
....... : physical APIC id: 00
.... register #01: 00170011
....... : max redirection entries: 0017
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00 0 0 0 0 0 0 7 51
01 000 00 0 0 0 0 0 1 1 59
02 000 00 1 0 0 0 0 0 0 00
03 000 00 0 0 0 0 0 1 1 61
04 000 00 0 0 0 0 0 1 1 69
05 000 00 0 0 0 0 0 1 1 71
06 000 00 0 0 0 0 0 1 1 79
07 000 00 0 0 0 0 0 1 1 81
08 000 00 0 0 0 0 0 1 1 89
09 000 00 0 0 0 0 0 1 1 91
0a 000 00 0 0 0 0 0 1 1 99
0b 000 00 0 0 0 0 0 1 1 A1
0c 000 00 0 0 0 0 0 1 1 A9
0d 000 00 1 0 0 0 0 0 0 00
0e 000 00 0 0 0 0 0 1 1 B1
0f 000 00 0 0 0 0 0 1 1 B9
10 0FF 0F 1 1 0 1 0 1 1 C1
11 0FF 0F 1 1 0 1 0 1 1 C9
12 0FF 0F 1 1 0 1 0 1 1 D1
13 0FF 0F 1 1 0 1 0 1 1 D9
14 000 00 1 0 0 0 0 0 0 00
15 000 00 1 0 0 0 0 0 0 00
16 000 00 1 0 0 0 0 0 0 00
17 000 00 1 0 0 0 0 0 0 00
IRQ to pin mappings:
IRQ1 -> 1
IRQ3 -> 3
IRQ4 -> 4
IRQ5 -> 5
IRQ6 -> 6
IRQ7 -> 7
IRQ8 -> 8
IRQ9 -> 9
IRQ10 -> 10
IRQ11 -> 11
IRQ12 -> 12
IRQ13 -> 13
IRQ14 -> 14
IRQ15 -> 15
IRQ16 -> 16
IRQ17 -> 17
IRQ18 -> 18
IRQ19 -> 19
.................................... done.

--==_Exmh_817944993040
Content-Description: foo
Content-Disposition: attachment; filename="foo"
Content-Type: text/plain; name="foo"; charset="us-ascii"

--- traps.c 1999/07/29 12:51:11 1.1
+++ traps.c 1999/07/29 14:45:12
@@ -135,35 +135,26 @@
esp = regs->esp;
ss = regs->xss & 0xffff;
}
- printk("CPU: %d\nEIP: %04x:[<%08lx>]\nEFLAGS: %08lx\n",
- smp_processor_id(), 0xffff & regs->xcs, regs->eip, regs->eflags);
- printk("eax: %08lx ebx: %08lx ecx: %08lx edx: %08lx\n",
- regs->eax, regs->ebx, regs->ecx, regs->edx);
- printk("esi: %08lx edi: %08lx ebp: %08lx esp: %08lx\n",
- regs->esi, regs->edi, regs->ebp, esp);
- printk("ds: %04x es: %04x ss: %04x\n",
- regs->xds & 0xffff, regs->xes & 0xffff, ss);
- store_TR(i);
- printk("Process %s (pid: %d, process nr: %d, stackpage=%08lx)",
- current->comm, current->pid, 0xffff & i, 4096+(unsigned long)current);

/*
* When in-kernel, we also print out the stack and code at the
* time of the fault..
*/
if (in_kernel) {
- printk("\nStack: ");
+ printk("\nStack: ");
stack = (unsigned long *) esp;
- for(i=0; i < kstack_depth_to_print; i++) {
+ for(i=1; i <= kstack_depth_to_print; i++) {
if (((long) stack & 4095) == 0)
break;
if (i && ((i % 8) == 0))
- printk("\n ");
- printk("%08lx ", *stack++);
+ printk("\n ");
+ else
+ printk(" ");
+ printk("%08lx", *stack++);
}
- printk("\nCall Trace: ");
+ printk("\nCall Trace: ");
stack = (unsigned long *) esp;
- i = 1;
+ i = 2;
module_start = PAGE_OFFSET + (max_mapnr << PAGE_SHIFT);
module_start = ((module_start + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1));
module_end = module_start + MODULE_RANGE;
@@ -181,8 +172,10 @@
(addr <= (unsigned long) &_etext)) ||
((addr >= module_start) && (addr <= module_end))) {
if (i && ((i % 8) == 0))
- printk("\n ");
- printk("[<%08lx>] ", addr);
+ printk("\n ");
+ else
+ printk(" ");
+ printk("%08lx", addr);
i++;
}
}
@@ -191,6 +184,23 @@
printk("%02x ", ((unsigned char *)regs->eip)[i]);
}
printk("\n");
+
+ /*
+ * Always print out the crucial information last so it does not get
+ * scrolled off the screen.
+ */
+
+ printk("CPU: %d\nEIP: %04x:[<%08lx>]\nEFLAGS: %08lx\n",
+ smp_processor_id(), 0xffff & regs->xcs, regs->eip, regs->eflags);
+ printk("eax: %08lx ebx: %08lx ecx: %08lx edx: %08lx\n",
+ regs->eax, regs->ebx, regs->ecx, regs->edx);
+ printk("esi: %08lx edi: %08lx ebp: %08lx esp: %08lx\n",
+ regs->esi, regs->edi, regs->ebp, esp);
+ printk("ds: %04x es: %04x ss: %04x\n",
+ regs->xds & 0xffff, regs->xes & 0xffff, ss);
+ store_TR(i);
+ printk("Process %s (pid: %d, process nr: %d, stackpage=%08lx)",
+ current->comm, current->pid, 0xffff & i, 4096+(unsigned long)current);
}

spinlock_t die_lock;

--==_Exmh_817944993040--

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/