Re: 2.6.12-rc2-mm3

From: Andrew Morton
Date: Mon Apr 11 2005 - 17:30:07 EST


Borislav Petkov <petkov@xxxxxxxxxxxxxxx> wrote:
>
> On Monday 11 April 2005 11:43, Andrew Morton wrote:
> > (Please do reply-to-all)
> >
> > "J.A. Magallon" <jamagallon@xxxxxxx> wrote:
> > > On 04.11, Andrew Morton wrote:
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-r
> > > >c2/2.6.12-rc2-mm3/
> > >
> > > Is this not needed anymore ?
> > >
> > > --- 25/arch/i386/kernel/entry.S~nmi_stack_correct-fix 2005-04-05
> > > 00:02:48.000000000 -0700 +++ 25-akpm/arch/i386/kernel/entry.S 2005-04-05
> > > 00:02:48.000000000 -0700
> >
> > Hopefully not. fix-crash-in-entrys-restore_all.patch works around the
> > problem. -
>
> Hello Andrew,
> I don't know whether you remember the mysterious crashes I was telling you
> about last week and me rookiesh-ly trying to debug them with kgdb over the
> serial console. Well, today I tried for the n-th time again and after rc2-mm3
> blocked again while loading, here's what I did:
>
> <snip>
> [ 12.335438] NET: Registered protocol family 17
> [ 12.362483] Testing NMI watchdog ... OK.
> [ 12.416195] Starting balanced_irq
> [ 12.443099] VFS: Mounted root (ext2 filesystem) readonly.
> [ 12.472490] Freeing unused kernel memory: 196k freed
> [ 12.521004] logips2pp: Detected unknown logitech mouse model 1
> [ 12.572581] Warning: unable to open an initial console.
> [ 12.972518] input: PS/2 Logitech Mouse on isa0060/serio1
>
> Program received signal SIGTRAP, Trace/breakpoint trap.
> 0xc0102ee7 in resume_kernelX () at atomic.h:175 <--- this one is wrong for a
> mysterious reason
> 175 {
> (gdb) p $eip
> $1 = (void *) 0xc0102ee7
>
> (gdb) disas 0xc0102ee7
> Dump of assembler code for function resume_kernelX:
> 0xc0102ee7 <resume_kernelX+0>: mov 0x30(%esp),%eax
> 0xc0102eeb <resume_kernelX+4>: mov 0x38(%esp),%ah
> 0xc0102eef <resume_kernelX+8>: mov 0x2c(%esp),%al
> 0xc0102ef3 <resume_kernelX+12>: and $0x20403,%eax
> 0xc0102ef8 <resume_kernelX+17>: cmp $0x403,%eax
> 0xc0102efd <resume_kernelX+22>: je 0xc0102f0c <ldt_ss>
> End of assembler dump.
> (gdb)
>
> And as we see, we're at the "mov 0x30(%esp),%eax" which accesses above the
> bottom of the stack. After applying nmi_stack_correct-fix.patch, rc2-mm3
> booted just fine, so I IMHO think that we might still be needing this, after
> all.

Interesting. It could be an interaction between the kgdb patch and the new
vm86 checking code. (looks. I don't think that's the case).

Stas, could you please take a look at 2.6.12-rc2-mm3's entry.S sometime,
see if you think my theory is correct?

It seems that you have CONFIG_TRAP_BAD_SYSCALL_EXITS enabled - I can't say
that I've ever used that, and I really should remove it. But I doubt if
that is the cause of this bug.


The above code is accessing esp+56, but Stas's patch only offsets the stack
pointer by 32 bytes, so I assume this, in copy_thread():

- p->thread.esp0 = (unsigned long) (childregs+1) - 8;
+ p->thread.esp0 = (unsigned long) (childregs+1) - 15;

fixes it?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/