Re: 2.5.62-mjb3 (scalability / NUMA patchset)

From: Mark Haverkamp (markh@osdl.org)
Date: Wed Feb 26 2003 - 17:49:02 EST


On Wed, 2003-02-26 at 07:55, Martin J. Bligh wrote:
> >> The patchset contains mainly scalability and NUMA stuff, and anything
> >> else that stops things from irritating me. It's meant to be pretty
> >> stable, not so much a testing ground for new stuff.
> >>
> >> I'd be very interested in feedback from anyone willing to test on any
> >> platform, however large or small.
> >>
> >> ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/2.5.62/patch-2.5.62-
> >> mjb 3.bz2
> >>
> >
> > Martin,
> >
> > I have been seeing system hangs on my 16 processor numaq while running
> > contest. The system will hang within a few seconds to half an hour.
> > Unfortunately there is no stack trace or any other indication on the
> > system console. I have been running your 2.5.62-mjb2 without problems
> > previously. Any ideas what I can do to narrow this down?
>
> Humpf. Can you try backing out this patch (it caused me similar problems on
> 59, but seemed fine in 62). I suspect it's just changing timing enough that
> we hit some other bug ... if you could, would be nice to try the ALT+SYSRQ
> stuff, or turn on NMI watchdogs and get a backtrace ... I've not been able
> to reproduce this on recent kernels.
>
> Thanks,
>
> M.
>
> diff -urpN -X /home/fletch/.diff.exclude
> 330-no_kirq/include/asm-i386/mach-numaq/mach_mpparse.h
> 340-auto_disable_tsc/include/asm-i386/mach-numaq/mach_mpparse.h
> --- 330-no_kirq/include/asm-i386/mach-numaq/mach_mpparse.h Fri Jan 17
> 09:18:31 2003
> +++ 340-auto_disable_tsc/include/asm-i386/mach-numaq/mach_mpparse.h Mon Feb
> 24 08:14:42 2003
> @@ -32,6 +32,7 @@ static inline void mps_oem_check(struct
> if (mpc->mpc_oemptr)
> smp_read_mpc_oem((struct mp_config_oemtable *) mpc->mpc_oemptr,
> mpc->mpc_oemsize);
> + tsc_disable=1;
> }
>
> /* Hook from generic ACPI tables.c */
>

I turned on NMI watchdogs and when the system hung, I saw no output. My
serial console is through a terminal server that isn't set up to pass
along the sysrq, so I need to get this fixed. In any case I backed out
the patch that you suggested and I have had no system hangs since.

Mark.

-- 
Mark Haverkamp <markh@osdl.org>

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Feb 28 2003 - 22:00:39 EST