Re: Commit cb83b62 fails to boot with a divide by zero error.

From: Ingo Molnar
Date: Mon May 14 2012 - 06:48:30 EST



* Robin Holt <holt@xxxxxxx> wrote:

> On Fri, May 11, 2012 at 05:36:13PM +0200, Peter Zijlstra wrote:
> > On Fri, 2012-05-11 at 10:05 -0500, Robin Holt wrote:
> > > On Fri, May 11, 2012 at 04:33:10PM +0200, Peter Zijlstra wrote:
> > > > On Fri, 2012-05-11 at 08:39 -0500, Robin Holt wrote:
> > > >
> > > > > We found that reverting the commit:
> > > > > cb83b62 (x86/sched/core) sched/numa: Rewrite the CONFIG_NUMA sched domain support
> > > > >
> > > > > also got things working.
> > > >
> > > > there's a particularly stupid bug in that code
> > >
> > > Even with that applied, I still get the divide by zero.
> >
> > Humm.. what kind of machine is this? And how far along does it get in
> > booting? ->power isn't supposed to get to 0.
>
> It is a four blade (8 socket 80 core 160 hyper-thread machine)
> with 40 GB of RAM.
>
> Looking at the earlier kernel messages, I am wondering if I
> don't have a BIOS that is giving me crud. I have messages
> about hyperthreads being on different nodes. That had not
> been happening in the past. I don't have access to the
> machine now, but the BIOS string that had printed out is from
> a developer's debug version.
>
> When I get access to the machine again (likely not until
> Monday), I will flash a release BIOS and retest. Until then,
> please feel free to ignore me.

Please don't re-flash the BIOS! We want to fix this bug - the
kernel should never crash on whatever topology data the BIOS
passes.

We can sanitize it or ignore it, but crashing is not an option.
So let's figure this out, ok?

Thanks,

Ingo