Re: recursive fault in 2.6.35.5

From: Mike Galbraith
Date: Tue May 31 2011 - 22:01:53 EST


On Tue, 2011-05-31 at 10:24 -0400, Whit Blauvelt wrote:
> On Mon, May 30, 2011 at 04:48:29AM +0200, Mike Galbraith wrote:
>
> > No, you've been bitten by an annoyingly elusive load balancing bug.
>
> Thanks Mike. Can that bug be avoided by leaving out some kernel option? The
> system that happened on had its identical twin fail the day before. For
> both, it was a time of relatively more load (although not excessive). On the
> twin we didn't look at the console before rebooting though.
>
> On the other hand, we'd run for months with no problem up until this.

No earthly notion. I never figured out exactly how it happens. Setting
traps for the critter didn't work out. I did receive some diagnostic
info from a group of ppc64 boxen that indicated that the clock went
backward, but when I zeroed in on it, they went silent. All other
machines with traps set have been totally silent for months (that's a
lot of machines too).

Bug seems to be dead upstream, at least I haven't noticed any reports
with a recent kernel.

-Mike
