Re: PROBLEM: kernel crashes 2.4.31final + 2.4.32rc1

From: Willy Tarreau
Date: Wed Oct 26 2005 - 23:59:01 EST


Hi Hans & Marcelo,

On Wed, Oct 26, 2005 at 06:39:56AM -0200, Marcelo Tosatti wrote:
> Hi Hans,
>
> On Mon, Oct 24, 2005 at 05:24:14PM +0200, Hans-Jürgen Mehnert wrote:
> > [1.]
> > PROBLEM: kernel crashes 2.4.31final + 2.4.32rc1
> >
> >
> > [2.]
> > Since kernel version 2.4.31 we have system crashes with
> > blank screen and nothing in logs on at least 5 different
> > Intel and AMD SMP machines. 2.4.32rc1 has the same problem,
> > 2.4.30 is working stable on these machines. We had no crashes
> > with the newer kernel versions on about 100 single-cpu machines
> > so far.
>
> A few questions:
>
> - Can you send dmesg output?

You could also try to get the messages on the screen by preventing it from
blanking, using setterm:

# setterm -blank 0

Also, using a serial console to get and log messages can help.
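
For the serial console, here is a minimal sketch of what I have in mind. The
port (ttyS0), the speed (115200, 8N1), the log file name and the helper
console_args are only assumptions for illustration, so adjust them to your
hardware and boot loader:

```shell
# Hypothetical helper: print kernel boot arguments that send console
# output to both the VGA screen and a serial port.
# Assumed defaults: first serial port (ttyS0), 115200 baud, no parity, 8 bits.
console_args() {
    port="${1:-ttyS0}"
    baud="${2:-115200}"
    # Kernel messages go to every console= listed; the last one
    # also becomes /dev/console.
    printf 'console=tty0 console=%s,%sn8\n' "$port" "$baud"
}

# Put the printed string on the kernel command line, for example in lilo.conf:
#   append="console=tty0 console=ttyS0,115200n8"
console_args

# On a second machine connected with a null-modem cable, set the same
# line parameters and log everything that arrives:
#   stty -F /dev/ttyS0 115200 cs8 -parenb -cstopb raw
#   cat /dev/ttyS0 | tee crash.log
```

With both console= entries present, an oops that never reaches the logs on
disk should still show up in the capture on the second machine.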

> - Are you using the ahci driver with the Intel SATA controller? There have
> been some bugfixes in ahci during v2.4.31, although they don't look suspect.
> Is there load on the interface?
>
> - Is the USB UHCI controller being used? There are known SMP problems
> in the usb-uhci driver recently fixed (problem introduced in v2.4.28 though).
>
> - What is the workload on these machines?
>
> - Have you tried v2.4.31-pre1, -pre2, and so on to isolate the possibly problematic
> change?

I also saw that at least the machine shown in the report is using grsec. Are
all machines using grsec? Although I don't believe it is the cause, we cannot
rule out that a recent bug was introduced in it.

And what's the average time before a crash? Hours, days, weeks?

Regards,
Willy
