Re: 2.6.12.2 dies after 24 hours

From: Bron Gondwana
Date: Tue Jul 12 2005 - 08:54:08 EST



On Tue, 12 Jul 2005 14:13:01 +0200, "Lars Roland" <lroland@xxxxxxxxx>
said:
> You have irq balancing, the line
>
> CONFIG_IRQBALANCE=y
>
> in your config file confirms it - I am not completely sure that it is
> the root of the problem but when I experienced the problem I changed
> two things: my acpi code and irq balancing and one of then made the
> difference, I am just to lazy to check which one it is (also it is
> production servers so I cannot do whatever I want).

Our ACPI looked very similar to yours, so I've disabled IRQBALANCE.

We'll be rebooting the server during a less busy time to try the new
kernel, so not for about 12 hours or so.

> > append="elevator=deadline"
>
> I use the same io scheduler so that should not be a problem. I have
> uploaded my config file - it works on ibm 335/336 servers, and a quick
> look at your boot msg seams to indicate that your server have some of
> the same hardware - note however that I load ide/scsi/filesystem stuff
> as modules so you will need to build a initrd to use my config.
>
> the config is here
>
> http://randompage.org/static/kernel.conf

Great, thanks for that. I've had a look through and I think the
IRQBALANCE issue is the most likely cause.

We're also applying the attached patch. There's a bug in reiserfs that
gets tickled by our huge MMAP usage (it's amazing what really busy
Cyrus daemons can do to a server, ouch). It's fixed in generic_write,
so we take the few percent performance hit for something that doesn't
break!

Bron.
--
Bron Gondwana
brong@xxxxxxxxxxx

Attachment: patch-2.6.12.2-reiserfix.bz2
Description: Binary data