Re: ftp server crashes on heavy load: possible scheduler bug

From: Pedro Venda (SYSADM)
Date: Fri Apr 29 2005 - 09:24:51 EST


On Friday 29 April 2005 13:08, Andrew Morton wrote:
> "Pedro Venda (SYSADM)" <pjvenda@xxxxxxxxxxxxxx> wrote:
> > We've made some changes on our ftp server, and since then it has been
> > crashing frequently (every day) with a kernel panic.
> >
> > We've configured the 5 IDE 160GB drives into md raid5 arrays with LVM on
> > top of that. All filesystems are reiserfs. The other change we made to
> > the server was changing from a patched 2.6.10-ac12 kernel into a newer
> > 2.6.11.7.
> >
> > Since we could not see the whole stack trace on screen, we set up
> > netconsole to investigate. We started the server and loaded it heavily
> > with rsyncs and such... until it crashed after just 20 minutes.
> >
> > The netconsole log was surprising - "kernel BUG at kernel/sched.c:2634!"
>
> Strange. It'd be interesting to try disabling CONFIG_4KSTACKS. Also,
> please add this to get a bit more info.

Hi,

I'll try that. Should I run the test with kernel preemption enabled or
disabled?

regards,
pedro venda.

>
> diff -puN kernel/sched.c~a kernel/sched.c
> --- 25/kernel/sched.c~a 2005-04-29 05:05:24.792004408 -0700
> +++ 25-akpm/kernel/sched.c 2005-04-29 05:06:36.015176840 -0700
> @@ -2631,7 +2631,12 @@ void fastcall add_preempt_count(int val)
>  	/*
>  	 * Underflow?
>  	 */
> -	BUG_ON(((int)preempt_count() < 0));
> +	if ((int)preempt_count() < 0) {
> +		printk("preempt_count=%d\n", preempt_count());
> +		BUG();
> +	}
> +	if ((int)preempt_count() > 1000)
> +		printk("preempt_count=%d\n", preempt_count());
>  	preempt_count() += val;
>  	/*
>  	 * Spinlock count overflowing soon?
> _
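
For reference, the logic the patch above adds can be modelled in userspace
roughly as follows. This is only a sketch under stated assumptions, not
kernel code: model_preempt_count, model_add_preempt_count() and the
check_result codes are hypothetical names invented for illustration, a plain
int stands in for the per-task counter, and a return code stands in for
BUG(). The thresholds (below 0, above 1000) are the ones from the patch.

```c
#include <stdio.h>

/* Hypothetical userspace stand-in for the per-task preempt counter. */
static int model_preempt_count;

/* Result codes invented for this sketch; the kernel patch calls BUG(). */
enum check_result {
	CHECK_OK = 0,         /* counter looked sane, val was added */
	CHECK_SUSPICIOUS = 1, /* counter above 1000, logged, val was added */
	CHECK_UNDERFLOW = 2   /* counter negative, logged, val NOT added */
};

static enum check_result model_add_preempt_count(int val)
{
	enum check_result result = CHECK_OK;

	/* Underflow: log the value before stopping (kernel: printk + BUG). */
	if (model_preempt_count < 0) {
		fprintf(stderr, "preempt_count=%d\n", model_preempt_count);
		return CHECK_UNDERFLOW;
	}

	/* Implausibly large: log it, but carry on as the patch does. */
	if (model_preempt_count > 1000) {
		fprintf(stderr, "preempt_count=%d\n", model_preempt_count);
		result = CHECK_SUSPICIOUS;
	}

	model_preempt_count += val;
	return result;
}
```

Note that both checks look at the counter before val is added, so the
logged value is the state left behind by earlier callers, not the result of
the current call.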

--

Pedro João Lopes Venda
email: pjvenda < at > rnl.ist.utl.pt
http://maxwell.rnl.ist.utl.pt

Equipa de Administração de Sistemas
Rede das Novas Licenciaturas (RNL)
Instituto Superior Técnico
http://www.rnl.ist.utl.pt
http://mega.ist.utl.pt
