Re: ftp server crashes on heavy load: possible scheduler bug

From: Pedro Venda (SYSADM)
Date: Fri Apr 29 2005 - 10:09:38 EST


On Friday 29 April 2005 15:32, Pedro Venda (SYSADM) wrote:
> On Friday 29 April 2005 13:59, Alexander Nyberg wrote:
> > > We've made some changes on our ftp server, and since that it's been
> > > crashing frequently (everyday) with a kernel panic.
> > >
> > > We've configured the 5 IDE 160GB drives into md raid5 arrays with LVM
> > > on top of that. All filesystems are reiserfs. The other change we made
> > > to the server was changing from a patched 2.6.10-ac12 kernel into a
> > > newer 2.6.11.7.
> > >
> > > Not being able to see the whole stacktrace on screen, we've started a
> > > netconsole to investigate. Started the server and loaded it pretty bad
> > > with rsyncs and such... until it crashed after just 20 minutes.
> > >
> > > The netconsole log was surprising - "kernel BUG at
> > > kernel/sched.c:2634!"
> > >
> > > Any help would be GREATLY appreciated.
> >
> > 5 IDE disks into one MD raid5 into one LVM volume with reiserfs on top
> > of it? Could you give me some way to reproduce the specific load you put
> > on the machine plus your .config and I'll see what I can do.

forgot about the load:

It's an ftp server. official mirror for ubuntu linux, gentoo linux, eclipse
SDK, etc. see ftp://ftp.rnl.ist.utl.pt

We also provide rsync service for gentoo portage users. The load tests we
performed that eventually crashed the machine were about 10 simultaneous
rsyncs pulling information from the server, the server was uploading with
rsync via ssh at 10MB/s (100Mbit nic) and some spurious ftp download clients.
It lasted about 30 minutes. With normal load, it lasts for about 12-14 hours.

I think it crashes better with rsync clients... it usually crashed at around
00h, when some cronjobs fired rsyncs...

regards,
pedro venda.

>
> ok, current setup is:
>
> 5 160GB IDE drives spread across 2 controller cards, one onboard and a PCI
> promise ultra133 (Promise Technology, Inc. 20269 (rev 02)). All but two of
> the drives are alone in an IDE channel.
>
> boot and system partitions are mirrored [raid1] across all drives (not that
> silly because of uniform partitioning scheme for raid5 array).
>
> About 150GB of each drive belong to a RAID5 array with total size of 576.73
> GB.
>
> that raid5 array supports two LVM volumes of 320 and 256GB.
>
> df -h:
> Filesystem Size Used Avail Use% Mounted on
> /dev/md5 19G 2.6G 17G 14% /
> /dev/big/ftp 320G 260G 61G 82% /home/ftp
> /dev/big/other 257G 149G 108G 58% /home/other
> none 252M 0 252M 0% /dev/shm
> /dev/md1 69M 8.7M 57M 14% /boot
>
> lvscan:
> ACTIVE '/dev/big/ftp' [320.00 GB] inherit
> ACTIVE '/dev/big/other' [256.73 GB] inherit
>
> pvscan:
> PV /dev/md6 VG big lvm2 [576.73 GB / 0 free]
> Total: 1 [576.73 GB] / in use: 1 [576.73 GB] / in no VG: 0 [0 ]
>
> mdstat:
> Personalities : [raid1] [raid5]
> md2 : active raid1 hdc2[1] hda2[0]
> 136448 blocks [2/2] [UU]
> md7 : active raid1 hdg2[1] hde2[0]
> 136448 blocks [2/2] [UU]
> md1 : active raid1 hdh1[4] hdg1[3] hde1[2] hdc1[1] hda1[0]
> 72192 blocks [5/5] [UUUUU]
> md5 : active raid5 hdh5[4] hdg5[3] hde5[2] hdc5[1] hda5[0]
> 19566592 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> md6 : active raid5 hdh6[4] hdg6[3] hde6[2] hdc6[1] hda6[0]
> 604750336 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> unused devices: <none>
>
> hope it helps.
>
> regards,
> pedro venda.

--

Pedro João Lopes Venda
email: pjvenda < at > rnl.ist.utl.pt
http://maxwell.rnl.ist.utl.pt

Equipa de Administração de Sistemas
Rede das Novas Licenciaturas (RNL)
Instituto Superior Técnico
http://www.rnl.ist.utl.pt
http://mega.ist.utl.pt

Attachment: pgp00000.pgp
Description: PGP signature