Re: 2.4.20 instability on bigmem systems?

From: Gregory K. Ruiz-Ade (gregory@castandcrew.com)
Date: Sun Mar 16 2003 - 21:15:11 EST


On Friday 14 March 2003 12:08, William Lee Irwin III wrote:
> On Fri, Mar 14, 2003 at 09:31:15AM -0800, Gregory K. Ruiz-Ade wrote:
> > Ahh. I was a bit out of it yesterday, and didn't think to actually
> > stress the machine. :\
> > I'll be able to give it a good beating this weekend sometime.
>
> cc: me when you post those results.

Okay, I tried to load the system a bit and stress the disk I/O by running
a couple of finds across the whole system ('find | xargs stat',
'find | xargs cat > /dev/null', and a couple of other things) after sucking
up free memory by catting our database disk files to /dev/null. I also had
a 'make -j5 clean oldconfig dep bzImage modules' running to try to drive
the load up a bit, too.
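
For anyone wanting to generate a similar load, it can be sketched roughly
as below. This is an illustration, not the exact commands I typed: the
function names are made up, and it assumes find and xargs understand
-print0 and -0 (the GNU and BSD versions do).

```shell
#!/bin/sh
# Rough sketch of the I/O load. Function names are hypothetical.
# Assumes find/xargs support -print0 / -0 (GNU and BSD versions do).

# Read the given files once and discard the data, so their contents
# are pulled through the page cache and eat up free memory.
fill_cache() {
    cat "$@" > /dev/null
}

# stat everything under a directory tree (staying on one filesystem).
walk_stat() {
    find "$1" -xdev -print0 2>/dev/null | xargs -0 stat > /dev/null 2>&1
}

# Read every regular file under a directory tree and discard the data.
walk_cat() {
    find "$1" -xdev -type f -print0 2>/dev/null | xargs -0 cat > /dev/null 2>&1
}
```

Running 'walk_stat / &' and 'walk_cat / &' in the background after a
fill_cache over some large files gives about the same mix of load.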

I've got snapshots of meminfo, slabinfo, and output from 'ps auxfww' at:

http://castandcrew.com/~gregory/lkmlstuff/burpr/2.4.20/loadtest/
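
Something like the following would collect snapshots of that kind. Treat
it as a sketch: the function name, the output directory argument, and the
file naming are assumptions for illustration, with a YYYYMMDD.HHMM
timestamp on each file.

```shell
#!/bin/sh
# Sketch: save one timestamped snapshot of memory and process state.
# The snapshot() name, directory argument, and file names are hypothetical.

snapshot() {
    dir=$1
    stamp=$(date +%Y%m%d.%H%M)
    mkdir -p "$dir" || return 1

    # /proc/meminfo and /proc/slabinfo may be missing or unreadable
    # (slabinfo often needs root), so skip whatever cannot be read.
    for f in meminfo slabinfo; do
        if [ -r "/proc/$f" ]; then
            cat "/proc/$f" > "$dir/$f.$stamp"
        fi
    done

    # Full process listing in forest form with unlimited width.
    ps auxfww > "$dir/ps.$stamp"

    # Print the timestamp so the caller knows which files were written.
    echo "$stamp"
}
```

Calling 'snapshot /some/dir' from a loop with a sleep in between, or from
cron, gives one set of files per interval.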

It only really starts getting interesting after 20030316.1725, when I
started the kernel build. I have a very simple shell script that basically
does nothing other than "make clean oldconfig dep && make -j5 bzImage &&
make -j5 modules". I ran that a couple of times in the source tree for
Red Hat's 2.4.9-e.12 kernel.
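
Spelled out, the build loop looks roughly like this. The make targets are
the ones quoted above; the function names, the source directory argument,
the run count, and the MAKE override (handy for a dry run) are assumptions
added for the sketch.

```shell
#!/bin/sh
# Sketch of the kernel build loop. Function names, arguments, and the
# MAKE override are hypothetical; the make targets are the ones quoted.

# One full build: stop at the first step that fails.
build_once() {
    ${MAKE:-make} clean oldconfig dep &&
    ${MAKE:-make} -j5 bzImage &&
    ${MAKE:-make} -j5 modules
}

# Run the build $2 times (default 2) inside source tree $1.
# The body is a subshell so the caller's working directory is untouched.
build_loop() (
    srcdir=$1
    runs=${2:-2}
    cd "$srcdir" || exit 1
    i=0
    while [ "$i" -lt "$runs" ]; do
        build_once || exit 1
        i=$((i + 1))
    done
)
```

Setting MAKE=echo before calling build_loop prints the make arguments
instead of building, which is a cheap way to check the sequence.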

Surprisingly, I wasn't able to grind the system down like I expected. Not
sure why it's behaving so wonderfully today.

It crashed again on Friday night, running 2.4.19. The only information I
was able to get was a kernel BUG message on the serial console (I
ksymoops'ed it after rebooting). From what I could tell after the fact,
nothing was really running. Several scripts that check various things
(mainly to make sure certain services are still running) got fired off by
cron, and that is about when the system locked up. The info I have for
that crash is at:

http://castandcrew.com/~gregory/lkmlstuff/burpr/2.4.19

As it is, I'm going to try running on Red Hat's 2.4.9-e.12 sources this
week. I'm compiling the kernel right now, and will be rebooting into it
shortly.

This has been quite the week of headaches for me.

-- 
Gregory K. Ruiz-Ade <gregory@castandcrew.com>
Sr. Systems Administrator
Cast & Crew Entertainment Services, Inc.
