Re: Maybe a VM bug in 2.4.18-18 from RH 8.0?

From: GrandMasterLee (masterlee@digitalroadkill.net)
Date: Fri Dec 06 2002 - 11:19:58 EST


On Fri, 2002-12-06 at 01:51, Andrew Morton wrote:
> GrandMasterLee wrote:
> >
> > ...
> > > "crashes"? kernel, or application? What additional info is
> > > available?
> >
> > Machine will panic. I've actually captured some and sent them to this
> > list, but I've been told that my stack was corrupt.
>
> OK. In your second oops trace the `swapper' process had used 5k of its
> 8k kernel stack processing an XFS IO completion interrupt. And I don't
> think `swapper' uses much stack of its own.

The second Oops is the *best* one IMO. I got it just over 7 days. (like
7 days 6 hours or something. I've still been testing the crud out of
this kernel on like hardware, and can't reproduce it. I'd love to know a
method for reproducing this for my beta environment.

> If some other process happens to be using 3k of stack when the same
> interrupt hits it, it's game over.
>
> So at a guess, I'd say you're being hit by excessive stack use in
> the XFS filesystem. I think the XFS team have done some work on that
> recently so an upgrade may help.

Since we run ~1TB dbs on the systems, and a LOT of IO, and Qlogic
drivers, I think that's the culprit. Will swapper use less stack in more
recent kernels?(XFS will be updated as part of a plan for the new year
I'm putting together. Till then, it's reboot every 7 days)

> Or it may be something completely different ;)

I hope not. :)

--The GrandMaster

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Dec 07 2002 - 22:00:26 EST