Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit

From: Jens Axboe
Date: Tue Aug 24 2004 - 05:34:30 EST


On Tue, Aug 24 2004, Karl Vogel wrote:
> > > > Original post with testcase + stats:
> > > > http://article.gmane.org/gmane.linux.kernel/228156
> > >
> > > 2.6.8.1-mm4 clean does not reproduce the problem. Marcelo, your
> > > 2.6.8-rc4 report is not valid due to the fixed problem
> > related to that
> > > in CFQ already. I'd still like for you to retest with 2.6.8.1.
> > >
>
> Did some extra testing yesterday. When not running X or anything
> substantial, I'm able to trigger it after running the expunge 2 or
> 3 times in a row.
> If I increase the calloc size, it triggers faster (tried with 1Gb
> calloc on a 512Mb box with 1Gb swap partition).

I'll try increasing the size.

> The first expunge run, completes fine. The ones after that, get
> OOM killed and I get a printk about page allocation order 0 failure.
>
> The 2.6.8.1-mm4 was a clean version, but I will double check this,
> this evening.
>
> I also tried with deadline, but was unable to trigger it.

I'm adding preempt to the mix, maybe that'll help provoke it.

> > Oh, and please do also do a sysrq-t from a hung box and save
> > the output.
>
> Note: the box doesn't hang completely. Just some processes get stuck
> in 'D' and the machine swaps heavily.

That's fine, I'd like a sysrq-t of that.

> The tests of yesterday evening, did recover. So I'm guessing if I had
> waited long enough the box would have recovered on the previous
> tests. Looking at the vmstat from my previous tests, shows that the
> box was low on memory (free/buff/cache are all very low):
>
> http://users.telenet.be/kvogel/vmstat-after-kill.txt
>
> That was probably why it was swapping like mad.

Ok, so now I'm confused - tests on what kernel recovered?

> Will provide you with that sysrq-t this evening.

Great.

--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/