Re: [PATCH] fix spurious OOM kills

From: Marcelo Tosatti
Date: Sun Nov 14 2004 - 08:51:53 EST


On Sun, Nov 14, 2004 at 07:44:17AM -0200, Marcelo Tosatti wrote:
> On Sun, Nov 14, 2004 at 12:37:40AM +0100, Andrea Arcangeli wrote:
> > On Fri, Nov 12, 2004 at 05:52:21PM +0100, Chris Ross wrote:
> > >
> > >
> > > Chris Ross escreveu:
> > > >It seems good.
> > >
> > > Sorry Marcelo, I spoke to soon. The oom killer still goes haywire even
> > > with your new patch. I even got this one whilst the machine was booting!
> >
> > On monday I'll make a patch to place the oom killer at the right place.
> >
> > Marcelo's argument that kswapd is a localized place isn't sound to me,
> > kswapd is still racing against all other task contexts, so if the task
> > context isn't reliable, there's no reason why kswapd should be more
> > reliable than the task context. the trick is to check the _right_
> > watermarks before invoking the oom killer, it's not about racing against
> > each other, 2.6 is buggy in not checking the watermarks. Moving the oom
> > killer in kswapd can only make thing worse, fix is simple, and it's the
> > opposite thing: move the oom killer up the stack outside vmscan.c.
>
> Its hard to detect OOM situation with zone->all_unreclaimable logic.
>
> Well, I'll wait for your correct and definitive approach.

Take zone->all_unreclaimable into account when you move oom_kill into
page_alloc.c, which I now think might be the simpler fix.

shrink_caches() will fail early due to the all_unreclaimable logic (it won't
scan/writeout at lower priorities):

	if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
		continue;	/* Let kswapd poll it */

In my kill-from-kswapd patch, disabling all_unreclaimable after 5 seconds
allowed kswapd to scan the full zone and reliably detect OOM. You might
want something similar.
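
To make the idea concrete, here is a rough userspace sketch of the skip test
plus a 5 second expiry of the flag. It is not the code from my patch and not
kernel code: struct zone_sim, the marked_at field, UNRECLAIMABLE_TIMEOUT and
both helper functions are made-up names for illustration only. Only the skip
condition and DEF_PRIORITY come from vmscan.c.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY 12           /* value used by the 2.6 kernel */
#define UNRECLAIMABLE_TIMEOUT 5   /* seconds; hypothetical name for the timeout */

/* Hypothetical, stripped-down stand-in for struct zone. */
struct zone_sim {
	bool all_unreclaimable;
	long marked_at;           /* assumed field: time (s) the flag was set */
};

/*
 * Clear a stale all_unreclaimable flag so the zone gets fully scanned
 * again. Returns true if the flag was cleared.
 */
static bool expire_all_unreclaimable(struct zone_sim *z, long now)
{
	if (z->all_unreclaimable &&
	    now - z->marked_at >= UNRECLAIMABLE_TIMEOUT) {
		z->all_unreclaimable = false;
		return true;
	}
	return false;
}

/*
 * Mirror of the skip test in shrink_caches(): count how many zones
 * would actually be scanned at the given priority.
 */
static size_t zones_scanned(const struct zone_sim *zones, size_t n,
			    int priority)
{
	size_t scanned = 0;

	for (size_t i = 0; i < n; i++) {
		if (zones[i].all_unreclaimable && priority != DEF_PRIORITY)
			continue; /* Let kswapd poll it */
		scanned++;
	}
	return scanned;
}
```

With one zone flagged and one not, only the unflagged zone is scanned below
DEF_PRIORITY until the flag expires, so a caller that decides "OOM" from
those passes is deciding without having looked at the flagged zone.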

The early bail-out in shrink_caches() seems to be one of the main reasons
for the spurious OOM kills.

Anxious to see your patch!
