Re: [PATCH] fix spurious OOM kills

From: Andrea Arcangeli
Date: Sun Nov 14 2004 - 12:05:21 EST


On Sun, Nov 14, 2004 at 07:44:17AM -0200, Marcelo Tosatti wrote:
> Well, I'll wait for your correct and definitive approach.

I doubt my patch will be definitive and you're welcome to keep hacking
without waiting ;). There are various problems, one issue is the
try_to_free_pages side, the other severe and obvious bug is the
invocation of the oom killer in vmscan.c that cannot know if enough
memory is already free via racing tasks (like other context and like
kswapd).

I'm looking at the latter first since that is easy to fix, but as you
said the zone->all_unreclaimable is hard and it may be buggy too, so I
doubt my fix will be definitive ;) but it'll worth a try. Feel free to
work on the zone->all_unreclaimable for example.

Especially the fact your patch didn't help make me think my firts patch
also won't fix it and we'll have to look into the try_to_free_pages
internals to fix it completely.

My patch compared to yours will only save .text/.data/.bss bloat (i.e.
the opposite of what Martin was worried about) to avoid message passing
via global variable w/o locks from task context to kswapd.

Chris, since you can reproduce so easily, could you try a `vmstat 1`
while the oom killing happens, and can you post it?

We've also to differentiate between true early-oom kills, and genuine
oom-kills. The oom killing triggering is not always by mistake ;). Chris
if you could post the vmstat 1 that would help to be sure it's really a
kernel bug (if you already posted it in another thread just let me know
and I'll search for such an email ;). Thanks!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/