Re: [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks

From: David Rientjes
Date: Tue Sep 22 2015 - 19:32:45 EST


On Tue, 22 Sep 2015, Tetsuo Handa wrote:

> David Rientjes wrote:
> > Your proposal, which I mostly agree with, tries to kill additional
> > processes so that they allocate and drop the lock that the original victim
> > depends on. My approach, from
> > http://marc.info/?l=linux-kernel&m=144010444913702, is the same, but
> > without the killing. It's unnecessary to kill every process on the system
> > that is depending on the same lock, and we can't know which processes are
> > stalling on that lock and which are not.
>
> Would you try your approach with below program?
> (My reproducers are tested on XFS on a VM with 4 CPUs / 2048MB RAM.)
>
> ---------- oom-depleter3.c start ----------
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sched.h>
>
> static int zero_fd = EOF;
> static char *buf = NULL;
> static unsigned long size = 0;
>
> static int dummy(void *unused)
> {
> 	static char buffer[4096] = { };
> 	int fd = open("/tmp/file", O_WRONLY | O_CREAT | O_APPEND, 0600);
> 	/* Keep appending and syncing one full buffer until either call fails. */
> 	while (write(fd, buffer, sizeof(buffer)) == sizeof(buffer) &&
> 	       fsync(fd) == 0);
> 	return 0;
> }
>
> static int trigger(void *unused)
> {
> 	read(zero_fd, buf, size); /* Will cause OOM due to overcommit */
> 	return 0;
> }
>
> int main(int argc, char *argv[])
> {
> 	unsigned long i;
> 	zero_fd = open("/dev/zero", O_RDONLY);
> 	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
> 		char *cp = realloc(buf, size);
> 		if (!cp) {
> 			size >>= 1;
> 			break;
> 		}
> 		buf = cp;
> 	}
> 	/*
> 	 * Create many child threads in order to enlarge time lag between
> 	 * the OOM killer sets TIF_MEMDIE to thread group leader and
> 	 * the OOM killer sends SIGKILL to that thread.
> 	 */
> 	for (i = 0; i < 1000; i++) {
> 		clone(dummy, malloc(1024) + 1024, CLONE_SIGHAND | CLONE_VM,
> 		      NULL);
> 	}
> 	/* Let a child thread trigger the OOM killer. */
> 	clone(trigger, malloc(4096) + 4096, CLONE_SIGHAND | CLONE_VM, NULL);
> 	/* Deplete all memory reserve using the time lag. */
> 	for (i = size; i; i -= 4096)
> 		buf[i - 1] = 1;
> 	return * (char *) NULL; /* Kill all threads. */
> }
> ---------- oom-depleter3.c end ----------
>
> uptime > 350 of http://I-love.SAKURA.ne.jp/tmp/serial-20150922-1.txt.xz
> shows that the memory reserves were completely depleted, and
> uptime > 42 of http://I-love.SAKURA.ne.jp/tmp/serial-20150922-2.txt.xz
> shows that the memory reserves were not used at all.
> Is this result what you expected?
>

What are the results when the kernel isn't patched at all? The trade-off
being made is that we want to attempt to make forward progress when there
is an excessive stall in an oom victim making its exit rather than
livelock the system forever waiting for memory that can never be
allocated.
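
To make that trade-off concrete, here is a rough userspace sketch of the
decision involved. It is only a toy model, not code from any posted patch:
the function name oom_stall_policy(), the enum values, and the limit_ms
parameter are all hypothetical, and what "escalate" should actually do is
left open.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible decisions while an OOM victim is pending (all hypothetical). */
enum oom_action {
	OOM_DONE,	/* victim exited; memory should be coming back */
	OOM_WAIT,	/* victim still exiting; give it more time */
	OOM_ESCALATE	/* stall exceeded the limit; try something else */
};

/*
 * Toy model of the trade-off: wait for the victim for up to limit_ms,
 * then stop waiting so the system can attempt forward progress instead
 * of livelocking. What "escalate" means is deliberately not modelled.
 */
static enum oom_action oom_stall_policy(unsigned long now_ms,
					unsigned long selected_ms,
					unsigned long limit_ms,
					bool victim_exited)
{
	assert(now_ms >= selected_ms);	/* timestamps assumed monotonic */

	if (victim_exited)
		return OOM_DONE;
	if (now_ms - selected_ms < limit_ms)
		return OOM_WAIT;
	return OOM_ESCALATE;
}
```

With an assumed limit of 5000 ms and a victim selected at t=0 that has not
exited, this returns OOM_WAIT at t=1000 and OOM_ESCALATE at t=5000.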

I struggle to understand how the approach of randomly continuing to kill
more and more processes in the hope that it slows down usage of memory
reserves or that we get lucky is better.