Re: [RFC] [PATCH] Cgroup based OOM killer controller

From: David Rientjes
Date: Tue Jan 27 2009 - 15:38:49 EST


On Tue, 27 Jan 2009, Evgeniy Polyakov wrote:

> > There is no additional oom killer limitation imposed here, nor can the oom
> > killer kill a task hung in D state any better than userspace.
>
> Well, oom-killer can, since it drops unkillable state from the process
> mask, that may be not enough though, but it tries more than userspace.
>

The only thing it does is send a SIGKILL and give the thread access to
memory reserves with TIF_MEMDIE; it doesn't drop any unkillable state. If
its victim is hung in D state and the memory reserves do not allow it to
return to being runnable, this task will not die and the oom killer will
livelock unless given another target.

> My main point was to have a way to monitor memory usage so that any
> process could tune its own behaviour according to that information. Which
> is not related to the system oom-killer at all. Thus /dev/mem_notify is
> interesting first (and only first) as a memory usage notification
> interface and not a way to invoke any kind of 'soft' oom-killer.

It's a way to prevent invoking the kernel oom killer by allowing userspace
notification of events where methods such as dropping caches, elevating
limits, adding nodes, sending signals, etc, can prevent such a problem.
When the system (or cgroup) is completely oom, it can also issue SIGKILLs
that will free some memory and preempt the oom killer from acting.
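
As a rough illustration of the last point, a userspace handler could keep
its own list of candidate tasks and SIGKILL the one it prefers to lose.
This is only a sketch: struct candidate, its preference field, and both
function names are hypothetical and come from no patch in this thread; only
kill(2) and SIGKILL are real interfaces.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical bookkeeping kept by the userspace handler. */
struct candidate {
	pid_t pid;
	int preference;	/* higher means "kill this one first" */
};

/* Return the candidate with the highest preference, or NULL if none.
 * On a tie, the earliest entry in the array wins. */
const struct candidate *pick_victim(const struct candidate *c, size_t n)
{
	const struct candidate *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!best || c[i].preference > best->preference)
			best = &c[i];
	return best;
}

/* Send SIGKILL to the preferred victim.
 * Returns 0 on success, -1 if there is no candidate or kill() fails. */
int kill_preferred(const struct candidate *c, size_t n)
{
	const struct candidate *v = pick_victim(c, n);

	if (!v)
		return -1;
	return kill(v->pid, SIGKILL);
}
```

As noted above, a SIGKILL sent this way does nothing extra for a task hung
in D state, so a real handler would need to notice a victim that does not
exit and move on to another target.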

I think there might be some confusion about my proposal for extending
/dev/mem_notify. Not only should it notify of certain low memory events,
but it should also allow userspace notification of oom events, just like
the cgroup oom notifier patch allowed. Instead of attaching a task to a
cgroup file in that case, however, this would simply be the responsibility
of a task that has set up a poll() on the cgroup's mem_notify file. A
configurable delay could be imposed so page allocation attempts simply
loop while the userspace handler responds and then only invoke the oom
killer when absolutely necessary.
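
The userspace side of that could look roughly like the following. It is a
minimal sketch only: it assumes the mem_notify file reports an event by
becoming readable (POLLIN), which is my assumption about the interface
rather than something specified here, and wait_for_mem_event() is a
hypothetical helper name. The poll() call itself is the standard one.

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait up to timeout_ms for an event on fd (for example, an open
 * descriptor for a cgroup's mem_notify file; the file semantics are
 * assumed, not specified).
 *
 * Returns 1 if the descriptor became readable, 0 on timeout,
 * -1 on error or if poll() reported only an error/hangup condition.
 */
int wait_for_mem_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	/* Restart the wait if a signal interrupts it. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A handler would call this in a loop and respond to each event (free
memory, raise a limit, kill something) within whatever delay the kernel
side is configured to wait before falling back to the oom killer.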

> Application can do whatever it wants of course including killing itself
> or the neighbours, but this should not be forced as a usage policy.
>

If preference killing is your goal, then userspace can do it with the
/dev/mem_notify functionality.