Re: [PATCH] oom: always panic on OOM when panic_on_oom is configured
From: David Rientjes
Date: Mon Jun 08 2015 - 15:52:08 EST
On Fri, 5 Jun 2015, Michal Hocko wrote:
> > Nack, this is not the appropriate response to exit path livelocks. By
> > doing this, you are going to start unnecessarily panicking machines that
> > have panic_on_oom set when it would not have triggered before. If there
> > is no reclaimable memory and a process that has already been signaled to
> > die or is in the process of exiting has to allocate memory, it is
> > perfectly acceptable to give them access to memory reserves so they can
> > allocate and exit. Under normal circumstances, that allows the process to
> > naturally exit. With your patch, it will cause the machine to panic.
>
> Isn't that what the administrator of the system wants? The system
> is _clearly_ out of memory at this point. A coincidental exiting task
> doesn't change a lot in that regard. Moreover, it increases the risk of an
> unnecessarily unresponsive system, which is what panic_on_oom tries to
> prevent. So from my POV this is a clear violation of the user
> policy.
>
We rely on the functionality that this patch is short-circuiting because we
rely on userspace to trigger oom kills. For system oom conditions, we
must then rely on the kernel oom killer to set TIF_MEMDIE since userspace
cannot grant it itself. (I think the memcg case is very similar in that
this patch is short-circuiting it, but I'm more concerned for the system oom
in this case because it's a show stopper for us.)

We want to send the SIGKILL, which will interrupt things like
get_user_pages() which we find is our culprit most of the time. When the
process enters the exit path, it must allocate other memory (slab,
coredumping and the very problematic proc_exit_connector()) to free
memory. This patch would cause the machine to panic rather than utilizing
memory reserves so that it can exit, not as a result of a kernel oom kill
but rather a userspace kill.
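
To make the reserve argument concrete, here is a minimal sketch in C of an
allocator that refuses ordinary requests below a minimum watermark but lets
a task flagged as dying dip into the reserve so it can finish exiting. This
is a toy model under stated assumptions, not kernel code: toy_zone, toy_task,
toy_alloc() and the memdie field (standing in for TIF_MEMDIE) are all
hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: a zone with a count of free
 * pages and a minimum watermark that ordinary allocations may not cross. */
struct toy_zone {
	long free_pages;
	long min_watermark;
};

/* Hypothetical stand-in for a task; memdie models TIF_MEMDIE. */
struct toy_task {
	bool memdie;
};

/* Returns true and takes the pages if the allocation is allowed.
 * A task with memdie set may allocate all the way down to zero;
 * everyone else must leave min_watermark pages free. */
bool toy_alloc(struct toy_zone *z, const struct toy_task *t, long pages)
{
	long floor = t->memdie ? 0 : z->min_watermark;

	if (z->free_pages - pages < floor)
		return false;
	z->free_pages -= pages;
	return true;
}
```

In this model the exiting task's small allocations succeed where an
ordinary task's would fail, which is what lets it reach the point where it
frees its memory.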

panic_on_oom exists to suppress the kernel oom killer. It's not a sysctl that
triggers whenever watermarks are hit, and it doesn't suppress memory
reserves from being used for things like GFP_ATOMIC. Setting TIF_MEMDIE
for an exiting process is another type of memory reserve, and it is
imperative that we have it to make forward progress. panic_on_oom should
only trigger when the kernel can't make forward progress without killing
something (not true in this case). I believe that's how the documentation
has always been interpreted and the tunable used in the wild.
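
The ordering argued for in this paragraph can be sketched as a small
decision function. This is only an illustration of the policy as described
here, assuming a simplified view of the oom path; it is not the kernel's
implementation, and toy_oom_ctx, toy_oom_action and toy_oom_decide() are
hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical outcomes of an out-of-memory event in this sketch. */
enum toy_oom_action {
	TOY_GRANT_RESERVES,	/* let the dying task use reserves and exit */
	TOY_PANIC,		/* panic_on_oom takes effect */
	TOY_KILL_VICTIM,	/* kernel oom killer picks a victim */
};

/* Hypothetical, simplified inputs to the decision. */
struct toy_oom_ctx {
	bool panic_on_oom;	/* models the sysctl being set */
	bool current_is_dying;	/* SIGKILL pending or already exiting */
};

/* A dying task can make forward progress without anything being killed,
 * so it is checked first; panic_on_oom only matters when the kernel
 * would otherwise have to kill something. */
enum toy_oom_action toy_oom_decide(const struct toy_oom_ctx *c)
{
	if (c->current_is_dying)
		return TOY_GRANT_RESERVES;
	if (c->panic_on_oom)
		return TOY_PANIC;
	return TOY_KILL_VICTIM;
}
```

Under this ordering, a userspace-initiated kill never turns into a panic
just because the victim needed memory on its way out.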

It would be interesting to consider your other patch that refactors the
sysrq+f tunable. I think we should make that never trigger panic_on_oom
(the sysadmin can use other sysrqs for that) and allow userspace to use
sysrq+f as a trigger when it is responsive enough to handle oom conditions.
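
For reference, a userspace handler could issue sysrq+f by writing the
character 'f' to /proc/sysrq-trigger. The following is a minimal sketch of
that, assuming the caller has sufficient privileges; write_sysrq() is a
hypothetical helper name, and the path is passed in so the function can be
exercised against an ordinary file.

```c
#include <stdio.h>

/* Hypothetical helper: writes a single sysrq command character to the
 * given trigger file. Returns 0 on success, -1 on failure. */
int write_sysrq(const char *path, char cmd)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputc(cmd, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer, so a write error shows up here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A privileged handler would call write_sysrq("/proc/sysrq-trigger", 'f') and
treat a nonzero return as a failure to trigger the oom kill.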

But this patch itself can't possibly be merged.