Re: [PATCH -mm -repost] memcg: do not hang on OOM when killed by userspace OOM access to memory reserves

From: David Rientjes
Date: Wed Apr 23 2014 - 19:28:30 EST


On Wed, 23 Apr 2014, Michal Hocko wrote:

> Eric has reported that he can see task(s) stuck in memcg OOM handler
> regularly. The only way out is to
>
> echo 0 > $GROUP/memory.oom_control
>
> His usecase is:
>
> - Setup a hierarchy with memory and the freezer (disable kernel oom and
> have a process watch for oom).
>
> - In that memory cgroup add a process with one thread per cpu.
>
> - In one thread, slowly allocate memory (I think it is 16M of RAM once
> per second), then mlock and dirty it (just to force the pages into
> RAM and keep them there).
>
> - When OOM is reached, loop:
> * attempt to freeze all of the tasks.
> * if frozen, send every task SIGKILL, unfreeze, and remove the
> directory in cgroupfs.
>
> Eric has then pinpointed the issue to be memcg specific.
>
> All tasks are sitting on the memcg_oom_waitq when memcg oom is disabled.
> Those that have received a fatal signal will bypass the charge and should
> continue on their way out. The tricky part is that the exit path might
> trigger a page fault (e.g. exit_robust_list), and thus a memcg charge,
> while its memcg is still under OOM because nobody has released any charges
> yet.
>
> Unlike with the in-kernel OOM handler, the exiting task doesn't get
> TIF_MEMDIE set, so further charges of the killed task are not shortcut.
> The task falls into the memcg OOM again without any way out of it, as
> there are no fatal signals pending anymore.
>
> This patch fixes the issue by checking PF_EXITING early in
> mem_cgroup_try_charge and bypassing the charge, the same as if the task
> had a fatal signal pending or TIF_MEMDIE set.
>
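
The bypass condition described above can be modelled in plain userspace
C as a rough sketch. This is not the kernel code from the patch: struct
model_task, MODEL_PF_EXITING (with an illustrative value) and
model_should_bypass_charge() are made-up names that only mirror the
three conditions named in the changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the kernel's PF_EXITING task flag. */
#define MODEL_PF_EXITING 0x1u

/* Hypothetical, simplified view of the task state the check looks at. */
struct model_task {
	unsigned int flags;        /* per-task flags, e.g. MODEL_PF_EXITING */
	bool tif_memdie;           /* set by the in-kernel OOM killer */
	bool fatal_signal_pending; /* SIGKILL queued but not yet handled */
};

/*
 * Returns true when the charge should be skipped: the task was picked
 * by the in-kernel OOM killer, has a fatal signal pending, or (the new
 * case) is already on its way out.
 */
static bool model_should_bypass_charge(const struct model_task *t)
{
	assert(t != NULL);

	return t->tif_memdie ||
	       t->fatal_signal_pending ||
	       (t->flags & MODEL_PF_EXITING) != 0;
}
```

The case from the report is a task that was killed from userspace and
has already dequeued its SIGKILL: tif_memdie and fatal_signal_pending
are both false, so only the exiting flag lets the charge be bypassed.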
> Normally exiting tasks (i.e. not killed) will bypass the charge now, but
> this should be OK: the task is leaving and will release its memory, so
> increasing the memory pressure just to release it a moment later seems
> like a dubious waste of cycles. Besides that, charges after exit_signals
> should be rare.
>
> Reported-by: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

Acked-by: David Rientjes <rientjes@xxxxxxxxxx>

I think we should wait for a Tested-by from Eric if this is going to be
backported to stable, though, to meet the criteria.