Re: [RFC 1/4] mm, oom: do not rely on TIF_MEMDIE for memory reserves access

From: Tetsuo Handa
Date: Sat Sep 03 2016 - 21:51:41 EST


Michal Hocko wrote:
> @@ -816,7 +816,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>
> /*
> * If the task is already exiting, don't alarm the sysadmin or kill
> - * its children or threads, just set TIF_MEMDIE so it can die quickly
> + * its children or threads, just give it access to memory reserves
> + * so it can die quickly
> */
> task_lock(p);
> if (task_will_free_mem(p)) {
> @@ -876,9 +877,9 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
> mm = victim->mm;
> atomic_inc(&mm->mm_count);
> /*
> - * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
> - * the OOM victim from depleting the memory reserves from the user
> - * space under its control.
> + * We should send SIGKILL before granting access to memory reserves
> + * in order to prevent the OOM victim from depleting the memory
> + * reserves from the user space under its control.
> */

Removing TIF_MEMDIE usage inside comments can be done as a clean up
before this series.



> @@ -3309,6 +3318,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> return alloc_flags;
> }
>
> +static bool oom_reserves_allowed(struct task_struct *tsk)
> +{
> + if (!tsk_is_oom_victim(tsk))
> + return false;
> +
> + /*
> + * !MMU doesn't have oom reaper so we shouldn't risk the memory reserves
> + * depletion and shouldn't give access to memory reserves passed the
> + * exit_mm
> + */
> + if (!IS_ENABLED(CONFIG_MMU) && !tsk->mm)
> + return false;
> +
> + return true;
> +}
> +

Are you aware that you are trying to make !MMU kernel's allocations not only
after returning exit_mm() but also from __mmput() from mmput() from exit_mm()
fail without allowing access to memory reserves? The comment says only after
returning exit_mm(), but this change is not.



> @@ -3558,8 +3593,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> goto nopage;
> }
>
> - /* Avoid allocations with no watermarks from looping endlessly */
> - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> + /* Avoid allocations for oom victims from looping endlessly */
> + if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL))
> goto nopage;

This change increases possibility of giving up without trying ALLOC_OOM
(more allocation failure messages), for currently only one thread which
remotely got TIF_MEMDIE when it was between gfp_to_alloc_flags() and
test_thread_flag(TIF_MEMDIE) will give up without trying ALLOC_NO_WATERMARKS
while all threads which remotely got current->signal->oom_mm when they were
between gfp_to_alloc_flags() and test_thread_flag(TIF_MEMDIE) will give up
without trying ALLOC_OOM. I think we should make sure that ALLOC_OOM is
tried (by using a variable which remembers whether
get_page_from_freelist(ALLOC_OOM) was tried).

We are currently allowing TIF_MEMDIE threads try ALLOC_NO_WATERMARKS for
once and give up without invoking the OOM killer. This change makes
current->signal->oom_mm threads try ALLOC_OOM for once and give up without
invoking the OOM killer. This means that allocations for cleanly cleaning
up by oom victims might fail prematurely, but we don't want to scatter
around __GFP_NOFAIL. Since there are reasonable chances of the parallel
memory freeing, we don't need to give up without invoking the OOM killer
again. I think that

- /* Avoid allocations with no watermarks from looping endlessly */
- if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+#ifndef CONFIG_MMU
+ /* Avoid allocations for oom victims from looping endlessly */
+ if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL))
+ goto nopage;
+#endif

is possible.