Re: [RFC 1/4] mm, oom: do not rely on TIF_MEMDIE for memory reserves access

From: Michal Hocko
Date: Fri Sep 09 2016 - 10:00:28 EST


On Sun 04-09-16 10:49:42, Tetsuo Handa wrote:
> Michal Hocko wrote:
[...]
> > @@ -3309,6 +3318,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> > return alloc_flags;
> > }
> >
> > +static bool oom_reserves_allowed(struct task_struct *tsk)
> > +{
> > + if (!tsk_is_oom_victim(tsk))
> > + return false;
> > +
> > + /*
> > + * !MMU doesn't have oom reaper so we shouldn't risk the memory reserves
> > + * depletion and shouldn't give access to memory reserves passed the
> > + * exit_mm
> > + */
> > + if (!IS_ENABLED(CONFIG_MMU) && !tsk->mm)
> > + return false;
> > +
> > + return true;
> > +}
> > +
>
> Are you aware that you are trying to make !MMU kernel's allocations not only
> after returning exit_mm() but also from __mmput() from mmput() from exit_mm()
> fail without allowing access to memory reserves?

Do we allocate from that path on !MMU, and would that be any more broken
than the current code, which clears TIF_MEMDIE after mmput even when
__mmput is not called (i.e. somebody is holding a reference to the mm,
e.g. a proc file)?
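
To make the ordering under discussion concrete, here is a minimal
userspace sketch, not kernel code. Every model_* name is hypothetical.
It assumes that exit_mm() clears tsk->mm before it drops its mm
reference, so any teardown triggered by the final mmput() already sees
tsk->mm == NULL. The predicate mirrors oom_reserves_allowed() from the
quoted hunk, with CONFIG_MMU passed in as a plain bool.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct model_mm {
	int mm_users;               /* reference count on the address space */
	bool torn_down;             /* set when the last reference is dropped */
	bool teardown_had_reserves; /* predicate result seen during teardown */
};

struct model_task {
	struct model_mm *mm;
	bool oom_victim;            /* stands in for tsk_is_oom_victim() */
};

/* Same logic as oom_reserves_allowed() in the quoted patch. */
static bool model_oom_reserves_allowed(const struct model_task *tsk,
				       bool config_mmu)
{
	if (!tsk->oom_victim)
		return false;
	if (!config_mmu && !tsk->mm)
		return false;
	return true;
}

/* Assumed ordering: clear tsk->mm first, then drop the reference. */
static void model_exit_mm(struct model_task *tsk, bool config_mmu)
{
	struct model_mm *mm = tsk->mm;

	if (!mm)
		return;
	tsk->mm = NULL;
	if (--mm->mm_users == 0) {
		/* Models __mmput(): runs only for the last reference. */
		mm->torn_down = true;
		mm->teardown_had_reserves =
			model_oom_reserves_allowed(tsk, config_mmu);
	}
}
```

In this model a !MMU victim holding the last reference runs the teardown
without reserves, while an extra reference (e.g. a proc file) means the
teardown does not run from exit_mm at all.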

> The comment says only after returning exit_mm(), but this change is
> not.

I can see that the comment is not ideal. Any suggestions on how to
improve it?

> > @@ -3558,8 +3593,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > goto nopage;
> > }
> >
> > - /* Avoid allocations with no watermarks from looping endlessly */
> > - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> > + /* Avoid allocations for oom victims from looping endlessly */
> > + if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL))
> > goto nopage;
>
> This change increases possibility of giving up without trying ALLOC_OOM
> (more allocation failure messages), for currently only one thread which
> remotely got TIF_MEMDIE when it was between gfp_to_alloc_flags() and
> test_thread_flag(TIF_MEMDIE) will give up without trying ALLOC_NO_WATERMARKS
> while all threads which remotely got current->signal->oom_mm when they were
> between gfp_to_alloc_flags() and test_thread_flag(TIF_MEMDIE) will give up
> without trying ALLOC_OOM. I think we should make sure that ALLOC_OOM is
> tried (by using a variable which remembers whether
> get_page_from_freelist(ALLOC_OOM) was tried).

Technically speaking you are right, but I am not really sure that this
matters all that much. This code has always been racy. If we ever
consider the race harmful we can reorganize the allocator slow path in a
way that guarantees at least one allocation attempt with ALLOC_OOM. I am
just not sure it is necessary right now. If this ever shows up as a
problem we would see a flood of allocation failures followed by the OOM
report, so it would be quite easy to notice.
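
For reference, the "remember whether ALLOC_OOM was tried" idea could be
sketched roughly as follows. This is a userspace model under stated
assumptions, not a proposed patch: the sim_* names, the flag values and
the retry loop are all hypothetical and only imitate the shape of the
slow path. The victim marking lands between the flag computation and
the bail-out check, which is the race window described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; only their distinctness matters here. */
enum { SIM_ALLOC_WMARK_MIN = 0x1, SIM_ALLOC_OOM = 0x2 };

struct sim_task {
	bool oom_victim;   /* stands in for tsk_is_oom_victim(current) */
};

struct sim_zone {
	int normal_free;   /* pages available above the watermark */
	int reserve_free;  /* pages reachable only with SIM_ALLOC_OOM */
	int oom_attempts;  /* number of attempts made with SIM_ALLOC_OOM */
};

/* Models get_page_from_freelist() for a single page. */
static bool sim_get_page(struct sim_zone *z, int alloc_flags)
{
	if (z->normal_free > 0) {
		z->normal_free--;
		return true;
	}
	if (alloc_flags & SIM_ALLOC_OOM) {
		z->oom_attempts++;
		if (z->reserve_free > 0) {
			z->reserve_free--;
			return true;
		}
	}
	return false;
}

/*
 * victim_at_retry: iteration in which the task gets marked as an OOM
 * victim remotely, after its flags for that iteration were computed.
 */
static bool sim_slowpath(struct sim_task *tsk, struct sim_zone *z,
			 int victim_at_retry, int max_retries)
{
	bool tried_oom = false;

	assert(max_retries > 0);
	for (int retry = 0; retry < max_retries; retry++) {
		int alloc_flags = tsk->oom_victim ? SIM_ALLOC_OOM
						  : SIM_ALLOC_WMARK_MIN;

		if (retry == victim_at_retry)
			tsk->oom_victim = true;   /* the race window */

		if (sim_get_page(z, alloc_flags))
			return true;
		if (alloc_flags & SIM_ALLOC_OOM)
			tried_oom = true;

		/* Give up only once the reserves have really been tried. */
		if (tsk->oom_victim && tried_oom)
			return false;
	}
	return false;
}
```

With a plain "victim, so bail out" check, the first iteration in this
model would fail without ever touching the reserve. The tried_oom
variable costs one extra pass through the loop in the racy case.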

> We are currently allowing TIF_MEMDIE threads try ALLOC_NO_WATERMARKS for
> once and give up without invoking the OOM killer. This change makes
> current->signal->oom_mm threads try ALLOC_OOM for once and give up without
> invoking the OOM killer. This means that allocations for cleanly cleaning
> up by oom victims might fail prematurely, but we don't want to scatter
> around __GFP_NOFAIL. Since there are reasonable chances of the parallel
> memory freeing, we don't need to give up without invoking the OOM killer
> again. I think that
>
> - /* Avoid allocations with no watermarks from looping endlessly */
> - if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> +#ifndef CONFIG_MMU
> + /* Avoid allocations for oom victims from looping endlessly */
> + if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL))
> + goto nopage;
> +#endif
>
> is possible.

I would prefer to not spread out MMU ifdefs all over the place.

--
Michal Hocko
SUSE Labs