Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves

From: David Rientjes
Date: Tue Dec 17 2013 - 15:50:23 EST


On Tue, 17 Dec 2013, Michal Hocko wrote:

> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index c72b03bf9679..fee25c5934d2 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -2692,7 +2693,8 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
> > > * MEMDIE process.
> > > */
> > > if (unlikely(test_thread_flag(TIF_MEMDIE)
> > > - || fatal_signal_pending(current)))
> > > + || fatal_signal_pending(current))
> > > + || current->flags & PF_EXITING)
> > > goto bypass;
> > >
> > > if (unlikely(task_in_memcg_oom(current)))
> > >
> > > rather than the later checks down the oom_synchronize paths. The comment
> > > already mentions dying process...
> > >
> >
> > This is scary because it doesn't even try to reclaim memcg memory before
> > allowing the allocation to succeed.
>
> Why should it reclaim in the first place when it simply is on the way to
> release memory. In other words why should it increase the memory
> pressure when it is in fact releasing it?
>

(Answering about removing the fatal_signal_pending() check as well here.)

For memory isolation, we'd only want to bypass memcg charges when
absolutely necessary, and it seems like TIF_MEMDIE is the only case where
that's required. For the same reason, we don't give processes with pending
SIGKILLs or those in the exit() path access to memory reserves in the page
allocator without first determining that reclaim can't make any progress,
and even then we only do so by setting TIF_MEMDIE when calling the oom
killer.
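
To make the difference between the two positions concrete, here is a rough
user-space sketch. It is not kernel code: struct task_model, the enum, and
both function names are hypothetical and exist only for illustration, with
the three booleans standing in for the checks named in the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a task; NOT the kernel's task_struct. */
struct task_model {
	bool tif_memdie;           /* stands in for test_thread_flag(TIF_MEMDIE) */
	bool fatal_signal_pending; /* stands in for fatal_signal_pending(current) */
	bool pf_exiting;           /* stands in for current->flags & PF_EXITING */
};

enum charge_action {
	CHARGE_BYPASS,  /* skip the memcg charge, breaking isolation */
	CHARGE_RECLAIM, /* stay within the memcg and attempt reclaim first */
};

/* Policy argued for in this mail: only TIF_MEMDIE justifies a bypass. */
static enum charge_action charge_policy_strict(const struct task_model *t)
{
	return t->tif_memdie ? CHARGE_BYPASS : CHARGE_RECLAIM;
}

/* Policy of the quoted diff: dying or exiting tasks bypass as well. */
static enum charge_action charge_policy_quoted_diff(const struct task_model *t)
{
	if (t->tif_memdie || t->fatal_signal_pending || t->pf_exiting)
		return CHARGE_BYPASS;
	return CHARGE_RECLAIM;
}
```

Under the first policy an exiting task that has not been selected by the
oom killer still goes through reclaim; under the second it skips the charge
immediately.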

> I am really puzzled here. On one hand you are strongly arguing for not
> notifying when we know we can prevent an OOM action and on the other
> hand you are ok to get vmpressure/thresholds notification when an
> exiting task triggers reclaim.
>
> So I am really lost in what you are trying to achieve here. It sounds a
> bit arbitrary.
>

It's not arbitrary to define when memcg bypass is allowed and, in my
opinion, it should only be done in situations where it is unavoidable and
therefore breaking memory isolation is required.

(We wouldn't expect a 128MB memcg to be oom [and perhaps with a userspace
oom handler attached] when it has 100 children each 1MB in size just
because they all happen to be oom at the same time. We set up the excess
memory in the parent specifically for the memcg with the oom handler
attached.)
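
The arithmetic behind that example, as a small sketch. It assumes the
parenthetical means a parent limit of 128MB with 100 children limited to
1MB each; the helper name is hypothetical and this is plain user-space C,
not anything in mm/memcontrol.c.

```c
#include <stddef.h>

/*
 * Headroom left in a parent memcg once every child has reached its own
 * limit. All values are in MB. Hypothetical helper for illustration only.
 */
static long parent_headroom_mb(long parent_limit_mb, size_t nr_children,
			       long child_limit_mb)
{
	return parent_limit_mb - (long)nr_children * child_limit_mb;
}
```

With the numbers above, parent_headroom_mb(128, 100, 1) evaluates to 28,
which is the excess set aside for the memcg with the oom handler attached.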