Re: [v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection

From: Roman Gushchin
Date: Thu Jun 22 2017 - 13:00:12 EST


On Thu, Jun 22, 2017 at 09:40:28AM +0900, Tetsuo Handa wrote:
> Roman Gushchin wrote:
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -992,6 +992,13 @@ bool out_of_memory(struct oom_control *oc)
> > if (oom_killer_disabled)
> > return false;
> >
> > + /*
> > + * If there are oom victims in flight, we don't need to select
> > + * a new victim.
> > + */
> > + if (atomic_read(&oom_victims) > 0)
> > + return true;
> > +
> > if (!is_memcg_oom(oc)) {
> > blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> > if (freed > 0)
>
> Above in this patch and below in patch 5 are wrong.
>
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -665,7 +672,13 @@ static void mark_oom_victim(struct task_struct *tsk)
> > * that TIF_MEMDIE tasks should be ignored.
> > */
> > __thaw_task(tsk);
> > - atomic_inc(&oom_victims);
> > +
> > + /*
> > + * If there are no oom victims in flight,
> > + * give the task an access to the memory reserves.
> > + */
> > + if (atomic_inc_return(&oom_victims) == 1)
> > + set_tsk_thread_flag(tsk, TIF_MEMDIE);
> > }
> >
> > /**
>
> The OOM reaper is not available for CONFIG_MMU=n kernels, and timeout based
> giveup is not permitted, but a multithreaded process might be selected as
> an OOM victim. Not setting TIF_MEMDIE on all threads sharing an OOM victim's
> mm increases the possibility of preventing some OOM victim thread from
> terminating (e.g. one of them cannot leave __alloc_pages_slowpath() while
> holding mmap_sem for write, because it is waiting for the TIF_MEMDIE thread
> to call exit_oom_victim() while the TIF_MEMDIE thread is in turn waiting for
> the thread holding mmap_sem for write).
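For illustration, the behaviour Tetsuo objects to can be modelled in userspace
(this is a hypothetical sketch, not kernel code: C11 <stdatomic.h> stands in
for the kernel's atomic_t, and a plain bool stands in for TIF_MEMDIE):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a thread of a multithreaded OOM victim. */
struct task {
	bool memdie;	/* models TIF_MEMDIE */
};

static atomic_int oom_victims;

/* Patch 5's logic: only the first victim in flight gets the flag. */
static void mark_oom_victim(struct task *tsk)
{
	if (atomic_fetch_add(&oom_victims, 1) == 0)
		tsk->memdie = true;
}
```

Running this over three threads sharing one mm, only the first caller ends up
with the flag set; the siblings never get access to memory reserves, which is
the CONFIG_MMU=n hang Tetsuo describes.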

I agree that CONFIG_MMU=n is a special case and the proposed approach can't
be used there directly. But can you please explain why you find the first chunk
wrong? Basically, it's exactly what we do now: we increment oom_victims for
every oom victim, and every victim decrements the counter on its exit path.
If there is at least one existing victim, we will select it again, so this is
just an optimization. Am I missing something? Why should we start a new victim
selection if there are processes that will likely quit and release memory soon?
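The optimization in the first chunk can be sketched in userspace as follows
(again a hypothetical model, not kernel code: victim selection and killing are
elided, and <stdatomic.h> replaces the kernel's atomic_t):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int oom_victims;

/* Patch 1's logic: skip victim selection while a victim is in flight. */
static bool out_of_memory(void)
{
	if (atomic_load(&oom_victims) > 0)
		return true;

	/* ... select and kill a new victim (elided) ... */
	atomic_fetch_add(&oom_victims, 1);
	return true;
}

/* Called from the victim's exit path. */
static void exit_oom_victim(void)
{
	atomic_fetch_sub(&oom_victims, 1);
}
```

With this model, a second out_of_memory() call while a victim is still exiting
returns early without raising the counter; only after exit_oom_victim() drops
it to zero does the next call pick a fresh victim.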

Thanks!