Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems

From: Tetsuo Handa
Date: Mon Mar 16 2020 - 23:19:11 EST


David Rientjes wrote:
> On Sat, 14 Mar 2020, Tetsuo Handa wrote:
> > If current thread is an OOM victim, schedule_timeout_killable(1) will give other
> > threads (including the OOM reaper kernel thread) CPU time to run, by leaving
> > try_charge() path due to should_force_charge() == true and reaching do_exit() path
> > instead of returning to userspace code doing "for (;;);".
> >
> > Unless the problem is that current thread cannot reach should_force_charge() check,
> > schedule_timeout_killable(1) should work.
> >
>
> No need to yield if current is the oom victim, allowing the oom reaper to
> run when it may not actually be able to free memory is not required. It
> increases the likelihood that some other process schedules and is unable
> to yield back due to the memcg oom condition such that the victim doesn't
> get a chance to run again.
>
> This happens because the victim is allowed to overcharge but other
> processes attached to an oom memcg hierarchy simply fail the charge. We
> are then reliant on all memory chargers in the kernel to yield if their
> charges fail due to oom. It's the only way to allow the victim to
> eventually run.
>
> So the only change that I would make to your patch is to do this in
> mem_cgroup_out_of_memory() instead:
>
> if (!fatal_signal_pending(current))
> schedule_timeout_killable(1);
>
> So we don't have this reliance on all other memory chargers to yield when
> their charge fails and there is no delay for victims themselves.

I see. You want below functions for environments where current thread can
fail to resume execution for long if current thread once reschedules (e.g.
UP kernel, many threads contended on one CPU).

/*
* Give other threads CPU time, unless current thread was already killed.
* Used when we prefer killed threads to continue execution (in a hope that
* killed threads terminate quickly) over giving other threads CPU time.
*/
signed long __sched schedule_timeout_killable_expedited(signed long timeout)
{
if (unlikely(fatal_signal_pending(current)))
return timeout;
return schedule_timeout_killable(timeout);
}

/*
* Latency reduction via explicit rescheduling in places that are safe,
* but becomes no-op if current thread was already killed. Used when we
* prefer killed threads to continue execution (in a hope that killed
* threads terminate quickly) over giving other threads CPU time.
*/
int cond_resched_expedited(void)
{
if (unlikely(fatal_signal_pending(current)))
return 0;
return cond_resched();
}

>
> [ I'll still propose my change that adds cond_resched() to
> shrink_node_memcgs() because we can see need_resched set for a
> prolonged period of time without scheduling. ]

As long as there is schedule_timeout_killable(), I'm fine with adding
cond_resched() in other places.

>
> If you agree, I'll propose your patch with a changelog that indicates it
> can fix the soft lockup issue for UP and can likely get a tested-by for
> it.
>

Please go ahead.