Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems

From: Tetsuo Handa
Date: Wed Mar 11 2020 - 18:04:45 EST


On 2020/03/12 4:38, David Rientjes wrote:
> On Wed, 11 Mar 2020, Tetsuo Handa wrote:
>
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -2637,6 +2637,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>>>>> unsigned long reclaimed;
>>>>> unsigned long scanned;
>>>>>
>>>>> + cond_resched();
>>>>> +
>>>>
>>>> Is this safe for CONFIG_PREEMPTION case? If current thread has realtime priority,
>>>> can we guarantee that the OOM victim (well, the OOM reaper kernel thread rather
>>>> than the OOM victim ?) gets scheduled?
>>>>
>>>
>>> I think it's the best we can do that immediately solves the issue unless
>>> you have another idea in mind?
>>
>> "schedule_timeout_killable(1) outside of oom_lock" or "the OOM reaper grabs oom_lock
>> so that allocating threads guarantee that the OOM reaper gets scheduled" or "direct OOM
>> reaping so that allocating threads guarantee that some memory is reclaimed".
>>
>
> The cond_resched() here is needed if the iteration is lengthy depending on
> the number of descendant memcgs already.

No. cond_resched() here will become no-op if CONFIG_PREEMPTION=y and current
thread has realtime priority.

>
> schedule_timeout_killable(1) does not make any guarantees that current
> will be scheduled after the victim or oom_reaper on UP systems.

The point of schedule_timeout_*(1) is to guarantee that current thread
will yield CPU to other threads even if CONFIG_PREEMPTION=y and current
thread has realtime priority case. There is no guarantee that current
thread will be rescheduled immediately after a sleep is irrelevant.

>
> If you have an alternate patch to try, we can test it. But since this
> cond_resched() is needed anyway, I'm not sure it will change the result.

schedule_timeout_killable(1) is an alternate patch to try; I don't think
that this cond_resched() is needed anyway.

>
>>>
>>>>> switch (mem_cgroup_protected(target_memcg, memcg)) {
>>>>> case MEMCG_PROT_MIN:
>>>>> /*
>>>>>
>>>>
>>
>>