Re: [PATCH 2/2] memcg: do not report racy no-eligible OOM tasks

From: Tetsuo Handa
Date: Tue Jan 08 2019 - 05:40:13 EST


On 2019/01/08 17:14, Michal Hocko wrote:
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index af7f18b32389..90eb2e2093e7 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -1387,10 +1387,22 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>>> .gfp_mask = gfp_mask,
>>> .order = order,
>>> };
>>> - bool ret;
>>> + bool ret = true;
>>>
>>> mutex_lock(&oom_lock);
>>
>> And because of "[PATCH 1/2] mm, oom: marks all killed tasks as oom
>> victims", mark_oom_victim() will be called on current thread even if
>> we used mutex_lock_killable(&oom_lock) here, like you said
>>
>> mutex_lock_killable would take care of exiting task already. I would
>> then still prefer to check for mark_oom_victim because that is not racy
>> with the exit path clearing signals. I can update my patch to use
>> _killable lock variant if we are really going with the memcg specific
>> fix.
>>
>> . If current thread is not yet killed by the OOM killer but can terminate
>> without invoking the OOM killer, using mutex_lock_killable(&oom_lock) here
>> saves some processes. What is the race you are referring by "racy with the
>> exit path clearing signals" ?
>
> This is unrelated to the patch.

Ultimately related! This is the reasoning why your patch should be preferred
over my patch.

For example, if memcg OOM events in different domains are pending, already
OOM-killed threads needlessly wait for pending memcg OOM events in different
domains. An out_of_memory() call is slow because it involves printk(). With
slow serial consoles, out_of_memory() might take more than a second. I consider
that allowing killed processes to call mmput() from exit_mm() from do_exit()
quickly (instead of waiting for pending memcg OOM events in different domains
at mem_cgroup_out_of_memory()) helps calling __mmput() (which can reclaim more
memory than the OOM reaper can reclaim) quickly. Unless what you call "racy" is
problematic, I don't see reasons not to apply my patch. So, please please answer
what you are referring to with "racy".