Re: [PATCH] mm, oom_adj: avoid meaningless loop to find processes sharing mm

From: Tetsuo Handa
Date: Tue Oct 09 2018 - 09:51:07 EST


On 2018/10/09 22:26, Michal Hocko wrote:
> On Tue 09-10-18 22:14:24, Tetsuo Handa wrote:
>> On 2018/10/09 21:58, Michal Hocko wrote:
>>> On Tue 09-10-18 21:52:12, Tetsuo Handa wrote:
>>>> On 2018/10/09 20:10, Michal Hocko wrote:
>>>>> On Tue 09-10-18 19:00:44, Tetsuo Handa wrote:
>>>>>>> 2) add OOM_SCORE_ADJ_MIN and do not kill tasks sharing mm and do not
>>>>>>> reap the mm in the rare case of the race.
>>>>>>
>>>>>> That is no problem. The mistake we made in 4.6 was that we updated oom_score_adj
>>>>>> to -1000 (and allowed unprivileged users to OOM-lockup the system).
>>>>>
>>>>> I do not follow.
>>>>>
>>>>
>>>> http://tomoyo.osdn.jp/cgi-bin/lxr/source/mm/oom_kill.c?v=linux-4.6.7#L493
>>>
>>> Ahh, so you are not referring to the current upstream code. Do you see
>>> any specific problem with the current one (well, except for the possible
>>> race which I have tried to evaluate).
>>>
>>
>> Yes. "task_will_free_mem(current) in out_of_memory() returns false due to MMF_OOM_SKIP
>> being already set" is a problem for clone(CLONE_VM without CLONE_THREAD/CLONE_SIGHAND)
>> with the current code.
>
> a) I fail to see how that is related to your previous post and b) could
> you be more specific. Is there any other scenario from the two described
> in my earlier email?
>

I do not follow. Just reverting commit 44a70adec910d692 and commit 97fd49c2355ffded
is sufficient for closing the copy_process() versus __set_oom_adj() race.

We went too far towards complete "struct mm_struct" based OOM handling. But stepping
back to "struct signal_struct" based OOM handling solves Yong-Taek's for_each_process()
latency problem and your copy_process() versus __set_oom_adj() race problem and my
task_will_free_mem(current) race problem.