Re: [PATCH RESEND] mm/oom_kill: don't kill exiting tasks in oom_kill_memcg_member

From: Haifeng Xu
Date: Tue Mar 14 2023 - 09:32:04 EST




On 2023/3/14 20:00, Michal Hocko wrote:
> On Tue 14-03-23 19:07:27, Haifeng Xu wrote:
>>
>>
>> On 2023/3/14 18:16, Michal Hocko wrote:
>>> On Tue 14-03-23 18:07:42, Haifeng Xu wrote:
>>>>
>>>>
>>>> On 2023/3/14 17:19, Michal Hocko wrote:
>>>>> On Tue 14-03-23 09:11:36, Haifeng Xu wrote:
>>>>>> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member()
>>>>>> to kill all processes in the memcg. When scanning tasks in memcg, maybe
>>>>>> the provided task is marked as oom victim. Also, some tasks are likely
>>>>>> to release their address space. There is no need to kill the exiting tasks.
>>>>>
>>>>> This doesn't state any actual problem. Could you be more specific? Is
>>>>> this a bug fix, a behavior change or an optimization?
>>>>
>>>>
>>>> 1) oom_kill_process() has inovked __oom_kill_process() to kill the selected victim, but it will be scanned
>>>> in mem_cgroup_scan_tasks(). It's pointless to kill the victim twice.
>>>
>>> Why does that matter though? The purpose of task_will_free_mem in
>>> oom_kill_process is different. It would bail out from a potentially
>>> noisy OOM report when the selected oom victim is expected to terminate
>>> soon. __oom_kill_process called for the whole memcg doesn't aim at
>>> avoiding any oom victims. It merely sends a kill signal too all of them.
>>>
>>
>> except sending kill signals, __oom_kill_process() will do some other work, such as print messeages, traversal all
>> all user processes sharing mm which holds RCU section and so on. So if skip the victim, we don't need those work again
>> and it won't affect the original mechanism. All oom victims are still get killed.
>
> mm sharing among processes is a very rare thing but do not forget that
> task_will_free_mem needs to do the same thing for the same reason.

For the victim, __oom_kill_process() traversals all processes in the system whether there some other tasks sharing mm or not.
If skip it, this work can be dropped.

>
>>>> 2) for those exiting processes, reaping them directly is also a faster way to free memory compare with invoking
>>>> __oom_kill_process().
>>>
>>> Is it? What if the terminating task is blocked on lock? Async oom
>>> reaping might release those resources in that case.
>>
>> Yes, the reaping process is asynchronous. I mean we don't need the work mentioned above any more.
>> "reaping them directly" here is that joining the task in oom reaper queue.
>
> I do not follow.
>
> In any case I still do not see any actual justification for the change
> other than "we can do it and it might turn out less expensive". This
> alone is not sufficient, just be explicit, because oom is hardly a fast
> path to optimize every single cpu cycle for. So unless you see an actual
> real life problem that would be behaving much better or even fixed then
> I am not convinced this is a worthwhile change to have.
>

we can also see two same messages("Memory cgroup out of memory: Killed process ***")about the victim.
This seems a little confusing. If skip the victim, only one message was printed.