Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing the robust_list_head

From: Nico Pache
Date: Mon Apr 11 2022 - 20:02:13 EST

On 4/11/22 05:08, Michal Hocko wrote:
> On Mon 11-04-22 09:47:14, Thomas Gleixner wrote:
>> Michal,
>>
>> On Mon, Apr 11 2022 at 08:48, Michal Hocko wrote:
>>> On Fri 08-04-22 23:41:11, Thomas Gleixner wrote:
>>>> So why would a process private robust mutex be any different from a
>>>> process shared one?
>>>
>>> Purely from the OOM POV they are slightly different because the OOM
>>> killer always kills all threads which share the mm with the selected
>>> victim (with the exception of the global init - see __oom_kill_process).
>>> Note that this includes those threads which do not share signal
>>> handling.
>>> So clobbering private locks shouldn't be observable to an alive thread
>>> unless I am missing something.
>>
>> Yes, it kills everything, but the reaper also reaps non-shared VMAs. So
>> if the process private futex sits in a reaped VMA, the shared one becomes
>> unreachable.
>>
>>> On the other hand I do agree that delayed oom_reaper execution is a
>>> reasonable workaround and the most simplistic one.
>>
>> I think it's more than a workaround. It's a reasonable expectation that
>> the kernel side of the user space threads can mop up the mess the user
>> space part created. So even if one of N threads is stuck in a place
>> where it can't, then N-1 can still reach do_exit() and mop their mess
>> up.
>>
>> The oom reaper is the last resort to resolve the situation in case of a
>> stuck task. No?
>
> Yes, I keep saying workaround because it really doesn't address the
> underlying issue which is that the oom_reaper clobbers something it
> shouldn't be. A full fix from my POV would be making oom_reaper code
> more aware of the futex usage. But this is not really viable.

This is *kinda* what this approach is doing, but as Thomas has pointed out, it
has its shortcomings. Additionally, it has just come to my attention that this
solution does not cover the compat robust list, so there is yet another
shortcoming.
>
> Btw. this is what I have in my local tree. It hasn't seen any testing but
> it might be a good start to make it a full patch. I have decided to use
> a timer rather than juggling tasks on the oom_reaper list because the
> initial implementation looked uglier. I will try to find some time to
> finish that but if Nico or others beat me to it I won't complain.
> Also I absolutely do not insist on the timer approach.
> [...]

I will spend tomorrow working on the delay solution and testing it. Thanks for
starting it :)

I appreciate the comments and help from everyone who has participated! I'm
sorry if there were any misunderstandings; it's not our intention to upset
anyone, but rather to learn and work toward a solution for the problem we are
facing.

Best,
-- Nico