Re: [PATCH v3] mm/oom: do not oom reap task with an unresolved robust futex

From: Nico Pache
Date: Tue Mar 08 2022 - 20:06:10 EST




On 3/3/22 00:48, Michal Hocko wrote:
> On Wed 02-03-22 12:26:45, Nico Pache wrote:
>>
>>
>> On 3/2/22 09:24, Michal Hocko wrote:
>>> Sorry, this has slipped through cracks.
>>>
>>> On Mon 14-02-22 15:39:31, Nico Pache wrote:
>>> [...]
>>>> We've recently been discussing the following if statement in __oom_reap_task_mm:
>>>> if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
>>>>
>>>> Given the comment above it, and some of the upstream discussion the original
>>>> RFC, we are struggling to see why this should be a `||` and not an `&&`. If we
>>>> only want to reap anon memory and reaping shared memory can be dangerous is this
>>>> statement incorrect?
>>>>
>>>> We have a patch queued up to make this change, but wanted to get your opinion on
>>>> why this was originally designed this way in case we are missing something.
>>>
>>> I do not really see why this would be wrong. Private file backed
>>> mappings can contain a reapable memory as well. I do not see how this
>>> would solve the futex issue
>> We were basing our discussion around the following comment:
>> /*
>> * Only anonymous pages have a good chance to be dropped
>> * without additional steps which we cannot afford as we
>> * are OOM already.
>> *
>> * We do not even care about fs backed pages because all
>> * which are reclaimable have already been reclaimed and
>> * we do not want to block exit_mmap by keeping mm ref
>> * count elevated without a good reason.
>> */
>>
>> So changing to an && would align the functionality with this comment by ignoring
>> fs backed pages, and additionally it prevents shared mappings from being reaped.
>> We have tested this change and found we can no longer reproduce the issue. In
>> our case we allocate the mutex on a MAP_SHARED|MAP_ANONYMOUS mmap so the if-
>> statement in question would no longer return true after the && change.
>>
>> If it is the case that private fs backed pages matter perhaps we want something
>> like this:
>> if ((vma_is_anonymous(vma) && !(vma->vm_flags & VM_SHARED))
>> ||(!vma_is_anonymous(vma) && !(vma->vm_flags & VM_SHARED)))
>>
>> or more simply:
>> if(!(vma->vm_flags & VM_SHARED))
>>
>> to exclude all VM_SHARED mappings.
>
> I would have to think about that some more but I do not really see how
> this is related to the futex issue. In other words what kind of problem
> does this solve?
>

We had a misunderstanding of what vma_is_anonymous actually checks for... It
returns true if the VMA is PRIVATE|ANONYMOUS. We may follow up with a patch to
change the name of this function or at least add a comment at the top of the
function to be more descriptive. Furthermore, we ended up being able to
reproduce this issue on the && kernel.

We have also found the actual cause of the issue, and we'll post that fix. Its
related to the glibc allocation done for pthreads as we discussed earlier in
this thread. The mapping that stores the futex robust list is in userspace and a
race occurs between the oom_reap_task_mm and the exit path that handles the
futex cleanup.

Cheers,
-- Nico