Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing the robust_list_head

From: Peter Zijlstra
Date: Fri Apr 08 2022 - 04:16:30 EST


On Thu, Apr 07, 2022 at 11:28:09PM -0400, Nico Pache wrote:
> The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can
> be targeted by the oom reaper. This mapping is used to store the futex
> robust list head; the kernel does not keep a copy of the robust list and
> instead references a userspace address to maintain the robustness during
> a process death. A race can occur between exit_mm and the oom reaper that
> allows the oom reaper to free the memory of the futex robust list before
> the exit path has handled the futex death:
>
>     CPU1                               CPU2
> ------------------------------------------------------------------------
>     page_fault
>     do_exit "signal"
>     wake_oom_reaper
>                                        oom_reaper
>                                        oom_reap_task_mm (invalidates mm)
>     exit_mm
>     exit_mm_release
>     futex_exit_release
>     futex_cleanup
>     exit_robust_list
>     get_user (EFAULT- can't access memory)
>
> If the get_user EFAULT's, the kernel will be unable to recover the
> waiters on the robust_list, leaving userspace mutexes hung indefinitely.
>
> Use the robust_list address stored in the kernel to skip the VMA that holds
> it, allowing a successful futex_cleanup.
>
> Theoretically a failure can still occur if there are locks mapped as
> PRIVATE|ANON; however, the robust futexes are a best-effort approach.
> This patch only strengthens that best-effort.
>
> The following case can still fail:
> robust head (skipped) -> private lock (reaped) -> shared lock (skipped)

This is still all sorts of confused.. it's a list head, the entries can
be in any random other VMA. You must not remove *any* user memory before
doing the robust thing. Not removing the VMA that contains the head is
pointless in the extreme.

Did you not read the previous discussion?
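
To make the objection concrete, here is a minimal userspace toy model of
the failing case from the changelog (head skipped, private lock reaped,
shared lock skipped). It is a sketch, not kernel code: toy_vma,
toy_entry, toy_walk and toy_scenario are hypothetical names, the
"reaped" flag stands in for a get_user() fault on memory the reaper has
already freed, and the walk is simplified compared to the real
exit_robust_list(). The only thing carried over from the real ABI is
that the list is circular and ends when the walk returns to the head.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a mapping: all we track is whether it was reaped. */
struct toy_vma {
    bool reaped;
};

/* Toy list node; "vma" is bookkeeping that exists only in this model. */
struct toy_entry {
    struct toy_entry *next;
    const struct toy_vma *vma;
};

struct toy_walk_result {
    int handled;   /* lock entries processed before the walk stopped */
    bool faulted;  /* true if a read hit a reaped mapping */
};

/* Walk the circular list; stop at the first entry whose memory is gone. */
struct toy_walk_result toy_walk(struct toy_entry *head)
{
    struct toy_walk_result r = { 0, false };
    struct toy_entry *e;

    if (head->vma->reaped) {            /* cannot even read the head */
        r.faulted = true;
        return r;
    }
    for (e = head->next; e != head; e = e->next) {
        if (e->vma->reaped) {           /* simulated -EFAULT */
            r.faulted = true;
            return r;
        }
        r.handled++;
    }
    return r;
}

/* head (never reaped) -> private lock (maybe reaped) -> shared lock -> head */
struct toy_walk_result toy_scenario(bool reap_private)
{
    struct toy_vma head_vma = { false };
    struct toy_vma private_vma = { reap_private };
    struct toy_vma shared_vma = { false };
    struct toy_entry head, private_lock, shared_lock;

    head.vma = &head_vma;
    head.next = &private_lock;
    private_lock.vma = &private_vma;
    private_lock.next = &shared_lock;
    shared_lock.vma = &shared_vma;
    shared_lock.next = &head;

    return toy_walk(&head);
}
```

In this model toy_scenario(true) stops at the private lock with nothing
handled, so the shared lock behind it is never reached even though the
head's mapping survived. toy_scenario(false) handles both entries.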