Re: [PATCH v1 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
From: Michal Hocko
Date: Wed Apr 29 2026 - 04:26:00 EST
On Tue 28-04-26 15:37:57, Minchan Kim wrote:
[...]
> From be4bd22a100ed6be2d1d2599ddb9da04043143eb Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@xxxxxxxxxx>
> Date: Fri, 24 Apr 2026 14:27:08 -0700
> Subject: [PATCH] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
>
> Currently, process_mrelease() requires userspace to send SIGKILL to the
> target before calling it. This two-step sequence opens a race window in
> which the victim may be scheduled, handle the signal and enter the exit
> path before the reaper gets to invoke process_mrelease().
>
> When the victim enters the exit path (do_exit -> exit_mm), it clears its
> task->mm immediately. This causes process_mrelease() to fail with -ESRCH
> and defers the actual address space teardown (exit_mmap) until
> mm->mm_users drops to zero. In the field (e.g., on Android), references
> held by other tasks, such as those taken while reading /proc/<pid>/cmdline
> or during other remote VM accesses, frequently delay this teardown for an
> unpredictable amount of time, defeating the purpose of expedited
> reclamation.
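For reference, a minimal userspace sketch of the two-step sequence described above, assuming a 5.15+ kernel and raw syscall(2) because glibc may not provide wrappers for all three calls. The fallback syscall numbers come from the generic/x86_64 table and are an assumption for other architectures; kill_then_mrelease() and demo_two_step() are hypothetical helpers. The gap between the SIGKILL and the process_mrelease() call is the window in which the victim can reach exit_mm().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallback numbers from the generic/x86_64 syscall table (assumption:
 * not valid on every architecture, e.g. alpha). */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif
#ifndef __NR_process_mrelease
#define __NR_process_mrelease 448
#endif

/* Today's two-step flow: send SIGKILL, then ask for the mm to be reaped.
 * Returns 0 on success or -errno from the first step that failed. */
static int kill_then_mrelease(pid_t pid)
{
	int pidfd = (int)syscall(__NR_pidfd_open, pid, 0);
	int ret = 0;

	if (pidfd < 0)
		return -errno;
	/* Step 1: after this, the victim may run exit_mm() at any moment. */
	if (syscall(__NR_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0)
		ret = -errno;
	/* Step 2: fails with ESRCH if every thread has already dropped ->mm. */
	else if (syscall(__NR_process_mrelease, pidfd, 0) < 0)
		ret = -errno;
	close(pidfd);
	return ret;
}

/* Fork a throwaway child, run the two-step flow on it, then reap it. */
static int demo_two_step(void)
{
	pid_t child = fork();
	int ret;

	if (child < 0)
		return -errno;
	if (child == 0) {
		for (;;)
			pause();
	}
	ret = kill_then_mrelease(child);
	kill(child, SIGKILL);	/* harmless if step 1 already killed it */
	waitpid(child, NULL, 0);
	return ret;
}
```

Depending on who wins the race, demo_two_step() can return 0 or -ESRCH; sandboxes that filter these syscalls may report -ENOSYS or -EPERM instead.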
>
> In Android's LMKD scenarios, this delay keeps memory pressure high and
> forces LMKD to kill additional, innocent background apps unnecessarily
> before the memory of the first victim has been recovered.
>
> This patch introduces the PROCESS_MRELEASE_REAP_KILL UAPI flag to support
> an integrated auto-kill mode. When specified, process_mrelease() directly
> injects a SIGKILL into the target task after finding its mm.
>
> To close the race, we take a reference on the mm with mmgrab() before
> sending SIGKILL, so the mm_struct stays valid even after the victim
> clears task->mm. If the caller passed PROCESS_MRELEASE_REAP_KILL, we
> assume the target will free its memory (we have just killed it) and
> proceed with reaping, which reduces the logic to
> reap = reap_kill || task_will_free_mem(p).
>
> To handle shared address spaces safely in auto-kill mode, we bail out
> immediately if PROCESS_MRELEASE_REAP_KILL is specified and the mm is
> marked MMF_MULTIPROCESS. This leaves behavior unchanged for existing
> process_mrelease() users while preventing unsafe reaping of an mm that
> is still in use by processes that were not killed.
Please explain why this differs from the global OOM killer's behavior and
how you intend to deal with process groups that share an mm. I am not
saying this is the wrong behavior, but it will be hard to change once it
is in place.
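To make the comparison concrete, a simplified model of both sides, assuming recent mainline behavior rather than anything in the patch: clone_sets_multiprocess() mirrors the clone_flags test in kernel/fork.c that marks an mm MMF_MULTIPROCESS, and global_oom_kill_sharers() models the loop in __oom_kill_process() (mm/oom_kill.c) that handles other thread groups sharing the victim's mm. The struct and function names are hypothetical.

```c
#include <linux/sched.h>	/* CLONE_* flags from the kernel UAPI headers */
#include <stdbool.h>
#include <stddef.h>

/* MMF_MULTIPROCESS is set when the child shares the mm (CLONE_VM) but is
 * neither a thread (CLONE_THREAD) nor a vfork child (CLONE_VFORK). */
static bool clone_sets_multiprocess(unsigned long clone_flags)
{
	return (clone_flags & (CLONE_VM | CLONE_THREAD | CLONE_VFORK)) == CLONE_VM;
}

/* Hypothetical model of one other thread group sharing the victim's mm. */
struct mm_sharer {
	bool is_global_init;
	bool is_kthread;	/* kthread_use_mm() user */
	bool killed;		/* output: SIGKILL sent by the model */
};

/* Every user process sharing the mm is killed, kernel threads are skipped,
 * and reaping is abandoned only if global init shares the mm.
 * Returns the model's can_oom_reap. */
static bool global_oom_kill_sharers(struct mm_sharer *s, size_t n)
{
	bool can_oom_reap = true;

	for (size_t i = 0; i < n; i++) {
		s[i].killed = false;
		if (s[i].is_global_init) {
			can_oom_reap = false;	/* kernel also sets MMF_OOM_SKIP */
			continue;
		}
		if (s[i].is_kthread)
			continue;
		s[i].killed = true;
	}
	return can_oom_reap;
}
```

In this model the global OOM killer treats a shared mm as a group to be killed and reaped together, whereas the proposed flag refuses to act on it at all; that is the difference the question above is about.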
> Fundamentally, this allows process_mrelease() to trigger targeted memory
> reclaim (via the oom_reaper infrastructure) promptly, even if the victim
> has not yet entered the exit path, while reusing the existing race
> handling between the reaper and exit_mmap.
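In mainline process_mrelease(), the existing race handling referred to here is a recheck of MMF_OOM_SKIP under mmap_read_lock: in recent kernels exit_mmap() sets that bit before freeing the page tables, so seeing it means the work is already done. A minimal model of that reap step follows; the struct and function names are hypothetical, and the inputs stand in for the results of mmap_read_lock_killable(), the MMF_OOM_SKIP test and __oom_reap_task_mm().

```c
#include <errno.h>
#include <stdbool.h>

/* Model inputs for the reap step (hypothetical names). */
struct reap_inputs {
	bool lock_interrupted;	/* mmap_read_lock_killable() failed */
	bool oom_skip;		/* MMF_OOM_SKIP observed under mmap_read_lock */
	bool reap_ok;		/* __oom_reap_task_mm() succeeded */
};

/* Return value of the reap step, modeled on mainline process_mrelease(). */
static long mrelease_reap_step(const struct reap_inputs *in)
{
	if (in->lock_interrupted)
		return -EINTR;
	if (in->oom_skip)
		return 0;	/* exit_mmap() got there first; nothing to reap */
	if (!in->reap_ok)
		return -EAGAIN;	/* reaping could not complete without blocking */
	return 0;
}
```

Because the MMF_OOM_SKIP test short-circuits the reap, the flag only needs to feed into this existing check rather than add new synchronization with exit_mmap().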
>
> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
Other than the above, this looks OK to me.
--
Michal Hocko
SUSE Labs