Re: [PATCH v1 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag

From: Minchan Kim

Date: Tue Apr 28 2026 - 18:38:01 EST


On Tue, Apr 28, 2026 at 09:01:25AM +0200, Michal Hocko wrote:
> On Mon 27-04-26 15:03:49, Minchan Kim wrote:
> > On Mon, Apr 27, 2026 at 09:02:39AM +0200, Michal Hocko wrote:
> > > On Fri 24-04-26 15:49:19, Minchan Kim wrote:
> > > > On Fri, Apr 24, 2026 at 09:57:20AM +0200, Michal Hocko wrote:
> > > > > On Tue 21-04-26 16:02:39, Minchan Kim wrote:
> > > > > > Currently, process_mrelease() requires userspace to send a SIGKILL signal
> > > > > > prior to the call. This separation introduces a scheduling race window
> > > > > > where the victim task may receive the signal and enter the exit path
> > > > > > before the reaper can invoke process_mrelease().
> > > > > >
> > > > > > When the victim enters the exit path (do_exit -> exit_mm), it clears its
> > > > > > task->mm immediately. This causes process_mrelease() to fail with -ESRCH,
> > > > > > leaving the actual address space teardown (exit_mmap) to be deferred until
> > > > > > the mm's reference count drops to zero. In Android, arbitrary reference counts
> > > > > > (e.g., async I/O, reading /proc/<pid>/cmdline, or various other remote
> > > > > > VM accesses) frequently delay this teardown indefinitely, defeating the
> > > > > > purpose of expedited reclamation.
> > > > > >
> > > > > > This delay keeps memory pressure high, forcing the system to unnecessarily
> > > > > > kill additional innocent background apps before the memory from the first
> > > > > > victim is recovered.
> > > > >
> > > > > Thanks, this makes the motivation much clearer and the use case very
> > > > > sound.
> > > > >
> > > > > > This patch introduces the PROCESS_MRELEASE_REAP_KILL UAPI flag to support
> > > > > > an integrated auto-kill mode. When specified, process_mrelease() directly
> > > > > > injects a SIGKILL into the target task.
> > > > > >
> > > > > > To solve the race condition deterministically, we grab the mm reference
> > > > > > via mmget() and set the MMF_UNSTABLE flag *before* sending the SIGKILL.
> > > > > > Using mmget() instead of mmgrab() keeps mm_users > 0, preventing the
> > > > > > victim from calling exit_mmap() in its own exit path.
> > > > >
> > > > > Why is this needed? Address space tear down is an operation that can run
> > > > > from several execution contexts.
> > > >
> > > > Agreed.
> > > >
> > > > >
> > > > > > This ensures that
> > > > > > the memory is reclaimed synchronously and deterministically by the reaper
> > > > > > in the context of process_mrelease(), avoiding delays caused by
> > > > > > non-deterministic scheduling of the victim task.
> > > > >
> > > > > The memory is still reclaimed synchronously from the mrelease context.
> > > > > This is really confusing.
> > > > >
> > > > > Please also explain why you need to jump through all those ugly
> > > > > task_will_free_mem hoops. Why can't you simply kill the task if
> > > > > task_will_free_mem fails (if PROCESS_MRELEASE_REAP_KILL is used)?
> > > >
> > > > I wanted to handle shared address spaces.
> > > > Even though we are okay with the target task not being in a SIGKILL
> > > > state yet (since we are about to kill it), we must ensure that all
> > > > *other* processes sharing the same mm are also dying.
> > >
> > > Then just bail out when the mm is shared across thread groups, rather
> > > than kill just one of them. Or kill all of them. There is no reason to
> > > play around that on the task_will_free_mem level.
> >
> > Killing unrelated processes just because they share an mm is too radical.
>
> Well, that depends on what you try to achieve. The global OOM killer
> does kill all tasks sharing the mm.
>
> > I am thinking about a quick check for whether the mm is shared.
> >
> > An idea:
> >
> > `atomic_read(&mm->mm_users) > task->signal->nr_threads` to detect sharing
> > across thread groups without looping like task_will_free_mem.
>
> We have MMF_MULTIPROCESS. Can you use that?

That makes sense. Thanks.
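
To spell out why the flag is the better signal than the counter comparison
suggested earlier, here is a small userspace toy model. It is not kernel code
and not part of the patch: every toy_* name and TOY_* constant is made up for
illustration, and the flag-setting rule is my simplified reading of how the
clone path marks an mm as multi-process (CLONE_VM without CLONE_THREAD or
CLONE_VFORK). The point it tries to show is that a transient mm_users
reference (e.g. a /proc reader) makes the counter heuristic report sharing
when there is none, while a sticky flag does not.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the clone flags; values mirror the UAPI ones. */
#define TOY_CLONE_VM     0x00000100UL
#define TOY_CLONE_VFORK  0x00004000UL
#define TOY_CLONE_THREAD 0x00010000UL

/* Toy model of the two mm fields that matter for this discussion. */
struct toy_mm {
	int mm_users;       /* models atomic mm->mm_users */
	bool multiprocess;  /* models the MMF_MULTIPROCESS bit */
};

/* Toy model of one thread group attached to an mm. */
struct toy_group {
	struct toy_mm *mm;
	int nr_threads;     /* models task->signal->nr_threads */
};

/* Model a clone() from this thread group that shares the mm. */
static void toy_clone(struct toy_group *g, unsigned long flags)
{
	if (!(flags & TOY_CLONE_VM))
		return;                 /* child gets its own mm: out of scope */

	g->mm->mm_users++;              /* every CLONE_VM user pins the mm */

	if (flags & TOY_CLONE_THREAD)
		g->nr_threads++;        /* same thread group */

	/* Assumed rule: CLONE_VM alone (no thread, no vfork) marks the mm. */
	if ((flags & (TOY_CLONE_VM | TOY_CLONE_THREAD | TOY_CLONE_VFORK)) ==
	    TOY_CLONE_VM)
		g->mm->multiprocess = true;
}

/* Model a transient remote reference, e.g. reading /proc/<pid>/cmdline. */
static void toy_remote_get(struct toy_mm *mm) { mm->mm_users++; }
static void toy_remote_put(struct toy_mm *mm) { mm->mm_users--; }

/* The counter heuristic: mm_users > nr_threads. */
static bool toy_shared_by_count(const struct toy_group *g)
{
	return g->mm->mm_users > g->nr_threads;
}

/* The flag check. */
static bool toy_shared_by_flag(const struct toy_group *g)
{
	return g->mm->multiprocess;
}
```

In this model, a two-thread group with one remote reader in flight is
reported as shared by toy_shared_by_count() but not by toy_shared_by_flag().
The flag is sticky, so it can over-report after the other process has gone
away, but for a bail-out decision that errs on the safe side.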

Then, how about this?