Re: [RFC 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag

From: Michal Hocko

Date: Thu Apr 23 2026 - 03:19:26 EST


On Mon 20-04-26 14:47:04, Minchan Kim wrote:
> On Fri, Apr 17, 2026 at 09:04:31AM +0200, Michal Hocko wrote:
> > On Thu 16-04-26 23:30:09, Minchan Kim wrote:
> > > If I send the SIGKILL first to satisfy the process_mrelease() requirement,
> > > we immediately run into the scheduling race condition where the victim can
> > > enter the exit path before the reaper can set the flag.
> >
> > Why don't you just grab the mm before you send the signal and then continue
> > with reaping? You just want to avoid a race where the victim manages to
> > process fatal signal, start its exit path and mrelease path losing that
> > race so you rely on the exit path, right?
>
> The problem is that process_mrelease() operates on a task obtained from a pidfd.
>
> Once the victim process receives the SIGKILL and enters the exit path (exit_mm),
> the kernel sets task->mm to NULL.
>
> Even if we could somehow hold a reference to the mm_struct beforehand,
> process_mrelease() would still fail because mm_struct via task returns NULL
> after exit_mm() has been called.
>
> Therefore, we cannot simply "grab the mm" before sending the signal and expect
> process_mrelease() to work after the victim starts exiting.

I do not follow. Why can you not simply do something like this?
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5c6c95c169ee..b80a96f5460a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1241,9 +1241,14 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	if (task_will_free_mem(p))
 		reap = true;
 	else {
+		if (flags & PROCESS_MRELEASE_REAP_KILL) {
+			/* send SIGKILL */
+			reap = true;
+		} else {
 		/* Error only if the work has not been done already */
-		if (!mm_flags_test(MMF_OOM_SKIP, mm))
-			ret = -EINVAL;
+			if (!mm_flags_test(MMF_OOM_SKIP, mm))
+				ret = -EINVAL;
+		}
 	}
 	task_unlock(p);
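
For illustration only, the intended branch can be modelled in userspace as
below. This is a sketch, not kernel code: the flag value is a placeholder
(the thread does not show it), mrelease_decide() is a hypothetical helper, and
the two bool inputs merely stand in for task_will_free_mem(p) and
mm_flags_test(MMF_OOM_SKIP, mm). The reading assumed here is that the flag
means "send SIGKILL and reap", while a call without the flag keeps the existing
-EINVAL check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder value: the RFC introduces the flag, its value is not shown here. */
#define PROCESS_MRELEASE_REAP_KILL (1u << 0)

/*
 * Hypothetical userspace model of the branch in the hunk.
 *
 * will_free_mem stands in for task_will_free_mem(p),
 * oom_skip stands in for mm_flags_test(MMF_OOM_SKIP, mm).
 *
 * Returns 0 or -EINVAL. *reap says whether the caller would go on to reap
 * the mm, *send_kill whether a SIGKILL would be sent first.
 */
int mrelease_decide(bool will_free_mem, bool oom_skip, unsigned int flags,
		    bool *reap, bool *send_kill)
{
	assert(reap && send_kill);
	*reap = false;
	*send_kill = false;

	/* Victim is already dying: reap right away, no signal needed. */
	if (will_free_mem) {
		*reap = true;
		return 0;
	}

	/* New flag: kill the victim from within the syscall, then reap. */
	if (flags & PROCESS_MRELEASE_REAP_KILL) {
		*send_kill = true;
		*reap = true;
		return 0;
	}

	/* Old behaviour: error only if the work has not been done already. */
	return oom_skip ? 0 : -EINVAL;
}
```

With the flag set, a victim that is not yet exiting yields ret == 0, reap and
send_kill both true; without the flag the same victim yields -EINVAL unless
MMF_OOM_SKIP is already set.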


--
Michal Hocko
SUSE Labs