Re: [RFC 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
From: Minchan Kim
Date: Thu Apr 23 2026 - 19:43:58 EST

On Thu, Apr 23, 2026 at 09:17:39AM +0200, Michal Hocko wrote:
> On Mon 20-04-26 14:47:04, Minchan Kim wrote:
> > On Fri, Apr 17, 2026 at 09:04:31AM +0200, Michal Hocko wrote:
> > > On Thu 16-04-26 23:30:09, Minchan Kim wrote:
> > > > If I send the SIGKILL first to satisfy the process_mrelease() requirement,
> > > > we immediately run into the scheduling race condition where the victim can
> > > > enter the exit path before the reaper can set the flag.
> > >
> > > Why don't you just grab the mm before you send the signal and then continue
> > > with reaping? You just want to avoid a race where the victim manages to
> > > process fatal signal, start its exit path and mrelease path losing that
> > > race so you rely on the exit path, right?
> >
> > The problem is that process_mrelease() operates on a task obtained from a pidfd.
> >
> > Once the victim process receives the SIGKILL and enters the exit path (exit_mm),
> > the kernel sets task->mm to NULL.
> >
> > Even if we could somehow hold a reference to the mm_struct beforehand,
> > process_mrelease() would still fail because mm_struct via task returns NULL
> > after exit_mm() has been called.
> >
> > Therefore, we cannot simply "grab the mm" before sending the signal and expect
> > process_mrelease() to work after the victim starts exiting.
>
> I do not follow. Why cannot you simply do this

I misunderstood your point. Do you mean this?

https://lore.kernel.org/linux-mm/20260421230239.172582-4-minchan@xxxxxxxxxx/

There are more details to figure out.

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5c6c95c169ee..b80a96f5460a 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1241,9 +1241,14 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
>  	if (task_will_free_mem(p))
>  		reap = true;
>  	else {
> +		if (flags & PROCESS_MRELEASE_REAP_KILL) {
> +		} else {
> +			/* send SIGKILL */
> +			reap = true;
>  		/* Error only if the work has not been done already */
> -		if (!mm_flags_test(MMF_OOM_SKIP, mm))
> -			ret = -EINVAL;
> +			if (!mm_flags_test(MMF_OOM_SKIP, mm))
> +				ret = -EINVAL;
> +		}
>  	}
>  	task_unlock(p);
>
>
> --
> Michal Hocko
> SUSE Labs
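
For reference, the userspace sequence this thread is about looks roughly
like the sketch below. It is only an illustration of the kill-then-reap
ordering and of where the race window sits, not code from the series.
kill_then_reap() is a hypothetical helper name, and the fallback syscall
numbers are there only for toolchains whose headers lack them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; these are the unified syscall numbers. */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/*
 * Hypothetical helper (not part of the patch series): send SIGKILL to
 * the process behind pidfd, then ask the kernel to reap its memory.
 * Returns 0 on success or a negative errno value.
 */
int kill_then_reap(int pidfd)
{
	if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0)
		return -errno;

	/*
	 * Race window: the victim may already have gone through exit_mm()
	 * by the time we get here, so task->mm is NULL and the call can
	 * fail because there is no mm left to look up.
	 */
	if (syscall(SYS_process_mrelease, pidfd, 0) < 0)
		return -errno;

	return 0;
}
```

A caller would get the pidfd from pidfd_open() and pass it in. A failure
from the second call does not tell the caller whether the memory is
already gone or the exit path is still running, which is the ambiguity
discussed above.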