Re: [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support

From: Suren Baghdasaryan

Date: Thu Apr 23 2026 - 18:37:20 EST


On Thu, Apr 23, 2026 at 2:50 AM David Hildenbrand (Arm)
<david@xxxxxxxxxx> wrote:
>
> On 4/23/26 09:50, Michal Hocko wrote:
> > On Mon 20-04-26 14:53:23, Minchan Kim wrote:
> >> On Fri, Apr 17, 2026 at 09:11:21AM +0200, Michal Hocko wrote:
> > [...]
> >>> Yes. All which make sense, really. I am still not convinced about the
> >>> clean page cache because that just seems like a hack to work around wrong
> >>> userspace oom heuristics.
> >>
> >> I see it a bit differently. When the platform decides to kill a process
> >> to free up memory, it wants that memory back right away.
> >>
> >> So it doesn't make much sense for the kernel to ignore that and leave the clean
> >> file pages to be picked up slowly by kswapd later.
> >>
> >> In some aspects, you can think of LMKD as a more specialized, userspace version
> >> of kswapd. It has high-level knowledge of process priorities and knows exactly
> >> which process is safe to kill to get memory instantly. The kernel's kswapd,
> >> however, operates globally without this specific process-level awareness, which
> >> makes it less suited for this kind of targeted reclamation.
> >>
> >> If we force LMKD to rely on the slower global kswapd to actually free the clean
> >> pages, it defeats the whole purpose of targeting a specific process.
> >>
> >> So letting process_mrelease speed this up isn't a hack at all. It's just helping
> >> the kernel do what the admin wanted in the first place: fast, targeted memory.
> >
> > This is a very creative/disruptive way to do a memory reclaim. From a
> > user POV I would much rather see clean page cache reclaimed before my
> > apps start to disappear. But this is obviously your call and your users
> > that will care.
> >
> > Anyway, I still maintain my position. I do not think it is a good
> > idea to drop clean page cache as you do not know whether there are other
> > users.

I'm very familiar with these issues in Android and really want to
find a good solution for them. IIUC, this RFC tries to address two
things at once:
1. handling clean private page cache when reaping memory of a kill victim;
2. addressing a race between kill() and process_mrelease():
process_mrelease() can't be called before the kill(), but if it is
called too late, after the victim has passed its exit_mm(), it fails
to find the mm to reap. This defeats the purpose of the
process_mrelease() call, because the actual memory (released by
exit_mmap()) might not be free yet and a successful process_mrelease()
would be very beneficial.
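
To make the ordering in (2) concrete, here is a minimal userspace
sketch of the kill-then-reap sequence. It is not taken from the
patches: kill_and_reap() is a hypothetical helper name, and the
fallback syscall numbers are the generic ones, assumed only for
headers that do not define them. The errno meanings in the comments
reflect my reading of the current behavior, not a guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older headers (assumption: generic syscall numbering). */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424
#endif
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif
#ifndef __NR_process_mrelease
#define __NR_process_mrelease 448
#endif

/*
 * Hypothetical helper: kill the victim, then try to reap its memory.
 *
 * Returns 0 if SIGKILL was sent, -1 otherwise (errno is set).
 * *mrelease_errno is 0 if process_mrelease() succeeded, else its errno.
 */
static int kill_and_reap(pid_t pid, int *mrelease_errno)
{
	int pidfd;

	assert(mrelease_errno != NULL);
	*mrelease_errno = 0;

	pidfd = (int)syscall(__NR_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	/* The reap is only accepted once the victim is already dying. */
	if (syscall(__NR_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0) {
		int saved = errno;

		close(pidfd);
		errno = saved;
		return -1;
	}

	/*
	 * Race window: if the victim has already gone through exit_mm()
	 * by the time we get here, there is no mm left to find and the
	 * call fails (typically ESRCH), even though exit_mmap() may not
	 * have finished freeing the memory yet.
	 */
	if (syscall(__NR_process_mrelease, pidfd, 0) < 0)
		*mrelease_errno = errno;

	close(pidfd);
	return 0;
}
```

A caller would treat a non-zero *mrelease_errno as "the reap did not
happen", which is exactly the case where the memory may still be
pending release.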

I see these two as separate issues and I'm not sure combining them
into a single discussion is a good idea.

>
> IIRC, Johannes raised in the past that we cannot predict the future.
>
> For example, if an app gets OOM-killed, wouldn't we usually try restarting it,
> re-consuming the clean pagecache pages we would be evicting here?

Sure, we can't predict which app the user will use next, so when
killing we usually kill the least recently used one. That's a
reasonable strategy in most cases.
In general, if speeding up the victim's reclaim negatively affects the
overall user workflow, that would mean we are selecting the wrong kill
targets. In that case, we would need to adjust the target selection
strategy.

Thanks for tackling this, Minchan! I'll try to review the patches this
weekend and provide my feedback.
Thanks,
Suren.

>
> Just a thought.
>
> --
> Cheers,
>
> David