Re: [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support

From: Michal Hocko

Date: Fri Apr 17 2026 - 03:15:35 EST


On Thu 16-04-26 23:20:30, Minchan Kim wrote:
> On Thu, Apr 16, 2026 at 08:54:53AM +0200, Michal Hocko wrote:
> > On Wed 15-04-26 16:26:34, Minchan Kim wrote:
> > > On Wed, Apr 15, 2026 at 09:38:05AM +0200, Michal Hocko wrote:
> > > > On Tue 14-04-26 13:00:16, Minchan Kim wrote:
> > > > > On Tue, Apr 14, 2026 at 08:57:57AM +0200, Michal Hocko wrote:
> > > > > > On Mon 13-04-26 15:39:45, Minchan Kim wrote:
> > > > > > > This patch series introduces optimizations to expedite memory reclamation
> > > > > > > in process_mrelease() and provides a secure, race-free "auto-kill"
> > > > > > > mechanism for efficient container shutdown and OOM handling.
> > > > > > >
> > > > > > > Currently, process_mrelease() unmaps pages but leaves clean file folios
> > > > > > > on the LRU list, relying on standard memory reclaim to eventually free
> > > > > > > them. Furthermore, requiring userspace to send a SIGKILL prior to
> > > > > > > invoking process_mrelease() introduces scheduling race conditions where
> > > > > > > the victim task may enter the exit path prematurely, bypassing expedited
> > > > > > > reclamation hooks.
> > > > > > >
> > > > > > > This series addresses these limitations in three logical steps.
> > > > > > >
> > > > > > > Patch #1: mm: process_mrelease: expedite clean file folio reclaim via mmu_gather
> > > > > > > Integrates clean file folio eviction directly into the low-level TLB
> > > > > > > batching (mmu_gather) infrastructure. Symmetrically truncates clean file
> > > > > > > folios alongside anonymous pages during the unmap loop.
> > > > > >
> > > > > > Why do we need to care about clean page cache? Is this a form of
> > > > > > drop_caches?
> > > > >
> > > > > The goal is to ensure the memory is actually freed by the time
> > > > > process_mrelease returns. Currently, process_mrelease unmaps pages, but
> > > > > page caches remain on the LRU, leaving them to be reclaimed later
> > > > > by kswapd or direct reclaim.
> > > >
> > > > Correct. This was the initial design decision because there is not much
> > > > you can assume about page cache pages, which are very often shared even
> > > > if they are not mapped by all users.
> > >
> > > Fair point. However, that's the trade-off:
> > >
> > > Leaving unmapped caches to be reclaimed asynchronously keeps system memory
> > > pressure high for too long. In Android, this delay forces the LMKD to
> > > unnecessarily kill additional innocent background apps before the memory
> > > from the original victim is recovered.
> >
> > OK, this is really not clear to me. How come you end up triggering LMKD
> > (or any OOM handling) when there is a considerable amount of clean page
> > cache?
>
> It's not simple to explain all the heuristics, but basically, LMKD is triggered
> by PSI pressure (usually contributed by kswapd rather than other components
> like refault, kcompactd, or workingset operations).
>
> It then checks the current free memory against system watermarks. Depending
> on the free memory size, file cache, and free swap, it decides to start
> killing background apps.
>
> In other words, LMKD acts as a "userspace kswapd" that supplements kernel
> kswapd's reclamation speed. It is smarter than kswapd because it has high-level
> knowledge of which processes are safe to kill, rather than forcing slow,
> unnecessary paging out.
>
> Whenever LMKD is running, kswapd is usually running alongside it. You might
> wonder why LMKD kills background apps even when there are plenty of clean file
> pages. That's because the system cannot predict current memory allocation rates.
> If the allocation is bursty, kswapd can never catch up with the allocation speed.
> This forces the foreground apps into direct reclaim, resulting in visible
> UI jank. Android prioritizes UI smoothness and chooses to kill background apps.
>
> Furthermore, when LMKD kills a background app, it expects immediate memory relief.
> If the clean file pages of the killed process are left on the LRU to be reclaimed
> asynchronously later, the system's memory pressure (PSI) remains high.
> This forces LMKD to unnecessarily kill *additional* background apps before
> the memory from the first victim is fully recovered.
>
> Again, this is why I want process_mrelease to expedite clean file folio
> reclamation synchronously.

How much of a clean page cache do you usually drop this way?

[...]
> > I suspect you are missing my point. I am arguing that those special
> > hacks in the address space release path shouldn't be process_mrelease
> > specific.
>
> I am a bit confused now. Do you mean you want to apply these expedited
> reclamation optimizations to ALL dying processes in the common exit path,
> rather than making them specific to process_mrelease?

Yes. All of those that make sense, really. I am still not convinced about
the clean page cache, because that just seems like a hack to work around
wrong userspace oom heuristics.
--
Michal Hocko
SUSE Labs