Re: [PATCH 1/1] mm/vmscan: prevent MGLRU reclaim from pinning address space

From: Lorenzo Stoakes (Oracle)

Date: Mon Mar 23 2026 - 09:52:38 EST


On Sun, Mar 22, 2026 at 12:08:43AM -0700, Suren Baghdasaryan wrote:
> When shrinking lruvec, MGLRU pins address space before walking it.
> This is excessive since all it needs for walking the page range is
> a stable mm_struct to be able to take and release mmap_read_lock and
> a stable mm->mm_mt tree to walk. This address space pinning results

Hmm, I guess exit_mmap() calls __mt_destroy(), but that just destroys the
allocated state and leaves the tree empty, right? So a traversal of the
tree at that point would simply do nothing?

> in delays when releasing the memory of a dying process. This also
> prevents mm reapers (both in-kernel oom-reaper and userspace
> process_mrelease()) from doing their job during MGLRU scan because
> they check task_will_free_mem() which will yield negative result due
> to the elevated mm->mm_users.
>
> Replace unnecessary address space pinning with mm_struct pinning by
> replacing mmget/mmput with mmgrab/mmdrop calls. mm_mt is contained
> within mm_struct itself, therefore it won't be freed as long as
> mm_struct is stable and it won't change during the walk because
> mmap_read_lock is being held.
>
> Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
> Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> ---
> mm/vmscan.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 33287ba4a500..68e8e90e38f5 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2863,8 +2863,9 @@ static struct mm_struct *get_next_mm(struct lru_gen_mm_walk *walk)
> return NULL;

Not related to this series, but I really don't like how coupled MGLRU is to
the rest of the 'classic' reclaim code.

Right in the middle of vmscan you walk into generic mm walker logic, and
the only hint that it's MGLRU is that you see lru_gen_xxx stuff. I'm also
annoyed that we call it MGLRU but it's named lru_gen_xxx in the kernel :)

>
> clear_bit(key, &mm->lru_gen.bitmap);
> + mmgrab(mm);

Is the mm somehow pinned here, or would it be removed from the mm list on
destruction, meaning we can safely assume there is something sane in mm->
to grab? I guess this must already have been the case for
mmget_not_zero() to have been used before, though.
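To make the distinction I'm poking at explicit: mm_users pins the address
space (mmget()/mmput()) while mm_count pins the mm_struct itself
(mmgrab()/mmdrop()). A userspace toy model of those semantics, with
made-up toy_* names (not kernel code), would be:

```c
/* Toy model of the two mm_struct refcounts under discussion.
 * mm_users pins the address space; mm_count pins the struct itself.
 * Field names mirror the kernel's, but this is only an illustration. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_mm {
	atomic_int mm_users;	/* address-space users */
	atomic_int mm_count;	/* struct-lifetime references */
};

/* Like mmget_not_zero(): succeeds only while the address space lives. */
static bool toy_mmget_not_zero(struct toy_mm *mm)
{
	int users = atomic_load(&mm->mm_users);

	while (users > 0) {
		if (atomic_compare_exchange_weak(&mm->mm_users, &users,
						 users + 1))
			return true;
	}
	return false;
}

/* Like mmgrab(): legal while mm_count > 0, even after the last mmput(). */
static void toy_mmgrab(struct toy_mm *mm)
{
	atomic_fetch_add(&mm->mm_count, 1);
}

static void toy_mmdrop(struct toy_mm *mm)
{
	/* When this hits zero the kernel would free the struct here. */
	atomic_fetch_sub(&mm->mm_count, 1);
}
```

The point being: after the last mmput() the address-space pin fails, but a
struct pin via mmgrab() is still fine as long as mm_count is held.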

>
> - return mmget_not_zero(mm) ? mm : NULL;
> + return mm;
> }
>
> void lru_gen_add_mm(struct mm_struct *mm)
> @@ -3064,7 +3065,7 @@ static bool iterate_mm_list(struct lru_gen_mm_walk *walk, struct mm_struct **ite
> reset_bloom_filter(mm_state, walk->seq + 1);
>
> if (*iter)
> - mmput_async(*iter);
> + mmdrop(*iter);

This will now do the work inline and could free the mm (via __mmdrop()),
so it could take a while; is that OK?

If the code was previously deferring the work here intentionally, doesn't
that imply that being slow here might be an issue somehow? Or was it
deferred just because it could be? :)
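To spell out the latency concern: mmput_async() queues the heavy teardown
for a worker, while a direct drop pays for it in the caller's context. A
toy sketch of that difference, with invented names (the real kernel uses a
workqueue, modeled here as a pending flag):

```c
/* Toy sketch: deferred vs inline teardown cost. Purely illustrative. */
#include <assert.h>
#include <stdbool.h>

static int teardown_cost;	/* units of work performed so far */
static bool deferred_pending;	/* teardown queued for a "worker" */

static void heavy_teardown(void)
{
	teardown_cost += 1000;	/* stand-in for freeing page tables etc. */
}

/* mmdrop()-style: the last reference does the teardown inline. */
static void drop_inline(int *refcount)
{
	if (--*refcount == 0)
		heavy_teardown();
}

/* mmput_async()-style: the last reference only queues the teardown. */
static void drop_async(int *refcount)
{
	if (--*refcount == 0)
		deferred_pending = true;	/* schedule_work() here */
}

/* The "worker" pays the cost later, off the hot path. */
static void run_worker(void)
{
	if (deferred_pending) {
		deferred_pending = false;
		heavy_teardown();
	}
}
```

With the async variant the caller returns having paid nothing; with the
inline variant the full cost lands on the caller, which is exactly what
changes on the reclaim path with this patch.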

>
> *iter = mm;
>
>
> base-commit: 8c65073d94c8b7cc3170de31af38edc9f5d96f0e
> --
> 2.53.0.1018.g2bb0e51243-goog
>

Thanks, Lorenzo