Re: [PATCH] mm: Fix endless reclaim on machines with unaccepted memory.

From: Kirill A. Shutemov
Date: Wed Jul 17 2024 - 07:55:22 EST


On Wed, Jul 17, 2024 at 09:19:12AM +0200, Michal Hocko wrote:
> On Tue 16-07-24 16:00:13, Kirill A. Shutemov wrote:
> > Unaccepted memory is considered unusable free memory, which is not
> > counted as free on the zone watermark check. This causes
> > get_page_from_freelist() to accept more memory to hit the high
> > watermark, but it creates problems in the reclaim path.
> >
> > The reclaim path encounters a failed zone watermark check and attempts
> > to reclaim memory. This is usually successful, but if there is little or
> > no reclaimable memory, it can result in endless reclaim with little to
> > no progress. This can occur early in the boot process, just after the start
> > of the init process, when the only reclaimable memory is the page cache
> > of the init executable and its libraries.
>
> How does this happen when try_to_accept_memory is the first thing to do
> when wmark check fails in the allocation path?

Good question.

I've lost access to the test setup and cannot check it directly right now.

Reading the code, it looks like __alloc_pages_bulk() bypasses
get_page_from_freelist(), where we usually accept more pages, and goes
directly to __rmqueue_pcplist() -> rmqueue_bulk() -> __rmqueue().

I will look into it further once I have access to the test setup again.
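To make the suspected asymmetry concrete, here is a toy userspace model in C. It is not kernel code: `struct toy_zone`, `toy_alloc_one()`, `toy_alloc_bulk()` and the accept-in-chunks policy are hypothetical stand-ins for get_page_from_freelist(), __alloc_pages_bulk() and the unaccepted-memory logic. It assumes only what the reading above suggests: the regular path accepts memory when the watermark check fails, while the bulk path takes pages off the freelist without accepting any.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy zone: page counts only, no real freelists. */
struct toy_zone {
	long nr_free;       /* accepted free pages, counted by the watermark check */
	long nr_unaccepted; /* free but not yet accepted, NOT counted */
	long high_wmark;
	long accept_chunk;  /* pages accepted per acceptance step (assumed) */
};

static bool toy_watermark_ok(const struct toy_zone *z)
{
	/* Unaccepted pages are ignored, as described in the changelog. */
	return z->nr_free >= z->high_wmark;
}

/* Accept up to accept_chunk pages; returns false if nothing is left. */
static bool toy_accept_memory(struct toy_zone *z)
{
	long n = z->nr_unaccepted < z->accept_chunk ?
		 z->nr_unaccepted : z->accept_chunk;

	if (n == 0)
		return false;
	z->nr_unaccepted -= n;
	z->nr_free += n;
	return true;
}

/* Models the get_page_from_freelist() path: accept until the watermark is met. */
static bool toy_alloc_one(struct toy_zone *z)
{
	while (!toy_watermark_ok(z)) {
		if (!toy_accept_memory(z))
			break; /* nothing left to accept */
	}
	if (z->nr_free == 0)
		return false;
	z->nr_free--;
	return true;
}

/*
 * Models the bulk path only in the property discussed above: pages come
 * straight off the freelist and no memory is accepted. Any other checks
 * the real path performs are deliberately omitted.
 */
static long toy_alloc_bulk(struct toy_zone *z, long want)
{
	long got = want < z->nr_free ? want : z->nr_free;

	z->nr_free -= got;
	assert(z->nr_free >= 0);
	return got;
}
```

With 100 free pages, 1000 unaccepted pages, a high watermark of 200 and a 256-page acceptance chunk, toy_alloc_one() accepts one chunk and leaves 355 free pages, while toy_alloc_bulk() asked for 150 pages drains the 100 accepted ones and leaves the watermark failing with all 1000 pages still unaccepted.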

> Could you describe what was the initial configuration of the system? How
> much of the unaccepted memory was there to trigger this?

This is a large TDX guest VM: 176 vCPUs and ~800GiB of memory.

One thing I noticed is that the problem is only triggered when LRU_GEN
is enabled, but I have not been able to identify why.

The system hangs (or makes very little progress) shortly after systemd
starts.
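A similarly hedged toy model of the reported livelock: if the watermark check ignores unaccepted pages and only a little page cache is reclaimable, reclaim rounds stop making progress long before the watermark is met, while accepting memory first satisfies it at once. `struct toy_node`, `toy_reclaim_loop()`, the 32-page reclaim batch and the accept-everything step are hypothetical simplifications, not kernel interfaces; LRU_GEN and memcg reclaim are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy node: page counts only. */
struct toy_node {
	long nr_free;        /* accepted free pages (counted by the watermark check) */
	long nr_unaccepted;  /* free but unaccepted pages (not counted) */
	long nr_reclaimable; /* e.g. page cache of init and its libraries */
	long high_wmark;
};

static bool toy_watermark_ok(const struct toy_node *n)
{
	return n->nr_free >= n->high_wmark;
}

/* Reclaim at most 'batch' pages; returns the number actually reclaimed. */
static long toy_reclaim(struct toy_node *n, long batch)
{
	long got;

	assert(batch >= 0);
	got = batch < n->nr_reclaimable ? batch : n->nr_reclaimable;
	n->nr_reclaimable -= got;
	n->nr_free += got;
	return got;
}

/* Accept all remaining unaccepted memory (a simplification). */
static bool toy_accept(struct toy_node *n)
{
	if (n->nr_unaccepted == 0)
		return false;
	n->nr_free += n->nr_unaccepted;
	n->nr_unaccepted = 0;
	return true;
}

/*
 * Runs rounds until the watermark is met or max_rounds is reached and
 * returns the number of rounds used. accept_first == false mirrors the
 * reported problem; accept_first == true mirrors accepting memory before
 * reclaiming and then re-checking the watermark.
 */
static int toy_reclaim_loop(struct toy_node *n, bool accept_first, int max_rounds)
{
	int rounds;

	for (rounds = 0; rounds < max_rounds; rounds++) {
		if (toy_watermark_ok(n))
			return rounds;
		if (accept_first && toy_accept(n))
			continue; /* re-check the watermark before reclaiming */
		toy_reclaim(n, 32);
	}
	return rounds;
}
```

For example, with 10 free pages, 64 reclaimable pages, a high watermark of 1000 and plenty of unaccepted memory, the plain loop runs until the round limit with nr_free stuck at 74, whereas the accept-first variant meets the watermark after a single acceptance step.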

> > To address this issue, teach shrink_node() and shrink_zones() to accept
> > memory before attempting to reclaim.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Reported-by: Jianxiong Gao <jxgao@xxxxxxxxxx>
> > Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
> > Cc: stable@xxxxxxxxxxxxxxx # v6.5+
> [...]
> >  static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> >  {
> >  	unsigned long nr_reclaimed, nr_scanned, nr_node_reclaimed;
> >  	struct lruvec *target_lruvec;
> >  	bool reclaimable = false;
> >
> > +	/* Try to accept memory before going for reclaim */
> > +	if (node_try_to_accept_memory(pgdat, sc)) {
> > +		if (!should_continue_reclaim(pgdat, 0, sc))
> > +			return;
> > +	}
> > +
>
> This would need an exemption from the memcg reclaim.

Hm. Could you elaborate on why?

--
Kiryl Shutsemau / Kirill A. Shutemov