Re: [PATCH v4 0/6] mm, drm/ttm, drm/xe: Avoid reclaim/eviction loops under fragmentation

From: Matthew Brost

Date: Fri May 01 2026 - 02:28:26 EST


On Thu, Apr 30, 2026 at 04:01:05PM -0700, Andrew Morton wrote:
> On Thu, 30 Apr 2026 12:18:03 -0700 Matthew Brost <matthew.brost@xxxxxxxxx> wrote:
>
> > TTM allocations at higher orders can drive Xe into a pathological
> > reclaim loop when memory is fragmented:
> >
> > kswapd → shrinker → eviction → rebind (exec ioctl) → repeat
> >
> > In this state, reclaim is triggered despite substantial free memory,
> > but fails to produce contiguous higher-order pages. The Xe shrinker then
> > evicts active buffer objects, increasing faulting and rebind activity
> > and further feeding the loop. The result is high CPU overhead and poor
> > GPU forward progress.
> >
> > ...
> >
> > This series addresses the issue in two ways:
> >
> > TTM: Restrict direct reclaim to beneficial_order. Larger allocations
> > use __GFP_NORETRY to fail quickly rather than triggering reclaim.
> >
> > Xe: Introduce a heuristic in the shrinker to avoid eviction when
> > running under kswapd and the system appears memory-rich but
> > fragmented.
>
> Please cc everyone on all the patches? It's kind of annoying to have
> to hunt around to find out how these proposed changes will be used.
> Personal preference, anyway.
>

Will do. We discussed this in the past and I thought we had landed on
Cc'ing everyone on the cover letter and then the relevant folks on the
individual patches, but I will Cc everyone on every patch going forward.

> AI review flagged a few possible issues:
> https://sashiko.dev/#/patchset/20260430191809.2142544-1-matthew.brost@xxxxxxxxx

Idk who authors Sashiko, but it would be really nice if you could reply
to it to talk things out.

Looking at the replies...

- 'Could this global counter drift significantly'
This looks right for multi-CPU systems, which aren't really the target
here, but I will adjust.

- 'Additionally, does NR_FREE_PAGES implicitly include CMA pages?'
This looks right; will adjust.

- 'Can high_wmark_pages(zone) evaluate to zero during early boot'
Theoretically possible (?), but a non-issue IMO. For a GPU shrinker,
which is the current use case, this is certainly impossible. Maybe add
a WARN_ON if high_wmark_pages(zone) returns zero.

- 'Is this description accurate?'
I inverted the TTM kernel-doc relative to the code; will fix.
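To make the CMA and zero-watermark points concrete, here is a minimal
userspace model of a "memory-rich" check. This is a sketch under stated
assumptions, not the code in the series: struct zone_sample,
looks_memory_rich() and the multiplier parameter are hypothetical names,
and the fields only model NR_FREE_PAGES, NR_FREE_CMA_PAGES and
high_wmark_pages(zone). It subtracts free CMA pages, which can only
satisfy movable allocations, and treats a zero watermark as "no signal"
with a warning instead of a WARN_ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-zone sample; field names are illustrative only. */
struct zone_sample {
	unsigned long free_pages;     /* models NR_FREE_PAGES (includes CMA) */
	unsigned long free_cma_pages; /* models NR_FREE_CMA_PAGES */
	unsigned long high_wmark;     /* models high_wmark_pages(zone) */
};

/*
 * Hypothetical heuristic: returns true if non-CMA free pages across the
 * sampled zones exceed @multiplier times the summed high watermarks.
 */
static bool looks_memory_rich(const struct zone_sample *zones, size_t nr,
			      unsigned long multiplier)
{
	unsigned long free = 0, wmark = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		unsigned long f = zones[i].free_pages;
		unsigned long cma = zones[i].free_cma_pages;

		/* CMA pages only serve movable allocations; exclude them. */
		free += f > cma ? f - cma : 0;
		wmark += zones[i].high_wmark;
	}

	/* A zero watermark (e.g. early boot) gives no signal: be conservative. */
	if (wmark == 0) {
		fprintf(stderr, "WARN: summed high watermark is zero\n");
		return false;
	}

	return free > wmark * multiplier;
}
```

Returning false on a zero watermark keeps the shrinker on its normal
eviction path, which seems the safer default if the heuristic cannot
tell whether memory is actually plentiful.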

Matt