Re: [RFC PATCH] Limit reclaim to avoid TTM desktop stutter under mem pressure
From: Christian König
Date: Wed Apr 01 2026 - 06:23:52 EST
On 4/1/26 04:08, Daniel Colascione wrote:
> TTM seems to be too eager to kick off reclaim while kwin is drawing
>
> I've noticed that in 7.0-rc6, and since at least 6.17, kwin_wayland
> stalls in DRM ioctls to xe when the system is under memory pressure,
> causing missed frames, cursor-movement stutter, and general
> sluggishness. The root cause seems to be synchronous and asynchronous
> reclaim in ttm_pool_alloc_page as TTM tries, and fails, to allocate
> progressively lower-order pages in response to pool-cache misses when
> allocating graphics buffers.
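
To make the fallback described above concrete, here is a minimal userspace model of walking down the orders. It is only a sketch: MAX_ORDER_SKETCH, mock_alloc() and alloc_with_fallback() are hypothetical names for illustration, not TTM functions, and the mock allocator simply fails every order above a chosen limit to imitate a fragmented zone.

```c
#include <stdbool.h>

#define MAX_ORDER_SKETCH 10	/* hypothetical top order for this model */

/*
 * Hypothetical stand-in for the page allocator: only orders at or below
 * largest_free_order succeed, which imitates a fragmented zone.
 */
bool mock_alloc(unsigned int order, unsigned int largest_free_order)
{
	return order <= largest_free_order;
}

/*
 * Start at the highest order and step down until an attempt succeeds.
 * Returns the order that succeeded and stores the number of failed
 * attempts in *failed. Order 0 is the final fallback of the loop.
 */
unsigned int alloc_with_fallback(unsigned int largest_free_order,
				 unsigned int *failed)
{
	unsigned int order = MAX_ORDER_SKETCH;

	*failed = 0;
	while (order > 0 && !mock_alloc(order, largest_free_order)) {
		(*failed)++;
		order--;
	}
	return order;
}
```

In this model, a zone whose largest free block is order 0 costs ten failed attempts before the order-0 fallback is reached. In the real path, each failed attempt may additionally enter reclaim or compaction, which is where the report above locates the latency.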
>
> Memory is fragmented enough that the compaction fails (as I can see in
> compact_fail and compact_stall in /proc/vmstat; extfrag says the normal
> pool is unusable for large allocations too). Additionally, compaction
> seems to be emptying the ttm pool, since page_pool in TTM debugfs
> reports all the buckets are empty while I'm seeing the
> kwin_wayland sluggishness.
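
For anyone wanting to watch the same counters while reproducing this, a small helper along these lines could be used. It is a sketch under assumptions: vmstat_lookup() and read_text_file() are hypothetical helper names, and the code only relies on /proc/vmstat being plain "name value" lines.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "name value" line in /proc/vmstat-style text.
 * Returns 0 and stores the value on success, -1 if the name is absent
 * or its value cannot be parsed.
 */
int vmstat_lookup(const char *text, const char *name,
		  unsigned long long *value)
{
	size_t name_len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Require the full counter name followed by a space. */
		if (strncmp(line, name, name_len) == 0 &&
		    line[name_len] == ' ') {
			if (sscanf(line + name_len, "%llu", value) == 1)
				return 0;
			return -1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Read a text file such as /proc/vmstat into buf, truncating to fit.
 * Returns 0 on success, -1 if the file cannot be opened.
 */
int read_text_file(const char *path, char *buf, size_t size)
{
	FILE *f = fopen(path, "r");
	size_t n;

	if (!f)
		return -1;
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

Reading /proc/vmstat into a buffer of 16 KiB or so and calling vmstat_lookup() for "compact_stall" and "compact_fail" at two points in time gives the deltas over the interval; the buffer size is an assumption and may need to grow on kernels with more counters.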
>
> In profiles, I see time dominated by copy_pages and clear_pages in the
> TTM paging code. kswapd runs constantly despite the system as a whole
> having plenty of free memory.
>
> I can reproduce the problem on my 32GB-RAM X1C Gen 13 by booting with
> kernelcore=8G (not needed, but makes the repro happen sooner), running a
> find / >/dev/null (to fragment memory), and doing general web
> browsing. The stalls seem self-perpetuating once it gets started; it
> persists even after killing the find. I've noticed this stall in
> ordinary use too, even without the kernelcore= zone tweak, but without
> kernelcore, it usually takes a while (hours?) after boot for memory to
> become fragmented enough that higher-order allocations fail.
>
> The patch below fixes the issue for me. TBC, I'm not sure it's the
> _right_ fix, but it works for me. I'm guessing that even if the approach
> is right, a new module parameter isn't warranted.

Yeah, the module parameter is probably good for testing, but it really won't fly.

>
> With the patch below, when I set my new max_reclaim_order ttm module
> parameter to zero, the kwin_wayland stalls under memory pressure
> stop. (TBC, this setting inhibits sync or async reclaim except for
> order-zero pages.) TTM allocation occurs in latency-critical paths
> (e.g. Wayland frame commit): do you think we _should_ reclaim here?
>
> BTW, I also tried having xe pass a beneficial order of 9, but it didn't
> help: we end up doing a lot of compaction work below this order anyway.

Well, as far as I can tell, that allocation behavior is completely intentional and just the lesser evil.

I can't say much for Intel HW, but on AMD HW, dropping from higher-order pages (2MiB) to anything smaller usually costs 20-30% in performance. So falling back to anything below 2MiB is really only the last resort to avoid the OOM killer.

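For reference, the relation between the order and the 2MiB figure can be sketched as below. This assumes 4KiB base pages; SKETCH_PAGE_SHIFT, order_to_bytes() and bytes_to_order() are illustrative names, not kernel symbols.

```c
#define SKETCH_PAGE_SHIFT 12	/* assumption: 4KiB base pages */

/* Size in bytes of one physically contiguous block of the given order. */
unsigned long long order_to_bytes(unsigned int order)
{
	return 1ULL << (order + SKETCH_PAGE_SHIFT);
}

/*
 * Smallest order whose block covers at least `bytes`.
 * Only meant for sizes well below the 64-bit limit.
 */
unsigned int bytes_to_order(unsigned long long bytes)
{
	unsigned int order = 0;

	while (order_to_bytes(order) < bytes)
		order++;
	return order;
}
```

Under that assumption, order 9 is 512 base pages or 2MiB, which matches the beneficial order of 9 tried with xe above, and every step down halves the contiguous block size.
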
The real question is where that heavy fragmentation and sluggishness are coming from. Even when the find / > /dev/null creates a lot of 4KiB allocations, the kernel should be able to reclaim them to create larger pages again.

And finally, I agree with Thomas that userspace shouldn't make that many allocations on a normal desktop. That should also be a good place to start investigating what happens here.

Regards,
Christian.
>
> Signed-off-by: Daniel Colascione <dancol@xxxxxxxxxx>
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index c0d95559197c..fd255914c0d3 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -115,9 +115,13 @@ struct ttm_pool_tt_restore {
> };
>
> static unsigned long page_pool_size;
> +static unsigned int max_reclaim_order;
>
> MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool");
> module_param(page_pool_size, ulong, 0644);
> +MODULE_PARM_DESC(max_reclaim_order,
> + "Maximum order that keeps upstream reclaim behavior");
> +module_param(max_reclaim_order, uint, 0644);
>
> static atomic_long_t allocated_pages;
>
> @@ -146,16 +150,14 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>  	 * Mapping pages directly into an userspace process and calling
>  	 * put_page() on a TTM allocated page is illegal.
>  	 */
> -	if (order)
> +	if (order) {
>  		gfp_flags |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN |
>  			__GFP_THISNODE;
> -
> -	/*
> -	 * Do not add latency to the allocation path for allocations orders
> -	 * device tolds us do not bring them additional performance gains.
> -	 */
> -	if (beneficial_order && order > beneficial_order)
> -		gfp_flags &= ~__GFP_DIRECT_RECLAIM;
> +		if (beneficial_order && order > beneficial_order)
> +			gfp_flags &= ~__GFP_DIRECT_RECLAIM;
> +		if (order > max_reclaim_order)
> +			gfp_flags &= ~__GFP_RECLAIM;
> +	}
>
>  	if (!ttm_pool_uses_dma_alloc(pool)) {
>  		p = alloc_pages_node(pool->nid, gfp_flags, order);
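
As a reading aid for the hunk above, the effect of the two checks on the reclaim bits can be modelled in userspace roughly like this. It is a sketch only: the SK_* constants are made-up bit values standing in for the kernel flags, reclaim_bits() is a hypothetical helper, and the one kernel fact relied on is that __GFP_RECLAIM is the combination of __GFP_DIRECT_RECLAIM and __GFP_KSWAPD_RECLAIM.

```c
/* Made-up bit values standing in for the kernel GFP reclaim flags. */
#define SK_DIRECT_RECLAIM 0x1u	/* models __GFP_DIRECT_RECLAIM */
#define SK_KSWAPD_RECLAIM 0x2u	/* models __GFP_KSWAPD_RECLAIM */
#define SK_RECLAIM (SK_DIRECT_RECLAIM | SK_KSWAPD_RECLAIM)

/*
 * Return the reclaim bits that remain for an allocation of the given
 * order, mirroring the two checks in the patched hunk. The caller is
 * assumed to start with both reclaim bits set.
 */
unsigned int reclaim_bits(unsigned int order, unsigned int beneficial_order,
			  unsigned int max_reclaim_order)
{
	unsigned int flags = SK_RECLAIM;

	if (order) {
		/* Above the beneficial order: no synchronous reclaim. */
		if (beneficial_order && order > beneficial_order)
			flags &= ~SK_DIRECT_RECLAIM;
		/* Above max_reclaim_order: no reclaim of either kind. */
		if (order > max_reclaim_order)
			flags &= ~SK_RECLAIM;
	}
	return flags;
}
```

With max_reclaim_order set to zero, every nonzero order ends up with no reclaim bits in this model, and only order-0 allocations keep both, which matches the description of the setting earlier in the mail.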