Re: [patch] mm, page_alloc: reintroduce page allocation stall warning
From: David Rientjes
Date: Mon Mar 30 2026 - 18:38:54 EST
On Mon, 30 Mar 2026, Vlastimil Babka (SUSE) wrote:
> >> Previously, we had warnings when a single page allocation took longer
> >> than reasonably expected. This was introduced in commit 63f53dea0c98
> >> ("mm: warn about allocations which stall for too long").
> >>
> >> The warning was subsequently reverted in commit 400e22499dd9 ("mm: don't
> >> warn about allocations which stall for too long") but for reasons
> >> unrelated to the warning itself.
> >
> > I think it makes sense to summarize reasons for the revert. I would
> > propose to change the above to something like
> > "
> > The warning was subsequently reverted in commit 400e22499dd9 ("mm: don't
> > warn about allocations which stall for too long") because it was
> > possible to generate memory pressure that would effectively stall
> > further progress through printk execution.
> > "
> >
Will do!
> >> @@ -4841,6 +4884,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >> if (current->flags & PF_MEMALLOC)
> >> goto nopage;
> >>
> >> + /* If allocation has taken excessively long, warn about it */
> >> + check_alloc_stall_warn(gfp_mask, ac->nodemask, order, alloc_start_time);
> >> +
> >> /* Try direct reclaim and then allocating */
> >> if (!compact_first) {
> >> page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags,
> >
> > Is there any specific reason for this placement? Compaction can take
> > quite some time as well.
>
> It seems fine to me - as long as the slowpath is retrying for 10 seconds
> and still can't obtain a page, there's a warning.
>
> We don't catch cases when either the get_page_from_freelist() attempt,
> direct compaction or direct reclaim attempt is what gets us over 10 seconds,
> and at the same time it results in success. If that's a concern, we should
> add another check_alloc_stall_warn() call under got_pg label (as the RFC
> had) - I'm not sure it's all achievable with a single place with the call.
>
Right, the big idea here is that at least one of the looping allocations
will run into check_alloc_stall_warn(). We might miss a borderline case,
but the 10 seconds is already arbitrary, and this is the placement where
commit 63f53dea0c98 ("mm: warn about allocations which stall for too
long") ended up before it was reverted.
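
To illustrate the placement trade-off, here is a rough userspace model of a
retry loop with the check at the top of each retry and nothing at the success
exit. It is only a sketch and not the code from the patch: stall_state,
stall_init(), stall_check(), simulate_slowpath() and STALL_TIMEOUT_MS are
hypothetical names, and re-arming the timeout after each warning is an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: the 10 seconds discussed above, in milliseconds. */
#define STALL_TIMEOUT_MS 10000ULL

/* Hypothetical per-allocation bookkeeping; not a kernel structure. */
struct stall_state {
	uint64_t start_ms;     /* when the slowpath was entered */
	uint64_t next_warn_ms; /* earliest time the next warning may fire */
};

static void stall_init(struct stall_state *s, uint64_t now_ms)
{
	s->start_ms = now_ms;
	s->next_warn_ms = now_ms + STALL_TIMEOUT_MS;
}

/*
 * Return true if a warning is due at now_ms. The timeout is re-armed after
 * each warning (an assumption) so a long stall warns once per interval.
 */
static bool stall_check(struct stall_state *s, uint64_t now_ms)
{
	if (now_ms < s->next_warn_ms)
		return false;
	s->next_warn_ms = now_ms + STALL_TIMEOUT_MS;
	return true;
}

/*
 * Model a retry loop in which every attempt takes iter_ms and the attempt
 * numbered succeed_at (1-based) obtains a page. The check runs before each
 * attempt; there is no check on the success exit. Returns the number of
 * warnings that would have been emitted.
 */
static unsigned int simulate_slowpath(uint64_t iter_ms, unsigned int succeed_at)
{
	struct stall_state s;
	uint64_t now_ms = 0;
	unsigned int warnings = 0;

	stall_init(&s, now_ms);
	for (unsigned int attempt = 1; ; attempt++) {
		if (stall_check(&s, now_ms))
			warnings++;
		now_ms += iter_ms; /* stands in for a reclaim/compaction attempt */
		if (attempt == succeed_at)
			break; /* success exit: no stall check here */
	}
	return warnings;
}
```

In this model, a single 15 second attempt that succeeds emits no warning
(the borderline case), while an allocation that is still retrying after the
threshold does warn on its next pass through the loop.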