Re: [PATCH] mm/vmalloc: do not output a spurious warning when huge vmalloc() fails

From: Lorenzo Stoakes
Date: Tue Jun 06 2023 - 04:25:12 EST


On Tue, Jun 06, 2023 at 10:17:02AM +0200, Uladzislau Rezki wrote:
> On Tue, Jun 06, 2023 at 09:13:24AM +0200, Vlastimil Babka wrote:
> >
> > On 6/5/23 22:11, Lorenzo Stoakes wrote:
> > > In __vmalloc_area_node() we always warn_alloc() when an allocation
> > > performed by vm_area_alloc_pages() fails unless it was due to a pending
> > > fatal signal.
> > >
> > > However, huge page allocations instigated either by vmalloc_huge() or
> > > __vmalloc_node_range() (or a caller that invokes this like kvmalloc() or
> > > kvmalloc_node()) always falls back to order-0 allocations if the huge page
> > > allocation fails.
> > >
> > > This renders the warning useless and noisy, especially as all callers
> > > appear to be aware that this may fallback. This has already resulted in at
> > > least one bug report from a user who was confused by this (see link).
> > >
> > > Therefore, simply update the code to only output this warning for order-0
> > > pages when no fatal signal is pending.
> > >
> > > Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
> > > Signed-off-by: Lorenzo Stoakes <lstoakes@xxxxxxxxx>
> >
> > I think there are more reports of same thing from the btrfs context, that
> > appear to be a 6.3 regression
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=217466
> > Link: https://lore.kernel.org/all/efa04d56-cd7f-6620-bca7-1df89f49bf4b@xxxxxxxxx/
> >
> I had a look at that report. The btrfs complains due to the
> fact that a high-order page(1 << 9) can not be obtained. In the
> vmalloc code we do not fall to 0-order allocator if there is
> a request of getting a high-order.

This isn't true, we _do_ fallback to order-0 (this is the basis of my patch), in
__vmalloc_node_range():-

/* Allocate physical pages and map them into vmalloc space. */
ret = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
if (!ret)
goto fail;

...

fail:
if (shift > PAGE_SHIFT) {
shift = PAGE_SHIFT;
align = real_align;
size = real_size;
goto again;
}

With the order being derived from shift, and __vmalloc_area_node() only being
called from __vmalloc_node_range().

>
> I provided a patch to fallback if a high-order. A reproducer, after
> applying the patch, started to get oppses in another places.
>
> IMO, we should fallback even for high-order requests. Because it is
> highly likely it can not be accomplished.
>
> Any thoughts?
>
> <snip>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 31ff782d368b..7a06452f7807 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2957,14 +2957,18 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> page = alloc_pages(alloc_gfp, order);
> else
> page = alloc_pages_node(nid, alloc_gfp, order);
> +
> if (unlikely(!page)) {
> - if (!nofail)
> - break;
> + if (nofail)
> + alloc_gfp |= __GFP_NOFAIL;
>
> - /* fall back to the zero order allocations */
> - alloc_gfp |= __GFP_NOFAIL;
> - order = 0;
> - continue;
> + /* Fall back to the zero order allocations. */
> + if (order || nofail) {
> + order = 0;
> + continue;
> + }
> +
> + break;
> }
>
> /*
> <snip>
>
>
>
> --
> Uladzislau Rezki

I saw that, it seems to be duplicating the same thing as the original fallback
code is (which was originally designed to permit higher order non-__GFP_NOFAIL
allocations before trying order-0 __GFP_NOFAIL).

I don't think it is really useful to change this as it confuses that logic and
duplicates something we already do.

Honestly though moreover I think this whole area needs some refactoring.