Re: [PATCH] mm: kvmalloc: make kmalloc fast path real fast path

From: Darrick J. Wong
Date: Fri Apr 04 2025 - 11:33:38 EST


On Thu, Apr 03, 2025 at 09:21:50AM -0700, Kees Cook wrote:
> On Thu, Apr 03, 2025 at 09:43:39AM +0200, Michal Hocko wrote:
> > There are users like xfs which need larger allocations with NOFAIL
> > semantics. They are not using kvmalloc currently because the current
> > implementation tries too hard to allocate through the kmalloc path
> > which causes a lot of direct reclaim and compaction and that hurts
> > performance a lot (see 8dc9384b7d75 ("xfs: reduce kvmalloc overhead for
> > CIL shadow buffers") for more details).
> >
> > kvmalloc does support the __GFP_RETRY_MAYFAIL semantics to express that
> > a kmalloc (physically contiguous) allocation is preferred and that we
> > should try harder to make it happen. There is currently no way to
> > express that kmalloc should be very lightweight. As has been argued [1],
> > this mode should be the default, so that kvmalloc(NOFAIL) gets a
> > lightweight kmalloc path. That is currently impossible to express
> > because __GFP_NOFAIL cannot be combined with any other reclaim
> > modifiers.
> >
> > This patch makes all kmalloc allocations GFP_NOWAIT unless
> > __GFP_RETRY_MAYFAIL is provided to kvmalloc. This makes it possible to
> > support both fail-fast and retry-hard behavior on physically contiguous
> > memory, with a vmalloc fallback.
> >
> > There is a potential downside that relatively small allocations (smaller
> > than PAGE_ALLOC_COSTLY_ORDER) could fall back to vmalloc too easily and
> > cause page block fragmentation. We cannot really rule that out, but the
> > existing xlog_cil_kvmalloc use does not indicate that this is happening.
> >
> > [1] https://lore.kernel.org/all/Z-3i1wATGh6vI8x8@xxxxxxxxxxxxxxxxxxx/T/#u
> > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>
> Thanks for finding a solution for this! It makes way more sense to me to
> kick over to vmap by default for kvmalloc users.

Are 32-bit kernels still constrained by a small(ish) vmalloc space?
It's all fine for xlog_kvmalloc which will continue looping until
something makes progress, but tuning for those platforms isn't a
priority for most xfs developers AFAIK.
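
To illustrate what "continue looping until something makes progress"
means, here is a minimal user-space sketch. It is not the xfs code:
try_contiguous(), try_virtual(), alloc_until_success() and the
fallback_attempts counter are hypothetical stand-ins, and the simulated
failures exist only so that the loop has something to retry.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts how often the fallback path was tried (illustration only). */
static int fallback_attempts;

/*
 * Hypothetical stand-in for a lightweight, no-reclaim contiguous
 * allocation attempt. It always fails here to force the fallback.
 */
static void *try_contiguous(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Hypothetical stand-in for the virtually contiguous fallback. It is
 * made to fail twice before succeeding, to mimic a constrained space.
 */
static void *try_virtual(size_t size)
{
	if (++fallback_attempts < 3)
		return NULL;
	return malloc(size);
}

/* Keep retrying both paths until one of them returns memory. */
static void *alloc_until_success(size_t size)
{
	void *p;

	do {
		p = try_contiguous(size);
		if (!p)
			p = try_virtual(size);
	} while (!p);

	return p;
}
```

A caller of this shape never sees a NULL return, so a small fallback
space shows up as extra loop iterations rather than as a failure.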

--D

> > ---
> > mm/slub.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index b46f87662e71..2da40c2f6478 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -4972,14 +4972,16 @@ static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
> > * We want to attempt a large physically contiguous block first because
> > * it is less likely to fragment multiple larger blocks and therefore
> > * contribute to a long term fragmentation less than vmalloc fallback.
> > - * However make sure that larger requests are not too disruptive - no
> > - * OOM killer and no allocation failure warnings as we have a fallback.
> > + * However make sure that larger requests are not too disruptive - i.e.
> > + * do not direct reclaim unless physically continuous memory is preferred
> > + * (__GFP_RETRY_MAYFAIL mode). We still kick in kswapd/kcompactd to start
> > + * working in the background but the allocation itself.
>
> I think a word is missing here? "...but do the allocation..." or
> "...allocation itself happens" ?
>
> --
> Kees Cook
>
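
For reference, the flag adjustment described in the quoted commit message
and comment can be modelled in user space roughly as follows. This is a
sketch under stated assumptions, not the mm/slub.c code: model_gfp_t, the
MODEL_* bit values and model_kmalloc_gfp_adjust() are made up for
illustration and do not match the kernel's gfp_t definitions.

```c
#include <stdint.h>

/* Hypothetical flag type and bit values; NOT the kernel's gfp_t bits. */
typedef uint32_t model_gfp_t;

#define MODEL_DIRECT_RECLAIM	(1u << 0)	/* caller may reclaim directly */
#define MODEL_KSWAPD_RECLAIM	(1u << 1)	/* may wake background reclaim */
#define MODEL_RETRY_MAYFAIL	(1u << 2)	/* contiguous memory preferred */

/*
 * Model of the described behavior: drop direct reclaim for the
 * contiguous attempt unless the caller asked to retry hard. The
 * background reclaim bit is left untouched in both cases.
 */
static model_gfp_t model_kmalloc_gfp_adjust(model_gfp_t flags)
{
	if (!(flags & MODEL_RETRY_MAYFAIL))
		flags &= ~MODEL_DIRECT_RECLAIM;

	return flags;
}
```

With these model bits, a request carrying only the two reclaim bits comes
back with just the background reclaim bit set, while a request that also
sets MODEL_RETRY_MAYFAIL is returned unchanged.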