Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP

From: Song Liu
Date: Fri Apr 15 2022 - 22:07:46 EST


On Fri, Apr 15, 2022 at 12:05 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
> On Fri, Apr 15, 2022 at 09:44:09AM -0700, Song Liu wrote:
> > Changes v3 => v4:
> > 1. Fix __weak module_alloc_huge; remove unused vmalloc_huge; rename
> > __vmalloc_huge => vmalloc_huge. (Christoph Hellwig)
> > 2. Use vzalloc (as it was before vmalloc_no_huge) and clean up comments in
> > kvm_s390_pv_alloc_vm.
> >
> > Changes v2 => v3:
> > 1. Use __vmalloc_huge in alloc_large_system_hash.
> > 2. Use EXPORT_SYMBOL_GPL for new functions. (Christoph Hellwig)
> > 3. Add more description about the issues and changes.(Christoph Hellwig,
> > Rick Edgecombe).
> >
> > Changes v1 => v2:
> > 1. Add vmalloc_huge(). (Christoph Hellwig)
> > 2. Add module_alloc_huge(). (Christoph Hellwig)
> > 3. Add Fixes tag and Link tag. (Thorsten Leemhuis)
> >
> > Enabling HAVE_ARCH_HUGE_VMALLOC on x86_64 and use it for bpf_prog_pack has
> > caused some issues [1], as many users of vmalloc are not yet ready to
> > handle huge pages. To enable a more smooth transition to use huge page
> > backed vmalloc memory, this set replaces VM_NO_HUGE_VMAP flag with an new
> > opt-in flag, VM_ALLOW_HUGE_VMAP. More discussions about this topic can be
> > found at [2].
> >
> > Patch 1 removes VM_NO_HUGE_VMAP and adds VM_ALLOW_HUGE_VMAP.
> > Patch 2 uses VM_ALLOW_HUGE_VMAP in bpf_prog_pack.
> >
> > [1] https://lore.kernel.org/lkml/20220204185742.271030-1-song@xxxxxxxxxx/
> > [2] https://lore.kernel.org/linux-mm/20220330225642.1163897-1-song@xxxxxxxxxx/
>
> Looks good except for that I think this should just wait for v5.19. The
> fixes are so large I can't see why this needs to be rushed in other than
> the first assumptions of the optimizations had some flaws addressed here.

We need these changes to fix issues like [3]. Note that there might
still be some
undiscovered issues with huge page backed vmalloc memory on powerpc, which
had HAVE_ARCH_HUGE_VMALLOC enabled since the 5.15 kernel. As we
agreed, the new opt-in flag is a safer approach here. We probably should have
1/4 and 2/4 back ported to stable. Therefore, I think shipping this
set now would
give us a more reliable 5.18 release.

Does this make sense?

Thanks,
Song

[3] https://lore.kernel.org/lkml/14444103-d51b-0fb3-ee63-c3f182f0b546@xxxxxxxxxxxxx/