Re: [RFC v2 4/4] vmalloc_exec: share a huge page with kernel text

From: Edgecombe, Rick P
Date: Wed Oct 12 2022 - 14:38:53 EST


On Wed, 2022-10-12 at 05:37 +0000, Song Liu wrote:
> > Then you have code that operates on module text like:
> > 	if (is_vmalloc_or_module_addr(addr))
> > 		pfn = vmalloc_to_pfn(addr);
> >
> > It looks like it would work (on x86 at least). Should it be
> > expected to?
> >
> > Especially after this patch, where there is memory that isn't even
> > tracked by the original vmap_area trees, it is pretty much a separate
> > allocator. So I think it might be nice to spell out which other vmalloc
> > APIs work with these new functions since they are named "vmalloc".
> > Maybe just say none of them do.
>
> I guess it is fair to call this a separate allocator. Maybe
> vmalloc_exec is not the right name? I do think this is the best
> way to build an allocator with vmap tree logic.

Yeah, I don't know about the name. I think someone else suggested it
specifically, right?

I had called mine perm_alloc() so it could also handle read-only and
other permissions. If you keep vmalloc_exec() it needs some big
comments about which APIs can work with it, and an audit of the
existing code that works on module and JIT text.

> > Separate from that, I guess you are planning to make this limited to
> > certain architectures? It might be better to put logic with assumptions
> > about x86 boot time page table details inside arch/x86 somewhere.
>
> Yes, the architecture needs some text_poke mechanism to use this.

It also depends on the space between _etext and the PMD-aligned _etext
being present and not used by anything else. On other architectures,
there might be rodata or other things there.
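
To make that dependency concrete, here is a rough userspace sketch (not
kernel code, and not taken from the patch) of how much room sits between
the end of text and the end of its last huge page. It assumes 2 MiB PMD
mappings as on x86_64 with 4 KiB pages; SKETCH_PMD_SIZE,
sketch_align_up() and sketch_text_tail_gap() are made-up names for
illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 MiB PMD mappings, as on x86_64 with 4 KiB base pages. */
#define SKETCH_PMD_SIZE (UINT64_C(1) << 21)

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t sketch_align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Bytes between the end of kernel text (etext) and the end of the huge
 * page that contains it. This is the region the approach relies on
 * being mapped and otherwise unused.
 */
static uint64_t sketch_text_tail_gap(uint64_t etext)
{
	return sketch_align_up(etext, SKETCH_PMD_SIZE) - etext;
}
```

If _etext already happens to be PMD aligned the gap is zero, so there
would be nothing to share on that boot.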

> On BPF side, x86_64 calls this directly from arch code (jit engine),
> so it is mostly covered. For modules, we need to handle this better.

That old RFC has some ideas around this. I kind of like your
incremental approach though. To me it seems to be moving in the right
direction.