Re: [RFC v2 4/4] vmalloc_exec: share a huge page with kernel text

From: Edgecombe, Rick P
Date: Tue Oct 11 2022 - 16:41:07 EST


On Tue, 2022-10-11 at 16:25 +0000, Song Liu wrote:
> > Maybe this is just me missing some vmalloc understanding, but this
> > pointer to an all zero vm_struct seems weird too. Are there other vmap
> > allocations like this? Which vmap APIs work with this and which
> > don't?
>
> There are two vmap trees at the moment: free_area_ tree and
> vmap_area_ tree. free_area_ tree uses vmap->subtree_max_size, while
> vmap_area_ tree contains vmap backed by vm_struct, and thus uses
> vmap->vm.
>
> This set adds a new tree, free_text_area_. This tree is different to
> the other two, as it uses subtree_max_size, and it is also backed
> by vm_struct. To handle this requirement without growing vmap_struct,
> we introduced all_text_vm to store the vm_struct for free_text_area_
> tree.
>
> free_text_area_ tree is different to vmap_area_ tree. Each vmap in
> vmap_area_ tree has its own vm_struct (1 to 1 mapping), while
> multiple vmap in free_text_area_ tree map to a single vm_struct.
>
> Also, free_text_area_ handles granularity < PAGE_SIZE; while the
> other two trees only work with PAGE_SIZE aligned memory.
>
> Does this answer your questions?

I mean from the perspective of someone trying to use this without
diving into the entire implementation.

The function is called vmalloc_exec() and its memory is freed with
vfree_exec(). Makes sense. But with the other vmalloc_foo()s (including
previous vmalloc_exec() implementations) you can call find_vm_area(),
etc. on them. They show up in "vmallocinfo" and generally behave
similarly. That isn't true for these new allocations, right?

Then you have code that operates on module text like:
	if (is_vmalloc_or_module_addr(addr))
		pfn = vmalloc_to_pfn(addr);

It looks like it would work (on x86 at least). Should it be expected
to?

Especially after this patch, where there is memory that isn't even
tracked by the original vmap_area trees, it is pretty much a separate
allocator. So I think it might be nice to spell out which other vmalloc
APIs work with these new functions since they are named "vmalloc".
Maybe just say none of them do.
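
To make the concern concrete, here is a toy userspace model, not kernel
code. Every toy_* name, address and size is made up for illustration, and
it assumes the new allocations really are not entered into the vmap_area
tree. It only shows why a find_vm_area()-style lookup that walks one tree
would miss memory handed out by a separate sub-page allocator sharing a
single vm_struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins only; these are NOT the kernel's real types. */
struct toy_vm_struct {
	uintptr_t addr;
	size_t size;
};

struct toy_vmap_area {
	uintptr_t va_start;
	uintptr_t va_end;
	struct toy_vm_struct *vm;	/* 1:1 with the area in this model */
};

#define TOY_MAX_AREAS 8

/* Models the vmap_area_ tree: each entry owns its own vm_struct. */
static struct toy_vm_struct toy_vms[TOY_MAX_AREAS];
static struct toy_vmap_area toy_vmap_areas[TOY_MAX_AREAS];
static size_t toy_nr_vmap_areas;

/* Models all_text_vm: one vm_struct shared by every text chunk. */
static struct toy_vm_struct toy_all_text_vm = { 0x100000, 0x200000 };
static uintptr_t toy_text_next = 0x100000;

/* Ordinary allocation: recorded in the lookup table. */
uintptr_t toy_vmalloc(uintptr_t addr, size_t size)
{
	size_t i = toy_nr_vmap_areas;

	assert(i < TOY_MAX_AREAS);
	toy_vms[i].addr = addr;
	toy_vms[i].size = size;
	toy_vmap_areas[i].va_start = addr;
	toy_vmap_areas[i].va_end = addr + size;
	toy_vmap_areas[i].vm = &toy_vms[i];
	toy_nr_vmap_areas++;
	return addr;
}

/*
 * Text allocation: a sub-page bump allocation out of the shared region.
 * Deliberately not recorded in toy_vmap_areas (the assumption above).
 */
uintptr_t toy_vmalloc_exec(size_t size)
{
	uintptr_t addr = toy_text_next;
	uintptr_t end = toy_all_text_vm.addr + toy_all_text_vm.size;

	assert(size <= end - toy_text_next);
	toy_text_next += size;
	return addr;
}

/* Models a find_vm_area()-style lookup: it only knows toy_vmap_areas. */
struct toy_vm_struct *toy_find_vm_area(uintptr_t addr)
{
	size_t i;

	for (i = 0; i < toy_nr_vmap_areas; i++) {
		if (addr >= toy_vmap_areas[i].va_start &&
		    addr < toy_vmap_areas[i].va_end)
			return toy_vmap_areas[i].vm;
	}
	return NULL;
}
```

In this model a caller that does toy_vmalloc() followed by
toy_find_vm_area() gets its vm_struct back, while the same lookup on an
address from toy_vmalloc_exec() returns NULL. Whether the real patch
behaves this way is exactly the question above.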


Separate from that, I guess you are planning to make this limited to
certain architectures? It might be better to put logic with assumptions
about x86 boot time page table details inside arch/x86 somewhere.