Re: [PATCH v2] module: replace module_layout with module_memory
From: Song Liu
Date: Thu Jan 26 2023 - 15:02:03 EST
Hi Luis,
Thanks for your kind review!
On Wed, Jan 25, 2023 at 10:06 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
[...]
> >
> > MOD_MEM_TYPE_TEXT,
> > MOD_MEM_TYPE_DATA,
> > MOD_MEM_TYPE_RODATA,
> > MOD_MEM_TYPE_RO_AFTER_INIT,
> > MOD_MEM_TYPE_INIT_TEXT,
> > MOD_MEM_TYPE_INIT_DATA,
> > MOD_MEM_TYPE_INIT_RODATA,
> >
> > and allocating them separately.
>
> First thanks for doing this work!
>
> This seems to not acknolwedge the original goal of the first module_layout and
> the latched rb-tree use, and that was was for speeding up __module_address()
> since it *can* even be triggered on NMIs. I say this because the first question
> that comes to me is the impact to performance on __module_address() I can't
> see that changing much here, but mention it as it similar consideration
> should be made in case future changes modify this path.
To make sure I understand this correctly. Do you mean we need something like
the following in the commit log?
"""
This adds slightly more entries to mod_tree (from up to 3 entries per module, to
up to 7 entries per module). However, this at most adds a small constant
overhead to __module_address(), which is expected to be fast.
"""
>
> Microbenching something so trivial as __module_address() may not be as useful
> for an idle system, at the very least being able to compare before and after
> even on idle may be useful *if* you eventually do some more radical changes
> here. Modules-related kernel/kallsyms_selftest.c did that for kallsyms_lookup_name()
> and friend just recently for a minor performance enhancement.
kernel/kallsyms_selftest.c is new to me. I will give it a try.
>
> At a high level it is perhaps my only conern so far while reviewing this patch,
> and I am glad it is now clear that addressing modules is a requirement. The rest
> seems like a highly welcomed cleanup even though the diffstat does not reflect
> that.
>
> > Various archs use module_layout for different data. These data are put
> > into different module_memory based on their location in module_layout.
> > IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
> > data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.
>
> I think the commit log should document a bit the rationale for why
> ARCH_WANTS_MODULES_DATA_IN_VMALLOC gets *can* tidied up so well here.
Great point! Will add this in v3.
>
> > Signed-off-by: Song Liu <song@xxxxxxxxxx>
> > Cc: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >
> > ---
> >
> > This is the preparation work for the type aware module_alloc() discussed
> > in [1]. While this work is not covered much in the discussion, it is a
> > critical step of the effort.
> >
> > As this part grows pretty big (~1000 lines, + and -), I would like get
> > some feedback on it, so that I know it is on the right track.
> >
> > Please share your comments. Thanks!
> >
> > Test coverage: Tested on x86_64.
>
> I will likely merge this onto modules-next soon, not because I think it is
> ready, but just because I think it *is* mostly ready and the next thing
> we need is exposure and testing. rc5 is pretty late to consider this
> for v6.3 and so hopefully for this cycle we can at least settle on
> something which will sit in linux-next since the respective linux-next
> after v6.3-rc1 is released.
Yes, this definitely needs more tests. Given different archs use
module_layout in all sorts of ways. I will be very surprised if I updated
all them correctly (though I tried hard to).
>
> > Build tested by kernel test bot in [2]. The only regression in [2] was a
> > typo in parisc, which is also fixed.
> >
> > [1] https://lore.kernel.org/linux-mm/20221107223921.3451913-1-song@xxxxxxxxxx/T/#u
>
> You still never addressed my performance suggestions so don't be
> surprised if I insist later. Yes you can use existing performance
> benchmarks, specially now with modules as a hard requirement, to
> show gains. So I'd like to clarify that if I'm reviewing late it is
> because:
>
> a) my modules patch review queue has been high as of late
> b) you seem to not have taken these performance suggestions into consideration
> before and so I tend to put it at my end of my queue for review.
I think it will be a lot easier to run performance tests with the
module support. Let's see what we can do when we get to the
performance test part.
[...]
>
> > @@ -1389,16 +1466,29 @@ unsigned int __weak arch_mod_section_prepend(struct module *mod,
> > return 0;
> > }
> >
> > -/* Update size with this section: return offset. */
> > -long module_get_offset(struct module *mod, unsigned int *size,
> > - Elf_Shdr *sechdr, unsigned int section)
> > +/*
> > + * Use highest 4 bits of sh_entsize to store the mod_mem_type of this
> > + * section. This leaves 28 bits for offset on 32-bit systems, which is
> > + * about 256 MiB (WARN_ON_ONCE if we exceed that).
> > + */
> > +
> > +#define SH_ENTSIZE_TYPE_BITS 4
> > +#define SH_ENTSIZE_TYPE_SHIFT (BITS_PER_LONG - SH_ENTSIZE_TYPE_BITS)
> > +#define SH_ENTSIZE_TYPE_MASK ((1UL << SH_ENTSIZE_TYPE_BITS) - 1)
> > +#define SH_ENTSIZE_OFFSET_MASK ((1UL << (BITS_PER_LONG - SH_ENTSIZE_TYPE_BITS)) - 1)
> > +
> > +long module_get_offset_and_type(struct module *mod, enum mod_mem_type type,
> > + Elf_Shdr *sechdr, unsigned int section)
> > {
> > - long ret;
> > + long offset;
> > + long mask = ((unsigned long)(type) & SH_ENTSIZE_TYPE_MASK) << SH_ENTSIZE_TYPE_SHIFT;
> >
> > - *size += arch_mod_section_prepend(mod, section);
> > - ret = ALIGN(*size, sechdr->sh_addralign ?: 1);
> > - *size = ret + sechdr->sh_size;
> > - return ret;
> > + mod->mod_mem[type].size += arch_mod_section_prepend(mod, section);
> > + offset = ALIGN(mod->mod_mem[type].size, sechdr->sh_addralign ?: 1);
> > + mod->mod_mem[type].size = offset + sechdr->sh_size;
> > +
> > + WARN_ON_ONCE(offset & mask);
> > + return offset | mask;
> > }
>
> Could you split this somehow with the mask stuff up into its own
> non-functional patch first somehow to make this easier to review?
>
> No worries if you can't.
This is not very straightforward with INIT_OFFSET_MASK. So I will keep
it as-is for now. Btw, while looking into this, I found a leftover
INIT_OFFSET_MASK that I will remove in v3.
>
> > @@ -2117,74 +2193,45 @@ static int find_module_sections(struct module *mod, struct load_info *info)
> >
> > static int move_module(struct module *mod, struct load_info *info)
> > {
> > + enum mod_mem_type type;
> > int i;
> > void *ptr;
> >
> > - /* Do the allocs. */
> > - ptr = module_alloc(mod->core_layout.size);
> > - /*
> > - * The pointer to this block is stored in the module structure
> > - * which is inside the block. Just mark it as not being a
> > - * leak.
> > - */
> > - kmemleak_not_leak(ptr);
> > - if (!ptr)
> > - return -ENOMEM;
> > -
> > - memset(ptr, 0, mod->core_layout.size);
> > - mod->core_layout.base = ptr;
> > + for_each_mod_mem_type(type) {
> > + if (!mod->mod_mem[type].size) {
> > + mod->mod_mem[type].base = NULL;
> > + continue;
> > + }
> > + mod->mod_mem[type].size = PAGE_ALIGN(mod->mod_mem[type].size);
> > + if (mod_mem_use_vmalloc(type))
> > + ptr = vzalloc(mod->mod_mem[type].size);
> > + else
> > + ptr = module_alloc(mod->mod_mem[type].size);
>
> This form to check for mod_mem_use_vmalloc() is used twice, how about
> a helper to just do it one line for us?
Good idea! Will update in v3.
>
> > @@ -2195,6 +2242,14 @@ static int move_module(struct module *mod, struct load_info *info)
> > }
> >
> > return 0;
> > +out_enomem:
> > + for (type--; type >= 0; type--) {
> > + if (mod_mem_use_vmalloc(type))
> > + vfree(mod->mod_mem[type].base);
> > + else
> > + module_memfree(mod->mod_mem[type].base);
> > + }
> > + return -ENOMEM;
> > }
>
> Here's the other user.
>
> > diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
>
> This was the hardest file to review and so I ask the same, if a separate
> non-functional change could be done first to make the changes easier
> to review.
AFAICT, strict_rwx.c is pretty straightforward after the change. And it is
just ~100 LOC. The non-functional change is very likely to be more
complicated. So I will keep this part as-is.
>
> No worries if not.
>
> Reviewed-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
I will fix these, rebase, and send v3.
Thanks again!
Song