Re: [RFC PATCH 1/5] mm: introduce __GFP_UNMAPPED and unmapped_alloc()

From: Mike Rapoport
Date: Wed Mar 29 2023 - 03:31:12 EST


On Tue, Mar 28, 2023 at 05:24:49PM +0200, Michal Hocko wrote:
> On Tue 28-03-23 18:11:20, Mike Rapoport wrote:
> > On Tue, Mar 28, 2023 at 09:39:37AM +0200, Michal Hocko wrote:
> [...]
> > > OK, so you want to reduce that direct map fragmentation?
> >
> > Yes.
> >
> > > Is that a real problem?
> >
> > A while ago Intel folks published a report [1] that showed better
> > performance with large pages in the direct map for the majority of
> > benchmarks.
> >
> > > My impression is that modules are mostly a static thing. BPF
> > > might be a different thing though. I have a recollection that BPF guys
> > > were dealing with direct map fragmentation as well.
> >
> > Modules are indeed static, but module_alloc() is used by anything that
> > allocates code pages, e.g. kprobes, ftrace and BPF. Besides, Thomas
> > mentioned that having code in 2M pages reduces iTLB pressure [2], but
> > that's not only about avoiding the splits in the direct map but also about
> > using large mappings in the modules address space.
> >
> > BPF guys suggested an allocator for executable memory [3] mainly because
> > they've seen a performance improvement of 0.6% - 0.9% in their setups [4].
>
> These are fair arguments and it would have been better to have them in
> the RFC. Also it is not really clear to me what the actual benefit of
> the unmapping is for those usecases. I do get that they want to benefit
> from caching with the same permission setup, but do they need the
> unmapping as well?

The pages allocated with module_alloc() get different permissions depending
on whether they belong to text, rodata, data etc. The permissions are
updated both in the vmalloc address space and in the direct map, and the
updates to the direct map cause splits of the large pages there. If we
cache large pages as unmapped, we take the entire 2M page out of the direct
map, and if/when it becomes free again it can be returned to the direct map
as a single 2M page.
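
To make this concrete, here is a minimal sketch of what happens today
('size' and 'insns' are just placeholders, and the calls are the usual x86
set_memory_*() helpers):

	unsigned int npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
	void *text = module_alloc(size);	/* 4K pages in vmalloc space, mapped RW */

	memcpy(text, insns, size);

	/*
	 * The protection change is applied to the vmalloc mapping and to the
	 * direct-map alias of every 4K page, splitting the 2M direct-map
	 * entries that cover them.
	 */
	set_memory_ro((unsigned long)text, npages);
	set_memory_x((unsigned long)text, npages);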

Generally, the unmapped allocations are intended for use-cases that map the
memory somewhere other than the direct map anyway and need to modify the
direct mapping of that memory, be it module_alloc(), secretmem, PKS page
tables or maybe even some of the encrypted VM memory.
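
As a strawman for how a caller would use this (just a sketch, error handling
omitted, and PAGE_KERNEL_ROX is only for illustration):

	/* the page comes back already not present in the direct map */
	struct page *page = alloc_pages(GFP_KERNEL | __GFP_UNMAPPED, 0);

	/* ... and is mapped only where it is actually used, e.g.: */
	void *va = vmap(&page, 1, VM_MAP, PAGE_KERNEL_ROX);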

> > > > If we were to use unmapped_pages_alloc() in modules_alloc(), we would have
> > > > to implement the part of vmalloc() that reserves the virtual addresses and
> > > > maps the allocated memory there in module_alloc().
> > >
> > > Another option would be to provide an allocator for the backing pages to
> > > vmalloc. But I do agree that a gfp flag is a less laborious way to
> > > achieve the same. So the primary question really is whether we really
> > > need vmalloc support for unmapped memory.
> >
> > I'm not sure I follow here. module_alloc() is essentially an alias to
> > vmalloc(), so to reduce direct map fragmentation caused by code allocations
> > the most sensible way IMO is to support unmapped memory in vmalloc().
>
> What I meant to say is that vmalloc is currently using the page
> allocator (resp bulk allocator) for the backing storage. I can imagine
> an extension to replace this default allocator by something else (e.g. a
> given allocation function). This would be more generic and it would
> allow different usecases (e.g. benefit from caching without doing the
> actual unmapping).

The whole point of the unmapped cache is to allow non-default permissions
in the direct map without splitting large pages there. So a cache that
unmaps large pages in the direct map and then hands out subpages of those
pages seems like the most logical thing to do. With the current use cases
the callers map these pages at different virtual addresses anyway, i.e. in
the page cache or in vmalloc space.
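
Roughly, the cache behaves like this (just an illustration of the idea, not
the actual unmapped_alloc() implementation; refill and free paths are
omitted and the helper name is made up):

	static struct page *block;	/* current PMD-size block */
	static unsigned int next;

	struct page *unmapped_get_page(void)
	{
		unsigned int nr = 1 << (PMD_SHIFT - PAGE_SHIFT);

		if (!block) {
			block = alloc_pages(GFP_KERNEL, PMD_SHIFT - PAGE_SHIFT);
			if (!block)
				return NULL;
			/*
			 * The whole 2M range is dropped from the direct map in
			 * one go, so the 2M entry is cleared, not split.
			 */
			set_memory_np((unsigned long)page_address(block), nr);
			next = 0;
		}

		/* hand out 4K subpages of the unmapped block */
		return next < nr ? block + next++ : NULL;
	}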

> > I also think vmalloc with unmapped pages can provide backing pages for
> > the execmem_alloc() that Song proposed.
>
> Why would you need to have execmem_alloc have its memory virtually
> mapped into vmalloc space?

Currently all the memory allocated for code is managed in a subset of the
vmalloc() space. The intention of execmem_alloc() was to replace
module_alloc() for the code pages, so it's natural that it would use the
same virtual ranges.
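
For reference, on x86 module_alloc() boils down to roughly this (simplified,
the exact flags vary between kernel versions), i.e. the code ranges are just
a carved-out piece of the vmalloc space:

	p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR, MODULES_END,
				 GFP_KERNEL, PAGE_KERNEL, VM_FLUSH_RESET_PERMS,
				 NUMA_NO_NODE, __builtin_return_address(0));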

But anyway, execmem_alloc() is a long shot as it also requires quite a bit
of refactoring of the module loading code.

> > [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@xxxxxxxxxxxxxxx/
> > [2] https://lore.kernel.org/all/87mt86rbvy.ffs@tglx/
> > [3] https://lore.kernel.org/all/20221107223921.3451913-1-song@xxxxxxxxxx/
> > [4] https://lore.kernel.org/bpf/20220707223546.4124919-1-song@xxxxxxxxxx/
> >
> > --
> > Sincerely yours,
> > Mike.
>
> --
> Michal Hocko
> SUSE Labs

--
Sincerely yours,
Mike.