Re: [RFC 1/3] mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc

From: NeilBrown
Date: Mon Oct 18 2021 - 20:44:19 EST


On Mon, 18 Oct 2021, Michal Hocko wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
>
> vmalloc historically hasn't supported GFP_NO{FS,IO} requests because
> page table allocations do not support an externally provided gfp mask
> and perform GFP_KERNEL-like allocations.
>
> For a few years now we have had scope APIs
> (memalloc_no{fs,io}_{save,restore}) that enforce NOFS and NOIO
> constraints implicitly on all allocators within the scope. There was a
> hope that those scopes would be defined at a higher level, where the
> reclaim recursion boundary starts and stops (e.g. where a lock needed
> during memory reclaim is taken). It seems
> that not all NOFS/NOIO users have adopted this approach and instead
> they have taken a workaround approach to wrap a single [k]vmalloc
> allocation by a scope API.
>
> These workarounds serve neither the purpose of better reclaim recursion
> documentation nor the reduction of explicit GFP_NO{FS,IO} usage, so let's
> just provide them with the semantics they are asking for without a need
> for workarounds.
>
> Add support for GFP_NOFS and GFP_NOIO to vmalloc directly. All internal
> allocations already comply with the given gfp_mask. The only current
> exception is vmap_pages_range which maps kernel page tables. Infer the
> proper scope API based on the given gfp mask.
>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> ---
> mm/vmalloc.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d77830ff604c..7455c89598d3 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2889,6 +2889,8 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> unsigned long array_size;
> unsigned int nr_small_pages = size >> PAGE_SHIFT;
> unsigned int page_order;
> + unsigned int flags;
> + int ret;
>
> array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
> gfp_mask |= __GFP_NOWARN;
> @@ -2930,8 +2932,24 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> goto fail;
> }
>
> - if (vmap_pages_range(addr, addr + size, prot, area->pages,
> - page_shift) < 0) {
> + /*
> + * page tables allocations ignore external gfp mask, enforce it
> + * by the scope API
> + */
> + if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> + flags = memalloc_nofs_save();
> + else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))

I would *much* rather this were written

else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)

so that the comparison with the previous test is more obvious. Ditto
for similar code below.
It could even be

switch (gfp_mask & (__GFP_FS | __GFP_IO)) {
case __GFP_IO: flags = memalloc_nofs_save(); break;
case 0: flags = memalloc_noio_save(); break;
}

But I'm not completely convinced that is an improvement.
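
To make the mapping easier to check in isolation, here is a minimal
user-space sketch of the switch form. It is not kernel code: the
SKETCH_GFP_* bit values, the enum and sketch_scope_for() are names made
up for illustration, and the real __GFP_IO/__GFP_FS definitions live in
the kernel headers. It only shows which scope each combination of the
two bits would select, including the cases where no scope is entered.

```c
/*
 * Illustration only. These bit values are stand-ins chosen for the
 * sketch; they are NOT the kernel's __GFP_IO / __GFP_FS values.
 */
#define SKETCH_GFP_IO 0x1u
#define SKETCH_GFP_FS 0x2u

/* Hypothetical result type: which memalloc scope would be entered. */
enum sketch_scope {
	SKETCH_SCOPE_NONE,	/* no scope: mask is left as given */
	SKETCH_SCOPE_NOFS,	/* would call memalloc_nofs_save() */
	SKETCH_SCOPE_NOIO,	/* would call memalloc_noio_save() */
};

/* Mirror the two tests in the patch, written as a single switch. */
static enum sketch_scope sketch_scope_for(unsigned int gfp_mask)
{
	switch (gfp_mask & (SKETCH_GFP_FS | SKETCH_GFP_IO)) {
	case SKETCH_GFP_IO:	/* IO allowed, FS not: NOFS request */
		return SKETCH_SCOPE_NOFS;
	case 0:			/* neither allowed: NOIO request */
		return SKETCH_SCOPE_NOIO;
	default:		/* FS set (with or without IO): no scope */
		return SKETCH_SCOPE_NONE;
	}
}
```

The default case is the same "do nothing" path that the if/else-if
chain takes when neither test matches, so the two forms should select
the same scope for every mask.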

In terms of functionality this looks good.
Thanks,
NeilBrown


> + flags = memalloc_noio_save();
> +
> + ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> + page_shift);
> +
> + if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> + memalloc_nofs_restore(flags);
> + else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
> + memalloc_noio_restore(flags);
> +
> + if (ret < 0) {
> warn_alloc(gfp_mask, NULL,
> "vmalloc error: size %lu, failed to map pages",
> area->nr_pages * PAGE_SIZE);
> --
> 2.30.2
>
>