Re: [PATCH] jbd2: get rid of superfluous __GFP_REPEAT

From: Michal Hocko
Date: Thu Nov 26 2015 - 10:10:25 EST


Hi Ted,
are there any objections for the patch or should I just repost it?

On Mon 09-11-15 09:16:50, Michal Hocko wrote:
> On Sun 08-11-15 00:08:02, Theodore Ts'o wrote:
> > On Sat, Nov 07, 2015 at 10:22:55AM +0900, Tetsuo Handa wrote:
> > > All jbd2_alloc() callers seem to pass GFP_NOFS. Therefore, use of
> > > vmalloc() which implicitly passes GFP_KERNEL | __GFP_HIGHMEM can cause
> > > deadlock, can't it? This vmalloc(size) call needs to be replaced with
> > > __vmalloc(size, flags).
> >
> > jbd2_alloc is only passed in the bh->b_size, which can't be >
> > PAGE_SIZE, so the code path that calls vmalloc() should never get
> > called. When we conveted jbd2_alloc() to suppor sub-page size
> > allocations in commit d2eecb039368, there was an assumption that it
> > could be called with a size greater than PAGE_SIZE, but that's
> > certaily not true today.
>
> Thanks for the clarification. Then the patch can be simplified even
> more then.
> ---
> From fbf02c347dae8ee86e396bc769a88e85773db83e Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxxx>
> Date: Wed, 21 Oct 2015 17:14:49 +0200
> Subject: [PATCH] jbd2: get rid of superfluous __GFP_REPEAT
>
> jbd2_alloc is explicit about its allocation preferences wrt. the
> allocation size. Sub page allocations go to the slab allocator
> and larger are using either the page allocator or vmalloc. This
> is all good but the logic is unnecessarily complex.
> 1) as per Ted, the vmalloc fallback is a left-over:
> : jbd2_alloc is only passed in the bh->b_size, which can't be >
> : PAGE_SIZE, so the code path that calls vmalloc() should never get
> : called. When we conveted jbd2_alloc() to suppor sub-page size
> : allocations in commit d2eecb039368, there was an assumption that it
> : could be called with a size greater than PAGE_SIZE, but that's
> : certaily not true today.
> Moreover vmalloc allocation might even lead to a deadlock because
> the callers expect GFP_NOFS context while vmalloc is GFP_KERNEL.
> 2) Requests smaller than order-3 are go to the page allocator with
> __GFP_REPEAT. The flag doesn't do anything useful for those because they
> are smaller than PAGE_ALLOC_COSTLY_ORDER.
>
> Let's simplify the code flow and use the slab allocator for sub-page
> requests and the page allocator for others. Even though order > 0 is
> not currently used as per above leave that option open.
>
> Cc: "Theodore Ts'o" <tytso@xxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> ---
> fs/jbd2/journal.c | 32 +++++++-------------------------
> 1 file changed, 7 insertions(+), 25 deletions(-)
>
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 81e622681c82..0145e7978ab4 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -2299,18 +2299,10 @@ void *jbd2_alloc(size_t size, gfp_t flags)
>
> BUG_ON(size & (size-1)); /* Must be a power of 2 */
>
> - flags |= __GFP_REPEAT;
> - if (size == PAGE_SIZE)
> - ptr = (void *)__get_free_pages(flags, 0);
> - else if (size > PAGE_SIZE) {
> - int order = get_order(size);
> -
> - if (order < 3)
> - ptr = (void *)__get_free_pages(flags, order);
> - else
> - ptr = vmalloc(size);
> - } else
> + if (size < PAGE_SIZE)
> ptr = kmem_cache_alloc(get_slab(size), flags);
> + else
> + ptr = (void *)__get_free_pages(flags, get_order(size));
>
> /* Check alignment; SLUB has gotten this wrong in the past,
> * and this can lead to user data corruption! */
> @@ -2321,20 +2313,10 @@ void *jbd2_alloc(size_t size, gfp_t flags)
>
> void jbd2_free(void *ptr, size_t size)
> {
> - if (size == PAGE_SIZE) {
> - free_pages((unsigned long)ptr, 0);
> - return;
> - }
> - if (size > PAGE_SIZE) {
> - int order = get_order(size);
> -
> - if (order < 3)
> - free_pages((unsigned long)ptr, order);
> - else
> - vfree(ptr);
> - return;
> - }
> - kmem_cache_free(get_slab(size), ptr);
> + if (size < PAGE_SIZE)
> + kmem_cache_free(get_slab(size), ptr);
> + else
> + free_pages((unsigned long)ptr, get_order(size));
> };
>
> /*
> --
> 2.6.2
>
> --
> Michal Hocko
> SUSE Labs

--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/