Re: upcoming kerneloops.org item: get_page_from_freelist

From: Pekka Enberg
Date: Wed Jun 24 2009 - 13:00:19 EST


On Wed, Jun 24, 2009 at 7:56 PM, Pekka Enberg<penberg@xxxxxxxxxxxxxx> wrote:
> On Wed, Jun 24, 2009 at 7:55 PM, Pekka Enberg<penberg@xxxxxxxxxxxxxx> wrote:
>> Hi Andrew,
>>
>> On Wed, 24 Jun 2009 08:07:53 -0700 Arjan van de Ven <arjan@xxxxxxxxxxxxx> wrote:
>>>> a new item is coming up fast in the kerneloops.org stats, and it's new
>>>> in 2.6.31-rc;
>>>>
>>>> http://www.kerneloops.org/searchweek.php?search=get_page_from_freelist
>>>>
>>>> it's this warning in mm/page_alloc.c:
>>>>
>>>>                          * __GFP_NOFAIL is not to be used in new code.
>>>>                          *
>>>>                          * All __GFP_NOFAIL callers should be fixed so that they
>>>>                          * properly detect and handle allocation failures.
>>>>                          *
>>>>                          * We most definitely don't want callers attempting to
>>>>                          * allocate greater than single-page units with
>>>>                          * __GFP_NOFAIL.
>>>>                          */
>>>>                         WARN_ON_ONCE(order > 0);
>>>>
>>>>
>>>> typical backtraces look like
>>>>
>>>> get_page_from_freelist
>>>> __alloc_pages_nodemask
>>>> alloc_pages_current
>>>> alloc_slab_page
>>>> new_slab
>>>> __slab_alloc
>>>> kmem_cache_alloc_notrace
>>>> start_this_handle
>>>> jbd2_journal_start
>>>>
>>>> and
>>>>
>>>> get_page_from_freelist
>>>> __alloc_pages_nodemask
>>>> alloc_pages_current
>>>> alloc_slab_page
>>>> new_slab
>>>> __slab_alloc
>>>> kmem_cache_alloc_notrace
>>>> start_this_handle
>>>> journal_start
>>>> ext3_journal_start_sb
>>>> ext3_journal_start
>>>> ext3_dirty_inode
>>>>
>>>> but there are some other ones as well at the url above.
>>>>
>>>>
>>>> git blame shows that
>>>>
>>>> commit dab48dab37d2770824420d1e01730a107fade1aa
>>>> Author: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>> Date:   Tue Jun 16 15:32:37 2009 -0700
>>>>
>>>> introduced this WARN_ON.....
>>
>> On Wed, Jun 24, 2009 at 7:46 PM, Andrew Morton<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>>> Well yes.  Using GFP_NOFAIL on a higher-order allocation is bad.  This
>>> patch is there to find, name, shame, blame and hopefully fix callers.
>>>
>>> A fix for cxgb3 is in the works.  slub's design is a big problem.
>>>
>>> But we'll probably have to revert it for 2.6.31 :(
>>
>> How is SLUB's design a problem here? Can't we just clear GFP_NOFAIL
>> from the higher order allocation and thus force GFP_NOFAIL allocations
>> to use the minimum required order?
>
> Small correction: force GFP_NOFAIL allocations to use minimum order
> _if_ the higher order allocation fails.

And here's a badly linewrapped, untested patch to do that (sorry I
don't have my laptop here). Christoph, does this look ok to you?

diff --git a/mm/slub.c b/mm/slub.c
index ce62b77..8aaf0fa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1088,8 +1088,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)

flags |= s->allocflags;

- page = alloc_slab_page(flags | __GFP_NOWARN | __GFP_NORETRY, node,
- oo);
+ page = alloc_slab_page((flags & ~__GFP_NOFAIL) | __GFP_NOWARN | __GFP_NORETRY, node, oo);
if (unlikely(!page)) {
oo = s->min;
/*
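
The fallback behaviour this patch is aiming for can be modelled in user
space. What follows is only a minimal sketch under stated assumptions, not
kernel code: the DEMO_GFP_* values, demo_alloc_pages() and
demo_allocate_slab() are hypothetical stand-ins, and the toy allocator
counts every higher-order NOFAIL request, whereas WARN_ON_ONCE fires only
once.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's gfp bits; the values are made up. */
#define DEMO_GFP_NOFAIL  0x1u
#define DEMO_GFP_NOWARN  0x2u
#define DEMO_GFP_NORETRY 0x4u

/* Counts how often a NOFAIL request with order > 0 reached the allocator. */
static int demo_nofail_warnings;

/*
 * Toy page allocator: succeeds only for orders up to max_ok_order.
 * Returns 1 on success, 0 on failure.
 */
static int demo_alloc_pages(unsigned int flags, int order, int max_ok_order)
{
	/* Plays the role of WARN_ON_ONCE(order > 0) in the NOFAIL path. */
	if ((flags & DEMO_GFP_NOFAIL) && order > 0)
		demo_nofail_warnings++;
	return order <= max_ok_order;
}

/*
 * Try the preferred (higher) order first, then fall back to the minimum
 * order. With strip_nofail set, the first attempt has NOFAIL cleared, as
 * the patch proposes; with it clear, the first attempt keeps NOFAIL.
 * Returns the order that was obtained, or -1 if both attempts failed.
 */
static int demo_allocate_slab(unsigned int flags, int preferred_order,
			      int min_order, int max_ok_order,
			      int strip_nofail)
{
	unsigned int first_try = flags | DEMO_GFP_NOWARN | DEMO_GFP_NORETRY;

	assert(min_order <= preferred_order);
	if (strip_nofail)
		first_try &= ~DEMO_GFP_NOFAIL;

	if (demo_alloc_pages(first_try, preferred_order, max_ok_order))
		return preferred_order;

	/* The fallback keeps the caller's flags, so NOFAIL applies at min_order. */
	if (demo_alloc_pages(flags, min_order, max_ok_order))
		return min_order;

	return -1;
}
```

With strip_nofail set, a NOFAIL caller whose order-3 attempt fails falls
back to order 0 and the warning counter stays at zero. With strip_nofail
clear, the same call increments the counter on the first attempt.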