Re: [PATCH 03/13] mm: Introduce __GFP_MEMALLOC to allow access toemergency reserves

From: Mel Gorman
Date: Tue Apr 26 2011 - 06:36:56 EST


On Tue, Apr 26, 2011 at 07:49:47PM +1000, NeilBrown wrote:
> On Tue, 26 Apr 2011 08:36:44 +0100 Mel Gorman <mgorman@xxxxxxx> wrote:
>
> > __GFP_MEMALLOC will allow the allocation to disregard the watermarks,
> > much like PF_MEMALLOC. It allows one to pass along the memalloc state in
> > object related allocation flags as opposed to task related flags, such
> > as sk->sk_allocation. This removes the need for ALLOC_PFMEMALLOC as
> > callers using __GFP_MEMALLOC can get the ALLOC_NO_WATERMARK flag which
> > is now enough to identify allocations related to page reclaim.
> >
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> > Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> > ---
> > include/linux/gfp.h | 4 +++-
> > include/linux/mm_types.h | 2 +-
> > mm/page_alloc.c | 14 ++++++--------
> > mm/slab.c | 2 +-
> > 4 files changed, 11 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index bfb8f93..4e011e7 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -23,6 +23,7 @@ struct vm_area_struct;
> > #define ___GFP_REPEAT 0x400u
> > #define ___GFP_NOFAIL 0x800u
> > #define ___GFP_NORETRY 0x1000u
> > +#define ___GFP_MEMALLOC 0x2000u
> > #define ___GFP_COMP 0x4000u
> > #define ___GFP_ZERO 0x8000u
> > #define ___GFP_NOMEMALLOC 0x10000u
> > @@ -75,6 +76,7 @@ struct vm_area_struct;
> > #define __GFP_REPEAT ((__force gfp_t)___GFP_REPEAT) /* See above */
> > #define __GFP_NOFAIL ((__force gfp_t)___GFP_NOFAIL) /* See above */
> > #define __GFP_NORETRY ((__force gfp_t)___GFP_NORETRY) /* See above */
> > +#define __GFP_MEMALLOC ((__force gfp_t)___GFP_MEMALLOC)/* Allow access to emergency reserves */
> > #define __GFP_COMP ((__force gfp_t)___GFP_COMP) /* Add compound page metadata */
> > #define __GFP_ZERO ((__force gfp_t)___GFP_ZERO) /* Return zeroed page on success */
> > #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC) /* Don't use emergency reserves */
>
> Having both "MEMALLOC" and "NOMEMALLOC" seems ... unfortunate.
>
> It appears that NOMEMALLOC over-rides MEMALLOC. It might be good to document
> this
>

I can document it. Right now, a better name than NOMEMALLOC does not
spring to mind.

> > +#define __GFP_MEMALLOC ((__force gfp_t)___GFP_MEMALLOC)/* Allow access to emergency reserves
> unless __GFP_NOMEMALLOC is set*/
>
> > #define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC) /* Don't use emergency reserves
> Overrides __GFP_MEMALLOC */
>
> I suspect that it is never valid to set both. So NOMEMALLOC is really
> NO_PF_MEMALLOC, but making that change is probably just noise.
>
> Maybe a
> WARN_ON((gfp_mask & __GFP_MEMALLOC) && (gfp_mask & __GFP_NOMEMALLOC));
> might be wise?
>

Both MEMALLOC and NOMEMALLOC are related to PFMEMALLOC reserves so
it's reasonable for them to have similar names. This warning will
also trigger because it's a combination of flags that does happen.

Consider for example

any interrupt
-> __netdev_alloc_skb (mask == GFP_ATOMIC)
-> __alloc_skb (mask == GFP_ATOMIC)
if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX))
gfp_mask |= __GFP_MEMALLOC;
(mask == GFP_ATOMIC|__GFP_NOMEMALLOC)
-> __kmalloc_reserve
First attempt tries to avoid reserves so adds __GFP_MEMALLOC
(mask == GFP_ATOMIC|__GFP_NOMEMALLOC|__GFP_MEMALLOC)

You're right in that __GFP_NOMEMALLOC overrides __GFP_MEMALLOC so that
could do with a note.

--
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/