Re: introducing __GFP_PANIC
From: Cyrill Gorcunov
Date: Mon May 04 2009 - 04:49:36 EST
[Pekka Enberg - Mon, May 04, 2009 at 11:32:21AM +0300]
| Hi Cyrill,
|
| On Mon, 2009-05-04 at 12:14 +0400, Cyrill Gorcunov wrote:
| > mm: introduce __GFP_PANIC
| >
| > Sometime we need that memory obtained via kmalloc
| > should always be granted. If there is no enough
| > memory we just can't go further.
| >
| > For such a case we introduce __GFP_PANIC panic
| > modificator. If memory can't be granted -- we just
| > panic.
| >
| > Note that __GFP_PANIC implicitly turn off failslab
| > facility on such kind calls.
| >
| > Signed-off-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
| > ---
| > include/linux/gfp.h | 13 ++++++++++---
| > include/linux/slab_def.h | 1 +
| > mm/failslab.c | 3 +++
| > mm/page_alloc.c | 17 +++++++++++++++--
| > 4 files changed, 29 insertions(+), 5 deletions(-)
| >
| > Index: linux-2.6.git/include/linux/gfp.h
| > =====================================================================
| > --- linux-2.6.git.orig/include/linux/gfp.h
| > +++ linux-2.6.git/include/linux/gfp.h
| > @@ -7,6 +7,7 @@
| > #include <linux/topology.h>
| >
| > struct vm_area_struct;
| > +void oom_panic(gfp_t gfp_mask, unsigned int order);
| >
| > /*
| > * GFP bitmasks..
| > @@ -58,7 +59,9 @@ struct vm_area_struct;
| > #define __GFP_NOTRACK ((__force gfp_t)0)
| > #endif
| >
| > -#define __GFP_BITS_SHIFT 22 /* Room for 22 __GFP_FOO bits */
| > +#define __GFP_PANIC ((__force gfp_t)0x400000u) /* Panic on page alloction failure */
| > +
| > +#define __GFP_BITS_SHIFT 23 /* Room for 23 __GFP_FOO bits */
| > #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
| >
| > /* This equals 0, but use constants in case they ever change */
| > @@ -196,8 +199,10 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
| > static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
| > unsigned int order)
| > {
| > - if (unlikely(order >= MAX_ORDER))
| > + if (unlikely(order >= MAX_ORDER)) {
| > + oom_panic(gfp_mask, order);
|
| This...
|
| > return NULL;
| > + }
| >
| > /* Unknown node is current node */
| > if (nid < 0)
| > @@ -212,8 +217,10 @@ extern struct page *alloc_pages_current(
| > static inline struct page *
| > alloc_pages(gfp_t gfp_mask, unsigned int order)
| > {
| > - if (unlikely(order >= MAX_ORDER))
| > + if (unlikely(order >= MAX_ORDER)) {
| > + oom_panic(gfp_mask, order);
|
| ...this...
|
| > return NULL;
| > + }
| >
| > return alloc_pages_current(gfp_mask, order);
| > }
| > Index: linux-2.6.git/include/linux/slab_def.h
| > =====================================================================
| > --- linux-2.6.git.orig/include/linux/slab_def.h
| > +++ linux-2.6.git/include/linux/slab_def.h
| > @@ -143,6 +143,7 @@ static __always_inline void *kmalloc(siz
| > i++;
| > #include <linux/kmalloc_sizes.h>
| > #undef CACHE
| > + oom_panic(flags, get_order(size));
|
| ...and this look fishy. They're static inlines that get expanded
| everywhere and they're known to be performance sensitive paths. I don't
| see much point in checking for >= MAX_ORDER at all because we will get a
| nice oops anyway for that.
Yep, I noticed that it is really ugly cases to put oom_panic
here. But shouldn't we cover all the sites? If I'm not missing
something, for slab with builtin constant greater then KMALLOC_MAX_SIZE
the kmalloc caller could get NULL even if __GFP_PANIC specified.
Which in turn means __GFP_PANIC just doesn't work properly here.
Anyway, I'll update the patch with fixes you proposed should be done.
Wait a bit :)
|
| __GFP_PANIC is an annotation saying that it's okay for a particular
| call-site not to check for NULL because we never expect to run out of
| memory at that point. But we don't really need to panic() for all the
| possible *errors*, just for the out-of-memory case.
|
| Pekka
|
Which means, __GFP_PANIC will not grant the guarantee to a caller
that he will not get NULL deref even with __GFP_PANIC specified?
Perhaps I missing something...
-- Cyrill
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/