Re: [PATCH RFC] mm: vmalloc: do not allow kzalloc to fail

From: Michal Hocko
Date: Mon Dec 24 2018 - 03:11:03 EST


On Sat 22-12-18 09:04:21, Nicholas Mc Guire wrote:
> On Fri, Dec 21, 2018 at 01:58:39PM -0800, David Rientjes wrote:
> > On Thu, 20 Dec 2018, Nicholas Mc Guire wrote:
> >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 871e41c..1c118d7 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -1258,7 +1258,7 @@ void __init vmalloc_init(void)
> > >
> > >  	/* Import existing vmlist entries. */
> > >  	for (tmp = vmlist; tmp; tmp = tmp->next) {
> > > -		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
> > > +		va = kzalloc(sizeof(*va), GFP_NOWAIT | __GFP_NOFAIL);
> > >  		va->flags = VM_VM_AREA;
> > >  		va->va_start = (unsigned long)tmp->addr;
> > >  		va->va_end = va->va_start + tmp->size;
> >
> > Hi Nicholas,
> >
> > You're right that this looks wrong because there's no guarantee that va is
> > actually non-NULL. __GFP_NOFAIL won't help in init, unfortunately, since
> > we're not giving the page allocator a chance to reclaim so this would
> > likely just end up looping forever instead of crashing with a NULL pointer
> > dereference, which would actually be the better result.
> >
> tried tracing the __GFP_NOFAIL path and had concluded that it would
> end in out_of_memory() -> panic("System is deadlocked on memory\n");
> which also should point cleanly to the cause - but I'm actually not
> that sure if that trace was correct in all cases.

No, we do not trigger the memory reclaim path nor the oom killer when
using GFP_NOWAIT. In fact the current implementation even ignores
__GFP_NOFAIL AFAICS (so I was wrong about the endless loop but I suspect
that we used to loop for __GFP_NOFAIL at some point in the past). The
patch simply doesn't have any effect. But the primary objection is that
the behavior might change in the future and you certainly do not want to get
stuck in the boot process without knowing what is going on. Crashing
will tell you that quite obviously. Although I have a hard time imagining
how that could happen in a reasonably configured system.
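
To make the point above concrete, here is a minimal userspace sketch of
the behavior as I read it. It is not kernel code: model_alloc(), the
MODEL_* flags and model_out_of_memory are all hypothetical names made up
for illustration, and the "reclaim" step is simply assumed to succeed.
It only models the claim that a request which may not enter reclaim gets
NULL back on failure, whether or not it also asked never to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bits for this model; not the kernel's GFP values. */
#define MODEL_DIRECT_RECLAIM 0x1u	/* caller may block and reclaim */
#define MODEL_NOFAIL         0x2u	/* caller asks to never get NULL */

/* Simulated state: true while the fast path has nothing to hand out. */
static bool model_out_of_memory;

/*
 * Model of the behavior described above: a caller that may not reclaim
 * gets NULL when the fast path fails, and MODEL_NOFAIL is not honoured.
 */
static void *model_alloc(size_t size, unsigned int flags)
{
	if (!model_out_of_memory)
		return calloc(1, size);

	if (!(flags & MODEL_DIRECT_RECLAIM))
		return NULL;	/* MODEL_NOFAIL has no effect on this path */

	/* Assumption: reclaim frees enough memory for the retry to succeed. */
	model_out_of_memory = false;
	return calloc(1, size);
}

/* Usage example: the same request with and without reclaim allowed. */
static void model_demo(void)
{
	void *p;

	model_out_of_memory = true;

	/* Non-blocking plus "no fail": still NULL in this model. */
	p = model_alloc(64, MODEL_NOFAIL);
	assert(p == NULL);

	/* Blocking plus "no fail": reclaim runs and the retry succeeds. */
	p = model_alloc(64, MODEL_DIRECT_RECLAIM | MODEL_NOFAIL);
	assert(p != NULL);
	free(p);
}
```

In this model the first call in model_demo() corresponds to what the
patch would request, which is why adding the flag changes nothing for a
caller that dereferences the result unconditionally.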

> > You could do
> >
> > BUG_ON(!va);
> >
> > to make it obvious why we crashed, however. It makes it obvious that the
> > crash is intentional rather than some error in the kernel code.
>
> makes sense - that at least makes it immediately clear from the code
> that there is no way out from here.

How does it differ from blowing up right there when dereferencing flags?
It would be clear from the oops.
--
Michal Hocko
SUSE Labs