Re: upcoming kerneloops.org item: get_page_from_freelist

From: Jens Axboe
Date: Thu Jun 25 2009 - 15:55:47 EST


On Wed, Jun 24 2009, Andrew Morton wrote:
> On Wed, 24 Jun 2009 13:40:11 -0700 (PDT)
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> >
> >
> > On Wed, 24 Jun 2009, Linus Torvalds wrote:
> > > On Wed, 24 Jun 2009, Andrew Morton wrote:
> > > >
> > > > If the caller gets oom-killed, the allocation attempt fails. Callers need
> > > > to handle that.
> > >
> > > I actually disagree. I think we should just admit that we can always free
> > > up enough space to get a few pages, in order to then oom-kill things.
> >
> > Btw, if you want to change the WARN_ON() to warn when you're in the
> > "allocate in order to free memory" recursive case, then I'd have no issues
> > with that.
> >
> > In fact, in that case it probably shouldn't even be conditional on the
> > order.
> >
> > So a
> >
> > WARN_ON_ONCE((p->flags & PF_MEMALLOC) && (gfpmask & __GFP_NOFAIL));
> >
> > actually makes tons of sense.
>
> I suspect that warning will trigger.
>
> alloc_pages
> -> ...
> -> pageout
> -> ...
> -> get_request
> -> blk_alloc_request
> -> elv_set_request
> -> cfq_set_request
> -> cfq_get_queue
> -> cfq_find_alloc_queue
> -> kmem_cache_alloc_node(__GFP_NOFAIL)
> -> Jens
>
> How much this can happen in practice I don't know, but it looks bad.

Yeah it sucks, but I don't think it's that bad to fixup.

The request allocation can fail, if we just return NULL in
cfq_find_alloc_queue() and let that error propagate back up to
get_request_wait(), it would simply io_schedule() and wait for a request
to be freed. The only issue here is that if we have no requests going
for this queue already, we would be stuck since there's noone to wake us
up eventually. So if we did this, we'd have to make the io_schedule()
dependent on whether there are allocations out already. Use global
congestion wait in that case, or just io_schedule_timeout() for
retrying.

The other option is to retry in cfq_find_alloc_queue() without the
NOFAIL and do the congestion wait logic in there.

Yet another option would be to have a dummy queue that is allocated when
the queue io scheduler is initialized. If cfq_find_alloc_queue() fails,
just punt the IO to that dummy queue. That would allow progress even
under extreme failure conditions.

With all that said, the likely hood of ever hitting this path is about
0%. Those failures are the ones that really suck when it's hit in the
field eventually, though :-)

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/