Re: Memory Problem in 2.4.9 ?

From: Daniel Phillips (phillips@bonn-fries.net)
Date: Wed Aug 22 2001 - 14:32:21 EST


On August 22, 2001 06:47 am, Tommy Wu wrote:
> I've tried the patch in the kernel list. Got the result as following...
> This message for command:
> dd if=/dev/zero of=test.dmp bs=1000k count=2500
> on a PIII 1G SMP box with 1G RAM (HIGHMEM enabled)
> kernel 2.4.9 with XFS filesystem patch.
>
> Aug 22 11:51:04 standby kernel: __alloc_pages: 0-order allocation failed
> (gfp=0x30/1).
> Aug 22 11:51:11 standby last message repeated 111 times

OK, this is a straight-up design bug. Although this can also happen with
normal memory, it's much more likely to happen with highmem because of heavy
demand for bounce buffers while a process is in PF_MEMALLOC state. You can
just turn off highmem and these messages will go away, or become so rare that
you are unlikely to ever see one.

Now lets chase the real problem. The gfp=0x30 tells us the requestor is
willing to wait (0x10) and that it is not allowed to do any io (0x40) or call
->writepage (0x80). (By process of elimination, it's a bounce buffer.)
Furthermore, this is a recursive memory request (/1) so __alloc_pages won't
call page_launder because that could hit another allocation request resulting
in a fatal infinite recursion (note to self: why couldn't we call
page_launder here, with NOIO?).

There are probably dirty pages in flight and __alloc_pages is allowed to wait
for them, but it doesn't - it trys reclaim_page once (in
__alloc_pages_limit), falls the rest of the way through __alloc_pages and
gives up with NULL. This is clearly a bad thing because whoever wanted the
page needs it to do writeout. Memory users are supposed to be able to
tolerate alloc failure, but in a case like this, there isn't much choice
other than to spin.

So what could we do better here? Well, obviously when there are writeout
pages in flight, __alloc_pages should wait and not give up. Secondly, we
should be sure that when writeout does complete, the newly freeable page is
given to a PF_MEMALLOC waiter in preference to a normal user. We don't have
mechanisms in place for doing either of those things right now, although some
preliminary design ideas have been discussed. This gets way outside the
bound of what we should be doing in 2.4, we will need such things as
reservations (which Ben has done some work on) and orderly prioritization of
requests in __alloc_pages, with explicit blocking for low priority requests.

What can we do right now? We could always just comment out the alloc failed
message. The result will be a lot of busy waiting on dirty page writeout
which will work but it will keep us from focussing on the question: how did
we get so short of bounce buffers? Well, maybe we are submitting too much IO
without intelligent throttling (/me waves at Ben). That sounds like the
place to attack first.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Aug 23 2001 - 21:00:51 EST