Re: I have a blaze of 353 page allocation failures, all alike

From: David Rientjes
Date: Tue Apr 12 2011 - 21:34:25 EST


On Tue, 12 Apr 2011, Christoph Lameter wrote:

> > > it took a while to find a date for a reboot... Unfortunately
> > > it was not possible to get the early boot messages with the
> > > kernel 2.6.32.23 since the compiled in log buffer is too
> > > small. So we installed as you suggested a more recent kernel
> > > 2.6.32.29 with a bigger log buffer, I attach the dmesg
> > > of that, and hope that the information in there is useful.
> > > We will keep an eye on that server with the newer kernel
> > > to see if the allocation failures appear again.
> >
> > the server was running for a few without any more allocation
> > failures with kernel 2.6.32.29 but at one point the server
> > stopped responding, it was still possible for a while to
> > get a login, and trying to kill some processes but that
> > didn't succeed. But after that even login was
> > no longer possible so we had to reset it.
> > I attach the call trace, I hope you can find out what is
> > the problem.
>
> The problem may be that you have lots and lots of SCSI devices which
> consume ZONE_DMA memory for their control structures. I guess that is
> oversubscribing the 16M zone.
>

You can try to get more memory reserves specifically for lowmem in
ZONE_DMA by changing /proc/sys/vm/lowmem_reserve_ratio. The values are
ratios, so lowering the numbers will yield larger amounts of memory
reserves in ZONE_DMA for GFP_DMA allocations. Try lowering the non-zero
entries to 1 to reserve the entire zone for lowmem, assuming your system
has enough RAM for everything else you're running.
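
As a rough illustration of why lower numbers mean larger reserves, here is a
minimal sketch of the arithmetic as the kernel's vm sysctl documentation
describes it: the pages a lower zone holds back from allocations that could
have used a higher zone are the summed sizes of the zones in between, divided
by the lower zone's ratio. The zone sizes and the two-entry ratio list below
are hypothetical, not taken from the reporter's machine, and the function name
is made up for this example.

```python
def lowmem_reserve(zone_pages, ratios):
    """Return protection[i][j]: pages zone i holds back from allocations
    that were allowed to use zone j (j > i).

    zone_pages: pages per zone, lowest zone first (hypothetical values).
    ratios:     one lowmem_reserve_ratio entry per zone except the highest.
    """
    n = len(zone_pages)
    protection = [[0] * n for _ in range(n)]
    for i in range(n):
        running = 0  # summed size of zones i+1 .. j
        for j in range(i + 1, n):
            running += zone_pages[j]
            protection[i][j] = running // ratios[i]
    return protection


# Hypothetical layout in 4 KiB pages: DMA (16M), DMA32, Normal.
zones = [4096, 1044480, 3145728]

default = lowmem_reserve(zones, [256, 256])  # assumed default-style ratios
tuned = lowmem_reserve(zones, [1, 256])      # DMA entry lowered to 1

print("DMA pages held back from Normal allocations:")
print("  ratio 256:", default[0][2])
print("  ratio 1:  ", tuned[0][2])
```

With the ratio at 1 the computed reserve is far larger than the 4096 pages in
the hypothetical DMA zone, which is how the setting keeps allocations that
could be served elsewhere out of ZONE_DMA.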

This will verify whether ZONE_DMA is being depleted by the large number of
SCSI devices. If you don't get any additional page allocation failures,
then check how much memory in ZONE_DMA is used at peak and use that to pick
a sane reserve ratio for the next time you restart the system.
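
One way to watch ZONE_DMA usage is to sample /proc/zoneinfo periodically. The
sketch below parses the text of that file and reports present, free, and
(present - free) pages for one zone. The helper name and the sample text are
illustrative only, the field layout is assumed to match what 2.6.32-era
kernels print, and present minus free is only an approximation of the pages
actually in use.

```python
import re


def zone_usage(zoneinfo_text, zone="DMA"):
    """Return (present, free, used) page counts for the first zone whose
    name matches `zone` in /proc/zoneinfo-style text."""
    in_zone = False
    free = present = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+\d+,\s+zone\s+(\S+)", line)
        if header:
            if in_zone:
                break  # reached the next zone, stop scanning
            in_zone = header.group(1) == zone
            continue
        if not in_zone:
            continue
        fields = line.split()
        if fields[:2] == ["pages", "free"]:
            free = int(fields[2])
        elif fields and fields[0] == "present":
            present = int(fields[1])
    if free is None or present is None:
        raise ValueError("zone %r not found or incomplete" % zone)
    return present, free, present - free


# Illustrative sample, not output from the reporter's server.
SAMPLE = """\
Node 0, zone      DMA
  pages free     1200
        min      67
        low      83
        high     100
        scanned  0
        spanned  4095
        present  3840
Node 0, zone    DMA32
  pages free     500000
        present  1044480
"""

present, free, used = zone_usage(SAMPLE)
print("ZONE_DMA: present=%d free=%d used=%d" % (present, free, used))
```

On a live system the same function can be fed the contents of /proc/zoneinfo
at intervals, keeping the largest "used" value seen as the peak.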