Re: I have a blaze of 353 page allocation failures, all alike

From: Peter Kruse
Date: Wed Feb 16 2011 - 07:22:26 EST


Hi Christoph,

thanks again for your time.

Christoph Lameter wrote:
On Tue, 15 Feb 2011, Peter Kruse wrote:

> > we have set vm.min_free_kbytes = 2097152 but the problem
> > obviously did not go away.
>
> 2GB of reserves? How much memory does your system have?

48GB

Ok then you just may potentially clog up the DMA zones. Maybe set the
reserves to a reasonable level like 10M or so?

ok, that's what we had before the first incident; we then increased
it to this value to see if it makes a difference.
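
For scale, the share of RAM held back by each of the two settings mentioned above can be worked out with a few lines of Python. This is only a rough sketch: it assumes the 48GB figure means 48 GiB and reads the suggested "10M" as 10240 kB, and the function name reserve_share is made up for illustration.

```python
def reserve_share(min_free_kbytes, total_ram_gib):
    """Return the fraction of total RAM reserved by vm.min_free_kbytes."""
    total_kib = total_ram_gib * 1024 * 1024  # GiB -> KiB
    return min_free_kbytes / total_kib


# Current setting from this thread: 2097152 kB (2 GiB) on a 48 GiB machine.
print(f"{reserve_share(2097152, 48):.2%}")   # 4.17%

# Suggested setting, read here as 10240 kB (an assumption).
print(f"{reserve_share(10240, 48):.4%}")     # 0.0203%
```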


How many buffers are configured at the various levels for the device that
is receiving messages? I guess that may be a bit on the high side?

hm, I'm not sure I know what you mean or what you want me to do.


> Could you post the entire messages from the kernel log? We need the OOM
> info to figure out more about the problem.
>

I attach one of the call traces, or would it be better if I send the
kern.log (about 6MB)?

The call traces are sufficient but the traces vanished when I hit reply.
Include them inline next time. It would be good to have the log starting
at the last system boot. There is some information cut off that I would like
to see.

Ok, I attach the gzipped kern.log.


An atomic order 1 allocation failed and led to the OOM but it seems that
there is still ample memory available. Slab is in "fallback_alloc" so
something went wrong with the regular allocation attempt. Any use of
cpusets or cgroups?

not that I know of, no.


A significant amount of memory has been allocated to reclaimable slabs.
I guess these are the socket buffers?

Feb 10 11:59:49 beosrv1-t kernel: [1968911.211777] Node 0 Normal
free:965164kB min:917952kB low:1147440kB high:1376928kB
active_anon:2742680kB inactive_anon:293184kB active_file:4801512kB
inactive_file:11129708kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:21719040kB mlocked:0kB dirty:600kB
writeback:0kB mapped:26356kB shmem:4896kB slab_reclaimable:1780208kB <-----!!
slab_unreclaimable:199576kB kernel_stack:1576kB pagetables:22956kB
unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
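
To make the numbers in that zone line easier to compare, something like the following Python sketch could pull out the kB counters. It is only an illustration: the helper name parse_zone_stats and the regular expression are assumptions of this example, not part of any kernel tool, and the sample string is just the start of the line quoted above plus the two slab counters.

```python
import re

# Fields copied from the "Node 0 Normal" line quoted above (shortened).
ZONE_LINE = (
    "Node 0 Normal free:965164kB min:917952kB low:1147440kB high:1376928kB "
    "isolated(file):0kB present:21719040kB "
    "slab_reclaimable:1780208kB slab_unreclaimable:199576kB"
)


def parse_zone_stats(text):
    """Collect every 'name:<number>kB' pair from a zone report into a dict."""
    pairs = re.findall(r"([a-z_]+(?:\([a-z]+\))?):(\d+)kB", text)
    return {name: int(value) for name, value in pairs}


stats = parse_zone_stats(ZONE_LINE)

# Headroom between free memory and the min watermark, in kB.
print(stats["free"] - stats["min"])  # 47212

# Share of the zone's present memory sitting in reclaimable slab.
print(f"{stats['slab_reclaimable'] / stats['present']:.1%}")
```

With the values from this log, free memory is only about 46 MB above the min watermark, while reclaimable slab holds roughly 1.7 GB of the zone.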

Could you try to reduce the number of network buffers?

which parameter?

thanks,

Peter

Attachment: kern.log.gz
Description: GNU Zip compressed data