Re: kswapd craziness in 3.7

From: Zlatko Calusic
Date: Mon Dec 10 2012 - 13:29:01 EST

On 10.12.2012 12:03, Mel Gorman wrote:
There is a big difference between direct reclaim/compaction for THP
and kswapd doing the same work. Direct reclaim/compaction will try once,
give up quickly and defer further requests for a while to avoid impacting
the system heavily for THP. The same applies to khugepaged.

kswapd is different. It can keep going until its watermarks for a THP
allocation are met. There are two reasons why it might keep going for a
long time. The first is that compaction may be inefficient, which we know
can happen due to crap like this

end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);

and the second is that the highest zone may be relatively small, in which
case compaction_suitable will keep saying that allocations are failing due
to insufficient memory in the highest zone. kswapd will then reclaim a
little from this highest zone and call shrink_slab(), potentially dumping
a large amount of memory. This may be the case for Zlatko: on a 4G machine
his ZONE_NORMAL could be small, depending on how his hardware uses the
32-bit address space.

The kernel is 64-bit, if it makes any difference (userspace, though, is still 32-bit). There's no swap (swap support is not even compiled in). The zones are as follows:

On node 0 totalpages: 1048019
DMA zone: 64 pages used for memmap
DMA zone: 6 pages reserved
DMA zone: 3913 pages, LIFO batch:0
DMA32 zone: 16320 pages used for memmap
DMA32 zone: 831109 pages, LIFO batch:31
Normal zone: 3072 pages used for memmap
Normal zone: 193535 pages, LIFO batch:31

If I understand correctly, you think that because the 193535 pages in ZONE_NORMAL are relatively few compared to the 831109 pages in ZONE_DMA32, the system has a hard time balancing itself?

Is there any way I could force and test a different memory layout? I'm slightly lost among all the memory models (if I have a choice at all), so if you have any suggestions, I'm all ears.

Maybe I could limit available memory and thus have only a DMA32 zone, just to prove your theory? I remember doing tuning like that many years ago, when I had more time to play with Linux MM. Unfortunately I haven't had much time lately, so I'm a bit rusty, but I'm willing to help test and resolve this issue.
