On Wed 22-02-17 22:31:50, hejianet wrote:
You are right, will send v2 soon after testing it.
On 22/02/2017 7:41 PM, Michal Hocko wrote:
On Wed 22-02-17 17:04:48, Jia He wrote:
Yes, in the end fewer than 4000 hugepages are allocated.
When I try to dynamically allocate more hugepages than the system total
e.g. echo 4000 >/proc/sys/vm/nr_hugepages
I assume that the command has terminated with fewer huge pages allocated
than requested but
Hugepagesize: 16384 kB
In the bad case, although kswapd takes 100% cpu, HugePages_Total does
not increase at all.
Node 3, zone DMA[...]
pages free 2951
it left the zone below high watermark with
no pages reclaimable, so kswapd will not go to sleep. It would be quite
easy and comfortable to call it a misconfiguration but it seems that
it might be quite easy to hit with NUMA machines which have large
differences in the node sizes. I guess it makes sense to back off
the kswapd rather than burning CPU without any way to make forward
progress. Please make sure that this information is in the changelog.
@@ -3502,6 +3503,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
+ int node_has_relaimable_pages = 0;
@@ -3522,8 +3524,15 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
if (zone_balanced(zone, order, classzone_idx))
+ if (!zone_reclaimable_pages(zone))
+ node_has_relaimable_pages = 1;
What, this doesn't make any sense? Did you mean if (zone_reclaimable_pages)?
I mean, if any one zone has reclaimable pages, then this zone's *node* has
reclaimable pages. Thus, the kswapdN for this node should be woken up.
e.g. node 1 has 2 zones.
zone A has no reclaimable pages but zone B has.
Thus node 1 has reclaimable pages, and kswapd1 will be woken up.
I use node_has_relaimable_pages in the loop to check the number of reclaimable
pages in every zone. So I prefer the name node_has_relaimable_pages instead of
I still do not understand. This code starts with
node_has_relaimable_pages = 0. If you see a zone with no reclaimable
pages then you make it node_has_relaimable_pages = 1 which means that
+ /* Dont wake kswapd if no reclaimable pages */
+ if (!node_has_relaimable_pages)
this will not hold and we will wake up the kswapd. I believe what
you want instead is to skip the wake up if _all_ zones have
!zone_reclaimable_pages(). Or am I missing your point? This means that
node_has_relaimable_pages = 1;
trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);