RE: [PATCH 4/4] zone_reclaim_mode is always 0 by default

From: Zhang, Yanmin
Date: Mon May 18 2009 - 21:18:21 EST


>>-----Original Message-----
>>From: Wu, Fengguang
>>Sent: May 18, 2009 11:49
>>To: KOSAKI Motohiro
>>Cc: LKML; linux-mm; Andrew Morton; Rik van Riel; Christoph Lameter; Zhang,
>>Yanmin
>>Subject: Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
>>
>>On Wed, May 13, 2009 at 12:08:12PM +0900, KOSAKI Motohiro wrote:
>>> Subject: [PATCH] zone_reclaim_mode is always 0 by default
>>>
>>> The current Linux policy is to enable zone_reclaim_mode by default when
>>> the machine has a large remote node distance, because until recently we
>>> could assume that a large distance meant a large server.
>>>
>>> Unfortunately, recent x86 CPUs (e.g. Core i7, Opteron) have a P2P
>>> transport memory controller. IOW, they are NUMA from the software's
>>> point of view.
>>>
>>> Some Core i7 machines have a large remote node distance, and zone_reclaim
>>> does not suit desktops and small file servers. It causes performance
>>> degradation.
>>
>>I can confirm this, Yanmin recently ran into exactly such a
>>regression, which was fixed by manually disabling the zone reclaim
>>mode. So I guess you can safely add an
[YM] Fengguang is right. One Nehalem machine has 12GB of memory, but 2GB
always stays free even though applications access lots of files.
Eventually we traced the root cause to zone_reclaim_mode=1.

Acked.
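
For anyone who wants to check a machine for the same setting, here is a
minimal sketch, not part of the patch. It assumes the tunable is exposed
at /proc/sys/vm/zone_reclaim_mode and that the bits mean what the sysctl
documentation says (1 = zone reclaim on, 2 = write dirty pages out,
4 = swap pages). The helper names read_sysctl_int() and describe_mode()
are made up for illustration.

```c
#include <stdio.h>

/* Bit meanings as documented for vm.zone_reclaim_mode (assumed here). */
#define ZRM_RECLAIM 1	/* zone reclaim on */
#define ZRM_WRITE   2	/* zone reclaim writes dirty pages out */
#define ZRM_SWAP    4	/* zone reclaim swaps pages */

/* Hypothetical helper: read one integer from a sysctl file, -1 on failure. */
static int read_sysctl_int(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/*
 * Hypothetical helper: render the mode as "off", "unknown", or a
 * '+'-joined list of the known flags. len must be at least 1.
 */
static const char *describe_mode(int mode, char *buf, size_t len)
{
	static const struct { int bit; const char *name; } flags[] = {
		{ ZRM_RECLAIM, "reclaim" },
		{ ZRM_WRITE,   "write"   },
		{ ZRM_SWAP,    "swap"    },
	};
	size_t used = 0;
	size_t i;

	if (mode <= 0) {
		snprintf(buf, len, "%s", mode == 0 ? "off" : "unknown");
		return buf;
	}

	buf[0] = '\0';
	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		int n;

		if (!(mode & flags[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "+" : "", flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop appending */
		used += (size_t)n;
	}
	if (used == 0)
		snprintf(buf, len, "unknown");	/* only undocumented bits set */
	return buf;
}
```

Calling describe_mode(read_sysctl_int("/proc/sys/vm/zone_reclaim_mode"),
buf, sizeof(buf)) should give "reclaim" for a value of 1 and "off" for 0.
Writing 0 to the same file as root is the manual workaround mentioned
above.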



>>
>>Tested-by: "Zhang, Yanmin" <yanmin.zhang@xxxxxxxxx>
>>
>>> Thus, zone_reclaim_mode == 0 is the better default. Sorry, HPC guys:
>>> you now need to turn zone_reclaim_mode on manually.
>>
>>I guess the borderline will continue to blur. It will depend more
>>on workloads than on physical NUMA capabilities. So
>>
>>Acked-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
>>
>>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
>>> Cc: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>
>>> Cc: Rik van Riel <riel@xxxxxxxxxx>
>>> ---
>>> mm/page_alloc.c | 7 -------
>>> 1 file changed, 7 deletions(-)
>>>
>>> Index: b/mm/page_alloc.c
>>> ===================================================================
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -2494,13 +2494,6 @@ static void build_zonelists(pg_data_t *p
>>>  		int distance = node_distance(local_node, node);
>>>
>>>  		/*
>>> -		 * If another node is sufficiently far away then it is better
>>> -		 * to reclaim pages in a zone before going off node.
>>> -		 */
>>> -		if (distance > RECLAIM_DISTANCE)
>>> -			zone_reclaim_mode = 1;
>>> -
>>> -		/*
>>>  		 * We don't want to pressure a particular node.
>>>  		 * So adding penalty to the first node in same
>>>  		 * distance group to make it round-robin.
>>>
>>>