Re: [PATCH 4/4] zone_reclaim_mode is always 0 by default
From: Robin Holt
Date: Wed May 20 2009 - 10:00:57 EST
On Tue, May 19, 2009 at 11:53:44AM +0900, KOSAKI Motohiro wrote:
> Hi
>
> > > Current Linux policy is that zone_reclaim_mode is enabled by default if the
> > > machine has a large remote node distance. This is because, until recently,
> > > we could assume that a large distance meant a large server.
> > >
> > > Unfortunately, recent x86 CPUs (e.g. Core i7, Opteron) have a P2P transport
> > > memory controller. IOW, they are seen as NUMA from the software's view.
> > >
> > > Some Core i7 machines have a large remote node distance, but zone_reclaim
> > > doesn't fit desktops and small file servers. It causes a performance
> > > regression.
> > >
> > > Thus, zone_reclaim == 0 is a better default if the machine is small.
> >
> > What if I had a node 0 with 32GB or 128GB of memory? In that case,
> > we would have 3GB for DMA32, 125GB for Normal and then a node 1 with
> > 128GB. I would suggest that zone reclaim would perform normally and
> > be beneficial.
> >
> > You are unfairly classifying this as a size of machine problem when it is
> > really a problem with the underlying zone reclaim code being triggered
> > due to imbalanced node/zones, part of which is due to a single node
> > having multiple zones and those multiple zones setting up the conditions
> > for extremely aggressive reclaim. In other words, you are putting a
> > bandage in place to hide a problem on your particular hardware.
> >
> > Can RECLAIM_DISTANCE be adjusted so your Ci7 boxes are no longer caught?
> > Aren't 4 node Ci7 boxes soon to be readily available? How are your apps
> > different from my apps in that you are not impacted by node locality?
> > Are you being too insensitive to node locality? Conversely am I being
> > too sensitive?
> >
> > All that said, I would not stop this from going in. I just think the
> > selection criteria are rather random. I think we know the condition we
> > are trying to avoid, which is a small Normal zone on one node and a larger
> > Normal zone on another causing zone reclaim to be overly aggressive.
> > I don't know how to quantify "small" versus "large". I would suggest
> > that a node 0 with 16 or more GB should have zone reclaim on by default
> > as well. Can that be expressed in the selection criteria?
>
> I posted my opinion in a separate mail. Please see it.

I don't think you addressed my actual question. How much of this is
a result of having a node where 1/4 of the memory is in the 'Normal'
zone and 3/4 is in the DMA32 zone? How much is due to the imbalance
between Node 0 'Normal' and Node 1 'Normal'? Shouldn't that type of
sanity check be used for turning on zone reclaim instead of some random
number of nodes? Even with 128 nodes and 256 cpus, I _NEVER_ see the
system swapping out before allocating off node, so I certainly cannot
reproduce the situation you are seeing.

The imbalance I have seen was when I had two small memory nodes and two
large memory nodes and then oversubscribed memory. In that situation,
I noticed that the apps on the small memory nodes were more frequently
impacted. This unfairness made sense to me and seemed perfectly
reasonable.
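
To make the kind of sanity check I am asking about concrete, here is a
rough user-space sketch. It is not kernel code and not a proposed patch.
The struct, the function name, the 16GB floor (taken from my suggestion
earlier in this thread) and the 2x imbalance ratio are all assumptions
picked for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GIB (1024ULL * 1024ULL * 1024ULL)

/* Hypothetical thresholds, chosen only to illustrate the idea. */
#define MIN_NODE_BYTES      (16ULL * GIB) /* smallest node must hold this much */
#define MAX_IMBALANCE_RATIO 2ULL          /* largest Normal / smallest Normal */

/* Hypothetical per-node summary; the kernel's real structures differ. */
struct node_mem {
	uint64_t dma32_bytes;   /* memory in the DMA32 zone */
	uint64_t normal_bytes;  /* memory in the Normal zone */
};

/*
 * Return true when the layout looks balanced enough that zone reclaim
 * could be on by default: every node is reasonably large and no Normal
 * zone is dwarfed by another node's Normal zone.
 */
bool zone_reclaim_looks_safe(const struct node_mem *nodes, size_t nr_nodes)
{
	uint64_t smallest_total, smallest_normal, largest_normal;
	size_t i;

	if (nr_nodes < 2)
		return false;   /* no remote node, so nothing to keep local */

	smallest_total = nodes[0].dma32_bytes + nodes[0].normal_bytes;
	smallest_normal = largest_normal = nodes[0].normal_bytes;

	for (i = 1; i < nr_nodes; i++) {
		uint64_t total = nodes[i].dma32_bytes + nodes[i].normal_bytes;

		if (total < smallest_total)
			smallest_total = total;
		if (nodes[i].normal_bytes < smallest_normal)
			smallest_normal = nodes[i].normal_bytes;
		if (nodes[i].normal_bytes > largest_normal)
			largest_normal = nodes[i].normal_bytes;
	}

	if (smallest_total < MIN_NODE_BYTES)
		return false;   /* a small node invites overly aggressive reclaim */

	/* Reject layouts where one Normal zone is far smaller than another. */
	return largest_normal <= smallest_normal * MAX_IMBALANCE_RATIO;
}
```

With these made-up thresholds, the 128GB-per-node layout I described
(3GB DMA32 plus 125GB Normal on node 0, 128GB Normal on node 1) passes,
while a node 0 split into 3GB DMA32 and 1GB Normal next to a 4GB node 1
does not. Where the thresholds should really sit is exactly the open
question.
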
Thanks,
Robin