Re: [PATCH] VM: add vm.free_node_memory sysctl

From: Ray Bryant
Date: Wed Aug 03 2005 - 15:01:49 EST


On Wednesday 03 August 2005 09:38, Andi Kleen wrote:
> On Wed, Aug 03, 2005 at 10:24:40AM -0400, Martin Hicks wrote:
> > On Wed, Aug 03, 2005 at 04:15:29PM +0200, Andi Kleen wrote:
> > > On Wed, Aug 03, 2005 at 09:56:46AM -0400, Martin Hicks wrote:
> > > > Here's the promised sysctl to dump a node's pagecache. Please
> > > > review!
> > > >
> > > > This patch depends on the zone reclaim atomic ops cleanup:
> > > > http://marc.theaimsgroup.com/?l=linux-mm&m=112307646306476&w=2
> > >
> > > Doesn't numactl --bind=node memhog nodesize-someslack do the same?
> > >
> > > It just might kick in the oom killer if someslack is too small
> > > or someone has unfreeable data there. But then there should
> > > already be a sysctl to turn that one off.
> >
Hmmm.... What happens if there are already mapped pages (mapped in the sense
of being mapped into some process's address space) on the node, and you want
to allocate more but can't because the node is full of clean page cache
pages? Then one would have to set the memhog argument to just the right value
to keep the existing mapped memory from being swapped out, right? And is the
data needed to set that argument readily available to user space? Martin's
patch has the advantage of targeting just the clean page cache pages.
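
(To partially answer my own question: the closest thing I know of is
/sys/devices/system/node/nodeN/meminfo. A minimal sketch of reading it is
below -- the exact set of fields exported there varies by kernel version, so
treat the field names as assumptions. And even with those numbers in hand, it
is not obvious how to tell clean page cache from mapped pages, which is the
sizing problem.)

/* node_meminfo.c -- sketch: print per-node memory numbers from sysfs */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	char path[128], line[256];
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <node-number>\n", argv[0]);
		return 1;
	}
	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%s/meminfo", argv[1]);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	/* lines look like "Node 0 MemFree:  12345 kB" */
	while (fgets(line, sizeof(line), f))
		if (strstr(line, "MemTotal") || strstr(line, "MemFree"))
			fputs(line, stdout);
	fclose(f);
	return 0;
}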

The way I see this, the problem is that clean page cache pages >>should<< be
easily available to satisfy a request for mapped pages. This works correctly
on non-NUMA Linux systems. But on NUMA Linux systems we keep tripping over
this problem, particularly in the HPC space, and patches like Martin's come
about as attempts to solve it in the VMM. (We trip over it in the sense that
we end up allocating off-node storage because the current node is full of
page cache pages.)

The best answer we have at present is to run a memory hog program that puts
the node in question under memory pressure and thereby forces the clean page
cache pages to be reclaimed. But that is an indirect way to solve the problem
at hand, which is really to quickly release those page cache pages and make
them available for user programs to allocate. So the most direct way to fix
this is to fix it in the VMM rather than depending on a memory-hog-based
work-around of some kind. Perhaps we haven't yet gotten the right set of
patches together to do this, but my take is that that is where the fix
belongs.
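
For concreteness, here is a rough sketch (my own illustration, not Martin's
patch or any actual tool we ship) of that memory hog work-around using
libnuma: allocate and touch nearly a whole node's memory so the VM reclaims
the node's clean page cache. The slack value is a guess the user must supply,
which is exactly the weakness described above.

/*
 * nodehog.c -- sketch of the "memhog nodesize-someslack" work-around.
 * Build: gcc -o nodehog nodehog.c -lnuma
 * Usage: ./nodehog <node> <slack-bytes>
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	long total, node_free, target, i;
	long page = sysconf(_SC_PAGESIZE);
	int node;
	char *mem;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <node> <slack-bytes>\n", argv[0]);
		return 1;
	}
	if (numa_available() < 0) {
		fprintf(stderr, "no NUMA support on this system\n");
		return 1;
	}
	node = atoi(argv[1]);

	total = numa_node_size(node, &node_free);
	target = total - atol(argv[2]);	/* "nodesize - someslack" */
	fprintf(stderr, "node %d: %ld bytes total, %ld free; hogging %ld\n",
		node, total, node_free, target);

	numa_set_bind_policy(1);	/* strict bind, like numactl --bind */
	mem = numa_alloc_onnode(target, node);
	if (!mem) {
		fprintf(stderr, "allocation of %ld bytes failed\n", target);
		return 1;
	}
	/* touch every page so each one is actually faulted in on the node */
	for (i = 0; i < target; i += page)
		mem[i] = 1;

	numa_free(mem, target);
	return 0;
}

And as Andi points out, if the slack is too small, or there is unfreeable
data on the node, something like this can wake the oom killer -- one more
reason a targeted VMM interface looks more attractive.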

And, just for the record ( :-) ), this is not just an Altix problem.
Opterons are NUMA systems too, and we encounter exactly the same problem in
the HPC space on 4-node Opteron systems.
--
Ray Bryant
AMD Performance Labs, Austin, TX
512-602-0038 (o) 512-507-7807 (c)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/