Re: [PATCH 1/5] cpuset memory spread basic implementation

From: Andi Kleen
Date: Tue Feb 07 2006 - 09:22:32 EST


On Tuesday 07 February 2006 15:11, Ingo Molnar wrote:
>
> * Andi Kleen <ak@xxxxxxx> wrote:
>
> > I meant it's not that big an issue if it's remote, but it's bad if it
> > fills up the local node.
>
> are you sure this is not some older VM issue?

Unless you implement page migration for all caches it's still there.
The only way to get rid of caches on a node currently is to throw them
away. And refetching them from disk is quite costly.

> Unless it's a fundamental
> property of NUMA systems, it would be bad to factor in some VM artifact
> into the caching design.

Why not? It has to work with the real existing VM, not some imaginary perfect
one.

> > Basically you have to consider the frequency of access:
> >
> > Mapped memory is very frequently accessed. For it memory placement is
> > really important. Optimizing it at the cost of everything else is a
> > good default strategy
> >
> > File cache is much less frequently accessed (most programs buffer
> > read/write well) and when it is accessed it is using functions that
> > are relatively latency tolerant (kernel memcpy). So memory placement
> > is much less important here.
> >
> > And d/inode are also very infrequently accessed compared to local
> > memory, so the occasionally additional latency is better than
> > competing too much with local memory allocation.
>
> Most pagecache pages are clean,


... unless you've just written a lot of data.

> and it's easy and fast to zap a clean
> page when a new anonymous page needs space. So i dont really see why the
> pagecache is such a big issue - it should in essence be invisible to the
> rest of the VM. (barring the extreme case of lots of dirty pages in the
> pagecache) What am i missing?

d/icaches for once don't work this way. Do a find / and watch the results on
your local node.

And in practice your assumption of everything clean and nice in page cache is
also often not true.

> > > i also mentioned software-based clusters in the previous mail, so it's
> > > not only about big systems. Caching attributes are very much relevant
> > > there. Tightly integrated clusters can be considered NUMA systems with a
> > > NUMA factor of 1000 or so (or worse).
> >
> > To be honest I don't think systems with such a NUMA factor will ever
> > work well in the general case. So I wouldn't recommend considering
> > them much if at all in your design thoughts. The result would likely
> > not be a good balanced design.
>
> loosely coupled clusters do seem to work quite well, since the
> overwhelming majority of computing jobs tend to deal with easily
> partitionable workloads.

Yes, but with message passing but without any kind of shared memory.

> Making clusters more seemless via software
> (a'ka OpenMosix) is still quite tempting i think.

Ok we agree on that then. Great.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/