Re: NUMA mempolicy /proc code in mainline shouldn't have been merged

From: Andi Kleen
Date: Mon Sep 19 2005 - 16:57:35 EST


On Monday 19 September 2005 23:32, Christoph Lameter wrote:
> On Mon, 19 Sep 2005, Andi Kleen wrote:
> > On Mon, Sep 19, 2005 at 10:11:20AM -0700, Christoph Lameter wrote:
> > > However, one still does not know which memory section (vma) is
> > > allocated on which nodes. And this may be important since critical data
> > > may need to
> >
> > Maybe. Well sure of things could be maybe important. Or maybe not.
> > Doesn't seem like a particularly strong case to add a lot of ugly
> > code though.
>
> We gradually need to fix the deficiencies of the policy layer. Calling
> fixes "ugly code" and refusing to discuss solutions does not help anyone.

I'm happy to discuss solutions given a clear use case what you want
to do, why you want to do it etc.

> > > External memory policy management is a necessary feature for system
> > > administration, batch process scheduling as well as for testing and
> > > debugging a system.
> >
> > I'm not convinced of this at all. Most of these things proposed so far
> > can be done much simpler with 90% of the functionality (e.g. just swapoff
> > per process for migration) , and I haven't seen a clear rationale except
> > for lots of maybes that the missing 10% are worth all the complexity
> > you seem to plan to add.
>
> Have you ever had the challenge to work with large HPC applications on a
> large NUMA system?

Ah - my code is better because my credentials are better. Maybe better than
maybe... I did only some tuning on large systems, but I spent quite some time
tuning NUMA code on small NUMA systems. Also I did spent a bit of time
looking at some of the tools offered by other Unixes and it left the clear
impression that they were far too complex and shouldn't be emulated in Linux.

The existing NUMA API was designed by keeping things relatively
simple (in fact my experience so far was that most users only
want to have the most simple of its policies - even the moderately
fancy stuff in there seems to be rarely used) so I think the bar
for more NUMA policy should be set extremly high and everything
come with extremly good rationales.

> Which things? Many HPC apps do not use swap space
> at all and we likely wont be using swap for page migration

Yes that was a lot of quite complicated code that seemed to me
quite overkill for its job.

Regarding swap: surely you know swapping doesn't necessarily
write to disk but first put stuff into the swap cache (we even
talked about that in Ottawa). So the plan would be to first
implement swapoff_process() that writes out to disk. And then
if someone really comes up with a clear case where this doesn't work
for them this can be extended to migrate directly out of the swap
cache missing the IO step.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/