Re: [PATCH] cpusets - big numa cpu and memory placement

From: Paul Jackson
Date: Fri Oct 08 2004 - 08:23:26 EST


First, thank you, Hubertus, for comparing me to a puppy, rather
than a kitten. I am definitely a dog person, not a cat person,
and I appreciate your considerate choice of analog.

I gather from the tone of your post yesterday that there is
a disconnect between us - you speak with the frustration of
someone who has been shouting into the wind and not being
heard.

I suspect that the disconnect, if such be, is not where you
think it is:

Hubertus wrote:
>
> The disconnect is that you do not want to recognize that CKRM does NOT
> have to be systemwide. Once you open your mind to the fact that CKRM can
> be deployed with in a subset of disconnected resources (cpu domains)
> and manages shares independently within that domain, I truely don't see
> what the problem is.

I have recognized for months that eventually we'd want to allow
for cpuset-relative CKRM domains, and I'm pretty sure I've
dropped comments to that effect one time or another here on lkml.

I suspect instead that "CKRM" is one layer more abstract than
I am normally comfortable with.

As best as I can tell, CKRM has evolved from its origins as a
fair share scheduler, into a framework (*) for things called by
such names as classes and controllers. As you may recall from
an inconclusive thread between us on the ckrm-tech email list two
months ago, I find those terms uncomfortably vague and abstract.

In general, frameworks are high risk business. What they
gain in generality, covering a wider range of situations in
a uniform pattern, they lose in down to earth concreteness,
leaving their users less confident of what works, and less able
to rely on their intuitions. The risk of serious design flaws,
shrouded for a long time in the fog of abstraction, is higher.

The more successful frameworks, such as vfs for example,
typically have deep roots in prior art, and a sizable population
of journeyman and master practitioners.

CKRM is young, its roots more shallow, and the population of
its practitioners small.

(*) P.S. - It's more like CKRM is now the combination of
a virtual resource manager framework and a particular
instance of such (the fair share controllers that have
their conceptual origins in IBM's WLM, I suspect). If
numa placement controllers (aka cpusets) are going to
exist as well, then CKRM needs to split into (1) a
virtual resource manager framework (vrm), and (2) the
fair share stuff. The vrm framework should be neutral
of either fair share or numa placement bias.

===

So here I am with this new cpuset design (Simon Derr, primary
architect, both Simon and I feel a strong sense of ownership)
for numa placement, perhaps the 4th or 5th in SGI's history,
and the 2nd in mine. I am finding that it deliciously and
elegantly reflects the needs of its anticipated users (Sylvain
might demur, noting a couple of things I removed).

I am now being asked to morph it into a CKRM controller.

Further I deduce from the efforts over the last few days to talk
me down from meeting all the requirements satisfied by my current
cpuset patch that something of cpusets will be lost in the translation.

But I haven't figured out exactly what will be lost. And I lack the
mastery of CKRM that would enable me to engage in a constructive dialog
on the various tradeoffs that come into play here.

I look at the CKRM patch, and see something that looks an order
of magnitude larger than my cpuset patch. With its increased
number of hooks in the kernel, and its more abstract style
(it is a framework after all), I also see something with a
higher risk of performance impact, especially on the large NUMA
configurations that I care about.

And I am looking at trading what I thought had hope of being a
Sept or Oct date for acceptance into Linus's kernel, into some
unknown schedule that is definitely further out.

I've got the bacon sizzling on the skillet, I can smell it, my
mouth is watering, and just as I go to lift it off the burner,
Andrew asks me to consider trading it for a pig in a poke.
Thanks a bunch, Andrew - you da man ;).

Putting aside for a moment my personal frustrations (which
are after all my problem - and my dog's) I am simply unable to
make sense yet of how deep would be the hit on the capabilities
of cpusets, if so morphed, and I am painfully aware of the
undetermined schedule delays and increased risks to product
performance and even ultimate success that attend such a change.

From what my field engineers tell me, whom I've been polling
furiously on this matter the last few days, at least in the
markets that SGI frequents, there is very little overlap between
system configurations which benefit from fair share resource
management and those which benefit from numa placement resource
management. So, if that experience is generally applicable, we
are at risk of marrying a helicopter and a boat, just because
both have a motor and a hull, to the detriment of both.

Merging projects always has risks. The payoff for synergies
gained is not always greater than the cost of the inefficiencies
and compromises introduced, and the less immediate involvement
of the participants in the end result.

I cannot in good conscience recommend such a change.

Keep talking.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.650.933.1373