I don't know. It may well be that by the time Linux is seriously in
contention for cc-NUMA, the number of architectures will be seriously
reduced, in much the same way that the number of architectures for general
purpose computers got shook out in the '80s and '90s. In that case, my dire
warning won't really matter, and best practice and possibly even simple
algorithms will work.
One of the things I investigated at HP Labs in the mid 90s was on-the-fly
configuration by algorithm substitution. There was some good work on
underlying technology for the nitty part of substituting algorithms done at
the Oregon Graduate Center. This is substitution, rather than run time
choice, to avoid the overhead of making the algorithm-choice branch
frequently at runtime, and is an attempt to generalize techniques like
back-patching the memory copy routines at boot time.
As we all know, the major problem with one-size-fits-all-algorithms is
scalability. Algorithms that are efficient for small N in the order
statistic don't scale well, but algorithms for large N tend to have too much
overhead to justify using them when N is small. List management (of which
operating systems are major users) gives a trivial example. A list that
might have a half dozen items at all is trivial to maintain in sorted order
and to search linearly, while one with thousands of entries an frequent
insertions requires data structures that would have outrageous overhead for
small N, and may never be kept in sorted order at all.
cc-NUMA complicates the problem because not only do you have the dimension
of growth to take into account, which could probably be coped with by
back-patching generalizations, but you also have variation in system design.
Relative costs of coherence change at different numbers of processors, and
some systems have complicated memory hierarchies while others tend to have a
small number of levels. I've worked with machines where the "right"
approach was to treat them as clusters of SMPs, effectively requiring an
approach in which each "virtual" smp ran its own independent kernel and a
lot of work had to be done to provide a single system image model and
support very efficient in-cluster IPC between the kernels. (See, for
instance, my idea "Sender Mediated Communication," and John Wilkes' patent
on a related idea, documented in the Hamlyn papers.) On other machines,
there wasn't such a break in the memory hierarchy and running the whole
thing as one virtual SMP with algorithms tuned to the cost of sharing was
the right approach. These systems may differ significantly in the source
code base of the implementation. Processor scheduling, paging, even I/O
processing models can vary radically between machines. (Try optimizing a
Tera using the same processor scheduling algorithms as are appropriate for a
T3, for example.)
But, as I say, this may all be a red herring, because the market place
doesn't tolerate a lot of diversity, and many of the interesting
architectures that the research community have worked on may never appear in
any significant numbers in the market place. If you are able to limit your
solution to the problem to a fairly narrow class of cc-NUMA machines, then
the problem really becomes simply the run-time replacement of algorithms
based on system size.
From: Jesse Noller [mailto:email@example.com]
Sent: Thursday, September 07, 2000 2:46 PM
To: Marty Fouts; firstname.lastname@example.org
Subject: RE: Scalability Efforts
But would it be possible to apply a sort of "Linux Server Tuning Best
Practices" method to items not unlike NUMA, but more specific to say,
webserver and file serving?
(this is a project i am working on, finding kernel and apache tuning
guidelines for maximum File/Web serving speed w/ the 2.4 kernel)
- Note, if anyone has any pointers, please let me know.
From: Marty Fouts [mailto:email@example.com]
Sent: Thursday, September 07, 2000 5:30 PM
Subject: RE: Scalability Efforts
FWIW, large system scalability, especially NUMA is not tractable with a 'one
size (algorithm) fits all' approach, and can be a significant test of the
degree of modularity in your system. Different relative costs of access to
the different levels of the memory hierarchy and different models of cache
concurrency, especially, tend to make what works for system A be maximally
pessimal for system B.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to firstname.lastname@example.org
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Thu Sep 07 2000 - 21:00:31 EST