Re: Crunch time -- the musical. (2.5 merge candidate list 1.5)

From: Erich Focht (efocht@ess.nec.de)
Date: Thu Oct 24 2002 - 16:51:56 EST


Hi Rob and Michael,

I need to correct some inexactities and, of course, advertise my aproach
:-)

On Thursday 24 October 2002 21:01, Michael Hohnbaum wrote:
> > > > 26) NUMA aware scheduler extenstions (Erich Focht, Michael Hohnbaum)
> > > >
> > > > Home page:
> > > > http://home.arcor.de/efocht/sched/
> > > >
> > > > Patch:
> > > > http://home.arcor.de/efocht/sched/Nod20_numa_sched-2.5.31.patch

These are old. I posted the newer patches (splitted up in order to clearly
separate the functionality additions) to LKML:

http://marc.theaimsgroup.com/?l=linux-kernel&m=103459387719030&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=103459387519026&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=103459441119407&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=103459441319411&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=103459441419416&w=2
They should work for any NUMA platform by just adding a call to
build_pools() in smp_cpus_done(). They work for non-NUMA platforms
the same way as the O(1) scheduler (though the code looks different).
A test overview is in: http://lwn.net/Articles/12546/
This suggests that taking only patches 01+02 already gives you a VERY
good NUMA scheduler. They deliver the infrastructure for later
developments (patches 03+05) which we can further research and tune or
give only to special customers.

> The 2.5 status list has not been updated to reflect this separate
> effort, and I believe incorrectly lists this entry as "ready". There
> really are now two NUMA scheduler projects:
>
> * Simple NUMA scheduler (Michael Hohnbaum) - ready for inclusion
> * Node affine NUMA scheduler (Erich Focht) - Alpha (Beta?)
This is not correct. We have the node affine scheduler in production
since 6 months on top of 2.4. kernels and are happy with it. It is a lot
more than alpha or beta, it already makes customers happy.

The situation is really funny: Everybody seems to agree that the design
ideas in my NUMA aproach are sane and exactly what we want to have on
a NUMA platform in the end. But instead of concentrating on tuning the
parameters for the many different NUMA platforms and reshaping this
aproach to make it acceptable, IBM concentrates on a very much stripped
down aproach. I understand that this project has been started to make
the inclusion of some NUMA scheduler easier. But in the end, the simple
NUMA scheduler will have to develop to a much more complex thing and in
some form or another replicate the design ideas of my node affine
scheduler. On machines with poor NUMA ratio like NUMAQ the simple NUMA
change helps. For machines with good NUMA ratio like NEC Azusa, NEC TX7
you need a little bit more. AMD Hammer-SMP and ppc64 are certainly in
the same class as the Azusa/TX7. And as soon as Hammer SMP systems will
be around, the pressure for a full featured NUMA scheduler will be much
higher.

A NUMA scheduler extension of the 2.6 kernel fits very well with the
development effort done for better scalability and enterprise level
fitnes of Linux. Check http://lwn.net/Articles/12546/ to see that it
makes a difference to have more than O(1) on NUMA machines! I'd
definitely prefer the inclusion of my 01+02 patches (I'd have to
maintain less code to keep the customers happy), on the other side:
including Michael's patch would be better than not adding NUMA
scheduler support at all.

Best regards,
Erich

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Oct 31 2002 - 22:00:25 EST