Re: [tip:sched/numa] sched/numa: Introduce sys_numa_{t,m}bind()

From: Ingo Molnar
Date: Sat May 19 2012 - 07:19:23 EST



* Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:

> > > I very much believe in doing the simple thing first, and
> > > this is that,
> >
> > Leave out your syscalls (which might not be useful for
> > managed runtimes), and you actually have the simple thing :)
>
> Right, but the virt people could actually trivially use those,
> and vnuma doesn't have the scrambling issue outlined earlier
> since the guest kernel would also try to keep home-node
> affinity.
>
> Avi already said patching kvm would be like 5 minutes work.

These APIs also match what user-space numa daemons started doing
already.

> It also absolutely avoids the false sharing issue otherwise
> present with per-cpu memory, since you explicitly tell it
> where it belongs.

The grouping is also a natural extension to task and memory
affinities and groups in general.

It also allows us to turn auto-migration off by default, which
is a plus in my book. Without enough numbers I'm not convinced
that we really *want* auto-discovery turned on all the time, for
all workloads. The thing is, in practice most workloads that
matter are short-running, and even trivial forms of CPU
migration never happen for bursts of activity. We place them
and that's it.

Managed runtimes on the other hand can be expected to know about
and manage their locality - they do it anyway, by running guest
scheduler(s). So this patch-set gives them the ability to
express locality in a simple way, without the host kernel
scanning actively.

We can auto-scan on top of this, if the numbers support it, but
in the simple case where both the guest and the host are smart,
simply expressing locality and telling each other is vastly
superior to any scanning method.

Thanks,

Ingo