Re: [PATCH RFC/TEST] sched: make sync affine wakeups work

From: Ingo Molnar
Date: Wed May 07 2014 - 08:17:54 EST



* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Tue, May 06, 2014 at 04:20:59PM -0400, Rik van Riel wrote:
> > On 05/06/2014 09:25 AM, Peter Zijlstra wrote:
> > > On Sun, May 04, 2014 at 08:41:09AM -0400, Rik van Riel wrote:
> > >> Even on 8-node DL980 systems, the NUMA distance in the
> > >> SLIT table is less than RECLAIM_DISTANCE, and we will
> > >> do wake_affine across the entire system.
> > >
> > > Yeah, so the problem is that (AFAIK) ACPI doesn't actually specify a
> > > metric for the SLIT distance. This (in as far as BIOS people would care
> > > to stick to specs anyhow) has led to the 'fun' situation where BIOS
> > > engineers tweak SLIT table values to make OSes behave as they think it
> > > should.
> > >
> > > So if the BIOS engineer finds that this system should have <
> > > RECLAIM_DISTANCE it will simply make the table such that the max SLIT
> > > value is below that.
> > >
> > > And yes, I've seen this :-(
> >
> > It appears to be the case on the vast majority of the NUMA systems
> > that are actually in use.
> >
> > To me, this suggests that we should probably deal with it.
>
> What we could do is redefine this distance in hops, that'll force
> them to lie more blatantly and actually misrepresent the topology.

and we should make sure we reduce any graph they represent, so that
they can lie only through very heavy misrepresentation of the topology
(i.e. not just weight tweaks) which will bite them in other areas
(like the mm).
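
A minimal sketch of what such a reduction could look like, assuming a
flat row-major SLIT-style table. This is not existing kernel code, and it
substitutes a simpler technique for real hop counting: each distance is
replaced by its rank among the distinct values in the table. The names
slit_to_levels() and MAX_NODES are hypothetical, chosen for this example
only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16	/* hypothetical bound, for this sketch only */

/*
 * Replace every entry of an n x n distance table (row-major) with the
 * rank of its value among the distinct values in the table: 0 for the
 * smallest (normally the local) distance, 1 for the next one, and so on.
 * Returns the number of distinct distance values found.
 */
static int slit_to_levels(size_t n, const int *dist, int *level)
{
	int uniq[MAX_NODES * MAX_NODES];
	int nr_uniq = 0;

	assert(n > 0 && n <= MAX_NODES);

	/* Collect the distinct values, kept in ascending order. */
	for (size_t i = 0; i < n * n; i++) {
		int d = dist[i];
		int pos = 0;

		while (pos < nr_uniq && uniq[pos] < d)
			pos++;
		if (pos < nr_uniq && uniq[pos] == d)
			continue;	/* value already recorded */
		for (int k = nr_uniq; k > pos; k--)
			uniq[k] = uniq[k - 1];
		uniq[pos] = d;
		nr_uniq++;
	}

	/* Map each distance to its rank among the distinct values. */
	for (size_t i = 0; i < n * n; i++) {
		int r = 0;

		while (uniq[r] != dist[i])
			r++;
		level[i] = r;
	}

	return nr_uniq;
}
```

With this reduction a 2-node table of { 10, 21, 21, 10 } and a tweaked
one of { 10, 15, 15, 10 } both come out as { 0, 1, 1, 0 }, so changing
the weights alone changes nothing; only a table with a different
structure would produce a different result.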

Thanks,

Ingo