Re: [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask

From: Valentin Schneider
Date: Mon Jul 13 2020 - 09:52:44 EST



On 13/07/20 14:43, Peter Zijlstra wrote:
> On Mon, Jul 13, 2020 at 02:28:29PM +0100, Valentin Schneider wrote:
>>
>> On 13/07/20 13:55, Peter Zijlstra wrote:
>> > On Wed, Jul 01, 2020 at 08:06:55PM +0100, Valentin Schneider wrote:
>> >> Leverage SD_DEGENERATE_GROUPS_MASK in sd_degenerate() and
>> >> sd_degenerate_parent().
>> >>
>> >> Note that this changes sd_degenerate() somewhat: I'm using the negation of
>> >> SD_DEGENERATE_GROUPS_MASK as the mask of flags not requiring groups, which
>> >> is equivalent to:
>> >>
>> >> SD_WAKE_AFFINE | SD_SERIALIZE | SD_NUMA
>> >>
>> >> whereas the current mask for that is simply
>> >>
>> >> SD_WAKE_AFFINE
>> >>
>> >> I played with a few toy NUMA topologies on QEMU and couldn't cause a
>> >> different degeneration than what mainline does currently. If that is deemed
>> >> too risky, we can go back to using SD_WAKE_AFFINE explicitly.
>> >
>> > Arguably SD_SERIALIZE needs groups, note how we're only having that
>> > effective for machines with at least 2 nodes. It's a bit shit how we end
>> > up there, but IIRC that's what it ends up as.
>> >
>>
>> Right, AFAICT we get SD_SERIALIZE wherever we have SD_NUMA, which is any
>> level above NODE.
>
> Oh, right, I forgot we have NODE, d'0h. But in that case these lines:
>
> if (nr_node_ids == 1)
> pflags &= ~SD_SERIALIZE;
>
> are dead code, right?
>

Uuuuh right, that does sound like something made obsolete by having NODE;
we didn't have it back then:

5436499e6098 ("sched: fix sd_parent_degenerate on non-numa smp machine")

I'll fold that in (and test it, just to be sure :-)).

>> > SD_NUMA is descriptive, and not marking a group as degenerates because
>> > it has SD_NUMA seems a bit silly.
>>
>> It does, although we can still degenerate it, see below.
>>
>> > But then, it would be the top domain
>> > and would survive anyway?
>>
>> So from what I've tested we still get rid of those via
>> sd_parent_degenerate(): child and parent have the same flags and same span,
>> so parent goes out.
>>
>> That happens in the middle of the NUMA topology levels on that borked
>> topology with weird distances, aka
>>
>> node distances:
>> node 0 1 2 3
>> 0: 10 12 20 22
>> 1: 12 10 22 24
>> 2: 20 22 10 12
>> 3: 22 24 12 10
>>
>> which ought to look something like (+local distance to end result)
>>
>> 2 10 2
>> 1 <---> 0 <---> 2 <---> 3
>>
>> We end up with the following NUMA levels (i.e. deduplicated distances)
>> NUMA (<= 12)
>> NUMA (<= 20)
>> NUMA (<= 22)
>> NUMA (<= 24)
>>
>> For e.g. any CPU of node1, NUMA(<=20) is gonna have the same span as
>> NUMA(<=12), so we'll degenerate it.
>
> Man, that's horrible :-)

It is :(

> OK, fair enough, keep it as is, we'll see what
> if anything breaks.