Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.

From: Joel Fernandes
Date: Mon Aug 31 2020 - 23:38:47 EST


On Sat, Aug 29, 2020 at 09:47:19AM +0200, peterz@xxxxxxxxxxxxx wrote:
> On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
> > On 8/28/20 4:51 PM, Peter Zijlstra wrote:
>
> > > So where do things go side-ways?
>
> > During hotplug stress test, we have noticed that while a sibling is in
> > pick_next_task, another sibling can go offline or come online. What
> > we have observed is smt_mask get updated underneath us even if
> > we hold the lock. From reading the code, looks like we don't hold the
> > rq lock when the mask is updated. This extra logic was to take care of that.
>
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?
>
> On Fri, Aug 28, 2020 at 06:23:55PM -0400, Joel Fernandes wrote:
> > Thanks Vineeth. Peter, also the "v6+" series (which were some addons on v6)
> > detail the individual hotplug changes squashed into this patch:
> > https://lore.kernel.org/lkml/20200815031908.1015049-9-joel@xxxxxxxxxxxxxxxxx/
> > https://lore.kernel.org/lkml/20200815031908.1015049-11-joel@xxxxxxxxxxxxxxxxx/
>
> That one looks fishy, the pick is core wide, making that pick_seq per rq
> just doesn't make sense.
>
> > https://lore.kernel.org/lkml/20200815031908.1015049-12-joel@xxxxxxxxxxxxxxxxx/
>
> This one reads like tinkering, there is no description of the actual
> problem just some code that makes a symptom go away.
>
> Sure, on hotplug the smt mask can change, but only for a CPU that isn't
> actually scheduling, so who cares.
>
> /me re-reads the hotplug code...
>
> ..ooOO is the problem that we clear the cpumasks on take_cpu_down()
> instead of play_dead() ?! That should be fixable.

That is indeed the problem.

I was wondering, is there any harm in just selecting idle task if the CPU
calling schedule() is missing from cpu_smt_mask? Does it need to do a
core-wide selection?

That would be best, and avoid any unnecessary surgery of the already
complicated function. This is sort of what Tim was doing in v4 and v5.

Also what do we do if cpu_smt_mask changing while this function is running? I
tried something like the following and it solves the issues but the overhead
probably really sucks. I was thinking of doing a variation of the below that
just stored the cpu_smt_mask's rq pointers in an array of size SMTS_PER_CORE
on the stack, instead of a cpumask but I am not sure if that will keep the
code clean while still having similar storage overhead.

---8<-----------------------