Re: [PATCH 8/9] sched/fair: select idle cpu from idle cpumask for task wakeup

From: Mel Gorman
Date: Fri Sep 17 2021 - 09:36:01 EST


On Fri, Sep 17, 2021 at 05:11:11PM +0800, Aubrey Li wrote:
> On 9/17/21 12:15 PM, Barry Song wrote:
> >> @@ -4965,6 +4965,7 @@ void scheduler_tick(void)
> >>
> >>  #ifdef CONFIG_SMP
> >>  	rq->idle_balance = idle_cpu(cpu);
> >> +	update_idle_cpumask(cpu, rq->idle_balance);
> >>  	trigger_load_balance(rq);
> >>  #endif
> >>  }
> >
> > might be stupid, a question bothering yicong and me is that why don't we
> > choose to update_idle_cpumask() while idle task exits and switches to a
> > normal task?
>
> I implemented it that way and we discussed it before (RFC v1?). Updating a
> cpumask at every idle enter/exit is more expensive than we expected, even
> though it's per LLC domain. Vincent saw a significant regression IIRC. You
> can also take a look at nohz.idle_cpus_mask as a reference.
>

It's possible to track it differently and I prototyped it some time
back. The results were mixed at the time: it helped some workloads
and was marginal on others. It appeared to help hackbench, but I found
that hackbench is much more sensitive to wakeup_granularity and
overscheduling. For hackbench, it makes more sense to target that directly
before revisiting alt-idlecore to see what it really helps. I'm waiting
on test results for various ways wakeup_gran can be scaled depending on
rq activity.

For alternative idle core tracking, the current 5.15-rc1 rebase of the
prototype looks like this:

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/commit/?h=sched-altidlecore-v2r8&id=b2af1a88245f6cbeb28343e89f3183a77b29d52d

Test results are still pending and, as usual, the queue is busy. I swear, my
primary bottleneck for doing anything is benchmarking and validation :(

--
Mel Gorman
SUSE Labs