On Mon, 2023-08-14 at 11:14 +0800, Aaron Lu wrote:
Hi Rui,
On Fri, Aug 04, 2023 at 05:08:58PM +0800, Zhang Rui wrote:
Problem statement
-----------------
When using a cgroup isolated partition to isolate cpus including cpu0,
it is observed that cpu0 is woken up frequently but does nothing. This
is not good for power efficiency.
<idle>-0 [000] 616.491602: hrtimer_cancel: hrtimer=0xffff8e8fdf623c10
<idle>-0 [000] 616.491608: hrtimer_start: hrtimer=0xffff8e8fdf623c10 function=tick_sched_timer/0x0 expires=615996000000 softexpires=615996000000
<idle>-0 [000] 616.491616: rcu_utilization: Start context switch
<idle>-0 [000] 616.491618: rcu_utilization: End context switch
<idle>-0 [000] 616.491637: tick_stop: success=1 dependency=NONE
<idle>-0 [000] 616.491637: hrtimer_cancel: hrtimer=0xffff8e8fdf623c10
<idle>-0 [000] 616.491638: hrtimer_start: hrtimer=0xffff8e8fdf623c10 function=tick_sched_timer/0x0 expires=616420000000 softexpires=616420000000
The above pattern repeats every one or more ticks, resulting in a total
of 2000+ wakeups on cpu0 in 60 seconds when running a workload on the
cpus that are not in the isolated partition.
Root cause
----------
In NOHZ mode, an active cpu either sends an IPI or touches the idle
cpu's polling flag to wake it up, so that the idle cpu can pull tasks
from the busy cpu. The logic for selecting the target cpu is to use the
first idle cpu that is present in both nohz.idle_cpus_mask and
housekeeping_cpumask.
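
The selection rule described above can be sketched in plain userspace C.
This is only an illustrative model, not the kernel code: pick_ilb_cpu(),
the uint32_t bitmasks and NR_CPUS are hypothetical stand-ins for the real
cpumask iteration, and skipping the calling cpu is an assumption of the
model.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical cpu count for the model */

/*
 * Model: return the first cpu, other than the caller, whose bit is set
 * in both the idle mask and the housekeeping mask, or -1 if none.
 */
static int pick_ilb_cpu(uint32_t idle_mask, uint32_t housekeeping_mask,
			int this_cpu)
{
	uint32_t candidates = idle_mask & housekeeping_mask;
	int cpu;

	assert(this_cpu >= 0 && this_cpu < NR_CPUS);

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == this_cpu)
			continue;
		if (candidates & (UINT32_C(1) << cpu))
			return cpu;
	}
	return -1;
}
```

With cpu0 still set in both masks, the model picks cpu0 whenever a busy
cpu asks for an idle balancer, regardless of whether cpu0 has a sched
domain.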
In the above scenario, when cpu0 is in the cgroup isolated partition,
its sched domain is detached, but it is still available in both of the
above cpumasks. As a result, cpu0
I saw in nohz_balance_enter_idle(), if a cpu is isolated, it will not
set itself in nohz.idle_cpus_mask and thus should not be chosen as
ilb_cpu. I wonder what's stopping this from working?
One thing I forgot to mention is that the problem is gone if we offline
and re-online those cpus. In that case, the isolated cpus are removed
from the nohz.idle_cpus_mask in sched_cpu_deactivate() and are never
added back.
At runtime, the cpus can be removed from nohz.idle_cpus_mask only in
the case below:
trigger_load_balance()
	if (unlikely(on_null_domain(rq) || !cpu_active(cpu_of(rq))))
		return;
	nohz_balancer_kick(rq);
		nohz_balance_exit_idle()
My understanding is that if a cpu is in nohz.idle_cpus_mask when it is
isolated, there is no chance to remove it from that mask later, so the
check in nohz_balance_enter_idle() does not help.
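
To make the ordering problem concrete, here is a small userspace model
of my understanding. It is a sketch only: struct nohz_model and the
model_*() helpers are hypothetical names, and the real functions do
considerably more than this.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical cpu count for the model */

struct nohz_model {
	uint32_t idle_cpus_mask;	/* stands in for nohz.idle_cpus_mask */
	uint32_t null_domain_mask;	/* cpus whose sched domain is detached */
};

static uint32_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return UINT32_C(1) << cpu;
}

/* Models the check in nohz_balance_enter_idle(): isolated cpus are skipped. */
static void model_enter_idle(struct nohz_model *m, int cpu)
{
	if (m->null_domain_mask & cpu_bit(cpu))
		return;
	m->idle_cpus_mask |= cpu_bit(cpu);
}

/* Models the tick path: the early return prevents the exit-idle cleanup. */
static void model_tick(struct nohz_model *m, int cpu)
{
	if (m->null_domain_mask & cpu_bit(cpu))
		return;
	m->idle_cpus_mask &= ~cpu_bit(cpu);
}

/* Models moving a cpu into an isolated partition at runtime. */
static void model_isolate(struct nohz_model *m, int cpu)
{
	m->null_domain_mask |= cpu_bit(cpu);
}
```

In this model, calling model_enter_idle(), then model_isolate(), then
model_tick() on the same cpu leaves its bit set in idle_cpus_mask
forever, while isolating first and entering idle afterwards never sets
it. That matches what I see: the check only helps for cpus that go idle
after they are isolated.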