Hi, Pierre,
Sorry for the late response. I'm still ramping up on the related code.
On Thu, 2023-09-14 at 16:53 +0200, Pierre Gondois wrote:
On 9/14/23 11:23, Zhang, Rui wrote:
Hi, Pierre,
Yes right indeed,
This happens when putting a CPU offline (as you mentioned
earlier,
putting a CPU offline clears the CPU in the idle_cpus_mask).
The load balancing related variables
including?
I meant the nohz idle variables in the load balancing, so I was
referring to:
(struct sched_domain_shared).nr_busy_cpus
(struct sched_domain).nohz_idle
nohz.idle_cpus_mask
nohz.nr_cpus
(struct rq).nohz_tick_stopped
IMO, the problem is that, for an isolated CPU,
1. it is not an idle cpu (nohz.idle_cpus_mask should be cleared)
2. it is not a busy cpu (sds->nr_busy_cpus should be decreased)
But current code does not have a third state to describe this, so we
need to either
1. add extra logic, like on_null_domain() checks
or
2. rely on current logic, but update all related variables correctly,
like you proposed.
But in any case, we should stick with one direction.
If we follow the first one, the original patch should be used, which
IMO is simple and straight forward.
If we follow the later one, we'd better audit and remove the current
on_null_domain() usage at the same time. TBH, I'm not confident enough
to make such a change. But if you want to propose something, I'd glad
to test it.
thanks,
rui