...Reusing the data from load balance to select the unoccupied candidate
seems applicable IMO, and it is also aligned with the SIS_UTIL path. I have
a question regarding the update frequency. In the v3 patch, the update is
based on the periodic tick, which fires at most once every 1ms (CONFIG_HZ_1000).
The periodic SMT load balance is then launched every smt_weight ms, usually 2ms.
I expect 2ms to be on the same scale as 1ms, and since a tick-based
update is acceptable, would excluding the CPU_NEWLY_IDLE balance from this patch
reduce the overhead without introducing too much inaccuracy?
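To make the question concrete, here is a minimal userspace sketch of the gating I have in mind. It is not taken from the patch: the enum is a local stand-in for the kernel's enum cpu_idle_type so that the snippet compiles on its own, and filter_update_allowed() is a made-up helper name.

```c
#include <stdbool.h>

/* Local stand-in for the kernel's enum cpu_idle_type (assumption: only
 * these three values matter for the illustration). */
enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

/* Hypothetical helper: may this load-balance pass refresh the idle filter? */
static bool filter_update_allowed(enum cpu_idle_type idle)
{
	/* Skip newly-idle balancing and rely on the periodic pass only. */
	return idle != CPU_NEWLY_IDLE;
}
```

With such a check the filter would be refreshed only from the periodic balance, roughly every smt_weight ms, at the cost of not seeing the state at the moment a CPU becomes idle.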
@@ -8757,7 +8794,16 @@ static inline void update_sg_lb_stats(struct lb_env *env,
* No need to call idle_cpu() if nr_running is not 0
*/
if (!nr_running && idle_cpu(i)) {
+ /*
+ * Prefer the last idle cpu by overwriting
+ * previous one. The first idle cpu in this
+ * domain (if any) can trigger balancing
+ * and fed with tasks, so we'd better choose
+ * a candidate in an opposite way.
+ */
+ sds->idle_cpu = i;
Does this mean that only one idle CPU in the SMT domain would be set in the
idle CPU mask at a time? For SMT4/8 we might lose track of the other
idle siblings.
sgs->idle_cpus++;
+
/* Idle cpu can't have misfit task */
continue;
}
@@ -9273,8 +9319,40 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
static void sd_update_state(struct lb_env *env, struct sd_lb_stats *sds)
{
- if (sds->sd_state == sd_has_icpus && !test_idle_cpus(env->dst_cpu))
- set_idle_cpus(env->dst_cpu, true);
+ struct sched_domain_shared *sd_smt_shared = env->sd->shared;
+ enum sd_state new = sds->sd_state;
+ int this = env->dst_cpu;
+
+ /*
+ * Parallel updating can hardly contribute accuracy to
+ * the filter, besides it can be one of the burdens on
+ * cache traffic.
+ */
+ if (cmpxchg(&sd_smt_shared->updating, 0, 1))
+ return;
+
+ /*
+ * There is at least one unoccupied cpu available, so
+ * propagate it to the filter to avoid false negative
+ * issue which could result in lost tracking of some
+ * idle cpus thus throughput downgraded.
+ */
+ if (new != sd_is_busy) {
+ if (!test_idle_cpus(this))
+ set_idle_cpus(this, true);
+ } else {
+ /*
+ * Nothing changes so nothing to update or
+ * propagate.
+ */
+ if (sd_smt_shared->state == sd_is_busy)
+ goto out;
+ }
+
+ sd_update_icpus(this, sds->idle_cpu);
I wonder if we could further enhance it to facilitate the idle CPU scan.
For example, could we propagate the idle CPUs in the SMT domain to its parent
domain in a hierarchical sequence, and finally to the LLC domain? If there is
a cluster domain between the SMT and LLC domains, the cluster domain idle CPU filter
could benefit from this mechanism.
https://lore.kernel.org/lkml/20220609120622.47724-3-yangyicong@xxxxxxxxxxxxx/
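A rough userspace sketch of the hierarchical propagation, only to illustrate the idea. Nothing here is from the patch or the kernel: struct dom, its fields and propagate_idle() are made-up names, plain 64-bit words stand in for cpumasks, and there is no locking or atomicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one sched domain level. */
struct dom {
	struct dom *parent;	/* next wider level, NULL at the LLC */
	uint64_t span;		/* CPUs covered by this level */
	uint64_t idle;		/* idle CPUs currently known at this level */
};

/*
 * Record the idle CPUs observed in one SMT domain and push the result
 * up through every parent (SMT -> cluster -> LLC in this sketch).
 * Returns the number of levels that were actually written.
 */
static int propagate_idle(struct dom *smt, uint64_t idle_now)
{
	uint64_t span = smt->span;	/* the only slice we may modify */
	struct dom *d;
	int writes = 0;

	idle_now &= span;
	for (d = smt; d; d = d->parent) {
		uint64_t updated = (d->idle & ~span) | idle_now;

		/*
		 * Assumes every update goes through this function, so an
		 * unchanged slice here means the parents already agree.
		 */
		if (updated == d->idle)
			break;
		d->idle = updated;
		writes++;
	}
	return writes;
}
```

The early exit is the point of the sketch: when the SMT state has not changed, no wider (and more contended) mask is touched, while the cluster and LLC levels each still get a filter that a scan could consult.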
Yes, this is the best case, but the worst case is something that
Furthermore, even if there is no cluster domain, would a 'virtual' middle
sched domain help reduce the contention?
Core0(CPU0,CPU1), Core1(CPU2,CPU3), Core2(CPU4,CPU5), Core3(CPU6,CPU7)
We can create cpumask1, which is composed of Core0 and Core1, and cpumask2,
which is composed of Core2 and Core3. The SIS would first scan cpumask1 and,
if no idle CPU is found, scan cpumask2. In this way, the CPUs in Core0 and
Core1 only update cpumask1, without competing with Core2 and Core3 on cpumask2.
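The split could look roughly like the userspace sketch below, using the 8-CPU layout above. All names (struct idle_group, mark_cpu(), scan_idle_cpu()) are made up for illustration, plain words stand in for cpumasks, and nothing is atomic. Letting the caller choose which group is scanned first is my own assumption; passing 0 gives the cpumask1-then-cpumask2 order described above.

```c
#include <stdint.h>

#define CPUS_PER_GROUP	4	/* two SMT2 cores per group, as in the example */
#define NR_GROUPS	2

/* Hypothetical per-group idle filter; ideally one cache line per group. */
struct idle_group {
	uint64_t idle;		/* bit n set => CPU n is believed idle */
};

static struct idle_group groups[NR_GROUPS];

static int cpu_to_group(int cpu)
{
	return cpu / CPUS_PER_GROUP;
}

/* A CPU only ever writes the mask of its own group. */
static void mark_cpu(int cpu, int idle)
{
	struct idle_group *g = &groups[cpu_to_group(cpu)];
	uint64_t bit = UINT64_C(1) << cpu;

	if (idle)
		g->idle |= bit;
	else
		g->idle &= ~bit;
}

/* Scan group 'first', then the remaining groups; -1 if nothing is idle. */
static int scan_idle_cpu(int first)
{
	int i;

	for (i = 0; i < NR_GROUPS; i++) {
		uint64_t m = groups[(first + i) % NR_GROUPS].idle;
		int cpu;

		/* Return the lowest-numbered idle CPU of this group. */
		for (cpu = 0; m; cpu++, m >>= 1)
			if (m & 1)
				return cpu;
	}
	return -1;
}
```

For example, after mark_cpu(5, 1) a scan starting at group 0 finds nothing in cpumask1 and falls through to return CPU5 from cpumask2, and the write for CPU5 never touched the word that Core0 and Core1 update.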