Re: [PATCH v4 06/20] sched/core: allow only preferred CPUs in is_cpu_allowed

From: Shrikanth Hegde

Date: Thu Jun 18 2026 - 01:15:51 EST

On 6/18/26 10:19 AM, Yury Norov wrote:

On Thu, Jun 18, 2026 at 09:47:49AM +0530, Shrikanth Hegde wrote:

It is reset to indicate that any subsequent direct calls to is_cpu_allowed can't use the
old cached value of select_fallback_rq.

So events could be,

- cpu marked as non preferred - select_fallback_rq (sets the p->has_preferred_cpu_state)
Lets say CPU(300-450) are marked as non-preferred and Task affinity is (200-350)
- task moved out. Now either task's affinity changed or preferred_mask has changed.
while CPU(400) maybe still marked as non-preferred but CPU(340) is marked as preferred.
- Subsequent call to is_cpu_allowed (CPU=340) can't assume the old value.

Please, no top-posting.

My point is: out of the scope of the select_fallback_rq(), the
p->has_preferred_cpu_state is always 0 because of the line above. It
means, it doesn't belong to the task_struct, it belongs the current
scope.

So either make this caching really surviving the scope exit, or make
it a local variable.

Not sure I understood the passage above about the possible events, but
variables that are always zero out of the function scope should not be
placed in global structures.

That was the other way. add one more variable to is_cpu_allowed.

What for resetting it here? I think it should be zeroed only on update
of preferred cpumask. In other words, to properly implement caching,
you need to have a global counter incremented on each
cpu_preferred_mask update, and in task_has_preferred_cpus() you do:

{
if (p->preferred_cpu_updates == atomic_read(preferred_cpumask_updates))
return p->has_preferred_cpus;

p->preferred_cpu_updates = atomic_read(preferred_cpumask_updates);
p->has_preferred_cpus = cpumask_intersects(...);
}

Do you have any numbers that justify this caching? The best practice
is to put performance optimizations at the end of the series and
provide some sort of benchmark supporting it.

This was to avoid N**2 aspect that was there in select_fallback_rq.
Its more of the functional aspect which i mentioned above which this needs
to take care as well.

Please, collect the performance data first, then optimize your code,
not vice-versa.

It did make sense to cache it since select_fallback_rq does repeated calculations with pi_lock held and irq disabled.

return dest_cpu;
}
@@ -4612,6 +4641,7 @@ static void __sched_fork(u64 clone_flags, struct task_struct *p)
init_numa_balancing(clone_flags, p);
p->wake_entry.u_flags = CSD_TYPE_TTWU;
p->migration_pending = NULL;
+ p->has_preferred_cpu_state = 0;
init_sched_mm(p);
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c7c2dea65edd..38fd84b0b8f8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -4213,4 +4213,22 @@ DEFINE_CLASS_IS_UNCONDITIONAL(sched_change)
#include "ext.h"
+/*
+ * has_preferred_cpu_state is encoding two bits of information.
+ * First Byte is to encode where the call to is_cpu_allowed coming from.
+ * Second Byte is to encode the intersection of task affinity
+ * and cpu_preferred_mask.
+ *
+ * If 1st Byte is set, call to is_cpu_allowed coming from select_fallback_rq.
+ * That helps to avoid repeated calculation keeping time complexity same.
+ */
+static inline bool task_has_preferred_cpus(struct task_struct *p)

This function should be void because you change the task state.

It doesn't alter p->has_preferred_cpu_state. No?

It doesn't, but it should.

Not sure. i think this can be just a wrapper to indicate the info
to inform if this cpu should be used or not based on task affinity.

+{
+ int cached_value = p->has_preferred_cpu_state;
+
+ if (cached_value & 0x1)
+ return p->has_preferred_cpu_state >> 8;
+ else
+ return cpumask_intersects(p->cpus_ptr, cpu_preferred_mask);
+}
#endif /* _KERNEL_SCHED_SCHED_H */
--
2.47.3