Tried to recreate the issue, but no success so far. I used your config
file, clang-14 and a Xeon CPU E5-2690 v2 (2 sockets, 40 CPUs) with 20
two-level cgroupv1 taskgroups '/X/Y', with 'hackbench (10 groups, 40 fds)
+ idling' running in all '/X/Y/'.
What userspace are you running?
Was there some pressure on your machine when it happened? Your log shows:
<6>[13420.623334][ C7] perf: interrupt took too long (2530 > 2500), lowering kernel.perf_event_max_sample_rate to 78900
Maybe you could split the SCHED_WARN_ON() so that we know which PELT signal triggers it?
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d4bd299d67ab..0d45e09e5bfc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3350,9 +3350,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	 * Make sure that rounding and/or propagation of PELT values never
 	 * break this.
 	 */
-	SCHED_WARN_ON(cfs_rq->avg.load_avg ||
-		      cfs_rq->avg.util_avg ||
-		      cfs_rq->avg.runnable_avg);
+	SCHED_WARN_ON(cfs_rq->avg.load_avg);
+	SCHED_WARN_ON(cfs_rq->avg.util_avg);
+	SCHED_WARN_ON(cfs_rq->avg.runnable_avg);
 
 	return true;
 }
[...]