Re: [Linux 5.18-rc1] WARNING: CPU: 1 PID: 0 at kernel/sched/fair.c:3355 update_blocked_averages

From: Ammar Faizi
Date: Tue Apr 05 2022 - 20:21:46 EST


On 4/5/22 7:21 PM, Dietmar Eggemann wrote:
Tried to recreate the issue but no success so far. I used your config
file, clang-14 and a Xeon CPU E5-2690 v2 (2 sockets 40 CPUs) with 20
two-level cgroupv1 taskgroups '/X/Y' with 'hackbench (10 groups, 40 fds)
+ idling' running in all '/X/Y/'.

What userspace are you running?

HP laptop, Intel i7-1165G7, 8 CPUs, 16 GB of RAM, Ubuntu 21.10. It is just
my daily workstation: compiling kernels, browsing, and coding.

There seemed to be some pressure on your machine when it happened?

Yeah, might be, I don't fully remember the activity at the time it
happened, though.

<6>[13420.623334][ C7] perf: interrupt took too long (2530 > 2500), lowering kernel.perf_event_max_sample_rate to 78900

Maybe you could split the SCHED_WARN_ON so we know which signal causes this?

OK, I will apply the diff on top of 5.18-rc1 and start using it for my daily
routine tomorrow morning. Let's see if I can hit this bug again. I will send
an update later.

Thank you.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d4bd299d67ab..0d45e09e5bfc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3350,9 +3350,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	 * Make sure that rounding and/or propagation of PELT values never
 	 * break this.
 	 */
-	SCHED_WARN_ON(cfs_rq->avg.load_avg ||
-		      cfs_rq->avg.util_avg ||
-		      cfs_rq->avg.runnable_avg);
+	SCHED_WARN_ON(cfs_rq->avg.load_avg);
+	SCHED_WARN_ON(cfs_rq->avg.util_avg);
+	SCHED_WARN_ON(cfs_rq->avg.runnable_avg);
 
 	return true;
 }
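
To illustrate why splitting the check helps, here is a minimal userspace
sketch. Everything in it is hypothetical and only mimics the idea: struct
pelt_avg stands in for the three PELT signals, and DEMO_WARN_ON is not the
kernel's SCHED_WARN_ON, just a macro that prints the failing expression and
its line number. With the combined check a single report covers all three
signals; with the split checks each non-zero signal produces its own report.

```c
#include <stdio.h>

/* Hypothetical stand-in for the three PELT signals checked in the patch. */
struct pelt_avg {
	unsigned long load_avg;
	unsigned long util_avg;
	unsigned long runnable_avg;
};

/* Print the failing expression and its line; return 1 if it fired, else 0. */
static int demo_warn(int hit, const char *expr, int line)
{
	if (hit)
		fprintf(stderr, "WARNING: line %d: %s\n", line, expr);
	return hit;
}

/* Hypothetical userspace macro, NOT the kernel's SCHED_WARN_ON. */
#define DEMO_WARN_ON(cond) demo_warn(!!(cond), #cond, __LINE__)

/* Combined check: one report, so the offending signal stays unknown. */
static int check_combined(const struct pelt_avg *avg)
{
	return DEMO_WARN_ON(avg->load_avg ||
			    avg->util_avg ||
			    avg->runnable_avg);
}

/*
 * Split checks: one report per non-zero signal.
 * Returns a bitmask: bit 0 = load_avg, bit 1 = util_avg, bit 2 = runnable_avg.
 */
static unsigned int check_split(const struct pelt_avg *avg)
{
	unsigned int mask = 0;

	if (DEMO_WARN_ON(avg->load_avg))
		mask |= 1u << 0;
	if (DEMO_WARN_ON(avg->util_avg))
		mask |= 1u << 1;
	if (DEMO_WARN_ON(avg->runnable_avg))
		mask |= 1u << 2;

	return mask;
}
```

For example, with only util_avg left non-zero, check_combined() reports the
whole three-way expression, while check_split() reports only the util_avg
line and returns a mask with bit 1 set.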

[...]


--
Ammar Faizi