Re: [Linux 5.18-rc1] WARNING: CPU: 1 PID: 0 at kernel/sched/fair.c:3355 update_blocked_averages

From: Dietmar Eggemann
Date: Thu Apr 07 2022 - 06:52:41 EST


On 06/04/2022 22:34, Ammar Faizi wrote:
> On 4/6/22 7:21 PM, Dietmar Eggemann wrote:
>> On 05/04/2022 15:13, Ammar Faizi wrote:
>>> On 4/5/22 7:21 PM, Dietmar Eggemann wrote:

[...]

> Not familiar with CFS stuff, but here...
>
> ===============
> ammarfaizi2@integral2:~$ mount | grep "cgroup2\|\bcpu\b"
> cgroup2 on /sys/fs/cgroup type cgroup2
> (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
> ammarfaizi2@integral2:~$ cat /sys/fs/cgroup/unified/cgroup.controllers
> cat: /sys/fs/cgroup/unified/cgroup.controllers: No such file or directory
> ammarfaizi2@integral2:~$ ls /sys/fs/cgroup/{cpu,cpuacct}
> ls: cannot access '/sys/fs/cgroup/cpu': No such file or directory
> ls: cannot access '/sys/fs/cgroup/cpuacct': No such file or directory

[...]

Looks like 21.10 finally abandoned legacy cgroup v1 and switched to v2
completely; v2 is now mounted directly under /sys/fs/cgroup rather than
under /sys/fs/cgroup/unified.

So your /sys/fs/cgroup/cgroup.controllers should contain `cpu`.

Can you check whether any of the cpu.max files under /sys/fs/cgroup
contains anything other than `max 100000`?

The background: if so, cgroups (i.e. their cfs_rqs) might be throttled,
and this could be related to what you see. I haven't stress-tested this
so far with CFS bandwidth control (cfs_rq throttling) active.
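
The cpu.max check asked for above could be done roughly like this. It
is only a sketch: it assumes cgroup v2 is mounted at /sys/fs/cgroup, and
the helper name find_nondefault_cpu_max is made up for illustration.

```shell
# Print every cpu.max below a cgroup v2 mount whose content differs
# from the default "max 100000" (no quota, 100 ms period).
# find_nondefault_cpu_max is a hypothetical helper name; the default
# root /sys/fs/cgroup is an assumption about where v2 is mounted.
find_nondefault_cpu_max() {
    root="${1:-/sys/fs/cgroup}"
    find "$root" -type f -name cpu.max 2>/dev/null |
    while IFS= read -r f; do
        # A cgroup can disappear while we scan; skip unreadable files.
        content=$(cat "$f" 2>/dev/null) || continue
        if [ "$content" != "max 100000" ]; then
            printf '%s: %s\n' "$f" "$content"
        fi
    done
}
```

Running `find_nondefault_cpu_max` without an argument scans
/sys/fs/cgroup. No output means every cgroup still has the default
setting, so no cfs_rq is being throttled through cpu.max.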

> Update:
> So far I have been using and torturing my machine for a day, but
> still couldn't reproduce the issue. It seems I hit a rarely
> happened bug. I will continue using this until 5.18-rc2 before
> recompile my kernel.

Thanks.