Re: sched/fair: Kernel panics in pick_next_entity
From: Vishal Chourasia
Date: Wed Oct 02 2024 - 14:23:13 EST
On Wed, Oct 02, 2024 at 10:49:32AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 01, 2024 at 10:30:26AM +0200, Mike Galbraith wrote:
> > On Tue, 2024-10-01 at 00:45 +0530, Vishal Chourasia wrote:
> > > >
> > > For sanity, I ran the workload (kernel compilation) on the base commit
> > > where the kernel panic was initially observed. It panicked again, and a
> > > couple of warnings were also printed on the console, along with a
> > > circular locking dependency warning.
> > >
> > > Kernel 6.11.0-kp-base-10547-g684a64bf32b6 on an ppc64le
> > >
> > > ------------[ cut here ]------------
> > >
> > > ======================================================
> > > WARNING: possible circular locking dependency detected
> > > 6.11.0-kp-base-10547-g684a64bf32b6 #69 Not tainted
> > > ------------------------------------------------------
> >
> > ...
> >
> > > --- interrupt: 900
> > > se->sched_delayed
> > > WARNING: CPU: 1 PID: 27867 at kernel/sched/fair.c:6062 unthrottle_cfs_rq+0x644/0x660
> >
> > ...that warning also spells eventual doom for the box. Here it does
> > anyway: running LTP's cfs_bandwidth01 testcase and hackbench together,
> > the box grinds to a halt in pretty short order.
> >
>
> Right, I've picked up your patch for sched/urgent. But this does make me
> question Vishal's setup.
>
> He said all he does is compile a kernel, but afaik no regular setup uses
> CFS bandwidth by default. So something is 'special' at his end that he's
> not been telling us about.
Yes Peter, I'm compiling the kernel from source. The build itself does not
run in a cgroup with bandwidth limits set, but some system services
running in the background do have bandwidth limits applied:
# find . -name cpu.max -exec cat {} +
max 100000
max 100000
max 100000
max 100000
max 100000
max 100000
5000 100000
34000 100000
10000 100000
31000 100000
max 100000
max 100000
max 100000
max 100000
max 100000
max 100000
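For reference, each cpu.max line is "<quota> <period>" in microseconds,
and "max" as the quota means no limit. So the four limited groups above
are capped at 5%, 34%, 10% and 31% of one CPU per 100ms period. A minimal
sketch of that arithmetic (parse_cpu_max is a hypothetical helper name
used only for illustration, not something from the kernel tree):

```python
from typing import Optional


def parse_cpu_max(line: str) -> Optional[float]:
    """Return the CPU fraction allowed by one cpu.max line, or None if unlimited."""
    quota, period = line.split()
    if quota == "max":
        # "max" means no bandwidth limit is set for this cgroup.
        return None
    # Both values are in microseconds; the ratio is the share of one CPU.
    return int(quota) / int(period)


# The distinct values from the find output above.
lines = ["max 100000", "5000 100000", "34000 100000", "10000 100000", "31000 100000"]
for line in lines:
    share = parse_cpu_max(line)
    print(line, "->", "unlimited" if share is None else f"{share:.0%} of one CPU")
```

Any group with a numeric quota can be throttled, so the throttle/unthrottle
paths are exercised even though the build itself is unlimited.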
>
> Vishal, could you expand upon your configuration? How come you're using
> CFS bandwidth, what else is special?
CONFIG_CFS_BANDWIDTH is enabled by default in both pseries_le_defconfig
and the distro kernel config I'm using to build the kernel.
Let me know if you need any more info. I hope I have answered your
queries.
Thanks!