Re: sched/fair: Kernel panics in pick_next_entity

From: Peter Zijlstra
Date: Wed Oct 02 2024 - 04:49:46 EST


On Tue, Oct 01, 2024 at 10:30:26AM +0200, Mike Galbraith wrote:
> On Tue, 2024-10-01 at 00:45 +0530, Vishal Chourasia wrote:
> > >
> > For sanity, I ran the workload (kernel compilation) on the base commit
> > where the kernel panic was initially observed. It panicked again, and a
> > couple of warnings were also printed on the console, along with a
> > circular locking dependency warning.
> >
> > Kernel 6.11.0-kp-base-10547-g684a64bf32b6 on an ppc64le
> >
> > ------------[ cut here ]------------
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 6.11.0-kp-base-10547-g684a64bf32b6 #69 Not tainted
> > ------------------------------------------------------
>
> ...
>
> > --- interrupt: 900
> > se->sched_delayed
> > WARNING: CPU: 1 PID: 27867 at kernel/sched/fair.c:6062 unthrottle_cfs_rq+0x644/0x660
>
> ...that warning also spells eventual doom for the box; here it does
> anyway. Running LTP's cfs_bandwidth01 testcase and hackbench together,
> the box grinds to a halt in pretty short order.
>

Right, I've picked up your patch for sched/urgent. But this does make me
question Vishal's setup.

He said all he does is compile a kernel, but afaik no regular setup uses
CFS bandwidth by default. So something is 'special' at his end that he's
not been telling us about.

Vishal, could you expand upon your configuration? How come you're using
CFS bandwidth, and what else is special?
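
One way to answer the configuration question is to look for cgroups that
have a CPU quota set. The following is a minimal sketch, not something
posted in this thread. It assumes a cgroup v2 hierarchy mounted at
/sys/fs/cgroup, where each cgroup's cpu.max file holds "<quota> <period>"
in microseconds and a quota of "max" means no bandwidth limit. The helper
names parse_cpu_max and throttled_cgroups are made up for illustration.

```python
import os


def parse_cpu_max(text):
    """Parse the contents of a cgroup v2 cpu.max file.

    Returns (quota_us, period_us). quota_us is None when the file says
    "max", i.e. no CFS bandwidth limit is configured for that cgroup.
    """
    fields = text.split()
    quota = None if fields[0] == "max" else int(fields[0])
    # The kernel prints both fields; fall back to the documented default
    # period only if the second field is missing.
    period = int(fields[1]) if len(fields) > 1 else 100000
    return quota, period


def throttled_cgroups(root="/sys/fs/cgroup"):
    """Yield (path, quota_us, period_us) for each cgroup with a CPU quota."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "cpu.max" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "cpu.max")) as f:
                quota, period = parse_cpu_max(f.read())
        except (OSError, ValueError, IndexError):
            # Unreadable or unexpected contents: skip this cgroup.
            continue
        if quota is not None:
            yield dirpath, quota, period


if __name__ == "__main__":
    for path, quota, period in throttled_cgroups():
        print(f"{path}: quota={quota}us period={period}us")
```

Any path it prints is a cgroup with CFS bandwidth control active; no
output means no quota is set anywhere under the assumed mount point. A
cgroup v1 setup would need to read cpu.cfs_quota_us instead, which this
sketch does not cover.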