Re: [PATCH v2] sched: Warn on long periods of pending need_resched

From: Peter Zijlstra
Date: Wed Mar 24 2021 - 08:16:12 EST


On Wed, Mar 24, 2021 at 11:42:24AM +0000, Mel Gorman wrote:
> On Wed, Mar 24, 2021 at 11:54:24AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 24, 2021 at 10:37:43AM +0100, Peter Zijlstra wrote:
> > > Should we perhaps take out all SCHED_DEBUG sysctls and move them to
> > > /debug/sched/ ? (along with the existing /debug/sched_{debug,features,preempt}
> > > files)
> > >
> > > Having all that in sysctl and documented gives them far too much sheen
> > > of ABI.
> >
> > ... a little something like this ...
> >
>
> I did not read this particularly carefully or boot it to check but some
> of the sysctls moved are expected to exist and should never have
> been under SCHED_DEBUG.
>
> For example, I'm surprised that numa_balancing is under the SCHED_DEBUG
> sysctl because there are legitimate reasons to disable that at runtime.
> For example, HPC clusters running various workloads may disable NUMA
> balancing globally for particular jobs without wanting to reboot and
> reenable it when finished.

Yeah, let's say I was pleasantly surprised to find it there :-)

> Moving something like sched_min_granularity_ns will break a number of
> tuning guides as well as the "tuned" tool which ships by default with
> some distros and I believe some of the default profiles used for tuned
> tweak kernel.sched_min_granularity_ns

Yeah, can't say I care. I suppose some people with PREEMPT=n kernels
increase that to make their server workloads 'go fast'. But it'll
absolutely suck rock on anything desktop.

These knobs really shouldn't have been as widely available as they are.

And guides, well, the writers have to earn a living too, right?

> Whether there are legitimate reasons to modify those values or not,
> removing them may generate fun bug reports.

Which I'll close with -EDONTCARE, userspace has to cope with
SCHED_DEBUG=n in any case.
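
As a rough illustration of what "coping" could mean for a tuning tool, here
is a minimal userspace sketch that treats a missing knob as a normal
condition rather than an error. read_knob() and knob_or_default() are
hypothetical names made up for this example, and the path passed in is
whatever file the tool was reading before; nothing here is an existing
interface.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Read an integer tunable from a procfs/debugfs style file.
 * Returns 0 and stores the value in *out on success, or a negative
 * errno (e.g. -ENOENT when the running kernel does not expose the file).
 */
static int read_knob(const char *path, long long *out)
{
	FILE *f = fopen(path, "r");
	int err;

	if (!f)
		return -errno;

	err = (fscanf(f, "%lld", out) == 1) ? 0 : -EINVAL;
	fclose(f);
	return err;
}

/* Hypothetical helper: use the tunable if it exists, else a fallback. */
static long long knob_or_default(const char *path, long long fallback)
{
	long long v;

	return read_knob(path, &v) == 0 ? v : fallback;
}
```

A caller would then do something like
knob_or_default("/proc/sys/kernel/sched_min_granularity_ns", fallback)
and carry on whether or not the file is there.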