Re: [PATCH 1/1] sched/deadline: Log Fair Server re-enablement for symmetry with debugfs

From: Shrikanth Hegde

Date: Mon Jan 12 2026 - 13:50:05 EST

On 1/12/26 8:02 PM, Aaron Tomlin wrote:

On Mon, Jan 12, 2026 at 10:44:03AM +0530, K Prateek Nayak wrote:

I believe the suggested solution to that was to trace the reason for the
kthread/fair task waking up on isolated CPUs and prevent the wakeup if
it is for some unnecessary operation as opposed to disabling the fair
server.

Hi Prateek,

We have tools like https://docs.kernel.org/trace/osnoise-tracer.html to
capture this noise. Trace the noise, bring the case where isolation
is broken on the current *upstream* kernel to the mailing list, and we
can solve it for everyone instead of disabling the fair server as duct
tape.

Thank you for your insights.

I fully concur that, in an ideal world, the "correct" solution is
invariably to identify and eliminate the root cause of any spurious
SCHED_NORMAL wakeups on isolated CPUs. Tools such as the osnoise tracer are
indeed invaluable for this pursuit.

However, I would respectfully submit that there remains a distinction
between the theoretical purity of the kernel and the pragmatic reality of
managing highly specialised, latency-critical partitions.

It is pertinent to note that the kernel currently affords users the
capability to manually modify the Fair Server's parameters via
/sys/kernel/debug/sched/fair_server/. As this resides within debugfs, it
is, by definition, a debug-only interface and not strictly considered
"production safe" or guaranteed to be free from side effects. The capacity
for a user to destabilise their system via this interface - effectively
"shooting themselves in the foot" - already exists. This existing interface
is useful for educated users who are willing to accept full accountability
for system stability in exchange for absolute determinism for a defined
period of time.

Juri, Peter, is changing the fair server's bandwidth frequently a very
common scenario in the field?

If not, can we add a pr_warn() for when the fair server's parameters
are changed by userspace, just to catch any absurd values that
reduce the bandwidth to a minimum without disabling the server?

I can do something absolutely stupid like this without dmesg logging
anything that would indicate I'm being stupid:

# echo 4000000000 > /sys/kernel/debug/sched/fair_server/cpu0/period
# echo 1 > /sys/kernel/debug/sched/fair_server/cpu0/runtime
# sudo taskset -c 0 chrt -r 99 ~/scripts/loop&
# taskset -c 0 bash -c 'mkdir /sys/fs/cgroup/cg0; echo $$ > /sys/fs/cgroup/cg0/cgroup.procs;'

... wait for a while

INFO: task bash:4272 blocked for more than 120 seconds.
Not tainted 6.19.0-rc1-tip+ #162
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:bash state:D stack:0 pid:4272 tgid:4272 ppid:4271 task_flags:0x400100 flags:0x00080000

A taint might be too far but a log should be acceptable?

Regarding your valid concern about visibility and safety: I am agreeable to
hardening the observability of such changes. In the next iteration, I
propose to introduce a pr_warn() that triggers whenever the Fair Server's
runtime or period is modified from its default value (50 * NSEC_PER_MSEC
and 1000 * NSEC_PER_MSEC). This will ensure that any deviation - whether it
be a complete disablement or a reduction to unsafe levels - is clearly
logged, rightfully alerting administrators to the non-standard
configuration without removing the latitude required by those who
explicitly need to make that trade-off.

Currently it is 5%. It is going to be tricky to define unsafe levels.
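
For reference, the 5% figure follows from the default runtime and period
quoted above (50 ms out of every 1000 ms). A minimal sketch of that
arithmetic is below; bandwidth_ppm is a hypothetical helper name, and
parts per million are used only to stay within integer shell arithmetic:

```shell
#!/bin/sh
# bandwidth_ppm RUNTIME_NS PERIOD_NS
# Hypothetical helper: prints the share of CPU time reserved for the
# fair server in parts per million (10000 ppm = 1%).
bandwidth_ppm() {
    runtime_ns=$1
    period_ns=$2
    echo $(( runtime_ns * 1000000 / period_ns ))
}

bandwidth_ppm 50000000 1000000000   # defaults: 50 ms per 1000 ms
bandwidth_ppm 1 4000000000          # runtime/period pair from the example above
```

The defaults come out at 50000 ppm (5%), while 1 ns of runtime per 4 s
period rounds down to 0 ppm. Anything in between is a judgment call, which
is why a fixed "unsafe" threshold is hard to pick.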

It looks like one either wants it or doesn't want any interference from it.
Are there any users changing the default value?
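
Until the kernel logs such changes, a rough userspace check can show
whether a given system deviates from the defaults. The sketch below is only
an approximation of the proposed pr_warn(), not the patch itself. It assumes
each runtime and period file under the debugfs directory named earlier in
the thread holds a single integer in nanoseconds, and report_non_default is
a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: list CPUs whose fair server settings differ from the defaults
# quoted in this thread (runtime 50 ms, period 1000 ms).
# Assumption: each runtime/period file holds one integer, in nanoseconds.
DEFAULT_RUNTIME_NS=50000000
DEFAULT_PERIOD_NS=1000000000

# report_non_default [DIR]
# Hypothetical helper; DIR defaults to the debugfs directory from the thread.
report_non_default() {
    dir=${1:-/sys/kernel/debug/sched/fair_server}
    for cpu in "$dir"/cpu*; do
        # Skip entries that are missing or unreadable (e.g. not root).
        [ -r "$cpu/runtime" ] && [ -r "$cpu/period" ] || continue
        runtime=$(cat "$cpu/runtime")
        period=$(cat "$cpu/period")
        if [ "$runtime" != "$DEFAULT_RUNTIME_NS" ] ||
           [ "$period" != "$DEFAULT_PERIOD_NS" ]; then
            echo "${cpu##*/}: runtime=$runtime period=$period"
        fi
    done
}

report_non_default
```

Reading debugfs normally requires root. An empty result means every
readable CPU entry still carries the default runtime and period.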