[RFC PATCH 0/1] sched/fair: Feature to suppress Fair Server for NOHZ_FULL isolation
From: Aaron Tomlin
Date: Mon Jan 05 2026 - 22:42:28 EST
Hi Ingo, Peter, Juri, Vincent,
This patch introduces a new scheduler feature, RT_SUPPRESS_FAIR_SERVER,
designed to ensure strict NOHZ_FULL isolation for SCHED_FIFO workloads,
particularly in the presence of resident CFS tasks.
In strictly partitioned, latency-critical environments (such as High
Frequency Trading platforms) administrators frequently employ fully
adaptive-tick CPUs to execute pinned SCHED_FIFO workloads. The fundamental
requirement is "zero OS noise"; specifically, the scheduler tick must
remain stopped on those CPUs. This is permissible because standard
SCHED_FIFO semantics dictate no forced preemption between tasks of
identical priority.
However, the extant "Fair Server" (Deadline Server) architecture
compromises this isolation guarantee. At present, should a background
SCHED_OTHER task be enqueued, the scheduler initiates the Fair Server
(dl_server_start()). As the Fair Server functions as a SCHED_DEADLINE entity,
its activation increments rq->dl.dl_nr_running.
This condition compels sched_can_stop_tick() to return false, thereby
restarting the periodic tick to enforce the server's runtime.
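
To illustrate the ordering of checks described above, here is a minimal
userspace sketch. It is not the kernel's sched_can_stop_tick(); struct
rq_model, its fields and can_stop_tick_model() are hypothetical names
invented for this example, and SCHED_RR handling is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-CPU runqueue counters. */
struct rq_model {
	unsigned int dl_nr_running;   /* deadline entities, including an active Fair Server */
	unsigned int fifo_nr_running; /* runnable SCHED_FIFO tasks */
	unsigned int cfs_nr_running;  /* runnable SCHED_OTHER tasks */
};

/*
 * Simplified model of the tick-stop decision (assumption: SCHED_RR is
 * ignored, and only the three counters above matter).
 */
static bool can_stop_tick_model(const struct rq_model *rq)
{
	assert(rq != NULL);

	/* Any deadline entity, even a single one, keeps the tick running. */
	if (rq->dl_nr_running)
		return false;

	/* FIFO tasks are never preempted by the tick, so it may stop. */
	if (rq->fifo_nr_running)
		return true;

	/* Only fair tasks remain: more than one needs the tick. */
	return rq->cfs_nr_running <= 1;
}
```

In this model, a runqueue with one FIFO task and one queued fair task may
stop the tick only while dl_nr_running is zero; once the Fair Server is
counted there, the first check wins and the tick is kept.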
To address this, the patch introduces a new scheduler feature control,
RT_SUPPRESS_FAIR_SERVER.
When enabled, the feature amends enqueue_task_fair() to forgo the
invocation of dl_server_start() if, and only if, both of the following
conditions are met:
1. A Real-Time task (SCHED_FIFO/SCHED_RR) is currently in execution; and
2. RT bandwidth enforcement (rt_bandwidth_enabled()) is inactive.
By precluding the server's initiation, rq->dl.dl_nr_running is maintained
at zero. This permits the tick logic to defer to the standard SCHED_FIFO
protocol, thereby ensuring the tick remains suppressed.
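
The gating condition can be sketched in userspace as follows. This is a
model of the rule as stated in this cover letter, not the patch itself;
struct cpu_model, enum policy_model and both helper functions are
hypothetical names, and the flags merely stand in for the feature bit and
for the result of rt_bandwidth_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy tags for the task currently on the CPU. */
enum policy_model { POLICY_OTHER, POLICY_FIFO, POLICY_RR };

/* Hypothetical per-CPU state; none of these names exist in the kernel. */
struct cpu_model {
	enum policy_model curr_policy;     /* policy of the running task */
	bool feat_rt_suppress_fair_server; /* stands in for the feature flag */
	bool rt_bandwidth_enabled;         /* stands in for rt_bandwidth_enabled() */
	bool fair_server_active;           /* has the server been started? */
	unsigned int dl_nr_running;        /* deadline entities on this CPU */
};

/* Returns false only when the feature is on and both conditions hold. */
static bool should_start_fair_server(const struct cpu_model *c)
{
	assert(c != NULL);

	bool rt_running = c->curr_policy == POLICY_FIFO ||
			  c->curr_policy == POLICY_RR;

	if (c->feat_rt_suppress_fair_server && rt_running &&
	    !c->rt_bandwidth_enabled)
		return false;

	return true;
}

/* Models the enqueue of one fair task onto this CPU. */
static void enqueue_fair_model(struct cpu_model *c)
{
	if (should_start_fair_server(c) && !c->fair_server_active) {
		c->fair_server_active = true;
		c->dl_nr_running++; /* the server is counted as a deadline entity */
	}
}
```

With the flag set, a FIFO task running and RT bandwidth enforcement off,
enqueue_fair_model() leaves dl_nr_running at zero; clearing any one of the
three inputs lets the server start as it does today.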
Considerations: This serves as a precision instrument for specialised
contexts. It explicitly prioritises determinism over fairness. Whilst
enabled, queued CFS tasks shall endure total starvation until such time as
the RT task voluntarily yields. I believe this is acceptable for
partitioned architectures where housekeeping duties are allocated to
alternative cores; however, I have guarded this capability within
CONFIG_NO_HZ_FULL and a default-disabled feature flag to obviate the risk
of inadvertent starvation on general-purpose systems.
I welcome your thoughts on this approach.
Aaron Tomlin (1):
  sched/fair: Introduce RT_SUPPRESS_FAIR_SERVER to optimise NOHZ_FULL
    isolation

 kernel/sched/fair.c     | 19 ++++++++++++++++++-
 kernel/sched/features.h |  9 +++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)
--
2.51.0