Re: [PATCH RFC] sched: Disable DL server if sysctl_sched_rt_runtime is -1

From: Joel Fernandes
Date: Wed Mar 05 2025 - 14:56:39 EST


On Wed, Mar 05, 2025 at 09:30:33AM +0000, Juri Lelli wrote:
> Hi Joel,
>
> On 04/03/25 18:47, Joel Fernandes wrote:
> > On Tue, Mar 04, 2025 at 03:06:32PM -0500, Steven Rostedt wrote:
> > > On Tue, 4 Mar 2025 15:01:16 -0500
> > > Joel Fernandes <joelagnelf@xxxxxxxxxx> wrote:
> > >
> > > > Currently, RCU boost testing in rcutorture is broken because it relies on
> > > > having RT throttling disabled. This means the test will always pass (or
> > > > rarely fail). This occurs because RT throttling was recently replaced
> > > > by the DL server, which boosts CFS tasks even when rcutorture tries to
> > > > disable throttling (see rcu_torture_disable_rt_throttle()).
> > > >
> > > > Therefore this patch prevents the DL server from starting when rcutorture
> > > > sets sysctl_sched_rt_runtime to -1.
> > > >
> > > > With this patch, boosting in TREE09 fails in more than 50% of boost
> > > > attempts, making the test more useful.
> > > >
> > > > Also add a check of this to task_non_contending() because otherwise it
> > > > throws a warning (in the case when DL server was already started before
> > > > rcutorture started).
> > > >
> > >
> > > Hmm, I wonder if dl_server caused a regression. That is, disabling rt
> > > throttling should allow RT tasks to starve anything it wants. And some RT
> > > applications rely on this.
> > >
> > > Should this include a Fixes and Cc stable?
> >
> > Yeah that makes sense to me, I'll include the Fixes tag in the v2.
>
> Not entirely sure we want to link the (legacy?) sched_rt_runtime
> interface to DL server, as it has its own new interface at
>
> /sys/kernel/debug/sched/fair_server/cpuX/*
>
> Admittedly though the latter is a debug interface, which is not ideal.
>
> I was thinking we might want/need to add a kernel cmdline parameter to
> tweak DL server parameters at boot (and possibly disable it), but it is
> indeed less flexible than an interface tweakable at runtime.
>
> If we end up using sched_rt_runtime (and _period) to tweak DL server I
> believe we should make sure changes are consistent with the debug
> interface at least.

We could also do it the following way: if RT bandwidth is disabled, just
don't start the DL server from CFS at all. This makes the RCU tests work and
also addresses the issue Steven raised, and it does not require messing with
the DL server interfaces. Thoughts?
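
To make the gating concrete, here is a minimal user-space sketch of the start
condition (not kernel code). It assumes rt_bandwidth_enabled() reduces to
checking that sysctl_sched_rt_runtime is non-negative, which is how I read
kernel/sched/sched.h. should_start_fair_server() is a hypothetical helper
that exists only for illustration, and 950000 stands in for the usual default
runtime in microseconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel variable; -1 means RT throttling is disabled. */
static int sysctl_sched_rt_runtime = 950000;

/* Assumed to mirror the kernel helper: enabled when runtime is >= 0. */
static bool rt_bandwidth_enabled(void)
{
	return sysctl_sched_rt_runtime >= 0;
}

/*
 * Hypothetical helper (not in the kernel): the condition under which the
 * fair server would be started when fair tasks are queued on a runqueue.
 * "before" and "after" are the h_nr_queued counts around the enqueue.
 */
static bool should_start_fair_server(unsigned int h_nr_queued_before,
				     unsigned int h_nr_queued_after)
{
	/* Start only on the 0 -> non-zero transition, and only if RT bw is on. */
	return !h_nr_queued_before && h_nr_queued_after &&
	       rt_bandwidth_enabled();
}
```

With the default runtime, should_start_fair_server(0, 1) is true. Once
sysctl_sched_rt_runtime is set to -1, as rcutorture does, it is false, so the
fair server is never started from the enqueue path.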

I'll send a v2 along these lines as well, since Paul is also testing and I'd
like him to apply the patch. But here's a preview:

---8<-----------------------

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1c0ef435a7aa..d7ba333393f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1242,7 +1242,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 		 *    against fair_server such that it can account for this time
 		 *    and possibly avoid running this period.
 		 */
-		if (dl_server_active(&rq->fair_server))
+		if (dl_server_active(&rq->fair_server) && rt_bandwidth_enabled())
 			dl_server_update(&rq->fair_server, delta_exec);
 	}

@@ -5957,7 +5957,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	sub_nr_running(rq, queued_delta);

 	/* Stop the fair server if throttling resulted in no runnable tasks */
-	if (rq_h_nr_queued && !rq->cfs.h_nr_queued)
+	if (rq_h_nr_queued && !rq->cfs.h_nr_queued && dl_server_active(&rq->fair_server))
 		dl_server_stop(&rq->fair_server);
 done:
 	/*
@@ -6056,7 +6056,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	}

 	/* Start the fair server if un-throttling resulted in new runnable tasks */
-	if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
+	if (!rq_h_nr_queued && rq->cfs.h_nr_queued && rt_bandwidth_enabled())
 		dl_server_start(&rq->fair_server);

 	/* At this point se is NULL and we are at root level*/
@@ -7005,9 +7005,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)

 	if (!rq_h_nr_queued && rq->cfs.h_nr_queued) {
 		/* Account for idle runtime */
-		if (!rq->nr_running)
+		if (!rq->nr_running && rt_bandwidth_enabled())
 			dl_server_update_idle_time(rq, rq->curr);
-		dl_server_start(&rq->fair_server);
+
+		if (rt_bandwidth_enabled())
+			dl_server_start(&rq->fair_server);
 	}

 	/* At this point se is NULL and we are at root level*/
@@ -7134,7 +7136,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)

 	sub_nr_running(rq, h_nr_queued);

-	if (rq_h_nr_queued && !rq->cfs.h_nr_queued)
+	if (rq_h_nr_queued && !rq->cfs.h_nr_queued && dl_server_active(&rq->fair_server))
 		dl_server_stop(&rq->fair_server);

 	/* balance early to pull high priority tasks */
--
2.43.0