Re: [PATCH] sched/deadline: Reject debugfs dl_server writes for offline CPUs

From: Peter Zijlstra

Date: Fri May 29 2026 - 05:17:45 EST


On Fri, May 29, 2026 at 09:09:30AM +0200, Andrea Righi wrote:
> Hi Peter,
>
> On Tue, May 26, 2026 at 02:07:53PM +0200, Juri Lelli wrote:
> > Hi Andrea,
> >
> > On 26/05/26 12:05, Andrea Righi wrote:
> > > Writing runtime or period via the per-CPU dl_server debugfs files
> > > (/sys/kernel/debug/sched/{fair,ext}_server/cpu*/{runtime,period}) on an
> > > offline CPU can trigger two distinct kernel issues:
> > >
> > > 1) Divide-by-zero in dl_server_apply_params():
> > >
> > > Oops: divide error: 0000 [#1] SMP NOPTI
> > > RIP: 0010:dl_server_apply_params+0x239/0x3a0
> > > Call Trace:
> > > sched_server_write_common.isra.0+0x21a/0x3c0
> > > full_proxy_write+0x78/0xd0
> > > vfs_write+0xe7/0x6e0
> > >
> > > Both __dl_sub() and __dl_add() divide by cpus internally, which can be
> > > 0 once the CPU has been removed from any active root-domain span (this
> > > has been latent since the debugfs interface was introduced).
> > >
> > > 2) WARN_ON_ONCE in dl_server_start():
> > >
> > > WARNING: kernel/sched/deadline.c:1805 at dl_server_start+0x232/0x270
> > >
> > > Commit ee6e44dfe6e5 ("sched/deadline: Stop dl_server before CPU goes
> > > offline") added this check to catch enqueueing the server on an
> > > offline rq.
> > >
> > > There are no meaningful semantics for re-configuring the per-CPU dl_server
> > > bandwidth while the CPU is offline, so simply reject the write with
> > > -EBUSY so userspace gets a clear error.
> > >
> > > Reported-by: Sashiko <sashiko-bot@xxxxxxxxxx>
> > > Closes: https://lore.kernel.org/all/20260526092228.3B6891F00A3A@xxxxxxxxxxxxxxx/
> > > Fixes: d741f297bcea ("sched/fair: Fair server interface")
> > > Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>
> > > ---
> > > kernel/sched/debug.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> > > index ed3a0d65da0ca..e57ad8c78a60e 100644
> > > --- a/kernel/sched/debug.c
> > > +++ b/kernel/sched/debug.c
> > > @@ -415,6 +415,9 @@ static ssize_t sched_server_write_common(struct file *filp, const char __user *u
> > > return -EINVAL;
> > > }
> > >
> > > + if (!cpu_online(cpu_of(rq)))
> > > + return -EBUSY;
> > > +
> > > update_rq_clock(rq);
> > > dl_server_stop(dl_se);
> > > retval = dl_server_apply_params(dl_se, runtime, period, 0);
> >
> > I was looking at the Sashiko findings and wondered what to do about this as
> > well. I think what you are proposing should be fine, unless for some
> > reason one wants to tweak dl-server parameters before switching a CPU
> > on. But since hotplug is a disruptive operation already, I would say
> > requiring such a change to be made after the CPU is online should be ok
> > (and simpler to get right from a bandwidth accounting pov).
> >
> > Reviewed-by: Juri Lelli <juri.lelli@xxxxxxxxxx>
>
> If this makes sense to you, could you add it to your queue:sched/core?
>
> Otherwise it's possible to trigger the issues above by changing dl_server
> bandwidth for offline CPUs.

Right, so I had seen these patches fly by, but then I couldn't find it
in a hurry when I took these other patches :/.

It is a bit unfortunate, but oh well. I'm sure people will try and fix
it if they're too annoyed by this behaviour and we can revisit things
then.

So yes, let me go add this to the other two.