Re: [PATCH] [RFC]: sched/deadline: Avoid double enqueue_pushable_dl_task() warning
From: John Stultz
Date: Fri Mar 06 2026 - 18:45:36 EST
On Wed, Mar 4, 2026 at 1:51 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, Mar 04, 2026 at 08:06:53AM +0100, Juri Lelli wrote:
> > On 03/03/26 19:41, John Stultz wrote:
> > > In testing with the full Proxy Execution patch stack, I found
> > > I would occasionally trip over the !RB_EMPTY_NODE() WARN_ON in
> > > enqueue_pushable_dl_task(), where the task we're adding to the
> > > pushable list is already enqueued.
> > >
> > > This triggers from put_prev_task_dl(), where it seems we go into
> > > put_prev_task_dl()
> > > -> update_curr_dl()
> > > -> update_curr_dl_se() [hitting the dl_runtime_exceeded() case]
> > > -> enqueue_task_dl()
> > > -> enqueue_pushable_dl_task()
> > >
> > > Adding the task to the pushable the first time.
> >
> > Ah, so in case the task is boosted (or we fail to start the
> > replenishment timer).
> >
> > > Then we back up the call stack to put_prev_task_dl(), which at
> > > the end again calls enqueue_pushable_dl_task(), trying to add it
> > > a second time, tripping the warning.
> > >
> > > To avoid this, add a dl_task_pushable() helper which we can use
> > > to replace the RB_EMPTY_NODE checks elsewhere, and then before
> > > enqueueing in put_prev_task_dl(), we can first check
> > > dl_task_pushable() to avoid the double enqueue.
> >
> > Can't we just return early (as we do already in dequeue_pushable
> > _dl_task()) in enqueue_pushable_dl_task() instead of checking before
> > calling that function?
>
> So I was mightily confused for a moment by all this.
>
> But it turns out DL keeps current *in* the tree; since ->deadline is a
> lot less mutable than ->vruntime this is possible.
>
> But that also means that set_next_task() / put_prev_task() are
> 'simpler'.
>
> However, in this case I think they are too simple, and its leading to
> problems.
>
> So update_curr_dl() needs to dequeue+enqueue because it is pushing
> ->deadline (because current is in tree). Then because of that, it also
> ends up doing enqueue_pushable, but that is actively wrong, you must not
> do that on current.
>
> Only once you're doing put_prev_task() must you do that.
>
> Imagine this happens because of an update_curr() that is not from
> put_prev_task(), then you end up with current on the pushable list,
> which is a big no-no.
>
> So I think I'd like to see dl_rq->curr tracking, similar to what we have
> for fair.
>
> If you do that (see below), you'll probably find this case was already
> handled 'correctly' but was broken by the PE thing ;-)
>
> Note the !task_current() clause in enqueue_task_dl() guarding
> enqueue_pushable_dl_task().
>
> Hmm?
So yes, you're right it is really a proxy-exec thing. Apologies for my
confusion. :)
After digging further, the specific case I was hitting it with was
with the full proxy stack, when the lock owner releases the lock and
wake's the donor.
In try_to_wake_up() we'd hit the proxy_needs_return(), which since
we're waking the donor, we proxy_resched_idle() to temporarily make
idle the donor, and call put_prev_task() on rq->donor.
So it was then in put_prev_task_dl() that we'd update_curr_dl(), do
the dequeue+enqueue and since its the donor, not task_current() we'd
get added to the pushable list. Then at the end of put_prev_task_dl()
we'd again call enqueue_pushable_dl() and hit the warning as the
pushable_dl_task node isn't empty.
And because without proxy-exec put_prev_task() is only called on
current, the task_current() check in enqueue_task_dl() would prevent
the first enqueue_pushable_dl().
Now your patch does resolve this for proxy-exec, as since the donor
will be dl_rq->curr we'll skip enquing it as pushable from
update_curr_dl(), and we clear dl_rq->curr right afterwards so that we
only make it pushable once.
That said, I had to make some modifications:
1) As mentioned earlier, I needed to move the dl_rq->curr assignment
up before the !first check in set_next_task_dl(), so that we properly
set the curr value when tasks switches to DL (via
__sched_setscheduler)
2) I needed to still preserve the task_current() check in
enqueue_task_dl() that you dropped (replacing with the dl_rq->curr
check), as we'd trip a similar problem when the current task was
switched to DL, since the rq->curr would be enqueued to DL and since
set_next_task() hasn't run (and won't if rq->curr isn't the donor as
well), rq_dl->curr would go unset and it would be made pushable.
With those two tweaks it has been running ok in testing.
I've also split out your cleanups into a separate patch just so the
functional change is easier to review.
Since this is proxy-exec specific, I'll include these both in my next
submission of the donor-migration logic.
Let me know if you think I'm still missing anything subtle here.
thanks
-john