Re: [PATCH V2] sched/fair: Dequeue sched_delayed tasks when waking to a busy CPU
From: Phil Auld
Date: Mon Dec 02 2024 - 11:24:49 EST
On Sat, Nov 23, 2024 at 09:44:40AM +0100 Mike Galbraith wrote:
> On Thu, 2024-11-21 at 06:56 -0500, Phil Auld wrote:
> > On Wed, Nov 20, 2024 at 07:37:39PM +0100 Mike Galbraith wrote:
> > > On Tue, 2024-11-19 at 12:51 +0100, Mike Galbraith wrote:
> > > > On Tue, 2024-11-19 at 06:30 -0500, Phil Auld wrote:
> > > > >
> > > > > This, below, by itself, did not do help and caused a small slowdown on some
> > > > > other tests. Did this need to be on top of the wakeup change?
> > > >
> > > > No, that made a mess.
> > >
> > > Rashly speculating that turning mobile kthread component loose is what
> > > helped your write regression...
> > >
> > > You could try adding (p->flags & PF_KTHREAD) to the wakeup patch to
> > > only turn hard working kthreads loose to try to dodge service latency.
> > > Seems unlikely wakeup frequency * instances would combine to shred fio
> > > the way turning tbench loose did.
> > >
> >
> > Thanks, I'll try that.
>
> You may still want to try that, but my box says probably not. Playing
> with your write command line, the players I see are pinned kworkers and
> mobile fio instances.
>
Yep. The PF_KTHREAD thing did not help.
> Maybe try the below instead. The changelog is obsolete BS unless you
> say otherwise, but while twiddled V2 will still migrate tbench a bit,
> and per trace_printk() does still let all kinds of stuff wander off to
> roll the SIS dice, it does not even scratch the paint of the formerly
> obliterated tbench progression.
>
Will give this one a try when I get caught up from being off all week for
US turkey day.
Thanks!
> Question: did wiping off the evil leave any meaningful goodness behind?
Is that for this patch?
If you mean for the original patch (which subsequently broke the reads) then
no. It was more or less even for all the other tests. It fixed the randwrite
issue by moving it to randread. Everything else we run regularly was about
the same. So no extra goodness to help decide :)
Cheers,
Phil
>
> ---
>
> sched/fair: Dequeue sched_delayed tasks when waking to a busy CPU
>
> Phil Auld (Redhat) reported an fio benchmark regression having been found
> to have been caused by addition of the DELAY_DEQUEUE feature, suggested it
> may be related to wakees losing the ability to migrate, and confirmed that
> restoration of same indeed did restore previous performance.
>
> V2: do not rip buddies apart, convenient on/off switch
>
> Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
> Signed-off-by: Mike Galbraith <efault@xxxxxx>
> ---
> kernel/sched/core.c | 51 ++++++++++++++++++++++++++++++------------------
> kernel/sched/features.h | 5 ++++
> kernel/sched/sched.h | 5 ++++
> 3 files changed, 42 insertions(+), 19 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3783,28 +3783,41 @@ ttwu_do_activate(struct rq *rq, struct t
> */
> static int ttwu_runnable(struct task_struct *p, int wake_flags)
> {
> - struct rq_flags rf;
> - struct rq *rq;
> - int ret = 0;
> -
> - rq = __task_rq_lock(p, &rf);
> - if (task_on_rq_queued(p)) {
> - update_rq_clock(rq);
> - if (p->se.sched_delayed)
> - enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
> - if (!task_on_cpu(rq, p)) {
> - /*
> - * When on_rq && !on_cpu the task is preempted, see if
> - * it should preempt the task that is current now.
> - */
> - wakeup_preempt(rq, p, wake_flags);
> + CLASS(__task_rq_lock, rq_guard)(p);
> + struct rq *rq = rq_guard.rq;
> +
> + if (!task_on_rq_queued(p))
> + return 0;
> +
> + update_rq_clock(rq);
> + if (p->se.sched_delayed) {
> + int queue_flags = ENQUEUE_DELAYED | ENQUEUE_NOCLOCK;
> + int dequeue = sched_feat(DEQUEUE_DELAYED);
> +
> + /*
> + * Since sched_delayed means we cannot be current anywhere,
> + * dequeue it here and have it fall through to the
> + * select_task_rq() case further along in ttwu() path.
> + * Note: Do not rip buddies apart else chaos follows.
> + */
> + if (dequeue && rq->nr_running > 1 && p->nr_cpus_allowed > 1 &&
> + !(rq->curr->last_wakee == p || p->last_wakee == rq->curr)) {
> + dequeue_task(rq, p, DEQUEUE_SLEEP | queue_flags);
> + return 0;
> }
> - ttwu_do_wakeup(p);
> - ret = 1;
> +
> + enqueue_task(rq, p, queue_flags);
> + }
> + if (!task_on_cpu(rq, p)) {
> + /*
> + * When on_rq && !on_cpu the task is preempted, see if
> + * it should preempt the task that is current now.
> + */
> + wakeup_preempt(rq, p, wake_flags);
> }
> - __task_rq_unlock(rq, &rf);
> + ttwu_do_wakeup(p);
>
> - return ret;
> + return 1;
> }
>
> #ifdef CONFIG_SMP
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -47,6 +47,11 @@ SCHED_FEAT(CACHE_HOT_BUDDY, true)
> * DELAY_ZERO clips the lag on dequeue (or wakeup) to 0.
> */
> SCHED_FEAT(DELAY_DEQUEUE, true)
> +/*
> + * Free ONLY non-buddy delayed tasks to wakeup-migrate to avoid taking.
> + * an unnecessary latency hit. Rending buddies asunder inflicts harm.
> + */
> +SCHED_FEAT(DEQUEUE_DELAYED, true)
> SCHED_FEAT(DELAY_ZERO, true)
>
> /*
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1783,6 +1783,11 @@ task_rq_unlock(struct rq *rq, struct tas
> raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
> }
>
> +DEFINE_LOCK_GUARD_1(__task_rq_lock, struct task_struct,
> + _T->rq = __task_rq_lock(_T->lock, &_T->rf),
> + __task_rq_unlock(_T->rq, &_T->rf),
> + struct rq *rq; struct rq_flags rf)
> +
> DEFINE_LOCK_GUARD_1(task_rq_lock, struct task_struct,
> _T->rq = task_rq_lock(_T->lock, &_T->rf),
> task_rq_unlock(_T->rq, _T->lock, &_T->rf),
>
--