Re: [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task()

From: Peter Zijlstra
Date: Fri Nov 29 2024 - 06:45:35 EST


On Fri, Nov 29, 2024 at 06:37:06PM +0800, Adam Li wrote:
> On 11/29/2024 6:18 PM, Peter Zijlstra wrote:
> > On Fri, Nov 29, 2024 at 11:15:41AM +0100, Peter Zijlstra wrote:
> >> On Fri, Nov 29, 2024 at 10:55:00AM +0100, Peter Zijlstra wrote:
> >>
> >>> Anyway.. I'm sure I started a patch series cleaning up the whole next
> >>> buddy thing months ago (there's more problems here), but I can't seem to
> >>> find it in a hurry :/
> >>
> >> There was this..
> >
> > And this I think.
> >
> > Adam, what was the reason you were enabling NEXT_BUDDY in the first
> > place?
> >
> Hi Peter,
>
> I am tuning Specjbb critical-jOPS, which is latency sensitive.

There is a lot to latency; sometimes it's best not to preempt. I think
Prateek has found a fair number of workloads where SCHED_BATCH has been
helpful.
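
For reference, a minimal userspace sketch of opting a task into
SCHED_BATCH, assuming Linux with glibc. switch_to_batch() is a
hypothetical helper name used only for illustration; it is not something
from this thread or from the kernel tree.

```c
#define _GNU_SOURCE          /* exposes SCHED_BATCH in <sched.h> on glibc */
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: move the calling thread to SCHED_BATCH.
 * Returns 0 on success, -1 on failure (errno is set by the syscall).
 */
static int switch_to_batch(void)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = 0;	/* must be 0 for non-realtime policies */

	/* pid 0 means "the calling thread" */
	if (sched_setscheduler(0, SCHED_BATCH, &param) == -1) {
		fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}
```

Calling switch_to_batch() early in a worker thread, then checking
sched_getscheduler(0), is one way to confirm the policy took effect.
Whether it helps is workload dependent, as with everything else here.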

> NEXT_BUDDY affects schedule latency so I tried to enable NEXT_BUDDY.
> However Specjbb critical-jOPS drops with NEXT_BUDDY enabled (after my patch fixing panic).

Yes, picking outside of the EEVDF policy can make worse decisions for
latency.

The yield_to_task() can help performance for KVM (the only user AFAIK
-- oh DMA fences seem to also use it these days).

And the CGROUP_BUDDY thing can sometimes help when using cgroups.

But the wakeup thing is very situational -- it's disabled for a reason.
Unfortunately it seems to also have disabled the other users, which
wasn't intended.

> I will test your new NEXT_BUDDY patches.

We still need Prateek's fix. That ensures a delayed task will never end
up being ->next.