Re: [PATCH] sched/fair: Untangle NEXT_BUDDY and pick_next_task()

From: Adam Li
Date: Fri Nov 29 2024 - 05:37:23 EST


On 11/29/2024 6:18 PM, Peter Zijlstra wrote:
> On Fri, Nov 29, 2024 at 11:15:41AM +0100, Peter Zijlstra wrote:
>> On Fri, Nov 29, 2024 at 10:55:00AM +0100, Peter Zijlstra wrote:
>>
>>> Anyway.. I'm sure I started a patch series cleaning up the whole next
>>> buddy thing months ago (there's more problems here), but I can't seem to
>>> find it in a hurry :/
>>
>> There was this..
>
> And this I think.
>
> Adam, what was the reason you were enabling NEXT_BUDDY in the first
> place?
>
Hi Peter,

I am tuning SPECjbb critical-jOPS, which is latency sensitive.
NEXT_BUDDY affects scheduling latency, so I tried enabling it.
However, critical-jOPS drops with NEXT_BUDDY enabled (with my patch that fixes the panic applied).

I will test your new NEXT_BUDDY patches.

> I think someone (Ingo?) was proposing we kill the wakeup preempt thing;
> and I suspect you don't actually care about that but instead want either
> the cgroup or the yield_to_task()/KVM thing working.
>
> ---
> Subject: sched/fair: Add CGROUP_BUDDY feature
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Date: Fri Nov 29 10:49:45 CET 2024
>
> Add a feature to toggle the cgroup optimization.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
> kernel/sched/fair.c | 3 ++-
> kernel/sched/features.h | 8 +++++++-
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7126,7 +7126,8 @@ static int dequeue_entities(struct rq *r
> * Bias pick_next to pick a task from this cfs_rq, as
> * p is sleeping when it is within its sched_slice.
> */
> - if (task_sleep && se && !throttled_hierarchy(cfs_rq))
> + if (sched_feat(CGROUP_BUDDY) &&
> + task_sleep && se && !throttled_hierarchy(cfs_rq))
> set_next_buddy(se);
> break;
> }
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -32,11 +32,17 @@ SCHED_FEAT(PREEMPT_SHORT, true)
> SCHED_FEAT(NEXT_BUDDY, false)
>
> /*
> + * Optimization for cgroup scheduling where a dequeue + pick tries
> + * to share as much of the hierarchy as possible.
> + */
> +SCHED_FEAT(CGROUP_BUDDY, true)
> +
> +/*
> * Allow completely ignoring cfs_rq->next; which can be set from various
> * places:
> * - NEXT_BUDDY (wakeup preemption)
> * - yield_to_task()
> - * - cgroup dequeue / pick
> + * - CGROUP_BUDDY (cgroup dequeue / pick)
> */
> SCHED_FEAT(PICK_BUDDY, true)
>

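For reference, here is a toy userspace model of how the two flags in the patch above interact: CGROUP_BUDDY gates whether a sleeping dequeue sets the buddy hint, and PICK_BUDDY gates whether the pick honours it. This is only a sketch, not kernel code. The names toy_rq, toy_dequeue(), toy_pick(), toy_dequeue_then_pick() and the feats[] array are made up for illustration, entities are reduced to integer ids, and the cgroup hierarchy walk and any eligibility check at pick time are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy feature flags; defaults mirror the SCHED_FEAT() lines in the patch. */
enum toy_feat { NEXT_BUDDY, CGROUP_BUDDY, PICK_BUDDY, NR_FEATS };

static bool feats[NR_FEATS] = {
	[NEXT_BUDDY]   = false,
	[CGROUP_BUDDY] = true,
	[PICK_BUDDY]   = true,
};

#define sched_feat(x) (feats[x])

/* Hypothetical stand-in for a cfs_rq: only the buddy hint is modelled. */
struct toy_rq {
	int next;	/* entity id of the buddy, or -1 if unset */
};

/* Models the patched condition in dequeue_entities(). */
static void toy_dequeue(struct toy_rq *rq, int se_id,
			bool task_sleep, bool throttled)
{
	if (sched_feat(CGROUP_BUDDY) &&
	    task_sleep && se_id >= 0 && !throttled)
		rq->next = se_id;
}

/* Models the pick side: the hint is used only if PICK_BUDDY is on. */
static int toy_pick(const struct toy_rq *rq, int fallback)
{
	if (sched_feat(PICK_BUDDY) && rq->next >= 0)
		return rq->next;
	return fallback;
}

/* Sleeping, unthrottled dequeue of se_id followed by a pick. */
static int toy_dequeue_then_pick(int se_id, int fallback)
{
	struct toy_rq rq = { .next = -1 };

	toy_dequeue(&rq, se_id, true, false);
	return toy_pick(&rq, fallback);
}
```

With the defaults, toy_dequeue_then_pick(7, 3) returns the buddy (7). Clearing either feats[CGROUP_BUDDY] or feats[PICK_BUDDY] makes it return the fallback (3), which is the behaviour I would expect from the two toggles.
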
Thanks,
-adam