Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

From: Chen Yu
Date: Thu Jun 13 2024 - 07:47:33 EST

In reply to: Chunxin Zang: "Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible"

On 2024-06-11 at 21:10:50 +0800, Chunxin Zang wrote:
>
>
> > On Jun 7, 2024, at 10:38, Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
> >
> > On 2024-06-06 at 09:46:53 +0800, Chunxin Zang wrote:
> >>
> >>
> >>> On Jun 6, 2024, at 01:19, Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
> >>>
> >>>
> >>> Sorry for the late reply and thanks for helping clarify this. Yes, this is
> >>> what my previous concern was:
> >>> 1. It does not consider the cgroup and does not check preemption in the same
> >>> level which is covered by find_matching_se().
> >>> 2. The if (!entity_eligible(cfs_rq, se)) for current is redundant because
> >>> later pick_eevdf() will check the eligibility of current anyway. But
> >>> as pointed out by Chunxin, his concern is the double-traverse of the rb-tree,
> >>> I just wonder if we could leverage the cfs_rq->next to store the next
> >>> candidate, so it can be picked directly in the 2nd pick as a fast path?
> >>> Something like below untested:
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 8a5b1ae0aa55..f716646d595e 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
> >>> static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
> >>> {
> >>> struct task_struct *curr = rq->curr;
> >>> - struct sched_entity *se = &curr->se, *pse = &p->se;
> >>> + struct sched_entity *se = &curr->se, *pse = &p->se, *next;
> >>> struct cfs_rq *cfs_rq = task_cfs_rq(curr);
> >>> int cse_is_idle, pse_is_idle;
> >>>
> >>> @@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> >>> /*
> >>> * XXX pick_eevdf(cfs_rq) != se ?
> >>> */
> >>> - if (pick_eevdf(cfs_rq) == pse)
> >>> + next = pick_eevdf(cfs_rq);
> >>> + if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
> >>> + set_next_buddy(next);
> >>> +
> >>> + if (next == pse)
> >>> goto preempt;
> >>>
> >>> return;
> >>>
> >>>
> >>> thanks,
> >>> Chenyu
> >>
> >> Hi Chen
> >>
> >> First of all, thank you for your patient response. Regarding the issue of avoiding traversing
> >> the RB-tree twice, I initially had two methods in mind.
> >> 1. Cache the optimal result so that it can be used directly during the second pick_eevdf operation.
> >> This idea is similar to the one you proposed this time.
> >> 2. Avoid the pick_eevdf operation as much as possible within 'check_preempt_wakeup_fair.'
> >> Because I believe that 'checking whether preemption is necessary' and 'finding the optimal
> >> process to schedule' are two different things.
> >
> > I agree, and it seems that in current eevdf implementation the former relies on the latter.
> >
> >> 'check_preempt_wakeup_fair' is not just to
> >> check if the newly awakened process should preempt the current process; it can also serve
> >> as an opportunity to check whether any other processes should preempt the current one,
> >> thereby improving the real-time performance of the scheduler. Although now in pick_eevdf,
> >> the legitimacy of 'curr' is also evaluated, if the result returned is not the awakened process,
> >> then the current process will still not be preempted.
> >
> > I thought Mike has proposed a patch to deal with this scenario you mentioned above:
> > https://lore.kernel.org/lkml/e17d3d90440997b970067fe9eaf088903c65f41d.camel@xxxxxx/
> >
> > And I suppose you are referring to increasing the preemption chance on current rather than reducing
> > the number of pick_eevdf() invocations in check_preempt_wakeup_fair().
>
> Hi chen
>
> Happy holidays. I believe the modifications here will indeed provide more opportunities for preemption,
> thereby leading to lower scheduling latencies, while also truly reducing calls to pick_eevdf. It's a win-win situation. :)
>
> I conducted a test. It involved applying my modifications on top of Mike's patch, along with
> adding some statistical counts following your previous method, in order to assess the potential
> benefits of my changes.
>

[snip]

> Looking at the results, adding an ineligible check for the se within check_preempt_wakeup_fair
> can prevent 3% of pick_eevdf calls under the RUN_TO_PARITY feature, and in the case of
> NO_RUN_TO_PARITY, it can prevent 30% of pick_eevdf calls. It was also discovered that the
> patch_preempt_only_count is at 0, indicating that all invalid checks for the se are correct.
>
> It's worth mentioning that under the RUN_TO_PARITY feature, the number of preemptions
> triggered by 'pick_eevdf != se' would be 2.25 times that of the original version, which could
> lead to a series of other performance issues. However, logically speaking, this is indeed reasonable. :(
>
>

I wonder if we could do this only for NO_RUN_TO_PARITY? That is to say, if RUN_TO_PARITY is enabled,
we do not preempt the current task based on its eligibility in check_preempt_wakeup_fair()
or entity_tick(). Personally I have no objection to increasing preemption a little; however,
it seems that we have encountered over-scheduling, which is why RUN_TO_PARITY was introduced,
and RUN_TO_PARITY means "respect the slice" as I understand it.
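
To make the gating idea concrete, here is a minimal userspace sketch. It is a toy
model, not kernel code and not a proposed patch: struct toy_se, toy_entity_eligible()
and toy_resched_on_ineligible_curr() are hypothetical names, and eligibility is
assumed to mean "vruntime is not above the weighted average vruntime of the queue".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a scheduling entity; NOT the kernel's struct sched_entity. */
struct toy_se {
	int64_t vruntime;
	int64_t weight;
};

/*
 * Assumed eligibility rule: se is eligible when its vruntime is at or below
 * the weighted average V = sum(w_i * v_i) / sum(w_i). Rearranged to avoid
 * the division: sum(w_i * (v_i - v_se)) >= 0.
 */
static bool toy_entity_eligible(const struct toy_se *all, size_t n,
				const struct toy_se *se)
{
	int64_t sum = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += all[i].weight * (all[i].vruntime - se->vruntime);

	return sum >= 0;
}

/*
 * The gating suggested above: with RUN_TO_PARITY the slice is respected and
 * an ineligible current is never a reason to reschedule on its own; with
 * NO_RUN_TO_PARITY an ineligible current requests a reschedule.
 */
static bool toy_resched_on_ineligible_curr(bool run_to_parity, bool curr_eligible)
{
	if (run_to_parity)
		return false;

	return !curr_eligible;
}
```

With three equal-weight entities at vruntime 100, 200 and 300, the last one sits
above the average, so it would request a reschedule only when run_to_parity is false.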

> > So I think NEXT_BUDDY has more or less reduced the rb-tree scan.
> >
> > thanks,
> > Chenyu
>
> I'm not completely sure if my understanding is correct, but NEXT_BUDDY can only cache the process
> that has been woken up; it doesn't necessarily correspond to the result returned by pick_eevdf. Furthermore,
> even if it does cache the result returned by pick_eevdf, by the time the next scheduling occurs, due to
> other processes enqueuing or dequeuing, it might not be the result picked by pick_eevdf at that moment.
> Hence, it's a 'best effort' approach, and therefore, its impact on scheduling latency may vary depending
> on the use case.
>

That is true. Currently NEXT_BUDDY is set to the wakee if it is eligible, which does not mean it is the best
candidate in the tree. I think it is a 'best effort' to reduce wakeup latency rather than to improve fairness.
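
The staleness you describe can be shown with a small userspace sketch. Again this is
a toy model under stated assumptions, not kernel code: struct buddy_se, buddy_eligible()
and buddy_pick() are hypothetical, and buddy_pick() is a linear scan standing in for
pick_eevdf() (earliest virtual deadline among eligible entities).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy entity; NOT the kernel's struct sched_entity. */
struct buddy_se {
	int64_t vruntime;
	int64_t deadline;
	int64_t weight;
};

/* Assumed rule: eligible when vruntime <= weighted average vruntime. */
static bool buddy_eligible(const struct buddy_se *q, size_t n, size_t idx)
{
	int64_t sum = 0;

	assert(idx < n);
	for (size_t i = 0; i < n; i++)
		sum += q[i].weight * (q[i].vruntime - q[idx].vruntime);

	return sum >= 0;
}

/*
 * Linear-scan stand-in for pick_eevdf(): index of the eligible entity with
 * the earliest deadline, or -1 if nothing is eligible.
 */
static int buddy_pick(const struct buddy_se *q, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!buddy_eligible(q, n, i))
			continue;
		if (best < 0 || q[i].deadline < q[best].deadline)
			best = (int)i;
	}

	return best;
}
```

A caller that caches the result of buddy_pick() and then enqueues another entity can
find that a fresh buddy_pick() returns a different index, and that the cached entity
is no longer even eligible because the average vruntime moved. That is why a cached
candidate can only be a hint that must be revalidated at pick time.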

thanks,
Chenyu
