Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

From: Chunxin Zang
Date: Mon May 27 2024 - 22:43:17 EST



> On May 24, 2024, at 23:30, Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
>
> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>> I found that some tasks have been running for a long enough time and
>> have become ineligible, but they are still not releasing the CPU. This
>> increases the scheduling delay of other processes. Therefore, I
>> tried checking the current process in wakeup_preempt and entity_tick,
>> and if it is ineligible, rescheduling that cfs queue.
>>
>> The modification can reduce the scheduling delay by about 30% when
>> RUN_TO_PARITY is enabled.
>> So far, it has been running well in my test environment, and I have
>> pasted some test results below.
>>
>
> Interesting, besides hackbench, I assume that you have workload in
> real production environment that is sensitive to wakeup latency?

Hi Chen

Yes, my workloads are quite sensitive to wakeup latency.
>
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 03be0d1330a6..a0005d240db5 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>  			hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>  		return;
>>  #endif
>> +
>> +	if (!entity_eligible(cfs_rq, curr))
>> +		resched_curr(rq_of(cfs_rq));
>>  }
>>
>
> entity_tick() -> update_curr() -> update_deadline():
>     se->vruntime >= se->deadline ? resched_curr()
> Only when current has exhausted its slice will it be scheduled out.
>
> So here you want to schedule current out if its lag becomes 0.
>
> In the latest sched/eevdf branch, it is controlled by two sched features:
> RESPECT_SLICE: Inhibit preemption until the current task has exhausted its slice.
> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>
> Maybe something like this can achieve your goal
>     if (sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, curr))
>             resched_curr(rq_of(cfs_rq));
>
>>
>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>  	if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>  		return;
>>
>> +	if (!entity_eligible(cfs_rq, se))
>> +		goto preempt;
>> +
>
> Not sure if this is applicable: later in this function, pick_eevdf() checks
> whether current is eligible via !entity_eligible(cfs_rq, curr), and if it is
> not, curr will be evicted. Also, this change does not consider the cgroup
> hierarchy.
>
> Besides, the check of current's eligibility can give a false negative result
> if the enqueued entity has a positive lag. Prateek proposed to
> remove the check of current's eligibility in pick_eevdf():
> https://lore.kernel.org/lkml/20240325060226.1540-2-kprateek.nayak@xxxxxxx/
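
For readers following the eligibility discussion, here is a minimal
user-space sketch of the condition being tested: an entity is eligible
when its lag (load-weighted average vruntime minus its own vruntime) is
not negative. This is only a simplified model and not the kernel
implementation; 'struct toy_entity' and 'toy_entity_eligible' are
hypothetical names, and the queue is a plain array instead of an rb-tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a scheduling entity. */
struct toy_entity {
	int64_t vruntime;	/* virtual runtime consumed so far */
	int64_t weight;		/* load weight, assumed > 0 */
};

/*
 * Entity i is eligible when lag_i = V - v_i >= 0, where
 * V = sum(w_j * v_j) / sum(w_j) is the load-weighted average vruntime.
 * Multiplying through by sum(w_j) avoids the division:
 *   sum(w_j * (v_j - v_i)) >= 0
 */
static bool toy_entity_eligible(const struct toy_entity *q, size_t n, size_t i)
{
	int64_t sum = 0;

	for (size_t j = 0; j < n; j++)
		sum += q[j].weight * (q[j].vruntime - q[i].vruntime);

	return sum >= 0;
}
```

With three equally weighted entities at vruntime 100, 200 and 300, the
average is 200, so the first two are eligible and the third is not.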

Thank you for letting me know about Peter's latest updates and thoughts.
Actually, the original intention of my modification was to minimize
traversal of the rb-tree as much as possible. For example, in the following
scenario, if 'curr' is ineligible, the system still traverses the rb-tree in
'pick_eevdf' to return an optimal 'se', and then triggers 'resched_curr'. After
the resched, the scheduler calls 'pick_eevdf' again and traverses the
rb-tree once more, so the rb-tree ends up being traversed twice. If
'wakeup_preempt' could determine that 'curr' is ineligible and directly
trigger a resched, one of those traversals would be avoided.


wakeup_preempt -> pick_eevdf -> resched_curr
                  |-> 'traverse the rb-tree'  |
                                              schedule -> pick_eevdf
                                                          |-> 'traverse the rb-tree'
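
The same flow can be modeled with a toy user-space sketch that only
counts tree walks. Nothing here is kernel code: 'toy_pick_eevdf',
'toy_wakeup_then_schedule' and the 'early_check' flag are hypothetical
names, and the wakeup path is assumed to reschedule only when the pick
returns the woken entity.

```c
#include <stdbool.h>

/* Hypothetical counter standing in for the cost of one rb-tree walk. */
static int toy_tree_walks;

/* Stand-in for pick_eevdf(): each call counts as one traversal. */
static void toy_pick_eevdf(void)
{
	toy_tree_walks++;
}

/*
 * Model one wakeup followed by the resulting schedule(), if any, and
 * return how many tree walks it took.
 *
 * early_check == false: the wakeup path always walks the tree and
 *                       reschedules only if the wakee is the best pick.
 * early_check == true:  an ineligible curr is rescheduled right away,
 *                       without walking the tree in the wakeup path.
 *                       Note this also reschedules when the wakee is
 *                       not the best pick, which is a behavior change.
 */
static int toy_wakeup_then_schedule(bool curr_eligible, bool wakee_is_best,
				    bool early_check)
{
	bool resched;

	toy_tree_walks = 0;

	if (early_check && !curr_eligible) {
		resched = true;
	} else {
		toy_pick_eevdf();		/* walk in the wakeup path */
		resched = wakee_is_best;
	}

	if (resched)
		toy_pick_eevdf();		/* walk in schedule() */

	return toy_tree_walks;
}
```

In this model an ineligible 'curr' with the wakee as the best pick costs
two walks without the early check and one walk with it.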


Of course, this would break the semantics of RESPECT_SLICE as well as
RUN_TO_PARITY. So, this might be considered a performance enhancement
for scenarios with NO_RESPECT_SLICE/NO_RUN_TO_PARITY, i.e. with both
features disabled.

thanks
Chunxin


> If I understand your requirement correctly, you want to reduce the wakeup
> latency. There is some code under development by Peter, which could
> customize a task's wakeup latency via setting its slice:
> https://lore.kernel.org/lkml/20240405110010.934104715@xxxxxxxxxxxxx/
>
> thanks,
> Chenyu