Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible
From: Chunxin Zang
Date: Tue May 28 2024 - 02:42:03 EST
> On May 28, 2024, at 10:42, Chunxin Zang <spring.cxz@xxxxxxxxx> wrote:
>
>>
>> On May 24, 2024, at 23:30, Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
>>
>> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>>> I found that some tasks have been running for a long enough time and
>>> have become illegal, but they are still not releasing the CPU. This
>>> will increase the scheduling delay of other processes. Therefore, I
>>> tried checking the current process in wakeup_preempt and entity_tick,
>>> and if it is illegal, reschedule that cfs queue.
>>>
>>> The modification can reduce the scheduling delay by about 30% when
>>> RUN_TO_PARITY is enabled.
>>> So far, it has been running well in my test environment, and I have
>>> pasted some test results below.
>>>
>>
>> Interesting, besides hackbench, I assume that you have workload in
>> real production environment that is sensitive to wakeup latency?
>
> Hi Chen
>
> Yes, my workload are quite sensitive to wakeup latency .
>>
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 03be0d1330a6..a0005d240db5 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>> return;
>>> #endif
>>> +
>>> + if (!entity_eligible(cfs_rq, curr))
>>> + resched_curr(rq_of(cfs_rq));
>>> }
>>>
>>
>> entity_tick() -> update_curr() -> update_deadline():
>> se->vruntime >= se->deadline ? resched_curr()
>> only current has expired its slice will it be scheduled out.
>>
>> So here you want to schedule current out if its lag becomes 0.
>>
>> In lastest sched/eevdf branch, it is controlled by two sched features:
>> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
>> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
>> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>>
>> Maybe something like this can achieve your goal
>> if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
>> resched_curr
>>
>>>
>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>> return;
>>>
>>> + if (!entity_eligible(cfs_rq, se))
>>> + goto preempt;
>>> +
>>
>> Not sure if this is applicable, later in this function, pick_eevdf() checks
>> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
>> be evicted. And this change does not consider the cgroup hierarchy.
>>
>> Besides, the check of current eligiblity can get false negative result,
>> if the enqueued entity has a positive lag. Prateek proposed to
>> remove the check of current's eligibility in pick_eevdf():
>> https://lore.kernel.org/lkml/20240325060226.1540-2-kprateek.nayak@xxxxxxx/
>
> Thank you for letting me know about Peter's latest updates and thoughts.
> Actually, the original intention of my modification was to minimize the
> traversal of the rb-tree as much as possible. For example, in the following
> scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
> 'pick_eevdf' to return an optimal 'se', and then trigger 'resched_curr'. After
> resched, the scheduler will call 'pick_eevdf' again, traversing the
> rb-tree once more. This ultimately results in the rb-tree being traversed
> twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
> and directly trigger a 'resched', it would reduce the traversal of the rb-tree
> by one time.
>
>
> wakeup_preempt-> pick_eevdf -> resched_curr
> |->'traverse the rb-tree' |
> schedule->pick_eevdf
> |->'traverse the rb-tree'
>
>
> Of course, this would break the semantics of RESPECT_SLICE as well as
> RUN_TO_PARITY. So, this might be considered a performance enhancement
> for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
>
Sorry for the mistake. I mean it should be a performance enhancement for scenarios
with NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
Maybe it should be like this
@@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
return;
+ if (!sched_feat(RESPECT_SLICE) && !sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))
+ goto preempt;
+
> thanks
> Chunxin
>
>
>> If I understand your requirement correctly, you want to reduce the wakeup
>> latency. There are some codes under developed by Peter, which could
>> customized task's wakeup latency via setting its slice:
>> https://lore.kernel.org/lkml/20240405110010.934104715@xxxxxxxxxxxxx/
>>
>> thanks,
>> Chenyu