Re: [PATCH v2] sched/fair: Reschedule the cfs_rq when current is ineligible

From: Chunxin Zang
Date: Wed Jun 12 2024 - 06:39:49 EST




> On Jun 7, 2024, at 13:07, Chen Yu <yu.c.chen@xxxxxxxxx> wrote:
>
> On 2024-05-29 at 22:18:06 +0800, Chunxin Zang wrote:
>> I found that some tasks that have run long enough to become
>> ineligible still do not release the CPU, which increases the
>> scheduling delay of other processes. Therefore, I tried checking the
>> current task in wakeup_preempt and entity_tick and, if it is
>> ineligible, rescheduling that cfs_rq.
>>
>> When RUN_TO_PARITY is enabled, its behavior essentially remains
>> consistent with the original process. When NO_RUN_TO_PARITY is enabled,
>> some additional preemptions will be introduced, but not too many.
>>
>> I have pasted some test results below.
>> I isolated four cores for testing and ran hackbench in the background,
>> and observed the test results of cyclictest.
>>
>> hackbench -g 4 -l 100000000 &
>> cyclictest --mlockall -D 5m -q
>>
>>                              EEVDF    PATCH    EEVDF-NO_PARITY  PATCH-NO_PARITY
>>
>>             # Min Latencies: 00006    00006    00006            00006
>> LNICE(-19)  # Avg Latencies: 00191    00133    00089            00066
>>             # Max Latencies: 15442    08466    14133            07713
>>
>>             # Min Latencies: 00006    00010    00006            00006
>> LNICE(0)    # Avg Latencies: 00466    00326    00289            00257
>>             # Max Latencies: 38917    13945    32665            17710
>>
>>             # Min Latencies: 00019    00053    00010            00013
>> LNICE(19)   # Avg Latencies: 37151    25852    18293            23035
>>             # Max Latencies: 2688299  4643635  426196           425708
>>
>> I captured and compared the number of preempt occurrences in wakeup_preempt
>> to see if it introduced any additional overhead.
>>
>> Similarly, hackbench is used to stress the utilization of four cores to
>> 100%, and the method for capturing the number of PREEMPT occurrences is
>> referenced from [1].
>>
>> schedstats                    EEVDF    PATCH    EEVDF-NO_PARITY  PATCH-NO_PARITY  CFS(6.5)
>> .stats.check_preempt_count    5053054  5045388  5018589          5029585
>> .stats.patch_preempt_count    -------  0020495  -------          0700670          -------
>> .stats.need_preempt_count     0570520  0458947  3380513          3116966          1140821
>>
>> From the above test results, there is a slight increase in the number of
>> preempt occurrences in wakeup_preempt. However, the results vary with each
>> test, and sometimes the difference is not that significant.
>>
>> [1]: https://lore.kernel.org/all/20230816134059.GC982867@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#m52057282ceb6203318be1ce9f835363de3bef5cb
>>
>> Signed-off-by: Chunxin Zang <zangchunxin@xxxxxxxxxxx>
>> Reviewed-by: Chen Yang <yangchen11@xxxxxxxxxxx>
>>
>> ------
>> Changes in v2:
>> - Make the logic that determines the current process as ineligible and
>> triggers preemption effective only when NO_RUN_TO_PARITY is enabled.
>> - Update the commit message
>> ---
>> kernel/sched/fair.c | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 03be0d1330a6..fa2c512139e5 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -745,6 +745,17 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>  	return vruntime_eligible(cfs_rq, se->vruntime);
>>  }
>>
>> +static bool check_entity_need_preempt(struct cfs_rq *cfs_rq, struct sched_entity *se)
>> +{
>> +	if (sched_feat(RUN_TO_PARITY) && se->vlag != se->deadline)
>> +		return true;
>
> If I understand correctly, this intends to check whether the current se
> has consumed its first slice after being picked in set_next_entity(), and
> if so, to reschedule. check_entity_need_preempt() is added at the end of
> entity_tick(), which could override the existing policy for rescheduling
> current (entity_tick()->update_curr()->update_deadline()): current is
> preempted only if there is more than one runnable task, even if it has
> used up its first requested slice.
>
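
To make the `se->vlag != se->deadline` test concrete, here is a minimal user-space sketch of the marker it relies on. It assumes that set_next_entity() stashes a copy of the deadline in se->vlag at pick time and that update_deadline() pushes the deadline out once vruntime reaches it. The toy_* names and the struct are hypothetical, and the weight scaling done by calc_delta_fair() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sched_entity; only the fields used here. */
struct toy_entity {
	int64_t vruntime;	/* virtual runtime consumed so far */
	int64_t deadline;	/* current virtual deadline */
	int64_t vlag;		/* reused as "deadline at pick time" while running */
	int64_t slice;		/* requested slice, already in virtual time units */
};

/* Assumed pick-time behaviour: remember the deadline we were picked with. */
static void toy_pick(struct toy_entity *se)
{
	se->vlag = se->deadline;
}

/*
 * Models update_deadline(): returns true when the request has been
 * consumed, in which case the deadline moves one slice further out.
 */
static bool toy_update_deadline(struct toy_entity *se)
{
	assert(se->slice > 0);

	if (se->vruntime - se->deadline < 0)
		return false;

	se->deadline = se->vruntime + se->slice;
	return true;
}

/* True once the slice the entity was picked with has been used up. */
static bool toy_slice_consumed(const struct toy_entity *se)
{
	return se->vlag != se->deadline;
}
```

In this model the two fields stay equal from the pick until the first time toy_update_deadline() returns true, so the inequality only says "the first slice is over", not that the entity is ineligible.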

The purpose of the modification is to increase preemption opportunities without breaking the
RUN_TO_PARITY rule. However, it clearly introduces some additional preemptions, so perhaps it
should also check the eligibility of the se. Also, to avoid overriding the scheduling
policy in entity_tick, would a modification like the following be more appropriate?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03be0d1330a6..5e49a15bbdd3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -745,6 +745,21 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	return vruntime_eligible(cfs_rq, se->vruntime);
 }

+static bool check_entity_need_preempt(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	if (cfs_rq->nr_running <= 1)
+		return false;
+
+	if (sched_feat(RUN_TO_PARITY) && se->vlag != se->deadline &&
+	    !entity_eligible(cfs_rq, se))
+		return true;
+
+	if (!sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))
+		return true;
+
+	return false;
+}
+
 static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
 {
 	u64 min_vruntime = cfs_rq->min_vruntime;
@@ -974,11 +989,13 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
 /*
  * XXX: strictly: vd_i += N*r_i/w_i such that: vd_i > ve_i
  * this is probably good enough.
+ *
+ * Return true if se has consumed its request and needs to be preempted.
  */
-static void update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	if ((s64)(se->vruntime - se->deadline) < 0)
-		return;
+		return false;

 	/*
 	 * For EEVDF the virtual time slope is determined by w_i (iow.
@@ -995,10 +1012,7 @@ static void update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	/*
 	 * The task has consumed its request, reschedule.
 	 */
-	if (cfs_rq->nr_running > 1) {
-		resched_curr(rq_of(cfs_rq));
-		clear_buddies(cfs_rq, se);
-	}
+	return true;
 }

 #include "pelt.h"
@@ -1157,6 +1171,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 {
 	struct sched_entity *curr = cfs_rq->curr;
 	s64 delta_exec;
+	bool need_preempt = false;

 	if (unlikely(!curr))
 		return;
@@ -1166,12 +1181,17 @@ static void update_curr(struct cfs_rq *cfs_rq)
 		return;

 	curr->vruntime += calc_delta_fair(delta_exec, curr);
-	update_deadline(cfs_rq, curr);
+	need_preempt = update_deadline(cfs_rq, curr);
 	update_min_vruntime(cfs_rq);

 	if (entity_is_task(curr))
 		update_curr_task(task_of(curr), delta_exec);

+	if (need_preempt || check_entity_need_preempt(cfs_rq, curr)) {
+		resched_curr(rq_of(cfs_rq));
+		clear_buddies(cfs_rq, curr);
+	}
+
 	account_cfs_rq_runtime(cfs_rq, delta_exec);
 }
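
For reference, the decision made by check_entity_need_preempt() in the diff above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: the toy_* structures and names are hypothetical, the eligibility test is a simplified weighted-average comparison on small absolute values (the kernel works relative to min_vruntime), and the two RUN_TO_PARITY branches are folded into one early return that should be logically equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct toy_se {
	int64_t vruntime;	/* virtual runtime consumed so far */
	int64_t deadline;	/* current virtual deadline */
	int64_t vlag;		/* copy of deadline stashed at pick time */
	int64_t weight;		/* load weight */
};

struct toy_rq {
	struct toy_se *se;	/* all runnable entities, including current */
	int nr_running;
	bool run_to_parity;	/* models sched_feat(RUN_TO_PARITY) */
};

/*
 * Simplified eligibility: an entity is eligible when its vruntime does not
 * exceed the weighted average vruntime, i.e. sum(w_i * v_i) >= v * sum(w_i).
 */
static bool toy_entity_eligible(const struct toy_rq *rq, const struct toy_se *se)
{
	int64_t avg = 0, load = 0;

	for (int i = 0; i < rq->nr_running; i++) {
		avg += rq->se[i].weight * rq->se[i].vruntime;
		load += rq->se[i].weight;
	}
	return avg >= se->vruntime * load;
}

/* Condensed model of the proposed check for the current entity. */
static bool toy_need_preempt(const struct toy_rq *rq, const struct toy_se *curr)
{
	assert(rq->nr_running >= 0);

	/* Nothing else to run: never ask for a reschedule. */
	if (rq->nr_running <= 1)
		return false;

	/* RUN_TO_PARITY: leave current alone inside the slice it was picked with. */
	if (rq->run_to_parity && curr->vlag == curr->deadline)
		return false;

	return !toy_entity_eligible(rq, curr);
}
```

With two equally weighted entities at vruntime 100 and 0, the first is ineligible in this model; it is flagged immediately with run_to_parity off, and only after its deadline has moved away from vlag with run_to_parity on.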



>> +
>> +	if (!sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>  static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
>>  {
>>  	u64 min_vruntime = cfs_rq->min_vruntime;
>> @@ -5523,6 +5534,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>  			hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>  		return;
>>  #endif
>> +
>> +	if (check_entity_need_preempt(cfs_rq, curr))
>> +		resched_curr(rq_of(cfs_rq));
>>  }
>>
>>
>> @@ -8343,6 +8357,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>  	cfs_rq = cfs_rq_of(se);
>>  	update_curr(cfs_rq);
>>
>> +	if (check_entity_need_preempt(cfs_rq, se))
>> +		goto preempt;
>> +
>
> As we change the preemption policy for current in two places, tick preemption and wakeup preemption,
> do you have statistics that show which one brings the most benefit?

With this modification the check is no longer made separately in the wakeup and tick paths; it is consolidated
in update_curr(), which makes the preemption decision together with update_deadline(). This approach seems
more elegant and achieves the same performance benefits as before.

thanks
Chunxin

>
> thanks,
> Chenyu