Hi Chuyi,
On Thu, Oct 27, 2022 at 1:16 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
The non-idle se dominates competition vs the idle se when they
belong to the same group. We ensure that idle groups do not
preempt non-idle groups in wakeup preemption (see check_preempt_wakeup()).
However, this can happen in tick preemption, since check_preempt_tick()
does not check whether current/se is idle or not. This patch adds that check.
Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx>
---
kernel/sched/fair.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e4a0b8bd941c..f3324b8753b3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4750,6 +4750,7 @@ static void
check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
unsigned long ideal_runtime, delta_exec;
+ int cse_is_idle, pse_is_idle;
struct sched_entity *se;
s64 delta;
@@ -4779,8 +4780,17 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
if (delta < 0)
return;
- if (delta > ideal_runtime)
+ if (delta > ideal_runtime) {
+ /*
+ * Favor non-idle group even in tick preemption
+ */
+ cse_is_idle = se_is_idle(curr);
+ pse_is_idle = se_is_idle(se);
+ if (unlikely(!cse_is_idle && pse_is_idle))
+ return;
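For reference, the wakeup-side rule that the changelog points to behaves
roughly as sketched below. This is a self-contained paraphrase of the idle
handling in check_preempt_wakeup(), not the kernel code itself; the struct,
the se_is_idle() stub and the wakeup_should_preempt() helper are illustrative
only.

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for struct sched_entity; only the idle flag matters here. */
struct sched_entity { bool idle; };

/* Stub for the kernel's se_is_idle() helper. */
static bool se_is_idle(const struct sched_entity *se)
{
	return se->idle;
}

/*
 * Paraphrase of the idle/non-idle policy on the wakeup path: a non-idle
 * waker preempts an idle current immediately, and an idle waker never
 * preempts a non-idle current. The patch above applies the second half
 * of this rule to tick preemption as well.
 */
static bool wakeup_should_preempt(const struct sched_entity *curr,
				  const struct sched_entity *woken)
{
	bool cse_is_idle = se_is_idle(curr);
	bool pse_is_idle = se_is_idle(woken);

	if (cse_is_idle && !pse_is_idle)
		return true;	/* non-idle entity preempts idle current */
	if (!cse_is_idle && pse_is_idle)
		return false;	/* idle entity never preempts non-idle current */

	/*
	 * Same class: the real code falls through to the usual
	 * vruntime/wakeup-granularity comparison; modelled here as
	 * "no preemption" for brevity.
	 */
	return false;
}

int main(void)
{
	struct sched_entity idle_se = { .idle = true };
	struct sched_entity normal_se = { .idle = false };

	printf("idle curr, non-idle waker -> preempt: %d\n",
	       wakeup_should_preempt(&idle_se, &normal_se));
	printf("non-idle curr, idle waker -> preempt: %d\n",
	       wakeup_should_preempt(&normal_se, &idle_se));
	return 0;
}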
This would make it so that we never have tick-based preemption of a
non-idle entity by an idle entity. That's a recipe for starvation of
the idle entity, if the non-idle entity is cpu bound.

Hi Josh, thanks for your reply. Actually, I did some tests and traced this
issue; the results show that this can happen, but only with small
probability. I also ran some benchmarks, and the results suggest there is no
performance harm; we can reduce context switches by 2%~3% when an idle group
and a non-idle group are present at the same time. In addition, for
throughput, I think we had better let the non-idle entity consume its
ideal_runtime. However, as you said, it may cause starvation of the idle
entity.

Beyond that though, I'm not quite sure what issue is being solved here.
The large difference in weight between the idle and non-idle entity
means that the non-idle entity will not be preempted for quite a while
due to its ideal_runtime being quite high. The idle entity will
quickly be preempted on the next tick it takes due to the smaller
value of sysctl_sched_idle_min_granularity.
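To put rough numbers on that (a back-of-the-envelope sketch, not kernel code;
the ~6ms period, the nice-0 weight of 1024, the SCHED_IDLE weight of 3 and
the 0.75ms idle minimum granularity are assumed defaults and depend on
configuration):

#include <stdio.h>

int main(void)
{
	const double period_ms = 6.0;		/* assumed scheduling period */
	const double w_normal = 1024.0;		/* nice-0 load weight */
	const double w_idle = 3.0;		/* SCHED_IDLE (WEIGHT_IDLEPRIO) */
	const double idle_min_gran_ms = 0.75;	/* assumed idle min granularity */

	double total = w_normal + w_idle;
	double slice_normal = period_ms * w_normal / total;
	double slice_idle = period_ms * w_idle / total;

	/* The slice is clamped to be at least the (idle) minimum granularity. */
	if (slice_idle < idle_min_gran_ms)
		slice_idle = idle_min_gran_ms;

	/* Prints roughly: non-idle ~5.98 ms vs idle ~0.75 ms. */
	printf("non-idle slice ~%.2f ms, idle slice ~%.2f ms\n",
	       slice_normal, slice_idle);
	return 0;
}

So in steady state the non-idle entity already gets the overwhelming share of
the period, and the idle entity runs only up to its (small) minimum
granularity before it is preempted again.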
The wakeup check is useful for latency-sensitive non-idle tasks.
However, in steady state competition between idle and non-idle, we
must allow some amount of round-robin.
+
resched_curr(rq_of(cfs_rq));
+ }
}
static void
--
2.20.1
Best,
Josh