On Thu, Oct 27, 2022 at 8:57 PM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
On 2022/10/28 07:34, Josh Don wrote:
Hi Chuyi,
On Thu, Oct 27, 2022 at 1:16 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
The non-idle se dominates competition vs the idle se when they
belong to the same group. We ensure that idle groups do not preempt
non-idle groups in wakeup preemption (see check_preempt_wakeup()).
However, this can still happen in tick preemption, since
check_preempt_tick() does not check whether current/se is idle or
not. This patch adds this check.
Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx>
---
kernel/sched/fair.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e4a0b8bd941c..f3324b8753b3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4750,6 +4750,7 @@ static void
check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
unsigned long ideal_runtime, delta_exec;
+ int cse_is_idle, pse_is_idle;
struct sched_entity *se;
s64 delta;
@@ -4779,8 +4780,17 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
if (delta < 0)
return;
- if (delta > ideal_runtime)
+ if (delta > ideal_runtime) {
+ /*
+ * Favor non-idle group even in tick preemption
+ */
+ cse_is_idle = se_is_idle(curr);
+ pse_is_idle = se_is_idle(se);
+ if (unlikely(!cse_is_idle && pse_is_idle))
+ return;
+
resched_curr(rq_of(cfs_rq));
+ }
}

static void
--
2.20.1
This would make it so that we never have tick based preemption of a
non-idle entity by an idle entity. That's a recipe for starvation of
the idle entity, if the non-idle entity is cpu bound.

Beyond that though, I'm not quite sure what issue is being solved
here. The large difference in weight between the idle and non-idle
entity means that the non-idle entity will not be preempted for quite
a while, due to its ideal_runtime being quite high. The idle entity
will quickly be preempted on the next tick it takes, due to the
smaller value of sysctl_sched_idle_min_granularity.
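To put rough numbers on this, here is a small userspace sketch (not
kernel code) of how a period splits between the two entities. It
assumes the nice-0 weight of 1024, the SCHED_IDLE weight of 3, and a
~6ms period with a ~0.75ms idle minimum granularity, i.e. common
defaults; real values also scale with the CPU count:

#include <stdio.h>

int main(void)
{
	const double period_ms        = 6.0;  /* ~sysctl_sched_latency */
	const double idle_min_gran_ms = 0.75; /* ~sysctl_sched_idle_min_granularity */
	const double w_nonidle = 1024.0, w_idle = 3.0;
	const double total = w_nonidle + w_idle;

	double slice_nonidle = period_ms * w_nonidle / total;
	double slice_idle    = period_ms * w_idle / total;

	/* The idle entity's slice is clamped up to the idle min granularity. */
	if (slice_idle < idle_min_gran_ms)
		slice_idle = idle_min_gran_ms;

	printf("non-idle ideal_runtime ~= %.2f ms\n", slice_nonidle); /* ~5.98 */
	printf("idle     ideal_runtime ~= %.2f ms\n", slice_idle);    /* 0.75  */
	return 0;
}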
Hi Josh, thanks for your reply.

Actually, I did some tests and traced this issue; the results show
that this can happen, though with small probability. I also ran some
benchmarks. The results suggest there is no performance harm, and we
can reduce context switches by 2%~3% when an idle group and a
non-idle group are present at the same time. In addition, for
throughput, I think we had better let the non-idle entity consume its
ideal_runtime. However, as you said, it may cause starvation of the
idle entity...
I don't doubt it improves the performance of the non-idle entity, but
it is never advisable to indefinitely starve threads. That's a recipe
for hard lockups, priority inversion, etc.
Does your non-idle entity have a reasonable number of cpu.shares? You
can push out the round-robin interval by tuning CFS parameters without
completely starving the idle entity.
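For example (a sketch only; the group names and paths below are
illustrative, assuming a cgroup v2 hierarchy mounted under
/sys/fs/cgroup), raising the non-idle group's cpu.weight widens the
ratio without removing the idle group's round-robin slot:

#include <stdio.h>

static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	/* cpu.weight: range 1..10000, default 100 */
	write_str("/sys/fs/cgroup/nonidle/cpu.weight", "1000");
	/* cpu.idle: the cgroup analogue of per-task SCHED_IDLE */
	write_str("/sys/fs/cgroup/idle/cpu.idle", "1");
	/* On recent kernels the period/granularity knobs live under
	 * /sys/kernel/debug/sched/ and can also push out the interval. */
	return 0;
}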
There is another question I would like to take this opportunity to
consult you about. Suppose we have two cgroups (idle & non-idle) in
/sys/fs/cgroup/cpu. In our production environment, in some cases we
want to adjust the weight/shares of the idle cgroup, which means we
need to change the logic of sched_group_set_shares(), and that can
increase the probability of the above issue. So, for what reasons did
you prohibit users from changing the weights of idle cgroups?
The reason for limiting the control of weight for idle cgroups is to
match the semantics of the per-task SCHED_IDLE api, which gives
SCHED_IDLE threads minimum weight. The idea behind SCHED_IDLE is that
these entities are intended to soak "unused" cpu cycles, and should
give minimal interference to any non-idle thread. However, we don't
have strict priority between idle and non-idle, due to the potential
for starvation issues.
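For reference, the per-task API whose semantics are being matched is
just sched_setscheduler() with SCHED_IDLE; a minimal sketch:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct sched_param param = { .sched_priority = 0 };

	/* pid 0 == the calling thread; SCHED_IDLE requires priority 0. */
	if (sched_setscheduler(0, SCHED_IDLE, &param) == -1) {
		perror("sched_setscheduler(SCHED_IDLE)");
		return 1;
	}
	printf("running as SCHED_IDLE\n");
	return 0;
}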
Perhaps you could clarify your use case a bit further. Why do you want
to change the weight? Is it to adjust the competition between two idle
groups, or something else?
Thanks again for your review.
Best,
Chuyi
The wakeup check is useful for latency sensitive non-idle tasks.
However, in steady state competition between idle and non-idle, we
must allow some amount of round-robin.
Best,
Josh