[PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling
From: Joel Fernandes (Google)
Date: Tue Nov 17 2020 - 18:20:50 EST
From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
The rationale is as follows. In the core-wide pick logic, even if
need_sync == false, we need to go look at other CPUs (non-local CPUs) to
see if they could be running RT.
Say the RQs in a particular core look like this:
Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.
rq0 rq1
CFS1 (tagged) RT1 (not tag)
CFS2 (tagged)
Say schedule() runs on rq0. Now, it will enter the above loop and
pick_task(RT) will return NULL for 'p'. It will enter the above if() block
and see that need_sync == false and will skip RT entirely.
The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
rq0 rq1
CFS1 IDLE
When it should have selected:
rq0 r1
IDLE RT
Joel saw this issue on real-world usecases in ChromeOS where an RT task
gets constantly force-idled and breaks RT. Lets cure it.
NOTE: This problem will be fixed differently in a later patch. It just
kept here for reference purposes about this issue, and to make
applying later patches easier.
Reported-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
---
kernel/sched/core.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4ee4902c2cf5..53af817740c0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5195,6 +5195,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
need_sync = !!rq->core->core_cookie;
/* reset state */
+reset:
rq->core->core_cookie = 0UL;
if (rq->core->core_forceidle) {
need_sync = true;
@@ -5242,14 +5243,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
/*
* If there weren't no cookies; we don't need to
* bother with the other siblings.
- * If the rest of the core is not running a tagged
- * task, i.e. need_sync == 0, and the current CPU
- * which called into the schedule() loop does not
- * have any tasks for this class, skip selecting for
- * other siblings since there's no point. We don't skip
- * for RT/DL because that could make CFS force-idle RT.
*/
- if (i == cpu && !need_sync && class == &fair_sched_class)
+ if (i == cpu && !need_sync)
goto next_class;
continue;
@@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* Optimize the 'normal' case where there aren't any
* cookies and we don't need to sync up.
*/
- if (i == cpu && !need_sync && !p->core_cookie) {
+ if (i == cpu && !need_sync) {
+ if (p->core_cookie) {
+ /*
+ * This optimization is only valid as
+ * long as there are no cookies
+ * involved. We may have skipped
+ * non-empty higher priority classes on
+ * siblings, which are empty on this
+ * CPU, so start over.
+ */
+ need_sync = true;
+ goto reset;
+ }
+
next = p;
goto done;
}
@@ -5299,7 +5307,6 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
*/
need_sync = true;
}
-
}
}
next_class:;
--
2.29.2.299.gdc1121823c-goog