3. The 'Rescheduling siblings' loop of pick_next_task() is quite fragile. It
calls various functions on rq->core_pick, which could very well be NULL because
an online sibling might have gone offline before a task could be picked for it,
or it might have been offline and then come online too late for anything to be
picked for it. Just ignore the siblings for which nothing could be picked. This
avoids any crashes in this loop that assume rq->core_pick is not NULL.

I like this idea, it's much simpler :-)
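To make the failure mode in item 3 concrete, here is a minimal user-space sketch (not kernel code) of a sibling loop that skips CPUs for which nothing was picked. `struct model_rq`, `resched_siblings()` and `resched_count` are hypothetical names. Only `core_pick` mirrors the real `struct rq` field, and the loop body is simplified on the assumption that the only per-sibling work is a reschedule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of a sibling runqueue. Only core_pick
 * mirrors a field of the kernel's struct rq; the rest is illustrative.
 */
struct model_rq {
	int cpu;
	const char *core_pick;	/* NULL if nothing was picked for this CPU */
	int resched_count;	/* how often this CPU was "rescheduled" */
};

/*
 * Model of the 'Rescheduling siblings' loop: skip any sibling for which
 * no task was picked instead of dereferencing a NULL core_pick.
 * Returns the number of siblings that were rescheduled.
 */
static int resched_siblings(struct model_rq *sibs, size_t n, int this_cpu)
{
	int rescheduled = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_rq *sib = &sibs[i];

		/* The local CPU is not rescheduled from this loop. */
		if (sib->cpu == this_cpu)
			continue;

		/* Sibling was offline (or went offline) during selection. */
		if (!sib->core_pick)
			continue;

		/* Invariant: core_pick is non-NULL from here on. */
		assert(sib->core_pick != NULL);
		sib->resched_count++;
		rescheduled++;
	}
	return rescheduled;
}
```

For example, with siblings {CPU 0: "A", CPU 1: NULL, CPU 2: "B"} and this_cpu = 0, only CPU 2 is rescheduled; CPU 1 is skipped instead of causing a NULL dereference.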
Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
---
I think we can come here when the hotplug thread is scheduled during CPU online, but
kernel/sched/core.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 717122a3dca1..4966e9f14f39 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4610,13 +4610,24 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	if (!sched_core_enabled(rq))
 		return __pick_next_task(rq, prev, rf);
+	cpu = cpu_of(rq);
+
+	/* Stopper task is switching into idle, no need for core-wide selection. */
+	if (cpu_is_offline(cpu))
+		return __pick_next_task(rq, prev, rf);
+
We would need to reset core_pick here, I think. Something like
 	/*
 	 * If there were no {en,de}queues since we picked (IOW, the task
 	 * pointers are all still valid), and we haven't scheduled the last
 	 * pick yet, do so now.
+	 *
+	 * rq->core_pick can be NULL if no selection was made for a CPU because
+	 * it was either offline or went offline during a sibling's core-wide
+	 * selection. In this case, do a core-wide selection.
 	 */
 	if (rq->core->core_pick_seq == rq->core->core_task_seq &&
-	    rq->core->core_pick_seq != rq->core_sched_seq) {
+	    rq->core->core_pick_seq != rq->core_sched_seq &&
+	    !rq->core_pick) {
Should this check be reversed? I mean, we should enter the fastpath if