[PATCH 14/15] sched,fair: scale vdiff in wakeup_preempt_entity

From: Rik van Riel
Date: Fri Sep 06 2019 - 15:15:11 EST


When a task wakes back up after having gone to sleep, place_entity
will limit the vruntime difference between min_vruntime and the
woken up task to half of sysctl_sched_latency.

The code in wakeup_preempt_entity calculates how much vruntime a
time slice for the woken up task represents, in wakeup_gran.

It then assumes that all the vruntime used since the task went to
sleep was used by the currently running task (which has its vruntime
scaled by calc_delta_fair, as well).

However, that assumption is not necessarily true, and the vruntime
may have advanced at different rates, pushed ahead by different tasks
on the CPU. This becomes more visible when the CPU controller is enabled.

This leads to the symptom that a high priority woken up task is likely to
preempt whatever is running, even if the currently running task is of equal
or higher priority than the woken up task!

Scaling the vdiff down if the currently running task is also high priority
solves that symptom.

This is not the correct thing to do if all of the vruntime was accumulated
by the current task, or other tasks at similar priority, and already scaled
by the same priority, but I do not have any better ideas on how to tackle
the "task X got preempted by task Y of the same priority" issue that system
administrators try to resolve by setting the sched_wakeup_granularity
sysctl variable to a larger value than half of sysctl_sched_latency...

Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
---
kernel/sched/fair.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1049f2f4ae55..3c25ab666799 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6792,6 +6792,7 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
if (vdiff <= 0)
return -1;

+ vdiff = min((u64)vdiff, calc_delta_fair(vdiff, curr));
gran = wakeup_gran(se);
if (vdiff > gran)
return 1;
--
2.20.1