[RFC PATCH -rt] Priority preemption latency

From: Darren Hart
Date: Fri Jun 09 2006 - 20:00:54 EST


We have run into a situation where a lower priority RT task will run after a
higher priority RT task is awoken. We are running the test case available
here:

http://linux.dvhart.com/tests/prio-preempt/

For all required files, including librt.h, download the tarball here:

http://linux.dvhart.com/tests/tests.tar.bz2

The test case returns nonzero on failure, so just run it in a loop, exiting
on failure. The following is a patch by Mike Kravetz that seems to resolve
the problem.
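
The run-until-failure loop mentioned above could be sketched roughly as
follows. This is only an illustration: run_until_failure() is a hypothetical
helper, not part of the test tarball, and the command string passed to it is
an assumption about how the test binary is named and invoked.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Run `cmd` up to `max_runs` times via system(3).
 * Returns the 1-based number of the first run that reported failure
 * (nonzero status), or 0 if every run succeeded.
 * Hypothetical helper; `cmd` is whatever invokes the test case.
 */
static int run_until_failure(const char *cmd, int max_runs)
{
	assert(cmd != NULL && max_runs >= 0);

	for (int i = 1; i <= max_runs; i++) {
		if (system(cmd) != 0)
			return i;	/* stop at the first failing run */
	}
	return 0;
}
```

Something like run_until_failure("./prio-preempt", 1000) would then stop at
the first failing iteration, assuming the binary is built under that name.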

Thoughts, comments?

Thanks,

--
Darren Hart
IBM Linux Technology Center



I've been looking into the priority preemption issues with [the prio-preempt
test case]. My 'theory' is that when RT tasks are awakened they are not always
put on the 'best' runqueue. As a result, the rt_overload functionality
has to be engaged to get the task to a CPU it can run on. This 'delay'
in getting the task on a CPU is the 'bug' exposed by the testcase as
a lower priority task is allowed to run during this delay.

In the testcase, the awakened worker tasks should all run on the same CPU,
because the other CPUs are busy running higher priority tasks. But, from
the scheduler's point of view, those other CPUs might look like a better
place for the awakened task because they are 'less loaded'.

My quick and dirty patch (below) is to the try_to_wake_up path. When
awakening an RT task, don't send it to a remote CPU (determined to
be less loaded) unless it can preempt the task running on the remote
CPU. When it cannot, the task is added to the current CPU's runqueue.

I've been successfully running the C testcase for a while with this patch
applied.

I'm not sure if there is a 'complete' solution to this problem without
a redesign of 'global' RT scheduling on SMP. The rt_overload mechanism
does not guarantee strict priority ordering (as evidenced by this bug).
Perhaps the best solution we can hope for with the current mechanism is
to make the scheduler smarter about RT task placement on SMP. This
would at least minimize the need for rt_overload.
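
One possible shape for 'smarter' placement, sketched in userspace C under
the same lower-value-is-higher-priority assumption: scan every CPU and pick
the one whose running task has the lowest priority that the awakened task
can still preempt. model_best_cpu_for_rt() is a hypothetical name, and this
ignores locking, affinity and cache effects that a real implementation
would have to handle.

```c
#include <assert.h>
#include <stddef.h>

/*
 * curr_prio[i] is the prio of the task running on CPU i
 * (lower value = higher priority). Returns the index of the CPU
 * running the lowest-priority task that `task_prio` can preempt,
 * or -1 if the task cannot preempt anything. Ties go to the
 * lowest-numbered CPU.
 */
static int model_best_cpu_for_rt(const int *curr_prio, size_t ncpus,
				 int task_prio)
{
	int best = -1;
	int best_prio = task_prio;

	assert(curr_prio != NULL || ncpus == 0);

	for (size_t i = 0; i < ncpus; i++) {
		/* Strictly greater value: a strictly lower priority victim. */
		if (curr_prio[i] > best_prio) {
			best_prio = curr_prio[i];
			best = (int)i;
		}
	}
	return best;
}
```

A result of -1 would correspond to the case where the task has to wait on
some runqueue, which is where a mechanism like rt_overload would still be
needed.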

--
Mike

diff -Naupr linux-rayrt12.1-r357/kernel/sched.c linux-rayrt12.1-r357.work/kernel/sched.c
--- linux-rayrt12.1-r357/kernel/sched.c 2006-05-27 00:43:35.000000000 +0000
+++ linux-rayrt12.1-r357.work/kernel/sched.c 2006-06-08 22:41:20.000000000 +0000
@@ -1543,6 +1543,17 @@ static int try_to_wake_up(task_t *p, uns
 		}
 	}

+	/*
+	 * XXX Don't send RT task elsewhere unless it can preempt current
+	 * XXX on other CPU. Better yet would be for awakened RT tasks to
+	 * XXX examine this(and all other) CPU(s) to see what is the best
+	 * XXX fit. For example there is no check here to see if the
+	 * XXX currently running task can be preempted (which would be the
+	 * XXX ideal case).
+	 */
+	if (rt_task(p) && !TASK_PREEMPTS_CURR(p, rq))
+		goto out_set_cpu;
+
 	new_cpu = cpu; /* Could not wake to this_cpu. Wake to cpu instead */
 out_set_cpu:
 	new_cpu = wake_idle(new_cpu, p);
