Re: [patch] Real-Time Preemption, deactivate() scheduling issue

From: Eugeny S. Mints
Date: Thu Mar 03 2005 - 14:52:07 EST


please consider the following scenario for full RT kernel.

Task A is running then an irq is occured which in turn wakes up irq related thread (B) of a higher priority than A.

my current understanding that actual context switch between A and B will occure at preempt_schedule_irq() on the "return form irq " path.

in this case the following "if" statement in __schedule() always returns false since preempt_schedule_irq() always sets up PREEMPT_ACTIVE before __schedule() call.

if ((prev->state & ~TASK_RUNNING_MUTEX) &&
!(preempt_count() & PREEMPT_ACTIVE)) {

as result the deactivate() is never called for preempted task A in this scenario. BUt if the task A is preempted while not in TASK_RUNNING state such behaviour seems incorrect since we get a task in not TASK_RUNNING state linked into a run queue.

An example:

drivers/net/irda/sir_dev.c: 76 (2.6.10 kernel)

spin_lock_irqsave(&dev->tx_lock, flags); /* serialize th other tx operations */
while (dev->tx_buff.len > 0) { /* wait until tx idle */
spin_unlock_irqrestore(&dev->tx_lock, flags);
76: set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(msecs_to_jiffies(10));
spin_lock_irqsave(&dev->tx_lock, flags);
}

At line 76 irqs are enabled, preemption is enabled.
Let assume the task A executes this code and gets preempted right after line 76. Task state is TASK_UNINTERRUPTIBLE but it will not be deactevated. Of cource this is the bug in set_current_state() utilization in this particular driver but schedule stuff should be robust to such bugs I believe. There are a lot such bugs in the kernel I believe.

Not sure what the actual reason for !(preempt_count() & PREEMPT_ACTIVE)) condition is but if it's just a sort of optimization (not remove a task from run queue if it was preemped in TASK_RUNNING state) then probably it should be removed in order to save correctness. Patch attached.

Eugeny

--- sched.c.orig 2005-03-03 22:35:16.000000000 +0300
+++ sched.c 2005-03-03 22:34:58.000000000 +0300
@@ -2891,8 +2891,7 @@
spin_lock_irq(&rq->lock);

switch_count = &prev->nvcsw; // TODO: temporary - to see it in vmstat
- if ((prev->state & ~TASK_RUNNING_MUTEX) &&
- !(preempt_count() & PREEMPT_ACTIVE)) {
+ if ((prev->state & ~TASK_RUNNING_MUTEX)) {
switch_count = &prev->nvcsw;
if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
unlikely(signal_pending(prev))))