Cpu-Hotplug and Real-Time

From: Gautham R Shenoy
Date: Tue Aug 07 2007 - 09:12:21 EST


Hi,

While running a cpu-hotplug test involving a high priority
process (SCHED_RR, prio=94) trying to periodically offline and
online cpu1 on a 2-processor machine, I noticed that the system was
becoming unresponsive after a few iterations.

However, when the same test was repeated with processors
greater than 2, it worked fine.
Also, if the hotplugging process, was not of rt-prio, it
worked fine on a 2-processor machine.

After some debugging, I saw that the hang occured because
the high prio process was stuck in a loop doing yield() inside
wait_task_inactive(). Description follows:

Say a high-prio task (A) does a kthread_create(B),
followed by a kthread_bind(B, cpu1). At this moment,
only cpu0 is online.

Now, immediately after being created, B would
do a
complete(&create->started) [kernel/kthread.c: kthread()],
before scheduling itself out.

This complete() will wake up kthreadd, which had spawned B.
It is possible that during the wakeup, kthreadd might preempt B.
Thus, B is still on the runqueue, and not yet called schedule().

kthreadd, will inturn do a
complete(&create->done); [kernel/kthread.c: create_kthread()]
which will wake up the thread which had called kthread_create().
In our case it's task A, which will run immediately, since its priority
is higher.

A will now call kthread_bind(B, cpu1).
kthread_bind(), calls wait_task_inactive(B), to ensures that
B has scheduled itself out.

B is still on the runqueue, so A calls yield() in wait_task_inactive().
But since A is the task with the highest prio, scheduler schedules it
back again.

Thus B never gets to run to schedule itself out.
A loops waiting for B to schedule out leading to system hang.

In my case,
A was the high priority process trying to bring up cpu1, and
thus doing a kthread_create/kthread_bind in
migration_call(): CPU_UP_PREPARE.

B was the migration thread for cpu1.

And the above problem occurs when only one cpu is online.

Possible solutions to this problem:
a) Let the newly spawned kernel threads inherit
their parent's prio and policy.

b) Instead of using yield() in wait_task_inactive(), we could use
something like a yield_to(p):

yield_to(struct task_struct p)
{
int old_prio = p->prio;
/* Temporarily boost p's priority atleast to that of current task */
if (current->prio > old_prio)
set_prio(p, current->prio);
yield();
/* Reset priority back to the original value */
set_prio(p, old_prio);
}


Thoughts?

Thanks and Regards
gautham.

--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/