[PATCH 1/2] Fix stop_machine_run problem with naughty real time process

From: Satoru Takeuchi
Date: Fri May 11 2007 - 04:51:26 EST


Hi,

I wrote patches which fixes the problem regarding stop_machine_run() and
cpu hotplug.

stop_machine_run() can't accomplish its work if there is a real time process
on the CPU on which "kstopmachine" kernel thread is running. For more details,
please refer to the following thread:

http://lkml.org/lkml/2007/5/7/41

TEST RESULT:

I did the following test on my ia64 box. It works fine:

-------------------------------------------------------------------------------
# cat loop.sh
while true ; do
:
done
-------------------------------------------------------------------------------
# cat test_stop_machine_run_with_rt_proc.sh
#!/bin/sh

taskset 0x2 chrt -f 98 ./loop.sh &
PID=${!}
echo 0 >/sys/devices/system/cpu/cpu1/online
kill ${PID}
echo 1 >/sys/devices/system/cpu/cpu1/online
-------------------------------------------------------------------------------

To do the test, just issue the following command.

# ./test_stop_machine_run_with_rt_proc.sh
#

TODO list
=========

Some more works are needed. See the TODO list.

- If there is a SCHED_FIFO process having max priority, stop_machine_run doesn't
work because kstopmachine doesn't be scheduled.

-> I'm trying to fix this problem, see the followings:

http://lkml.org/lkml/2007/5/8/620

I would submit RFC patches in 1 weeks.

- On CPU hot removal, if that RT process is migrated to the CPU on which
stop_machine_run() is running, stop_machine_run can't continue to run.

-> I'm trying to fix this problem.

- Other `stop_machine_run() with FIFO` problem might exist.

-> I've not research other subsystem using stop_machine_run yet.


# FYI, I'll be offline for 2 days.

Thanks,

Satoru

---
Fix stop_machine_run() problem with naughty real time process

stop_machine_run() does its work on "kstopmachine" thread having max priority.
However that thread get such priority after woken up. Therefore, in the
following case ...

- "kstopmachine" try to run on CPU1
- There is a real time process which doesn't relinquish CPU time voluntary on CPU1

... "kstopmachine" can't start to run and the CPU on which stop_machine_run() is runing
hangs up. To fix this problem, call sched_setscheduler() before waking up that thread.

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@xxxxxxxxxxxxxx>

Index: linux-2.6.21/kernel/stop_machine.c
===================================================================
--- linux-2.6.21.orig/kernel/stop_machine.c 2007-05-11 13:45:34.000000000 +0900
+++ linux-2.6.21/kernel/stop_machine.c 2007-05-11 14:49:17.000000000 +0900
@@ -89,10 +89,6 @@ static void stopmachine_set_state(enum s
static int stop_machine(void)
{
int i, ret = 0;
- struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
- /* One high-prio thread per cpu. We'll do this one. */
- sched_setscheduler(current, SCHED_FIFO, &param);

atomic_set(&stopmachine_thread_ack, 0);
stopmachine_num_threads = 0;
@@ -184,6 +180,10 @@ struct task_struct *__stop_machine_run(i

p = kthread_create(do_stop, &smdata, "kstopmachine");
if (!IS_ERR(p)) {
+ struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+
+ /* One high-prio thread per cpu. We'll do this one. */
+ sched_setscheduler(p, SCHED_FIFO, &param);
kthread_bind(p, cpu);
wake_up_process(p);
wait_for_completion(&smdata.done);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/