Enabling CONFIG_HOTPLUG_CPU for CPU that does not have hardware support for hot-plug

From: Kosta Zertsekel
Date: Tue Oct 22 2013 - 05:39:14 EST


Hi guys,

The question
------------------
What are the possible drawbacks of enabling CONFIG_HOTPLUG_CPU for a CPU
that does not have hardware support for hot-plug?

The question I'd like to ask is architecture-agnostic, but the described behavior
is observed on an MPCore Cortex-A9 CPU with Linux 3.4.59.

The issue
-------------
When the Linux kernel is compiled in SMP mode with CONFIG_HOTPLUG_CPU not set
and is booted on a single-core CPU, warning messages of the form
"... task blocked for more than 120 seconds ..." start popping up in the dmesg log.

For example:
INFO: task ksoftirqd/1:9 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

To make the message disappear, CONFIG_HOTPLUG_CPU should be enabled.

Following the example of "ksoftirqd", the root cause of the issue is that
there are as many "ksoftirqd" threads created as CONFIG_NR_CPUS
(see cpu_present_mask in the kernel/cpu.c file). See below for some details on
how the "ksoftirqd" task is created and why it is not killed.

Now, the "ksoftirqd" task is *not* killed and just stays around in the task queue
until the kernel's hung task check reports "... task blocked ...".
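
To illustrate why a thread that is created but never run ends up in that
report, here is a simplified user-space C model of a periodic hung-task check.
It is only a sketch under assumptions: struct task_model, its fields, and
check_hung_task_model() are hypothetical names made up for illustration, not
the kernel's code, and the model assumes the check flags a task that sleeps
uninterruptibly and whose context-switch count did not change between two
consecutive checks.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-task state such a check needs. */
struct task_model {
    bool uninterruptible;            /* task sleeps in an uninterruptible state */
    unsigned long switch_count;      /* context switches so far */
    unsigned long last_switch_count; /* value recorded at the previous check */
};

/*
 * One periodic check (assumed to run once per timeout interval).
 * Returns true when the model would print
 * "INFO: task ... blocked for more than ... seconds."
 */
bool check_hung_task_model(struct task_model *t)
{
    if (!t->uninterruptible)
        return false;                /* only uninterruptible sleepers are considered */

    if (t->switch_count != t->last_switch_count) {
        /* The task ran since the last check: remember that and move on. */
        t->last_switch_count = t->switch_count;
        return false;
    }

    /* Same switch count as last time: the task made no progress. */
    return true;
}
```

In this model, a thread that was created, switched out once, and never woken
again passes the first check (its count is recorded) and is flagged by the
second one, which is roughly what the "ksoftirqd/1" message above looks like.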

Details:
----------
The first "ksoftirqd" task (for CPU[0]) is created as part of executing
the registered early_initcall function spawn_ksoftirqd() in the flow
below:

start_kernel() ---> rest_init() ---> kernel_init() --->
do_pre_smp_initcalls() ---> spawn_ksoftirqd() --->
cpu_callback(... CPU_ONLINE ...)

The "ksoftirqd" tasks for CPU[1 .. N-1] are created in a different flow.
First of all, cpu_callback() from kernel/softirq.c (the "ksoftirqd" task is
created in this callback) is registered as a CPU notifier in
spawn_ksoftirqd(). Then this callback is called in the flow below:

start_kernel() ---> rest_init() ---> kernel_init() --->
smp_init() ---> for_each_present_cpu(cpu) { cpu_up(cpu) } --->
_cpu_up() ---> __cpu_notify(CPU_UP_PREPARE) (here the "ksoftirqd" task
is created).

Right after that, the kernel attempts to bring CPU[x] online (using __cpu_up)
and, if __cpu_up(cpu) fails, the "ksoftirqd" task is killed using
__cpu_notify(CPU_UP_CANCELED) some lines below.

Now, the code that actually kills the task (using kthread_stop) is wrapped
in #ifdef CONFIG_HOTPLUG_CPU (see kernel/softirq.c, function cpu_callback),
so with the option unset the task is never stopped.
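
The bring-up flow described above can be sketched as a small user-space C
model. This is not kernel code: struct softirq_model, cpu_callback_model()
and count_orphans() are hypothetical names, the thread is reduced to a
boolean, and the hotplug_enabled flag is a runtime stand-in for the
compile-time #ifdef CONFIG_HOTPLUG_CPU guard. It assumes that CPUs with an
index at or above `working` fail to come up.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 8   /* arbitrary bound for the model */

enum cpu_event { CPU_UP_PREPARE, CPU_ONLINE, CPU_UP_CANCELED };

struct softirq_model {
    bool hotplug_enabled;            /* stands in for #ifdef CONFIG_HOTPLUG_CPU */
    bool thread_exists[MAX_CPUS];    /* one "ksoftirqd" per CPU */
};

/* Models the notifier callback: create on PREPARE, stop on CANCELED. */
void cpu_callback_model(struct softirq_model *m, enum cpu_event ev, int cpu)
{
    switch (ev) {
    case CPU_UP_PREPARE:
        m->thread_exists[cpu] = true;       /* thread is created here */
        break;
    case CPU_ONLINE:
        /* thread would be woken here; nothing to track in the model */
        break;
    case CPU_UP_CANCELED:
        if (m->hotplug_enabled)             /* cleanup exists only with hotplug */
            m->thread_exists[cpu] = false;  /* models kthread_stop() */
        break;
    }
}

/*
 * Walks all present CPUs the way the smp_init() loop is described above and
 * returns how many threads are left behind for CPUs that never came online.
 */
int count_orphans(int present, int working, bool hotplug)
{
    struct softirq_model m = { .hotplug_enabled = hotplug };
    int cpu, orphans = 0;

    assert(present <= MAX_CPUS && working >= 0 && working <= present);

    for (cpu = 0; cpu < present; cpu++) {
        cpu_callback_model(&m, CPU_UP_PREPARE, cpu);
        if (cpu < working)
            cpu_callback_model(&m, CPU_ONLINE, cpu);       /* bring-up succeeded */
        else
            cpu_callback_model(&m, CPU_UP_CANCELED, cpu);  /* bring-up failed */
    }

    for (cpu = working; cpu < present; cpu++)
        if (m.thread_exists[cpu])
            orphans++;
    return orphans;
}
```

In this model, count_orphans(4, 1, false) leaves three threads behind, while
count_orphans(4, 1, true) leaves none, which mirrors the difference between
building without and with CONFIG_HOTPLUG_CPU.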

The solution
-----------------
The easy solution is to enable CONFIG_HOTPLUG_CPU, which enables
the compilation of the code that kills the "ksoftirqd" task.
What are the possible drawbacks of enabling CONFIG_HOTPLUG_CPU for a CPU
that does not have hardware support for hot-plug?

Thanks,
--- KostaZ