Re: [PATCH] fix cpu hotplug test failures on powerpc

From: Peter Zijlstra
Date: Wed Dec 16 2009 - 05:16:53 EST


On Wed, 2009-12-16 at 17:15 +0800, Xiaotian Feng wrote:
> Sachin found cpu hotplug test failures on powerpc, which made the kernel
> hang on his POWER box. This is addressed in
> http://marc.info/?l=linux-kernel&m=126052886204649&w=2
>
> commit 6ad4c18 (sched: Fix balance vs hotplug race) switches to
> cpu_active_mask, but in some specific situations the kernel may leave
> a cpu inactive but online.
>
> On some powerpc machines, hotplugging cpu0 is allowed. If cpu0 is the
> last alive cpu and we try to offline it, cpu_down() marks cpu0 inactive.
> Then _cpu_down() finds that num_online_cpus() is 1 and returns -EBUSY,
> but cpu0 is never changed back to active. So cpu0 ends up inactive
> but online.
>
> The fix is to set the cpu inactive only once we are actually going to
> bring that specific cpu down, in _cpu_down().

Good spotting, thanks! Some comments below.
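
For anyone following along, the sequence described above can be modelled
in userspace. This is only a rough sketch under stated assumptions: the
model_* names and the plain bool arrays are hypothetical stand-ins for
cpu_online_mask and cpu_active_mask, and nothing here is kernel code. It
only shows how the ordering of the "clear active" step relative to the
-EBUSY check decides whether the last cpu is left inactive but online.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model; every name below is hypothetical. */
#define MODEL_NR_CPUS 4

static bool model_online[MODEL_NR_CPUS];
static bool model_active[MODEL_NR_CPUS];

/* Count the cpus currently marked online in the model. */
static int model_num_online(void)
{
	int n = 0;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		n += model_online[i] ? 1 : 0;
	return n;
}

/* Start from a state where only 'cpu' is online and active. */
static void model_reset_single_cpu(int cpu)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		model_online[i] = (i == cpu);
		model_active[i] = (i == cpu);
	}
}

/* Inner step: refuses to take down the last online cpu. */
static int model_inner_down(int cpu, bool clear_active_here)
{
	if (model_num_online() == 1)
		return -EBUSY;

	if (clear_active_here)
		model_active[cpu] = false;
	model_online[cpu] = false;
	return 0;
}

/* Old ordering: the active bit is cleared before the -EBUSY check. */
static int model_cpu_down_old(int cpu)
{
	model_active[cpu] = false;
	return model_inner_down(cpu, false);
}

/* Patched ordering: the active bit is cleared only past the check. */
static int model_cpu_down_new(int cpu)
{
	return model_inner_down(cpu, true);
}
```

After model_reset_single_cpu(0), model_cpu_down_old(0) fails with -EBUSY
and leaves cpu 0 online but not active, while model_cpu_down_new(0) fails
the same way and leaves both bits set.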

> Reported-by: Sachin Sant <sachinp@xxxxxxxxxx>
> Signed-off-by: Xiaotian Feng <dfeng@xxxxxxxxxx>
> Tested-by: Sachin Sant <sachinp@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
> ---
> kernel/cpu.c | 8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 291ac58..a1e7165 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -209,6 +209,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
>  		return -ENOMEM;
>
>  	cpu_hotplug_begin();
> +	set_cpu_active(cpu, false);
>  	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
>  					hcpu, -1, &nr_calls);
>  	if (err == NOTIFY_BAD) {
> @@ -280,8 +281,6 @@ int __ref cpu_down(unsigned int cpu)
>  		goto out;
>  	}
>
> -	set_cpu_active(cpu, false);
> -
>  	/*
>  	 * Make sure the all cpus did the reschedule and are not
>  	 * using stale version of the cpu_active_mask.

That renders the synchronize_sched() call down there useless, so might
as well remove it then.

> @@ -387,12 +386,6 @@ int disable_nonboot_cpus(void)
>  	 */
>  	cpumask_clear(frozen_cpus);
>
> -	for_each_online_cpu(cpu) {
> -		if (cpu == first_cpu)
> -			continue;
> -		set_cpu_active(cpu, false);
> -	}
> -
>  	synchronize_sched();

And here too.

> printk("Disabling non-boot CPUs ...\n");


