Re: [PATCH] IRQ, cpu-hotplug: Fix a race between CPU hotplug and IRQ desc alloc/free

From: Huang, Ying
Date: Mon Sep 04 2017 - 19:41:10 EST


Thomas Gleixner <tglx@xxxxxxxxxxxxx> writes:

> On Mon, 4 Sep 2017, Huang, Ying wrote:
>> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
>> index 638eb9c83d9f..af9029625271 100644
>> --- a/kernel/irq/cpuhotplug.c
>> +++ b/kernel/irq/cpuhotplug.c
>> @@ -129,10 +129,13 @@ void irq_migrate_all_off_this_cpu(void)
>> struct irq_desc *desc;
>> unsigned int irq;
>>
>> + irq_lock_sparse();
>
> You cannot take that lock here as irq_migrate_all_off_this_cpu() is called
> with interrupts disabled.

Oh, sorry, I misunderstood the code. I will keep only the !desc check
in the patch.

> The protection in takedown_cpu() is wrong. Patch below.
>
> Thanks,
>
> tglx
> ----
>
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -642,13 +642,13 @@ static int takedown_cpu(unsigned int cpu
> wait_for_completion(&st->done);
> BUG_ON(st->state != CPUHP_AP_IDLE_DEAD);
>
> - /* Interrupts are moved away from the dying cpu, reenable alloc/free */
> - irq_unlock_sparse();
> -
> hotplug_cpu__broadcast_tick_pull(cpu);
> /* This actually kills the CPU. */
> __cpu_die(cpu);
>
> + /* Interrupts are moved away from the dying cpu, reenable alloc/free */
> + irq_unlock_sparse();
> +

I don't understand this. It appears that irq_migrate_all_off_this_cpu()
is called from take_cpu_down(), which already runs with sparse_irq_lock
held.

Best Regards,
Huang, Ying

> tick_cleanup_dead_cpu(cpu);
> return 0;
> }