Re: [PATCH] genirq: Avoid race between cpu hot plug and irq_desc() allocation paths
From: Thomas Gleixner
Date: Thu Sep 06 2018 - 03:56:37 EST
On Wed, 5 Sep 2018, pheragu@xxxxxxxxxxxxxx wrote:
> On 2018-09-05 11:23, Thomas Gleixner wrote:
> > On Wed, 5 Sep 2018, Prakruthi Deepak Heragu wrote:
> >
> > > One of the cores might have just allocated irq_desc() and other core
> > > might be doing irq migration in the hot plug path. In the hot plug path
> > > during the IRQ migration, for_each_active_irq macro is trying to get
> > > irqs whose bits are set in allocated_irqs bit map but there is no return
> > > value check after irq_to_desc for desc validity.
> >
> > Confused. All parts involved, irq allocation/deallocation and the CPU
> > hotplug code take sparse_irq_lock to prevent exavtly that.
> >
> Removing the NULL pointer check and adding this sparse_irq_lock
> that you suggested will solve this issue. The code looks like
> this now. Is this okay?
No, it's not okay at all. You _CANNOT_ take sparse_irq_lock() there.
I wrote that _ALL_ parts take that lock already. Care to look at the code?
takedown_cpu()
{
/*
* Prevent irq alloc/free while the dying cpu reorganizes the
* interrupt affinities.
*/
irq_lock_sparse();
/*
* So now all preempt/rcu users must observe !cpu_active().
*/
err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu));
and __cpu_disable() which in turn calls irq_migrate_all_off_this_cpu() is
called from take_cpu_down(). So this deadlocks in any case.
I have no idea what kind of modifications you have in your kernel which
make
1) The thing explode in the first place
2) Not immediately deadlock with that hack you just sent
and honestly I don't want to know at all.
Thanks,
tglx