regmap irqs cause oops when hotpluging a cpu
From: James
Date: Thu Aug 24 2017 - 03:50:43 EST
I've been running linux on Intel cherry-trail devices and discovered that I
couldn't get into S4. Debugging I found that the system would hang if I
attempted to hotplug a CPU (as is done in the final UP to SMP transition in
S4).
The simplest repro was:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online
looking closer it was exploding in
static void __setup_vector_irq(int cpu)
{
struct apic_chip_data *data;
struct irq_desc *desc;
int irq, vector;
/* Mark the inuse vectors */
for_each_irq_desc(irq, desc) {
struct irq_data *idata = irq_desc_get_irq_data(desc);
data = apic_chip_data(idata);
>>>> if (!data || !cpumask_test_cpu(cpu, data->domain))
continue;
because data->domain wasn't a valid pointer. (setup_vector_irq is called to
setup the vectors of the LAPIC on the new cpu.)
digging, it appears that the (l)apic code assumes that all interrupt domain
hierarchies end up at a lapic (since that's the only way an interrupt gets
delivered on x86)
to that end it does, for every interrupt on the platform:
static struct apic_chip_data *apic_chip_data(struct irq_data *irq_data)
{
if (!irq_data)
return NULL;
while (irq_data->parent_data)
irq_data = irq_data->parent_data;
return irq_data->chip_data;
}
Irq_data->chip_data is private void pointer for the use of the irq chip, but
this code doesn't bother checking that irq_data->chip is &lapic_controller
(it's own chip) because it must be true by plumbing [it assumes walking the
parents from any interrupt must end up at a lapic].
parent_data is meant to get set from domain->parent when the interrupt is
allocated in irq_domain_alloc_irq_data.
but for lots of the interrupts on cherry trail either domain->parent is unset
or the allocation of the irq doesn't follow this path. As a result the code
above explodes. (For example there are lots of regmap irq interrupts, and
regmap_add_irq_chip calls irq_domain_add_linear rather than
irq_domain_add_hierarchy which doesn't set parent)
I'm not sure what the right way(tm) to fix this is - as the place in irqdomain
where the interrupt is mapped likely doesn't know anything about the parent
irq. I expect all calls to add_linear need replacing, and the logic in
irqdomain.c needs to write ->parent_data.
[For a quick hack, perhaps the winning plan is to make setup_vector_irq just
put in a generic cpu bitmap for anything that isn't a lapic interrupt.]
James.