Re: [PATCH v4 1/2] PCI: Add tango MSI controller support

From: Marc Zyngier
Date: Wed May 24 2017 - 06:25:33 EST


On 24/05/17 11:00, Robin Murphy wrote:
> On 23/05/17 20:15, Mason wrote:
>> On 23/05/2017 20:03, Robin Murphy wrote:
>>> On 23/05/17 18:54, Mason wrote:
>>>> On 23/05/2017 19:03, Bjorn Helgaas wrote:
>>>>> On Wed, May 17, 2017 at 04:56:08PM +0200, Marc Gonzalez wrote:
>>>>>> On 20/04/2017 16:28, Marc Gonzalez wrote:
>>>>>>
>>>>>>> +static int tango_set_affinity(struct irq_data *data,
>>>>>>> + const struct cpumask *mask, bool force)
>>>>>>> +{
>>>>>>> + return -EINVAL;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static struct irq_chip tango_chip = {
>>>>>>> + .irq_ack = tango_ack,
>>>>>>> + .irq_mask = tango_mask,
>>>>>>> + .irq_unmask = tango_unmask,
>>>>>>> + .irq_set_affinity = tango_set_affinity,
>>>>>>> + .irq_compose_msi_msg = tango_compose_msi_msg,
>>>>>>> +};
>>>>>>
>>>>>> Hmmm... I'm wondering why .irq_set_affinity is required.
>>>>>>
>>>>>> static int setup_affinity(struct irq_desc *desc, struct cpumask *mask)
>>>>>> first calls __irq_can_set_affinity() to check whether
>>>>>> desc->irq_data.chip->irq_set_affinity) exists.
>>>>>>
>>>>>> then calls irq_do_set_affinity(&desc->irq_data, mask, false);
>>>>>> which calls chip->irq_set_affinity(data, mask, force);
>>>>>> = msi_domain_set_affinity()
>>>>>> which calls parent->chip->irq_set_affinity() unconditionally.
>>>>>>
>>>>>> Would it make sense to test that the callback is implemented
>>>>>> before calling it?
>>>>>>
>>>>>>
>>>>>> [ 0.723895] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>>>>>
>>>>> I'm not sure what you're asking.
>>>>>
>>>>> Is this a bug report for the v4 tango driver?
>>>>
>>>> No.
>>>>
>>>>> Or are you asking whether msi_domain_set_affinity() should be changed
>>>>> to check whether parent->chip->irq_set_affinity is implemented?
>>>>
>>>> Yes. The way things are implemented now, drivers are forced
>>>> to define an irq_set_affinity callback, even if it just returns
>>>> an error, otherwise, the kernel crashes, because of the
>>>> unconditional function pointer deref.
>>>>
>>>>> msi_domain_set_affinity() has called parent->chip->irq_set_affinity()
>>>>> without checking since it was added in 2014 by f3cf8bb0d6c3 ("genirq: Add
>>>>> generic msi irq domain support"), so if there's a problem here, it's most
>>>>> likely in the tango code.
>>>>
>>>> The issue is having to define an "empty" function.
>>>> (Unnecessary code bloat and maintenance.)
>>>
>>> AFAICS, only one driver (other than this one) implements a "do nothing"
>>> set_affinity callback - everyone else who doesn't do anything hardware
>>> specific just passes it along via irq_chip_set_affinity_parent().
>>
>> I counted 4. Where did I mess up?
>>
>> advk_msi_set_affinity
>> altera_msi_set_affinity
>> nwl_msi_set_affinity
>> vmd_irq_set_affinity
>> tango_set_affinity
>
> Fair point - I went through drivers/irqchip and the various
> arch-specific instances and found ls_scfg_msi_set_affinity(), but
> somehow skipped over drivers/pci. Anyway, I think the question stands of
> why are these handful of drivers *not* using irq_chip_set_affinity_parent()?

Probably because they don't have a parent, in the hierarchical sense.
All they have is a chained interrupt (*puke*). Which implies that
changing the affinity of one MSI would affect all of them, completely
confusing unsuspecting userspace such as irqbalance.

> As an outsider, it naively seems logical that the affinity of an MSI
> which just gets translated to a wired interrupt should propagate through
> to the affinity of that wired interrupt, but maybe there are reasons not
> to; I really don't know.

See above. The main issue is that HW folks haven't understood the actual
use of MSIs, and persist in implementing them as an afterthought, based
on some cascading interrupt controller design.

Sad. So sad.

M.
--
Jazz is not dead. It just smells funny...