Re: [PATCH v4 1/2] PCI: Add tango MSI controller support

From: Mason
Date: Wed May 31 2017 - 10:19:15 EST


On 24/05/2017 12:22, Marc Zyngier wrote:
> On 24/05/17 11:00, Robin Murphy wrote:
>> On 23/05/17 20:15, Mason wrote:
>>> On 23/05/2017 20:03, Robin Murphy wrote:
>>>> On 23/05/17 18:54, Mason wrote:
>>>>> On 23/05/2017 19:03, Bjorn Helgaas wrote:
>>>>>> On Wed, May 17, 2017 at 04:56:08PM +0200, Marc Gonzalez wrote:
>>>>>>> On 20/04/2017 16:28, Marc Gonzalez wrote:
>>>>>>>
>>>>>>>> +static int tango_set_affinity(struct irq_data *data,
>>>>>>>> + const struct cpumask *mask, bool force)
>>>>>>>> +{
>>>>>>>> + return -EINVAL;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static struct irq_chip tango_chip = {
>>>>>>>> + .irq_ack = tango_ack,
>>>>>>>> + .irq_mask = tango_mask,
>>>>>>>> + .irq_unmask = tango_unmask,
>>>>>>>> + .irq_set_affinity = tango_set_affinity,
>>>>>>>> + .irq_compose_msi_msg = tango_compose_msi_msg,
>>>>>>>> +};
>>>>>>>
>>>>>>> Hmmm... I'm wondering why .irq_set_affinity is required.
>>>>>>>
>>>>>>> static int setup_affinity(struct irq_desc *desc, struct cpumask *mask)
>>>>>>> first calls __irq_can_set_affinity() to check whether
>>>>>>> desc->irq_data.chip->irq_set_affinity) exists.
>>>>>>>
>>>>>>> then calls irq_do_set_affinity(&desc->irq_data, mask, false);
>>>>>>> which calls chip->irq_set_affinity(data, mask, force);
>>>>>>> = msi_domain_set_affinity()
>>>>>>> which calls parent->chip->irq_set_affinity() unconditionally.
>>>>>>>
>>>>>>> Would it make sense to test that the callback is implemented
>>>>>>> before calling it?
>>>>>>>
>>>>>>>
>>>>>>> [ 0.723895] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>>>>>>
>>>>>> I'm not sure what you're asking.
>>>>>>
>>>>>> Is this a bug report for the v4 tango driver?
>>>>>
>>>>> No.
>>>>>
>>>>>> Or are you asking whether msi_domain_set_affinity() should be changed
>>>>>> to check whether parent->chip->irq_set_affinity is implemented?
>>>>>
>>>>> Yes. The way things are implemented now, drivers are forced
>>>>> to define an irq_set_affinity callback, even if it just returns
>>>>> an error, otherwise, the kernel crashes, because of the
>>>>> unconditional function pointer deref.
>>>>>
>>>>>> msi_domain_set_affinity() has called parent->chip->irq_set_affinity()
>>>>>> without checking since it was added in 2014 by f3cf8bb0d6c3 ("genirq: Add
>>>>>> generic msi irq domain support"), so if there's a problem here, it's most
>>>>>> likely in the tango code.
>>>>>
>>>>> The issue is having to define an "empty" function.
>>>>> (Unnecessary code bloat and maintenance.)
>>>>
>>>> AFAICS, only one driver (other than this one) implements a "do nothing"
>>>> set_affinity callback - everyone else who doesn't do anything hardware
>>>> specific just passes it along via irq_chip_set_affinity_parent().
>>>
>>> I counted 4. Where did I mess up?
>>>
>>> advk_msi_set_affinity
>>> altera_msi_set_affinity
>>> nwl_msi_set_affinity
>>> vmd_irq_set_affinity
>>> tango_set_affinity
>>
>> Fair point - I went through drivers/irqchip and the various
>> arch-specific instances and found ls_scfg_msi_set_affinity(), but
>> somehow skipped over drivers/pci. Anyway, I think the question stands of
>> why are these handful of drivers *not* using irq_chip_set_affinity_parent()?
>
> Probably because they don't have a parent, in the hierarchical sense.
> All they have is a chained interrupt (*puke*). Which implies that
> changing the affinity of one MSI would affect all of them, completely
> confusing unsuspecting userspace such as irqbalance.
>
>> As an outsider, it naively seems logical that the affinity of an MSI
>> which just gets translated to a wired interrupt should propagate through
>> to the affinity of that wired interrupt, but maybe there are reasons not
>> to; I really don't know.
>
> See above. The main issue is that HW folks haven't understood the actual
> use of MSIs, and persist in implementing them as an afterthought, based
> on some cascading interrupt controller design.
>
> Sad. So sad.

For the record, below is the patch I had in mind:

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 8a3e872798f3..edfc95575a37 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -91,9 +91,11 @@ int msi_domain_set_affinity(struct irq_data *irq_data,
{
struct irq_data *parent = irq_data->parent_data;
struct msi_msg msg;
- int ret;
+ int ret = -EINVAL;
+
+ if (parent->chip->irq_set_affinity)
+ ret = parent->chip->irq_set_affinity(parent, mask, force);

- ret = parent->chip->irq_set_affinity(parent, mask, force);
if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg));
irq_chip_write_msi_msg(irq_data, &msg);



Then, it would be safe to remove "do-nothing" .irq_set_affinity
callbacks in drivers/pci/host

Regards.