Re: PCI/kernel msi code vs GIC ITS driver conflict?

From: John Garry
Date: Wed Sep 04 2019 - 04:57:09 EST


On 03/09/2019 17:16, Marc Zyngier wrote:
Hi John,

On 03/09/2019 15:09, John Garry wrote:
Hi Marc, Bjorn, Thomas,

Hi Marc,


We've come across a conflict with the kernel/pci msi code and GIC ITS
driver on our arm64 system, whereby we can't unbind and re-bind a PCI
device driver under special conditions. I'll explain...

Our PCI device supports 32 MSIs. The driver attempts to allocate MSI
vectors with min msi = 17, max msi = 32, and affd.pre_vectors = 16. For
our test we set nr_cpus = 1 (anything less than 16 would do).
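
To make the numbers concrete, here is a minimal userspace sketch of how a count of 17 could fall out of that request. It assumes the core keeps the pre-vectors as-is and adds one spreadable vector per CPU, capped at the maximum; model_vector_count() and its parameter names are hypothetical and this is not the kernel's code.

```c
#include <assert.h>

/*
 * Hypothetical model of the vector count handed back to the driver.
 * Assumption: pre_vectors are always reserved, then one extra vector is
 * added per CPU until max_vecs is reached. Returns 0 on failure.
 */
static unsigned int model_vector_count(unsigned int min_vecs,
                                       unsigned int max_vecs,
                                       unsigned int pre_vectors,
                                       unsigned int nr_cpus)
{
    unsigned int spread, total;

    assert(min_vecs <= max_vecs);

    /* The reserved vectors alone must fit within the minimum. */
    if (pre_vectors > min_vecs)
        return 0;

    /* One spreadable vector per CPU, capped by the remaining room. */
    spread = max_vecs - pre_vectors;
    if (nr_cpus < spread)
        spread = nr_cpus;

    total = pre_vectors + spread;
    return total >= min_vecs ? total : 0;
}
```

Under that assumption, min = 17, max = 32, pre_vectors = 16 and nr_cpus = 1 gives 16 + 1 = 17, which is not a power of 2.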

Just to confirm: this PCI device is requiring Multi-MSI, right? As
opposed to MSI-X?

Right, Multi-MSI.


We find that the pci/kernel msi code gives us 17 vectors, but the GIC
ITS code reserves 32 lpi maps in its_irq_domain_alloc(). The problem
then occurs when unbinding the driver in its_irq_domain_free() call,
where we only clear bits for 17 vectors. So if we unbind the driver and
then attempt to bind again, it fails.
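
The asymmetry described above can be modelled in userspace as follows. This is only a sketch of the symptom, not the ITS driver's code: NR_LPIS, lpi_map, model_alloc() and model_free() are made-up names, and the map size of 32 is an assumption chosen to match the device.

```c
#include <assert.h>
#include <stdint.h>

#define NR_LPIS 32u        /* hypothetical per-device map size */

static uint32_t lpi_map;   /* one bit per event; set means in use */

/* Round n up to the next power of two (n >= 1). */
static unsigned int round_up_pow2(unsigned int n)
{
    unsigned int p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Reserve an aligned power-of-two region; return its base or -1. */
static int model_alloc(unsigned int nvecs)
{
    unsigned int size = round_up_pow2(nvecs);

    assert(nvecs >= 1 && nvecs <= NR_LPIS);

    for (unsigned int base = 0; base + size <= NR_LPIS; base += size) {
        uint32_t mask = (size == 32) ? UINT32_MAX
                                     : ((((uint32_t)1 << size) - 1) << base);

        if (!(lpi_map & mask)) {
            lpi_map |= mask;   /* all 'size' bits reserved, not just nvecs */
            return (int)base;
        }
    }
    return -1;
}

/* Release only the nvecs bits the caller knows about. */
static void model_free(unsigned int base, unsigned int nvecs)
{
    for (unsigned int i = 0; i < nvecs; i++)
        lpi_map &= ~((uint32_t)1 << (base + i));
}
```

In this model, model_alloc(17) reserves bits 0-31, model_free(0, 17) clears only bits 0-16, and a second model_alloc(17) then fails because bits 17-31 are still set.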

Is this device, by any chance, sharing its requester-id with another
device? By being behind a bridge of some sort? There is some code to
deal with it, but I'm not sure it has ever been verified in anger...

It's an RC iEP and there should be no requester-id sharing:

root@ubuntu:/home/john# lspci -s 74:02.0 -v
74:02.0 Serial Attached SCSI controller: Huawei Technologies Co., Ltd. HiSilicon SAS 3.0 HBA (rev 20)
Flags: bus master, fast devsel, latency 0, IRQ 23, NUMA node 0
Memory at a2000000 (32-bit, non-prefetchable) [size=32K]
Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [80] MSI: Enable+ Count=32/32 Maskable+ 64bit+
Capabilities: [b0] Power Management version 3
Kernel driver in use: hisi_sas_v3_hw


Where the fault lies, I can't say. Maybe the kernel msi code should
always give a power-of-2 number of vectors - as I understand, the PCI
spec mandates this. Or maybe the GIC ITS driver has a problem in the
free path, as above. Or maybe the PCI driver should not be allowed to
request a non-power-of-2 min/max number of vectors.

Opinion?

My hunch is that it is an ITS driver bug: the PCI layer is allowed to
give any number of MSIs to an endpoint driver, as long as they match the
requirements of the allocation for Multi-MSI.

I would tend to say that, but isn't the requirement to allocate a power-of-2 number of MSI vectors, and that doesn't seem to be enforced in the kernel msi layer?
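
For reference, the power-of-2 constraint comes from the Multiple Message Enable field of the MSI capability, which holds log2 of the vector count (0 to 5, i.e. 1 to 32 vectors). A small sketch of that encoding, with hypothetical helper names msi_order_for() and msi_hw_vectors():

```c
#include <assert.h>

/* Hypothetical helper: smallest order such that (1 << order) >= nvec. */
static unsigned int msi_order_for(unsigned int nvec)
{
    unsigned int order = 0;

    assert(nvec >= 1 && nvec <= 32);   /* Multi-MSI tops out at 32 */

    while ((1u << order) < nvec)
        order++;
    return order;
}

/* Number of vectors the device ends up enabled for. */
static unsigned int msi_hw_vectors(unsigned int nvec)
{
    return 1u << msi_order_for(nvec);
}
```

So a request that resolves to 17 vectors can only be expressed to the device as order 5, i.e. 32 vectors, even if the driver is told it has 17.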

That's the responsibility
of the ITS driver. If unbind/bind fails, it means that somehow we've
missed the freeing of the LPIs, which isn't good.

Is the device common enough that I can try and reproduce the issue?

No, it's integrated into the hi1620 SoC found in the D06 dev board only, but I don't think that there is anything special about this HW.

If
there's a Linux driver somewhere, I can always hack something in
emulation and find out...

Ok, the interrupt allocation for this particular driver in this test is in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c#n2393

Cheers,
John


Thanks,

M.