Hi John,
On 03/09/2019 15:09, John Garry wrote:
Hi Marc, Bjorn, Thomas,
We've come across a conflict with the kernel/pci msi code and GIC ITS
driver on our arm64 system, whereby we can't unbind and re-bind a PCI
device driver under special conditions. I'll explain...
Our PCI device supports 32 MSIs. The driver attempts to allocate MSI
vectors with min msi = 17, max msi = 32, and affd.pre_vectors = 16. For
our test we make nr_cpus = 1 (just anything less than 16).
Just to confirm: this PCI device is requiring Multi-MSI, right? As
opposed to MSI-X?
We find that the pci/kernel msi code gives us 17 vectors, but the GIC
ITS code reserves 32 lpi maps in its_irq_domain_alloc(). The problem
then occurs when unbinding the driver in its_irq_domain_free() call,
where we only clear bits for 17 vectors. So if we unbind the driver and
then attempt to bind again, it fails.
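For illustration, the mismatch described above can be modelled in a few
lines of user-space C. This is only a toy sketch, not the ITS driver
code: struct lpi_map, lpi_alloc() and lpi_free() are hypothetical names,
and the real its_irq_domain_alloc()/its_irq_domain_free() paths are
considerably more involved. It assumes a single 32-bit bitmap, a block
reserved from bit 0, and a free path that is only told about the vectors
actually handed to the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-device LPI bitmap: one bit per LPI. */
struct lpi_map {
	uint32_t bits;
};

/* Round n (1..32) up to the next power of two. */
static unsigned int round_up_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Mask with the low 'count' bits set (count in 0..32). */
static uint32_t low_mask(unsigned int count)
{
	return count >= 32 ? UINT32_MAX : ((UINT32_C(1) << count) - 1);
}

/* Reserve a power-of-two block from bit 0; fail if any bit is in use. */
static bool lpi_alloc(struct lpi_map *map, unsigned int nvec)
{
	uint32_t mask = low_mask(round_up_pow2(nvec));

	if (map->bits & mask)
		return false;
	map->bits |= mask;
	return true;
}

/* Release exactly 'nvec' bits, i.e. only the vectors the driver was given. */
static void lpi_free(struct lpi_map *map, unsigned int nvec)
{
	map->bits &= ~low_mask(nvec);
}
```

In this model, allocating 17 vectors sets all 32 bits, freeing 17 leaves
the upper 15 bits set, and a second allocation of 17 then fails, which
matches the unbind/bind symptom.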
Is this device, by any chance, sharing its requester-id with another
device? By being behind a bridge of some sort? There is some code to
deal with it, but I'm not sure it has ever been verified in anger...
Where the fault lies, I can't say. Maybe the kernel msi code should
always give power of 2 vectors - as I understand, the PCI spec mandates
this. Or maybe the GIC ITS driver has a problem in the free path, as
above. Or maybe the PCI driver should not be allowed to request !power
of 2 min/max vectors.
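As a reminder of where the power-of-2 requirement comes from: the MSI
capability's Multiple Message Enable field is a 3-bit log2 encoding, so
only 1, 2, 4, 8, 16 or 32 vectors can be programmed. A minimal sketch of
that rounding follows; msi_mme_encoding() is a hypothetical helper name,
not a kernel function.

```c
/*
 * Return the log2 encoding (0..5) of the smallest power-of-two vector
 * count that covers nvec, or -1 if nvec is 0 or exceeds the 32-vector
 * limit of plain MSI.
 */
static int msi_mme_encoding(unsigned int nvec)
{
	int enc = 0;

	if (nvec == 0 || nvec > 32)
		return -1;
	while ((1u << enc) < nvec)
		enc++;
	return enc;
}
```

With this encoding, a request for 17 vectors maps to the same field
value as a request for 32.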
Opinion?
My hunch is that it is an ITS driver bug: the PCI layer is allowed to
give any number of MSIs to an endpoint driver, as long as they match the
requirements of the allocation for Multi-MSI.
[...] of the ITS driver. If unbind/bind fails, it means that somehow we've
missed the freeing of the LPIs, which isn't good.
Is the device common enough that I can try and reproduce the issue? If
there's a Linux driver somewhere, I can always hack something in
emulation and find out...
Thanks,
M.