Re: arm64 iommu groups issue
From: John Garry
Date: Thu Feb 13 2020 - 10:50:08 EST
The underlying issue is that, for historical reasons, OF/IORT-based
IOMMU drivers have ended up with group allocation being tied to endpoint
driver probing via the dma_configure() mechanism (long story short,
driver probe is the only thing which can be delayed in order to wait for
a specific IOMMU instance to be ready). However, in the meantime, the
IOMMU API internals have evolved sufficiently that I think there's a way
to really put things right - I have the spark of an idea which I'll try
to sketch out ASAP...
OK, great.
Hi Robin,
I was wondering if you have had a chance to consider this problem again?
One simple idea could be to introduce a device link between the endpoint
device and its parent bridge to ensure that they probe in order, as
expected in pci_device_group():
Subject: [PATCH] PCI: Add device link to ensure endpoint device driver
probes after bridge
It is required to ensure that a device driver for an endpoint will probe
after the parent port driver, so add a device link for this.
---
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 512cb4312ddd..4b832ad25b20 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2383,6 +2383,7 @@ static void pci_set_msi_domain(struct pci_dev *dev)
 void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 {
 	int ret;
+	struct device *parent;
 
 	pci_configure_device(dev);
 
@@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	/* Set up MSI IRQ domain */
 	pci_set_msi_domain(dev);
 
+	parent = dev->dev.parent;
+	if (parent && parent->bus == &pci_bus_type)
+		device_link_add(&dev->dev, parent, DL_FLAG_AUTOPROBE_CONSUMER);
+
 	/* Notifier could use PCI capabilities */
 	dev->match_driver = false;
 	ret = device_add(&dev->dev);
--
This would work, but the problem is that if the port driver fails to
probe for any reason - and not just with -EPROBE_DEFER - then the child
device will never probe. This very thing happens on my dev board.
However, we could expand the device links API to cover this sort of
scenario.
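To make the failure mode concrete, here is a toy userspace model of the
ordering constraint. It is only a sketch and not kernel code: every
toy_* name is hypothetical, and the "supplier" pointer merely stands in
for the link that device_link_add() would create in the patch above. A
consumer is held back until its supplier has bound, so a supplier whose
probe fails outright leaves the consumer unprobed forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model of probe ordering; all toy_* names are hypothetical. */
struct toy_dev {
	const char *name;
	struct toy_dev *supplier;		/* device we are linked to, or NULL */
	int (*probe)(struct toy_dev *dev);	/* returns 0 or a negative errno */
	bool bound;				/* probe succeeded */
	bool failed;				/* probe failed permanently */
};

/* Probe routine that always succeeds. */
int toy_probe_ok(struct toy_dev *dev)
{
	(void)dev;
	return 0;
}

/* Probe routine that always fails with a non-deferral error. */
int toy_probe_fail(struct toy_dev *dev)
{
	(void)dev;
	return -ENODEV;
}

/* Try each unbound device once; return how many were newly bound. */
int toy_probe_pass(struct toy_dev **devs, size_t n)
{
	int newly_bound = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct toy_dev *dev = devs[i];

		if (dev->bound || dev->failed)
			continue;
		/* A consumer is held back until its supplier has bound. */
		if (dev->supplier && !dev->supplier->bound)
			continue;
		if (dev->probe(dev) == 0) {
			dev->bound = true;
			newly_bound++;
		} else {
			dev->failed = true;
		}
	}
	return newly_bound;
}

/* Repeat passes until no further device binds. */
void toy_probe_all(struct toy_dev **devs, size_t n)
{
	assert(devs != NULL);
	while (toy_probe_pass(devs, n) > 0)
		;
}
```

With a "port" device using toy_probe_fail() and an "endpoint" whose
supplier is that port, toy_probe_all() ends with neither device bound,
and the endpoint's probe routine is never even called. Swapping the port
over to toy_probe_ok() lets both bind, whatever order they appear in the
array.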
As for alternatives, it looks pretty difficult to me to disassociate the
group allocation from the dma_configure path.
Let me know.
Thanks,
John