On Mon, Jun 13, 2022 at 07:07:02AM -0600, Jeffrey Hugo wrote:
On 6/12/2022 7:48 PM, Qiang Yu wrote:
On 6/10/2022 10:00 PM, Jeffrey Hugo wrote:
On 6/9/2022 9:21 PM, Qiang Yu wrote:
On 6/9/2022 9:54 PM, Jeffrey Hugo wrote:
On 6/9/2022 7:43 AM, Qiang Yu wrote:
EP tends to read the MSI address/data once and cache them after BME is
set, so the host should avoid changing the MSI address/data after BME is
set. In the PCI reset function, the host invokes free_irq(), which also
clears the MSI address/data in the EP's PCIe config space. If the
invalid address/data are cached and used by the EP, an MSI triggered by
the EP would not be received by the host, because an invalid MSI data
value is sent to an invalid MSI address.
To fix this issue, after the host runs request_irq() successfully during
MHI driver probe, let's invoke enable_irq()/disable_irq() instead of
request_irq()/free_irq() when we want to power MHI up or down.
Meanwhile, the host should invoke free_irq() when the MHI host driver
is removed.
I don't think this works for hotplug, nor for cases where there are
multiple MHI devices on the system.
The EP shouldn't be caching this information, for multiple reasons:
masking the MSIs, disabling the MSIs, the address changing when the
affinity changes, etc.
It really feels like we are solving the problem in the wrong place.
Right now, this gets a NACK from me.
After free_irq(), MSI is still enabled but the MSI address and data are
cleared, so there is a chance that the device initiates an MSI using the
zero address. How do we fix this race condition?
On what system is MSI still enabled? I just removed the AIC100
controller on a random x86 system, and lspci indicates MSIs are
disabled -
Capabilities: [50] MSI: Enable- Count=32/32 Maskable+ 64bit+
system: Ubuntu18.04, 5.4.0-89-generic, Intel(R) Core(TM) i7-6700 CPU @
3.40GHz
After removing the MHI driver, I also see that MSI enable is cleared,
but I don't think free_irq() clears it. I added logs before and after
free_irq(), as the following shows:
[62777.625111] msi cap before free irq
[62777.625125] msi control=0x1bb, address=0xfee00318, data=0x0
[62777.625301] msi cap after free irq
[62777.625313] msi control=0x1bb, address=0x0, data=0x0
[62777.625496] mhi-pci-generic 0000:01:00.0: mhi_pci_remove end of line,
block 90 secs.
# lspci -vvs 01:00.0
Capabilities: [50] MSI: Enable+ Count=8/32 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: ffffffff Pending: 00000000
At this point, the MSI functionality is still enabled, but every MSI is
masked out (Masking), so per the PCIe spec, the endpoint may not trigger
an MSI to the host. The device advertises that it supports maskable MSIs
(Maskable+), so this is appropriate.
If your device can still send an MSI at this point, then it violates the
PCIe spec.
disable_irq() will not help you with this, as it will do the same thing.
I still think you are trying to fix an issue in the wrong location (host vs
EP), and causing additional issues by doing so.
Irrespective of the caching of the MSI data in the endpoint, I'd like to
get rid of request_irq()/free_irq() during the mhi_{power_down/power_up}
calls. Like the MHI endpoint stack, we should just do disable/enable
irq, because the MHI device may go down several times while running and
we do not want to deallocate the IRQs every time. And if the device gets
removed, ultimately the MHI driver will get removed and we are fine when
loading it back (even if the MSI count changes).
I haven't had time to look into the patch in detail, but I'm in favour
of accepting the proposal.
@Jeff: Any specific issue you are seeing with hotplug etc...?