Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
From: Benjamin Herrenschmidt
Date: Wed Jan 09 2019 - 02:41:26 EST
On Wed, 2019-01-09 at 15:53 +1100, Alexey Kardashevskiy wrote:
> "A PCI completion timeout occurred for an outstanding PCI-E transaction"
> it is.
>
> This is how I bind the device to vfio:
>
> echo vfio-pci > '/sys/bus/pci/devices/0000:01:00.0/driver_override'
> echo vfio-pci > '/sys/bus/pci/devices/0000:01:00.1/driver_override'
> echo '0000:01:00.0' > '/sys/bus/pci/devices/0000:01:00.0/driver/unbind'
> echo '0000:01:00.1' > '/sys/bus/pci/devices/0000:01:00.1/driver/unbind'
> echo '0000:01:00.0' > /sys/bus/pci/drivers/vfio-pci/bind
> echo '0000:01:00.1' > /sys/bus/pci/drivers/vfio-pci/bind
>
>
> and I noticed that EEH only happens with the last command. The order
> (.0,.1 or .1,.0) does not matter; it seems that putting one function into
> D3 is fine, but putting the other one into D3 while the first is already
> there produces EEH. And I do not recall ever seeing this on the firestone
> machine. Weird.

Putting all functions into D3 is what allows the device to actually go
into D3.
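
To make that concrete, here is a rough sketch (not part of the patch) of
how one could check it from userspace. It relies on the PowerState field
being bits 1:0 of each function's PMCSR. The helper names are made up for
illustration, and the setpci lines in the trailing comment assume pciutils
is installed and that 01:00.0 and 01:00.1 are the two functions from the
example above.

```shell
#!/bin/sh
# Decode the PowerState field (bits 1:0) of a PMCSR value.
# Hypothetical helper, named for this sketch only.
pmcsr_to_dstate() {
    case $(( $1 & 3 )) in
        0) echo D0 ;;
        1) echo D1 ;;
        2) echo D2 ;;
        3) echo D3hot ;;
    esac
}

# Succeed only if every PMCSR value passed in reports D3hot, i.e. the
# condition under which the device as a whole can actually enter D3.
all_functions_in_d3() {
    [ "$#" -gt 0 ] || return 1
    for pmcsr in "$@"; do
        [ "$(pmcsr_to_dstate "$pmcsr")" = "D3hot" ] || return 1
    done
    return 0
}

# Possible usage on real hardware (assumption: run as root, pciutils present):
#   f0="0x$(setpci -s 01:00.0 CAP_PM+4.w)"
#   f1="0x$(setpci -s 01:00.1 CAP_PM+4.w)"
#   all_functions_in_d3 "$f0" "$f1" && echo "both functions in D3hot"
```

Running the check after each bind step should show whether the EEH lines up
with the moment the second function reaches D3hot.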

Does it work with other devices? We do have that bug on early P9
revisions where the attempt to bring the link to L1 as part of the
D3 process fails in horrible ways. I thought P8 would be OK, but maybe
not ...

Otherwise, it might be that our timeouts are too low (you may want to
talk to our PCIe guys internally).

Cheers,
Ben.