Re: [PATCH] PCI: aardvark: Implement re-issuing config requests on CRS response
From: Pali Rohár
Date: Wed Sep 22 2021 - 06:17:44 EST
On Thursday 16 September 2021 09:32:57 Bjorn Helgaas wrote:
> On Wed, Sep 15, 2021 at 12:55:53PM +0200, Pali Rohár wrote:
> > On Tuesday 14 September 2021 15:55:26 Bjorn Helgaas wrote:
>
> > > It is illegal for a device to return CRS after it has returned a
> > > successful completion unless an intervening reset has occurred, so
> > > drivers and other code should never see it.
> > >
> > > > And issue is there also with write requests. Is somebody checking return
> > > > value of pci_bus_write_config function?
> > >
> > > Similar case here. The enumeration and wait-after-reset paths always
> > > do *reads* until we get a successful completion, so I don't think we
> > > ever issue a write that can get CRS.
> >
> > Yes, in normal conditions we should not see it.
> >
> > But for testing purposes (that emulated bridge works fine) I'm using
> > setpci for changing some configuration.
> >
> > And via setpci it is possible to turn off CRSSVE bit in which case then
> > Root Complex should re-issue request again.
> >
> > I'm not sure how "legal" it is if userspace / setpci changes some of
> > these bits. At least on a hardware with a real Root Port device it
> > should be fully transparent. As hardware handles this re-issue and
> > kernel then would see (reissued) response.
>
> If setpci changes bits like these, all bets are off. We can't tell
> what happened, so we can't rely on any configuration Linux did. I
> think we really should taint the kernel when this happens.
For testing purposes, setpci is still a very good tool.
> > Test case: Initialize device, then unbind it from sysfs, reset it (hot
> > reset or warm reset) and then rescan / reinit it again. Here device is
> > permitted to send CRS response.
> >
> > We know that more PCIe cards are buggy and sometimes firmware on cards
> > crashes or resets card logic. Which may put card into initialization
> > state when it is again permitted to send CRS response.
>
> Yep. That's a buggy device and normally we would work around it with
> a quirk. This particular kind of bug would be hard to work around,
> but a host bridge driver doesn't seem like the right place to do it
> because we'd have to do it in *every* such driver.
This described firmware crashing & card reset logic I saw in more wifi
cards. Sometimes even wifi drivers itself detects that card does not
respond and do some its own internal card reset (e.g. iwldvm on laptop).
So it very common situation.
But I have not seen that these cards on laptop issue CRS response. Maybe
because their firmware or PCIe logic bootup too fast (so there is a very
little window for CRS response) or because CRS response sent to OS did
not cause any issue.
So no particular workaround is needed for above described scenario.
But anyway, in case that in future there would be need for disabling CRS
feature in kernel (e.g. for doing some workaround for endpoint or
extended pcie switch) then this re-issuing of config request on CRS
response in pci-aardvark.c would be needed to have similar behavior like
real HW hen CRS is disabled.
And I like the idea if driver is "feature complete" and prepared also
for other _valid_ code paths. This is just my opinion.