Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

From: Andre Przywara
Date: Fri Dec 13 2019 - 09:39:43 EST


On Thu, 12 Dec 2019 21:07:24 +0000
Andrew Murray <andrew.murray@xxxxxxx> wrote:

Hi,

> On Tue, Dec 10, 2019 at 08:41:15AM -0600, Bjorn Helgaas wrote:
> > On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:

[ ... ]

> > Even ECAM compliance is not really minor -- if this controller were
> > fully compliant with the spec, you would need ZERO Linux changes to
> > support it. Every quirk like this means additional maintenance
> > burden, and it's not just a one-time thing. It means old kernels that
> > *should* "just work" on your system will not work unless somebody
> > backports the quirk.
>
> With regards to URs resulting in unwanted aborts or similar - this seems
> to be a very common theme amongst ARM PCI controller drivers. For example
> both ARM32 imx6 and ARM32 keystone have fault handlers to handle an abort
> and fabricate a 0xffffffff read value.
>
> The ARM32 rcar driver, whilst it doesn't appear to produce an abort, does
> read the PCI_STATUS register after making a config read to determine if
> any aborts have happened - in which case it reports
> PCIBIOS_DEVICE_NOT_FOUND.
>
> And as recently reported [1], the rockchip driver also appears to produce
> aborts.
>
> I suspect that this ARM64 controller driver won't be the last either. Thus
> any solution here may form the basis of copy-cat solutions for subsequent
> controllers.

Well, I think Bjorn is aware of them, but was actually hoping that those broken controllers would go away at some point ;-)
And just to make this clear: I would categorise this issue as an integration bug that can't easily be fixed in hardware or firmware. It was never meant to be this way, so I am not sure we should promote generic solutions here.

> From my understanding of the issues, the ARM64 serrors are imprecise and
> as a result there isn't a sensible way of using them to determine that a
> read is a UR. So where there are no other solutions to suppress the
> generation of an abort by the controller, the only solutions that seem to
> exist are 1) pre-scan the devices in firmware and only talk to those devices
> in Linux - a safe option but limiting - perhaps with side effects for CRS
> and 2) the approach rcar takes in using the PCI_STATUS register - though
> you'd end up having to mask the serror (PSTATE.A) for a limited period of
> time - a risky option (you'll miss real serrors) - but with no side effects.
>
> (I don't know if option 2 is feasible in this case by the way).

Interesting, we might evaluate this, but mostly out of curiosity or for debugging. I don't think it's really a better option.
If there is a safe way of making this work in the majority of cases, that should be the way to go. Masking SErrors by setting PSTATE.A sounds quite wacky to me.
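
For reference, the rcar-style check from option 2 could be sketched roughly as below. This is a user-space mock, not kernel code: struct mock_rc, mock_hw_cfg_read() and checked_cfg_read() are made-up names for illustration, and only the PCI_STATUS_* and PCIBIOS_* values mirror the kernel headers. It does not model the SError at all, which is the actual problem on this platform, so it only shows the "look at the status bits after the read" part.

```c
#include <stdint.h>

/* Values as in the kernel's pci_regs.h / pci.h */
#define PCI_STATUS_REC_TARGET_ABORT  0x1000
#define PCI_STATUS_REC_MASTER_ABORT  0x2000
#define PCIBIOS_SUCCESSFUL           0x00
#define PCIBIOS_DEVICE_NOT_FOUND     0x86

/* Hypothetical stand-in for a root complex (illustration only). */
struct mock_rc {
	uint16_t status;        /* mock of the root port's PCI_STATUS */
	uint32_t present_mask;  /* bit n set: device n exists on bus 0 */
};

/* Mock hardware access: a missing device latches "received master abort". */
static uint32_t mock_hw_cfg_read(struct mock_rc *rc, unsigned int dev)
{
	if (dev < 32 && (rc->present_mask & (1u << dev)))
		return 0x12345678;  /* arbitrary register content */

	rc->status |= PCI_STATUS_REC_MASTER_ABORT;
	return 0;                   /* whatever the bus returned */
}

/* Config read that inspects the abort bits afterwards. */
static int checked_cfg_read(struct mock_rc *rc, unsigned int dev,
			    uint32_t *val)
{
	const uint16_t aborts = PCI_STATUS_REC_MASTER_ABORT |
				PCI_STATUS_REC_TARGET_ABORT;

	/* Drop stale abort bits (write-1-to-clear on real hardware). */
	rc->status &= (uint16_t)~aborts;

	*val = mock_hw_cfg_read(rc, dev);

	if (rc->status & aborts) {
		rc->status &= (uint16_t)~aborts;
		*val = 0xffffffff;  /* fabricate the all-ones value */
		return PCIBIOS_DEVICE_NOT_FOUND;
	}

	return PCIBIOS_SUCCESSFUL;
}
```

With present_mask = 1 << 3, reading device 3 returns PCIBIOS_SUCCESSFUL and reading device 4 returns PCIBIOS_DEVICE_NOT_FOUND with *val set to all ones. On real hardware the window between the read and the status check is exactly where the SError would have to be masked.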

Thanks,
Andre.

> [1] https://lore.kernel.org/linux-pci/2a381384-9d47-a7e2-679c-780950cd862d@xxxxxxxxxxxxxx/2-0001-WFT-PCI-rockchip-play-game-with-unsupported-request-.patch
>
> Thanks,
>
> Andrew Murray
>
> >
> > > This allows the Arm Neoverse N1SDP board to boot Linux without crashing
> > > and to access *any* devices (there are no platform devices except UART).