Re: [PATCHv9 00/12] PCI: Recode Mobiveil driver and add PCIe Gen4 driver for NXP Layerscape SoCs

From: Laurentiu Tudor
Date: Tue Feb 11 2020 - 10:14:51 EST




On 11.02.2020 16:48, Olof Johansson wrote:
On Tue, Feb 11, 2020 at 5:04 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:

On 2020-02-11 12:13 pm, Laurentiu Tudor wrote:
[...]
This is a known issue about DPAA2 MC bus not working well with SMMU
based IO mapping. Adding Laurentiu to the chain who has been looking
into this issue.

Yes, I'm closely following the issue. I actually have a workaround
(attached) but haven't submitted it, as it will probably raise a lot of
eyebrows. In the meantime I'm following some discussions [1][2][3] on
the iommu list which seem to try to tackle what appears to be a similar
issue but with framebuffers. My hope is that we will be able to leverage
whatever turns out.

Indeed it's more general than framebuffers - in fact there was a
specific requirement from the IORT side to accommodate network/storage
controllers with in-memory firmware/configuration data/whatever set up
by the bootloader that want to be handed off 'live' to Linux because the
overhead of stopping and restarting them is impractical. Thus this DPAA2
setup is very much within scope of the desired solution, so please feel
free to join in (particularly on the DT parts) :)

That's a real problem that needs a solution, but that's not what's
happening here, since cold boots work fine.

Isn't it a whole lot more likely that something isn't
reset/reinitialized properly in u-boot, such that there is lingering
state in the setup, causing this?

OK, so this is something else entirely. I don't think our U-Boot builds are designed to run any other way than from a hard reset.

As for right now, note that your patch would only be a partial
mitigation to slightly reduce the fault window but not remove it
entirely. To be robust the SMMU driver *has* to know about live streams
before the first arm_smmu_reset() - hence the need for generic firmware
bindings - so doing anything from the MC driver is already too late (and
indeed the current iommu_request_dm_for_dev() mechanism is itself a
microcosm of the same problem).

This is more likely a live stream that's left behind from the previous
kernel (there are some error messages about being unable to detach
domains, but the errors make it hard to tell what driver didn't unbind
enough).

I also noticed those messages. Perhaps our PCI driver doesn't do all the required cleanup.

*BUT*, even with that bug, the system should reboot reliably and come
up clean. So, something isn't clearing up the state *on boot*.

We do test some kexec-based "soft-reset" scenarios. We didn't hit your issue, but we did hit this:

https://lkml.org/lkml/2018/9/21/1066

Can you please provide some more information about your scenario?

---
Best Regards, Laurentiu