Re: [PATCH] thunderbolt: Stop using iommu_present()

From: Mika Westerberg
Date: Thu Mar 17 2022 - 10:23:45 EST


Hi Robin,

On Thu, Mar 17, 2022 at 01:42:56PM +0000, Robin Murphy wrote:
> On 2022-03-17 08:08, Mika Westerberg wrote:
> > Hi Robin,
> >
> > On Wed, Mar 16, 2022 at 07:17:57PM +0000, Robin Murphy wrote:
> > > The feeling I'm getting from all this is that if we've got as far as
> > > iommu_dma_protection_show() then it's really too late to meaningfully
> > > mitigate bad firmware.
> >
> > Note, these are requirements from Microsoft in order for the system to
> > use the "Kernel DMA protection". Because of this, likelyhood of "bad
> > firmware" should be quite low since these systems ship with Windows
> > installed so they should get at least some soft of validation that this
> > actually works.
> >
> > > We should be able to detect missing
> > > untrusted/external-facing properties as early as nhi_probe(), and if we
> > > could go into "continue at your own risk" mode right then *before* anything
> > > else happens, it all becomes a lot easier to reason about.
> >
> > I think what we want is that the DMAR opt-in bit is set in the ACPI
> > tables and that we know the full IOMMU translation is happening for the
> > devices behind "external facing ports". If that's not the case the
> > iommu_dma_protection_show() should return 0 meaning the userspace can
> > ask the user whether the connected device is allowed to use DMA (e.g
> > PCIe is tunneled or not).
>
> Ah, if it's safe to just say "no protection" in the case that we don't know
> for sure, that's even better. Clearly I hadn't quite grasped that aspect of
> the usage model, thanks for the nudge!

There is some documentation here too, hope it is helpful:

https://docs.kernel.org/admin-guide/thunderbolt.html

> > We do check for the DMAR bit in the Intel IOMMU code and we also do
> > check that there actually are PCIe ports marked external facing but we
> > could issue warning there if that's not the case. Similarly if the user
> > explicitly disabled the IOMMU translation. This can be done inside a new
> > IOMMU API that does something like the below pseudo-code:
> >
> > #if IOMMU_ENABLED
> > bool iommu_dma_protected(struct device *dev)
> > {
> > if (dmar_platform_optin() /* or the AMD equivalent */) {
> > if (!iommu_present(...)) /* whatever is needed to check that the full translation is enabled */
> > dev_warn(dev, "IOMMU protection disabled!");
> > /*
> > * Look for the external facing ports. Should be at
> > * least 1 or issue warning.
> > */
> > ...
> >
> > return true;
> > }
> >
> > return false;
> > }
> > #else
> > static inline bool iommu_dma_protected(struct device *dev)
> > {
> > return false;
> > }
> > #endif
> >
> > Then we can make iommu_dma_protection_show() to call this function.
>
> The problem that I've been trying to nail down here is that
> dmar_platform_optin() really doesn't mean much for us - I don't know how
> Windows' IOMMU drivers work, but there's every chance it's not the same way
> as ours. The only material effect that dmar_platform_optin() has for us is
> to prevent the user from disabling the IOMMU driver altogether, and thus
> ensure that iommu_present() is true. Whether or not we can actually trust
> the IOMMU driver to provide reliable protection depends entirely on whether
> it knows the PCIe ports are external-facing. If not, we can only
> *definitely* know what the IOMMU driver will do for a given endpoint once
> that endpoint has appeared behind the port and iommu_probe_device() has
> decided what its default domain should be, and as far as I now understand,
> that's not an option for Thunderbolt since it can only happen *after* the
> tunnel has been authorised and created.

That's correct. We do know the PCIe root/downstream ports (the external
facing ones) that host the tunneled PCIe topology but rest will appear
dynamically after the connection manager established the protocol
tunnel.

> Much as I'm tempted to de-scope back to my IOMMU API cleanup and run away
> from the rest of the issue, I think I can crib enough from the existing code
> to attempt a reasonable complete fix, so let me give that a go...

Sure ;-)