Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

From: Robin Murphy
Date: Tue Apr 23 2024 - 07:27:14 EST


On 16/04/2024 4:29 pm, Jason Gunthorpe wrote:
On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
precedence over whatever a driver's def_domain_type may have wanted to
say. This was intentionally handled in core code since 3 years prior,
to avoid drivers poking at the details of what is essentially a policy
between the PCI core and the IOMMU core. Now, though, we go to the
length of evaluating both constraints to check for any conflict, and if
so throw our toys out of the pram and refuse to handle the device at
all. Regardless of any intent, in practice this leaves the device, and
potentially the rest of its group or even the whole IOMMU, in a largely
undetermined state, which at worst may render the whole system
unusable.

For systems supporting untrusted device security the translation must
be BLOCKED at this point.

So why is that not enforced then? Simply hoping that calling ops->release_device() or returning an error from iommu_device_register() might do the right thing is no guarantee. Furthermore I'm pretty sure we're still letting an untrusted device be hotplugged into an existing group without any checks at all.

Meanwhile if we go back to letting untrusted take priority and attach the device to an empty DMA domain, oh hey look it's blocked, and what's more we also retain control of the IOMMU to keep it that way. Problem solved. You intentionally changed the code to be less effective at what you say it should be doing, so once again I'm left a little puzzled as to what point you're trying to argue here.

Unfortunately it turns out that this is a realistic situation to run
into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
an identity default domain for PASIDs to be usable, and thus sets a
def_domain_type override based on PASID capability.

The majority of places implementing def_domain_type are using it as a
statement of HW capability that should not be ignored by the core code:

- DART
* system page size is too small, can't map IOPTEs, force identity

Not a hardware limitation at all, it supports paging just fine, and iommu-dma *could* in principle work with larger pages with a bit of effort (some proof-of-concept patches were posted some time ago). This is entirely the driver hacking around the legacy of iommu-dma and core code not expecting the situation and thus not detecting it or handling it gracefully. In fact I should be able to sort that out relatively soon, once my other stuff lands and I'm able to start pulling iommu_dma_init_domain() apart.

* iommu does not support IDENTITY at all, force paging
- tegra: Device quirks mean paging and DMA API doesn't work

Hmm, I don't recall any specific device concerns. What I do know is that historically this driver has always wanted to prevent the ARM DMA domain or older versions of iommu-dma (prior to iommu_get_dma_domain()) from getting in the way of what the host1x drivers expect, so if there is any actual hardware policy here, it's very much implicitly hanging on the coat-tails of software policy.

- amd: The driver can't support PAGING when in SNP mode
- SMMU: The driver can't support paging when in legacy binding mode and
paging domain allocation fails as well

Quite obviously nothing to do with hardware. That's there solely because using the old binding prevents us from reasonably being able to order binding the IOMMU driver against the client drivers, so we can't risk having iommu-dma come up and change the DMA ops underneath an already-active device.

- qcom: Looks like these devices have some iommu bypass bus in their
HW and paging doesn't work

I believe that is true for some of the modem stuff, however again there's also plenty of software policy in there too; some of it the same thing about default domains getting in the way of how the GPU/display drivers want to use dma-direct for cache maintenance underneath their own unmanaged domains, one is completely abusing the domain type in a devious hack which just happens to make some behaviour fall out of the rest of the driver to satisfy an unrelated hardware/firmware constraint (which is still *supposed* to be only a temporary fix pending a proper solution, ha...)

- SMMUv3: The comment says HISI devices cannot support paging due to a HW quirk

For these force overriding the driver knowledge will either result in
domain allocate/attach failure or a broken DMA environment anyhow.

The AMD PASID thing is actually unique because the driver can still
fully support PAGING, despite it wrongly telling the core code that it
can't.

This is happening because it is all just a hack to work around the
incomplete SW implementation in the AMD driver.

Not really, it's shown up because the core code took an illogical step backwards. Without significant surgery to IOMMU and/or driver core code, the best and most reasonable thing to do at the moment is still the one which happens to work out OK for the current AMD behaviour. Yes, if we *had* done all the work to make the core code super-robust then the issue could have been the GPU driver gracefully failing rather than wedging the whole system, and the remaining blame could then be fairly laid on amd_iommu_def_domain_type(), but that is not the case and cannot be claimed to be.

Per our API expectations, a driver's def_domain_type can go one of three ways; do nothing, request IOMMU_DOMAIN_DMA, or request IOMMU_DOMAIN_IDENTITY. For the first two it's clear that untrusted forcing IOMMU_DOMAIN_DMA without even asking is perfectly fine. In the third case, we still definitely want untrusted to take precedence, since our policy is that the device not having access to things it shouldn't is more important that it having to access things it should. Sure, in the almost-impossible case that we did have an external device which genuinely couldn't handle translation, then the user may allow a driver to bind, which ends up not working and causing IOMMU faults, however that would be equally true of any kind of blocking notion we could implement right now, so there's no appreciable loss of functionality. However, the overwhelming fact is that the few devices genuinely requiring IDENTITY overrides are on-chip integrated things, any of which suddenly turning up on external ports would be highly suspect, so that's actually a big argument in *favour* of not believing def_domain_type either in such cases.

In fact the more I think about it, I could swear I've had this discussion and worked through this reasoning before, possibly back when we first introduced it?

When the AMD driver is
completed its def_domain_type should be removed entirely.

Since actual PASID AMD attach isn't implemented yet we could just
remove that check from def_domain_type as an RC fix. Vasant can sort
it out properly by disabling pasid support on untrusted devices until
the DTE logic is fully completed.

In general, restoring the old behaviour of forcing translation will not
make that device's operation any more broken than leaving it potentially
blocked or subject to the rest of a group's translations would, nor will
it be any less safe than leaving it potentially bypassed or subject to
the rest of a group's translations would, so do that, and let eGPUs work
again.

Well, this is true, since we can't handle the probe error it doesn't
matter what we do.

OK, so after all that you do in fact agree? In that case, why are we still mucking about proposing hacks on top of hacks in the AMD driver rather than just fixing the regression sensibly?

Thanks,
Robin.