Re: [REGRESSION 5.19.x] AMD HD-audio devices missing on 5.19

From: Jason Gunthorpe
Date: Tue Aug 23 2022 - 20:03:10 EST


On Tue, Aug 23, 2022 at 10:01:57PM +0100, Robin Murphy wrote:

> > diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
> > index 696d5555be5794..6a1f02c62dffcc 100644
> > --- a/drivers/iommu/amd/iommu_v2.c
> > +++ b/drivers/iommu/amd/iommu_v2.c
> > @@ -777,6 +777,8 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
> > if (dev_state->domain == NULL)
> > goto out_free_states;
> > + /* See iommu_is_default_domain() */
> > + dev_state->domain->type = IOMMU_DOMAIN_IDENTITY;
> > amd_iommu_domain_direct_map(dev_state->domain);
>
> Same question as 6 months ago, apparently: allocating an unmanaged domain
> with a pagetable then sucking out the pagetable is silly enough, but if
> we're going to then also call it a proper identity domain, we should really
> just allocate an identity domain directly; but then why not just enable_v2
> on the identity domain that we know is already there courtesy of
> def_domain_type?

Yeah, nobody who knows this code answered that question either..

Looking at it a bit, I think this comment will start to be a problem:

/*
* Save us all sanity checks whether devices already in the
* domain support IOMMUv2. Just force that the domain has no
* devices attached when it is switched into IOMMUv2 mode.
*/
ret = -EBUSY;
if (domain->dev_cnt > 0 || domain->flags & PD_IOMMUV2_MASK)
goto out;

Beacuse we should have dev_cnt != 0 on the existing identity domain at
this point - worse if the probe order is backwards the sound driver
may even already be running when we reach this.

Plus the challenge of undoing it when the PASID user goes away.

Overall I can see how it is easier and more logical to transition
between two domains. We already have good infrastructure for doing
that.

>From a core perspective I don't have a real problem with iommu drivers
using multiple iommu_domains to manage their internal operations, eg
for different operating modes. But you are right that it should be
cleaner and directly allocate the special domains it needs. This would
be much more self-descriptive if it called a function 'allocate v2
identity domain', for instance.

I think it would also make sense for the core to provide some API to
change the default domain (ie dma API domain) of a group, and that
would be a more logical, and self explanatory, API for iommu drivers
to use than attach/detach. ie:

iommu_change_default_domain(group, amd_identity_domain_v2):
iommu_change_default_domain(group, amd_identity_domain_v1):

At least for this effort I wanted something simple enough to backport
that maybe doesn't need to be an expert in the amd iommu to write..
[I checked some more and the hack to change the type looks like it is
OK on the free path, so maybe this even works]

My general hope is that we can convince AMD to work on this once the
generic PASID & PRI series lands, as this entire private path to the
GPU driver and non-standard PASID handling all needs to be aligned
with the upcoming core code. When doing that work it would make sense
to tidy and modernize this better. I added a bunch of AMD people to
this thread to that end. It sure would be good if AMD participated in
that series since they are going to have to use it too.

https://lore.kernel.org/linux-iommu/20220817012024.3251276-1-baolu.lu@xxxxxxxxxxxxxxx/

Regards,
Jason