Re: [REGRESSION 5.19.x] AMD HD-audio devices missing on 5.19
From: Takashi Iwai
Date: Tue Aug 23 2022 - 02:06:16 EST
On Tue, 23 Aug 2022 03:00:21 +0200,
Jason Gunthorpe wrote:
>
> On Mon, Aug 22, 2022 at 04:12:59PM +0200, Takashi Iwai wrote:
> > Hi,
> >
> > we've received regression reports about the missing HD-audio devices
> > on AMD platforms, and this turned out to be caused by the commit
> > 512881eacfa72c2136b27b9934b7b27504a9efc2
> > bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management
> >
> > The details are found in openSUSE bugzilla:
> > https://bugzilla.suse.com/show_bug.cgi?id=1202492
> >
> > The problem seems to be that the HD-audio PCI devices (both onboard
> > analog and HDMI) are assigned to the same IOMMU group as the AMD
> > graphics PCI device, and once AMDGPU has been initialized, those
> > audio devices can't be probed because iommu_device_use_default_domain()
> > returns -EBUSY.
>
> Can you describe exactly what drivers are involved in this? If it is
> the above commit then several devices are sharing an iommu group and
> one of them (well, the only one already attached, I suppose) has made
> the group unsharable.
>
> With grep I don't see an obvious place where the AMDGPU driver would
> mess with the iommu configuration, so I have no guess.
I don't have a concrete clue either :)
At least, drivers/gpu/drm/amd/amdkfd/kfd_iommu.c calls
amd_iommu_init_device(), which invokes iommu_attach_group(), and that
may change group->domain. But that's just a wild guess; it might well
be something else.
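A minimal userspace model of the ownership check may make the failure mode easier to follow. It assumes the 5.19 logic of iommu_device_use_default_domain() and the driver_managed_dma test in the bus dma_configure path are roughly as below. The model_* types and the use_default_domain()/probe_dma_configure() helpers are hypothetical stand-ins, not kernel APIs, and the real code takes group->mutex and handles group refcounting around the check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structs. */
struct model_domain {
	const char *name;
};

struct model_group {
	struct model_domain *domain;         /* domain currently attached */
	struct model_domain *default_domain; /* default DMA API domain */
	void *owner;                         /* set by an explicit DMA ownership claim */
	unsigned int owner_cnt;              /* number of users of the group */
};

/*
 * Models iommu_device_use_default_domain(): no group (e.g. IOMMU
 * disabled) means nothing to check; the first user always succeeds;
 * later users fail with -EBUSY if the group is no longer on its default
 * domain or has an explicit owner.
 */
static int use_default_domain(struct model_group *g)
{
	if (g == NULL)
		return 0;

	if (g->owner_cnt &&
	    (g->domain != g->default_domain || g->owner != NULL))
		return -EBUSY;

	g->owner_cnt++;
	return 0;
}

/*
 * Models the bus dma_configure step at probe time: drivers that set
 * driver_managed_dma skip the default-domain check entirely.
 */
static int probe_dma_configure(struct model_group *g, bool driver_managed_dma)
{
	if (driver_managed_dma)
		return 0;
	return use_default_domain(g);
}
```

In the model, the first device in the group (e.g. the GPU) passes because owner_cnt is still zero. Once something has moved group->domain away from default_domain, every later probe in that group without driver_managed_dma gets -EBUSY, while a NULL group (IOMMU off) or driver_managed_dma bypasses the check, matching the two workarounds reported in this thread.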
> It would be good to have some debugging to confirm whether it is
> group->owner (which should be impossible; if it is, that suggests
> memory corruption) or group->domain != group->default_domain.
>
> Most likely it is the latter, but I can't see how that could happen on
> a system like this. There is no obvious manipulation in AMDGPU, for
> instance.
>
> So we need debugging that captures the backtrace at exactly the point
> where group->domain != group->default_domain occurs for the
> troubled group.
OK, will try to build a test kernel with some debug prints and ask the
reporters. It may take some time.
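One possible shape for such debug prints, sketched in userspace: route every write to group->domain through a single helper and report whenever the watched group moves off its default domain. In an actual kernel debug patch the report would be a WARN_ON() so that the backtrace identifies the caller. The dbg_* types, set_group_domain() and its watched/caller parameters are hypothetical, not existing kernel APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical simplified stand-ins for iommu_domain / iommu_group. */
struct dbg_domain {
	const char *name;
};

struct dbg_group {
	const char *name;                  /* group id, as listed under /sys/kernel/iommu_groups */
	struct dbg_domain *domain;         /* domain currently attached */
	struct dbg_domain *default_domain; /* default DMA API domain */
};

/* Number of reports emitted so far. */
static unsigned int domain_change_reports;

/*
 * Hypothetical helper: all writes to g->domain go through here.
 * When the watched group is switched to a non-default domain, log it
 * together with the caller; a kernel patch would use WARN_ON() instead
 * to get a full backtrace.
 */
static void set_group_domain(struct dbg_group *g, struct dbg_domain *d,
			     const char *watched, const char *caller)
{
	assert(g != NULL && watched != NULL);
	if (strcmp(g->name, watched) == 0 && d != g->default_domain) {
		fprintf(stderr, "group %s: domain %s -> %s (caller: %s)\n",
			g->name, g->domain ? g->domain->name : "(none)",
			d ? d->name : "(none)", caller);
		domain_change_reports++;
	}
	g->domain = d;
}
```

Passing __func__ as the caller argument gives a poor man's backtrace in this userspace sketch; in the kernel, WARN_ON() makes that argument unnecessary.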
> If you know the group name, it would be easy enough to cook up a patch
> that throws a WARN_ON when group->domain changes.
>
> > domain assignment. In any case, disabling the IOMMU works around the
> > problem, and passing the driver_managed_dma flag to the HD-audio driver
> > was also confirmed to work around it.
>
> Disabling the IOMMU removes the groups entirely, which disables the check.
>
> driver_managed_dma disables the check entirely, which raises the
> question of how the driver is even able to work.
>
> If the domain is not the default_domain it is very surprising that DMA
> can work at all. Since it does, something really odd has happened.
Yeah, it's something odd ;)
I'm not sure whether people actually tested the HD-audio functionality,
though; if they had, they might have noticed a problem.
Takashi