Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

From: Vasant Hegde
Date: Tue Apr 16 2024 - 06:54:03 EST


Hi Eric,


On 4/16/2024 7:42 AM, Eric Wagner wrote:
On Mon, Apr 15, 2024 at 5:44 PM Robin Murphy <robin.murphy@xxxxxxx <mailto:robin.murphy@xxxxxxx>> wrote:

As a first step I'd test the quick hack below, but be prepared for things
to still break slightly differently.

Cheers,
Robin.

----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 996e79dc582d..063e1eb32fbd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1774,7 +1774,7 @@ static int iommu_get_default_domain_type(struct
iommu_group *group,
                                untrusted,
                                "Device is not trusted, but driver is
overriding group %u to %s,
refusing to probe.\n",
                                group->id, iommu_domain_type_str(driver_type));
-                       return -1;
+                       //return -1;
                }
                driver_type = IOMMU_DOMAIN_DMA;
        }

This worked and got the system booting for me.

Thanks for testing.

IIUC eGPU is behind Thunderbolt and hence IOMMU treated it as 'untrusted device'.

AMD driver tries to allocate "IDENTITY" domain for GPU devices. iommu_get_default_domain_type() return -1 as it expects IOMMU_DOMAIN_DMA for untrusted device.

Can you please attach lspci -vvv output? Also /proc/cmdline output.



On Mon, Apr 15, 2024 at 8:39 PM Jason Gunthorpe <jgg@xxxxxxxx <mailto:jgg@xxxxxxxx>> wrote:

On Mon, Apr 15, 2024 at 10:44:34PM +0100, Robin Murphy wrote:
> On 2024-04-15 7:57 pm, Eric Wagner wrote:
> > Apologies if I made a mistake in the first bisect, I'm new to kernel
> > debugging.
> >
> > I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
> > 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were
good.
> > Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> > as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
> > bisect log is attached. It ended up at the same commit as before.
> >
> > I've also attached a picture of the boot screen that occurs when it hangs.
> > 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
> > problem.
>
> Looks like 59ddce4418da483 probably broke things most - prior to that, the
> fact that it's behind a Thunderbolt port would have always taken precedence
> and forced IOMMU_DOMAIN_DMA regardless of what the driver may have wanted to
> saywhereas now we ask the driver first, then complain that it conflicts
> with the untrusted status and ultimately don't configure the IOMMU
> at all.

Yes, if the driver wants to force a domain type it should be
forced. Driver knows best. Eg AMD forces IDENTITY when the HW/driver
is incapable of supporting otherwise.


@Jason,

Looks like before commit 59ddce4418da483, core IOMMU layer was enforcing 'IOMMU_DOMAIN_DMA' for untrusted device. If its trusted device then it was letting HW IOMMU driver to pick best domain type.

Did we change that flow? Are we expecting HW IOMMU driver to handler untrusted devices as well?


> Meanwhile the GPU driver presumably goes on to believe it's using dma-direct
> with no IOMMU present, resulting in fireworks when its traffic reaches the
> IOMMU. Great :(

I wonder where is the missing error handling.. iommu probe failure
should not go on to allow driver attach, there is no guarentee any DMA
works now that many iommus are booting up in BLOCKED.


Looking into code path, in failure path we cleanup device, not group. May be that's causing issue? Not sure where it failed. If I manage to find some system I will try to debug further.




> However the other notable thing that also happened between 6.6 and 6.7 was
> the removal of the AMD iommu_v2 code, so there's some possibility that the
> GPU driver still may have only been working before due to that also

Most likely it is the above change interacting with this patch when
they are both combined in the merge:

commit 92e2bd56a5f9fc44313fda802a43a63cc2a9c8f6
Author: Vasant Hegde <vasant.hegde@xxxxxxx <mailto:vasant.hegde@xxxxxxx>>
Date:   Thu Sep 21 09:21:45 2023 +0000

    iommu/amd: Introduce iommu_dev_data.flags to track device capabilities

@@ -2471,7 +2481,7 @@ static int amd_iommu_def_domain_type(struct device *dev)
         *    and require remapping.
         *  - SNP is enabled, because it prohibits DTE[Mode]=0.
         */
-       if (dev_data->iommu_v2 &&
+       if (pdev_pasid_supported(dev_data) &&
            !cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
            !amd_iommu_snp_en) {
                return IOMMU_DOMAIN_IDENTITY;

Which, IIRC, was intended to be temporary to work around limitations
in the DTE programming logic within the driver. Previously iommu_v2 as
a module option that Eric probably doesn't set, I guess.

The below will probably make it boot, but Vasant should check what
happens if PASID is eventually attached too.

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index d35c1b8c8e65ce..f3da6a5b6cb1cb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2758,11 +2758,16 @@ static int amd_iommu_def_domain_type(struct device *dev)
         *    and require remapping.
         *  - SNP is enabled, because it prohibits DTE[Mode]=0.
         */
-       if (pdev_pasid_supported(dev_data) &&
-           !cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
-           !amd_iommu_snp_en) {
+       if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT) && !amd_iommu_snp_en)
+               return IOMMU_DOMAIN_IDENTITY;
+
+       /*
+        * For now driver limitations prevent enabling PASID as a paging domain
+        * on the RID together.
+        */
+       if (dev_is_pci(dev) && !to_pci_dev(dev)->untrusted &&
+           pdev_pasid_supported(dev_data))
                return IOMMU_DOMAIN_IDENTITY;
-       }

        return 0;
 }

Jason


As it booted ok with Robin's patch above, these changes to drivers/iommu/amd/iommu.c didn't seem to make a difference for me. I was testing with amd iommu v2 off in the kernel config and I also have TSME enabled in the BIOS if that matters.

TMSE is transparent to OS. So its fine.

-Vasant