Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

From: Vasant Hegde
Date: Thu Feb 16 2023 - 00:37:28 EST


Hi Jason,


On 2/16/2023 6:14 AM, Jason Gunthorpe wrote:
> On Wed, Feb 15, 2023 at 07:35:45PM -0500, Felix Kuehling wrote:
>>
>> If I understand this correctly, the HW or the BIOS is doing something wrong
>> about reporting ACS. I don't know what the GPU driver can do other than add
>> some quirk to stop using AMD IOMMUv2 on this HW/BIOS.
>
> How about this:
>
> diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
> index 864e4ffb6aa94e..cc027ce9a6e86f 100644
> --- a/drivers/iommu/amd/iommu_v2.c
> +++ b/drivers/iommu/amd/iommu_v2.c
> @@ -732,6 +732,7 @@ EXPORT_SYMBOL(amd_iommu_unbind_pasid);
>
> int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
> {
> + struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev);
> struct device_state *dev_state;
> struct iommu_group *group;
> unsigned long flags;
> @@ -740,6 +741,9 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
>
> might_sleep();
>
> + if (!dev_data->ats.enabled)
> + return -EINVAL;
> +

Thanks for the proposed fix. But aactually this will not solve the issue because
current flow is :
- in this function it tries to allocate new domain
- Calls iommu_attach_group() which will call attach_device. In that path
it will try to enable ATS/PASID and hitting error.

As I mentioned in other reply I think even current code returns error from
amd_iommu_init_device() to GPU. But the issue is, in __iommu_attach_group() path
it detached device from current domain, failed to attach to new domain and
returned error. We didn't put the device back to old domain thats causing the
issue. Below series should fix this issue.

https://lore.kernel.org/linux-iommu/20230215052642.6016-1-vasant.hegde@xxxxxxx/

-Vasant