Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

From: Suthikulpanit, Suravee
Date: Wed Jan 16 2019 - 09:09:05 EST


Joerg,

On 1/16/19 8:26 PM, joro@xxxxxxxxxx wrote:
> On Wed, Jan 16, 2019 at 04:16:25AM +0000, Suthikulpanit, Suravee wrote:
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> index 525659b88ade..ab31ba75da1b 100644
>> --- a/drivers/iommu/amd_iommu.c
>> +++ b/drivers/iommu/amd_iommu.c
>> @@ -1248,7 +1248,13 @@ static void __domain_flush_pages(struct protection_domain *domain,
>> build_inv_iommu_pages(&cmd, address, size, domain->id, pde);
>>
>> for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
>> - if (!domain->dev_iommu[i])
>> + /*
>> + * The dev_cnt is zero when all devices are detached
>> + * from the domain. This is the case when VFIO detaches
>> + * all devices from the group before flushing IOMMU pages.
>> + * So, always issue the flush command.
>> + */
>> + if (domain->dev_cnt && !domain->dev_iommu[i])
>> continue;
>
> This doesn't look like the correct fix. We still miss the flush if we
> detach the last device from the domain.

Actually, I am not sure how we would miss the flush for the last device.
In my test, I see the flush command being issued correctly during
vfio_unmap_unpin(), which runs after all devices are detached.
I might be missing your point here, though. Could you please elaborate?
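
For illustration only, the difference between the current check and the
patched one can be modeled in user space. This is a sketch, not kernel code:
struct model_domain, MODEL_NUM_IOMMUS and both helper names are hypothetical,
and the model only counts how many IOMMUs would receive the flush command.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary value; stands in for amd_iommu_get_num_iommus(). */
#define MODEL_NUM_IOMMUS 4

/* Hypothetical stand-in for struct protection_domain: counters only. */
struct model_domain {
	unsigned int dev_cnt;                     /* devices attached to the domain */
	unsigned int dev_iommu[MODEL_NUM_IOMMUS]; /* devices attached per IOMMU */
};

/* Current check: skip every IOMMU that has no device in this domain. */
static unsigned int flushes_with_current_check(const struct model_domain *d)
{
	unsigned int i, queued = 0;

	assert(d != NULL);
	for (i = 0; i < MODEL_NUM_IOMMUS; ++i) {
		if (!d->dev_iommu[i])
			continue;
		++queued;	/* the flush command would be queued here */
	}
	return queued;
}

/* Patched check: once dev_cnt is zero, no IOMMU is skipped. */
static unsigned int flushes_with_patched_check(const struct model_domain *d)
{
	unsigned int i, queued = 0;

	assert(d != NULL);
	for (i = 0; i < MODEL_NUM_IOMMUS; ++i) {
		if (d->dev_cnt && !d->dev_iommu[i])
			continue;
		++queued;	/* the flush command would be queued here */
	}
	return queued;
}
```

With every counter at zero (all devices detached), the current check queues
nothing, while the patched check queues one command per IOMMU. With devices
still attached, both checks behave the same.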

> How about the attached diff? If
> I understand the problem correctly, it should fix the problem more
> reliably.
>
> Thanks,
>
> Joerg
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 87ba23a75b38..dc1e2a8a19d7 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1991,25 +1991,36 @@ static void do_attach(struct iommu_dev_data *dev_data,
>
> static void do_detach(struct iommu_dev_data *dev_data)
> {
> + struct protection_domain *domain = dev_data->domain;
> struct amd_iommu *iommu;
> u16 alias;
>
> iommu = amd_iommu_rlookup_table[dev_data->devid];
> alias = dev_data->alias;
>
> - /* decrease reference counters */
> - dev_data->domain->dev_iommu[iommu->index] -= 1;
> - dev_data->domain->dev_cnt -= 1;
> -
> /* Update data structures */
> dev_data->domain = NULL;
> list_del(&dev_data->list);
> - clear_dte_entry(dev_data->devid);
> - if (alias != dev_data->devid)
> - clear_dte_entry(alias);
>
> + clear_dte_entry(dev_data->devid);
> /* Flush the DTE entry */
> device_flush_dte(dev_data);
> +
> + if (alias != dev_data->devid) {
> + clear_dte_entry(alias);
> + /* Flush the Alias DTE entry */
> + device_flush_dte(alias);
> + }
> +
> + /* Flush IOTLB */
> + domain_flush_tlb_pde(domain);
> +
> + /* Wait for the flushes to finish */
> + domain_flush_complete(domain);
> +
> + /* decrease reference counters - needs to happen after the flushes */
> + domain->dev_iommu[iommu->index] -= 1;
> + domain->dev_cnt -= 1;
> }

I have also considered this, and it would also work. But since we already
flush pages during page unmapping later on, after all devices are detached,
would this be necessary? Please see vfio_iommu_type1_detach_group().

Also, consider the case where more than one device shares the domain.
This would issue a page flush every time we detach a device, even while
other devices are still attached to the domain.
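
As a rough illustration of that cost, here is a user-space sketch (not kernel
code) that counts domain-wide flushes for a domain shared by several devices.
struct model_shared_domain and the three helper names are hypothetical, and
the model assumes exactly one flush per detach versus one flush at unmap time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a domain shared by several devices. */
struct model_shared_domain {
	unsigned int dev_cnt;	/* devices still attached */
	unsigned int flushes;	/* domain-wide flushes issued so far */
};

/* Detach one device and flush the whole domain as part of the detach. */
static void model_detach_with_flush(struct model_shared_domain *d)
{
	assert(d != NULL && d->dev_cnt > 0);
	d->flushes += 1;	/* stands in for the domain-wide IOTLB flush */
	d->dev_cnt -= 1;
}

/* Detach every device, flushing on each detach; returns total flushes. */
static unsigned int model_flushes_per_detach(unsigned int ndevs)
{
	struct model_shared_domain d = { ndevs, 0 };

	while (d.dev_cnt)
		model_detach_with_flush(&d);
	return d.flushes;
}

/* Detach every device without flushing, then flush once at unmap time. */
static unsigned int model_flushes_at_unmap(unsigned int ndevs)
{
	struct model_shared_domain d = { ndevs, 0 };

	while (d.dev_cnt)
		d.dev_cnt -= 1;
	d.flushes += 1;		/* single flush once pages are unmapped */
	return d.flushes;
}
```

In this model, three devices sharing a domain cost three domain-wide flushes
when flushing on every detach, against a single flush at unmap time.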

Regards,
Suravee