Re: [PATCH 1/1] iommu/vt-d: Fix missed device TLB cache tag

From: Baolu Lu
Date: Thu Jun 20 2024 - 02:27:50 EST


On 2024/6/20 13:54, Vasant Hegde wrote:
On 6/20/2024 8:43 AM, Baolu Lu wrote:
On 6/20/24 11:04 AM, Tian, Kevin wrote:
From: Baolu Lu<baolu.lu@xxxxxxxxxxxxxxx>
Sent: Thursday, June 20, 2024 8:50 AM

On 6/20/24 12:46 AM, Jason Gunthorpe wrote:
On Wed, Jun 19, 2024 at 09:53:45AM +0800, Lu Baolu wrote:
When a domain is attached to a device, the required cache tags are
assigned to the domain so that the related caches could be flushed
whenever it is needed. The device TLB cache tag is created selectively
by checking the ats_enabled field of the device's iommu data. This
creates an ordered dependency between attach and ATS enabling paths.

The device TLB cache tag will not be created if device's ATS is enabled
after the domain attachment. This causes some devices, for example
intel_vpu, to malfunction.
What? How is this even possible?

ATS is controlled exclusively by the iommu driver, how can it be
changed without the driver knowing??
Yes. ATS is currently controlled exclusively by the iommu driver. The
intel iommu driver enables PCI/ATS on the probe path after the default
domain is attached. That means when the default domain is attached to
the device, the ats_supported is set, but ats_enabled is cleared. So the
cache tag for the device TLB won't be created.
I don't quite get why this is specific to the probe path and the default
domain.
The issue is with the domain attaching device path, not specific to the
probe or default domain.

dmar_domain_attach_device()
{
    cache_tag_assign_domain();
    //setup pasid entry for pt/1st/2nd
    iommu_enable_pci_caps();
}

seems that for all domain attaches above is coded in a wrong order
as ats is enabled after the cache tag is assigned.
Yes, exactly. But simply changing the order isn't future-proof,
considering ATS control will eventually be moved out of iommu drivers.
[Unrelated to this patch]

You mean ATS setup will be moved to individual device driver? Is there any
reason for that?

Not exactly to individual device drivers, but it should be out of the
iommu drivers.

https://lore.kernel.org/linux-iommu/BL1PR12MB51441FC4303BD0442EDB7A9CF7FFA@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Best regards,
baolu