RE: [PATCH 1/1] iommu/vt-d: Fix missed device TLB cache tag

From: Tian, Kevin
Date: Wed Jun 19 2024 - 23:04:33 EST


> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Thursday, June 20, 2024 8:50 AM
>
> On 6/20/24 12:46 AM, Jason Gunthorpe wrote:
> > On Wed, Jun 19, 2024 at 09:53:45AM +0800, Lu Baolu wrote:
> >> When a domain is attached to a device, the required cache tags are
> >> assigned to the domain so that the related caches could be flushed
> >> whenever it is needed. The device TLB cache tag is created selectively
> >> by checking the ats_enabled field of the device's iommu data. This
> >> creates an ordered dependency between attach and ATS enabling paths.
> >>
> >> The device TLB cache tag will not be created if device's ATS is enabled
> >> after the domain attachment. This causes some devices, for example
> >> intel_vpu, to malfunction.
> > What? How is this even possible?
> >
> > ATS is controlled exclusively by the iommu driver, how can it be
> > changed without the driver knowing??
>
> Yes. ATS is currently controlled exclusively by the iommu driver. The
> intel iommu driver enables PCI/ATS on the probe path after the default
> domain is attached. That means when the default domain is attached to
> the device, the ats_supported is set, but ats_enabled is cleared. So the
> cache tag for the device TLB won't be created.

I don't quite get why this is specific to the probe path and the default
domain.

dmar_domain_attach_device()
{
	cache_tag_assign_domain();
	// setup pasid entry for pt/1st/2nd
	iommu_enable_pci_caps();
}

It seems the code above is in the wrong order for all domain attaches,
since ATS is enabled after the cache tag is assigned. Why is this
considered to affect only some devices, e.g. intel_vpu?
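
To make the ordering concern concrete, here is a toy userspace model in
C. It is only a sketch: every toy_* name is made up for illustration and
none of it is the driver's real structures or functions. It assumes, as
the commit message says, that the device TLB tag is created only when
ats_enabled is already set at tag-assignment time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-device state (not the kernel's struct). */
struct toy_dev {
	bool ats_supported;
	bool ats_enabled;
};

/* Hypothetical per-domain record of which cache tags exist. */
struct toy_domain {
	bool has_iotlb_tag;
	bool has_devtlb_tag;
};

/* Assumed behaviour: the device TLB tag is created only if ATS is
 * already enabled when the tags are assigned. */
static void toy_cache_tag_assign(struct toy_domain *dom,
				 const struct toy_dev *dev)
{
	dom->has_iotlb_tag = true;
	dom->has_devtlb_tag = dev->ats_enabled;
}

/* Stand-in for enabling PCI capabilities: turns ATS on if supported. */
static void toy_enable_pci_caps(struct toy_dev *dev)
{
	if (dev->ats_supported)
		dev->ats_enabled = true;
}

/* Same order as the attach sequence sketched above: tags, then caps. */
static void toy_attach_tag_first(struct toy_domain *dom, struct toy_dev *dev)
{
	toy_cache_tag_assign(dom, dev);
	toy_enable_pci_caps(dev);
}

/* Reversed order: caps first, then tags. */
static void toy_attach_caps_first(struct toy_domain *dom, struct toy_dev *dev)
{
	toy_enable_pci_caps(dev);
	toy_cache_tag_assign(dom, dev);
}

/* Runs both orders on an ATS-capable device and checks the outcome. */
void toy_demo(void)
{
	struct toy_dev a = { .ats_supported = true, .ats_enabled = false };
	struct toy_dev b = { .ats_supported = true, .ats_enabled = false };
	struct toy_domain da = { false, false };
	struct toy_domain db = { false, false };

	toy_attach_tag_first(&da, &a);
	toy_attach_caps_first(&db, &b);

	assert(a.ats_enabled && !da.has_devtlb_tag); /* tag missed */
	assert(b.ats_enabled && db.has_devtlb_tag);  /* tag present */
}
```

In this model any ATS-capable device misses the device TLB tag with the
tag-first order, which is why the question above is about all attaches
and not only the probe path.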

>
> A possible solution is to move ATS enabling to a place before the
> default domain attachment. However, this is not future-proof,
> considering that we will eventually hand over the ATS control to the
> device drivers. Therefore, this fix creates the device TLB cache tags as
> long as ats_supported is true and relies on ats_enabled to decide
> whether device TLB needs to be invalidated.
>
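
For reference, the approach described in the quoted paragraph could be
modeled roughly as below. Again this is a toy userspace sketch with
made-up toy_* names, not the actual patch: it assumes the tag is created
whenever ats_supported is true and that ats_enabled is consulted only at
invalidation time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-device state (not the kernel's struct). */
struct toy_dev {
	bool ats_supported;
	bool ats_enabled;
	int devtlb_flushes;	/* counts device TLB invalidations issued */
};

/* Hypothetical domain with a single attached device. */
struct toy_domain {
	struct toy_dev *dev;
	bool has_devtlb_tag;
};

/* Tag creation depends on capability only, not on the current
 * ATS enable state. */
static void toy_assign_tags(struct toy_domain *dom, struct toy_dev *dev)
{
	dom->dev = dev;
	dom->has_devtlb_tag = dev->ats_supported;
}

/* The enable state is checked at flush time instead. */
static void toy_flush(struct toy_domain *dom)
{
	if (dom->has_devtlb_tag && dom->dev->ats_enabled)
		dom->dev->devtlb_flushes++;
}

/* Attach with ATS off, enable ATS later, and check both flushes. */
void toy_fix_demo(void)
{
	struct toy_dev dev = { .ats_supported = true, .ats_enabled = false,
			       .devtlb_flushes = 0 };
	struct toy_domain dom = { 0 };

	toy_assign_tags(&dom, &dev);
	toy_flush(&dom);
	assert(dev.devtlb_flushes == 0);	/* ATS off: nothing to flush */

	dev.ats_enabled = true;			/* enabled after attach */
	toy_flush(&dom);
	assert(dev.devtlb_flushes == 1);	/* tag already in place */
}
```

With this shape the attach order no longer matters in the model, at the
cost of keeping a tag around for devices whose ATS is never enabled.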