Re: [PATCH 02/12] iommu/vt-d: Add cache tag invalidation helpers

From: Baolu Lu
Date: Sun Apr 07 2024 - 01:34:17 EST


On 3/28/24 3:39 PM, Tian, Kevin wrote:
From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
Sent: Monday, March 25, 2024 10:17 AM

+
+static unsigned long calculate_psi_aligned_address(unsigned long start,
+						   unsigned long end,
+						   unsigned long *_pages,
+						   unsigned long *_mask)
+{
+	unsigned long pages = (end - start + VTD_PAGE_SIZE - 1) >> VTD_PAGE_SHIFT;

this doesn't sound correct. You'd want to follow aligned_nrpages().

Okay, I will make it like below.

unsigned long pages = aligned_nrpages(start, end - start + 1);
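For reference, the in-tree helper (quoting from memory of
drivers/iommu/intel/iommu.c, so treat it as a sketch) folds the page
offset of 'start' into the size before rounding up:

	static inline unsigned long aligned_nrpages(unsigned long host_addr,
						    size_t size)
	{
		/* Keep only the offset within the page. */
		host_addr &= ~PAGE_MASK;

		return PAGE_ALIGN(host_addr + size) >> VTD_PAGE_SHIFT;
	}

The open-coded expression above misses that offset: for start = 0xfff
and end = 0x1000 the range spans two pages, but
(end - start + VTD_PAGE_SIZE - 1) >> VTD_PAGE_SHIFT yields 1.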


+	case CACHE_TAG_TYPE_DEVTLB:
+		if (tag->pasid == IOMMU_NO_PASID)
+			qi_flush_dev_iotlb(iommu, sid, info->pfsid,
+					   info->ats_qdep, addr, mask);
+		else
+			qi_flush_dev_iotlb_pasid(iommu, sid, info->pfsid,
+						 tag->pasid, info->ats_qdep,
+						 addr, mask);
+
+		quirk_extra_dev_tlb_flush(info, addr, mask, tag->pasid,
+					  info->ats_qdep);
+		break;
+	case CACHE_TAG_TYPE_PARENT_DEVTLB:
+		/*
+		 * Address translation cache in device side caches the
+		 * result of nested translation. There is no easy way
+		 * to identify the exact set of nested translations
+		 * affected by a change in S2. So just flush the entire
+		 * device cache.
+		 */
+		qi_flush_dev_iotlb(iommu, sid, info->pfsid, info->ats_qdep,
+				   0, MAX_AGAW_PFN_WIDTH);
+		quirk_extra_dev_tlb_flush(info, 0, MAX_AGAW_PFN_WIDTH,
+					  IOMMU_NO_PASID, info->ats_qdep);
+		break;

The parent devtlb can have a pasid too, though that's not enabled now.
As these are core helpers, we can probably support it already, leading
to simpler code:

	case CACHE_TAG_TYPE_PARENT_DEVTLB:
		//change addr/mask
		//fallthrough
	case CACHE_TAG_TYPE_DEVTLB:
		//what this patch does

Yes. Yours is better. I will make it like below:

+	case CACHE_TAG_PARENT_DEVTLB:
+		/*
+		 * Address translation cache in device side caches the
+		 * result of nested translation. There is no easy way
+		 * to identify the exact set of nested translations
+		 * affected by a change in S2. So just flush the entire
+		 * device cache.
+		 */
+		addr = 0;
+		mask = MAX_AGAW_PFN_WIDTH;
+		fallthrough;
	case CACHE_TAG_DEVTLB:
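(The fallthrough; statement is the kernel's pseudo-keyword for an
intentional switch fall-through; it keeps -Wimplicit-fallthrough quiet.)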


+void cache_tag_flush_all(struct dmar_domain *domain)
+{
+	struct cache_tag *tag;
+	unsigned long flags;
+
+	spin_lock_irqsave(&domain->cache_lock, flags);
+	list_for_each_entry(tag, &domain->cache_tags, node) {
+		struct device_domain_info *info = dev_iommu_priv_get(tag->dev);
+		struct intel_iommu *iommu = tag->iommu;
+		u16 sid = PCI_DEVID(info->bus, info->devfn);
+
+		switch (tag->type) {
+		case CACHE_TAG_TYPE_IOTLB:
+		case CACHE_TAG_TYPE_PARENT_IOTLB:
+			if (domain->use_first_level)
+				qi_flush_piotlb(iommu, tag->domain_id,
+						tag->pasid, 0, -1, 0);
+			else
+				iommu->flush.flush_iotlb(iommu, tag->domain_id,
+							 0, 0, DMA_TLB_DSI_FLUSH);
+			break;
+		case CACHE_TAG_TYPE_DEVTLB:
+		case CACHE_TAG_TYPE_PARENT_DEVTLB:
+			qi_flush_dev_iotlb(iommu, sid, info->pfsid,
+					   info->ats_qdep, 0,
+					   MAX_AGAW_PFN_WIDTH);
+			quirk_extra_dev_tlb_flush(info, 0, MAX_AGAW_PFN_WIDTH,
+						  IOMMU_NO_PASID,
+						  info->ats_qdep);
+			break;
+		}
+	}

Could this loop be consolidated with the one in the previous helper?
Anyway, the only difference is the addr/mask...
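E.g., as a sketch (assuming the range helper from this series keeps a
cache_tag_flush_range(domain, start, end, ih) signature), flush_all
could become a thin wrapper:

	void cache_tag_flush_all(struct dmar_domain *domain)
	{
		/* Full address range; the range path would then pick
		 * DSI and full device-TLB invalidation for it. */
		cache_tag_flush_range(domain, 0, ULONG_MAX, 0);
	}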

+/*
+ * Invalidate a range of IOVA when IOMMU is in caching mode and new mappings
+ * are added to the target domain.
+ */
+void cache_tag_flush_cm_range(struct dmar_domain *domain, unsigned long start,
+			      unsigned long end)
+{

I'm also not sure why this is worth a separate helper. Why couldn't it
be managed by the previous flush_range()?

This is only my preference. I'd like to separate things belonging to
different paths, so that they are easier to maintain. For example, if,
in the future, we need to add or enhance something for a specific case,
we don't need to care about the other cases.


+	unsigned long pages, mask, addr;
+	struct cache_tag *tag;
+	unsigned long flags;
+
+	addr = calculate_psi_aligned_address(start, end, &pages, &mask);
+
+	spin_lock_irqsave(&domain->cache_lock, flags);
+	list_for_each_entry(tag, &domain->cache_tags, node) {
+		struct intel_iommu *iommu = tag->iommu;
+
+		/*
+		 * When IOMMU is enabled in caching mode some non-present
+		 * mappings may be cached by the IOTLB if it uses second-
+		 * only translation.
+		 */
+		if (!cap_caching_mode(iommu->cap) || domain->use_first_level) {
+			iommu_flush_write_buffer(iommu);
+			continue;
+		}

The comment really doesn't tell what this block does. Given its
intention, it's probably more suitable on the caller side, as today.

Yes, agreed. Will remove it.


+
+		if (tag->type == CACHE_TAG_TYPE_IOTLB ||
+		    tag->type == CACHE_TAG_TYPE_PARENT_IOTLB) {
+			/*
+			 * Fallback to domain selective flush if no
+			 * PSI support or the size is too big.
+			 */
+			if (!cap_pgsel_inv(iommu->cap) ||
+			    mask > cap_max_amask_val(iommu->cap))
+				iommu->flush.flush_iotlb(iommu, tag->domain_id,
+							 0, 0,
+							 DMA_TLB_DSI_FLUSH);
+			else
+				iommu->flush.flush_iotlb(iommu, tag->domain_id,
+							 addr, mask,
+							 DMA_TLB_PSI_FLUSH);
+		}
+	}

You skipped the devtlb invalidation. Yes, it's not necessary based on
the current nested translation design, and this part is inconsistent
across the different paths.

But since it's a semantics change, you may want to first remove the
devtlb invalidation in a separate patch and then do this cleanup, so
bisecting is easier.

This helper is designed for map paths when the IOMMU is in caching mode.
Caching mode doesn't impact the device TLB, so we should not invalidate
the device TLB here.

I guess the confusing thing is about the code below.


	/*
	 * In caching mode, changes of pages from non-present to present require
	 * flush. However, device IOTLB doesn't need to be flushed in this case.
	 */
	if (!cap_caching_mode(iommu->cap) || !map)
		iommu_flush_dev_iotlb(domain, addr, mask);

The real code doesn't match the comments, and I think the comments
describe the right thing. So perhaps the code should be changed to match
the comments.

	if (!map)
		iommu_flush_dev_iotlb(domain, addr, mask);

??

Best regards,
baolu