Re: [PATCH 2/8] iommu/vt-d: Add entry_sync support for PASID entry updates

From: Jason Gunthorpe

Date: Mon Mar 09 2026 - 09:41:44 EST


On Mon, Mar 09, 2026 at 02:06:42PM +0800, Lu Baolu wrote:
> +static void intel_pasid_get_used(const u128 *entry, u128 *used)
> +{
> +	struct pasid_entry *pe = (struct pasid_entry *)entry;
> +	struct pasid_entry *ue = (struct pasid_entry *)used;
> +	u16 pgtt;
> +
> +	/* Initialize used bits to 0. */
> +	memset(ue, 0, sizeof(*ue));
> +
> +	/* Present bit always matters. */
> +	ue->val[0] |= PASID_PTE_PRESENT;
> +
> +	/* Nothing more for non-present entries. */
> +	if (!(pe->val[0] & PASID_PTE_PRESENT))
> +		return;
> +
> +	pgtt = pasid_pte_get_pgtt(pe);
> +	switch (pgtt) {
> +	case PASID_ENTRY_PGTT_FL_ONLY:
> +		/* AW, PGTT */
> +		ue->val[0] |= GENMASK_ULL(4, 2) | GENMASK_ULL(8, 6);
> +		/* DID, PWSNP, PGSNP */
> +		ue->val[1] |= GENMASK_ULL(24, 23) | GENMASK_ULL(15, 0);
> +		/* FSPTPTR, FSPM */
> +		ue->val[2] |= GENMASK_ULL(63, 12) | GENMASK_ULL(3, 2);

This would be an excellent time to properly add these constants :(

/* 9.6 Scalable-Mode PASID Table Entry */
#define SM_PASID0_P BIT_U64(0)
#define SM_PASID0_FPD BIT_U64(1)
#define SM_PASID0_AW GENMASK_U64(4, 2)
#define SM_PASID0_SSEE BIT_U64(5)
#define SM_PASID0_PGTT GENMASK_U64(8, 6)
#define SM_PASID0_SSADE BIT_U64(9)
#define SM_PASID0_SSPTPTR GENMASK_U64(63, 12)

#define SM_PASID1_DID GENMASK_U64(15, 0)
#define SM_PASID1_PWSNP BIT_U64(23)
#define SM_PASID1_PGSNP BIT_U64(24)
#define SM_PASID1_CD BIT_U64(25)
#define SM_PASID1_EMTE BIT_U64(26)
#define SM_PASID1_PAT GENMASK_U64(63, 32)

#define SM_PASID2_SRE BIT_U64(0)
#define SM_PASID2_ERE BIT_U64(1)
#define SM_PASID2_FSPM GENMASK_U64(3, 2)
#define SM_PASID2_WPE BIT_U64(4)
#define SM_PASID2_NXE BIT_U64(5)
#define SM_PASID2_SMEP BIT_U64(6)
#define SM_PASID2_EAFE BIT_U64(7)
#define SM_PASID2_FSPTPTR GENMASK_U64(63, 12)

> +static void intel_pasid_sync(struct entry_sync_writer128 *writer)
> +{
> +	struct intel_pasid_writer *p_writer = container_of(writer,
> +			struct intel_pasid_writer, writer);
> +	struct intel_iommu *iommu = p_writer->iommu;
> +	struct device *dev = p_writer->dev;
> +	bool was_present, is_present;
> +	u32 pasid = p_writer->pasid;
> +	struct pasid_entry *pte;
> +	u16 old_did, old_pgtt;
> +
> +	pte = intel_pasid_get_entry(dev, pasid);
> +	was_present = p_writer->was_present;
> +	is_present = pasid_pte_is_present(pte);
> +	old_did = pasid_get_domain_id(&p_writer->orig_pte);
> +	old_pgtt = pasid_pte_get_pgtt(&p_writer->orig_pte);
> +
> +	/* Update the last present state: */
> +	p_writer->was_present = is_present;
> +
> +	if (!ecap_coherent(iommu->ecap))
> +		clflush_cache_range(pte, sizeof(*pte));
> +
> +	/* Sync for "P=0" to "P=1": */
> +	if (!was_present) {
> +		if (is_present)
> +			pasid_flush_caches(iommu, pte, pasid,
> +					   pasid_get_domain_id(pte));
> +
> +		return;
> +	}
> +
> +	/* Sync for "P=1" to "P=1": */
> +	if (is_present) {
> +		intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> +		return;
> +	}
> +
> +	/* Sync for "P=1" to "P=0": */
> +	pasid_cache_invalidation_with_pasid(iommu, old_did, pasid);

Why all this logic? All these different branches do is meddle with the
IOTLB, and none of that should be needed here.

If the sync is called it should just always call
pasid_cache_invalidation_with_pasid(), that's it.

Writer has already eliminated all cases where sync isn't needed.
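ie something along the lines of the sketch below (assuming the writer core
only invokes the sync callback when a committed change actually needs an
invalidation; the intel_pasid_writer layout is taken from the patch):

```c
static void intel_pasid_sync(struct entry_sync_writer128 *writer)
{
	struct intel_pasid_writer *p_writer = container_of(writer,
			struct intel_pasid_writer, writer);
	struct intel_iommu *iommu = p_writer->iommu;
	struct pasid_entry *pte;

	pte = intel_pasid_get_entry(p_writer->dev, p_writer->pasid);
	if (!ecap_coherent(iommu->ecap))
		clflush_cache_range(pte, sizeof(*pte));

	/* No P=0/P=1 case analysis, the writer already filtered no-op syncs. */
	pasid_cache_invalidation_with_pasid(iommu,
			pasid_get_domain_id(&p_writer->orig_pte),
			p_writer->pasid);
}
```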

> +	if (old_pgtt == PASID_ENTRY_PGTT_PT || old_pgtt == PASID_ENTRY_PGTT_FL_ONLY)
> +		qi_flush_piotlb(iommu, old_did, pasid, 0, -1, 0);
> +	else
> +		iommu->flush.flush_iotlb(iommu, old_did, 0, 0, DMA_TLB_DSI_FLUSH);
> +	devtlb_invalidation_with_pasid(iommu, dev, pasid);

The IOTLB should already be cleaned before the new entry using the
cache tag is programmed. Cleaning it after the entry is live is buggy.

The writer logic ensures it never sees a corrupted entry, so the clean
cache tag cannot be mangled during the writing process.

The way ARM is structured, the cache tags are clean if they are in the
allocator bitmap, so when the driver fetches a new tag and starts
using it, the tag is already clean and no cleaning is needed.

When it frees a tag it cleans it and then returns it to the allocator.

ATC invalidations should always be done after the PASID entry is
written. During a hitless update both translations are unpredictably
combined; this is unavoidable and OK.

Jason