Re: [PATCH 1/3] iommu/vt-d: Use 128-bit atomic updates for context entries
From: Baolu Lu
Date: Wed Jan 14 2026 - 00:14:36 EST
On 1/14/26 03:27, Dmytro Maluka wrote:
On Tue, Jan 13, 2026 at 11:00:46AM +0800, Lu Baolu wrote:
On Intel IOMMU, device context entries are accessed by hardware in
128-bit chunks. Currently, the driver updates these entries by
programming the 'lo' and 'hi' 64-bit fields individually.
This creates a potential race condition where the IOMMU hardware may fetch
a context entry while the CPU has only completed one of the two 64-bit
writes. This "torn" entry, consisting of half-old and half-new data,
could lead to unpredictable hardware behavior, especially when
transitioning the 'Present' bit or changing translation types.
To ensure the IOMMU hardware always observes a consistent state, use
128-bit atomic updates for context entries. This is achieved by building
context entries on the stack and writing them to the table in a single
operation.
As this relies on arch_cmpxchg128_local(), restrict INTEL_IOMMU
dependencies to X86_64.
Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
Reported-by: Dmytro Maluka <dmaluka@xxxxxxxxxxxx>
FWIW, Jason and Kevin contributed to this discovery more than I did. 🙂
Thanks to all you guys.
Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@xxxxxxxxxx/
Signed-off-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
---
drivers/iommu/intel/Kconfig | 2 +-
drivers/iommu/intel/iommu.h | 22 ++++++++++++++++++----
drivers/iommu/intel/iommu.c | 30 +++++++++++++++---------------
drivers/iommu/intel/pasid.c | 18 +++++++++---------
4 files changed, 43 insertions(+), 29 deletions(-)
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 5471f814e073..efda19820f95 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -11,7 +11,7 @@ config DMAR_DEBUG
config INTEL_IOMMU
bool "Support for Intel IOMMU using DMA Remapping Devices"
- depends on PCI_MSI && ACPI && X86
+ depends on PCI_MSI && ACPI && X86_64
select IOMMU_API
select GENERIC_PT
select IOMMU_PT
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 25c5e22096d4..b8999802f401 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -546,6 +546,16 @@ struct pasid_entry;
struct pasid_state_entry;
struct page_req_dsc;
+static __always_inline void intel_iommu_atomic128_set(u128 *ptr, u128 val)
+{
+ /*
+ * Use the cmpxchg16b instruction for 128-bit atomicity. As updates
+ * are serialized by a spinlock, we use the local (unlocked) variant
+ * to avoid unnecessary bus locking overhead.
+ */
+ arch_cmpxchg128_local(ptr, *ptr, val);
Any reason why not cmpxchg128_local()? (except following the AMD driver)
Yes. This follows the AMD IOMMU driver. Both drivers use a spinlock to
synchronize updates to table entries, so they only need the atomicity of
the 128-bit instruction itself. So arch_cmpxchg128_local() works.
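To illustrate the torn-entry window the commit message describes, here is
a rough userspace model in C. It is only a sketch: struct ctx_entry_model
and the helper names are made up for illustration and are not driver code,
and the plain struct assignment in whole_update() merely stands in for the
single 128-bit store (it is not atomic on real hardware).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit context entry as two 64-bit halves. */
struct ctx_entry_model {
	uint64_t lo;
	uint64_t hi;
};

static int entry_equal(struct ctx_entry_model a, struct ctx_entry_model b)
{
	return a.lo == b.lo && a.hi == b.hi;
}

/*
 * Model of the split update: 'lo' and 'hi' are stored one after the
 * other. The return value is what a hardware fetch would observe if it
 * landed between the two stores.
 */
static struct ctx_entry_model split_update_observed(struct ctx_entry_model *slot,
						    struct ctx_entry_model next)
{
	struct ctx_entry_model seen;

	slot->lo = next.lo;	/* first 64-bit store */
	seen = *slot;		/* simulated fetch: new 'lo', old 'hi' */
	slot->hi = next.hi;	/* second 64-bit store */
	return seen;
}

/*
 * Model of the whole-entry update: the entry is built elsewhere and
 * replaces the slot in one step. This assignment is a stand-in for the
 * 128-bit instruction and is NOT itself atomic.
 */
static void whole_update(struct ctx_entry_model *slot,
			 struct ctx_entry_model next)
{
	*slot = next;
}

/* An observed entry is torn if it matches neither the old nor the new one. */
static int is_torn(struct ctx_entry_model seen,
		   struct ctx_entry_model prev,
		   struct ctx_entry_model next)
{
	return !entry_equal(seen, prev) && !entry_equal(seen, next);
}
```

In this model, a fetch that lands between the two stores of
split_update_observed() returns an entry for which is_torn() is true, while
any state visible around whole_update() is either the old or the new entry.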
Otherwise,
Reviewed-by: Dmytro Maluka <dmaluka@xxxxxxxxxxxx>
Thanks,
baolu