Re: [PATCH 1/3] iommu/vt-d: Use 128-bit atomic updates for context entries

From: Baolu Lu

Date: Wed Jan 14 2026 - 21:26:14 EST


On 1/14/26 18:55, Dmytro Maluka wrote:
> On Wed, Jan 14, 2026 at 01:14:36PM +0800, Baolu Lu wrote:
>> On 1/14/26 03:27, Dmytro Maluka wrote:
>>> On Tue, Jan 13, 2026 at 11:00:46AM +0800, Lu Baolu wrote:
>>>> +static __always_inline void intel_iommu_atomic128_set(u128 *ptr, u128 val)
>>>> +{
>>>> +	/*
>>>> +	 * Use the cmpxchg16b instruction for 128-bit atomicity. As updates
>>>> +	 * are serialized by a spinlock, we use the local (unlocked) variant
>>>> +	 * to avoid unnecessary bus locking overhead.
>>>> +	 */
>>>> +	arch_cmpxchg128_local(ptr, *ptr, val);
>>> Any reason not to use cmpxchg128_local() (other than following the AMD driver)?
>>
>> Yes. This follows the AMD IOMMU driver. Both drivers use a spinlock to
>> synchronize updates to table entries. They only need the atomicity of
>> the 128-bit instruction itself, so arch_cmpxchg128_local() works.
>
> Yeah, but my question was merely: why use the raw arch_*() version
> rather than cmpxchg128_local(), which is the same but also includes
> optional kasan/kcsan instrumentation:
>
> #define cmpxchg128_local(ptr, ...) \
> ({ \
> 	typeof(ptr) __ai_ptr = (ptr); \
> 	instrument_atomic_read_write(__ai_ptr, sizeof(*__ai_ptr)); \
> 	raw_cmpxchg128_local(__ai_ptr, __VA_ARGS__); \
> })
>
> IOW, why bypass this instrumentation?
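The difference being asked about can be modelled in a few lines of user-space C. The instrumented wrapper reports the access to a checker and then delegates to the raw operation; calling the raw operation directly (as the patch does with the arch_ version) skips that report, so KASAN/KCSAN never sees the access. This is a sketch, not kernel code: check_access() is a hypothetical stand-in for instrument_atomic_read_write() that only counts calls, model_raw_cmpxchg128_local() is a hypothetical non-atomic stand-in for the raw/arch_ operation, and the statement-expression macro and unsigned __int128 assume GCC or Clang on a 64-bit target.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the kernel's u128; assumes a 64-bit GCC or Clang target. */
typedef unsigned __int128 u128;

/* Hypothetical stand-in for instrument_atomic_read_write(): counts calls. */
static unsigned int checked_accesses;

static void check_access(const volatile void *ptr, size_t size)
{
	(void)ptr;
	(void)size;
	checked_accesses++;
}

/*
 * Hypothetical, non-atomic model of the raw/arch_ operation: if *ptr
 * equals old, store newval; either way, return the previous contents.
 */
static u128 model_raw_cmpxchg128_local(u128 *ptr, u128 old, u128 newval)
{
	u128 prev = *ptr;

	if (prev == old)
		*ptr = newval;
	return prev;
}

/* Same shape as the instrumented cmpxchg128_local() wrapper quoted above. */
#define model_cmpxchg128_local(ptr, ...) \
({ \
	__typeof__(ptr) ai_ptr_ = (ptr); \
	check_access(ai_ptr_, sizeof(*ai_ptr_)); \
	model_raw_cmpxchg128_local(ai_ptr_, __VA_ARGS__); \
})
```

With the wrapper, each call to model_cmpxchg128_local() increments checked_accesses; a direct model_raw_cmpxchg128_local() call performs the same exchange but leaves the counter unchanged, which is the visibility the arch_ call gives up.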

You are right. There is no strong technical reason to bypass the
kasan/kcsan instrumentation here. I used the arch_ version mainly to
follow the existing pattern in the AMD driver, probably on the
assumption that the spinlock already provides sufficient synchronization.
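The spinlock argument from earlier in the thread can be checked with a small model: because every writer holds the lock, the value read from *ptr cannot change before the compare-and-exchange runs, so the compare always succeeds and the instruction effectively serves as a single 16-byte write. This is single-threaded user-space C, not driver code. model_raw_cmpxchg128_local() and entry_set_locked() are hypothetical names, the plain C body only models the semantics and is not itself atomic, and unsigned __int128 assumes a 64-bit GCC or Clang build.

```c
#include <assert.h>

/* Mirrors the kernel's u128; assumes a 64-bit GCC or Clang target. */
typedef unsigned __int128 u128;

/*
 * Hypothetical, non-atomic model of a "local" 128-bit compare-and-exchange:
 * if *ptr equals old, store newval; either way, return the previous
 * contents. On x86-64 the arch_ version this models issues cmpxchg16b
 * without the LOCK prefix.
 */
static u128 model_raw_cmpxchg128_local(u128 *ptr, u128 old, u128 newval)
{
	u128 prev = *ptr;

	if (prev == old)
		*ptr = newval;
	return prev;
}

/*
 * Hypothetical setter shaped like intel_iommu_atomic128_set(). The caller
 * is assumed to hold the lock that serializes every writer of *ptr, so the
 * value read here cannot change before the exchange runs.
 */
static void entry_set_locked(u128 *ptr, u128 val)
{
	u128 expected = *ptr;
	u128 prev = model_raw_cmpxchg128_local(ptr, expected, val);

	/* With writers serialized, the compare always succeeds. */
	assert(prev == expected);
	(void)prev;
}
```

A failed compare would require another writer to modify the entry between the read and the exchange, which is exactly what holding the lock rules out.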

That said, Jason has suggested the generic entry_sync library to handle
these types of multi-quanta updates across different IOMMU drivers. I
plan to adopt that in the next version.

Thanks,
baolu