Re: A problem of Intel IOMMU hardware ?

From: Lu Baolu
Date: Wed Mar 17 2021 - 23:13:57 EST

Hi Nadav,

On 3/18/21 2:12 AM, Nadav Amit wrote:

On Mar 17, 2021, at 2:35 AM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) <longpeng2@xxxxxxxxxx> wrote:

Hi Nadav,

-----Original Message-----
From: Nadav Amit [mailto:nadav.amit@xxxxxxxxx]
reproduce the problem with high probability (~50%).

I saw Lu replied, and he is much more knowledgeable than I am (I was just intrigued
by your email).

However, if I were you I would also try removing some “optimizations” to look for
the root cause (e.g., use domain-specific invalidations instead of page-specific ones).

Good suggestion! We have already tried that over the past few days, using global invalidations as follows:
iommu->flush.flush_iotlb(iommu, did, 0, 0,
But that did not resolve the problem.

The first thing that comes to my mind is the invalidation hint (ih) in
iommu_flush_iotlb_psi(). I would remove it to see whether you get the failure
without it.

We also noticed the IH, but the IH is always zero in our case. As the spec says:
Paging-structure-cache entries caching second-level mappings associated with the specified
domain-id and the second-level-input-address range are invalidated, if the Invalidation Hint
(IH) field is Clear.

Everything seems fine on the software side, so we have no choice but to suspect the hardware.

Ok, I am pretty much out of ideas. I have two more suggestions, but
they are much less likely to help. Yet, they can further help to rule
out software bugs:

1. dma_clear_pte() seems to be wrong IMHO. It should have used WRITE_ONCE()
to prevent a split write, which might cause an “invalid” (partially
cleared) PTE to be stored in the TLB. Having said that, the subsequent
IOTLB flush should have prevented the problem.

Agreed. PTE reads and writes should use READ_ONCE()/WRITE_ONCE() instead.
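
To illustrate point 1, here is a minimal userspace sketch of what single-access PTE
helpers could look like. It is not the actual kernel code: the WRITE_ONCE()/READ_ONCE()
macros below are simplified stand-ins for the kernel's (they rely on the GCC/Clang
__typeof__ extension), struct dma_pte is reduced to its 64-bit value, and the helper
names dma_clear_pte_once() and dma_pte_read_once() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel macros (assumption: GCC/Clang).
 * The volatile access asks the compiler to emit exactly one load or
 * store, so it does not split, merge, or repeat the access. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

/* Reduced model of a second-level page-table entry: one 64-bit word. */
struct dma_pte {
	uint64_t val;
};

/* Hypothetical helper: clear the PTE with a single 64-bit store, so a
 * concurrent page walk should not observe a half-cleared entry. */
static inline void dma_clear_pte_once(struct dma_pte *pte)
{
	assert(pte != NULL);
	WRITE_ONCE(pte->val, 0);
}

/* Hypothetical helper: read the PTE with a single 64-bit load. */
static inline uint64_t dma_pte_read_once(struct dma_pte *pte)
{
	assert(pte != NULL);
	return READ_ONCE(pte->val);
}
```

On a 64-bit build an aligned volatile 64-bit store is normally emitted as one
instruction, which is the property the kernel macros depend on; the C standard
itself does not promise this, so treat the sketch as illustrative only.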

2. Consider ensuring that the problem is not somehow related to queued
invalidations. Try to use __iommu_flush_iotlb() instead of


Best regards,