Re: A problem of Intel IOMMU hardware ?

From: Lu Baolu
Date: Wed Mar 17 2021 - 01:27:01 EST


Hi Longpeng,

On 3/17/21 11:16 AM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
Hi guys,

We find the Intel iommu cache (i.e. the iotlb) may work incorrectly in a special
situation; it causes DMA failures or returns wrong data.

The reproducer (based on Alex's vfio testsuite[1]) is attached; it can
reproduce the problem with high probability (~50%).

The machine we used is:
processor : 47
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
stepping : 4
microcode : 0x2000069

And the iommu capability reported is:
ver 1:0 cap 8d2078c106f0466 ecap f020df
(caching mode = 0, page-selective invalidation = 1)

(The problem is also on 'Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz' and
'Intel(R) Xeon(R) Platinum 8378A CPU @ 3.00GHz')

We run the reproducer on Linux 4.18 and it works as follows:

Step 1. alloc 4G *2M-hugetlb* memory (N.B. no problem with 4K-page mapping)

I don't understand what 2M-hugetlb means here exactly. The IOMMU hardware
supports both 2M and 1G super pages. The mapped physical memory is 4G.
Why couldn't it use 1G super pages?
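
(For context: "2M-hugetlb" presumably means the 4G buffer is backed by 2M huge
pages on the host side, e.g. along the lines of the sketch below; the attached
reproducer may of course do this differently.)

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <linux/mman.h>        /* MAP_HUGE_2MB */

#define DMA_BUF_SIZE (4UL << 30)        /* 4G */

/* Sketch only: back the 4G DMA buffer with explicit 2M huge pages.
 * Requires enough pages reserved via vm.nr_hugepages (2048 x 2M here). */
static void *alloc_2m_hugetlb(void)
{
        void *buf = mmap(NULL, DMA_BUF_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_2MB,
                         -1, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return NULL;
        }
        return buf;
}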

Step 2. DMA Map 4G memory
Step 3.
while (1) {
{UNMAP, 0x0, 0xa0000}, ------------------------------------ (a)
{UNMAP, 0xc0000, 0xbff40000},

Have these two ranges been mapped before? Does the IOMMU driver
complain when you try to unmap a range which has never been
mapped? The IOMMU driver implicitly assumes that mapping and
unmapping are paired.

{MAP, 0x0, 0xc0000000}, --------------------------------- (b)
use GDB to pause here, and then DMA read IOVA=0,

IOVA 0 seems to be a special one. Have you verified with addresses
other than IOVA 0?

sometimes the DMA succeeds (as expected),
but sometimes it fails (reports not-present).
{UNMAP, 0x0, 0xc0000000}, --------------------------------- (c)
{MAP, 0x0, 0xa0000},
{MAP, 0xc0000, 0xbff40000},
}

The DMA read operations should succeed between (b) and (c); at the very least
they should NOT report not-present!
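
(For reference, the MAP/UNMAP entries above correspond to the VFIO type1
ioctls roughly as in the following sketch; 'container' and 'vaddr' are assumed
names for the VFIO container fd and the hugetlb buffer, and the attached
reproducer may differ in detail.)

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch of the map/unmap helpers used by the loop above. 'container' is
 * an open VFIO container fd already attached to an IOMMU group; 'vaddr'
 * is the hugetlb-backed buffer. */
static int dma_map(int container, uint64_t iova, uint64_t size, void *vaddr)
{
        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (uintptr_t)vaddr,
                .iova  = iova,
                .size  = size,
        };
        return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}

static int dma_unmap(int container, uint64_t iova, uint64_t size)
{
        struct vfio_iommu_type1_dma_unmap unmap = {
                .argsz = sizeof(unmap),
                .iova  = iova,
                .size  = size,
        };
        return ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
}

/* One loop iteration with the addresses from the report: */
static void one_iteration(int container, void *vaddr)
{
        dma_unmap(container, 0x0, 0xa0000);                        /* (a) */
        dma_unmap(container, 0xc0000, 0xbff40000);
        dma_map(container, 0x0, 0xc0000000, vaddr);                /* (b) */
        /* ... trigger a DMA read at IOVA 0 here ... */
        dma_unmap(container, 0x0, 0xc0000000);                     /* (c) */
        dma_map(container, 0x0, 0xa0000, vaddr);
        dma_map(container, 0xc0000, 0xbff40000, (char *)vaddr + 0xc0000);
}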

After analyzing the problem, we think it may be caused by the Intel iommu iotlb.
It seems the DMA Remapping hardware still uses the IOTLB or other cache entries
that the unmap at (a) should have invalidated.

When we do the DMA unmap at (a), the iotlb is flushed:
intel_iommu_unmap
    domain_unmap
    iommu_flush_iotlb_psi

When we do the DMA map at (b), there is no need to flush the iotlb according to
the capability of this iommu:
intel_iommu_map
    domain_pfn_mapping
        domain_mapping
            __mapping_notify_one
                if (cap_caching_mode(iommu->cap)) // FALSE
                    iommu_flush_iotlb_psi

That's true. The iotlb flushing is not needed when a PTE is changed from
non-present to present, unless caching mode is set.

But the problem disappears if we FORCE a flush here. So we suspect the iommu
hardware.
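
(For clarity, the FORCE flush presumably amounts to invalidating
unconditionally at the point quoted above, along the lines of this sketch; the
exact code and function signatures differ between kernel versions.)

/* Sketch of the "FORCE flush" experiment, based on the call chain quoted
 * above: always issue the page-selective invalidation after the
 * non-present -> present mapping instead of only under caching mode. */
static void __mapping_notify_one(struct intel_iommu *iommu,
                                 struct dmar_domain *domain,
                                 unsigned long pfn, unsigned int pages)
{
        /* Upstream only flushes here when cap_caching_mode(iommu->cap)
         * is set; forcing the flush makes the not-present errors go away. */
        iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
}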

Do you have any suggestions?

Best regards,
baolu