Re: A problem of Intel IOMMU hardware ?

From: Lu Baolu
Date: Thu Mar 18 2021 - 20:25:42 EST


On 3/18/21 4:56 PM, Tian, Kevin wrote:
From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) <longpeng2@xxxxxxxxxx>

-----Original Message-----
From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
Sent: Thursday, March 18, 2021 4:27 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) <longpeng2@xxxxxxxxxx>; Nadav Amit <nadav.amit@xxxxxxxxx>
Cc: chenjiashang <chenjiashang@xxxxxxxxxx>; David Woodhouse <dwmw2@xxxxxxxxxxxxx>; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; LKML <linux-kernel@xxxxxxxxxxxxxxx>; alex.williamson@xxxxxxxxxx; Gonglei (Arei) <arei.gonglei@xxxxxxxxxx>; will@xxxxxxxxxx
Subject: RE: A problem of Intel IOMMU hardware ?

From: iommu <iommu-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx> On Behalf Of Longpeng (Mike, Cloud Infrastructure Service Product Dept.)

2. Consider ensuring that the problem is not somehow related to
queued invalidations. Try to use __iommu_flush_iotlb() instead of
qi_flush_iotlb().


I tried to force the use of __iommu_flush_iotlb(), but something may have gone
wrong: the system crashed. So I'd prefer to lower the priority of this
experiment.


The VT-d spec clearly says that register-based invalidation can be used only
when queued invalidation is not enabled. The Intel IOMMU driver doesn't
provide an option to disable queued invalidation when the hardware supports
it, though. If you really want to try, tweak the code in intel_iommu_init_qi.


Hi Kevin,

Thanks for pointing this out. Do you have any ideas about this problem? I
tried to describe the problem more clearly in my reply to Alex; I hope you
can take a look if you're interested.


BTW, I saw you used a 4.18 kernel in this test. What about the latest kernel?

Also, one way to tell a software bug from a hardware bug is to trace the
low-level interface (e.g., qi_flush_iotlb) that actually sends invalidation
descriptors to the IOMMU hardware. Check the window between b) and c) and see
whether the software does what is expected there.

Yes. It would be better if we could reproduce this with the latest kernel,
which has debugfs files that expose the page tables, the invalidation queue,
etc.

Best regards,
baolu