RE: A problem of Intel IOMMU hardware ?

From: Tian, Kevin
Date: Thu Mar 18 2021 - 04:57:34 EST


> From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@xxxxxxxxxx>
>
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> > Sent: Thursday, March 18, 2021 4:27 PM
> > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > <longpeng2@xxxxxxxxxx>; Nadav Amit <nadav.amit@xxxxxxxxx>
> > Cc: chenjiashang <chenjiashang@xxxxxxxxxx>; David Woodhouse
> > <dwmw2@xxxxxxxxxxxxx>; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; LKML
> > <linux-kernel@xxxxxxxxxxxxxxx>; alex.williamson@xxxxxxxxxx; Gonglei (Arei)
> > <arei.gonglei@xxxxxxxxxx>; will@xxxxxxxxxx
> > Subject: RE: A problem of Intel IOMMU hardware ?
> >
> > > From: iommu <iommu-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx> On Behalf Of
> > > Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > >
> > > > 2. Consider ensuring that the problem is not somehow related to
> > > > queued invalidations. Try to use __iommu_flush_iotlb() instead of
> > > > qi_flush_iotlb().
> > > >
> > >
> > > I tried to force the use of __iommu_flush_iotlb(), but maybe I did
> > > something wrong: the system crashed, so I would rather lower the
> > > priority of this experiment.
> > >
> >
> > The VT-d spec clearly says that register-based invalidation can be used only
> > when queued invalidation is not enabled. The Intel IOMMU driver doesn't
> > provide an option to disable queued invalidation, though, when the hardware
> > is capable of it. If you really want to try, tweak the code in
> > intel_iommu_init_qi.
> >
>
> Hi Kevin,
>
> Thanks for pointing this out. Do you have any ideas about this problem? I
> tried to describe the problem more clearly in my reply to Alex; I hope you
> could have a look if you're interested.
>

btw, I saw that you used a 4.18 kernel in this test. What about the latest kernel?

Also, one way to separate a sw bug from a hw bug is to trace the low-level
interface (e.g., qi_flush_iotlb) which actually sends invalidation descriptors
to the IOMMU hardware. Check the window between b) and c) and see whether the
software does the right thing there.
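
Once the calls are traced, the window check could be scripted. Below is a
minimal sketch in Python, not a definitive tool. It assumes you have already
extracted each traced invalidation into a record holding a timestamp, the
domain id, the address and the size order (address mask). The Flush record,
its field names and both helper functions are hypothetical, invented here for
illustration; they are not part of the kernel or of any tracing tool. It also
assumes 4 KiB pages and that a page-selective invalidation covers
2**size_order naturally aligned pages.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: invalidation granule is a 4 KiB page


@dataclass
class Flush:
    """One traced invalidation request (hypothetical record layout)."""
    ts: float                    # timestamp of the traced call
    did: int                     # domain id passed to the flush
    addr: int                    # base address (unused when not page-selective)
    size_order: int              # address mask: 2**size_order pages are covered
    page_selective: bool = True  # False models a domain-selective flush


def covers(flush, did, iova):
    """Return True if this flush invalidates `iova` in domain `did`."""
    if flush.did != did:
        return False
    if not flush.page_selective:
        return True  # a domain-selective flush covers every address
    span = 1 << (flush.size_order + PAGE_SHIFT)
    base = flush.addr & ~(span - 1)  # the covered range is naturally aligned
    return base <= iova < base + span


def flushes_in_window(events, t_start, t_end, did, iova):
    """List the flushes between two timestamps that cover `iova`."""
    return [f for f in events
            if t_start <= f.ts <= t_end and covers(f, did, iova)]
```

With t_start and t_end set to the timestamps of b) and c), an empty result
for the faulting IOVA would point at software (no invalidation was ever
submitted), while a non-empty result would make the hardware, or the wait for
completion, the more interesting suspect.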

Thanks
Kevin