RE: A problem of Intel IOMMU hardware ?

From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Date: Thu Mar 18 2021 - 05:26:36 EST




> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> Sent: Thursday, March 18, 2021 4:56 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@xxxxxxxxxx>; Nadav Amit <nadav.amit@xxxxxxxxx>
> Cc: chenjiashang <chenjiashang@xxxxxxxxxx>; David Woodhouse
> <dwmw2@xxxxxxxxxxxxx>; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; LKML
> <linux-kernel@xxxxxxxxxxxxxxx>; alex.williamson@xxxxxxxxxx; Gonglei (Arei)
> <arei.gonglei@xxxxxxxxxx>; will@xxxxxxxxxx
> Subject: RE: A problem of Intel IOMMU hardware ?
>
> > From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > <longpeng2@xxxxxxxxxx>
> >
> > > -----Original Message-----
> > > From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> > > Sent: Thursday, March 18, 2021 4:27 PM
> > > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > <longpeng2@xxxxxxxxxx>; Nadav Amit <nadav.amit@xxxxxxxxx>
> > > Cc: chenjiashang <chenjiashang@xxxxxxxxxx>; David Woodhouse
> > > <dwmw2@xxxxxxxxxxxxx>; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; LKML
> > > <linux-kernel@xxxxxxxxxxxxxxx>; alex.williamson@xxxxxxxxxx; Gonglei (Arei)
> > > <arei.gonglei@xxxxxxxxxx>; will@xxxxxxxxxx
> > > Subject: RE: A problem of Intel IOMMU hardware ?
> > >
> > > > From: iommu <iommu-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx> On Behalf
> > > > Of Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > > >
> > > > > > 2. Consider ensuring that the problem is not somehow related to
> > > > > > queued invalidations. Try to use __iommu_flush_iotlb() instead
> > > > > > of qi_flush_iotlb().
> > > > >
> > > >
> > > > > I tried forcing the use of __iommu_flush_iotlb(), but something
> > > > > may have gone wrong: the system crashed, so I would prefer to
> > > > > lower the priority of this operation.
> > > >
> > >
> > > > The VT-d spec clearly says that register-based invalidation can be
> > > > used only when queued-invalidations are not enabled. Intel-IOMMU
> > > > driver doesn't provide an option to disable queued-invalidation
> > > > though, when the hardware is capable. If you really want to try,
> > > > tweak the code in intel_iommu_init_qi.
> > >
> >
> > Hi Kevin,
> >
> > Thanks for pointing this out. Do you have any ideas about this problem? I
> > tried to describe the problem more clearly in my reply to Alex; I hope you
> > can have a look if you're interested.
> >
>
> Btw, I saw you used a 4.18 kernel in this test. What about the latest kernel?
>

Not tested yet. It's hard to upgrade the kernel in our environment.

> Also one way to separate sw/hw bug is to trace the low level interface (e.g.,
> qi_flush_iotlb) which actually sends invalidation descriptors to the IOMMU
> hardware. Check the window between b) and c) and see whether the software does
> the right thing as expected there.
>

We have added some logging to the IOMMU driver over the past few days, and the
software seems fine. But we haven't looked inside qi_submit_sync() yet; I'll
try that tonight.
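
For reference, a minimal user-space sketch of the kind of decoding such a
trace could do on each IOTLB invalidation descriptor. The struct and function
names below are hypothetical, not the driver's own, and the bit positions are
my reading of the VT-d queued-invalidation descriptor layout, so they should
be checked against the spec and the kernel headers before being trusted.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the two 64-bit words of a queued invalidation
 * descriptor. Not the kernel's struct qi_desc. */
struct qi_desc_words {
	uint64_t low;
	uint64_t high;
};

/* Fields of an IOTLB invalidate descriptor (assumed layout, see lead-in). */
struct iotlb_inv_fields {
	unsigned int type;   /* low[3:0]: 0x2 for an IOTLB invalidate */
	unsigned int granu;  /* low[5:4]: 1 global, 2 domain, 3 page */
	unsigned int did;    /* low[31:16]: domain id */
	unsigned int am;     /* high[5:0]: address mask, 2^am pages */
	unsigned int ih;     /* high[6]: invalidation hint */
	uint64_t addr;       /* high[63:12]: page-aligned address */
};

/* Split the raw descriptor words into named fields. */
static struct iotlb_inv_fields decode_iotlb_desc(struct qi_desc_words d)
{
	struct iotlb_inv_fields f;

	f.type  = (unsigned int)(d.low & 0xf);
	f.granu = (unsigned int)((d.low >> 4) & 0x3);
	f.did   = (unsigned int)((d.low >> 16) & 0xffff);
	f.am    = (unsigned int)(d.high & 0x3f);
	f.ih    = (unsigned int)((d.high >> 6) & 0x1);
	f.addr  = d.high & ~UINT64_C(0xfff);
	return f;
}

/* Print one line per descriptor, suitable for comparing against the
 * unmap/map sequence seen by the upper layers. */
static void log_iotlb_desc(struct qi_desc_words d)
{
	struct iotlb_inv_fields f = decode_iotlb_desc(d);

	printf("qi iotlb: type=%u granu=%u did=%u addr=0x%" PRIx64
	       " am=%u ih=%u\n",
	       f.type, f.granu, f.did, f.addr, f.am, f.ih);
}
```

In the driver itself the same fields would be printed with a trace or printk
call at the point where the descriptor is queued, so that the domain id,
address and mask can be compared with what the caller intended to flush.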

> Thanks
> Kevin