RE: A problem of Intel IOMMU hardware ?

From: Tian, Kevin
Date: Thu Mar 18 2021 - 04:44:10 EST


> From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@xxxxxxxxxx>
>
>
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> > Sent: Thursday, March 18, 2021 4:27 PM
> > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > <longpeng2@xxxxxxxxxx>; Nadav Amit <nadav.amit@xxxxxxxxx>
> > Cc: chenjiashang <chenjiashang@xxxxxxxxxx>; David Woodhouse
> > <dwmw2@xxxxxxxxxxxxx>; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; LKML
> > <linux-kernel@xxxxxxxxxxxxxxx>; alex.williamson@xxxxxxxxxx; Gonglei
> > (Arei) <arei.gonglei@xxxxxxxxxx>; will@xxxxxxxxxx
> > Subject: RE: A problem of Intel IOMMU hardware ?
> >
> > > From: iommu <iommu-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx> On Behalf Of
> > > Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> > >
> > > > 2. Consider ensuring that the problem is not somehow related to
> > > > queued invalidations. Try to use __iommu_flush_iotlb() instead of
> > > > qi_flush_iotlb().
> > > >
> > >
> > > I tried to force the use of __iommu_flush_iotlb(), but something may
> > > have gone wrong: the system crashed, so I prefer to lower the priority
> > > of this operation.
> > >
> >
> > The VT-d spec clearly says that register-based invalidation can be used
> > only when queued-invalidations are not enabled. Intel-IOMMU driver
> > doesn't provide an option to disable queued-invalidation though, when
> > the hardware is capable. If you really want to try, tweak the code in
> > intel_iommu_init_qi.
> >
>
> Hi Kevin,
>
> Thanks for pointing this out. Do you have any ideas about this problem? I
> tried to describe the problem more clearly in my reply to Alex; I hope you
> can have a look if you're interested.
>
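
To make the constraint quoted above concrete, here is a minimal user-space
sketch of the rule (register-based invalidation only while queued
invalidation is not enabled) and of the driver behaviour described in the
thread (queued invalidation is enabled whenever the hardware is capable).
It is a toy model, not driver code: struct iommu_model, model_init_qi(),
register_based_allowed() and pick_method() are hypothetical names made up
for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two invalidation interfaces in the thread. */
enum inval_method {
    INVAL_REGISTER_BASED, /* the style of path __iommu_flush_iotlb() takes */
    INVAL_QUEUED          /* the style of path qi_flush_iotlb() takes */
};

/* Hypothetical stand-in for the per-IOMMU state that matters here. */
struct iommu_model {
    bool hw_supports_qi;  /* hardware is capable of queued invalidation */
    bool qi_enabled;      /* queued invalidation is currently enabled */
};

/*
 * Assumption taken from the thread: the driver offers no option to keep
 * queued invalidation off, so it is enabled whenever the hardware is capable.
 */
void model_init_qi(struct iommu_model *m)
{
    m->qi_enabled = m->hw_supports_qi;
}

/* The rule as stated in the thread: register-based only when QI is off. */
bool register_based_allowed(const struct iommu_model *m)
{
    return !m->qi_enabled;
}

/* Pick the only interface that is legal in the current state. */
enum inval_method pick_method(const struct iommu_model *m)
{
    return m->qi_enabled ? INVAL_QUEUED : INVAL_REGISTER_BASED;
}

/* Small self-check: on QI-capable hardware the register path is ruled out. */
void model_self_check(void)
{
    struct iommu_model m = { .hw_supports_qi = true, .qi_enabled = false };

    model_init_qi(&m);
    assert(!register_based_allowed(&m));
    assert(pick_method(&m) == INVAL_QUEUED);
}
```

In this model, forcing the register-based path on QI-capable hardware is
only legal if the initialization step is changed so that qi_enabled stays
false, which corresponds to the suggestion to tweak intel_iommu_init_qi.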

I agree with Nadav. It looks like some stale paging-structure cache entry
(e.g. PMD) is not being invalidated properly. It would be better if Baolu
could reproduce this problem in his local environment and then debug further
to identify whether it's a software or hardware defect.

btw what is the device under test? Does it support ATS?

Thanks
Kevin