Re: [PATCH] vhost/vsock: add IOTLB API support

From: Peter Xu
Date: Tue Nov 03 2020 - 14:46:22 EST


On Tue, Nov 03, 2020 at 05:04:23PM +0800, Jason Wang wrote:
>
> On 2020/11/3 1:11 AM, Stefano Garzarella wrote:
> > On Fri, Oct 30, 2020 at 07:44:43PM +0800, Jason Wang wrote:
> > >
> > > On 2020/10/30 6:54 PM, Stefano Garzarella wrote:
> > > > On Fri, Oct 30, 2020 at 06:02:18PM +0800, Jason Wang wrote:
> > > > >
> > > > > On 2020/10/30 1:43 AM, Stefano Garzarella wrote:
> > > > > > This patch enables the IOTLB API support for vhost-vsock devices,
> > > > > > allowing the userspace to emulate an IOMMU for the guest.
> > > > > >
> > > > > > These changes were made following vhost-net; in detail, this patch:
> > > > > > - exposes VIRTIO_F_ACCESS_PLATFORM feature and inits the iotlb
> > > > > >   device if the feature is acked
> > > > > > - implements VHOST_GET_BACKEND_FEATURES and
> > > > > >   VHOST_SET_BACKEND_FEATURES ioctls
> > > > > > - calls vq_meta_prefetch() before vq processing to prefetch vq
> > > > > >   metadata address in IOTLB
> > > > > > - provides .read_iter, .write_iter, and .poll callbacks for the
> > > > > >   chardev; they are used by the userspace to exchange IOTLB messages
> > > > > >
> > > > > > This patch was tested with QEMU and a patch applied [1] to fix a
> > > > > > simple issue:
> > > > > >     $ qemu -M q35,accel=kvm,kernel-irqchip=split \
> > > > > >            -drive file=fedora.qcow2,format=qcow2,if=virtio \
> > > > > >            -device intel-iommu,intremap=on \
> > > > > >            -device vhost-vsock-pci,guest-cid=3,iommu_platform=on
> > > > >
> > > > >
> > > > > Patch looks good, but a question:
> > > > >
> > > > > It looks to me you don't enable ATS which means vhost won't
> > > > > get any invalidation request or did I miss anything?
> > > > >
> > > >
> > > > You're right, I didn't see invalidation requests, only miss and
> > > > updates.
> > > > Now I have tried to enable 'ats' and 'device-iotlb' but I still
> > > > don't see any invalidation.
> > > >
> > > > How can I test it? (Sorry but I don't have much experience yet
> > > > with vIOMMU)
> > >
> > >
> > > I guess it's because of the batched unmap. Maybe you can try to use
> > > "intel_iommu=strict" in guest kernel command line to see if it
> > > works.
> > >
> > > Btw, make sure the qemu contains the patch [1]. Otherwise ATS won't
> > > be enabled for recent Linux Kernel in the guest.
> >
> > The problem was my kernel, it was built with a tiny configuration.
> > Using fedora stock kernel I can see the 'invalidate' requests, but I
> > also had the following issues.
> >
> > Do they ring any bells?
> >
> > $ ./qemu -m 4G -smp 4 -M q35,accel=kvm,kernel-irqchip=split \
> >     -drive file=fedora.qcow2,format=qcow2,if=virtio \
> >     -device intel-iommu,intremap=on,device-iotlb=on \
> >     -device vhost-vsock-pci,guest-cid=6,iommu_platform=on,ats=on,id=v1
> >
> >     qemu-system-x86_64: vtd_iova_to_slpte: detected IOVA overflow (iova=0x1d40000030c0)
>
>
> It's a hint that the IOVA exceeds the AW (address width). It might be worth
> checking whether the missed IOVA reported from the IOTLB is legal.

Yeah. By default the QEMU vIOMMU only supports a 39-bit address width for the
guest iova address space. To extend it, we can use:

-device intel-iommu,aw-bits=48

That enables the 4-level iommu page table.

Here the iova is obviously wider than the default 39 bits, so it'll be
interesting to know why the guest driver allocated it: the guest iommu driver
should be able to probe the viommu's supported address width, and so should
know that this iova is beyond what's supported.
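
As a quick sanity check of the numbers, here is a minimal Python sketch. The
helper names (iova_fits, pgtable_levels) are made up for illustration and are
not QEMU or kernel functions; it assumes 4 KiB pages and 9 bits of index per
page-table level, which is how a 39-bit width maps to 3 levels and a 48-bit
width to 4 levels.

```python
# Illustrative only: these helpers are hypothetical, not QEMU/kernel code.
PAGE_SHIFT = 12   # assumed 4 KiB pages: low 12 bits are the page offset
LEVEL_BITS = 9    # assumed 512 entries per page-table level


def iova_fits(iova: int, aw_bits: int) -> bool:
    """Return True if iova is addressable with an aw_bits-wide address space."""
    return iova < (1 << aw_bits)


def pgtable_levels(aw_bits: int) -> int:
    """Number of page-table levels needed to cover aw_bits of address."""
    return (aw_bits - PAGE_SHIFT) // LEVEL_BITS


iova = 0x1d40000030c0  # value taken from the QEMU error message in this thread

print(f"iova needs {iova.bit_length()} bits")
for aw in (39, 48):
    print(f"aw-bits={aw}: {pgtable_levels(aw)}-level table, "
          f"iova fits: {iova_fits(iova, aw)}")
```

The reported iova needs 45 bits, so it overflows the default 39-bit width but
would be within range with aw-bits=48. That does not explain why the guest
allocated it in the first place, only why QEMU complains.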

--
Peter Xu