Re: [RFC PATCH 6/6] drm/i915/gvt: support QEMU getting the dmabuf

From: Alex Williamson
Date: Wed May 17 2017 - 17:43:42 EST


On Tue, 16 May 2017 10:16:28 +0000
"Chen, Xiaoguang" <xiaoguang.chen@xxxxxxxxx> wrote:

> Hi Alex,
>
> >-----Original Message-----
> >From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> >Sent: Tuesday, May 16, 2017 1:44 AM
> >To: Chen, Xiaoguang <xiaoguang.chen@xxxxxxxxx>
> >Cc: Gerd Hoffmann <kraxel@xxxxxxxxxx>; Tian, Kevin <kevin.tian@xxxxxxxxx>;
> >intel-gfx@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >zhenyuw@xxxxxxxxxxxxxxx; Lv, Zhiyuan <zhiyuan.lv@xxxxxxxxx>; intel-gvt-
> >dev@xxxxxxxxxxxxxxxxxxxxx; Wang, Zhi A <zhi.a.wang@xxxxxxxxx>
> >Subject: Re: [RFC PATCH 6/6] drm/i915/gvt: support QEMU getting the dmabuf
> >
> >On Mon, 15 May 2017 03:36:50 +0000
> >"Chen, Xiaoguang" <xiaoguang.chen@xxxxxxxxx> wrote:
> >
> >> Hi Alex and Gerd,
> >>
> >> >-----Original Message-----
> >> >From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> >> >Sent: Saturday, May 13, 2017 12:38 AM
> >> >To: Gerd Hoffmann <kraxel@xxxxxxxxxx>
> >> >Cc: Chen, Xiaoguang <xiaoguang.chen@xxxxxxxxx>; Tian, Kevin
> >> ><kevin.tian@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx; linux-
> >> >kernel@xxxxxxxxxxxxxxx; zhenyuw@xxxxxxxxxxxxxxx; Lv, Zhiyuan
> >> ><zhiyuan.lv@xxxxxxxxx>; intel-gvt-dev@xxxxxxxxxxxxxxxxxxxxx; Wang,
> >> >Zhi A <zhi.a.wang@xxxxxxxxx>
> >> >Subject: Re: [RFC PATCH 6/6] drm/i915/gvt: support QEMU getting the
> >> >dmabuf
> >> >
> >> >On Fri, 12 May 2017 11:12:05 +0200
> >> >Gerd Hoffmann <kraxel@xxxxxxxxxx> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> > If the contents of the framebuffer change or if the parameters of
> >> >> > the framebuffer change? I can't imagine that creating a new dmabuf
> >> >> > fd for every visual change within the framebuffer would be
> >> >> > efficient, but I don't have any concept of what a dmabuf actually does.
> >> >>
> >> >> Ok, some background:
> >> >>
> >> >> The drm subsystem has the concept of planes. The most important
> >> >> plane is the primary framebuffer (i.e. what gets scanned out to the
> >> >> physical display). The cursor is a plane too, and there can be
> >> >> additional overlay planes for stuff like video playback.
> >> >>
> >> >> Typically there are multiple planes in a system and only one of
> >> >> them gets scanned out to the crtc. For example, the fbdev emulation
> >> >> creates one plane for the framebuffer console. The X-Server creates
> >> >> a plane too, and when you switch between X-Server and framebuffer
> >> >> console via ctrl-alt-fn the intel driver just reprograms the encoder
> >> >> to scan out one or the other plane to the crtc.
> >> >>
> >> >> The dma-buf handed out by gvt is a reference to a plane. I think
> >> >> on the host side gvt can see only the active plane (from the
> >> >> encoder/crtc register programming), not the inactive ones.
> >> >>
> >> >> The dma-buf can be imported as opengl texture and then be used to
> >> >> render the guest display to a host window. I think it is even
> >> >> possible to use the dma-buf as plane in the host drm driver and
> >> >> scan it out directly to a physical display. The actual framebuffer
> >> >> content stays in gpu memory all the time; the cpu never has to touch it.
> >> >>
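> >> >> For illustration, importing such a dma-buf as a GL texture on the
> >> >> host would look roughly like this (untested sketch, using the
> >> >> EGL_EXT_image_dma_buf_import and GL_OES_EGL_image extensions; the
> >> >> width/height/stride/fourcc must be passed alongside the fd because
> >> >> the dma-buf itself carries no metadata):
> >> >>
> >> >> #include <EGL/egl.h>
> >> >> #include <EGL/eglext.h>
> >> >> #include <GLES2/gl2.h>
> >> >> #include <GLES2/gl2ext.h>
> >> >>
> >> >> /* eglCreateImageKHR / glEGLImageTargetTexture2DOES are normally
> >> >>  * resolved via eglGetProcAddress() */
> >> >> GLuint import_dmabuf_as_texture(EGLDisplay dpy, int fd, int width,
> >> >>                                 int height, int stride, int fourcc)
> >> >> {
> >> >>         const EGLint attrs[] = {
> >> >>                 EGL_WIDTH, width,
> >> >>                 EGL_HEIGHT, height,
> >> >>                 EGL_LINUX_DRM_FOURCC_EXT, fourcc,
> >> >>                 EGL_DMA_BUF_PLANE0_FD_EXT, fd,
> >> >>                 EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
> >> >>                 EGL_DMA_BUF_PLANE0_PITCH_EXT, stride,
> >> >>                 EGL_NONE
> >> >>         };
> >> >>         EGLImageKHR image;
> >> >>         GLuint tex;
> >> >>
> >> >>         image = eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
> >> >>                                   EGL_LINUX_DMA_BUF_EXT, NULL, attrs);
> >> >>
> >> >>         glGenTextures(1, &tex);
> >> >>         glBindTexture(GL_TEXTURE_2D, tex);
> >> >>         glEGLImageTargetTexture2DOES(GL_TEXTURE_2D,
> >> >>                                      (GLeglImageOES)image);
> >> >>         return tex;
> >> >> }
> >> >>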
> >> >> It is possible to cache the dma-buf handles: when the guest boots
> >> >> you'll get the first one for the fbcon plane, when the x-server
> >> >> starts you'll get a second one for the x-server framebuffer, and when
> >> >> the user switches back to the text console via ctrl-alt-fn you can
> >> >> re-use the fbcon dma-buf you already have.
> >> >>
> >> >> The caching becomes more important for good performance when the
> >> >> guest uses pageflipping (wayland does): define two planes, render
> >> >> into one while displaying the other, then flip the two for an atomic
> >> >> display update.
> >> >>
> >> >> The caching also makes it a bit difficult to create a good interface.
> >> >> So, the current patch set creates:
> >> >>
> >> >> (a) A way to query the active planes (ioctl
> >> >> INTEL_VGPU_QUERY_DMABUF added by patch 5/6 of this series).
> >> >> (b) A way to create a dma-buf for the active plane (ioctl
> >> >> INTEL_VGPU_GENERATE_DMABUF).
> >> >>
> >> >> Typical userspace workflow is to first query the plane, then check
> >> >> if it already has a dma-buf for it, and if not create one.
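> >> >>
> >> >> In (untested) pseudo-code, with field and helper names being
> >> >> illustrative only (see the rest of this series for the actual
> >> >> struct intel_vgpu_dmabuf layout):
> >> >>
> >> >>         struct intel_vgpu_dmabuf info = {};
> >> >>         int fd;
> >> >>
> >> >>         /* which plane is currently being scanned out? */
> >> >>         ioctl(vfio_device_fd, INTEL_VGPU_QUERY_DMABUF, &info);
> >> >>
> >> >>         fd = lookup_cached_dmabuf(&info);       /* illustrative */
> >> >>         if (fd < 0) {
> >> >>                 /* no dma-buf for this plane yet, create one */
> >> >>                 ioctl(vfio_device_fd, INTEL_VGPU_GENERATE_DMABUF, &info);
> >> >>                 fd = info.fd;
> >> >>                 cache_dmabuf(&info, fd);        /* illustrative */
> >> >>         }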
> >> >
> >> >Thank you! This is immensely helpful!
> >> >
> >> >> > What changes to the framebuffer require a new dmabuf fd?
> >> >> > Shouldn't the user query the parameters of the framebuffer
> >> >> > through a dmabuf fd and shouldn't the dmabuf fd have some
> >> >> > signaling mechanism to the user (eventfd perhaps) to notify the
> >> >> > user to re-evaluate the parameters?
> >> >>
> >> >> dma-bufs don't support that; they are really just a handle to a
> >> >> piece of memory, and all metadata (format, size) must be
> >> >> communicated by other means.
> >> >>
> >> >> > Otherwise are you imagining that the user polls the vfio region?
> >> >>
> >> >> Hmm, notification support would probably be a good reason to have a
> >> >> separate file handle to manage the dma-bufs (instead of using
> >> >> driver-specific ioctls on the vfio fd), because the driver could
> >> >> also use the management fd for notifications then.
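> >> >>
> >> >> (Not part of this series, just the idea: userspace could then e.g.
> >> >> poll() the management fd and re-run the plane query whenever it
> >> >> signals readable, along the lines of:
> >> >>
> >> >>         #include <poll.h>
> >> >>
> >> >>         struct pollfd pfd = { .fd = mgr_fd, .events = POLLIN };
> >> >>
> >> >>         while (poll(&pfd, 1, -1) > 0) {
> >> >>                 /* plane changed: query again and create/reuse a
> >> >>                  * dma-buf for whatever is active now */
> >> >>                 handle_plane_change(mgr_fd);    /* illustrative */
> >> >>         }
> >> >>
> >> >> with the actual notification semantics still to be defined.)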
> >> >
> >> >I like this idea of a separate control fd for dmabufs, it provides
> >> >not only a central management point, but also a nice abstraction for
> >> >the vfio device specific interface. We potentially only need a
> >> >single
> >> >VFIO_DEVICE_GET_DMABUF_MGR_FD() ioctl to get a dmabuf management fd
> >> >(perhaps with a type parameter, ex. GFX) where maybe we could have
> >> >vfio-core incorporate this reference into the group lifecycle, so the
> >> >vendor driver only needs to fdget/put this manager fd for the various
> >> >plane dmabuf fds spawned in order to get core-level reference counting.
> >> Following is my understanding of the management fd idea:
> >> 1) QEMU will call the VFIO_DEVICE_GET_DMABUF_MGR_FD() ioctl to create an fd
> >> and save the fd in the vfio group while initializing vfio.
> >
> >Ideally there'd be kernel work here too if we want vfio-core to
> >incorporate lifecycle of this fd into the device/group/container
> >lifecycle. Maybe we even want to generalize it further to something
> >like VFIO_DEVICE_GET_FD which takes a parameter of what type of FD to
> >get, GFX_DMABUF_MGR_FD in this case. vfio-core would probably allocate
> >the fd, tap into the release hook for reference counting and pass it
> >to the vfio_device_ops (mdev vendor driver in this case) to attach
> >further.
> I tried to implement this today and it now works functionally.
> I am still a little confused about how to tap the fd into the release hook of the device/group/container.
> I tried to create the fd in vfio core but found it is difficult to get the file operations for the fd; the file operations should be supplied by the vendor drivers.

I'm not fully convinced there's benefit to having vfio-core attempt to
do this, I was just hoping to avoid each vendor driver needing to
implement their own reference counting. I don't think vfio-core wants
to get into tracking each ioctl for each vendor specific fd type to
know which create new references and which don't. Perhaps we'll come
up with easier ways to do this as we go.
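
To make it concrete, the sort of per-vendor reference counting I was
hoping to factor out looks roughly like this (untested sketch, struct
and function names made up): each plane dmabuf pins the manager fd at
creation and drops it on release, so the device can't disappear while
an exported plane is still in use.

/* needs <linux/dma-buf.h>, <linux/file.h>, <linux/slab.h> */

struct intel_vgpu_plane_obj {			/* hypothetical */
	struct file *mgr_filp;			/* keeps the manager fd alive */
};

/* called from the manager fd ioctl that creates a plane dmabuf */
static struct intel_vgpu_plane_obj *plane_obj_create(struct file *mgr_filp)
{
	struct intel_vgpu_plane_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

	if (obj)
		obj->mgr_filp = get_file(mgr_filp);	/* take a reference */
	return obj;
}

/* dma_buf_ops.release for the plane dmabuf */
static void plane_obj_release(struct dma_buf *dma_buf)
{
	struct intel_vgpu_plane_obj *obj = dma_buf->priv;

	fput(obj->mgr_filp);			/* drop the reference from above */
	kfree(obj);
}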

> So the fd is created in kvmgt.c for now.
> Below is part of the code:
>
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 389f072..d0649ba 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -41,6 +41,7 @@
> #include <linux/kvm_host.h>
> #include <linux/vfio.h>
> #include <linux/mdev.h>
> +#include <linux/anon_inodes.h>
>
> #include "i915_drv.h"
> #include "gvt.h"
> @@ -524,6 +525,63 @@ static int intel_vgpu_reg_init_opregion(struct intel_vgpu *vgpu)
> 	return ret;
> }
>
> +static int intel_vgpu_dmabuf_mgr_fd_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	WARN_ON(1);

A user can abuse this; simply return an error.

> +
> +	return 0;
> +}
> +
> +static int intel_vgpu_dmabuf_mgr_fd_release(struct inode *inode, struct file *filp)
> +{
> +	struct intel_vgpu *vgpu = filp->private_data;
> +
> +	if (vgpu->vdev.vfio_device != NULL)
> +		vfio_device_put(vgpu->vdev.vfio_device);

When does the case occur where we don't have a vfio_device? This looks
a bit like a warning flag that reference counting isn't handled
properly.

> +
> +	return 0;
> +}
> +
> +static long intel_vgpu_dmabuf_mgr_fd_ioctl(struct file *filp,
> +		unsigned int ioctl, unsigned long arg)
> +{
> +	struct intel_vgpu *vgpu = filp->private_data;
> +	int minsz;
> +	struct intel_vgpu_dmabuf dmabuf;
> +	int ret;
> +	struct fd f;
> +
> +	minsz = offsetofend(struct intel_vgpu_dmabuf, tiled);
> +	if (copy_from_user(&dmabuf, (void __user *)arg, minsz))
> +		return -EFAULT;
> +
> +	f = fdget(dmabuf.fd);
> +	if (ioctl == INTEL_VGPU_QUERY_DMABUF)
> +		ret = intel_gvt_ops->vgpu_query_dmabuf(vgpu, &dmabuf);
> +	else if (ioctl == INTEL_VGPU_GENERATE_DMABUF)
> +		ret = intel_gvt_ops->vgpu_generate_dmabuf(vgpu, &dmabuf);

Why do we need vendor specific ioctls here? Aren't querying the
current plane and getting an fd for that plane very generic concepts?
Is the resulting dmabuf Intel specific?

> +	else {
> +		fdput(f);
> +		gvt_vgpu_err("unsupported dmabuf operation\n");
> +		return -EINVAL;
> +	}
> +
> +	if (ret != 0) {
> +		fdput(f);
> +		gvt_vgpu_err("gvt-g get dmabuf failed:%d\n", ret);
> +		return -EINVAL;
> +	}
> +	fdput(f);
> +
> +	return copy_to_user((void __user *)arg, &dmabuf, minsz) ? -EFAULT : 0;
> +}
> +
> +static const struct file_operations intel_vgpu_dmabuf_mgr_fd_ops = {
> +	.release = intel_vgpu_dmabuf_mgr_fd_release,
> +	.unlocked_ioctl = intel_vgpu_dmabuf_mgr_fd_ioctl,
> +	.mmap = intel_vgpu_dmabuf_mgr_fd_mmap,
> +	.llseek = noop_llseek,
> +};
> +
> static int intel_vgpu_create(struct kobject *kobj, struct mdev_device *mdev)
> {
> 	struct intel_vgpu *vgpu = NULL;
> @@ -1259,6 +1317,31 @@ static long intel_vgpu_ioctl(struct mdev_device *mdev, unsigned int cmd,
> 	} else if (cmd == VFIO_DEVICE_RESET) {
> 		intel_gvt_ops->vgpu_reset(vgpu);
> 		return 0;
> +	} else if (cmd == VFIO_DEVICE_GET_FD) {
> +		struct vfio_fd vfio_fd;
> +		int fd;
> +		struct vfio_device *device;
> +
> +		minsz = offsetofend(struct vfio_fd, fd);
> +		if (copy_from_user(&vfio_fd, (void __user *)arg, minsz))
> +			return -EINVAL;
> +
> +		if (vfio_fd.argsz < minsz)
> +			return -EINVAL;
> +
> +		fd = anon_inode_getfd("vfio_dmabuf_mgr_fd",
> +				      &intel_vgpu_dmabuf_mgr_fd_ops,
> +				      vgpu, O_RDWR | O_CLOEXEC);
> +		if (fd < 0)
> +			return -EINVAL;
> +
> +		vfio_fd.fd = fd;
> +		device = vfio_device_get_from_dev(mdev_dev(mdev));
> +		if (device == NULL)
> +			gvt_vgpu_err("kvmgt: vfio device is null\n");
> +		else
> +			vgpu->vdev.vfio_device = device;
> +
> +		return copy_to_user((void __user *)arg, &vfio_fd, minsz) ? -EFAULT : 0;
> 	}
>
> 	return 0;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 519eff3..98be2e0 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -484,6 +485,20 @@ struct vfio_pci_hot_reset {
>
> #define VFIO_DEVICE_PCI_HOT_RESET _IO(VFIO_TYPE, VFIO_BASE + 13)
>
> +/**
> + * VFIO_DEVICE_GET_FD - _IOW(VFIO_TYPE, VFIO_BASE + 14, struct vfio_fd)
> + *
> + * Create an fd for a vfio device.
> + * This fd can be used for various purposes.
> + */
> +struct vfio_fd {
> +	__u32 argsz;
> +	__u32 flags;
> +	/* out */
> +	__u32 fd;
> +};
> +#define VFIO_DEVICE_GET_FD _IO(VFIO_TYPE, VFIO_BASE + 14)


The idea was that we pass some sort of type to VFIO_DEVICE_GET_FD, for
instance we might ask for a DEVICE_FD_GRAPHICS_DMABUF, and the vfio bus
driver (mdev vendor driver) would test whether it supports that type of
thing and either return an fd or an error. We can return the fd the same
way we do for VFIO_GROUP_GET_DEVICE_FD. For instance the user should do
something like:

	dmabuf_fd = ioctl(device_fd,
			  VFIO_DEVICE_GET_FD, DEVICE_FD_GRAPHICS_DMABUF);
	if (dmabuf_fd < 0)
		/* not supported... */
	else
		/* do stuff */
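
On the mdev vendor driver side that might reduce to something like the
below (sketch only, DEVICE_FD_GRAPHICS_DMABUF being a to-be-defined
vfio.h constant; the type is passed directly as the ioctl argument to
match the example above):

	} else if (cmd == VFIO_DEVICE_GET_FD) {
		switch (arg) {
		case DEVICE_FD_GRAPHICS_DMABUF:
			/* on success the new fd is the ioctl return value */
			return anon_inode_getfd("intel_vgpu_dmabuf_mgr",
					&intel_vgpu_dmabuf_mgr_fd_ops,
					vgpu, O_RDWR | O_CLOEXEC);
		default:
			return -EINVAL;	/* type not supported */
		}
	}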

Thanks,
Alex