Re: [PATCH v2 1/3] dma-buf: Add ioctl to query mmap coherency/cache info

From: Daniel Vetter
Date: Wed Sep 07 2022 - 12:43:51 EST


On Tue, Aug 16, 2022 at 06:50:54PM +0200, Christian König wrote:
> Am 16.08.22 um 16:26 schrieb Rob Clark:
> > On Tue, Aug 16, 2022 at 1:27 AM Christian König
> > <christian.koenig@xxxxxxx> wrote:
> > > Am 15.08.22 um 23:15 schrieb Rob Clark:
> > > > From: Rob Clark <robdclark@xxxxxxxxxxxx>
> > > >
> > > > This is a fairly narrowly focused interface, providing a way for a VMM
> > > > in userspace to tell the guest kernel what pgprot settings to use when
> > > > mapping a buffer to guest userspace.
> > > >
> > > > For buffers that get mapped into guest userspace, virglrenderer returns
> > > > a dma-buf fd to the VMM (crosvm or qemu). In addition to mapping the
> > > > pages into the guest VM, it needs to report to drm/virtio in the guest
> > > > the cache settings to use for guest userspace. In particular, on some
> > > > architectures, creating aliased mappings with different cache attributes
> > > > is frowned upon, so it is important that the guest mappings have the
> > > > same cache attributes as any potential host mappings.
> > > >
> > > > Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> > > > ---
> > > > v2: Combine with coherency, as that is a related concept.. and it is
> > > > relevant to the VMM whether coherent access without the SYNC ioctl
> > > > is possible; set map_info at export time to make it more clear
> > > > that it applies for the lifetime of the dma-buf (for any mmap
> > > > created via the dma-buf)
> > > Well, exactly that's a conceptual NAK from my side.
> > >
> > > The caching information can change at any time. For CPU mappings even
> > > without further notice from the exporter.
> > You should look before you criticize, as I left you a way out.. the
> > idea was that DMA_BUF_MAP_INCOHERENT should indicate that the buffer
> > cannot be mapped to the guest. We could ofc add more DMA_BUF_MAP_*
> > values if something else would suit you better. But the goal is to
> > give the VMM enough information to dtrt, or return an error if mapping
> > to guest is not possible. That seems better than assuming mapping to
> > guest will work and guessing about cache attrs
>
> Well I'm not rejecting the implementation, I'm rejecting this from the
> conceptual point of view.
>
> We intentional gave the exporter full control over the CPU mappings. This
> approach here breaks that now.
>
> I haven't seen the full detailed reason why we should do that and to be
> honest KVM seems to mess with things it is not supposed to touch.
>
> For example the page reference count of mappings marked with VM_IO is a
> complete no-go. This is a very strong evidence that this was based on rather
> dangerous halve knowledge about the background of the handling here.

Wut?

KVM grabs page references of VM_IO vma? I thought the issue was that we
still had some bo/dma-buf vma that didn't set either VM_IO or VM_PFNMAP,
and not that kvm was just outright breaking every core mm contract there
is.

Is this really what's going on in that other thread about "fixing" ttm?
-Daniel

> So as long as I don't see a full explanation why KVM is grabbing reference
> to pages while faulting them and why we manually need to forward the caching
> while the hardware documentation indicates otherwise I will be rejecting
> this whole approach.
>
> Regards,
> Christian.
>
> >
> > BR,
> > -R
> >
> > > If the hardware can't use the caching information from the host CPU page
> > > tables directly then that pretty much completely breaks the concept that
> > > the exporter is responsible for setting up those page tables.
> > >
> > > Regards,
> > > Christian.
> > >
> > > > drivers/dma-buf/dma-buf.c | 63 +++++++++++++++++++++++++++------
> > > > include/linux/dma-buf.h | 11 ++++++
> > > > include/uapi/linux/dma-buf.h | 68 ++++++++++++++++++++++++++++++++++++
> > > > 3 files changed, 132 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > > index 32f55640890c..262c4706f721 100644
> > > > --- a/drivers/dma-buf/dma-buf.c
> > > > +++ b/drivers/dma-buf/dma-buf.c
> > > > @@ -125,6 +125,32 @@ static struct file_system_type dma_buf_fs_type = {
> > > > .kill_sb = kill_anon_super,
> > > > };
> > > >
> > > > +static int __dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> > > > +{
> > > > + int ret;
> > > > +
> > > > + /* check if buffer supports mmap */
> > > > + if (!dmabuf->ops->mmap)
> > > > + return -EINVAL;
> > > > +
> > > > + ret = dmabuf->ops->mmap(dmabuf, vma);
> > > > +
> > > > + /*
> > > > + * If the exporter claims to support coherent access, ensure the
> > > > + * pgprot flags match the claim.
> > > > + */
> > > > + if ((dmabuf->map_info != DMA_BUF_MAP_INCOHERENT) && !ret) {
> > > > + pgprot_t wc_prot = pgprot_writecombine(vma->vm_page_prot);
> > > > + if (dmabuf->map_info == DMA_BUF_COHERENT_WC) {
> > > > + WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) != pgprot_val(wc_prot));
> > > > + } else {
> > > > + WARN_ON_ONCE(pgprot_val(vma->vm_page_prot) == pgprot_val(wc_prot));
> > > > + }
> > > > + }
> > > > +
> > > > + return ret;
> > > > +}
> > > > +
> > > > static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
> > > > {
> > > > struct dma_buf *dmabuf;
> > > > @@ -134,16 +160,12 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
> > > >
> > > > dmabuf = file->private_data;
> > > >
> > > > - /* check if buffer supports mmap */
> > > > - if (!dmabuf->ops->mmap)
> > > > - return -EINVAL;
> > > > -
> > > > /* check for overflowing the buffer's size */
> > > > if (vma->vm_pgoff + vma_pages(vma) >
> > > > dmabuf->size >> PAGE_SHIFT)
> > > > return -EINVAL;
> > > >
> > > > - return dmabuf->ops->mmap(dmabuf, vma);
> > > > + return __dma_buf_mmap(dmabuf, vma);
> > > > }
> > > >
> > > > static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
> > > > @@ -326,6 +348,27 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const char __user *buf)
> > > > return 0;
> > > > }
> > > >
> > > > +static long dma_buf_info(struct dma_buf *dmabuf, void __user *uarg)
> > > > +{
> > > > + struct dma_buf_info arg;
> > > > +
> > > > + if (copy_from_user(&arg, uarg, sizeof(arg)))
> > > > + return -EFAULT;
> > > > +
> > > > + switch (arg.param) {
> > > > + case DMA_BUF_INFO_MAP_INFO:
> > > > + arg.value = dmabuf->map_info;
> > > > + break;
> > > > + default:
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + if (copy_to_user(uarg, &arg, sizeof(arg)))
> > > > + return -EFAULT;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > static long dma_buf_ioctl(struct file *file,
> > > > unsigned int cmd, unsigned long arg)
> > > > {
> > > > @@ -369,6 +412,9 @@ static long dma_buf_ioctl(struct file *file,
> > > > case DMA_BUF_SET_NAME_B:
> > > > return dma_buf_set_name(dmabuf, (const char __user *)arg);
> > > >
> > > > + case DMA_BUF_IOCTL_INFO:
> > > > + return dma_buf_info(dmabuf, (void __user *)arg);
> > > > +
> > > > default:
> > > > return -ENOTTY;
> > > > }
> > > > @@ -530,6 +576,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > > > dmabuf->priv = exp_info->priv;
> > > > dmabuf->ops = exp_info->ops;
> > > > dmabuf->size = exp_info->size;
> > > > + dmabuf->map_info = exp_info->map_info;
> > > > dmabuf->exp_name = exp_info->exp_name;
> > > > dmabuf->owner = exp_info->owner;
> > > > spin_lock_init(&dmabuf->name_lock);
> > > > @@ -1245,10 +1292,6 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
> > > > if (WARN_ON(!dmabuf || !vma))
> > > > return -EINVAL;
> > > >
> > > > - /* check if buffer supports mmap */
> > > > - if (!dmabuf->ops->mmap)
> > > > - return -EINVAL;
> > > > -
> > > > /* check for offset overflow */
> > > > if (pgoff + vma_pages(vma) < pgoff)
> > > > return -EOVERFLOW;
> > > > @@ -1262,7 +1305,7 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
> > > > vma_set_file(vma, dmabuf->file);
> > > > vma->vm_pgoff = pgoff;
> > > >
> > > > - return dmabuf->ops->mmap(dmabuf, vma);
> > > > + return __dma_buf_mmap(dmabuf, vma);
> > > > }
> > > > EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF);
> > > >
> > > > diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> > > > index 71731796c8c3..37923c8d5c24 100644
> > > > --- a/include/linux/dma-buf.h
> > > > +++ b/include/linux/dma-buf.h
> > > > @@ -23,6 +23,8 @@
> > > > #include <linux/dma-fence.h>
> > > > #include <linux/wait.h>
> > > >
> > > > +#include <uapi/linux/dma-buf.h>
> > > > +
> > > > struct device;
> > > > struct dma_buf;
> > > > struct dma_buf_attachment;
> > > > @@ -307,6 +309,13 @@ struct dma_buf {
> > > > */
> > > > size_t size;
> > > >
> > > > + /**
> > > > + * @map_info:
> > > > + *
> > > > + * CPU mapping/coherency information for the buffer.
> > > > + */
> > > > + enum dma_buf_map_info map_info;
> > > > +
> > > > /**
> > > > * @file:
> > > > *
> > > > @@ -533,6 +542,7 @@ struct dma_buf_attachment {
> > > > * @ops: Attach allocator-defined dma buf ops to the new buffer
> > > > * @size: Size of the buffer - invariant over the lifetime of the buffer
> > > > * @flags: mode flags for the file
> > > > + * @map_info: CPU mapping/coherency information for the buffer
> > > > * @resv: reservation-object, NULL to allocate default one
> > > > * @priv: Attach private data of allocator to this buffer
> > > > *
> > > > @@ -545,6 +555,7 @@ struct dma_buf_export_info {
> > > > const struct dma_buf_ops *ops;
> > > > size_t size;
> > > > int flags;
> > > > + enum dma_buf_map_info map_info;
> > > > struct dma_resv *resv;
> > > > void *priv;
> > > > };
> > > > diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
> > > > index b1523cb8ab30..07b403ffdb43 100644
> > > > --- a/include/uapi/linux/dma-buf.h
> > > > +++ b/include/uapi/linux/dma-buf.h
> > > > @@ -85,6 +85,72 @@ struct dma_buf_sync {
> > > >
> > > > #define DMA_BUF_NAME_LEN 32
> > > >
> > > > +/**
> > > > + * enum dma_buf_map_info - CPU mapping info
> > > > + *
> > > > + * This enum describes coherency of a userspace mapping of the dmabuf.
> > > > + *
> > > > + * Importing devices should check dma_buf::map_info flag and reject an
> > > > + * import if unsupported. For example, if the exporting device uses
> > > > + * @DMA_BUF_COHERENT_CACHED but the importing device does not support
> > > > + * CPU cache coherency, the dma-buf import should fail.
> > > > + */
> > > > +enum dma_buf_map_info {
> > > > + /**
> > > > + * @DMA_BUF_MAP_INCOHERENT: CPU mapping is incoherent.
> > > > + *
> > > > + * Use of DMA_BUF_IOCTL_SYNC is required for CPU managed coherenency.
> > > > + */
> > > > + DMA_BUF_MAP_INCOHERENT,
> > > > +
> > > > + /**
> > > > + * @DMA_BUF_COHERENT_WC: CPU mapping is coherent but not cached.
> > > > + *
> > > > + * A cpu mmap'ing is coherent, and DMA_BUF_IOCTL_SYNC is not required.
> > > > + * However fences may be still required for synchronizing access. Ie.
> > > > + * coherency can only be relied upon by an explicit-fencing userspace.
> > > > + * An implicit-sync userspace must still use DMA_BUF_IOCTL_SYNC.
> > > > + *
> > > > + * The cpu mapping is writecombine.
> > > > + */
> > > > + DMA_BUF_COHERENT_WC,
> > > > +
> > > > + /**
> > > > + * @DMA_BUF_COHERENT_CACHED: CPU mapping is coherent and CPU cached.
> > > > + *
> > > > + * A cpu mmap'ing is coherent, and DMA_BUF_IOCTL_SYNC is not required.
> > > > + * However fences may be still required for synchronizing access. Ie.
> > > > + * coherency can only be relied upon by an explicit-fencing userspace.
> > > > + * An implicit-sync userspace must still use DMA_BUF_IOCTL_SYNC.
> > > > + *
> > > > + * The cpu mapping is cached.
> > > > + */
> > > > + DMA_BUF_COHERENT_CACHED,
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct dma_buf_info - Query info about the buffer.
> > > > + */
> > > > +struct dma_buf_info {
> > > > +
> > > > +#define DMA_BUF_INFO_MAP_INFO 1
> > > > +
> > > > + /**
> > > > + * @param: Which param to query
> > > > + *
> > > > + * DMA_BUF_INFO_MAP_INFO:
> > > > + * Returns enum dma_buf_map_info, describing the coherency and
> > > > + * caching of a CPU mapping of the buffer.
> > > > + */
> > > > + __u32 param;
> > > > + __u32 pad;
> > > > +
> > > > + /**
> > > > + * @value: Return value of the query.
> > > > + */
> > > > + __u64 value;
> > > > +};
> > > > +
> > > > #define DMA_BUF_BASE 'b'
> > > > #define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
> > > >
> > > > @@ -95,4 +161,6 @@ struct dma_buf_sync {
> > > > #define DMA_BUF_SET_NAME_A _IOW(DMA_BUF_BASE, 1, __u32)
> > > > #define DMA_BUF_SET_NAME_B _IOW(DMA_BUF_BASE, 1, __u64)
> > > >
> > > > +#define DMA_BUF_IOCTL_INFO _IOWR(DMA_BUF_BASE, 2, struct dma_buf_info)
> > > > +
> > > > #endif
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch