Re: [Xen-devel] [PATCH v2 5/9] xen/gntdev: Allow mappings for DMA buffers

From: Stefano Stabellini
Date: Fri Jun 08 2018 - 13:59:17 EST


On Fri, 8 Jun 2018, Oleksandr Andrushchenko wrote:
> On 06/08/2018 12:46 AM, Boris Ostrovsky wrote:
> > (Stefano, question for you at the end)
> >
> > On 06/07/2018 02:39 AM, Oleksandr Andrushchenko wrote:
> > > On 06/07/2018 12:19 AM, Boris Ostrovsky wrote:
> > > > On 06/06/2018 04:14 AM, Oleksandr Andrushchenko wrote:
> > > > > On 06/04/2018 11:12 PM, Boris Ostrovsky wrote:
> > > > > > On 06/01/2018 07:41 AM, Oleksandr Andrushchenko wrote:
> > > > > > @@ -121,8 +146,27 @@ static void gntdev_free_map(struct grant_map
> > > > > > *map)
> > > > > >       if (map == NULL)
> > > > > >           return;
> > > > > > +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> > > *Option 1: kfree(map->frames);*
> > > > > > +    if (map->dma_vaddr) {
> > > > > > +        struct gnttab_dma_alloc_args args;
> > > > > > +
> > > > > > +        args.dev = map->dma_dev;
> > > > > > +        args.coherent = map->dma_flags & GNTDEV_DMA_FLAG_COHERENT;
> > > > > > +        args.nr_pages = map->count;
> > > > > > +        args.pages = map->pages;
> > > > > > +        args.frames = map->frames;
> > > > > > +        args.vaddr = map->dma_vaddr;
> > > > > > +        args.dev_bus_addr = map->dma_bus_addr;
> > > > > > +
> > > > > > +        gnttab_dma_free_pages(&args);
> > > *Option 2: kfree(map->frames);*
> > > > > > +    } else
> > > > > > +#endif
> > > > > >       if (map->pages)
> > > > > >           gnttab_free_pages(map->count, map->pages);
> > > > > > +
> > > > > > +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> > > > > > +    kfree(map->frames);
> > > > > > +#endif
> > > > > >
> > > > > > Can this be done under if (map->dma_vaddr)?
> > > > > > In other words, is it
> > > > > > possible for dma_vaddr to be NULL and still have unallocated frames
> > > > > > pointer?
> > > > > It is possible to have vaddr == NULL and frames != NULL as we
> > > > > allocate frames outside of gnttab_dma_alloc_pages which
> > > > > may fail. Calling kfree on NULL pointer is safe,
> > > > I am not questioning the safety of the code; I would like to avoid
> > > > another ifdef.
> > > Ah, I now understand, so you are asking if we can have
> > > that kfree(map->frames); in the place *Option 2* I marked above.
> > > Unfortunately no: map->frames is allocated before we try to
> > > allocate DMA memory, e.g. before dma_vaddr is set:
> > > [...]
> > >         add->frames = kcalloc(count, sizeof(add->frames[0]),
> > >                               GFP_KERNEL);
> > >         if (!add->frames)
> > >             goto err;
> > >
> > > [...]
> > >         if (gnttab_dma_alloc_pages(&args))
> > >             goto err;
> > >
> > >         add->dma_vaddr = args.vaddr;
> > > [...]
> > > err:
> > >     gntdev_free_map(add);
> > >
> > > So, it is possible to enter gntdev_free_map with
> > > frames != NULL and dma_vaddr == NULL. Option 1 above cannot be used
> > > as map->frames is needed for gnttab_dma_free_pages(&args);
> > > and Option 2 cannot be used as frames != NULL and dma_vaddr == NULL.
> > > Thus, I think that unfortunately we need that #ifdef.
> > > Option 3 below can also be considered, but that seems to be not good
> > > as we free resources in different places which looks inconsistent.
> >
> > I was only thinking of option 2. But if it is possible to have frames !=
> > NULL and dma_vaddr == NULL then perhaps we indeed will have to live with
> > the extra ifdef.
> ok
> >
> > > Sorry if I'm still missing your point.
> > > > > so
> > > > > I see no reason to change this code.
> > > > > > >       kfree(map->pages);
> > > > > > >       kfree(map->grants);
> > > > > > >       kfree(map->map_ops);
> > > > > > > @@ -132,7 +176,8 @@ static void gntdev_free_map(struct grant_map
> > > > > > > *map)
> > > > > > >       kfree(map);
> > > > > > >   }
> > > > > > > -static struct grant_map *gntdev_alloc_map(struct gntdev_priv
> > > > > > > *priv, int count)
> > > > > > > +static struct grant_map *gntdev_alloc_map(struct gntdev_priv
> > > > > > > *priv,
> > > > > > > int count,
> > > > > > > +                      int dma_flags)
> > > > > > >   {
> > > > > > >       struct grant_map *add;
> > > > > > >       int i;
> > > > > > > @@ -155,6 +200,37 @@ static struct grant_map
> > > > > > > *gntdev_alloc_map(struct gntdev_priv *priv, int count)
> > > > > > >           NULL == add->pages)
> > > > > > >           goto err;
> > > > > > > +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> > > > > > > +    add->dma_flags = dma_flags;
> > > > > > > +
> > > > > > > +    /*
> > > > > > > +     * Check if this mapping is requested to be backed
> > > > > > > +     * by a DMA buffer.
> > > > > > > +     */
> > > > > > > +    if (dma_flags & (GNTDEV_DMA_FLAG_WC |
> > > > > > > GNTDEV_DMA_FLAG_COHERENT)) {
> > > > > > > +        struct gnttab_dma_alloc_args args;
> > > > > > > +
> > > > > > > +        add->frames = kcalloc(count, sizeof(add->frames[0]),
> > > > > > > +                      GFP_KERNEL);
> > > > > > > +        if (!add->frames)
> > > > > > > +            goto err;
> > > > > > > +
> > > > > > > +        /* Remember the device, so we can free DMA memory. */
> > > > > > > +        add->dma_dev = priv->dma_dev;
> > > > > > > +
> > > > > > > +        args.dev = priv->dma_dev;
> > > > > > > +        args.coherent = dma_flags & GNTDEV_DMA_FLAG_COHERENT;
> > > > > > > +        args.nr_pages = count;
> > > > > > > +        args.pages = add->pages;
> > > > > > > +        args.frames = add->frames;
> > > > > > > +
> > > > > > > +        if (gnttab_dma_alloc_pages(&args))
> > > *Option 3: kfree(map->frames);*
> > > > > > > +            goto err;
> > > > > > > +
> > > > > > > +        add->dma_vaddr = args.vaddr;
> > > > > > > +        add->dma_bus_addr = args.dev_bus_addr;
> > > > > > > +    } else
> > > > > > > +#endif
> > > > > > >       if (gnttab_alloc_pages(count, add->pages))
> > > > > > >           goto err;
> > > > > > > @@ -325,6 +401,14 @@ static int map_grant_pages(struct
> > > > > > > grant_map
> > > > > > > *map)
> > > > > > >           map->unmap_ops[i].handle = map->map_ops[i].handle;
> > > > > > >           if (use_ptemod)
> > > > > > >               map->kunmap_ops[i].handle =
> > > > > > > map->kmap_ops[i].handle;
> > > > > > > +#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
> > > > > > > +        else if (map->dma_vaddr) {
> > > > > > > +            unsigned long mfn;
> > > > > > > +
> > > > > > > +            mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
> > > > > > Not pfn_to_mfn()?
> > > > > I'd love to, but pfn_to_mfn is only defined for x86, not ARM: [1]
> > > > > and [2]
> > > > > Thus,
> > > > >
> > > > > drivers/xen/gntdev.c:408:10: error: implicit declaration of function
> > > > > ‘pfn_to_mfn’ [-Werror=implicit-function-declaration]
> > > > >      mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
> > > > >
> > > > > So, I'll keep __pfn_to_mfn
> > > > How will this work on non-PV x86?
> > > So, you mean I need:
> > > #ifdef CONFIG_X86
> > > mfn = pfn_to_mfn(page_to_pfn(map->pages[i]));
> > > #else
> > > mfn = __pfn_to_mfn(page_to_pfn(map->pages[i]));
> > > #endif
> > >
> > I'd rather fix it in ARM code. Stefano, why does ARM use the
> > underscored version?
> Do you want me to add one more patch for ARM to wrap __pfn_to_mfn
> with static inline for ARM? e.g.
> static inline ...pfn_to_mfn(...)
> {
>     __pfn_to_mfn();
> }


A Xen on ARM guest doesn't actually know the mfns behind its own
pseudo-physical pages. This is why we stopped using pfn_to_mfn and
started using pfn_to_bfn instead, which will generally return "pfn",
unless the page is a foreign grant. See include/xen/arm/page.h.
pfn_to_bfn was also introduced on x86. For example, see the usage of
pfn_to_bfn in drivers/xen/swiotlb-xen.c. Otherwise, if you don't care
about other mapped grants, you can just use pfn_to_gfn, which on ARM
always returns pfn.
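
To make the lookup order concrete, here is a minimal userspace sketch of
that behaviour. It is only a model under stated assumptions, not the kernel
code: the fixed-size table and every demo_ name are made up for
illustration, and the real helpers live in include/xen/arm/page.h.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's "no translation" marker. */
#define DEMO_INVALID_ENTRY (~0UL)
#define DEMO_MAX_FOREIGN   8

/* One pfn -> mfn override, recorded when a foreign grant is mapped. */
struct demo_p2m_entry {
	unsigned long pfn;
	unsigned long mfn;
};

static struct demo_p2m_entry demo_p2m[DEMO_MAX_FOREIGN];
static size_t demo_p2m_count;

/* Record that pfn is backed by a foreign page with machine frame mfn. */
int demo_add_foreign_mapping(unsigned long pfn, unsigned long mfn)
{
	if (demo_p2m_count == DEMO_MAX_FOREIGN)
		return -1;
	demo_p2m[demo_p2m_count].pfn = pfn;
	demo_p2m[demo_p2m_count].mfn = mfn;
	demo_p2m_count++;
	return 0;
}

/* Model of __pfn_to_mfn(): only foreign mappings have an entry. */
unsigned long demo_raw_pfn_to_mfn(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < demo_p2m_count; i++)
		if (demo_p2m[i].pfn == pfn)
			return demo_p2m[i].mfn;
	return DEMO_INVALID_ENTRY;
}

/* Model of pfn_to_bfn(): the foreign mfn if one is known, else pfn. */
unsigned long demo_pfn_to_bfn(unsigned long pfn)
{
	unsigned long mfn = demo_raw_pfn_to_mfn(pfn);

	return mfn != DEMO_INVALID_ENTRY ? mfn : pfn;
}

/* Model of pfn_to_gfn() on ARM: the identity, grants are ignored. */
unsigned long demo_pfn_to_gfn(unsigned long pfn)
{
	return pfn;
}
```

In this model a pfn with no recorded foreign mapping comes back unchanged
from both helpers, while a pfn that backs a foreign grant comes back as the
foreign mfn from demo_pfn_to_bfn() only.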


Also, for your information, we support different page granularities in
Linux as a Xen guest, see the comment at include/xen/arm/page.h:

/*
 * The pseudo-physical frame (pfn) used in all the helpers is always based
 * on Xen page granularity (i.e 4KB).
 *
 * A Linux page may be split across multiple non-contiguous Xen page so we
 * have to keep track with frame based on 4KB page granularity.
 *
 * PV drivers should never make a direct usage of those helpers (particularly
 * pfn_to_gfn and gfn_to_pfn).
 */

A Linux page could be 64K, but a Xen page is always 4K. A granted page
is also 4K. We have helpers to take into account the offsets to map
multiple Xen grants in a single Linux page, see for example
drivers/xen/grant-table.c:gnttab_foreach_grant. Most PV drivers have
been converted to be able to work with 64K pages correctly, but if I
remember correctly gntdev.c is the only remaining driver that doesn't
support 64K pages yet, so you don't have to deal with it if you don't
want to.
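
For reference, the arithmetic behind splitting one Linux page into 4K Xen
frames can be sketched in plain userspace C as below. This is a rough model
in the spirit of the gnttab_foreach_grant helpers, not their implementation:
all demo_ names are hypothetical, the 64K Linux page size is an assumption
of the example, and the model hands the callback a 4K-granularity pfn
without translating it to a guest frame.

```c
#include <stddef.h>

#define DEMO_XEN_PAGE_SHIFT   12                       /* Xen page: always 4K */
#define DEMO_XEN_PAGE_SIZE    (1UL << DEMO_XEN_PAGE_SHIFT)
#define DEMO_LINUX_PAGE_SHIFT 16                       /* assumption: 64K pages */
#define DEMO_LINUX_PAGE_SIZE  (1UL << DEMO_LINUX_PAGE_SHIFT)
#define DEMO_XEN_PFN_PER_PAGE (DEMO_LINUX_PAGE_SIZE / DEMO_XEN_PAGE_SIZE)

typedef void (*demo_grant_fn)(unsigned long xen_pfn, unsigned int offset,
			      unsigned int len, void *data);

/*
 * Call fn once per 4K chunk covering [offset, offset + len) of the Linux
 * page whose Linux-granularity frame number is linux_pfn.
 * Returns the number of chunks visited, or -1 if the range does not fit.
 */
int demo_foreach_grant_in_range(unsigned long linux_pfn, unsigned int offset,
				unsigned int len, demo_grant_fn fn, void *data)
{
	unsigned long xen_pfn;
	unsigned int goffset;
	int chunks = 0;

	if (offset >= DEMO_LINUX_PAGE_SIZE ||
	    len > DEMO_LINUX_PAGE_SIZE - offset)
		return -1;

	/* First 4K frame of the Linux page, plus the frames offset skips. */
	xen_pfn = linux_pfn * DEMO_XEN_PFN_PER_PAGE +
		  (offset >> DEMO_XEN_PAGE_SHIFT);
	goffset = offset & (unsigned int)(DEMO_XEN_PAGE_SIZE - 1);

	while (len) {
		unsigned int glen = (unsigned int)DEMO_XEN_PAGE_SIZE - goffset;

		if (glen > len)
			glen = len;
		fn(xen_pfn, goffset, glen, data);

		goffset = 0;	/* later chunks start on a 4K boundary */
		xen_pfn++;
		len -= glen;
		chunks++;
	}
	return chunks;
}

/* Example callback: add up the bytes seen across all chunks. */
void demo_count_bytes(unsigned long xen_pfn, unsigned int offset,
		      unsigned int len, void *data)
{
	(void)xen_pfn;
	(void)offset;
	*(unsigned long *)data += len;
}
```

With these assumed sizes a full Linux page is visited as 16 separate 4K
chunks, and a 200-byte range starting at offset 4000 straddles a 4K boundary
and is therefore reported as two chunks.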