Re: [PATCH net-next 2/4] udmabuf: emit one sg entry per pinned folio
From: Christian König
Date: Mon Jun 08 2026 - 10:07:21 EST
On 6/8/26 15:55, Bobby Eshleman wrote:
>
> On Sun, Jun 7, 2026 at 11:42 PM Christian König <christian.koenig@xxxxxxx <mailto:christian.koenig@xxxxxxx>> wrote:
>
> On 6/5/26 20:44, Bobby Eshleman wrote:
> > On Fri, Jun 05, 2026 at 11:30:07AM +0200, Christian König wrote:
> >> On 6/4/26 02:42, Bobby Eshleman wrote:
> >>> From: Bobby Eshleman <bobbyeshleman@xxxxxxxx <mailto:bobbyeshleman@xxxxxxxx>>
> >>>
> >>> get_sg_table() emitted one PAGE_SIZE sg entry per page even when the
> >>> underlying folio was larger.
> >>>
> >>> Instead, walk folios[] and emit one sg entry per folio. When folios
> >>> represent large pages (as is for MFD_HUGETLB), each sg entry is a large
> >>> page. Normal PAGE_SIZE sg tables are unchanged.
> >>>
> >>> Required by net/core/devmem to support rx-buf-size > PAGE_SIZE with
> >>> udmabuf.
> >>
> >> That doesn't explain why this is required.
> >
> > Sure, can definitely add. Devmem currently requires dmabuf sg entries to
> > be length and size aligned when it allocates niovs for NIC page pools.
> > Though udmabuf is not violating any dmabuf contract by emitting
> > PAGE_SIZE entries and the above restriction is probably more a
> > shortfalling of devmem, by emitting a single entry per folio this patch
> > allows udmabuf to be used by devmem for large pages.
> >
> >>
> >> Please note that accessing the pages/folio of an sg-table returned by DMA-buf is illegal and strictly forbidden!
> >>
> >> Regards,
> >> Christian.
> >
> > It seems both devmem and io_uring zcrx at least introspect through to
> > the sg-table to build NIC page pools (not accessing the memory itself,
> > however). Is there a better way?
>
> That's an absolute NO-GO! We need to stop that immediately.
>
> Touching the underlying struct page of an DMA-buf exported sg-table is strictly forbidden.
>
> We even have code to wrap the sg_table and hide the struct pages on debug builds to catch those issues, see function dma_buf_wrap_sg_table().
>
> My last status is that the NIC page pools are build directly from the DMA addresses exposed by the sg_table.
>
> Was there any change I'm not aware of?
>
> Regards,
> Christian.
>
>
> Oh no change, your mental model is still current.
> They just go through each sg and use sg_dma_address() on each.
Ah, thanks! That was a near heart attack :D
Yeah that is perfectly correct, question is do you then still really need this udmabuf change? I mean the DMA API usually merges together contiguous DMA addresses.
Regards,
Christian.
>
> Best,
> Bobby
>
>
> >
> > Best,
> > Bobby
> >
> >>
> >>> Signed-off-by: Bobby Eshleman <bobbyeshleman@xxxxxxxx <mailto:bobbyeshleman@xxxxxxxx>>
> >>> ---
> >>> drivers/dma-buf/udmabuf.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
> >>> 1 file changed, 42 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> >>> index 94b8ecb892bb..f28dd3788ada 100644
> >>> --- a/drivers/dma-buf/udmabuf.c
> >>> +++ b/drivers/dma-buf/udmabuf.c
> >>> @@ -141,26 +141,63 @@ static void vunmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
> >>> vm_unmap_ram(map->vaddr, ubuf->pagecount);
> >>> }
> >>>
> >>> +/* Return the number of contiguous pages backed by the folio at @i.
> >>> + * A udmabuf may map only part of a folio, or reference the same folio
> >>> + * in multiple non-contiguous runs, so folio_nr_pages() can't be used.
> >>> + */
> >>> +static pgoff_t udmabuf_folio_nr_pages(struct udmabuf *ubuf, pgoff_t i)
> >>> +{
> >>> + struct folio *f = ubuf->folios[i];
> >>> + pgoff_t j;
> >>> +
> >>> + for (j = 1; i + j < ubuf->pagecount; j++) {
> >>> + if (ubuf->folios[i + j] != f)
> >>> + break;
> >>> + /* Same folio, but not a sequential offset within it. */
> >>> + if (ubuf->offsets[i + j] != ubuf->offsets[i] + j * PAGE_SIZE)
> >>> + break;
> >>> + }
> >>> + return j;
> >>> +}
> >>> +
> >>> +/* Count the contiguous folio runs in @ubuf, one sg entry per run. */
> >>> +static unsigned int udmabuf_sg_nents(struct udmabuf *ubuf)
> >>> +{
> >>> + unsigned int nents = 0;
> >>> + pgoff_t i;
> >>> +
> >>> + for (i = 0; i < ubuf->pagecount; i += udmabuf_folio_nr_pages(ubuf, i))
> >>> + nents++;
> >>> + return nents;
> >>> +}
> >>> +
> >>> static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
> >>> enum dma_data_direction direction)
> >>> {
> >>> struct udmabuf *ubuf = buf->priv;
> >>> - struct sg_table *sg;
> >>> struct scatterlist *sgl;
> >>> - unsigned int i = 0;
> >>> + struct sg_table *sg;
> >>> + pgoff_t i, run;
> >>> + unsigned int nents;
> >>> int ret;
> >>>
> >>> + nents = udmabuf_sg_nents(ubuf);
> >>> +
> >>> sg = kzalloc_obj(*sg);
> >>> if (!sg)
> >>> return ERR_PTR(-ENOMEM);
> >>>
> >>> - ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
> >>> + ret = sg_alloc_table(sg, nents, GFP_KERNEL);
> >>> if (ret < 0)
> >>> goto err_alloc;
> >>>
> >>> - for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
> >>> - sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
> >>> + sgl = sg->sgl;
> >>> + for (i = 0; i < ubuf->pagecount; i += run) {
> >>> + run = udmabuf_folio_nr_pages(ubuf, i);
> >>> + sg_set_folio(sgl, ubuf->folios[i], run << PAGE_SHIFT,
> >>> ubuf->offsets[i]);
> >>> + sgl = sg_next(sgl);
> >>> + }
> >>>
> >>> ret = dma_map_sgtable(dev, sg, direction, 0);
> >>> if (ret < 0)
> >>>
> >>> --
> >>> 2.53.0-Meta
> >>>
> >>
>