Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length

From: Mina Almasry

Date: Wed May 20 2026 - 16:43:31 EST


On Wed, May 20, 2026 at 12:28 PM David CARLIER <devnexen@xxxxxxxxx> wrote:
>
> On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
> >
> > On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@xxxxxxxxx> wrote:
> > >
> > > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> > > PAGE_SIZE multiples without checking:
> > >
> > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> > > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> > > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> > > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> > > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> > >
> > > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> > > covers the full byte len, so a non-page-multiple non-final sg
> > > desyncs num_niovs from the gen_pool region for every later sg, on
> > > both RX and TX.
> > >
> > > dma-buf does not require page-aligned sizes, so the bind path has to
> > > enforce what its own indexing assumes. Reject both with -EINVAL.
> > >
> > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> > > SG-length check covers both directions.
> > >
> > > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Signed-off-by: David Carlier <devnexen@xxxxxxxxx>
> > > ---
> > > Changes in v2:
> > > - Reframe commit message around the kernel-side OOB instead of
> > > "real exporters already page-align", which read as the OOB being
> > > unreachable and undercut Cc: stable (Stanislav Fomichev).
> > > - Hoist the SG-length check out of the if (TX) branch so it covers
> > > RX too; RX has the same num_niovs / gen_pool desync on a
> > > contract-violating exporter, just without an OOB. Keep the
> > > size-multiple check TX-only (Stanislav Fomichev).
> > > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> > > match the existing call site at the tx_vec[] assignment
> > > (Bobby Eshleman).
> > >
> > > net/core/devmem.c | 11 +++++++++++
> > > 1 file changed, 11 insertions(+)
> > >
> > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > index 468344739db2..4f71de44c0fb 100644
> > > --- a/net/core/devmem.c
> > > +++ b/net/core/devmem.c
> > > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > }
> > >
> > > if (direction == DMA_TO_DEVICE) {
> > > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> > > + err = -EINVAL;
> > > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> > > + goto err_unmap;
> > > + }
> > > binding->tx_vec = kvmalloc_objs(struct net_iov *,
> > > dmabuf->size / PAGE_SIZE);
> > > if (!binding->tx_vec) {
> > > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > size_t len = sg_dma_len(sg);
> > > struct net_iov *niov;
> > >
> > > + if (!IS_ALIGNED(len, PAGE_SIZE)) {
> > > + err = -EINVAL;
> > > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> > > + goto err_free_chunks;
> > > + }
> > > +
> > > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> > > dev_to_node(&dev->dev));
> > > if (!owner) {
> > > --
> > > 2.53.0
> > >
> >
> > No hold on, I don't think we actually have a bug here. AFAIR all
> > you're describing is intionional . Yes the TX vectors and their niov
> > arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.
> >
> > But net_devmem_get_niov_at has this check that prevents us from trying
> > to send past the dma-buf size, even if it's not page_aligned:
> >
> > ```
> > if (virt_addr >= binding->dmabuf->size)
> > return NULL;
> > ```
> >
> > IIRC the NULL should be bubbled up to the user as some error.
> >
> > Please double check that we actually have a bug here. If not, please
> > don't merge this. This change could break existing users using devmem
> > TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
> > case.
> >
> > And if we have a bug, lets fix it in some way that doesn't deprecate
> > support for non page-aligned TX dmabufs. You may be breaking users
> > here.
> >
> > --
> > Thanks,
> > Mina
>
> Hi, Mina, note that the guard you're quoting doesn't cover the case
> I'm describing. It's in bytes against dmabuf->size, while tx_vec is
> sized dmabuf->size / PAGE_SIZE
> (truncating) -- there's a sub-page window where the check passes and
> the index doesn't.
>
> Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097:
>
> tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0).
> sendmsg with iov_base = 4096:
> virt_addr >= dmabuf->size -> 4096 >= 4097, passes.
> tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one.
>
> And the OOB pointer isn't just returned to the caller -- it flows
> through get_netmem() -> __get_netmem(), which dereferences it
> (net_is_devmem_iov() reads ->type) and
> on a matching byte refcounts the binding off of it. So this is a
> controlled OOB deref, not just a stray read.
>

Ah, I see. I think that is indeed the bug. In the case PAGE_SIZE=4096
and dmabuf->size is 4097, the intention in the code was to have tx_vec
be an array of 2, where tx_vec[0] is [1->4096] and tx_vec[1] is
[4097]. I have an off-by-one error in the tx_vec allocation :(

tx_vec[1] would contain an niov and as is stands we assume nvios are
PAGE_SIZE, but IIRC net_devmem_get_niov_at() would make sure that the
callers trying to use tx_vec[1] would only use it for a range that's
valid, so using [4097] and not a range like [4097->8192].

However when I dug deeper on proper pruning of page alignment
assumptions in net_devmem_bind_dmabuf, the problems are deeper than
this patch suggests. We don't properly handle the (probably
non-existent?) edge case where the dma-buf itself is page aligned but
for_each_sgtable_dma_sg itself gives us un-page_aligned sg entries :(

I think probably all of this is very theoretical. In practice probably
the dmabuf implementations in the wild seem to be page-aligned.
udmabuf doesn't support non-page-aligned dmabufs even so we can't add
tests for these things. So I guess fine, lets merge this.

> On breaking users: the partial-page tail was never TX-usable to
> begin with. The fill loop only populates num_niovs = len / PAGE_SIZE
> entries while
> gen_pool_add_owner() covers the full byte len, so any sendmsg into
> [num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either
> NULL or this OOB. It's not
> deprecating a working configuration -- it's rejecting one that wasn't working.
>

Yes, but still, today binding a non-PAGE_SIZE TX dmabuf but only using
actually sendmsging up to the last PAGE_SIZE boundary is working,
andthat would break entirely. I guess lets merge this and on the
offchance someone is hitting this edge case we can revisit.

Reviewed-by: Mina Almasry <almasrymina@xxxxxxxxxx>

I don't know if you're interested in also fixing the edge case where
the sg table entries themselves are not not page-aligned. I think that
also doesn't work properly?

--
Thanks,
Mina