Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length

From: David CARLIER

Date: Wed May 20 2026 - 15:34:13 EST

On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
>
> On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@xxxxxxxxx> wrote:
> >
> > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> > PAGE_SIZE multiples without checking:
> >
> > - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> >
> > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> > covers the full byte len, so a non-page-multiple non-final sg
> > desyncs num_niovs from the gen_pool region for every later sg, on
> > both RX and TX.
> >
> > dma-buf does not require page-aligned sizes, so the bind path has to
> > enforce what its own indexing assumes. Reject both with -EINVAL.
> >
> > The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> > SG-length check covers both directions.
> >
> > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: David Carlier <devnexen@xxxxxxxxx>
> > ---
> > Changes in v2:
> > - Reframe commit message around the kernel-side OOB instead of
> > "real exporters already page-align", which read as the OOB being
> > unreachable and undercut Cc: stable (Stanislav Fomichev).
> > - Hoist the SG-length check out of the if (TX) branch so it covers
> > RX too; RX has the same num_niovs / gen_pool desync on a
> > contract-violating exporter, just without an OOB. Keep the
> > size-multiple check TX-only (Stanislav Fomichev).
> > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> > match the existing call site at the tx_vec[] assignment
> > (Bobby Eshleman).
> >
> > net/core/devmem.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index 468344739db2..4f71de44c0fb 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > }
> >
> > if (direction == DMA_TO_DEVICE) {
> > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> > + err = -EINVAL;
> > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> > + goto err_unmap;
> > + }
> > binding->tx_vec = kvmalloc_objs(struct net_iov *,
> > dmabuf->size / PAGE_SIZE);
> > if (!binding->tx_vec) {
> > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > size_t len = sg_dma_len(sg);
> > struct net_iov *niov;
> >
> > + if (!IS_ALIGNED(len, PAGE_SIZE)) {
> > + err = -EINVAL;
> > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> > + goto err_free_chunks;
> > + }
> > +
> > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> > dev_to_node(&dev->dev));
> > if (!owner) {
> > --
> > 2.53.0
> >
>
> No hold on, I don't think we actually have a bug here. AFAIR all
> you're describing is intionional . Yes the TX vectors and their niov
> arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.
>
> But net_devmem_get_niov_at has this check that prevents us from trying
> to send past the dma-buf size, even if it's not page_aligned:
>
> ```
> if (virt_addr >= binding->dmabuf->size)
> return NULL;
> ```
>
> IIRC the NULL should be bubbled up to the user as some error.
>
> Please double check that we actually have a bug here. If not, please
> don't merge this. This change could break existing users using devmem
> TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
> case.
>
> And if we have a bug, lets fix it in some way that doesn't deprecate
> support for non page-aligned TX dmabufs. You may be breaking users
> here.
>
> --
> Thanks,
> Mina

Hi, Mina, note that the guard you're quoting doesn't cover the case
I'm describing. It's in bytes against dmabuf->size, while tx_vec is
sized dmabuf->size / PAGE_SIZE
(truncating) -- there's a sub-page window where the check passes and
the index doesn't.

Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097:

tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0).
sendmsg with iov_base = 4096:
virt_addr >= dmabuf->size -> 4096 >= 4097, passes.
tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one.

And the OOB pointer isn't just returned to the caller -- it flows
through get_netmem() -> __get_netmem(), which dereferences it
(net_is_devmem_iov() reads ->type) and
on a matching byte refcounts the binding off of it. So this is a
controlled OOB deref, not just a stray read.

On breaking users: the partial-page tail was never TX-usable to
begin with. The fill loop only populates num_niovs = len / PAGE_SIZE
entries while
gen_pool_add_owner() covers the full byte len, so any sendmsg into
[num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either
NULL or this OOB. It's not
deprecating a working configuration -- it's rejecting one that wasn't working.

Also worth flagging: the second hunk (sg_dma_len alignment) is the
RX path too. net_devmem_get_niov_at() is TX-only, so the guard you
quoted doesn't apply there, and
the num_niovs vs gen_pool_add_owner(len) desync hits every non-final
SG with a sub-page length, both directions.

If you'd prefer to keep non-aligned binds legal, an alternative
would be tightening the guard in net_devmem_get_niov_at() to virt_addr
>= num_niovs * PAGE_SIZE
instead of >= dmabuf->size. Happy to spin that as v3 -- but it
doesn't help the SG-length leg, which still needs rejecting at bind
time.

Cheers.