Re: [PATCH net v2] net: devmem: reject dma-buf bind with non-page-aligned size or SG length

From: David CARLIER

Date: Wed May 20 2026 - 17:03:56 EST


On Wed, 20 May 2026 at 21:42, Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
>
> On Wed, May 20, 2026 at 12:28 PM David CARLIER <devnexen@xxxxxxxxx> wrote:
> >
> > On Wed, 20 May 2026 at 19:55, Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
> > >
> > > On Tue, May 19, 2026 at 1:35 PM David Carlier <devnexen@xxxxxxxxx> wrote:
> > > >
> > > > net_devmem_bind_dmabuf() trusts dmabuf->size and sg_dma_len() to be
> > > > PAGE_SIZE multiples without checking:
> > > >
> > > > - tx_vec is sized dmabuf->size / PAGE_SIZE, and
> > > > net_devmem_get_niov_at() only bounds-checks virt_addr < dmabuf->size
> > > > before indexing tx_vec[virt_addr / PAGE_SIZE]. With size =
> > > > N*PAGE_SIZE + r (1 <= r < PAGE_SIZE), sendmsg() at iov_base =
> > > > N*PAGE_SIZE passes the bound check and reads tx_vec[N] -- one past.
> > > >
> > > > - owner->area.num_niovs = len / PAGE_SIZE while gen_pool_add_owner()
> > > > covers the full byte len, so a non-page-multiple non-final sg
> > > > desyncs num_niovs from the gen_pool region for every later sg, on
> > > > both RX and TX.
> > > >
> > > > dma-buf does not require page-aligned sizes, so the bind path has to
> > > > enforce what its own indexing assumes. Reject both with -EINVAL.
> > > >
> > > > The size check is TX-only (only tx_vec is sized off dmabuf->size); the
> > > > SG-length check covers both directions.
> > > >
> > > > Fixes: bd61848900bf ("net: devmem: Implement TX path")
> > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > Signed-off-by: David Carlier <devnexen@xxxxxxxxx>
> > > > ---
> > > > Changes in v2:
> > > > - Reframe commit message around the kernel-side OOB instead of
> > > > "real exporters already page-align", which read as the OOB being
> > > > unreachable and undercut Cc: stable (Stanislav Fomichev).
> > > > - Hoist the SG-length check out of the if (TX) branch so it covers
> > > > RX too; RX has the same num_niovs / gen_pool desync on a
> > > > contract-violating exporter, just without an OOB. Keep the
> > > > size-multiple check TX-only (Stanislav Fomichev).
> > > > - Drop bool todevice; compare direction == DMA_TO_DEVICE inline to
> > > > match the existing call site at the tx_vec[] assignment
> > > > (Bobby Eshleman).
> > > >
> > > > net/core/devmem.c | 11 +++++++++++
> > > > 1 file changed, 11 insertions(+)
> > > >
> > > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > > index 468344739db2..4f71de44c0fb 100644
> > > > --- a/net/core/devmem.c
> > > > +++ b/net/core/devmem.c
> > > > @@ -241,6 +241,11 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > > }
> > > >
> > > > if (direction == DMA_TO_DEVICE) {
> > > > + if (!IS_ALIGNED(dmabuf->size, PAGE_SIZE)) {
> > > > + err = -EINVAL;
> > > > + NL_SET_ERR_MSG(extack, "TX dma-buf size must be a multiple of PAGE_SIZE");
> > > > + goto err_unmap;
> > > > + }
> > > > binding->tx_vec = kvmalloc_objs(struct net_iov *,
> > > > dmabuf->size / PAGE_SIZE);
> > > > if (!binding->tx_vec) {
> > > > @@ -267,6 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > > size_t len = sg_dma_len(sg);
> > > > struct net_iov *niov;
> > > >
> > > > + if (!IS_ALIGNED(len, PAGE_SIZE)) {
> > > > + err = -EINVAL;
> > > > + NL_SET_ERR_MSG(extack, "dma-buf SG length must be PAGE_SIZE aligned");
> > > > + goto err_free_chunks;
> > > > + }
> > > > +
> > > > owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
> > > > dev_to_node(&dev->dev));
> > > > if (!owner) {
> > > > --
> > > > 2.53.0
> > > >
> > >
> > > No hold on, I don't think we actually have a bug here. AFAIR all
> > > you're describing is intionional . Yes the TX vectors and their niov
> > > arrays do implicitly 'pad' the dmabuf size to PAGE_SIZE.
> > >
> > > But net_devmem_get_niov_at has this check that prevents us from trying
> > > to send past the dma-buf size, even if it's not page_aligned:
> > >
> > > ```
> > > if (virt_addr >= binding->dmabuf->size)
> > > return NULL;
> > > ```
> > >
> > > IIRC the NULL should be bubbled up to the user as some error.
> > >
> > > Please double check that we actually have a bug here. If not, please
> > > don't merge this. This change could break existing users using devmem
> > > TX correctly with non-PAGE_SIZE aligned dmabufs, which is a valid use
> > > case.
> > >
> > > And if we have a bug, lets fix it in some way that doesn't deprecate
> > > support for non page-aligned TX dmabufs. You may be breaking users
> > > here.
> > >
> > > --
> > > Thanks,
> > > Mina
> >
> > Hi, Mina, note that the guard you're quoting doesn't cover the case
> > I'm describing. It's in bytes against dmabuf->size, while tx_vec is
> > sized dmabuf->size / PAGE_SIZE
> > (truncating) -- there's a sub-page window where the check passes and
> > the index doesn't.
> >
> > Concretely, PAGE_SIZE = 4096 and dmabuf->size = 4097:
> >
> > tx_vec allocated for 4097 / 4096 = 1 entry (valid index 0).
> > sendmsg with iov_base = 4096:
> > virt_addr >= dmabuf->size -> 4096 >= 4097, passes.
> > tx_vec[virt_addr / PAGE_SIZE] -> tx_vec[1], OOB by one.
> >
> > And the OOB pointer isn't just returned to the caller -- it flows
> > through get_netmem() -> __get_netmem(), which dereferences it
> > (net_is_devmem_iov() reads ->type) and
> > on a matching byte refcounts the binding off of it. So this is a
> > controlled OOB deref, not just a stray read.
> >
>
> Ah, I see. I think that is indeed the bug. In the case PAGE_SIZE=4096
> and dmabuf->size is 4097, the intention in the code was to have tx_vec
> be an array of 2, where tx_vec[0] is [1->4096] and tx_vec[1] is
> [4097]. I have an off-by-one error in the tx_vec allocation :(
>
> tx_vec[1] would contain an niov and as is stands we assume nvios are
> PAGE_SIZE, but IIRC net_devmem_get_niov_at() would make sure that the
> callers trying to use tx_vec[1] would only use it for a range that's
> valid, so using [4097] and not a range like [4097->8192].
>
> However when I dug deeper on proper pruning of page alignment
> assumptions in net_devmem_bind_dmabuf, the problems are deeper than
> this patch suggests. We don't properly handle the (probably
> non-existent?) edge case where the dma-buf itself is page aligned but
> for_each_sgtable_dma_sg itself gives us un-page_aligned sg entries :(
>
> I think probably all of this is very theoretical. In practice probably
> the dmabuf implementations in the wild seem to be page-aligned.
> udmabuf doesn't support non-page-aligned dmabufs even so we can't add
> tests for these things. So I guess fine, lets merge this.
>
> > On breaking users: the partial-page tail was never TX-usable to
> > begin with. The fill loop only populates num_niovs = len / PAGE_SIZE
> > entries while
> > gen_pool_add_owner() covers the full byte len, so any sendmsg into
> > [num_niovs*PAGE_SIZE, dmabuf->size) was already heading for either
> > NULL or this OOB. It's not
> > deprecating a working configuration -- it's rejecting one that wasn't working.
> >
>
> Yes, but still, today binding a non-PAGE_SIZE TX dmabuf but only using
> actually sendmsging up to the last PAGE_SIZE boundary is working,
> andthat would break entirely. I guess lets merge this and on the
> offchance someone is hitting this edge case we can revisit.
>
> Reviewed-by: Mina Almasry <almasrymina@xxxxxxxxxx>
>
> I don't know if you're interested in also fixing the edge case where
> the sg table entries themselves are not not page-aligned. I think that
> also doesn't work properly?
>
> --
> Thanks,
> Mina

Thanks Mina, appreciate the review.

On the SG side -- the second hunk in this patch already rejects any
SG with !IS_ALIGNED(sg_dma_len(sg), PAGE_SIZE) at bind time, so the
num_niovs vs
gen_pool_add_owner(len) desync I described in the changelog is
covered for both RX and TX.

What's not checked is the sg_dma_address() alignment itself -- an SG
with a page-multiple length but a sub-page start offset would still
slip through and produce
niovs whose dma addresses straddle pages. Like you said, it's
theoretical (in-tree exporters and udmabuf all hand back page-aligned
addresses), but happy to send a
follow-up as a separate fix if you'd like.

Cheers.