Re: [PATCH bpf 6/6] net: enetc: use truesize as XDP RxQ info frag_size

From: Vladimir Oltean

Date: Thu Feb 05 2026 - 08:41:23 EST


On Thu, Feb 05, 2026 at 02:23:15PM +0100, Larysa Zaremba wrote:
> On Thu, Feb 05, 2026 at 02:46:38PM +0200, Vladimir Oltean wrote:
> > On Thu, Feb 05, 2026 at 01:41:03PM +0100, Larysa Zaremba wrote:
> > > On Thu, Feb 05, 2026 at 02:29:53PM +0200, Vladimir Oltean wrote:
> > > > On Wed, Feb 04, 2026 at 05:34:01PM -0800, Jakub Kicinski wrote:
> > > > > On Thu, 5 Feb 2026 02:59:01 +0200 Vladimir Oltean wrote:
> > > > > > Thanks! This is an extremely subtle corner case. I appreciate the patch
> > > > > > and explanation.
> > > > > >
> > > > > > I did run tests on the blamed commit (which I still have), but catching
> > > > > > a real issue in a meaningful way would have required a program which
> > > > > > calls bpf_xdp_adjust_tail() with a very large offset.
> > > > > > I note that I'm seeing the WARN_ON() much more easily after your fix,
> > > > > > but before, it was mostly inconsequential in practical cases.
> > > > > >
> > > > > > Namely, the ENETC truesize is 2048, and XDP_PACKET_HEADROOM is 256.
> > > > > > First buffers also contain the skb_shared_info (320 bytes), while
> > > > > > subsequent buffers don't.
> > > > >
> > > > > I can't wrap my head around this series, hope you can tell me where I'm
> > > > > going wrong. AFAICT enetc splits the page into two halves for small MTU.
> > > > >
> > > > > So we have
> > > > >
> > > > > | 2k | 2k |
> > > > > ----------------------------- -----------------------------
> > > > > | hroom | data | troom/shinfo | hroom | data | troom/shinfo |
> > > > > ----------------------------- -----------------------------
> > > > >
> > > > > If we attach the second chunk as a frag we'll have:
> > > > > offset = 2k + hroom
> > > > > size = data.len
> > > > > But we use
> > > > > truesize / frag_size = 2k
> > > > > so
> > > > > tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
> > > > > tailroom = 2k - data.len - (2k + hroom)
> > > > > tailroom = -(data.len + hroom)
> > > > > WARN(tailroom < 0) -> yes
> > > > >
> > > > > The frag_size thing is unusable for any driver that doesn't hand out
> > > > > full pages to frags?
> > > >
> > > > This is an excellent question.
> > > >
> > > > Yes, you're right, bpf_xdp_frags_increase_tail() only has a 50% chance
> > > > of working - the paged data has to be in the first half of the page,
> > > > otherwise the tailroom calculations are not correct due to rxq->frag_size,
> > > > and the WARN_ON() will trigger.
> > > >
> > > > The reason why I didn't notice this during my testing is stupid. I was
> > > > attaching the BPF program to the interface and then detaching it after
> > > > each test, and each test was sending less than the RX ring size (2048)
> > > > worth of packets. So all multi-buffer frames were using buffers which
> > > > were fresh out of enetc_setup_rxbdr() -> ... -> enetc_new_page() (first
> > > > halves) and never out of flipped pages (enetc_bulk_flip_buff()).
> > > >
> > > > This seems to be a good reason to convert this driver to use page pool,
> > > > which I can look into. I'm not sure that there's anything that can be
> > > > done to make the rxq->frag_size mechanism compatible with the current
> > > > buffer allocation scheme.
> > >
> > > I was just about to send an answer.
> > >
> > > Seems like my mistake here. I actually think adjusting the tail should work
> > > if we set rxq->frag_size to PAGE_SIZE in enetc and to i40e_rx_pg_size() in
> > > i40e, and not to (PAGE_SIZE / 2), as I did at first, but in that case
> > > calling it frag_size is just utterly wrong. Glad Jakub pointed this out.
> >
> > I mean, it should "work" given the caveat that calling bpf_xdp_adjust_tail()
> > on a first-half page buffer with a large offset risks leaking into the
> > second half, which may also be in use, and this will go undetected, right?
> > Although the practical chances of that happening are low, the requested
> > offset would still need to be on the order of hundreds.
>
> Oh, I did get carried away there...
> Well, one thing is the shared-page memory model in enetc and i40e; another is
> xsk_buff_pool, where the chunk size can be between 2K and PAGE_SIZE. What about
>
> tailroom = rxq->frag_size - skb_frag_size(frag) -
>            (skb_frag_off(frag) % rxq->frag_size);
>
> When frag_size is set to 2K and headroom is, say, 256, the aligned DMA write
> size is 1420.
> last frag at the start of the page: offset=256, size<=1420
> tailroom >= 2K - 1420 - 256 = 372
> last frag in the middle of the page: offset=256+2K, size<=1420
> tailroom >= 2K - 1420 - ((256 + 2K) % 2K) = 372
>
> And for drivers that do not fragment pages for multi-buffer packets, nothing
> changes, since offset is always less than rxq->frag_size.
>
> This brings us back to rxq->frag_size being half a page for enetc and i40e,
> and it seems like in ZC mode it should be pool->chunk_size to work properly.

With skb_frag_off() taken into account modulo 2K for the tailroom
calculation, I can confirm bpf_xdp_frags_increase_tail() works well for
ENETC. I haven't fully considered the side effects, though.