Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part

From: Amery Hung

Date: Mon May 11 2026 - 19:09:17 EST


On Sat, May 9, 2026 at 11:51 PM Dragos Tatulea <dtatulea@xxxxxxxxxx> wrote:
>
>
>
> On 08.05.26 20:42, Dragos Tatulea wrote:
> >
> >
> > On 08.05.26 19:44, Amery Hung wrote:
> >> On Fri, May 8, 2026 at 2:15 AM Dragos Tatulea <dtatulea@xxxxxxxxxx> wrote:
> >>>
> >>>
> >>>
> >>> On 07.05.26 22:50, Amery Hung wrote:
> >>>> On Thu, May 7, 2026 at 4:50 PM Dragos Tatulea <dtatulea@xxxxxxxxxx> wrote:
> >>>>>
> >>>>>
> >>>>> Hi Amery,
> >>>>>
> >>>>> On 07.05.26 15:53, Amery Hung wrote:
> >>>>>> [...]
> >>>>>> Am I understanding correctly that the better performance comes with
> >>>>>> the assumption that the XDP does not change headers?
> >>>>>>
> >>>>>> headlen is determined before the XDP program runs. If it push/pop
> >>>>>> headers, there could be headers in frags or data in the linear region
> >>>>>> after __pskb_pull_tail().
> >>>>>>
> >>>>> That's right.
> >>>>>
> >>>>>>> if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
> >>>>>>> struct mlx5e_frag_page *pfp;
> >>>>>>> @@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >>>>>>> pagep->frags++;
> >>>>>>> while (++pagep < frag_page);
> >>>>>>>
> >>>>>>> - headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
> >>>>>>> - skb->data_len);
> >>>>>>> + headlen = min_t(u16, headlen - len, skb->data_len);
> >>>>>>
> >>>>>> headlen - len can underflow but will be capped by skb->data_len, so
> >>>>>> this should be okay, right?
> >>>>> It is safe. But it might trigger an extra allocation in the pull when
> >>>>> len > headlen. We could also skip the pull in that case. Or do a
> >>>>> min(headlen - len, min(skb->data_len, MLX5E_RX_MAX_HEAD)). WDYT?
> >>>>
> >>>> Make sense, but this line took me a bit to understand. Maybe consider
> >>>> checking len < headlen first?
> >>>>
> >>>> if (len < headlen) {
> >>>> headlen = min_t(u32, headlen - len, skb->data_len);
> >>>> __pskb_pull_tail(skb, headlen);
> >>>> }
> >>>>
> >>> Yes, that's what I had in mind when skipping the pull. I would also
> >>> tag this as likely.
> >>>
> >>>> Another clarifying question. So this patch will improve the
> >>>> performance when the XDP programs don't change header length. For
> >>>> those that encap/decap, they should precisely pull only headers into
> >>>> the linear area for optimal performance. Is it correct?
> >>>>
> >>> Right for encap, but for decap not quite:
> >>>
> >>> Let's say that the XDP program pulls 64B header into the linear part
> >>> and snips 4B of the encap out. This would result in a pull of an
> >>> additional 4B (headlen (64B) - len (60B) = 4B) which are now
> >>> data bytes => sub-optimal layout.
> >>>
> >>> I don't see how we can improve this corner case though.
> >>
> >> I see. Thanks for the clarification.
> >>
> >> I think the "if (len < headlen)" makes too many assumptions about what
> >> the XDP program did.
> >>
> >> How about this policy instead: If the XDP program did not create/pull
> >> data into the linear area, pull the parsed headers; otherwise, assume
> >> the XDP program owns the geometry. min() is still needed since the
> >> program can shrink the packet.
> >>
> >> if (!len) {
> >> headlen = min(headlen, skb->data_len);
> >> __pskb_pull_tail(skb, headen);
> >> }
> >>
> >> This preserves the optimization for the default no-modification case,
> >> and most importantly allow XDP program to get the optimal performance
> >> if it gets the final geometry right.
> >>
> > I like this. It will also save us some neurons next time we need to
> > touch these lines.
> >
> Sashiko disagrees:
>
> """
> If an XDP program changes the packet geometry by prepending data, len will
> be greater than 0, which skips the __pskb_pull_tail() call entirely.
> The resulting SKB's linear part will only contain the prepended data, with
> the Ethernet headers remaining in the fragments.
> When the network stack later calls eth_type_trans(), it unconditionally
> pulls ETH_HLEN:
> net/ethernet/eth.c:eth_type_trans() {
> ...
> skb_pull_inline(skb, ETH_HLEN);
> ...
> }
> Could pulling 14 bytes from a smaller linear area cause skb->len to drop
> below skb->data_len and trigger a BUG() in __skb_pull()?
> """
>
> So I think we need an else where we preserve the old behavior:
> if (!len)
> headlen = min(headlen, skb->data_len);
> else
> headlen = min(MLX5E_RX_MAX_HEAD - len, skb->data_len);
>
> __pskb_pull_tail(skb, headlen);

I see. I am okay with the fallback.

One last question: if the fallback is mainly to preserve the minimum
linear head needed by eth_type_trans(), can we make that explicit
instead?

unsigned int pull_len = 0;

if (!len)
pull_len = headlen;
else if (len < ETH_HLEN)
pull_len = ETH_HLEN - len;

if (pull_len)
__pskb_pull_tail(skb, min(pull_len, skb->data_len));

This keeps the parsed-header pull for the fully nonlinear case, but once
XDP leaves some linear data, the driver only pulls enough to satisfy the
Ethernet header invariant and otherwise leaves the final geometry to XDP.


>
> Thanks,
> Dragos