Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part

From: Dragos Tatulea

Date: Sun May 10 2026 - 02:51:32 EST




On 08.05.26 20:42, Dragos Tatulea wrote:
>
>
> On 08.05.26 19:44, Amery Hung wrote:
>> On Fri, May 8, 2026 at 2:15 AM Dragos Tatulea <dtatulea@xxxxxxxxxx> wrote:
>>>
>>>
>>>
>>> On 07.05.26 22:50, Amery Hung wrote:
>>>> On Thu, May 7, 2026 at 4:50 PM Dragos Tatulea <dtatulea@xxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> Hi Amery,
>>>>>
>>>>> On 07.05.26 15:53, Amery Hung wrote:
>>>>>> [...]
>>>>>> Am I understanding correctly that the better performance comes with
>>>>>> the assumption that the XDP does not change headers?
>>>>>>
>>>>>> headlen is determined before the XDP program runs. If it push/pop
>>>>>> headers, there could be headers in frags or data in the linear region
>>>>>> after __pskb_pull_tail().
>>>>>>
>>>>> That's right.
>>>>>
>>>>>>> if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
>>>>>>> struct mlx5e_frag_page *pfp;
>>>>>>> @@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>>>>>>> pagep->frags++;
>>>>>>> while (++pagep < frag_page);
>>>>>>>
>>>>>>> - headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
>>>>>>> - skb->data_len);
>>>>>>> + headlen = min_t(u16, headlen - len, skb->data_len);
>>>>>>
>>>>>> headlen - len can underflow but will be capped by skb->data_len, so
>>>>>> this should be okay, right?
>>>>> It is safe. But it might trigger an extra allocation in the pull when
>>>>> len > headlen. We could also skip the pull in that case. Or do a
>>>>> min(headlen - len, min(skb->data_len, MLX5E_RX_MAX_HEAD)). WDYT?
>>>>
>>>> Make sense, but this line took me a bit to understand. Maybe consider
>>>> checking len < headlen first?
>>>>
>>>> if (len < headlen) {
>>>> headlen = min_t(u32, headlen - len, skb->data_len);
>>>> __pskb_pull_tail(skb, headlen);
>>>> }
>>>>
>>> Yes, that's what I had in mind when skipping the pull. I would also
>>> tag this as likely.
>>>
>>>> Another clarifying question. So this patch will improve the
>>>> performance when the XDP programs don't change header length. For
>>>> those that encap/decap, they should precisely pull only headers into
>>>> the linear area for optimal performance. Is it correct?
>>>>
>>> Right for encap, but for decap not quite:
>>>
>>> Let's say that the XDP program pulls 64B header into the linear part
>>> and snips 4B of the encap out. This would result in a pull of an
>>> additional 4B (headlen (64B) - len (60B) = 4B) which are now
>>> data bytes => sub-optimal layout.
>>>
>>> I don't see how we can improve this corner case though.
>>
>> I see. Thanks for the clarification.
>>
>> I think the "if (len < headlen)" makes too many assumptions about what
>> the XDP program did.
>>
>> How about this policy instead: If the XDP program did not create/pull
>> data into the linear area, pull the parsed headers; otherwise, assume
>> the XDP program owns the geometry. min() is still needed since the
>> program can shrink the packet.
>>
>> if (!len) {
>> headlen = min(headlen, skb->data_len);
>> __pskb_pull_tail(skb, headen);
>> }
>>
>> This preserves the optimization for the default no-modification case,
>> and most importantly allow XDP program to get the optimal performance
>> if it gets the final geometry right.
>>
> I like this. It will also save us some neurons next time we need to
> touch these lines.
>
Sashiko disagrees:

"""
If an XDP program changes the packet geometry by prepending data, len will
be greater than 0, which skips the __pskb_pull_tail() call entirely.
The resulting SKB's linear part will only contain the prepended data, with
the Ethernet headers remaining in the fragments.
When the network stack later calls eth_type_trans(), it unconditionally
pulls ETH_HLEN:
net/ethernet/eth.c:eth_type_trans() {
...
skb_pull_inline(skb, ETH_HLEN);
...
}
Could pulling 14 bytes from a smaller linear area cause skb->len to drop
below skb->data_len and trigger a BUG() in __skb_pull()?
"""

So I think we need an else where we preserve the old behavior:
if (!len)
headlen = min(headlen, skb->data_len);
else
headlen = min(MLX5E_RX_MAX_HEAD - len, skb->data_len);

__pskb_pull_tail(skb, headlen);

Thanks,
Dragos