Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part

From: Dragos Tatulea

Date: Fri May 08 2026 - 05:18:28 EST

On 07.05.26 22:50, Amery Hung wrote:
> On Thu, May 7, 2026 at 4:50 PM Dragos Tatulea <dtatulea@xxxxxxxxxx> wrote:
>>
>>
>> Hi Amery,
>>
>> On 07.05.26 15:53, Amery Hung wrote:
>>> [...]
>>> Am I understanding correctly that the better performance comes with
>>> the assumption that the XDP does not change headers?
>>>
>>> headlen is determined before the XDP program runs. If it push/pop
>>> headers, there could be headers in frags or data in the linear region
>>> after __pskb_pull_tail().
>>>
>> That's right.
>>
>>>> if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
>>>> struct mlx5e_frag_page *pfp;
>>>> @@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>>>> pagep->frags++;
>>>> while (++pagep < frag_page);
>>>>
>>>> - headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
>>>> - skb->data_len);
>>>> + headlen = min_t(u16, headlen - len, skb->data_len);
>>>
>>> headlen - len can underflow but will be capped by skb->data_len, so
>>> this should be okay, right?
>> It is safe. But it might trigger an extra allocation in the pull when
>> len > headlen. We could also skip the pull in that case. Or do a
>> min(headlen - len, min(skb->data_len, MLX5E_RX_MAX_HEAD)). WDYT?
>
> Make sense, but this line took me a bit to understand. Maybe consider
> checking len < headlen first?
>
> if (len < headlen) {
> headlen = min_t(u32, headlen - len, skb->data_len);
> __pskb_pull_tail(skb, headlen);
> }
>
Yes, that's what I had in mind when skipping the pull. I would also
tag this as likely.

> Another clarifying question. So this patch will improve the
> performance when the XDP programs don't change header length. For
> those that encap/decap, they should precisely pull only headers into
> the linear area for optimal performance. Is it correct?
>
Right for encap, but for decap not quite:

Let's say that the XDP program pulls 64B header into the linear part
and snips 4B of the encap out. This would result in a pull of an
additional 4B (headlen (64B) - len (60B) = 4B) which are now
data bytes => sub-optimal layout.

I don't see how we can improve this corner case though.

Thanks,
Dragos