Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part

From: David Laight

Date: Fri May 08 2026 - 08:44:28 EST


On Thu, 7 May 2026 12:53:29 +0300
Tariq Toukan <tariqt@xxxxxxxxxx> wrote:

> From: Christoph Paasch <cpaasch@xxxxxxxxxx>
>
> mlx5e_skb_from_cqe_mpwrq_nonlinear() copies MLX5E_RX_MAX_HEAD (256)
> bytes from the page-pool to the skb's linear part. Those 256 bytes
> include part of the payload.
>
> When attempting to do GRO in skb_gro_receive, if headlen > data_offset
> (and skb->head_frag is not set), we end up aggregating packets in the
> frag_list.
>
> This is of course not good when we are CPU-limited. Also causes a worse
> skb->len/truesize ratio,...
>
> So, let's avoid copying parts of the payload to the linear part. We use
> eth_get_headlen() to parse the headers and compute the length of the
> protocol headers, which will be used to copy the relevant bits of the
> skb's linear part.
>
> We still allocate MLX5E_RX_MAX_HEAD for the skb so that if the networking
> stack needs to call pskb_may_pull() later on, we don't need to reallocate
> memory.
>
> This gives a nice throughput increase (ARM Neoverse-V2 with CX-7 NIC and
> LRO enabled):
>
> BEFORE:
> =======
> (netserver pinned to core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
> 87380 16384 262144 60.01 32547.82
>
> (netserver pinned to adjacent core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
> 87380 16384 262144 60.00 52531.67
>
> AFTER:
> ======
> (netserver pinned to core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
> 87380 16384 262144 60.00 52896.06
>
> (netserver pinned to adjacent core receiving interrupts)
> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
> 87380 16384 262144 60.00 85094.90
>
> Additional tests across a larger range of parameters w/ and w/o LRO, w/
> and w/o IPv6-encapsulation, different MTUs (1500, 4096, 9000), different
> TCP read/write-sizes as well as UDP benchmarks, all have shown equal or
> better performance with this patch.
>
> Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Reviewed-by: Saeed Mahameed <saeedm@xxxxxxxxxx>
> Signed-off-by: Christoph Paasch <cpaasch@xxxxxxxxxx>
> Signed-off-by: Dragos Tatulea <dtatulea@xxxxxxxxxx>
> Signed-off-by: Tariq Toukan <tariqt@xxxxxxxxxx>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 75ccf40a7f17..301b33419207 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
...
> @@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> pagep->frags++;
> while (++pagep < frag_page);
>
> - headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
> - skb->data_len);
> + headlen = min_t(u16, headlen - len, skb->data_len);

That looks entirely broken.
skb->data_len can be larger than 65535 so (u16)skb->data_len can
discard significant bits.

I can't quite see why the subtract can't overflow either.
It is entirely non-obvious.

There seem to be far too many u16 local variables in this code.
Typically they just make the code larger because they require the
compiler mask arithmetic results to 16bits all the time.
(Only x86 and m68k have instructions for 8 and 16bit arithmetic.)
The same is true for function parameters and results.

I think all the min_t() in this file can easily be changed to min().

-- David

> __pskb_pull_tail(skb, headlen);
> }
> } else {