Re: [PATCH net-next V6 2/3] net/mlx5e: Avoid copying payload to the skb's linear part

From: Dragos Tatulea

Date: Fri May 08 2026 - 09:39:53 EST

On 08.05.26 14:43, David Laight wrote:
> On Thu, 7 May 2026 12:53:29 +0300
> Tariq Toukan <tariqt@xxxxxxxxxx> wrote:
>
>> From: Christoph Paasch <cpaasch@xxxxxxxxxx>
>>
>> mlx5e_skb_from_cqe_mpwrq_nonlinear() copies MLX5E_RX_MAX_HEAD (256)
>> bytes from the page-pool to the skb's linear part. Those 256 bytes
>> include part of the payload.
>>
>> When attempting to do GRO in skb_gro_receive, if headlen > data_offset
>> (and skb->head_frag is not set), we end up aggregating packets in the
>> frag_list.
>>
>> This is of course not good when we are CPU-limited. Also causes a worse
>> skb->len/truesize ratio,...
>>
>> So, let's avoid copying parts of the payload to the linear part. We use
>> eth_get_headlen() to parse the headers and compute the length of the
>> protocol headers, which will be used to copy the relevant bits to the
>> skb's linear part.
>>
>> We still allocate MLX5E_RX_MAX_HEAD for the skb so that if the networking
>> stack needs to call pskb_may_pull() later on, we don't need to reallocate
>> memory.
>>
>> This gives a nice throughput increase (ARM Neoverse-V2 with CX-7 NIC and
>> LRO enabled):
>>
>> BEFORE:
>> =======
>> (netserver pinned to core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>> 87380 16384 262144 60.01 32547.82
>>
>> (netserver pinned to adjacent core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>> 87380 16384 262144 60.00 52531.67
>>
>> AFTER:
>> ======
>> (netserver pinned to core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,9 -P 0 -l 60 -- -m 256K -M 256K
>> 87380 16384 262144 60.00 52896.06
>>
>> (netserver pinned to adjacent core receiving interrupts)
>> $ netperf -H 10.221.81.118 -T 80,10 -P 0 -l 60 -- -m 256K -M 256K
>> 87380 16384 262144 60.00 85094.90
>>
>> Additional tests across a larger range of parameters w/ and w/o LRO, w/
>> and w/o IPv6-encapsulation, different MTUs (1500, 4096, 9000), different
>> TCP read/write-sizes as well as UDP benchmarks, all have shown equal or
>> better performance with this patch.
>>
>> Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
>> Reviewed-by: Saeed Mahameed <saeedm@xxxxxxxxxx>
>> Signed-off-by: Christoph Paasch <cpaasch@xxxxxxxxxx>
>> Signed-off-by: Dragos Tatulea <dtatulea@xxxxxxxxxx>
>> Signed-off-by: Tariq Toukan <tariqt@xxxxxxxxxx>
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> index 75ccf40a7f17..301b33419207 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> ...
>> @@ -2060,8 +2066,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
>> pagep->frags++;
>> while (++pagep < frag_page);
>>
>> - headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len,
>> - skb->data_len);
>> + headlen = min_t(u16, headlen - len, skb->data_len);
>
> That looks entirely broken.
> skb->data_len can be larger than 65535 so (u16)skb->data_len can
> discard significant bits.
>
> I can't quite see why the subtract can't overflow either.
> It is entirely non-obvious.
>
A check will be added for that.
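
For illustration only, here is a minimal userspace sketch of the two
concerns raised above: the u16 cast dropping high bits of data_len, and
the subtraction wrapping when len exceeds headlen. It is not the driver
code. MIN_T is a simplified stand-in for the kernel's min_t(), the
function names are hypothetical, and the 32-bit parameter types are an
assumption made to keep the example self-contained. The guarded variant
is one possible shape for the check, not necessarily what the next
version of the patch will contain.

```c
#include <stdint.h>

/*
 * Simplified userspace stand-in for the kernel's min_t(): both operands
 * are cast to 'type' before the comparison. Unlike the kernel macro, this
 * one evaluates its arguments more than once, so pass only side-effect
 * free expressions.
 */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/*
 * Same shape as the line under review:
 *     headlen = min_t(u16, headlen - len, skb->data_len);
 * Both the wrapped difference and data_len are truncated to 16 bits.
 */
static uint16_t headlen_with_u16_cast(uint32_t headlen, uint32_t len,
				      uint32_t data_len)
{
	return MIN_T(uint16_t, headlen - len, data_len);
}

/*
 * Hypothetical guarded variant: reject the case where the subtraction
 * would wrap, then compare at full width so no bits are discarded.
 */
static uint32_t headlen_guarded(uint32_t headlen, uint32_t len,
				uint32_t data_len)
{
	uint32_t room;

	if (len >= headlen)
		return 0;

	room = headlen - len;
	return room < data_len ? room : data_len;
}
```

With headlen = 256, len = 0 and data_len = 65546, the cast variant
compares 256 against (u16)65546 = 10 and returns 10, while the guarded
variant returns 256. With headlen = 64 and len = 100, the cast variant
sees a wrapped difference of 65500 and returns whatever data_len is (up
to that value), while the guarded variant returns 0.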

> There seem to be far too many u16 local variables in this code.
> Typically they just make the code larger because they require the
> compiler to mask arithmetic results to 16 bits all the time.
> (Only x86 and m68k have instructions for 8 and 16bit arithmetic.)
> The same is true for function parameters and results.
>
> I think all the min_t() in this file can easily be changed to min().
>
Will use min() here. For the rest of the datapath files, I will look
into a follow-up patch.

Thanks,
Dragos