Re: [Intel-wired-lan] [PATCH net v2] ice: Fix missing 1's complement negation in GCS raw checksum

From: Jacob Keller

Date: Thu May 07 2026 - 17:57:30 EST


On 5/7/2026 2:34 AM, Matt Fleming wrote:
> On Mon, May 04, 2026 at 05:10:23PM -0700, Jacob Keller wrote:
>>
>> Hi,
>>
>> Based on your patch description, I assume that you've tested this on
>> real hardware.
>>
>> I dug a little through some of our internal changes history and sawe
>> that it looks like the hardware has a register setting in its
>> GL_RDPU_CNTRL register which determines whether the checksum value
>> reported is inverted or not. In E830 hardware, it is supposed to be off
>> (i.e. the checksum value reported already matches the expected setting.
>>
>> Perhaps your device somehow got the GL_RDPU_CNTRL register set to the
>> wrong mode and that results in the swap being necessary. Hmm.
>>
>> I'll ask the team to see if they can confirm this behavior.
>
> Hi Jake,
>
> Thanks for digging into this.
>

I'm still trying to see if I can get someone on our team with hardware
setup to confirm this behavior.

> I read GL_RDPU_CNTRL on our affected E830 and the value is the same on
> both ports of the NIC:
>
> 0000:c1:00.0: GL_RDPU_CNTRL = 0x0020a275
> 0000:c1:00.1: GL_RDPU_CNTRL = 0x0020a275
>

Ok. Makes sense.

> Decoding bit 22 (E830_GL_RDPU_CNTRL_CHECKSUM_COMPLETE_INV) gives 0,
> i.e. the hardware is supposedly in "not inverted" mode, which matches
> the default you described.
>
I wonder if it would actually work if we set the bit to 1. It could be
that there was miscommunication between us and the hardware folks so
what "inverted" means got.. inverted. (pun intended).

> However, looking at the data on the wire I see:
>
> - netdev_rx_csum_fault fires ~65 000 times/sec on this host.
> - bpftrace at fexit:ice_process_skb_fields shows skb->csum =
> swab16(raw_csum) directly (no negation), e.g. raw_csum=0xfb4f
> -> skb->csum=0x4ffb.
> - At fentry:__skb_checksum_complete the upper 16 bits of skb->csum
> are 0xFFFF on every TCP/UDP packet -- the signature of nf_ip_checksum
> adding the pseudo-header to a value that was the un-negated raw_csum.
> - fold2(skb->csum_at_fentry + skb_checksum(skb,0,len,0)) ≈ 0xFFFF
> for every packet, which means the two values are ones-complement
> complements of each other, i.e. the driver stored S where the
> stack expects ~S.
>
> Negating the checksum makes the failures go away.
>

Yea. Clearly we're inverting the checksum relative to what the stack wants.

I'm also curious why our validation folks haven't noticed or
complained.. but I think it may be partially because we have to disable
the GSC support when operating TSO offload, so TCP traffic won't be able
to replicate this. Hmm... Oh....

I wonder if the way we distinguish between these modes isn't per-flow
but instead per-device.. so if you have NETIF_F_ALL_TSO enabled you
won't get NETIF_F_HW_CSUM, which means that we'd go through the old
legacy path... Ugh. That might explain how this escaped testing :(

I've queued your v2 to dev-queue, so at the very least we can get some
validation and confirm the behavior.

> Thanks,
> Matt