Re: [PATCH net-next v2] af-packet: new flag to indicate all csums are good

From: Victor Julien
Date: Fri Jun 05 2020 - 08:38:20 EST


On 04-06-2020 15:48, Willem de Bruijn wrote:
> On Thu, Jun 4, 2020 at 5:47 AM Victor Julien <victor@xxxxxxxxxxxx> wrote:
>>
>> On 02-06-2020 22:18, Willem de Bruijn wrote:
>>> On Tue, Jun 2, 2020 at 4:05 PM Victor Julien <victor@xxxxxxxxxxxx> wrote:
>>>>
>>>> On 02-06-2020 21:38, Willem de Bruijn wrote:
>>>>> On Tue, Jun 2, 2020 at 3:22 PM Victor Julien <victor@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 02-06-2020 21:03, Willem de Bruijn wrote:
>>>>>>> On Tue, Jun 2, 2020 at 2:31 PM Victor Julien <victor@xxxxxxxxxxxx> wrote:
>>>>>>>> On 02-06-2020 19:37, Willem de Bruijn wrote:
>>>>>>>>> On Tue, Jun 2, 2020 at 1:03 PM Victor Julien <victor@xxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> On 02-06-2020 16:29, Willem de Bruijn wrote:
>>>>>>>>>>> On Tue, Jun 2, 2020 at 4:05 AM Victor Julien <victor@xxxxxxxxxxxx> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Introduce a new flag (TP_STATUS_CSUM_UNNECESSARY) to indicate
>>>>>>>>>>>> that the driver has completely validated the checksums in the packet.
>>>>>>>>>>>>
>>>>>>>>>>>> The TP_STATUS_CSUM_UNNECESSARY flag differs from TP_STATUS_CSUM_VALID
>>>>>>>>>>>> in that the new flag will only be set if all the layers are valid,
>>>>>>>>>>>> while TP_STATUS_CSUM_VALID is set as well if only the IP layer is valid.
>>>>>>>>>>>
>>>>>>>>>>> transport, not ip checksum.
>>>>>>>>>>
>>>>>>>>>> Allow me a n00b question: what does transport refer to here? Things like
>>>>>>>>>> ethernet? It isn't clear to me from the doc.
>>>>>>>>>
>>>>>>>>> The TCP/UDP/.. transport protocol checksum.
>>>>>>>>
>>>>>>>> Hmm that is what I thought originally, but then it didn't seem to work.
>>>>>>>> Hence my patch.
>>>>>>>>
>>>>>>>> However I just redid my testing. I took the example tpacketv3 program
>>>>>>>> and added the status flag checks to the 'display()' func:
>>>>>>>>
>>>>>>>> if (ppd->tp_status & TP_STATUS_CSUM_VALID) {
>>>>>>>> printf("TP_STATUS_CSUM_VALID, ");
>>>>>>>> }
>>>>>>>> if (ppd->tp_status & (1<<8)) {
>>>>>>>> printf("TP_STATUS_CSUM_UNNECESSARY, ");
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> Then using scapy sent some packets in 2 variants:
>>>>>>>> - default (good csums)
>>>>>>>> - deliberately bad csums
>>>>>>>> (then also added a few things like ip6 over ip)
>>>>>>>>
>>>>>>>>
>>>>>>>> srp1(Ether()/IP(src="1.2.3.4", dst="5.6.7.8")/IPv6()/TCP(),
>>>>>>>> iface="enp1s0") // good csums
>>>>>>>>
>>>>>>>> srp1(Ether()/IP(src="1.2.3.4", dst="5.6.7.8")/IPv6()/TCP(chksum=1),
>>>>>>>> iface="enp1s0") //bad tcp
>>>>>>>
>>>>>>> Is this a test between two machines? What is the device driver of the
>>>>>>> machine receiving and printing the packet? It would be helpful to know
>>>>>>> whether this uses CHECKSUM_COMPLETE or CHECKSUM_UNNECESSARY.
>>>>>>
>>>>>> Yes 2 machines, or actually 2 machines and a VM. The receiving Linux
>>>>>> sits in a kvm vm with network pass through and uses the virtio driver
>>>>>> (host uses e1000e). Based on a quick 'git grep CHECKSUM_UNNECESSARY'
>>>>>> virtio seems to support that.
>>>>>>
>>>>>> I've done some more tests. In a pcap replay that I know contains packet
>>>>>> with bad TCP csums (but good IP csums for those pkts), to a physical
>>>>>> host running Ubuntu Linux kernel 5.3:
>>>>>>
>>>>>> - receiver uses nfp (netronome) driver: TP_STATUS_CSUM_VALID set for
>>>>>> every packet, including the bad TCP ones
>>>>>> - receiver uses ixgbe driver: TP_STATUS_CSUM_VALID not set for the bad
>>>>>> packets.
>>>>>
>>>>> Great. Thanks a lot for running all these experiments.
>>>>>
>>>>> We might have to drop the TP_STATUS_CSUM_VALID with CHECKSUM_COMPLETE
>>>>> unless skb->csum_valid.
>>>>>
>>>>> For packets with multiple transport layer checksums,
>>>>> CHECKSUM_UNNECESSARY should mean that all have been verified.
>>>>>
>>>>> I believe that in the case of multiple transport headers, csum_valid
>>>>> similarly ensures all checksums up to csum_start are valid. Will need
>>>>> to double check.
>>>>>
>>>>> If so, there probably is no need for a separate new TP_STATUS.
>>>>> TP_STATUS_CSUM_VALID is reported only when all checksums are valid.
>>>>
>>>> So if I understand you correctly the key may be in the call to
>>>> `skb_csum_unnecessary`:
>>>>
>>>> That reads:
>>>>
>>>> static inline int skb_csum_unnecessary(const struct sk_buff *skb)
>>>> {
>>>> return ((skb->ip_summed == CHECKSUM_UNNECESSARY) ||
>>>> skb->csum_valid ||
>>>> (skb->ip_summed == CHECKSUM_PARTIAL &&
>>>> skb_checksum_start_offset(skb) >= 0));
>>>> }
>>>>
>>>> But really only the first 2 conditions are reachable
>>>
>>> .. from this codepath. That function is called in other codepaths as well.
>>>
>>>> , as we already know
>>>> skb->ip_summed is not CHECKSUM_PARTIAL when we call it.
>>>>
>>>> So our unmodified check is:
>>>>
>>>> else if (skb->pkt_type != PACKET_OUTGOING &&
>>>> (skb->ip_summed == CHECKSUM_COMPLETE ||
>>>> skb->ip_summed == CHECKSUM_UNNECESSARY ||
>>>> skb->csum_valid))
>>>>
>>>> Should this become something like:
>>>>
>>>> else if (skb->pkt_type != PACKET_OUTGOING &&
>>>> (skb->ip_summed == CHECKSUM_COMPLETE &&
>>>> skb->csum_valid) ||
>>>> skb->ip_summed == CHECKSUM_UNNECESSARY)
>>>>
>>>> Is this what you had in mind?
>>>
>>> I don't suggest modifying skb_csum_unnecessary probably. Certainly not
>>> until I've looked at all other callers of it.
>>>
>>> But in case of packet sockets, yes, adding that csum_valid check is my
>>> first rough approximation.
>>>
>>> That said, first let's give others more familiar with
>>> TP_STATUS_CSUM_VALID some time to comment.
>>>
>>
>> I did some more experiments, on real hw this time. I made the following
>> change to 5.7.0 (wasn't brave enough to remote upgrade a box to netnext):
>>
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 29bd405adbbd..3afb1913837a 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -2216,8 +2216,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct
>> net_device *dev,
>> if (skb->ip_summed == CHECKSUM_PARTIAL)
>> status |= TP_STATUS_CSUMNOTREADY;
>> else if (skb->pkt_type != PACKET_OUTGOING &&
>> - (skb->ip_summed == CHECKSUM_COMPLETE ||
>> - skb_csum_unnecessary(skb)))
>> + ((skb->ip_summed == CHECKSUM_COMPLETE &&
>> skb->csum_valid) ||
>> + skb->ip_summed == CHECKSUM_UNNECESSARY))
>> status |= TP_STATUS_CSUM_VALID;
>>
>> if (snaplen > res)
>>
>> With this change it seems the TP_STATUS_CSUM_VALID flag is *never* set
>> for the nfp driver.
>
> I was mistaken. skb->csum_valid only signals whether the skb->csum
> field is initialized. As of commit 573e8fca255a ("net: skb_gro_checksum_*
> functions") skb->csum_valid it is always set if CHECKSUM_COMPLETE.
> This does not imply that the checksum field in the header is correct.
>
> The checksum field may get checked against the known checksum of
> the payload in skb->csum before __netif_receive_skb_core and thus
> before packet sockets during GRO when that is enabled. But not
> always. Not if the packet gets flushed, for instance, see tcp4_gro_receive.
>
> Commit 662880f44203 ("net: Allow GRO to use and set levels of checksum
> unnecessary") indicates that the original assumption in this patch
> that CHECKSUM_UNNECESSARY implies all checksums being valid does not
> necessarily hold. Drivers are expected to set up skb->csum_level when
> they have verified more than just the inner transport header.
>

I think I found another case in the kernel that does seem to assume we
can rely on skb_csum_unnecessary.

496e4ae7dc94 ("netfilter: nf_queue: add NFQA_SKB_CSUM_NOTVERIFIED info
flag") seems to try to do what I'm after for nfqueue, but with an
inverted flag. I assume that if the flag is not set (and neither
NFQA_SKB_CSUMNOTREADY) it means we should be able to infer that the
csums are valid. Otherwise, what would be the point of the flag.

The logic seems to come down to:

csum_verify = !skb_csum_unnecessary(entskb);

(for ip_summed != CHECKSUM_PARTIAL)

The it's passed to userspace:

if (packet->ip_summed == CHECKSUM_PARTIAL)
flags = NFQA_SKB_CSUMNOTREADY;
else if (csum_verify)
flags = NFQA_SKB_CSUM_NOTVERIFIED;

So according to this code, if skb_csum_unnecessary returns false the
csums is not verified, implying that it is when skb_csum_unnecessary
returns true.

I have no idea if this can be mapped directly to af-packet like this.

Despite reading 77cffe23c1f8 ("net: Clarification of
CHECKSUM_UNNECESSARY") multiple times I'm still not sure. If we get a
straightforward IPv4/TCP or IPv6/UDP does it mean that if
CHECKSUM_UNNECESSARY is set we can trust the csums of those layers are
validated?

If properly documented that would cover all the use cases I initially
care about, although it would of course be nice if the kernel already
knows the VXLAN encapsulated traffic was also verified that we can pass
this on as well.

--
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------