Re: [PATCH] nvmet-tcp: switch to using the crc32c library
From: David Laight
Date: Sun Mar 02 2025 - 06:50:02 EST
On Wed, 26 Feb 2025 19:01:22 +0000
Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
...
> I have patches for nvme-tls almost ready too. Just been taking my time since
> I've been updating all other users of "crc32" and "crc32c" in the kernel too.
> And I need to decide what to do about skb_copy_and_hash_datagram_iter().
I've wondered if any of the 'copy and xxx' functions are actually worth the
extra complexity they add.
The (non-Atom) Intel cpus will copy at 32 bytes/clock provided the destination
is 32 byte aligned (so for an skb copy you may want to copy a few bytes of
'headroom' to align the copy). I'm not sure how other cpus behave.
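To illustrate the headroom trick, here is a minimal userspace sketch, not kernel code. The names headroom_to_align() and copy_aligned_dst() are hypothetical, and the 32 byte figure is just the alignment quoted above. It assumes the caller guarantees that up to 31 bytes before both pointers are valid and that the destination headroom may be overwritten.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COPY_ALIGN 32u	/* assumed alignment, taken from the text above */

/* Number of bytes 'dst' sits past the previous 32 byte boundary. */
static size_t headroom_to_align(const void *dst)
{
	return (uintptr_t)dst & (COPY_ALIGN - 1);
}

/*
 * Copy 'len' bytes from src to dst, but start the copy 'pad' bytes early
 * so that the destination of the memcpy() is 32 byte aligned.
 * The caller must guarantee that 'pad' bytes before src are readable and
 * that 'pad' bytes before dst are writable scratch space (headroom).
 */
static void copy_aligned_dst(uint8_t *dst, const uint8_t *src, size_t len)
{
	size_t pad = headroom_to_align(dst);

	memcpy(dst - pad, src - pad, len + pad);
}
```

Whether the extra bytes pay for themselves depends on the copy length and the cpu, so this would need measuring.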
The 'and xxx' algorithm is likely to run faster without having to worry
about writes. Many cpus can do more than 1 read/clock, but only one write.
I guess the main benefit is for buffers that are larger than the L1 cache
(or half the cache size if you do the copy first).
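For reference, the two shapes being compared look roughly like this. It is a userspace sketch only: the function names are made up, and a simple big-endian 16 bit ones' complement sum stands in for 'xxx' (the kernel's csum and crc32c code is very different).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide accumulator down to a 16 bit ones' complement sum. */
static uint16_t csum_fold64(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Two passes: a plain memcpy(), then a read-only pass over the copy. */
static uint16_t copy_then_csum(void *dst, const void *src, size_t len)
{
	const uint8_t *p = dst;
	uint64_t sum = 0;
	size_t i;

	memcpy(dst, src, len);
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint16_t)(p[i] << 8 | p[i + 1]);
	if (i < len)			/* odd trailing byte */
		sum += (uint16_t)(p[i] << 8);
	return csum_fold64(sum);
}

/* One pass: every iteration both writes the copy and updates the sum. */
static uint16_t copy_and_csum(void *dst, const void *src, size_t len)
{
	const uint8_t *s = src;
	uint8_t *d = dst;
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		d[i] = s[i];
		d[i + 1] = s[i + 1];
		sum += (uint16_t)(s[i] << 8 | s[i + 1]);
	}
	if (i < len) {			/* odd trailing byte */
		d[i] = s[i];
		sum += (uint16_t)(s[i] << 8);
	}
	return csum_fold64(sum);
}
```

Both return the same value. The two pass version reads the data twice, which only hurts if it has fallen out of the cache between the passes.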
It is likely worse for the 'iter' functions (which scatter-gather copy a
linear kernel buffer). They have to allow for the unusual case of multiple
fragments - and I'd guess the initial fragments are likely to be short.
Although I'm not at all sure of the point of doing the IP checksum with
the user copy. My guess is it helped NFS (8k UDP datagrams).
These days most high performance ethernet hardware supports checksum offload.
So RX UDP datagrams (which probably rarely matter) have a valid checksum
and there is no point making send() checksum the transmit data.
I ought to double check that the TX data is always checksummed in send().
I don't remember a conditional - and you pretty much never need it.
UDP TX datagrams are going to be short (no userspace NFS) and the normal path
transmits on the caller's stack - so the data is likely to be in the right
cache if the checksum is needed.
David