RE: [PATCH 04/18] csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum

From: David Laight
Date: Thu Jul 23 2020 - 04:29:08 EST


From: Al Viro
> Sent: 22 July 2020 18:39
> On Wed, Jul 22, 2020 at 04:17:02PM +0000, David Laight wrote:
> > > David, do you *ever* bother to RTFS? I mean, competent supercilious twits
> > > are annoying, but at least with those you can generally assume that what
> > > they say makes sense and has some relation to reality. You, OTOH, keep
> > > spewing utter bollocks, without ever lowering yourself to checking if your
> > > guesses have anything to do with the reality. With supercilious twit part
> > > proudly on the display - you do speak with confidence, and the way you
> > > dispense the oh-so-valuable advice to everyone around...
> >
> > Yes, I do look at the code.
> > I've actually spent a lot of time looking at the x86 checksum code.
> > I've posted a patch for a version that is about twice as fast as the
> > current one on a large range of x86 cpus.
> >
> > Possibly I meant the 32bit reduction inside csum_add()
> > rather than what csum_fold() does.
>
> Really?
> static inline unsigned add32_with_carry(unsigned a, unsigned b)
> {
> asm("addl %2,%0\n\t"
> "adcl $0,%0"
> : "=r" (a)
> : "0" (a), "rm" (b));
> return a;
> }

I agree it isn't much, but both those instructions almost certainly
get replicated with the initial value fed into the checksum function.

Everything except x86, sparc/64 and powerpc/64 uses the C code
from include/net/checksum.h which is the longer sequences:
csum += addend;
csum += csum < addend;
That's three instructions on something like MIPS - not too bad.
I'm not sure about ARM - ARM could probably use adc.
Some architectures may end up with an actual conditional jump.

Quite how the instructions get scheduled probably makes more
difference.
The sequence is a register dependency chain, and the checksum
register could easily be limiting the execution speed.
On x86 the 'adc' loop runs at two clocks per adc on a wide
range of Intel cpus.

Actually there is lot more to be gained in the code that reads
the iovec[] from userspace.
The calling sequences for the two nexted functions used are horrid.
Fixing that does make a measurable difference to semdmsg().

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)