RE: [PATCH] x86/lib: Remove the special case for odd-aligned buffers in csum_partial.c

From: David Laight
Date: Mon Dec 13 2021 - 11:16:47 EST


From: Eric Dumazet
> Sent: 13 December 2021 15:56
>
> On Mon, Dec 13, 2021 at 7:37 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> > From: Dave Hansen
> > > Sent: 13 December 2021 15:02
> > >
> > > On 12/13/21 6:43 AM, David Laight wrote:
> > > > There is no need to special case the very unusual odd-aligned buffers.
> > > > They are no worse than 4n+2 aligned buffers.
> > > >
> > > > Signed-off-by: David Laight <david.laight@xxxxxxxxxx>
> > > > ---
> > > >
> > > > On an i7-7700 misaligned buffers add 2 or 3 clocks (in 115) to a 512 byte
> > > > checksum.
> > > > That is just measuring the main loop with an lfence prior to rdpmc to
> > > > read PERF_COUNT_HW_CPU_CYCLES.
> > >
> > > I'm a bit confused by this changelog.
> > >
> > > Are you saying that the patch causes a (small) performance regression?
> > >
> > > Are you also saying that the optimization here is not worth it because
> > > it saves 15 lines of code? Or that the misalignment checks themselves
> > > add 2 or 3 cycles, and this is an *optimization*?
> >
> > I'm saying that it can't be worth optimising for a misaligned
> > buffer because the cost of the buffer being misaligned is so small.
> > So the tests for a misaligned buffer are going to cost more than
> > any plausible gain.
> >
> > Not only that, the buffer will never be odd aligned at all.
> >
> > The code is left in from a previous version that did do aligned
> > word reads - so had to do extra for odd alignment.
> >
> > Note that the code is doing misaligned reads for the more likely 4n+2
> > aligned ethernet receive buffers.
> > I doubt that even a test for that would be worthwhile even if you
> > were checksumming full sized ethernet packets.
> >
> > So the change is deleting code that is never actually executed
> > from the hot path.
> >
>
> I think I left this code because I got confused with the odd/even case,
> but this is handled by upper functions like csum_block_add().
>
> What matters is not if the start of a frag is odd/even, but what
> offset it is at in the overall 'frame', if a frame is split into multiple
> areas (scatter/gather).

Yes, odd length fragments are a different problem.
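
To illustrate the point about fragment offsets, here is a minimal userspace
sketch, not the kernel code. It assumes a plain 16-bit big-endian ones'
complement sum, and the names fold16(), sum16(), block_add_sketch() and
check_all_splits() are made up for this example. It only follows the idea
behind csum_block_add(): a fragment that starts at an odd offset in the
frame has its bytes in the opposite halves of the 16-bit words, so its sum
is byte-swapped before being added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a wide ones' complement accumulator down to 16 bits. */
static uint16_t fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Ones' complement sum of buf, pairing bytes as big-endian 16-bit
 * words starting at buf[0]. A trailing odd byte is the high half of
 * a zero-padded word. The result is not inverted.
 */
static uint16_t sum16(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;
	return fold16(sum);
}

/*
 * Add the sum of a fragment that starts 'offset' bytes into the frame.
 * At an odd offset every byte of the fragment sits in the other half
 * of its 16-bit word, so swap the two bytes of its sum first.
 */
static uint16_t block_add_sketch(uint16_t sum, uint16_t frag_sum, size_t offset)
{
	if (offset & 1)
		frag_sum = (uint16_t)((frag_sum << 8) | (frag_sum >> 8));
	return fold16((uint64_t)sum + frag_sum);
}

/* Check every split point of a small frame against the one-pass sum. */
static void check_all_splits(void)
{
	static const uint8_t frame[] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0xff, 0xfe };
	size_t n = sizeof(frame), split;

	for (split = 0; split <= n; split++)
		assert(block_add_sketch(sum16(frame, split),
					sum16(frame + split, n - split),
					split) == sum16(frame, n));
}
```

For the bytes 01 02 03 04 05 split after three bytes, the two parts sum to
0x0402 and 0x0405. Swapping the second gives 0x0504, and 0x0402 + 0x0504 is
0x0906, the same as summing the five bytes in one pass. The address of each
part never enters into it, only its offset within the frame.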

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)