RE: [RFC] csum experts, csum_replace2() is too expensive

From: David Laight
Date: Fri Mar 21 2014 - 10:16:03 EST


From: Eric Dumazet
> On Thu, 2014-03-20 at 18:56 -0700, Andi Kleen wrote:
> > Eric Dumazet <eric.dumazet@xxxxxxxxx> writes:
> > >
> > > I saw csum_partial() consuming 1% of cpu cycles in a GRO workload, that
> > > is insane...
> >
> >
> > Couldn't it just be the cache miss?
>
> Or the fact that we mix 16 bit stores and 32bit loads ?
>
> BTW, any idea why ip_fast_csum() on x86 contains a "memory" constraint ?

The correct constraint would be one that tells gcc that the asm
reads the 20 bytes at the source pointer.

Without the "memory" constraint (or a more precise one like that),
gcc won't necessarily write the header values out to memory before
the asm instructions execute.

David