Re: [PATCH RFC V2 2/2] net: Optimize snmp stat aggregation by walking all the percpu data at once

From: Joe Perches
Date: Fri Aug 28 2015 - 19:13:20 EST

On Fri, 2015-08-28 at 15:29 -0700, Eric Dumazet wrote:
> On Fri, 2015-08-28 at 14:26 -0700, Joe Perches wrote:
> 1) u64 array[XX] on stack is naturally aligned,

Of course it is.

> kzalloc() won't improve this at all. Not sure what you believe.

An alloc would only reduce stack use.

Copying into the buffer, then copying the buffer into the
skb may be desirable on some arches though.

> 2) put_unaligned() is basically a normal memory write on x86.
> memcpy(dst,src,...) will have a problem anyway on arches that care,
> because src & dst won't have same alignment.

OK, so all the world's an x86?

On arm32, copying 288 bytes using mostly aligned word
transfers is generally faster than using only unsigned
short transfers.
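
To make the comparison concrete, here is a rough userspace sketch of
the two approaches. It is not the code from the patch: the function
names, NSTATS and put_unaligned_u64() are hypothetical stand-ins, and
36 entries is chosen only so the total matches the 288 bytes above.

```c
#include <stdint.h>
#include <string.h>

#define NSTATS 36	/* 36 * sizeof(uint64_t) = 288 bytes */

/* Hypothetical stand-in for the kernel's put_unaligned():
 * a fixed-size memcpy is safe for any destination alignment. */
static void put_unaligned_u64(uint64_t val, void *dst)
{
	memcpy(dst, &val, sizeof(val));
}

/* Approach A: write each counter straight into the
 * (possibly unaligned) destination, one element at a time. */
static void fill_per_element(unsigned char *dst, const uint64_t *src)
{
	for (int i = 0; i < NSTATS; i++)
		put_unaligned_u64(src[i], dst + i * sizeof(uint64_t));
}

/* Approach B: fill a naturally aligned stack buffer first,
 * then move the whole thing with a single block copy. */
static void fill_via_buffer(unsigned char *dst, const uint64_t *src)
{
	uint64_t buff[NSTATS];

	for (int i = 0; i < NSTATS; i++)
		buff[i] = src[i];
	memcpy(dst, buff, sizeof(buff));
}
```

Both produce the same bytes at the destination. Whether B is actually
faster depends on the arch and on its memcpy implementation.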

> 288 bytes on stack in a leaf function in this path is totally fine, it
> is not like we're calling ext4/xfs/nfs code after this point.

Generally true. It's always difficult to know how much
stack has already been consumed, though, and smaller stack
frames are generally better.

Anyway, the block copy from either the alloc'd or stack
buffer amounts only to a slight performance improvement
for arm32. It doesn't really have much other utility.

