Re: [PATCH] ARM: make memzero optimization smarter

From: Arnd Bergmann
Date: Tue Jan 16 2018 - 17:30:35 EST


On Tue, Jan 16, 2018 at 6:10 PM, Nicolas Pitre <nicolas.pitre@xxxxxxxxxx> wrote:
> On Tue, 16 Jan 2018, Arnd Bergmann wrote:
>
>> While testing with a gcc-8.0.0 snapshot, I ran into a harmless
>> build warning:
>>
>> In file included from include/linux/string.h:18:0,
>> ...
>> from drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:10:
>> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c: In function '__lb_other_process':
>> arch/arm/include/asm/string.h:50:5: error: 'memset' specified size 4294967295 exceeds maximum object size 2147483647 [-Werror=stringop-overflow=]
>> memset((__p),(v),(__n)); \
>> ^~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:394:3: note: in expansion of macro 'memset'
>> memset(&skb->data[frame_size / 2], 0xAA, frame_size / 2 - 1);
>> ^~~~~~
>>
>> I think the warning is unintentional here, and gcc should not actually
>> warn, so I reported this in the gcc bugzilla as pr82103. From the
>> discussion there, it seems unlikely that this gets addressed in
>> the compiler.
>>
>> The problem here is that testing the 'frame_size' variable for non-zero
>> in the first memset() macro invocation leads to a code path in which
>> gcc thinks it may be zero, and that code path would lead to an overly
>> large length for the following memset that is now "(u32)-1". We know
>> this won't happen as the skb len is already guaranteed to be nonzero
>> when we get here (it has just been allocated with a nonzero size).
>>
>> However, we can avoid this class of bogus warnings for the memset() macro
>> by only doing the micro-optimization for zero-length arguments when the
>> length is a compile-time constant. This should also reduce code size by
>> a few bytes, and avoid an extra branch for the cases that a variable-length
>> argument is always nonzero, which is probably the common case anyway.
>>
>> I have made sure that the __memzero implementation can safely handle
>> a zero length argument.
>
> Why not simply drop the test on (__n) != 0 then? I fail to see what the
> advantage is in that case.

Good point. We might actually get even better results by dropping the
__memzero path entirely, since gcc has can optimize trivial memset()
operations and inline them.

If I read arch/arm/lib/memzero.S correctly, it saves exactly two 'orr'
instructions compared to the memset.S implementation, but calling
memset() rather than __memzero() from C code ends up saving a
function call at least some of the time.

Building a defconfig kernel with gcc-7.2.1, I see 1919 calls to __memzero()
and 636 calls to memset() in vmlinux. If I remove the macro entirely,
I get 1775 calls to memset() instead, so 780 memzero instances got
inlined, and kernel shrinks by 5488 bytes (0.03%), not counting the
__memzero implementation that we could possibly also drop.

FWIW, the zero-length check saves five references to __memzero()
and one reference to memset(), or 16 bytes in kernel size, I have not
checked what those are.

Arnd