Re: [RFC] Improve memset

From: Borislav Petkov
Date: Fri Sep 13 2019 - 06:42:41 EST


On Fri, Sep 13, 2019 at 11:18:00AM +0200, Rasmus Villemoes wrote:
> Something like
>
> 	if (__builtin_constant_p(c) && __builtin_constant_p(n) && n <= 32)
> 		return __builtin_memset(dest, c, n);
>
> might be enough? Of course it would be sad if 32 was so high that this
> turned into a memset() call, but there's -mmemset-strategy= if one wants
> complete control. Though that's of course build-time, so can't consider
> differences between cpu models.

Yah, that seems to turn this:

memset(&tr, 0, sizeof(tr));
tr.b = b;

where sizeof(tr) < 32 into:

# ./arch/x86/include/asm/string_64.h:29: return __builtin_memset(dest, c, n);
	movq	$0, 16(%rsp)	#, MEM[(void *)&tr + 8B]
	movq	$0, 24(%rsp)	#, MEM[(void *)&tr + 8B]
# arch/x86/kernel/cpu/mce/amd.c:1070: tr.b = b;
	movq	%rbx, 8(%rsp)	# b, tr.b

which is two u64 zero stores and the assignment of b at 8(%rsp), with no
call to memset().
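
For reference, a minimal userspace sketch of the check Rasmus suggests,
assuming GCC or Clang. The names small_memset(), fallback_memset(),
SMALL_MEMSET_CAP and struct demo_tr are made up for illustration and are
not what string_64.h actually contains; the cap of 32 is just the value
under discussion, not a settled number.

```c
#include <stddef.h>

/* Hypothetical cap; the thread has not settled on 32 vs. 64. */
#define SMALL_MEMSET_CAP 32

/* Hypothetical stand-in for the out-of-line memset() implementation. */
static void *fallback_memset(void *dest, int c, size_t n)
{
	unsigned char *p = dest;

	while (n--)
		*p++ = (unsigned char)c;
	return dest;
}

/*
 * If both the fill byte and the length are compile-time constants and the
 * length is within the cap, hand the call to __builtin_memset() so the
 * compiler is free to expand it into plain stores. Otherwise call the
 * out-of-line routine. Note: the macro evaluates c and n more than once.
 */
#define small_memset(dest, c, n)					\
	((__builtin_constant_p(c) && __builtin_constant_p(n) &&	\
	  (n) <= SMALL_MEMSET_CAP)					\
		? __builtin_memset((dest), (c), (n))			\
		: fallback_memset((dest), (c), (n)))

/* Hypothetical 24-byte struct, loosely mirroring the tr example. */
struct demo_tr {
	unsigned long a;
	unsigned long b;
	unsigned long c;
};

/* Same pattern as above: clear the struct, then assign one member. */
static unsigned long demo_clear_and_set(unsigned long b)
{
	struct demo_tr tr;

	small_memset(&tr, 0, sizeof(tr));
	tr.b = b;
	return tr.a + tr.b + tr.c;
}
```

Whether the builtin path really becomes a couple of stores depends on the
compiler, the optimization level and the target, so the generated asm
still needs to be looked at for each candidate cap.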

Question is, where should we put the size cap? I'm thinking 32 or 64
bytes...

Linus, got any suggestions?

Or should we talk to Intel hw folks about it...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette