Re: [RFC] Improve memset
From: Borislav Petkov
Date: Fri Sep 13 2019 - 06:42:41 EST
On Fri, Sep 13, 2019 at 11:18:00AM +0200, Rasmus Villemoes wrote:
> Something like
> if (__builtin_constant_p(c) && __builtin_constant_p(n) && n <= 32)
> return __builtin_memset(dest, c, n);
> might be enough? Of course it would be sad if 32 was so high that this
> turned into a memset() call, but there's -mmemset-strategy= if one wants
> complete control. Though that's of course build-time, so can't consider
> differences between cpu models.
Yah, that seems to turn this:
	memset(&tr, 0, sizeof(tr));
	tr.b = b;
where sizeof(tr) < 32 into:
# ./arch/x86/include/asm/string_64.h:29: return __builtin_memset(dest, c, n);
	movq	$0, 16(%rsp)	#, MEM[(void *)&tr + 8B]
	movq	$0, 24(%rsp)	#, MEM[(void *)&tr + 8B]
# arch/x86/kernel/cpu/mce/amd.c:1070: tr.b = b;
	movq	%rbx, 8(%rsp)	# b, tr.b
which is 2 u64 moves and the assignment of b at offset 8.
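For reference, a minimal userspace sketch of what such a wrapper could look like. This is not what's in string_64.h: small_memset(), SMALL_MEMSET_CAP, struct tr_like and tr_demo() are all made-up names for illustration, and the cap of 32 is just the value from the suggestion above. Whether the constant check actually folds depends on the compiler and the optimization level; if it doesn't, the code simply falls back to the normal memset() call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cap; the thread has not settled on 32 vs. 64. */
#define SMALL_MEMSET_CAP 32

/* Hypothetical wrapper name, standing in for an inline memset(). */
static inline __attribute__((always_inline))
void *small_memset(void *dest, int c, size_t n)
{
	/*
	 * If the compiler can prove c and n constant after inlining and
	 * n is under the cap, hand the call to the builtin so it can be
	 * expanded into plain stores.
	 */
	if (__builtin_constant_p(c) && __builtin_constant_p(n) &&
	    n <= SMALL_MEMSET_CAP)
		return __builtin_memset(dest, c, n);

	/* Everything else takes the regular library call. */
	return memset(dest, c, n);
}

/* Hypothetical stand-in for a small on-stack struct (under 32 bytes on x86-64). */
struct tr_like {
	unsigned long a;
	void *b;
	unsigned long c;
};

/* Same pattern as the caller above: clear the struct, then set one field. */
static int tr_demo(void *b)
{
	struct tr_like tr;

	small_memset(&tr, 0, sizeof(tr));
	tr.b = b;

	/* Returns 1 if b was stored and the other fields are zero. */
	return tr.b == b && tr.a == 0 && tr.c == 0;
}
```

Building something like this with gcc -O2 -S is a quick way to check whether a given cap still produces inline stores for a given struct size or turns into a call.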
Question is, where should we put the size cap? I'm thinking 32 or 64.
Linus, got any suggestions?
Or should we talk to Intel hw folks about it...