Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible

From: Mateusz Guzik
Date: Fri Mar 03 2023 - 14:38:23 EST


On 3/3/23, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> On 3/3/23, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>> On Fri, Mar 3, 2023 at 9:39 AM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>>>
>>> So the key is: memset is underperforming at least on x86-64 for
>>> certain sizes and the init-on-alloc thingy makes it used significantly
>>> more, exacerbating the problem
>>
>> One reason that the kernel memset isn't as optimized as memcpy, is
>> simply because under normal circumstances it shouldn't be *used* that
>> much outside of page clearing and constant-sized structure
>> initialization.
>>
>
> I mentioned in the previous e-mail that memset is used a lot even
> without the problematic opt and even have shown size distribution of
> what's getting passed there.
>
> You made me curious how usage compares to memcpy, so I checked 'make'
> in kernel dir once again.
>
> I stress the init-on-alloc opt is *OFF*:
> # CONFIG_INIT_ON_ALLOC_DEFAULT_ON is not set
> # CONFIG_INIT_ON_FREE_DEFAULT_ON is not set
>
> # bpftrace -e 'kprobe:memset,kprobe:memcpy { @[probe] = count(); }'
> [snip]
> @[kprobe:memcpy]: 6948498
> @[kprobe:memset]: 36150671
>
> iow memset is used about 7 times as much as memcpy when building the
> kernel. If it matters I'm building on top of
> 2eb29d59ddf02e39774abfb60b2030b0b7e27c1f (reasonably fresh master).
>
>> So this is literally a problem with pointless automated memset,
>> introduced by that hardening option. And hardening people generally
>> simply don't care about performance, and the people who _do _care
>> about performance usually don't enable the known-expensive crazy
>> stuff.
>>
>> Honestly, I think the INIT_ONCE stuff is actively detrimental, and
>> only hides issues (and in this case adds its own). So I can't but help
>> to say "don't do that then". I think it's literally stupid to clear
>> allocations by default.
>>
>
> Questioning usability of the entire thing was on the menu in what I
> intended to write, but I did not read entire public discussion so
> perhaps I missed a crucial insight.
>
>> I'm not opposed to improving memset, but honestly, if the argument is
>> based on the stupid hardening behavior, I really think *that* needs to
>> be fixed first.
>>
>
> It is not. The argument is that this is a core primitive, used a lot
> with sizes where rep stos is detrimental to its performance. There is
> no urgency, but eventually someone(tm) should sort it out. For
> $reasons I don't want to do it myself.
>
> I do bring it up in the context of the init-on-alloc machinery because
> it disfigures whatever results are obtained testing on x86-64 -- they
> can get exactly the same effect for much smaller cost, consequently
> they should have interest in sorting this out.
>
> General point though was that the work should have sanity-checked
> performance of the primitive it relies on, instead of assuming it is
> ~optimal. I'm guilty of this mistake myself, so not going to throw
> stones.
>

Forgot to paste the crapper I used to make both visible to bpftrace.

I'm guessing there is a sensible way, I just don't see it and would
love an instruction :)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index a64017602010..c40084308d58 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -43,9 +43,6 @@ SYM_TYPED_FUNC_START(__memcpy)
SYM_FUNC_END(__memcpy)
EXPORT_SYMBOL(__memcpy)

-SYM_FUNC_ALIAS(memcpy, __memcpy)
-EXPORT_SYMBOL(memcpy)
-
/*
* memcpy_erms() - enhanced fast string memcpy. This is faster and
* simpler than memcpy. Use memcpy_erms when possible.
diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 6143b1a6fa2c..141d899d5f1d 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -45,9 +45,6 @@ SYM_FUNC_START(__memset)
SYM_FUNC_END(__memset)
EXPORT_SYMBOL(__memset)

-SYM_FUNC_ALIAS(memset, __memset)
-EXPORT_SYMBOL(memset)
-
/*
* ISO C memset - set a memory block to a byte value. This function uses
* enhanced rep stosb to override the fast string function.
diff --git a/fs/open.c b/fs/open.c
index 4401a73d4032..a3383391bd17 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1564,3 +1564,19 @@ int stream_open(struct inode *inode, struct file *filp)
}

EXPORT_SYMBOL(stream_open);
+
+void *(memset)(void *s, int c, size_t n)
+{
+ return __memset(s, c, n);
+}
+
+EXPORT_SYMBOL(memset);
+
+
+void *(memcpy)(void *d, const void *s, size_t n)
+{
+ return __memcpy(d, s, n);
+}
+
+EXPORT_SYMBOL(memcpy);



--
Mateusz Guzik <mjguzik gmail.com>