Re: [PATCH] x86: only use ERMS for user copies for larger sizes

From: Jens Axboe
Date: Sat Nov 24 2018 - 01:22:08 EST


On 11/21/18 11:16 AM, Linus Torvalds wrote:
> On Wed, Nov 21, 2018 at 9:27 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> It would be interesting to know exactly which copy it is that matters
>> so much... *inlining* the erms case might show that nicely in
>> profiles.
>
> Side note: the fact that Jens' patch (which I don't like in that form)
> allegedly shrunk the resulting kernel binary would seem to indicate
> that there's a *lot* of compile-time constant-sized memcpy calls that
> we are missing, and that fall back to copy_user_generic().

Another kind of side note... This also affects memset(), which does
rep stosb if we have ERMS, for any size memset. I noticed this from
sg_init_table(), which does a memset of the table. For my kind of
testing, the entry size is small. The below, too, reduces memset()
overhead by 50% here for me.

diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 9bc861c71e75..bad0fdb9ddcd 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -60,6 +60,8 @@ EXPORT_SYMBOL(__memset)
  * rax original destination
  */
 ENTRY(memset_erms)
+	cmpl $128,%edx
+	jb memset_orig
 	movq %rdi,%r9
 	movb %sil,%al
 	movq %rdx,%rcx
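
For anyone who wants to poke at the idea outside the kernel, here is a
rough user-space sketch in C of the same dispatch: short fills take a
plain loop, everything else takes rep stosb. The names sketch_memset,
set_small, set_large and SMALL_SET_THRESHOLD are made up for
illustration and are not kernel symbols. The 128 simply mirrors the
constant in the hunk above and is not a tuned value, and the byte loop
is only a stand-in for memset_orig. Unlike the hunk, which compares
%edx, the sketch compares the full size_t length.

```c
#include <stddef.h>

/* Hypothetical cutoff mirroring the 128 in the patch; not a tuned value. */
#define SMALL_SET_THRESHOLD 128

/* Stand-in for memset_orig: a plain byte loop for short fills. */
static void *set_small(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
	return dst;
}

/* Stand-in for the ERMS path: "rep stosb" on x86-64, byte loop elsewhere. */
static void *set_large(void *dst, int c, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	void *d = dst;
	size_t cnt = n;

	/* rdi = destination, rcx = count, al = fill byte; DF is clear per the ABI. */
	__asm__ volatile("rep stosb"
			 : "+D"(d), "+c"(cnt)
			 : "a"(c)
			 : "memory");
#else
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
#endif
	return dst;
}

/* Dispatch on length, like the cmpl/jb pair: below the cutoff skips rep stosb. */
void *sketch_memset(void *dst, int c, size_t n)
{
	if (n < SMALL_SET_THRESHOLD)
		return set_small(dst, c, n);
	return set_large(dst, c, n);
}
```

Timing sketch_memset() against a version that always calls set_large()
for small n would be one way to see whether the cutoff helps on a given
CPU; the right threshold is likely to vary by microarchitecture.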

--
Jens Axboe