Re: [PATCH] x86: bring back rep movsq for user access on CPUs without ERMS
From: Mateusz Guzik
Date: Tue Aug 29 2023 - 16:21:03 EST
On 8/29/23, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 29 Aug 2023 at 12:45, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>>
>> So I think I know how to fix it, but I'm going to sleep on it.
>
> I think you can just skip the %r8 games, and do that
>
> leal (%rax,%rcx,8),%rcx
>
> in the exception fixup code, since %rax will have the low bits of the
> byte count, and %rcx will have the remaining qword count.
>
> We should also have some test-case for partial reads somewhere, but I
> have to admit that when I did the cleanup patches I just wrote some
> silly test myself (ie just doing a 'mmap()' and then reading/writing
> into the end of that mmap at different offsets.
>
> I didn't save that hacky thing, I'm afraid.
>
Ye I was planning on writing some tests to illustrate this works as
intended and v1 does not. Part of why I'm going to take more time,
there is no rush patching this.
> I also tried to figure out if there is any CPU we should care about
> that doesn't like 'rep movsq', but I think you are right that there
> really isn't. The "good enough" rep things were introduced in the PPro
> if I recall correctly, and while you could disable them in the BIOS,
> by the time Intel did 64-bit in Northwood (?) it was pretty much
> standard.
>
gcc already inlines rep movsq for copies which fit, so....
On that note I'm going to submit a patch to whack non-rep clear_page as well.
> So yeah, no reason to have the unrolled loop at all, and I think your
> patch is fine conceptually, just needs fixing and testing for the
> partial success case.
>
> Oh, and you should also remove the clobbers of r8-r11 in the
> copy_user_generic() inline asm in <asm/uaccess_64.h> when you've fixed
> the exception handling. The only reason for those clobbers were for
> that unrolled register use.
>
> So only %rax ends up being a clobber for the rep_movs_alternative
> case, as far as I can tell.
>
Ok, I'll patch it up.
That label reorg was cosmetics, did not matter. But that bad fixup
thing was quite avoidable by checking what original movsq was doing,
which I should have done before sending v1. Sorry for the lame patch
on that front. ;) (fwiw I did multiple kernel builds and whatnot with
it, nothing blew up)
Thanks for the review.
--
Mateusz Guzik <mjguzik gmail.com>