Re: [PATCH] x86/memcpy: Introduce memcpy_mcsafe_fast

From: Dan Williams
Date: Mon Apr 20 2020 - 01:08:38 EST


On Sat, Apr 18, 2020 at 1:52 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Apr 18, 2020 at 1:30 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> >
> > Maybe Iâm missing something obvious, but whatâs the alternative? The _mcsafe variants donât just avoid the REP mess â they also tell the kernel that this particular access is recoverable via extable.
>
> .. which they could easily do exactly the same way the user space
> accessors do, just with a much simplified model that doesn't even care
> about multiple sizes, since unaligned accesses weren't valid anyway.
>
> The thing is, all of the MCS code has been nasty. There's no reason
> for it what-so-ever that I can tell. The hardware has been so
> incredibly broken that it's basically unusable, and most of the
> software around it seems to have been about testing.
>
> So I absolutely abhor that thing. Everything about that code has
> screamed "yeah, we completely mis-designed the hardware, we're pushing
> the problems into software, and nobody even uses it or can test it so
> there's like 5 people who care".
>
> And I'm pushing back on it, because I think that the least the code
> can do is to at least be simple.
>
> For example, none of those optimizations should exist. That function
> shouldn't have been inline to begin with. And if it really really
> matters from a performance angle that it was inline (which I doubt),
> it shouldn't have looked like a memory copy, it should have looked
> like "get_user()" (except without all the complications of actually
> having to test addresses or worry about different sizes).
>
>
> And it almost certainly shouldn't have been done in low-level asm
> either. It could have been a single "read aligned word" interface
> using an inline asm, and then everything else could have been done as
> C code around it.

Do we have examples of doing exception handling from C? I thought all
the exception handling copy routines were assembly routines?

>
> But no. The software side is almost as messy as the hardware side is.
> I hate it. And since nobody sane can test it, and the broken hardware
> is _so_ broken than nobody should ever use it, I have continually
> pushed back against this kind of ugly nasty special code.
>
> We know the writes can't fault, since they are buffered. So they
> aren't special at all.

The writes can mmu-fault now that memcpy_mcsafe() is also used by
_copy_to_iter_mcsafe(). This allows a clean bypass of the block layer
in fs/dax.c in addition to the pmem driver access of poisoned memory.
Now that the fallback is a sane rep; movs; it can be considered for
plain copy_to_iter() for other user copies so you get exception
handling on kernel access of poison outside of persistent memory. To
Andy's point I think a recoverable copy (for exceptions or faults) is
generally useful.

> We know the acceptable reads for the broken hardware basically boil
> down to a single simple word-size aligned read, so you need _one_
> special inline asm for that. The rest of the cases can be handled by
> masking and shifting if you really really need to - and done better
> that way than with byte accesses anyway.
>
> Then you have _one_ C file that implements everything using that
> single operation (ok, if people absolutely want to do sizes, I guess
> they can if they can just hide it in that one file), and you have one
> header file that exposes the interfaces to it, and you're done.
>
> And you strive hard as hell to not impact anything else, because you
> know that the hardware is unacceptable until all those special rules
> go away. Which they apparently finally have.

I understand the gripes about the mcsafe_slow() implementation, but
how do I implement mcsafe_fast() any better than how it is currently
organized given that, setting aside machine check handling,
memcpy_mcsafe() is the core of a copy_to_iter*() front-end that can
mmu-fault on either source or destination access?