Re: [PATCH v13] x86, mce: Add memcpy_trap()

From: Tony Luck
Date: Wed Feb 24 2016 - 12:39:00 EST

On Fri, Feb 19, 2016 at 9:53 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> Make use of the EXTABLE_FAULT exception table entries. This routine
> returns a structure to indicate the result of the copy:
> struct mcsafe_ret {
> u64 trap_nr;
> u64 bytes_left;
> };
> If the copy is successful, then both 'trap_nr' and 'bytes_left' are zero.
> If we faulted during the copy, then 'trap_nr' will say which type
> of trap (X86_TRAP_PF or X86_TRAP_MC) and 'bytes_left' says how many
> bytes were not copied.
> Note that this is probably the first of several copy functions.
> We can make new ones for non-temporal cache handling etc.
> Reviewed-by: Borislav Petkov <bp@xxxxxxx>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---
> V12-V13
> Ingo: Separate instruction arguments with a ", "
> Note that I didn't add spaces after "," within an argument.
> E.g. "lea (%rdx,%rcx,8), %rdx"
> Did you want them there too? I don't think they help as much there.
> Ingo: More readable layout for fixup stubs
> arch/x86/include/asm/string_64.h | 26 ++++++++
> arch/x86/kernel/x8664_ksyms_64.c | 2 +
> arch/x86/lib/memcpy_64.S | 128 +++++++++++++++++++++++++++++++++++++++
> 3 files changed, 156 insertions(+)

Where do we stand with this? The followup discussion dropped LKML at some point
in the thread ... so here is the summary to bring the archive up to date:

1) Dan Williams doesn't really care about getting the bytes_left
value. A simple succeed/fail code would work for him.

2) But if we want to use this for copy_from_user() as part of the
write(2) call stack (and I *do* want to do that), then there are some
POSIX corner cases that say that if the middle of a buffer supplied by
the user is invalid we should write bytes up to that point to the file
and return a short, but accurate, byte count rather than -EFAULT.

3) Linus was concerned that we would not be able to get a precise
bytes_left value when using the "rep mov" x86ism because it might be
copying in a weird order (even backwards) for speed reasons. But the
Intel architects pointed to the SDM volume 2 "REP" description which
makes it clear that whatever shenanigans might be happening behind the
scenes, if the "rep" is interrupted by a trap or fault the
architectural view will be that all bytes up to the point of the fault
will have been copied and no bytes beyond that point will have been
copied (in-flight writes will be dropped). The rdi/rsi/ecx registers will
all have been updated to the point of the fault (so if we somehow fixed
the reason for the machine check we'll be able to *continue* the copy
from the point where it faulted).
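
To make points 2) and 3) concrete, here is a rough userspace sketch of how
a caller might turn the { trap_nr, bytes_left } pair into a write(2)-style
return value. It is not code from the patch: fake_copy() and copy_result()
are hypothetical names, and fake_copy() only simulates a fault at a chosen
offset. Only the struct layout and the X86_TRAP_PF/X86_TRAP_MC meanings
come from the patch description.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Result layout as quoted in the patch description. */
struct mcsafe_ret {
	uint64_t trap_nr;	/* 0 on success, else the trap that hit */
	uint64_t bytes_left;	/* bytes NOT copied */
};

#define X86_TRAP_PF	14	/* page fault vector */
#define X86_TRAP_MC	18	/* machine check vector */

/*
 * Hypothetical stand-in for the real copy routine: copies until the
 * simulated fault offset 'fault_at', then reports 'trap'.
 */
static struct mcsafe_ret fake_copy(void *dst, const void *src, size_t len,
				   size_t fault_at, uint64_t trap)
{
	struct mcsafe_ret ret = { 0, 0 };
	size_t ok = len < fault_at ? len : fault_at;

	memcpy(dst, src, ok);
	if (ok < len) {
		ret.trap_nr = trap;
		ret.bytes_left = len - ok;
	}
	return ret;
}

/*
 * Hypothetical caller policy: full count on success, a short but
 * accurate count if some bytes made it, -EFAULT if none did.
 */
static long copy_result(size_t len, struct mcsafe_ret ret)
{
	size_t copied;

	assert(ret.bytes_left <= len);
	copied = len - ret.bytes_left;

	if (!ret.trap_nr)
		return (long)len;
	return copied ? (long)copied : -EFAULT;
}
```

With a 16 byte buffer and a simulated fault at offset 10, copy_result()
gives 10; with the fault at offset 0 it gives -EFAULT. The same
len - bytes_left offset is where a retry would resume if the cause of the
fault could be fixed.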