Re: [PATCH 8/8] x86/mce: Decode a kernel instruction to determine if it is copying from user

From: Borislav Petkov
Date: Mon Sep 21 2020 - 07:31:56 EST


On Tue, Sep 08, 2020 at 10:55:19AM -0700, Tony Luck wrote:
> +static bool is_copy_from_user(struct pt_regs *regs)
> +{
> + u8 insn_buf[MAX_INSN_SIZE];
> + struct insn insn;
> +
> + if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip, MAX_INSN_SIZE))
> + return false;

<---- newline here.

> + kernel_insn_init(&insn, insn_buf, MAX_INSN_SIZE);
> + insn_get_length(&insn);

insn_get_opcode() I guess.

> +
> + switch (insn.opcode.value) {
> + case 0x8A: case 0x8B: /* MOV */

No side comments pls - put them on top.

Also, this comment needs to say that you're looking for MOVs where the
source operand can also be a memory operand.

Now lemme stare at an example, let's look at this function:

static __always_inline __must_check unsigned long
raw_copy_to_user(void __user *dst, const void *src, unsigned long size)
{
return copy_user_generic((__force void *)dst, src, size);
}

In this case, we copy to user memory, so dst is the user pointer.

Comment over copy_user_generic_unrolled() says rsi is the source so
let's look at some of the insns in there:

ffffffff813accc2: 4c 8b 06 mov (%rsi),%r8
ffffffff813accc5: 4c 8b 4e 08 mov 0x8(%rsi),%r9
ffffffff813accc9: 4c 8b 56 10 mov 0x10(%rsi),%r10
ffffffff813acccd: 4c 8b 5e 18 mov 0x18(%rsi),%r11

All those are at labels which are exception-handled with the new
_ASM_EXTABLE_CPY().

So according to the above check, this is a copy *from* user. But it
ain't. And to confirm that, I added a breakpoint at that insn:

(gdb) break *0xffffffff813accc2
Breakpoint 1 at 0xffffffff813accc2: file arch/x86/lib/copy_user_64.S, line 66.

and the first time it hit, it has this:

Dump of assembler code from 0xffffffff813accc2 to 0xffffffff813accd6:
=> 0xffffffff813accc2 <copy_user_generic_unrolled+50>: 4c 8b 06 mov (%rsi),%r8
0xffffffff813accc5 <copy_user_generic_unrolled+53>: 4c 8b 4e 08 mov 0x8(%rsi),%r9

rsi 0xffffc90000013e10
r8 0x7fff60425120

So this is reading from *kernel* memory and writing to *user* memory.
And I don't think you want that, given the whole intent of this
series.

And it makes sense - getting an MCE while writing is probably going to
go boom.
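
To make the direction check concrete, here is a minimal user-space sketch,
not kernel code and not a proposed replacement for the patch. Everything
named sketch_* is hypothetical, the 47-bit user address limit is an
assumption, and it only handles MOV loads with a plain base register plus
an optional displacement (no SIB, no RIP-relative). The idea is to call it
a read from user memory only when the source address really is a user
address:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: top of the x86-64 user address range with 4-level paging. */
#define SKETCH_USER_ADDR_MAX 0x00007fffffffffffULL

/* Hypothetical register file indexed by hardware register number (rsi == 6). */
struct sketch_regs {
	uint64_t gpr[16];
};

/*
 * Return true only for MOV r, r/m (opcodes 0x8A/0x8B) whose memory source
 * operand lies in the user address range. Register sources, SIB forms and
 * RIP-relative forms are deliberately rejected to keep the sketch short.
 */
static bool sketch_mov_reads_user(const uint8_t *insn, size_t len,
				  const struct sketch_regs *regs)
{
	size_t i = 0;
	uint8_t rex = 0, modrm, mod, rm;
	int64_t disp = 0;
	uint64_t addr;

	/* Optional REX prefix (0x40-0x4f). */
	if (i < len && (insn[i] & 0xf0) == 0x40)
		rex = insn[i++];

	if (i >= len || (insn[i] != 0x8a && insn[i] != 0x8b))
		return false;
	i++;

	if (i >= len)
		return false;
	modrm = insn[i++];
	mod = modrm >> 6;
	rm = modrm & 7;

	/* Register source, SIB byte or RIP-relative: not handled here. */
	if (mod == 3 || rm == 4 || (mod == 0 && rm == 5))
		return false;

	if (mod == 1) {
		/* Sign-extended 8-bit displacement. */
		if (i >= len)
			return false;
		disp = (int8_t)insn[i];
	} else if (mod == 2) {
		/* Sign-extended 32-bit little-endian displacement. */
		uint32_t d;

		if (i + 4 > len)
			return false;
		d = (uint32_t)insn[i] | ((uint32_t)insn[i + 1] << 8) |
		    ((uint32_t)insn[i + 2] << 16) | ((uint32_t)insn[i + 3] << 24);
		disp = (int32_t)d;
	}

	/* REX.B extends the base register number to r8-r15. */
	addr = regs->gpr[rm | ((rex & 1) << 3)] + (uint64_t)disp;

	return addr <= SKETCH_USER_ADDR_MAX;
}
```

Fed the bytes 4c 8b 06 with rsi = 0xffffc90000013e10 as in the gdb dump
above, this returns false, because the load reads kernel memory.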

> + case 0xB60F: case 0xB70F: /* MOVZ */

Ditto.

> + return true;
> + case 0xA4: case 0xA5: /* MOVS */
> + return !fault_in_kernel_space(regs->si);
> + }
> +
> + return false;
> +}
> +
> /*
> * If mcgstatus indicated that ip/cs on the stack were
> * no good, then "m->cs" will be zero and we will have
> @@ -215,10 +238,17 @@ static int error_context(struct mce *m, struct pt_regs *regs)
>
> if ((m->cs & 3) == 3)
> return IN_USER;
> + if (!mc_recoverable(m->mcgstatus))
> + return IN_KERNEL;
>
> t = ex_fault_handler_type(m->ip);
> - if (mc_recoverable(m->mcgstatus) && t == HANDLER_FAULT) {
> + if (t == HANDLER_FAULT) {
> + m->kflags |= MCE_IN_KERNEL_RECOV;
> + return IN_KERNEL_RECOV;
> + }
> + if (t == HANDLER_UACCESS && regs && is_copy_from_user(regs)) {
> m->kflags |= MCE_IN_KERNEL_RECOV;
> + m->kflags |= MCE_IN_KERNEL_COPYIN;
> return IN_KERNEL_RECOV;

I'm guessing that should be generic enough to do on the other vendors
too...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette