Re: general protection fault in perf_misc_flags

From: Nick Desaulniers
Date: Tue Sep 22 2020 - 14:56:21 EST


On Mon, Sep 21, 2020 at 3:13 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> On Mon, Sep 21, 2020 at 01:59:43PM -0700, Nick Desaulniers wrote:
> > Right, the two sequences above look almost the same, except those 4
> > bytes of zeros (the disassembler gets confused about the rest, but
> > it's the same byte sequence otherwise). Are the two disassemblies a
> > comparison of the code at runtime vs. compile-time?
>
> Yes.
>
> > If so, how did you disassemble the runtime code?
>
> ./scripts/decodecode < /tmp/splat
>
> where /tmp/splat contains the line starting with "Code:". Make sure you
> have only one "Code:"-line, otherwise you'll see the code of the *last*
> Code: line only.

Thanks.

> > If runtime and compile time differ, I suspect some kind of runtime
> > patching.
>
> If it is, it ain't patching at the right place. :)

Yeah, but we've had this kind of bug before:
https://nickdesaulniers.github.io/blog/2020/04/06/off-by-two/
I'm sure it's not the last.

> But no, that function is pretty simple and looking at its asm, there's
> no asm goto() or alternatives in there. But that .config might add them.
> It adds a lot of calls to *ASAN helpers and whatnot.

Maybe not in this translation unit, but it's possible another TU does
have one and miscalculates the offset, overwriting code in this TU.

> > I wonder if we calculated the address of a static_key wrong
> > (asm goto). What function am I looking at the disassembly of?
> > perf_misc_flags() in arch/x86/events/core.c?
>
> Yes.
>
> > With this config?
> > https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 (though I
> > don't see _any_ asm goto in the IR for this file built with this
> > config).
>
> Right, there should be none.
>
> > If this is deterministically reproducible, I suppose we
> > could set a watchpoint on the address being overwritten?
>
> Sounds like worth a try. I'll go sleep instead, tho. :)

So I think there's an issue with "deterministically reproducible."
The syzkaller report has:
> > Unfortunately, I don't have any reproducer for this issue yet.

Following my hypothesis about a bad address calculation, the tricky
part is that I'd need to look through the relocations and try to see
if any could resolve to the address that was accidentally modified. I
suspect objtool could be leveraged for that; maybe it could check
whether each `struct jump_entry`'s `target` member refers to either
a NOP or a CMP, and error otherwise? (Do we have other non-NOP or CMP
targets? IDK)
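
To make the idea concrete, here is a rough userspace sketch of that
kind of check. It is not objtool code and not a kernel API: the names
`is_nop5` and `count_bad_sites` are hypothetical, and the flat array of
byte offsets is a simplified stand-in for whichever `struct jump_entry`
field ends up being checked. It also assumes the only acceptable bytes
are one common 5-byte x86 NOP encoding (0f 1f 44 00 00); the CMP case
is left out because CMP has many encodings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the only instruction we accept at a checked site is this
 * 5-byte x86 NOP (nopl 0x0(%rax,%rax,1)). A real check would need a
 * proper instruction decoder and a longer allow-list. */
static const uint8_t nop5[] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* True if the bytes at text[off..off+5) are in range and match nop5. */
static bool is_nop5(const uint8_t *text, size_t len, size_t off)
{
	return off <= len && len - off >= sizeof(nop5) &&
	       memcmp(text + off, nop5, sizeof(nop5)) == 0;
}

/* Hypothetical helper: count the offsets in sites[] that do not point
 * at a nop5. Any non-zero result would be reported as an error. */
static size_t count_bad_sites(const uint8_t *text, size_t len,
			      const size_t *sites, size_t n)
{
	size_t bad = 0;

	for (size_t i = 0; i < n; i++)
		if (!is_nop5(text, len, sites[i]))
			bad++;
	return bad;
}
```

Feeding it a section's bytes plus the offsets recovered from the jump
table would flag any entry that points somewhere unexpected, which is
the sort of entry that could end up corrupting unrelated code.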

This hypothesis might also be incorrect, in which case I'd be chasing
a red herring. I'm not really sure how else to pursue debugging this.

> Gnight and good luck.

Ah, that's a famous quote from journalist Edward R. Murrow, who helped
defeat Senator Joseph McCarthy (Murrow's show See It Now dedicated a
segment to addressing McCarthy). Sometimes I find uncanny parallels
between claims of what a compiler can do on LKML "without proper
regard for evidence" and McCarthyism. Falsifiability is an
interesting trait. That's why I try to advocate for sharing links
from godbolt.org as much as possible.
--
Thanks,
~Nick Desaulniers