Re: general protection fault in perf_misc_flags

From: Borislav Petkov
Date: Wed Sep 23 2020 - 05:03:54 EST


On Tue, Sep 22, 2020 at 11:56:04AM -0700, Nick Desaulniers wrote:
> So I think there's an issue with "deterministically reproducible."
> The syzcaller report has:
> > > Unfortunately, I don't have any reproducer for this issue yet.

Yeah, Dmitry gave two other links of similar reports, the first one
works for me:

https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c

and that one doesn't have a reproducer either. The bytes look familiar
though:

Code: c1 e8 03 42 80 3c 20 00 74 05 e8 79 7a a7 00 49 8b 47 10 48 89 05 f6 d8 ef 09 49 8d 7f 08 48 89 f8 48 c1 e8 03 42 80 3c 00 00 <00> 00 e8 57 7a a7 00 49 8b 47 08 48 89 05 dc d8 ef 09 49 8d 7f 18
All code
========
0: c1 e8 03 shr $0x3,%eax
3: 42 80 3c 20 00 cmpb $0x0,(%rax,%r12,1)
8: 74 05 je 0xf
a: e8 79 7a a7 00 callq 0xa77a88
f: 49 8b 47 10 mov 0x10(%r15),%rax
13: 48 89 05 f6 d8 ef 09 mov %rax,0x9efd8f6(%rip) # 0x9efd910
1a: 49 8d 7f 08 lea 0x8(%r15),%rdi
1e: 48 89 f8 mov %rdi,%rax
21: 48 c1 e8 03 shr $0x3,%rax
25: 42 80 3c 00 00 cmpb $0x0,(%rax,%r8,1)
2a:* 00 00 add %al,(%rax) <-- trapping instruction
2c: e8 57 7a a7 00 callq 0xa77a88
31: 49 8b 47 08 mov 0x8(%r15),%rax
35: 48 89 05 dc d8 ef 09 mov %rax,0x9efd8dc(%rip) # 0x9efd918
3c: 49 8d 7f 18 lea 0x18(%r15),%rdi

4 zero bytes again. And that .config has kasan stuff enabled too so
could the failure be related to having kasan stuff enabled and it
messing up offsets?

That is, provided this is the mechanism how it would happen. We still
don't know what and when wrote those zeroes in there. Not having a
reproducer is nasty but looking at those reports above and if I'm
reading this correctly, rIP points to

RIP: 0010:update_pvclock_gtod arch/x86/kvm/x86.c:1743 [inline]

each time and the URL says they're 9 crashes total. And each have
happened at that rIP. So all we'd need is set a watchpoint when that
address is being written and dump stuff.

Dmitry, can the syzkaller do debugging stuff like that?

> Following my hypothesis about having a bad address calculation; the
> tricky part is I'd need to look through the relocations and try to see
> if any could resolve to the address that was accidentally modified. I
> suspect objtool could be leveraged for that;

If you can find this at compile time...

> maybe it could check whether each `struct jump_entry`'s `target`
> member referred to either a NOP or a CMP, and error otherwise? (Do we
> have other non-NOP or CMP targets? IDK)

Follow jump_label_transform() - it does verify what it is going to
patch. And while I'm looking at this, I realize that the jump labels
patch 5 bytes but the above zeroes are 4 bytes. In the other opcode
bytes I decoded it is 4 bytes too. So this might not be caused by the
jump labels patching...

> This hypothesis might also be incorrect, and thus would be chasing a
> red herring...not really sure how else to pursue debugging this.

Yeah, this one is tricky to debug.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette