Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP

From: Jann Horn
Date: Mon Nov 18 2019 - 11:20:13 EST


On Mon, Nov 18, 2019 at 5:03 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Mon, Nov 18, 2019 at 3:21 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
> >
> > On Fri, Nov 15, 2019 at 08:17:27PM +0100, Jann Horn wrote:
> > > dotraplinkage void
> > > do_general_protection(struct pt_regs *regs, long error_code)
> > > {
> > > @@ -547,8 +581,15 @@ do_general_protection(struct pt_regs *regs, long error_code)
> > >  			return;
> > >
> > >  		if (notify_die(DIE_GPF, desc, regs, error_code,
> > > -			       X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP)
> > > -			die(desc, regs, error_code);
> > > +			       X86_TRAP_GP, SIGSEGV) == NOTIFY_STOP)
> > > +			return;
> > > +
> > > +		if (error_code)
> > > +			pr_alert("GPF is segment-related (see error code)\n");
> > > +		else
> > > +			print_kernel_gp_address(regs);
> > > +
> > > +		die(desc, regs, error_code);
> >
> > Right, this way, those messages appear before the main "general
> > protection ..." message:
> >
> > [ 2.434372] traps: probably dereferencing non-canonical address 0xdfff000000000001
> > [ 2.442492] general protection fault: 0000 [#1] PREEMPT SMP
> >
> > Can we glue/merge them together? Or is this going to confuse tools too much:
> >
> > [ 2.542218] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
> >
> > (and that sentence could be shorter too:
> >
> > "general protection fault for non-canonical address 0xdfff000000000001"
> >
> > looks ok to me too.)
>
> This exact form will confuse syzkaller crash parsing for Linux kernel:
> https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L1347
> It expects a "general protection fault:" line for these crashes.
>
> A graceful way to update kernel crash messages would be to add more
> tests with the new format here:
> https://github.com/google/syzkaller/tree/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/testdata/linux/report
> Update parsing code. Roll out new version. Update all other testing
> systems that detect and parse kernel crashes. Then commit kernel
> changes.

So for syzkaller, it'd be fine as long as we keep the colon there?
Something like:

general protection fault: derefing non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
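
To make the question concrete, here is a minimal sketch of the kind of
check I have in mind. It is not syzkaller's code (that is Go and
regex-based); is_gpf_report_line() is a hypothetical helper, and it
assumes the parser keys only on the literal "general protection fault:"
marker described above:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from syzkaller: returns true if a
 * dmesg line contains the literal marker that a crash parser is
 * assumed to look for. Using strstr() means a leading timestamp such
 * as "[ 2.442492] " does not matter.
 */
static bool is_gpf_report_line(const char *line)
{
	return strstr(line, "general protection fault:") != NULL;
}
```

Under that assumption, the current format and the variant that keeps
the colon both match, while "general protection fault while derefing
..." and "general protection fault for non-canonical address ..." do
not.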

And it looks like the 0day test bot doesn't have any specific pattern
for #GP; as far as I can tell, it just looks for the panic triggered
by panic-on-oops (oops=panic in lkp-exec/qemu, no "general protection
fault" in etc/dmesg-kill-pattern).

> An unfortunate consequence of offloading testing to third-party systems...

And of not having a standard way to signal "this line starts something
that should be reported as a bug"? Maybe as a longer-term idea, it'd
help to have some sort of extra prefix byte that the kernel can print
to say "here comes a bug report, first line should be the subject", or
something like that, similar to how we have loglevels...
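
A rough sketch of the consumer side of that idea, purely as an
illustration: printk loglevels are encoded as an SOH byte ("\001")
followed by a level character, so a bug-report marker could use the
same shape. The 'B' marker character and bug_report_subject() are
hypothetical; nothing like this exists in the kernel today:

```c
#include <stddef.h>

#define SOH      '\001'	/* start-of-header byte, as used by printk loglevels */
#define BUG_MARK 'B'	/* hypothetical marker character, not a real kernel prefix */

/*
 * Hypothetical helper: if the line starts with SOH followed by the
 * assumed bug marker, return a pointer to the text after the marker
 * (the would-be report subject); otherwise return NULL.
 */
static const char *bug_report_subject(const char *line)
{
	/* If line[0] is SOH, line[1] is at worst the terminating NUL. */
	if (line[0] == SOH && line[1] == BUG_MARK)
		return line + 2;
	return NULL;
}
```

A tool reading the raw log could then take the returned string as the
bug subject without needing per-message patterns; lines with an
ordinary loglevel prefix, or no prefix at all, would yield NULL.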