Re: [PATCH v2 2/3] x86/traps: Print non-canonical address on #GP

From: Dmitry Vyukov
Date: Mon Nov 18 2019 - 11:29:56 EST


On Mon, Nov 18, 2019 at 5:20 PM 'Jann Horn' via kasan-dev
<kasan-dev@xxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Nov 18, 2019 at 5:03 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > On Mon, Nov 18, 2019 at 3:21 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
> > >
> > > On Fri, Nov 15, 2019 at 08:17:27PM +0100, Jann Horn wrote:
> > > > dotraplinkage void
> > > > do_general_protection(struct pt_regs *regs, long error_code)
> > > > {
> > > > @@ -547,8 +581,15 @@ do_general_protection(struct pt_regs *regs, long error_code)
> > > >  			return;
> > > >
> > > >  		if (notify_die(DIE_GPF, desc, regs, error_code,
> > > > -			       X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP)
> > > > -			die(desc, regs, error_code);
> > > > +			       X86_TRAP_GP, SIGSEGV) == NOTIFY_STOP)
> > > > +			return;
> > > > +
> > > > +		if (error_code)
> > > > +			pr_alert("GPF is segment-related (see error code)\n");
> > > > +		else
> > > > +			print_kernel_gp_address(regs);
> > > > +
> > > > +		die(desc, regs, error_code);
> > >
> > > Right, this way, those messages appear before the main "general
> > > protection ..." message:
> > >
> > > [ 2.434372] traps: probably dereferencing non-canonical address 0xdfff000000000001
> > > [ 2.442492] general protection fault: 0000 [#1] PREEMPT SMP
> > >
> > > Can we glue/merge them together? Or is this going to confuse tools too much:
> > >
> > > [ 2.542218] general protection fault while derefing a non-canonical address 0xdfff000000000001: 0000 [#1] PREEMPT SMP
> > >
> > > (and that sentence could be shorter too:
> > >
> > > "general protection fault for non-canonical address 0xdfff000000000001"
> > >
> > > looks ok to me too.)
> >
> > This exact form will confuse syzkaller crash parsing for Linux kernel:
> > https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L1347
> > It expects a "general protection fault:" line for these crashes.
> >
> > A graceful way to update kernel crash messages would be to add more
> > tests with the new format here:
> > https://github.com/google/syzkaller/tree/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/testdata/linux/report
> > Update parsing code. Roll out new version. Update all other testing
> > systems that detect and parse kernel crashes. Then commit kernel
> > changes.
>
> So for syzkaller, it'd be fine as long as we keep the colon there?
> Something like:
>
> general protection fault: derefing non-canonical address
> 0xdfff000000000001: 0000 [#1] PREEMPT SMP

Probably. Tests help a lot to answer such questions ;) But presumably
it should not break parsing.
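
To make the distinction concrete, here is a minimal sketch of a parser
that keys on the literal "general protection fault:" prefix. This is
not syzkaller's code (the real logic is Go, lives in
pkg/report/linux.go, and is more involved); the names is_gpf_oops_line
and skip_timestamp are made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed stand-in for "expects a 'general protection fault:' line". */
static const char gpf_prefix[] = "general protection fault:";

/* Skip an optional leading dmesg timestamp such as "[ 2.442492] ". */
static const char *skip_timestamp(const char *line)
{
	if (line[0] == '[') {
		const char *end = strchr(line, ']');

		if (end) {
			end++;
			while (*end == ' ')
				end++;
			return end;
		}
	}
	return line;
}

/* True if the message part of the line starts with the expected prefix. */
static bool is_gpf_oops_line(const char *line)
{
	const char *msg = skip_timestamp(line);

	return strncmp(msg, gpf_prefix, sizeof(gpf_prefix) - 1) == 0;
}
```

With this check, the existing "general protection fault: 0000 [#1]
PREEMPT SMP" line and the colon-preserving variant quoted above both
match, while "general protection fault while derefing ..." and
"general protection fault for non-canonical address ..." do not,
because the character after "fault" is a space rather than a colon.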

> And it looks like the 0day test bot doesn't have any specific pattern
> for #GP, it seems to just look for the panic triggered by
> panic-on-oops as far as I can tell (oops=panic in lkp-exec/qemu, no
> "general protection fault" in etc/dmesg-kill-pattern).
>
> > An unfortunate consequence of offloading testing to third-party systems...
>
> And of not having a standard way to signal "this line starts something
> that should be reported as a bug"? Maybe as a longer-term idea, it'd
> help to have some sort of extra prefix byte that the kernel can print
> to say "here comes a bug report, first line should be the subject", or
> something like that, similar to how we have loglevels...

This would be great.
Also a way to denote crash end.
However, we have lots of special logic for subjects, so I'm not sure
the kernel could provide a good subject:
https://github.com/google/syzkaller/blob/1daed50ac33511e1a107228a9c3b80e5c4aebb5c/pkg/report/linux.go#L537-L1588
It probably could, but it won't be completely trivial. E.g. if there
is a stall inside a timer function, the subject should name the
actual timer callback as the identity ("stall in timer_subsystem_foo"). Or
for syscalls we use more disambiguation b/c "in sys_ioctl" is not much
different from saying "there is a bug in kernel" :)
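
For concreteness, the consumer side of such begin/end markers might
look like the sketch below. Everything in it is hypothetical: the
kernel log has no such marker bytes today, and BUG_BEGIN, BUG_END and
extract_report_subject are names invented for this example. It only
shows how a tool could find the report boundaries and the first line;
it does nothing about the harder problem of producing a good subject.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical marker bytes; nothing like this exists in the log format. */
#define BUG_BEGIN '\x02'
#define BUG_END   '\x03'

/*
 * Scan a log buffer line by line. The text after BUG_BEGIN on the first
 * marked line is copied into subj (truncated to fit). Returns 1 if a
 * later line starting with BUG_END closes the report, 0 otherwise.
 */
static int extract_report_subject(const char *log, char *subj, size_t len)
{
	const char *p = log;
	int found = 0;

	if (len == 0)
		return 0;
	subj[0] = '\0';

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t n = eol ? (size_t)(eol - p) : strlen(p);

		if (!found && *p == BUG_BEGIN) {
			/* n >= 1 here, since the line holds the marker. */
			size_t copy = n - 1 < len - 1 ? n - 1 : len - 1;

			memcpy(subj, p + 1, copy);
			subj[copy] = '\0';
			found = 1;
		} else if (found && *p == BUG_END) {
			return 1;
		}

		p += n;
		if (*p == '\n')
			p++;
	}
	return 0;
}
```

A report that is cut off before the end marker still leaves the
subject in subj but returns 0, so a caller can tell a truncated report
from a complete one.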