Re: [2/2] x86/unwind: Make CONFIG_UNWINDER_ORC=y the default in kconfig for 64-bit
From: Andrei Vagin
Date: Fri Oct 20 2017 - 02:54:39 EST
On Thu, Oct 19, 2017 at 08:28:04PM -0500, Josh Poimboeuf wrote:
> On Thu, Oct 19, 2017 at 03:35:22PM -0700, Andrei Vagin wrote:
> > On Thu, Oct 19, 2017 at 01:16:55PM -0500, Josh Poimboeuf wrote:
> > > On Thu, Oct 19, 2017 at 09:51:04AM -0700, Andrei Vagin wrote:
> > > > Hi,
> > > >
> > > > We run CRIU tests on tip/auto-latest regularly, and a few days ago our
> > > > test job started to detect this warning in the kernel log:
> > > >
> > > > [ 44.235786] WARNING: can't dereference iret registers at ffff8801c5f17fe0 for ip ffffffff95f0d94b
> > > >
> > > > What does it mean? How critical is it?
> > > >
> > > > Our test job fails if it detects any warning in the kernel log. Maybe we
> > > > should investigate the cause of this warning and try to eliminate it?
> > > >
> > > > Here are the logs:
> > > > https://travis-ci.org/avagin/linux/jobs/289676634
> > >
> > > I think it means the unwinder found some bad ORC unwinder metadata. Any
> > > chance you have access to the kernel binary? I need to know what code
> > > corresponds to that ffffffff95f0d94b address.
> > >
> > > Or if you can reproduce with the following patch, that should help:
> > >
> > >
> > > diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
> > > index 570b70d3f604..95b633f0ce51 100644
> > > --- a/arch/x86/kernel/unwind_orc.c
> > > +++ b/arch/x86/kernel/unwind_orc.c
> > > @@ -448,7 +448,7 @@ bool unwind_next_frame(struct unwind_state *state)
> > >
> > > >  	case ORC_TYPE_REGS_IRET:
> > > >  		if (!deref_stack_regs(state, sp, &state->ip, &state->sp, false)) {
> > > > -			orc_warn("can't dereference iret registers at %p for ip %p\n",
> > > > +			orc_warn("can't dereference iret registers at %p for ip %pB\n",
> > > >  				 (void *)sp, (void *)orig_ip);
> > > >  			goto done;
> > > >  		}
> >
> > I applied your patch and reran the tests.
> >
> > [ 44.947699] WARNING: can't dereference iret registers at ffff880178f5ffe0 for ip int3+0x5b/0x60
>
> Thanks, that was enough for me to figure it out. Can you test the below fix?
This patch works for me. I ran the tests a few times and they found nothing
suspicious.
Tested-by: Andrei Vagin <avagin@xxxxxxxxxxxxx>
Thank you!
>
> > and now here is a warning from kasan:
> >
> > [ 477.775676] ==================================================================
> > [ 477.775845] BUG: KASAN: stack-out-of-bounds in deref_stack_reg+0x11d/0x150
>
> The KASAN warning is a known issue for which the fix is a little more
> complicated. v1 of the patch was here:
>
> https://lkml.kernel.org/r/cover.1507128293.git.jpoimboe@xxxxxxxxxx
>
>
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 49167258d587..f6cdb7a1455e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -808,7 +808,7 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
>
>  .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
>  ENTRY(\sym)
> -	UNWIND_HINT_IRET_REGS offset=8
> +	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
>
>  	/* Sanity check */
>  	.if \shift_ist != -1 && \paranoid == 0
>