Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

From: Mike Galbraith
Date: Fri Sep 01 2017 - 13:53:23 EST


On Fri, 2017-09-01 at 10:12 -0700, Kees Cook wrote:
> On Fri, Sep 1, 2017 at 6:09 AM, Mike Galbraith <efault@xxxxxx> wrote:
> > On Fri, 2017-09-01 at 08:57 +0200, Mike Galbraith wrote:
> >> On Thu, 2017-08-31 at 11:45 -0700, Kees Cook wrote:
> >> > On Thu, Aug 31, 2017 at 10:19 AM, Mike Galbraith <efault@xxxxxx> wrote:
> >> > > On Thu, 2017-08-31 at 10:00 -0700, Kees Cook wrote:
> >> > >>
> >> > >> Oh! So it's gcc-version sensitive? That's alarming. Is this mapping correct:
> >> > >>
> >> > >> 4.8.5: WARN, eventual kernel hang
> >> > >> 6.3.1, 7.0.1: WARN, but continues working
> >> > >
> >> > > Yeah, that's correct. I find that troubling, simply because this gcc
> >> > > version has been through one hell of a lot of kernels with me. Yeah, I
> >> > > know, that doesn't exempt it from having bugs, but color me suspicious.
> >> >
> >> > I still can't hit this with a 4.8.5 build. :(
> >> >
> >> > With _RATELIMIT removed, this should, in theory, report whatever goes
> >> > negative first...
> >>
> >> I applied the other patch you posted, and built with gcc-6.3.1 to
> >> remove the gcc-4.8.5 aspect. Look below the resulting splat.
> >
> > Grr, that one has a in6_dev_getx() line missing for the first
> > increment, where things go pear shaped.
> >
> > With that added, looking at counter both before, and after incl, with a
> > trace_printk() in the exception handler showing it doing its saturate
> > thing, irqs disabled across the whole damn refcount_inc(), and even
> > booting box nr_cpus=1 for extra credit...
> >
> > HTH can that first refcount_inc() get there?
> >
> > # tracer: nop
> > #
> > # _-----=> irqs-off
> > # / _----=> need-resched
> > # | / _---=> hardirq/softirq
> > # || / _--=> preempt-depth
> > # ||| / delay
> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > # | | | |||| | |
> > systemd-1 [000] d..1 1.937284: in6_dev_getx: PRE refs.counter:3
> > systemd-1 [000] d..1 1.937295: ex_handler_refcount: *(int *)regs->cx = -1073741824
> > systemd-1 [000] d..1 1.937296: in6_dev_getx: POST refs.counter:-1073741824
>
> O_o
>
> Can you paste the disassembly of in6_dev_getx? I can't understand how
> we're landing in the exception handler.

I was hoping you'd say that.

0xffffffff816b2f72 <+0>: push %rbp
0xffffffff816b2f73 <+1>: mov %rsp,%rbp
0xffffffff816b2f76 <+4>: push %r12
0xffffffff816b2f78 <+6>: push %rbx
0xffffffff816b2f79 <+7>: incl %gs:0x7e95a2d0(%rip) # 0xd250 <__preempt_count>
0xffffffff816b2f80 <+14>: mov 0x308(%rdi),%rbx
0xffffffff816b2f87 <+21>: test %rbx,%rbx
0xffffffff816b2f8a <+24>: je 0xffffffff816b2feb <in6_dev_getx+121>
0xffffffff816b2f8c <+26>: callq *0xffffffff81c35a00
0xffffffff816b2f93 <+33>: mov %rax,%r12
0xffffffff816b2f96 <+36>: callq *0xffffffff81c35a10
0xffffffff816b2f9d <+43>: mov 0x769ad4(%rip),%rsi # 0xffffffff81e1ca78 <trace_printk_fmt.21733>
0xffffffff816b2fa4 <+50>: mov 0xf0(%rbx),%edx
0xffffffff816b2faa <+56>: mov $0xffffffff816b2f8c,%rdi
0xffffffff816b2fb1 <+63>: callq 0xffffffff81171fc0 <__trace_bprintk>
0xffffffff816b2fb6 <+68>: lock incl 0xf0(%rbx)
0xffffffff816b2fbd <+75>: js 0xffffffff816b2fbf <in6_dev_getx+77>
0xffffffff816b2fbf <+77>: lea 0xf0(%rbx),%rcx
0xffffffff816b2fc6 <+84>: (bad)
0xffffffff816b2fc8 <+86>: mov 0x769a99(%rip),%rsi # 0xffffffff81e1ca68 <trace_printk_fmt.21744>
0xffffffff816b2fcf <+93>: mov 0xf0(%rbx),%edx
0xffffffff816b2fd5 <+99>: mov $0xffffffff816b2f8c,%rdi
0xffffffff816b2fdc <+106>: callq 0xffffffff81171fc0 <__trace_bprintk>
0xffffffff816b2fe1 <+111>: mov %r12,%rdi
0xffffffff816b2fe4 <+114>: callq *0xffffffff81c35a08
0xffffffff816b2feb <+121>: decl %gs:0x7e95a25e(%rip) # 0xd250 <__preempt_count>
0xffffffff816b2ff2 <+128>: mov %rbx,%rax
0xffffffff816b2ff5 <+131>: pop %rbx
0xffffffff816b2ff6 <+132>: pop %r12
0xffffffff816b2ff8 <+134>: pop %rbp
0xffffffff816b2ff9 <+135>: retq

I don't get the section business at all, +75 looks to me like we're
gonna trap no matter what.. as we appear to be doing.

> > --- a/arch/x86/include/asm/refcount.h
> > +++ b/arch/x86/include/asm/refcount.h
> > @@ -55,6 +55,20 @@ static __always_inline void refcount_inc
> > : : "cc", "cx");
> > }
> >
> > +static __always_inline void refcount_inc_x(refcount_t *r)
> > +{
> > + unsigned long flags;
> > +
> > + local_irq_save(flags);
> > + trace_printk("PRE refs.counter:%d\n", r->refs.counter);
> > + asm volatile(LOCK_PREFIX "incl %0\n\t"
> > + REFCOUNT_CHECK_LT_ZERO
> > + : [counter] "+m" (r->refs.counter)
> > + : : "cc", "cx");
>
> Does this need an explicit "memory" added to the clobbers line here?
> This isn't present in the atomic_inc() implementation, but maybe
> something confuses gcc in this case into ignoring the "+m" marking?

I thought about adding that (hail mary), but resisted.

> > + trace_printk("POST refs.counter:%d\n", r->refs.counter);
> > + local_irq_restore(flags);
> > +}
> > +
> > static __always_inline void refcount_dec(refcount_t *r)
> > {
> > asm volatile(LOCK_PREFIX "decl %0\n\t"
> > --- a/arch/x86/mm/extable.c
> > +++ b/arch/x86/mm/extable.c
> > @@ -45,6 +45,7 @@ bool ex_handler_refcount(const struct ex
> > {
> > /* First unconditionally saturate the refcount. */
> > *(int *)regs->cx = INT_MIN / 2;
> > + trace_printk("*(int *)regs->cx = %d\n", *(int *)regs->cx);
>
> Just for fun, can you print out *(int *)regs->cx before the assignment too?

Sure, tomorrow.

-Mike