Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

From: Kees Cook
Date: Wed Aug 30 2017 - 15:46:54 EST


On Wed, Aug 30, 2017 at 12:19 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Wed, Aug 30, 2017 at 10:55 AM, Mike Galbraith <efault@xxxxxx> wrote:
>> On Wed, 2017-08-30 at 10:32 -0700, Kees Cook wrote:
>>> On Wed, Aug 30, 2017 at 10:13 AM, Mike Galbraith <efault@xxxxxx> wrote:
>>> > On Wed, 2017-08-30 at 09:35 -0700, Kees Cook wrote:
>>> >> On Tue, Aug 29, 2017 at 10:02 PM, Mike Galbraith <efault@xxxxxx> wrote:
>>> >> > On Tue, 2017-08-29 at 11:41 -0700, Kees Cook wrote:
>>> >> >> Can you also test with 14afee4b6092 ("net: convert sock.sk_wmem_alloc
>>> >> >> from atomic_t to refcount_t") reverted (instead of ARCH_HAS_REFCOUNT
>>> >> >> disabled)?
>>> >> >
>>> >> > Nogo.
>>> >>
>>> >> Thanks for checking!
>>> >>
>>> >> > [ 44.901930] WARNING: CPU: 5 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
>>> >>
>>> >> This is so odd if 14afee4b6092 is reverted. What is line 374 for you
>>> >> in net/netlink/af_netlink.c?
>>> >
>>> > 374 WARN_ON(atomic_read(&sk->sk_rmem_alloc));
>>> >
>>> > That line is unchanged by 14afee4b6092.
>>>
>>> Uuuuhmm. Wow, now I'm really baffled. I thought you were getting the
>>> warn from the next line with the refcount usage... I will keep
>>> digging. Thanks!
>>
>> I just double checked freshly pulled tip (rapidly moving target), and
>> it's definitely nogo with CONFIG_ARCH_HAS_REFCOUNT=y and 14afee4b6092
>> reverted.

With CONFIG_ARCH_HAS_REFCOUNT=y and this patch, do you get an earlier splat?

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index c076f710de4c..569d97a4c3e8 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -72,6 +72,8 @@ bool ex_handler_refcount(const struct
exception_table_entry *fixup,
bool zero = regs->flags & X86_EFLAGS_ZF;

refcount_error_report(regs, zero ? "hit zero" : "overflow");
+ } else {
+ refcount_error_report(regs, "silent saturation");
}

return true;

The difference between none, FULL, and x86-refcount is the saturation
semantics. none has, obviously, none, and FULL pins values either to
zero or UINT_MAX, depending on various things. x86-refcount saturates
to INT_MIN / 2, but it should only get there after overflow or
unexpected zeroing (both cases would WARN). So if there's some other
path to hitting saturation that could put some refcount_t into a state
that is different from none and FULL, and maybe that side-effect
results in the af_netlink WARN. (I can't see _how_ yet, though.)

So if the above patch does NOT throw a new WARN, then that'll be even weirder.

-Kees

--
Kees Cook
Pixel Security