Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

From: Kees Cook
Date: Wed Aug 30 2017 - 22:27:36 EST


On Wed, Aug 30, 2017 at 7:09 PM, Mike Galbraith <efault@xxxxxx> wrote:
> On Wed, 2017-08-30 at 12:46 -0700, Kees Cook wrote:
>>
>> With CONFIG_ARCH_HAS_REFCOUNT=y and this patch, do you get an earlier splat?
>
> Yup, first gripe below.
>
> [ 2.448393] refcount_t silent saturation at skb_unref.part.36+0x12/0x1a in (haveged)[136], uid/euid: 0/0
> [ 2.454975] ------------[ cut here ]------------
> [ 2.456452] WARNING: CPU: 1 PID: 136 at kernel/panic.c:612 refcount_error_report+0xa0/0xa4
> [ 2.456452] Modules linked in: scsi_mod(E+) autofs4(E)
> [ 2.456456] CPU: 1 PID: 136 Comm: (haveged) Tainted: G E 4.13.0.g152d54a-tip-default #47
> [ 2.456456] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [ 2.456457] task: ffff880137f65640 task.stack: ffffc9000093c000
> [ 2.456458] RIP: 0010:refcount_error_report+0xa0/0xa4
> [ 2.456459] RSP: 0018:ffffc9000093fa38 EFLAGS: 00010282
> [ 2.456459] RAX: 000000000000005c RBX: ffffffff81a33864 RCX: 0000000000000830
> [ 2.456459] RDX: 0000000000000001 RSI: 0000000000000082 RDI: 0000000000000246
> [ 2.456460] RBP: ffffc9000093fb88 R08: 000000008e8d4f61 R09: 0000000000000000
> [ 2.456462] R10: 0000000000000000 R11: ffff880137f61600 R12: ffff880137f65640
> [ 2.456463] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000006
> [ 2.456464] FS: 00007f5a33a14880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> [ 2.456464] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.456464] CR2: 00007ffe37ca0c88 CR3: 0000000137f74005 CR4: 00000000001606e0
> [ 2.456467] Call Trace:
> [ 2.456478] ex_handler_refcount+0x63/0x70
> [ 2.456479] fixup_exception+0x32/0x40
> [ 2.456482] do_trap+0x11b/0x180
> [ 2.456493] do_error_trap+0x70/0xd0
> [ 2.456497] ? skb_unref.part.36+0x10/0x1a
> [ 2.456502] ? idr_get_free+0xcb/0x2f0
> [ 2.456505] invalid_op+0x1e/0x30
> [ 2.456510] RIP: 0010:skb_unref.part.36+0x12/0x1a
> [ 2.456510] RSP: 0018:ffffc9000093fc30 EFLAGS: 00010202
> [ 2.456510] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffff880137df9ee4
> [ 2.456511] RDX: ffff880137df0518 RSI: ffff880137df9e00 RDI: ffff880137df9e00
> [ 2.456511] RBP: ffff880137df9e00 R08: ffff880137e30018 R09: ffffffff816b91f0
> [ 2.456511] R10: 0000000000000017 R11: 00000000fffffff4 R12: 0000000000000000
> [ 2.456512] R13: ffff88013a4e9000 R14: 0000000000000000 R15: ffff880137e30000
> [ 2.456513] ? cleanup_uevent_env+0x10/0x10
> [ 2.456520] kfree_skb+0x3f/0xa0
> [ 2.456525] netlink_broadcast_filtered+0x2c8/0x420
> [ 2.456530] ? cleanup_uevent_env+0x10/0x10
> [ 2.456531] kobject_uevent_env+0x476/0x650
> [ 2.456540] device_add+0x41e/0x5f0
> [ 2.456550] netdev_register_kobject+0x8e/0x170
> [ 2.456553] register_netdevice+0x27a/0x3d0
> [ 2.456557] register_netdev+0x16/0x30
> [ 2.456562] loopback_net_init+0x48/0xa0
> [ 2.456567] ops_init+0x39/0x110
> [ 2.456568] setup_net+0x85/0x130
> [ 2.456569] copy_net_ns+0xb9/0x1f0
> [ 2.456571] create_new_namespaces+0x11a/0x1b0
> [ 2.456577] unshare_nsproxy_namespaces+0x55/0xa0
> [ 2.456578] SyS_unshare+0x18d/0x330
> [ 2.456581] entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 2.456584] RIP: 0033:0x7f5a320669f7
> [ 2.456585] RSP: 002b:00007ffcd1e10908 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
> [ 2.456585] RAX: ffffffffffffffda RBX: 000000000159ae48 RCX: 00007f5a320669f7
> [ 2.456586] RDX: 000000000000000b RSI: 00007ffcd1e10910 RDI: 0000000040000000
> [ 2.456588] RBP: 00007ffcd1e10cf0 R08: 0000000000000001 R09: 00000000015b8d20
> [ 2.456588] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000
> [ 2.456588] R13: 00000000ffffffff R14: 00000000ffffffff R15: 0000000000000000
> [ 2.456589] Code: 10 09 00 00 48 8b 95 80 00 00 00 49 8d 8c 24 f0 0a 00 00 41 89 c1 44 89 2c 24 48 89 de 48 c7 c7 80 70 a3 81 31 c0 e8 bd f1 05 00 <0f> ff eb 88 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89
> [ 2.456599] ---[ end trace d3215cc8334d8520 ]---

Interesting! Can you try with 633547973ffc3 ("net: convert
sk_buff.users from atomic_t to refcount_t") reverted? I'll see if
running haveged will help me trigger this on my system...

-Kees

--
Kees Cook
Pixel Security