Re: tip -ENOBOOT - bisected to locking/refcounts, x86/asm: Implement fast refcount overflow protection

From: Kees Cook
Date: Tue Aug 29 2017 - 11:58:20 EST


On Tue, Aug 29, 2017 at 1:50 AM, Mike Galbraith <efault@xxxxxx> wrote:
> Take 2 of KVM bisect as you work fingered $subject. Take 1 was stymied
> by build dependencies (aa5d1b81, df340524) which I foolishly tried to
> skip, leading git bisect to end up handing me a list of commits that
> might be busted. During take 2, I added those two as required.
>
> Symptom is a few splats as below, with box finally hanging. Network
> comes up, but neither ssh nor console login is possible.
>
> [ 4.105048] ------------[ cut here ]------------
> [ 4.106072] WARNING: CPU: 4 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0
> [ 4.107969] Modules linked in: autofs4(E)
> [ 4.109328] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G E 4.13.0.g44e89e4-tip-default #27
> [ 4.111075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [ 4.114119] task: ffff88018ee743c0 task.stack: ffffc90000cc0000
> [ 4.115481] RIP: 0010:netlink_sock_destruct+0x82/0xa0
> [ 4.116698] RSP: 0018:ffff880246103eb0 EFLAGS: 00010206
> [ 4.117997] RAX: 0000000000000300 RBX: ffff880236f1f000 RCX: 000077ff80000000
> [ 4.120657] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
> [ 4.123145] RBP: ffff880236f1f000 R08: 000400010000b630 R09: 0000b6290000b621
> [ 4.125139] R10: 000400010000b630 R11: 0000b6290000b621 R12: 0000000000000202
> [ 4.126866] R13: ffffffff81cf1440 R14: ffff88018ee743c0 R15: ffffffff815e0fd0
> [ 4.128731] FS: 0000000000000000(0000) GS:ffff880246100000(0000) knlGS:0000000000000000
> [ 4.130206] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4.131581] CR2: 000055c7ab255df0 CR3: 0000000236fd9001 CR4: 00000000001606e0
> [ 4.133066] Call Trace:
> [ 4.133919] <IRQ>
> [ 4.134836] __sk_destruct+0x21/0x190
> [ 4.136016] rcu_process_callbacks+0x23e/0x880
> [ 4.137050] ? rebalance_domains+0x182/0x2b0
> [ 4.138050] __do_softirq+0xc8/0x287
> [ 4.139174] irq_exit+0xd5/0xe0
> [ 4.140252] smp_apic_timer_interrupt+0x64/0x140
> [ 4.141880] apic_timer_interrupt+0x96/0xa0
> [ 4.143290] </IRQ>
> [ 4.144214] RIP: 0010:native_safe_halt+0x2/0x10
> [ 4.145990] RSP: 0018:ffffc90000cc3ed8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> [ 4.147449] RAX: ffffffff816d4820 RBX: ffff88018ee743c0 RCX: 0000000000000000
> [ 4.148626] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 4.151061] RBP: 0000000000000004 R08: 000000008e8d302a R09: 0000000000000000
> [ 4.153687] R10: 0000000000000006 R11: 0000000000000005 R12: ffff88018ee743c0
> [ 4.155587] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 4.157462] ? __sched_text_end+0x5/0x5
> [ 4.158869] default_idle+0x18/0x110
> [ 4.160348] do_idle+0x15e/0x1f0
> [ 4.161734] cpu_startup_entry+0x5f/0x70
> [ 4.163211] start_secondary+0x14c/0x180
> [ 4.164716] secondary_startup_64+0xa5/0xa5
> [ 4.165730] Code: 00 00 85 c0 75 25 8b 83 44 01 00 00 85 c0 75 10 48 83 bb e0 02 00 00 00 75 02 5b c3 0f ff 5b c3 0f ff 0f 1f 80 00 00 00 00 eb e5 <0f> ff eb d7 48 89 de 48 c7 c7 e0 e6 ab 81 31 c0 5b e9 25 ca af
> [ 4.168787] ---[ end trace 79aa32f0718d3fc7 ]---

Ah-ha, I've now found the tip-bot commit that disables the x86 refcount
implementation. Can you boot with CONFIG_REFCOUNT_FULL=y?

Can you send me your .config and details on the machine?

Also, what is above the "cut here" line (which is very misleading, as
things above that line tend to be very important)?

Thanks!

-Kees

--
Kees Cook
Pixel Security