Re: tty: panic in tty_ldisc_restore

From: Greg Kroah-Hartman
Date: Thu Mar 02 2017 - 16:31:50 EST


On Thu, Mar 02, 2017 at 08:30:48PM +0100, Dmitry Vyukov wrote:
> On Thu, Mar 2, 2017 at 8:27 PM, Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >> >>>>> >> >> Hello,
> >> >>>>> >> >>
> >> >>>>> >> >> Syzkaller fuzzer started crashing kernel with the following panics:
> >> >>>>> >> >>
> >> >>>>> >> >> Kernel panic - not syncing: Couldn't open N_TTY ldisc for ircomm0 --- error -12.
> >> >>>>> >> >> CPU: 0 PID: 5637 Comm: syz-executor3 Not tainted 4.9.0 #6
> >> >>>>> >> >> Hardware name: Google Google Compute Engine/Google Compute Engine,
> >> >>>>> >> >> BIOS Google 01/01/2011
> >> >>>>> >> >> ffff8801d4ba7a18 ffffffff8234d0df ffffffff00000000 1ffff1003a974ed6
> >> >>>>> >> >> ffffed003a974ece 0000000041b58ab3 ffffffff84b38180 ffffffff8234cdf1
> >> >>>>> >> >> 0000000000000000 0000000000000000 ffff8801d4ba76a8 00000000dabb4fad
> >> >>>>> >> >> Call Trace:
> >> >>>>> >> >> [<ffffffff8234d0df>] __dump_stack lib/dump_stack.c:15 [inline]
> >> >>>>> >> >> [<ffffffff8234d0df>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> >> >>>>> >> >> [<ffffffff818280d4>] panic+0x1fb/0x412 kernel/panic.c:179
> >> >>>>> >> >> [<ffffffff826bb0d4>] tty_ldisc_restore drivers/tty/tty_ldisc.c:520 [inline]
> >> >>>>> >> >> [<ffffffff826bb0d4>] tty_set_ldisc+0x704/0x8b0 drivers/tty/tty_ldisc.c:579
> >> >>>>> >> >> [<ffffffff826a3a93>] tiocsetd drivers/tty/tty_io.c:2667 [inline]
> >> >>>>> >> >> [<ffffffff826a3a93>] tty_ioctl+0xc63/0x2370 drivers/tty/tty_io.c:2924
> >> >>>>> >> >> [<ffffffff81a7a22f>] vfs_ioctl fs/ioctl.c:43 [inline]
> >> >>>>> >> >> [<ffffffff81a7a22f>] do_vfs_ioctl+0x1bf/0x1630 fs/ioctl.c:679
> >> >>>>> >> >> [<ffffffff81a7b72f>] SYSC_ioctl fs/ioctl.c:694 [inline]
> >> >>>>> >> >> [<ffffffff81a7b72f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
> >> >>>>> >> >> [<ffffffff84377941>] entry_SYSCALL_64_fastpath+0x1f/0xc2
> >> >>>>> >> >>
> >> >>>>> >> >> Kernel panic - not syncing: Couldn't open N_TTY ldisc for ptm2 --- error -12.
> >> >>>>> >> >> CPU: 0 PID: 7844 Comm: syz-executor0 Not tainted 4.9.0 #6
> >> >>>>> >> >> Hardware name: Google Google Compute Engine/Google Compute Engine,
> >> >>>>> >> >> BIOS Google 01/01/2011
> >> >>>>> >> >> ffff8801c3307a18 ffffffff8234d0df ffffffff00000000 1ffff10038660ed6
> >> >>>>> >> >> ffffed0038660ece 0000000041b58ab3 ffffffff84b38180 ffffffff8234cdf1
> >> >>>>> >> >> 0000000000000000 0000000000000000 ffff8801c33076a8 00000000dabb4fad
> >> >>>>> >> >> Call Trace:
> >> >>>>> >> >> [<ffffffff8234d0df>] __dump_stack lib/dump_stack.c:15 [inline]
> >> >>>>> >> >> [<ffffffff8234d0df>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> >> >>>>> >> >> [<ffffffff818280d4>] panic+0x1fb/0x412 kernel/panic.c:179
> >> >>>>> >> >> [<ffffffff826bb0d4>] tty_ldisc_restore drivers/tty/tty_ldisc.c:520 [inline]
> >> >>>>> >> >> [<ffffffff826bb0d4>] tty_set_ldisc+0x704/0x8b0 drivers/tty/tty_ldisc.c:579
> >> >>>>> >> >> [<ffffffff826a3a93>] tiocsetd drivers/tty/tty_io.c:2667 [inline]
> >> >>>>> >> >> [<ffffffff826a3a93>] tty_ioctl+0xc63/0x2370 drivers/tty/tty_io.c:2924
> >> >>>>> >> >> [<ffffffff81a7a22f>] vfs_ioctl fs/ioctl.c:43 [inline]
> >> >>>>> >> >> [<ffffffff81a7a22f>] do_vfs_ioctl+0x1bf/0x1630 fs/ioctl.c:679
> >> >>>>> >> >> [<ffffffff81a7b72f>] SYSC_ioctl fs/ioctl.c:694 [inline]
> >> >>>>> >> >> [<ffffffff81a7b72f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
> >> >>>>> >> >> [<ffffffff84377941>] entry_SYSCALL_64_fastpath+0x1f/0xc2
> >> >>>>> >> >>
> >> >>>>> >> >>
> >> >>>>> >> >> In all cases there is a vmalloc failure right before that:
> >> >>>>> >> >>
> >> >>>>> >> >> syz-executor4: vmalloc: allocation failure, allocated 0 of 16384
> >> >>>>> >> >> bytes, mode:0x14000c2(GFP_KERNEL|__GFP_HIGHMEM), nodemask=(null)
> >> >>>>> >> >> syz-executor4 cpuset=/ mems_allowed=0
> >> >>>>> >> >> CPU: 1 PID: 4852 Comm: syz-executor4 Not tainted 4.9.0 #6
> >> >>>>> >> >> Hardware name: Google Google Compute Engine/Google Compute Engine,
> >> >>>>> >> >> BIOS Google 01/01/2011
> >> >>>>> >> >> ffff8801c41df898 ffffffff8234d0df ffffffff00000001 1ffff1003883bea6
> >> >>>>> >> >> ffffed003883be9e 0000000041b58ab3 ffffffff84b38180 ffffffff8234cdf1
> >> >>>>> >> >> 0000000000000282 ffffffff84fd53c0 ffff8801dae65b38 ffff8801c41df4d0
> >> >>>>> >> >> Call Trace:
> >> >>>>> >> >> [< inline >] __dump_stack lib/dump_stack.c:15
> >> >>>>> >> >> [<ffffffff8234d0df>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> >> >>>>> >> >> [<ffffffff8186530f>] warn_alloc+0x21f/0x360
> >> >>>>> >> >> [<ffffffff819792c9>] __vmalloc_node_range+0x4e9/0x770
> >> >>>>> >> >> [< inline >] __vmalloc_node mm/vmalloc.c:1749
> >> >>>>> >> >> [< inline >] __vmalloc_node_flags mm/vmalloc.c:1763
> >> >>>>> >> >> [<ffffffff8197961b>] vmalloc+0x5b/0x70 mm/vmalloc.c:1778
> >> >>>>> >> >> [<ffffffff826ad77b>] n_tty_open+0x1b/0x470 drivers/tty/n_tty.c:1883
> >> >>>>> >> >> [<ffffffff826ba973>] tty_ldisc_open.isra.3+0x73/0xd0
> >> >>>>> >> >> drivers/tty/tty_ldisc.c:463
> >> >>>>> >> >> [< inline >] tty_ldisc_restore drivers/tty/tty_ldisc.c:510
> >> >>>>> >> >> [<ffffffff826bafb4>] tty_set_ldisc+0x5e4/0x8b0 drivers/tty/tty_ldisc.c:579
> >> >>>>> >> >> [< inline >] tiocsetd drivers/tty/tty_io.c:2667
> >> >>>>> >> >> [<ffffffff826a3a93>] tty_ioctl+0xc63/0x2370 drivers/tty/tty_io.c:2924
> >> >>>>> >> >> [<ffffffff81a7a22f>] do_vfs_ioctl+0x1bf/0x1630
> >> >>>>> >> >> [< inline >] SYSC_ioctl fs/ioctl.c:698
> >> >>>>> >> >> [<ffffffff81a7b72f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
> >> >>>>> >> >> [<ffffffff84377941>] entry_SYSCALL_64_fastpath+0x1f/0xc2
> >> >>>>> >> >> arch/x86/entry/entry_64.S:204
> >> >>>>> >> >>
> >> >>>>> >> >>
> >> >>>>> >> >> I've found that it's even documented in the source code, but it does
> >> >>>>> >> >> not look like a good failure mode for an allocation failure:
> >> >>>>> >> >>
> >> >>>>> >> >> static int n_tty_open(struct tty_struct *tty)
> >> >>>>> >> >> {
> >> >>>>> >> >>         struct n_tty_data *ldata;
> >> >>>>> >> >>
> >> >>>>> >> >>         /* Currently a malloc failure here can panic */
> >> >>>>> >> >>         ldata = vmalloc(sizeof(*ldata));
> >> >>>>> >> >
> >> >>>>> >> > How are you running out of vmalloc() memory?
> >> >>>>> >>
> >> >>>>> >>
> >> >>>>> >> I don't know exactly. But it does not seem to represent a problem for
> >> >>>>> >> the fuzzer.
> >> >>>>> >> Is it meant to be very hard to do?
> >> >>>>> >
> >> >>>>> > Yes, do you know of any normal way to cause it to fail?
> >> >>>>>
> >> >>>>>
> >> >>>>> I don't. But that means approximately nothing.
> >> >>>>> Do you mean that it is not possible to trigger?
> >> >>>>> Wouldn't simply creating lots of kernel resources (files, sockets,
> >> >>>>> pipes) do the trick? Or just paging in lots of memory? Even if the
> >> >>>>> process itself is chosen as the OOM kill target, it will still take
> >> >>>>> the machine down with it due to the panic while returning from the
> >> >>>>> syscall, no?
> >> >>>>
> >> >>>> I'm not saying that it's impossible, just an "almost" impossible thing
> >> >>>> to hit. Obviously you have hit it, so it can happen :)
> >> >>>>
> >> >>>> But, how to fix it? I really don't know. Unwinding a failure at this
> >> >>>> point in time is very tough, as that comment shows. Any suggestions of
> >> >>>> how it could be resolved are greatly appreciated.
> >> >>>
> >> >>> Is it possible to not shut down the old discipline in tty_set_ldisc before
> >> >>> we prepare everything for the new one:
> >> >>>
> >> >>> /* Shutdown the old discipline. */
> >> >>> tty_ldisc_close(tty, old_ldisc);
> >> >>>
> >> >>> Currently it does:
> >> >>>
> >> >>> close(old)
> >> >>> if (open(new))
> >> >>>         open(old) // assume never fails
> >> >>>
> >> >>> It looks inherently problematic.
> >> >>> Couldn't we do:
> >> >>>
> >> >>> if (open(new))
> >> >>>         return -ESOMETHING
> >> >>> close(old)
> >> >>>
> >> >>> ?
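> >> >>>
> >> >>> In tty_set_ldisc terms that would look roughly like the sketch below.
> >> >>> This is untested and only illustrates the ordering; the helper names
> >> >>> mirror the ones already used in tty_ldisc.c, locking and the
> >> >>> halt/flush steps are elided, and I don't know whether the ldisc
> >> >>> open/close callbacks actually tolerate opening the new discipline
> >> >>> while the old one is still installed:
> >> >>>
> >> >>>         new_ldisc = tty_ldisc_get(tty, disc);
> >> >>>         if (IS_ERR(new_ldisc))
> >> >>>                 return PTR_ERR(new_ldisc);
> >> >>>
> >> >>>         /* Open the new discipline before touching the old one. */
> >> >>>         retval = tty_ldisc_open(tty, new_ldisc);
> >> >>>         if (retval) {
> >> >>>                 /* Nothing to restore: the old ldisc was never closed. */
> >> >>>                 tty_ldisc_put(new_ldisc);
> >> >>>                 return retval;
> >> >>>         }
> >> >>>
> >> >>>         /* Only now tear down and release the old discipline. */
> >> >>>         tty_ldisc_close(tty, old_ldisc);
> >> >>>         tty_ldisc_put(old_ldisc);
> >> >>>         tty->ldisc = new_ldisc;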
> >> >>
> >> >>
> >> >> Or can we just kill the task? Still better than a kernel panic.
> >> >
> >> > I guess we can't get away with killing the task, as the tty will be left
> >> > in an inconsistent state and it is accessible to other tasks.
> >> > But what about creating the new ldisc first and then, if that succeeds,
> >> > destroying the old one?
> >>
> >>
> >> This is hurting us badly.
> >
> > Really? How? Are you hitting this a lot? Why now and never before?
> > Are you really out of memory?
>
>
> This crashes our test bots a lot.
> Why now... I don't have an exact answer. Probably a combination of the
> fuzzer figuring out some magic sequences of syscalls and increased memory
> consumption due to something (again, maybe due to the fuzzer figuring out
> how to eat more memory).

If the fuzzer is suddenly eating more memory, you should be seeing lots
of other problems, right? This can't be the only thing that has issues
with memory allocation failures?

thanks,

greg k-h