Re: [syzbot] BUG: sleeping function called from invalid context in __might_resched

From: Fabio M. De Francesco
Date: Tue Nov 16 2021 - 04:13:28 EST


On Tuesday, November 16, 2021 9:53:53 AM CET Fabio M. De Francesco wrote:
> On Tuesday, November 16, 2021 9:09:11 AM CET syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > BUG: sleeping function called from invalid context in __might_resched
> >
> > BUG: sleeping function called from invalid context at kernel/printk/printk.c:2522
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 8755, name: syz-executor.2
> > preempt_count: 1, expected: 0
> > RCU nest depth: 0, expected: 0
> > 3 locks held by syz-executor.2/8755:
> > #0: ffff888070c9a098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
> > #1: ffff888070c9a468 (&tty->flow.lock){....}-{2:2}, at: spin_lock_irq include/linux/spinlock.h:374 [inline]
> > (&tty->flow.lock){....}-{2:2}, at: n_tty_ioctl_helper+0xb6/0x2d0 drivers/tty/tty_ioctl.c:877
> > #2: ffff888070c9a098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref+0x1d/0x80 drivers/tty/tty_ldisc.c:273
> > irq event stamp: 916
> > hardirqs last enabled at (915): [<ffffffff81beabd5>] kasan_quarantine_put+0xf5/0x210 mm/kasan/quarantine.c:220
> > hardirqs last disabled at (916): [<ffffffff8950a731>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
> > hardirqs last disabled at (916): [<ffffffff8950a731>] _raw_spin_lock_irq+0x41/0x50 kernel/locking/spinlock.c:170
> > softirqs last enabled at (0): [<ffffffff8144cf0c>] copy_process+0x1e8c/0x75a0 kernel/fork.c:2136
> > softirqs last disabled at (0): [<0000000000000000>] 0x0
> > Preemption disabled at:
> > [<0000000000000000>] 0x0
> > CPU: 1 PID: 8755 Comm: syz-executor.2 Not tainted 5.16.0-rc1-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:88 [inline]
> > dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> > __might_resched.cold+0x222/0x26b kernel/sched/core.c:9542
> > console_lock+0x17/0x80 kernel/printk/printk.c:2522
> > con_flush_chars drivers/tty/vt/vt.c:3365 [inline]
> > con_flush_chars+0x35/0x90 drivers/tty/vt/vt.c:3357
> > con_write+0x2c/0x40 drivers/tty/vt/vt.c:3296
>
> The reproducer is still triggering an issue, but this time it looks like it
> is triggered by a different path of execution.
>
> The same invalid "in_interrupt()" test is also in con_flush_chars().
>
> Let's try to remove it too...
>
> My first idea would be to replace "if (in_interrupt())" with the same
> "preempt_count() || irqs_disabled()" I used in do_con_write(). However, I
> noticed that both do_con_write() and con_flush_chars() are only called from
> inside con_write() (which, aside from calling those functions, does nothing
> else).
>
> So why not remove the if (in_interrupt()) from both of them and use if
> (preempt_count() || irqs_disabled()) just once in con_write()?
>
> I think this should be the right solution, but I prefer to go one step at a
> time.
>
> Therefore, I'll (1) use the same test in con_flush_chars() as well (it would
> be redundant if it were in con_write()), (2) wait for syzbot to confirm that
> it fixes the bug, and (3) wait for the maintainers' review and suggestions
> about whether or not to move those tests one level up.
>
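To illustrate why the in_interrupt() test does not catch the context shown in the report, here is a minimal userspace sketch. It assumes the preempt_count bit layout from include/linux/preempt.h (8 preempt bits, 8 softirq bits, 4 hardirq bits, 4 NMI bits) and models in_interrupt() as a test of the softirq/hardirq/NMI bits only. The struct mock_ctx type and the mock_* helpers are hypothetical names made up for this example; they are not kernel APIs.

```c
#include <stdbool.h>

/* Assumed bit layout of preempt_count (see include/linux/preempt.h). */
#define PREEMPT_MASK 0x000000ffu
#define SOFTIRQ_MASK 0x0000ff00u
#define HARDIRQ_MASK 0x000f0000u
#define NMI_MASK     0x00f00000u

/* Hypothetical stand-in for the per-CPU state the kernel macros read. */
struct mock_ctx {
	unsigned int preempt_count;
	bool irqs_disabled;
};

/* Models in_interrupt(): true only in softirq, hardirq or NMI context. */
static bool mock_in_interrupt(const struct mock_ctx *c)
{
	return (c->preempt_count & (SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)) != 0;
}

/* Models the replacement test: preempt_count() || irqs_disabled(). */
static bool mock_atomic_check(const struct mock_ctx *c)
{
	return c->preempt_count != 0 || c->irqs_disabled;
}
```

With the values from the report (preempt_count: 1, irqs_disabled(): 1), mock_in_interrupt() returns false because only a preempt-disable bit is set, while mock_atomic_check() returns true. That matches the observation that console_lock() is reached with a spinlock held and interrupts off.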

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

>
> ---
> Fabio M. De Francesco
>

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 7359c3e80d63..46511d1ac6ee 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -2902,7 +2902,7 @@ static int do_con_write(struct tty_struct *tty, const unsigned char *buf, int co
 	struct vt_notifier_param param;
 	bool rescan;
 
-	if (in_interrupt())
+	if (preempt_count() || irqs_disabled())
 		return count;
 
 	console_lock();
@@ -3358,7 +3358,7 @@ static void con_flush_chars(struct tty_struct *tty)
 {
 	struct vc_data *vc;
 
-	if (in_interrupt()) /* from flush_to_ldisc */
+	if (preempt_count() || irqs_disabled()) /* from flush_to_ldisc */
 		return;
 
 	/* if we race with con_close(), vt may be null */
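
For comparison, the alternative discussed above (a single test in con_write() instead of one in each callee) could look roughly like the userspace sketch below. This is not part of the patch and is not kernel code: all mock_* names and the in_atomic_context() helper are hypothetical, and the assert() in mock_console_lock() only stands in for the might_sleep() check that produced the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for kernel state. */
static unsigned int mock_preempt_count;
static bool mock_irqs_disabled;
static int console_lock_calls;

/* Models "preempt_count() || irqs_disabled()". */
static bool in_atomic_context(void)
{
	return mock_preempt_count != 0 || mock_irqs_disabled;
}

/* Models a sleeping lock: must never be taken in atomic context. */
static void mock_console_lock(void)
{
	assert(!in_atomic_context());
	console_lock_calls++;
}

/* Callees no longer carry their own context test. */
static int mock_do_con_write(int count)
{
	mock_console_lock();
	return count;
}

static void mock_con_flush_chars(void)
{
	mock_console_lock();
}

/* The only caller performs the test once, before either callee runs. */
static int mock_con_write(int count)
{
	if (in_atomic_context())
		return count;

	count = mock_do_con_write(count);
	mock_con_flush_chars();
	return count;
}
```

The sketch relies on the observation that both functions are reached only through con_write(); if another caller were added later, it would bypass the test, which is one reason to keep the per-function tests for now.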