Re: INFO: rcu detected stall in ext4_write_checks

From: Theodore Ts'o
Date: Wed Jun 26 2019 - 14:43:19 EST


On Wed, Jun 26, 2019 at 10:27:08AM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: abf02e29 Merge tag 'pm-5.2-rc6' of git://git.kernel.org/pu..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1435aaf6a00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e5c77f8090a3b96b
> dashboard link: https://syzkaller.appspot.com/bug?extid=4bfbbf28a2e50ab07368
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11234c41a00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15d7f026a00000
>
> The bug was bisected to:
>
> commit 0c81ea5db25986fb2a704105db454a790c59709c
> Author: Elad Raz <eladr@xxxxxxxxxxxx>
> Date: Fri Oct 28 19:35:58 2016 +0000
>
> mlxsw: core: Add port type (Eth/IB) set API

Um, so this doesn't pass the laugh test.

> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10393a89a00000

It looks like the automated bisection machinery got confused by two
failures getting triggered by the same repro; the symptoms changed
over time. Initially, the failure was:

crashed: INFO: rcu detected stall in {sys_sendfile64,ext4_file_write_iter}

Later, the failure changed to something completely different, and it
occurred much earlier (before the test had even started):

run #5: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/jobs/linux/workdir/image/key" "/tmp/syz-executor216456474" "root@xxxxxxxxxxxxx:./syz-executor216456474"]: exit status 1
Connection timed out during banner exchange
lost connection

Looks like an opportunity to improve the bisection engine?
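
As a rough sketch of one such improvement (this is not how syzkaller
actually does it; the pattern list, the function names, and the assumed
log-line format are all made up for illustration), the engine could
count a run as "bad" only when it reproduces the crash being bisected,
and treat infrastructure failures and unrelated crashes as untestable,
in the same spirit as "git bisect skip":

```python
import re

# Hypothetical markers for failures of the test infrastructure itself.
# These are illustrative guesses based on the log excerpt above, not
# patterns taken from the syzkaller source.
INFRA_PATTERNS = [
    re.compile(r"basic kernel testing failed"),
    re.compile(r"failed to copy test binary to VM"),
]


def classify_run(log_line, signature):
    """Classify one test run as 'bad', 'good', or 'skip'.

    `signature` is a substring of the crash title being bisected,
    e.g. "INFO: rcu detected stall".
    """
    if any(p.search(log_line) for p in INFRA_PATTERNS):
        return "skip"  # the kernel was never actually tested
    if "crashed:" in log_line:
        # A different crash says nothing about the bug being bisected.
        return "bad" if signature in log_line else "skip"
    return "good"


def classify_commit(run_logs, signature):
    """Combine per-run verdicts into one verdict for a commit."""
    verdicts = [classify_run(line, signature) for line in run_logs]
    if "bad" in verdicts:
        return "bad"
    if "good" in verdicts:
        return "good"
    return "skip"  # no run produced usable evidence
```

With this, a commit where every run dies in the scp step comes out as
"skip" rather than "bad", so it would not steer the bisection towards
an unrelated commit. Whether a mix of clean runs and skipped runs
should really count as "good" is a policy choice; the sketch picks the
simplest rule.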

- Ted