Re: [PATCH] fs: sendfile handles O_NONBLOCK of out_fd

From: Andrei Vagin
Date: Sun May 08 2022 - 15:02:34 EST


On Sat, May 07, 2022 at 02:52:24PM -0700, Andrew Morton wrote:
> On Mon, 2 May 2022 00:01:46 -0700 Andrei Vagin <avagin@xxxxxxxxx> wrote:
>
> > Andrew, could you take a look at this patch?
> >
> > Here is a small reproducer for the problem:
> >
> > #define _GNU_SOURCE /* See feature_test_macros(7) */
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <errno.h>
> > #include <sys/stat.h>
> > #include <sys/types.h>
> > #include <sys/sendfile.h>
> >
> >
> > #define FILE_SIZE (1UL << 30)
> > int main(int argc, char **argv) {
> > 	int p[2], fd;
> >
> > 	if (pipe2(p, O_NONBLOCK))
> > 		return 1;
> >
> > 	fd = open(argv[1], O_RDWR | O_TMPFILE, 0666);
> > 	if (fd < 0)
> > 		return 1;
> > 	ftruncate(fd, FILE_SIZE);
> >
> > 	if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) {
> > 		fprintf(stderr, "FAIL\n");
> > 	}
> > 	if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) {
> > 		fprintf(stderr, "FAIL\n");
> > 	}
> > 	return 0;
> > }
> >
> > It worked before b964bf53e540, it is stuck after b964bf53e540, and it
> > works again with this fix.
>
> Thanks. How did b964bf53e540 cause this? do_splice_direct()
> accidentally does the right thing even when SPLICE_F_NONBLOCK was not
> passed?

do_splice_direct() ends up calling pipe_write(), which handles O_NONBLOCK
itself. Here is a trace log from the reproducer:

1)              | __x64_sys_sendfile64() {
1)              |   do_sendfile() {
1)              |     __fdget()
1)              |     rw_verify_area()
1)              |     __fdget()
1)              |     rw_verify_area()
1)              |     do_splice_direct() {
1)              |       rw_verify_area()
1)              |       splice_direct_to_actor() {
1)              |         do_splice_to() {
1)              |           rw_verify_area()
1)              |           generic_file_splice_read()
1)  + 74.153 us |         }
1)              |         direct_splice_actor() {
1)              |           iter_file_splice_write() {
1)              |             __kmalloc()
1)     0.148 us |             pipe_lock();
1)     0.153 us |             splice_from_pipe_next.part.0();
1)     0.162 us |             page_cache_pipe_buf_confirm();
... 16 times
1)     0.159 us |             page_cache_pipe_buf_confirm();
1)              |             vfs_iter_write() {
1)              |               do_iter_write() {
1)              |                 rw_verify_area()
1)              |                 do_iter_readv_writev() {
1)              |                   pipe_write() {
1)              |                     mutex_lock()
1)     0.153 us |                     mutex_unlock();
1)     1.368 us |                   }
1)     1.686 us |                 }
1)     5.798 us |               }
1)     6.084 us |             }
1)     0.174 us |             kfree();
1)     0.152 us |             pipe_unlock();
1)  + 14.461 us |           }
1)  + 14.783 us |         }
1)     0.164 us |         page_cache_pipe_buf_release();
... 16 times
1)     0.161 us |         page_cache_pipe_buf_release();
1)              |         touch_atime()
1)  + 95.854 us |       }
1)  + 99.784 us |     }
1) ! 107.393 us |   }
1) ! 107.699 us | }

>
> I assume that Al will get to this. Meanwhile I can toss it
> into linux-next to get some exposure and so it won't be lost.
>