Re: [PATCH v3] fs/splice: don't block splice_direct_to_actor() after data was read

From: Max Kellermann
Date: Tue Jun 04 2024 - 15:24:38 EST


On Tue, Jun 4, 2024 at 3:27 PM Jan Kara <jack@xxxxxxx> wrote:
> OK, so that was not clear to me (and this may well be just my ignorance of
> networking details). Do you say that your patch changes the behavior only
> for this cornercase? Even if the socket fd is blocking? AFAIU with your
> patch we'd return short write in that case as well (roughly 64k AFAICT
> because that's the amount the internal splice pipe will take) but currently
> we block waiting for more space in the socket bufs?

My patch changes only the file-read side, not the socket-write side.
It sets IOCB_NOWAIT for reading from the file once some data has
already been read, just like filemap_read() does. Therefore, it does
not matter whether the socket is non-blocking.

But thanks for the reply - this was very helpful input for me because
I have to admit that part of my explanation was wrong:
I misunderstood how sending to a blocking socket works. I thought that
send() and sendfile() would return after sending at least one byte
(only recv() works that way), but in fact both block until everything
has been submitted. That is a rather embarrassing misunderstanding of
socket basics, but, uh, it just shows I've never really used blocking
sockets!

That means my patch can indeed change the behavior of sendfile() in a
way that might surprise (badly written) applications, so it should NOT
be merged as-is.
Your concerns were correct; thanks again!
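
For illustration, an application that tolerates short sendfile()
returns would loop roughly like the sketch below. The names
sendfile_all() and demo_sendfile_all() are hypothetical and not part of
the patch; the demo assumes a Linux system where tmpfile() and an
AF_UNIX socketpair are available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send `count` bytes of in_fd, starting at *offset,
 * to out_fd, retrying after short transfers.  Returns the number of bytes
 * sent (possibly fewer than `count`), or -1 with errno set if nothing
 * could be sent at all.
 */
ssize_t sendfile_all(int out_fd, int in_fd, off_t *offset, size_t count)
{
    size_t done = 0;

    assert(offset != NULL);
    while (done < count) {
        ssize_t n = sendfile(out_fd, in_fd, offset, count - done);

        if (n > 0) {
            done += (size_t)n;  /* short transfer: just keep going */
            continue;
        }
        if (n == 0)
            break;              /* end of file on in_fd */
        if (errno == EINTR)
            continue;
        if (done == 0)
            return -1;          /* no progress, report the error */
        break;                  /* e.g. EAGAIN: report partial progress */
    }
    return (ssize_t)done;
}

/* Usage example: push a small temporary file through a socketpair. */
ssize_t demo_sendfile_all(void)
{
    static const char msg[] = "hello, splice";
    const size_t len = sizeof(msg) - 1;
    char buf[sizeof(msg)];
    int sv[2];
    off_t off = 0;
    ssize_t got = -1;
    FILE *f = tmpfile();

    if (f == NULL)
        return -1;
    if (fwrite(msg, 1, len, f) != len || fflush(f) != 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        fclose(f);
        return -1;
    }
    if (sendfile_all(sv[0], fileno(f), &off, len) == (ssize_t)len)
        got = read(sv[1], buf, sizeof(buf));
    if (got == (ssize_t)len && memcmp(buf, msg, len) != 0)
        got = -1;
    close(sv[0]);
    close(sv[1]);
    fclose(f);
    return got;
}
```

An application written this way does not care whether a single
sendfile() call transfers everything or only part of the range; one
that checks only for -1 and otherwise assumes completion is the kind
that would be surprised.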

That leaves me wondering how to solve this. Of course, io_uring is the
proper solution, but that part of my software isn't ready for io_uring
yet.

I could change this to only use IOCB_NOWAIT if the destination is
non-blocking, but something about this sounds wrong - it changes the
read side just because the write side is non-blocking.
We can't change the default behavior, for fear of breaking
applications; but could we have a per-file flag that lets applications
opt into partial reads/writes? This would be useful for all I/O on
regular files (and sockets and everything else). There would be no
guarantees; the flag would merely allow the kernel to use relaxed
semantics for applications that can deal with partial I/O.
Maybe I'm overthinking things and I should just fast-track full
io_uring support in my code...
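
For plain file reads, a per-call opt-in of this kind already exists:
preadv2() with RWF_NOWAIT. It does not help with sendfile(), which has
no flags argument, but it shows what "relaxed semantics" look like from
userspace. A rough sketch follows; pread_relaxed() and
demo_pread_relaxed() are hypothetical names, and it assumes a libc that
exposes preadv2() and RWF_NOWAIT.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to read without waiting for disk I/O.
 * If the kernel or filesystem cannot do that right now, *fell_back is
 * set to 1 and a normal blocking pread() is used instead.  A real event
 * loop would hand the read to a worker thread at that point rather than
 * block; the fallback here only keeps the sketch short.
 */
ssize_t pread_relaxed(int fd, void *buf, size_t len, off_t off,
                      int *fell_back)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t n = preadv2(fd, &iov, 1, off, RWF_NOWAIT);

    *fell_back = 0;
    if (n >= 0)
        return n;               /* may be a short read */
    if (errno != EAGAIN && errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;              /* genuine error */
    *fell_back = 1;
    return pread(fd, buf, len, off);
}

/* Usage example: read back data that was just written to a temp file. */
ssize_t demo_pread_relaxed(void)
{
    static const char msg[] = "page cache";
    const size_t len = sizeof(msg) - 1;
    char buf[sizeof(msg)];
    int fell_back;
    ssize_t n;
    FILE *f = tmpfile();

    if (f == NULL)
        return -1;
    if (fwrite(msg, 1, len, f) != len || fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    n = pread_relaxed(fileno(f), buf, len, 0, &fell_back);
    if (n == (ssize_t)len && memcmp(buf, msg, len) != 0)
        n = -1;
    fclose(f);
    return n;
}
```

The caller has to accept both a short count and the "not now" case,
which is exactly the contract a per-file opt-in flag would impose.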

Max