Re: copy on write for splice() from file to pipe?

From: Stefan Metzmacher
Date: Fri Feb 10 2023 - 14:54:58 EST


On 10.02.23 at 20:42, Linus Torvalds wrote:
On Fri, Feb 10, 2023 at 11:27 AM Jeremy Allison <jra@xxxxxxxxx> wrote:

1). Client opens file with a lease. Hurrah, we think we can use splice() !
2). Client writes into file.
3). Client calls SMB_FLUSH to ensure data is on disk.
4). Client reads the data just written to ensure it's good.
5). Client overwrites the previously written data.

Now when the client issues (4), the read request, if we
zero-copy using splice() - I don't think there's a way
for us to get notified when the data has finally left the
system and the mapped splice memory in the buffer cache
is safe to overwrite by the write (5).
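
To make the concern concrete, here is a minimal sketch of steps (4) and (5)
on a local file: the range is spliced into a pipe, the same range is then
overwritten, and only afterwards is the pipe drained. The helper name
splice_then_overwrite() is hypothetical, and the sketch assumes Linux and
Python 3.10 or later (for os.splice). Whether the drained bytes are the old
or the new contents depends on whether the pipe holds a reference to the
page cache pages or a copy, so treat the result as something to observe
rather than a guarantee.

```python
import os
import tempfile


def splice_then_overwrite(old: bytes, new: bytes) -> bytes:
    """Splice a file range into a pipe, overwrite that range, drain the pipe.

    Hypothetical helper for illustration only. Returns the bytes that a
    reader of the pipe sees after the overwrite has happened.
    """
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    fd, path = tempfile.mkstemp()
    rpipe, wpipe = os.pipe()
    try:
        # Steps (2) and (3): write the initial data and flush it.
        os.write(fd, old)
        os.fsync(fd)
        # Step (4): the "read". Move the file range into the pipe.
        spliced = os.splice(fd, wpipe, len(old), offset_src=0)
        # Step (5): overwrite the same range while the data is still queued.
        os.pwrite(fd, new, 0)
        # Only now does the data "leave the system".
        return os.read(rpipe, spliced)
    finally:
        os.close(rpipe)
        os.close(wpipe)
        os.close(fd)
        os.unlink(path)
```

Calling splice_then_overwrite(b"A" * 4096, b"B" * 4096) and checking which
byte comes back shows whether the queued read was affected by the later
write on a given kernel and filesystem.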

Well, but we know that either:

(a) the client has already gotten the read reply, and does the write
afterwards. So (4) has already not just left the network stack, but
actually made it all the way to the client.

OR

(b) (4) and (5) clearly aren't ordered on the client side (ie your
"client" is not one single thread, and did an independent read and
overlapping write), and the client can't rely on one happening before
the other _anyway_.

So if it's (b), then you might as well do the write first, because
there's simply no ordering between the two. If you have a concurrent
read and a concurrent write to the same file, the read result is going
to be random anyway.

I guess that's true; most clients won't have a problem.

However, in theory it's possible that a client uses a feature
called compounding, which means two requests are batched on the
way to the server, processed sequentially there, and the responses
are batched on the way back again.

But we already have detection for that and the existing code also avoids
sendfile() in that case.
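
As a rough illustration of that rule (not the actual Samba code), the
decision can be sketched as follows. ReadRequest, its fields and
choose_read_path() are hypothetical names made up for this example; the only
point taken from the discussion is that a read that is part of a compound
chain falls back to a copying read instead of sendfile().

```python
from dataclasses import dataclass


@dataclass
class ReadRequest:
    # Hypothetical fields, for illustration only.
    in_compound_chain: bool  # request arrived batched with other requests
    sendfile_enabled: bool   # zero-copy reads are allowed by configuration


def choose_read_path(req: ReadRequest) -> str:
    """Return "sendfile" for the zero-copy path, otherwise "copy"."""
    if req.in_compound_chain:
        # A later request in the same batch may overwrite the range before
        # the batched response has left the server, so copy the data out.
        return "copy"
    if req.sendfile_enabled:
        return "sendfile"
    return "copy"
```

With this sketch, a compounded read followed by an overlapping write always
gets a private copy of the data, while a standalone read can still take the
zero-copy path.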

metze