Re: copy on write for splice() from file to pipe?

From: Stefan Metzmacher
Date: Fri Feb 10 2023 - 14:43:57 EST


On 10.02.23 at 20:27, Jeremy Allison wrote:
On Fri, Feb 10, 2023 at 11:18:05AM -0800, Linus Torvalds via samba-technical wrote:

We should point the fingers at either the _user_ of splice - as Jeremy
Allison has done a couple of times - or we should point it at the sink
that cannot deal with unstable sources.
....
- it sounds like the particular user in question (samba) already very
much has a reasonable model for "I have exclusive access to this" that
just wasn't used

Having said that, I just had a phone discussion with Ralph Boehme
on the Samba Team, who has been following along with this in
read-only mode, and he did point out one case I had missed.

1). Client opens the file with a lease. Hurrah, we think we can use splice()!
2). Client writes into file.
3). Client calls SMB_FLUSH to ensure data is on disk.
4). Client reads the data just written to ensure it's good.
5). Client overwrites the previously written data.

Now, when the client issues the read request in (4), if we
zero-copy it using splice(), I don't think there's a way
we get notified when the data has finally left the
system and the spliced memory in the buffer cache
is safe to be overwritten by the write in (5).

So the read in (4) could potentially return the data
written in (5), if the buffer cache mapped memory has
not yet been sent out over the network.
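A minimal sketch of steps (2) through (5) on a Linux host, under stated assumptions: the helper name splice_then_overwrite() is hypothetical, fsync() stands in for SMB_FLUSH, and a local pipe stands in for a socket whose data has not yet left the host. It returns the bytes that the pending "send" for the read in (4) finally carries.

```c
#define _GNU_SOURCE /* for splice() and loff_t */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replays steps (2)-(5) against a temporary file.
 * On success fills out[] with the 4 bytes the pending "send" carries and
 * returns 0; returns -1 if any syscall fails (e.g. splice() is blocked). */
int splice_then_overwrite(char out[5])
{
    char path[] = "/tmp/splice-demo-XXXXXX";
    int p[2] = { -1, -1 };
    loff_t off = 0;
    int rc = -1;

    memset(out, 0, 5);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file stays alive until fd is closed */

    /* (2) + (3): client writes, then SMB_FLUSH (modelled as fsync()). */
    if (pwrite(fd, "OLD!", 4, 0) != 4 || fsync(fd) != 0)
        goto out;

    /* (4): serve the read zero-copy by splicing the file range into a pipe.
     * The pipe plays the role of a socket whose data is still queued. */
    if (pipe(p) != 0)
        goto out;
    if (splice(fd, &off, p[1], NULL, 4, 0) != 4)
        goto out;

    /* (5): client overwrites the same range before the data has left. */
    if (pwrite(fd, "NEW!", 4, 0) != 4)
        goto out;

    /* Drain the pipe: these are the bytes that would go on the wire. */
    if (read(p[0], out, 4) != 4)
        goto out;
    out[4] = '\0';
    rc = 0;
out:
    if (p[0] >= 0)
        close(p[0]);
    if (p[1] >= 0)
        close(p[1]);
    close(fd);
    return rc;
}
```

Where the pipe buffer holds a reference to the page-cache page rather than a private copy, the result may be "NEW!" instead of "OLD!", i.e. the read in (4) returns the data from (5). The exact outcome may depend on kernel version and filesystem, so treat this as an illustration rather than a guaranteed reproducer.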

That is certainly unexpected behavior for the client,
even if the client leased the file.

If that's the case, then splice() is unusable for
Samba even in the leased file case.

I think we just need some coordination in userspace.
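One possible shape for that userspace coordination, sketched under assumptions: the server keeps, per open file, the byte ranges it has spliced out but not yet seen leave the host, and a write checks them first. All names here (struct inflight_set, inflight_add(), inflight_remove(), write_must_wait()) are hypothetical and not Samba APIs; the event that allows a range to be removed is assumed to exist.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-file bookkeeping: ranges handed to splice() whose
 * pages may still be referenced by the network stack. */
#define MAX_INFLIGHT 16

struct inflight_range {
    off_t start;
    size_t len;
};

struct inflight_set {
    struct inflight_range r[MAX_INFLIGHT];
    size_t n;
};

/* Half-open overlap test: [a, a+alen) vs [b, b+blen); empty ranges never overlap. */
static bool ranges_overlap(off_t a, size_t alen, off_t b, size_t blen)
{
    if (alen == 0 || blen == 0)
        return false;
    return a < b + (off_t)blen && b < a + (off_t)alen;
}

/* Record a range just spliced out. Returns false if the table is full,
 * in which case the caller should serve the read by copying instead. */
bool inflight_add(struct inflight_set *s, off_t start, size_t len)
{
    if (s->n == MAX_INFLIGHT)
        return false;
    s->r[s->n].start = start;
    s->r[s->n].len = len;
    s->n++;
    return true;
}

/* Drop a range once the send is known to be complete (assumed signal). */
void inflight_remove(struct inflight_set *s, off_t start, size_t len)
{
    for (size_t i = 0; i < s->n; i++) {
        if (s->r[i].start == start && s->r[i].len == len) {
            s->r[i] = s->r[--s->n]; /* swap-remove; order does not matter */
            return;
        }
    }
}

/* A write overlapping any in-flight range must be deferred. */
bool write_must_wait(const struct inflight_set *s, off_t start, size_t len)
{
    for (size_t i = 0; i < s->n; i++)
        if (ranges_overlap(s->r[i].start, s->r[i].len, start, len))
            return true;
    return false;
}
```

In the scenario above, the write in (5) would call write_must_wait() and defer itself while the range spliced for (4) is still in flight. The weak point is inflight_remove(): without a completion signal from the kernel, the server cannot know when to call it.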

What might be helpful in addition would be some kind of
notification that none of the pages are still in use by the
network layer. IORING_OP_SENDMSG_ZC already supports such a
notification, so maybe we can build something similar.
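For reference, with the io_uring zero-copy sends (IORING_OP_SEND_ZC / IORING_OP_SENDMSG_ZC), as we understand the documented behaviour, the normal result CQE carries IORING_CQE_F_MORE when a second CQE flagged IORING_CQE_F_NOTIF will follow, and the buffer should not be modified until that notification arrives. The sketch below models only that bookkeeping. The struct names and zc_send_on_cqe() are hypothetical, and the two flag values are copied locally from <linux/io_uring.h> so it builds without liburing or kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values of IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from <linux/io_uring.h>. */
#define CQE_F_MORE  (1U << 1)
#define CQE_F_NOTIF (1U << 3)

/* Minimal stand-in for the struct io_uring_cqe fields used here. */
struct cqe {
    uint64_t user_data;
    int32_t res;
    uint32_t flags;
};

/* Per-send state for one buffer passed to a zero-copy send. */
struct zc_send {
    bool result_seen;   /* result CQE (bytes sent or -errno) arrived */
    bool notif_pending; /* result CQE had F_MORE: a notification follows */
    bool buffer_free;   /* kernel no longer references the buffer */
    int32_t result;
};

/* Feed one completion for this send. Returns true once the buffer may
 * safely be modified again. */
bool zc_send_on_cqe(struct zc_send *s, const struct cqe *c)
{
    if (c->flags & CQE_F_NOTIF) {
        /* Notification CQE: the network stack has released the pages. */
        s->notif_pending = false;
        s->buffer_free = true;
    } else {
        /* Result CQE for the send itself. */
        s->result_seen = true;
        s->result = c->res;
        s->notif_pending = (c->flags & CQE_F_MORE) != 0;
        /* Without F_MORE no notification follows; the buffer is free now. */
        if (!s->notif_pending)
            s->buffer_free = true;
    }
    return s->buffer_free;
}
```

A splice()-based path would need an equivalent "pages released" event per spliced range; that event is what the message above suggests building.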

Maybe this thread raised some awareness of it for some people, but
more realistically - maybe we can really document this whole issue
somewhere much more clearly

Complete comprehensive documentation on this would
be extremely helpful (to say the least :-).

Yes, good documentation is always good :-)