Re: copy on write for splice() from file to pipe?
From: Linus Torvalds
Date: Fri Feb 10 2023 - 14:18:31 EST
On Fri, Feb 10, 2023 at 11:02 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> Second, either make splice more strict or add a new "strict splice"
> variant. Strict splice only completes when it can promise that writes
> to the source that start after strict splice's completion won't change
> what gets written to the destination.
The thing is, I think your "strict splice" is pointless and wrong.
It's pointless, because it simply means that it won't perform well.
And since the whole point of splice was performance, it's wrong.
I really think the whole "source needs to be stable" is barking up the
wrong tree.
You are pointing fingers at splice().
And I think that's wrong.
We should point the fingers at either the _user_ of splice - as Jeremy
Allison has done a couple of times - or we should point them at the sink
that cannot deal with unstable sources.
Because that whole "source is unstable" is what allows for that higher
performance. The moment you start requiring stability, you _will_ lose
it. You will have to lock the page, you'll have to unmap it from any
shared mappings, etc., etc. And even if there are no writers, or no
current mappers, all that effort to make sure that is the case is
actually fairly expensive.
So I would instead suggest a different approach entirely, with several
different steps:
- make sure people are *aware* of this all.
Maybe this thread raised some awareness of it for some people, but
more realistically - maybe we can really document this whole issue
somewhere much more clearly
- it sounds like the particular user in question (samba) already very
much has a reasonable model for "I have exclusive access to this" that
just wasn't used
- and finally, I do think it might make sense for the networking
people to look at how the networking side works with 'sendpage()'.
Because I really think that your "strict splice" model would just mean
that now the kernel would have to add not just a memcpy, but also a
new allocation for that new stable buffer for the memcpy, and that
would all just be very very pointless.
Alternatively, it would require some kind of nasty hard locking
together with other limitations on what can be done by non-splice
users.
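The cost of a stable copy can be illustrated with a hypothetical helper, `copy_then_overwrite`, that does what a "strict splice" would have to do internally: allocate a private buffer and memcpy the file data into it before handing it to the pipe. This is a sketch under the same assumptions as a plain read()+write() copy loop (Linux, Python 3.x), not a proposed kernel interface. The result is deterministic, but only because the data was copied.

```python
import os
import tempfile


def copy_then_overwrite(length: int = 8) -> bytes:
    """Like splice_then_overwrite(), but through a private buffer: the
    pipe contents are fixed once the copy is made, at the price of an
    extra allocation and memcpy."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.pwrite(fd, b"A" * length, 0)
        r, w = os.pipe()
        try:
            # os.pread() allocates a new bytes object and copies into it:
            # this is the "new allocation" plus "memcpy".
            snapshot = os.pread(fd, length, 0)
            # write() to a pipe copies the bytes into the pipe's own buffer.
            os.write(w, snapshot)
            # A later write to the file can no longer affect the pipe.
            os.pwrite(fd, b"B" * length, 0)
            return os.read(r, length)
        finally:
            os.close(r)
            os.close(w)
```

Here `copy_then_overwrite()` returns `b"AAAAAAAA"` regardless of the later write, which is the stability a strict variant would promise, paid for with copies that zero-copy splice exists to avoid.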
Linus