Re: [patch v3] splice: fix race with page invalidation

From: Nick Piggin
Date: Mon Aug 04 2008 - 23:04:24 EST


On Tuesday 05 August 2008 01:29, Jamie Lokier wrote:
> Nick Piggin wrote:
> > On Saturday 02 August 2008 04:28, Miklos Szeredi wrote:
> > > On Fri, 1 Aug 2008, Nick Piggin wrote:
> > > > Well, a) it probably makes sense in that case to provide another mode
> > > > of operation which fills the data synchronously from the sender and
> > > > copys it to the pipe (although the sender might just use read/write)
> > > > And b) we could *also* look at clearing PG_uptodate as an
> > > > optimisation iff that is found to help.
> > >
> > > IMO it's not worth it to complicate the API just for the sake of
> > > correctness in the so-very-rare read error case. Users of the splice
> > > API will simply ignore this requirement, because things will work fine
> > > on ext3 and friends, and will break only rarely on NFS and FUSE.
> > >
> > > So I think it's much better to make the API simple: invalid pages are
> > > OK, and for I/O errors we return -EIO on the pipe. It's not 100%
> > > correct, but all in all it will result in less buggy programs.
> >
> > That's true, but I hate how we always (in the VM, at least) just brush
> > error handling under the carpet because it is too hard :(
> >
> > I guess your patch is OK, though. I don't see any reasons it could cause
> > problems...
>
> At least, if there are situations where the data received is not what
> a common sense programmer would expect (e.g. blocks of zeros, data
> from an unexpected time in syscall sequence, or something, or just
> "reliable except with FUSE and NFS"), please ensure it's documented in
> splice.txt or wherever.

Not quite true. Many filesystems can return -EIO, and truncate can
partially zero pages.

Basically the man page should note that until the splice API is
improved, then a) -EIO errors will be seen at the receiever, b)
the pages can see transient zeroes (this is the case with read(2)
as well, but splice has a much bigger window), and c) the sender
does not send a snapshot of data because it can still be modified
until it is recieved.

c is not too surprising for an asynchronous interface, but it is
nice to document in case people are expecting COw or something.
b and c can more or less be worked around by not doing silly things
like truncating or scribbling on data until reciever really has it.
a, I argue, should be fixed in API.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/