Re: POSIX violation by writeback error

From: J. Bruce Fields
Date: Tue Sep 04 2018 - 12:12:07 EST


On Tue, Sep 04, 2018 at 11:44:20AM -0400, Jeff Layton wrote:
> On Tue, 2018-09-04 at 22:56 +0800, 焦晓冬 wrote:
> > A practical and concrete example may be:
> > A disk cleaner program first searches for garbage files that won't be used
> > anymore, saves the list in a file (open()-write()-close()), and waits for the
> > user to confirm the list of files to be removed. A writeback error occurs
> > and the related page/inode/address_space gets evicted while the user is
> > taking a long time to think about it. Finally, the user hits enter and the
> > cleaner begins to open() and read() the list again. But what gets removed is
> > the old list of files that was generated several months ago...
> >
> > Another example may be:
> > An email editor and a busy mail sender. A well-written mail to my boss is
> > composed in this email editor and saved in a file (open()-write()-close()).
> > The mail sender gets notified with the path of the mail file so it can queue
> > it and send it later. A writeback error occurs and the related
> > page/inode/address_space gets evicted while the mail is still waiting in the
> > queue of the mail sender. Finally, the mail file is open()ed and read() by
> > the sender, but what is sent is the mail to my girlfriend that was composed
> > yesterday...
> >
> > In both cases, the files are not meant to be persisted onto the disk, so
> > fsync() is not likely to be called.
> >
>
> So at what point are you going to give up on keeping the data? The
> fundamental problem here is an open-ended commitment. We (justifiably)
> avoid those in kernel development because it might leave the system
> without a way out of a resource crunch.

Well, I think the point was that in the above examples you'd prefer that
the read just fail--no need to keep the data. A bit marking the file
(or even the entire filesystem) unreadable would satisfy posix, I guess.
Whether that's practical, I don't know.
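
For illustration, the write-then-read-later pattern both examples rely on
looks roughly like this (a minimal sketch in C; the helper names are made up
and error handling is abbreviated):

/* Write a file without fsync(), then re-open and read it much later.
 * Any writeback error in between is seen by neither step. */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

static int save_list(const char *path, const char *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	ssize_t ret = write(fd, buf, len);	/* data only reaches the page cache */
	close(fd);				/* no fsync(): writeback may still fail */
	return ret == (ssize_t)len ? 0 : -1;
}

static ssize_t load_list(const char *path, char *buf, size_t len)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* If the dirty pages were dropped after a writeback error, this read
	 * silently returns whatever made it to disk -- possibly old contents. */
	ssize_t ret = read(fd, buf, len);
	close(fd);
	return ret;
}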

> > - If the following read() can be served by a page in memory, just return the
> > data. If the following read() cannot be served by a page in memory and the
> > inode/address_space has a writeback error mark, return EIO.
> > If there is a writeback error on the file, and the requested data cannot be
> > served by a page in memory, it means we are reading a (partially) corrupted
> > (out-of-date) file. Receiving an EIO is expected.
> >
>
> No, an error on read is not expected there. Consider this:
>
> Suppose the backend filesystem (maybe an NFSv3 export) is really r/o,
> but was mounted r/w. An application queues up a bunch of writes that of
> course can't be written back (they get EROFS or something when they're
> flushed back to the server), but that application never calls fsync.
>
> A completely unrelated application is running as a user that can open
> the file for read, but not r/w. It then goes to open and read the file
> and then gets EIO back or maybe even EROFS.
>
> Why should that application (which did zero writes) have any reason to
> think that the error was due to prior writeback failure by a completely
> separate process? Does EROFS make sense when you're attempting to do a
> read anyway?
>
> Moreover, what is that application's remedy in this case? It just wants
> to read the file, but may not be able to even open it for write to issue
> an fsync to "clear" the error. How do we get things moving again so it
> can do what it wants?
>
> I think your suggestion would open the floodgates for local DoS attacks.
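
A sketch of the reader side of that scenario under the proposed semantics
(the path is hypothetical, and whether EIO or EROFS comes back is exactly the
open question above):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd = open("/mnt/export/report.txt", O_RDONLY);	/* read-only access */
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (read(fd, buf, sizeof(buf)) < 0) {
		/* Fails because some other process's writeback failed earlier;
		 * this process cannot open the file for write to fsync() and
		 * "clear" the error. */
		perror("read");
	}
	close(fd);
	return 0;
}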

Do we really care about processes with write permissions (even only
local client-side write permissions) being able to DoS readers? In
general readers kinda have to trust writers.
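
For contrast, the pattern the thread assumes careful applications already
follow is to fsync() before trusting the data, so writeback errors are
reported to the writer rather than left for later readers (minimal sketch,
names invented):

#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

static int save_file_safely(const char *path, const char *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len)
		goto fail;
	if (fsync(fd) != 0)	/* writeback errors (EIO, EROFS, ...) surface here */
		goto fail;
	return close(fd);
fail:
	close(fd);
	return -1;
}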

--b.