Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

From: Jeff Layton
Date: Wed Apr 05 2017 - 15:50:02 EST


On Tue, 2017-04-04 at 10:09 -0700, Matthew Wilcox wrote:
> On Tue, Apr 04, 2017 at 12:25:46PM -0400, Jeff Layton wrote:
> > That said, I think giving more specific errors where we can is useful.
> > When your program is erroring out and writing 'I/O error' to the logs,
> > then how much time will your admins burn before they figure out that it
> > really failed because the filesystem was full?
>
> df is one of the first things I check ... a few years ago, I also learned
> to check df -i ... ;-)
>
> Anyway, the decision to simply report the last error lets us do this
> implementation:
>
> void filemap_set_wb_error(struct address_space *mapping, int err)
> {
> 	struct inode *inode = mapping->host;
> 	unsigned int wb_err;
>
> 	if (!err)
> 		return;
> 	/*
> 	 * This should be called with the error code that we want to return
> 	 * on fsync. Thus, it should always be <= 0.
> 	 */
> 	WARN_ON(err > 0 || err < -MAX_ERRNO);
>
> 	spin_lock(&inode->i_lock);
> 	wb_err = ((mapping->wb_err & ~MAX_ERRNO) + (1 << 12)) | -err;
> 	WRITE_ONCE(mapping->wb_err, wb_err);
> 	spin_unlock(&inode->i_lock);
> }
>

I like this idea of being able to store arbitrary error codes there.
That should be used judiciously, of course, but ->fsync can already
return arbitrary errors.

I plan to incorporate something like that into the next version of the
set (with judicious comments and constants).
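
For reference, here is a userspace sketch of the quoted encoding, with named constants in place of the bare 4095 and 1 << 12. The toy_* structs and functions are hypothetical stand-ins for struct address_space and struct file. Locking is omitted, and assert() plays the role of WARN_ON(). The report side reads the mapping through file->f_mapping.

```c
#include <assert.h>

/* Mirrors the kernel's MAX_ERRNO (4095): errno values fit in the low 12 bits. */
#define MAX_ERRNO 4095u
/* One step of the counter that lives above the errno bits. */
#define WB_ERR_SEQ_INC (1u << 12)

/* Hypothetical stand-ins: only the fields this scheme needs. */
struct toy_mapping { unsigned int wb_err; };
struct toy_file { struct toy_mapping *f_mapping; unsigned int f_wb_err; };

/* Record a writeback error: bump the counter, store -err in the low bits. */
static void toy_set_wb_error(struct toy_mapping *mapping, int err)
{
	if (!err)
		return;
	/* Stand-in for WARN_ON(): callers pass the negative errno fsync returns. */
	assert(err < 0 && err >= -(int)MAX_ERRNO);
	mapping->wb_err = ((mapping->wb_err & ~MAX_ERRNO) + WB_ERR_SEQ_INC) |
			  (unsigned int)-err;
}

/* Report the latest error if the value changed since the file sampled it. */
static int toy_report_wb_error(const struct toy_file *file)
{
	unsigned int wb_err = file->f_mapping->wb_err;

	if (file->f_wb_err == wb_err)
		return 0;
	return -(int)(wb_err & MAX_ERRNO);
}
```

Because the errno bits are cleared before the new code is ORed in, only the most recent error survives, while the counter still records that something changed.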

One question...is the i_lock the right way to protect this? I think we
could do this locklessly too (cmpxchg in a loop, for instance). I'm not
worried about performance here -- it's just nice to be able to call
simple stuff like this without worrying about locking.
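
A sketch of the lockless variant follows. It assumes C11 <stdatomic.h> as a userspace stand-in for the kernel's cmpxchg(). toy_mapping and toy_set_wb_error_lockless are hypothetical names. Each retry recomputes the new value from the freshest snapshot, so two racing setters can't lose a counter bump.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_ERRNO 4095u
#define WB_ERR_SEQ_INC (1u << 12)

/* Hypothetical stand-in for the mapping; wb_err is updated without a lock. */
struct toy_mapping { _Atomic unsigned int wb_err; };

static void toy_set_wb_error_lockless(struct toy_mapping *mapping, int err)
{
	unsigned int old, next;

	if (!err)
		return;
	assert(err < 0 && err >= -(int)MAX_ERRNO);

	old = atomic_load_explicit(&mapping->wb_err, memory_order_relaxed);
	do {
		/* Recompute from the latest snapshot on every attempt. */
		next = ((old & ~MAX_ERRNO) + WB_ERR_SEQ_INC) | (unsigned int)-err;
		/* On failure, 'old' is overwritten with the current value. */
	} while (!atomic_compare_exchange_weak(&mapping->wb_err, &old, next));
}
```

If the compare-exchange fails, atomic_compare_exchange_weak() reloads `old` with the current value, so the loop needs no separate re-read.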

> int filemap_report_wb_error(struct file *file)
> {
> 	struct address_space *mapping = file->f_mapping;
> 	unsigned int wb_err = READ_ONCE(mapping->wb_err);
>
> 	if (file->f_wb_err == wb_err)
> 		return 0;
> 	return -(wb_err & MAX_ERRNO);
> }
>
> That only gives us 20 bits of counter, but I think that's enough.

2^20 is 1048576, which seems a little small to me.

We may end up bumping the counter on every failed I/O. How fast can we
generate 1M failed I/Os? :)
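
To make the wraparound concrete: with 12 bits of errno in a 32-bit word, the counter gets 32 - 12 = 20 bits, so 2^20 bumps add 2^32 and wrap back to the same value. The sketch below uses toy_bump and toy_bump_n, hypothetical helpers with the same encoding as the quoted code. Suppose a file samples the value and exactly 2^20 further failures with the same errno follow. The report side then sees no change at all.

```c
#include <assert.h>

#define MAX_ERRNO 4095u
#define WB_ERR_SEQ_INC (1u << 12)

/* Hypothetical helper: same 32-bit encoding (20-bit counter, 12-bit errno). */
static unsigned int toy_bump(unsigned int wb_err, int err)
{
	assert(err < 0 && err >= -(int)MAX_ERRNO);
	return ((wb_err & ~MAX_ERRNO) + WB_ERR_SEQ_INC) | (unsigned int)-err;
}

/* Hypothetical helper: apply 'n' failed I/Os that all report the same errno. */
static unsigned int toy_bump_n(unsigned int start, int err, unsigned long n)
{
	unsigned int wb_err = start;

	while (n--)
		wb_err = toy_bump(wb_err, err);
	return wb_err;
}
```

With `sampled = toy_bump(0, -5)`, `toy_bump_n(sampled, -5, 1ul << 20)` returns exactly `sampled` again, so a comparison like the one in the report function would miss a million errors.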

With a 64-bit field we'd get 52 bits of counter, and 2^52 is
4503599627370496 (roughly 4.5 quadrillion I/Os) ... that might take a
little longer to overflow. Is it worth the cost here to ensure that
this won't occur?
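
Here is a sketch of the 64-bit variant under the same layout. The toy_* helpers are hypothetical, and errno still occupies the low 12 bits. The same 2^20 bumps that wrap the 32-bit counter leave this one far from wrapping, since it only repeats after 2^52 bumps.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095u
#define WB_ERR_SEQ_SHIFT 12
/* With a 64-bit field, 64 - 12 = 52 bits remain for the counter. */
#define WB_ERR_SEQ_INC64 (UINT64_C(1) << WB_ERR_SEQ_SHIFT)

/* Hypothetical helper: one failed I/O, 64-bit encoding. */
static uint64_t toy_bump64(uint64_t wb_err, int err)
{
	assert(err < 0 && err >= -(int)MAX_ERRNO);
	return ((wb_err & ~(uint64_t)MAX_ERRNO) + WB_ERR_SEQ_INC64) |
	       (uint64_t)-err;
}

/* Hypothetical helper: apply 'n' failed I/Os with the same errno. */
static uint64_t toy_bump64_n(uint64_t start, int err, unsigned long n)
{
	uint64_t wb_err = start;

	while (n--)
		wb_err = toy_bump64(wb_err, err);
	return wb_err;
}

/* Extract the counter portion (everything above the errno bits). */
static uint64_t toy_seq64(uint64_t wb_err)
{
	return wb_err >> WB_ERR_SEQ_SHIFT;
}
```

The obvious cost is that the field doubles from 4 to 8 bytes in every object that carries it.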

Actually...we could put this field in the inode instead of the mapping.
I know we've traditionally tracked this in the mapping, but is that
required here?

If we put this field in the inode then perhaps we can union it with
something and mitigate the cost of a larger counter...maybe in the
i_pipe union? I don't think S_ISREG inodes use anything in there, do
they?
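
Here is a rough sketch of the union idea. It is heavily simplified and entirely hypothetical: toy_inode, toy_inode_wb_err, the toy file-type enum, and the two pointer members stand in for the real struct inode union, which has more members. The accessor only hands out the error field for regular files, mirroring an S_ISREG() check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder types for pointers that share the union. */
struct toy_pipe;
struct toy_bdev;

/* Hypothetical, simplified stand-in for the file-type bits of i_mode. */
enum toy_ftype { TOY_REG, TOY_FIFO, TOY_BLK };

/*
 * Hypothetical, heavily simplified inode: the proposed error field shares
 * storage with members that only non-regular files use.
 */
struct toy_inode {
	enum toy_ftype type;
	union {
		struct toy_pipe *i_pipe;	/* FIFOs */
		struct toy_bdev *i_bdev;	/* block devices */
		uint64_t i_wb_err;		/* regular files only (proposed) */
	};
};

/* Only regular files may use the union as i_wb_err (cf. S_ISREG()). */
static uint64_t *toy_inode_wb_err(struct toy_inode *inode)
{
	assert(inode->type == TOY_REG);
	return &inode->i_wb_err;
}
```

If the union's other members are all pointers, a 64-bit counter adds no space on 64-bit kernels. On 32-bit kernels it would grow the union from 4 to 8 bytes.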
--
Jeff Layton <jlayton@xxxxxxxxxx>