Re: [PATCH v4 15/27] fs: retrofit old error reporting API onto new infrastructure

From: Jan Kara
Date: Tue May 23 2017 - 05:06:08 EST


On Mon 22-05-17 15:09:33, Jeff Layton wrote:
> On Mon, 2017-05-22 at 19:53 +0200, Jan Kara wrote:
> > On Mon 22-05-17 09:53:21, Jeff Layton wrote:
> > > On Mon, 2017-05-22 at 15:38 +0200, Jan Kara wrote:
> > > > > In the case of something like ext2, could we instead get away with just
> > > > > marking the data mapping of the inode with an error if the metadata
> > > > > writeout fails?
> > > > >
> > > > > Then we could just have write_inode operations call mapping_set_error on
> > > > > inode->i_mapping when they're going to return an error. That should be
> > > > > functionally equivalent, I'd think.
> > > > >
> > > > > The catch there is that that requires a 1:1 data:metadata mapping, and
> > > > > I'm not sure that that is the case (or will always be, even if it is
> > > > > now).
> > > >
> > > > So for ext2 / ext4 in nojournal mode this should work - we track all
> > > > relevant metadata in mapping->private_list. But I cannot really comment
> > > > on other filesystems like f2fs...
> > > >
> > >
> > > Actually, I think that may be problematic...
> > >
> > > We could end up calling ext2_write_inode with sync_mode != WB_SYNC_ALL,
> > > which just dirties the buffer without starting writeback. Then, have VM
> > > subsystem write back the buffer due to memory pressure and have that
> > > fail. Trying to set the error in write_inode would miss that situation.
> >
> > Two notes here:
> >
> > 1) Inode is a bad example because there isn't a 1:1 mapping between buffers
> > containing inodes and mappings - one buffer contains several inodes.
> > I wanted to add that for inodes specifically it does not matter as they get
> > special handling but actually fsync seems to be currently unreliable for
> > them - if we first wrote them in WB_SYNC_NONE mode, they will just be
> > written into bdev's page cache, but a following fsync(2) will do nothing as
> > they will be clean. Anyway, this is an unrelated problem.
> >
>
> Yes, that's what I was trying to articulate above. I'm not sure it's
> unrelated though. Moving to errseq_t based handling there based on the
> blockdev mapping seems like it'd solve that. That does require an extra
> errseq_t though.

Well, it might help solve the error handling case, but it doesn't solve
the fundamental problem that the inode buffer doesn't even have to be
written to disk by the time fsync(2) returns.

> (I assume that on ext2 inode writeback, bh->b_page->mapping->host points
> to the bdev inode?)

Yes, it does.

> > 2) For metadata like indirect blocks where you indeed have 1:1 mapping, you
> > can do the error setting in ->end_io handler based on bh->b_assoc_map and
> > that should do what you need, shouldn't it?
> >
>
> That would probably work, and I think the mark_buffer_write_io_error
> function that I was adding should already be doing the right thing
> there.

Agreed.
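
To spell out the idea in 2), here is a rough userspace model of marking the
error from the write completion path. It is a sketch only, not the actual
mark_buffer_write_io_error() from the series: struct mapping_model, struct
buffer_model and the model_* helpers are hypothetical stand-ins, with
"backing" playing the role of bh->b_page->mapping and "assoc" the role of
bh->b_assoc_map.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an address_space: remembers one error. */
struct mapping_model {
	int wb_err;			/* 0, or a negative errno */
};

/* Hypothetical stand-in for a buffer_head. */
struct buffer_model {
	struct mapping_model *backing;	/* models bh->b_page->mapping */
	struct mapping_model *assoc;	/* models bh->b_assoc_map, may be NULL */
	int write_io_error;		/* models the per-buffer error flag */
};

static void model_mapping_set_error(struct mapping_model *m, int err)
{
	assert(m != NULL);
	m->wb_err = err;
}

/* Models a write end_io handler; uptodate == 0 means the write failed. */
static void model_end_write(struct buffer_model *bh, int uptodate)
{
	assert(bh != NULL && bh->backing != NULL);
	if (uptodate)
		return;

	bh->write_io_error = 1;
	/* Always flag the mapping that backs the buffer (the bdev). */
	model_mapping_set_error(bh->backing, -EIO);
	/* With a 1:1 association, also flag the owning file's mapping. */
	if (bh->assoc)
		model_mapping_set_error(bh->assoc, -EIO);
}
```

A buffer with no associated mapping (the inode buffer case from 1) only
flags the backing mapping in this model, which is exactly the gap being
discussed.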

> > If I'm indeed right, then for buffers which have 1:1 mapping we are fine
> > and if we find a solution for inodes, we could avoid the second errseq_t.
>
> Yeah, I'm just still not seeing a good way to track error in inode
> metadata writeback without an extra errseq_t though. I don't suppose
> that a buffer holding inode metadata has a list of those inodes, does
> it? Then we could walk the list and flag each one with the error.
> Without something like that, I think we're stuck with an extra errseq_t.

No, the buffer doesn't have a list of associated inodes. For ext2/4 it is
doable to actually track down all the inodes, but I don't think we want to
complicate this series by implementing such a mechanism for each filesystem
that needs this. So let's start with a generic solution that uses a second
errseq_t for the metadata mapping. It is somewhat rough (an error in
writeback of any metadata block will fail fsync(2) for all open files), but
we can later improve on this for each fs which cares enough about better
error reporting.
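
To illustrate what "rough" means here, a minimal userspace model of the
generic approach follows. It is a sketch under stated assumptions, not the
kernel's errseq_t: the real type packs the error and a counter into one
word and is updated atomically, all of which is left out. struct
err_cursor, struct fs_model, struct file_model and the model_* functions
are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for an errseq_t (NOT the kernel type). */
struct err_cursor {
	int err;		/* most recent error as a negative errno */
	unsigned int seq;	/* bumped each time an error is recorded */
};

/* Hypothetical per-filesystem state: one cursor for all metadata. */
struct fs_model {
	struct err_cursor md_err;
};

/* Hypothetical per-open-file state: the point this file last checked. */
struct file_model {
	struct fs_model *fs;
	unsigned int md_since;
};

/* Called when writeback of any metadata block fails. */
static void model_record_md_error(struct fs_model *fs, int err)
{
	assert(fs != NULL && err < 0);
	fs->md_err.err = err;
	fs->md_err.seq++;
}

/* Sample the current position at open time. */
static void model_open(struct file_model *f, struct fs_model *fs)
{
	assert(f != NULL && fs != NULL);
	f->fs = fs;
	f->md_since = fs->md_err.seq;
}

/* Report a metadata error once per open file, then advance the cursor. */
static int model_fsync(struct file_model *f)
{
	assert(f != NULL && f->fs != NULL);
	if (f->md_since == f->fs->md_err.seq)
		return 0;
	f->md_since = f->fs->md_err.seq;
	return f->fs->md_err.err;
}
```

In this model a single failed metadata write makes the next model_fsync()
on every open file return the error once, whether or not that file owned
the block, which is the coarse behaviour described above.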

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR