Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

From: Jeff Layton
Date: Mon Apr 10 2017 - 09:20:27 EST


On Mon, 2017-04-10 at 09:15 +1000, NeilBrown wrote:
> On Fri, Apr 07 2017, Jeff Layton wrote:
>
> >
> > > ... and then there's no need to update if it's the same errno and nobody's
> > > seen it:
> > >
> > > if (old == new)
> > > break;
> > >
> >
> > No, we can't do this. The thing could have just been updated by a task
> > that is setting the "seen" bit. We don't want to lose the error here. We
> > always have to do the cmpxchg on the set_wb_error side, I think.
>
> I don't follow your logic.
> If (old == new) then there was a moment since this function started when
> performing the cmpxchg() would not have changed the contents of memory.
> So let's pretend it did actually happen at that moment, and not change
> memory.
>
> If we race with a task setting the "seen" bit, then it will have seen
> the error *after* the new error, that this thread is reporting, actually
> happened. So the result is still correct.
>

Ok, that does make sense. I'll plan to do that.

There's also a bug in the last patch that I sent. We need to mark the
SEEN bit when we sample the value at open time, so we need a
filemap_sample_wb_error function to grab the current wb_err_t and mark
it SEEN if necessary.

That also gives us a way to handle something like filemap_write_and_wait
(which doesn't take a struct file). We can sample the wb_err_t prior to
starting writeback, and then return an error if anything failed after
that point.

I think that's probably close enough to how the current code works that
we can use it to make drop-in replacements for filemap_write_and_wait*
which should keep us from having to change so much existing code here.

filemap_check_errors would need to take a previously-sampled wb_err_t
argument, but only the lowest-level callers of that and
filemap_fdatawait* would need to deal with them directly.
--
Jeff Layton <jlayton@xxxxxxxxxx>