Re: [PATCH] iomap: Handle I/O errors gracefully in page_mkwrite

From: Dave Chinner
Date: Thu Jun 04 2020 - 19:30:59 EST


On Thu, Jun 04, 2020 at 04:05:19PM -0700, Matthew Wilcox wrote:
> On Fri, Jun 05, 2020 at 08:57:26AM +1000, Dave Chinner wrote:
> > On Thu, Jun 04, 2020 at 01:23:40PM -0700, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
> > >
> > > Test generic/019 often results in:
> > >
> > > WARNING: at fs/iomap/buffered-io.c:1069 iomap_page_mkwrite_actor+0x57/0x70
> > >
> > > Since this can happen due to a storage error, we should not WARN for it.
> > > Just return -EIO, which will be converted to a SIGBUS for the hapless
> > > task attempting to write to the page that we can't read.
> >
> > Why didn't the "read" part of the fault which had the EIO error fail
> > the page fault? i.e. why are we waiting until deep inside the write
> > fault path to error out on a failed page read?
>
> I have a hypothesis that I don't know how to verify.
>
> First the task does a load from the page and we put a read-only PTE in
> the page tables. Then it writes to the page using write(). The page
> gets written back, but hits an error in iomap_writepage_map()
> which calls ClearPageUptodate(). Then the task with it mapped attempts
> to store to it.

Sure, but that's not really what I was asking: why isn't this
!uptodate state caught before the page fault code calls
->page_mkwrite? The page fault code has a reference to the page,
after all, and in a couple of paths it even has the page locked.

What I'm trying to understand is why this needs to be fixed inside
->page_mkwrite, because AFAICT if we have to fix this in the iomap
code, we have the same "we got handed a bad page by the page fault"
problem in every single ->page_mkwrite implementation....

> I haven't dug through what generic/019 does, so I don't know how plausible
> this is.

# Run fsstress and fio(dio/aio and mmap) and simulate disk failure
# check filesystem consistency at the end.

I don't think it is mixing buffered writes and mmap writes on the
same file via fio. Maybe fsstress is triggering that, but the
fsstress workload is single threaded so, again, I'm not sure it will
do this.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx