Re: [GIT PULL] gfs2 fix

From: Andreas Gruenbacher
Date: Thu Apr 28 2022 - 13:39:12 EST

On Thu, Apr 28, 2022 at 7:09 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Apr 28, 2022 at 6:27 AM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
> >
> > The data corruption we've been getting unfortunately didn't have to do
> > with lock contention (we already knew that); it still occurs. I'm
> > running out of ideas on what to try there.
> Hmm.
> I don't see the bug, but I do have a suggestion on something to try.
> In particular, you said the problem started with commit 00bfe02f4796
> ("gfs2: Fix mmap + page fault deadlocks for buffered I/O").

Yes, but note that it's gfs2_file_buffered_write() that fails. When
the pagefault_disable/enable() around iomap_file_buffered_write() is
removed, the corruption goes away.
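
For anyone following along, here is a rough userspace model of the pattern in question. It is not the gfs2 or iomap code. The copy runs with "faults disabled", so it may return short, and the caller then faults the page in and retries. Every name here (sim_user_buf, sim_copy_nofault, sim_fault_in, sim_buffered_write, the SIM_* constants) is made up for illustration. It only shows the invariant that a short copy must advance the position by the bytes actually copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u	/* hypothetical page size for this model */
#define SIM_NPAGES    4u	/* hypothetical size of the user buffer */

/* Toy user buffer: page i can only be copied while resident[i] is true. */
struct sim_user_buf {
	unsigned char data[SIM_NPAGES * SIM_PAGE_SIZE];
	bool resident[SIM_NPAGES];
};

/*
 * Models a copy from user memory with page faults disabled: copy until
 * a non-resident page is hit, then stop and report the bytes copied.
 */
static size_t sim_copy_nofault(unsigned char *dst,
			       const struct sim_user_buf *src,
			       size_t off, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t pos = off + done;
		size_t chunk = SIM_PAGE_SIZE - pos % SIM_PAGE_SIZE;

		if (!src->resident[pos / SIM_PAGE_SIZE])
			break;		/* would fault: return a short copy */
		if (chunk > len - done)
			chunk = len - done;
		memcpy(dst + done, src->data + pos, chunk);
		done += chunk;
	}
	return done;
}

/* Models faulting the page at pos back in before a retry. */
static void sim_fault_in(struct sim_user_buf *src, size_t pos)
{
	src->resident[pos / SIM_PAGE_SIZE] = true;
}

/*
 * Retry loop: advance only by what was actually copied, and on a short
 * copy fault the next page in and try again. Returns the retry count.
 */
static unsigned sim_buffered_write(unsigned char *file,
				   struct sim_user_buf *src, size_t len)
{
	size_t written = 0;
	unsigned retries = 0;

	assert(len <= sizeof(src->data));
	while (written < len) {
		written += sim_copy_nofault(file + written, src,
					    written, len - written);
		if (written < len) {
			sim_fault_in(src, written);
			retries++;
		}
	}
	return retries;
}
```

In this model, a caller that advanced by the requested length instead of the returned count would skip over bytes that were never copied, which is the kind of partial-result mishandling we would be looking for.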

> And to me, I see two main things that are going on
> (a) the obvious "calling generic IO functions with pagefault disabled" thing
> (b) the "allow demotion" thing
> And I wonder if you could at least pinpoint which of the cases it is
> that triggers it.
> So I'd love to see you try three things:
> (1) just remove the "allow demotion" cases.
> This will re-introduce the deadlock the commit is trying to fix,
> but that's such a special case that I assume you can run your
> test-suite that shows the problem even without that fix in place?
> This would just pinpoint whether it's due to some odd locking issue or not.
> Honestly, from how you describe the symptoms, I don't think (1) is the
> cause, but I think making sure is good.
> It sounds much more likely that it's one of those generic vfs
> functions that screws up when a page fault happens and it gets a
> partial result instead of handling the fault.

The test should run just fine without allowing demotion. I'll try (1),
but I don't expect the outcome to change.

> Which gets us to
> (2) remove the pagefault_disable/enable() around just the
> generic_file_read_iter() case in gfs2_file_read_iter().
> and
> (3) finally, remove the pagefault_disable/enable() around the
> iomap_file_buffered_write() case in gfs2_file_buffered_write()
> Yeah, yeah, you say it's just the read that fails, but humor me on
> (3), just in case it's an earlier write in your test-suite and the
> read just then uncovered it.
> But I put it as (3) so that you'd do the obvious (2) case first, and
> narrow it down (ie if (1) still shows the bug, then do (2), and if
> that fixes the bug it will be fairly well pinpointed to
> generic_file_read_iter()).

As mentioned above, we already did (3) and it didn't help. I'll do (1)
now, and then (2).

> Looking around, gfs2 is the only thing that obviously calls
> generic_file_read_iter() with pagefaults disabled, so it does smell
> like filemap_read() might have some issue, but the only thing that
> does is basically that
> copied = copy_folio_to_iter(folio, offset, bytes, iter);
> which should just become copy_page_to_iter_iovec(), which you'd hope
> would get things right.
> But it would be good to just narrow things down a bit.
> I'll look at that copy_page_to_iter_iovec() some more regardless, but
> doing that "let's double-check it's not something else" would be good.

We've actually been running most of our experiments on a 5.14-based
kernel with a plethora of backports, so pre-folio. Sorry I forgot to
mention that. I'll reproduce with mainline as well.