Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)

From: Matthew Wilcox
Date: Fri Sep 13 2024 - 11:52:03 EST


On Fri, Sep 13, 2024 at 11:30:41AM -0400, Chris Mason wrote:
> I've mentioned this in the past to both Willy and Dave Chinner, but so
> far all of my attempts to reproduce it on purpose have failed. It's
> awkward because I don't like to send bug reports that I haven't
> reproduced on a non-facebook kernel, but I'm pretty confident this bug
> isn't specific to us.

I don't think the bug is specific to you either. It's been hit by
several people ... but it's really hard to hit ;-(

> I'll double down on repros again during plumbers and hopefully come up
> with a recipe for explosions. One other important data point is that we

I appreciate the effort!

> The issue looked similar to Christian Theune's rcu stalls, but since it
> was just one CPU spinning away, I was able to perf probe and drgn my way
> to some details. The xarray for the file had a series of large folios:
>
> [ index 0: large folio from the correct file ]
> [ index 1: large folio from the correct file ]
> ...
> [ index N: large folio from a completely different file ]
> [ index N+1: large folio from the correct file ]
>
> I'm being sloppy with index numbers, but the important part is that
> we've got a large folio from the wrong file in the middle of the bunch.

If you could get the precise index numbers, that would be an important
clue. It would be interesting to know the index number in the xarray
where the folio was found rather than folio->index (as I suspect that
folio->index is completely bogus because folio->mapping is wrong).
But gathering that info is going to be hard.
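A user-space sketch of what such a report would capture, under loudly stated assumptions: this is not the kernel xarray API, each folio occupies a single slot here (real large folios span several), and `struct fake_mapping`, `struct fake_folio` and `scan_for_foreign()` are hypothetical names. What it illustrates is recording the slot where the lookup found the folio separately from the index the folio claims for itself, since the latter may be bogus when ->mapping is wrong.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct address_space: one per file. */
struct fake_mapping {
	const char *name;
};

/* Hypothetical stand-in for struct folio. */
struct fake_folio {
	struct fake_mapping *mapping;	/* owning file, like folio->mapping */
	unsigned long index;		/* claimed offset, like folio->index */
};

/*
 * Walk the slots in order and report every folio whose mapping is
 * neither NULL nor the file being scanned.  "slot" is where the lookup
 * actually found the folio; f->index is only what the folio claims,
 * and may be garbage if f->mapping is wrong.  Empty slots are modelled
 * as NULL pointers.  Returns the number of foreign folios seen.
 */
static size_t scan_for_foreign(struct fake_folio **slots, size_t nr,
			       const struct fake_mapping *mapping)
{
	size_t bad = 0;

	for (size_t slot = 0; slot < nr; slot++) {
		const struct fake_folio *f = slots[slot];

		if (f == NULL || f->mapping == NULL || f->mapping == mapping)
			continue;
		printf("bad folio at slot %zu (claims index %lu in %s)\n",
		       slot, f->index, f->mapping->name);
		bad++;
	}
	return bad;
}
```

Fed the layout from the report (correct, correct, foreign, correct), it would report slot 2 together with whatever index the foreign folio claims, and return 1.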

Maybe something like this?

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2317,6 +2317,12 @@ static void filemap_get_read_batch(struct address_space *mapping,
 		if (unlikely(folio != xas_reload(&xas)))
 			goto put_folio;
 
+{
+	struct address_space *fmapping = READ_ONCE(folio->mapping);
+	if (fmapping != NULL && fmapping != mapping)
+		printk("bad folio at %lx\n", xas.xa_index);
+}
+
 		if (!folio_batch_add(fbatch, folio))
 			break;
 		if (!folio_test_uptodate(folio))

(could use VM_BUG_ON_FOLIO() too, but I'm not sure that the identity of
the bad folio we've found is as interesting as where we found it)
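The check in the patch reads folio->mapping once with READ_ONCE() and lets NULL through. Presumably that is because the batch lookup does not hold the folio lock, so truncation can clear ->mapping underneath it: NULL then means "removed from the file", not "belongs to another file", and a single load keeps the NULL test and the comparison looking at the same value. A rough user-space analogue follows, assuming a C11 relaxed atomic load as a stand-in for READ_ONCE() (they are not the same primitive); `struct model_folio`, `model_folio_is_foreign()` and `model_truncate()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space. */
struct model_mapping {
	int id;
};

/* Hypothetical folio whose mapping may be cleared concurrently. */
struct model_folio {
	_Atomic(struct model_mapping *) mapping;
};

/*
 * Load the mapping exactly once (a relaxed atomic load standing in for
 * READ_ONCE()), so the NULL test and the comparison see the same value.
 * NULL is treated as "truncated", not as corruption.
 */
static bool model_folio_is_foreign(struct model_folio *folio,
				   const struct model_mapping *expected)
{
	struct model_mapping *m =
		atomic_load_explicit(&folio->mapping, memory_order_relaxed);

	return m != NULL && m != expected;
}

/* Simulates truncation detaching the folio from its file. */
static void model_truncate(struct model_folio *folio)
{
	atomic_store_explicit(&folio->mapping, NULL, memory_order_relaxed);
}
```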