Re: [PATCH RFC 1/1] mm/filemap: handle large folio split race in page cache lookups
From: Matthew Wilcox
Date: Thu Mar 05 2026 - 14:27:09 EST
On Thu, Mar 05, 2026 at 12:34:33PM -0600, Chris J Arges wrote:
> We have been hitting VM_BUG_ON_FOLIO(!folio_contains(folio, index)) in
> production environments. These machines are using XFS with large folio
> support enabled and are under high memory pressure.
>
> From reading the code it seems plausible that folio splits due to memory
> reclaim are racing with filemap_fault() serving mmap page faults.
>
> The existing code checks for truncation (folio->mapping != mapping) and
> retries, but there does not appear to be equivalent handling for the
> split case. The result is:
>
> kernel BUG at mm/filemap.c:3519!
> VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio)
This didn't occur to me as a possibility because filemap_get_entry()
is _supposed_ to take care of it. But if this patch fixes it, then
we need to understand why it works.
folio_split() needs to be sure that it's the only one holding a reference
to the folio. To that end, it calculates the expected refcount of the
folio, and freezes it (sets the refcount to 0 if the refcount is the
expected value). Once filemap_get_entry() has incremented the refcount,
freezing will fail.
But of course, we can race. filemap_get_entry() can load a folio first,
the entire folio_split() can complete, and then folio_try_get() succeeds
on a folio that no longer covers the index we were looking for. That's
the case the xas_reload() is there to catch -- if the entry at that
index has changed, xas_reload() comes back with a different folio and
we goto repeat.
So how did we get through this with a reference to the wrong folio?