Re: [PATCH RFC 1/1] mm/filemap: handle large folio split race in page cache lookups
From: Matthew Wilcox
Date: Thu Mar 05 2026 - 14:27:09 EST
On Thu, Mar 05, 2026 at 12:34:33PM -0600, Chris J Arges wrote:
> We have been hitting VM_BUG_ON_FOLIO(!folio_contains(folio, index)) in
> production environments. These machines are using XFS with large folio
> support enabled and are under high memory pressure.
>
> From reading the code it seems plausible that folio splits due to memory
> reclaim are racing with filemap_fault() serving mmap page faults.
>
> The existing code checks for truncation (folio->mapping != mapping) and
> retries, but there does not appear to be equivalent handling for the
> split case. The result is:
>
> kernel BUG at mm/filemap.c:3519!
> VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio)
This didn't occur to me as a possibility because filemap_get_entry()
is _supposed_ to take care of it. But if this patch fixes it, then
we need to understand why it works.
folio_split() needs to be sure that it's the only one holding a reference
to the folio. To that end, it calculates the expected refcount of the
folio, and freezes it (sets the refcount to 0 if the refcount is the
expected value). Once filemap_get_entry() has incremented the refcount,
freezing will fail.
But of course, we can race. filemap_get_entry() can load a folio first,
the entire folio_split() can complete, and then folio_try_get() succeeds
on a folio that no longer covers the index we were looking for. That's
the case the xas_reload() is there to catch -- if the entry at that
index has changed, xas_reload() comes back with a different folio and
we goto repeat.
So how did we get through this with a reference to the wrong folio?