Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)

From: Linus Torvalds
Date: Fri Sep 13 2024 - 17:24:38 EST


On Fri, 13 Sept 2024 at 11:15, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> Oh! I think split is the key. Let's say we have an order-6 (or
> larger) folio. And we call split_huge_page() (whatever it's called
> in your kernel version). That calls xas_split_alloc() followed
> by xas_split(). xas_split_alloc() puts entry in node->slots[0] and
> initialises node->slots[1..XA_CHUNK_SIZE] to a sibling entry.

Hmm. The splitting is not just indicated by the debug logs, it also
ends up being a fairly complicated case. *The* most complicated case
of adding a new folio by far, I'd say.

And I wonder if it's even necessary?

Because I think the *common* case is through filemap_add_folio(),
isn't it? And that code path really doesn't care what the size of the
folio is.

So instead of splitting, that code path would seem to be perfectly
happy with instead erroring out, and simply re-doing the new folio
allocation using the same size that the old conflicting folio had (at
which point it won't be conflicting any more).

No?

It's possible that I'm entirely missing something, but at least the
filemap_add_folio() case looks like it would actually be happier with
an "oh, that size conflicts with an existing entry, let's just
allocate a different size then" approach.
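
For illustration only, the "error out and redo the allocation" idea
can be sketched as a small user-space toy model. Nothing here is
kernel code: the toy_* names, the flat slot array, the "placeholder"
entry kind and the choice of -EAGAIN for a size mismatch are all
assumptions made up for this sketch. It only shows the control flow:
the insert never splits anything, it reports the order of the old
conflicting entry, and the caller retries with that order.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_ORDER	6
#define TOY_NSLOTS	(1UL << TOY_MAX_ORDER)	/* toy cache of 64 indices */

/* Hypothetical entry kinds: an old non-folio entry vs. a real folio. */
enum toy_kind { TOY_EMPTY = 0, TOY_PLACEHOLDER, TOY_FOLIO };

struct toy_slot {
	enum toy_kind kind;
	unsigned int order;	/* entry spans 1 << order aligned slots */
};

struct toy_cache {
	struct toy_slot slots[TOY_NSLOTS];	/* all-zero means empty */
};

/* Fill the naturally aligned range around 'index' with one entry. */
static void toy_store(struct toy_cache *c, unsigned long index,
		      unsigned int order, enum toy_kind kind)
{
	unsigned long nr = 1UL << order;
	unsigned long start = index & ~(nr - 1);

	assert(order <= TOY_MAX_ORDER && start + nr <= TOY_NSLOTS);
	for (unsigned long i = start; i < start + nr; i++) {
		c->slots[i].kind = kind;
		c->slots[i].order = order;
	}
}

/*
 * Try to add a folio of 'order' covering 'index', without ever splitting.
 * Returns 0 on success, -EEXIST if a folio is already in the range, and
 * -EAGAIN (with *old_order set) if an old entry of a different order is
 * in the way.
 */
static int toy_add_folio(struct toy_cache *c, unsigned long index,
			 unsigned int order, unsigned int *old_order)
{
	unsigned long nr, start;

	if (order > TOY_MAX_ORDER || index >= TOY_NSLOTS)
		return -EINVAL;
	nr = 1UL << order;
	start = index & ~(nr - 1);

	for (unsigned long i = start; i < start + nr; i++) {
		const struct toy_slot *s = &c->slots[i];

		if (s->kind == TOY_EMPTY)
			continue;
		if (s->kind == TOY_FOLIO)
			return -EEXIST;
		if (s->order != order) {
			*old_order = s->order;	/* size conflict: report it */
			return -EAGAIN;
		}
	}
	toy_store(c, index, order, TOY_FOLIO);
	return 0;
}

/* Caller side: on a size conflict, redo the attempt with the old order. */
static int toy_add_folio_retry(struct toy_cache *c, unsigned long index,
			       unsigned int want, unsigned int *used)
{
	unsigned int order = want;

	/* Bounded loop so the sketch can never spin forever. */
	for (int attempt = 0; attempt <= TOY_MAX_ORDER + 1; attempt++) {
		unsigned int old = 0;
		int err = toy_add_folio(c, index, order, &old);

		if (err != -EAGAIN) {
			if (!err)
				*used = order;
			return err;
		}
		order = old;	/* same size as the conflicting entry */
	}
	return -EAGAIN;
}
```

In this model, a caller that asks for an order-0 folio at an index
covered by an order-2 placeholder gets a conflict on the first
attempt and succeeds on the second with an order-2 folio. Whether
the real code paths can be restructured this way is exactly the open
question above.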

Linus