Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
From: Chris Mason
Date: Tue Sep 24 2024 - 15:18:13 EST
On 9/20/24 3:54 PM, Chris Mason wrote:
[ ... ]
> xas_split_alloc() does the allocation and also shoves an entry into some of
> the slots. When the tree changes, the entry we've stored is wildly
> wrong, but xas_reset() doesn't undo any of that. So when we actually
> use the xas->xa_alloc nodes we've setup, they are pointing to the
> wrong things.
>
> Which is probably why the commits in 6.10 added this:
>
> /* entry may have changed before we re-acquire the lock */
> if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
> xas_destroy(&xas);
> alloced_order = 0;
> }
>
> The only way to undo the work done by xas_split_alloc() is to call
> xas_destroy().
>
> To prove this theory, I tried making a minimal version that also
> called destroy, but it all ended up less minimal than the code
> that's actually in 6.10. I've got a long test going now with
> an extra cond_resched() to make the race bigger, and a printk of victory.
>
> It hasn't fired yet, and I need to hop on an airplane, so I'll just leave
> it running for now. But long story short, I think we should probably
> just tag all of these for stable:
>
> https://lore.kernel.org/all/20240415171857.19244-2-ryncsn@xxxxxxxxx/T/#mdb85922624c39ea7efb775a044af4731890ff776
>
> Also, Willy's proposed changes to xas_split_alloc() seem like a good
> idea.
A few days of load later and some extra printks, it turns out that
taking the writer lock in __filemap_add_folio() makes us dramatically
more likely to just return EEXIST than go into the xas_split_alloc() dance.
With the changes in 6.10, we only get into that xas_destroy() case above
when the conflicting entry is a shadow entry, so I changed my repro to
use memory pressure instead of fadvise.
I also added a schedule_timeout(1) after the split alloc, and with all
of that I'm able to consistently make the xas_destroy() case trigger
without causing any system instability. Kairui Song's patches do seem
to have fixed things nicely.
-chris